Re: [PATCH 7/9] remote-curl: recognize text/plain with a charset parameter
Jeff King: I was really hoping to avoid getting into all of the real-world messiness that a real http client needs to deal with (as opposed to just following the standards). Yeah, I agree, you're probably fine without all this detail in over 99% of the cases where this code would ever be exposed. I'm a bit damaged after 10+ years as a web browser developer, responsible for exactly this kind of stuff... :-) -- \\// Peter - http://www.softwolves.pp.se/ -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/9] remote-curl: recognize text/plain with a charset parameter
On Thu, May 22, 2014 at 08:12:58AM +0100, Peter Krefting wrote: > Kyle J. McKay: > > >I think that a strict reading of RFC 2616 allows "text/plain ; > >charset=utf-8" as well as "text/plain;charset=utf-8" and "text/plain; > >charset=utf-8". > > It does indeed, and I have seen servers send both variants, so they do need > to be catered for. > > The number of servers that would actually send the charset attribute here > (for error messages) are probably not that many. It is probably a good idea > to make the default user-configurable (I know the specs state that anything > undeclared is iso-8859-1, but the real world doesn't agree to that). I was really hoping to avoid getting into all of the real-world messiness that a real http client needs to deal with (as opposed to just following the standards). This is only used for an optional relay of error messages from the server. Most servers will send back text/html by default, and only those which specifically configure text/plain will have their messages shown. IOW, I expect this to be configured specifically for git messages, and the server admin can make sure they are following the standard and that it works with git. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/9] remote-curl: recognize text/plain with a charset parameter
On Thu, May 22, 2014 at 12:27:38AM -0700, Kyle J. McKay wrote: > Yeah I think so too. It's probably enough though just to just strip all " " > and "\t" characters at the same time the content type is lowercased. While > that would cause invalid content types such as "text / plain" to be > recognized it would keep the rest of the code simpler. Since a producer of > an invalid content type shouldn't really be depending on any particular > behavior by a receiver of an invalid content type it's probably not an > issue. I think you have to retain whitespace between parameters. Not that I expect there to be multiple parameters in a text/plain, but it's allowed. I started doing this path of trying to produce a normalized version, but found that it was just as much code as parsing at all (and it still leaves the calling code to do its own parse). Instead, what I ended up with was just doing the parsing in http.c into nice, normalized chunks, and then the calling code can compare them with simple strcmps. I'll post the patches in a few minutes. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/9] remote-curl: recognize text/plain with a charset parameter
On May 21, 2014, at 23:05, Jeff King wrote: On Wed, May 21, 2014 at 05:07:38PM -0700, Kyle J. McKay wrote: + p = skip_prefix(type->buf, "text/plain"); + if (!p || (*p && *p != ';')) + return 0; + + return 1; +} + I think that a strict reading of RFC 2616 allows "text/plain ; charset=utf-8" as well as "text/plain;charset=utf-8" and "text/plain; charset=utf-8". So perhaps this if line instead: + if (!p || (*p && *p != ';' && *p != ' ' && *p != '\t')) See RFC 2616 section 2.2 for the definition of linear white space (LWS) and the end of section 2.1 for the "implied *LWS" rule that allows it. You're right. There are a few exceptions in 3.7: Linear white space (LWS) MUST NOT be used between the type and subtype, nor between an attribute and its value. but it does not include between the subtype and parameter. So the "find_parameter" also needs to accept the collapsed whitespace, too, I guess. Yeah I think so too. It's probably enough though just to just strip all " " and "\t" characters at the same time the content type is lowercased. While that would cause invalid content types such as "text / plain" to be recognized it would keep the rest of the code simpler. Since a producer of an invalid content type shouldn't really be depending on any particular behavior by a receiver of an invalid content type it's probably not an issue. --Kyle -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/9] remote-curl: recognize text/plain with a charset parameter
Kyle J. McKay: I think that a strict reading of RFC 2616 allows "text/plain ; charset=utf-8" as well as "text/plain;charset=utf-8" and "text/plain; charset=utf-8". It does indeed, and I have seen servers send both variants, so they do need to be catered for. The number of servers that would actually send the charset attribute here (for error messages) are probably not that many. It is probably a good idea to make the default user-configurable (I know the specs state that anything undeclared is iso-8859-1, but the real world doesn't agree to that). -- \\// Peter - http://www.softwolves.pp.se/ -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/9] remote-curl: recognize text/plain with a charset parameter
On Wed, May 21, 2014 at 05:07:38PM -0700, Kyle J. McKay wrote: > >+p = skip_prefix(type->buf, "text/plain"); > >+if (!p || (*p && *p != ';')) > >+return 0; > >+ > >+return 1; > >+} > >+ > > I think that a strict reading of RFC 2616 allows "text/plain ; > charset=utf-8" as well as "text/plain;charset=utf-8" and "text/plain; > charset=utf-8". So perhaps this if line instead: > > + if (!p || (*p && *p != ';' && *p != ' ' && *p != '\t')) > > See RFC 2616 section 2.2 for the definition of linear white space (LWS) and > the end of section 2.1 for the "implied *LWS" rule that allows it. You're right. There are a few exceptions in 3.7: Linear white space (LWS) MUST NOT be used between the type and subtype, nor between an attribute and its value. but it does not include between the subtype and parameter. So the "find_parameter" also needs to accept the collapsed whitespace, too, I guess. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/9] remote-curl: recognize text/plain with a charset parameter
On May 21, 2014, at 03:33, Jeff King wrote: Commit 426e70d (remote-curl: show server content on http errors, 2013-04-05) tried to recognize text/plain content types, but fails to do so if they have any parameters. This patches makes our parsing more liberal, though we still do not do anything useful with a charset parameter. Signed-off-by: Jeff King --- remote-curl.c | 26 ++ t/lib-httpd/error.sh | 8 +++- t/t5550-http-fetch-dumb.sh | 5 + 3 files changed, 30 insertions(+), 9 deletions(-) diff --git a/remote-curl.c b/remote-curl.c index a5ab977..6d1b206 100644 --- a/remote-curl.c +++ b/remote-curl.c @@ -194,20 +194,30 @@ static void free_discovery(struct discovery *d) } } +/* + * We only show text/plain parts, as other types are likely + * to be ugly to look at on the user's terminal. + */ +static int content_type_is_terminal_friendly(struct strbuf *type) +{ + const char *p; + + p = skip_prefix(type->buf, "text/plain"); + if (!p || (*p && *p != ';')) + return 0; + + return 1; +} + I think that a strict reading of RFC 2616 allows "text/plain ; charset=utf-8" as well as "text/plain;charset=utf-8" and "text/plain; charset=utf-8". So perhaps this if line instead: + if (!p || (*p && *p != ';' && *p != ' ' && *p != '\t')) See RFC 2616 section 2.2 for the definition of linear white space (LWS) and the end of section 2.1 for the "implied *LWS" rule that allows it. static int show_http_message(struct strbuf *type, struct strbuf *msg) { const char *p, *eol; - /* -* We only show text/plain parts, as other types are likely -* to be ugly to look at on the user's terminal. -* -* TODO should handle "; charset=XXX", and re-encode into -* logoutputencoding -*/ - if (strcmp(type->buf, "text/plain")) + if (!content_type_is_terminal_friendly(type)) return -1; + /* TODO should record charset and reencode msg to logOutputEncoding */ + strbuf_trim(msg); if (!msg->len) return -1; diff --git a/t/lib-httpd/error.sh b/t/lib-httpd/error.sh index 786f281..02e80b3 100755 --- a/t/lib-httpd/error.sh +++ b/t/lib-httpd/error.sh @@ -3,6 +3,7 @@ printf "Status: 500 Intentional Breakage\n" printf "Content-Type: " +charset=iso8859-1 case "$PATH_INFO" in *html*) printf "text/html" @@ -10,8 +11,13 @@ case "$PATH_INFO" in *text*) printf "text/plain" ;; +*charset*) + printf "text/plain; charset=utf-8" + charset=utf-8 + ;; esac printf "\n" printf "\n" -printf "this is the error message\n" +printf "this is the error message\n" | +iconv -f us-ascii -t $charset diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh index 13defd3..b35b261 100755 --- a/t/t5550-http-fetch-dumb.sh +++ b/t/t5550-http-fetch-dumb.sh @@ -181,5 +181,10 @@ test_expect_success 'git client does not show html errors' ' ! grep "this is the error message" stderr ' +test_expect_success 'git client shows text/plain with a charset' ' + test_must_fail git clone "$HTTPD_URL/error/charset" 2>stderr && + grep "this is the error message" stderr +' + stop_httpd test_done -- 2.0.0.rc1.436.g03cb729 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html