Re: [PATCH 7/9] remote-curl: recognize text/plain with a charset parameter

2014-05-22 Thread Peter Krefting

Jeff King:


I was really hoping to avoid getting into all of the real-world
messiness that a real http client needs to deal with (as opposed to just
following the standards).


Yeah, I agree, you're probably fine without all this detail in over 
99% of the cases where this code would ever be exposed. I'm a bit 
damaged after 10+ years as a web browser developer, responsible for 
exactly this kind of stuff... :-)


--
\\// Peter - http://www.softwolves.pp.se/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/9] remote-curl: recognize text/plain with a charset parameter

2014-05-22 Thread Jeff King
On Thu, May 22, 2014 at 08:12:58AM +0100, Peter Krefting wrote:

> Kyle J. McKay:
> 
> >I think that a strict reading of RFC 2616 allows "text/plain ;
> >charset=utf-8" as well as "text/plain;charset=utf-8" and "text/plain;
> >charset=utf-8".
> 
> It does indeed, and I have seen servers send both variants, so they do need
> to be catered for.
> 
> The number of servers that would actually send the charset attribute here
> (for error messages) are probably not that many. It is probably a good idea
> to make the default user-configurable (I know the specs state that anything
> undeclared is iso-8859-1, but the real world doesn't agree to that).

I was really hoping to avoid getting into all of the real-world
messiness that a real http client needs to deal with (as opposed to just
following the standards). This is only used for an optional relay of
error messages from the server. Most servers will send back text/html by
default, and only those which specifically configure text/plain will
have their messages shown. IOW, I expect this to be configured
specifically for git messages, and the server admin can make sure they
are following the standard and that it works with git.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/9] remote-curl: recognize text/plain with a charset parameter

2014-05-22 Thread Jeff King
On Thu, May 22, 2014 at 12:27:38AM -0700, Kyle J. McKay wrote:

> Yeah I think so too.  It's probably enough though just to just strip all " "
> and "\t" characters at the same time the content type is lowercased.  While
> that would cause invalid content types such as "text / plain" to be
> recognized it would keep the rest of the code simpler.  Since a producer of
> an invalid content type shouldn't really be depending on any particular
> behavior by a receiver of an invalid content type it's probably not an
> issue.

I think you have to retain whitespace between parameters. Not that I
expect there to be multiple parameters in a text/plain, but it's
allowed.

I started doing this path of trying to produce a normalized version, but
found that it was just as much code as parsing at all (and it still
leaves the calling code to do its own parse). Instead, what I ended up
with was just doing the parsing in http.c into nice, normalized chunks,
and then the calling code can compare them with simple strcmps.

I'll post the patches in a few minutes.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/9] remote-curl: recognize text/plain with a charset parameter

2014-05-22 Thread Kyle J. McKay

On May 21, 2014, at 23:05, Jeff King wrote:


On Wed, May 21, 2014 at 05:07:38PM -0700, Kyle J. McKay wrote:


+   p = skip_prefix(type->buf, "text/plain");
+   if (!p || (*p && *p != ';'))
+   return 0;
+
+   return 1;
+}
+


I think that a strict reading of RFC 2616 allows "text/plain ;
charset=utf-8" as well as "text/plain;charset=utf-8" and "text/plain;
charset=utf-8".  So perhaps this if line instead:

+   if (!p || (*p && *p != ';' && *p != ' ' && *p != '\t'))

See RFC 2616 section 2.2 for the definition of linear white space  
(LWS) and

the end of section 2.1 for the "implied *LWS" rule that allows it.


You're right. There are a few exceptions in 3.7:

 Linear white space (LWS) MUST NOT be used between the type and
 subtype, nor between an attribute and its value.

but it does not include between the subtype and parameter. So the
"find_parameter" also needs to accept the collapsed whitespace, too, I
guess.


Yeah I think so too.  It's probably enough though just to just strip  
all " " and "\t" characters at the same time the content type is  
lowercased.  While that would cause invalid content types such as  
"text / plain" to be recognized it would keep the rest of the code  
simpler.  Since a producer of an invalid content type shouldn't really  
be depending on any particular behavior by a receiver of an invalid  
content type it's probably not an issue.


--Kyle
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/9] remote-curl: recognize text/plain with a charset parameter

2014-05-22 Thread Peter Krefting

Kyle J. McKay:

I think that a strict reading of RFC 2616 allows "text/plain ; charset=utf-8" 
as well as "text/plain;charset=utf-8" and "text/plain; charset=utf-8".


It does indeed, and I have seen servers send both variants, so they do 
need to be catered for.


The number of servers that would actually send the charset attribute 
here (for error messages) are probably not that many. It is probably a 
good idea to make the default user-configurable (I know the specs 
state that anything undeclared is iso-8859-1, but the real world 
doesn't agree to that).


--
\\// Peter - http://www.softwolves.pp.se/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/9] remote-curl: recognize text/plain with a charset parameter

2014-05-21 Thread Jeff King
On Wed, May 21, 2014 at 05:07:38PM -0700, Kyle J. McKay wrote:

> >+p = skip_prefix(type->buf, "text/plain");
> >+if (!p || (*p && *p != ';'))
> >+return 0;
> >+
> >+return 1;
> >+}
> >+
> 
> I think that a strict reading of RFC 2616 allows "text/plain ;
> charset=utf-8" as well as "text/plain;charset=utf-8" and "text/plain;
> charset=utf-8".  So perhaps this if line instead:
> 
> + if (!p || (*p && *p != ';' && *p != ' ' && *p != '\t'))
> 
> See RFC 2616 section 2.2 for the definition of linear white space (LWS) and
> the end of section 2.1 for the "implied *LWS" rule that allows it.

You're right. There are a few exceptions in 3.7:

  Linear white space (LWS) MUST NOT be used between the type and
  subtype, nor between an attribute and its value.

but it does not include between the subtype and parameter. So the
"find_parameter" also needs to accept the collapsed whitespace, too, I
guess.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/9] remote-curl: recognize text/plain with a charset parameter

2014-05-21 Thread Kyle J. McKay

On May 21, 2014, at 03:33, Jeff King wrote:


Commit 426e70d (remote-curl: show server content on http
errors, 2013-04-05) tried to recognize text/plain content
types, but fails to do so if they have any parameters.

This patches makes our parsing more liberal, though we still
do not do anything useful with a charset parameter.

Signed-off-by: Jeff King 
---
remote-curl.c  | 26 ++
t/lib-httpd/error.sh   |  8 +++-
t/t5550-http-fetch-dumb.sh |  5 +
3 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index a5ab977..6d1b206 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -194,20 +194,30 @@ static void free_discovery(struct discovery *d)
}
}

+/*
+ * We only show text/plain parts, as other types are likely
+ * to be ugly to look at on the user's terminal.
+ */
+static int content_type_is_terminal_friendly(struct strbuf *type)
+{
+   const char *p;
+
+   p = skip_prefix(type->buf, "text/plain");
+   if (!p || (*p && *p != ';'))
+   return 0;
+
+   return 1;
+}
+


I think that a strict reading of RFC 2616 allows "text/plain ;  
charset=utf-8" as well as "text/plain;charset=utf-8" and "text/plain;  
charset=utf-8".  So perhaps this if line instead:


+   if (!p || (*p && *p != ';' && *p != ' ' && *p != '\t'))

See RFC 2616 section 2.2 for the definition of linear white space  
(LWS) and the end of section 2.1 for the "implied *LWS" rule that  
allows it.



static int show_http_message(struct strbuf *type, struct strbuf *msg)
{
const char *p, *eol;

-   /*
-* We only show text/plain parts, as other types are likely
-* to be ugly to look at on the user's terminal.
-*
-* TODO should handle "; charset=XXX", and re-encode into
-* logoutputencoding
-*/
-   if (strcmp(type->buf, "text/plain"))
+   if (!content_type_is_terminal_friendly(type))
return -1;

+	/* TODO should record charset and reencode msg to  
logOutputEncoding */

+
strbuf_trim(msg);
if (!msg->len)
return -1;
diff --git a/t/lib-httpd/error.sh b/t/lib-httpd/error.sh
index 786f281..02e80b3 100755
--- a/t/lib-httpd/error.sh
+++ b/t/lib-httpd/error.sh
@@ -3,6 +3,7 @@
printf "Status: 500 Intentional Breakage\n"

printf "Content-Type: "
+charset=iso8859-1
case "$PATH_INFO" in
*html*)
printf "text/html"
@@ -10,8 +11,13 @@ case "$PATH_INFO" in
*text*)
printf "text/plain"
;;
+*charset*)
+   printf "text/plain; charset=utf-8"
+   charset=utf-8
+   ;;
esac
printf "\n"

printf "\n"
-printf "this is the error message\n"
+printf "this is the error message\n" |
+iconv -f us-ascii -t $charset
diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh
index 13defd3..b35b261 100755
--- a/t/t5550-http-fetch-dumb.sh
+++ b/t/t5550-http-fetch-dumb.sh
@@ -181,5 +181,10 @@ test_expect_success 'git client does not show  
html errors' '

! grep "this is the error message" stderr
'

+test_expect_success 'git client shows text/plain with a charset' '
+   test_must_fail git clone "$HTTPD_URL/error/charset" 2>stderr &&
+   grep "this is the error message" stderr
+'
+
stop_httpd
test_done
--
2.0.0.rc1.436.g03cb729



--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html