Re: [PATCH/RFC] Gitweb: Convert UTF-8 encoded file names

2014-05-27 Thread Jakub Narębski
On 2014-05-16 19:05, Junio C Hamano wrote: Jakub Narębski jna...@gmail.com writes: Correct, but is where does it appear the question we are primarily interested in, wrt this breakage and its fix? That of course depends on how we want to test gitweb output. The simplest solution,

Re: [PATCH/RFC] Gitweb: Convert UTF-8 encoded file names

2014-05-16 Thread Jakub Narębski
On Fri, May 16, 2014 at 3:26 AM, Junio C Hamano gits...@pobox.com wrote: Jakub Narębski jna...@gmail.com writes: On Thu, May 15, 2014 at 9:38 PM, Junio C Hamano gits...@pobox.com wrote: Jakub Narębski jna...@gmail.com writes: Writing test for this would not be easy, and require some HTML

Re: [PATCH/RFC] Gitweb: Convert UTF-8 encoded file names

2014-05-16 Thread Junio C Hamano
Jakub Narębski jna...@gmail.com writes: Correct, but is where does it appear the question we are primarily interested in, wrt this breakage and its fix? That of course depends on how we want to test gitweb output. The simplest solution, comparing with known output with perhaps fragile /

Re: [PATCH/RFC] Gitweb: Convert UTF-8 encoded file names

2014-05-15 Thread Peter Krefting
Michael Wagner: Decoding the UTF-8 encoded file name (again with an additional print statement): $ REQUEST_METHOD=GET QUERY_STRING='p=notes.git;a=blob_plain;f=work/G%C3%83%C2%BCtekriterien.txt;hb=HEAD' ./gitweb.cgi work/Gütekriterien.txt Content-disposition: inline;
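The file name in the query string above can be unpacked step by step. The following Python sketch is only an illustration and not gitweb code: the variable names are made up for the example, and it assumes the file name was UTF-8 encoded twice, with the first result misread as ISO-8859-1 in between.

```python
from urllib.parse import unquote_to_bytes

# Query string copied from the message above; gitweb separates parameters with ';'.
query = "p=notes.git;a=blob_plain;f=work/G%C3%83%C2%BCtekriterien.txt;hb=HEAD"
params = dict(pair.split("=", 1) for pair in query.split(";"))

# Undo the percent-encoding only; this yields raw bytes, not text.
raw = unquote_to_bytes(params["f"])

# Decoding once as UTF-8 still leaves two Latin-1 characters where one "ü" should be.
decoded_once = raw.decode("utf-8")

# Assumption: the name was encoded twice. Mapping each character back to a single
# byte (ISO-8859-1) and decoding as UTF-8 again recovers the intended name.
decoded_twice = decoded_once.encode("iso-8859-1").decode("utf-8")

print(repr(decoded_once))
print(repr(decoded_twice))
```

If the name had been encoded only once, the second decoding step would be wrong, so a real fix has to address whatever produces the extra encoding rather than decode twice on input.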

Re: [PATCH/RFC] Gitweb: Convert UTF-8 encoded file names

2014-05-15 Thread Jakub Narębski
On Thu, May 15, 2014 at 7:08 AM, Michael Wagner accou...@mwagner.org wrote: On Thu, May 15, 2014 at 12:25:45AM +0200, Jakub Narębski wrote: On Wed, May 14, 2014 at 11:57 PM, Junio C Hamano gits...@pobox.com wrote: Michael Wagner accou...@mwagner.org writes: Perl has an internal encoding used

Re: [PATCH/RFC] Gitweb: Convert UTF-8 encoded file names

2014-05-15 Thread Junio C Hamano
Peter Krefting pe...@softwolves.pp.se writes: What is happening is that whatever is generating the URI is UTF-8-encoding the string twice (i.e., it generates a string with the proper C3 BC in it, and then interprets it as iso-8859-1 data and runs that through a UTF-8 encoder again, yielding
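The double encoding described here can be reproduced in a few lines. This is a minimal Python sketch for illustration, not the code path gitweb actually takes; the variable names are hypothetical.

```python
from urllib.parse import quote

name = "G\u00fctekriterien.txt"                # "Gütekriterien.txt"

utf8_once = name.encode("utf-8")                # U+00FC becomes the bytes C3 BC
misread = utf8_once.decode("iso-8859-1")        # bytes wrongly treated as Latin-1 text
utf8_twice = misread.encode("utf-8")            # C3 BC becomes C3 83 C2 BC

correct = quote(utf8_once)                      # percent-encoding of the proper bytes
double_encoded = quote(utf8_twice)              # percent-encoding after the extra pass

print(correct)
print(double_encoded)
```

The second value contains the same `%C3%83%C2%BC` run that appears in the query string quoted elsewhere in this thread.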

Re: [PATCH/RFC] Gitweb: Convert UTF-8 encoded file names

2014-05-15 Thread Michael Wagner
On Thu, May 15, 2014 at 10:04:24AM +0100, Peter Krefting wrote: Michael Wagner: Decoding the UTF-8 encoded file name (again with an additional print statement): $ REQUEST_METHOD=GET QUERY_STRING='p=notes.git;a=blob_plain;f=work/G%C3%83%C2%BCtekriterien.txt;hb=HEAD' ./gitweb.cgi

Re: [PATCH/RFC] Gitweb: Convert UTF-8 encoded file names

2014-05-15 Thread Jakub Narębski
On Thu, May 15, 2014 at 8:48 PM, Michael Wagner accou...@mwagner.org wrote: On Thu, May 15, 2014 at 10:04:24AM +0100, Peter Krefting wrote: Michael Wagner: Decoding the UTF-8 encoded file name (again with an additional print statement): $ REQUEST_METHOD=GET

Re: [PATCH/RFC] Gitweb: Convert UTF-8 encoded file names

2014-05-15 Thread Jakub Narębski
On Thu, May 15, 2014 at 9:28 PM, Jakub Narębski jna...@gmail.com wrote: On Thu, May 15, 2014 at 8:48 PM, Michael Wagner accou...@mwagner.org wrote: [...] The subroutine git_tree generates the tree view. It stores the output of git ls-tree -z ... in an array named @entries. Printing the content

Re: [PATCH/RFC] Gitweb: Convert UTF-8 encoded file names

2014-05-15 Thread Junio C Hamano
Jakub Narębski jna...@gmail.com writes: Writing test for this would not be easy, and require some HTML parser (WWW::Mechanize, Web::Scraper, HTML::Query, pQuery, ... or low level HTML::TreeBuilder, or other low level parser). Hmph. Is it more than just looking for a specific run of %xx we
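One possible reading of "looking for a specific run of %xx" is a plain pattern match on the generated page, with no HTML parser involved. The Python sketch below is an assumption about what such a check could look like, not a test from the git test suite; the function name and the sample links are hypothetical, and the pattern only covers characters whose UTF-8 lead byte is C3 (U+00C0 to U+00FF).

```python
import re

# A correctly encoded "ü" is %C3%BC. Encoded twice, the lead byte C3 turns into
# %C3%83 and the continuation byte (80..BF) turns into %C2%80..%C2%BF.
DOUBLE_ENCODED = re.compile(r"%C3%83%C2%[89AB][0-9A-F]", re.IGNORECASE)

def has_double_encoded_utf8(page: str) -> bool:
    """Return True if the page contains a doubly UTF-8 encoded %xx run."""
    return DOUBLE_ENCODED.search(page) is not None

# Hypothetical link fragments, shaped like gitweb query strings.
good_link = '<a href="?p=notes.git;a=blob;f=work/G%C3%BCtekriterien.txt">'
bad_link = '<a href="?p=notes.git;a=blob;f=work/G%C3%83%C2%BCtekriterien.txt">'

print(has_double_encoded_utf8(good_link))
print(has_double_encoded_utf8(bad_link))
```

A check like this is narrower than parsing the HTML, but it targets exactly the byte sequence the breakage produces.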

Re: [PATCH/RFC] Gitweb: Convert UTF-8 encoded file names

2014-05-15 Thread Jakub Narębski
On Thu, May 15, 2014 at 9:38 PM, Junio C Hamano gits...@pobox.com wrote: Jakub Narębski jna...@gmail.com writes: Writing test for this would not be easy, and require some HTML parser (WWW::Mechanize, Web::Scraper, HTML::Query, pQuery, ... or low level HTML::TreeBuilder, or other low level

Re: [PATCH/RFC] Gitweb: Convert UTF-8 encoded file names

2014-05-15 Thread Junio C Hamano
Jakub Narębski jna...@gmail.com writes: On Thu, May 15, 2014 at 9:38 PM, Junio C Hamano gits...@pobox.com wrote: Jakub Narębski jna...@gmail.com writes: Writing test for this would not be easy, and require some HTML parser (WWW::Mechanize, Web::Scraper, HTML::Query, pQuery, ... or low level

[PATCH/RFC] Gitweb: Convert UTF-8 encoded file names

2014-05-14 Thread Michael Wagner
Perl has an internal encoding used to store text strings. Currently, trying to view files with UTF-8 encoded names results in an error (either 404 - Cannot find file [blob_plain] or XML Parsing Error [blob]). Converting these UTF-8 encoded file names into Perl's internal format resolves these

Re: [PATCH/RFC] Gitweb: Convert UTF-8 encoded file names

2014-05-14 Thread Junio C Hamano
Michael Wagner accou...@mwagner.org writes: Perl has an internal encoding used to store text strings. Currently, trying to view files with UTF-8 encoded names results in an error (either 404 - Cannot find file [blob_plain] or XML Parsing Error [blob]). Converting these UTF-8 encoded file

Re: [PATCH/RFC] Gitweb: Convert UTF-8 encoded file names

2014-05-14 Thread Jakub Narębski
On Wed, May 14, 2014 at 11:57 PM, Junio C Hamano gits...@pobox.com wrote: Michael Wagner accou...@mwagner.org writes: Perl has an internal encoding used to store text strings. Currently, trying to view files with UTF-8 encoded names results in an error (either 404 - Cannot find file

Re: [PATCH/RFC] Gitweb: Convert UTF-8 encoded file names

2014-05-14 Thread Michael Wagner
On Thu, May 15, 2014 at 12:25:45AM +0200, Jakub Narębski wrote: On Wed, May 14, 2014 at 11:57 PM, Junio C Hamano gits...@pobox.com wrote: Michael Wagner accou...@mwagner.org writes: Perl has an internal encoding used to store text strings. Currently, trying to view files with UTF-8