Re: Re: [msysGit] Re: Re: File path not escaped in warning message

2012-08-07 Thread jbialobr
Hi Karsten,

Thanks, you helped me a lot. I see now that logoutputencoding
applies to the commit header data, not to the whole log output. The name
is a little misleading, and so is the description:
i18n.logOutputEncoding
   Encoding to use when displaying logs.

> Regarding git-log / git-diff output, there are basically three different
> character set encodings involved:
> 1. commit log messages: re-coded to i18n.logoutputencoding (usually
> UTF-8)

Also the committer and author fields.

> 2. file content: printed verbatim (no re-coding); gui tools such as gitk
> may decode this based on gui.encoding or .gitattributes settings
> 3. everything else (file names, diff headers, error / warning messages):
> always UTF-8 (at least in Git for Windows)

It is more like git encoding than UTF-8.

> Gui tools such as gitk decode this output line by line using the
> appropriate encoding.

It would be easier to do that if the output were displayed in a consistent
way. I see no reason why one part of the output is quoted and another is
not. Another example:

$ git status
warning: LF will be replaced by CRLF in 1ą.txt.
The file will have its original line endings in your working directory.
# On branch master
# Your branch and 'origin/master' have diverged,
# and have 2 and 2 different commit(s) each, respectively.
#
# Changes to be committed:
#   (use "git reset HEAD ..." to unstage)
#
#   modified:   "1\261.txt"
#
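
The quoted form shown in the status output above ("1\261.txt") can be turned
back into the raw file name bytes mechanically. The following is only a
minimal sketch: the helper name unquote_path and the escape table are
hypothetical, it covers only the common C-style escapes, and the final
decoding step assumes the file name bytes are ISO-8859-2, which is a guess
about the system that produced the output and is not something git reports.

```python
import re

# Single-character escapes assumed to appear in C-style quoted paths.
# This table is an illustrative subset, not taken from git's source.
SIMPLE_ESCAPES = {"a": 7, "b": 8, "t": 9, "n": 10, "v": 11,
                  "f": 12, "r": 13, '"': 34, "\\": 92}


def unquote_path(text):
    """Return the raw bytes of a path as printed with core.quotepath=true.

    Paths that are not wrapped in double quotes are returned unchanged
    (encoded as UTF-8). Malformed input raises IndexError or KeyError.
    """
    if len(text) < 2 or text[0] != '"' or text[-1] != '"':
        return text.encode("utf-8")
    body = text[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i] != "\\":
            out += body[i].encode("utf-8")
            i += 1
            continue
        # A backslash is followed by up to three octal digits ...
        octal = re.match(r"[0-7]{1,3}", body[i + 1:])
        if octal:
            out.append(int(octal.group(), 8) & 0xFF)
            i += 1 + len(octal.group())
        else:
            # ... or by a single-character escape such as \t or \".
            out.append(SIMPLE_ESCAPES[body[i + 1]])
            i += 2
    return bytes(out)


raw = unquote_path('"1\\261.txt"')   # octal 261 is the single byte 0xB1
name = raw.decode("iso-8859-2")      # assumption: bytes are ISO-8859-2
```

With that assumption, the byte 0xB1 decodes to the letter in the original
file name; under a different file name encoding the same bytes would decode
differently or not at all.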

Git doesn't recode the commit header when --pretty is set to format,
but it does when --pretty is set to full, raw, etc. Do you know whether
this is by mistake or by design?

Thanks again,
Janusz Białobrzewski.


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [msysGit] Re: Re: File path not escaped in warning message

2012-08-06 Thread karsten . blees
Hi Janusz,

It seems you're mixing up a few completely unrelated concepts here.

Core.quotepath enables quoting and escaping of special characters in file 
names. This has nothing to do with character set encoding of file names 
(i.e. Cp1250/ISO-8859-2/UTF-8). AFAIK, apart from git-svn, git currently 
doesn't support character set re-coding of file names at all, so 
core.quotepath and encoding are completely unrelated.

Regarding git-log / git-diff output, there are basically three different 
character set encodings involved:
1. commit log messages: re-coded to i18n.logoutputencoding (usually UTF-8)
2. file content: printed verbatim (no re-coding); gui tools such as gitk 
may decode this based on gui.encoding or .gitattributes settings
3. everything else (file names, diff headers, error / warning messages): 
always UTF-8 (at least in Git for Windows)

Gui tools such as gitk decode this output line by line using the 
appropriate encoding.


 wrote on 06.08.2012 08:53:17:
> File name is 1ą.txt its content is encoded in windows-1250

File name encoding and file content encoding are completely unrelated. 
File name encoding in current Git for Windows is *always* UTF-8, file 
content encoding can be anything.

> Output of git diff after reencoding to windows1250 is:
>
> warning: LF will be replaced by CRLF in 1Ä….txt.
> The file will have its original line endings in your working directory.

This looks like the file name is UTF-8, but reinterpreted (not reencoded) 
as if it were Cp1250. However, as stated above, you cannot simply 
interpret the entire git-log / git-diff output as being one particular 
encoding, as the encoding may vary on a line by line basis.
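
The difference between reinterpreting and re-encoding can be reproduced in a
few lines. This is just an illustrative sketch using Python's standard
codecs; the variable names are made up for the example.

```python
name = "1ą.txt"

# The bytes git prints for the file name (UTF-8).
utf8_bytes = name.encode("utf-8")

# Reinterpreting: the same bytes are read as if they were Cp1250.
# The two UTF-8 bytes of the Polish letter become two unrelated characters.
misread = utf8_bytes.decode("cp1250")

# Re-encoding: the text is converted, so the bytes change but the
# characters stay the same.
recoded = name.encode("cp1250")
roundtrip = recoded.decode("cp1250")
```

Here misread is the garbled "1Ä….txt" from the quoted output, while
roundtrip is again the original name.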

> Here is output from linux:
>
> [janusz@mikrus JavaCommon]$ git config --add core.quotepath false
> [janusz@mikrus JavaCommon]$ git diff  --unified=3 -- "1ą.txt"
> warning: LF will be replaced by CRLF in 1.txt.
> The file will have its original line endings in your working directory.

"" looks like less's escaping with missing LESSCHARSET setting.

Additionally, your Linux box seems to be set up with ISO-8859-2 system 
encoding. Git repositories created on this system will not be portable, 
i.e. using the same repository on other Linux systems, Git for Windows, 
Cygwin-git, or JGit/EGit will result in completely broken file names. The 
quasi-standard file name encoding in git repositories is UTF-8.

> Nothing in the manual says that core.quotepath affects only the
> header. But that is not the point. You don't know which part of
> git's output will be consumed by a machine. A warning message is
> addressed to a human, but it can be consumed by a program in the same
> way as all other messages and output data.

Error / warning messages may be localized, so they are particularly 
unsuitable for consumption by other programs. That's why many git commands 
have special switches to make their output machine readable (e.g. -z). 
Incidentally, 'git-log -z' also disables core.quotepath. So if you write a 
program that parses git output, and you're using the proper 'machine 
readable' version, you should never have to worry about quoted paths, 
irrespective of the core.quotepath setting.
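
As a rough sketch of what consuming the machine readable form can look like:
with -z, entries are terminated by NUL bytes, so a parser only has to split
on NUL and decode each entry. The helper names split_nul and
list_tracked_files are hypothetical, 'git ls-files -z' is used only as one
example of a command that accepts -z, and decoding as UTF-8 is an assumption
about the repository's file name encoding.

```python
import subprocess


def split_nul(data, encoding="utf-8"):
    """Split NUL-terminated output into a list of decoded entries."""
    return [chunk.decode(encoding) for chunk in data.split(b"\0") if chunk]


def list_tracked_files(repo_dir):
    """Return tracked paths of the repository at repo_dir (needs git)."""
    result = subprocess.run(
        ["git", "ls-files", "-z"],
        cwd=repo_dir,
        check=True,
        stdout=subprocess.PIPE,
    )
    return split_nul(result.stdout)
```

Because the entries are never quoted in this form, no unquoting step is
needed; only the choice of encoding remains.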

> IMHO, since the warning comes from git, the path should be quoted to
> make git's behaviour consistent.
> From git-log help:
> > Note that we deliberately chose not to re-code the commit log
> > message when a commit is made to force UTF-8 at the commit object
> > level, because re-coding to UTF-8 is not necessarily a reversible
> > operation.
> 
> If re-coding from one encoding to another is not necessarily a
> reversible operation, and you can set logoutputencoding to any
> encoding you wish, you may lose some characters while recoding the
> file path in the warning message. Quoting it would be desirable then.
> 

The i18n.commitencoding and i18n.logoutputencoding settings only affect 
commit log messages. They are completely unrelated to error / warning 
messages, file names, or file name quoting.

Hope that helps,
Karsten


Re: Re: File path not escaped in warning message

2012-08-05 Thread jbialobr
Nothing in the manual says that core.quotepath affects only the header.
But that is not the point. You don't know which part of git's output will be
consumed by a machine. A warning message is addressed to a human, but it can
be consumed by a program in the same way as all other messages and output
data. IMHO, since the warning comes from git, the path should be quoted to
make git's behaviour consistent.
From git-log help:
> Note that we deliberately chose not to re-code the commit log message when a 
> commit is made to force UTF-8 at the commit object level, because re-coding 
> to UTF-8 is not necessarily a reversible operation.

If re-coding from one encoding to another is not necessarily a reversible
operation, and you can set logoutputencoding to any encoding you wish, you may
lose some characters while recoding the file path in the warning message.
Quoting it would be desirable then.

Janusz Białobrzewski.

 Original Message 
From: Junio C Hamano 
To: Janusz Białobrzewski 
Cc: msys...@googlegroups.com,  git@vger.kernel.org
Date: 5 August 2012 21:48
Subject: Re: File path not escaped in warning message

> Janusz Białobrzewski  writes:
> 
> > Here is output from linux:
> >
> > [janusz@mikrus JavaCommon]$ git config --add core.quotepath false
> > [janusz@mikrus JavaCommon]$ git diff  --unified=3 -- "1ą.txt"
> > warning: LF will be replaced by CRLF in 1.txt.
> > The file will have its original line endings in your working directory.
> 
> I do not know offhand if the literal  is the byte value you want
> or not, but core.quotepath should not affect it.
> 
> The configuration is primarily about quoting paths that appear in
> the header part in the diff output for machine readability.  In this
> output,
> 
> > diff --git a/1.txt b/1.txt
> > index 281ad6f..9444a66 100644
> > --- a/1.txt
> > +++ b/1.txt
> 
> the paths are not quoted because quotepath is set to false, but in
> the next example, it
> 
> > ...
> > [janusz@mikrus JavaCommon]$ git config --unset core.quotepath
> > [janusz@mikrus JavaCommon]$ git config --add core.quotepath true
> > [janusz@mikrus JavaCommon]$ git diff  --unified=3 -- "1ą.txt"
> > warning: LF will be replaced by CRLF in 1.txt.
> > The file will have its original line endings in your working directory.
> > diff --git "a/1\261.txt" "b/1\261.txt"
> > index 281ad6f..9444a66 100644
> > --- "a/1\261.txt"
> > +++ "b/1\261.txt"
> 
> is quoted due to the configuration setting.
> 
> Again,  in the warning message is not affected, as the quotepath
> configuration is not meant to affect messages that are meant for
> human consumption.
