Re: [PATCH 2/3] hooks/post-receive-email: force log messages in UTF-8
On Sun, Aug 04, 2013 at 11:14:40AM -0700, Jonathan Nieder wrote:
> Alexey Shumkin wrote:
> > On Fri, Aug 02, 2013 at 04:23:38PM -0700, Jonathan Nieder wrote:
>
> > >  1. Log messages use the configured log output encoding, which is
> > >     meant to be whatever encoding works best with local terminals
> > >     (and does not have much to do with what encoding should be used
> > >     for email)
> > >
> > >  2. Filenames are left as is: on Linux, usually UTF-8, and in the
> > >     Mingw port (which uses Unicode filesystem APIs), always UTF-8
> >
> > I cannot say exactly if it makes sense for THIS patch, but I'd like to
> > remind about Cygwin port, which definitely does not use UTF-8 encoding
> > (in my case it is Windows-1251) for filenames.
> >
> > >  3. The "This is an automated email" preface uses a project
> > >     description from .git/description, which is typically in UTF-8
> > >     to support gitweb.
>
> Thanks for clarifying.  So in the context you describe, (1) is
> configurable, (2) is Windows-1251, (3) is unconfigurably UTF-8, and
> there is no way with current git facilities to force the email to use
> a single encoding unless (3) happens to contain no special characters.
>
> What is the value of the [i18n] commitEncoding setting in your
> project?

commitEncoding is equal to filenames' encoding, Windows-1251, of course.

> What encoding do the raw commit messages (shown with
> "git log --format=raw") use for their text, and what do they declare
> with an in-commit 'encoding' header, if any?

Well, despite what `git log --help` says
--8<--
raw
    The raw format shows the entire commit exactly as stored in the
    commit object
-->8--
on a Linux box (UTF-8) I can see readable commit messages even though
they are stored in 'Windows-1251' (so they are converted to UTF-8).
To be sure, I've checked their actual content with `git cat-file commit`.
To be honest, I usually use a modified version of Git (see
ecaee8050cec23eb4cf082512e907e3e52c20b57) in the 'next' branch, which
could affect the results, so I've also checked `git log --format=raw`
with unmodified Git v1.8.3.3.

But let's go back to the answer to your question. The commit encoding
stored as a header in the raw commit messages is 'Windows-1251'.

> Does everyone on this project use Cygwin?

This is a closed (commercial) project and every developer uses Cygwin,
except me. I use a Linux box as a desktop (mail, IM, web browsing;
development itself happens on Cygwin). Sometimes I run utility scripts
included in that project on my desktop (since Linux works with files
much faster than Cygwin does ;))
Also, the Git server is a coLinux box (http://www.colinux.org/) on
Windows Server 2003, but I guess that does not matter much here.

> That should be fine, but I'd expect there to be problems as soon as
> someone wants to try the Mingw port (Git for Windows).

Yep, one of our developers tried to use a modern version of TortoiseGit
with the MinGW port of Git. That was a failure, because since v1.7.9 the
MinGW port transcodes filenames to store them internally in UTF-8.
This problem could be solved by converting those non-ASCII filenames to
UTF-8 once, but I do not want to use the MinGW port. I like the Cygwin
infrastructure, which is more Linux-like than MinGW.

> I wonder if there should be an [i18n] repositoryPathEncoding
> configuration item to support this kind of repository.  Then git could
> be aware of the intended encoding of paths, could recode them for
> display to a terminal, and at least on Linux and Mingw could recode
> them for use in filenames on disk.  "repositoryPathEncoding = none"
> would mean the current behavior of treating paths as raw sequences of
> bytes.

I'd be happy if such a setting existed. That could solve many problems
with cross-platform projects with non-ASCII filenames.
Indeed, the MinGW port does resolve that problem somehow!

> What do you think?
> Jonathan

-- 
Alexey Shumkin
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
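The check discussed in this message (reading the in-commit 'encoding' header from the output of `git cat-file commit`) could be scripted roughly as follows. This is only a sketch: `commit_encoding` is a hypothetical helper name, not part of git or of the hook, and it assumes that a commit without an 'encoding' header is in UTF-8, which is git's documented default.

```shell
#!/bin/sh
# Hypothetical helper (not part of git): read a raw commit object on stdin
# and print the value of its 'encoding' header. Prints UTF-8 when the
# header is absent, since git treats such commits as UTF-8.
commit_encoding() {
	awk '
		/^$/ { exit }	# headers end at the first blank line
		$1 == "encoding" { print $2; found = 1; exit }
		END { if (!found) print "UTF-8" }
	'
}
```

It would be used as `git cat-file commit HEAD | commit_encoding`; for the repository described here that should print Windows-1251.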
Re: [PATCH 2/3] hooks/post-receive-email: force log messages in UTF-8
On Fri, Aug 02, 2013 at 04:23:38PM -0700, Jonathan Nieder wrote:
> Git commands write commit messages in UTF-8 by default, but that
> default can be overridden by the [i18n] commitEncoding and
> logOutputEncoding settings.  With such a setting, the emails written
> by the post-receive-email hook use a mixture of encodings:
>
>  1. Log messages use the configured log output encoding, which is
>     meant to be whatever encoding works best with local terminals
>     (and does not have much to do with what encoding should be used
>     for email)
>
>  2. Filenames are left as is: on Linux, usually UTF-8, and in the
>     Mingw port (which uses Unicode filesystem APIs), always UTF-8

I cannot say exactly if it makes sense for THIS patch, but I'd like to
remind about Cygwin port, which definitely does not use UTF-8 encoding
(in my case it is Windows-1251) for filenames.

>  3. The "This is an automated email" preface uses a project
>     description from .git/description, which is typically in UTF-8
>     to support gitweb.
>
> So (1) is configurable, and (2) and (3) are unconfigurable and
> typically UTF-8.  Override the log output encoding to always use UTF-8
> when writing the email to get the best chance of a comprehensible
> single-encoding email.

I cannot agree to receive e-mails in UTF-8 only for Windows projects
which have non-UTF-8 encoding.
I want to see and read a correctly formed e-mail without any corrupted
symbols in place of filenames (that is the main problem here, since
filenames, unlike log messages, are not converted).

> Signed-off-by: Jonathan Nieder <jrnie...@gmail.com>
> ---
>  contrib/hooks/post-receive-email | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/contrib/hooks/post-receive-email b/contrib/hooks/post-receive-email
> index 72084511..ba93a0d8 100755
> --- a/contrib/hooks/post-receive-email
> +++ b/contrib/hooks/post-receive-email
> @@ -471,7 +471,7 @@ generate_delete_branch_email()
>  	echo "       was  $oldrev"
>  	echo ""
>  	echo $LOGBEGIN
> -	git diff-tree -s --always --pretty=oneline $oldrev
> +	git diff-tree -s --always --encoding=UTF-8 --pretty=oneline $oldrev
>  	echo $LOGEND
>  }
>
> @@ -571,7 +571,7 @@ generate_delete_atag_email()
>  	echo "       was  $oldrev"
>  	echo ""
>  	echo $LOGBEGIN
> -	git diff-tree -s --always --pretty=oneline $oldrev
> +	git diff-tree -s --always --encoding=UTF-8 --pretty=oneline $oldrev
>  	echo $LOGEND
>  }
>
> @@ -617,7 +617,7 @@ generate_general_email()
>  	echo ""
>  	if [ "$newrev_type" = "commit" ]; then
>  		echo $LOGBEGIN
> -		git diff-tree -s --always --pretty=medium $newrev
> +		git diff-tree -s --always --encoding=UTF-8 --pretty=medium $newrev
>  		echo $LOGEND
>  	else
>  		# What can we do here?  The tag marks an object that is not
> @@ -636,7 +636,7 @@ generate_delete_general_email()
>  	echo "       was  $oldrev"
>  	echo ""
>  	echo $LOGBEGIN
> -	git diff-tree -s --always --pretty=oneline $oldrev
> +	git diff-tree -s --always --encoding=UTF-8 --pretty=oneline $oldrev
>  	echo $LOGEND
>  }
>
> --
> 1.8.4.rc1

-- 
Alexey Shumkin
Re: [PATCH 2/3] hooks/post-receive-email: force log messages in UTF-8
Alexey Shumkin wrote:
> On Fri, Aug 02, 2013 at 04:23:38PM -0700, Jonathan Nieder wrote:

> >  1. Log messages use the configured log output encoding, which is
> >     meant to be whatever encoding works best with local terminals
> >     (and does not have much to do with what encoding should be used
> >     for email)
> >
> >  2. Filenames are left as is: on Linux, usually UTF-8, and in the
> >     Mingw port (which uses Unicode filesystem APIs), always UTF-8
>
> I cannot say exactly if it makes sense for THIS patch, but I'd like to
> remind about Cygwin port, which definitely does not use UTF-8 encoding
> (in my case it is Windows-1251) for filenames.
>
> >  3. The "This is an automated email" preface uses a project
> >     description from .git/description, which is typically in UTF-8
> >     to support gitweb.

Thanks for clarifying.  So in the context you describe, (1) is
configurable, (2) is Windows-1251, (3) is unconfigurably UTF-8, and
there is no way with current git facilities to force the email to use
a single encoding unless (3) happens to contain no special characters.

What is the value of the [i18n] commitEncoding setting in your
project?  What encoding do the raw commit messages (shown with
"git log --format=raw") use for their text, and what do they declare
with an in-commit 'encoding' header, if any?

Does everyone on this project use Cygwin?  That should be fine, but
I'd expect there to be problems as soon as someone wants to try the
Mingw port (Git for Windows).

I wonder if there should be an [i18n] repositoryPathEncoding
configuration item to support this kind of repository.  Then git could
be aware of the intended encoding of paths, could recode them for
display to a terminal, and at least on Linux and Mingw could recode
them for use in filenames on disk.  "repositoryPathEncoding = none"
would mean the current behavior of treating paths as raw sequences of
bytes.

What do you think?

Jonathan
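To make the repositoryPathEncoding idea concrete, here is a rough sketch of the "recode for display" step done outside git with iconv(1). The setting is only a proposal in this thread and does not exist in git; `recode_path` is likewise a hypothetical name, and the sketch assumes the local iconv knows the given encoding (for example WINDOWS-1251).

```shell
#!/bin/sh
# Hypothetical helper: recode path bytes read from stdin to UTF-8.
# $1 is the encoding the repository's paths are assumed to use; the
# special value "none" passes the bytes through untouched, mirroring
# the proposed "repositoryPathEncoding = none".
recode_path() {
	case "$1" in
	none)
		cat
		;;
	*)
		iconv -f "$1" -t UTF-8
		;;
	esac
}
```

For example, `git diff-tree -r --name-only HEAD | recode_path WINDOWS-1251` would attempt the conversion on a list of changed paths. Note that git quotes non-ASCII paths in such output unless core.quotePath is set to false, so that setting would need to be changed first for the recoding to see the raw bytes.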
[PATCH 2/3] hooks/post-receive-email: force log messages in UTF-8
Git commands write commit messages in UTF-8 by default, but that
default can be overridden by the [i18n] commitEncoding and
logOutputEncoding settings.  With such a setting, the emails written
by the post-receive-email hook use a mixture of encodings:

 1. Log messages use the configured log output encoding, which is
    meant to be whatever encoding works best with local terminals
    (and does not have much to do with what encoding should be used
    for email)

 2. Filenames are left as is: on Linux, usually UTF-8, and in the
    Mingw port (which uses Unicode filesystem APIs), always UTF-8

 3. The "This is an automated email" preface uses a project
    description from .git/description, which is typically in UTF-8
    to support gitweb.

So (1) is configurable, and (2) and (3) are unconfigurable and
typically UTF-8.  Override the log output encoding to always use UTF-8
when writing the email to get the best chance of a comprehensible
single-encoding email.

Signed-off-by: Jonathan Nieder <jrnie...@gmail.com>
---
 contrib/hooks/post-receive-email | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/contrib/hooks/post-receive-email b/contrib/hooks/post-receive-email
index 72084511..ba93a0d8 100755
--- a/contrib/hooks/post-receive-email
+++ b/contrib/hooks/post-receive-email
@@ -471,7 +471,7 @@ generate_delete_branch_email()
 	echo "       was  $oldrev"
 	echo ""
 	echo $LOGBEGIN
-	git diff-tree -s --always --pretty=oneline $oldrev
+	git diff-tree -s --always --encoding=UTF-8 --pretty=oneline $oldrev
 	echo $LOGEND
 }
 
@@ -571,7 +571,7 @@ generate_delete_atag_email()
 	echo "       was  $oldrev"
 	echo ""
 	echo $LOGBEGIN
-	git diff-tree -s --always --pretty=oneline $oldrev
+	git diff-tree -s --always --encoding=UTF-8 --pretty=oneline $oldrev
 	echo $LOGEND
 }
 
@@ -617,7 +617,7 @@ generate_general_email()
 	echo ""
 	if [ "$newrev_type" = "commit" ]; then
 		echo $LOGBEGIN
-		git diff-tree -s --always --pretty=medium $newrev
+		git diff-tree -s --always --encoding=UTF-8 --pretty=medium $newrev
 		echo $LOGEND
 	else
 		# What can we do here?  The tag marks an object that is not
@@ -636,7 +636,7 @@ generate_delete_general_email()
 	echo "       was  $oldrev"
 	echo ""
 	echo $LOGBEGIN
-	git diff-tree -s --always --pretty=oneline $oldrev
+	git diff-tree -s --always --encoding=UTF-8 --pretty=oneline $oldrev
 	echo $LOGEND
 }
 
-- 
1.8.4.rc1
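The effect of the added --encoding=UTF-8 option can be tried in a throwaway repository. The script below is a sketch, not part of the patch: `demo_forced_utf8`, the author identity, and the file name are made up for illustration, and it assumes a git built with iconv support and an iconv that knows Windows-1251.

```shell
#!/bin/sh
# Sketch: create a scratch repository whose commit message is stored in
# Windows-1251, then print the log line the way the patched hook does.
demo_forced_utf8() {
	repo=$(mktemp -d) || return 1
	(
		cd "$repo" &&
		git init -q . &&
		git config user.name "Example Author" &&
		git config user.email author@example.com &&
		git config commit.gpgsign false &&
		# Store commit messages in Windows-1251 and prefer it for output too.
		git config i18n.commitEncoding Windows-1251 &&
		git config i18n.logOutputEncoding Windows-1251 &&
		# Subject: the Russian word for "file" as Windows-1251 bytes.
		printf '\364\340\351\353\n' >msg.txt &&
		git commit -q --allow-empty -F msg.txt &&
		# --encoding=UTF-8 overrides i18n.logOutputEncoding for this command.
		git diff-tree -s --always --encoding=UTF-8 --pretty=oneline HEAD
	)
	status=$?
	rm -rf "$repo"
	return $status
}
```

Running `demo_forced_utf8` should print the commit ID followed by the subject in UTF-8; dropping --encoding=UTF-8 from the last command would instead emit the subject as Windows-1251 bytes, because of the logOutputEncoding setting.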