Re: [PATCH 2/3] hooks/post-receive-email: force log messages in UTF-8

2013-08-05 Thread Alexey Shumkin
On Sun, Aug 04, 2013 at 11:14:40AM -0700, Jonathan Nieder wrote:
 Alexey Shumkin wrote:
  On Fri, Aug 02, 2013 at 04:23:38PM -0700, Jonathan Nieder wrote:
 
   1. Log messages use the configured log output encoding, which is
  meant to be whatever encoding works best with local terminals
  (and does not have much to do with what encoding should be used
  for email)
 
   2. Filenames are left as is: on Linux, usually UTF-8, and in the Mingw
  port (which uses Unicode filesystem APIs), always UTF-8
 
  I cannot say exactly if it makes sense for THIS patch, but I'd like to
  remind you about the Cygwin port, which definitely does not use UTF-8
  encoding (in my case it is Windows-1251) for filenames.
 
  
   3. The "This is an automated email" preface uses a project description
  from .git/description, which is typically in UTF-8 to support
  gitweb.
 
 Thanks for clarifying.  So in the context you describe, (1) is
 configurable, (2) is Windows-1251, (3) is unconfigurably UTF-8, and
 there is no way with current git facilities to force the email to use
 a single encoding unless (3) happens to contain no special characters.
 
 What is the value of the [i18n] commitEncoding setting in your
 project?
commitEncoding is equal to filenames' encoding, Windows-1251, of course.

 What encoding do the raw commit messages (shown with
 git log --format=raw) use for their text, and what do they declare
 with an in-commit 'encoding' header, if any?
Well, despite what `git log --help` says:
--8--
raw
   The raw format shows the entire commit exactly as stored in
   the commit object
--8--
on a Linux box (UTF-8) I see readable commit messages even though they
are stored in 'Windows-1251' (so they are converted to UTF-8). To be
sure, I checked their actual content with `git cat-file commit`.
To be honest, I usually use a modified version of Git (see
ecaee8050cec23eb4cf082512e907e3e52c20b57 in the 'next' branch), which
could affect the results, so I also checked `git log --format=raw` with
an unmodified Git v1.8.3.3.

But let's go back to your question. The commit encoding stored as a
header in the raw commit messages is 'Windows-1251'.
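
For anyone who wants to repeat this check, a minimal sketch follows. It
assumes a scratch repository created only for the test; the directory,
the identity, and the sample message (the Russian word "Привет" written
as Windows-1251 bytes) are made up for illustration and are not taken
from the project discussed here.

```shell
#!/bin/sh
# Sketch only: a throwaway repository, not the project from this thread.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.name "Example Author"       # illustrative identity
git config user.email author@example.com
git config i18n.commitEncoding Windows-1251

# "Привет" as Windows-1251 bytes (octal escapes); this is not valid UTF-8.
msg=$(printf '\317\360\350\342\345\362')
git commit -q --allow-empty -m "$msg"

# The stored commit object carries an explicit "encoding" header.
header=$(git cat-file commit HEAD | grep '^encoding ')
echo "$header"

# --encoding=UTF-8 asks git to recode the message on output,
# whatever i18n.logOutputEncoding says.
line=$(git diff-tree -s --always --encoding=UTF-8 --pretty=oneline HEAD)
echo "$line"
```

`git cat-file commit` prints the object as stored, so it is the more
reliable way to see which bytes and which header the commit really has.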
 
 Does everyone on this project use Cygwin?
This is a closed (commercial) project and every developer uses Cygwin,
except me. I use a Linux box as a desktop (mail, IM, web browsing), but
development happens on Cygwin. Sometimes I run utility scripts included
in that project on my desktop, since Linux works with files much faster
than Cygwin does ;)
Also, the Git server is a coLinux box (http://www.colinux.org/) on
Windows Server 2003, but I guess that does not matter much here.
  That should be fine, but
 I'd expect there to be problems as soon as someone wants to try the
 Mingw port (Git for Windows).
Yep, one of our developers tried to use a modern version of TortoiseGit
with the MinGW port of Git. That was a failure, because since v1.7.9 the
MinGW port transcodes filenames to store them internally in UTF-8. This
problem could be solved by converting those non-ASCII filenames to
UTF-8 once, but I do not want to use the MinGW port. I like the Cygwin
infrastructure, which is more Linux-like than MinGW.
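
A rough sketch of what such a one-time conversion could look like is
given here. It is a dry run that only prints the planned renames; the
function names and the sample filename are hypothetical, and it assumes
an iconv that knows the CP1251 code page name.

```shell
#!/bin/sh
# Sketch only: print the UTF-8 spelling of a name stored as Windows-1251.
to_utf8_name() {
    printf '%s' "$1" | iconv -f CP1251 -t UTF-8
}

# Dry run: list the renames a one-time conversion would perform.
# Pure-ASCII names come out unchanged and are skipped.
plan_renames() {
    for old in "$@"; do
        new=$(to_utf8_name "$old")
        [ "$old" = "$new" ] || printf '%s -> %s\n' "$old" "$new"
    done
}

# Example input: "файл.txt" as Windows-1251 bytes, plus an ASCII name.
legacy=$(printf '\364\340\351\353.txt')
plan_renames "$legacy" README
```

Keeping the step a dry run makes it possible to review the list before
anything is renamed in a working tree.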
 
 I wonder if there should be an [i18n] repositoryPathEncoding
 configuration item to support this kind of repository.  Then git could
 be aware of the intended encoding of paths, could recode them for
 display to a terminal, and at least on Linux and Mingw could recode
 them for use in filenames on disk.  repositoryPathEncoding = none
 would mean the current behavior of treating paths as raw sequences of
 bytes.
I'd be happy if such a setting existed. It could solve many problems
in cross-platform projects with non-ASCII filenames.
Indeed, the MinGW port does resolve that problem somehow!
 
 What do you think?
 Jonathan

-- 
Alexey Shumkin
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] hooks/post-receive-email: force log messages in UTF-8

2013-08-04 Thread Alexey Shumkin
On Fri, Aug 02, 2013 at 04:23:38PM -0700, Jonathan Nieder wrote:
 Git commands write commit messages in UTF-8 by default, but that
 default can be overridden by the [i18n] commitEncoding and
 logOutputEncoding settings.  With such a setting, the emails written
 by the post-receive-email hook use a mixture of encodings:
 
  1. Log messages use the configured log output encoding, which is
 meant to be whatever encoding works best with local terminals
 (and does not have much to do with what encoding should be used
 for email)
 
  2. Filenames are left as is: on Linux, usually UTF-8, and in the Mingw
 port (which uses Unicode filesystem APIs), always UTF-8
I cannot say exactly if it makes sense for THIS patch, but I'd like to
remind you about the Cygwin port, which definitely does not use UTF-8
encoding (in my case it is Windows-1251) for filenames.
 
  3. The "This is an automated email" preface uses a project description
 from .git/description, which is typically in UTF-8 to support
 gitweb.
 
 So (1) is configurable, and (2) and (3) are unconfigurable and
 typically UTF-8.  Override the log output encoding to always use UTF-8
 when writing the email to get the best chance of a comprehensible
 single-encoding email.
I cannot agree to receiving e-mails only in UTF-8 for Windows projects
that use a non-UTF-8 encoding. I want to read correctly formed e-mails
without corrupted symbols in place of filenames (that is the main
problem here, since filenames, unlike log messages, are not converted).
 
 Signed-off-by: Jonathan Nieder jrnie...@gmail.com
 ---
  contrib/hooks/post-receive-email | 8 ++++----
  1 file changed, 4 insertions(+), 4 deletions(-)
 
 diff --git a/contrib/hooks/post-receive-email b/contrib/hooks/post-receive-email
 index 72084511..ba93a0d8 100755
 --- a/contrib/hooks/post-receive-email
 +++ b/contrib/hooks/post-receive-email
 @@ -471,7 +471,7 @@ generate_delete_branch_email()
     echo "       was  $oldrev"
     echo ""
     echo $LOGBEGIN
 -   git diff-tree -s --always --pretty=oneline $oldrev
 +   git diff-tree -s --always --encoding=UTF-8 --pretty=oneline $oldrev
     echo $LOGEND
  }
  
 @@ -571,7 +571,7 @@ generate_delete_atag_email()
     echo "       was  $oldrev"
     echo ""
     echo $LOGBEGIN
 -   git diff-tree -s --always --pretty=oneline $oldrev
 +   git diff-tree -s --always --encoding=UTF-8 --pretty=oneline $oldrev
     echo $LOGEND
  }
  
 @@ -617,7 +617,7 @@ generate_general_email()
     echo ""
     if [ "$newrev_type" = "commit" ]; then
         echo $LOGBEGIN
 -       git diff-tree -s --always --pretty=medium $newrev
 +       git diff-tree -s --always --encoding=UTF-8 --pretty=medium $newrev
         echo $LOGEND
     else
         # What can we do here?  The tag marks an object that is not
 @@ -636,7 +636,7 @@ generate_delete_general_email()
     echo "       was  $oldrev"
     echo ""
     echo $LOGBEGIN
 -   git diff-tree -s --always --pretty=oneline $oldrev
 +   git diff-tree -s --always --encoding=UTF-8 --pretty=oneline $oldrev
     echo $LOGEND
  }
  
 -- 
 1.8.4.rc1
 

-- 
Alexey Shumkin


Re: [PATCH 2/3] hooks/post-receive-email: force log messages in UTF-8

2013-08-04 Thread Jonathan Nieder
Alexey Shumkin wrote:
 On Fri, Aug 02, 2013 at 04:23:38PM -0700, Jonathan Nieder wrote:

  1. Log messages use the configured log output encoding, which is
 meant to be whatever encoding works best with local terminals
 (and does not have much to do with what encoding should be used
 for email)

  2. Filenames are left as is: on Linux, usually UTF-8, and in the Mingw
 port (which uses Unicode filesystem APIs), always UTF-8

 I cannot say exactly if it makes sense for THIS patch, but I'd like to
 remind you about the Cygwin port, which definitely does not use UTF-8
 encoding (in my case it is Windows-1251) for filenames.

 
  3. The "This is an automated email" preface uses a project description
 from .git/description, which is typically in UTF-8 to support
 gitweb.

Thanks for clarifying.  So in the context you describe, (1) is
configurable, (2) is Windows-1251, (3) is unconfigurably UTF-8, and
there is no way with current git facilities to force the email to use
a single encoding unless (3) happens to contain no special characters.
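
The mismatch between (2) and (3) can be illustrated with a small
sketch. It assumes that iconv exits with a non-zero status on an
invalid input sequence; the function name and both sample strings are
invented for the example.

```shell
#!/bin/sh
# Sketch only: succeed if stdin is well-formed UTF-8.
# Relies on iconv exiting non-zero when it meets an invalid input sequence.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# (3) a UTF-8 project description and (2) a Windows-1251 filename
# ("файл.txt" written as Windows-1251 bytes).
description='Проект'
filename=$(printf '\364\340\351\353.txt')

if printf '%s\n' "$description" | is_utf8; then
    echo "description alone: valid UTF-8"
fi
if ! printf '%s\n%s\n' "$description" "$filename" | is_utf8; then
    echo "description + filename: not a single encoding"
fi
```

A mail body built from both pieces cannot be labelled with one charset
unless one of them is recoded first.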

What is the value of the [i18n] commitEncoding setting in your
project?  What encoding do the raw commit messages (shown with
git log --format=raw) use for their text, and what do they declare
with an in-commit 'encoding' header, if any?

Does everyone on this project use Cygwin?  That should be fine, but
I'd expect there to be problems as soon as someone wants to try the
Mingw port (Git for Windows).

I wonder if there should be an [i18n] repositoryPathEncoding
configuration item to support this kind of repository.  Then git could
be aware of the intended encoding of paths, could recode them for
display to a terminal, and at least on Linux and Mingw could recode
them for use in filenames on disk.  repositoryPathEncoding = none
would mean the current behavior of treating paths as raw sequences of
bytes.
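
To make the idea concrete, here is a sketch of the display half of the
proposal. repositoryPathEncoding is only proposed in this message and is
not an existing git setting, so the value is passed in as a plain
argument; the function name, the sample path, and the UTF-8 terminal are
assumptions for illustration.

```shell
#!/bin/sh
# Sketch only. The encoding argument stands in for the proposed
# repositoryPathEncoding value; git itself is not involved here.
# Usage: display_path <path-encoding> <path>
display_path() {
    case "$1" in
        none)
            # Current behavior: paths are raw sequences of bytes.
            printf '%s\n' "$2"
            ;;
        *)
            # Recode from the declared path encoding for a UTF-8 terminal.
            printf '%s\n' "$2" | iconv -f "$1" -t UTF-8
            ;;
    esac
}

# "docs/файл.txt" with the filename stored as Windows-1251 bytes.
path=$(printf 'docs/\364\340\351\353.txt')
display_path CP1251 "$path"
display_path none "$path"
```

Recoding for use in filenames on disk would need the same lookup in the
opposite direction, which this sketch does not attempt.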

What do you think?
Jonathan


[PATCH 2/3] hooks/post-receive-email: force log messages in UTF-8

2013-08-02 Thread Jonathan Nieder
Git commands write commit messages in UTF-8 by default, but that
default can be overridden by the [i18n] commitEncoding and
logOutputEncoding settings.  With such a setting, the emails written
by the post-receive-email hook use a mixture of encodings:

 1. Log messages use the configured log output encoding, which is
meant to be whatever encoding works best with local terminals
(and does not have much to do with what encoding should be used
for email)

 2. Filenames are left as is: on Linux, usually UTF-8, and in the Mingw
port (which uses Unicode filesystem APIs), always UTF-8

 3. The "This is an automated email" preface uses a project description
from .git/description, which is typically in UTF-8 to support
gitweb.

So (1) is configurable, and (2) and (3) are unconfigurable and
typically UTF-8.  Override the log output encoding to always use UTF-8
when writing the email to get the best chance of a comprehensible
single-encoding email.

Signed-off-by: Jonathan Nieder jrnie...@gmail.com
---
 contrib/hooks/post-receive-email | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/contrib/hooks/post-receive-email b/contrib/hooks/post-receive-email
index 72084511..ba93a0d8 100755
--- a/contrib/hooks/post-receive-email
+++ b/contrib/hooks/post-receive-email
@@ -471,7 +471,7 @@ generate_delete_branch_email()
    echo "       was  $oldrev"
    echo ""
    echo $LOGBEGIN
-   git diff-tree -s --always --pretty=oneline $oldrev
+   git diff-tree -s --always --encoding=UTF-8 --pretty=oneline $oldrev
    echo $LOGEND
 }
 
@@ -571,7 +571,7 @@ generate_delete_atag_email()
    echo "       was  $oldrev"
    echo ""
    echo $LOGBEGIN
-   git diff-tree -s --always --pretty=oneline $oldrev
+   git diff-tree -s --always --encoding=UTF-8 --pretty=oneline $oldrev
    echo $LOGEND
 }
 
@@ -617,7 +617,7 @@ generate_general_email()
    echo ""
    if [ "$newrev_type" = "commit" ]; then
        echo $LOGBEGIN
-       git diff-tree -s --always --pretty=medium $newrev
+       git diff-tree -s --always --encoding=UTF-8 --pretty=medium $newrev
        echo $LOGEND
    else
        # What can we do here?  The tag marks an object that is not
@@ -636,7 +636,7 @@ generate_delete_general_email()
    echo "       was  $oldrev"
    echo ""
    echo $LOGBEGIN
-   git diff-tree -s --always --pretty=oneline $oldrev
+   git diff-tree -s --always --encoding=UTF-8 --pretty=oneline $oldrev
    echo $LOGEND
 }
 
-- 
1.8.4.rc1
