Re: [PATCH] Fixed two bugs in git-cvsimport-script.

2005-08-17 Thread David Kågedal
Junio C Hamano <[EMAIL PROTECTED]> writes:

> Junio C Hamano <[EMAIL PROTECTED]> writes:
>
>> Yes, the patch had some context conflicts with some other patch
>> so the patch application was done by hand, and I did a quick and
>> dirty cut & paste of the commit message from "cat mbox" output.
>>
>> I will probably drop future patches encoded in QP.
>
> This was totally inappropriate; sorry, but I was in a bad mood.
>
> A more serious response.
>
>  - I personally consider commit message encoding a per-project
>    issue (as is blob contents encoding).

Agreed.  And your response is probably good enough for now.  I also
think that having UTF-8 as the standard convention is the way to go.

-- 
David Kågedal

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Fixed two bugs in git-cvsimport-script.

2005-08-17 Thread David Kågedal
Junio C Hamano <[EMAIL PROTECTED]> writes:

>> Apparently, my mail was encoded as QP, which is not very popular
>> around here.  But it seems that the diff part was decoded properly
>> before applying.  Was that done manually?
>
> Yes, the patch had some context conflicts with some other patch
> so the patch application was done by hand, and I did a quick and
> dirty cut & paste of the commit message from "cat mbox" output.
>
> I will probably drop future patches encoded in QP.

Ok, but do you have an answer to my real question?  What is the
character encoding for commit objects in your git repository?

It is obviously one that is compatible with ASCII, which probably
leaves you with the alternatives ASCII, Latin1, and UTF-8.  And plain
ASCII obviously doesn't work very well for me.
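To make "compatible with ASCII" concrete: all three candidates agree on every byte below 0x80 and differ only for characters such as "å". The Python 3 sketch below is only an illustration; the helper name `encode_or_none` is hypothetical and nothing in git uses it. It prints the bytes each encoding produces for the name in question. Plain ASCII cannot represent the name at all.

```python
from typing import Optional

NAME = "David Kågedal"


def encode_or_none(text: str, encoding: str) -> Optional[bytes]:
    """Encode `text`, or return None if `encoding` cannot represent it."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


for enc in ("ascii", "latin-1", "utf-8"):
    print(f"{enc:8} {encode_or_none(NAME, enc)!r}")
# ascii    None
# latin-1  b'David K\xe5gedal'
# utf-8    b'David K\xc3\xa5gedal'
```

Both 8-bit results begin with the same ASCII bytes, so only the non-ASCII characters reveal which encoding a commit message actually uses.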

-- 
David Kågedal



Re: [PATCH] Fixed two bugs in git-cvsimport-script.

2005-08-17 Thread Junio C Hamano
Junio C Hamano <[EMAIL PROTECTED]> writes:

> Yes, the patch had some context conflicts with some other patch
> so the patch application was done by hand, and I did a quick and
> dirty cut & paste of the commit message from "cat mbox" output.
>
> I will probably drop future patches encoded in QP.

This was totally inappropriate; sorry, but I was in a bad mood.

A more serious response.

 - I personally consider commit message encoding a per-project
   issue (as is blob contents encoding).  If, for example, a
   project is a Japanese-only MS-DOS^WWindows programming
   project, I do not think it is unreasonable for all the commit
   messages and source files to be encoded in Shift-JIS.  More
   Unixy projects over there might use EUC-JP in source files
   and perhaps ISO-2022-JP in the log messages (because the
   latter is the standard way to exchange e-mail there).  As long
   as project participants agree to use the same encodings, things
   should work just fine within a project.

 - However, weird people are known to merge projects that
   started out as totally separate into one.  When that happens,
   the commit log viewer is faced with a mix of incompatible
   encodings and cannot display all of them correctly.  For this
   reason, some people recommend using a common-denominator
   encoding, namely UTF-8, for commit logs from day one, even if
   you do not envision that such a merge will ever happen.

   This recommendation also applies to author and committer
   identification (but not to blob contents).  But this is just a
   recommendation; it is still up to the individual project
   what encoding to use in the log messages, and the low-level
   GIT should neither dictate nor interfere: "git-commit-tree" and
   "git-cat-file commit" are 8-bit clean.

 - The e-mail patch acceptance machinery found in the tools/
   directory is tuned for the established patch exchange
   practice of the linux-kernel mailing list: no MIME, no QP,
   and no guarantee of passing anything outside ASCII.

 - Eventually, tools/mailinfo.c should be taught about MIME, to
   do at least the following:

   - detect a patch whose whitespace was corrupted by a sending
     MUA that uses flowed text, and reject it;

   - detect a multipart PGP-signed message, discard the attached
     signature (which is often useless), and unwrap the payload;

   - decode QP and B encodings as necessary and, after splitting
     the message into the info, msg and patch parts, transliterate
     the info and msg parts from the original encoding to UTF-8
     (perhaps only when a '--utf8' flag is given).

   One of the requirements there is that it still needs to be
   _fast_.  Linus needs to be able to make 5 commits per second
   out of his mailbox.
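A rough Python sketch of the MIME handling listed above, built on the standard `email` package. It is not tools/mailinfo.c and makes no attempt at the speed requirement; the helper names, the `RejectedPatch` exception, and the rule of splitting the body at the first "---" line are hypothetical simplifications. It rejects format=flowed bodies, keeps only the first part of a multipart/signed message, undoes QP or base64, and converts just the log-message part to UTF-8, leaving the patch bytes untouched.

```python
import email
from email.message import Message
from typing import Tuple


class RejectedPatch(Exception):
    """Hypothetical error for messages this sketch refuses to apply."""


def payload_part(msg: Message) -> Message:
    # multipart/signed (RFC 1847): part 0 is the content, part 1 the
    # detached signature, which this sketch simply discards.
    if msg.get_content_type() == "multipart/signed":
        return msg.get_payload(0)
    return msg


def split_mail(raw: bytes) -> Tuple[bytes, bytes]:
    """Return (log message re-encoded as UTF-8, patch bytes as received)."""
    part = payload_part(email.message_from_bytes(raw))

    # format=flowed lets the sending MUA rewrap lines and alter trailing
    # spaces, which corrupts diffs, so refuse such bodies outright.
    if str(part.get_param("format") or "").lower() == "flowed":
        raise RejectedPatch("format=flowed body; whitespace may be corrupted")

    body = part.get_payload(decode=True)  # undoes QP or base64
    if body is None:  # still multipart, e.g. multipart/mixed
        raise RejectedPatch("unsupported MIME structure")

    # Simplified, hypothetical split rule: the first "---" line
    # separates the log message from the patch.
    log, _sep, patch = body.partition(b"\n---\n")

    # Transliterate only the log message; the patch stays byte-for-byte.
    charset = part.get_content_charset() or "us-ascii"
    return log.decode(charset).encode("utf-8"), patch
```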

So much for the technical part of the response.  There is also a
project-policy part: I would endorse applying that UTF-8
recommendation to the git project itself, at least in principle.

But in practice that would happen only after the above mailinfo
update takes place.  Until then, I will very likely occasionally
fail to spot and hand-correct people's names that the patch
acceptance machinery passed through undecoded into the log
message.  Please live with it (or send patches for mailinfo ;-).
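A project that adopts the UTF-8 recommendation could check log messages mechanically. A minimal Python sketch, with a hypothetical function name, that reports where a message stops being valid UTF-8. Note that it cannot catch the failure seen in this thread: QP left undecoded, such as "K=E5gedal", is plain ASCII and therefore valid UTF-8.

```python
from typing import Optional


def first_invalid_utf8(log: bytes) -> Optional[int]:
    """Return the byte offset where `log` stops being valid UTF-8,
    or None if the whole message is valid UTF-8."""
    try:
        log.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


print(first_invalid_utf8("Signed-off-by: David Kågedal".encode("utf-8")))    # None
print(first_invalid_utf8("Signed-off-by: David Kågedal".encode("latin-1")))  # 22
print(first_invalid_utf8(b"Signed-off-by: David K=E5gedal"))                 # None
```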




Re: [PATCH] Fixed two bugs in git-cvsimport-script.

2005-08-17 Thread Junio C Hamano
> Apparently, my mail was encoded as QP, which is not very popular
> around here.  But it seems that the diff part was decoded properly
> before applying.  Was that done manually?

Yes, the patch had some context conflicts with some other patch
so the patch application was done by hand, and I did a quick and
dirty cut & paste of the commit message from "cat mbox" output.

I will probably drop future patches encoded in QP.





Re: [PATCH] Fixed two bugs in git-cvsimport-script.

2005-08-17 Thread David Kågedal
David Kågedal <[EMAIL PROTECTED]> writes:

> The git-cvsimport-script had a couple of small bugs that prevented me
> from importing a big CVS repository.
>
> The first was that it didn't handle removed files with a multi-digit
> primary revision number.

I noticed that this patch was accepted, which is great.  But there is
a problem with the character encoding in the commit message.

The commit in question is b0921331030d52febf52839753eee1b2b9ca1f24

The "author" field is written as "iso-8859-1?Q?David_K=E5gedal
<[EMAIL PROTECTED]>", which is taken from the "From:" line in my
email.  This should be decoded by the patch import script before
generating the commit message.
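The standard form of such a header is an RFC 2047 "encoded-word", for example `=?iso-8859-1?Q?David_K=E5gedal?=`. A minimal sketch of the decoding step, assuming Python 3 and its standard `email.header` module (not what the git import scripts use); the helper name is hypothetical and the address is a placeholder for the redacted one.

```python
from email.header import decode_header, make_header


def decode_rfc2047(value: str) -> str:
    """Decode Q- and B-encoded words (=?charset?Q?...?= / =?charset?B?...?=)
    in a header value into a plain Unicode string."""
    return str(make_header(decode_header(value)))


# The address is a placeholder, not the one from the original mail.
print(decode_rfc2047("=?iso-8859-1?Q?David_K=E5gedal?= <author@example.com>"))
# prints: David Kågedal <author@example.com>
```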

But the trickier question is what encoding to use in the commit
message.

This is the signed-off line in my mail:

  Signed-off-by: David Kågedal <[EMAIL PROTECTED]>

This is what appears in the commit:

  Signed-off-by: David K?5gedal <[EMAIL PROTECTED]>

Using ISO-8859-15 or UTF-8 would probably have made more sense here.

Apparently, my mail was encoded as QP, which is not very popular
around here.  But it seems that the diff part was decoded properly
before applying.  Was that done manually?

Since my name contains a character that is not part of ASCII, sending
the mails as plain ASCII isn't really an option.  I could probably
convince my mailer (Gnus) to send them as "8bit" or "binary", but that
is a pretty limited solution.  And it isn't even legal to use anything
but ASCII in the mail header.
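The usual way to carry a non-ASCII name in a header legally is the RFC 2047 encoded-word mechanism, which MUAs normally apply on their own. A minimal Python sketch using the standard `email.header.Header` class; the helper name `encode_display_name` is hypothetical.

```python
from email.header import Header, decode_header, make_header


def encode_display_name(name: str, charset: str = "utf-8") -> str:
    """Return `name` as an ASCII-only RFC 2047 encoded-word."""
    return Header(name, charset).encode()


latin1 = encode_display_name("David Kågedal", "iso-8859-1")
print(latin1)  # =?iso-8859-1?q?David_K=E5gedal?=

# Whatever charset is chosen, decoding gives the original name back.
utf8 = encode_display_name("David Kågedal")
assert str(make_header(decode_header(utf8))) == "David Kågedal"
```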

-- 
David Kågedal
