Re: [bug report] git-am applying maildir patches in reverse

2013-03-02 Thread Andreas Schwab
Jeff King p...@peff.net writes:

  static int maildir_filename_cmp(const char *a, const char *b)
  {
 - while (1) {
 + while (*a  *b) {
   if (isdigit(*a)  isdigit(*b)) {
   long int na, nb;
   na = strtol(a, (char **)a, 10);
 @@ -148,6 +148,7 @@ static int maildir_filename_cmp(const char *a, const char 
 *b)
   b++;
   }
   }
 + return *a - *b;

You should always cast to unsigned char when determining the order of
characters, to be consistent with strcmp/memcmp.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
And now for something completely different.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [bug report] git-am applying maildir patches in reverse

2013-03-02 Thread Junio C Hamano
Andreas Schwab sch...@linux-m68k.org writes:

 You should always cast to unsigned char when determining the order of
 characters, to be consistent with strcmp/memcmp.

We treat runs of digits as numbers, so it is not even similar to
strcmp.  As long as it is internally consistent (i.e. the return
value inside the loop (*a - *b) must match the last return), it
should be OK, no?
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [bug report] git-am applying maildir patches in reverse

2013-03-02 Thread Jeff King
On Sat, Mar 02, 2013 at 09:44:39AM +0100, Andreas Schwab wrote:

  +   return *a - *b;
 
 You should always cast to unsigned char when determining the order of
 characters, to be consistent with strcmp/memcmp.

Thanks, I hadn't heard that advice before, but it makes obvious sense.
Junio, do you want to squash this on top?

diff --git a/builtin/mailsplit.c b/builtin/mailsplit.c
index 1ddbff4..06296d4 100644
--- a/builtin/mailsplit.c
+++ b/builtin/mailsplit.c
@@ -143,12 +143,12 @@ static int maildir_filename_cmp(const char *a, const char 
*b)
}
else {
if (*a != *b)
-   return *a - *b;
+   return (unsigned char)*a - (unsigned char)*b;
a++;
b++;
}
}
-   return *a - *b;
+   return (unsigned char)*a - (unsigned char)*b;
 }
 
 static int split_maildir(const char *maildir, const char *dir,

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [bug report] git-am applying maildir patches in reverse

2013-03-02 Thread Jeff King
On Sat, Mar 02, 2013 at 10:22:46PM -0800, Junio C Hamano wrote:

 Andreas Schwab sch...@linux-m68k.org writes:
 
  You should always cast to unsigned char when determining the order of
  characters, to be consistent with strcmp/memcmp.
 
 We treat runs of digits as numbers, so it is not even similar to
 strcmp.  As long as it is internally consistent (i.e. the return
 value inside the loop (*a - *b) must match the last return), it
 should be OK, no?

I almost responded and said something similar, but we also do byte-wise
comparisons for non-numeric elements, and we would want those to match
what other programs may do (and what git used to do).

I highly doubt that it matters in practice, as it would mean:

  1. The sorting of a maildir's filenames are dependent on the sorting
 of non-numeric bits. We can't rule out such a scheme, but I'd guess
 implementations either use numbers, or their sort order is
 meaningless (and that is what I found in the ones I looked at).

  2. The importantly-sorted bits contain non-ascii characters (the
 difference is only seen when we go outside the signed range).

but it doesn't hurt to be thorough (and to set a good example).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [bug report] git-am applying maildir patches in reverse

2013-03-01 Thread Junio C Hamano
William Giokas 1007...@gmail.com writes:

 All,

 I've been using git for a while and this is the first time I've had to
 use `git am` and I've got a 16 patch patchset that I'm looking to apply.
 The files were copied to a separate maildir by mutt to keep things
 clean, and then I ran `git am -i /path/to/maildir/` expecting things to
 start from the patch with the subject 

 [PATCH 01/16] refactor common code in query_search/sync_search

 But instead, it starts with the 16/16 patch and works backwards, which,
 obviously, breaks the application process as the patches depend on each
 other. I looked in the maildir directory just to see if the file names
 were backwards, and that's not the issue. I talked to `gitster` on IRC
 and he said to send in a bug report on this issue here. The patchset I'm
 trying to apply can be found here[0]. 

 Process to reproduce:
 * find a multi-patch set with interdependent patches
 * run `git am` on the maildir containing these patches

 Expected result:
 * Apply patches in [01..N] order

 Actual result:
 * Patches applied in [N N-1..01] order

Note to bystanders.  This is coming from populate_maildir_list() in
builtin/mailsplit.c; the function claims to know what maildir
should look like, so it should be enforcing the ordering as
necessary by sorting the list, _if_ the implicit ordering given by
string_list_insert() is insufficient.

It also is likely that it is a user error to expect that patch
e-mails are received and stored in the maildir in the order they
were sent, or it could be mutt copying the mails in the order the
messages were originally received, or something silly like that.


 [0]: https://mailman.archlinux.org/pipermail/pacman-dev/2013-March/016541.html

 Thank you,
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [bug report] git-am applying maildir patches in reverse

2013-03-01 Thread Jeff King
On Fri, Mar 01, 2013 at 02:27:32PM -0800, Junio C Hamano wrote:

  I've been using git for a while and this is the first time I've had to
  use `git am` and I've got a 16 patch patchset that I'm looking to apply.
  The files were copied to a separate maildir by mutt to keep things
  clean, and then I ran `git am -i /path/to/maildir/` expecting things to
  start from the patch with the subject 
 
  [PATCH 01/16] refactor common code in query_search/sync_search
 
  But instead, it starts with the 16/16 patch and works backwards, which,
  obviously, breaks the application process as the patches depend on each

 Note to bystanders.  This is coming from populate_maildir_list() in
 builtin/mailsplit.c; the function claims to know what maildir
 should look like, so it should be enforcing the ordering as
 necessary by sorting the list, _if_ the implicit ordering given by
 string_list_insert() is insufficient.
 
 It also is likely that it is a user error to expect that patch
 e-mails are received and stored in the maildir in the order they
 were sent, or it could be mutt copying the mails in the order the
 messages were originally received, or something silly like that.

I think it is a property of the maildir format that it does not
technically define the message order. The order of items you get from
readdir is filesystem specific and not necessarily defined (and that is
what we use now). On my ext4 system, I do not even get them in backwards
order; it is jumbled.

We could sort based on the mtime of the file, but in some cases that
won't have sufficient resolution (e.g., with one second resolution,
they'll all probably have the same timestamp).

The maildir spec explicitly says that readers should not make
assumptions about the content of the filenames. Mutt happens to write
them as:

  ${epoch_seconds}.${pid}_${seq}.${host}

so in practice, sorting them kind of works. Except that a series written
out at once will all have the same timestamp and pid, and because ${seq}
is not zero-padded, you have to actually parse up to there and do a
numeric instead of byte-wise comparison.  So we can add a mutt-specific
hack, but that's the best we can do. Other maildir writers (including
future versions of mutt) will not necessarily do the same thing.

I think maildir's philosophy is that ordering is not important, and the
contents of the messages themselves should define the order in which a
MUA presents them, anyway. It would make sense to me if we actually
parsed the '[PATCH n/m]' bit from the subject of each and sorted based
on that. That is blurring the line between git-mailsplit and
git-mailinfo, in that mailsplit would have to start parsing the files.

We could also sort based on the rfc822 Date header, but I don't think
that is a good idea. It represents the author timestamp, so patches
moved via rebase -i will have out-of-sequence dates with respect to
the actual history graph.  You can encounter this if you format-patch
--stdout a series into a single mbox and load it into a MUA that sorts
by date (like mutt). The patches appear jumbled until you switch to
unsorted or sort by subject.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [bug report] git-am applying maildir patches in reverse

2013-03-01 Thread Jeff King
On Fri, Mar 01, 2013 at 05:52:31PM -0500, Jeff King wrote:

  Note to bystanders.  This is coming from populate_maildir_list() in
  builtin/mailsplit.c; the function claims to know what maildir
  should look like, so it should be enforcing the ordering as
  necessary by sorting the list, _if_ the implicit ordering given by
  string_list_insert() is insufficient.
 [...]
 
 I think it is a property of the maildir format that it does not
 technically define the message order. The order of items you get from
 readdir is filesystem specific and not necessarily defined (and that is
 what we use now). On my ext4 system, I do not even get them in backwards
 order; it is jumbled.

Hmph, sorry, I mistook string_list_insert for string_list_append. So we
do actually sort them by filename, not just random readdir order. But
due to this:

 The maildir spec explicitly says that readers should not make
 assumptions about the content of the filenames. Mutt happens to write
 them as:
 
   ${epoch_seconds}.${pid}_${seq}.${host}
 
 so in practice, sorting them kind of works. Except that a series written
 out at once will all have the same timestamp and pid, and because ${seq}
 is not zero-padded, you have to actually parse up to there and do a
 numeric instead of byte-wise comparison.  So we can add a mutt-specific
 hack, but that's the best we can do. Other maildir writers (including
 future versions of mutt) will not necessarily do the same thing.

That ordering is not necessarily useful.

So one strategy could be to try harder to sort numeric elements
numerically. That is, to treat:

  1234.5678_90.foo

as the list

  {1234, '.', 5678, '_', 90, foo}

for the purposes of sorting (even though we do not necessarily know what
each piece means). That works for mutt and similar schemes (dovecot, for
example, does something like M${seq}P${pid}), but some systems may put
random bytes in the middle (the point of it is to create a unique name).

So it is somewhat against the maildir spec, but I think in practice it
would help.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [bug report] git-am applying maildir patches in reverse

2013-03-01 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 On Fri, Mar 01, 2013 at 05:52:31PM -0500, Jeff King wrote:
 ...
 The maildir spec explicitly says that readers should not make
 assumptions about the content of the filenames. Mutt happens to write
 them as:
 
   ${epoch_seconds}.${pid}_${seq}.${host}
 
 so in practice, sorting them kind of works. Except that ...
  it does not work  ...
 That ordering is not necessarily useful.
 ...
 So it is somewhat against the maildir spec, but I think in practice it
 would help.

I do not think it would help, unless these epoch_seconds are the
sender timestamps.  And the number of digits in epoch-seconds for
recent messages would be the same, so ordering textually would be
sufficient if ordering by timestamp were.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [bug report] git-am applying maildir patches in reverse

2013-03-01 Thread Jeff King
On Fri, Mar 01, 2013 at 03:24:42PM -0800, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  On Fri, Mar 01, 2013 at 05:52:31PM -0500, Jeff King wrote:
  ...
  The maildir spec explicitly says that readers should not make
  assumptions about the content of the filenames. Mutt happens to write
  them as:
  
${epoch_seconds}.${pid}_${seq}.${host}
  
  so in practice, sorting them kind of works. Except that ...
   it does not work  ...
  That ordering is not necessarily useful.
  ...
  So it is somewhat against the maildir spec, but I think in practice it
  would help.
 
 I do not think it would help, unless these epoch_seconds are the
 sender timestamps.  And the number of digits in epoch-seconds for
 recent messages would be the same, so ordering textually would be
 sufficient if ordering by timestamp were.

The epoch_seconds are the time of writing into the maildir. They will
typically all be the same, unless your system is very slow, or you are
writing a really long patch series. The PID likewise should be the same
for a given series. It's the sequence number that is the interesting bit
to sort numerically (for mutt, anyway; ditto for dovecot).

The patch below seems to fix the issue for me with mutt. It's possible
that it regresses another case (which has numbers, but really wants them
sorted as byte strings), but I find that unlikely. If you're
zero-padding your numbers this will still work, and if you're not, then
you likely have no meaningful sort order at all.

-- 8 --
Subject: [PATCH] mailsplit: sort maildir filenames more cleverly

A maildir does not technically record the order in which
items were placed into it. That means that when applying a
patch series from a maildir, we may get the patches in the
wrong order. We try to work around this by sorting the
filenames. Unfortunately, this may or may not work depending
on the naming scheme used by the writer of the maildir.

For instance, mutt will write:

  ${epoch_seconds}.${pid}_${seq}.${host}

where we have:

  - epoch_seconds: timestamp at which entry was written
  - pid: PID of writing process
  - seq: a sequence number to ensure uniqueness of filenames
  - host: hostname

None of the numbers are zero-padded. Therefore, when we sort
the names as byte strings, entries that cross a digit
boundary (e.g., 10) will sort out of order.  In the case of
timestamps, it almost never matters (because we do not cross
a digit boundary in the epoch time very often these days).
But for the sequence number, a 10-patch series would be
ordered as 1, 10, 2, 3, etc.

To fix this, we can use a custom sort comparison function
which traverses each string, comparing chunks of digits
numerically, and otherwise doing a byte-for-byte comparison.
That would sort:

  123.456_1.bar
  123.456_2.bar
  ...
  123.456_10.bar

according to the sequence number. Since maildir does not
define a filename format, this is really just a heuristic.
But it happens to work for mutt, and there is a reasonable
chance that it will work for other writers, too (at least as
well as a straight sort).

Signed-off-by: Jeff King p...@peff.net
---
 builtin/mailsplit.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/builtin/mailsplit.c b/builtin/mailsplit.c
index 2d43278..772c668 100644
--- a/builtin/mailsplit.c
+++ b/builtin/mailsplit.c
@@ -130,6 +130,26 @@ static int populate_maildir_list(struct string_list *list, 
const char *path)
return 0;
 }
 
+static int maildir_filename_cmp(const char *a, const char *b)
+{
+   while (1) {
+   if (isdigit(*a)  isdigit(*b)) {
+   long int na, nb;
+   na = strtol(a, (char **)a, 10);
+   nb = strtol(b, (char **)b, 10);
+   if (na != nb)
+   return na - nb;
+   /* strtol advanced our pointers */
+   }
+   else {
+   if (*a != *b)
+   return *a - *b;
+   a++;
+   b++;
+   }
+   }
+}
+
 static int split_maildir(const char *maildir, const char *dir,
int nr_prec, int skip)
 {
@@ -139,6 +159,8 @@ static int split_maildir(const char *maildir, const char 
*dir,
int i;
struct string_list list = STRING_LIST_INIT_DUP;
 
+   list.cmp = maildir_filename_cmp;
+
if (populate_maildir_list(list, maildir)  0)
goto out;
 
-- 
1.8.1.39.gbb3bf60

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [bug report] git-am applying maildir patches in reverse

2013-03-01 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 diff --git a/builtin/mailsplit.c b/builtin/mailsplit.c
 index 2d43278..772c668 100644
 --- a/builtin/mailsplit.c
 +++ b/builtin/mailsplit.c
 @@ -130,6 +130,26 @@ static int populate_maildir_list(struct string_list 
 *list, const char *path)
   return 0;
  }
  
 +static int maildir_filename_cmp(const char *a, const char *b)
 +{
 + while (1) {

It is somewhat funny that we do not need to check !*a or !*b in this
loop.  As long as readdir() does not return duplicates, we won't be
comparing the same strings with this function, and we won't read
past '\0' at the end of both a and b at the same time.

 + if (isdigit(*a)  isdigit(*b)) {
 + long int na, nb;
 + na = strtol(a, (char **)a, 10);
 + nb = strtol(b, (char **)b, 10);
 + if (na != nb)
 + return na - nb;
 + /* strtol advanced our pointers */
 + }
 + else {
 + if (*a != *b)
 + return *a - *b;
 + a++;
 + b++;
 + }
 + }
 +}
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [bug report] git-am applying maildir patches in reverse

2013-03-01 Thread Jeff King
On Fri, Mar 01, 2013 at 03:57:39PM -0800, Junio C Hamano wrote:

  The epoch_seconds are the time of writing into the maildir. They will
  typically all be the same, unless your system is very slow, or you are
  writing a really long patch series. The PID likewise should be the same
  for a given series. It's the sequence number that is the interesting bit
  to sort numerically (for mutt, anyway; ditto for dovecot).
 
 OK, so as long as the user tells mutt what order the messages need
 to be written in when choosing these 16 patches, tiebreaking with
 the sequence number would be sufficient.
 
 Is it easy to tell that to mutt, though, given that the patches
 arrive in random order, not in the order they were sent?

Yes. You can either write one at a time, or you can tag several and
write them with a single command. In the latter case, it writes them out
according to the currently displayed sort order. The usual threaded
display gets it right for delivered messages (it shows them in date
order within the thread), and you can use sort-by-subject to override it
if you are working with munged timestamps.

 I would imagine that you sort the messages in your inbox, select a
 group of messages and tell mutt to write them out to a (new) empty
 maildir, and at that point mutt writes them out in the order you used
 to sort them---is that how it is supposed to work?
 
 Or are you assuming that the user chooses 1/16, tells mutt to write it
 out, chooses 2/16, tells mut to write that out, iterating it 16 times
 to write out a 16-patch series?

Right. The latter probably already works (unless you are so fast that
you write out multiple messages in one second), but I would expect most
people would tag and then save all at once.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [bug report] git-am applying maildir patches in reverse

2013-03-01 Thread Jeff King
On Fri, Mar 01, 2013 at 04:08:04PM -0800, Junio C Hamano wrote:

  +static int maildir_filename_cmp(const char *a, const char *b)
  +{
  +   while (1) {
 
 It is somewhat funny that we do not need to check !*a or !*b in this
 loop.  As long as readdir() does not return duplicates, we won't be
 comparing the same strings with this function, and we won't read
 past '\0' at the end of both a and b at the same time.

Ugh, thanks for catching. I had structured this differently at first
with a while(1), and then after refactoring it forgot to update the loop
condition. Even if it is fine without duplicates, we should not rely on
that not to cause an infinite loop (in case the function is ever used
again). But moreover, you can trigger the problem even now, since we
would parse foo_0123 and foo_123 as equivalent, and hit the NUL.

I think this needs squashed in:

diff --git a/builtin/mailsplit.c b/builtin/mailsplit.c
index 772c668..1ddbff4 100644
--- a/builtin/mailsplit.c
+++ b/builtin/mailsplit.c
@@ -132,7 +132,7 @@ static int maildir_filename_cmp(const char *a, const char 
*b)
 
 static int maildir_filename_cmp(const char *a, const char *b)
 {
-   while (1) {
+   while (*a  *b) {
if (isdigit(*a)  isdigit(*b)) {
long int na, nb;
na = strtol(a, (char **)a, 10);
@@ -148,6 +148,7 @@ static int maildir_filename_cmp(const char *a, const char 
*b)
b++;
}
}
+   return *a - *b;
 }
 
 static int split_maildir(const char *maildir, const char *dir,

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html