Re: [bug report] git-am applying maildir patches in reverse
Jeff King p...@peff.net writes: static int maildir_filename_cmp(const char *a, const char *b) { - while (1) { + while (*a *b) { if (isdigit(*a) isdigit(*b)) { long int na, nb; na = strtol(a, (char **)a, 10); @@ -148,6 +148,7 @@ static int maildir_filename_cmp(const char *a, const char *b) b++; } } + return *a - *b; You should always cast to unsigned char when determining the order of characters, to be consistent with strcmp/memcmp. Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 And now for something completely different. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [bug report] git-am applying maildir patches in reverse
Andreas Schwab sch...@linux-m68k.org writes: You should always cast to unsigned char when determining the order of characters, to be consistent with strcmp/memcmp. We treat runs of digits as numbers, so it is not even similar to strcmp. As long as it is internally consistent (i.e. the return value inside the loop (*a - *b) must match the last return), it should be OK, no? -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [bug report] git-am applying maildir patches in reverse
On Sat, Mar 02, 2013 at 09:44:39AM +0100, Andreas Schwab wrote: + return *a - *b; You should always cast to unsigned char when determining the order of characters, to be consistent with strcmp/memcmp. Thanks, I hadn't heard that advice before, but it makes obvious sense. Junio, do you want to squash this on top? diff --git a/builtin/mailsplit.c b/builtin/mailsplit.c index 1ddbff4..06296d4 100644 --- a/builtin/mailsplit.c +++ b/builtin/mailsplit.c @@ -143,12 +143,12 @@ static int maildir_filename_cmp(const char *a, const char *b) } else { if (*a != *b) - return *a - *b; + return (unsigned char)*a - (unsigned char)*b; a++; b++; } } - return *a - *b; + return (unsigned char)*a - (unsigned char)*b; } static int split_maildir(const char *maildir, const char *dir, -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [bug report] git-am applying maildir patches in reverse
On Sat, Mar 02, 2013 at 10:22:46PM -0800, Junio C Hamano wrote: Andreas Schwab sch...@linux-m68k.org writes: You should always cast to unsigned char when determining the order of characters, to be consistent with strcmp/memcmp. We treat runs of digits as numbers, so it is not even similar to strcmp. As long as it is internally consistent (i.e. the return value inside the loop (*a - *b) must match the last return), it should be OK, no? I almost responded and said something similar, but we also do byte-wise comparisons for non-numeric elements, and we would want those to match what other programs may do (and what git used to do). I highly doubt that it matters in practice, as it would mean: 1. The sorting of a maildir's filenames are dependent on the sorting of non-numeric bits. We can't rule out such a scheme, but I'd guess implementations either use numbers, or their sort order is meaningless (and that is what I found in the ones I looked at). 2. The importantly-sorted bits contain non-ascii characters (the difference is only seen when we go outside the signed range). but it doesn't hurt to be thorough (and to set a good example). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [bug report] git-am applying maildir patches in reverse
William Giokas 1007...@gmail.com writes: All, I've been using git for a while and this is the first time I've had to use `git am` and I've got a 16 patch patchset that I'm looking to apply. The files were copied to a separate maildir by mutt to keep things clean, and then I ran `git am -i /path/to/maildir/` expecting things to start from the patch with the subject [PATCH 01/16] refactor common code in query_search/sync_search But instead, it starts with the 16/16 patch and works backwards, which, obviously, breaks the application process as the patches depend on each other. I looked in the maildir directory just to see if the file names were backwards, and that's not the issue. I talked to `gitster` on IRC and he said to send in a bug report on this issue here. The patchset I'm trying to apply can be found here[0]. Process to reproduce: * find a multi-patch set with interdependent patches * run `git am` on the maildir containing these patches Expected result: * Apply patches in [01..N] order Actual result: * Patches applied in [N N-1..01] order Note to bystanders. This is coming from populate_maildir_list() in builtin/mailsplit.c; the function claims to know what maildir should look like, so it should be enforcing the ordering as necessary by sorting the list, _if_ the implicit ordering given by string_list_insert() is insufficient. It also is likely that it is a user error to expect that patch e-mails are received and stored in the maildir in the order they were sent, or it could be mutt copying the mails in the order the messages were originally received, or something silly like that. [0]: https://mailman.archlinux.org/pipermail/pacman-dev/2013-March/016541.html Thank you, -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [bug report] git-am applying maildir patches in reverse
On Fri, Mar 01, 2013 at 02:27:32PM -0800, Junio C Hamano wrote: I've been using git for a while and this is the first time I've had to use `git am` and I've got a 16 patch patchset that I'm looking to apply. The files were copied to a separate maildir by mutt to keep things clean, and then I ran `git am -i /path/to/maildir/` expecting things to start from the patch with the subject [PATCH 01/16] refactor common code in query_search/sync_search But instead, it starts with the 16/16 patch and works backwards, which, obviously, breaks the application process as the patches depend on each Note to bystanders. This is coming from populate_maildir_list() in builtin/mailsplit.c; the function claims to know what maildir should look like, so it should be enforcing the ordering as necessary by sorting the list, _if_ the implicit ordering given by string_list_insert() is insufficient. It also is likely that it is a user error to expect that patch e-mails are received and stored in the maildir in the order they were sent, or it could be mutt copying the mails in the order the messages were originally received, or something silly like that. I think it is a property of the maildir format that it does not technically define the message order. The order of items you get from readdir is filesystem specific and not necessarily defined (and that is what we use now). On my ext4 system, I do not even get them in backwards order; it is jumbled. We could sort based on the mtime of the file, but in some cases that won't have sufficient resolution (e.g., with one second resolution, they'll all probably have the same timestamp). The maildir spec explicitly says that readers should not make assumptions about the content of the filenames. Mutt happens to write them as: ${epoch_seconds}.${pid}_${seq}.${host} so in practice, sorting them kind of works. Except that a series written out at once will all have the same timestamp and pid, and because ${seq} is not zero-padded, you have to actually parse up to there and do a numeric instead of byte-wise comparison. So we can add a mutt-specific hack, but that's the best we can do. Other maildir writers (including future versions of mutt) will not necessarily do the same thing. I think maildir's philosophy is that ordering is not important, and the contents of the messages themselves should define the order in which a MUA presents them, anyway. It would make sense to me if we actually parsed the '[PATCH n/m]' bit from the subject of each and sorted based on that. That is blurring the line between git-mailsplit and git-mailinfo, in that mailsplit would have to start parsing the files. We could also sort based on the rfc822 Date header, but I don't think that is a good idea. It represents the author timestamp, so patches moved via rebase -i will have out-of-sequence dates with respect to the actual history graph. You can encounter this if you format-patch --stdout a series into a single mbox and load it into a MUA that sorts by date (like mutt). The patches appear jumbled until you switch to unsorted or sort by subject. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [bug report] git-am applying maildir patches in reverse
On Fri, Mar 01, 2013 at 05:52:31PM -0500, Jeff King wrote: Note to bystanders. This is coming from populate_maildir_list() in builtin/mailsplit.c; the function claims to know what maildir should look like, so it should be enforcing the ordering as necessary by sorting the list, _if_ the implicit ordering given by string_list_insert() is insufficient. [...] I think it is a property of the maildir format that it does not technically define the message order. The order of items you get from readdir is filesystem specific and not necessarily defined (and that is what we use now). On my ext4 system, I do not even get them in backwards order; it is jumbled. Hmph, sorry, I mistook string_list_insert for string_list_append. So we do actually sort them by filename, not just random readdir order. But due to this: The maildir spec explicitly says that readers should not make assumptions about the content of the filenames. Mutt happens to write them as: ${epoch_seconds}.${pid}_${seq}.${host} so in practice, sorting them kind of works. Except that a series written out at once will all have the same timestamp and pid, and because ${seq} is not zero-padded, you have to actually parse up to there and do a numeric instead of byte-wise comparison. So we can add a mutt-specific hack, but that's the best we can do. Other maildir writers (including future versions of mutt) will not necessarily do the same thing. That ordering is not necessarily useful. So one strategy could be to try harder to sort numeric elements numerically. That is, to treat: 1234.5678_90.foo as the list {1234, '.', 5678, '_', 90, foo} for the purposes of sorting (even though we do not necessarily know what each piece means). That works for mutt and similar schemes (dovecot, for example, does something like M${seq}P${pid}), but some systems may put random bytes in the middle (the point of it is to create a unique name). So it is somewhat against the maildir spec, but I think in practice it would help. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [bug report] git-am applying maildir patches in reverse
Jeff King p...@peff.net writes: On Fri, Mar 01, 2013 at 05:52:31PM -0500, Jeff King wrote: ... The maildir spec explicitly says that readers should not make assumptions about the content of the filenames. Mutt happens to write them as: ${epoch_seconds}.${pid}_${seq}.${host} so in practice, sorting them kind of works. Except that ... it does not work ... That ordering is not necessarily useful. ... So it is somewhat against the maildir spec, but I think in practice it would help. I do not think it would help, unless these epoch_seconds are the sender timestamps. And the number of digits in epoch-seconds for recent messages would be the same, so ordering textually would be sufficient if ordering by timestamp were. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [bug report] git-am applying maildir patches in reverse
On Fri, Mar 01, 2013 at 03:24:42PM -0800, Junio C Hamano wrote: Jeff King p...@peff.net writes: On Fri, Mar 01, 2013 at 05:52:31PM -0500, Jeff King wrote: ... The maildir spec explicitly says that readers should not make assumptions about the content of the filenames. Mutt happens to write them as: ${epoch_seconds}.${pid}_${seq}.${host} so in practice, sorting them kind of works. Except that ... it does not work ... That ordering is not necessarily useful. ... So it is somewhat against the maildir spec, but I think in practice it would help. I do not think it would help, unless these epoch_seconds are the sender timestamps. And the number of digits in epoch-seconds for recent messages would be the same, so ordering textually would be sufficient if ordering by timestamp were. The epoch_seconds are the time of writing into the maildir. They will typically all be the same, unless your system is very slow, or you are writing a really long patch series. The PID likewise should be the same for a given series. It's the sequence number that is the interesting bit to sort numerically (for mutt, anyway; ditto for dovecot). The patch below seems to fix the issue for me with mutt. It's possible that it regresses another case (which has numbers, but really wants them sorted as byte strings), but I find that unlikely. If you're zero-padding your numbers this will still work, and if you're not, then you likely have no meaningful sort order at all. -- 8 -- Subject: [PATCH] mailsplit: sort maildir filenames more cleverly A maildir does not technically record the order in which items were placed into it. That means that when applying a patch series from a maildir, we may get the patches in the wrong order. We try to work around this by sorting the filenames. Unfortunately, this may or may not work depending on the naming scheme used by the writer of the maildir. For instance, mutt will write: ${epoch_seconds}.${pid}_${seq}.${host} where we have: - epoch_seconds: timestamp at which entry was written - pid: PID of writing process - seq: a sequence number to ensure uniqueness of filenames - host: hostname None of the numbers are zero-padded. Therefore, when we sort the names as byte strings, entries that cross a digit boundary (e.g., 10) will sort out of order. In the case of timestamps, it almost never matters (because we do not cross a digit boundary in the epoch time very often these days). But for the sequence number, a 10-patch series would be ordered as 1, 10, 2, 3, etc. To fix this, we can use a custom sort comparison function which traverses each string, comparing chunks of digits numerically, and otherwise doing a byte-for-byte comparison. That would sort: 123.456_1.bar 123.456_2.bar ... 123.456_10.bar according to the sequence number. Since maildir does not define a filename format, this is really just a heuristic. But it happens to work for mutt, and there is a reasonable chance that it will work for other writers, too (at least as well as a straight sort). Signed-off-by: Jeff King p...@peff.net --- builtin/mailsplit.c | 22 ++ 1 file changed, 22 insertions(+) diff --git a/builtin/mailsplit.c b/builtin/mailsplit.c index 2d43278..772c668 100644 --- a/builtin/mailsplit.c +++ b/builtin/mailsplit.c @@ -130,6 +130,26 @@ static int populate_maildir_list(struct string_list *list, const char *path) return 0; } +static int maildir_filename_cmp(const char *a, const char *b) +{ + while (1) { + if (isdigit(*a) isdigit(*b)) { + long int na, nb; + na = strtol(a, (char **)a, 10); + nb = strtol(b, (char **)b, 10); + if (na != nb) + return na - nb; + /* strtol advanced our pointers */ + } + else { + if (*a != *b) + return *a - *b; + a++; + b++; + } + } +} + static int split_maildir(const char *maildir, const char *dir, int nr_prec, int skip) { @@ -139,6 +159,8 @@ static int split_maildir(const char *maildir, const char *dir, int i; struct string_list list = STRING_LIST_INIT_DUP; + list.cmp = maildir_filename_cmp; + if (populate_maildir_list(list, maildir) 0) goto out; -- 1.8.1.39.gbb3bf60 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [bug report] git-am applying maildir patches in reverse
Jeff King p...@peff.net writes: diff --git a/builtin/mailsplit.c b/builtin/mailsplit.c index 2d43278..772c668 100644 --- a/builtin/mailsplit.c +++ b/builtin/mailsplit.c @@ -130,6 +130,26 @@ static int populate_maildir_list(struct string_list *list, const char *path) return 0; } +static int maildir_filename_cmp(const char *a, const char *b) +{ + while (1) { It is somewhat funny that we do not need to check !*a or !*b in this loop. As long as readdir() does not return duplicates, we won't be comparing the same strings with this function, and we won't read past '\0' at the end of both a and b at the same time. + if (isdigit(*a) isdigit(*b)) { + long int na, nb; + na = strtol(a, (char **)a, 10); + nb = strtol(b, (char **)b, 10); + if (na != nb) + return na - nb; + /* strtol advanced our pointers */ + } + else { + if (*a != *b) + return *a - *b; + a++; + b++; + } + } +} -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [bug report] git-am applying maildir patches in reverse
On Fri, Mar 01, 2013 at 03:57:39PM -0800, Junio C Hamano wrote: The epoch_seconds are the time of writing into the maildir. They will typically all be the same, unless your system is very slow, or you are writing a really long patch series. The PID likewise should be the same for a given series. It's the sequence number that is the interesting bit to sort numerically (for mutt, anyway; ditto for dovecot). OK, so as long as the user tells mutt what order the messages need to be written in when choosing these 16 patches, tiebreaking with the sequence number would be sufficient. Is it easy to tell that to mutt, though, given that the patches arrive in random order, not in the order they were sent? Yes. You can either write one at a time, or you can tag several and write them with a single command. In the latter case, it writes them out according to the currently displayed sort order. The usual threaded display gets it right for delivered messages (it shows them in date order within the thread), and you can use sort-by-subject to override it if you are working with munged timestamps. I would imagine that you sort the messages in your inbox, select a group of messages and tell mutt to write them out to a (new) empty maildir, and at that point mutt writes them out in the order you used to sort them---is that how it is supposed to work? Or are you assuming that the user chooses 1/16, tells mutt to write it out, chooses 2/16, tells mut to write that out, iterating it 16 times to write out a 16-patch series? Right. The latter probably already works (unless you are so fast that you write out multiple messages in one second), but I would expect most people would tag and then save all at once. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [bug report] git-am applying maildir patches in reverse
On Fri, Mar 01, 2013 at 04:08:04PM -0800, Junio C Hamano wrote: +static int maildir_filename_cmp(const char *a, const char *b) +{ + while (1) { It is somewhat funny that we do not need to check !*a or !*b in this loop. As long as readdir() does not return duplicates, we won't be comparing the same strings with this function, and we won't read past '\0' at the end of both a and b at the same time. Ugh, thanks for catching. I had structured this differently at first with a while(1), and then after refactoring it forgot to update the loop condition. Even if it is fine without duplicates, we should not rely on that not to cause an infinite loop (in case the function is ever used again). But moreover, you can trigger the problem even now, since we would parse foo_0123 and foo_123 as equivalent, and hit the NUL. I think this needs squashed in: diff --git a/builtin/mailsplit.c b/builtin/mailsplit.c index 772c668..1ddbff4 100644 --- a/builtin/mailsplit.c +++ b/builtin/mailsplit.c @@ -132,7 +132,7 @@ static int maildir_filename_cmp(const char *a, const char *b) static int maildir_filename_cmp(const char *a, const char *b) { - while (1) { + while (*a *b) { if (isdigit(*a) isdigit(*b)) { long int na, nb; na = strtol(a, (char **)a, 10); @@ -148,6 +148,7 @@ static int maildir_filename_cmp(const char *a, const char *b) b++; } } + return *a - *b; } static int split_maildir(const char *maildir, const char *dir, -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html