Re: [PATCH] Improve function dir.c:trim_trailing_spaces()
On Sat, May 31, 2014 at 08:21:31AM -0700, Pasha Bolokhov wrote: > + char *p, *last_space = NULL; > + > + for (p = buf; *p; p++) > + if (*p == ' ') { > + if (!last_space) > + last_space = p; > + } else { > + if (*p == '\\') > + p++; > + last_space = NULL; > + } Your backslash-escape works here by incrementing "p" an extra time. So we move past the backslash to the next character (which is escaped), and then the for-loop increments it again to the character beyond that, which is the next one worth considering. What happens if we are parsing a string with an unmatched backslash at the end of the string, like: foo\ We consider the end-of-string NUL to be escaped, skip it, and then keep reading whatever random bytes are in memory after the string. The original version did not have a problem with this because it used a length, which meant that "i < len" caught this case. I think you either need to insert an extra check for "!p[1]" when moving past the escaped character, or move back to a length-based check. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Improve function dir.c:trim_trailing_spaces()
Discard the unnecessary 'nr_spaces' variable, remove 'strlen()' and improve the 'if' structure. Switch to pointers instead of integers Slightly more rare occurrences of 'text \' with a backslash in between spaces are handled correctly. Namely, the code in 8ba87adad6 does not reset 'last_space' when a backslash is encountered and the above line stays intact as a result. Add a test at the end of t/t0008-ignores.sh to exhibit this behavior Signed-off-by: Pasha Bolokhov --- Correct 'if()' statements to conform to the general style which implies using 'if(ptr)' as an equivalent of 'if(ptr != NULL)' dir.c | 29 ++--- t/t0008-ignores.sh | 21 + 2 files changed, 35 insertions(+), 15 deletions(-) diff --git a/dir.c b/dir.c index eb6f581..98c81a8 100644 --- a/dir.c +++ b/dir.c @@ -508,21 +508,20 @@ void clear_exclude_list(struct exclude_list *el) static void trim_trailing_spaces(char *buf) { - int i, last_space = -1, nr_spaces, len = strlen(buf); - for (i = 0; i < len; i++) - if (buf[i] == '\\') - i++; - else if (buf[i] == ' ') { - if (last_space == -1) { - last_space = i; - nr_spaces = 1; - } else - nr_spaces++; - } else - last_space = -1; - - if (last_space != -1 && last_space + nr_spaces == len) - buf[last_space] = '\0'; + char *p, *last_space = NULL; + + for (p = buf; *p; p++) + if (*p == ' ') { + if (!last_space) + last_space = p; + } else { + if (*p == '\\') + p++; + last_space = NULL; + } + + if (last_space) + *last_space = '\0'; } int add_excludes_from_file_to_list(const char *fname, diff --git a/t/t0008-ignores.sh b/t/t0008-ignores.sh index 63beb99..7becf96 100755 --- a/t/t0008-ignores.sh +++ b/t/t0008-ignores.sh @@ -806,4 +806,25 @@ test_expect_success !MINGW 'quoting allows trailing whitespace' ' test_cmp err.expect err ' +test_expect_success NOT_MINGW,NOT_CYGWIN 'correct handling of backslashes' ' + rm -rf whitespace && + mkdir whitespace && + >"whitespace/trailing 1 " && + >"whitespace/trailing 2 " && + >"whitespace/trailing 3 " && + >"whitespace/trailing 4 \\ " && + >"whitespace/trailing 5 \\ \\ " && + >whitespace/untracked && + echo "whitespace/trailing 1 \\" >ignore && + echo "whitespace/trailing 2 " >>ignore && + echo "whitespace/trailing 3 " >>ignore && + echo "whitespace/trailing 4 " >>ignore && + echo "whitespace/trailing 5 " >>ignore && + echo whitespace/untracked >expect && + : >err.expect && + git ls-files -o -X ignore whitespace >actual 2>err && + test_cmp expect actual && + test_cmp err.expect err +' + test_done -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Improve function dir.c:trim_trailing_spaces()
Jeff King writes: > On Thu, May 29, 2014 at 02:34:33PM -0700, Pasha Bolokhov wrote: > >> > However, I doubt it makes that much of a difference in practice, so >> > unless it's measurable, I would certainly go with the version that is >> > more readable (and correct, of course). >> >> Sorry, just to recap, you would go with the existing version >> (which needs correction), or with the one that is being suggested? (I >> agree I can format the style a tiny bit better in the latter one) > > I actually think the original left-to-right is a little easier to > follow, but I do not feel strongly. I mainly meant "argue based on > readability and correctness, do not argue based on speed". Sensible. > I'd be OK with either, though I have a slight preference for the first, > just because I find the "bslash ^= 1" bit of yours, while clever, a bit > hard to follow. FWIW, I think I agree. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Improve function dir.c:trim_trailing_spaces()
On Thu, May 29, 2014 at 02:34:33PM -0700, Pasha Bolokhov wrote: > > However, I doubt it makes that much of a difference in practice, so > > unless it's measurable, I would certainly go with the version that is > > more readable (and correct, of course). > > Sorry, just to recap, you would go with the existing version > (which needs correction), or with the one that is being suggested? (I > agree I can format the style a tiny bit better in the latter one) I actually think the original left-to-right is a little easier to follow, but I do not feel strongly. I mainly meant "argue based on readability and correctness, do not argue based on speed". > Tests should not be a big problem, although it's kind of clumsy > to test an internal function which does not really give any output > (you can only measure the outcome). Just again to stress, I have > tested both implementation extensively and the suggested new > implementation gives the correct answers for all your examples below > and all others. If I show this with explicit "t/" tests, will it > suffice then? Yes. I think specifically that you can extend the tests at the end of t0008. > Basically what I suggest is > > -- either: improve the existing function such that it does correctly > that "text \ " case, and also does not use 'strlen' since it anyway > moves left to right > > -- or: use the new suggested implementation (and just brush the > formatting a bit), and perhaps borrow 'len' from the calling routine > > And add tests in any case. What is the preference? I'd be OK with either, though I have a slight preference for the first, just because I find the "bslash ^= 1" bit of yours, while clever, a bit hard to follow. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Improve function dir.c:trim_trailing_spaces()
On Thu, May 29, 2014 at 1:13 PM, Jeff King wrote: > On Wed, May 28, 2014 at 04:45:57PM -0700, Pasha Bolokhov wrote: > >> Move backwards from the end of the string (more efficient for >> lines which do not have trailing spaces or have just a couple). > > The original code reads the string from left to right. In theory, that > means we could get away with not calling strlen() at all, over a > right-to-left that must start with a call to strlen(). > > That being said, I think we already have the length in the calling > function, so you could probably avoid the strlen() altogether, which > makes a right-to-left function more efficient. > > However, I doubt it makes that much of a difference in practice, so > unless it's measurable, I would certainly go with the version that is > more readable (and correct, of course). Sorry, just to recap, you would go with the existing version (which needs correction), or with the one that is being suggested? (I agree I can format the style a tiny bit better in the latter one) Tests should not be a big problem, although it's kind of clumsy to test an internal function which does not really give any output (you can only measure the outcome). Just again to stress, I have tested both implementation extensively and the suggested new implementation gives the correct answers for all your examples below and all others. If I show this with explicit "t/" tests, will it suffice then? Basically what I suggest is -- either: improve the existing function such that it does correctly that "text \ " case, and also does not use 'strlen' since it anyway moves left to right -- or: use the new suggested implementation (and just brush the formatting a bit), and perhaps borrow 'len' from the calling routine And add tests in any case. What is the preference? > >> Slightly more rare occurrences of 'text \' with a backslash >> in between spaces are handled correctly. > > Can you add a test for this? > > Also, if you are refactoring this function, I think it makes sense to > check that: > > "foo\\ " > > and > > "foo\\\ " > > match "foo\" and "foo\ ", respectively (I think they do with your patch, > but it is a tricky case that is not immediately obvious). > > -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Improve function dir.c:trim_trailing_spaces()
On Wed, May 28, 2014 at 04:45:57PM -0700, Pasha Bolokhov wrote: > Move backwards from the end of the string (more efficient for > lines which do not have trailing spaces or have just a couple). The original code reads the string from left to right. In theory, that means we could get away with not calling strlen() at all, over a right-to-left that must start with a call to strlen(). That being said, I think we already have the length in the calling function, so you could probably avoid the strlen() altogether, which makes a right-to-left function more efficient. However, I doubt it makes that much of a difference in practice, so unless it's measurable, I would certainly go with the version that is more readable (and correct, of course). > Slightly more rare occurrences of 'text \' with a backslash > in between spaces are handled correctly. Can you add a test for this? Also, if you are refactoring this function, I think it makes sense to check that: "foo\\ " and "foo\\\ " match "foo\" and "foo\ ", respectively (I think they do with your patch, but it is a tricky case that is not immediately obvious). -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Improve function dir.c:trim_trailing_spaces()
Move backwards from the end of the string (more efficient for lines which do not have trailing spaces or have just a couple). Slightly more rare occurrences of 'text \' with a backslash in between spaces are handled correctly. Namely, the code in 8ba87adad6 does not reset 'last_space' when a backslash is encountered and the above line stays intact as a result --- How about trailing tabs? dir.c | 25 +++-- 1 file changed, 11 insertions(+), 14 deletions(-) diff --git a/dir.c b/dir.c index eb6f581..3315eea 100644 --- a/dir.c +++ b/dir.c @@ -508,21 +508,18 @@ void clear_exclude_list(struct exclude_list *el) static void trim_trailing_spaces(char *buf) { - int i, last_space = -1, nr_spaces, len = strlen(buf); - for (i = 0; i < len; i++) - if (buf[i] == '\\') - i++; - else if (buf[i] == ' ') { - if (last_space == -1) { - last_space = i; - nr_spaces = 1; - } else - nr_spaces++; - } else - last_space = -1; - - if (last_space != -1 && last_space + nr_spaces == len) + int i, last_space, bslash = 0, len = strlen(buf); + + if (len == 0 || buf[len - 1] != ' ') + return; + for (i = len - 2; i >= 0 && buf[i] == ' '; i--) ; + last_space = i + 1; + for ( ; i >=0 && buf[i] == '\\'; i--) bslash ^= 1; + + if (!bslash) buf[last_space] = '\0'; + else if (bslash && last_space < len - 1) + buf[last_space + 1] = '\0'; } int add_excludes_from_file_to_list(const char *fname, -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html