Re: [RFC PATCH] git log: support "auto" decorations
Jeff King writes: > I wonder if it would be sane to remove or quote NULs when attaching the > buffer to commit->buffer. That would _break_ signatures, but that is a > good thing. I do not think there is a reason to have NULs in your commit > message unless you are doing something malicious (or using utf16, but > that already is horribly broken). Ahh, our messages crossed. I do not think we are quite ready to depart from our traditional position: the payload of a commit object can be any bytestream, even though we do expect and encourage them to be human readable text in a reasonable encoding. And there is no fundamental reason why we should forbid signing the payload that happens to be a structured binary blob. The user may need some way other than "log --show-signature" that can be used to validate, because "log" itself will be useless for such a payload with or without signature. But I think that may be a more or less orthogonal issue. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] git log: support "auto" decorations
Junio C Hamano writes: > Jeff King writes: > >> Subject: [PATCH] reuse commit->buffer when parsing signatures >> ... >> Signed-off-by: Jeff King > > Hmph, unfortunately this seems to break t7510. And I think without re-reading the patch I know what is wrong. The length of the object and strlen(commit->buffer) would be different, no? -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] git log: support "auto" decorations
On Fri, May 30, 2014 at 01:44:32PM -0700, Junio C Hamano wrote: > Jeff King writes: > > > Subject: [PATCH] reuse commit->buffer when parsing signatures > > ... > > Signed-off-by: Jeff King > > Hmph, unfortunately this seems to break t7510. Urgh, sorry for not testing more thoroughly. I imagine it is because of the strlen(commit->buffer) I introduced. Unfortunately I do not know if we have an easy way to know the length of commit->buffer, short of hitting sha1_object_info (which is somewhat expensive to do for every commit). I wonder if it would be sane to remove or quote NULs when attaching the buffer to commit->buffer. That would _break_ signatures, but that is a good thing. I do not think there is a reason to have NULs in your commit message unless you are doing something malicious (or using utf16, but that already is horribly broken). -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] git log: support "auto" decorations
Jeff King writes: > Subject: [PATCH] reuse commit->buffer when parsing signatures > ... > Signed-off-by: Jeff King Hmph, unfortunately this seems to break t7510. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] git log: support "auto" decorations
On Fri, May 30, 2014 at 02:34:41PM -0400, Jeff King wrote: > On Fri, May 30, 2014 at 10:35:14AM -0700, Junio C Hamano wrote: > > > > Do you want me to roll it up with a real commit message? > > > > Yes. I think the change is sensible. > > Here it is. [...] By the way, I rather derailed Linus's original patch with this sub-thread. I think it actually looks fine as-is. The shortcomings he listed are all there, but I think addressing them would end up even worse. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] git log: support "auto" decorations
On Fri, May 30, 2014 at 10:35:14AM -0700, Junio C Hamano wrote: > > Do you want me to roll it up with a real commit message? > > Yes. I think the change is sensible. Here it is. We may want to make these helper functions available to other callers so they can use the same trick, but I do not know offhand of any others that would want it. pretty.c is the obvious place, and it already uses a similar trick in logmsg_reencode (and I would expect most users of the commit message would actually want the reencoded version, and would use that). -- >8 -- Subject: [PATCH] reuse commit->buffer when parsing signatures When we call show_signature or show_mergetag, we read the commit object fresh via read_sha1_file and reparse its headers. However, in most cases we already have the object data available as commit->buffer. This is partially laziness in dealing with the memory allocation issues, but partially defensive programming, in that we would always want to verify a clean version of the buffer (not one that might have been munged by other users of the commit). However, we do not currently ever munge commit->buffer, and not using the already-available buffer carries a fairly big performance penalty when we are looking at a large number of commits. Here are timings on linux.git: [baseline, no signatures] $ time git log >/dev/null real0m4.902s user0m4.784s sys 0m0.120s [before] $ time git log --show-signature >/dev/null real0m14.735s user0m9.964s sys 0m0.944s [after] $ time git log --show-signature >/dev/null real0m9.981s user0m5.260s sys 0m0.936s Note that our user CPU time drops almost in half, close to the non-signature case, but we do still spend more wall-clock and system time, presumably from dealing with gpg. An alternative to this is to note that most commits do not have signatures (less than 1% in this repo), yet we pay the re-parsing cost for every commit just to find out if it has a mergetag or signature. If we checked that when parsing the commit initially, we could avoid re-examining most commits later on. Even if we did pursue that direction, however, this would still speed up the cases where we _do_ have signatures. So it's probably worth doing either way. Signed-off-by: Jeff King --- commit.c | 44 commit.h | 2 +- log-tree.c | 2 +- 3 files changed, 38 insertions(+), 10 deletions(-) diff --git a/commit.c b/commit.c index f479331..9e2abf7 100644 --- a/commit.c +++ b/commit.c @@ -1080,14 +1080,42 @@ static int do_sign_commit(struct strbuf *buf, const char *keyid) return 0; } -int parse_signed_commit(const unsigned char *sha1, +/* + * Return the contents of the object pointed to by commit, as + * if read by read_sha1_file. However, in cases where the commit's + * data is already in memory, return that as an optimization. + * + * The resulting buffer may or may not be freshly allocated, + * and should only be freed by free_commit_buffer. + */ +static const char *read_commit_buffer(const struct commit *commit, + enum object_type *type, + unsigned long *size) +{ + if (commit->buffer) { + *type = OBJ_COMMIT; + *size = strlen(commit->buffer); + return commit->buffer; + } + + return read_sha1_file(commit->object.sha1, type, size); +} + +static void free_commit_buffer(const char *buffer, const struct commit *commit) +{ + if (buffer != commit->buffer) + free((char *)buffer); +} + +int parse_signed_commit(const struct commit *commit, struct strbuf *payload, struct strbuf *signature) { + unsigned long size; enum object_type type; - char *buffer = read_sha1_file(sha1, &type, &size); + const char *buffer = read_commit_buffer(commit, &type, &size); int in_signature, saw_signature = -1; - char *line, *tail; + const char *line, *tail; if (!buffer || type != OBJ_COMMIT) goto cleanup; @@ -1098,7 +1126,7 @@ int parse_signed_commit(const unsigned char *sha1, saw_signature = 0; while (line < tail) { const char *sig = NULL; - char *next = memchr(line, '\n', tail - line); + const char *next = memchr(line, '\n', tail - line); next = next ? next + 1 : tail; if (in_signature && line[0] == ' ') @@ -1120,7 +1148,7 @@ int parse_signed_commit(const unsigned char *sha1, line = next; } cleanup: - free(buffer); + free_commit_buffer(buffer, commit); return saw_signature; } @@ -1211,7 +1239,7 @@ void check_commit_signature(const struct commit* commit, struct signature_check sigc->result = 'N'; - if (parse_signed_commit(commit->object.sha1, + if (parse_signed_commit(commit,
Re: [RFC PATCH] git log: support "auto" decorations
Jeff King writes: > On Fri, May 30, 2014 at 09:55:14AM -0700, Junio C Hamano wrote: > > I don't think we need to worry about commit->buffer being mucked with. > It is always either NULL, or points to the original object contents. > Encoded log messages are always placed in a separate buffer (and in fact > we use the same "optionally point to commit->buffer" trick there). And > things like mucking with parents always happen on the parsed form. > > Of course I may be missing a site, and it's certainly a maintenance risk > for the future. But I'd go so far as to say that anything modifying > commit->buffer is wrong, and that side should be fixed. I fully agree, and "that side should be fixed" implying "we should always be on a look-out for such a change" is something the lazyness tried to avoid. > Do you want me to roll it up with a real commit message? Yes. I think the change is sensible. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] git log: support "auto" decorations
On Fri, May 30, 2014 at 09:55:14AM -0700, Junio C Hamano wrote: > Jeff King writes: > > > On Thu, May 29, 2014 at 09:54:10PM -0700, Linus Torvalds wrote: > > > >> That said, part of it is just that show-signature is so suboptimal > >> performance-wise, re-parsing the commit buffer for each commit when > >> "show_signature" is set. That's just crazy, we've already parsed the > >> commit text, we already *could* know if it has a signature or not, and > >> skip it if it doesn't. That would require one of the flag bits in the > >> object, though, or something, so it's probably not worth doing. > > > > Wow, it's really quite bad. Not only do we spend time on commits that we > > could otherwise know do not have signatures, but we actually pull the > > buffer from disk, even though we generally have it saved as > > commit->buffer. > > The one for the signature on the commit itself is me being lazy and > defensive; I did not want to have to worry about people mucking with > what is in commit->buffer for whatever reason (e.g. re-encode in > different charset, etc.) and then asking the signature validated. > > The other one for the merge-tag is me just being lazy, as it is > unlikely to be corrupt by any reasonable kinds of mucking with > commit->buffer on a merge. I don't think we need to worry about commit->buffer being mucked with. It is always either NULL, or points to the original object contents. Encoded log messages are always placed in a separate buffer (and in fact we use the same "optionally point to commit->buffer" trick there). And things like mucking with parents always happen on the parsed form. Of course I may be missing a site, and it's certainly a maintenance risk for the future. But I'd go so far as to say that anything modifying commit->buffer is wrong, and that side should be fixed. Do you want me to roll it up with a real commit message? The other option is to do something like Linus suggested, and note the presence/absence of signature and mergetag headers with a few bits (we could even use a commit slab if we don't want to waste bits in the object struct). That would prevent us hitting this code at all for most commits, so we would save not only the read_sha1_file cost, but the extra parsing cost. However, that does nothing to help the cases where we _do_ have signatures. A repo where somebody GPG-signed every commit, for example, would still perform terribly. So even if we go that route, I think we'd want to apply this technique, too. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] git log: support "auto" decorations
Jeff King writes: > On Thu, May 29, 2014 at 09:54:10PM -0700, Linus Torvalds wrote: > >> That said, part of it is just that show-signature is so suboptimal >> performance-wise, re-parsing the commit buffer for each commit when >> "show_signature" is set. That's just crazy, we've already parsed the >> commit text, we already *could* know if it has a signature or not, and >> skip it if it doesn't. That would require one of the flag bits in the >> object, though, or something, so it's probably not worth doing. > > Wow, it's really quite bad. Not only do we spend time on commits that we > could otherwise know do not have signatures, but we actually pull the > buffer from disk, even though we generally have it saved as > commit->buffer. The one for the signature on the commit itself is me being lazy and defensive; I did not want to have to worry about people mucking with what is in commit->buffer for whatever reason (e.g. re-encode in different charset, etc.) and then asking the signature validated. The other one for the merge-tag is me just being lazy, as it is unlikely to be corrupt by any reasonable kinds of mucking with commit->buffer on a merge. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] git log: support "auto" decorations
On Thu, May 29, 2014 at 09:54:10PM -0700, Linus Torvalds wrote: > That said, part of it is just that show-signature is so suboptimal > performance-wise, re-parsing the commit buffer for each commit when > "show_signature" is set. That's just crazy, we've already parsed the > commit text, we already *could* know if it has a signature or not, and > skip it if it doesn't. That would require one of the flag bits in the > object, though, or something, so it's probably not worth doing. Wow, it's really quite bad. Not only do we spend time on commits that we could otherwise know do not have signatures, but we actually pull the buffer from disk, even though we generally have it saved as commit->buffer. And we do it twice! Once for the commit signature, and once for the mergetag. Below is a fairly straightforward patch to use commit->buffer when we have it. Here are timings on the kernel repo: [baseline, no signatures] $ time git log >/dev/null real0m4.902s user0m4.784s sys 0m0.120s [before] $ time git log --show-signature >/dev/null real0m14.735s user0m9.964s sys 0m0.944s [after] $ time git log --show-signature >/dev/null real0m9.981s user0m5.260s sys 0m0.936s If you look at just the user CPU time, you can see we've reclaimed all but 0.5s of the 5.2s wasted seconds. Some of that is presumably going to the extra re-parsing, with a little to the actual gpg verification. The wall-clock time improves, too, but we're still 5s slower. A little of that goes to system time, but presumably most of the rest of it is latency due to waiting on gpg? Getting rid of that would probably involve caching the gpg output from run to run. That's not that hard to do, but I don't like the idea of caching security information. --- commit.c | 36 commit.h | 2 +- log-tree.c | 2 +- 3 files changed, 30 insertions(+), 10 deletions(-) diff --git a/commit.c b/commit.c index f479331..529ee50 100644 --- a/commit.c +++ b/commit.c @@ -1080,14 +1080,34 @@ static int do_sign_commit(struct strbuf *buf, const char *keyid) return 0; } -int parse_signed_commit(const unsigned char *sha1, +static const char *get_commit_buffer(const struct commit *commit, +enum object_type *type, +unsigned long *size) +{ + if (commit->buffer) { + *type = OBJ_COMMIT; + *size = strlen(commit->buffer); + return commit->buffer; + } + + return read_sha1_file(commit->object.sha1, type, size); +} + +static void free_commit_buffer(const char *buffer, const struct commit *commit) +{ + if (buffer != commit->buffer) + free((char *)buffer); +} + +int parse_signed_commit(const struct commit *commit, struct strbuf *payload, struct strbuf *signature) { + unsigned long size; enum object_type type; - char *buffer = read_sha1_file(sha1, &type, &size); + const char *buffer = get_commit_buffer(commit, &type, &size); int in_signature, saw_signature = -1; - char *line, *tail; + const char *line, *tail; if (!buffer || type != OBJ_COMMIT) goto cleanup; @@ -1098,7 +1118,7 @@ int parse_signed_commit(const unsigned char *sha1, saw_signature = 0; while (line < tail) { const char *sig = NULL; - char *next = memchr(line, '\n', tail - line); + const char *next = memchr(line, '\n', tail - line); next = next ? next + 1 : tail; if (in_signature && line[0] == ' ') @@ -1120,7 +1140,7 @@ int parse_signed_commit(const unsigned char *sha1, line = next; } cleanup: - free(buffer); + free_commit_buffer(buffer, commit); return saw_signature; } @@ -1211,7 +1231,7 @@ void check_commit_signature(const struct commit* commit, struct signature_check sigc->result = 'N'; - if (parse_signed_commit(commit->object.sha1, + if (parse_signed_commit(commit, &payload, &signature) <= 0) goto out; status = verify_signed_buffer(payload.buf, payload.len, @@ -1258,10 +1278,10 @@ struct commit_extra_header *read_commit_extra_headers(struct commit *commit, struct commit_extra_header *extra = NULL; unsigned long size; enum object_type type; - char *buffer = read_sha1_file(commit->object.sha1, &type, &size); + const char *buffer = get_commit_buffer(commit, &type, &size); if (buffer && type == OBJ_COMMIT) extra = read_commit_extra_header_lines(buffer, size, exclude); - free(buffer); + free_commit_buffer(buffer, commit); return extra; } diff --git a/commit.h b/commit.h index a9f177b..a765f0f 100644 --- a/commit.h +++ b/commit.h @@ -287,7 +287,7 @@ struct merge_re
Re: [RFC PATCH] git log: support "auto" decorations
On Thu, May 29, 2014 at 6:58 PM, Jeff King wrote: > > I will spare you the usual lecture on having these lines in the message > body. ;) We do it for the kernel because they often get lost otherwise. Particularly the date/author. git doesn't tend to have the same kind of deep email forwarding models, and nobody uses quilt to develop git (I hope!) but it's a habit I like to encourage for the kernel, so I use it in general.. > Yeah, I think this makes a lot of sense. I do use log.decorate=true, and > it is usually not a big deal. However, I think I have run into > annoyances once or twice when piping it. I'd probably use > log.decorate=auto if we had it. I have various scripts to generate releases etc, using "git log" and piping it to random other stuff. I don't _think_ they care, but quite frankly, I'd rather not even have to think about it. Also, I actually find the un-colorized version of the decorations to be distracting. With color (not that I really like the default color scheme, but I'm too lazy to set it up to anything else), the colorization ends up making the decorations much easier to visually separate, so I like them there. >> It's marked with RFC because >> >> (a) that "isatty(1) || pager_in_use()" test is kind of hacky, maybe we >> would be better off sharing something with the auto-coloration? > > The magic for this is in color.c, want_color() and check_auto_color(). Oh, I know. That's where I stole it from. But the colorization actually has very different rules, and looks at TERM etc. So I only stole the actual non-color-specific parts of it, but I was wondering whether it might make sense to unify them. >> (b) I also think it would be nice to have the equivalent for >> "--show-signature", but there we don't have any preexisting config >> file option. > > Potentially yes, though there is a real performance impact for "log > --show-signature" if you actually have a lot of signatures. Even on > linux.git, a full "git log" is 15s with --show-signature, and 5s > without. Maybe that is acceptable for interactive use (and certainly it > is not a reason to make it an _option_, if somebody wants to turn it > on). Yes. This is an example of where the whole "tty is fundamentally different from scripting" happens. It really is about the whole "user is looking at it" vs "scripting". There is no way in hell that "--show-signature" should be on by default for anybody sane. That said, part of it is just that show-signature is so suboptimal performance-wise, re-parsing the commit buffer for each commit when "show_signature" is set. That's just crazy, we've already parsed the commit text, we already *could* know if it has a signature or not, and skip it if it doesn't. That would require one of the flag bits in the object, though, or something, so it's probably not worth doing. Sure, performance might matter if you end up searching for something in the pager after you've done "git log", but personally I think I'd never even notice.. Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] git log: support "auto" decorations
On Thu, May 29, 2014 at 03:31:58PM -0700, Linus Torvalds wrote: > From: Linus Torvalds > Date: Thu, 29 May 2014 15:19:40 -0700 > Subject: [RFC PATCH] git log: support "auto" decorations I will spare you the usual lecture on having these lines in the message body. ;) > I actually like seeing decorations by default, but I do *not* think our > current "log.decorate" options make sense, since they will change any > random use of "git log" to have decorations. I much prefer the > "ui.color=auto" behavior that we have for coloration. This is a trivial > patch that tries to approximate that. Yeah, I think this makes a lot of sense. I do use log.decorate=true, and it is usually not a big deal. However, I think I have run into annoyances once or twice when piping it. I'd probably use log.decorate=auto if we had it. > It's marked with RFC because > > (a) that "isatty(1) || pager_in_use()" test is kind of hacky, maybe we > would be better off sharing something with the auto-coloration? The magic for this is in color.c, want_color() and check_auto_color(). The color code checks "pager_use_color" when the pager is in use, but I do not think that makes any sense here. It also checks that $TERM is not "dumb", but that also does not make sense here. So I think your check is fine. It would be nice to share with the color code, but I doubt it will end up any more readable, because of conditionally dealing with those two differences. > (b) I also think it would be nice to have the equivalent for > "--show-signature", but there we don't have any preexisting config > file option. Potentially yes, though there is a real performance impact for "log --show-signature" if you actually have a lot of signatures. Even on linux.git, a full "git log" is 15s with --show-signature, and 5s without. Maybe that is acceptable for interactive use (and certainly it is not a reason to make it an _option_, if somebody wants to turn it on). > (c) maybe somebody would like a way to combine "auto" and "full", > although personally that doesn't seem to strike me as all that useful > (would you really want to see the full refname when not scripting it) Yeah, "full/short" is really orthogonal to "true/false/auto". If we were starting from scratch, I think putting "full/short" into log.decorateStyle would make more sense, but it is probably not worth changing now. I agree that "full auto" is probably not something useful, and we can live without it. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html