Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type
On Tue, Apr 21, 2015 at 04:56:08PM +0530, karthik nayak wrote: > >>+ status = unpack_sha1_header(stream, map, mapsize, buffer, bufsiz); > > > >I wonder if we would feel comfortable just running this NUL-check as > >part of unpack_sha1_header (i.e., in all code paths). It _shouldn't_ > >trigger in normal use, but I wonder if there would be any downsides > >(e.g., maliciously crafted objects getting us to allocate memory or > >something; I think it is fairly easy to convince git to allocate memory, > >though). > > > But why would we want it to be a part of unpack_sha1_header? Just to reduce the number of functions and the complexity of the caller. But I guess it doesn't help that much if the caller would then need to speculatively pass in a strbuf. So it's probably not a good idea. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type
On 04/21/2015 12:21 AM, Jeff King wrote: On Tue, Apr 21, 2015 at 12:13:30AM +0530, karthik nayak wrote: +static int unpack_sha1_header_to_strbuf(git_zstream *stream, unsigned char *map, + unsigned long mapsize, void *buffer, + unsigned long bufsiz, struct strbuf *header) +{ + unsigned char *cp; + int status; + int i = 0; + + status = unpack_sha1_header(stream, map, mapsize, buffer, bufsiz); I wonder if we would feel comfortable just running this NUL-check as part of unpack_sha1_header (i.e., in all code paths). It _shouldn't_ trigger in normal use, but I wonder if there would be any downsides (e.g., maliciously crafted objects getting us to allocate memory or something; I think it is fairly easy to convince git to allocate memory, though). But why would we want it to be a part of unpack_sha1_header? + for (cp = buffer; cp < stream->next_out; cp++) + if (!*cp) { + /* Found the NUL at the end of the header */ + return 0; + } I think we can spell this as: if (memchr(buffer, '\0', stream->next_out - buffer)) return 0; which is shorter and possibly more efficient. Noted. Thanks :) In theory we could also just start trying to parse the type/size header, and notice there when we don't find the NUL. That's probably not worth doing, though. The parsing is separated from the unpacking here, so it would require combining those two operations in a single function. And the extra NUL search here is likely not very expensive. Yes, even I though about doing that, but wasn't keen on combining those two functions, they're meant to do two different things. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type
On Tue, Apr 21, 2015 at 12:13:30AM +0530, karthik nayak wrote: > +static int unpack_sha1_header_to_strbuf(git_zstream *stream, unsigned char > *map, > + unsigned long mapsize, void *buffer, > + unsigned long bufsiz, struct strbuf > *header) > +{ > + unsigned char *cp; > + int status; > + int i = 0; > + > + status = unpack_sha1_header(stream, map, mapsize, buffer, bufsiz); I wonder if we would feel comfortable just running this NUL-check as part of unpack_sha1_header (i.e., in all code paths). It _shouldn't_ trigger in normal use, but I wonder if there would be any downsides (e.g., maliciously crafted objects getting us to allocate memory or something; I think it is fairly easy to convince git to allocate memory, though). > + for (cp = buffer; cp < stream->next_out; cp++) > + if (!*cp) { > + /* Found the NUL at the end of the header */ > + return 0; > + } I think we can spell this as: if (memchr(buffer, '\0', stream->next_out - buffer)) return 0; which is shorter and possibly more efficient. In theory we could also just start trying to parse the type/size header, and notice there when we don't find the NUL. That's probably not worth doing, though. The parsing is separated from the unpacking here, so it would require combining those two operations in a single function. And the extra NUL search here is likely not very expensive. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type
On 04/18/2015 02:40 AM, Junio C Hamano wrote: Jeff King writes: On Fri, Apr 17, 2015 at 09:21:31AM -0700, Junio C Hamano wrote: Jeff King writes: If there _is_ a performance implication to worry about here, I think it would be that we are doing an extra malloc/free. Thanks for reminding me; yes, that also worried me. As an aside, I worried about the extra allocation for reading the header in the first place. But it looks like we only do this on the --literally code path (and otherwise use the normal unpack_sha1_header). Still, I wonder if we could make this work automagically. That is, speculatively unpack the first N bytes, assuming we hit the end-of-header. If not, then go to a strbuf as the slow path. Then it would be fine to cover all cases; the normal ones would be fast, and only ridiculous things would incur the extra allocation. Yes, that was what I was hoping to see eventually ;-) Sorry for the delay, So after reading what Jeff said I tried to implement it, this might not be a final version of the change, more of a RFC version. What do you'll think? @@ -1564,6 +1564,36 @@ int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long ma return git_inflate(stream, 0); } +static int unpack_sha1_header_to_strbuf(git_zstream *stream, unsigned char *map, + unsigned long mapsize, void *buffer, + unsigned long bufsiz, struct strbuf *header) +{ + unsigned char *cp; + int status; + int i = 0; + + status = unpack_sha1_header(stream, map, mapsize, buffer, bufsiz); + + for (cp = buffer; cp < stream->next_out; cp++) + if (!*cp) { + /* Found the NUL at the end of the header */ + return 0; + } + + strbuf_add(header, buffer, stream->next_out - (unsigned char *)buffer); + do { + status = git_inflate(stream, 0); + strbuf_add(header, buffer, stream->next_out - (unsigned char *)buffer); + for (cp = buffer; cp < stream->next_out; cp++) + if (!*cp) + /* Found the NUL at the end of the header */ + return 0; + stream->next_out = buffer; + stream->avail_out = bufsiz; + } while (status != Z_STREAM_END); + return -1; +} + @@ -2555,17 +2610,24 @@ static int sha1_loose_object_info(const unsigned char *sha1, return -1; if (oi->disk_sizep) *oi->disk_sizep = mapsize; - if (unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) + if ((flags & LOOKUP_LITERALLY)) { + if (unpack_sha1_header_to_strbuf(&stream, map, mapsize, hdr, sizeof(hdr), &hdrbuf) < 0) + status = error("unable to unpack %s header with --literally", + sha1_to_hex(sha1)); + } else if (unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) status = error("unable to unpack %s header", sha1_to_hex(sha1)); - else if ((status = parse_sha1_header(hdr, &size)) < 0) + if (hdrbuf.len) { + if ((status = parse_sha1_header_extended(hdrbuf.buf, oi, flags)) < 0) + status = error("unable to parse %s header with --literally", + sha1_to_hex(sha1)); + } else if ((status = parse_sha1_header_extended(hdr, oi, flags)) < 0) status = error("unable to parse %s header", sha1_to_hex(sha1)); and -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type
On 04/18/2015 05:01 AM, Eric Sunshine wrote: On Wed, Apr 15, 2015 at 12:59 PM, Karthik Nayak wrote: Update sha1_loose_object_info() to optionally allow it to read from a loose object file of unknown/bogus type; as the function usually returns the type of the object it read in the form of enum for known types, add an optional "typename" field to receive the name of the type in textual form and a flag to indicate the reading of a loose object file of unknown/bogus type. [...] --- diff --git a/sha1_file.c b/sha1_file.c index 980ce6b..267399d 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -2522,13 +2575,15 @@ struct packed_git *find_sha1_pack(const unsigned char *sha1, } static int sha1_loose_object_info(const unsigned char *sha1, - struct object_info *oi) + struct object_info *oi, + int flags) { - int status; - unsigned long mapsize, size; + int status = 0; + unsigned long mapsize; void *map; git_zstream stream; char hdr[32]; + struct strbuf hdrbuf = STRBUF_INIT; if (oi->delta_base_sha1) hashclr(oi->delta_base_sha1); @@ -2555,17 +2610,26 @@ static int sha1_loose_object_info(const unsigned char *sha1, return -1; if (oi->disk_sizep) *oi->disk_sizep = mapsize; - if (unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) - status = error("unable to unpack %s header", - sha1_to_hex(sha1)); - else if ((status = parse_sha1_header(hdr, &size)) < 0) - status = error("unable to parse %s header", sha1_to_hex(sha1)); - else if (oi->sizep) - *oi->sizep = size; + if ((flags & LOOKUP_LITERALLY)) { + if (unpack_sha1_header_to_strbuf(&stream, map, mapsize, &hdrbuf) < 0) + status = error("unable to unpack %s header with --literally", + sha1_to_hex(sha1)); + else if ((status = parse_sha1_header_extended(hdrbuf.buf, oi, flags)) < 0) + status = error("unable to parse %s header with --literally", + sha1_to_hex(sha1)); + } else { + if (unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) + status = error("unable to unpack %s header", + sha1_to_hex(sha1)); + else if ((status = parse_sha1_header_extended(hdr, oi, flags)) < 0) + status = error("unable to parse %s header", sha1_to_hex(sha1)); + } git_inflate_end(&stream); munmap(map, mapsize); - if (oi->typep) + if (status && oi->typep) *oi->typep = status; + if (hdrbuf.buf) + strbuf_release(&hdrbuf); Why is strbuf_release() protected by a conditional rather than being called unconditionally? Am I missing something obvious? No, you're right. return 0; } -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type
On 04/18/2015 12:53 AM, Junio C Hamano wrote: karthik nayak writes: + type = type_from_string_gently(buf, len, 1); + if (oi->typename) { + strbuf_add(oi->typename, buf, len); + strbuf_addch(oi->typename, '\0'); add() has setlen() at the end so you do not have to NUL terminate it yourself. Doing addch() is actively wrong, as oi->typename->len now counts the terminating NUL as part of the string, no? Yes. was speculative of that. thanks for clearing it up. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type
On 04/18/2015 12:19 AM, Jeff King wrote: On Sat, Apr 18, 2015 at 12:15:28AM +0530, karthik nayak wrote: But now we use type_from_string_gently, which can accept a length[1]. So we could just count the bytes to the first space and pass the original buffer along with that length, no? Yes, we could, that would eliminate "struct strbuf typename = STRBUF_INIT". Something like this perhaps : Yeah, this is exactly what I had in mind. { - char type[10]; - int i; + const char *buf = hdr; unsigned long size; + int type, len = 0; Maybe switch the names of "buf" and "len" to "type_buf" and "type_len" to make their purpose more clear? -Peff Yes, will change the names. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type
On Wed, Apr 15, 2015 at 12:59 PM, Karthik Nayak wrote: > Update sha1_loose_object_info() to optionally allow it to read > from a loose object file of unknown/bogus type; as the function > usually returns the type of the object it read in the form of enum > for known types, add an optional "typename" field to receive the > name of the type in textual form and a flag to indicate the reading > of a loose object file of unknown/bogus type. > [...] > --- > diff --git a/sha1_file.c b/sha1_file.c > index 980ce6b..267399d 100644 > --- a/sha1_file.c > +++ b/sha1_file.c > @@ -2522,13 +2575,15 @@ struct packed_git *find_sha1_pack(const unsigned char > *sha1, > } > > static int sha1_loose_object_info(const unsigned char *sha1, > - struct object_info *oi) > + struct object_info *oi, > + int flags) > { > - int status; > - unsigned long mapsize, size; > + int status = 0; > + unsigned long mapsize; > void *map; > git_zstream stream; > char hdr[32]; > + struct strbuf hdrbuf = STRBUF_INIT; > > if (oi->delta_base_sha1) > hashclr(oi->delta_base_sha1); > @@ -2555,17 +2610,26 @@ static int sha1_loose_object_info(const unsigned char > *sha1, > return -1; > if (oi->disk_sizep) > *oi->disk_sizep = mapsize; > - if (unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) > - status = error("unable to unpack %s header", > - sha1_to_hex(sha1)); > - else if ((status = parse_sha1_header(hdr, &size)) < 0) > - status = error("unable to parse %s header", > sha1_to_hex(sha1)); > - else if (oi->sizep) > - *oi->sizep = size; > + if ((flags & LOOKUP_LITERALLY)) { > + if (unpack_sha1_header_to_strbuf(&stream, map, mapsize, > &hdrbuf) < 0) > + status = error("unable to unpack %s header with > --literally", > + sha1_to_hex(sha1)); > + else if ((status = parse_sha1_header_extended(hdrbuf.buf, oi, > flags)) < 0) > + status = error("unable to parse %s header with > --literally", > + sha1_to_hex(sha1)); > + } else { > + if (unpack_sha1_header(&stream, map, mapsize, hdr, > sizeof(hdr)) < 0) > + status = error("unable to unpack %s header", > + sha1_to_hex(sha1)); > + else if ((status = parse_sha1_header_extended(hdr, oi, > flags)) < 0) > + status = error("unable to parse %s header", > sha1_to_hex(sha1)); > + } > git_inflate_end(&stream); > munmap(map, mapsize); > - if (oi->typep) > + if (status && oi->typep) > *oi->typep = status; > + if (hdrbuf.buf) > + strbuf_release(&hdrbuf); Why is strbuf_release() protected by a conditional rather than being called unconditionally? Am I missing something obvious? > return 0; > } > -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type
Jeff King writes: > On Fri, Apr 17, 2015 at 09:21:31AM -0700, Junio C Hamano wrote: > >> Jeff King writes: >> >> > If there _is_ a performance implication to worry about here, I think it >> > would be that we are doing an extra malloc/free. >> >> Thanks for reminding me; yes, that also worried me. > > As an aside, I worried about the extra allocation for reading the header > in the first place. But it looks like we only do this on the --literally > code path (and otherwise use the normal unpack_sha1_header). Still, I > wonder if we could make this work automagically. That is, speculatively > unpack the first N bytes, assuming we hit the end-of-header. If not, > then go to a strbuf as the slow path. Then it would be fine to cover all > cases; the normal ones would be fast, and only ridiculous things would > incur the extra allocation. Yes, that was what I was hoping to see eventually ;-) -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type
On Fri, Apr 17, 2015 at 09:21:31AM -0700, Junio C Hamano wrote: > Jeff King writes: > > > If there _is_ a performance implication to worry about here, I think it > > would be that we are doing an extra malloc/free. > > Thanks for reminding me; yes, that also worried me. As an aside, I worried about the extra allocation for reading the header in the first place. But it looks like we only do this on the --literally code path (and otherwise use the normal unpack_sha1_header). Still, I wonder if we could make this work automagically. That is, speculatively unpack the first N bytes, assuming we hit the end-of-header. If not, then go to a strbuf as the slow path. Then it would be fine to cover all cases; the normal ones would be fast, and only ridiculous things would incur the extra allocation. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type
Jeff King writes: > If there _is_ a performance implication to worry about here, I think it > would be that we are doing an extra malloc/free. Thanks for reminding me; yes, that also worried me. > I'm not sure I > understand why we are copying it at all. The original code copied from > the hdr into type[10] so that we could NUL-terminate it, which was > required for type_from_string(). Sounds like a good plan. > > But now we use type_from_string_gently, which can accept a length[1]. So > we could just count the bytes to the first space and pass the original > buffer along with that length, no? > > -Peff > > [1] Not related to your patch, but it looks like type_from_string_gently > is overly lax. It does a strncmp() with the length of the candidate > name, but does not check that we consumed all of the matching name. > So "tr" would match "tree", "comm" would match "commit", and so > forth. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type
karthik nayak writes: > + type = type_from_string_gently(buf, len, 1); > + if (oi->typename) { > + strbuf_add(oi->typename, buf, len); > + strbuf_addch(oi->typename, '\0'); add() has setlen() at the end so you do not have to NUL terminate it yourself. Doing addch() is actively wrong, as oi->typename->len now counts the terminating NUL as part of the string, no? -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type
On Sat, Apr 18, 2015 at 12:15:28AM +0530, karthik nayak wrote: > >But now we use type_from_string_gently, which can accept a length[1]. So > >we could just count the bytes to the first space and pass the original > >buffer along with that length, no? > > Yes, we could, that would eliminate "struct strbuf typename = > STRBUF_INIT". > > Something like this perhaps : Yeah, this is exactly what I had in mind. > { > - char type[10]; > - int i; > + const char *buf = hdr; > unsigned long size; > + int type, len = 0; Maybe switch the names of "buf" and "len" to "type_buf" and "type_len" to make their purpose more clear? -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type
On 04/17/2015 07:53 PM, Jeff King wrote: On Wed, Apr 15, 2015 at 06:18:24PM -0400, Jeff King wrote: >> This _might_ have some performance impact in that strbuf_addch() >> involves strbuf_grow(*, 1), which does "does it overflow to >> increment sb->len by one?"; I would say it should be unmeasurable >> because the function is expected to be used only on loose objects >> and you shouldn't have very many of them without packing in your >> repository in the first place. >> >> I guess Peff's c1822d4f (strbuf: add an optimized 1-character >> strbuf_grow, 2015-04-04) may want to teach strbuf_addch() to use his >> new strbuf_grow_ch(), and once that happens the performance worry >> would disappear without this code to be changed at all. > > I haven't re-rolled that series yet, but the discussion there showed > that strbuf_grow_ch() is unnecessary; call-sites can just check: > >if (!strbuf_avail(sb)) > strbuf_grow(sb, 1); > > to get the fast inline check. Since we go to the trouble to inline > strbuf_addch, we should probably teach it the same trick. It would be > nice to identify a place with a tight strbuf_addch() loop that could > demonstrate the speed increase, though. Just for reference, I did teach strbuf_addch this trick. And the config code is the tight loop to test it with. Results are here: http://article.gmane.org/gmane.comp.version-control.git/267266 In this code path, we are typically only seeing short strings (the original code used a 10-byte static buffer). So I doubt it makes all that much difference. If there _is_ a performance implication to worry about here, I think it would be that we are doing an extra malloc/free. I'm not sure I understand why we are copying it at all. The original code copied from the hdr into type[10] so that we could NUL-terminate it, which was required for type_from_string(). But now we use type_from_string_gently, which can accept a length[1]. So we could just count the bytes to the first space and pass the original buffer along with that length, no? Yes, we could, that would eliminate "struct strbuf typename = STRBUF_INIT". Something like this perhaps : @@ -1614,27 +1642,40 @@ static void *unpack_sha1_rest(git_zstream *stream, void *buffer, unsigned long s * too permissive for what we want to check. So do an anal * object header parse by hand. */ -int parse_sha1_header(const char *hdr, unsigned long *sizep) +static int parse_sha1_header_extended(const char *hdr, struct object_info *oi, + unsigned int flags) { - char type[10]; - int i; + const char *buf = hdr; unsigned long size; + int type, len = 0; /* -* The type can be at most ten bytes (including the -* terminating '\0' that we add), and is followed by +* The type can be of any size but is followed by * a space. */ - i = 0; for (;;) { char c = *hdr++; if (c == ' ') break; - type[i++] = c; - if (i >= sizeof(type)) - return -1; + len++; } - type[i] = 0; + + type = type_from_string_gently(buf, len, 1); + if (oi->typename) { + strbuf_add(oi->typename, buf, len); + strbuf_addch(oi->typename, '\0'); + } + /* +* Set type to 0 if its an unknown object and +* we're obtaining the type using '--literally' +* option. +*/ + if ((flags & LOOKUP_LITERALLY) && (type == -1)) + type = 0; + else if (type == -1) + die("invalid object type"); + if (oi->typep) + *oi->typep = type; /* * The length must follow immediately, and be in canonical -Peff [1] Not related to your patch, but it looks like type_from_string_gently is overly lax. It does a strncmp() with the length of the candidate name, but does not check that we consumed all of the matching name. So "tr" would match "tree", "comm" would match "commit", and so forth. Thanks for the patch re-roll on strbuf_addch() and the patch on type_from_string_gently(). -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type
On Wed, Apr 15, 2015 at 06:18:24PM -0400, Jeff King wrote: > > This _might_ have some performance impact in that strbuf_addch() > > involves strbuf_grow(*, 1), which does "does it overflow to > > increment sb->len by one?"; I would say it should be unmeasurable > > because the function is expected to be used only on loose objects > > and you shouldn't have very many of them without packing in your > > repository in the first place. > > > > I guess Peff's c1822d4f (strbuf: add an optimized 1-character > > strbuf_grow, 2015-04-04) may want to teach strbuf_addch() to use his > > new strbuf_grow_ch(), and once that happens the performance worry > > would disappear without this code to be changed at all. > > I haven't re-rolled that series yet, but the discussion there showed > that strbuf_grow_ch() is unnecessary; call-sites can just check: > > if (!strbuf_avail(sb)) > strbuf_grow(sb, 1); > > to get the fast inline check. Since we go to the trouble to inline > strbuf_addch, we should probably teach it the same trick. It would be > nice to identify a place with a tight strbuf_addch() loop that could > demonstrate the speed increase, though. Just for reference, I did teach strbuf_addch this trick. And the config code is the tight loop to test it with. Results are here: http://article.gmane.org/gmane.comp.version-control.git/267266 In this code path, we are typically only seeing short strings (the original code used a 10-byte static buffer). So I doubt it makes all that much difference. If there _is_ a performance implication to worry about here, I think it would be that we are doing an extra malloc/free. I'm not sure I understand why we are copying it at all. The original code copied from the hdr into type[10] so that we could NUL-terminate it, which was required for type_from_string(). But now we use type_from_string_gently, which can accept a length[1]. So we could just count the bytes to the first space and pass the original buffer along with that length, no? -Peff [1] Not related to your patch, but it looks like type_from_string_gently is overly lax. It does a strncmp() with the length of the candidate name, but does not check that we consumed all of the matching name. So "tr" would match "tree", "comm" would match "commit", and so forth. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type
On Wed, Apr 15, 2015 at 01:21:15PM -0700, Junio C Hamano wrote: > > - type[i++] = c; > > - if (i >= sizeof(type)) > > - return -1; > > + strbuf_addch(&typename, c); > > } > > - type[i] = 0; > > This _might_ have some performance impact in that strbuf_addch() > involves strbuf_grow(*, 1), which does "does it overflow to > increment sb->len by one?"; I would say it should be unmeasurable > because the function is expected to be used only on loose objects > and you shouldn't have very many of them without packing in your > repository in the first place. > > I guess Peff's c1822d4f (strbuf: add an optimized 1-character > strbuf_grow, 2015-04-04) may want to teach strbuf_addch() to use his > new strbuf_grow_ch(), and once that happens the performance worry > would disappear without this code to be changed at all. I haven't re-rolled that series yet, but the discussion there showed that strbuf_grow_ch() is unnecessary; call-sites can just check: if (!strbuf_avail(sb)) strbuf_grow(sb, 1); to get the fast inline check. Since we go to the trouble to inline strbuf_addch, we should probably teach it the same trick. It would be nice to identify a place with a tight strbuf_addch() loop that could demonstrate the speed increase, though. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type
Karthik Nayak writes: > Update sha1_loose_object_info() to optionally allow it to read > from a loose object file of unknown/bogus type; as the function > usually returns the type of the object it read in the form of enum > for known types, add an optional "typename" field to receive the > name of the type in textual form and a flag to indicate the reading > of a loose object file of unknown/bogus type. > > Add parse_sha1_header_extended() which acts as a wrapper around > parse_sha1_header() allowing more information to be obtained. > > Add unpack_sha1_header_to_strbuf() to unpack sha1 headers of > unknown/corrupt objects which have a unknown sha1 header size to > a strbuf structure. This was written by Junio C Hamano but tested > by me. > > Helped-by: Junio C Hamano > Helped-by: Eric Sunshine > Signed-off-by: Karthik Nayak > --- I see that you made the type parsing to happen earlier than the previous round (which used to do the size first and then type). Not a problem, though. Just something I noticed... > @@ -1614,27 +1642,40 @@ static void *unpack_sha1_rest(git_zstream *stream, > void *buffer, unsigned long s > * too permissive for what we want to check. So do an anal > * object header parse by hand. > */ > -int parse_sha1_header(const char *hdr, unsigned long *sizep) > +int parse_sha1_header_extended(const char *hdr, struct object_info *oi, > +unsigned int flags) > { > - char type[10]; > - int i; > + struct strbuf typename = STRBUF_INIT; > unsigned long size; > + int type; > > /* >* The type can be at most ten bytes (including the >* terminating '\0' that we add), and is followed by >* a space. >*/ > - i = 0; > for (;;) { > char c = *hdr++; > if (c == ' ') > break; > - type[i++] = c; > - if (i >= sizeof(type)) > - return -1; > + strbuf_addch(&typename, c); > } > - type[i] = 0; This _might_ have some performance impact in that strbuf_addch() involves strbuf_grow(*, 1), which does "does it overflow to increment sb->len by one?"; I would say it should be unmeasurable because the function is expected to be used only on loose objects and you shouldn't have very many of them without packing in your repository in the first place. I guess Peff's c1822d4f (strbuf: add an optimized 1-character strbuf_grow, 2015-04-04) may want to teach strbuf_addch() to use his new strbuf_grow_ch(), and once that happens the performance worry would disappear without this code to be changed at all. > @@ -2541,7 +2596,7 @@ static int sha1_loose_object_info(const unsigned char > *sha1, >* return value implicitly indicates whether the >* object even exists. >*/ > - if (!oi->typep && !oi->sizep) { > + if (!oi->typep && !oi->typename && !oi->sizep) { You didn't have this check in the earlier round, and this new one is correct, I think. Good eyes to spot this potential problem. Thanks. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type
Update sha1_loose_object_info() to optionally allow it to read from a loose object file of unknown/bogus type; as the function usually returns the type of the object it read in the form of enum for known types, add an optional "typename" field to receive the name of the type in textual form and a flag to indicate the reading of a loose object file of unknown/bogus type. Add parse_sha1_header_extended() which acts as a wrapper around parse_sha1_header() allowing more information to be obtained. Add unpack_sha1_header_to_strbuf() to unpack sha1 headers of unknown/corrupt objects which have a unknown sha1 header size to a strbuf structure. This was written by Junio C Hamano but tested by me. Helped-by: Junio C Hamano Helped-by: Eric Sunshine Signed-off-by: Karthik Nayak --- cache.h | 2 + sha1_file.c | 126 +--- 2 files changed, 105 insertions(+), 23 deletions(-) diff --git a/cache.h b/cache.h index c47323e..ea6faf0 100644 --- a/cache.h +++ b/cache.h @@ -881,6 +881,7 @@ extern int is_ntfs_dotgit(const char *name); /* object replacement */ #define LOOKUP_REPLACE_OBJECT 1 +#define LOOKUP_LITERALLY 2 extern void *read_sha1_file_extended(const unsigned char *sha1, enum object_type *type, unsigned long *size, unsigned flag); static inline void *read_sha1_file(const unsigned char *sha1, enum object_type *type, unsigned long *size) { @@ -1349,6 +1350,7 @@ struct object_info { unsigned long *sizep; unsigned long *disk_sizep; unsigned char *delta_base_sha1; + struct strbuf *typename; /* Response */ enum { diff --git a/sha1_file.c b/sha1_file.c index 980ce6b..267399d 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -1564,6 +1564,34 @@ int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long ma return git_inflate(stream, 0); } +static int unpack_sha1_header_to_strbuf(git_zstream *stream, unsigned char *map, + unsigned long mapsize, + struct strbuf *header) +{ + unsigned char buffer[32], *cp; + unsigned long bufsiz = sizeof(buffer); + int status; + + status = unpack_sha1_header(stream, map, mapsize, buffer, bufsiz); + + if (status) { + strbuf_add(header, buffer, stream->next_out - buffer); + return status; + } + + do { + status = git_inflate(stream, 0); + strbuf_add(header, buffer, stream->next_out - buffer); + for (cp = buffer; cp < stream->next_out; cp++) + if (!*cp) + /* Found the NUL at the end of the header */ + return 0; + stream->next_out = buffer; + stream->avail_out = bufsiz; + } while (status != Z_STREAM_END); + return -1; +} + static void *unpack_sha1_rest(git_zstream *stream, void *buffer, unsigned long size, const unsigned char *sha1) { int bytes = strlen(buffer) + 1; @@ -1614,27 +1642,40 @@ static void *unpack_sha1_rest(git_zstream *stream, void *buffer, unsigned long s * too permissive for what we want to check. So do an anal * object header parse by hand. */ -int parse_sha1_header(const char *hdr, unsigned long *sizep) +int parse_sha1_header_extended(const char *hdr, struct object_info *oi, + unsigned int flags) { - char type[10]; - int i; + struct strbuf typename = STRBUF_INIT; unsigned long size; + int type; /* * The type can be at most ten bytes (including the * terminating '\0' that we add), and is followed by * a space. */ - i = 0; for (;;) { char c = *hdr++; if (c == ' ') break; - type[i++] = c; - if (i >= sizeof(type)) - return -1; + strbuf_addch(&typename, c); } - type[i] = 0; + + type = type_from_string_gently(typename.buf, typename.len, 1); + if (oi->typename) + strbuf_addbuf(oi->typename, &typename); + strbuf_release(&typename); + /* +* Set type to 0 if its an unknown object and +* we're obtaining the type using '--literally' +* option. +*/ + if ((flags & LOOKUP_LITERALLY) && (type == -1)) + type = 0; + else if (type == -1) + die("invalid object type"); + if (oi->typep) + *oi->typep = type; /* * The length must follow immediately, and be in canonical @@ -1652,12 +1693,24 @@ int parse_sha1_header(const char *hdr, unsigned long *sizep) size = size * 10 + c; } } - *sizep = size; + + if (oi->sizep) + *oi->sizep = size; /* * The leng