Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

2015-04-21 Thread Jeff King
On Tue, Apr 21, 2015 at 04:56:08PM +0530, karthik nayak wrote:

> >>+   status = unpack_sha1_header(stream, map, mapsize, buffer, bufsiz);
> >
> >I wonder if we would feel comfortable just running this NUL-check as
> >part of unpack_sha1_header (i.e., in all code paths). It _shouldn't_
> >trigger in normal use, but I wonder if there would be any downsides
> >(e.g., maliciously crafted objects getting us to allocate memory or
> >something; I think it is fairly easy to convince git to allocate memory,
> >though).
> >
> But why would we want it to be a part of unpack_sha1_header?

Just to reduce the number of functions and the complexity of the caller.
But I guess it doesn't help that much if the caller would then need to
speculatively pass in a strbuf. So it's probably not a good idea.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

2015-04-21 Thread karthik nayak



On 04/21/2015 12:21 AM, Jeff King wrote:

On Tue, Apr 21, 2015 at 12:13:30AM +0530, karthik nayak wrote:


+static int unpack_sha1_header_to_strbuf(git_zstream *stream, unsigned char
*map,
+   unsigned long mapsize, void *buffer,
+   unsigned long bufsiz, struct strbuf
*header)
+{
+   unsigned char *cp;
+   int status;
+   int i = 0;
+
+   status = unpack_sha1_header(stream, map, mapsize, buffer, bufsiz);


I wonder if we would feel comfortable just running this NUL-check as
part of unpack_sha1_header (i.e., in all code paths). It _shouldn't_
trigger in normal use, but I wonder if there would be any downsides
(e.g., maliciously crafted objects getting us to allocate memory or
something; I think it is fairly easy to convince git to allocate memory,
though).


But why would we want it to be a part of unpack_sha1_header?


+   for (cp = buffer; cp < stream->next_out; cp++)
+   if (!*cp) {
+   /* Found the NUL at the end of the header */
+   return 0;
+   }


I think we can spell this as:

   if (memchr(buffer, '\0', stream->next_out - buffer))
return 0;

which is shorter and possibly more efficient.

Noted. Thanks :)


In theory we could also just start trying to parse the type/size header,
and notice there when we don't find the NUL. That's probably not worth
doing, though. The parsing is separated from the unpacking here, so it
would require combining those two operations in a single function. And
the extra NUL search here is likely not very expensive.

Yes, even I though about doing that, but wasn't keen on combining those 
two functions, they're meant to do two different things.

-Peff


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

2015-04-20 Thread Jeff King
On Tue, Apr 21, 2015 at 12:13:30AM +0530, karthik nayak wrote:

> +static int unpack_sha1_header_to_strbuf(git_zstream *stream, unsigned char
> *map,
> +   unsigned long mapsize, void *buffer,
> +   unsigned long bufsiz, struct strbuf
> *header)
> +{
> +   unsigned char *cp;
> +   int status;
> +   int i = 0;
> +
> +   status = unpack_sha1_header(stream, map, mapsize, buffer, bufsiz);

I wonder if we would feel comfortable just running this NUL-check as
part of unpack_sha1_header (i.e., in all code paths). It _shouldn't_
trigger in normal use, but I wonder if there would be any downsides
(e.g., maliciously crafted objects getting us to allocate memory or
something; I think it is fairly easy to convince git to allocate memory,
though).

> +   for (cp = buffer; cp < stream->next_out; cp++)
> +   if (!*cp) {
> +   /* Found the NUL at the end of the header */
> +   return 0;
> +   }

I think we can spell this as:

  if (memchr(buffer, '\0', stream->next_out - buffer))
return 0;

which is shorter and possibly more efficient.

In theory we could also just start trying to parse the type/size header,
and notice there when we don't find the NUL. That's probably not worth
doing, though. The parsing is separated from the unpacking here, so it
would require combining those two operations in a single function. And
the extra NUL search here is likely not very expensive.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

2015-04-20 Thread karthik nayak



On 04/18/2015 02:40 AM, Junio C Hamano wrote:

Jeff King  writes:


On Fri, Apr 17, 2015 at 09:21:31AM -0700, Junio C Hamano wrote:


Jeff King  writes:


If there _is_ a performance implication to worry about here, I think it
would be that we are doing an extra malloc/free.


Thanks for reminding me; yes, that also worried me.


As an aside, I worried about the extra allocation for reading the header
in the first place. But it looks like we only do this on the --literally
code path (and otherwise use the normal unpack_sha1_header).  Still, I
wonder if we could make this work automagically.  That is, speculatively
unpack the first N bytes, assuming we hit the end-of-header. If not,
then go to a strbuf as the slow path. Then it would be fine to cover all
cases; the normal ones would be fast, and only ridiculous things would
incur the extra allocation.


Yes, that was what I was hoping to see eventually ;-)



Sorry for the delay, So after reading what Jeff said I tried to 
implement it, this might not be a final version of the change, more of a 
RFC version. What do you'll think?


@@ -1564,6 +1564,36 @@ int unpack_sha1_header(git_zstream *stream, 
unsigned char *map, unsigned long ma

return git_inflate(stream, 0);
 }

+static int unpack_sha1_header_to_strbuf(git_zstream *stream, unsigned 
char *map,

+   unsigned long mapsize, void *buffer,
+   unsigned long bufsiz, struct 
strbuf *header)

+{
+   unsigned char *cp;
+   int status;
+   int i = 0;
+
+   status = unpack_sha1_header(stream, map, mapsize, buffer, bufsiz);
+
+   for (cp = buffer; cp < stream->next_out; cp++)
+   if (!*cp) {
+   /* Found the NUL at the end of the header */
+   return 0;
+   }
+
+   strbuf_add(header, buffer, stream->next_out - (unsigned char 
*)buffer);

+   do {
+   status = git_inflate(stream, 0);
+   strbuf_add(header, buffer, stream->next_out - (unsigned 
char *)buffer);

+   for (cp = buffer; cp < stream->next_out; cp++)
+   if (!*cp)
+   /* Found the NUL at the end of the header */
+   return 0;
+   stream->next_out = buffer;
+   stream->avail_out = bufsiz;
+   } while (status != Z_STREAM_END);
+   return -1;
+}
+


@@ -2555,17 +2610,24 @@ static int sha1_loose_object_info(const unsigned 
char *sha1,

return -1;
if (oi->disk_sizep)
*oi->disk_sizep = mapsize;
-   if (unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
+   if ((flags & LOOKUP_LITERALLY)) {
+   if (unpack_sha1_header_to_strbuf(&stream, map, mapsize, 
hdr, sizeof(hdr), &hdrbuf) < 0)
+   status = error("unable to unpack %s header with 
--literally",

+  sha1_to_hex(sha1));
+   } else if (unpack_sha1_header(&stream, map, mapsize, hdr, 
sizeof(hdr)) < 0)

status = error("unable to unpack %s header",
   sha1_to_hex(sha1));
-   else if ((status = parse_sha1_header(hdr, &size)) < 0)
+   if (hdrbuf.len) {
+   if ((status = parse_sha1_header_extended(hdrbuf.buf, oi, 
flags)) < 0)
+   status = error("unable to parse %s header with 
--literally",

+  sha1_to_hex(sha1));
+   } else if ((status = parse_sha1_header_extended(hdr, oi, flags)) 
< 0)
status = error("unable to parse %s header", 
sha1_to_hex(sha1));



and
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

2015-04-18 Thread karthik nayak



On 04/18/2015 05:01 AM, Eric Sunshine wrote:

On Wed, Apr 15, 2015 at 12:59 PM, Karthik Nayak  wrote:

Update sha1_loose_object_info() to optionally allow it to read
from a loose object file of unknown/bogus type; as the function
usually returns the type of the object it read in the form of enum
for known types, add an optional "typename" field to receive the
name of the type in textual form and a flag to indicate the reading
of a loose object file of unknown/bogus type.
[...]
---
diff --git a/sha1_file.c b/sha1_file.c
index 980ce6b..267399d 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2522,13 +2575,15 @@ struct packed_git *find_sha1_pack(const unsigned char 
*sha1,
  }

  static int sha1_loose_object_info(const unsigned char *sha1,
- struct object_info *oi)
+ struct object_info *oi,
+ int flags)
  {
-   int status;
-   unsigned long mapsize, size;
+   int status = 0;
+   unsigned long mapsize;
 void *map;
 git_zstream stream;
 char hdr[32];
+   struct strbuf hdrbuf = STRBUF_INIT;

 if (oi->delta_base_sha1)
 hashclr(oi->delta_base_sha1);
@@ -2555,17 +2610,26 @@ static int sha1_loose_object_info(const unsigned char 
*sha1,
 return -1;
 if (oi->disk_sizep)
 *oi->disk_sizep = mapsize;
-   if (unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
-   status = error("unable to unpack %s header",
-  sha1_to_hex(sha1));
-   else if ((status = parse_sha1_header(hdr, &size)) < 0)
-   status = error("unable to parse %s header", sha1_to_hex(sha1));
-   else if (oi->sizep)
-   *oi->sizep = size;
+   if ((flags & LOOKUP_LITERALLY)) {
+   if (unpack_sha1_header_to_strbuf(&stream, map, mapsize, &hdrbuf) 
< 0)
+   status = error("unable to unpack %s header with 
--literally",
+  sha1_to_hex(sha1));
+   else if ((status = parse_sha1_header_extended(hdrbuf.buf, oi, 
flags)) < 0)
+   status = error("unable to parse %s header with 
--literally",
+  sha1_to_hex(sha1));
+   } else {
+   if (unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr)) 
< 0)
+   status = error("unable to unpack %s header",
+  sha1_to_hex(sha1));
+   else if ((status = parse_sha1_header_extended(hdr, oi, flags)) 
< 0)
+   status = error("unable to parse %s header", 
sha1_to_hex(sha1));
+   }
 git_inflate_end(&stream);
 munmap(map, mapsize);
-   if (oi->typep)
+   if (status && oi->typep)
 *oi->typep = status;
+   if (hdrbuf.buf)
+   strbuf_release(&hdrbuf);


Why is strbuf_release() protected by a conditional rather than being
called unconditionally? Am I missing something obvious?

No, you're right.



 return 0;
  }


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

2015-04-18 Thread karthik nayak



On 04/18/2015 12:53 AM, Junio C Hamano wrote:

karthik nayak  writes:


+   type = type_from_string_gently(buf, len, 1);
+   if (oi->typename) {
+   strbuf_add(oi->typename, buf, len);
+   strbuf_addch(oi->typename, '\0');


add() has setlen() at the end so you do not have to NUL terminate it
yourself.  Doing addch() is actively wrong, as oi->typename->len now
counts the terminating NUL as part of the string, no?



Yes. was speculative of that. thanks for clearing it up.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

2015-04-18 Thread karthik nayak



On 04/18/2015 12:19 AM, Jeff King wrote:

On Sat, Apr 18, 2015 at 12:15:28AM +0530, karthik nayak wrote:


But now we use type_from_string_gently, which can accept a length[1]. So
we could just count the bytes to the first space and pass the original
buffer along with that length, no?


Yes, we could, that would eliminate  "struct strbuf typename =
STRBUF_INIT".

Something like this perhaps :


Yeah, this is exactly what I had in mind.


   {
-   char type[10];
-   int i;
+   const char *buf = hdr;
  unsigned long size;
+   int type, len = 0;


Maybe switch the names of "buf" and "len" to "type_buf" and "type_len"
to make their purpose more clear?

-Peff



Yes, will change the names.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

2015-04-17 Thread Eric Sunshine
On Wed, Apr 15, 2015 at 12:59 PM, Karthik Nayak  wrote:
> Update sha1_loose_object_info() to optionally allow it to read
> from a loose object file of unknown/bogus type; as the function
> usually returns the type of the object it read in the form of enum
> for known types, add an optional "typename" field to receive the
> name of the type in textual form and a flag to indicate the reading
> of a loose object file of unknown/bogus type.
> [...]
> ---
> diff --git a/sha1_file.c b/sha1_file.c
> index 980ce6b..267399d 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -2522,13 +2575,15 @@ struct packed_git *find_sha1_pack(const unsigned char 
> *sha1,
>  }
>
>  static int sha1_loose_object_info(const unsigned char *sha1,
> - struct object_info *oi)
> + struct object_info *oi,
> + int flags)
>  {
> -   int status;
> -   unsigned long mapsize, size;
> +   int status = 0;
> +   unsigned long mapsize;
> void *map;
> git_zstream stream;
> char hdr[32];
> +   struct strbuf hdrbuf = STRBUF_INIT;
>
> if (oi->delta_base_sha1)
> hashclr(oi->delta_base_sha1);
> @@ -2555,17 +2610,26 @@ static int sha1_loose_object_info(const unsigned char 
> *sha1,
> return -1;
> if (oi->disk_sizep)
> *oi->disk_sizep = mapsize;
> -   if (unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
> -   status = error("unable to unpack %s header",
> -  sha1_to_hex(sha1));
> -   else if ((status = parse_sha1_header(hdr, &size)) < 0)
> -   status = error("unable to parse %s header", 
> sha1_to_hex(sha1));
> -   else if (oi->sizep)
> -   *oi->sizep = size;
> +   if ((flags & LOOKUP_LITERALLY)) {
> +   if (unpack_sha1_header_to_strbuf(&stream, map, mapsize, 
> &hdrbuf) < 0)
> +   status = error("unable to unpack %s header with 
> --literally",
> +  sha1_to_hex(sha1));
> +   else if ((status = parse_sha1_header_extended(hdrbuf.buf, oi, 
> flags)) < 0)
> +   status = error("unable to parse %s header with 
> --literally",
> +  sha1_to_hex(sha1));
> +   } else {
> +   if (unpack_sha1_header(&stream, map, mapsize, hdr, 
> sizeof(hdr)) < 0)
> +   status = error("unable to unpack %s header",
> +  sha1_to_hex(sha1));
> +   else if ((status = parse_sha1_header_extended(hdr, oi, 
> flags)) < 0)
> +   status = error("unable to parse %s header", 
> sha1_to_hex(sha1));
> +   }
> git_inflate_end(&stream);
> munmap(map, mapsize);
> -   if (oi->typep)
> +   if (status && oi->typep)
> *oi->typep = status;
> +   if (hdrbuf.buf)
> +   strbuf_release(&hdrbuf);

Why is strbuf_release() protected by a conditional rather than being
called unconditionally? Am I missing something obvious?

> return 0;
>  }
>
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

2015-04-17 Thread Junio C Hamano
Jeff King  writes:

> On Fri, Apr 17, 2015 at 09:21:31AM -0700, Junio C Hamano wrote:
>
>> Jeff King  writes:
>> 
>> > If there _is_ a performance implication to worry about here, I think it
>> > would be that we are doing an extra malloc/free.
>> 
>> Thanks for reminding me; yes, that also worried me.
>
> As an aside, I worried about the extra allocation for reading the header
> in the first place. But it looks like we only do this on the --literally
> code path (and otherwise use the normal unpack_sha1_header).  Still, I
> wonder if we could make this work automagically.  That is, speculatively
> unpack the first N bytes, assuming we hit the end-of-header. If not,
> then go to a strbuf as the slow path. Then it would be fine to cover all
> cases; the normal ones would be fast, and only ridiculous things would
> incur the extra allocation.

Yes, that was what I was hoping to see eventually ;-)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

2015-04-17 Thread Jeff King
On Fri, Apr 17, 2015 at 09:21:31AM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > If there _is_ a performance implication to worry about here, I think it
> > would be that we are doing an extra malloc/free.
> 
> Thanks for reminding me; yes, that also worried me.

As an aside, I worried about the extra allocation for reading the header
in the first place. But it looks like we only do this on the --literally
code path (and otherwise use the normal unpack_sha1_header).  Still, I
wonder if we could make this work automagically.  That is, speculatively
unpack the first N bytes, assuming we hit the end-of-header. If not,
then go to a strbuf as the slow path. Then it would be fine to cover all
cases; the normal ones would be fast, and only ridiculous things would
incur the extra allocation.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

2015-04-17 Thread Junio C Hamano
Jeff King  writes:

> If there _is_ a performance implication to worry about here, I think it
> would be that we are doing an extra malloc/free.

Thanks for reminding me; yes, that also worried me.

> I'm not sure I
> understand why we are copying it at all. The original code copied from
> the hdr into type[10] so that we could NUL-terminate it, which was
> required for type_from_string().

Sounds like a good plan.

>
> But now we use type_from_string_gently, which can accept a length[1]. So
> we could just count the bytes to the first space and pass the original
> buffer along with that length, no?
>
> -Peff
>
> [1] Not related to your patch, but it looks like type_from_string_gently
> is overly lax. It does a strncmp() with the length of the candidate
> name, but does not check that we consumed all of the matching name.
> So "tr" would match "tree", "comm" would match "commit", and so
> forth.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

2015-04-17 Thread Junio C Hamano
karthik nayak  writes:

> +   type = type_from_string_gently(buf, len, 1);
> +   if (oi->typename) {
> +   strbuf_add(oi->typename, buf, len);
> +   strbuf_addch(oi->typename, '\0');

add() has setlen() at the end so you do not have to NUL terminate it
yourself.  Doing addch() is actively wrong, as oi->typename->len now
counts the terminating NUL as part of the string, no?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

2015-04-17 Thread Jeff King
On Sat, Apr 18, 2015 at 12:15:28AM +0530, karthik nayak wrote:

> >But now we use type_from_string_gently, which can accept a length[1]. So
> >we could just count the bytes to the first space and pass the original
> >buffer along with that length, no?
> 
> Yes, we could, that would eliminate  "struct strbuf typename =
> STRBUF_INIT".
> 
> Something like this perhaps :

Yeah, this is exactly what I had in mind.

>   {
> -   char type[10];
> -   int i;
> +   const char *buf = hdr;
>  unsigned long size;
> +   int type, len = 0;

Maybe switch the names of "buf" and "len" to "type_buf" and "type_len"
to make their purpose more clear?

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

2015-04-17 Thread karthik nayak



On 04/17/2015 07:53 PM, Jeff King wrote:

On Wed, Apr 15, 2015 at 06:18:24PM -0400, Jeff King wrote:

>> This _might_ have some performance impact in that strbuf_addch()
>> involves strbuf_grow(*, 1), which does "does it overflow to
>> increment sb->len by one?"; I would say it should be unmeasurable
>> because the function is expected to be used only on loose objects
>> and you shouldn't have very many of them without packing in your
>> repository in the first place.
>>
>> I guess Peff's c1822d4f (strbuf: add an optimized 1-character
>> strbuf_grow, 2015-04-04) may want to teach strbuf_addch() to use his
>> new strbuf_grow_ch(), and once that happens the performance worry
>> would disappear without this code to be changed at all.
>
> I haven't re-rolled that series yet, but the discussion there showed
> that strbuf_grow_ch() is unnecessary; call-sites can just check:
>
>if (!strbuf_avail(sb))
> strbuf_grow(sb, 1);
>
> to get the fast inline check. Since we go to the trouble to inline
> strbuf_addch, we should probably teach it the same trick. It would be
> nice to identify a place with a tight strbuf_addch() loop that could
> demonstrate the speed increase, though.

Just for reference, I did teach strbuf_addch this trick. And the config
code is the tight loop to test it with. Results are here:

   http://article.gmane.org/gmane.comp.version-control.git/267266

In this code path, we are typically only seeing short strings (the
original code used a 10-byte static buffer). So I doubt it makes all
that much difference.

If there _is_ a performance implication to worry about here, I think it
would be that we are doing an extra malloc/free. I'm not sure I
understand why we are copying it at all. The original code copied from
the hdr into type[10] so that we could NUL-terminate it, which was
required for type_from_string().

But now we use type_from_string_gently, which can accept a length[1]. So
we could just count the bytes to the first space and pass the original
buffer along with that length, no?


Yes, we could, that would eliminate  "struct strbuf typename =
STRBUF_INIT".

Something like this perhaps :

@@ -1614,27 +1642,40 @@ static void *unpack_sha1_rest(git_zstream
*stream, void *buffer, unsigned long s
   * too permissive for what we want to check. So do an anal
   * object header parse by hand.
   */
-int parse_sha1_header(const char *hdr, unsigned long *sizep)
+static int parse_sha1_header_extended(const char *hdr, struct
object_info *oi,
+  unsigned int flags)
  {
-   char type[10];
-   int i;
+   const char *buf = hdr;
 unsigned long size;
+   int type, len = 0;

 /*
-* The type can be at most ten bytes (including the
-* terminating '\0' that we add), and is followed by
+* The type can be of any size but is followed by
  * a space.
  */
-   i = 0;
 for (;;) {
 char c = *hdr++;
 if (c == ' ')
 break;
-   type[i++] = c;
-   if (i >= sizeof(type))
-   return -1;
+   len++;
 }
-   type[i] = 0;
+
+   type = type_from_string_gently(buf, len, 1);
+   if (oi->typename) {
+   strbuf_add(oi->typename, buf, len);
+   strbuf_addch(oi->typename, '\0');
+   }
+   /*
+* Set type to 0 if its an unknown object and
+* we're obtaining the type using '--literally'
+* option.
+*/
+   if ((flags & LOOKUP_LITERALLY) && (type == -1))
+   type = 0;
+   else if (type == -1)
+   die("invalid object type");
+   if (oi->typep)
+   *oi->typep = type;

 /*
  * The length must follow immediately, and be in canonical



-Peff

[1] Not related to your patch, but it looks like type_from_string_gently
 is overly lax. It does a strncmp() with the length of the candidate
 name, but does not check that we consumed all of the matching name.
 So "tr" would match "tree", "comm" would match "commit", and so
 forth.


Thanks for the patch re-roll on strbuf_addch() and the patch on
type_from_string_gently().
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

2015-04-17 Thread Jeff King
On Wed, Apr 15, 2015 at 06:18:24PM -0400, Jeff King wrote:

> > This _might_ have some performance impact in that strbuf_addch()
> > involves strbuf_grow(*, 1), which does "does it overflow to
> > increment sb->len by one?"; I would say it should be unmeasurable
> > because the function is expected to be used only on loose objects
> > and you shouldn't have very many of them without packing in your
> > repository in the first place.
> > 
> > I guess Peff's c1822d4f (strbuf: add an optimized 1-character
> > strbuf_grow, 2015-04-04) may want to teach strbuf_addch() to use his
> > new strbuf_grow_ch(), and once that happens the performance worry
> > would disappear without this code to be changed at all.
> 
> I haven't re-rolled that series yet, but the discussion there showed
> that strbuf_grow_ch() is unnecessary; call-sites can just check:
> 
>   if (!strbuf_avail(sb))
>   strbuf_grow(sb, 1);
> 
> to get the fast inline check. Since we go to the trouble to inline
> strbuf_addch, we should probably teach it the same trick. It would be
> nice to identify a place with a tight strbuf_addch() loop that could
> demonstrate the speed increase, though.

Just for reference, I did teach strbuf_addch this trick. And the config
code is the tight loop to test it with. Results are here:

  http://article.gmane.org/gmane.comp.version-control.git/267266

In this code path, we are typically only seeing short strings (the
original code used a 10-byte static buffer). So I doubt it makes all
that much difference.

If there _is_ a performance implication to worry about here, I think it
would be that we are doing an extra malloc/free. I'm not sure I
understand why we are copying it at all. The original code copied from
the hdr into type[10] so that we could NUL-terminate it, which was
required for type_from_string().

But now we use type_from_string_gently, which can accept a length[1]. So
we could just count the bytes to the first space and pass the original
buffer along with that length, no?

-Peff

[1] Not related to your patch, but it looks like type_from_string_gently
is overly lax. It does a strncmp() with the length of the candidate
name, but does not check that we consumed all of the matching name.
So "tr" would match "tree", "comm" would match "commit", and so
forth.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

2015-04-15 Thread Jeff King
On Wed, Apr 15, 2015 at 01:21:15PM -0700, Junio C Hamano wrote:

> > -   type[i++] = c;
> > -   if (i >= sizeof(type))
> > -   return -1;
> > +   strbuf_addch(&typename, c);
> > }
> > -   type[i] = 0;
> 
> This _might_ have some performance impact in that strbuf_addch()
> involves strbuf_grow(*, 1), which does "does it overflow to
> increment sb->len by one?"; I would say it should be unmeasurable
> because the function is expected to be used only on loose objects
> and you shouldn't have very many of them without packing in your
> repository in the first place.
> 
> I guess Peff's c1822d4f (strbuf: add an optimized 1-character
> strbuf_grow, 2015-04-04) may want to teach strbuf_addch() to use his
> new strbuf_grow_ch(), and once that happens the performance worry
> would disappear without this code to be changed at all.

I haven't re-rolled that series yet, but the discussion there showed
that strbuf_grow_ch() is unnecessary; call-sites can just check:

  if (!strbuf_avail(sb))
strbuf_grow(sb, 1);

to get the fast inline check. Since we go to the trouble to inline
strbuf_addch, we should probably teach it the same trick. It would be
nice to identify a place with a tight strbuf_addch() loop that could
demonstrate the speed increase, though.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

2015-04-15 Thread Junio C Hamano
Karthik Nayak  writes:

> Update sha1_loose_object_info() to optionally allow it to read
> from a loose object file of unknown/bogus type; as the function
> usually returns the type of the object it read in the form of enum
> for known types, add an optional "typename" field to receive the
> name of the type in textual form and a flag to indicate the reading
> of a loose object file of unknown/bogus type.
>
> Add parse_sha1_header_extended() which acts as a wrapper around
> parse_sha1_header() allowing more information to be obtained.
>
> Add unpack_sha1_header_to_strbuf() to unpack sha1 headers of
> unknown/corrupt objects which have a unknown sha1 header size to
> a strbuf structure. This was written by Junio C Hamano but tested
> by me.
>
> Helped-by: Junio C Hamano 
> Helped-by: Eric Sunshine 
> Signed-off-by: Karthik Nayak 
> ---

I see that you made the type parsing to happen earlier than the
previous round (which used to do the size first and then type).

Not a problem, though.  Just something I noticed...

> @@ -1614,27 +1642,40 @@ static void *unpack_sha1_rest(git_zstream *stream, 
> void *buffer, unsigned long s
>   * too permissive for what we want to check. So do an anal
>   * object header parse by hand.
>   */
> -int parse_sha1_header(const char *hdr, unsigned long *sizep)
> +int parse_sha1_header_extended(const char *hdr, struct object_info *oi,
> +unsigned int flags)
>  {
> - char type[10];
> - int i;
> + struct strbuf typename = STRBUF_INIT;
>   unsigned long size;
> + int type;
>  
>   /*
>* The type can be at most ten bytes (including the
>* terminating '\0' that we add), and is followed by
>* a space.
>*/
> - i = 0;
>   for (;;) {
>   char c = *hdr++;
>   if (c == ' ')
>   break;
> - type[i++] = c;
> - if (i >= sizeof(type))
> - return -1;
> + strbuf_addch(&typename, c);
>   }
> - type[i] = 0;

This _might_ have some performance impact in that strbuf_addch()
involves strbuf_grow(*, 1), which does "does it overflow to
increment sb->len by one?"; I would say it should be unmeasurable
because the function is expected to be used only on loose objects
and you shouldn't have very many of them without packing in your
repository in the first place.

I guess Peff's c1822d4f (strbuf: add an optimized 1-character
strbuf_grow, 2015-04-04) may want to teach strbuf_addch() to use his
new strbuf_grow_ch(), and once that happens the performance worry
would disappear without this code to be changed at all.

> @@ -2541,7 +2596,7 @@ static int sha1_loose_object_info(const unsigned char 
> *sha1,
>* return value implicitly indicates whether the
>* object even exists.
>*/
> - if (!oi->typep && !oi->sizep) {
> + if (!oi->typep && !oi->typename && !oi->sizep) {

You didn't have this check in the earlier round, and this new one is
correct, I think.  Good eyes to spot this potential problem.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v8 1/4] sha1_file.c: support reading from a loose object of unknown type

2015-04-15 Thread Karthik Nayak
Update sha1_loose_object_info() to optionally allow it to read
from a loose object file of unknown/bogus type; as the function
usually returns the type of the object it read in the form of enum
for known types, add an optional "typename" field to receive the
name of the type in textual form and a flag to indicate the reading
of a loose object file of unknown/bogus type.

Add parse_sha1_header_extended() which acts as a wrapper around
parse_sha1_header() allowing more information to be obtained.

Add unpack_sha1_header_to_strbuf() to unpack sha1 headers of
unknown/corrupt objects which have a unknown sha1 header size to
a strbuf structure. This was written by Junio C Hamano but tested
by me.

Helped-by: Junio C Hamano 
Helped-by: Eric Sunshine 
Signed-off-by: Karthik Nayak 
---
 cache.h |   2 +
 sha1_file.c | 126 +---
 2 files changed, 105 insertions(+), 23 deletions(-)

diff --git a/cache.h b/cache.h
index c47323e..ea6faf0 100644
--- a/cache.h
+++ b/cache.h
@@ -881,6 +881,7 @@ extern int is_ntfs_dotgit(const char *name);
 
 /* object replacement */
 #define LOOKUP_REPLACE_OBJECT 1
+#define LOOKUP_LITERALLY 2
 extern void *read_sha1_file_extended(const unsigned char *sha1, enum 
object_type *type, unsigned long *size, unsigned flag);
 static inline void *read_sha1_file(const unsigned char *sha1, enum object_type 
*type, unsigned long *size)
 {
@@ -1349,6 +1350,7 @@ struct object_info {
unsigned long *sizep;
unsigned long *disk_sizep;
unsigned char *delta_base_sha1;
+   struct strbuf *typename;
 
/* Response */
enum {
diff --git a/sha1_file.c b/sha1_file.c
index 980ce6b..267399d 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1564,6 +1564,34 @@ int unpack_sha1_header(git_zstream *stream, unsigned 
char *map, unsigned long ma
return git_inflate(stream, 0);
 }
 
+static int unpack_sha1_header_to_strbuf(git_zstream *stream, unsigned char 
*map,
+   unsigned long mapsize,
+   struct strbuf *header)
+{
+   unsigned char buffer[32], *cp;
+   unsigned long bufsiz = sizeof(buffer);
+   int status;
+
+   status = unpack_sha1_header(stream, map, mapsize, buffer, bufsiz);
+
+   if (status) {
+   strbuf_add(header, buffer, stream->next_out - buffer);
+   return status;
+   }
+
+   do {
+   status = git_inflate(stream, 0);
+   strbuf_add(header, buffer, stream->next_out - buffer);
+   for (cp = buffer; cp < stream->next_out; cp++)
+   if (!*cp)
+   /* Found the NUL at the end of the header */
+   return 0;
+   stream->next_out = buffer;
+   stream->avail_out = bufsiz;
+   } while (status != Z_STREAM_END);
+   return -1;
+}
+
 static void *unpack_sha1_rest(git_zstream *stream, void *buffer, unsigned long 
size, const unsigned char *sha1)
 {
int bytes = strlen(buffer) + 1;
@@ -1614,27 +1642,40 @@ static void *unpack_sha1_rest(git_zstream *stream, void 
*buffer, unsigned long s
  * too permissive for what we want to check. So do an anal
  * object header parse by hand.
  */
-int parse_sha1_header(const char *hdr, unsigned long *sizep)
+int parse_sha1_header_extended(const char *hdr, struct object_info *oi,
+  unsigned int flags)
 {
-   char type[10];
-   int i;
+   struct strbuf typename = STRBUF_INIT;
unsigned long size;
+   int type;
 
/*
 * The type can be at most ten bytes (including the
 * terminating '\0' that we add), and is followed by
 * a space.
 */
-   i = 0;
for (;;) {
char c = *hdr++;
if (c == ' ')
break;
-   type[i++] = c;
-   if (i >= sizeof(type))
-   return -1;
+   strbuf_addch(&typename, c);
}
-   type[i] = 0;
+
+   type = type_from_string_gently(typename.buf, typename.len, 1);
+   if (oi->typename)
+   strbuf_addbuf(oi->typename, &typename);
+   strbuf_release(&typename);
+   /*
+* Set type to 0 if its an unknown object and
+* we're obtaining the type using '--literally'
+* option.
+*/
+   if ((flags & LOOKUP_LITERALLY) && (type == -1))
+   type = 0;
+   else if (type == -1)
+   die("invalid object type");
+   if (oi->typep)
+   *oi->typep = type;
 
/*
 * The length must follow immediately, and be in canonical
@@ -1652,12 +1693,24 @@ int parse_sha1_header(const char *hdr, unsigned long 
*sizep)
size = size * 10 + c;
}
}
-   *sizep = size;
+
+   if (oi->sizep)
+   *oi->sizep = size;
 
/*
 * The leng