Re: [PATCH] bisect: save heap memory. allocate only the required amount

2014-08-26 Thread Arjun Sreedharan
On 26 August 2014 02:44, Junio C Hamano gits...@pobox.com wrote:
 Arjun Sreedharan arjun...@gmail.com writes:

 Find and allocate the required amount instead of allocating extra
 100 bytes

 Signed-off-by: Arjun Sreedharan arjun...@gmail.com
 ---

 Interesting.  How much memory do we typically waste (in other words,
 how big did cnt grew in your repository where you noticed this)?


I did not try to make an observation of that. I was thinking we're
anyway better off
not using a magic number to allot memory on the heap for each item in
the commit list.

  bisect.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

 diff --git a/bisect.c b/bisect.c
 index d6e851d..c96aab0 100644
 --- a/bisect.c
 +++ b/bisect.c
 @@ -215,10 +215,13 @@ static struct commit_list 
 *best_bisection_sorted(struct commit_list *list, int n
   }
   qsort(array, cnt, sizeof(*array), compare_commit_dist);
   for (p = list, i = 0; i  cnt; i++) {
 - struct name_decoration *r = xmalloc(sizeof(*r) + 100);
 + char name[100];
 + sprintf(name, dist=%d, array[i].distance);
 + int name_len = strlen(name);

 Decl-after-stmt.

 You do not have to run a separate strlen() for this.  The return
 value from sprintf() should tell you how much you need to allocate,
 I think.


Yes. yes.

 + struct name_decoration *r = xmalloc(sizeof(*r) + name_len);
   struct object *obj = (array[i].commit-object);

 - sprintf(r-name, dist=%d, array[i].distance);
 + memcpy(r-name, name, name_len + 1);
   r-next = add_decoration(name_decoration, obj, r);
   p-item = array[i].commit;
   p = p-next;
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] bisect: save heap memory. allocate only the required amount

2014-08-26 Thread Jeff King
On Mon, Aug 25, 2014 at 02:36:02PM -0700, Junio C Hamano wrote:

  I think you are right, and the patch is the right direction (assuming we
  want to do this; I question whether there are enough elements in the
  list for us to care about the size, and if there are, we are probably
  better off storing the int and formatting the strings on the fly).
 
 ;-)

Having now dug into this much further, the answers to those questions
are:

  1. Yes, we might actually have quite a lot of these, if you are
 bisecting a large range of commits. However, the code only runs
 when you are doing --bisect-all or skipping a commit, so probably
 nobody actually cares that much.

  2. It would be nicer to hold just an int, but we are hooking into
 the log-tree decorate machinery here, which expects a string. We'd
 need some kind of decorate-like callback machinery from log-tree to
 do this right. It's probably not worth the effort.

 Among the flex arrays we use, some are arrays of bools, some others
 are arrays of object names, and there are many arrays of even more
 esoteric types.  Only a small number of them are We want a struct
 with a constant string, and we do not want to do two allocations and
 to pay an extra dereference cost by using 'const char *'.

Yeah, I was working under the assumption that most of them were holding
a string. I just did a quick skim of the grep results for FLEX_ARRAY. Of
the 36 instances, 26 hold strings. 9 hold something else entirely, and 1
holds a char buffer that does not need to be NUL-terminated (so it
_could_ be handled by a similar helper, but using %s would be wrong).

So it's definitely the majority, but certainly not all. I decided to
look into this a little further, and my results are below. The tl;dr is
that no, we probably shouldn't do it. So you can stop reading if you
don't find this interesting. :)

 For them, by the time we allocate a struct, by definition we should
 have sufficient information to compute how large to make that
 structure and a printf-format plus its args would be the preferred
 form of that sufficient information, I would think.

I was tempted to also suggest a pure-string form (i.e., just take a
string, run strlen on it, and use that as the final value). That would
make the variadic macro problem go away. But besides name_decoration,
there are other cases that really do want formatting.  For instance,
alloc_ref basically wants to do (%s%s, prefix, name).

 The name fmt_flex_array(), which stresses too much on the
 formatting side without implying that it is the way to allocate
 the thing, may be horrible, and I agree with you that without
 variadic macros the end result may not read very well.  Unless we
 have great many number of places we can use the helper to make the
 code to create these objects look nicer, I am afraid that the
 pros-and-cons may not be very favourable.

Yeah, reading my suggestion again, it should clearly be
alloc_flex_struct or something.

Here's a fully-converted sample spot, where I think there's a slight
benefit:

diff --git a/remote.c b/remote.c
index 3d6c86a..ba32d40 100644
--- a/remote.c
+++ b/remote.c
@@ -928,14 +928,30 @@ int remote_find_tracking(struct remote *remote, struct 
refspec *refspec)
return query_refspecs(remote-fetch, remote-fetch_refspec_nr, refspec);
 }
 
+static void *alloc_flex_struct(size_t base, size_t offset, const char *fmt, 
...)
+{
+   va_list ap;
+   size_t extra;
+   char *ret;
+
+   va_start(ap, fmt);
+   extra = vsnprintf(NULL, 0, fmt, ap);
+   extra++; /* for NUL terminator */
+   va_end(ap);
+
+   ret = xcalloc(1, base + extra);
+   va_start(ap, fmt);
+   vsnprintf(ret + offset, extra, fmt, ap);
+   va_end(ap);
+
+   return ret;
+}
+
 static struct ref *alloc_ref_with_prefix(const char *prefix, size_t prefixlen,
const char *name)
 {
-   size_t len = strlen(name);
-   struct ref *ref = xcalloc(1, sizeof(struct ref) + prefixlen + len + 1);
-   memcpy(ref-name, prefix, prefixlen);
-   memcpy(ref-name + prefixlen, name, len);
-   return ref;
+   return alloc_flex_struct(sizeof(struct ref), offsetof(struct ref, name),
+%.*s%s, prefixlen, prefix, name);
 }
 
 struct ref *alloc_ref(const char *name)

Obviously the helper is much longer than the code it is replacing, but
it would be used in multiple spots. The main thing I like here is that
we are dropping the manual length computations, which are easy to get
wrong (it's easy to forget a +1 for a NUL terminator, etc).

The offsetof is a little ugly. And the fact that we have a pre-computed
length for prefixlen makes the format string a little ugly.

Here's a another example:

diff --git a/attr.c b/attr.c
index 734222d..100c423 100644
--- a/attr.c
+++ b/attr.c
@@ -89,8 +89,8 @@ static struct git_attr *git_attr_internal(const char *name, 
int len)
if (invalid_attr_name(name, len))

Re: [PATCH] bisect: save heap memory. allocate only the required amount

2014-08-26 Thread Ramsay Jones
On 26/08/14 12:03, Jeff King wrote:
[snip]
 
 Yeah, reading my suggestion again, it should clearly be
 alloc_flex_struct or something.
 
 Here's a fully-converted sample spot, where I think there's a slight
 benefit:
 
 diff --git a/remote.c b/remote.c
 index 3d6c86a..ba32d40 100644
 --- a/remote.c
 +++ b/remote.c
 @@ -928,14 +928,30 @@ int remote_find_tracking(struct remote *remote, struct 
 refspec *refspec)
   return query_refspecs(remote-fetch, remote-fetch_refspec_nr, refspec);
  }
  
 +static void *alloc_flex_struct(size_t base, size_t offset, const char *fmt, 
 ...)
 +{
 + va_list ap;
 + size_t extra;
 + char *ret;
 +
 + va_start(ap, fmt);
 + extra = vsnprintf(NULL, 0, fmt, ap);
 + extra++; /* for NUL terminator */
 + va_end(ap);
 +
 + ret = xcalloc(1, base + extra);
 + va_start(ap, fmt);
 + vsnprintf(ret + offset, extra, fmt, ap);

What is the relationship between 'base' and 'offset'?

Let me assume that base is always, depending on your compiler, either
equal to offset or offset+1. Yes? (I'm assuming base is always the
sizeof(struct whatever)). Do you need both base and offset?

 + va_end(ap);
 +
 + return ret;
 +}
 +
  static struct ref *alloc_ref_with_prefix(const char *prefix, size_t 
 prefixlen,
   const char *name)
  {
 - size_t len = strlen(name);
 - struct ref *ref = xcalloc(1, sizeof(struct ref) + prefixlen + len + 1);
 - memcpy(ref-name, prefix, prefixlen);
 - memcpy(ref-name + prefixlen, name, len);
 - return ref;
 + return alloc_flex_struct(sizeof(struct ref), offsetof(struct ref, name),
 +  %.*s%s, prefixlen, prefix, name);
  }
  
  struct ref *alloc_ref(const char *name)
 
 Obviously the helper is much longer than the code it is replacing, but
 it would be used in multiple spots. The main thing I like here is that
 we are dropping the manual length computations, which are easy to get
 wrong (it's easy to forget a +1 for a NUL terminator, etc).
 
 The offsetof is a little ugly. And the fact that we have a pre-computed
 length for prefixlen makes the format string a little ugly.
 
 Here's a another example:
 
 diff --git a/attr.c b/attr.c
 index 734222d..100c423 100644
 --- a/attr.c
 +++ b/attr.c
 @@ -89,8 +89,8 @@ static struct git_attr *git_attr_internal(const char *name, 
 int len)
   if (invalid_attr_name(name, len))
   return NULL;
  
 - a = xmalloc(sizeof(*a) + len + 1);
 - memcpy(a-name, name, len);
 + a = alloc_flex_array(sizeof(*a), offsetof(struct git_attr, name),
 +  %.*s, len, name);
   a-name[len] = 0;
   a-h = hval;
   a-next = git_attr_hash[pos];
 
 I think this is strictly worse for reading. The original computation was
 pretty easy in the first place, so we are not getting much benefit
 there. And again we have the precomputed len passed into the function,
 so we have to use the less readable %.*s. And note that offsetof()
 requires us to pass a real typename instead of just *a, as sizeof()
 allows (I suspect passing a-name - a would work on many systems, but
 I also suspect that is a gross violation of the C standard when a has
 not yet been initialized).
 
 So given that the benefit ranges from a little to negative in these
 two examples, I'm inclined to give up this line of inquiry.

Indeed. :-D

ATB,
Ramsay Jones



--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] bisect: save heap memory. allocate only the required amount

2014-08-26 Thread Jeff King
On Tue, Aug 26, 2014 at 12:57:21PM +0100, Ramsay Jones wrote:

  +   ret = xcalloc(1, base + extra);
  +   va_start(ap, fmt);
  +   vsnprintf(ret + offset, extra, fmt, ap);
 
 What is the relationship between 'base' and 'offset'?
 
 Let me assume that base is always, depending on your compiler, either
 equal to offset or offset+1. Yes? (I'm assuming base is always the
 sizeof(struct whatever)). Do you need both base and offset?

It's much more complicated than that. Take struct name_decoration, for
instance, which looks like this:

  struct name_decoration {
struct name_decoration *next;
int type;
char name[FLEX_ARRAY];
  };

On my 64-bit system using gcc, sizeof() returns 16; it has to pad the
whole thing to 64-bit alignment in case I put two of them in an array.
But offsetof(name) is 12, since the array of char does not need the same
alignment; it can go right after the type and make use of the padding
bits.

As a side note, that means that the original char name[1] (before it
became FLEX_ARRAY) was not any less efficient on 64-bit machines (the
1-byte went into the padding, and sizeof() was the same). It did matter
on 32-bit systems, though where it bumped the empty struct size from 12
to 16.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] bisect: save heap memory. allocate only the required amount

2014-08-26 Thread Ramsay Jones
On 26/08/14 13:14, Jeff King wrote:
 On Tue, Aug 26, 2014 at 12:57:21PM +0100, Ramsay Jones wrote:
 
 +   ret = xcalloc(1, base + extra);
 +   va_start(ap, fmt);
 +   vsnprintf(ret + offset, extra, fmt, ap);

 What is the relationship between 'base' and 'offset'?

 Let me assume that base is always, depending on your compiler, either
 equal to offset or offset+1. Yes? (I'm assuming base is always the
 sizeof(struct whatever)). Do you need both base and offset?
 
 It's much more complicated than that. Take struct name_decoration, for
 instance, which looks like this:
 
   struct name_decoration {
   struct name_decoration *next;
   int type;
   char name[FLEX_ARRAY];
   };
 
 On my 64-bit system using gcc, sizeof() returns 16; it has to pad the
 whole thing to 64-bit alignment in case I put two of them in an array.
 But offsetof(name) is 12, since the array of char does not need the same
 alignment; it can go right after the type and make use of the padding
 bits.

Hmm, interesting. I must re-read the standard. I was convinced that the
standard *requires* any alignment padding to come *before* the name field.
(how would you put a, non-trivial, variable data structure into an array?)

ATB,
Ramsay Jones


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] bisect: save heap memory. allocate only the required amount

2014-08-26 Thread Jeff King
On Tue, Aug 26, 2014 at 01:37:44PM +0100, Ramsay Jones wrote:

  On my 64-bit system using gcc, sizeof() returns 16; it has to pad the
  whole thing to 64-bit alignment in case I put two of them in an array.
  But offsetof(name) is 12, since the array of char does not need the same
  alignment; it can go right after the type and make use of the padding
  bits.
 
 Hmm, interesting. I must re-read the standard. I was convinced that the
 standard *requires* any alignment padding to come *before* the name field.
 (how would you put a, non-trivial, variable data structure into an array?)

I think you don't. How would you compute a[1] if a[0] has a variable
size?

You can put a flex-array structure into an array, but then each element
has the flex member as zero-size (and you should not access it, of
course).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] bisect: save heap memory. allocate only the required amount

2014-08-26 Thread Ramsay Jones
On 26/08/14 13:43, Jeff King wrote:
 On Tue, Aug 26, 2014 at 01:37:44PM +0100, Ramsay Jones wrote:
 
 On my 64-bit system using gcc, sizeof() returns 16; it has to pad the
 whole thing to 64-bit alignment in case I put two of them in an array.
 But offsetof(name) is 12, since the array of char does not need the same
 alignment; it can go right after the type and make use of the padding
 bits.

 Hmm, interesting. I must re-read the standard. I was convinced that the
 standard *requires* any alignment padding to come *before* the name field.
 (how would you put a, non-trivial, variable data structure into an array?)
 
 I think you don't. How would you compute a[1] if a[0] has a variable
 size?
 
 You can put a flex-array structure into an array, but then each element
 has the flex member as zero-size (and you should not access it, of
 course).

Exactly. ;-)

ATB,
Ramsay Jones



--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] bisect: save heap memory. allocate only the required amount

2014-08-25 Thread Jeff King
On Sun, Aug 24, 2014 at 04:39:37PM -0700, Junio C Hamano wrote:

 On Sun, Aug 24, 2014 at 8:10 AM, Stefan Beller stefanbel...@gmail.com wrote:
for (p = list, i = 0; i  cnt; i++) {
  - struct name_decoration *r = xmalloc(sizeof(*r) + 100);
  + char name[100];
 
  Would it make sense to convert the 'name' into a git strbuf?
  Please have a look at Documentation/technical/api-strbuf.txt
 
 Why not go one step further and format it twice, once only
 to measure the necessary size to allocate, allocate and
 then format into it for real? Then you do not need to print
 into a temporary strbuf, copy the result and free the strbuf,
 no?
 
 BUT.
 
 The string will always be dist= followed by decimal representation of
 a count that fits in int anyway, so I actually think use of strbuf is way
 overkill (and formatting it twice also is); the patch as posted should be
 just fine.

I think you are right, and the patch is the right direction (assuming we
want to do this; I question whether there are enough elements in the
list for us to care about the size, and if there are, we are probably
better off storing the int and formatting the strings on the fly).

I wonder if there is a way we could get rid of the magic 100 here,
though. Its meaning is enough to hold 'dist=' and any integer. But you
have to read carefully to see that this call to sprintf is not a buffer
overflow. A strbuf is one way to get rid of it, though it is awkward
because we then have to copy the result into a flex-array structure.

It would be nice if there was some way to abstract the idea of
formatting a buffer directly into a flex-array. That would involve the
double-format you mention, but we could use it in lots of places to make
the code nicer. Maybe like:

  void *fmt_flex_array(size_t base, const char *fmt, ...)
  {
  va_list ap;
  size_t flex;
  unsigned char *ret;

  va_start(ap, fmt);
  flex = vsnprintf(NULL, 0, fmt, ap);
  va_end(ap);

  ret = xmalloc(base + flex + 1);
  va_start(ap, fmt);
  /* Eek, see below */
  vsnprintf(ret + flex, flex + 1, fmt, ap);

  return ret;
  }

and you'd call it like:

  struct name_decoration *r = fmt_flex_array(sizeof(*r), dist=%d, x);

Except that I don't think we are guaranteed that offsetof(mystruct,
flex_member) is equal to sizeof(mystruct). If FLEX_ARRAY is 0, it should
be, but some platforms use FLEX_ARRAY=1. So you'd have to pass in the
offset like:

  struct name_decoration *r = fmt_flex_array(sizeof(*r),
 offsetof(*r, name),
 dist=%d, x);

which is a little less nice. You could make it nicer with a macro, but
we don't assume variadic macros. sigh

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] bisect: save heap memory. allocate only the required amount

2014-08-25 Thread Jeff King
On Sun, Aug 24, 2014 at 07:47:24PM +0530, Arjun Sreedharan wrote:

 diff --git a/bisect.c b/bisect.c
 index d6e851d..c96aab0 100644
 --- a/bisect.c
 +++ b/bisect.c
 @@ -215,10 +215,13 @@ static struct commit_list *best_bisection_sorted(struct 
 commit_list *list, int n
   }
   qsort(array, cnt, sizeof(*array), compare_commit_dist);
   for (p = list, i = 0; i  cnt; i++) {
 - struct name_decoration *r = xmalloc(sizeof(*r) + 100);
 + char name[100];
 + sprintf(name, dist=%d, array[i].distance);
 + int name_len = strlen(name);
 + struct name_decoration *r = xmalloc(sizeof(*r) + name_len);

This allocation should be name_len + 1 for the NUL-terminator, no?

It looks like add_name_decoration in log-tree already handles half of
what you are adding here. Can we just make that available globally (it
is manipulating the already-global struct decoration name_decoration)?

I also notice that we do not set r-type at all, meaning the decoration
lookup code in log-tree will access uninitialized memory (worse, it will
use it as a pointer offset into the color list; I got a segfault when I
tried to run git rev-list --bisect-all v1.8.0..v1.9.0).

I think we need this:

diff --git a/bisect.c b/bisect.c
index d6e851d..e2a7682 100644
--- a/bisect.c
+++ b/bisect.c
@@ -219,6 +219,7 @@ static struct commit_list *best_bisection_sorted(struct 
commit_list *list, int n
struct object *obj = (array[i].commit-object);
 
sprintf(r-name, dist=%d, array[i].distance);
+   r-type = 0;
r-next = add_decoration(name_decoration, obj, r);
p-item = array[i].commit;
p = p-next;

at a minimum.

It looks like this was a regression caused by eb3005e (commit.h: add
'type' to struct name_decoration, 2010-06-19). Which makes me wonder if
anybody actually _uses_ --bisect-all (which AFAICT is the only way to
trigger the problem), but since it's public, I guess we should keep it.

I think the sane thing here is to stop advertising name_decoration as a
global, and make all callers use add_name_decoration. That makes it
easier for callers like this one, and would have caught the regression
caused be eb3005e (the compiler would have noticed that we were not
passing a type parameter to the function).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] bisect: save heap memory. allocate only the required amount

2014-08-25 Thread Christian Couder
On Mon, Aug 25, 2014 at 3:35 PM, Jeff King p...@peff.net wrote:
 On Sun, Aug 24, 2014 at 07:47:24PM +0530, Arjun Sreedharan wrote:

 diff --git a/bisect.c b/bisect.c
 index d6e851d..c96aab0 100644
 --- a/bisect.c
 +++ b/bisect.c
 @@ -215,10 +215,13 @@ static struct commit_list 
 *best_bisection_sorted(struct commit_list *list, int n
   }
   qsort(array, cnt, sizeof(*array), compare_commit_dist);
   for (p = list, i = 0; i  cnt; i++) {
 - struct name_decoration *r = xmalloc(sizeof(*r) + 100);
 + char name[100];
 + sprintf(name, dist=%d, array[i].distance);
 + int name_len = strlen(name);
 + struct name_decoration *r = xmalloc(sizeof(*r) + name_len);

 This allocation should be name_len + 1 for the NUL-terminator, no?

I wondered about that too, but as struct name_decoration is defined like this:

struct name_decoration {
struct name_decoration *next;
int type;
char name[1];
};

the .name field of this struct already has one char, so the allocation
above should be ok.

 It looks like add_name_decoration in log-tree already handles half of
 what you are adding here. Can we just make that available globally (it
 is manipulating the already-global struct decoration name_decoration)?

Yeah, it looks like it should be better.

Note that add_name_decoration() does:

int nlen = strlen(name);
struct name_decoration *res = xmalloc(sizeof(struct name_decoration) + nlen);

so it also relies on the fact that .name contains one char.

 I also notice that we do not set r-type at all, meaning the decoration
 lookup code in log-tree will access uninitialized memory (worse, it will
 use it as a pointer offset into the color list; I got a segfault when I
 tried to run git rev-list --bisect-all v1.8.0..v1.9.0).

 I think we need this:

 diff --git a/bisect.c b/bisect.c
 index d6e851d..e2a7682 100644
 --- a/bisect.c
 +++ b/bisect.c
 @@ -219,6 +219,7 @@ static struct commit_list *best_bisection_sorted(struct 
 commit_list *list, int n
 struct object *obj = (array[i].commit-object);

 sprintf(r-name, dist=%d, array[i].distance);
 +   r-type = 0;
 r-next = add_decoration(name_decoration, obj, r);
 p-item = array[i].commit;
 p = p-next;

 at a minimum.

Yeah if we don't use add_name_decoration() we would need that.
Thanks for noticing.

 It looks like this was a regression caused by eb3005e (commit.h: add
 'type' to struct name_decoration, 2010-06-19). Which makes me wonder if
 anybody actually _uses_ --bisect-all (which AFAICT is the only way to
 trigger the problem), but since it's public, I guess we should keep it.

Yeah, we should probably keep it.

 I think the sane thing here is to stop advertising name_decoration as a
 global, and make all callers use add_name_decoration. That makes it
 easier for callers like this one, and would have caught the regression
 caused be eb3005e (the compiler would have noticed that we were not
 passing a type parameter to the function).

I agree.

Thanks,
Christian.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] bisect: save heap memory. allocate only the required amount

2014-08-25 Thread Jeff King
On Mon, Aug 25, 2014 at 04:06:52PM +0200, Christian Couder wrote:

  This allocation should be name_len + 1 for the NUL-terminator, no?
 
 I wondered about that too, but as struct name_decoration is defined like this:
 
 struct name_decoration {
 struct name_decoration *next;
 int type;
 char name[1];
 };
 
 the .name field of this struct already has one char, so the allocation
 above should be ok.

Yeah, you're right. I would argue it should just be FLEX_ARRAY for
consistency with other spots, though (in which case add_name_decoration
needs to be updated with a +1).

Running git grep '^char [^ ]*\[[01]]' -- '*.[ch]' shows that this
is one of only two spots that don't use FLEX_ARRAY (and the other has a
comment explaining why not).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] bisect: save heap memory. allocate only the required amount

2014-08-25 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 On Mon, Aug 25, 2014 at 04:06:52PM +0200, Christian Couder wrote:

  This allocation should be name_len + 1 for the NUL-terminator, no?
 
 I wondered about that too, but as struct name_decoration is defined like 
 this:
 
 struct name_decoration {
 struct name_decoration *next;
 int type;
 char name[1];
 };
 
 the .name field of this struct already has one char, so the allocation
 above should be ok.

 Yeah, you're right. I would argue it should just be FLEX_ARRAY for
 consistency with other spots, though (in which case add_name_decoration
 needs to be updated with a +1).

 Running git grep '^  char [^ ]*\[[01]]' -- '*.[ch]' shows that this
 is one of only two spots that don't use FLEX_ARRAY (and the other has a
 comment explaining why not).

Good digging, and I agree that it should use the FLEX_ARRAY for
consistency.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] bisect: save heap memory. allocate only the required amount

2014-08-25 Thread Jeff King
On Mon, Aug 25, 2014 at 11:26:39AM -0700, Junio C Hamano wrote:

 Good digging, and I agree that it should use the FLEX_ARRAY for
 consistency.

I can produce a patch, but I did not want to steal Arjun's thunder.

Arjun, did my proposal make sense? Do you want to try implementing that?

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] bisect: save heap memory. allocate only the required amount

2014-08-25 Thread Arjun Sreedharan
On 26 August 2014 01:05, Jeff King p...@peff.net wrote:
 On Mon, Aug 25, 2014 at 11:26:39AM -0700, Junio C Hamano wrote:

 Good digging, and I agree that it should use the FLEX_ARRAY for
 consistency.

 I can produce a patch, but I did not want to steal Arjun's thunder.

Please feel free to do so.


 Arjun, did my proposal make sense? Do you want to try implementing that?

 -Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] bisect: save heap memory. allocate only the required amount

2014-08-25 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 On Mon, Aug 25, 2014 at 11:26:39AM -0700, Junio C Hamano wrote:

 Good digging, and I agree that it should use the FLEX_ARRAY for
 consistency.

 I can produce a patch, but I did not want to steal Arjun's thunder.

Hmph, would it have to overlap?  I think we can queue Arjun's patch
with +1 fix and FLEX_ARRAY thing separately, and they can go in in
any order, no?

 Arjun, did my proposal make sense? Do you want to try implementing that?

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] bisect: save heap memory. allocate only the required amount

2014-08-25 Thread Junio C Hamano
Arjun Sreedharan arjun...@gmail.com writes:

 Find and allocate the required amount instead of allocating extra
 100 bytes

 Signed-off-by: Arjun Sreedharan arjun...@gmail.com
 ---

Interesting.  How much memory do we typically waste (in other words,
how big did cnt grew in your repository where you noticed this)?

  bisect.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

 diff --git a/bisect.c b/bisect.c
 index d6e851d..c96aab0 100644
 --- a/bisect.c
 +++ b/bisect.c
 @@ -215,10 +215,13 @@ static struct commit_list *best_bisection_sorted(struct 
 commit_list *list, int n
   }
   qsort(array, cnt, sizeof(*array), compare_commit_dist);
   for (p = list, i = 0; i  cnt; i++) {
 - struct name_decoration *r = xmalloc(sizeof(*r) + 100);
 + char name[100];
 + sprintf(name, dist=%d, array[i].distance);
 + int name_len = strlen(name);

Decl-after-stmt.

You do not have to run a separate strlen() for this.  The return
value from sprintf() should tell you how much you need to allocate,
I think.

 + struct name_decoration *r = xmalloc(sizeof(*r) + name_len);
   struct object *obj = (array[i].commit-object);
  
 - sprintf(r-name, dist=%d, array[i].distance);
 + memcpy(r-name, name, name_len + 1);
   r-next = add_decoration(name_decoration, obj, r);
   p-item = array[i].commit;
   p = p-next;
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] bisect: save heap memory. allocate only the required amount

2014-08-25 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 The string will always be dist= followed by decimal representation of
 a count that fits in int anyway, so I actually think use of strbuf is way
 overkill (and formatting it twice also is); the patch as posted should be
 just fine.

 I think you are right, and the patch is the right direction (assuming we
 want to do this; I question whether there are enough elements in the
 list for us to care about the size, and if there are, we are probably
 better off storing the int and formatting the strings on the fly).

;-)

 It would be nice if there was some way to abstract the idea of
 formatting a buffer directly into a flex-array. That would involve the
 double-format you mention, but we could use it in lots of places to make
 the code nicer
 ...
   struct name_decoration *r = fmt_flex_array(sizeof(*r),
  offsetof(*r, name),
dist=%d, x);

 which is a little less nice. You could make it nicer with a macro, but
 we don't assume variadic macros. sigh

At first I thought Yuck.  A helper only to format into the flex
member that holds a string?, and I tried to change my mind, but I
couldn't quite convince myself.  At least not yet.

Among the flex arrays we use, some are arrays of bools, some others
are arrays of object names, and there are many arrays of even more
esoteric types.  Only a small number of them are We want a struct
with a constant string, and we do not want to do two allocations and
to pay an extra dereference cost by using 'const char *'.

For them, by the time we allocate a struct, by definition we should
have sufficient information to compute how large to make that
structure and a printf-format plus its args would be the preferred
form of that sufficient information, I would think.

The name fmt_flex_array(), which stresses too much on the
formatting side without implying that it is the way to allocate
the thing, may be horrible, and I agree with you that without
variadic macros the end result may not read very well.  Unless we
have great many number of places we can use the helper to make the
code to create these objects look nicer, I am afraid that the
pros-and-cons may not be very favourable.

Thanks for an interesting tangent.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] bisect: save heap memory. allocate only the required amount

2014-08-24 Thread Stefan Beller
On 24.08.2014 16:17, Arjun Sreedharan wrote:
 Find and allocate the required amount instead of
 allocating extra 100 bytes
 
 Signed-off-by: Arjun Sreedharan arjun...@gmail.com
 ---
  bisect.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)
 
 diff --git a/bisect.c b/bisect.c
 index d6e851d..c96aab0 100644
 --- a/bisect.c
 +++ b/bisect.c
 @@ -215,10 +215,13 @@ static struct commit_list *best_bisection_sorted(struct 
 commit_list *list, int n
   }
   qsort(array, cnt, sizeof(*array), compare_commit_dist);
   for (p = list, i = 0; i  cnt; i++) {
 - struct name_decoration *r = xmalloc(sizeof(*r) + 100);
 + char name[100];

Would it make sense to convert the 'name' into a git strbuf?
Please have a look at Documentation/technical/api-strbuf.txt

 + sprintf(name, dist=%d, array[i].distance);
 + int name_len = strlen(name);
 + struct name_decoration *r = xmalloc(sizeof(*r) + name_len);
   struct object *obj = (array[i].commit-object);
  
 - sprintf(r-name, dist=%d, array[i].distance);
 + memcpy(r-name, name, name_len + 1);
   r-next = add_decoration(name_decoration, obj, r);
   p-item = array[i].commit;
   p = p-next;
 

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] bisect: save heap memory. allocate only the required amount

2014-08-24 Thread Ramsay Jones
On 24/08/14 15:17, Arjun Sreedharan wrote:
 Find and allocate the required amount instead of
 allocating extra 100 bytes
 
 Signed-off-by: Arjun Sreedharan arjun...@gmail.com
 ---
  bisect.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)
 
 diff --git a/bisect.c b/bisect.c
 index d6e851d..c96aab0 100644
 --- a/bisect.c
 +++ b/bisect.c
 @@ -215,10 +215,13 @@ static struct commit_list *best_bisection_sorted(struct 
 commit_list *list, int n
   }
   qsort(array, cnt, sizeof(*array), compare_commit_dist);
   for (p = list, i = 0; i  cnt; i++) {
 - struct name_decoration *r = xmalloc(sizeof(*r) + 100);
 + char name[100];
 + sprintf(name, dist=%d, array[i].distance);
 + int name_len = strlen(name);
 + struct name_decoration *r = xmalloc(sizeof(*r) + name_len);
   struct object *obj = (array[i].commit-object);

declaration(s) after statement.

  
 - sprintf(r-name, dist=%d, array[i].distance);
 + memcpy(r-name, name, name_len + 1);
   r-next = add_decoration(name_decoration, obj, r);
   p-item = array[i].commit;
   p = p-next;
 

HTH

ATB,
Ramsay Jones



--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] bisect: save heap memory. allocate only the required amount

2014-08-24 Thread Junio C Hamano
On Sun, Aug 24, 2014 at 8:10 AM, Stefan Beller stefanbel...@gmail.com wrote:
   for (p = list, i = 0; i  cnt; i++) {
 - struct name_decoration *r = xmalloc(sizeof(*r) + 100);
 + char name[100];

 Would it make sense to convert the 'name' into a git strbuf?
 Please have a look at Documentation/technical/api-strbuf.txt

Why not go one step further and format it twice, once only
to measure the necessary size to allocate, allocate and
then format into it for real? Then you do not need to print
into a temporary strbuf, copy the result and free the strbuf,
no?

BUT.

The string will always be dist= followed by decimal representation of
a count that fits in int anyway, so I actually think use of strbuf is way
overkill (and formatting it twice also is); the patch as posted should be
just fine.

Or am I missing some uses that require larger/unbounded buffer outside
the context of the patch?
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html