Re: Project idea: strbuf allocation modes

2014-04-22 Thread Matthieu Moy
Michael Haggerty mhag...@alum.mit.edu writes:

 STRBUF_OWNS_MEMORY

 The memory pointed to by buf is owned by this strbuf.  If this
 bit is not set, then the memory should never be freed, and
 (among other things) strbuf_detach() must always call xstrcpy().

I just foresee a small difficulty from the caller side:

char path_buf[PATH_MAX];
char *detached;

struct strbuf path;

strbuf_wrap_preallocated(path, path_buf, 0, sizeof(path));
...
detached = strbuf_detach(path);

the wrapping/unwrapping of path_buf magically turned a stack-allocated
buffer into a heap-allocated one. And the initial goal of avoiding
malloc() is defeated. So, essentially, one should avoir using
strbuf_wrap_preallocated with strbuf_detach, right? But I agree with
Junio that if the API is properly used, everything should work. I'm just
worried that we will add a bit more complexity to the API, and I'm not
sure we can actually expect noticeable improvements in terms of
performance (i.e. do we actually use strbuf for performance-critical
stuff?).

In your proposal, would STRBUF_OWNS_MEMORY be a constant, or a flag that
change when the internal buffer needs reallocation? My understanding is
that it should change (if STRBUF_FIXED_MEMORY is not set), and the
strbuf wrapping a preallocated buffer would become a normal strbuf
when its internal buffer grows.

If so, your strbuf_detach() must always call xstrcpy() is to be
understood as if STRBUF_OWNS_MEMORY is still set when strbuf_detach()
is called, then it must always call xstrcpy(), right?

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Project idea: strbuf allocation modes

2014-04-22 Thread Michael Haggerty
On 04/22/2014 09:07 AM, Matthieu Moy wrote:
 Michael Haggerty mhag...@alum.mit.edu writes:
 
 STRBUF_OWNS_MEMORY

 The memory pointed to by buf is owned by this strbuf.  If this
 bit is not set, then the memory should never be freed, and
 (among other things) strbuf_detach() must always call xstrcpy().
 
 I just foresee a small difficulty from the caller side:
 
 char path_buf[PATH_MAX];
 char *detached;
 
 struct strbuf path;
 
 strbuf_wrap_preallocated(path, path_buf, 0, sizeof(path));
 ...
 detached = strbuf_detach(path);
 
 the wrapping/unwrapping of path_buf magically turned a stack-allocated
 buffer into a heap-allocated one. And the initial goal of avoiding
 malloc() is defeated. So, essentially, one should avoir using
 strbuf_wrap_preallocated with strbuf_detach, right?

Correct; it wouldn't bring any performance advantages.  But
strbuf_detach() should be implemented for such strbufs for completeness
so that after the strbuf is instantiated, users don't have to know its
allocation mode.

  But I agree with
 Junio that if the API is properly used, everything should work. I'm just
 worried that we will add a bit more complexity to the API, and I'm not
 sure we can actually expect noticeable improvements in terms of
 performance (i.e. do we actually use strbuf for performance-critical
 stuff?).

The whole point of the change is to *allow* strbuf to be used in
performance-critical stuff.

 In your proposal, would STRBUF_OWNS_MEMORY be a constant, or a flag that
 change when the internal buffer needs reallocation? My understanding is
 that it should change (if STRBUF_FIXED_MEMORY is not set), and the
 strbuf wrapping a preallocated buffer would become a normal strbuf
 when its internal buffer grows.

Correct.  STRBUF_OWNS_MEMORY itself is of course a constant like 0x02 (a
mask to set/clear a bit in the flags) but the corresponding flags bit
would sometimes be changed internal to the strbuf implementation.  For
example, if the buffer is grown past its original size (when
STRBUF_FIXED_MEMORY is not set) the bit would be set.

After strbuf_detach(), there would be no automatic way to re-attach such
a strbuf to its original heap-allocated memory.  So in this situation
the buffer would have to be pointed at strbuf_slopbuf again, and the
STRBUF_OWNS_MEMORY bit would be cleared again.

 If so, your strbuf_detach() must always call xstrcpy() is to be
 understood as if STRBUF_OWNS_MEMORY is still set when strbuf_detach()
 is called, then it must always call xstrcpy(), right?

Exactly.


How does the size of this project compare to what you are looking for
for your Ensimag students?

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Project idea: strbuf allocation modes

2014-04-22 Thread Matthieu Moy
Michael Haggerty mhag...@alum.mit.edu writes:

 On 04/22/2014 09:07 AM, Matthieu Moy wrote:

 The whole point of the change is to *allow* strbuf to be used in
 performance-critical stuff.

OK. It should not make the current use of strbuf any harder anyway.

 In your proposal, would STRBUF_OWNS_MEMORY be a constant, or a flag that
 change when the internal buffer needs reallocation? My understanding is
 that it should change (if STRBUF_FIXED_MEMORY is not set), and the
 strbuf wrapping a preallocated buffer would become a normal strbuf
 when its internal buffer grows.

 Correct.  STRBUF_OWNS_MEMORY itself is of course a constant like 0x02

Yes, I meant the STRBUF_OWNS_MEMORY flag.

 How does the size of this project compare to what you are looking for
 for your Ensimag students?

I do not yet have applications for this project (it should come within
the next few days). It greatly depends on students and team size. I
always start with a warm up patch (like GSoC's microprojects), and
depending on the time taken for it, I direct students to bigger or
smaller projects.

Your proposal is a bit tricky (it requires a good understanding of C and
memory management), but it is a purely local change (i.e. you can take
strbuf.[cf] out of Git's source code and hack it as a ~500 LOC project,
unit-test and document the result in a self-contained way), so it should
be doable. Using the new API features here and there in Git's source
code is another story though.

I've added the proposal with a link to this discussion here:

  
https://git.wiki.kernel.org/index.php/SmallProjectsIdeas#Allow_finer_memory_management_in_the_strbuf_API

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Project idea: strbuf allocation modes

2014-04-18 Thread Michael Haggerty
I like that strbuf is getting used more than it used to, and I think
that is a good general trend to help git rid of and/or avoid buffer
overflows and arbitrary limits on strings.

It is unfortunate that it is currently impossible to use a strbuf
without doing a memory allocation.  So code like

void f()
{
char path[PATH_MAX];

...
}

typically gets turned into either

void f()
{
struct strbuf path;

strbuf_add(path, ...);   -- does a malloc
...
strbuf_release(path);-- does a free
}

which costs extra memory allocations, or

void f()
{
static struct strbuf path;

strbuf_add(path, ...);
...
strbuf_setlen(path, 0);
}

which, by using a static variable, avoids most of the malloc/free
overhead, but makes the function unsafe to use recursively or from
multiple threads.


The Idea


I would like to see strbuf enhanced to allow it to use memory that it
doesn't own (for example, stack-allocated memory), while (optionally)
allowing it to switch over to using allocated memory if the string grows
past the size of the pre-allocated buffer.

The way I envision this, strbuf would gain a flags field with two bits:

STRBUF_OWNS_MEMORY

The memory pointed to by buf is owned by this strbuf.  If this
bit is not set, then the memory should never be freed, and
(among other things) strbuf_detach() must always call xstrcpy().

STRBUF_FIXED_MEMORY

This strbuf can use the memory of length alloc at buf however
it likes.  But it must never free() or realloc() the memory or
move the string.  Essentially, buf should be treated as a
constant pointer.  If the string overgrows the buffer, die().

The different bit combinations would have the following effects:

flags == ~STRBUF_OWNS_MEMORY | ~STRBUF_FIXED_MEMORY

The strbuf doesn't own the current memory, but it is allowed to
grow the buffer into other (allocated) memory.  When needed,
malloc() new memory, copy the old string to the new memory,
and clear this bit -- but *don't* free the old memory.  Note
that STRBUF_INIT, when buf is initialized to point at
strbuf_slopbuf, would use this flag setting.  This flag
combination could be used to wrap a stack-allocated buffer
without preventing the string from growing beyond the
size of the buffer:

void f()
{
char path_buf[PATH_MAX];
struct strbuf path;

strbuf_wrap_preallocated(path, path_buf, 0, sizeof(path));
...
strbuf_release(path);
}

(Probably the boilerplate could be reduced by using macros.)

flags == STRBUF_OWNS_MEMORY | ~STRBUF_FIXED_MEMORY

This gives the current behavior of a non-empty strbuf.

flags == ~STRBUF_OWNS_MEMORY | STRBUF_FIXED_MEMORY

This would make it possible to use a strbuf to wrap a
fixed-size buffer that belongs to somebody else and must not
be moved.  For example,

void f(char *path_buf, size_t path_buf_len)
{
struct strbuf path;

strbuf_wrap_fixed(path, path_buf,
  strlen(path_buf), path_buf_len);
...
/*
 * no strbuf_release() required here, but if called it
 * is a NOOP
 */
}

(Probably the boilerplate could be reduced by using macros.)

It would also be possible to use this mode to dynamically
allocate a strbuf along with space for its string (i.e., to
make do with one allocation instead of two):

struct strbuf sb = xmalloc(sizeof(strbuf) + len + 1);
sb.alloc = len + 1;
sb.len = len;
sb.flags = STRBUF_FIXED_MEMORY;
sb.buf = (void *)sb + sizeof(strbuf);
*sb.buf = '\0';

If this is considered useful, it should of course be offered
via a function.

flags == STRBUF_OWNS_MEMORY | STRBUF_FIXED_MEMORY

This combination seems less useful, but I leave it here for
completeness.  It is basically a strbuf whose length is
constrained to its currently-allocated size.

The nice thing is that I believe this new functionality could be
implemented without slowing down the most-used strbuf functions.  For
example, none of the inline functions would have to be changed.  The
flags would only have to be checked when doing memory-related operations
like strbuf_grow(), strbuf_detach(), etc.


Anyway, I just wanted to get the idea out there.  I'd like to get
feedback about whether people think this is a good idea.  If so, maybe
I'll add it to the wiki as a suggested project.

I've wanted to work on this myself, but realistically I'm not going to
have time in the near future.  It might be a good Ensimag project or a
future GSoC project (though it might be a bit 

Re: Project idea: strbuf allocation modes

2014-04-18 Thread Junio C Hamano
Michael Haggerty mhag...@alum.mit.edu writes:

 The Idea
 

 I would like to see strbuf enhanced to allow it to use memory that it
 doesn't own (for example, stack-allocated memory), while (optionally)
 allowing it to switch over to using allocated memory if the string grows
 past the size of the pre-allocated buffer.

I'd like to see these characteristics, but I would want to see that
this is done entirely internally inside the strbuf implementation
without any API impact, other than the initialisation.  I do not
think the current API contract is too rigid to allow us doing so.

 - The API users may peek strbuf.buf in-place until they perform an
   operation that makes it longer (at which point the .buf pointer
   may point at a new piece of memory).

 - The API users may strbuf_detach() to obtain a piece of memory
   that belongs to them (at which point the strbuf becomes empty),
   hence needs to be freed by the callers.

As long as the above two promises are kept intact, it is all
internal to the strbuf implementation, current iteration of which
does not have any initial (possibly static) allocation other than
the fixed terminating NUL, but your updated one may take a caller
supplied piece of memory that is designed to outlive the strbuf
itself as its initial allocation and the memory ownership can be
left as an internal implementation detail to the strbuf without
bothering the callers.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Project idea: strbuf allocation modes

2014-04-18 Thread Michael Haggerty
On 04/18/2014 07:21 PM, Junio C Hamano wrote:
 Michael Haggerty mhag...@alum.mit.edu writes:
 
 The Idea
 

 I would like to see strbuf enhanced to allow it to use memory that it
 doesn't own (for example, stack-allocated memory), while (optionally)
 allowing it to switch over to using allocated memory if the string grows
 past the size of the pre-allocated buffer.
 
 I'd like to see these characteristics, but I would want to see that
 this is done entirely internally inside the strbuf implementation
 without any API impact, other than the initialisation.  I do not
 think the current API contract is too rigid to allow us doing so.
 
  - The API users may peek strbuf.buf in-place until they perform an
operation that makes it longer (at which point the .buf pointer
may point at a new piece of memory).
 
  - The API users may strbuf_detach() to obtain a piece of memory
that belongs to them (at which point the strbuf becomes empty),
hence needs to be freed by the callers.
 
 As long as the above two promises are kept intact, it is all
 internal to the strbuf implementation, current iteration of which
 does not have any initial (possibly static) allocation other than
 the fixed terminating NUL, but your updated one may take a caller
 supplied piece of memory that is designed to outlive the strbuf
 itself as its initial allocation and the memory ownership can be
 left as an internal implementation detail to the strbuf without
 bothering the callers.

I think that's exactly what I described.  The only thing that would have
to change in the strbuf behavior (aside from initialization) is that a
buffer-growing operation might die if the string would grow outside of
the bounds of the existing buffer when STRBUF_FIXED_MEMORY is set.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Project idea: strbuf allocation modes

2014-04-18 Thread Junio C Hamano
Michael Haggerty mhag...@alum.mit.edu writes:

 On 04/18/2014 07:21 PM, Junio C Hamano wrote:
 Michael Haggerty mhag...@alum.mit.edu writes:
 
 The Idea
 

 I would like to see strbuf enhanced to allow it to use memory that it
 doesn't own (for example, stack-allocated memory), while (optionally)
 allowing it to switch over to using allocated memory if the string grows
 past the size of the pre-allocated buffer.
 
 I'd like to see these characteristics, but I would want to see that
 this is done entirely internally inside the strbuf implementation
 without any API impact, other than the initialisation.  I do not
 ...

 I think that's exactly what I described.  The only thing that would have
 to change in the strbuf behavior (aside from initialization) is that a
 buffer-growing operation might die if the string would grow outside of
 the bounds of the existing buffer when STRBUF_FIXED_MEMORY is set.

Yup, we are in agreement ;-)

More seriously, sorry for sounding as if I was objecting. I should
have edited s/but/and/ before sending my response out.

Thanks.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html