Re: Project idea: strbuf allocation modes
Michael Haggerty mhag...@alum.mit.edu writes: STRBUF_OWNS_MEMORY The memory pointed to by buf is owned by this strbuf. If this bit is not set, then the memory should never be freed, and (among other things) strbuf_detach() must always call xstrcpy(). I just foresee a small difficulty from the caller side: char path_buf[PATH_MAX]; char *detached; struct strbuf path; strbuf_wrap_preallocated(path, path_buf, 0, sizeof(path)); ... detached = strbuf_detach(path); the wrapping/unwrapping of path_buf magically turned a stack-allocated buffer into a heap-allocated one. And the initial goal of avoiding malloc() is defeated. So, essentially, one should avoir using strbuf_wrap_preallocated with strbuf_detach, right? But I agree with Junio that if the API is properly used, everything should work. I'm just worried that we will add a bit more complexity to the API, and I'm not sure we can actually expect noticeable improvements in terms of performance (i.e. do we actually use strbuf for performance-critical stuff?). In your proposal, would STRBUF_OWNS_MEMORY be a constant, or a flag that change when the internal buffer needs reallocation? My understanding is that it should change (if STRBUF_FIXED_MEMORY is not set), and the strbuf wrapping a preallocated buffer would become a normal strbuf when its internal buffer grows. If so, your strbuf_detach() must always call xstrcpy() is to be understood as if STRBUF_OWNS_MEMORY is still set when strbuf_detach() is called, then it must always call xstrcpy(), right? -- Matthieu Moy http://www-verimag.imag.fr/~moy/ -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Project idea: strbuf allocation modes
On 04/22/2014 09:07 AM, Matthieu Moy wrote: Michael Haggerty mhag...@alum.mit.edu writes: STRBUF_OWNS_MEMORY The memory pointed to by buf is owned by this strbuf. If this bit is not set, then the memory should never be freed, and (among other things) strbuf_detach() must always call xstrcpy(). I just foresee a small difficulty from the caller side: char path_buf[PATH_MAX]; char *detached; struct strbuf path; strbuf_wrap_preallocated(path, path_buf, 0, sizeof(path)); ... detached = strbuf_detach(path); the wrapping/unwrapping of path_buf magically turned a stack-allocated buffer into a heap-allocated one. And the initial goal of avoiding malloc() is defeated. So, essentially, one should avoir using strbuf_wrap_preallocated with strbuf_detach, right? Correct; it wouldn't bring any performance advantages. But strbuf_detach() should be implemented for such strbufs for completeness so that after the strbuf is instantiated, users don't have to know its allocation mode. But I agree with Junio that if the API is properly used, everything should work. I'm just worried that we will add a bit more complexity to the API, and I'm not sure we can actually expect noticeable improvements in terms of performance (i.e. do we actually use strbuf for performance-critical stuff?). The whole point of the change is to *allow* strbuf to be used in performance-critical stuff. In your proposal, would STRBUF_OWNS_MEMORY be a constant, or a flag that change when the internal buffer needs reallocation? My understanding is that it should change (if STRBUF_FIXED_MEMORY is not set), and the strbuf wrapping a preallocated buffer would become a normal strbuf when its internal buffer grows. Correct. STRBUF_OWNS_MEMORY itself is of course a constant like 0x02 (a mask to set/clear a bit in the flags) but the corresponding flags bit would sometimes be changed internal to the strbuf implementation. For example, if the buffer is grown past its original size (when STRBUF_FIXED_MEMORY is not set) the bit would be set. After strbuf_detach(), there would be no automatic way to re-attach such a strbuf to its original heap-allocated memory. So in this situation the buffer would have to be pointed at strbuf_slopbuf again, and the STRBUF_OWNS_MEMORY bit would be cleared again. If so, your strbuf_detach() must always call xstrcpy() is to be understood as if STRBUF_OWNS_MEMORY is still set when strbuf_detach() is called, then it must always call xstrcpy(), right? Exactly. How does the size of this project compare to what you are looking for for your Ensimag students? Michael -- Michael Haggerty mhag...@alum.mit.edu http://softwareswirl.blogspot.com/ -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Project idea: strbuf allocation modes
Michael Haggerty mhag...@alum.mit.edu writes: On 04/22/2014 09:07 AM, Matthieu Moy wrote: The whole point of the change is to *allow* strbuf to be used in performance-critical stuff. OK. It should not make the current use of strbuf any harder anyway. In your proposal, would STRBUF_OWNS_MEMORY be a constant, or a flag that change when the internal buffer needs reallocation? My understanding is that it should change (if STRBUF_FIXED_MEMORY is not set), and the strbuf wrapping a preallocated buffer would become a normal strbuf when its internal buffer grows. Correct. STRBUF_OWNS_MEMORY itself is of course a constant like 0x02 Yes, I meant the STRBUF_OWNS_MEMORY flag. How does the size of this project compare to what you are looking for for your Ensimag students? I do not yet have applications for this project (it should come within the next few days). It greatly depends on students and team size. I always start with a warm up patch (like GSoC's microprojects), and depending on the time taken for it, I direct students to bigger or smaller projects. Your proposal is a bit tricky (it requires a good understanding of C and memory management), but it is a purely local change (i.e. you can take strbuf.[cf] out of Git's source code and hack it as a ~500 LOC project, unit-test and document the result in a self-contained way), so it should be doable. Using the new API features here and there in Git's source code is another story though. I've added the proposal with a link to this discussion here: https://git.wiki.kernel.org/index.php/SmallProjectsIdeas#Allow_finer_memory_management_in_the_strbuf_API -- Matthieu Moy http://www-verimag.imag.fr/~moy/ -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Project idea: strbuf allocation modes
I like that strbuf is getting used more than it used to, and I think that is a good general trend to help git rid of and/or avoid buffer overflows and arbitrary limits on strings. It is unfortunate that it is currently impossible to use a strbuf without doing a memory allocation. So code like void f() { char path[PATH_MAX]; ... } typically gets turned into either void f() { struct strbuf path; strbuf_add(path, ...); -- does a malloc ... strbuf_release(path);-- does a free } which costs extra memory allocations, or void f() { static struct strbuf path; strbuf_add(path, ...); ... strbuf_setlen(path, 0); } which, by using a static variable, avoids most of the malloc/free overhead, but makes the function unsafe to use recursively or from multiple threads. The Idea I would like to see strbuf enhanced to allow it to use memory that it doesn't own (for example, stack-allocated memory), while (optionally) allowing it to switch over to using allocated memory if the string grows past the size of the pre-allocated buffer. The way I envision this, strbuf would gain a flags field with two bits: STRBUF_OWNS_MEMORY The memory pointed to by buf is owned by this strbuf. If this bit is not set, then the memory should never be freed, and (among other things) strbuf_detach() must always call xstrcpy(). STRBUF_FIXED_MEMORY This strbuf can use the memory of length alloc at buf however it likes. But it must never free() or realloc() the memory or move the string. Essentially, buf should be treated as a constant pointer. If the string overgrows the buffer, die(). The different bit combinations would have the following effects: flags == ~STRBUF_OWNS_MEMORY | ~STRBUF_FIXED_MEMORY The strbuf doesn't own the current memory, but it is allowed to grow the buffer into other (allocated) memory. When needed, malloc() new memory, copy the old string to the new memory, and clear this bit -- but *don't* free the old memory. Note that STRBUF_INIT, when buf is initialized to point at strbuf_slopbuf, would use this flag setting. This flag combination could be used to wrap a stack-allocated buffer without preventing the string from growing beyond the size of the buffer: void f() { char path_buf[PATH_MAX]; struct strbuf path; strbuf_wrap_preallocated(path, path_buf, 0, sizeof(path)); ... strbuf_release(path); } (Probably the boilerplate could be reduced by using macros.) flags == STRBUF_OWNS_MEMORY | ~STRBUF_FIXED_MEMORY This gives the current behavior of a non-empty strbuf. flags == ~STRBUF_OWNS_MEMORY | STRBUF_FIXED_MEMORY This would make it possible to use a strbuf to wrap a fixed-size buffer that belongs to somebody else and must not be moved. For example, void f(char *path_buf, size_t path_buf_len) { struct strbuf path; strbuf_wrap_fixed(path, path_buf, strlen(path_buf), path_buf_len); ... /* * no strbuf_release() required here, but if called it * is a NOOP */ } (Probably the boilerplate could be reduced by using macros.) It would also be possible to use this mode to dynamically allocate a strbuf along with space for its string (i.e., to make do with one allocation instead of two): struct strbuf sb = xmalloc(sizeof(strbuf) + len + 1); sb.alloc = len + 1; sb.len = len; sb.flags = STRBUF_FIXED_MEMORY; sb.buf = (void *)sb + sizeof(strbuf); *sb.buf = '\0'; If this is considered useful, it should of course be offered via a function. flags == STRBUF_OWNS_MEMORY | STRBUF_FIXED_MEMORY This combination seems less useful, but I leave it here for completeness. It is basically a strbuf whose length is constrained to its currently-allocated size. The nice thing is that I believe this new functionality could be implemented without slowing down the most-used strbuf functions. For example, none of the inline functions would have to be changed. The flags would only have to be checked when doing memory-related operations like strbuf_grow(), strbuf_detach(), etc. Anyway, I just wanted to get the idea out there. I'd like to get feedback about whether people think this is a good idea. If so, maybe I'll add it to the wiki as a suggested project. I've wanted to work on this myself, but realistically I'm not going to have time in the near future. It might be a good Ensimag project or a future GSoC project (though it might be a bit
Re: Project idea: strbuf allocation modes
Michael Haggerty mhag...@alum.mit.edu writes: The Idea I would like to see strbuf enhanced to allow it to use memory that it doesn't own (for example, stack-allocated memory), while (optionally) allowing it to switch over to using allocated memory if the string grows past the size of the pre-allocated buffer. I'd like to see these characteristics, but I would want to see that this is done entirely internally inside the strbuf implementation without any API impact, other than the initialisation. I do not think the current API contract is too rigid to allow us doing so. - The API users may peek strbuf.buf in-place until they perform an operation that makes it longer (at which point the .buf pointer may point at a new piece of memory). - The API users may strbuf_detach() to obtain a piece of memory that belongs to them (at which point the strbuf becomes empty), hence needs to be freed by the callers. As long as the above two promises are kept intact, it is all internal to the strbuf implementation, current iteration of which does not have any initial (possibly static) allocation other than the fixed terminating NUL, but your updated one may take a caller supplied piece of memory that is designed to outlive the strbuf itself as its initial allocation and the memory ownership can be left as an internal implementation detail to the strbuf without bothering the callers. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Project idea: strbuf allocation modes
On 04/18/2014 07:21 PM, Junio C Hamano wrote: Michael Haggerty mhag...@alum.mit.edu writes: The Idea I would like to see strbuf enhanced to allow it to use memory that it doesn't own (for example, stack-allocated memory), while (optionally) allowing it to switch over to using allocated memory if the string grows past the size of the pre-allocated buffer. I'd like to see these characteristics, but I would want to see that this is done entirely internally inside the strbuf implementation without any API impact, other than the initialisation. I do not think the current API contract is too rigid to allow us doing so. - The API users may peek strbuf.buf in-place until they perform an operation that makes it longer (at which point the .buf pointer may point at a new piece of memory). - The API users may strbuf_detach() to obtain a piece of memory that belongs to them (at which point the strbuf becomes empty), hence needs to be freed by the callers. As long as the above two promises are kept intact, it is all internal to the strbuf implementation, current iteration of which does not have any initial (possibly static) allocation other than the fixed terminating NUL, but your updated one may take a caller supplied piece of memory that is designed to outlive the strbuf itself as its initial allocation and the memory ownership can be left as an internal implementation detail to the strbuf without bothering the callers. I think that's exactly what I described. The only thing that would have to change in the strbuf behavior (aside from initialization) is that a buffer-growing operation might die if the string would grow outside of the bounds of the existing buffer when STRBUF_FIXED_MEMORY is set. Michael -- Michael Haggerty mhag...@alum.mit.edu http://softwareswirl.blogspot.com/ -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Project idea: strbuf allocation modes
Michael Haggerty mhag...@alum.mit.edu writes: On 04/18/2014 07:21 PM, Junio C Hamano wrote: Michael Haggerty mhag...@alum.mit.edu writes: The Idea I would like to see strbuf enhanced to allow it to use memory that it doesn't own (for example, stack-allocated memory), while (optionally) allowing it to switch over to using allocated memory if the string grows past the size of the pre-allocated buffer. I'd like to see these characteristics, but I would want to see that this is done entirely internally inside the strbuf implementation without any API impact, other than the initialisation. I do not ... I think that's exactly what I described. The only thing that would have to change in the strbuf behavior (aside from initialization) is that a buffer-growing operation might die if the string would grow outside of the bounds of the existing buffer when STRBUF_FIXED_MEMORY is set. Yup, we are in agreement ;-) More seriously, sorry for sounding as if I was objecting. I should have edited s/but/and/ before sending my response out. Thanks. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html