Re: [PATCH 2/2] sequencer.c: plug mem leak in git_sequencer_config

2018-06-21 Thread Jeff King
On Thu, Jun 21, 2018 at 09:03:05AM +0200, Johannes Schindelin wrote:

> > > And at that point, maybe
> > > 
> > >   char *some_var = xstrdup("default");
> > >   git_config_string(_var, ...);
> > > 
> > > that takes "char **" and frees the current storage before assigning to
> > > it may be simpler than the two-variable approach.
> > 
> > That _is_ much nicer, but you cannot use xstrdup() as the initializer
> > for a global "static char *some_var", which is what the majority of the
> > config variables are. It's this "static initializer sometimes, run-time
> > heap sometimes" duality to the variables that makes handling it such a
> > pain.
> 
> This makes me think of Michael's proposal to teach strbuf some sort of
> STRBUF_INIT_CONST("default") which would set the appropriate len and set
> alloc to 0.
> 
> That way, we could turn those settings into strbufs that only allocate
> memory when/if needed.

Yes! I should have thought about that as soon as I started saying "you
need two variables...". That is a good indication that you need a
struct. ;)

I think the result would be quite readable and pleasant to work with.

I tried to dig up previous conversations about this to see if there were
any patches shown, but I couldn't find any (mostly I found the
conversation about using stack buffers in strbufs, which is not quite
the same thing, since we _do_ want to write in those).

-Peff


Re: [PATCH 2/2] sequencer.c: plug mem leak in git_sequencer_config

2018-06-21 Thread Johannes Schindelin
Hi Peff,

On Mon, 4 Jun 2018, Jeff King wrote:

> On Mon, Jun 04, 2018 at 01:26:57PM +0900, Junio C Hamano wrote:
> 
> > And at that point, maybe
> > 
> > char *some_var = xstrdup("default");
> > git_config_string(_var, ...);
> > 
> > that takes "char **" and frees the current storage before assigning to
> > it may be simpler than the two-variable approach.
> 
> That _is_ much nicer, but you cannot use xstrdup() as the initializer
> for a global "static char *some_var", which is what the majority of the
> config variables are. It's this "static initializer sometimes, run-time
> heap sometimes" duality to the variables that makes handling it such a
> pain.

This makes me think of Michael's proposal to teach strbuf some sort of
STRBUF_INIT_CONST("default") which would set the appropriate len and set
alloc to 0.

That way, we could turn those settings into strbufs that only allocate
memory when/if needed.

We would need to be careful to revisit all strbuf_*() functions to make
sure that they test not only for enough space when growing the string, but
also when reducing it (unless reducing to len 0, in which case we can
reassign strbuf_slopbuf to the buf field).

We also would need to revisit strbuf_*() to understand alloc < len to mean
that we cannot realloc(), but have to malloc() && memcpy().

As a bonus, the pattern used for

[section]
var = 1
var = 2
var = 3

would change from three times strdup() and two times free() to one time
malloc().

Ciao,
Dscho


Re: [PATCH 2/2] sequencer.c: plug mem leak in git_sequencer_config

2018-06-03 Thread Jeff King
On Sun, Jun 03, 2018 at 11:56:37PM -0400, Jeff King wrote:

> So sometimes some_var needs to be freed and sometimes not (and every one
> of those uses is a potential leak, but it's OK because they're all
> program-lifetime globals anyway, and people don't _tend_ to set the same
> option over and over, leaking heap memory). And C being C, we can't
> convert a pointer-to-pointer-to-const into a pointer-to-pointer without
> an explicit cast.
> 
> Doing it "right" in C would probably involve two variables:
> 
>   const char *some_var = "default";
>   const char *some_var_storage = NULL;
> 
>   int git_config_string_smart(const char **ptr, char **storage,
>   const char *var, const char *value)
>   {
> ...
>   free(*storage);
>   *ptr = *storage = xstrdup(value);
>   return 0;
>   }
> 
>   #define GIT_CONFIG_STRING(name, var, value) \
>   git_config_string_smart(&(name), &(name##_storage), var, value)
> 
> Or something like that.

Oh, one other option I forgot to mention: we already have an "intern"
hashmap helper. So we could just replace the xstrdup() with strintern()
and magically the memory isn't "leaked".

I think this is a little bit of a hack, though. It catches my:

  [core]
  editor = foo
  editor = foo

case, because we only ever make one copy of the string "foo". But if I
do:

  [core]
  editor = 1
  editor = 2
  ...

then we get unique strings. And while they're not _technically_ leaked,
since the hashmap still knows about them, they might as well be. They're
still wasting space through the duration of the program run.

-Peff


Re: [PATCH 2/2] sequencer.c: plug mem leak in git_sequencer_config

2018-06-03 Thread Junio C Hamano
Jeff King  writes:

> With that strategy, we'd have to have a big initialize_defaults()
> function. Which actually might not be _too_ bad since we now have
> common-main.c, but:
>
>   - it sucks to keep the default values far away from the declarations
>
>   - it does carry a runtime cost. Not a huge one, but it sucks to pay it
> on every program startup, even if we're not using those variables.

True, and s/even if/even though most of the time/ the situation is
even worse X-<.


Re: [PATCH 2/2] sequencer.c: plug mem leak in git_sequencer_config

2018-06-03 Thread Jeff King
On Mon, Jun 04, 2018 at 01:26:57PM +0900, Junio C Hamano wrote:

> > Doing it "right" in C would probably involve two variables:
> >
> >   const char *some_var = "default";
> >   const char *some_var_storage = NULL;
> >
> >   int git_config_string_smart(const char **ptr, char **storage,
> >   const char *var, const char *value)
> >   {
> > ...
> > free(*storage);
> > *ptr = *storage = xstrdup(value);
> > return 0;
> >   }
> >
> >   #define GIT_CONFIG_STRING(name, var, value) \
> >   git_config_string_smart(&(name), &(name##_storage), var, value)
> >
> > Or something like that.
> 
> The attitude the approach takes is that "run once and let exit(3)
> clean after us" programs *should* care.

Even with "let exit clean up", we are still leaking heap every time the
variable is assigned after the first. Again, I don't think it matters
that much in practice, but I think:

  [core]
  editor = foo
  editor = foo
  ...etc...

would leak arbitrary memory during the config parse, that would be
allocated for the remainder of the program. I guess you could say exit()
is handling it, but I think the point is that we are letting exit()
handle memory that was potentially useful until we exit, not leaks. :)

> And at that point, maybe
> 
>   char *some_var = xstrdup("default");
>   git_config_string(_var, ...);
> 
> that takes "char **" and frees the current storage before assigning
> to it may be simpler than the two-variable approach.

That _is_ much nicer, but you cannot use xstrdup() as the initializer
for a global "static char *some_var", which is what the majority of the
config variables are. It's this "static initializer sometimes, run-time
heap sometimes" duality to the variables that makes handling it such a
pain.

With that strategy, we'd have to have a big initialize_defaults()
function. Which actually might not be _too_ bad since we now have
common-main.c, but:

  - it sucks to keep the default values far away from the declarations

  - it does carry a runtime cost. Not a huge one, but it sucks to pay it
on every program startup, even if we're not using those variables.

-Peff


Re: [PATCH 2/2] sequencer.c: plug mem leak in git_sequencer_config

2018-06-03 Thread Junio C Hamano
Jeff King  writes:

> I've looked into it before, but that causes its own wave of headaches.
> The source of the problem is that we do:
>
>   const char *some_var = "default";
>   ...
>   git_config_string(_var, ...);

Yup, that is a valid pattern for "run once and let exit(3) clean
after us" programs.

> Doing it "right" in C would probably involve two variables:
>
>   const char *some_var = "default";
>   const char *some_var_storage = NULL;
>
>   int git_config_string_smart(const char **ptr, char **storage,
>   const char *var, const char *value)
>   {
> ...
>   free(*storage);
>   *ptr = *storage = xstrdup(value);
>   return 0;
>   }
>
>   #define GIT_CONFIG_STRING(name, var, value) \
>   git_config_string_smart(&(name), &(name##_storage), var, value)
>
> Or something like that.

The attitude the approach takes is that "run once and let exit(3)
clean after us" programs *should* care.  And at that point, maybe

char *some_var = xstrdup("default");
git_config_string(_var, ...);

that takes "char **" and frees the current storage before assigning
to it may be simpler than the two-variable approach.

But you're right.  We cannot just unconst the type and be done with
it---there are associated clean-up necessary if we were to do this.

Thanks.


Re: [PATCH 2/2] sequencer.c: plug mem leak in git_sequencer_config

2018-06-03 Thread Jeff King
On Mon, Jun 04, 2018 at 12:44:15PM +0900, Junio C Hamano wrote:

> >> diff --git a/sequencer.c b/sequencer.c
> >> index b98690ecd41..aba03e9429a 100644
> >> --- a/sequencer.c
> >> +++ b/sequencer.c
> >> @@ -175,6 +175,7 @@ static int git_sequencer_config(const char *k, const 
> >> char *v, void *cb)
> >>warning(_("invalid commit message cleanup mode '%s'"),
> >>  s);
> >>  
> >> +  free(s);
> >>return status;
> >>}
> >
> > Shouldn't 's' now lose 'const'?  After all, git_config_string()
> > gives you an allocated memory so...
> 
> Yikes.  Should git_config_string() and git_config_pathname() take
> "char **dst" instead of "const char **" as their out-location
> parameter?  They both assign a pointer to an allocated piece of
> memory for the caller to own or dispose of, but because of
> const-ness of the pointee their first parameter has, a caller like
> this one must declare "const char *s" yet is forced to call free()
> not to leak the value when it is done.

I've looked into it before, but that causes its own wave of headaches.
The source of the problem is that we do:

  const char *some_var = "default";
  ...
  git_config_string(_var, ...);

So sometimes some_var needs to be freed and sometimes not (and every one
of those uses is a potential leak, but it's OK because they're all
program-lifetime globals anyway, and people don't _tend_ to set the same
option over and over, leaking heap memory). And C being C, we can't
convert a pointer-to-pointer-to-const into a pointer-to-pointer without
an explicit cast.

Doing it "right" in C would probably involve two variables:

  const char *some_var = "default";
  const char *some_var_storage = NULL;

  int git_config_string_smart(const char **ptr, char **storage,
  const char *var, const char *value)
  {
...
free(*storage);
*ptr = *storage = xstrdup(value);
return 0;
  }

  #define GIT_CONFIG_STRING(name, var, value) \
  git_config_string_smart(&(name), &(name##_storage), var, value)

Or something like that.

We could also get away from an out-parameter and use the return type,
since the single-pointer conversion is OK. But the primary value of
git_config_string() is that it lets you return the error code as a
one-liner.

-Peff


Re: [PATCH 2/2] sequencer.c: plug mem leak in git_sequencer_config

2018-06-03 Thread Junio C Hamano
Junio C Hamano  writes:

> Stefan Beller  writes:
>
>> Signed-off-by: Stefan Beller 
>> ---
>>  sequencer.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/sequencer.c b/sequencer.c
>> index b98690ecd41..aba03e9429a 100644
>> --- a/sequencer.c
>> +++ b/sequencer.c
>> @@ -175,6 +175,7 @@ static int git_sequencer_config(const char *k, const 
>> char *v, void *cb)
>>  warning(_("invalid commit message cleanup mode '%s'"),
>>s);
>>  
>> +free(s);
>>  return status;
>>  }
>
> Shouldn't 's' now lose 'const'?  After all, git_config_string()
> gives you an allocated memory so...

Yikes.  Should git_config_string() and git_config_pathname() take
"char **dst" instead of "const char **" as their out-location
parameter?  They both assign a pointer to an allocated piece of
memory for the caller to own or dispose of, but because of
const-ness of the pointee their first parameter has, a caller like
this one must declare "const char *s" yet is forced to call free()
not to leak the value when it is done.



Re: [PATCH 2/2] sequencer.c: plug mem leak in git_sequencer_config

2018-06-03 Thread Junio C Hamano
Stefan Beller  writes:

> Signed-off-by: Stefan Beller 
> ---
>  sequencer.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/sequencer.c b/sequencer.c
> index b98690ecd41..aba03e9429a 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -175,6 +175,7 @@ static int git_sequencer_config(const char *k, const char 
> *v, void *cb)
>   warning(_("invalid commit message cleanup mode '%s'"),
> s);
>  
> + free(s);
>   return status;
>   }

Shouldn't 's' now lose 'const'?  After all, git_config_string()
gives you an allocated memory so...