Re: [PATCH v3 2/2] read-cache: plug a few leaks

2013-06-09 Thread René Scharfe

Am 09.06.2013 04:25, schrieb Felipe Contreras:

On Sat, Jun 8, 2013 at 9:11 PM, René Scharfe
rene.scha...@lsrfire.ath.cx wrote:

Am 08.06.2013 19:27, schrieb Felipe Contreras:


On Sat, Jun 8, 2013 at 12:22 PM, René Scharfe
rene.scha...@lsrfire.ath.cx wrote:


Let's find and fix those leaks by freeing memory in the right places.
Freeing memory just in case in places where we can show that no leak is
triggered by our test suite doesn't help.



It helps; it prevents leaks. The real culprit is the bogus API, but I
don't see that changing anytime soon, so there are two options when
somebody makes a mistake the API allows; leak or don't leak. And you
seem to prefer the leak, even though it provides absolutely no
advantage.


It covers up bugs,


It doesn't. I thought you already silently agreed that nobody would
ever find that leak, as they haven't found the hundreds of leaks that
plague Git's code.


Nah, I explained non-silently that leakage was a design decision for 
short-running commands that allocate memory, use it and exit.  Reusing 
such code without freeing allocated memory between runs explicitly turns 
a good leak into a bad one, as we saw with cherry-pick --stdin.



What would be a better API?  Making discard_index free the array is a good
first step; what else is bogus?


'initialized' for starters; it should be renamed to 'loaded' or
removed, but removing it would require many more changes to make sure
we don't load twice. Also, when loading cache entries, it might make
sense to check if there's already entries that have not been
previously discarded properly.


Adding diagnostics that help find leaks is a good idea.

So, from reading the code, this sequence is OK:

discard_cache() // defined starting point
read_cache()// reads the cache
read_cache()// does nothing

And I guess this one is not OK:

discard_cache() // defined starting point
add_index_entry()   // add single entry
read_cache()// currently leaks, should warn/die

Any more sequences that we need to guard against, or counterexamples?


In the meantime, just in case, the only sane thing to do is free the
entries rather than leak.


I consider not plugging a leak which we don't know how to trigger with 
existing code even more sane.  Yay, circles! ;-)



That being said I'm not interested in this patch any more. The patch
is good yet after three tries and countless arguments it's still not
applied, nor is there any sign of getting there.


Let's take it step by step: Once the known leak is plugged we can worry 
about the unknown ones.  I'll send small patches.


René
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/2] read-cache: plug a few leaks

2013-06-09 Thread Felipe Contreras
On Sun, Jun 9, 2013 at 12:38 PM, René Scharfe
rene.scha...@lsrfire.ath.cx wrote:
 Am 09.06.2013 04:25, schrieb Felipe Contreras:

 It doesn't. I thought you already silently agreed that nobody would
 ever find that leak, as they haven't found the hundreds of leaks that
 plague Git's code.

 Nah, I explained non-silently that leakage was a design decision for
 short-running commands that allocate memory, use it and exit.  Reusing such
 code without freeing allocated memory between runs explicitly turns a good
 leak into a bad one, as we saw with cherry-pick --stdin.

git cherry-pick for multiple commits is already there, such a leak
would be a bad one, and nobody would find it. Valgrind doesn't know
about your concept of good leaks, all that you see is tons of leaks.

 Any more sequences that we need to guard against, or counterexamples?

I don't know.

 In the meantime, just in case, the only sane thing to do is free the
 entries rather than leak.

 I consider not plugging a leak which we don't know how to trigger with
 existing code even more sane.  Yay, circles! ;-)

There's absolutely no benefits to leaving the potential leak.

 Let's take it step by step: Once the known leak is plugged we can worry
 about the unknown ones.  I'll send small patches.

Go ahead. I'm not interested any more.

-- 
Felipe Contreras
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/2] read-cache: plug a few leaks

2013-06-09 Thread Junio C Hamano
René Scharfe rene.scha...@lsrfire.ath.cx writes:

 A comment before read_index_from says remember to discard_cache()
 before reading a different cache!.  That is probably a reminder that
 read_index_from does nothing if -initialized is set.  Entries added
 before calling read_index_from make up a different cache, however, so
 I think this comment applies for the call sequence above as well.

Yes, you can lose the probably from there.  Back when he comment was
added at 8fd2cb406917 (Extract helper bits from c-merge-recursive
work, 2006-07-25), there was only one in-core index, and checking
active_cache or cache_mmap is not NULL was a way to say Have we
loaded from $GIT_DIR/index already?  If so, there is nothing more to
do.

Later unpack_trees() added a way to populate the index not by
reading from $GIT_DIR/index and the original Has file, hence is
loaded, so ignore read_cache() caused issues.  A discussion was in 

http://thread.gmane.org/gmane.comp.version-control.git/93260/focus=93469

which brought the initialized bit in, which lead to 913e0e99b6a6
(unpack_trees(): protect the handcrafted in-core index from
read_cache(), 2008-08-23).

 Side note: I wonder why we need to guard against multiple
 read_index_from calls in a row with -initialized.  Wouldn't it be
 easier to avoid the duplicate calls in the first place?  Finding them
 now might be not so easy, though.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/2] read-cache: plug a few leaks

2013-06-08 Thread René Scharfe

Am 08.06.2013 00:29, schrieb Felipe Contreras:

We are not freeing 'istate-cache' properly.

We can't rely on 'initialized' to keep track of the 'istate-cache',
because it doesn't really mean it's initialized. So assume it always has
data, and free it before overwriting it.


That sounds a bit backwards to me.  If -initialized doesn't mean that 
the index state is initialized then something is fishy.  Would it make 
sense and be sufficient to set -initialized in add_index_entry?  Or to 
get rid of it and check for -cache_alloc instead?



Signed-off-by: Felipe Contreras felipe.contre...@gmail.com
---
  read-cache.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/read-cache.c b/read-cache.c
index 5e30746..a1dd04d 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1451,6 +1451,7 @@ int read_index_from(struct index_state *istate, const 
char *path)
istate-version = ntohl(hdr-hdr_version);
istate-cache_nr = ntohl(hdr-hdr_entries);
istate-cache_alloc = alloc_nr(istate-cache_nr);
+   free(istate-cache);
istate-cache = xcalloc(istate-cache_alloc, sizeof(*istate-cache));
istate-initialized = 1;


You wrote earlier that this change is safe with current callers and that 
it prevents leaks with the following sequence:


discard_cache();
# add entries
read_cache();

Do we currently have such a call sequence somewhere?  Wouldn't that be a 
bug, namely forgetting to call discard_cache before read_cache?


I've added a assert(istate-cache_nr == 0); a few lines above and the 
test suite still passed.  With the hunk below, -cache is also always 
NULL and cache_alloc is always 0 at that point.  So we don't need that 
free call there in the cases covered by the test suite at least -- 
better leave it out.



@@ -1512,6 +1513,9 @@ int discard_index(struct index_state *istate)

for (i = 0; i  istate-cache_nr; i++)
free(istate-cache[i]);
+   free(istate-cache);
+   istate-cache = NULL;
+   istate-cache_alloc = 0;
resolve_undo_clear_index(istate);
istate-cache_nr = 0;
istate-cache_changed = 0;


I still like this part, but also would still recommend to remove the now 
doubly-outdated comment a few lines below that says no need to throw 
away allocated active_cache.  It was valid back when there was only a 
single in-memory instance of the index and no istate parameter.


René

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/2] read-cache: plug a few leaks

2013-06-08 Thread Felipe Contreras
On Sat, Jun 8, 2013 at 6:32 AM, René Scharfe
rene.scha...@lsrfire.ath.cx wrote:
 Am 08.06.2013 00:29, schrieb Felipe Contreras:

 We are not freeing 'istate-cache' properly.

 We can't rely on 'initialized' to keep track of the 'istate-cache',
 because it doesn't really mean it's initialized. So assume it always has
 data, and free it before overwriting it.


 That sounds a bit backwards to me.  If -initialized doesn't mean that the
 index state is initialized then something is fishy.  Would it make sense and
 be sufficient to set -initialized in add_index_entry?

I don't know.

 Or to get rid of it and check for -cache_alloc instead?

That might make sense. I was thinking on renaming 'initialized' to
'loaded', but I really don't care.

 Signed-off-by: Felipe Contreras felipe.contre...@gmail.com
 ---
   read-cache.c | 4 
   1 file changed, 4 insertions(+)

 diff --git a/read-cache.c b/read-cache.c
 index 5e30746..a1dd04d 100644
 --- a/read-cache.c
 +++ b/read-cache.c
 @@ -1451,6 +1451,7 @@ int read_index_from(struct index_state *istate,
 const char *path)
 istate-version = ntohl(hdr-hdr_version);
 istate-cache_nr = ntohl(hdr-hdr_entries);
 istate-cache_alloc = alloc_nr(istate-cache_nr);
 +   free(istate-cache);
 istate-cache = xcalloc(istate-cache_alloc,
 sizeof(*istate-cache));
 istate-initialized = 1;


 You wrote earlier that this change is safe with current callers and that it
 prevents leaks with the following sequence:

 discard_cache();
 # add entries
 read_cache();

 Do we currently have such a call sequence somewhere?

I don't know.

 Wouldn't that be a
 bug, namely forgetting to call discard_cache before read_cache?

Why would it be a bug? There's nothing in the API that hints there's a
problem with that.

 I've added a assert(istate-cache_nr == 0); a few lines above and the test
 suite still passed.  With the hunk below, -cache is also always NULL and
 cache_alloc is always 0 at that point.  So we don't need that free call
 there in the cases covered by the test suite at least -- better leave it
 out.

Why leave it out? If somebody makes the mistake of doing the above
sequence, would you prefer that we leak?

-- 
Felipe Contreras
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/2] read-cache: plug a few leaks

2013-06-08 Thread René Scharfe

Am 08.06.2013 14:15, schrieb Felipe Contreras:

On Sat, Jun 8, 2013 at 6:32 AM, René Scharfe
rene.scha...@lsrfire.ath.cx wrote:

diff --git a/read-cache.c b/read-cache.c
index 5e30746..a1dd04d 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1451,6 +1451,7 @@ int read_index_from(struct index_state *istate,
const char *path)
 istate-version = ntohl(hdr-hdr_version);
 istate-cache_nr = ntohl(hdr-hdr_entries);
 istate-cache_alloc = alloc_nr(istate-cache_nr);
+   free(istate-cache);
 istate-cache = xcalloc(istate-cache_alloc,
sizeof(*istate-cache));
 istate-initialized = 1;



You wrote earlier that this change is safe with current callers and that it
prevents leaks with the following sequence:

discard_cache();
# add entries
read_cache();

Do we currently have such a call sequence somewhere?


I don't know.


Wouldn't that be a
bug, namely forgetting to call discard_cache before read_cache?


Why would it be a bug? There's nothing in the API that hints there's a
problem with that.


A comment before read_index_from says remember to discard_cache() 
before reading a different cache!.  That is probably a reminder that 
read_index_from does nothing if -initialized is set.  Entries added 
before calling read_index_from make up a different cache, however, so I 
think this comment applies for the call sequence above as well.


Only read_index_from and add_index_entry allocate -cache, and only 
discard_index frees it, so the two are a triple like malloc, realloc and 
free.


Granted, these hints are not part of the API -- that looks like a 
documentation bug, however.


Side note: I wonder why we need to guard against multiple 
read_index_from calls in a row with -initialized.  Wouldn't it be 
easier to avoid the duplicate calls in the first place?  Finding them 
now might be not so easy, though.



I've added a assert(istate-cache_nr == 0); a few lines above and the test
suite still passed.  With the hunk below, -cache is also always NULL and
cache_alloc is always 0 at that point.  So we don't need that free call
there in the cases covered by the test suite at least -- better leave it
out.


Why leave it out? If somebody makes the mistake of doing the above
sequence, would you prefer that we leak?


Leaking is better than silently cleaning up after a buggy caller because 
it still allows the underlying bug to be found.  Even better would be an 
assert, but it's important to make sure it doesn't trigger for existing 
use cases.


René

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/2] read-cache: plug a few leaks

2013-06-08 Thread Felipe Contreras
On Sat, Jun 8, 2013 at 8:22 AM, René Scharfe
rene.scha...@lsrfire.ath.cx wrote:
 Am 08.06.2013 14:15, schrieb Felipe Contreras:

 Why leave it out? If somebody makes the mistake of doing the above
 sequence, would you prefer that we leak?

 Leaking is better than silently cleaning up after a buggy caller because it
 still allows the underlying bug to be found.

No, it doesn't. The pointer is replaced and forever lost. How is
leaking memory helping anyone to find the bug?

-- 
Felipe Contreras
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/2] read-cache: plug a few leaks

2013-06-08 Thread René Scharfe

Am 08.06.2013 16:04, schrieb Felipe Contreras:

On Sat, Jun 8, 2013 at 8:22 AM, René Scharfe
rene.scha...@lsrfire.ath.cx wrote:

Am 08.06.2013 14:15, schrieb Felipe Contreras:



Why leave it out? If somebody makes the mistake of doing the above
sequence, would you prefer that we leak?


Leaking is better than silently cleaning up after a buggy caller because it
still allows the underlying bug to be found.


No, it doesn't. The pointer is replaced and forever lost. How is
leaking memory helping anyone to find the bug?


Valgrind can tell you where leaked memory was allocated, but not if you 
free it in the wrong place.


René

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/2] read-cache: plug a few leaks

2013-06-08 Thread Felipe Contreras
On Sat, Jun 8, 2013 at 10:56 AM, René Scharfe
rene.scha...@lsrfire.ath.cx wrote:
 Am 08.06.2013 16:04, schrieb Felipe Contreras:

 On Sat, Jun 8, 2013 at 8:22 AM, René Scharfe
 rene.scha...@lsrfire.ath.cx wrote:

 Am 08.06.2013 14:15, schrieb Felipe Contreras:


 Why leave it out? If somebody makes the mistake of doing the above
 sequence, would you prefer that we leak?


 Leaking is better than silently cleaning up after a buggy caller because
 it
 still allows the underlying bug to be found.


 No, it doesn't. The pointer is replaced and forever lost. How is
 leaking memory helping anyone to find the bug?

 Valgrind can tell you where leaked memory was allocated, but not if you free
 it in the wrong place.

It is the correct place to free it. And nobody will ever find it with
valgrind, just like nobody has ever tracked down the hundreds of leaks
in Git that happen reliably 100% of the time, much less the ones that
happen rarely.

-- 
Felipe Contreras
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/2] read-cache: plug a few leaks

2013-06-08 Thread René Scharfe

Am 08.06.2013 18:53, schrieb Felipe Contreras:

On Sat, Jun 8, 2013 at 10:56 AM, René Scharfe
rene.scha...@lsrfire.ath.cx wrote:

Am 08.06.2013 16:04, schrieb Felipe Contreras:


On Sat, Jun 8, 2013 at 8:22 AM, René Scharfe
rene.scha...@lsrfire.ath.cx wrote:


Am 08.06.2013 14:15, schrieb Felipe Contreras:




Why leave it out? If somebody makes the mistake of doing the above
sequence, would you prefer that we leak?



Leaking is better than silently cleaning up after a buggy caller because
it
still allows the underlying bug to be found.



No, it doesn't. The pointer is replaced and forever lost. How is
leaking memory helping anyone to find the bug?


Valgrind can tell you where leaked memory was allocated, but not if you free
it in the wrong place.


It is the correct place to free it. And nobody will ever find it with
valgrind, just like nobody has ever tracked down the hundreds of leaks
in Git that happen reliably 100% of the time, much less the ones that
happen rarely.


We could argue about freeing it here or adding a discard_index call 
somewhere else (which seems more natural to me) once we had a call 
sequence that actually causes such a leak.  The test suite contains 
none, so I'd say we need more tests first.


Lots of the existing leaks were not worth plugging because the process 
would end right after freeing the memory.  Leaving clean-up to the OS 
was a conscious design choice, simplifying the code.


When such code is reused in a library or run multiple times while it was 
originally meant to be run only a single time (like with cherry-pick 
--stdin) the original assumption doesn't hold any more and we have a 
problem.


Let's find and fix those leaks by freeing memory in the right places. 
Freeing memory just in case in places where we can show that no leak is 
triggered by our test suite doesn't help.


René

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/2] read-cache: plug a few leaks

2013-06-08 Thread René Scharfe

Am 08.06.2013 19:27, schrieb Felipe Contreras:

On Sat, Jun 8, 2013 at 12:22 PM, René Scharfe
rene.scha...@lsrfire.ath.cx wrote:


Let's find and fix those leaks by freeing memory in the right places.
Freeing memory just in case in places where we can show that no leak is
triggered by our test suite doesn't help.


It helps; it prevents leaks. The real culprit is the bogus API, but I
don't see that changing anytime soon, so there are two options when
somebody makes a mistake the API allows; leak or don't leak. And you
seem to prefer the leak, even though it provides absolutely no
advantage.


It covers up bugs, which may seem helpful, but isn't, because it's 
better to fix the actual mistake.


What would be a better API?  Making discard_index free the array is a 
good first step; what else is bogus?


René

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/2] read-cache: plug a few leaks

2013-06-08 Thread Felipe Contreras
On Sat, Jun 8, 2013 at 9:11 PM, René Scharfe
rene.scha...@lsrfire.ath.cx wrote:
 Am 08.06.2013 19:27, schrieb Felipe Contreras:

 On Sat, Jun 8, 2013 at 12:22 PM, René Scharfe
 rene.scha...@lsrfire.ath.cx wrote:

 Let's find and fix those leaks by freeing memory in the right places.
 Freeing memory just in case in places where we can show that no leak is
 triggered by our test suite doesn't help.


 It helps; it prevents leaks. The real culprit is the bogus API, but I
 don't see that changing anytime soon, so there are two options when
 somebody makes a mistake the API allows; leak or don't leak. And you
 seem to prefer the leak, even though it provides absolutely no
 advantage.

 It covers up bugs,

It doesn't. I thought you already silently agreed that nobody would
ever find that leak, as they haven't found the hundreds of leaks that
plague Git's code.

 What would be a better API?  Making discard_index free the array is a good
 first step; what else is bogus?

'initialized' for starters; it should be renamed to 'loaded' or
removed, but removing it would require many more changes to make sure
we don't load twice. Also, when loading cache entries, it might make
sense to check if there's already entries that have not been
previously discarded properly.

In the meantime, just in case, the only sane thing to do is free the
entries rather than leak.

That being said I'm not interested in this patch any more. The patch
is good yet after three tries and countless arguments it's still not
applied, nor is there any sign of getting there.

-- 
Felipe Contreras
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 2/2] read-cache: plug a few leaks

2013-06-07 Thread Felipe Contreras
We are not freeing 'istate-cache' properly.

We can't rely on 'initialized' to keep track of the 'istate-cache',
because it doesn't really mean it's initialized. So assume it always has
data, and free it before overwriting it.

Signed-off-by: Felipe Contreras felipe.contre...@gmail.com
---
 read-cache.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/read-cache.c b/read-cache.c
index 5e30746..a1dd04d 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1451,6 +1451,7 @@ int read_index_from(struct index_state *istate, const 
char *path)
istate-version = ntohl(hdr-hdr_version);
istate-cache_nr = ntohl(hdr-hdr_entries);
istate-cache_alloc = alloc_nr(istate-cache_nr);
+   free(istate-cache);
istate-cache = xcalloc(istate-cache_alloc, sizeof(*istate-cache));
istate-initialized = 1;
 
@@ -1512,6 +1513,9 @@ int discard_index(struct index_state *istate)
 
for (i = 0; i  istate-cache_nr; i++)
free(istate-cache[i]);
+   free(istate-cache);
+   istate-cache = NULL;
+   istate-cache_alloc = 0;
resolve_undo_clear_index(istate);
istate-cache_nr = 0;
istate-cache_changed = 0;
-- 
1.8.3.698.g079b096

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html