Re: [PATCH v3 2/2] read-cache: plug a few leaks
René Scharfe writes: > A comment before read_index_from says "remember to discard_cache() > before reading a different cache!". That is probably a reminder that > read_index_from does nothing if ->initialized is set. Entries added > before calling read_index_from make up a different cache, however, so > I think this comment applies for the call sequence above as well. Yes, you can lose the probably from there. Back when he comment was added at 8fd2cb406917 (Extract helper bits from c-merge-recursive work, 2006-07-25), there was only one in-core index, and checking active_cache or cache_mmap is not NULL was a way to say "Have we loaded from $GIT_DIR/index already? If so, there is nothing more to do". Later unpack_trees() added a way to populate the index not by reading from $GIT_DIR/index and the original "Has file, hence is loaded, so ignore read_cache()" caused issues. A discussion was in http://thread.gmane.org/gmane.comp.version-control.git/93260/focus=93469 which brought the "initialized" bit in, which lead to 913e0e99b6a6 (unpack_trees(): protect the handcrafted in-core index from read_cache(), 2008-08-23). > Side note: I wonder why we need to guard against multiple > read_index_from calls in a row with ->initialized. Wouldn't it be > easier to avoid the duplicate calls in the first place? Finding them > now might be not so easy, though. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/2] read-cache: plug a few leaks
On Sun, Jun 9, 2013 at 12:38 PM, René Scharfe wrote: > Am 09.06.2013 04:25, schrieb Felipe Contreras: >> It doesn't. I thought you already silently agreed that nobody would >> ever find that leak, as they haven't found the hundreds of leaks that >> plague Git's code. > > Nah, I explained non-silently that leakage was a design decision for > short-running commands that allocate memory, use it and exit. Reusing such > code without freeing allocated memory between runs explicitly turns a "good" > leak into a "bad" one, as we saw with cherry-pick --stdin. git cherry-pick for multiple commits is already there, such a leak would be a bad one, and nobody would find it. Valgrind doesn't know about your concept of "good" leaks, all that you see is tons of leaks. > Any more sequences that we need to guard against, or counterexamples? I don't know. >> In the meantime, just in case, the only sane thing to do is free the >> entries rather than leak. > > I consider not plugging a leak which we don't know how to trigger with > existing code even more sane. Yay, circles! ;-) There's absolutely no benefits to leaving the potential leak. > Let's take it step by step: Once the known leak is plugged we can worry > about the unknown ones. I'll send small patches. Go ahead. I'm not interested any more. -- Felipe Contreras -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/2] read-cache: plug a few leaks
Am 09.06.2013 04:25, schrieb Felipe Contreras: On Sat, Jun 8, 2013 at 9:11 PM, René Scharfe wrote: Am 08.06.2013 19:27, schrieb Felipe Contreras: On Sat, Jun 8, 2013 at 12:22 PM, René Scharfe wrote: Let's find and fix those leaks by freeing memory in the right places. Freeing memory just in case in places where we can show that no leak is triggered by our test suite doesn't help. It helps; it prevents leaks. The real culprit is the bogus API, but I don't see that changing anytime soon, so there are two options when somebody makes a mistake the API allows; leak or don't leak. And you seem to prefer the leak, even though it provides absolutely no advantage. It covers up bugs, It doesn't. I thought you already silently agreed that nobody would ever find that leak, as they haven't found the hundreds of leaks that plague Git's code. Nah, I explained non-silently that leakage was a design decision for short-running commands that allocate memory, use it and exit. Reusing such code without freeing allocated memory between runs explicitly turns a "good" leak into a "bad" one, as we saw with cherry-pick --stdin. What would be a better API? Making discard_index free the array is a good first step; what else is bogus? 'initialized' for starters; it should be renamed to 'loaded' or removed, but removing it would require many more changes to make sure we don't load twice. Also, when loading cache entries, it might make sense to check if there's already entries that have not been previously discarded properly. Adding diagnostics that help find leaks is a good idea. So, from reading the code, this sequence is OK: discard_cache() // defined starting point read_cache()// reads the cache read_cache()// does nothing And I guess this one is not OK: discard_cache() // defined starting point add_index_entry() // add single entry read_cache()// currently leaks, should warn/die Any more sequences that we need to guard against, or counterexamples? In the meantime, just in case, the only sane thing to do is free the entries rather than leak. I consider not plugging a leak which we don't know how to trigger with existing code even more sane. Yay, circles! ;-) That being said I'm not interested in this patch any more. The patch is good yet after three tries and countless arguments it's still not applied, nor is there any sign of getting there. Let's take it step by step: Once the known leak is plugged we can worry about the unknown ones. I'll send small patches. René -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/2] read-cache: plug a few leaks
On Sat, Jun 8, 2013 at 9:11 PM, René Scharfe wrote: > Am 08.06.2013 19:27, schrieb Felipe Contreras: > >> On Sat, Jun 8, 2013 at 12:22 PM, René Scharfe >> wrote: >> >>> Let's find and fix those leaks by freeing memory in the right places. >>> Freeing memory just in case in places where we can show that no leak is >>> triggered by our test suite doesn't help. >> >> >> It helps; it prevents leaks. The real culprit is the bogus API, but I >> don't see that changing anytime soon, so there are two options when >> somebody makes a mistake the API allows; leak or don't leak. And you >> seem to prefer the leak, even though it provides absolutely no >> advantage. > > It covers up bugs, It doesn't. I thought you already silently agreed that nobody would ever find that leak, as they haven't found the hundreds of leaks that plague Git's code. > What would be a better API? Making discard_index free the array is a good > first step; what else is bogus? 'initialized' for starters; it should be renamed to 'loaded' or removed, but removing it would require many more changes to make sure we don't load twice. Also, when loading cache entries, it might make sense to check if there's already entries that have not been previously discarded properly. In the meantime, just in case, the only sane thing to do is free the entries rather than leak. That being said I'm not interested in this patch any more. The patch is good yet after three tries and countless arguments it's still not applied, nor is there any sign of getting there. -- Felipe Contreras -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/2] read-cache: plug a few leaks
Am 08.06.2013 19:27, schrieb Felipe Contreras: On Sat, Jun 8, 2013 at 12:22 PM, René Scharfe wrote: Let's find and fix those leaks by freeing memory in the right places. Freeing memory just in case in places where we can show that no leak is triggered by our test suite doesn't help. It helps; it prevents leaks. The real culprit is the bogus API, but I don't see that changing anytime soon, so there are two options when somebody makes a mistake the API allows; leak or don't leak. And you seem to prefer the leak, even though it provides absolutely no advantage. It covers up bugs, which may seem helpful, but isn't, because it's better to fix the actual mistake. What would be a better API? Making discard_index free the array is a good first step; what else is bogus? René -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/2] read-cache: plug a few leaks
On Sat, Jun 8, 2013 at 12:22 PM, René Scharfe wrote: > Let's find and fix those leaks by freeing memory in the right places. > Freeing memory just in case in places where we can show that no leak is > triggered by our test suite doesn't help. It helps; it prevents leaks. The real culprit is the bogus API, but I don't see that changing anytime soon, so there are two options when somebody makes a mistake the API allows; leak or don't leak. And you seem to prefer the leak, even though it provides absolutely no advantage. -- Felipe Contreras -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/2] read-cache: plug a few leaks
Am 08.06.2013 18:53, schrieb Felipe Contreras: On Sat, Jun 8, 2013 at 10:56 AM, René Scharfe wrote: Am 08.06.2013 16:04, schrieb Felipe Contreras: On Sat, Jun 8, 2013 at 8:22 AM, René Scharfe wrote: Am 08.06.2013 14:15, schrieb Felipe Contreras: Why leave it out? If somebody makes the mistake of doing the above sequence, would you prefer that we leak? Leaking is better than silently cleaning up after a buggy caller because it still allows the underlying bug to be found. No, it doesn't. The pointer is replaced and forever lost. How is leaking memory helping anyone to find the bug? Valgrind can tell you where leaked memory was allocated, but not if you free it in the "wrong" place. It is the correct place to free it. And nobody will ever find it with valgrind, just like nobody has ever tracked down the hundreds of leaks in Git that happen reliably 100% of the time, much less the ones that happen rarely. We could argue about freeing it here or adding a discard_index call somewhere else (which seems more natural to me) once we had a call sequence that actually causes such a leak. The test suite contains none, so I'd say we need more tests first. Lots of the existing leaks were not worth plugging because the process would end right after freeing the memory. Leaving clean-up to the OS was a conscious design choice, simplifying the code. When such code is reused in a library or run multiple times while it was originally meant to be run only a single time (like with cherry-pick --stdin) the original assumption doesn't hold any more and we have a problem. Let's find and fix those leaks by freeing memory in the right places. Freeing memory just in case in places where we can show that no leak is triggered by our test suite doesn't help. René -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/2] read-cache: plug a few leaks
On Sat, Jun 8, 2013 at 10:56 AM, René Scharfe wrote: > Am 08.06.2013 16:04, schrieb Felipe Contreras: > >> On Sat, Jun 8, 2013 at 8:22 AM, René Scharfe >> wrote: >>> >>> Am 08.06.2013 14:15, schrieb Felipe Contreras: >> >> Why leave it out? If somebody makes the mistake of doing the above sequence, would you prefer that we leak? >>> >>> >>> Leaking is better than silently cleaning up after a buggy caller because >>> it >>> still allows the underlying bug to be found. >> >> >> No, it doesn't. The pointer is replaced and forever lost. How is >> leaking memory helping anyone to find the bug? > > Valgrind can tell you where leaked memory was allocated, but not if you free > it in the "wrong" place. It is the correct place to free it. And nobody will ever find it with valgrind, just like nobody has ever tracked down the hundreds of leaks in Git that happen reliably 100% of the time, much less the ones that happen rarely. -- Felipe Contreras -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/2] read-cache: plug a few leaks
Am 08.06.2013 16:04, schrieb Felipe Contreras: On Sat, Jun 8, 2013 at 8:22 AM, René Scharfe wrote: Am 08.06.2013 14:15, schrieb Felipe Contreras: Why leave it out? If somebody makes the mistake of doing the above sequence, would you prefer that we leak? Leaking is better than silently cleaning up after a buggy caller because it still allows the underlying bug to be found. No, it doesn't. The pointer is replaced and forever lost. How is leaking memory helping anyone to find the bug? Valgrind can tell you where leaked memory was allocated, but not if you free it in the "wrong" place. René -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/2] read-cache: plug a few leaks
On Sat, Jun 8, 2013 at 8:22 AM, René Scharfe wrote: > Am 08.06.2013 14:15, schrieb Felipe Contreras: >> Why leave it out? If somebody makes the mistake of doing the above >> sequence, would you prefer that we leak? > > Leaking is better than silently cleaning up after a buggy caller because it > still allows the underlying bug to be found. No, it doesn't. The pointer is replaced and forever lost. How is leaking memory helping anyone to find the bug? -- Felipe Contreras -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/2] read-cache: plug a few leaks
Am 08.06.2013 14:15, schrieb Felipe Contreras: On Sat, Jun 8, 2013 at 6:32 AM, René Scharfe wrote: diff --git a/read-cache.c b/read-cache.c index 5e30746..a1dd04d 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1451,6 +1451,7 @@ int read_index_from(struct index_state *istate, const char *path) istate->version = ntohl(hdr->hdr_version); istate->cache_nr = ntohl(hdr->hdr_entries); istate->cache_alloc = alloc_nr(istate->cache_nr); + free(istate->cache); istate->cache = xcalloc(istate->cache_alloc, sizeof(*istate->cache)); istate->initialized = 1; You wrote earlier that this change is safe with current callers and that it prevents leaks with the following sequence: discard_cache(); # add entries read_cache(); Do we currently have such a call sequence somewhere? I don't know. Wouldn't that be a bug, namely forgetting to call discard_cache before read_cache? Why would it be a bug? There's nothing in the API that hints there's a problem with that. A comment before read_index_from says "remember to discard_cache() before reading a different cache!". That is probably a reminder that read_index_from does nothing if ->initialized is set. Entries added before calling read_index_from make up a different cache, however, so I think this comment applies for the call sequence above as well. Only read_index_from and add_index_entry allocate ->cache, and only discard_index frees it, so the two are a triple like malloc, realloc and free. Granted, these hints are not part of the API -- that looks like a documentation bug, however. Side note: I wonder why we need to guard against multiple read_index_from calls in a row with ->initialized. Wouldn't it be easier to avoid the duplicate calls in the first place? Finding them now might be not so easy, though. I've added a "assert(istate->cache_nr == 0);" a few lines above and the test suite still passed. With the hunk below, ->cache is also always NULL and cache_alloc is always 0 at that point. So we don't need that free call there in the cases covered by the test suite at least -- better leave it out. Why leave it out? If somebody makes the mistake of doing the above sequence, would you prefer that we leak? Leaking is better than silently cleaning up after a buggy caller because it still allows the underlying bug to be found. Even better would be an assert, but it's important to make sure it doesn't trigger for existing use cases. René -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/2] read-cache: plug a few leaks
On Sat, Jun 8, 2013 at 6:32 AM, René Scharfe wrote: > Am 08.06.2013 00:29, schrieb Felipe Contreras: > >> We are not freeing 'istate->cache' properly. >> >> We can't rely on 'initialized' to keep track of the 'istate->cache', >> because it doesn't really mean it's initialized. So assume it always has >> data, and free it before overwriting it. > > > That sounds a bit backwards to me. If ->initialized doesn't mean that the > index state is initialized then something is fishy. Would it make sense and > be sufficient to set ->initialized in add_index_entry? I don't know. > Or to get rid of it and check for ->cache_alloc instead? That might make sense. I was thinking on renaming 'initialized' to 'loaded', but I really don't care. >> Signed-off-by: Felipe Contreras >> --- >> read-cache.c | 4 >> 1 file changed, 4 insertions(+) >> >> diff --git a/read-cache.c b/read-cache.c >> index 5e30746..a1dd04d 100644 >> --- a/read-cache.c >> +++ b/read-cache.c >> @@ -1451,6 +1451,7 @@ int read_index_from(struct index_state *istate, >> const char *path) >> istate->version = ntohl(hdr->hdr_version); >> istate->cache_nr = ntohl(hdr->hdr_entries); >> istate->cache_alloc = alloc_nr(istate->cache_nr); >> + free(istate->cache); >> istate->cache = xcalloc(istate->cache_alloc, >> sizeof(*istate->cache)); >> istate->initialized = 1; > > > You wrote earlier that this change is safe with current callers and that it > prevents leaks with the following sequence: > > discard_cache(); > # add entries > read_cache(); > > Do we currently have such a call sequence somewhere? I don't know. > Wouldn't that be a > bug, namely forgetting to call discard_cache before read_cache? Why would it be a bug? There's nothing in the API that hints there's a problem with that. > I've added a "assert(istate->cache_nr == 0);" a few lines above and the test > suite still passed. With the hunk below, ->cache is also always NULL and > cache_alloc is always 0 at that point. So we don't need that free call > there in the cases covered by the test suite at least -- better leave it > out. Why leave it out? If somebody makes the mistake of doing the above sequence, would you prefer that we leak? -- Felipe Contreras -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/2] read-cache: plug a few leaks
Am 08.06.2013 00:29, schrieb Felipe Contreras: We are not freeing 'istate->cache' properly. We can't rely on 'initialized' to keep track of the 'istate->cache', because it doesn't really mean it's initialized. So assume it always has data, and free it before overwriting it. That sounds a bit backwards to me. If ->initialized doesn't mean that the index state is initialized then something is fishy. Would it make sense and be sufficient to set ->initialized in add_index_entry? Or to get rid of it and check for ->cache_alloc instead? Signed-off-by: Felipe Contreras --- read-cache.c | 4 1 file changed, 4 insertions(+) diff --git a/read-cache.c b/read-cache.c index 5e30746..a1dd04d 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1451,6 +1451,7 @@ int read_index_from(struct index_state *istate, const char *path) istate->version = ntohl(hdr->hdr_version); istate->cache_nr = ntohl(hdr->hdr_entries); istate->cache_alloc = alloc_nr(istate->cache_nr); + free(istate->cache); istate->cache = xcalloc(istate->cache_alloc, sizeof(*istate->cache)); istate->initialized = 1; You wrote earlier that this change is safe with current callers and that it prevents leaks with the following sequence: discard_cache(); # add entries read_cache(); Do we currently have such a call sequence somewhere? Wouldn't that be a bug, namely forgetting to call discard_cache before read_cache? I've added a "assert(istate->cache_nr == 0);" a few lines above and the test suite still passed. With the hunk below, ->cache is also always NULL and cache_alloc is always 0 at that point. So we don't need that free call there in the cases covered by the test suite at least -- better leave it out. @@ -1512,6 +1513,9 @@ int discard_index(struct index_state *istate) for (i = 0; i < istate->cache_nr; i++) free(istate->cache[i]); + free(istate->cache); + istate->cache = NULL; + istate->cache_alloc = 0; resolve_undo_clear_index(istate); istate->cache_nr = 0; istate->cache_changed = 0; I still like this part, but also would still recommend to remove the now doubly-outdated comment a few lines below that says "no need to throw away allocated active_cache". It was valid back when there was only a single in-memory instance of the index and no istate parameter. René -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 2/2] read-cache: plug a few leaks
We are not freeing 'istate->cache' properly. We can't rely on 'initialized' to keep track of the 'istate->cache', because it doesn't really mean it's initialized. So assume it always has data, and free it before overwriting it. Signed-off-by: Felipe Contreras --- read-cache.c | 4 1 file changed, 4 insertions(+) diff --git a/read-cache.c b/read-cache.c index 5e30746..a1dd04d 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1451,6 +1451,7 @@ int read_index_from(struct index_state *istate, const char *path) istate->version = ntohl(hdr->hdr_version); istate->cache_nr = ntohl(hdr->hdr_entries); istate->cache_alloc = alloc_nr(istate->cache_nr); + free(istate->cache); istate->cache = xcalloc(istate->cache_alloc, sizeof(*istate->cache)); istate->initialized = 1; @@ -1512,6 +1513,9 @@ int discard_index(struct index_state *istate) for (i = 0; i < istate->cache_nr; i++) free(istate->cache[i]); + free(istate->cache); + istate->cache = NULL; + istate->cache_alloc = 0; resolve_undo_clear_index(istate); istate->cache_nr = 0; istate->cache_changed = 0; -- 1.8.3.698.g079b096 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html