Re: [PATCH 0/4] Handling unmerged files with merged entries
Johannes Sixt writes: > Am 21.08.2014 00:19, schrieb Junio C Hamano: >> For that, we need to catch an index whose entries are not sorted and >> error out, perhaps when read_index_from() iterates over the mmapped >> index entries. We can even draw that "hopelessly corrupt" line >> above the breakage you are addressing and add a check to make sure >> no path has both merged and unmerged entries to the same check to >> make it error out. > > Except that we can't declare an index with both merged and unmerged > entries as "hopelessly corrupt, return to sender" when it's dead easy to > generate with the git tool set: > > >x > name=$(git hash-object -w x) > for i in 0 1 2 3; do printf '100644 %s %d\tx\n' $name $i; done | > git update-index --index-info Because hash-object and update-index deliberately have these holes to allow us (read: me ;-) to easily experiment new and/or unallowed formats, I wouldn't take that as a serious objection. It is dead easy to corrupt your repository or lose your data by /bin/rm, too ;-) -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4] Handling unmerged files with merged entries
Am 21.08.2014 00:19, schrieb Junio C Hamano: > For that, we need to catch an index whose entries are not sorted and > error out, perhaps when read_index_from() iterates over the mmapped > index entries. We can even draw that "hopelessly corrupt" line > above the breakage you are addressing and add a check to make sure > no path has both merged and unmerged entries to the same check to > make it error out. Except that we can't declare an index with both merged and unmerged entries as "hopelessly corrupt, return to sender" when it's dead easy to generate with the git tool set: >x name=$(git hash-object -w x) for i in 0 1 2 3; do printf '100644 %s %d\tx\n' $name $i; done | git update-index --index-info -- Hannes -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4] Handling unmerged files with merged entries
Good points. On Thu, Aug 21, 2014 at 12:19 AM, Junio C Hamano wrote: > After looking at what you did in 1/4, I started to wonder if we can > solve this in add_index_entry_with_check() in a less intrusive way. > When we call the function with a stage #0 entry, we are telling the > index that any entry in higher stage for the same path must > disappear. Since the current implementation of the function assumes > that the index is not corrupt in this particular way to have both > merged and unmerged entries for the same path, it fails to remove > the higher stage entries. If we fix the function, wouldn't it make > your 1/4 unnecessary? Read-only operations such as "ls-files -s" > would not call add_index_entry() so diagnostic tools would not be > affected even with such a fix. > Another thing that is done in 1/4 is to get rid of the call to index_name_pos, that can lead to infinite loops depending on what the previous add_index_entry call does as we have seen, and I wonder why is it really needed, specially if we guarantee the order in the index. > ... which may look something like the one attached at the end. > And it would be more in the line of my first patch. > But then it made me wonder even more. > > There are other ways a piece of software can leave a corrupt index > for us to read from. Your fix, or the simpler one I suggested for > that matter, would still assume that the index entries are in the > sorted order, and a corrupt index that does not sort its entries > correctly will cause us to behave in an undefined way. At some > point we should draw a line and say "Your index is hopelessly > corrupt.", send it back to whatever broken software that originally > wrote such a mess and have the user use that software to fix the > corrupt index up before talking to us. > True. > For that, we need to catch an index whose entries are not sorted and > error out, perhaps when read_index_from() iterates over the mmapped > index entries. We can even draw that "hopelessly corrupt" line > above the breakage you are addressing and add a check to make sure > no path has both merged and unmerged entries to the same check to > make it error out. > > I suspect that such a "detect and error out" may be sufficient and > also may be more robust than the approach that assumes that a > breakage is only to have both merged and unmerged entries for the > same path, the entries are still correctly sorted. > Agree. I have prepared an initial patch for this to discuss, but adding checks in read_index_from() can add a small(?) penalization to all git operations, specially with big indexes. And it wouldn't probably allow the user to fix the repository using git commands (unless we only warn instead of die depending on the thing that is broken). -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4] Handling unmerged files with merged entries
Jaime Soriano Pastor writes: > New approach for the case of finding unmerged files with merged entries > in the index. > After some discussion the solution tries to: > - Avoid the problems with infinite loops in this case. > - Provide better information to the user in the commands affected. > - Make sure there are ways to clean the index. > - Provide also a way to specifically recover each one of the files with > this problem. > > With these patches the behaviour of these commands (for this case) change: > - git reset is able to finish, cleaning the index, but warning out the > information about the removed stages. > - git merge is able to finish, reporting that there is a merge in progress as > usual, it also warns about the unmerged files with merged entries. > - git add fails when this case happens, telling the user to check the state > with 'git ls-files -s', before, it did nothing. The same with git commit -a. > - git update-index --cacheinfo can be used to select an specific staged > version to resolve the conflict, without the need of reseting the working > copy. It did nothing before. > > Tests added for these cases. Rest of the tests remain unchanged and pass too. Thanks. After looking at what you did in 1/4, I started to wonder if we can solve this in add_index_entry_with_check() in a less intrusive way. When we call the function with a stage #0 entry, we are telling the index that any entry in higher stage for the same path must disappear. Since the current implementation of the function assumes that the index is not corrupt in this particular way to have both merged and unmerged entries for the same path, it fails to remove the higher stage entries. If we fix the function, wouldn't it make your 1/4 unnecessary? Read-only operations such as "ls-files -s" would not call add_index_entry() so diagnostic tools would not be affected even with such a fix. ... which may look something like the one attached at the end. But then it made me wonder even more. There are other ways a piece of software can leave a corrupt index for us to read from. Your fix, or the simpler one I suggested for that matter, would still assume that the index entries are in the sorted order, and a corrupt index that does not sort its entries correctly will cause us to behave in an undefined way. At some point we should draw a line and say "Your index is hopelessly corrupt.", send it back to whatever broken software that originally wrote such a mess and have the user use that software to fix the corrupt index up before talking to us. For that, we need to catch an index whose entries are not sorted and error out, perhaps when read_index_from() iterates over the mmapped index entries. We can even draw that "hopelessly corrupt" line above the breakage you are addressing and add a check to make sure no path has both merged and unmerged entries to the same check to make it error out. I suspect that such a "detect and error out" may be sufficient and also may be more robust than the approach that assumes that a breakage is only to have both merged and unmerged entries for the same path, the entries are still correctly sorted. read-cache.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/read-cache.c b/read-cache.c index 7f5645e..56006a3 100644 --- a/read-cache.c +++ b/read-cache.c @@ -943,9 +943,16 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e if (pos >= 0) { if (!new_only) replace_index_entry(istate, pos, ce); - return 0; + /* +* ... but protect ourselves from a corrupt index +* that has an unmerged entry for the same path. +*/ + if (istate->cache_nr <= pos + 1 || + !ce_same_name(ce, istate->cache[pos + 1])) + return 0; + } else { + pos = -pos-1; } - pos = -pos-1; /* * Inserting a merged entry ("stage 0") into the index [Footnote] -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html