Re: [PATCH v3 00/24] Index-v5
Junio C Hamano gits...@pobox.com writes:

> Duy Nguyen pclo...@gmail.com writes:
>
>> On Mon, Aug 19, 2013 at 2:41 AM, Thomas Gummerer t.gumme...@gmail.com wrote:
>>
>> I'm done reviewing this version (I neglected the extension writing
>> patches because after spending hours on the main write patch I
>> don't want to look at them anymore :p). Now that rc period is over,
>> with a partial write proof-of-concept, I think it's enough to call
>> Junio's attention on the series, see if we have any chance of
>> merging it. The partial write POC is needed to make sure we don't
>> overlook anything, just support update-index is enough.
>
> I've been following the review comment threads after looking at the
> patches myself when they were posted. I was hoping to see some API
> improvement over the current "we (have to) have everything available
> in-core in a flat array" model, which gives a lot of convenience and
> IO overhead at the same time, that would make me say "yes, this
> operation, that we need to do very often, will certainly be helped
> by this new API, and in order to support that style of API better,
> the current file format is inadequate and we do need to go to the
> proposed tree like on-disk format" for at least one, but
> unfortunately I haven't found any (yet).
>
> So...

I think the issue is a bit different. The current API, with some small
additions (e.g. read_index_filtered()), works well as an in-memory
format, even for partial reading/writing. I will try to write a POC for
partial writing to show that the current in-memory format works for
this too. As Duy wrote in the other email, some API changes will be
necessary to allow that, but not a big API change moving from a flat
array to a tree based format.

I think it comes down to "this operation will be helped by partial
loading/writing, and we need these small API changes
(read_index_filtered() for now, more to follow) and the index format
change to be able to do that".

Does that make sense, with at least Duy's comments in the review addressed and a POC for partial writing? -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 00/24] Index-v5
Duy Nguyen pclo...@gmail.com writes:

> On Mon, Aug 19, 2013 at 2:41 AM, Thomas Gummerer t.gumme...@gmail.com wrote:
>
> I'm done reviewing this version (I neglected the extension writing
> patches because after spending hours on the main write patch I
> don't want to look at them anymore :p). Now that rc period is over,
> with a partial write proof-of-concept, I think it's enough to call
> Junio's attention on the series, see if we have any chance of
> merging it. The partial write POC is needed to make sure we don't
> overlook anything, just support update-index is enough.

I've been following the review comment threads after looking at the
patches myself when they were posted. I was hoping to see some API
improvement over the current "we (have to) have everything available
in-core in a flat array" model, which gives a lot of convenience and
IO overhead at the same time, that would make me say "yes, this
operation, that we need to do very often, will certainly be helped by
this new API, and in order to support that style of API better, the
current file format is inadequate and we do need to go to the proposed
tree like on-disk format" for at least one, but unfortunately I
haven't found any (yet).

So...
Re: [PATCH v3 00/24] Index-v5
On Sun, Aug 25, 2013 at 10:07 AM, Junio C Hamano gits...@pobox.com wrote:

> Duy Nguyen pclo...@gmail.com writes:
>
>> On Mon, Aug 19, 2013 at 2:41 AM, Thomas Gummerer t.gumme...@gmail.com wrote:
>>
>> I'm done reviewing this version (I neglected the extension writing
>> patches because after spending hours on the main write patch I
>> don't want to look at them anymore :p). Now that rc period is over,
>> with a partial write proof-of-concept, I think it's enough to call
>> Junio's attention on the series, see if we have any chance of
>> merging it. The partial write POC is needed to make sure we don't
>> overlook anything, just support update-index is enough.
>
> I've been following the review comment threads after looking at the
> patches myself when they were posted. I was hoping to see some API
> improvement over the current "we (have to) have everything available
> in-core in a flat array" model, which gives a lot of convenience and
> IO overhead at the same time, that would make me say "yes, this
> operation, that we need to do very often, will certainly be helped
> by this new API, and in order to support that style of API better,
> the current file format is inadequate and we do need to go to the
> proposed tree like on-disk format" for at least one, but
> unfortunately I haven't found any (yet).

Thomas is in the best position to answer this, but I'll give it a try.
In my opinion, v2-4 works well for moderate-sized worktrees; v5 aims
to make the index scale better. One way to make it scale is to not
read the whole index when you only need a portion of it.
read_index_filtered() enables this. We could implement
read_index_filtered() on v2 too, but because v2 lacks a proper data
structure to support it, we would need to scan through all on-disk
entries. git diff and git status with pathspec may benefit from this
(and for large worktrees, people are better off using a pathspec than
a whole-tree status). The flat (but not full) array model seems the
best fit because we still need to support v2.

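To make the "scan through all on-disk entries" point concrete, here is a minimal toy sketch. It is not code from the series and does not use git's real structures: toy_dir, filter_flat() and filter_by_dir() are made-up names, and the directory table is only an assumed stand-in for a tree-like on-disk layout. It compares how many entries have to be examined to serve a single-directory filter under each layout.

```c
#include <stddef.h>
#include <string.h>

/* Toy model only: none of these names come from the patch series. */
struct toy_dir {
	const char *name; /* directory prefix, including trailing slash */
	size_t first;     /* position of its first entry in entries[] */
	size_t nr;        /* number of entries directly below it */
};

/* Entries sorted bytewise by path, as in a flat index. */
static const char *entries[] = {
	"Documentation/a.txt",
	"Documentation/b.txt",
	"builtin/grep.c",
	"builtin/ls-files.c",
	"t/test-lib.sh",
};
#define NR_ENTRIES (sizeof(entries) / sizeof(entries[0]))

/* Hypothetical directory table pointing into entries[]. */
static const struct toy_dir dirs[] = {
	{ "Documentation/", 0, 2 },
	{ "builtin/", 2, 2 },
	{ "t/", 4, 1 },
};
#define NR_DIRS (sizeof(dirs) / sizeof(dirs[0]))

/* Flat layout: every entry is examined to apply the prefix filter. */
size_t filter_flat(const char *prefix, size_t *examined)
{
	size_t i, matched = 0, len = strlen(prefix);

	*examined = 0;
	for (i = 0; i < NR_ENTRIES; i++) {
		(*examined)++;
		if (!strncmp(entries[i], prefix, len))
			matched++;
	}
	return matched;
}

/* Directory-indexed layout: locate the directory, visit only its entries. */
size_t filter_by_dir(const char *prefix, size_t *examined)
{
	size_t i, j, matched = 0;

	*examined = 0;
	for (i = 0; i < NR_DIRS; i++) {
		if (strcmp(dirs[i].name, prefix))
			continue;
		for (j = dirs[i].first; j < dirs[i].first + dirs[i].nr; j++) {
			(*examined)++;
			matched++;
		}
	}
	return matched;
}
```

On this toy data, filtering for "builtin/" matches two entries either way, but filter_flat() examines all five entries while filter_by_dir() examines only the two it returns. The real gain depends on how the on-disk format lets a reader skip unrelated directories, which this sketch does not model.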
Another v5 improvement is fast git add -u/git commit -a once partial
write is implemented. I don't think such a patch has been posted.
There may be API additions to aid the v5 code, but it should not be a
big API change.
--
Duy
Re: [PATCH v3 00/24] Index-v5
On Mon, Aug 19, 2013 at 2:41 AM, Thomas Gummerer t.gumme...@gmail.com wrote:

> Hi,
>
> previous rounds (without api) are at $gmane/202752, $gmane/202923,
> $gmane/203088 and $gmane/203517, the previous rounds with api were
> at $gmane/229732 and $gmane/230210. Thanks to Duy for reviewing the
> last round and Junio and Ramsay for additional comments.

I'm done reviewing this version (I neglected the extension writing
patches because after spending hours on the main write patch I don't
want to look at them anymore :p). Now that rc period is over, with a
partial write proof-of-concept, I think it's enough to call Junio's
attention on the series, see if we have any chance of merging it. The
partial write POC is needed to make sure we don't overlook anything,
just support update-index is enough.
--
Duy
[PATCH v3 00/24] Index-v5
Hi,

previous rounds (without api) are at $gmane/202752, $gmane/202923,
$gmane/203088 and $gmane/203517, the previous rounds with api were at
$gmane/229732 and $gmane/230210. Thanks to Duy for reviewing the last
round and Junio and Ramsay for additional comments.

Changes since the previous round:

read-cache: move index v2 specific functions to their own file
  - set istate->ops to NULL in discard_index

read-cache: add index reading api
  - style fixes
  - instead of using internal_ops struct, do for_each_index_entry in
    read-cache.c

grep.c: use index api
  - remove duplicate call to match_pathspec_depth

ls-files.c: use index api
  - load the whole index if there is a trai

documentation: add documentation of the index-v5 file format
  - fix typo
  - change the position of nfile and ndir in the index file
  - document that the conflicts are also stored in the fileentries
    block
  - document invalid flag

read-cache: read index-v5
  - restrict partial loading a bit more, by being more careful when
    adjusting the pathspec
  - move the ondisk structs from cache.h to read-cache-v5.c
  - merge for and while loop in read_entries
  - keep a directory tree instead of a flat list when reading the
    directories
  - ce_queue_push moved to read-cache: write index-v5 using a next_ce
    pointer instead of the next pointer that's already used by
    name-hash.
  - fix reading if there are extensions that are not yet supported
  - ignore entries that have the invalid flag set

read-cache: read cache-tree in index-v5
  - use the tree structure which is now used in read index-v5

read-cache: write index-v5
  - simplify compile_directory_data

changes to the index file format:
  - store the number of files before the number of directories in the
    header, so that the file command still can recognize the number
    of files in the repository correctly.
  - store all staged entries in the fileentries block. Doesn't hurt
    the performance a lot but simplifies the code.
  - add an invalid flag for entries that should be ignored. Currently
    unused but respected when reading. Will be used once the conflict
    resolution is done by flipping a bit in the conflict entries at
    the end of the index.

added commits:
  - read-cache: use fixed width integer types
  - read-cache: clear version in discard_index()
  - read-cache: Don't compare uid, gid and ino on cygwin
  - introduce GIT_INDEX_VERSION environment variable
  - test-lib: allow setting the index format version

Thomas Gummerer (23):
  t2104: Don't fail for index versions other than [23]
  read-cache: use fixed width integer types
  read-cache: split index file version specific functionality
  read-cache: clear version in discard_index()
  read-cache: move index v2 specific functions to their own file
  read-cache: Don't compare uid, gid and ino on cygwin
  read-cache: Re-read index if index file changed
  add documentation for the index api
  read-cache: add index reading api
  make sure partially read index is not changed
  grep.c: use index api
  ls-files.c: use index api
  documentation: add documentation of the index-v5 file format
  read-cache: make in-memory format aware of stat_crc
  read-cache: read index-v5
  read-cache: read resolve-undo data
  read-cache: read cache-tree in index-v5
  read-cache: write index-v5
  read-cache: write index-v5 cache-tree data
  read-cache: write resolve-undo data for index-v5
  update-index.c: rewrite index when index-version is given
  introduce GIT_INDEX_VERSION environment variable
  test-lib: allow setting the index format version

Thomas Rast (1):
  p0003-index.sh: add perf test for the index formats

 Documentation/technical/api-in-core-index.txt    |   54 +-
 Documentation/technical/index-file-format-v5.txt |  301 +
 Makefile                                         |   10 +
 builtin/apply.c                                  |    2 +
 builtin/grep.c                                   |   69 +-
 builtin/ls-files.c                               |   36 +-
 builtin/update-index.c                           |    6 +-
 cache-tree.c                                     |    2 +-
 cache-tree.h                                     |    1 +
 cache.h                                          |   93 +-
 read-cache-v2.c                                  |  550 +
 read-cache-v5.c                                  | 1417 ++
 read-cache.c                                     |  685 +++
 read-cache.h                                     |   61 +
 t/perf/p0003-index.sh                            |   63 +
 t/t2104-update-index-skip-worktree.sh            |    1 +
 t/test-lib-functions.sh                          |    5 +
 t/test-lib.sh                                    |    3 +
 test-index-version.c                             |    6 +
 unpack-trees.c                                   |    3 +-
 20 files changed, 2786 insertions(+), 582 deletions(-)
 create mode 100644
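As a rough illustration of the header change noted in the cover letter (number of files stored before number of directories, so that the file command keeps reporting a sensible count), here is a small sketch of how a generic reader might decode the leading header fields. The v2 header is a 4-byte "DIRC" signature, a 4-byte version and a 32-bit entry count, all in network byte order. That v5 keeps the signature and version in the same place and puts the file count at byte offset 8 is an assumption made here, not something confirmed by this message. The names toy_header, toy_get_be32() and parse_toy_header() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode a 32-bit big-endian (network order) value. */
uint32_t toy_get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* The fields a format-agnostic tool would look at. */
struct toy_header {
	uint32_t version;
	/*
	 * Entry count in v2; assumed to be the file count in v5,
	 * since it is stored before the directory count.
	 */
	uint32_t count_at_offset_8;
};

/*
 * Parse the first 12 bytes of an index file image.
 * Returns 0 on success, -1 if the buffer is too short or the
 * signature is not "DIRC".
 */
int parse_toy_header(const unsigned char *buf, size_t len,
		     struct toy_header *out)
{
	if (len < 12 || memcmp(buf, "DIRC", 4))
		return -1;
	out->version = toy_get_be32(buf + 4);
	out->count_at_offset_8 = toy_get_be32(buf + 8);
	return 0;
}
```

With the count kept at the same offset, a tool that only understands the first 12 bytes can still print a plausible number of files without knowing anything else about the newer format. The format document added by the series is the authoritative description of the actual layout.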