Re: [PATCH v3 00/24] Index-v5

2013-08-30 Thread Thomas Gummerer
Junio C Hamano gits...@pobox.com writes:

 Duy Nguyen pclo...@gmail.com writes:

 On Mon, Aug 19, 2013 at 2:41 AM, Thomas Gummerer t.gumme...@gmail.com 
 wrote:

 I'm done reviewing this version (I neglected the extension writing
 patches because after spending hours on the main write patch I don't
 want to look at them anymore :p). Now that rc period is over, with a
 partial write proof-of-concept, I think it's enough to call Junio's
 attention on the series, see if we have any chance of merging it. The
 partial write POC is needed to make sure we don't overlook anything,
 just support update-index is enough.

 I've been following the review comment threads after looking at the
 patches myself when they were posted. I was hoping to see some API
 improvement over the current "we (have to) have everything available
 in-core in a flat array" model, which gives a lot of convenience and
 IO overhead at the same time, that would make me say "yes, this
 operation, that we need to do very often, will certainly be helped
 by this new API, and in order to support that style of API better,
 the current file format is inadequate and we do need to go to the
 proposed tree like on-disk format" for at least one, but
 unfortunately I haven't found any (yet).

 So...

I think the issue is a bit different.  The current API, with some small
additions (e.g. read_index_filtered()), works well as an in-memory format,
even for partial reading/writing.  I will try to write a POC for partial
writing to show that the current in-memory format works for this too.
As Duy wrote in the other email, some API changes will be necessary to
allow that, but not a big API change moving from a flat array to a tree
based format.

I think it comes down to "this operation will be helped by partial
loading/writing, and we need these small API changes
(read_index_filtered() for now, more to follow) and the index format
change to be able to do that".
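
To make the "flat array, but only partially filled" idea concrete, here
is a minimal sketch in C. It is not code from the series: every name in
it (sketch_entry, sketch_index, sketch_read_filtered,
sketch_for_each_entry, sketch_count_entry) is hypothetical, the "on-disk"
data is just an array of strings, and the real read_index_filtered()
signature is not reproduced. It only illustrates that a filter at read
time plus callback iteration can leave the in-memory model flat.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX 16

/* Hypothetical stand-in for a cache entry; not git's struct cache_entry. */
struct sketch_entry {
	const char *path;
};

/* Still a flat array, but it may hold only the entries a filter let in. */
struct sketch_index {
	struct sketch_entry entries[SKETCH_MAX];
	size_t nr;
	int partially_read;	/* set when some on-disk entries were skipped */
};

typedef int (*sketch_each_fn)(const struct sketch_entry *ce, void *cb_data);

/* Fill the flat array with only the paths that start with `prefix`. */
static void sketch_read_filtered(struct sketch_index *idx,
				 const char *const *ondisk, size_t ondisk_nr,
				 const char *prefix)
{
	size_t i, plen = strlen(prefix);

	idx->nr = 0;
	idx->partially_read = 0;
	for (i = 0; i < ondisk_nr; i++) {
		if (strncmp(ondisk[i], prefix, plen)) {
			idx->partially_read = 1;
			continue;
		}
		assert(idx->nr < SKETCH_MAX);	/* the sketch has a fixed capacity */
		idx->entries[idx->nr++].path = ondisk[i];
	}
}

/* Callers iterate through a callback instead of indexing the array. */
static int sketch_for_each_entry(const struct sketch_index *idx,
				 sketch_each_fn fn, void *cb_data)
{
	size_t i;

	for (i = 0; i < idx->nr; i++) {
		int ret = fn(&idx->entries[i], cb_data);
		if (ret)
			return ret;	/* a non-zero return stops the walk */
	}
	return 0;
}

/* Example callback: count the entries that were loaded. */
static int sketch_count_entry(const struct sketch_entry *ce, void *cb_data)
{
	(void)ce;
	++*(size_t *)cb_data;
	return 0;
}
```

A caller would run sketch_read_filtered() with a prefix such as
"builtin/" and then walk the result with sketch_for_each_entry(). The
partially_read flag is there because code that writes the index has to
know that the array is not the whole index.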

Does that make sense, with at least Duy's comments in the review
addressed and a POC for partial writing?
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 00/24] Index-v5

2013-08-24 Thread Junio C Hamano
Duy Nguyen pclo...@gmail.com writes:

 On Mon, Aug 19, 2013 at 2:41 AM, Thomas Gummerer t.gumme...@gmail.com wrote:

 I'm done reviewing this version (I neglected the extension writing
 patches because after spending hours on the main write patch I don't
 want to look at them anymore :p). Now that rc period is over, with a
 partial write proof-of-concept, I think it's enough to call Junio's
 attention on the series, see if we have any chance of merging it. The
 partial write POC is needed to make sure we don't overlook anything,
 just support update-index is enough.

I've been following the review comment threads after looking at the
patches myself when they were posted. I was hoping to see some API
improvement over the current "we (have to) have everything available
in-core in a flat array" model, which gives a lot of convenience and
IO overhead at the same time, that would make me say "yes, this
operation, that we need to do very often, will certainly be helped
by this new API, and in order to support that style of API better,
the current file format is inadequate and we do need to go to the
proposed tree like on-disk format" for at least one, but
unfortunately I haven't found any (yet).

So...


Re: [PATCH v3 00/24] Index-v5

2013-08-24 Thread Duy Nguyen
On Sun, Aug 25, 2013 at 10:07 AM, Junio C Hamano gits...@pobox.com wrote:
 Duy Nguyen pclo...@gmail.com writes:

 On Mon, Aug 19, 2013 at 2:41 AM, Thomas Gummerer t.gumme...@gmail.com 
 wrote:

 I'm done reviewing this version (I neglected the extension writing
 patches because after spending hours on the main write patch I don't
 want to look at them anymore :p). Now that rc period is over, with a
 partial write proof-of-concept, I think it's enough to call Junio's
 attention on the series, see if we have any chance of merging it. The
 partial write POC is needed to make sure we don't overlook anything,
 just support update-index is enough.

 I've been following the review comment threads after looking at the
 patches myself when they were posted. I was hoping to see some API
 improvement over the current "we (have to) have everything available
 in-core in a flat array" model, which gives a lot of convenience and
 IO overhead at the same time, that would make me say "yes, this
 operation, that we need to do very often, will certainly be helped
 by this new API, and in order to support that style of API better,
 the current file format is inadequate and we do need to go to the
 proposed tree like on-disk format" for at least one, but
 unfortunately I haven't found any (yet).

Thomas is in the best position to answer this, but I'll give it a try.
In my opinion, v2-4 works well for moderate-sized worktrees, v5 aims to
make the index scale better. One way to make it scale is not to read
the whole index up when you only need a portion of the index.
read_index_filtered() enables this. We could implement
read_index_filtered() on v2 too, but because v2 lacks proper data
structure to support it, we need to scan through all on-disk entries.
"git diff" and "git status" with pathspec may benefit from this (and
for large worktrees, people better use pathspec than whole-tree
status). The flat (but not full) array model seems best fit because
we still need to support v2. Another v5 improvement is fast "git add
-u"/"git commit -a" when partial write is implemented. I don't think
such a patch is posted. There may be API addition to aid v5 code but
it should not be big API change.
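
A rough sketch in C of why the on-disk structure matters for a filtered
read. This is an illustration only: the record layout and all names
(sketch_dir, sketch_scan_flat, sketch_find_dir) are hypothetical, and the
actual v5 layout is the one in the series' index-file-format-v5.txt,
which is not reproduced here. The assumption is simply that a sorted
directory table tells the reader where each directory's files are, so a
pathspec limited to one directory does not need to touch every entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical directory record: where this directory's files live. */
struct sketch_dir {
	const char *path;	/* with trailing slash, "" for the top level */
	size_t first;		/* position of its first file in the file block */
	size_t nr;		/* number of files directly inside it */
};

/* Flat list of full paths: every entry has to be examined. */
static size_t sketch_scan_flat(const char *const *paths, size_t nr,
			       const char *dir, size_t *examined)
{
	size_t i, hits = 0, dlen = strlen(dir);

	assert(examined != NULL);
	*examined = 0;
	for (i = 0; i < nr; i++) {
		++*examined;
		if (!strncmp(paths[i], dir, dlen))
			hits++;
	}
	return hits;
}

/* bsearch() passes the key first, then a pointer to a table element. */
static int sketch_dir_cmp(const void *key, const void *elem)
{
	const struct sketch_dir *d = elem;

	return strcmp(key, d->path);
}

/* Sorted directory table: binary search, then read only `nr` files. */
static const struct sketch_dir *sketch_find_dir(const struct sketch_dir *dirs,
						size_t dir_nr, const char *dir)
{
	return bsearch(dir, dirs, dir_nr, sizeof(*dirs), sketch_dir_cmp);
}
```

With the flat list, the cost grows with the size of the whole index even
when only "builtin/" is wanted. With the directory table, the reader
finds the record and then reads the files in the range starting at
`first`.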
-- 
Duy


Re: [PATCH v3 00/24] Index-v5

2013-08-23 Thread Duy Nguyen
On Mon, Aug 19, 2013 at 2:41 AM, Thomas Gummerer t.gumme...@gmail.com wrote:
 Hi,

 previous rounds (without api) are at $gmane/202752, $gmane/202923,
 $gmane/203088 and $gmane/203517, the previous rounds with api were at
 $gmane/229732 and $gmane/230210.  Thanks to Duy for reviewing the
 last round and Junio and Ramsay for additional comments.

I'm done reviewing this version (I neglected the extension writing
patches because after spending hours on the main write patch I don't
want to look at them anymore :p). Now that rc period is over, with a
partial write proof-of-concept, I think it's enough to call Junio's
attention on the series, see if we have any chance of merging it. The
partial write POC is needed to make sure we don't overlook anything,
just support update-index is enough.
-- 
Duy


[PATCH v3 00/24] Index-v5

2013-08-18 Thread Thomas Gummerer
Hi,

previous rounds (without api) are at $gmane/202752, $gmane/202923,
$gmane/203088 and $gmane/203517, the previous rounds with api were at
$gmane/229732 and $gmane/230210.  Thanks to Duy for reviewing the
last round and Junio and Ramsay for additional comments.

Changes since the previous round:

read-cache: move index v2 specific functions to their own file
  - set istate->ops to NULL in discard_index

read-cache: add index reading api
  - style fixes
  - instead of using internal_ops struct, do for_each_index_entry in
    read-cache.c

grep.c: use index api
  - remove duplicate call to match_pathspec_depth

ls-files.c: use index api
  - load the whole index if there is a trai

documentation: add documentation of the index-v5 file format
  - fix typo
  - change the position of nfile and ndir in the index file
  - document that the conflicts are also stored in the fileentries
    block
  - document invalid flag

read-cache: read index-v5
  - restrict partial loading a bit more, by being more careful when
    adjusting the pathspec
  - move the ondisk structs from cache.h to read-cache-v5.c
  - merge for and while loop in read_entries
  - keep a directory tree instead of a flat list when reading the
    directories
  - ce_queue_push moved to read-cache: write index-v5 using a next_ce
    pointer instead of the next pointer that's already used by
    name-hash.
  - fix reading if there are extensions that are not yet supported
  - ignore entries that have the invalid flag set

read-cache: read cache-tree in index-v5
  - use the tree structure which is now used in read index-v5

read-cache: write index-v5
  - simplify compile_directory_data

changes to the index file format:
  - store the number of files before the number of directories in the
    header, so that the file command still can recognize the number of
    files in the repository correctly.
  - store all staged entries in the fileentries block. Doesn't hurt
    the performance a lot but simplifies the code.
  - add an invalid flag for entries that should be ignored.  currently
    unused but respected when reading.  will be used once the conflict
    resolution is done by flipping a bit in the conflict entries at the
    end of the index.
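
As an illustration of "unused but respected when reading", a reader that
honours such a flag could look roughly like the C sketch below. The
struct, the flag bit and the function name are hypothetical and are not
taken from the series; the only point is that flagged entries are
skipped and never reach the in-memory array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_FLAG_INVALID 0x1u	/* hypothetical bit, not the v5 value */

/* Hypothetical on-disk file entry, reduced to what the sketch needs. */
struct sketch_ondisk_entry {
	const char *name;
	uint16_t flags;
};

/*
 * Copy the names of all valid entries to `out`, which must have room
 * for `nr` pointers, and return how many were kept.
 */
static size_t sketch_read_valid(const struct sketch_ondisk_entry *in,
				size_t nr, const char **out)
{
	size_t i, kept = 0;

	assert(out != NULL);
	for (i = 0; i < nr; i++) {
		if (in[i].flags & SKETCH_FLAG_INVALID)
			continue;	/* ignored, as if it were not there */
		out[kept++] = in[i].name;
	}
	return kept;
}
```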
  
added commits:
  - read-cache: use fixed width integer types
  - read-cache: clear version in discard_index()
  - read-cache: Don't compare uid, gid and ino on cygwin
  - introduce GIT_INDEX_VERSION environment variable
  - test-lib: allow setting the index format version

Thomas Gummerer (23):
  t2104: Don't fail for index versions other than [23]
  read-cache: use fixed width integer types
  read-cache: split index file version specific functionality
  read-cache: clear version in discard_index()
  read-cache: move index v2 specific functions to their own file
  read-cache: Don't compare uid, gid and ino on cygwin
  read-cache: Re-read index if index file changed
  add documentation for the index api
  read-cache: add index reading api
  make sure partially read index is not changed
  grep.c: use index api
  ls-files.c: use index api
  documentation: add documentation of the index-v5 file format
  read-cache: make in-memory format aware of stat_crc
  read-cache: read index-v5
  read-cache: read resolve-undo data
  read-cache: read cache-tree in index-v5
  read-cache: write index-v5
  read-cache: write index-v5 cache-tree data
  read-cache: write resolve-undo data for index-v5
  update-index.c: rewrite index when index-version is given
  introduce GIT_INDEX_VERSION environment variable
  test-lib: allow setting the index format version

Thomas Rast (1):
  p0003-index.sh: add perf test for the index formats

 Documentation/technical/api-in-core-index.txt    |   54 +-
 Documentation/technical/index-file-format-v5.txt |  301 +
 Makefile                                         |   10 +
 builtin/apply.c                                  |    2 +
 builtin/grep.c                                   |   69 +-
 builtin/ls-files.c                               |   36 +-
 builtin/update-index.c                           |    6 +-
 cache-tree.c                                     |    2 +-
 cache-tree.h                                     |    1 +
 cache.h                                          |   93 +-
 read-cache-v2.c                                  |  550 +
 read-cache-v5.c                                  | 1417 ++
 read-cache.c                                     |  685 +++
 read-cache.h                                     |   61 +
 t/perf/p0003-index.sh                            |   63 +
 t/t2104-update-index-skip-worktree.sh            |    1 +
 t/test-lib-functions.sh                          |    5 +
 t/test-lib.sh                                    |    3 +
 test-index-version.c                             |    6 +
 unpack-trees.c                                   |    3 +-
 20 files changed, 2786 insertions(+), 582 deletions(-)
 create mode 100644