Re: [PATCH 1/1] read-cache: update index format default to v4

2018-09-26 Thread Matthias Sohn
On Tue, Sep 25, 2018 at 8:01 PM Stefan Beller  wrote:
>
> On Tue, Sep 25, 2018 at 7:30 AM Derrick Stolee  wrote:
> >
> > On 9/25/2018 3:06 AM, Patrick Steinhardt wrote:
> > > On Mon, Sep 24, 2018 at 11:32:23PM +0200, SZEDER Gábor wrote:
> > >> On Mon, Sep 24, 2018 at 02:15:30PM -0700, Derrick Stolee via 
> > >> GitGitGadget wrote:
> > >>> From: Derrick Stolee 
> > >>>
> > >>> The index v4 format has been available since 2012 with 9d22778
> > >>> "reach-cache.c: write prefix-compressed names in the index". Since
> > >>> the format has been stable for so long, almost all versions of Git
> > >>> in use today understand version 4, removing one barrier to upgrade
> > >>> -- that someone may want to downgrade and needs a working repo.
> > >> What about alternative implementations, like JGit, libgit2, etc.?
> > > Speaking of libgit2, we are able to read and write index v4 since
> > > commit c1b370e93
> >
> > This is a good point, Szeder.
> >
> > Patrick: I'm glad LibGit2 is up-to-date with index formats.
> >
> > Unfortunately, taking a look (for the first time) at the JGit code
> > reveals that they don't appear to have v4 support. In
> > org.eclipse.jgit/src/org/eclipse/jgit/dircache/DirCache.java, the
> > DirCache.readFrom() method: lines 488-494, I see the following snippet:
> >
> >  final int ver = NB.decodeInt32(hdr, 4);
> >  boolean extended = false;
> >  if (ver == 3)
> >  extended = true;
> >  else if (ver != 2)
> >  throw new
> > CorruptObjectException(MessageFormat.format(
> > JGitText.get().unknownDIRCVersion, Integer.valueOf(ver)));
> >
> > It looks like this will immediately throw with versions other than 2 or 3.
> >
> > I'm adding Jonathan Nieder to CC so he can check with JGit people about
> > the impact of this change.
>
> JGit is used both on the server (which doesn't use index/staging area)
> as well as client side as e.g. an Eclipse integration, which would
> very much like to use the index.
>
> Adding Matthias Sohn as well, who is active in JGit and cares
> more about the client side than Googlers who only make use
> of the server side part of JGit.

thanks for the heads up, in fact JGit does not yet support index version 4.
I will look into this.

-Matthias


Re: [PATCH 1/1] read-cache: update index format default to v4

2018-09-25 Thread Ben Peart




On 9/25/2018 2:01 PM, Stefan Beller wrote:

On Tue, Sep 25, 2018 at 7:30 AM Derrick Stolee  wrote:


On 9/25/2018 3:06 AM, Patrick Steinhardt wrote:

On Mon, Sep 24, 2018 at 11:32:23PM +0200, SZEDER Gábor wrote:

On Mon, Sep 24, 2018 at 02:15:30PM -0700, Derrick Stolee via GitGitGadget wrote:

From: Derrick Stolee 

The index v4 format has been available since 2012 with 9d22778
"reach-cache.c: write prefix-compressed names in the index". Since
the format has been stable for so long, almost all versions of Git
in use today understand version 4, removing one barrier to upgrade
-- that someone may want to downgrade and needs a working repo.

What about alternative implementations, like JGit, libgit2, etc.?

Speaking of libgit2, we are able to read and write index v4 since
commit c1b370e93


This is a good point, Szeder.

Patrick: I'm glad LibGit2 is up-to-date with index formats.

Unfortunately, taking a look (for the first time) at the JGit code
reveals that they don't appear to have v4 support. In
org.eclipse.jgit/src/org/eclipse/jgit/dircache/DirCache.java, the
DirCache.readFrom() method: lines 488-494, I see the following snippet:

  final int ver = NB.decodeInt32(hdr, 4);
  boolean extended = false;
  if (ver == 3)
  extended = true;
  else if (ver != 2)
  throw new
CorruptObjectException(MessageFormat.format(
JGitText.get().unknownDIRCVersion, Integer.valueOf(ver)));

It looks like this will immediately throw with versions other than 2 or 3.

I'm adding Jonathan Nieder to CC so he can check with JGit people about
the impact of this change.


JGit is used both on the server (which doesn't use index/staging area)
as well as client side as e.g. an Eclipse integration, which would
very much like to use the index.

Adding Matthias Sohn as well, who is active in JGit and cares
more about the client side than Googlers who only make use
of the server side part of JGit.

Stefan



In addition to the compatibility with other implementations of git 
question, I think we should include Duy's patch [1] to "optimize reading 
index format v4" before flipping the default to V4.


In my testing, this reduced the index load time of V4 by 10%-15% making 
the CPU cost only ~2% more expensive than V2 or V3 indexes.  This 
increase in CPU cost is more than offset by the significant reduction in 
file IO due to the reduced index file size.


[1] https://public-inbox.org/git/20180825064458.28484-1-pclo...@gmail.com/


Re: [PATCH 1/1] read-cache: update index format default to v4

2018-09-25 Thread Stefan Beller
On Tue, Sep 25, 2018 at 7:30 AM Derrick Stolee  wrote:
>
> On 9/25/2018 3:06 AM, Patrick Steinhardt wrote:
> > On Mon, Sep 24, 2018 at 11:32:23PM +0200, SZEDER Gábor wrote:
> >> On Mon, Sep 24, 2018 at 02:15:30PM -0700, Derrick Stolee via GitGitGadget 
> >> wrote:
> >>> From: Derrick Stolee 
> >>>
> >>> The index v4 format has been available since 2012 with 9d22778
> >>> "reach-cache.c: write prefix-compressed names in the index". Since
> >>> the format has been stable for so long, almost all versions of Git
> >>> in use today understand version 4, removing one barrier to upgrade
> >>> -- that someone may want to downgrade and needs a working repo.
> >> What about alternative implementations, like JGit, libgit2, etc.?
> > Speaking of libgit2, we are able to read and write index v4 since
> > commit c1b370e93
>
> This is a good point, Szeder.
>
> Patrick: I'm glad LibGit2 is up-to-date with index formats.
>
> Unfortunately, taking a look (for the first time) at the JGit code
> reveals that they don't appear to have v4 support. In
> org.eclipse.jgit/src/org/eclipse/jgit/dircache/DirCache.java, the
> DirCache.readFrom() method: lines 488-494, I see the following snippet:
>
>  final int ver = NB.decodeInt32(hdr, 4);
>  boolean extended = false;
>  if (ver == 3)
>  extended = true;
>  else if (ver != 2)
>  throw new
> CorruptObjectException(MessageFormat.format(
> JGitText.get().unknownDIRCVersion, Integer.valueOf(ver)));
>
> It looks like this will immediately throw with versions other than 2 or 3.
>
> I'm adding Jonathan Nieder to CC so he can check with JGit people about
> the impact of this change.

JGit is used both on the server (which doesn't use index/staging area)
as well as client side as e.g. an Eclipse integration, which would
very much like to use the index.

Adding Matthias Sohn as well, who is active in JGit and cares
more about the client side than Googlers who only make use
of the server side part of JGit.

Stefan


Re: [PATCH 1/1] read-cache: update index format default to v4

2018-09-25 Thread Derrick Stolee

On 9/25/2018 3:06 AM, Patrick Steinhardt wrote:

On Mon, Sep 24, 2018 at 11:32:23PM +0200, SZEDER Gábor wrote:

On Mon, Sep 24, 2018 at 02:15:30PM -0700, Derrick Stolee via GitGitGadget wrote:

From: Derrick Stolee 

The index v4 format has been available since 2012 with 9d22778
"reach-cache.c: write prefix-compressed names in the index". Since
the format has been stable for so long, almost all versions of Git
in use today understand version 4, removing one barrier to upgrade
-- that someone may want to downgrade and needs a working repo.

What about alternative implementations, like JGit, libgit2, etc.?

Speaking of libgit2, we are able to read and write index v4 since
commit c1b370e93


This is a good point, Szeder.

Patrick: I'm glad LibGit2 is up-to-date with index formats.

Unfortunately, taking a look (for the first time) at the JGit code 
reveals that they don't appear to have v4 support. In 
org.eclipse.jgit/src/org/eclipse/jgit/dircache/DirCache.java, the 
DirCache.readFrom() method: lines 488-494, I see the following snippet:


    final int ver = NB.decodeInt32(hdr, 4);
    boolean extended = false;
    if (ver == 3)
    extended = true;
    else if (ver != 2)
    throw new 
CorruptObjectException(MessageFormat.format(

JGitText.get().unknownDIRCVersion, Integer.valueOf(ver)));

It looks like this will immediately throw with versions other than 2 or 3.

I'm adding Jonathan Nieder to CC so he can check with JGit people about 
the impact of this change.


Thanks,

-Stolee




Re: [PATCH 1/1] read-cache: update index format default to v4

2018-09-25 Thread Patrick Steinhardt
On Mon, Sep 24, 2018 at 11:32:23PM +0200, SZEDER Gábor wrote:
> On Mon, Sep 24, 2018 at 02:15:30PM -0700, Derrick Stolee via GitGitGadget 
> wrote:
> > From: Derrick Stolee 
> > 
> > The index v4 format has been available since 2012 with 9d22778
> > "reach-cache.c: write prefix-compressed names in the index". Since
> > the format has been stable for so long, almost all versions of Git
> > in use today understand version 4, removing one barrier to upgrade
> > -- that someone may want to downgrade and needs a working repo.
> 
> What about alternative implementations, like JGit, libgit2, etc.?

Speaking of libgit2, we are able to read and write index v4 since
commit c1b370e93 (Merge pull request #3837 from
novalis/dturner/indexv4, 2016-08-17), released with v0.25 in
December 2016. Due to insufficient tests, our read support was
initially broken, which got fixed with commit 3bc95cfe3 (Merge
pull request #4236 from pks-t/pks/index-v4-fixes, 2017-06-07) and
released with v0.26 in June 2017.

Right now I'm not aware of additional bugs in our index v4
support, so we'd be fine with changing the default.

Patrick

> > Despite being stable for a long time, this index version was never
> > adopted as the default. This prefix-compressed version of the format
> > can get significant space savings on repos with large working
> > directories (which naturally tend to have deep nesting). This version
> > is set as the default for some external tools, such as VFS for Git.
> > Because of this external use, the format has had a lot of "testing in
> > production" and also is subject to continuous integration in these
> > environments.
> > 
> > Previously, to test version 4 indexes, we needed to run the test
> > suite with GIT_TEST_INDEX_VERSION=4 (or TEST_GIT_INDEX_VERSION=4).
> > 
> > One potential, but short-term, downside is that we lose coverage of
> > the version 3 indexes. The trade-off is that we may want to cover
> > that version using GIT_TEST_INDEX_VERSION=3.
> > 
> > Signed-off-by: Derrick Stolee 
> > ---
> >  read-cache.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/read-cache.c b/read-cache.c
> > index 372588260e..af6c8f2a67 100644
> > --- a/read-cache.c
> > +++ b/read-cache.c
> > @@ -1484,7 +1484,7 @@ struct cache_entry *refresh_cache_entry(struct 
> > cache_entry *ce,
> >   * Index File I/O
> >   */
> >  
> > -#define INDEX_FORMAT_DEFAULT 3
> > +#define INDEX_FORMAT_DEFAULT 4
> >  
> >  static unsigned int get_index_format_default(void)
> >  {
> > -- 
> > gitgitgadget


signature.asc
Description: PGP signature


Re: [PATCH 1/1] read-cache: update index format default to v4

2018-09-24 Thread Junio C Hamano
SZEDER Gábor  writes:

> On Mon, Sep 24, 2018 at 02:15:30PM -0700, Derrick Stolee via GitGitGadget 
> wrote:
>> From: Derrick Stolee 
>> 
>> The index v4 format has been available since 2012 with 9d22778
>> "reach-cache.c: write prefix-compressed names in the index". Since
>> the format has been stable for so long, almost all versions of Git
>> in use today understand version 4, removing one barrier to upgrade
>> -- that someone may want to downgrade and needs a working repo.
>
> What about alternative implementations, like JGit, libgit2, etc.?

Good question.

Because the index-version of an index file is designed to be sticky,
repos that need to be accessed by other implementations can keep
whatever current version.  A new repo that need to be accessed by
them can be (forcibly) written in v2 and keep its v2ness, I would
think.

And that would serve as an incentive for the implementations to
catch up ;-)





Re: [PATCH 1/1] read-cache: update index format default to v4

2018-09-24 Thread SZEDER Gábor
On Mon, Sep 24, 2018 at 02:15:30PM -0700, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee 
> 
> The index v4 format has been available since 2012 with 9d22778
> "reach-cache.c: write prefix-compressed names in the index". Since
> the format has been stable for so long, almost all versions of Git
> in use today understand version 4, removing one barrier to upgrade
> -- that someone may want to downgrade and needs a working repo.

What about alternative implementations, like JGit, libgit2, etc.?

> Despite being stable for a long time, this index version was never
> adopted as the default. This prefix-compressed version of the format
> can get significant space savings on repos with large working
> directories (which naturally tend to have deep nesting). This version
> is set as the default for some external tools, such as VFS for Git.
> Because of this external use, the format has had a lot of "testing in
> production" and also is subject to continuous integration in these
> environments.
> 
> Previously, to test version 4 indexes, we needed to run the test
> suite with GIT_TEST_INDEX_VERSION=4 (or TEST_GIT_INDEX_VERSION=4).
> 
> One potential, but short-term, downside is that we lose coverage of
> the version 3 indexes. The trade-off is that we may want to cover
> that version using GIT_TEST_INDEX_VERSION=3.
> 
> Signed-off-by: Derrick Stolee 
> ---
>  read-cache.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/read-cache.c b/read-cache.c
> index 372588260e..af6c8f2a67 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1484,7 +1484,7 @@ struct cache_entry *refresh_cache_entry(struct 
> cache_entry *ce,
>   * Index File I/O
>   */
>  
> -#define INDEX_FORMAT_DEFAULT 3
> +#define INDEX_FORMAT_DEFAULT 4
>  
>  static unsigned int get_index_format_default(void)
>  {
> -- 
> gitgitgadget


[PATCH 1/1] read-cache: update index format default to v4

2018-09-24 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

The index v4 format has been available since 2012 with 9d22778
"reach-cache.c: write prefix-compressed names in the index". Since
the format has been stable for so long, almost all versions of Git
in use today understand version 4, removing one barrier to upgrade
-- that someone may want to downgrade and needs a working repo.

Despite being stable for a long time, this index version was never
adopted as the default. This prefix-compressed version of the format
can get significant space savings on repos with large working
directories (which naturally tend to have deep nesting). This version
is set as the default for some external tools, such as VFS for Git.
Because of this external use, the format has had a lot of "testing in
production" and also is subject to continuous integration in these
environments.

Previously, to test version 4 indexes, we needed to run the test
suite with GIT_TEST_INDEX_VERSION=4 (or TEST_GIT_INDEX_VERSION=4).

One potential, but short-term, downside is that we lose coverage of
the version 3 indexes. The trade-off is that we may want to cover
that version using GIT_TEST_INDEX_VERSION=3.

Signed-off-by: Derrick Stolee 
---
 read-cache.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/read-cache.c b/read-cache.c
index 372588260e..af6c8f2a67 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1484,7 +1484,7 @@ struct cache_entry *refresh_cache_entry(struct 
cache_entry *ce,
  * Index File I/O
  */
 
-#define INDEX_FORMAT_DEFAULT 3
+#define INDEX_FORMAT_DEFAULT 4
 
 static unsigned int get_index_format_default(void)
 {
-- 
gitgitgadget