Re: [PATCH v2] index-format.txt: be more liberal on what can represent invalid cache tree

2012-12-13 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 Nguyễn Thái Ngọc Duy  pclo...@gmail.com writes:

  How would that work with existing versions? If you write -2 in
  cache-tree, the next time 1.8.0 updates cache tree it writes -1 back.
  That loses whatever information you attach to -2. A new cache-tree
  extension is probably better.

 You can easily imagine a definition like this:
 ...

As we clarified that we do not allow implementations to write
anything but -1 for invalidated entries until we decide what we will
use other values for, the whole log message needs to be updated, I
think.

-- 8 --
From: Nguyễn Thái Ngọc Duy pclo...@gmail.com
Date: Thu, 13 Dec 2012 08:14:47 +0700
Subject: [PATCH] index-format.txt: clarify what is invalid

A cache-tree entry with a negative entry count is considered invalid
in the current Git; it records that we do not know the object name
of a tree that would result by writing the directory covered by the
cache-tree as a tree object.

Clarify that any entry with a negative entry count is invalid, but
the implementations must write -1 there. This way, we can later
decide to allow writers to use negative values other than -1 to
encode optional information without harming interoperability; we do
not know what is encoded how, so keep these other negative values as
reserved for now.

Signed-off-by: Nguyễn Thái Ngọc Duy pclo...@gmail.com
Signed-off-by: Junio C Hamano gits...@pobox.com
---
 Documentation/technical/index-format.txt | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/technical/index-format.txt 
b/Documentation/technical/index-format.txt
index 9d25b30..ce28a7a 100644
--- a/Documentation/technical/index-format.txt
+++ b/Documentation/technical/index-format.txt
@@ -161,8 +161,9 @@ GIT index format
 this span of index as a tree.
 
   An entry can be in an invalidated state and is represented by having
-  -1 in the entry_count field. In this case, there is no object name
-  and the next entry starts immediately after the newline.
+  a negative number in the entry_count field. In this case, there is no
+  object name and the next entry starts immediately after the newline.
+  When writing an invalid entry, -1 should always be used as entry_count.
 
   The entries are written out in the top-down, depth-first order.  The
   first entry represents the root level of the repository, followed by the
-- 
1.8.1.rc1.141.g0ffea5d

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] index-format.txt: be more liberal on what can represent invalid cache tree

2012-12-12 Thread Nguyễn Thái Ngọc Duy
We have been writing -1 as invalid since day 1. On that same day we
accept all negative entry counts as invalid. So in theory all C Git
versions out there would be happy to accept any negative numbers. JGit
seems to do exactly the same.

Correct the document to reflect the fact that -1 is not the only magic
number. At least one implementation, libgit2, is found to treat -1
this way.

Signed-off-by: Nguyễn Thái Ngọc Duy pclo...@gmail.com
---
 On Thu, Dec 13, 2012 at 1:14 AM, Junio C Hamano gits...@pobox.com wrote:
  Nguyễn Thái Ngọc Duy  pclo...@gmail.com writes:
 
  We have been writing -1 as invalid since day 1. On that same day we
  accept all negative entry counts as invalid. So in theory all C Git
  versions out there would be happy to accept any negative numbers. JGit
  seems to do exactly the same.
 
  I am of two minds here.
 
  The existing code is being more lenient than specified when they
  read stuff others wrote, but it still adheres to -1 when writing.
  Allowing random implementations to write random negative values will
  close the door for us to later update the specification to encode
  more informatin about these invalid entries by using negative value
  other than -1 here.

 How would that work with existing versions? If you write -2 in
 cache-tree, the next time 1.8.0 updates cache tree it writes -1 back.
 That loses whatever information you attach to -2. A new cache-tree
 extension is probably better.

  I am OK with a reword to say negative means invalid, and writers
  should write -1 for invalid entries, but without the latter half,
  this change is not justified.

 Done.

 Documentation/technical/index-format.txt | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/technical/index-format.txt 
b/Documentation/technical/index-format.txt
index 9d25b30..ce28a7a 100644
--- a/Documentation/technical/index-format.txt
+++ b/Documentation/technical/index-format.txt
@@ -161,8 +161,9 @@ GIT index format
 this span of index as a tree.
 
   An entry can be in an invalidated state and is represented by having
-  -1 in the entry_count field. In this case, there is no object name
-  and the next entry starts immediately after the newline.
+  a negative number in the entry_count field. In this case, there is no
+  object name and the next entry starts immediately after the newline.
+  When writing an invalid entry, -1 should always be used as entry_count.
 
   The entries are written out in the top-down, depth-first order.  The
   first entry represents the root level of the repository, followed by the
-- 
1.8.0.rc2.23.g1fb49df

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] index-format.txt: be more liberal on what can represent invalid cache tree

2012-12-12 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy  pclo...@gmail.com writes:

  How would that work with existing versions? If you write -2 in
  cache-tree, the next time 1.8.0 updates cache tree it writes -1 back.
  That loses whatever information you attach to -2. A new cache-tree
  extension is probably better.

You can easily imagine a definition like this:

 - If non-negative, the entry is valid and it is the number of index
   entries that are covered by this subtree;

 - If -1, the cached-tree does not know the object name of the tree
   object, and nothing else is known. The caller needs to walk the
   index to skip the entries that could have been covered by this
   subtree, if the cached tree information were valid;

 - If less than -1, the cached-tree does not know the object name of
   the tree object, but we know the number of index entries that are
   covered by this subtree.  The caller, instead of walking the index,
   can subtract the count from -1 and skip that many entries to find
   the index entry after the part that is inside this directory.

Newer Git may write -201 and newer Git may be able to take advantage
of the new information encoded there, saving 200 calls to
prefixcmp(), while given the same index, older Git will operate just
as correctly as before.  An older Git may write that part of the
cache-tree back with -1, defeating the optimization when a newer Git
reads it back, but by definition the older Git does not know what to
write to help that optimization that did not exist when it was done,
and writing -1 will not confuse the newer Git.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html