Re: [PATCH v2 15/16] index-pack: use nr_objects_final as sha1_table size

2013-09-12 Thread Nicolas Pitre
On Tue, 10 Sep 2013, Duy Nguyen wrote:

 On Mon, Sep 9, 2013 at 10:01 PM, Nicolas Pitre n...@fluxnic.net wrote:
  However this means that the progress meter will now be wrong and that's
  terrible !  Users *will* complain that the meter doesn't reach 100% and
  they'll protest for being denied the remaining objects during the
  transfer !
 
  Joking aside, we should think about doing something about it.  I was
  wondering if some kind of prefix to the pack stream could be inserted
  onto the wire when sending a pack v4.  Something like:
 
  'T', 'H', 'I', 'N', actual_number_of_sent_objects_in_network_order
 
  This 8-byte prefix would simply be discarded by index-pack after being
  parsed.
 
  What do you think?
 
 I have no problem with this. Although I rather we generalize the case
 to support multiple packs in the same stream (in some case the server
 can just stream away one big existing pack, followed by a smaller pack
 of recent updates), where thin is just a special pack that is not
 saved on disk. So except for the signature difference, it should at
 least follow the pack header (sig, version, nr_objects)

Except in this case this is not a separate pack.  This prefix is there 
to provide information that is valid only for the pack to follow and 
therefore cannot be considered as some independent data.


Nicolas
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 15/16] index-pack: use nr_objects_final as sha1_table size

2013-09-09 Thread Nguyễn Thái Ngọc Duy
nr_objects in the next patch is used to reflect the number of actual
objects in the stream, which may be smaller than the number recorded
in pack header.

Signed-off-by: Nguyễn Thái Ngọc Duy pclo...@gmail.com
---
 builtin/index-pack.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 9036f3e..dc9961b 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -80,6 +80,7 @@ static int nr_objects;
 static int nr_deltas;
 static int nr_resolved_deltas;
 static int nr_threads;
+static int nr_objects_final;
 
 static int from_stdin;
 static int strict;
@@ -297,7 +298,7 @@ static void check_against_sha1table(const unsigned char 
*sha1)
if (!packv4)
return;
 
-   found = bsearch(sha1, sha1_table, nr_objects, 20,
+   found = bsearch(sha1, sha1_table, nr_objects_final, 20,
(int (*)(const void *, const void *))hashcmp);
if (!found)
die(_(object %s not found in SHA-1 table),
@@ -331,7 +332,7 @@ static const unsigned char *read_sha1ref(void)
return sha1;
}
index--;
-   if (index = nr_objects)
+   if (index = nr_objects_final)
bad_object(consumed_bytes,
   _(bad index in read_sha1ref));
return sha1_table + index * 20;
@@ -340,7 +341,7 @@ static const unsigned char *read_sha1ref(void)
 static const unsigned char *read_sha1table_ref(void)
 {
const unsigned char *sha1 = read_sha1ref();
-   if (sha1  sha1_table || sha1 = sha1_table + nr_objects * 20)
+   if (sha1  sha1_table || sha1 = sha1_table + nr_objects_final * 20)
check_against_sha1table(sha1);
return sha1;
 }
@@ -392,7 +393,7 @@ static void parse_pack_header(void)
die(_(pack version %PRIu32 unsupported),
ntohl(hdr-hdr_version));
 
-   nr_objects = ntohl(hdr-hdr_entries);
+   nr_objects_final = nr_objects = ntohl(hdr-hdr_entries);
use(sizeof(struct pack_header));
 }
 
@@ -1472,9 +1473,9 @@ static void parse_dictionaries(void)
if (!packv4)
return;
 
-   sha1_table = xmalloc(20 * nr_objects);
+   sha1_table = xmalloc(20 * nr_objects_final);
hashcpy(sha1_table, fill_and_use(20));
-   for (i = 1; i  nr_objects; i++) {
+   for (i = 1; i  nr_objects_final; i++) {
unsigned char *p = sha1_table + i * 20;
hashcpy(p, fill_and_use(20));
if (hashcmp(p - 20, p) = 0)
-- 
1.8.2.83.gc99314b

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 15/16] index-pack: use nr_objects_final as sha1_table size

2013-09-09 Thread Nicolas Pitre
On Mon, 9 Sep 2013, Nguyễn Thái Ngọc Duy wrote:

 nr_objects in the next patch is used to reflect the number of actual
 objects in the stream, which may be smaller than the number recorded
 in pack header.

This highlights an issue that has been nagging me for a while.

We decided to send the final number of objects in the thin pack header 
for two reasons:

1) it allows to properly size the SHA1 table upfront which already 
   contains entries for the omitted objects;

2) the whole pack doesn't have to be re-summed again after being 
   completed on the receiving end since we don't alter the header.

However this means that the progress meter will now be wrong and that's 
terrible !  Users *will* complain that the meter doesn't reach 100% and 
they'll protest for being denied the remaining objects during the 
transfer !

Joking aside, we should think about doing something about it.  I was 
wondering if some kind of prefix to the pack stream could be inserted 
onto the wire when sending a pack v4.  Something like:

'T', 'H', 'I', 'N', actual_number_of_sent_objects_in_network_order

This 8-byte prefix would simply be discarded by index-pack after being 
parsed.

What do you think?


Nicolas


Re: [PATCH v2 15/16] index-pack: use nr_objects_final as sha1_table size

2013-09-09 Thread Junio C Hamano
Nicolas Pitre n...@fluxnic.net writes:

  ...  I was 
  wondering if some kind of prefix to the pack stream could be inserted 
  onto the wire when sending a pack v4.  Something like:
 
  'T', 'H', 'I', 'N', actual_number_of_sent_objects_in_network_order
 
  This 8-byte prefix would simply be discarded by index-pack after being 
  parsed.
 
  What do you think?
 
 I do not think it is _too_ bad if the meter jumped from 92% to 100%
 when we finish reading from the other end ;-), as long as we can
 reliably tell that we read the right thing.

 Sure.  but eventually people will complain about this.  So while we're 
 about to introduce a new pack format anyway, better think of this little 
 cosmetic detail now when it can be included in the pack v4 capability 
 negociation.

Oh, I completely agree on that part.  When we send a self-contained
pack, would we send nothing?  That is, should the receiving end
expect and rely on that the sending end will send a thin pack and
never a fat pack when asked to send a thin pack (and vice versa)?

Also should we make the even though we have negotiated the protocol
parameters, after enumerating the objects and deciding what the pack
stream would look like, we have a bit more information to tell you
the sending side gives the receiver extensible?  I am wondering if
that prefix needs something like end of prefix marker (or here
comes N-bytes worth of prefix information upfront); we probably do
not need it, as the capability exchange will determine what kind of
information will be sent (e.g. actual objects in the thin pack data
stream).
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 15/16] index-pack: use nr_objects_final as sha1_table size

2013-09-09 Thread Nicolas Pitre
On Mon, 9 Sep 2013, Junio C Hamano wrote:

 Nicolas Pitre n...@fluxnic.net writes:
 
  On Mon, 9 Sep 2013, Nguyễn Thái Ngọc Duy wrote:
 
  nr_objects in the next patch is used to reflect the number of actual
  objects in the stream, which may be smaller than the number recorded
  in pack header.
 
  This highlights an issue that has been nagging me for a while.
 
  We decided to send the final number of objects in the thin pack header 
  for two reasons:
 
  1) it allows to properly size the SHA1 table upfront which already 
 contains entries for the omitted objects;
 
  2) the whole pack doesn't have to be re-summed again after being 
 completed on the receiving end since we don't alter the header.
 
  However this means that the progress meter will now be wrong and that's 
  terrible !  Users *will* complain that the meter doesn't reach 100% and 
  they'll protest for being denied the remaining objects during the 
  transfer !
 
  Joking aside, we should think about doing something about it.  I was 
  wondering if some kind of prefix to the pack stream could be inserted 
  onto the wire when sending a pack v4.  Something like:
 
  'T', 'H', 'I', 'N', actual_number_of_sent_objects_in_network_order
 
  This 8-byte prefix would simply be discarded by index-pack after being 
  parsed.
 
  What do you think?
 
 I do not think it is _too_ bad if the meter jumped from 92% to 100%
 when we finish reading from the other end ;-), as long as we can
 reliably tell that we read the right thing.

Sure.  but eventually people will complain about this.  So while we're 
about to introduce a new pack format anyway, better think of this little 
cosmetic detail now when it can be included in the pack v4 capability 
negociation.

 Which brings me to a tangent.  Do we have a means to make sure that
 the data received over the wire is bit-for-bit correct as a whole
 when it is a thin pack stream?  When it is a non-thin pack stream,
 we have the checksum at the end added by sha1close() which
 index-pack.c::parse_pack_objects() can (and does) verify.

The trailing checksum is still there.  Nothing has changed in that 
regard.


Nicolas


Re: [PATCH v2 15/16] index-pack: use nr_objects_final as sha1_table size

2013-09-09 Thread Junio C Hamano
Nicolas Pitre n...@fluxnic.net writes:

 On Mon, 9 Sep 2013, Nguyễn Thái Ngọc Duy wrote:

 nr_objects in the next patch is used to reflect the number of actual
 objects in the stream, which may be smaller than the number recorded
 in pack header.

 This highlights an issue that has been nagging me for a while.

 We decided to send the final number of objects in the thin pack header 
 for two reasons:

 1) it allows to properly size the SHA1 table upfront which already 
contains entries for the omitted objects;

 2) the whole pack doesn't have to be re-summed again after being 
completed on the receiving end since we don't alter the header.

 However this means that the progress meter will now be wrong and that's 
 terrible !  Users *will* complain that the meter doesn't reach 100% and 
 they'll protest for being denied the remaining objects during the 
 transfer !

 Joking aside, we should think about doing something about it.  I was 
 wondering if some kind of prefix to the pack stream could be inserted 
 onto the wire when sending a pack v4.  Something like:

 'T', 'H', 'I', 'N', actual_number_of_sent_objects_in_network_order

 This 8-byte prefix would simply be discarded by index-pack after being 
 parsed.

 What do you think?

I do not think it is _too_ bad if the meter jumped from 92% to 100%
when we finish reading from the other end ;-), as long as we can
reliably tell that we read the right thing.

Which brings me to a tangent.  Do we have a means to make sure that
the data received over the wire is bit-for-bit correct as a whole
when it is a thin pack stream?  When it is a non-thin pack stream,
we have the checksum at the end added by sha1close() which
index-pack.c::parse_pack_objects() can (and does) verify.


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 15/16] index-pack: use nr_objects_final as sha1_table size

2013-09-09 Thread Nicolas Pitre
On Mon, 9 Sep 2013, Junio C Hamano wrote:

 Nicolas Pitre n...@fluxnic.net writes:
 
   ...  I was 
   wondering if some kind of prefix to the pack stream could be inserted 
   onto the wire when sending a pack v4.  Something like:
  
   'T', 'H', 'I', 'N', actual_number_of_sent_objects_in_network_order
  
   This 8-byte prefix would simply be discarded by index-pack after being 
   parsed.
  
   What do you think?
  
  I do not think it is _too_ bad if the meter jumped from 92% to 100%
  when we finish reading from the other end ;-), as long as we can
  reliably tell that we read the right thing.
 
  Sure.  but eventually people will complain about this.  So while we're 
  about to introduce a new pack format anyway, better think of this little 
  cosmetic detail now when it can be included in the pack v4 capability 
  negociation.
 
 Oh, I completely agree on that part.  When we send a self-contained
 pack, would we send nothing?  That is, should the receiving end
 expect and rely on that the sending end will send a thin pack and
 never a fat pack when asked to send a thin pack (and vice versa)?
 
 Also should we make the even though we have negotiated the protocol
 parameters, after enumerating the objects and deciding what the pack
 stream would look like, we have a bit more information to tell you
 the sending side gives the receiver extensible?  I am wondering if
 that prefix needs something like end of prefix marker (or here
 comes N-bytes worth of prefix information upfront); we probably do
 not need it, as the capability exchange will determine what kind of
 information will be sent (e.g. actual objects in the thin pack data
 stream).

Do we know the actual number of objects to send during the capability 
negociation?  I don't think so as this is known only after the 
compressing objects phase, and that already depends on the capability 
negociation before it can start.


Nicolas
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 15/16] index-pack: use nr_objects_final as sha1_table size

2013-09-09 Thread Junio C Hamano
Nicolas Pitre n...@fluxnic.net writes:

 Do we know the actual number of objects to send during the capability 
 negociation?

No, and that is not what I meant.  We know upfront after capability
negotiation (by seeing a request to give them a thin-pack) that we
will send, in addition to the usual packfile, the prefix that
carries that information and that is the important part.  That lets
the receiver decide whether to _expect_ to see the prefix or no
prefix.  Without such, there needs some clue in the prefix part
itself if there are prefixes that carry information computed after
capability negotiation finished (i.e. after object enumeration).

Sorry if I was unclear.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 15/16] index-pack: use nr_objects_final as sha1_table size

2013-09-09 Thread Nicolas Pitre
On Mon, 9 Sep 2013, Junio C Hamano wrote:

 Nicolas Pitre n...@fluxnic.net writes:
 
  Do we know the actual number of objects to send during the capability 
  negociation?
 
 No, and that is not what I meant.  We know upfront after capability
 negotiation (by seeing a request to give them a thin-pack) that we
 will send, in addition to the usual packfile, the prefix that
 carries that information and that is the important part.  That lets
 the receiver decide whether to _expect_ to see the prefix or no
 prefix.  Without such, there needs some clue in the prefix part
 itself if there are prefixes that carry information computed after
 capability negotiation finished (i.e. after object enumeration).

In this case, if negociation concludes on thin and pack-version=4 
then that could mean there is a prefix to be expected.


Nicolas
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 15/16] index-pack: use nr_objects_final as sha1_table size

2013-09-09 Thread Duy Nguyen
On Mon, Sep 9, 2013 at 10:01 PM, Nicolas Pitre n...@fluxnic.net wrote:
 However this means that the progress meter will now be wrong and that's
 terrible !  Users *will* complain that the meter doesn't reach 100% and
 they'll protest for being denied the remaining objects during the
 transfer !

 Joking aside, we should think about doing something about it.  I was
 wondering if some kind of prefix to the pack stream could be inserted
 onto the wire when sending a pack v4.  Something like:

 'T', 'H', 'I', 'N', actual_number_of_sent_objects_in_network_order

 This 8-byte prefix would simply be discarded by index-pack after being
 parsed.

 What do you think?

I have no problem with this. Although I rather we generalize the case
to support multiple packs in the same stream (in some case the server
can just stream away one big existing pack, followed by a smaller pack
of recent updates), where thin is just a special pack that is not
saved on disk. So except for the signature difference, it should at
least follow the pack header (sig, version, nr_objects)
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html