Re: [PATCH v2 15/16] index-pack: use nr_objects_final as sha1_table size
On Tue, 10 Sep 2013, Duy Nguyen wrote: On Mon, Sep 9, 2013 at 10:01 PM, Nicolas Pitre n...@fluxnic.net wrote: However this means that the progress meter will now be wrong and that's terrible ! Users *will* complain that the meter doesn't reach 100% and they'll protest for being denied the remaining objects during the transfer ! Joking aside, we should think about doing something about it. I was wondering if some kind of prefix to the pack stream could be inserted onto the wire when sending a pack v4. Something like: 'T', 'H', 'I', 'N', actual_number_of_sent_objects_in_network_order This 8-byte prefix would simply be discarded by index-pack after being parsed. What do you think? I have no problem with this. Although I rather we generalize the case to support multiple packs in the same stream (in some case the server can just stream away one big existing pack, followed by a smaller pack of recent updates), where thin is just a special pack that is not saved on disk. So except for the signature difference, it should at least follow the pack header (sig, version, nr_objects) Except in this case this is not a separate pack. This prefix is there to provide information that is valid only for the pack to follow and therefore cannot be considered as some independent data. Nicolas -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 15/16] index-pack: use nr_objects_final as sha1_table size
nr_objects in the next patch is used to reflect the number of actual objects in the stream, which may be smaller than the number recorded in pack header. Signed-off-by: Nguyễn Thái Ngọc Duy pclo...@gmail.com --- builtin/index-pack.c | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/builtin/index-pack.c b/builtin/index-pack.c index 9036f3e..dc9961b 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -80,6 +80,7 @@ static int nr_objects; static int nr_deltas; static int nr_resolved_deltas; static int nr_threads; +static int nr_objects_final; static int from_stdin; static int strict; @@ -297,7 +298,7 @@ static void check_against_sha1table(const unsigned char *sha1) if (!packv4) return; - found = bsearch(sha1, sha1_table, nr_objects, 20, + found = bsearch(sha1, sha1_table, nr_objects_final, 20, (int (*)(const void *, const void *))hashcmp); if (!found) die(_(object %s not found in SHA-1 table), @@ -331,7 +332,7 @@ static const unsigned char *read_sha1ref(void) return sha1; } index--; - if (index = nr_objects) + if (index = nr_objects_final) bad_object(consumed_bytes, _(bad index in read_sha1ref)); return sha1_table + index * 20; @@ -340,7 +341,7 @@ static const unsigned char *read_sha1ref(void) static const unsigned char *read_sha1table_ref(void) { const unsigned char *sha1 = read_sha1ref(); - if (sha1 sha1_table || sha1 = sha1_table + nr_objects * 20) + if (sha1 sha1_table || sha1 = sha1_table + nr_objects_final * 20) check_against_sha1table(sha1); return sha1; } @@ -392,7 +393,7 @@ static void parse_pack_header(void) die(_(pack version %PRIu32 unsupported), ntohl(hdr-hdr_version)); - nr_objects = ntohl(hdr-hdr_entries); + nr_objects_final = nr_objects = ntohl(hdr-hdr_entries); use(sizeof(struct pack_header)); } @@ -1472,9 +1473,9 @@ static void parse_dictionaries(void) if (!packv4) return; - sha1_table = xmalloc(20 * nr_objects); + sha1_table = xmalloc(20 * nr_objects_final); hashcpy(sha1_table, fill_and_use(20)); - for (i = 1; i nr_objects; i++) { + for (i = 1; i nr_objects_final; i++) { unsigned char *p = sha1_table + i * 20; hashcpy(p, fill_and_use(20)); if (hashcmp(p - 20, p) = 0) -- 1.8.2.83.gc99314b -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 15/16] index-pack: use nr_objects_final as sha1_table size
On Mon, 9 Sep 2013, Nguyễn Thái Ngọc Duy wrote: nr_objects in the next patch is used to reflect the number of actual objects in the stream, which may be smaller than the number recorded in pack header. This highlights an issue that has been nagging me for a while. We decided to send the final number of objects in the thin pack header for two reasons: 1) it allows to properly size the SHA1 table upfront which already contains entries for the omitted objects; 2) the whole pack doesn't have to be re-summed again after being completed on the receiving end since we don't alter the header. However this means that the progress meter will now be wrong and that's terrible ! Users *will* complain that the meter doesn't reach 100% and they'll protest for being denied the remaining objects during the transfer ! Joking aside, we should think about doing something about it. I was wondering if some kind of prefix to the pack stream could be inserted onto the wire when sending a pack v4. Something like: 'T', 'H', 'I', 'N', actual_number_of_sent_objects_in_network_order This 8-byte prefix would simply be discarded by index-pack after being parsed. What do you think? Nicolas
Re: [PATCH v2 15/16] index-pack: use nr_objects_final as sha1_table size
Nicolas Pitre n...@fluxnic.net writes: ... I was wondering if some kind of prefix to the pack stream could be inserted onto the wire when sending a pack v4. Something like: 'T', 'H', 'I', 'N', actual_number_of_sent_objects_in_network_order This 8-byte prefix would simply be discarded by index-pack after being parsed. What do you think? I do not think it is _too_ bad if the meter jumped from 92% to 100% when we finish reading from the other end ;-), as long as we can reliably tell that we read the right thing. Sure. but eventually people will complain about this. So while we're about to introduce a new pack format anyway, better think of this little cosmetic detail now when it can be included in the pack v4 capability negociation. Oh, I completely agree on that part. When we send a self-contained pack, would we send nothing? That is, should the receiving end expect and rely on that the sending end will send a thin pack and never a fat pack when asked to send a thin pack (and vice versa)? Also should we make the even though we have negotiated the protocol parameters, after enumerating the objects and deciding what the pack stream would look like, we have a bit more information to tell you the sending side gives the receiver extensible? I am wondering if that prefix needs something like end of prefix marker (or here comes N-bytes worth of prefix information upfront); we probably do not need it, as the capability exchange will determine what kind of information will be sent (e.g. actual objects in the thin pack data stream). -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 15/16] index-pack: use nr_objects_final as sha1_table size
On Mon, 9 Sep 2013, Junio C Hamano wrote: Nicolas Pitre n...@fluxnic.net writes: On Mon, 9 Sep 2013, Nguyễn Thái Ngọc Duy wrote: nr_objects in the next patch is used to reflect the number of actual objects in the stream, which may be smaller than the number recorded in pack header. This highlights an issue that has been nagging me for a while. We decided to send the final number of objects in the thin pack header for two reasons: 1) it allows to properly size the SHA1 table upfront which already contains entries for the omitted objects; 2) the whole pack doesn't have to be re-summed again after being completed on the receiving end since we don't alter the header. However this means that the progress meter will now be wrong and that's terrible ! Users *will* complain that the meter doesn't reach 100% and they'll protest for being denied the remaining objects during the transfer ! Joking aside, we should think about doing something about it. I was wondering if some kind of prefix to the pack stream could be inserted onto the wire when sending a pack v4. Something like: 'T', 'H', 'I', 'N', actual_number_of_sent_objects_in_network_order This 8-byte prefix would simply be discarded by index-pack after being parsed. What do you think? I do not think it is _too_ bad if the meter jumped from 92% to 100% when we finish reading from the other end ;-), as long as we can reliably tell that we read the right thing. Sure. but eventually people will complain about this. So while we're about to introduce a new pack format anyway, better think of this little cosmetic detail now when it can be included in the pack v4 capability negociation. Which brings me to a tangent. Do we have a means to make sure that the data received over the wire is bit-for-bit correct as a whole when it is a thin pack stream? When it is a non-thin pack stream, we have the checksum at the end added by sha1close() which index-pack.c::parse_pack_objects() can (and does) verify. The trailing checksum is still there. Nothing has changed in that regard. Nicolas
Re: [PATCH v2 15/16] index-pack: use nr_objects_final as sha1_table size
Nicolas Pitre n...@fluxnic.net writes: On Mon, 9 Sep 2013, Nguyễn Thái Ngọc Duy wrote: nr_objects in the next patch is used to reflect the number of actual objects in the stream, which may be smaller than the number recorded in pack header. This highlights an issue that has been nagging me for a while. We decided to send the final number of objects in the thin pack header for two reasons: 1) it allows to properly size the SHA1 table upfront which already contains entries for the omitted objects; 2) the whole pack doesn't have to be re-summed again after being completed on the receiving end since we don't alter the header. However this means that the progress meter will now be wrong and that's terrible ! Users *will* complain that the meter doesn't reach 100% and they'll protest for being denied the remaining objects during the transfer ! Joking aside, we should think about doing something about it. I was wondering if some kind of prefix to the pack stream could be inserted onto the wire when sending a pack v4. Something like: 'T', 'H', 'I', 'N', actual_number_of_sent_objects_in_network_order This 8-byte prefix would simply be discarded by index-pack after being parsed. What do you think? I do not think it is _too_ bad if the meter jumped from 92% to 100% when we finish reading from the other end ;-), as long as we can reliably tell that we read the right thing. Which brings me to a tangent. Do we have a means to make sure that the data received over the wire is bit-for-bit correct as a whole when it is a thin pack stream? When it is a non-thin pack stream, we have the checksum at the end added by sha1close() which index-pack.c::parse_pack_objects() can (and does) verify. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 15/16] index-pack: use nr_objects_final as sha1_table size
On Mon, 9 Sep 2013, Junio C Hamano wrote: Nicolas Pitre n...@fluxnic.net writes: ... I was wondering if some kind of prefix to the pack stream could be inserted onto the wire when sending a pack v4. Something like: 'T', 'H', 'I', 'N', actual_number_of_sent_objects_in_network_order This 8-byte prefix would simply be discarded by index-pack after being parsed. What do you think? I do not think it is _too_ bad if the meter jumped from 92% to 100% when we finish reading from the other end ;-), as long as we can reliably tell that we read the right thing. Sure. but eventually people will complain about this. So while we're about to introduce a new pack format anyway, better think of this little cosmetic detail now when it can be included in the pack v4 capability negociation. Oh, I completely agree on that part. When we send a self-contained pack, would we send nothing? That is, should the receiving end expect and rely on that the sending end will send a thin pack and never a fat pack when asked to send a thin pack (and vice versa)? Also should we make the even though we have negotiated the protocol parameters, after enumerating the objects and deciding what the pack stream would look like, we have a bit more information to tell you the sending side gives the receiver extensible? I am wondering if that prefix needs something like end of prefix marker (or here comes N-bytes worth of prefix information upfront); we probably do not need it, as the capability exchange will determine what kind of information will be sent (e.g. actual objects in the thin pack data stream). Do we know the actual number of objects to send during the capability negociation? I don't think so as this is known only after the compressing objects phase, and that already depends on the capability negociation before it can start. Nicolas -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 15/16] index-pack: use nr_objects_final as sha1_table size
Nicolas Pitre n...@fluxnic.net writes: Do we know the actual number of objects to send during the capability negociation? No, and that is not what I meant. We know upfront after capability negotiation (by seeing a request to give them a thin-pack) that we will send, in addition to the usual packfile, the prefix that carries that information and that is the important part. That lets the receiver decide whether to _expect_ to see the prefix or no prefix. Without such, there needs some clue in the prefix part itself if there are prefixes that carry information computed after capability negotiation finished (i.e. after object enumeration). Sorry if I was unclear. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 15/16] index-pack: use nr_objects_final as sha1_table size
On Mon, 9 Sep 2013, Junio C Hamano wrote: Nicolas Pitre n...@fluxnic.net writes: Do we know the actual number of objects to send during the capability negociation? No, and that is not what I meant. We know upfront after capability negotiation (by seeing a request to give them a thin-pack) that we will send, in addition to the usual packfile, the prefix that carries that information and that is the important part. That lets the receiver decide whether to _expect_ to see the prefix or no prefix. Without such, there needs some clue in the prefix part itself if there are prefixes that carry information computed after capability negotiation finished (i.e. after object enumeration). In this case, if negociation concludes on thin and pack-version=4 then that could mean there is a prefix to be expected. Nicolas -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 15/16] index-pack: use nr_objects_final as sha1_table size
On Mon, Sep 9, 2013 at 10:01 PM, Nicolas Pitre n...@fluxnic.net wrote: However this means that the progress meter will now be wrong and that's terrible ! Users *will* complain that the meter doesn't reach 100% and they'll protest for being denied the remaining objects during the transfer ! Joking aside, we should think about doing something about it. I was wondering if some kind of prefix to the pack stream could be inserted onto the wire when sending a pack v4. Something like: 'T', 'H', 'I', 'N', actual_number_of_sent_objects_in_network_order This 8-byte prefix would simply be discarded by index-pack after being parsed. What do you think? I have no problem with this. Although I rather we generalize the case to support multiple packs in the same stream (in some case the server can just stream away one big existing pack, followed by a smaller pack of recent updates), where thin is just a special pack that is not saved on disk. So except for the signature difference, it should at least follow the pack header (sig, version, nr_objects) -- Duy -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html