Re: [PATCH 08/41] packfile: abstract away hash constant values

2018-05-02 Thread brian m. carlson
On Wed, May 02, 2018 at 05:26:25PM +0200, Duy Nguyen wrote:
> On Wed, May 2, 2018 at 2:11 AM, brian m. carlson
>  wrote:
> > On Tue, May 01, 2018 at 12:22:43PM +0200, Duy Nguyen wrote:
> >> While we're abstracting away 20. There's the only 20 left in
> >> sha1_file.c that should also be gone. But I guess you could do that
> >> later since you need to rename fill_sha1_path to
> >> fill_loose_object_path or something.
> >
> > I'm already working on knocking those out.
> 
> Yeah I checked out your part14 branch after writing this note :P You
> still need to rename the function though. I can remind that again when
> part14 is sent out.

I've made a note in my project notes.

> > Unfortunately, I can't, because it's not an object ID.  I think the
> > decision was made to not transform non-object ID hashes into struct
> > object_id, which makes sense.  I suppose we could have an equivalent
> > struct hash or something for those other uses.
> 
> I probably miss something, is hashcmp() supposed to stay after the
> conversion? And will it compare any hash (as configured in the_algo)
> or will it for SHA-1 only?

Yes, it will stick around for the handful of places where we have hashes
like pack checksums.

> If hashcmp() will eventually compare the_algo->rawsz then yes this makes 
> sense.

That's my intention, yes.

> > I believe this is the pack hash, which isn't an object ID.  I will
> > transform it to be called something other than "sha1" and allocate more
> > memory for it in a future series, though.
> 
> Ah ok, it's the "sha1" in the name that bugged me. I'm all good then.

Also noted in my project notes.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH 08/41] packfile: abstract away hash constant values

2018-05-02 Thread Duy Nguyen
On Wed, May 2, 2018 at 2:11 AM, brian m. carlson
 wrote:
> On Tue, May 01, 2018 at 12:22:43PM +0200, Duy Nguyen wrote:
>> On Mon, Apr 23, 2018 at 11:39:18PM +, brian m. carlson wrote:
>> > There are several instances of the constant 20 and 20-based values in
>> > the packfile code.  Abstract away dependence on SHA-1 by using the
>> > values from the_hash_algo instead.
>>
>> While we're abstracting away 20. There's the only 20 left in
>> sha1_file.c that should also be gone. But I guess you could do that
>> later since you need to rename fill_sha1_path to
>> fill_loose_object_path or something.
>
> I'm already working on knocking those out.

Yeah I checked out your part14 branch after writing this note :P You
still need to rename the function though. I can remind that again when
part14 is sent out.

>> > @@ -507,15 +509,15 @@ static int open_packed_git_1(struct packed_git *p)
>> >  " while index indicates %"PRIu32" objects",
>> >  p->pack_name, ntohl(hdr.hdr_entries),
>> >  p->num_objects);
>> > -   if (lseek(p->pack_fd, p->pack_size - sizeof(sha1), SEEK_SET) == -1)
>> > +   if (lseek(p->pack_fd, p->pack_size - hashsz, SEEK_SET) == -1)
>> > return error("end of packfile %s is unavailable", 
>> > p->pack_name);
>> > -   read_result = read_in_full(p->pack_fd, sha1, sizeof(sha1));
>> > +   read_result = read_in_full(p->pack_fd, hash, hashsz);
>> > if (read_result < 0)
>> > return error_errno("error reading from %s", p->pack_name);
>> > -   if (read_result != sizeof(sha1))
>> > +   if (read_result != hashsz)
>> > return error("packfile %s signature is unavailable", 
>> > p->pack_name);
>> > -   idx_sha1 = ((unsigned char *)p->index_data) + p->index_size - 40;
>> > -   if (hashcmp(sha1, idx_sha1))
>> > +   idx_hash = ((unsigned char *)p->index_data) + p->index_size - hashsz * 
>> > 2;
>> > +   if (hashcmp(hash, idx_hash))
>>
>> Since the hash size is abstracted away, shouldn't this hashcmp become
>> oidcmp? (which still does not do the right thing, but at least it's
>> one less place to worry about)
>
> Unfortunately, I can't, because it's not an object ID.  I think the
> decision was made to not transform non-object ID hashes into struct
> object_id, which makes sense.  I suppose we could have an equivalent
> struct hash or something for those other uses.

I probably miss something, is hashcmp() supposed to stay after the
conversion? And will it compare any hash (as configured in the_algo)
or will it for SHA-1 only?

If hashcmp() will eventually compare the_algo->rawsz then yes this makes sense.

>> > @@ -675,7 +677,8 @@ struct packed_git *add_packed_git(const char *path, 
>> > size_t path_len, int local)
>> > p->pack_size = st.st_size;
>> > p->pack_local = local;
>> > p->mtime = st.st_mtime;
>> > -   if (path_len < 40 || get_sha1_hex(path + path_len - 40, p->sha1))
>> > +   if (path_len < the_hash_algo->hexsz ||
>> > +   get_sha1_hex(path + path_len - the_hash_algo->hexsz, p->sha1))
>>
>> get_sha1_hex looks out of place when we start going with
>> the_hash_algo. Maybe change to get_oid_hex() too.
>
> I believe this is the pack hash, which isn't an object ID.  I will
> transform it to be called something other than "sha1" and allocate more
> memory for it in a future series, though.

Ah ok, it's the "sha1" in the name that bugged me. I'm all good then.
-- 
Duy


Re: [PATCH 08/41] packfile: abstract away hash constant values

2018-05-01 Thread brian m. carlson
On Tue, May 01, 2018 at 12:22:43PM +0200, Duy Nguyen wrote:
> On Mon, Apr 23, 2018 at 11:39:18PM +, brian m. carlson wrote:
> > There are several instances of the constant 20 and 20-based values in
> > the packfile code.  Abstract away dependence on SHA-1 by using the
> > values from the_hash_algo instead.
> 
> While we're abstracting away 20. There's the only 20 left in
> sha1_file.c that should also be gone. But I guess you could do that
> later since you need to rename fill_sha1_path to
> fill_loose_object_path or something.

I'm already working on knocking those out.

> > @@ -507,15 +509,15 @@ static int open_packed_git_1(struct packed_git *p)
> >  " while index indicates %"PRIu32" objects",
> >  p->pack_name, ntohl(hdr.hdr_entries),
> >  p->num_objects);
> > -   if (lseek(p->pack_fd, p->pack_size - sizeof(sha1), SEEK_SET) == -1)
> > +   if (lseek(p->pack_fd, p->pack_size - hashsz, SEEK_SET) == -1)
> > return error("end of packfile %s is unavailable", p->pack_name);
> > -   read_result = read_in_full(p->pack_fd, sha1, sizeof(sha1));
> > +   read_result = read_in_full(p->pack_fd, hash, hashsz);
> > if (read_result < 0)
> > return error_errno("error reading from %s", p->pack_name);
> > -   if (read_result != sizeof(sha1))
> > +   if (read_result != hashsz)
> > return error("packfile %s signature is unavailable", 
> > p->pack_name);
> > -   idx_sha1 = ((unsigned char *)p->index_data) + p->index_size - 40;
> > -   if (hashcmp(sha1, idx_sha1))
> > +   idx_hash = ((unsigned char *)p->index_data) + p->index_size - hashsz * 
> > 2;
> > +   if (hashcmp(hash, idx_hash))
> 
> Since the hash size is abstracted away, shouldn't this hashcmp become
> oidcmp? (which still does not do the right thing, but at least it's
> one less place to worry about)

Unfortunately, I can't, because it's not an object ID.  I think the
decision was made to not transform non-object ID hashes into struct
object_id, which makes sense.  I suppose we could have an equivalent
struct hash or something for those other uses.

> Same comment for other hashcmp in this patch.
> 
> > @@ -675,7 +677,8 @@ struct packed_git *add_packed_git(const char *path, 
> > size_t path_len, int local)
> > p->pack_size = st.st_size;
> > p->pack_local = local;
> > p->mtime = st.st_mtime;
> > -   if (path_len < 40 || get_sha1_hex(path + path_len - 40, p->sha1))
> > +   if (path_len < the_hash_algo->hexsz ||
> > +   get_sha1_hex(path + path_len - the_hash_algo->hexsz, p->sha1))
> 
> get_sha1_hex looks out of place when we start going with
> the_hash_algo. Maybe change to get_oid_hex() too.

I believe this is the pack hash, which isn't an object ID.  I will
transform it to be called something other than "sha1" and allocate more
memory for it in a future series, though.

> > @@ -1678,10 +1683,10 @@ int bsearch_pack(const struct object_id *oid, const 
> > struct packed_git *p, uint32
> >  
> > index_lookup = index_fanout + 4 * 256;
> > if (p->index_version == 1) {
> > -   index_lookup_width = 24;
> > +   index_lookup_width = hashsz + 4;
> 
> You did good research to spot this 24 constant ;-)

Thanks.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH 08/41] packfile: abstract away hash constant values

2018-05-01 Thread Duy Nguyen
On Mon, Apr 23, 2018 at 11:39:18PM +, brian m. carlson wrote:
> There are several instances of the constant 20 and 20-based values in
> the packfile code.  Abstract away dependence on SHA-1 by using the
> values from the_hash_algo instead.

While we're abstracting away 20. There's the only 20 left in
sha1_file.c that should also be gone. But I guess you could do that
later since you need to rename fill_sha1_path to
fill_loose_object_path or something.


> @@ -507,15 +509,15 @@ static int open_packed_git_1(struct packed_git *p)
>" while index indicates %"PRIu32" objects",
>p->pack_name, ntohl(hdr.hdr_entries),
>p->num_objects);
> - if (lseek(p->pack_fd, p->pack_size - sizeof(sha1), SEEK_SET) == -1)
> + if (lseek(p->pack_fd, p->pack_size - hashsz, SEEK_SET) == -1)
>   return error("end of packfile %s is unavailable", p->pack_name);
> - read_result = read_in_full(p->pack_fd, sha1, sizeof(sha1));
> + read_result = read_in_full(p->pack_fd, hash, hashsz);
>   if (read_result < 0)
>   return error_errno("error reading from %s", p->pack_name);
> - if (read_result != sizeof(sha1))
> + if (read_result != hashsz)
>   return error("packfile %s signature is unavailable", 
> p->pack_name);
> - idx_sha1 = ((unsigned char *)p->index_data) + p->index_size - 40;
> - if (hashcmp(sha1, idx_sha1))
> + idx_hash = ((unsigned char *)p->index_data) + p->index_size - hashsz * 
> 2;
> + if (hashcmp(hash, idx_hash))

Since the hash size is abstracted away, shouldn't this hashcmp become
oidcmp? (which still does not do the right thing, but at least it's
one less place to worry about)

Same comment for other hashcmp in this patch.

> @@ -675,7 +677,8 @@ struct packed_git *add_packed_git(const char *path, 
> size_t path_len, int local)
>   p->pack_size = st.st_size;
>   p->pack_local = local;
>   p->mtime = st.st_mtime;
> - if (path_len < 40 || get_sha1_hex(path + path_len - 40, p->sha1))
> + if (path_len < the_hash_algo->hexsz ||
> + get_sha1_hex(path + path_len - the_hash_algo->hexsz, p->sha1))

get_sha1_hex looks out of place when we start going with
the_hash_algo. Maybe change to get_oid_hex() too.

> @@ -1678,10 +1683,10 @@ int bsearch_pack(const struct object_id *oid, const 
> struct packed_git *p, uint32
>  
>   index_lookup = index_fanout + 4 * 256;
>   if (p->index_version == 1) {
> - index_lookup_width = 24;
> + index_lookup_width = hashsz + 4;

You did good research to spot this 24 constant ;-)

--
Duy


[PATCH 08/41] packfile: abstract away hash constant values

2018-04-23 Thread brian m. carlson
There are several instances of the constant 20 and 20-based values in
the packfile code.  Abstract away dependence on SHA-1 by using the
values from the_hash_algo instead.

Use unsigned values for temporary constants to provide the compiler with
more information about what kinds of values it should expect.

Signed-off-by: brian m. carlson 
---
 packfile.c | 66 ++
 1 file changed, 37 insertions(+), 29 deletions(-)

diff --git a/packfile.c b/packfile.c
index 84acd405e0..b7bc4eab17 100644
--- a/packfile.c
+++ b/packfile.c
@@ -84,6 +84,7 @@ static int check_packed_git_idx(const char *path, struct 
packed_git *p)
uint32_t version, nr, i, *index;
int fd = git_open(path);
struct stat st;
+   const unsigned int hashsz = the_hash_algo->rawsz;
 
if (fd < 0)
return -1;
@@ -92,7 +93,7 @@ static int check_packed_git_idx(const char *path, struct 
packed_git *p)
return -1;
}
idx_size = xsize_t(st.st_size);
-   if (idx_size < 4 * 256 + 20 + 20) {
+   if (idx_size < 4 * 256 + hashsz + hashsz) {
close(fd);
return error("index file %s is too small", path);
}
@@ -129,11 +130,11 @@ static int check_packed_git_idx(const char *path, struct 
packed_git *p)
/*
 * Total size:
 *  - 256 index entries 4 bytes each
-*  - 24-byte entries * nr (20-byte sha1 + 4-byte offset)
-*  - 20-byte SHA1 of the packfile
-*  - 20-byte SHA1 file checksum
+*  - 24-byte entries * nr (object ID + 4-byte offset)
+*  - hash of the packfile
+*  - file checksum
 */
-   if (idx_size != 4*256 + nr * 24 + 20 + 20) {
+   if (idx_size != 4*256 + nr * (hashsz + 4) + hashsz + hashsz) {
munmap(idx_map, idx_size);
return error("wrong index v1 file size in %s", path);
}
@@ -142,16 +143,16 @@ static int check_packed_git_idx(const char *path, struct 
packed_git *p)
 * Minimum size:
 *  - 8 bytes of header
 *  - 256 index entries 4 bytes each
-*  - 20-byte sha1 entry * nr
+*  - object ID entry * nr
 *  - 4-byte crc entry * nr
 *  - 4-byte offset entry * nr
-*  - 20-byte SHA1 of the packfile
-*  - 20-byte SHA1 file checksum
+*  - hash of the packfile
+*  - file checksum
 * And after the 4-byte offset table might be a
 * variable sized table containing 8-byte entries
 * for offsets larger than 2^31.
 */
-   unsigned long min_size = 8 + 4*256 + nr*(20 + 4 + 4) + 20 + 20;
+   unsigned long min_size = 8 + 4*256 + nr*(hashsz + 4 + 4) + 
hashsz + hashsz;
unsigned long max_size = min_size;
if (nr)
max_size += (nr - 1)*8;
@@ -444,10 +445,11 @@ static int open_packed_git_1(struct packed_git *p)
 {
struct stat st;
struct pack_header hdr;
-   unsigned char sha1[20];
-   unsigned char *idx_sha1;
+   unsigned char hash[GIT_MAX_RAWSZ];
+   unsigned char *idx_hash;
long fd_flag;
ssize_t read_result;
+   const unsigned hashsz = the_hash_algo->rawsz;
 
if (!p->index_data && open_pack_index(p))
return error("packfile %s index unavailable", p->pack_name);
@@ -507,15 +509,15 @@ static int open_packed_git_1(struct packed_git *p)
 " while index indicates %"PRIu32" objects",
 p->pack_name, ntohl(hdr.hdr_entries),
 p->num_objects);
-   if (lseek(p->pack_fd, p->pack_size - sizeof(sha1), SEEK_SET) == -1)
+   if (lseek(p->pack_fd, p->pack_size - hashsz, SEEK_SET) == -1)
return error("end of packfile %s is unavailable", p->pack_name);
-   read_result = read_in_full(p->pack_fd, sha1, sizeof(sha1));
+   read_result = read_in_full(p->pack_fd, hash, hashsz);
if (read_result < 0)
return error_errno("error reading from %s", p->pack_name);
-   if (read_result != sizeof(sha1))
+   if (read_result != hashsz)
return error("packfile %s signature is unavailable", 
p->pack_name);
-   idx_sha1 = ((unsigned char *)p->index_data) + p->index_size - 40;
-   if (hashcmp(sha1, idx_sha1))
+   idx_hash = ((unsigned char *)p->index_data) + p->index_size - hashsz * 
2;
+   if (hashcmp(hash, idx_hash))
return error("packfile %s does not match index", p->pack_name);
return 0;
 }
@@ -530,7 +532,7 @@ static int open_packed_git(struct packed_git *p)
 
 static int in_window(struct pack_window *win, off_t offset)
 {