Re: [PATCH 6/6] Document some functions defined in object.c

2014-02-24 Thread Michael Haggerty
Nicolas, thanks for the very fast feedback!

On 02/21/2014 06:33 PM, Nicolas Pitre wrote:
 On Fri, 21 Feb 2014, Michael Haggerty wrote:
 
 Signed-off-by: Michael Haggerty mhag...@alum.mit.edu
 
 Minor nits below.
 
 
 ---
  object.c | 23 ++-
  object.h |  7 +++
  2 files changed, 29 insertions(+), 1 deletion(-)

 diff --git a/object.c b/object.c
 index 584f7ac..c34e225 100644
 --- a/object.c
 +++ b/object.c
 @@ -43,14 +43,26 @@ int type_from_string(const char *str)
  die(invalid object type \%s\, str);
  }
  
 +/*
 + * Return a numerical hash value between 0 and n-1 for the object with
 + * the specified sha1.  n must be a power of 2.
 + *
 + * Since the sha1 is essentially random, we just take the required
 + * bits from the first sizeof(unsigned int) bytes of sha1.
 
 This might be improved a little.  The only reason for the sizeof() is 
 actually to copy those bits into a properly aligned integer.  Some 
 architectures have alignment restrictions that incure a significant cost 
 when integer operations are performed on unaligned data whereas sha1 
 pointers don't have any particular alignment requirements.  Once upon a 
 time this used to simply be:
 
   return *(unsigned int *)sha1  (n - 1);
 
 The memcpy is there only to avoid unaligned accesses.

I understand all that; it's clear that the old code is not correct C,
isn't it?  And ISTM that the use of memcpy() is an implementation detail
that is not relevant to callers and so not needed in the docstring.  So
I suggest that your note be added as comments within the function; what
do you think?

Contrariwise, I thought about it again and believe that it *is*
important for the docstring to mention explicitly that the return value
is architecture-dependent (it depends on endianness and possibly
sizeof(unsigned int)).  Presumably the function is only used within one
Git invocation, so this isn't a problem, but we should warn callers.
Alternatively, we could stick a call to ntohl() in the function to make
the return value consistent, but this would cost a little bit on
little-endian computers.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/6] Document some functions defined in object.c

2014-02-24 Thread Junio C Hamano
Michael Haggerty mhag...@alum.mit.edu writes:

 Contrariwise, I thought about it again and believe that it *is*
 important for the docstring to mention explicitly that the return value
 is architecture-dependent

As it gives an internal hash value which should not leak to the
outside world (e.g. stored in a file or sent over the wire later to
be read by other platforms), I think that is a good idea.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/6] Document some functions defined in object.c

2014-02-21 Thread Nicolas Pitre
On Fri, 21 Feb 2014, Michael Haggerty wrote:

 Signed-off-by: Michael Haggerty mhag...@alum.mit.edu

Minor nits below.


 ---
  object.c | 23 ++-
  object.h |  7 +++
  2 files changed, 29 insertions(+), 1 deletion(-)
 
 diff --git a/object.c b/object.c
 index 584f7ac..c34e225 100644
 --- a/object.c
 +++ b/object.c
 @@ -43,14 +43,26 @@ int type_from_string(const char *str)
   die(invalid object type \%s\, str);
  }
  
 +/*
 + * Return a numerical hash value between 0 and n-1 for the object with
 + * the specified sha1.  n must be a power of 2.
 + *
 + * Since the sha1 is essentially random, we just take the required
 + * bits from the first sizeof(unsigned int) bytes of sha1.

This might be improved a little.  The only reason for the sizeof() is 
actually to copy those bits into a properly aligned integer.  Some 
architectures have alignment restrictions that incure a significant cost 
when integer operations are performed on unaligned data whereas sha1 
pointers don't have any particular alignment requirements.  Once upon a 
time this used to simply be:

return *(unsigned int *)sha1  (n - 1);

The memcpy is there only to avoid unaligned accesses.


 + */
  static unsigned int hash_obj(const unsigned char *sha1, unsigned int n)
  {
   unsigned int hash;
 +
   memcpy(hash, sha1, sizeof(unsigned int));
 - /* Assumes power-of-2 hash sizes in grow_object_hash */
   return hash  (n - 1);
  }

Other than that...

Reviewed-by: Nicolas Pitre n...@fluxnic.net



  
 +/*
 + * Insert obj into the hash table hash, which has length size (which
 + * must be a power of 2).  On collisions, simply overflow to the next
 + * empty bucket.
 + */
  static void insert_obj_hash(struct object *obj, struct object **hash, 
 unsigned int size)
  {
   unsigned int j = hash_obj(obj-sha1, size);
 @@ -63,6 +75,10 @@ static void insert_obj_hash(struct object *obj, struct 
 object **hash, unsigned i
   hash[j] = obj;
  }
  
 +/*
 + * Look up the record for the given sha1 in the hash map stored in
 + * obj_hash.  Return NULL if it was not found.
 + */
  struct object *lookup_object(const unsigned char *sha1)
  {
   unsigned int i, first;
 @@ -92,6 +108,11 @@ struct object *lookup_object(const unsigned char *sha1)
   return obj;
  }
  
 +/*
 + * Increase the size of the hash map stored in obj_hash to the next
 + * power of 2 (but at least 32).  Copy the existing values to the new
 + * hash map.
 + */
  static void grow_object_hash(void)
  {
   int i;
 diff --git a/object.h b/object.h
 index dc5df8c..732bf4d 100644
 --- a/object.h
 +++ b/object.h
 @@ -42,7 +42,14 @@ struct object {
  extern const char *typename(unsigned int type);
  extern int type_from_string(const char *str);
  
 +/*
 + * Return the current number of buckets in the object hashmap.
 + */
  extern unsigned int get_max_object_index(void);
 +
 +/*
 + * Return the object from the specified bucket in the object hashmap.
 + */
  extern struct object *get_indexed_object(unsigned int);
  
  /*
 -- 
 1.8.5.3
 
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html