Re: [PATCH] Implement ‘hash’ for structs

2012-10-12 Thread Ludovic Courtès
Mark H Weaver m...@netris.org skribis:

 l...@gnu.org (Ludovic Courtès) writes:

 Mark H Weaver m...@netris.org skribis:

 I guess this 'if' is to avoid an infinite loop if the struct points back
 to itself.  However, it apparently fails to detect cycles in the general
 case.

 Yes, indeed.

 Here’s an updated patch that uses the ‘depth’ argument of ‘scm_hasher’
 for that, as is done for pairs.

 I don't think 'depth' is an appropriate name for that argument.

I’ve finally pushed it (keeping that variable name).

Thanks for the review!

Ludo’.



Re: [PATCH] Implement ‘hash’ for structs

2012-10-11 Thread Mark H Weaver
l...@gnu.org (Ludovic Courtès) writes:

 Mark H Weaver m...@netris.org skribis:

 I guess this 'if' is to avoid an infinite loop if the struct points back
 to itself.  However, it apparently fails to detect cycles in the general
 case.

 Yes, indeed.

 Here’s an updated patch that uses the ‘depth’ argument of ‘scm_hasher’
 for that, as is done for pairs.

I don't think 'depth' is an appropriate name for that argument.  The way
it is used when hashing vectors (see below) implies that it is roughly
proportional to the number of elements to traverse, not the depth:

--8---cut here---start-8---
case scm_tc7_wvect:
case scm_tc7_vector:
  {
size_t len = SCM_SIMPLE_VECTOR_LENGTH (obj);
if (len  5)
  {
size_t i = d/2;
unsigned long h = 1;
while (i--)
  {
SCM elt = SCM_SIMPLE_VECTOR_REF (obj, h % len);
h = ((h  8) + (scm_hasher (elt, n, 2))) % n;
  }
return h;
  }
else
  {
size_t i = len;
unsigned long h = (n)-1;
while (i--)
  {
SCM elt = SCM_SIMPLE_VECTOR_REF (obj, h % len);
h = ((h  8) + (scm_hasher (elt, n, d/len))) % n;
  }
return h;
  }
  }
--8---cut here---end---8---

I would do something that preserves the meaning of 'd' consistent with
its use above.  Maybe it should be called something like 'effort'.

 Thanks,
   Mark



Re: [PATCH] Implement ‘hash’ for structs

2012-10-11 Thread Ludovic Courtès
Hi,

Mark H Weaver m...@netris.org skribis:

 l...@gnu.org (Ludovic Courtès) writes:

 Mark H Weaver m...@netris.org skribis:

 I guess this 'if' is to avoid an infinite loop if the struct points back
 to itself.  However, it apparently fails to detect cycles in the general
 case.

 Yes, indeed.

 Here’s an updated patch that uses the ‘depth’ argument of ‘scm_hasher’
 for that, as is done for pairs.

 I don't think 'depth' is an appropriate name for that argument.

Yeah, it’s debatable.  It’s called just ‘d’ in hash.c, and Andy renamed
it to ‘depth’ in 2.1.  I think ‘depth’ conveys the idea that it’s about
limiting the recursion depth.

I’d keep it this way for consistency.

Thanks,
Ludo’.



Re: [PATCH] Implement ‘hash’ for structs

2012-10-10 Thread Ludovic Courtès
Hello!

Mark H Weaver m...@netris.org skribis:

 l...@gnu.org (Ludovic Courtès) writes:
 As incredible as it may seem, ‘hash’ until now always returned 263 % n
 for structs, leading to interesting experiences when using structs as
 hash table keys.

 Yes, do you remember us talking about this long ago on IRC?  I wanted to
 fix this, but asked whether changing the hash function was okay for 2.0,
 and you never gave me an answer :)

I don’t remember, but I’m glad we agree that something must be done.
It’s also a sign that email is better than IRC for these things, as far
as I’m concerned.  ;-)

 Andy said that he improved the hash function on the master branch.
 You might want to look at what he did.

Thanks for the reminder.  I just looked, it’s much nicer, but it doesn’t
address this particular problem, so we could port it there afterward.

 I guess this 'if' is to avoid an infinite loop if the struct points back
 to itself.  However, it apparently fails to detect cycles in the general
 case.

Yes, indeed.

Here’s an updated patch that uses the ‘depth’ argument of ‘scm_hasher’
for that, as is done for pairs.

Thanks for the review!

Ludo’.

diff --git a/libguile/hash.c b/libguile/hash.c
index a79f03d..8b00a0c 100644
--- a/libguile/hash.c
+++ b/libguile/hash.c
@@ -1,5 +1,6 @@
-/*	Copyright (C) 1995,1996,1997, 2000, 2001, 2003, 2004, 2006, 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
- * 
+/* Copyright (C) 1995, 1996, 1997, 2000, 2001, 2003, 2004, 2006, 2008,
+ *   2009, 2010, 2011, 2012 Free Software Foundation, Inc.
+ *
  * This library is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public License
  * as published by the Free Software Foundation; either version 3 of
@@ -223,6 +224,8 @@ scm_hasher(SCM obj, unsigned long n, size_t d)
 	significant_bits = (scm_t_uintptr) SCM_POINTER_VALUE (obj)  4UL;
 	return (size_t) significant_bits  % n;
   }
+case scm_tcs_struct:
+  return scm_i_struct_hash (obj, n, d);
 case scm_tc7_wvect:
 case scm_tc7_vector:
   {
diff --git a/libguile/struct.c b/libguile/struct.c
index 5837b7c..7e8f68c 100644
--- a/libguile/struct.c
+++ b/libguile/struct.c
@@ -922,6 +922,53 @@ scm_struct_ihashq (SCM obj, unsigned long n, void *closure)
   return SCM_UNPACK (obj) % n;
 }
 
+unsigned long
+scm_i_struct_hash (SCM obj, unsigned long n, size_t depth)
+#define FUNC_NAME hash
+{
+  SCM layout;
+  scm_t_bits *data;
+  size_t struct_size, field_num;
+  unsigned long hash;
+
+  SCM_VALIDATE_STRUCT (1, obj);
+
+  layout = SCM_STRUCT_LAYOUT (obj);
+  struct_size = scm_i_symbol_length (layout) / 2;
+  data = SCM_STRUCT_DATA (obj);
+
+  hash = SCM_UNPACK (SCM_STRUCT_VTABLE (obj)) % n;
+  if (depth  0)
+for (field_num = 0; field_num  struct_size; field_num++)
+  {
+	int protection;
+
+	protection = scm_i_symbol_ref (layout, field_num * 2 + 1);
+	if (protection != 'h'  protection != 'o')
+	  {
+	int type;
+	type = scm_i_symbol_ref (layout, field_num * 2);
+	switch (type)
+	  {
+	  case 'p':
+		hash ^= scm_hasher (SCM_PACK (data[field_num]), n,
+depth / 2);
+		break;
+	  case 'u':
+		hash ^= data[field_num] % n;
+		break;
+	  default:
+		/* Ignore 's' fields.  */;
+	  }
+	  }
+  }
+
+  /* FIXME: Tail elements should be taken into account.  */
+
+  return hash % n;
+}
+#undef FUNC_NAME
+
 SCM_DEFINE (scm_struct_vtable_name, struct-vtable-name, 1, 0, 0, 
 (SCM vtable),
 	Return the name of the vtable @var{vtable}.)
diff --git a/libguile/struct.h b/libguile/struct.h
index 3072f24..643fd9d 100644
--- a/libguile/struct.h
+++ b/libguile/struct.h
@@ -193,6 +193,8 @@ SCM_API void scm_print_struct (SCM exp, SCM port, scm_print_state *);
 
 SCM_INTERNAL SCM scm_i_struct_equalp (SCM s1, SCM s2);
 SCM_INTERNAL unsigned long scm_struct_ihashq (SCM, unsigned long, void *);
+SCM_INTERNAL unsigned long scm_i_struct_hash (SCM s, unsigned long n,
+	  size_t depth);
 SCM_INTERNAL SCM scm_i_alloc_struct (scm_t_bits *vtable_data, int n_words);
 SCM_INTERNAL void scm_i_struct_inherit_vtable_magic (SCM vtable, SCM obj);
 SCM_INTERNAL void scm_init_struct (void);
diff --git a/test-suite/tests/structs.test b/test-suite/tests/structs.test
index 431a014..0e3b241 100644
--- a/test-suite/tests/structs.test
+++ b/test-suite/tests/structs.test
@@ -126,7 +126,49 @@
  (not (or (equal? (make-ball red Bob) (make-ball green Bob))
 	  (equal? (make-ball red Bob) (make-ball red Bill))
 
+
+(with-test-prefix hash
+
+  (pass-if simple structs
+(let* ((v  (make-vtable pr))
+   (s1 (make-struct v 0 hello))
+   (s2 (make-struct v 0 hello)))
+  (= (hash s1 ) (hash s2 
+
+  (pass-if different structs
+(let* ((v  (make-vtable pr))
+   (s1 (make-struct v 0 hello))
+   (s2 (make-struct v 0 world)))
+  (or (not (= (hash s1 ) (hash s2 )))
+  (throw 'unresolved
+
+  (pass-if different struct 

Re: [PATCH] Implement ‘hash’ for structs

2012-10-09 Thread Mark H Weaver
Hi Ludovic!

l...@gnu.org (Ludovic Courtès) writes:
 As incredible as it may seem, ‘hash’ until now always returned 263 % n
 for structs, leading to interesting experiences when using structs as
 hash table keys.

Yes, do you remember us talking about this long ago on IRC?  I wanted to
fix this, but asked whether changing the hash function was okay for 2.0,
and you never gave me an answer :)

Andy said that he improved the hash function on the master branch.
You might want to look at what he did.

[...]

 diff --git a/libguile/struct.c b/libguile/struct.c
 index 5837b7c..6287163 100644
 --- a/libguile/struct.c
 +++ b/libguile/struct.c
 @@ -922,6 +922,52 @@ scm_struct_ihashq (SCM obj, unsigned long n, void 
 *closure)
return SCM_UNPACK (obj) % n;
  }
  
 +unsigned long
 +scm_i_struct_hash (SCM obj, unsigned long n)
 +#define FUNC_NAME hash
 +{
 +  SCM layout;
 +  scm_t_bits *data;
 +  size_t struct_size, field_num;
 +  unsigned long hash;
 +
 +  SCM_VALIDATE_STRUCT (1, obj);
 +
 +  layout = SCM_STRUCT_LAYOUT (obj);
 +  struct_size = scm_i_symbol_length (layout) / 2;
 +  data = SCM_STRUCT_DATA (obj);
 +
 +  hash = (unsigned long) SCM_PACK (SCM_STRUCT_VTABLE (obj));
 +  for (field_num = 0; field_num  struct_size; field_num++)
 +{
 +  int protection;
 +
 +  protection = scm_i_symbol_ref (layout, field_num * 2 + 1);
 +  if (protection != 'h'  protection != 'o')
 + {
 +   int type;
 +   type = scm_i_symbol_ref (layout, field_num * 2);
 +   switch (type)
 + {
 + case 'p':
 +   if (!scm_is_eq (obj, SCM_PACK (data[field_num])))
 + hash ^= scm_ihash (SCM_PACK (data[field_num]), n);

I guess this 'if' is to avoid an infinite loop if the struct points back
to itself.  However, it apparently fails to detect cycles in the general
case.  I think this is a show stopper.  I think we need to detect cycles
and DTRT.

 Mark