[netlabs #574] [PATCH] Hashtables

via RT Tue, 14 May 2002 15:57:56 -0700

# New Ticket Created by  Steve Fink 
# Please include the string:  [netlabs #574]
# in the subject line of all future correspondence about this issue. 
# <URL: http://bugs6.perl.org/rt2/Ticket/Display.html?id=574 >



Here's the ultimate point of that flurry of patches. This implements
hashtables, albeit with some... arguable implementation decisions.

These hashtables accept only keys as strings, but natively store all
sorts of various types as values (in other words, not just PMCs).

The straightforward implementation using the preexisting memory
management code would have required seven separately addressable wads
of memory. With the sized buffer pools, these hashtables use five:
 1 The PMC
 2 The overall hashtable header (a subclass of a Buffer)
 3 An array of pointers to buckets, stored in the above header's bufstart
 4 A Buffer header for the pool of buckets themselves
 5 The array of available buckets

The array of pointers to buckets could be stored in the same buffer as
the buckets themselves, reducing the total number of objects to three,
but then there would be two very different beasts pointed to by the
same Buffer header subclass, complicating the addressing, so I opted
to keep them separate. The costs would be less for a hashtable that
stored only PMC*'s, however.

The most potentially controversial attribute of these hashtables is
the use of direct memory pointers to and between the buckets. These
pointers are invalidated every time buffers are compacted, so at the
beginning of every public entry point, restore_invariants() is called.
It checks to see if the buckets have moved (they're the only things
pointed to directly), and if so, they are swept through and adjusted
to the new position. The same routine is used to resize the table when
it reaches 80% full. It's very similar to the DOD sweep phase.

I tried using indexes for everything, but they cluttered the code too
much and it got hard to tell what was going on. But it would probably
be faster that way.

This patch requires all of my recently posted patches except for the
cleanup patch. It will compile and even appear to work if you leave
out the sized buffer pool patch, but if you turn on GC_DEBUG it will
break very quickly.

Index: MANIFEST
===================================================================
RCS file: /home/perlcvs/parrot/MANIFEST,v
retrieving revision 1.146
diff -a -u -r1.146 MANIFEST
--- MANIFEST    3 May 2002 23:56:51 -0000       1.146
+++ MANIFEST    14 May 2002 22:05:01 -0000
@@ -102,6 +102,7 @@
 examples/mops/mops.scheme
 exceptions.c
 global_setup.c
+hash.c
 hints/cygwin.pl
 hints/darwin.pl
 hints/dec_osf.pl
@@ -114,6 +115,7 @@
 include/parrot/events.h
 include/parrot/exceptions.h
 include/parrot/global_setup.h
+include/parrot/hash.h
 include/parrot/interp_guts.h
 include/parrot/interpreter.h
 include/parrot/io.h
Index: Makefile.in
===================================================================
RCS file: /home/perlcvs/parrot/Makefile.in,v
retrieving revision 1.148
diff -a -u -r1.148 Makefile.in
--- Makefile.in 11 May 2002 19:53:33 -0000      1.148
+++ Makefile.in 14 May 2002 22:05:02 -0000
@@ -65,7 +65,7 @@
 $(INC)/memory.h $(INC)/parrot.h $(INC)/stacks.h $(INC)/packfile.h \
 $(INC)/global_setup.h $(INC)/vtable.h $(INC)/oplib/core_ops.h \
 $(INC)/oplib/core_ops_prederef.h $(INC)/runops_cores.h $(INC)/trace.h \
-$(INC)/pmc.h $(INC)/key.h $(INC)/resources.h $(INC)/platform.h ${cg_h} \
+$(INC)/pmc.h $(INC)/key.h $(INC)/hash.h $(INC)/resources.h $(INC)/platform.h ${cg_h} \
 $(INC)/interp_guts.h ${jit_h} $(INC)/rx.h $(INC)/rxstacks.h \
 $(INC)/embed.h $(INC)/warnings.h $(INC)/misc.h
 
@@ -87,7 +87,7 @@
 INTERP_O_FILES = exceptions$(O) global_setup$(O) interpreter$(O) parrot$(O) \
                                 register$(O) core_ops$(O) core_ops_prederef$(O) 
memory$(O) \
                                 packfile$(O) stacks$(O) string$(O) encoding$(O) \
-                                chartype$(O) runops_cores$(O) trace$(O) pmc$(O) 
key$(O) \
+                                chartype$(O) runops_cores$(O) trace$(O) pmc$(O) 
+key$(O) hash$(O) \
                                 platform$(O) ${jit_o} resources$(O) rx$(O) 
rxstacks$(O) \
                                 embed$(O) warnings$(O) misc$(O) ${cg_o} \
                                 packout$(O) byteorder$(O)
Index: include/parrot/string_funcs.h
===================================================================
RCS file: /home/perlcvs/parrot/include/parrot/string_funcs.h,v
retrieving revision 1.7
diff -a -u -r1.7 string_funcs.h
--- include/parrot/string_funcs.h       14 Apr 2002 18:54:29 -0000      1.7
+++ include/parrot/string_funcs.h       14 May 2002 22:05:33 -0000
@@ -35,6 +35,7 @@
 FLOATVAL Parrot_string_to_num(const STRING *);
 INTVAL Parrot_string_to_int(const STRING *);
 STRING * Parrot_string_from_int(struct Parrot_Interp *, INTVAL i);
+STRING * Parrot_string_from_num(struct Parrot_Interp *, FLOATVAL f);
 STRING * Parrot_string_grow(struct Parrot_Interp *, STRING * s, INTVAL addlen);
 void Parrot_string_destroy(STRING *);
 STRING *Parrot_string_make(struct Parrot_Interp *, const void *buffer,
@@ -65,6 +66,7 @@
 #define string_to_num           Parrot_string_to_num
 #define string_to_int           Parrot_string_to_int
 #define string_from_int         Parrot_string_from_int
+#define string_from_num         Parrot_string_from_num
 #define string_grow             Parrot_string_grow
 #define string_destroy          Parrot_string_destroy
 #define string_make             Parrot_string_make
Index: string.c
===================================================================
RCS file: /home/perlcvs/parrot/string.c,v
retrieving revision 1.75
diff -a -u -r1.75 string.c
--- string.c    5 May 2002 04:02:59 -0000       1.75
+++ string.c    14 May 2002 22:05:18 -0000
@@ -787,6 +787,21 @@
                             NULL, 0, NULL);
 }
 
+/* Stolen, with modifications, from perlnum.pmc */
+STRING *
+string_from_num(struct Parrot_Interp * interpreter, FLOATVAL f)
+{
+    char buff[200];
+    STRING* s;
+#ifdef HAS_SNPRINTF
+    snprintf(buff, sizeof(buff), FLOATVAL_FMT, f);
+#else
+    sprintf(buff, FLOATVAL_FMT, f);  /* XXX buffer overflow! */
+#endif
+    s = string_make(interpreter, buff, strlen(buff), NULL, 0, NULL);
+    return s;
+}
+
 const char *
 string_to_cstring(struct Parrot_Interp * interpreter, STRING * s)
 {
Index: hash.c
===================================================================
RCS file: hash.c
diff -N hash.c
--- /dev/null   1 Jan 1970 00:00:00 -0000
+++ hash.c      14 May 2002 22:05:09 -0000
@@ -0,0 +1,427 @@
+/* hash.c
+ *  Copyright: (When this is determined...it will go here)
+ *  CVS Info
+ *     $Id: key.c,v 1.24 2002/04/02 20:35:52 sfink Exp $
+ *  Overview:
+ *  Data Structure and Algorithms:
+ *     A hashtable contains an array of pointers to buckets. Buckets
+ *     are nodes in a linked list, each containing a STRING key and
+ *     a value. The value is currently stored as a KEY_ATOM, which
+ *     maybe makes sense for some hashes but probably doesn't for what
+ *     they're currently used for, which is PerlHashes (since those
+ *     should probably just be hashes of STRINGs mapping to PMCs.)
+ *
+ *     To minimize memory overhead, buckets are carved out of a pool
+ *     that is allocated normally by parrot's memory subsystem. That
+ *     means that the pool can get moved around a lot. Which in turns
+ *     means that any bucket pointers will be invalidated. The simple
+ *     way to handle this would be to use integer indexes into the
+ *     pool instead of pointers, but when I started coding that up, it
+ *     made everything much messier. So instead, I use pointers but
+ *     fixup all the pointers whenever necessary (either because the
+ *     GC moved stuff, or because the hashtable was resized.) This is
+ *     done in restore_invariants, which compares the current base
+ *     address of the bucket pool to a stored value that indicates
+ *     what all of the pointers are relative to, and traverses through
+ *     the entire hash to fix them up. A side effect of this approach
+ *     is that pointers to buckets can never be allowed to escape this
+ *     module, but that shouldn't be allowed anyway.
+ *
+ *  History:
+ *     Initial version by Jeff G. on 2001.12.05
+ *     Substantially rewritten by Steve F.
+ *  Notes:
+ *     Future optimizations:
+ *       - compute modulus by ANDing with num_buckets-1 (The compiler
+ *         cannot know that num_buckets is always a power of 2.)
+ *       - Stop reallocating the bucket pool, and instead add chunks on.
+ *         (Saves pointer fixups and copying during realloc.)
+ *       - Hash contraction (dunno if it's worth it)
+ *  References:
+ *     pdd08_keys.pod */
+
+#include "parrot/parrot.h"
+
+#define INITIAL_BUCKETS 16
+#define MAXFULL_PERCENT 80
+
+struct _hashbucket {
+    STRING *key;
+    KEY_ATOM value;
+    HASHBUCKET* next;
+};
+
+struct _hash {
+    Buffer buffer; /* This struct is a Buffer subclass! */
+    UINTVAL num_buckets;
+    UINTVAL entries; /* Number of values stored in hashtable */
+    Buffer* bucket_pool; /* Buffer full of buckets, used and unused */
+    HASHBUCKET* free_list;
+    HASHBUCKET* former_base;
+};
+
+/* This routine must be called at the beginning of all public entry
+ * points. It adjusts the pointers in case the garbage collector has
+ * moved things around. If it bothers you that this has to scan the
+ * whole hashtable and so will slow things down tremendously, consider
+ * this: that's what garbage collection does during DOD anyway! */
+static void restore_invariants(Interp* interpreter, HASH* hash)
+{
+    ptrdiff_t adjust;
+    UINTVAL i;
+    HASHBUCKET** table;
+    HASHBUCKET* current_base = (HASHBUCKET*) hash->bucket_pool->bufstart;
+    UINTVAL table_size;
+
+    UNUSED(interpreter);
+
+    if (current_base == NULL || current_base == hash->former_base)
+        return; /* Nothing has moved, so we're good. */
+
+    if (hash->former_base == NULL) {
+        /* Moved from nowhere to somewhere, so no fixup needed */
+        hash->former_base = current_base;
+        return;
+    }
+    
+/*      fprintf(stderr, "Moving hash %p buckets from %p -> %p\n", */
+/*              hash, hash->former_base, current_base); */
+
+    adjust = current_base - hash->former_base;
+
+    /* Fix up the free list */
+    if (hash->free_list) hash->free_list += adjust;
+
+    /* Fix up the hashtable */
+    table = (HASHBUCKET**) hash->buffer.bufstart;
+    for (i = 0; i < hash->num_buckets; i++) {
+        if (table[i]) table[i] += adjust;
+    }
+
+    /* Fix up the buckets themselves. All buckets in the pool are
+     * either on the free list or in a hash chain, and either way they
+     * need to be adjusted. So there is no need to chase down all the
+     * chains. */
+    table_size = hash->bucket_pool->buflen / sizeof(HASHBUCKET);
+    for (i = 0; i < table_size; i++) {
+        if (current_base[i].next) current_base[i].next += adjust;
+    }
+
+    hash->former_base = current_base;
+}
+
+/*=for api key key_hash
+
+Return the hashed value of the string
+
+=cut */
+
+static INTVAL
+key_hash(Interp *interpreter, STRING *value)
+{
+    char *buffptr = value->bufstart;
+    INTVAL len = value->bufused;
+    INTVAL hash = 5893;
+
+    UNUSED(interpreter);
+
+    while (len--) {
+        hash = hash * 33 + *buffptr++;
+    }
+    if (hash < 0) {
+        hash = -hash;
+    }
+    return hash;
+}
+
+void
+dump_hash(Interp* interpreter, HASH* hash)
+{
+    UINTVAL i;
+    HASHBUCKET** buckets = (HASHBUCKET**) hash->buffer.bufstart;
+    fprintf(stderr, "Hashtable[" INTVAL_FMT "/" INTVAL_FMT "]\n",
+            hash->entries, hash->num_buckets);
+    for (i = 0; i <= hash->num_buckets; i++) {
+        HASHBUCKET* bucket;
+        if (i == hash->num_buckets) bucket = hash->free_list;
+        else bucket = buckets[i];
+        if (bucket == NULL) continue;
+        fprintf(stderr, "  Bucket " INTVAL_FMT ": ", i);
+        while (bucket) {
+            fprintf(stderr, "type(%d)", bucket->value.type);
+            bucket = bucket->next;
+            if (bucket) fprintf(stderr, " -> ");
+        }
+        fprintf(stderr, "\n");
+    }
+}
+
+PMC*
+mark_hash(Interp* interpreter, HASH* hash, PMC* end_of_used_list)
+{
+    UINTVAL i;
+    HASHBUCKET** buckets = (HASHBUCKET**) hash->buffer.bufstart;
+
+    restore_invariants(interpreter, hash);
+
+    buffer_lives((Buffer *)hash);
+    buffer_lives(hash->bucket_pool);
+    for (i = 0; i < hash->num_buckets; i++) {
+        HASHBUCKET* bucket = buckets[i];
+        while (bucket) {
+            buffer_lives((Buffer *)bucket->key);
+            if (bucket->value.type == enum_key_string)
+                buffer_lives((Buffer *)bucket->value.val.string_val);
+            else if (bucket->value.type == enum_key_pmc)
+                end_of_used_list = mark_used(bucket->value.val.pmc_val,
+                                             end_of_used_list);
+            bucket = bucket->next;
+        }
+    }
+
+    return end_of_used_list;
+}
+
+/* For a hashtable of size N, we use MAXFULL_PERCENT% of N as the number of
+ * buckets. This way, as soon as we run out of buckets on the free list,
+ * we know that it's time to resize the hashtable.
+ *
+ * Algorithm for expansion: We exactly double the size of the hashtable.
+ * Keys are assigned to buckets with the formula
+ *    bucket_index = hash(key) % num_buckets
+ * so when doubling the size of the hashtable, we know that every key
+ * is either already in the correct bucket, or belongs in the current
+ * bucket plus num_buckets (the old num_buckets). In fact, because the
+ * hashtable is always a power of two in size, it depends only on the
+ * next bit in the hash value, after the ones previously used.
+ *
+ * So we scan through all the buckets in order, moving the buckets
+ * that need to be moved. No bucket will be scanned twice, and the
+ * cache should be reasonably happy because the hashtable accesses
+ * will be two parallel sequential scans. (Of course, this also mucks
+ * with the ->next pointers, and they'll be all over memory.)
+ */
+static void
+expand_hash(Interp *interpreter, HASH* hash)
+{
+    HASHBUCKET** table;
+    HASHBUCKET* bucket;
+    UINTVAL new_size = (hash->num_buckets ? hash->num_buckets * 2
+                                          : INITIAL_BUCKETS);
+    UINTVAL i;
+    UINTVAL old_pool_size = hash->bucket_pool->buflen / sizeof(HASHBUCKET);
+    UINTVAL new_pool_size = new_size * MAXFULL_PERCENT / 100;
+    HASHBUCKET* old_pool = (HASHBUCKET*) hash->bucket_pool->bufstart;
+
+    Parrot_reallocate(interpreter, hash, new_size * sizeof(HASHBUCKET*));
+    table = (HASHBUCKET**) hash->buffer.bufstart;
+/*      fprintf(stderr, "expand_hash(%p): buckets=%p..%p (mem=%p..%p)\n", hash, 
+table, table + new_size, hash->buffer.bufstart, (char*)hash->buffer.bufstart + 
+hash->buffer.buflen); */
+
+    Parrot_reallocate(interpreter, hash->bucket_pool,
+                            new_pool_size * sizeof(HASHBUCKET));
+    restore_invariants(interpreter, hash); /* Bucket pool may have moved */
+
+    /* Add the newly allocated buckets onto the free list */
+    for (i = old_pool_size; i < new_pool_size; i++) {
+        bucket = &((HASHBUCKET*) hash->bucket_pool->bufstart)[i];
+        bucket->next = hash->free_list;
+        hash->free_list = bucket;
+    }
+
+    /* NULL out new space in table */
+    memset(table + hash->num_buckets, 0,
+           (new_size - hash->num_buckets) * sizeof(HASHBUCKET*));
+
+    /* Move buckets to new homes */
+    for (i = 0; i < hash->num_buckets; i++) {
+        HASHBUCKET** bucketP = &table[i];
+        while (*bucketP != NULL) {
+            bucket = *bucketP;
+            if ((key_hash(interpreter, bucket->key) % new_size) != i) {
+                /* Remove from table */
+                *bucketP = bucket->next;
+
+                /* Add to new spot in table */
+                bucket->next = table[i + hash->num_buckets];
+                table[i + hash->num_buckets] = bucket;
+            }
+            else {
+                bucketP = &bucket->next;
+            }
+        }
+    }
+
+    hash->num_buckets = new_size;
+}
+
+static HASHBUCKET *
+new_bucket(Interp *interpreter, HASH* hash, STRING *key, KEY_ATOM *value)
+{
+    if (key == NULL) {
+        internal_exception(INTERNAL_PANIC, "NULL key\n");
+        return NULL;
+    }
+
+    if (value == NULL) {
+        internal_exception(INTERNAL_PANIC, "NULL value\n");
+        return NULL;
+    }
+
+    if (hash->free_list != NULL) {
+        HASHBUCKET *bucket = hash->free_list;
+        hash->free_list = bucket->next;
+        bucket->key = key;
+        memcpy(&bucket->value, value, sizeof(*value));
+        return bucket;
+    }
+
+    /* Free list is empty. Need to expand the hashtable. */
+    expand_hash(interpreter, hash);
+    return new_bucket(interpreter, hash, key, value);
+}
+
+static HASHBUCKET *
+find_bucket(Interp *interpreter, HASHBUCKET *head, STRING *key)
+{
+    KEY_ATOM *pair = NULL;
+    if (head != NULL) {
+        if (key != NULL) {
+            while (head != NULL) {
+                if (string_compare(interpreter, key, head->key) == 0) {
+                    return head;
+                }
+                head = head->next;
+            }
+        }
+        else {
+            fprintf(stderr, "*** find_bucket given a null key\n");
+        }
+    }
+    return NULL;
+}
+
+HASH *
+new_hash(Interp *interpreter)
+{
+    HASH* hash = (HASH *) new_tracked_header(interpreter, sizeof(*hash));
+/*      hash->buffer.flags |= BUFFER_report_FLAG; */
+    hash->num_buckets = 0;
+    hash->entries = 0;
+    hash->bucket_pool = new_buffer_header(interpreter);
+/*      hash->bucket_pool->flags |= BUFFER_report_FLAG; */
+    hash->free_list = NULL;
+    hash->former_base = NULL;
+    return hash;
+}
+
+/*=for api key hash_size
+
+return the number of used entries in hashtable
+
+=cut
+*/
+
+INTVAL
+hash_size(Interp *interpreter, HASH *hash)
+{
+    UNUSED(interpreter);
+
+    if (hash != NULL) {
+        return hash->entries;
+    }
+    else {
+        fprintf(stderr, "*** hash_size asked to check a NULL hash\n");
+        return 0;
+    }
+}
+
+static HASHBUCKET*
+hash_lookup(Interp *interpreter, HASH *hash, STRING* key)
+{
+    HASHBUCKET** table = (HASHBUCKET**) hash->buffer.bufstart;
+    UINTVAL hashval = key_hash(interpreter, key);
+    HASHBUCKET* chain = table[hashval % hash->num_buckets];
+    return find_bucket(interpreter, chain, key);
+}
+
+KEY_ATOM *
+hash_get(Interp *interpreter, HASH *hash, STRING* key)
+{
+    HASHBUCKET* bucket;
+    restore_invariants(interpreter, hash);
+    bucket = hash_lookup(interpreter, hash, key);
+    if (bucket == NULL) return NULL; /* Not found */
+    return &bucket->value;
+}
+
+/* The key is *not* copied. */
+void
+hash_put(Interp *interpreter, HASH *hash, STRING* key, KEY_ATOM* value)
+{
+    HASHBUCKET** table;
+    UINTVAL hashval;
+    HASHBUCKET* chain;
+    HASHBUCKET* bucket;
+
+    restore_invariants(interpreter, hash);
+/*      dump_hash(interpreter, hash); */
+
+    table = (HASHBUCKET**) hash->buffer.bufstart;
+    hashval = key_hash(interpreter, key);
+    chain = table ? table[hashval % hash->num_buckets] : NULL;
+    bucket = find_bucket(interpreter, chain, key);
+
+/*      fprintf(stderr, "HASH=%p buckets=%p chain=%p bucket=%p KEY=%s\n", */
+/*              hash, hash->buffer.bufstart, chain, bucket, 
+string_to_cstring(interpreter, key)); */
+
+    if (bucket) {
+        /* Replacing old value */
+        memcpy(&bucket->value, value, sizeof(KEY_ATOM));
+    } else {
+        /* Create new bucket */
+        hash->entries++;
+        bucket = new_bucket(interpreter, hash, key, value);
+        table = (HASHBUCKET**) hash->buffer.bufstart;
+        bucket->next = table[hashval % hash->num_buckets];
+        table[hashval % hash->num_buckets] = bucket;
+    }
+/*      dump_hash(interpreter, hash); */
+}
+
+void
+hash_delete(Interp *interpreter, HASH *hash, STRING* key)
+{
+    HASHBUCKET** table;
+    UINTVAL hashval;
+    HASHBUCKET* chain;
+    HASHBUCKET* bucket;
+    HASHBUCKET* prev = NULL;
+
+    restore_invariants(interpreter, hash);
+
+    table = (HASHBUCKET**) hash->buffer.bufstart;
+    hashval = key_hash(interpreter, key);
+    chain = table[hashval % hash->num_buckets];
+
+    for (bucket = chain; bucket != NULL; bucket = bucket->next) {
+        if (string_compare(interpreter, key, bucket->key) == 0) {
+            if (prev) prev->next = bucket->next;
+            else table[hashval % hash->num_buckets] = bucket->next;
+            hash->entries--;
+            return;
+        }
+        prev = bucket;
+    }
+
+    fprintf(stderr, "*** hash_delete given nonexistent key\n");
+}
+
+/*
+ * Local variables:
+ * c-indentation-style: bsd
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ *
+ * vim: expandtab shiftwidth=4:
+ */
Index: classes/perlhash.pmc
===================================================================
RCS file: /home/perlcvs/parrot/classes/perlhash.pmc,v
retrieving revision 1.16
diff -a -u -r1.16 perlhash.pmc
--- classes/perlhash.pmc        2 Apr 2002 20:32:42 -0000       1.16
+++ classes/perlhash.pmc        14 May 2002 22:05:28 -0000
@@ -7,11 +7,52 @@
  *  Data Structure and Algorithms:
  *  History:
  *  Notes:
+ *     PerlHash values should probably only be allowed to be PMCs.
  *  References:
+ *     pdd08_keys.pod
  */
 
 #include "parrot/parrot.h"
 
+static PMC* undef = NULL;
+
+static STRING* make_hash_key_helper(Interp* interpreter, KEY * key)
+{
+    if (key == NULL) {
+        internal_exception(OUT_OF_BOUNDS, "Cannot use NULL key for PerlHash!\n");
+        return NULL;
+    }
+
+    if (key->next != NULL) {
+        internal_exception(OUT_OF_BOUNDS, "PerlHash does not support compound 
+keys!\n");
+        return NULL;
+    }
+
+    switch (key->atom.type) {
+    case enum_key_int:
+        return string_from_int(interpreter, key->atom.val.int_val);
+    case enum_key_num:
+        return string_from_num(interpreter, key->atom.val.num_val);
+    case enum_key_pmc: {
+        PMC* pmc = key->atom.val.pmc_val;
+        return string_copy(interpreter,
+                           pmc->vtable->get_string(interpreter, pmc));
+    }
+    case enum_key_string:
+        return string_copy(interpreter, key->atom.val.string_val);
+    default:
+        internal_exception(OUT_OF_BOUNDS, "Cannot make hash key from type %d\n", 
+key->atom.type);
+    }
+
+    return NULL;
+}
+
+static STRING* make_hash_key(Interp* interpreter, KEY * key)
+{
+        STRING* s = make_hash_key_helper(interpreter, key);
+        return s;
+}
+
 pmclass PerlHash {
 
     INTVAL type () {
@@ -23,56 +64,41 @@
     }
 
     void init (INTVAL size) {
-       SELF->data = key_new(INTERP);
-       key_set_size(INTERP,SELF->data,0);
-    }
-
-    void clone (PMC* dest) { 
+        if (undef == NULL) {
+            undef = pmc_new(INTERP, enum_class_PerlUndef);
+            undef->flags |= PMC_constant_FLAG;
+        }
+        SELF->flags |= PMC_custom_mark_FLAG;
+       SELF->data = new_hash(INTERP);
     }
 
-    void morph (INTVAL type) {
-    }
-
-    BOOLVAL move_to (void * destination) {
-        return 0; /* You can't move me, I don't have anything to move! */
-    }
-
-    INTVAL real_size () {
-       return 0; /* ->data is unused */
-    }
-
-    void destroy () {
-       key_destroy(INTERP,SELF->data);
+    /* The end of used parameter is passed into the mark_used function of
+     * the garbage collector.
+     */
+    PMC* mark (PMC *end_of_used_list) {
+        return mark_hash(INTERP, SELF->data, end_of_used_list);
     }
 
     INTVAL get_integer () {
-       return SELF->cache.int_val;
+       return hash_size(INTERP, SELF->data);
     }
 
     INTVAL get_integer_keyed (KEY * key) {
-        KEY_PAIR* kp;
-        INTVAL ix;
-        PMC* value;
-        internal_exception(OUT_OF_BOUNDS, "Hash not implemented yet!\n");
-
-        if (!key) {
-            return 0;
-        }
-
-        kp = &key->keys[0];
-        ix = kp->cache.int_val;
-
-        if (ix > SELF->cache.int_val) {
-            /* XXX I don't think this will work quite yet */
-            /* key_set_size(kp,ix); */
-            internal_exception(OUT_OF_BOUNDS, "Array element out of bounds!\n");
-        }
-        if (ix < 0) {
-            ix += SELF->cache.int_val;
-        }
-
-        value = ((PMC**)(SELF->data))[ix];
-        return value->vtable->get_integer(INTERP, value);
+        PMC* valpmc;
+        STRING* key_string = make_hash_key(INTERP, key);
+        KEY_ATOM* atom = hash_get(INTERP, (HASH*) SELF->data, key_string);
+        if (atom == NULL) {
+            /* XXX Warning: use of uninitialized value */
+            return undef->vtable->get_integer(INTERP, undef);
+        }
+        if (atom->type == enum_key_int) return atom->val.int_val;
+        if (atom->type == enum_key_pmc) {
+            valpmc = atom->val.pmc_val;
+            return valpmc->vtable->get_integer(INTERP, valpmc);
+        }
+        /* XXX This should convert to an integer if possible */
+        internal_exception(OUT_OF_BOUNDS, "Cannot fetch integer out of non-integer 
+key!\n");
+        return -1;
     }
 
     FLOATVAL get_number () {
@@ -80,29 +106,21 @@
     }
 
     FLOATVAL get_number_keyed (KEY * key) {
-        KEY_PAIR* kp;
-        INTVAL ix;
-        PMC* value;
-        internal_exception(OUT_OF_BOUNDS, "Hash not implemented yet!\n");
-
-        if (!key) {
-            return 0;
-        }
-
-        kp = &key->keys[0];
-        ix = kp->cache.int_val;
-
-        if (ix > SELF->cache.int_val) {
-            /* XXX I don't think this will work quite yet */
-            /* key_set_size(kp,ix); */
-            internal_exception(OUT_OF_BOUNDS, "Array element out of bounds!\n");
-        }
-        if (ix < 0) {
-            ix += SELF->cache.int_val;
+        PMC* valpmc;
+        STRING* key_string = make_hash_key(INTERP, key);
+        KEY_ATOM* atom = hash_get(INTERP, (HASH*) SELF->data, key_string);
+        if (atom == NULL) {
+            /* XXX Warning: Use of uninitialized value */
+            return undef->vtable->get_number(INTERP, undef);
+        }
+        if (atom->type == enum_key_num) return atom->val.num_val;
+        /* XXX This should convert to a number if possible */
+        if (atom->type == enum_key_pmc) {
+            valpmc = atom->val.pmc_val;
+            return valpmc->vtable->get_number(INTERP, valpmc);
         }
-
-        value = ((PMC**)(SELF->data))[ix];
-        return value->vtable->get_number(INTERP, value);
+        internal_exception(OUT_OF_BOUNDS, "Cannot fetch number out of non-numeric 
+key!\n");
+        return 0.0;
     }
 
     STRING* get_string () {
@@ -110,29 +128,21 @@
     }
 
     STRING* get_string_keyed (KEY * key) {
-        KEY_PAIR* kp;
-        INTVAL ix;
-        PMC* value;
-        internal_exception(OUT_OF_BOUNDS, "Hash not implemented yet!\n");
-
-        if (!key) {
-            return 0;
-        }
-
-        kp = &key->keys[0];
-        ix = kp->cache.int_val;
-
-        if (ix > SELF->cache.int_val) {
-            /* XXX I don't think this will work quite yet */
-            /* key_set_size(kp,ix); */
-            internal_exception(OUT_OF_BOUNDS, "Array element out of bounds!\n");
+        PMC* valpmc;
+        STRING* key_string = make_hash_key(INTERP, key);
+        KEY_ATOM* atom = hash_get(INTERP, (HASH*) SELF->data, key_string);
+        if (atom == NULL) {
+            /* XXX Warning: use of uninitialized value */
+            return undef->vtable->get_string(INTERP, undef);
+        }
+        if (atom->type == enum_key_string) return atom->val.string_val;
+        /* XXX This should convert to a string */
+        if (atom->type == enum_key_pmc) {
+            valpmc = atom->val.pmc_val;
+            return valpmc->vtable->get_string(INTERP, valpmc);
         }
-        if (ix < 0) {
-            ix += SELF->cache.int_val;
-        }
-
-        value = ((PMC**)(SELF->data))[ix];
-        return value->vtable->get_string(INTERP, value);
+        internal_exception(OUT_OF_BOUNDS, "Cannot fetch string out of non-string 
+key!\n");
+        return NULL;
     }
 
     BOOLVAL get_bool () {
@@ -152,130 +162,83 @@
     }
 
     void set_integer (PMC* value) {
-       INTVAL size = value->vtable->get_integer(INTERP,value);
-       key_set_size(INTERP,SELF->data,size);
+        internal_exception(INTERNAL_PANIC, "Cannot set hash to int\n");
     }
 
     void set_integer_native (INTVAL size) {
-       key_set_size(INTERP,SELF->data,size);
+        internal_exception(INTERNAL_PANIC, "Cannot set hash to int\n");
     }
 
     void set_integer_bigint (BIGINT value) {
+        internal_exception(INTERNAL_PANIC, "Cannot set hash to int\n");
     }
 
     void set_integer_same (PMC * value) {
-       INTVAL size = value->cache.int_val;
-       key_set_size(INTERP,SELF->data,size);
+        internal_exception(INTERNAL_PANIC, "Cannot set hash to int\n");
     }
 
     void set_integer_keyed (KEY * key, INTVAL value) {
-        KEY_PAIR* kp;
-        INTVAL ix;
-        PMC* pmc2;
-
-        if (!key) {
-            return;
-        }
-
-        kp = &key->keys[0];
-        ix = kp->cache.int_val;
-
-        if (ix > SELF->cache.int_val) {
-            /* XXX I don't think this will work quite yet */
-            /* key_set_size(kp,ix); */
-            internal_exception(OUT_OF_BOUNDS, "Array element out of bounds!\n");
-        }
-        if (ix < 0) {
-            ix += SELF->cache.int_val;
-        }
-
-        pmc2 = ((PMC**)(SELF->data))[ix];
-        pmc2->vtable->set_integer_native(INTERP, pmc2, value);
+        STRING* key_string = make_hash_key(INTERP, key);
+        KEY_ATOM atom;
+        atom.type = enum_key_int;
+        atom.val.int_val = value;
+        hash_put(INTERP, SELF->data, key_string, &atom);
     }
 
     void set_number (PMC * value) {
-       INTVAL size = (INTVAL)value->cache.num_val;
-       key_set_size(INTERP,SELF->data,size);
+        internal_exception(INTERNAL_PANIC, "Cannot set hash to number\n");
     }
 
     void set_number_native (FLOATVAL size) {
-       key_set_size(INTERP,SELF->data,(INTVAL)size);
+        internal_exception(INTERNAL_PANIC, "Cannot set hash to number\n");
     }
 
     void set_number_bigfloat (BIGFLOAT value) {
+        internal_exception(INTERNAL_PANIC, "Cannot set hash to number\n");
     }
 
     void set_number_same (PMC * value) {
-       INTVAL size = value->cache.int_val;
-       key_set_size(INTERP,SELF->data,size);
+        internal_exception(INTERNAL_PANIC, "Cannot set hash to number\n");
     }
 
     void set_number_keyed (KEY * key, FLOATVAL value) {
-        KEY_PAIR* kp;
-        INTVAL ix;
-        PMC* pmc2;
-
-        if (!key) {
-            return;
-        }
-
-        kp = &key->keys[0];
-        ix = kp->cache.int_val;
-
-        if (ix > SELF->cache.int_val) {
-            /* XXX I don't think this will work quite yet */
-            /* key_set_size(kp,ix); */
-            internal_exception(OUT_OF_BOUNDS, "Array element out of bounds!\n");
-        }
-        if (ix < 0) {
-            ix += SELF->cache.int_val;
-        }
-
-        pmc2 = ((PMC**)(SELF->data))[ix];
-        pmc2->vtable->set_number_native(INTERP, pmc2, value);
+        STRING* key_string = make_hash_key(INTERP, key);
+        KEY_ATOM atom;
+        atom.type = enum_key_num;
+        atom.val.num_val = value;
+        hash_put(INTERP, SELF->data, key_string, &atom);
     }
 
     void set_string (PMC * value) {
+        internal_exception(INTERNAL_PANIC, "Cannot set hash to string\n");
     }
 
     void set_string_native (STRING * value) {
+        internal_exception(INTERNAL_PANIC, "Cannot set hash to string\n");
     }
 
     void set_string_unicode (STRING * value) {
+        internal_exception(INTERNAL_PANIC, "Cannot set hash to string\n");
     }
 
     void set_string_other (STRING * value) {
+        internal_exception(INTERNAL_PANIC, "Cannot set hash to string\n");
     }
 
     void set_string_same (PMC * value) {
+        internal_exception(INTERNAL_PANIC, "Cannot set hash to string\n");
     }
 
     void set_string_keyed (KEY * key, STRING * value) {
-        KEY_PAIR* kp;
-        INTVAL ix;
-        PMC* pmc2;
-
-        if (!key) {
-            return;
-        }
-
-        kp = &key->keys[0];
-        ix = kp->cache.int_val;
-
-        if (ix > SELF->cache.int_val) {
-            /* XXX I don't think this will work quite yet */
-            /* key_set_size(kp,ix); */
-            internal_exception(OUT_OF_BOUNDS, "Array element out of bounds!\n");
-        }
-        if (ix < 0) {
-            ix += SELF->cache.int_val;
-        }
-
-        pmc2 = ((PMC**)(SELF->data))[ix];
-        pmc2->vtable->set_string_native(INTERP, pmc2, value);
+        STRING* key_string = make_hash_key(INTERP, key);
+        KEY_ATOM atom;
+        atom.type = enum_key_string;
+        atom.val.string_val = value;
+        hash_put(INTERP, SELF->data, key_string, &atom);
     }
 
     void set_value (void* value) {
+        internal_exception(INTERNAL_PANIC, "Cannot set hash to value\n");
     }
 
     void add (PMC * value, PMC* dest) {
Index: include/parrot/hash.h
===================================================================
RCS file: include/parrot/hash.h
diff -N include/parrot/hash.h
--- /dev/null   1 Jan 1970 00:00:00 -0000
+++ include/parrot/hash.h       14 May 2002 22:05:30 -0000
@@ -0,0 +1,42 @@
+/* hash.h
+ *  Copyright: (When this is determined...it will go here)
+ *  CVS Info
+ *     $Id: key.h,v 1.10 2002/04/02 20:32:48 sfink Exp $
+ *  Overview:
+ *     Hashtable implementation
+ *  Data Structure and Algorithms:
+ *  History:
+ *  Notes:
+ *  References:
+ */
+
+#if !defined(PARROT_HASH_H_GUARD)
+#define PARROT_HASH_H_GUARD
+
+/* Prototypes */
+
+typedef struct _hashbucket HASHBUCKET;
+
+/* HASH is really a hashtable, but 'hash' is standard perl nomenclature. */
+typedef struct _hash HASH;
+
+HASH * new_hash(Interp *interpreter);
+INTVAL hash_size(Interp *interpreter, HASH *hash);
+void hash_set_size(Interp *interpreter, HASH *hash, UINTVAL size);
+void hash_destroy(Interp *interpreter, HASH* hash);
+KEY_ATOM * hash_get(Interp *interpreter, HASH *hash, STRING* key);
+void hash_put(Interp *interpreter, HASH *hash, STRING* key, KEY_ATOM* value);
+void hash_delete(Interp *interpreter, HASH *hash, STRING* key);
+PMC* mark_hash(Interp* interpreter, HASH* hash, PMC* end_of_used_list);
+
+#endif
+
+/*
+ * Local variables:
+ * c-indentation-style: bsd
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil 
+ * End:
+ *
+ * vim: expandtab shiftwidth=4:
+*/
Index: include/parrot/parrot.h
===================================================================
RCS file: /home/perlcvs/parrot/include/parrot/parrot.h,v
retrieving revision 1.31
diff -a -u -r1.31 parrot.h
--- include/parrot/parrot.h     6 May 2002 02:51:27 -0000       1.31
+++ include/parrot/parrot.h     14 May 2002 22:05:31 -0000
@@ -116,6 +116,7 @@
 #include "parrot/chartype.h"
 #include "parrot/string.h"
 #include "parrot/key.h"
+#include "parrot/hash.h"
 #include "parrot/vtable.h"
 #include "parrot/register.h"
 #include "parrot/regfuncs.h"
Index: t/pmc/perlhash.t
===================================================================
RCS file: /home/perlcvs/parrot/t/pmc/perlhash.t,v
retrieving revision 1.9
diff -a -u -r1.9 perlhash.t
--- t/pmc/perlhash.t    2 Apr 2002 20:32:52 -0000       1.9
+++ t/pmc/perlhash.t    14 May 2002 22:05:34 -0000
@@ -3,7 +3,6 @@
 use Parrot::Test tests => 8;
 use Test::More;
 
-SKIP: { skip("Hashes unimplemented", 8);
 output_is(<<'CODE', <<OUTPUT, "simple set / get");
        new P0, PerlHash
        set S0, "one"
@@ -180,7 +179,5 @@
 ok 2
 ok 3
 OUTPUT
-
-} # SKIP
 
 1;

[netlabs #574] [PATCH] Hashtables

Reply via email to