Wow, I appreciate your thorough examination.

Nearly all of the shortcomings came from me following the simple interface
and behavior laid out in chapter 5 of Mark Weiss' "Data Structures and
Algorithm Analysis in C++" (I wanted to ensure that I had something working
as intended), such as using horners to begin with and reducing mod 2^32
(the book uses unsigned ints, though you are right about moving the mod
inside the reduction) and reversing the input string to mimic a for each
loop.

> I guess the only reason for tacking on the zero is to handle zero-length
strings?
Frankly, I hadn't even considered empty strings, but rather, I was using 0
as a logical start value.

> One alternative is to use the recently-added fold family of the
conjunctions
I have actually never used these before, but it appears that the equivalent
of the C++ code, found in Figure 5.4 of the aforementioned book and below,
would be (0 [F..((2 ^ 32x)|[+37x*]) 3 u: ]), which is much less
intimidating than I had originally thought and handles empty strings as you
mentioned.

unsigned int horners(const std::string& key, int tableSize) {
    unsigned int hashValue = 0;
    for (char c : key) hashValue = 37 * hashValue + c;
    return hashValue % s;
}

> clear =: {{ 1 [ table =: a: #~ ts }
I should definitely set occupancy to 0 here regardless, but I am not sure
about setting the table size to 0 (reusing the initial table size to
possibly free up memory seems better). The goal of this method is to
simply empty all of the items inserted into the table, and my current,
naive understanding is that you would change how many buckets you have
when it's time to rehash, which is the situation Matt is describing in his
talk with the set max load factor method. Could you elaborate on this?

Best Regards,
Luke De La Cruz


On Tue, Jul 18, 2023 at 5:13 AM Elijah Stone <elro...@elronnd.net> wrote:

> Cool.
>
> > if. contains y do. 0 return. end.
>
> I think the more usual hash table interface, when asked to insert a key
> that
> matches an existing one, replaces the existing value with the new one.
>
> > horners =. (2 ^ 32x) | [: (+37x&*)/ 0 |.@, 3 u: ]
>
> - Reducing this mod 2^32 at end seems a bit pointless, as you are about to
> reduce it mod the table size anyway.  Evaluating the intermediate
> operations
> mod 2^32, to avoid blowup in the intermediate terms, seems more useful.
> The
> way to do this now is to use the recently-added conjunction m.
> (https://code.jsoftware.com/wiki/Vocabulary/mdot); + m. (2x^32) does an
> addition mod 2^32, and ditto * .
>
> - Adding a zero at the beginning and then reversing seems a bit odd--why
> not
> just add the zero to the end and not reverse?  The result will be
> different,
> but if the hash is good, it should not change the quality.  (I guess the
> only
> reason for tacking on the zero is to handle zero-length strings?  It's
> annoying that this needs extra effort, and I've specced, but not
> implemented,
> a solution.  One alternative is to use the recently-added fold family of
> the
> conjunctions--https://code.jsoftware.com/wiki/Vocabulary/fcap)
>
> Following notes are for recreational purposes only:
>
> - I think a more usual approach nowadays to hashing large strings is to
> apply
> a mac to some fixed-size blocks (a straightforward--though probably
> suboptimal--implementation is (-blocksize) mac\ string) and then a more
> expensive/expressive polynomial or similar for combining the blocks--this
> should be faster and better quality than djb, though probably more effort.
>
> - You can use _3&ic--after padding to a multiple of 8 bytes--to get the
> characters packed more densely--eight per element, rather than one.  If
> using
> a better hash (i.e., something other than the likes of djb or fna-1a),
> that
> will let you get through it quicker.
>
> > NB. maybe there's a better int hash?
>
> Perhaps an appelby-style mixer--e.g.
> http://zimbry.blogspot.com/2011/09/better-bit-mixing-improving-on.html
>
> It's a good idea to apply such a mixer to the result of the string hash,
> too.
>
> > ts&|`horners@.('literal' -: datatype) y
> > index =: {{ ts | hash y }}
>
> When y is an integer, you reduce it mod x in hash, and then immediately
> again
> in index.
>
> > clear =: {{ 1 [ table =: a: #~ ts }}
>
> I think this is prone to space leaks.  Shouldn't it reset the table size
> to
> zero?  You won't be able to reuse the storage from the table anyway.
> (Well,
> you could, but you don't.)  Alternately, you could save the size passed in
> when the table was created and use that instead.  But empirically, it
> seems
> like a bad idea to let users control such aspects of the hash table--as
> the
> designer of google's hash table says: https://youtu.be/ncHmEUmJZf4?t=3075.
>
> > cs =: 0 NB. current size
>
> I think this is usually referred to as 'occupancy' (as in 'how many
> occupants
> does the table have?').
>
> > 1 [ insert every items
>
> It's common to return EMPTY (which is the same as i.0 0) from a verb which
> is
> called only for effect (rather than for its result).  As a matrix with
> zero
> rows, it takes up zero lines of precious screenspace at the repl--a pun,
> but a
> cute one.
>
> On Tue, 18 Jul 2023, Luke D wrote:
>
> > I have written an implementation for a separate chaining hash table that
> > handles integer and string keys, which I have posted as a gist
> > https://gist.github.com/4hg/f412476edba86599f1b0e6dd2ea8b12d, and I am
> > moderately pleased with the outcome. It was my first exploration with the
> > topics found in chapter 25 of Learning J.
> >
> > If anyone cares to comment, I am eager to hear any feedback regarding
> > convention or design. Feel free to tear it up, if necessary, however, I
> > still have some testing to do, so apologies if I am showcasing a limp
> horse.
> >
> > Best Regards,
> > Luke De La Cruz
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
>
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Reply via email to