Cool.
if. contains y do. 0 return. end.
I think the more usual hash table interface, when asked to insert a key that
matches an existing one, replaces the existing value with the new one.
horners =. (2 ^ 32x) | [: (+37x&*)/ 0 |.@, 3 u: ]
- Reducing this mod 2^32 at end seems a bit pointless, as you are about to
reduce it mod the table size anyway. Evaluating the intermediate operations
mod 2^32, to avoid blowup in the intermediate terms, seems more useful. The
way to do this now is to use the recently-added conjunction m.
(https://code.jsoftware.com/wiki/Vocabulary/mdot); + m. (2x^32) does an
addition mod 2^32, and ditto * .
- Adding a zero at the beginning and then reversing seems a bit odd--why not
just add the zero to the end and not reverse? The result will be different,
but if the hash is good, it should not change the quality. (I guess the only
reason for tacking on the zero is to handle zero-length strings? It's
annoying that this needs extra effort, and I've specced, but not implemented,
a solution. One alternative is to use the recently-added fold family of the
conjunctions--https://code.jsoftware.com/wiki/Vocabulary/fcap)
Following notes are for recreational purposes only:
- I think a more usual approach nowadays to hashing large strings is to apply
a mac to some fixed-size blocks (a straightforward--though probably
suboptimal--implementation is (-blocksize) mac\ string) and then a more
expensive/expressive polynomial or similar for combining the blocks--this
should be faster and better quality than djb, though probably more effort.
- You can use _3&ic--after padding to a multiple of 8 bytes--to get the
characters packed more densely--eight per element, rather than one. If using
a better hash (i.e., something other than the likes of djb or fna-1a), that
will let you get through it quicker.
NB. maybe there's a better int hash?
Perhaps an appelby-style mixer--e.g.
http://zimbry.blogspot.com/2011/09/better-bit-mixing-improving-on.html
It's a good idea to apply such a mixer to the result of the string hash, too.
ts&|`horners@.('literal' -: datatype) y
index =: {{ ts | hash y }}
When y is an integer, you reduce it mod x in hash, and then immediately again
in index.
clear =: {{ 1 [ table =: a: #~ ts }}
I think this is prone to space leaks. Shouldn't it reset the table size to
zero? You won't be able to reuse the storage from the table anyway. (Well,
you could, but you don't.) Alternately, you could save the size passed in
when the table was created and use that instead. But empirically, it seems
like a bad idea to let users control such aspects of the hash table--as the
designer of google's hash table says: https://youtu.be/ncHmEUmJZf4?t=3075.
cs =: 0 NB. current size
I think this is usually referred to as 'occupancy' (as in 'how many occupants
does the table have?').
1 [ insert every items
It's common to return EMPTY (which is the same as i.0 0) from a verb which is
called only for effect (rather than for its result). As a matrix with zero
rows, it takes up zero lines of precious screenspace at the repl--a pun, but a
cute one.
On Tue, 18 Jul 2023, Luke D wrote:
I have written an implementation for a separate chaining hash table that
handles integer and string keys, which I have posted as a gist
https://gist.github.com/4hg/f412476edba86599f1b0e6dd2ea8b12d, and I am
moderately pleased with the outcome. It was my first exploration with the
topics found in chapter 25 of Learning J.
If anyone cares to comment, I am eager to hear any feedback regarding
convention or design. Feel free to tear it up, if necessary, however, I
still have some testing to do, so apologies if I am showcasing a limp horse.
Best Regards,
Luke De La Cruz
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm