Cool.

if. contains y do. 0 return. end.

I think the more usual hash table interface, when asked to insert a key that matches an existing one, replaces the existing value with the new one.

horners =. (2 ^ 32x) | [: (+37x&*)/ 0 |.@, 3 u: ]

- Reducing this mod 2^32 at end seems a bit pointless, as you are about to reduce it mod the table size anyway. Evaluating the intermediate operations mod 2^32, to avoid blowup in the intermediate terms, seems more useful. The way to do this now is to use the recently-added conjunction m. (https://code.jsoftware.com/wiki/Vocabulary/mdot); + m. (2x^32) does an addition mod 2^32, and ditto * .

- Adding a zero at the beginning and then reversing seems a bit odd--why not just add the zero to the end and not reverse? The result will be different, but if the hash is good, it should not change the quality. (I guess the only reason for tacking on the zero is to handle zero-length strings? It's annoying that this needs extra effort, and I've specced, but not implemented, a solution. One alternative is to use the recently-added fold family of the conjunctions--https://code.jsoftware.com/wiki/Vocabulary/fcap)

Following notes are for recreational purposes only:

- I think a more usual approach nowadays to hashing large strings is to apply a mac to some fixed-size blocks (a straightforward--though probably suboptimal--implementation is (-blocksize) mac\ string) and then a more expensive/expressive polynomial or similar for combining the blocks--this should be faster and better quality than djb, though probably more effort.

- You can use _3&ic--after padding to a multiple of 8 bytes--to get the characters packed more densely--eight per element, rather than one. If using a better hash (i.e., something other than the likes of djb or fna-1a), that will let you get through it quicker.

NB. maybe there's a better int hash?

Perhaps an appelby-style mixer--e.g. http://zimbry.blogspot.com/2011/09/better-bit-mixing-improving-on.html

It's a good idea to apply such a mixer to the result of the string hash, too.

ts&|`horners@.('literal' -: datatype) y
index =: {{ ts | hash y }}

When y is an integer, you reduce it mod x in hash, and then immediately again in index.

clear =: {{ 1 [ table =: a: #~ ts }}

I think this is prone to space leaks. Shouldn't it reset the table size to zero? You won't be able to reuse the storage from the table anyway. (Well, you could, but you don't.) Alternately, you could save the size passed in when the table was created and use that instead. But empirically, it seems like a bad idea to let users control such aspects of the hash table--as the designer of google's hash table says: https://youtu.be/ncHmEUmJZf4?t=3075.

cs =: 0 NB. current size

I think this is usually referred to as 'occupancy' (as in 'how many occupants does the table have?').

1 [ insert every items

It's common to return EMPTY (which is the same as i.0 0) from a verb which is called only for effect (rather than for its result). As a matrix with zero rows, it takes up zero lines of precious screenspace at the repl--a pun, but a cute one.

On Tue, 18 Jul 2023, Luke D wrote:

I have written an implementation for a separate chaining hash table that
handles integer and string keys, which I have posted as a gist
https://gist.github.com/4hg/f412476edba86599f1b0e6dd2ea8b12d, and I am
moderately pleased with the outcome. It was my first exploration with the
topics found in chapter 25 of Learning J.

If anyone cares to comment, I am eager to hear any feedback regarding
convention or design. Feel free to tear it up, if necessary, however, I
still have some testing to do, so apologies if I am showcasing a limp horse.

Best Regards,
Luke De La Cruz
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Reply via email to