Re: [Jprogramming] A symetrical hash function that outputs valid J variable names

'Pascal Jasmin' via Programming Wed, 03 Sep 2014 19:29:55 -0700

The difference between using L:1 and S:1 is that with L:1 the tree structure is 
kept.  When used recursively, the tree gets rolled up from the bottom.  If 
there is a relationship with the group of cells at a tree level, it cannot be 
expressed with S: .


in terms of access speed, I do have benchmarks to complete, but

'key'~ should be faster than  values {~ names i. 'key'

Another issue though is heavy inserts (=:) and deletes (erase 'key').

While I call it a hash, its more of a "context dependent collision avoidance 
transformation".  J's underlying symbol table hashing is doing the hashing 
collision management.  If your context can possibly create collisions then you 
probably want to transform the data to eliminate possible collisions.  

An application that might be a fair bit faster than i. is indexing multiple 
keyed data avoiding boxes.

An application that you once showed me is an alternative to J's numbered and 
other locales.  Creating a hierarchy within a locale that is only of concern to 
that locale and can be erased all at once. 

I don't understand what you mean by unification scheme.


----- Original Message -----
From: Raul Miller <[email protected]>
To: Programming forum <[email protected]>
Cc: 
Sent: Wednesday, September 3, 2014 9:07:42 PM
Subject: Re: [Jprogramming] A symetrical hash function that outputs valid J 
variable names

Hmm...

A few comments:

1] I like strbracket:
   '()' strbracket 'abc'
(abc)

2] If by 'tree' you mean J boxed lists of boxed lists to arbitrary
depths, here's a simple flattener:
   <S:0 tree

If the leaf data is homogeneous and makes sense without the boxes you
can ;<S:0 tree

3] When dealing with hashes its usually worthwhile doing some logical
analysis on the collision rates and some benchmarks on the timing.
(The more involved something is, the more likely it is that there's
simpler faster ways of accomplishing something similar.)

4] The classic "APL" mechanism for "lookup" is a pair of lists - names
and values, and an expression which translated into J might look like:
((names i. y) { values) - so that's the benchmark to beat.

5] Do you think you could implement a unification scheme using this
mechanism? (My guess is you'd want something a bit simpler, but I
could be wrong.)

Thanks,

-- 
Raul

On Wed, Sep 3, 2014 at 8:51 PM, 'Pascal Jasmin' via Programming
<[email protected]> wrote:
> strbracket =: {.@:[ , ] , {:@[
> joinB =: (1 : ' ((- # m)&}.@;@(,&m&.>"1))')(@: (] : ;))
> nums2name =:  'NUM' joinB@:cut@:(('.';'F') rplc~ ('_';'n') rplc~ ":)^:(1 4 8 
> 16 64 128  e.~ 3!:0)
> boxsandnums2name =: [: (('B0X_',:'B0oX') strbracket '_' joinB )^:(32  = 3!:0 
> ) L:1 ^:_ ^:(1 <L.) nums2name L:0
> text2name =: ('_' joinB)@:,&boxopen&([: cut (AlphaNum_j_, ' ')&(-.~ -.~ 
> ]))^:(2 = 3!:0)
> buildvar =:  ([: ('_'joinB)@:('_'&cut)@:('_'joinB)  [:  ('_' 
> joinB)@:,&boxopen&boxsandnums2name&.> text2name L:0 )@:boxopen@:(] :;)
>
> joinB is an adverb similar to ;: inv.  ' ' JoinB is identical to ;: inv.  
> Except rather than space, any string can be used to join together boxes, and 
> it can be used dyadically implicitly boxing the arguments.
>
>    'fd' '_'joinB 'hh'
> fd_hh
>
>    '_'joinB <'hh'
> hh
>
>    '_|_'joinB ;: 'fb hh try'
> fb_|_hh_|_try
>
>
> nums2name will take a numeric list (or scalar) and strip out _ and .  It will 
> also insert 'NUM' between numeric list items.
>
>    nums2name each _4.32;i.2 3
> ┌─────┬──────────────────┐
> │n4F32│0NUM1NUM23NUM4NUM5│
> └─────┴──────────────────┘
> the above may seem buggy (23)  but cleaner separation for higher dimensional 
> lists can be obtained by boxing them.
>
>    nums2name each _4.32;<"1 i.2 3
> ┌─────┬─────────┬─────────┐
> │n4F32│0NUM1NUM2│3NUM4NUM5│
> └─────┴─────────┴─────────┘
>
>   nums2name each 'asdf';_4.32; i.2 3
>   NB. leaves non numeric data unchanged
> ┌────┬─────┬──────────────────┐
> │asdf│n4F32│0NUM1NUM23NUM4NUM5│
> └────┴─────┴──────────────────┘
>
>
> boxsandnums2name calls nums2names first such that all data processed is 
> strings and boxes of strings.  There is a neat recursive process I discovered 
> for flattening trees.  It tunnels down to L:1 level, and flattens that level 
> converting the L:1 cell to a string, then recurses "upwards" until there are 
> no more L:2+ cells.
>
>    boxsandnums2name  (5!:2 <'nums2name')
> B0X_B0X_B0X_B0X_B0X_B0X_n3_&_}.B0oX_@_;B0oX_@_B0X_B0X_B0X_,_&_NUMB0oX_&._>B0oX_"_1B0oXB0oX_@:_cutB0oX_@:_B0X_B0X_._FB0oX_B0X_rplc_~B0oX_B0X_B0X___nB0oX_B0X_rplc_~B0oX_":B0oXB0oXB0oX_^:_B0X_1NUM4NUM8NUM16NUM64NUM128_B0X_e._~B0oX_B0X_3_!:_0B0oXB0oXB0oX
>
> text2name strips all punctuation from a string then replaces (possible 
> multiple) spaces with '_'
>
> buildvar makes the hash.  Using all of the previous functions.
>
> for strings, words in a string will hash to same value as a list of boxed 
> words.
>    buildvar 'asd fsdf sdf sd fsd f'
> asd_fsdf_sdf_sd_fsd_f
>    buildvar ;:'asd fsdf sdf sd fsd f'
> asd_fsdf_sdf_sd_fsd_f
>
> same as this dyadic call:
>    'asd' buildvar '   fsdf sdf sd fsd f'
> asd_fsdf_sdf_sd_fsd_f
>
>    'asd' buildvar '   fsdf sdf sd fsd f'; 123 _33.3
> asd_fsdf_sdf_sd_fsd_f_123NUMn33F3
>
> for numbers though, boxed nums hash differently than a list of numbers
>
>    buildvar 1 2.4 _33
> 1NUM2F4NUMn33
>    buildvar ;/ 1 2.4 _33
> 1_2F4_n33
>    buildvar << ;/ 1 2.4 _33
> B0X_B0X_1_2F4_n33B0oXB0oX
>
> You may note that numbers do not produce a legal variable name if the leading 
> character is a number.  The presented routine creates the core hash.  Its 
> expected that a classifying prefix would be appended, and a locale would be 
> attached.
>
> boxed positive integers and their text version hash identically.  As strings 
> though, negatives and decimal points are stripped off.
>
>    buildvar  ;: 'sdf sdf  dsdff 1 2.4 _33'
> sdf_sdf_dsdff_1_24_33
>    buildvar ".^:(0 -.@-: 0&".) each ;: 'sdf sdf  dsdff 1 2.4 _33'
> sdf_sdf_dsdff_1NUM2F4NUMn33
>
>
>
> doesn't work great (too many symbols stripped out) for J code, but it still 
> works:
>   buildvar quote 5!:5 <'nums2name'
> 3NUM1cutF_rplc_n_rplc_1_4_8_16_64_128_e_30
>
>    buildvar  (5!:2 <'nums2name')
> B0X_B0X_B0X_B0X_B0X_n3_B0oX_B0oX_B0X_B0X_B0X_NUMB0oX_B0oX_1B0oXB0oX_cutB0oX_B0X_B0X_FB0oX_B0X_rplc_B0oX_B0X_B0X_nB0oX_B0X_rplc_B0oX_B0oXB0oXB0oX_B0X_1NUM4NUM8NUM16NUM64NUM128_B0X_e_B0oX_B0X_3_0B0oXB0oX
>
> The hash is symetrical for Alpha Num and space.  That is, if no symbol has 
> been stripped off, then the original data structure can be recovered.  It is 
> somewhat human readable.
>
> As to why, it can hash anything, and since the hash value is the basis for a 
> legal J name, anything can be associated with anything.  A prefix can provide 
> context for what you are associating and so what to do with the content, and 
> allows classification such that multiple data (dictionaries) can be held 
> within a single locale.
>
> Compared to a boxed dictionary structure, value retrieval is faster.  By 
> holding a tree structure within prefixes, tree traversal is also faster.  A 
> linked list where the hash of the data gives the location of the next node is 
> an application.  If the data is reversible (boxedalphanumspace) then you also 
> get double linked list.  Because of infinite boxing, all data can be 
> transformed into alphanumspace.
>
> Compared to symbols, the hash is permanent, repeatable, serializable, and 
> shareable among machines.
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm



----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Re: [Jprogramming] A symetrical hash function that outputs valid J variable names

Reply via email to