There is no point to hashing or small-range or ... if you only have one
item in the right argument. Benchmarks in the paper (as stated) were done
on arguments with 1e6 items.
I have the following model of hashing in Dyalog APL. Its translation into
J (including dealing with APL chars which I am told is NO LONGER A PROBLEM)
is left as an exercise for the reader. :-). Another (easy) exercise is to
find x and y for which the verbose model is faster than the one-liner.
z←x xiy y;⎕io;h;hf;i;j;m;n;q
⍝ model of x⍳y using hashing; written to be easily translated into C
⎕io←0
hf←{123457×⍵} ⍝ hash function
n←≢y
m←≢x
q←2*⌈2⍟m ⍝ size of hash table
h←q⍴m ⍝ hash table; m means "free"
z←n⍴m ⍝ initialize to "not found"
:For i :In ⍳m ⍝ index for each x
j←q|hf x[i] ⍝ index into hash table
:While m>h[j] ⋄ :AndIf x[h[j]]≠x[i] ⍝ i.e. stop on finding m or an
equal entry
j←q|1+j ⍝ the next hash table entry
:End
:If m=h[j] ⋄ h[j]←i ⋄ :End ⍝ new hash entry
:End
:For i :In ⍳n ⍝ index for each y
j←q|hf y[i] ⍝ where to start looking in hash
table
:While m>h[j] ⋄ :AndIf x[h[j]]≠y[i] ⍝ i.e. stop on finding m or an
equal entry
j←q|1+j ⍝ the next hash table entry
:End
z[i]←h[j] ⍝ here, either m=h[j] or
x[h[j]]=y[i]
:End
On Thu, Jul 31, 2014 at 6:41 AM, Joe Bogner <[email protected]> wrote:
> I enjoyed your paper and particularly enjoyed playing with the native
> J implementation:
>
> xiy =: 13 : '+/*./\x~:/y'
>
> (i. 1e6) xiy 1e5
> 100000
>
> I was surprised that the tacit J version performed reasonably close to
> the special code C version:
>
> big=: 1e6
>
> 100 timespacex 'big xiy 1e5'
> 1.01128e_5 2816
>
> 100 timespacex 'big i. 1e5'
> 1.23157e_6 1664
>
> The timing starts to diverge on significantly boxed arrays it seems at
> first glance:
>
> big=: (1e6 # <'a')
>
> 100 timespacex 'big i. <''a'''
> 3.50044e_6 1920
> 100 timespacex 'big xiy <''a'''
> 0.054772 2.09933e6
>
> That seems to be hitting an optimization where it stops on first
> find. Compare to:
>
> 100 timespacex 'big i. <''z'''
> 0.0486771 1920
>
> And it runs similarly to the native J version, which was 0.054772
>
> At first I was also puzzled by why / was required.
>
> xiy1 =: 13 : '+/*./\x~:y'
> (i. 1e5) (xiy -: xiy1) 10
> 1
>
> Then I realized this:
>
> (i. 1e5) xiy1 (10,2)
> |length error: xiy1
> | (i.100000) xiy1(10,2)
>
> However:
> (i. 1e5) xiy (10,2)
> 10 2
>
> It was neat to tinker with xiy to better understand how it works.
>
> I now need to spend some time better understanding the hashing. I
> understand at a surface level yet want to play with the examples too.
> I may try to create a J implementation of the algorithm using amend
> and loops. I realize it will be slow, but it will be easier to play
> with than C. If someone else wants to do it and share I'd be glad to
> use that instead. Otherwise, if I get around to it I will post it.
>
>
>
>
>
>
> On Wed, Jul 30, 2014 at 2:44 AM, Roger Hui <[email protected]>
> wrote:
> > My J Conference 2014 presentation can be found at:
> >
> > Slides only: http://www.jsoftware.com/papers/indexof/
> > Slides and script:
> http://www.jsoftware.com/papers/indexof/indexofscript.htm
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
>
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm