RE: [Jprogramming] Performance of case-insensitive lookup

R.E. Boss Fri, 12 Oct 2007 03:12:45 -0700

The adverb I use for these cases, where [EMAIL PROTECTED] is much less than # , 
is


        fst=:1 : '](i.!.0~ { u @:]) ~.'

   5 ts'tolower &.> ppl'
0.018118821 72128
   5 ts'tolower &.>fst ppl'
0.00065326459 19328

   (tolower &.>fst -: tolower &.>) ppl
1

It is about 28 times as fast and almost 4 times as lean, which gives a
relative performance of more than 100. which is still inferior to yours.

But then, in those cases, you should use the nub anyhow since 

        (x i. y) -: (~.x) i. y

   ts'ppl i.&:(tolower &.>) p'
0.017758768 73856
   ts'(~.ppl) i.&:(tolower &.>) p'
0.00043554953 9856

Which gives a relative performance improvement of a factor 300
   

R.E. Boss

 


> -----Oorspronkelijk bericht-----
> Van: [EMAIL PROTECTED] [mailto:programming-
> [EMAIL PROTECTED] Namens Sherlock, Ric
> Verzonden: vrijdag 12 oktober 2007 11:20
> Aan: Programming forum
> Onderwerp: [Jprogramming] Performance of case-insensitive lookup
> 
> I was doing a case-insensitive lookup of firstname and lastname in a
> 2-column boxed table.
>    fnames=: <;._1 ' John Dakota Wilson Diana Joan Roberto John John'
>    lnames=: <;._1 ' Smith Jones Chan Wilson Saxon Angelo Smith Wilson'
>    ]ppl=:500 $ fnames,.lnames
> +-------+------+
> |John   |Smith |
> +-------+------+
> |Dakota |Jones |
> +-------+------+
> |Wilson |Chan  |
> +-------+------+
> |Diana  |Wilson|
> +-------+------+
> |Joan   |Saxon |
> +-------+------+
> |Roberto|Angelo|
> +-------+------+
> |John   |Smith |
> +-------+------+
> |John   |Wilson|
> ...
> 
>    p=: 'Joan';'Saxon'
>    p2=:'JOAN';'saxon'
>    ppl i. p
> 4
>   (tolower each ppl) i. tolower each p2
> 4
> 
> However performance wasn't great, which I tracked it down to having to
> run the verb tolower so many times. Below I've documented a solution to
> this performance problem using inverted tables, but would be interested
> in other possible ways of bypassing the performance hit caused by making
> the lookup case-insensitive.
> 
> A solution using inverted tables.
> (Load collected definitions from
> http://www.jsoftware.com/jwiki/Essays/Inverted_Table )
> 
>    mfv=: ,:^:(#&$ = 1:) NB. Create 1 row matrix from vector
>    pplinv=: ifa ppl
>    pinv =:  ifa mfv p
>    p2inv=:  ifa mfv p2
>    pplinv tindexof pinv
> length error
> 
> The problem is that converting ppl to an inverted table extends each
> name to the length of the longest name. For pinv to match, its names
> also need to be extended to that same width.
> How can this best be done?
> 
> My solutions as follows:
>    textend=: {:@$&.>@[ {."1&.> ]
>    pplinv textend pinv
> +-------+------+
> |Joan   |Saxon |
> +-------+------+
> 
>    pplinv tindexof pplinv textend pinv
> 4
> 
> Or more directly:
> 
>    tindexof1=: [ tindexof {:@$&.>@[ {."1&.> ]
>    pplinv tindexof1 pinv
> 4
>    (tolower each pplinv) tindexof1 tolower each p2inv
> 4
> 
>    ts=: 6!:2 , 7!:[EMAIL PROTECTED]
>    ts '(tolower each ppl) i. tolower each p2'
> 0.0206076470613 147456
>    ts '(tolower each ppl2inv) tindexof1 tolower each p2inv'
> 0.000427987355935 48000
> 
> About 48 times faster and 3 time leaner using inverted tables.
> 
> Even for a single lookup the overhead of converting to inverted tables
> is worthwhile:
>    ts '(tolower each ifa ppl) tindexof1 tolower each ifa mfv p2'
> 0.000516546097339 57216
> 
> About 40 times faster and 2.5 times leaner.
> 
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

RE: [Jprogramming] Performance of case-insensitive lookup

Reply via email to