RE: [Jprogramming] Performance of case-insensitive lookup

R.E. Boss Fri, 12 Oct 2007 10:42:42 -0700

I was wrong, it should have been
        (x i. y) -: ((i.~.)x){~(~.x) i. y

It still is a lot more efficient if [EMAIL PROTECTED] is much smaller than # .



R.E. Boss



> -----Oorspronkelijk bericht-----
> Van: [EMAIL PROTECTED] [mailto:programming-
> [EMAIL PROTECTED] Namens Roger Hui
> Verzonden: vrijdag 12 oktober 2007 17:13
> Aan: Programming forum
> Onderwerp: Re: [Jprogramming] Performance of case-insensitive lookup
> 
> > But then, in those cases, you should use the nub anyhow since
> >     (x i. y) -: (~.x) i. y
> 
> That is not true in general:
> 
>    x=: 20 [EMAIL PROTECTED] 10
>    y=: 5 [EMAIL PROTECTED] 10
>    x i. y
> 20 1 9 6 4
>    (~.x) i. y
> 9 1 6 5 4
> 
> The "those cases" for which the statement is true, must be
> that  x-:~.x   or that y has 0 items or some more complicated
> condition on x and y (all items of y occur in a prefix of x
> that has all unique items).
> 
> 
> 
> ----- Original Message -----
> From: "R.E. Boss" <[EMAIL PROTECTED]>
> Date: Friday, October 12, 2007 3:04
> Subject: RE: [Jprogramming] Performance of case-insensitive lookup
> To: 'Programming forum' <[email protected]>
> 
> > The adverb I use for these cases, where [EMAIL PROTECTED] is much less than 
> > #
> > , is
> >
> >     fst=:1 : '](i.!.0~ { u @:]) ~.'
> >
> >    5 ts'tolower &.> ppl'
> > 0.018118821 72128
> >    5 ts'tolower &.>fst ppl'
> > 0.00065326459 19328
> >
> >    (tolower &.>fst -: tolower &.>) ppl
> > 1
> >
> > It is about 28 times as fast and almost 4 times as lean, which
> > gives a
> > relative performance of more than 100. which is still inferior
> > to yours.
> >
> > But then, in those cases, you should use the nub anyhow since
> >
> >     (x i. y) -: (~.x) i. y
> >
> >    ts'ppl i.&:(tolower &.>) p'
> > 0.017758768 73856
> >    ts'(~.ppl) i.&:(tolower &.>) p'
> > 0.00043554953 9856
> >
> > Which gives a relative performance improvement of a factor 300
> >
> >
> > R.E. Boss
> >
> >
> >
> >
> > > -----Oorspronkelijk bericht-----
> > > Van: [EMAIL PROTECTED] [mailto:programming-
> > > [EMAIL PROTECTED] Namens Sherlock, Ric
> > > Verzonden: vrijdag 12 oktober 2007 11:20
> > > Aan: Programming forum
> > > Onderwerp: [Jprogramming] Performance of case-insensitive lookup
> > >
> > > I was doing a case-insensitive lookup of firstname and
> > lastname in a
> > > 2-column boxed table.
> > >    fnames=: <;._1 ' John Dakota Wilson Diana
> > Joan Roberto John John'
> > >    lnames=: <;._1 ' Smith Jones Chan Wilson
> > Saxon Angelo Smith Wilson'
> > >    ]ppl=:500 $ fnames,.lnames
> > > +-------+------+
> > > |John   |Smith |
> > > +-------+------+
> > > |Dakota |Jones |
> > > +-------+------+
> > > |Wilson |Chan  |
> > > +-------+------+
> > > |Diana  |Wilson|
> > > +-------+------+
> > > |Joan   |Saxon |
> > > +-------+------+
> > > |Roberto|Angelo|
> > > +-------+------+
> > > |John   |Smith |
> > > +-------+------+
> > > |John   |Wilson|
> > > ...
> > >
> > >    p=: 'Joan';'Saxon'
> > >    p2=:'JOAN';'saxon'
> > >    ppl i. p
> > > 4
> > >   (tolower each ppl) i. tolower each p2
> > > 4
> > >
> > > However performance wasn't great, which I tracked it down to
> > having to
> > > run the verb tolower so many times. Below I've documented a
> > solution to
> > > this performance problem using inverted tables, but would be
> > interested> in other possible ways of bypassing the performance
> > hit caused by making
> > > the lookup case-insensitive.
> > >
> > > A solution using inverted tables.
> > > (Load collected definitions from
> > > http://www.jsoftware.com/jwiki/Essays/Inverted_Table )
> > >
> > >    mfv=: ,:^:(#&$ = 1:) NB. Create 1 row matrix
> > from vector
> > >    pplinv=: ifa ppl
> > >    pinv =:  ifa mfv p
> > >    p2inv=:  ifa mfv p2
> > >    pplinv tindexof pinv
> > > length error
> > >
> > > The problem is that converting ppl to an inverted table
> > extends each
> > > name to the length of the longest name. For pinv to match, its names
> > > also need to be extended to that same width.
> > > How can this best be done?
> > >
> > > My solutions as follows:
> > >    textend=: {:@$&.>@[ {."1&.> ]
> > >    pplinv textend pinv
> > > +-------+------+
> > > |Joan   |Saxon |
> > > +-------+------+
> > >
> > >    pplinv tindexof pplinv textend pinv
> > > 4
> > >
> > > Or more directly:
> > >
> > >    tindexof1=: [ tindexof {:@$&.>@[ {."1&.> ]
> > >    pplinv tindexof1 pinv
> > > 4
> > >    (tolower each pplinv) tindexof1 tolower each p2inv
> > > 4
> > >
> > >    ts=: 6!:2 , 7!:[EMAIL PROTECTED]
> > >    ts '(tolower each ppl) i. tolower each p2'
> > > 0.0206076470613 147456
> > >    ts '(tolower each ppl2inv) tindexof1 tolower
> > each p2inv'
> > > 0.000427987355935 48000
> > >
> > > About 48 times faster and 3 time leaner using inverted tables.
> > >
> > > Even for a single lookup the overhead of converting to
> > inverted tables
> > > is worthwhile:
> > >    ts '(tolower each ifa ppl) tindexof1 tolower
> > each ifa mfv p2'
> > > 0.000516546097339 57216
> > >
> > > About 40 times faster and 2.5 times leaner.
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

RE: [Jprogramming] Performance of case-insensitive lookup

Reply via email to