I was doing a case-insensitive lookup of firstname and lastname in a
2-column boxed table.
   fnames=: <;._1 ' John Dakota Wilson Diana Joan Roberto John John'
   lnames=: <;._1 ' Smith Jones Chan Wilson Saxon Angelo Smith Wilson'
   ]ppl=:500 $ fnames,.lnames
+-------+------+
|John   |Smith |
+-------+------+
|Dakota |Jones |
+-------+------+
|Wilson |Chan  |
+-------+------+
|Diana  |Wilson|
+-------+------+
|Joan   |Saxon |
+-------+------+
|Roberto|Angelo|
+-------+------+
|John   |Smith |
+-------+------+
|John   |Wilson|
...

   p=: 'Joan';'Saxon'
   p2=:'JOAN';'saxon'
   ppl i. p
4
  (tolower each ppl) i. tolower each p2
4

However performance wasn't great, which I tracked it down to having to
run the verb tolower so many times. Below I've documented a solution to
this performance problem using inverted tables, but would be interested
in other possible ways of bypassing the performance hit caused by making
the lookup case-insensitive.

A solution using inverted tables.
(Load collected definitions from
http://www.jsoftware.com/jwiki/Essays/Inverted_Table )

   mfv=: ,:^:(#&$ = 1:) NB. Create 1 row matrix from vector
   pplinv=: ifa ppl
   pinv =:  ifa mfv p
   p2inv=:  ifa mfv p2
   pplinv tindexof pinv
length error

The problem is that converting ppl to an inverted table extends each
name to the length of the longest name. For pinv to match, its names
also need to be extended to that same width.
How can this best be done?

My solutions as follows:
   textend=: {:@$&.>@[ {."1&.> ]
   pplinv textend pinv
+-------+------+
|Joan   |Saxon |
+-------+------+

   pplinv tindexof pplinv textend pinv
4

Or more directly:

   tindexof1=: [ tindexof {:@$&.>@[ {."1&.> ]
   pplinv tindexof1 pinv
4
   (tolower each pplinv) tindexof1 tolower each p2inv
4

   ts=: 6!:2 , 7!:[EMAIL PROTECTED]
   ts '(tolower each ppl) i. tolower each p2'
0.0206076470613 147456
   ts '(tolower each ppl2inv) tindexof1 tolower each p2inv'
0.000427987355935 48000

About 48 times faster and 3 time leaner using inverted tables. 

Even for a single lookup the overhead of converting to inverted tables
is worthwhile:
   ts '(tolower each ifa ppl) tindexof1 tolower each ifa mfv p2'
0.000516546097339 57216

About 40 times faster and 2.5 times leaner.

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Reply via email to