Consider this example:
table=:<;._2;._2]0 :0
First Name,Last Name,Sum,
Adam,Wallace,19,
Travis,Smith,10,
Donald,Barnell,8,
Gary,Wallace,27,
James,Smith,10,
Sam,Johnson,10,
Travis,Neal,11,
Adam,Campbell,11,
Walter,Abbott,13,
)
Using boxed strings works great for relatively small sets of data. But when
things get big, their overhead starts to hurt too much. (Big means: so much
data that you'll probably not be able to fit it all in memory at the same
time, so you need to plan on relatively frequent delays while reading from
disk.)
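One rough way to see that overhead is the space foreign 7!:5, which reports the bytes used by named nouns. The sketch below builds the same 100000 short fields both ways. The names boxed, seg and sizes are only for illustration, and the exact byte counts depend on the J version and platform, so none are quoted here.

```j
NB. Same data, two representations (names are illustrative only).
boxed=: <@":"0 i. 100000        NB. 100000 boxed strings: '0' '1' ... '99999'
seg=: ; LF&,each boxed           NB. one segmented string, LF before each field
NB. Bytes used by each noun; expect the boxed list to be noticeably larger.
sizes=: 7!:5 ;:'boxed seg'
```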
One alternative to boxed strings is segmented strings. A segmented string
is the kind of argument you would pass to <;._1: basically just a string in
which every field is preceded by the same delimiter character. You can work
with these strings directly and achieve results similar to what you would
get with boxed arrays.
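For instance, with LF as the delimiter, <;._1 turns a segmented string into a boxed list, and putting LF in front of each box and razing turns it back. The names s, b and t are only for illustration.

```j
NB. A segmented string: every field is preceded by the delimiter (LF here).
s=: LF,'Adam',LF,'Travis',LF,'Donald'
NB. <;._1 treats the first character as the delimiter and boxes each field without it.
b=: <;._1 s                 NB. 'Adam';'Travis';'Donald'
NB. Other direction: prefix each field with LF, then raze.
t=: ; LF&,each b            NB. same as s
```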
Segmented strings are a bit clumsier than boxed arrays: you lose a lot of
the integrity checks, so if you mess up you probably will not see an error.
So it's probably a good idea to model your code using boxed arrays on a
small set of data and then convert to the segmented representation once
you're happy with how things work (and once you see a time cost that makes
it worth reworking your code).
Also, to avoid having to use f;._2 (or whatever) every time, it's good to
do an initial pass over the data to extract its structure.
Here's an example:
FirstName=:;LF&,each }.0{"1 table
LastName=:;LF&,each }.1{"1 table
Sum=:;LF&,each }.2{"1 table
ssdir=: [:(}:,:2-~/\])I.@(= {.),#
FirstNameDir=: ssdir FirstName
LastNameDir=: ssdir LastName
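To see what ssdir records, here it is applied to a short three-field string. Row 0 holds the offset of each field's leading delimiter and row 1 holds each field's length, counting that delimiter. Fetching a single field is then just index arithmetic with the (+ i.) idiom; the names s, d and f1 are illustrative.

```j
ssdir=: [:(}:,:2-~/\])I.@(= {.),#
s=: LF,'Adam',LF,'Travis',LF,'Donald'
d=: ssdir s
NB. d is the 2x3 table
NB.   0 5 12    <- offset of each field's LF
NB.   5 7  7    <- length of each field, counting its LF
NB. Field 1: offset + i. length gives its indices; }. drops the LF.
f1=: }. s {~ (+ i.)/ 1 {"1 d     NB. 'Travis'
```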
Actually, Sum is numeric, so let's just use a numeric representation for
that column:
Sum=: _&".@> }.2{"1 table
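The left argument of dyadic ". is the value substituted for any field that does not parse as a number, so with _&". a malformed cell shows up as _ (infinity) instead of stopping the conversion. A quick check on made-up input (the name v is illustrative):

```j
NB. 'none' is not a number, so it becomes _ (the left argument of ".).
v=: _&".@> '19';'10';'none'      NB. 19 10 _
```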
Which rows have a last name of Smith?
<:({.LastNameDir) I. I.'Smith' E. LastName
1 4
Actually, this assumes that Smith is not part of some larger name. We can
include the delimiter in the search if we are concerned about that. For even
more protection we could append a trailing delimiter to our segmented string
and then search for (in this case) LF,'Smith',LF.
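Here is a small made-up column where the difference matters: 'Smithers' contains 'Smith', so the plain search reports an extra row. Searching for LF,'Smith',LF in the string with a trailing LF appended finds only whole-field matches. Each such hit starts exactly on a field's leading LF, so a plain i. lookup in the directory gives the row. The names dir, naive and exact are illustrative.

```j
ssdir=: [:(}:,:2-~/\])I.@(= {.),#
NB. Made-up column in which one name has 'Smith' as a prefix.
LastName=: ; LF&,each 'Smith';'Smithers';'Jones';'Smith'
dir=: {. ssdir LastName                               NB. 0 6 15 21
NB. Plain search: also hits the 'Smith' inside 'Smithers'.
naive=: <: dir I. I. 'Smith' E. LastName               NB. 0 1 3
NB. Whole-field search: every hit lands on an offset listed in dir.
exact=: dir i. I. (LF,'Smith',LF) E. LastName,LF       NB. 0 3
```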
Anyway, let's extract the corresponding sums and first names:
1 4{Sum
10 10
FirstName{~;<@(+ i.)/"1|:1 4 {"1 FirstNameDir
Travis
James
Note that that last expression is a bit complicated. It's not so bad,
though, if what you are extracting is a small part of the total. And, in
that case, using a list of indices to express a boolean result seems like a
good thing: you wind up working with set operations (intersection and
union) rather than logical operations (and, or), and with set difference
instead of logical not (dyadic -. instead of monadic -.).
intersect=: [ -. -.
union=: ~.@,
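As a sketch of how these combine, here is a two-condition query against the sample table (first name Travis and last name Smith), with each condition answered as a list of row indices. Only intersect and union come from the definitions above; the whole-field search follows the trailing-delimiter idea, and the names FirstStarts, LastStarts, travis, smith, both, either, others and bothSum are illustrative.

```j
ssdir=: [:(}:,:2-~/\])I.@(= {.),#
intersect=: [ -. -.
union=: ~.@,
NB. The sample table's columns, built directly as segmented strings.
FirstName=: ; LF&,each 'Adam';'Travis';'Donald';'Gary';'James';'Sam';'Travis';'Adam';'Walter'
LastName=: ; LF&,each 'Wallace';'Smith';'Barnell';'Wallace';'Smith';'Johnson';'Neal';'Campbell';'Abbott'
Sum=: 19 10 8 27 10 10 11 11 13
NB. Offset of each row's delimiter in each column.
FirstStarts=: {. ssdir FirstName
LastStarts=: {. ssdir LastName
NB. Whole-field matches (trailing LF appended) land exactly on a row offset.
travis=: FirstStarts i. I. (LF,'Travis',LF) E. FirstName,LF   NB. 1 6
smith=: LastStarts i. I. (LF,'Smith',LF) E. LastName,LF       NB. 1 4
both=: travis intersect smith      NB. "and": 1
either=: travis union smith        NB. "or": 1 6 4
others=: (i. # Sum) -. smith       NB. "not": 0 2 3 5 6 7 8
bothSum=: both { Sum               NB. 10
```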
(It looks like I might be using this kind of thing really soon, so I
thought I'd lay down my thoughts here and invite comment.)
Thanks,
--
Raul