At 10:12 -0400 2006/05/15, Henry Rich wrote:
Try
x ([: I. E.) y
to get the list of places where the string x occurs. This uses
special code and doesn't create the entire result of E. .
Thank you, Henry. I wrote yesterday's post in haste, on my way out to
an appointment, and hadn't thought through the unsettlingly
large space requirement:
At 13:10 -0700 2006/05/14, Joey K Tuttle wrote:
   timex 'tagis =: I. tag E. mls'  NB. time and space to get indexes
3.49947 1.34481e8
Much nicer in both space and time is your suggestion:
   timex 'htagis =: tag ([: I. E.) mls'
0.929349 4.1977e6
   htagis -: tagis
1
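To make the comparison concrete, here is a tiny made-up example. The
short strings below only stand in for my real tag and mail log, they
are not the actual data:

```j
   tag =: 'ab'
   mls =: 'xxabyyabab'
   tag E. mls            NB. full boolean mask, one item per character of mls
0 0 1 0 0 0 1 0 1 0
   I. tag E. mls         NB. indexes taken from that complete mask
2 6 8
   tag ([: I. E.) mls    NB. same indexes from the combined form
2 6 8
```

Both forms give the same indexes. The only difference is that the
first one builds the entire result of E. before I. ever sees it.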
I can still imagine running into limits with really large files -
or a large number of "hits". Probably something the NSA folks have
been pondering regarding phone call logs (I have had fun in the
past with phone logs... :)
I take this opportunity to say that in my quick search for an
example yesterday, I was actually interested in the distribution
of sizes of email messages passing through my server. I have
for years (first motivated by phone call log tinkering) used
histograms to look at such, usually sorting them after getting
a frequency distribution, e.g.
   hist
~. ,: #/.~
   shist
([: /: 0"_ { ]) {"1 ]
   shist hist ?.20$10
0 1 2 3 4 5 6 7 8 9
4 1 3 1 2 1 2 1 3 2
but that simple technique isn't very useful on data like my
sizes of email messages. Of course, the new dyadic I. is very
nice for such things, as in -
   shist hist sizes I.~ 10^i.10
    0 1 2    3     4     5    6   7   8
11720 1 9 2944 15840 11071 1663 523 176
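For anyone who wants to try the binning without a mail log at hand,
here is a small sketch with six invented sizes (hypothetical values,
not my server data) and only five power-of-ten boundaries. It needs a
J version that has dyadic I.:

```j
   hist  =: ~. ,: #/.~
   shist =: ([: /: 0"_ { ]) {"1 ]
   sizes =: 5000 5 50 500 7 12     NB. invented message sizes
   10^i.5                          NB. interval boundaries
1 10 100 1000 10000
   sizes I.~ 10^i.5                NB. interval index of each size
4 1 2 3 1 2
   shist hist sizes I.~ 10^i.5     NB. top row: bin, bottom row: count
1 2 3 4
2 2 1 1
```

Bin 1 holds sizes above 1 and up to 10, bin 2 those above 10 and up to
100, and so on, so each bin is one order of magnitude.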
(I had to run that last expression in j601, not in the j504
system that I used for the timings, since dyadic I. is new.)
- joey