RE: [Jprogramming] Scanning a large file

Joey K Tuttle Mon, 15 May 2006 07:50:44 -0700

At 10:12  -0400 2006/05/15, Henry Rich wrote:

> Try
>
>    x ([: I. E.) y
>
> to get the list of places where the string x occurs.  This uses
> special code and doesn't create the entire result of E. .


Thank you Henry, I wrote yesterday's post in haste before leaving
for an appointment, and didn't work out in my mind why the space
requirement was so unsettlingly large....

At 13:10  -0700 2006/05/14, Joey K Tuttle wrote:


>    timex 'tagis =: I. tag E. mls'    NB. time and space to get indexes
> 3.49947 1.34481e8


Much nicer in both space and time is your suggestion:

   timex 'htagis =: tag  ([: I. E.)  mls'
0.929349 4.1977e6
   htagis -: tagis
1
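
A tiny illustration of the difference, using made-up strings rather
than the mail log (the values of tag and mls here are assumptions
for the example only):

   tag =: 'ab'
   mls =: 'xxabyyabab'
   tag E. mls            NB. boolean mask, one item per character of mls
0 0 1 0 0 0 1 0 1 0
   I. tag E. mls         NB. full mask is built first, then turned into indexes
2 6 8
   tag ([: I. E.) mls    NB. same indexes from the combined form
2 6 8

The timex verb is not defined anywhere in this thread. Assuming it
is a time-and-space wrapper, one way to write such a verb with the
6!:2 (seconds) and 7!:2 (bytes) foreigns would be:

   timex =: 6!:2 , 7!:2@]    NB. assumed definition, not taken from the post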

I can still imagine running into limits with really large files -
or a large number of "hits". Probably something the NSA folks have
been pondering regarding phone call logs (I have had fun in the
past with phone logs... :)

I take this opportunity to say that in my quick search for an
example yesterday, I was actually interested in the distribution
of sizes of email messages passing through my server. For years
(first motivated by tinkering with phone call logs) I have used
histograms to look at such data, usually sorting them after
getting a frequency distribution, e.g.

   hist
~. ,: #/.~
   shist
([: /: 0"_ { ]) {"1 ]

   shist hist ?.20$10
0 1 2 3 4 5 6 7 8 9
4 1 3 1 2 1 2 1 3 2
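
To see what the two verbs do step by step, here is a small sketch
on a hand-picked list (d is made up for the illustration):

   hist  =: ~. ,: #/.~
   shist =: ([: /: 0"_ { ]) {"1 ]
   d =: 3 1 3 2 1 3
   ~. d                NB. distinct values, in order of first appearance
3 1 2
   #/.~ d              NB. how often each distinct value occurs, same order
3 2 1
   hist d              NB. values in the top row, counts below
3 1 2
3 2 1
   shist hist d        NB. columns reordered so the top row ascends
1 2 3
2 1 3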

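but that simple technique isn't very useful on data like my
sizes of email messages, where there are far too many distinct
values to read as a table. Of course, the new dyadic I. is very
nice for such things, since it can place each size in a bin, as in: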

   shist hist sizes I.~ 10^i.10
    0 1 2    3     4     5    6   7   8
11720 1 9 2944 15840 11071 1663 523 176

(Of course, I had to run that last expression in j601 instead
of the j504 system that I used to do the timings)
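
For anyone who has not tried dyadic I. yet, a small sketch of the
binning on invented numbers (edges and sz are names made up here,
and this needs a J version that has dyadic I.):

   edges =: 10^i.4               NB. 1 10 100 1000
   sz =: 0 5 10 11 999 1000 1001 50000
   edges I. sz                   NB. least index j with sz <: j{edges, or #edges
0 1 1 2 3 3 4 4
   shist hist sz I.~ edges       NB. I.~ just swaps the arguments
0 1 2 3 4
1 2 1 2 2

So bin 0 holds sizes up to 1, bin 1 holds sizes above 1 and up to
10, and so on, with the last bin catching everything above the
largest edge.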



- joey
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
