Try x ([: I. E.) y
to get the list of places where the string x occurs.  This uses
special code and doesn't create the entire result of E. .

Henry Rich

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Yoel Jacobsen
> Sent: Monday, May 15, 2006 10:09 AM
> To: Programming forum
> Subject: Re: [Jprogramming] Scanning a large file
>
> It won't work for large files. E. returns a 'limit error'.
>
> Yoel
>
> On 5/14/06, Joey K Tuttle <[EMAIL PROTECTED]> wrote:
> >
> > Yoel,
> >
> > Some of the feedback you got suggested mapped files, others
> > suggested just reading the file. My own habits lean towards
> > reading the file and I have a utility verb that gets "lines"
> > while not exceeding a buffer size limit. I find that buffer
> > sizes > 100Kbytes generally make almost no difference in
> > processing time - in fact, processing can take longer on
> > larger chunks. Actually, the gain after 40Kbytes is minor
> > indeed.
> >
> > But in your responses you indicated that you were interested
> > in not using (explicit) loops and doing it in a j style yet
> > being able to handle large files. j mapped files are certainly
> > needed in that case. There was also a suggestion of regex,
> > but my experience calling regex from j has been less than
> > satisfactory.
> >
> > In my opinion, these things usually require some thought and
> > knowledge of the data and the objectives. If the pattern you
> > are searching for is "nice" (like your keyword 'csn') then
> > there are usually pretty good ways to have j gather the data.
> >
> > To find an actual example to illustrate, I catenated the past
> > 8 weeks worth of sendmail logs on my linux system to create
> > a file "maillogs" - here is some experimenting with it -
> >
> > [EMAIL PROTECTED] mqueue]$ wc maillogs
> > 564175 6987478 75395162 maillogs
> >
> > that is, the file is 75Mbytes with 564,175 lines
> >
> > [EMAIL PROTECTED] mqueue]$ ja # starting jconsole
> > version ''
> > j504/2005-03-16/15:30
> > Running in: Linux
> > host 'cat /proc/cpuinfo'
> > processor : 0
> > vendor_id : GenuineIntel
> > cpu family : 6
> > model : 5
> > model name : Pentium II (Deschutes)
> > stepping : 2
> > cpu MHz : 399.071
> > cache size : 512 KB
> > ....
> >
> > NB. not a very fast machine, but it does have 1Gbyte ram available
> >
> > require 'jmf'
> > JCHAR map_jmf_ 'mls';'maillogs';'';1
> > NB. HIGHLY recommended to map read only... that is the 1 at the
> > NB. end of the mapping expression. There is a vicious side effect
> > NB. (IMHO a BUG) in setting an alias of a mapped name within a verb.
> >
> > NB. My example is to get the size of messages that passed through
> > NB. sendmail. Typically there is a phrase like size=1234, in
> > NB. the log. The following is based on that.
> >
> > delim =: ','
> > tag =: 'size='
> >
> > timex 'tagis =: I. tag E. mls'  NB. time and space to get indexes
> > 3.49947 1.34481e8
> > timex 'sizes =: delim (_1: ". (] i."1 [) {."0 1 ]) (tagis +/ (#tag)+i. 12){mls'
> > 0.431585 1.37452e7
> > $sizes
> > 43947
> > +/ x: sizes
> > 11572953524
> >
> > Maybe these are some ideas you can use to attack your problem.
> >
> > - joey
> >
> >
> > At 11:01 +0300 2006/05/14, Yoel Jacobsen wrote:
> > >Hello,
> > >
> > >I'm new to J so please forgive me if this is a FAQ.
> > >
> > >I wrote some short sentences to parse a log file. I want to retrieve all the
> > >unique values of some attribute. The way it shows in the log file is
> > ><attribute name>SPACE<attribute value> such as "..... csn 92892849893284
> > >..."
> > >
> > >My initial (brute force) program is:
> > >
> > >text =: 1!:1 < '/tmp/logfile'
> > >words =: cutopen text
> > >bv =: (<'csn') = words
> > >srbv =: _1 |.!.0 bv
> > >csns =: ~. srbv # words
> > >
> > >Now csns holds the unique values as requested.
> > >
> > >The program works fine for small files (few megabytes).
> > >
> > >My question is, what should be done to make it work for large files (say,
> > >1GB or more)? I guess it involves memory mapped files but I have no clue
> > >where to continue from here.
> > >
> > >Further, is there any notion of 'laziness' (evaluate only when the data is
> > >really needed) in J? can a verb be declared as a lazy verb?
> > >
> > >Thanks,
> > >
> > >Yoel
> >
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
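A small sketch of how the x ([: I. E.) y suggestion at the top can be combined with Joey's fixed-width window technique to answer Yoel's original csn question. Per Henry, the capped fork avoids building the full Boolean result of E., which is what produced the limit error. The sketch runs on a short in-memory string standing in for a read-only mapped noun such as Joey's mls; the names text, idx, win, len and csns, the 20-character maximum value width, and the assumption that the data ends with LF are all hypothetical choices for illustration, not anything from the thread.

```j
NB. Hypothetical stand-in for the log: a short in-memory string.
NB. A read-only mapped noun such as Joey's mls would be used the same way.
NB. Assumes the data ends with LF (typical for log files).
text =: 'a csn 928 b csn 17 c csn 928',LF
tag  =: 'csn '                    NB. keyword plus its separating space

NB. Starting index of each keyword, using the special form Henry
NB. recommends, so the full Boolean result of E. is never built.
idx  =: tag ([: I. E.) text

NB. Fixed-width window after each keyword, as in Joey's size= example.
NB. 20 is an assumed maximum value length. Indexes are clamped to the
NB. last character instead of padding the text, since catenating to a
NB. mapped noun would copy the whole file.
win  =: ((<: # text) <. idx +/ (#tag) + i. 20) { text

NB. Length of each value: position of the first blank or LF in its row.
len  =: i.&1"1 win e. ' ',LF

NB. Box each value and keep the unique ones (Yoel's csns).
csns =: ~. len <@{."0 1 win
csns                              NB. two boxes: 928 and 17
```

Only the selected windows (one short row per keyword) are materialized, so memory use tracks the number of matches rather than the size of the file.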

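Joey mentions a utility verb that reads a file in buffer-limited pieces but does not show it. The following is a hypothetical sketch of that idea, not his verb: countin counts occurrences of a string using J's indexed read (1!:11) and file size (1!:4) foreign verbs, reading at most BLOCK bytes at a time and stepping back (#x)-1 bytes between reads so that a match straddling two blocks is still found exactly once. The names countin and BLOCK are invented for illustration, and the default of 100000 bytes is an assumption based only on Joey's remark that buffers above 100Kbytes gain little. Note that it uses an explicit loop, which Yoel hoped to avoid.

```j
NB. Hypothetical buffer size in bytes (Joey found little gain above 100K).
BLOCK =: 100000

NB. x countin y: number of occurrences of string x in the file named y,
NB. reading at most BLOCK bytes at a time with indexed read 1!:11.
NB. Assumes BLOCK is larger than #x so that every step makes progress.
countin =: 4 : 0
size =. 1!:4 < y                  NB. file size in bytes
pos  =. 0
n    =. 0
while. pos < size do.
  len   =. BLOCK <. size - pos
  chunk =. 1!:11 y ; pos , len    NB. read len bytes starting at pos
  n     =. n + +/ x E. chunk
  if. size <: pos + len do. break. end.
  NB. Step back (#x)-1 bytes so a match straddling the boundary is seen
  NB. in the next block; a whole match cannot fit in the overlap alone,
  NB. so nothing is counted twice.
  pos =. pos + len - <: # x
end.
n
)
```

For instance, 'size=' countin 'maillogs' should count the same tags Joey located with I. tag E. mls, while holding only one block of the file in memory at a time.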