Try

x ([: I. E.) y

to get the list of indexes where the string x occurs in y.  This uses
special code and doesn't create the entire result of E. first.
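
For illustration, here is a minimal sketch of that phrase on a made-up
string. The names txt, tag, idx and vals are hypothetical, the data is not
from the logs discussed in this thread, and the fixed value width of 3 is
an assumption made only to keep the example short.

```j
NB. Illustrative data only; txt plays the role of y, tag the role of x.
txt =: 'a csn 111 b csn 222 c csn 111'
tag =: 'csn '

idx =: tag ([: I. E.) txt    NB. start index of each occurrence of tag in txt
idx                          NB. 2 12 22

NB. Same answer as the two-step form, which first builds the full
NB. Boolean result of E. and then applies I. to it.
idx -: I. tag E. txt         NB. 1

NB. Assumption for this sketch: every value is exactly 3 characters wide.
vals =: (idx +/ (#tag) + i. 3) { txt   NB. one row of characters per occurrence
~. vals                      NB. unique rows: 111 and 222
```

On real data the values will not have a fixed width, so the last step
would need a delimiter search like the one in the quoted message below.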

Henry Rich

> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Yoel Jacobsen
> Sent: Monday, May 15, 2006 10:09 AM
> To: Programming forum
> Subject: Re: [Jprogramming] Scanning a large file
> 
> It won't work for large files. E. returns a 'limit error'.
> 
> Yoel
> 
> On 5/14/06, Joey K Tuttle <[EMAIL PROTECTED]> wrote:
> >
> > Yoel,
> >
> > Some of the feedback you got suggested mapped files, others
> > suggested just reading the file. My own habits lean towards
> > reading the file and I have a utility verb that gets "lines"
> > while not exceeding a buffer size limit. I find that buffer
> > sizes > 100Kbytes generally make almost no difference in
> > processing time - in fact, processing can take longer on
> > larger chunks. Actually, the gain after 40Kbytes is minor
> > indeed.
> >
> > But in your responses you indicated that you were interested
> > in not using (explicit) loops and doing it in a j style yet
> > being able to handle large files. j mapped files are certainly
> > needed in that case. There was also a suggestion of regex,
> > but my experience calling regex from j has been less than
> > satisfactory.
> >
> > In my opinion, these things usually require some thought and
> > knowledge of the data and the objectives. If the pattern you
> > are searching for is "nice" (like your keyword 'csn') then
> > there are usually pretty good ways to have j gather the data.
> > To find an actual example to illustrate, I catenated the past
> > 8 weeks worth of sendmail logs on my linux system to create
> > a file "maillogs" - here is some experimenting with it -
> >
> > [EMAIL PROTECTED] mqueue]$ wc maillogs
> >   564175 6987478 75395162 maillogs
> >
> >     that is, the file is 75Mbytes with 564,175 lines
> >
> > [EMAIL PROTECTED] mqueue]$ ja  # starting jconsole
> >     version ''
> > j504/2005-03-16/15:30
> > Running in: Linux
> >     host 'cat /proc/cpuinfo'
> > processor       : 0
> > vendor_id       : GenuineIntel
> > cpu family      : 6
> > model           : 5
> > model name      : Pentium II (Deschutes)
> > stepping        : 2
> > cpu MHz         : 399.071
> > cache size      : 512 KB
> >    ....
> >
> > NB. not a very fast machine, but it does have 1Gbyte ram available
> >
> >     require 'jmf'
> >     JCHAR map_jmf_ 'mls';'maillogs';'';1
> > NB. HIGHLY recommended to map read only... that is the 1 at the
> > NB. end of the mapping expression. There is a vicious side effect
> > NB. (IMHO a BUG) in setting an alias of a mapped name within a verb.
> >
> > NB. My example is to get the size of messages that passed through
> > NB. sendmail. Typically there is a phrase like   size=1234,  in
> > NB. the log. The following is based on that.
> >
> >     delim =: ','
> >     tag =: 'size='
> >
> >     timex 'tagis =: I. tag E. mls'    NB. time and space to get indexes
> > 3.49947 1.34481e8
> >     timex 'sizes =: delim (_1: ". (] i."1 [) {."0 1 ]) (tagis +/ (#tag)+i. 12){mls'
> > 0.431585 1.37452e7
> >     $sizes
> > 43947
> >     +/ x: sizes
> > 11572953524
> >
> > Maybe these are some ideas you can use to attack your problem.
> >
> > - joey
> >
> >
> > At 11:01  +0300 2006/05/14, Yoel Jacobsen wrote:
> > >Hello,
> > >
> > >I'm new to J so please forgive me if this is a FAQ.
> > >
> > >I wrote some short sentences to parse a log file. I want to retrieve all the
> > >unique values of some attribute. The way it shows in the log file is
> > ><attribute name>SPACE<attribute value> such as "..... csn 92892849893284
> > >..."
> > >
> > >My initial (brute force) program is:
> > >
> > >text =: 1!:1 < '/tmp/logfile'
> > >words =: cutopen text
> > >bv =: (<'csn') = words
> > >srbv =: _1 |.!.0 bv
> > >csns =: ~. srbv # words
> > >
> > >Now csns holds the unique values as requested.
> > >
> > >The program works fine for small files (few megabytes).
> > >
> > >My question is, what should be done to make it work for large files (say,
> > >1GB or more)? I guess it involves memory mapped files but I have no clue
> > >where to continue from here.
> > >
> > >Further, is there any notion of 'laziness' (evaluate only when the data is
> > >really needed) in J? Can a verb be declared as a lazy verb?
> > >
> > >Thanks,
> > >
> > >Yoel
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
