> limit is only 2GB

A phrase I thought I'd _never_ hear

-----------------------------------------------------------------------
|\/| Randy A MacDonald         | APL: If you can say it, it's done.. (ram)
|/\| [EMAIL PROTECTED]         |
|\ |                           |
BSc(Math) UNBF'83  Sapere Aude | If you cannot describe what you are doing
                               | as a process, you don't know what you're doing.
                               |  - W. E. Deming
Natural Born APL'er            | Demo website: http://156.34.82.41/
----------------------------------------------------(INTP)----{ gnat }-
----- Original Message -----
From: "Oleg Kobchenko" <[EMAIL PROTECTED]>
To: "Programming forum" <programming@jsoftware.com>
Sent: Monday, May 15, 2006 11:50 AM
Subject: RE: [Jprogramming] Scanning a large file

> I believe you cannot map the entire file
> at once, the limit is only 2GB
>    'c0'8!:2]_1+2^31
> 2,147,483,647
>
>
> --- Henry Rich <[EMAIL PROTECTED]> wrote:
>
> > Try
> >
> > x ([: I. E.) y
> >
> > to get the list of places where the string x occurs.  This uses
> > special code and doesn't create the entire result of E. .
> >
> > Henry Rich
> >
> > > -----Original Message-----
> > > From: [EMAIL PROTECTED]
> > > [mailto:[EMAIL PROTECTED] On Behalf Of Yoel Jacobsen
> > > Sent: Monday, May 15, 2006 10:09 AM
> > > To: Programming forum
> > > Subject: Re: [Jprogramming] Scanning a large file
> > >
> > > It won't work for large files. E. returns a 'limit error'.
> > >
> > > Yoel
> > >
> > > On 5/14/06, Joey K Tuttle <[EMAIL PROTECTED]> wrote:
> > > >
> > > > Yoel,
> > > >
> > > > Some of the feedback you got suggested mapped files, others
> > > > suggested just reading the file. My own habits lean towards
> > > > reading the file and I have a utility verb that gets "lines"
> > > > while not exceeding a buffer size limit. I find that buffer
> > > > sizes > 100Kbytes generally make almost no difference in
> > > > processing time - in fact, processing can take longer on
> > > > larger chunks. Actually, the gain after 40Kbytes is minor
> > > > indeed.
> > > >
> > > > But in your responses you indicated that you were interested
> > > > in not using (explicit) loops and doing it in a j style yet
> > > > being able to handle large files. j mapped files are certainly
> > > > needed in that case. There was also a suggestion of regex,
> > > > but my experience calling regex from j has been less than
> > > > satisfactory.
> > > >
> > > > In my opinion, these things usually require some thought and
> > > > knowledge of the data and the objectives.
> > > > If the pattern you
> > > > are searching for is "nice" (like your keyword 'csn') then
> > > > there are usually pretty good ways to have j gather the data.
> > > > To find an actual example to illustrate, I catenated the past
> > > > 8 weeks worth of sendmail logs on my linux system to create
> > > > a file "maillogs" - here is some experimenting with it -
> > > >
> > > > [EMAIL PROTECTED] mqueue]$ wc maillogs
> > > >   564175 6987478 75395162 maillogs
> > > >
> > > > that is, the file is 75Mbytes with 564,175 lines
> > > >
> > > > [EMAIL PROTECTED] mqueue]$ ja     # starting jconsole
> > > >    version ''
> > > > j504/2005-03-16/15:30
> > > > Running in: Linux
> > > >    host 'cat /proc/cpuinfo'
> > > > processor       : 0
> > > > vendor_id       : GenuineIntel
> > > > cpu family      : 6
> > > > model           : 5
> > > > model name      : Pentium II (Deschutes)
> > > > stepping        : 2
> > > > cpu MHz         : 399.071
> > > > cache size      : 512 KB
> > > > ....
> > > >
> > > > NB. not a very fast machine, but it does have 1Gbyte ram available
> > > >
> > > >    require 'jmf'
> > > >    JCHAR map_jmf_ 'mls';'maillogs';'';1
> > > > NB. HIGHLY recommended to map read only... that is the 1 at the
> > > > NB. end of the mapping expression. There is a vicious side effect
> > > > NB. (IMHO a BUG) in setting an alias of a mapped name within a verb.
> > > >
> > > > NB. My example is to get the size of messages that passed through
> > > > NB. sendmail. Typically there is a phrase like size=1234, in
> > > > NB. the log. The following is based on that.
> > > >
> > > >    delim =: ','
> > > >    tag =: 'size='
> > > >
> > > >    timex 'tagis =: I. tag E. mls'    NB. time and space to get indexes
> > > > 3.49947 1.34481e8
> > > >    timex 'sizes =: delim (_1: ". (] i."1 [) {."0 1 ]) (tagis +/ (#tag)+i. 12){mls'
> > > > 0.431585 1.37452e7
> > > >    $sizes
> > > > 43947
> > > >    +/ x: sizes
> > > > 11572953524
> > > >
> > > > Maybe these are some ideas you can use to attack your problem.
> > > >
> > > > - joey
> > > >
> > > >
> > > > At 11:01 +0300 2006/05/14, Yoel Jacobsen wrote:
> > > > >Hello,
> > > > >
> > > > >I'm new to J so please forgive me if this is a FAQ.
> > > > >
> > > > >I wrote some short sentences to parse a log file. I want to retrieve all the
> > > > >unique values of some attribute. The way it shows in the log file is
> > > > ><attribute name>SPACE<attribute value> such as "..... csn 92892849893284
> > > > >..."
> > > > >
> > > > >My initial (brute force) program is:
> > > > >
> > > > >text =: 1!:1 < '/tmp/logfile'
> > > > >words =: cutopen text
> > > > >bv =: (<'csn') = words
> > > > >srbv =: _1 |.!.0 bv
> > > > >csns =: ~. srbv # words
> > > > >
> > > > >Now csns holds the unique values as requested.
> > > > >
> > > > >The program works fine for small files (few megabytes).
> > > > >
> > > > >My question is, what should be done to make it work for large files (say,
> > > > >1GB or more)? I guess it involves memory mapped files but I have no clue
> > > > >where to continue from here.
> > > > >
> > > > >Further, is there any notion of 'laziness' (evaluate only when the data is
> > > > >really needed) in J? Can a verb be declared as a lazy verb?
> > > > >
> > > > >Thanks,
> > > > >
> > > > >Yoel

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
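
For anyone reading the thread later, here is a minimal sketch of the two index-finding forms discussed above, run on a short made-up string. The names text and tag and the sample contents are assumptions for illustration only; they are not taken from Yoel's log or Joey's maillogs file. Both sentences should give the same indices (2 11 for this sample); the difference Henry describes is that the first form avoids materialising the whole boolean result of E. before I. is applied.

```j
NB. Hypothetical sample data; the real log contents are not shown in the thread.
text =: 'a csn 12 b csn 34'
tag  =: 'csn'

NB. Henry Rich's suggested form: start positions of tag in text.
tag ([: I. E.) text

NB. The direct form used in Joey's timing: builds the full boolean
NB. result of E. first, then converts it to indices.
I. tag E. text
```

On a mapped file the noun text would be replaced by the mapped name (mls in Joey's session); whether that stays within limits for files near 2GB is not something this sketch demonstrates.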