> limit is only 2GB

A phrase I thought I'd _never_ hear
-----------------------------------------------------------------------
|\/| Randy A MacDonald             | APL: If you can say it, it's done.. (ram)
|/\| [EMAIL PROTECTED]            |
|\ |                               |
BSc(Math) UNBF'83 Sapere Aude      | If you cannot describe what you are doing
                                   | as a process, you don't know what you're doing.
                                   |     - W. E. Deming
Natural Born APL'er                | Demo website: http://156.34.82.41/
----------------------------------------------------(INTP)----{ gnat }-

----- Original Message ----- 
From: "Oleg Kobchenko" <[EMAIL PROTECTED]>
To: "Programming forum" <programming@jsoftware.com>
Sent: Monday, May 15, 2006 11:50 AM
Subject: RE: [Jprogramming] Scanning a large file


> I believe you cannot map the entire file
> at once, the limit is only 2GB
>    'c0'8!:2]_1+2^31
> 2,147,483,647
>
>
> --- Henry Rich <[EMAIL PROTECTED]> wrote:
>
> > Try
> >
> > x ([: I. E.) y
> >
> > to get the list of places where the string x occurs.  This uses
> > special code and doesn't create the entire result of E. .
> >
> > Henry Rich
> >
> > > -----Original Message-----
> > > From: [EMAIL PROTECTED]
> > > [mailto:[EMAIL PROTECTED] On Behalf Of Yoel Jacobsen
> > > Sent: Monday, May 15, 2006 10:09 AM
> > > To: Programming forum
> > > Subject: Re: [Jprogramming] Scanning a large file
> > >
> > > It won't work for large files. E. returns a 'limit error'.
> > >
> > > Yoel
> > >
> > > On 5/14/06, Joey K Tuttle <[EMAIL PROTECTED]> wrote:
> > > >
> > > > Yoel,
> > > >
> > > > Some of the feedback you got suggested mapped files, others
> > > > suggested just reading the file. My own habits lean towards
> > > > reading the file and I have a utility verb that gets "lines"
> > > > while not exceeding a buffer size limit. I find that buffer
> > > > sizes > 100Kbytes generally make almost no difference in
> > > > processing time - in fact, processing can take longer on
> > > > larger chunks. Actually, the gain after 40Kbytes is minor
> > > > indeed.
> > > >
> > > > But in your responses you indicated that you were interested
> > > > in not using (explicit) loops and doing it in a j style yet
> > > > being able to handle large files. j mapped files are certainly
> > > > needed in that case. There was also a suggestion of regex,
> > > > but my experience calling regex from j has been less than
> > > > satisfactory.
> > > >
> > > > In my opinion, these things usually require some thought and
> > > > knowledge of the data and the objectives. If the pattern you
> > > > are searching for is "nice" (like your keyword 'csn') then
> > > > there are usually pretty good ways to have j gather the data.
> > > > To find an actual example to illustrate, I catenated the past
> > > > 8 weeks worth of sendmail logs on my linux system to create
> > > > a file "maillogs" - here is some experimenting with it -
> > > >
> > > > [EMAIL PROTECTED] mqueue]$ wc maillogs
> > > >   564175 6987478 75395162 maillogs
> > > >
> > > >     that is, the file is 75Mbytes with 564,175 lines
> > > >
> > > > [EMAIL PROTECTED] mqueue]$ ja  # starting jconsole
> > > >     version ''
> > > > j504/2005-03-16/15:30
> > > > Running in: Linux
> > > >     host 'cat /proc/cpuinfo'
> > > > processor       : 0
> > > > vendor_id       : GenuineIntel
> > > > cpu family      : 6
> > > > model           : 5
> > > > model name      : Pentium II (Deschutes)
> > > > stepping        : 2
> > > > cpu MHz         : 399.071
> > > > cache size      : 512 KB
> > > >    ....
> > > >
> > > > NB. not a very fast machine, but it does have 1Gbyte ram available
> > > >
> > > >     require 'jmf'
> > > >     JCHAR map_jmf_ 'mls';'maillogs';'';1
> > > > NB. HIGHLY recommended to map read only... that is the 1 at the
> > > > NB. end of the mapping expression. There is a vicious side effect
> > > > NB. (IMHO a BUG) in setting an alias of a mapped name within a verb.
> > > >
> > > > NB. My example is to get the size of messages that passed through
> > > > NB. sendmail. Typically there is a phrase like   size=1234,  in
> > > > NB. the log. The following is based on that.
> > > >
> > > >     delim =: ','
> > > >     tag =: 'size='
> > > >
> > > >     timex 'tagis =: I. tag E. mls'    NB. time and space to get indexes
> > > > 3.49947 1.34481e8
> > > >     timex 'sizes =: delim (_1: ". (] i."1 [) {."0 1 ]) (tagis +/ (#tag)+i. 12){mls'
> > > > 0.431585 1.37452e7
> > > >     $sizes
> > > > 43947
> > > >     +/ x: sizes
> > > > 11572953524
> > > >
> > > > Maybe these are some ideas you can use to attack your problem.
> > > >
> > > > - joey
> > > >
> > > >
> > > > At 11:01  +0300 2006/05/14, Yoel Jacobsen wrote:
> > > > >Hello,
> > > > >
> > > > >I'm new to J so please forgive me if this is a FAQ.
> > > > >
> > > > > >I wrote some short sentences to parse a log file. I want to retrieve
> > > > > >all the unique values of some attribute. The way it shows in the log
> > > > > >file is <attribute name>SPACE<attribute value> such as
> > > > > >"..... csn 92892849893284 ..."
> > > > >
> > > > >My initial (brute force) program is:
> > > > >
> > > > >text =: 1!:1 < '/tmp/logfile'
> > > > >words =: cutopen text
> > > > >bv =: (<'csn') = words
> > > > >srbv =: _1 |.!.0 bv
> > > > >csns =: ~. srbv # words
> > > > >
> > > > >Now csns holds the unique values as requested.
> > > > >
> > > > >The program works fine for small files (few megabytes).
> > > > >
> > > > > >My question is, what should be done to make it work for large files
> > > > > >(say, 1GB or more)? I guess it involves memory mapped files but I have
> > > > > >no clue where to continue from here.
> > > > > >
> > > > > >Further, is there any notion of 'laziness' (evaluate only when the
> > > > > >data is really needed) in J? Can a verb be declared as a lazy verb?
> > > > >
> > > > >Thanks,
> > > > >
> > > > >Yoel
>
>
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
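
For anyone reading the thread later: the suggestions above (a bounded read buffer, x ([: I. E.) y, and the mapping limit) can be combined into one verb that scans a file chunk by chunk. What follows is only a rough sketch, not code from any poster. The names chunkscan and CHUNK are made up for illustration, and it is written for a current J release where explicit arguments are called x and y (j504, used above, spelled them x. and y.). Each read overlaps the next by one less than the length of the search string, so a match that straddles a chunk boundary is still found exactly once.

```j
NB. Sketch only: chunkscan and CHUNK are hypothetical names, not from the thread.
NB. x is the search string, y is a file name.
NB. Result: the file offsets at which x starts.
CHUNK =: 100000                  NB. bytes per read; the thread reports little gain beyond about 100 Kbytes

chunkscan =: 4 : 0
 size =. 1!:4 < y                NB. file size in bytes
 ov   =. <: # x                  NB. overlap, so a match straddling two chunks is still seen
 res  =. i. 0
 pos  =. 0
 while. pos < size do.
  len =. (CHUNK + ov) <. size - pos
  buf =. 1!:11 y ; pos , len     NB. indexed read: len bytes starting at offset pos
  res =. res , pos + x ([: I. E.) buf
  pos =. pos + CHUNK
 end.
 res
)
```

It would be called as   'size=' chunkscan 'maillogs'   and holds only one chunk plus the accumulated indices in memory at a time. It does use an explicit loop, which the original question hoped to avoid, and whether file offsets past 2GB work at all depends on the J build, so treat that part as untested.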


