We need general-purpose read-line functionality. It is common in the C runtime and in other languages. Although it is possible to do this in J, it is better not to redo the low-level work every time.

Chris has shown how to do it in a way specific to a concrete example. I suggest separating the reading part from the processing, so that the reading can be reused.

Here is a list of constraints:
 - it's OK to assume LF line separators only (no CR)
 - read every byte of the file once and only once
 - process empty lines
 - process a non-terminated last line
 - be fast and lean

Here is an approach that keeps the state of file management out of the user code by means of a callback for each line. It calculates wc for a 1Mb file on P2.8GHz in 1.7 sec.

   (wc FN) , ts'wc FN'
80000 200000 999999 1.6866 95808

NB. =========================================================
NB. readlines -- line reader

require 'files'

SB=: 10000                NB. block size in bytes

NB. u readlines y -- call verb u once per line of file y
NB. each line is passed with its trailing LF, if it has one
readlines=: 1 : 0
  assert fexist y
  S=. fsize y             NB. file size
  P=. 0                   NB. read position
  B=. ''                  NB. buffer of bytes not yet processed
  while. P < S do.
    B=. B,fread y ; P,SR=. SB<.S-P
    P=. P+SR
    NB. L is the length of the part that ends in the last LF
    if. (#B) >: L=. 1 + B i:LF do.
      u ;.2 L {. B        NB. process the complete lines
      B=. L }. B          NB. carry the incomplete tail over
    end.
  end.
  if. #B do. u B end.     NB. non-terminated last line
)

NB. =========================================================
NB. user code

lwc=: 3 : 0
  LC=: LC + 1
  WC=: WC + #@;: }:^:(LF={:)y
  CC=: CC + #y
)

wc=: 3 : 0
  LC=: WC=: CC=: 0
  lwc readlines y
  LC , WC , CC
)

ts=: 6!:2 , 7!:2@]

NB. The archive lost the line breaks inside this sample. The layout below
NB. (three text lines and one empty line) is an assumption chosen to match
NB. the reported counts: 4 lines, 10 words, 50 bytes per copy.
A=: 20000 ((* #) $ ]) 0 : 0
one two three
four five six

seven eight nine ten
)

0 : 0
(}:A) fwrite FN=: jpath '~temp/t1.txt'
(wc FN) , ts'wc FN'
)

NB. =========================================================

--- Chris Burke <[EMAIL PROTECTED]> wrote:

> Yoel Jacobsen wrote:
> > I wrote some short sentences to parse a log file. I want to retrieve all
> > the
> > unique values of some attribute. The way it shows in the log file is
> > <attribute name>SPACE<attribute value> such as "..... csn 92892849893284
> > ..."
> >
> > My initial (brute force) program is:
> >
> > text =: 1!:1 < '/tmp/logfile'
> > words =: cutopen text
> > bv =: (<'csn') = words
> > srbv =: _1 |.!.0 bv
> > csns =: ~. srbv # words
> >
> > Now csns holds the unique values as requested.
> >
> > The program works fine for small files (few megabytes).
>
> Probably the simplest way to handle this is to read the file in large
> blocks, and chop the blocks into lines. Since lines are of uneven
> length, the blocks will likely not end in a line separator, so need to
> be truncated.
>
> You don't need to memory map the file.
>
> The following example assumes each line ends in LF:
>
> getcsn=: 3 : 0
> siz=. fsize y
> blk=. 1e7
> ptr=. 0
> res=. ''
> while. ptr < siz do.
>   len=. blk <. siz - ptr
>   dat=. fread y;ptr,len
>   lfx=. 1 + dat i: LF
>   ptr=. ptr + lfx
>   dat=. <;._2 lfx {. dat
>   key=. (dat i.&> ' ') {. each dat
>   msk=. key = <'csn'
>   res=. ~. res, msk # dat
> end.
> 4 }. each res
> )
>
> A=: 0 : 0
> abc qweqwe
> csn 1234
> def 123123
> csn 87654
> )
>
>    A fwrites F=: jpath '~temp/t1.dat'
> 41
>
>    getcsn F
> +----+-----+
> |1234|87654|
> +----+-----+
>
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
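
To illustrate the reuse argument, here is a rough sketch of how the csn extraction might look as a callback for readlines. It assumes readlines and SB from the script above are already loaded, and that the attribute name starts the line as in Chris's sample file. The names csnline, getcsn2 and CSN are hypothetical and not part of any library.

```j
require 'files'

NB. callback (hypothetical): y is one line, possibly with a trailing LF
csnline=: 3 : 0
  ln=. }:^:(LF={:) y              NB. drop the trailing LF if present
  if. 'csn ' -: 4 {. ln do.       NB. line starts with the key and a space
    CSN=: CSN , < 4 }. ln         NB. collect the value in a global
  end.
  i.0 0                           NB. same result for every line, so ;.2 can assemble
)

NB. getcsn2 (hypothetical): unique csn values of file y
NB. assumes the adverb readlines defined earlier in this message
getcsn2=: 3 : 0
  CSN=: ''
  csnline readlines y
  ~. CSN
)
```

With the file F from Chris's example, getcsn2 F should give the same two boxed values as getcsn F. The callback returns an empty result on every call because u ;.2 assembles the per-line results into one array, and results of mixed types would fail there.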