Yoel Jacobsen wrote:
> I wrote some short sentences to parse a log file. I want to retrieve all
> the unique values of some attribute. The way it shows in the log file is
> <attribute name>SPACE<attribute value> such as "..... csn 92892849893284
> ..."
>
> My initial (brute force) program is:
>
> text =: 1!:1 < '/tmp/logfile'
> words =: cutopen text
> bv =: (<'csn') = words
> srbv =: _1 |.!.0 bv
> csns =: ~. srbv # words
>
> Now csns holds the unique values as requested.
>
> The program works fine for small files (few megabytes).

Probably the simplest way to handle this is to read the file in large
blocks and chop each block into lines. Since lines are of uneven length,
a block will usually not end on a line separator, so it needs to be
truncated at its last LF; the next read then starts just after that LF.
You don't need to memory map the file.

The following example assumes each line ends in LF. As written, it also
keeps only lines whose first word is csn, as in the test data below:

    getcsn=: 3 : 0
    siz=. fsize y
    blk=. 1e7                          NB. block size in bytes
    ptr=. 0                            NB. file offset of the next read
    res=. ''
    while. ptr < siz do.
      len=. blk <. siz - ptr
      dat=. fread y;ptr,len
      lfx=. 1 + dat i: LF              NB. length up to and including last LF
      ptr=. ptr + lfx
      dat=. <;._2 lfx {. dat           NB. box the complete lines
      key=. (dat i.&> ' ') {. each dat NB. first word of each line
      msk=. key = <'csn'
      res=. ~. res, msk # dat          NB. keep unique csn lines
    end.
    4 }. each res                      NB. drop the leading 'csn '
    )

    A=: 0 : 0
    abc qweqwe
    csn 1234
    def 123123
    csn 87654
    )

       A fwrites F=: jpath '~temp/t1.dat'
    41
       getcsn F
    +----+-----+
    |1234|87654|
    +----+-----+
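If csn can appear in the middle of a line, as the quoted "..... csn 92892849893284 ..." suggests, the per-line test could be swapped for something like the sketch below. The verb name csnvals is hypothetical and not part of the post above; it applies the shifted-mask idea from the quoted program to one line at a time, and assumes words are separated by spaces.

```j
NB. Hypothetical helper (not from the original post): boxed values that
NB. directly follow the word 'csn' anywhere in a single line y.
csnvals=: 3 : 0
w=. a: -.~ <;._1 ' ', y   NB. cut the line at spaces, dropping empty words
m=. w = <'csn'            NB. 1 where a word is exactly 'csn'
(_1 |.!.0 m) # w          NB. shift the mask right by one, keep the next word
)
```

For example, csnvals 'id 7 csn 1234 op add' should give the single box 1234. Inside getcsn, the key and msk lines would then go away, the accumulation would become something like res=. ~. res, ; csnvals each dat, and the final line would return res directly, since the 'csn ' prefix is no longer part of the stored values.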