Yoel Jacobsen wrote:
> I wrote some short sentences to parse a log file. I want to retrieve all
> the unique values of some attribute. The way it shows in the log file is
> <attribute name>SPACE<attribute value> such as
> "..... csn 92892849893284 ..."
> 
> My initial (brute force) program is:
> 
> text =: 1!:1 < '/tmp/logfile'
> words =: cutopen text
> bv =: (<'csn') = words
> srbv =: _1 |.!.0 bv
> csns =: ~. srbv # words
> 
> Now csns holds the unique values as requested.
> 
> The program works fine for small files (few megabytes).

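Probably the simplest way to handle this is to read the file in large
blocks and chop each block into lines. Since the lines are of uneven
length, a block will usually not end in a line separator, so it needs to
be truncated at its last line separator. The partial line that is cut
off is then read again as the start of the next block.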
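
As a small illustration of the truncation step, here is a sketch on a
made-up string (the sample data and the names blk, lfx, lines and rest
are only for this illustration). A block that ends in the middle of a
line can be cut back to its last LF like this:

```j
NB. a made-up block that ends in the middle of its third line
blk=: 'abc 1',LF,'csn 22',LF,'csn 3'
lfx=: 1 + blk i: LF        NB. number of characters up to and including the last LF
lines=: <;._2 lfx {. blk   NB. complete lines only, one per box
rest=: lfx }. blk          NB. partial line, left for the next block
```

Here lfx is 13, lines holds 'abc 1' and 'csn 22', and rest is 'csn 3'.
In the verb below, advancing the file pointer by lfx rather than by the
block length is what makes the partial line get read again.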

You don't need to memory map the file.

The following example assumes each line ends in LF and that the
attribute name is the first word on the line:

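getcsn=: 3 : 0
siz=. fsize y                        NB. file size in bytes
blk=. 1e7                            NB. block size, about 10MB per read
ptr=. 0                              NB. file offset of the next block
res=. ''
while. ptr < siz do.
  len=. blk <. siz - ptr             NB. do not read past the end of the file
  dat=. fread y;ptr,len
  lfx=. 1 + dat i: LF                NB. length up to and including the last LF
  ptr=. ptr + lfx                    NB. next block starts just after that LF
  dat=. <;._2 lfx {. dat             NB. drop the partial last line, box the rest
  key=. (dat i.&> ' ') {. each dat   NB. text before the first space in each line
  msk=. key = <'csn'
  res=. ~. res, msk # dat            NB. unique matching lines found so far
end.
4 }. each res                        NB. drop the leading 'csn '
)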
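
To see what the key-matching lines inside the loop do, the same steps
can be run on a few sample lines. This is only a sketch: the lines are
made up, and the names are global here so they can be inspected in the
session.

```j
dat=: 'abc qweqwe';'csn 1234';'csnx 9'   NB. made-up lines, already boxed
key=: (dat i.&> ' ') {. each dat         NB. 'abc';'csn';'csnx'
msk=: key = <'csn'                       NB. 0 1 0, only the exact key matches
vals=: 4 }. each msk # dat               NB. one box containing '1234'
```

Because the whole first word is compared, a line starting with csnx is
not picked up.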

A=: 0 : 0
abc qweqwe
csn 1234
def 123123
csn 87654
)

   A fwrites F=: jpath '~temp/t1.dat'
41

   getcsn F
+----+-----+
|1234|87654|
+----+-----+
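
The verb above keys on the first word of each line. The sample in the
original question ("..... csn 92892849893284 ...") suggests the
attribute may appear in mid-line. If so, something like the following
might be used instead. It is a sketch only: valafter is a hypothetical
helper name, and it assumes the words on a line are separated by spaces.

```j
NB. x valafter y: the word that follows the first occurrence of word x
NB. in line y, or '' if x is absent or is the last word (hypothetical helper)
valafter=: 4 : 0
w=. a: -.~ <;._2 y,' '     NB. cut the line into words, dropping empty ones
i=. w i. <x                NB. position of the key, or #w if absent
if. i < <: #w do. (>: i) {:: w else. '' end.
)

NB. made-up lines: unique values found after 'csn', empty results removed
vals=: ~. a: -.~ 'csn'&valafter each 'op=1 csn 928 x';'no key here';'csn 928'
```

Here vals is a single box containing '928'. Inside getcsn the same
expression, applied to dat, would take the place of the key and msk
lines, and the final 4 }. each would no longer be needed.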

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
