That is what I have tried to avoid. J is "marketed" in the documentation as a high-level (declarative?) language.
BTW, I did it with the following short (3 statements) Python program:

------------------------------
import re
f = open("/data/ddd")
csnre = re.compile(r"(?<= csn )\w+")
# One search per line; keep the word that follows " csn " when there is one.
s = set(match.group(0) for line in f
        for match in [csnre.search(line)] if match)
------------------------------

When it finishes, s contains the set of unique values.
The runtime on my laptop for a 1.2GB file was about 1.5 minutes.

Yoel

On 5/14/06, bill lam <[EMAIL PROTECTED]> wrote:
I meant a large file that can be read into memory. 5GB is too much for me.
Given that J is not lazy, I think your problem is difficult to realize
without a loop, in K&R C style, like

p=. 0
while. do.
  line=. ''
  NB. todo: detect EOF
  while. LF ~: c=. 1!:11 file;p,1 do.
    line=. line,c [ p=. >:p
  end.
  p=. >:p  NB. step past the LF
  NB. process one line
end.

Yoel Jacobsen wrote:
> Even for regex, I don't see how to avoid manually reading the file in
> chunks, which is too imperative a style for me. Again, consider the
> Python example:
>
> for line in file:
>     match_object = re.search(r"(?<= csn )\w+", line)
>     if match_object:
>         process(match_object.group(0))
>
> The regex can be precompiled as well.
>
> This works on a 5GB file as well as on small files, since iterating over
> the file takes care of reading it in chunks.
>
> Is there a way to do it in a concise way in J?
>
> Yoel
>
> On 5/14/06, bill lam <[EMAIL PROTECTED]> wrote:
>>
>> If the file is really large, I prefer regex instead.
>>
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm
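
For comparison, the manual "read in chunks" approach discussed above can be sketched in Python roughly as follows. This is only a sketch under assumptions: the function name unique_csn_values, the chunk size, and the sample data are made up for illustration and are not part of either program in this thread. It reuses the same lookbehind pattern and, like the programs above, takes only the first match on each line.

```python
import io
import re

# Same lookbehind pattern as in the thread: the word that follows " csn ".
CSN_RE = re.compile(r"(?<= csn )\w+")

def unique_csn_values(stream, chunk_size=1 << 20):
    """Collect the unique words following ' csn ', reading `stream` in chunks.

    `stream` is any text file object (hypothetical helper, for illustration).
    Only one chunk plus one unfinished line is held in memory at a time.
    """
    seen = set()
    tail = ""  # unfinished last line carried over from the previous chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty read means EOF
            break
        lines = (tail + chunk).split("\n")
        tail = lines.pop()  # the last piece may be an incomplete line
        for line in lines:
            match = CSN_RE.search(line)
            if match:
                seen.add(match.group(0))
    # The file may not end with a newline, so process what is left over.
    match = CSN_RE.search(tail)
    if match:
        seen.add(match.group(0))
    return seen

# Usage with made-up in-memory data and a deliberately tiny chunk size:
sample = io.StringIO("a csn 111 x\nno match here\nb csn 222 y\nc csn 111 z")
result = unique_csn_values(sample, chunk_size=8)
print(sorted(result))
```

Iterating over a file object in Python does this kind of buffering internally, which is why the short version does not need an explicit loop over chunks.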