Thanks for all of the responses. I am trying to test the solutions offered.

2009/8/28 Alex Rufon <[email protected]>:
>> I actually have a large file splitter in both perl and J - the J uses the bigfiles utilities and runs maybe 25% faster than the perl, but the perl was easier to write - and has better platform-independence than this piece of J code - because of the bigfiles wrinkle.
>
> Hehehe. The "easier to write" part is a bit relative. ;) I couldn't code in perl right now to save my life. LOL :D
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Devon McCormick
> Sent: Thursday, August 27, 2009 11:09 PM
> To: Programming forum
> Subject: Re: [Jprogramming] streaming through a large text file
>
> For large text files, I've found perl to be quite efficient. One advantage the Unix commands have over J for machine performance is that they are stream-oriented. An array-oriented language like J tends to want to have the whole array available at once.
>
> I actually have a large file splitter in both perl and J - the J uses the bigfiles utilities and runs maybe 25% faster than the perl, but the perl was easier to write - and has better platform-independence than this piece of J code - because of the bigfiles wrinkle.
>
> On Thu, Aug 27, 2009 at 6:17 AM, Matthew Brand <[email protected]> wrote:
>
>> I am using 64 bit linux so do not run into any file size issues. It appears that the whole file is read into memory (i.e. swap disk) before any operations are carried out. It might be more efficient to use mapped files.
>>
>> Splitting into many smaller files takes less time because at no point does the program have to use the swap disk. I agree that on a machine with much larger RAM it would probably not make a difference.
>>
>> I don't know the details, but I wonder how the unix gawk command manages to trundle through huge data files a line at a time seemingly efficiently. Could J do it in a similar way (whatever that is!)?
>>
>> 2009/8/27 R.E. Boss <[email protected]>:
>> > Link should be
>> > http://www.jsoftware.com/jwiki/Scripts/Working%20with%20Big%20Files
>> >
>> > R.E. Boss
>> >
>> >> -----Original Message-----
>> >> From: [email protected] [mailto:programming-[email protected]] On Behalf Of Devon McCormick
>> >> Sent: Thursday, 27 August 2009 3:56
>> >> To: Programming forum
>> >> Subject: Re: [Jprogramming] streaming through a large text file
>> >>
>> >> These could be made to work on files >4GB using the bigfiles code (Windows only) but they would have to be re-written to do that. You'd have to use "bixread" instead of 1!:11 and deal with extended integers - see http://www.jsoftware.com/jwiki/Scripts/Working with Big Files for more on this if you're interested.
>> >>
>> >> On Wed, Aug 26, 2009 at 6:42 PM, Sherlock, Ric <[email protected]> wrote:
>> >>
>> >> > Is the reason that fapplylines & freadblock don't work on files >4GB that a 32-bit system can't represent the index into the file as a 32-bit integer?
>> >> > In other words, they may well work OK on a 64-bit system?
>> >> >
>> >> > I think bigfiles.ijs is Windows only? If so, it would be an alternative if using a 32-bit Windows system, but it sounds like Matthew is on Linux.
>> >> >
>> >> > > From: Don Guinn
>> >> > >
>> >> > > Use bigfiles.ijs
>> >> > >
>> >> > > On Wed, Aug 26, 2009 at 4:09 PM, Devon McCormick wrote:
>> >> > >
>> >> > > > I thought I'd try this code but it doesn't work with very large files (>4 GB).
>> >> > > >
>> >> > > > On Wed, Aug 26, 2009 at 11:46 AM, R.E. Boss wrote:
>> >> > > >
>> >> > > > > > Chris Burke wrote:
>> >> > > > > > > Matthew Brand wrote:
>> >> > > > > > > Thanks for the links. I tried the fapplylines adverb but the computer ground along for 30 minutes or so before I pulled the plug. It ended up using 10Gb of (mainly virtual) memory. There are 40M lines in my file.
>> >> > > > > > >
>> >> > > > > > > I will use the unix split command to make lots of little files and (myverb fapplylines)&.> fname to solve the problem.
>> >> > > > > >
>> >> > > > > > There should be little difference between processing lots of small files and one big file in chunks.
>> >> > > > > >
>> >> > > > > > What processing is being done? What result is being accumulated?
>> >> > > > > >
>> >> > > > > > Why not test on a small file first and find out what is taking time - and only then try on the full file?
>> >> > > > >
>> >> > > > > My guess is we can improve the efficiency of your code by at least a factor of 2 (= Hui's constant).
>> >> > > > >
>> >> >
>> >> > ----------------------------------------------------------------------
>> >> > For information about J forums see http://www.jsoftware.com/forums.htm
>> >> >
>> >>
>> >> --
>> >> Devon McCormick, CFA
>> >> ^me^ at acm.org is my preferred e-mail
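Two ideas in the thread amount to the same technique: the line-at-a-time streaming Matthew asks about (gawk) and the "one big file in chunks" processing Chris Burke describes. The technique is roughly this:

1. Read a fixed-size block.
2. Hand off every complete line in that block.
3. Carry the trailing partial line into the next read.

Memory use then tracks the block size rather than the file size. Below is a minimal sketch of that idea. It is written in Python purely for illustration and is not J or the files addon code. The names iter_lines and apply_lines and the 1 MiB BLOCK_SIZE are hypothetical.

```python
# Sketch (illustrative, not J): stream a large text file line by line by
# reading fixed-size blocks, so memory use is bounded by the block size.
# BLOCK_SIZE, iter_lines and apply_lines are hypothetical names.
from typing import Callable, Iterator

BLOCK_SIZE = 1 << 20  # 1 MiB per read (an assumed value, not tuned)


def iter_lines(path: str, block_size: int = BLOCK_SIZE) -> Iterator[bytes]:
    """Yield each LF-separated line of the file at path, without the LF.

    CR characters (Windows line endings) are left in place.
    """
    carry = b""  # incomplete line left over from the previous block
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            pieces = (carry + block).split(b"\n")
            carry = pieces.pop()  # last piece may be an incomplete line
            yield from pieces
    if carry:
        yield carry  # final line that had no terminating LF


def apply_lines(fn: Callable[[bytes], None], path: str) -> None:
    """Apply fn to every line, loosely analogous to `myverb fapplylines`."""
    for line in iter_lines(path):
        fn(line)
```

For example, counting lines with apply_lines keeps only about one block plus one partial line in memory at a time. That holds only as long as the function applied to each line does not itself accumulate the lines. Chris's question about what result is being accumulated points at the same thing.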
