I am using 64-bit Linux, so I do not run into any file size issues. It
appears that the whole file is read into memory (and hence into swap)
before any operations are carried out. It might be more efficient to
use memory-mapped files.
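For what it's worth, here is a minimal sketch of the memory-mapped idea (in Python rather than J, just to illustrate the technique): the OS pages the file in on demand, so only the pages actually touched occupy physical memory, and nothing like a 10 GB copy is ever built in the process.

```python
import mmap

def count_lines_mapped(path):
    """Count lines by scanning a memory map; the file is never read
    into the process as one giant in-memory string."""
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            count = 0
            pos = mm.find(b"\n")
            while pos != -1:
                count += 1
                pos = mm.find(b"\n", pos + 1)
            return count
        finally:
            mm.close()
```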

Splitting the input into many smaller files takes less time because at
no point does the program have to touch swap. I agree that on a machine
with much more RAM it would probably make no difference.
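The split approach I have in mind looks roughly like this (the file name, line count, and per-piece command are placeholders; `seq` just manufactures a small stand-in for the real 40M-line file):

```shell
# Make a sample "big" file standing in for the real one.
seq 1 10 > bigfile.txt

# Split into 4-line pieces; output files are piece_aa, piece_ab, ...
split -l 4 bigfile.txt piece_

# Process each piece independently (wc -l stands in for the real
# per-file verb), then delete it so only one piece is live at a time.
for f in piece_*; do
    wc -l < "$f"
    rm "$f"
done
rm bigfile.txt
```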

I don't know the details, but I wonder how the Unix gawk command
manages to trundle through huge data files a line at a time, seemingly
efficiently. Could J do it in a similar way (whatever that is)?
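As I understand it, gawk stays in roughly constant memory by reading a fixed-size buffer at a time and carving complete lines out of it, carrying the unfinished tail over to the next read. A sketch of that technique (again Python, not J, purely to show the shape of it):

```python
def stream_lines(path, chunk_size=1 << 20):
    """Yield lines one at a time from fixed-size chunks.  Memory use
    is bounded by the chunk size plus the longest line, regardless of
    the total file size."""
    tail = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                if tail:
                    yield tail          # final line, no trailing newline
                return
            lines = (tail + chunk).split(b"\n")
            tail = lines.pop()          # incomplete last piece
            for line in lines:
                yield line

def longest_line_length(path):
    """Example of accumulating a result line by line, instead of
    holding all 40M lines in memory at once."""
    return max((len(line) for line in stream_lines(path)), default=0)
```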



2009/8/27 R.E. Boss <[email protected]>:
> Link should be
> http://www.jsoftware.com/jwiki/Scripts/Working%20with%20Big%20Files
>
>
> R.E. Boss
>
>
>> -----Original message-----
>> From: [email protected] [mailto:programming-
>> [email protected]] On behalf of Devon McCormick
>> Sent: Thursday 27 August 2009 3:56
>> To: Programming forum
>> Subject: Re: [Jprogramming] streaming through a large text file
>>
>> These could be made to work on files >4GB using the bigfiles code (Windows
>> only) but they would have to be re-written to do that.  You'd have to use
>> "bixread" instead of 1!:11 and deal with extended integers - see
>> http://www.jsoftware.com/jwiki/Scripts/Working with Big Files for more on
>> this if you're interested.
>>
>> On Wed, Aug 26, 2009 at 6:42 PM, Sherlock, Ric
>> <[email protected]>wrote:
>>
>> > Is the reason that fapplylines & freadblock doesn't work on files >4GB
>> > because a 32bit system can't represent the index into the file as an
>> 32bit
>> > integer?
>> > In other words they may well work OK on a 64bit system?
>> >
>> > I think bigfiles.ijs is Windows only? If so, it would be an alternative
>> if
>> > using a 32bit Windows system, but it sounds like Matthew is on Linux.
>> >
>> > > From: Don Guinn
>> > >
>> > > Use bigfiles.ijs
>> > >
>> > > On Wed, Aug 26, 2009 at 4:09 PM, Devon McCormick wrote:
>> > >
>> > > > I thought I'd try this code but it doesn't work with very large
>> files
>> > > (>4
>> > > > GB).
>> > > >
>> > > > On Wed, Aug 26, 2009 at 11:46 AM, R.E. Boss wrote:
>> > > >
>> > > > > > Chris Burke wrote:
>> > > > > > > Matthew Brand wrote:
>> > > > > > > Thanks for the links. I tried the fapplylines adverb but the
>> > > computer
>> > > > > > > grinds along for 30 minutes or so before I pulled the plug. It
>> > > ends
>> > > > up
>> > > > > > > using 10Gb of (mainly virtual) memory. There are 40M lines in
>> > > my
>> > > > file.
>> > > > > > >
>> > > > > > > I will use the unix split command to make lots of little files
>> > > and
>> > > > > > > (myverb fapplylines)&.> fname to solve the problem.
>> > > > > >
>> > > > > > There should be little difference between processing lots of
>> > > small
>> > > > > > files, and one big file in chunks.
>> > > > > >
>> > > > > > What processing is being done? What result is being accumulated?
>> > > > > >
>> > > > > > Why not test on a small file first and find out what is taking
>> > > time -
>> > > > > > and only then try on the full file?
>> > > > >
>> > > > >
>> > > > > My guess is we can improve the efficiency of your code by at least
>> > > a
>> > > > factor
>> > > > > 2 (= Hui's constant).
>> > > > >
>> >
>> > ----------------------------------------------------------------------
>> > For information about J forums see http://www.jsoftware.com/forums.htm
>> >
>>
>>
>>
>> --
>> Devon McCormick, CFA
>> ^me^ at acm.
>> org is my
>> preferred e-mail
>
>