Thanks for all of the responses. I am trying to test the solutions offered.
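For anyone following along, the split-then-stream approach discussed in the thread can be sketched in a few lines of shell. This is only a minimal illustration with made-up file names; gawk is written as plain awk here for portability, and the "big" file is a small stand-in:

```shell
# Stand-in for the large file: 100000 numbered lines.
seq 100000 > big.txt

# 1. Split into fixed-size pieces so no single step needs much memory.
split -l 20000 big.txt chunk_        # -> chunk_aa, chunk_ab, ...

# 2. Stream each piece a line at a time; awk reads line by line and
#    never holds the whole file, which is why it avoids the swap disk.
total=0
for f in chunk_*; do
  n=$(awk 'END { print NR }' "$f")   # per-chunk line count
  total=$((total + n))
done
echo "$total"                        # prints 100000, the original count

rm -f big.txt chunk_*
```

The same loop body is where a per-line verb (or any other accumulation) would go; the point is only that each pass touches one bounded chunk.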



2009/8/28 Alex Rufon <[email protected]>:
>> I actually have a large file splitter in both perl and J - the J uses the
>> bigfiles utilities and runs maybe 25% faster than the perl, but the perl was
>> easier to write - and has better platform-independence than this piece of J
>> code - because of the bigfiles wrinkle.
>
> Hehehe. The "easier to write" part is a bit relative. ;) I couldn't code in
> perl right now to save my life. LOL :D
>
> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Devon McCormick
> Sent: Thursday, August 27, 2009 11:09 PM
> To: Programming forum
> Subject: Re: [Jprogramming] streaming through a large text file
>
> For large text files, I've found perl to be quite efficient.  One advantage
> the Unix commands have over J for machine performance is that they are
> stream-oriented.  An array-oriented language like J tends to want to have
> the whole array available at once.
>
> I actually have a large file splitter in both perl and J - the J uses the
> bigfiles utilities and runs maybe 25% faster than the perl, but the perl was
> easier to write - and has better platform-independence than this piece of J
> code - because of the bigfiles wrinkle.
>
> On Thu, Aug 27, 2009 at 6:17 AM, Matthew Brand
> <[email protected]> wrote:
>
>> I am using 64 bit linux so do not run into any file size issues. It
>> appears that the whole file is read into memory (i.e. swap disk)
>> before any operations are carried out. It might be more efficient to
>> use memory-mapped files.
>>
>> Splitting into many smaller files takes less time because at no point
>> does the program have to use the swap disk. I agree that on a machine
>> with much more RAM it would probably not make a difference.
>>
>> I don't know the details, but I wonder how the unix gawk command
>> manages to trundle through huge data files a line at a time seemingly
>> efficiently. Could J do it in a similar way (whatever that is!)?
>>
>>
>>
>> 2009/8/27 R.E. Boss <[email protected]>:
>> > Link should be
>> > http://www.jsoftware.com/jwiki/Scripts/Working%20with%20Big%20Files
>> >
>> >
>> > R.E. Boss
>> >
>> >
>> >> -----Original Message-----
>> >> From: [email protected] [mailto:programming-
>> >> [email protected]] On Behalf Of Devon McCormick
>> >> Sent: Thursday, 27 August 2009 3:56
>> >> To: Programming forum
>> >> Subject: Re: [Jprogramming] streaming through a large text file
>> >>
>> >> These could be made to work on files >4GB using the bigfiles code
>> >> (Windows only) but they would have to be re-written to do that. You'd
>> >> have to use "bixread" instead of 1!:11 and deal with extended integers -
>> >> see http://www.jsoftware.com/jwiki/Scripts/Working with Big Files for
>> >> more on this if you're interested.
>> >>
>> >> On Wed, Aug 26, 2009 at 6:42 PM, Sherlock, Ric
>> >> <[email protected]> wrote:
>> >>
>> >> > Is the reason that fapplylines & freadblock don't work on files >4GB
>> >> > because a 32-bit system can't represent the index into the file as a
>> >> > 32-bit integer?
>> >> > In other words, they may well work OK on a 64-bit system?
>> >> >
>> >> > I think bigfiles.ijs is Windows-only? If so it would be an alternative
>> >> > if using a 32-bit Windows system, but it sounds like Matthew is on
>> >> > Linux.
>> >> >
>> >> > > From: Don Guinn
>> >> > >
>> >> > > Use bigfiles.ijs
>> >> > >
>> >> > > On Wed, Aug 26, 2009 at 4:09 PM, Devon McCormick wrote:
>> >> > >
>> >> > > > I thought I'd try this code but it doesn't work with very large
>> >> > > > files (>4 GB).
>> >> > > >
>> >> > > > On Wed, Aug 26, 2009 at 11:46 AM, R.E. Boss wrote:
>> >> > > >
>> >> > > > > > Chris Burke wrote:
>> >> > > > > > > Matthew Brand wrote:
>> >> > > > > > > Thanks for the links. I tried the fapplylines adverb but the
>> >> > > > > > > computer grinds along for 30 minutes or so before I pulled
>> >> > > > > > > the plug. It ends up using 10 GB of (mainly virtual) memory.
>> >> > > > > > > There are 40M lines in my file.
>> >> > > > > > >
>> >> > > > > > > I will use the unix split command to make lots of little
>> >> > > > > > > files and (myverb fapplylines)&.> fname to solve the problem.
>> >> > > > > >
>> >> > > > > > There should be little difference between processing lots of
>> >> > > > > > small files and one big file in chunks.
>> >> > > > > >
>> >> > > > > > What processing is being done? What result is being
>> >> > > > > > accumulated?
>> >> > > > > >
>> >> > > > > > Why not test on a small file first and find out what is taking
>> >> > > > > > time, and only then try on the full file?
>> >> > > > >
>> >> > > > >
>> >> > > > > My guess is we can improve the efficiency of your code by at
>> >> > > > > least a factor of 2 (= Hui's constant).
>> >> > > > >
>> >> >
>> >> > ----------------------------------------------------------------------
>> >> > For information about J forums see http://www.jsoftware.com/forums.htm
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Devon McCormick, CFA
>> >> ^me^ at acm.org is my preferred e-mail
>> >
>> >
>>
>
>
>
>
