Matthew Brand wrote:
> I have a very large data file which contains comma-separated values (some
> columns are non-numeric data). It is so big that doing anything with it
> apart from very basic things in 32-bit seems to run out of memory:
> 
> JCHAR map_jmf_ 'text';textfile
>    $text
> 134125010
> 
> It contains many "missing values" that are denoted by two commas with
> nothing in between. I want to read in the file and output a new one with
> zeroes in between the two commas.
> 
...

> Does anyone know a faster algorithm to do this on such a large file? Can it
> be done in a 32-bit address space? The problem can be solved by streaming
> through the data in C++, but I want to know how to do it in J efficiently
> without using explicit loops.

It can be done reasonably efficiently in J, but with loops. There is no
harm in looping when you have to.

For reading through large files in blocks, use freadblock from the files
script. For example:

load 'files'

fixcsv=: 3 : 0
'infile outfile'=. y
'' fwrite outfile    NB. start with an empty output file
ptr=. 0
while.
  'dat ptr'=. freadblock infile;ptr
  #dat               NB. continue while the block just read is non-empty
do.
  (fixcsv1 dat) fappend outfile
end.
)

NB. fix one block: insert a 0 into each empty field
fixcsv1=: 3 : 0
dat=. (y e. ',',LF) <;.2 y    NB. cut into segments, each ending in , or LF
ndx=. I. 1=#&> dat            NB. length-1 segments are bare delimiters, i.e. empty fields
;('0' ,each ndx{dat) ndx}dat  NB. prefix those with 0 and reassemble
)
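For readers more at home outside J, here is a sketch of the same per-block fix in Python. The name fixcsv1 is carried over from the J verb above; the regular expression mirrors the <;.2 cut by keeping each delimiter attached to the end of its field, so a length-1 segment is a bare delimiter, i.e. an empty field.

```python
import re

def fixcsv1(s):
    """Python mirror of the J verb fixcsv1: cut s into segments that
    each end with a delimiter (',' or LF), then prefix '0' to every
    segment that is a bare delimiter, i.e. an empty field."""
    # Keep each delimiter attached to the preceding field, like <;.2;
    # the second alternative keeps any trailing text with no delimiter.
    segs = re.findall(r'[^,\n]*[,\n]|[^,\n]+$', s)
    return ''.join('0' + g if len(g) == 1 and g in (',', '\n') else g
                   for g in segs)
```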

   F1=: jpath '~temp/t1.dat'
   F2=: jpath '~temp/t2.dat'
   ('0,,34567,,abcd,,efg',LF,',jkl,,,,,') fwrites F1
   fixcsv F1;F2
   freads F2
0,0,34567,0,abcd,0,efg
0,jkl,0,0,0,0,0
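Since the original question mentioned a streaming solution in C++, a line-by-line equivalent is easy to sketch in Python as well (the name fixcsv is reused from the J verb above purely for illustration; like the J version, it also writes a 0 into empty leading and trailing fields):

```python
def fixcsv(infile, outfile):
    """Stream infile to outfile one line at a time, writing '0' into
    every empty comma-separated field; memory use stays bounded no
    matter how large the file is."""
    with open(infile) as src, open(outfile, 'w') as dst:
        for line in src:
            fields = line.rstrip('\n').split(',')
            dst.write(','.join(f if f else '0' for f in fields) + '\n')
```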


----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
