Re: [Jprogramming] Insert zeroes into large data file

Oleg Kobchenko Tue, 29 Jul 2008 17:33:52 -0700

Mapped files are still an interesting fit for this problem.
However, not every J operation makes use of the mapped space
offered, so they should be used with caution.


Here's a process that does it: a very precise sequence
of steps with mapped files. Note the memory peaks, read from Task Manager.
Temporary memory is also used in many steps (where seemingly
the provided mapped space could be used instead), so "out of memory"
can occur if more than one large temporary is allocated at once.
To prevent this, each operation here is broken into small
assigned steps.

NB. =========================================================
load'jmf files'

nsizes=: (,. <@(7!:5) ,. <@(4!:0))@nl

Note 'interactive test' NB. replace ",," with ",0,"
NB. create dummy data
  N=. 134125010
  f1=. N$'qq,,zz,,125',LF     NB. peak:    139 Mb
  f1 1!:2 <jpath'~temp/f1'
  erase <'f1'

NB. actual process
  JCHAR map_jmf_ 'f1';jpath'~temp/f1'
  $f1
  ]P=. ',,' +/@E. f1
  createjmf_jmf_ (jpath'~temp/f2');4*1+N
  (fsize jpath'~temp/f2'),N
  map_jmf_ 'f2';jpath'~temp/f2'
  f2=: ',,' E. f1              NB. peak:   402 Mb
  f2=: f2+1                    NB. peak: 1,190 Mb
  f2=: 0,f2                    NB. ]
  f2=: +/\f2                   NB. ] here large tmp mem
  f2=: }:f2                    NB. ]
  createjmf_jmf_ (jpath'~temp/f3');N+P
  map_jmf_ 'f3';jpath'~temp/f3'
  f3=. (N+P)$'0'
  f3=. f1 f2}f3                NB. peak: 1,343 Mb
  f3 1!:2 <jpath'~temp/f4'

  nsizes''
  unmapall_jmf_''
)
NB. =========================================================
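
The index arithmetic in the steps above (f2 holds, for every character
of f1, its position in f3) can be checked on the short string from the
original question before running it on the big file. This is only a
sketch without mapped files; the names text, mask, idx and out are
made up for the illustration and are not part of the process above.

```j
NB. Small in-memory version of the scatter steps (no mapped files).
NB. text, mask, idx and out are illustrative names only.
text=: '0,,34567,,abcd,,efg'
mask=: ',,' E. text              NB. 1 at the first comma of each ",," pair
idx=: }: +/\ 0 , 1 + mask        NB. target position of every input character
out=: text idx} ((#text) + +/mask) $ '0'   NB. scatter text into a field of '0'
out
```

Each character moves right by the number of ",," pairs that start before
it, so the positions left untouched by the amend are exactly the gaps
between the commas, and they keep the '0' fill.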

     nsizes''
+------+---------+-+
|N     |64       |0|
+------+---------+-+
|P     |64       |0|
+------+---------+-+
|f1    |1.34218e8|0|
+------+---------+-+
|f2    |5.36871e8|0|
+------+---------+-+
|f3    |2.68435e8|0|
+------+---------+-+
|nsizes|1088     |3|
+------+---------+-+

   load'dir'
   dir jpath'~temp/*.'
f1          134125010 29-Jul-08 20:08:22
f2          536500328 29-Jul-08 20:08:23
f3          156479462 29-Jul-08 20:08:28
f4          156479178 29-Jul-08 20:08:34
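
The sizes in the listing can be cross-checked against N and P with a
little arithmetic. This is a sketch under one assumption: the 284-byte
difference between f3 and f4 is taken to be the header that the mapped
file carries; that figure is inferred from the listing, not from the
jmf documentation. The name hdr is made up for the illustration.

```j
NB. Cross-check of the file sizes shown by dir.
N=: 134125010                 NB. size of f1 in bytes
P=: 156479178 - N             NB. number of ",," pairs, from size of f4 = N+P
hdr=: 156479462 - 156479178   NB. f3 minus f4; assumed to be the jmf header
(hdr + 4*1+N) , (hdr + N+P)   NB. compare with the sizes of f2 and f3
P , 2 * <. N % 12             NB. two pairs per 12-character pattern
```

The last line relies on the dummy data: the pattern 'qq,,zz,,125',LF is
12 characters long and contains two ",," pairs, and the 2 leftover
characters at the end of f1 contain none.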


> From: Matthew Brand <[EMAIL PROTECTED]>
> 
> Hi All,
> 
> I have a very large data file which contains comma separated values (some
> columns are non numeric data). It is so big that doing anything with it
> apart from very basic things in 32-bit seems to run out of memory:
> 
> JCHAR map_jmf_ 'text';textfile
>    $text
> 134125010
> 
> It contains many "missing values" that are denoted by two commas with
> nothing in between. I want to read in the file and output a new one with
> zeroes in between the two commas.
> 
> E.g. if the input file is:
> text =.'0,,34567,,abcd,,efg'
> 
> then the output should be:
> '0,0,34567,0,abcd,0,efg'
> 
> I am really struggling with this.  I have managed to get on a 64 bit XP
> machine to overcome the out-of-memory errors, but everything I come up with
> is so slow that I just kill the process and have a rethink.
> 
> This is what I have come up with so far:
>     text =.'0,,34567,,abcd,,efg'
>    commas =. ( [: I. ',,'&E.  ) text
>    start =. 0, commas + 1
>    end =.  (<: # text) _1} 1 |. <: start
>    grab =. ( [: < {. + [: i. [: >: }.-{. )"_ 1 start,.end
>    output =. ; ( '0' ,~ [: ] {&text )&.> grab
> 
> But if I run it on the read file it is too slow.
> 
> I also came up with this:
>     ; 2 ([: < ((0&{)`(',0'"_))@.(([: ','&= 0&{ ) *. ([: ','&= _1&{ )))\
> '0,0,34567,0,abcd,0,efg'
> 0,0,34567,0,abcd,0,ef
> 
> but again it is too slow on the massive file (last character missed, but
> easy to fix that) and runs out of memory on a 32-bit machine.
> 
> Does anyone know a faster algorithm to do this on such a large file? Can it
> be done in a 32-bit address space?  The problem can be solved by streaming
> through the data in C++, but I want to know how to do it in J efficiently
> without using explicit loops.
> 
> Thanks,
> Matthew.
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm



      