For comparison, it can be done in 43 seconds using C#:

            DateTime t = DateTime.Now;

            // verbatim strings (@"...") take single backslashes
            StreamReader sr = new StreamReader(@"C:\input.csv");
            StreamWriter sw = new StreamWriter(@"C:\output.csv");
            string s = "";
            string[] slt = null;
            string news = "";

            while (!sr.EndOfStream)
            {
                s = sr.ReadLine();
                news = "";
                slt = s.Split(',');
                foreach (string p in slt)
                {
                    if (p == "")
                        news += "0";     // replace empty field with 0
                    else
                        news += p;
                    news += ",";
                }
                news = news.Remove(news.Length - 1);  // drop trailing comma
                sw.WriteLine(news);
            }

            sw.Close();
            sr.Close();
            // TotalSeconds, not Seconds: Seconds wraps at 60
            MessageBox.Show("Took " + DateTime.Now.Subtract(t).TotalSeconds + " seconds");

I think the main problem in J is running out of RAM during the process. I
tried the (#!.'0'~ 1 j. ',,' E. ]) solution on a 4 GB 64-bit XP machine and
it was very slow; Task Manager showed it starting to use the page file, at
which point I stopped the process. I think that given enough RAM J would do
it quickly with (#!.'0'~ 1 j. ',,' E. ]) ... but what about a 1 GB file, or
a 10 GB file?
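For what it's worth, the constant-memory version of this job is straightforward in any language with streaming I/O. A minimal Python sketch (not J; file names and the function name are placeholders) that does the same field-filling as the C# loop above:

```python
def fill_empty_fields(src, dst):
    """Copy a CSV file, replacing empty fields with '0'.

    Reads one line at a time, so memory use stays constant
    regardless of file size.
    """
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fields = line.rstrip("\n").split(",")
            fout.write(",".join(f if f != "" else "0" for f in fields) + "\n")
```

Because nothing larger than one line is ever held in memory, a 10 GB file poses no more of a problem than a 138 MB one.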

I don't think that 138 MB is considered very large these days. Intraday
trading data or the output of global climate models can easily be larger
than this.

I was wondering whether it is possible to write an adverb that automatically
splits whatever comes into it on the basis of available memory, or to make
some kind of chunkification happen automatically... but I don't think it
would be easy to do. One might supply a set of assertions to guide the
splitting process, and use information about the RAM size to determine
whether splitting is necessary and how many splits to do.
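As a sketch of what such automatic chunking might look like outside J, here is a hypothetical Python generator (all names are mine, not part of any library) that reads fixed-size blocks — in practice the size would be derived from available RAM — and extends each block to the next newline so that no record is ever split across chunks:

```python
def chunks(path, chunk_size=1 << 20):
    """Yield the file in blocks of roughly chunk_size bytes,
    each cut at a line boundary so records stay intact."""
    with open(path, "rb") as f:
        carry = b""
        while True:
            block = f.read(chunk_size)
            if not block:
                if carry:          # trailing bytes with no final newline
                    yield carry
                return
            block = carry + block
            cut = block.rfind(b"\n") + 1
            if cut == 0:           # no newline in this block yet: keep reading
                carry = block
            else:
                carry = block[cut:]
                yield block[:cut]
```

Each yielded chunk can then be processed independently, which is exactly the property the ;.2-based chunkify method below relies on.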

The chunkify method takes 86 seconds, which is good enough for me at the
moment...

   require'jmf'        NB. map file utilities loaded in jmf
   require'files dir'
   textfile =. 'C:\input.csv'
   JCHAR map_jmf_ 'bigtext';textfile
   chunk_idx     =. (i.@:<.&.(%&chunk_size =: 10000))@:#
   chunkify_mask =. (($@:[ $ 0"_) (1"_)`]`[} _1 , ] + ',,' -:"1 ({~ (,. >:))) chunk_idx
   null2zero     =. #!.'0'~ 1 j.',,' E. ]
   ts=: 6!:2 , 7!:2@]       NB. time and space used
   ts ' ((;@:(<@:null2zero;.2)~ chunkify_mask) bigtext) 1!:2 <''C:\output.csv'' '
85.6266 5.38447e8
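The null2zero verb inserts a '0' between every pair of adjacent commas (the overlapping matches of ',,' E. ] make runs like ',,,' come out as ',0,0,'). For readers less familiar with J, a rough Python equivalent of that single transformation — an illustration, not the author's code — would be:

```python
import re

def null2zero(line):
    # Insert '0' between each pair of adjacent commas. The lookahead
    # (?=,) leaves the second comma in place, so overlapping runs
    # like ',,,' become ',0,0,'. Like the J verb, this deliberately
    # ignores empty fields at the start or end of the line.
    return re.sub(r",(?=,)", ",0", line)
```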

 It would be nice if this chunkification just happened in the background,
though, and all you needed to write were:

   require'jmf'        NB. map file utilities loaded in jmf
   require'files dir'
   textfile =. 'C:\input.csv'
   JCHAR map_jmf_ 'bigtext';textfile

   (#!.'0'~ 1 j.',,' E. ]) MEMHANDLE (assert1`assert2`...) bigtext

Where MEMHANDLE is a conjunction which manages the splitting, and
assert1, ... are conditions that must hold true in each split.
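To make the proposed interface concrete, here is a hypothetical Python analogue of that MEMHANDLE idea (everything here is invented for illustration): a higher-order function that applies a verb to each chunk after checking the user-supplied assertions against it:

```python
def memhandle(fn, asserts=()):
    """Return a function that maps fn over an iterable of chunks,
    checking each user-supplied predicate on every chunk first."""
    def run(chunks):
        for chunk in chunks:
            for check in asserts:
                if not check(chunk):
                    raise AssertionError("chunk failed %r" % check)
            yield fn(chunk)
    return run
```

Usage would look something like `list(memhandle(str.upper, (lambda c: len(c) > 0,))(["ab", "cd"]))`, with the chunking itself supplied by whatever splitter the conjunction chooses.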
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm