Re: [Jprogramming] horrible performance regarding loading saved data into a sparse array

Ric Sherlock Sun, 13 Nov 2011 00:58:28 -0800

Below is one way to save and restore a sparse matrix in the format you used. I :
   ]A=: 10 10 ?.@$ 3
0 2 1 0 2 1 1 1 2 1
0 2 1 2 1 2 0 1 1 2
2 0 1 1 0 1 0 2 1 0
2 0 1 0 2 2 0 1 2 0
1 2 0 2 2 0 0 1 1 0
1 0 1 0 0 0 1 2 1 2
1 0 1 0 2 1 1 1 1 0
1 0 2 1 2 2 1 2 0 2
2 2 0 0 2 1 2 1 2 2
1 0 0 0 0 1 2 2 1 1
   B=: $. A                              NB. make matrix A sparse
   'test.sparse' fwrites~ ;,.&LF ": (4&$. ,. 5&$.) B  NB. write
indicies and value to LF-delimited file
414
   tmp=: _". ];._2 freads 'test.sparse'  NB. read back in and convert
to numeric array
   C=: $. 10 10 $ 0                      NB. initialise empty sparse matrix C
   B -: ({:"1 tmp) (<"1 }:"1 tmp)} C     NB. populate C and check it matches B
1



On Sat, Nov 12, 2011 at 10:08 PM, Nick Simicich <[email protected]> wrote:
> I have a very large sparse array that only has a tiny amount of hard to
> calculate actual data in it.  I decided to make verbs to backup the data in
> the array to a file on disk, so that I can restore the data and so that I
> am protected, at least partially, from power failures and so forth.  I have
> working code, but my first pass at the restore code performed very poorly.
>  I created another version of the code where performance is consistent with
> the same sort of operation running against a non-sparse array.
>
> The operation in question was the modification of an element in a sparse
> array, in a loop.
>
> The fix was to do all modifications in a single statement, rather than a
> loop. This was possible in this case.
>
> I am running it under Engine: j701/2011-01-10/11:25
> Library: 7.01.050
> Platform: Win 64
> Installer: j701a_win64.exe
> InstallPath: c:/program files (x86)/j64-701
>
> The backup program runs very quickly, under .2 sec:
>
> e85memobackup =: verb define
> format =. ( ": (x: 4 $. e85memo),.x: 5 $. e85memo)
> ((":$format),';',,format) (1!:2)
> <'\Users\Nick\j64-602-user\temp\e85backup.txt'
> 2 0 $ 0
> )
>
> I think that the backup program is about as good as I can make it.  I
> elected to use character data, but converting from the character data back
> to a numeric matrix goes very fast.
>
> e85memorestoreold =: verb define
> e85memo=: 1$.160000 160000;0 1;_
> t =. 1!:1 <'\Users\Nick\j64-602-user\temp\e85backup.txt'
> NB. the semicolon separates the shape from the bulk of the backup.
> split =. I. t=';'
> NB. the first part of the file
> s =. ". split {. t
> NB. is the shape of the rest of the file.
> mat =. ".s$(>:split)}.t
>
> for_line. mat do.
>  e85memo=: (2{line) (<0 1{line) } e85memo
>  NB. e85memo =: (2{"1 mat) (<"1 ] 0 1{"1 mat) } e85memo
> end.
> 2 0 $ 0
> )
>
> The restore takes quite a while to run.  Execution time up to where the
> "for." loop starts is no more than a fraction of a second.  The mat array
> takes a moment to build, and is 32027 3 in shape.
>
> Loading the data into the e85memo array takes 1 minute, 55 seconds. I can't
> test this with a full sized ordinary array, because, of course, attempting
> to execute a statement like
>
> e85memo =: 160000 160000 $ _
>
> gives me an "out of memory" instantly.
>
> Now I then thought, OK, I can do the same number of restore operations into
> a much smaller array:
>
> e85memorestorefake =: verb define
> e85memox=: 4000 4000 $_
> t =. 1!:1 <'\Users\Nick\j64-602-user\temp\e85backup.txt'
> split =. I. t=';'
> s =. ". split {. t
> mat =. ".s$(>:split)}.t
> for_line. mat do.
> e85memox=: (2{line) (<(4000|0 1{line)) } e85memox
> end.
> 0 2 $ 0
> )
>
> Same input data file, same number of insertion operations into the matrix,
> this program takes a quarter second to run.
>
> So, my presumption is that any single insertion operation into a sparse
> array takes a very long time (by computer standards).
>
> The other day, I wrote a very poor question which came down to, "how do you
> make many changes to an array at once?" but the question was so poorly
> worded that no one understood it.
>
> I still could not find it in the doc (sometimes I find part of the doc to
> be opaque) but I did a bunch more experimentation, and determined that this
> would work:
>
> e85memorestore =: verb define
> e85memo=: 1$.160000 160000;0 1;_
> t =. 1!:1 <'\Users\Nick\j64-602-user\temp\e85backup.txt'
> split =. I. t=';'
> s =. ". split {. t
> mat =. ".s$(>:split)}.t
> e85memo =: (2{"1 mat) (<"1 ] 0 1{"1 mat) } e85memo
> 0 2 $ 0
> )
>
> This program runs in approximately 0.08 seconds.
>
> I believe that this proves that it produces the same results:
>
>   load 'c:/users/nick/j602-user/temp/e85x.ijs'
> NB. This is the old loopy copy of the program
>   time 'e85memorestoreold i.0'
> execution of e85memorestoreold i.0
> took 0 h 1 m 54.489000 sec wall time 0 h 1 m 54.486888 s cpu
>
>   e85keep =. e85memo
>   e85keep -: e85memo
> 1
> NB. Loop replaced as above
>   time 'e85memorestore i.0'
> execution of e85memorestore i.0
> took 0 h 0 m 0.060000 sec wall time 0 h 0 m 0.059890 s cpu
>
>   e85keep -: e85memo
> 1
>
> Is it something about how I allocated my sparse array?  I don't think I've
> made any outright mistakes...I just didn't know how to make the multiple
> array modifications until minutes ago...and it would never have occurred to
> me that it would cause this much performance change - it does not cause
> this sort of problems with a non-sparse array.
>
> Musing:  (Skip this if my style irritates you, sorry)
>
> So this was another one of those that started out as a question.  Now that
> I know enough J to get sparse arrays to actually work, most of the time, I
> have found them to be a really great tool.  In this case, for example, I
> have a very large array of numeric values that I can use directly as an
> index into the array to memoize an expensive process. But if, for example,
> I had to make many insertions into the array in a small loop - like if
> those 32000 items were calculated over a few seconds instead of a large
> number of hours, making all those separate insertions would be very
> expensive, I might have to do something else.  But if I can make all of the
> insertions at once, the time is competitive with doing the same number of
> insertions into a non-sparse array.
>
> Of course, if I could calculate those 32000 answers in a few seconds, it
> might not be as important to memoize the routine as it is.
>
> I think that my testing has eliminated everything other than the insertions
> into the sparse array.  One thing that I could do, if appropriate, would be
> to save up a bunch of insertions and to do them when I had a few hundred or
> thousand, with a lookup that first looked in the sparse array and then
> searched for the index in the list of "to be inserted" items.  This is a
> pathological case - if I was doing insertions in one pass and lookups in
> another, I would definitely just put the insertions into a queue and then
> insert the data into the sparse array in one operation.  In my case, I've
> run this routine so many times that even if I ended up just saving the
> insertions into a table and then doing the insertions into the sparse array
> when I was done so that I didn't have to redo the calculations for the next
> run, that would work...at 0.004 sec/insertion, it does not hurt that much
> until you have to do thousands back to back.
>
> --
> Of course I can ride in the carpool lane, officer.  Jesus is my constant
> companion.
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
>
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Re: [Jprogramming] horrible performance regarding loading saved data into a sparse array

Reply via email to