I have a very large sparse array that contains only a tiny amount of
hard-to-calculate actual data.  I decided to make verbs to back up the data
in the array to a file on disk, so that I can restore the data and so that I
am protected, at least partially, from power failures and so forth.  I have
working code, but my first pass at the restore code performed very poorly.
I created another version of the code whose performance is consistent with
the same sort of operation running against a non-sparse array.

The operation in question was the modification of an element in a sparse
array, in a loop.

The fix was to do all modifications in a single statement, rather than a
loop. This was possible in this case.
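
As a minimal sketch of the difference (the shape, indices, and values here
are made up, not my real data), the single-statement form amends a sparse
array through } with one boxed index per new value:

   s =. 1 $. 10 10; 0 1; _          NB. tiny sparse array, fill _
   ixs =. 3 2 $ 1 1  2 5  7 3       NB. made-up row,col index pairs
   vals =. 10 20 30                 NB. made-up values, one per index pair
   s =. vals (<"1 ixs) } s          NB. all three amends in one statement

The loop versions further down do the same thing one index/value pair at a
time.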

I am running it under Engine: j701/2011-01-10/11:25
Library: 7.01.050
Platform: Win 64
Installer: j701a_win64.exe
InstallPath: c:/program files (x86)/j64-701

The backup program runs very quickly, in under 0.2 seconds:

e85memobackup =: verb define
NB. 4 $. gives the index matrix of the stored elements; 5 $. their values
format =. ": (x: 4 $. e85memo) ,. x: 5 $. e85memo
NB. write the shape, a ';' separator, and the flattened character data
((":$format),';',,format) 1!:2 <'\Users\Nick\j64-602-user\temp\e85backup.txt'
2 0 $ 0
)

I think that the backup program is about as good as I can make it.  I
elected to use character data, but converting from the character data back
to a numeric matrix goes very fast.
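
For example (with a small made-up integer matrix standing in for the real
index/value data), the round trip through ": and ". is exact, since ".
applies to each row of a character matrix:

   m =. 3 3 $ 5 7 21  0 1 3  12 0 1   NB. made-up row,col,value triples
   c =. ": m                          NB. format to a character matrix
   m -: ". c                          NB. execute each row; round trip matches
1

Here is my original, loopy restore verb: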

e85memorestoreold =: verb define
e85memo=: 1 $. 160000 160000; 0 1; _
t =. 1!:1 <'\Users\Nick\j64-602-user\temp\e85backup.txt'
NB. the semicolon separates the shape from the bulk of the backup;
NB. the first part of the file is the shape of the rest of the file.
split =. I. t=';'
s =. ". split {. t
mat =. ". s $ (>:split) }. t

for_line. mat do.
  NB. one amend per row of mat: value from column 2, index from columns 0 1
  e85memo=: (2{line) (<0 1{line) } e85memo
  NB. e85memo =: (2{"1 mat) (<"1 ] 0 1{"1 mat) } e85memo
end.
2 0 $ 0
)

The restore takes quite a while to run.  Execution time up to where the
"for." loop starts is no more than a fraction of a second.  The mat array
takes a moment to build, and is 32027 3 in shape.

Loading the data into the e85memo array takes 1 minute, 55 seconds. I can't
test this with a full-sized ordinary array because, of course, attempting
to execute a statement like

e85memo =: 160000 160000 $ _

gives me an "out of memory" instantly.
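
Which is no surprise; a quick count of the bytes a dense array that size
would need (the 8x just keeps the count exact):

   160000 * 160000 * 8x   NB. elements times 8 bytes per float, about 205 GB
204800000000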

Then I thought: OK, I can do the same number of restore operations into a
much smaller, non-sparse array:

e85memorestorefake =: verb define
e85memox=: 4000 4000 $ _
t =. 1!:1 <'\Users\Nick\j64-602-user\temp\e85backup.txt'
split =. I. t=';'
s =. ". split {. t
mat =. ". s $ (>:split) }. t
for_line. mat do.
  NB. fold the 160000-range indices into the smaller array with residue
  e85memox=: (2{line) (<(4000 | 0 1{line)) } e85memox
end.
0 2 $ 0
)

Same input data file, same number of insertion operations into the matrix -
and this program takes a quarter of a second to run.

So, my presumption is that any single insertion operation into a sparse
array takes a very long time (by computer standards).
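
Dividing the loop version's time by the number of rows gives a ballpark cost
per amend:

   115 % 32027            NB. roughly 1 minute 55 seconds over 32027 rows
0.00359072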

The other day, I wrote a very poor question which came down to, "how do you
make many changes to an array at once?", but the question was so poorly
worded that no one understood it.

I still could not find the answer in the documentation (sometimes I find
parts of the doc to be opaque), but I did a bunch more experimentation and
determined that this would work:

e85memorestore =: verb define
e85memo=: 1 $. 160000 160000; 0 1; _
t =. 1!:1 <'\Users\Nick\j64-602-user\temp\e85backup.txt'
split =. I. t=';'
s =. ". split {. t
mat =. ". s $ (>:split) }. t
NB. all amends in one statement: values from column 2, boxed index pairs
NB. from columns 0 1
e85memo =: (2{"1 mat) (<"1 ] 0 1{"1 mat) } e85memo
0 2 $ 0
)
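
To spell out the one statement that replaced the loop: 2{"1 mat pulls out
the value column, and <"1 ] 0 1{"1 mat boxes each row,col pair so that }
can amend them all at once (the ] just keeps the 0 1 from being read as
part of the rank).  With the same made-up triples as above:

   mat =. 3 3 $ 5 7 21  0 1 3  12 0 1   NB. made-up row,col,value triples
   2 {"1 mat                            NB. the value column
21 3 1
   NB. 0 1 {"1 mat selects the index columns; <"1 boxes each pair for }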

This program runs in approximately 0.08 seconds.
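
Against the roughly 115 seconds for the loop version, that is a speedup of
about:

   115 % 0.08             NB. approximate speedup factor
1437.5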

I believe the following session demonstrates that the new verb produces the
same result as the old one:

   load 'c:/users/nick/j602-user/temp/e85x.ijs'
NB. This is the old loopy copy of the program
   time 'e85memorestoreold i.0'
execution of e85memorestoreold i.0
took 0 h 1 m 54.489000 sec wall time 0 h 1 m 54.486888 s cpu

   e85keep =. e85memo
   e85keep -: e85memo
1
NB. Loop replaced as above
   time 'e85memorestore i.0'
execution of e85memorestore i.0
took 0 h 0 m 0.060000 sec wall time 0 h 0 m 0.059890 s cpu

   e85keep -: e85memo
1

Is it something about how I allocated my sparse array?  I don't think I've
made any outright mistakes...I just didn't know how to make the multiple
array modifications until minutes ago...and it would never have occurred to
me that it would cause this much of a performance difference - it does not
cause this sort of problem with a non-sparse array.
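
For what it is worth, the allocation itself looks unremarkable when I
inspect it with the other left arguments of $. (2 $. reports the sparse
axes, 3 $. the fill element); sp is just a throwaway name for the same
allocation:

   sp =. 1 $. 160000 160000; 0 1; _   NB. same allocation as e85memo
   2 $. sp                            NB. the sparse axes
0 1
   3 $. sp                            NB. the fill element
_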

Musing:  (Skip this if my style irritates you, sorry)

So this was another one of those that started out as a question.  Now that
I know enough J to get sparse arrays to actually work, most of the time, I
have found them to be a really great tool.  In this case, for example, I
have numeric values that I can use directly as indices into a very large
array, which lets me memoize an expensive process.  But if I had to make
many insertions into the array in a small loop - say, if those 32000 items
were calculated over a few seconds instead of a large number of hours -
making all those separate insertions would be very expensive, and I might
have to do something else.  But if I can make all of the insertions at once,
the time is competitive with doing the same number of insertions into a
non-sparse array.

Of course, if I could calculate those 32000 answers in a few seconds, it
might not be as important to memoize the routine as it is.
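
For anyone curious, the memo pattern itself is nothing fancy; it amounts to
something like the sketch below, where calc stands in for the expensive verb
and memolookup is a made-up name (this is a sketch, not my production code):

memolookup =: verb define
NB. y is a row,col pair; the fill _ marks entries not yet computed
r =. (<y) { e85memo
if. r = _ do.
  r =. calc y                       NB. the expensive part: hours per item
  e85memo =: r (<y) } e85memo       NB. one amend per new result
end.
r
)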

I think that my testing has eliminated everything other than the insertions
into the sparse array.  One thing that I could do, if appropriate, would be
to save up a bunch of insertions and do them when I had a few hundred or a
few thousand, with a lookup that first looked in the sparse array and then
searched for the index in the list of "to be inserted" items.  This is a
pathological case, though - if I were doing insertions in one pass and
lookups in another, I would definitely just put the insertions into a queue
and then insert the data into the sparse array in one operation.  In my
case, I've run this routine so many times that even just saving the
insertions into a table and then inserting them into the sparse array when I
was done - so that I didn't have to redo the calculations for the next run -
would work...at 0.004 sec per insertion, it does not hurt that much until
you have to do thousands back to back.
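
If I ever do need it, the queue idea would look something like this;
pending, queue, flush, and lookup are made-up names, and this is untested
sketch code rather than anything I actually run:

pending =: 0 3 $ 0    NB. row,col,value triples awaiting insertion

queue =: verb define
NB. y is a single row,col,value triple
pending =: pending , y
)

flush =: verb define
NB. apply every pending amend in one statement, as in e85memorestore
e85memo =: (2 {"1 pending) (<"1 ] 0 1 {"1 pending) } e85memo
pending =: 0 3 $ 0
)

lookup =: verb define
NB. y is a row,col pair; check the sparse array first, then the pending table
r =. (<y) { e85memo
if. r = _ do.
  ix =. (0 1 {"1 pending) i. y
  if. ix < # pending do. r =. 2 { ix { pending end.
end.
r
)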

-- 
Of course I can ride in the carpool lane, officer.  Jesus is my constant
companion.