Use (3!:1) and (3!2) to write and read the sparse array. I made a test script to insert several thousand numbers into your array. I just hard-coded numbers in the test, but it should give you the idea. The script runs in a fraction of a second on my machine.
The last two lines show something of what is in the array. NB. Start______________ make_sparce=:3 : 0 s=.1 $.160000 160000;0 1;_ put_sparce s ) get_sparce=:3 : 0 3!:2 fread 'sparce' ) put_sparce=:3 : 0 (3!:1 y) fwrite 'sparce' ) insert_values=:3 : 0 NB. Randomly insert 1000 random values into s v=.100*?1000$0 i=.?1000$160000 j=.?1000$160000 s=.get_sparce'' s=.v (<"1 i,.j)}s put_sparce s ) make_sparce '' insert_values'' insert_values'' insert_values'' insert_values'' s=:get_sparce'' 7$.s +/5$.s NB. End________________________ On Sat, Nov 12, 2011 at 2:08 AM, Nick Simicich <[email protected]> wrote: > I have a very large sparse array that only has a tiny amount of hard to > calculate actual data in it. I decided to make verbs to backup the data in > the array to a file on disk, so that I can restore the data and so that I > am protected, at least partially, from power failures and so forth. I have > working code, but my first pass at the restore code performed very poorly. > I created another version of the code where performance is consistent with > the same sort of operation running against a non-sparse array. > > The operation in question was the modification of an element in a sparse > array, in a loop. > > The fix was to do all modifications in a single statement, rather than a > loop. This was possible in this case. > > I am running it under Engine: j701/2011-01-10/11:25 > Library: 7.01.050 > Platform: Win 64 > Installer: j701a_win64.exe > InstallPath: c:/program files (x86)/j64-701 > > The backup program runs very quickly, under .2 sec: > > e85memobackup =: verb define > format =. ( ": (x: 4 $. e85memo),.x: 5 $. e85memo) > ((":$format),';',,format) (1!:2) > <'\Users\Nick\j64-602-user\temp\e85backup.txt' > 2 0 $ 0 > ) > > I think that the backup program is about as good as I can make it. I > elected to use character data, but converting from the character data back > to a numeric matrix goes very fast. > > e85memorestoreold =: verb define > e85memo=: 1$.160000 160000;0 1;_ > t =. 1!:1 <'\Users\Nick\j64-602-user\temp\e85backup.txt' > NB. the semicolon separates the shape from the bulk of the backup. > split =. I. t=';' > NB. the first part of the file > s =. ". split {. t > NB. is the shape of the rest of the file. > mat =. ".s$(>:split)}.t > > for_line. mat do. > e85memo=: (2{line) (<0 1{line) } e85memo > NB. e85memo =: (2{"1 mat) (<"1 ] 0 1{"1 mat) } e85memo > end. > 2 0 $ 0 > ) > > The restore takes quite a while to run. Execution time up to where the > "for." loop starts is no more than a fraction of a second. The mat array > takes a moment to build, and is 32027 3 in shape. > > Loading the data into the e85memo array takes 1 minute, 55 seconds. I can't > test this with a full sized ordinary array, because, of course, attempting > to execute a statement like > > e85memo =: 160000 160000 $ _ > > gives me an "out of memory" instantly. > > Now I then thought, OK, I can do the same number of restore operations into > a much smaller array: > > e85memorestorefake =: verb define > e85memox=: 4000 4000 $_ > t =. 1!:1 <'\Users\Nick\j64-602-user\temp\e85backup.txt' > split =. I. t=';' > s =. ". split {. t > mat =. ".s$(>:split)}.t > for_line. mat do. > e85memox=: (2{line) (<(4000|0 1{line)) } e85memox > end. > 0 2 $ 0 > ) > > Same input data file, same number of insertion operations into the matrix, > this program takes a quarter second to run. > > So, my presumption is that any single insertion operation into a sparse > array takes a very long time (by computer standards). > > The other day, I wrote a very poor question which came down to, "how do you > make many changes to an array at once?" but the question was so poorly > worded that no one understood it. > > I still could not find it in the doc (sometimes I find part of the doc to > be opaque) but I did a bunch more experimentation, and determined that this > would work: > > e85memorestore =: verb define > e85memo=: 1$.160000 160000;0 1;_ > t =. 1!:1 <'\Users\Nick\j64-602-user\temp\e85backup.txt' > split =. I. t=';' > s =. ". split {. t > mat =. ".s$(>:split)}.t > e85memo =: (2{"1 mat) (<"1 ] 0 1{"1 mat) } e85memo > 0 2 $ 0 > ) > > This program runs in approximately 0.08 seconds. > > I believe that this proves that it produces the same results: > > load 'c:/users/nick/j602-user/temp/e85x.ijs' > NB. This is the old loopy copy of the program > time 'e85memorestoreold i.0' > execution of e85memorestoreold i.0 > took 0 h 1 m 54.489000 sec wall time 0 h 1 m 54.486888 s cpu > > e85keep =. e85memo > e85keep -: e85memo > 1 > NB. Loop replaced as above > time 'e85memorestore i.0' > execution of e85memorestore i.0 > took 0 h 0 m 0.060000 sec wall time 0 h 0 m 0.059890 s cpu > > e85keep -: e85memo > 1 > > Is it something about how I allocated my sparse array? I don't think I've > made any outright mistakes...I just didn't know how to make the multiple > array modifications until minutes ago...and it would never have occurred to > me that it would cause this much performance change - it does not cause > this sort of problems with a non-sparse array. > > Musing: (Skip this if my style irritates you, sorry) > > So this was another one of those that started out as a question. Now that > I know enough J to get sparse arrays to actually work, most of the time, I > have found them to be a really great tool. In this case, for example, I > have a very large array of numeric values that I can use directly as an > index into the array to memoize an expensive process. But if, for example, > I had to make many insertions into the array in a small loop - like if > those 32000 items were calculated over a few seconds instead of a large > number of hours, making all those separate insertions would be very > expensive, I might have to do something else. But if I can make all of the > insertions at once, the time is competitive with doing the same number of > insertions into a non-sparse array. > > Of course, if I could calculate those 32000 answers in a few seconds, it > might not be as important to memoize the routine as it is. > > I think that my testing has eliminated everything other than the insertions > into the sparse array. One thing that I could do, if appropriate, would be > to save up a bunch of insertions and to do them when I had a few hundred or > thousand, with a lookup that first looked in the sparse array and then > searched for the index in the list of "to be inserted" items. This is a > pathological case - if I was doing insertions in one pass and lookups in > another, I would definitely just put the insertions into a queue and then > insert the data into the sparse array in one operation. In my case, I've > run this routine so many times that even if I ended up just saving the > insertions into a table and then doing the insertions into the sparse array > when I was done so that I didn't have to redo the calculations for the next > run, that would work...at 0.004 sec/insertion, it does not hurt that much > until you have to do thousands back to back. > > -- > Of course I can ride in the carpool lane, officer. Jesus is my constant > companion. > ---------------------------------------------------------------------- > For information about J forums see http://www.jsoftware.com/forums.htm > ---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm
