P Kishor wrote:
> First, Derek, thanks for explaining things so gently. Obviously, I am
> a super newbie with PDL. Now, onward...
>
> On Mon, Mar 30, 2009 at 5:00 PM, Derek Lamb <[email protected]> wrote:
>> P Kishor wrote:
>>> Here is a large data structure --
>>>
>>>     my $pdl = pdl(
>>>         1 .. 10,
>>>         ( [ 1 .. 33 ] ) x $d,    # d arrays
>>>         ( [ 1 .. 57 ] ) x $l,    # l arrays
>>>         ( [ 1 .. 9  ] ) x $m,    # m arrays
>>>     );
>>>
>>> $d is between 0 and 5, $l is between 1 and 10, and $m is 7300 but
>>> could be as high as 18,000 or 20,000.
>>>
>>>     my $size = howbig($pdl->get_datatype);
>>>     print "size of pdl is: $size\n";
>>>     print $pdl->info("Type: %T Dim: %-15D State: %S"), "\n";
>>>     my $n = $pdl->nelem;
>>>     print "There are $n elements in the piddle\n";
>>>
>>> I get the following --
>>>
>>>     size of pdl is: 8
>>>     Type: Double Dim: D [57,7300,13] State: P
>>>     There are 5409300 elements in the piddle
>>>
>>> Makes sense so far, but what does that "size of pdl is: 8" mean?
>>> Surely that is not the number of bytes being used by this data
>>> structure?
>>
>> Of course not. The docs say that howbig 'Returns the size of a piddle
>> datatype in bytes.' You have a piddle of type double, and doubles take
>> 8 bytes each.
>>
>>> By my calculations, the data structure weighs in at about 450 KB
>>> packed as a Storable object. By the way, in the pseudo-code above I
>>> have shown the number of elements in the arrays, not the actual
>>> values. So, for example, each of the 'd' arrays has 33 elements, but
>>> only about 4 or 5 of them are integers, the rest being real numbers.
>>> This is useful for getting a sense of the size of the data structure.
>>
>> Perhaps useful to people, but not so useful to PDL. If you have a
>> five-element piddle in which 4 elements are integers and 1 is a
>> double, the whole thing is promoted to double. The efficiency of PDL
>> derives mainly from knowing the byte-size of a piddle's elements a
>> priori.
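(As an aside: you can check those byte sizes without PDL at all. Core Perl's pack() reports the platform's native storage sizes, which is the same number howbig returns for the corresponding PDL type -- 8 for a double on any IEEE-754 machine. A quick sketch, plain Perl only:)

```perl
use strict;
use warnings;

# Storage size of one double on this platform -- the same number
# howbig() reports for a piddle of type double (8 bytes on IEEE machines).
my $double_bytes = length pack 'd', 0;
print "a double takes $double_bytes bytes\n";

# A native long, for comparison (4 or 8 bytes depending on platform).
my $long_bytes = length pack 'l!', 0;
print "a native long takes $long_bytes bytes\n";
```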
>> If you want to mix ints and doubles like this, you probably need to
>> rethink your data structure. You could use plain old Perl lists, which
>> don't require uniform typing, but the overhead would probably kill
>> you. Hashes or lists of piddles are also options to consider.
>>
>>> Now, this data structure is the data for a computation that is
>>> applied to a large array -- say, 1000 x 1000 or even 1500 x 1800, so
>>> between a million and a couple of million or more elements -- on a
>>> cell-by-cell basis. Imagine applying f(d) to the array, where d is
>>> the data structure and f(d) is applied to each cell individually.
>>>
>>> Curious to test the limits of my machine and PDL, I tried to create
>>> a piddle that held 1,000,000 such structures, and I got a 'bus
>>> error'. With an array of 100 elements, I got a segmentation fault.
>>> With an array of 10 elements, it worked.
>>
>> And with the 10^6 example you probably got a computer brought to its
>> knees trying to allocate 40 TB of memory. If I understand you
>> correctly (let me know if I don't), you want to create a super-piddle
>> that is (to use your examples here) 10^6 by 57 by 7300 by 13. A
>> simple calculation shows that the base piddle $pdl is 41 MB, so if
>> you want a million of these you need 41 million MB of memory
>> somewhere. 10 of those is not such a big problem, and 100 might work
>> if you have several GB of memory, but 10^6 is just crazy. You
>> probably need to rethink how you're doing things there.
>
> Yes, only now do I realize that PDL pads everything out to make n-d
> arrays with no holes. Yes, 100 of these piddles would be more than 4
> GB of memory. I have 32 GB per machine, but I believe Perl can address
> only less than 4 GB of memory per process, no? Or is it 2 GB (the
> 32-bit program limit)?

I'm not sure -- I've never run up against a Perl process limit. Remember
that piddles are treated differently than Perl SVs, so that limit may or
may not apply.
But you could test it by saying

    perldl> $a = zeroes(3*1024*1024*1024)

which should try to allocate a 24 GB piddle. Fun stuff.
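Meanwhile, the arithmetic behind the 41 MB and 40 TB figures above is easy to replicate in plain Perl (no PDL needed) -- just elements times 8 bytes per double:

```perl
use strict;
use warnings;

my $bytes_per_double = 8;        # what howbig reports for type double
my @dims  = (57, 7300, 13);      # one cell's piddle, per the info() output
my $nelem = 1;
$nelem *= $_ for @dims;          # 5409300 elements

my $cell_bytes = $nelem * $bytes_per_double;
printf "one cell: %d elements = %.1f MB\n", $nelem, $cell_bytes / 2**20;

# scaling up to many cells
for my $ncells (10, 100, 1_000_000) {
    printf "%9d cells: %.1f GB\n", $ncells, $ncells * $cell_bytes / 2**30;
}
```

That last line comes out around 40,000 GB -- the 40 TB super-piddle -- while 100 cells is the "more than 4 GB" figure mentioned above.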
> In any case, you understood the problem correctly. We have an area of
> 10^6, or up to 2*10^6, cells. Each one of those cells has that
> 57x7300x13 (or even 57x18000x13) piddle data structure (it all depends
> on the number of years of weather data... 20 years is 7300 rows, 50
> years is 18250 rows, and so on). For now, thankfully, each cell is
> independent. In the future, things might become more interesting when
> each cell might start depending on what happens in its neighboring
> cells, kinda like the Game of Life (has anyone used PDL to do the Game
> of Life?), but that is not the case for now.

I think Craig has a version that did it in 3 or 5 lines or something.

> Seems like the best thing might be to break up the area into smaller
> chunks of n cells so that n x 57 x 7300 x 13 fits into the memory of a
> single Perl process, and then run multiple processes concurrently,
> using up the multiple cores in the computer.
>
> Guidance on how to achieve this would be very much appreciated. PDL is
> making life with Perl seem even more interesting, and I am quite eager
> to at least try out PDL in this work. If it doesn't work then it
> doesn't work, but I do want to give it a shot.

I'm pretty sure (but could be completely wrong) that Perl does not
support multiple cores automatically. This functionality is not yet in
PDL either. But there is a Perl fork, which calls your system fork, so
you might be able to cook something up that way. I don't have any of my
books with me right now, so I can't provide specifics.

>> Derek
>>
>>> I am seeking some suggestions on how to work with such data using
>>> PDL.
>>>
>>> Many thanks,

_______________________________________________
Perldl mailing list
[email protected]
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
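P.S. For the archives, the chunk-and-fork pattern discussed above can be cooked up with nothing but core Perl's fork and waitpid. This is only a sketch: the cell list, the worker count, and process_chunk() are placeholders for the real cell ids, core count, and per-chunk PDL computation.

```perl
use strict;
use warnings;

my @cells  = (1 .. 20);   # stand-in ids; the real area is ~10^6 cells
my $nprocs = 4;           # e.g. one worker per core

# Hypothetical worker -- replace the body with the real PDL computation
# over this chunk's n x 57 x 7300 x 13 piddle.
sub process_chunk {
    my @chunk = @_;
    return scalar @chunk;
}

# ceiling division, so every cell lands in some chunk
my $per  = int((@cells + $nprocs - 1) / $nprocs);

my @todo = @cells;
my @pids;
while (my @chunk = splice @todo, 0, $per) {
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {              # child: work on its chunk only
        process_chunk(@chunk);
        exit 0;                   # child must exit, never fall through
    }
    push @pids, $pid;             # parent: remember the child, keep going
}
waitpid $_, 0 for @pids;          # block until every worker is done
print "all chunks done\n";
```

Since the cells are independent, the children share nothing; each would write its results to its own file (or similar) for the parent to collect afterwards.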
