First, Derek, thanks for explaining things so gently. Obviously, I am a super newbie with PDL. Now, onward...
On Mon, Mar 30, 2009 at 5:00 PM, Derek Lamb <[email protected]> wrote:
> P Kishor wrote:
>>
>> Here is a large data structure --
>>
>>   my $pdl = pdl (
>>       (
>>           1 .. 10,
>>           [ [1 .. 33] x $d ],   # d arrays
>>           [ [1 .. 57] x $l ],   # l arrays
>>           [ [1 .. 9 ] x $m ],   # m arrays
>>       )
>>   );
>>
>> $d is BETWEEN 0 and 5
>> $l is BETWEEN 1 and 10
>> $m is 7300, but could be as high as 18,000 or 20,000
>>
>>   my $size = howbig($pdl->get_datatype);
>>   print "size of pdl is: $size\n";
>>   print $pdl->info("Type: %T Dim: %-15D State: %S"), "\n";
>>   my $n = $pdl->nelem;
>>   print "There are $n elements in the piddle\n";
>>
>> I get the following --
>>
>>   size of pdl is: 8
>>   Type: Double Dim: D [57,7300,13] State: P
>>   There are 5409300 elements in the piddle
>>
>> Makes sense so far, but what does that "size of pdl is: 8" mean?
>> Surely, that is not the number of bytes being used by this data
>> structure?
>
> Of course not. The docs say that howbig 'Returns the size of a piddle
> datatype in bytes.' You have a piddle of type double. Doubles take 8
> bytes each.
>
>> By my calculations, the data structure weighs in at about
>> 450 KB packed as a Storable object. By the way... in the pseudo-code
>> above, I have shown the number of elements in the arrays, not the
>> actual values. So, for example, in each of the 'd' arrays, there are
>> 33 elements, but only about 4 or 5 of them are INTEGERS, the rest
>> being REAL numbers. This is useful to get a sense of the size of the
>> data structure.
>
> Perhaps useful to people, but not so useful to PDL. If you have a
> five-element piddle and 4 elements are integers and 1 is a double, then
> the whole thing is promoted to double. The efficiency of PDL is derived
> mainly from knowing the byte-size of the elements of a piddle a priori.
> If you want to mix ints and doubles like this, you probably need to
> rethink your data structure.
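To make the byte arithmetic concrete, here is a minimal sketch in plain Python (not PDL; shown only because the arithmetic is language-agnostic): total memory is the element count times the per-element size that howbig() reports.

```python
# Sketch (plain Python, not PDL) of the arithmetic behind howbig()/nelem():
# total memory = number of elements x bytes per element.
BYTES_PER_DOUBLE = 8          # what howbig() returns for PDL's Double type

dims = (57, 7300, 13)         # the Dim reported by $pdl->info
nelem = 1
for d in dims:
    nelem *= d                # the equivalent of $pdl->nelem

total_bytes = nelem * BYTES_PER_DOUBLE
print(nelem)                  # 5409300
print(total_bytes / 2**20)    # ~41.3 MiB for one such piddle
```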
> You can use plain old Perl lists, which don't require uniform typing,
> but the overhead will probably kill you. Hashes or lists of piddles are
> also an option to consider.
>
>> Now, this data structure is the data for a computation that is applied
>> to a large array, say, 1000 x 1000 or even 1500 x 1800, so between a
>> million and a couple of million or more elements, on a cell-by-cell
>> basis. Imagine applying f(d) to the array, where d is the data
>> structure, with f(d) being applied to each cell individually.
>>
>> Curious to test the limits of my machine and PDL, I tried to create a
>> piddle that held 1,000,000 such structures; I got a 'bus error'. With
>> an array of 100 elements, I got a segmentation fault. With an array of
>> 10 elements, it worked.
>
> And probably with the 10^6 example you got a computer brought to its
> knees trying to allocate 40 TB of memory. If I understand you correctly
> (let me know if I don't), you want to create a super-piddle that is (to
> use your examples here) 10^6 by 57 by 7300 by 13. A simple calculation
> shows that the base piddle $pdl is 41 MB, so if you want a million of
> these you need 41 million MB of memory somewhere. 10 of those is not
> such a big problem, 100 might work if you have several GB of memory,
> but 10^6 is just crazy. You probably need to rethink how you're doing
> things there.

Yes, only now do I realize that PDL pads everything out to make n-d arrays with no holes. Yes, 100 of these piddles would be more than 4 GB of memory. I have 32 GB per machine, but I believe Perl can address less than 4 GB of memory per process, no? Or is it 2 GB (the 32-bit program limit)? In any case, you understood the problem correctly. We have an area of 10^6, or up to 2*10^6, cells. Each one of those cells has that 57x7300x13 (or even 57x18000x13) piddle data structure (it all depends on the number of years of weather data... 20 years is 7300 rows, 50 years is 18250 rows, and so on).
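Derek's scaling argument can be checked with the same arithmetic. The figures below are a sketch, assuming one 57x7300x13 piddle of doubles per cell (plain Python again, purely for the arithmetic):

```python
# Sketch of the scaling: one 57x7300x13 piddle of doubles per cell.
per_cell_bytes = 57 * 7300 * 13 * 8     # 43,274,400 bytes, ~41 MiB

for ncells in (10, 100, 10**6):
    total = ncells * per_cell_bytes
    print(f"{ncells:>9} cells -> {total / 2**30:10.1f} GiB")
# 10 cells fit easily, 100 cells already need ~4 GiB,
# and 10^6 cells need on the order of 40 TiB -- Derek's "40 TB".
```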
For now, thankfully, each cell is independent. In the future, things might become more interesting when each cell might start depending on what happens in its neighboring cells, kinda like the Game of Life (has anyone used PDL to do the Game of Life?), but that is not the case for now. It seems the best approach might be to break the area up into smaller chunks of n cells, so that n x 57 x 7300 x 13 fits into the memory of a single Perl process, and then run multiple processes concurrently, using up the multiple cores in the computer. Guidance on how to achieve this would be very much appreciated.

PDL is making life with Perl seem even more interesting, and I am quite eager to at least try out PDL in this work. If it doesn't work then it doesn't work, but I do want to give it a shot.

>
> Derek
>
>> I am seeking some suggestions on how to work with such data using PDL.
>>
>> Many thanks,

--
Puneet Kishor http://www.punkish.org/
Nelson Institute for Environmental Studies http://www.nelson.wisc.edu/
Carbon Model http://carbonmodel.org/
Open Source Geospatial Foundation http://www.osgeo.org/
Sent from: Ft Myer VA United States.
_______________________________________________
Perldl mailing list
[email protected]
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
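P.S. The chunking-plus-multiple-processes idea above can be sketched roughly as follows. This is only an illustration of the pattern, not the actual code: Python's multiprocessing stands in for forking Perl workers, and the names (process_chunk, make_chunks) and the 4 GB per-process budget are illustrative assumptions, not from the thread.

```python
# Sketch: split the cell area into chunks small enough for one process,
# then farm the chunks out to one worker per CPU core.
from multiprocessing import Pool

PER_CELL_BYTES = 57 * 7300 * 13 * 8             # ~41 MiB of doubles per cell
BUDGET_BYTES = 4 * 10**9                        # assumed ~4 GB usable per process
CHUNK = max(1, BUDGET_BYTES // PER_CELL_BYTES)  # cells that fit in one process

def make_chunks(n_cells, chunk=CHUNK):
    """Partition cell indices 0..n_cells-1 into contiguous chunks."""
    return [range(i, min(i + chunk, n_cells)) for i in range(0, n_cells, chunk)]

def process_chunk(cells):
    # Stand-in for the real per-cell computation f(d) on the weather data.
    return [c * 2 for c in cells]               # placeholder work

if __name__ == "__main__":
    # Small demo; a real run would pass the full 10^6 cells.
    with Pool() as pool:                        # one worker per core by default
        results = pool.map(process_chunk, make_chunks(10_000))
    print(len(results))                         # number of chunks processed
```

In Perl the same shape could be had with fork (or a module such as Parallel::ForkManager), with each child building and processing only its own n x 57 x 7300 x 13 piddle.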
