P Kishor wrote:
> First, Derek, thanks for explaining things so gently. Obviously, I am
> a super newbie with PDL. Now, onward...
>
> On Mon, Mar 30, 2009 at 5:00 PM, Derek Lamb <[email protected]> wrote:
>> P Kishor wrote:
>>> Here is a large data structure --
>>>
>>>     my $pdl = pdl(
>>>         1 .. 10,
>>>         ( [ 1 .. 33 ] ) x $d,    # d arrays
>>>         ( [ 1 .. 57 ] ) x $l,    # l arrays
>>>         ( [ 1 .. 9  ] ) x $m,    # m arrays
>>>     );
>>>
>>> $d is between 0 and 5, $l is between 1 and 10, and $m is 7300 but
>>> could be as high as 18,000 or 20,000.
>>>
>>>     my $size = howbig($pdl->get_datatype);
>>>     print "size of pdl is: $size\n";
>>>     print $pdl->info("Type: %T Dim: %-15D State: %S"), "\n";
>>>     my $n = $pdl->nelem;
>>>     print "There are $n elements in the piddle\n";
>>>
>>> I get the following --
>>>
>>>     size of pdl is: 8
>>>     Type: Double Dim: D [57,7300,13] State: P
>>>     There are 5409300 elements in the piddle
>>>
>>> Makes sense so far, but what does that "size of pdl is: 8" mean?
>>> Surely that is not the number of bytes being used by this data
>>> structure?
>>
>> Of course not. The docs say that howbig 'Returns the size of a piddle
>> datatype in bytes.' You have a piddle of type double, and doubles take
>> 8 bytes each.
>>
>>> By my calculations, the data structure weighs in at about 450 KB
>>> packed as a Storable object. By the way, in the pseudo-code above I
>>> have shown the number of elements in the arrays, not the actual
>>> values. So, for example, each of the 'd' arrays has 33 elements, but
>>> only about 4 or 5 of them are integers, the rest being real numbers.
>>> This is useful for getting a sense of the size of the data structure.
>>
>> Perhaps useful to people, but not so useful to PDL. If you have a
>> five-element piddle in which 4 elements are integers and 1 is a
>> double, the whole thing is promoted to double. The efficiency of PDL
>> derives mainly from knowing the byte-size of a piddle's elements a
>> priori.
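(As an aside: you can check those byte sizes without PDL at all. Core Perl's pack() reports the platform's native storage sizes, which is the same number howbig returns for the corresponding PDL type -- 8 for a double on any IEEE-754 machine. A quick sketch, plain Perl only:)

```perl
use strict;
use warnings;

# Storage size of one double on this platform -- the same number
# howbig() reports for a piddle of type double (8 bytes on IEEE machines).
my $double_bytes = length pack 'd', 0;
print "a double takes $double_bytes bytes\n";

# A native long, for comparison (4 or 8 bytes depending on platform).
my $long_bytes = length pack 'l!', 0;
print "a native long takes $long_bytes bytes\n";
```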
>> If you want to mix ints and doubles like this, you probably need to
>> rethink your data structure. You could use plain old Perl lists, which
>> don't require uniform typing, but the overhead would probably kill
>> you. Hashes or lists of piddles are also options to consider.
>>
>>> Now, this data structure is the data for a computation that is
>>> applied to a large array -- say, 1000 x 1000 or even 1500 x 1800, so
>>> between a million and a couple of million or more elements -- on a
>>> cell-by-cell basis. Imagine applying f(d) to the array, where d is
>>> the data structure and f(d) is applied to each cell individually.
>>>
>>> Curious to test the limits of my machine and PDL, I tried to create
>>> a piddle that held 1,000,000 such structures, and I got a 'bus
>>> error'. With an array of 100 elements, I got a segmentation fault.
>>> With an array of 10 elements, it worked.
>>
>> And with the 10^6 example you probably got a computer brought to its
>> knees trying to allocate 40 TB of memory. If I understand you
>> correctly (let me know if I don't), you want to create a super-piddle
>> that is (to use your examples here) 10^6 by 57 by 7300 by 13. A
>> simple calculation shows that the base piddle $pdl is 41 MB, so if
>> you want a million of these you need 41 million MB of memory
>> somewhere. 10 of those is not such a big problem, and 100 might work
>> if you have several GB of memory, but 10^6 is just crazy. You
>> probably need to rethink how you're doing things there.
>
> Yes, only now do I realize that PDL pads everything out to make n-d
> arrays with no holes. Yes, 100 of these piddles would be more than 4
> GB of memory. I have 32 GB per machine, but I believe Perl can address
> only less than 4 GB of memory per process, no? Or is it 2 GB (the
> 32-bit program limit)?

I'm not sure -- I've never run up against a Perl process limit. Remember
that piddles are treated differently than Perl SVs, so that limit may or
may not apply.
But you could test it by saying

    perldl> $a = zeroes(3*1024*1024*1024)

which should try to allocate a 24 GB piddle. Fun stuff.
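Meanwhile, the arithmetic behind the 41 MB and 40 TB figures above is easy to replicate in plain Perl (no PDL needed) -- just elements times 8 bytes per double:

```perl
use strict;
use warnings;

my $bytes_per_double = 8;        # what howbig reports for type double
my @dims  = (57, 7300, 13);      # one cell's piddle, per the info() output
my $nelem = 1;
$nelem *= $_ for @dims;          # 5409300 elements

my $cell_bytes = $nelem * $bytes_per_double;
printf "one cell: %d elements = %.1f MB\n", $nelem, $cell_bytes / 2**20;

# scaling up to many cells
for my $ncells (10, 100, 1_000_000) {
    printf "%9d cells: %.1f GB\n", $ncells, $ncells * $cell_bytes / 2**30;
}
```

That last line comes out around 40,000 GB -- the 40 TB super-piddle -- while 100 cells is the "more than 4 GB" figure mentioned above.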
> In any case, you understood the problem correctly. We have an area of
> 10^6, or up to 2*10^6, cells. Each one of those cells has that
> 57x7300x13 (or even 57x18000x13) piddle data structure (it all depends
> on the number of years of weather data... 20 years is 7300 rows, 50
> years is 18250 rows, and so on). For now, thankfully, each cell is
> independent. In the future, things might become more interesting when
> each cell might start depending on what happens in its neighboring
> cells, kinda like the Game of Life (has anyone used PDL to do the Game
> of Life?), but that is not the case for now.

I think Craig has a version that did it in 3 or 5 lines or something.

> Seems like the best thing might be to break up the area into smaller
> chunks of n cells so that n x 57 x 7300 x 13 fits into the memory of a
> single Perl process, and then run multiple processes concurrently,
> using up the multiple cores in the computer.
>
> Guidance on how to achieve this would be very much appreciated. PDL is
> making life with Perl seem even more interesting, and I am quite eager
> to at least try out PDL in this work. If it doesn't work then it
> doesn't work, but I do want to give it a shot.

I'm pretty sure (but could be completely wrong) that Perl does not
support multiple cores automatically. This functionality is not yet in
PDL either. But there is a Perl fork, which calls your system fork, so
you might be able to cook something up that way. I don't have any of my
books with me right now, so I can't provide specifics.

>> Derek
>>
>>> I am seeking some suggestions on how to work with such data using
>>> PDL.
>>>
>>> Many thanks,

_______________________________________________
Perldl mailing list
[email protected]
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
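P.S. For the archives, the chunk-and-fork pattern discussed above can be cooked up with nothing but core Perl's fork and waitpid. This is only a sketch: the cell list, the worker count, and process_chunk() are placeholders for the real cell ids, core count, and per-chunk PDL computation.

```perl
use strict;
use warnings;

my @cells  = (1 .. 20);   # stand-in ids; the real area is ~10^6 cells
my $nprocs = 4;           # e.g. one worker per core

# Hypothetical worker -- replace the body with the real PDL computation
# over this chunk's n x 57 x 7300 x 13 piddle.
sub process_chunk {
    my @chunk = @_;
    return scalar @chunk;
}

# ceiling division, so every cell lands in some chunk
my $per  = int((@cells + $nprocs - 1) / $nprocs);

my @todo = @cells;
my @pids;
while (my @chunk = splice @todo, 0, $per) {
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {              # child: work on its chunk only
        process_chunk(@chunk);
        exit 0;                   # child must exit, never fall through
    }
    push @pids, $pid;             # parent: remember the child, keep going
}
waitpid $_, 0 for @pids;          # block until every worker is done
print "all chunks done\n";
```

Since the cells are independent, the children share nothing; each would write its results to its own file (or similar) for the parent to collect afterwards.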
