Re: [Perldl] matching vectors inside a PDL

Chris Marshall Fri, 21 Nov 2014 07:10:21 -0800

I couldn't reproduce the problem if I disable the
64bit index support either.  Perhaps the problem
is in the perl version.  Maybe someone will be
able to reproduce the problem when you have
a specific test case---including me.
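
For comparing setups, a minimal sketch along these
lines might help.  It is plain Perl with no PDL calls;
the helper name piddle_bytes is made up for
illustration, and the dims are the ones from the pdl2
session quoted below.  It prints the perl version and
ivsize, then shows how big the 40x150000x26 data is
as byte and how big a double intermediate of the same
shape would be.  It assumes the "over 1GB" complaint
corresponds to a 1 GiB threshold:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;

# Report the interpreter details asked about in this thread.
printf "perl version : %vd\n", $^V;
printf "ivsize       : %d bytes\n", $Config{ivsize};

# Hypothetical helper: bytes needed for data with the given
# per-element size and dimensions.
sub piddle_bytes {
    my ($elem_size, @dims) = @_;
    my $n = 1;
    $n *= $_ for @dims;
    return $n * $elem_size;
}

my @dims = (40, 150000, 26);   # dims of $sigs in the quoted session
my $GiB  = 1024**3;            # assumed threshold for the size complaint

for my $type ([byte => 1], [double => 8]) {
    my ($name, $size) = @$type;
    my $bytes = piddle_bytes($size, @dims);
    printf "%-6s : %8.2f MiB%s\n", $name, $bytes / 1024**2,
        $bytes > $GiB ? "  (over 1 GiB)" : "";
}
```

If the double figure comes out over 1 GiB while the
byte figure stays near 149 MiB, that would be
consistent with the guess that a double intermediate
is being generated somewhere.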


--Chris


On Fri, Nov 21, 2014 at 9:34 AM, Chris Marshall <[email protected]> wrote:
> Hi Ken-
>
> I am unable to generate the error with PDL-2.007
> either.  My system has 8GiB of memory and the
> PDL build is using the 64bit index support.
>
> What are the specs of your linux box, and could
> you please send the output of the 'perldl -V'
> command?  If you built PDL from sources, then
> the build log should indicate whether 64bit index
> support was enabled.  If your $Config{ivsize} is
> 8 then you should have 64bit support as well.
>
> --Chris
>
>
> On Fri, Nov 21, 2014 at 8:02 AM, Chris Marshall <[email protected]> wrote:
>> Ken-
>>
>> If you could make a short script that generates the
>> problem along with the output/error messages, that
>> would help.
>>
>> Do you have $PDL::BIGPDL set?  Might try with
>> that set to 1.
>>
>> I'll try the problem code on PDL-2.007 to see if that
>> is the reason for the differences.
>>
>> --Chris
>>
>>
>>
>> On Thu, Nov 20, 2014 at 6:18 PM, LYONS, KENNETH B (KENNETH)
>> <[email protected]> wrote:
>>> Chris
>>> I'm running perl 5.8.8 on a rather old linux system.  I installed the perl 
>>> modules rather recently from the PDL site, so I'd expect they are up to 
>>> date with whatever is there.  From the names of the files, I'd say it's 
>>> 2.007.
>>>
>>> I've tried a variety of ways of using the inplace method, and none of them 
>>> produced a perl error akin to what you got below.  The errors were coming 
>>> out of the PDL module itself, complaining about the size of the piddle 
>>> being over 1GB.  Given the dimensions of the piddle that is being 
>>> calculated (around 200 MB), that shouldn't have happened--unless it's using 
>>> doubles, which would make it ~1.6 GB.  Like I said, I got around the 
>>> problem in kind of a hack, by just slicing things up 20K rows at a 
>>> time--but I'd really like to find a way to do it right!
>>>
>>> Among the things I tried were these:
>>>   $sigs->xchg(0,1) *= $present;
>>>   $sigs->xchg(0,1)->inplace->mult($present,0);
>>>   PDL::Ops::mult(inplace $sigs->xchg(0,1), $present, 0);
>>>   $sigs->xchg(0,1)->inplace *= $present;
>>> None of which got around the error.
>>>
>>> Below is what finally worked (but only by occupying more memory than it 
>>> should):
>>>
>>> # note: it's known that $present and $sigs have the same size!
>>> my ($psize) = $present->dims;
>>> my $STEPSIZE = 20000;
>>> for (my $p = 0; $p < $psize; $p += $STEPSIZE) {
>>>   my $start = $p;
>>>   my $end = $start+$STEPSIZE-1;
>>>   $end = $psize-1 if $end >= $psize;   # clamp the last block
>>>   # multiply one block of rows in place
>>>   $sigs->xchg(0,1)->slice("$start:$end,:,:") *=
>>>     $present->slice("$start:$end");
>>> }
>>>
>>> Like I said, it's a bit of a hack.  But it does wind up doing the 
>>> appropriate filtering on the $sigs matrix.
>>>
>>> Ken
>>>
>>> p.s. I don't know if it makes a difference, per se, but you are evidently 
>>> operating in an interactive environment, not an actual perl script.  I'm 
>>> using this to automate thru a very large body of data, eventually to be run 
>>> automatically on a daily basis, so it's written as a script that calls the 
>>> PDL modules.  The error I refer to above was appearing in the error output 
>>> of the perl command.
>>>
>>> KL
>>>
>>> ----------------------------------------------------------------------------
>>>
>>> Below is the remainder of the thread that was mostly sidebar:
>>> -------------------------------------
>>> Hi Ken-
>>>
>>> You could sync up with the message I forwarded
>>> to perldl by replying with this message to that
>>> thread.  The main reason for keeping the discussion
>>> on the list is so that others can benefit from the
>>> discussion and/or offer other points of view/facts/...
>>>
>>> I tried the following in pdl2 and was not able to generate
>>> an error.  You are right that all byte args shouldn't be
>>> expanded to double intermediates.  I'm using PDL-2.007_03
>>> on cygwin64/win7 and the *= works fine but I get an error
>>> with the inplace construct (not the same as yours)
>>>
>>>   pdl> $sigs = (10*random(40,150000,26))->floor->byte
>>>   pdl> $present = (20*random(150000))->floor->byte
>>>   pdl> $ns = $sigs->copy
>>>   pdl> ?vars
>>>   PDL variables in package main::
>>>
>>>   Name         Type   Dimension       Flow  State          Mem
>>>   ----------------------------------------------------------------
>>>   $ns            Byte D [40,150000,26]        P          148.77MB
>>>   $present       Byte D [150000]             P            0.14MB
>>>   $sigs          Byte D [40,150000,26]        P          148.77MB
>>>   pdl> $sigs->xchg(0,1) *= $present  # works
>>>   pdl> $sigs = $ns->copy
>>>   pdl> $sigs->xchg(0,1)->inplace *= $present
>>>   Runtime error: Can't modify non-lvalue subroutine call at (eval 484) line 5.
>>>
>>> What are your os/platform specs and what version of PDL are
>>> you using?
>>>
>>> --Chris
>>>
>>> On Thu, Nov 20, 2014 at 2:47 PM, LYONS, KENNETH B (KENNETH)
>>> <[email protected]> wrote:
>>>> (Didn't understand your first line, as there was no cc on this message?  I 
>>>> pretty much automatically avoid ever using reply-all, but I guess in this 
>>>> case that's how it's supposed to work, right?  How do I cc it to get the 
>>>> thread to match up?)
>>>>
>>>> Actually, all the pdls involved are byte type.  I was assuming when I saw 
>>>> the errors occurring that it was somehow generating a double intermediate, 
>>>> because it should have had plenty of room if it stayed as byte.
>>>>
>>>> The specific code was as follows:
>>>>
>>>> # sigs is byte, with dimensions about 40 x 150000 x 26
>>>> # present is byte, with dimension of 150000
>>>> $sigs->xchg(0,1)->inplace *= $present;
>>>>
>>>> I had tried numerous ways of using inplace in that line, and none of them 
>>>> avoided the complaint that it had run out of memory (although the memory 
>>>> usage prior to that command was about 10%).  So if it's not generating a 
>>>> double intermediate, I don't see why it would run out of memory (it 
>>>> shouldn't have exceeded about 20% or so).  I finally got it to work by 
>>>> splitting the structures up into slices of about 20K rows each, and doing 
>>>> the calculation that way.
>>>>
>>>> Other approaches?
>>>>
>>>> Ken
>>>>
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Chris Marshall [mailto:[email protected]]
>>>> Sent: Wednesday, November 19, 2014 4:23 PM
>>>> To: LYONS, KENNETH B (KENNETH)
>>>> Subject: Re: [Perldl] matching vectors inside a PDL
>>>>
>>>> <re-cc-ing the perldl list>
>>>>
>>>> Thanks for the background.  If you hit a snag, feel free to
>>>> post to the perldl list.  We're usually able to help for
>>>> specific problems especially if accompanied by code
>>>> demonstrating the problem:
>>>>
>>>>   If I have a big byte piddle $a and I multiply it in-place
>>>>   my PDL session crashes because of a huge intermediate
>>>>   temp:
>>>>
>>>>   pdl> $a = (10*random(100))->floor->byte;
>>>>
>>>>   pdl> $a->inplace->mult(5,0);
>>>>   Error message here or crash
>>>>
>>>> Without a specific example, I would guess that the problem
>>>> is the piddle you are multiplying by (or perl scalar) is of
>>>> type double which would result in an intermediate temp of
>>>> double type which would then collapse down to a byte piddle
>>>> again at the end.  If both arguments to multiply are of byte
>>>> type, you can avoid the big double intermediate temp.  E.g.
>>>>
>>>>   pdl> p pdl(5)->type
>>>>   double
>>>>   pdl> p byte(5)->type
>>>>   byte
>>>>
>>>> Improved type support is planned for the PDL3 work.  My
>>>> initial ideas for bitfield support can be seen here:
>>>>
>>>>   http://mailman.jach.hawaii.edu/pipermail/pdl-porters/2013-December/006132.html
>>>>
>>>> Hope this helps,
>>>> Chris
>>>>
>>>> On Wed, Nov 19, 2014 at 1:42 PM, LYONS, KENNETH B (KENNETH)
>>>> <[email protected]> wrote:
>>>>> Chris
>>>>>
>>>>> In answer to your question: my path in was as follows:  I wanted to find 
>>>>> a way to implement an LP on a medium-size problem (<~10K variables), and 
>>>>> the rest of my code was in perl--so I went looking for an LP 
>>>>> implementation in perl.   I was expecting to find a C-compiled module 
>>>>> that would do an LP specifically.  I found some instances of that sort of 
>>>>> thing, but I also ran across one using PDL.  It didn't do quite what I 
>>>>> wanted, but when I saw the PDL site, it was obvious this was something I 
>>>>> needed to know about.  I wound up writing my own simplex implementation 
>>>>> in PDL to do specifically what I needed, and that worked great--and I was 
>>>>> pretty blown away at the speed.  So then I started looking into how I 
>>>>> could back up and get the datasets I was dealing with implemented as PDLs 
>>>>> to start with.   So I've got a good bit of code now using PDL, not just 
>>>>> the little simplex program (which was only a few dozen lines--that was 
>>>>> pretty easy to implement in PDL.)
>>>>>
>>>>> I continue to have issues with the documentation, though.  Just as one 
>>>>> example from today: the mult function seems to claim that you can get it 
>>>>> to operate in-place.  And for me that was important, because I'm dealing 
>>>>> with a large dataset (of byte variables).  But, not only does "mult" by 
>>>>> itself cause an error because it isn't exported, but when I try to use it 
>>>>> as PDL::Ops::Mult(inplace ...), or PDL::Mult(inplace ...), or as 
>>>>> $piddle->inplace->mult(...), it completely fails to avoid generating a 
>>>>> large intermediate.  That was clobbering my program, repetitively, so I 
>>>>> finally punted and decided to break that step up into segments with only 
>>>>> a few hundred thousand elements to multiply in each (using slices), and 
>>>>> that got me around the problem.  But there was nothing in the 
>>>>> documentation that seemed to suggest that would be necessary.  It also 
>>>>> seemed, although I didn't document this carefully, that changing the 
>>>>> default PDL type didn't have any impact on the size of that temporary 
>>>>> intermediate (I think it was using double no matter what I did--whereas 
>>>>> using byte would have been fine.)
>>>>>
>>>>> I'd love it, in this context, if there were a PDL type of "bit" by the 
>>>>> way, since that's actually what this problem is using--it's a 3D binary 
>>>>> matrix, of ones and zeroes, with up to ~3*10^9 elements.  When the 
>>>>> number of elements goes above ~200M, when I'm using bytes, I have to do 
>>>>> things to break it up and process one segment at a time, and it would be 
>>>>> nice if that weren't necessary--but there is evidently no implementation 
>>>>> of a "bit" type in PDL.
>>>>>
>>>>> Ken
>>>>>
>>>>>
>>>
>>> -----Original Message-----
>>> From: Chris Marshall [mailto:[email protected]]
>>> Sent: Saturday, November 15, 2014 11:42 AM
>>> To: Derek Lamb
>>> Cc: LYONS, KENNETH B (KENNETH); perldl
>>> Subject: Re: [Perldl] matching vectors inside a PDL
>>>
>>> Hi Ken and welcome to the PDL community!
>>>
>>>> On Nov 14, 2014, at 1:33 PM, LYONS, KENNETH B (KENNETH)
>>>> <[email protected]> wrote:
>>>>
>>>> Yes, most of this I knew, but thanks.  It’s because of that behavior of >
>>>> and <, that you mentioned, that I thought that ‘==’ would compare element
>>>> by element instead of on the whole vector.
>>>>
>>>> Have you ever tried, for example, to search the documentation for, say, the
>>>> function “list”?  It gives you every occurrence of the word “list” in the
>>>> documents (which, needless to say, is rather voluminous, and the first few
>>>> hundred entries have nothing to do with the function!)  There should be
>>>> some analog of the “man” command in unix that gives you information about
>>>> the *function* without all the other garbage.  I think it’s just doing
>>>> something akin to a grep thru the documents.
>>>
>>> In addition to the help/? and apropos/?? in the PDL shells
>>> (pdl2 and perldl) there is the command line version pdldoc
>>> which can be used starting with 'pdldoc pdldoc' to get the usage.
>>>
>>> When I do 'pdldoc -a list' I get 45 lines of output all
>>> of whose descriptions seem relevant to a general search
>>> for something having to do with 'list' including the 'list'
>>> command itself.  This type of problem is not specific to
>>> PDL as searching the docs for any complicated system
>>> or program does tend to produce a large number of
>>> incomprehensible and not particularly useful results.
>>>
>>> It is definitely desired to have smarter and more useful
>>> documentation searches.  The ability to add keywords
>>> would be nice and is on the feature request list, I believe.
>>>
>>> If you are on a unix-ish system, you might try something
>>> like 'pdldoc -a list | grep --color list' to make the output
>>> more visually comprehensible.  (We should probably add
>>> that to the PDL shell output)
>>>
>>>> It’s horribly designed in that regard.  The software itself is great, and
>>>> I’m very happy with the results, but finding the simplest little thing in
>>>> the docs can be a total pain!
>>>
>>> I understand the frustration.  What would really help PDL
>>> development would be to know how you got to using
>>> PDL without being introduced to the concepts that would
>>> have made the learning curve less steep.  (At least you
>>> found the mailing list and used it. :-)
>>>
>>> Happy PDL-ing!
>>> Chris

_______________________________________________
Perldl mailing list
[email protected]
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
