Hmmm... I wonder, Puneet, if the hit is due to breaking cache? Lots of
things get a lot slower when you exceed particular size thresholds, and
PDL (like basically all vectorized languages) is pessimal for cache
management.



On Jul 9, 2010, at 9:17 AM, P Kishor wrote:

> On Fri, Jul 9, 2010 at 10:10 AM, Benjamin Schuster-Boeckler
> <[email protected]> wrote:
>> My experience with very large datasets in PDL comes down to this:
>>
>> USE THE SMALLEST SUITABLE DATATYPE
>>
>> I can't stress enough how important that is :-)
>>
>
>
> Yes, very correct. But keep in mind: even if a single value doesn't
> fit in your smallest datatype, the *entire* piddle gets promoted to
> the larger datatype.
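For the record, a minimal sketch of that promotion behavior (assuming a
stock PDL install; the variable names are just for illustration):

```perl
use strict; use warnings;
use PDL;

my $counts = byte( 1, 2, 3 );   # one byte per element
print $counts->type, "\n";      # prints "byte"

# One fractional value promotes the *whole* piddle:
my $mixed = pdl( 1, 2, 3.5 );
print $mixed->type, "\n";       # prints "double" -- 8 bytes per element
```

So a lone out-of-range or fractional element silently multiplies your
memory footprint by up to 8x.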
>
>> I'm dealing with vectors of ~500M values (whole human chromosomes,
>> if you're interested :-). If I only need a bitmask, I use a byte()
>> piddle; if I have counts, I use byte/ushort; and I even sometimes
>> convert rational numbers to integers for performance reasons. I'm
>> pretty sure that most of your problems on a Mac are due to
>> over-allocating memory.
>>
>> In general, I find that PDL DOES eat Perl's lunch a million times
>> if you do things cleverly. I was able to do sliding-window
>> averaging on those 500M vectors using PDL::PP in a second or two,
>> compared to hours in pure Perl.
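Worth noting that for the common case you don't even need PDL::PP: a
boxcar average can be done with the built-in conv1d from
PDL::Primitive. A sketch (sizes shrunk way down; $w is an assumed
window width):

```perl
use strict; use warnings;
use PDL;

my $w   = 5;
my $x   = sequence(20);                 # stand-in for a 500M-element vector
my $avg = conv1d( $x, ones($w) / $w );  # mean over a centered w-wide window
print $avg, "\n";
```

PDL::PP only becomes necessary when you need control over boundary
handling or a non-separable operation.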
>>
>
>
> I am sure. I am sold on PDL. But it does require me to be "very
> clever," while Perl is probably more forgiving. It will probably be a
> long while before I reach the "very clever" stage and am capable of
> using PDL::PP to its fullest.
>
> We really need an extensive and updated "Cookbook" that documents all
> the best practices, with examples.
>
>
>
>
>
>> Cheers,
>> Ben
>>
>> On 9 Jul 2010, at 16:53, P Kishor wrote:
>>
>>> Craig, David, others,
>>>
>>> I find your explanation satisfying, but not the actual results that
>>> I am getting. I am experiencing more stable performance from Perl,
>>> with the performance scaling predictably. PDL shows itself to be
>>> moodier: from one run to another, the performance can really swing.
>>> This is on my MacBook with no other user process running (meaning, I
>>> am not ripping music or watching a movie on Hulu at the same
>>> time...).
>>>
>>> First, no doubt my simplistic PDL approach was wrong. I figured, I
>>> have to calculate one "column" based on two other "columns" --
>>> "Hey! the PDL docs show how to get a column... use slice." So that
>>> is what I went with. However, using Craig's better and more
>>> efficient calculation approach, I did experience much better
>>> results, though not across the board.
>>>
>>> I used Craig's reworked script and ran it three times. The results
>>> are below (use a fixed-width font to view them), but here is some
>>> discussion --
>>>
>>> Both David and Craig implied that making the data (the array for
>>> Perl and the piddle for PDL) would be more efficient in Perl,
>>> because Perl does some up-front memory allocation, so 'push'ing an
>>> element onto the array would not be costly. That is not the case:
>>> PDL is actually faster at converting an array into a piddle than
>>> Perl is at making the array in the first place.
>>>
>>> Another assertion was that PDL will eat Perl's lunch when it comes
>>> to calculation. That is also not *always* the case. PDL is much
>>> faster at smaller data sets, but at a certain threshold (for me,
>>> that threshold is 3 million) PDL gets bogged down. At 3.5 million,
>>> PDL gets very slow, and at 4 million it basically locks up my
>>> computer.
>>>
>>> Another interesting issue -- Perl seems to be better at sharing
>>> resources. When the Perl calculation is running, my machine is
>>> responsive: I can switch back to the browser, scroll a page, etc.
>>> When the PDL calc is running, it is like my machine is frozen.
>>>
>>> This kinda worries me. If we write up the gotchas and the limits
>>> between which PDL use is optimal, then it is "caveat emptor" and
>>> all that. However, on a more realistic front, I was hoping to use
>>> PDL with a 13-million-element piddle. In my tests, a 2D piddle
>>> where ("first D" * "second D") = 13 million was smokingly fast. I
>>> am wondering, though -- will its performance change if the piddle
>>> is a 1D piddle that is 13 million elements long? Does it matter to
>>> PDL if my dataset is a "long rope" vs. a "carpet", both with the
>>> same "thread count" (to use a fabric analogy)?
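On the rope-vs-carpet question: a piddle's data is one contiguous block
regardless of shape, so elementwise work should cost about the same
either way. A sketch (sizes shrunk from 13 million, names invented):

```perl
use strict; use warnings;
use PDL;

my $carpet = sequence( 4, 3 );   # 2-D "carpet": values 0..11
my $rope   = $carpet->flat;      # 1-D "rope" view of the same 12 values

# Elementwise math walks the same contiguous memory in both shapes:
print $carpet->sum, " ", $rope->sum, "\n";   # prints "66 66"
```

Shape would only start to matter for operations that stride across a
dimension (slices along the second dim, transposes, and the like).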
>>>
>>> Test results (reformatted) shown below
>>>
>>>
>>> count: 10000
>>> ============================
>>>           Perl       PDL
>>> ----------------------------
>>> make data: 0.0097     0.0065
>>> calculate: 0.0064     0.0014
>>>
>>> make data: 0.0106     0.0065
>>> calculate: 0.0064     0.0014
>>>
>>> make data: 0.0104     0.0065
>>> calculate: 0.0063     0.0014
>>> ____________________________
>>>
>>>
>>> count: 100000
>>> ============================
>>>           Perl       PDL
>>> ----------------------------
>>> make data: 0.0962     0.0791
>>> calculate: 0.0624     0.0108
>>>
>>> make data: 0.0966     0.0811
>>> calculate: 0.0621     0.0109
>>>
>>> make data: 0.0966     0.0789
>>> calculate: 0.0626     0.0109
>>> ____________________________
>>>
>>>
>>> count: 1000000
>>> ============================
>>>           Perl       PDL
>>> ----------------------------
>>> make data: 0.9626     0.8014
>>> calculate: 0.6269     0.1170
>>>
>>> make data: 0.9656     0.8064
>>> calculate: 0.6275     0.1182
>>>
>>> make data: 0.9643     0.8203
>>> calculate: 0.6275     0.1168
>>> ____________________________
>>>
>>>
>>> count: 2000000
>>> ============================
>>>           Perl       PDL
>>> ----------------------------
>>> make data: 1.7542     1.5168
>>> calculate: 1.2462     0.2381
>>>
>>> make data: 1.7519     1.5221
>>> calculate: 1.2500     0.2391
>>>
>>> make data: 1.7517     1.5226
>>> calculate: 1.2699     0.2394
>>> ____________________________
>>>
>>>
>>> count: 3000000
>>> ============================
>>>           Perl       PDL
>>> ----------------------------
>>> make data: 2.5263     2.5722
>>> calculate: 1.9163     3.2107
>>>
>>> make data: 2.5411     2.2062
>>> calculate: 1.8897     6.9557
>>>
>>> make data: 2.5305     2.2822
>>> calculate: 1.9204     7.2502
>>> ____________________________
>>> On Fri, Jul 9, 2010 at 2:32 AM, Craig DeForest
>>> <[email protected]> wrote:
>>>> Wow, Puneet really stirred us all up (again).  Puneet, as David
>>>> said, your PDL code is slow because you are using a complicated
>>>> expression, which forces PDL to create and destroy intermediate
>>>> PDLs (every binary operation has to have a complete temporary PDL
>>>> allocated, and then freed, to store its result!).  I attach a
>>>> variant of your test, with the operation carried out as much
>>>> in-place as possible to eliminate extra allocations.  PDL runs
>>>> almost exactly a factor of 10 faster than raw Perl on my computer
>>>> in this case.
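The temporaries-vs-in-place distinction looks roughly like this (a
sketch, not the attached script; variable names are made up):

```perl
use strict; use warnings;
use PDL;

my $x = sequence(1000);
my $y = $x->copy;

# Each binary op below allocates (then frees) a full temporary piddle:
my $slow = $x * $y + $x / 2;

# Reusing one output piddle trims the intermediate allocations:
my $fast = $x * $y;     # one result allocated up front...
$fast += $x / 2;        # ...then updated in place
```

Both compute the same values; the second form just churns through far
less allocator traffic on big piddles.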
>>>> Note that the original ingestion of the Perl array to PDL is
>>>> quite slow: it generally takes slightly longer to create the PDL
>>>> than to generate the random numbers and create the Perl array in
>>>> the first place!  That is because PDL has to make several passes
>>>> through the Perl array to determine its size, and then has to
>>>> individually probe and convert each numeric value in the Perl
>>>> array.
>>>>
>>>> On Jul 9, 2010, at 1:09 AM, David Mertens wrote:
>>>>
>>>> FYI, for really thorough timing results, check out Devel::NYTProf:
>>>> http://search.cpan.org/~timb/Devel-NYTProf-4.03/lib/Devel/NYTProf.pm
>>>>
>>>> You have a lot of things going on to mix up the results - you have
>>>> both a memory allocation and a calculation. As I understand it,
>>>> Perl will likely outperform PDL in the memory allocation portion
>>>> of this exercise, but PDL should eat Perl's lunch for the
>>>> calculation portion.
>>>>
>>>> Perl will outperform PDL in the memory allocation because, in all
>>>> likelihood, it doesn't perform any allocation with the push. It
>>>> likely already allocated more than three elements for (all of) its
>>>> arrays, so pushing the new value onto the array does not cost
>>>> anything, except for a higher up-front memory cost. I suspect this
>>>> is where PDL is losing to Perl - Perl is performing the allocation
>>>> before you start the timer.
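If you want to make that pre-allocation explicit in pure Perl, the
standard trick is to pre-extend the array via $#array (a sketch; $n and
the fill values are arbitrary):

```perl
use strict; use warnings;

my $n = 100_000;
my @data;
$#data = $n - 1;                       # reserve $n slots up front
$data[$_] = $_ * 2 for 0 .. $n - 1;    # fill by index; no reallocation
print scalar @data, "\n";              # prints "100000"
```

That moves the growth cost out of the loop entirely, which is exactly
what push is quietly benefiting from in the benchmark.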
>>>>
>>>> In terms of the calculation itself, PDL should far outperform
>>>> Perl. The reason is that the actual contents of the calculation
>>>> loop are very slim, so all of the Perl stack manipulation adds
>>>> significant overhead. The reason Perl for loops usually make sense
>>>> is that the code inside them often involves IO operations or other
>>>> such things, in which case the Perl stack manipulations comprise
>>>> only a small portion of the total compute time.
>>>>
>>>> Try a situation where Perl and PDL allocate their memory as part
>>>> of the timing and see what that gives.
>>>>
>>>> David
>>>>
>>>> --
>>>> Sent via my carrier pigeon.
>>>> _______________________________________________
>>>> Perldl mailing list
>>>> [email protected]
>>>> http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Puneet Kishor http://www.punkish.org
>>> Carbon Model http://carbonmodel.org
>>> Charter Member, Open Source Geospatial Foundation http://www.osgeo.org
>>> Science Commons Fellow, http://sciencecommons.org/about/whoweare/kishor
>>> Nelson Institute, UW-Madison http://www.nelson.wisc.edu
>>> -----------------------------------------------------------------------
>>> Assertions are politics; backing up assertions with evidence is science
>>> =======================================================================
>>>
>
>

