Re: [Perldl] NBabel & a speed comparison of PDL and other interpretative languages

Craig DeForest Sat, 08 May 2010 06:38:41 -0700

Okay, I was stupid.  I just checked in a new version that does the
same operation in 30 seconds on my Mac.  The tile size is larger
(64x64), and the hottest loop no longer calculates offsets with PP
macros: it walks through memory using the dimincs in the summation
direction.  So the $a x $b case below now runs in 30 seconds, a
factor-of-2.7 improvement (on my platform) over yesterday morning.
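
A rough sketch of that idea in plain C, for readers who want to see the
loop structure. This is not the PP code that was checked in: the function
names, the square row-major layout, and the fixed double type are all
assumptions made for illustration. Only the 64x64 tile size and the
"step a pointer by a fixed increment along the summation direction"
idea come from the message above.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 64  /* tile edge from the message; 64*64 doubles = 32 KiB */

/* Hypothetical reference: plain triple loop, c += a * b (n x n, row-major). */
static void naive_matmult(const double *a, const double *b, double *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];  /* offsets recomputed each step */
            c[i * n + j] += sum;
        }
}

/* Hypothetical tiled version: c += a * b, visiting TILE x TILE blocks. */
static void tiled_matmult(const double *a, const double *b, double *c, size_t n)
{
    assert(a && b && c);
    const size_t a_inc = 1;  /* stride of a along the summation index k */
    const size_t b_inc = n;  /* stride of b along the summation index k */

    for (size_t i0 = 0; i0 < n; i0 += TILE) {
        size_t i1 = (i0 + TILE < n) ? i0 + TILE : n;
        for (size_t j0 = 0; j0 < n; j0 += TILE) {
            size_t j1 = (j0 + TILE < n) ? j0 + TILE : n;
            for (size_t k0 = 0; k0 < n; k0 += TILE) {
                size_t k1 = (k0 + TILE < n) ? k0 + TILE : n;
                for (size_t i = i0; i < i1; i++)
                    for (size_t j = j0; j < j1; j++) {
                        /* Compute the start offsets once per (i, j) ... */
                        const double *ap = a + i * n + k0;
                        const double *bp = b + k0 * n + j;
                        double sum = 0.0;
                        /* ... then only add increments in the hot loop. */
                        for (size_t k = k0; k < k1; k++) {
                            sum += *ap * *bp;
                            ap += a_inc;
                            bp += b_inc;
                        }
                        c[i * n + j] += sum;  /* accumulate this tile's partial sum */
                    }
            }
        }
    }
}
```

Both functions accumulate into c, so c should be zeroed before the call.
The tiled version adds up the same products in a different order, so
results for general floating-point input can differ from the naive loop
in the last bits.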


Cheers,
Craig



On May 8, 2010, at 6:30 AM, Chris Marshall wrote:

> On 5/8/2010 3:44 AM, Craig DeForest wrote:
>>
>> Er, sorry, I was noodling around and may have jumped the gun.  I just
>> checked in a small speed improvement for matmult.  It just evaluates
>> the terms in the matrix product in tiled order (multiplying 32x32
>> tiles) rather than in direct threading order; that fits each tile into
>> 16k in the double-precision case, which is small enough to fit in L1
>> cache of most performance CPUs.  Unsurprisingly, it helps.
>> Surprisingly, not so very much.  On my PowerBook:
>>
>>      perldl>  $a = random(2000,2000);
>>      perldl>  $b = random(2000,2000);
>>      perldl>  {$t0=time; $c = $a->dummy(1)->inner($b->xchg(0,1)->dummy(2));
>>      ..{>  $t1=time; print $t1-$t0,"\n";}
>>      82
>>
>>      perldl>  {$t0=time; $d = $a x $b;
>>      ..{>  $t1=time; print $t1-$t0,"\n";}
>>      70
>>
>>      perldl>  print all($d==$c)
>>      1
>>
>> I am a bit puzzled how these other packages manage to go so much
>> faster...
>
> The tiled calculation optimizes the memory accesses
> but, if I understand correctly, the tile code still
> uses the existing inner product algorithm.
>
> If so, the total number of memory ops is still N**3 rather
> than the optimal N**2; they've just been moved to a different
> level of the memory hierarchy.
>
> Another possibility is that a 32x32 tile is not
> big enough to hide the memory access time for the
> data behind the floating point calculations.
>
> As to other packages' performance, I'm sure an
> optimized C matrix multiply routine that did the
> entire optimization would be very fast.  I don't
> know how much threading would/could be supported.
>
> --Chris
>> Cheers,
>> Craig


_______________________________________________
Perldl mailing list
[email protected]
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
