Strange -- they do pass on my mac. I will examine the offending lines  
in the .pm when I get back to my laptop this p.m.

(Mobile)


On May 8, 2010, at 12:33 PM, Chris Marshall <[email protected]> wrote:

> Craig-
>
> I get the following test failures for commit id
> 8244725170a561f481c70933d057cc8aeb290284 ::
>
> t/matrixops.t ............... 1/28 Can't call method "slice" on an undefined value at t/matrixops.t line 45.
> t/matmult.t ................. 1/5 Use of uninitialized value $PDL::Primitive::d in concatenation (.) or string at /c/chm/pdl/git/pdl/blib/lib/PDL/Primitive.pm line 248.
> Use of uninitialized value $PDL::Primitive::d in concatenation (.) or string at /c/chm/pdl/git/pdl/blib/lib/PDL/Primitive.pm line 248.
>
> Test Summary Report
> -------------------
> t/matrixops.t             (Wstat: 65280 Tests: 4 Failed: 0)
>  Non-zero exit status: 255
>  Parse errors: Bad plan.  You planned 28 tests but ran 4.
> t/subclass3.t             (Wstat: 0 Tests: 7 Failed: 1)
>  Failed test:  3
> Files=110, Tests=1184, 91 wallclock secs ( 0.20 usr  0.25 sys + 56.04 cusr 28.18 csys = 84.68 CPU)
> Result: FAIL
> Failed 2/110 test programs. 1/1184 subtests failed.
> make: *** [test_dynamic] Error 255
>
>
> And the following failures for the 64x64 modification in
> commit id 7760030bd6520c606d8d9d5b2cee2c9ac554123c ::
>
> Test Summary Report
> -------------------
> t/limits_range.t          (Wstat: 256 Tests: 6 Failed: 1)
>  Failed test:  5
>  Non-zero exit status: 1
> t/limits_ulimits.t        (Wstat: 1280 Tests: 26 Failed: 5)
>  Failed tests:  19-23
>  Non-zero exit status: 5
> t/matrixops.t             (Wstat: 65280 Tests: 4 Failed: 0)
>  Non-zero exit status: 255
>  Parse errors: Bad plan.  You planned 28 tests but ran 4.
> t/poly.t                  (Wstat: 0 Tests: 1 Failed: 1)
>  Failed test:  1
> t/subclass3.t             (Wstat: 0 Tests: 7 Failed: 1)
>  Failed test:  3
> Files=110, Tests=1184, 88 wallclock secs ( 0.11 usr  0.20 sys + 56.04 cusr 27.96 csys = 84.32 CPU)
> Result: FAIL
> Failed 5/110 test programs. 8/1184 subtests failed.
> make: *** [test_dynamic] Error 255
>
>
> Did these changes pass tests on your platform?  They
> don't on cygwin/XP.
>
> --Chris
>
>
> On 5/8/2010 9:37 AM, Craig DeForest wrote:
>> Okay, I was stupid.  I just checked in a new version that does the same
>> operation in 30 seconds on my mac. Tile size is larger (64x64), and I
>> avoid calculating offsets with PP macros inside the hottest loop - it
>> walks through using dimincs in the summation direction. So the $a x $b
>> case below now runs in 30 seconds, a factor-of-2.7 improvement (on my
>> platform) over yesterday morning.
>>
>> Cheers,
>> Craig
>>
>>
>>
>> On May 8, 2010, at 6:30 AM, Chris Marshall wrote:
>>
>>> On 5/8/2010 3:44 AM, Craig DeForest wrote:
>>>>
>>>> Er, sorry, I was noodling around and may have jumped the gun. I just
>>>> checked in a small speed improvement for matmult. It just evaluates
>>>> the terms in the matrix product in tiled order (multiplying 32x32
>>>> tiles) rather than in direct threading order; that fits each tile into
>>>> 16k in the double-precision case, which is small enough to fit in L1
>>>> cache of most performance CPUs. Unsurprisingly, it helps.
>>>> Surprisingly, not so very much. On my PowerBook:
>>>>
>>>> perldl> $a = random(2000,2000);
>>>> perldl> $b = random(2000,2000);
>>>> perldl> {$t0=time; $c = $a->dummy(1)->inner($b->xchg(0,1)->dummy(2));
>>>> ..{> $t1=time; print $t1-$t0,"\n";}
>>>> 82
>>>>
>>>> perldl> {$t0=time; $d = $a x $b;
>>>> ..{> $t1=time; print $t1-$t0,"\n";}
>>>> 70
>>>>
>>>> perldl> print all($d==$c)
>>>> 1
>>>>
>>>> I am a bit puzzled how these other packages manage to go so much
>>>> faster...
>>>
>>> The tiled calculation optimizes the memory accesses
>>> but, if I understand correctly, the tile code still
>>> uses the existing inner product algorithm.
>>>
>>> If so, the total memory-op count is still N**3 rather
>>> than the optimal N**2; they've just been moved to a
>>> different level of the memory hierarchy.
>>>
>>> Another possibility is that a 32x32 tile is not
>>> big enough to hide the memory access time for the
>>> data behind the floating point calculations.
>>>
>>> As to other packages' performance, I'm sure an
>>> optimized C matrix-multiply routine that did the
>>> entire optimization would be very fast. I don't
>>> know how much threading would/could be supported.
>>>
>>> --Chris
>>>> Cheers,
>>>> Craig
>

_______________________________________________
Perldl mailing list
[email protected]
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
