Strange -- they do pass on my mac. I will examine the offending lines in the .pm when I get back to my laptop this p.m.
(Mobile)

On May 8, 2010, at 12:33 PM, Chris Marshall <[email protected]> wrote:

> Craig-
>
> I get the following test failures for commit id
> 8244725170a561f481c70933d057cc8aeb290284 ::
>
>   t/matrixops.t ............... 1/28 Can't call method "slice" on an
>   undefined value at t/matrixops.t line 45.
>   t/matmult.t ................. 1/5 Use of uninitialized value
>   $PDL::Primitive::d in concatenation (.) or string at
>   /c/chm/pdl/git/pdl/blib/lib/PDL/Primitive.pm line 248.
>   Use of uninitialized value $PDL::Primitive::d in concatenation (.)
>   or string at /c/chm/pdl/git/pdl/blib/lib/PDL/Primitive.pm line 248.
>
>   Test Summary Report
>   -------------------
>   t/matrixops.t      (Wstat: 65280 Tests: 4 Failed: 0)
>     Non-zero exit status: 255
>     Parse errors: Bad plan.  You planned 28 tests but ran 4.
>   t/subclass3.t      (Wstat: 0 Tests: 7 Failed: 1)
>     Failed test:  3
>   Files=110, Tests=1184, 91 wallclock secs ( 0.20 usr  0.25 sys +
>   56.04 cusr 28.18 csys = 84.68 CPU)
>   Result: FAIL
>   Failed 2/110 test programs. 1/1184 subtests failed.
>   make: *** [test_dynamic] Error 255
>
> And the following failures for the 64x64 modification in
> commit id 7760030bd6520c606d8d9d5b2cee2c9ac554123c ::
>
>   Test Summary Report
>   -------------------
>   t/limits_range.t   (Wstat: 256 Tests: 6 Failed: 1)
>     Failed test:  5
>     Non-zero exit status: 1
>   t/limits_ulimits.t (Wstat: 1280 Tests: 26 Failed: 5)
>     Failed tests:  19-23
>     Non-zero exit status: 5
>   t/matrixops.t      (Wstat: 65280 Tests: 4 Failed: 0)
>     Non-zero exit status: 255
>     Parse errors: Bad plan.  You planned 28 tests but ran 4.
>   t/poly.t           (Wstat: 0 Tests: 1 Failed: 1)
>     Failed test:  1
>   t/subclass3.t      (Wstat: 0 Tests: 7 Failed: 1)
>     Failed test:  3
>   Files=110, Tests=1184, 88 wallclock secs ( 0.11 usr  0.20 sys +
>   56.04 cusr 27.96 csys = 84.32 CPU)
>   Result: FAIL
>   Failed 5/110 test programs. 8/1184 subtests failed.
>   make: *** [test_dynamic] Error 255
>
> Did these changes pass tests on your platform?  They
> don't on cygwin/XP.
> --Chris
>
> On 5/8/2010 9:37 AM, Craig DeForest wrote:
>> Okay, I was stupid.  I just checked in a new version that does the same
>> operation in 30 seconds on my mac.  Tile size is larger (64x64), and I
>> avoid calculating offsets with PP macros inside the hottest loop -- it
>> walks through using dimincs in the summation direction.  So the $a x $b
>> case below now runs in 30 seconds, a factor-of-2.7 improvement (on my
>> platform) over yesterday morning.
>>
>> Cheers,
>> Craig
>>
>> On May 8, 2010, at 6:30 AM, Chris Marshall wrote:
>>
>>> On 5/8/2010 3:44 AM, Craig DeForest wrote:
>>>>
>>>> Er, sorry, I was noodling around and may have jumped the gun.  I just
>>>> checked in a small speed improvement for matmult.  It just evaluates
>>>> the terms in the matrix product in tiled order (multiplying 32x32
>>>> tiles) rather than in direct threading order; that fits each tile
>>>> into 16k in the double-precision case, which is small enough to fit
>>>> in L1 cache of most performance CPUs.  Unsurprisingly, it helps.
>>>> Surprisingly, not so very much.  On my PowerBook:
>>>>
>>>>   perldl> $a = random(2000,2000);
>>>>   perldl> $b = random(2000,2000);
>>>>   perldl> {$t0=time; $c = $a->dummy(1)->inner($b->xchg(0,1)->dummy(2));
>>>>   ..{> $t1=time; print $t1-$t0,"\n";}
>>>>   82
>>>>
>>>>   perldl> {$t0=time; $d = $a x $b;
>>>>   ..{> $t1=time; print $t1-$t0,"\n";}
>>>>   70
>>>>
>>>>   perldl> print all($d==$c)
>>>>   1
>>>>
>>>> I am a bit puzzled how these other packages manage to go so much
>>>> faster...
>>>
>>> The tiled calculation optimizes the memory accesses
>>> but, if I understand correctly, the tile code still
>>> uses the existing inner product algorithm.
>>>
>>> If so the total memory ops is still N**3 rather than
>>> the optimal N**2; they've just been moved to a different
>>> level of the memory hierarchy.
>>>
>>> Another possibility is that a 32x32 tile is not
>>> big enough to hide the memory access time for the
>>> data behind the floating point calculations.
>>>
>>> As to other packages' performance, I'm sure an
>>> optimized C matrix multiply routine that did the
>>> entire optimization would be very fast.  I don't
>>> know how much threading would/could be supported.
>>>
>>> --Chris
>>>
>>>> Cheers,
>>>> Craig

_______________________________________________
Perldl mailing list
[email protected]
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
