The following code fragment is representative of something I am doing a lot of 
these days:

$im = rfits('im.fits');
for $val ($im->uniq->list){
        next unless $val;
        $coord = whichND($im==$val);
        ##do some stuff with $im and $coord and $val
}

where $im is a 1024 x 1024 image that is mostly zeroes, but has about 2500 
regions of positive AND negative integers, like this but obviously much bigger:
[
[ 0 0 1 1 0 0]
[ 2 0 0 1 0 0]
[ 2 2 0 0 0 3]
[ 0 0 0 0 3 0]
]

Many of those regions are small, only a few pixels, but a few are several 
hundred pixels big.  The test-for-equality inside the whichND is killing me on 
speed: the above code fragment takes nearly 30 seconds to execute, not counting 
the rfits.  I thought maybe I could speed things up by only going through the 
image once, and creating a %hash whose keys were the integers (1,2,3 etc in the 
example above) and the values were a list of coordinates, like this:

$ndc = ndcoords($im)->clump(1,2);
for $i(0..$ndc->dim(1)-1){
   ($cx,$cy) = $ndc(:,($i))->list;
   next unless $im($cx,$cy);
   push @{$hash{$im($cx,$cy)}},$cx,$cy;
}

But that is even slower, around 40 seconds.

Finally, I came up with this method involving a few stages of indirection:

###start
use PDL::Math; #not included in PDL::LiteF, needed for floor()
my $vals = $im->flat->qsort;
my $vind = $im->flat->qsorti;  #because of the qsorts, similarly-labeled pixels 
of interest don't even have to be contiguous

my $x=$vind % $im->dim(0);
my $y=floor($vind/$im->dim(0));

my ($v_r_a,$v_r_b)=rle($vals);  #run-length-encode the 
#rle says only the elements up to the first 0 in $v_r_a should be considered, 
so we pare those down
$v_r_b=$v_r_b(0:which($v_r_a==0)->(0)-1);
$v_r_a=$v_r_a(0:which($v_r_a==0)->(0)-1);

my $vra_cumsum = cumusumover($v_r_a);

my ($val,$coord);
for my $i(0..$v_r_b->nelem-1){
    $val = $v_r_b->(($i));
    next unless $val;
    my ($start,$end);
    $end = $vra_cumsum(($i))-1;
    $start = $i ? $vra_cumsum(($i-1)) : 0;  #trinary op needed for the first 
element because of how rle() does counting.
    $coord = $x($start:$end)->glue(1,$y($start:$end))->transpose->qsortvec;
}
###end

The qsortvec at the end there is probably extraneous, but was useful for 
comparing the output of this method with my original.  Notice there is no check 
for equality inside the loop (or anywhere at all except the $v_r_a and _b 
pare-down, and I suppose implicitly in the qsorts).  Everything is accomplished 
using slices.  This code fragment takes less than 0.5 seconds, a factor of 60 
improvement in speed. Obviously this will only work for sparse images like my 
sample above.  I haven't been careful about generalizing it to more than 2 
dimensions, but that shouldn't be too hard.

I'm posting this in the PDL Cookbook on the wiki.  If you think it's useful let 
me know and I'll add it to the distribution, calling it which_all or something 
like that.  The trick will be how to return ALL of the instances of $coord to 
the user--maybe the hash implementation I tried earlier will work, but I'm open 
to other suggestions.

Derek


_______________________________________________
Perldl mailing list
[email protected]
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl

Reply via email to