Background. We have about a billion records that we need to merge from 100
files. We have devised a customised algorithm that reduces the runtime
from weeks (using an RDBMS) to about 12 hours. The main bottleneck at
present is the need to sort an array (of arrays) of 100 items:

[
  [ 'a_str', val ],
  [ 'b_str', val ],
  ...
  [ '100th_str', val ],
]

The sort is on the 'X_str' string, i.e. $ary->[N]->[0].

Currently we are using Perl's native sort, which of course works fine;
however, in 99% of cases only the $ary->[0]->[0] element will be out of
place, and the remainder of the list will still be sorted. Native sort
cannot take advantage of this, but we would like to: bubbling that one
element into position takes on average only about 50 comparisons (anywhere
from 1 to 99), whereas a full sort needs at least 99 every time, so it
should be much faster.

Here is a Perl implementation to show what we want to achieve, in case
that was not clear enough:

# moveFirst n=200000 Took 56 seconds 3571.43/sec
# sort      n=200000 Took 85 seconds 2352.94/sec

sub moveFirstElementIntoPosition {
    my $ref = shift;

    # if we are already in place, return
    return if $ref->[0]->[0] le $ref->[1]->[0];

    # bubble it into position
    for ( 1 .. $#{$ref} ) {
        if ( $ref->[ $_ - 1 ]->[0] le $ref->[$_]->[0] ) {
            return;
        }
        else {
            ( $ref->[ $_ - 1 ], $ref->[$_] ) = ( $ref->[$_], $ref->[ $_ - 1 ] );
        }
    }
}

my $s = time();
my $n = 200_000;
my @test =  map { [ $_, $a++ ] } ( 'ca', 'aa' .. 'bz', 'cb' .. 'dz' );
for ( 1 .. $n ) { 
    my @ary = @test;
    moveFirstElementIntoPosition( \@ary );
}
$s = time() - $s;
printf "moveFirst n=$n Took %d seconds %.2f/sec\n", $s, $n/$s;

$s = time();
for ( 1 .. $n ) { 
    my @ary = @test;
    @ary = sort { $a->[0] cmp $b->[0] } @ary;
}
$s = time() - $s;
printf "sort n=$n Took %d seconds %.2f/sec\n", $s, $n/$s;

As you can see, even implemented in pure Perl the move algorithm beats
sort by some 30%. I would expect to beat sort by 50% or more with a
similar implementation in XS. With a billion sorts, even a small increase
in throughput is significant, hence the desire.

If the data lived in a C struct I would have no problem implementing
this, but of course it does not, and although I have used XS to implement
a number of algorithms where speed was of the essence, I have not tackled
anything like this before. In other words, my XS is not currently up to
it. I have had a look at sort.c in the Perl source and at the cmp
function(s). I would welcome any suggestions as to the most efficient way
to implement this moveFirstElementIntoPosition() function. With my
limited XS I can see ways to do it by returning a new array ref, but
ideally I would like to modify the array passed to the function in place,
avoiding the overhead of copying data, and simply return void.
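
To make the target concrete, here is the rough, untested shape I imagine
the XS taking. The MoveFirst package name is only a placeholder, it
assumes a plain (non-tied, fully populated) array whose elements are all
array refs with the sort key at index 0, and I am guessing at the swap
mechanics:

/* MoveFirst.xs -- rough, untested sketch; "MoveFirst" is a placeholder */

#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"

MODULE = MoveFirst    PACKAGE = MoveFirst

void
moveFirstElementIntoPosition(avref)
        SV *avref
    PREINIT:
        AV  *av;
        SV **slot_a;
        SV **slot_b;
        SV  *tmp;
        I32  i, last;
    CODE:
        if (!SvROK(avref) || SvTYPE(SvRV(avref)) != SVt_PVAV)
            croak("expected an array reference");
        av   = (AV*)SvRV(avref);
        last = av_len(av);        /* highest index of the outer array */
        for (i = 1; i <= last; i++) {
            /* pointers straight into the outer array's slots */
            slot_a = av_fetch(av, i - 1, 0);
            slot_b = av_fetch(av, i,     0);
            /* the sort key is element 0 of each inner [ 'X_str', val ]
               array (assumes every element really is an array ref) */
            if (sv_cmp(*av_fetch((AV*)SvRV(*slot_a), 0, 0),
                       *av_fetch((AV*)SvRV(*slot_b), 0, 0)) <= 0)
                break;            /* rest of the list is already sorted */
            /* swap the two SV pointers in place: no copying, and the
               reference counts stay balanced since each SV still has
               exactly one owner, the array itself */
            tmp     = *slot_a;
            *slot_a = *slot_b;
            *slot_b = tmp;
        }

In particular I do not know whether swapping the SV pointers through the
slots returned by av_fetch is legitimate with respect to reference
counting and magic, or whether I should be using av_store (with explicit
SvREFCNT_inc) instead.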

Can anyone point me in the right direction, perhaps to some XS code that
modifies an array in place, or anything else that might be useful?

Kind Regards

Dr James Freeman
