I have a fine book recommendation: "Programming Pearls" by Jon Bentley. http://www.cs.bell-labs.com/cm/cs/pearls/
He advises using simple, special-purpose tools for this kind of problem. Essentially that would be some combination of awk and sort, instead of a customized Perl XS. :)
OK, below is my hint for your problem:
PerlDiscuss - Perl Newsgroups and mailing lists wrote:
Background. We have about a billion records that we need to merge from 100
files. We have devised a customised algorithm that reduces the runtime
from weeks (using an RDBMS) to about 12 hours. The main bottleneck
currently is the need to sort an array (of arrays) composed of 100
items.
[ [ 'a_str', val ], [ 'b_str', val ], ... [ '100th_str', val ], ]
The sort occurs on the 'X_str' string ie $ary->[N]->[0]
Currently we are using Perl's native sort, which of course works fine. However, in 99% of the cases only the $ary->[0]->[0] element will be out of place; the remainder of the list will have remained sorted. Perl's native sort cannot take advantage of this, but we would like to, since on average only 50 comparisons (somewhere between 1 and 99) are then required to correctly position the element. Sort requires at least 99 comparisons every time, so this should be much faster.
Here is an implementation in perl to show what we want to achieve in case that was not clear enough
# moveFirst n=200000 Took 56 seconds 3571.43/sec
# sort      n=200000 Took 85 seconds 2352.94/sec
sub moveFirstElementIntoPosition {
    my $ref = shift;
    # if we are already in place, return
    return if $ref->[0]->[0] le $ref->[1]->[0];
    # bubble it into position
    for (1 .. $#{$ref}) {
        if ( $ref->[$_-1]->[0] le $ref->[$_]->[0] ) {
            return;
        }
        else {
            ( $ref->[$_-1], $ref->[$_] ) = ( $ref->[$_], $ref->[$_-1] );
        }
    }
}
my $s = time();
my $n = 200_000;
my @test = map { [ $_, $a++ ] } ( 'ca', 'aa' .. 'bz', 'cb' .. 'dz' );
for ( 1 .. $n ) {
    my @ary = @test;
    moveFirstElementIntoPosition( \@ary );
}
$s = time() - $s;
printf "moveFirst n=$n Took %d seconds %.2f/sec\n", $s, $n/$s;
$s = time();
for ( 1 .. $n ) {
    my @ary = @test;
    @ary = sort { $a->[0] cmp $b->[0] } @ary;
}
$s = time() - $s;
printf "sort n=$n Took %d seconds %.2f/sec\n", $s, $n/$s;
As you can see, even implemented in Perl the move algorithm beats sort by some 30%. I would expect to beat sort by 50%+ if we implemented something similar in XS. With a billion sorts even a small increase in throughput is significant; hence the desire.
If the data existed in a C struct I would have no problem implementing this, but of course it does not. Although I have used some XS to implement a number of algorithms where speed was of the essence, I have not tackled anything similar before; in other words, my XS is not currently up to it. I have had a look at sort.c in the Perl source and at the cmp function(s). I would welcome any suggestions as to the most efficient way to implement this moveFirstElementIntoPosition() function. I can see a way to do it with my limited XS by returning a new array ref, but ideally I would just like to modify the array passed to the function in place, with a void return, to avoid the overhead of copying data.
Can anyone point me in the right direction, perhaps to some XS code that modifies an array in place, or anything else that might be useful?
I also tried sort-in-place in my Tie::CArray implementation,
which is especially suited to such bulk external data.
I needed it for computational geometry problems: several hundred thousand 2d or 3d points.
http://search.cpan.org/src/RURBAN/Tie-CArray-0.12/CArray.xs
http://search.cpan.org/~rurban/Tie-CArray-0.12/CArray.pm#CLASS_METHODS
=> see isort
But my isort was never implemented, because of the funarg problem (calling back into Perl for the comparison) in XS. You can, however, take that nreverse code and do the sort in XS; no major problem.
Note: the ARBITRARY STRUCTURED ARRAYS / PACK-STYLE TEMPLATES code is also not enabled by default.
-- Reini Urban http://xarch.tu-graz.ac.at/home/rurban/