Re: [Perldl] NBabel & a speed comparison of PDL and other interpretative languages

Timothy Pickering Thu, 06 May 2010 03:25:31 -0700

> I should perhaps clarify:
> 
> The numpy version (which I have nothing to do with) was really written as a 
> teaching aid, so it is way more object oriented than necessary for this 
> problem.
> 
> It is not really a fair test as I didn't write the code so did not know how 
> they really compared - which is why I labelled the line "Python" rather numpy 
> . However the "IDL Loop" code actually uses loops for _everything_ not a 
> single vectorised operation and the remarkable fact is that the Python code 
> is a factor of 2 slower even....


yeah, it's shockingly bad.  i've done enough number crunching in python to know 
that's not representative of what numpy can actually do.  so, as a personal 
enrichment exercise, i took jarle's pdl code from nbabel and did basically a 
line-by-line translation to python/numpy (all of the SALT astro software is 
done in python/pyraf so i'm contractually obligated to perform python-fu).  it 
was useful for me since it forced me to learn a few new tricks to do things i'm 
used to doing in pdl like '=.' and in-place reassignment.  i'm attaching both 
scripts for your amusement and will try to submit my python version to nbabel.  
the python script is slightly more succinct, but otherwise very, very similar 
to the perl one (sans comments...).  the timing results (on my 2.8 GHz macbook 
pro) are as follows:

for 1024 particles:

perldl

real    3m3.579s
user    2m31.091s
sys     0m26.850s


numpy

real    2m57.499s
user    2m23.483s
sys     0m28.369s


for 512 particles: 

perldl

real    0m35.754s
user    0m30.068s
sys     0m4.662s


numpy

real    0m36.153s
user    0m28.098s
sys     0m7.134s

timings are all done using the unix 'time' command so not super accurate, but 
close enough.  the conclusion is that they're both basically just as fast when 
implemented in such a way to maximize use of vector operations.  this makes 
sense since when you do that, you're maximizing the use of pre-fab 
well-optimized C code to do your work.  it's when you fall back to doing 
interpreted loops that you'll get nailed.  it's easier to trip over that with 
numpy because numpy arrays are indexed using the same syntax as normal python 
arrays.  pdl requires $pdl->at($i) which certainly discourages me from doing 
that.  

that all said, i don't want to start a language flame-war or anything.  i've 
been using pdl for a long, long time and it's still the most natural thing for 
me to use if i need something done an hour ago.  i'm required to do python for 
work, but am enjoying it more as i become more familiar with it.  its two big 
advantages for me are matplotlib for graphics and the ability to memmap large 
data files.  the latter is key for dealing with >2 GB FITS images, for example.

tim

#!/opt/local/bin/perl

=head1 NAME

nbabel - Routine to evolve a simple N-body system using leap-frog integration.

=head1 SYNOPSIS

This consists of a number of routines needed to evolve an N-body system with a
direct force calculation using a leap-frog integrator. The routine that does the
actual
evolution is C<evolve_cluster> below which is called with the name of the file
containing the initial conditions. Thus a synopsis would be:

   evolve_cluster('input16')

=head1 DESCRIPTION

The code reads in initial conditions from a file exected to have an ID column,
mass, 3D positions and 3D velocities. These are so evolved forward with a simple
leap-frog scheme. See the description of C<evolve_cluster> below.

I have not used a fully vectorised approach - rather I have kept the x, y and z
components of the positions and velocities in a Perl array. This makes it easier
to write the code for the relative separations without going into dimension
fiddling and it makes the code more readable. There is a noticeable hit in
performance at low N because the (slow) Perl loops over the dimensions is
significant. At large N this extra overhead becomes pretty much irrelevant.

=cut

use strict;
use warnings;
use PDL;
use PDL::NiceSlice;

use Time::HiRes qw(usleep ualarm gettimeofday tv_interval);

#
# Simple N-body calculator in PDL
#

# Get the initial conditions from the command line
my $DIR = './';
my @files = ('input16', 'input32', 'input64', 'input128', 'input256', 'input512');

@files = ('input1k');
#...@files = ('input512');

foreach my $f (@files) {
    print "Doing = $f - ";
    evolve_cluster($DIR.$f);
}

=head2 evolve_cluster

Core routine to evolve the cluster. The procedure is first to read in the
initial conditions with C<read_initial_conditions>. The subsequent steps are
then

=over 4

=item Calculate the initial acceleration and energy of the system.

This is stored in Store this in C<$a0> and C<$E0> respectively.

=item Update the positions

=item Calculate the acceleration with this new position but old velocities.

=item Update velocities

=item Assign current acceleration to C<$a0> and repeat.

=item Repeat until t=1 in N-body units

=back

=cut

sub evolve_cluster {
    my $file = shift;

    my ($m, $pos, $vel) = read_initial_conditions($file);

    my $dt = 0.001;
    my ($a0, $norm) = calculate_acceleration($m, $pos);
    my $E0 = calculate_energy($m, $pos, $vel, $norm);

    my $tend = 1.0;
    my ($nsteps, $tcurrent) = (0, 0.0);

    my $t1 = [gettimeofday];
    while ($tcurrent < $tend) {
	$pos = update_position($pos, $vel, $a0, $dt);
	($a, $norm) = calculate_acceleration($m, $pos);
	$vel = update_velocity($vel, $a0, $a, $dt);
 
	$a0 = $a;
 
	$tcurrent += $dt;
	$nsteps++;
 
 
	# This keeps track of the evolution of the system - you could
	# comment this out for speed when you have very few particles but
	# it should not give a noticeable overhead for large N.
	if (($nsteps % 10) == 0) {
	    my $E = calculate_energy($m, $pos, $vel, $norm);
	    printf "t=%.4f Etot=%8.5f  dEtot=%8.5f\n", $tcurrent, $E->at(0), $E->at(0)-$E0->at(0);
	}
    }
    my $t2 = [gettimeofday];
 
 
    my $E = calculate_energy($m, $pos, $vel, $norm);
    printf "Execution time = %7.4fs  - ", tv_interval($t1, $t2);
    printf "t=%.4f Etot=%8.5f  dEtot=%10.4e\n", $tcurrent, $E->at(0), $E->at(0)-$E0->at(0);
 
}
 
#
# Utility functions below
#
sub calculate_energy {
    my ($m, $pos, $vel, $norm) = @_;
 
    my $E_kin=0.0;
    for (my $i=0; $i<3; $i++) {
	$E_kin += $vel->[$i]**2
    }
    $E_kin = sum(0.5*$m*$E_kin);
 
    my $m_ij = outer($m, $m);
    my $E_pot = $m_ij/$norm;
 
    $E_pot->diagonal(0, 1) .= 0.0; # No self-energy
    $E_pot = -0.5*sum($E_pot);
 
    return pdl($E_kin+$E_pot, $E_kin, $E_pot);
 
}
 
sub calculate_acceleration {
    my ($m, $pos) = @_;
 
    my $n_obj = $m->nelem();
 
    # Loop over dimensions
    my @diff=();
    my $norm = 0.;
    for (my $i=0; $i<3; $i++) {
	$diff[$i] = $pos->[$i]-transpose($pos->[$i]);
	$diff[$i]->diagonal(0, 1) .= 0.0; # Ignore diagonals
	$norm += $diff[$i]*$diff[$i];
    }
 
    $norm = sqrt($norm);
    $norm->diagonal(0, 1) .= 1e30;
    my $norm3 = $norm**3;
 
    my $a = [];
    for (my $i=0; $i<3; $i++) {
	my $tmp = -$m*$diff[$i]/$norm3;
	$tmp->diagonal(0, 1) .= 0;
	$a->[$i] = sumover($tmp->xchg(0, 1));
    }
    return ($a, $norm);
}
 
sub update_position {
    my ($pos, $vel, $a, $dt) = @_;
    # Loop over dimensions
    for (my $i=0; $i<3; $i++) {
	$pos->[$i] += $vel->[$i]*$dt + 0.5*$a->[$i]*$dt*$dt;
    }
 
    return $pos;
}
 
sub update_velocity {
    my ($vel, $a, $a0, $dt) = @_;
    # Loop over dimensions
    for (my $i=0; $i<3; $i++) {
	$vel->[$i] += 0.5*($a->[$i] + $a0->[$i])*$dt;
    }
    return $vel;
}
 
sub read_initial_conditions {
    my $file = shift;
    my ($id, $m, $x, $y, $z, $vx, $vy, $vz) = rcols $file;
 
    # Drop empty lines....
    my $use = which($m > 0);
    $m = $m($use;|);
    $x = $x($use;|);
    $y = $y($use;|);
    $z = $z($use;|);
    $vx = $vx($use;|);
    $vy = $vy($use;|);
    $vz = $vz($use;|);
    
    # Note we use Perl arrays  - this probably could have been handled better
    return ($m, [$x, $y, $z], [$vx, $vy, $vz]);
}

#!/opt/local/bin/python

from numpy import *
import sys

def read_initial_conditions(file):
    id, m, x, y, z, vx, vy, vz = loadtxt(file, unpack=True)
    return (m, [x,y,z], [vx,vy,vz])

def update_velocity(vel, a, a0, dt):
    for i in range(len(vel)):
        vel[i] += 0.5*(a[i] + a0[i])*dt
    return vel

def update_position(pos, vel, a, dt):
    for i in range(len(pos)):
        pos[i] += vel[i]*dt + 0.5*a[i]*dt*dt
    return pos

def calculate_acceleration(m, pos):
    n_obj = len(m)

    diff = [0,0,0]
    norm = zeros((n_obj,n_obj))
    for i in range(len(pos)):
        tran = reshape(pos[i],(n_obj,1))
        diff[i] = pos[i] - tran
        fill_diagonal(diff[i], 0.0)
        norm += diff[i]*diff[i]

    norm = sqrt(norm)
    fill_diagonal(norm, 1.0e30)
    norm3 = norm*norm*norm

    a = [0,0,0]
    for i in range(len(pos)):
        tmp = -m*diff[i]/norm3
        fill_diagonal(tmp, 0.0)
        a[i] = sum(tmp, axis=0)

    return (a, norm)

def calculate_energy(m, pos, vel, norm):
    E = zeros(len(m))
    for i in range(len(vel)):
        E += vel[i]*vel[i]
    E_kin = 0.5*sum(m*E)

    m_ij = outer(m, m)
    E = m_ij/norm
    fill_diagonal(E, 0.0)
    E_pot = -0.5*sum(E)

    return (E_kin+E_pot, E_kin, E_pot)

def evolve_cluster(file):
    (m, pos, vel) = read_initial_conditions(file)

    dt = 0.001
    (a0, norm) = calculate_acceleration(m, pos)
    E0 = calculate_energy(m, pos, vel, norm)

    tend = 1.0
    nsteps = 0
    tcurrent = 0.0

    while tcurrent < tend:
        pos = update_position(pos, vel, a0, dt)
        (a, norm) = calculate_acceleration(m, pos)
        vel = update_velocity(vel, a0, a, dt)

        a0 = a

        tcurrent += dt
        nsteps += 1

        if (nsteps % 10) == 0:
            E = calculate_energy(m, pos, vel, norm)
            print "t=%.4f Etot = %8.5f  dEtot=%8.5f" % (tcurrent, E[0], E[0]-E0[0])

    print "Done!"
    E = calculate_energy(m, pos, vel, norm)
    print "t=%.4f Etot = %8.5f  dEtot=%8.5f" % (tcurrent, E[0], E[0]-E0[0])


if __name__ == '__main__':
    file = sys.argv[1]
    evolve_cluster(file)

--
+-------------------------------------------------------------------+
| T. E. Pickering, Ph.D.         | Southern African Large Telescope |
| SALT Astronomer                |                             SAAO |
| [email protected]  (520) 305-9823 |                 Observatory Road |
| [email protected] +27(0)214606284 |   7925 Observatory, South Africa |
+-------------------------------------------------------------------+
overflow error in /dev/null

_______________________________________________
Perldl mailing list
[email protected]
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl

Re: [Perldl] NBabel & a speed comparison of PDL and other interpretative languages

Reply via email to