Primitive benchmark comparison (parsing LDIF)

Norman Gaywood Wed, 27 Oct 2021 19:15:28 -0700

Executive summary:
    - comparing raku 2021.10 with raku 2021.9
    - comparing 3 ways of parsing (although the 2 string function ways are
      similar)
    - raku 2021.10 is more than twice as fast as 2021.9 using the
      string functions
    - raku 2021.10 is about the same as 2021.9 using a more general regular
      expression
    - regular expressions are still slow in 2021.10


Side note: not shown here is also parsing with Text::LDIF. In 2021.9 it was
comparable to the regex method. Not tried with 2021.10.

I need to parse a 40K entry LDIF file.

Below is some code that uses 3 ways to parse.
There are 3 MAIN subs that differ in a few last lines of the for loop.
The loop reads the LDIF entries and populates %ldap keyed on the "uid" of
the LDIF entry.
The values of %ldap are User objects.
A %f hash is used to build the values of User on each LDIF entry.

The aim is to show the difference in timings between 3 ways of parsing the
LDIF.

The 1st MAIN (regex) uses this general regular expression to build %f:
        next unless $line ~~ m/ ^ (@attributes) ':' \s (.+) $ /;
        %f{$0} = "$1";
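
A minimal sketch of what that regex does on a single line, assuming a
made-up sample line and a hard-coded attribute list (the real script derives
the list from the User class). An array interpolated in a Raku regex acts as
an alternation of its elements, so (@attributes) captures whichever
attribute name starts the line:

```raku
# Hard-coded here for illustration; the script uses User.attributes.
my @attributes = <uid uidNumber gidNumber homeDirectory mode>;
my %f;
my $line = 'uidNumber: 1001';   # made-up sample line

# $0 is the attribute name, $1 is everything after ": "
if $line ~~ m/ ^ (@attributes) ':' \s (.+) $ / {
    %f{$0} = "$1";              # "$1" stringifies the Match object
}
say %f;
```

Lines whose attribute is not in the list simply fail to match and are
skipped.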

The "starts" MAIN uses starts-with() to build %f:
        for @attributes -> $a {
            if $line.starts-with( $a ~ ": " ) {
                %f{$a} = (split( ": ", $line, 2))[1];
                last;
            }
        }

And finally the "split" MAIN uses split(), relying on the fact that
User.new() ignores named arguments that have no matching attribute:
        ($k, $v) = split( ": ", $line, 2);
        %f{$k} = $v;
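
A small sketch of why the "split" variant can store every key it sees. The
Demo class and the two sample lines are hypothetical stand-ins, not part of
the script. The default new() takes named arguments and only uses those that
match a public attribute:

```raku
class Demo {               # hypothetical stand-in for User
    has $.uid;
    has $.mode = 0;
}

my %f;
for 'uid: alice', 'cn: Alice Example' -> $line {
    # limit of 2 keeps any later ": " inside the value
    my ($k, $v) = split( ": ", $line, 2 );
    %f{$k} = $v;
}

my $u = Demo.new( |%f );   # cn has no matching attribute and is dropped
say $u.uid;
say $u.mode;               # keeps its default, since no mode line was seen
```

The trade-off is that %f briefly holds keys that are never used, in exchange
for skipping the per-attribute checks.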

That's the difference between the MAIN()'s below. Sorry I couldn't golf it
down more.
Running the benchmarks multiple times does vary the times slightly but not
significantly.
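
One way to look at run-to-run variation is to time a block several times in
the same process. This is only a sketch: time-runs is a hypothetical helper,
and the summing loop is a placeholder workload rather than the LDIF parse.
Unlike the "now - BEGIN now" used in the script, which counts from compile
time, this measures only the call itself:

```raku
# Hypothetical helper: run &code $runs times, return the elapsed seconds.
sub time-runs( &code, Int :$runs = 3 ) {
    my @seconds = (^$runs).map: {
        my $start = now;
        code();
        now - $start;
    };
    return @seconds;
}

# Placeholder workload; substitute the parsing loop of interest.
my @t = time-runs( { my $s = 0; $s += $_ for ^100_000 }, :runs(3) );
say "min {@t.min}  max {@t.max}";
```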

Results for rakudo-pkg-2021.9.0-01:
$ ./icheck.raku regex
41391 entries by regex in 27.859560887 seconds
$ ./icheck.raku starts
41391 entries by starts-with in 5.970667533 seconds
$ ./icheck.raku split
41391 entries by split in 5.12252741 seconds

Results for rakudo-pkg-2021.10.0-01:
$ ./icheck.raku regex
41391 entries by regex in 27.833870158 seconds
$ ./icheck.raku starts
41391 entries by starts-with in 2.560101599 seconds
$ ./icheck.raku split
41391 entries by split in 2.307679407 seconds
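
The speedups can be worked out directly from the timings above. The hash
names %old and %new are just labels chosen for this sketch:

```raku
# Timings in seconds, copied from the two result listings.
my %old = regex => 27.859560887, starts => 5.970667533, split => 5.12252741;
my %new = regex => 27.833870158, starts => 2.560101599, split => 2.307679407;

# Ratio of 2021.9 time to 2021.10 time for each method.
for <regex starts split> -> $method {
    printf "%-6s %.2fx\n", $method, %old{$method} / %new{$method};
}
```

That gives roughly 2.3x for starts-with and 2.2x for split, while the regex
ratio stays at about 1.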

-------------------------------------
#!/usr/bin/env raku

class User {
    has $.uid;
    has $.uidNumber;
    has $.gidNumber;
    has $.homeDirectory;
    has $.mode = 0;

    method attributes {
        # return <uid uidNumber gidNumber homeDirectory mode>;
        # Is the order guaranteed?
        User.^attributes(:local)>>.name>>.substr(2);
    }
}

# Read user info from LDIF file
my %ldap;
my @attributes = User.attributes;

multi MAIN ( "regex", $ldif-fn = "db/icheck.ldif" ) {
    my ( %f );
    for $ldif-fn.IO.lines -> $line {
        when not $line {  # blank line is LDIF entry terminator
            %ldap{%f<uid>} = User.new( |%f );
        }
        when $line.starts-with( 'dn: ' ) { %f = () }   # dn: starts a new entry

        next unless $line ~~ m/ ^ (@attributes) ':' \s (.+) $ /;
        %f{$0} = "$1";
    }
    say "{%ldap.elems} entries by regex in {now - BEGIN now} seconds";
}

multi MAIN ( "starts", $ldif-fn = "db/icheck.ldif" ) {
    my ( %f );
    for $ldif-fn.IO.lines -> $line {
        when not $line {  # blank line is LDIF entry terminator
            %ldap{%f<uid>} = User.new( |%f );
        }
        when $line.starts-with( 'dn: ' ) { %f = () }   # dn: starts a new entry

        for @attributes -> $a {
            if $line.starts-with( $a ~ ": " ) {
                %f{$a} = (split( ": ", $line, 2))[1];
                last;
            }
        }

    }
    say "{%ldap.elems} entries by starts-with in {now - BEGIN now} seconds";
}

multi MAIN ( "split", $ldif-fn = "db/icheck.ldif" ) {
    my ( %f, $k, $v );
    for $ldif-fn.IO.lines -> $line {
        when not $line {  # blank line is LDIF entry terminator
            %ldap{%f<uid>} = User.new( |%f );   # attributes not used are ignored
        }
        when $line.starts-with( 'dn: ' ) { %f = () }   # dn: starts a new entry

        ($k, $v) = split( ": ", $line, 2);
        %f{$k} = $v;
    }
    say "{%ldap.elems} entries by split in {now - BEGIN now} seconds";
}

-- 
Norman Gaywood, Computer Systems Officer
School of Science and Technology
University of New England
Armidale NSW 2351, Australia

ngayw...@une.edu.au  http://turing.une.edu.au/~ngaywood
Phone: +61 (0)2 6773 2412  Mobile: +61 (0)4 7862 0062

Please avoid sending me Word or Power Point attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html
