Executive summary:
- Comparing raku 2021.10 with raku 2021.9.
- Comparing 3 ways of parsing (although the 2 string-function ways are
  similar).
- raku 2021.10 is more than twice as fast as 2021.9 using the string
  functions.
- raku 2021.10 is about the same speed as 2021.9 using a more general
  regular expression.
- Regular expressions are still slow in 2021.10.

Side note: I also tried parsing with Text::LDIF (not shown here). In 2021.9
it was comparable to the regex method. I have not tried it with 2021.10.

I need to parse a 40K-entry LDIF file.
Below is some code that parses it in 3 ways.
There are 3 MAIN subs that differ only in the last few lines of the for
loop.
The loop reads the LDIF entries and populates %ldap, keyed on the "uid" of
each LDIF entry.
The values of %ldap are User objects.
A %f hash collects the attribute values of each LDIF entry and is then used
to build the User object.
The aim is to show the difference in timings between the 3 ways of parsing
the LDIF.

The 1st MAIN (regex) uses this general regular expression to build %f:

    next unless $line ~~ m/ ^ (@attributes) ':' \s (.+) $ /;
    %f{$0} = "$1";

The "starts" MAIN uses starts-with() to build %f:

    for @attributes -> $a {
        if $line.starts-with( $a ~ ": " ) {
            %f{$a} = (split( ": ", $line, 2))[1];
            last;
        }
    }

And finally the "split" MAIN uses split(), and also relies on the fact that
User.new() ignores named arguments that do not match an attribute:

    ($k, $v) = split( ": ", $line, 2);
    %f{$k} = $v;

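
To illustrate why the "split" variant can get away with storing every key it
sees, here is a minimal sketch. The trimmed-down User class and the sample
values are made up for illustration; the only behaviour relied on is that the
default new() drops named arguments that have no matching public attribute.

```raku
# Hypothetical, trimmed-down User class (illustration only).
class User {
    has $.uid;
    has $.uidNumber;
    has $.mode = 0;
}

# %f as the "split" variant might build it: it also holds keys
# (cn, objectClass) for which User has no attribute.
my %f =
    uid         => 'alice',
    uidNumber   => '1001',
    cn          => 'Alice Example',
    objectClass => 'posixAccount';

# |%f flattens the hash into named arguments; the unknown ones are ignored.
my $user = User.new( |%f );

say $user.uid;         # alice
say $user.uidNumber;   # 1001
say $user.mode;        # 0 (the default, since %f has no "mode" key)
```

The other two variants filter on @attributes before storing anything, so they
never pass unknown names to new() in the first place.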
That's the only difference between the MAIN subs below. Sorry I couldn't
golf it down more.
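
As a side-by-side check, the three fragments can be run on the same few lines
to see what each leaves in %f. This is only a sketch: the sample lines, the
sub names (by-regex, by-starts, by-split) and the show() helper are
hypothetical, and the attribute list is hard-coded instead of being taken
from User.attributes.

```raku
my @attributes = <uid uidNumber gidNumber homeDirectory mode>;

# Hypothetical LDIF lines: two wanted attributes and one unwanted one.
my @sample = 'uid: alice', 'homeDirectory: /home/alice', 'cn: Alice Example';

sub by-regex(@lines) {
    my %f;
    for @lines -> $line {
        next unless $line ~~ m/ ^ (@attributes) ':' \s (.+) $ /;
        %f{$0} = "$1";
    }
    %f
}

sub by-starts(@lines) {
    my %f;
    for @lines -> $line {
        for @attributes -> $a {
            if $line.starts-with( $a ~ ": " ) {
                %f{$a} = (split( ": ", $line, 2))[1];
                last;
            }
        }
    }
    %f
}

sub by-split(@lines) {
    my %f;
    for @lines -> $line {
        my ($k, $v) = split( ": ", $line, 2);
        %f{$k} = $v;    # no filtering: every key is kept
    }
    %f
}

# Render a hash as sorted "key=value" pairs so results are easy to compare.
sub show(%f) { %f.sort.map({ "{.key}={.value}" }).join(", ") }

say show by-regex(@sample);    # homeDirectory=/home/alice, uid=alice
say show by-starts(@sample);   # homeDirectory=/home/alice, uid=alice
say show by-split(@sample);    # cn=Alice Example, homeDirectory=/home/alice, uid=alice
```

The regex and starts-with variants agree exactly; the split variant carries
the extra "cn" key, which is harmless only because User.new() ignores it.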
Running the benchmarks multiple times does vary the times slightly but not
significantly.

Results for rakudo-pkg-2021.9.0-01:

    $ ./icheck.raku regex
    41391 entries by regex in 27.859560887 seconds
    $ ./icheck.raku starts
    41391 entries by starts-with in 5.970667533 seconds
    $ ./icheck.raku split
    41391 entries by split in 5.12252741 seconds

Results for rakudo-pkg-2021.10.0-01:

    $ ./icheck.raku regex
    41391 entries by regex in 27.833870158 seconds
    $ ./icheck.raku starts
    41391 entries by starts-with in 2.560101599 seconds
    $ ./icheck.raku split
    41391 entries by split in 2.307679407 seconds

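
For reference, the speedups implied by these timings can be worked out with a
few lines of Raku. The numbers are copied from the runs above; the hash names
%old and %new are just labels chosen for this sketch.

```raku
# Timings in seconds, copied from the results above.
my %old = regex => 27.859560887, starts => 5.970667533, split => 5.12252741;   # 2021.9
my %new = regex => 27.833870158, starts => 2.560101599, split => 2.307679407;  # 2021.10

# Speedup of 2021.10 over 2021.9 for each method.
for <regex starts split> -> $method {
    printf "%-6s %.2fx\n", $method, %old{$method} / %new{$method};
}
# regex  1.00x
# starts 2.33x
# split  2.22x

# How much slower the regex variant is than split on 2021.10.
printf "regex vs split on 2021.10: %.1fx\n", %new<regex> / %new<split>;
# regex vs split on 2021.10: 12.1x
```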
-------------------------------------
#!/usr/bin/env raku
class User {
    has $.uid;
    has $.uidNumber;
    has $.gidNumber;
    has $.homeDirectory;
    has $.mode = 0;
    method attributes {
        # return <uid uidNumber gidNumber homeDirectory mode>;
        # Is the order guaranteed?
        User.^attributes(:local)>>.name>>.substr(2);
    }
}
# Read user info from LDIF file
my %ldap;
my @attributes = User.attributes;
multi MAIN ( "regex", $ldif-fn = "db/icheck.ldif" ) {
    my ( %f );
    for $ldif-fn.IO.lines -> $line {
        when not $line {    # blank line is LDIF entry terminator
            %ldap{%f<uid>} = User.new( |%f );
        }
        # dn: starts a new entry
        when $line.starts-with( 'dn: ' ) { %f = () }
        next unless $line ~~ m/ ^ (@attributes) ':' \s (.+) $ /;
        %f{$0} = "$1";
    }
    say "{%ldap.elems} entries by regex in {now - BEGIN now} seconds";
}
multi MAIN ( "starts", $ldif-fn = "db/icheck.ldif" ) {
    my ( %f );
    for $ldif-fn.IO.lines -> $line {
        when not $line {    # blank line is LDIF entry terminator
            %ldap{%f<uid>} = User.new( |%f );
        }
        # dn: starts a new entry
        when $line.starts-with( 'dn: ' ) { %f = () }
        for @attributes -> $a {
            if $line.starts-with( $a ~ ": " ) {
                %f{$a} = (split( ": ", $line, 2))[1];
                last;
            }
        }
    }
    say "{%ldap.elems} entries by starts-with in {now - BEGIN now} seconds";
}
multi MAIN ( "split", $ldif-fn = "db/icheck.ldif" ) {
    my ( %f, $k, $v );
    for $ldif-fn.IO.lines -> $line {
        when not $line {    # blank line is LDIF entry terminator
            # attributes not used are ignored
            %ldap{%f<uid>} = User.new( |%f );
        }
        # dn: starts a new entry
        when $line.starts-with( 'dn: ' ) { %f = () }
        ($k, $v) = split( ": ", $line, 2);
        %f{$k} = $v;
    }
    say "{%ldap.elems} entries by split in {now - BEGIN now} seconds";
}
--
Norman Gaywood, Computer Systems Officer
School of Science and Technology
University of New England
Armidale NSW 2351, Australia
http://turing.une.edu.au/~ngaywood
Phone: +61 (0)2 6773 2412 Mobile: +61 (0)4 7862 0062