Re: Primitive benchmark comparison (parsing LDIF)

2021-10-27 Thread Norman Gaywood
Oh, and I welcome suggestions on how I might do the task more quickly,
elegantly, differently, etc. :-)
Critiques of the code are also welcome; I suspect I still have a strong
Perl 5 accent.

On Thu, 28 Oct 2021 at 13:15, Norman Gaywood  wrote:




Primitive benchmark comparison (parsing LDIF)

2021-10-27 Thread Norman Gaywood
Executive summary:
 - comparing Raku 2021.10 with Raku 2021.9
 - comparing 3 ways of parsing (although the 2 string-function ways are
   similar)
 - Raku 2021.10 is more than twice as fast as 2021.9 using the string
   functions (about 2.2-2.3x)
 - Raku 2021.10 is about the same as 2021.9 using a more general regular
   expression
 - regular expressions are still slow in 2021.10 (roughly 11-12x slower
   than the string functions)

Side note: not shown here, I also parsed the file with Text::LDIF. In 2021.9
its speed was comparable to the regex method; I have not tried it with 2021.10.

I need to parse a 40K-entry LDIF file.

Below is some code that parses it in 3 ways.
There are 3 MAIN subs that differ only in the last few lines of the for loop.
The loop reads the LDIF entries and populates %ldap, keyed on the "uid" of
each LDIF entry.
The values of %ldap are User objects.
A %f hash collects the attribute values of each LDIF entry, which are then
passed to User.new.

The aim is to show the difference in timings between the 3 ways of parsing
the LDIF.

The 1st MAIN (regex) uses this general regular expression to build %f:
    next unless $line ~~ m/ ^ (@attributes) ':' \s (.+) $ /;
    %f{$0} = "$1";
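To see what that regex does on its own, here is a minimal sketch run over a few made-up sample lines (the lines are hypothetical; the attribute list is the same five names as the User class below). Interpolating `@attributes` into the regex produces a longest-token alternation of literal strings, so a `uidNumber:` line is credited to `uidNumber` rather than `uid`, and attributes not in the list are skipped.

```raku
# Same five names as the attributes of the User class in the post.
my @attributes = <uid uidNumber gidNumber homeDirectory mode>;
my %f;

# Hypothetical sample lines standing in for one LDIF entry.
for 'uid: alice', 'uidNumber: 1001', 'objectClass: top', 'description: a: b' -> $line {
    # @attributes interpolates as a longest-token alternation of literals.
    next unless $line ~~ m/ ^ (@attributes) ':' \s (.+) $ /;
    %f{~$0} = ~$1;   # ~ stringifies the Match objects explicitly
}

say %f<uid>;         # alice
say %f<uidNumber>;   # 1001
say %f.elems;        # 2
```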

The "starts" MAIN uses starts-with() to build %f:
    for @attributes -> $a {
        if $line.starts-with( $a ~ ": " ) {
            %f{$a} = (split( ": ", $line, 2))[1];
            last;
        }
    }
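The inner loop of the "starts" variant can be pulled out into a small sub for testing in isolation. This is only a sketch; `attr-by-starts-with` is a hypothetical name, not part of the original script. Note that the `": "` suffix is what stops `uid` from matching a `uidNumber: ...` line.

```raku
my @attributes = <uid uidNumber gidNumber homeDirectory mode>;

# Hypothetical helper wrapping the inner loop of the "starts" MAIN:
# returns attribute => value, or Nil if the line holds another attribute.
sub attr-by-starts-with(Str $line --> Pair) {
    for @attributes -> $a {
        # The ": " suffix keeps "uid" from matching a "uidNumber: ..." line.
        return $a => $line.split(': ', 2)[1] if $line.starts-with($a ~ ': ');
    }
    Nil
}

say attr-by-starts-with('uidNumber: 1001');    # uidNumber => 1001
say attr-by-starts-with('objectClass: top');   # Nil
```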

And finally, the "split" MAIN uses split(), relying on the fact that
User.new() silently ignores named arguments that don't correspond to any
attribute of User:
    ($k, $v) = split( ": ", $line, 2);
    %f{$k} = $v;
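The two behaviours the "split" variant depends on can be checked directly. A minimal sketch with a cut-down User class (only two of the post's five attributes, for brevity) and made-up values: the limit of 2 splits only on the first `": "`, and the default constructor drops named arguments such as `objectClass` that match no attribute.

```raku
class User {
    has $.uid;
    has $.homeDirectory;
}

# Limit 2: split only on the first ": ", so a value containing ": " survives.
my ($k, $v) = split(': ', 'description: note: keep this', 2);
say $k;   # description
say $v;   # note: keep this

# The default new() ignores named arguments that match no attribute.
my %f = uid => 'alice', objectClass => 'posixAccount', homeDirectory => '/home/alice';
my $u = User.new(|%f);
say $u.uid;   # alice
```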

Those are the only differences between the MAIN()s below. Sorry I couldn't
golf it down more.
Running the benchmarks multiple times varies the times slightly, but not
significantly.
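One way to put a number on that run-to-run variation is to time the same block several times and report the spread. A minimal sketch, where `time-runs` is a hypothetical helper and a toy workload stands in for one parsing run:

```raku
# Hypothetical helper: run &code $n times, returning each run's wall-clock
# time in seconds (measured with `now`, as the script below does).
sub time-runs(&code, Int $n = 5) {
    (^$n).map({ my $t0 = now; code(); now - $t0 }).List
}

# Toy workload standing in for one parsing run.
my @t = time-runs({ my $s = 0; $s += $_ for ^100_000 }, 5);
say "min {@t.min.fmt('%.3f')} s, max {@t.max.fmt('%.3f')} s over {+@t} runs";
```

Reporting the minimum alongside the maximum makes it easier to tell a real version-to-version change from noise.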

Results for rakudo-pkg-2021.9.0-01:
$ ./icheck.raku regex
41391 entries by regex in 27.859560887 seconds
$ ./icheck.raku starts
41391 entries by starts-with in 5.970667533 seconds
$ ./icheck.raku split
41391 entries by split in 5.12252741 seconds

Results for rakudo-pkg-2021.10.0-01:
$ ./icheck.raku regex
41391 entries by regex in 27.833870158 seconds
$ ./icheck.raku starts
41391 entries by starts-with in 2.560101599 seconds
$ ./icheck.raku split
41391 entries by split in 2.307679407 seconds
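The ratios in the summary can be recomputed from these timings. A small sketch; the numbers are copied verbatim from the two result sets above, and nothing new is measured:

```raku
# Seconds per method, copied from the rakudo-pkg results above.
my %t =
    '2021.9'  => %( regex => 27.859560887, starts => 5.970667533, split => 5.12252741 ),
    '2021.10' => %( regex => 27.833870158, starts => 2.560101599, split => 2.307679407 );

for <regex starts split> -> $m {
    printf "%-6s 2021.9 / 2021.10 = %.2fx\n", $m, %t{'2021.9'}{$m} / %t{'2021.10'}{$m};
}
printf "regex / split in 2021.10 = %.1fx\n", %t{'2021.10'}<regex> / %t{'2021.10'}<split>;

# regex  2021.9 / 2021.10 = 1.00x
# starts 2021.9 / 2021.10 = 2.33x
# split  2021.9 / 2021.10 = 2.22x
# regex / split in 2021.10 = 12.1x
```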

-
#!/usr/bin/env raku

class User {
    has $.uid;
    has $.uidNumber;
    has $.gidNumber;
    has $.homeDirectory;
    has $.mode = 0;

    method attributes {
        # return ;
        User.^attributes(:local)>>.name>>.substr(2);  # Is the order guaranteed?
    }
}

# Read user info from LDIF file
my %ldap;
my @attributes = User.attributes;

multi MAIN ( "regex", $ldif-fn = "db/icheck.ldif" ) {
    my %f;
    for $ldif-fn.IO.lines -> $line {
        when not $line {  # blank line is LDIF entry terminator
            %ldap{%f<uid>} = User.new( |%f );
        }
        when $line.starts-with( 'dn: ' ) { %f = () }   # dn: starts a new entry

        next unless $line ~~ m/ ^ (@attributes) ':' \s (.+) $ /;
        %f{$0} = "$1";
    }
    say "{%ldap.elems} entries by regex in {now - BEGIN now} seconds";
}

multi MAIN ( "starts", $ldif-fn = "db/icheck.ldif" ) {
    my %f;
    for $ldif-fn.IO.lines -> $line {
        when not $line {  # blank line is LDIF entry terminator
            %ldap{%f<uid>} = User.new( |%f );
        }
        when $line.starts-with( 'dn: ' ) { %f = () }   # dn: starts a new entry

        for @attributes -> $a {
            if $line.starts-with( $a ~ ": " ) {
                %f{$a} = (split( ": ", $line, 2))[1];
                last;
            }
        }
    }
    say "{%ldap.elems} entries by starts-with in {now - BEGIN now} seconds";
}

multi MAIN ( "split", $ldif-fn = "db/icheck.ldif" ) {
    my ( %f, $k, $v );
    for $ldif-fn.IO.lines -> $line {
        when not $line {  # blank line is LDIF entry terminator
            %ldap{%f<uid>} = User.new( |%f );  # attributes not used are ignored
        }
        when $line.starts-with( 'dn: ' ) { %f = () }   # dn: starts a new entry

        ($k, $v) = split( ": ", $line, 2);
        %f{$k} = $v;
    }
    say "{%ldap.elems} entries by split in {now - BEGIN now} seconds";
}
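Since suggestions were invited: one further variant that might be worth benchmarking combines the "split" approach with a Set of wanted attribute names, so each line costs one split and one Set lookup, and it also stores the final entry when the file does not end with a blank line. This is a sketch whose speed has not been measured; `parse-ldif` and its calling convention are hypothetical. Like the three MAIN variants, it does not handle RFC 2849 folded continuation lines or base64 (`attr:: ...`) values.

```raku
class User {
    has $.uid;
    has $.uidNumber;
    has $.gidNumber;
    has $.homeDirectory;
    has $.mode = 0;
}

# Wanted attribute names, derived from the class as in the post; a Set
# gives a single lookup per line instead of a loop over @attributes.
my $wanted = User.^attributes(:local).map(*.name.substr(2)).Set;

# Hypothetical helper: accepts any list of lines and returns a Hash of
# User objects keyed on uid. Entries are assumed to be separated by blank
# lines, as LDIF requires.
sub parse-ldif(@lines --> Hash) {
    my %ldap;
    my %f;

    # Store the pending entry (if it has a uid) and start a fresh one.
    my sub store-entry() {
        %ldap{%f<uid>} = User.new(|%f) if %f<uid>:exists;
        %f = ();
    }

    for @lines -> $line {
        if $line eq '' { store-entry; next }
        my ($k, $v) = $line.split(': ', 2);
        %f{$k} = $v if $v.defined && $k (elem) $wanted;
    }
    store-entry;   # the last entry may not be followed by a blank line
    %ldap
}

# Usage (path taken from the post):
# my %ldap = parse-ldif('db/icheck.ldif'.IO.lines);
```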

-- 
Norman Gaywood, Computer Systems Officer
School of Science and Technology
University of New England
Armidale NSW 2351, Australia

ngayw...@une.edu.au  http://turing.une.edu.au/~ngaywood
Phone: +61 (0)2 6773 2412  Mobile: +61 (0)4 7862 0062

Please avoid sending me Word or Power Point attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html


Re: ftp client yet?

2021-10-27 Thread Ralph Mellor
> On 10/25/21 22:21, Ralph Mellor wrote:
> >
> > You should be aiming to end up being able to write
> > something like the following three line Raku program:
> >
> > use lib:from 'dir-where-your-new-module-is';
> > use your-new-module:from;
> > RmdirAndLoop 'junk', 1, 3;

And when that's working, aim at just two lines:

use your-new-module:from;
RmdirAndLoop 'junk', 1, 3;

(i.e. the `use lib:from 'dir-where-your-new-module-is';` line
is untidy. You can make that statement redundant by simply
putting your module in one of your Perl's `@INC` directories.)

If you get it down to two lines, you could then reasonably ask
how you might reduce it to one:

RmdirAndLoop 'junk', 1, 3;

That would make sense if you wanted the module containing this function
to be loaded automatically every time Rakudo starts, so that you never
have to load it explicitly but instead always have it, and its
functions, available by default.

Rather than explain how to do that now (it's not as simple as it could
be), I'll leave that refinement until you've got it down to two lines.

> love, -T

:)