A little while ago I developed a Raku program to look up file ownership in a Github CODEOWNERS file. My first attempt worked by converting the glob-like CODEOWNERS patterns into regexes, eg:
app/views/**/*.rb --> /^ 'app/views/' [<-[/]>+]* % '/' '/' <-[/]>* '.rb' $/ Surprisingly, this approach was so slow as to be a non-starter, no matter how I tweaked it. So, I wrote another implementation that uses a recursive subroutine to compare path components against the CODEOWNERS patterns. Originally it was a series of multi subs, but I recast it as a single sub when I thought that might make it faster: sub matches(@strs, @patterns) { if @patterns ~~ ('',) { True; } elsif @patterns ~~ ('**', **) { ?grep { matches @strs[$_..*], @patterns[1..*] }, @strs.keys; } elsif @strs && @patterns { my $str = @strs[0]; my $pattern = @patterns[0]; do if $pattern.contains('*') { my @words = $pattern.split('*'); $str.starts-with(@words[0]) && $str.ends-with(@words[*-1]) && sub has-words($start, [$word, *@rest]) { my $n = $str.index($word, $start); $n.defined && (!@rest || has-words($n + $word.chars, @rest)) }(0, @words); } else { $str eq $pattern; } && matches @strs[1..*], @patterns[1..*]; } else { @strs == @patterns == 0; } } Instead of using a regex to match the asterisk wildcard pattern, I split the pattern into asterisk-delimited chunks, then check that the test string starts with the first chunk, that it ends with the last one, and that the chunks occur in the string in order and without overlapping. (Real CODEOWNERS patterns can have a question mark wildcard, but my company doesn't use any and so I've omitted that case here for simplicity.) This seemed to work well enough for me. It could look up a single filename in our ≈900 line CODEOWNERS file in about a second. More than that to look up all of the files in a modest-sized pull request, but still tolerable. Then I decided I wanted to share my work with the rest of my company. The high-level language of choice here is Ruby, so I translated the above code into it and tried it out. I expected rough parity, but the Ruby version *killed* the original Raku version. Checking the ownership of a hundred files takes the Ruby version about a quarter of a second; Raku takes *ten times as long.* I was flabbergasted. Although it seemed futile, I looked for ways to speed up the Raku version. Surprisingly, I found one right away. I've rarely employed smart-matching with arrays as I do with @patterns in the first two branches above, but to make sure the Raku version had exactly the same logic as in Ruby, I rewrote them: if @patterns ~~ ('',) --> if @patterns == 1 && @patterns[0] eq '' if @patterns ~~ ('**', **) --> if @patterns && @patterns[0] eq '**' This change cuts the execution time in *half*. Apparently list smart-matching is *really, really expensive!* So that was a big win, but Raku still took five times as long as the Ruby version. I couldn't see anything else obvious to try in the matches subroutine, so I looked at the main loop over the CODEOWNERS lines. Cut down a bit, it looks like this: for $repo-path.add('CODEOWNERS').lines.kv -> $n, $line { my ($pattern, @owners) = $line.words; my @patterns = $pattern.split('/'); for %paths.kv -> $file, @path { if matches(@path, @patterns) { # record who owns this file } } } I tried skipping the bulk of the work entirely by changing if matches(@path, @patterns) to if False. Of course this shortens the execution time by a lot...but it still takes twice as long as the Ruby version that does all of the matching work. So, yeah. This was a very disappointing exercise. I'm posting it here to share my experience and to vent a bit, but if anyone can see anything I missed that could significantly pare down the execution time, I'm all ears.