Sorry, I was running the profile on a 4-week-old rakudo. After the optimizations I did to canonpath ~22 days ago, the canonpath inclusive time went down to about 18% ...
FILETEST-D and FILETEST-F are in spots 3 and 4, but they only take 3594 / 26881 msec and 2749 / 216298 msec per invocation (roughly 0.13 and 0.013 msec), so they are just called really, really often. Next up is the loop from dir(), the one inside the gather. It sits at 10.44% inclusive time and 3.16% exclusive time. dir itself takes 17.8% inclusive time and 3.12% exclusive time, which suggests it has a bit of overhead that I'm not quite sure where to find. match is up next at about 8.8% inclusive and 2.91% exclusive. I'm not sure where exactly regex matches happen here; maybe part of it comes from canonpath. AUTOTHREAD is at 12% inclusive time, which I find interesting. Perhaps the reason is the default test for dir, which is a none-junction. Perhaps we can use a little closure instead that compares against . and .. with eq/ne, rather than going through a somewhat more expensive junction autothread. Those are all my thoughts for now.
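A minimal sketch of the closure idea, to make the comparison concrete. The variable names ($junction-test, $closure-test, @names) are made up for illustration and are not taken from the rakudo source. It assumes the test is smartmatched against plain entry names, and I have not measured whether it actually helps:

```raku
# Hypothetical names throughout; this is not the actual dir() implementation.

# What the default test looks like today: a none-junction.
# Smartmatching a name against it goes through junction autothreading.
my $junction-test = none('.', '..');

# Proposed alternative: a plain closure doing two string comparisons.
# It returns True for every name that is neither '.' nor '..'.
my $closure-test = -> $name { $name ne '.' && $name ne '..' };

# Both tests should keep the same entries from a sample list of names.
my @names = '.', '..', 'lib', 't';
my @via-junction = @names.grep({ so $_ ~~ $junction-test });
my @via-closure  = @names.grep($closure-test);

say @via-junction eqv @via-closure;
```

If the two behave the same, the closure could be passed explicitly as dir($path, test => $closure-test) to compare profiles before touching the default.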