On 4/25/19 8:05 PM, enh wrote:
> having written a parallel grep in Java almost 20 years ago, mmap isn't

I wrote a glob implementation for OS/2's installer in 1996. Been there. :)

> an obvious win. it's very dependent on what you're searching (number
> and size of files), and to some extent what you're searching for.

iirc Linus had a rant on this over a decade ago. He came down on the side of doing page size read/write in a loop, if I recall. (Which you'll notice my code doing a lot: toybuf and libbuf are one page each for a reason. I should probably do magic to make sure they're page aligned, but see "wanted to hit 1.0 before opening performance optimization can of worms").

It's likely the optimal granularity's changed since, though. Disk sector sizes went from 512 to 4096, for one thing; this may even have been shortly before sata disk I/O became the norm. (And before usb3.)

This is also why I want to do the one big mmap to avoid mremap(), and I'm also not sure MADV_SEQUENTIAL is a win. Need to benchmark it. (Especially with long lines and multiple regexes on a system doing something else where lines cross a page boundary. Even just putting the page on the clean list and soft-faulting it back in is expensive...)

For the smaller ones, yes reading into a buffer is preferable. But what I _really_ want is "start DMA-ing data into this userspace buffer but don't block yet" and then later a "block until outstanding reads finish" so I can overlap processing and I/O. The nonblocking read and direct access APIs both do the opposite of that. I'd hoped to find a preadv2() flag for this, but no. (And where would the corresponding wait() interface live? An ioctl?)

I _can_ get that from mmap() with MADV_WILLNEED, assuming the mapping overhead doesn't dominate. Possibly there's a minimum file length at which it's worth it, and it should fall back to treating it as a pipe below that?

This is why I didn't want to open this can of worms yet. It HAS NO BOTTOM.

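A minimal sketch of the idea above, not toybox code: map large regular files in one go and hint MADV_WILLNEED (which on Linux typically starts readahead without waiting for it), and fall back to a page-sized read() loop for small files, pipes, or a failed mmap(). The names count_lines(), count_lines_path() and MMAP_THRESHOLD are made up for illustration, the 1MB cutoff is a guess that would need benchmarking, and counting newlines stands in for the real per-line work.

```c
#define _DEFAULT_SOURCE  // for madvise() when compiling with -std=c99/c11
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical cutoff: smaller files just get read() into a buffer.
#define MMAP_THRESHOLD (1024*1024)

// Count newlines in a buffer (placeholder for matching lines).
static long count_nl(const char *buf, size_t len)
{
  long count = 0;
  size_t i;

  for (i = 0; i < len; i++) if (buf[i] == '\n') count++;

  return count;
}

// Count newlines in a freshly opened fd. Returns -1 on error.
long count_lines(int fd)
{
  struct stat st;
  char buf[4096];
  long count = 0;
  ssize_t len;

  if (fstat(fd, &st)) return -1;

  if (S_ISREG(st.st_mode) && st.st_size >= MMAP_THRESHOLD) {
    char *map = mmap(0, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);

    if (map != MAP_FAILED) {
      // Advisory only: asks the kernel to start paging the file in.
      madvise(map, st.st_size, MADV_WILLNEED);
      count = count_nl(map, st.st_size);
      munmap(map, st.st_size);

      return count;
    }
    // mmap() failed: fall through and treat it like a pipe.
  }

  // One page at a time, same as the read loop discussed above.
  while ((len = read(fd, buf, sizeof(buf))) > 0)
    count += count_nl(buf, (size_t)len);

  return len < 0 ? -1 : count;
}

// Convenience wrapper: open, count, close. Returns -1 on error.
long count_lines_path(const char *path)
{
  long count;
  int fd;

  assert(path);
  if ((fd = open(path, O_RDONLY)) < 0) return -1;
  count = count_lines(fd);
  close(fd);

  return count;
}
```

Whether the mmap path beats the read loop at any given size is exactly the open benchmarking question; the sketch only shows where the heuristic would sit.
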
> https://blog.burntsushi.net/ripgrep/ talks about this with much more
> modern data, and they did actually go with a heuristic to guess when
> they should/shouldn't use mmap. (glibc's stdio will use mmap in some
> circumstances. i don't know of any other libc that does.)

My phone battery died (the new laptop doesn't recognize it as a device, thus doesn't up the USB port's power high enough to charge it; I didn't know that LED could flash red).

> what NetBSD grep (which is what we're coming from) _does_ spend a lot
> of code on is avoiding regexec(3). they recognize easy cases (like
> "you could have used -F") and handle them more cheaply. in an ideal
> world, that would be in the regex implementation, but never having
> tried i don't actually know how realistic that is. (if i had the time,
> i'd love to compare against RE2.)

Whatever tests netbsd is doing on the string it's not passing to regcomp could also be done in regcomp. 95% likely there. The question is why they didn't in _their_ libc. Haven't looked at it.

(For licensing reasons I mostly don't look at other implementations' code when doing a toybox implementation of something. I try hard to clean-room it from RFCs and man pages and wikipedia and strace and so on. I spent a decade in the intellectual property mines and now have a certain "hazmat suit" mentality about the lot of it. NetBSD isn't the kind of full-on bugnuts crazy the FSF and SFC evince, but I'm not reintroducing the license stuttering problem to my code either.)

That said if somebody _else_ wanted to look and tell me, that's how "clean room" works. :)

>>> I might even be able to get it to mmap() and mremap() the data traversing down a
>>> file without too intrusive a second path. (Does madvise(MADV_WILLNEED) persist
>>> across mremaps of the same mapping, or do I need to call it again?)

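For the "you could have used -F" case mentioned above, a rough sketch of what such a check could look like. This is not NetBSD's code (not consulted) and not toybox's; is_plain_string() and line_matches() are hypothetical names. The test is deliberately conservative: any character that is special in either basic or extended POSIX regex syntax sends the pattern to regcomp(), so the substring path only runs when it can't change the answer. Flags like -i are ignored here.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

// Returns 1 if pattern contains no BRE or ERE metacharacters, so a plain
// substring search gives the same result as regexec().
int is_plain_string(const char *pattern)
{
  assert(pattern);

  // strcspn() stops at the first special character, or at the final NUL.
  return !pattern[strcspn(pattern, ".[]\\*^$()|+?{}")];
}

// Returns 1 on match, 0 on no match, -1 if the pattern doesn't compile.
int line_matches(const char *pattern, const char *line)
{
  regex_t re;
  int rc;

  // Cheap path: no regex engine involved at all.
  if (is_plain_string(pattern)) return strstr(line, pattern) != 0;

  // Slow path: basic regex, match/no-match only (no offsets needed).
  if (regcomp(&re, pattern, REG_NOSUB)) return -1;
  rc = regexec(&re, line, 0, 0, 0);
  regfree(&re);

  return !rc;
}
```

A real grep would classify and compile each pattern once up front rather than per line; doing it inside line_matches() just keeps the sketch short. The same test could equally live inside regcomp(), which is the point made above.
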
I note that my short-term todo pile is starting to teeter and what I should _really_ do is clean up the new man implementation and find the gzip second file bug and finish deflate and the new mkfs.vfat and FINALLY GET STARTED ON THE TOYSH REDO...

Rob

P.S. Yes the new tryfile() function I added to man.c thrashes the negative dentry cache, but if _man_ winds up being performance critical we shouldn't have one. And I couldn't test the original on my system because A) the supplied test wants to write into /usr/share/man which ain't happening, B) it doesn't understand the ".gz" entry and all the devuan man pages are gzipped because it's the 80/20 of archivers.
