On 5/26/23 15:43, enh wrote:
> > > what the kernel _actually_ does though is clamp to MAX_RW_COUNT. which is
> > > actually (INT_MAX & PAGE_MASK). which i'm assuming changes for a non-4KiB
> > > page kernel?
> >
> > I don't think any of my test images have a PAGE_SHIFT other than 12? (Looks
> > like Alpha, OpenRisc, and 64 bit Sparc are the only 3 architectures that
> > CAN'T use a 4k page size, and none of those are exactly load bearing these
> > days.)
>
> (not relevant in this context, but darwin/arm64 is 16KiB. people do keep
> trying 64KiB linux/arm64, and one of these days they might succeed.)
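For the record, that clamp is easy to eyeball from userspace. A quick sketch
of the kernel's formula (include/linux/fs.h defines MAX_RW_COUNT as
INT_MAX & PAGE_MASK, and PAGE_MASK is just ~(PAGE_SIZE-1)):

  #include <limits.h>
  #include <stdio.h>

  int main(void)
  {
    int shift;

    /* PAGE_SHIFT 12, 14, 16 = 4KiB, 16KiB, 64KiB pages */
    for (shift = 12; shift <= 16; shift += 2) {
      long mask = ~((1L<<shift)-1);

      printf("PAGE_SHIFT=%d MAX_RW_COUNT=%#lx\n", shift, INT_MAX & mask);
    }

    return 0;
  }

Which prints 0x7ffff000 for 4KiB pages, 0x7fffc000 for 16KiB, and 0x7fff0000
for 64KiB, so yes: the cap moves with the page size.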
I remember the litany of ouch from back when Alpha had a forced 8k page size
and was thus weird. Among other things, you could only mount certain members
of the ext2 filesystem family on it. It's one of those "they've had all the
time in the world since to fix this" meets "there is zero regression testing
so this will bit-rot tremendously"...

I also remember the FUN corner case with QEMU application emulation where host
and target page sizes differed and mmap() system call translation had to
figure out what to do with the leftover bit at the end of the last page. And I
watched YEARS of Mel Gorman trying to make transparent hugepages work...
*shrug* Not necessarily relevant to modern times, it's entirely possible it
all got fixed and is reliable now and I didn't get the memo. But I may have
developed a tendency to just make 4k page size work and then wait for somebody
to complain. :)

> > Halving the number of output system calls would theoretically save you
> > around 0.015 seconds on a 10 year old laptop.
> >
> > So why does it have a ~20% impact on the kernel's throughput? The kernel's
> > cap isn't even cleanly a power of 2. Maybe the kernel is using 2 megabyte
> > huge pages internally in the disk cache, and the smaller size is causing
> > unnecessary copying? Is 1<<29 slower or faster than 1<<30? I didn't think
> > letting something else get in there and seek was a big deal on ssd? Maybe
> > a different hardware burst transaction size?
> >
> > This isn't even "maybe zerocopy from a userspace buffer above a certain
> > size keeps the userspace process suspended so read and write never get to
> > overlap" territory: there's no userspace buffer. This is "give the kernel
> > two filehandles and a length and let it sort it out". We tell it what to
> > do in very abstract terms. In theory the ENTIRE COPY OPERATION could be
> > deferred by the filesystem, scheduling it as a big journal entry to update
> > extents. On something like btrfs it could be shared extents behind the
> > scenes. What is going ON here?
>
> excellent questions that should have occurred to me.

I break everything, and keep having to clean up after myself.

> i _think_ what happened is that my VM got migrated to a machine with
> different performance. i'm now reliably 25s for everyone. (previously my
> coreutils testing had been on one day and my toybox on the next.)
>
> so, yeah, changing toybox here makes no noticeable difference.
>
> (on a related note, is there a clever way to make a 16GiB file without
> dealing with dd? i used truncate and then cp to de-sparse it, but i was
> surprised there wasn't a truncate flag for a non-sparse file. i guess it's
> pretty niche though?)

truncate(1) is a wrapper for truncate(2); the system call you want is
posix_fallocate(), which hasn't got a command line wrapper I'm aware of...
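If you don't mind a dozen lines of C, something like this minimal sketch would
do it (made-up filename, and you need a 64-bit off_t or 16GiB won't fit):

  /* create a 16GiB file with the blocks actually allocated (not sparse) */
  #define _FILE_OFFSET_BITS 64 /* 64-bit off_t on 32-bit hosts */
  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>

  int main(void)
  {
    int err, fd = open("big.img", O_RDWR|O_CREAT, 0644);

    if (fd == -1) return perror("open"), 1;
    /* posix_fallocate() returns the error instead of setting errno */
    if ((err = posix_fallocate(fd, 0, 16LL<<30)))
      return fprintf(stderr, "fallocate: %s\n", strerror(err)), 1;

    return close(fd);
  }

(Note that glibc's posix_fallocate() falls back to manually writing out the
blocks when the filesystem can't do it the fast way, which becomes relevant
in a minute.)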
Oh look, they added one to util-linux. With 8 gazillion command line options
for the linux-specific fallocate(2) syscall, because of course they did.
(Collapse range is only supported on ext4, you say? What a good thing to
expose to userspace...)

Oh goddess, the -x flag. If the underlying filesystem doesn't support the
syscall to do it the fast way, FAIL BY DEFAULT unless you provide a flag to
fall back to doing it the slow way. Imagine if cp worked that way, so you
needed to say -x if sendfile() wasn't supported. Having an -X to fail if you
can't do the fast path makes sense, but failing by default is... ow.

Why is -l an option?

  $ fallocate walrus
  fallocate: no length argument specified

I mean seriously, WHY IS THIS AN OPTION? Why is it not "always argument #1",
and then you can go offset:len if you want to start later in the file instead
of having a separate -o? What is WRONG with...

  $ fallocate one two
  fallocate: unexpected number of arguments

It doesn't even do "FILE..." but instead works on EXACTLY ONE...

       -n, --keep-size
              Do not modify the apparent length of the file. This may
              effectively allocate blocks past EOF, which can be removed with
              a truncate.

Oh look, a new way to damage filesystems I hadn't even thought of. (This 37
byte README file is eating 2 gigs of disk space. How droll...)

Ahem. Yes, there's a way to do it. Yes I can add it. I may need a bit of a
walk first. And possibly a muffin.

Rob

_______________________________________________
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net