On Fri, May 26, 2023 at 10:26 AM Rob Landley <r...@landley.net> wrote:
>
> On 5/25/23 19:08, enh via Toybox wrote:
> > so i finally enabled copy_file_range for the _host_ toybox because someone
> > pointed out that we were copying 16GiB zip files around in the build, and
> > even though obviously we should stop doing that, 30s seemed unreasonable,
> > and coreutils cp "only" took 20s because of copy_file_range.
>
> Hardlinking them is not an option? :)
>
> > but toybox cp with copy_file_range still takes 25s. why?
> >
> >   if (bytes<0 || bytes>(1<<30)) len = (1<<30);
> >
> > the checkin comment being:
> >
> >   Update comments and add "sanity check" from kernel commit f16acc9d9b376.
> >   (The kernel's been doing this since 2019, but older kernels may not, so...)
>
> The problem being that _before_ that commit, too big a sendfile didn't work
> right (returned an error from the kernel?). I suspect my range check was just
> the largest power of 2 that fit in the constraint...
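(for context for anyone following along: the pattern being discussed is
roughly the loop below. this is my paraphrase from memory, not the actual
toybox source; copy_clamped() is just a made-up name for illustration.)

#define _GNU_SOURCE
#include <unistd.h>

// copy "bytes" bytes (-1 = until EOF) from fd "in" to fd "out", never
// asking the kernel for more than 1GiB per call (the clamp in question).
static long long copy_clamped(int in, int out, long long bytes)
{
  long long total = 0;

  while (bytes<0 || total<bytes) {
    long long len = bytes-total;
    ssize_t wrote;

    // the cap being discussed: never request more than 1<<30 per call,
    // because pre-2019 kernels misbehaved on requests above MAX_RW_COUNT.
    if (bytes<0 || len>(1<<30)) len = 1<<30;

    wrote = copy_file_range(in, NULL, out, NULL, len, 0);
    if (wrote<0) return -1;  // caller falls back to plain read()/write()
    if (!wrote) break;       // EOF
    total += wrote;
  }

  return total;
}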
is that true? the diff for that commit makes it look like it internally
silently used `min(MAX_RW_COUNT, len)`, which should be fine with the usual
"subtract what was actually written" logic (roughly the loop sketched at the
end of this mail)?

(libc++ just started to use copy_file_range(), and i asked whether they knew
about this limit, and then couldn't explain why toybox has a special case...)

> > what the kernel _actually_ does though is clamp to MAX_RW_COUNT. which is
> > actually (INT_MAX & PAGE_MASK). which i'm assuming changes for a non-4KiB
> > page kernel?
>
> I don't think any of my test images have a PAGE_SHIFT other than 12? (Looks
> like Alpha, OpenRisc, and 64 bit Sparc are the only 3 architectures that
> CAN'T use a 4k page size, and none of those are exactly load bearing these
> days.)
>
> But I wouldn't have expected it to be that much slower given the block size
> here is a megabyte, and the number of transactions being submitted... 16 gigs
> done a megabyte at a time is 16k system calls, which is:
>
> $ cat hello2.c
> #include <stdio.h>
>
> int main(int argc, char *argv[])
> {
>   int i;
>
>   for (i = 0; i<16384; i++) dprintf(1, " ");
> }
> $ gcc hello2.c
> $ strace ./a.out 2>&1 | grep write | wc -l
> 16384
> $ time ./a.out | wc
>       0       0   16384
>
> real    0m0.033s
> user    0m0.012s
> sys     0m0.043s
>
> Halving the number of output system calls would theoretically save you around
> 0.015 seconds on a 10 year old laptop.
>
> So why does it have a ~20% impact on the kernel's throughput? The kernel's cap
> isn't even cleanly a power of 2. Maybe the kernel is using 2 megabyte huge
> pages internally in the disk cache, and the smaller size is causing
> unnecessary copying? Is 1<<29 slower or faster than 1<<30? I didn't think
> letting something else get in there and seek was a big deal on ssd? Maybe a
> different hardware burst transaction size?
>
> This isn't even "maybe zerocopy from a userspace buffer above a certain size
> keeps the userspace process suspended so read and write never get to overlap"
> territory: there's no userspace buffer. This is "give the kernel two
> filehandles and a length and let it sort it out". We tell it what to do in
> very abstract terms. In theory the ENTIRE COPY OPERATION could be deferred by
> the filesystem, scheduling it as a big journal entry to update extents. On
> something like btrfs it could be shared extents behind the scenes. What is
> going ON here?
>
> > sadly 2019 is only 4 years ago, so there's a decent chunk of the 7 year
> > rule left to run out...
>
> I'm happy to change it, but I'd like to understand what's going on? We can
> switch to the kernel's exact size cap (assuming sysconf(_SC_PAGE_SIZE) is
> reliable), but _why_ is that magic number we had to get by reading the kernel
> source faster? We're handing this off to the kernel so it deals with the
> details and _avoids_ this sort of thing...
>
> (Why the kernel guys provided an API that can't handle O_LARGEFILE from 2001,
> I couldn't tell you...)
>
> Rob
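p.s. to make the "subtract what was actually written" part concrete, here's
the loop shape i mean, with the kernel's per-call cap reproduced from the page
size. this is a sketch, not a patch: max_rw_count() and copy_all() are made-up
names, and it assumes sysconf(_SC_PAGE_SIZE) really does match the kernel's
PAGE_SIZE (and a glibc new enough to declare copy_file_range).

#define _GNU_SOURCE
#include <limits.h>
#include <unistd.h>

// The kernel's per-call cap: MAX_RW_COUNT = INT_MAX & PAGE_MASK, i.e. INT_MAX
// rounded down to a page boundary. Assumes _SC_PAGE_SIZE matches PAGE_SIZE.
static size_t max_rw_count(void)
{
  size_t page = sysconf(_SC_PAGE_SIZE);

  return INT_MAX & ~(page-1);
}

// Ask for everything that's left (capped so it fits the syscall's size_t
// argument) and loop on short returns, subtracting what was actually written.
static long long copy_all(int in, int out, long long bytes)
{
  long long done = 0;
  size_t cap = max_rw_count();

  while (done<bytes) {
    size_t len = (bytes-done > (long long)cap) ? cap : bytes-done;
    ssize_t got = copy_file_range(in, NULL, out, NULL, len, 0);

    if (got<0) return -1;   // caller falls back to plain read()/write()
    if (!got) break;        // unexpected EOF (file shrank under us)
    done += got;
  }

  return done;
}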