On 4/27/24 20:44, Oliver Webb via Toybox wrote:
> Doing minimal linux system setup with mkroot and trying to create a minimal
> environment with a native toolchain to run autoconf in. This would mean
> getting the native static toolchain for my architecture from
> https://landley.net/toybox/downloads/binaries/toolchains/latest/.
> Mounting the image (Why are cross compilers tarballs while native compilers
> are fs images?
Copying the native compiler into the initramfs takes more space than
initramfs can comfortably hold. The run-qemu.sh in mkroot defaults to -m 256
(I.E. 256 megabytes system memory), and some board emulations (like mips)
_can't_ map more than that. (Making the boards consistent is good, it's
enough to run a single threaded compile, and it's nice for running lots of
instances in parallel on the host a la mkroot/testroot.sh.)

Even ignoring that, the kernel's cpio extractor generally has its own size
limits. The initial physical memory layout only leaves so large a gap
between "where we loaded the cpio.gz" and "where we extract it to", and when
you fill up that gap, at a certain point the extract overwrites the data
it's reading, because initramfs isn't _expected_ to be multiple gigabytes
in size. Again, how much you've got varies by target, but adding a quarter
gigabyte of toolchain didn't work on multiple boards when I tried it.

Shrinking the toolchain down has some hard limits: even way back in the
aboriginal linux days when I was trying to set up a tinycc compiler on
target, just the extracted /usr/include headers took up quite a bit of
space:

  $ cd ccc
  $ du -s i686-*cross/*/include
  23148   i686-linux-musl-cross/i686-linux-musl/include

Currently 23 megabytes (and another couple megabytes for the compiler
includes). Keeping them in a squashfs was more memory efficient.

> Wouldn't making them tarballs mean that you could extract their contents
> without running losetup and dealing with mounting devices and needing
> root permissions?

Squashfs is an archive format: there's an unsquashfs command to extract it
if you want to fiddle with it on the host, although mount-and-copy in
mkroot works too.
The problem with (read-only) mounting a compressed archive is seekability:
on normal block devices the kernel can jump around and grab chunks of
directory information and file contents into dcache and page cache, and be
free to discard them again under memory pressure because they should be
cheap to get back. That's the design expectation for filesystems. The
problem with a tarball is you need to extract the whole thing starting at
the beginning to find where anything _is_. You can fix that by building an
index at mount time (extract the whole thing, examine the contents, and
make notes), but that makes mount really slow, and it also means you have a
data tree you can't discard, so you've more or less pinned your directory
cache if you want to know where all the files start.

Zip file format addresses the dentry part because it was designed to let
you extract individual files, but it doesn't address seekability _within_
a file. If you try to seek 10 megs into a file (or mmap from that point)
it has to extract and discard 10 megs of data.

(The main downsides of zip files are: A) individually compressing each
file is less efficient than compressing the whole archive, so they tend to
be larger, and B) zip puts all its metadata at the _end_ of the file, so
if the file is truncated at all you've lost ALL the contents, because it
doesn't know what any of the rest means anymore. Incomplete zip file
transfers were worthless because it has to start reading at the end to
find anything. The reason it did that was so amending existing zip files
in place was quick, because it can remove and rewrite the metadata easily.
If the metadata wasn't at the end and needed to be expanded, it would
either need to move all the file contents to make room, or break the
metadata into chunks and parse together scattered overlays.
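To make that truncation failure mode concrete, here's a quick sketch using
Python's zipfile module (in-memory archive, file names invented for the
demo): the per-file data is still physically present after chopping off
the tail, but the central directory index lived at the end, so the whole
archive becomes unreadable:

```python
import io, zipfile

# Build a small zip in memory: local file headers and compressed data
# come first, the central directory (the index) is written at the end.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
    z.writestr("a.txt", b"A" * 1000)
    z.writestr("b.txt", b"B" * 1000)
data = buf.getvalue()

# The intact archive opens fine.
with zipfile.ZipFile(io.BytesIO(data)) as z:
    assert z.namelist() == ["a.txt", "b.txt"]

# Chop off the last 100 bytes: both files' *contents* are still in there,
# but the index at the end is gone, so nothing can be recovered.
truncated = data[:-100]
try:
    zipfile.ZipFile(io.BytesIO(truncated))
    raise SystemExit("should not have opened")
except zipfile.BadZipFile:
    pass  # "File is not a zip file": the end-of-central-directory is gone
```

A truncated tarball, by contrast, still yields every entry stored before
the cut, precisely because its metadata is inline rather than at the end.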
Of course replacing a file in the archive wasted space, because unless the
old file had coincidentally been at the very end of the archive, it left
the old one in there, just added the new copy, and updated the index to
point at it.)

Most compression formats handle files in chunks: bzip2 does 900k blocks,
gzip does periodic dictionary resets, etc. Using a compression format with
a reasonable chunk size and tracking where each chunk starts lets you
handle seeks reasonably well, and that's what squashfs does. I haven't
looked up the actual file format, but conceptually it's a zip file plus
chunk indexes within files.

> I trust they were made fs images for a good reason, but... _why_).

Within mkroot, squashfs is easier to deal with because I don't need to
reserve destination space to extract everything into to poke at the
contents. Outside of mkroot, squashfs isn't that much harder to play with,
mostly just less familiar.

> And ideally running a mkroot overlay on it because that's what the
> overlays seem to be made for, but...:
>
> /sbin/cp: cannot overwrite non-directory '[Path to root/host/fs]/././lib'
> with directory '[Path to toolchain]//./lib'

I have a todo item for that, but there's a design sharp edge here: if we
follow destination symlinks out of tree then cp is doing bad security
things. I added tar --restrict to handle that but haven't done anything
with cp yet. (My pending cp item, the one my tree is dirty for, is making
-s work properly with relative paths...)

> It's whining that It's trying to copy a directory to a symlink of a
> directory.
>
> A working but non-viable solution is to use rsync -Ka (You also need
> --exclude=usr when copying the native toolchain, maybe a CPFLAGS for the
> overlay?). But rsync is non-standard. Have you considered or ran into
> this problem before?

I've run into it before, yes. It's on the todo list.
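Going back to the chunk-tracking idea for a moment, it's easy to sketch
with Python's zlib: a Z_FULL_FLUSH byte-aligns the output and resets the
dictionary (roughly gzip's "periodic dictionary reset" trick), so recording
the offset of each flush point gives you an index that lets decompression
start cold at any chunk boundary. The chunk contents and index layout here
are made up for illustration:

```python
import zlib

chunks = [b"alpha " * 200, b"bravo " * 200, b"charlie " * 200]

# Raw deflate stream (wbits=-15), with a Z_FULL_FLUSH after every chunk:
# each flush byte-aligns the stream and resets the dictionary, so each
# recorded offset is a valid cold-start point for a decompressor.
co = zlib.compressobj(9, zlib.DEFLATED, -15)
out, index = b"", []            # index[i] = byte offset where chunk i begins
for c in chunks:
    index.append(len(out))
    out += co.compress(c) + co.flush(zlib.Z_FULL_FLUSH)

# "Seek" straight to the last chunk without decompressing the first two.
d = zlib.decompressobj(-15)
assert d.decompress(out[index[2]:]) == chunks[2]
```

The tradeoff is the same one squashfs tunes with its block size: smaller
chunks seek cheaper but compress worse, because each dictionary reset
throws away context.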
I can work around this, but people who drop a symlink in their chroot
pointing to /bin on their host probably don't want those _followed_ when
copying more stuff into their chroot. (You'll notice all the mkroot
symlinks are relative, not absolute paths. There's a reason for that.)

Oh, the other todo item here is "multiple overlays". The current overlay
package was a quick hack; I never did the design work to figure out what
more complication should look like. Partly waiting for people to complain
to me that they need more than it does...

> Then even after that, it whines that cc1 isn't there because it didn't
> copy the usr/ symlink:
>
> # echo 'int main() {printf("123\n");}' | gcc -xc -
> gcc: fatal error: cannot execute 'cc1': execvp: No such file or directory
> compilation terminated.
>
> Because it's trying to exec /usr/x86_64-linux-musl/bin/cc1, which we
> _can't copy_ for multiple reasons.

I believe when I tried it the failure had actually been that the kernel
cpio extractor silently truncated the initramfs when tmpfs hit 50% of
memory and started returning "full" to writes. Given that the cross
compiler toolchain is about 260 megs and you're trying to load it into a
vm with 256 megs of physical memory. (And increasing the size hit the
mapping conflicts.)

https://landley.net/notes-2017.html#29-07-2017

There was earlier stuff in this area at:

https://landley.net/notes-2011.html#10-09-2011
https://landley.net/notes-2011.html#17-09-2011

But that's probably obsolete by now. (I'm still using the arm versatile
board for armv5l because I can, but armv7l is using the virt board for a
basically interchangeable architecture...)

> Okay, linking them manually on the host image. Now I'm able to use the
> compiler. The configure script for the thing I'm working with fails
> (something about the read builtin), but I can remedy with my host bash
> copying package for now.

It hasn't got "make". Kind of a limiting factor not to have a make command
on the target.
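The relative-vs-absolute symlink distinction is easy to see with a
throwaway tree (all paths here are invented for the demo): resolving an
absolute symlink escapes the chroot, so a copy that dereferences it would
write to the host, while a relative one stays inside the tree no matter
where the tree lives:

```python
import os, tempfile

root = os.path.realpath(tempfile.mkdtemp())
chroot = os.path.join(root, "chroot")
os.makedirs(os.path.join(chroot, "usr/lib"))

# An absolute symlink points at the *host's* /lib...
os.symlink("/lib", os.path.join(chroot, "abs-lib"))
# ...while a relative one resolves within whatever tree contains it.
os.symlink("usr/lib", os.path.join(chroot, "rel-lib"))

# Following the absolute link leaves the chroot entirely:
assert not os.path.realpath(os.path.join(chroot, "abs-lib")).startswith(chroot)
# The relative link stays inside the tree:
assert os.path.realpath(os.path.join(chroot, "rel-lib")) == \
    os.path.join(chroot, "usr/lib")
```

Which is why a cp that blindly follows destination symlinks is a security
problem, and why keeping the chroot's own symlinks relative sidesteps it.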
> Is running a compiler under mkroot a goal, and if so why haven't you
> made a package for it?

Toybox hasn't got awk, which pretty much every autoconf invocation needs.
The toybox shell hasn't got "return" or "trap" yet, and doesn't even run
the toybox test suite. I built a couple hello world variants with the
native compiler and went "ok, toolchain smoketested, next todo item"...

Note that I already _did_ all this back when I was maintaining busybox,
back in https://landley.net/aboriginal/about.html and the
https://landley.net/aboriginal/control-images/ at
https://github.com/landley/control-images/tree/master/images/lfs-bootstrap
did build Linux From Scratch. For all I know you might even be able to
stick https://landley.net/aboriginal/downloads/control-images/ into the
system images from https://landley.net/aboriginal/downloads/binaries/
under a modern qemu and get the build to run there, dunno. Haven't tried
it in a while.

Heck, https://landley.net/aboriginal/downloads/old/ goes back to 2006.
(That correlates to https://landley.net/aboriginal/old/ back when I was
running the build under User Mode Linux instead of QEMU, and had a hacked
up copy of lilo... The old ruminations about THAT stuff were on
https://landley.livejournal.com rather than the current blog, and I have
no idea if it's still accessible, haven't looked in years.)

Busybox did not start out able to build Linux From Scratch. I spent years
gradually making it replace more and more gnu commands until there were
none left. I talked in https://landley.net/aboriginal/history.html about
how I started working on that, which trails off into unfinished stuff at
the end, and I've never gotten back around to finishing the story, but
it's a moving target and who cares about ancient history?
> (I'm assuming this is a MUCH larger and more complex issue than I was
> originally expecting, but still something to just copy the native
> toolchain to the filesystem isn't hard (It is if you are doing it with
> raw overlays, but not if you have bash scripting at your disposal)) Is
> it a situation where it's a TODO item and just not ready yet?

A) A compiler without "make" isn't very useful. I got it to build a
couple small things from the command line, but can't make it build whole
packages without more infrastructure.

B) I already _did_ this with busybox, and the aboriginal linux stuff is
still there, and grotesquely documented at
https://landley.net/aboriginal/about.html which has its own FAQ.html and
documentation.html and build-stages.html and the whole control-images
subdirectory, and I did SO MUCH BLOGGING about that stuff back in the
day, and in the nav bar on the left is a PDF with the slides for a giant
eight hour presentation about it all...

My periodic poking at Linux From Scratch in my blog is me advancing this
agenda, but it's not very useful with the shell in its current state, and
I haven't managed to focus on the shell in a couple months because of
interrupt du jour. (Currently I should be fixing unshare...)

Rob

P.S. The big delay in restarting all this was over GPLv3 toolchain
hosting, which I explained in the blog a _lot_ between 2017 and 2022 but
eventually just bit the bullet and did around
https://landley.net/notes-2022.html#23-04-2022 but that ignores the llvm
toolchains (hexagon is llvm-only), NDK/AOSP toolchain unification (which
Elliott says has advanced since last I checked), me building a
bionic+llvm toolchain from source myself, my kernel cc unification patch
for making llvm toolchains just work there, all the fdpic toolchain
stuff, a toolchain build script that's _not_ based on Rich's
musl-cross-make (which he has historically gone almost 2 years without
updating, although he caught it up recently)...
_______________________________________________
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net