On 02/21/2016 11:34 PM, Isaac Dunham wrote:
> On Sun, Feb 21, 2016 at 11:30:09PM -0500, Rich Felker wrote:
>> On Sun, Feb 21, 2016 at 08:42:06PM -0600, Rob Landley wrote:
>>> On 02/21/2016 03:39 PM, Rich Felker wrote:
>>>> On Sat, Feb 20, 2016 at 02:28:22PM -0600, Rob Landley wrote:
>>>>> On Wed, Feb 17, 2016 at 7:02 PM, enh <[email protected]> wrote:
>>>>>> On Wed, Feb 17, 2016 at 3:32 PM, Rob Landley <[email protected]> wrote:
>>>>>>> On Wed, Feb 17, 2016 at 10:22 AM, enh <[email protected]> wrote:
>>>>>>>> It's necessary to distinguish x86 and x86-64 to be able to
>>>>>>>> recognize the way x32 is encoded in ELF.
>>>>>>>
>>>>>>> Hmmm. That's not fun.
>>>>>>>
>>>>>>> I note that I spent the morning teaching the code to read/display the
>>>>>>> dynamic linker name, so this patch won't "git am" directly.
>>>>>>>
>>>>>>> Reading the patch, we're pretending that aarch64 has nothing to do
>>>>>>> with arm? No mention of arm in this architecture? Ok... (I guess
>>>>>>> Cortex-M isn't arm either, but don't currently have an example binary
>>>>>>> of that to test.)
>>>>>>
>>>>>> well, you're the one who removed my original "ARM aarch64" which is
>>>>>> what the regular desktop file(1) says :-P
>>>>>
>>>>> But not what the linux-kernel developers ever seem to say in their
>>>>> patch submissions.
>>>>
>>>> Regardless of what you think about these naming choices, IMO there's
>>>> little value in a file(1) that does not print the names that scripts
>>>> using it expect to see.
>>>
>>> _What_ scripts? I don't know what would be using this. (All for
>>> real-world tests, but I have yet to find a build script using the file
>>> command. Looking at /proc, sure, but not calling file...)
>>
>> I suspect it's stuff like: case "$(file "$f")" in ...
$ file /bin/ls
/bin/ls: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, BuildID[sha1]=9d2a434c4ff55aad2ddd19348c0ac75971606483, stripped

Heck of a switch statement.

>> I'm not thinking of build scripts for packages (this would not be
>> remotely portable usage) but things like private admin scripts,
>> perhaps printer filter scripts, file preview/thumbnail generation
>> scripts, etc. I don't have any good examples at hand but this was the
>> historical justification I always saw for the rather arcane/antiquated
>> forms for many of the names.
>
> IIRC, I've used file in an HTML index-generating script, in a very
> similar way.

HTML is the home turf of mime types. I agree that if/when we do mime
output, it should match exactly with other implementations. There's even
something vaguely standards-shaped I've bookmarked but not read yet:

http://www.iana.org/assignments/media-types/media-types.xhtml

> (Said script would create a preview/thumbnail or get the
> first few lines of text, then embed that in a table.)
> But that didn't use binary architecture...
> BSD-ish printer filter scripts certainly use 'file'...but again, that
> doesn't deal with ELF binary types.
> (CUPS largely eliminates the need for using file, because it converts
> between known types.)
>
> A cgi script to pick the 'correct' download to offer someone is the
> only use I can think of.
> Someone's probably found a use for it, but I'll be surprised if someone
> who parses the ELF details enumerated by file is using toybox soon.

Or would rewrite their script to do so...

>>> If the script wants to match "Intel 80386" explicitly, then do I have to
>>> say that for i686?
>>
>> I would think it makes sense to preserve the "Intel 80386" convention
>> here. There's not even a reliable way to detect that a binary is for
>> "i686" anyway.

Manufacturer's name in some chip types but not others? It's _already_
inconsistent...
>>>> The choice to use aarch64 instead of arm64 is
>>>> in some ways also a consequence of this, or rather an intentional
>>>> _mismatch_ with patterns that should not match. The fact that mips64
>>>> and powerpc64 match mips* and powerpc* was historically very
>>>> problematic.
>>>
>>> grep -w? Test for 64 bit first?
>>
>> Indeed, testing for 64 first is the right approach (see musl's
>> configure script for an example), but the problem arises when the
>> existing tests were written before the 64-bit version of the platform
>> existed. I suspect there are lots of scripts that match arm*-*-* in
>> the machine tuple or arm* in `uname -m` which would have wrongly
>> detected "arm64" as arm.
>
> Oh yes...just about every autoconf script, for example.

You'll notice I implemented uname a long time ago, and for uname -m I
have a gross hack to make it work:

https://github.com/landley/toybox/blob/master/toys/posix/uname.c#L28

Similarly, sed has:

https://github.com/landley/toybox/blob/master/toys/posix/sed.c#L1019

I've put in gross hacks to maintain compatibility with empirical test
cases. But I won't do it speculatively, because maybe somewhere there
might be a user we don't know about.

>>> Elliott was suggesting that the elf-em.h constants might be good enough,
>>> but that says 386 not 80386, and PPC instead of Powerpc...
>>>
>>> Seriously, standards would be nice!
>>
>> Yes, wouldn't they? :)
>
> Speaking of those, I've spotted a number of finer details that need
> some polish...
> - per POSIX, 'cannot open' must be in the 'type' string if open() fails
>   (both EPERM and ENOENT); we only do that if open() succeeds and fstat(fd)
>   fails.
> - symlink detection (as per POSIX) won't work: opening them O_RDONLY
>   results in following the link, then we fstat() the fd.
> - file 'FIFO' causes a hang; open() won't return till there's a writer.
>
> As far as I can tell, fixing these means we need to call loopfiles_rw
> with failok=1 and O_NONBLOCK in flags, sometimes fall back to lstat(),
> and possibly add O_NOFOLLOW to flags.

I'll take a look.

Thanks,

Rob
_______________________________________________
Toybox mailing list
[email protected]
http://lists.landley.net/listinfo.cgi/toybox-landley.net
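
For reference, here is a rough standalone sketch of the three fixes Isaac lists (report "cannot open" when the file can't be opened, detect symlinks, don't hang on a FIFO). It is not toybox code and does not use loopfiles_rw; the function name coarse_type() and the type strings other than "cannot open" are made up for illustration. It also assumes a slightly different ordering than the one suggested above: lstat() first, and open() only for regular files, with O_NONBLOCK and O_NOFOLLOW kept as a guard in case the path changes between the two calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not from toybox): write a coarse type string for
 * `path` into `buf`, without following symlinks or blocking on FIFOs. */
static const char *coarse_type(const char *path, char *buf, size_t len)
{
  struct stat st;
  const char *kind = 0;
  int fd;

  assert(path && buf && len);

  /* lstat() looks at the link itself, so a symlink is seen as a symlink. */
  if (lstat(path, &st)) {
    snprintf(buf, len, "cannot open (%s)", strerror(errno));
    return buf;
  }

  /* Non-regular files are classified from the stat data alone, so a FIFO
   * with no writer is never opened and cannot hang us. */
  if (S_ISLNK(st.st_mode)) kind = "symbolic link";
  else if (S_ISFIFO(st.st_mode)) kind = "fifo";
  else if (S_ISDIR(st.st_mode)) kind = "directory";
  else if (!S_ISREG(st.st_mode)) kind = "special file";
  if (kind) {
    snprintf(buf, len, "%s", kind);
    return buf;
  }

  /* Regular file: open() may still fail (permissions, races), and that
   * failure must show up in the type string. */
  fd = open(path, O_RDONLY | O_NONBLOCK | O_NOFOLLOW);
  if (fd == -1) {
    snprintf(buf, len, "cannot open (%s)", strerror(errno));
    return buf;
  }
  close(fd);
  snprintf(buf, len, "%s", st.st_size ? "regular file" : "empty");
  return buf;
}
```

A caller would pass its own buffer, e.g. `char buf[128]; puts(coarse_type(argv[1], buf, sizeof buf));`. A real file(1) would go on to read and inspect the contents where this sketch just closes the descriptor.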

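
To make the "test for 64 first" point from the thread concrete, here is a small sketch using POSIX fnmatch(). The table and the arch_family() name are illustrative assumptions, not taken from musl's configure script or from any autoconf macro. A bare "arm*" glob matches "arm64" but not "aarch64", which is the intentional mismatch Rich describes; putting the 64-bit patterns ahead of the broader globs handles both spellings.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Illustrative table only. Order matters: each 64-bit pattern has to come
 * before the broader glob that would otherwise swallow it. */
static const struct {
  const char *glob, *family;
} arch_table[] = {
  {"aarch64*", "arm64"},
  {"arm64*", "arm64"},
  {"arm*", "arm"},
  {"mips64*", "mips64"},
  {"mips*", "mips"},
  {"powerpc64*", "powerpc64"},
  {"powerpc*", "powerpc"},
};

/* Hypothetical helper: map a `uname -m` style string to a family name,
 * returning "unknown" when nothing in the table matches. */
static const char *arch_family(const char *machine)
{
  size_t i;

  assert(machine);
  for (i = 0; i < sizeof(arch_table) / sizeof(*arch_table); i++)
    if (!fnmatch(arch_table[i].glob, machine, 0)) return arch_table[i].family;

  return "unknown";
}
```

With this ordering, arch_family("armv7l") yields "arm" while arch_family("arm64") and arch_family("aarch64") both yield "arm64". Swapping the "arm*" row to the top would reproduce the misdetection discussed above for "arm64".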