Catching up by <strike>burning the candle</strike> reading the email at both ends...

On 10/07/2013 06:06:47 AM, Conroy, Bradley Quentin wrote:
> I finally figured out the NTFS labels after reading a rant on how UTF-8 rocks
> and how MS switched to UTF16 or UCS1 or whatever.

I read that article. (It's a small twitter stream... :)

> The reason I couldn't grep for the label (mine was "myntfs") was
> that it is stored as "m\0y\0n\0t\0f\0s\0\0" - found another good
> use for hexdump :)

I should add it to toybox. And make -C mode the default (ala diff -u). And make it share code with hexedit and possibly od.

(My first todo item in that area is figuring out why od gets the indentation wrong.)

> Notes:
> I only have x86 to test on,

Allow me to introduce you to aboriginal linux system images. Go to:

  http://landley.net/aboriginal/bin

Download a system-image of your choice (mips and powerpc are big endian), extract the tarball, run "./dev-environment.sh", and at the shell prompt wget source and compile it.

(Note that mips networking is broken with qemu 1.6, you'd need to use qemu 1.5 for that. Should work on powerpc though.)

More random documentation-like stuff at http://landley.net/aboriginal/about.html

> so there are a couple of places that may need bswap_{16,32} for endianness.

My limiting factor here is actually lack of test filesystem images.

> I used a 65k buf instead of toybuf (4k) for simplicity, but tried to organize
> it for toybuf if wanted.

Half the file is #defines, and then the first line of actual C code is a typedef. There may be some more extensive modifications coming than that.

Ok, convert tabs to two spaces and check that in.

Oh wow. You're making me pull out my tab conversion sed. Haven't used that in a while...

Ok, yank the typedef. Make function definitions match K&R like everybody else for the past 30 years, ala:

  type function(args)
  {
  }

(Yes, we don't do that anywhere else but that's because this is creating a new function and anywhere else isn't.)

You don't need #if CFG_BLKID because blkid.c only gets compiled if CFG_BLKID is enabled. (If the name of a *.c file under toys/ matches the name of a config symbol, the C file's inclusion is controlled by that config symbol.)

You have an if() statement at the left edge, not indented at all within its function, and then the function ends with:

}else /* fstype */
write(1,fstype,strlen(fstype)); /* avoid printf overhead in fstype */
  putchar('\n');
}

And the _reason_ that works is there's no curly bracket on the else so the write() belongs to the else but the putchar doesn't. Otherwise the function wouldn't end. Ouch.
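
For comparison, a minimal sketch of the same tail with braces on both branches, so the scoping doesn't depend on where the indentation happens to be. The function name show_result() and its arguments are made up for illustration and are not from the patch.

```c
#include <stdio.h>

// Hypothetical stand-in for the tail of the function quoted above.
// With braces on both branches it's obvious what belongs to the else,
// and that the newline gets printed either way.
void show_result(FILE *out, const char *fstype, const char *full, int want_full)
{
  if (want_full) {
    fputs(full, out);
  } else {
    fputs(fstype, out);
  }
  fputc('\n', out);
}
```

Sticking to stdio for both the string and the newline also keeps the output in order, since write() bypasses the stdio buffer that putchar() fills.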

The way to make an alias for a command is the OLDTOY() macro.

If you feed loopfiles() zero arguments, it reads from stdin. So calling blkid with no arguments hangs awaiting user input instead of printing its usage message. (Probably you don't want NULL optstring, you want "<1", at least for the moment.)

Let's see, what have I got lying around:

  $ ./toybox blkid ~/qemu/images/tccboot.iso
  $ ./toybox blkid ~/qemu/images/rh9.img
  $

iso9660 it doesn't know but ext2 it _should_. Oh, duh, that one's a partitioned image, and it doesn't recognize the partition table. Let's see...

  $ ./toybox blkid ~/system-image-armv5l/hda.sqf
  $

Squashfs? Hello?

Sigh. What did I break? Check the previous version... that didn't work either, and all I did to that was delete the fstype at the end that was breaking the build. Ah, maybe the "type punned pointer" warnings actually matter with this compiler version? Lemme build for i686... Nope, _still_ not identifying squashfs.

By the way, in terms of your 64k buffer (66k buffer, actually): no sane filesystem is going to have its identifying info straddle 4k blocks, so we should be able to read 4k chunks and iterate over the list for offsets in range. (This even avoids lseek, although I'm not sure why that would be an issue...)
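
A rough sketch of that 4k-at-a-time idea, assuming the signatures are raw byte strings at known absolute offsets. The struct, the function name scan_blocks(), and the three demo entries are illustrative assumptions, not what the patch or toybox actually contains.

```c
#include <stdio.h>
#include <string.h>

// Hypothetical signature entry: where the magic bytes live and what they are.
struct sig {
  const char *name;
  long offset;        // absolute byte offset in the image
  int len;            // number of magic bytes
  const char *magic;  // raw bytes as stored on disk
};

// Demo entries only, the real list would be longer.
static const struct sig demo_sigs[] = {
  {"squashfs", 0, 4, "hsqs"},
  {"ext2", 1080, 2, "\x53\xEF"},
  {"btrfs", 65600, 8, "_BHRfS_M"},
};

// Read the stream in 4k blocks and test only the signatures whose bytes
// fall entirely inside the current block. Returns the matching name, or
// NULL if nothing matched before EOF or max_blocks.
const char *scan_blocks(FILE *fp, const struct sig *sigs, int count,
  int max_blocks)
{
  unsigned char block[4096];
  long base = 0;
  int i, b;

  for (b = 0; b < max_blocks; b++) {
    size_t got = fread(block, 1, sizeof(block), fp);

    if (!got) break;
    for (i = 0; i < count; i++) {
      long rel = sigs[i].offset - base;

      if (rel < 0 || rel + sigs[i].len > (long)got) continue;
      if (!memcmp(block + rel, sigs[i].magic, sigs[i].len))
        return sigs[i].name;
    }
    base += got;
  }

  return NULL;
}
```

Because it only ever reads forward, this works on a pipe as well as a seekable file, at the cost of reading blocks that contain nothing of interest.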

Right, continuing to clean this up until I can make it work. What the HECK is this nest of MATCH macros calling each other for? (That's where the type punned pointer warnings come from, anyway...) Ah, it's only used for ext2/3/4. Because treating ext2, ext3, and ext4 as three separate filesystems just wouldn't do.

You don't need to strcmp toys.which->name with "blkid", you can just compare the first character to 'b'. (There are only two options...)

Alright, let's turn this giant stack of #defines and if/else staircase into a table with a loop iterating over it. Let's make the magic a uint64_t so we're not ignoring the second half of the btrfs magic you've got listed there, and let's just use the hex numbers like the kernel does, ala:

fs/btrfs/ctree.h:#define BTRFS_MAGIC 0x4D5F53665248425FULL /* ascii _BHRfS_M, no null */
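
Something along these lines is what a table plus loop could look like. This is a sketch under assumptions: the struct layout and the names fs_table, peek_le() and identify() are invented here, and the ext2 and squashfs rows are from my own understanding of those formats rather than from the patch. The btrfs value is the kernel constant above.

```c
#include <stdint.h>
#include <stddef.h>

// One row per filesystem: name, magic as a little endian number, how
// many bytes of it are significant, and where it sits in the image.
struct fs_magic {
  const char *name;
  uint64_t magic;
  int len;
  long offset;
};

static const struct fs_magic fs_table[] = {
  {"ext2",     0xEF53,                2, 1080},
  {"squashfs", 0x73717368,            4, 0},
  {"btrfs",    0x4D5F53665248425FULL, 8, 65600},
};

// Assemble len bytes (1 to 8) as a little endian integer one byte at a
// time, so there's no pointer cast and host endianness doesn't matter.
uint64_t peek_le(const unsigned char *p, int len)
{
  uint64_t val = 0;
  int i;

  for (i = 0; i < len; i++) val |= (uint64_t)p[i] << (8*i);

  return val;
}

// Walk the table against an in-memory image, skipping entries that
// would run past the end of the data we actually have.
const char *identify(const unsigned char *img, long size)
{
  size_t i;

  for (i = 0; i < sizeof(fs_table)/sizeof(*fs_table); i++) {
    const struct fs_magic *fs = fs_table+i;

    if (fs->offset + fs->len > size) continue;
    if (peek_le(img + fs->offset, fs->len) == fs->magic) return fs->name;
  }

  return NULL;
}
```

Building the value byte by byte sidesteps both the bswap question on big endian hosts and the type punned pointer warnings, since nothing ever dereferences a cast pointer.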

Hmmm, you have a CRAMFS_MAGIC2 but your code doesn't seem to be using it. (The if is using a MATCH() macro instead of MATCH2().) Ah, the kernel header says that's the same number at the other endianness.

If JFS isn't even in /usr/include/linux/magic.h is it really an important filesystem to autodetect?

For NTFS, you have 8 as the label length (well, -8) but toutf8 fills out a 16 byte buffer? (And it doesn't actually have a length, it just keeps going until it hits a null terminator which there's no guarantee the file will have...)

Also, the NTFS label isn't _really_ alternating ascii and NUL bytes. It's horrible 16 bit wide character stuff that involves "codepages" and actually displaying labels from japan or korea just isn't going to work here. (Doing full windows internationalization isn't an option either. The question is, does the special case for ascii make sense or should we just not support labels here at all? I'm balancing "2/3 of the planet does not speak english" with "does android care about legacy windows crap that's this generation's version of punched cards?" Eh, I guess "windows was english only, the future is UTF8" is a reasonable compromise...)
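
For what it's worth, here is a sketch of a conversion that is bounded on both ends, which would at least address the "keeps going until it hits a null terminator" problem. This is not the patch's toutf8: the name utf16le_to_utf8() and its signature are assumptions, and it deliberately punts on surrogate pairs (each half turns into '?') and ignores codepages entirely.

```c
#include <stddef.h>

// Convert at most srclen bytes of UTF-16LE into UTF-8, writing at most
// dstlen bytes including the terminating NUL. Stops early at a 16 bit
// zero. Surrogate pairs aren't decoded: each half becomes '?'.
// Returns the number of bytes written, not counting the NUL.
int utf16le_to_utf8(const unsigned char *src, int srclen, char *dst, int dstlen)
{
  int in, out = 0;

  if (dstlen < 1) return 0;
  for (in = 0; in + 1 < srclen; in += 2) {
    unsigned c = src[in] | (src[in+1] << 8);

    if (!c) break;
    if (c >= 0xD800 && c <= 0xDFFF) c = '?';
    if (c < 0x80) {
      if (out + 1 >= dstlen) break;
      dst[out++] = c;
    } else if (c < 0x800) {
      if (out + 2 >= dstlen) break;
      dst[out++] = 0xC0 | (c >> 6);
      dst[out++] = 0x80 | (c & 0x3F);
    } else {
      if (out + 3 >= dstlen) break;
      dst[out++] = 0xE0 | (c >> 12);
      dst[out++] = 0x80 | ((c >> 6) & 0x3F);
      dst[out++] = 0x80 | (c & 0x3F);
    }
  }
  dst[out] = 0;

  return out;
}
```

Fed the 14 bytes of "m\0y\0n\0t\0f\0s\0\0\0" this yields "myntfs", and it never reads past srclen even when the label has no terminator.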

However, add to that the fact that ntfs is the only filesystem that has a label in a different 4k block than the ID info, and special casing this really sounds like more trouble than it's worth. Are there a lot of thumb drives formatted NTFS out in the wild? (I'll add code to deal with a real world problem, my question is whether this is a real world problem? No idea.)

Also... ntfs has an 8 bit uuid? What? (It's the only one that does...)

Hang on, this thing doesn't identify vfat? (Which most external USB devices are formatted with?) Hmmm, I know microsoft's documentation says not to use the "FAT16" and "FAT32" strings for filesystem identification, but I don't care.
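
A minimal sketch of the "just look for the strings" approach, assuming the usual boot sector layout where the filesystem type field sits at byte 54 for FAT12/FAT16 and byte 82 for FAT32, with the 0x55 0xAA signature at bytes 510 and 511. The function name check_vfat() is made up, and this is exactly the check microsoft's documentation warns against.

```c
#include <string.h>

// Returns "vfat" if a 512 byte boot sector carries one of the FAT type
// strings in the expected place, else NULL. Offsets are the assumed
// standard ones: 54 for FAT12/FAT16, 82 for FAT32.
const char *check_vfat(const unsigned char *sector, int len)
{
  if (len < 512) return NULL;
  if (sector[510] != 0x55 || sector[511] != 0xAA) return NULL;
  if (!memcmp(sector + 54, "FAT12", 5) || !memcmp(sector + 54, "FAT16", 5)
    || !memcmp(sector + 82, "FAT32", 5)) return "vfat";

  return NULL;
}
```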

Ok, printing out the uuid there's three different possible bit-patterns for where the "-" go, one for 16 (the default), one for 4 (vfat), and one for 8 (ntfs, no dashes). I think rather than having a separate uuid length field that's usually 16 I'll encode the non-16 values in the top few bits of the offset, since I've got an int. (Offset already won't fit in a short.)
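
One possible shape for that, as a sketch only. The bit layout (low 24 bits for the offset, the bits above for a non-default length) and the names UUID_AT, uuid_offset(), uuid_len() and format_uuid() are all assumptions picked for illustration, and the hex case is arbitrary.

```c
#include <stdio.h>

// Hypothetical packing: low 24 bits are the byte offset of the uuid,
// the bits above that hold its length, with 0 meaning the usual 16.
#define UUID_AT(off, len) ((off) | ((len) == 16 ? 0 : (len) << 24))

int uuid_offset(int packed)
{
  return packed & 0xFFFFFF;
}

int uuid_len(int packed)
{
  int len = packed >> 24;

  return len ? len : 16;
}

// Print len raw bytes as hex into out (at least 37 bytes), with dashes
// placed according to len: 8-4-4-4-12 for 16, 4-4 for 4, none for 8.
void format_uuid(const unsigned char *raw, int len, char *out)
{
  int i;

  for (i = 0; i < len; i++) {
    int dash = 0;

    if (len == 16) dash = (i == 4 || i == 6 || i == 8 || i == 10);
    else if (len == 4) dash = (i == 2);
    if (dash) *out++ = '-';
    out += sprintf(out, "%02x", raw[i]);
  }
  *out = 0;
}
```

The table entry then stays a single int per filesystem, and only the two oddball cases pay for the extra bits.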

Hmmm, in testing FAT's uuid bytes are presented in reverse order from the tool ubuntu's using. But ext2 isn't...

Need test images. Lots and lots of test images...

> I have info on more fs types, to patch with after review.

I don't know what fs types count as "interesting". You have BFS which isn't in /usr/include/linux/magic.h, but don't have fat16 or fat32.

> blkid does output for all devices if 0 args -> read /proc/partitions?

Possibly. (You can run the other one under strace to see what it's doing.)
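
If it does turn out to be /proc/partitions, walking it is not much code. A sketch, assuming the usual "major minor #blocks name" columns with a header line and a blank line on top. The function names are hypothetical, and whether the other blkid really works this way is unverified.

```c
#include <stdio.h>

// Parse one line of /proc/partitions ("major minor #blocks name").
// Fills in name (at least 64 bytes) and returns 1 for a data line,
// 0 for the header or a blank line.
int parse_partition(const char *line, char *name)
{
  unsigned major, minor;
  unsigned long long blocks;

  return sscanf(line, "%u %u %llu %63s", &major, &minor, &blocks, name) == 4;
}

// Call fn("/dev/NAME") for every entry in an already opened listing,
// returning how many entries were found.
int each_partition(FILE *fp, void (*fn)(const char *path))
{
  char line[256], name[64], path[80];
  int count = 0;

  while (fgets(line, sizeof(line), fp)) {
    if (!parse_partition(line, name)) continue;
    snprintf(path, sizeof(path), "/dev/%s", name);
    fn(path);
    count++;
  }

  return count;
}
```

The callback would be whatever function already handles a single file argument, so the zero argument case becomes a loop around the existing code.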

Rob
_______________________________________________
Toybox mailing list
http://lists.landley.net/listinfo.cgi/toybox-landley.net
