I've been banging on dd, and here's the first section of my notes: I'm uncomfortable having a command repeat the same information in multiple places. In this case, something like C_SEEK is #defined, that define is used in an array, then it's used in a case statement during option parsing, and in an if statement performing the actual work. So if you add or remove an option, you have to do so in four places.
We actually need the information in _maybe_ two places, and even that's technically an optimization by not having the option parsing inside the loop doing the actual work, thus cacheing the results of command line option parsing. In theory there's common infrastructure to handle the command line option parsing (lib/args.c), but it doesn't handle keyword=value arguments which is why we have to implement something different here. Given how common keyword=value arguments are, maybe we can factor some of that out into lib in the future. We can _definitely_ factor out the "comma separated list" part (conv=), which is also needed by "mount -o", but let's implement the potentially common infrastructure locally for now, and then peel off into lib/lib.c later. So: I want to massage this code to remove the need for us to #define information before we can use it, and to make at least some of the new option parsing infrastructure reusable by other commands. The size suffices are also kind of weird weird. We already have atolx() which is parsing suffixes, but these are _different_ suffixes, and it's got a "d" option to do decimal instead of binary units (so multibyte suffixes). It's using "c" is 1, "w" is 2, "b" is 512 (presumably meaning disk block size, except it's 4096 these days)... It has mega and giga but not tera? I guess you don't want to malloc a terabyte any time soon, but that's fairly domain specific and might change within 10 years. Is this really what posix specifies? Nope: posix specifies "k" and "b", and that's it. This seems to have come from gnu, and busybox seems to have copied gnu, and this is just implementing what gnu did. Except busybox added "swab" last year, so the set of options this command is implementing is basically a snapshot of what busybox was doing at some point in the past. (Why did busybox implement swab? The mailing list has somebody just submitting it out of the blue without actually explaining their use case for it. I'd think this option was on the obsolete side: the "nuxi" problem was fairly specific to 16 bit systems, 32 bit systems have the "xinu" problem instead. :) Alright, back to the size suffix issue: trying to extend atolx() doesn't seem helpful, that's returning "long" and this is "long long" (different on 32 bit systems, still relevant for about the rest of this decade). The suffixes are different, with at least one outright conflict (b=block vs b=byte). The decimal sizes is sort of interesting but nobody's actually _asked_ for it. So yeah, we probably need a new function in to dd to do this parsing, but we can at least move the table into the function. And we can use the "one\0two\0three\0" trick to avoid an array of pointers (8 bytes each on 64 bit systems) to the actual string data, and have the names and values both local variables. (I think locality of reference trumps collating 'em, but that's a judgement call bordering on opinion.) As for option parsing corner cases... why is it skipping spaces at the beginning? $ dd " count=1" dd: unrecognized operand ` count=1' $ ~/busybox/busybox/busybox dd "count= 1" dd: invalid number ' 1' In the submitted toybox implementation, "dd count=" with no argument treats it as 0, both of those say invalid number. What does posix say? It... doesn't seem to specify... Wait, posix says number parsing should support: Two or more positive decimal numbers (with or without k or b) separated by x, specifying the product of the indicated values What, _really_? Um, wow: gnu accepts it. (Busybox doesn't.) And the _sad_ part is that posix only says you have to do that for bs, cbs, ibs, and obs, but not count= (and indeed, gnu won't take it for count). Right, I need to go closely read what posix actually says for this command, and at least document our deviations from it... Meanwhile, attached is a first stab at new parsing functions for dd that might be reusable in lib. It doesn't actually work yet (I have several test cases that break it and I'm not sure this snapshot even compiles) but just FYI on the direction I'm going... Rob
/* dd.c - data duplicator * * Copyright 2014 Rob Landley <[email protected]> * * See http://pubs.opengroup.org/onlinepubs/9699919799/utilities/dd.html * * Deltas from posix: * no ebcdic (conv=ascii,ebcdic,ibm) * no punchcard records (conv=block,unblock cbs=) * no lcase/ucase (because utf8 and fixed block sizes don't mix) USE_DD(NEWTOY(dd, 0, TOYFLAG_USR|TOYFLAG_BIN)) config DD bool "dd" default n help usage: dd [if|of=FILE] [ibs|obs|bs|count|skip|seek=N] [conv=TYPE[,TYPE...]] Read and write data with specified granularity and formatting. if=FILE Read from FILE (stdin) of=FILE Write to FILE (stdout) bs=N Read/write block size (512b) count=N Stop after copying N blocks ibs=N Read N byte blocks (=bs) obs=N Write N byte blocks (=bs) skip=N Skip N input blocks seek=N Skip N output blocks conv=TYPE one of: notrunc Don't truncate output noerror Continue after read errors sync Pad blocks with zeros fsync Don't exit until output done Numbers [N] can end with unit size [cwbkmg] meaning 1, 2, 512, kilo, mega, or gigabytes, plus an optional d meaning decimal (base 10) units. */ #define FOR_dd #include "toys.h" GLOBALS( char *arg; ) // If *a starts with b, advance *a past it and return 1, else return 0; int strstart(char **a, char *b) { int len = strlen(b), i = !strncmp(*a, b, len); if (i) *a += len; return i; } // A bit like atolx() but does long long and takes different suffixes, // and we might as well handle the type= while we're at it static int dd_atolx(char *match, long long *val, int min) { long long obsolete = 1; int mul[] = {500, 1000, 1000000, 1000000000, 1, 2, 512, 1<<10, 1<<20, 1<<30}, i; char *arg = TT.arg, *suf, *suffixes = "bd\0kd\0md\0gd\0c\0w\0b\0k\0m\0g"; if (!strstart(&arg, match) || *(arg++)!='=') return 1; // Loop for posix's insane legacy "1x2x3x4" syntax predating $(( )) in shells. for (;;) { if (!isdigit(*arg)) goto fail; *val = strtoll(arg, &arg, 0); if (!*arg) break; // Loop trying each suffix for (suf = suffixes, i = 0; ; i++) { int len; if (i == ARRAY_LEN(mul)) goto fail; if (!strncasecmp(arg, suf, len = strlen(suf))) { *val *= mul[i]; arg += len; break; } suf += strlen(suf) + 1; } // posix's insane legacy multiplier syntax if (*arg == 'x') { obsolete *= *val; arg++; } else if (!*arg) break; printf("remaining=%s\n", arg); } *val *= obsolete; if (*val >= min) return 0; // Matched type but couldn't parse number, die loudly. fail: error_exit("bad %s", TT.arg); } // Note: modifies "string" to remove matches (because mount needs that) void comma_collect(char *string, char *list, unsigned *flags) { for (;;) { int no, bit = 0; char *ss, *s = string; // Skip leading/trailing/duplicate commas while (*s == ',') s++; if (!*s) { *string = 0; return; } no = strstart(&s, "no"); for (ss = list; *ss; ss += strlen(ss) + 1) { char *test = s; if (strstart(&test, ss) && (!*test || *test == ',')) { bit = 1<<bit; if (no) *flags &= ~bit; else *flags |= bit; s += 1+test-ss; break; } else bit++; } if (!*ss) { if (s != string) strcpy(string, s); string = strchr(string, ','); if (!string++) break; } } } void dd_main(void) { long long ibs = 512, obs = 512, skip = 0, seek = 0, count = -1; char *ifname = 0, *ofname = 0, *commalist = "swab\0error\0trunc\0sync\0"; unsigned commas = 6; int i, ifd = 0, ofd = 1; for (i=0; toys.optargs[i]; i++) { char *s = TT.arg = toys.optargs[i]; if (strstart(&s, "if=")) ifname = s; else if (strstart(&s, "of=")) ofname = s; else if (strstart(&s, "conv=")) { comma_collect(s, commalist, &commas); if (*s) goto fail; } else if (!dd_atolx("bs", &ibs, 1)) obs = ibs; else if (dd_atolx("ibs", &ibs, 1) && dd_atolx("obs", &obs, 1) && dd_atolx("skip", &skip, 0) && dd_atolx("seek", &seek, 0) && dd_atolx("count", &count, 0)) { fail: error_exit("bad %s", TT.arg); } } // Defer creating output file until we know there weren't parsing errors if (ofname) ofd = xcreate(ofname, O_CREAT|O_WRONLY, 0666); if (ifname) ifd = xopen(ifname, O_RDONLY); printf("parsed ibs=%lld obs=%lld skip=%lld\n", ibs, obs, skip); }
_______________________________________________ Toybox mailing list [email protected] http://lists.landley.net/listinfo.cgi/toybox-landley.net
