Long overdue, cleaning up du.c (which predates the pending directory).

http://landley.net/hg/toybox/rev/1010 (and http://127.0.0.1/hg/toybox/rev/1011).

Added [-HL] and such to the end of the option string so last option of a given type wins, and fiddled with the help text.

The usual whitespace cleanup: end of line comments (especially that go way past 80 chars) moved to the line before. /* multiline comments */ that are actually only one line converted to // comments. Space after flow control
statements that look like functions but aren't, ala if() -> if (). Blank
line between declarations and the rest of the code block. Moving else after
the curly bracket when there is one. In GLOBALS have the variables the
option parser fills out one per line, with a blank line between them and
other globals (and those others collated by type since order isn't important
there the way it is with the option variables).

The linux kernel guys have a rule against typedefs, which I find convincing.

(Also, "char* blah;" is wrong, it's "char *blah;" because "char* one, two;"
doesn't make "two" a pointer. Putting that space in the wrong place
implies they don't understand how declarations work in C. Think about
"char* one,* two;" vs "int x, *y;"...)

print:

Every user of the print function except one is doing dirtree_path(node, NULL) which returns a malloc() that's never freed. Given that we're iterating over an arbitrary filesystem (potentially including things like /proc and /sys or network mounts not locally stored), this is leaking an unbounded amount of memory. So I swapped it to take node as an argument, do its own
  dirtree_path() and free the result, and print "total" if fed NULL.

Um, and in make_human_readable... 1<<64 is 18 exabytes. This has yotta and zeta but they can't be represented in a long long? (Each 1024 is 10 bits, we
  have 64 bits, it has 9 characters of unit string...) Right, du is the
only user of that function so inline _it_ too... (This turned into a more
  or less rewrite of the print() function.)

Ok, let's look at functions only ever used once (make_pathproper,
llist_add_inode, is_node_present) and see if they can be inlined...

make_pathproper:

  This function just appends / to "." and ".." entries. Adding
if (node->parent) before the dirtree_notdotdot() test in do_du() removes
  the need for it.

The other two are only ever used in do_du(), so let's clean that up...

do_du:

  Don't recurse if we crossed device boundary on -x, none of the things
  _under_ it should be on the old device. So return 0 in that case.

Both llist_add_node and is_node_present are only used from here. I combined
  them into seen_inode() which takes a stat pointer and returns whether
or not it had already seen the inode. (If it hadn't, it adds it to the list. If the stat pointer is NULL it flushes its cache so the old free_inode can go away too.) The list is using two allocations to store 3 fields: a next pointer, an inode, and a dev_t. That can combined into one allocation.

You can't hardlink directories, so I made it check S_ISDIR(). This isn't QUITE right because you can bind mount directories, and the gnu/dammit version of du skipse bind mounted directories. But A) that adds more inodes to remember, since directory link count is usually >1, B) other users of this logic (tar, cipo, genext2fs) couldn't restore directory hardlinks so
  detecting them wouldn't help. (It's a judgement call...)

  No need to use ->extra to flag the second call from comeagain, that's
  already indicated by ->data being -1. Freeing up that field means we
don't need struct node_size to glue another ->extra onto the end of struct dirtree, although on 32 bit systems we can't calculate a du total longer than 2 gigs so possibly we should be doing that _anyway_, but with a size_t?

  I rewrote the whole S_ISDIR() block, taking into account that ->extra
can now just store the size (in blocks, so we overflow less on 32 bits, although that's still only a max size of 1 terabyte because 32 bit signed long is 2 billion, * 512...) Got rid of TT.dirsum and just had nodes update
  their ->parent->extra to pass data back up the tree.

in du_main:

if toys.optc == 0 it assigns toys.optargs[0] = "./", which can now be "." since make_pathproper is gone, but there's a problem: if argc was zero get_optflags() in lib/args.c will allocate an array of length 1 for this.
  We then assign to that one and only allocated entry, and then do a
  while (*toys.optargs++) and fall off the end into unallocated space
  which _may_ contain a null terminator, but only by accident.

So do char **noargs = {".", 0}; and then swap to that for default args.

No need to set TT.maxdepth to INT_MAX, it defaults to 0, just have print() check for 0. Similarly, no need to initialize TT.total and TT.inodes to 0. And if TT.depth isn't 0 after a previous call to dirtree_add_node we screwed something up. Moved symfollow inline, and broken the "declaration = value" to two lines. (Same net number of lines, but easier to follow this way.)

Still one todo item: feed print() blocks instead of bytes. Also, du could use a test suite. (I did a lot of testing against the toybox source itself, but that's not stable enough to use in the test suite. I'd have to create a directory of files for it to du over, probably with "truncate".)

Also: the gnu/dammit du uses 1024 byte blocks by default unless you export POSIXLY_CORRECT. Busybox defconfig also does 1024, you can swich it off at config time but not runtime. My compromise was to default to 1024 (and accept but ignore -k), but add a -K flag to use 512 byte blocks so you can get the posix behavior with an alias.

Oh, and our rounding for -h differs in places from the gnu/dammit du. (We sometimes say 1.9 when they say 2.0, etc). Meanwhile busybox is all over the place, doesn't even match with -m...

Rob
_______________________________________________
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net

Reply via email to