Re: numfmt (=print 'human' sizes) updates
I've squashed what we have now to: http://www.pixelbeat.org/patches/coreutils/numfmt-squashed.patch.xz which is unchanged from the previous except for a bug fix in my negative handling, and further tweaks to TODO items. tests, clang-analyzer, valgrind are Ok with it at least. I'll push the patch in the morning. thanks, Pádraig.
Re: numfmt (=print 'human' sizes) updates
Hello, The attached patch adds 'numfmt' to the coreutils documentation. Regards, -gordon numfmt.12.patch.xz Description: application/xz
Re: numfmt (=print 'human' sizes) updates
Hello, Assaf Gordon wrote, On 12/26/2012 05:40 PM: Attached is an updated numfmt, with the following two changes: 1. --format support 2. optionally ignoring input errors. The attached patch (incremental to the above full patch) adds few more tests, and fixes 4 issues found with the clang static analyzer. There's no change in functionality. -gordon numfmt.11.patch.xz Description: application/xz
Re: numfmt (=print 'human' sizes) updates
Hello, Pádraig Brady wrote, On 12/21/2012 12:42 PM: I'm starting to think the original idea of having a --format option would be more a more concise, well known and extendible interface rather than having --padding, --grouping, --base, ... It wouldn't have to be passed directly to printf, and could be parsed and preprocessed something like we do in seq(1). Regarding 'format' option, there are some intricacies that are worth discussing: 1. Depending on the requested conversion, the output can be a string (e.g. 1.4Ki) or a long double (e.g. 140). 2. Internally, the program uses long doubles - so the real format is %Lf - regardless of what the user will give (e.g. %f). 3. printf accepts all sorts of options, some of which aren't relevant to numfmt, or only relevant when printing non-humanized values. e.g.: $ LC_ALL=en_US.utf8 seq -f %0'14.5f 1000 1001 0001,000.0 0001,001.0 4. The assumption was that humanized numbers are always maximum 4 characters in SI/IEC (e.g. 1024 or 4.5M) or 5 characters with iec-i (e.g. 999Ti). With the new 'format', if given %'2.9f - should the output be still 4 characters (e.g. 4.5T), or respect the .9 format (e.g. 4.5T) ? and does the suffix character counts in the 2.9 format ? My preference is to keep things simple, and accept just a limited subset of the format syntax: 1. grouping (the ' character) 2. padding (the number after '%' and before the 'f' 3. alignment (optional '-' after '%') 4. Any prefix/suffix before/after the '%' option. 5. Accept just %f, but internally treat it as '%s' or '%Lf', depending on the output. All other options will be silently ignored, or trigger errors. Example: $ numfmt --format xx%20fxx --to=si 5000 [[ internally, treats as --padding 20 ]] xx5.0Kxx $ numfmt --format xx%'-10fxx 5000 [[ internally, treats as --padding -10 --grouping ]] xx5,000 xx $ numfmt --format xx%0#'+010llfxx 5000 [[ reject as 'too complicated' / unsupported printf options ]] WDYT? -gordon
Re: numfmt (=print 'human' sizes) updates
On 12/26/2012 03:28 PM, Assaf Gordon wrote: Hello, Pádraig Brady wrote, On 12/21/2012 12:42 PM: I'm starting to think the original idea of having a --format option would be more a more concise, well known and extendible interface rather than having --padding, --grouping, --base, ... It wouldn't have to be passed directly to printf, and could be parsed and preprocessed something like we do in seq(1). Regarding 'format' option, there are some intricacies that are worth discussing: 1. Depending on the requested conversion, the output can be a string (e.g. 1.4Ki) or a long double (e.g. 140). I would expect the numeric format applies to the number portion only, so there should be little confusion there. 2. Internally, the program uses long doubles - so the real format is %Lf - regardless of what the user will give (e.g. %f). Yes. There is a long_double_format() function in seq.c that converts from specified format to L.. internally. 3. printf accepts all sorts of options, some of which aren't relevant to numfmt, or only relevant when printing non-humanized values. e.g.: $ LC_ALL=en_US.utf8 seq -f %0'14.5f 1000 1001 0001,000.0 0001,001.0 4. The assumption was that humanized numbers are always maximum 4 characters in SI/IEC (e.g. 1024 or 4.5M) or 5 characters with iec-i (e.g. 999Ti). With the new 'format', if given %'2.9f - should the output be still 4 characters (e.g. 4.5T), or respect the .9 format (e.g. 4.5T) ? and does the suffix character counts in the 2.9 format ? I would expect that specifying a precision after the '.' with --to={si,iec,iec-i} would override any default precision, and that the width for the field would adjusted accordingly. The suffix character would be significant to the field/padding width rather than the precision (similarly to the first point that format only directly applies to the numeric portion). So: $ printf %s\n 94500 95000 | numfmt --to=iec 94.5K 95.0K $ printf %s\n 94500 95000 | numfmt --to=iec-i 94.5Ki 95.0Ki $ printf %s\n 94500 95000 | numfmt --to=iec-i --format='[%f]' # Keep default of %7.1Lf for --to=iec-i [ 94.5Ki] [ 95.0Ki] $ printf %s\n 94500 95000 | numfmt --to=iec-i --format='[%g]' # %g - %7Lg for --to=iec-i (adjusted for padding) [ 94.5Ki] [ 95 Ki] $ printf %s\n 94500 95000 | numfmt --to=iec-i --format='[%10.f]' # Any padding or precision overrides defaults [ 94Ki] [ 95Ki] $ printf %s\n 94500 95000 | numfmt --to=iec-i --format='%.f ' # Common would be to strip decimals and add a space 94 Ki 95 Ki My preference is to keep things simple, and accept just a limited subset of the format syntax: 1. grouping (the ' character) 2. padding (the number after '%' and before the 'f' 3. alignment (optional '-' after '%') 4. Any prefix/suffix before/after the '%' option. 5. Accept just %f, but internally treat it as '%s' or '%Lf', depending on the output. The above are consistent with my examples I think. All other options will be silently ignored, or trigger errors. Example: $ numfmt --format xx%20fxx --to=si 5000 [[ internally, treats as --padding 20 ]] xx5.0Kxx $ numfmt --format xx%'-10fxx 5000 [[ internally, treats as --padding -10 --grouping ]] xx5,000 xx $ numfmt --format xx%0#'+010llfxx 5000 [[ reject as 'too complicated' / unsupported printf options ]] Too complicated/unsupported might just completely override our defaults and padding etc. and be passed to printf? I.E. follow the it's better to ask forgiveness than permission idiom, i.e. use as much logic below you as possible. $ echo | numfmt --format=xx%0#'+010llfxx\n xx+1,111.00xx cheers, Pádraig.
Re: numfmt (=print 'human' sizes) updates
On 12/14/2012 08:53 AM, Pádraig Brady wrote: On 12/14/2012 05:38 AM, Assaf Gordon wrote: Hello, Attached is a slightly improved patch - minor code changes, and many more tests. Line coverage is 98%, and branch coverage is now 93% , and most of the non-covered branches are simply unreachable (I'm checking the reachable ones). The comments below still apply. It's looking like a really useful cohesive command. I verified all the new tests pass and syntax check is fine too. Your attention to detail is _much_ appreciated. I'll note here some _future_ TODO items, which don't need to be added for the initial version: 1. Parse numbers containing grouping separators. 2. Have --fields support multiple values. I did some basic perf testing and verified no leaks with: $ yes 1 2 | pv /dev/null ^C09MB 0:00:03 [ 102MB/s] [ = ] $ yes 1 2 | (ulimit -v 2; src/numfmt --field 2 --to=iec --from=auto) | pv /dev/null ^C.9MB 0:00:11 [6.32MB/s] [ = ] $ yes 1 2 | head -n10 | valgrind src/numfmt --field 2 --to=iec --from=auto 21 | grep leaks ==23446== All heap blocks were freed -- no leaks are possible I notice the UNIT options: seciont is mis-formatted by help2man. I notice use of fprintf (stderr, ...) where error (0, 0, ...) should be used. indent -nut src/numfmt.c makes lots of changes, most valid. So on to some usage... This shows auto padding works, with before and after displayed through tee. $ df | head -n3 | tee (src/numfmt --header --field=2 --to=iec) Filesystem 1K-blocks Used Available Use% Mounted on rootfs13102208 12544324424816 97% / udev 1454732 0 1454732 0% /dev Filesystem 1K-blocks Used Available Use% Mounted on rootfs 13M 12544324424816 97% / udev 1.4M 0 1454732 0% /dev This shows padding increases the field width as required in edge cases: $ src/df /home | tee (src/numfmt --header --field=2 --from-unit=1024 --to-unit=1 --group) Filesystem 1K-blocks Used Available Use% Mounted on /dev/sdb1 100791728 71565732 24105996 75% /home Filesystem 1K-blocks Used Available Use% Mounted on /dev/sdb1 103,210,729,472 71565732 24105996 75% /home I notice the unit on both --to=iec and --to=si is the same. I was thinking that --to=iec would add a 'i' to distinguish? $ /bin/ls -l | head -n4 | src/numfmt --to=iec --field=5 total 7252 -rw-rw-r--. 1 padraig padraig 92K Aug 23 2011 ABOUT-NLS -rw-rw-r--. 1 padraig padraig 49K Dec 21 16:22 aclocal.m4 -rw-rw-r--. 1 padraig padraig3.6K Dec 21 16:15 AUTHORS I notice we exit immediately with invalid numbers. How about printing the warning, outputting the unparsable number as is and continuing? $ printf %s\n 1 one 2 | src/numfmt 1 src/numfmt: invalid number: 'one' As suggested missing fields are ignored now... $ echo -ne 1\n2 2\n | src/numfmt --field=2 1 2 2 ...and we get warnings with --debug. However they mix awkwardly with the output. Perhaps delay the warning until line causing the warning is displayed. $ echo -ne 1\n2 2\n | src/numfmt --field=2 --debug src/numfmt: no conversion option specified 1src/numfmt: input line is too short, no numbers found to convert in field 2 2 2 --devdebug should probably renamed to ---devdebug I'm starting to think the original idea of having a --format option would be more a more concise, well known and extendible interface rather than having --padding, --grouping, --base, ... It wouldn't have to be passed directly to printf, and could be parsed and preprocessed something like we do in seq(1). thanks! Pádraig.
Re: numfmt (=print 'human' sizes) updates
Hello Pádraig, Thanks for the review and the feedback. Pádraig Brady wrote, On 12/21/2012 12:42 PM: It's looking like a really useful cohesive command. The attached patch addresses the following issues: 1. changed the ---devdebug option 2. incorporated most of 'indent -nut' recommendations. 3. improved the usage string, to generate somewhat better man page. (help2man is a bit finicky about formatting, so it's not optimal). 4. i suffix with iec input/output: I understand the reason for always adding i, as it is The Right Thing to do. But I think some (most?) people still treat single-letter suffix (K/M/G/T/etc.) as a valid suffix for both SI and IEC, and deduce the scale from the context of whatever they're working on. Forcing them to use Ki might be too obtrusive. It could also mess-up automated scripted, when IEC scale is needed, but only with single-letter suffix (and there are many programs like that). As a compromise, I've added yet another scale option: ieci . When used with --from=ieci, a two-letter suffix is required. When used with --to=ieci, i will always be appended. Examples: $ ./src/numfmt --to=iec 4096 4.0K $ ./src/numfmt --to=ieci 4096 4.0Ki $ ./src/numfmt --from=iec 4K 4096 $ ./src/numfmt --from=ieci 4Ki 4096 $ ./src/numfmt --from=auto 4Ki 4096 $ ./src/numfmt --from=auto 4K 4000 # 'ieci' requires the 'i': $ ./src/numfmt --from=ieci 4K ./src/numfmt: missing 'i' suffix in input: '4K' (e.g Ki/Mi/Gi) # 'iec' does not accept 'i': $ ./src/numfmt --from=iec 4Ki ./src/numfmt: invalid suffix in input '4Ki': 'i' I hope this cover all the options, while maintaining consistent and expected behavior. (Optionally, we can change iec to behave like ieci, and rename ieci to something else). I will send --format and error message updates soon. Regards, -gordon numfmt.9.patch.xz Description: application/xz
Re: numfmt (=print 'human' sizes) updates
On 12/22/2012 12:51 AM, Assaf Gordon wrote: Hello Pádraig, Thanks for the review and the feedback. Pádraig Brady wrote, On 12/21/2012 12:42 PM: It's looking like a really useful cohesive command. The attached patch addresses the following issues: 1. changed the ---devdebug option 2. incorporated most of 'indent -nut' recommendations. 3. improved the usage string, to generate somewhat better man page. (help2man is a bit finicky about formatting, so it's not optimal). Yes it's awkward to work around the help2man constraints. 4. i suffix with iec input/output: I understand the reason for always adding i, as it is The Right Thing to do. But I think some (most?) people still treat single-letter suffix (K/M/G/T/etc.) as a valid suffix for both SI and IEC, and deduce the scale from the context of whatever they're working on. Forcing them to use Ki might be too obtrusive. It could also mess-up automated scripted, when IEC scale is needed, but only with single-letter suffix (and there are many programs like that). As a compromise, I've added yet another scale option: ieci . When used with --from=ieci, a two-letter suffix is required. When used with --to=ieci, i will always be appended. Examples: $ ./src/numfmt --to=iec 4096 4.0K $ ./src/numfmt --to=ieci 4096 4.0Ki $ ./src/numfmt --from=iec 4K 4096 $ ./src/numfmt --from=ieci 4Ki 4096 $ ./src/numfmt --from=auto 4Ki 4096 $ ./src/numfmt --from=auto 4K 4000 # 'ieci' requires the 'i': $ ./src/numfmt --from=ieci 4K ./src/numfmt: missing 'i' suffix in input: '4K' (e.g Ki/Mi/Gi) # 'iec' does not accept 'i': $ ./src/numfmt --from=iec 4Ki ./src/numfmt: invalid suffix in input '4Ki': 'i' I hope this cover all the options, while maintaining consistent and expected behavior. (Optionally, we can change iec to behave like ieci, and rename ieci to something else). I like the 'ieci' compromise. Also because it has two 'i's in it, there will be less confusion as to which is which between 'si' and 'ieci'. I suppose an alternative to 'ieci' is 'ibi', but I slightly prefer your 'ieci'. I will send --format and error message updates soon. thanks! Pádraig.
Re: numfmt (=print 'human' sizes) updates
On 12/21/2012 07:07 PM, Pádraig Brady wrote: As a compromise, I've added yet another scale option: ieci . When used with --from=ieci, a two-letter suffix is required. When used with --to=ieci, i will always be appended. That's a little hard to read. Would --from=iec-i be any easier, where we are separating the IEC acronym from the output of 'i'? -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: numfmt (=print 'human' sizes) updates
Hello, Attached is a first shot at documenting 'numfmt' . Comments are welcomed, -gordon numfmt_doc.patch.gz Description: GNU Zip compressed data
Re: numfmt (=print 'human' sizes) updates
On 12/14/2012 05:38 AM, Assaf Gordon wrote: Hello, Attached is a slightly improved patch - minor code changes, and many more tests. Line coverage is 98%, and branch coverage is now 93% , and most of the non-covered branches are simply unreachable (I'm checking the reachable ones). The comments below still apply. wow great stuff. Assaf Gordon wrote, On 12/13/2012 01:02 AM: Most of the previously raised issues have been addressed, except handling locale'd grouping in the input numbers (locale'd decimal-point is handled correctly). Cool, we can address that if needed later. Added support for header, auto-whitespace-padding, floating-point input . Internally, all values are now stored as long double (instead of previously uintmax_t) - enables working with Yotta-scale values. The following should now 'just work' : df | ./src/numfmt --header --field 2 --to=si ls -l | ./src/numfmt --header --field 5 --to=iec ls -lh | ./src/numfmt --header --field 5 --from=iec --padding=10 The --debug option now behaves more like sort's --debug: prints messages to STDERR about possible bad combinations and inputs (which are not fatal errors): $./src/numfmt --debug 6 ./src/numfmt: no conversion option specified 6 Maybe --verbose would be more appropriate, if this option is needed at all. sort is exceedingly complex whereas hopefully numfmt will not. The --devdebug option can be used to show internal states (perhaps will be removed once the program is finalized?). Yes we may remove, or at least not document. The test file 'tests/misc/numfmt.pl' contains many more tests and details about possible inputs/outputs. If the functionality is acceptable, the next steps are cleaner code and better documentations. I hope to review this, this weekend. thanks! Pádraig.
Re: numfmt (=print 'human' sizes) updates
On 12/14/2012 01:53 AM, Pádraig Brady wrote: The --devdebug option can be used to show internal states (perhaps will be removed once the program is finalized?). Yes we may remove, or at least not document. If you choose to not document it, then name it '---devdebug' (yes, with three dashes), so that it doesn't interfere with any use of '--d' as an unambiguous prefix of a documented long option starting with d. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: numfmt (=print 'human' sizes) updates
Hello, Attached is a slightly improved patch - minor code changes, and many more tests. Line coverage is 98%, and branch coverage is now 93% , and most of the non-covered branches are simply unreachable (I'm checking the reachable ones). The comments below still apply. - gordon Assaf Gordon wrote, On 12/13/2012 01:02 AM: Most of the previously raised issues have been addressed, except handling locale'd grouping in the input numbers (locale'd decimal-point is handled correctly). Added support for header, auto-whitespace-padding, floating-point input . Internally, all values are now stored as long double (instead of previously uintmax_t) - enables working with Yotta-scale values. The following should now 'just work' : df | ./src/numfmt --header --field 2 --to=si ls -l | ./src/numfmt --header --field 5 --to=iec ls -lh | ./src/numfmt --header --field 5 --from=iec --padding=10 The --debug option now behaves more like sort's --debug: prints messages to STDERR about possible bad combinations and inputs (which are not fatal errors): $./src/numfmt --debug 6 ./src/numfmt: no conversion option specified 6 The --devdebug option can be used to show internal states (perhaps will be removed once the program is finalized?). The test file 'tests/misc/numfmt.pl' contains many more tests and details about possible inputs/outputs. If the functionality is acceptable, the next steps are cleaner code and better documentations. numfmt.8.patch.xz Description: application/xz