git format-patch question
Hello,

(picking up from a different thread)

Pádraig Brady wrote, On 12/06/2012 06:59 PM:
> Generally it's best to get git to send email or send around formats
> that git can apply directly, which includes commit messages and
> references new files etc. The handiest way to do that is:
>
>   git format-patch --stdout -1 | gzip > numfmt.5.patch.gz

While working on my development branch, I commit small, specific changes, like so:

  [PATCH 1/6] numfmt: a new command to format numbers
  [PATCH 2/6] numfmt: change SI/IEC parameters to lowercase.
  [PATCH 3/6] numfmt: separate debug/devdebug options.
  [PATCH 4/6] numfmt: fix segfault when no numbers are found.
  [PATCH 5/6] numfmt: improve --field, add more tests.
  [PATCH 6/6] numfmt: add --header option.

Each commit can be just a few lines.

When I send a patch to the mailing list, I want to send one 'nice', 'clean' patch with my changes, compared to the master branch.

When I use the following command:

  git diff -p --stat master..HEAD > my.patch

all the changes (multiple commits) I made on my branch compared to master are represented as one coherent change in my.patch, but this is not convenient for you to apply.

However, when I use

  git format-patch --stdout -1 > my.patch

only the last commit appears.

The alternative:

  git format-patch --stdout master..HEAD > my.patch

generates a file which will cause multiple commits when imported with git am.

What is the recommended way to generate a clean patch which will consolidate all my small commits into one? Or is there another way?

Thanks,
-gordon
Adding tests for non-C locales
Hello,

I want to add tests for non-C locales (to check grouping in numfmt).

My test script is written in Perl, based on tests/misc/wc.pl. It starts with:

===
@ENV{qw(LANGUAGE LANG LC_ALL)} = ('C') x 3;
===

which is fine for most of the tests.

How do I add tests for a non-C locale in a safe manner? I need a locale whose thousands-grouping separator character I know, but I can't know in advance whether it's installed on the testing machine.

Thanks,
-gordon
Re: git format-patch question
On 12/11/2012 04:17 PM, Assaf Gordon wrote:
> Hello,
>
> (picking up from a different thread)
>
> Pádraig Brady wrote, On 12/06/2012 06:59 PM:
>> Generally it's best to get git to send email or send around formats
>> that git can apply directly, which includes commit messages and
>> references new files etc. The handiest way to do that is:
>>
>>   git format-patch --stdout -1 | gzip > numfmt.5.patch.gz
>
> While working on my development branch, I commit small, specific
> changes, like so:
>
>   [PATCH 1/6] numfmt: a new command to format numbers
>   [PATCH 2/6] numfmt: change SI/IEC parameters to lowercase.
>   [PATCH 3/6] numfmt: separate debug/devdebug options.
>   [PATCH 4/6] numfmt: fix segfault when no numbers are found.
>   [PATCH 5/6] numfmt: improve --field, add more tests.
>   [PATCH 6/6] numfmt: add --header option.
>
> Each commit can be just a few lines.
>
> When I send a patch to the mailing list, I want to send one 'nice',
> 'clean' patch with my changes, compared to the master branch.
>
> When I use the following command:
>
>   git diff -p --stat master..HEAD > my.patch
>
> all the changes (multiple commits) I made on my branch compared to
> master are represented as one coherent change in my.patch, but this
> is not convenient for you to apply.
>
> However, when I use
>
>   git format-patch --stdout -1 > my.patch
>
> only the last commit appears.
>
> The alternative:
>
>   git format-patch --stdout master..HEAD > my.patch
>
> generates a file which will cause multiple commits when imported
> with git am.
>
> What is the recommended way to generate a clean patch which will
> consolidate all my small commits into one? Or is there another way?

No need to squash to a single diff. It's best to use the method I suggested, and then git can apply all the patches in turn. Just s/-1/-6/ in the command I suggested. We can squash to the appropriate number of commits before the final commit anyway.

If you really want to squash patches locally to hide trivial adjustments etc. then:

  git rebase -i HEAD~6

Then s/pick/squash/ where needed, and save.

Also, it's best for consumers of the patch set that you rebase your branch onto the latest master, like:

  git checkout master
  git pull origin
  git checkout my_branch
  git rebase master

cheers,
Pádraig.
Re: Adding tests for non-C locales
On 12/11/2012 04:21 PM, Assaf Gordon wrote:
> Hello,
>
> I want to add tests for non-C locales (to check grouping in numfmt).
>
> My test script is written in Perl, based on tests/misc/wc.pl. It
> starts with:
>
> ===
> @ENV{qw(LANGUAGE LANG LC_ALL)} = ('C') x 3;
> ===
>
> which is fine for most of the tests.
>
> How do I add tests for a non-C locale in a safe manner? I need a
> locale whose thousands-grouping separator character I know, but I
> can't know in advance whether it's installed on the testing machine.

There is special handling of the fr_FR locale to check that it's available etc. To see uses of this in both sh and pl scripts:

  git grep LOCALE_FR_UTF8

cheers,
Pádraig.
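For illustration only, and not how the coreutils test suite does it (that relies on the LOCALE_FR_UTF8 variable mentioned above): a minimal Python sketch of the same "probe the locale before testing with it" idea. The helper name `thousands_sep_for` is hypothetical, and the separator reported for a given locale depends on the C library installed on the machine.

```python
import locale


def thousands_sep_for(name):
    """Return the thousands separator of locale `name`,
    or None if that locale cannot be selected on this machine."""
    saved = locale.setlocale(locale.LC_ALL)  # query the current setting
    try:
        locale.setlocale(locale.LC_ALL, name)
    except locale.Error:
        return None  # locale not installed: the caller should skip the test
    try:
        return locale.localeconv()["thousands_sep"]
    finally:
        locale.setlocale(locale.LC_ALL, saved)  # always restore


if __name__ == "__main__":
    sep = thousands_sep_for("fr_FR.UTF-8")
    if sep is None:
        print("skipping grouping tests: fr_FR.UTF-8 is not available")
    else:
        print("grouping separator:", repr(sep))
```

A test would then run its grouping checks only when the helper returns a separator, and report a skip otherwise.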
numfmt: locale/grouping input issue
Hello,

(Continuing a previously discussed issue - accepting input values with locale grouping separators)

Pádraig Brady wrote, On 12/07/2012 01:09 PM:
> On 12/07/2012 03:07 PM, Assaf Gordon wrote:
>>> Another thing I thought of there, was it would be good to be able
>>> to parse number formats that it can generate:
>>
>> Sounds like two separate (but related) issues:
>>
>>>   $ echo '1,234' | src/numfmt --from=auto
>>>   src/numfmt: invalid suffix in input '1,234': ',234'
>>
>> 1. Is there already a gnulib function that can accept locale-grouped
>> values? can the xstrtoXXX functions handle that?
>
> I was thinking you would just strip out localeconv()->thousands_sep
> before parsing.

I couldn't find an example of a coreutils program that readily accepts locale'd input.

While dots and commas (US/DE locales) are relatively easy to handle, in the French locale the separator is a space, which conflicts with the assumption that the default field separator is also white space.

Another complication is that just stripping out the 'thousands_sep' character would treat text such as 1,3,4,5,6 as the valid number 13456.

I would suggest at first not to accept locale'd input, or only offer partial support.

WDYT?

Thanks,
-gordon

Couple of examples:

  # Output is OK
  $ LC_ALL=fr_FR.utf8 ./src/printf "%'d\n" 1000
  1 000

  # Input is not valid
  $ LC_ALL=fr_FR.utf8 ./src/printf "%'d\n" "1 000"
  ./src/printf: 1 000 : valeur non complètement convertie
  1

  # Sort can't handle locale'd input, treats the white-space as separator,
  # not as thousand separator.
  $ printf "1 123\n1 000\n" | LC_ALL=fr_FR.utf8 sort --debug -k1,1
  sort: utilse les règles de tri « fr_FR.utf8 »
  sort: leading blanks are significant in key 1; consider also specifying 'b'
  1 000
  _
  _
  1 123
  _
  _
Re: numfmt: locale/grouping input issue
On 12/11/2012 06:36 PM, Assaf Gordon wrote:
> Hello,
>
> (Continuing a previously discussed issue - accepting input values
> with locale grouping separators)
>
> Pádraig Brady wrote, On 12/07/2012 01:09 PM:
>> On 12/07/2012 03:07 PM, Assaf Gordon wrote:
>>>> Another thing I thought of there, was it would be good to be able
>>>> to parse number formats that it can generate:
>>>
>>> Sounds like two separate (but related) issues:
>>>
>>>>   $ echo '1,234' | src/numfmt --from=auto
>>>>   src/numfmt: invalid suffix in input '1,234': ',234'
>>>
>>> 1. Is there already a gnulib function that can accept locale-grouped
>>> values? can the xstrtoXXX functions handle that?
>>
>> I was thinking you would just strip out localeconv()->thousands_sep
>> before parsing.
>
> I couldn't find an example of a coreutils program that readily
> accepts locale'd input.
>
> While dots and commas (US/DE locales) are relatively easy to handle,
> in the French locale the separator is a space, which conflicts with
> the assumption that the default field separator is also white space.

True. You could only support that when --delimiter was not ' ', or when LC_NUMERIC was set to one with a non-space grouping char.

> Another complication is that just stripping out the 'thousands_sep'
> character would treat text such as 1,3,4,5,6 as the valid number 13456.

Good point. You'd need to count as well as strip.

> I would suggest at first not to accept locale'd input, or only offer
> partial support.
>
>   # Input is not valid
>   $ LC_ALL=fr_FR.utf8 ./src/printf "%'d\n" "1 000"
>   ./src/printf: 1 000 : valeur non complètement convertie
>   1
>
>   # Sort can't handle locale'd input, treats the white-space as separator,
>   # not as thousand separator.
>   $ printf "1 123\n1 000\n" | LC_ALL=fr_FR.utf8 sort --debug -k1,1
>   sort: utilse les règles de tri « fr_FR.utf8 »
>   sort: leading blanks are significant in key 1; consider also specifying 'b'
>   1 000
>   _
>   _
>   1 123
>   _
>   _

So the above don't support localized number formats directly, which is fair enough. That shows that the functionality is useful within numfmt, as it would enable the above to use such numbers after being filtered through numfmt.

Implementation should not be too onerous, given the above caveats.

thanks,
Pádraig.
Re: numfmt: locale/grouping input issue
On 12/11/2012 07:03 PM, Pádraig Brady wrote:
> On 12/11/2012 06:36 PM, Assaf Gordon wrote:
>> Hello,
>>
>> (Continuing a previously discussed issue - accepting input values
>> with locale grouping separators)
>>
>> Pádraig Brady wrote, On 12/07/2012 01:09 PM:
>>> On 12/07/2012 03:07 PM, Assaf Gordon wrote:
>>>>> Another thing I thought of there, was it would be good to be able
>>>>> to parse number formats that it can generate:
>>>>
>>>> Sounds like two separate (but related) issues:
>>>>
>>>>>   $ echo '1,234' | src/numfmt --from=auto
>>>>>   src/numfmt: invalid suffix in input '1,234': ',234'
>>>>
>>>> 1. Is there already a gnulib function that can accept locale-grouped
>>>> values? can the xstrtoXXX functions handle that?
>>>
>>> I was thinking you would just strip out localeconv()->thousands_sep
>>> before parsing.
>>
>> I couldn't find an example of a coreutils program that readily
>> accepts locale'd input.
>>
>> While dots and commas (US/DE locales) are relatively easy to handle,
>> in the French locale the separator is a space, which conflicts with
>> the assumption that the default field separator is also white space.
>
> True. You could only support that when --delimiter was not ' ',
> or when LC_NUMERIC was set to one with a non-space grouping char.
>
>> Another complication is that just stripping out the 'thousands_sep'
>> character would treat text such as 1,3,4,5,6 as the valid number 13456.
>
> Good point. You'd need to count as well as strip.

And the counting shouldn't really hardcode 3 and instead honor localeconv()->grouping. For example:

  $ LC_ALL=ta_IN locale grouping
  3;2
  $ LC_ALL=ta_IN printf "%'d\n" 123456789
  12,34,56,789

cheers,
Pádraig.
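The "count as well as strip" idea can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not numfmt's implementation: the function name `parse_grouped` is hypothetical, `grouping` is a simplified tuple of group sizes counted from the right with the last size repeating (so the `3;2` above becomes `(3, 2)`), and signs, decimal points and suffixes are not handled.

```python
def parse_grouped(text, sep=",", grouping=(3,)):
    """Parse a non-negative integer, accepting `sep` only where the
    `grouping` sizes (counted from the right, last size repeating)
    say a separator belongs. Raise ValueError otherwise."""
    groups = text.split(sep) if sep else [text]

    # Every group must be plain ASCII digits; this also rejects empty
    # groups such as those in "1,,234" or ",234".
    if not all(g.isascii() and g.isdigit() for g in groups):
        raise ValueError(f"invalid number: {text!r}")

    if len(groups) > 1:
        def expected(i):
            # Size of the i-th group from the right.
            return grouping[min(i, len(grouping) - 1)]

        # All groups except the leftmost must have exactly the expected size.
        for i, group in enumerate(reversed(groups[1:])):
            if len(group) != expected(i):
                raise ValueError(f"misplaced separator in {text!r}")

        # The leftmost group may be shorter, but not longer.
        if len(groups[0]) > expected(len(groups) - 1):
            raise ValueError(f"misplaced separator in {text!r}")

    return int("".join(groups))


if __name__ == "__main__":
    print(parse_grouped("1,234"))                          # 1234
    print(parse_grouped("12,34,56,789", grouping=(3, 2)))  # 123456789
    try:
        parse_grouped("1,3,4,5,6")
    except ValueError as err:
        print("rejected:", err)
```

Input without any separator is accepted unchanged, so the check only constrains text that actually uses grouping.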
bug#13127: [PATCH] cut: use only one data strucutre
On Sun, 09 Dec 2012 21:45:03 +0100
Jim Meyering j...@meyering.net wrote:

> Thanks for the patch. This is large enough that you'll have to file
> a copyright assignment. For details, see the Copyright assignment
> section in the file named HACKING.

Fine.

> Have you considered performance in the common case? I suspect that
> a byte or field number larger than 1000 is not common. That is why,
> in the FIXME comment above, I suggested to use an adaptive approach.
> I had the feeling (don't remember if I profiled it) that testing a
> bit per input field would be more efficient than an in-range test.

Yes, it was the first thing I checked. And there's no performance loss.

> If you construct test cases and gather timing data, please do so in
> a reproducible manner and include details when you report back, so
> we can compare on different types of systems.

Here are my benchmarks:

OS: Parabola GNU/linux-libre (linux-libre v3.6.8-1)
Compiler: GCC 4.7.2
Cflags: -O2
LANG: C
CPU: Intel Core Duo (1.86 GHz) (L1 Cache 64KiB) (L2 Cache 2MiB)
Main memory:
  - Bank 0: DIMM DRAM Synchronous (1GiB) (width 64 bits)
  - Bank 1: DIMM DRAM Synchronous (1GiB) (width 64 bits)

NOTE: information gathered with `lshw'.

Summary (see the attached file for complete data):

### small ranges
cut-pre:   0:01.84
cut-post:  0:01.36
cut-split: 0:01.25

### bigger ranges
cut-pre:   0:11.74
cut-post:  0:09.20
cut-split: 0:07.91 ***

### fields
cut-pre:   0:02.90
cut-post:  0:02.68
cut-split: 0:02.85

### --output-delimiter
cut-pre:   0:02.90
cut-post:  0:02.74
cut-split: 0:02.80

NOTES:
cut-pre is the current implementation and was compiled from commit ec48beadf.
cut-post was compiled after applying the above patch to commit ec48beadf.
cut-split was compiled after applying the `split-print_kth' patch to commit ec48beadf.

The main advantage comes from splitting `print_kth' into two separate functions, so now `print_kth' does fewer checks.

Best regards,
Cojocaru Alexandru

OS: Parabola GNU/linux-libre (linux-libre v3.6.8-1)
Compiler: GCC 4.7.2
Cflags: -O2
LANG: C
CPU: Intel Core Duo (1.86 GHz) (L1 64KiB) (L2 2MiB)
Main memory:
  - Bank 0: DIMM DRAM Synchronous (1GiB) (width 64 bits)
  - Bank 1: DIMM DRAM Synchronous (1GiB) (width 64 bits)

NOTE: information gathered with `lshw'.

bash$ ./cut-pre 2> /dev/null # try not to count caching of shared libraries

### small ranges
bash$ for i in `seq 1 100`; do echo abcdfeg >> big-file; done

bash$ for i in 1 2 3; do /usr/bin/time ./cut-pre -b1,3 big-file > /dev/null; echo ; done
1.72user 0.11system 0:01.84elapsed 99%CPU (0avgtext+0avgdata 568maxresident)k
0inputs+0outputs (0major+168minor)pagefaults 0swaps

1.75user 0.08system 0:01.84elapsed 99%CPU (0avgtext+0avgdata 572maxresident)k
0inputs+0outputs (0major+168minor)pagefaults 0swaps

1.76user 0.07system 0:01.84elapsed 99%CPU (0avgtext+0avgdata 568maxresident)k
0inputs+0outputs (0major+167minor)pagefaults 0swaps

bash$ for i in 1 2 3; do /usr/bin/time ./cut-post -b1,3 big-file > /dev/null; echo; done
1.23user 0.12system 0:01.36elapsed 99%CPU (0avgtext+0avgdata 560maxresident)k
0inputs+0outputs (0major+165minor)pagefaults 0swaps

1.25user 0.09system 0:01.36elapsed 99%CPU (0avgtext+0avgdata 560maxresident)k
0inputs+0outputs (0major+165minor)pagefaults 0swaps

1.25user 0.09system 0:01.36elapsed 99%CPU (0avgtext+0avgdata 556maxresident)k
0inputs+0outputs (0major+164minor)pagefaults 0swaps

bash$ for i in 1 2 3; do /usr/bin/time ./cut-split -b1,3 big-file > /dev/null; echo ; done
1.15user 0.09system 0:01.25elapsed 99%CPU (0avgtext+0avgdata 572maxresident)k
0inputs+0outputs (0major+168minor)pagefaults 0swaps

1.15user 0.08system 0:01.25elapsed 99%CPU (0avgtext+0avgdata 568maxresident)k
0inputs+0outputs (0major+167minor)pagefaults 0swaps

1.14user 0.10system 0:01.25elapsed 99%CPU (0avgtext+0avgdata 568maxresident)k
0inputs+0outputs (0major+167minor)pagefaults 0swaps

### bigger ranges
bash$ yes $(for i in $(seq 1 10); do echo -n a; done) | dd of=big-lines ibs=11 count=1 iflag=fullblock

bash$ for i in 1 2 3; do /usr/bin/time ./cut-pre -b50-100,101-105, big-lines > /dev/null; echo; done
11.01user 0.70system 0:11.74elapsed 99%CPU (0avgtext+0avgdata 572maxresident)k
0inputs+0outputs (0major+168minor)pagefaults 0swaps

11.02user 0.70system 0:11.74elapsed 99%CPU (0avgtext+0avgdata 576maxresident)k
0inputs+0outputs (0major+169minor)pagefaults 0swaps

11.04user 0.66system 0:11.73elapsed 99%CPU (0avgtext+0avgdata 572maxresident)k
0inputs+0outputs (0major+168minor)pagefaults 0swaps

bash$ for i in 1 2 3; do /usr/bin/time ./cut-post -b50-100,101-105, big-lines > /dev/null; echo; done
8.65user 0.52system 0:09.20elapsed 99%CPU (0avgtext+0avgdata 560maxresident)k
0inputs+0outputs (0major+165minor)pagefaults 0swaps

8.59user 0.58system 0:09.20elapsed 99%CPU (0avgtext+0avgdata 556maxresident)k
0inputs+0outputs (0major+164minor)pagefaults 0swaps

8.53user 0.65system 0:09.21elapsed 99%CPU (0avgtext+0avgdata
bug#13144: comm bug or strange behaviour
Matteo Zambelli wrote:
> Hi, i was trying to find common lines between the two attached
> files(both created with dpkg --get-selections > filename.txt) with
> this command:
>
>   comm -12 squeeze-xfce-installed_packages.txt squeeze-xfce-installed_packages.txt > result.txt
>
> then i noticed that the line:
>
>   libcdio10 install
>
> that is regularly present in both files, doesn't appear in the
> result(as well as few other lines).

Thank you for the report. Also thank you for including the data sets needed to reproduce the activity. That's great.

Unfortunately I cannot reproduce your result.

  $ comm -12 squeeze-xfce-installed_packages.txt squeeze-xfce-installed_packages.txt | grep libcdio10
  libcdio10 install

And so there it is in the output??

> I have tried with option --nocheck-order and such, but those common
> lines are still missing in the output file.

Try 'sort -c' to check the sort ordering of both files. (Looked okay to me but it is locale dependent.)

What is your 'locale' setting? (Affects sort order.)

  $ locale

In this case it shouldn't matter but what version of comm are you using? I know you reported your kernel version but that isn't really associated. (Think Squeeze with Backports.)

  $ comm --version

Verify that the 'comm' binary that you are running is the expected one:

  $ type comm
  comm is hashed (/usr/bin/comm)

Bob
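As background for the 'sort -c' suggestion, here is a rough Python sketch of why sort order matters to a comm-like tool. It is not comm's source code and does not reproduce the reported problem; it assumes plain code-point comparison (roughly what LC_ALL=C gives), and the names `common_lines` and `is_sorted` are hypothetical. A single-pass merge only finds every common line when both inputs are sorted the same way.

```python
def is_sorted(lines):
    """True if `lines` is in non-decreasing code-point order."""
    return all(a <= b for a, b in zip(lines, lines[1:]))


def common_lines(lines1, lines2):
    """Lines present in both inputs, found with a single-pass merge.
    Only complete when both inputs are sorted in the same order."""
    common = []
    i = j = 0
    while i < len(lines1) and j < len(lines2):
        if lines1[i] == lines2[j]:
            common.append(lines1[i])
            i += 1
            j += 1
        elif lines1[i] < lines2[j]:
            i += 1  # line only in the first input
        else:
            j += 1  # line only in the second input
    return common


if __name__ == "__main__":
    print(common_lines(["a", "b"], ["a", "b"]))  # both sorted
    print(common_lines(["b", "a"], ["a", "b"]))  # first input unsorted
    print(is_sorted(["b", "a"]))
```

With the unsorted first input, the merge skips past "a" before it is ever compared with its match, so only "b" is reported as common.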