git format-patch question

2012-12-11 Thread Assaf Gordon
Hello,

(picking up from a different thread)

Pádraig Brady wrote, On 12/06/2012 06:59 PM:
 Generally it's best to get git to send email
 or send around formats that git can apply directly,
 which includes commit messages and references new files etc.
 The handiest way to do that is:
 
   git format-patch --stdout -1 | gzip > numfmt.5.patch.gz

While working on my development branch, I commit small, specific changes, like so:
 [PATCH 1/6] numfmt: a new command to format numbers
 [PATCH 2/6] numfmt: change SI/IEC parameters to lowercase.
 [PATCH 3/6] numfmt: separate debug/devdebug options.
 [PATCH 4/6] numfmt: fix segfault when no numbers are found.
 [PATCH 5/6] numfmt: improve --field, add more tests.
 [PATCH 6/6] numfmt: add --header option.

Each commit can be just a few lines.

When I send a patch to the mailing list, I want to send one 'nice', 'clean' 
patch with my changes, compared to the master branch.

When I use the following command:

   git diff -p --stat master..HEAD > my.patch

all the changes (multiple commits) I made on my branch compared to master 
are represented as one coherent change in my.patch, but this is not 
convenient for you to apply.


However, when I use

git format-patch --stdout -1 > my.patch

only the last commit appears.

The alternative:

git format-patch --stdout master..HEAD > my.patch

generates a file which will create multiple commits when imported with git am.
 
What is the recommended way to generate a clean patch which will consolidate 
all my small commits into one?
Or is there another way?

Thanks,
 -gordon







Adding tests for non-C locales

2012-12-11 Thread Assaf Gordon
Hello,

I want to add tests for non-C locales (to check grouping in numfmt).

My test script is written in Perl, based on tests/misc/wc.pl .

It starts with:
===
   @ENV{qw(LANGUAGE LANG LC_ALL)} = ('C') x 3;
===

Which is fine for most of the tests.

How do I add tests for a non-C locale in a safe manner? I need a locale whose 
thousands-grouping separator character I know, but I can't know in advance 
whether it's installed on the testing machine.

Thanks,
 -gordon



Re: git format-patch question

2012-12-11 Thread Pádraig Brady

On 12/11/2012 04:17 PM, Assaf Gordon wrote:

Hello,

(picking up from a different thread)

Pádraig Brady wrote, On 12/06/2012 06:59 PM:

Generally it's best to get git to send email
or send around formats that git can apply directly,
which includes commit messages and references new files etc.
The handiest way to do that is:

   git format-patch --stdout -1 | gzip > numfmt.5.patch.gz


While working on my development branch, I commit small, specific changes, like so:
  [PATCH 1/6] numfmt: a new command to format numbers
  [PATCH 2/6] numfmt: change SI/IEC parameters to lowercase.
  [PATCH 3/6] numfmt: separate debug/devdebug options.
  [PATCH 4/6] numfmt: fix segfault when no numbers are found.
  [PATCH 5/6] numfmt: improve --field, add more tests.
  [PATCH 6/6] numfmt: add --header option.

Each commit can be just a few lines.

When I send a patch to the mailing list, I want to send one 'nice', 'clean' 
patch with my changes, compared to the master branch.

When I use the following command:

git diff -p --stat master..HEAD > my.patch

all the changes (multiple commits) I made on my branch compared to master are 
represented as one coherent change in my.patch, but this is not convenient 
for you to apply.


However, when I use

 git format-patch --stdout -1 > my.patch

only the last commit appears.

The alternative:

 git format-patch --stdout master..HEAD > my.patch

generates a file which will create multiple commits when imported with git am.

What is the recommended way to generate a clean patch which will consolidate 
all my small commits into one?
Or is there another way?


No need to squash to a single diff.
It's best to use the method I suggested,
and then git can apply all the patches in turn.
Just s/-1/-6/ in the command I suggested.
We can squash to the appropriate number of commits,
before the final commit anyway.

If you really want to squash patches locally
to hide trivial adjustments etc. then:

  git rebase -i HEAD~6

Then s/pick/squash/ where needed, and save.

Also it's best for consumers of the patch set,
that you rebase your branch onto the latest master like:

  git checkout master
  git pull origin
  git checkout my_branch
  git rebase master

cheers,
Pádraig.
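
The advice above (send the whole series in a form that git am can apply) can be sketched as a small shell helper. This is only an illustration: the function name make_patch_series and the default output name series.patch.gz are made up here, and it assumes the base branch is called master as in the thread.

```shell
# Hypothetical helper (not part of git or coreutils): write every commit
# on the current branch that is not on BASE into one gzipped mbox file
# that `git am` can apply commit by commit.
make_patch_series() {
    base=${1:-master}
    out=${2:-series.patch.gz}

    # Number of commits that format-patch will emit for BASE..HEAD.
    count=$(git rev-list --count "$base..HEAD") || return 1
    if [ "$count" -eq 0 ]; then
        echo "no commits ahead of $base" >&2
        return 1
    fi

    git format-patch --stdout "$base..HEAD" | gzip > "$out"

    # Report how many patches were written.
    echo "$count"
}
```

Run from the topic branch after rebasing onto master, for example `make_patch_series master numfmt.patch.gz`. The recipient can then apply the series with `gzip -dc numfmt.patch.gz | git am`.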



Re: Adding tests for non-C locales

2012-12-11 Thread Pádraig Brady

On 12/11/2012 04:21 PM, Assaf Gordon wrote:

Hello,

I want to add tests for non-C locales (to check grouping in numfmt).

My test script is written in Perl, based on tests/misc/wc.pl .

It starts with:
===
@ENV{qw(LANGUAGE LANG LC_ALL)} = ('C') x 3;
===

Which is fine for most of the tests.

How do I add tests for a non-C locale in a safe manner? I need a locale whose 
thousands-grouping separator character I know, but I can't know in advance 
whether it's installed on the testing machine.


There is special handling of the fr_FR locale
to check that it's available etc.
To see uses of this in both sh and pl scripts:

git grep LOCALE_FR_UTF8

cheers,
Pádraig.
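
For readers without the coreutils tree at hand, the general idea of guarding a locale-dependent test can be sketched in plain shell. This is not the LOCALE_FR_UTF8 mechanism mentioned above; the helper names have_locale and thousands_sep_of are invented for this sketch, and it assumes the system provides the locale(1) utility.

```shell
# Succeed if the named locale is installed.  Names are compared
# case-insensitively and without hyphens, so "fr_FR.UTF-8" and
# "fr_FR.utf8" are treated as the same locale.
have_locale() {
    want=$(printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -d '\055')
    locale -a 2>/dev/null \
        | tr '[:upper:]' '[:lower:]' | tr -d '\055' \
        | grep -Fqx -- "$want"
}

# Print the thousands separator of the named locale (may be empty,
# as it is in the C locale).
thousands_sep_of() {
    LC_ALL=$1 locale thousands_sep
}
```

A test script could then skip the grouping checks unless `have_locale fr_FR.UTF-8` succeeds, and look up the expected separator with `thousands_sep_of fr_FR.UTF-8` instead of hardcoding it.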



numfmt: locale/grouping input issue

2012-12-11 Thread Assaf Gordon
Hello,

(Continuing a previously discussed issue - accepting input values with locale 
grouping separators)

Pádraig Brady wrote, On 12/07/2012 01:09 PM:
 On 12/07/2012 03:07 PM, Assaf Gordon wrote:
 Another thing I thought of there, was it would be
 good to be able to parse number formats that it can generate:

 Sounds like two separate (but related) issues:

 $ echo '1,234' | src/numfmt --from=auto
 src/numfmt: invalid suffix in input '1,234': ',234'

 1. Is there already a gnulib function that can accept locale-grouped values? 
 can the xstrtoXXX functions handle that?
 
 I was thinking you would just strip out
 localeconv()->thousands_sep before parsing.

I couldn't find an example of a coreutils program that readily accepts locale'd 
input.
While dots and commas (US/DE locales) are relatively easy to handle, in the 
French locale the separator is a space, which conflicts with the assumption 
that the default field separator is also white space.

Another complication is that just stripping out the 'thousands_sep' character 
would treat text such as 1,3,4,5,6 as the valid number 13456.

I would suggest not accepting locale'd input at first, or offering only partial 
support.
WDYT?

Thanks,
 -gordon


A couple of examples:

   # Output is OK
   $ LC_ALL=fr_FR.utf8 ./src/printf "%'d\n" 1000
   1 000

   # Input is not valid
   $ LC_ALL=fr_FR.utf8 ./src/printf "%'d\n" "1 000"
   ./src/printf: 1 000 : valeur non complètement convertie
   1

   # Sort can't handle locale'd input, treats the white-space as separator,
   #  not as thousand separator.
   $ printf "1 123\n1 000\n" | LC_ALL=fr_FR.utf8 sort --debug -k1,1
   sort: utilse les règles de tri « fr_FR.utf8 »
   sort: leading blanks are significant in key 1; consider also specifying 'b'
   1 000
   _
   _
   1 123
   _
   _






Re: numfmt: locale/grouping input issue

2012-12-11 Thread Pádraig Brady

On 12/11/2012 06:36 PM, Assaf Gordon wrote:

Hello,

(Continuing a previously discussed issue - accepting input values with locale 
grouping separators)


Pádraig Brady wrote, On 12/07/2012 01:09 PM:

On 12/07/2012 03:07 PM, Assaf Gordon wrote:

Another thing I thought of there, was it would be
good to be able to parse number formats that it can generate:


Sounds like two separate (but related) issues:


$ echo '1,234' | src/numfmt --from=auto
src/numfmt: invalid suffix in input '1,234': ',234'


1. Is there already a gnulib function that can accept locale-grouped values? can the 
xstrtoXXX functions handle that?


I was thinking you would just strip out
localeconv()->thousands_sep before parsing.


I couldn't find an example of a coreutils program that readily accepts locale'd 
input.
While dots and commas (US/DE locales) are relatively easy to handle, in the 
French locale the separator is a space, which conflicts with the assumption 
that the default field separator is also white space.


True. You could only support that when --delimiter was not ' ',
or when LC_NUMERIC was set to one with a non space grouping char.


Another complication is that just stripping out the 'thousands_sep' character would treat text such 
as 1,3,4,5,6 as valid number 13456 .


Good point. You'd need to count as well as strip


I would suggest at first not to accept locale'd input, or only offer partial 
support.



# Input is not valid
$ LC_ALL=fr_FR.utf8 ./src/printf "%'d\n" "1 000"
./src/printf: 1 000 : valeur non complètement convertie
1

# Sort can't handle locale'd input, treats the white-space as separator,
#  not as thousand separator.
$ printf "1 123\n1 000\n" | LC_ALL=fr_FR.utf8 sort --debug -k1,1
sort: utilse les règles de tri « fr_FR.utf8 »
sort: leading blanks are significant in key 1; consider also specifying 'b'
1 000
_
_
1 123
_
_


So the above don't support localized number formats directly,
which is fair enough. That shows that the functionality is
useful within numfmt as it would enable the above to
use such numbers after being filtered through numfmt.
Implementation should not be too onerous, given the above caveats.

thanks,
Pádraig.
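
The "count as well as strip" caveat can be illustrated with a short shell sketch. It is not numfmt's implementation. The function name strip_thousands is invented here, groups are hardcoded to three digits, and the separator is assumed to be a single comma, dot or space.

```shell
# Print NUM with the separators removed if it is either a plain run of
# digits or digits grouped in threes by SEP; fail otherwise.
# Assumption: SEP is one of "," "." or " ".
strip_thousands() {
    num=$1
    sep=$2
    if printf '%s\n' "$num" \
        | grep -Eq "^([0-9]+|[0-9]{1,3}([$sep][0-9]{3})+)\$"
    then
        printf '%s\n' "$num" | tr -d "$sep"
    else
        return 1
    fi
}
```

With this check, `strip_thousands 1,234 ,` prints 1234, while `strip_thousands 1,3,4,5,6 ,` fails instead of yielding 13456.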



Re: numfmt: locale/grouping input issue

2012-12-11 Thread Pádraig Brady

On 12/11/2012 07:03 PM, Pádraig Brady wrote:

On 12/11/2012 06:36 PM, Assaf Gordon wrote:

Hello,

(Continuing a previously discussed issue - accepting input values with locale 
grouping separators)


Pádraig Brady wrote, On 12/07/2012 01:09 PM:

On 12/07/2012 03:07 PM, Assaf Gordon wrote:

Another thing I thought of there, was it would be
good to be able to parse number formats that it can generate:


Sounds like two separate (but related) issues:


$ echo '1,234' | src/numfmt --from=auto
src/numfmt: invalid suffix in input '1,234': ',234'


1. Is there already a gnulib function that can accept locale-grouped values? can the 
xstrtoXXX functions handle that?


I was thinking you would just strip out
localeconv()->thousands_sep before parsing.


I couldn't find an example of a coreutils program that readily accepts locale'd 
input.
While dots and commas (US/DE locales) are relatively easy to handle, in the 
French locale the separator is a space, which conflicts with the assumption 
that the default field separator is also white space.


True. You could only support that when --delimiter was not ' ',
or when LC_NUMERIC was set to one with a non space grouping char.


Another complication is that just stripping out the 'thousands_sep' character would treat text such 
as 1,3,4,5,6 as valid number 13456 .


Good point. You'd need to count as well as strip


And the counting shouldn't really hardcode 3,
and should instead honor localeconv()->grouping.
For example:

$ LC_ALL=ta_IN locale grouping
3;2

$ LC_ALL=ta_IN printf "%'d\n" 123456789
12,34,56,789

cheers,
Pádraig.
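
A grouping-aware version of the check can be sketched as follows. It is an illustration only, not what numfmt does. The name strip_grouped is invented, the grouping string is assumed to use the "3;2" form printed by `locale grouping` with positive values only, the last value is assumed to repeat, and the separator is assumed to be a single character such as a comma.

```shell
# Print NUM with separators removed if its digit groups, read from the
# right, match GROUPING (e.g. "3;2": one group of 3, then groups of 2).
# The leftmost group may be shorter; ungrouped digits are accepted too.
strip_grouped() {
    awk -v num="$1" -v sep="$2" -v grouping="$3" 'BEGIN {
        ng = split(grouping, g, ";")
        nf = split(num, f, sep)
        if (nf == 0 || ng == 0) exit 1
        out = ""
        for (i = nf; i >= 1; i--) {
            k = nf - i + 1          # group position counted from the right
            if (k > ng) k = ng      # the last grouping value repeats
            want = g[k] + 0
            len = length(f[i])
            if (f[i] !~ /^[0-9]+$/) exit 1
            if (i > 1 && len != want) exit 1            # inner groups are full
            if (i == 1 && nf > 1 && len > want) exit 1  # leading group may be short
            out = f[i] out
        }
        print out
    }'
}
```

For the ta_IN example above, `strip_grouped 12,34,56,789 , '3;2'` prints 123456789, whereas `strip_grouped 123,456,789 , '3;2'` fails because the second group from the right has three digits instead of two.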



bug#13127: [PATCH] cut: use only one data strucutre

2012-12-11 Thread Cojocaru Alexandru
On Sun, 09 Dec 2012 21:45:03 +0100
Jim Meyering j...@meyering.net wrote:

 Thanks for the patch.
 This is large enough that you'll have to file a copyright assignment.
 For details, see the Copyright assignment section in the file
 named HACKING.

Fine.


 Have you considered performance in the common case?
 I suspect that a byte or field number larger than 1000 is
 not common.  That is why, in the FIXME comment above,
 I suggested to use an adaptive approach.  I had the feeling
 (don't remember if I profiled it) that testing a bit per
 input field would be more efficient than an in-range test.

Yes, it was the first thing I checked. And there's no performance loss.


 If you construct test cases and gather timing data, please do so
 in a reproducible manner and include details when you report back,
 so we can compare on different types of systems.

Here are my benchmarks:

OS:   Parabola GNU/linux-libre (linux-libre v3.6.8-1)
Compiler: GCC 4.7.2
Cflags:   -O2
LANG: C
CPU:  Intel Core Duo  (1.86 GHz) (L1 Cache 64KiB) (L2 Cache 2MiB)
Main memory:
 - Bank 0: DIMM DRAM Synchronous (1GiB) (width 64 bits)
 - Bank 1: DIMM DRAM Synchronous (1GiB) (width 64 bits)

NOTE: information gathered with `lshw'.


Summary (see the attached file for complete data):

### small ranges
cut-pre: 0:01.84
cut-post: 0:01.36
cut-split: 0:01.25

### bigger ranges
cut-pre: 0:11.74
cut-post: 0:09.20
cut-split: 0:07.91 ***

### fields
cut-pre: 0:02.90
cut-post: 0:02.68
cut-split: 0:02.85

### --output-delimiter
cut-pre: 0:02.90
cut-post: 0:02.74
cut-split: 0:02.80


NOTES:
 cut-pre is the current implementation and was compiled from commit ec48beadf.
 cut-post was compiled after applying the above patch to commit ec48beadf.
 cut-split was compiled after applying the `split-print_kth' patch to commit 
ec48beadf.


The main advantage comes from splitting `print_kth' into two
separate functions, so now `print_kth' does fewer checks.


Best regards,
Cojocaru Alexandru
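
A repeatable timing loop of the kind used for these measurements can be sketched as below. The measurements in this message were taken with /usr/bin/time, not with this helper. The name time_runs is invented, and the sketch assumes GNU date, whose %N format gives nanoseconds.

```shell
# Run a command REPS times with its stdout discarded and print the
# wall-clock time of each run in milliseconds, one value per line.
# Assumption: `date +%s%N` is supported (GNU date).
time_runs() {
    reps=$1
    shift
    i=0
    while [ "$i" -lt "$reps" ]; do
        start=$(date +%s%N)
        "$@" > /dev/null
        end=$(date +%s%N)
        echo $(( (end - start) / 1000000 ))
        i=$((i + 1))
    done
}
```

For example, `time_runs 3 ./cut-pre -b1,3 big-file` would print three elapsed times that can be compared against the same call with ./cut-post or ./cut-split.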
OS:   Parabola GNU/linux-libre (linux-libre v3.6.8-1)
Compiler: GCC 4.7.2
Cflags:   -O2
LANG: C
CPU:  Intel Core Duo  (1.86 GHz) (L1 64KiB) (L2 2MiB)
Main memory:
 - Bank 0: DIMM DRAM Synchronous (1GiB) (width 64 bits)
 - Bank 1: DIMM DRAM Synchronous (1GiB) (width 64 bits)

NOTE: information gathered with `lshw'.


bash$ ./cut-pre 2> /dev/null # try not to count caching of shared libraries

### small ranges
bash$ for i in `seq 1 100`; do echo abcdfeg >> big-file; done

bash$ for i in 1 2 3; do /usr/bin/time ./cut-pre -b1,3 big-file > /dev/null; echo ; done
1.72user 0.11system 0:01.84elapsed 99%CPU (0avgtext+0avgdata 568maxresident)k
0inputs+0outputs (0major+168minor)pagefaults 0swaps

1.75user 0.08system 0:01.84elapsed 99%CPU (0avgtext+0avgdata 572maxresident)k
0inputs+0outputs (0major+168minor)pagefaults 0swaps

1.76user 0.07system 0:01.84elapsed 99%CPU (0avgtext+0avgdata 568maxresident)k
0inputs+0outputs (0major+167minor)pagefaults 0swaps


bash$ for i in 1 2 3; do /usr/bin/time ./cut-post -b1,3 big-file > /dev/null; echo; done
1.23user 0.12system 0:01.36elapsed 99%CPU (0avgtext+0avgdata 560maxresident)k
0inputs+0outputs (0major+165minor)pagefaults 0swaps

1.25user 0.09system 0:01.36elapsed 99%CPU (0avgtext+0avgdata 560maxresident)k
0inputs+0outputs (0major+165minor)pagefaults 0swaps

1.25user 0.09system 0:01.36elapsed 99%CPU (0avgtext+0avgdata 556maxresident)k
0inputs+0outputs (0major+164minor)pagefaults 0swaps


bash$ for i in 1 2 3; do /usr/bin/time ./cut-split -b1,3 big-file > /dev/null; echo ; done
1.15user 0.09system 0:01.25elapsed 99%CPU (0avgtext+0avgdata 572maxresident)k
0inputs+0outputs (0major+168minor)pagefaults 0swaps

1.15user 0.08system 0:01.25elapsed 99%CPU (0avgtext+0avgdata 568maxresident)k
0inputs+0outputs (0major+167minor)pagefaults 0swaps

1.14user 0.10system 0:01.25elapsed 99%CPU (0avgtext+0avgdata 568maxresident)k
0inputs+0outputs (0major+167minor)pagefaults 0swaps


### bigger ranges
bash$ yes $(for i in $(seq 1 10); do echo -n a; done) | dd of=big-lines 
ibs=11 count=1 iflag=fullblock

bash$ for i in 1 2 3; do /usr/bin/time ./cut-pre -b50-100,101-105, big-lines > /dev/null; echo; done
11.01user 0.70system 0:11.74elapsed 99%CPU (0avgtext+0avgdata 572maxresident)k
0inputs+0outputs (0major+168minor)pagefaults 0swaps

11.02user 0.70system 0:11.74elapsed 99%CPU (0avgtext+0avgdata 576maxresident)k
0inputs+0outputs (0major+169minor)pagefaults 0swaps

11.04user 0.66system 0:11.73elapsed 99%CPU (0avgtext+0avgdata 572maxresident)k
0inputs+0outputs (0major+168minor)pagefaults 0swaps


bash$ for i in 1 2 3; do /usr/bin/time ./cut-post -b50-100,101-105, big-lines > /dev/null; echo; done
8.65user 0.52system 0:09.20elapsed 99%CPU (0avgtext+0avgdata 560maxresident)k
0inputs+0outputs (0major+165minor)pagefaults 0swaps

8.59user 0.58system 0:09.20elapsed 99%CPU (0avgtext+0avgdata 556maxresident)k
0inputs+0outputs (0major+164minor)pagefaults 0swaps

8.53user 0.65system 0:09.21elapsed 99%CPU (0avgtext+0avgdata 

bug#13144: comm bug or strange behaviour

2012-12-11 Thread Bob Proulx
Matteo Zambelli wrote:
 Hi, I was trying to find common lines between the two attached
 files (both created with dpkg --get-selections > filename.txt) with
 this command:
 
 comm -12 squeeze-xfce-installed_packages.txt 
 squeeze-xfce-installed_packages.txt > result.txt
 
 then i noticed that the line:
 
 libcdio10   install
 
 that is regularly present in both files, doesn't appear in the
 result(as well as few other lines).

Thank you for the report.  Also thank you for including the data sets
needed to reproduce the activity.  That's great.

Unfortunately I cannot reproduce your result.

  $ comm -12 squeeze-xfce-installed_packages.txt 
squeeze-xfce-installed_packages.txt | grep libcdio10
  libcdio10 install

And so there it is in the output??

 I have tried with option --nocheck-order and such, but those common
 lines are still missing in the output file.

Try 'sort -c' to check the sort ordering of both files.  (Looked okay
to me but it is locale dependent.)

What is your 'locale' setting?  (Affects sort order.)

  $ locale

In this case it shouldn't matter but what version of comm are you
using?  I know you reported your kernel version but that isn't really
associated.  (Think Squeeze with Backports.)

  $ comm --version

Verify that the 'comm' binary that you are running is the expected
one:

  $ type comm
  comm is hashed (/usr/bin/comm)

Bob
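
Since comm expects both inputs to be sorted in the collation order of the locale it runs in, one way to rule out locale effects is to sort and compare under the same fixed locale. The sketch below shows that idea. The function name common_lines is invented here, and this is a workaround to try rather than a diagnosis of the report.

```shell
# Print the lines common to FILE1 and FILE2, sorting both inputs and
# running comm in the C locale so the sort order and the comparison agree.
common_lines() {
    a=$(mktemp) || return 1
    b=$(mktemp) || { rm -f "$a"; return 1; }
    LC_ALL=C sort "$1" > "$a"
    LC_ALL=C sort "$2" > "$b"
    status=0
    LC_ALL=C comm -12 "$a" "$b" || status=$?
    rm -f "$a" "$b"
    return "$status"
}
```

If `common_lines file1.txt file2.txt` shows the missing line while the plain comm invocation does not, the input files were probably not sorted according to the locale in effect when comm was run.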