bug#38627: uniq -c gets wrong count with non-ascii strings

2019-12-16 Thread Roy Smith
Yup, this does depend on the locale. In my original example, I had LANG=en_US.UTF-8. Setting it to C.UTF-8 gets me the right result:

> $ LANG=C.UTF-8 uniq -c x
> 1 "ⁿᵘˡˡ"
> 1 "ܥܝܪܐܩ"

But, that doesn't fully explain what's going on. I find it difficult to believe that there's
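To make the locale dependence concrete, here is a minimal Python sketch that compares the two lines from the report in two ways: byte by byte, and through the C library's collation via `locale.strcoll`. It assumes, as a working hypothesis only, that locale collation is what makes the lines look equal under en_US.UTF-8. The helper names `compare_bytes` and `compare_collated` are invented for illustration, and the collated result depends on the locale data installed on the machine, so no particular output is claimed for en_US.UTF-8.

```python
import locale

# The two input lines from the bug report, including the double quotes.
A = '"ⁿᵘˡˡ"'
B = '"ܥܝܪܐܩ"'


def compare_bytes(a, b):
    """Compare the UTF-8 encodings byte by byte; returns -1, 0, or 1."""
    ab, bb = a.encode("utf-8"), b.encode("utf-8")
    return (ab > bb) - (ab < bb)


def compare_collated(a, b, locale_name):
    """Return locale.strcoll(a, b) under locale_name.

    Returns None when the locale is not installed on this system.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        return locale.strcoll(a, b)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # restore the old setting


if __name__ == "__main__":
    print("bytes      :", compare_bytes(A, B))
    for name in ("C", "C.UTF-8", "en_US.UTF-8"):
        # 0 means "these collate as equal" in that locale.
        print(f"{name:<11}:", compare_collated(A, B, name))
```

If the en_US.UTF-8 row prints 0 on a given system while the byte comparison does not, that would be consistent with the behaviour reported here; a nonzero value would suggest that system's locale data orders the two strings differently.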

bug#38621: gdu showing different sizes

2019-12-16 Thread Bernhard Voelker
On 2019-12-16 20:43, TJ Luoma wrote:
> AHA! Ok, now I understand a little better. I have seen the difference
> between "size" and "size on disk" and did not realize that applied
> here. Thanks for confirming.
> I'm still not 100% clear on _why_ two "identical" files would have
> different

bug#38621: gdu showing different sizes

2019-12-16 Thread Bob Proulx
TJ Luoma wrote:
> AHA! Ok, now I understand a little better. I have seen the difference
> between "size" and "size on disk" and did not realize that applied
> here.
>
> I'm still not 100% clear on _why_ two "identical" files would have
> different results for "size on disk" (it _seems_ like those

bug#38621: gdu showing different sizes

2019-12-16 Thread TJ Luoma
AHA! Ok, now I understand a little better. I have seen the difference between "size" and "size on disk" and did not realize that applied here. I'm still not 100% clear on _why_ two "identical" files would have different results for "size on disk" (it _seems_ like those should be identical) but I
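The distinction between "size" and "size on disk" mentioned above can be inspected directly. The following is a small Python sketch, not anything taken from the thread: it reads the byte length (`st_size`) and the allocated space (`st_blocks`) for a file. It assumes a Unix-like system where `st_blocks` is available and counted in 512-byte units, which holds on Linux and macOS but is not guaranteed everywhere. The function name `apparent_and_disk_size` is made up for this example.

```python
import os
import tempfile


def apparent_and_disk_size(path):
    """Return (apparent_size, allocated_size) in bytes for path.

    apparent_size is the file's length (st_size).
    allocated_size is st_blocks * 512, assuming 512-byte block units.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    # Write a one-byte file and show both figures for it.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"x")
        path = f.name
    try:
        size, on_disk = apparent_and_disk_size(path)
        print(f"size: {size} bytes, size on disk: {on_disk} bytes")
    finally:
        os.remove(path)
```

The two numbers are allowed to differ because the filesystem decides how much space to allocate for a given file, so two files with the same contents need not report the same allocated size.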

bug#38627: uniq -c gets wrong count with non-ascii strings

2019-12-16 Thread Paul Eggert
On 12/15/19 11:40 AM, Roy Smith wrote:
> With the following input:
>
>> $ cat x
>> "ⁿᵘˡˡ"
>> "ܥܝܪܐܩ"
>
> Running "uniq -c" says there's two copies of the same line!
>
>> $ uniq -c x
>> 2 "ⁿᵘˡˡ"

Thanks for the bug report. I expect this is because GNU 'uniq' uses the equivalent of