Re: sort (-g) [offtopic]

2018-02-27 Thread Ionel Mugurel Ciobîcă
On 27-02-2018, at 07h 31'11", David Wright wrote about "Re: sort (-g) [offtopic]"
> Yes, you need to read §3.4.2.8 over again:
> [...]
> IOW you should write a file containing
> 
> I i
> II ii
> etc.
> 
> and feed it to -s.

I can do that. 

> 
> > Besides, how would msort work in a pipe when I have to sort by date,
> > with values like 3-V-2017 and 17-IX-2016? I can't find msort's
> > equivalent of sort's -k.
> > 

I could use msort's -n option in the same way that I now use sort's
-k.


Thank you.

Ionel



Re: sort (-g) [offtopic]

2018-02-27 Thread Ionel Mugurel Ciobîcă
On 27-02-2018, at 08h 36'51", Greg Wooledge wrote about "Re: sort (-g) [offtopic]"
> > Did I miss anything?
> 
> Well, this program certainly is ... unusual.  Doesn't just *work* by
> default.  No examples in the man page.  Anyway, it looks like you
> forgot to specify numeric comparison.
> 
> wooledg:~$ printf %s\\n III IX IV V X VII VI VIII | msort -q -w -l -c N -y any
> III
> IV
> V
> VI
> VII
> VIII
> IX
> X

Yes, you mean the -c. I had assumed that using the -y option would
imply -c. But you are right that this is an unusual piece of
code...


Ionel



Re: sort (-g) [offtopic]

2018-02-27 Thread Greg Wooledge
On Tue, Feb 27, 2018 at 09:48:57AM +0100, Ionel Mugurel Ciobîcă wrote:
> # cat roman | msort -q -w -l -y ROMAN
> I
> II
> III
> IV
> IX
> V
> VI
> VII
> VIII
> X
> XI
> XII

> Did I miss anything?

Well, this program certainly is ... unusual.  Doesn't just *work* by
default.  No examples in the man page.  Anyway, it looks like you
forgot to specify numeric comparison.

wooledg:~$ printf %s\\n III IX IV V X VII VI VIII | msort -q -w -l -c N -y any
III
IV
V
VI
VII
VIII
IX
X

Here's what I mean by "doesn't just work by default":

wooledg:~$ printf %s\\n III IX IV V X VII VI VIII | msort -c N -y any
Comparison type specified without previous key selector (-e, -n, -t, or -w).

What, you don't just read lines by default?  And you don't take standard
sort(1)'s -k key specifier.  OK, fine, here's a -w for you:

wooledg:~$ printf %s\\n III IX IV V X VII VI VIII | msort -w -c N -y any
Sorting on whole record.
Increasing numeric string
Reading from stdin.
Records processed:  0
There's no point in sorting fewer than two records.0

Lolwut?  Who came up with this user interface?  Do you even read standard
input by default?  No "standard" in the man page.  No "stdin" in the man
page.  No examples in the man page.  No default behavior that works.  OK,
I guess it does say "Reading from stdin" in that error output, but why
didn't you say that in the manual?  Jeez.

I ended up googling "msort example" which was only slightly helpful.



Re: sort (-g) [offtopic]

2018-02-27 Thread David Wright
On Tue 27 Feb 2018 at 09:48:57 (+0100), Ionel Mugurel Ciobîcă wrote:
> On 19-02-2018, at 03h 23'27", Will Mengarini wrote about "Re: sort (-g) [offtopic]"
> > * Ionel Mugurel Ciobica <i.m.ciob...@upcmail.nl> [18-02/18=Su 16:55 +0100]:
> > > [... How can something like
> > > "III\nII\nI\nV\nIV\nVII\nVI\nVIII\nX\nIX"
> > > [be sorted?  ...]
> > 
> > See `aptitude show msort`; it probably does what you need.
> 
> 
> I can't see how that would work. I have read the manual and tried
> almost all the options. At best msort still behaves like sort, placing
> IX between IV and V:
> 
> # cat roman
> X
> III
> II
> XI
> IV
> V
> VI
> VIII
> VII
> IX
> XII
> I
> 
> # cat roman | msort -q -w -l -y ROMAN
> I
> II
> III
> IV
> IX
> V
> VI
> VII
> VIII
> X
> XI
> XII
> 
> # cat roman | sort 
> I
> II
> III
> IV
> IX
> V
> VI
> VII
> VIII
> X
> XI
> XII
> 
> Did I miss anything?

Yes, you need to read §3.4.2.8 over again:

"If the argument to the -c option begins with m or M msort will treat
the key as the name of a month. If the -s option is also used with
this key and its argument is the name of a file, month names will be
read from the file. The file should have the same format as a sort
order specification file. All entries on the same line will be given
the same sort rank. The sort rank will follow the order of the
lines. This approach permits the use of calendars with more than
twelve months. It also allows multiple abbreviations or names for the
same month."

IOW you should write a file containing

I i
II ii
etc.

and feed it to -s.
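
A minimal sketch of generating such a file, assuming that one line per
month with space-separated upper- and lower-case spellings is what the
quoted section means. The function name make_roman_months is made up
for illustration, and the exact msort command line that would consume
the file is deliberately left out because it is not verified here:

```shell
# Print one line per month, in calendar order, so that the line
# order defines the sort rank. Each line carries the upper-case and
# the lower-case spelling of the same Roman numeral.
make_roman_months() {
    for m in I II III IV V VI VII VIII IX X XI XII; do
        printf '%s %s\n' "$m" "$(printf '%s' "$m" | tr 'A-Z' 'a-z')"
    done
}

make_roman_months
```

Redirecting the output to a file gives something that could then be
named as the argument to -s.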

> Besides, how would msort work in a pipe when I have to sort by date,
> with values like 3-V-2017 and 17-IX-2016? I can't find msort's
> equivalent of sort's -k.
> 
> What I need is something like sort -t- -k3,3n -k2,2m -k1,1n, where m
> would ideally be the sorting of Roman numerals (or the months as Roman
> numerals)...
> 
> Is there a way to add this extension to sort? Like it is right now:
> 
> | --sort=WORD
> |   sort according to WORD: general-numeric -g,
> | human-numeric -h, month -M, numeric -n, random -R, version -V
> 
> to add "roman numerals -m"
> and to be able to specify a file with (all) roman numerals in their
> proper order and ask sort to order using that 'dictionary'?

Precisely the method that I've quoted above.

> I am willing to try to add this to sort if anyone can point me in the
> right direction...

Cheers,
David.



Re: sort (-g) [offtopic]

2018-02-27 Thread Ionel Mugurel Ciobîcă
On 19-02-2018, at 03h 23'27", Will Mengarini wrote about "Re: sort (-g) [offtopic]"
> * Ionel Mugurel Ciobica <i.m.ciob...@upcmail.nl> [18-02/18=Su 16:55 +0100]:
> > [... How can something like
> > "III\nII\nI\nV\nIV\nVII\nVI\nVIII\nX\nIX"
> > [be sorted?  ...]
> 
> See `aptitude show msort`; it probably does what you need.


I can't see how that would work. I have read the manual and tried
almost all the options. At best msort still behaves like sort, placing
IX between IV and V:

# cat roman
X
III
II
XI
IV
V
VI
VIII
VII
IX
XII
I

# cat roman | msort -q -w -l -y ROMAN
I
II
III
IV
IX
V
VI
VII
VIII
X
XI
XII

# cat roman | sort 
I
II
III
IV
IX
V
VI
VII
VIII
X
XI
XII

Did I miss anything?

Besides, how would msort work in a pipe when I have to sort by date,
with values like 3-V-2017 and 17-IX-2016? I can't find msort's
equivalent of sort's -k.

What I need is something like sort -t- -k3,3n -k2,2m -k1,1n, where m
would ideally be the sorting of Roman numerals (or the months as Roman
numerals)...
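
For illustration only, that ordering can be approximated with stock
tools rather than a new sort option. This is a sketch, assuming each
line starts with a day-month-year date whose month is one of I to XII;
the function name sort_roman_dates is made up here:

```shell
# Prefix each line with "year month day" as numbers, sort on those
# three keys, then cut the prefix off again.
sort_roman_dates() {
    awk -F- '
        BEGIN {
            n = split("I II III IV V VI VII VIII IX X XI XII", r, " ")
            for (i = 1; i <= n; i++) month[r[i]] = i
        }
        # An unknown month maps to 0 and therefore sorts first.
        { printf "%d %d %d\t%s\n", $3, month[$2], $1, $0 }
    ' | sort -k1,1n -k2,2n -k3,3n | cut -f2-
}

printf '%s\n' 3-V-2017 17-IX-2016 1-IV-2017 | sort_roman_dates
```

The numeric prefix is separated from the original line by a tab, so
cut can drop it without touching the date itself.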

Is there a way to add this extension to sort? Like it is right now:

| --sort=WORD
|   sort according to WORD: general-numeric -g,
| human-numeric -h, month -M, numeric -n, random -R, version -V

to add "roman numerals -m"
and to be able to specify a file with (all) roman numerals in their
proper order and ask sort to order using that 'dictionary'?

I am willing to try to add this to sort if anyone can point me in the
right direction...

Thank you.

Ionel



Re: sort (-g) [offtopic]

2018-02-19 Thread rhkramer
On Monday, February 19, 2018 06:23:27 AM Will Mengarini wrote:
> * Ionel Mugurel Ciobica  [18-02/18=Su 16:55 +0100]:
> > [... How can something like
> > "III\nII\nI\nV\nIV\nVII\nVI\nVIII\nX\nIX"
> > [be sorted?  ...]
> 
> See `aptitude show msort`; it probably does what you need.

I'm not the OP, but, wow, thanks--looks like a very capable tool and sounds 
like it will work for some complex sorting that I was not looking forward to.

For kicks, I did call up the msort change log, and found that it does deal 
with Roman numerals:

See: http://www.billposer.org/Software/msort.html:


8.33
Numeric keys are no longer limited to the usual Indo-Arabic number system. 
Integers written in any of the following number systems are now accepted: 
Arabic, Arabic (South Asian), Bengali, Burmese, Chinese, Devanagari, Egyptian 
hieroglyphic, Ethiopic (Amharic and Tigrinya), Gujarati, Gurmukhi (Panjabi), 
Hebrew, Kannada, Klingon, Lao, Malayalam, Nko, Old Italic, Old Persian 
cuneiform, Oriya, Phoenician, Roman numerals, Tamil, Telugu, Tengwar, Thai, 
and Tibetan. The writing system for a key is specified by the -y flag. You may 
require a particular writing system, have msort autodetect the writing system 
but require all records to use the same writing system for that key, or have 
msort autodetect the writing system for each record independently. 
<\quote>

For me, that is not (at all) an important feature. The key feature is
probably the ability to sort while keeping together units (records) of
arbitrary length (any number of lines or paragraphs) that are marked by
some sort of delimiter, along with picking an arbitrary sort field, which
will not be in the first line of a record.


Re: sort (-g) [offtopic]

2018-02-19 Thread Greg Wooledge
On Sun, Feb 18, 2018 at 04:55:28PM +0100, Ionel Mugurel Ciobîcă wrote:
> Anyone care to explain what exactly the -g option of sort means? The
> fine manual only says "general numerical", but I doubt that is true,
> because -g (and all other options I have tried, -n, -M, -h, -V) will
> all put Roman numeral 9 in between 4 and 5. See here:

... hey, wait, it's not April yet!



Re: sort (-g) [offtopic]

2018-02-19 Thread Will Mengarini
* Ionel Mugurel Ciobica  [18-02/18=Su 16:55 +0100]:
> [... How can something like
> "III\nII\nI\nV\nIV\nVII\nVI\nVIII\nX\nIX"
> [be sorted?  ...]

See `aptitude show msort`; it probably does what you need.



Re: sort (-g) [offtopic]

2018-02-18 Thread Ionel Mugurel Ciobîcă
On 18-02-2018, at 14h 44'27", David Wright wrote about "Re: sort (-g) [offtopic]"
> Any script that reads stdin and writes stdout can be used in a pipe.
> That's one of the guiding principles of unix.

I changed the scripts to use read if $# is zero, so I can use them in
pipes now. But that does not help me.

> 
> After they've done Roman numerals, they can settle down and do
> yan tan tethera in all dialects.
> https://en.wikipedia.org/wiki/Yan_Tan_Tethera

Well, I do not see that those use any symbols. It is like asking the
computer to sort one two three four ... numerically in whatever
language. This will fail.

Roman numerals never fell out of use, so they should be considered, in
my opinion.

> You shouldn't sort like that. If you've got records to sort which have
> an unsortable field like Roman months, then write something in sed,
> say, that can do the conversion. Now read your records, say:
> field1 field2 XII field3 field4
> field1 field2 IV field3 field4
> and prefix each record with the numeric representation;
> 12 field1 field2 XII field3 field4
> 04 field1 field2 IV field3 field4
> Now sort that, then throw away the first field with cut. You should
> never have to worry about converting things back!

My programming skills in sed or awk must improve before I can
accomplish that. In the meantime, since I only need to move the 5th
line into 9th position (before 10), I can supplement the sort with a sed
pipe:

some input | sort -k2,2 | sed -e '5h;5d;10H;10x'

This of course works only if all the Roman numerals from 1 to 12 are
present, and each only once... hm...

Thank you for the hints, David.

Ionel



Re: sort (-g) [offtopic]

2018-02-18 Thread John Hasler
 David Wright writes:
> You shouldn't sort like that. If you've got records to sort which have
> an unsortable field like Roman months, then write something in sed

And Awk, as well as Sort, Cut, Join, and other record-oriented filters.
-- 
John Hasler 
jhas...@newsguy.com
Elmwood, WI USA



Re: sort (-g) [offtopic]

2018-02-18 Thread David Wright
On Sun 18 Feb 2018 at 16:55:28 (+0100), Ionel Mugurel Ciobîcă wrote:
> 
> Anyone care to explain what exactly the -g option of sort means? The
> fine manual only says "general numerical", but I doubt that is true,
> because -g (and all other options I have tried, -n, -M, -h, -V) will
> all put Roman numeral 9 in between 4 and 5. See here:
> 
> # echo "III\nII\nI\nV\nIV\nVII\nVI\nVIII\nX\nIX" | sort -g | nl
> 
> What I expect is to put 9 in between 8 and 10.
> 
> As I wrote above, I have tried -n as well. I tried -M because in
> Romanian often the months are written with Roman numerals (I to XII),
> but that also failed. -h and -V were not useful here either.
> 
> How do I sort in a pipe those roman numerals? I have written two bash
> scripts roman_to_arab.sh and arab_to_roman.sh, but I do not know how
> to adapt it to use it in pipes. Also, it may be too cumbersome to make
> the conversion to arab digits, sort with -n and then convert back into
> roman numerals...

Any script that reads stdin and writes stdout can be used in a pipe.
That's one of the guiding principles of unix.

Many commands take input from stdin, either by specifying no input
file or by using - as the filename. Same thing for output. Some use
a mixture, eg diff:
cat file1 file2 | diff - file3 | less
compares file1+file2 with file3 and pipes to less.
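
As a rough sketch of the same idea applied to a converter: the function
below takes Roman numerals as arguments if any are given and otherwise
reads them from stdin, one per line. The name roman_to_arab is borrowed
from the scripts mentioned earlier in the thread, but this body is only
an illustration, not those scripts, and it does no validation of its
input:

```shell
roman_to_arab() {
    # Use the arguments when present, otherwise pass stdin through.
    if [ "$#" -gt 0 ]; then
        printf '%s\n' "$@"
    else
        cat
    fi | awk '
        BEGIN {
            v["I"] = 1;   v["V"] = 5;   v["X"] = 10;  v["L"] = 50
            v["C"] = 100; v["D"] = 500; v["M"] = 1000
        }
        {
            # Walk the numeral right to left: a symbol smaller than
            # the one to its right is subtracted, otherwise added.
            total = 0; prev = 0
            for (i = length($1); i >= 1; i--) {
                cur = v[substr($1, i, 1)] + 0
                if (cur < prev) total -= cur; else total += cur
                prev = cur
            }
            print total
        }
    '
}

roman_to_arab III IX XII
printf '%s\n' IV XIV | roman_to_arab
```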

> Has anyone encountered this issue? Any ideas how to sort out this sort
> issue? Of course, the easiest would be if sort -g indeed worked as
> expected, i.e. if "_general_ numeric" were not so particular as to
> exclude Roman numerals...

After they've done Roman numerals, they can settle down and do
yan tan tethera in all dialects.
https://en.wikipedia.org/wiki/Yan_Tan_Tethera

> At the moment I have to run this sort three times. First time to limit
> it before IX (with grep -v -e IX -e '^X'), second time just grep "IX",
> and third time to exclude all that starts with I and V: grep -v -e
> "^I" -e "^V", and then put all together, like this:
> 
> ( echo "III\nII\nI\nV\nIV\nVII\nVI\nVIII\nXI\nIX\nXII\nX" | sort -g | grep -v 
> -e "IX" -e '^X' ; echo "III\nII\nI\nV\nIV\nVII\nVI\nVIII\nXI\nIX\nXII\nX" | 
> grep -e "IX" ; echo "III\nII\nI\nV\nIV\nVII\nVI\nVIII\nXI\nIX\nXII\nX" | sort 
> -g | grep -v -e "^I" -e "^V") | nl

You shouldn't sort like that. If you've got records to sort which have
an unsortable field like Roman months, then write something in sed,
say, that can do the conversion. Now read your records, say:
field1 field2 XII field3 field4
field1 field2 IV field3 field4
and prefix each record with the numeric representation;
12 field1 field2 XII field3 field4
04 field1 field2 IV field3 field4
Now sort that, then throw away the first field with cut. You should
never have to worry about converting things back!

Basically, that throwaway prefix (it could itself be several fields)
could be a function of any complexity: the order of seats in a
theatre, the value of chess pieces, a lookup table of the order of
precedence of church clergy, whatever turns unsortables into
sortables.
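
One possible sketch of that prefix-sort-cut pipeline, using awk rather
than sed for the lookup and assuming the Roman month is the third
whitespace-separated field, as in the example records above. The
function name roman_sort is made up for illustration:

```shell
# Prefix each record with a two-digit rank for its third field,
# sort numerically on that prefix, then throw the prefix away.
roman_sort() {
    awk '
        BEGIN {
            n = split("I II III IV V VI VII VIII IX X XI XII", r, " ")
            for (i = 1; i <= n; i++) rank[r[i]] = i
        }
        # A field that is not in the table gets rank 00 and sorts first.
        { printf "%02d\t%s\n", rank[$3], $0 }
    ' | sort -k1,1n | cut -f2-
}

printf '%s\n' \
    'field1 field2 XII field3 field4' \
    'field1 field2 IV field3 field4' \
    'field1 field2 IX field3 field4' | roman_sort
```

The prefix is joined to the record with a tab so that cut, whose
default delimiter is a tab, can remove it again without caring how
many fields the record has.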

> I exclude here larger numerals, because at the moment I do not need
> anything in that range...

No—handling Romanian month names and abbreviations might be more
useful. I once wrote an arabic→roman converter but that was just as
an exercise in returning variable length strings from OS/360 assembler
to Fortran IV.

> Using the Unicode glyphs also doesn't work:
> 
> # echo "Ⅲ\nⅡ\nⅠ\nⅣ\nⅤ\nⅨ\nⅥ\nⅦ\nⅧ\nⅫ\nⅪ\nⅩ" | sort -g | nl

Again, simpler with sed. And don't forget the lower case set just
along the way.
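
A sketch of what that could look like, assuming the input is UTF-8 and
that the goal is only to build a sort key, so both the upper-case and
the lower-case glyphs are mapped to upper-case ASCII. The function name
ascii_roman is made up for illustration:

```shell
# Replace each single-glyph Roman numeral (upper and lower case)
# with its plain ASCII spelling.
ascii_roman() {
    sed -e 's/Ⅰ/I/g'    -e 's/ⅰ/I/g' \
        -e 's/Ⅱ/II/g'   -e 's/ⅱ/II/g' \
        -e 's/Ⅲ/III/g'  -e 's/ⅲ/III/g' \
        -e 's/Ⅳ/IV/g'   -e 's/ⅳ/IV/g' \
        -e 's/Ⅴ/V/g'    -e 's/ⅴ/V/g' \
        -e 's/Ⅵ/VI/g'   -e 's/ⅵ/VI/g' \
        -e 's/Ⅶ/VII/g'  -e 's/ⅶ/VII/g' \
        -e 's/Ⅷ/VIII/g' -e 's/ⅷ/VIII/g' \
        -e 's/Ⅸ/IX/g'   -e 's/ⅸ/IX/g' \
        -e 's/Ⅹ/X/g'    -e 's/ⅹ/X/g' \
        -e 's/Ⅺ/XI/g'   -e 's/ⅺ/XI/g' \
        -e 's/Ⅻ/XII/g'  -e 's/ⅻ/XII/g'
}

printf '%s\n' Ⅲ Ⅸ ⅳ Ⅻ | ascii_roman
```

The result can then go through the same prefix-and-sort treatment as
any other ASCII Roman numeral.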

Cheers,
David.



Re: sort (-g) [offtopic]

2018-02-18 Thread tomas
On Sun, Feb 18, 2018 at 04:55:28PM +0100, Ionel Mugurel Ciobîcă wrote:
> 
> Anyone care to explain what exactly the -g option of sort means? The
> fine manual only says "general numerical", but I doubt that is true,
> because -g (and all other options I have tried, -n, -M, -h, -V) will
> all put Roman numeral 9 in between 4 and 5. See here:
> 
> # echo "III\nII\nI\nV\nIV\nVII\nVI\nVIII\nX\nIX" | sort -g | nl
> 
> What I expect is to put 9 in between 8 and 10.


The info documentation has more (alternatively, you can look
that up on the web, as stated in the man page itself).

Extracted from the info:

  ‘-g’
  ‘--general-numeric-sort’
  ‘--sort=general-numeric’
     Sort numerically, converting a prefix of each line to a long
     double-precision floating point number.  *Note Floating point::.
     Do not report overflow, underflow, or conversion errors.  Use
     the following collating sequence:

        • Lines that do not start with numbers (all considered to be
          equal).
        • NaNs (“Not a Number” values, in IEEE floating point
          arithmetic) in a consistent but machine-dependent order.
        • Minus infinity.
        • Finite numbers in ascending numeric order (with -0 and +0
          equal).
        • Plus infinity.

     Use this option only if there is no alternative; it is much slower
     than ‘--numeric-sort’ (‘-n’) and it can lose information when
     converting to floating point.

So '-g' basically means (decimal representation of) float, plus a
couple of NaNs. No roman numerals, alas...
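
A small illustration of that collating sequence, assuming GNU sort.
The wrapper name general_sort is made up here, and LC_ALL=C is set only
so that ties between lines with equal keys are broken by plain byte
comparison:

```shell
# Lines that do not start with a number (IX, V) all compare equal
# under -g and come first; the numbers follow in numeric order,
# with 1e1 read as the floating point value 10.
general_sort() {
    LC_ALL=C sort -g
}

printf '%s\n' 2 IX 1e1 V 3 | general_sort
```

So the Roman numerals are never compared by value at all. They only
end up in byte order among themselves, which is why IX lands between
IV and V.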

[...]

> How do I sort in a pipe those roman numerals? I have written two bash
> scripts roman_to_arab.sh and arab_to_roman.sh, but I do not know how
> to adapt it to use it in pipes. Also, it may be too cumbersome to make
> the conversion to arab digits, sort with -n and then convert back into
> roman numerals...

I fear sort is out of its smarts on that. There are libraries for
different languages to do this, e.g. Perl's Roman.pm (in Debian
package libroman-perl).

> Has anyone encountered this issue? Any ideas how to sort out this sort
> issue? Of course, the easiest would be if sort -g indeed worked as
> expected, i.e. if "_general_ numeric" were not so particular as to
> exclude Roman numerals...

I guess your idea of "general" is just too general to be practical :)

Cheers
-- tomás