Re: sort (-g) [offtopic]
On 27-02-2018, at 07h 31'11", David Wright wrote about "Re: sort (-g) [offtopic]"
> Yes, you need to read §3.4.2.8 over again:
> [...]
> IOW you should write a file containing
>
> I i
> II ii
> etc.
>
> and feed it to -s.

I can do that.

> > Besides, how this msort will work in a pipe, when I have to sort by
> > date, things like 3-V-2017 and 17-IX-2016, I can't find the equivalent
> > of -k from sort into this msort.
> >

I could use the -n option of msort in the same way as I now use -k from sort.

Thank you.

Ionel
Re: sort (-g) [offtopic]
On 27-02-2018, at 08h 36'51", Greg Wooledge wrote about "Re: sort (-g) [offtopic]"
> > Did I miss anything?
>
> Well, this program certainly is ... unusual. Doesn't just *work* by
> default. No examples in the man page. Anyway, it looks like you
> forgot to specify numeric comparison.
>
> wooledg:~$ printf %s\\n III IX IV V X VII VI VIII | msort -q -w -l -c N -y any
> III
> IV
> V
> VI
> VII
> VIII
> IX
> X

Yes, you mean the -c. I supposed that if I used the -y option then -c would be implied. But you are right that this is an unusual piece of code...

Ionel
Re: sort (-g) [offtopic]
On Tue, Feb 27, 2018 at 09:48:57AM +0100, Ionel Mugurel Ciobîcă wrote: > # cat roman | msort -q -w -l -y ROMAN > I > II > III > IV > IX > V > VI > VII > VIII > X > XI > XII > Did I miss anything? Well, this program certainly is ... unusual. Doesn't just *work* by default. No examples in the man page. Anyway, it looks like you forgot to specify numeric comparison. wooledg:~$ printf %s\\n III IX IV V X VII VI VIII | msort -q -w -l -c N -y any III IV V VI VII VIII IX X Here's what I mean by "doesn't just work by default": wooledg:~$ printf %s\\n III IX IV V X VII VI VIII | msort -c N -y any Comparison type specified without previous key selector (-e, -n, -t, or -w). What, you don't just read lines by default? And you don't take standard sort(1)'s -k key specifier. OK, fine, here's a -w for you: wooledg:~$ printf %s\\n III IX IV V X VII VI VIII | msort -w -c N -y any Sorting on whole record. Increasing numeric string Reading from stdin. Records processed: 0 There's no point in sorting fewer than two records.0 Lolwut? Who came up with this user interface? Do you even read standard input by default? No "standard" in the man page. No "stdin" in the man page. No examples in the man page. No default behavior that works. OK, I guess it does say "Reading from stdin" in that error output, but why didn't you say that in the manual? Jeez. I ended up googling "msort example" which was only slightly helpful.
Re: sort (-g) [offtopic]
On Tue 27 Feb 2018 at 09:48:57 (+0100), Ionel Mugurel Ciobîcă wrote: > On 19-02-2018, at 03h 23'27", Will Mengarini wrote about "Re: sort (-g) > [offtopic]" > > * Ionel Mugurel Ciobica <i.m.ciob...@upcmail.nl> [18-02/18=Su 16:55 +0100]: > > > [... How can something like > > > "III\nII\nI\nV\nIV\nVII\nVI\nVIII\nX\nIX" > > > [be sorted? ...] > > > > See `aptitude show msort`; it probably does what you need. > > > I can't see how that would work, I have read the manual and tried > almost all options. At best msort is still behaves as sort, placing IX > in between IV and V: > > # cat roman > X > III > II > XI > IV > V > VI > VIII > VII > IX > XII > I > > # cat roman | msort -q -w -l -y ROMAN > I > II > III > IV > IX > V > VI > VII > VIII > X > XI > XII > > # cat roman | sort > I > II > III > IV > IX > V > VI > VII > VIII > X > XI > XII > > Did I miss anything? Yes, you need to read §3.4.2.8 over again: "If the argument to the -c option begins with m or M msort will treat the key as the name of a month. If the -s option is also used with this key and its argument is the name of a file, month names will be read from the file. The file should have the same format as a sort order specification file. All entries on the same line will be given the same sort rank. The sort rank will follow the order of the lines. This approach permits the use of calendars with more than twelve months. It also allows multiple abbreviations or names for the same month." IOW you should write a file containing I i II ii etc. and feed it to -s. > Besides, how this msort will work in a pipe, when I have to sort by > date, things like 3-V-2017 and 17-IX-2016, I can't find the equivalent > of -k from sort into this msort. > > What I need is something like sort -t- -k3,3n -k2,2m -k1,1n, where m > would ideally be the sorting of Roman numerals (or the months as Roman > numerals)... > > Is there a way to add this extension to sort? 
Like it is right now: > > | --sort=WORD > | sort according to WORD: general-numeric -g, > | human-numeric -h, month -M, numeric -n, random -R, version -V > > to add "roman numerals -m" > and to be able to specify a file with (all) roman numerals in their > proper order and ask sort to order using that 'dictionary'? Precisely the method that I've quoted above. > I am willing to try to add this to sort if anyone can point me in the > right direction... Cheers, David.
Re: sort (-g) [offtopic]
On 19-02-2018, at 03h 23'27", Will Mengarini wrote about "Re: sort (-g) [offtopic]" > * Ionel Mugurel Ciobica <i.m.ciob...@upcmail.nl> [18-02/18=Su 16:55 +0100]: > > [... How can something like > > "III\nII\nI\nV\nIV\nVII\nVI\nVIII\nX\nIX" > > [be sorted? ...] > > See `aptitude show msort`; it probably does what you need. I can't see how that would work, I have read the manual and tried almost all options. At best msort is still behaves as sort, placing IX in between IV and V: # cat roman X III II XI IV V VI VIII VII IX XII I # cat roman | msort -q -w -l -y ROMAN I II III IV IX V VI VII VIII X XI XII # cat roman | sort I II III IV IX V VI VII VIII X XI XII Did I miss anything? Besides, how this msort will work in a pipe, when I have to sort by date, things like 3-V-2017 and 17-IX-2016, I can't find the equivalent of -k from sort into this msort. What I need is something like sort -t- -k3,3n -k2,2m -k1,1n, where m would ideally be the sorting of Roman numerals (or the months as Roman numerals)... Is there a way to add this extension to sort? Like it is right now: | --sort=WORD | sort according to WORD: general-numeric -g, | human-numeric -h, month -M, numeric -n, random -R, version -V to add "roman numerals -m" and to be able to specify a file with (all) roman numerals in their proper order and ask sort to order using that 'dictionary'? I am willing to try to add this to sort if anyone can point me in the right direction... Thank you. Ionel
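Stock sort has no Roman-numeral key type, but the ordering asked for above (year, then month, then day) can be approximated in a pipe by putting a sortable key in front of each line and removing it afterwards. What follows is only a sketch under stated assumptions: the function name sort_roman_dates is made up, every line is assumed to be exactly DAY-MONTH-YEAR, and the month is assumed to be one of I to XII in upper case.

```shell
# Hypothetical helper, not part of sort or msort.
# Assumes every input line looks like 3-V-2017 (DAY-MONTH-YEAR).
sort_roman_dates() {
    awk -F- '
        BEGIN {
            # Lookup table: Roman month numeral -> month number.
            n = split("I II III IV V VI VII VIII IX X XI XII", name, " ")
            for (i = 1; i <= n; i++) month[name[i]] = i
        }
        # Put a YYYYMMDD key and a tab in front of the original line.
        # A month that is not in the table becomes 00.
        { printf "%04d%02d%02d\t%s\n", $3, month[$2], $1, $0 }
    ' |
    sort -s -k1,1n |
    cut -f2-
}
```

Used as `printf '%s\n' 3-V-2017 17-IX-2016 | sort_roman_dates`, it puts 17-IX-2016 first. The key is thrown away by cut, so nothing has to be converted back.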
Re: sort (-g) [offtopic]
On Monday, February 19, 2018 06:23:27 AM Will Mengarini wrote:
> * Ionel Mugurel Ciobica[18-02/18=Su 16:55 +0100]:
> > [... How can something like
> > "III\nII\nI\nV\nIV\nVII\nVI\nVIII\nX\nIX"
> > [be sorted? ...]
>
> See `aptitude show msort`; it probably does what you need.

I'm not the OP, but, wow, thanks--looks like a very capable tool and sounds like it will work for some complex sorting that I was not looking forward to.

For kicks, I did call up the msort change log, and found that it does deal with Roman numerals:

See: http://www.billposer.org/Software/msort.html:

8.33 Numeric keys are no longer limited to the usual Indo-Arabic number system. Integers written in any of the following number systems are now accepted: Arabic, Arabic (South Asian), Bengali, Burmese, Chinese, Devanagari, Egyptian hieroglyphic, Ethiopic (Amharic and Tigrinya), Gujarati, Gurmukhi (Panjabi), Hebrew, Kannada, Klingon, Lao, Malayalam, Nko, Old Italic, Old Persian cuneiform, Oriya, Phoenician, Roman numerals, Tamil, Telugu, Tengwar, Thai, and Tibetan. The writing system for a key is specified by the -y flag. You may require a particular writing system, have msort autodetect the writing system but require all records to use the same writing system for that key, or have msort autodetect the writing system for each record independently.
<\quote>

For me, that is not (at all) an important feature, but the feature to sort using (retaining) units (records) of arbitrary length (number of lines / paragraphs) marked in some way (some sort of delimiter) is probably the key feature (along with picking an arbitrary sort field, which will not be in the first line of a record.)
Re: sort (-g) [offtopic]
On Sun, Feb 18, 2018 at 04:55:28PM +0100, Ionel Mugurel Ciobîcă wrote: > Anyone care to explain what exactly means the -g option of sort? The > fine manual only says "general numerical", but I doubt that is true, > because -g (and all other options I have tried, -n, -M, -h, -V) will > all put Roman numeral 9 in between 4 and 5. See here: ... hey, wait, it's not April yet!
Re: sort (-g) [offtopic]
* Ionel Mugurel Ciobica[18-02/18=Su 16:55 +0100]: > [... How can something like > "III\nII\nI\nV\nIV\nVII\nVI\nVIII\nX\nIX" > [be sorted? ...] See `aptitude show msort`; it probably does what you need.
Re: sort (-g) [offtopic]
On 18-02-2018, at 14h 44'27", David Wright wrote about "Re: sort (-g) [offtopic]"
> Any script that reads stdin and writes stdout can be used in a pipe.
> That's one of the guiding principles of unix.

I changed the scripts to use read if $# is zero, so I can use them in pipes now. But that is not helping me.

>
> After they've done Roman numerals, they can settle down and do
> yan tan tethera in all dialects.
> https://en.wikipedia.org/wiki/Yan_Tan_Tethera

Well, I do not see that those use any symbols. It is like asking the computer to sort numerically one two three four ... in whatever language. That will fail. Roman numerals never fell out of use; they should be considered, in my opinion.

> You shouldn't sort like that. If you've got records to sort which have
> an unsortable field like Roman months, then write some thing in sed,
> say, that can do the conversion. Now read your records, say:
> field1 field2 XII field3 field4
> field1 field2 IV field3 field4
> and prefix each record with the numeric representation;
> 12 field1 field2 XII field3 field4
> 04 field1 field2 IV field3 field4
> Now sort that, then throw away the first field with cut. You should
> never have to worry about converting things back!

My programming skills in sed or awk must improve before I can accomplish that. In the meantime, since I only need to move the 5th line into the 9th position (just before line 10), I can supplement the sort with a sed pipe:

some input | sort -k2,2 | sed -e '5h;5d;10H;10x'

This works, of course, only if all the Roman numerals from 1 to 12 are present... and only once..., hm...

Thank you for the hints, David.

Ionel
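The change described above (use the arguments when there are any, read stdin when $# is zero) might look roughly like this. It is a hypothetical stand-in, not the poster's actual roman_to_arab.sh, which was never shown. The name roman_to_arabic is made up, only upper-case ASCII numerals are handled, and the input is not validated.

```shell
# Hypothetical stand-in for roman_to_arab.sh; a sketch only.
# With arguments it converts those; without, it converts lines from stdin,
# so the same function works on a command line and in a pipe.
roman_to_arabic() {
    if [ "$#" -gt 0 ]; then
        printf '%s\n' "$@"
    else
        cat
    fi |
    awk '
        BEGIN {
            v["I"] = 1;   v["V"] = 5;   v["X"] = 10;  v["L"] = 50
            v["C"] = 100; v["D"] = 500; v["M"] = 1000
        }
        {
            total = 0
            n = length($0)
            for (i = 1; i <= n; i++) {
                cur = v[substr($0, i, 1)]
                nxt = (i < n) ? v[substr($0, i + 1, 1)] : 0
                # A smaller value before a larger one is subtracted (IV, IX).
                # Characters outside the table count as 0; nothing is validated.
                if (cur < nxt) total -= cur; else total += cur
            }
            print total
        }'
}
```

Both `roman_to_arabic IX XII` and `printf '%s\n' IX XII | roman_to_arabic` then print 9 and 12, one per line. Unlike the sed line-swapping trick, this does not depend on every numeral from I to XII being present exactly once.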
Re: sort (-g) [offtopic]
David Wright writes: > You shouldn't sort like that. If you've got records to sort which have > an unsortable field like Roman months, then write some thing in sed And Awk, as well as Sort, Cut, Join, and other record-oriented filters. -- John Hasler jhas...@newsguy.com Elmwood, WI USA
Re: sort (-g) [offtopic]
On Sun 18 Feb 2018 at 16:55:28 (+0100), Ionel Mugurel Ciobîcă wrote: > > Anyone care to explain what exactly means the -g option of sort? The > fine manual only says "general numerical", but I doubt that is true, > because -g (and all other options I have tried, -n, -M, -h, -V) will > all put Roman numeral 9 in between 4 and 5. See here: > > # echo "III\nII\nI\nV\nIV\nVII\nVI\nVIII\nX\nIX" | sort -g | nl > > What I expect is to put 9 in between 8 and 10. > > As I wrote above, I have tried -n as well. I tried -M because in > Romanian often the months are written with Roman numerals (I to XII), > but that also failed. -h and -V were not useful here either. > > How do I sort in a pipe those roman numerals? I have written two bash > scripts roman_to_arab.sh and arab_to_roman.sh, but I do not know how > to adapt it to use it in pipes. Also, it may be too cumbersome to make > the conversion to arab digits, sort with -n and then convert back into > roman numerals... Any script that reads stdin and writes stdout can be used in a pipe. That's one of the guiding principles of unix. Many commands take input from stdin, either be specifying no input file or by using - as the filename. Same thing for output. Some use a mixture, eg diff: cat file1 file2 | diff - file3 | less compares file1+file2 with file3 and pipes to less. > Anyone has encounter this issue? Any ideas how to sort out this sort > issue? Of course, the easier will be if, indeed, the sort -g would > work as expected, e.g. if "_general_ numeric" will not be particular > to exclude Roman numerals... After they've done Roman numerals, they can settle down and do yan tan tethera in all dialects. https://en.wikipedia.org/wiki/Yan_Tan_Tethera > At the moment I have to run this sort three times. 
First time to limit > it before IX (with grep -v -e IX -e '^X'), second time just grep "IX", > and third time to exclude all that starts with I and V: grep -v -e > "^I" -e "^V", and then put all together, like this: > > ( echo "III\nII\nI\nV\nIV\nVII\nVI\nVIII\nXI\nIX\nXII\nX" | sort -g | grep -v > -e "IX" -e '^X' ; echo "III\nII\nI\nV\nIV\nVII\nVI\nVIII\nXI\nIX\nXII\nX" | > grep -e "IX" ; echo "III\nII\nI\nV\nIV\nVII\nVI\nVIII\nXI\nIX\nXII\nX" | sort > -g | grep -v -e "^I" -e "^V") | nl You shouldn't sort like that. If you've got records to sort which have an unsortable field like Roman months, then write some thing in sed, say, that can do the conversion. Now read your records, say: field1 field2 XII field3 field4 field1 field2 IV field3 field4 and prefix each record with the numeric representation; 12 field1 field2 XII field3 field4 04 field1 field2 IV field3 field4 Now sort that, then throw away the first field with cut. You should never have to worry about converting things back! Basically, that throwaway prefix (it could itself be several fields) could be a function of any complexity: the order of seats in a theatre, the value of chess pieces, a lookup table of the order of precedence of church clergy, whatever turns unsortables into sortables. > I exclude here larger numerals, because at the moment I do not need > anything in that range... No—handling Romanian month names and abbreviations might be more useful. I once wrote an arabic→roman converter but that was just as an exercise in returning variable length strings from OS/360 assembler to Fortran IV. > Using the unicode gliphs also doesn't work: > > # echo "Ⅲ\nⅡ\nⅠ\nⅣ\nⅤ\nⅨ\nⅥ\nⅦ\nⅧ\nⅫ\nⅪ\nⅩ" | sort -g | nl Again, simpler with sed. And don't forget the lower case set just along the way. Cheers, David.
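The prefix, sort, cut recipe described above could be sketched as follows, using awk rather than sed for the lookup. Everything named here is an assumption made for illustration: the function name sort_by_roman_field is made up, records are taken to be whitespace-separated, and the table covers only I to XII in upper case.

```shell
# A sketch only; sort_by_roman_field is a made-up name.
# $1 is the 1-based index of the whitespace-separated field to sort on.
sort_by_roman_field() {
    awk -v col="$1" '
        BEGIN {
            # Lookup table: numeral -> rank, covering I..XII only.
            n = split("I II III IV V VI VII VIII IX X XI XII", name, " ")
            for (i = 1; i <= n; i++) rank[name[i]] = i
        }
        # Prefix each record with the numeric rank and a tab.
        # Numerals missing from the table get 00 and therefore sort first.
        { printf "%02d\t%s\n", rank[$col], $0 }
    ' |
    sort -s -k1,1n |
    cut -f2-
}
```

For the two sample records above, `sort_by_roman_field 3` would print the IV record before the XII record. The table in the BEGIN block is the only part that knows about Roman numerals, so it is also the place where lower-case forms or month abbreviations could be added.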
Re: sort (-g) [offtopic]
On Sun, Feb 18, 2018 at 04:55:28PM +0100, Ionel Mugurel Ciobîcă wrote:
>
> Anyone care to explain what exactly means the -g option of sort? The
> fine manual only says "general numerical", but I doubt that is true,
> because -g (and all other options I have tried, -n, -M, -h, -V) will
> all put Roman numeral 9 in between 4 and 5. See here:
>
> # echo "III\nII\nI\nV\nIV\nVII\nVI\nVIII\nX\nIX" | sort -g | nl
>
> What I expect is to put 9 in between 8 and 10.

The info documentation has more (alternatively, you can look it up on the web, as stated in the man page itself). Extracted from the info:

‘-g’
‘--general-numeric-sort’
‘--sort=general-numeric’
     Sort numerically, converting a prefix of each line to a long
     double-precision floating point number. *Note Floating point::.
     Do not report overflow, underflow, or conversion errors. Use the
     following collating sequence:

        • Lines that do not start with numbers (all considered to be
          equal).
        • NaNs (“Not a Number” values, in IEEE floating point
          arithmetic) in a consistent but machine-dependent order.
        • Minus infinity.
        • Finite numbers in ascending numeric order (with -0 and +0
          equal).
        • Plus infinity.

     Use this option only if there is no alternative; it is much slower
     than ‘--numeric-sort’ (‘-n’) and it can lose information when
     converting to floating point.

So '-g' basically means (decimal representation of) float, plus a couple of NaNs. No Roman numerals, alas...

[...]

> How do I sort in a pipe those roman numerals? I have written two bash
> scripts roman_to_arab.sh and arab_to_roman.sh, but I do not know how
> to adapt it to use it in pipes. Also, it may be too cumbersome to make
> the conversion to arab digits, sort with -n and then convert back into
> roman numerals...

I fear sort is out of its smarts on that. There are libraries for different languages to do this, e.g. Perl's Roman.pm (in Debian package libroman-perl).

> Anyone has encounter this issue? Any ideas how to sort out this sort
> issue? Of course, the easier will be if, indeed, the sort -g would
> work as expected, e.g. if "_general_ numeric" will not be particular
> to exclude Roman numerals...

I guess your idea of "general" is just too general to be practical :)

Cheers
-- 
tomás
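The collating sequence quoted above also explains why IX lands between IV and V. A small sketch, assuming GNU sort and the C locale (both are assumptions, chosen so that tie-breaking is predictable; the function names are made up): every Roman numeral is a line that does not start with a number, so -g ranks them all as equal and ahead of the real numbers, and only sort's last-resort byte comparison puts them in order.

```shell
# Hypothetical demo helpers; assume GNU sort.
# The numerals are all "equal" for -g, so the last-resort byte
# comparison decides their order: IV, IX, V, then the numbers 2, 10.
g_sort_demo() {
    printf '%s\n' 10 IX 2 V IV | LC_ALL=C sort -g
}

# With -s (stable) the last-resort comparison is switched off, so the
# equal-ranked numerals keep their input order: IX, V, IV, then 2, 10.
g_sort_stable_demo() {
    printf '%s\n' 10 IX 2 V IV | LC_ALL=C sort -g -s
}
```

In other words, -g never looks at the value of a Roman numeral at all; the order seen in the original post is plain byte order.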