[REBOL] Re: Sort by first part of line

Louis A. Turk Thu, 05 Sep 2002 09:08:01 -0700

Hi Sunanda,

Your sort worked perfectly.  Thanks also for the explanation.
You might be interested to know that on my 450 MHz Pentium II running
Windows 2000, the total time to sort 29,688 lines using your code was:

3:50:10

I want to thank Sunanda and everyone else who helped with this.  I still 
have questions, but I'm committed to other things for the rest of this 
week.  Next week I'll try to ask them.

One question I'll ask now, however:  What exactly does hash do, and could 
hash be used to speed up sort?

Thanks again,
Louis

At 05:50 PM 9/3/2002 -0400, you wrote:
>Louis:
> > How do I sort the following lines by
> >  the numbers only (not by the letters and not by the length).
>
>Hope this helps.....
>
>Let me first simplify the data to make the code snippets easier to follow:
>
>data:  [
>   "1 first of the ones"
>   "1 last of the ones"
>   "3 top three"
>   "3 middle three"
>   "3 bottom three"
>   "2 start of the twos"
>   "2 middle of the twos"
>   "2 last of the twos"
>   ]
>
>You want a *stable* sort on the first field. End result (on my data) should
>be:
>
>sorted-data:  [
>   "1 first of the ones"
>   "1 last of the ones"
>   "2 start of the twos"
>   "2 middle of the twos"
>   "2 last of the twos"
>   "3 top three"
>   "3 middle three"
>   "3 bottom three"
>   ]
>
>
>A straight sort compares each line in full, not just the key. So
>
>    sort sorted-data: copy data
>
>doesn't preserve the input sequencing for fields with an equal key.
>
>If Rebol's sort were "stable" -- i.e. it kept equal keys in their
>original sequence, then all you'd need to do is to write your own
>sort compare routine to compare partial keys.
>
>This is the basic code:
>
>   sort/compare sorted-data: copy data func [a b] [return a < b]
>
>a and b (you can use any names) are the pairs of records to be compared.
>
>But the code above adds no value to the basic sort, as we want to
>sort on the first character of each record (on my data). So:
>
>    sort/compare sorted-data: copy data func [a b] [
>            return (copy/part a 1) < (copy/part b 1)]
>
>
>It's not *quite* what *you* want, as you need to sort on the
>first space-delimited field. One way is to use 'parse:
>
>      sort/compare sorted-data: copy data func [a b] [
>           return (first parse a " ") < (first parse b " ")
>          ]
>
>But this doesn't preserve the input sequence -- it looks to me like
>Rebol's sort is ***not*** stable.  So we need to add the input sequence
>to each key as part of the compare.  Like this:
>
>      sort/compare  sorted-data: copy data func [a b /local  a-key b-key ] [
>          a-key: join first parse/all a " "    ["-" index? find data a]
>           b-key: join first parse/all b " "    ["-" index? find data b]
>           ;;; print [a-key "   " b-key] ;;-- decomment this to see what's happening
>          return a-key < b-key
>        ]
>
>This is now a stable sort on my data.
>
>It's not quite what you want because your data starts with a space.
>So you need the **second** field in the parse:
>
>      sort/compare  sorted-data: copy data func [a b /local  a-key b-key ] [
>          a-key: join second parse/all a " "    ["-" index? find data a]
>           b-key: join second parse/all b " "    ["-" index? find data b]
>           ;;; print [a-key "   " b-key] ;;-- decomment this to see what's happening
>          return a-key < b-key
>        ]
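
A possible variation on the compare function above, offered only as a sketch and assuming REBOL 2 behaviour for parse/all and sort/compare. Two things in the quoted version may matter on a large file: `find data a` searches the original block again on every comparison, and the position is joined onto the key as text, so "120-17" compares before "120-2". The sketch builds each key and position once and compares the position as an integer. The sample data and the names `records`, `rec` and `line` are illustrative and not part of the original code.

```rebol
REBOL []

; Illustrative data in the same shape as the list in this thread:
; leading space, three-digit count, then text.
data: [" 120 thj ghj" " 111 o uioj" " 120 en toij" " 111 de kai"]

; Decorate: build one [key position line] record per line, in a single pass.
records: copy []
repeat i length? data [
    line: pick data i
    append/only records reduce [second parse/all line " " i line]
]

; Sort: compare the keys first; on a tie, compare the original positions
; as integers, so that equal keys keep their input order.
sort/compare records func [a b] [
    either a/1 = b/1 [a/2 < b/2] [a/1 < b/1]
]

; Undecorate: keep only the original lines, now in sorted order.
sorted-data: copy []
foreach rec records [append sorted-data rec/3]
```

The comparison itself no longer calls parse or find, so the cost of preparing the keys grows with the number of lines rather than with the number of comparisons.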
>
>
>It should now work on your data, as long as all your integers stay three
>digits wide. If you start to use integers of unequal length, you'll need to
>normalise them in the sort key.
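
One way the normalising mentioned above might look, again only as a sketch that assumes REBOL 2: left-pad each numeric field with zeros to a fixed width before comparing, so that "96" sorts before "120". The helper name `pad-key`, the width of 5 and the sample lines are assumptions made for illustration. The sketch shows the padding only; it does not add the input position to the key, so it does not by itself make the sort stable.

```rebol
REBOL []

; Hypothetical helper: left-pad a digit string with zeros to a fixed width.
pad-key: func [key [string!] width [integer!] /local out] [
    out: copy key
    while [width > length? out] [insert out "0"]
    out
]

; Illustrative lines whose counts have unequal lengths.
data: [" 120 thj ghj" " 96 ei de" " 7 kai o"]

; Compare the padded keys, so the string comparison follows numeric order.
sort/compare sorted-data: copy data func [a b] [
    (pad-key second parse/all a " " 5) < (pad-key second parse/all b " " 5)
]
```

Converting the field with to-integer and comparing the numbers would be another option; padding keeps the key a string, which fits the joined-key approach used in the quoted code.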
>
>For your data, I get (I'm assuming you have it in a block as a
>string-per-line):
>
>print mold data
>[
>     " 454 en tw"
>     " 395 en th"
>     " 313 kai o"
>     " 175 oi de"
>     " 314 eij thn"
>     " 174 eij ton"
>     " 124 kai ouk"
>     " 123 kai thn"
>     " 219 ek tou"
>     " 160 kai en"
>     " 142 kai to"
>     " 126 tw qew"
>     " 166 kai h"
>     " 096 ei de"
>     " 094 ou mh"
>     " 091 ei mh"
>     " 120 thj ghj"
>     " 120 en toij"
>     " 112 estin o"
>     " 108 en taij"
>     " 118 o kurioj"
>     " 096 proj ton"
>     " 088 kai touj"
>     " 082 kai idou"
>     " 115 kai ta"
>     " 111 o uioj"
>     " 111 de kai"
>     " 103 ek twn"
>     " 114 eipen autoij"
>     " 104 tou anqrwpou"
>     " 071 legei autoij"
>     " 063 twn ioudaiwn"
>     " 105 proj auton"
>     " 092 oi maqhtai"
>     " 082 legei autw"
>     " 078 eipen autw"
>     " 103 ek thj"
>     " 090 ina mh"
>     " 086 autw o"
>     " 084 ou gar"
>     " 101 apo tou"
>]
>
>
> >> print mold sorted-data
>[
>     "063 twn ioudaiwn"
>     "071 legei autoij"
>     "078 eipen autw"
>     "082 kai idou"
>     "082 legei autw"
>     "084 ou gar"
>     "086 autw o"
>     "088 kai touj"
>     "090 ina mh"
>     "091 ei mh"
>     "092 oi maqhtai"
>     "094 ou mh"
>     "096 ei de"
>     "096 proj ton"
>     "101 apo tou"
>     "103 ek twn"
>     "103 ek thj"
>     "104 tou anqrwpou"
>     "105 proj auton"
>     "108 en taij"
>     "111 o uioj"
>     "111 de kai"
>     "112 estin o"
>     "114 eipen autoij"
>     "115 kai ta"
>     "118 o kurioj"
>     "120 thj ghj"
>     "120 en toij"
>     "123 kai thn"
>     "124 kai ouk"
>     "126 tw qew"
>     "142 kai to"
>     "160 kai en"
>     "166 kai h"
>     "174 eij ton"
>     "175 oi de"
>     "219 ek tou"
>     "313 kai o"
>     "314 eij thn"
>     "395 en th"
>     "454 en tw"]
>
>Some of the lines it makes a difference on are the 111s and the 120s --
>they would be swapped with a simple sort, or randomised within themselves by
>a non-stable one.
>
>Hope that makes sense,
>
>Sunanda.
>--
>To unsubscribe from this list, please send an email to
>[EMAIL PROTECTED] with "unsubscribe" in the
>subject, without the quotes.
