[REBOL] Re: Sort by first part of line

SunandaDH Sat, 07 Sep 2002 02:29:58 -0700

Carl:
> And, out of interest, I converted my index-sort to work with the file
>  in memory, (since you kindly left out disk-access in the timing:),
>  and got a result of 01:48.26.  Joel's method still wins though. (: 
>  I've 32megs, but didn't have any memory problems.


I've incorporated your code into the set of benchmarks I seem to be 
gathering. 

You are faster than Joel _sometimes_ on my machine -- the results are too 
close to call without a properly refereed benchmark environment.

Your code makes fewer assumptions than Joel's -- yours is sorting on the first 
four characters of a line. That should be in the format "b999" (b=space 
9=digit) but your code will work if it isn't.

Gregg and I make even fewer (or maybe different) assumptions -- we'll sort on 
the first non-whitespace string regardless of length. Whether in practice 
that is more future-proof is unknowable.
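
To make the difference concrete, here is a small sketch on a made-up line 
that does not follow the "b999" layout. The sample string is an assumption 
for illustration only, not taken from the real data file.

```rebol
rebol []

line: "12345 some text"      ;; assumed sample, deliberately not in "b999" form

probe copy/part line 4       ;; index-sort key: always the first four characters
probe first parse line " "   ;; parse key: the whole first non-whitespace token
```

Joel's key, to-integer copy/part next line 3, additionally needs characters 
2 to 4 to be digits, or the conversion fails.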

I've tweaked both my code and Gregg's by replacing "second parse/all" with 
"first parse". That is 20% faster -- but we still won't win on raw speed.
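
A minimal sketch of why the two forms pick out the same key, assuming a 
line that starts with a single space as in the "b999" layout. The sample 
line is made up, and the behaviour described in the comments is my 
understanding of parse's string-splitting mode rather than something I have 
checked on every Rebol version.

```rebol
rebol []

line: " 123 some text"       ;; assumed sample line

;; With /all the leading space counts as a delimiter, so the first
;; item is an empty string and the key is the second item.
probe parse/all line " "
old-key: second parse/all line " "

;; Without /all leading whitespace is skipped, so the key comes first.
probe parse line " "
new-key: first parse line " "

print old-key = new-key
```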

I ran the benchmark 5 times with:

    loop 5 [do %sort-test.r]

It shows some interesting deterioration for long-running Rebol consoles:

41.41  -- Sunanda's parse in the Sort 
75.51    
78.43    
78.43    
81.73    

 8.19  --  Gregg's parse before the sort
 8.41      
 8.41      
 8.52
 8.67      

 2.25 -- Carl's index sort
 2.25 
 2.30
 2.30
 2.42  

 2.31  -- Joel's bridge sort
 2.26
 2.31
 2.30
 2.25


The second and subsequent runs of the loop have all the interim data 
structures from earlier runs still in place, possibly getting in our way.

But look at the way my sort deteriorates!!  It is possible that my code gets 
slugged by an internal garbage-collection, and if we ran it enough times any 
of the benchmarks could be hit by that.

But I've seen something similar when repeatedly doing 'layout on very large 
(over 1000) faces. Gradually, treacle gets poured into the fine workings of 
the Rebol interpreter, and it slows down. Even the undocumented 
recycle/torture doesn't get it back to speed.

All of our methods, except Joel's, show signs of slowing down under repeated 
usage. Shows the importance of stress-testing any script that is expected to 
run long and hard.
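
A minimal sketch of that kind of stress test, assuming the benchmark is 
saved as %sort-test.r. The time-run helper is hypothetical and is not part 
of the benchmark code; it only reuses now/time/precise, recycle and stats.

```rebol
rebol []

;; time-run is a hypothetical helper, not part of the benchmark script.
;; It times one evaluation of a block and prints the memory figure
;; that stats reports afterwards.
time-run: func [label [string!] code [block!] /local started] [
    recycle                           ;; request a garbage collection first
    started: now/time/precise
    do code
    ;; note: the subtraction is wrong if the run crosses midnight
    print [label "elapsed:" now/time/precise - started "memory:" stats]
]

;; Repeat the benchmark in one console session and watch the trend.
repeat run 5 [
    time-run join "run " run [do %sort-test.r]
]
```

If the elapsed time climbs from run to run while the work stays the same, 
the slowdown is coming from the long-running session, not from the data.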

I've repeated below the benchmark code I've used here. Apologies to 
anyone for whom this email is too long and of no interest.

Sunanda.

rebol []

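;; Print a start or end timestamp for a benchmark step.
;; 'times is a literal block, so it persists between calls: /start
;; records the start time and /end reports the elapsed time.
report-item: func [what [string!] /start /end /local times] [
    times: []
    if start [
        clear times
        append times now/time/precise
        print ["rebol " system/version " -- Started" what times/1]
    ]
    if end [
        append times now/time/precise
        print [
            "rebol " system/version " --  Ended" what times/2
            " elapsed: " times/2 - times/1
        ]
    ]
]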



data: read/lines %louis-data.txt

;; data: copy/part data 100  ;;de-comment for less data
;; loop 5 [append data data] ;;de-comment for more data





;; Sunanda -- Parse every sort compare
;; -----------------------------------

unset 'sorted-data
recycle

report-item/start "parse/stable"

sort/compare  sorted-data: copy data func [a b /local  a-key b-key ] [
    a-key: first parse a " "
    b-key: first parse b " "
    either a-key < b-key [return -1]
    [either a-key = b-key [return 0] [return +1]]
]

report-item/end "parse/stable"


write/lines %sorted-data-parse.txt sorted-data


;; Gregg -- pre-parsed before sort
;; -------------------------------

unset 'sorted-data
recycle

report-item/start "pre-parsed/stable"
data-blk: copy []
foreach datum data [
    append data-blk first parse datum " "
    append data-blk datum
]


sort/skip data-blk 2

sorted-data: copy []
foreach [key value] data-blk [
    append sorted-data value
    ]
report-item/end "pre-parsed/stable"

write %sorted-data-pre-parsed.txt ""
foreach value sorted-data [
    write/lines/append %sorted-data-pre-parsed.txt value
]



;; Carl -- index sort
;; ------------------

unset 'sorted-data
recycle

report-item/start "index sort"
file-index: array 2 * length? data
ptr: 1
foreach item data [
    poke file-index ptr copy/part item 4
    poke file-index ptr + 1 item
    ptr: ptr + 2
]

sorted-data: extract next sort/skip file-index 2 2

report-item/end "index sort"

write/lines %sorted-data-index.txt sorted-data



;; Joel -- bridge sort
;; -------------------

unset 'sorted-data
recycle

report-item/start "bridge sort"
buffer: []

foreach item data [
    nr: to-integer copy/part next item 3
    while [
        nr > length? buffer
    ][
        insert/only tail buffer copy []
    ]
    append buffer/:nr item
]

sorted-data: copy []

foreach group buffer [
    foreach line group [
        append sorted-data line
    ]
]

report-item/end "bridge sort"


write %sorted-data-bridge.txt ""
foreach value sorted-data [
    write/lines/append %sorted-data-bridge.txt value
]

unset 'sorted-data
recycle


;; Verify sort results
;; ===================

report-item/start "verifying results are the same"
 parsed-file: read %sorted-data-parse.txt 
 pre-parsed-file: read %sorted-data-pre-parsed.txt
 bridge-file: read %sorted-data-bridge.txt
 indexed-file: read %sorted-data-index.txt
 
 either all [parsed-file = pre-parsed-file
             parsed-file = bridge-file
             parsed-file = indexed-file
        ]
   [print "got same results"]
   [print "bad code in there somewhere"]
  
report-item/end "verifying results are the same"

print "done"
