Hi, Sunanda,

Very nice summary!

May I pick one small nit and reminisce for a moment?

[EMAIL PROTECTED] wrote:
> 
> ... thread has developed into two parallel tracks, with
> a third on the way.
> 
> TRACK 1
> Has several of us looking at increasingly faster ways to
> pre-process the lines to get stable sort keys.
> 

I should have been more clear about the goal of my submission.
I meant to imply (but I didn't state well) that we need to be
careful about the force of habit, as it is VERY powerful.  We
all jumped to the conclusion, based on Louis's first question,
that we needed to sort the data.

We've even found a way to call the solution I proposed a kind
of sort.

Actually, I began thinking about the fact that most of the
order that Louis requested was already present (i.e., the lines
with identical prefixes were already in the desired order,
relative to each other), which led me simply to group the data
by the one field that had actionable differences, the prefix.

Ergo, a solution that actually doesn't involve sorting at all.


Now for the reminiscence.


I didn't remember it (consiously...  ;-) until writing this
particular email, but there's a really lovely book titled
_Programming_Pearls_ by Jon Bentley of AT&T Bell Labs (the
copy in front of me is the first edition by Addison Wesley,
(c) 1986, but I believe there's a newer version out).  The
book is a collection of eponymous columns Bentley wrote for
CACM.

The very first column, titled "Cracking the Oyster", opens
as follows:

    The programmer's question was simple: "How do I sort
    on disk?"  Before I tell you how I made my first
    mistake, let me give you a chance to do better than
    I did.  What would you have said?

DON'T SCROLL DOWN

UNTIL

YOU

HAVE

THOUGHT

ABOUT

YOUR

ANSWER

TO

BENTLEY'S

QUESTION

,

PLEASE

!

He then continues:

    My mistake was to answer his question.  I gave him a
    thumbnail sketch on how to sort on disk.  My suggestion
    that he dig into Knuth's classic _Sorting_and_Searching_
    met with less than enthusiasm -- he was more concerned
    about solving the problem than with furthering his
    education.  I then told him about the disk sorting
    program in Chapter 4 of Kernighan and Plaugers's
    _Software_Tools_.  Their program consists of about two
    hundred lines of Ratfor code in twelve procedures;
    translating that into several hundred lines of FORTRAN
    and testing the code would have taken about a week.

(Nothing like dating yourself, right?  ;-)

Bentley goes on to describe how he (fortunately) kept asking
questions until the *REAL* nature of the problem became clear,
as summarized below:

-  The system was used for re-organizing political districts;
   the data were the precinct numbers that made up a district.
-  Each value was an integer (a precinct number).
-  It was illegal for a value to appear more than once.
-  There would be up to 27,000 values (in the range 1..27,000).
-  The system only had 1K-2K sixteen-bit words of free memory
   at that point.  (I told you I was dating myself! ;-)

Given this restatement of the problem, *now* how would you
advise the programmer?  (other than offering him a ride in your
time machine, of course!  ;-)

-jn-

-- 
; Joel Neely                             joeldotneelyatfedexdotcom
REBOL [] do [ do func [s] [ foreach [a b] s [prin b] ] sort/skip
do function [s] [t] [ t: "" foreach [a b] s [repend t [b a]] t ] {
| e s m!zauafBpcvekexEohthjJakwLrngohOqrlryRnsctdtiub} 2 ]
-- 
To unsubscribe from this list, please send an email to
[EMAIL PROTECTED] with "unsubscribe" in the 
subject, without the quotes.

Reply via email to