Re: sort

Tomas Hlavaty Fri, 02 Jan 2009 04:11:21 -0800

Hi Alex,

> What user-defined less-than relation do you have in mind?


I am implementing multi-methods in picolisp, see
http://logand.com/picoWiki/multimethods

What I need there is to order methods by their prototype, so that more
specialized methods are called first.  This "more specialized" thing
is my lessThan predicate for sorting.

> Built-in list functions can be used better and more efficiently to
> do the pre- and post-processing.

I think there are the following cases:

1) the built-in comparison (as implemented by compare()) is suitable
   for the problem at hand and then 'sort' and 'by' is fine.  This
   saves overhead of calling 'apply' for comparison but introduces
   some consing when 'by' is used.  It also assumes "absolute"
   ordering of list elements as understood by compare().

2) a custom comparison is needed (no absolute ordering available).
   Then using 'sort' is not efficient because the pre-processing step
   is at least O(N^2).  In this case, overhead of pre-processing (my
   function 'order' mentioned earlier) and/or consing is bigger than
   overhead of calling 'apply' to do the custom comparison.

If I knew how to do the preprocessing step better, I would agree with
you, but it seems to me that there isn't a better way in general.

My reasoning here is that a sort function "knows" how to invoke the
"custom" lessThan predicate less than O(N^2) times.  If not, I have to
do the pre-processing and compute *absolute* ordering values the way
my 'order' function does, but that is not that great.

What the 'order' function does is to compute those absolute values
which the 'sort' function understands.  There are situations though
where these absolute values are not available for the problem at hand
and then I need to pre-compute them.  And computing this absolute
ordering cannot be implemented "efficiently".

I imagine other problems where the built-in 'sort' function is not
"good enough", e.g. sorting strings by application specific rules like
http://www.davekoelle.com/alphanum.html

>    (when (Lt X Y)
>
> would do directly what you intend.

Yes, thanks, that's what I meant:-)

> However, there is another problem with 'order', probably just a wrong
> parenthesis:
>
>>          (let S 0
>>             (for Y Lst
>>                (when (apply Lt NIL X Y)
>>                   (inc 'S) ) ) )
>>          (push 'Q (cons S X)) )
>
> 'S' is unbound when 'cons' is called, so the CARs of all elements end up
> with NIL (or whatever 'S' was before).

Oops, sorry for the bug:-( The 'push' expression should have been
under the 'let' expression...

>> That is quite simple but not efficient.
>
> True, as the list is iterated proportional to the square of the length.
> Why do you think this is necessary?

Not sure what do you mean?  Why it has to be O(N^2)?  Because only a
sort function can possibly know how to use a lessThan predicate less
than O(N^2) times as this depends on the sorting strategy I think.

>> Could the existing 'sort' function be extended for the scenario with a
>> user-defined less-than relation?
>
> I decided back then to write 'sort' without a functional argument, as I
> believe that it is more efficient this way.

I understand that this choice is useful for many practical tasks.
However, it also restricts the problems where I can use it.  Sometimes
a programmer builds more complicated data structures or assigns
application specific meanings to things and then the built-in
compare() is "not enough", it simply does not have the right meaning.

> If 'sort' calls a function twice each time two elements need to be
> compared,

A good sorting algorithm should minimize the total number of
comparisons.  Ideally two elements would be compared once at most.  If
it happens that they are compared twice (occassionaly), overall the
number of comparisons should be still as small as possible (some
elements do not get compared at all...).

> it will involve more overhead than simply applying this function
> once to each argument before the call to 'sort'.

Only if I have the absolute ordering expected by compare() readily at
hand.  Otherwise I have to do O(N^2) preprocessing in which case
applying the function less times than O(N^2) should be cheeper.

> 'sort' is much simpler this way and does not have to do any
> bookkeeping.

It certainly is (only slightly) simpler but not necessarily as useful.

I have attached 'sort2' function which "extends" the existing 'sort'
function and allows a custom lessThan predicate to be passed in.  What
do you think of it?

You can see the difference running:

(de lt (L R)
   (push '*Lt (list L R))
   (> L R) )

: (off *Lt)
-> NIL
: (sort2 '(9 3 0 6 2 8 1 7 5 4 9 3 0 6 2 8 1 7 5 4) lt)
-> (9 9 8 8 7 7 6 6 5 5 4 4 3 3 2 2 1 1 0 0)
: (length *Lt)
-> 183
: (off *Lt)
-> NIL
: (order '(9 3 0 6 2 8 1 7 5 4 9 3 0 6 2 8 1 7 5 4) lt)
-> (9 9 8 8 7 7 6 6 5 5 4 4 3 3 2 2 1 1 0 0)
: (length *Lt)
-> 400

Of course that could be run as:

: (by - sort '(9 3 0 6 2 8 1 7 5 4 9 3 0 6 2 8 1 7 5 4))
-> (9 9 8 8 7 7 6 6 5 5 4 4 3 3 2 2 1 1 0 0)

but that assumes the absolute ordering (something like readily
available mapping to integer numbers).  If the elements weren't
numbers but arbitrary partial ordered set data, it would not work
while the 'sort2' and 'order' examples would still work.

Thank you,

Tomas

(load "lib/gcc.l")

(gcc "sort2" NIL 'sort2)

static int cmp(any lt, any l, any r) {
   cell c[2] = {*l, *r};
   any z = apply(NULL, lt, NO, 2, c);
   return z == T ? -1 : 1;
}

#define compare(l, r)   (lt == Nil ? compare(l, r) : cmp(lt, l, r))

/* List Merge Sort: Bill McDaniel, DDJ Jun99 */
// (sort2 'lst 'lt) -> lst
any sort2(any x) {
   int i;
   any p, in[2], out[2], last;
   any *tail[2];
   any lt;

   x = cdr(x);
   if (!isCell(out[0] = EVAL(car(x))))
      return out[0];

   lt = EVAL(cadr(x));
      
   out[1] = Nil;

   do {
      in[0] = out[0];
      in[1] = out[1];

      i  =  isCell(in[1])  &&  compare(in[0], in[1]) >= 0;
      if (isCell(p = in[i]))
         in[i] = cdr(in[i]);
      out[0] = p;
      tail[0] = &cdr(p);
      last = out[0];
      cdr(p) = Nil;
      i = 0;
      out[1] = Nil;
      tail[1] = &out[1];

      while (isCell(in[0]) || isCell(in[1])) {
         if (!isCell(in[1])) {
            if (isCell(p = in[0]))
               in[0] = cdr(in[0]);
            if (compare(p,last) < 0)
               i = 1-i;
         }
         else if (!isCell(in[0])) {
            p = in[1],  in[1] = cdr(in[1]);
            if (compare(p,last) < 0)
               i = 1-i;
         }
         else if (compare(in[0],last) < 0) {
            if (compare(in[1],last) >= 0)
               p = in[1],  in[1] = cdr(in[1]);
            else {
               if (compare(in[0],in[1]) < 0)
                  p = in[0],  in[0] = cdr(in[0]);
               else
                  p = in[1],  in[1] = cdr(in[1]);
               i = 1-i;
            }
         }
         else {
            if (compare(in[1],last) < 0)
               p = in[0],  in[0] = cdr(in[0]);
            else {
               if (compare(in[0],in[1]) < 0)
                  p = in[0],  in[0] = cdr(in[0]);
               else
                  p = in[1],  in[1] = cdr(in[1]);
            }
         }
         *tail[i] = p;
         tail[i] = &cdr(p);
         cdr(p) = Nil;
         last = p;
      }
   } while (isCell(out[1]));
   return out[0];
}
/**/

Re: sort

Reply via email to