Re: Weird profiling behaviour

2002-07-15 Thread Eray Ozkural

On Thursday 27 June 2002 12:10, Ketil Z. Malde wrote:
 *lights go on*

 Of course!  While I have about 90K values to sort, it's only a range
 from 0 to about 5-600, and a less than even distribution at that.  (I
 must be a lot denser than I thought. Colin, if you ever happen to pass
 by, do let me know, I think I owe you a beer.)

Why don't you implement a decent quicksort (like randomized quicksort) that 
works on an array?

-- 
Eray Ozkural [EMAIL PROTECTED]
Comp. Sci. Dept., Bilkent University, Ankara
www: http://www.cs.bilkent.edu.tr/~erayo
GPG public key fingerprint: 360C 852F 88B0 A745 F31B  EA0F 7C07 AE16 874D 539C

___
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users



Re: Weird profiling behaviour

2002-06-27 Thread Colin Runciman

Ketil Z. Malde wrote:

5.02 uses quicksort,

That's funny, since I see quadratic scaling, I must be hitting worst
case both times?  'sort' and  'sortBy' *are* implemented in the same
way, right?

Implementations of QuickSort on lists usually take the easy option of
using the head of the list as the threshold value for partitioning.
As a consequence QuickSort does indeed degenerate to quadratic
cost in the worst case.

Also, curiously enough, it could just as well be the problem that your
int-sorting phase has too *little* sorting to do, as this common
version of quickSort degenerates both for in-order and reverse-order
inputs.

Regards
Colin R




___
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users



Re: Weird profiling behaviour

2002-06-27 Thread Ketil Z. Malde

Colin Runciman [EMAIL PROTECTED] writes:

 Also, curiously enough, it could just as well be the problem that your
 int-sorting phase has too *little* sorting to do, as this common
 version of quickSort degenerates both for in-order and reverse-order
 inputs.

*lights go on*

Of course!  While I have about 90K values to sort, it's only a range
from 0 to about 5-600, and a less than even distribution at that.  (I
must be a lot denser than I thought. Colin, if you ever happen to pass
by, do let me know, I think I owe you a beer.) 

Okay: bucket sort; does anybody know of a nice bucket sort I can rip
off?  :-)  (Actually, while I haven't done the math or the tests to
say for sure, I suspect a trivial mod to QS where equals are kept in a
separate list might do just fine.  Would that be a sensible
modification to put in the standard libraries, I wonder?)

Thanks again!

-kzm
-- 
If I haven't seen further, it is by standing in the footprints of giants
___
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users



Weird profiling behaviour

2002-06-26 Thread Ketil Z. Malde


Hi,

I have what I think is a really strange problem.  I have a fair sized
problem, which involves sorting a data set, first on labels (which are
Strings) and then on scores (which are Ints).

The strange thing is that string sorting is *vastly* faster than int
scoring!  Now, I've tried modifying the sorting routinges, moving them
into the same module, and so on, to no avail.  Is there any likely
explanation?  The pipeline is basically

 sortBy int_cmp . compound_equal . sortBy string_cmp

I hesitate to post the code, since it's part of a rather large
program, and in the hope that somebody will pop up and say that oh,
yes, it's a well know feature of 'sortBy'.  Or that it's an artifact
of profiling.  Or something like that.

Any, and I mean any, suggestions really, really welcome.

-kzm
-- 
If I haven't seen further, it is by standing in the footprints of giants
___
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users



Re: Weird profiling behaviour

2002-06-26 Thread Colin Runciman

Ketil Z. Malde wrote:

I have what I think is a really strange problem.  I have a fair sized
problem, which involves sorting a data set, first on labels (which are
Strings) and then on scores (which are Ints).

The strange thing is that string sorting is *vastly* faster than int
scoring!  Now, I've tried modifying the sorting routinges, moving them
into the same module, and so on, to no avail.  Is there any likely
explanation?  The pipeline is basically

 sortBy int_cmp . compound_equal . sortBy string_cmp

I hesitate to post the code, since it's part of a rather large
program, and in the hope that somebody will pop up and say that oh,
yes, it's a well know feature of 'sortBy'.  Or that it's an artifact
of profiling.  Or something like that.

Any, and I mean any, suggestions really, really welcome.

-kzm

Could it be that the string-comparison sort simply has less sorting to do
than the int-comparison sort?  The default definition of sortBy uses
insertion sort, so if the string-sort input happens to be already sorted
it takes linear time and if the int-sort input happens to be in
reverse order  it takes quadratic time.

Colin R



___
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users



Re: Weird profiling behaviour

2002-06-26 Thread Ketil Z. Malde

Colin Runciman [EMAIL PROTECTED] writes:

 Could it be that the string-comparison sort simply has less sorting to do
 than the int-comparison sort?

Not quite improbable, hang on while I print the profiling (with
comparison in its own function):  Yes, that seems to be the case, for
90K values to sort, I get 7M string comparisons and 321M integer
comparisons. 

I'm running a new test now, with a larger number of values to sort,
we'll see how it goes.

Looks promising, thanks!

 The default definition of sortBy uses insertion sort

I have vague recollection of the wisdom of this choice being
questioned on these lists or others, and even vaguer recollection of
it actually being a good choice.  Comments, anybody?

-kzm
-- 
If I haven't seen further, it is by standing in the footprints of giants
___
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users



Re: Weird profiling behaviour

2002-06-26 Thread Ketil Z. Malde

[EMAIL PROTECTED] (Ketil Z. Malde) writes:

 for 90K values to sort, I get 7M string comparisons and 321M integer

..and with different parameters giving 127K values, ie. a factor of
1.4, I get 12M and 614M comparisons, *very* close to the expected
O(n²) behavior of insertion sort.

 The default definition of sortBy uses insertion sort

 I have vague recollection of the wisdom of this choice being
 questioned

And now I think I'm about question it as well...

-kzm 
(writing his own O(n log n) sortBy as we speak)
-- 
If I haven't seen further, it is by standing in the footprints of giants
___
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users



Re: Weird profiling behaviour

2002-06-26 Thread Brian Huffman

On Wednesday 26 June 2002 04:19 am, Colin Runciman wrote:
 Could it be that the string-comparison sort simply has less sorting to do
 than the int-comparison sort?  The default definition of sortBy uses
 insertion sort, so if the string-sort input happens to be already sorted
 it takes linear time and if the int-sort input happens to be in
 reverse order  it takes quadratic time.

A question I would like to ask is which compiler and libraries are you using? 
In the December 2001 version of Hugs, sortBy is implemented with the default 
definition of insertion sort. But if you are using ghc, you are probably 
using a quicksort algorithm. (Recently the ghc libraries were switched to use 
mergesort instead, but I'm not sure whether this made it into any of the 
released versions.)

Remember that quicksort behaves quadratically in the worst case, which can 
happen with pre-sorted input. Maybe this can explain why the second round of 
sorting is so slow?

- Brian Huffman

___
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users