Re: Weird profiling behaviour
On Thursday 27 June 2002 12:10, Ketil Z. Malde wrote: *lights go on* Of course! While I have about 90K values to sort, it's only a range from 0 to about 5-600, and a less than even distribution at that. (I must be a lot denser than I thought. Colin, if you ever happen to pass by, do let me know, I think I owe you a beer.) Why don't you implement a decent quicksort (like randomized quicksort) that works on an array? -- Eray Ozkural [EMAIL PROTECTED] Comp. Sci. Dept., Bilkent University, Ankara www: http://www.cs.bilkent.edu.tr/~erayo GPG public key fingerprint: 360C 852F 88B0 A745 F31B EA0F 7C07 AE16 874D 539C ___ Glasgow-haskell-users mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Weird profiling behaviour
Ketil Z. Malde wrote: 5.02 uses quicksort, That's funny, since I see quadratic scaling, I must be hitting worst case both times? 'sort' and 'sortBy' *are* implemented in the same way, right? Implementations of QuickSort on lists usually take the easy option of using the head of the list as the threshold value for partitioning. As a consequence QuickSort does indeed degenerate to quadratic cost in the worst case. Also, curiously enough, it could just as well be the problem that your int-sorting phase has too *little* sorting to do, as this common version of quickSort degenerates both for in-order and reverse-order inputs. Regards Colin R ___ Glasgow-haskell-users mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Weird profiling behaviour
Colin Runciman [EMAIL PROTECTED] writes: Also, curiously enough, it could just as well be the problem that your int-sorting phase has too *little* sorting to do, as this common version of quickSort degenerates both for in-order and reverse-order inputs. *lights go on* Of course! While I have about 90K values to sort, it's only a range from 0 to about 5-600, and a less than even distribution at that. (I must be a lot denser than I thought. Colin, if you ever happen to pass by, do let me know, I think I owe you a beer.) Okay: bucket sort; does anybody know of a nice bucket sort I can rip off? :-) (Actually, while I haven't done the math or the tests to say for sure, I suspect a trivial mod to QS where equals are kept in a separate list might do just fine. Would that be a sensible modification to put in the standard libraries, I wonder?) Thanks again! -kzm -- If I haven't seen further, it is by standing in the footprints of giants ___ Glasgow-haskell-users mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Weird profiling behaviour
Hi, I have what I think is a really strange problem. I have a fair sized problem, which involves sorting a data set, first on labels (which are Strings) and then on scores (which are Ints). The strange thing is that string sorting is *vastly* faster than int scoring! Now, I've tried modifying the sorting routinges, moving them into the same module, and so on, to no avail. Is there any likely explanation? The pipeline is basically sortBy int_cmp . compound_equal . sortBy string_cmp I hesitate to post the code, since it's part of a rather large program, and in the hope that somebody will pop up and say that oh, yes, it's a well know feature of 'sortBy'. Or that it's an artifact of profiling. Or something like that. Any, and I mean any, suggestions really, really welcome. -kzm -- If I haven't seen further, it is by standing in the footprints of giants ___ Glasgow-haskell-users mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Weird profiling behaviour
Ketil Z. Malde wrote: I have what I think is a really strange problem. I have a fair sized problem, which involves sorting a data set, first on labels (which are Strings) and then on scores (which are Ints). The strange thing is that string sorting is *vastly* faster than int scoring! Now, I've tried modifying the sorting routinges, moving them into the same module, and so on, to no avail. Is there any likely explanation? The pipeline is basically sortBy int_cmp . compound_equal . sortBy string_cmp I hesitate to post the code, since it's part of a rather large program, and in the hope that somebody will pop up and say that oh, yes, it's a well know feature of 'sortBy'. Or that it's an artifact of profiling. Or something like that. Any, and I mean any, suggestions really, really welcome. -kzm Could it be that the string-comparison sort simply has less sorting to do than the int-comparison sort? The default definition of sortBy uses insertion sort, so if the string-sort input happens to be already sorted it takes linear time and if the int-sort input happens to be in reverse order it takes quadratic time. Colin R ___ Glasgow-haskell-users mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Weird profiling behaviour
Colin Runciman [EMAIL PROTECTED] writes: Could it be that the string-comparison sort simply has less sorting to do than the int-comparison sort? Not quite improbable, hang on while I print the profiling (with comparison in its own function): Yes, that seems to be the case, for 90K values to sort, I get 7M string comparisons and 321M integer comparisons. I'm running a new test now, with a larger number of values to sort, we'll see how it goes. Looks promising, thanks! The default definition of sortBy uses insertion sort I have vague recollection of the wisdom of this choice being questioned on these lists or others, and even vaguer recollection of it actually being a good choice. Comments, anybody? -kzm -- If I haven't seen further, it is by standing in the footprints of giants ___ Glasgow-haskell-users mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Weird profiling behaviour
[EMAIL PROTECTED] (Ketil Z. Malde) writes: for 90K values to sort, I get 7M string comparisons and 321M integer ..and with different parameters giving 127K values, ie. a factor of 1.4, I get 12M and 614M comparisons, *very* close to the expected O(n²) behavior of insertion sort. The default definition of sortBy uses insertion sort I have vague recollection of the wisdom of this choice being questioned And now I think I'm about question it as well... -kzm (writing his own O(n log n) sortBy as we speak) -- If I haven't seen further, it is by standing in the footprints of giants ___ Glasgow-haskell-users mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Weird profiling behaviour
On Wednesday 26 June 2002 04:19 am, Colin Runciman wrote: Could it be that the string-comparison sort simply has less sorting to do than the int-comparison sort? The default definition of sortBy uses insertion sort, so if the string-sort input happens to be already sorted it takes linear time and if the int-sort input happens to be in reverse order it takes quadratic time. A question I would like to ask is which compiler and libraries are you using? In the December 2001 version of Hugs, sortBy is implemented with the default definition of insertion sort. But if you are using ghc, you are probably using a quicksort algorithm. (Recently the ghc libraries were switched to use mergesort instead, but I'm not sure whether this made it into any of the released versions.) Remember that quicksort behaves quadratically in the worst case, which can happen with pre-sorted input. Maybe this can explain why the second round of sorting is so slow? - Brian Huffman ___ Glasgow-haskell-users mailing list [EMAIL PROTECTED] http://www.haskell.org/mailman/listinfo/glasgow-haskell-users