Re: [Tutor] sorting algorithm

C.T. Matsumoto Wed, 03 Mar 2010 10:51:25 -0800

Dave Angel wrote:

C.T. Matsumoto wrote:
Hello,
This is follow up on a question I had about algorithms. In the threadit was suggested I make my own sorting algorithm.
Here are my results.

#!/usr/bin/python

def sort_(list_):
   for item1 in list_:
       pos1 = list_.index(item1)
       pos2 = pos1 + 1
       try:
           item2 = list_[pos2]
       except IndexError:
           pass

       if item1 >= item2:
           try:
               list_.pop(pos2)
               list_.insert(pos1, item2)
               return True
           except IndexError:
               pass

def mysorter(list_):
   while sort_(list_) is True:
       sort_(list_)
I found this to be a great exercise. In doing the exercise, I gotpretty stuck. I consulted another programmer (my dad) who describedhow to go about sorting. As it turned out the description hedescribed was the Bubble sort algorithm. Since coding the solution Iknow the Bubble sort is inefficient because of repeated iterationsover the entire list. This shed light on the quick sort algorithmwhich I'd like to have a go at.
Something I haven't tried is sticking in really large lists. I wastold that with really large list you break down the input list intosmaller lists. Sort each list, then go back and use the same swappingprocedure for each of the different lists. My question is, at whatpoint to you start breaking things up? Is that based on list elementsor is it based on memory(?) resources python is using?
One thing I'm not pleased about is the while loop and I'd like toreplace it with a for loop.
Thanks,

T
There are lots of references on the web about Quicksort, including avideo at:
    http://www.youtube.com/watch?v=y_G9BkAm6B8
which I think illustrates it pretty well. It would be a greatlearning exercise to implement Python code directly from thatdescription, without using the sample C++ code available.
(Incidentally, there are lots of variants of Quicksort, so I'm notgoing to quibble about whether this is the "right" one to be calledthat.)
I don't know what your earlier thread was, since you don't mention thesubject line, but there are a number of possible reasons you might nothave wanted to use the built-in sort. The best one is for educationalpurposes. I've done my own sort for various reasons in the past, eventhough I had a library function, since the library function had somelimits. One time I recall, the situation was that the library sortwas limited to 64k of total data, and I had to work with much largerarrays (this was in 16bit C++, in "large" model). I solved the sizeproblem by using the C++ sort library on 16k subsets (because apointer was 2*2 bytes). Then I merged the results of the sorts. Atthe time, and in the circumstances involved, there were seldom morethan a dozen or so sublists to merge, so this approach worked wellenough.
Generally, it's better for both your development time and theefficiency and reliabilty of the end code, to base a new sortmechanism on the existing one. In my case above, I was replacing whatamounted to an insertion sort, and achieved a 50* improvement for areal customer. It was fast enough that other factors completelydominated his running time.
But for learning purposes? Great plan. So now I'll respond to yourother questions, and comment on your present algorithm.
It would be useful to understand about algorithmic complexity, the socalled Order Function. In a bubble sort, if you double the size ofthe array, you quadruple the number of comparisons and swaps. It'sorder N-squared or O(n*n). So what works well for an array of size10 might take a very long time for an array of size 10000 (like amillion times as long). You can do much better by sorting smallerlists, and then combining them together. Such an algorithm can beO(n*log(n)).
You ask at what point you consider sublists? In a language like C,the answer is when the list is size 3 or more. For anything largerthan 2, you divide into sublists, and work on them.
Now, if I may comment on your code. You're modifying a list whileyou're iterating through it in a for loop. In the most general case,that's undefined. I think it's safe in this case, but I would avoidit anyway, by just using xrange(len(list_)-1) to iterate through it.You use the index function to find something you would already know --the index function is slow. And the first try/except isn't needed ifyou use a -1 in the xrange argument, as I do above.
You use pop() and push() to exchange two adjacent items in the list.Both operations copy the remainder of the list, so they're ratherslow. Since you're exchanging two items in the list, you can simplydo that:
    list[pos1], list[pos2] = list[pos2], list[pos1]

That also eliminates the need for the second try/except.
You mention being bothered by the while loop. You could replace itwith a simple for loop with xrange(len(list_)), since you know that Npasses will always be enough. But if the list is partially sorted,your present scheme will end sooner. And if it's fully sorted, it'llonly take one pass over the data.
There are many refinements you could do. For example, you don't haveto stop the inner loop after the first swap. You could finish thebuffer, swapping any other pairs that are out of order. You'd then besaving a flag indicating if you did any swaps. You could keep a indexpointing to the last pair you swapped on the previous pass, and usethat for a limit next time. Then you just terminate the outer loopwhen that limit value is 1. You could even keep two limit values, andbubble back and forth between them, as they gradually close into themedian of the list. You quit when they collide in the middle.
The resultant function should be much faster for medium-sized lists,but it still will slow down quadratically as the list size increases.You still need to divide and conquer, and quicksort is just one way ofdoing that.
DaveA

Thanks a lot Dave,

Sorry the original thread is called 'Python and algorithms'.

Yes, I think it's best to use what python provides and build on top ofthat. I got to asking my original question based on trying to learn moreabout algorithms in general, through python. Of late many people havebeen asking me how well I can 'build' algorithms, and this prompted meto start the thread. This is for learning purposes (which the originalthread will give you and indication where I'm coming from).

The refactored code looks like this. I have tackled a couple items.First the sub-listing (which I'll wait till I can get the full sortworking), then the last couple of paragraphs about refinements. Startingwith the first refinement, I'm not sure how *not* to stop the inner loop?


def s2(list_):
  for pos1 in xrange(len(list_)-1):
      item1 = list_[pos1]
      pos2 = pos1 + 1
      item2 = list_[pos2]
      if item1 >= item2:
          list_[pos1], list_[pos2] = list_[pos2], list_[pos1]
          return True

def mysorter(list_):
  # This is the outer loop?
  while s2(list_) is True:
      # Calling s2 kicks off the inner loop?
      s2(list_)

if __name__ == '__main__':
  from random import shuffle
  foo = range(10)
  shuffle(foo)
  mysorter(foo)

Thanks again.
_______________________________________________
Tutor maillist  -  [email protected]
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor

Re: [Tutor] sorting algorithm

Reply via email to