On Sun, Nov 24, 2013 at 8:22 PM, Joel Goldstick <joel.goldst...@gmail.com> wrote: > On Sun, Nov 24, 2013 at 8:12 PM, Steven D'Aprano <st...@pearwood.info> wrote: >> On Sun, Nov 24, 2013 at 09:33:11PM +0530, Reuben wrote: >> [...] >>> From the above output, I see key 'c' is at third position during input, but >>> while displaying the output it is displayed at second position >> >> Dictionaries are deliberately made unordered. That means the order that >> items will be displayed is free to vary, not only from how you entered >> them, but in principle they might even vary each time you inspect the >> dict. (They probably won't, but they could.) >> >> Python dicts are actually "hash tables", which is a standard computer >> science data structure that you can look up, but in a nutshell, a hash >> table is an array indexed by the key's hash value. Why do we care about >> hash tables? Because they make a really fast and efficient way to look >> up keys, and from the key get access to a value. >> >> Let's start with the simplest way to store a key and value pair, an >> unsorted table of pairs with a marker indicating "blank", so you know >> when you've reached the end. Here's an example of a half-filled table: >> >> { ("Hello", value), # key first, then some value >> ("World", value), >> ("Spam", value), >> ("Eggs", value), >> ("Cheese", value), >> <blank>, >> <blank>, >> <blank>, >> <blank>, >> <blank>, >> } >> >> In order to find a key in the table, you have to inspect every position >> until you reach either the key you are looking for, or <blank>, in which >> case you know that the key is not there. So to find "Eggs", you have to >> inspect "Hello", "World", "Spam" and finally "Eggs". On average, a >> successful search takes half as many inspections as there are keys, e.g. >> if you have 10 keys in the table, you need to inspect 5 positions; if >> there are 100 keys, you need to inspect 50 positions on average. If >> there are 1000 keys, you need to inspect 500 positions. >> >> This is pretty slow. Imagine looking up a word in a dictionary (a real >> paper dictionary), but with a twist: the words are in whatever order the >> dictionary writer first thought of them, not alphabetical order, so for >> any word you have to start at the beginning and read every single world >> until you find it. >> >> Obviously we can do better, by keeping the words sorted. Our table will >> look like this: >> >> { ("Cheese", value), >> ("Eggs", value), >> ("Hello", value), >> ("Spam", value), >> ("World", value), >> <blank>, >> <blank>, >> <blank>, >> <blank>, >> <blank>, >> } >> >> In this case, we can use a technique called "binary search" to narrow >> down the possible places the key might be. This is very much like what >> you probably do when looking up words in a paper dictionary: flip the >> dictionary open to the middle, then decide whether you've gone too far >> or not far enough. Each time, you split the remaining half in half >> again, until you've narrowed down to the page containing the word you >> want. >> >> In a sorted table with binary search, the savings over an unsorted table >> can be very high. With 10 keys, on average you will end up inspecting 3 >> or 4 items, which isn't much improvement over 5 for the unsorted case, >> but with 100 keys, you will inspect 6 or 7, and with 1000 keys, 9 or 10. >> With a million keys, you only need to inspect about 20 items to either >> find the key you are looking for, or determine it isn't there. >> >> Can we do better than binary search? Yes we can, and we do so with a >> hash table, which is what Python dicts are. >> >> The secret to the hash table is that instead of putting all the keys at >> the beginning of the table, we want the entries to be scattered at >> random all over the place. Only not quite at random, since we need a way >> to jump straight to the key when asked. That's the *hash function*, >> which converts a key to an arbitrary, and distinct, index: >> >> { <blank>, >> ("World", value), # hash of "World" is 1 >> ("Cheese", value), # hash of "Cheese" is 2 >> <blank>, >> <blank>, >> ("Eggs", value), # hash of "Eggs" is 5 >> <blank>, >> ("Spam", value), # hash of "Spam" is 7 >> <blank>, >> ("Hello", value), # hash of "Hello" is 9 >> } >> >> So the positions of the keys in the table depend on the keys, not the >> order you write them down. The advantage of this is that looking for a >> key requires (near enough to) constant time. If you want to see whether >> "Eggs" is in the table, it doesn't matter whether there are ten keys or >> ten billion, it will take exactly the same amount of time: hash the key >> "Eggs" to give you index 5, look at index 5, it is either there or it >> isn't. >> >> Now, in reality we can't actually guarantee that every hash value is >> distinct. In real life, a hash table has to deal with "collisions", >> which is when two different keys have the same hash value, and that will >> spoil the nice simple constant time performance. But collisions are >> usually rare, and so when discussing hash tables in broad terms we >> normally ignore them. >> >> In Python, you can calculate the hash value of a key with the hash() >> function. The exact values you get may depend on the specific version of >> Python you are using, but to give you an example: >> >> py> hash("cat") >> 405875055 >> py> hash("hat") >> 1255409212 >> py> hash("fat") >> -750391218 >> >> >> So you can see, by changing just one letter of the key, we get a >> completely different hash value. A good hash function -- and Python's >> hash function is good -- will will mix up the positions of keys in a >> very unpredictable way. >> >> The end result of this is that when displaying a dict, keys are shown in >> whatever order they happen to be inside the hash table. That's >> unpredicable, depending on the details of the hash function, the actual >> keys, and whether there have been any collisions or not. >> >> If you care about this, it is easy enough to sort the keys before >> printing them: >> >> # instead of this >> print mydict.keys() >> >> # do this >> print sorted(mydict.keys()) >> >> >> If you care about keeping the insertion order, rather than alphabetical >> order, you can use an OrderedDict: >> >> from collections import OrderedDict >> >> >> but OrderedDicts are slower and use more memory than regular dicts, >> since they have to track the order that the keys were inserted. >> >> >> >> -- >> Steven >> _______________________________________________ >> Tutor maillist - Tutor@python.org >> To unsubscribe or change subscription options: >> https://mail.python.org/mailman/listinfo/tutor > > Well explained Stephen
Oops... I mean Steven!... > > -- > Joel Goldstick > http://joelgoldstick.com -- Joel Goldstick http://joelgoldstick.com _______________________________________________ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor