Dictionary vs binary search lookups [was: Populating a dictionary, fast [SOLVED]]

2007-11-21 Thread Francesc Altet
A Tuesday 20 November 2007, Istvan Albert escrigué: On Nov 19, 2:33 pm, Francesc Altet [EMAIL PROTECTED] wrote: Just for the record. I was unable to stop thinking about this, and after some investigation, I guess that my rememberings were gathered from some benchmarks with code in Pyrex

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-20 Thread harri
On Nov 15, 9:51 pm, Michael Bacarella [EMAIL PROTECTED] wrote: Since some people missed the EUREKA!, here's the executive summary: Python2.3: about 45 minutes Python2.4: about 45 minutes Python2.5: about _30 seconds_ FYI, I tried on two 64 bit SMP machines (4 way

Re: Populating a dictionary, fast [SOLVED]

2007-11-20 Thread Istvan Albert
On Nov 19, 2:33 pm, Francesc Altet [EMAIL PROTECTED] wrote: Just for the record. I was unable to stop thinking about this, and after some investigation, I guess that my rememberings were gathered from some benchmarks with code in Pyrex (i.e. pretty near to C speed). Pretty interesting and

Re: Populating a dictionary, fast [SOLVED]

2007-11-19 Thread Francesc Altet
A Wednesday 14 November 2007, Francesc Altet escrigué: I think I've got messed on some benchmarks that I've done on that subject some time ago, but either my memory is bad or I've made some mistake on those experiments. My apologies. Just for the record. I was unable to stop thinking about

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-17 Thread Hendrik van Rooyen
Michael Bacarella mb...opper.comwrote: I am so sorry to have ruined the decorum. Oh dear! I confidently predict that this thread will now degenerate to include such things as dignitas and gravitas. - Hendrik -- http://mail.python.org/mailman/listinfo/python-list

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-16 Thread Hrvoje Niksic
Steven D'Aprano [EMAIL PROTECTED] writes: Can you post minimal code that exhibits this behavior on Python 2.5.1? The OP posted a lot of different versions, most of which worked just fine for most people. Who were testing it on single-CPU, 32 bit systems. Still, I'd like to see a test case

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-16 Thread Istvan Albert
On Nov 16, 1:18 pm, Michael Bacarella [EMAIL PROTECTED] wrote: You're right, it is completely inappropriate for us to be showing our dirty laundry to the public. you are misinterpreting my words on many levels, (and I of course could have refrained from the chair-monitor jab as well) anyhow,

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-16 Thread Jeffrey Froman
Steven D'Aprano wrote: Can you try it running in 64-bit mode? Here are my results using the following test.py: $ cat test.py #!/usr/bin/python import time print Starting: %s % time.ctime() v = {} for line in open('keys.txt'): v[long(line.strip())] = True print Finished: %s % time.ctime()

RE: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-16 Thread Michael Bacarella
Do you really believe that you cannot create or delete a large dictionary with python versions less than 2.5 (on a 64 bit or multi- cpu system)? That a bug of this magnitude has not been noticed until someone posted on clp? You're right, it is completely inappropriate for us to be showing our

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-16 Thread Steven D'Aprano
On Fri, 16 Nov 2007 11:24:24 +0100, Hrvoje Niksic wrote: Steven D'Aprano [EMAIL PROTECTED] writes: Can you post minimal code that exhibits this behavior on Python 2.5.1? The OP posted a lot of different versions, most of which worked just fine for most people. Who were testing it on

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-15 Thread Aaron Watters
On Nov 14, 6:26 pm, Steven D'Aprano [EMAIL PROTECTED] cybersource.com.au wrote: Someone please summarize. Yes, that would be good. On systems with multiple CPUs or 64-bit systems, or both, creating and/or deleting a multi-megabyte dictionary in recent versions of Python (2.3, 2.4, 2.5 at

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-15 Thread Aaron Watters
On Nov 14, 6:26 pm, Steven D'Aprano [EMAIL PROTECTED] cybersource.com.au wrote: On systems with multiple CPUs or 64-bit systems, or both, creating and/or deleting a multi-megabyte dictionary in recent versions of Python (2.3, 2.4, 2.5 at least) takes a LONG time, of the order of 30+ minutes,

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-15 Thread Chris Mellon
On Nov 14, 2007 5:26 PM, Steven D'Aprano [EMAIL PROTECTED] wrote: On Wed, 14 Nov 2007 18:16:25 +0100, Hrvoje Niksic wrote: Aaron Watters [EMAIL PROTECTED] writes: On Nov 12, 12:46 pm, Michael Bacarella [EMAIL PROTECTED] wrote: It takes about 20 seconds for me. It's possible it's

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-15 Thread Istvan Albert
On Nov 14, 6:26 pm, Steven D'Aprano [EMAIL PROTECTED] cybersource.com.au wrote: On systems with multiple CPUs or 64-bit systems, or both, creating and/or deleting a multi-megabyte dictionary in recent versions of Python (2.3, 2.4, 2.5 at least) takes a LONG time, of the order of 30+ minutes,

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-15 Thread Aaron Watters
On Nov 15, 2:11 pm, Istvan Albert [EMAIL PROTECTED] wrote: There is nothing wrong with neither creating nor deleting dictionaries. I suspect what happened is this: on 64 bit machines the data structures for creating dictionaries are larger (because pointers take twice as much space), so you run

RE: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-15 Thread Michael Bacarella
On Nov 15, 2:11 pm, Istvan Albert [EMAIL PROTECTED] wrote: There is nothing wrong with neither creating nor deleting dictionaries. I suspect what happened is this: on 64 bit machines the data structures for creating dictionaries are larger (because pointers take twice as much space), so

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-15 Thread Steven D'Aprano
On Thu, 15 Nov 2007 11:11:57 -0800, Istvan Albert wrote: On Nov 14, 6:26 pm, Steven D'Aprano [EMAIL PROTECTED] cybersource.com.au wrote: On systems with multiple CPUs or 64-bit systems, or both, creating and/or deleting a multi-megabyte dictionary in recent versions of Python (2.3, 2.4,

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-15 Thread Steven D'Aprano
On Thu, 15 Nov 2007 10:51:08 -0600, Chris Mellon wrote: I can't duplicate this in a dual CPU (64 bit, but running in 32 bit mode with a 32 bit OS) system. Can you try it running in 64-bit mode? What Python version are you running? -- Steven. --

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-15 Thread Hrvoje Niksic
Steven D'Aprano [EMAIL PROTECTED] writes: Someone please summarize. Yes, that would be good. On systems with multiple CPUs or 64-bit systems, or both, creating and/or deleting a multi-megabyte dictionary in recent versions of Python (2.3, 2.4, 2.5 at least) takes a LONG time, of the

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-15 Thread Steven D'Aprano
On Thu, 15 Nov 2007 15:51:25 -0500, Michael Bacarella wrote: Since some people missed the EUREKA!, here's the executive summary: Python2.3: about 45 minutes Python2.4: about 45 minutes Python2.5: about _30 seconds_ I'm really happy that upgrading to 2.5 solved the issue

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-15 Thread Chris Mellon
On Nov 15, 2007 2:57 PM, Steven D'Aprano [EMAIL PROTECTED] wrote: On Thu, 15 Nov 2007 10:51:08 -0600, Chris Mellon wrote: I can't duplicate this in a dual CPU (64 bit, but running in 32 bit mode with a 32 bit OS) system. Can you try it running in 64-bit mode? What Python version are you

RE: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-15 Thread Michael Bacarella
On Thu, 15 Nov 2007 15:51:25 -0500, Michael Bacarella wrote: Since some people missed the EUREKA!, here's the executive summary: Python2.3: about 45 minutes Python2.4: about 45 minutes Python2.5: about _30 seconds_ I'm really happy that upgrading to 2.5 solved the issue

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-15 Thread Steven D'Aprano
On Thu, 15 Nov 2007 21:51:21 +0100, Hrvoje Niksic wrote: Steven D'Aprano [EMAIL PROTECTED] writes: Someone please summarize. Yes, that would be good. On systems with multiple CPUs or 64-bit systems, or both, creating and/or deleting a multi-megabyte dictionary in recent versions of

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-15 Thread Paul Rubin
Steven D'Aprano [EMAIL PROTECTED] writes: (7) It occurs in Python 2.3, 2.4 and 2.5, but not 2.5.1. Do we treat this as a solved problem and move on? I'm not comfortable with this. Better to identify the cause for certain. I'll look at it sometime if I get a chance. I have some 64-bit

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-15 Thread Terry Reedy
Steven D'Aprano [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] | (7) It occurs in Python 2.3, 2.4 and 2.5, but not 2.5.1. | | Do we treat this as a solved problem and move on? If the problem was fixed accidentally as an undocumented by product of a patch aimed at something else, it

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-15 Thread Istvan Albert
On Nov 15, 4:11 pm, Steven D'Aprano [EMAIL PROTECTED] cybersource.com.au wrote: Unless you're accusing both myself and the original poster of outright lying, of faking our results, what's your explanation? I don't attribute it to malice, I think you're simply measuring something else. You both

Re: Populating a dictionary, fast [SOLVED]

2007-11-14 Thread Francesc Altet
A Tuesday 13 November 2007, Steven D'Aprano escrigué: On Tue, 13 Nov 2007 17:27:11 +0100, Francesc Altet wrote: I don't know exactly why do you need a dictionary for keeping the data, but in case you want ultra-fast access to values, there is no replacement for keeping a sorted list of keys

Re: Populating a dictionary, fast [SOLVED]

2007-11-14 Thread Francesc Altet
A Wednesday 14 November 2007, Istvan Albert escrigué: On Nov 13, 11:27 am, Francesc Altet [EMAIL PROTECTED] wrote: Another possibility is using an indexed column in a table in a DB. Lookups there should be much faster than using a dictionary as well. I would agree with others who state

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-14 Thread Aaron Watters
On Nov 12, 12:46 pm, Michael Bacarella [EMAIL PROTECTED] wrote: It takes about 20 seconds for me. It's possible it's related to int/long unification - try using Python 2.5. If you can't switch to 2.5, try using string keys instead of longs. Yes, this was it. It ran *very* fast on

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-14 Thread Hrvoje Niksic
Aaron Watters [EMAIL PROTECTED] writes: On Nov 12, 12:46 pm, Michael Bacarella [EMAIL PROTECTED] wrote: It takes about 20 seconds for me. It's possible it's related to int/long unification - try using Python 2.5. If you can't switch to 2.5, try using string keys instead of longs. Yes,

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-14 Thread Steven D'Aprano
On Wed, 14 Nov 2007 18:16:25 +0100, Hrvoje Niksic wrote: Aaron Watters [EMAIL PROTECTED] writes: On Nov 12, 12:46 pm, Michael Bacarella [EMAIL PROTECTED] wrote: It takes about 20 seconds for me. It's possible it's related to int/long unification - try using Python 2.5. If you can't

RE: Populating a dictionary, fast [SOLVED]

2007-11-13 Thread Michael Bacarella
See end for solution. (3) Are you sure you need all eight-million-plus items in the cache all at once? Yes. I remain skeptical, but what do I know, I don't even know what you're doing with the data once you have it :-) It's OK, I'd be skeptical too. ;) $ cat /proc/cpuinfo

Re: Populating a dictionary, fast [SOLVED]

2007-11-13 Thread Francesc Altet
A Monday 12 November 2007, Michael Bacarella escrigué: As for the solution, after trying a half-dozen different integer hashing functions and hash table sizes (the brute force approach), on a total whim I switched to a model with two dictionary tiers and got whole orders of magnitude better

Re: Populating a dictionary, fast [SOLVED]

2007-11-13 Thread Paul McGuire
On Nov 12, 11:32 am, Michael Bacarella [EMAIL PROTECTED] wrote: See end for solution. (3) Are you sure you need all eight-million-plus items in the cache all at once? Yes. I remain skeptical, but what do I know, I don't even know what you're doing with the data once you have it

RE: Populating a dictionary, fast [SOLVED]

2007-11-13 Thread Michael Bacarella
Shouldn't this be: id2name[key 40][key 0xff] = name Yes, exactly, I had done hex(pow(2,40)) when I meant hex(pow(2,40)-1) I sent my correction a few minutes afterwards but Mailman queued it for moderator approval (condition with replying to myself?) --

Re: Populating a dictionary, fast [SOLVED]

2007-11-13 Thread Hrvoje Niksic
Francesc Altet [EMAIL PROTECTED] writes: I don't know exactly why do you need a dictionary for keeping the data, but in case you want ultra-fast access to values, there is no replacement for keeping a sorted list of keys and a list with the original indices to values, and the proper list of

Re: Populating a dictionary, fast [SOLVED]

2007-11-13 Thread Steven D'Aprano
On Tue, 13 Nov 2007 17:27:11 +0100, Francesc Altet wrote: I don't know exactly why do you need a dictionary for keeping the data, but in case you want ultra-fast access to values, there is no replacement for keeping a sorted list of keys and a list with the original indices to values, and the

Re: Populating a dictionary, fast [SOLVED]

2007-11-13 Thread Istvan Albert
On Nov 13, 11:27 am, Francesc Altet [EMAIL PROTECTED] wrote: Another possibility is using an indexed column in a table in a DB. Lookups there should be much faster than using a dictionary as well. I would agree with others who state that for a simple key based lookup nothing beats a

RE: Populating a dictionary, fast [SOLVED]

2007-11-12 Thread Michael Bacarella
id2name[key 40][key 0x100] = name Oops, typo. It's actually: Id2name[key 40][key 0xff] = name -- http://mail.python.org/mailman/listinfo/python-list

RE: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-12 Thread Michael Bacarella
You can download the list of keys from here, it's 43M gzipped: http://www.sendspace.com/file/9530i7 and see it take about 45 minutes with this: $ cat cache-keys.py #!/usr/bin/python v = {} for line in open('keys.txt'): v[long(line.strip())] = True It takes

Re: Populating a dictionary, fast [SOLVED]

2007-11-12 Thread Paul Rubin
Michael Bacarella [EMAIL PROTECTED] writes: id2name[key 40][key 0x100] = name Oops, typo. It's actually: Id2name[key 40][key 0xff] = name Are you saying this is a patch that appeared after python 2.3? Somehow I don't think it's come up in this thread whether