On Wed, Mar 19, 2014 at 7:16 PM, Rhodri James wrote:
> 65536 is a suspiciously round number. You might want to double-
> check that there's no 16-bit overflow causing something unexpected.
It's because I'm using powers of 2. All the numbers in the report are
round in hex.
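For concreteness, a minimal sketch of the doubling methodology being
described: keep doubling the workload (powers of 2) until a run blows
the 120-second budget. The function name and the 50/50 get/set
workload here are illustrative, not Dan's actual harness.

    import random
    import time

    def stress(dictionary_type, budget_seconds=120.0):
        """Keep doubling the workload; report the last size that fit the budget."""
        size = 1024
        last_ok = None
        while True:
            d = dictionary_type()
            start = time.time()
            for key in random.sample(range(size * 4), size):
                if random.random() < 0.5:
                    d[key] = key        # set
                else:
                    d.get(key)          # get (may miss before the first set)
            duration = time.time() - start
            print('Size: %d, duration: %.1f, dictionary type: %s'
                  % (size, duration, dictionary_type.__name__))
            if duration > budget_seconds:
                return last_ok
            last_ok = (size, duration)
            size *= 2

    # stress(dict) prints lines like the "Size: ..., duration: ..." ones
    # quoted later in the thread.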
On Tue, 18 Mar 2014 21:45:52 -, Dan Stromberg
wrote:
> blist.sorteddict was able to do 65536 operations on a dictionary
> before taking more than 120 seconds to complete - it took 77.3
> seconds to do 65536 operations.
65536 is a suspiciously round number. You might want to double-
check that there's no 16-bit overflow causing something unexpected.
On 03/18/2014 06:15 PM, Steven D'Aprano wrote:
On Tue, 18 Mar 2014 15:21:28 -0700, Dan Stromberg wrote:
On Tue, Mar 18, 2014 at 3:03 PM, Marko Rauhamaa
wrote:
Dan Stromberg :
For a proper comparison, I'd like a fixed, identical dataset and set of
operations run against each data structure.
H
d?
Those are obviously not random, and using them as keys in a balanced
tree would give you sub-optimum performance.
This is not a hypothetical scenario. Songza uses MongoDB as our
database. The indexes are balanced trees. One of our biggest
collections has a multi-component key, one of th
Steven D'Aprano :
> Please re-read what I wrote. I didn't say "if your data comes to you
> fully randomized". I said "if you are in a position to randomize the
> data before storing it". In other words, if you actively randomize the
> data yourself.
Yes, you could implement a "hash table" that wa
On Wed, 19 Mar 2014 10:49:33 +0200, Marko Rauhamaa wrote:
> Steven D'Aprano :
>
>> If you are in a position to randomize the data before storing it in the
>> tree, an unbalanced binary tree is a solid contender.
>
> Applications that can assume randomly distributed data are exceedingly
> rare
Steven D'Aprano :
> If you are in a position to randomize the data before storing it in
> the tree, an unbalanced binary tree is a solid contender.
Applications that can assume randomly distributed data are exceedingly
rare and even then, you can't easily discount the possibility of
worst-case or
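A minimal sketch of the randomise-then-insert idea Steven is
describing: a plain unbalanced BST, with the keys shuffled before
insertion (this assumes the whole key set is available up front).

    import random

    class Node:
        __slots__ = ('key', 'left', 'right')
        def __init__(self, key):
            self.key = key
            self.left = None
            self.right = None

    def insert(root, key):
        # Ordinary unbalanced BST insertion, no rebalancing at all.
        if root is None:
            return Node(key)
        if key < root.key:
            root.left = insert(root.left, key)
        elif key > root.key:
            root.right = insert(root.right, key)
        return root

    keys = list(range(100000))   # inserted in this order, depth would be n
    random.shuffle(keys)         # shuffled, expected depth stays O(log n)
    root = None
    for k in keys:
        root = insert(root, k)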
On Tue, 18 Mar 2014 15:21:28 -0700, Dan Stromberg wrote:
> On Tue, Mar 18, 2014 at 3:03 PM, Marko Rauhamaa
> wrote:
>> Dan Stromberg :
>> For a proper comparison, I'd like a fixed, identical dataset and set of
>> operations run against each data structure.
>>
>> How about this test program:
>
>
On Wed, 19 Mar 2014 01:11:33 +0200, Marko Rauhamaa wrote:
> Dan Stromberg :
>> Rather than throw out unbalanced binary tree altogether, it makes more
>> sense to run it until it gets "too slow".
>
> I disagree strongly. You should throw out unbalanced binary trees and
> linked lists and the like
>
> On Tue, Mar 18, 2014 at 1:55 PM, Marko Rauhamaa wrote:
>
Dan Stromberg :
>> > The results are at
>> >
>> http://stromberg.dnsalias.org/~strombrg/python-tree-and-heap-comparison/2014-03/
>
>
Size: 1048576, duration: 75.3, dictionary type: dict
> [...]
> Size: 262144, duration: 66.
Dan Stromberg :
> On Tue, Mar 18, 2014 at 3:03 PM, Marko Rauhamaa wrote:
>> Dan Stromberg :
>> For a proper comparison, I'd like a fixed, identical dataset and set
>> of operations run against each data structure.
>>
>> How about this test program:
>
> I used to do essentially this, but it was ti
On Tue, Mar 18, 2014 at 3:03 PM, Marko Rauhamaa wrote:
> Dan Stromberg :
> For a proper comparison, I'd like a fixed, identical dataset and set of
> operations run against each data structure.
>
> How about this test program:
I used to do essentially this, but it was time-prohibitive and
produced
Dan Stromberg :
> dict was able to do 1048576 operations on a dictionary before taking
> more than 120 seconds to complete - it took 75.3 seconds to do 1048576
> operations.
>
> AVL_tree was able to do 262144 operations on a dictionary before
> taking more than 120 seconds to complete - it took 66
On Tue, Mar 18, 2014 at 1:55 PM, Marko Rauhamaa wrote:
> Dan Stromberg :
>
>> The results are at
>> http://stromberg.dnsalias.org/~strombrg/python-tree-and-heap-comparison/2014-03/
>
> Unfortunately I'm having a hard time understanding the results.
>
> The 50/50 get/set ratio is most interesting to me.
Dan Stromberg :
> The results are at
> http://stromberg.dnsalias.org/~strombrg/python-tree-and-heap-comparison/2014-03/
Unfortunately I'm having a hard time understanding the results.
The 50/50 get/set ratio is most interesting to me.
I'm seeing (under cpython-3.3):
Size: 1048576, duration: 75.3, dictionary type: dict
On Mon, Mar 17, 2014 at 3:05 PM, Marko Rauhamaa wrote:
> Joshua Landau :
>
>> The thing we really need is for the blist containers to become stdlib
>> (but not to replace the current list implementation).
>
> Very interesting. Downloaded blist but didn't compile it yet. It *could*
> be the missing link.
On 18 March 2014 01:01, Daniel Stutzbach wrote:
> I would love to have include macro-benchmarks. I keep waiting for the PyPy
> benchmark suite to get ported to Python 3...
*grins*
>> "Delete a slice" is fudged from its inclusion of multiplication, which
>> is far faster on blists. I admit that
On 17 March 2014 21:16, Daniel Stutzbach wrote:
> On Fri, Mar 14, 2014 at 6:13 PM, Joshua Landau wrote:
>>
>> Now, I understand there are downsides to blist. Particularly, I've
>> looked through the "benchmarks" and they seem untruthful.
>
> I worked hard to make those benchmarks as fair as possible.
Joshua Landau :
> The thing we really need is for the blist containers to become stdlib
> (but not to replace the current list implementation).
Very interesting. Downloaded blist but didn't compile it yet. It *could*
be the missing link.
I would *love* to see some comparative performance results
reason. Sometimes people claim that collections.deque isn't
powerful enough for whatever they want, and blist will almost
definitely sate those cases.
* Balanced trees, with blist.sortedlist. This is actually needed right now.
* Poor performance in the cases where a lot of list merging and pruning happens.
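A quick sketch of what the blist.sortedlist bullet above refers to,
assuming the blist package is installed; the add/remove/indexing calls
shown are the ones its documentation describes.

    from blist import sortedlist

    deadlines = sortedlist()        # kept in sorted order at all times
    deadlines.add(5.0)
    deadlines.add(1.5)
    deadlines.add(3.2)
    print(deadlines[0])             # 1.5, the smallest value
    deadlines.remove(3.2)           # remove an arbitrary element
    print(list(deadlines))          # [1.5, 5.0]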
On Thu, 13 Mar 2014 20:12:41 -0700, Dan Stromberg wrote:
> Sorting the same (slightly tweaked) data inside of a tight loop is
> rarely a good idea - despite the fact that the "sort" itself tends to be
> O(n) with Python's rather awesome builtin sorting algorithm. This is
> because sorting inside
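A sketch of the contrast being made: re-sorting a nearly sorted list
on every pass versus keeping it sorted incrementally with the standard
library's bisect module. The sizes are illustrative only.

    import bisect
    import random

    data = sorted(random.random() for _ in range(100000))

    # Re-sorting inside the loop: each sort is cheap on nearly sorted
    # input (Timsort), but it is still a full pass over the list per item.
    for _ in range(1000):
        data.append(random.random())
        data.sort()

    # Incremental alternative: insort does a binary search plus one
    # shift, with no full sorting pass at all.
    for _ in range(1000):
        bisect.insort(data, random.random())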
On Thu, Mar 13, 2014 at 4:57 PM, Steven D'Aprano
wrote:
> On Mon, 10 Mar 2014 19:34:48 +0200, Marko Rauhamaa wrote:
>
>>> With a high level language like Python, using the provided hash table
>>> will almost always cream any hand-built tree, no matter how
>>> advantageous the data is to the tree.
On Mon, 10 Mar 2014 19:34:48 +0200, Marko Rauhamaa wrote:
>> With a high level language like Python, using the provided hash table
>> will almost always cream any hand-built tree, no matter how
>> advantageous the data is to the tree.
>
> The main thing is there are use cases where order is essential.
On Mon, Mar 10, 2014 at 11:34 AM, Marko Rauhamaa wrote:
> The main thing is there are use cases where order is essential. That's
> why I have had to implement the AVL tree in Python myself. No biggie,
> but a C implementation would probably be much faster. Also, a standard
> version would likely b
On 11-03-14 00:24, Roy Smith wrote:
> In article <8761nmrnfk@elektro.pacujo.net>,
> Marko Rauhamaa wrote:
>
>> Anyway, this whole debate is rather unnecessary since every developer is
>> supposed to have both weapons in their arsenal.
> The problem with having a choice is that it opens up the possibility
> of making the wrong one :-)
On 11/03/2014 8:12 PM, Marko Rauhamaa wrote:
Python should let skilled professionals do their work. Thankfully, for
the most part, it does.
Skilled professionals don't solely rely on the standard library, either.
If you know you need a balanced tree, you'll also know where to find an
implementation.
On Tue, 11 Mar 2014 04:28:25 -, Chris Angelico
wrote:
On Tue, Mar 11, 2014 at 2:38 PM, Ian Kelly wrote:
On Mon, Mar 10, 2014 at 7:45 PM, Chris Angelico
wrote:
No no,
I could make this so much better by using the 80x86 "REP MOVSW"
command (or commands, depending on your point of view)
Steven D'Aprano :
> On Mon, 10 Mar 2014 19:24:07 -0400, Roy Smith wrote:
>
>> In article <8761nmrnfk@elektro.pacujo.net>,
>> Marko Rauhamaa wrote:
>>
>>> Anyway, this whole debate is rather unnecessary since every developer
>>> is supposed to have both weapons in their arsenal.
>>
>> The p
On Mon, 10 Mar 2014 19:24:07 -0400, Roy Smith wrote:
> In article <8761nmrnfk@elektro.pacujo.net>,
> Marko Rauhamaa wrote:
>
>> Anyway, this whole debate is rather unnecessary since every developer
>> is supposed to have both weapons in their arsenal.
>
> The problem with having a choice i
On Tue, Mar 11, 2014 at 1:20 PM, Roy Smith wrote:
> In article <531e6eca$0$29994$c3e8da3$54964...@news.astraweb.com>,
> Steven D'Aprano wrote:
>
>> There's only so much matter in the
>> universe, so talking about limits as the amount of data approaches
>> infinity is nonsense. Where would you store it?
On Tue, Mar 11, 2014 at 2:38 PM, Ian Kelly wrote:
> On Mon, Mar 10, 2014 at 7:45 PM, Chris Angelico wrote:
>> No no,
>> I could make this so much better by using the 80x86 "REP MOVSW"
>> command (or commands, depending on your point of view). That would be
>> so much better than all those separat
On Mon, Mar 10, 2014 at 7:45 PM, Chris Angelico wrote:
> On Tue, Mar 11, 2014 at 12:26 PM, Steven D'Aprano
> wrote:
>> In my experience, the average developer has an amazing talent for
>> pessimising code when they think they are optimising it.
>
> I remember a number of incidents from personal experience when I was
> a *very* average developer.
In article <531e6eca$0$29994$c3e8da3$54964...@news.astraweb.com>,
Steven D'Aprano wrote:
> There's only so much matter in the
> universe, so talking about limits as the amount of data approaches
> infinity is nonsense. Where would you store it?
Funny you should ask that. I just finished watc
In article ,
Dan Stromberg wrote:
> On Mon, Mar 10, 2014 at 6:59 AM, Roy Smith wrote:
> > On the other hand, log n, for n = 1 million, is just 20. It's not hard
> > to imagine a hash function which costs 20x what a node traversal does,
> > in which case, the log n lookup is ahead for all n < 1 million.
On Mon, 10 Mar 2014 09:01:23 -0700, Rustom Mody wrote:
> 2. Being pointed out that a finite-input table-lookup being called a
> hash-function is a rather nonsensical claim and goes counter to the
> basic tenets of asymptotic notation. (In CS unlike in math 'asymptote'
> is always infinity) IOW
Th
On Tue, Mar 11, 2014 at 12:26 PM, Steven D'Aprano
wrote:
> In my experience, the average developer has an amazing talent for
> pessimising code when they think they are optimising it.
I remember a number of incidents from personal experience when I was a
*very* average developer. One time, writin
On Mon, Mar 10, 2014 at 6:59 AM, Roy Smith wrote:
> On the other hand, log n, for n = 1 million, is just 20. It's not hard
> to imagine a hash function which costs 20x what a node traversal does,
> in which case, the log n lookup is ahead for all n < 1 million.
FWIW, both the hash table and the
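The arithmetic behind Roy's 20x figure, as a back-of-the-envelope
sketch rather than a measurement:

    import math

    n = 10**6
    comparisons = math.log2(n)      # about 19.93 node visits in a balanced tree
    print(round(comparisons))       # 20

    # If hashing one key costs roughly as much as 20 comparisons, the two
    # approaches break even around n = 1 million; below that the log n
    # lookups come out ahead, above it the hash table wins.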
On Tue, Mar 11, 2014 at 10:24 AM, Roy Smith wrote:
> As this discussion has shown, figuring out whether a hash table or a
> tree is better for a given problem is non-trivial. My guess is that if
> you gave 1000 typical developers both data structures and let them pick
> freely, the number of case
In article <8761nmrnfk@elektro.pacujo.net>,
Marko Rauhamaa wrote:
> Anyway, this whole debate is rather unnecessary since every developer is
> supposed to have both weapons in their arsenal.
The problem with having a choice is that it opens up the possibility of
making the wrong one :-)
As this discussion has shown, figuring out whether a hash table or a
tree is better for a given problem is non-trivial.
Chris Angelico :
> Supposed to have? What does that mean, a language isn't ISO-compliant
> unless it provides both?
It's an ancient, fundamental data structure, right up there with dynamic
lists. There's no reason it shouldn't be available in every programming
environment.
> With a high level language like Python, using the provided hash table
> will almost always cream any hand-built tree, no matter how
> advantageous the data is to the tree.
On Tue, Mar 11, 2014 at 4:08 AM, Marko Rauhamaa wrote:
> Chris Angelico :
>
>> The only difference between a tree and a hash here is that the tree
>> might be able to short-cut the comparisons. But if there are a whole
>> bunch of songs with the same "song" and "user", then the tree has to
>> comp
Chris Angelico :
> The only difference between a tree and a hash here is that the tree
> might be able to short-cut the comparisons. But if there are a whole
> bunch of songs with the same "song" and "user", then the tree has to
> compare (song->song? same; user->user? same; add_time->add_time?
>
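A small illustration of the two access paths being compared, using a
tuple key shaped like the Songza one; the field values are made up.

    key = ('song-123', 'user-456', 1394486400.0, None)

    # Hash path: the whole tuple is hashed once, every field feeding
    # into the single hash value.
    bucket = hash(key)

    # Tree path: tuple comparison walks left to right and stops at the
    # first differing field, so keys sharing a long common prefix (same
    # song, same user) force several field comparisons at every level.
    other = ('song-123', 'user-456', 1394486455.0, None)
    print(key < other)   # True: song and user compare equal, add_time decides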
On Tue, Mar 11, 2014 at 3:33 AM, Tim Chase
wrote:
> On 2014-03-11 03:24, Chris Angelico wrote:
>> Imagine, worst case, all one million records have the same
>> song/user/add_time and you need to do twenty comparisons involving
>> four fields. That's gotta be worse than one hashing of five fields.
On 2014-03-11 03:24, Chris Angelico wrote:
> Imagine, worst case, all one million records have the same
> song/user/add_time and you need to do twenty comparisons involving
> four fields. That's gotta be worse than one hashing of five fields.
And if you have one million songs that are indistinguis
On Tue, Mar 11, 2014 at 12:59 AM, Roy Smith wrote:
> Looking at the Songza source, I see we have one user-defined hash
> function:
>
> def __hash__(self):
> return hash((self.song,
> self.user,
> self.add_time,
> self.delet
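For context, a self-contained version of that pattern; the class and
field names here are placeholders (the last field of the quoted
snippet is truncated above). The record hashes a tuple of its
identifying fields and defines __eq__ over the same tuple so that hash
and equality stay consistent.

    class PlaylistEntry:
        def __init__(self, song, user, add_time, delete_time=None):
            self.song = song
            self.user = user
            self.add_time = add_time
            self.delete_time = delete_time

        def _key(self):
            return (self.song, self.user, self.add_time, self.delete_time)

        def __eq__(self, other):
            return isinstance(other, PlaylistEntry) and self._key() == other._key()

        def __hash__(self):
            # Hash the same tuple of fields used for equality.
            return hash(self._key())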
On Tue, Mar 11, 2014 at 12:59 AM, Roy Smith wrote:
> The hash vs. tree argument can get very complicated. For example, if
> your tree is not completely resident in memory, the cost of paging in a
> node will swamp everything else, and improving lookup speed will boil
> down to reducing the number
On Monday, March 10, 2014 4:27:13 PM UTC+5:30, Ned Batchelder wrote:
> You are right that you and Steven have had a hard time communicating.
> You are part of "you and Steven", it would be at least polite to
> consider that part of the reason for the difficulty has to do with your
> style. It c
pedantry. The discussion was how a balanced tree
> fares in comparison with hash tables. A simplistic O(1) vs O(log n) was
> presented as a proof that balanced trees don't have a chance in practice
> or theory. I wasn't so easily convinced.
>
>
> Marko
On the other
On 3/10/14 5:41 AM, Marko Rauhamaa wrote:
Steven D'Aprano :
If I am right, that certainly would explain your apparent inability to
understand the difference between "is" and == operators, your
insistence that object IDs are addresses, and your declaration that
object identity is philosophically untenable.
On 10/03/2014 08:53, Steven D'Aprano wrote:
In other words, I suspect you are trolling.
s/suspect/know/ , he didn't make captain of my dream team for nothing,
you know :)
--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.
Mark Lawrence
Steven D'Aprano :
> If I am right, that certainly would explain your apparent inability to
> understand the difference between "is" and == operators, your
> insistence that object IDs are addresses, and your declaration that
> object identity is philosophically untenable.
You and I certainly have
lish every single one of their objectives in the war.
> The discussion was how a balanced tree
> fares in comparison with hash tables. A simplistic O(1) vs O(log n) was
> presented as a proof that balanced trees don't have a chance in practice
> or theory. I wasn't so easily convinced.
g n) was
presented as a proof that balanced trees don't have a chance in practice
or theory. I wasn't so easily convinced.
Marko
On Sun, 09 Mar 2014 18:04:46 -0600, Ian Kelly wrote:
> On Sun, Mar 9, 2014 at 4:08 PM, Dan Stromberg
> wrote:
>> On Sun, Mar 9, 2014 at 2:43 PM, Marko Rauhamaa
>> wrote:
>>> Dan Stromberg :
>>>
On Sun, Mar 9, 2014 at 2:32 PM, Marko Rauhamaa
wrote:
> There is no O(1) hash table.
>>
On Sun, 09 Mar 2014 23:32:54 +0200, Marko Rauhamaa wrote:
> Dan Stromberg :
>
>> This is not just a detail: O(1) tends to be beat O(logn) pretty easily
>> for large n.
>
> There is no O(1) hash table.
Of course there are.
While it is true that hash tables *in general* are not *always* O(1), th
On Sun, Mar 9, 2014 at 4:08 PM, Dan Stromberg wrote:
> On Sun, Mar 9, 2014 at 2:43 PM, Marko Rauhamaa wrote:
>> Dan Stromberg :
>>
>>> On Sun, Mar 9, 2014 at 2:32 PM, Marko Rauhamaa wrote:
There is no O(1) hash table.
>>>
>>> http://stackoverflow.com/questions/2771368/can-hash-tables-really-be-o1
euristics. You can only claim O(1) if your hash table is
at least as large as the number of elements.
As for comparing them with balanced trees, I think one of the main
advantages hash tables have is better CPU cache performance. A tree
involves much more jumping around the RAM than a hash table.
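A toy illustration of the resizing point: a separate-chaining table
that doubles its bucket array when the load factor passes 2/3, which
is what keeps inserts amortized O(1). This is only a sketch of the
technique, not how CPython's dict works in detail.

    class ToyHashTable:
        def __init__(self):
            self._buckets = [[] for _ in range(8)]
            self._count = 0

        def _find(self, key):
            chain = self._buckets[hash(key) % len(self._buckets)]
            for i, (k, _) in enumerate(chain):
                if k == key:
                    return chain, i
            return chain, None

        def __setitem__(self, key, value):
            chain, i = self._find(key)
            if i is None:
                chain.append((key, value))
                self._count += 1
                if self._count * 3 > len(self._buckets) * 2:  # load > 2/3
                    self._resize()
            else:
                chain[i] = (key, value)

        def __getitem__(self, key):
            chain, i = self._find(key)
            if i is None:
                raise KeyError(key)
            return chain[i][1]

        def _resize(self):
            # Double the bucket count and rehash; chains stay short, so
            # lookups remain O(1) on average.
            new = [[] for _ in range(len(self._buckets) * 2)]
            for chain in self._buckets:
                for k, v in chain:
                    new[hash(k) % len(new)].append((k, v))
            self._buckets = new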
On Sun, Mar 9, 2014 at 2:43 PM, Marko Rauhamaa wrote:
> Dan Stromberg :
>
>> On Sun, Mar 9, 2014 at 2:32 PM, Marko Rauhamaa wrote:
>>> There is no O(1) hash table.
>>
>> http://stackoverflow.com/questions/2771368/can-hash-tables-really-be-o1
>
> Please elaborate.
A hash table of fixed size is O(
Dan Stromberg :
> On Sun, Mar 9, 2014 at 2:32 PM, Marko Rauhamaa wrote:
>> There is no O(1) hash table.
>
> http://stackoverflow.com/questions/2771368/can-hash-tables-really-be-o1
Please elaborate.
Marko
On Sun, Mar 9, 2014 at 2:32 PM, Marko Rauhamaa wrote:
> Dan Stromberg :
>
>> This is not just a detail: O(1) tends to be beat O(logn) pretty easily
>> for large n.
>
> There is no O(1) hash table.
http://stackoverflow.com/questions/2771368/can-hash-tables-really-be-o1
Dan Stromberg :
> This is not just a detail: O(1) tends to be beat O(logn) pretty easily
> for large n.
There is no O(1) hash table.
Marko
tter. It is more
>>> generally usable, has fewer corner cases and probably has an equal
>>> performance even in hash tables' sweet spot.
>>
>> Actually, in the performance comparison I mentioned previously, I
>> compared Python dict's to a bunch of differe
probably has an equal
>> performance even in hash tables' sweet spot.
>
> Actually, in the performance comparison I mentioned previously, I
> compared Python dict's to a bunch of different balanced trees and one
> unbalanced tree. The dictionary was much faster, though granted, it
Roy Smith :
> Marko Rauhamaa wrote:
>
>> If I had to choose between a hash table and AVL (or RB) tree in the
>> standard library, it would definitely have to be the latter. It is more
>> generally usable, has fewer corner cases and probably has an equal
>> performance even in hash tables' sweet spot.
even in hash tables' sweet spot.
Actually, in the performance comparison I mentioned previously, I
compared Python dict's to a bunch of different balanced trees and one
unbalanced tree. The dictionary was much faster, though granted, it
was the only one in C.
That URL again:
http://
In article <87eh2ctmht@elektro.pacujo.net>,
Marko Rauhamaa wrote:
> If I had to choose between a hash table and AVL (or RB) tree in the
> standard library, it would definitely have to be the latter. It is more
> generally usable, has fewer corner cases and probably has an equal
> performance
Mark Lawrence :
> I believe that there are advantages to leaving specialist data
> structures on pypi or other sites, plus it means Python in a Nutshell
> can still fit in your pocket and not a 40 ton articulated lorry,
> unlike the Java equivalent.
An ordered map is a foundational data structure
On 08/03/2014 19:58, Dan Stromberg wrote:
On Sat, Mar 8, 2014 at 12:34 AM, Marko Rauhamaa wrote:
Ian Kelly :
I already mentioned this earlier in the thread, but a balanced binary
tree might implement += as node insertion and then return a different
object if the balancing causes the root node to change.
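A stripped-down illustration of that point: if the tree is represented
by its root node and insertion builds new nodes rather than mutating
in place, __iadd__ can legitimately hand back a different object, so
"tree += x" rebinds the name. Purely illustrative, not any particular
library's implementation.

    class TreeNode:
        def __init__(self, key, left=None, right=None):
            self.key = key
            self.left = left
            self.right = right

        def insert(self, key):
            # Return a (possibly new) root instead of mutating self; a
            # rebalancing insert that rotates the root has the same effect.
            if key < self.key:
                left = self.left.insert(key) if self.left else TreeNode(key)
                return TreeNode(self.key, left, self.right)
            if key > self.key:
                right = self.right.insert(key) if self.right else TreeNode(key)
                return TreeNode(self.key, self.left, right)
            return self

        def __iadd__(self, key):
            return self.insert(key)   # may return a different object than self

    tree = TreeNode(10)
    before = tree
    tree += 5                         # rebinds 'tree' to whatever __iadd__ returned
    print(tree is before)             # False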
On Sat, Mar 8, 2014 at 12:34 AM, Marko Rauhamaa wrote:
> Ian Kelly :
>
>> I already mentioned this earlier in the thread, but a balanced binary
>> tree might implement += as node insertion and then return a different
>> object if the balancing causes the root node to change.
>
> True.
>
> Speaking
Ian Kelly :
> Peeking at the code, it appears to use a heapq-based priority queue.
> Why would a balanced binary tree be better?
AFAIK, a heap queue doesn't allow for the deletion of a random element
forcing you to leave the canceled timers in the queue to be deleted
later.
In a very typical sce
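The usual workaround with heapq is exactly the one being described:
leave cancelled timers in the heap, mark them, and drop them lazily
when they reach the top, because the heap cannot cheaply delete an
arbitrary entry. A sketch along the lines of the recipe in the heapq
documentation; the function names are illustrative.

    import heapq
    import itertools

    heap = []                    # entries are [deadline, count, callback]
    entry_for = {}               # timer id -> entry, so it can be cancelled
    counter = itertools.count()
    CANCELLED = None

    def add_timer(deadline, callback):
        entry = [deadline, next(counter), callback]
        entry_for[entry[1]] = entry
        heapq.heappush(heap, entry)
        return entry[1]          # opaque timer id

    def cancel_timer(timer_id):
        entry = entry_for.pop(timer_id)
        entry[2] = CANCELLED     # stays in the heap, skipped when popped

    def pop_next_timer():
        # Return the earliest live timer, discarding cancelled ones.
        while heap:
            deadline, timer_id, callback = heapq.heappop(heap)
            if callback is not CANCELLED:
                entry_for.pop(timer_id, None)
                return deadline, callback
        return None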
> still doesn't have one.
None currently that I'm aware of. If you want to propose adding one,
I suggest reading:
http://docs.python.org/devguide/stdlibchanges.html
> In fact, since asyncio has timers but Python doesn't have balanced
> trees, I'm led to wonder how good t
"batteries" of Python? Timers, cache aging and the like need it. I'm
using my own AVL tree implementation, but I'm wondering why Python
still doesn't have one.
In fact, since asyncio has timers but Python doesn't have balanced
trees, I'm led to wonder how good t