[issue44555] Dictionary operations are LINEAR for any dictionary (for a particular code).

2021-07-08 Thread Inada Naoki
Inada Naoki added the comment:

There is a macro benchmark suite we regularly use: https://pyperformance.readthedocs.io/

You can micro-benchmark the most common use cases with pyperf like this:

```
$ pyperf timeit -s "def f(**kwargs): pass" -- "f(a=1,b=2,c=3)"
Mean +- std dev: 119 ns +- 3 ns
```
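(Not from the thread: a typical way to run the macro suite and compare two builds, assuming pyperformance's documented `run -o` output option; check the docs above for the exact flags.)

```
$ python -m pip install pyperformance
$ pyperformance run -o baseline.json      # on the unmodified interpreter
$ pyperformance run -o patched.json       # on the build with the new dict
$ python -m pyperf compare_to baseline.json patched.json
```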

[issue44555] Dictionary operations are LINEAR for any dictionary (for a particular code).

2021-07-08 Thread Daniel Fleischman
Daniel Fleischman added the comment:

Hey Raymond,

> "Changing the dict implementation to a linked list would make your case
> perform better but would make most users worse off."

Is there any standard benchmark? I would like to implement my idea, as I think that it is very unlikely that "most users would be worse off".

[issue44555] Dictionary operations are LINEAR for any dictionary (for a particular code).

2021-07-05 Thread Raymond Hettinger
Raymond Hettinger added the comment:

> I still disagree with this design since it's not
> how a person using a hash map expects it to behave.

All mapping data structures have tradeoffs. For the core dict type, we chose a structure that benefits most Python users most of the time. If a user

[issue44555] Dictionary operations are LINEAR for any dictionary (for a particular code).

2021-07-05 Thread Daniel Fleischman
Daniel Fleischman added the comment:

Thank you for your reply. I didn't know that this was the expected behavior (found it at https://wiki.python.org/moin/TimeComplexity):

"For these operations, the worst case n is the maximum size the container ever achieved, rather than just the current size."
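A related workaround, not stated at this point in the thread: rebuilding the dictionary compacts its entry table, so a copy restores iteration speed once a dict has shrunk far below its historical peak. A minimal sketch:

```
d = {i: i for i in range(10**6)}
for i in range(10**6 - 1):
    del d[i]          # one element left, in a table sized for a million

sum(d.values())       # slow: scans the huge, mostly-empty entry table

d = dict(d)           # fresh dict sized for the single current element
sum(d.values())       # fast
```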

[issue44555] Dictionary operations are LINEAR for any dictionary (for a particular code).

2021-07-05 Thread Inada Naoki
Inada Naoki added the comment:

Iterating over the whole dict is O(n), where n is the historical maximum size of the dict. On the other hand, there is no guarantee that `next(iter(d))` is O(1). The guarantee is O(n) in the worst case, and your example is the worst case. So this is not a bug. As
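To see the worst case concretely (an illustration, not from the original message): delete a long prefix of keys, then time a single `next(iter(d))`, which must skip every freed slot before the first live entry.

```
import time

d = {i: i for i in range(10**6)}
for i in range(10**6 - 1):
    del d[i]                    # keep only the most recently inserted key

start = time.perf_counter()
next(iter(d))                   # skips ~10**6 dead slots to find the survivor
print(f"{time.perf_counter() - start:.6f}s for a dict of length 1")
```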

[issue44555] Dictionary operations are LINEAR for any dictionary (for a particular code).

2021-07-03 Thread Daniel Fleischman
Daniel Fleischman added the comment:

Thank you again, Dennis. I disagree with your conclusion, mainly for three reasons:

1. The linked list proposal would increase the memory usage by 2 Py_ssize_t per entry, plus another 2 for the whole dictionary. I don't think this is an overwhelming
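A toy model of what those two extra fields per entry buy (hypothetical names; the real change would be to CPython's C entry structs): each entry records its predecessor and successor, so the first live key stays reachable in O(1) no matter how many entries were deleted before it.

```
class LinkedDict:
    """Toy sketch: a mapping whose entries form a doubly linked list."""

    def __init__(self):
        self._entries = {}      # key -> [value, prev_key, next_key]
        self._head = None       # oldest live key
        self._tail = None       # newest live key

    def __setitem__(self, key, value):
        if key in self._entries:
            self._entries[key][0] = value
            return
        self._entries[key] = [value, self._tail, None]
        if self._tail is None:
            self._head = key
        else:
            self._entries[self._tail][2] = key
        self._tail = key

    def __delitem__(self, key):
        _, prev_key, next_key = self._entries.pop(key)
        if prev_key is None:
            self._head = next_key
        else:
            self._entries[prev_key][2] = next_key
        if next_key is None:
            self._tail = prev_key
        else:
            self._entries[next_key][1] = prev_key

    def first_key(self):
        return self._head       # O(1) regardless of deletion history
```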

[issue44555] Dictionary operations are LINEAR for any dictionary (for a particular code).

2021-07-03 Thread Dennis Sweeney
Dennis Sweeney added the comment:

OrderedDict already uses a linked list alongside a dictionary and meets all of your timing expectations, but it has higher memory usage, as linked lists generally do. Since so many Python objects use dictionaries under the hood, it would probably not be
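The overhead is easy to observe (rough numbers; they vary by version and platform, and OrderedDict's `__sizeof__` counts its linked-list nodes):

```
import sys
from collections import OrderedDict

d = {i: i for i in range(1000)}
od = OrderedDict((i, i) for i in range(1000))

print(sys.getsizeof(d))     # plain dict
print(sys.getsizeof(od))    # larger: pays for a node per entry
```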

[issue44555] Dictionary operations are LINEAR for any dictionary (for a particular code).

2021-07-03 Thread Daniel Fleischman
Daniel Fleischman added the comment:

I think the idea of augmenting the structs to maintain a linked list of elements, together with periodically shrinking on removal, would solve both the time and the space issues, right? Space would always be linear, and operations would still be constant amortized
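(An aside on why the amortized claim is plausible: if the table is halved whenever fewer than, say, 1/8 of its slots are live, a shrink from size m costs O(m) but can only be triggered by Ω(m) deletions since the previous resize, so each deletion absorbs O(1) of resize work; the same charging argument already covers growth on insertion.)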

[issue44555] Dictionary operations are LINEAR for any dictionary (for a particular code).

2021-07-03 Thread Dennis Sweeney
Dennis Sweeney added the comment:

Alternate idea: the dict could shrink, if required, on the iter() call, whenever a keys/values/items iterator is initialized.
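A Python-level sketch of that idea (hypothetical; the real version would live in the C iterator setup). The wrapper rebuilds its storage when iteration starts on a mostly-empty table; it assumes CPython's behavior that `dict.clear()` drops the old table and reinsertion produces a right-sized one.

```
class CompactingDict(dict):
    """Toy sketch: compact the table when an iterator is created."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._peak = len(self)

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        if len(self) > self._peak:
            self._peak = len(self)

    def __iter__(self):
        if self._peak >= 8 and len(self) < self._peak // 8:
            snapshot = list(self.items())   # one O(peak) pass
            super().clear()                 # drop the oversized table
            for k, v in snapshot:
                super().__setitem__(k, v)   # rebuild compactly
            self._peak = len(self)
        return super().__iter__()
```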

[issue44555] Dictionary operations are LINEAR for any dictionary (for a particular code).

2021-07-03 Thread Dennis Sweeney
Dennis Sweeney added the comment:

> Can https://bugs.python.org/issue32623 be a fix (or mitigation) of this issue?

If such a change is implemented and dictionaries shrink when their fill falls below 1/8, the behavior of `while d: del d[next(iter(d))]` will remain quadratic: there will be
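(One way to see why shrinking alone doesn't help: between two consecutive shrinks the table size m is fixed while roughly 7m/8 entries are deleted front-to-back, so the k-th `next(iter(d))` in that stretch skips the k-1 slots already freed, costing about 1 + 2 + ... + 7m/8 = Θ(m²) probes; the shrink points are geometrically spaced, so the total over the whole loop stays Θ(n²).)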

[issue44555] Dictionary operations are LINEAR for any dictionary (for a particular code).

2021-07-03 Thread Inada Naoki
Inada Naoki added the comment:

Can https://bugs.python.org/issue32623 be a fix (or mitigation) of this issue?

[issue44555] Dictionary operations are LINEAR for any dictionary (for a particular code).

2021-07-03 Thread Serhiy Storchaka
Change by Serhiy Storchaka:

nosy: +methane, rhettinger, serhiy.storchaka

[issue44555] Dictionary operations are LINEAR for any dictionary (for a particular code).

2021-07-02 Thread Daniel Fleischman
Daniel Fleischman added the comment:

Please find attached a more complete example of the issue I am reporting.

tl;dr: I can make `sum(d.values())` run in O(maximum_size_in_d's_history) instead of O(len(d)), even when len(d) == 1. The linked list approach would work in terms of making it
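The attachment doesn't survive in this digest; a short stand-in consistent with the claim (my sketch, not the original file):

```
d = {i: 1 for i in range(2**20)}
for i in range(2**20 - 1):
    del d[i]

assert len(d) == 1
sum(d.values())   # still walks the 2**20-slot entry table for one value
```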

[issue44555] Dictionary operations are LINEAR for any dictionary (for a particular code).

2021-07-02 Thread Daniel Fleischman
Daniel Fleischman added the comment:

Thank you for your message! I'm not particularly looking for an implementation, I was just surprised by this behavior. It can get even worse. Consider this example:

```
d = large_dict()

# remove all but one element of d, runs in quadratic time as
```
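The snippet above breaks off mid-comment; a plausible completion under the same assumptions (`large_dict()` is the post's hypothetical builder):

```
def large_dict(n=2**20):
    # hypothetical stand-in for the post's large_dict()
    return {i: i for i in range(n)}

d = large_dict()

# remove all but one element of d; quadratic overall, because each
# next(iter(d)) rescans the slots freed by all earlier deletions
while len(d) > 1:
    del d[next(iter(d))]

# even now, one full pass over the single survivor is O(historical size)
print(sum(d.values()))
```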

[issue44555] Dictionary operations are LINEAR for any dictionary (for a particular code).

2021-07-02 Thread Dennis Sweeney
Dennis Sweeney added the comment:

For what it's worth, using collections.OrderedDict gives the constant-time behavior you're looking for.

nosy: +Dennis Sweeney
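For instance, the original quadratic loop becomes linear overall with OrderedDict, whose iteration and deletions follow the internal linked list:

```
from collections import OrderedDict

d = OrderedDict((i, i) for i in range(10**6))
while d:
    d.popitem(last=False)   # remove the oldest entry in O(1)
```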

[issue44555] Dictionary operations are LINEAR for any dictionary (for a particular code).

2021-07-02 Thread Daniel Fleischman
Daniel Fleischman added the comment:

The linked list solution is not as easy to implement as I expected, because of insertions. :(

[issue44555] Dictionary operations are LINEAR for any dictionary (for a particular code).

2021-07-02 Thread Daniel Fleischman
New submission from Daniel Fleischman:

The following code takes quadratic time in the size of the dictionary passed, regardless of the dictionary (explanation below):

```
def slow_dictionary(d):
    while len(d) > 0:
        # Remove first element
        key = next(iter(d))
        del d[key]
```
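(The explanation is truncated in this digest. The mechanism, as I understand CPython's compact-dict layout: entries are stored in insertion order in a dense array, deletion only marks an entry as dead, and the array is not compacted until a resize triggered by insertion. Each `next(iter(d))` therefore scans past every previously deleted slot, so the k-th pass of the loop costs O(k) and the whole loop costs O(n²).)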