Re: A strange statement in the bisect documentation?

2015-03-09 Thread Dmitry Chichkov
On Friday, March 6, 2015 at 6:24:24 PM UTC-8, Steven D'Aprano wrote: Dmitry Chichkov wrote: I was looking over documentation of the bisect module and encountered the following very strange statement there: From https://docs.python.org/2/library/bisect.html ...it does not make sense

[issue4356] Add key argument to bisect module functions

2015-03-06 Thread Dmitry Chichkov
Dmitry Chichkov added the comment: Use case: a custom immutable array with a large number of items and indirect key field access. For example ctypes.array, memoryview or ctypes.pointer or any other custom container. 1. I'm not sure how anyone can consider a precached key array as a right ans
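For context, the `key=` argument requested in this issue did eventually land, in Python 3.10. A minimal sketch of both the new form and the precomputed-key workaround available before it (the tuple data here is illustrative, not from the issue):

```python
import bisect
import sys

# Records sorted by their first field; we want to bisect on that field only.
data = [(1, "a"), (2, "b"), (3, "c")]

if sys.version_info >= (3, 10):
    # Python 3.10+: bisect functions accept a key= argument directly.
    i = bisect.bisect_left(data, 2, key=lambda item: item[0])
else:
    # Before 3.10: bisect a separate, precomputed key list instead.
    i = bisect.bisect_left([item[0] for item in data], 2)

print(i)  # 1
```

The precomputed-key branch is exactly the pattern the issue argues against for large or indirectly keyed containers, since it materializes a full key array up front.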

A strange statement in the bisect documentation?

2015-03-06 Thread Dmitry Chichkov
I was looking over documentation of the bisect module and encountered the following very strange statement there: From https://docs.python.org/2/library/bisect.html ...it does not make sense for the bisect() functions to have key or reversed arguments because that would lead to an inefficient
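The documentation passage in question recommends precomputing a list of keys and searching that instead. A small sketch of the recommended pattern (sample records are illustrative):

```python
import bisect

records = [("carol", 72), ("alice", 85), ("bob", 91)]  # already sorted by score
scores = [r[1] for r in records]                       # separate, precomputed key list

# Search the key list, then index back into the full records.
i = bisect.bisect_left(scores, 85)
print(records[i])  # ('alice', 85)
```

This works, but it costs O(N) time and memory to build the key list before the first O(log N) search, which is the inefficiency the thread goes on to debate.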

Re: Selecting k smallest or largest elements from a large list in python; (benchmarking)

2010-09-02 Thread Dmitry Chichkov
Uh. I'm sorry about the confusion. Last three items are just O(N) baselines. Python min(), Numpy argmin(), Numpy asarray(). I'll update the code. Thanks! A lot of the following doesn't run or returns incorrect results. To give but one example: def nargsmallest_numpy_argmin(iter, k):    

Re: Selecting k smallest or largest elements from a large list in python; (benchmarking)

2010-09-02 Thread Dmitry Chichkov
By the way, improving n-ARG-smallest (that returns indexes as well as values) is actually more desirable than just regular n-smallest: == Result == 1.38639092445 nargsmallest 3.1569879055 nargsmallest_numpy_argsort 1.29344892502 nargsmallest_numpy_argmin Note that numpy array constructor eats
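A pure-stdlib sketch of an n-ARG-smallest function of the kind benchmarked above, returning values together with their indexes (the function name mirrors the benchmark labels; the sample data is illustrative):

```python
import heapq

def nargsmallest(seq, k):
    """Return the k smallest items of seq as (value, index) pairs."""
    # Pair each value with its index; heapq orders the tuples by value first.
    return heapq.nsmallest(k, ((v, i) for i, v in enumerate(seq)))

vals = [5.0, 1.0, 4.0, 2.0, 3.0]
print(nargsmallest(vals, 3))  # [(1.0, 1), (2.0, 3), (3.0, 4)]
```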

Re: Selecting k smallest or largest elements from a large list in python; (benchmarking)

2010-09-02 Thread Dmitry Chichkov
wrote: On 9/1/2010 9:08 PM, Dmitry Chichkov wrote: Your problem is underspecified;-). Detailed timing comparisons are only valid for a particular Python version running under a particular OS on particular hardware. So, to actually run a contest, you would have to specify a version and OS

Selecting k smallest or largest elements from a large list in python; (benchmarking)

2010-09-01 Thread Dmitry Chichkov
Given: a large list (10,000,000) of floating point numbers; Task: fastest Python code that finds k (small, e.g. 10) smallest items, preferably with item indexes; Limitations: in Python, using only standard libraries (numpy/scipy is OK); I've tried several methods. With N = 10,000,000, K = 10 The
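The standard-library baseline for this task is `heapq.nsmallest`, which runs in O(N log K) without sorting the whole list. A sketch with a smaller N than the post's 10,000,000, for brevity:

```python
import heapq
import random

random.seed(0)
data = [random.random() for _ in range(1_000_000)]  # smaller N than the post's 10M

# O(N log K): maintains a K-element heap while scanning once.
smallest = heapq.nsmallest(10, data)
```

A full `sorted(data)[:10]` gives the same answer at O(N log N) cost, which is the gap these benchmarks measure.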

[issue9520] Add Patricia Trie high performance container

2010-08-14 Thread Dmitry Chichkov
Dmitry Chichkov dchich...@gmail.com added the comment: Yes, it looks like you are right. And while there is some slight performance degradation, at least nothing drastic is happening up to 30M keys. Using your modified test: 1000 words ( 961 keys), 3609555 words/s, 19239926 lookups/s

[issue9520] Add Patricia Trie high performance container

2010-08-13 Thread Dmitry Chichkov
Changes by Dmitry Chichkov dchich...@gmail.com: Added file: http://bugs.python.org/file18515/dc.dict.bench.0.02.py ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9520

[issue9520] Add Patricia Trie high performance container

2010-08-08 Thread Dmitry Chichkov
Dmitry Chichkov dchich...@gmail.com added the comment: Yes. Data containers optimized for very large datasets, compactness and strict adherence to O(1) can be beneficial. Python has great high-performance containers, but there is a certain lack of compact ones. For example, on the x64
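The compactness gap can be illustrated with stdlib tools alone: a dict's hash table carries far more per-entry overhead than a dense C array holding the same values. A rough sketch (the 100,000-entry size is arbitrary, and `sys.getsizeof` measures only the container itself, not the boxed keys):

```python
import array
import sys

n = 100_000
d = dict.fromkeys(range(n), 0)   # hash table: slots, keys, values
a = array.array("q", [0] * n)    # dense C array of 64-bit ints

dict_bytes = sys.getsizeof(d)
array_bytes = sys.getsizeof(a)
print(dict_bytes / n, array_bytes / n)  # rough per-entry bytes
```

A trie or other specialized container aims to land between these two extremes: compact like the array, keyed like the dict.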

[issue9520] Add Patricia Trie high performance container

2010-08-05 Thread Dmitry Chichkov
Dmitry Chichkov dchich...@gmail.com added the comment: Thank you for your comment. Perhaps we should try to separate this into two issues: 1) Bug. Python's dict() is unusable on datasets with 10,000,000+ keys. Here I should provide a solid test case showing a deviation from O(1); 2) Feature

[issue9520] Add Patricia Trie high performance container

2010-08-05 Thread Dmitry Chichkov
Dmitry Chichkov dchich...@gmail.com added the comment: No, I'm not simply running out of system memory (8 GB/x64/Linux), and in my test cases I've only seen ~25% of memory utilized. Good idea; I'll try to play with the cyclic garbage collector. It is harder than I thought to make a solid

[issue9520] Add Patricia Trie high performance container (Python's defaultdict(int) is unusable on datasets with 10,000,000+ keys)

2010-08-04 Thread Dmitry Chichkov
New submission from Dmitry Chichkov dchich...@gmail.com: On large data sets (10-100 million keys) the default Python dictionary implementation fails to meet memory and performance constraints. It also apparently fails to keep O(1) complexity (after just 1M keys). As such, there is a need
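A test case of the kind the issue calls for can be sketched by timing dict insertions in fixed-size chunks: under O(1) amortized behavior, per-chunk throughput should stay roughly flat as the table grows. A minimal harness (chunk sizes are kept small here so it runs quickly; the original report used 10-100M keys):

```python
import time

def insert_rates(n, chunk=500_000):
    """Insert n int keys into a dict in chunks; return inserts/sec per chunk."""
    d = {}
    rates = []
    for start in range(0, n, chunk):
        t0 = time.perf_counter()
        for i in range(start, start + chunk):
            d[i] = i
        rates.append(chunk / (time.perf_counter() - t0))
    return rates

rates = insert_rates(1_500_000)
```

A marked, sustained drop in later chunks (beyond transient resize spikes) would be the deviation from O(1) the submission describes.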

[issue8583] Hardcoded namespace_separator in the cElementTree.XMLParser

2010-05-02 Thread Dmitry Chichkov
Dmitry Chichkov dchich...@gmail.com added the comment: I agree that the argument name choice is poor. But it has already been made by whoever coded the EXPAT parser, which cElementTree.XMLParser wraps. So there is not much room here. As to 'proposed feature have to be used with great care

[issue8583] Hardcoded namespace_separator in the cElementTree.XMLParser

2010-05-02 Thread Dmitry Chichkov
Dmitry Chichkov dchich...@gmail.com added the comment: Interestingly, in precisely these applications you often don't care about namespaces at all. Often all you need is to extract 'text' or 'name' elements regardless of the namespace

[issue8583] Hardcoded namespace_separator in the cElementTree.XMLParser

2010-05-01 Thread Dmitry Chichkov
Dmitry Chichkov dchich...@gmail.com added the comment: This patch does not modify the existing behavior of the library. The namespace_separator parameter is optional. Parameter already exists in the EXPAT library, but it is hard coded in the cElementTree.XMLParser code. Fredrik, yes

[issue8583] Hardcoded namespace_separator in the cElementTree.XMLParser

2010-04-30 Thread Dmitry Chichkov
New submission from Dmitry Chichkov dchich...@gmail.com: The namespace_separator parameter is hard coded in the cElementTree.XMLParser class disallowing the option of ignoring XML Namespaces with cElementTree library. Here's the code example: from xml.etree.cElementTree import iterparse
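The behavior being complained about can be shown in a few lines. With the separator hard-coded, ElementTree expands every namespaced tag into Clark notation, `{uri}local`, so tags can't be matched by local name alone. A sketch (Python 3, where cElementTree was folded into `xml.etree.ElementTree`; the namespace URI is illustrative):

```python
import io
from xml.etree.ElementTree import iterparse

xml = b'<root xmlns="http://example.com/ns"><name>x</name></root>'

# Default "end" events fire innermost-first, so "name" comes before "root".
tags = [elem.tag for _, elem in iterparse(io.BytesIO(xml))]
print(tags)  # ['{http://example.com/ns}name', '{http://example.com/ns}root']
```

The patch in this issue would let the caller suppress that `{uri}` expansion at the parser level instead of post-processing every tag.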

[issue8583] Hardcoded namespace_separator in the cElementTree.XMLParser

2010-04-30 Thread Dmitry Chichkov
Changes by Dmitry Chichkov dchich...@gmail.com: -- keywords: +patch Added file: http://bugs.python.org/file17153/issue-8583.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue8583

[issue8583] Hardcoded namespace_separator in the cElementTree.XMLParser

2010-04-30 Thread Dmitry Chichkov
Dmitry Chichkov dchich...@gmail.com added the comment: And obviously iterparse can be either overridden in the local user code or patched in the library. Here's the iterparse code/test code: import cElementTree from cStringIO import StringIO class iterparse(object): root = None def
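The original code in this message is truncated, but the "override iterparse in user code" workaround it describes can be sketched as a pure-Python wrapper that strips the `{uri}` prefix after parsing (Python 3 API; the function name `iterparse_no_ns` and sample document are illustrative, not from the patch):

```python
import io
from xml.etree.ElementTree import iterparse

def iterparse_no_ns(source):
    """Yield (event, elem) pairs with the '{uri}' prefix stripped from tags -
    a user-level workaround for the hard-coded namespace_separator."""
    for event, elem in iterparse(source):
        if isinstance(elem.tag, str) and elem.tag.startswith("{"):
            elem.tag = elem.tag.split("}", 1)[1]
        yield event, elem

xml = b'<root xmlns="http://example.com/ns"><name>x</name></root>'
tags = [elem.tag for _, elem in iterparse_no_ns(io.BytesIO(xml))]
print(tags)  # ['name', 'root']
```

Unlike the proposed parser-level switch, this pays a per-element string operation, which is exactly the overhead the patch tries to avoid.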

[issue8228] pprint, single/multiple items per line parameter

2010-04-01 Thread Dmitry Chichkov
Dmitry Chichkov dchich...@gmail.com added the comment: Yes. This patch is nowhere near the production level. Unfortunately it works for me. And at the moment I don't have time to improve it further. The current version doesn't check the item's width upfront; there is definitely room

[issue8228] pprint, single/multiple items per line parameter

2010-03-25 Thread Dmitry Chichkov
New submission from Dmitry Chichkov dchich...@gmail.com: I've run into a case where pprint isn't really pretty. import pprint pprint.PrettyPrinter().pprint([1]*100) prints a lengthy column of '1's; not pretty at all. Look: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
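The one-item-per-line behavior this issue reports, and the multi-items-per-line mode it asks for, can both be demonstrated in Python 3.4+, where `pprint` gained a `compact` parameter addressing this request:

```python
import pprint

data = [1] * 100

tall = pprint.pformat(data)                # default: one item per line
wide = pprint.pformat(data, compact=True)  # compact=True (3.4+): packs each line

print(len(tall.splitlines()), len(wide.splitlines()))
```

With `compact=True`, as many items as fit within `width` are placed on each output line, which is essentially the single/multiple-items-per-line parameter proposed here.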

[issue8228] pprint, single/multiple items per line parameter

2010-03-25 Thread Dmitry Chichkov
Dmitry Chichkov dchich...@gmail.com added the comment: Quick, dirty and utterly incorrect patch that works for me. Includes issue_5131.patch (defaultdict support, etc). Targets trunk (2.6), revision 77310. -- keywords: +patch Added file: http://bugs.python.org/file16640/issue_8228