On Friday, March 6, 2015 at 6:24:24 PM UTC-8, Steven D'Aprano wrote:
Dmitry Chichkov wrote:
I was looking over documentation of the bisect module and encountered the
following very strange statement there:
From https://docs.python.org/2/library/bisect.html
...it does not make sense for the bisect() functions to have key or reversed
arguments because that would lead to an inefficient design (successive calls to
bisect functions would not "remember" all of the previous key lookups).
Dmitry Chichkov added the comment:
Use case: a custom immutable array with a large number of items and indirect
key field access. For example ctypes.array, memoryview or ctypes.pointer or any
other custom container.
1. I'm not sure how anyone can consider a precached key array as a right answer
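The workaround the docs point to instead can be sketched as a precomputed key list searched with bisect (a minimal sketch; the records here are made up for illustration):

```python
import bisect

# The documented workaround: search a precomputed list of keys, then use
# the resulting index into the original (sorted-by-key) records.
records = [('a', 1), ('b', 3), ('c', 7)]   # sorted by the second field
keys = [r[1] for r in records]             # the "precached key array"
i = bisect.bisect_left(keys, 3)
print(records[i])  # ('b', 3)
```

(For reference, Python 3.10 later did add a key parameter to the bisect functions.)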
Uh. I'm sorry about the confusion. The last three items are just O(N)
baselines: Python min(), NumPy argmin(), NumPy asarray().
I'll update the code. Thanks!
A lot of the following doesn't run or returns incorrect results.
To give but one example:
def nargsmallest_numpy_argmin(iter, k):
By the way, improving n-ARG-smallest (that returns indexes as well as
values) is actually more desirable than just regular n-smallest:
== Result ==
1.38639092445 nargsmallest
3.1569879055 nargsmallest_numpy_argsort
1.29344892502 nargsmallest_numpy_argmin
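An n-arg-smallest along these lines can be sketched with heapq from the standard library (a sketch, not the benchmarked code above):

```python
import heapq

def nargsmallest(values, k):
    # k smallest items together with their indexes, as (value, index) pairs.
    return heapq.nsmallest(k, ((v, i) for i, v in enumerate(values)))

print(nargsmallest([5.0, 1.0, 4.0, 2.0, 3.0], 2))  # [(1.0, 1), (2.0, 3)]
```

heapq.nsmallest keeps only k candidates in memory, so this stays O(N log k) regardless of input size.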
Note that numpy array constructor eats
On 9/1/2010 9:08 PM, Dmitry Chichkov wrote:
Your problem is underspecified;-).
Detailed timing comparisons are only valid for a particular Python
version running under a particular OS on particular hardware. So, to
actually run a contest, you would have to specify a version and OS
Given: a large list (10,000,000) of floating point numbers;
Task: fastest python code that finds k (small, e.g. 10) smallest
items, preferably with item indexes;
Limitations: in Python, using only standard libraries (numpy/scipy are
OK);
I've tried several methods. With N = 10,000,000, K = 10 The
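As the reply above notes, timing numbers are only comparable if the environment is reported with them; a minimal sketch:

```python
import platform
import sys
import timeit

# Print the environment next to the timing, so the numbers can be compared.
print(sys.version.split()[0], platform.system(), platform.machine())
secs = timeit.timeit('min(xs)',
                     setup='xs = [float(i) for i in range(10_000)]',
                     number=100)
print(f'min() over 10,000 floats x100: {secs:.4f}s')
```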
Dmitry Chichkov dchich...@gmail.com added the comment:
Yes, it looks like you are right. And while there is some slight performance
degradation, at least nothing drastic is happening up to 30M keys. Using your
modified test:
1000 words ( 961 keys), 3609555 words/s, 19239926 lookups/s
Changes by Dmitry Chichkov dchich...@gmail.com:
Added file: http://bugs.python.org/file18515/dc.dict.bench.0.02.py
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue9520
Dmitry Chichkov dchich...@gmail.com added the comment:
Yes. Data containers optimized for very large datasets, compactness, and strict
adherence to O(1) can be beneficial.
Python has great high-performance containers, but there is a certain lack of
compact ones. For example, on the x64
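The standard library's array module illustrates the compactness gap mentioned above (a sketch; reported sizes are container-only and vary by build):

```python
import sys
from array import array

n = 1_000_000
lst = list(range(n))        # 8-byte pointers, each to a separate int object
arr = array('i', range(n))  # 4-byte machine ints stored inline

print(sys.getsizeof(lst), sys.getsizeof(arr))  # the array is far smaller
```

Note that sys.getsizeof(lst) does not even count the per-element int objects the list points at, so the real gap is larger.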
Dmitry Chichkov dchich...@gmail.com added the comment:
Thank you for your comment. Perhaps we should try to separate this into two issues:
1) Bug. Python's dict() is unusable on datasets with 10,000,000+ keys. Here I
should provide a solid test case showing a deviation from O(1);
2) Feature
Dmitry Chichkov dchich...@gmail.com added the comment:
No, I'm not simply running out of system memory (8GB/x64/Linux), and in my test
cases I've only seen ~25% of memory utilized. Good idea, though; I'll try to
play with the cyclic garbage collector.
It is harder than I thought to make a solid
New submission from Dmitry Chichkov dchich...@gmail.com:
On large data sets (10-100 million keys) the default python dictionary
implementation fails to meet memory and performance constraints. It also
apparently fails to keep O(1) complexity (after just 1M keys). As such, there
is a need
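A test case for the O(1) claim can be sketched by timing inserts in batches as the dict grows (a sketch, not the attached benchmark):

```python
import time

def insert_timings(total, batch):
    # Average per-key insert time for each batch; the curve should stay
    # roughly flat if inserts are O(1) amortized.
    d, out = {}, []
    for start in range(0, total, batch):
        t0 = time.perf_counter()
        for i in range(start, start + batch):
            d[i] = i
        out.append((start + batch, (time.perf_counter() - t0) / batch))
    return out

for size, per_key in insert_timings(1_000_000, 250_000):
    print(f'{size:>9,} keys: {per_key * 1e9:6.1f} ns/insert')
```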
Dmitry Chichkov dchich...@gmail.com added the comment:
I agree that the argument name choice is poor. But it has already been made by
whoever coded the EXPAT parser which cElementTree.XMLParser wraps. So there is
not much room here.
As to 'proposed feature have to be used with great care
Dmitry Chichkov dchich...@gmail.com added the comment:
Interestingly, in precisely these applications you often don't care about
namespaces at all. Often all you need is to extract 'text' or 'name' elements
regardless of the namespace
Dmitry Chichkov dchich...@gmail.com added the comment:
This patch does not modify the existing behavior of the library. The
namespace_separator parameter is optional. The parameter already exists in the
EXPAT library, but it is hard-coded in the cElementTree.XMLParser code.
Fredrik, yes
New submission from Dmitry Chichkov dchich...@gmail.com:
The namespace_separator parameter is hard coded in the cElementTree.XMLParser
class disallowing the option of ignoring XML Namespaces with cElementTree
library.
Here's the code example:
from xml.etree.cElementTree import iterparse
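Until such an option exists, a common workaround (sketched here on a made-up document, using the modern xml.etree.ElementTree spelling) is to strip the '{uri}' prefix that ElementTree puts on tags:

```python
from io import StringIO
from xml.etree.ElementTree import iterparse

xml = '<root xmlns="http://example.com/ns"><name>spam</name></root>'
for event, elem in iterparse(StringIO(xml)):
    # Tags arrive as '{http://example.com/ns}name'; drop the namespace part.
    elem.tag = elem.tag.rsplit('}', 1)[-1]
    print(elem.tag)
```

This prints 'name' then 'root' (iterparse yields end events by default, children first).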
Changes by Dmitry Chichkov dchich...@gmail.com:
--
keywords: +patch
Added file: http://bugs.python.org/file17153/issue-8583.patch
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8583
Dmitry Chichkov dchich...@gmail.com added the comment:
And obviously iterparse can be either overridden in the local user code or
patched in the library. Here's the iterparse code/test code:
import cElementTree
from cStringIO import StringIO

class iterparse(object):
    root = None
    def
Dmitry Chichkov dchich...@gmail.com added the comment:
Yes, this patch is nowhere near production level. Unfortunately, it works
for me, and at the moment I don't have time to improve it further. The current
version doesn't check the item's width upfront; there is definitely room
New submission from Dmitry Chichkov dchich...@gmail.com:
I've run into a case where pprint isn't really pretty.
import pprint
pprint.PrettyPrinter().pprint([1]*100)
Prints a lengthy column of '1's; not pretty at all. Look:
[1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1
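For reference, later Python versions addressed exactly this case: pprint grew a compact flag in 3.4 that packs short items onto each line instead of one item per line.

```python
import pprint

# compact=True (Python 3.4+) wraps at the width limit rather than
# printing one sequence item per line.
pprint.pprint([1] * 100, compact=True)
```

With the default width of 80, the hundred ones now fit in a handful of lines.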
Dmitry Chichkov dchich...@gmail.com added the comment:
Quick, dirty and utterly incorrect patch that works for me. Includes
issue_5131.patch (defaultdict support, etc). Targets trunk (2.6), revision
77310.
--
keywords: +patch
Added file: http://bugs.python.org/file16640/issue_8228