[issue14478] Decimal hashing very slow, could be cached

2012-11-25 Thread Mark Dickinson

Mark Dickinson added the comment:

Closing as fixed (for the C version, at least).

--
resolution:  -> fixed
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14478] Decimal hashing very slow, could be cached

2012-09-29 Thread Mark Dickinson

Changes by Mark Dickinson :


--
versions: +Python 3.4 -Python 3.3

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14478] Decimal hashing very slow, could be cached

2012-06-23 Thread Mark Dickinson

Mark Dickinson  added the comment:

I agree with Raymond:  I don't see a real need to patch the Python version 
here.  If we do want to patch the Python version, I'd go with Raymond's simple 
patch.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14478] Decimal hashing very slow, could be cached

2012-06-23 Thread Antoine Pitrou

Changes by Antoine Pitrou :


--
nosy: +mark.dickinson

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14478] Decimal hashing very slow, could be cached

2012-06-23 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

The C version of decimal may not always be available. In particular, it is not 
compatible with C89. Therefore, efficiency of the pure Python version of 
decimal is important.

Any chance to get it in Python 3.3?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14478] Decimal hashing very slow, could be cached

2012-04-10 Thread Raymond Hettinger

Raymond Hettinger  added the comment:

I don't see a need to cache the hash value in the pure Python version.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14478] Decimal hashing very slow, could be cached

2012-04-10 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

> Using except: pass as opposed to sticking everything inside the except 
> statement is also very slightly slower as well

I ran the test several times and didn't see the difference between pass
and sticking variants more than between different runs of the same
variant. If it is, then this is a reason for the interpreter
optimization.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14478] Decimal hashing very slow, could be cached

2012-04-10 Thread Amaury Forgeot d'Arc

Amaury Forgeot d'Arc  added the comment:

> Checking for the AttributeError is very slightly slower
Why are you trying to micro-optimize the Python version, when the C 
implementation does it better and faster?

--
nosy: +amaury.forgeotdarc

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14478] Decimal hashing very slow, could be cached

2012-04-10 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

> I can't imagine any other exception coming from that try statement.

KeyboardInterrupt

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14478] Decimal hashing very slow, could be cached

2012-04-10 Thread James Hutchison

James Hutchison  added the comment:

In the patch:

This:
+except AttributeError:
+pass

should be:
+except:


Checking for the AttributeError is very slightly slower. Not by a lot, but I 
think if we're going for speed we might as well be as fast as possible. I can't 
imagine any other exception coming from that try statement.

Using except: pass as opposed to sticking everything inside the except 
statement is also very slightly slower as well

Simple test case, 10 million loops:
except: 7.140999794006348
except AttributeError: 7.8440001010894775

Exception code
in except: 7.48367575073
after except/pass: 7.75

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14478] Decimal hashing very slow, could be cached

2012-04-10 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

> The patch for the Python version looks good to me

Oh, but used by James Hutchison approach is faster. When I disable the
_decimal:

Unpatched:
int:  2.24056077003479
CachingDecimal:  8.49468207359314
Decimal:  187.68132972717285

With rhettinger's LBYL patch:
int:  2.1670639514923096
CachingDecimal:  8.790924310684204
Decimal:  10.426796436309814

With EAFP patch:
int:  2.1392786502838135
CachingDecimal:  8.431122303009033
Decimal:  8.263015270233154

--
Added file: http://bugs.python.org/file25169/decimal_hash_2.patch

___
Python tracker 

___diff -r a012d5df2c73 Lib/decimal.py
--- a/Lib/decimal.pyTue Apr 10 16:27:58 2012 +0200
+++ b/Lib/decimal.pyTue Apr 10 18:46:19 2012 +0300
@@ -547,7 +547,7 @@
 class Decimal(object):
 """Floating point class for decimal arithmetic."""
 
-__slots__ = ('_exp','_int','_sign', '_is_special')
+__slots__ = ('_exp','_int','_sign', '_is_special', '_hash')
 # Generally, the value of the Decimal instance is given by
 #  (-1)**_sign * _int * 10**_exp
 # Special values are signified by _is_special == True
@@ -983,6 +983,10 @@
 
 def __hash__(self):
 """x.__hash__() <==> hash(x)"""
+try:
+return self._hash
+except AttributeError:
+pass
 
 # In order to make sure that the hash of a Decimal instance
 # agrees with the hash of a numerically equal integer, float
@@ -992,20 +996,22 @@
 if self.is_snan():
 raise TypeError('Cannot hash a signaling NaN value.')
 elif self.is_nan():
-return _PyHASH_NAN
+hash_ = _PyHASH_NAN
 else:
 if self._sign:
-return -_PyHASH_INF
+hash_ = -_PyHASH_INF
 else:
-return _PyHASH_INF
-
-if self._exp >= 0:
-exp_hash = pow(10, self._exp, _PyHASH_MODULUS)
+hash_ = _PyHASH_INF
 else:
-exp_hash = pow(_PyHASH_10INV, -self._exp, _PyHASH_MODULUS)
-hash_ = int(self._int) * exp_hash % _PyHASH_MODULUS
-ans = hash_ if self >= 0 else -hash_
-return -2 if ans == -1 else ans
+if self._exp >= 0:
+exp_hash = pow(10, self._exp, _PyHASH_MODULUS)
+else:
+exp_hash = pow(_PyHASH_10INV, -self._exp, _PyHASH_MODULUS)
+hash_ = int(self._int) * exp_hash % _PyHASH_MODULUS
+if self < 0: hash_ = -hash_
+if hash_ == -1: hash_ = -2
+self._hash = hash_
+return hash_
 
 def as_tuple(self):
 """Represents the number as a triple tuple.
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14478] Decimal hashing very slow, could be cached

2012-04-10 Thread Stefan Krah

Stefan Krah  added the comment:

The patch for the Python version looks good to me.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14478] Decimal hashing very slow, could be cached

2012-04-10 Thread Stefan Krah

Stefan Krah  added the comment:

I changed the C version to cache the hash as well: For the submitted
test case the speedup is only 5x, but hashing times vary greatly
depending of the size of the coefficient and the exponent.


BTW, the tests already call both hash() and __hash__() on the same
number, so retrieving the cached value is actually tested.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14478] Decimal hashing very slow, could be cached

2012-04-10 Thread Roundup Robot

Roundup Robot  added the comment:

New changeset a012d5df2c73 by Stefan Krah in branch 'default':
Issue #14478: Cache the hash of a Decimal in the C version.
http://hg.python.org/cpython/rev/a012d5df2c73

--
nosy: +python-dev

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14478] Decimal hashing very slow, could be cached

2012-04-10 Thread Jim Jewett

Jim Jewett  added the comment:

Stefan Krah has a good point.  

Since the only cost is an extra slot, and this is for users who have already 
chosen to use Decimal instead of a more efficient (but possibly less accurate) 
representation, even without the native speedups to Decimal ... I guess I'm now 
+1.  

It is clearly an implementation detail, so I don't think the docs need to be 
changed.  I'm not sure the tests do, nor am I sure *how* to test for it in a 
black-box fashion.

I have reviewed the patch and have no objections.

So unless a new objection comes up in the next few days, I would say this is 
ready to be checked in.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14478] Decimal hashing very slow, could be cached

2012-04-10 Thread Stefan Krah

Stefan Krah  added the comment:

I think it would be a good idea to fix this in the Python version
for the other implementations.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14478] Decimal hashing very slow, could be cached

2012-04-07 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

Le samedi 07 avril 2012 à 17:22 +, Raymond Hettinger a écrit :
> --
> keywords: +patch
> Added file: http://bugs.python.org/file25152/decimal_hash.diff

I think patching the C version of Decimal would be more useful :)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14478] Decimal hashing very slow, could be cached

2012-04-07 Thread Raymond Hettinger

Changes by Raymond Hettinger :


--
keywords: +patch
Added file: http://bugs.python.org/file25152/decimal_hash.diff

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14478] Decimal hashing very slow, could be cached

2012-04-07 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

> > > I recommend that __hash__ should use functools.lru_cache for caching.
> > Why would you do such a thing? A hash value is a single 64-bit slot, no 
> > need to add the memory consumption of a whole dictionary and the runtime 
> > cost of a LRU eviction policy when you can simply cache the hash in the 
> > object itself (like we already do for strings)...
> 
> It was a joke (I think). Taking into account the fact that LRU cache
> uses a hashtable and need to calculate the hash of arguments (i.e., the
> Decimal self) to get the cached value of hash.

Damn. Shame on me for not understanding Raymond's humour :-)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14478] Decimal hashing very slow, could be cached

2012-04-07 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

> > I recommend that __hash__ should use functools.lru_cache for caching.
> Why would you do such a thing? A hash value is a single 64-bit slot, no need 
> to add the memory consumption of a whole dictionary and the runtime cost of a 
> LRU eviction policy when you can simply cache the hash in the object itself 
> (like we already do for strings)...

It was a joke (I think). Taking into account the fact that LRU cache
uses a hashtable and need to calculate the hash of arguments (i.e., the
Decimal self) to get the cached value of hash.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14478] Decimal hashing very slow, could be cached

2012-04-07 Thread Antoine Pitrou

Changes by Antoine Pitrou :


--
stage:  -> needs patch
versions: +Python 3.3 -Python 3.2

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14478] Decimal hashing very slow, could be cached

2012-04-07 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

> I recommend that __hash__ should use functools.lru_cache for caching.

Why would you do such a thing? A hash value is a single 64-bit slot, no need to 
add the memory consumption of a whole dictionary and the runtime cost of a LRU 
eviction policy when you can simply cache the hash in the object itself (like 
we already do for strings)...

--
nosy: +pitrou

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14478] Decimal hashing very slow, could be cached

2012-04-06 Thread Raymond Hettinger

Changes by Raymond Hettinger :


--
assignee:  -> rhettinger
nosy: +rhettinger

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14478] Decimal hashing very slow, could be cached

2012-04-06 Thread Jim Jewett

Jim Jewett  added the comment:

@Jimbofbx:  You are correct that a value has to be reserved to indicate that 
the hash hasn't been (or can't be) computed, but python uses -1 instead of zero.

An integer can't return itself because a hash is an unboxed integer; that said, 
there is a fast path for small enough integers.  You can see the actual code at
http://hg.python.org/cpython/file/388016438a50/Objects/longobject.c#l2581 

Since it is too late for 3.2, I suggest closing this and opening a separate 3.3 
issue if the existing improvements are not sufficient.

--
nosy: +Jim.Jewett

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14478] Decimal hashing very slow, could be cached

2012-04-05 Thread James Hutchison

James Hutchison  added the comment:

I presume you mean in 3.2? Have you looked at the source code for that 
decorator? It's fundamentally a try/except but with a lot more unnecessary 
bloat than is needed for caching a single int result from a function with no 
arguments. Its actually a lot slower.

If this is likely going to see use in 3.3 then it would probably just be a long 
int since 3.3 uses C. 0 would indicate uncalculated.

Hash function would have to be set up to never return 0.

Also every function would need to be tested to make sure there isn't any 
"in-place" modification of the Decimal object that could alter the hash value.

I like how the cached hash in 3.3 is faster than int for hashing. Shouldn't an 
int just return itself? Why would it be slower?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14478] Decimal hashing very slow, could be cached

2012-04-05 Thread Ramchandra Apte

Ramchandra Apte  added the comment:

I recommend that __hash__ should use functools.lru_cache for caching.

--
nosy: +ramchandra.apte

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14478] Decimal hashing very slow, could be cached

2012-04-03 Thread Stefan Krah

Stefan Krah  added the comment:

> but hashing will be faster.

I retract that. The exponent actually makes things worse.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14478] Decimal hashing very slow, could be cached

2012-04-03 Thread Stefan Krah

Stefan Krah  added the comment:

I agree that caching the hash would be useful for 3.2, but the
request comes at a unfortunate time: 3.2.3 is about to be released,
and there's no way that the change would go into it.


So let's focus on the C version in 3.3. These are the timings on a
64-bit machine with the C version in 3.3:

int:  0.537806510925293
CachingDecimal:  2.2549374103546143
Decimal:  1.8158345222473145


These are the timings with a hacked C version that caches the hash:

int:  0.5755119323730469
CachingDecimal:  2.3034861087799072
Decimal:  0.4364290237426758



The hash calculation time depends on the size of the coefficient
of the Decimal and the exponent. Note that the context is not
applied when using the Decimal constructor:


>>> Decimal(1e100)
Decimal('1159028911097599180468360808563945281389781327557747838772170381060813469985856815104')


So the numbers you are using have an unusually high precision for
regular decimal floating point arithmetic.

If you want well defined limits, I suggest using either:

>>> Decimal('1e100')
Decimal('1E+100')

Or, if the input really must be a float:

>>> c = getcontext()
>>> c.create_decimal(1e100)
Decimal('1.15902891110E+100')


In that latter case, of course the conversion is inexact and
rounded (but hashing will be faster).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14478] Decimal hashing very slow, could be cached

2012-04-02 Thread James Hutchison

James Hutchison  added the comment:

100x should be e100

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14478] Decimal hashing very slow, could be cached

2012-04-02 Thread James Hutchison

James Hutchison  added the comment:

If I increase the cycles increased 10x with 3.2 I get:

int:  0.421313354492
Decimal:  24.20299983024597
CachingDecimal:  1.7809998989105225

The sample you have provided is basically what I'm using. See attached

What about worst case hash calculation time for Decimal? How does the C code 
handle that? This is if I increase the values themselves by 100x, same number 
of loops as above

int:  0.5309998989105225
CachingDecimal:  2.07868664551
Decimal:  41.2979998588562

--
Added file: http://bugs.python.org/file25100/cachedDecimal.py

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14478] Decimal hashing very slow, could be cached

2012-04-02 Thread Serhiy Storchaka

Serhiy Storchaka  added the comment:

It would be better if you provide a working script that demonstrates the issue. 
I have completed your example (attached).

I ran the example on Python 3.1 and received:

0.24046326
9.8737308979
0.587980985641

Then I ran on Python 3.3:

0.2100839614868164
0.8649246692657471
0.6062228679656982

As you can see, the new implementation is much faster. Benefit from caching 
decreased. I suppose, if we implement caching in C the difference will be more.

Then I increased the size of the cycles in 10 times, and the time is almost 
equal (on Python 3):

1.8386573791503906
8.418540477752686
8.355770826339722

That I can't to explain. The time of cached version increased 
disproportionately, more than in 10 times.

--
nosy: +storchaka
Added file: http://bugs.python.org/file25099/CachingDecimal_example.py

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14478] Decimal hashing very slow, could be cached

2012-04-02 Thread Jesús Cea Avión

Changes by Jesús Cea Avión :


--
nosy: +jcea

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14478] Decimal hashing very slow, could be cached

2012-04-02 Thread R. David Murray

Changes by R. David Murray :


--
nosy: +skrah

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14478] Decimal hashing very slow, could be cached

2012-04-02 Thread James Hutchison

New submission from James Hutchison :

Tested on 3.2

Note that I noticed that Decimal is supposed to be faster in 3.3 but I thought 
I would bring this to light just in case its still relevant

Decimal hashing is very slow, even for simple numbers. I found by caching the 
hash result, there is significant speed up whenever a Decimal value is reused.

I created a class that inherits from Decimal and changed the __hash__ function 
to try/except a stored hash value as such:

def __hash__(self):
try: return self.myHash;
except:
self.myHash = super().__hash__();
return self.myHash;

Example code:

d = dict();
start = time.time();
i = int(1);
j = int(2);
k = int(3);
for i in range(10):
d[i,j,k] = 5;
print(time.time() - start);

d = dict();
start = time.time();
i = Decimal(1);
j = Decimal(2);
k = Decimal(3);
for i in range(10):
d[i,j,k] = 5;
print(time.time() - start);

d = dict();
start = time.time();
i = CachingDecimal(1);
j = CachingDecimal(2);
k = CachingDecimal(3);
for i in range(10):
d[i,j,k] = 5;
print(time.time() - start);

Output
int:  0.0463133544922
Decimal:  2.31263760376
CachingDecimal:  0.111335144043

In a real life example, I changed some of the values in my code from int to 
Decimal because non-whole numbers needed to be supported (and this seemed like 
the easiest way to do so without having to worry about my == comparisons 
breaking) and it slowed my code down massively. Changing to a CachingDecimal 
type sped up one code block from 92 seconds with Decimal to 7 seconds, which 
was much closer to the original int speed.

Note that all of the relevant operations have to be overloaded to return the 
CachingDecimal type, which is a pain, so this makes a lot of sense to implement 
into the Decimal module. My understanding is that altering a Decimal value will 
always create a new Decimal object, so there shouldn't be any issues with the 
cached hash desyncing with the correct hash. Someone may want to check that 
though.

Thanks,

James

--
components: Library (Lib)
messages: 157369
nosy: Jimbofbx
priority: normal
severity: normal
status: open
title: Decimal hashing very slow, could be cached
type: performance
versions: Python 3.2

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com