[issue37168] Decimal divisions sometimes 10x or 100x too large

2020-08-30 Thread Facundo Batista


Change by Facundo Batista :


--
nosy:  -facundobatista

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37168] Decimal divisions sometimes 10x or 100x too large

2019-09-14 Thread Steven D'Aprano


Change by Steven D'Aprano :


--
nosy: +steven.daprano

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37168] Decimal divisions sometimes 10x or 100x too large

2019-07-10 Thread Phil Frost


Phil Frost  added the comment:

Not yet. I had a hell of a time getting gdb to follow gunicorn forks so it 
would watch the workers but not also every other thing using popen or 
os.system(). And the lldb package on Alpine doesn't seem to work.

So we're currently in the process of testing a Debian container for production, 
hoping the issue just goes away. If not, I'll continue digging in with a 
debugger.

--
nosy: +Phil Frost2

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37168] Decimal divisions sometimes 10x or 100x too large

2019-07-09 Thread Mark Dickinson


Mark Dickinson  added the comment:

@Phil: Did you ever get to the bottom of this?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37168] Decimal divisions sometimes 10x or 100x too large

2019-06-07 Thread STINNER Victor


STINNER Victor  added the comment:

Can you try to reproduce your workflow with Python compiled in debug mode 
(./configure --with-pydebug)? With Python 3.8, you can now use a debug Python 
without having to recompile your C extensions :-)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37168] Decimal divisions sometimes 10x or 100x too large

2019-06-07 Thread Phil Frost


Phil Frost  added the comment:

I'm afraid that won't be of much use since the object is statically allocated. 
Besides, the trouble isn't finding what created the object, but what mutated an 
int object which should be immutable.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37168] Decimal divisions sometimes 10x or 100x too large

2019-06-06 Thread STINNER Victor


STINNER Victor  added the comment:

On Python 3, you can use tracemalloc.get_object_traceback() to find where an 
object was allocated ;-)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37168] Decimal divisions sometimes 10x or 100x too large

2019-06-06 Thread Phil Frost


Phil Frost  added the comment:

Probably too many C extensions to feasibly audit them all. Also we can't rule 
out something busted in Alpine. So I'm going to set a watchpoint on 
small_ints[1]->ob_ival with a little gdb script to dump core right when it 
changes; hopefully inspecting that backtrace will lead directly to the culprit.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37168] Decimal divisions sometimes 10x or 100x too large

2019-06-06 Thread Mark Dickinson


Mark Dickinson  added the comment:

This smells a bit as though someone's doing something like:

myobject = PyInt_FromLong(some_long);

and then assuming that they hold the only reference to `myobject`, so that it's 
safe to mutate in place.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37168] Decimal divisions sometimes 10x or 100x too large

2019-06-06 Thread Mark Dickinson


Mark Dickinson  added the comment:

Interesting indeed.

What 3rd party extension are installed in your Python environment? Is it 
possible that one of them is messing with the CPython internals (perhaps in an 
overeager attempt to optimize)? Any uses of ctypes?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37168] Decimal divisions sometimes 10x or 100x too large

2019-06-06 Thread Phil Frost


Phil Frost  added the comment:

So this is uminteresting

(gdb) p small_ints[0]
$2 = (PyIntObject *) 0x558ce65df528
(gdb) p *small_ints[0]
$3 = {ob_refcnt = 18, ob_type = 0x7fe019b694c0 , ob_ival = -5}
(gdb) p *small_ints[1]
$4 = {ob_refcnt = 65, ob_type = 0x7fe019b694c0 , ob_ival = -3}
(gdb) p *small_ints[2]
$5 = {ob_refcnt = 204, ob_type = 0x7fe019b694c0 , ob_ival = -3}
(gdb) p *small_ints[3]
$6 = {ob_refcnt = 1872, ob_type = 0x7fe019b694c0 , ob_ival = -2}

When is 1/1=10? When -5 + 1 = -3.

So now I suppose the question is how this comes to be.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37168] Decimal divisions sometimes 10x or 100x too large

2019-06-06 Thread Jeffrey Kintscher


Change by Jeffrey Kintscher :


--
nosy: +Jeffrey.Kintscher

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37168] Decimal divisions sometimes 10x or 100x too large

2019-06-06 Thread Stefan Krah


Stefan Krah  added the comment:

Yes, I'd try Debian first, then potential libc issues can also be eliminated in 
a single experiment.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37168] Decimal divisions sometimes 10x or 100x too large

2019-06-06 Thread Phil Frost


Phil Frost  added the comment:

> Alpine Linux apparently uses musl.  Is that well supported and tested?

Heh. Not really.

Switching to a Debian container is probably just about as much work as building 
Python from source. Would that be preferable?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37168] Decimal divisions sometimes 10x or 100x too large

2019-06-06 Thread Stefan Krah


Stefan Krah  added the comment:

There are lots of ideas still. :)

As I said, on a Linux other than RedHat or Debian or such I'd first try
to blame the shipped Python, then the libc (I had a glibc memmove issue
once with similar inexplicable results).

Alpine Linux apparently uses musl.  Is that well supported and tested?



Since you mention concurrency: Before the arrival of contextvars in 3.7
the decimal context is thread-safe, but not coroutine-safe.


If the issue is really in that particular loop I can't see how
that could be the cause, but perhaps the core dump is not accurate.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37168] Decimal divisions sometimes 10x or 100x too large

2019-06-06 Thread Phil Frost


Phil Frost  added the comment:

skrah: Yes, that's correct. Since I can only produce this bug in production it 
will take me some days to build and validate a source build. But absent any 
better ideas, I will try.

tim.peters: I've observed this bug across hundreds of EC2 hosts, in dozens of 
code paths, with all kinds of inputs. Moreover, the hosts aren't displaying any 
other symptoms of hardware failure such as random segfaults or mysteriously 
corrupted data.

I've also deeply investigated two cores now which show specifically that `exp` 
seems to get 2 added when it should have been 1. I have a hard time explaining 
how a hardware failure can cause precisely the same failure so reliably.

So I doubt hardware is to blame.

Although, it does seem the issue occurs in "clumps" on individual hosts. So we 
might go 10 hours without seeing the issue, then it may happen 5 times within 
30 minutes on one host. We might observe 1 or 2 more such clumps on the same 
host until the next deploy of the application, at which point all the 
containers are replaced with fresh ones. So this suggests there is some 
ephemeral state within a host that creates a propensity for the issue.

I've also been unable to reproduce the problem in a development environment, 
even when that development environment is using the same kernel, instance 
class, and docker container as production. So I suspect the bug is precipitated 
by some particular concurrency or interaction that I haven't been able to 
replicate.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37168] Decimal divisions sometimes 10x or 100x too large

2019-06-06 Thread STINNER Victor


Change by STINNER Victor :


--
nosy: +vstinner

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37168] Decimal divisions sometimes 10x or 100x too large

2019-06-05 Thread Tim Peters


Tim Peters  added the comment:

Also basic:  run hardware CPU and memory stress diagnostics, and/or try running 
the same thing on a different machine.  Hardware isn't infallible, and can fail 
in nearly arbitrary ways.  For example, perhaps a smidgen of silicon has gone 
flaky, so that one time in a billion bit 0 "leaks" into bit 1 in a register.  
That can effectively change the integer 1 to the integer 2.

Is that likely?  No.  But a loop doing mutually inconsistent things to the 
variables it's changing is impossible ;-)

--
nosy: +tim.peters

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37168] Decimal divisions sometimes 10x or 100x too large

2019-06-05 Thread Stefan Krah


Stefan Krah  added the comment:

Thank you for the detailed report. So, in an attempt to summarize,
this is the crucial passage:

>>> ideal_exp = -1
>>> exp = -29
>>> coeff = 15
>>> division_counter = 0
>>> while exp < ideal_exp and coeff % 10 == 0:
... coeff //= 10
... exp += 1
... division_counter += 1
... 
>>> coeff
15L
>>> exp
-1
>>> division_counter
28



But you sometimes get coeff==150 and division_counter==27 instead.
Is that correct?


The first thing to consider is whether Alpine Linux uses a patched
Python version or if you can reproduce this on Python 2.7.15 compiled
from source.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37168] Decimal divisions sometimes 10x or 100x too large

2019-06-05 Thread Phil Frost


Phil Frost  added the comment:

I should note that while we do observe this issue with Decimal, my 
troubleshooting so far indicates the logic in decimal.py is fine. It seems 
instead the instructions in decimal.py aren't being correctly executed.

I'm not sure if the problem is limited to only Decimal, or if we just notice it 
there because it's correlated with financial code, which when encountering the 
bug, introduces an error which is eventually caught by the accounting 
department. There could very well be errors happening at least frequently 
elsewhere which simply go unnoticed.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37168] Decimal divisions sometimes 10x or 100x too large

2019-06-05 Thread Zachary Ware


Change by Zachary Ware :


--
nosy: +facundobatista, mark.dickinson, rhettinger, skrah

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37168] Decimal divisions sometimes 10x or 100x too large

2019-06-05 Thread Phil Frost


New submission from Phil Frost :

We've observed instances of Decimal divisions being incorrect a few times per 
day in a web application serving millions of requests daily. I've been unable 
to reproduce the issue but have been investigating core dumps which suggest 
either some issue in the C Python implementation or its execution environment.

Python 2.7.15 on Alpine Linux Docker containers.

A slightly modified decimal.py is attached. It's been modified to check that 
the answer is approximately correct and dumps core if not. It also has a few 
local variables added to provide additional insight. I've annotated the file 
with values from the core dump and my reasoning about what should happen.

The crux of the problem is the loop at line 1389. There are 3 ways to determine 
how many times this loop executed:

1. the number of zeros removed from coeff
2. the number of times exp is incremented
3. the number of times division_counter is incremented

Oddly, #1 and #3 indicate the loop happened 27 times, while #2 indicates the 
loop happened 28 times.

One possible explanation (which makes about as much sense as any other) is that 
`exp += 1` sometimes adds 2 instead of 1. This means the loop happens 1 time 
fewer than it should, leaving `coeff` 10 times bigger than it should properly 
be. Or (very rarely) this happens twice and the result is 100 times bigger.

I find it very odd that something as basic as `+= 1` should not work, but at 
the moment it is the best explanation I have.

Unfortunately I can't share the core dump, as I've only been able to observe 
the issue in production so the core contains private information. But I'd 
welcome any ideas for further exploration. Or perhaps someone is aware of an 
existing bug that would explain this behavior.

--
components: Interpreter Core
files: decimal.py
messages: 344772
nosy: Phil Frost
priority: normal
severity: normal
status: open
title: Decimal divisions sometimes 10x or 100x too large
versions: Python 2.7
Added file: https://bugs.python.org/file48396/decimal.py

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com