[issue1521] string.decode() fails on long strings

2007-11-30 Thread Amaury Forgeot d'Arc

Amaury Forgeot d'Arc added the comment:

Committed revision 59244 in release25-maint.

--
resolution:  -> fixed
status: open -> closed

__
Tracker <[EMAIL PROTECTED]>

__
___
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue1521] string.decode() fails on long strings

2007-11-30 Thread Amaury Forgeot d'Arc

Amaury Forgeot d'Arc added the comment:

Committed revision 59241. Will backport after the buildbots run the test.

--
assignee:  -> amaury.forgeotdarc




[issue1521] string.decode() fails on long strings

2007-11-30 Thread Amaury Forgeot d'Arc

Amaury Forgeot d'Arc added the comment:

> What else needs to be done to make sure your patch finds its way 
> to the Python core?

Nothing, I suppose. It looks like an inconsistency in the source code,
and fixing it happens to correct a real problem. I will commit it in a few hours.




[issue1521] string.decode() fails on long strings

2007-11-30 Thread Andreas Eisele

Andreas Eisele added the comment:

> Then 7G is "enough" for the test to run.

Yes, indeed, thanks for pointing this out.
It runs and detects an ERROR, and after applying your patch it succeeds.

What else needs to be done to make sure your patch finds its way to the
Python core?




[issue1521] string.decode() fails on long strings

2007-11-30 Thread Amaury Forgeot d'Arc

Amaury Forgeot d'Arc added the comment:

> @bigmemtest(minsize=_2G*2+2, memuse=3)

minsize=_2G + 2 should trigger your second problem (where the size wraps
to a negative number). Then 7G is "enough" for the test to run.




[issue1521] string.decode() fails on long strings

2007-11-30 Thread Andreas Eisele

Andreas Eisele added the comment:

> How do you run the test? Do you specify a maximum available size?
I naively assumed that running "make test" from the toplevel would be
clever about finding plausible parameters. However, it runs the bigmem
tests in a minimalistic way, skipping essentially all interesting bits.  

Thanks for the hints on giving the maximal available size explicitly,
which work in principle, but make testing rather slow. Also, if the
encode/decode tests are decorated with 
@bigmemtest(minsize=_2G*2+2, memuse=3)
one needs to specify at least -M 15g, otherwise the tests are still
skipped.  No wonder that people do not normally run them...




[issue1521] string.decode() fails on long strings

2007-11-30 Thread Amaury Forgeot d'Arc

Amaury Forgeot d'Arc added the comment:

> the test is done only once with a small size (5147)
How do you run the test? Do you specify a maximum available size?
If you run test_bigmem.py directly, try to run it with an additional
argument like this:
./test_bigmem.py 7G
If you run regrtest.py, you should add an option like "-M 7G".
(assuming you have enough RAM...)




[issue1521] string.decode() fails on long strings

2007-11-30 Thread Andreas Eisele

Andreas Eisele added the comment:

I tried
@bigmemtest(minsize=_2G*2+2, memuse=3)
but nothing changed; the test still runs only once, with a small
size (5147).  Apparently something does not work as
expected here. I'm trying this with 2.6 (revision 59231).




[issue1521] string.decode() fails on long strings

2007-11-30 Thread Amaury Forgeot d'Arc

Amaury Forgeot d'Arc added the comment:

> Alas, the extended test code still does not catch the problem
Can you please try again by changing in the tests:
  minsize=_2G
into 
  minsize=_2G * 2 + 2
The length has to be greater than 4G for an int to lose digits.




[issue1521] string.decode() fails on long strings

2007-11-30 Thread Andreas Eisele

Andreas Eisele added the comment:

Thanks a lot for the patch, which indeed seems to solve the issue.
Alas, the extended test code still does not catch the problem, at
least in my installation. Someone with a better understanding of
how these tests work and with access to a 64-bit machine should 
still have a look.




[issue1521] string.decode() fails on long strings

2007-11-29 Thread Amaury Forgeot d'Arc

Amaury Forgeot d'Arc added the comment:

Here is a patch, with a unit test (I was surprised that test_bigmem.py
already contained a test_decode function, which was left empty).

But I still don't have access to any 64bit machine.
Can someone try and see if the new tests in test_bigmem.py fail, and
that the patch in getargs.c corrects the problem?

Added file: http://bugs.python.org/file8832/getargs.patch




[issue1521] string.decode() fails on long strings

2007-11-29 Thread Amaury Forgeot d'Arc

Amaury Forgeot d'Arc added the comment:

I don't have a 64-bit machine to test with,
but it seems to me that there is a problem in the function
getargs.c::convertsimple(): the t# and w# formats use the buffer
interface, but the code uses an int to store the buffer's length!

Look for the variables declared as "int count;". I suggest replacing them
with Py_ssize_t in both places.

Shouldn't the compiler emit some warning in this case?
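
The effect of storing a buffer length in a C int can be approximated from
Python. The following is a minimal sketch, not the code in getargs.c: it
assumes a 32-bit C int (typical of 64-bit Linux builds, but an assumption
nonetheless) and uses ctypes.c_int32, which does no overflow checking, to
reinterpret the two lengths from the examples in this thread. The helper
name as_c_int is made up for illustration.

```python
import ctypes


def as_c_int(length):
    """Return `length` as a 32-bit signed C int would hold it.

    Hypothetical helper for illustration; assumes sizeof(int) == 4.
    ctypes integer types do no overflow checking, so the value is
    silently reduced to its low 32 bits, as in a C narrowing conversion.
    """
    return ctypes.c_int32(length).value


# Lengths used in the two reports in this thread.
for length in (int(5E9), int(25E8)):
    print("%d -> %d" % (length, as_c_int(length)))
```

Under that assumption, 5E9 comes out as 705032704, which matches the
truncated len(u) in the report, and 25E8 comes out negative, which would be
consistent with the case that raises the TypeError.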

--
nosy: +amaury.forgeotdarc




[issue1521] string.decode() fails on long strings

2007-11-29 Thread Andreas Eisele

Andreas Eisele added the comment:

An instance of the other problem:

Python 2.5.1 (r251:54863, Aug 30 2007, 16:15:51) 
[GCC 4.1.0 (SUSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
__[1] >>> s=" "*int(25E8)
2.99 sec
__[1] >>> u=s.decode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/cl-home/eisele/lns-root-07/lib/python2.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
TypeError: utf_8_decode() argument 1 must be (unspecified), not str
__[1] >>>




[issue1521] string.decode() fails on long strings

2007-11-29 Thread Andreas Eisele

Andreas Eisele added the comment:

For instance:

Python 2.5.1 (r251:54863, Aug 30 2007, 16:15:51) 
[GCC 4.1.0 (SUSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
__[1] >>> s=" "*int(5E9)
6.05 sec
__[1] >>> u=s.decode("utf-8")
4.71 sec
__[1] >>> len(u) 
705032704
__[2] >>> len(s)
50
__[3] >>> 

I would have expected both lengths to be 5E9




[issue1521] string.decode() fails on long strings

2007-11-29 Thread Walter Dörwald

Walter Dörwald added the comment:

Can you attach a (small) example that demonstrates the bug?

--
nosy: +doerwalter




[issue1521] string.decode() fails on long strings

2007-11-29 Thread Andreas Eisele

New submission from Andreas Eisele:

s.decode("utf-8")

sometimes silently truncates the result if s has more than 2E9 bytes, and
sometimes raises a fairly incomprehensible exception:

Traceback (most recent call last):
  File "", line 2, in 
  File "/usr/lib64/python2.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
TypeError: utf_8_decode() argument 1 must be (unspecified), not str

--
components: Unicode
messages: 57932
nosy: eisele
severity: normal
status: open
title: string.decode() fails on long strings
type: behavior
versions: Python 2.5
