Changes by Jesús Cea Avión j...@jcea.es:
--
nosy: +jcea
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Kristján Valur Jónsson added the comment:
In this case, we can remove a bunch of 'retval = NULL' from the code.
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
Roundup Robot added the comment:
New changeset 4059e871e74e by Antoine Pitrou in branch 'default':
Issue #19219: Speed up marshal.loads(), and make pyc files slightly (5% to 10%)
smaller.
http://hg.python.org/cpython/rev/4059e871e74e
--
nosy: +python-dev
Antoine Pitrou added the comment:
I've now committed the latest patch (marshal_opts5.patch).
--
resolution: - fixed
stage: patch review - committed/rejected
status: open - closed
___
Python tracker rep...@bugs.python.org
Roundup Robot added the comment:
New changeset 2a2b339b6b59 by Christian Heimes in branch 'default':
Issue #19219: retval may be used uninitialized value
http://hg.python.org/cpython/rev/2a2b339b6b59
--
___
Python tracker rep...@bugs.python.org
Antoine Pitrou added the comment:
Why adding ASCII strings, whereas you can add Latin1 (UCS1, U+-U+00FF)
strings?
Reasons:
- most strings in pyc files are pure ASCII (it's like 99% in the stdlib)
- unmarshalling ASCII strings is faster: you can pass 127 to PyUnicode_New
without scanning
STINNER Victor added the comment:
unmarshalling ASCII strings is faster: you can pass 127 to PyUnicode_New
without scanning for non-ASCII chars
Oh, I forgot this pain of the PEP 393. Don't tell me more, it's enough :-)
--
___
Python tracker
Kristján Valur Jónsson added the comment:
I imagine that the test for ASCII is cheaper. It corresponds to the new
compact internal unicode representation (one byte characters).
This looks fine. Can you quantify where the speedup comes from? Reading the
code, I see we now maintain a small
Antoine Pitrou added the comment:
This looks fine. Can you quantify where the speedup comes from?
From all changes, but mainly the ASCII special-casing and the new
buffering.
Reading the code, I see we now maintain a small internal buffer in
the file object, rather than using stack
Antoine Pitrou added the comment:
Reading the code, I see we now maintain a small internal buffer in
the file object, rather than using stack allocation at the call
sites. It is unclear to me how this saves memory, since the amount
of memory copying should be the same.
No, memory
Kristján Valur Jónsson added the comment:
Right, in this case, memory copying is avoided.
Regarding the memoing of 0, empty tuple, etc:
Special opcode may still benefit because it takes only one byte. So, you save
four bytes for each such case. I think it might be worth investigating.
Serhiy Storchaka added the comment:
- unmarshalling ASCII strings is faster: you can pass 127 to PyUnicode_New
without scanning for non-ASCII chars
You should ensure that loaded bytes are ASCII-only. Otherwise broken or
malicious marshalled data will compromise you program. Decoding UTF-8
STINNER Victor added the comment:
You should ensure that loaded bytes are ASCII-only. Otherwise broken or
malicious marshalled data will compromise you program.
This is not new, see the red warning in marshal doc:
Warning
The marshal module is not intended to be secure against erroneous or
Kristján Valur Jónsson added the comment:
We have to make two distinctions here:
1) Loading data and then running it. This is a bad idea if your data is not
trusted. This is what is meant by marshal being unsafe.
2) Loading data and then not running it. This is perfectly fine, because
STINNER Victor added the comment:
As for output, we could use cached UTF-8 representation of string (always
exists for ASCII only strings) before calling PyUnicode_AsUTF8String().
PyUnicode_AsEncodedString(v, utf8, surrogatepass) is expensive. I proposed
an optimization for the pickle module,
Antoine Pitrou added the comment:
In fact, using 'marshal' as a cheap and fast pickler for builtin types
is actually a good idea because it has no side effects like invoking
code.
It's an unsupported use case. The marshal docs are quite clear:
Therefore, the Python maintainers reserve the
STINNER Victor added the comment:
marshal and pickle are unsafe, even without the patch attached to the issue. If
you consider that it is an issue that should be fixed, please open a new issue.
Antoine's patch doesn't make the module less secure, since it was already not
secure :)
Loading
STINNER Victor added the comment:
please discuss it elsewhere.
Hum, I'm not sure that this word exist, I mean: somethere else :-)
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
Kristján Valur Jónsson added the comment:
Therefore, the Python maintainers reserve the right to modify
the marshal format in backward incompatible ways
sure, don't expect such things to survive version changes. (Actually, they
have been hitherto, and my version 3 I actually changed, to be
Kristján Valur Jónsson added the comment:
Loading untrusted data ... is not supported by Python.
This is a pretty bold claim. Is this, indeed, a fact? (and yes, elsewhere
is a word)
--
___
Python tracker rep...@bugs.python.org
Kristján Valur Jónsson added the comment:
Anyway, whether or not Pyhon guarantees this and that wrt. untrusted is
beside the point and offtopic, as Victor poitns out.
However: We are, and have always been, careful to fail gracefully if we detect
data corruption. Never should the flipping of
Changes by Ronald Oussoren ronaldousso...@mac.com:
--
nosy: +ronaldoussoren
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Serhiy Storchaka added the comment:
The marshal module is not intended to be secure against erroneous or
maliciously constructed data. Never unmarshal data received from an untrusted
or unauthenticated source.
Then we can simplify the marshal module by dropping all error handling:
Antoine Pitrou added the comment:
Sorry for sarcasm.
Well, indeed, the sarcasm is undeserved here, if the interpreter cannot
crash because of the change.
It's exactly what you suggest: reuse PyUnicode_AsUTF8String().
Actually _PyUnicode_UTF8(). PyUnicode_AsUTF8String() creates UTF8
Antoine Pitrou added the comment:
That said, I'll try out the patch with _PyUnicode_FromUCS1 instead of
_PyUnicode_FromASCII, to see if it affects performance.
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
Kristján Valur Jónsson added the comment:
Just for the record, I want to say that this is great stuff, Antoine! It's
great when this sort of stuff gets some attention.
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
Serhiy Storchaka added the comment:
I don't understand how _PyUnicode_UTF8() can be used for *unmarshalling*.
I say about marshalling.
That said, I'll try out the patch with _PyUnicode_FromUCS1 instead of
_PyUnicode_FromASCII, to see if it affects performance.
Could you try out the patch
Kristján Valur Jónsson added the comment:
This will save you two opcodes and perhaps several lines of code.
Just bear in mind that without other changes, version 4 needs to be backwards
compatible with version 3.
I ran into this when developing version 3. The reason is that while the
marshal
Antoine Pitrou added the comment:
That said, I'll try out the patch with _PyUnicode_FromUCS1 instead
of _PyUnicode_FromASCII, to see if it affects performance.
Could you try out the patch with PyUnicode_DecodeUTF8()? This will
save you two opcodes and perhaps several lines of code.
That
Antoine Pitrou added the comment:
I ran into this when developing version 3. The reason is that while
the marshal format includes the version information in its header,
it isn't actually verified on loading. IIRC. You specify the
expected format to the function, or something like that.
Kristján Valur Jónsson added the comment:
How about adding a version opcode? This is a backwards compatible change, and
allows us to reject unsupported versions in the future, as long as they are not
very old unsupported versions.
--
___
Python
Serhiy Storchaka added the comment:
This will save you two opcodes and perhaps several lines of code.
Just bear in mind that without other changes, version 4 needs to be backwards
compatible with version 3.
I meant two of new proposed opcodes: TYPE_ASCII and TYPE_ASCII_INTERNED.
Serhiy Storchaka added the comment:
That would be a change in behaviour, since currently surrogatepass
is the error handler.
I don't propose any change in behaviour.
--
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
Antoine Pitrou added the comment:
I meant two of new proposed opcodes: TYPE_ASCII and
TYPE_ASCII_INTERNED.
You cannot change the meaning of TYPE_UNICODE (it uses surrogatepass).
Therefore, you have to use new opcodes with other semantics.
Besides, opcodes are cheap.
--
Serhiy Storchaka added the comment:
You cannot change the meaning of TYPE_UNICODE (it uses surrogatepass).
Therefore, you have to use new opcodes with other semantics.
You don't need new semantic. Use old semantic. The set of ASCII encoded strings
is a subset of valid UTF8 encoded strings,
Antoine Pitrou added the comment:
You cannot change the meaning of TYPE_UNICODE (it uses
surrogatepass).
Therefore, you have to use new opcodes with other semantics.
You don't need new semantic. Use old semantic. The set of ASCII
encoded strings is a subset of valid UTF8 encoded
Changes by Brett Cannon br...@python.org:
--
nosy: +brett.cannon
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list
Antoine Pitrou added the comment:
Updated patch using PyUnicode_FromKindAndData for ASCII strings, to ensure that
corrupt marshal bytecode doesn't provide corrupt unicode objects. Performance
is within 2% of the previous patch.
(however, a quick test suggests that PyUnicode_DecodeUTF8 is
Serhiy Storchaka added the comment:
Let a code say instead me.
--
Added file: http://bugs.python.org/file32049/marshal_opts5a.patch
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
Antoine Pitrou added the comment:
Let a code say instead me.
I don't understand your patch. The macros you define aren't used
anywhere.
Also, your patch keeps the slow call to PyUnicode_DecodeUTF8.
--
___
Python tracker rep...@bugs.python.org
STINNER Victor added the comment:
(however, a quick test suggests that PyUnicode_DecodeUTF8 is quite slower)
It's surprising that PyUnicode_DecodeUTF8() is quite slower than
_PyUnicode_FromUCS1(). _PyUnicode_FromUCS1() calls ucs1lib_find_max_char() and
then memcpy(). PyUnicode_DecodeUTF8()
New submission from Antoine Pitrou:
This patch contains assorted improvements for unmarshalling pyc files. It will
also make them ~10% smaller.
$ ./python -m timeit -s import marshal; d=marshal.dumps(tuple((i, str(i)) for
i in range(1000))) marshal.loads(d)
- 3.4 unpatched: 232 usec per loop
Changes by Antoine Pitrou pit...@free.fr:
--
nosy: +kristjan.jonsson
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list
STINNER Victor added the comment:
Why adding ASCII strings, whereas you can add Latin1 (UCS1, U+-U+00FF)
strings?
--
nosy: +haypo
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
44 matches
Mail list logo