[issue19219] speed up marshal.loads()

2013-10-15 Thread Jesús Cea Avión

Changes by Jesús Cea Avión j...@jcea.es:


--
nosy: +jcea

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-13 Thread Kristján Valur Jónsson

Kristján Valur Jónsson added the comment:

In this case, we can remove a bunch of 'retval = NULL' from the code.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-12 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 4059e871e74e by Antoine Pitrou in branch 'default':
Issue #19219: Speed up marshal.loads(), and make pyc files slightly (5% to 10%) 
smaller.
http://hg.python.org/cpython/rev/4059e871e74e

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-12 Thread Antoine Pitrou

Antoine Pitrou added the comment:

I've now committed the latest patch (marshal_opts5.patch).

--
resolution:  - fixed
stage: patch review - committed/rejected
status: open - closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-12 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 2a2b339b6b59 by Christian Heimes in branch 'default':
Issue #19219: retval may be used uninitialized value
http://hg.python.org/cpython/rev/2a2b339b6b59

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-11 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 Why adding ASCII strings, whereas you can add Latin1 (UCS1, U+-U+00FF)
 strings?

Reasons:
- most strings in pyc files are pure ASCII (it's like 99% in the stdlib)
- unmarshalling ASCII strings is faster: you can pass 127 to PyUnicode_New 
without scanning for non-ASCII chars

The aim here is to optimize the common cases. There is no reason to further 
complicate the code for rare cases.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-11 Thread STINNER Victor

STINNER Victor added the comment:

 unmarshalling ASCII strings is faster: you can pass 127 to PyUnicode_New 
 without scanning for non-ASCII chars

Oh, I forgot this pain of the PEP 393. Don't tell me more, it's enough :-)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-11 Thread Kristján Valur Jónsson

Kristján Valur Jónsson added the comment:

I imagine that the test for ASCII is cheaper.  It corresponds to the new 
compact internal unicode representation (one byte characters).

This looks fine.  Can you quantify where the speedup comes from? Reading the 
code, I see we now maintain a small internal buffer in the file object, rather 
than using stack allocation at the call sites.  It is unclear to me how this 
saves memory, since the amount of memory copying should be the same.  Could it 
be that the speedup is all due to the native 8 bit support for unicode?

Have you looked at providing a special opcode for a few other magic numbers?(We 
have that in our own custom marshal format)
Some very common values are:
empty tuple
0
1

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-11 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 This looks fine.  Can you quantify where the speedup comes from?

From all changes, but mainly the ASCII special-casing and the new
buffering.

 Reading the code, I see we now maintain a small internal buffer in
 the file object, rather than using stack allocation at the call
 sites.  It is unclear to me how this saves memory, since the amount
 of memory copying should be the same.

No, memory copying is suppressed in many cases.

 Have you looked at providing a special opcode for a few other magic
 numbers?(We have that in our own custom marshal format)
 Some very common values are:
 empty tuple
 0
 1

It shouldn't be useful since marshal memoizes them anyway: only the
first appearance in a pyc file would benefit.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-11 Thread Antoine Pitrou

Antoine Pitrou added the comment:

  Reading the code, I see we now maintain a small internal buffer in
  the file object, rather than using stack allocation at the call
  sites.  It is unclear to me how this saves memory, since the amount
  of memory copying should be the same.
 
 No, memory copying is suppressed in many cases.

To clarify: the import logic uses marshal.loads(), not marshal.load().
So the aim is really to speed up unmarshalling from memory.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-11 Thread Kristján Valur Jónsson

Kristján Valur Jónsson added the comment:

Right, in this case, memory copying is avoided.

Regarding the memoing of 0, empty tuple, etc:
Special opcode may still benefit because it takes only one byte.  So, you save 
four bytes for each such case.  I think it might be worth investigating.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-11 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

 - unmarshalling ASCII strings is faster: you can pass 127 to PyUnicode_New 
 without scanning for non-ASCII chars

You should ensure that loaded bytes are ASCII-only. Otherwise broken or 
malicious marshalled data will compromise you program. Decoding UTF-8 is so 
fast as decoding ASCII (with checks) and is almost so fast as memcpy.

As for output, we could use cached UTF-8 representation of string (always 
exists for ASCII only strings) before calling PyUnicode_AsUTF8String().

I'm good with buffering and codes for short strings and tuples (I have not 
examined a code closely yet), but special casing ASCII looks not so good to me.

--
nosy: +serhiy.storchaka

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-11 Thread STINNER Victor

STINNER Victor added the comment:

You should ensure that loaded bytes are ASCII-only. Otherwise broken or 
malicious marshalled data will compromise you program.

This is not new, see the red warning in marshal doc:


Warning

The marshal module is not intended to be secure against erroneous or 
maliciously constructed data. Never unmarshal data received from an untrusted 
or unauthenticated source.


--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-11 Thread Kristján Valur Jónsson

Kristján Valur Jónsson added the comment:

We have to make two distinctions here:
1) Loading data and then running it. This is a bad idea if your data is not 
trusted.  This is what is meant by marshal being unsafe.
2) Loading data and then not running it.  This is perfectly fine, because 
marshal has _no side effects_ when loading.  Only actually _running_ untrusted 
data is what you should be careful about.  In fact, using 'marshal' as a cheap 
and fast pickler for builtin types is actually a good idea because it has no 
side effects like invoking code.  (and I think the comment you refer to should 
be revised to make this clear)

So, will simply load ASCII data that is, in fact, not ASCII data, destabilize 
your program in any way?  Or even crash it?  If that is true, then we have a 
problem.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-11 Thread STINNER Victor

STINNER Victor added the comment:

As for output, we could use cached UTF-8 representation of string (always 
exists for ASCII only strings) before calling PyUnicode_AsUTF8String().

PyUnicode_AsEncodedString(v, utf8, surrogatepass) is expensive. I proposed 
an optimization for the pickle module, Antoine finished the work: see issue 
#15596. It's exactly what you suggest: reuse PyUnicode_AsUTF8String().

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-11 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 In fact, using 'marshal' as a cheap and fast pickler for builtin types
 is actually a good idea because it has no side effects like invoking
 code.

It's an unsupported use case. The marshal docs are quite clear:

Therefore, the Python maintainers reserve the right to modify
the marshal format in backward incompatible ways should the need
arise. If you’re serializing and de-serializing Python objects,
use the pickle module instead [...]

So, it's a good idea as long as you're willing to deal with the
consequences :-)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-11 Thread STINNER Victor

STINNER Victor added the comment:

marshal and pickle are unsafe, even without the patch attached to the issue. If 
you consider that it is an issue that should be fixed, please open a new issue. 
Antoine's patch doesn't make the module less secure, since it was already not 
secure :)

Loading untrusted data and executing untrusted code is not supported by Python. 
Many things should be fixed to support such use case, not only the marshal 
module. I'm interested by the topic (I wrote the pysandbox project, which is 
first try), but please discuss it elsewhere.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-11 Thread STINNER Victor

STINNER Victor added the comment:

 please discuss it elsewhere.

Hum, I'm not sure that this word exist, I mean: somethere else :-)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-11 Thread Kristján Valur Jónsson

Kristján Valur Jónsson added the comment:

Therefore, the Python maintainers reserve the right to modify
the marshal format in backward incompatible ways

sure, don't expect such things to survive version changes.  (Actually, they 
have been hitherto, and my version 3 I actually changed, to be so, the 
initial draft being unable to read version 2 data)
But for sending stuff over the wire, caching on disk, etc, its perfectly safe 
and super fast.  And what is more, the python developers (that's you and me) 
are super careful to never _crash_ the interpreter, no matter how broken any 
input data is.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-11 Thread Kristján Valur Jónsson

Kristján Valur Jónsson added the comment:

Loading untrusted data ... is not supported by Python.
This is a pretty bold claim.  Is this, indeed, a fact?  (and yes, elsewhere 
is a word)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-11 Thread Kristján Valur Jónsson

Kristján Valur Jónsson added the comment:

Anyway, whether or not Pyhon guarantees this and that wrt. untrusted is 
beside the point and offtopic, as Victor poitns out.

However:  We are, and have always been, careful to fail gracefully if we detect 
data corruption.  Never should the flipping of a bit on a file on the disk 
cause our program to crash.  It is fine when reading a corrupt file that an 
int(1) turns to int(2), or that we get an UnmarshalError when reading it, or 
that a hello world string turns to hello $orld.  What is not good is if the 
reading of the corrupt string causes the interpreter to crash.

My knowledge of the new unicode internals is limited at best.  If you don't 
think, Antoine, that putting non-7-bit data into the supposedly 7 bit ascii 
unicode data can cause an actual crash, but at worst a corrupt string, then I'm 
quite happy, personally :)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-11 Thread Ronald Oussoren

Changes by Ronald Oussoren ronaldousso...@mac.com:


--
nosy: +ronaldoussoren

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-11 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

 The marshal module is not intended to be secure against erroneous or 
 maliciously constructed data. Never unmarshal data received from an untrusted 
 or unauthenticated source.

Then we can simplify the marshal module by dropping all error handling: 
f.read() returned not bytes, read() returned too much data, EOF read where not 
expected, recursion limit exceeded, long/string/unicode/tuple/list/set size out 
of range, unnormalized long data, digit out of range in long, index list too 
large, invalid reference, unknown type code, NULL object in marshal data for 
set, UTF8 decoding errors, string to float converting errors, etc, etc. Sorry 
for sarcasm.

 It's exactly what you suggest: reuse PyUnicode_AsUTF8String().

Actually _PyUnicode_UTF8(). PyUnicode_AsUTF8String() creates UTF8 cache if it 
is not exists and this can be not desired. We could use this optimization in 
many other places, in particular in PyUnicode_AsUTF8String() itself.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-11 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 Sorry for sarcasm.

Well, indeed, the sarcasm is undeserved here, if the interpreter cannot
crash because of the change.

  It's exactly what you suggest: reuse PyUnicode_AsUTF8String().
 
 Actually _PyUnicode_UTF8(). PyUnicode_AsUTF8String() creates UTF8
 cache if it is not exists and this can be not desired. We could use
 this optimization in many other places, in particular in
 PyUnicode_AsUTF8String() itself.

I don't understand how _PyUnicode_UTF8() can be used for *unmarshalling*.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-11 Thread Antoine Pitrou

Antoine Pitrou added the comment:

That said, I'll try out the patch with _PyUnicode_FromUCS1 instead of 
_PyUnicode_FromASCII, to see if it affects performance.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-11 Thread Kristján Valur Jónsson

Kristján Valur Jónsson added the comment:

Just for the record, I want to say that this is great stuff, Antoine!  It's 
great when this sort of stuff gets some attention.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-11 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

 I don't understand how _PyUnicode_UTF8() can be used for *unmarshalling*.

I say about marshalling.

 That said, I'll try out the patch with _PyUnicode_FromUCS1 instead of 
 _PyUnicode_FromASCII, to see if it affects performance.

Could you try out the patch with PyUnicode_DecodeUTF8()? This will save you two 
opcodes and perhaps several lines of code.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-11 Thread Kristján Valur Jónsson

Kristján Valur Jónsson added the comment:

This will save you two opcodes and perhaps several lines of code. 
Just bear in mind that without other changes, version 4 needs to be backwards 
compatible with version 3.
I ran into this when developing version 3.  The reason is that while the 
marshal format includes the version information in its header, it isn't 
actually verified on loading. IIRC.  You specify the expected format to the 
function, or something like that.  So, if you don't do this, you get errors 
when loading previously generated .pyc files.
Of course, we are free to fix that problem as well :)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-11 Thread Antoine Pitrou

Antoine Pitrou added the comment:

  That said, I'll try out the patch with _PyUnicode_FromUCS1 instead
  of _PyUnicode_FromASCII, to see if it affects performance.
 
 Could you try out the patch with PyUnicode_DecodeUTF8()? This will
 save you two opcodes and perhaps several lines of code.

That would be a change in behaviour, since currently surrogatepass
is the error handler.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-11 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 I ran into this when developing version 3.  The reason is that while
 the marshal format includes the version information in its header,
 it isn't actually verified on loading. IIRC.  You specify the
 expected format to the function, or something like that.  So, if you
 don't do this, you get errors when loading previously generated .pyc
 files.

Exactly (I also tried this :-)). The problem is the version number
is *outside* of the marshal format: e.g. it's in the pyc file header.
When freezing a module (see frozen.c), the version number isn't included.
So our freedom is quite limited here: we have to support legacy frozen
modules.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-11 Thread Kristján Valur Jónsson

Kristján Valur Jónsson added the comment:

How about adding a version opcode?  This is a backwards compatible change, and 
allows us to reject unsupported versions in the future, as long as they are not 
very old unsupported versions.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-11 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

 This will save you two opcodes and perhaps several lines of code. 
 Just bear in mind that without other changes, version 4 needs to be backwards 
 compatible with version 3.

I meant two of new proposed opcodes: TYPE_ASCII and TYPE_ASCII_INTERNED.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-11 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

 That would be a change in behaviour, since currently surrogatepass
is the error handler.

I don't propose any change in behaviour.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-11 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 I meant two of new proposed opcodes: TYPE_ASCII and
 TYPE_ASCII_INTERNED.

You cannot change the meaning of TYPE_UNICODE (it uses surrogatepass).
Therefore, you have to use new opcodes with other semantics.

Besides, opcodes are cheap.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-11 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

 You cannot change the meaning of TYPE_UNICODE (it uses surrogatepass).
 Therefore, you have to use new opcodes with other semantics.

You don't need new semantic. Use old semantic. The set of ASCII encoded strings 
is a subset of valid UTF8 encoded strings, which is a subset of UTF8 encoded 
with surrogatepass strings.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-11 Thread Antoine Pitrou

Antoine Pitrou added the comment:

  You cannot change the meaning of TYPE_UNICODE (it uses
  surrogatepass).
  Therefore, you have to use new opcodes with other semantics.
 
 You don't need new semantic. Use old semantic. The set of ASCII
 encoded strings is a subset of valid UTF8 encoded strings, which is
 a subset of UTF8 encoded with surrogatepass strings.

Sorry, I don't understand you.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-11 Thread Brett Cannon

Changes by Brett Cannon br...@python.org:


--
nosy: +brett.cannon

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-11 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Updated patch using PyUnicode_FromKindAndData for ASCII strings, to ensure that 
corrupt marshal bytecode doesn't provide corrupt unicode objects. Performance 
is within 2% of the previous patch.

(however, a quick test suggests that PyUnicode_DecodeUTF8 is quite slower)

--
Added file: http://bugs.python.org/file32046/marshal_opts5.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-11 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Let a code say instead me.

--
Added file: http://bugs.python.org/file32049/marshal_opts5a.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-11 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 Let a code say instead me.

I don't understand your patch. The macros you define aren't used
anywhere.
Also, your patch keeps the slow call to PyUnicode_DecodeUTF8.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-11 Thread STINNER Victor

STINNER Victor added the comment:

 (however, a quick test suggests that PyUnicode_DecodeUTF8 is quite slower)

It's surprising that PyUnicode_DecodeUTF8() is quite slower than 
_PyUnicode_FromUCS1(). _PyUnicode_FromUCS1() calls ucs1lib_find_max_char() and 
then memcpy(). PyUnicode_DecodeUTF8() first tries ascii_decode() which is very 
similar than ucs1lib_find_max_char().

The difference is maybe that _PyUnicode_FromUCS1() copies all bytes at once 
(memcpy()), whereas ascii_decode() copies bytes while if the string is ASCII or 
not.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-10 Thread Antoine Pitrou

New submission from Antoine Pitrou:

This patch contains assorted improvements for unmarshalling pyc files. It will 
also make them ~10% smaller.

$ ./python -m timeit -s import marshal; d=marshal.dumps(tuple((i, str(i)) for 
i in range(1000))) marshal.loads(d)

- 3.4 unpatched: 232 usec per loop
- 3.4 patched: 96.3 usec per loop
- 2.7 (for reference): 76.5 usec per loop

--
components: Interpreter Core
files: marshal_opts4.patch
keywords: patch
messages: 199408
nosy: barry, christian.heimes, pitrou
priority: normal
severity: normal
stage: patch review
status: open
title: speed up marshal.loads()
type: performance
versions: Python 3.4
Added file: http://bugs.python.org/file32037/marshal_opts4.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-10 Thread Antoine Pitrou

Changes by Antoine Pitrou pit...@free.fr:


--
nosy: +kristjan.jonsson

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue19219] speed up marshal.loads()

2013-10-10 Thread STINNER Victor

STINNER Victor added the comment:

Why adding ASCII strings, whereas you can add Latin1 (UCS1, U+-U+00FF) 
strings?

--
nosy: +haypo

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue19219
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com