Re: [IronPython] Portable use of pickle.dumps()

Michael Foord Fri, 29 May 2009 08:11:27 -0700

Robert Smallshire wrote:

Hi Michael,
I'm trying to get some commercial code for a simple object database we
have written for Python 2.6 to work with IronPython 2.6. In Python 2.6
the return type of pickle.dumps() is str, which is of course a byte
string. In IronPython 2.6 it is also str, which is of course a unicode
string. This 'compatibility' is fine until I put those strings into a
database, at which point my interoperability between CPython and
IronPython goes off the rails.
How is this actually a problem?
I mean, can you provide a specific example of where a string in IronPython doesn't behave as a byte string in CPython? I'm sure there are such examples, but those may be bugs that the IPy team can fix. In practice I've encountered these problems very rarely.
My opening paragraph may be ambiguously worded - by 'interoperability' I
didn't mean the ability to run the same code unchanged on CPython and
IronPython (I have to change the code anyway to use a different database
adapter) - I meant interoperability between pickles persisted into a
database from both IronPython and CPython.


So are you telling the database that it is binary data or text?

Is the question how do I go from a pickle string in IronPython to a byte array that I can pass to the database adaptor without going through an explicit encode (which will transform the data)?

(One technique would be to explicitly use pickle protocol 0, which is less efficient but only creates ASCII characters - this is actually the default. Another alternative would be to use JSON or YAML instead of pickle.)
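
As a rough illustration of the protocol 0 suggestion, the sketch below pickles a small dictionary (hypothetical sample data, not taken from the code under discussion) and checks that every byte of the result is below 128. It only demonstrates this for an ASCII-only sample on CPython; objects containing non-ASCII text may behave differently.

```python
import pickle

# Hypothetical sample data containing only ASCII text and numbers.
sample = {"name": "fish", "count": 7, "ratio": 7.2}

# Protocol 0 is the text-based pickle format.
pickled = pickle.dumps(sample, 0)

# On CPython 2.6+ and 3.x, bytearray() exposes the pickle as integers.
is_ascii = all(value < 128 for value in bytearray(pickled))

# The pickle still round-trips to an equal object.
restored = pickle.loads(pickled)
```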

Here is an example of getting a byte array from a binary pickle in IronPython:


>>> import pickle
>>> class A(object):
...  b = 'hello'
...  c = (None, 'fish', 7.2, 7j)
...  a = {1: 2}
...
>>> p = pickle.dumps(A(), protocol=2)
>>> p
u'\x80\x02c__main__\nA\nq\x00)\x81q\x01}q\x02b.'
>>> from System import Array, Byte
>>> a = Array[Byte](tuple(Byte(ord(c)) for c in p))
>>> a

Array[Byte]((<System.Byte object at 0x0000000000000033 [128]>,<System.Byte obje...


I hope this is at least slightly helpful. :-)

Michael

My basic issue is that the 'str' unavoidably implies certain semantics when
calling .NET APIs from IronPython. These APIs interpret str as text rather
than just bytes, which therefore gets transformed by various text encodings,
such as UTF-8 to UTF-16. Such encodings are undesirable for my pickled data
since the result is no longer necessarily a valid pickle.   I suppose the
intention in Python 3.0 is that 'bytes' doesn't carry any semantics with it,
it's just data, which is why pickle.dumps() in Python 3.0 returns bytes
rather than str.

I want to push plain old byte arrays into the database from both CPython and
IronPython, so I can avoid any head-scratching confusion with database
adapters and/or databases inappropriately encoding or decoding my data.
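
On the CPython side, one way to keep the pickle as raw bytes is to bind it as a BLOB. The following is a minimal sketch using the standard sqlite3 module with an in-memory database; the table and column names are made up for illustration, and the IronPython side would need the equivalent byte-array binding in its own adapter.

```python
import pickle
import sqlite3

# In-memory database; 'objects' and 'data' are hypothetical names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, data BLOB)")

# Binary pickle (protocol 2) of some sample data.
original = {"name": "fish", "size": 7.2}
payload = pickle.dumps(original, 2)

# sqlite3.Binary marks the value as binary so it is stored as a BLOB,
# not as text that an adapter might encode or decode.
conn.execute("INSERT INTO objects (data) VALUES (?)",
             (sqlite3.Binary(payload),))

# Read the BLOB back and unpickle it.
row = conn.execute("SELECT data FROM objects").fetchone()
restored = pickle.loads(bytes(row[0]))
conn.close()
```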

For example "data = [ord(c) for c in some_string]" has behaved as expected many times for me in IronPython (and could help you turn strings into bytes).
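
A slightly more defensive variant of that one-liner is sketched below. The helper name byte_values is hypothetical; the idea is that the same function should return integers whether pickle.dumps() handed back a str (Python 2.6, IronPython 2.6) or bytes (Python 3.0). The round trip at the end has only been considered for CPython.

```python
import pickle

def byte_values(pickled):
    """Return the pickle as a list of integers in the range 0..255.

    Iterating a Python 2 / IronPython str yields one-character strings,
    so ord() is needed; iterating Python 3 bytes already yields integers.
    """
    return [c if isinstance(c, int) else ord(c) for c in pickled]

values = byte_values(pickle.dumps({1: 2}, 2))
in_range = all(0 <= v <= 255 for v in values)

# Round trip: rebuild a byte string from the integers and unpickle it.
restored = pickle.loads(bytes(bytearray(values)))
```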


Thanks. I'll try something based on that.

Is this a theoretical problem at this stage or an actual problem?


It's an actual problem with SQLiteParameter.Value from the SQLite ADO.NET
provider.  I think our original CPython code is a bit sloppy with respect to
the distinction between text strings and byte arrays, so I'll probably need
to tighten things up on both sides.

Would you agree that using unicode() and bytes() everywhere and avoiding
str() gives code that has the same meaning in Python 2.6, IronPython 2.6 and
Python 3.0?  Do you think this would be a good guideline to follow until we
can leave Python 2.x behind?
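
One caveat with such a guideline is that the name unicode no longer exists in Python 3.0, and in CPython 2.6 bytes is simply an alias for str. A possible workaround is a pair of aliases chosen once at import time; the sketch below uses the hypothetical names text_type and binary_type and is only an illustration of the idea.

```python
try:
    text_type = unicode   # Python 2.x and IronPython 2.x
except NameError:
    text_type = str       # Python 3.x, where 'unicode' no longer exists

binary_type = bytes       # available from Python 2.6 onwards

# Text is created through text_type, raw data through bytes literals.
label = text_type("fish")
payload = b"\x80\x02"
```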

Many thanks,

Rob



--
http://www.ironpythoninaction.com/

_______________________________________________
Users mailing list
Users@lists.ironpython.com
http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
