Re: [Freeipa-devel] JSON problems (the woes of binary data)

2010-02-26 Thread Dmitri Pal
John Dennis wrote:
 The Problem:
 ---

 I've been looking at the encoding exception which is being thrown when
 you click on the Services menu item in our current implementation.
 By default we seem to be using JSON as our RPC mechanism. The
 exception is being thrown when the JSON encoder hits a certificate.
 Recall that we store certificates in LDAP as binary data, and in our
 implementation we distinguish binary data from text by Python object
 type: text is *always* a unicode object and binary data is *always* a
 str object. However, in Python 2.x str objects are assumed to be text
 and are subject to encoding/decoding in many parts of the Python world.

 Unlike XML-RPC, JSON does *not* have a binary type. In JSON there are
 *only* unicode strings. So what is happening is that when the JSON
 encoder sees our certificate data in a str object, it says str objects
 are text and we have to produce unicode from that str object by
 treating it as UTF-8. There's the problem! It's completely nonsensical
 to try to interpret binary data as UTF-8.
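The failure described above can be reproduced in a few lines. This is only a sketch: it uses Python 3, where binary data is `bytes` rather than the Python 2 `str` discussed in the message, and the byte string is a made-up placeholder, not a real certificate.

```python
import json

# Placeholder bytes standing in for DER certificate data (not a real certificate).
cert_der = b"\x30\x82\x01\x0a\xff\xfe"

# Treating the bytes as UTF-8 text fails: 0x82 is not a valid UTF-8 start byte.
try:
    cert_der.decode("utf-8")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

# Python 3's json module refuses bytes outright rather than guessing an encoding.
try:
    json.dumps({"cert": cert_der})
    dump_error = None
except TypeError as exc:
    dump_error = exc

print(type(decode_error).__name__)
print(type(dump_error).__name__)
```

Either way, the binary value never makes it onto the wire unless something converts it to text first.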

 The right way to handle this is to encode the binary data to base64
 ASCII text and then hand it to JSON. FWIW our XML-RPC handler does
 this already because XML-RPC knows about binary data and elects to
 encode/decode it to base64 as it's marshaled and unmarshaled. But JSON
 can't do this during marshaling and unmarshaling because the JSON
 protocol has no concept of binary data.

 The Python JSON encoder class does give us the option to hook into the
 encoder, check if the object is a str object, and then base64 encode
 it. But that doesn't help us at the opposite end. How would we
 know when unmarshaling that a given string is supposed to be base64
 decoded back into binary data? We could prepend a special string and
 hope that string never gets used by normal text (yuck). Keeping a list
 of what needs base64 decoding is not an option within JSON because at
 the time of decoding we have no information available about the
 context of the JSON objects.
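A minimal sketch of the encoder hook and of why it only solves half the problem. It assumes Python 3 (`bytes` for binary data); the class name `Base64BytesEncoder` and the dictionary keys are hypothetical, chosen for illustration only.

```python
import base64
import json


class Base64BytesEncoder(json.JSONEncoder):
    """Hypothetical encoder that base64-encodes any bytes value it meets."""

    def default(self, obj):
        # default() is called only for objects json cannot serialize itself.
        if isinstance(obj, bytes):
            return base64.b64encode(obj).decode("ascii")
        return super().default(obj)


cert_der = b"\x30\x82\x01\x0a"  # placeholder bytes, not a real certificate

wire = json.dumps({"cert": cert_der, "note": "plain text"}, cls=Base64BytesEncoder)
decoded = json.loads(wire)

# Both values come back as ordinary strings; nothing marks "cert" as binary.
print(type(decoded["cert"]).__name__, type(decoded["note"]).__name__)
```

The receiving side sees two strings and has no way, from the JSON alone, to tell which one should be base64 decoded.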

 That means if we want to use JSON we really should push the base64
 encode/decode to the parts of the code which have a priori knowledge
 about the objects they're pushing through the command interface. This
 would mean any command which passes a certificate should base64 encode
 it prior to sending it and base64 decode it after it comes back from a
 command result. Actually it would be preferable to use PEM encoding.
 By the way, the whole reason PEM encoding for certificates was
 developed was exactly for this scenario: transporting a certificate
 through a text based interchange mechanism!
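As an illustration of PEM as the interchange format, the sketch below round-trips a value through JSON using the standard library's `ssl.DER_cert_to_PEM_cert` and `ssl.PEM_cert_to_DER_cert`. It assumes Python 3, and the DER bytes are a placeholder rather than a parseable certificate; these two helpers only add or strip the PEM armor and do not validate the contents.

```python
import json
import ssl

# Placeholder bytes standing in for a DER-encoded certificate read from LDAP.
cert_der = b"\x30\x82\x01\x0a" * 20

# Sender: the code that knows this value is a certificate converts it to PEM text.
cert_pem = ssl.DER_cert_to_PEM_cert(cert_der)
wire = json.dumps({"cert": cert_pem})

# Receiver: the value is plain text; convert back only where DER is required.
received = json.loads(wire)["cert"]
restored = ssl.PEM_cert_to_DER_cert(received)

print(received.splitlines()[0])
print(restored == cert_der)
```

Because the PEM header and footer travel with the data, the text is self-describing in a way that bare base64 is not.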

 Possible Solutions:
 ---

 As I see it we have these options in front of us for how to deal with
 this problem:

 * Drop support for JSON, only use XML-RPC

 * Once we read a certificate from LDAP immediately convert it to PEM
 format. Adopt the convention that anytime we exchange certificates it
 will be in PEM format. Only convert from PEM format when the target
 demands binary (e.g. storing it in LDAP, passing it to a library
 expecting DER encoded data, etc.).

 * Come up with some hacky protocol on top of JSON which signals this
 string is really binary and check for it on every JSON encode/decode
 and cross our fingers no one tries to send a legitimate string which
 would trigger the encode/decode.
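To make the risk in the last option concrete, here is a sketch of such an in-band marker scheme and the collision it invites. Everything in it is hypothetical: the `b64:` marker, the helper names, and the record layout are invented for illustration, and it assumes Python 3.

```python
import base64
import json

MARKER = "b64:"  # hypothetical in-band marker meaning "this string is really binary"


def wrap(value):
    """Tag bytes values with the marker; leave everything else alone."""
    if isinstance(value, bytes):
        return MARKER + base64.b64encode(value).decode("ascii")
    return value


def unwrap(value):
    """Decode any string that carries the marker back to bytes."""
    if isinstance(value, str) and value.startswith(MARKER):
        return base64.b64decode(value[len(MARKER):])
    return value


def send(record):
    return json.dumps({key: wrap(val) for key, val in record.items()})


def receive(wire):
    return {key: unwrap(val) for key, val in json.loads(wire).items()}


record = {
    "cert": b"\x30\x82\x01\x0a",     # placeholder binary data
    "description": "b64:aGVsbG8=",   # legitimate text that happens to match the marker
}
result = receive(send(record))

print(result["cert"] == record["cert"])                # binary survives the round trip
print(result["description"] == record["description"])  # the text value does not
```

The binary value round-trips, but the ordinary text field is silently turned into bytes because it happens to start with the marker.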

 Question: Are certificates the one and only example of binary data we
 exchange?

 Recommendation:
 ---

 My personal recommendation is we adopt the convention that
 certificates are always PEM encoded. We've already run into many
 problems trying to deduce what format a certificate is in (e.g. binary,
 base64, PEM). I think it would be good if we just put a stake in the
 ground and said certificates are always PEM encoded and be done with
 all these problems we keep having with the data type of certificates.

 As an aside I'm also skeptical of the robustness of allowing binary
 data at all in our implementation. Trying to support binary data has
 been nothing but a headache and a source of many many bugs. Do we
 really need it?

Yeah, a good Friday afternoon problem to solve...
+1 to your recommendations. I am not a specialist, but the suggestion
seems logical.

-- 
Thank you,
Dmitri Pal

Engineering Manager IPA project,
Red Hat Inc.



___
Freeipa-devel mailing list
Freeipa-devel@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-devel


Re: [Freeipa-devel] JSON problems (the woes of binary data)

2010-02-26 Thread Simo Sorce
On Fri, 26 Feb 2010 15:59:53 -0500
John Dennis jden...@redhat.com wrote:

 My personal recommendation is we adopt the convention that
 certificates are always PEM encoded.

+1

Simo.

-- 
Simo Sorce * Red Hat, Inc * New York
