On 03 Apr 2009, at 17:21, Chad Walters wrote:
In addition, he wants to have it all to work dynamically so, for example,
a Python script used in Hadoop Streaming can read the header and pull
fields out of records in the stream without needing to have the generated
bindings.
FWIW, in OSIS I do something similar: Thrift serialization without
generating bindings, and in fact without writing any Thrift IDL at all.
I use the Thrift protocol (and Python library) to serialize objects of
'my' type, inferring all required object model information from a
different model description format (Python classes inheriting from a
certain base class, with fields of specific types defined; think
Django models).
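
To make the idea concrete, here is a minimal sketch of how field
information could be inferred from such a class. This is not the OSIS
code: the names Field, String, Integer, Model and field_spec are
hypothetical, and no Thrift calls are made. It only shows how
(id, name, type) triples, the kind of information a generated binding
would normally carry, can be derived from the class itself.

```python
import itertools


class Field:
    """Hypothetical base field; a counter records declaration order."""
    _counter = itertools.count(1)
    type_name = None

    def __init__(self):
        self.order = next(Field._counter)


class String(Field):
    type_name = "string"


class Integer(Field):
    type_name = "i32"


class Model:
    """Hypothetical base class for model definitions."""

    @classmethod
    def field_spec(cls):
        # Only fields declared directly on the class are considered;
        # inherited fields are ignored to keep the sketch short.
        fields = [(name, field) for name, field in vars(cls).items()
                  if isinstance(field, Field)]
        fields.sort(key=lambda item: item[1].order)
        # Assign sequential field ids in declaration order.
        return [(index, name, field.type_name)
                for index, (name, field) in enumerate(fields, start=1)]


class Person(Model):
    name = String()
    age = Integer()


spec = Person.field_spec()
```

A serializer could then walk such a spec and write each attribute with
the matching protocol call, which is roughly what generated bindings do
from their own static description.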
Embedding model definition metadata in an object (list) should be
trivial using a similar method.
The Thrift-related code is at [1].
OSIS also contains some (rather trivial) code for this whole 'describe
the data model once, then send several data rows' system. The base
input is a tuple of 2 items: the first is a list of all 'columns'
(every item in that list is itself a list defining a column name and
data type), and the second is a list of lists in which every sublist
is one 'document'; see [2].
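
As an illustration of that shape (again a sketch, not the code at [2];
iter_documents and the sample data are made up), a consumer could pair
the column names with each row like this:

```python
def iter_documents(view):
    """Yield one dict per row from a (columns, rows) tuple."""
    columns, rows = view
    # Each column is a [name, data_type] pair; only the name is used here.
    names = [name for name, _data_type in columns]
    for row in rows:
        if len(row) != len(names):
            raise ValueError("row length does not match column count")
        yield dict(zip(names, row))


# Hypothetical sample input: the model description once, then the rows.
view = (
    [["name", "string"], ["age", "i32"]],
    [["Alice", 30], ["Bob", 25]],
)

documents = list(iter_documents(view))
```

The data types in the column list are ignored above; a real reader
would presumably use them to pick the right decoding for each value.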
FWIW, maybe some ideas might be reusable (yes, that whole codebase
needs more documentation).
Nicolas
[1] http://osisproject.org/browser/osis/model/serializers/_thrift.py
[2] http://osisproject.org/browser/osis/client/view.py