I have had to do this for image data and, per Antonio's suggestion, I am encoding and decoding my byte arrays as base64. I'm using the Clojure DSL and have found it to be fairly performant (we still have more optimizing to do on the image-processing side).
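On the Python side of a ShellBolt, the same base64 round trip might look roughly like the sketch below. It uses only the standard-library `base64` and `json` modules; `encode_bytes`, `decode_bytes` and the sample bytes are illustrative names and data, not code from our Clojure topology.

```python
import base64
import json


def encode_bytes(data: bytes) -> str:
    # Base64 maps arbitrary bytes onto ASCII characters, which JSON can carry as a string.
    return base64.b64encode(data).decode("ascii")


def decode_bytes(text: str) -> bytes:
    # Reverse the encoding on the receiving side.
    return base64.b64decode(text)


image_bytes = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF])  # arbitrary sample bytes
payload = json.dumps({"tuple": [encode_bytes(image_bytes)]})  # what would travel as JSON
restored = decode_bytes(json.loads(payload)["tuple"][0])
assert restored == image_bytes
```

Base64 produces 4 characters for every 3 input bytes (padded up to a multiple of 4), so the payload grows by about a third on top of the JSON overhead Antonio mentions.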
Ruhollah Farchtchi

On Jan 8, 2014, at 1:55 PM, Antonio Verardi wrote:

> Hi,
>
> I am extensively using the multilang interface for Python. JSON is the way
> you serialize things for communication. It adds a fair amount of overhead,
> but it is a reasonable design choice for a multilang interface.
>
> If your question is "can I read byte-array messages (made up of command, id,
> stream, task and tuple) from a bolt?", the answer is "that's not that easy;
> you would have to implement something to do that".
>
> If your question is "can I serialize byte arrays in JSON with Python and use
> them as values for the field "tuple"?", the answer is "yes, even though JSON
> always produces string objects"
> [http://docs.python.org/3.3/library/json.html#basic-usage]. You may want to
> modify storm.py to do that, or simply encode and decode your data within
> your own bolt; it depends on your needs.
>
> This is something I found just by googling for encoding binary data in JSON:
> http://bytes.com/topic/python/answers/681314-simplejson-pack-binary-data
>
> I hope this is what you were looking for,
> Antonio Uccio Verardi
>
> On Tue, Jan 7, 2014 at 11:24 PM, churly lin wrote:
>
> Hi all,
>
> I am trying to write a topology with a KafkaSpout and a ShellBolt (implemented
> in Python). According to the multilang protocol, multilang uses JSON messages
> over stdin/stdout to communicate with the subprocess. Specifically, both ends
> of this protocol use a line-reading mechanism. Does that mean that, in
> multilang, we cannot emit a message as a byte array? If so, how can I read a
> byte-array tuple in a Python bolt?
> The JSON read by the Python bolt looks like this:
>
> {
>     "command": "emit",
>     // The id for the tuple. Leave this out for an unreliable emit. The id can
>     // be a string or a number.
>     "id": "1231231",
>     // The id of the stream this tuple was emitted to. Leave this empty to
>     // emit to default stream.
>     "stream": "1",
>     // If doing an emit direct, indicate the task to send the tuple to
>     "task": 9,
>     // All the values in this tuple
>     "tuple": ["field1", 2, 3]
> }
>
> This example shows that the values in "tuple" can be a string ("field1") and
> numbers (2, 3). Could one of them be a byte array?
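To make the answer to the original question concrete: a byte array cannot appear in "tuple" as-is, but its base64 text can. Below is a rough sketch of the text a Python subprocess would write to stdout for an emit built from some of the fields in the example above, with one value base64-encoded. It assumes the multilang convention that each JSON message is followed by a line containing only `end`. `frame_emit` and `parse_message` are hypothetical helpers for illustration, not part of storm.py, which already handles this framing when you emit through it; the sketch only shows what travels over the pipe.

```python
import base64
import json


def frame_emit(values, tuple_id=None, stream=None):
    """Hypothetical helper: serialize an emit command the way a multilang
    subprocess writes it to stdout (one JSON line, then a line reading "end")."""
    # JSON has no byte type, so bytes-like values are sent as base64 text.
    encoded = [
        base64.b64encode(v).decode("ascii") if isinstance(v, (bytes, bytearray)) else v
        for v in values
    ]
    msg = {"command": "emit", "tuple": encoded}
    if tuple_id is not None:
        msg["id"] = tuple_id
    if stream is not None:
        msg["stream"] = stream
    # json.dumps without indent produces a single line.
    return json.dumps(msg) + "\nend\n"


def parse_message(text):
    """Hypothetical helper: collect lines up to the "end" terminator and parse them as JSON."""
    body = []
    for line in text.splitlines():
        if line == "end":
            break
        body.append(line)
    return json.loads("\n".join(body))


framed = frame_emit(["field1", b"\x89PNG\r\n\x1a\n", 3], tuple_id="1231231", stream="1")
msg = parse_message(framed)
# The receiver has to know which positions hold base64 text; here it is index 1.
image_bytes = base64.b64decode(msg["tuple"][1])
print(msg["tuple"][0], image_bytes, msg["tuple"][2])
```

The main design cost is that the type information is lost: both ends must agree on which tuple fields carry base64, since on the wire they are indistinguishable from ordinary strings.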
