Right, that's why I don't understand why serializing the tuple with JSON
before emitting it fixes the issue. If the whole message is going to be
serialized with JSON anyway, I would expect it to work. (I'm ignoring the
JSON encoding/decoding performance and only talking about functionality.)
Also, the Python dictionary doesn't contain any data type that JSON is not
able to handle, so that's not the issue.
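
A minimal sketch of the comparison described above, assuming a simplified
version of the multilang framing (a JSON body followed by an "end" line).
The helpers build_emit_message, make_payload, and decode are hypothetical
names used only for illustration; they are not Storm's API, and the payload
is synthetic rather than the tuple that triggered the hang.

```python
import json


def build_emit_message(values):
    """Frame a simplified multilang-style emit: JSON body plus an 'end' line.

    Assumption: this mirrors the general shape of the protocol only; the real
    emit can also carry anchors, a stream id, and a task id.
    """
    return json.dumps({"command": "emit", "tuple": values}) + "\nend\n"


def make_payload(target_bytes):
    """Build a synthetic dict whose JSON encoding is a bit over target_bytes."""
    return {"id": 1, "body": "x" * target_bytes}


def decode(message):
    """Strip the 'end' terminator and return the first value of the tuple."""
    body, _ = message.rsplit("\nend\n", 1)
    return json.loads(body)["tuple"][0]


payload = make_payload(75 * 1024)

# Variant 1: emit the dictionary directly, so it is JSON-encoded once.
as_dict = build_emit_message([payload])

# Variant 2 (the workaround): encode the dict first, then emit the string,
# so the value is JSON-encoded twice and must be decoded twice downstream.
as_string = build_emit_message([json.dumps(payload)])

decoded_dict = decode(as_dict)
decoded_string = json.loads(decode(as_string))
```

In this isolated sketch both variants round-trip to the same dictionary,
which matches the expectation stated above. It does not reproduce the hang;
it only gives a starting point for comparing what each variant puts on the
wire.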

On 29 May 2015 at 14:35, Nathan Leung <[email protected]> wrote:

> The default (and in old releases the ONLY) multilang serializer is JSON,
> which is in fact slow.
> On May 29, 2015 8:04 AM, "Andrew Xor" <[email protected]> wrote:
>
>> I think the Storm documentation clearly says that not only do you
>> have to serialize your objects, but when using custom types it is better to
>> implement your own serializer to avoid the "native" one, which is quite
>> slow. I have not used Storm multilang myself, though, to be honest.
>>
>> Regards.
>>
>> On Fri, May 29, 2015 at 2:33 PM, Carlos Perelló Marín <
>> [email protected]> wrote:
>>
>>> Found the problem... I'm not serializing the json object so when I call
>>> emit, it's a python dictionary. It works most of the time, but for some
>>> reason we found several values that break it.
>>>
>>> I'm not 100% sure it's not a problem with Storm's multilang support,
>>> given that emit ends up doing a json.dumps() call anyway before sending it
>>> to the ShellBolt or ShellSpout Java class, so it should not break the
>>> protocol.
>>>
>>> I have a workaround for my problem, but it would be nice to know whether
>>> it's a bug or the intended behavior, because having to serialize and
>>> deserialize that argument on every bolt would cost us some extra
>>> processing time.
>>>
>>> Thanks.
>>>
>>> On 28 May 2015 at 22:35, Andrew Xor <[email protected]> wrote:
>>>
>>>> This is odd, as I have used Storm with quite large tuples and had no
>>>> such problem. Try to replicate it with a single spout that generates
>>>> huge tuples and a single bolt as a consumer, and report back your
>>>> results.
>>>>
>>>> Regards
>>>> On Thu, May 28, 2015 at 10:59 PM Jeffery Maass <[email protected]>
>>>> wrote:
>>>>
>>>>> I would take the Kafka spout, JSON, and your code out of the equation
>>>>> and replicate the problem with a spout that generates strings of various
>>>>> lengths around 75KB.
>>>>>
>>>>> Thank you for your time!
>>>>>
>>>>> +++++++++++++++++++++
>>>>> Jeff Maass <[email protected]>
>>>>> linkedin.com/in/jeffmaass
>>>>> stackoverflow.com/users/373418/maassql
>>>>> +++++++++++++++++++++
>>>>>
>>>>>
>>>>> On Thu, May 28, 2015 at 2:45 PM, Carlos Perelló Marín <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> While working with Apache Storm 0.9.4 with Python + multilang, I
>>>>>> found that one tuple was hanging the topology. It took me a while to
>>>>>> figure out what was going on and why it stopped processing payloads,
>>>>>> until I found that the hung bolt was blocked waiting for input on its
>>>>>> stdin (it hangs calling emit).
>>>>>>
>>>>>> After inspecting the tuple that hung it, I found that it includes a
>>>>>> JSON string that is about 75KB long. It's valid JSON, so it's not
>>>>>> corrupted, but for some reason it breaks when it's emitted.
>>>>>>
>>>>>> I'm using Kafka as a way to inject tuples into my topology, and the
>>>>>> KafkaSpout is able to inject such a tuple, so I wonder whether it's just
>>>>>> a limitation of the multilang implementation...
>>>>>>
>>>>>> Is there any hint to debug or fix it?
>>>>>>
>>>>>> The worst thing is that there were no errors in the supervisor or
>>>>>> worker logs. I only found this because I inspected the processes
>>>>>> manually with strace and added log output to my code to find the place
>>>>>> where it hung.
>>>>>>
>>>>>> Thanks in advance!
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Carlos Perelló Marín
>>>>>> https://www.serverdensity.com
>>>>>>
>>>>>>
>>>>>
>>>
>>>
>>> --
>>>
>>> Carlos Perelló Marín
>>> https://www.serverdensity.com
>>>
>>>
>>


-- 

Carlos Perelló Marín
https://www.serverdensity.com
