I think the Storm documentation clearly says that you not only have to serialize your objects, but that with custom types it is better to implement your own serializer and avoid the "native" one, which is quite slow. To be honest, though, I have not used Storm multi-lang.
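
To put a rough number on the serialize / unserialize cost discussed below, here is a minimal sketch that runs outside Storm. It assumes only what the multilang protocol documents: each message is a JSON document followed by a line containing "end", and an emit looks like {"command": "emit", "tuple": [...]}. The helper names (make_payload, frame, unframe) and the payload contents are hypothetical, made up for illustration. It does not reproduce the hang; it only compares emitting a dict with emitting a pre-serialized string of about 75KB.

```python
import json
import timeit


def make_payload(target_bytes=75 * 1024):
    """Build a dict whose JSON encoding is at least target_bytes long."""
    chunk = "Perelló Marín \u2603 "  # non-ASCII on purpose (hypothetical content)
    items = []
    size = 0
    while size < target_bytes:
        items.append(chunk)
        size += len(json.dumps(chunk)) + 2  # encoded item plus ", " separator
    return {"id": 1, "items": items}


def frame(message):
    """Frame a message as JSON followed by a line containing 'end'."""
    return json.dumps(message) + "\nend\n"


def unframe(data):
    """Inverse of frame(): strip the trailing 'end' line and decode the JSON."""
    body, sep, rest = data.rpartition("\nend\n")
    if not sep or rest:
        raise ValueError("message is not terminated by an 'end' line")
    return json.loads(body)


payload = make_payload()

# Variant A: the dict travels as a nested value inside the tuple.
as_dict = {"command": "emit", "tuple": [payload]}
# Variant B: the dict is serialized first, so the tuple holds a plain string.
as_string = {"command": "emit", "tuple": [json.dumps(payload)]}

# Both variants must survive a full encode/decode round trip.
assert unframe(frame(as_dict))["tuple"][0] == payload
assert json.loads(unframe(frame(as_string))["tuple"][0]) == payload

if __name__ == "__main__":
    runs = 200
    for name, msg in (("dict", as_dict), ("string", as_string)):
        seconds = timeit.timeit(lambda: unframe(frame(msg)), number=runs)
        size = len(frame(msg))
        print(f"{name}: {size} bytes, {seconds / runs * 1e3:.3f} ms per round trip")
```

If both variants round-trip cleanly here, the next step would be feeding the same payloads through a real spout and bolt, as suggested further down the thread.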
Regards.

On Fri, May 29, 2015 at 2:33 PM, Carlos Perelló Marín <[email protected]> wrote:

> Found the problem... I'm not serializing the JSON object, so when I call
> emit, it's a Python dictionary. It works most of the time, but for some
> reason we found several values that break it.
>
> I'm not 100% sure it's not a problem with Storm's multilang support, given
> that the emit ends up doing a json.dumps() call anyway before sending it to
> the ShellBolt or ShellSpout Java class, so it should not break the protocol.
>
> I have a workaround for my problem, but it would be nice to know whether
> it's a bug or the right behavior, because having to serialize / unserialize
> that argument on every bolt would cost us some extra processing time.
>
> Thanks.
>
> On 28 May 2015 at 22:35, Andrew Xor <[email protected]> wrote:
>
>> This must be awkward, as I have used Storm with tuples that are quite
>> large with no such problem. Try to replicate with a single spout that
>> generates huge tuples and a single bolt as a consumer, and report back
>> your results.
>>
>> Regards
>>
>> On Thu, May 28, 2015 at 10:59 PM Jeffery Maass <[email protected]> wrote:
>>
>>> I would take the Kafka spout, JSON, and your code out of the equation and
>>> replicate the problem with a spout that generates strings of various
>>> lengths around 75KB.
>>>
>>> Thank you for your time!
>>>
>>> +++++++++++++++++++++
>>> Jeff Maass <[email protected]>
>>> linkedin.com/in/jeffmaass
>>> stackoverflow.com/users/373418/maassql
>>> +++++++++++++++++++++
>>>
>>> On Thu, May 28, 2015 at 2:45 PM, Carlos Perelló Marín <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> While working with Apache Storm 0.9.4 with Python + multilang, I found
>>>> that one tuple was hanging the topology. It took me a while to figure out
>>>> what was going on and why it stopped processing payloads, until I found
>>>> that the hung bolt was blocked waiting for input on its stdin (it hangs
>>>> calling emit).
>>>>
>>>> After inspecting the tuple that hung it, I found that it includes a
>>>> JSON string that is about 75KB long. It's valid JSON, so it's not
>>>> corrupted, but for some reason it breaks when it's emitted.
>>>>
>>>> I'm using Kafka as a way to inject tuples into my topology, and the
>>>> KafkaSpout is able to inject such a tuple, so I wonder whether it's just
>>>> a limitation of the multilang implementation...
>>>>
>>>> Is there any hint on how to debug or fix it?
>>>>
>>>> The worst thing is that there were no errors in the supervisor or
>>>> worker logs. I only found this because I inspected the processes manually
>>>> with strace and added log output to my code to find the place where it
>>>> hung.
>>>>
>>>> Thanks in advance!
>>>>
>>>> --
>>>>
>>>> Carlos Perelló Marín
>>>> https://www.serverdensity.com
>
> --
>
> Carlos Perelló Marín
> https://www.serverdensity.com
