Your code works locally or with a single worker because no serialization needs 
to happen when everything runs in the same JVM.  Once you add a worker, the 
tasks created by your parallelism hint can land in another JVM, so tuples sent 
to them have to be serialized.  So the issue is not the number of workers, it’s 
your serialization.  I am using Scala as well and have yet to hit a case where 
I needed custom serialization; the out-of-the-box Java serialization seems to 
work well.
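
One way to check this without a second worker is to force a serialization 
round trip yourself. Below is a minimal sketch using only plain java.io 
serialization; the Event class and the roundTrip helper are hypothetical names 
made up for illustration, not part of Storm or of your code. It does not go 
through Kryo or your decorator, and it cannot reproduce differences between 
the classpaths of separate workers, so treat it as a first sanity check only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationRoundTrip {

    // Hypothetical stand-in for a class carried inside a tuple.
    static class Event implements Serializable {
        private static final long serialVersionUID = 1L;
        final String id;
        final int count;

        Event(String id, int count) {
            this.id = id;
            this.count = count;
        }
    }

    // Write the value to a byte array and read it back, which is roughly
    // what has to happen when a tuple crosses from one JVM to another.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        }
        byte[] bytes = buffer.toByteArray();
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Event copy = roundTrip(new Event("a-1", 3));
        // The copy is a new object with the same field values.
        System.out.println(copy.id + " " + copy.count);
    }
}
```

If one of your own classes fails this round trip in a single JVM, the problem 
is in the class itself rather than in how the workers are set up.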

From: Matthew Waymost [mailto:[email protected]]
Sent: Friday, February 27, 2015 4:14 PM
To: [email protected]
Subject: KryoDecorator not working when setNumWorkers > 1

Hi everybody,

I'm a new user to storm and have hit a roadblock in getting my topology to run 
over multiple workers.

Our codebase is in scala and we send scala classes to storm, so I'm using a 
kryo decorator to call to chill's scala registrar to add all the serialization 
logic for scala classes to kryo. In addition, I have a custom serializer that 
I'm adding in the same decorator.

This has worked perfectly fine for me so far locally and on our cluster until I 
tried turning up the number of workers on which the topology runs. When I use 
conf.setNumWorkers to set the number of workers greater than 1, the topology 
gives me InvalidClassExceptions when attempting to deserialize our classes. 
Removing the setNumWorkers call such that the number of workers stays at the 
default of 1 resolves the problem and everything runs fine.

I'm completely stumped as to why this is happening, and I'm not sure how to 
diagnose the issue. I've tried the following:

* Configure the decorator through storm.yaml instead of in source code on all 
worker nodes and nimbus.
* Kill the topology, shut down all worker nodes, nimbus, and zookeeper, clear 
all temporary data, and bring it all back up.
* Verify that everything is using the same version of storm.
* Search google and stare at the code.

Looking at what's going on in the UI, it doesn't fail at the very first chance 
either. It appears only to fail around the part of the topology where I have a 
parallelismHint set, which is a few steps in. So I'm guessing it's directly a 
result of trying to run it over multiple workers, but I don't know what to do 
with that info.

We're running openjdk 7, zk 3.4.6, and storm 0.9.3 on gce. We've got 1 zk 
server, 1 nimbus server, and 3 worker servers. The call to the topology is made 
over drpc, and drpc is hosted on the nimbus server. The topology is implemented 
using trident.

Thanks for any help you can provide.

Matthew
