Thank you very much for your time and help, Nathan and John. I followed up on serialization, and it looks like everything can be serialized: http://stackoverflow.com/a/16851174/453673 I will also verify the database connection serialization during implementation.
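
For reference, here is a minimal sketch of the pattern Nathan describes in the quoted reply: settings travel with the serialized spout, while the connection is rebuilt on the worker. It assumes Storm ships spouts with plain Java serialization, which is my understanding but is not confirmed in this thread. It also uses no Storm classes, so it runs on its own. MongoManagerSketch, FakeConnection, SpoutSketch, SerializationDemo, and the host/port values are all hypothetical names made up for illustration. Note that the spout lifecycle hook in Storm is open(); prepare() is the bolt equivalent.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a live database connection; deliberately NOT Serializable.
class FakeConnection {
    final String target;
    FakeConnection(String target) { this.target = target; }
}

// Hypothetical manager: plain settings are serialized, the connection is not.
class MongoManagerSketch implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String host;
    private final int port;
    private transient FakeConnection connection; // transient: skipped by Java serialization

    MongoManagerSketch(String host, int port) { this.host = host; this.port = port; }

    // Meant to be called on the worker, e.g. from the spout's open().
    void connect() { connection = new FakeConnection(host + ":" + port); }

    boolean isConnected() { return connection != null; }
    String getHost() { return host; }
    int getPort() { return port; }
}

// Spout-like class without the Storm dependency; open() mimics the lifecycle hook.
class SpoutSketch implements Serializable {
    private static final long serialVersionUID = 1L;
    private final MongoManagerSketch mongoManager;

    SpoutSketch(MongoManagerSketch mongoManager) { this.mongoManager = mongoManager; }

    void open() { mongoManager.connect(); }
    MongoManagerSketch getMongoManager() { return mongoManager; }
}

class SerializationDemo {
    // Serialize to bytes and read back, imitating the trip from main() to a worker JVM.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        MongoManagerSketch manager = new MongoManagerSketch("localhost", 27017);
        manager.connect(); // connected only in the submitting JVM

        SpoutSketch copy = roundTrip(new SpoutSketch(manager));
        System.out.println("host survived: " + copy.getMongoManager().getHost());
        System.out.println("connected after copy: " + copy.getMongoManager().isConnected());

        copy.open(); // worker side: rebuild the connection
        System.out.println("connected after open(): " + copy.getMongoManager().isConnected());
    }
}
```

If the connection field were not marked transient, writeObject would fail with a NotSerializableException once connect() had been called, which is the case Nathan warns about.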

On Thu, Apr 21, 2016 at 5:22 PM, Nathan Leung <[email protected]> wrote:

> mongoManager is serialized and sent to your spout. If it's not something
> that's easily serializable (e.g. a database connection) then you will need
> to initialize it in spout prepare() instead of the constructor.
>
> On Thu, Apr 21, 2016 at 7:34 AM, Navin Ipe <[email protected]> wrote:
>
>> Thanks John, but that's odd...in the code I shared, there's a reference
>> to mongoManager being used in the Spout (the MongoSpout internally stores a
>> reference to mongoManager). If there are no object references shared
>> between executors, then when the topology I created is submitted to Storm,
>> would Storm serialize or clone and store an instance of mongoManager (and
>> all initialized values inside it) inside the Spout? Storm would surely have
>> to do *something* to ensure that references aren't cut off when workers
>> operate in different JVMs...
>>
>> On Thu, Apr 21, 2016 at 4:45 PM, <[email protected]> wrote:
>>
>>> Netty is used for communication between workers and then the LMAX
>>> disruptor queue is used to route messages between Netty and the individual
>>> executors such as the MongoSpout and KafkaBolt. AFAIK, there are no direct
>>> object references shared between executors because all executors
>>> communicate via Netty/LMAX (shuffle/fieldsGrouping) or LMAX
>>> (localOrShuffleGrouping).
>>>
>>> --John
>>>
>>> Sent from my iPhone
>>>
>>> On Apr 21, 2016, at 1:29 AM, Navin Ipe <[email protected]> wrote:
>>>
>>> In the below code,
>>>
>>> public static void main(String[] cmdArgs) {
>>>     Config config = new Config();
>>>     config.setNumWorkers(5);
>>>     MongoManager mongoManager = new MongoManager();
>>>     TopologyBuilder builder = new TopologyBuilder();
>>>     builder.setSpout("someSpout", new MongoSpout(mongoManager));
>>> }
>>>
>>> Assuming there are many more spouts and bolts created, I understand that
>>> each worker will run in its own JVM
>>> <http://www.michael-noll.com/blog/2013/06/21/understanding-storm-internal-message-buffers/>,
>>> which means that it will have its own memory space.
>>>
>>> *Questions:*
>>> *1.* So when the mongoManager reference is passed to MongoSpout, will
>>> MongoSpout always be able to access the initialized members of mongoManager?
>>> *2.* Isn't it likely that main() runs in a different JVM and a
>>> MongoSpout will be in another JVM? How would Storm access mongoManager?
>>> Using Netty?
>>> *3.* (optional help) I have the Storm source code. Could anyone point
>>> me to the part where Storm does the inter-worker communication for
>>> accessing class references?
>>>
>>> --
>>> Regards,
>>> Navin
>>
>> --
>> Regards,
>> Navin

--
Regards,
Navin
