To clarify, my point is that the config is NOT suitable for passing database connections. It can be used to pass database connection configuration, though (e.g. host/port values).

On Apr 22, 2016 6:41 AM, "Navin Ipe" <[email protected]> wrote:
> Thank you very much for your time and help, Nathan and John.
> I followed up on serialization, and it looks like everything can be
> serialized: http://stackoverflow.com/a/16851174/453673
> I will also verify the database connection serialization during
> implementation.
>
> On Thu, Apr 21, 2016 at 5:22 PM, Nathan Leung <[email protected]> wrote:
>
>> mongoManager is serialized and sent to your spout. If it's not something
>> that's easily serializable (e.g. a database connection), then you will need
>> to initialize it in the spout's open() method (prepare() for a bolt)
>> instead of the constructor.
>>
>> On Thu, Apr 21, 2016 at 7:34 AM, Navin Ipe <[email protected]> wrote:
>>
>>> Thanks, John, but that's odd: in the code I shared, there's a reference
>>> to mongoManager being used in the Spout (the MongoSpout internally stores a
>>> reference to mongoManager). If there are no object references shared
>>> between executors, then when the topology I created is submitted to Storm,
>>> would Storm serialize or clone and store an instance of mongoManager (and
>>> all initialized values inside it) inside the Spout? Storm would surely have
>>> to do *something* to ensure that references aren't cut off when workers
>>> operate in different JVMs...
>>>
>>> On Thu, Apr 21, 2016 at 4:45 PM, <[email protected]> wrote:
>>>
>>>> Netty is used for communication between workers, and the LMAX
>>>> Disruptor queue is then used to route messages between Netty and the
>>>> individual executors, such as the MongoSpout and KafkaBolt. AFAIK, there
>>>> are no direct object references shared between executors, because all
>>>> executors communicate via Netty/LMAX (shuffleGrouping/fieldsGrouping) or
>>>> LMAX (localOrShuffleGrouping).
>>>>
>>>> --John
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Apr 21, 2016, at 1:29 AM, Navin Ipe <[email protected]> wrote:
>>>>
>>>> In the code below,
>>>>
>>>> public static void main(String[] cmdArgs) {
>>>>     Config config = new Config();
>>>>     config.setNumWorkers(5);
>>>>     MongoManager mongoManager = new MongoManager();
>>>>     TopologyBuilder builder = new TopologyBuilder();
>>>>     builder.setSpout("someSpout", new MongoSpout(mongoManager));
>>>> }
>>>>
>>>> Assuming there are many more spouts and bolts created, I understand
>>>> that each worker will run in its own JVM
>>>> <http://www.michael-noll.com/blog/2013/06/21/understanding-storm-internal-message-buffers/>,
>>>> which means that it will have its own memory space.
>>>>
>>>> *Questions:*
>>>> 1. So when the mongoManager reference is passed to MongoSpout, will
>>>> MongoSpout always be able to access the initialized members of
>>>> mongoManager?
>>>> 2. Isn't it likely that main() runs in a different JVM and a
>>>> MongoSpout will be in another JVM? How would Storm access mongoManager?
>>>> Using Netty?
>>>> 3. (optional help) I have the Storm source code. Could anyone point
>>>> me to the part where Storm does the inter-worker communication for
>>>> accessing class references?
>>>>
>>>> --
>>>> Regards,
>>>> Navin
>>>
>>> --
>>> Regards,
>>> Navin
>>
>
> --
> Regards,
> Navin
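The sketch below illustrates the serialization behavior discussed in this thread. It is a minimal example in plain Java with no Storm dependency, and it assumes only what the thread states: the spout object built in main() is serialized, and each worker gets its own deserialized copy rather than a shared reference. A live connection should therefore be created in the spout's open() method from host/port values passed through the config. The names LazySpout, EagerSpout, FakeConnection, roundTrip, and the mongo.host/mongo.port keys are hypothetical and exist only for illustration; they are not Storm or MongoDB APIs. roundTrip uses java.io serialization to approximate the trip from main() to a worker.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class SpoutSerializationSketch {

    // Hypothetical stand-in for a live database connection. Like a real
    // driver client, it is deliberately NOT Serializable.
    static class FakeConnection {
        final String host;
        final int port;

        FakeConnection(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    // Hypothetical spout-like component. Storm components are Serializable
    // (IComponent extends Serializable); implemented directly here to avoid
    // depending on storm-core.
    static class LazySpout implements Serializable {
        private static final long serialVersionUID = 1L;

        // transient: skipped by serialization, so it is null on the worker
        // until open() creates it.
        private transient FakeConnection connection;

        // Mirrors ISpout.open(conf, context, collector): runs on the worker
        // after deserialization. Host/port come from the config map.
        void open(Map<String, Object> conf) {
            String host = (String) conf.get("mongo.host");
            // Number rather than Integer: a JSON-round-tripped config may yield Long.
            int port = ((Number) conf.get("mongo.port")).intValue();
            connection = new FakeConnection(host, port);
        }

        FakeConnection connection() {
            return connection;
        }
    }

    // Anti-pattern: the connection is created eagerly and is not transient,
    // so the component itself cannot be serialized.
    static class EagerSpout implements Serializable {
        private static final long serialVersionUID = 1L;
        private final FakeConnection connection = new FakeConnection("localhost", 27017);
    }

    // Serialize, then deserialize: roughly what happens between main()
    // and a worker JVM.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Connection settings travel as plain config values (hypothetical keys).
        Map<String, Object> conf = new HashMap<>();
        conf.put("mongo.host", "localhost");
        conf.put("mongo.port", 27017);

        LazySpout original = new LazySpout();
        LazySpout onWorker = roundTrip(original);
        System.out.println(onWorker != original);          // true: a copy, not a shared reference
        System.out.println(onWorker.connection() == null); // true: the transient field was not shipped

        onWorker.open(conf);
        FakeConnection c = onWorker.connection();
        System.out.println(c.host + ":" + c.port);         // localhost:27017

        try {
            roundTrip(new EagerSpout());
        } catch (NotSerializableException e) {
            System.out.println("EagerSpout not serializable: " + e.getMessage());
        }
    }
}
```

The spout ships only plain settings and builds the live resource on the worker. EagerSpout shows the failure mode: a non-serializable connection created at construction time makes the whole component unserializable.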
