Hi Ravi,

Thanks for the reply. I got your point about using different bolts for MySQL and Mongo.

One thing though: is it a good idea to use different topologies within the same cluster? The rationale is that if I use the same topology but different bolts to do the processing, I believe a failure in any one of the bolts will cause the entire message to be replayed. This may not cause any real problem in either database (multiple inserts won't cause any problem), but the overall throughput of the topology will be affected. With different topologies, the idea is to separate execution into different sets of spouts and bolts. So if a topology that has been given the responsibility of doing a different task fails, it won't affect the other topologies.

If my rationale is correct, how does maintaining different topologies affect cost?

Also, for simulating and testing this at my end, can I test this on a local cluster?

Thanks
Ankur

On Mon, Oct 12, 2015 at 3:22 PM, Ravi Sharma <[email protected]> wrote:

> Hi Ankur,
>
> Storm's design is stateless, so Storm can't store any info about which bolts were successful and which failed.
> The idea is to replay the message without affecting the final outcome (meaning if the MySQL write succeeded, it shouldn't add two rows when the message is replayed).
>
> Looking at it from afar, I would say you may be fixing an issue which hasn't happened yet. The assumption is that one DB will be failing a lot; I guess this may not be the real case.
> Any of the DBs can fail once in a while, and replaying those messages shouldn't affect your performance (say less than 10% of messages failed); you will be planning at least 50% more capacity than your max load.
>
> If you really want it to be very effective, I'd say use something like Redis and store your bolt status with the message id there, so every time you plan to start a bolt's processing, check whether you have already completed it successfully; if yes, then skip it.
> I have defined my own MessageId object and always put a retry count in it.
> So the first one goes with 0, and at that moment you can avoid the Redis/NoSQL checks.
> But then you are adding one more technology, and it just increases the complexity.
>
> Whatever design you choose, I would still suggest using two bolts. Mongo and MySQL are different clusters (hardware) and technologies (software); they will have different throughput and scalability. And as per your requirement, you don't care if data hasn't reached one at exactly the same time as the other, no atomicity (basically it's not one transaction), so you don't want to slow down one system because the other is slower.
>
> The last suggestion is to go with two spouts. Both will read from the same topic (not queue), so all messages will be delivered to both spouts. One spout will send messages to the MySQL bolt, the other will send to the Mongo bolt.
>
> Ravi.
>
> On Mon, Oct 12, 2015 at 10:14 AM, Ankur Garg <[email protected]> wrote:
>
>> LOL .. I was looking for something better :) .. If you see, having multiple bolts here does not help much. It would have helped had there been a provision to skip the already executed bolts.
>>
>> I believe this should be there in Storm.
>>
>> Thanks
>> Ankur
>>
>> On Mon, Oct 12, 2015 at 2:42 PM, Susheel Kumar Gadalay <[email protected]> wrote:
>>
>>> Check and insert
>>>
>>> On 10/12/15, Ankur Garg <[email protected]> wrote:
>>> > But what if the MongoDB bolt has some error? In that case I suppose the entire tuple will be replayed from the spout, meaning it will have to redo the operation of inserting into SQL. Is there a way I can skip inserting into MySQL?
>>> >
>>> > On Mon, Oct 12, 2015 at 1:54 PM, Susheel Kumar Gadalay <[email protected]> wrote:
>>> >
>>> >> It is better to have 2 bolts - mysql bolt and mongodb bolt.
>>> >>
>>> >> Let mysql bolt forward the tuple to mongodb bolt, so in case of error it won't emit.
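To make the "store bolt status with message id" and "check and insert" suggestions above concrete, here is a minimal sketch in plain Python. It is not Storm code and not anyone's actual implementation from this thread: `MessageId`, `StatusStore` and `run_step` are hypothetical names, and an in-memory set stands in for Redis. It only illustrates the control flow: skip the lookup on the first delivery (retry count 0), and on a replay skip any step that is already recorded as done.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MessageId:
    """Hypothetical message id that carries a retry count."""
    key: str
    retry_count: int = 0

    def retried(self) -> "MessageId":
        # Same message, one more delivery attempt.
        return MessageId(self.key, self.retry_count + 1)


class StatusStore:
    """In-memory stand-in for Redis: remembers which (bolt, message) pairs finished."""

    def __init__(self):
        self._done = set()
        self.lookups = 0  # counts how often the store was consulted

    def is_done(self, bolt_name, msg_id):
        self.lookups += 1
        return (bolt_name, msg_id.key) in self._done

    def mark_done(self, bolt_name, msg_id):
        self._done.add((bolt_name, msg_id.key))


def run_step(bolt_name, msg_id, payload, write, store):
    """Run one bolt's write unless an earlier attempt already completed it.

    Returns True if the write ran, False if it was skipped.
    """
    # On the first delivery nothing can be recorded yet, so skip the lookup.
    if msg_id.retry_count > 0 and store.is_done(bolt_name, msg_id):
        return False
    write(payload)  # may raise; status is recorded only after success
    store.mark_done(bolt_name, msg_id)
    return True


if __name__ == "__main__":
    store = StatusStore()
    mysql_rows = []
    first = MessageId("m1")
    run_step("mysql", first, {"id": 1}, mysql_rows.append, store)
    # A replay of the same message skips the MySQL step.
    run_step("mysql", first.retried(), {"id": 1}, mysql_rows.append, store)
    print(len(mysql_rows), "row(s) written")
```

Note that if the process dies between the write and `mark_done`, a replay will still repeat the write, so the database writes themselves should stay idempotent (the "check and insert" point) even with a status store in front of them.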
>>> >>
>>> >> On 10/12/15, Ankur Garg <[email protected]> wrote:
>>> >> > So I have a situation where the tuple received on the spout has to be saved to the MySQL database and to MongoDB as well.
>>> >> >
>>> >> > Which would be better: using 1 bolt to save it into MySQL and MongoDB, or 2 separate bolts (one for saving into MySQL and the other for saving into Mongo)?
>>> >> >
>>> >> > What happens when an exception occurs while saving into MySQL? I believe I will get the acknowledgement inside the fail method in my spout. So if I reprocess it using 2 bolts, I believe it will again be sent to the bolt for saving into the Mongo database.
>>> >> >
>>> >> > If the above is true, will having 2 separate bolts be of any advantage? How can I configure things so that a failure while inserting into MySQL does not impact inserting into MongoDB?
>>> >> >
>>> >> > Thanks
>>> >> > Ankur
>>> >> >
>>> >> > On Sun, Oct 11, 2015 at 10:57 PM, Ravi Sharma <[email protected]> wrote:
>>> >> >
>>> >> >> That depends on whether your spout error has affected the JVM or is a normal application error.
>>> >> >>
>>> >> >> Performance issue in case of a lot of errors: I don't think there is any issue because of the errors themselves, but of course if you are retrying these messages on failure, then that means you will be processing a lot more messages than normal, and overall throughput will go down.
>>> >> >>
>>> >> >> Ravi
>>> >> >>
>>> >> >> If ur topology has enabled acknowledgment that means spout will always receive
>>> >> >> On 11 Oct 2015 18:15, "Ankur Garg" <[email protected]> wrote:
>>> >> >>
>>> >> >>> Thanks for the reply Abhishek and Ravi.
>>> >> >>>
>>> >> >>> One question though: going with one topology with multiple spouts, what if something goes wrong in one spout or its associated bolts? Does it impact the other spout as well?
>>> >> >>>
>>> >> >>> Thanks
>>> >> >>> Ankur
>>> >> >>>
>>> >> >>> On Sun, Oct 11, 2015 at 10:21 PM, Ravi Sharma <[email protected]> wrote:
>>> >> >>>
>>> >> >>>> There are no 100% right answers; you will have to test and see what fits.
>>> >> >>>>
>>> >> >>>> Personally I would suggest multiple spouts in one topology, and if you have N nodes where the topology will be running, then each spout (reading from one queue) should run N times in parallel.
>>> >> >>>>
>>> >> >>>> If 2 queues and say 4 nodes, then one topology with:
>>> >> >>>> 4 spouts reading from Queue1 on different nodes
>>> >> >>>> 4 spouts reading from Queue2 on different nodes
>>> >> >>>>
>>> >> >>>> Ravi.
>>> >> >>>>
>>> >> >>>> On Sun, Oct 11, 2015 at 5:25 PM, Abhishek priya <[email protected]> wrote:
>>> >> >>>>
>>> >> >>>>> I guess this is a question where there are no really correct answers. I'll certainly avoid #1, as it is better to keep logic separate and lightweight.
>>> >> >>>>>
>>> >> >>>>> If your downstream bolts are the same, then it makes sense to keep them in the same topology, but if they are totally different, I'll keep them in two different topologies. That will allow me to independently deploy and scale the topology. But if the rest of the logic is the same, topology scaling and resource utilization will be better with one topology.
>>> >> >>>>>
>>> >> >>>>> I hope this helps..
>>> >> >>>>>
>>> >> >>>>> Sent somehow....
>>> >> >>>>>
>>> >> >>>>> > On Oct 11, 2015, at 9:07 AM, Ankur Garg <[email protected]> wrote:
>>> >> >>>>> >
>>> >> >>>>> > Hi,
>>> >> >>>>> >
>>> >> >>>>> > So I have a situation where I want to read messages from different queues hosted on a RabbitMQ server.
>>> >> >>>>> >
>>> >> >>>>> > Now, there are three ways I can think of to leverage Apache Storm here:
>>> >> >>>>> >
>>> >> >>>>> > 1) Use the same spout (say Spout A) to read messages from different queues and, based on the messages received, emit them to different bolts.
>>> >> >>>>> >
>>> >> >>>>> > 2) Use different spouts (Spout A, Spout B and so on) within the same topology (say Topology A) to read messages from different queues.
>>> >> >>>>> >
>>> >> >>>>> > 3) Use different spouts, one within each topology (Topology A, Topology B and so on), to read messages from different queues.
>>> >> >>>>> >
>>> >> >>>>> > Which is the best way to process this, considering I want high throughput (more queue messages processed concurrently)?
>>> >> >>>>> >
>>> >> >>>>> > Also, if I use the same topology for all spouts (currently the requirement is for 2 spouts), will a failure in one spout (or its associated bolts) affect the second, or will they both continue working separately even if some failure is in Spout B?
>>> >> >>>>> >
>>> >> >>>>> > Cost-wise, how much would it be to maintain two different topologies?
>>> >> >>>>> >
>>> >> >>>>> > Looking for inputs from members here.
>>> >> >>>>> >
>>> >> >>>>> > Thanks
>>> >> >>>>> > Ankur
