Re: Multiple Spouts in Same topology or Topology per spout

Ravi Sharma Mon, 12 Oct 2015 02:53:42 -0700

Hi Ankur,

Storm's design is stateless, so Storm can't store any information about
which bolts succeeded and which failed.
The idea is to replay the message without affecting the final outcome
(meaning that if the MySQL write succeeded, a replay shouldn't add a
second row).
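
To illustrate, here is a minimal sketch of a write that is safe to replay,
assuming every message carries a unique id. The in-memory map only stands in
for the MySQL table, and the class and method names are hypothetical. In
MySQL itself the usual equivalent is a unique key on the message id combined
with INSERT ... ON DUPLICATE KEY UPDATE.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a table with a unique key on the message id. */
class IdempotentStore {
    private final Map<String, String> rows = new HashMap<>();

    /** Upsert: saving the same message id again overwrites, it never adds a row. */
    void save(String messageId, String payload) {
        rows.put(messageId, payload);
    }

    /** Number of distinct rows currently stored. */
    int rowCount() {
        return rows.size();
    }
}
```

With this shape, a tuple replayed from the spout after a Mongo failure simply
rewrites the same row.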


Looking at it from a distance, I would say you may be fixing an issue which
hasn't happened yet. The assumption is that one DB will be failing a lot; I
guess this may not be the real case.
Any of the DBs can fail once in a while, and replaying those messages
shouldn't affect your performance (say fewer than 10% of messages fail),
since you will be planning for at least 50% more capacity than your max load
anyway.


If you really want it to be very effective, I'd say use something like Redis
and store each bolt's status there, keyed by message id. Every time a bolt is
about to start processing, check whether it has already completed that
message successfully; if yes, skip it.
I have defined my own MessageId object and always put a retry count in it.
The first delivery goes out with 0, and at that point you can skip the
Redis/NoSQL checks.
But then you are adding one more technology, and that increases the
complexity.
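
A rough sketch of that idea, under stated assumptions: the MessageId,
CompletionStore and GuardedBolt classes below are hypothetical illustrations
(not Storm or Redis APIs), and a HashSet stands in for Redis.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical message id carrying a retry count (0 on first delivery). */
final class MessageId {
    final String id;
    final int retryCount;

    MessageId(String id, int retryCount) {
        this.id = id;
        this.retryCount = retryCount;
    }

    /** The id the spout would use when replaying this message. */
    MessageId nextRetry() {
        return new MessageId(id, retryCount + 1);
    }
}

/** In-memory stand-in for Redis: records which (bolt, message) pairs completed. */
class CompletionStore {
    private final Set<String> done = new HashSet<>();

    boolean isDone(String bolt, String id) {
        return done.contains(bolt + ":" + id);
    }

    void markDone(String bolt, String id) {
        done.add(bolt + ":" + id);
    }
}

/** Wraps one bolt's write so that a replay skips work already completed. */
class GuardedBolt {
    private final String name;
    private final CompletionStore store;
    int writes = 0; // counts writes actually performed

    GuardedBolt(String name, CompletionStore store) {
        this.name = name;
        this.store = store;
    }

    /** Returns true if the write was performed, false if it was skipped. */
    boolean process(MessageId msg) {
        // On first delivery nothing can be recorded yet, so skip the lookup.
        if (msg.retryCount > 0 && store.isDone(name, msg.id)) {
            return false;
        }
        writes++; // the real DB write would go here
        store.markDone(name, msg.id);
        return true;
    }
}
```

Here the MySQL bolt and the Mongo bolt would each get their own GuardedBolt
name, so a replay caused by a Mongo failure skips the MySQL write but still
retries the Mongo one.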


Whatever design you choose, I would still suggest using two bolts. MongoDB
and MySQL are different clusters (hardware) and different technologies
(software), so they will have different throughput and scalability. And as
per your requirement, you don't care if the data doesn't reach both at
exactly the same time; there is no atomicity (basically it's not one
transaction), so you don't want to slow down one system because the other is
slower.


My last suggestion is to go with two spouts. Both will read from the same
topic (not queue), so all messages will be delivered to both spouts. One
spout will send messages to the MySQL bolt, the other to the Mongo bolt.


Ravi.




On Mon, Oct 12, 2015 at 10:14 AM, Ankur Garg <[email protected]> wrote:

> LOL .. I was looking for something better :) ..If you see then having
> multiple bolts here do not help much .. It would have helped had there been
> a provision to skip the already executed Bolts .
>
>
> I believe this should be there in Storm .
>
> Thanks
> Ankur
>
> On Mon, Oct 12, 2015 at 2:42 PM, Susheel Kumar Gadalay <
> [email protected]> wrote:
>
>> Check and insert
>>
>> On 10/12/15, Ankur Garg <[email protected]> wrote:
>> > But what if MongoDb bolt has some error , in that case I suppose the
>> entire
>> > tuple will be replayed from Spout meaning it will have to redo the
>> > operation of inserting into sql . Is there a way I can skip inserting
>> into
>> > mysql ?
>> >
>> > On Mon, Oct 12, 2015 at 1:54 PM, Susheel Kumar Gadalay
>> > <[email protected]>
>> > wrote:
>> >
>> >> It is better to have 2 bolts - mysql bolt and mongodb bolt.
>> >>
>> >> Let mysql bolt forward the tuple to mongodb bolt, so in case of error
>> >> it won't  emit.
>> >>
>> >> On 10/12/15, Ankur Garg <[email protected]> wrote:
>> >> > So I have a situation where the tuple received on Spout has to be
>> saved
>> >> to
>> >> > mysql database and mongoDb as well .
>> >> >
>> >> > What should be better . Using 1 bolt to save it into mysql and
>> MongoDb
>> >> or 2
>> >> > seperate Bolts (One for saving into mysql and other for saving into
>> >> Mongo).
>> >> >
>> >> > What happens when the exception occurs while saving into mysql ? I
>> >> believe
>> >> > I will get acknowledgement inside the fail method in my Spout . So
>> If I
>> >> > reprocess it using 2 bolts , I believe it will again be sent to Bolt
>> >> > for
>> >> > saving into Mongo database .
>> >> >
>> >> > If the above is true , will having 2 seperate bolts be of any
>> advantage
>> >> > ?
>> >> > how can I configure things so that Failure while inserting into mysql
>> >> does
>> >> > not impact inserting into MongoDb .
>> >> >
>> >> > Thanks
>> >> > Ankur
>> >> >
>> >> > On Sun, Oct 11, 2015 at 10:57 PM, Ravi Sharma <[email protected]>
>> >> wrote:
>> >> >
>> >> >> That depends if ur spout error has affected jvm or normal
>> application
>> >> >> error
>> >> >>
>> >> >> performance issue in case of lot of errors, I don't think there is
>> any
>> >> >> issue be coz of errors themselves but ofcourse if u r retrying these
>> >> >> messages on failure then that means u will be processing lot of
>> >> >> messages
>> >> >> then normal and overall throughput will go down
>> >> >>
>> >> >> Ravi
>> >> >>
>> >> >> If ur topology has enabled acknowledgment that means spout will
>> always
>> >> >> receive
>> >> >> On 11 Oct 2015 18:15, "Ankur Garg" <[email protected]> wrote:
>> >> >>
>> >> >>>
>> >> >>> Thanks for the reply Abhishek and Ravi .
>> >> >>>
>> >> >>> One question though , going with One topology with multiple spouts
>> >> >>> ...What if something goes wrong in One spout or its associated
>> bolts
>> >> >>> ..
>> >> >>> Does it impact other Spout as well?
>> >> >>>
>> >> >>> Thanks
>> >> >>> Ankur
>> >> >>>
>> >> >>> On Sun, Oct 11, 2015 at 10:21 PM, Ravi Sharma <[email protected]
>> >
>> >> >>> wrote:
>> >> >>>
>> >> >>>> No 100% right ansers , u will have to test and see what will fit..
>> >> >>>>
>> >> >>>> persoanlly i wud suggest Multiple spouts in one Topology and if
>> you
>> >> >>>> have
>> >> >>>> N node where topology will be running then each Spout(reading from
>> >> >>>> one
>> >> >>>> queue) shud run N times in parallel.
>> >> >>>>
>> >> >>>> if 2 Queues and say 4 Nodes
>> >> >>>> then one topolgy
>> >> >>>> 4 Spouts reading from Queue1 in different nodes
>> >> >>>> 4 spouts reading from Queue2 in different nodes
>> >> >>>>
>> >> >>>> Ravi.
>> >> >>>>
>> >> >>>> On Sun, Oct 11, 2015 at 5:25 PM, Abhishek priya <
>> >> >>>> [email protected]> wrote:
>> >> >>>>
>> >> >>>>> I guess this is a question where there r no really correct
>> answers.
>> >> >>>>> I'll certainly avoid#1 as it is better to keep logic separate and
>> >> >>>>> lightweight.
>> >> >>>>>
>> >> >>>>> If your downstream bolts are same, then it makes senses to keep
>> >> >>>>> them
>> >> >>>>> in
>> >> >>>>> same topology but if they r totally different, I'll keep them in
>> >> >>>>> two
>> >> >>>>> different topologies. That will allow me to independently deploy
>> >> >>>>> and
>> >> >>>>> scale
>> >> >>>>> the topology. But if the rest of logic is same I topology scaling
>> >> >>>>> and
>> >> >>>>> resource utilization will be better with one topology.
>> >> >>>>>
>> >> >>>>> I hope this helps..
>> >> >>>>>
>> >> >>>>> Sent somehow....
>> >> >>>>>
>> >> >>>>> > On Oct 11, 2015, at 9:07 AM, Ankur Garg <[email protected]>
>> >> >>>>> > wrote:
>> >> >>>>> >
>> >> >>>>> > Hi ,
>> >> >>>>> >
>> >> >>>>> > So I have a situation where I want to read messages from
>> >> >>>>> > different
>> >> >>>>> queues hosted in a Rabbitmq Server .
>> >> >>>>> >
>> >> >>>>> > Now , there are three ways which I can think to leverage Apache
>> >> >>>>> > Storm
>> >> >>>>> here :-
>> >> >>>>> >
>> >> >>>>> > 1) Use the same Spout (say Spout A) to read messages from
>> >> >>>>> > different
>> >> >>>>> queues and based on the messages received emit it to different
>> >> >>>>> Bolts.
>> >> >>>>> >
>> >> >>>>> > 2) Use different Spout (Spout A and Spout B and so on) within
>> the
>> >> >>>>> same topology (say Topology A) to read messages from different
>> >> >>>>> queues
>> >> >>>>> .
>> >> >>>>> >
>> >> >>>>> > 3) Use Different Spouts one within eachTopology (Topology A ,
>> >> >>>>> Topology B and so on) to read messages from different queues .
>> >> >>>>> >
>> >> >>>>> > Which is the best way to process this considering I want high
>> >> >>>>> throughput (more no of queue messages to be processed
>> concurrently)
>> >> >>>>> .
>> >> >>>>> >
>> >> >>>>> > Also , If In use same Topology for all Spouts (currently though
>> >> >>>>> requirement is for 2 spouts)  will failure in one Spout (or its
>> >> >>>>> associated
>> >> >>>>> Bolts) effect the second or will they both continue working
>> >> separately
>> >> >>>>> even
>> >> >>>>> if some failure is in Spout B ?
>> >> >>>>> >
>> >> >>>>> > Cost wise , how much would it be to maintain two different
>> >> >>>>> > topologies
>> >> >>>>> .
>> >> >>>>> >
>> >> >>>>> > Looking for inputs from members here.
>> >> >>>>> >
>> >> >>>>> > Thanks
>> >> >>>>> > Ankur
>> >> >>>>> >
>> >> >>>>> >
>> >> >>>>>
>> >> >>>>
>> >> >>>>
>> >> >>>
>> >> >
>> >>
>> >
>>
>
>
