Re: Multiple Spouts in Same topology or Topology per spout

Ankur Garg Mon, 12 Oct 2015 03:46:33 -0700

Hi Ravi,

Thanks for the reply. I got your point about using different bolts for MySQL
and Mongo.


One thing, though: is it a good idea to use different topologies within the
same cluster?

My rationale is this: if I use the same topology but different bolts to do
the processing, I believe a failure in any one of the bolts will cause the
entire message to be replayed. This may not cause any real problem in either
database (multiple inserts won't cause any problem), but the overall
throughput of the topology will be affected.
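
To make the "multiple inserts won't cause any problem" point concrete, here is a minimal sketch, assuming every message carries a unique ID. The class and method names are hypothetical, and an in-memory map stands in for the real database table; it only illustrates why a replayed tuple need not produce a second row.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: an in-memory map stands in for the MySQL/Mongo table.
final class IdempotentSink {
    private final Map<String, String> rows = new ConcurrentHashMap<>();

    /** Writes the payload once per message ID; returns true only on the first write. */
    public boolean save(String messageId, String payload) {
        // putIfAbsent returns null when no row existed yet for this ID.
        return rows.putIfAbsent(messageId, payload) == null;
    }

    /** Number of distinct rows stored so far. */
    public int rowCount() {
        return rows.size();
    }

    public static void main(String[] args) {
        IdempotentSink mysql = new IdempotentSink();
        System.out.println(mysql.save("msg-1", "hello")); // first delivery
        System.out.println(mysql.save("msg-1", "hello")); // replay after a failure elsewhere
        System.out.println(mysql.rowCount());
    }
}
```

A real bolt would presumably get the same effect from a unique key on the message ID or an upsert, but the replay itself still costs a round trip, which is the throughput concern.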

With different topologies, the idea is to separate execution into different
sets of spouts and bolts. So if a topology that has been given the
responsibility for a different task fails, it won't affect the other
topologies.

If my rationale is correct, how does maintaining different topologies affect
cost? Also, for simulating and testing this at my end, can I test this on a
local cluster?

Thanks
Ankur

On Mon, Oct 12, 2015 at 3:22 PM, Ravi Sharma <[email protected]> wrote:

> Hi Ankur,
>
> Storm's design is stateless, so Storm can't store any info about which bolts
> succeeded and which failed.
> The idea is to replay the message without affecting the final outcome
> (meaning that if the MySQL write succeeded, a replay shouldn't add a second row).
>
> Looking at it from afar, I would say you may be fixing an issue which hasn't
> happened yet. The assumption is that one DB will be failing a lot; I guess this
> may not be the real case.
> Any of the DBs can fail once in a while, and replaying those messages shouldn't
> affect your performance (say, less than 10% of messages failing), since you will
> be planning at least 50% more capacity than your max load.
>
>
> If you really want it to be very effective, I'd say use something like Redis
> and store your bolt status with the message ID there, so every time you plan to
> start processing in a bolt, check whether you have already completed it
> successfully; if yes, then skip it.
> I have defined my own MessageId object and always put a retry count in it.
> So the first attempt goes with 0, and at that moment you can avoid the
> Redis/NoSQL checks.
> But then you are adding one more technology, and that just increases the
> complexity.
>
>
> Whatever design you choose, I would still suggest using two bolts. Mongo
> and MySQL are different clusters (hardware) and technologies (software);
> they will have different throughput and scalability. And as per your
> requirement, you don't care if data hasn't reached both at exactly the same
> time (no atomicity; basically it's not one transaction), so you don't want to
> slow down one system because the other is slower.
>
>
> Last suggestion is to go with two spouts. Both will read from the same
> topic (not queue), so all messages will be delivered to both spouts. One
> spout will send messages to the MySQL bolt; the other will send to the Mongo bolt.
>
>
> Ravi.
>
> On Mon, Oct 12, 2015 at 10:14 AM, Ankur Garg <[email protected]> wrote:
>
>> LOL .. I was looking for something better :) .. As you can see, having
>> multiple bolts here does not help much. It would have helped had there been
>> a provision to skip the already executed bolts.
>>
>>
>> I believe this should be there in Storm.
>>
>> Thanks
>> Ankur
>>
>> On Mon, Oct 12, 2015 at 2:42 PM, Susheel Kumar Gadalay <
>> [email protected]> wrote:
>>
>>> Check and insert
>>>
>>> On 10/12/15, Ankur Garg <[email protected]> wrote:
>>> > But what if the MongoDB bolt has some error? In that case I suppose the
>>> > entire tuple will be replayed from the spout, meaning it will have to redo
>>> > the operation of inserting into SQL. Is there a way I can skip inserting
>>> > into MySQL?
>>> >
>>> > On Mon, Oct 12, 2015 at 1:54 PM, Susheel Kumar Gadalay
>>> > <[email protected]> wrote:
>>> >
>>> >> It is better to have 2 bolts - mysql bolt and mongodb bolt.
>>> >>
>>> >> Let mysql bolt forward the tuple to mongodb bolt, so in case of error
>>> >> it won't  emit.
>>> >>
>>> >> On 10/12/15, Ankur Garg <[email protected]> wrote:
>>> >> > So I have a situation where the tuple received on the spout has to be
>>> >> > saved to a MySQL database and to MongoDB as well.
>>> >> >
>>> >> > Which would be better: using one bolt to save it into both MySQL and
>>> >> > MongoDB, or two separate bolts (one for saving into MySQL and the other
>>> >> > for saving into Mongo)?
>>> >> >
>>> >> > What happens when an exception occurs while saving into MySQL? I believe
>>> >> > I will get the failure notification in the fail method of my spout. So if
>>> >> > I reprocess it using two bolts, I believe it will again be sent to the
>>> >> > bolt for saving into the Mongo database.
>>> >> >
>>> >> > If the above is true, will having two separate bolts be of any advantage?
>>> >> > How can I configure things so that a failure while inserting into MySQL
>>> >> > does not impact inserting into MongoDB?
>>> >> >
>>> >> > Thanks
>>> >> > Ankur
>>> >> >
>>> >> > On Sun, Oct 11, 2015 at 10:57 PM, Ravi Sharma <[email protected]> wrote:
>>> >> >
>>> >> >> That depends on whether your spout error has affected the JVM or is a
>>> >> >> normal application error.
>>> >> >>
>>> >> >> As for a performance issue in case of a lot of errors, I don't think
>>> >> >> there is any issue because of the errors themselves, but of course if
>>> >> >> you are retrying these messages on failure, that means you will be
>>> >> >> processing more messages than normal and overall throughput will go down.
>>> >> >>
>>> >> >> Ravi
>>> >> >>
>>> >> >> If ur topology has enabled acknowledgment that means spout will
>>> always
>>> >> >> receive
>>> >> >> On 11 Oct 2015 18:15, "Ankur Garg" <[email protected]> wrote:
>>> >> >>
>>> >> >>>
>>> >> >>> Thanks for the reply, Abhishek and Ravi.
>>> >> >>>
>>> >> >>> One question, though: going with one topology with multiple spouts,
>>> >> >>> what if something goes wrong in one spout or its associated bolts?
>>> >> >>> Does it impact the other spout as well?
>>> >> >>>
>>> >> >>> Thanks
>>> >> >>> Ankur
>>> >> >>>
>>> >> >>> On Sun, Oct 11, 2015 at 10:21 PM, Ravi Sharma <[email protected]> wrote:
>>> >> >>>
>>> >> >>>> There are no 100% right answers; you will have to test and see what fits.
>>> >> >>>>
>>> >> >>>> Personally I would suggest multiple spouts in one topology, and if you
>>> >> >>>> have N nodes where the topology will be running, then each spout
>>> >> >>>> (reading from one queue) should run N times in parallel.
>>> >> >>>>
>>> >> >>>> If there are 2 queues and, say, 4 nodes,
>>> >> >>>> then one topology with
>>> >> >>>> 4 spouts reading from Queue1 on different nodes
>>> >> >>>> 4 spouts reading from Queue2 on different nodes
>>> >> >>>>
>>> >> >>>> Ravi.
>>> >> >>>>
>>> >> >>>> On Sun, Oct 11, 2015 at 5:25 PM, Abhishek priya <
>>> >> >>>> [email protected]> wrote:
>>> >> >>>>
>>> >> >>>>> I guess this is a question where there are no really correct answers.
>>> >> >>>>> I'll certainly avoid #1, as it is better to keep logic separate and
>>> >> >>>>> lightweight.
>>> >> >>>>>
>>> >> >>>>> If your downstream bolts are the same, then it makes sense to keep them
>>> >> >>>>> in the same topology, but if they are totally different, I'll keep them
>>> >> >>>>> in two different topologies. That will allow me to independently deploy
>>> >> >>>>> and scale each topology. But if the rest of the logic is the same, I
>>> >> >>>>> think topology scaling and resource utilization will be better with one
>>> >> >>>>> topology.
>>> >> >>>>>
>>> >> >>>>> I hope this helps..
>>> >> >>>>>
>>> >> >>>>> Sent somehow....
>>> >> >>>>>
>>> >> >>>>> > On Oct 11, 2015, at 9:07 AM, Ankur Garg <[email protected]> wrote:
>>> >> >>>>> >
>>> >> >>>>> > Hi,
>>> >> >>>>> >
>>> >> >>>>> > So I have a situation where I want to read messages from different
>>> >> >>>>> > queues hosted on a RabbitMQ server.
>>> >> >>>>> >
>>> >> >>>>> > Now, there are three ways I can think of to leverage Apache Storm
>>> >> >>>>> > here:
>>> >> >>>>> >
>>> >> >>>>> > 1) Use the same spout (say Spout A) to read messages from different
>>> >> >>>>> > queues and, based on the messages received, emit them to different
>>> >> >>>>> > bolts.
>>> >> >>>>> >
>>> >> >>>>> > 2) Use different spouts (Spout A, Spout B and so on) within the
>>> >> >>>>> > same topology (say Topology A) to read messages from different queues.
>>> >> >>>>> >
>>> >> >>>>> > 3) Use different spouts, one within each topology (Topology A,
>>> >> >>>>> > Topology B and so on), to read messages from different queues.
>>> >> >>>>> >
>>> >> >>>>> > Which is the best way to process this, considering I want high
>>> >> >>>>> > throughput (more queue messages processed concurrently)?
>>> >> >>>>> >
>>> >> >>>>> > Also, if I use the same topology for all spouts (currently the
>>> >> >>>>> > requirement is for 2 spouts), will a failure in one spout (or its
>>> >> >>>>> > associated bolts) affect the second, or will they both continue
>>> >> >>>>> > working separately even if there is some failure in Spout B?
>>> >> >>>>> >
>>> >> >>>>> > Cost-wise, how much would it be to maintain two different topologies?
>>> >> >>>>> >
>>> >> >>>>> > Looking for inputs from members here.
>>> >> >>>>> >
>>> >> >>>>> > Thanks
>>> >> >>>>> > Ankur
>>> >> >>>>> >
>>> >> >>>>> >
>>> >> >>>>>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>
>>> >> >
>>> >>
>>> >
>>>
>>
>>
>
