Could you share the code you are using to create the StormTopology?

From the Storm UI screenshots above, it looks like every tuple emitted from
the spout tasks is sent to 4 different bolt instances (for the spouts, the
"transferred" count is 4x the "emitted" count).  I'm guessing that each spout
instance is emitting the tuple to each of worker1 through worker4.  Is this
what you expect the code to do?  Also, from the UI screenshot, worker1 and
worker3 are at > 90% capacity.

I'm not sure I understand your goal.  If it is just to push as many tuples
as possible through the topology without any particular field grouping,
etc., you could increase the throughput by making sure components are
co-located on the same worker.  By setting the parallelism of the spout and
of each bolt to the number of workers (assuming the EvenScheduler), you
would end up with one instance of each component on each worker.  If you
then use localOrShuffle grouping, you get rid of a lot of inter-worker
transfer overhead and should be able to increase throughput.  It really
depends on what you are trying to achieve with your topology, though.
Obviously this will not work if you need to use a field grouping.
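
For example, something along these lines (just a rough sketch of what I
mean, not your actual code; the component names, the WorkBolt class and
spoutConfig are placeholders, and it assumes 4 workers and the pre-1.0
storm-kafka KafkaSpout):

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;
import storm.kafka.KafkaSpout;

Config conf = new Config();
conf.setNumWorkers(4);

TopologyBuilder builder = new TopologyBuilder();
// one spout executor per worker
builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 4);
// one bolt executor per worker; localOrShuffle keeps a tuple on the
// same worker as the spout instance that emitted it whenever possible
builder.setBolt("work-bolt", new WorkBolt(), 4)
       .localOrShuffleGrouping("kafka-spout");

StormSubmitter.submitTopology("test", conf, builder.createTopology());

With that layout the tuples (usually) stay in-process instead of being
serialized and sent over the network between workers.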


On Sun, Jul 26, 2015 at 12:04 PM Dimitris Sarlis <[email protected]>
wrote:

>  No, I have a config parameter which changes how many random numbers are
> generated by the bolt's execute method to simulate a heavier task. The
> total number of messages is controlled by another parameter which I keep
> at the same value across my experiments.
>
>
> On 26/07/2015 07:09 PM, Enno Shioji wrote:
>
> I mean, could it be that by mistake you are generating more messages as
> you change the config, so the total test time just makes it appear as if
> there is no improvement?
>
> On Sun, Jul 26, 2015 at 5:08 PM, Enno Shioji <[email protected]> wrote:
>
>> This may be a silly guess, but you are not simply generating
>> proportionally more messages as you change the config, right?
>>
>> On Sun, Jul 26, 2015 at 4:53 PM, Dimitris Sarlis <[email protected]>
>> wrote:
>>
>>>  Kashyap,
>>>
>>> I put a logger before and after each emit in every bolt. For the spouts
>>> it's not so easy because I'm using the predefined KafkaSpout class. See
>>> the attached images from a test execution. I used 1 spout with
>>> parallelism 8 and 4 bolts with parallelism 2. I also include a screenshot
>>> from a bolt's log where you can see messages like "Sending record mpla
>>> mpla" and "After emit". These messages are written before and after each
>>> emit in a bolt.
>>>
>>> Dimitris
>>>
>>>
>>> On 26/07/2015 06:24 PM, Kashyap Mhaisekar wrote:
>>>
>>> Can you put loggers before and after emit() in each bolt/spout?
>>>
>>> Can you share Storm UI screenshots?
>>>
>>> Thanks
>>> Kashyap
>>>
>>>  On Sun, Jul 26, 2015, 10:08 Dimitris Sarlis <[email protected]>
>>> wrote:
>>>
>>>> Hi Harsha,
>>>>
>>>> 1. The number of topic partitions is always set to the total number of
>>>> spouts I'm using.
>>>> 2. I have checked that data from the Kafka producer are distributed
>>>> across all of these partitions.
>>>> 3. I've tried values from 4 to 20.
>>>> 4. 1000
>>>> 5. This topology is just for testing. Spouts get data from Kafka and
>>>> dispatch it to bolts. If a record has not been processed before, the
>>>> bolt generates some random numbers and then selects another bolt to
>>>> send the record to, appended with a "!". If the record has been
>>>> processed before (it has a "!" at the end), the bolt just generates
>>>> some random numbers (roughly like the sketch below).
>>>> 6. No
>>>> 7. No
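>>>>
>>>> Roughly, the bolt logic in (5) looks like the sketch below (this is a
>>>> simplified illustration, not the exact code; the "test-bolt" component
>>>> id and the "num.randoms" parameter name are just placeholders):
>>>>
>>>> import java.util.List;
>>>> import java.util.Map;
>>>> import java.util.Random;
>>>>
>>>> import backtype.storm.task.OutputCollector;
>>>> import backtype.storm.task.TopologyContext;
>>>> import backtype.storm.topology.OutputFieldsDeclarer;
>>>> import backtype.storm.topology.base.BaseRichBolt;
>>>> import backtype.storm.tuple.Fields;
>>>> import backtype.storm.tuple.Tuple;
>>>> import backtype.storm.tuple.Values;
>>>>
>>>> public class TestBolt extends BaseRichBolt {
>>>>     private OutputCollector collector;
>>>>     private List<Integer> peerTasks;
>>>>     private Random rand = new Random();
>>>>     private int numRandoms;
>>>>
>>>>     public void prepare(Map conf, TopologyContext context,
>>>>                         OutputCollector collector) {
>>>>         this.collector = collector;
>>>>         // task ids of the bolt tasks this bolt can emitDirect to
>>>>         this.peerTasks = context.getComponentTasks("test-bolt");
>>>>         // config knob controlling how much work each execute() does
>>>>         this.numRandoms = ((Number) conf.get("num.randoms")).intValue();
>>>>     }
>>>>
>>>>     public void execute(Tuple input) {
>>>>         String record = input.getString(0);
>>>>         for (int i = 0; i < numRandoms; i++) {
>>>>             rand.nextDouble();  // simulated work
>>>>         }
>>>>         if (!record.endsWith("!")) {
>>>>             // not seen before: forward to a randomly chosen bolt task
>>>>             int target = peerTasks.get(rand.nextInt(peerTasks.size()));
>>>>             collector.emitDirect(target, new Values(record + "!"));
>>>>         }
>>>>         collector.ack(input);
>>>>     }
>>>>
>>>>     public void declareOutputFields(OutputFieldsDeclarer declarer) {
>>>>         // declared as a direct stream because the bolts are wired
>>>>         // with directGrouping
>>>>         declarer.declare(true, new Fields("record"));
>>>>     }
>>>> }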
>>>>
>>>> Dimitris
>>>>
>>>> On 26/07/2015 05:52 PM, Harsha wrote:
>>>> > Hi Dimitris,
>>>> >
>>>> > 1. How many topic partitions do you have?
>>>> > 2. Make sure you are distributing data from the Kafka producer side
>>>> > into all of these partitions.
>>>> > 3. What is your KafkaSpout parallelism set to?
>>>> > 4. What is your topology.max.spout.pending set to?
>>>> > 5. If you can, briefly describe what the topology is doing.
>>>> > 6. Are you seeing anything under the failed column in the Storm UI?
>>>> > 7. Any errors in the Storm topology logs?
>>>> >
>>>> > Thanks,
>>>> > Harsha
>>>> >
>>>> > On Sat, Jul 25, 2015, at 05:29 AM, Dimitris Sarlis wrote:
>>>> >> Hi all,
>>>> >>
>>>> >> I'm trying to run a topology in Storm and I am facing some
>>>> >> scalability issues. Specifically, I have a topology where KafkaSpouts
>>>> >> read from a Kafka queue and emit messages to bolts which are connected
>>>> >> with each other through directGrouping (each bolt is connected with
>>>> >> itself as well as with each of the other bolts). The bolts subscribe
>>>> >> to the spouts with shuffleGrouping. I observe that when I increase the
>>>> >> number of spouts and bolts proportionally, I don't get the speedup I'm
>>>> >> expecting. In fact, my topology seems to run slower: for the same
>>>> >> amount of data, it takes more time to complete. For example, when I
>>>> >> increase the spouts from 4 to 8 and the bolts from 4 to 8, it takes
>>>> >> longer to process the same amount of Kafka messages.
>>>> >>
>>>> >> Any ideas why this is happening? Thanks in advance.
>>>> >>
>>>> >> Best,
>>>> >> Dimitris Sarlis
>>>>
>>>>
>>>
>>
>
>
