Re: Storm patterns vis-a-vis external data storage

Nathan Leung Thu, 08 Jan 2015 07:47:21 -0800

I thought the Storm documentation indicates that noneGrouping is currently
equivalent to shuffleGrouping? Has this changed? If this is still the
case, I would recommend using localOrShuffleGrouping, which keeps tuples
inside the worker process whenever the target bolt has tasks there, and so
avoids serialization and network transfer.
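
As a rough illustration of the difference, the Java sketch below models how the two groupings pick a downstream task. It is a toy model, not Storm's implementation (it does not reproduce Storm's exact load-balancing behaviour), and the class name, task ids and worker layout are hypothetical. The local-or-shuffle choice prefers tasks hosted in the sender's own worker process and only falls back to all tasks when there are none.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Toy model (hypothetical, not Storm's code) of how a grouping picks a target task.
class LocalOrShuffleModel {

    private final List<Integer> allTargets;   // task ids of the downstream bolt
    private final List<Integer> localTargets; // subset hosted in this worker process
    private final Random random;

    LocalOrShuffleModel(List<Integer> allTargets, Set<Integer> tasksInThisWorker, long seed) {
        this.allTargets = new ArrayList<>(allTargets);
        this.localTargets = new ArrayList<>();
        for (int task : allTargets) {
            if (tasksInThisWorker.contains(task)) {
                localTargets.add(task);
            }
        }
        this.random = new Random(seed);
    }

    // Shuffle-like choice: any downstream task, local or remote.
    int chooseShuffle() {
        return allTargets.get(random.nextInt(allTargets.size()));
    }

    // Local-or-shuffle: stay in-process when a local task exists,
    // otherwise fall back to choosing among all tasks.
    int chooseLocalOrShuffle() {
        List<Integer> pool = localTargets.isEmpty() ? allTargets : localTargets;
        return pool.get(random.nextInt(pool.size()));
    }

    public static void main(String[] args) {
        // Hypothetical layout: storage-bolt tasks 5..8, of which 5 and 6
        // share a worker with the sending processing-bolt task.
        LocalOrShuffleModel m = new LocalOrShuffleModel(List.of(5, 6, 7, 8), Set.of(5, 6), 42L);
        for (int i = 0; i < 5; i++) {
            System.out.println("shuffle -> " + m.chooseShuffle()
                    + ", localOrShuffle -> " + m.chooseLocalOrShuffle());
        }
    }
}
```

The practical point the model captures: local-or-shuffle only avoids serialization and network transfer when the consuming bolt actually has tasks in the same worker; with no local tasks it behaves like a shuffle.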


On Thu, Jan 8, 2015 at 10:34 AM, Itai Frenkel <[email protected]> wrote:

>  Use noneGrouping between the two bolts so the only overhead is a thread
> context switch. Storm and Linux handle these context switches pretty
> well. Unless you are already at the stage of CPU usage optimization, I
> would not worry about it.
>  ------------------------------
> *From:* Hemanth Yamijala <[email protected]>
> *Sent:* Thursday, January 8, 2015 8:27 AM
> *To:* [email protected]
> *Subject:* Re: Storm patterns vis-a-vis external data storage
>
>  Itai & Jens,
>
>  Thank you for sharing your thoughts. My requirement is what Jens
> referred to as "exporting" data out of my topology.
>
>  I can clearly see the benefits of separating this functionality into
> another bolt, e.g. to scale it independently of the processing bolts
> or to accommodate changes.
>
>  The only negative (if it is one) seems to be the increase in the number of
> runtime bolt instances in the topology. I understand that this can be solved
> with more hardware resources and Storm's horizontal scalability. Also,
> it might be hard to quantify this precisely, given the different scaling
> requirements of the processing bolts and the I/O-bound bolts. Do you see
> this as a concern?
>
>  Thanks
> hemanth
>
> On Wed, Jan 7, 2015 at 9:39 PM, Jens-U. Mozdzen <[email protected]> wrote:
>
>> Hi Hemanth,
>>
>> Quoting Hemanth Yamijala <[email protected]>:
>>
>>> Hi all,
>>>
>>> I guess it is common to build topologies where message processing in
>>> Storm results in data that should be stored in external stores like NoSQL
>>> DBs or message queues like Kafka.
>>>
>>> There are two broad approaches to handling this storage:
>>>
>>> 1) Inline the storage functionality with the processing functionality -
>>> i.e. the bolt generating the info to be stored also takes care of storing
>>> it.
>>> 2) Separate out the two and make a downstream bolt responsible for the
>>> storage.
>>>
>>> I just wanted to see whether people on the list think there are advantages
>>> that favour one approach over the other, and whether there are any pitfalls
>>> to watch out for in either case.
>>>
>>
>> I'd say: it depends ;) In the case of aggregation bolts that persist their
>> state, you may want to limit the memory footprint of each bolt instance.
>> Implementing an in-memory cache for the persisted data is then pretty
>> helpful, but it means incorporating persistence access into each bolt.
>>
>> OTOH, if you plan to "export" data from your topology (which seems to be
>> the main focus of your question), separating calculation and "export" into
>> separate bolts seems a natural choice to me - especially when you consider
>> future changes (e.g. to support a different or possibly *additional* export
>> path - you can keep the "tuple interface" as it is and simply connect
>> different and/or additional export bolts).
>>
>> Regards,
>> Jens
>>
>>
>
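
Jens's point about aggregation bolts can be sketched as follows: a minimal Java model, assuming a write-through policy, of a bolt that keeps at most `capacity` running aggregates in an LRU cache and reloads evicted keys from the external store. `KeyValueStore`, `InMemoryStore` and `CachedCounter` are hypothetical names; in a real topology the store would be the NoSQL database and `increment` would be called from the bolt's `execute` method.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical aggregation logic as it might sit inside a bolt: at most
// `capacity` running counts in memory (LRU order), every update written
// through to the store, evicted keys reloaded from the store on demand.
class CachedCounter {
    private final KeyValueStore store;
    private final Map<String, Long> cache;

    CachedCounter(KeyValueStore store, int capacity) {
        this.store = store;
        // accessOrder = true makes iteration order least-recently-used first.
        this.cache = new LinkedHashMap<String, Long>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Long> eldest) {
                // Safe to drop: the value was already written through to the store.
                return size() > capacity;
            }
        };
    }

    long increment(String key) {
        Long current = cache.get(key);
        if (current == null) {
            Long persisted = store.get(key); // cache miss: read from the store
            current = persisted == null ? 0L : persisted;
        }
        long updated = current + 1;
        cache.put(key, updated);
        store.put(key, updated); // write-through
        return updated;
    }

    int cachedKeys() {
        return cache.size();
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        CachedCounter counter = new CachedCounter(store, 2);
        for (String word : new String[] {"a", "b", "a", "c", "b"}) {
            System.out.println(word + " -> " + counter.increment(word));
        }
        System.out.println("store reads: " + store.reads);
    }
}

// Hypothetical stand-in for an external store (NoSQL DB, etc.).
interface KeyValueStore {
    Long get(String key); // null if absent
    void put(String key, long value);
}

// In-memory fake used only to exercise the sketch; counts reads.
class InMemoryStore implements KeyValueStore {
    final Map<String, Long> data = new HashMap<>();
    int reads = 0;

    public Long get(String key) {
        reads++;
        return data.get(key);
    }

    public void put(String key, long value) {
        data.put(key, value);
    }
}
```

Write-through keeps eviction trivial (nothing is lost when an entry is dropped) at the cost of one store write per tuple; batching writes would reduce that, but makes eviction and failure handling more involved.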

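The second approach, with the export path kept behind the processing bolt's "tuple interface", can be modelled in plain Java as below. This is a hypothetical sketch, not Storm API: `Stream` stands in for the processing bolt's output stream, and each subscriber stands in for an export bolt wired to that stream with a grouping. The processing code does not change when an additional export path is subscribed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Toy model of "separate export bolts": processing only emits records;
// export steps subscribe to the emitted stream.
class ExportWiringModel {

    // Hypothetical record produced by the processing step.
    static final class Event {
        final String id;
        final long words;

        Event(String id, long words) {
            this.id = id;
            this.words = words;
        }

        @Override
        public String toString() {
            return id + "=" + words;
        }
    }

    // Stands in for the processing bolt's output stream.
    static final class Stream {
        private final List<Consumer<Event>> subscribers = new ArrayList<>();

        void subscribe(Consumer<Event> subscriber) {
            subscribers.add(subscriber);
        }

        void emit(Event e) {
            for (Consumer<Event> s : subscribers) {
                s.accept(e);
            }
        }
    }

    // Processing logic: counts words, knows nothing about where results end up.
    static void process(String id, String line, Stream out) {
        String trimmed = line.trim();
        long words = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        out.emit(new Event(id, words));
    }

    public static void main(String[] args) {
        List<String> queueExport = new ArrayList<>(); // hypothetical export target #1
        List<String> dbExport = new ArrayList<>();    // hypothetical export target #2

        Stream out = new Stream();
        out.subscribe(e -> queueExport.add(e.toString()));
        // An additional export path touches only the wiring, not process().
        out.subscribe(e -> dbExport.add(e.toString()));

        process("line-1", "storm keeps tuples flowing", out);
        System.out.println(queueExport + " " + dbExport); // prints [line-1=4] [line-1=4]
    }
}
```
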