There are so many variables. Again, I think you just need to run the
application and profile it. Maybe run it on VMs first to get some
profiling info, and then decide whether you need real hardware. This
isn't something you can expect a mailing list to solve for you: Storm is
*really* just a bunch of APIs and interfaces for running Java code
(ignoring the shell-based support for logic written in other languages).
So the question boils down to: what kind of hardware do I need to run
some arbitrary program? I don't see how anyone can answer that for you.
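
If you want a cheap starting point for that profiling, Storm's built-in
metrics can give you rough per-component numbers without an external
profiler. Something like this before submitting the topology (just a
sketch, untested; the 10-second bucket is an arbitrary example):

    import org.apache.storm.Config;
    import org.apache.storm.metric.LoggingMetricsConsumer;

    Config conf = new Config();
    // Dump the built-in metrics (execute latency, ack counts, etc.)
    // to metrics.log on each worker, sampled every 10 seconds.
    conf.registerMetricsConsumer(LoggingMetricsConsumer.class, 1);
    conf.put(Config.TOPOLOGY_BUILTIN_METRICS_BUCKET_SIZE_SECS, 10);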

If you wanna co-locate them you could choose to do so; I just like to
keep things separate. These are large, complex systems, and I don't see
what benefit you'd gain from shoving them onto the same host.

- Erik

On Mon, Apr 10, 2017 at 7:18 PM, Ali Nazemian <[email protected]> wrote:

> Thank you very much, Erik.
>
> What if I dedicate a separate disk to Storm logs? Say, for example,
> dedicating 23 disks to Kafka and one additional disk to Storm.
>
> As for the application, it would parse unstructured data, enrich it with
> additional data stored in HBase, send the result to Elasticsearch/Solr,
> and also store it on HDFS. Does that narrow down the use case enough to
> get a better understanding of the HW requirements?
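>
> To make it concrete, I imagine the topology wiring would be roughly like
> the sketch below (spoutConf and the bolt classes are placeholders for
> logic that doesn't exist yet, and the parallelism hints are guesses):
>
>     import org.apache.storm.kafka.KafkaSpout;
>     import org.apache.storm.topology.TopologyBuilder;
>
>     TopologyBuilder builder = new TopologyBuilder();
>     builder.setSpout("kafka-spout", new KafkaSpout(spoutConf), 4);
>     builder.setBolt("parse", new ParseBolt(), 8)
>            .shuffleGrouping("kafka-spout");
>     builder.setBolt("hbase-enrich", new HBaseEnrichBolt(), 8)
>            .shuffleGrouping("parse");
>     builder.setBolt("search-writer", new SearchIndexerBolt(), 4)
>            .shuffleGrouping("hbase-enrich");
>     builder.setBolt("hdfs-writer", new HdfsWriterBolt(), 4)
>            .shuffleGrouping("hbase-enrich");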
>
> Regards,
> Ali
>
> On Tue, Apr 11, 2017 at 12:04 PM, Erik Weathers <[email protected]>
> wrote:
>
>> hi Ali,
>>
>> Unfortunately, the answer to these questions is *very* dependent on the
>> application logic in your Storm topology. I don't think anyone can really
>> speak to many of them; it's just too broad. You'll need to profile your
>> own application and figure out your own particular resource needs.
>>
>> I wouldn't recommend running Storm co-located with Kafka on the same
>> host. You *can* create a lot of disk IO if you log heavily from your
>> Storm topology, and you wouldn't want anything interfering with Kafka's
>> use of the disk (Kafka must read from disk when consumers request older
>> data that isn't resident in memory, and it is also writing everything to
>> disk). But if you're using VMs, you may not have control over whether
>> Kafka brokers and Storm worker nodes land on the same physical host.
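>>
>> If you do log from your bolts, one cheap way to keep the disk IO down is
>> to sample rather than log per tuple -- a rough sketch (untested; the
>> threshold is arbitrary, and LOG is an org.slf4j.Logger, which Storm
>> already ships with):
>>
>>     // Inside your bolt implementation:
>>     private long processed = 0;
>>
>>     @Override
>>     public void execute(Tuple tuple) {
>>         // ... actual processing ...
>>         if (++processed % 10_000 == 0) {
>>             // Log once per 10k tuples instead of once per tuple.
>>             LOG.info("processed {} tuples", processed);
>>         }
>>         collector.ack(tuple);
>>     }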
>>
>> - Erik
>>
>> P.S. Storm isn't normally fully capitalized the way you're writing it
>> (STORM); i.e., it's not an acronym.
>>
>> On Mon, Apr 10, 2017 at 6:16 PM, Ali Nazemian <[email protected]>
>> wrote:
>>
>>> Hi all,
>>>
>>> I was wondering whether there is any benchmark or recommendation on
>>> physical vs. virtual HW for STORM. I am trying to calculate the HW
>>> requirements for a STORM cluster with a hard SLA. My questions are as
>>> follows.
>>>
>>> - How much on-heap and off-heap memory would be required per node? Would
>>> adding more memory yield any further improvement? I think the STORM
>>> supervisor is not a disk-intensive workload. Does that mean on-heap
>>> memory is all that matters?
>>>
>>> - Is there any rule for calculating the number of required CPU cores per
>>> supervisor node?
>>>
>>> - Since Storm is CPU-intensive rather than disk-intensive, how bad would
>>> it be to co-locate STORM with a non-CPU-intensive workload like a Kafka
>>> broker?
>>>
>>> Regards,
>>> Ali
>>>
>>
>>
>
>
> --
> A.Nazemian
>
