Thank you very much, Erik. What if I dedicate a separate disk to Storm logs? For example, 23 disks for Kafka and one additional disk for Storm.
In terms of the application, it would parse unstructured data, enrich it with additional data stored in HBase, send it to Elasticsearch/Solr, and store it on HDFS. Does that narrow down the use case enough to give a better understanding of the HW requirements?

Regards,
Ali

On Tue, Apr 11, 2017 at 12:04 PM, Erik Weathers <[email protected]> wrote:

> hi Ali,
>
> Unfortunately the answer to these questions is *very* dependent on your
> application logic in your storm topology, I don't think anyone can really
> speak to many of these questions, it's just too broad. You'll need to do
> your own profiling with your application and figure out your own particular
> resource needs.
>
> I wouldn't recommend running Storm colocated with Kafka on the same host.
> You *can* create a lot of disk IO if you log a lot from your storm
> topology, and you wouldn't want anything messing with Kafka's ability to
> use the disk (Kafka must read from disk if you read older data that isn't
> resident in memory, and Kafka is also writing everything to disk). But if
> you're using VMs you may not have control over whether Kafka brokers and
> Storm worker nodes get placed onto the same physical host.
>
> - Erik
>
> P.S., Storm isn't normally fully capitalized as you're writing (STORM);
> i.e., it's not an acronym.
>
> On Mon, Apr 10, 2017 at 6:16 PM, Ali Nazemian <[email protected]>
> wrote:
>
>> Hi all,
>>
>> I was wondering if there is any benchmark or any recommendation for
>> having physical HW vs. virtual for the STORM. I am trying to calculate the
>> HW requirements for a STORM Cluster with a hard SLA. My questions are as
>> follows.
>>
>> - How much on-heap and off-heap memory would be required per node? Is
>> there any additional improvement we may have by adding additional memory? I
>> think STORM supervisor is not a disk-intensive workload. Does it mean
>> on-heap memory is all that matters?
>>
>> - Is there any rule for calculating the number of required CPU cores per
>> supervisor node?
>>
>> - Since Storm is more CPU-intensive and not a Disk-intensive workload,
>> how bad would it be for STORM to coexist with a non-CPU-intensive workload
>> like Kafka-Broker?
>>
>> Regards,
>> Ali
>>
>
>

--
A.Nazemian
