There are so many variables. Again, I think you just need to run the application and profile it. Maybe just run it on VMs to get some profiling info and then determine whether you need real h/w. This isn't something you can expect a mailing list to really solve for you: Storm is *really* just a bunch of APIs and interfaces for running Java code (ignoring the shell-based stuff for running logic written in other languages). So the question boils down to: what type of h/w do I need to run some arbitrary program? I don't see how anyone can answer that for you.
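If it helps as a starting point, Storm can give you basic per-component numbers (execute/process latency, emitted/acked counts) without an external profiler. A minimal sketch, using the built-in LoggingMetricsConsumer, which dumps those metrics into each worker's metrics.log:

    import org.apache.storm.Config;
    import org.apache.storm.metric.LoggingMetricsConsumer;

    Config conf = new Config();
    // Attach the built-in metrics consumer (parallelism hint of 1) so each
    // worker logs per-component stats to its metrics.log file.
    conf.registerMetricsConsumer(LoggingMetricsConsumer.class, 1);
    // Flush the built-in metrics every 60 seconds.
    conf.put(Config.TOPOLOGY_BUILTIN_METRICS_BUCKET_SIZE_SECS, 60);

Run your topology on VMs under realistic load with that enabled, watch those numbers and the capacity column in Storm UI, and you'll have actual data to size from.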
If you wanna co-locate them you could choose to do so; I just like to keep this stuff separate. These are large, complex systems, and I don't know what benefit you'd gain from shoving them onto the same host. (If you do end up co-locating them, you can at least declare what each component needs so Storm's scheduler can account for it; there's a sketch below the quoted thread.)

- Erik

On Mon, Apr 10, 2017 at 7:18 PM, Ali Nazemian <[email protected]> wrote:

> Thank you very much, Erik.
>
> What if I dedicate a separate disk to Storm logs? Let's say, for example,
> dedicating 23 disks to Kafka and an additional disk to Storm.
>
> In terms of the application, mine would parse unstructured data, enrich
> it with additional data stored in HBase, and send it to Elasticsearch/Solr
> as well as store it on HDFS. Does that narrow the use case down enough to
> get a better understanding of the HW requirements?
>
> Regards,
> Ali
>
> On Tue, Apr 11, 2017 at 12:04 PM, Erik Weathers <[email protected]>
> wrote:
>
>> hi Ali,
>>
>> Unfortunately, the answer to these questions is *very* dependent on the
>> application logic in your Storm topology. I don't think anyone can really
>> speak to many of these questions; they're just too broad. You'll need to
>> profile your application yourself and figure out your own particular
>> resource needs.
>>
>> I wouldn't recommend running Storm co-located with Kafka on the same
>> host. You *can* create a lot of disk IO if you log a lot from your Storm
>> topology, and you wouldn't want anything messing with Kafka's ability to
>> use the disk (Kafka must read from disk if you read older data that isn't
>> resident in memory, and Kafka is also writing everything to disk). But if
>> you're using VMs, you may not have control over whether Kafka brokers and
>> Storm worker nodes get placed onto the same physical host.
>>
>> - Erik
>>
>> P.S., Storm isn't normally fully capitalized the way you're writing it
>> (STORM); i.e., it's not an acronym.
>>
>> On Mon, Apr 10, 2017 at 6:16 PM, Ali Nazemian <[email protected]>
>> wrote:
>>
>>> Hi all,
>>>
>>> I was wondering whether there is any benchmark or recommendation for
>>> physical vs. virtual HW for STORM. I am trying to calculate the HW
>>> requirements for a STORM cluster with a hard SLA. My questions are as
>>> follows.
>>>
>>> - How much on-heap and off-heap memory would be required per node? Is
>>> there any additional improvement we might get by adding more memory? I
>>> think the STORM supervisor is not a disk-intensive workload. Does that
>>> mean on-heap memory is all that matters?
>>>
>>> - Is there any rule for calculating the number of CPU cores required
>>> per supervisor node?
>>>
>>> - Since Storm is more CPU-intensive than disk-intensive, how bad would
>>> it be for STORM to coexist with a non-CPU-intensive workload like a
>>> Kafka broker?
>>>
>>> Regards,
>>> Ali
>>>
>>
>
> --
> A.Nazemian
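P.S., here's the sketch I mentioned above, re: the memory/CPU-core questions and co-location. Rather than trying to size whole nodes up front, you can declare per-component needs once you have profiled, and let Storm's resource-aware scheduler (available since 1.0; set storm.scheduler to org.apache.storm.scheduler.resource.ResourceAwareScheduler on nimbus) do the packing. This is a minimal sketch only; ParserSpout, EnrichBolt, and every number in it are placeholders standing in for your own components and profiled values:

    import org.apache.storm.Config;
    import org.apache.storm.topology.TopologyBuilder;

    TopologyBuilder builder = new TopologyBuilder();

    // ParserSpout and EnrichBolt are hypothetical stand-ins for your
    // parsing spout and HBase-enrichment bolt.
    builder.setSpout("parser", new ParserSpout(), 2)
           .setCPULoad(50.0);                 // ~50% of one core per executor

    builder.setBolt("enrich", new EnrichBolt(), 4)
           .shuffleGrouping("parser")
           .setMemoryLoad(512.0, 256.0)       // on-heap MB, off-heap MB per executor
           .setCPULoad(100.0);                // one full core per executor

    Config conf = new Config();
    // Per-worker heap cap; the scheduler packs executors up to this limit.
    conf.put(Config.TOPOLOGY_WORKER_MAX_HEAP_SIZE_MB, 2048.0);

None of those numbers mean anything until you've profiled, which loops back to my first point above.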
