On Sun, Jan 19, 2014 at 2:49 PM, Ognen Duzlevski <[email protected]> wrote:
> My basic requirement is to set everything up myself and understand it. For
> testing purposes my cluster has 15 xlarge instances, and I guess I will just
> set up a Hadoop cluster running over these instances to get the benefits of
> HDFS. I would then set up HDFS over S3 with blocks. By this I mean I would
> run a Hadoop cluster in parallel on the same instances purely so that Spark
> can run over HDFS. Is this a reasonable approach? What kind of performance
> penalty (memory, CPU cycles) am I going to incur from the Hadoop daemons
> running just for this purpose?
>
> Thanks!
> Ognen
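
For concreteness, here is a minimal sketch of what reading S3-backed data from
Spark through Hadoop's legacy s3:// block filesystem can look like. The bucket
name, credentials, and path below are hypothetical placeholders;
fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey are the configuration keys
the old block filesystem used, and s3n:// is the sibling scheme that reads
plain S3 objects instead of block-format data:

    import org.apache.spark.{SparkConf, SparkContext}

    object S3BlockFsSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("S3BlockFsSketch")
        val sc   = new SparkContext(conf)

        // Hypothetical credentials; these keys configure Hadoop's legacy
        // s3:// block filesystem, which stores files as blocks in a bucket.
        sc.hadoopConfiguration.set("fs.s3.awsAccessKeyId", "YOUR_ACCESS_KEY")
        sc.hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "YOUR_SECRET_KEY")

        // s3://  -> block-format data written by the Hadoop S3 block filesystem
        // s3n:// -> ordinary S3 objects, readable by other tools as well
        val lines = sc.textFile("s3://my-bucket/data/input.txt") // hypothetical path
        println(s"line count: ${lines.count()}")

        sc.stop()
      }
    }

Note this path goes straight to S3 and does not require the local HDFS
daemons at all; a colocated HDFS cluster only pays off if you want data
locality on the instance disks.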
