Here is what I would suggest: to protect against human error, launch the EC2 instances with the EC2 script. Copy the folders over as they are, since they come well integrated with HDFS and with compatible drivers and versions, then change the configuration to set up the slaves and masters. Sorry if this sounds offensive :) .. I found dealing with the various Hadoop incompatibilities very taxing.
Regards
Mayur
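P.S. In case it helps, here is a rough sketch of the launch step. I'm assuming "the EC2 script" means the spark-ec2 launcher shipped in the ec2/ directory of the Spark distribution; the keypair name, .pem path, slave count and cluster name below are just placeholders:

    # from the ec2/ directory of the Spark distribution
    ./spark-ec2 -k my-keypair -i ~/my-keypair.pem -s 15 launch my-spark-cluster

    # log in to the master, or tear the cluster down when you are done
    ./spark-ec2 -k my-keypair -i ~/my-keypair.pem login my-spark-cluster
    ./spark-ec2 destroy my-spark-cluster

As far as I know, the script also lays out HDFS and the Spark master/slave configuration on the launched instances, which is where most of the human error gets taken out.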
Mayur Rustagi
Ph: +919632149971
http://www.sigmoidanalytics.com
https://twitter.com/mayur_rustagi


On Sun, Jan 19, 2014 at 8:23 PM, Ognen Duzlevski <[email protected]> wrote:

> On Sun, Jan 19, 2014 at 2:49 PM, Ognen Duzlevski <[email protected]> wrote:
>
>> My basic requirement is to set everything up myself and understand it.
>> For testing purposes my cluster has 15 xlarge instances and I guess I will
>> just set up a hadoop cluster to run over these instances for the purposes
>> of getting the benefits of HDFS. I would then set up hdfs over S3 with
>> blocks.
>
> By this I mean I would set up a Hadoop cluster running in parallel on the
> same instances just for the purposes of running Spark over HDFS. Is this a
> reasonable approach? What kind of a performance penalty (memory, CPU
> cycles) am I going to incur by the Hadoop daemons running just for this
> purpose?
>
> Thanks!
> Ognen
