@Mich I understand why I would need ZooKeeper. It is there for fault tolerance:
since Spark is a master-slave architecture, when a master goes down ZooKeeper
runs a leader election algorithm to elect a new one. However, our DevOps folks
hate ZooKeeper; they would be much happier with etcd or Consul, and it looks
like if we use the Mesos scheduler we should be able to drop ZooKeeper
altogether.
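For context, my understanding is that standalone-master HA gets wired up to
ZooKeeper roughly like this in spark-env.sh on each master (the host names and
the znode path below are just placeholders, not our environment):

    export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
      -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181 \
      -Dspark.deploy.zookeeper.dir=/spark"

with workers and applications then pointing at both masters, e.g.
spark://master1:7077,master2:7077, so whichever master wins the election takes
over.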
HDFS I am still trying to understand: why would I need it for Spark? I
understand the purpose of distributed file systems in general, but not in the
context of Spark, since many people say you can run a distributed Spark cluster
in standalone mode; I am just not sure what the pros and cons are if we do it
that way. In the Hadoop world I understand that one of the reasons HDFS is
there is replication: in other words, if we write some data to HDFS it will
store each block across different nodes, so that if one of the nodes goes down
the block can still be retrieved from the other nodes. In the context of Spark
I am not really sure, because 1) I am new and 2) the Spark paper says it
doesn't replicate data; instead it stores the lineage (all the transformations)
so that it can reconstruct lost data.
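Just to check my understanding of the lineage point, here is a toy Scala sketch
from the spark-shell (the file path and transformations are made up for
illustration, and sc is the shell's SparkContext):

    // Read from a plain local path -- no HDFS involved.
    val events = sc.textFile("file:///data/events.log")

    // Each transformation only records how to compute its partitions
    // (the lineage); nothing is replicated.
    val errors = events.filter(_.contains("ERROR"))
    val counts = errors.map(line => (line.split(",")(0), 1))
                       .reduceByKey(_ + _)

    // If an executor dies, the lost partitions are recomputed from the
    // lineage by re-reading the source file.
    counts.count()

If that is right, then lineage only helps as long as the original input is
still reachable from every node, which I guess is where a shared, replicated
store like HDFS would come back into the picture?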

On Thu, Aug 25, 2016 9:18 AM, Mich Talebzadeh mich.talebza...@gmail.com wrote:
You can use Spark on Oracle as a query tool.
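Something along these lines works (connection details and the table/column
names below are just placeholders, and the Oracle JDBC driver jar has to be on
the classpath):

    val sales = sqlContext.read
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCL")  // placeholder
      .option("dbtable", "scott.sales")                            // placeholder
      .option("user", "scott")
      .option("password", "tiger")
      .load()

    // Query it like any other DataFrame; the aggregation runs in Spark.
    sales.groupBy("channel").count().show()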
It all depends on the mode of operation.
If you are running Spark with yarn-client/cluster then you will need YARN. It
comes as part of Hadoop core (HDFS, MapReduce and YARN).
I have not gone and installed YARN without installing Hadoop.
What is the overriding reason to have Spark on its own?
You can use Spark in local or standalone mode if you do not want the Hadoop core.
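For example, the same application jar can be submitted in any of these modes;
only --master (and optionally --deploy-mode) changes. Host names, the class and
the jar name below are placeholders:

    # Local mode -- no cluster manager, no Hadoop at all
    spark-submit --master local[*] --class com.acme.MyApp myapp.jar

    # Standalone mode -- Spark's own master/workers, no YARN or HDFS required
    spark-submit --master spark://spark-master:7077 --class com.acme.MyApp myapp.jar

    # YARN mode -- needs a Hadoop/YARN cluster underneath
    spark-submit --master yarn --deploy-mode cluster --class com.acme.MyApp myapp.jar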
HTH
Dr Mich Talebzadeh
LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com

On 24 August 2016 at 21:54, kant kodali < kanth...@gmail.com > wrote:
What do I lose if I run Spark without using HDFS or ZooKeeper? Which of them is
almost a must in practice?
