It depends on what you mean by "Hadoop". The core Hadoop project is MapReduce, YARN and HDFS. MapReduce is still in use as a workhorse, but it has been superseded by engines like Spark (or perhaps Flink). (Tez maps loosely to Spark Core, really, and is not a MapReduce replacement.)
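To make that concrete, here is what "Spark as the engine, core Hadoop underneath" typically looks like: a sketch of a spark-submit invocation, where the application class, jar and HDFS paths are hypothetical placeholders.

```shell
# A Spark application submitted to an existing Hadoop cluster: Spark replaces
# MapReduce as the execution engine, while YARN still schedules the job and
# HDFS still provides persistent storage. Class, jar and paths are made up.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.WordCount \
  wordcount.jar \
  hdfs:///user/demo/input \
  hdfs:///user/demo/output
```

Nothing about core Hadoop goes away here; only the MapReduce engine is swapped out.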
"Hadoop" can also be a catch-all term for projects typically used in conjunction with core Hadoop: Spark, Kafka, HBase, ZooKeeper, Solr, Parquet, Impala, Hive, etc.

If you mean the former -- mostly no. Spark needs a storage layer like HDFS for persistent storage, and needs to integrate with a cluster manager like YARN in order to share resources with other apps, but it does replace MapReduce. If you mean the latter -- no. Spark is a big piece of the broader picture and replaces several pieces (Mahout, maybe Crunch in some ways, Giraph, and arguably it takes on some of Hive's workloads), but it doesn't replace most of them.

Really, there's no reason to expect that one project will do everything. Core Hadoop most certainly wasn't enough to handle all the "Hadoop" workloads we see today. It's a false choice: you can use Spark *and* Hadoop-related projects, and that's the best of all.

On Thu, Apr 14, 2016 at 8:40 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> Hi,
>
> My two cents here.
>
> Hadoop as I understand it has two components, namely HDFS (Hadoop
> Distributed File System) and MapReduce.
>
> Whatever we use, I still think we need to store data on HDFS (excluding
> standalones like MongoDB etc.). Now, moving to MapReduce as the execution
> engine: it is replaced by Tez (basically MapReduce with DAG) or by Spark,
> which uses in-memory capabilities and DAG. MapReduce is the one moving
> sideways.
>
> To me, Spark, besides being versatile, is a powerful tool. Remember, tools
> are just tools, not solutions, so we can discuss this all day. Effectively
> I would argue that with Spark as the front-end tool, Hive and its
> organisation for metadata, plus HDFS as the storage layer, you have all
> three components to create a powerful solution.
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 14 April 2016 at 20:22, Andy Davidson <a...@santacruzintegration.com>
> wrote:
>>
>> Hi Ashok
>>
>> In general, if I was starting a new project and had not invested heavily
>> in Hadoop (i.e. had a large staff that was trained on Hadoop, had a lot
>> of existing projects implemented on Hadoop, …), I would probably start
>> using Spark. It's faster and easier to use.
>>
>> Your mileage may vary
>>
>> Andy
>>
>> From: Ashok Kumar <ashok34...@yahoo.com.INVALID>
>> Reply-To: Ashok Kumar <ashok34...@yahoo.com>
>> Date: Thursday, April 14, 2016 at 12:13 PM
>> To: "user @spark" <user@spark.apache.org>
>> Subject: Spark replacing Hadoop
>>
>> Hi,
>>
>> I hear some saying that Hadoop is getting old and out of date and will
>> be replaced by Spark!
>>
>> Does this make sense, and if so, how accurate is it?
>>
>> Best