It depends on what you mean by "Hadoop". The core Hadoop project is MapReduce, YARN and HDFS. MapReduce is still in use as a workhorse, but it has been superseded by engines like Spark (or perhaps Flink). (Tez maps loosely to Spark Core, really, and is not a MapReduce replacement.)
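To make that concrete, here is what "Spark as the engine, core Hadoop underneath" typically looks like: a sketch of a spark-submit invocation, where the application class, jar and HDFS paths are hypothetical placeholders.

```shell
# A Spark application submitted to an existing Hadoop cluster: Spark replaces
# MapReduce as the execution engine, while YARN still schedules the job and
# HDFS still provides persistent storage. Class, jar and paths are made up.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.WordCount \
  wordcount.jar \
  hdfs:///user/demo/input \
  hdfs:///user/demo/output
```

Nothing about core Hadoop goes away here; only the MapReduce engine is swapped out.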
"Hadoop" can also be a catch-all term for projects typically used in conjunction with core Hadoop: Spark, Kafka, HBase, ZooKeeper, Solr, Parquet, Impala, Hive, etc.

If you mean the former -- mostly no. Spark needs a storage layer like HDFS for persistent storage, and needs to integrate with a cluster manager like YARN in order to share resources with other apps, but it does replace MapReduce. If you mean the latter -- no. Spark is a big piece of the broader picture and replaces several pieces (Mahout, maybe Crunch in some ways, Giraph, and arguably it takes on some of Hive's workloads), but it doesn't replace most of them.

Really, there's no reason to expect that one project will do everything. Core Hadoop most certainly wasn't enough to handle all the "Hadoop" workloads we see today. It's a false choice: you can use Spark *and* Hadoop-related projects, and that's the best of all.

On Thu, Apr 14, 2016 at 8:40 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> Hi,
>
> My two cents here.
>
> Hadoop as I understand it has two components, namely HDFS (Hadoop
> Distributed File System) and MapReduce.
>
> Whatever we use, I still think we need to store data on HDFS (excluding
> standalones like MongoDB etc.). Now, moving to MapReduce as the execution
> engine: it is replaced by Tez (basically MapReduce with DAG) or by Spark,
> which uses in-memory capabilities and DAG. MapReduce is the one moving
> sideways.
>
> To me, Spark, besides being versatile, is a powerful tool. Remember, tools
> are just tools, not solutions, so we can discuss this all day. Effectively
> I would argue that with Spark as the front-end tool, Hive and its
> organisation for metadata, plus HDFS as the storage layer, you have all
> three components to create a powerful solution.
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 14 April 2016 at 20:22, Andy Davidson <a...@santacruzintegration.com>
> wrote:
>>
>> Hi Ashok
>>
>> In general, if I was starting a new project and had not invested heavily
>> in Hadoop (i.e. had a large staff that was trained on Hadoop, had a lot
>> of existing projects implemented on Hadoop, …), I would probably start
>> using Spark. It's faster and easier to use.
>>
>> Your mileage may vary
>>
>> Andy
>>
>> From: Ashok Kumar <ashok34...@yahoo.com.INVALID>
>> Reply-To: Ashok Kumar <ashok34...@yahoo.com>
>> Date: Thursday, April 14, 2016 at 12:13 PM
>> To: "user @spark" <user@spark.apache.org>
>> Subject: Spark replacing Hadoop
>>
>> Hi,
>>
>> I hear some saying that Hadoop is getting old and out of date and will
>> be replaced by Spark!
>>
>> Does this make sense, and if so, how accurate is it?
>>
>> Best