We use both Databricks and EMR. We use Databricks for our exploratory / ad hoc
use cases because their notebook is pretty badass and better than Zeppelin IMHO.

We use EMR for our production machine learning and ETL tasks. The nice thing 
about EMR is you can use applications other than Spark. From a "tools in the
toolbox" perspective this is very important.

M

> On Jan 28, 2016, at 6:05 PM, Sourav Mazumder <sourav.mazumde...@gmail.com> 
> wrote:
> 
> You can also try out IBM's Spark as a Service in IBM Bluemix. There you'll get
> all the required features for security, multitenancy, notebooks, and integration
> with other big data services. You can try that out for free too.
> 
> Regards,
> Sourav
> 
> On Thu, Jan 28, 2016 at 2:10 PM, Rakesh Soni <rak...@databricks.com> wrote:
>>>> At its core, EMR just launches Spark applications, whereas Databricks is a 
>>>> higher-level platform that also includes multi-user support, an 
>>>> interactive UI, security, and job scheduling.
>>>> 
>>>> Specifically, Databricks runs standard Spark applications inside a user’s 
>>>> AWS account, similar to EMR, but it adds a variety of features to create 
>>>> an end-to-end environment for working with Spark. These include:
>>>> 
>>>> - Interactive UI (includes a workspace with notebooks, dashboards, a job
>>>>   scheduler, point-and-click cluster management)
>>>> - Cluster sharing (multiple users can connect to the same cluster, saving
>>>>   cost)
>>>> - Security features (access controls to the whole workspace)
>>>> - Collaboration (multi-user access to the same notebook, revision control,
>>>>   and IDE and GitHub integration)
>>>> - Data management (support for connecting different data sources to Spark,
>>>>   caching service to speed up queries)
>>>> 
>>>> The idea is that a lot of Spark deployments soon need to bring in multiple
>>>> users, different types of jobs, etc., and we want to have these built in.
>>>> But if you just want to connect to existing data and run jobs, that also
>>>> works.
>>>> 
>>>> The cluster manager in Databricks is based on Standalone mode, not YARN, 
>>>> but Databricks adds several features, such as allowing multiple users to 
>>>> run commands on the same cluster and running multiple versions of Spark. 
>>>> Because Databricks is also the team that initially built Spark, the 
>>>> service is very up to date and integrated with the newest Spark features 
>>>> -- e.g. you can run previews of the next release, any data in Spark can be 
>>>> displayed visually, etc.
>>>> 
>>>> From: Alex Nastetsky <alex.nastet...@vervemobile.com>
>>>> Subject: Databricks Cloud vs AWS EMR
>>>> Date: January 26, 2016 at 11:55:41 AM PST
>>>> To: user <user@spark.apache.org>
>>>> 
>>>> As a user of AWS EMR (running Spark and MapReduce), I am interested in
>>>> potential benefits that I may gain from Databricks Cloud. I was wondering
>>>> if anyone has used both and done a comparison / contrast between the two
>>>> services.
>>>> 
>>>> In general, which resource manager(s) does Databricks Cloud use for Spark? 
>>>> If it's YARN, can you also run MapReduce jobs in Databricks Cloud?
>>>> 
>>>> Thanks.
>> --
> 
