Re: Ingesting data in elasticsearch from hdfs using spark, cluster setup and usage

2016-12-23 Thread Anastasios Zouzias
Hi Rohit, Since your instances are 16 GB dual-core only, I would suggest using dedicated nodes for Elasticsearch, with 8 GB for the Elasticsearch heap. This way you won't have any interference between Spark executors and Elasticsearch. Also, if possible, you could try to use SSD disks on these 3 machines for storage.
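The layout Anastasios suggests (half of each 16 GB machine for the Elasticsearch heap, the rest left to the OS filesystem cache, and the nodes kept dedicated to data) can be sketched roughly as below. The exact file names depend on the Elasticsearch version, so treat this as a hedged example, not the definitive setup:

```shell
# Elasticsearch 2.x-era: heap size via environment variable.
export ES_HEAP_SIZE=8g

# Elasticsearch 5.x: the same is set in config/jvm.options instead:
#   -Xms8g
#   -Xmx8g

# elasticsearch.yml -- keep these 3 nodes as dedicated data nodes:
#   node.data: true
#   node.master: false
```

Keeping heap at roughly half of RAM leaves the remainder for the filesystem cache that Lucene relies on heavily.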

Re: Ingesting data in elasticsearch from hdfs using spark, cluster setup and usage

2016-12-22 Thread Rohit Verma
The ingestion rate below is actually with a batch size of 10 MB, 10 records. I have tried with 20-50 partitions; higher partition counts give bulk queue exceptions. Anyway, thanks for the suggestion. I would appreciate more inputs, specifically on cluster design. Rohit
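Bulk queue exceptions like the ones Rohit mentions usually mean Elasticsearch cannot absorb the concurrent bulk requests, so the usual levers are fewer concurrent writers or smaller/retried batches. A hedged sketch of how those batch settings can be passed through spark-submit, assuming the elasticsearch-hadoop connector (which recognizes its `es.*` options when prefixed with `spark.`); the values and the `ingest_job.py` script name are illustrative:

```shell
# Illustrative flags -- es.batch.* belong to the elasticsearch-hadoop connector.
spark-submit \
  --conf spark.es.batch.size.bytes=10mb \
  --conf spark.es.batch.size.entries=10000 \
  --conf spark.es.batch.write.retry.count=3 \
  --conf spark.es.batch.write.retry.wait=10s \
  ingest_job.py
```

Lowering the batch sizes or raising the retry wait gives the Elasticsearch bulk queue room to drain between requests.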

Re: Ingesting data in elasticsearch from hdfs using spark, cluster setup and usage

2016-12-22 Thread genia...@gmail.com
One thing I would look at is how many partitions your dataset has before writing to ES using Spark, as that may be the limiting factor for your parallel writes. You can also tune the batch size on ES writes. One more thing: make sure you have enough network bandwidth. Regards, Yang
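Since each Spark partition becomes one concurrent writer into Elasticsearch, the partition count is worth choosing deliberately. A rough, hedged heuristic (the 128 MB target and the cap of 50 are illustrative assumptions, not connector defaults — the cap reflects the thread's observation that too many partitions trigger bulk queue exceptions):

```python
import math

def suggest_partitions(dataset_bytes,
                       target_partition_bytes=128 * 1024 * 1024,
                       max_partitions=50):
    """Rough sizing heuristic: one Spark write task per ~128 MB of input,
    capped so concurrent bulk requests don't overwhelm the ES bulk queue.
    Both thresholds are illustrative assumptions, not measured values."""
    raw = math.ceil(dataset_bytes / target_partition_bytes)
    return max(1, min(raw, max_partitions))

# A 10 GiB dataset: 80 partitions by size alone, capped to 50.
print(suggest_partitions(10 * 1024**3))  # -> 50
```

In a Spark job you would then apply the result with something like `df.repartition(n)` (or `rdd.repartition(n)`) just before the write to Elasticsearch.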

Ingesting data in elasticsearch from hdfs using spark, cluster setup and usage

2016-12-22 Thread Rohit Verma
I am setting up a Spark cluster. I have HDFS data nodes and Spark master nodes on the same instances. To add Elasticsearch to this cluster, should I spawn ES on a different machine or on the same machines? I have only 12 machines: 1 master (Spark and HDFS), 8 Spark workers and HDFS data nodes, and I can use 3 nodes for Elasticsearch.

Re: spark cluster setup

2015-08-03 Thread Akhil Das
Are you sitting behind a firewall and accessing a remote master machine? In that case, have a look at http://spark.apache.org/docs/latest/configuration.html#networking; you might want to fix a few properties like spark.driver.host, spark.driver.port, etc. Thanks, Best Regards
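Pinning the driver's address and ports, as Akhil suggests, lets you open exactly those ports on the firewall instead of relying on Spark's random port selection. A minimal sketch using properties from the networking configuration page linked above; the hostname and port numbers are placeholders:

```shell
# Placeholder host/ports -- open these on the firewall toward the cluster.
spark-submit \
  --conf spark.driver.host=driver.example.com \
  --conf spark.driver.port=51000 \
  --conf spark.blockManager.port=51001 \
  my_app.jar
```

With these fixed, executors can always reach the driver on known ports rather than ephemeral ones.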

Re: spark cluster setup

2015-08-02 Thread Sonal Goyal
Your master log files will be in the logs directory under the Spark home folder on the master machine. Do they show an error? Best Regards, Sonal Founder, Nube Technologies Check out Reifier at Spark Summit 2015
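On a standalone master, the log Sonal refers to typically lands under `$SPARK_HOME/logs` with the daemon class in the file name. A quick way to check (the exact file-name pattern can vary by Spark version and packaging):

```shell
# Tail the standalone master's log on the master machine.
# Typical name: spark-<user>-org.apache.spark.deploy.master.Master-1-<host>.out
tail -n 100 "$SPARK_HOME"/logs/*Master*.out
```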

Re: spark cluster setup

2015-08-02 Thread Sonal Goyal
What do the master logs show? Best Regards, Sonal Founder, Nube Technologies

Re: Spark Cluster Setup

2015-04-27 Thread Denny Lee
Similar to what Dean called out, we built Puppet manifests so we could do the automation. It's a bit of work to set up, but well worth the effort.

Re: Spark Cluster Setup

2015-04-24 Thread Dean Wampler
It's mostly manual. You could try automating with something like Chef, of course, but there's nothing already available in terms of automation. dean Dean Wampler, Ph.D. Author: Programming Scala, 2nd Edition (O'Reilly) Typesafe

Re: Spark Cluster Setup

2015-04-24 Thread James King
Thanks Dean. Sure, I have that set up locally and am testing it with ZK. But to start my multiple masters, do I need to go to each host and start one there, or is there a better way to do this? Regards, jk
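There is no conf/masters analogue for standalone masters, so each master process does get started on its own host; the usual shortcut is to script it. A minimal hedged sketch, assuming passwordless SSH, the same Spark install path everywhere, and hypothetical hostnames master1 and master2:

```shell
# Start a standalone master on each HA candidate host (hostnames are placeholders).
for h in master1 master2; do
  ssh "$h" "$SPARK_HOME/sbin/start-master.sh"
done
```

Workers then register against all masters, e.g. spark://master1:7077,master2:7077.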

Re: Spark Cluster Setup

2015-04-24 Thread Dean Wampler
The convention for a standalone cluster is to use ZooKeeper to manage master failover: http://spark.apache.org/docs/latest/spark-standalone.html Dean Wampler, Ph.D. Author: Programming Scala, 2nd Edition (O'Reilly) Typesafe
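The ZooKeeper-based failover from the linked docs is enabled through SPARK_DAEMON_JAVA_OPTS in conf/spark-env.sh on every master. A sketch with placeholder ZooKeeper addresses:

```shell
# conf/spark-env.sh on each standalone master (zk addresses are placeholders)
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181 \
  -Dspark.deploy.zookeeper.dir=/spark"
```

One master is elected leader; if it dies, a standby takes over and running applications continue against the new leader.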

Spark Cluster Setup

2015-04-24 Thread James King
I'm trying to find out how to set up a resilient Spark cluster. Things I'm thinking about include: - How do I start multiple masters on different hosts? There isn't a conf/masters file, from what I can see. Thank you.