Hi Rohit,
Since your instances only have 16 GB of RAM and two cores each, I would suggest
using dedicated nodes for Elasticsearch, with 8 GB allocated to the Elasticsearch
heap. That way there is no interference between the Spark executors and
Elasticsearch.
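The 8 GB figure follows the usual Elasticsearch guidance: give the heap at most half of the machine's RAM (the rest is left for the OS filesystem cache, which Lucene relies on) and stay below ~32 GB so compressed object pointers remain enabled. A minimal sketch of that rule of thumb (the 16 GB node size is from this thread; the caps are standard ES guidance, not exact values):

```python
def es_heap_gb(ram_gb):
    """Suggest an Elasticsearch heap size: at most half of RAM
    (leaving the rest for the filesystem cache), capped below 32 GB
    so compressed object pointers stay enabled."""
    return min(ram_gb // 2, 31)

print(es_heap_gb(16))  # 8 GB heap on a 16 GB node, as suggested above
```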
Also, if possible, you could use SSD disks on these 3 machines for storage.
The ingestion rate below is with a batch size of 10 MB, 10 records. I have
tried with 20-50 partitions; higher partition counts give bulk queue
exceptions.
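On the bulk queue exceptions: each concurrently running Spark write task sends its own bulk requests, and each ES node has a bounded bulk thread pool plus a bounded queue (`thread_pool.bulk` settings); requests beyond that capacity are rejected, which the connector surfaces as the exceptions above. A rough back-of-the-envelope check, assuming (hypothetically) that concurrency scales with the number of writing tasks and that the defaults apply; check your nodes' actual settings:

```python
def bulk_rejections_likely(writing_tasks, data_nodes,
                           bulk_threads_per_node, bulk_queue_size):
    """Rough heuristic: if the number of concurrently writing Spark
    tasks exceeds the cluster-wide bulk capacity (active bulk threads
    plus queue slots across all data nodes), ES will reject the
    overflow and the connector raises bulk queue exceptions."""
    capacity = data_nodes * (bulk_threads_per_node + bulk_queue_size)
    return writing_tasks > capacity

# Illustrative numbers only: 3 dedicated ES nodes, 2 bulk threads each
# (one per core), and a hypothetical queue size of 50 per node.
print(bulk_rejections_likely(50, 3, 2, 50))   # within capacity
print(bulk_rejections_likely(200, 3, 2, 50))  # likely to be rejected
```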
Anyway, thanks for the suggestion. I would appreciate more input, specifically
on cluster design.
Rohit
> On Dec 22, 2016, at 1
One thing I would look at is how many partitions your dataset has before
writing to ES with Spark, as that may be the limiting factor for your parallel
writes.
You can also tune the batch size on ES writes...
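For reference, the batch size on ES writes is controlled by the elasticsearch-hadoop connector settings; a sketch of the relevant options (the values shown are illustrative placeholders, not recommendations from this thread):

```
es.batch.size.bytes = 10mb     # flush a bulk request after ~10 MB of data
es.batch.size.entries = 10000  # ...or after this many documents, whichever comes first
es.batch.write.retry.count = 3 # retries when ES rejects a bulk request
```

These are set per write task, so the effective bulk load on the cluster is roughly this batch size multiplied by the number of concurrently writing tasks.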
One more thing, make sure you have enough network bandwidth...
Regards,
Yang
I am setting up a Spark cluster. My HDFS nodes and Spark nodes are colocated
on the same instances. To add Elasticsearch to this cluster, should I run ES
on separate machines or on the same machines? I have only 12 machines:
1 master (Spark and HDFS)
8 Spark workers and HDFS data nodes
I can use the remaining 3 nodes for Elasticsearch.