Hi Rohit,
Since your instances only have 16 GB of RAM and two cores each, I would suggest
using dedicated nodes for Elasticsearch, with 8 GB allocated to the Elasticsearch
heap. That way there is no interference between the Spark executors and
Elasticsearch.
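The 8 GB figure follows the usual Elasticsearch guidance: give the heap at most half of the machine's RAM (the rest is left for the OS filesystem cache, which Lucene relies on) and stay below ~32 GB so compressed object pointers remain enabled. A minimal sketch of that rule of thumb (the 16 GB node size is from this thread; the caps are standard ES guidance, not exact values):

```python
def es_heap_gb(ram_gb):
    """Suggest an Elasticsearch heap size: at most half of RAM
    (leaving the rest for the filesystem cache), capped below 32 GB
    so compressed object pointers stay enabled."""
    return min(ram_gb // 2, 31)

print(es_heap_gb(16))  # 8 GB heap on a 16 GB node, as suggested above
```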
Also, if possible, you could use SSD disks on these 3 machines for storage.
The ingestion rate below is with a batch size of 10 MB, 10 records. I have
tried with 20-50 partitions; higher partition counts give bulk queue
exceptions.
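On the bulk queue exceptions: each concurrently running Spark write task sends its own bulk requests, and each ES node has a bounded bulk thread pool plus a bounded queue (`thread_pool.bulk` settings); requests beyond that capacity are rejected, which the connector surfaces as the exceptions above. A rough back-of-the-envelope check, assuming (hypothetically) that concurrency scales with the number of writing tasks and that the defaults apply; check your nodes' actual settings:

```python
def bulk_rejections_likely(writing_tasks, data_nodes,
                           bulk_threads_per_node, bulk_queue_size):
    """Rough heuristic: if the number of concurrently writing Spark
    tasks exceeds the cluster-wide bulk capacity (active bulk threads
    plus queue slots across all data nodes), ES will reject the
    overflow and the connector raises bulk queue exceptions."""
    capacity = data_nodes * (bulk_threads_per_node + bulk_queue_size)
    return writing_tasks > capacity

# Illustrative numbers only: 3 dedicated ES nodes, 2 bulk threads each
# (one per core), and a hypothetical queue size of 50 per node.
print(bulk_rejections_likely(50, 3, 2, 50))   # within capacity
print(bulk_rejections_likely(200, 3, 2, 50))  # likely to be rejected
```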
Anyway, thanks for the suggestion. I would appreciate more input, specifically
on cluster design.
Rohit
> On Dec 22, 2016, at 1
One thing I would look at is how many partitions your dataset has before
writing to ES with Spark, as that may be the limiting factor for your parallel
writes.
You can also tune the batch size on ES writes...
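For reference, the batch size on ES writes is controlled by the elasticsearch-hadoop connector settings; a sketch of the relevant options (the values shown are illustrative placeholders, not recommendations from this thread):

```
es.batch.size.bytes = 10mb     # flush a bulk request after ~10 MB of data
es.batch.size.entries = 10000  # ...or after this many documents, whichever comes first
es.batch.write.retry.count = 3 # retries when ES rejects a bulk request
```

These are set per write task, so the effective bulk load on the cluster is roughly this batch size multiplied by the number of concurrently writing tasks.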
One more thing, make sure you have enough network bandwidth...
Regards,
Yang
I am setting up a Spark cluster. My HDFS nodes and Spark nodes are colocated
on the same instances. To add Elasticsearch to this cluster, should I run ES
on separate machines or on the same machines? I have only 12 machines:
1 master (Spark and HDFS)
8 Spark workers and HDFS data nodes
I can use the remaining 3 nodes for Elasticsearch.