Hi All,

I have reached a point in my cluster where I don't need more HDFS storage,
but I do need to add processing power. I'm using YARN, Spark and Impala on
the normal nodes for processing.

My questions:

1- How much will data locality impact Impala performance? As far as I know,
Impala relies on data locality for its processing.

2- I have an OS disk with 600GB. Will this be enough for spilling to disk
when needed, or does it depend on other factors? The Impala daemon memory
limit is 35GB.

3- Should I disable *HDFS Short Circuit Read* on these nodes?

I would be happy to get more recommendations on this.

-- 
Take Care
Fawze Abujaber
