Hi all,

My cluster has reached the point where I don't need more HDFS storage but do need more processing power. I'm using YARN, Spark, and Impala on the regular nodes for processing.

My questions:

1- How much will data locality affect Impala performance? As far as I know, Impala relies on data locality for its processing.
2- I have a 600 GB OS disk. Will that be enough for spilling to disk when needed, or does it depend on other factors? The Impala daemon memory limit is 35 GB.
3- Should I disable *HDFS Short Circuit Read* on these nodes?

I'd be happy to get any further recommendations on this.

--
Take Care
Fawze Abujaber