Thanks Tim,

That means I cannot disable this across the Impala cluster and need to manage it at the query level, right?
Is there any configuration at the cluster level to disable this?

On Wed, Apr 4, 2018 at 3:44 AM, Tim Armstrong <tarmstr...@cloudera.com> wrote:

> I agree with Jim's answers.
>
> You may run into challenges if you have some Impala daemons that have
> local DataNodes and some that do not have local DataNodes. By default
> Impala always chooses a daemon with a local copy of the data, which would
> mean that daemons without a co-located DataNode might never get fragments
> scheduled on them. We do have a knob that lets you disable locality-based
> scheduling
> https://impala.apache.org/docs/build/html/topics/impala_replica_preference.html
> but that may be too blunt an instrument.
>
> On Tue, Apr 3, 2018 at 11:34 AM, Jim Apple <jbap...@cloudera.com> wrote:
>
>> I think the answers are:
>>
>> 1. It depends on your workload and your network. I know some users run
>> with ONLY remote reads and still get performance they are happy with. Your
>> existing nodes will continue to be able to short-circuit read.
>>
>> 2. This is highly workload-dependent. You want to try and avoid spilling,
>> obviously, but if your spinning disk can write 200MB/s it would take 3000
>> seconds, which is 50 minutes, to fill up.
>>
>> 3. I think the impalads are smart enough to not try and do a
>> short-circuit read on data that isn't local.
>>
>> On Tue, Apr 3, 2018 at 10:22 AM, Fawze Abujaber <fawz...@gmail.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> I have reached a point in my cluster where I don't need more storage for
>>> HDFS and I need to add processing power. I'm using YARN, Spark and
>>> Impala on the normal nodes for processing.
>>>
>>> My questions:
>>>
>>> 1- How much will data locality impact Impala performance, as I know
>>> Impala relies on data locality in its processing?
>>>
>>> 2- I have an OS disk with 600GB. Will this be enough to be used to spill to
>>> disk when needed? Is it dependent on other factors? The Impala daemon
>>> memory limit is 35GB.
>>>
>>> 3- Should I disable *HDFS Short Circuit Read* on these nodes?
>>>
>>> I will be happy to get more recommendations on this.
>>>
>>> --
>>> Take Care
>>> Fawze Abujaber
>>>
>>
>>
>

--
Take Care
Fawze Abujaber
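
For reference, the fill-time estimate in Jim's answer 2 can be reproduced with a short calculation. This is only a sketch: the function name `seconds_to_fill` is hypothetical, decimal units (1 GB = 1000 MB) are assumed, and it assumes the whole 600GB OS disk is free for spill and written at a sustained 200MB/s, which a real workload may not match.

```python
# Hypothetical helper (not part of Impala): estimates how long a disk
# takes to fill at a sustained write rate.
def seconds_to_fill(capacity_gb: float, write_mb_per_s: float) -> float:
    """Return seconds needed to write capacity_gb at write_mb_per_s.

    Assumes decimal units: 1 GB = 1000 MB.
    """
    if write_mb_per_s <= 0:
        raise ValueError("write rate must be positive")
    return capacity_gb * 1000 / write_mb_per_s


# Figures from the thread: 600GB OS disk, 200MB/s spinning disk.
fill_s = seconds_to_fill(600, 200)
print(f"{fill_s:.0f} s = {fill_s / 60:.0f} min")  # 3000 s = 50 min
```

Changing the two arguments gives the same estimate for a smaller free-space budget or a slower disk.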