You can find documentation on the --default_query_options flag here:
https://impala.apache.org/docs/build/html/topics/impala_config_options.html

Keep in mind that setting replica_preference to REMOTE will make Impala
ignore any locality when deciding where to schedule a read. Even within the
group of impalads that have local storage attached, Impala will pick a
randomized assignment, optimizing for the number of bytes read by each
node. There is currently no logic to schedule a fraction of the reads
locally and assign the rest to remote impalads (such a scenario wasn't part
of the considerations when working on the scheduler).
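
As a rough sketch of what that could look like (assuming the flag takes
comma-separated key=value pairs, which is worth confirming on the page above
for your version), the cluster-wide default goes into the impalad startup
flags, and the same option can also be set per session with SET:

```
# impalad startup flag: make REMOTE the default for every query
--default_query_options=REPLICA_PREFERENCE=REMOTE

# impala-shell, current session only
SET REPLICA_PREFERENCE=REMOTE;
```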



On Thu, Apr 19, 2018 at 9:47 AM, Fawze Abujaber <fawz...@gmail.com> wrote:

> Thanks Tim for your quick response, as usual.
>
> Can you send me documentation on how to do that, or a detailed example of
> how to do it globally and per pool?
>
> Again, I much appreciate your readiness to help.
>
> On Thu, 19 Apr 2018 at 19:43 Tim Armstrong <tarmstr...@cloudera.com>
> wrote:
>
>> We have a way to set global and per-pool defaults for query options. You
>> can set default query options via the --default_query_options startup flag,
>> or, if you have resource pools set up, you can set default query option
>> values for queries submitted to each resource pool (including the default
>> pool).
>>
>> On Tue, Apr 17, 2018 at 3:27 AM, Fawze Abujaber <fawz...@gmail.com>
>> wrote:
>>
>>> Thanks Tim,
>>>
>>> That means that I cannot disable this across the Impala cluster and I
>>> need to manage this at the query level, right?
>>>
>>> Is there any configuration at the cluster level to disable this?
>>>
>>> On Wed, Apr 4, 2018 at 3:44 AM, Tim Armstrong <tarmstr...@cloudera.com>
>>> wrote:
>>>
>>>> I agree with Jim's answers.
>>>>
>>>> You may run into challenges if you have some Impala daemons that have
>>>> local DataNodes and some that do not have local DataNodes. By default
>>>> Impala always chooses a daemon with a local copy of the data, which would
>>>> mean that daemons without a co-located DataNode might never get fragments
>>>> scheduled on them. We do have a knob that lets you disable locality-based
>>>> scheduling:
>>>> https://impala.apache.org/docs/build/html/topics/impala_replica_preference.html
>>>> but that may be too blunt an instrument.
>>>>
>>>> On Tue, Apr 3, 2018 at 11:34 AM, Jim Apple <jbap...@cloudera.com>
>>>> wrote:
>>>>
>>>>> I think the answers are:
>>>>>
>>>>> 1. It depends on your workload and your network. I know some users run
>>>>> with ONLY remote reads and still get performance they are happy with. Your
>>>>> existing nodes will continue to be able to short-circuit read.
>>>>>
>>>>> 2. This is highly workload-dependent. You want to try to avoid
>>>>> spilling, obviously, but if your spinning disk can write 200MB/s, it would
>>>>> take 3000 seconds, which is 50 minutes, to fill up the 600GB.
>>>>>
>>>>> 3. I think the impalads are smart enough to not try and do a
>>>>> short-circuit read on data that isn't local.
>>>>>
>>>>> On Tue, Apr 3, 2018 at 10:22 AM, Fawze Abujaber <fawz...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I have reached a point in my cluster where I don't need more storage
>>>>>> for HDFS but I do need to add processing power. I'm using YARN, Spark, and
>>>>>> Impala on the normal nodes for processing.
>>>>>>
>>>>>> My questions:
>>>>>>
>>>>>> 1- How much will data locality impact Impala performance? As far as I
>>>>>> know, Impala relies on data locality in its processing.
>>>>>>
>>>>>> 2- I have an OS disk with 600GB. Will this be enough to spill to disk
>>>>>> when needed? Does it depend on other factors? The Impala daemon memory
>>>>>> limit is 35GB.
>>>>>>
>>>>>> 3- Should I disable *HDFS Short Circuit Read* on these nodes?
>>>>>>
>>>>>> I will be happy to get more recommendations on this.
>>>>>>
>>>>>> --
>>>>>> Take Care
>>>>>> Fawze Abujaber
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Take Care
>>> Fawze Abujaber
>>>
>>
> --
> Take Care
> Fawze Abujaber
>
