Re: Drill Performance

François Méthot Thu, 08 Sep 2016 13:25:43 -0700

late reply...

We are trying to run Drill on 200 nodes, but we keep getting random loss of
connectivity with certain nodes, which spoils the query completely. It happens
maybe 50% of the time.
It depends on how many files get queried, basically on how heavy the query is.


It looks exactly like this problem:

ForemanException: One or more nodes lost connectivity during query
https://issues.apache.org/jira/browse/DRILL-4325

Until we find a solution, we are sticking to a dedicated dozen-node cluster.

It would be nice to have a query recover from disconnected nodes
and keep gathering results from valid nodes.

On Thu, Jul 14, 2016 at 11:22 PM, scott <[email protected]> wrote:

> Curious what the biggest is. Has anyone configured more than 100 drillbits
> in a cluster before?
>
> Scott
>
> On 07/14/2016 10:27 AM, Ted Dunning wrote:
>
>> On the right distribution, you can restrict the subset of the cluster that
>> has the data you need to avoid locality variation when Drill only runs on
>> a subset of nodes.
>>
>>
>>
>> On Thu, Jul 14, 2016 at 6:48 AM, François Méthot <[email protected]>
>> wrote:
>>
>>> We have observed that if the number of drillbits is lower than the number
>>> of nodes in our cluster, some minor fragments take longer to complete
>>> their query. (We hypothesize that this is because they can't take
>>> advantage of data locality: the fragment has to reach out for data on a
>>> different node.) One drillbit per node, with evenly spread data, is the
>>> best scenario.
>>>
>>> These results may also vary depending on your hardware, I think.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Jul 7, 2016 at 7:06 PM, Ashish Goel <[email protected]> wrote:
>>>
>>>> That's an interesting question. I would also be curious to learn more
>>>> about this. Did anyone run any benchmarks around this? It would be helpful
>>>> to understand.
>>>>
>>>> On Thu, Jul 7, 2016 at 11:13 AM, scott <[email protected]> wrote:
>>>>
>>>>> Abdel,
>>>>> I didn't ask about having more than one drillbit per node. I asked about
>>>>> the number of drillbits per cluster. For instance, if I had a 1000 node
>>>>> Hadoop cluster, should I install drillbits on each node? Or, is there some
>>>>> point at which the interaction of 1000 drillbits causes contention
>>>>> resulting in a plateau or decline of performance?
>>>>>
>>>>> Thanks,
>>>>> Scott
>>>>>
>>>>> On Thu, Jul 7, 2016 at 5:00 PM, Abdel Hakim Deneche <[email protected]> wrote:
>>>>>
>>>>>> I'm not sure you'll get any performance improvement from running more than
>>>>>> a single drillbit per cluster node.
>>>>>>
>>>>>> On Thu, Jul 7, 2016 at 9:47 AM, scott <[email protected]> wrote:
>>>>>>
>>>>>>> Follow up question: Is there a sweet spot for DRILL_MAX_DIRECT_MEMORY and
>>>>>>> DRILL_HEAP settings?
>>>>>>>
>>>>>>> On Wed, Jul 6, 2016 at 2:42 PM, scott <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>> Does anyone know if there is a maximum number of drillbits recommended
>>>>>>>> in a Drill cluster? For example, I've observed that in a Solr Cloud, the
>>>>>>>> performance tapers off for ingest at around 16 JVM instances. Is there a
>>>>>>>> similar practical limitation to the number of drillbits I should cluster
>>>>>>>> together?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Scott
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Abdelhakim Deneche
>>>>>>
>>>>>> Software Engineer
>>>>>>
>>>>>>    <http://www.mapr.com/>
>>>>>>
>>>>>>
>>>
>>>>
>>>>
>>>> --
>>>> Thanks,
>>>> Ashish
>>>>
>>>>
>
