Re: Multiple fragments in apache drill

Kunal Khatua Wed, 13 Feb 2019 11:23:06 -0800

Hi Hugues

The number of fragments is determined by the number of sources (i.e. whether 
the data can be read in parallel) and the estimated number of rows.
CSV and Parquet files are easy to read in parallel, but JSON files are not, 
because Drill does not know how many JSON documents a file contains or where 
their offsets are.


The estimated number of rows tells Drill whether to parallelize a major 
fragment of operators. You can try lowering the threshold for this in your 
session/system via the UI [/options page] by reducing this property: 
planner.slice_target
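
As a rough sketch of the suggestion above, the same option can also be changed
from SQL instead of the UI. The value 10000 and the table dfs.tmp.`events` are
placeholders chosen for illustration, not recommendations. As far as I know the
default for planner.slice_target is 100000, so check the current value on your
cluster first.

```sql
-- Show the current value (the default is believed to be 100000).
SELECT * FROM sys.options WHERE name = 'planner.slice_target';

-- Lower the threshold for this session only; 10000 is an arbitrary example value.
ALTER SESSION SET `planner.slice_target` = 10000;

-- Inspect the resulting plan; dfs.tmp.`events` is a hypothetical table.
EXPLAIN PLAN FOR
SELECT COUNT(*) FROM dfs.tmp.`events`;
```

Operators in the plan output are prefixed with their major fragment ID (00-,
01-, and so on). The query profile in the web UI should then show how many
minor fragments each major fragment ran and on which drillbits, which is one
way to confirm whether the query was spread across nodes.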

~ Kunal

On 2/13/2019 7:14:34 AM, Kwizera hugues Teddy <[email protected]> wrote:
Hello Drill team,

I'm executing a query on an Apache Drill cluster; however, it is creating only
1 minor fragment. I have tried various queries, such as a union of 2 queries,
aggregation, etc., and executed them on millions of records, but it is
still creating only 1 fragment. Is there any configuration change that I can
make to get multiple fragments, so that they can be executed on each
drillbit individually? How can I confirm whether the query is being
executed on 1 drillbit instance or on multiple instances?

- We are trying to compare Impala vs Drill, but for the moment Impala is
faster than Drill.

- Environment:

Drill on YARN, with 6 drillbits.


Regards Hugues Teddy
