Query plans can indicate whether a query is parallelized: look for exchanges, which are used to merge work from multiple execution fragments or to redistribute data for an operation. Execution fragments can run on different threads or on different machines. The best place to find out how a query actually executed is its query profile, available in the Web UI under the "Profiles" tab. There, the list of major fragments shows which hostnames each fragment ran on; for queries with enough data volume that parallelization improves performance, you should see more than one host listed.
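As a quick illustration (a minimal sketch: the dfs.`/csv/customer` path is only a placeholder for wherever your dfs storage plugin points, and the slice-target value below is just an example):

-- Inspect the plan without running the query. An exchange operator
-- in the output (e.g. UnionExchange or HashToRandomExchange) marks a
-- fragment boundary where Drill splits work across threads or nodes.
EXPLAIN PLAN FOR SELECT COUNT(*) FROM dfs.`/csv/customer`;

-- If a large scan still runs as a single fragment, lowering the
-- slice target (the planner's estimated-rows-per-fragment threshold,
-- 100000 by default) makes Drill parallelize sooner:
ALTER SESSION SET `planner.slice_target` = 10000;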
Please see this section of the docs for more info on how to tune your
Drill queries: https://drill.apache.org/docs/performance-tuning-introduction/

On Thu, Jan 21, 2016 at 9:00 AM, Matt <[email protected]> wrote:

> Running a CTAS from csv files in a 4 node HDFS cluster into a Parquet
> file, and I note the physical plan in the Drill UI references scans of all
> the csv sources on a single node.
>
> collectl implies read and write IO on all 4 nodes - does this imply that
> the full cluster is used for scanning the source files, or does my
> configuration have the reads pinned to a single node?
>
> Nodes in cluster: es0{5-8}:
>
> Scan(groupscan=[EasyGroupScan
> [selectionRoot=hdfs://es05:54310/csv/customer, numFiles=33, columns=[`*`],
> files=[hdfs://es05:54310/csv/customer/customer_20151001.csv,
> hdfs://es05:54310/csv/customer/customer_20151002.csv,
> hdfs://es05:54310/csv/customer/customer_20151003.csv,
> hdfs://es05:54310/csv/customer/customer_20151004.csv,
> hdfs://es05:54310/csv/customer/customer_20151005.csv,
> hdfs://es05:54310/csv/customer/customer_20151006.csv,
> hdfs://es05:54310/csv/customer/customer_20151007.csv,
> hdfs://es05:54310/csv/customer/customer_20151008.csv,
> hdfs://es05:54310/csv/customer/customer_20151009.csv,
> hdfs://es05:54310/csv/customer/customer_20151010.csv,
> hdfs://es05:54310/csv/customer/customer_20151011.csv,
> hdfs://es05:54310/csv/customer/customer_20151012.csv,
> hdfs://es05:54310/csv/customer/customer_20151013.csv,
> hdfs://es05:54310/csv/customer/customer_20151014.csv,
> hdfs://es05:54310/csv/customer/customer_20151015.csv,
> hdfs://es05:54310/csv/customer/customer_20151016.csv,
> hdfs://es05:54310/csv/customer/customer_20151017.csv,
> hdfs://es05:54310/csv/customer/customer_20151018.csv,
> hdfs://es05:54310/csv/customer/customer_20151019.csv,
> hdfs://es05:54310/csv/customer/customer_20151020.csv,
> hdfs://es05:54310/csv/customer/customer_20151021.csv,
> hdfs://es05:54310/csv/customer/customer_20151022.csv,
> hdfs://es05:54310/csv/customer/customer_20151023.csv,
> hdfs://es05:54310/csv/customer/customer_20151024.csv,
> hdfs://es05:54310/csv/customer/customer_20151025.csv,
> hdfs://es05:54310/csv/customer/customer_20151026.csv,
> hdfs://es05:54310/csv/customer/customer_20151027.csv,
> hdfs://es05:54310/csv/customer/customer_20151028.csv,
> hdfs://es05:54310/csv/customer/customer_20151029.csv,
> hdfs://es05:54310/csv/customer/customer_20151030.csv,
> hdfs://es05:54310/csv/customer/customer_20151031.csv,
> hdfs://es05:54310/csv/customer/customer_20151101.csv,
> hdfs://es05:54310/csv/customer/customer_20151102.csv]]]) : rowType =
> (DrillRecordRow[*]): rowcount = 2.407374395E9, cumulative cost =
> {2.407374395E9 rows, 2.407374395E9 cpu, 0.0 io, 0.0 network, 0.0 memory},
> id = 4355
