Hi LittleCho,

These notes might be helpful:

https://drill.apache.org/docs/architecture-introduction/

https://drill.apache.org/docs/drill-query-execution/


Also, please see the notes on
https://issues.apache.org/jira/browse/DRILL-4706, which are more applicable
to Parquet.


However, from what I understand, in general the scan fragments are
assigned to the data nodes that hold the blocks, but Drill might still end
up doing some remote reads.
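To make the idea concrete, here is a minimal, hypothetical sketch of
locality-aware fragment assignment: each block is placed on a Drillbit
node that holds a replica when possible, falling back to a remote read
otherwise. The node and block names are made up for illustration, and this
is not Drill's actual parallelizer code, just the general technique.

```python
def assign_fragments(blocks, drillbit_nodes):
    """Assign each block to a node holding a replica when possible;
    otherwise fall back to a remote read on the least-loaded node."""
    load = {node: 0 for node in drillbit_nodes}
    assignment = {}
    for block, replicas in blocks.items():
        # Prefer a replica on a node that actually runs a Drillbit.
        local = [n for n in replicas if n in load]
        if local:
            node = min(local, key=lambda n: load[n])
            remote = False
        else:
            # No co-located Drillbit: remote read from the least-loaded node.
            node = min(load, key=lambda n: load[n])
            remote = True
        load[node] += 1
        assignment[block] = (node, remote)
    return assignment

# Hypothetical layout: blk_3's only replica is on a node with no Drillbit,
# so its scan fragment must do a remote read.
blocks = {
    "blk_1": ["node1", "node2"],
    "blk_2": ["node2", "node3"],
    "blk_3": ["node4"],
}
assignment = assign_fragments(blocks, ["node1", "node2", "node3"])
```

In this sketch, blk_1 and blk_2 get local reads while blk_3 is read
remotely, which mirrors the point above: co-locating Drillbits with
datanodes enables locality, but it cannot eliminate remote reads entirely.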


@Padma Penumarthy<mailto:ppenumar...@mapr.com> could you please let us know if 
this is the case?


Gautam





________________________________
From: LittleCho <little...@littlecho.tw>
Sent: Tuesday, November 7, 2017 6:24:49 AM
To: user@drill.apache.org
Subject: Question about how Drill optimizes the queries and splits the loads in 
HDFS cluster?

Hello all:

   I have been studying how to install Drill on the data nodes of a
   Hadoop cluster. According to Drill's online documentation, we can
   install Drill on each datanode of HDFS, and then point the connection
   setting in the file storage plugin at the HDFS namenode to finish the
   setup. Here is my question: since a file is split into several blocks
   based on the configuration, will a query also be split and assigned
   to the Drill instance on each datanode? I would like to learn more
   about how Drill works in distributed mode with an HDFS cluster.
   Thank you!!

--
BR, LittleCho
