Hi LittleCho,
These notes might be helpful:

https://drill.apache.org/docs/architecture-introduction/
https://drill.apache.org/docs/drill-query-execution/

Also, please look into the notes for https://issues.apache.org/jira/browse/DRILL-4706, which are more applicable to Parquet. From what I understand, in general the scan fragments would be assigned to the data nodes, but Drill might end up doing some remote reads. @Padma Penumarthy <mailto:ppenumar...@mapr.com>, could you please let us know if this is the case?

Gautam

________________________________
From: LittleCho <little...@littlecho.tw>
Sent: Tuesday, November 7, 2017 6:24:49 AM
To: user@drill.apache.org
Subject: Question about how Drill optimizes the queries and splits the loads in HDFS cluster?

Hello all:

I have been studying how to install Drill on the data nodes of a Hadoop cluster. According to Drill's online documentation, we can install Drill on each datanode of HDFS, and then change the connection setting in the file storage plugin to point at HDFS's namenode to finish the setup.

Here is my question: since a file is split into several blocks based on the HDFS block-size setting, will the query be split and assigned to the Drill instance on each datanode? I would like to know more about how Drill works in distributed mode with an HDFS cluster.

Thank you!!

--
BR,
LittleCho
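[Editor's note] For readers following the thread, the storage-plugin change LittleCho describes (pointing the `dfs` file storage plugin at the HDFS namenode) looks roughly like the sketch below. The hostname `namenode.example.com` and port `8020` are placeholders for your own cluster; the workspace and format entries are a minimal example, not a complete configuration.

```json
{
  "type": "file",
  "connection": "hdfs://namenode.example.com:8020",
  "workspaces": {
    "root": {
      "location": "/",
      "writable": false,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "parquet": { "type": "parquet" },
    "csv": { "type": "text", "extensions": ["csv"], "delimiter": "," }
  }
}
```

With this in place (via the Drill Web UI under Storage, or the REST API), queries such as `SELECT * FROM dfs.root.`/path/to/file.parquet`` read directly from HDFS, and each drillbit running on a datanode can be assigned scan fragments for blocks it holds locally.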