I have a use case where hundreds of TB of data are stored in HDFS. Installing 
Drill on all nodes of the HDFS cluster is not an option. If I run a separate 
Apache Drill cluster (external to HDFS), how will Apache Drill SQL perform on 
large data sets? Specifically, I would like to know whether Drill submits 
MapReduce jobs to HDFS, or whether it pulls all the data from the HDFS cluster 
into the Drill cluster before applying filters and joins. Will Drill push SQL 
down into HDFS?


