Drill reads the data directly from HDFS in parallel. Performance depends
on the size of the Drill cluster, the size of the HDFS cluster, and the
network between them. Drill does not translate SQL into MapReduce jobs
(Hive is the best-known system that works that way, and that approach
lends itself to much slower performance, particularly for ad-hoc analysis).
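
To make this concrete, here is a rough sketch of how a separate Drill
cluster might be pointed at a remote HDFS and queried over Drill's REST
interface. The host names, port numbers, and file path are hypothetical
placeholders, and the plugin config only mirrors the general shape of a
file-system storage plugin as I understand it, so please check it against
the documentation for your Drill version.

```python
import json
import urllib.request

# Hypothetical hosts, for illustration only.
DRILLBIT_URL = "http://drillbit.example.com:8047"
NAMENODE_URI = "hdfs://namenode.example.com:8020"


def remote_hdfs_plugin(namenode_uri):
    """Build a minimal file-system plugin config that points at a remote HDFS.

    Assumption: the connection string is the NameNode URI; the Drillbits
    then read file blocks from the DataNodes over the network.
    """
    return {
        "type": "file",
        "enabled": True,
        "connection": namenode_uri,
        "workspaces": {"root": {"location": "/", "writable": False}},
        "formats": {"parquet": {"type": "parquet"}},
    }


def build_query_request(drillbit_url, sql):
    """Build (but do not send) a POST request for Drill's /query.json endpoint."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        drillbit_url + "/query.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(drillbit_url, sql):
    """Send the query to a Drillbit and return the result rows."""
    with urllib.request.urlopen(build_query_request(drillbit_url, sql)) as resp:
        return json.load(resp)["rows"]


# Hypothetical path; the filter and aggregation run inside the Drill cluster.
EXAMPLE_SQL = "SELECT COUNT(*) FROM dfs.`/data/events` WHERE status = 'ERROR'"
```

Calling run_query(DRILLBIT_URL, EXAMPLE_SQL) would submit the statement to
one Drillbit, which plans it and distributes the scan across the Drill
cluster. No MapReduce job is launched on the HDFS side.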


On Sat, Jan 2, 2016 at 12:28 PM, Shashanka Kuntala <
[email protected]> wrote:

> I have a use case where 100s of TB of data are in HDFS. Installing Drill on
> all nodes of the HDFS cluster is not an option. If I have a separate Apache
> Drill cluster (external to HDFS), how will Apache Drill SQL perform with
> large data sets? Specifically, I would like to know whether Drill submits
> MapReduce jobs on HDFS, or whether Drill extracts all data from the HDFS
> cluster into the Drill cluster before applying filters/joins. Will Drill
> push down SQL into HDFS?


-- 
Tomer Shiran
CEO and Co-Founder, Dremio
