Hi,

First off, Drill is an amazing tool and makes my work so much easier when I
have different data types and sources to deal with; it's a real life-saver.
So thank you all for creating it!

I've been testing Drill locally with an MS SQL Server, using the PyDrill
Python driver. My larger queries are taking a long time to run; e.g., a
query extracting ~1 million rows and 8 columns takes 30-40 minutes to
execute.

I have a few questions I'm hoping to get some clarity on:

   1. Is this performance expected or unusual? If it's expected, is that
   because Drill is not optimized for RDBMS querying?

   2. Is there any way for me to speed up queries (apart from running Drill
   on a cluster in distributed mode), e.g. by specifying the schema in the
   query so that Drill doesn't need to spend time on schema discovery?

   3. When I look at the Physical Plan in the Drill web UI for the query, I
   don't see any meaningful values in *cumulative cost* in any of my
   fragment profiles:

   cumulative cost = {100.0 rows, 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}

   Is there a particular reason for this?
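For context on question 2, this is roughly the pattern I have in mind:
fully qualifying the table as plugin.schema.table in the SQL I send via
PyDrill (a sketch; "mssql", "dbo", and "customers" are placeholder names
for my storage plugin and table):

```python
# Sketch of a fully qualified Drill query against SQL Server.
# The plugin/schema/table names below are placeholders for my setup.
def build_query(plugin, schema, table, columns, limit=None):
    """Qualify the table as plugin.schema.table so Drill can resolve it
    directly instead of searching across schemas."""
    cols = ", ".join(columns)
    sql = f"SELECT {cols} FROM {plugin}.{schema}.{table}"
    if limit is not None:
        sql += f" LIMIT {limit}"
    return sql

sql = build_query("mssql", "dbo", "customers", ["id", "name"], limit=10)
print(sql)  # SELECT id, name FROM mssql.dbo.customers LIMIT 10

# Then submitted through PyDrill (assumes a local drillbit on port 8047):
# from pydrill.client import PyDrill
# drill = PyDrill(host="localhost", port=8047)
# rows = drill.query(sql).rows
```

Would qualifying tables this way actually skip any of the schema-discovery
work, or does Drill do that regardless?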

If you've got any other tips/tricks for working with relational databases
in Drill, please do let me know.

Thank You!!
Karan Hegde
