Thanks, Rafael, for your answer.

As I wrote in the previous email, these planning times occur even when selecting one field from one tiny file (60k) that I pass directly by full path:

select name from `parquet/data/data.parquet` limit 1

Any idea what can influence the planning time in such a trivial scenario?
In addition, doesn't Drill cache execution plans between executions of similar queries?

Best regards,
Avner

On Thu, Jun 4, 2020 at 2:55 PM Rafael Jaimes III <rafjai...@gmail.com> wrote:

> Hi Avner,
>
> One way you might be able to optimize this is by modifying the size
> and number of the parquet files. How many files do you have and how
> big are they? Do you know what the row group size is? What is the HDFS
> block size on your storage?
>
> There's probably a lot more intricate ways to improve performance with
> the Drill settings, but I have not modified them.
>
> - Rafael
>
> On Thu, Jun 4, 2020 at 2:43 PM Avner Levy <avner.l...@gmail.com> wrote:
> >
> > I'm running Apache Drill (1.18 master branch) in a docker with data stored
> > in Parquet files on S3.
> > When I run queries, even the most simple ones such as:
> >
> > select name from `parquet/data/data.parquet` limit 1
> >
> > The "Planning" time is 0.7-1.5 sec while the "Execution" is only 0.112 sec.
> > These proportions are maintained even if I run the same query multiple
> > times in a row.
> > Since I'm trying to minimize query times to a minimum, I was wondering if
> > such planning times (compared to execution) make sense and is there any way
> > to reduce it? (some plan caching mechanism)
> > Thanks,
> > Avner