Thanks for your answer, Rafael.
As I wrote in the previous email, these planning times occur even when
selecting one field from a single tiny file (60k) that I pass directly by
full path (select name from `parquet/data/data.parquet` limit 1).
Any idea what can influence the planning time in such a trivial scenario?
In addition, doesn't Drill cache execution plans between executions of
similar queries?
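
To put the numbers from my original message in perspective, here is a
rough Python sketch of how much of each query is spent planning. It
assumes total time is simply planning plus execution (ignoring any
queueing or client overhead), and the function name planning_share is
made up for illustration:

```python
def planning_share(planning_sec: float, execution_sec: float) -> float:
    """Fraction of total query time spent in planning.

    Assumption: total time = planning + execution, nothing else.
    """
    total = planning_sec + execution_sec
    if total <= 0:
        raise ValueError("total time must be positive")
    return planning_sec / total


if __name__ == "__main__":
    execution = 0.112  # seconds, as reported in the query profile
    # Lower and upper bounds of the observed planning time, in seconds.
    for planning in (0.7, 1.5):
        share = planning_share(planning, execution)
        print(f"planning={planning:.1f}s -> {share:.0%} of total")
```

In other words, under that assumption planning accounts for well over
80% of the time at both ends of the observed range.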
Best regards,
 Avner


On Thu, Jun 4, 2020 at 2:55 PM Rafael Jaimes III <rafjai...@gmail.com>
wrote:

> Hi Avner,
>
> One way you might be able to optimize this is by modifying the size
> and number of the parquet files. How many files do you have and how
> big are they? Do you know what the row group size is? What is the HDFS
> block size on your storage?
>
> There are probably many more intricate ways to improve performance
> through the Drill settings, but I have not modified them.
>
> - Rafael
>
> On Thu, Jun 4, 2020 at 2:43 PM Avner Levy <avner.l...@gmail.com> wrote:
> >
> > I'm running Apache Drill (1.18 master branch) in a Docker container
> > with data stored in Parquet files on S3.
> > When I run queries, even the most simple ones such as:
> >
> > select name from `parquet/data/data.parquet` limit 1
> >
> > The "Planning" time is 0.7-1.5 sec while the "Execution" is only
> > 0.112 sec.
> > These proportions are maintained even if I run the same query multiple
> > times in a row.
> > Since I'm trying to keep query times to a minimum, I was wondering
> > whether such planning times (compared to execution) make sense and
> > whether there is any way to reduce them (e.g. some plan caching
> > mechanism).
> > Thanks,
> >   Avner
>
