Re: Performance querying a single column out of a parquet file

Ted Dunning Mon, 11 Apr 2016 09:25:09 -0700

I just replicated these results. Full table scans with aggregation take
pretty much exactly the same amount of time with or without filtering.
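
The two strategies discussed in the quoted messages (extracting columns
before or after filtering) can be illustrated with a toy model. This is only
a sketch in plain Python and says nothing about how Drill's Parquet reader is
actually implemented; the function names, the in-memory table, and the
"decoded values" counter are hypothetical stand-ins for real I/O and
deserialization cost.

```python
# Toy column store: each column is kept as its own list, as in a columnar file.
# Everything here is illustrative; it is not Drill code.

def make_table(n_rows, n_cols):
    """Build a table with an integer `id` column and n_cols - 1 string columns."""
    table = {"id": list(range(n_rows))}
    for c in range(1, n_cols):
        table[f"col{c}"] = [f"v{c}_{r}" for r in range(n_rows)]
    return table


def early_materialization(table, target_id):
    """Assemble every full row first, then apply the filter on `id`."""
    names = list(table)
    decoded = 0  # number of column values touched
    hits = []
    for values in zip(*(table[n] for n in names)):
        decoded += len(values)
        row = dict(zip(names, values))
        if row["id"] == target_id:
            hits.append(row)
    return hits, decoded


def late_materialization(table, target_id):
    """Scan only the `id` column, then fetch the other columns for matches."""
    decoded = 0
    positions = []
    for pos, value in enumerate(table["id"]):
        decoded += 1
        if value == target_id:
            positions.append(pos)
    hits = []
    for pos in positions:
        row = {name: col[pos] for name, col in table.items()}
        decoded += len(row) - 1  # `id` was already counted during the scan
        hits.append(row)
    return hits, decoded


table = make_table(1000, 9)
early_hits, early_cost = early_materialization(table, 23)
late_hits, late_cost = late_materialization(table, 23)
print(early_cost, late_cost)
```

Both functions return the same single row; only the number of values touched
differs, which is the kind of gap the two query timings would show if columns
were extracted before filtering.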




On Mon, Apr 11, 2016 at 8:09 AM, Johannes Zillmann <[email protected]> wrote:

> Hey Ted,
>
> Sorry, I mixed up row and column!
>
> The queries look like this:
>         (1) "SELECT * FROM dfs.`myParquetFile` WHERE `id` = 23"
>         (2) "SELECT id FROM dfs.`myParquetFile` WHERE `id` = 23"
>
> (1) takes 14 sec and (2) takes 1.5 sec.
> This is with drill-1.6.
> So it looks like Drill is extracting the columns before filtering, which I
> didn’t expect.
> Is there any way to change that behaviour?
>
> Johannes
>
>
>
> > On 11 Apr 2016, at 16:42, Ted Dunning <[email protected]> wrote:
> >
> > Did you mean that you are doing a select to find a single column? What you
> > typed was row, but that seems out of line with the rest of what you wrote.
> >
> > If you are truly asking about filtering down to a single row, whether it
> > costs more to return all of the columns rather than just one from a single
> > row will depend on whether Drill is extracting columns before filtering or
> > after.
> >
> >
> >
> > On Mon, Apr 11, 2016 at 6:41 AM, Johannes Zillmann <[email protected]> wrote:
> >
> >> Hey there,
> >>
> >> I am currently doing some performance measurements on Drill.
> >> In my case it’s a single Parquet file with a single local Drillbit.
> >>
> >> In one case I have unexpected results and I’m curious if somebody has
> >> a good explanation for it!
> >>
> >> I have a file with 10 million rows and 9 columns.
> >> I’m doing a select statement to find one single row.
> >> Runtime with select * : ~ 14.79 sec
> >> Runtime with select(filterField) : ~ 1.5 sec
> >>
> >> I’m surprised that there is so much variance depending on the fields I
> >> select, since I thought Drill needs most of its time for finding that one
> >> element, and then deserializes the other fields only on a hit.
> >> But 10 sec for deserializing 8 more fields seems way too much!
> >>
> >> best
> >> Johannes
> >>
> >>
>
>
