Re: Performance querying a single column out of a parquet file

Ted Dunning Mon, 11 Apr 2016 07:43:33 -0700

Did you mean that you are doing a select to find a single column? What you
typed was row, but that seems out of line with the rest of what you wrote.


If you are truly asking about filtering down to a single row, whether it
costs more to return all of the columns rather than just one from a single
row will depend on whether Drill is extracting columns before filtering or
after.
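
To make the "before or after filtering" distinction concrete, here is a
toy sketch in Python. It is not Drill's reader. The table layout, the
DecodeCounter class and both function names are hypothetical, made up for
illustration, and decode() only stands in for the cost of deserializing
one value.

```python
# Toy column store: each column is a plain Python list.
# Assumption: decoding one value has a fixed cost, which we simply count.
# None of this reflects Drill's actual Parquet reader.

class DecodeCounter:
    """Counts how many individual values were deserialized."""

    def __init__(self):
        self.count = 0

    def decode(self, value):
        self.count += 1
        return value


def make_table(num_rows, num_cols):
    """Build a table of num_cols columns named c0, c1, ... with num_rows rows."""
    return {f"c{i}": list(range(num_rows)) for i in range(num_cols)}


def extract_then_filter(table, projected, filter_col, target, counter):
    """Decode every needed column for every row, then apply the filter."""
    needed = list(dict.fromkeys([filter_col, *projected]))  # de-duplicated
    num_rows = len(table[filter_col])
    rows = [
        {c: counter.decode(table[c][r]) for c in needed}
        for r in range(num_rows)
    ]
    return [
        {c: row[c] for c in projected}
        for row in rows
        if row[filter_col] == target
    ]


def filter_then_extract(table, projected, filter_col, target, counter):
    """Decode only the filter column first, then the other columns on hits."""
    hits = [
        r for r, v in enumerate(table[filter_col])
        if counter.decode(v) == target
    ]
    return [
        {c: counter.decode(table[c][r]) for c in projected}
        for r in hits
    ]


if __name__ == "__main__":
    table = make_table(1000, 9)
    all_cols = list(table)

    early = DecodeCounter()
    extract_then_filter(table, all_cols, "c0", 42, early)

    late = DecodeCounter()
    filter_then_extract(table, all_cols, "c0", 42, late)

    print("extract then filter:", early.count, "values decoded")
    print("filter then extract:", late.count, "values decoded")
```

In this model, extracting first makes the work grow with rows times
columns, while filtering first makes it grow with rows plus a handful of
values per hit. The runtimes quoted below differ by roughly a factor of
ten for nine columns, which would be consistent with the first pattern,
but that is only a guess until the query plan is checked.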



On Mon, Apr 11, 2016 at 6:41 AM, Johannes Zillmann <[email protected]> wrote:

> Hey there,
>
> I am currently doing some performance measurements on Drill.
> In my case it's a single Parquet file with a single local Drillbit.
>
> Now in one case I have unexpected results, and I’m curious whether somebody
> has a good explanation for them!
>
> So I have a file with 10 million rows and 9 columns.
> Now I’m doing a select statement to find one single row.
> Runtime with select * : ~14.79 s
> Runtime with select(filterField) : ~1.5 s
>
> So I’m surprised that there is so much variance depending on the fields I
> select, since I thought Drill needs most of its time for finding that one
> element and then deserializes the other fields only on a hit.
> But for deserializing 8 more fields, 10 s seems way too much!?
>
> best
> Johannes
>
>
