Hey Ted,
Sorry, I mixed up row and column!
The queries look like this:
(1) "SELECT * FROM dfs.`myParquetFile` WHERE `id` = 23"
(2) "SELECT id FROM dfs.`myParquetFile` WHERE `id` = 23"
(1) takes about 14 sec and (2) takes about 1.5 sec.
Using drill-1.6.
So it looks like Drill is extracting the columns before filtering, which I
didn’t expect…
Is there any way to change that behaviour?
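One way to check this would be to compare the query plans. This is only a sketch: it assumes Drill's EXPLAIN PLAN FOR can be run on both queries unchanged, and I have not looked at the plan output here.

```sql
-- Plan for the slow query: look at which columns the Parquet scan reads
-- and where the filter on `id` is applied relative to the scan.
EXPLAIN PLAN FOR SELECT * FROM dfs.`myParquetFile` WHERE `id` = 23;

-- Plan for the fast query, for comparison.
EXPLAIN PLAN FOR SELECT id FROM dfs.`myParquetFile` WHERE `id` = 23;
```

If the scan in the first plan reads every column and the filter is only applied afterwards, that would match the timings above.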
Johannes
> On 11 Apr 2016, at 16:42, Ted Dunning <[email protected]> wrote:
>
> Did you mean that you are doing a select to find a single column? What you
> typed was row, but that seems out of line with the rest of what you wrote.
>
> If you are truly asking about filtering down to a single row, whether it
> costs more to return all of the columns rather than just one from a single
> row will depend on whether Drill is extracting columns before filtering or
> after.
>
>
>
> On Mon, Apr 11, 2016 at 6:41 AM, Johannes Zillmann <[email protected]> wrote:
>
>> Hey there,
>>
>> I’m currently doing some performance measurements on Drill.
>> In my case it’s a single Parquet file with a single local Drillbit.
>>
>> Now in one case I have unexpected results and I’m curious if somebody has
>> a good explanation for it!
>>
>> So I have a file with 10 million rows and 9 columns.
>> Now I’m doing a select statement to find one single row.
>> Runtime with select * : ~ 14.79 s
>> Runtime with select(filterField) : ~ 1.5 sec
>>
>> So I’m surprised that there is so much variance depending on the fields I
>> select, since I thought Drill needs most of the time for finding that one
>> element and then deserialises the other fields only on a hit…
>> But for deserialising 8 more fields, 10 sec seems way too much!?
>>
>> best
>> Johannes
>>
>>