Hi,
Another observation:
My query had WHERE conditions based on the partition values.

Total number of Parquet files in the directory: 102290
Before metadata refresh: it reads only 4 files
After metadata refresh: it reads all 102290 files


Is this how the metadata refresh works? I mean, does it scan each and
every file to get the results?

I don't have access to the logs right now.

Thanks,
Divya
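
The difference between reading 4 files and reading 102290 files is what partition pruning controls. Below is a minimal conceptual sketch in Python, not Drill's actual implementation: the directory layout, the file names, and the helper `prune_by_partition` are all hypothetical and exist only for illustration. It shows how a filter on a partition directory lets a reader skip every file outside the matching partition.

```python
from pathlib import PurePosixPath


def prune_by_partition(files, wanted):
    """Keep only files whose first directory level is one of `wanted`.

    Hypothetical helper: it assumes each path is relative to the table
    root and that the first path component is the partition value.
    """
    return [f for f in files if PurePosixPath(f).parts[0] in wanted]


# Hypothetical layout: <partition value>/<file>.parquet under the table root.
files = [
    f"{year}/part-{i}.parquet"
    for year in (2015, 2016, 2017)
    for i in range(4)
]

# With pruning, only files in the matching partition are scanned.
pruned = prune_by_partition(files, {"2017"})

# Without pruning, every file in the table is scanned.
print(len(files), len(pruned))
```

If pruning stops being applied, the reader falls back to the full file list, which would match the jump in files read described above.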

On 17 August 2017 at 13:48, Divya Gehlot <[email protected]> wrote:

> Hi,
> Another observation is
> My query had where conditions based on the partition values
> Before Metadata refresh - Its reading only 4 files
> After metadata refresh - its reading 102290 files
>
> Thanks,
> Divya
>
> On 17 August 2017 at 13:03, Padma Penumarthy <[email protected]> wrote:
>
>> Does your query have partition filter ?
>> Execution time is increased most likely because partition pruning is not
>> happening.
>> Did you get a chance to look at the logs ?  That might give some clues.
>>
>> Thanks,
>> Padma
>>
>>
>> > On Aug 16, 2017, at 9:32 PM, Divya Gehlot <[email protected]>
>> wrote:
>> >
>> > Hi,
>> > Even I am surprised.
>> > I am running Drill version 1.10 on MapR enterprise version.
>> > *Query* - selecting all the columns on a partitioned Parquet table
>> >
>> > I observed a few things from the query statistics:
>> >
>> > Value       Before Refresh Metadata   After Refresh Metadata
>> > Fragments   1                         13
>> > DURATION    01 min 0.233 sec          18 min 0.744 sec
>> > PLANNING    59.818 sec                33.087 sec
>> > QUEUED      Not Available             Not Available
>> > EXECUTION   0.415 sec                 17 min 27.657 sec
>> >
>> > The planning time is reduced by approx 45% (59.818 sec to 33.087 sec),
>> > but the execution time increased drastically.
>> > I would like to understand why the execution time increases after the
>> > metadata refresh.
>> >
>> >
>> > Appreciate the help.
>> >
>> > Thanks,
>> > divya
>> >
>> >
>> > On 17 August 2017 at 11:54, Padma Penumarthy <[email protected]>
>> wrote:
>> >
>> >> Refresh table metadata should  help reduce query planning time.
>> >> It is odd that it went up after you did refresh table metadata.
>> >> Did you check the logs to see what is happening ? You might have to
>> >> turn on some debugs if needed.
>> >> BTW, what version of Drill are you running ?
>> >>
>> >> Thanks,
>> >> Padma
>> >>
>> >>
>> >>> On Aug 16, 2017, at 8:15 PM, Divya Gehlot <[email protected]>
>> >> wrote:
>> >>>
>> >>> Hi,
>> >>> I have data in Parquet file format.
>> >>> When I run the query and look at the execution plan, I see the
>> >>> following statistics:
>> >>>
>> >>> TOTAL FRAGMENTS: 1
>> >>> DURATION: 01 min 0.233 sec
>> >>> PLANNING: 59.818 sec
>> >>> QUEUED: Not Available
>> >>> EXECUTION: 0.415 sec
>> >>>
>> >>> As it's a Parquet file format, I tried enabling metadata caching
>> >>> and ran the command below:
>> >>> REFRESH TABLE METADATA <path to table> ;
>> >>> Then I ran the same query again on the same table and same data (no
>> >>> changes in data) and found the statistics shown below:
>> >>>
>> >>> TOTAL FRAGMENTS: 13
>> >>> DURATION: 14 min 14.604 sec
>> >>> PLANNING: 33.087 sec
>> >>> QUEUED: Not Available
>> >>> EXECUTION: Not Available
>> >>>
>> >>> The query is still running.
>> >>>
>> >>> Can somebody help me understand why the query is taking so long once I
>> >>> issue the refresh metadata command?
>> >>>
>> >>> Appreciate the help!
>> >>>
>> >>> Thanks,
>> >>> Divya
>> >>
>> >>
>>
>>
>
