Re: Load metadata exactly in need

2017-09-11 Thread Dimitris Tsirogiannis
Hi Quanlong, You're right. The catalog needs to handle metadata at a finer granularity. We are actively looking into the options you mentioned as well as other related changes (see IMPALA-3234 and IMPALA-3127) to improve the performance and scalability of metadata management. Thanks Dimitris On

Re:Re: When should we use INVALIDATE METADATA instead of REFRESH statement?

2017-09-11 Thread Quanlong Huang
Thanks, Dimitris! At 2017-09-12 01:15:46, "Dimitris Tsirogiannis" wrote: >Hi Quanlong, > >You're pretty much correct. REFRESH can handle the majority of external >metadata modifications (adding/dropping files/partitions, etc) and >INVALIDATE METADATA should be used in

Load metadata exactly in need

2017-09-11 Thread Quanlong Huang
Hi all, Currently, if a "describe" statement hits an incomplete table, the impalad sends an RPC request to the catalogd to load this table's metadata. This can take a long time for tables with many partitions and files. However, to serve the "describe" statement, we just need the
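
The finer-grained loading idea in this thread can be sketched as a toy model in which a table's schema and its partition/file listing are loaded independently and on demand, so a DESCRIBE-style lookup never pays for the file listing. This is a minimal sketch under stated assumptions: `LazyTable`, `describe`, and the loader callables are hypothetical names for illustration, not Impala's catalog API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class LazyTable:
    """Hypothetical toy table entry; schema and file listing load separately."""
    name: str
    load_schema: Callable[[], List[str]]             # cheap: column definitions
    load_files: Callable[[], Dict[str, List[str]]]   # expensive: partition -> files
    _schema: Optional[List[str]] = None
    _files: Optional[Dict[str, List[str]]] = None

    def schema(self) -> List[str]:
        # Fetch column definitions only on first use, then reuse the cached copy.
        if self._schema is None:
            self._schema = self.load_schema()
        return self._schema

    def files(self) -> Dict[str, List[str]]:
        # Fetch the partition/file listing only when a query actually needs it.
        if self._files is None:
            self._files = self.load_files()
        return self._files


def describe(table: LazyTable) -> List[str]:
    # A DESCRIBE-style lookup needs column names/types only, so it never
    # triggers the expensive file listing.
    return table.schema()


if __name__ == "__main__":
    calls = {"schema": 0, "files": 0}

    def fake_schema() -> List[str]:
        calls["schema"] += 1
        return ["id INT", "name STRING"]

    def fake_files() -> Dict[str, List[str]]:
        calls["files"] += 1
        return {"p=1": ["f1.parq"], "p=2": ["f2.parq"]}

    t = LazyTable("events", fake_schema, fake_files)
    print(describe(t))  # ['id INT', 'name STRING']
    print(calls)        # {'schema': 1, 'files': 0}
```

In this toy model the expensive loader runs only when `files()` is called, which is the behavior the message asks for; how Impala's catalogd would actually split the load is not specified in the snippet.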

When should we use INVALIDATE METADATA instead of REFRESH statement?

2017-09-11 Thread Quanlong Huang
Hi all, I used to think that the REFRESH statement was just an incremental metadata reload that can't detect file deletions or modifications, so we should use INVALIDATE METADATA in those cases. However, one of my friends told me that they always use the REFRESH statement in their ETL pipeline, either

New Impala contributors: IMPALA-5614

2017-09-11 Thread Jim Apple
If you'd like to contribute a patch to Impala, but aren't sure what you want to work on, you can look at Impala's newbie issues: https://issues.apache.org/jira/issues/?filter=12341668. You can find detailed instructions on submitting patches at

Re: When should we use INVALIDATE METADATA instead of REFRESH statement?

2017-09-11 Thread Dimitris Tsirogiannis
Hi Quanlong, You're pretty much correct. REFRESH can handle the majority of external metadata modifications (adding/dropping files/partitions, etc) and INVALIDATE METADATA should be used in the two use cases you mention. I am sorry you had to look at the code to figure that out. I checked our
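
The distinction discussed here can be sketched with a toy cache: a REFRESH-like call eagerly re-lists one table's files, so both added and deleted files become visible, while an INVALIDATE-like call only drops the cached entry and the reload happens lazily on the next access. This is a hypothetical model (`ToyMetadataCache` and its methods are illustrative names, not Impala code). The real catalogd also tracks partitions, block locations, and Hive Metastore state, none of which is modeled, and the two INVALIDATE METADATA use cases mentioned above are not reproduced here.

```python
from typing import Dict, List, Optional


class ToyMetadataCache:
    """Hypothetical toy metadata cache; not Impala's catalog implementation."""

    def __init__(self, fs: Dict[str, List[str]]):
        self.fs = fs                           # table name -> files on storage
        self.cache: Dict[str, List[str]] = {}  # table name -> cached file list
        self.loads = 0                         # number of full listings performed

    def _load(self, table: str) -> List[str]:
        # Re-list the table's files from storage (returns a fresh copy).
        self.loads += 1
        return sorted(self.fs[table])

    def files(self, table: str) -> List[str]:
        # Load on first access, or on the first access after invalidation.
        if table not in self.cache:
            self.cache[table] = self._load(table)
        return self.cache[table]

    def refresh(self, table: str) -> None:
        # Eager reload: additions AND deletions on storage become visible now.
        self.cache[table] = self._load(table)

    def invalidate(self, table: Optional[str] = None) -> None:
        # Drop cached metadata (one table or all); reload happens lazily.
        if table is None:
            self.cache.clear()
        else:
            self.cache.pop(table, None)


if __name__ == "__main__":
    storage = {"sales": ["a.parq", "b.parq"]}
    cache = ToyMetadataCache(storage)
    print(cache.files("sales"))  # ['a.parq', 'b.parq']

    # Files change outside the engine: one deleted, one added.
    storage["sales"].remove("a.parq")
    storage["sales"].append("c.parq")
    print(cache.files("sales"))  # ['a.parq', 'b.parq'] (stale)

    cache.refresh("sales")
    print(cache.files("sales"))  # ['b.parq', 'c.parq']
```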

Re: Re: Load metadata exactly in need

2017-09-11 Thread Dimitris Tsirogiannis
Thanks for the feedback Quanlong. We plan on addressing many of these catalog issues in the immediate future. Dimitris On Mon, Sep 11, 2017 at 10:21 PM, Quanlong Huang wrote: > Hi Dimitris, > > Thanks for your quick reply! > > IMPALA-3127 is a great ticket. But it still

Re:Re: Load metadata exactly in need

2017-09-11 Thread Quanlong Huang
Hi Dimitris, Thanks for your quick reply! IMPALA-3127 is a great ticket. But it still has no progress and no assignee. Is it tracked in your internal Jira? I hope this can be done soon, since some users may choose Presto over Impala because of these usability issues. Thanks Quanlong At