[jira] [Commented] (IMPALA-10722) truncate operation deletes data files before deleting metadata

2022-09-08 Thread Vihang Karajgaonkar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17602119#comment-17602119
 ] 

Vihang Karajgaonkar commented on IMPALA-10722:
--

Feel free to assign it to yourself. I am not working on this.

> truncate operation deletes data files before deleting metadata
> --
>
> Key: IMPALA-10722
> URL: https://issues.apache.org/jira/browse/IMPALA-10722
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Vihang Karajgaonkar
>Priority: Minor
>  Labels: newbie
>
> In the case of a truncate operation, we currently delete the data files first
> and then the statistics. But since statistics are derived from the data, we
> should delete the statistics first and then the data files.
> See: 
> https://github.com/apache/impala/blob/822e8373d1f1737865899b80862c2be7b07cc950/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L2297
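The proposed ordering can be sketched as follows. This is a minimal illustration only; the helper method names are hypothetical stand-ins, not the actual CatalogOpExecutor API.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the proposed truncate ordering: drop the statistics
// first, then the data files, so a failure mid-way never leaves statistics
// describing data that no longer exists. The helpers are hypothetical.
public class TruncateOrderSketch {
  final List<String> log = new ArrayList<>();

  void dropStatistics(String table) {
    // In real code this would clear table/column stats in the metastore.
    log.add("drop-stats:" + table);
  }

  void deleteDataFiles(String table) {
    // In real code this would delete the table's files on the filesystem.
    log.add("delete-files:" + table);
  }

  // Metadata (statistics) first, data files second.
  void truncate(String table) {
    dropStatistics(table);
    deleteDataFiles(table);
  }

  public static void main(String[] args) {
    TruncateOrderSketch s = new TruncateOrderSketch();
    s.truncate("db.t");
    if (!s.log.equals(List.of("drop-stats:db.t", "delete-files:db.t"))) {
      throw new AssertionError("unexpected order: " + s.log);
    }
    System.out.println("order ok: " + s.log);
  }
}
```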



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-11091) Update documentation for event polling

2022-01-26 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created IMPALA-11091:


 Summary: Update documentation for event polling
 Key: IMPALA-11091
 URL: https://issues.apache.org/jira/browse/IMPALA-11091
 Project: IMPALA
  Issue Type: Documentation
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


IMPALA-8795 enables event polling by default in Impala 4.1. This ticket tracks 
the documentation changes needed to reflect that.







[jira] [Commented] (IMPALA-8592) Add support for insert events for 'LOAD DATA..' statements from Impala.

2021-12-02 Thread Vihang Karajgaonkar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17452610#comment-17452610
 ] 

Vihang Karajgaonkar commented on IMPALA-8592:
-

One of the use cases here is that if you have multiple Impala clusters, a LOAD 
DATA statement in one cluster will not generate any events, and hence the table 
will need to be refreshed manually on all the other Impala clusters.

> Add support for insert events for 'LOAD DATA..' statements from Impala.
> ---
>
> Key: IMPALA-8592
> URL: https://issues.apache.org/jira/browse/IMPALA-8592
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Anurag Mantripragada
>Priority: Major
>
> Hive generates INSERT events for LOAD DATA.. statements. We should support 
> the same in Impala.






[jira] [Commented] (IMPALA-10886) TestReusePartitionMetadata.test_reuse_partition_meta fails

2021-12-02 Thread Vihang Karajgaonkar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17452520#comment-17452520
 ] 

Vihang Karajgaonkar commented on IMPALA-10886:
--

Do we know why we don't detect DROP_PARTITION as a self-event in this case?

> TestReusePartitionMetadata.test_reuse_partition_meta fails
> --
>
> Key: IMPALA-10886
> URL: https://issues.apache.org/jira/browse/IMPALA-10886
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>  Labels: broken-build
> Attachments: test_local_catalog.patch
>
>
> https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/14670/testReport/junit/custom_cluster.test_local_catalog/TestReusePartitionMetadata/test_reuse_partition_meta/
> {code}
> custom_cluster/test_local_catalog.py:586: in test_reuse_partition_meta
> self.check_missing_partitions(unique_database, 1)
> custom_cluster/test_local_catalog.py:595: in check_missing_partitions
> assert match.group(1) == str(partition_misses)
> E   assert '0' == '1'
> E - 0
> E + 1
> {code}






[jira] [Resolved] (IMPALA-9857) Batch ALTER_PARTITION events

2021-12-02 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar resolved IMPALA-9857.
-
Fix Version/s: Impala 4.1.0
   Resolution: Fixed

> Batch ALTER_PARTITION events
> 
>
> Key: IMPALA-9857
> URL: https://issues.apache.org/jira/browse/IMPALA-9857
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
> Fix For: Impala 4.1.0
>
>
> When Hive inserts data into partitioned tables, it generates a lot of 
> ALTER_PARTITION (and possibly INSERT_EVENT) events in quick succession. 
> Currently, such events are processed one by one by EventsProcessor, which can 
> be slow and can cause EventsProcessor to lag behind. This JIRA proposes to 
> batch all the successive ALTER_PARTITION events for the same table into one 
> ALTER_PARTITIONS event, which is then processed as a whole to refresh all the 
> affected partitions together. This can significantly speed up event 
> processing in such cases.
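The batching idea can be sketched roughly as below. The Event type and its fields are a hypothetical simplification for illustration, not Impala's real MetastoreEvents classes.

```java
import java.util.ArrayList;
import java.util.List;

// Rough sketch of batching runs of consecutive ALTER_PARTITION events on the
// same table. The Event record is a hypothetical stand-in, not Impala's
// actual event model.
public class AlterPartitionBatcher {
  record Event(String type, String table) {}

  // A batch grows only while the next event is an ALTER_PARTITION on the same
  // table; any other event (or a different table) closes the current batch.
  static List<List<Event>> batch(List<Event> events) {
    List<List<Event>> batches = new ArrayList<>();
    List<Event> current = new ArrayList<>();
    for (Event e : events) {
      boolean extendsBatch = !current.isEmpty()
          && e.type().equals("ALTER_PARTITION")
          && current.get(0).type().equals("ALTER_PARTITION")
          && current.get(0).table().equals(e.table());
      if (!extendsBatch && !current.isEmpty()) {
        batches.add(current);
        current = new ArrayList<>();
      }
      current.add(e);
    }
    if (!current.isEmpty()) batches.add(current);
    return batches;
  }

  public static void main(String[] args) {
    List<Event> in = List.of(
        new Event("ALTER_PARTITION", "t1"),
        new Event("ALTER_PARTITION", "t1"),
        new Event("ALTER_PARTITION", "t2"),
        new Event("INSERT", "t1"));
    List<List<Event>> out = batch(in);
    // The two consecutive t1 alters collapse into one batch; the t2 alter and
    // the INSERT each form their own batch.
    if (out.size() != 3 || out.get(0).size() != 2) {
      throw new AssertionError("unexpected batching: " + out);
    }
    System.out.println(out.size() + " batches");
  }
}
```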







[jira] [Resolved] (IMPALA-11028) Table loading could fail if metastore cleans up old events

2021-12-02 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar resolved IMPALA-11028.
--
Fix Version/s: Impala 4.1.0
   Resolution: Fixed

> Table loading could fail if metastore cleans up old events
> --
>
> Key: IMPALA-11028
> URL: https://issues.apache.org/jira/browse/IMPALA-11028
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
> Fix For: Impala 4.1.0
>
>
> After IMPALA-10502, catalogd tracks the table's create event id. When the 
> table is loaded for the first time, it updates the create event id of the 
> table. But if the table is loaded for the first time after a long delay 
> (more than 24 hours), it is possible that the metastore has already cleaned 
> up the old notification log entries which catalogd requires during the table 
> load.
> See this snippet from TableLoader.java
> {noformat}
>   if (eventId != -1 && catalog_.isEventProcessingActive()) {
>     // If the eventId is not -1 it means this table was likely created by Impala.
>     // However, since the load operation of the table can happen much later, it is
>     // possible that the table was recreated outside Impala and hence the eventId
>     // which is stored in the loaded table needs to be updated to the latest.
>     // We are only interested in fetching the events if we have a valid eventId
>     // for a table. Tables whose eventId is unknown were not created by this
>     // catalogd and hence the self-event detection logic does not apply.
>     events = MetastoreEventsProcessor.getNextMetastoreEvents(catalog_, eventId,
>         notificationEvent -> CreateTableEvent.CREATE_TABLE_EVENT_TYPE
>             .equals(notificationEvent.getEventType())
>             && notificationEvent.getDbName().equalsIgnoreCase(db.getName())
>             && notificationEvent.getTableName().equalsIgnoreCase(tblName));
>   }
> {noformat}
> The {{getNextMetastoreEvents}} method can throw an exception if the 
> metastore has cleaned up older entries (after 24 hours by default). This is 
> controlled by the configuration {{hive.metastore.event.db.listener.timetolive}} 
> on the metastore side.
> I could reproduce the problem by setting the following metastore configs.
> {noformat}
> hive.metastore.event.db.listener.clean.interval=10s
> hive.metastore.event.db.listener.timetolive=120s
> {noformat}
> Now run the following Impala script
> {noformat}
> create table t1 (c1 int);
> create table t2 (c1 int);
> select sleep(24);
> create table t3 (c1 int);
> select * from t1;
> {noformat}







[jira] [Created] (IMPALA-11028) Table loading could fail if metastore cleans up old events

2021-11-18 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created IMPALA-11028:


 Summary: Table loading could fail if metastore cleans up old events
 Key: IMPALA-11028
 URL: https://issues.apache.org/jira/browse/IMPALA-11028
 Project: IMPALA
  Issue Type: Bug
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


After IMPALA-10502, catalogd tracks the table's create event id. When the table 
is loaded for the first time, it updates the create event id of the table. But 
if the table is loaded for the first time after a long delay (more than 24 
hours), it is possible that the metastore has already cleaned up the old 
notification log entries which catalogd requires during the table load.

See this snippet from TableLoader.java
{noformat}
  if (eventId != -1 && catalog_.isEventProcessingActive()) {
    // If the eventId is not -1 it means this table was likely created by Impala.
    // However, since the load operation of the table can happen much later, it is
    // possible that the table was recreated outside Impala and hence the eventId
    // which is stored in the loaded table needs to be updated to the latest.
    // We are only interested in fetching the events if we have a valid eventId
    // for a table. Tables whose eventId is unknown were not created by this
    // catalogd and hence the self-event detection logic does not apply.
    events = MetastoreEventsProcessor.getNextMetastoreEvents(catalog_, eventId,
        notificationEvent -> CreateTableEvent.CREATE_TABLE_EVENT_TYPE
            .equals(notificationEvent.getEventType())
            && notificationEvent.getDbName().equalsIgnoreCase(db.getName())
            && notificationEvent.getTableName().equalsIgnoreCase(tblName));
  }
{noformat}

The {{getNextMetastoreEvents}} method can throw an exception if the metastore 
has cleaned up older entries (after 24 hours by default). This is controlled 
by the configuration {{hive.metastore.event.db.listener.timetolive}} on the 
metastore side.

I could reproduce the problem by setting the following metastore configs.

{noformat}
hive.metastore.event.db.listener.clean.interval=10s
hive.metastore.event.db.listener.timetolive=120s
{noformat}

Now run the following Impala script
{noformat}
create table t1 (c1 int);
create table t2 (c1 int);
select sleep(24);
create table t3 (c1 int);
select * from t1;
{noformat}
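One possible mitigation is sketched below: treat a purged notification log the same as an unknown eventId and skip self-event detection rather than failing the table load. This is only a sketch under that assumption, with hypothetical types; it is not the actual fix that went into Impala.

```java
import java.util.Collections;
import java.util.List;

// Sketch of a defensive fallback: if the metastore has already purged the
// notification log entries we need, behave as if the create event id were
// unknown instead of failing the table load. All types here are hypothetical
// simplifications, not Impala's real classes.
public class EventFetchFallback {
  static class EventsCleanedUpException extends RuntimeException {}

  interface EventSource {
    List<String> eventsSince(long eventId) throws EventsCleanedUpException;
  }

  static List<String> fetchOrEmpty(EventSource src, long eventId) {
    if (eventId == -1) return Collections.emptyList(); // unknown id: nothing to check
    try {
      return src.eventsSince(eventId);
    } catch (EventsCleanedUpException e) {
      // Entries expired (hive.metastore.event.db.listener.timetolive elapsed);
      // fall back to "unknown eventId" semantics instead of aborting the load.
      return Collections.emptyList();
    }
  }

  public static void main(String[] args) {
    EventSource purged = id -> { throw new EventsCleanedUpException(); };
    EventSource live = id -> List.of("CREATE_TABLE");
    if (!fetchOrEmpty(purged, 42).isEmpty()) throw new AssertionError();
    if (fetchOrEmpty(live, 42).size() != 1) throw new AssertionError();
    System.out.println("fallback ok");
  }
}
```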









[jira] [Commented] (IMPALA-10987) Changing impala.disableHmsSync in Hive can break event processing

2021-10-26 Thread Vihang Karajgaonkar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434499#comment-17434499
 ] 

Vihang Karajgaonkar commented on IMPALA-10987:
--

Possible solutions to improve this:
1. In case a table-level sync is re-enabled:
  a. If the table exists in Impala, we can just invalidate the table so that it 
is reloaded the first time a query accesses it. This would take care of any 
ADD/DROP partition events on the table that were missed while event sync was 
disabled on the table.
  b. If the table doesn't exist in Impala, create an IncompleteTable, provided 
there is no entry in the event delete log for this table.

I am not sure how to handle a database-level sync re-enable efficiently. I wish 
we had a {{refresh database}} command, which would have been useful here. The 
other approach is to invalidate any tables in the database for which sync now 
evaluates to on but previously evaluated to off. We would still need to handle 
the create/drop table events that were missed during the window when event sync 
was disabled.
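The table-level handling described above could be sketched as follows. The Catalog interface and every method name on it are hypothetical stand-ins for illustration, not Impala's actual catalog API.

```java
// Sketch of the table-level "sync re-enabled" handling. The Catalog interface
// and its method names are hypothetical stand-ins, not Impala's real API.
public class SyncReenableSketch {
  interface Catalog {
    boolean tableExists(String db, String tbl);
    boolean inEventDeleteLog(String db, String tbl);
    void invalidateTable(String db, String tbl);
    void addIncompleteTable(String db, String tbl);
  }

  // 1a: existing table -> invalidate so the next query reloads it, picking up
  //     any ADD/DROP partition events missed while sync was disabled.
  // 1b: missing table with no delete-log entry -> register an incomplete table.
  static String onTableSyncReenabled(Catalog c, String db, String tbl) {
    if (c.tableExists(db, tbl)) {
      c.invalidateTable(db, tbl);
      return "invalidated";
    }
    if (!c.inEventDeleteLog(db, tbl)) {
      c.addIncompleteTable(db, tbl);
      return "added-incomplete";
    }
    return "skipped"; // table was dropped meanwhile; nothing to do
  }

  public static void main(String[] args) {
    Catalog c = new Catalog() {
      public boolean tableExists(String db, String tbl) { return tbl.equals("t_exists"); }
      public boolean inEventDeleteLog(String db, String tbl) { return tbl.equals("t_dropped"); }
      public void invalidateTable(String db, String tbl) {}
      public void addIncompleteTable(String db, String tbl) {}
    };
    if (!onTableSyncReenabled(c, "d", "t_exists").equals("invalidated")) throw new AssertionError();
    if (!onTableSyncReenabled(c, "d", "t_new").equals("added-incomplete")) throw new AssertionError();
    if (!onTableSyncReenabled(c, "d", "t_dropped").equals("skipped")) throw new AssertionError();
    System.out.println("ok");
  }
}
```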


> Changing impala.disableHmsSync in Hive can break event processing
> -
>
> Key: IMPALA-10987
> URL: https://issues.apache.org/jira/browse/IMPALA-10987
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Csaba Ringhofer
>Priority: Major
>
> To reproduce, start Impala with event polling:
> {code}
> bin/start-impala-cluster.py --catalogd_args="--hms_event_polling_interval_s=2 
> --catalog_topic_mode=minimal" --impalad_args="--use_local_catalog=1"
> {code}
> From Hive:
> {code}
> CREATE DATABASE temp;
> CREATE EXTERNAL TABLE temp.t (i int) PARTITIONED BY (p int) 
> TBLPROPERTIES('impala.disableHmsSync'='true');
> ALTER TABLE temp.t SET TBLPROPERTIES ('impala.disableHmsSync'='false');
> {code}
> From this point on, event sync will be broken in Impala. It can be fixed only 
> with a global INVALIDATE METADATA (or by restarting catalogd).
> The catalogd log will include an exception like this:
> {code}
> E1026 10:30:16.151208 22514 MetastoreEventsProcessor.java:653] Event 
> processing needs a invalidate command to resolve the state
> Java exception follows:
> org.apache.impala.catalog.events.MetastoreNotificationNeedsInvalidateException:
>  EventId: 15956 EventType: ALTER_TABLE Detected that event sync was tur
> ned on for the table temp.t and the table does not exist. Event processing 
> cannot be continued further. Issue a invalidate metadata command to reset
>  the event processing state
> at 
> org.apache.impala.catalog.events.MetastoreEvents$AlterTableEvent.process(MetastoreEvents.java:992)
> at 
> org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:345)
> at 
> org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:747)
> at 
> org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:645)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> {code}
> and future events will lead to log lines like this:
> {code}
> W1026 10:30:18.151962 22514 MetastoreEventsProcessor.java:638] Event 
> processing is skipped since status is NEEDS_INVALIDATE. Last synced event id 
> is 15955
> {code}






[jira] [Commented] (IMPALA-10987) Changing impala.disableHmsSync in Hive can break event processing

2021-10-26 Thread Vihang Karajgaonkar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434491#comment-17434491
 ] 

Vihang Karajgaonkar commented on IMPALA-10987:
--

Yes, unfortunately, we currently require a global invalidate to reset the 
events processor when event sync is re-enabled on a table. My original thinking 
behind this design decision was:

1) The events processor cannot simply start processing events from that point 
onwards, because it might have missed some create/drop table events as well. 
This is probably more relevant to the database-level flag than the table-level 
one, although a table may also have had add/drop partition events that were 
skipped during this time window. 
2) I anticipated that re-enabling event sync on a table or database would not 
be very common. It would likely be a one-time operation, and hence I thought 
it was okay to require a catalogd reset. 

That said, removing this requirement has always been on my to-do list. I will 
look into ways to avoid the global invalidate when event sync is turned back 
on. I don't think this is a bug, since it is documented behavior. 
See https://impala.apache.org/docs/build/html/topics/impala_metadata.html

> Changing impala.disableHmsSync in Hive can break event processing
> -
>
> Key: IMPALA-10987
> URL: https://issues.apache.org/jira/browse/IMPALA-10987
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Csaba Ringhofer
>Priority: Major
>
> To reproduce, start Impala with event polling:
> {code}
> bin/start-impala-cluster.py --catalogd_args="--hms_event_polling_interval_s=2 
> --catalog_topic_mode=minimal" --impalad_args="--use_local_catalog=1"
> {code}
> From Hive:
> {code}
> CREATE DATABASE temp;
> CREATE EXTERNAL TABLE temp.t (i int) PARTITIONED BY (p int) 
> TBLPROPERTIES('impala.disableHmsSync'='true');
> ALTER TABLE temp.t SET TBLPROPERTIES ('impala.disableHmsSync'='false');
> {code}
> From this point on, event sync will be broken in Impala. It can be fixed only 
> with a global INVALIDATE METADATA (or by restarting catalogd).
> The catalogd log will include an exception like this:
> {code}
> E1026 10:30:16.151208 22514 MetastoreEventsProcessor.java:653] Event 
> processing needs a invalidate command to resolve the state
> Java exception follows:
> org.apache.impala.catalog.events.MetastoreNotificationNeedsInvalidateException:
>  EventId: 15956 EventType: ALTER_TABLE Detected that event sync was tur
> ned on for the table temp.t and the table does not exist. Event processing 
> cannot be continued further. Issue a invalidate metadata command to reset
>  the event processing state
> at 
> org.apache.impala.catalog.events.MetastoreEvents$AlterTableEvent.process(MetastoreEvents.java:992)
> at 
> org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:345)
> at 
> org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:747)
> at 
> org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:645)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> {code}
> and future events will lead to log lines like this:
> {code}
> W1026 10:30:18.151962 22514 MetastoreEventsProcessor.java:638] Event 
> processing is skipped since status is NEEDS_INVALIDATE. Last synced event id 
> is 15955
> {code}






[jira] [Resolved] (IMPALA-10897) TestEventProcessing.test_event_based_replication is flaky

2021-10-20 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar resolved IMPALA-10897.
--
Fix Version/s: Impala 4.0.1
   Resolution: Fixed

This test has been disabled as part of IMPALA-9857

> TestEventProcessing.test_event_based_replication is flaky
> -
>
> Key: IMPALA-10897
> URL: https://issues.apache.org/jira/browse/IMPALA-10897
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Quanlong Huang
>Assignee: Vihang Karajgaonkar
>Priority: Critical
> Fix For: Impala 4.0.1
>
>
> Saw this in an ASAN build:
> {code:python}
> metadata/test_event_processing.py:185: in test_event_based_replication
> self.__run_event_based_replication_tests()
> metadata/test_event_processing.py:326: in __run_event_based_replication_tests
> EventProcessorUtils.wait_for_event_processing(self)
> util/event_processor_utils.py:61: in wait_for_event_processing
> within {1} seconds".format(current_event_id, timeout))
> E   Exception: Event processor did not sync till last known event id 34722
>within 10 seconds {code}
> Standard Error
> {code}
> SET 
> client_identifier=metadata/test_event_processing.py::TestEventProcessing::()::test_event_based_replication;
> -- connecting to: localhost:21000
> -- connecting to localhost:21050 with impyla
> -- 2021-08-28 23:43:40,300 INFO MainThread: Closing active operation
> -- connecting to localhost:28000 with impyla
> -- 2021-08-28 23:43:40,323 INFO MainThread: Closing active operation
> -- connecting to localhost:11050 with impyla
> -- 2021-08-28 23:43:48,026 INFO MainThread: Waiting until events 
> processor syncs to event id:31451
> -- 2021-08-28 23:43:48,759 DEBUGMainThread: Metric last-synced-event-id 
> has reached the desired value:31455
> -- 2021-08-28 23:43:48,790 DEBUGMainThread: Found 3 impalad/1 
> statestored/1 catalogd process(es)
> -- 2021-08-28 23:43:48,820 INFO MainThread: Getting metric: 
> catalog.curr-version from 
> impala-ec2-centos74-m5-4xlarge-ondemand-1787.vpc.cloudera.com:25000
> -- 2021-08-28 23:43:48,824 INFO MainThread: Sleeping 1s before next retry.
> -- 2021-08-28 23:43:49,825 INFO MainThread: Getting metric: 
> catalog.curr-version from 
> impala-ec2-centos74-m5-4xlarge-ondemand-1787.vpc.cloudera.com:25000
> -- 2021-08-28 23:43:49,829 INFO MainThread: Sleeping 1s before next retry.
> -- 2021-08-28 23:43:50,830 INFO MainThread: Getting metric: 
> catalog.curr-version from 
> impala-ec2-centos74-m5-4xlarge-ondemand-1787.vpc.cloudera.com:25000
> -- 2021-08-28 23:43:50,835 INFO MainThread: Sleeping 1s before next retry.
> -- 2021-08-28 23:43:51,836 INFO MainThread: Getting metric: 
> catalog.curr-version from 
> impala-ec2-centos74-m5-4xlarge-ondemand-1787.vpc.cloudera.com:25000
> -- 2021-08-28 23:43:51,840 INFO MainThread: Sleeping 1s before next retry.
> -- 2021-08-28 23:43:52,841 INFO MainThread: Getting metric: 
> catalog.curr-version from 
> impala-ec2-centos74-m5-4xlarge-ondemand-1787.vpc.cloudera.com:25000
> -- 2021-08-28 23:43:52,846 INFO MainThread: Metric 'catalog.curr-version' 
> has reached desired value: 2364
> -- 2021-08-28 23:43:52,846 INFO MainThread: Getting metric: 
> catalog.curr-version from 
> impala-ec2-centos74-m5-4xlarge-ondemand-1787.vpc.cloudera.com:25001
> -- 2021-08-28 23:43:52,851 INFO MainThread: Metric 'catalog.curr-version' 
> has reached desired value: 2364
> -- 2021-08-28 23:43:52,851 INFO MainThread: Getting metric: 
> catalog.curr-version from 
> impala-ec2-centos74-m5-4xlarge-ondemand-1787.vpc.cloudera.com:25002
> -- 2021-08-28 23:43:52,855 INFO MainThread: Metric 'catalog.curr-version' 
> has reached desired value: 2364
> -- executing against localhost:21000
> create table repl_source_ugchr.unpart_tbl (a string, b string) stored as 
> parquet tblproperties 
> ('transactional'='true','transactional_properties'='insert_only');
> -- 2021-08-28 23:43:52,878 INFO MainThread: Started query 
> 394339b6db812c59:a5e5039a
> -- executing against localhost:21000
> create table repl_source_ugchr.part_tbl (id int, bool_col boolean, 
> tinyint_col tinyint, smallint_col smallint, int_col int, bigint_col bigint, 
> float_col float, double_col double, date_string string, string_col string, 
> timestamp_col timestamp) partitioned by (year int, month int) stored as 
> parquet tblproperties 
> ('transactional'='true','transactional_properties'='insert_only');
> -- 2021-08-28 23:43:52,900 INFO MainThread: Started query 
> b74f5e32e4c1790a:46410750
> -- executing against localhost:21000
> insert into repl_source_ugchr.unpart_tbl select * from functional.tinytable;
> -- 2021-08-28 23:43:56,132 INFO MainThread: Started query 
> 

> catalog.curr-version from 
> impala-ec2-centos74-m5-4xlarge-ondemand-1787.vpc.cloudera.com:25000
> -- 2021-08-28 23:43:49,829 INFO MainThread: Sleeping 1s before next retry.
> -- 2021-08-28 23:43:50,830 INFO MainThread: Getting metric: 
> catalog.curr-version from 
> impala-ec2-centos74-m5-4xlarge-ondemand-1787.vpc.cloudera.com:25000
> -- 2021-08-28 23:43:50,835 INFO MainThread: Sleeping 1s before next retry.
> -- 2021-08-28 23:43:51,836 INFO MainThread: Getting metric: 
> catalog.curr-version from 
> impala-ec2-centos74-m5-4xlarge-ondemand-1787.vpc.cloudera.com:25000
> -- 2021-08-28 23:43:51,840 INFO MainThread: Sleeping 1s before next retry.
> -- 2021-08-28 23:43:52,841 INFO MainThread: Getting metric: 
> catalog.curr-version from 
> impala-ec2-centos74-m5-4xlarge-ondemand-1787.vpc.cloudera.com:25000
> -- 2021-08-28 23:43:52,846 INFO MainThread: Metric 'catalog.curr-version' 
> has reached desired value: 2364
> -- 2021-08-28 23:43:52,846 INFO MainThread: Getting metric: 
> catalog.curr-version from 
> impala-ec2-centos74-m5-4xlarge-ondemand-1787.vpc.cloudera.com:25001
> -- 2021-08-28 23:43:52,851 INFO MainThread: Metric 'catalog.curr-version' 
> has reached desired value: 2364
> -- 2021-08-28 23:43:52,851 INFO MainThread: Getting metric: 
> catalog.curr-version from 
> impala-ec2-centos74-m5-4xlarge-ondemand-1787.vpc.cloudera.com:25002
> -- 2021-08-28 23:43:52,855 INFO MainThread: Metric 'catalog.curr-version' 
> has reached desired value: 2364
> -- executing against localhost:21000
> create table repl_source_ugchr.unpart_tbl (a string, b string) stored as 
> parquet tblproperties 
> ('transactional'='true','transactional_properties'='insert_only');
> -- 2021-08-28 23:43:52,878 INFO MainThread: Started query 
> 394339b6db812c59:a5e5039a
> -- executing against localhost:21000
> create table repl_source_ugchr.part_tbl (id int, bool_col boolean, 
> tinyint_col tinyint, smallint_col smallint, int_col int, bigint_col bigint, 
> float_col float, double_col double, date_string string, string_col string, 
> timestamp_col timestamp) partitioned by (year int, month int) stored as 
> parquet tblproperties 
> ('transactional'='true','transactional_properties'='insert_only');
> -- 2021-08-28 23:43:52,900 INFO MainThread: Started query 
> b74f5e32e4c1790a:46410750
> -- executing against localhost:21000
> insert into repl_source_ugchr.unpart_tbl select * from functional.tinytable;
> -- 2021-08-28 23:43:56,132 INFO MainThread: Started query 
> 
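The log above shows a common polling pattern: fetch a metric, sleep, and retry until it reaches the desired value or a timeout expires. As a hedged illustration (not the actual Impala test-framework API; `wait_for_metric` and its parameters are made-up names), the loop could look like:

```python
import time

def wait_for_metric(get_metric, desired, timeout_s=10, interval_s=1, sleep=time.sleep):
    """Poll get_metric() until it reaches 'desired' or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        value = get_metric()
        if value >= desired:
            return value  # metric has reached the desired value
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "metric did not reach %s within %ss" % (desired, timeout_s))
        sleep(interval_s)  # "Sleeping 1s before next retry"
```

The test failure quoted below is exactly this loop giving up: the event id never reached the expected value within the timeout.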

[jira] [Commented] (IMPALA-9857) Batch ALTER_PARTITION events

2021-10-04 Thread Vihang Karajgaonkar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17424042#comment-17424042
 ] 

Vihang Karajgaonkar commented on IMPALA-9857:
-

IMPALA-10949 is created as a follow-up which can improve the batching logic 
significantly.

> Batch ALTER_PARTITION events
> 
>
> Key: IMPALA-9857
> URL: https://issues.apache.org/jira/browse/IMPALA-9857
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>
> When Hive inserts data into partitioned tables, it generates a lot of 
> ALTER_PARTITION (and possibly INSERT_EVENT) in quick succession. Currently, 
> such events are processed one by one by EventsProcessor, which can be slow 
> and can cause EventsProcessor to lag behind. This JIRA proposes to use 
> batching for such ALTER_PARTITION events such that all the successive 
> ALTER_PARTITION events for the same table are batched together into one 
> ALTER_PARTITIONS event and then are processed together to refresh all the 
> partitions from the events. This can significantly speed up the event 
> processing in such cases.   
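The batching idea described above can be sketched in a few lines. This is a language-neutral illustration in Python rather than Impala's actual Java EventsProcessor code; the `Event` tuple and `batch_alter_partition_events` name are made up for the example:

```python
from collections import namedtuple

# A minimal stand-in for a metastore event: id, type, and target table.
Event = namedtuple("Event", ["event_id", "event_type", "table"])

def batch_alter_partition_events(events):
    """Group runs of consecutive ALTER_PARTITION events on the same table,
    so each run can be applied as a single refresh of the table's partitions."""
    batches = []
    current = []
    for ev in events:
        if (current
                and ev.event_type == "ALTER_PARTITION"
                and current[-1].event_type == "ALTER_PARTITION"
                and current[-1].table == ev.table):
            current.append(ev)  # extend the running batch for the same table
        else:
            if current:
                batches.append(current)
            current = [ev]  # start a new batch
    if current:
        batches.append(current)
    return batches
```

With this grouping, a burst of Hive-generated ALTER_PARTITION events on one table collapses into a single batch instead of one refresh per event.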



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10949) Improve batching logic of events

2021-10-01 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created IMPALA-10949:


 Summary: Improve batching logic of events
 Key: IMPALA-10949
 URL: https://issues.apache.org/jira/browse/IMPALA-10949
 Project: IMPALA
  Issue Type: Improvement
Reporter: Vihang Karajgaonkar


This is a follow-up based on the review comment 
https://gerrit.cloudera.org/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1641

The current batching approach groups together the events from a single 
operation so that the self-event check is done per batch. However, there is 
considerable scope for improving the batching logic by clubbing together 
events across the various sources of events on a table once IMPALA-10926 is 
merged. After IMPALA-10926, each table will track the last_synced_event, and 
the events processor can simply ignore any event whose id is <= the 
last_synced_event. This simplification of the self-events logic will enable 
easier batching of all the events of a given type on a table.
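A minimal sketch of the last_synced_event check described above, in illustrative Python rather than the catalogd's Java (the `process_event` function and `last_synced` map are hypothetical names for the example):

```python
last_synced = {}  # table name -> last event id the table has synced to

def process_event(table, event_id, apply_fn):
    """Apply the event only if it is newer than the table's synced point.

    Self-events and already-applied events have ids at or below the table's
    last_synced_event, so they are skipped without any marker comparison."""
    if event_id <= last_synced.get(table, -1):
        return False  # ignore: the table already reflects this event
    apply_fn()
    last_synced[table] = event_id  # advance the synced point
    return True
```

A single monotonically increasing id per table replaces the per-object marker bookkeeping, which is what makes batching straightforward.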



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Assigned] (IMPALA-10236) Queries stuck if catalog topic update compression fails

2021-09-30 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar reassigned IMPALA-10236:


Assignee: Vihang Karajgaonkar

> Queries stuck if catalog topic update compression fails
> ---
>
> Key: IMPALA-10236
> URL: https://issues.apache.org/jira/browse/IMPALA-10236
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 2.12.0
>Reporter: Shant Hovsepian
>Assignee: Vihang Karajgaonkar
>Priority: Critical
>  Labels: hang, supportability
>
> If a catalog object to be compressed doesn't fit into a 2GB buffer, an error 
> is thrown. 
>  
> {code:java}
> /// Compresses a serialized catalog object using LZ4 and stores it back in 
> 'dst'. Stores
> /// the size of the uncompressed catalog object in the first sizeof(uint32_t) 
> bytes of
> /// 'dst'. The compression fails if the uncompressed data size exceeds 
> 0x7E00 bytes.
> Status CompressCatalogObject(const uint8_t* src, uint32_t size, std::string* 
> dst)
> WARN_UNUSED_RESULT;
> {code}
>  
> CatalogServer::AddPendingTopicItem() calls CompressCatalogObject()
>  
> {code:java}
> // Add a catalog update to pending_topic_updates_.
> extern "C"
> JNIEXPORT jboolean JNICALL
> Java_org_apache_impala_service_FeSupport_NativeAddPendingTopicItem(JNIEnv* 
> env,
> jclass caller_class, jlong native_catalog_server_ptr, jstring key, jlong 
> version,
> jbyteArray serialized_object, jboolean deleted) {
>   std::string key_string;
>   {
> JniUtfCharGuard key_str;
> if (!JniUtfCharGuard::create(env, key, &key_str).ok()) {
>   return static_cast<jboolean>(false);
> }
> key_string.assign(key_str.get());
>   }
>   JniScopedArrayCritical obj_buf;
>   if (!JniScopedArrayCritical::Create(env, serialized_object, &obj_buf)) {
> return static_cast<jboolean>(false);
>   }
>   reinterpret_cast<CatalogServer*>(native_catalog_server_ptr)->
>   AddPendingTopicItem(std::move(key_string), version, obj_buf.get(),
>   static_cast<uint32_t>(obj_buf.size()), deleted);
>   return static_cast<jboolean>(true);
> {code}
> However the JNI call to AddPendingTopicItem discards the return value.
> Recently the return value was maintained due to IMPALA-10076:
> {code:java}
> -if (!FeSupport.NativeAddPendingTopicItem(nativeCatalogServerPtr, 
> v1Key,
> -obj.catalog_version, data, delete)) {
> +int actualSize = 
> FeSupport.NativeAddPendingTopicItem(nativeCatalogServerPtr,
> +v1Key, obj.catalog_version, data, delete);
> +if (actualSize < 0) {
>LOG.error("NativeAddPendingTopicItem failed in BE. key=" + v1Key + 
> ", delete="
>+ delete + ", data_size=" + data.length);
> +} else if (summary != null && obj.type == HDFS_PARTITION) {
> +  summary.update(true, delete, obj.hdfs_partition.partition_name,
> +  obj.catalog_version, data.length, actualSize);
>  }
>}
> {code}
> CatalogServiceCatalog::addCatalogObject() now produces an error message but 
> the Catalog update doesn't go through.
> {code:java}
>   if (topicMode_ == TopicMode.FULL || topicMode_ == TopicMode.MIXED) {
> String v1Key = CatalogServiceConstants.CATALOG_TOPIC_V1_PREFIX + key;
> byte[] data = serializer.serialize(obj);
> int actualSize = 
> FeSupport.NativeAddPendingTopicItem(nativeCatalogServerPtr,
> v1Key, obj.catalog_version, data, delete);
> if (actualSize < 0) {
>   LOG.error("NativeAddPendingTopicItem failed in BE. key=" + v1Key + 
> ", delete="
>   + delete + ", data_size=" + data.length);
> } else if (summary != null && obj.type == HDFS_PARTITION) {
>   summary.update(true, delete, obj.hdfs_partition.partition_name,
>   obj.catalog_version, data.length, actualSize);
> }
>   }
> {code}
> Not sure what the right behavior would be, we could handle the compression 
> issue and try more aggressive compression, or unblock the catalog update.
>  
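To illustrate the gap described above, here is a hedged Python sketch of one possible behavior: surface the native call's failure to the caller instead of only logging it, so the update loop can react (retry, compress harder, or fail the update visibly). The `add_pending_topic_item` function and its parameters are hypothetical, not Impala's actual API:

```python
def add_pending_topic_item(native_add, key, version, data):
    """Wrap the native call; a negative size signals compression failure.

    The current code path only logs on failure, so the object silently never
    makes it into the topic update. Raising instead forces the caller to
    handle the failure explicitly."""
    actual_size = native_add(key, version, data)
    if actual_size < 0:
        raise RuntimeError(
            "AddPendingTopicItem failed for key=%s, data_size=%d"
            % (key, len(data)))
    return actual_size
```

Whether raising, retrying with stronger compression, or unblocking the update is the right choice is exactly the open question in the last paragraph above.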



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10925) Improved self event detection for event processor in catalogd

2021-09-20 Thread Vihang Karajgaonkar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17417830#comment-17417830
 ] 

Vihang Karajgaonkar commented on IMPALA-10925:
--

I think the problem of consecutive create and drop events is not present any 
more because we keep a createEventId. The redesign generalizes the existing 
approach to keep a lastSyncedEventId instead of a createEventId so that we can 
use a similar mechanism for ALTER events.

> Improved self event detection for event processor in catalogd 
> --
>
> Key: IMPALA-10925
> URL: https://issues.apache.org/jira/browse/IMPALA-10925
> Project: IMPALA
>  Issue Type: Epic
>  Components: Catalog
>Reporter: Sourabh Goyal
>Assignee: Sourabh Goyal
>Priority: Major
>
> h3. Problem Statement
> Impala catalogd has Events processor which polls metastore events at regular 
> intervals to automatically apply changes to the metadata in the catalogd. 
> However, the current design to detect the self-generated events (DDL/DMLs 
> coming from the same catalogd) have consistency problems which can cause 
> query failures under certain circumstances.
>  
> h3. Current Design
> The current design of self-event detection is based on adding markers to the 
> HMS objects which are detected when the event is received later to determine 
> if the event is self-generated or not. These markers constitute a serviceID 
> which is unique to the catalogd instance and a catalog version number which 
> is unique for each catalog object. When a DDL is executed, catalogd adds 
> these as object parameters. When the event is received, the events processor 
> checks the serviceID and the catalog version against the current object with 
> the same name in the catalogd cache and decides whether to ignore the event 
> or not.
>  
> h3. Problems with the current design
> The approach is problematic under circumstances where conflicting DDLs are 
> repeated in quick succession. For example, a sequence of create/drop table 
> DDLs will generate CREATE_TABLE and DROP_TABLE events. When the events are 
> received, it is possible that a self-generated CREATE_TABLE event is 
> processed anyway, because the catalogd no longer has the table in its cache 
> and so the self-event markers on it cannot be checked. 
> h3. Proposed Solution
> The main idea of the solution is to keep track of the last event id for a 
> given table as eventId which the catalogd has synced to in the Table object. 
> The events processor ignores any event whose EVENT_ID is less than or equal 
> to the eventId stored in the table. Once the events processor successfully 
> processes a given event, it updates the value of eventId in the table before 
> releasing the table lock. Also, any DDL or refresh operation on the catalogd 
> will follow the steps given below to update the event id for the table. The 
> solution relies on the existing locking mechanism in the catalogd to prevent 
> any other concurrent updates to the table (even via EventsProcessor).
>  
> In case of database objects, we will also have a similar eventId which 
> represents the events on the database object (CREATE, DROP, ALTER database) 
> and to which the catalogd as synced to. Since there is no refresh database 
> command, catalogOpExecutor will only update the database eventId when there 
> are DDLs at the database level (e.g CREATE, DROP, ALTER database)
>  
> cc - [~vihangk1] [~kishendas]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10924) TestIcebergTable.test_partitioned_insert fails with IOException

2021-09-20 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created IMPALA-10924:


 Summary: TestIcebergTable.test_partitioned_insert fails with 
IOException
 Key: IMPALA-10924
 URL: https://issues.apache.org/jira/browse/IMPALA-10924
 Project: IMPALA
  Issue Type: Bug
Reporter: Vihang Karajgaonkar
Assignee: Zoltán Borók-Nagy


The test query_test.test_iceberg.TestIcebergTable.test_partitioned_insert fails 
intermittently with an IOException and the stack trace below.

{noformat}
query_test/test_iceberg.py:80: in test_partitioned_insert
use_db=unique_database)
common/impala_test_suite.py:682: in run_test_case
result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
common/impala_test_suite.py:620: in __exec_in_impala
result = self.__execute_query(target_impalad_client, query, user=user)
common/impala_test_suite.py:940: in __execute_query
return impalad_client.execute(query, user=user)
common/impala_connection.py:212: in execute
return self.__beeswax_client.execute(sql_stmt, user=user)
beeswax/impala_beeswax.py:189: in execute
handle = self.__execute_query(query_string.strip(), user=user)
beeswax/impala_beeswax.py:367: in __execute_query
self.wait_for_finished(handle)
beeswax/impala_beeswax.py:388: in wait_for_finished
raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
E   ImpalaBeeswaxException: ImpalaBeeswaxException:
EQuery aborted:RuntimeIOException: Failed to write json to file: 
hdfs://localhost:20500/test-warehouse/test_partitioned_insert_af8be2c3.db/ice_only_part/metadata/2-b8d13a74-4839-4dd3-b74a-6df9436774a2.metadata.json
E   CAUSED BY: IOException: The stream is closed
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Updated] (IMPALA-10922) test_orc_stats failing on exhaustive builds

2021-09-20 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated IMPALA-10922:
-
Issue Type: Bug  (was: Test)

> test_orc_stats failing on exhaustive builds
> ---
>
> Key: IMPALA-10922
> URL: https://issues.apache.org/jira/browse/IMPALA-10922
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Norbert Luksa
>Priority: Blocker
>  Labels: broken-build
>
> test_orc_stats.py is failing on certain exhaustive builds. The stack trace of 
> the failure is shown below.
> {noformat}
> query_test/test_orc_stats.py:40: in test_orc_stats
> self.run_test_case('QueryTest/orc-stats', vector, use_db=unique_database)
> common/impala_test_suite.py:779: in run_test_case
> update_section=pytest.config.option.update_results)
> common/test_result_verifier.py:653: in verify_runtime_profile
> % (function, field, expected_value, actual_value, op, actual))
> E   AssertionError: Aggregation of SUM over RowsRead did not match expected 
> results.
> E   EXPECTED VALUE:
> E   0
> E   
> E   
> E   ACTUAL VALUE:
> E   10
> E   
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10922) test_orc_stats failing on exhaustive builds

2021-09-20 Thread Vihang Karajgaonkar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17417770#comment-17417770
 ] 

Vihang Karajgaonkar commented on IMPALA-10922:
--

Assigning this to you [~norbertluksa] since it may be related to your recent 
commit IMPALA-6505.

> test_orc_stats failing on exhaustive builds
> ---
>
> Key: IMPALA-10922
> URL: https://issues.apache.org/jira/browse/IMPALA-10922
> Project: IMPALA
>  Issue Type: Test
>Reporter: Vihang Karajgaonkar
>Assignee: Norbert Luksa
>Priority: Blocker
>  Labels: broken-build
>
> test_orc_stats.py is failing on certain exhaustive builds. The stack trace of 
> the failure is shown below.
> {noformat}
> query_test/test_orc_stats.py:40: in test_orc_stats
> self.run_test_case('QueryTest/orc-stats', vector, use_db=unique_database)
> common/impala_test_suite.py:779: in run_test_case
> update_section=pytest.config.option.update_results)
> common/test_result_verifier.py:653: in verify_runtime_profile
> % (function, field, expected_value, actual_value, op, actual))
> E   AssertionError: Aggregation of SUM over RowsRead did not match expected 
> results.
> E   EXPECTED VALUE:
> E   0
> E   
> E   
> E   ACTUAL VALUE:
> E   10
> E   
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10922) test_orc_stats failing on exhaustive builds

2021-09-20 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created IMPALA-10922:


 Summary: test_orc_stats failing on exhaustive builds
 Key: IMPALA-10922
 URL: https://issues.apache.org/jira/browse/IMPALA-10922
 Project: IMPALA
  Issue Type: Test
Reporter: Vihang Karajgaonkar
Assignee: Norbert Luksa


test_orc_stats.py is failing on certain exhaustive builds. The stack trace of 
the failure is shown below.

{noformat}
query_test/test_orc_stats.py:40: in test_orc_stats
self.run_test_case('QueryTest/orc-stats', vector, use_db=unique_database)
common/impala_test_suite.py:779: in run_test_case
update_section=pytest.config.option.update_results)
common/test_result_verifier.py:653: in verify_runtime_profile
% (function, field, expected_value, actual_value, op, actual))
E   AssertionError: Aggregation of SUM over RowsRead did not match expected 
results.
E   EXPECTED VALUE:
E   0
E   
E   
E   ACTUAL VALUE:
E   10
E   
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Resolved] (IMPALA-10888) getPartitionsByNames should return partitions sorted by name

2021-09-14 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar resolved IMPALA-10888.
--
Fix Version/s: Impala 4.1.0
   Resolution: Fixed

> getPartitionsByNames should return partitions sorted by name
> 
>
> Key: IMPALA-10888
> URL: https://issues.apache.org/jira/browse/IMPALA-10888
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
> Fix For: Impala 4.1.0
>
>
> The CatalogMetastoreServer's implementation of {{getPartitionsByNames}} does 
> not return partitions ordered by partition name, whereas HMS orders them by 
> partition name. While this is not documented behavior and clients should not 
> assume it, it can cause test flakiness where we expect the order of the 
> partitions to be consistent. We should change the implementation so that the 
> partitions returned over this API are sorted by partition name.
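The proposed fix is small; as an illustrative Python snippet (not the actual Java implementation; the dict-based partition representation and function name are assumed for the example), it amounts to sorting by partition name before returning:

```python
def get_partitions_by_names(partition_map, names):
    """Fetch the requested partitions and return them sorted by partition
    name, matching the (undocumented) ordering HMS clients tend to rely on."""
    parts = [partition_map[n] for n in names if n in partition_map]
    return sorted(parts, key=lambda p: p["name"])
```

Note the sort is lexicographic on the partition-name string, which is also what HMS's ordering amounts to.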



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Created] (IMPALA-10907) Refactor MetastoreEvents class

2021-09-07 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created IMPALA-10907:


 Summary: Refactor MetastoreEvents class
 Key: IMPALA-10907
 URL: https://issues.apache.org/jira/browse/IMPALA-10907
 Project: IMPALA
  Issue Type: Improvement
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


MetastoreEvents.java is a single class with many inner classes (most of which 
are public). The file has become quite large, and it would make sense to 
refactor it into separate classes to improve code readability.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-10776) Hold write lock for less time in ALTER TABLE RECOVER PARTITIONS

2021-09-07 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar reassigned IMPALA-10776:


Assignee: Vihang Karajgaonkar

> Hold write lock for less time in ALTER TABLE RECOVER PARTITIONS
> ---
>
> Key: IMPALA-10776
> URL: https://issues.apache.org/jira/browse/IMPALA-10776
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Csaba Ringhofer
>Assignee: Vihang Karajgaonkar
>Priority: Major
>
> ALTER TABLE RECOVER PARTITIONS holds a write lock on the table for the whole 
> time while it lists the HDFS directories and creates the new partitions in 
> HMS. This can potentially take a long time and block catalog updates.
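The improvement suggested above amounts to doing the slow HDFS listing outside the table's write lock and taking the lock only to apply the newly discovered partitions. A hedged Python sketch (the function and parameter names are hypothetical, not Impala's actual frontend code):

```python
import threading

def recover_partitions(table_lock, list_fs_partitions, known, add_partitions):
    """Discover new partitions without holding the lock, then apply them
    inside a short critical section."""
    discovered = list_fs_partitions()  # slow filesystem I/O, no lock held
    new_parts = [p for p in discovered if p not in known]
    with table_lock:                   # lock held only while mutating state
        add_partitions(new_parts)
    return new_parts
```

The trade-off is that partitions created concurrently during the unlocked listing must be reconciled when the lock is finally taken, which is why the change is not entirely trivial.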



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10776) Hold write lock for less time in ALTER TABLE RECOVER PARTITIONS

2021-09-07 Thread Vihang Karajgaonkar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411430#comment-17411430
 ] 

Vihang Karajgaonkar commented on IMPALA-10776:
--

I can take a stab at this.

> Hold write lock for less time in ALTER TABLE RECOVER PARTITIONS
> ---
>
> Key: IMPALA-10776
> URL: https://issues.apache.org/jira/browse/IMPALA-10776
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Csaba Ringhofer
>Priority: Major
>
> ALTER TABLE RECOVER PARTITIONS holds a write lock on the table for the whole 
> time while it lists the HDFS directories and creates the new partitions in 
> HMS. This can potentially take a long time and block catalog updates.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Work started] (IMPALA-10888) getPartitionsByNames should return partitions sorted by name

2021-08-26 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-10888 started by Vihang Karajgaonkar.

> getPartitionsByNames should return partitions sorted by name
> 
>
> Key: IMPALA-10888
> URL: https://issues.apache.org/jira/browse/IMPALA-10888
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>
> The CatalogMetastoreServer's implementation of {{getPartitionsByNames}} does 
> not return partitions ordered by partition name, whereas HMS orders them by 
> partition name. While this is not documented behavior and clients should not 
> assume it, it can cause test flakiness where we expect the order of the 
> partitions to be consistent. We should change the implementation so that the 
> partitions returned over this API are sorted by partition name.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10888) getPartitionsByNames should return partitions sorted by name

2021-08-26 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created IMPALA-10888:


 Summary: getPartitionsByNames should return partitions sorted by 
name
 Key: IMPALA-10888
 URL: https://issues.apache.org/jira/browse/IMPALA-10888
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


The CatalogMetastoreServer's implementation of {{getPartitionByNames}} does not 
return partitions ordered by partition name, whereas HMS does order them by 
partition name. While this is not documented behavior and clients should not 
assume it, it can cause test flakiness where we expect the order of the 
partitions to be consistent. We should change the implementation so that the 
partitions returned over this API are sorted by partition name.
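
A minimal sketch of the proposed fix, using plain dicts in place of Impala's partition objects (all names here are illustrative):

```python
def get_partitions_by_names(partitions_by_name, names):
    """Return the requested partitions sorted by partition name, matching
    the (undocumented) ordering HMS itself produces."""
    found = [partitions_by_name[n] for n in names if n in partitions_by_name]
    # Sorting by name makes the result deterministic regardless of how the
    # underlying map iterates.
    return sorted(found, key=lambda p: p["name"])

parts = {n: {"name": n}
         for n in ["year=2021/month=2", "year=2020/month=1", "year=2021/month=1"]}
result = get_partitions_by_names(parts, ["year=2021/month=2", "year=2020/month=1"])
# result is ordered by partition name: year=2020/month=1 comes first.
```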



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Commented] (IMPALA-10885) TestMetastoreService.test_get_table_req_without_fallback fails in a S3 build

2021-08-26 Thread Vihang Karajgaonkar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17405368#comment-17405368
 ] 

Vihang Karajgaonkar commented on IMPALA-10885:
--

Thanks [~stigahuang] I will take a look.

> TestMetastoreService.test_get_table_req_without_fallback fails in a S3 build
> 
>
> Key: IMPALA-10885
> URL: https://issues.apache.org/jira/browse/IMPALA-10885
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Quanlong Huang
>Assignee: Vihang Karajgaonkar
>Priority: Critical
>  Labels: broken-build
>
> custom_cluster.test_metastore_service.TestMetastoreService.test_get_table_req_without_fallback
> {code:java}
> custom_cluster/test_metastore_service.py:269: in 
> test_get_table_req_without_fallback
> get_table_request, expected_exception_str)
> custom_cluster/test_metastore_service.py:1215: in 
> __call_get_table_req_expect_exception
> assert expected_exception_str in str(e)
> E   assert 'Database test_get_table_req_without_fallback_dbgiioi not found' 
> in "NoSuchObjectException(_message='Table 
> test_get_table_req_without_fallback_dbgiioi.test_get_table_req_tblglidw not 
> found')"
> E+  where "NoSuchObjectException(_message='Table 
> test_get_table_req_without_fallback_dbgiioi.test_get_table_req_tblglidw not 
> found')" = str(NoSuchObjectException(_message='Table 
> test_get_table_req_without_fallback_dbgiioi.test_get_table_req_tblglidw not 
> found')){code}
> The commit of the build is 237ed5e8738ec565bc8d3ce813d9b70c12ad4ce7.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-9057) TestEventProcessing.test_insert_events_transactional is flaky

2021-08-23 Thread Vihang Karajgaonkar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17403332#comment-17403332
 ] 

Vihang Karajgaonkar commented on IMPALA-9057:
-

Just to update the latest here: I found that HMS has a new API, introduced in 
https://issues.apache.org/jira/browse/HIVE-25137, which gives clients the 
ability to fetch the WriteId information for a given commit transaction id. 
Using this API we can enhance the MetastoreEventsProcessor to fetch ACID_WRITE 
events from HMS and refresh the ACID tables when a commit transaction event is 
received. This should fix the race condition described above.

I will see if we can bump up the GBN which includes the new HMS API.

> TestEventProcessing.test_insert_events_transactional is flaky
> -
>
> Key: IMPALA-9057
> URL: https://issues.apache.org/jira/browse/IMPALA-9057
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.4.0
>Reporter: Alice Fan
>Assignee: Vihang Karajgaonkar
>Priority: Blocker
>  Labels: build-failure, flaky
>
> Assertion failure for 
> custom_cluster.test_event_processing.TestEventProcessing.test_insert_events_transactional
>  
> {code:java}
> Error Message
> assert ['101', 'x', ..., '3', '2019'] == ['101', 'z', '28', '3', '2019']   At 
> index 1 diff: 'x' != 'z'   Full diff:   - ['101', 'x', '28', '3', '2019']   ? 
>  ^   + ['101', 'z', '28', '3', '2019']   ?  ^
> Stacktrace
> custom_cluster/test_event_processing.py:49: in 
> test_insert_events_transactional
> self.run_test_insert_events(is_transactional=True)
> custom_cluster/test_event_processing.py:131: in run_test_insert_events
> assert data.split('\t') == ['101', 'z', '28', '3', '2019']
> E   assert ['101', 'x', ..., '3', '2019'] == ['101', 'z', '28', '3', '2019']
> E At index 1 diff: 'x' != 'z'
> E Full diff:
> E - ['101', 'x', '28', '3', '2019']
> E ?  ^
> E + ['101', 'z', '28', '3', '2019']
> E ?  ^
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-7954) Support automatic invalidates using metastore notification events

2021-08-09 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar resolved IMPALA-7954.
-
Fix Version/s: Impala 4.1.0
   Resolution: Fixed

> Support automatic invalidates using metastore notification events
> -
>
> Key: IMPALA-7954
> URL: https://issues.apache.org/jira/browse/IMPALA-7954
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 3.1.0
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
> Fix For: Impala 4.1.0
>
> Attachments: Automatic_invalidate_DesignDoc_v1.pdf, 
> Impala_Catalogd_Auto_Metadata_Update_v2.pdf
>
>
> Currently, in Impala there are multiple ways to invalidate or refresh the 
> metadata stored in the Catalog for tables. Objects in the Catalog can be 
> invalidated either via a usage-based approach (invalidate_tables_timeout_s) 
> or under GC pressure (invalidate_tables_on_memory_pressure), as added in 
> IMPALA-7448. However, most users issue invalidate commands when they want to 
> sync to the latest information from HDFS or HMS. Unfortunately, when data is 
> modified or new data is added outside Impala (e.g. in Hive or a different 
> Impala cluster), users don't have a clear idea of whether they have to issue 
> an invalidate or not. To be on the safe side, users issue invalidate commands 
> more often than necessary, which causes performance as well as stability 
> issues.
> Hive Metastore provides a simple API to get incremental updates to the 
> metadata information stored in its database. Each API which does a 
> add/alter/drop operation in metastore generates event(s) which can be fetched 
> using {{get_next_notification}} API. Each event has a unique and increasing 
> event_id. The current notification event id can be fetched using 
> {{get_current_notificationEventId}} API.
> This JIRA proposes to make use of such events from metastore to proactively 
> either invalidate or refresh information in the catalogD. When configured, 
> CatalogD could poll for such events and take action (like add/drop/refresh 
> partition, add/drop/invalidate tables and databases) based on the events. 
> This way we can automatically refresh the catalogD state using events and it 
> would greatly help the use-cases where users want to see the latest 
> information (within a configurable interval of time delay) without flooding 
> the system with invalidate requests.
> I will be attaching a design doc to this JIRA and create subtasks for the 
> work. Feel free to make comments on the JIRA or make suggestions to improve 
> the design.
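
The polling loop described above can be sketched as below. The fake client mirrors the shape of the {{get_next_notification}} and {{get_current_notificationEventId}} APIs named in the description, but it is an illustrative stand-in, not the real HMS thrift interface:

```python
class FakeMetastoreClient:
    """Illustrative stand-in for an HMS client; not the real thrift API."""
    def __init__(self, events):
        self._events = events  # list of (event_id, event_type, table_name)

    def get_current_notification_event_id(self):
        return self._events[-1][0] if self._events else 0

    def get_next_notification(self, last_event_id, max_events):
        # Event ids are unique and increasing, so "incremental updates" are
        # simply everything with an id greater than the last one seen.
        return [e for e in self._events if e[0] > last_event_id][:max_events]

def poll_once(client, last_synced_id, apply_event):
    """One polling iteration: fetch events past last_synced_id, apply each,
    and return the new high-water mark."""
    for event_id, event_type, table in client.get_next_notification(
            last_synced_id, 1000):
        apply_event(event_type, table)
        last_synced_id = event_id
    return last_synced_id

client = FakeMetastoreClient([(1, "CREATE_TABLE", "foo"),
                              (2, "ALTER_TABLE", "foo")])
applied = []
last_id = poll_once(client, 0, lambda typ, tbl: applied.append((typ, tbl)))
# last_id is now the newest event id; polling again from there fetches nothing.
```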



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Updated] (IMPALA-10273) Support function events

2021-08-09 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated IMPALA-10273:
-
Parent: (was: IMPALA-7954)
Issue Type: Improvement  (was: Sub-task)

> Support function events
> ---
>
> Key: IMPALA-10273
> URL: https://issues.apache.org/jira/browse/IMPALA-10273
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Vihang Karajgaonkar
>Priority: Major
>
> HMS creates ADD_FUNCTION, ALTER_FUNCTION and DROP_FUNCTION events when a 
> function is created, dropped, or altered. We can use these events to refresh 
> the functions in catalogd using the events processor.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-9857) Batch ALTER_PARTITION events

2021-08-09 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated IMPALA-9857:

Parent: (was: IMPALA-7954)
Issue Type: Improvement  (was: Sub-task)

> Batch ALTER_PARTITION events
> 
>
> Key: IMPALA-9857
> URL: https://issues.apache.org/jira/browse/IMPALA-9857
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>
> When Hive inserts data into partitioned tables, it generates a lot of 
> ALTER_PARTITION (and possibly INSERT_EVENT) events in quick succession. 
> Currently, such events are processed one by one by EventsProcessor, which can 
> be slow and can cause EventsProcessor to lag behind. This JIRA proposes 
> batching such ALTER_PARTITION events so that all successive ALTER_PARTITION 
> events for the same table are combined into one ALTER_PARTITIONS event and 
> processed together to refresh all the affected partitions. This can 
> significantly speed up event processing in such cases.
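
The batching idea can be sketched as a single pass over the fetched events. The event tuples and names below are simplified stand-ins for Impala's event objects, not the actual implementation:

```python
def batch_alter_partition_events(events):
    """Group consecutive ALTER_PARTITION events on the same table into one
    batch so their partitions can be refreshed together."""
    batches = []
    for event_type, table, partition in events:
        if (batches and event_type == "ALTER_PARTITION"
                and batches[-1][0] == "ALTER_PARTITION"
                and batches[-1][1] == table):
            batches[-1][2].append(partition)   # extend the current batch
        else:
            batches.append([event_type, table, [partition]])
    return batches

events = [
    ("ALTER_PARTITION", "t1", "p=1"),
    ("ALTER_PARTITION", "t1", "p=2"),
    ("ALTER_PARTITION", "t2", "p=1"),
    ("DROP_TABLE", "t1", None),
]
batches = batch_alter_partition_events(events)
# t1's two consecutive ALTER_PARTITIONs collapse into one batch; the t2
# event and the DROP_TABLE each remain separate.
```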



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-8592) Add support for insert events for 'LOAD DATA..' statements from Impala.

2021-08-09 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated IMPALA-8592:

Parent: (was: IMPALA-7954)
Issue Type: Improvement  (was: Sub-task)

> Add support for insert events for 'LOAD DATA..' statements from Impala.
> ---
>
> Key: IMPALA-8592
> URL: https://issues.apache.org/jira/browse/IMPALA-8592
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Anurag Mantripragada
>Priority: Major
>
> Hive generates INSERT events for LOAD DATA.. statements. We should support 
> the same in Impala.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-8795) Enable event polling by default in tests

2021-08-09 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar resolved IMPALA-8795.
-
Fix Version/s: Impala 4.1.0
   Resolution: Fixed

> Enable event polling by default in tests
> 
>
> Key: IMPALA-8795
> URL: https://issues.apache.org/jira/browse/IMPALA-8795
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Catalog
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
> Fix For: Impala 4.1.0
>
>
> We should turn on event processing by default in all the tests to make sure 
> that there are no regressions when we turn ON the feature by default.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Resolved] (IMPALA-10815) Ignore events on non-default hive catalogs

2021-07-22 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar resolved IMPALA-10815.
--
Fix Version/s: Impala 4.1
   Resolution: Fixed

> Ignore events on non-default hive catalogs
> --
>
> Key: IMPALA-10815
> URL: https://issues.apache.org/jira/browse/IMPALA-10815
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Fix For: Impala 4.1
>
>
> Hive 3 introduces a new object called a catalog, which acts as a namespace 
> for databases and tables. Currently, Impala does not support Hive catalogs. 
> However, if there are events on such non-default catalogs, event processing 
> applies them to the catalogd whenever the database and table names match. 
> Until we support custom catalogs in Hive, we should ignore events coming 
> from non-default catalog objects.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Created] (IMPALA-10815) Ignore events on non-default hive catalogs

2021-07-20 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created IMPALA-10815:


 Summary: Ignore events on non-default hive catalogs
 Key: IMPALA-10815
 URL: https://issues.apache.org/jira/browse/IMPALA-10815
 Project: IMPALA
  Issue Type: Bug
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


Hive 3 introduces a new object called a catalog, which acts as a namespace for 
databases and tables. Currently, Impala does not support Hive catalogs. 
However, if there are events on such non-default catalogs, event processing 
applies them to the catalogd whenever the database and table names match. 
Until we support custom catalogs in Hive, we should ignore events coming from 
non-default catalog objects.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Resolved] (IMPALA-10468) DROP events which are generated while a batch is being processed may add table incorrectly

2021-07-20 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar resolved IMPALA-10468.
--
Fix Version/s: Impala 4.1
   Resolution: Duplicate

> DROP events which are generated while a batch is being processed may add 
> table incorrectly
> --
>
> Key: IMPALA-10468
> URL: https://issues.apache.org/jira/browse/IMPALA-10468
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
> Fix For: Impala 4.1
>
>
> One of the problems with CREATE/DROP events is that they may occur while a 
> batch is being processed, and hence EventsProcessor may not be aware of them.
> For example, consider the following sequence of statements:
> create table foo (c1 int);
> drop table foo;
> create table foo (c2 int);
> drop table foo;
> These statements generate a CREATE_TABLE, DROP_TABLE, CREATE_TABLE, 
> DROP_TABLE event sequence. Generally, if all 4 events are fetched in one 
> batch, the first and third CREATE_TABLE events are ignored because each is 
> followed by a DROP_TABLE in the sequence, and the DROP_TABLE events have no 
> effect since the table no longer exists in catalogd.
> However, if the events processor fetches these events in 2 batches (3 and 
> 1), then after the first batch (CREATE_TABLE, DROP_TABLE, CREATE_TABLE) is 
> processed, the third event will add table foo to catalogd. The subsequent 
> batch's DROP_TABLE will be processed and remove the table, but between the 
> two batches catalogd will report that a table called foo exists. This can 
> cause statements to error out; e.g. a statement like create table foo (c3 
> int) issued after the above statements will fail with a TableAlreadyExists 
> error.
> The problem also happens for databases. So far I have not been able to 
> reproduce this for partitions, but I don't see why it would not happen with 
> partitions as well.
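
The batch-boundary race described above can be modeled in a few lines. This is a simplified sketch, not Impala's event processor; the skip-a-cancelled-CREATE rule is reduced to a lookahead within the current batch:

```python
def process_batch(batch, catalog):
    """Apply a batch of (event_type, table) events. A CREATE_TABLE that is
    followed by a DROP_TABLE for the same table *within this batch* is
    skipped -- a simplified model of the optimization described above."""
    for i, (event_type, table) in enumerate(batch):
        if event_type == "CREATE_TABLE":
            if any(t == "DROP_TABLE" and tbl == table
                   for t, tbl in batch[i + 1:]):
                continue  # cancelled out later in the same batch
            catalog.add(table)
        elif event_type == "DROP_TABLE":
            catalog.discard(table)

events = [("CREATE_TABLE", "foo"), ("DROP_TABLE", "foo"),
          ("CREATE_TABLE", "foo"), ("DROP_TABLE", "foo")]

# All 4 events in one batch: both CREATEs are skipped, 'foo' never appears.
catalog_a = set()
process_batch(events, catalog_a)

# Same events split into batches of 3 + 1: after the first batch the third
# event has added 'foo', so the catalog transiently claims the table exists.
catalog_b = set()
process_batch(events[:3], catalog_b)
transiently_exists = "foo" in catalog_b   # the race window
process_batch(events[3:], catalog_b)
```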



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-10490) truncate table fails with IllegalStateException

2021-07-20 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar resolved IMPALA-10490.
--
Fix Version/s: Impala 4.1
   Resolution: Fixed

> truncate table fails with IllegalStateException
> ---
>
> Key: IMPALA-10490
> URL: https://issues.apache.org/jira/browse/IMPALA-10490
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
> Fix For: Impala 4.1
>
>
> This is a problem when event processing is turned on. I can reproduce it 
> with the following steps:
> 1. Start Impala without event processing.
> 2. Create a table, load data, and compute stats on the table.
> 3. Restart Impala with event processing turned on.
> 4. Run a truncate table command.
> The truncate table command fails with the following error:
> [localhost:21050] default> truncate t5;
> Query: truncate t5
> ERROR: CatalogException: Failed to truncate table: default.t5.
> Table may be in a partially truncated state.
> CAUSED BY: IllegalStateException: Table parameters must have catalog service 
> identifier before adding it to partition parameters



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-10502) delayed 'Invalidated objects in cache' cause 'Table already exists'

2021-07-13 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar resolved IMPALA-10502.
--
Fix Version/s: Impala 4.1
   Resolution: Fixed

> delayed 'Invalidated objects in cache' cause 'Table already exists'
> ---
>
> Key: IMPALA-10502
> URL: https://issues.apache.org/jira/browse/IMPALA-10502
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog, Clients, Frontend
>Affects Versions: Impala 3.4.0
>Reporter: Adriano
>Assignee: Vihang Karajgaonkar
>Priority: Critical
> Fix For: Impala 4.1
>
>
> In a fast-paced environment where the interval between steps 1 and 2 is 
> under 100 ms, a simplified pipeline looks like:
> 0- catalog 'on demand' in use and disableHmsSync (enabled or disabled: no 
> difference)
> 1- open session to coord A -> DROP TABLE X -> close session
> 2- open session to coord A -> CREATE TABLE X -> close session
> Result: step 2 can fail with "table already exists".
> During the internal investigation it was discovered that IMPALA-9913 should 
> address the issue in almost all scenarios.
> However, since the investigation is still ongoing internally, it is useful 
> to have the issue tracked here as well.
> Once we are sure that IMPALA-9913 fixes these cases we can close this as a 
> duplicate; otherwise we will carry on the investigation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Work started] (IMPALA-10490) truncate table fails with IllegalStateException

2021-07-07 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-10490 started by Vihang Karajgaonkar.

> truncate table fails with IllegalStateException
> ---
>
> Key: IMPALA-10490
> URL: https://issues.apache.org/jira/browse/IMPALA-10490
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>
> This is a problem when event processing is turned on. I can reproduce it 
> with the following steps:
> 1. Start Impala without event processing.
> 2. Create a table, load data, and compute stats on the table.
> 3. Restart Impala with event processing turned on.
> 4. Run a truncate table command.
> The truncate table command fails with the following error:
> [localhost:21050] default> truncate t5;
> Query: truncate t5
> ERROR: CatalogException: Failed to truncate table: default.t5.
> Table may be in a partially truncated state.
> CAUSED BY: IllegalStateException: Table parameters must have catalog service 
> identifier before adding it to partition parameters



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-10768) Deflake CatalogHmsFileMetadataTest

2021-07-07 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar resolved IMPALA-10768.
--
Fix Version/s: Impala 4.1
   Resolution: Fixed

> Deflake CatalogHmsFileMetadataTest
> --
>
> Key: IMPALA-10768
> URL: https://issues.apache.org/jira/browse/IMPALA-10768
> Project: IMPALA
>  Issue Type: Test
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Fix For: Impala 4.1
>
>
> Sometimes we see CatalogHmsFileMetadataTest#testFileMetadataForPartitions 
> fail with the following stack trace:
> {noformat}
> org.junit.ComparisonFailure: expected:<090[1]01.txt> but was:<090[2]01.txt>
>   at org.junit.Assert.assertEquals(Assert.java:115)
>   at org.junit.Assert.assertEquals(Assert.java:144)
>   at 
> org.apache.impala.catalog.metastore.CatalogHmsFileMetadataTest.assertFdsAreSame(CatalogHmsFileMetadataTest.java:133)
>   at 
> org.apache.impala.catalog.metastore.CatalogHmsFileMetadataTest.testFileMetadataForPartitions(CatalogHmsFileMetadataTest.java:121)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:272)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:236)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:386)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:323)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:143)
> {noformat}
> I was not able to reproduce the error locally, but based on code inspection 
> it looks like this happens because the order of the file descriptors in the 
> two lists is different.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Created] (IMPALA-10768) Deflake CatalogHmsFileMetadataTest

2021-06-24 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created IMPALA-10768:


 Summary: Deflake CatalogHmsFileMetadataTest
 Key: IMPALA-10768
 URL: https://issues.apache.org/jira/browse/IMPALA-10768
 Project: IMPALA
  Issue Type: Test
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


Sometimes we see CatalogHmsFileMetadataTest#testFileMetadataForPartitions fail 
with the following stack trace:

{noformat}
org.junit.ComparisonFailure: expected:<090[1]01.txt> but was:<090[2]01.txt>
at org.junit.Assert.assertEquals(Assert.java:115)
at org.junit.Assert.assertEquals(Assert.java:144)
at 
org.apache.impala.catalog.metastore.CatalogHmsFileMetadataTest.assertFdsAreSame(CatalogHmsFileMetadataTest.java:133)
at 
org.apache.impala.catalog.metastore.CatalogHmsFileMetadataTest.testFileMetadataForPartitions(CatalogHmsFileMetadataTest.java:121)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:272)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:236)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:386)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:323)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:143)
{noformat}

I was not able to reproduce the error locally, but based on code inspection it 
looks like this happens because the order of the file descriptors in the two 
lists is different.
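The usual way to deflake such a test is to make the comparison order-insensitive. A minimal sketch (the `sortedByName` helper and the sample file names are illustrative, not Impala's actual `FileDescriptor` API): sort both lists by file name before comparing element-wise, so the assertion no longer depends on the order each code path happens to return.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Stand-in for the file descriptors compared in assertFdsAreSame();
// only the relative file name matters for this illustration.
public class FdOrderExample {
  // Return a copy of the list sorted by file name.
  static List<String> sortedByName(List<String> fds) {
    List<String> copy = new ArrayList<>(fds);
    copy.sort(Comparator.naturalOrder());
    return copy;
  }

  public static void main(String[] args) {
    // Same file descriptors, returned in a different order by two code paths.
    List<String> fromCatalog = List.of("090101.txt", "090201.txt");
    List<String> fromHms = List.of("090201.txt", "090101.txt");

    // A positional comparison fails even though the contents are equal...
    System.out.println(fromCatalog.equals(fromHms)); // false
    // ...while sorting both sides first makes the check order-insensitive.
    System.out.println(sortedByName(fromCatalog).equals(sortedByName(fromHms))); // true
  }
}
```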



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10754) test_overlap_min_max_filters_on_sorted_columns failed during GVO

2021-06-24 Thread Vihang Karajgaonkar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368977#comment-17368977
 ] 

Vihang Karajgaonkar commented on IMPALA-10754:
--

Hi [~sql_forever], is this issue resolved? I hit this test failure here: 
https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4372/

> test_overlap_min_max_filters_on_sorted_columns failed during GVO
> 
>
> Key: IMPALA-10754
> URL: https://issues.apache.org/jira/browse/IMPALA-10754
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Zoltán Borók-Nagy
>Assignee: Qifan Chen
>Priority: Major
>  Labels: broken-build
>
> test_overlap_min_max_filters_on_sorted_columns failed in the following build:
> https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4338/testReport/
> *Stack trace:*
> {noformat}
> query_test/test_runtime_filters.py:296: in 
> test_overlap_min_max_filters_on_sorted_columns
> test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS': str(WAIT_TIME_MS)})
> common/impala_test_suite.py:734: in run_test_case
> update_section=pytest.config.option.update_results)
> common/test_result_verifier.py:653: in verify_runtime_profile
> % (function, field, expected_value, actual_value, op, actual))
> E   AssertionError: Aggregation of SUM over NumRuntimeFilteredPages did not 
> match expected results.
> E   EXPECTED VALUE:
> E   58
> E   
> E   
> E   ACTUAL VALUE:
> E   59
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10759) MetastoreServiceHandler.get_partitions_by_names_req throws NoSuchMethodError

2021-06-22 Thread Vihang Karajgaonkar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17367628#comment-17367628
 ] 

Vihang Karajgaonkar commented on IMPALA-10759:
--

I took a brief look at this yesterday and found that the issue happens when 
Impala is using thrift version 0.9.3 and Hive is using 0.13.0. This happens 
because the {{hashCode}} method in the thrift-generated code for HMS objects 
like Partition changes when thrift moves from 0.9.3 to 0.13.0. Specifically, the 
hashCode for primitive fields like long is now computed with 
{{TBaseHelper.hashCode}} instead of the old way of adding the field to an 
ArrayList and then comparing the list's hashCode.

For example, for the writeId field of the partition, which is defined as an i64 
in the thrift file, the hashCode is computed using TBaseHelper, as seen in the 
diff here: 
https://github.com/apache/hive/commit/1945e2f67e5b09cdda40146b87e1ba492f897196#diff-505c537842790dadd6f182b07b0b216be40e050588941213220b4ae3622bd0faR877

I don't think there is a good way to "fix" this. It should be resolved 
automatically once Impala uses thrift version 0.11.0. I confirmed that the build 
where we saw this failure did not have Impala's thrift version at 0.11.0.
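To illustrate the incompatibility in plain Java (this is an approximation of the two generated-code styles, not the actual thrift output): 0.9.x-era generated code hashed an i64 field by boxing it into a list and taking the list's hashCode, while 0.13.x-generated code calls `TBaseHelper.hashCode(long)`, a method that does not exist in the 0.9.3 runtime jar, hence the NoSuchMethodError when the two versions are mixed.

```java
import java.util.ArrayList;
import java.util.List;

public class ThriftHashExample {
  // Roughly what 0.9.x-era generated hashCode() did for one i64 field:
  // box the value into a list and use the list's hashCode.
  static int oldStyleHash(long writeId) {
    List<Object> fields = new ArrayList<>();
    fields.add(writeId);
    return fields.hashCode();
  }

  // Roughly what 0.13.x-generated code does; TBaseHelper.hashCode(long)
  // reduces to Long.hashCode, combined with the 8191 multiplier.
  static int newStyleHash(long writeId) {
    int hashCode = 1;
    hashCode = hashCode * 8191 + Long.hashCode(writeId);
    return hashCode;
  }

  public static void main(String[] args) {
    // The two schemes produce different values, and the new scheme depends on
    // a runtime method the old jar does not provide.
    System.out.println(oldStyleHash(42L));
    System.out.println(newStyleHash(42L));
  }
}
```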


> MetastoreServiceHandler.get_partitions_by_names_req throws NoSuchMethodError
> 
>
> Key: IMPALA-10759
> URL: https://issues.apache.org/jira/browse/IMPALA-10759
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Yongzhi Chen
>Assignee: Vihang Karajgaonkar
>Priority: Critical
>
> impala-cdpd-master-core 
> EnableCatalogdHmsCacheFlagTest.testEnableCatalogdCachingFlag test fails with 
> following stack:
> {noformat}
> Exception in thread "pool-470-thread-1" java.lang.NoSuchMethodError: 
> org.apache.thrift.TBaseHelper.hashCode(J)I
>   at 
> org.apache.hadoop.hive.metastore.api.Partition.hashCode(Partition.java:971)
>   at java.util.HashMap.hash(HashMap.java:338)
>   at java.util.HashMap.put(HashMap.java:611)
>   at 
> org.apache.impala.catalog.CatalogHmsAPIHelper.loadAndSetFileMetadataFromFs(CatalogHmsAPIHelper.java:527)
>   at 
> org.apache.impala.catalog.metastore.MetastoreServiceHandler.get_partitions_by_names_req(MetastoreServiceHandler.java:1443)
>   at 
> org.apache.impala.catalog.metastore.CatalogMetastoreServiceHandler.get_partitions_by_names_req(CatalogMetastoreServiceHandler.java:141)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.impala.catalog.metastore.CatalogMetastoreServer$TimingInvocationHandler.invoke(CatalogMetastoreServer.java:223)
>   at com.sun.proxy.$Proxy87.get_partitions_by_names_req(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_names_req.getResult(ThriftHiveMetastore.java:20087)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_names_req.getResult(ThriftHiveMetastore.java:20066)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Comment Edited] (IMPALA-10759) MetastoreServiceHandler.get_partitions_by_names_req throws NoSuchMethodError

2021-06-22 Thread Vihang Karajgaonkar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17367628#comment-17367628
 ] 

Vihang Karajgaonkar edited comment on IMPALA-10759 at 6/22/21, 7:24 PM:


I took a brief look at this yesterday and I found that the issue happens when 
Impala is using 0.9.3 thrift version and hive is using 0.13.0 version. This 
happens because the hashCode method in the thrift generated code for HMS 
objects like Partition changes when you change thrift from 0.9.3 to 0.13.0. 
Specifically the hashCode values for the primitive fields like long now use 
TBaseHelper.hashCode(long) instead of the old way of add it to a ArrayList and 
then comparing the hashCode.

For example, in case of writeId field of the partition which is definied as a 
i64 in the thrift file, the hashCode is computed using TBaseHelper as seen in 
the diffs here 
https://github.com/apache/hive/commit/1945e2f67e5b09cdda40146b87e1ba492f897196#diff-505c537842790dadd6f182b07b0b216be40e050588941213220b4ae3622bd0faR877

I don't think there is a good way to "fix" this. This should get fixed 
automatically when Impala uses 0.11.0 thrift version. I confirmed that the 
build where we saw this failure did not have the Impala thrift version as 
0.11.0.



was (Author: vihangk1):
I took a brief look at this yesterday and I found that the issue happens when 
Impala is using 0.9.3 thrift version and hive is using 0.13.0 version. This 
happens because the {noformat}hashCode{noformat} method in the thrift generated 
code for HMS objects like Partition changes when you change thrift from 0.9.3 
to 0.13.0. Specifically the hashCode values for the primitive fields like long 
now use {noformat}TBaseHelper.hashCode{noformat} instead of the old way of add 
it to a ArrayList and then comparing the hashCode.

For example, in case of writeId field of the partition which is definied as a 
i64 in the thrift file, the hashCode is computed using TBaseHelper as seen in 
the diffs here 
https://github.com/apache/hive/commit/1945e2f67e5b09cdda40146b87e1ba492f897196#diff-505c537842790dadd6f182b07b0b216be40e050588941213220b4ae3622bd0faR877

I don't think there is a good way to "fix" this. This should get fixed 
automatically when Impala uses 0.11.0 thrift version. I confirmed that the 
build where we saw this failure did not have the Impala thrift version as 
0.11.0.


> MetastoreServiceHandler.get_partitions_by_names_req throws NoSuchMethodError
> 
>
> Key: IMPALA-10759
> URL: https://issues.apache.org/jira/browse/IMPALA-10759
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Yongzhi Chen
>Assignee: Vihang Karajgaonkar
>Priority: Critical
>
> impala-cdpd-master-core 
> EnableCatalogdHmsCacheFlagTest.testEnableCatalogdCachingFlag test fails with 
> following stack:
> {noformat}
> Exception in thread "pool-470-thread-1" java.lang.NoSuchMethodError: 
> org.apache.thrift.TBaseHelper.hashCode(J)I
>   at 
> org.apache.hadoop.hive.metastore.api.Partition.hashCode(Partition.java:971)
>   at java.util.HashMap.hash(HashMap.java:338)
>   at java.util.HashMap.put(HashMap.java:611)
>   at 
> org.apache.impala.catalog.CatalogHmsAPIHelper.loadAndSetFileMetadataFromFs(CatalogHmsAPIHelper.java:527)
>   at 
> org.apache.impala.catalog.metastore.MetastoreServiceHandler.get_partitions_by_names_req(MetastoreServiceHandler.java:1443)
>   at 
> org.apache.impala.catalog.metastore.CatalogMetastoreServiceHandler.get_partitions_by_names_req(CatalogMetastoreServiceHandler.java:141)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.impala.catalog.metastore.CatalogMetastoreServer$TimingInvocationHandler.invoke(CatalogMetastoreServer.java:223)
>   at com.sun.proxy.$Proxy87.get_partitions_by_names_req(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_names_req.getResult(ThriftHiveMetastore.java:20087)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_names_req.getResult(ThriftHiveMetastore.java:20066)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 

[jira] [Commented] (IMPALA-10700) Introduce an option to skip deleting column statistics on truncate

2021-06-01 Thread Vihang Karajgaonkar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355414#comment-17355414
 ] 

Vihang Karajgaonkar commented on IMPALA-10700:
--

Hi [~shajini] Can you please help document this query option when you get some 
time? Thanks a lot!

> Introduce an option to skip deleting column statistics on truncate
> --
>
> Key: IMPALA-10700
> URL: https://issues.apache.org/jira/browse/IMPALA-10700
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
> Fix For: Impala 4.0
>
>
> Currently, when a user issues a {{truncate table}} command on a 
> non-transactional table, catalogd also deletes the table and column 
> statistics. However, this can affect the performance of the truncate 
> operation, especially at high concurrency. Based on preliminary research, it 
> looks like other databases do not delete statistics after a truncate 
> operation (e.g. Oracle, Hive). It would be good to introduce a query option 
> which can be set by the user to skip deleting the column statistics during 
> truncate table execution.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-10700) Introduce an option to skip deleting column statistics on truncate

2021-06-01 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar resolved IMPALA-10700.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Introduce an option to skip deleting column statistics on truncate
> --
>
> Key: IMPALA-10700
> URL: https://issues.apache.org/jira/browse/IMPALA-10700
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
> Fix For: Impala 4.0
>
>
> Currently, when a user issues a {{truncate table}} command on a 
> non-transactional table, catalogd also deletes the table and column 
> statistics. However, this can affect the performance of the truncate 
> operation, especially at high concurrency. Based on preliminary research, it 
> looks like other databases do not delete statistics after a truncate 
> operation (e.g. Oracle, Hive). It would be good to introduce a query option 
> which can be set by the user to skip deleting the column statistics during 
> truncate table execution.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org





[jira] [Created] (IMPALA-10722) truncate operation deletes data files before deleting metadata

2021-05-27 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created IMPALA-10722:


 Summary: truncate operation deletes data files before deleting 
metadata
 Key: IMPALA-10722
 URL: https://issues.apache.org/jira/browse/IMPALA-10722
 Project: IMPALA
  Issue Type: Improvement
  Components: Catalog
Reporter: Vihang Karajgaonkar


In the case of the truncate operation, we delete the data files first and then 
the statistics. But since the statistics are derived from the data, we should 
delete the statistics first and then the data files.

See: 
https://github.com/apache/impala/blob/822e8373d1f1737865899b80862c2be7b07cc950/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L2297
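A minimal sketch of the ordering this ticket proposes (the class and method names are illustrative, not Impala's actual CatalogOpExecutor API): since the statistics are derived from the data, drop the metadata first, so a failure mid-truncate cannot leave statistics describing files that no longer exist.

```java
import java.util.ArrayList;
import java.util.List;

public class TruncateOrderExample {
  // Records the order in which the two deletion steps run.
  final List<String> log = new ArrayList<>();

  void deleteStatistics(String table) { log.add("stats:" + table); }
  void deleteDataFiles(String table) { log.add("files:" + table); }

  // Proposed order: metadata (statistics) before data files.
  void truncate(String table) {
    deleteStatistics(table);
    deleteDataFiles(table);
  }

  public static void main(String[] args) {
    TruncateOrderExample op = new TruncateOrderExample();
    op.truncate("t1");
    System.out.println(op.log); // prints [stats:t1, files:t1]
  }
}
```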



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org





[jira] [Updated] (IMPALA-10502) delayed 'Invalidated objects in cache' cause 'Table already exists'

2021-05-21 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated IMPALA-10502:
-
Priority: Critical  (was: Minor)

> delayed 'Invalidated objects in cache' cause 'Table already exists'
> ---
>
> Key: IMPALA-10502
> URL: https://issues.apache.org/jira/browse/IMPALA-10502
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog, Clients, Frontend
>Affects Versions: Impala 3.4.0
>Reporter: Adriano
>Assignee: Vihang Karajgaonkar
>Priority: Critical
>
> In a fast-paced environment where the interval between steps 1 and 2 is 
> < 100ms (a simplified pipeline looks like):
> 0- catalog 'on demand' in use and disableHmsSync (enabled or disabled: no 
> difference)
> 1- open session to coord A -> DROP TABLE X -> close session
> 2- open session to coord A -> CREATE TABLE X -> close session
> Result: step 2 can fail with "table already exists".
> During the internal investigation it was discovered that IMPALA-9913 resolves 
> the issue in almost all scenarios.
> However, since the investigation is still ongoing internally, it is worth 
> tracking the issue here as well.
> Once we are sure that IMPALA-9913 fixes these events, we can close this as a 
> duplicate; otherwise, the investigation will continue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-10645) Expose metrics for catalogd's HMS endpoint

2021-05-21 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar resolved IMPALA-10645.
--
Fix Version/s: Impala 4.1
   Resolution: Fixed

> Expose metrics for catalogd's HMS endpoint
> --
>
> Key: IMPALA-10645
> URL: https://issues.apache.org/jira/browse/IMPALA-10645
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
> Fix For: Impala 4.1
>
>
> Catalogd's HMS endpoint should expose metrics to improve its supportability 
> and help identify performance issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org





[jira] [Assigned] (IMPALA-10645) Expose metrics for catalogd's HMS endpoint

2021-05-21 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar reassigned IMPALA-10645:


Assignee: Vihang Karajgaonkar

> Expose metrics for catalogd's HMS endpoint
> --
>
> Key: IMPALA-10645
> URL: https://issues.apache.org/jira/browse/IMPALA-10645
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>
> Catalogd's HMS endpoint should expose metrics to improve its supportability 
> and help identify performance issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10706) Get rid of metastoreAccessLock_ in TableLoader

2021-05-17 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created IMPALA-10706:


 Summary: Get rid of metastoreAccessLock_ in TableLoader
 Key: IMPALA-10706
 URL: https://issues.apache.org/jira/browse/IMPALA-10706
 Project: IMPALA
  Issue Type: Improvement
  Components: Catalog
Reporter: Vihang Karajgaonkar


https://github.com/apache/impala/blob/9c38568657d62b6f6d7b10aa1c721ba843374dd8/fe/src/main/java/org/apache/impala/catalog/TableLoader.java#L68
 has a synchronized block for metastore access which no longer seems necessary 
and should be removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)





[jira] [Created] (IMPALA-10700) Introduce an option to skip deleting column statistics on truncate

2021-05-11 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created IMPALA-10700:


 Summary: Introduce an option to skip deleting column statistics on 
truncate
 Key: IMPALA-10700
 URL: https://issues.apache.org/jira/browse/IMPALA-10700
 Project: IMPALA
  Issue Type: Improvement
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


Currently, when a user issues a {{truncate table}} command on a non-transactional 
table, catalogd also deletes the table and column statistics. However, this can 
affect the performance of the truncate operation, especially at high 
concurrency. Based on preliminary research, it looks like other databases (e.g. 
Oracle, Hive) do not delete statistics after a truncate operation. It would be 
good to introduce a query option which can be set by the user to skip deleting 
the column statistics during truncate table execution.




-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org





[jira] [Resolved] (IMPALA-10644) RangerAuthorizationFactory cannot be instantiated after latest GBN bump up

2021-04-26 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar resolved IMPALA-10644.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> RangerAuthorizationFactory cannot be instantiated after latest GBN bump up
> --
>
> Key: IMPALA-10644
> URL: https://issues.apache.org/jira/browse/IMPALA-10644
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Blocker
> Fix For: Impala 4.0
>
>
> After the GBN was bumped to 11920537 in the commit 
> https://github.com/apache/impala/commit/1ab1143e98ff09610dff82d1795cf103659ffe97
>  some of the ranger tests are failing with the following exception trace.
> {noformat}
> I0407 17:40:18.681761 25041 jni-util.cc:286] 
> org.apache.impala.common.InternalException: Unable to instantiate 
> authorization provider: 
> org.apache.impala.authorization.ranger.RangerAuthorizationFactory
>   at 
> org.apache.impala.util.AuthorizationUtil.authzFactoryFrom(AuthorizationUtil.java:88)
>   at org.apache.impala.service.JniFrontend.<init>(JniFrontend.java:143)
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.impala.util.AuthorizationUtil.authzFactoryFrom(AuthorizationUtil.java:86)
>   ... 1 more
> Caused by: java.lang.NoClassDefFoundError: 
> org/apache/solr/common/SolrException
>   at 
> org.apache.ranger.audit.provider.AuditProviderFactory.getProviderFromConfig(AuditProviderFactory.java:420)
>   at 
> org.apache.ranger.audit.provider.AuditProviderFactory.init(AuditProviderFactory.java:178)
>   at 
> org.apache.ranger.plugin.service.RangerBasePlugin.init(RangerBasePlugin.java:175)
>   at 
> org.apache.impala.authorization.ranger.RangerImpalaPlugin.init(RangerImpalaPlugin.java:50)
>   at 
> org.apache.impala.authorization.ranger.RangerImpalaPlugin.getInstance(RangerImpalaPlugin.java:69)
>   at 
> org.apache.impala.authorization.ranger.RangerAuthorizationChecker.<init>(RangerAuthorizationChecker.java:82)
>   at 
> org.apache.impala.authorization.ranger.RangerAuthorizationFactory.<init>(RangerAuthorizationFactory.java:44)
>   ... 6 more
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.solr.common.SolrException
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>   ... 13 more
> {noformat}
> It looks like after the GBN was upgraded we need to have solr dependencies in 
> the fe/pom.xml and they should not be reverted. The toolchain should also be 
> updated to exclude the solr and atlas libraries for the GBN.








[jira] [Commented] (IMPALA-9375) Remove DirectMetaProvider usage from CatalogMetaProvider

2021-04-14 Thread Vihang Karajgaonkar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17321314#comment-17321314
 ] 

Vihang Karajgaonkar commented on IMPALA-9375:
-

Thanks [~robbiezhang] for your comment. Yeah, that metastore client pool is fine 
to keep since it is only instantiated on coordinators. Coordinators need an HMS 
client because they need to open a transaction when transactional tables are 
being inserted into.

See 
https://github.com/apache/impala/blob/6b16df9e9a4696b46b6f9c7fe2fc0aaded285623/fe/src/main/java/org/apache/impala/service/JniFrontend.java#L144

> Remove DirectMetaProvider usage from CatalogMetaProvider
> 
>
> Key: IMPALA-9375
> URL: https://issues.apache.org/jira/browse/IMPALA-9375
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 3.4.0
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Critical
>
> I see that CatalogMetaProvider uses {{DirectMetaProvider}} here 
> https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java#L239
> There are only a couple of places where it is used within 
> CatalogMetaProvider. We should implement those remaining APIs in catalog-v2 
> mode and remove the usage of DirectMetaProvider from CatalogMetaProvider. 
> DirectMetaProvider starts a MetastoreClientPool (with 10 connections) by 
> default. This is unnecessary given that catalogd already makes its own 
> connections to HMS at startup. It also slows down the coordinator startup 
> time if there are HMS connection issues.






[jira] [Resolved] (IMPALA-10613) Expose table and partition metadata over HMS API

2021-04-08 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar resolved IMPALA-10613.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Expose table and partition metadata over HMS API
> 
>
> Key: IMPALA-10613
> URL: https://issues.apache.org/jira/browse/IMPALA-10613
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
> Fix For: Impala 4.0
>
>
> Catalogd caches the table and partition metadata. If an external FE is to be 
> supported for querying through Impala, it needs to get this metadata from 
> catalogd to compile the query and generate the plan. While a subset of the 
> metadata cached in catalogd is sourced from the Hive metastore, catalogd also 
> caches file metadata which is needed by the Impala backend to create the 
> Impala plan. It would be good to expose the table and partition metadata 
> cached in catalogd over the HMS API so that any Hive metastore client (e.g. 
> Spark, Hive) can potentially use this metadata to create a plan. This JIRA 
> tracks the work needed to expose this information from catalogd.








[jira] [Created] (IMPALA-10645) Expose metrics for catalogd's HMS endpoint

2021-04-07 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created IMPALA-10645:


 Summary: Expose metrics for catalogd's HMS endpoint
 Key: IMPALA-10645
 URL: https://issues.apache.org/jira/browse/IMPALA-10645
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Vihang Karajgaonkar


Catalogd's HMS endpoint should expose metrics to improve its supportability and 
help identify performance issues.








[jira] [Commented] (IMPALA-10644) RangerAuthorizationFactory cannot be instantiated after latest GBN bump up

2021-04-07 Thread Vihang Karajgaonkar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316687#comment-17316687
 ] 

Vihang Karajgaonkar commented on IMPALA-10644:
--

http://gerrit.cloudera.org:8080/17282

> RangerAuthorizationFactory cannot be instantiated after latest GBN bump up
> --
>
> Key: IMPALA-10644
> URL: https://issues.apache.org/jira/browse/IMPALA-10644
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>
> After the GBN was bumped to 11920537 in the commit 
> https://github.com/apache/impala/commit/1ab1143e98ff09610dff82d1795cf103659ffe97
>  some of the ranger tests are failing with the following exception trace.
> {noformat}
> I0407 17:40:18.681761 25041 jni-util.cc:286] 
> org.apache.impala.common.InternalException: Unable to instantiate 
> authorization provider: 
> org.apache.impala.authorization.ranger.RangerAuthorizationFactory
>   at 
> org.apache.impala.util.AuthorizationUtil.authzFactoryFrom(AuthorizationUtil.java:88)
>   at org.apache.impala.service.JniFrontend.<init>(JniFrontend.java:143)
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.impala.util.AuthorizationUtil.authzFactoryFrom(AuthorizationUtil.java:86)
>   ... 1 more
> Caused by: java.lang.NoClassDefFoundError: 
> org/apache/solr/common/SolrException
>   at 
> org.apache.ranger.audit.provider.AuditProviderFactory.getProviderFromConfig(AuditProviderFactory.java:420)
>   at 
> org.apache.ranger.audit.provider.AuditProviderFactory.init(AuditProviderFactory.java:178)
>   at 
> org.apache.ranger.plugin.service.RangerBasePlugin.init(RangerBasePlugin.java:175)
>   at 
> org.apache.impala.authorization.ranger.RangerImpalaPlugin.init(RangerImpalaPlugin.java:50)
>   at 
> org.apache.impala.authorization.ranger.RangerImpalaPlugin.getInstance(RangerImpalaPlugin.java:69)
>   at 
> org.apache.impala.authorization.ranger.RangerAuthorizationChecker.<init>(RangerAuthorizationChecker.java:82)
>   at 
> org.apache.impala.authorization.ranger.RangerAuthorizationFactory.<init>(RangerAuthorizationFactory.java:44)
>   ... 6 more
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.solr.common.SolrException
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>   ... 13 more
> {noformat}
> It looks like after the GBN was upgraded we need to have solr dependencies in 
> the fe/pom.xml and they should not be reverted. The toolchain should also be 
> updated to exclude the solr and atlas libraries for the GBN.






[jira] [Created] (IMPALA-10644) RangerAuthorizationFactory cannot be instantiated after latest GBN bump up

2021-04-07 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created IMPALA-10644:


 Summary: RangerAuthorizationFactory cannot be instantiated after 
latest GBN bump up
 Key: IMPALA-10644
 URL: https://issues.apache.org/jira/browse/IMPALA-10644
 Project: IMPALA
  Issue Type: Bug
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


After the GBN was bumped to 11920537 in the commit 
https://github.com/apache/impala/commit/1ab1143e98ff09610dff82d1795cf103659ffe97
 some of the ranger tests are failing with the following exception trace.

{noformat}
I0407 17:40:18.681761 25041 jni-util.cc:286] 
org.apache.impala.common.InternalException: Unable to instantiate authorization 
provider: org.apache.impala.authorization.ranger.RangerAuthorizationFactory
at 
org.apache.impala.util.AuthorizationUtil.authzFactoryFrom(AuthorizationUtil.java:88)
at org.apache.impala.service.JniFrontend.<init>(JniFrontend.java:143)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.apache.impala.util.AuthorizationUtil.authzFactoryFrom(AuthorizationUtil.java:86)
... 1 more
Caused by: java.lang.NoClassDefFoundError: org/apache/solr/common/SolrException
at 
org.apache.ranger.audit.provider.AuditProviderFactory.getProviderFromConfig(AuditProviderFactory.java:420)
at 
org.apache.ranger.audit.provider.AuditProviderFactory.init(AuditProviderFactory.java:178)
at 
org.apache.ranger.plugin.service.RangerBasePlugin.init(RangerBasePlugin.java:175)
at 
org.apache.impala.authorization.ranger.RangerImpalaPlugin.init(RangerImpalaPlugin.java:50)
at 
org.apache.impala.authorization.ranger.RangerImpalaPlugin.getInstance(RangerImpalaPlugin.java:69)
at 
org.apache.impala.authorization.ranger.RangerAuthorizationChecker.<init>(RangerAuthorizationChecker.java:82)
at 
org.apache.impala.authorization.ranger.RangerAuthorizationFactory.<init>(RangerAuthorizationFactory.java:44)
... 6 more
Caused by: java.lang.ClassNotFoundException: 
org.apache.solr.common.SolrException
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 13 more
{noformat}

It looks like after the GBN was upgraded we need to have solr dependencies in 
the fe/pom.xml and they should not be reverted. The toolchain should also be 
updated to exclude the solr and atlas libraries for the GBN.
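A fix along the lines described above would keep the Solr client artifact in fe/pom.xml so that Ranger's audit provider can find {{SolrException}} at runtime. This is only a sketch; the version property is a placeholder, not the actual GBN-aligned value:

```xml
<!-- Hypothetical fe/pom.xml fragment: keep the Solr client on the classpath
     for Ranger's audit provider; the version property is a placeholder. -->
<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-solrj</artifactId>
  <version>${solr.version}</version>
</dependency>
```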








[jira] [Created] (IMPALA-10639) useCompactProtocol should be configurable for the catalogd's HMS endpoint

2021-04-06 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created IMPALA-10639:


 Summary: useCompactProtocol should be configurable for the 
catalogd's HMS endpoint
 Key: IMPALA-10639
 URL: https://issues.apache.org/jira/browse/IMPALA-10639
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Vihang Karajgaonkar


Currently, the catalog server's HMS endpoint has a hardcoded setting to use 
{{TBinaryProtocol}}. We can add a configuration option which switches it to 
{{TCompactProtocol}}.
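A minimal sketch of the proposed switch; the function and option names below are hypothetical, and catalogd's real configuration mechanism (startup flags) is not modeled:

```python
# Hypothetical sketch: pick the Thrift protocol factory for the HMS endpoint
# based on a boolean option instead of the current hardcoded TBinaryProtocol.
def hms_protocol_factory(use_compact_protocol: bool = False) -> str:
    """Return the name of the Thrift protocol factory to instantiate."""
    return ("TCompactProtocol.Factory" if use_compact_protocol
            else "TBinaryProtocol.Factory")

assert hms_protocol_factory() == "TBinaryProtocol.Factory"       # today's default
assert hms_protocol_factory(True) == "TCompactProtocol.Factory"  # proposed opt-in
```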








[jira] [Created] (IMPALA-10638) Add support for SASL and SSL in catalogd's HMS endpoint

2021-04-06 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created IMPALA-10638:


 Summary: Add support for SASL and SSL in catalogd's HMS endpoint
 Key: IMPALA-10638
 URL: https://issues.apache.org/jira/browse/IMPALA-10638
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Vihang Karajgaonkar











[jira] [Resolved] (IMPALA-10605) Deflake test_refresh_native

2021-03-31 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar resolved IMPALA-10605.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> Deflake test_refresh_native
> ---
>
> Key: IMPALA-10605
> URL: https://issues.apache.org/jira/browse/IMPALA-10605
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Fix For: Impala 4.0
>
>
> The test uses a regex to parse the output of describe database and extract 
> the db properties. The regex currently assumes that there will be only one 
> property in the database. This assumption breaks when the events processor is 
> running because it might add some db properties as well.
> {noformat}
> regex = r"{(.*?)=(.*?)}"
> {noformat}
> The above regex will capture subsequent properties as part of the value of the 
> first key. We can fix this by changing the regex to look specifically for the 
> registered-function property key prefix.
> {noformat}
> regex = r"{.*(impala_registered_function.*?)=(.*?)[,}]"
> {noformat}
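The difference between the two regexes can be reproduced in plain Python. The property values below are made up; only the regexes come from the test:

```python
import re

# Made-up describe-database output in which the events processor has added a
# second property after the registered-function entry.
desc = ("{impala_registered_function_fn1=select 1, "
        "impala.events.lastSyncedEventId=42}")

# Old regex: the value group, although non-greedy, must still run to the
# closing '}', so it swallows every later property as part of the value.
old = re.search(r"{(.*?)=(.*?)}", desc)
assert old.group(2) == "select 1, impala.events.lastSyncedEventId=42"

# Fixed regex: anchor on the function-name prefix and stop the value at the
# first ',' or '}' instead of at the closing brace.
new = re.search(r"{.*(impala_registered_function.*?)=(.*?)[,}]", desc)
assert new.group(1) == "impala_registered_function_fn1"
assert new.group(2) == "select 1"
```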








[jira] [Commented] (IMPALA-10613) Expose table and partition metadata over HMS API

2021-03-30 Thread Vihang Karajgaonkar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17311856#comment-17311856
 ] 

Vihang Karajgaonkar commented on IMPALA-10613:
--

https://gerrit.cloudera.org/#/c/17244/ for the review

> Expose table and partition metadata over HMS API
> 
>
> Key: IMPALA-10613
> URL: https://issues.apache.org/jira/browse/IMPALA-10613
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>
> Catalogd caches the table and partition metadata. If an external FE is to be 
> supported for querying through Impala, it needs to get this metadata from 
> catalogd to compile the query and generate the plan. While a subset of the 
> metadata cached in catalogd is sourced from the Hive metastore, catalogd also 
> caches file metadata which is needed by the Impala backend to create the 
> Impala plan. It would be good to expose the table and partition metadata 
> cached in catalogd over the HMS API so that any Hive metastore client (e.g. 
> Spark, Hive) can potentially use this metadata to create a plan. This JIRA 
> tracks the work needed to expose this information from catalogd.






[jira] [Work started] (IMPALA-10613) Expose table and partition metadata over HMS API

2021-03-30 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-10613 started by Vihang Karajgaonkar.

> Expose table and partition metadata over HMS API
> 
>
> Key: IMPALA-10613
> URL: https://issues.apache.org/jira/browse/IMPALA-10613
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>
> Catalogd caches the table and partition metadata. If an external FE is to be 
> supported for querying through Impala, it needs to get this metadata from 
> catalogd to compile the query and generate the plan. While a subset of the 
> metadata cached in catalogd is sourced from the Hive metastore, catalogd also 
> caches file metadata which is needed by the Impala backend to create the 
> Impala plan. It would be good to expose the table and partition metadata 
> cached in catalogd over the HMS API so that any Hive metastore client (e.g. 
> Spark, Hive) can potentially use this metadata to create a plan. This JIRA 
> tracks the work needed to expose this information from catalogd.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-10598) test_cache_reload_validation is flaky

2021-03-26 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar resolved IMPALA-10598.
--
Fix Version/s: Impala 4.0
   Resolution: Fixed

> test_cache_reload_validation is flaky
> -
>
> Key: IMPALA-10598
> URL: https://issues.apache.org/jira/browse/IMPALA-10598
> Project: IMPALA
>  Issue Type: Test
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
>  Labels: flaky-test
> Fix For: Impala 4.0
>
>
> I noticed that when I run 
> {noformat}
> bin/impala-py.test tests/query_test/test_hdfs_caching.py -k 
> test_cache_reload_validation
> {noformat}
> I see the following failure on the master branch. 
> {noformat}
>  TestHdfsCachingDdl.test_cache_reload_validation[protocol: beeswax | 
> exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> text/none] 
> tests/query_test/test_hdfs_caching.py:269: in test_cache_reload_validation
> assert num_entries_pre + 4 == get_num_cache_requests(), \
> E   AssertionError: Adding the tables should be reflected by the number of 
> cache directives.
> E   assert (2 + 4) == 7
> E+  where 7 = get_num_cache_requests()
> {noformat}
> This failure is reproducible for me every time, but I am not sure why the 
> Jenkins jobs don't show this test failure. When I looked into this I found 
> that the test depends on the following method to get the number of cache 
> directives on HDFS.
> {noformat}
>   def get_num_cache_requests_util():
> rc, stdout, stderr = exec_process("hdfs cacheadmin -listDirectives 
> -stats")
> assert rc == 0, 'Error executing hdfs cacheadmin: %s %s' % (stdout, 
> stderr)
> return len(stdout.split('\n'))
> {noformat}
> The output of this command when there are no entries is 
> {noformat}
> Found 0 entries
> {noformat}
> When there are entries, the output looks like 
> {noformat}
> Found 4 entries
>   ID POOL   REPL EXPIRY  PATH 
>BYTES_NEEDED  BYTES_CACHED  FILES_NEEDED  FILES_CACHED
>  225 testPool  8 never   /test-warehouse/cachedb.db/cached_tbl_reload 
>   0 0 0 0
>  226 testPool  8 never   
> /test-warehouse/cachedb.db/cached_tbl_reload_part  0  
>0 0 0
>  227 testPool  8 never   
> /test-warehouse/cachedb.db/cached_tbl_reload_part/j=1  0  
>0 0 0
>  228 testPool  8 never   
> /test-warehouse/cachedb.db/cached_tbl_reload_part/j=2  0  
>0 0 0
> {noformat}
> When there are no entries, an additional trailing newline is also counted.
> So with no entries the method returns 2, and with 4 entries it returns 7, 
> which causes the failure because the test expects 2 + 4 = 6.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Created] (IMPALA-10613) Expose table and partition metadata over HMS API

2021-03-25 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created IMPALA-10613:


 Summary: Expose table and partition metadata over HMS API
 Key: IMPALA-10613
 URL: https://issues.apache.org/jira/browse/IMPALA-10613
 Project: IMPALA
  Issue Type: Sub-task
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


Catalogd caches table and partition metadata. If an external FE needs to be 
supported to query using Impala, it would need to get this metadata from 
catalogd to compile the query and generate the plan. While a subset of the 
metadata cached in catalogd is sourced from the Hive metastore, catalogd also 
caches file metadata which is needed by the Impala backend to create the Impala 
plan. It would be good to expose the table and partition metadata cached in 
catalogd over the HMS API so that any Hive metastore client (e.g. Spark, Hive) 
can potentially use this metadata to create a plan. This JIRA tracks the work 
needed to expose this metadata through catalogd.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Created] (IMPALA-10612) Catalogd changes to support external FE

2021-03-25 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created IMPALA-10612:


 Summary: Catalogd changes to support external FE
 Key: IMPALA-10612
 URL: https://issues.apache.org/jira/browse/IMPALA-10612
 Project: IMPALA
  Issue Type: Improvement
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


This issue tracks the work needed to expose metadata in catalogd over the HMS 
API so that any HMS-compatible client would be able to use catalogd as a 
metadata cache. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Updated] (IMPALA-10605) Deflake test_refresh_native

2021-03-24 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated IMPALA-10605:
-
Description: 
The test uses a regex to parse the output of describe database and extract the 
db properties. The regex currently assumes that there will be only one property 
in the database. This assumption breaks when events processor is running 
because it might add some db properties as well.

{noformat}
regex = r"{(.*?)=(.*?)}"
{noformat}

The above regex captures all subsequent properties as the value of the first 
key. We can fix this by changing the regex to specifically look for the 
function-name property key prefix.
{noformat}
regex = r"{.*(impala_registered_function.*?)=(.*?)[,}]"
{noformat}
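As a rough illustration of the difference between the two regexes (the property 
string below is hypothetical, not real catalogd output):

```python
import re

# Hypothetical db-properties string: an events-processor property precedes
# the registered-function property.
props = ("{impala.events.catalogServiceId=abc123, "
         "impala_registered_function_foo=select 1}")

# The old regex lazily matches up to the first '=' and then up to the
# closing '}', so the second property leaks into the captured value.
old = re.search(r"{(.*?)=(.*?)}", props)
# old.group(2) is "abc123, impala_registered_function_foo=select 1"

# The fixed regex anchors on the function-name key prefix and stops at the
# next ',' or '}', isolating just the registered-function entry.
new = re.search(r"{.*(impala_registered_function.*?)=(.*?)[,}]", props)
# new.group(1) is "impala_registered_function_foo"; new.group(2) is "select 1"
```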

> Deflake test_refresh_native
> ---
>
> Key: IMPALA-10605
> URL: https://issues.apache.org/jira/browse/IMPALA-10605
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
>
> The test uses a regex to parse the output of "describe database" and extract 
> the db properties. The regex currently assumes that there will be only one 
> property on the database. This assumption breaks when the events processor 
> is running, because it may add db properties of its own.
> {noformat}
> regex = r"{(.*?)=(.*?)}"
> {noformat}
> The above regex captures all subsequent properties as the value of the first 
> key. We can fix this by changing the regex to specifically look for the 
> function-name property key prefix.
> {noformat}
> regex = r"{.*(impala_registered_function.*?)=(.*?)[,}]"
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10605) Deflake test_refresh_native

2021-03-24 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created IMPALA-10605:


 Summary: Deflake test_refresh_native
 Key: IMPALA-10605
 URL: https://issues.apache.org/jira/browse/IMPALA-10605
 Project: IMPALA
  Issue Type: Improvement
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Created] (IMPALA-10598) test_cache_reload_validation is flaky

2021-03-19 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created IMPALA-10598:


 Summary: test_cache_reload_validation is flaky
 Key: IMPALA-10598
 URL: https://issues.apache.org/jira/browse/IMPALA-10598
 Project: IMPALA
  Issue Type: Test
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


I noticed that when I run 

{noformat}
bin/impala-py.test tests/query_test/test_hdfs_caching.py -k 
test_cache_reload_validation
{noformat}

I see the following failure on the master branch. 

{noformat}
 TestHdfsCachingDdl.test_cache_reload_validation[protocol: beeswax | 
exec_option: {'batch_size': 0, 'num_nodes': 0, 
'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 
'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
text/none] 
tests/query_test/test_hdfs_caching.py:269: in test_cache_reload_validation
assert num_entries_pre + 4 == get_num_cache_requests(), \
E   AssertionError: Adding the tables should be reflected by the number of 
cache directives.
E   assert (2 + 4) == 7
E+  where 7 = get_num_cache_requests()

{noformat}

This failure is reproducible for me every time, but I am not sure why the 
Jenkins jobs don't show this test failure. When I looked into this I found that 
the test depends on the following method to get the number of cache directives 
on HDFS.

{noformat}
  def get_num_cache_requests_util():
rc, stdout, stderr = exec_process("hdfs cacheadmin -listDirectives -stats")
assert rc == 0, 'Error executing hdfs cacheadmin: %s %s' % (stdout, stderr)
return len(stdout.split('\n'))
{noformat}

The output of this command when there are no entries is 
{noformat}
Found 0 entries
{noformat}

When there are entries, the output looks like 
{noformat}
Found 4 entries
  ID POOL   REPL EXPIRY  PATH   
 BYTES_NEEDED  BYTES_CACHED  FILES_NEEDED  FILES_CACHED
 225 testPool  8 never   /test-warehouse/cachedb.db/cached_tbl_reload   
0 0 0 0
 226 testPool  8 never   /test-warehouse/cachedb.db/cached_tbl_reload_part  
0 0 0 0
 227 testPool  8 never   
/test-warehouse/cachedb.db/cached_tbl_reload_part/j=1  0
 0 0 0
 228 testPool  8 never   
/test-warehouse/cachedb.db/cached_tbl_reload_part/j=2  0
 0 0 0
{noformat}

When there are no entries, an additional trailing newline is also counted.
So with no entries the method returns 2, and with 4 entries it returns 7, which 
causes the failure because the test expects 2 + 4 = 6.
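One way to make the helper less brittle (a sketch only; the helper name is 
hypothetical, and it assumes the "Found N entries" header format is stable) is 
to parse the count from the header rather than counting output lines:

```python
import re

def parse_num_cache_requests(stdout):
    # Parse the count from the "Found N entries" header instead of counting
    # raw lines, which also counts the header itself and any trailing newline.
    m = re.search(r"Found (\d+) entries", stdout)
    return int(m.group(1)) if m else 0

# With no directives, counting lines gives 2 ("Found 0 entries" plus a
# trailing blank line); parsing the header gives 0.
print(parse_num_cache_requests("Found 0 entries\n"))  # 0
```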




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-10598) test_cache_reload_validation is flaky

2021-03-19 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated IMPALA-10598:
-
Labels: flaky-test  (was: )

> test_cache_reload_validation is flaky
> -
>
> Key: IMPALA-10598
> URL: https://issues.apache.org/jira/browse/IMPALA-10598
> Project: IMPALA
>  Issue Type: Test
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
>  Labels: flaky-test
>
> I noticed that when I run 
> {noformat}
> bin/impala-py.test tests/query_test/test_hdfs_caching.py -k 
> test_cache_reload_validation
> {noformat}
> I see the following failure on the master branch. 
> {noformat}
>  TestHdfsCachingDdl.test_cache_reload_validation[protocol: beeswax | 
> exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> text/none] 
> tests/query_test/test_hdfs_caching.py:269: in test_cache_reload_validation
> assert num_entries_pre + 4 == get_num_cache_requests(), \
> E   AssertionError: Adding the tables should be reflected by the number of 
> cache directives.
> E   assert (2 + 4) == 7
> E+  where 7 = get_num_cache_requests()
> {noformat}
> This failure is reproducible for me every time, but I am not sure why the 
> Jenkins jobs don't show this test failure. When I looked into this I found 
> that the test depends on the following method to get the number of cache 
> directives on HDFS.
> {noformat}
>   def get_num_cache_requests_util():
> rc, stdout, stderr = exec_process("hdfs cacheadmin -listDirectives 
> -stats")
> assert rc == 0, 'Error executing hdfs cacheadmin: %s %s' % (stdout, 
> stderr)
> return len(stdout.split('\n'))
> {noformat}
> The output of this command when there are no entries is 
> {noformat}
> Found 0 entries
> {noformat}
> When there are entries, the output looks like 
> {noformat}
> Found 4 entries
>   ID POOL   REPL EXPIRY  PATH 
>BYTES_NEEDED  BYTES_CACHED  FILES_NEEDED  FILES_CACHED
>  225 testPool  8 never   /test-warehouse/cachedb.db/cached_tbl_reload 
>   0 0 0 0
>  226 testPool  8 never   
> /test-warehouse/cachedb.db/cached_tbl_reload_part  0  
>0 0 0
>  227 testPool  8 never   
> /test-warehouse/cachedb.db/cached_tbl_reload_part/j=1  0  
>0 0 0
>  228 testPool  8 never   
> /test-warehouse/cachedb.db/cached_tbl_reload_part/j=2  0  
>0 0 0
> {noformat}
> When there are no entries, an additional trailing newline is also counted.
> So with no entries the method returns 2, and with 4 entries it returns 7, 
> which causes the failure because the test expects 2 + 4 = 6.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org


