[jira] [Commented] (IMPALA-10722) truncate operation deletes data files before deleting metadata
[ https://issues.apache.org/jira/browse/IMPALA-10722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17602119#comment-17602119 ] Vihang Karajgaonkar commented on IMPALA-10722: -- Feel free to assign it to yourself. I am not working on this. > truncate operation deletes data files before deleting metadata > -- > > Key: IMPALA-10722 > URL: https://issues.apache.org/jira/browse/IMPALA-10722 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Reporter: Vihang Karajgaonkar >Priority: Minor > Labels: newbie > > In the case of a truncate operation, we delete the data files first and then the > statistics. But since statistics are derived from the data, we should delete the > statistics first and then the data files. > See: > https://github.com/apache/impala/blob/822e8373d1f1737865899b80862c2be7b07cc950/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L2297 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
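The ordering argument in this ticket can be sketched in a few lines. This is a hedged illustration only: `dropStatistics` and `deleteDataFiles` are hypothetical placeholders, not CatalogOpExecutor's actual methods.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the ordering IMPALA-10722 proposes. A failure between the two
// steps then leaves data without statistics (recoverable by recomputing
// stats), rather than statistics describing already-deleted files.
public class TruncateSketch {
  static final List<String> log = new ArrayList<>();

  static void dropStatistics(String table) { log.add("stats:" + table); }
  static void deleteDataFiles(String table) { log.add("files:" + table); }

  static void truncate(String table) {
    // Statistics are derived from the data, so drop them first,
    // then delete the data files.
    dropStatistics(table);
    deleteDataFiles(table);
  }

  public static void main(String[] args) {
    truncate("t1");
    System.out.println(log); // stats entry precedes files entry
  }
}
```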
[jira] [Created] (IMPALA-11091) Update documentation for event polling
Vihang Karajgaonkar created IMPALA-11091: Summary: Update documentation for event polling Key: IMPALA-11091 URL: https://issues.apache.org/jira/browse/IMPALA-11091 Project: IMPALA Issue Type: Documentation Reporter: Vihang Karajgaonkar Assignee: Vihang Karajgaonkar IMPALA-8795 enables event polling by default in Impala 4.1. This ticket tracks the documentation changes needed to reflect that. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (IMPALA-8592) Add support for insert events for 'LOAD DATA..' statements from Impala.
[ https://issues.apache.org/jira/browse/IMPALA-8592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17452610#comment-17452610 ] Vihang Karajgaonkar commented on IMPALA-8592: - One use case here is that if you have multiple Impala clusters, a LOAD DATA statement in one cluster will not generate any events, and hence the table will need to be refreshed on all the other Impala clusters. > Add support for insert events for 'LOAD DATA..' statements from Impala. > --- > > Key: IMPALA-8592 > URL: https://issues.apache.org/jira/browse/IMPALA-8592 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Reporter: Anurag Mantripragada >Priority: Major > > Hive generates INSERT events for LOAD DATA.. statements. We should support > the same in Impala. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (IMPALA-10886) TestReusePartitionMetadata.test_reuse_partition_meta fails
[ https://issues.apache.org/jira/browse/IMPALA-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17452520#comment-17452520 ] Vihang Karajgaonkar commented on IMPALA-10886: -- Do we know why we don't detect DROP_PARTITION as self-event in this case? > TestReusePartitionMetadata.test_reuse_partition_meta fails > -- > > Key: IMPALA-10886 > URL: https://issues.apache.org/jira/browse/IMPALA-10886 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Critical > Labels: broken-build > Attachments: test_local_catalog.patch > > > https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/14670/testReport/junit/custom_cluster.test_local_catalog/TestReusePartitionMetadata/test_reuse_partition_meta/ > {code} > custom_cluster/test_local_catalog.py:586: in test_reuse_partition_meta > self.check_missing_partitions(unique_database, 1) > custom_cluster/test_local_catalog.py:595: in check_missing_partitions > assert match.group(1) == str(partition_misses) > E assert '0' == '1' > E - 0 > E + 1 > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (IMPALA-9857) Batch ALTER_PARTITION events
[ https://issues.apache.org/jira/browse/IMPALA-9857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar resolved IMPALA-9857. - Fix Version/s: Impala 4.1.0 Resolution: Fixed > Batch ALTER_PARTITION events > > > Key: IMPALA-9857 > URL: https://issues.apache.org/jira/browse/IMPALA-9857 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Major > Fix For: Impala 4.1.0 > > > When Hive inserts data into partitioned tables, it generates a lot of > ALTER_PARTITION (and possibly INSERT_EVENT) events in quick succession. Currently, > such events are processed one by one by EventsProcessor, which can be slow > and can cause EventsProcessor to lag behind. This JIRA proposes to use > batching for such ALTER_PARTITION events such that all the successive > ALTER_PARTITION events for the same table are batched together into one > ALTER_PARTITIONS event and are then processed together to refresh all the > partitions from the events. This can significantly speed up event > processing in such cases. -- This message was sent by Atlassian Jira (v8.20.1#820001)
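The batching idea described above (merging successive ALTER_PARTITION events on the same table into one batch that needs only a single refresh) can be sketched as follows. This is a simplified illustration, not Impala's actual EventsProcessor API; the `Event` record and `batch` method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Group consecutive ALTER_PARTITION events for the same table into one
// batch; any other event, or a different table, starts a new batch.
public class EventBatcher {
  record Event(String type, String table) {}

  static List<List<Event>> batch(List<Event> events) {
    List<List<Event>> batches = new ArrayList<>();
    for (Event e : events) {
      List<Event> last = batches.isEmpty() ? null : batches.get(batches.size() - 1);
      if (last != null && e.type().equals("ALTER_PARTITION")
          && last.get(0).type().equals("ALTER_PARTITION")
          && last.get(0).table().equals(e.table())) {
        last.add(e); // extend the current batch
      } else {
        batches.add(new ArrayList<>(List.of(e))); // start a new batch
      }
    }
    return batches;
  }

  public static void main(String[] args) {
    List<Event> events = List.of(
        new Event("ALTER_PARTITION", "t1"),
        new Event("ALTER_PARTITION", "t1"),
        new Event("ALTER_PARTITION", "t2"),
        new Event("CREATE_TABLE", "t3"));
    // Four events collapse into three batches; the two t1 alters are
    // processed with one refresh instead of two.
    System.out.println(batch(events).size()); // 3
  }
}
```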
[jira] [Resolved] (IMPALA-11028) Table loading could fail if metastore cleans up old events
[ https://issues.apache.org/jira/browse/IMPALA-11028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar resolved IMPALA-11028. -- Fix Version/s: Impala 4.1.0 Resolution: Fixed > Table loading could fail if metastore cleans up old events > -- > > Key: IMPALA-11028 > URL: https://issues.apache.org/jira/browse/IMPALA-11028 > Project: IMPALA > Issue Type: Bug >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Major > Fix For: Impala 4.1.0 > > > After IMPALA-10502, Catalogd tracks the table's create event id. When the > table is loaded for the first time, it updates the create event id of the > table. But if the table is loaded for the first time after a long delay > (after 24 hrs) it is possible that the metastore cleans up old notification log > entries which are required by catalogd during the table load. > See this snippet from TableLoader.java > {noformat} > if (eventId != -1 && catalog_.isEventProcessingActive()) { > // If the eventId is not -1 it means this table was likely created by > Impala. > // However, since the load operation of the table can happen much > later, it is > // possible that the table was recreated outside Impala and hence the > eventId > // which is stored in the loaded table needs to be updated to the > latest. > // we are only interested in fetching the events if we have a valid > eventId > // for a table. For tables where eventId is unknown are not created by > // this catalogd and hence the self-event detection logic does not > apply.
> events = MetastoreEventsProcessor.getNextMetastoreEvents(catalog_, > eventId, > notificationEvent -> CreateTableEvent.CREATE_TABLE_EVENT_TYPE > .equals(notificationEvent.getEventType()) > && > notificationEvent.getDbName().equalsIgnoreCase(db.getName()) > && > notificationEvent.getTableName().equalsIgnoreCase(tblName)); > } > {noformat} > The {{getNextMetastoreEvents}} method can throw the following exception if the > metastore has cleaned up older entries (by default after 24 hrs). This is > controlled by the configuration {{hive.metastore.event.db.listener.timetolive}} > on the metastore side. > I could reproduce the problem by setting the following metastore configs. > {noformat} > hive.metastore.event.db.listener.clean.interval=10s > hive.metastore.event.db.listener.timetolive=120s > {noformat} > Now run the following Impala script > {noformat} > create table t1 (c1 int); > create table t2 (c1 int); > select sleep(24); > create table t3 (c1 int); > select * from t1; > {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001)
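The failure mode above (the metastore purges notification-log entries older than `hive.metastore.event.db.listener.timetolive`, so fetching events from a stored create-event id can fail) suggests a defensive pattern: check against the oldest available event id first and fall back to a plain reload. This is a hypothetical sketch; `fetchFrom` and its return values are illustrative, not Impala's actual code.

```java
// If the requested event id predates the oldest entry still in the
// metastore's notification log, the events were purged; reload the table
// without self-event detection instead of failing the load.
public class EventFetchSketch {
  static String fetchFrom(long requestedId, long oldestAvailableId) {
    if (requestedId < oldestAvailableId) {
      return "FALLBACK_RELOAD"; // needed events were cleaned up
    }
    return "FETCH_EVENTS"; // events are still available
  }

  public static void main(String[] args) {
    // Table created at event 100, but the metastore has purged everything
    // below 150 (e.g. after the 24h TTL elapsed).
    System.out.println(fetchFrom(100, 150)); // FALLBACK_RELOAD
  }
}
```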
[jira] [Created] (IMPALA-11028) Table loading could fail if metastore cleans up old events
Vihang Karajgaonkar created IMPALA-11028: Summary: Table loading could fail if metastore cleans up old events Key: IMPALA-11028 URL: https://issues.apache.org/jira/browse/IMPALA-11028 Project: IMPALA Issue Type: Bug Reporter: Vihang Karajgaonkar Assignee: Vihang Karajgaonkar After IMPALA-10502, Catalogd tracks the table's create event id. When the table is loaded for the first time, it updates the create event id of the table. But if the table is loaded for the first time after a long delay (after 24 hrs) it is possible that the metastore cleans up old notification log entries which are required by catalogd during the table load. See this snippet from TableLoader.java {noformat} if (eventId != -1 && catalog_.isEventProcessingActive()) { // If the eventId is not -1 it means this table was likely created by Impala. // However, since the load operation of the table can happen much later, it is // possible that the table was recreated outside Impala and hence the eventId // which is stored in the loaded table needs to be updated to the latest. // we are only interested in fetching the events if we have a valid eventId // for a table. For tables where eventId is unknown are not created by // this catalogd and hence the self-event detection logic does not apply. events = MetastoreEventsProcessor.getNextMetastoreEvents(catalog_, eventId, notificationEvent -> CreateTableEvent.CREATE_TABLE_EVENT_TYPE .equals(notificationEvent.getEventType()) && notificationEvent.getDbName().equalsIgnoreCase(db.getName()) && notificationEvent.getTableName().equalsIgnoreCase(tblName)); } {noformat} The {{getNextMetastoreEvents}} method can throw the following exception if the metastore has cleaned up older entries (by default after 24 hrs). This is controlled by the configuration {{hive.metastore.event.db.listener.timetolive}} on the metastore side. I could reproduce the problem by setting the following metastore configs.
{noformat} hive.metastore.event.db.listener.clean.interval=10s hive.metastore.event.db.listener.timetolive=120s {noformat} Now run the following Impala script {noformat} create table t1 (c1 int); create table t2 (c1 int); select sleep(24); create table t3 (c1 int); select * from t1; {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (IMPALA-10987) Changing impala.disableHmsSync in Hive can break event processing
[ https://issues.apache.org/jira/browse/IMPALA-10987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434499#comment-17434499 ] Vihang Karajgaonkar commented on IMPALA-10987: -- Possible solutions to improve this: 1. In case a table-level sync is re-enabled: a. if the table exists in Impala, we can just invalidate the table so that it is reloaded the first time a query accesses it. This would take care of any missing ADD/DROP partition events on the table during the time the event sync was disabled on the table. b. If the table doesn't exist in Impala, create an incomplete table, if there is no entry in the event delete log for this table. I am not sure how to handle a database-level sync re-enable efficiently. I wish we had a {{refresh database}}, which would have been useful here. The other approach is to invalidate any tables in the database for which sync now evaluates to enabled but previously did not. We will still need to handle the missing create/drop table events during the time window when the event sync was disabled. > Changing impala.disableHmsSync in Hive can break event processing > - > > Key: IMPALA-10987 > URL: https://issues.apache.org/jira/browse/IMPALA-10987 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Csaba Ringhofer >Priority: Major > > To reproduce, start Impala with event polling: > {code} > bin/start-impala-cluster.py --catalogd_args="--hms_event_polling_interval_s=2 > --catalog_topic_mode=minimal" --impalad_args="--use_local_catalog=1" > {code} > From Hive: > {code} > CREATE DATABASE temp; > CREATE EXTERNAL TABLE temp.t (i int) PARTITIONED BY (p int) > TBLPROPERTIES('impala.disableHmsSync'='true'); > ALTER TABLE temp.t SET TBLPROPERTIES ('impala.disableHmsSync'='false'); > {code} > From this point event sync will be broken in Impala.
It can be fixed only > with global INVALIDATE METADATA (or restarting catalogd) > catalogd log will include an exception like this: > {code} > E1026 10:30:16.151208 22514 MetastoreEventsProcessor.java:653] Event > processing needs a invalidate command to resolve the state > Java exception follows: > org.apache.impala.catalog.events.MetastoreNotificationNeedsInvalidateException: > EventId: 15956 EventType: ALTER_TABLE Detected that event sync was turned > on for the table temp.t and the table does not exist. Event processing > cannot be continued further. Issue a invalidate metadata command to reset > the event processing state > at > org.apache.impala.catalog.events.MetastoreEvents$AlterTableEvent.process(MetastoreEvents.java:992) > at > org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:345) > at > org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:747) > at > org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:645) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > {code} > and future events will lead to a log like this: > {code} > W1026 10:30:18.151962 22514 MetastoreEventsProcessor.java:638] Event > processing is skipped since status is NEEDS_INVALIDATE.
Last synced event id > is 15955 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-10987) Changing impala.disableHmsSync in Hive can break event processing
[ https://issues.apache.org/jira/browse/IMPALA-10987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434491#comment-17434491 ] Vihang Karajgaonkar commented on IMPALA-10987: -- Yes, unfortunately, we currently require a global invalidate to reset the events processor if event sync is re-enabled on a table. My original thinking behind this design decision was that 1) the events processor cannot just start processing events from that point onwards because it might have missed some create/drop events as well. This is probably more relevant to the database-level flag than the table-level one, although a table may also have had add/drop partition events that were skipped during this time window. 2) I anticipated that re-enabling event sync on a table or database would not be very common. It would likely be a one-time operation, and hence I thought it was okay to do a catalogd reset. That said, this was always on my to-do list to get rid of this requirement. I will look into ways to avoid the global invalidate when event sync is turned back on. I don't think this is a bug since it is documented behavior. See https://impala.apache.org/docs/build/html/topics/impala_metadata.html > Changing impala.disableHmsSync in Hive can break event processing > - > > Key: IMPALA-10987 > URL: https://issues.apache.org/jira/browse/IMPALA-10987 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Csaba Ringhofer >Priority: Major > > To reproduce, start Impala with event polling: > {code} > bin/start-impala-cluster.py --catalogd_args="--hms_event_polling_interval_s=2 > --catalog_topic_mode=minimal" --impalad_args="--use_local_catalog=1" > {code} > From Hive: > {code} > CREATE DATABASE temp; > CREATE EXTERNAL TABLE temp.t (i int) PARTITIONED BY (p int) > TBLPROPERTIES('impala.disableHmsSync'='true'); > ALTER TABLE temp.t SET TBLPROPERTIES ('impala.disableHmsSync'='false'); > {code} > From this point event sync will be broken in Impala.
It can be fixed only > with global INVALIDATE METADATA (or restarting catalogd) > catalogd log will include an exception like this: > {code} > E1026 10:30:16.151208 22514 MetastoreEventsProcessor.java:653] Event > processing needs a invalidate command to resolve the state > Java exception follows: > org.apache.impala.catalog.events.MetastoreNotificationNeedsInvalidateException: > EventId: 15956 EventType: ALTER_TABLE Detected that event sync was turned > on for the table temp.t and the table does not exist. Event processing > cannot be continued further. Issue a invalidate metadata command to reset > the event processing state > at > org.apache.impala.catalog.events.MetastoreEvents$AlterTableEvent.process(MetastoreEvents.java:992) > at > org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:345) > at > org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:747) > at > org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:645) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > {code} > and future events will lead to a log like this: > {code} > W1026 10:30:18.151962 22514 MetastoreEventsProcessor.java:638] Event > processing is skipped since status is NEEDS_INVALIDATE.
Last synced event id > is 15955 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-10897) TestEventProcessing.test_event_based_replication is flaky
[ https://issues.apache.org/jira/browse/IMPALA-10897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar resolved IMPALA-10897. -- Fix Version/s: Impala 4.0.1 Resolution: Fixed This test has been disabled as part of IMPALA-9857 > TestEventProcessing.test_event_based_replication is flaky > - > > Key: IMPALA-10897 > URL: https://issues.apache.org/jira/browse/IMPALA-10897 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Vihang Karajgaonkar >Priority: Critical > Fix For: Impala 4.0.1 > > > Saw this in an ASAN build: > {code:python} > metadata/test_event_processing.py:185: in test_event_based_replication > self.__run_event_based_replication_tests() > metadata/test_event_processing.py:326: in __run_event_based_replication_tests > EventProcessorUtils.wait_for_event_processing(self) > util/event_processor_utils.py:61: in wait_for_event_processing > within {1} seconds".format(current_event_id, timeout)) > E Exception: Event processor did not sync till last known event id 34722 >within 10 seconds {code} > Standard Error > {code} > SET > client_identifier=metadata/test_event_processing.py::TestEventProcessing::()::test_event_based_replication; > -- connecting to: localhost:21000 > -- connecting to localhost:21050 with impyla > -- 2021-08-28 23:43:40,300 INFO MainThread: Closing active operation > -- connecting to localhost:28000 with impyla > -- 2021-08-28 23:43:40,323 INFO MainThread: Closing active operation > -- connecting to localhost:11050 with impyla > -- 2021-08-28 23:43:48,026 INFO MainThread: Waiting until events > processor syncs to event id:31451 > -- 2021-08-28 23:43:48,759 DEBUGMainThread: Metric last-synced-event-id > has reached the desired value:31455 > -- 2021-08-28 23:43:48,790 DEBUGMainThread: Found 3 impalad/1 > statestored/1 catalogd process(es) > -- 2021-08-28 23:43:48,820 INFO MainThread: Getting metric: > catalog.curr-version from > 
impala-ec2-centos74-m5-4xlarge-ondemand-1787.vpc.cloudera.com:25000 > -- 2021-08-28 23:43:48,824 INFO MainThread: Sleeping 1s before next retry. > -- 2021-08-28 23:43:49,825 INFO MainThread: Getting metric: > catalog.curr-version from > impala-ec2-centos74-m5-4xlarge-ondemand-1787.vpc.cloudera.com:25000 > -- 2021-08-28 23:43:49,829 INFO MainThread: Sleeping 1s before next retry. > -- 2021-08-28 23:43:50,830 INFO MainThread: Getting metric: > catalog.curr-version from > impala-ec2-centos74-m5-4xlarge-ondemand-1787.vpc.cloudera.com:25000 > -- 2021-08-28 23:43:50,835 INFO MainThread: Sleeping 1s before next retry. > -- 2021-08-28 23:43:51,836 INFO MainThread: Getting metric: > catalog.curr-version from > impala-ec2-centos74-m5-4xlarge-ondemand-1787.vpc.cloudera.com:25000 > -- 2021-08-28 23:43:51,840 INFO MainThread: Sleeping 1s before next retry. > -- 2021-08-28 23:43:52,841 INFO MainThread: Getting metric: > catalog.curr-version from > impala-ec2-centos74-m5-4xlarge-ondemand-1787.vpc.cloudera.com:25000 > -- 2021-08-28 23:43:52,846 INFO MainThread: Metric 'catalog.curr-version' > has reached desired value: 2364 > -- 2021-08-28 23:43:52,846 INFO MainThread: Getting metric: > catalog.curr-version from > impala-ec2-centos74-m5-4xlarge-ondemand-1787.vpc.cloudera.com:25001 > -- 2021-08-28 23:43:52,851 INFO MainThread: Metric 'catalog.curr-version' > has reached desired value: 2364 > -- 2021-08-28 23:43:52,851 INFO MainThread: Getting metric: > catalog.curr-version from > impala-ec2-centos74-m5-4xlarge-ondemand-1787.vpc.cloudera.com:25002 > -- 2021-08-28 23:43:52,855 INFO MainThread: Metric 'catalog.curr-version' > has reached desired value: 2364 > -- executing against localhost:21000 > create table repl_source_ugchr.unpart_tbl (a string, b string) stored as > parquet tblproperties > ('transactional'='true','transactional_properties'='insert_only'); > -- 2021-08-28 23:43:52,878 INFO MainThread: Started query > 394339b6db812c59:a5e5039a > -- executing against localhost:21000 
> create table repl_source_ugchr.part_tbl (id int, bool_col boolean, > tinyint_col tinyint, smallint_col smallint, int_col int, bigint_col bigint, > float_col float, double_col double, date_string string, string_col string, > timestamp_col timestamp) partitioned by (year int, month int) stored as > parquet tblproperties > ('transactional'='true','transactional_properties'='insert_only'); > -- 2021-08-28 23:43:52,900 INFO MainThread: Started query > b74f5e32e4c1790a:46410750 > -- executing against localhost:21000 > insert into repl_source_ugchr.unpart_tbl select * from functional.tinytable; > -- 2021-08-28 23:43:56,132 INFO MainThread: Started query >
> create table repl_source_ugchr.part_tbl (id int, bool_col boolean, > tinyint_col tinyint, smallint_col smallint, int_col int, bigint_col bigint, > float_col float, double_col double, date_string string, string_col string, > timestamp_col timestamp) partitioned by (year int, month int) stored as > parquet tblproperties > ('transactional'='true','transactional_properties'='insert_only'); > -- 2021-08-28 23:43:52,900 INFO MainThread: Started query > b74f5e32e4c1790a:46410750 > -- executing against localhost:21000 > insert into repl_source_ugchr.unpart_tbl select * from functional.tinytable; > -- 2021-08-28 23:43:56,132 INFO MainThread: Started query >
[jira] [Commented] (IMPALA-9857) Batch ALTER_PARTITION events
[ https://issues.apache.org/jira/browse/IMPALA-9857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17424042#comment-17424042 ] Vihang Karajgaonkar commented on IMPALA-9857: - IMPALA-10949 has been created as a follow-up which can improve the batching logic significantly. > Batch ALTER_PARTITION events > > > Key: IMPALA-9857 > URL: https://issues.apache.org/jira/browse/IMPALA-9857 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Major > > When Hive inserts data into partitioned tables, it generates a lot of > ALTER_PARTITION (and possibly INSERT_EVENT) events in quick succession. Currently, > such events are processed one by one by EventsProcessor, which can be slow > and can cause EventsProcessor to lag behind. This JIRA proposes to use > batching for such ALTER_PARTITION events such that all the successive > ALTER_PARTITION events for the same table are batched together into one > ALTER_PARTITIONS event and then processed together to refresh all the > partitions from the events. This can significantly speed up event > processing in such cases. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
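The batching described in the ticket can be sketched roughly as follows. This is an illustrative Python model, not Impala's actual EventsProcessor (which is Java); the event tuple shape and function name are assumptions made for the sketch:

```python
def batch_alter_partition_events(events):
    """Coalesce consecutive ALTER_PARTITION events on the same table into one
    batch so that all affected partitions can be refreshed together, instead
    of refreshing the table once per event.

    Each event is assumed to be a (event_type, table_name, partition) tuple.
    """
    batches = []
    for etype, table, part in events:
        if (etype == "ALTER_PARTITION" and batches
                and batches[-1]["type"] == "ALTER_PARTITION"
                and batches[-1]["table"] == table):
            # Same type and table as the running batch: extend it.
            batches[-1]["partitions"].append(part)
        else:
            # Different type or table: close the running batch, start a new one.
            batches.append({"type": etype, "table": table,
                            "partitions": [part] if part else []})
    return batches
```

A run of N consecutive ALTER_PARTITION events on one table thus collapses into a single refresh of N partitions, which is the speed-up the ticket describes.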
[jira] [Created] (IMPALA-10949) Improve batching logic of events
Vihang Karajgaonkar created IMPALA-10949: Summary: Improve batching logic of events Key: IMPALA-10949 URL: https://issues.apache.org/jira/browse/IMPALA-10949 Project: IMPALA Issue Type: Improvement Reporter: Vihang Karajgaonkar This is a follow-up based on the review comment https://gerrit.cloudera.org/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1641 The current approach batches together the events from a single operation so that the self-event check is done per batch. However, there is considerable scope for improving the batching logic by clubbing together events across the various sources of events on a table once IMPALA-10926 is merged. After IMPALA-10926, each table will track the last_synced_event and the events processor can simply ignore any event whose id is <= the last_synced_event. This simplification of the self-events logic will enable easier batching for all the events of a type on a table. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
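The simplification the ticket proposes can be modeled like this; an illustrative Python sketch with assumed names (the real implementation is Java). Once each table tracks a last_synced_event id, the self-event check reduces to a single comparison, so pending events can be grouped per table regardless of which operation produced them:

```python
from collections import defaultdict

def batch_events_per_table(events, last_synced):
    """Group pending events by table, dropping any event already covered by
    that table's last_synced_event id (e.g. a self-generated event).

    events: iterable of (event_id, table_name, payload)
    last_synced: dict mapping table_name -> last synced event id
    """
    batches = defaultdict(list)
    for event_id, table, payload in events:
        if event_id <= last_synced.get(table, -1):
            continue  # already synced: the self-event check is just this comparison
        batches[table].append((event_id, payload))
    return dict(batches)
```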
[jira] [Assigned] (IMPALA-10236) Queries stuck if catalog topic update compression fails
[ https://issues.apache.org/jira/browse/IMPALA-10236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar reassigned IMPALA-10236: Assignee: Vihang Karajgaonkar > Queries stuck if catalog topic update compression fails > --- > > Key: IMPALA-10236 > URL: https://issues.apache.org/jira/browse/IMPALA-10236 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Affects Versions: Impala 2.12.0 >Reporter: Shant Hovsepian >Assignee: Vihang Karajgaonkar >Priority: Critical > Labels: hang, supportability > > If a Catalog Object to be compressed doesn't fit into a 2GB buffer, an error > is thrown. > > {code:java} > /// Compresses a serialized catalog object using LZ4 and stores it back in > 'dst'. Stores > /// the size of the uncompressed catalog object in the first sizeof(uint32_t) > bytes of > /// 'dst'. The compression fails if the uncompressed data size exceeds > 0x7E00 bytes. > Status CompressCatalogObject(const uint8_t* src, uint32_t size, std::string* > dst) > WARN_UNUSED_RESULT; > {code} > > CatalogServer::AddPendingTopicItem() calls CompressCatalogObject() > > {code:java} > // Add a catalog update to pending_topic_updates_. 
> extern "C" > JNIEXPORT jboolean JNICALL > Java_org_apache_impala_service_FeSupport_NativeAddPendingTopicItem(JNIEnv* > env, > jclass caller_class, jlong native_catalog_server_ptr, jstring key, jlong > version, > jbyteArray serialized_object, jboolean deleted) { > std::string key_string; > { > JniUtfCharGuard key_str; > if (!JniUtfCharGuard::create(env, key, &key_str).ok()) { > return static_cast<jboolean>(false); > } > key_string.assign(key_str.get()); > } > JniScopedArrayCritical obj_buf; > if (!JniScopedArrayCritical::Create(env, serialized_object, &obj_buf)) { > return static_cast<jboolean>(false); > } > reinterpret_cast<CatalogServer*>(native_catalog_server_ptr)-> > AddPendingTopicItem(std::move(key_string), version, obj_buf.get(), > static_cast<uint32_t>(obj_buf.size()), deleted); > return static_cast<jboolean>(true); > } > {code} > However the JNI call to AddPendingTopicItem discards the return value. > The return value was recently changed to carry the actual size as part of IMPALA-10076: > {code:java} > -if (!FeSupport.NativeAddPendingTopicItem(nativeCatalogServerPtr, > v1Key, > -obj.catalog_version, data, delete)) { > +int actualSize = > FeSupport.NativeAddPendingTopicItem(nativeCatalogServerPtr, > +v1Key, obj.catalog_version, data, delete); > +if (actualSize < 0) { >LOG.error("NativeAddPendingTopicItem failed in BE. key=" + v1Key + > ", delete=" >+ delete + ", data_size=" + data.length); > +} else if (summary != null && obj.type == HDFS_PARTITION) { > + summary.update(true, delete, obj.hdfs_partition.partition_name, > + obj.catalog_version, data.length, actualSize); > } >} > {code} > CatalogServiceCatalog::addCatalogObject() now produces an error message but > the Catalog update doesn't go through. 
> {code:java} > if (topicMode_ == TopicMode.FULL || topicMode_ == TopicMode.MIXED) { > String v1Key = CatalogServiceConstants.CATALOG_TOPIC_V1_PREFIX + key; > byte[] data = serializer.serialize(obj); > int actualSize = > FeSupport.NativeAddPendingTopicItem(nativeCatalogServerPtr, > v1Key, obj.catalog_version, data, delete); > if (actualSize < 0) { > LOG.error("NativeAddPendingTopicItem failed in BE. key=" + v1Key + > ", delete=" > + delete + ", data_size=" + data.length); > } else if (summary != null && obj.type == HDFS_PARTITION) { > summary.update(true, delete, obj.hdfs_partition.partition_name, > obj.catalog_version, data.length, actualSize); > } > } > {code} > Not sure what the right behavior would be, we could handle the compression > issue and try more aggressive compression, or unblock the catalog update. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
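The failure mode described above can be illustrated with a small Python model. This is a sketch only: zlib stands in for LZ4, the size limit is a made-up constant (not Impala's real one), and the function names mirror but do not reproduce the C++/Java code:

```python
import struct
import zlib  # stand-in for LZ4, which the real CompressCatalogObject uses
from typing import Optional

MAX_UNCOMPRESSED_SIZE = 64 * 1024 * 1024  # illustrative cap, not Impala's actual limit

def compress_catalog_object(src: bytes) -> Optional[bytes]:
    """Mimic CompressCatalogObject: prepend the uncompressed size, and fail
    (here: return None instead of an error Status) if the object is too big."""
    if len(src) > MAX_UNCOMPRESSED_SIZE:
        return None
    return struct.pack("<I", len(src)) + zlib.compress(src)

def add_pending_topic_item(payload: bytes) -> int:
    """Mimic the post-IMPALA-10076 contract: return the actual compressed
    size, or -1 on failure, so the caller can log the failed key instead of
    silently dropping the topic update (the bug this JIRA describes)."""
    compressed = compress_catalog_object(payload)
    if compressed is None:
        return -1
    return len(compressed)
```

The key point is the return-value check: before the fix, a too-large object made compression fail, the boolean result was discarded, and the catalog update never reached the statestore, leaving queries stuck.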
[jira] [Commented] (IMPALA-10925) Improved self event detection for event processor in catalogd
[ https://issues.apache.org/jira/browse/IMPALA-10925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17417830#comment-17417830 ] Vihang Karajgaonkar commented on IMPALA-10925: -- I think the problem of consecutive create and drop events is not present anymore because we keep a createEventId. The redesign generalizes the existing approach to keep a lastSyncedEventId instead of createEventId so that we can use a similar mechanism for ALTER events. > Improved self event detection for event processor in catalogd > -- > > Key: IMPALA-10925 > URL: https://issues.apache.org/jira/browse/IMPALA-10925 > Project: IMPALA > Issue Type: Epic > Components: Catalog >Reporter: Sourabh Goyal >Assignee: Sourabh Goyal >Priority: Major > > h3. Problem Statement > Impala catalogd has an events processor which polls metastore events at regular > intervals to automatically apply changes to the metadata in the catalogd. > However, the current design to detect self-generated events (DDLs/DMLs > coming from the same catalogd) has consistency problems which can cause > query failures under certain circumstances. > > h3. Current Design > The current design of self-event detection is based on adding markers to the > HMS objects which are detected when the event is received later to determine > if the event is self-generated or not. These markers constitute a serviceID > which is unique to the catalogd instance and a catalog version number which > is unique for each catalog object. When a DDL is executed, catalogd adds > these as object parameters. When the event is received, the events processor > checks the serviceID and the catalog version against the current object with > the same name in the catalogd cache and decides whether to ignore > the event or not. > > h3. Problems with the current design > The approach is problematic under some circumstances where there are > conflicting DDLs repeated in quick succession. 
For example, a sequence of > create/drop table DDLs will generate CREATE_TABLE and DROP_TABLE events. When > the events are received, it is possible that the CREATE_TABLE event is > processed even though it was self-generated, because the catalogd no longer has the table in its cache. > h3. Proposed Solution > The main idea of the solution is to keep track, in the Table object, of the > last event id (eventId) that the catalogd has synced to for a given table. > The events processor ignores any event whose EVENT_ID is less than or equal > to the eventId stored in the table. Once the events processor successfully > processes a given event, it updates the value of eventId in the table before > releasing the table lock. Also, any DDL or refresh operation on the catalogd > will follow the steps given below to update the event id for the table. The > solution relies on the existing locking mechanism in the catalogd to prevent > any other concurrent updates to the table (even via EventsProcessor). > > In case of database objects, we will also have a similar eventId which > represents the events on the database object (CREATE, DROP, ALTER database) > and to which the catalogd has synced. Since there is no refresh database > command, catalogOpExecutor will only update the database eventId when there > are DDLs at the database level (e.g. CREATE, DROP, ALTER database) > > cc - [~vihangk1] [~kishendas] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
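The proposed per-table eventId mechanism can be sketched as follows. This is an illustrative Python model of the design in the epic, not Impala's actual Java code; the class and function names are assumed:

```python
class Table:
    """Minimal stand-in for the catalogd Table object described in the epic."""
    def __init__(self, name):
        self.name = name
        self.last_synced_event_id = -1  # last HMS event id this table has synced to

def process_event(table, event_id, apply_fn):
    """Apply a metastore event to a table unless it was already synced.

    Any event at or below last_synced_event_id is skipped -- this is how a
    self-generated event (from a DDL this catalogd executed, which already
    advanced the id) gets ignored without serviceID/version markers.
    Returns True if the event was applied, False if it was skipped.
    """
    if event_id <= table.last_synced_event_id:
        return False  # self-event or already applied: ignore
    apply_fn()
    # Per the design, the id is updated before the table lock is released, so
    # no concurrent update can slip in between applying and recording the event.
    table.last_synced_event_id = event_id
    return True
```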
[jira] [Created] (IMPALA-10924) TestIcebergTable.test_partitioned_insert fails with IOException
Vihang Karajgaonkar created IMPALA-10924: Summary: TestIcebergTable.test_partitioned_insert fails with IOException Key: IMPALA-10924 URL: https://issues.apache.org/jira/browse/IMPALA-10924 Project: IMPALA Issue Type: Bug Reporter: Vihang Karajgaonkar Assignee: Zoltán Borók-Nagy The test query_test.test_iceberg.TestIcebergTable.test_partitioned_insert fails intermittently with an IOException and the stack trace below. {noformat} query_test/test_iceberg.py:80: in test_partitioned_insert use_db=unique_database) common/impala_test_suite.py:682: in run_test_case result = exec_fn(query, user=test_section.get('USER', '').strip() or None) common/impala_test_suite.py:620: in __exec_in_impala result = self.__execute_query(target_impalad_client, query, user=user) common/impala_test_suite.py:940: in __execute_query return impalad_client.execute(query, user=user) common/impala_connection.py:212: in execute return self.__beeswax_client.execute(sql_stmt, user=user) beeswax/impala_beeswax.py:189: in execute handle = self.__execute_query(query_string.strip(), user=user) beeswax/impala_beeswax.py:367: in __execute_query self.wait_for_finished(handle) beeswax/impala_beeswax.py:388: in wait_for_finished raise ImpalaBeeswaxException("Query aborted:" + error_log, None) E ImpalaBeeswaxException: ImpalaBeeswaxException: E Query aborted:RuntimeIOException: Failed to write json to file: hdfs://localhost:20500/test-warehouse/test_partitioned_insert_af8be2c3.db/ice_only_part/metadata/2-b8d13a74-4839-4dd3-b74a-6df9436774a2.metadata.json E CAUSED BY: IOException: The stream is closed {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-10922) test_orc_stats failing on exhaustive builds
[ https://issues.apache.org/jira/browse/IMPALA-10922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar updated IMPALA-10922: - Issue Type: Bug (was: Test) > test_orc_stats failing on exhaustive builds > --- > > Key: IMPALA-10922 > URL: https://issues.apache.org/jira/browse/IMPALA-10922 > Project: IMPALA > Issue Type: Bug >Reporter: Vihang Karajgaonkar >Assignee: Norbert Luksa >Priority: Blocker > Labels: broken-build > > test_orc_stats.py is failing on certain exhaustive builds. The stack trace of > the failure looks like below. > {noformat} > query_test/test_orc_stats.py:40: in test_orc_stats > self.run_test_case('QueryTest/orc-stats', vector, use_db=unique_database) > common/impala_test_suite.py:779: in run_test_case > update_section=pytest.config.option.update_results) > common/test_result_verifier.py:653: in verify_runtime_profile > % (function, field, expected_value, actual_value, op, actual)) > E AssertionError: Aggregation of SUM over RowsRead did not match expected > results. > E EXPECTED VALUE: > E 0 > E > E > E ACTUAL VALUE: > E 10 > E > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-10922) test_orc_stats failing on exhaustive builds
[ https://issues.apache.org/jira/browse/IMPALA-10922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17417770#comment-17417770 ] Vihang Karajgaonkar commented on IMPALA-10922: -- Assigning this to you [~norbertluksa] since it may be related to your recent commit IMPALA-6505. > test_orc_stats failing on exhaustive builds > --- > > Key: IMPALA-10922 > URL: https://issues.apache.org/jira/browse/IMPALA-10922 > Project: IMPALA > Issue Type: Test >Reporter: Vihang Karajgaonkar >Assignee: Norbert Luksa >Priority: Blocker > Labels: broken-build > > test_orc_stats.py is failing on certain exhaustive builds. The stack trace of > the failure looks like below. > {noformat} > query_test/test_orc_stats.py:40: in test_orc_stats > self.run_test_case('QueryTest/orc-stats', vector, use_db=unique_database) > common/impala_test_suite.py:779: in run_test_case > update_section=pytest.config.option.update_results) > common/test_result_verifier.py:653: in verify_runtime_profile > % (function, field, expected_value, actual_value, op, actual)) > E AssertionError: Aggregation of SUM over RowsRead did not match expected > results. > E EXPECTED VALUE: > E 0 > E > E > E ACTUAL VALUE: > E 10 > E > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-10922) test_orc_stats failing on exhaustive builds
Vihang Karajgaonkar created IMPALA-10922: Summary: test_orc_stats failing on exhaustive builds Key: IMPALA-10922 URL: https://issues.apache.org/jira/browse/IMPALA-10922 Project: IMPALA Issue Type: Test Reporter: Vihang Karajgaonkar Assignee: Norbert Luksa test_orc_stats.py is failing on certain exhaustive builds. The stack trace of the failure looks like below. {noformat} query_test/test_orc_stats.py:40: in test_orc_stats self.run_test_case('QueryTest/orc-stats', vector, use_db=unique_database) common/impala_test_suite.py:779: in run_test_case update_section=pytest.config.option.update_results) common/test_result_verifier.py:653: in verify_runtime_profile % (function, field, expected_value, actual_value, op, actual)) E AssertionError: Aggregation of SUM over RowsRead did not match expected results. E EXPECTED VALUE: E 0 E E E ACTUAL VALUE: E 10 E {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-10888) getPartitionsByNames should return partitions sorted by name
[ https://issues.apache.org/jira/browse/IMPALA-10888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar resolved IMPALA-10888. -- Fix Version/s: Impala 4.1.0 Resolution: Fixed > getPartitionsByNames should return partitions sorted by name > > > Key: IMPALA-10888 > URL: https://issues.apache.org/jira/browse/IMPALA-10888 > Project: IMPALA > Issue Type: Sub-task >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Major > Fix For: Impala 4.1.0 > > > The CatalogMetastoreServer's implementation of {{getPartitionByNames}} does > not return partitions ordered by partition name, whereas HMS orders > them by partition name. While this is not documented behavior and > clients should not assume it, this can cause test flakiness where we expect > the order of the partitions to be consistent. We should change the > implementation so that the partitions returned over this API are sorted by > partition name. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IMPALA-10907) Refactor MetastoreEvents class
Vihang Karajgaonkar created IMPALA-10907: Summary: Refactor MetastoreEvents class Key: IMPALA-10907 URL: https://issues.apache.org/jira/browse/IMPALA-10907 Project: IMPALA Issue Type: Improvement Reporter: Vihang Karajgaonkar Assignee: Vihang Karajgaonkar MetastoreEvents.java is a single class with a bunch of inner classes (most of which are public). The file has become quite large, and it would make sense to refactor it into separate classes to improve code readability. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-10776) Hold write lock for less time in ALTER TABLE RECOVER PARTITIONS
[ https://issues.apache.org/jira/browse/IMPALA-10776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar reassigned IMPALA-10776: Assignee: Vihang Karajgaonkar > Hold write lock for less time in ALTER TABLE RECOVER PARTITIONS > --- > > Key: IMPALA-10776 > URL: https://issues.apache.org/jira/browse/IMPALA-10776 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Reporter: Csaba Ringhofer >Assignee: Vihang Karajgaonkar >Priority: Major > > ALTER TABLE RECOVER PARTITIONS holds a write lock on the table for the whole > time while it lists the HDFS directories and creates the new partitions in > HMS. This can potentially take a long time and block catalog updates. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-10776) Hold write lock for less time in ALTER TABLE RECOVER PARTITIONS
[ https://issues.apache.org/jira/browse/IMPALA-10776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411430#comment-17411430 ] Vihang Karajgaonkar commented on IMPALA-10776: -- I can take a stab at this. > Hold write lock for less time in ALTER TABLE RECOVER PARTITIONS > --- > > Key: IMPALA-10776 > URL: https://issues.apache.org/jira/browse/IMPALA-10776 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Reporter: Csaba Ringhofer >Priority: Major > > ALTER TABLE RECOVER PARTITIONS holds a write lock on the table for the whole > time while it lists the HDFS directories and creates the new partitions in > HMS. This can potentially take a long time and block catalog updates. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Work started] (IMPALA-10888) getPartitionsByNames should return partitions sorted by name
[ https://issues.apache.org/jira/browse/IMPALA-10888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-10888 started by Vihang Karajgaonkar. > getPartitionsByNames should return partitions sorted by name > > > Key: IMPALA-10888 > URL: https://issues.apache.org/jira/browse/IMPALA-10888 > Project: IMPALA > Issue Type: Sub-task >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Major > > The CatalogMetastoreServer's implementation of {{getPartitionByNames}} does > not return partitions ordered by partition name, whereas HMS orders > them by partition name. While this is not documented behavior and > clients should not assume it, this can cause test flakiness where we expect > the order of the partitions to be consistent. We should change the > implementation so that the partitions returned over this API are sorted by > partition name. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-10888) getPartitionsByNames should return partitions sorted by name
Vihang Karajgaonkar created IMPALA-10888: Summary: getPartitionsByNames should return partitions sorted by name Key: IMPALA-10888 URL: https://issues.apache.org/jira/browse/IMPALA-10888 Project: IMPALA Issue Type: Sub-task Reporter: Vihang Karajgaonkar Assignee: Vihang Karajgaonkar The CatalogMetastoreServer's implementation of {{getPartitionByNames}} does not return partitions ordered by partition name, whereas HMS orders them by partition name. While this is not documented behavior and clients should not assume it, this can cause test flakiness where we expect the order of the partitions to be consistent. We should change the implementation so that the partitions returned over this API are sorted by partition name. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-10885) TestMetastoreService.test_get_table_req_without_fallback fails in a S3 build
[ https://issues.apache.org/jira/browse/IMPALA-10885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17405368#comment-17405368 ] Vihang Karajgaonkar commented on IMPALA-10885: -- Thanks [~stigahuang] I will take a look. > TestMetastoreService.test_get_table_req_without_fallback fails in a S3 build > > > Key: IMPALA-10885 > URL: https://issues.apache.org/jira/browse/IMPALA-10885 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Vihang Karajgaonkar >Priority: Critical > Labels: broken-build > > custom_cluster.test_metastore_service.TestMetastoreService.test_get_table_req_without_fallback > {code:java} > custom_cluster/test_metastore_service.py:269: in > test_get_table_req_without_fallback > get_table_request, expected_exception_str) > custom_cluster/test_metastore_service.py:1215: in > __call_get_table_req_expect_exception > assert expected_exception_str in str(e) > E assert 'Database test_get_table_req_without_fallback_dbgiioi not found' > in "NoSuchObjectException(_message='Table > test_get_table_req_without_fallback_dbgiioi.test_get_table_req_tblglidw not > found')" > E+ where "NoSuchObjectException(_message='Table > test_get_table_req_without_fallback_dbgiioi.test_get_table_req_tblglidw not > found')" = str(NoSuchObjectException(_message='Table > test_get_table_req_without_fallback_dbgiioi.test_get_table_req_tblglidw not > found')){code} > The commit of the build is 237ed5e8738ec565bc8d3ce813d9b70c12ad4ce7. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-9057) TestEventProcessing.test_insert_events_transactional is flaky
[ https://issues.apache.org/jira/browse/IMPALA-9057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17403332#comment-17403332 ] Vihang Karajgaonkar commented on IMPALA-9057: - Just to give the latest update here, I found that HMS has a new API, introduced in https://issues.apache.org/jira/browse/HIVE-25137, which gives clients the ability to fetch the WriteId information given the commit transaction id. Using this API we can enhance the MetastoreEventsProcessor to fetch ACID_WRITE events from HMS and refresh the ACID tables when a commit transaction event is received. This should fix the race condition described above. I will see if we can bump up the GBN which includes the new HMS API. > TestEventProcessing.test_insert_events_transactional is flaky > - > > Key: IMPALA-9057 > URL: https://issues.apache.org/jira/browse/IMPALA-9057 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Affects Versions: Impala 3.4.0 >Reporter: Alice Fan >Assignee: Vihang Karajgaonkar >Priority: Blocker > Labels: build-failure, flaky > > Assertion failure for > custom_cluster.test_event_processing.TestEventProcessing.test_insert_events_transactional > > {code:java} > Error Message > assert ['101', 'x', ..., '3', '2019'] == ['101', 'z', '28', '3', '2019'] At > index 1 diff: 'x' != 'z' Full diff: - ['101', 'x', '28', '3', '2019'] ? > ^ + ['101', 'z', '28', '3', '2019'] ? ^ > Stacktrace > custom_cluster/test_event_processing.py:49: in > test_insert_events_transactional > self.run_test_insert_events(is_transactional=True) > custom_cluster/test_event_processing.py:131: in run_test_insert_events > assert data.split('\t') == ['101', 'z', '28', '3', '2019'] > E assert ['101', 'x', ..., '3', '2019'] == ['101', 'z', '28', '3', '2019'] > E At index 1 diff: 'x' != 'z' > E Full diff: > E - ['101', 'x', '28', '3', '2019'] > E ? ^ > E + ['101', 'z', '28', '3', '2019'] > E ? ^ > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (IMPALA-7954) Support automatic invalidates using metastore notification events
[ https://issues.apache.org/jira/browse/IMPALA-7954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar resolved IMPALA-7954. - Fix Version/s: Impala 4.1.0 Resolution: Fixed > Support automatic invalidates using metastore notification events > - > > Key: IMPALA-7954 > URL: https://issues.apache.org/jira/browse/IMPALA-7954 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Affects Versions: Impala 3.1.0 >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Major > Fix For: Impala 4.1.0 > > Attachments: Automatic_invalidate_DesignDoc_v1.pdf, > Impala_Catalogd_Auto_Metadata_Update_v2.pdf > > > Currently, in Impala there are multiple ways to invalidate or refresh the > metadata stored in the Catalog for tables. Objects in the Catalog can be invalidated > either via a usage-based approach (invalidate_tables_timeout_s) or when there is > GC pressure (invalidate_tables_on_memory_pressure), as added in IMPALA-7448. > However, most users issue invalidate commands when they want to sync to the > latest information from HDFS or HMS. Unfortunately, when data is modified or > new data is added outside Impala (e.g. by Hive or a different Impala cluster), > users don't have a clear idea of whether they have to issue an invalidate or > not. To be on the safe side, users issue invalidate commands more > than necessary, and this causes performance as well as stability issues. > Hive Metastore provides a simple API to get incremental updates to the > metadata stored in its database. Each API which does an > add/alter/drop operation in the metastore generates event(s) which can be fetched > using the {{get_next_notification}} API. Each event has a unique and increasing > event_id. The current notification event id can be fetched using the > {{get_current_notificationEventId}} API. > This JIRA proposes to make use of such events from the metastore to proactively > invalidate or refresh information in the catalogD. When configured, > CatalogD could poll for such events and take action (like add/drop/refresh > partition, add/drop/invalidate tables and databases) based on the events. > This way we can automatically refresh the catalogD state using events, and it > would greatly help the use cases where users want to see the latest > information (within a configurable time delay) without flooding > the system with invalidate requests. > I will attach a design doc to this JIRA and create subtasks for the > work. Feel free to comment on the JIRA or make suggestions to improve > the design. -- This message was sent by Atlassian Jira (v8.3.4#803005)
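The polling model described above can be sketched as a loop that remembers the last synced event id and applies newer events in order. This is an illustrative Python sketch only, not the catalogd implementation (which is Java); `fetch_batch` and `apply_event` are hypothetical stand-ins for the HMS `get_next_notification` call and the catalogd update logic:

```python
def poll_once(last_synced_id, fetch_batch, apply_event):
    """One polling cycle: fetch events with id > last_synced_id, apply each in
    order, and return the id of the last event applied."""
    for event in fetch_batch(last_synced_id):
        apply_event(event)
        last_synced_id = event["id"]
    return last_synced_id

# Toy run: two pending events are applied in increasing id order.
log = []
pending = [{"id": 11, "type": "CREATE_TABLE"}, {"id": 12, "type": "DROP_TABLE"}]
new_id = poll_once(10,
                   lambda after: [e for e in pending if e["id"] > after],
                   lambda e: log.append(e["type"]))
print(new_id, log)  # -> 12 ['CREATE_TABLE', 'DROP_TABLE']
```

Persisting `last_synced_id` between cycles is what makes the updates incremental: each poll only fetches events generated since the previous poll.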
[jira] [Updated] (IMPALA-10273) Support function events
[ https://issues.apache.org/jira/browse/IMPALA-10273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar updated IMPALA-10273: - Parent: (was: IMPALA-7954) Issue Type: Improvement (was: Sub-task) > Support function events > --- > > Key: IMPALA-10273 > URL: https://issues.apache.org/jira/browse/IMPALA-10273 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Reporter: Vihang Karajgaonkar >Priority: Major > > HMS creates ADD_FUNCTION, ALTER_FUNCTION and DROP_FUNCTION events when a > function is created/dropped/altered. We can use these events to refresh > the functions in catalogd via the events processor. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (IMPALA-9857) Batch ALTER_PARTITION events
[ https://issues.apache.org/jira/browse/IMPALA-9857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar updated IMPALA-9857: Parent: (was: IMPALA-7954) Issue Type: Improvement (was: Sub-task) > Batch ALTER_PARTITION events > > > Key: IMPALA-9857 > URL: https://issues.apache.org/jira/browse/IMPALA-9857 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Major > > When Hive inserts data into partitioned tables, it generates a lot of > ALTER_PARTITION (and possibly INSERT_EVENT) events in quick succession. Currently, > such events are processed one by one by the EventsProcessor, which can be slow > and can cause the EventsProcessor to lag behind. This JIRA proposes to use > batching for such ALTER_PARTITION events, such that all successive > ALTER_PARTITION events for the same table are batched together into one > ALTER_PARTITIONS event and then processed together to refresh all the > partitions from the events. This can significantly speed up event > processing in such cases. -- This message was sent by Atlassian Jira (v8.3.4#803005)
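The batching idea is simple: while scanning the fetched events in order, fold each ALTER_PARTITION event into the previous batch when it targets the same table, and start a new batch otherwise. A small Python sketch under that assumption (the tuple-based event representation is hypothetical; the real batching lives in the Java MetastoreEventsProcessor):

```python
def batch_alter_partition_events(events):
    """Collapse consecutive ALTER_PARTITION events on the same table into one
    batch so all affected partitions can be refreshed in a single operation.

    Each event is a (event_type, table_name, partition_name) tuple; the result
    is a list of (event_type, table_name, [partition_name, ...]) batches.
    """
    batches = []
    for ev_type, table, part in events:
        if (batches
                and ev_type == "ALTER_PARTITION"
                and batches[-1][0] == "ALTER_PARTITION"
                and batches[-1][1] == table):
            batches[-1][2].append(part)   # extend the current batch
        else:
            batches.append((ev_type, table, [part]))
    return batches

events = [("ALTER_PARTITION", "t1", "p=1"),
          ("ALTER_PARTITION", "t1", "p=2"),
          ("ALTER_PARTITION", "t2", "p=1")]
print(batch_alter_partition_events(events))
# -> [('ALTER_PARTITION', 't1', ['p=1', 'p=2']), ('ALTER_PARTITION', 't2', ['p=1'])]
```

Only *successive* events are merged, which preserves the overall ordering guarantees: an intervening event on another table starts a fresh batch rather than reordering the stream.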
[jira] [Updated] (IMPALA-8592) Add support for insert events for 'LOAD DATA..' statements from Impala.
[ https://issues.apache.org/jira/browse/IMPALA-8592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar updated IMPALA-8592: Parent: (was: IMPALA-7954) Issue Type: Improvement (was: Sub-task) > Add support for insert events for 'LOAD DATA..' statements from Impala. > --- > > Key: IMPALA-8592 > URL: https://issues.apache.org/jira/browse/IMPALA-8592 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Reporter: Anurag Mantripragada >Priority: Major > > Hive generates INSERT events for LOAD DATA.. statements. We should support > the same in Impala. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-8795) Enable event polling by default in tests
[ https://issues.apache.org/jira/browse/IMPALA-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar resolved IMPALA-8795. - Fix Version/s: Impala 4.1.0 Resolution: Fixed > Enable event polling by default in tests > > > Key: IMPALA-8795 > URL: https://issues.apache.org/jira/browse/IMPALA-8795 > Project: IMPALA > Issue Type: Sub-task > Components: Catalog >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Major > Fix For: Impala 4.1.0 > > > We should turn on event processing by default in all the tests to make sure > that there are no regressions when we turn ON the feature by default. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (IMPALA-10815) Ignore events on non-default hive catalogs
[ https://issues.apache.org/jira/browse/IMPALA-10815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar resolved IMPALA-10815. -- Fix Version/s: Impala 4.1 Resolution: Fixed > Ignore events on non-default hive catalogs > -- > > Key: IMPALA-10815 > URL: https://issues.apache.org/jira/browse/IMPALA-10815 > Project: IMPALA > Issue Type: Bug >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Minor > Fix For: Impala 4.1 > > > Hive-3 introduces a new object called a catalog, which is like a namespace for > databases and tables. Currently, Impala does not support Hive catalogs. > However, if there are events on such non-default catalogs, event > processing applies them to the catalogd if the database and table > names match. Until we support custom catalogs in Hive, we should ignore the > events coming from such non-default catalog objects. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IMPALA-10815) Ignore events on non-default hive catalogs
Vihang Karajgaonkar created IMPALA-10815: Summary: Ignore events on non-default hive catalogs Key: IMPALA-10815 URL: https://issues.apache.org/jira/browse/IMPALA-10815 Project: IMPALA Issue Type: Bug Reporter: Vihang Karajgaonkar Assignee: Vihang Karajgaonkar Hive-3 introduces a new object called a catalog, which is like a namespace for databases and tables. Currently, Impala does not support Hive catalogs. However, if there are events on such non-default catalogs, event processing applies them to the catalogd if the database and table names match. Until we support custom catalogs in Hive, we should ignore the events coming from such non-default catalog objects. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (IMPALA-10468) DROP events which are generated while a batch is being processed may add table incorrectly
[ https://issues.apache.org/jira/browse/IMPALA-10468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar resolved IMPALA-10468. -- Fix Version/s: Impala 4.1 Resolution: Duplicate > DROP events which are generated while a batch is being processed may add > table incorrectly > -- > > Key: IMPALA-10468 > URL: https://issues.apache.org/jira/browse/IMPALA-10468 > Project: IMPALA > Issue Type: Bug >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Major > Fix For: Impala 4.1 > > > One of the problems with CREATE/DROP events is that they may occur while a > batch is being processed, and hence the EventsProcessor may not be aware of them. > For example, consider the following sequence of statements: > create table foo (c1 int); > drop table foo; > create table foo (c2 int); > drop table foo; > These statements generate a CREATE_TABLE, DROP_TABLE, CREATE_TABLE, > DROP_TABLE event sequence. Generally, if all 4 of these events are fetched in one > batch, then the first and third CREATE_TABLE are ignored, because each is followed by a DROP_TABLE in the sequence, and the DROP_TABLE events > take no effect since the table doesn't exist in catalogd anymore. > However, if the events processor fetches these events in 2 batches (3 and 1), > then after the first batch of CREATE_TABLE, DROP_TABLE, CREATE_TABLE is > processed, the third event will add the table foo to catalogd. The > subsequent batch's DROP_TABLE will be processed and remove the table, but > between the two batches catalogd will say that a table called foo exists. > This can lead to statements erroring out. E.g. a statement like create > table foo (c3 int) after the above statements will error out with a > TableAlreadyExists error. > The problem happens for databases too. So far I have not been able to > reproduce this for partitions, but I don't see why it would not happen with > partitions as well.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
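The hazard described above comes purely from where the batch boundary falls. A tiny Python replay makes it concrete (illustrative only; catalogd's real event handling is Java): the same four-event stream leaves the table absent when processed as one batch, but a 3+1 split exposes a window in which the table appears to exist.

```python
def table_visible_after(batch, table_exists=False):
    """Replay CREATE_TABLE/DROP_TABLE events for one table and report whether
    the table is visible in the catalog once the whole batch is processed."""
    for event in batch:
        table_exists = (event == "CREATE_TABLE")
    return table_exists

stream = ["CREATE_TABLE", "DROP_TABLE", "CREATE_TABLE", "DROP_TABLE"]
print(table_visible_after(stream))            # -> False: one batch, net effect is drop
print(table_visible_after(stream[:3]))        # -> True: after a 3-event batch the table exists...
print(table_visible_after(stream[3:], True))  # -> False: ...until the next batch drops it
```

Between the two batches, a `create table foo` statement would see the stale "exists" state and fail, which is exactly the TableAlreadyExists error the report describes.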
[jira] [Resolved] (IMPALA-10490) truncate table fails with IllegalStateException
[ https://issues.apache.org/jira/browse/IMPALA-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar resolved IMPALA-10490. -- Fix Version/s: Impala 4.1 Resolution: Fixed > truncate table fails with IllegalStateException > --- > > Key: IMPALA-10490 > URL: https://issues.apache.org/jira/browse/IMPALA-10490 > Project: IMPALA > Issue Type: Bug >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Major > Fix For: Impala 4.1 > > > This is a problem when event processing is turned on. I can reproduce it > with the following steps: > 1. Start Impala without event processing. > 2. Create a table, load data, and compute stats on the table. > 3. Restart Impala with event processing turned on. > 4. Run a truncate table command. > The truncate table command fails with the following error: > [localhost:21050] default> truncate t5; > Query: truncate t5 > ERROR: CatalogException: Failed to truncate table: default.t5. > Table may be in a partially truncated state. > CAUSED BY: IllegalStateException: Table parameters must have catalog service > identifier before adding it to partition parameters -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (IMPALA-10502) delayed 'Invalidated objects in cache' cause 'Table already exists'
[ https://issues.apache.org/jira/browse/IMPALA-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar resolved IMPALA-10502. -- Fix Version/s: Impala 4.1 Resolution: Fixed > delayed 'Invalidated objects in cache' cause 'Table already exists' > --- > > Key: IMPALA-10502 > URL: https://issues.apache.org/jira/browse/IMPALA-10502 > Project: IMPALA > Issue Type: Bug > Components: Catalog, Clients, Frontend >Affects Versions: Impala 3.4.0 >Reporter: Adriano >Assignee: Vihang Karajgaonkar >Priority: Critical > Fix For: Impala 4.1 > > > In a fast-paced environment where the interval between steps 1 and 2 is < > 100ms (a simplified pipeline looks like): > 0- catalog 'on demand' in use and disableHmsSync (enabled or disabled: no > difference) > 1- open session to coord A -> DROP TABLE X -> close session > 2- open session to coord A -> CREATE TABLE X -> close session > Result: step 2 can fail with 'table already exists'. > During the internal investigation it was discovered that IMPALA-9913 addresses the issue in almost all scenarios. > However, considering that the investigation is still ongoing internally, it is good > to have the issue tracked here as well. > Once we are sure that IMPALA-9913 fixes these cases we can close this as a > duplicate; otherwise we will carry on the investigation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (IMPALA-10490) truncate table fails with IllegalStateException
[ https://issues.apache.org/jira/browse/IMPALA-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-10490 started by Vihang Karajgaonkar. > truncate table fails with IllegalStateException > --- > > Key: IMPALA-10490 > URL: https://issues.apache.org/jira/browse/IMPALA-10490 > Project: IMPALA > Issue Type: Bug >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Major > > This is a problem when event processing is turned on. I can reproduce it > with the following steps: > 1. Start Impala without event processing. > 2. Create a table, load data, and compute stats on the table. > 3. Restart Impala with event processing turned on. > 4. Run a truncate table command. > The truncate table command fails with the following error: > [localhost:21050] default> truncate t5; > Query: truncate t5 > ERROR: CatalogException: Failed to truncate table: default.t5. > Table may be in a partially truncated state. > CAUSED BY: IllegalStateException: Table parameters must have catalog service > identifier before adding it to partition parameters -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (IMPALA-10768) Deflake CatalogHmsFileMetadataTest
[ https://issues.apache.org/jira/browse/IMPALA-10768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar resolved IMPALA-10768. -- Fix Version/s: Impala 4.1 Resolution: Fixed > Deflake CatalogHmsFileMetadataTest > -- > > Key: IMPALA-10768 > URL: https://issues.apache.org/jira/browse/IMPALA-10768 > Project: IMPALA > Issue Type: Test >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Minor > Fix For: Impala 4.1 > > > Sometimes we see CatalogHmsFileMetadataTest#testFileMetadataForPartitions > fail with the following stack trace: > {noformat} > org.junit.ComparisonFailure: expected:<090[1]01.txt> but was:<090[2]01.txt> > at org.junit.Assert.assertEquals(Assert.java:115) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.impala.catalog.metastore.CatalogHmsFileMetadataTest.assertFdsAreSame(CatalogHmsFileMetadataTest.java:133) > at > org.apache.impala.catalog.metastore.CatalogHmsFileMetadataTest.testFileMetadataForPartitions(CatalogHmsFileMetadataTest.java:121) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at 
org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:272) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:236) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:386) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:323) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:143) > {noformat} > I was not able to reproduce the error locally, but based on code > inspection it looks like this happens because the order of the > file descriptors in the two lists is different. -- This message was sent by Atlassian Jira (v8.3.4#803005)
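Since the flakiness is attributed to nondeterministic list order rather than wrong contents, the usual deflaking fix is to sort both sides on a stable key before comparing. A hedged Python sketch of that comparison (the dict-based descriptor shape is hypothetical; the real assertion lives in the Java test):

```python
def assert_same_file_descriptors(expected, actual):
    """Compare two lists of file descriptors irrespective of their order by
    sorting both on file name before the element-wise comparison."""
    key = lambda fd: fd["name"]
    assert sorted(expected, key=key) == sorted(actual, key=key), \
        "file metadata mismatch"

# Same contents, different order: passes instead of flaking.
assert_same_file_descriptors(
    [{"name": "090101.txt"}, {"name": "090201.txt"}],
    [{"name": "090201.txt"}, {"name": "090101.txt"}])
```

Sorting on a key that is unique per element (here the file name) makes the comparison deterministic without weakening it: genuinely different lists still fail.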
[jira] [Created] (IMPALA-10768) Deflake CatalogHmsFileMetadataTest
Vihang Karajgaonkar created IMPALA-10768: Summary: Deflake CatalogHmsFileMetadataTest Key: IMPALA-10768 URL: https://issues.apache.org/jira/browse/IMPALA-10768 Project: IMPALA Issue Type: Test Reporter: Vihang Karajgaonkar Assignee: Vihang Karajgaonkar Sometimes we see CatalogHmsFileMetadataTest#testFileMetadataForPartitions fail with the following stack trace: {noformat} org.junit.ComparisonFailure: expected:<090[1]01.txt> but was:<090[2]01.txt> at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.impala.catalog.metastore.CatalogHmsFileMetadataTest.assertFdsAreSame(CatalogHmsFileMetadataTest.java:133) at org.apache.impala.catalog.metastore.CatalogHmsFileMetadataTest.testFileMetadataForPartitions(CatalogHmsFileMetadataTest.java:121) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner.run(ParentRunner.java:363) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:272) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:236) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:386) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:323) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:143) {noformat} I was not able to reproduce the error locally, but based on code inspection it looks like this happens because the order of the file descriptors in the two lists is different.
[jira] [Commented] (IMPALA-10754) test_overlap_min_max_filters_on_sorted_columns failed during GVO
[ https://issues.apache.org/jira/browse/IMPALA-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368977#comment-17368977 ] Vihang Karajgaonkar commented on IMPALA-10754: -- Hi [~sql_forever] Is this issue resolved? I hit this test failure here: https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4372/ > test_overlap_min_max_filters_on_sorted_columns failed during GVO > > > Key: IMPALA-10754 > URL: https://issues.apache.org/jira/browse/IMPALA-10754 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Zoltán Borók-Nagy >Assignee: Qifan Chen >Priority: Major > Labels: broken-build > > test_overlap_min_max_filters_on_sorted_columns failed in the following build: > https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/4338/testReport/ > *Stack trace:* > {noformat} > query_test/test_runtime_filters.py:296: in > test_overlap_min_max_filters_on_sorted_columns > test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS': str(WAIT_TIME_MS)}) > common/impala_test_suite.py:734: in run_test_case > update_section=pytest.config.option.update_results) > common/test_result_verifier.py:653: in verify_runtime_profile > % (function, field, expected_value, actual_value, op, actual)) > E AssertionError: Aggregation of SUM over NumRuntimeFilteredPages did not > match expected results. > E EXPECTED VALUE: > E 58 > E > E > E ACTUAL VALUE: > E 59 > {noformat}
[jira] [Commented] (IMPALA-10759) MetastoreServiceHandler.get_partitions_by_names_req throws NoSuchMethodError
[ https://issues.apache.org/jira/browse/IMPALA-10759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17367628#comment-17367628 ] Vihang Karajgaonkar commented on IMPALA-10759: -- I took a brief look at this yesterday and found that the issue happens when Impala is using thrift version 0.9.3 and Hive is using version 0.13.0. This happens because the {noformat}hashCode{noformat} method in the thrift-generated code for HMS objects like Partition changes when you move thrift from 0.9.3 to 0.13.0. Specifically, the hashCode values for primitive fields like long now use {noformat}TBaseHelper.hashCode{noformat} instead of the old way of adding them to an ArrayList and then comparing the hashCode. For example, in the case of the writeId field of the partition, which is defined as an i64 in the thrift file, the hashCode is computed using TBaseHelper, as seen in the diffs here: https://github.com/apache/hive/commit/1945e2f67e5b09cdda40146b87e1ba492f897196#diff-505c537842790dadd6f182b07b0b216be40e050588941213220b4ae3622bd0faR877 I don't think there is a good way to "fix" this. This should get fixed automatically when Impala uses thrift version 0.11.0. I confirmed that the build where we saw this failure did not have the Impala thrift version as 0.11.0. 
> MetastoreServiceHandler.get_partitions_by_names_req throws NoSuchMethodError > > > Key: IMPALA-10759 > URL: https://issues.apache.org/jira/browse/IMPALA-10759 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Yongzhi Chen >Assignee: Vihang Karajgaonkar >Priority: Critical > > impala-cdpd-master-core > EnableCatalogdHmsCacheFlagTest.testEnableCatalogdCachingFlag test fails with > following stack: > {noformat} > Exception in thread "pool-470-thread-1" java.lang.NoSuchMethodError: > org.apache.thrift.TBaseHelper.hashCode(J)I > at > org.apache.hadoop.hive.metastore.api.Partition.hashCode(Partition.java:971) > at java.util.HashMap.hash(HashMap.java:338) > at java.util.HashMap.put(HashMap.java:611) > at > org.apache.impala.catalog.CatalogHmsAPIHelper.loadAndSetFileMetadataFromFs(CatalogHmsAPIHelper.java:527) > at > org.apache.impala.catalog.metastore.MetastoreServiceHandler.get_partitions_by_names_req(MetastoreServiceHandler.java:1443) > at > org.apache.impala.catalog.metastore.CatalogMetastoreServiceHandler.get_partitions_by_names_req(CatalogMetastoreServiceHandler.java:141) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.impala.catalog.metastore.CatalogMetastoreServer$TimingInvocationHandler.invoke(CatalogMetastoreServer.java:223) > at com.sun.proxy.$Proxy87.get_partitions_by_names_req(Unknown Source) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_names_req.getResult(ThriftHiveMetastore.java:20087) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_names_req.getResult(ThriftHiveMetastore.java:20066) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > at 
org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {noformat}
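The incompatibility described in the comment above is easier to see with a toy example. Both hashing schemes below are illustrative assumptions — neither is copied from thrift-generated code, and the exact mixing formula in `directStyleHash` is made up — but they show why regenerating code against a newer thrift can both change hashCode results and introduce calls (such as `TBaseHelper.hashCode(long)`) that an older runtime jar does not provide, which surfaces as a NoSuchMethodError at runtime:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: two ways generated code might hash a long field.
public class HashSchemes {
    // Old style (thrift 0.9.x era): accumulate boxed field values in a
    // list and delegate to List.hashCode().
    static int listStyleHash(long writeId) {
        List<Object> fields = new ArrayList<>();
        fields.add(writeId);
        return fields.hashCode();
    }

    // New style: mix the primitive directly via a runtime helper, the way
    // generated code calls TBaseHelper.hashCode(long). The formula here is
    // an assumption for illustration, not thrift's actual implementation.
    static int directStyleHash(long writeId) {
        return (int) (writeId ^ (writeId >>> 32));
    }
}
```

Because the new-style generated code calls a helper method that simply does not exist in the 0.9.3 runtime jar, mixing jar versions on the classpath fails with `NoSuchMethodError: org.apache.thrift.TBaseHelper.hashCode(J)I` exactly as in the stack trace above.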
[jira] [Comment Edited] (IMPALA-10759) MetastoreServiceHandler.get_partitions_by_names_req throws NoSuchMethodError
[ https://issues.apache.org/jira/browse/IMPALA-10759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17367628#comment-17367628 ] Vihang Karajgaonkar edited comment on IMPALA-10759 at 6/22/21, 7:24 PM: I took a brief look at this yesterday and found that the issue happens when Impala is using thrift version 0.9.3 and Hive is using version 0.13.0. This happens because the hashCode method in the thrift-generated code for HMS objects like Partition changes when you move thrift from 0.9.3 to 0.13.0. Specifically, the hashCode values for primitive fields like long now use TBaseHelper.hashCode(long) instead of the old way of adding them to an ArrayList and then comparing the hashCode. For example, in the case of the writeId field of the partition, which is defined as an i64 in the thrift file, the hashCode is computed using TBaseHelper, as seen in the diffs here: https://github.com/apache/hive/commit/1945e2f67e5b09cdda40146b87e1ba492f897196#diff-505c537842790dadd6f182b07b0b216be40e050588941213220b4ae3622bd0faR877 I don't think there is a good way to "fix" this. This should get fixed automatically when Impala uses thrift version 0.11.0. I confirmed that the build where we saw this failure did not have the Impala thrift version as 0.11.0. was (Author: vihangk1): I took a brief look at this yesterday and I found that the issue happens when Impala is using 0.9.3 thrift version and hive is using 0.13.0 version. This happens because the {noformat}hashCode{noformat} method in the thrift generated code for HMS objects like Partition changes when you change thrift from 0.9.3 to 0.13.0. Specifically the hashCode values for the primitive fields like long now use {noformat}TBaseHelper.hashCode{noformat} instead of the old way of add it to a ArrayList and then comparing the hashCode. 
For example, in case of writeId field of the partition which is definied as a i64 in the thrift file, the hashCode is computed using TBaseHelper as seen in the diffs here https://github.com/apache/hive/commit/1945e2f67e5b09cdda40146b87e1ba492f897196#diff-505c537842790dadd6f182b07b0b216be40e050588941213220b4ae3622bd0faR877 I don't think there is a good way to "fix" this. This should get fixed automatically when Impala uses 0.11.0 thrift version. I confirmed that the build where we saw this failure did not have the Impala thrift version as 0.11.0. > MetastoreServiceHandler.get_partitions_by_names_req throws NoSuchMethodError > > > Key: IMPALA-10759 > URL: https://issues.apache.org/jira/browse/IMPALA-10759 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Yongzhi Chen >Assignee: Vihang Karajgaonkar >Priority: Critical > > impala-cdpd-master-core > EnableCatalogdHmsCacheFlagTest.testEnableCatalogdCachingFlag test fails with > following stack: > {noformat} > Exception in thread "pool-470-thread-1" java.lang.NoSuchMethodError: > org.apache.thrift.TBaseHelper.hashCode(J)I > at > org.apache.hadoop.hive.metastore.api.Partition.hashCode(Partition.java:971) > at java.util.HashMap.hash(HashMap.java:338) > at java.util.HashMap.put(HashMap.java:611) > at > org.apache.impala.catalog.CatalogHmsAPIHelper.loadAndSetFileMetadataFromFs(CatalogHmsAPIHelper.java:527) > at > org.apache.impala.catalog.metastore.MetastoreServiceHandler.get_partitions_by_names_req(MetastoreServiceHandler.java:1443) > at > org.apache.impala.catalog.metastore.CatalogMetastoreServiceHandler.get_partitions_by_names_req(CatalogMetastoreServiceHandler.java:141) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > 
org.apache.impala.catalog.metastore.CatalogMetastoreServer$TimingInvocationHandler.invoke(CatalogMetastoreServer.java:223) > at com.sun.proxy.$Proxy87.get_partitions_by_names_req(Unknown Source) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_names_req.getResult(ThriftHiveMetastore.java:20087) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_names_req.getResult(ThriftHiveMetastore.java:20066) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) >
[jira] [Commented] (IMPALA-10700) Introduce an option to skip deleting column statistics on truncate
[ https://issues.apache.org/jira/browse/IMPALA-10700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355414#comment-17355414 ] Vihang Karajgaonkar commented on IMPALA-10700: -- Hi [~shajini] Can you please help document this query option when you get some time? Thanks a lot! > Introduce an option to skip deleting column statistics on truncate > -- > > Key: IMPALA-10700 > URL: https://issues.apache.org/jira/browse/IMPALA-10700 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Major > Fix For: Impala 4.0 > > > Currently, when a user issues a {{truncate table}} command on a > non-transactional table, catalogd also deletes the table and column > statistics. However, this can affect the performance of the truncate > operation, especially at high concurrency. Based on preliminary research, it > looks like other databases do not delete statistics after a truncate operation > (e.g. Oracle, Hive). It would be good to introduce a query option which can > be set by the user to skip deleting the column statistics during truncate > table execution.
[jira] [Resolved] (IMPALA-10700) Introduce an option to skip deleting column statistics on truncate
[ https://issues.apache.org/jira/browse/IMPALA-10700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar resolved IMPALA-10700. -- Fix Version/s: Impala 4.0 Resolution: Fixed > Introduce an option to skip deleting column statistics on truncate > -- > > Key: IMPALA-10700 > URL: https://issues.apache.org/jira/browse/IMPALA-10700 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Major > Fix For: Impala 4.0 > > > Currently, when a user issues a {{truncate table}} command on a > non-transactional table, catalogd also deletes the table and column > statistics. However, this can affect the performance of the truncate > operation, especially at high concurrency. Based on preliminary research, it > looks like other databases do not delete statistics after a truncate operation > (e.g. Oracle, Hive). It would be good to introduce a query option which can > be set by the user to skip deleting the column statistics during truncate > table execution.
[jira] [Created] (IMPALA-10722) truncate operation deletes data files before deleting metadata
Vihang Karajgaonkar created IMPALA-10722: Summary: truncate operation deletes data files before deleting metadata Key: IMPALA-10722 URL: https://issues.apache.org/jira/browse/IMPALA-10722 Project: IMPALA Issue Type: Improvement Components: Catalog Reporter: Vihang Karajgaonkar In the case of a truncate operation, we delete the data files first and then the statistics. But since statistics are derived from the data, we should delete the statistics first and the data files second. See: https://github.com/apache/impala/blob/822e8373d1f1737865899b80862c2be7b07cc950/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L2297
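The reordering requested here matters for failure behaviour: if statistics are dropped first and truncate fails midway, the table is merely missing statistics (recoverable by recomputing them), whereas the current order can leave statistics describing data files that no longer exist. A hypothetical sketch of the proposed order — method and step names are illustrative, not CatalogOpExecutor's actual API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed truncate ordering. The recorded
// step names are illustrative; the real logic lives in CatalogOpExecutor.
public class TruncateOrder {
    final List<String> steps = new ArrayList<>();

    void dropStatistics()  { steps.add("drop-stats"); }
    void deleteDataFiles() { steps.add("delete-files"); }

    // Proposed: derived metadata (statistics) goes first, data second, so
    // a failure between the two never leaves stats for deleted data.
    void truncate() {
        dropStatistics();
        deleteDataFiles();
    }
}
```

The general principle is to delete derived artifacts before the source they were derived from, so every intermediate state remains self-consistent.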
[jira] [Updated] (IMPALA-10502) delayed 'Invalidated objects in cache' cause 'Table already exists'
[ https://issues.apache.org/jira/browse/IMPALA-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar updated IMPALA-10502: - Priority: Critical (was: Minor) > delayed 'Invalidated objects in cache' cause 'Table already exists' > --- > > Key: IMPALA-10502 > URL: https://issues.apache.org/jira/browse/IMPALA-10502 > Project: IMPALA > Issue Type: Bug > Components: Catalog, Clients, Frontend >Affects Versions: Impala 3.4.0 >Reporter: Adriano >Assignee: Vihang Karajgaonkar >Priority: Critical > > In a fast-paced environment where the interval between steps 1 and 2 is > < 100ms (a simplified pipeline looks like): > 0- catalog 'on demand' in use and disableHmsSync (enabled or disabled: no > difference) > 1- open session to coord A -> DROP TABLE X -> close session > 2- open session to coord A -> CREATE TABLE X -> close session > Results: step 2 can fail with 'table already exists'. > During the internal investigation it was discovered that IMPALA-9913 will > resolve the issue in almost all scenarios. > However, considering that the investigation is still ongoing internally, it is nice > to have the event tracked here as well. > Once we are sure that IMPALA-9913 fixes these events, we can close this as a > duplicate; otherwise, we can carry on the investigation.
[jira] [Resolved] (IMPALA-10645) Expose metrics for catalogd's HMS endpoint
[ https://issues.apache.org/jira/browse/IMPALA-10645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar resolved IMPALA-10645. -- Fix Version/s: Impala 4.1 Resolution: Fixed > Expose metrics for catalogd's HMS endpoint > -- > > Key: IMPALA-10645 > URL: https://issues.apache.org/jira/browse/IMPALA-10645 > Project: IMPALA > Issue Type: Sub-task >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Major > Fix For: Impala 4.1 > > > Catalogd's HMS endpoint should expose metrics to improve its supportability and > help identify performance issues.
[jira] [Assigned] (IMPALA-10645) Expose metrics for catalogd's HMS endpoint
[ https://issues.apache.org/jira/browse/IMPALA-10645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar reassigned IMPALA-10645: Assignee: Vihang Karajgaonkar > Expose metrics for catalogd's HMS endpoint > -- > > Key: IMPALA-10645 > URL: https://issues.apache.org/jira/browse/IMPALA-10645 > Project: IMPALA > Issue Type: Sub-task >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Major > > Catalogd's HMS endpoint should expose metrics to improve its supportability and > help identify performance issues.
[jira] [Created] (IMPALA-10706) Get rid of metastoreAccessLock_ in TableLoader
Vihang Karajgaonkar created IMPALA-10706: Summary: Get rid of metastoreAccessLock_ in TableLoader Key: IMPALA-10706 URL: https://issues.apache.org/jira/browse/IMPALA-10706 Project: IMPALA Issue Type: Improvement Components: Catalog Reporter: Vihang Karajgaonkar https://github.com/apache/impala/blob/9c38568657d62b6f6d7b10aa1c721ba843374dd8/fe/src/main/java/org/apache/impala/catalog/TableLoader.java#L68 has a synchronized block for metastore access which no longer seems necessary and should be removed.
[jira] [Created] (IMPALA-10700) Introduce an option to skip deleting column statistics on truncate
Vihang Karajgaonkar created IMPALA-10700: Summary: Introduce an option to skip deleting column statistics on truncate Key: IMPALA-10700 URL: https://issues.apache.org/jira/browse/IMPALA-10700 Project: IMPALA Issue Type: Improvement Reporter: Vihang Karajgaonkar Assignee: Vihang Karajgaonkar Currently, when a user issues a {{truncate table}} command on a non-transactional table, catalogd also deletes the table and column statistics. However, this can affect the performance of the truncate operation, especially at high concurrency. Based on preliminary research, it looks like other databases do not delete statistics after a truncate operation (e.g. Oracle, Hive). It would be good to introduce a query option which can be set by the user to skip deleting the column statistics during truncate table execution.
[jira] [Resolved] (IMPALA-10644) RangerAuthorizationFactory cannot be instantiated after latest GBN bump up
[ https://issues.apache.org/jira/browse/IMPALA-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar resolved IMPALA-10644. -- Fix Version/s: Impala 4.0 Resolution: Fixed > RangerAuthorizationFactory cannot be instantiated after latest GBN bump up > -- > > Key: IMPALA-10644 > URL: https://issues.apache.org/jira/browse/IMPALA-10644 > Project: IMPALA > Issue Type: Bug >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Blocker > Fix For: Impala 4.0 > > > After the GBN was bumped to 11920537 in the commit > https://github.com/apache/impala/commit/1ab1143e98ff09610dff82d1795cf103659ffe97 > some of the ranger tests are failing with the following exception trace. > {noformat} > I0407 17:40:18.681761 25041 jni-util.cc:286] > org.apache.impala.common.InternalException: Unable to instantiate > authorization provider: > org.apache.impala.authorization.ranger.RangerAuthorizationFactory > at > org.apache.impala.util.AuthorizationUtil.authzFactoryFrom(AuthorizationUtil.java:88) > at org.apache.impala.service.JniFrontend.(JniFrontend.java:143) > Caused by: java.lang.reflect.InvocationTargetException > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.impala.util.AuthorizationUtil.authzFactoryFrom(AuthorizationUtil.java:86) > ... 
1 more > Caused by: java.lang.NoClassDefFoundError: > org/apache/solr/common/SolrException > at > org.apache.ranger.audit.provider.AuditProviderFactory.getProviderFromConfig(AuditProviderFactory.java:420) > at > org.apache.ranger.audit.provider.AuditProviderFactory.init(AuditProviderFactory.java:178) > at > org.apache.ranger.plugin.service.RangerBasePlugin.init(RangerBasePlugin.java:175) > at > org.apache.impala.authorization.ranger.RangerImpalaPlugin.init(RangerImpalaPlugin.java:50) > at > org.apache.impala.authorization.ranger.RangerImpalaPlugin.getInstance(RangerImpalaPlugin.java:69) > at > org.apache.impala.authorization.ranger.RangerAuthorizationChecker.(RangerAuthorizationChecker.java:82) > at > org.apache.impala.authorization.ranger.RangerAuthorizationFactory.(RangerAuthorizationFactory.java:44) > ... 6 more > Caused by: java.lang.ClassNotFoundException: > org.apache.solr.common.SolrException > at java.net.URLClassLoader.findClass(URLClassLoader.java:382) > at java.lang.ClassLoader.loadClass(ClassLoader.java:418) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) > at java.lang.ClassLoader.loadClass(ClassLoader.java:351) > ... 13 more > {noformat} > It looks like after the GBN was upgraded we need to have the solr dependencies in > the fe/pom.xml, and they should not be reverted. The toolchain should also be > updated to exclude the solr and atlas libraries for the GBN.
[jira] [Commented] (IMPALA-9375) Remove DirectMetaProvider usage from CatalogMetaProvider
[ https://issues.apache.org/jira/browse/IMPALA-9375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17321314#comment-17321314 ] Vihang Karajgaonkar commented on IMPALA-9375: - Thanks [~robbiezhang] for your comment. Yeah, that metastore client pool is okay to be there since it is only instantiated on coordinators. Coordinators need an HMS client because they need to open a transaction when transactional tables are being inserted into. See https://github.com/apache/impala/blob/6b16df9e9a4696b46b6f9c7fe2fc0aaded285623/fe/src/main/java/org/apache/impala/service/JniFrontend.java#L144 > Remove DirectMetaProvider usage from CatalogMetaProvider > > > Key: IMPALA-9375 > URL: https://issues.apache.org/jira/browse/IMPALA-9375 > Project: IMPALA > Issue Type: Improvement > Components: Catalog >Affects Versions: Impala 3.4.0 >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Critical > > I see that CatalogMetaProvider uses {{DirectMetaProvider}} here > https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java#L239 > There are only a couple of places where it is used within > CatalogMetaProvider. We should implement those remaining APIs in catalog-v2 > mode and remove the usage of DirectMetaProvider from CatalogMetaProvider. > By default, DirectMetaProvider starts a MetastoreClientPool (with 10 > connections). This is unnecessary given that the catalog already makes the > connections to HMS at its startup. It also slows down the coordinator startup > time if there are HMS connection issues. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-10613) Expose table and partition metadata over HMS API
[ https://issues.apache.org/jira/browse/IMPALA-10613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar resolved IMPALA-10613. -- Fix Version/s: Impala 4.0 Resolution: Fixed > Expose table and partition metadata over HMS API > > > Key: IMPALA-10613 > URL: https://issues.apache.org/jira/browse/IMPALA-10613 > Project: IMPALA > Issue Type: Sub-task >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Major > Fix For: Impala 4.0 > > > Catalogd caches the table and partition metadata. If an external FE is to be > supported for querying through Impala, it would need to get this metadata > from catalogd to compile queries and generate plans. While a subset of > the metadata cached in catalogd is sourced from the Hive metastore, catalogd > also caches the file metadata that the Impala backend needs to create the > plan. It would be good to expose the table and partition metadata > cached in catalogd over the HMS API so that any Hive metastore client (e.g. Spark, > Hive) can potentially use this metadata to create a plan. This JIRA tracks > the work needed to expose this information over catalogd. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IMPALA-10645) Expose metrics for catalogd's HMS endpoint
Vihang Karajgaonkar created IMPALA-10645: Summary: Expose metrics for catalogd's HMS endpoint Key: IMPALA-10645 URL: https://issues.apache.org/jira/browse/IMPALA-10645 Project: IMPALA Issue Type: Sub-task Reporter: Vihang Karajgaonkar Catalogd's HMS endpoint should expose metrics to aid supportability and help identify performance issues. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-10644) RangerAuthorizationFactory cannot be instantiated after latest GBN bump up
[ https://issues.apache.org/jira/browse/IMPALA-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17316687#comment-17316687 ] Vihang Karajgaonkar commented on IMPALA-10644: -- http://gerrit.cloudera.org:8080/17282 > RangerAuthorizationFactory cannot be instantiated after latest GBN bump up > -- > > Key: IMPALA-10644 > URL: https://issues.apache.org/jira/browse/IMPALA-10644 > Project: IMPALA > Issue Type: Bug >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Major > > After the GBN was bumped to 11920537 in the commit > https://github.com/apache/impala/commit/1ab1143e98ff09610dff82d1795cf103659ffe97 > some of the ranger tests are failing with the following exception trace. > {noformat} > I0407 17:40:18.681761 25041 jni-util.cc:286] > org.apache.impala.common.InternalException: Unable to instantiate > authorization provider: > org.apache.impala.authorization.ranger.RangerAuthorizationFactory > at > org.apache.impala.util.AuthorizationUtil.authzFactoryFrom(AuthorizationUtil.java:88) > at org.apache.impala.service.JniFrontend.(JniFrontend.java:143) > Caused by: java.lang.reflect.InvocationTargetException > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.impala.util.AuthorizationUtil.authzFactoryFrom(AuthorizationUtil.java:86) > ... 
1 more > Caused by: java.lang.NoClassDefFoundError: > org/apache/solr/common/SolrException > at > org.apache.ranger.audit.provider.AuditProviderFactory.getProviderFromConfig(AuditProviderFactory.java:420) > at > org.apache.ranger.audit.provider.AuditProviderFactory.init(AuditProviderFactory.java:178) > at > org.apache.ranger.plugin.service.RangerBasePlugin.init(RangerBasePlugin.java:175) > at > org.apache.impala.authorization.ranger.RangerImpalaPlugin.init(RangerImpalaPlugin.java:50) > at > org.apache.impala.authorization.ranger.RangerImpalaPlugin.getInstance(RangerImpalaPlugin.java:69) > at > org.apache.impala.authorization.ranger.RangerAuthorizationChecker.(RangerAuthorizationChecker.java:82) > at > org.apache.impala.authorization.ranger.RangerAuthorizationFactory.(RangerAuthorizationFactory.java:44) > ... 6 more > Caused by: java.lang.ClassNotFoundException: > org.apache.solr.common.SolrException > at java.net.URLClassLoader.findClass(URLClassLoader.java:382) > at java.lang.ClassLoader.loadClass(ClassLoader.java:418) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) > at java.lang.ClassLoader.loadClass(ClassLoader.java:351) > ... 13 more > {noformat} > It looks like after the GBN was upgraded we need to have the solr dependencies in > the fe/pom.xml and they should not be reverted. The toolchain should also be > updated to exclude the solr and atlas libraries for the GBN. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-10644) RangerAuthorizationFactory cannot be instantiated after latest GBN bump up
Vihang Karajgaonkar created IMPALA-10644: Summary: RangerAuthorizationFactory cannot be instantiated after latest GBN bump up Key: IMPALA-10644 URL: https://issues.apache.org/jira/browse/IMPALA-10644 Project: IMPALA Issue Type: Bug Reporter: Vihang Karajgaonkar Assignee: Vihang Karajgaonkar After the GBN was bumped to 11920537 in the commit https://github.com/apache/impala/commit/1ab1143e98ff09610dff82d1795cf103659ffe97 some of the ranger tests are failing with the following exception trace. {noformat} I0407 17:40:18.681761 25041 jni-util.cc:286] org.apache.impala.common.InternalException: Unable to instantiate authorization provider: org.apache.impala.authorization.ranger.RangerAuthorizationFactory at org.apache.impala.util.AuthorizationUtil.authzFactoryFrom(AuthorizationUtil.java:88) at org.apache.impala.service.JniFrontend.(JniFrontend.java:143) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.impala.util.AuthorizationUtil.authzFactoryFrom(AuthorizationUtil.java:86) ... 
1 more Caused by: java.lang.NoClassDefFoundError: org/apache/solr/common/SolrException at org.apache.ranger.audit.provider.AuditProviderFactory.getProviderFromConfig(AuditProviderFactory.java:420) at org.apache.ranger.audit.provider.AuditProviderFactory.init(AuditProviderFactory.java:178) at org.apache.ranger.plugin.service.RangerBasePlugin.init(RangerBasePlugin.java:175) at org.apache.impala.authorization.ranger.RangerImpalaPlugin.init(RangerImpalaPlugin.java:50) at org.apache.impala.authorization.ranger.RangerImpalaPlugin.getInstance(RangerImpalaPlugin.java:69) at org.apache.impala.authorization.ranger.RangerAuthorizationChecker.(RangerAuthorizationChecker.java:82) at org.apache.impala.authorization.ranger.RangerAuthorizationFactory.(RangerAuthorizationFactory.java:44) ... 6 more Caused by: java.lang.ClassNotFoundException: org.apache.solr.common.SolrException at java.net.URLClassLoader.findClass(URLClassLoader.java:382) at java.lang.ClassLoader.loadClass(ClassLoader.java:418) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ... 13 more {noformat} It looks like after the GBN was upgraded we need to have the solr dependencies in the fe/pom.xml and they should not be reverted. The toolchain should also be updated to exclude the solr and atlas libraries for the GBN. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IMPALA-10639) useCompactProtocol should be configurable for the catalogd's HMS endpoint
Vihang Karajgaonkar created IMPALA-10639: Summary: useCompactProtocol should be configurable for the catalogd's HMS endpoint Key: IMPALA-10639 URL: https://issues.apache.org/jira/browse/IMPALA-10639 Project: IMPALA Issue Type: Sub-task Reporter: Vihang Karajgaonkar Currently, catalog server's HMS endpoint has a hardcoded setting to use {{TBinaryProtocol}}. We can add a configuration which can make it switch to using {{TCompactProtocol}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-10638) Add support for SASL and SSL in catalogd's HMS endpoint
Vihang Karajgaonkar created IMPALA-10638: Summary: Add support for SASL and SSL in catalogd's HMS endpoint Key: IMPALA-10638 URL: https://issues.apache.org/jira/browse/IMPALA-10638 Project: IMPALA Issue Type: Sub-task Reporter: Vihang Karajgaonkar -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (IMPALA-10605) Deflake test_refresh_native
[ https://issues.apache.org/jira/browse/IMPALA-10605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar resolved IMPALA-10605. -- Fix Version/s: Impala 4.0 Resolution: Fixed > Deflake test_refresh_native > --- > > Key: IMPALA-10605 > URL: https://issues.apache.org/jira/browse/IMPALA-10605 > Project: IMPALA > Issue Type: Improvement >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Minor > Fix For: Impala 4.0 > > > The test uses a regex to parse the output of describe database and extract > the db properties. The regex currently assumes that there will be only one > property in the database. This assumption breaks when the events processor is > running because it may add some db properties as well. > {noformat} > regex = r"{(.*?)=(.*?)}" > {noformat} > The above regex will capture subsequent properties as the value of the first > key. We can fix this by changing the regex to look specifically for the > registered-function property key prefix. > {noformat} > regex = r"{.*(impala_registered_function.*?)=(.*?)[,}]" > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
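The regex change quoted above can be sanity-checked in isolation. The property string below is a hypothetical `describe database` output, not taken from the test data:

```python
import re

# Hypothetical db-properties string: the events processor has added an
# extra property ahead of the registered-function entry.
props = "{transient_lastDdlTime=1617000000, impala_registered_function_foo=select 1}"

old = re.search(r"{(.*?)=(.*?)}", props)
# The value group runs to the single closing brace, swallowing the
# second property as part of the first key's value.
assert old.group(2) == "1617000000, impala_registered_function_foo=select 1"

new = re.search(r"{.*(impala_registered_function.*?)=(.*?)[,}]", props)
# Anchoring on the key prefix isolates the intended key/value pair.
assert new.group(1) == "impala_registered_function_foo"
assert new.group(2) == "select 1"
```

The trailing `[,}]` is what stops the value group at either the next property or the closing brace, which is the heart of the fix.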
[jira] [Commented] (IMPALA-10613) Expose table and partition metadata over HMS API
[ https://issues.apache.org/jira/browse/IMPALA-10613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17311856#comment-17311856 ] Vihang Karajgaonkar commented on IMPALA-10613: -- https://gerrit.cloudera.org/#/c/17244/ for the review > Expose table and partition metadata over HMS API > > > Key: IMPALA-10613 > URL: https://issues.apache.org/jira/browse/IMPALA-10613 > Project: IMPALA > Issue Type: Sub-task >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Major > > Catalogd caches the table and partition metadata. If an external FE needs to > be supported to query using the Impala, it would need to get this metadata > from catalogd to compile the query and generate the plan. While a subset of > the metadata which is cached in catalogd, is sourced from Hive metastore, it > also caches file metadata which is needed by the Impala backend to create the > Impala plan. It would be good to expose the table and partition metadata > cached in catalogd over HMS API so that any Hive metastore client (e.g spark, > hive) can potentially use this metadata to create a plan. This JIRA tracks > the work needed to expose this information over catalogd. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Work started] (IMPALA-10613) Expose table and partition metadata over HMS API
[ https://issues.apache.org/jira/browse/IMPALA-10613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-10613 started by Vihang Karajgaonkar. > Expose table and partition metadata over HMS API > > > Key: IMPALA-10613 > URL: https://issues.apache.org/jira/browse/IMPALA-10613 > Project: IMPALA > Issue Type: Sub-task >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Major > > Catalogd caches the table and partition metadata. If an external FE needs to > be supported to query using the Impala, it would need to get this metadata > from catalogd to compile the query and generate the plan. While a subset of > the metadata which is cached in catalogd, is sourced from Hive metastore, it > also caches file metadata which is needed by the Impala backend to create the > Impala plan. It would be good to expose the table and partition metadata > cached in catalogd over HMS API so that any Hive metastore client (e.g spark, > hive) can potentially use this metadata to create a plan. This JIRA tracks > the work needed to expose this information over catalogd. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-10598) test_cache_reload_validation is flaky
[ https://issues.apache.org/jira/browse/IMPALA-10598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar resolved IMPALA-10598. -- Fix Version/s: Impala 4.0 Resolution: Fixed > test_cache_reload_validation is flaky > - > > Key: IMPALA-10598 > URL: https://issues.apache.org/jira/browse/IMPALA-10598 > Project: IMPALA > Issue Type: Test >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Minor > Labels: flaky-test > Fix For: Impala 4.0 > > > I noticed that when I run > {noformat} > bin/impala-py.test tests/query_test/test_hdfs_caching.py -k > test_cache_reload_validation > {noformat} > I see the following failure on the master branch. > {noformat} > TestHdfsCachingDdl.test_cache_reload_validation[protocol: beeswax | > exec_option: {'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > text/none] > tests/query_test/test_hdfs_caching.py:269: in test_cache_reload_validation > assert num_entries_pre + 4 == get_num_cache_requests(), \ > E AssertionError: Adding the tables should be reflected by the number of > cache directives. > E assert (2 + 4) == 7 > E+ where 7 = get_num_cache_requests() > {noformat} > This failure is reproducible for me every time, but I am not sure why the > Jenkins jobs don't show this test failure. When I looked into this I found > that the test depends on the method below, which gets the number of cache directives on HDFS.
> {noformat} > def get_num_cache_requests_util(): > rc, stdout, stderr = exec_process("hdfs cacheadmin -listDirectives > -stats") > assert rc == 0, 'Error executing hdfs cacheadmin: %s %s' % (stdout, > stderr) > return len(stdout.split('\n')) > {noformat} > The output of this command when there are no entries is > {noformat} > Found 0 entries > {noformat} > and when there are entries the output looks like > {noformat} > Found 4 entries > ID POOL REPL EXPIRY PATH >BYTES_NEEDED BYTES_CACHED FILES_NEEDED FILES_CACHED > 225 testPool 8 never /test-warehouse/cachedb.db/cached_tbl_reload > 0 0 0 0 > 226 testPool 8 never > /test-warehouse/cachedb.db/cached_tbl_reload_part 0 >0 0 0 > 227 testPool 8 never > /test-warehouse/cachedb.db/cached_tbl_reload_part/j=1 0 >0 0 0 > 228 testPool 8 never > /test-warehouse/cachedb.db/cached_tbl_reload_part/j=2 0 >0 0 0 > {noformat} > When there are no entries there is also an additional newline which is > counted. > So when there are no entries the method returns 2, and when there are 4 > entries it returns 7, which causes the failure because the test > expects 2 + 4 = 6. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
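One robust alternative, as a sketch (the helper name is hypothetical; the actual test helper is `get_num_cache_requests_util`): parse the "Found N entries" header that `hdfs cacheadmin -listDirectives -stats` prints, instead of counting output lines:

```python
import re

def num_cache_directives(stdout):
    """Parse the 'Found N entries' header instead of counting lines."""
    m = re.search(r"Found (\d+) entries", stdout)
    return int(m.group(1)) if m else 0

# With no entries, line counting over-counts because of the trailing newline:
empty = "Found 0 entries\n"
assert len(empty.split("\n")) == 2        # the flaky behaviour from the description
assert num_cache_directives(empty) == 0   # the robust count

# Abbreviated sample listing with four directives (header + column row + 4 rows):
listing = (
    "Found 4 entries\n"
    " ID POOL REPL EXPIRY PATH\n"
    " 225 testPool 8 never /test-warehouse/cachedb.db/cached_tbl_reload\n"
    " 226 testPool 8 never /test-warehouse/cachedb.db/cached_tbl_reload_part\n"
    " 227 testPool 8 never /test-warehouse/cachedb.db/cached_tbl_reload_part/j=1\n"
    " 228 testPool 8 never /test-warehouse/cachedb.db/cached_tbl_reload_part/j=2\n"
)
assert len(listing.split("\n")) == 7      # line counting reports 7, not 4
assert num_cache_directives(listing) == 4
```

Reading the count from the header keeps the helper immune to header rows, wrapped columns, and trailing newlines in the CLI output.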
[jira] [Created] (IMPALA-10613) Expose table and partition metadata over HMS API
Vihang Karajgaonkar created IMPALA-10613: Summary: Expose table and partition metadata over HMS API Key: IMPALA-10613 URL: https://issues.apache.org/jira/browse/IMPALA-10613 Project: IMPALA Issue Type: Sub-task Reporter: Vihang Karajgaonkar Assignee: Vihang Karajgaonkar Catalogd caches the table and partition metadata. If an external FE is to be supported for querying through Impala, it would need to get this metadata from catalogd to compile queries and generate plans. While a subset of the metadata cached in catalogd is sourced from the Hive metastore, catalogd also caches the file metadata that the Impala backend needs to create the plan. It would be good to expose the table and partition metadata cached in catalogd over the HMS API so that any Hive metastore client (e.g. Spark, Hive) can potentially use this metadata to create a plan. This JIRA tracks the work needed to expose this information over catalogd. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-10612) Catalogd changes to support external FE
Vihang Karajgaonkar created IMPALA-10612: Summary: Catalogd changes to support external FE Key: IMPALA-10612 URL: https://issues.apache.org/jira/browse/IMPALA-10612 Project: IMPALA Issue Type: Improvement Reporter: Vihang Karajgaonkar Assignee: Vihang Karajgaonkar This issue is used to track the work needed to expose metadata in the catalogd over HMS API so that any HMS compatible client would be able to use catalogd as a metadata cache. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-10605) Deflake test_refresh_native
[ https://issues.apache.org/jira/browse/IMPALA-10605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar updated IMPALA-10605: - Description: The test uses a regex to parse the output of describe database and extract the db properties. The regex currently assumes that there will be only one property in the database. This assumption breaks when the events processor is running because it may add some db properties as well. {noformat} regex = r"{(.*?)=(.*?)}" {noformat} The above regex will capture subsequent properties as the value of the first key. We can fix this by changing the regex to look specifically for the registered-function property key prefix. {noformat} regex = r"{.*(impala_registered_function.*?)=(.*?)[,}]" {noformat} > Deflake test_refresh_native > --- > > Key: IMPALA-10605 > URL: https://issues.apache.org/jira/browse/IMPALA-10605 > Project: IMPALA > Issue Type: Improvement >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Minor > > The test uses a regex to parse the output of describe database and extract > the db properties. The regex currently assumes that there will be only one > property in the database. This assumption breaks when the events processor is > running because it may add some db properties as well. > {noformat} > regex = r"{(.*?)=(.*?)}" > {noformat} > The above regex will capture subsequent properties as the value of the first > key. We can fix this by changing the regex to look specifically for the > registered-function property key prefix. > {noformat} > regex = r"{.*(impala_registered_function.*?)=(.*?)[,}]" > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-10605) Deflake test_refresh_native
Vihang Karajgaonkar created IMPALA-10605: Summary: Deflake test_refresh_native Key: IMPALA-10605 URL: https://issues.apache.org/jira/browse/IMPALA-10605 Project: IMPALA Issue Type: Improvement Reporter: Vihang Karajgaonkar Assignee: Vihang Karajgaonkar
[jira] [Created] (IMPALA-10598) test_cache_reload_validation is flaky
Vihang Karajgaonkar created IMPALA-10598: Summary: test_cache_reload_validation is flaky Key: IMPALA-10598 URL: https://issues.apache.org/jira/browse/IMPALA-10598 Project: IMPALA Issue Type: Test Reporter: Vihang Karajgaonkar Assignee: Vihang Karajgaonkar I noticed that when I run {noformat} bin/impala-py.test tests/query_test/test_hdfs_caching.py -k test_cache_reload_validation {noformat} I see the following failure on the master branch. {noformat} TestHdfsCachingDdl.test_cache_reload_validation[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none] tests/query_test/test_hdfs_caching.py:269: in test_cache_reload_validation assert num_entries_pre + 4 == get_num_cache_requests(), \ E AssertionError: Adding the tables should be reflected by the number of cache directives. E assert (2 + 4) == 7 E+ where 7 = get_num_cache_requests() {noformat} This failure is reproducible for me every time, but I am not sure why the Jenkins jobs don't show it. When I looked into this, I found that the test depends on the following method to get the number of cache directives from HDFS.
{noformat} def get_num_cache_requests_util(): rc, stdout, stderr = exec_process("hdfs cacheadmin -listDirectives -stats") assert rc == 0, 'Error executing hdfs cacheadmin: %s %s' % (stdout, stderr) return len(stdout.split('\n')) {noformat} The output of this command when there are no entries is {noformat} Found 0 entries {noformat} When there are entries, the output looks like {noformat} Found 4 entries ID POOL REPL EXPIRY PATH BYTES_NEEDED BYTES_CACHED FILES_NEEDED FILES_CACHED 225 testPool 8 never /test-warehouse/cachedb.db/cached_tbl_reload 0 0 0 0 226 testPool 8 never /test-warehouse/cachedb.db/cached_tbl_reload_part 0 0 0 0 227 testPool 8 never /test-warehouse/cachedb.db/cached_tbl_reload_part/j=1 0 0 0 0 228 testPool 8 never /test-warehouse/cachedb.db/cached_tbl_reload_part/j=2 0 0 0 0 {noformat} In both cases the output also ends with an additional newline, which is counted. So with no entries the method returns 2, and with 4 entries it returns 7, which causes the failure because the test expects 2 + 4 = 6. -- This message was sent by Atlassian Jira (v8.3.4#803005)
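One way to make the counting robust (a sketch of an assumed helper, not necessarily the fix that landed) is to parse the "Found N entries" header that `hdfs cacheadmin -listDirectives -stats` prints, instead of counting output lines, which is thrown off by the trailing newline:

```python
import re

def parse_num_cache_directives(stdout):
    # Read the directive count from the "Found N entries" header rather
    # than counting lines of output. Handles both "Found 0 entries" and
    # the singular "Found 1 entry" form.
    match = re.search(r"Found (\d+) entr(?:y|ies)", stdout)
    assert match is not None, "unexpected cacheadmin output: %s" % stdout
    return int(match.group(1))

# Trailing blank lines no longer affect the count.
print(parse_num_cache_directives("Found 0 entries\n\n"))  # 0
print(parse_num_cache_directives(
    "Found 4 entries\n"
    "ID POOL REPL EXPIRY PATH\n"
    "225 testPool 8 never /test-warehouse/cachedb.db/cached_tbl_reload 0 0 0 0\n"))  # 4
```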
[jira] [Updated] (IMPALA-10598) test_cache_reload_validation is flaky
[ https://issues.apache.org/jira/browse/IMPALA-10598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar updated IMPALA-10598: - Labels: flaky-test (was: ) > test_cache_reload_validation is flaky > - > > Key: IMPALA-10598 > URL: https://issues.apache.org/jira/browse/IMPALA-10598 > Project: IMPALA > Issue Type: Test >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Minor > Labels: flaky-test