[jira] [Comment Edited] (IMPALA-11677) FireInsertEvents function can be very slow for tables with large number of partitions.
[ https://issues.apache.org/jira/browse/IMPALA-11677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620713#comment-17620713 ]

Qihong Jiang edited comment on IMPALA-11677 at 10/20/22 5:05 AM:

Hello [~csringhofer],

I'm only using non-transactional tables right now, and it is just as slow. I tried the Bulk API last week, but the improvement was very small. Then, following the code in Impala 3, I modified it to make the call asynchronous. Execution speed improved greatly, but I don't know whether this introduces any risk.

{code:java}
public static List fireInsertEvents(MetaStoreClient msClient,
    TableInsertEventInfo insertEventInfo, String dbName, String tableName) {
  if (!insertEventInfo.isTransactional()) {
    LOG.info("fire the insert events asynchronously.");
    ExecutorService fireInsertEventThread = Executors.newSingleThreadExecutor();
    CompletableFuture.runAsync(() -> {
      try {
        fireInsertEventHelper(msClient.getHiveClient(),
            insertEventInfo.getInsertEventReqData(),
            insertEventInfo.getInsertEventPartVals(), dbName, tableName);
      } catch (Exception e) {
        // Pass the exception so the cause is not lost.
        LOG.error("failed to async call fireInsertEventHelper", e);
      } finally {
        msClient.close();
        LOG.info("fire the insert events asynchronously end.");
      }
    }, fireInsertEventThread)
        .thenRun(() -> fireInsertEventThread.shutdown());
  } else {
    Stopwatch sw = Stopwatch.createStarted();
    try {
      fireInsertTransactionalEventHelper(msClient.getHiveClient(),
          insertEventInfo, dbName, tableName);
    } catch (Exception e) {
      LOG.error("Failed to fire insert event. Some tables might not be"
          + " refreshed on other impala clusters.", e);
    } finally {
      LOG.info("Time taken to fire insert events on table {}.{}: {} msec",
          dbName, tableName, sw.stop().elapsed(TimeUnit.MILLISECONDS));
      msClient.close();
    }
  }
  return Collections.emptyList();
}
{code}

I am new to Impala and would appreciate your guidance. Thank you!
was (Author: JIRAUSER289149):

Hello [~csringhofer],

I'm only using non-transactional tables right now, and it is just as slow. I tried the Bulk API last week, but the improvement was very small. Then, following the code in Impala 3, I modified it to make the call asynchronous. Execution speed improved greatly, but I don't know whether this introduces any risk.

{code:java}
public static List fireInsertEvents(MetaStoreClient msClient,
    TableInsertEventInfo insertEventInfo, String dbName, String tableName) {
  if (!insertEventInfo.isTransactional()) {
    LOG.info("fire the insert events asynchronously.");
    ExecutorService fireInsertEventThread = Executors.newSingleThreadExecutor();
    CompletableFuture.runAsync(() -> {
      try {
        fireInsertEventHelper(msClient.getHiveClient(),
            insertEventInfo.getInsertEventReqData(),
            insertEventInfo.getInsertEventPartVals(), dbName, tableName);
      } catch (Exception e) {
        // Pass the exception so the cause is not lost.
        LOG.error("failed to async call fireInsertEventHelper", e);
      } finally {
        msClient.close();
        LOG.info("fire the insert events asynchronously end.");
      }
    }, fireInsertEventThread)
        .thenRun(() -> fireInsertEventThread.shutdown());
  } else {
    Stopwatch sw = Stopwatch.createStarted();
    try {
      fireInsertTransactionalEventHelper(msClient.getHiveClient(),
          insertEventInfo, dbName, tableName);
    } catch (Exception e) {
      LOG.error("Failed to fire insert event. Some tables might not be"
          + " refreshed on other impala clusters.", e);
    } finally {
      LOG.info("Time taken to fire insert events on table {}.{}: {} msec",
          dbName, tableName, sw.stop().elapsed(TimeUnit.MILLISECONDS));
      msClient.close();
    }
  }
  return Collections.emptyList();
}
{code}

I am not an Impala expert and would appreciate your guidance. Thank you!

> FireInsertEvents function can be very slow for tables with large number of
> partitions.
> --
>
> Key: IMPALA-11677
> URL: https://issues.apache.org/jira/browse/IMPALA-11677
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Affects Versions: Impala 4.1.0
> Reporter: Qihong Jiang
> Assignee: Qihong Jiang
> Priority: Major
>
> In src/compat-apache-hive-3/java/org/apache/impala/compat/MetastoreShim.java.
> fireInsertEvents function can be very slow for tables with large number of
> partitio
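The non-transactional branch of the code in the comment above submits the work to a single-thread executor and returns immediately. The following is a minimal, self-contained sketch of that fire-and-forget pattern, assuming a hypothetical `SlowClient` class as a stand-in for `MetaStoreClient` (it is not Impala's API). It shows that the method returns before the work finishes, so the caller gets no result from the work and sees no exception from it unless it waits on the returned future.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class AsyncFireSketch {
  // Hypothetical stand-in for a metastore client; not Impala's MetaStoreClient.
  static class SlowClient implements AutoCloseable {
    final AtomicBoolean closed = new AtomicBoolean(false);

    void fireEvents(int partitions) throws InterruptedException {
      Thread.sleep(partitions); // simulate ~1 ms of blocking work per partition
    }

    @Override
    public void close() {
      closed.set(true);
    }
  }

  // Fires the events on a dedicated single-thread executor and returns at once.
  static CompletableFuture<Void> fireAsync(SlowClient client, int partitions) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    CompletableFuture<Void> future = CompletableFuture.runAsync(() -> {
      try {
        client.fireEvents(partitions);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        System.err.println("Failed to fire insert events: " + e);
      } finally {
        client.close(); // release the client whether or not the work succeeded
      }
    }, executor);
    // shutdown() still lets the already-submitted task run to completion,
    // then lets the worker thread exit.
    executor.shutdown();
    return future;
  }

  public static void main(String[] args) {
    SlowClient client = new SlowClient();
    long start = System.nanoTime();
    CompletableFuture<Void> future = fireAsync(client, 200);
    long returnedAfterMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    System.out.println("returned after ~" + returnedAfterMs + " ms, closed=" + client.closed.get());
    future.join(); // this sketch waits only to show the final state
    System.out.println("closed after completion=" + client.closed.get());
  }
}
```

Calling `shutdown()` right after submission is a design choice in this sketch: the submitted task still runs, and the thread is released even if the task fails, without depending on a later `thenRun` stage.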
[ https://issues.apache.org/jira/browse/IMPALA-11677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620713#comment-17620713 ]

Qihong Jiang edited comment on IMPALA-11677 at 10/20/22 5:04 AM:

Hello [~csringhofer],

I'm only using non-transactional tables right now, and it is just as slow. I tried the Bulk API last week, but the improvement was very small. Then, following the code in Impala 3, I modified it to make the call asynchronous. Execution speed improved greatly, but I don't know whether this introduces any risk.

{code:java}
public static List fireInsertEvents(MetaStoreClient msClient,
    TableInsertEventInfo insertEventInfo, String dbName, String tableName) {
  if (!insertEventInfo.isTransactional()) {
    LOG.info("fire the insert events asynchronously.");
    ExecutorService fireInsertEventThread = Executors.newSingleThreadExecutor();
    CompletableFuture.runAsync(() -> {
      try {
        fireInsertEventHelper(msClient.getHiveClient(),
            insertEventInfo.getInsertEventReqData(),
            insertEventInfo.getInsertEventPartVals(), dbName, tableName);
      } catch (Exception e) {
        // Pass the exception so the cause is not lost.
        LOG.error("failed to async call fireInsertEventHelper", e);
      } finally {
        msClient.close();
        LOG.info("fire the insert events asynchronously end.");
      }
    }, fireInsertEventThread)
        .thenRun(() -> fireInsertEventThread.shutdown());
  } else {
    Stopwatch sw = Stopwatch.createStarted();
    try {
      fireInsertTransactionalEventHelper(msClient.getHiveClient(),
          insertEventInfo, dbName, tableName);
    } catch (Exception e) {
      LOG.error("Failed to fire insert event. Some tables might not be"
          + " refreshed on other impala clusters.", e);
    } finally {
      LOG.info("Time taken to fire insert events on table {}.{}: {} msec",
          dbName, tableName, sw.stop().elapsed(TimeUnit.MILLISECONDS));
      msClient.close();
    }
  }
  return Collections.emptyList();
}
{code}

I am not an Impala expert and would appreciate your guidance. Thank you!
was (Author: JIRAUSER289149):

Hello [~csringhofer],

I'm only using non-transactional tables right now, and it is just as slow. I tried the Bulk API last week, but the improvement was very small. Then, following the code in Impala 3, I modified it to make the call asynchronous. Execution speed improved greatly, but I don't know whether this introduces any risk.

{code:java}
public static List fireInsertEvents(MetaStoreClient msClient,
    TableInsertEventInfo insertEventInfo, String dbName, String tableName) {
  if (!insertEventInfo.isTransactional()) {
    LOG.info("fire the insert events asynchronously.");
    ExecutorService fireInsertEventThread = Executors.newSingleThreadExecutor();
    CompletableFuture.runAsync(() -> {
      try {
        fireInsertEventHelper(msClient.getHiveClient(),
            insertEventInfo.getInsertEventReqData(),
            insertEventInfo.getInsertEventPartVals(), dbName, tableName);
      } catch (Exception e) {
        // Pass the exception so the cause is not lost.
        LOG.error("failed to async call fireInsertEventHelper", e);
      }
    }, fireInsertEventThread)
        .thenRun(() -> {
          LOG.info("fire the insert events asynchronously end.");
          msClient.close();
          fireInsertEventThread.shutdown();
        });
  } else {
    Stopwatch sw = Stopwatch.createStarted();
    try {
      fireInsertTransactionalEventHelper(msClient.getHiveClient(),
          insertEventInfo, dbName, tableName);
    } catch (Exception e) {
      LOG.error("Failed to fire insert event. Some tables might not be"
          + " refreshed on other impala clusters.", e);
    } finally {
      LOG.info("Time taken to fire insert events on table {}.{}: {} msec",
          dbName, tableName, sw.stop().elapsed(TimeUnit.MILLISECONDS));
      msClient.close();
    }
  }
  return Collections.emptyList();
}
{code}

I am not an Impala expert and would appreciate your guidance. Thank you!
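The earliest version of the code runs its cleanup (`msClient.close()` and `shutdown()`) in a `.thenRun(...)` stage, while the later versions move `msClient.close()` into a `finally` block inside the task. A small sketch of why that difference matters: a `thenRun` stage is skipped when the preceding stage completes exceptionally, whereas `whenComplete` (like a `finally` inside the task) runs either way. In the original code only an `Error` could escape the task, because `Exception` is caught; this sketch throws an unchecked exception purely to simulate such an escape. The class and method names are illustrative only.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

public class CleanupStageSketch {
  // Runs a failing task and reports which cleanup hooks actually executed:
  // index 0 = thenRun, index 1 = whenComplete.
  static boolean[] runFailingTask() {
    AtomicBoolean thenRunRan = new AtomicBoolean(false);
    AtomicBoolean whenCompleteRan = new AtomicBoolean(false);
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      CompletableFuture<Void> task = CompletableFuture.runAsync(() -> {
        throw new IllegalStateException("simulated failure escaping the task");
      }, executor);

      // Skipped: thenRun only runs if the previous stage completed normally.
      CompletableFuture<Void> viaThenRun = task.thenRun(() -> thenRunRan.set(true));
      // Runs on both normal and exceptional completion.
      CompletableFuture<Void> viaWhenComplete =
          task.whenComplete((ignored, error) -> whenCompleteRan.set(true));

      joinQuietly(viaThenRun);
      joinQuietly(viaWhenComplete);
    } finally {
      executor.shutdown();
    }
    return new boolean[] {thenRunRan.get(), whenCompleteRan.get()};
  }

  // Waits for a stage; both stages here complete exceptionally because the task failed.
  static void joinQuietly(CompletableFuture<?> stage) {
    try {
      stage.join();
    } catch (CompletionException expected) {
      // The original failure is wrapped; ignored in this sketch.
    }
  }

  public static void main(String[] args) {
    boolean[] ran = runFailingTask();
    // prints "thenRun ran: false, whenComplete ran: true"
    System.out.println("thenRun ran: " + ran[0] + ", whenComplete ran: " + ran[1]);
  }
}
```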