[jira] [Commented] (HIVE-23764) Remove unnecessary getLastFlushLength when checking delete delta files
[ https://issues.apache.org/jira/browse/HIVE-23764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17150614#comment-17150614 ]

Rajesh Balamohan commented on HIVE-23764:
---

[~pvary]: We can get this fix committed and revise the other ticket later.

> Remove unnecessary getLastFlushLength when checking delete delta files
> --
>
> Key: HIVE-23764
> URL: https://issues.apache.org/jira/browse/HIVE-23764
> Project: Hive
> Issue Type: Improvement
> Components: Transactions
> Reporter: Peter Vary
> Assignee: Peter Vary
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> VectorizedOrcAcidRowBatchReader$ColumnizedDeleteEventRegistry calls
> OrcAcidUtils.getLastFlushLength for every delete delta file.
> Even the comment says:
> {code}
> // NOTE: Calling last flush length below is more for future-proofing when we have
> // streaming deletes. But currently we don't support streaming deletes, and this can
> // be removed if this becomes a performance issue.
> {code}
> If we have a table with 5 updates (1 base + 5 delta + 5 delete_delta), then
> for every base + delta dir we will check all of the delete_delta directories,
> and check the getLastFlushLength method which will result in 6*5=30
> unnecessary NN/S3 calls.
> We should remove the check as already proposed in the comment.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
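As a rough illustration of the 6*5=30 arithmetic in the quoted description, the Java sketch below models one base directory and five delta directories. Each of them checks all five delete_delta directories. This is not Hive's code. The class name, the method name, the directory names, and the boolean flag are hypothetical. Each counted call only stands in for one OrcAcidUtils.getLastFlushLength lookup, and the sketch assumes one delete delta file per delete_delta directory, as the 6*5 figure implies.

```java
import java.util.List;

// Hypothetical model (not Hive code) of the lookup pattern described in HIVE-23764:
// every base/delta directory being read checks every delete_delta directory.
public class FlushLengthCallCount {

    // Counts the stand-in getLastFlushLength calls. checkFlushLength == true models
    // the behavior before the fix; false models the check being removed.
    static int countFlushLengthCalls(List<String> dataDirs, List<String> deleteDeltaDirs,
                                     boolean checkFlushLength) {
        int calls = 0;
        for (String dataDir : dataDirs) {
            for (String deleteDeltaDir : deleteDeltaDirs) {
                // Assumption: one delete delta file per delete_delta directory,
                // so one flush-length lookup (one NN/S3 call) per pair.
                if (checkFlushLength) {
                    calls++;
                }
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        // Hypothetical layout: 1 base + 5 delta + 5 delete_delta, as in the description.
        List<String> dataDirs = List.of("base_1", "delta_2", "delta_3", "delta_4", "delta_5", "delta_6");
        List<String> deleteDeltaDirs = List.of("delete_delta_2", "delete_delta_3",
                "delete_delta_4", "delete_delta_5", "delete_delta_6");
        System.out.println("before: " + countFlushLengthCalls(dataDirs, deleteDeltaDirs, true));
        System.out.println("after: " + countFlushLengthCalls(dataDirs, deleteDeltaDirs, false));
    }
}
```

Running `main` prints `before: 30` and `after: 0`. The sketch only counts calls and never touches a filesystem. It shows that the number of extra lookups grows with (base + delta count) × (delete_delta count), so every additional update adds to the overhead.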
[jira] [Commented] (HIVE-23764) Remove unnecessary getLastFlushLength when checking delete delta files
[ https://issues.apache.org/jira/browse/HIVE-23764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17149471#comment-17149471 ]

Peter Vary commented on HIVE-23764:
---

[~rajesh.balamohan]: I see that in HIVE-23597 we have issues with some tests. Also, caching the OrcTail might be better placed in LLAP IO, and [~szita] is working on a possible solution. What do you think about pushing this change first, and picking up HIVE-23597 again if we hit a roadblock with the LLAP IO solution?

Thanks,
Peter
[jira] [Commented] (HIVE-23764) Remove unnecessary getLastFlushLength when checking delete delta files
[ https://issues.apache.org/jira/browse/HIVE-23764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17147563#comment-17147563 ]

Peter Vary commented on HIVE-23764:
---

Yeah. I somehow forgot about that Jira. Sorry :( Do you still plan to push that?

Thanks,
Peter
[jira] [Commented] (HIVE-23764) Remove unnecessary getLastFlushLength when checking delete delta files
[ https://issues.apache.org/jira/browse/HIVE-23764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17147553#comment-17147553 ]

Rajesh Balamohan commented on HIVE-23764:
---

Related ticket: https://issues.apache.org/jira/browse/HIVE-23597