[jira] [Work logged] (HIVE-23597) VectorizedOrcAcidRowBatchReader::ColumnizedDeleteEventRegistry reads delete delta directories multiple times

2020-09-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23597?focusedWorklogId=480532&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480532
 ]

ASF GitHub Bot logged work on HIVE-23597:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 00:47
Start Date: 09/Sep/20 00:47
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1081:
URL: https://github.com/apache/hive/pull/1081


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480532)
Time Spent: 1h 40m  (was: 1.5h)

> VectorizedOrcAcidRowBatchReader::ColumnizedDeleteEventRegistry reads delete 
> delta directories multiple times
> 
>
> Key: HIVE-23597
> URL: https://issues.apache.org/jira/browse/HIVE-23597
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java#L1562]
> {code:java}
> try {
>   final Path[] deleteDeltaDirs = getDeleteDeltaDirsFromSplit(orcSplit);
>   if (deleteDeltaDirs.length > 0) {
>     int totalDeleteEventCount = 0;
>     for (Path deleteDeltaDir : deleteDeltaDirs) {
> {code}
>  
> Consider a directory layout like the following, created by a simple set of
> "insert --> update --> select" queries.
>  
> {noformat}
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/base_001
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/base_002
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_003_003_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_004_004_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_005_005_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_006_006_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_007_007_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_008_008_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_009_009_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_010_010_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_011_011_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_012_012_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_013_013_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_003_003_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_004_004_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_005_005_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_006_006_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_007_007_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_008_008_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_009_009_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_010_010_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_011_011_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_012_012_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_013_013_
>  {noformat}
>  
> OrcSplit contains all the delete delta directory information. For a directory 
> layout like this, it would create {{~12 splits}}. For every split, it 
> constructs a "ColumnizedDeleteEventRegistry" in VectorizedOrcAcidRowBatchReader 
> and ends up reading the same delete delta directories again.
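A minimal sketch (not Hive code) of the caching idea the linked PR pursues: memoize the per-file ORC tail so repeated split processing does not re-read the file footer. The class, `loadTail`, and the path strings are all illustrative stand-ins.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class TailCacheSketch {
    // Counts simulated filesystem reads (footer/tail fetches).
    static final AtomicInteger fsReads = new AtomicInteger();

    // Stand-in for the expensive OrcTail load: one FS lookup + footer read.
    static String loadTail(String path) {
        fsReads.incrementAndGet();
        return "tail-of:" + path;
    }

    // Shared cache keyed by delete delta file path.
    static final Map<String, String> tailCache = new ConcurrentHashMap<>();

    static String getTail(String path) {
        // computeIfAbsent runs the loader at most once per key.
        return tailCache.computeIfAbsent(path, TailCacheSketch::loadTail);
    }

    public static void main(String[] args) {
        // 12 splits each visiting the same 11 delete delta files:
        // 132 footer reads without the cache, 11 with it.
        for (int split = 0; split < 12; split++) {
            for (int d = 3; d <= 13; d++) {
                getTail("delete_delta_" + d + "/bucket_00000");
            }
        }
        System.out.println(fsReads.get()); // prints 11
    }
}
```

The PR itself uses a cache over `OrcTail` objects; this only models the access pattern and the read-count reduction.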

[jira] [Work logged] (HIVE-23597) VectorizedOrcAcidRowBatchReader::ColumnizedDeleteEventRegistry reads delete delta directories multiple times

2020-08-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23597?focusedWorklogId=476948&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-476948
 ]

ASF GitHub Bot logged work on HIVE-23597:
-

Author: ASF GitHub Bot
Created on: 01/Sep/20 00:47
Start Date: 01/Sep/20 00:47
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1081:
URL: https://github.com/apache/hive/pull/1081#issuecomment-684124193


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.





Issue Time Tracking
---

Worklog Id: (was: 476948)
Time Spent: 1.5h  (was: 1h 20m)


[jira] [Work logged] (HIVE-23597) VectorizedOrcAcidRowBatchReader::ColumnizedDeleteEventRegistry reads delete delta directories multiple times

2020-06-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23597?focusedWorklogId=452674&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452674
 ]

ASF GitHub Bot logged work on HIVE-23597:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 22:49
Start Date: 29/Jun/20 22:49
Worklog Time Spent: 10m 
  Work Description: rbalamohan commented on a change in pull request #1081:
URL: https://github.com/apache/hive/pull/1081#discussion_r447302965



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java
##
@@ -1605,6 +1618,46 @@ public int compareTo(CompressedOwid other) {
 throw e; // rethrow the exception so that the caller can handle.
   }
 }
+
+/**
+ * Create delete delta reader. Caching orc tail to avoid FS lookup/reads for repeated scans.
+ *
+ * @param deleteDeltaFile
+ * @param conf
+ * @param fs FileSystem
+ * @return delete file reader
+ * @throws IOException
+ */
+private Reader getDeleteDeltaReader(Path deleteDeltaFile, JobConf conf, FileSystem fs) throws IOException {
+  OrcTail deleteDeltaTail = deleteDeltaOrcTailCache.getIfPresent(deleteDeltaFile);

Review comment:
   Yes, it will not change for the file.







Issue Time Tracking
---

Worklog Id: (was: 452674)
Time Spent: 1h 20m  (was: 1h 10m)


[jira] [Work logged] (HIVE-23597) VectorizedOrcAcidRowBatchReader::ColumnizedDeleteEventRegistry reads delete delta directories multiple times

2020-06-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23597?focusedWorklogId=452175&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452175
 ]

ASF GitHub Bot logged work on HIVE-23597:
-

Author: ASF GitHub Bot
Created on: 29/Jun/20 05:43
Start Date: 29/Jun/20 05:43
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1081:
URL: https://github.com/apache/hive/pull/1081#discussion_r446784234



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java
##
@@ -1605,6 +1618,46 @@ public int compareTo(CompressedOwid other) {
 throw e; // rethrow the exception so that the caller can handle.
   }
 }
+
+/**
+ * Create delete delta reader. Caching orc tail to avoid FS lookup/reads for repeated scans.
+ *
+ * @param deleteDeltaFile
+ * @param conf
+ * @param fs FileSystem
+ * @return delete file reader
+ * @throws IOException
+ */
+private Reader getDeleteDeltaReader(Path deleteDeltaFile, JobConf conf, FileSystem fs) throws IOException {
+  OrcTail deleteDeltaTail = deleteDeltaOrcTailCache.getIfPresent(deleteDeltaFile);

Review comment:
   Is the OrcTail thread safe? If I understand correctly, the tail will be read by multiple LLAP threads concurrently.
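The concurrency concern above can be illustrated outside Hive: if the cached tail object is effectively immutable and the cache is per-key atomic, concurrent LLAP-style readers all observe a single load. This sketch uses a JDK `ConcurrentHashMap`, not the Guava cache in the PR, and the names are illustrative.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class ConcurrentTailLoadSketch {
    static final AtomicInteger loads = new AtomicInteger();
    static final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();

    static String getTail(String path) {
        // computeIfAbsent is atomic per key: concurrent callers for the same
        // path wait for one loader, then all share the same (immutable) value.
        return cache.computeIfAbsent(path, p -> {
            loads.incrementAndGet();
            return "tail-of:" + p; // stand-in for an immutable OrcTail
        });
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        CountDownLatch done = new CountDownLatch(100);
        for (int i = 0; i < 100; i++) {
            pool.submit(() -> {
                getTail("delete_delta_0000003_0000003/bucket_00000");
                done.countDown();
            });
        }
        done.await();
        pool.shutdown();
        System.out.println(loads.get()); // prints 1: loaded once, shared by all
    }
}
```

Sharing is safe here only because the cached value is never mutated after construction, which matches the review answer that the tail does not change for a given file.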







Issue Time Tracking
---

Worklog Id: (was: 452175)
Time Spent: 1h 10m  (was: 1h)


[jira] [Work logged] (HIVE-23597) VectorizedOrcAcidRowBatchReader::ColumnizedDeleteEventRegistry reads delete delta directories multiple times

2020-06-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23597?focusedWorklogId=444254&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-444254
 ]

ASF GitHub Bot logged work on HIVE-23597:
-

Author: ASF GitHub Bot
Created on: 11/Jun/20 12:58
Start Date: 11/Jun/20 12:58
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1081:
URL: https://github.com/apache/hive/pull/1081#discussion_r438762099



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java
##
@@ -1561,24 +1572,22 @@ public int compareTo(CompressedOwid other) {
   try {
 final Path[] deleteDeltaDirs = getDeleteDeltaDirsFromSplit(orcSplit);
 if (deleteDeltaDirs.length > 0) {
+  FileSystem fs = orcSplit.getPath().getFileSystem(conf);
+  AcidOutputFormat.Options orcSplitMinMaxWriteIds =
+  AcidUtils.parseBaseOrDeltaBucketFilename(orcSplit.getPath(), conf);
   int totalDeleteEventCount = 0;
   for (Path deleteDeltaDir : deleteDeltaDirs) {
-FileSystem fs = deleteDeltaDir.getFileSystem(conf);
+if (!isQualifiedDeleteDeltaForSplit(orcSplitMinMaxWriteIds, deleteDeltaDir)) {
+  continue;
+}
 Path[] deleteDeltaFiles = OrcRawRecordMerger.getDeltaFiles(deleteDeltaDir, bucket,
 new OrcRawRecordMerger.Options().isCompacting(false), null);
 for (Path deleteDeltaFile : deleteDeltaFiles) {
-  // NOTE: Calling last flush length below is more for future-proofing when we have
-  // streaming deletes. But currently we don't support streaming deletes, and this can
-  // be removed if this becomes a performance issue.
-  long length = OrcAcidUtils.getLastFlushLength(fs, deleteDeltaFile);
+  // NOTE: When streaming deletes are supported, consider using OrcAcidUtils.getLastFlushLength(fs, deleteDeltaFile)
   // NOTE: A check for existence of deleteDeltaFile is required because we may not have
   // deletes for the bucket being taken into consideration for this split processing.
-  if (length != -1 && fs.exists(deleteDeltaFile)) {
-/**
- * todo: we have OrcSplit.orcTail so we should be able to get stats from there
- */
-Reader deleteDeltaReader = OrcFile.createReader(deleteDeltaFile,
-OrcFile.readerOptions(conf).maxLength(length));
+  if (fs.exists(deleteDeltaFile)) {

Review comment:
   Looks good, let's get a green run :)
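The `isQualifiedDeleteDeltaForSplit` check in the diff above can be sketched as a write-id comparison parsed from the directory name. The name format here is simplified (no trailing underscore or statement id), and the qualification rule is an assumption for illustration: a delete delta whose highest write id precedes the split's minimum write id cannot hold deletes for rows in that split.

```java
public class DeleteDeltaFilterSketch {
    // Parses min/max write ids out of a name like "delete_delta_0000003_0000007"
    // (illustrative format, simplified from Hive's actual naming).
    static long[] parseWriteIds(String dirName) {
        String[] parts = dirName.replace("delete_delta_", "").split("_");
        return new long[] { Long.parseLong(parts[0]), Long.parseLong(parts[1]) };
    }

    // Skip delete deltas whose max write id is below the split's min write id:
    // they were written before any row in the split existed.
    static boolean isQualified(long splitMinWriteId, String deleteDeltaDir) {
        return parseWriteIds(deleteDeltaDir)[1] >= splitMinWriteId;
    }

    public static void main(String[] args) {
        String[] dirs = {
            "delete_delta_0000003_0000003",
            "delete_delta_0000007_0000007",
            "delete_delta_0000012_0000012"
        };
        long splitMinWriteId = 8; // e.g. a split over delta_008_008_
        int qualified = 0;
        for (String d : dirs) {
            if (isQualified(splitMinWriteId, d)) qualified++;
        }
        System.out.println(qualified); // prints 1: only the 12..12 delta applies
    }
}
```

The real helper also accounts for statement ids and compaction ranges; this only shows why the filter lets most of the ~12 delete deltas be skipped per split.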







Issue Time Tracking
---

Worklog Id: (was: 444254)
Time Spent: 1h  (was: 50m)


[jira] [Work logged] (HIVE-23597) VectorizedOrcAcidRowBatchReader::ColumnizedDeleteEventRegistry reads delete delta directories multiple times

2020-06-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23597?focusedWorklogId=444201&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-444201
 ]

ASF GitHub Bot logged work on HIVE-23597:
-

Author: ASF GitHub Bot
Created on: 11/Jun/20 10:02
Start Date: 11/Jun/20 10:02
Worklog Time Spent: 10m 
  Work Description: rbalamohan commented on a change in pull request #1081:
URL: https://github.com/apache/hive/pull/1081#discussion_r438677418



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java
##
@@ -1561,24 +1572,22 @@ public int compareTo(CompressedOwid other) {
   try {
 final Path[] deleteDeltaDirs = getDeleteDeltaDirsFromSplit(orcSplit);
 if (deleteDeltaDirs.length > 0) {
+  FileSystem fs = orcSplit.getPath().getFileSystem(conf);
+  AcidOutputFormat.Options orcSplitMinMaxWriteIds =
+  AcidUtils.parseBaseOrDeltaBucketFilename(orcSplit.getPath(), conf);
   int totalDeleteEventCount = 0;
   for (Path deleteDeltaDir : deleteDeltaDirs) {
-FileSystem fs = deleteDeltaDir.getFileSystem(conf);
+if (!isQualifiedDeleteDeltaForSplit(orcSplitMinMaxWriteIds, deleteDeltaDir)) {
+  continue;
+}
 Path[] deleteDeltaFiles = OrcRawRecordMerger.getDeltaFiles(deleteDeltaDir, bucket,
 new OrcRawRecordMerger.Options().isCompacting(false), null);
 for (Path deleteDeltaFile : deleteDeltaFiles) {
-  // NOTE: Calling last flush length below is more for future-proofing when we have
-  // streaming deletes. But currently we don't support streaming deletes, and this can
-  // be removed if this becomes a performance issue.
-  long length = OrcAcidUtils.getLastFlushLength(fs, deleteDeltaFile);
+  // NOTE: When streaming deletes are supported, consider using OrcAcidUtils.getLastFlushLength(fs, deleteDeltaFile)
   // NOTE: A check for existence of deleteDeltaFile is required because we may not have
   // deletes for the bucket being taken into consideration for this split processing.
-  if (length != -1 && fs.exists(deleteDeltaFile)) {
-/**
- * todo: we have OrcSplit.orcTail so we should be able to get stats from there
- */
-Reader deleteDeltaReader = OrcFile.createReader(deleteDeltaFile,
-OrcFile.readerOptions(conf).maxLength(length));
+  if (fs.exists(deleteDeltaFile)) {

Review comment:
   Addressed the file.exists issue in recent commit.







Issue Time Tracking
---

Worklog Id: (was: 444201)
Time Spent: 50m  (was: 40m)


[jira] [Work logged] (HIVE-23597) VectorizedOrcAcidRowBatchReader::ColumnizedDeleteEventRegistry reads delete delta directories multiple times

2020-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23597?focusedWorklogId=443532&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-443532
 ]

ASF GitHub Bot logged work on HIVE-23597:
-

Author: ASF GitHub Bot
Created on: 10/Jun/20 02:11
Start Date: 10/Jun/20 02:11
Worklog Time Spent: 10m 
  Work Description: rbalamohan commented on a change in pull request #1081:
URL: https://github.com/apache/hive/pull/1081#discussion_r437820538



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java
##
@@ -1561,24 +1572,22 @@ public int compareTo(CompressedOwid other) {
   try {
 final Path[] deleteDeltaDirs = getDeleteDeltaDirsFromSplit(orcSplit);
 if (deleteDeltaDirs.length > 0) {
+  FileSystem fs = orcSplit.getPath().getFileSystem(conf);
+  AcidOutputFormat.Options orcSplitMinMaxWriteIds =
+  AcidUtils.parseBaseOrDeltaBucketFilename(orcSplit.getPath(), conf);
   int totalDeleteEventCount = 0;
   for (Path deleteDeltaDir : deleteDeltaDirs) {
-FileSystem fs = deleteDeltaDir.getFileSystem(conf);
+if (!isQualifiedDeleteDeltaForSplit(orcSplitMinMaxWriteIds, deleteDeltaDir)) {
+  continue;
+}
 Path[] deleteDeltaFiles = OrcRawRecordMerger.getDeltaFiles(deleteDeltaDir, bucket,
 new OrcRawRecordMerger.Options().isCompacting(false), null);
 for (Path deleteDeltaFile : deleteDeltaFiles) {
-  // NOTE: Calling last flush length below is more for future-proofing when we have
-  // streaming deletes. But currently we don't support streaming deletes, and this can
-  // be removed if this becomes a performance issue.
-  long length = OrcAcidUtils.getLastFlushLength(fs, deleteDeltaFile);
+  // NOTE: When streaming deletes are supported, consider using OrcAcidUtils.getLastFlushLength(fs, deleteDeltaFile)
   // NOTE: A check for existence of deleteDeltaFile is required because we may not have
   // deletes for the bucket being taken into consideration for this split processing.
-  if (length != -1 && fs.exists(deleteDeltaFile)) {
-/**
- * todo: we have OrcSplit.orcTail so we should be able to get stats from there
- */
-Reader deleteDeltaReader = OrcFile.createReader(deleteDeltaFile,
-OrcFile.readerOptions(conf).maxLength(length));
+  if (fs.exists(deleteDeltaFile)) {

Review comment:
   Yes; runtime can be further reduced if the DeleteReader values are kept in memory as well.







Issue Time Tracking
---

Worklog Id: (was: 443532)
Time Spent: 40m  (was: 0.5h)


[jira] [Work logged] (HIVE-23597) VectorizedOrcAcidRowBatchReader::ColumnizedDeleteEventRegistry reads delete delta directories multiple times

2020-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23597?focusedWorklogId=443199&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-443199
 ]

ASF GitHub Bot logged work on HIVE-23597:
-

Author: ASF GitHub Bot
Created on: 09/Jun/20 16:40
Start Date: 09/Jun/20 16:40
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk closed pull request #1081:
URL: https://github.com/apache/hive/pull/1081


   





Issue Time Tracking
---

Worklog Id: (was: 443199)
Time Spent: 0.5h  (was: 20m)

> VectorizedOrcAcidRowBatchReader::ColumnizedDeleteEventRegistry reads delete 
> delta directories multiple times
> 
>
> Key: HIVE-23597
> URL: https://issues.apache.org/jira/browse/HIVE-23597
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java#L1562]
> {code:java}
> try {
> final Path[] deleteDeltaDirs = getDeleteDeltaDirsFromSplit(orcSplit);
> if (deleteDeltaDirs.length > 0) {
>   int totalDeleteEventCount = 0;
>   for (Path deleteDeltaDir : deleteDeltaDirs) {
> {code}
>  
> Consider a directory layout like the following. It was created by a simple 
> sequence of "insert --> update --> select" queries.
>  
> {noformat}
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/base_001
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/base_002
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_003_003_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_004_004_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_005_005_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_006_006_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_007_007_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_008_008_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_009_009_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_010_010_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_011_011_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_012_012_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_013_013_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_003_003_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_004_004_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_005_005_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_006_006_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_007_007_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_008_008_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_009_009_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_010_010_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_011_011_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_012_012_
> /warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_013_013_
>  {noformat}
>  
> OrcSplit carries the full list of delete delta directories. For a directory 
> layout like this, {{~12 splits}} would be created. For every split, the reader 
> constructs a "ColumnizedDeleteEventRegistry" in VectorizedOrcAcidRowBatchReader 
> and ends up reading all the delete delta directories again.
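The patch addresses this by skipping delete deltas that cannot be relevant to a given split. The following is a self-contained sketch, not the Hive implementation: the directory-name pattern and the `mayContainDeletesFor` helper are hypothetical, and the only assumption is that a delete event's write id is never lower than the write id of the row it deletes, so a `delete_delta_<min>_<max>` directory whose max write id is below the split's minimum write id can be skipped without reading it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DeleteDeltaFilter {
    // delete_delta_<minWriteId>_<maxWriteId>[_<stmtId>] -- illustrative pattern only
    private static final Pattern DELETE_DELTA =
            Pattern.compile("delete_delta_(\\d+)_(\\d+).*");

    /**
     * Returns false only when the delete delta directory provably cannot hold
     * delete events for rows in a split whose minimum write id is splitMinWriteId.
     */
    static boolean mayContainDeletesFor(String deleteDeltaDirName, long splitMinWriteId) {
        Matcher m = DELETE_DELTA.matcher(deleteDeltaDirName);
        if (!m.matches()) {
            return true; // unknown layout: stay conservative and read it
        }
        long dirMaxWriteId = Long.parseLong(m.group(2));
        // A delete written at write id W can only target rows with write id <= W,
        // so a directory whose newest delete predates the split's rows is irrelevant.
        return dirMaxWriteId >= splitMinWriteId;
    }

    public static void main(String[] args) {
        // For a split built from delta_0000007_0000007, older delete deltas are skipped:
        System.out.println(mayContainDeletesFor("delete_delta_0000003_0000003", 7)); // false
        System.out.println(mayContainDeletesFor("delete_delta_0000009_0000009", 7)); // true
    }
}
```

With ~12 splits over the layout above, this turns roughly 12 x 11 delete delta reads into only the overlapping subset per split.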

[jira] [Work logged] (HIVE-23597) VectorizedOrcAcidRowBatchReader::ColumnizedDeleteEventRegistry reads delete delta directories multiple times

2020-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23597?focusedWorklogId=442855&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442855
 ]

ASF GitHub Bot logged work on HIVE-23597:
-

Author: ASF GitHub Bot
Created on: 09/Jun/20 16:05
Start Date: 09/Jun/20 16:05
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #1081:
URL: https://github.com/apache/hive/pull/1081#discussion_r437191512



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java
##
@@ -1561,24 +1572,22 @@ public int compareTo(CompressedOwid other) {
   try {
     final Path[] deleteDeltaDirs = getDeleteDeltaDirsFromSplit(orcSplit);
     if (deleteDeltaDirs.length > 0) {
+      FileSystem fs = orcSplit.getPath().getFileSystem(conf);
+      AcidOutputFormat.Options orcSplitMinMaxWriteIds =
+          AcidUtils.parseBaseOrDeltaBucketFilename(orcSplit.getPath(), conf);
       int totalDeleteEventCount = 0;
       for (Path deleteDeltaDir : deleteDeltaDirs) {
-        FileSystem fs = deleteDeltaDir.getFileSystem(conf);
+        if (!isQualifiedDeleteDeltaForSplit(orcSplitMinMaxWriteIds, deleteDeltaDir)) {
+          continue;
+        }
         Path[] deleteDeltaFiles = OrcRawRecordMerger.getDeltaFiles(deleteDeltaDir, bucket,
             new OrcRawRecordMerger.Options().isCompacting(false), null);
         for (Path deleteDeltaFile : deleteDeltaFiles) {
-          // NOTE: Calling last flush length below is more for future-proofing when we have
-          // streaming deletes. But currently we don't support streaming deletes, and this can
-          // be removed if this becomes a performance issue.
-          long length = OrcAcidUtils.getLastFlushLength(fs, deleteDeltaFile);
+          // NOTE: When streaming deletes are supported, consider using OrcAcidUtils.getLastFlushLength(fs, deleteDeltaFile)
           // NOTE: A check for existence of deleteDeltaFile is required because we may not have
           // deletes for the bucket being taken into consideration for this split processing.
-          if (length != -1 && fs.exists(deleteDeltaFile)) {
-            /**
-             * todo: we have OrcSplit.orcTail so we should be able to get stats from there
-             */
-            Reader deleteDeltaReader = OrcFile.createReader(deleteDeltaFile,
-                OrcFile.readerOptions(conf).maxLength(length));
+          if (fs.exists(deleteDeltaFile)) {

Review comment:

   We might want to get rid of this exists() call to save NameNode calls, and just 
handle FileNotFoundException in case of a missing file (exists() performs the same 
check internally).
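As a self-contained illustration of that suggestion (using java.nio rather than the Hadoop FileSystem API, with a hypothetical readIfPresent helper): probing with exists() and then opening costs two filesystem round trips, while opening directly and treating the not-found exception as "no deletes for this bucket" costs one.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class OpenWithoutExists {
    /**
     * Opens the file directly instead of calling exists() first; a missing
     * delete-delta file for this bucket is an expected case, not an error.
     */
    static byte[] readIfPresent(Path deleteDeltaFile) {
        try {
            return Files.readAllBytes(deleteDeltaFile);
        } catch (NoSuchFileException e) {
            return null; // no deletes for this bucket: skip it
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path present = Files.createTempFile("delete_delta_bucket", ".orc");
        Files.write(present, new byte[] {1, 2, 3});
        System.out.println(readIfPresent(present).length);                                    // 3
        System.out.println(readIfPresent(present.resolveSibling("no_such_delete_delta")) == null); // true
        Files.delete(present);
    }
}
```

On HDFS the saving is one NameNode RPC per probed file, which adds up when every split probes every delete delta bucket file.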







Issue Time Tracking
---

Worklog Id: (was: 442855)
Time Spent: 20m  (was: 10m)

> VectorizedOrcAcidRowBatchReader::ColumnizedDeleteEventRegistry reads delete 
> delta directories multiple times
> 
>
> Key: HIVE-23597
> URL: https://issues.apache.org/jira/browse/HIVE-23597
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>

[jira] [Work logged] (HIVE-23597) VectorizedOrcAcidRowBatchReader::ColumnizedDeleteEventRegistry reads delete delta directories multiple times

2020-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23597?focusedWorklogId=442811&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442811
 ]

ASF GitHub Bot logged work on HIVE-23597:
-

Author: ASF GitHub Bot
Created on: 09/Jun/20 16:00
Start Date: 09/Jun/20 16:00
Worklog Time Spent: 10m 
  Work Description: rbalamohan opened a new pull request #1081:
URL: https://github.com/apache/hive/pull/1081









Issue Time Tracking
---

Worklog Id: (was: 442811)
Remaining Estimate: 0h
Time Spent: 10m

> VectorizedOrcAcidRowBatchReader::ColumnizedDeleteEventRegistry reads delete 
> delta directories multiple times
> 
>
> Key: HIVE-23597
> URL: https://issues.apache.org/jira/browse/HIVE-23597
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>