[GitHub] [hudi] garyli1019 commented on a change in pull request #2721: [HUDI-1720] when query incr view of mor table which has many delete records use sparksql/hive-beeline, StackOverflowError

2021-03-31 Thread GitBox


garyli1019 commented on a change in pull request #2721:
URL: https://github.com/apache/hudi/pull/2721#discussion_r605342627



##
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/RealtimeCompactedRecordReader.java
##
@@ -95,15 +103,24 @@ public boolean next(NullWritable aVoid, ArrayWritable arrayWritable) throws IOEx
       // TODO(NA): Invoke preCombine here by converting arrayWritable to Avro. This is required since the
       // deltaRecord may not be a full record and needs values of columns from the parquet
       Option<GenericRecord> rec;
-      if (usesCustomPayload) {
-        rec = deltaRecordMap.get(key).getData().getInsertValue(getWriterSchema());
-      } else {
-        rec = deltaRecordMap.get(key).getData().getInsertValue(getReaderSchema());
+      rec = buildGenericRecordwithCustomPayload(deltaRecordMap.get(key));
+      // If the record is not present, this is a delete record using an empty payload so skip this base record
+      // and move to the next record
+      while (!rec.isPresent()) {
+        // if current parquet reader has no record, return false
+        if (!this.parquetReader.next(aVoid, arrayWritable)) {
Review comment:
   OK, I got confused by the Spark Record Reader Iterator here. There is no problem with this.
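
The change under review merges base parquet rows with delta-log records, treating an empty payload as a delete and advancing the base reader until a live record is found. Below is a minimal, hypothetical Java sketch of that control flow; the class, the `merge` helper, and the string payloads are illustrative stand-ins, not Hudi's actual API (which works with `ArrayWritable` and `HoodieRecordPayload`):

```java
import java.util.*;

// Hypothetical stand-ins for Hudi's reader types, to illustrate the control
// flow under review: base records come from a parquet-style iterator, and
// deltaRecordMap maps keys to an Optional payload where Optional.empty()
// marks a delete.
public class SkipDeleteSketch {

  static List<String> merge(Iterator<String> baseKeys,
                            Map<String, Optional<String>> deltaRecordMap) {
    List<String> out = new ArrayList<>();
    while (baseKeys.hasNext()) {
      String key = baseKeys.next();
      Optional<String> rec = deltaRecordMap.containsKey(key)
          ? deltaRecordMap.get(key)        // delta wins; may be a delete
          : Optional.of(key + ":base");    // no delta entry, keep base record
      // A delete yields an empty payload: skip this base record and keep
      // advancing, re-checking each newly read record against the delta map.
      while (!rec.isPresent()) {
        if (!baseKeys.hasNext()) {
          return out;                      // base reader exhausted
        }
        key = baseKeys.next();             // advancing also refills the row
                                           // buffer, so no record is skipped
        rec = deltaRecordMap.containsKey(key)
            ? deltaRecordMap.get(key)
            : Optional.of(key + ":base");
      }
      out.add(rec.get());
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, Optional<String>> delta = new HashMap<>();
    delta.put("k2", Optional.empty());           // delete record
    delta.put("k3", Optional.of("k3:updated"));  // update record
    List<String> result =
        merge(Arrays.asList("k1", "k2", "k3", "k4").iterator(), delta);
    System.out.println(result);  // prints [k1:base, k3:updated, k4:base]
  }
}
```

This mirrors why the reviewer's earlier concern resolves: the inner loop both advances the reader and re-evaluates the freshly read record before the outer loop continues, so a live record encountered while skipping deletes is consumed, not dropped.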




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] garyli1019 commented on a change in pull request #2721: [HUDI-1720] when query incr view of mor table which has many delete records use sparksql/hive-beeline, StackOverflowError

2021-03-25 Thread GitBox


garyli1019 commented on a change in pull request #2721:
URL: https://github.com/apache/hudi/pull/2721#discussion_r601540575



##
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/RealtimeCompactedRecordReader.java
##
@@ -95,15 +103,24 @@ public boolean next(NullWritable aVoid, ArrayWritable arrayWritable) throws IOEx
       // TODO(NA): Invoke preCombine here by converting arrayWritable to Avro. This is required since the
       // deltaRecord may not be a full record and needs values of columns from the parquet
       Option<GenericRecord> rec;
-      if (usesCustomPayload) {
-        rec = deltaRecordMap.get(key).getData().getInsertValue(getWriterSchema());
-      } else {
-        rec = deltaRecordMap.get(key).getData().getInsertValue(getReaderSchema());
+      rec = buildGenericRecordwithCustomPayload(deltaRecordMap.get(key));
+      // If the record is not present, this is a delete record using an empty payload so skip this base record
+      // and move to the next record
+      while (!rec.isPresent()) {
+        // if current parquet reader has no record, return false
+        if (!this.parquetReader.next(aVoid, arrayWritable)) {
Review comment:
   If parquet has records, this call will fetch the next record, but we never read it afterwards, so I guess we will miss a record here?



