[GitHub] spark pull request #22348: [SPARK-25354][SQL] Reduce unneeded operation in n...
Github user SongYadong closed the pull request at: https://github.com/apache/spark/pull/22348 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22348: [SPARK-25354][SQL] Reduce unneeded operation in n...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/22348#discussion_r216405506 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedParquetRecordReader.java --- @@ -154,8 +159,6 @@ public void close() throws IOException { @Override public boolean nextKeyValue() throws IOException { -resultBatch(); --- End diff -- Ah, i see. Thank you for explanation. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22348: [SPARK-25354][SQL] Reduce unneeded operation in n...
Github user SongYadong commented on a diff in the pull request: https://github.com/apache/spark/pull/22348#discussion_r215870108 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedParquetRecordReader.java --- @@ -154,8 +159,6 @@ public void close() throws IOException { @Override public boolean nextKeyValue() throws IOException { -resultBatch(); --- End diff -- Thanks for your review. calling sequence is optional. `initBatch()` was moved to `nextBatch()`, if `hasNext()` called before `initBatch()`, eventually `initBatch()` will be called first of all in `nextBatch()`: ``` public boolean nextBatch() throws IOException { if (columnarBatch == null) initBatch(); ``` Orc has a much simple way: ``` public boolean nextKeyValue() throws IOException { return nextBatch(); } ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22348: [SPARK-25354][SQL] Reduce unneeded operation in n...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/22348#discussion_r215860694 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedParquetRecordReader.java --- @@ -154,8 +159,6 @@ public void close() throws IOException { @Override public boolean nextKeyValue() throws IOException { -resultBatch(); --- End diff -- I like this direction to reduce overhead. I confirmed that the current code calls `initBatch()` before calling `nextKeyValue()`, too. My question is that is calling sequence is mandatory or optional. In the future, I wonder a developer may call this method thru `hasNext()` before calling `initBatch()`. How about moving `if-statement` into here like? Does this help performance improvement? ``` if (columnarBatch == null) resultBatch(); ``` I am curious how orc reader handle this. cc @dongjoon-hyun @cloud-fan --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org