GitHub user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20316#discussion_r162518087

    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedParquetRecordReader.java ---
    @@ -50,6 +50,9 @@
      * TODO: make this always return ColumnarBatches.
      */
     public class VectorizedParquetRecordReader extends SpecificParquetRecordReaderBase<Object> {
    +  // TODO: make this configurable.
    +  private static final int CAPACITY = 4 * 1024;

    --- End diff --

    We should set them separately: each place that uses `ColumnarBatch` should decide its default size itself.
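
    A minimal sketch of that suggestion, not the actual Spark code. `Batch`, `ParquetLikeReader`, and `OtherReader` are hypothetical stand-ins invented for illustration, and the value 1024 is arbitrary. Only `4 * 1024` comes from the diff above. The point is that each consumer owns its default capacity instead of sharing one constant:

```java
// Hypothetical stand-in for ColumnarBatch: it only records the capacity it was built with.
final class Batch {
    final int capacity;

    Batch(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }
}

// Hypothetical Parquet-style reader. It declares its own default (4 * 1024, as in the diff).
final class ParquetLikeReader {
    private static final int DEFAULT_CAPACITY = 4 * 1024;
    private final int capacity;

    ParquetLikeReader() {
        this(DEFAULT_CAPACITY);
    }

    // Callers may still override the default, which leaves room for making it configurable.
    ParquetLikeReader(int capacity) {
        this.capacity = capacity;
    }

    Batch newBatch() {
        return new Batch(capacity);
    }
}

// Hypothetical second consumer. Its default is chosen independently (1024 is illustrative only).
final class OtherReader {
    private static final int DEFAULT_CAPACITY = 1024;
    private final int capacity;

    OtherReader() {
        this(DEFAULT_CAPACITY);
    }

    OtherReader(int capacity) {
        this.capacity = capacity;
    }

    Batch newBatch() {
        return new Batch(capacity);
    }
}

class CapacityDemo {
    public static void main(String[] args) {
        // Each reader builds batches with the default it declared for itself.
        System.out.println(new ParquetLikeReader().newBatch().capacity);
        System.out.println(new OtherReader().newBatch().capacity);
    }
}
```

    With this layout, changing one reader's default does not affect the other, and the batch type itself carries no default at all.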