Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20316#discussion_r162518569

    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedParquetRecordReader.java ---
    @@ -50,6 +50,9 @@
      * TODO: make this always return ColumnarBatches.
      */
     public class VectorizedParquetRecordReader extends SpecificParquetRecordReaderBase<Object> {
    +  // TODO: make this configurable.
    +  private static final int CAPACITY = 4 * 1024;
    --- End diff --

    Then, what should we use for capacity in `ColumnVectorUtils.toBatch()`?