MaxGekk commented on code in PR #55923:
URL: https://github.com/apache/spark/pull/55923#discussion_r3403286111
##########
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetVectorUpdaterFactory.java:
##########
@@ -470,8 +470,10 @@ public void readValues(
int offset,
WritableColumnVector values,
VectorizedValuesReader valuesReader) {
- for (int i = 0; i < total; ++i) {
- readValue(offset + i, values, valuesReader);
+ valuesReader.readIntegersAsLongs(total, values, offset);
Review Comment:
Now that this two-pass pattern is in place, `DateToTimestampNTZUpdater`
(optimized in SPARK-56804 via the reader-side `readIntegersAsTimestampMicros`)
is the odd one out — worth a follow-up migrating it to the same two-pass form
and deleting `readIntegersAsTimestampMicros` from `VectorizedValuesReader`. The
two-pass form is the stronger pattern: the bulk reads are mandatory interface
methods (or have safe per-row defaults), so every encoding gets the speedup
rather than only readers that override, and it adds no per-conversion interface
methods. Not asking for it in this PR.
##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetVectorUpdaterBenchmark.scala:
##########
@@ -264,6 +264,10 @@ object ParquetVectorUpdaterBenchmark extends BenchmarkBase
{
TimestampNTZType,
descriptor(PrimitiveTypeName.INT32, LogicalTypeAnnotation.dateType()),
longVec, intBytes)
+ addReadValuesCase(benchmark, "LongAsNanosUpdater (TimeType)",
+ TimeType(),
+ descriptor(PrimitiveTypeName.INT64,
LogicalTypeAnnotation.timeType(false, LogicalTypeAnnotation.TimeUnit.MICROS)),
Review Comment:
This line is 120 chars, over scalastyle's 100-char limit, so
`dev/lint-scala` will fail — my earlier single-line suggestion was over-long,
sorry. Wrapped to match the `timestampType` cases below:
```suggestion
descriptor(PrimitiveTypeName.INT64,
LogicalTypeAnnotation.timeType(false,
LogicalTypeAnnotation.TimeUnit.MICROS)),
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]