MaxGekk commented on code in PR #55923:
URL: https://github.com/apache/spark/pull/55923#discussion_r3403286111


##########
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetVectorUpdaterFactory.java:
##########
@@ -470,8 +470,10 @@ public void readValues(
         int offset,
         WritableColumnVector values,
         VectorizedValuesReader valuesReader) {
-      for (int i = 0; i < total; ++i) {
-        readValue(offset + i, values, valuesReader);
+      valuesReader.readIntegersAsLongs(total, values, offset);

Review Comment:
   Now that this two-pass pattern is in place, `DateToTimestampNTZUpdater` 
(optimized in SPARK-56804 via the reader-side `readIntegersAsTimestampMicros`) 
is the odd one out — worth a follow-up migrating it to the same two-pass form 
and deleting `readIntegersAsTimestampMicros` from `VectorizedValuesReader`. The 
two-pass form is the stronger pattern: the bulk reads are mandatory interface 
methods (or have safe per-row defaults), so every encoding gets the speedup 
rather than only readers that override, and it adds no per-conversion interface 
methods. Not asking for it in this PR.



##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetVectorUpdaterBenchmark.scala:
##########
@@ -264,6 +264,10 @@ object ParquetVectorUpdaterBenchmark extends BenchmarkBase 
{
       TimestampNTZType,
       descriptor(PrimitiveTypeName.INT32, LogicalTypeAnnotation.dateType()),
       longVec, intBytes)
+    addReadValuesCase(benchmark, "LongAsNanosUpdater (TimeType)",
+      TimeType(),
+      descriptor(PrimitiveTypeName.INT64, 
LogicalTypeAnnotation.timeType(false, LogicalTypeAnnotation.TimeUnit.MICROS)),

Review Comment:
   This line is 120 chars, over scalastyle's 100-char limit, so 
`dev/lint-scala` will fail — my earlier single-line suggestion was over-long, 
sorry. Wrapped to match the `timestampType` cases below:
   ```suggestion
         descriptor(PrimitiveTypeName.INT64,
           LogicalTypeAnnotation.timeType(false, 
LogicalTypeAnnotation.TimeUnit.MICROS)),
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to