uros-b commented on code in PR #56661:
URL: https://github.com/apache/spark/pull/56661#discussion_r3475297273


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/types/ops/ParquetTypeOps.scala:
##########
@@ -200,6 +183,32 @@ private[parquet] trait ParquetTypeOps extends Serializable {
    * Primitive types return None (no sub-fields to clip).
    */
   def parquetStructSchema: Option[StructType] = None
+
+  // ==================== Vectorized Read ====================
+
+  /**
+   * Whether vectorized (batch) reading is supported for this type.
+   * Used by ParquetUtils.isBatchReadSupported. Default is false - types must opt in
+   * by overriding to true. When false, Spark uses the row-based read path (newConverter)
+   * which is always available.
+   *
+   * A type that returns true must also supply a batch decoder via [[getVectorUpdater]]
+   * (dispatched from ParquetVectorUpdaterFactory.getUpdater); otherwise the vectorized factory
+   * would not recognize it. TimeType returns true and overrides getVectorUpdater accordingly.
+   *
+   * @param sqlConf the active SQL configuration
+   */
+  def isBatchReadSupported(sqlConf: SQLConf): Boolean = false
+
+  /**
+   * The vectorized (batch) [[ParquetVectorUpdater]] for this type, or None to fall back to the
+   * built-in `ParquetVectorUpdaterFactory`. A type that returns Some here should also return

Review Comment:
   Docstring nit: getVectorUpdater's "A type that returns Some here should also
return true from isBatchReadSupported" is really a two-way invariant. Supplying
an updater without flipping the gate means the vectorized path is never taken,
and flipping the gate without an updater (and without legacy factory support)
routes into a factory that won't recognize the type. Worth stating as a
bidirectional contract.
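A rough sketch of how the two directions could be phrased and checked, for example in a test that iterates over every registered ops instance. Everything below is hypothetical: `FakeConf`, `VectorUpdater`, `TypeOpsSketch`, `legacyFactorySupport` and `ContractCheck` are simplified stand-ins, not Spark's `SQLConf`, `ParquetVectorUpdater` or `ParquetTypeOps`.

```scala
// Hypothetical placeholders so the sketch compiles on its own; these are NOT
// Spark's SQLConf / ParquetVectorUpdater.
final case class FakeConf(vectorizedEnabled: Boolean)
final case class VectorUpdater(name: String)

trait TypeOpsSketch {
  /**
   * Gate for the vectorized path.
   * Contract (both directions): returns true iff the type can be decoded in batch,
   * i.e. getVectorUpdater is Some OR the built-in factory already handles the type.
   */
  def isBatchReadSupported(conf: FakeConf): Boolean = false

  /**
   * Batch decoder for this type, or None to fall back to the built-in factory.
   * Contract: returning Some is pointless unless isBatchReadSupported is also true.
   */
  def getVectorUpdater: Option[VectorUpdater] = None

  /** Hypothetical flag: whether the built-in (legacy) factory recognizes this type. */
  def legacyFactorySupport: Boolean = false
}

object ContractCheck {
  // Lists each direction of the invariant that `ops` breaks under `conf`.
  def violations(ops: TypeOpsSketch, conf: FakeConf): List[String] = {
    val gate = ops.isBatchReadSupported(conf)
    val hasUpdater = ops.getVectorUpdater.isDefined
    List(
      if (hasUpdater && !gate)
        Some("updater supplied but gate is false: vectorized path is never taken")
      else None,
      if (gate && !hasUpdater && !ops.legacyFactorySupport)
        Some("gate is true but no updater and no legacy support: factory won't recognize the type")
      else None
    ).flatten
  }
}

// Example ops covering each case.
object DefaultOps extends TypeOpsSketch
object ConsistentOps extends TypeOpsSketch {
  override def isBatchReadSupported(conf: FakeConf): Boolean = true
  override def getVectorUpdater: Option[VectorUpdater] = Some(VectorUpdater("time"))
}
object UpdaterWithoutGate extends TypeOpsSketch {
  override def getVectorUpdater: Option[VectorUpdater] = Some(VectorUpdater("orphan"))
}
object GateWithoutUpdater extends TypeOpsSketch {
  override def isBatchReadSupported(conf: FakeConf): Boolean = true
}
object LegacyOnly extends TypeOpsSketch {
  override def isBatchReadSupported(conf: FakeConf): Boolean = true
  override def legacyFactorySupport: Boolean = true
}
```

The `legacyFactorySupport` flag models the "without legacy factory support" escape hatch from the comment. If a real gate depends on the SQL conf, such a test would presumably need to evaluate it under each configuration it cares about.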



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
