uros-b commented on code in PR #56661:
URL: https://github.com/apache/spark/pull/56661#discussion_r3475297273
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/types/ops/ParquetTypeOps.scala:
##########
@@ -200,6 +183,32 @@ private[parquet] trait ParquetTypeOps extends Serializable {
* Primitive types return None (no sub-fields to clip).
*/
def parquetStructSchema: Option[StructType] = None
+
+ // ==================== Vectorized Read ====================
+
+ /**
+ * Whether vectorized (batch) reading is supported for this type.
+ * Used by ParquetUtils.isBatchReadSupported. Default is false - types must opt in
+ * by overriding to true. When false, Spark uses the row-based read path (newConverter)
+ * which is always available.
+ *
+ * A type that returns true must also supply a batch decoder via [[getVectorUpdater]]
+ * (dispatched from ParquetVectorUpdaterFactory.getUpdater); otherwise the vectorized factory
+ * would not recognize it. TimeType returns true and overrides getVectorUpdater accordingly.
+ *
+ * @param sqlConf the active SQL configuration
+ */
+ def isBatchReadSupported(sqlConf: SQLConf): Boolean = false
+
+ /**
+ * The vectorized (batch) [[ParquetVectorUpdater]] for this type, or None to fall back to the
+ * built-in `ParquetVectorUpdaterFactory`. A type that returns Some here should also return
Review Comment:
Docstring nit: getVectorUpdater's "A type that returns Some here should also
return true from isBatchReadSupported" is really a two-way invariant. Supplying
an updater without flipping the gate means the vectorized path is never taken.
Flipping the gate without an updater (and without legacy factory support)
routes into a factory that won't recognize the type. Worth stating as a
bidirectional contract in both docstrings.
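
To make the two failure modes concrete, here is a rough sketch of the invariant
as a check. It is not the real Spark code: `TypeOpsSketch`, `vectorUpdater`,
`ContractCheck`, and the `legacyFactorySupport` flag are hypothetical stand-ins,
and the updater is modeled as a plain `String` rather than a
`ParquetVectorUpdater`.

```scala
// Hypothetical, simplified stand-in for ParquetTypeOps.
// These are NOT the real Spark signatures (the real gate takes a SQLConf).
trait TypeOpsSketch {
  // The gate: whether the vectorized read path may be used for this type.
  def isBatchReadSupported: Boolean = false
  // Stand-in for Option[ParquetVectorUpdater].
  def vectorUpdater: Option[String] = None
}

object ContractCheck {
  /**
   * Returns a description of the contract violation, or None if the
   * gate and the updater are consistent with each other.
   *
   * legacyFactorySupport is an assumed flag meaning "the built-in
   * factory already knows how to decode this type".
   */
  def violation(ops: TypeOpsSketch, legacyFactorySupport: Boolean): Option[String] =
    (ops.isBatchReadSupported, ops.vectorUpdater.isDefined) match {
      case (false, true) =>
        // Updater supplied, gate off: the updater is dead code.
        Some("updater supplied but gate is off: vectorized path is never taken")
      case (true, false) if !legacyFactorySupport =>
        // Gate on, no updater, no built-in support: nothing can decode the type.
        Some("gate is on but no updater: factory will not recognize the type")
      case _ =>
        None
    }
}
```

Something along these lines could live in a unit test that iterates over the
registered type ops, assuming such a registry is available to the test.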