sadikovi commented on code in PR #36427:
URL: https://github.com/apache/spark/pull/36427#discussion_r869758806
##########
sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java:
##########
@@ -310,6 +311,10 @@ public final CalendarInterval getInterval(int rowId) {
* Sets up the data type of this column vector.
*/
protected ColumnVector(DataType type) {
- this.type = type;
+ if (type instanceof UserDefinedType) {
Review Comment:
My understanding is that ArrowEvalPythonExec/EvalPythonExec works with a list of
attributes as its output, which is the actual Spark schema rather than the
column vectors' types, so this should work. I can add a test to make sure it
does.
I thought about moving this to `reserveInternal`, but then I would need to
handle it in both the off-heap and on-heap implementations and call the method
recursively. It seemed simpler to convert the type directly in `ColumnVector`
and use the expanded type everywhere.
Let me know if you would like me to follow up on anything mentioned above.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]