Hi,

Consider the following code using spark.ml to get the probability column of a data set:
    model.transform(dataSet)
      .selectExpr("probability.values")
      .printSchema()

Note that "probability" is of `vector` type, a UDT with the following implementation:

    class VectorUDT extends UserDefinedType[Vector] {

      override def sqlType: StructType = {
        // type: 0 = sparse, 1 = dense
        // We only use "values" for dense vectors, and "size", "indices", and "values" for sparse
        // vectors. The "values" field is nullable because we might want to add binary vectors later,
        // which uses "size" and "indices", but not "values".
        StructType(Seq(
          StructField("type", ByteType, nullable = false),
          StructField("size", IntegerType, nullable = true),
          StructField("indices", ArrayType(IntegerType, containsNull = false), nullable = true),
          StructField("values", ArrayType(DoubleType, containsNull = false), nullable = true)))
      }

      // ...
    }

`values` is one of its fields; however, it cannot be extracted. The first code snippet fails in complexTypeExtractors with the following exception:

    org.apache.spark.sql.AnalysisException: Can't extract value from probability#743;
      at ...
      at ...
      at ...
      ...

Here is the code in question:

https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala#L49

It seems that the pattern matching there does not take UDTs into consideration. Is this intended behavior? If not, I would like to create a PR to fix it.

--
Hao Ren

Data Engineer @ leboncoin

Paris, France
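P.S. In case it helps anyone hitting the same limitation in the meantime: since the extractor rejects the UDT column itself, one workaround is to unwrap the vector into a plain array<double> with a UDF before selecting fields. This is only a sketch against the Spark 2.x Scala API (the UDF name `vectorToArray` is mine; `model` and `dataSet` are as in the snippet above):

```scala
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.functions.{col, udf}

// Convert the UDT-typed "probability" column into a plain array<double>.
// Ordinary extraction (element access, explode, etc.) works on array columns,
// so this sidesteps the AnalysisException thrown for the UDT.
val vectorToArray = udf { v: Vector => v.toArray }

model.transform(dataSet)
  .select(vectorToArray(col("probability")).as("values"))
  .printSchema()
```

The resulting `values` column has type array<double>, so expressions like `values[0]` resolve without hitting complexTypeExtractors' UDT gap.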