erikerlandson commented on issue #25024: [SPARK-27296][SQL] User Defined Aggregators that do not ser/de on each input row
URL: https://github.com/apache/spark/pull/25024#issuecomment-509416668

To elaborate on the 'raw object reference' above, what I specifically did was try using a DataType like `ObjectType(classOf[TDigest])` in the mutable agg buffer schema. That immediately fails here:
https://github.com/apache/spark/blob/3139d642fac0e6ae6b9edd1b4c2912c3a69f71e5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala#L215

For fun I tried defaulting that to "identity" for `ObjectType`, and it gets farther but then it fails way down in code generation:

```
ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 37, Column 24: No applicable constructor/method found for actual parameters "int, org.isarnproject.sketches.TDigest"
```

So that is a flavor of catalyst's problem with handling anything outside its defined universe of data types.
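
For readers who want to see the shape of the attempt described above, here is a minimal sketch of a `UserDefinedAggregateFunction` whose buffer schema declares an `ObjectType` field. It is an illustration, not the code from the experiment: `MeanSketch` and `MeanSketchUDAF` are hypothetical names, with `MeanSketch` standing in for `org.isarnproject.sketches.TDigest`, and it assumes the Spark 2.4-era UDAF API. It should compile against Spark, but going by the comment, using it in a query would be expected to fail when the row encoder is built for the buffer schema.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types.{DataType, DoubleType, ObjectType, StructField, StructType}

// Hypothetical stand-in for org.isarnproject.sketches.TDigest.
final class MeanSketch(var sum: Double, var count: Long) extends Serializable

// Hypothetical UDAF that keeps a raw object reference in its aggregation buffer.
class MeanSketchUDAF extends UserDefinedAggregateFunction {
  override def inputSchema: StructType =
    StructType(StructField("x", DoubleType) :: Nil)

  // The buffer field is declared as a raw JVM object rather than a Catalyst type.
  // This is the declaration the comment reports RowEncoder rejecting.
  override def bufferSchema: StructType =
    StructType(StructField("sketch", ObjectType(classOf[MeanSketch])) :: Nil)

  override def dataType: DataType = DoubleType
  override def deterministic: Boolean = true

  override def initialize(buffer: MutableAggregationBuffer): Unit =
    buffer(0) = new MeanSketch(0.0, 0L)

  // Intended behavior: mutate the object in place, with no ser/de per input row.
  override def update(buffer: MutableAggregationBuffer, input: Row): Unit =
    if (!input.isNullAt(0)) {
      val s = buffer.getAs[MeanSketch](0)
      s.sum += input.getDouble(0)
      s.count += 1
      buffer(0) = s
    }

  override def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    val a = buffer1.getAs[MeanSketch](0)
    val b = buffer2.getAs[MeanSketch](0)
    a.sum += b.sum
    a.count += b.count
    buffer1(0) = a
  }

  override def evaluate(buffer: Row): Any = {
    val s = buffer.getAs[MeanSketch](0)
    if (s.count == 0) null else s.sum / s.count
  }
}
```

The only unusual part is `bufferSchema`; everything else follows the ordinary UDAF contract. The supported alternative is to declare the buffer field with a Catalyst type such as `BinaryType` and serialize the object on every update, which is the per-row ser/de cost this PR aims to avoid.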
