Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9190#discussion_r42778416

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/Encoder.scala ---
@@ -46,13 +47,27 @@ trait Encoder[T] {

   /**
    * Returns an object of type `T`, extracting the required values from the provided row. Note that
-   * you must bind the encoder to a specific schema before you can call this function.
+   * you must `bind` an encoder to a specific schema before you can call this function.
    */
   def fromRow(row: InternalRow): T

   /**
    * Returns a new copy of this encoder, where the expressions used by `fromRow` are bound to the
-   * given schema
+   * given schema.
    */
   def bind(schema: Seq[Attribute]): Encoder[T]
--- End diff --

I agree that this needs to be reworked. In particular, we should separate resolution from binding (as mentioned in the PR description). The way we do it today allows very efficient codegen (no extra copies) and correctly handles cases like joins that produce ambiguous column names (since internally we are binding to AttributeReferences). Given the limited time before the 1.6 code freeze, I'd rather mark the Encoder API as private and focus on fleshing out the user-facing API. I think that long term we'll do what you suggest: add a wrapper that reorders input for custom encoders, and stick with pure expressions for the built-in encoders for performance reasons.
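To make the bind-then-fromRow contract and the "reordering wrapper" idea concrete, here is a minimal, hypothetical sketch. It does not use Spark's actual types (`Row`, `Attribute`, and `PairEncoder` below are stand-ins, and the real implementation binds Catalyst expressions, not ordinals); it only illustrates resolving column names to positions at bind time so `fromRow` can read a row whose columns arrive in any order:

```scala
object EncoderSketch {
  // Stand-ins for Spark's InternalRow and Attribute (not the real types).
  case class Attribute(name: String)
  type Row = Seq[Any]

  trait Encoder[T] {
    /** Must be bound to a specific schema before calling. */
    def fromRow(row: Row): T
    /** Returns a copy whose fromRow logic is bound to `schema`. */
    def bind(schema: Seq[Attribute]): Encoder[T]
  }

  /** Toy encoder for (name, age) pairs; expects columns "name" and "age". */
  class PairEncoder(ordinals: Option[(Int, Int)] = None)
      extends Encoder[(String, Int)] {

    def fromRow(row: Row): (String, Int) = ordinals match {
      case Some((n, a)) => (row(n).asInstanceOf[String], row(a).asInstanceOf[Int])
      case None => sys.error("encoder is unbound; call bind(schema) first")
    }

    // The "reordering wrapper": resolve names to ordinals once, at bind time,
    // so fromRow itself does no name lookups.
    def bind(schema: Seq[Attribute]): Encoder[(String, Int)] = {
      val n = schema.indexWhere(_.name == "name")
      val a = schema.indexWhere(_.name == "age")
      require(n >= 0 && a >= 0, s"schema $schema is missing a required column")
      new PairEncoder(Some((n, a)))
    }
  }

  def main(args: Array[String]): Unit = {
    // Columns arrive in a different order than the encoder's natural one.
    val enc = new PairEncoder().bind(Seq(Attribute("age"), Attribute("name")))
    println(enc.fromRow(Seq(42, "Ada"))) // prints (Ada,42)
  }
}
```

In this sketch the per-row cost after binding is just two positional reads, which is the property the expression-based approach preserves while the name resolution happens only once.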