Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/13512
  
    The `Aggregator` API is designed for the typed `Dataset` only, not for the untyped 
`DataFrame`. It can work if users use `Row` as the input type of the `Aggregator`, 
but that's not a recommended usage.
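
    As a rough sketch of the `Row`-as-input workaround mentioned above (names like `RowSum` and the column `"value"` are hypothetical, and the exact wiring to a `DataFrame` varies by Spark version):

    ```scala
    import org.apache.spark.sql.{Encoder, Encoders, Row}
    import org.apache.spark.sql.expressions.Aggregator

    // A sum over a DataFrame column, using Row as the Aggregator's input type.
    // This can compile and run, but it hard-codes the frame's schema into the
    // Aggregator, which is why it isn't the recommended usage.
    object RowSum extends Aggregator[Row, Long, Long] {
      def zero: Long = 0L
      def reduce(buf: Long, r: Row): Long = buf + r.getLong(r.fieldIndex("value"))
      def merge(b1: Long, b2: Long): Long = b1 + b2
      def finish(buf: Long): Long = buf
      def bufferEncoder: Encoder[Long] = Encoders.scalaLong
      def outputEncoder: Encoder[Long] = Encoders.scalaLong
    }
    ```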
    
    On the other hand, it's dangerous to use a different encoder when applying a 
typed operation on a `Dataset`. Let's say we have a `Dataset[T]` called `ds`, and 
its data is encoded from instances of `T` by `enc1`. Now you apply an 
`Aggregator` on `ds`, with input encoder `enc2`. We cannot guarantee that 
`enc2` can decode the data of `ds` back to `T` instances, because the data was 
encoded by `enc1`.
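
    To make the `enc1`/`enc2` hazard above concrete, here is a hypothetical sketch (the case class `T` and the swapped-field schema are illustrative assumptions, not code from this PR):

    ```scala
    import org.apache.spark.sql.{Encoder, Encoders, SparkSession}

    case class T(a: Int, b: String)

    val spark = SparkSession.builder.master("local[1]").getOrCreate()

    // enc1: the encoder captured when ds was created. Its serialized layout
    // follows T's field order (a: Int, b: String).
    val enc1: Encoder[T] = Encoders.product[T]
    val ds = spark.createDataset(Seq(T(1, "x")))(enc1)

    // Suppose an Aggregator declared a different input encoder, enc2, whose
    // schema disagrees with enc1 (e.g. different field order or types).
    // Nothing ties enc2 to the binary rows already produced by enc1, so
    // decoding ds's data with enc2 could silently misread columns or fail
    // at runtime -- which is exactly why mixing encoders is dangerous.
    ```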
    
    We can discuss it further if you have other ideas, or close this PR if you 
think my explanation makes sense. Thanks for working on it anyway!

