[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...

michalsenkyr Sat, 10 Dec 2016 13:50:28 -0800

Github user michalsenkyr commented on the issue:

    https://github.com/apache/spark/pull/16240
  
    Added support for arbitrary sequences.
    
    Queues, ArrayBuffers and other sequence types can now also be used in
    Datasets (all of them are serialized into `ArrayType`).
    
    I had to alter the existing implicit encoders in `SQLImplicits` and add
    new ones. The new encoders cover types that are both a `Seq` and a
    `Product` (essentially only `List`), in order to disambiguate between the
    `Seq` and `Product` encoders.
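A minimal, Spark-free sketch of the ambiguity and of how a combined encoder can resolve it. `Enc`, `productEnc`, `seqEnc`, `productSeqEnc`, `Point` and `EncoderDemo` are hypothetical toy names, not Spark's `SQLImplicits` API. The sketch assumes the combined encoder gets priority by living in an object that extends the trait holding the other two (the common "low-priority implicits" pattern); the comment does not say how the PR orders its implicits. It uses `::` (the non-empty list case class) because it is both a `Seq` and a `Product` regardless of how `List` itself is declared in a given Scala version.

```scala
// Toy stand-in for Spark's Encoder type class (hypothetical, not Spark's API).
trait Enc[T] { def name: String }

trait LowPriorityEncs {
  // Matches any Product (case classes, tuples); Spark encodes these as structs.
  implicit def productEnc[T <: Product]: Enc[T] =
    new Enc[T] { def name: String = "productEnc" }

  // Matches any Seq; per the comment, sequences are serialized into ArrayType.
  implicit def seqEnc[T <: Seq[Any]]: Enc[T] =
    new Enc[T] { def name: String = "seqEnc" }
}

object Enc extends LowPriorityEncs {
  // For a type that is both a Seq and a Product, productEnc and seqEnc alone
  // would be ambiguous. This object derives from LowPriorityEncs, so Scala
  // ranks this implicit above both of them (assumed ordering, see lead-in).
  implicit def productSeqEnc[T <: Seq[Any] with Product]: Enc[T] =
    new Enc[T] { def name: String = "productSeqEnc" }
}

final case class Point(x: Int, y: Int)

object EncoderDemo {
  // Reports which implicit encoder the compiler picked for the static type T.
  def kindOf[T](value: T)(implicit enc: Enc[T]): String = enc.name

  def main(args: Array[String]): Unit = {
    println(kindOf(Point(1, 2)))  // expected: productEnc
    println(kindOf(Vector(1, 2))) // expected: seqEnc
    println(kindOf(::(1, Nil)))   // expected: productSeqEnc
  }
}
```

Without `productSeqEnc`, the call `kindOf(::(1, Nil))` should be rejected as ambiguous, since `productEnc` and `seqEnc` both match and sit at the same priority.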
    
    However, I ran into a problem with implicits. When a complex Dataset that
    includes a `Product` (such as a case class) and a sequence is constructed
    using `Seq.toDS`, the encoder does not seem to be created. There is no
    problem when the same Dataset is constructed with `spark.createDataset`
    or obtained by transforming an existing Dataset.
    
    As a workaround, I defined a specific implicit just for `Seq`s. This makes
    the problem go away for existing usages; however, other collections cannot
    be constructed with `Seq.toDS` unless `newProductSeqEncoder[A, T]` is
    created with the correct type parameters.
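Creating the encoder yourself amounts to passing an instance explicitly, with its type parameters spelled out, in the implicit parameter list. A minimal sketch of that call shape, with hypothetical toy names (`ToyEncoder`, `seqEncoder`, `toyToDS`) standing in for Spark's encoders and `toDS`. The real signature of `newProductSeqEncoder[A, T]` is not shown in the comment, so this sketch uses a single type parameter. In this toy the implicit lookup also succeeds; the explicit call only illustrates the shape of the workaround, not the `Seq.toDS` failure itself.

```scala
import scala.collection.immutable.Queue

// Toy type class standing in for a Spark Encoder (hypothetical, not Spark's API).
trait ToyEncoder[T] { def name: String }

object ToyEncoder {
  // Generic encoder for any Seq subtype; T is normally inferred at the call site.
  implicit def seqEncoder[T <: Seq[Any]]: ToyEncoder[T] =
    new ToyEncoder[T] { def name: String = "seqEncoder" }
}

object ExplicitEncoderDemo {
  // Hypothetical stand-in for a `toDS`-style helper that needs an encoder
  // for the element type of the local collection.
  def toyToDS[T](data: Seq[T])(implicit enc: ToyEncoder[T]): String =
    s"${enc.name} x ${data.size}"

  def main(args: Array[String]): Unit = {
    val queues = Seq(Queue(1, 2), Queue(3))
    // Normal path: the compiler resolves the encoder implicitly.
    println(toyToDS(queues))
    // Workaround path: build the encoder with explicit type arguments and
    // pass it directly in the implicit parameter list.
    println(toyToDS(queues)(ToyEncoder.seqEncoder[Queue[Int]]))
  }
}
```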
    
    If anybody knows how to fix this, let me know.


