Github user michalsenkyr commented on the issue:
https://github.com/apache/spark/pull/16240
Added support for arbitrary sequences.
Queues, ArrayBuffers, and other `Seq` subtypes can now be used in Datasets (all
are serialized as `ArrayType`).
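
For illustration, a minimal sketch of what this is meant to enable. It assumes a Spark build that contains this change, compiled against Scala 2.11 or 2.12, with Spark on the classpath. The object and method names (`SequenceDatasetSketch`, `build`) are made up for the example and are not part of the PR.

```scala
import scala.collection.immutable.Queue
import scala.collection.mutable.ArrayBuffer

import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical example object, not part of the PR.
object SequenceDatasetSketch {

  // Builds one Dataset per collection type, relying on the implicit
  // sequence encoder brought in by spark.implicits._
  def build(spark: SparkSession): (Dataset[ArrayBuffer[Int]], Dataset[Queue[String]]) = {
    import spark.implicits._
    val buffers = spark.createDataset(Seq(ArrayBuffer(1, 2, 3), ArrayBuffer(4)))
    val queues  = spark.createDataset(Seq(Queue("a", "b"), Queue("c")))
    (buffers, queues)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("sequence-dataset-sketch")
      .getOrCreate()
    try {
      val (buffers, queues) = build(spark)
      // Each Dataset has a single column; per the comment above it should
      // be stored as an ArrayType.
      buffers.printSchema()
      queues.printSchema()
    } finally {
      spark.stop()
    }
  }
}
```

`spark.createDataset` is used here rather than `Seq.toDS` because it is the construction path reported as working below.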
I had to alter the existing implicit encoders in `SQLImplicits` and add new
ones. The new encoders cover types that are both a `Seq` and a `Product`
(essentially only `List`), which would otherwise be ambiguous between the `Seq`
and `Product` encoders.
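
To make the ambiguity concrete, here is a Spark-free sketch of the idea, assuming Scala 2 implicit resolution rules. `Enc` is a hypothetical stand-in for Spark's `Encoder`, and the three implicit names are invented for the example. They are not the signatures used in `SQLImplicits`.

```scala
// Hypothetical stand-in for an encoder; it only records which rule produced it.
final class Enc[T](val kind: String)

object Enc {
  // Rule for any sequence type.
  implicit def seqEnc[T <: Seq[_]]: Enc[T] = new Enc[T]("seq")

  // Rule for any Product (case classes, tuples).
  implicit def productEnc[T <: Product]: Enc[T] = new Enc[T]("product")

  // Rule for types that are both. Its bound is narrower than either bound
  // above, so the compiler should prefer it over the other two instead of
  // reporting an ambiguity. Removing it is expected to make the `::[Int]`
  // lookup below fail to compile.
  implicit def productSeqEnc[T <: Seq[_] with Product]: Enc[T] =
    new Enc[T]("seq+product")
}

object EncoderAmbiguitySketch {
  // Asks the compiler to pick an encoder for T and reports which rule fired.
  def kindOf[T](implicit enc: Enc[T]): String = enc.kind

  def main(args: Array[String]): Unit = {
    println(kindOf[Vector[Int]])    // Vector is a Seq but not a Product
    println(kindOf[(Int, String)])  // a tuple is a Product but not a Seq
    println(kindOf[::[Int]])        // the List cons cell is both
  }
}
```

The cons cell `::` is used instead of `List` because it is a case class, and therefore both a `Seq` and a `Product`, regardless of the Scala version.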
However, I encountered a problem with implicits. When constructing a
complex Dataset using `Seq.toDS` that includes both a `Product` (like a case
class) and a sequence, the encoder doesn't seem to be created. When the Dataset
is constructed with `spark.createDataset`, or by transforming an existing
Dataset, there is no problem.
I added a workaround by defining a specific implicit just for `Seq`s. This
makes the problem go away for existing usages. However, other collections still
cannot be constructed by `Seq.toDS` unless `newProductSeqEncoder[A, T]` is
created explicitly with the correct type parameters.
If anybody knows how to fix this, let me know.