You can still use it as a Dataset[Set[X]]; all transformations should work correctly.
However, dataset.schema will show a binary type, and dataset.show will display raw bytes (unfortunately), because the whole Set is stored as a single Kryo-serialized column. For example:

scala> implicit def setEncoder[X]: Encoder[Set[X]] = Encoders.kryo[Set[X]]
setEncoder: [X]=> org.apache.spark.sql.Encoder[Set[X]]

scala> val x = Seq(Set(1,2,3)).toDS
x: org.apache.spark.sql.Dataset[scala.collection.immutable.Set[Int]] = [value: binary]

scala> x.map(_ + 4).collect
res17: Array[scala.collection.immutable.Set[Int]] = Array(Set(1, 2, 3, 4))

scala> x.show
+--------------------+
|               value|
+--------------------+
|[2A 01 03 02 02 0...|
+--------------------+

scala> x.schema
res19: org.apache.spark.sql.types.StructType = StructType(StructField(value,BinaryType,true))

On Wed, Feb 1, 2017 at 12:03 PM, Jerry Lam <chiling...@gmail.com> wrote:

> Hi Koert,
>
> Thanks for the tips. I tried to do that, but the column's type is now
> Binary. Do I get the Set[X] back in the Dataset?
>
> Best Regards,
>
> Jerry
>
>
> On Tue, Jan 31, 2017 at 8:04 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> Set is currently not supported. You can use the Kryo encoder. There is no
>> other workaround that I know of.
>>
>> import org.apache.spark.sql.{ Encoder, Encoders }
>> implicit def setEncoder[X]: Encoder[Set[X]] = Encoders.kryo[Set[X]]
>>
>> On Tue, Jan 31, 2017 at 7:33 PM, Jerry Lam <chiling...@gmail.com> wrote:
>>
>>> Hi guys,
>>>
>>> I got an exception like the following when I tried to implement a
>>> user-defined aggregation function:
>>>
>>> Exception in thread "main" java.lang.UnsupportedOperationException: No
>>> Encoder found for Set[(scala.Long, scala.Long)]
>>>
>>> The Set[(Long, Long)] is a field in the case class which is the output
>>> type for the aggregation.
>>>
>>> Is there a workaround for this?
>>>
>>> Best Regards,
>>>
>>> Jerry
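Applied to the original question (a case class with a Set[(Long, Long)] field as the output of an aggregation), a minimal sketch of the Kryo workaround might look like the following. It assumes the Spark 2.x typed `Aggregator` API; the names `Pairs`, `CollectPairs`, `SetKryoExample` and `collectPairs` are hypothetical. Rather than relying on an implicit `Encoder[Set[X]]`, it passes `Encoders.kryo` explicitly as the buffer and output encoders, since the encoder Spark derives for a case class does not, as far as I know, pick up an implicit encoder for an individual field (which is what produces the "No Encoder found" exception above).

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator

// Hypothetical output type mirroring the original question:
// a case class with a Set[(Long, Long)] field.
case class Pairs(pairs: Set[(Long, Long)])

// Hypothetical aggregator that collects the distinct (Long, Long) pairs it sees.
object CollectPairs extends Aggregator[(Long, Long), Set[(Long, Long)], Pairs] {
  def zero: Set[(Long, Long)] = Set.empty
  def reduce(acc: Set[(Long, Long)], pair: (Long, Long)): Set[(Long, Long)] = acc + pair
  def merge(a: Set[(Long, Long)], b: Set[(Long, Long)]): Set[(Long, Long)] = a ++ b
  def finish(acc: Set[(Long, Long)]): Pairs = Pairs(acc)

  // Assumption: no built-in encoder handles a Set (or a case class holding one),
  // so both the buffer and the output are Kryo-serialized into a binary column.
  def bufferEncoder: Encoder[Set[(Long, Long)]] = Encoders.kryo[Set[(Long, Long)]]
  def outputEncoder: Encoder[Pairs] = Encoders.kryo[Pairs]
}

object SetKryoExample {
  // Runs the aggregator over a Dataset of pairs; head() decodes the Kryo bytes
  // back into a Pairs instance on the driver.
  def collectPairs(spark: SparkSession, data: Seq[(Long, Long)]): Pairs = {
    import spark.implicits._
    val ds = data.toDS()
    ds.select(CollectPairs.toColumn).head()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("set-kryo").getOrCreate()
    try {
      val result = collectPairs(spark, Seq((1L, 2L), (3L, 4L), (1L, 2L)))
      println(result.pairs) // the two distinct pairs; element order may vary
    } finally {
      spark.stop()
    }
  }
}
```

The trade-off is the one described above: the aggregated value comes back as a proper Set once it is collected or mapped over in Scala, but inside the Dataset it is a single binary column, so it cannot be inspected with show or queried field by field in SQL.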