Datasets and columns
Assume I have the following code (note that createDataset lives on SQLContext, not on the JavaSparkContext, and the generic parameters are restored here):

    SparkConf sparkConf = new SparkConf();
    JavaSparkContext sc = new JavaSparkContext(sparkConf);
    SQLContext sqlCtx = new SQLContext(sc);

    JavaRDD<MyType> rddMyType = generateRDD(); // some code

    Encoder<MyType> evidence = Encoders.kryo(MyType.class);
    Dataset<MyType> datasetMyType = sqlCtx.createDataset(rddMyType.rdd(), evidence);

Now I have a Dataset of MyType, and assume there is some data. Assume MyType has bean fields with getters and setters, as well as some internal collections and other data. What can I say about datasetMyType?

Does datasetMyType have columns, and if so, what are they? If not, are there other ways to make a Dataset with columns, and if so, what are they?
Re: Datasets and columns
The encoder is responsible for mapping your class onto some set of columns. Try running:

    datasetMyType.printSchema()

On Mon, Jan 25, 2016 at 1:16 PM, Steve Lewis wrote:
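For reference, a Kryo encoder serializes the whole object into a single binary column, so on a Dataset built with `Encoders.kryo` the printed schema typically looks like:

```
root
 |-- value: binary (nullable = true)
```

That single opaque `value` column is why the bean encoder (discussed below in the thread) is preferable when you want per-field columns.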
Re: Datasets and columns
There is no public API for custom encoders yet, but since your class looks like a bean you should be able to use the `bean` method instead of `kryo`. This will expose the actual columns. On Mon, Jan 25, 2016 at 2:04 PM, Steve Lewis wrote:
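`Encoders.bean` derives its columns from standard JavaBean properties (getter/setter pairs), so you can preview the column names it would produce using plain JDK introspection, with no Spark on the classpath. A minimal sketch, where `MyType` is a hypothetical stand-in bean:

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BeanColumns {

    // Hypothetical stand-in for MyType: a bean with getters and setters.
    public static class MyType {
        private String name;
        private int count;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getCount() { return count; }
        public void setCount(int count) { this.count = count; }
    }

    // Lists the JavaBean property names of a class -- the names the
    // bean encoder would use as column names. Object.class is the stop
    // class, so the inherited "class" pseudo-property is excluded.
    public static List<String> beanProperties(Class<?> cls)
            throws IntrospectionException {
        BeanInfo info = Introspector.getBeanInfo(cls, Object.class);
        List<String> names = new ArrayList<>();
        for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
            names.add(pd.getName());
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(beanProperties(MyType.class)); // prints [count, name]
    }
}
```

With the Spark dependencies in place, the change to the original snippet is just swapping `Encoders.kryo(MyType.class)` for `Encoders.bean(MyType.class)`; each bean property then becomes its own column instead of one binary blob.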
Re: Datasets and columns
OK, when I look at the schema it looks like Kryo makes one column. Is there a way to do a custom encoder with my own columns? On Jan 25, 2016 1:30 PM, "Michael Armbrust" wrote: