Datasets and columns

2016-01-25 Thread Steve Lewis
Assume I have the following code:

SparkConf sparkConf = new SparkConf();

JavaSparkContext sc = new JavaSparkContext(sparkConf);
SQLContext sqlCtx = new SQLContext(sc);

JavaRDD<MyType> rddMyType = generateRDD(); // some code

Encoder<MyType> evidence = Encoders.kryo(MyType.class);
Dataset<MyType> datasetMyType = sqlCtx.createDataset(rddMyType.rdd(), evidence);

Now I have a Dataset of MyType; assume it contains some data.

Assume MyType has bean fields with getters and setters, as well as some
internal collections and other data. What can I say about
datasetMyType?

Does datasetMyType have columns, and if so, what are they?

If not, are there other ways to make a Dataset with columns, and if so,
what are they?


Re: Datasets and columns

2016-01-25 Thread Michael Armbrust
The encoder is responsible for mapping your class onto some set of
columns.  Try running: datasetMyType.printSchema()
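
For example, a quick sketch against the code from the original message (the
exact output may vary by Spark version, but a Kryo encoder serializes the
whole object into a single binary column):

datasetMyType.printSchema();

// expected to print something roughly like:
// root
//  |-- value: binary (nullable = true)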



Re: Datasets and columns

2016-01-25 Thread Michael Armbrust
There is no public API for custom encoders yet, but since your class looks
like a bean you should be able to use the `bean` method instead of `kryo`.
This will expose the actual columns.
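
A minimal sketch of the bean-encoder approach, assuming MyType follows
JavaBean conventions (public no-arg constructor plus getters/setters) and
reusing generateRDD() from the original message; whether nested collections
inside the bean are supported depends on the Spark version:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SQLContext;

SparkConf sparkConf = new SparkConf();
JavaSparkContext sc = new JavaSparkContext(sparkConf);
SQLContext sqlCtx = new SQLContext(sc);

JavaRDD<MyType> rddMyType = generateRDD(); // some code

// Bean encoder: each bean property (getter/setter pair) becomes its own
// column, instead of the single opaque binary column produced by Kryo.
Encoder<MyType> beanEncoder = Encoders.bean(MyType.class);
Dataset<MyType> datasetMyType = sqlCtx.createDataset(rddMyType.rdd(), beanEncoder);

datasetMyType.printSchema(); // should now list one column per bean property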

On Mon, Jan 25, 2016 at 2:04 PM, Steve Lewis  wrote:

> OK, when I look at the schema it looks like Kryo makes one column. Is there
> a way to write a custom encoder with my own columns?


Re: Datasets and columns

2016-01-25 Thread Steve Lewis
OK, when I look at the schema it looks like Kryo makes one column. Is there a
way to write a custom encoder with my own columns?