To those interested: I converted the data frame to an RDD, then created a new
data frame from it, since that path has an option of supplying a schema.

But someone should probably improve how the .as function is to be used.
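A minimal sketch of that workaround in the Java API, assuming a SparkSession `spark`, an existing Dataset<Row> `df`, and illustrative column names and types (`name`, `age`):

```java
import java.util.Arrays;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// Convert the selected columns to an RDD of Rows, parsing types by hand.
JavaRDD<Row> rdd = df.select("name", "age").javaRDD()
    .map(r -> RowFactory.create(r.getString(0), Integer.parseInt(r.getString(1))));

// Rebuild a DataFrame with an explicit schema for just those columns.
StructType schema = DataTypes.createStructType(Arrays.asList(
    DataTypes.createStructField("name", DataTypes.StringType, true),
    DataTypes.createStructField("age", DataTypes.IntegerType, true)));

Dataset<Row> typed = spark.createDataFrame(rdd, schema);
```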

On Mon, Aug 8, 2016 at 1:05 PM, Ewan Leith <[email protected]>
wrote:

> Hmm, I’m not sure, I don’t use the Java API, sorry.
>
>
>
> The simplest way to work around it would be to read the CSV as a text file
> using sparkContext.textFile, split each row on commas, then convert
> it to a Dataset afterwards.
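The splitting step of this workaround can be sketched outside Spark; note that a naive comma split does not handle quoted fields that themselves contain commas:

```java
import java.util.Arrays;
import java.util.List;

public class CsvSplit {
    // Naive CSV row split: the -1 limit keeps trailing empty fields,
    // but quoted fields containing commas are not handled.
    static List<String> splitRow(String line) {
        return Arrays.asList(line.split(",", -1));
    }

    public static void main(String[] args) {
        System.out.println(splitRow("a,b,,d")); // [a, b, , d]
    }
}
```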
>
>
>
> *From:* Aseem Bansal [mailto:[email protected]]
> *Sent:* 08 August 2016 07:37
> *To:* Ewan Leith <[email protected]>
> *Cc:* user <[email protected]>
> *Subject:* Re: Spark 2.0.0 - Apply schema on few columns of dataset
>
>
>
> Hi Ewan
>
>
>
> The .as function takes a single encoder, a single string, or a single
> Symbol. I have more than 10 columns, so I cannot use the tuple
> functions. Passing them in brackets does not work.
>
>
>
> On Mon, Aug 8, 2016 at 11:26 AM, Ewan Leith <[email protected]>
> wrote:
>
> Looking at the encoders api documentation at
>
> http://spark.apache.org/docs/latest/api/java/
>
> In Java, encoders are specified by calling static methods on Encoders
> <http://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/Encoders.html>.
>
> List<String> data = Arrays.asList("abc", "abc", "xyz");
> Dataset<String> ds = context.createDataset(data, Encoders.STRING());
>
> I think you should be calling
>
> .as((Encoders.STRING(), Encoders.STRING()))
>
> or similar
>
> Ewan
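For reference, the tuple form that .as() actually accepts in Java is built with Encoders.tuple rather than a bracketed pair of encoders; a sketch assuming a Dataset<Row> `df` with two string columns `col1` and `col2`:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import scala.Tuple2;

// Tuple encoders exist only for 2-5 components, so this approach does
// not scale to the 10+ columns mentioned elsewhere in the thread.
Dataset<Tuple2<String, String>> pairs =
    df.select("col1", "col2")
      .as(Encoders.tuple(Encoders.STRING(), Encoders.STRING()));
```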
>
>
>
> On 8 Aug 2016 06:10, Aseem Bansal <[email protected]> wrote:
>
> Hi All
>
>
>
> Has anyone done this with Java API?
>
>
>
> On Fri, Aug 5, 2016 at 5:36 PM, Aseem Bansal <[email protected]> wrote:
>
> I need to use a few columns out of a CSV. But as there is no option to read
> only a few columns of a CSV:
>
>  1. I am reading the whole CSV using SparkSession.csv()
>
>  2. selecting a few of the columns using DataFrame.select()
>
>  3. applying a schema using the .as() function of Dataset<Row>. I tried to
> extend org.apache.spark.sql.Encoder as the input for the as function
>
>
>
> But I am getting the following exception
>
>
>
> Exception in thread "main" java.lang.RuntimeException: Only expression
> encoders are supported today
>
>
>
> So my questions are:
>
> 1. Is it possible to read only a few columns instead of the whole CSV? I
> cannot change the CSV as that is upstream data.
>
> 2. How do I apply a schema to a few columns if I cannot write my own encoder?
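One way to apply a schema to a few columns without writing a custom encoder, not taken from this thread so treat it as a hedged suggestion, is a bean encoder over a small POJO holding just the selected columns (the class, column names, and file path here are hypothetical):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;

// Hypothetical POJO for the selected columns; bean encoders need
// public getters/setters and a no-arg constructor.
public class Person implements java.io.Serializable {
    private String name;
    private Integer age;
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public Integer getAge() { return age; }
    public void setAge(Integer age) { this.age = age; }
}

// Read the whole CSV, select and cast the wanted columns, then map
// them onto the bean with Encoders.bean instead of a custom Encoder.
Dataset<Person> people = spark.read().option("header", "true").csv("data.csv")
    .selectExpr("name", "cast(age as int) as age")
    .as(Encoders.bean(Person.class));
```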
