To those interested: I converted the DataFrame to an RDD and then created a new DataFrame from that RDD, which does allow a schema to be supplied. That said, someone should probably improve the documentation on how the .as function is meant to be used.
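Roughly, the approach looks like this. This is only a sketch: the file name, column names, and types below are examples rather than my real upstream schema.

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.RowFactory;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructField;
    import org.apache.spark.sql.types.StructType;

    public class CsvSubsetWithSchema {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("csv-subset").getOrCreate();

            // Read the whole CSV (columns come back as strings) and keep only the ones needed
            Dataset<Row> selected = spark.read()
                    .option("header", "true")
                    .csv("data.csv")
                    .select("name", "age");

            // Drop down to an RDD of Rows, converting each value to its target type
            JavaRDD<Row> rows = selected.javaRDD()
                    .map(r -> RowFactory.create(r.getString(0), Integer.parseInt(r.getString(1))));

            // Build the schema explicitly and create a new DataFrame from the RDD
            StructType schema = DataTypes.createStructType(new StructField[] {
                    DataTypes.createStructField("name", DataTypes.StringType, true),
                    DataTypes.createStructField("age", DataTypes.IntegerType, true)
            });
            Dataset<Row> typed = spark.createDataFrame(rows, schema);

            typed.printSchema();
        }
    }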
On Mon, Aug 8, 2016 at 1:05 PM, Ewan Leith <[email protected]> wrote:

> Hmm, I'm not sure, I don't use the Java API, sorry.
>
> The simplest way to work around it would be to read the CSV as a text file
> using sparkContext.textFile, split each row on a comma, then convert it to
> a Dataset afterwards.
>
> *From:* Aseem Bansal [mailto:[email protected]]
> *Sent:* 08 August 2016 07:37
> *To:* Ewan Leith <[email protected]>
> *Cc:* user <[email protected]>
> *Subject:* Re: Spark 2.0.0 - Apply schema on few columns of dataset
>
> Hi Ewan
>
> The .as function takes a single encoder, a single string, or a single
> Symbol. I have more than 10 columns, so I cannot use the tuple functions,
> and passing them in brackets does not work.
>
> On Mon, Aug 8, 2016 at 11:26 AM, Ewan Leith <[email protected]> wrote:
>
> Looking at the Encoders API documentation at
> http://spark.apache.org/docs/latest/api/java/
>
> == Java ==
> Encoders are specified by calling static methods on Encoders
> <http://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/Encoders.html>.
>
> List<String> data = Arrays.asList("abc", "abc", "xyz");
> Dataset<String> ds = context.createDataset(data, Encoders.STRING());
>
> I think you should be calling
>
> .as((Encoders.STRING(), Encoders.STRING()))
>
> or similar.
>
> Ewan
>
> On 8 Aug 2016 06:10, Aseem Bansal <[email protected]> wrote:
>
> Hi All
>
> Has anyone done this with the Java API?
>
> On Fri, Aug 5, 2016 at 5:36 PM, Aseem Bansal <[email protected]> wrote:
>
> I need to use a few columns out of a CSV, but as there is no option to read
> only some columns from a CSV:
>
> 1. I am reading the whole CSV using SparkSession.csv()
> 2. selecting a few of the columns using DataFrame.select()
> 3. applying a schema using the .as() function of Dataset<Row>. I tried to
>    extend org.apache.spark.sql.Encoder as the input for the as function.
>
> But I am getting the following exception:
>
> Exception in thread "main" java.lang.RuntimeException: Only expression
> encoders are supported today
>
> So my questions are:
>
> 1. Is it possible to read a few columns instead of the whole CSV? I cannot
>    change the CSV as that is upstream data.
> 2. How do I apply a schema to a few columns if I cannot write my own encoder?
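For anyone who prefers the textFile workaround Ewan describes above, I believe it would look roughly like this (untested; the file path, column layout, and Person bean are purely illustrative, and a plain split on "," will not handle quoted fields):

    import java.io.Serializable;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Encoders;
    import org.apache.spark.sql.SparkSession;

    public class TextFileWorkaround {
        // Simple bean holding just the columns we need; Encoders.bean derives an encoder from it
        public static class Person implements Serializable {
            private String name;
            private int age;
            public String getName() { return name; }
            public void setName(String name) { this.name = name; }
            public int getAge() { return age; }
            public void setAge(int age) { this.age = age; }
        }

        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("textfile-workaround").getOrCreate();

            // Read the raw lines and split each row on commas (naive: no quoting, no escaping)
            JavaRDD<Person> people = spark.sparkContext()
                    .textFile("data.csv", 1)
                    .toJavaRDD()
                    .map(line -> {
                        String[] parts = line.split(",");
                        Person p = new Person();
                        p.setName(parts[0]);
                        p.setAge(Integer.parseInt(parts[1]));
                        return p;
                    });

            // Convert to a typed Dataset using a bean encoder
            Dataset<Person> ds = spark.createDataset(people.rdd(), Encoders.bean(Person.class));
            ds.show();
        }
    }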
