Hi,
I don't understand where the issue is...
➜ spark git:(master) ✗ cat csv-logs/people-1.csv
name,city,country,age,alive
Jacek,Warszawa,Polska,42,true
// in spark-shell spark.implicits._ is already imported, so the 'name / 'city
// symbols and the (String, String) tuple encoder resolve without a custom Encoder
val df = spark.read.option("header", true).csv("csv-logs/people-1.csv")
val nameCityPairs = df.select('name, 'city).as[(String, String)]
scala> nameCityPairs.printSchema
root
|-- name: string (nullable = true)
|-- city: string (nullable = true)
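If you'd rather have a named type than a tuple, here is a minimal sketch (Person is just an example case class; spark.implicits._ supplies the product encoder, so no hand-written Encoder is needed):

import spark.implicits._   // already in scope in spark-shell

case class Person(name: String, city: String)

val people = spark.read
  .option("header", true)
  .csv("csv-logs/people-1.csv")
  .select('name, 'city)
  .as[Person]

people.show()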
Is this what you're after?
Regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski
On Fri, Aug 5, 2016 at 2:06 PM, Aseem Bansal <[email protected]> wrote:
> I need to use a few columns out of a CSV. Since there is no option to read
> only a few columns from a CSV, I am doing the following:
> 1. I am reading the whole CSV using SparkSession.csv()
> 2. selecting a few of the columns using DataFrame.select()
> 3. applying a schema using the .as() function of Dataset<Row>. I tried to
> extend org.apache.spark.sql.Encoder as the input for the as() function
>
> But I am getting the following exception:
>
> Exception in thread "main" java.lang.RuntimeException: Only expression
> encoders are supported today
>
> So my questions are:
> 1. Is it possible to read only a few columns instead of the whole CSV? I
> cannot change the CSV as that is upstream data.
> 2. How do I apply a schema to a few columns if I cannot write my own
> encoder?
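Regarding the two questions: as far as I know the CSV source always parses whole lines, so the usual way is exactly what you describe, i.e. select the columns you need and let Spark prune the rest. And you don't have to write your own Encoder; the implicits give you tuple and case-class encoders, and you can also hand the reader an explicit schema. A sketch (the Integer/Boolean types for age/alive are my guess at what you want):

import org.apache.spark.sql.types._
import spark.implicits._

val schema = StructType(Seq(
  StructField("name", StringType),
  StructField("city", StringType),
  StructField("country", StringType),
  StructField("age", IntegerType),
  StructField("alive", BooleanType)))

val pairs = spark.read
  .schema(schema)
  .option("header", true)
  .csv("csv-logs/people-1.csv")
  .select("name", "city")
  .as[(String, String)]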