As Sean mentioned, a Scala case class is a handy way of representing objects with names and types. For example, suppose you are reading a CSV file whose column names contain spaces, such as "counter party", and you want more compact column names such as counterparty:
scala> val location = "hdfs://rhes75:9000/tmp/crap.csv"
location: String = hdfs://rhes75:9000/tmp/crap.csv

scala> val df1 = spark.read.option("header", false).csv(location)  // do not treat the first line as a header
df1: org.apache.spark.sql.DataFrame = [_c0: string, _c1: string ... 34 more fields]  // columns are auto-named _c0, _c1, etc.

scala> case class columns(KEY: String, TICKER: String, TIMEISSUED: String, PRICE: Double)  // create a name and type for _c0, _c1 and so forth
defined class columns

scala> val df2 = df1.map(p => columns(p(0).toString, p(1).toString, p(2).toString, p(3).toString.toDouble))  // map those columns
df2: org.apache.spark.sql.Dataset[columns] = [KEY: string, TICKER: string ... 2 more fields]

scala> df2.printSchema
root
 |-- KEY: string (nullable = true)
 |-- TICKER: string (nullable = true)
 |-- TIMEISSUED: string (nullable = true)
 |-- PRICE: double (nullable = false)

HTH

On Tue, 8 Feb 2022 at 14:32, Sean Owen <sro...@gmail.com> wrote:

> It's just a possibly tidier way to represent objects with named, typed
> fields, in order to specify a DataFrame's contents.
>
> On Tue, Feb 8, 2022 at 4:16 AM <capitnfrak...@free.fr> wrote:
>
>> Hello
>>
>> I am converting some py code to scala.
>> This works in python:
>>
>> >>> rdd = sc.parallelize([('apple',1),('orange',2)])
>> >>> rdd.toDF(['fruit','num']).show()
>> +------+---+
>> | fruit|num|
>> +------+---+
>> | apple|  1|
>> |orange|  2|
>> +------+---+
>>
>> And in scala:
>>
>> scala> rdd.toDF("fruit","num").show()
>> +------+---+
>> | fruit|num|
>> +------+---+
>> | apple|  1|
>> |orange|  2|
>> +------+---+
>>
>> But I saw many code that use a case class for translation.
>>
>> scala> case class Fruit(fruit:String,num:Int)
>> defined class Fruit
>>
>> scala> rdd.map{case (x,y) => Fruit(x,y) }.toDF().show()
>> +------+---+
>> | fruit|num|
>> +------+---+
>> | apple|  1|
>> |orange|  2|
>> +------+---+
>>
>> Do you know why to use a "case class" here?
>>
>> thanks.
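
One caveat on the mapping step in the reply at the top: p(3).toString.toDouble throws if a field is null or not numeric, which can happen, for instance, when header is false and the first line of the file holds column titles. A minimal sketch of a more defensive conversion, in plain Scala with no Spark dependency, might look like the following. The names Trade, TradeParsing and parseTrade are hypothetical and do not come from the thread; the four fields simply mirror KEY, TICKER, TIMEISSUED and PRICE.

```scala
import scala.util.Try

// Hypothetical record type; fields mirror KEY, TICKER, TIMEISSUED, PRICE.
case class Trade(key: String, ticker: String, timeIssued: String, price: Double)

object TradeParsing {
  // Returns None when fewer than four fields are present, or when the
  // fourth field is null or cannot be parsed as a Double.
  // Any fields beyond the fourth are ignored.
  def parseTrade(fields: Seq[String]): Option[Trade] = fields match {
    case Seq(key, ticker, timeIssued, price, _*) =>
      Try(price.trim.toDouble).toOption.map(p => Trade(key, ticker, timeIssued, p))
    case _ => None
  }
}
```

In Spark, a function of this shape could be applied with flatMap over the rows so that bad rows are dropped rather than failing the job; that wiring is left out here because it depends on how the rows are read.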