Re: Converting dataframe to dataset question
Ryan, you are right. That was the issue. It works now. Thanks.
Re: Converting dataframe to dataset question
You should import either spark.implicits or sqlContext.implicits, not both. Otherwise the compiler is confused by two competing implicit conversions.

The following code works for me, Spark version 2.1.0:

import org.apache.spark.sql.SparkSession

object Test {
  def main(args: Array[String]) {
    val spark = SparkSession
      .builder
      .master("local")
      .appName(getClass.getSimpleName)
      .getOrCreate()
    import spark.implicits._
    val df = Seq(TeamUser("t1", "u1", "r1")).toDF()
    df.printSchema()
    spark.close()
  }
}

case class TeamUser(teamId: String, userId: String, role: String)
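As an aside, if the competing implicits imports are hard to untangle, toDF can be avoided entirely by calling createDataFrame on the session. A minimal sketch (the object name is made up; assumes Spark 2.x and a top-level case class):

import org.apache.spark.sql.SparkSession

case class TeamUser(teamId: String, userId: String, role: String)

object CreateDataFrameSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .master("local")
      .appName(getClass.getSimpleName)
      .getOrCreate()

    // createDataFrame takes the Seq of case-class instances directly,
    // so no implicit toDF conversion needs to be in scope.
    val df = spark.createDataFrame(Seq(TeamUser("t1", "u1", "r1")))
    df.printSchema()

    spark.stop()
  }
}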
Re: Converting dataframe to dataset question
I made the code even simpler and am still getting the error:

error: value toDF is not a member of Seq[com.whil.batch.Teamuser]
[ERROR] val df = Seq(Teamuser("t1","u1","r1")).toDF()

object Test {
  def main(args: Array[String]) {
    val spark = SparkSession
      .builder
      .appName(getClass.getSimpleName)
      .getOrCreate()
    import spark.implicits._
    val sqlContext = spark.sqlContext
    import sqlContext.implicits._
    val df = Seq(Teamuser("t1","u1","r1")).toDF()
    df.printSchema()
  }
}

case class Teamuser(teamid: String, userid: String, role: String)
Re: Converting dataframe to dataset question
Now I get this error:

error: Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._  Support for serializing other types will be added in future releases.
[ERROR] val userDS:Dataset[Teamuser] = userDF.as[Teamuser]
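A minimal sketch of two ways this encoder lookup is commonly satisfied (assumes Spark 2.1.x; the object name is made up): keep the case class at the top level and import spark.implicits._, or pass the encoder explicitly via Encoders.product.

import org.apache.spark.sql.{Dataset, Encoders, SparkSession}

case class Teamuser(teamid: String, userid: String, role: String)

object EncoderSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .master("local")
      .appName(getClass.getSimpleName)
      .getOrCreate()
    import spark.implicits._ // brings an implicit Encoder[Teamuser] into scope

    val userDF = Seq(Teamuser("t1", "u1", "r1")).toDF()

    // Option 1: rely on the implicit encoder derived via spark.implicits._
    val ds1: Dataset[Teamuser] = userDF.as[Teamuser]

    // Option 2: pass the encoder explicitly; the as[] call then does not
    // depend on an implicit Encoder being in scope.
    val ds2: Dataset[Teamuser] = userDF.as(Encoders.product[Teamuser])

    ds1.show()
    ds2.printSchema()
    spark.stop()
  }
}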
Re: Converting dataframe to dataset question
Not sure I understand this problem; why can't I reproduce it?

scala> spark.version
res22: String = 2.1.0

scala> case class Teamuser(teamid: String, userid: String, role: String)
defined class Teamuser

scala> val df = Seq(Teamuser("t1", "u1", "role1")).toDF
df: org.apache.spark.sql.DataFrame = [teamid: string, userid: string ... 1 more field]

scala> df.show
+------+------+-----+
|teamid|userid| role|
+------+------+-----+
|    t1|    u1|role1|
+------+------+-----+

scala> df.createOrReplaceTempView("teamuser")

scala> val newDF = spark.sql("select teamid, userid, role from teamuser")
newDF: org.apache.spark.sql.DataFrame = [teamid: string, userid: string ... 1 more field]

scala> val userDS: Dataset[Teamuser] = newDF.as[Teamuser]
userDS: org.apache.spark.sql.Dataset[Teamuser] = [teamid: string, userid: string ... 1 more field]

scala> userDS.show
+------+------+-----+
|teamid|userid| role|
+------+------+-----+
|    t1|    u1|role1|
+------+------+-----+

scala> userDS.printSchema
root
 |-- teamid: string (nullable = true)
 |-- userid: string (nullable = true)
 |-- role: string (nullable = true)

Am I missing anything?

Yong
Re: Converting dataframe to dataset question
I realized my case class was inside the object. It should be defined outside the scope of the object. Thanks.
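To illustrate the layout being described, here is a minimal sketch (file and object names are made up): the case class is declared at the top level of the source file, not inside the object or method that uses it, so the implicit Encoder can be derived where toDF() and as[Teamuser] are called.

// TeamuserJob.scala
import org.apache.spark.sql.SparkSession

// Top-level declaration, outside the object below.
case class Teamuser(teamid: String, userid: String, role: String)

object TeamuserJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .master("local")
      .appName(getClass.getSimpleName)
      .getOrCreate()
    import spark.implicits._

    // With the case class at the top level, both calls resolve;
    // declaring it inside main (as in the original snippet) is what
    // led to the Dataset[Any] / encoder trouble.
    val userDS = Seq(Teamuser("t1", "u1", "r1")).toDF().as[Teamuser]
    userDS.show()

    spark.stop()
  }
}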
Converting dataframe to dataset question
Why is userDS Dataset[Any] instead of Dataset[Teamuser]? Appreciate your help. Thanks

val spark = SparkSession
  .builder
  .config("spark.cassandra.connection.host", cassandrahost)
  .appName(getClass.getSimpleName)
  .getOrCreate()

import spark.implicits._
val sqlContext = spark.sqlContext
import sqlContext.implicits._

case class Teamuser(teamid: String, userid: String, role: String)

spark
  .read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "test", "table" -> "teamuser"))
  .load
  .createOrReplaceTempView("teamuser")

val userDF = spark.sql("SELECT teamid, userid, role FROM teamuser")

userDF.show()

val userDS: Dataset[Teamuser] = userDF.as[Teamuser]
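For reference, a minimal sketch of how the same read could come back as a typed Dataset[Teamuser] once the fixes discussed later in the thread are applied (case class at the top level, a single implicits import). The object name and the "127.0.0.1" connection host are placeholders; the keyspace and table names are taken from the snippet above, and the spark-cassandra-connector is assumed to be on the classpath:

import org.apache.spark.sql.{Dataset, SparkSession}

// Top-level case class, outside the object, so the implicit Encoder[Teamuser]
// can be derived when as[Teamuser] is called.
case class Teamuser(teamid: String, userid: String, role: String)

object TeamuserFromCassandra {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .master("local")
      .config("spark.cassandra.connection.host", "127.0.0.1") // placeholder host
      .appName(getClass.getSimpleName)
      .getOrCreate()
    import spark.implicits._ // only this implicits import, not sqlContext.implicits._ as well

    spark
      .read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "test", "table" -> "teamuser"))
      .load()
      .createOrReplaceTempView("teamuser")

    val userDF = spark.sql("SELECT teamid, userid, role FROM teamuser")
    val userDS: Dataset[Teamuser] = userDF.as[Teamuser]
    userDS.show()

    spark.stop()
  }
}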