Re: Converting dataframe to dataset question

2017-03-23 Thread shyla deshpande
Ryan, you are right. That was the issue. It works now. Thanks.



Re: Converting dataframe to dataset question

2017-03-23 Thread Ryan
You should import either spark.implicits._ or sqlContext.implicits._, not both.
Otherwise the compiler sees two candidate implicit conversions, the ambiguity
makes the enrichment unavailable, and toDF fails to resolve.

The following code works for me on Spark 2.1.0:

import org.apache.spark.sql.SparkSession

object Test {
  def main(args: Array[String]) {
    val spark = SparkSession
      .builder
      .master("local")
      .appName(getClass.getSimpleName)
      .getOrCreate()
    import spark.implicits._  // the single implicits import is enough for toDF()
    val df = Seq(TeamUser("t1", "u1", "r1")).toDF()
    df.printSchema()
    spark.close()
  }
}

// Top level, outside the object, so an Encoder can be derived for it.
case class TeamUser(teamId: String, userId: String, role: String)
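As a side note, the same single import also provides toDS(), so you can build a typed Dataset without going through a DataFrame first; a minimal sketch under the same setup as above:

// Inside main, after the import spark.implicits._ line:
val ds = Seq(TeamUser("t1", "u1", "r1")).toDS()  // Dataset[TeamUser], via the implicits
ds.printSchema()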




Re: Converting dataframe to dataset question

2017-03-23 Thread shyla deshpande
I made the code even simpler and I am still getting the error:

error: value toDF is not a member of Seq[com.whil.batch.Teamuser]
[ERROR] val df = Seq(Teamuser("t1","u1","r1")).toDF()

import org.apache.spark.sql.SparkSession

object Test {
  def main(args: Array[String]) {
    val spark = SparkSession
      .builder
      .appName(getClass.getSimpleName)
      .getOrCreate()
    import spark.implicits._
    val sqlContext = spark.sqlContext
    import sqlContext.implicits._  // second implicits import: the source of the ambiguity
    val df = Seq(Teamuser("t1", "u1", "r1")).toDF()
    df.printSchema()
  }
}
case class Teamuser(teamid: String, userid: String, role: String)






Re: Converting dataframe to dataset question

2017-03-23 Thread shyla deshpande
Now I get a different error...

error: Unable to find encoder for type stored in a Dataset.  Primitive
types (Int, String, etc) and Product types (case classes) are supported by
importing spark.implicits._  Support for serializing other types will be
added in future releases.
[ERROR] val userDS:Dataset[Teamuser] = userDF.as[Teamuser]
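For what it's worth, that error usually means no implicit Encoder[Teamuser] is in scope where .as[Teamuser] is called. A minimal sketch of the pattern that avoids it, with the case class at the top level and a single implicits import; the object name and the local master are assumptions for illustration:

import org.apache.spark.sql.{Dataset, SparkSession}

// Top level, so spark.implicits._ can derive an Encoder for it.
case class Teamuser(teamid: String, userid: String, role: String)

object EncoderDemo {  // assumed name
  def main(args: Array[String]) {
    val spark = SparkSession.builder.master("local").appName("EncoderDemo").getOrCreate()
    import spark.implicits._  // the only implicits import
    val userDF = Seq(Teamuser("t1", "u1", "r1")).toDF()
    val userDS: Dataset[Teamuser] = userDF.as[Teamuser]  // Encoder now resolves cleanly
    userDS.show()
    spark.stop()
  }
}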



Re: Converting dataframe to dataset question

2017-03-23 Thread Yong Zhang
Not sure I understand this problem; why can't I reproduce it?


scala> spark.version
res22: String = 2.1.0

scala> case class Teamuser(teamid: String, userid: String, role: String)
defined class Teamuser

scala> val df = Seq(Teamuser("t1", "u1", "role1")).toDF
df: org.apache.spark.sql.DataFrame = [teamid: string, userid: string ... 1 more field]

scala> df.show
+------+------+-----+
|teamid|userid| role|
+------+------+-----+
|    t1|    u1|role1|
+------+------+-----+

scala> df.createOrReplaceTempView("teamuser")

scala> val newDF = spark.sql("select teamid, userid, role from teamuser")
newDF: org.apache.spark.sql.DataFrame = [teamid: string, userid: string ... 1 more field]

scala> val userDS: Dataset[Teamuser] = newDF.as[Teamuser]
userDS: org.apache.spark.sql.Dataset[Teamuser] = [teamid: string, userid: string ... 1 more field]

scala> userDS.show
+------+------+-----+
|teamid|userid| role|
+------+------+-----+
|    t1|    u1|role1|
+------+------+-----+

scala> userDS.printSchema
root
 |-- teamid: string (nullable = true)
 |-- userid: string (nullable = true)
 |-- role: string (nullable = true)


Am I missing anything?


Yong

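Worth noting when comparing this session against a standalone app: spark-shell pre-imports spark.implicits._, and case classes defined at the REPL behave like top-level classes, which is likely why nothing reproduces here. Roughly what the shell has already done before the first prompt (a simplified sketch, not the shell's actual bootstrap code):

// Approximate spark-shell setup (assumed, simplified):
val spark = org.apache.spark.sql.SparkSession.builder
  .master("local[*]")
  .appName("spark-shell")
  .getOrCreate()
import spark.implicits._  // done automatically by spark-shell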




Re: Converting dataframe to dataset question

2017-03-23 Thread shyla deshpande
I realized my case class was inside the object. It should be defined
outside the scope of the object. Thanks
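A minimal before/after sketch of that placement point, assuming Spark 2.1 and a local master; the object name is illustrative:

import org.apache.spark.sql.SparkSession

// After: declared at the top level, where the implicits can derive its Encoder.
case class Teamuser(teamid: String, userid: String, role: String)

object PlacementDemo {  // illustrative name
  def main(args: Array[String]) {
    // Before: Teamuser was declared in here, making it a local class for
    // which no Encoder can be derived, hence the errors earlier in the thread.
    val spark = SparkSession.builder.master("local").appName("PlacementDemo").getOrCreate()
    import spark.implicits._
    Seq(Teamuser("t1", "u1", "r1")).toDF().printSchema()
    spark.stop()
  }
}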



Converting dataframe to dataset question

2017-03-22 Thread shyla deshpande
Why is userDS Dataset[Any] instead of Dataset[Teamuser]? I appreciate
your help. Thanks

val spark = SparkSession
  .builder
  .config("spark.cassandra.connection.host", cassandrahost)
  .appName(getClass.getSimpleName)
  .getOrCreate()

import spark.implicits._
val sqlContext = spark.sqlContext
import sqlContext.implicits._

case class Teamuser(teamid:String, userid:String, role:String)
spark
  .read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "test", "table" -> "teamuser"))
  .load
  .createOrReplaceTempView("teamuser")

val userDF = spark.sql("SELECT teamid, userid, role FROM teamuser")

userDF.show()

val userDS:Dataset[Teamuser] = userDF.as[Teamuser]
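For readers landing on this first message: the replies above resolve it. A reworked sketch of the same snippet with both fixes applied, i.e. the case class moved to the top level and only one implicits import kept; the enclosing object name, the cassandrahost value, and the spark-cassandra-connector dependency are assumptions:

import org.apache.spark.sql.{Dataset, SparkSession}

// Moved out of the method body so an Encoder can be derived for it.
case class Teamuser(teamid: String, userid: String, role: String)

object TeamuserJob {  // assumed enclosing object
  def main(args: Array[String]) {
    val cassandrahost = "127.0.0.1"  // assumed value
    val spark = SparkSession
      .builder
      .config("spark.cassandra.connection.host", cassandrahost)
      .appName(getClass.getSimpleName)
      .getOrCreate()
    import spark.implicits._  // single implicits import; sqlContext.implicits._ dropped

    spark
      .read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "test", "table" -> "teamuser"))
      .load
      .createOrReplaceTempView("teamuser")

    val userDF = spark.sql("SELECT teamid, userid, role FROM teamuser")
    userDF.show()

    val userDS: Dataset[Teamuser] = userDF.as[Teamuser]  // now Dataset[Teamuser], not Dataset[Any]
    userDS.show()
    spark.stop()
  }
}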