Thanks for this tip.

The way I do it is to pass the SparkContext "sc" to the method
firstquery.firstquerym by calling the following:

val firstquery = new FirstQuery
firstquery.firstquerym(sc, rs)


And defining the method as follows:

class FirstQuery {
  def firstquerym(sc: org.apache.spark.SparkContext, rs: org.apache.spark.sql.DataFrame): Unit = {
    val sqlContext = org.apache.spark.sql.SQLContext.getOrCreate(sc)
    println("\nfirst query at")
    sqlContext.sql("SELECT FROM_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss')").collect.foreach(println)
    // take(5) returns an Array; foreach is called only for its side effect, so no val is needed
    rs.orderBy("calendar_month_desc", "channel_desc").take(5).foreach(println)
  }
}

This works.
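
As an aside, an alternative shape I have been toying with (an untested
sketch on my part, not something I have seen in the docs) is to pass the
context once through the constructor, so each method only takes the data
it operates on:

class FirstQuery(sqlContext: org.apache.spark.sql.SQLContext) {
  def firstquerym(rs: org.apache.spark.sql.DataFrame): Unit = {
    // the context is supplied once at construction time
    println("\nfirst query at")
    sqlContext.sql("SELECT FROM_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss')").collect.foreach(println)
    rs.orderBy("calendar_month_desc", "channel_desc").take(5).foreach(println)
  }
}

val firstquery = new FirstQuery(sqlContext)
firstquery.firstquerym(rs)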

However, it seems I cannot invoke getOrCreate without passing sc. Is this
the way you were implying? Also, why is "sc" not available for the life of
the JVM?
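
If I have read the API correctly (an assumption on my part; I believe
SparkContext.getOrCreate() was added in Spark 1.4), the method could even
recover the running SparkContext itself rather than having it passed in:

import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

class FirstQuery {
  def firstquerym(rs: org.apache.spark.sql.DataFrame): Unit = {
    // recover the SparkContext already active in this JVM
    val sc = SparkContext.getOrCreate()
    // getOrCreate keeps a single SQLContext per JVM, keyed off that SparkContext
    val sqlContext = SQLContext.getOrCreate(sc)
    rs.orderBy("calendar_month_desc", "channel_desc").take(5).foreach(println)
  }
}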

Thanks




Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 6 March 2016 at 01:25, Ted Yu <yuzhih...@gmail.com> wrote:

> Looking at the methods you call on HiveContext, they seem to belong
> to SQLContext.
>
> You can use the following method of SQLContext in FirstQuery to retrieve
> the SQLContext:
>
>   def getOrCreate(sparkContext: SparkContext): SQLContext
>
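> For example (a rough sketch on my side, not compiled):
>
>   // returns the singleton SQLContext for this SparkContext, creating it if needed
>   val sqlContext = SQLContext.getOrCreate(sc)
>   sqlContext.sql("SELECT 1").collect().foreach(println)
>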
> FYI
>
> On Sat, Mar 5, 2016 at 3:37 PM, Mich Talebzadeh <mich.talebza...@gmail.com
> > wrote:
>
>> I managed to sort this one out.
>>
>> The class should be defined as below, with its method accepting two input
>> parameters, the HiveContext and rs:
>>
>> class FirstQuery {
>>   def firstquerym(HiveContext: org.apache.spark.sql.hive.HiveContext, rs: org.apache.spark.sql.DataFrame): Unit = {
>>     println("\nfirst query at")
>>     HiveContext.sql("SELECT FROM_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss')").collect.foreach(println)
>>     rs.orderBy("calendar_month_desc", "channel_desc").take(5).foreach(println)
>>   }
>> }
>>
>> and called from the main method as follows:
>>
>> val firstquery = new FirstQuery
>> firstquery.firstquerym(HiveContext, rs)
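>>
>> Since HiveContext extends SQLContext in 1.x, typing the parameter as
>> SQLContext should (if I am not mistaken; this sketch is untested) let the
>> same method accept either context:
>>
>> class FirstQuery {
>>   def firstquerym(ctx: org.apache.spark.sql.SQLContext, rs: org.apache.spark.sql.DataFrame): Unit = {
>>     // ctx may be a plain SQLContext or a HiveContext instance
>>     ctx.sql("SELECT FROM_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss')").collect.foreach(println)
>>     rs.orderBy("calendar_month_desc", "channel_desc").take(5).foreach(println)
>>   }
>> }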
>>
>>
>> Thanks
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 5 March 2016 at 20:56, Mich Talebzadeh <mich.talebza...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I can use sbt to compile and run the following code. It works without
>>> any problem.
>>>
>>> I want to divide this into the object and another class: build the result
>>> set by joining tables, identified by the DataFrame 'rs', in the main
>>> method, and then call the method "firstquerym" in the class FirstQuery to
>>> do the calculation identified as "rs1".
>>>
>>> Now it needs "rs" to be available in class FirstQuery. Two questions
>>> please:
>>>
>>>
>>>    1. How can I pass rs to class FirstQuery?
>>>    2. Is there a better way of modularising this work, so that methods
>>>    defined in another class can be called from the main method?
>>>
>>> Thanks
>>>
>>> import org.apache.spark.SparkContext
>>> import org.apache.spark.SparkConf
>>> import org.apache.spark.sql.Row
>>> import org.apache.spark.sql.hive.HiveContext
>>> import org.apache.spark.sql.types._
>>> import org.apache.spark.sql.SQLContext
>>> import org.apache.spark.sql.functions._
>>> //
>>> object Harness4 {
>>>   def main(args: Array[String]) {
>>>     val conf = new SparkConf().setAppName("Harness4").setMaster("local[*]").set("spark.driver.allowMultipleContexts", "true")
>>>     val sc = new SparkContext(conf)
>>>     // the implicits import below is only valid after an instance of org.apache.spark.sql.SQLContext is created
>>>     val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>>>     import sqlContext.implicits._
>>>     val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>>>     println("\nStarted at"); HiveContext.sql("SELECT FROM_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss')").collect.foreach(println)
>>>     HiveContext.sql("use oraclehadoop")
>>>     val s = HiveContext.table("sales").select("AMOUNT_SOLD", "TIME_ID", "CHANNEL_ID")
>>>     val c = HiveContext.table("channels").select("CHANNEL_ID", "CHANNEL_DESC")
>>>     val t = HiveContext.table("times").select("TIME_ID", "CALENDAR_MONTH_DESC")
>>>     println("\ncreating data set at"); HiveContext.sql("SELECT FROM_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss')").collect.foreach(println)
>>>     val rs = s.join(t, "time_id").join(c, "channel_id").groupBy("calendar_month_desc", "channel_desc").agg(sum("amount_sold").as("TotalSales"))
>>>     //println("\nfirst query at"); HiveContext.sql("SELECT FROM_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss')").collect.foreach(println)
>>>     //val rs1 = rs.orderBy("calendar_month_desc", "channel_desc").take(5).foreach(println)
>>>     val firstquery = new FirstQuery
>>>     firstquery.firstquerym
>>>   }
>>> }
>>> //
>>> class FirstQuery {
>>>   def firstquerym {
>>>     // HiveContext and rs are defined in main and are not in scope here
>>>     println("\nfirst query at"); HiveContext.sql("SELECT FROM_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss')").collect.foreach(println)
>>>     val rs1 = rs.orderBy("calendar_month_desc", "channel_desc").take(5).foreach(println)
>>>   }
>>> }
>>>
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>>
>>
>>
>
