However, to really enjoy functional programming I assume you also want to use typed lambdas in your map and filter, which means you need to convert the DataFrame to a Dataset using df.as[SomeCaseClass]. Just be aware that it's still somewhat early days for Dataset.
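
A minimal sketch of what that could look like for the query quoted below, assuming Spark 1.6 in spark-shell (so sc already exists). The case class MonthlySales, its field types, the CAST to DOUBLE and the 1000.0 threshold are my own assumptions for illustration, not something taken from your schema:

```scala
import org.apache.spark.sql.hive.HiveContext

// Hypothetical case class: field names must match the column names of the
// query result, and the types are assumptions about the Hive schema.
case class MonthlySales(calendar_month_desc: String, channel_desc: String, TotalSales: Double)

val hiveContext = new HiveContext(sc)  // sc is the SparkContext provided by spark-shell
import hiveContext.implicits._         // brings the encoders needed by as[...] into scope

hiveContext.sql("use oraclehadoop")

// Same aggregation as in the original post; the CAST is an assumption so that
// the column type lines up with the Double field of the case class.
val rs = hiveContext.sql("""
  SELECT t.calendar_month_desc, c.channel_desc,
         CAST(SUM(s.amount_sold) AS DOUBLE) AS TotalSales
  FROM smallsales s, times t, channels c
  WHERE s.time_id = t.time_id
  AND   s.channel_id = c.channel_id
  GROUP BY t.calendar_month_desc, c.channel_desc
""")

// DataFrame -> Dataset[MonthlySales]
val ds = rs.as[MonthlySales]

// Typed lambdas instead of SQL strings; the threshold is arbitrary.
val bigSales = ds
  .filter(s => s.TotalSales > 1000.0)
  .map(s => (s.channel_desc, s.TotalSales))

bigSales.collect.foreach(println)
```

If you would rather stay on the DataFrame, something like rs.filter($"TotalSales" > 1000) should give the same rows through column expressions, without the case class.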
On Mon, Feb 22, 2016 at 6:45 PM, Kevin Mellott <kevin.r.mell...@gmail.com> wrote:

> In your example, the *rs* instance should be a DataFrame object. In other
> words, the result of *HiveContext.sql* is a DataFrame that you can
> manipulate using *filter, map, *etc.
>
> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.hive.HiveContext
>
> On Mon, Feb 22, 2016 at 5:16 PM, Mich Talebzadeh <
> mich.talebza...@cloudtechnologypartners.co.uk> wrote:
>
>> Hi,
>>
>> I have data stored in Hive tables that I want to do simple manipulation.
>>
>> Currently in Spark I perform the following with getting the result set
>> using SQL from Hive tables, registering as a temporary table in Spark
>>
>> Now Ideally I can get the result set into a DF and work on DF to slice
>> and dice the data using functional programming with filter, map. split etc.
>>
>> I wanted to get some ideas on how to go about it.
>>
>> thanks
>>
>> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>>
>> HiveContext.sql("use oraclehadoop")
>> val rs = HiveContext.sql("""SELECT t.calendar_month_desc, c.channel_desc,
>> SUM(s.amount_sold) AS TotalSales
>> FROM smallsales s, times t, channels c
>> WHERE s.time_id = t.time_id
>> AND s.channel_id = c.channel_id
>> GROUP BY t.calendar_month_desc, c.channel_desc
>> """)
>> *rs.registerTempTable("tmp")*
>>
>> HiveContext.sql("""
>> SELECT calendar_month_desc AS MONTH, channel_desc AS CHANNEL, TotalSales
>> from tmp
>> ORDER BY MONTH, CHANNEL
>> """).collect.foreach(println)
>> HiveContext.sql("""
>> SELECT channel_desc AS CHANNEL, MAX(TotalSales) AS SALES
>> FROM tmp
>> GROUP BY channel_desc
>> order by SALES DESC
>> """).collect.foreach(println)
>>
>> --
>>
>> Dr Mich Talebzadeh
>>
>> LinkedIn
>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> http://talebzadehmich.wordpress.com
>>
>> NOTE: The information in this email is proprietary and confidential. This
>> message is for the designated recipient only, if you are not the intended
>> recipient, you should destroy it immediately. Any information in this
>> message shall not be understood as given or endorsed by Cloud Technology
>> Partners Ltd, its subsidiaries or their employees, unless expressly so
>> stated. It is the responsibility of the recipient to ensure that this email
>> is virus free, therefore neither Cloud Technology partners Ltd, its
>> subsidiaries nor their employees accept any responsibility.