import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sparkConf = new SparkConf()
  .setMaster("local[*]")
  .setAppName("Dataframe Test")
val sc = new SparkContext(sparkConf)
val sql = new SQLContext(sc)
val dataframe = sql.read.json("orders.json")
// "items" is a JSON array, so it arrives as a Seq[Long]
// (a `::[Long]` type parameter would fail the cast at runtime)
val expanded = dataframe
  .explode[Seq[Long], Long]("items", "item1")(row => row)
  .explode[Seq[Long], Long]("items", "item2")(row => row)
val grouped = expanded
  .where(expanded("item1") !== expanded("item2"))
  .groupBy("item1", "item2")
  .count()
val recs = grouped
  .groupBy("item1")
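The `recs` pipeline above is cut off after `groupBy("item1")`; presumably it goes on to pick, for each item, the partner it co-occurs with most often. A plain-Scala sketch of that aggregation (the pair counts below are hypothetical, shaped like what `grouped` would hold after `count()`):

```scala
// Hypothetical pair counts, shaped like the `grouped` DataFrame:
// (item1, item2) -> count
val counts = Map(
  (1L, 2L) -> 5, (1L, 3L) -> 2,
  (2L, 1L) -> 5, (2L, 3L) -> 4,
  (3L, 2L) -> 4, (3L, 1L) -> 2)

// groupBy("item1") followed by an argmax over count: for each item,
// the partner with the highest co-occurrence count.
val recs = counts.toSeq
  .groupBy { case ((item1, _), _) => item1 }   // one group per item1
  .map { case (item1, pairs) =>
    val ((_, item2), n) = pairs.maxBy { case (_, n) => n }
    item1 -> (item2, n)                        // best partner and its count
  }
```

In DataFrame terms that would be an `agg(max("count"))` plus a join back to `grouped` to recover the winning item2; the thread does not show that step.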
I found another example above, but I can't seem to figure out what it does:
val expanded = dataframe
  .explode[Seq[Long], Long]("items", "item1")(row => row)
  .explode[Seq[Long], Long]("items", "item2")(row => row)
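As for what it does: each `explode` call turns one order row into one row per element of the `items` array, so applying it twice yields every ordered pair of items from the same order, including self-pairs (which the `item1 !== item2` filter later removes). The per-row effect, sketched in plain Scala:

```scala
// One order's items (stand-in for the "items" array column).
val items = Seq(10L, 20L, 30L)

// Two nested explodes over the same array = the Cartesian product
// of the order's items with itself.
val pairs = for (a <- items; b <- items) yield (a, b)

// 3 items give 9 ordered pairs; dropping self-pairs
// (the item1 !== item2 filter) leaves 6.
val distinctPairs = pairs.filter { case (a, b) => a != b }
```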
On 5 January 2016 at 20:00, Deenar Toraskar <[email protected]>
wrote:
> Hi All
>
> I have the following Spark SQL query and would like to convert this to
> use the DataFrames API (Spark 1.6). The eee, eep and pfep are all maps of
> (int -> float)
>
>
> s"""select e.counterparty, epe, mpfe, eepe, noOfMonthseep,
>    |  teee as effectiveExpectedExposure, teep as expectedExposure, tpfep as pfe
>    |from exposureMeasuresCpty e
>    | lateral view explode(eee) dummy1 as noOfMonthseee, teee
>    | lateral view explode(eep) dummy2 as noOfMonthseep, teep
>    | lateral view explode(pfep) dummy3 as noOfMonthspfep, tpfep
>    |where e.counterparty = '$cpty'
>    |  and noOfMonthseee = noOfMonthseep
>    |  and noOfMonthseee = noOfMonthspfep
>    |order by noOfMonthseep""".stripMargin
>
> Any guidance or samples would be appreciated. I have seen code snippets
> that handle arrays, but haven't come across how to handle maps
>
> case class Book(title: String, words: String)
> val df: RDD[Book]
>
> case class Word(word: String)
> val allWords = df.explode('words) {
>   case Row(words: String) => words.split(" ").map(Word(_))
> }
>
> val bookCountPerWord = allWords.groupBy("word").agg(countDistinct("title"))
>
>
> Regards
> Deenar
>
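On the maps question quoted above: the Row-based `explode` shown in the Book example should also work for a map column, since a Scala `Map` is a `TraversableOnce` of key/value pairs. A hedged sketch, assuming Spark 1.6; the `Entry` case class and the `exposures` DataFrame name are illustrative, not from the thread:

```scala
import org.apache.spark.sql.Row
// assumes a SQLContext is in scope, plus
// `import sqlContext.implicits._` for the $"..." column syntax

case class Entry(noOfMonths: Int, value: Float)

// Explode the eee map column into one (noOfMonths, value) row per entry.
// Spark surfaces map columns as scala.collection.Map, hence the pattern.
val exploded = exposures.explode($"eee") {
  case Row(m: scala.collection.Map[Int, Float] @unchecked) =>
    m.map { case (k, v) => Entry(k, v) }
}
```

Repeating this per map column and joining on the month key would mirror the three `lateral view explode` clauses in the SQL version.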