Re: run huge number of queries in Spark

2018-04-04 Thread Georg Heiler
See https://gist.github.com/geoHeil/e0799860262ceebf830859716bbf in particular. You will probably want to use Spark's imperative (non-SQL) API:

  .rdd
  .reduceByKey { (count1, count2) => count1 + count2 }
  .map { case ((word, path), n) => (word, (path, n)) }
  .toDF

i.e. this builds an inverted index which
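For context, here is a minimal self-contained sketch of the inverted-index approach the reply points at. The input columns (path, text) and the whitespace tokenization are assumptions for illustration, not taken from the linked gist:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{explode, lower, split}

object InvertedIndexSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("inverted-index").master("local[*]").getOrCreate()
    import spark.implicits._

    // hypothetical corpus: one row per document (path, text)
    val docs = Seq(
      ("doc1.txt", "spark makes big data simple"),
      ("doc2.txt", "spark queries run on big data")
    ).toDF("path", "text")

    // one row per (path, word) occurrence
    val tokens = docs.select($"path", explode(split(lower($"text"), "\\s+")).as("word"))

    // count each (word, path) pair, then regroup by word:
    // every term ends up mapped to the documents that contain it
    val inverted = tokens.rdd
      .map(row => ((row.getString(1), row.getString(0)), 1L))
      .reduceByKey { (count1, count2) => count1 + count2 }
      .map { case ((word, path), n) => (word, (path, n)) }
      .toDF("word", "posting")

    // document frequency of a term = number of postings for that word,
    // computed for all terms in one pass instead of one query per term
    inverted.groupBy($"word").count().show()
  }
}

The point of going through .rdd and reduceByKey is that a single shuffle produces the counts for all terms at once, rather than one SQL query per term.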

run huge number of queries in Spark

2018-04-04 Thread Donni Khan
Hi all,
I want to run a huge number of queries on a DataFrame in Spark. I have a large collection of text documents; I loaded all of the documents into a Spark DataFrame and created a temp table:

  dataFrame.registerTempTable("table1");

I have more than 50,000 terms, and I want to get the document frequency for each by using
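For concreteness, a minimal sketch of the setup described above; the column names and the LIKE-based term match are assumptions, not the poster's actual code:

import org.apache.spark.sql.SparkSession

object DocFrequencyQuery {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("doc-frequency").master("local[*]").getOrCreate()
    import spark.implicits._

    // hypothetical corpus: one row per document
    val dataFrame = Seq(
      ("doc1.txt", "spark makes big data simple"),
      ("doc2.txt", "spark queries run on big data")
    ).toDF("path", "text")

    // registerTempTable is deprecated since Spark 2.0;
    // createOrReplaceTempView is the modern equivalent
    dataFrame.createOrReplaceTempView("table1")

    // one SQL query per term: document frequency = number of matching documents
    val term = "spark"
    val docFreq = spark.sql(
      s"SELECT COUNT(*) AS df FROM table1 WHERE text LIKE '%$term%'"
    ).first().getLong(0)
    println(s"df($term) = $docFreq")
  }
}

Issued 50,000 times, this pattern scans the corpus once per term, which is the cost the reply above avoids by computing all document frequencies in a single pass.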