See https://gist.github.com/geoHeil/e0799860262ceebf830859716bbf in
particular:
You will probably want to use Spark's imperative (non-SQL) API:
dataFrame.rdd
  // key each row by (word, path) with an initial count of 1
  // (assumes two string columns; needs import org.apache.spark.sql.Row
  // and import spark.implicits._ for .toDF)
  .map { case Row(word: String, path: String) => ((word, path), 1) }
  .reduceByKey { (count1, count2) => count1 + count2 } // sum counts per (word, path)
  .map { case ((word, path), n) => (word, (path, n)) } // re-key by word
  .toDF("word", "posting")
i.e. it builds an inverted index, which maps each word to the documents
containing it along with their counts, so your 50,000+ term lookups become
key lookups against the index instead of separate queries.
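
Put together, a self-contained sketch of that pipeline (the toy documents,
the path/text column names, and the whitespace tokenization are my own
illustrative assumptions, not from the gist); note that building the index
once also gives you every term's document frequency in a single pass:

import org.apache.spark.sql.SparkSession

object InvertedIndexSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("inverted-index").getOrCreate()
    import spark.implicits._

    // toy stand-in for the real collection: one row per (path, text)
    val docs = Seq(
      ("doc1.txt", "spark makes big data simple"),
      ("doc2.txt", "big data big results")
    ).toDF("path", "text")

    val index = docs.rdd
      .flatMap { row =>
        val path = row.getString(0)
        // naive whitespace tokenization; real code would normalize terms
        row.getString(1).split("\\s+").map(word => ((word, path), 1))
      }
      .reduceByKey(_ + _)                                  // count per (word, path)
      .map { case ((word, path), n) => (word, (path, n)) } // postings keyed by word
      .toDF("word", "posting")

    // document frequency per term = number of distinct documents containing it
    index.groupBy("word").count().show(truncate = false)

    spark.stop()
  }
}
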
Hi all,
I want to run a huge number of queries on a DataFrame in Spark. I have a
large collection of text documents; I loaded all of the documents into a
Spark DataFrame and created a temp table:
dataFrame.registerTempTable("table1");
I have more than 50,000 terms, and I want to get the document frequency
for each one by running a separate SQL query against this temp table.
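
For reference, that per-term approach would look roughly like the sketch
below (the text column name and the LIKE-based matching are assumptions on
my part, since the message is cut off). Each query scans the whole table,
which is why the inverted index above scales better for 50,000 terms:

// assumes `spark` is an active SparkSession and the documents table has
// a string column named `text` (an assumption, not stated above)
dataFrame.registerTempTable("table1")

val terms: Seq[String] = Seq("spark", "data") // stand-in for the 50,000 terms

// one SQL query per term; every query re-scans the entire table
val documentFrequency: Map[String, Long] = terms.map { term =>
  val count = spark
    .sql(s"SELECT COUNT(*) FROM table1 WHERE text LIKE '%$term%'")
    .first()
    .getLong(0)
  term -> count
}.toMap
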