If you want to avoid pulling values into Python, you can use the Hive function
"histogram_numeric". You need to enable Hive support when building the session
(`SparkSession.builder.enableHiveSupport()`), but note that calling a Hive
function from Spark will also slow down performance.
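
A rough sketch of the Hive approach described above, in PySpark. The helper names `histogram_expr`, `hive_histogram` and `demo`, the column name `value`, and the app name are made up for illustration, and this assumes a Spark build with Hive support so that the Hive UDAF resolves:

```python
def histogram_expr(column: str, bins: int = 10) -> str:
    """Build a SQL expression that calls Hive's histogram_numeric UDAF."""
    if bins < 1:
        raise ValueError("bins must be a positive integer")
    # Backticks quote the column name; the result is an array of struct<x, y>.
    return f"histogram_numeric(`{column}`, {bins}) AS hist"


def hive_histogram(df, column: str, bins: int = 10):
    """Return [(x, y), ...] pairs; the aggregation runs on the executors."""
    row = df.selectExpr(histogram_expr(column, bins)).first()
    return [(b["x"], b["y"]) for b in row["hist"]]


def demo():
    # Imported here so the helpers above can be read without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("hist-demo")      # hypothetical app name
        .enableHiveSupport()       # needed for the Hive UDAF to resolve
        .getOrCreate()
    )
    # Toy numeric column; histogram_numeric expects a numeric type.
    df = spark.range(0, 1000).selectExpr("CAST(id AS DOUBLE) AS value")
    print(hive_histogram(df, "value", bins=5))
```

Each pair is a bin center (`x`) and its height (`y`), so only the small summary array is collected to the driver rather than the column values themselves.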
Spark SQL hasn't implemented "histogram_numeric" yet. But I think it will
Hi All,
My Google/SO searching is somehow failing on this. I simply want to compute
histograms for a column in a Spark dataframe.
There are two SO hits on this question:
-
https://stackoverflow.com/questions/39154325/pyspark-show-histogram-of-a-data-frame-column
-