Re: pyspark histogram

2017-09-27 Thread Weichen Xu
If you want to avoid pulling values into Python, you can use the Hive function "histogram_numeric". To do so you need to enable Hive support with `SparkSession.builder.enableHiveSupport()`. Note, however, that calling a Hive function in Spark will also slow down performance. Spark SQL hasn't implemented "histogram_numeric" yet. But I think it will
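
A minimal sketch of the Hive-function approach described above, assuming a session built with Hive support. The helper names `build_histogram_query` and `hive_histogram`, the temporary view name `hist_input`, and the column name used in the usage note are illustrative assumptions, not part of Spark. `histogram_numeric(col, b)` is the Hive aggregate, which returns an array of structs with fields `x` (bin center) and `y` (bin height).

```python
def build_histogram_query(table, column, nbins):
    """Build a SQL string that calls the Hive UDAF histogram_numeric.

    `table` and `column` are hypothetical names supplied by the caller.
    """
    if nbins < 1:
        raise ValueError("nbins must be a positive integer")
    return "SELECT histogram_numeric(`{c}`, {n}) AS hist FROM `{t}`".format(
        c=column, n=int(nbins), t=table
    )


def hive_histogram(spark, df, column, nbins=10, view_name="hist_input"):
    """Return [(bin_center, bin_height), ...] for one numeric column.

    Assumes `spark` was created with Hive support enabled, so that the
    aggregation runs in the JVM and only the bins come back to Python.
    """
    # Expose the DataFrame to SQL under a temporary (assumed) view name.
    df.createOrReplaceTempView(view_name)
    row = spark.sql(build_histogram_query(view_name, column, nbins)).first()
    # The aggregate may yield NULL for an empty input; treat that as no bins.
    bins = row["hist"] if row is not None and row["hist"] is not None else []
    return [(b["x"], b["y"]) for b in bins]
```

With a session obtained from `SparkSession.builder.enableHiveSupport().getOrCreate()` and a DataFrame `df` holding a numeric column named, say, `value`, a call such as `hive_histogram(spark, df, "value", nbins=5)` should return at most five `(x, y)` pairs. The bins are approximate and not guaranteed to be evenly spaced.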

pyspark histogram

2017-09-27 Thread Brian Wylie
Hi All, My Google/SO searching is somehow failing on this. I simply want to compute histograms for a column in a Spark dataframe. There are two SO hits on this question: - https://stackoverflow.com/questions/39154325/pyspark-show-histogram-of-a-data-frame-column -