Hi all,
I am trying to calculate a histogram for every column of a CSV file using Spark with Scala. I found that DoubleRDDFunctions supports histogram, so I wrote the following to get the histogram of each column:

1. Get the column count.
2. Create an RDD[Double] for each column and calculate its histogram using DoubleRDDFunctions.

    // rdd holds one array of numeric values per CSV row
    val columnIndexArray = Array.tabulate(rdd.first().length)(_ * 1)
    val histogramData = columnIndexArray.map(columns => {
      rdd.map(lines => lines(columns)).histogram(6)
    })

Is this a good approach? Can anyone suggest a better way to tackle this?

Thanks in advance.
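
For reference, here is a minimal self-contained sketch of the approach described above, in case anyone wants to reproduce it. It assumes a local SparkContext and small in-memory sample lines standing in for the CSV file, with every column numeric and no header row. The names ColumnHistograms and columnHistograms are made up for this example. Since each histogram call launches its own Spark job(s), the sketch caches the RDD so the rows are not re-parsed for every column.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object ColumnHistograms {

  // One (bucket boundaries, counts) pair per column, using evenly spaced buckets.
  // Assumes every row has the same number of columns as the first row.
  def columnHistograms(rdd: RDD[Array[Double]],
                       bucketCount: Int): Seq[(Array[Double], Array[Long])] = {
    val columnCount = rdd.first().length
    (0 until columnCount).map { column =>
      // RDD[Double] picks up histogram via the implicit DoubleRDDFunctions
      rdd.map(row => row(column)).histogram(bucketCount)
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ColumnHistograms").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Sample data standing in for the CSV file (assumption: numeric, no header)
    val lines = sc.parallelize(Seq("1.0,10.0", "2.0,20.0", "3.0,30.0", "4.0,40.0"))
    val rdd = lines.map(_.split(",").map(_.trim.toDouble)).cache()

    columnHistograms(rdd, 6).zipWithIndex.foreach { case ((buckets, counts), i) =>
      println(s"column $i: buckets=${buckets.mkString(", ")} counts=${counts.mkString(", ")}")
    }

    sc.stop()
  }
}
```

With a real file, the sample lines would be replaced by something like sc.textFile on the CSV path, with extra handling for a header row or non-numeric fields.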