Ted Malaska created SPARK-9237:
----------------------------------

             Summary: Added Top N Column Values for DataFrames
                 Key: SPARK-9237
                 URL: https://issues.apache.org/jira/browse/SPARK-9237
             Project: Spark
          Issue Type: Improvement
            Reporter: Ted Malaska
            Priority: Minor
This JIRA is to add a very common data quality check to DataFrames: the top N values of each column, ranked by count. A quick outline of this functionality can be seen in the following blog post:
http://blog.cloudera.com/blog/2015/07/how-to-do-data-quality-checks-using-apache-spark-dataframes/

There are two parts to this JIRA:

1. How to implement the Top N count. I will start with the implementation in the blog post.
2. Where to add the function: either directly on DataFrame, in DataFrame describe, or in DataFrameStatFunctions. I will start by putting it into DataFrameStatFunctions.

Please let me know if you have any input.

Thanks

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
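To make the Top N count from part 1 concrete, here is a minimal plain-Python sketch of the counting logic. It is not the implementation from the blog post and not a Spark API. The names `count_partition`, `merge_counts`, and `top_n_column_values` are hypothetical, and partitions are modeled as lists of row tuples. The idea is to count values per column within each partition, merge the partial counts, and then keep the N most frequent values per column.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple

# Hypothetical helpers for illustration only; not part of Spark.
# A "partition" here is just a list of row tuples, one value per column.


def count_partition(rows: Sequence[Tuple], column_names: Sequence[str]) -> Dict[str, Counter]:
    """Count how often each value appears in each column of one partition."""
    counts = {name: Counter() for name in column_names}
    for row in rows:
        for name, value in zip(column_names, row):
            counts[name][value] += 1
    return counts


def merge_counts(a: Dict[str, Counter], b: Dict[str, Counter]) -> Dict[str, Counter]:
    """Combine partial per-column counts (all counts are positive, so Counter + is safe)."""
    return {name: a[name] + b[name] for name in a}


def top_n_column_values(
    partitions: Sequence[Sequence[Tuple]], column_names: Sequence[str], n: int
) -> Dict[str, List[Tuple[Hashable, int]]]:
    """Return {column: [(value, count), ...]} with the n most frequent values per column."""
    merged = {name: Counter() for name in column_names}
    for partition in partitions:
        merged = merge_counts(merged, count_partition(partition, column_names))
    # most_common(n) sorts by count, descending; ties keep first-seen order.
    return {name: merged[name].most_common(n) for name in column_names}


if __name__ == "__main__":
    partitions = [
        [("NY", "a"), ("CA", "b"), ("NY", "a")],
        [("NY", "c"), ("TX", "b")],
    ]
    print(top_n_column_values(partitions, ["state", "code"], 2))
```

Counting per partition and merging afterward mirrors how a distributed aggregation combines partial results. Note that this sketch keeps a full counter for every column, which can use a lot of memory for columns with many distinct values.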