[ https://issues.apache.org/jira/browse/SPARK-22448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16445355#comment-16445355 ]
Dedunu Dhananjaya commented on SPARK-22448:
-------------------------------------------

I'm thinking about implementing this. Is it okay to start working on this?

> Add functions like Mode(), NumNulls(), etc. in Summarizer
> ---------------------------------------------------------
>
>                 Key: SPARK-22448
>                 URL: https://issues.apache.org/jira/browse/SPARK-22448
>             Project: Spark
>          Issue Type: New Feature
>          Components: Optimizer
>    Affects Versions: 2.2.0
>            Reporter: Abdeali Kothari
>            Priority: Trivial
>
> Would be very useful to have a MODE() function in the Summary statistics
> currently supported by DataSets.
> I can see that the Summarizer has many useful functions in 2.3.0 and it would
> be useful to add the following to it:
> - Mode - Element that occurs maximum number of times
> - CSS - Cumulative Sum of Squares ... Sum((x - mean)^2)
> - NumNull - The number of values that are NULL in the column
> - SUM - Just the sum of the column ...
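
For reference, the four requested statistics can be sketched in plain Python, independent of Spark. This only illustrates the definitions listed in the issue and is not the Summarizer API. The function name `summarize_column` is hypothetical. The sketch assumes that nulls are represented as `None`, that they are skipped for every statistic except NumNull, and that an all-null column yields `None` for the other three.

```python
from collections import Counter
from typing import Optional, Sequence


def summarize_column(values: Sequence[Optional[float]]) -> dict:
    """Compute Mode, CSS, NumNull and SUM for one column (hypothetical helper)."""
    # Assumption: nulls are None and are excluded from Mode, CSS and SUM.
    non_null = [v for v in values if v is not None]
    num_null = len(values) - len(non_null)

    if not non_null:
        # Assumption: with no non-null values, the other statistics are undefined.
        return {"mode": None, "css": None, "num_null": num_null, "sum": None}

    total = sum(non_null)
    mean = total / len(non_null)

    # CSS as written in the issue: Sum((x - mean)^2)
    css = sum((v - mean) ** 2 for v in non_null)

    # Mode: the element that occurs the maximum number of times.
    # Counter.most_common breaks ties by first-seen order.
    mode = Counter(non_null).most_common(1)[0][0]

    return {"mode": mode, "css": css, "num_null": num_null, "sum": total}


if __name__ == "__main__":
    print(summarize_column([1.0, 2.0, 2.0, None, 3.0]))
```

A distributed implementation would need mergeable partial states (for example, per-partition value counts for Mode), which this single-pass local sketch does not attempt.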