GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/19156
[SPARK-19634][FOLLOW-UP][ML] Improve interface of dataframe vectorized summarizer

## What changes were proposed in this pull request?

Make several improvements to the DataFrame vectorized summarizer.

1. Make the summarizer return the `Vector` type for all metrics (except "count"). Previously it returned the "WrappedArray" type, which was not very convenient to work with.
2. Make `MetricsAggregate` inherit the `ImplicitCastInputTypes` trait, so that it can check input types and implicitly cast input values.
3. Add a "weight" parameter to all single-metric methods.
4. Update the documentation and improve its example code.
5. Simplify the test cases.

## How was this patch tested?

Tests added and simplified.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/WeichenXu123/spark improve_vec_summarizer

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19156.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19156

----
commit 7b9fbdccabba3442e42e6a7600c32657dd3436ff
Author: WeichenXu <weichen...@databricks.com>
Date:   2017-09-07T10:54:58Z

    init pr

----
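To illustrate the metric semantics described in items 1 and 3 of the PR description, here is a minimal, hypothetical sketch in plain Python. It is not Spark's implementation: `summarize` and its parameters are made-up names, a Python list stands in for Spark's `Vector`, and the weighted variance uses the unbiased "reliability weights" estimator, which is an assumed definition. What it shows is the interface shape: every metric comes back as a per-column vector except "count", and an optional per-row weight feeds into the weighted metrics.

```python
from typing import Dict, List, Optional, Sequence, Union

Vector = List[float]  # hypothetical stand-in for Spark's ml.linalg.Vector


def summarize(
    rows: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
    metrics: Sequence[str] = ("mean", "variance", "min", "max", "count"),
) -> Dict[str, Union[Vector, int]]:
    """Hypothetical column-wise summarizer: every metric is a vector except "count"."""
    if not rows:
        raise ValueError("need at least one row")
    dim = len(rows[0])
    if any(len(r) != dim for r in rows):
        raise ValueError("all rows must have the same dimension")
    if weights is None:
        weights = [1.0] * len(rows)  # assumed default: weight 1.0 per row
    if len(weights) != len(rows) or any(w < 0 for w in weights):
        raise ValueError("need one non-negative weight per row")
    w_sum = sum(weights)
    if w_sum <= 0:
        raise ValueError("total weight must be positive")
    w_sq_sum = sum(w * w for w in weights)

    # Weighted mean of each column.
    mean = [sum(w * r[j] for r, w in zip(rows, weights)) / w_sum for j in range(dim)]

    # Unbiased weighted variance (assumed "reliability weights" definition);
    # with unit weights the denominator reduces to n - 1.
    denom = w_sum - w_sq_sum / w_sum
    variance = [
        sum(w * (r[j] - mean[j]) ** 2 for r, w in zip(rows, weights)) / denom
        if denom > 0
        else 0.0
        for j in range(dim)
    ]

    available: Dict[str, Union[Vector, int]] = {
        "mean": mean,
        "variance": variance,
        "min": [min(r[j] for r in rows) for j in range(dim)],
        "max": [max(r[j] for r in rows) for j in range(dim)],
        "count": len(rows),  # the only scalar metric
    }
    unknown = set(metrics) - available.keys()
    if unknown:
        raise ValueError(f"unsupported metrics: {sorted(unknown)}")
    # Return only the requested metrics, in the requested order.
    return {m: available[m] for m in metrics}
```

For example, `summarize([[1.0, 2.0], [3.0, 6.0]], weights=[1.0, 3.0], metrics=["mean", "count"])` returns `{"mean": [2.5, 5.0], "count": 2}`: the mean is a vector with one weighted value per column, while the count stays a plain integer.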