GitHub user WeichenXu123 opened a pull request:
https://github.com/apache/spark/pull/19156
[SPARK-19634][FOLLOW-UP][ML] Improve interface of dataframe vectorized
summarizer
## What changes were proposed in this pull request?
Make several improvements to the DataFrame vectorized summarizer.
1. Make the summarizer return a `Vector` for every metric (except
"count").
Previously it returned a "WrappedArray", which was inconvenient to work with.
2. Make `MetricsAggregate` extend the `ImplicitCastInputTypes` trait, so that it
can type-check and implicitly cast its input values.
3. Add a "weight" parameter to every single-metric method.
4. Update the docs and improve the example code in them.
5. Simplify the test cases.
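To illustrate items 1 and 3, here is a minimal sketch of how the summarizer might be called. It assumes the `org.apache.spark.ml.stat.Summarizer` API as it appears in released Spark (2.3 and later), which may differ in detail from the revision in this PR. The object name `SummarizerSketch`, the helper names, and the sample data are hypothetical and exist only for illustration.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.ml.stat.Summarizer
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical sketch; names and data are illustrative, not taken from the PR.
object SummarizerSketch {

  // Two feature vectors with weights 1.0 and 2.0 (made-up sample data).
  def sampleData(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      (Vectors.dense(2.0, 3.0, 5.0), 1.0),
      (Vectors.dense(4.0, 6.0, 7.0), 2.0)
    ).toDF("features", "weight")
  }

  // Single-metric method with a weight column (item 3).
  // The result column holds a Vector rather than a WrappedArray (item 1).
  def weightedMean(spark: SparkSession): Vector = {
    import spark.implicits._
    sampleData(spark)
      .select(Summarizer.mean($"features", $"weight"))
      .first()
      .getAs[Vector](0)
  }

  // Several metrics computed in one pass; each field of the struct is a Vector.
  def meanAndVariance(spark: SparkSession): (Vector, Vector) = {
    import spark.implicits._
    val row = sampleData(spark)
      .select(Summarizer.metrics("mean", "variance")
        .summary($"features", $"weight").as("summary"))
      .select("summary.mean", "summary.variance")
      .first()
    (row.getAs[Vector](0), row.getAs[Vector](1))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("summarizer-sketch")
      .getOrCreate()
    println(s"weighted mean: ${weightedMean(spark)}")
    val (m, v) = meanAndVariance(spark)
    println(s"mean: $m, variance: $v")
    spark.stop()
  }
}
```

The `metrics(...).summary(...)` form is the one to prefer when several statistics are needed, since they are computed in a single aggregation; the single-metric methods are a shorthand for the common case.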
## How was this patch tested?
Tests added and simplified.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/WeichenXu123/spark improve_vec_summarizer
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19156.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19156
----
commit 7b9fbdccabba3442e42e6a7600c32657dd3436ff
Author: WeichenXu <[email protected]>
Date: 2017-09-07T10:54:58Z
init pr
----