GitHub user WeichenXu123 opened a pull request:

    https://github.com/apache/spark/pull/19156

    [SPARK-19634][FOLLOW-UP][ML] Improve interface of dataframe vectorized summarizer

    ## What changes were proposed in this pull request?
    
    Make several improvements to the DataFrame vectorized summarizer.
    
    1. Make the summarizer return the `Vector` type for all metrics (except "count").
    Previously it returned the `WrappedArray` type, which was inconvenient to work with.
    
    2. Make `MetricsAggregate` inherit the `ImplicitCastInputTypes` trait so that it can check and implicitly cast input values.
    
    3. Add a "weight" parameter to every single-metric method.
    
    4. Update the docs and improve the example code in them.
    
    5. Simplify the test cases.
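
A minimal sketch of how the interface described in items 1 and 3 might be used, assuming the `org.apache.spark.ml.stat.Summarizer` API as released in Spark 2.3 and later. The sample data, the column names `features` and `weight`, and the object name `SummarizerSketch` are illustrative assumptions and are not taken from this patch.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.ml.stat.Summarizer
import org.apache.spark.sql.SparkSession

// Hypothetical driver object, for illustration only.
object SummarizerSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SummarizerSketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data: a feature vector and a weight per row.
    val df = Seq(
      (Vectors.dense(2.0, 3.0, 5.0), 1.0),
      (Vectors.dense(4.0, 6.0, 7.0), 2.0)
    ).toDF("features", "weight")

    // Several metrics in one pass; each one comes back as a Vector.
    val (meanVec, varianceVec) = df
      .select(Summarizer.metrics("mean", "variance")
        .summary($"features", $"weight").as("summary"))
      .select("summary.mean", "summary.variance")
      .as[(Vector, Vector)]
      .first()

    // Single-metric method called with the weight column.
    val weightedMean = df
      .select(Summarizer.mean($"features", $"weight"))
      .first()
      .getAs[Vector](0)

    println(s"mean = $meanVec, variance = $varianceVec")
    println(s"weighted mean (single-metric method) = $weightedMean")

    spark.stop()
  }
}
```

Because the metrics are returned as `Vector` values, they can be read with `.as[(Vector, Vector)]` or `getAs[Vector]` without converting from `WrappedArray` first.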
    
    ## How was this patch tested?
    
    Tests added and simplified.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/WeichenXu123/spark improve_vec_summarizer

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19156.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19156
    
----
commit 7b9fbdccabba3442e42e6a7600c32657dd3436ff
Author: WeichenXu <weichen...@databricks.com>
Date:   2017-09-07T10:54:58Z

    init pr

----

