Github user yu-iskw commented on the pull request:
https://github.com/apache/spark/pull/1964#issuecomment-53845417
@mengxr, @dlwh, @erikerlandson, @rnowling,
Thank you for your feedback.
I agree with the idea of implementing the distance metrics/measures in
Breeze. However, I am a bit worried about the interface of the distance
metrics in Spark.
I could implement a sample distance metric for Breeze, as below. However, I
couldn't think of a good interface for Spark.
https://gist.github.com/yu-iskw/37ae208c530f7018e048
I expect users to be able to switch distance metrics and to plug in their
own distance functions in some of MLlib's machine learning algorithms, such as
KMeans, since some of those algorithms take `RDD[Vector]` as input data. I think
users should be able to define their own distance functions with Spark's `Vector`
instead of Breeze's, for example `(v1: Vector, v2: Vector) => Double`. At the
very least, the interface shouldn't depend on Breeze.
## High-Level Use Case Sketch
```
KMeans().setMeasure(EuclideanDistance).run(RDD[Vector])
KMeans().setMeasure((v1: Vector, v2: Vector) => fun(v1, v2)).run(RDD[Vector])
```
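A runnable toy version of this builder-style API might look as follows. This is only a sketch of the chaining idea, not the current MLlib `KMeans`; `Array[Double]` stands in for Spark's `Vector` and a plain `Seq` for `RDD`, and `run` here just assigns points to the nearest of some given centers:

```scala
// Hypothetical builder-style API sketch (not the real MLlib KMeans).
object KMeansApiSketch {
  type Vector = Array[Double]                       // stand-in for Spark's Vector
  type DistanceMeasure = (Vector, Vector) => Double

  class KMeans {
    // Default measure: Euclidean distance.
    private var measure: DistanceMeasure =
      (a, b) => math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

    // Returning `this` lets calls chain: KMeans().setMeasure(...).run(...).
    def setMeasure(m: DistanceMeasure): this.type = { measure = m; this }

    // Toy stand-in for the real algorithm: assign each point to the index
    // of the nearest center under the configured measure.
    def run(data: Seq[Vector], centers: Seq[Vector]): Seq[Int] =
      data.map(p => centers.indices.minBy(i => measure(p, centers(i))))
  }

  def main(args: Array[String]): Unit = {
    val centers = Seq(Array(0.0, 0.0), Array(10.0, 10.0))
    val data = Seq(Array(1.0, 1.0), Array(9.0, 9.0))

    // Default Euclidean measure.
    println(new KMeans().run(data, centers).mkString(","))

    // User-supplied Manhattan distance, swapped in via setMeasure.
    val manhattan: DistanceMeasure =
      (a, b) => a.zip(b).map { case (x, y) => math.abs(x - y) }.sum
    println(new KMeans().setMeasure(manhattan).run(data, centers).mkString(","))
  }
}
```

The point of the sketch is only that `setMeasure` accepts any `(Vector, Vector) => Double`, so built-in and user-defined metrics go through the same interface.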
I'm sorry if I misunderstood your opinion. If you have a better idea, could
you please show me mock code for the use case?
(Please forgive any rude expressions, if there are any. I don't mean them
that way; I'm just not good at English.)