[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...

wangmiao1981 Tue, 21 Feb 2017 15:17:19 -0800

Github user wangmiao1981 commented on the issue:

    https://github.com/apache/spark/pull/15770
  
    Yanbo Liang added a comment - 02/Nov/16 09:30 - edited
    
    I prefer #1 and #3, and it looks like we can achieve both goals.
    A graph can be represented by GraphX/GraphFrame or by DataFrame/RDD. A PIC
    model can be trained on either, but the internal implementation uses
    GraphX operators, which means input data must be converted to the GraphX
    representation if it is an RDD of tuples. So it is straightforward to make
    PIC one of the algorithms in GraphX (or GraphFrame, when it is merged back
    into Spark). However, users may load their graph as a DataFrame/RDD and
    transform it via an ML Pipeline, which should also be supported, so it
    would be better to wrap the GraphX/GraphFrame PIC as a Pipeline stage so
    that ML users can use it as well.
    For historical reasons (we don't want to add new features to GraphX), I
    propose splitting this task into the following steps:
    
        1. Put PIC in Pipeline as a Transformer, using the GraphX operators in
           the implementation (this is consistent with Joseph K. Bradley's
           proposal).
        2. Add the PIC algorithm to GraphFrames when it is merged into Spark.
        3. Make the ML PIC a wrapper that calls the GraphFrames PIC
           implementation.
    
    I think this approach should work better for the different users (ML users
    and GraphFrames users), but I'm still open to hearing your thoughts. Thanks.
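
To make the proposed layering concrete, here is a minimal sketch in plain Python rather than Spark. It is not the Spark implementation: `pic_on_graph`, `PICTransformer`, and the `src`/`dst`/`weight` column names are hypothetical, the graph routine stands in for a GraphX/GraphFrames algorithm, and the power iteration is a simplified version (degree-based start vector, fixed iteration count, 1-D k-means on the result).

```python
from collections import defaultdict


def _kmeans_1d(values, k, rounds=20):
    """Cluster a {node: score} mapping into k groups with plain Lloyd iterations."""
    lo, hi = min(values.values()), max(values.values())
    # Spread the initial centers evenly between the smallest and largest score.
    centers = [lo + (hi - lo) * i / max(k - 1, 1) for i in range(k)]
    labels = {}
    for _ in range(rounds):
        labels = {n: min(range(k), key=lambda c: abs(x - centers[c]))
                  for n, x in values.items()}
        for c in range(k):
            members = [values[n] for n in labels if labels[n] == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = sum(members) / len(members)
    return labels


def pic_on_graph(edges, k, iterations=10):
    """Graph-level routine (stand-in for a GraphX/GraphFrames algorithm).

    edges: (src, dst, weight) tuples describing an undirected affinity graph.
    """
    adj = defaultdict(dict)
    for src, dst, w in edges:
        adj[src][dst] = w
        adj[dst][src] = w
    nodes = sorted(adj)
    degree = {n: sum(adj[n].values()) for n in nodes}
    total = sum(degree.values())
    # Degree-based start vector; it cannot separate perfectly symmetric graphs.
    v = {n: degree[n] / total for n in nodes}
    for _ in range(iterations):
        # One step of v <- W v with W the row-normalized affinity matrix.
        nxt = {n: sum(w * v[m] for m, w in adj[n].items()) / degree[n]
               for n in nodes}
        norm = sum(abs(x) for x in nxt.values())
        v = {n: x / norm for n, x in nxt.items()}
    return _kmeans_1d(v, k)


class PICTransformer:
    """Pipeline-style stage: accepts tabular rows and delegates to the graph routine."""

    def __init__(self, k, src_col="src", dst_col="dst", weight_col="weight"):
        self.k = k
        self.src_col = src_col
        self.dst_col = dst_col
        self.weight_col = weight_col

    def transform(self, rows):
        # Convert the tabular input to the edge representation the graph code expects.
        edges = [(r[self.src_col], r[self.dst_col], r[self.weight_col]) for r in rows]
        labels = pic_on_graph(edges, self.k)
        return [{"id": n, "cluster": c} for n, c in sorted(labels.items())]


# A triangle (0, 1, 2) and a 4-clique (3, 4, 5, 6) joined by one weak edge.
pairs = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (3, 6), (4, 5), (4, 6), (5, 6)]
rows = [{"src": a, "dst": b, "weight": 1.0} for a, b in pairs]
rows.append({"src": 2, "dst": 3, "weight": 0.01})
clusters = PICTransformer(k=2).transform(rows)
```

The point of the split is that the wrapper only handles the conversion from tabular input, so the graph routine underneath could later be swapped for another implementation without changing what Pipeline users see.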



