GitHub user bookling opened a pull request:
https://github.com/apache/spark/pull/14940
[SPARK-17383][GRAPHX]LabelPropagation
In the LabelPropagation of the GraphX lib, each node is initialized with a
unique label, and at every step each node adopts the label that most of its
neighbors currently have, ignoring the label it currently has. I think this
is unreasonable, because the label a node already has is also useful. When a
node tends toward a stable label, there is an association between
consecutive iterations, so a node is affected not only by its neighbors but
also by its own current label. I therefore changed the code to use both the
labels of its neighbors and its own label.
Through this iterative process, densely connected groups of nodes reach a
consensus on a unique label and form communities. But the communities found
by LabelPropagation are often discontinuous.
This is because several labels can tie for the most common label among a
node's neighbors, e.g., if node "0" has 6 neighbors labeled
{"1","1","2","2","3","3"}, it may select a label at random. To get stable
community labels and prevent this randomness, I chose the max label.
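
A minimal Python sketch of the update rule described above, not the patch
itself. The function name pick_label is hypothetical, and taking the largest
label among those tied for most frequent is one reading of "max label":

```python
from collections import Counter

def pick_label(own_label, neighbor_labels):
    """Choose a node's next label (illustrative sketch, not the GraphX code)."""
    # Count the neighbors' labels, then add one vote for the node's own label.
    votes = Counter(neighbor_labels)
    votes[own_label] += 1
    # Assumption: among the labels tied for the highest count, take the largest.
    top = max(votes.values())
    return max(label for label, count in votes.items() if count == top)

# Node "0" from the example: three labels tie with two votes each,
# so the tie is broken deterministically instead of at random.
print(pick_label(0, [1, 1, 2, 2, 3, 3]))
```

If the node already carried label 2, its own vote would break the tie in
favor of 2, which is the "current label matters" part of the change.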
You can test with a graph with edges {10L->11L, 10L->12L, 11L->12L,
11L->14L, 12L->14L, 13L->14L, 13L->15L, 13L->16L, 15L->16L, 15L->17L,
16L->17L}, or with a dandelion shape {1L->2L, 2L->7L, 2L->3L, 2L->4L,
2L->5L, 2L->6L}, etc.
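
To try the graphs above without a Spark cluster, here is a small synchronous
simulation in plain Python. It is only a sketch under assumptions: edges are
treated as undirected, every node updates from the previous step's labels,
and the names propagate and max_steps are made up for illustration. It is
not the GraphX code.

```python
from collections import Counter

def propagate(edges, max_steps=10):
    """Synchronous label propagation with a self vote and max-label tie-break."""
    # Assumption: labels flow in both directions along every edge.
    neighbors = {}
    for src, dst in edges:
        neighbors.setdefault(src, []).append(dst)
        neighbors.setdefault(dst, []).append(src)

    # Every node starts with its own id as its label.
    labels = {v: v for v in neighbors}
    for _ in range(max_steps):
        new_labels = {}
        for v, nbrs in neighbors.items():
            votes = Counter(labels[n] for n in nbrs)
            votes[labels[v]] += 1  # the node's current label also votes
            top = max(votes.values())
            # Ties for the highest count are broken by the largest label.
            new_labels[v] = max(l for l, c in votes.items() if c == top)
        if new_labels == labels:  # nothing changed, so the labels are stable
            break
        labels = new_labels
    return labels

two_clusters = [(10, 11), (10, 12), (11, 12), (11, 14), (12, 14), (13, 14),
                (13, 15), (13, 16), (15, 16), (15, 17), (16, 17)]
dandelion = [(1, 2), (2, 7), (2, 3), (2, 4), (2, 5), (2, 6)]

print(propagate(two_clusters))
print(propagate(dandelion))
```

With these rules the sketch settles within a few steps: on the first edge
list, nodes 10, 11, 12 and 14 share label 14 and nodes 13, 15, 16 and 17
share label 17; on the dandelion, every node ends with label 7.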
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/bookling/spark master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/14940.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #14940
----
commit 11bdab6bb042cd2102570c96db17279cf6ebbd92
Author: bookling <[email protected]>
Date: 2016-08-30T17:51:43Z
to solve "label shock"
I have tested the result, which is more reasonable.
LabelPropagation often suffers from "label shock", and the resulting
communities are often non-adjacent.
I think a node's label is useful between adjacent supersteps, because
adjacent supersteps are related.
commit bb875fef8f47ec99878d972f2c17b50123375a4c
Author: bookling <[email protected]>
Date: 2016-08-30T17:55:06Z
to reduce "label shock"
I have tested the result, which is more reasonable.
LabelPropagation often suffers from "label shock", and the resulting
communities are often non-adjacent.
I think a node's label is useful between adjacent supersteps, because
adjacent supersteps are related.
commit 60e6f0ee2a3cdfb2b526a6d12887513f3aabed42
Author: XiaoSen Lee <[email protected]>
Date: 2016-09-02T18:57:29Z
Improve labelPropagation of graphx lib
In the LabelPropagation of the GraphX lib, each node is initialized with a
unique label, and at every step each node adopts the label that most of its
neighbors currently have, ignoring the label it currently has. I think this
is unreasonable, because the label a node already has is also useful. When a
node tends toward a stable label, there is an association between
consecutive iterations, so a node is affected not only by its neighbors but
also by its own current label. I therefore changed the code to use both the
labels of its neighbors and its own label.
Through this iterative process, densely connected groups of nodes reach a
consensus on a unique label and form communities. But the communities found
by LabelPropagation are often discontinuous.
This is because several labels can tie for the most common label among a
node's neighbors, e.g., if node "0" has 6 neighbors labeled
{"1","1","2","2","3","3"}, it may select a label at random. To get stable
community labels and prevent this randomness, I chose the max label.
You can test with a graph with edges {10L->11L, 10L->12L, 11L->12L,
11L->14L, 12L->14L, 13L->14L, 13L->15L, 13L->16L, 15L->16L, 15L->17L,
16L->17L}, or with a dandelion shape {1L->2L, 2L->7L, 2L->3L, 2L->4L,
2L->5L, 2L->6L}, etc.
----