[jira] [Commented] (TINKERPOP-1967) Add a connectedComponent() step
[ https://issues.apache.org/jira/browse/TINKERPOP-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16575233#comment-16575233 ] ASF GitHub Bot commented on TINKERPOP-1967: --- Github user asfgit closed the pull request at: https://github.com/apache/tinkerpop/pull/897 > Add a connectedComponent() step > --- > > Key: TINKERPOP-1967 > URL: https://issues.apache.org/jira/browse/TINKERPOP-1967 > Project: TinkerPop > Issue Type: Improvement > Components: process >Affects Versions: 3.3.3 >Reporter: stephen mallette >Assignee: stephen mallette >Priority: Minor > Fix For: 3.4.0 > > > Given TINKERPOP-1852 we should probably just simplify and improve performance > of connected component identification. Implementing this will involve the > creation of {{ConnectedComponentVertexProgram}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (TINKERPOP-1967) Add a connectedComponent() step
[ https://issues.apache.org/jira/browse/TINKERPOP-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16562274#comment-16562274 ] ASF GitHub Bot commented on TINKERPOP-1967: --- Github user dkuppitz commented on the issue: https://github.com/apache/tinkerpop/pull/897 Nice. VOTE: +1
[jira] [Commented] (TINKERPOP-1967) Add a connectedComponent() step
[ https://issues.apache.org/jira/browse/TINKERPOP-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16562261#comment-16562261 ] ASF GitHub Bot commented on TINKERPOP-1967: --- Github user spmallette commented on the issue: https://github.com/apache/tinkerpop/pull/897 cool - pushed a test
[jira] [Commented] (TINKERPOP-1967) Add a connectedComponent() step
[ https://issues.apache.org/jira/browse/TINKERPOP-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16562251#comment-16562251 ]

ASF GitHub Bot commented on TINKERPOP-1967:
---
Github user dkuppitz commented on the issue:

    https://github.com/apache/tinkerpop/pull/897

Much better.

```
gremlin> g.withComputer().V().dedup().connectedComponent().valueMap()
==>[gremlin.connectedComponentVertexProgram.component:[1],name:[ripple],lang:[java]]
==>[gremlin.connectedComponentVertexProgram.component:[1],name:[marko],age:[29]]
==>[gremlin.connectedComponentVertexProgram.component:[1],name:[lop],lang:[java]]
==>[gremlin.connectedComponentVertexProgram.component:[1],name:[vadas],age:[27]]
==>[gremlin.connectedComponentVertexProgram.component:[1],name:[josh],age:[32]]
==>[gremlin.connectedComponentVertexProgram.component:[1],name:[peter],age:[35]]
```

Perhaps there should be a test case for this query, just in case somebody tries to optimize the extra halted traverser code away.
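As background to the output above, where every vertex reports component `1`, the idea can be illustrated with a small label-propagation sketch in plain Java. This is not the code of `ConnectedComponentVertexProgram`. It assumes that a component is named after the smallest vertex id it contains (which is consistent with the output above but not confirmed in this thread), and the edge list is the "modern" toy graph as commonly documented. `ComponentSketch`, `addEdge`, and `components` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: plain-Java label propagation, not TinkerPop's implementation.
class ComponentSketch {

    // Record an undirected edge and make sure both endpoints exist as keys.
    static void addEdge(Map<Integer, List<Integer>> adjacency, int a, int b) {
        adjacency.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    // Every vertex starts with its own id as label; labels are lowered to the
    // smallest value seen across each edge until a full pass changes nothing.
    static Map<Integer, Integer> components(Map<Integer, List<Integer>> adjacency) {
        Map<Integer, Integer> label = new TreeMap<>();
        for (Integer v : adjacency.keySet()) {
            label.put(v, v);
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<Integer, List<Integer>> entry : adjacency.entrySet()) {
                int v = entry.getKey();
                for (int n : entry.getValue()) {
                    int smallest = Math.min(label.get(v), label.get(n));
                    if (label.get(v) != smallest) { label.put(v, smallest); changed = true; }
                    if (label.get(n) != smallest) { label.put(n, smallest); changed = true; }
                }
            }
        }
        return label;
    }

    public static void main(String[] args) {
        // Assumed edge list of the "modern" toy graph (edge direction ignored).
        Map<Integer, List<Integer>> modern = new TreeMap<>();
        addEdge(modern, 1, 2); // marko knows vadas
        addEdge(modern, 1, 4); // marko knows josh
        addEdge(modern, 1, 3); // marko created lop
        addEdge(modern, 4, 3); // josh created lop
        addEdge(modern, 4, 5); // josh created ripple
        addEdge(modern, 6, 3); // peter created lop
        System.out.println(components(modern)); // {1=1, 2=1, 3=1, 4=1, 5=1, 6=1}
    }
}
```

Because all six vertices are reachable from one another when edge direction is ignored, they end up with the same label, which matches the single component shown in the console session.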
[jira] [Commented] (TINKERPOP-1967) Add a connectedComponent() step
[ https://issues.apache.org/jira/browse/TINKERPOP-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16561989#comment-16561989 ] ASF GitHub Bot commented on TINKERPOP-1967: --- Github user spmallette commented on the issue: https://github.com/apache/tinkerpop/pull/897 @dkuppitz i just pushed a fix for halted traverser stuff - does that look right to you now? or am i still missing something?
[jira] [Commented] (TINKERPOP-1967) Add a connectedComponent() step
[ https://issues.apache.org/jira/browse/TINKERPOP-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16561861#comment-16561861 ] ASF GitHub Bot commented on TINKERPOP-1967: --- Github user spmallette commented on the issue: https://github.com/apache/tinkerpop/pull/897 @FlorianHockmann regarding your point - yes, those tokens should be added to GLVs. I think we should wait for #893 to merge because it started the pattern for adding such tokens. I think that approach can be generalized better and then we can implement it for this PR and for the one from @dkuppitz at #882. It may even be best to simply merge all three PRs and then implement that improvement once they are all in a single branch (i'd be in favor of that approach).
[jira] [Commented] (TINKERPOP-1967) Add a connectedComponent() step
[ https://issues.apache.org/jira/browse/TINKERPOP-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16561857#comment-16561857 ]

ASF GitHub Bot commented on TINKERPOP-1967:
---
Github user spmallette commented on a diff in the pull request:

    https://github.com/apache/tinkerpop/pull/897#discussion_r206114783

    --- Diff: docs/src/recipes/connected-components.asciidoc ---
    @@ -35,46 +54,92 @@ g.addV().property(id, "A").as("a").
     addE("link").from("d").to("e").iterate()
    -One way to detect the various subgraphs would be to do something like this:
    + Small graph traversals
    +Connected components in a small graph can be determined with either an OLTP traversal or the OLAP
    +`connectedComponent()`-step. The `connectedComponent()`-step is available as of TinkerPop 3.4.0 and is
    +described in more detail in the
    +link:http://tinkerpop.apache.org/docs/x.y.z/reference/#connectedcomponent-step[Reference Documentation].
    +The traversal looks like:
     [gremlin-groovy,existing]
    -g.V().emit(cyclicPath().or().not(both())).repeat(both()).until(cyclicPath()). <1>
    - path().aggregate("p"). <2>
    - unfold().dedup(). <3>
    - map(__.as("v").select("p").unfold(). <4>
    - filter(unfold().where(eq("v"))).
    - unfold().dedup().order().by(id).fold()).
    - dedup() <5>
    +g.withComputer().V().connectedComponent().
    +group().by('gremlin.connectedComponentVertexProgram.component').
    --- End diff --

Ended up doing this: https://github.com/apache/tinkerpop/commit/627b6e95b24130dea5582903add52c06b7b64a41
[jira] [Commented] (TINKERPOP-1967) Add a connectedComponent() step
[ https://issues.apache.org/jira/browse/TINKERPOP-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16561848#comment-16561848 ]

ASF GitHub Bot commented on TINKERPOP-1967:
---
Github user spmallette commented on a diff in the pull request:

    https://github.com/apache/tinkerpop/pull/897#discussion_r206112146

    --- Diff: docs/src/recipes/connected-components.asciidoc ---
    @@ -35,46 +54,92 @@ g.addV().property(id, "A").as("a").
     addE("link").from("d").to("e").iterate()
    -One way to detect the various subgraphs would be to do something like this:
    + Small graph traversals
    +Connected components in a small graph can be determined with either an OLTP traversal or the OLAP
    +`connectedComponent()`-step. The `connectedComponent()`-step is available as of TinkerPop 3.4.0 and is
    +described in more detail in the
    +link:http://tinkerpop.apache.org/docs/x.y.z/reference/#connectedcomponent-step[Reference Documentation].
    +The traversal looks like:
     [gremlin-groovy,existing]
    -g.V().emit(cyclicPath().or().not(both())).repeat(both()).until(cyclicPath()). <1>
    - path().aggregate("p"). <2>
    - unfold().dedup(). <3>
    - map(__.as("v").select("p").unfold(). <4>
    - filter(unfold().where(eq("v"))).
    - unfold().dedup().order().by(id).fold()).
    - dedup() <5>
    +g.withComputer().V().connectedComponent().
    +group().by('gremlin.connectedComponentVertexProgram.component').
    --- End diff --

This comment combined with the one from @FlorianHockmann is troublesome because GLVs don't have `ConnectedComponentVertexProgram` so the Gremlin won't work. I think that's why I wrote the docs this way. Maybe I need to move `COMPONENT` to `ConnectedComponent` and then allow GLVs to have access. I'll do it that way.

@dkuppitz do you have a similar issue with this?

https://github.com/apache/tinkerpop/pull/882/files#diff-36e52ac0c49a08a7f3e6ee54b60a3745R73
[jira] [Commented] (TINKERPOP-1967) Add a connectedComponent() step
[ https://issues.apache.org/jira/browse/TINKERPOP-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16561210#comment-16561210 ]

ASF GitHub Bot commented on TINKERPOP-1967:
---
Github user dkuppitz commented on a diff in the pull request:

    https://github.com/apache/tinkerpop/pull/897#discussion_r205981525

    --- Diff: docs/src/recipes/connected-components.asciidoc ---
    @@ -35,46 +54,92 @@ g.addV().property(id, "A").as("a").
     addE("link").from("d").to("e").iterate()
    -One way to detect the various subgraphs would be to do something like this:
    + Small graph traversals
    +Connected components in a small graph can be determined with either an OLTP traversal or the OLAP
    +`connectedComponent()`-step. The `connectedComponent()`-step is available as of TinkerPop 3.4.0 and is
    +described in more detail in the
    +link:http://tinkerpop.apache.org/docs/x.y.z/reference/#connectedcomponent-step[Reference Documentation].
    +The traversal looks like:
     [gremlin-groovy,existing]
    -g.V().emit(cyclicPath().or().not(both())).repeat(both()).until(cyclicPath()). <1>
    - path().aggregate("p"). <2>
    - unfold().dedup(). <3>
    - map(__.as("v").select("p").unfold(). <4>
    - filter(unfold().where(eq("v"))).
    - unfold().dedup().order().by(id).fold()).
    - dedup() <5>
    +g.withComputer().V().connectedComponent().
    +group().by('gremlin.connectedComponentVertexProgram.component').
    --- End diff --

I think you should use the constant `ConnectedComponentVertexProgram.COMPONENT`.
[jira] [Commented] (TINKERPOP-1967) Add a connectedComponent() step
[ https://issues.apache.org/jira/browse/TINKERPOP-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560678#comment-16560678 ]

ASF GitHub Bot commented on TINKERPOP-1967:
---
Github user FlorianHockmann commented on the issue:

    https://github.com/apache/tinkerpop/pull/897

Shouldn't we also add the `ConnectedComponent` class to the GLVs so users can easily do something like this in other languages:

```java
g.V().hasLabel('person').
  connectedComponent().
    with(ConnectedComponent.propertyName, 'component').
    with(ConnectedComponent.edges, outE('knows'))
```

?
[jira] [Commented] (TINKERPOP-1967) Add a connectedComponent() step
[ https://issues.apache.org/jira/browse/TINKERPOP-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560291#comment-16560291 ]

ASF GitHub Bot commented on TINKERPOP-1967:
---
GitHub user spmallette opened a pull request:

    https://github.com/apache/tinkerpop/pull/897

    TINKERPOP-1967 connectedComponent()

    https://issues.apache.org/jira/browse/TINKERPOP-1967

    Adds a `connectedComponent()` step. Was previously planning to hold this and #882 until i'd explored TINKERPOP-1991 in more detail, but I don't think adding that for 3.4.0 is reasonable. I think we cover a lot of the common problems/questions that we deal with day-to-day with these two algorithms, so while we sorta extend the number of steps in the "core" space, given that a separate "algorithm" space was determined after both of these steps were done, I think we can just include these for now and then work toward the "algorithm" space later. In the meantime we can look to avoid adding new "algorithm" steps to "core".

    All tests pass with `docker/build.sh -t -n -i`

    VOTE +1

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/tinkerpop TINKERPOP-1967

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tinkerpop/pull/897.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #897
[jira] [Commented] (TINKERPOP-1967) Add a connectedComponent() step
[ https://issues.apache.org/jira/browse/TINKERPOP-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16513749#comment-16513749 ] ASF GitHub Bot commented on TINKERPOP-1967: --- Github user asfgit closed the pull request at: https://github.com/apache/tinkerpop/pull/877
[jira] [Commented] (TINKERPOP-1967) Add a connectedComponent() step
[ https://issues.apache.org/jira/browse/TINKERPOP-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511602#comment-16511602 ] ASF GitHub Bot commented on TINKERPOP-1967: --- Github user vtslab closed the pull request at: https://github.com/apache/tinkerpop/pull/875
[jira] [Commented] (TINKERPOP-1967) Add a connectedComponent() step
[ https://issues.apache.org/jira/browse/TINKERPOP-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511601#comment-16511601 ] ASF GitHub Bot commented on TINKERPOP-1967: --- Github user vtslab commented on the issue: https://github.com/apache/tinkerpop/pull/875 PR is malformed due to problems in master. I will close this one and prepare a new PR with the same commits.
[jira] [Commented] (TINKERPOP-1967) Add a connectedComponent() step
[ https://issues.apache.org/jira/browse/TINKERPOP-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511646#comment-16511646 ]

ASF GitHub Bot commented on TINKERPOP-1967:
---
GitHub user vtslab opened a pull request:

    https://github.com/apache/tinkerpop/pull/877

    Tinkerpop 1967 Add a connectedComponent() step - vtslab contribution2

    This merges the new OLAP step into a corrected version of the old recipe. I did not adapt the release-update file, which is no longer consistent with the situation (leave that to you). I made a comment in the TINKERPOP-1967 branch proper about the gremlin-console import section.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vtslab/incubator-tinkerpop TINKERPOP-1967-vtslab2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tinkerpop/pull/877.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #877

commit 28d4b02660f3f5c682538acaf4768218d9a8b40a
Author: HadoopMarc
Date: 2018-05-21T12:03:54Z

    Merged vtslab recipe for connected components

commit b087822708707013f7f0cd3b5abaf6d0f574a72e
Author: HadoopMarc
Date: 2018-06-10T13:17:17Z

    Extended the connected-components recipe
[jira] [Commented] (TINKERPOP-1967) Add a connectedComponent() step
[ https://issues.apache.org/jira/browse/TINKERPOP-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510340#comment-16510340 ] stephen mallette commented on TINKERPOP-1967: - I've resolved the problem on {{master}} and then rebased {{TINKERPOP-1967}} on that. Then I fixed some problems in {{TINKERPOP-1967}} and force pushed those. I generated docs on both branches and both work locally (didn't try docker though). sorry for the trouble - you should be good to continue now.
[jira] [Commented] (TINKERPOP-1967) Add a connectedComponent() step
[ https://issues.apache.org/jira/browse/TINKERPOP-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510014#comment-16510014 ]

Marc de Lignie commented on TINKERPOP-1967:
---
[~spmallette] The recent with() step commit really makes the build of traversal.asciidoc fail, both in master and in TINKERPOP-1967. Tag 3.3.3 builds fine with bin/process-docs.sh in my environment. So, I will wait for resolution before I rebase my commits for TINKERPOP-1967.

I also tried {{docker/build.sh -d}}, but that failed on gremlin-python irrespective of the TinkerPop version:

{{[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.8:run (setup-py-env) on project gremlin-python: An Ant BuildException has occured: Execute failed: java.io.IOException: Cannot run program "virtualenv" (in directory "/usr/src/tinkerpop/gremlin-python/target/python2"): error=2, No such file or directory}}
{{[ERROR] around Ant part .. @ 13:107 in /usr/src/tinkerpop/gremlin-python/target/antrun/build-main.xml}}
[jira] [Commented] (TINKERPOP-1967) Add a connectedComponent() step
[ https://issues.apache.org/jira/browse/TINKERPOP-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508703#comment-16508703 ] Marc de Lignie commented on TINKERPOP-1967: --- [~spmallette] Merging my commits on current 1967 is painless, however, docs generation from console now fails on the ref/traversal doc in the connectedComponent part (which I did not change). Have to look into this some other time before I create the updated PR.
[jira] [Commented] (TINKERPOP-1967) Add a connectedComponent() step
[ https://issues.apache.org/jira/browse/TINKERPOP-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508635#comment-16508635 ] stephen mallette commented on TINKERPOP-1967: - hmm - 06fdb7d1ac319c7f8afc115a54aa535e8078039b is the hash i have locally as what is current and that matches what's on TINKERPOP-1967 now. I don't remember pushing to that recently. Well, hopefully you can fix up the pull request easily against this "new" history and we can get it merged. Please let me know if you have any doubts along the way.
[jira] [Commented] (TINKERPOP-1967) Add a connectedComponent() step
[ https://issues.apache.org/jira/browse/TINKERPOP-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508618#comment-16508618 ] Marc de Lignie commented on TINKERPOP-1967: --- [~spmallette] I gave it a look. When I pulled your branch the connectedComponent commit was dated at May 17 with hash f91d3d9d21f1e921a071200668b0fd0e33b321a8, see: [https://github.com/vtslab/incubator-tinkerpop/commits/TINKERPOP-1967-vtslab] When I look at the current TINKERPOP-1967 branch, the connectedComponent commit is dated around May 6 with hash fc0b2dc5c2f0ecce9a1690df36dc9b061dad5d1b, see: [https://github.com/apache/tinkerpop/commits/TINKERPOP-1967] Did you change history? Anyway, I guess you want the PR to be based on the actual history :)
[jira] [Commented] (TINKERPOP-1967) Add a connectedComponent() step
[ https://issues.apache.org/jira/browse/TINKERPOP-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507961#comment-16507961 ] ASF GitHub Bot commented on TINKERPOP-1967: --- Github user spmallette commented on the issue: https://github.com/apache/tinkerpop/pull/875 @vtslab is there something amiss with this PR? any idea why there are so many conflicted files?
[jira] [Commented] (TINKERPOP-1967) Add a connectedComponent() step
[ https://issues.apache.org/jira/browse/TINKERPOP-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507385#comment-16507385 ]

ASF GitHub Bot commented on TINKERPOP-1967:
---
GitHub user vtslab opened a pull request:

    https://github.com/apache/tinkerpop/pull/875

    TINKERPOP-1967 Add a connectedComponent() step - vtslab contribution

    This merges the new OLAP step into a corrected version of the old recipe. I did not adapt the release-update file, which is no longer consistent with the situation (leave that to you). I made a comment in the TINKERPOP-1967 branch proper about the gremlin-console import section.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vtslab/incubator-tinkerpop TINKERPOP-1967-vtslab

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tinkerpop/pull/875.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #875

commit f91d3d9d21f1e921a071200668b0fd0e33b321a8
Author: Stephen Mallette
Date: 2018-05-17T18:44:01Z

    TINKERPOP-1967 Added connectedComponent() step

    Deprecated the recipe for "Connected Components" but left the old content present as I felt it had educational value.

commit 552cc228f4b86ddf76c250667466291ece2fc705
Author: HadoopMarc
Date: 2018-05-21T12:03:54Z

    Merged vtslab recipe for connected components

commit 657ffd423df093909fc861a079280e9e2b94f100
Author: HadoopMarc
Date: 2018-06-10T13:17:17Z

    Extended the connected-components recipe
[jira] [Commented] (TINKERPOP-1967) Add a connectedComponent() step
[ https://issues.apache.org/jira/browse/TINKERPOP-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16505204#comment-16505204 ]

stephen mallette commented on TINKERPOP-1967:
---

I guess you just need to set {{VertexProgramStrategy.WORKERS}} in the configuration like in {{TinkerGraphComputerProvider}} in the tests:

{code}
graph.traversal().withStrategies(VertexProgramStrategy.create(new MapConfiguration(new HashMap() {{
    // number of workers: here a random value between 1 and the number of available processors
    put(VertexProgramStrategy.WORKERS, RANDOM.nextInt(Runtime.getRuntime().availableProcessors()) + 1);
    // which GraphComputer implementation to use
    put(VertexProgramStrategy.GRAPH_COMPUTER, RANDOM.nextBoolean() ?
            GraphComputer.class.getCanonicalName() :
            TinkerGraphComputer.class.getCanonicalName());
}})));
{code}

> I saw that the ClusterCountMapReduce and ClusterPopulationMapReduce classes
> are not usable for the ConnectedComponentVertexProgram.

tbh, i wasn't sure that i saw the point of making similar classes as i was mostly thinking about {{connectedComponent()}} step usage. Do you think those should be made available for some reason?
[jira] [Commented] (TINKERPOP-1967) Add a connectedComponent() step
[ https://issues.apache.org/jira/browse/TINKERPOP-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16505181#comment-16505181 ]

Marc de Lignie commented on TINKERPOP-1967:
---

[~spmallette] The code for the VertexProgramStrategy suggests that you can configure the number of workers used by the TinkerGraphComputer when calling one of the xyzVertexProgramSteps. Are the graph properties for this documented anywhere?

Also, I saw that the ClusterCountMapReduce and ClusterPopulationMapReduce classes are not usable for the ConnectedComponentVertexProgram. Was that a deliberate choice (given that these classes predate the VertexProgram steps)? With the ComputerProgram builder it was easy to configure the number of workers.
[jira] [Commented] (TINKERPOP-1967) Add a connectedComponent() step
[ https://issues.apache.org/jira/browse/TINKERPOP-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16503837#comment-16503837 ]

Marc de Lignie commented on TINKERPOP-1967:
---

The comparison between the OLTP query and the connectedComponent step was more productive. Below you see figures for graphs with random components of fixed size with an edge/vertex ratio of 6, for which the two algorithms have runtimes of the same order of magnitude. This means the OLTP query could still be useful for networks with long strings of vertices. So, I will do some more work to produce some comparison graphs, add them to my work and rework them into a PR against TINKERPOP-1967.

OLTP: Duration: 0.064 Component sizes: [100, 100, 100, 100, 100, 100, 100, 100, 100, 100]
Step: Duration: 0.052 Component sizes: [100, 100, 100, 100, 100, 100, 100, 100, 100, 100]
OLTP: Duration: 0.134 Component sizes: [200, 200, 200, 200, 200, 200, 200, 200, 200, 200]
Step: Duration: 0.096 Component sizes: [200, 200, 200, 200, 200, 200, 200, 200, 200, 200]
OLTP: Duration: 0.269 Component sizes: [400, 400, 400, 400, 400, 400, 400, 400, 400, 400]
Step: Duration: 0.129 Component sizes: [400, 400, 400, 400, 400, 400, 400, 400, 400, 400]
OLTP: Duration: 0.585 Component sizes: [800, 800, 800, 800, 800, 800, 800, 800, 800, 800]
Step: Duration: 0.252 Component sizes: [800, 800, 800, 800, 800, 800, 800, 800, 800, 800]
OLTP: Duration: 1.183 Component sizes: [1600, 1600, 1600, 1600, 1600, 1600, 1600, 1600, 1600, 1600]
Step: Duration: 0.559 Component sizes: [1600, 1600, 1600, 1600, 1600, 1600, 1600, 1600, 1600, 1600]
OLTP: Duration: 2.375 Component sizes: [3200, 3200, 3200, 3200, 3200, 3200, 3200, 3200, 3200, 3200]
Step: Duration: 1.125 Component sizes: [3200, 3200, 3200, 3200, 3200, 3200, 3200, 3200, 3200, 3200]
OLTP: Duration: 4.986 Component sizes: [6400, 6400, 6400, 6400, 6400, 6400, 6400, 6400, 6400, 6400]
Step: Duration: 2.326 Component sizes: [6400, 6400, 6400, 6400, 6400, 6400, 6400, 6400, 6400, 6400]
OLTP: Duration: 10.507 Component sizes: [12800, 12800, 12800, 12800, 12800, 12800, 12800, 12800, 12800, 12800]
Step: Duration: 6.308 Component sizes: [12800, 12800, 12800, 12800, 12800, 12800, 12800, 12800, 12800, 12800]
OLTP: Duration: 22.265 Component sizes: [25600, 25600, 25600, 25600, 25600, 25600, 25600, 25600, 25600, 25600]
Step: Duration: 16.212 Component sizes: [25600, 25600, 25600, 25600, 25600, 25600, 25600, 25600, 25600, 25600]
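To read the OLTP and Step figures in this comment side by side, the ratio of the two durations can be computed per component size. The sketch below is only an illustration: the durations are copied from the comment, while the class name {{SpeedupTable}} and the {{ratio}} helper are hypothetical names introduced here and are not part of the benchmark that produced the figures.

```java
import java.util.Locale;

public class SpeedupTable {
    // Size of each of the 10 components in a run, as listed in the comment.
    static final int[] SIZES = {100, 200, 400, 800, 1600, 3200, 6400, 12800, 25600};
    // Durations reported for the OLTP query and for the connectedComponent() step.
    static final double[] OLTP = {0.064, 0.134, 0.269, 0.585, 1.183, 2.375, 4.986, 10.507, 22.265};
    static final double[] STEP = {0.052, 0.096, 0.129, 0.252, 0.559, 1.125, 2.326, 6.308, 16.212};

    // How many times longer the OLTP query took than the step for run i.
    static double ratio(int i) {
        return OLTP[i] / STEP[i];
    }

    public static void main(String[] args) {
        System.out.println("  size     OLTP     Step  OLTP/Step");
        for (int i = 0; i < SIZES.length; i++) {
            System.out.printf(Locale.ROOT, "%6d  %7.3f  %7.3f  %9.2f%n",
                    SIZES[i], OLTP[i], STEP[i], ratio(i));
        }
    }
}
```

For every size the ratio stays between 1 and 2.5, which is consistent with the "same order of magnitude" reading of the figures.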
[jira] [Commented] (TINKERPOP-1967) Add a connectedComponent() step
[ https://issues.apache.org/jira/browse/TINKERPOP-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502391#comment-16502391 ]

Marc de Lignie commented on TINKERPOP-1967:
---

Quite busy at the moment, with other things... I put in a day or so with the Friendster graph but was not lucky. A single 48 GB machine really could not handle it (endless spilling for lack of memory). Hortonworks (the software on the cluster I can access) enriched their hadoop-2.7 with a hadoop-2.9 feature that made it incompatible with TinkerPop's vanilla hadoop, and an attempt to build TinkerPop with the HDP artifacts proved unsuccessful.

I quit the Friendster graph for now and will do the hopefully easier comparison between the OLTP query and the connectedComponent() step on a few artificial graphs of increasing size. If the OLTP query still seems useful, it will not be much work to contribute the documentation I prepared. If not, my contribution does not add much value. I will try to find some time.
[jira] [Commented] (TINKERPOP-1967) Add a connectedComponent() step
[ https://issues.apache.org/jira/browse/TINKERPOP-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502193#comment-16502193 ]

stephen mallette commented on TINKERPOP-1967:
---

[~HadoopMarc] when TINKERPOP-1975 merges i will likely issue a PR for this issue. Do you think you will have some time to get that documentation update PR submitted and do some testing as we discussed in the earlier comments?
[jira] [Commented] (TINKERPOP-1967) Add a connectedComponent() step
[ https://issues.apache.org/jira/browse/TINKERPOP-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482932#comment-16482932 ]

stephen mallette commented on TINKERPOP-1967:
---

> I am not so sure we can simply dismiss the OLTP traversal as educational
> only after the improvements Robert Dale and I made

Your recipe work looks good to me - want to submit a PR to my branch with your content updates?

> Also for your code it would be worthwhile to run a one-off test with the
> friendster graph

Agreed - if you are in a position to test it at scale, it would be great to hear your results.
[jira] [Commented] (TINKERPOP-1967) Add a connectedComponent() step
[ https://issues.apache.org/jira/browse/TINKERPOP-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482925#comment-16482925 ]

Marc de Lignie commented on TINKERPOP-1967:
---

I gave your feature branch a quick scan:
* Your {{ConnectedComponentVertexProgram}} is clearly much more developed, with all the traversal steps present.
* I am not so sure we can simply dismiss the OLTP traversal as educational only after the improvements Robert Dale and I made (see the code in the recipe of my feature branch). I would like to run some performance tests on in-memory TinkerGraphs to be sure. If there is still use for the OLTP query, my take on the recipe would provide useful text (apart from the external references and the explanation about weakly connected components).
* Also for your code it would be worthwhile to run a one-off test with the Friendster graph (I plan to do this, but no problem if you chime in on this).
* It is a good idea indeed to mention that the vertex program will be available as of 3.4.0, because users of older versions may yet land on the latest reference docs.
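For readers unfamiliar with the term, weakly connected components ignore edge direction: two vertices belong to the same component if some path connects them when every edge is treated as undirected. A common way to compute them in a vertex-centric, iterative style is minimum-label propagation, sketched below in plain Java. This is a generic illustration under stated assumptions, not the code of {{ConnectedComponentVertexProgram}}, of the vtslab branch, or of the OLTP recipe; the class {{ComponentSketch}} and its {{components}} method are hypothetical names, and every edge endpoint is assumed to appear in the vertex list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ComponentSketch {

    /**
     * Returns a map from vertex id to component id, where the component id is
     * the smallest vertex id in the component. Edge direction is ignored.
     * Assumes every endpoint in edges is contained in vertices.
     */
    static Map<Integer, Integer> components(List<Integer> vertices, int[][] edges) {
        // Build an undirected adjacency list.
        Map<Integer, List<Integer>> adjacency = new HashMap<>();
        for (int v : vertices) adjacency.put(v, new ArrayList<>());
        for (int[] e : edges) {
            adjacency.get(e[0]).add(e[1]);
            adjacency.get(e[1]).add(e[0]);
        }

        // Every vertex starts in its own component.
        Map<Integer, Integer> label = new HashMap<>();
        for (int v : vertices) label.put(v, v);

        // Each round, a vertex adopts the smallest label seen among its
        // neighbours in the previous round; stop when nothing changes.
        boolean changed = true;
        while (changed) {
            changed = false;
            Map<Integer, Integer> next = new HashMap<>(label);
            for (int v : vertices) {
                int best = label.get(v);
                for (int n : adjacency.get(v)) best = Math.min(best, label.get(n));
                if (best < label.get(v)) {
                    next.put(v, best);
                    changed = true;
                }
            }
            label = next;
        }
        return label;
    }

    public static void main(String[] args) {
        List<Integer> vertices = Arrays.asList(1, 2, 3, 4, 5, 6);
        int[][] edges = {{1, 2}, {2, 3}, {4, 5}};
        System.out.println(components(vertices, edges));
    }
}
```

The number of rounds grows with the length of the longest chain inside a component, which is one reason a graph's shape (for example, long strings of vertices) can affect how an iterative approach of this kind compares with a traversal-based one.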
[jira] [Commented] (TINKERPOP-1967) Add a connectedComponent() step
[ https://issues.apache.org/jira/browse/TINKERPOP-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482489#comment-16482489 ]

stephen mallette commented on TINKERPOP-1967:
---

yeah - i built a {{ConnectedComponentVertexProgram}} and related step:

https://github.com/apache/tinkerpop/tree/TINKERPOP-1967

Not sure how much our work overlaps at this point. Maybe take a look at my branch and see what you think?
[jira] [Commented] (TINKERPOP-1967) Add a connectedComponent() step
[ https://issues.apache.org/jira/browse/TINKERPOP-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482474#comment-16482474 ]

Marc de Lignie commented on TINKERPOP-1967:
---

I pushed my work on TINKERPOP-1852 to [https://github.com/vtslab/incubator-tinkerpop/tree/components]. It includes the WeakComponentsVertexProgram + tests as well as the improved recipe for connected components with ideas from the gremlin user list and the new vertex program. I got stuck on also wanting to include an algo for storage-backed graphs like JanusGraph, but this new Jira issue prompted me to no longer pursue this.

The largest remaining issue is to have the vertex program tested on the Friendster graph (with a known published outcome). Small issues are: 1) not passing the revapi check and 2) not yet having updated the vertex program for TINKERPOP-1862.

@Stephen, are you, as assignee, also working on this?