[jira] [Commented] (TINKERPOP-1585) OLAP dedup over non elements
[ https://issues.apache.org/jira/browse/TINKERPOP-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806270#comment-15806270 ]
ASF GitHub Bot commented on TINKERPOP-1585:
---
Github user dkuppitz commented on the issue: https://github.com/apache/tinkerpop/pull/524
`docker/build.sh -t -i` succeeded. VOTE: +1
> OLAP dedup over non elements
>
> Key: TINKERPOP-1585
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1585
> Project: TinkerPop
> Issue Type: Bug
> Components: hadoop, process
> Affects Versions: 3.2.3
> Reporter: Daniel Kuppitz
> Assignee: Marko A. Rodriguez
>
> OLAP {{dedup()}} is highly inefficient when it's fed with non elements.
> In a customer project a query similar to the following returned a result in slightly more than 6 seconds:
> {noformat}
> persistedRDD.
>   V().hasLabel("label1","label2").
>   inE("edgeLabel1","edgeLabel2").outV().
>   id().count()
> {noformat}
> The same query with {{dedup()}} added:
> {noformat}
> persistedRDD.
>   V().hasLabel("label1","label2").
>   inE("edgeLabel1","edgeLabel2").outV().
>   id().dedup().count()
> {noformat}
> ...took more than 120 seconds.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TINKERPOP-1443) Use an API checker during build
[ https://issues.apache.org/jira/browse/TINKERPOP-1443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806023#comment-15806023 ]
ASF GitHub Bot commented on TINKERPOP-1443:
---
Github user metlos commented on the issue: https://github.com/apache/tinkerpop/pull/494
Hmm, Maven Central seems to be taking its time - it still doesn't have the latest revapi-java version available (which is why CI failed for the latest commit). But anyway, I've updated the versions to the latest, so you can move on to the internal branch. Let's hope Maven Central gets synced soon. Thanks!
> Use an API checker during build
>
> Key: TINKERPOP-1443
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1443
> Project: TinkerPop
> Issue Type: Improvement
> Components: build-release
> Affects Versions: 3.2.2
> Reporter: Lukas Krejci
>
> TinkerPop 3.2.2 changed the signature of the method {{GraphTraversal.hasLabel}} from {{(String...)}} to {{(String, String...)}}.
> While this is certainly an improvement, it is both a source- and a binary-incompatible change.
> I.e. even if every usage of {{hasLabel}} had at least one parameter in the user code, none of those calls will work until all the user code is recompiled using TinkerPop 3.2.2.
> I don't know the versioning policy of TinkerPop, but changes like the above in a micro/patch release are generally unexpected.
> Please consider API checkers like http://revapi.org to warn about such incompatible API changes...
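The incompatibility described in the issue can be reproduced without TinkerPop. The sketch below is only an illustration: `OldApi`, `NewApi` and `SignatureCheck` are hypothetical stand-ins, not TinkerPop classes. It uses reflection to show that a method declared as `(String...)` and one declared as `(String, String...)` have different parameter lists, which is why a caller compiled against the old signature would no longer find the method it was linked to, even though the same source call still compiles.

```java
// Hypothetical stand-ins for the API before and after the change (not TinkerPop code).
final class OldApi {
    static String hasLabel(String... labels) { return "varargs only"; }
}

final class NewApi {
    static String hasLabel(String first, String... rest) { return "first + varargs"; }
}

final class SignatureCheck {
    /** True if {@code type} declares hasLabel with exactly the given parameter types. */
    static boolean declares(Class<?> type, Class<?>... params) {
        try {
            type.getDeclaredMethod("hasLabel", params);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The same source text compiles against both versions...
        OldApi.hasLabel("a", "b");
        NewApi.hasLabel("a", "b");

        // ...but compiled callers reference the method by its exact parameter list.
        System.out.println(declares(OldApi.class, String[].class));               // true
        System.out.println(declares(NewApi.class, String[].class));               // false
        System.out.println(declares(NewApi.class, String.class, String[].class)); // true
    }
}
```

On the source side, calls such as `hasLabel()` with no arguments, or `hasLabel(someStringArray)`, compile against the old shape but not the new one.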
[GitHub] tinkerpop issue #494: TINKERPOP-1443 - Introduce API check into the build
Github user metlos commented on the issue: https://github.com/apache/tinkerpop/pull/494 Hmm, maven central seems to be taking its time - it still doesn't have the latest revapi-java version available (which is why CI failed for the latest commit). But anyway, I've updated the versions to the latest, so you can move on to the internal branch. Let's hope maven central gets synced soon. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (TINKERPOP-1585) OLAP dedup over non elements
[ https://issues.apache.org/jira/browse/TINKERPOP-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15805596#comment-15805596 ]
ASF GitHub Bot commented on TINKERPOP-1585:
---
Github user okram commented on a diff in the pull request: https://github.com/apache/tinkerpop/pull/524#discussion_r95010996
--- Diff: spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/process/computer/traversal/strategy/optimization/SparkInterceptorStrategyTest.java ---
@@ -142,7 +142,7 @@ public void shouldSuccessfullyEvaluateInterceptedTraversals() throws Exception {
     test(6l, g.V().out().values("name").count());
     test(2l, g.V().out("knows").values("name").count());
     test(3l, g.V().in().has("name", "marko").count());
-    test(6l, g.V().repeat(__.dedup()).times(2).count());
+    test(0l, g.V().repeat(__.dedup()).times(2).count());
--- End diff --
As discussed in IM. This is actually a bug in `RepeatUnrollStrategy` that was introduced in 3.2, but doesn't exist in 3.1. Added a test case to `DedupTest` and both OLTP and OLAP now behave as expected.
[GitHub] tinkerpop pull request #520: TINKERPOP-1130 IO Testing
Github user asfgit closed the pull request at: https://github.com/apache/tinkerpop/pull/520
[jira] [Closed] (TINKERPOP-1130) Each release should store Kryo/GraphSON/GraphML versions to ensure future compatibility
[ https://issues.apache.org/jira/browse/TINKERPOP-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stephen mallette closed TINKERPOP-1130.
---
Resolution: Done
Assignee: stephen mallette
Fix Version/s: 3.3.0
> Each release should store Kryo/GraphSON/GraphML versions to ensure future compatibility
>
> Key: TINKERPOP-1130
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1130
> Project: TinkerPop
> Issue Type: Improvement
> Components: io, test-suite
> Affects Versions: 3.1.1-incubating
> Reporter: Marko A. Rodriguez
> Assignee: stephen mallette
> Labels: breaking
> Fix For: 3.3.0
>
> I think we should make a new toy data set that has all the graph structure features in it -- vertices, edges, vertex properties, multi-properties, meta-properties, graph variables, different edge labels with different property keys, etc. etc.
> The graph doesn't have to be big, it just needs to cover all the features.
> Next, we should then stamp out a version of that file at every release:
> {code}
> graph-test-x.y.z.xml
> graph-test-x.y.z.kryo
> graph-test-x.y.z.json
> graph-test-x.y.z-typed.json
> {code}
> Then we should have a test case that verifies that the current SNAPSHOT {{GryoReader}}, {{GraphSONReader}}, {{GraphMLReader}}, etc. can still read those files. If they can't, then we have introduced a change in our serialization format.
[jira] [Commented] (TINKERPOP-1585) OLAP dedup over non elements
[ https://issues.apache.org/jira/browse/TINKERPOP-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15805510#comment-15805510 ]
ASF GitHub Bot commented on TINKERPOP-1585:
---
Github user dkuppitz commented on a diff in the pull request: https://github.com/apache/tinkerpop/pull/524#discussion_r95006502
--- Diff: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/filter/DedupGlobalStep.java ---
@@ -89,6 +92,17 @@ public ElementRequirement getMaxRequirement() {
     @Override
     protected Traverser.Admin processNextStart() {
+        if (null != this.barrier) {
+            this.barrierIterator = this.barrier.entrySet().iterator();
+            this.barrier = null;
+        }
+        while (this.barrierIterator != null && this.barrierIterator.hasNext()) {
+            if (null == this.barrierIterator)
--- End diff --
I didn't run it. Just thought we can get rid of the multiple `null` checks.
[jira] [Commented] (TINKERPOP-1585) OLAP dedup over non elements
[ https://issues.apache.org/jira/browse/TINKERPOP-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15805327#comment-15805327 ]
ASF GitHub Bot commented on TINKERPOP-1585:
---
Github user okram commented on a diff in the pull request: https://github.com/apache/tinkerpop/pull/524#discussion_r95001231
--- Diff: spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/process/computer/traversal/strategy/optimization/SparkInterceptorStrategyTest.java ---
@@ -142,7 +142,7 @@ public void shouldSuccessfullyEvaluateInterceptedTraversals() throws Exception {
     test(6l, g.V().out().values("name").count());
     test(2l, g.V().out("knows").values("name").count());
     test(3l, g.V().in().has("name", "marko").count());
-    test(6l, g.V().repeat(__.dedup()).times(2).count());
+    test(0l, g.V().repeat(__.dedup()).times(2).count());
--- End diff --
This is crazy. The right answer is `0`. Think about it. It goes through once, fine, all 6 vertices. The second time -- already seen them! Filter. Thus, 0. However, I just checked OLTP and it returns 6. Wondering what is "right".
```
gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V().repeat(dedup()).times(2)
==>v[1]
==>v[2]
==>v[3]
==>v[4]
==>v[5]
==>v[6]
gremlin> g.withComputer().V().repeat(dedup()).times(2)
gremlin>
```
Whatever is decided as the correct answer, we should definitely put this into `RepeatTest`. I just randomly had this query in a Spark test.
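The two results in this comment (0 versus 6) can be modeled outside TinkerPop. The following is a rough sketch, not TinkerPop code: `RepeatDedupSketch` and its methods are hypothetical names, and the idea that the count of 6 comes from each loop iteration getting its own dedup memory is an assumption made for illustration. A later comment in this thread attributes the OLTP result to a bug in `RepeatUnrollStrategy`, without spelling out the mechanism.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of repeat(dedup()).times(n); not TinkerPop code.
final class RepeatDedupSketch {
    /** Passes only items not yet in {@code seen}, recording each one as it goes. */
    static List<Integer> dedup(List<Integer> in, Set<Integer> seen) {
        List<Integer> out = new ArrayList<>();
        for (Integer item : in) {
            if (seen.add(item)) {
                out.add(item);
            }
        }
        return out;
    }

    /** One dedup step reused by every iteration: its memory carries over. */
    static int sharedState(List<Integer> vertices, int times) {
        Set<Integer> seen = new HashSet<>();
        List<Integer> current = vertices;
        for (int i = 0; i < times; i++) {
            current = dedup(current, seen);
        }
        return current.size();
    }

    /** Assumption: a fresh dedup memory per iteration, as if the loop body were copied. */
    static int freshStatePerIteration(List<Integer> vertices, int times) {
        List<Integer> current = vertices;
        for (int i = 0; i < times; i++) {
            current = dedup(current, new HashSet<>());
        }
        return current.size();
    }

    public static void main(String[] args) {
        List<Integer> vertices = Arrays.asList(1, 2, 3, 4, 5, 6);
        System.out.println(sharedState(vertices, 2));            // 0
        System.out.println(freshStatePerIteration(vertices, 2)); // 6
    }
}
```

With shared memory, the second pass filters everything because all six vertices were recorded during the first pass, which matches the reasoning given for `0`.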
[GitHub] tinkerpop pull request #524: TINKERPOP-1585 & TINKERPOP-1590: DedupGlobalSte...
Github user okram commented on a diff in the pull request: https://github.com/apache/tinkerpop/pull/524#discussion_r95000938
--- Diff: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/filter/DedupGlobalStep.java ---
@@ -89,6 +92,17 @@ public ElementRequirement getMaxRequirement() {
     @Override
     protected Traverser.Admin processNextStart() {
+        if (null != this.barrier) {
+            this.barrierIterator = this.barrier.entrySet().iterator();
+            this.barrier = null;
+        }
+        while (this.barrierIterator != null && this.barrierIterator.hasNext()) {
+            if (null == this.barrierIterator)
--- End diff --
Your code fails. Did you run it? ... The problem is that `this.barrier` is null'd but the `barrierIterator` still exists and you still need to fetch results from it.
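The objection can be illustrated with a small stand-alone model. This is a sketch under assumptions, not the `DedupGlobalStep` code: `BarrierDrainSketch`, its fields and `drain` are hypothetical, and it assumes each call hands back at most one pending entry. The proposed simplification is truncated in this archive, but its visible part places the drain loop inside the `if (null != this.barrier)` block; the sketch compares that shape with the one in the diff.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of the barrier hand-off; not the DedupGlobalStep implementation.
final class BarrierDrainSketch {
    private List<String> barrier;              // pending batch, handed over once
    private Iterator<String> barrierIterator;  // outlives the barrier reference
    private final boolean drainInsideIf;

    BarrierDrainSketch(List<String> batch, boolean drainInsideIf) {
        this.barrier = batch;
        this.drainInsideIf = drainInsideIf;
    }

    /** Returns the next pending entry, or null when this call has nothing to emit. */
    String next() {
        if (drainInsideIf) {
            // Assumed shape of the simplification: draining happens only on the hand-off call.
            if (barrier != null) {
                barrierIterator = barrier.iterator();
                barrier = null;
                if (barrierIterator.hasNext()) {
                    return barrierIterator.next();
                }
            }
            return null;
        }
        // Shape from the diff: hand off once, then keep draining on every later call.
        if (barrier != null) {
            barrierIterator = barrier.iterator();
            barrier = null;
        }
        if (barrierIterator != null && barrierIterator.hasNext()) {
            return barrierIterator.next();
        }
        return null;
    }

    /** Calls next() until it yields null and collects what came out. */
    static List<String> drain(boolean drainInsideIf) {
        BarrierDrainSketch step = new BarrierDrainSketch(Arrays.asList("a", "b", "c"), drainInsideIf);
        List<String> out = new ArrayList<>();
        for (String s = step.next(); s != null; s = step.next()) {
            out.add(s);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(drain(true));   // [a]
        System.out.println(drain(false));  // [a, b, c]
    }
}
```

In the nested variant the second call sees `barrier == null` and never consults the iterator again, so the remaining entries are lost.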
[GitHub] tinkerpop issue #520: TINKERPOP-1130 IO Testing
Github user dkuppitz commented on the issue: https://github.com/apache/tinkerpop/pull/520
`docker/build.sh -t -i` succeeded. VOTE: +1
[jira] [Commented] (TINKERPOP-1585) OLAP dedup over non elements
[ https://issues.apache.org/jira/browse/TINKERPOP-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15804224#comment-15804224 ]
ASF GitHub Bot commented on TINKERPOP-1585:
---
Github user dkuppitz commented on a diff in the pull request: https://github.com/apache/tinkerpop/pull/524#discussion_r94925652
--- Diff: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/filter/DedupGlobalStep.java ---
@@ -89,6 +92,17 @@ public ElementRequirement getMaxRequirement() {
     @Override
     protected Traverser.Admin processNextStart() {
+        if (null != this.barrier) {
+            this.barrierIterator = this.barrier.entrySet().iterator();
+            this.barrier = null;
+        }
+        while (this.barrierIterator != null && this.barrierIterator.hasNext()) {
+            if (null == this.barrierIterator)
--- End diff --
`this.barrierIterator` can never be null within the `while()` loop. Unless I overlooked something fundamental, `processNextStart` can be simplified to:
```
protected Traverser.Admin processNextStart() {
    if (null != this.barrier) {
        this.barrierIterator = this.barrier.entrySet().iterator();
        this.barrier = null;
        while (this.barrierIterator.hasNext()) {
            final Map.Entry
```
[jira] [Commented] (TINKERPOP-1585) OLAP dedup over non elements
[ https://issues.apache.org/jira/browse/TINKERPOP-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15804225#comment-15804225 ]
ASF GitHub Bot commented on TINKERPOP-1585:
---
Github user dkuppitz commented on a diff in the pull request: https://github.com/apache/tinkerpop/pull/524#discussion_r94926376
--- Diff: spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/process/computer/traversal/strategy/optimization/SparkInterceptorStrategyTest.java ---
@@ -142,7 +142,7 @@ public void shouldSuccessfullyEvaluateInterceptedTraversals() throws Exception {
     test(6l, g.V().out().values("name").count());
     test(2l, g.V().out("knows").values("name").count());
     test(3l, g.V().in().has("name", "marko").count());
-    test(6l, g.V().repeat(__.dedup()).times(2).count());
+    test(0l, g.V().repeat(__.dedup()).times(2).count());
--- End diff --
Why did that change? `0` as the expected result looks kinda wrong.