[jira] [Commented] (TINKERPOP-1585) OLAP dedup over non elements

2017-01-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TINKERPOP-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806270#comment-15806270
 ] 

ASF GitHub Bot commented on TINKERPOP-1585:
---

Github user dkuppitz commented on the issue:

https://github.com/apache/tinkerpop/pull/524
  
`docker/build.sh -t -i` succeeded.

VOTE: +1


> OLAP dedup over non elements
> 
>
> Key: TINKERPOP-1585
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1585
> Project: TinkerPop
> Issue Type: Bug
> Components: hadoop, process
> Affects Versions: 3.2.3
> Reporter: Daniel Kuppitz
> Assignee: Marko A. Rodriguez
>
> OLAP {{dedup()}} is highly inefficient when it's fed with non-elements.
> In a customer project, a query similar to the following returned a result in 
> slightly more than 6 seconds:
> {noformat}
> persistedRDD.
>   V().hasLabel("label1","label2").
>   inE("edgeLabel1","edgeLabel2").outV().
>   id().count()
> {noformat}
> The same query with {{dedup()}} added:
> {noformat}
> persistedRDD.
>   V().hasLabel("label1","label2").
>   inE("edgeLabel1","edgeLabel2").outV().
>   id().dedup().count()
> {noformat}
> ...took more than 120 seconds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TINKERPOP-1443) Use an API checker during build

2017-01-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TINKERPOP-1443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806023#comment-15806023
 ] 

ASF GitHub Bot commented on TINKERPOP-1443:
---

Github user metlos commented on the issue:

https://github.com/apache/tinkerpop/pull/494
  
Hmm, Maven Central seems to be taking its time: it still doesn't have the 
latest revapi-java version available (which is why CI failed for the latest 
commit).

But anyway, I've updated the versions to the latest, so you can move on to 
the internal branch. Let's hope Maven Central gets synced soon.

Thanks!


> Use an API checker during build
> ---
>
> Key: TINKERPOP-1443
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1443
> Project: TinkerPop
> Issue Type: Improvement
> Components: build-release
> Affects Versions: 3.2.2
> Reporter: Lukas Krejci
>
> TinkerPop 3.2.2 changed the signature of the method 
> {{GraphTraversal.hasLabel}} from {{(String...)}} to {{(String, String...)}}. 
> While this is certainly an improvement, it is both a source- and 
> binary-incompatible change.
> That is, even if every call to {{hasLabel}} in user code passes at least one 
> argument, none of those calls will work until all the user code is 
> recompiled against TinkerPop 3.2.2.
> I don't know TinkerPop's versioning policy, but changes like this in 
> a micro/patch release are generally unexpected.
> Please consider an API checker such as http://revapi.org to warn about such 
> incompatible API changes.
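The binary side of this can be seen with plain reflection. The sketch below uses two hypothetical stand-in classes (not TinkerPop's `GraphTraversal`) whose `hasLabel` signatures mirror the shapes before and after 3.2.2. A compiled call site records the exact parameter types, so bytecode built against `(String...)` looks up a method taking a single `String[]`; after the change that method no longer exists, and the JVM would throw `NoSuchMethodError` at link time even though the source still looks the same.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

final class HasLabelCompat {

    // Hypothetical stand-in for the signature before 3.2.2: hasLabel(String...)
    static class OldTraversal {
        OldTraversal hasLabel(String... labels) { return this; }
    }

    // Hypothetical stand-in for the 3.2.2 signature: hasLabel(String, String...)
    static class NewTraversal {
        NewTraversal hasLabel(String label, String... otherLabels) { return this; }
    }

    // Parameter types of the hasLabel method declared on `type`. Compiled call
    // sites are linked against these erased types, not against the varargs sugar.
    static Class<?>[] hasLabelParams(Class<?> type) {
        for (Method m : type.getDeclaredMethods()) {
            if (m.getName().equals("hasLabel")) {
                return m.getParameterTypes();
            }
        }
        throw new IllegalArgumentException("no hasLabel on " + type.getName());
    }

    // Compares parameter lists only (return types are ignored in this sketch).
    static boolean sameParameterList(Class<?> before, Class<?> after) {
        return Arrays.equals(hasLabelParams(before), hasLabelParams(after));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(hasLabelParams(OldTraversal.class)));
        // [class [Ljava.lang.String;]
        System.out.println(Arrays.toString(hasLabelParams(NewTraversal.class)));
        // [class java.lang.String, class [Ljava.lang.String;]
        System.out.println(sameParameterList(OldTraversal.class, NewTraversal.class)); // false

        // Source level: forwarding an existing array compiles only against the old shape.
        String[] labels = {"person", "software"};
        new OldTraversal().hasLabel(labels);
        // new NewTraversal().hasLabel(labels);  // does not compile: String[] is not a String
        // new NewTraversal().hasLabel();        // does not compile: at least one label required
    }
}
```

The commented-out calls show the source-incompatible cases: code that forwards a `String[]` or calls `hasLabel()` with no arguments stops compiling, while calls with literal arguments compile again but still need recompilation to link.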



[jira] [Commented] (TINKERPOP-1585) OLAP dedup over non elements

2017-01-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TINKERPOP-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15805596#comment-15805596
 ] 

ASF GitHub Bot commented on TINKERPOP-1585:
---

Github user okram commented on a diff in the pull request:

https://github.com/apache/tinkerpop/pull/524#discussion_r95010996
  
--- Diff: spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/process/computer/traversal/strategy/optimization/SparkInterceptorStrategyTest.java ---
@@ -142,7 +142,7 @@ public void shouldSuccessfullyEvaluateInterceptedTraversals() throws Exception {
         test(6l, g.V().out().values("name").count());
         test(2l, g.V().out("knows").values("name").count());
         test(3l, g.V().in().has("name", "marko").count());
-        test(6l, g.V().repeat(__.dedup()).times(2).count());
+        test(0l, g.V().repeat(__.dedup()).times(2).count());
--- End diff --

As discussed over IM, this is actually a bug in `RepeatUnrollStrategy` that 
was introduced in 3.2 and doesn't exist in 3.1. I added a test case to 
`DedupTest`, and both OLTP and OLAP now behave as expected.
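To see why unrolling changes the answer, here is a plain-Java simulation (not TinkerPop code) of two readings of `repeat(dedup()).times(2)` over the six vertex ids of the modern graph. It assumes that the repeat body is one `dedup()` step whose seen-set persists across iterations, while an unrolled `dedup().dedup()` consists of two independent steps, each with its own seen-set. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RepeatDedupDemo {

    // One pass of a global dedup filter: keep only values not already in `seen`.
    static List<Integer> dedupPass(List<Integer> input, Set<Integer> seen) {
        List<Integer> out = new ArrayList<>();
        for (Integer v : input) {
            if (seen.add(v)) { // add() returns false if v was seen before
                out.add(v);
            }
        }
        return out;
    }

    // repeat(dedup()).times(n): the SAME dedup step (same seen-set) runs n times.
    static int repeatSemantics(List<Integer> vertices, int times) {
        Set<Integer> seen = new HashSet<>();
        List<Integer> current = vertices;
        for (int i = 0; i < times; i++) {
            current = dedupPass(current, seen);
        }
        return current.size();
    }

    // Unrolled dedup().dedup()...: every copy of the step gets a fresh seen-set.
    static int unrolledSemantics(List<Integer> vertices, int times) {
        List<Integer> current = vertices;
        for (int i = 0; i < times; i++) {
            current = dedupPass(current, new HashSet<>());
        }
        return current.size();
    }

    public static void main(String[] args) {
        List<Integer> vertices = Arrays.asList(1, 2, 3, 4, 5, 6);
        System.out.println(repeatSemantics(vertices, 2));   // 0
        System.out.println(unrolledSemantics(vertices, 2)); // 6
    }
}
```

The simulation only shows that the two strategies diverge depending on whether the seen-set is shared; it says nothing about how `RepeatUnrollStrategy` was actually changed.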




[GitHub] tinkerpop pull request #520: TINKERPOP-1130 IO Testing

2017-01-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/tinkerpop/pull/520


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Closed] (TINKERPOP-1130) Each release should store Kryo/GraphSON/GraphML versions to ensure future compatibility

2017-01-06 Thread stephen mallette (JIRA)

 [ 
https://issues.apache.org/jira/browse/TINKERPOP-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stephen mallette closed TINKERPOP-1130.
---
   Resolution: Done
 Assignee: stephen mallette
Fix Version/s: 3.3.0

> Each release should store Kryo/GraphSON/GraphML versions to ensure future 
> compatibility
> ---
>
> Key: TINKERPOP-1130
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1130
> Project: TinkerPop
> Issue Type: Improvement
> Components: io, test-suite
> Affects Versions: 3.1.1-incubating
> Reporter: Marko A. Rodriguez
> Assignee: stephen mallette
> Labels: breaking
> Fix For: 3.3.0
>
>
> I think we should make a new toy data set that has all the graph structure 
> features in it -- vertices, edges, vertex properties, multi-properties, 
> meta-properties, graph variables, different edge labels with different 
> property keys, etc.
> The graph doesn't have to be big; it just needs to cover all the features. 
> Next, we should stamp out a version of that file at every release:
> {code}
> graph-test-x.y.z.xml
> graph-test-x.y.z.kryo
> graph-test-x.y.z.json
> graph-test-x.y.z-typed.json
> {code}
> Then we should have a test case that verifies that the current SNAPSHOT 
> {{GryoReader}}, {{GraphSONReader}}, {{GraphMLReader}}, etc. can still read 
> those files. If they can't, then we have introduced a change in our 
> serialization format.
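A minimal sketch of the proposed check, in plain Java: for every released version it derives the four fixture file names listed above and hands each one to a reader for its format. The `FormatReader` interface, the class name, and the example versions are hypothetical placeholders; a real test would plug in `GryoReader`, `GraphSONReader`, and `GraphMLReader` and actually deserialize the files instead of the toy predicate used here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

final class IoCompatSketch {

    // Hypothetical stand-in for a format reader (Gryo, GraphSON, GraphML);
    // returns true if the current SNAPSHOT code can read the given fixture.
    interface FormatReader {
        boolean canRead(String fileName);
    }

    // One fixture per format, following the naming scheme proposed in the issue.
    static final List<String> SUFFIXES = Arrays.asList(".xml", ".kryo", ".json", "-typed.json");

    static String fixtureName(String version, String suffix) {
        return "graph-test-" + version + suffix;
    }

    static List<String> fixtureNames(String version) {
        List<String> names = new ArrayList<>();
        for (String suffix : SUFFIXES) {
            names.add(fixtureName(version, suffix));
        }
        return names;
    }

    // Checks every fixture from every past release; a non-empty result means the
    // serialization format changed (or no reader is registered for some format).
    static List<String> unreadableFixtures(List<String> releases, Map<String, FormatReader> readersBySuffix) {
        List<String> failures = new ArrayList<>();
        for (String version : releases) {
            for (String suffix : SUFFIXES) {
                String name = fixtureName(version, suffix);
                FormatReader reader = readersBySuffix.get(suffix);
                if (reader == null || !reader.canRead(name)) {
                    failures.add(name);
                }
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println(fixtureNames("3.2.3"));
        // [graph-test-3.2.3.xml, graph-test-3.2.3.kryo, graph-test-3.2.3.json, graph-test-3.2.3-typed.json]
    }
}
```

Collecting all failures rather than stopping at the first one makes it easier to tell whether a single format regressed or every file from a given release became unreadable.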



[jira] [Commented] (TINKERPOP-1585) OLAP dedup over non elements

2017-01-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TINKERPOP-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15805510#comment-15805510
 ] 

ASF GitHub Bot commented on TINKERPOP-1585:
---

Github user dkuppitz commented on a diff in the pull request:

https://github.com/apache/tinkerpop/pull/524#discussion_r95006502
  
--- Diff: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/filter/DedupGlobalStep.java ---
@@ -89,6 +92,17 @@ public ElementRequirement getMaxRequirement() {
 
     @Override
     protected Traverser.Admin processNextStart() {
+        if (null != this.barrier) {
+            this.barrierIterator = this.barrier.entrySet().iterator();
+            this.barrier = null;
+        }
+        while (this.barrierIterator != null && this.barrierIterator.hasNext()) {
+            if (null == this.barrierIterator)
--- End diff --

I didn't run it; I just thought we could get rid of the multiple `null` checks.




[jira] [Commented] (TINKERPOP-1585) OLAP dedup over non elements

2017-01-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TINKERPOP-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15805327#comment-15805327
 ] 

ASF GitHub Bot commented on TINKERPOP-1585:
---

Github user okram commented on a diff in the pull request:

https://github.com/apache/tinkerpop/pull/524#discussion_r95001231
  
--- Diff: spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/process/computer/traversal/strategy/optimization/SparkInterceptorStrategyTest.java ---
@@ -142,7 +142,7 @@ public void shouldSuccessfullyEvaluateInterceptedTraversals() throws Exception {
         test(6l, g.V().out().values("name").count());
         test(2l, g.V().out("knows").values("name").count());
         test(3l, g.V().in().has("name", "marko").count());
-        test(6l, g.V().repeat(__.dedup()).times(2).count());
+        test(0l, g.V().repeat(__.dedup()).times(2).count());
--- End diff --

This is crazy. The right answer is `0`. Think about it. It goes through 
once, fine, all 6 vertices. The second time -- already seen them! Filter. Thus, 
0.

However, I just checked OLTP and it returns 6. Wondering what is "right"?

```
gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V().repeat(dedup()).times(2)
==>v[1]
==>v[2]
==>v[3]
==>v[4]
==>v[5]
==>v[6]
gremlin> g.withComputer().V().repeat(dedup()).times(2)
gremlin>
```

Whatever is decided as the correct answer, we should definitely put this 
into `RepeatTest`; I just happened to have this query in a Spark test.





[GitHub] tinkerpop pull request #524: TINKERPOP-1585 & TINKERPOP-1590: DedupGlobalSte...

2017-01-06 Thread okram
Github user okram commented on a diff in the pull request:

https://github.com/apache/tinkerpop/pull/524#discussion_r95000938
  
--- Diff: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/filter/DedupGlobalStep.java ---
@@ -89,6 +92,17 @@ public ElementRequirement getMaxRequirement() {
 
     @Override
     protected Traverser.Admin processNextStart() {
+        if (null != this.barrier) {
+            this.barrierIterator = this.barrier.entrySet().iterator();
+            this.barrier = null;
+        }
+        while (this.barrierIterator != null && this.barrierIterator.hasNext()) {
+            if (null == this.barrierIterator)
--- End diff --

Your code fails. Did you run it? The problem is that `this.barrier` is 
nulled, but `this.barrierIterator` still exists and you still need to fetch 
results from it on later calls.
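The failure can be reproduced outside TinkerPop. The sketch below is a hypothetical, simplified stand-in for a barrier-then-drain dedup step: the field names `barrier`, `barrierIterator`, and `duplicateSet` mirror the diff, but the class, the `String` payloads, and the `upstream` iterator (standing in for `super.processNextStart()`) are assumptions. Because the iterator is checked on every call rather than only when `barrier` is non-null, entries left over after the first call are still emitted once `barrier` has been set to `null`.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class BarrierDrainSketch {
    private Map<Object, String> barrier;                      // collected once, then released
    private Iterator<Map.Entry<Object, String>> barrierIterator;
    private final Set<Object> duplicateSet = new HashSet<>();
    private final Iterator<String> upstream;                  // stand-in for super.processNextStart()

    BarrierDrainSketch(Map<Object, String> barrier, Iterator<String> upstream) {
        this.barrier = barrier;
        this.upstream = upstream;
    }

    // Returns the next deduplicated result, or null when everything is exhausted.
    String next() {
        if (this.barrier != null) {
            this.barrierIterator = this.barrier.entrySet().iterator();
            this.barrier = null;
        }
        // Checked on EVERY call: the iterator outlives the nulled barrier map.
        if (this.barrierIterator != null) {
            while (this.barrierIterator.hasNext()) {
                Map.Entry<Object, String> entry = this.barrierIterator.next();
                if (this.duplicateSet.add(entry.getKey())) {
                    return entry.getValue();
                }
            }
            this.barrierIterator = null; // fully drained
        }
        // Upstream values are deduplicated by their own value in this sketch.
        while (this.upstream.hasNext()) {
            String value = this.upstream.next();
            if (this.duplicateSet.add(value)) {
                return value;
            }
        }
        return null;
    }

    // Pulls every result, for demonstration.
    static List<String> drainAll(BarrierDrainSketch step) {
        List<String> out = new ArrayList<>();
        for (String s = step.next(); s != null; s = step.next()) {
            out.add(s);
        }
        return out;
    }
}
```

If the iterator loop were nested inside the `barrier != null` branch instead, only the first barrier entry would be returned: on the second call `barrier` is already `null`, so control would skip straight to `upstream` and the remaining entries would be lost.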



[GitHub] tinkerpop issue #520: TINKERPOP-1130 IO Testing

2017-01-06 Thread dkuppitz
Github user dkuppitz commented on the issue:

https://github.com/apache/tinkerpop/pull/520
  
`docker/build.sh -t -i` succeeded.

VOTE: +1



[jira] [Commented] (TINKERPOP-1585) OLAP dedup over non elements

2017-01-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TINKERPOP-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15804224#comment-15804224
 ] 

ASF GitHub Bot commented on TINKERPOP-1585:
---

Github user dkuppitz commented on a diff in the pull request:

https://github.com/apache/tinkerpop/pull/524#discussion_r94925652
  
--- Diff: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/step/filter/DedupGlobalStep.java ---
@@ -89,6 +92,17 @@ public ElementRequirement getMaxRequirement() {
 
     @Override
     protected Traverser.Admin processNextStart() {
+        if (null != this.barrier) {
+            this.barrierIterator = this.barrier.entrySet().iterator();
+            this.barrier = null;
+        }
+        while (this.barrierIterator != null && this.barrierIterator.hasNext()) {
+            if (null == this.barrierIterator)
--- End diff --

`this.barrierIterator` can never be null within the `while()` loop. 
Unless I overlooked something fundamental, `processNextStart` can be simplified 
to:

```
protected Traverser.Admin processNextStart() {
    if (null != this.barrier) {
        this.barrierIterator = this.barrier.entrySet().iterator();
        this.barrier = null;
        while (this.barrierIterator.hasNext()) {
            final Map.Entry entry = this.barrierIterator.next();
            if (this.duplicateSet.add(entry.getKey()))
                return PathProcessor.processTraverserPathLabels(entry.getValue(), this.keepLabels);
        }
    }
    return PathProcessor.processTraverserPathLabels(super.processNextStart(), this.keepLabels);
}
```




[jira] [Commented] (TINKERPOP-1585) OLAP dedup over non elements

2017-01-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TINKERPOP-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15804225#comment-15804225
 ] 

ASF GitHub Bot commented on TINKERPOP-1585:
---

Github user dkuppitz commented on a diff in the pull request:

https://github.com/apache/tinkerpop/pull/524#discussion_r94926376
  
--- Diff: spark-gremlin/src/test/java/org/apache/tinkerpop/gremlin/spark/process/computer/traversal/strategy/optimization/SparkInterceptorStrategyTest.java ---
@@ -142,7 +142,7 @@ public void shouldSuccessfullyEvaluateInterceptedTraversals() throws Exception {
         test(6l, g.V().out().values("name").count());
         test(2l, g.V().out("knows").values("name").count());
         test(3l, g.V().in().has("name", "marko").count());
-        test(6l, g.V().repeat(__.dedup()).times(2).count());
+        test(0l, g.V().repeat(__.dedup()).times(2).count());
--- End diff --

Why did that change? `0` as the expected result looks kinda wrong.



