[jira] [Commented] (TINKERPOP-1254) Support dropping traverser path information when it is no longer needed.

2016-07-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TINKERPOP-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15370018#comment-15370018
 ] 

ASF GitHub Bot commented on TINKERPOP-1254:
---

Github user twilmes commented on the issue:

https://github.com/apache/tinkerpop/pull/358
  
That's excellent!  Integration tests ran almost all the way through for me, 
but I hit an odd failure in Archetype - TinkerGraph (Archetype - Server was 
skipped), noted below.  I'm going to rerun anyway with the latest commit, this 
time using Docker.

```
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache TinkerPop ................................... SUCCESS [  9.540 s]
[INFO] Apache TinkerPop :: Gremlin Shaded ................. SUCCESS [  3.640 s]
[INFO] Apache TinkerPop :: Gremlin Core ................... SUCCESS [01:27 min]
[INFO] Apache TinkerPop :: Gremlin Test ................... SUCCESS [ 11.518 s]
[INFO] Apache TinkerPop :: Gremlin Groovy ................. SUCCESS [ 49.133 s]
[INFO] Apache TinkerPop :: Gremlin Groovy Test ............ SUCCESS [  8.280 s]
[INFO] Apache TinkerPop :: TinkerGraph Gremlin ............ SUCCESS [03:40 min]
[INFO] Apache TinkerPop :: Gremlin Benchmark .............. SUCCESS [  4.660 s]
[INFO] Apache TinkerPop :: Hadoop Gremlin ................. SUCCESS [06:14 min]
[INFO] Apache TinkerPop :: Spark Gremlin .................. SUCCESS [19:37 min]
[INFO] Apache TinkerPop :: Giraph Gremlin ................. SUCCESS [  02:54 h]
[INFO] Apache TinkerPop :: Neo4j Gremlin .................. SUCCESS [  4.160 s]
[INFO] Apache TinkerPop :: Gremlin Driver ................. SUCCESS [ 11.340 s]
[INFO] Apache TinkerPop :: Gremlin Server ................. SUCCESS [12:58 min]
[INFO] Apache TinkerPop :: Gremlin Console ................ SUCCESS [01:36 min]
[INFO] Apache TinkerPop :: Gremlin Archetype .............. SUCCESS [  0.151 s]
[INFO] Apache TinkerPop :: Archetype - TinkerGraph ........ FAILURE [  0.374 s]
[INFO] Apache TinkerPop :: Archetype - Server ............. SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 03:41 h
[INFO] Finished at: 2016-07-10T15:06:49-05:00
[INFO] Final Memory: 66M/727M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.17:test (default-test) on project gremlin-archetype-tinkergraph: Execution default-test of goal org.apache.maven.plugins:maven-surefire-plugin:2.17:test failed: There was an error in the forked process
[ERROR] java.lang.NoClassDefFoundError: projects/standard/project/gremlin-archetype-tinkergraph/target/test-classes/com/test/example/AppTest (wrong name: com/test/example/AppTest)
[ERROR] at java.lang.ClassLoader.defineClass1(Native Method)
[ERROR] at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
[ERROR] at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
```


> Support dropping traverser path information when it is no longer needed.
> ------------------------------------------------------------------------
>
>                 Key: TINKERPOP-1254
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1254
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: process
>    Affects Versions: 3.1.1-incubating
>            Reporter: Marko A. Rodriguez
>            Assignee: Ted Wilmes
>
> The most expensive traversals (especially in OLAP) are those that can not be 
> "bulked." There are various reasons why two traversers at the same object can 
> not be bulked, but the primary reason is {{PATH}} or {{LABELED_PATH}}. That 
> is, when the history of the traverser is required, the probability of two 
> traversers having the same history is low.
> A key to making traversals more efficient is to do as much as possible to 
> remove historic information from a traverser so it can get bulked. How does 
> one do this? 
> {code}
> g.V.as('a').out().as('b').out().where(neq('a').and().neq('b')).both().name
> {code}
> The {{LABELED_PATH}} of "a" and "b" are required up to the {{where()}} and at 
> which point, at {{both()}}, they are no longer required. It would be smart to 
> support:
> {code}
> traverser.dropLabels(Set)
> traverser.dropPath()
> {code}
> We would then, via a {{TraversalOptimizationStrategy}} insert a step between 
> {{where()}} and {{both()}} called {{PathPruneStep}} which would be a 
> {{SideEffectStep}}. The strategy would know which labels were no longer 
> needed (via forward lookahead) and then do:
> {code}
> public class 
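
The bulking argument in the quoted description can be illustrated with a small, self-contained sketch. Everything here is hypothetical and independent of TinkerPop: `ToyTraverser`, `bulk`, and `dropPaths` are illustration-only names, and a real traverser carries far more state. The sketch assumes that two traversers may be merged only when both their location and their remaining path history are equal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BulkingSketch {

    /** Toy traverser: current location, the path history it still carries, and a bulk count. */
    static final class ToyTraverser {
        final String location;
        final List<String> path;
        long bulk;

        ToyTraverser(String location, List<String> path, long bulk) {
            this.location = location;
            this.path = path;
            this.bulk = bulk;
        }
    }

    /** Merges traversers that agree on both location and remaining path history. */
    static List<ToyTraverser> bulk(List<ToyTraverser> traversers) {
        Map<String, ToyTraverser> merged = new LinkedHashMap<>();
        for (ToyTraverser t : traversers) {
            String key = t.location + "|" + t.path;
            ToyTraverser existing = merged.get(key);
            if (existing == null) {
                merged.put(key, new ToyTraverser(t.location, t.path, t.bulk));
            } else {
                existing.bulk += t.bulk; // same key: fold into one traverser
            }
        }
        return new ArrayList<>(merged.values());
    }

    /** Hypothetical path drop: same location, empty history. */
    static List<ToyTraverser> dropPaths(List<ToyTraverser> traversers) {
        List<ToyTraverser> pruned = new ArrayList<>();
        for (ToyTraverser t : traversers) {
            pruned.add(new ToyTraverser(t.location, Collections.<String>emptyList(), t.bulk));
        }
        return pruned;
    }

    /** Three traversers that all sit at v4 but arrived along different paths. */
    static List<ToyTraverser> sample() {
        return Arrays.asList(
                new ToyTraverser("v4", Arrays.asList("v1", "v2"), 1),
                new ToyTraverser("v4", Arrays.asList("v1", "v3"), 1),
                new ToyTraverser("v4", Arrays.asList("v2", "v3"), 1));
    }

    /** Number of traversers left after bulking, with or without dropping paths first. */
    public static int traversersAfterBulking(boolean dropPathFirst) {
        List<ToyTraverser> input = dropPathFirst ? dropPaths(sample()) : sample();
        return bulk(input).size();
    }

    public static void main(String[] args) {
        System.out.println("with paths:    " + traversersAfterBulking(false));
        System.out.println("paths dropped: " + traversersAfterBulking(true));
    }
}
```

With the histories intact, none of the three traversers can be merged; once the history is dropped they collapse into a single traverser with a bulk of 3.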

[jira] [Commented] (TINKERPOP-1254) Support dropping traverser path information when it is no longer needed.

2016-07-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TINKERPOP-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15369701#comment-15369701
 ] 

ASF GitHub Bot commented on TINKERPOP-1254:
---

Github user twilmes commented on the issue:

https://github.com/apache/tinkerpop/pull/358
  
I made a small update to `ReferencePath` to create new label `Set`s when a 
path is detached.  Previously, the first set of labels for a path was being 
shared across `MutablePaths` after detachment, so removing a label from one 
path also removed it from every other traverser that had that same label 
`Set` in its path.
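
The aliasing problem described above can be reproduced without TinkerPop. The following is a minimal sketch, not the actual `ReferencePath` code: `ToyPath`, `detachShared`, and `detachCopied` are hypothetical names, and the sketch assumes only that a path holds one label `Set` per position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class LabelSetCopySketch {

    /** Toy stand-in for a mutable path: one label set per path position. */
    static final class ToyPath {
        final List<Set<String>> labels = new ArrayList<>();
    }

    /** Problematic detach: the copy references the SAME label sets as the source. */
    static ToyPath detachShared(ToyPath source) {
        ToyPath copy = new ToyPath();
        copy.labels.addAll(source.labels); // copies references only
        return copy;
    }

    /** Defensive detach: every label set is copied. */
    static ToyPath detachCopied(ToyPath source) {
        ToyPath copy = new ToyPath();
        for (Set<String> labelSet : source.labels) {
            copy.labels.add(new LinkedHashSet<>(labelSet));
        }
        return copy;
    }

    /** Prunes label "a" on one detached path and reports whether a sibling still has it. */
    public static boolean siblingKeepsLabel(boolean copySets) {
        ToyPath original = new ToyPath();
        original.labels.add(new LinkedHashSet<>(Arrays.asList("a", "b")));
        ToyPath first = copySets ? detachCopied(original) : detachShared(original);
        ToyPath second = copySets ? detachCopied(original) : detachShared(original);
        first.labels.get(0).remove("a");
        return second.labels.get(0).contains("a");
    }

    public static void main(String[] args) {
        System.out.println("shared sets, sibling keeps 'a': " + siblingKeepsLabel(false));
        System.out.println("copied sets, sibling keeps 'a': " + siblingKeepsLabel(true));
    }
}
```

With shared sets, pruning one path silently prunes its sibling; copying the sets on detachment keeps the paths independent.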

The `PathProcessors` now respect a null `keepLabels`, and labels are not 
dropped if `PrunePathStrategy` is not applied.

**PrunePathStrategy on**
```
g.V().match(
    as("a").in("sungBy").as("b"),
    as("a").in("sungBy").as("c"),
    as("b").out("writtenBy").as("d"),
    as("c").out("writtenBy").as("e"),
    as("d").has("name", "George_Harrison"),
    as("e").has("name", "Bob_Marley")).select("a").count().profile()

Traversal Metrics
Step                                                     Count  Traversers     Time (ms)   % Dur
================================================================================================
GraphStep(vertex,[])                                       808         808        44.217   99.97
MatchStep(AND,[[MatchStartStep(a), ProfileStep,...           1           1         0.004    0.01
  MatchStartStep(a)                                        808         808        43.517
  VertexStep(IN,[sungBy],vertex)                           501         499        20.323
  MatchEndStep(b) (profiling ignored)                                              0.000
  MatchStartStep(a)                                          2           2         0.006
  VertexStep(IN,[sungBy],vertex)                           156         156         2.129
  MatchEndStep(c) (profiling ignored)                                              0.000
  MatchStartStep(b)                                        501         499         5.126
  VertexStep(OUT,[writtenBy],vertex)                       509         504         3.423
  MatchEndStep(d) (profiling ignored)                                              0.000
  MatchStartStep(c)                                        156         156         1.083
  VertexStep(OUT,[writtenBy],vertex)                       157         157         1.029
  MatchEndStep(e) (profiling ignored)                                              0.000
  MatchStartStep(d)                                        509         266         1.685
  HasStep([name.eq(George_Harrison)])                        2           2         0.002
  MatchEndStep (profiling ignored)                                                 0.000
  MatchStartStep(e)                                        157          57         0.391
  HasStep([name.eq(Bob_Marley)])                             1           1         0.001
  MatchEndStep (profiling ignored)                                                 0.000
SelectOneStep(a)                                             1           1         0.003    0.01
CountGlobalStep                                              1           1         0.003    0.01
>TOTAL                                                       -           -        44.228       -
```
**PrunePathStrategy off**
```
Traversal Metrics
Step                                                     Count  Traversers     Time (ms)   % Dur
================================================================================================
GraphStep(vertex,[])                                       808         808         7.565   99.84
MatchStep(AND,[[MatchStartStep(a), ProfileStep,...           1           1         0.007    0.10
  MatchStartStep(a)                                        808         808         5.726
  VertexStep(IN,[sungBy],vertex)                           501         499         9.532
  MatchEndStep(b) (profiling ignored)                                              0.000
  MatchStartStep(a)                                          2           2         0.007
  VertexStep(IN,[sungBy],vertex)                           156         156         1.803
  

[jira] [Commented] (TINKERPOP-1254) Support dropping traverser path information when it is no longer needed.

2016-07-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TINKERPOP-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15369020#comment-15369020
 ] 

ASF GitHub Bot commented on TINKERPOP-1254:
---

Github user dkuppitz commented on the issue:

https://github.com/apache/tinkerpop/pull/358
  
`docker/build.sh -t -i -n` fails with:

```
Failed tests: 
  DedupTest$Traversals>DedupTest.g_V_both_name_order_byXa_bX_dedup_value:110 expected:<[josh]> but was:<[marko]>
```


> Support dropping traverser path information when it is no longer needed.
> ------------------------------------------------------------------------
>
>                 Key: TINKERPOP-1254
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1254
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: process
>    Affects Versions: 3.1.1-incubating
>            Reporter: Marko A. Rodriguez
>            Assignee: Ted Wilmes
>
> The most expensive traversals (especially in OLAP) are those that can not be 
> "bulked." There are various reasons why two traversers at the same object can 
> not be bulked, but the primary reason is {{PATH}} or {{LABELED_PATH}}. That 
> is, when the history of the traverser is required, the probability of two 
> traversers having the same history is low.
> A key to making traversals more efficient is to do as much as possible to 
> remove historic information from a traverser so it can get bulked. How does 
> one do this? 
> {code}
> g.V.as('a').out().as('b').out().where(neq('a').and().neq('b')).both().name
> {code}
> The {{LABELED_PATH}} of "a" and "b" are required up to the {{where()}} and at 
> which point, at {{both()}}, they are no longer required. It would be smart to 
> support:
> {code}
> traverser.dropLabels(Set)
> traverser.dropPath()
> {code}
> We would then, via a {{TraversalOptimizationStrategy}} insert a step between 
> {{where()}} and {{both()}} called {{PathPruneStep}} which would be a 
> {{SideEffectStep}}. The strategy would know which labels were no longer 
> needed (via forward lookahead) and then do:
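> {code}
> // Proposal pseudocode: dropPath() and dropLabels() do not exist on Traverser yet.
> public class PathPruneStep extends SideEffectStep {
>   final Set<String> dropLabels = ...
>   final boolean dropPath = ...
>   public void sideEffect(final Traverser traverser) {
>     // prune the traverser handed to this step rather than pulling a new one
>     if (this.dropPath) traverser.dropPath();
>     else traverser.dropLabels(this.dropLabels);
>   }
> }
> {code}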
> Again, the more we can prune historic path data no longer needed, the higher 
> the probability of bulking. Think about this in terms of {{match()}}.
> {code}
> g.V().match(
>   a.out.b,
>   b.out.c,
>   c.neq.a,
>   c.out.b,
> ).select("a")
> {code}
> All we need is "a" at the end. Thus, once a pattern has been passed and no 
> future patterns require that label, drop it! 
> This idea is related to TINKERPOP-331, but I don't think we should deal with 
> manipulating the species. Thus, I think 331 is too "low level."



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TINKERPOP-1254) Support dropping traverser path information when it is no longer needed.

2016-07-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TINKERPOP-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15367673#comment-15367673
 ] 

ASF GitHub Bot commented on TINKERPOP-1254:
---

GitHub user twilmes opened a pull request:

https://github.com/apache/tinkerpop/pull/358

TINKERPOP-1254 Support dropping traverser path information when it is no 
longer needed

This PR adds support for path retraction to increase the likelihood of 
bulking in OLTP and OLAP modes. Traversal analysis is performed during the 
application of the PrunePathStrategy to identify labels that may be dropped at 
various points in the traversal.  MatchStep also performs runtime analysis to 
determine which labels it can drop in addition to the labels identified during 
traversal strategy application.

Here is a set of profiles showing the benefit of path dropping.  These were 
generated with `TinkerGraphComputer`, first against 3.2 and then against 
TinkerPop-1254.  **Note** that the times should not be compared here.  The 
first was run on my anemic MacBook, and the second was run on an 8-core AWS 
m3.2xlarge instance that I've been using for testing.  I'll be following up 
with more numbers on the same hardware, but you can already see the dramatic 
drop in traversers with path pruning enabled.

**TP 3.2**
```
gremlin> g.V().match(__.as('a').out().as('b'), __.as('b').out().as('c'), __.as('c').out().as('d')).select('d').count().profile()
==>Traversal Metrics
Step                                                     Count  Traversers     Time (ms)   % Dur
================================================================================================
TinkerGraphStep(vertex,[])                                 808         808        21.205    0.02
MatchStep(AND,[[MatchStartStep(a), ProfileStep,...    14465066    14465066    110317.813   85.88
  MatchStartStep(a)                                        808         808     10975.797
  VertexStep(OUT,vertex)                                  8049        8049      6953.071
  MatchEndStep(b)                                         8049        8049      6727.167
  MatchStartStep(b)                                       8049        7957      5461.242
  VertexStep(OUT,vertex)                                327370      327370      6782.024
  MatchEndStep(c)                                       327370      327370      6238.268
  MatchStartStep(c)                                     327370      326983      1771.773
  VertexStep(OUT,vertex)                              14465066    14465066     11489.228
  MatchEndStep(d)                                     14465066    14465066     14301.313
SelectOneStep(d)                                      14465066    14465066     13752.966   10.71
CountGlobalStep                                              1           1      4363.667    3.40
>TOTAL                                                       -           -    128455.652       -
```
 **TinkerPop-1254**
```
gremlin> g.V().match(__.as('a').out().as('b'), __.as('b').out().as('c'), __.as('c').out().as('d')).select('d').count().profile()
==>Traversal Metrics
Step                                                     Count  Traversers     Time (ms)   % Dur
================================================================================================
GraphStep(vertex,[])                                       808         808        32.453   19.96
MatchStep(AND,[[MatchStartStep(a), ProfileStep,...    14465066        7510        89.463   55.01
  MatchStartStep(a)                                        808         808        22.388
  VertexStep(OUT,vertex)                                  8049        7957        85.493
  MatchEndStep(b) (profiling ignored)                                              0.000
  MatchStartStep(b)                                       8049         563         7.488
  VertexStep(OUT,vertex)                                327370        7561        19.548
  MatchEndStep(c) (profiling ignored)                                              0.000
  MatchStartStep(c)                                     327370         452         4.247
  VertexStep(OUT,vertex)                              14465066        7510        14.812
  MatchEndStep(d) (profiling ignored)                                              0.000
SelectOneStep(d)

[jira] [Commented] (TINKERPOP-1254) Support dropping traverser path information when it is no longer needed.

2016-06-07 Thread Ted Wilmes (JIRA)

[ 
https://issues.apache.org/jira/browse/TINKERPOP-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15318584#comment-15318584
 ] 

Ted Wilmes commented on TINKERPOP-1254:
---

I'm getting close to ready for a review.  I ended up switching from tracking 
which labels to drop to tracking which labels to keep.  Dropping works out well 
if a traversal is totally flat, but it becomes problematic when you're dealing 
with steps that take nested traversals.  Currently I'm finishing up some fixes 
for errors that occur when running on a graph computer, and then I'll push so 
we have something concrete to discuss.
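
One way to picture the "keep" formulation is a backward pass over the steps that records, for each step, which labels are still read later. This is a hedged sketch for a flat traversal only, not the strategy in the PR: `keepLabelsAfterEachStep` and the per-step label sets are assumptions made for illustration, and nested traversals are not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class KeepLabelsSketch {

    /**
     * For each step, returns the labels still read by some LATER step.
     * labelsReadByStep.get(i) is the set of labels that step i reads.
     */
    public static List<Set<String>> keepLabelsAfterEachStep(List<Set<String>> labelsReadByStep) {
        List<Set<String>> keep = new ArrayList<>();
        Set<String> neededLater = new TreeSet<>();
        for (int i = labelsReadByStep.size() - 1; i >= 0; i--) {
            keep.add(0, new TreeSet<>(neededLater)); // labels read by steps i+1..end
            neededLater.addAll(labelsReadByStep.get(i));
        }
        return keep;
    }

    /** Label reads for g.V().as('a').out().as('b').out().where(neq('a').and().neq('b')).both().name */
    public static List<Set<String>> exampleSteps() {
        Set<String> none = Collections.emptySet();
        Set<String> ab = new TreeSet<>(Arrays.asList("a", "b"));
        // steps: V, out, out, where, both, name
        return Arrays.asList(none, none, none, ab, none, none);
    }

    /** Keep set after the step at the given index, rendered as a string. */
    public static String keepAfter(int stepIndex) {
        return keepLabelsAfterEachStep(exampleSteps()).get(stepIndex).toString();
    }

    public static void main(String[] args) {
        String[] names = {"V", "out", "out", "where", "both", "name"};
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i] + " -> keep " + keepAfter(i));
        }
    }
}
```

In this toy model the keep set is `[a, b]` up to the second `out()` and becomes empty after `where()`, which matches the point in the issue description where the labels stop being required.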



[jira] [Commented] (TINKERPOP-1254) Support dropping traverser path information when it is no longer needed.

2016-06-07 Thread Marko A. Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/TINKERPOP-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15318434#comment-15318434
 ] 

Marko A. Rodriguez commented on TINKERPOP-1254:
---

How are things going on this work? [~twilmes] Doing okay? Need any 
advice/reviews?



[jira] [Commented] (TINKERPOP-1254) Support dropping traverser path information when it is no longer needed.

2016-05-31 Thread Marko A. Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/TINKERPOP-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308427#comment-15308427
 ] 

Marko A. Rodriguez commented on TINKERPOP-1254:
---

Hm. I would put the {{dropLabels}} logic into {{MatchStep}} itself. For each 
traverser, you can call {{Traverser.getTags()}} to know which traversals it has 
already gone down. If the traversals it has NOT gone down don't need the labels 
of the tagged traversals, then drop them. That is, don't insert 
{{dropLabels()}}-steps; do the logic in {{MatchStep.computerAlgorithm()}} and 
{{MatchStep.standardAlgorithm()}}. 
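
The runtime rule suggested above can be sketched as a pure function over sets. This is an illustration under stated assumptions, not `MatchStep` code: the pattern ids `p1`..`p4`, the `labelsToKeep` helper, and the explicit "needed after match" set are all hypothetical, and the patterns are the ones from the `match()` example in the issue description.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class MatchRuntimePruneSketch {

    /**
     * Labels a traverser must keep: those referenced by patterns it has NOT yet
     * gone down, plus those needed after the match step. Everything else can go.
     */
    public static Set<String> labelsToKeep(Map<String, Set<String>> labelsByPattern,
                                           Set<String> visitedTags,
                                           Set<String> neededAfterMatch) {
        Set<String> keep = new TreeSet<>(neededAfterMatch);
        for (Map.Entry<String, Set<String>> entry : labelsByPattern.entrySet()) {
            if (!visitedTags.contains(entry.getKey())) {
                keep.addAll(entry.getValue());
            }
        }
        return keep;
    }

    static Set<String> set(String... values) {
        return new TreeSet<>(Arrays.asList(values));
    }

    /** Patterns from the issue description, followed by select("a"). */
    public static Map<String, Set<String>> examplePatterns() {
        Map<String, Set<String>> patterns = new LinkedHashMap<>();
        patterns.put("p1", set("a", "b")); // a.out.b
        patterns.put("p2", set("b", "c")); // b.out.c
        patterns.put("p3", set("a", "c")); // c.neq.a
        patterns.put("p4", set("b", "c")); // c.out.b
        return patterns;
    }

    /** Keep set for a traverser tagged with the given pattern ids. */
    public static String keepAfterVisiting(String... tags) {
        return labelsToKeep(examplePatterns(), set(tags), set("a")).toString();
    }

    public static void main(String[] args) {
        System.out.println(keepAfterVisiting("p1"));
        System.out.println(keepAfterVisiting("p1", "p2", "p4"));
        System.out.println(keepAfterVisiting("p1", "p2", "p3", "p4"));
    }
}
```

Because the keep set is computed from the tags a traverser actually carries, the result stays valid regardless of the order in which the patterns were executed.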



[jira] [Commented] (TINKERPOP-1254) Support dropping traverser path information when it is no longer needed.

2016-05-31 Thread Ted Wilmes (JIRA)

[ 
https://issues.apache.org/jira/browse/TINKERPOP-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308248#comment-15308248
 ] 

Ted Wilmes commented on TINKERPOP-1254:
---

Not sure if this makes sense, but I'm in the thick of this right now and taking 
an approach where I build a dependency tree for the match traversals.  
Here is a contrived example to illustrate:

{code}
match(
    __.as('a').out().as('b'),   (1)
    __.as('b').out().as('c'),   (2)
    __.as('c').out().as('d'),   (3)
    __.as('b').out().as('e')    (4)
).select('d', 'e').dropLabels('d', 'e')
{code}

Traversals (2) and (4) depend on (1), and traversal (3) depends on (2).  The 
results of (3) and (4) are referenced in the {{select}}.  This info is used to 
insert the following {{dropLabels}} steps:
{code}
match(
    __.as('a').out().as('b'),
    __.as('b').dropLabels('b').out().as('c'),
    __.as('c').dropLabels('c').out().as('d'),
    __.as('b').dropLabels('b').out().as('e')
).select('d', 'e').dropLabels('d', 'e')
{code}

My thinking is that even though the execution order is determined at runtime, 
we can figure out beforehand which match traversals depend on each other and 
insert the drop steps accordingly.  Does this make sense, or have I made some 
incorrect assumptions about dependencies between match traversals?
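
The dependency relation in the contrived example can be derived mechanically. The sketch below is an assumption-laden illustration rather than the code in progress: it treats each match traversal as a `{startLabel, endLabel}` pair and says that traversal X depends on traversal Y when X starts at the label Y ends with. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MatchDependencySketch {

    /**
     * patterns[i] = {startLabel, endLabel} for match traversal (i + 1).
     * Returns, for each traversal number, the traversal numbers it depends on.
     */
    public static Map<Integer, List<Integer>> dependencies(String[][] patterns) {
        Map<Integer, List<Integer>> deps = new TreeMap<>();
        for (int i = 0; i < patterns.length; i++) {
            List<Integer> dependsOn = new ArrayList<>();
            for (int j = 0; j < patterns.length; j++) {
                // traversal i+1 starts where traversal j+1 ends
                if (i != j && patterns[i][0].equals(patterns[j][1])) {
                    dependsOn.add(j + 1);
                }
            }
            deps.put(i + 1, dependsOn);
        }
        return deps;
    }

    /** The four match traversals from the example: a->b, b->c, c->d, b->e. */
    public static String exampleDependencies() {
        String[][] patterns = {{"a", "b"}, {"b", "c"}, {"c", "d"}, {"b", "e"}};
        return dependencies(patterns).toString();
    }

    public static void main(String[] args) {
        System.out.println(exampleDependencies());
    }
}
```

For the example this yields (2) and (4) depending on (1), and (3) depending on (2), as stated in the comment.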



[jira] [Commented] (TINKERPOP-1254) Support dropping traverser path information when it is no longer needed.

2016-05-31 Thread Marko A. Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/TINKERPOP-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15307850#comment-15307850
 ] 

Marko A. Rodriguez commented on TINKERPOP-1254:
---

Excellent! I'm excited to review the PR and to see this feature in action. One 
thing: given that {{MatchStep}} determines its pattern execution order 
dynamically at runtime, are you being smart about knowing when to drop and when 
not to drop with {{match()}}?



[jira] [Commented] (TINKERPOP-1254) Support dropping traverser path information when it is no longer needed.

2016-05-31 Thread Ted Wilmes (JIRA)

[ 
https://issues.apache.org/jira/browse/TINKERPOP-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15307845#comment-15307845
 ] 

Ted Wilmes commented on TINKERPOP-1254:
---

I'm pretty close to PR-ready.  I hope to have something pushed later this week 
and ready for an in-depth review.  I need to make a few final updates to the 
strategy, and I also need to do some further testing to make sure my retract on 
the {{ImmutablePath}} doesn't lead to performance degradation as the number of 
{{ImmutablePath}} instances temporarily increases.  
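
The allocation concern can be made concrete with a toy immutable linked path. This is a sketch under assumptions, not TinkerPop's `ImmutablePath`: the `Node` class, its `retract` semantics (labels are removed, objects are kept), and the allocation counter are all hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

public class ImmutablePathRetractSketch {

    static int allocations = 0; // counts every node created

    /** Immutable path node: link to the previous node, an object, and its labels. */
    static final class Node {
        final Node previous;
        final String object;
        final Set<String> labels;

        Node(Node previous, String object, Set<String> labels) {
            this.previous = previous;
            this.object = object;
            this.labels = Collections.unmodifiableSet(new LinkedHashSet<>(labels));
            allocations++;
        }

        Node extend(String object, String... labels) {
            return new Node(this, object, new LinkedHashSet<>(Arrays.asList(labels)));
        }

        /** Returns a path without the given labels; existing nodes are never mutated. */
        Node retract(Set<String> drop) {
            Node rebuiltPrevious = previous == null ? null : previous.retract(drop);
            Set<String> remaining = new LinkedHashSet<>(labels);
            remaining.removeAll(drop);
            if (rebuiltPrevious == previous && remaining.size() == labels.size()) {
                return this; // nothing changed at or before this node: reuse it
            }
            return new Node(rebuiltPrevious, object, remaining);
        }
    }

    /** Builds v1(a) -> v2(b) -> v3 and counts nodes allocated by retracting one label. */
    public static int allocationsForRetract(String label) {
        Node path = new Node(null, "v1", Collections.singleton("a")).extend("v2", "b").extend("v3");
        allocations = 0;
        path.retract(Collections.singleton(label));
        return allocations;
    }

    public static void main(String[] args) {
        System.out.println("retract a: " + allocationsForRetract("a"));
        System.out.println("retract b: " + allocationsForRetract("b"));
        System.out.println("retract z: " + allocationsForRetract("z"));
    }
}
```

In this model, retracting a label near the head of the path rebuilds every node after it, so the old and new chains coexist until the old one is collected. That is the kind of temporary growth the comment wants to test for.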



[jira] [Commented] (TINKERPOP-1254) Support dropping traverser path information when it is no longer needed.

2016-05-31 Thread Marko A. Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/TINKERPOP-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15307793#comment-15307793
 ] 

Marko A. Rodriguez commented on TINKERPOP-1254:
---

How is this going [~twilmes]? This is getting more and more important -- 
especially as more people are using {{match()}}.
