[jira] [Commented] (TINKERPOP-2220) Dedup inside Repeat Produces 0 results

2019-05-20 Thread Daniel Choi (JIRA)


[ 
https://issues.apache.org/jira/browse/TINKERPOP-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844377#comment-16844377
 ] 

Daniel Choi commented on TINKERPOP-2220:


True.  OK, perhaps dedup() not being scoped to each iteration is not a problem; 
as you point out, you can simulate that by using (depth, vertex) as the dedup key.

> Dedup inside Repeat Produces 0 results
> --
>
> Key: TINKERPOP-2220
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2220
> Project: TinkerPop
>  Issue Type: Bug
>  Components: process
>Affects Versions: 3.3.0
>Reporter: Rahul Chander
>Priority: Major
>
> Testing against the Tinkerpop Modern graph dataset, I ran this query:
> {code:java}
> g.V().repeat(__.dedup()).times(2).count()
> {code}
> which should essentially be the same as running dedup twice. It produced 0 
> results, while dedup twice produced the correct 6.
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (TINKERPOP-2220) Dedup inside Repeat Produces 0 results

2019-05-20 Thread Daniel Kuppitz (JIRA)


[ 
https://issues.apache.org/jira/browse/TINKERPOP-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844368#comment-16844368
 ] 

Daniel Kuppitz commented on TINKERPOP-2220:
---

{quote}we'd be processing *v[6]* twice at _depth=3_, only to later dedup the 
duplicate pairs created from the double traversal{quote}

You are saying that you want to deduplicate the pair, but in your query, you 
deduplicate the vertex.

{code}
// what you want
g.V(1).repeat(identity().as("incoming").project("d","v").by(loops()).by(identity()).dedup().aggregate("pairs").select("incoming").out()).cap("pairs")
// what you bring up as a non-working example
g.V(1).repeat(identity().as("incoming").project("d","v").by(loops()).by(identity()).aggregate("pairs").select("incoming").out().dedup()).cap("pairs")
{code}
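
The difference between deduplicating the pair and deduplicating the vertex can be sketched outside of Gremlin. The following is a plain-Python model, not TinkerPop code: the adjacency map, the extra vertex 7 (reachable from both v[3] and v[5]) and the function name walk_pairs are assumptions made up for illustration.

```python
# Hypothetical out-edges: the modern-graph shape around v[1], plus an
# invented vertex 7 that both v[3] and v[5] point to.
OUT = {1: [3, 2, 4], 4: [5, 3], 3: [7], 5: [7]}


def walk_pairs(start, dedup_pairs):
    """Breadth-first walk that records (depth, vertex) pairs.

    With dedup_pairs=True a (depth, vertex) pair is recorded and expanded
    only once, which loosely models putting dedup() right after project().
    """
    pairs = []        # stands in for aggregate("pairs")
    seen = set()      # the dedup state, keyed by (depth, vertex)
    frontier = [start]
    depth = 0
    while frontier:
        next_frontier = []
        for v in frontier:
            if dedup_pairs:
                if (depth, v) in seen:
                    continue          # duplicate pair: do not expand again
                seen.add((depth, v))
            pairs.append((depth, v))
            next_frontier.extend(OUT.get(v, []))
        frontier = next_frontier
        depth += 1
    return pairs


without_dedup = walk_pairs(1, dedup_pairs=False)
with_dedup = walk_pairs(1, dedup_pairs=True)
print(without_dedup.count((3, 7)), with_dedup.count((3, 7)))
```

In this model the pair (2, 3) survives in both runs, because vertex 3 has not been seen at depth 2 before, while the duplicate arrival of vertex 7 at depth 3 is dropped before it is expanded.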



[jira] [Commented] (TINKERPOP-2220) Dedup inside Repeat Produces 0 results

2019-05-20 Thread Daniel Choi (JIRA)


[ 
https://issues.apache.org/jira/browse/TINKERPOP-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844240#comment-16844240
 ] 

Daniel Choi commented on TINKERPOP-2220:


Just realized unrolling doesn't achieve the same outcome as above either; it's 
almost like we need a local dedup() per iteration (assuming BFS traversal 
order).  Perhaps another point in favor of making a dedicated graph search 
step instead of trying to use repeat() for this kind of thing.



[jira] [Commented] (TINKERPOP-2220) Dedup inside Repeat Produces 0 results

2019-05-20 Thread Daniel Choi (JIRA)


[ 
https://issues.apache.org/jira/browse/TINKERPOP-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844203#comment-16844203
 ] 

Daniel Choi commented on TINKERPOP-2220:


Yes, if you define that as how dedup() should work inside a repeat(), then that 
is the correct behavior.  And I don't deny it's useful to have it behave this 
way.  I was just pointing out that it didn't seem consistent with how 
repeat() works in general (repeat the inner traversal verbatim, as if unrolled), 
but perhaps I'm introducing my own bias here in terms of how repeat() should 
work.

As a counterpoint, imagine you wanted to do a BFS traversal from a starting 
node to all sink nodes, and wanted to print out all distinct nodes at each 
frontier depth.  In other words, all pairs (d, v), where d=depth and v=vertex.

{code:java}
gremlin> g.V(1).repeat(identity().as("incoming").project("d","v").by(loops()).by(identity()).aggregate("pairs").select("incoming").out()).cap("pairs")
==>[[d:0,v:v[1]],[d:1,v:v[3]],[d:1,v:v[2]],[d:1,v:v[4]],[d:2,v:v[5]],[d:2,v:v[3]]]
{code}
Now with dedup:

{code:java}
gremlin> g.V(1).repeat(identity().as("incoming").project("d","v").by(loops()).by(identity()).aggregate("pairs").select("incoming").out().dedup()).cap("pairs")
==>[[d:0,v:v[1]],[d:1,v:v[3]],[d:1,v:v[2]],[d:1,v:v[4]],[d:2,v:v[5]]]
{code}

Notice how the pair *[d:2, v:v[3]]* is gone in the dedup version, even though 
this is the first time it appears at _depth=2_.  You could argue that you could 
instead run the repeat traversal without any dedup and dedup the aggregated 
pairs afterwards.  But then you're introducing unnecessary computation at 
each depth: for example, if *v[5]* and *v[3]* both had edges going out to 
*v[6]*, we'd be processing *v[6]* twice at _depth=3_, only to later dedup the 
duplicate pairs created from the double traversal.  The problem gets worse as 
your traversal tree deepens.
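
For readers without a Gremlin console at hand, the two results above can be reproduced with a small plain-Python model. This is only a sketch under assumptions: the adjacency map lists the out-edges of the modern graph in the order the console output implies, the name collect_pairs is made up, and the loop is a simplification of how repeat() actually schedules traversers.

```python
# Out-edges of the TinkerPop "modern" graph (v1 -> v3, v2, v4; v4 -> v5, v3;
# v6 -> v3), listed in the order suggested by the console output.
OUT = {1: [3, 2, 4], 4: [5, 3], 6: [3]}


def collect_pairs(start, dedup_vertices):
    """Record (depth, vertex) for every traverser entering an iteration.

    With dedup_vertices=True a single seen-set is shared by all iterations
    and applied after out(), modelling out().dedup() inside repeat().
    """
    pairs = []
    seen = set()
    frontier = [start]
    depth = 0
    while frontier:
        next_frontier = []
        for v in frontier:
            pairs.append((depth, v))
            for w in OUT.get(v, []):
                if dedup_vertices:
                    if w in seen:
                        continue      # vertex already passed this dedup once
                    seen.add(w)
                next_frontier.append(w)
        frontier = next_frontier
        depth += 1
    return pairs


print(collect_pairs(1, dedup_vertices=False))
print(collect_pairs(1, dedup_vertices=True))
```

In the second call vertex 3 is already in the shared seen-set when it is reached again from vertex 4, so the pair (2, 3) never appears, which matches the console output.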



[jira] [Commented] (TINKERPOP-2220) Dedup inside Repeat Produces 0 results

2019-05-17 Thread Daniel Kuppitz (JIRA)


[ 
https://issues.apache.org/jira/browse/TINKERPOP-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16842565#comment-16842565
 ] 

Daniel Kuppitz commented on TINKERPOP-2220:
---

Just had another idea on how to explain it. Instead of {{dedup()}} let's just 
use a filter lambda that follows the same rules (and prints some debug 
messages):

{code}
gremlin> dedupLambda = {
..1>   element = it.get()
..2>   step = it.getStepId()
..3>   if ((m = (step =~ /[0-9]+\.[0-9]+\.[0-9]+\(\)/)).find()) {
..4> step = m.group()
..5>   }
..6>   seen = memory[step]?.contains(element)
..7>   memory[step] = (memory[step] ?: []) + [element]
..8>   println "${element} seen at step ${step}: ${seen ? 'yes (filter it)' : 'no (let it pass)'}"
..9>   return !seen
.10> }
==>groovysh_evaluate$_run_closure1@67514bdd
gremlin> 
gremlin> memory = [:] ; g.V().filter(dedupLambda).filter(dedupLambda).count()
v[1] seen at step 1.0.0(): no (let it pass)
v[1] seen at step 2.0.0(): no (let it pass)
v[2] seen at step 1.0.0(): no (let it pass)
v[2] seen at step 2.0.0(): no (let it pass)
v[3] seen at step 1.0.0(): no (let it pass)
v[3] seen at step 2.0.0(): no (let it pass)
v[4] seen at step 1.0.0(): no (let it pass)
v[4] seen at step 2.0.0(): no (let it pass)
v[5] seen at step 1.0.0(): no (let it pass)
v[5] seen at step 2.0.0(): no (let it pass)
v[6] seen at step 1.0.0(): no (let it pass)
v[6] seen at step 2.0.0(): no (let it pass)
==>6
gremlin> memory = [:] ; g.V().repeat(filter(dedupLambda)).times(2).count()
v[1] seen at step 1.0.0(): no (let it pass)
v[1] seen at step 1.0.0(): yes (filter it)
v[2] seen at step 1.0.0(): no (let it pass)
v[2] seen at step 1.0.0(): yes (filter it)
v[3] seen at step 1.0.0(): no (let it pass)
v[3] seen at step 1.0.0(): yes (filter it)
v[4] seen at step 1.0.0(): no (let it pass)
v[4] seen at step 1.0.0(): yes (filter it)
v[5] seen at step 1.0.0(): no (let it pass)
v[5] seen at step 1.0.0(): yes (filter it)
v[6] seen at step 1.0.0(): no (let it pass)
v[6] seen at step 1.0.0(): yes (filter it)
==>0
{code}



[jira] [Commented] (TINKERPOP-2220) Dedup inside Repeat Produces 0 results

2019-05-17 Thread Daniel Kuppitz (JIRA)


[ 
https://issues.apache.org/jira/browse/TINKERPOP-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16842470#comment-16842470
 ] 

Daniel Kuppitz commented on TINKERPOP-2220:
---

{{dedup()}} is a {{FilterStep}}. Think about it: {{repeat(whatever).times(2)}} 
is supposed to emit whatever is left after the 2nd iteration. For 
{{repeat(dedup()).times(2)}} there is nothing left in the stream, because every 
element was already seen in the 1st iteration and so gets filtered in the 2nd.

{code}
gremlin> g.V().repeat(dedup()).times(2).count()
==>0
gremlin> g.V().repeat(dedup()).emit().times(2).count()
==>6
{code}

If you emit all elements after each iteration, you'll get all the survivors 
from iteration 1. Does it make any more sense now?

It's not the same as {{dedup().dedup()}} as this only ensures uniqueness at two 
different steps in the traversal. And because it's not the same, 
{{RepeatUnrollStrategy}} won't do anything if it finds a {{DedupStep}}.
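
The same reasoning can be modelled in a few lines of plain Python. This is a sketch, not TinkerPop code: the class DedupStep and the function repeat below are made-up stand-ins, and the only assumption carried over from the explanation above is that one dedup step keeps one seen-set for the whole repetition.

```python
class DedupStep:
    """Filter that lets each element through at most once."""

    def __init__(self):
        self.seen = set()

    def apply(self, traversers):
        survivors = []
        for t in traversers:
            if t not in self.seen:
                self.seen.add(t)
                survivors.append(t)
        return survivors


def repeat(step, traversers, times, emit=False):
    """Apply the same step instance `times` times.

    Without emit, only the survivors of the last iteration are returned.
    With emit, the survivors of every iteration are returned.
    """
    emitted = []
    current = list(traversers)
    for _ in range(times):
        current = step.apply(current)
        if emit:
            emitted.extend(current)
    return emitted if emit else current


vertices = [1, 2, 3, 4, 5, 6]

# One step reused twice, like repeat(dedup()).times(2)
print(len(repeat(DedupStep(), vertices, times=2)))
# Same, but emitting after each iteration, like repeat(dedup()).emit().times(2)
print(len(repeat(DedupStep(), vertices, times=2, emit=True)))
# Two independent steps, like dedup().dedup()
print(len(DedupStep().apply(DedupStep().apply(vertices))))
```

The three counts are 0, 6 and 6: the reused step filters everything on the second pass, whereas two separate steps each start with an empty seen-set.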



[jira] [Commented] (TINKERPOP-2220) Dedup inside Repeat Produces 0 results

2019-05-17 Thread Daniel Choi (JIRA)


[ 
https://issues.apache.org/jira/browse/TINKERPOP-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16842430#comment-16842430
 ] 

Daniel Choi commented on TINKERPOP-2220:


Thanks Daniel, I see what you're getting at.  Still, if dedup()'s meaning is 
different in the context of repeat(), maybe it would have been better to 
separate it out as an explicit argument for a sort of "distinct" graph search 
(only process each node once): repeat().distinct(true) or something like that. 
It just seems odd that repeat() is normally meant to repeat the inner 
traversal verbatim, except when it contains a dedup() (as evidenced by the 
special-case handling of dedup() inside repeat() in the unroll strategy). 

I realize repeat() is usually used to implement graph searches like BFS in 
Gremlin, that there are strong merits to having a mechanism to reduce the 
frontier based on a visited set, and that dedup() is being used for this use 
case.  But it starts to muddy the semantics of repeat(), and maybe it's 
better to have repeat() be verbatim repetition of the inner traversal while 
adding separate dedicated steps for graph search (with various search-related 
options: BFS, DFS, bidirectional, etc.)?



[jira] [Commented] (TINKERPOP-2220) Dedup inside Repeat Produces 0 results

2019-05-17 Thread Daniel Kuppitz (JIRA)


[ 
https://issues.apache.org/jira/browse/TINKERPOP-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16842317#comment-16842317
 ] 

Daniel Kuppitz commented on TINKERPOP-2220:
---

The meaning of {{dedup()}} inside {{repeat()}} is a little different. Just like 
{{simplePath}} ensures that no element appears twice on the current path, 
{{repeat(dedup())}} ensures that no element is visited twice within the 
repetition.
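
To make the analogy concrete, here is a plain-Python sketch (not Gremlin) that contrasts a per-path check with a check shared by the whole repetition. The diamond-shaped graph and the helper names are invented for illustration; the two loops only loosely correspond to repeat(out().simplePath()).times(2) and repeat(out().dedup()).times(2).

```python
# Hypothetical diamond graph: a -> b -> d and a -> c -> d
OUT = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}


def out_simple_path(paths):
    """Extend every path by one hop; drop paths that revisit their own vertices."""
    return [p + [w] for p in paths for w in OUT.get(p[-1], []) if w not in p]


def out_dedup(paths, seen):
    """Extend every path by one hop; drop vertices any traverser has reached before."""
    survivors = []
    for p in paths:
        for w in OUT.get(p[-1], []):
            if w not in seen:
                seen.add(w)
                survivors.append(p + [w])
    return survivors


# Per-path uniqueness: both routes to d survive.
simple = [["a"]]
for _ in range(2):
    simple = out_simple_path(simple)

# One seen-set shared across iterations: d is reached only once.
deduped, seen = [["a"]], set()
for _ in range(2):
    deduped = out_dedup(deduped, seen)

print(simple)
print(deduped)
```

The per-path filter keeps both a-b-d and a-c-d, while the shared seen-set keeps only the first arrival at d.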



[jira] [Commented] (TINKERPOP-2220) Dedup inside Repeat Produces 0 results

2019-05-16 Thread Daniel Choi (JIRA)


[ 
https://issues.apache.org/jira/browse/TINKERPOP-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841634#comment-16841634
 ] 

Daniel Choi commented on TINKERPOP-2220:


[~okram] could you chime in on this?  The explanation in pull request 524 
above seems to imply that context carry-over between iterations is intended, 
but to me that feels like it breaks the "barrier" nature of dedup().  The 
second dedup() is reaching into the first dedup()'s context (the deduping set), 
breaking the barrier, so to speak.



[jira] [Commented] (TINKERPOP-2220) Dedup inside Repeat Produces 0 results

2019-05-16 Thread Divij Vaidya (JIRA)


[ 
https://issues.apache.org/jira/browse/TINKERPOP-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841093#comment-16841093
 ] 

Divij Vaidya commented on TINKERPOP-2220:
-

Getting an answer of 0 for this query is not a bug but intended behaviour. It 
has been discussed as part of the PR 
[https://github.com/apache/tinkerpop/pull/524] 
