[jira] [Commented] (TINKERPOP-2220) Dedup inside Repeat Produces 0 results
[ https://issues.apache.org/jira/browse/TINKERPOP-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844377#comment-16844377 ] Daniel Choi commented on TINKERPOP-2220:

True. OK, perhaps dedup() not being scoped to each iteration is not a problem after all: as you point out, you can simulate per-iteration scoping by using the (depth, vertex) pair as the dedup key.

> Dedup inside Repeat Produces 0 results
> --
>
> Key: TINKERPOP-2220
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2220
> Project: TinkerPop
> Issue Type: Bug
> Components: process
> Affects Versions: 3.3.0
> Reporter: Rahul Chander
> Priority: Major
>
> Testing against the Tinkerpop Modern graph dataset, I ran this query:
> {code:java}
> g.V().repeat(__.dedup()).times(2).count()
> {code}
> which should essentially be the same as running dedup twice. It produced 0
> results, while dedup twice produced the correct 6.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (TINKERPOP-2220) Dedup inside Repeat Produces 0 results
[ https://issues.apache.org/jira/browse/TINKERPOP-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844368#comment-16844368 ] Daniel Kuppitz commented on TINKERPOP-2220:

{quote}we'd be processing *v[6]* twice at _depth=3_, only to later dedup the duplicate pairs created from the double traversal{quote}

You are saying that you want to deduplicate the pair, but in your query you deduplicate the vertex.

{code}
// what you want
g.V(1).repeat(identity().as("incoming").project("d","v").by(loops()).by(identity()).dedup().aggregate("pairs").select("incoming").out()).cap("pairs")

// what you bring up as a non-working example
g.V(1).repeat(identity().as("incoming").project("d","v").by(loops()).by(identity()).aggregate("pairs").select("incoming").out().dedup()).cap("pairs")
{code}
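A minimal plain-Python model of the two queries above may make the difference concrete. It is not TinkerPop code: it assumes level-by-level (BFS) evaluation, as Daniel Choi's outputs in this thread suggest, and it models the single dedup() step as one seen-set shared by every iteration. MODERN_OUT and bfs_pairs are hypothetical names.

```python
# Outgoing adjacency of TinkerPop's "modern" toy graph (vertex ids only).
MODERN_OUT = {1: [2, 4, 3], 2: [], 3: [], 4: [5, 3], 5: [], 6: [3]}


def bfs_pairs(adj, start, dedup_pair):
    """Model of repeat(project("d","v")...aggregate("pairs")...out()) with one dedup().

    dedup_pair=True  -> dedup() on the (depth, vertex) pair, before aggregate()
    dedup_pair=False -> dedup() on the vertex, after out()
    """
    seen = set()  # one DedupStep inside repeat(): its set survives across iterations
    pairs = []
    frontier, depth = [start], 0
    while frontier:
        nxt = []
        for v in frontier:
            pair = (depth, v)  # project("d","v").by(loops()).by(identity())
            if dedup_pair:
                if pair in seen:
                    continue  # a duplicate pair is dropped before it is expanded
                seen.add(pair)
            pairs.append(pair)  # aggregate("pairs")
            for w in adj[v]:  # select("incoming").out()
                if not dedup_pair:
                    if w in seen:
                        continue  # a vertex seen in ANY earlier iteration is dropped
                    seen.add(w)
                nxt.append(w)
        frontier, depth = nxt, depth + 1
    return pairs


print(bfs_pairs(MODERN_OUT, 1, dedup_pair=True))
# [(0, 1), (1, 2), (1, 4), (1, 3), (2, 5), (2, 3)]
print(bfs_pairs(MODERN_OUT, 1, dedup_pair=False))
# [(0, 1), (1, 2), (1, 4), (1, 3), (2, 5)]
```

Keying dedup() on (depth, vertex) keeps [d:2, v:v[3]], yet a repeated pair is still dropped before out() expands it, which gives the per-depth scoping the thread is after.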
[jira] [Commented] (TINKERPOP-2220) Dedup inside Repeat Produces 0 results
[ https://issues.apache.org/jira/browse/TINKERPOP-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844240#comment-16844240 ] Daniel Choi commented on TINKERPOP-2220:

I just realized that unrolling doesn't achieve the same outcome as above either; it's almost as if we need a dedup() that is local to each iteration (assuming BFS traversal order). Perhaps this is another point in favor of a dedicated graph search step instead of trying to use repeat() for these kinds of things.
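The "local dedup() per iteration" idea can be sketched in plain Python as a level-synchronous BFS that starts a fresh seen-set for every frontier. This is a hypothetical model, not an existing Gremlin step; frontier_pairs and max_depth are illustrative names. Because nothing is remembered between levels, a cyclic graph would be revisited forever, so the sketch bounds the depth the way times() would.

```python
# Outgoing adjacency of TinkerPop's "modern" toy graph (vertex ids only).
MODERN_OUT = {1: [2, 4, 3], 2: [], 3: [], 4: [5, 3], 5: [], 6: [3]}


def frontier_pairs(adj, start, max_depth):
    """BFS whose dedup is scoped to a single iteration (one frontier)."""
    pairs = []
    frontier = [start]
    for depth in range(max_depth + 1):
        if not frontier:
            break
        pairs.extend((depth, v) for v in frontier)
        level_seen = set()  # fresh set per iteration: no memory of earlier levels
        nxt = []
        for v in frontier:
            for w in adj[v]:
                if w not in level_seen:  # drop repeats within the next frontier only
                    level_seen.add(w)
                    nxt.append(w)
        frontier = nxt
    return pairs


print(frontier_pairs(MODERN_OUT, 1, max_depth=10))
# [(0, 1), (1, 2), (1, 4), (1, 3), (2, 5), (2, 3)]
```

Here [d:2, v:v[3]] survives because v[3] is new within depth 2, even though it already appeared at depth 1.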
[jira] [Commented] (TINKERPOP-2220) Dedup inside Repeat Produces 0 results
[ https://issues.apache.org/jira/browse/TINKERPOP-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844203#comment-16844203 ] Daniel Choi commented on TINKERPOP-2220:

Yes, if you define that as how dedup() should work inside repeat(), then that is the correct behavior, and I don't deny it's useful for it to behave this way. I was just pointing out that it didn't seem consistent with how repeat() works in general (repeating the inner traversal verbatim, as if unrolled), but perhaps I'm introducing my own bias about how repeat() should work.

As a counterpoint, imagine you wanted to do a BFS traversal starting from a node to all sink nodes, and wanted to print out all distinct nodes at each frontier depth; in other words, all pairs (d, v), where d = depth and v = vertex.

{code:java}
gremlin> g.V(1).repeat(identity().as("incoming").project("d","v").by(loops()).by(identity()).aggregate("pairs").select("incoming").out()).cap("pairs")
==>[[d:0,v:v[1]],[d:1,v:v[3]],[d:1,v:v[2]],[d:1,v:v[4]],[d:2,v:v[5]],[d:2,v:v[3]]]
{code}

Now with dedup():

{code:java}
gremlin> g.V(1).repeat(identity().as("incoming").project("d","v").by(loops()).by(identity()).aggregate("pairs").select("incoming").out().dedup()).cap("pairs")
==>[[d:0,v:v[1]],[d:1,v:v[3]],[d:1,v:v[2]],[d:1,v:v[4]],[d:2,v:v[5]]]
{code}

Notice how the pair *[d:2, v:v[3]]* is gone in the dedup() version, even though this is the first time it appears at _depth=2_.

You could argue that you could instead run the repeat() traversal without any dedup() and then dedup all the aggregated pairs afterwards. But then you're introducing unnecessary computation at each depth. For example, if *v[5]* and *v[3]* both had edges going out to *v[6]*, we'd be processing *v[6]* twice at _depth=3_, only to later dedup the duplicate pairs created by the double traversal. The problem gets worse as the traversal tree deepens.
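The cost argument can be checked with a small plain-Python count of how many traversers get expanded with out(). The graph is the hypothetical variant from the comment above (the modern graph plus edges v[5] to v[6] and v[3] to v[6], which the real dataset does not have); expand is an illustrative name, and deduplicating each frontier stands in for the per-depth dedup being asked for.

```python
# Modern graph plus the HYPOTHETICAL edges 5->6 and 3->6 from the comment.
ADJ = {1: [2, 4, 3], 2: [], 3: [6], 4: [5, 3], 5: [6], 6: []}


def expand(adj, start, dedup_each_level):
    """Return (distinct (depth, vertex) pairs, number of out() expansions)."""
    pairs, expansions = set(), 0
    frontier, depth = [start], 0
    while frontier:
        if dedup_each_level:
            # keep the first occurrence of each vertex within this depth only
            frontier = list(dict.fromkeys(frontier))
        nxt = []
        for v in frontier:
            pairs.add((depth, v))  # duplicate pairs collapse in the set anyway
            expansions += 1        # one out() call per traverser at this depth
            nxt.extend(adj[v])
        frontier, depth = nxt, depth + 1
    return pairs, expansions


late = expand(ADJ, 1, dedup_each_level=False)  # dedup only the aggregated pairs
early = expand(ADJ, 1, dedup_each_level=True)  # dedup each frontier before expanding
print(late[0] == early[0], late[1], early[1])  # True 9 8
```

Both strategies collect the same pairs, but deduplicating only at the end expands v[6] twice at depth 3; on deeper graphs each redundant traverser also fans out, so the wasted work compounds.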
[jira] [Commented] (TINKERPOP-2220) Dedup inside Repeat Produces 0 results
[ https://issues.apache.org/jira/browse/TINKERPOP-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16842565#comment-16842565 ] Daniel Kuppitz commented on TINKERPOP-2220:

Just had another idea for how to explain it. Instead of {{dedup()}}, let's use a filter lambda that follows the same rules (and prints some debug messages):

{code}
gremlin> dedupLambda = {
           element = it.get()
           step = it.getStepId()
           if ((m = (step =~ /[0-9]+\.[0-9]+\.[0-9]+\(\)/)).find()) {
             step = m.group()
           }
           seen = memory[step]?.contains(element)
           memory[step] = (memory[step] ?: []) + [element]
           println "${element} seen at step ${step}: ${seen ? 'yes (filter it)' : 'no (let it pass)'}"
           return !seen
         }
==>groovysh_evaluate$_run_closure1@67514bdd
gremlin>
gremlin> memory = [:] ; g.V().filter(dedupLambda).filter(dedupLambda).count()
v[1] seen at step 1.0.0(): no (let it pass)
v[1] seen at step 2.0.0(): no (let it pass)
v[2] seen at step 1.0.0(): no (let it pass)
v[2] seen at step 2.0.0(): no (let it pass)
v[3] seen at step 1.0.0(): no (let it pass)
v[3] seen at step 2.0.0(): no (let it pass)
v[4] seen at step 1.0.0(): no (let it pass)
v[4] seen at step 2.0.0(): no (let it pass)
v[5] seen at step 1.0.0(): no (let it pass)
v[5] seen at step 2.0.0(): no (let it pass)
v[6] seen at step 1.0.0(): no (let it pass)
v[6] seen at step 2.0.0(): no (let it pass)
==>6
gremlin> memory = [:] ; g.V().repeat(filter(dedupLambda)).times(2).count()
v[1] seen at step 1.0.0(): no (let it pass)
v[1] seen at step 1.0.0(): yes (filter it)
v[2] seen at step 1.0.0(): no (let it pass)
v[2] seen at step 1.0.0(): yes (filter it)
v[3] seen at step 1.0.0(): no (let it pass)
v[3] seen at step 1.0.0(): yes (filter it)
v[4] seen at step 1.0.0(): no (let it pass)
v[4] seen at step 1.0.0(): yes (filter it)
v[5] seen at step 1.0.0(): no (let it pass)
v[5] seen at step 1.0.0(): yes (filter it)
v[6] seen at step 1.0.0(): no (let it pass)
v[6] seen at step 1.0.0(): yes (filter it)
==>0
{code}
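The same bookkeeping can be replayed in plain Python. This is a sketch, not TinkerPop internals: each dedup() is modeled as a filter whose seen-set is stored under its step id (the ids are copied from the debug output above and used only as dictionary keys), and a step that runs again inside repeat() reuses the same id and therefore the same set. make_dedup, dedup_then_dedup and repeat_dedup are hypothetical names.

```python
VERTICES = [1, 2, 3, 4, 5, 6]  # ids of the modern graph's vertices


def make_dedup(memory, step_id):
    """Filter mimicking dedup(): the seen-set lives in memory[step_id]."""
    def keep(element):
        seen = memory.setdefault(step_id, set())
        if element in seen:
            return False  # seen at this step before: filter it
        seen.add(element)
        return True       # first sighting at this step: let it pass
    return keep


def dedup_then_dedup():
    """g.V().dedup().dedup().count(): two steps, two independent sets."""
    memory = {}
    first = make_dedup(memory, "1.0.0()")
    second = make_dedup(memory, "2.0.0()")
    return sum(1 for v in VERTICES if first(v) and second(v))


def repeat_dedup(times):
    """g.V().repeat(dedup()).times(n).count(): one step, one set, n passes."""
    memory = {}
    body = make_dedup(memory, "1.0.0()")  # the same step runs in every iteration
    # all() short-circuits, so a filtered traverser is not offered to later passes
    return sum(1 for v in VERTICES if all(body(v) for _ in range(times)))


print(dedup_then_dedup(), repeat_dedup(1), repeat_dedup(2))  # 6 6 0
```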
[jira] [Commented] (TINKERPOP-2220) Dedup inside Repeat Produces 0 results
[ https://issues.apache.org/jira/browse/TINKERPOP-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16842470#comment-16842470 ] Daniel Kuppitz commented on TINKERPOP-2220:

{{dedup()}} is a {{FilterStep}}. Think about it: {{repeat(whatever).times(2)}} is supposed to emit whatever is left after the 2nd iteration. For {{repeat(dedup()).times(2)}}, there's simply nothing left in the stream, as every element gets filtered in the 2nd iteration.

{code}
gremlin> g.V().repeat(dedup()).times(2).count()
==>0
gremlin> g.V().repeat(dedup()).emit().times(2).count()
==>6
{code}

If you emit all elements after each iteration, you'll get all the survivors from iteration 1. Does that make more sense now? It's not the same as {{dedup().dedup()}}, as that only ensures uniqueness at two different steps in the traversal. And because it's not the same, {{RepeatUnrollStrategy}} won't do anything if it finds a {{DedupStep}}.
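A plain-Python sketch of the repeat()/emit()/times() interplay described above, assuming level-by-level processing and modeling the single DedupStep as one set shared across iterations. repeat_dedup_count is a hypothetical name. The model assumes that an emitted copy is counted once, and that traversers finishing the last iteration leave the loop without being emitted a second time.

```python
def repeat_dedup_count(starts, times, emit):
    """Count for g.V().repeat(dedup())[.emit()].times(times).count()."""
    seen = set()  # state of the one DedupStep inside repeat()
    emitted = 0
    frontier = list(starts)
    for i in range(times):
        survivors = []
        for x in frontier:
            if x not in seen:  # dedup(): only the first sighting passes
                seen.add(x)
                survivors.append(x)
        last = i == times - 1
        if emit and not last:
            emitted += len(survivors)  # emit(): a copy of each survivor leaves early
        frontier = survivors
    return emitted + len(frontier)  # plus whatever is left after the final iteration


vertices = [1, 2, 3, 4, 5, 6]
print(repeat_dedup_count(vertices, 2, emit=False))  # 0
print(repeat_dedup_count(vertices, 2, emit=True))   # 6
```

With emit(), the six survivors of iteration 1 are counted before iteration 2 filters every original traverser, which is why the two queries differ.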
[jira] [Commented] (TINKERPOP-2220) Dedup inside Repeat Produces 0 results
[ https://issues.apache.org/jira/browse/TINKERPOP-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16842430#comment-16842430 ] Daniel Choi commented on TINKERPOP-2220:

Thanks Daniel, I see what you're getting at. Still, if dedup() has a different meaning in the context of repeat(), maybe it would have been better to make that an explicit argument for a sort of "distinct" graph search (process each node only once), e.g. repeat().distinct(true) or something like that. It just seems odd that repeat() normally repeats the inner traversal verbatim, except when it contains a dedup() (as evidenced by the special-case handling of dedup() inside repeat() in the unroll strategy).

I realize that repeat() is usually used to implement graph searches like BFS in Gremlin, that there are strong merits to having a mechanism that reduces the frontier based on a visited set, and that dedup() is used for this purpose. But it seems to muddy the semantics of repeat(). Maybe it's better to keep repeat() as a verbatim repetition of the inner traversal and add separate, dedicated graph search steps (with search-related options: BFS, DFS, bidirectional, etc.)?
[jira] [Commented] (TINKERPOP-2220) Dedup inside Repeat Produces 0 results
[ https://issues.apache.org/jira/browse/TINKERPOP-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16842317#comment-16842317 ] Daniel Kuppitz commented on TINKERPOP-2220:

The meaning of {{dedup()}} inside {{repeat()}} is a little different. Just like {{simplePath()}} ensures that no element appears twice on the current path, {{repeat(dedup())}} ensures that no element is visited twice within the repetition.
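The analogy can be made concrete with a plain-Python sketch over the modern graph's edges. It roughly mirrors g.V(1).repeat(out().simplePath()).emit().times(2) compared with the same query using dedup() in place of simplePath(), assuming level-by-level processing. emitted_paths is a hypothetical name. The key difference: simplePath() only inspects the traverser's own path, whereas dedup() consults one set shared by the whole repetition.

```python
# Outgoing adjacency of TinkerPop's "modern" toy graph (vertex ids only).
MODERN_OUT = {1: [2, 4, 3], 2: [], 3: [], 4: [5, 3], 5: [], 6: [3]}


def emitted_paths(adj, start, times, mode):
    """Paths emitted by repeat(out().<mode>()).emit().times(times)."""
    seen = set()  # used by dedup(): shared by every traverser and iteration
    emitted = []
    frontier = [(start,)]
    for _ in range(times):
        nxt = []
        for path in frontier:
            for w in adj[path[-1]]:
                if mode == "simplePath" and w in path:
                    continue  # w already occurs on THIS path
                if mode == "dedup":
                    if w in seen:
                        continue  # w was reached by ANY traverser earlier
                    seen.add(w)
                nxt.append(path + (w,))
        emitted.extend(nxt)
        frontier = nxt
    return emitted


print(emitted_paths(MODERN_OUT, 1, 2, "simplePath"))
# [(1, 2), (1, 4), (1, 3), (1, 4, 5), (1, 4, 3)]
print(emitted_paths(MODERN_OUT, 1, 2, "dedup"))
# [(1, 2), (1, 4), (1, 3), (1, 4, 5)]
```

The path 1, 4, 3 is simple, so simplePath() keeps it, while dedup() drops it because v[3] was already reached directly from v[1] in the first iteration.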
[jira] [Commented] (TINKERPOP-2220) Dedup inside Repeat Produces 0 results
[ https://issues.apache.org/jira/browse/TINKERPOP-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841634#comment-16841634 ] Daniel Choi commented on TINKERPOP-2220:

[~okram] could you chime in on this? The explanation in pull request 524 (linked in Divij Vaidya's comment) seems to imply that carrying the context over between iterations is intended, but to me that feels like it breaks the "barrier" nature of dedup(): the second dedup() reaches into the first dedup()'s context (the dedup set), breaking the barrier, so to speak.
[jira] [Commented] (TINKERPOP-2220) Dedup inside Repeat Produces 0 results
[ https://issues.apache.org/jira/browse/TINKERPOP-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841093#comment-16841093 ] Divij Vaidya commented on TINKERPOP-2220:

Getting an answer of 0 for this query is not a bug; it is the intended behaviour. It was discussed as part of the PR [https://github.com/apache/tinkerpop/pull/524].