Re: [VOTE] TinkerPop 3.2.1 Release

2016-07-21 Thread Ted Wilmes
Validate distribution looks good along with manual inspection of docs and
testing.  Thanks for putting this all together Stephen and for running me
through part of the process this time.

VOTE: +1

--Ted

On Wed, Jul 20, 2016 at 3:42 PM, Stephen Mallette 
wrote:

> Thanks Hadrian - I read that section you referenced to remind myself of how
> Apache viewed the -1 and I understood it as you described, however I went
> so far as to say a "-1 means aborting the release" because as the release
> manager I was taking the non-binding -1 pretty seriously with the
> likely action of killing the release if Pieter's concern wasn't cleared.
> anyway - glad to see that we worked through our first -1 on VOTE day for a
> release ;)
>
> On Wed, Jul 20, 2016 at 4:24 PM, Hadrian Zbarcea 
> wrote:
>
> > +1
> >
> > Stephen, a -1 on a release is not a veto, see the "votes on package
> > releases" section on the foundation site [1]. It is up to the release
> > manager to decide how to proceed. Usually releases are redone not because
> > of the -1, but because there is a valid reason behind the -1. Experienced
> > committers and contributors understand if a -1 is warranted and weird -1s
> > are rare. It is also my preference (and that's what I did in the past) to
> > cancel releases even based on non-binding -1s, because the voice of
> > contributors matters too.
> >
> > Pretty cool that dialogue led to consensus and actions on how to make
> > progress. Another proof of how awesome the TinkerPop community is.
> >
> > Cheers,
> > Hadrian
> >
> >
> > On 07/20/2016 12:01 PM, Stephen Mallette wrote:
> >
> >> Pieter, Thanks as usual for testing. I would offer that this is not a
> >> case for a -1. Note that a -1 says we abort the release completely.
> >>
> >> imo, a -1 should be reserved for when there is a massive bug that brings
> >> down the house - meaning the system abends in some way and there are no
> >> workarounds. a -1 might also be presented if the packaging is bad somehow -
> >> like we didn't include the documentation in the zips. i could also see a -1
> >> if a GPL'd dependency snuck into our packaging somehow or we
> >> otherwise violated Apache licensing. If others don't agree, I hope they'll
> >> say so.
> >>
> >> in this case, you have a single backend for Sqlg that is failing a single
> >> test that you can temporarily OptOut of for your tests to pass. Users don't
> >> specifically have a workaround for this problem if they use Sqlg and
> >> HSQLDB, but it's less of a "bug" and more of a feature that they can't use
> >> (i.e. they can't interrupt a running traversal). To me, I don't think we
> >> need to stop release of TinkerPop over that narrow case.
> >>
> >> Would you reconsider your -1 based on that logic?
> >>
> >>
> >>
> >> On Wed, Jul 20, 2016 at 11:45 AM, pieter-gmail
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> Ran all Sqlg's tests and the process and structure test suites.
> >>> But alas there are failures.
> >>>
> >>> TraversalInterruptionTest is failing on HSQLDB as the
> >>> Thread.interrupt() is intercepted by HSQLDB and the interrupt flag is
> >>> reset.
> >>> The TraversalInterruptionTest tests themselves suffer from this as their
> >>> own Thread.sleep() logic resets the interrupt flag and requires special
> >>> resetting. I'd say the current interrupt strategy needs rethinking.
> >>>
> >>> TailTest.g_V_repeatXbothX_timesX3X_tailX7X fails. I added a few more
> >>> tests in Sqlg, each a repeat followed by a tail step, all of which also fail.
> >>> Jason has already proposed a fix for this here
> >>> .
> >>>
> >>> vote -1
> >>>
> >>> Thanks
> >>> Pieter
> >>>
> >>>
> >>>
> >>> On 19/07/2016 15:20, Stephen Mallette wrote:
> >>>
> >>>> Hello,
> >>>>
> >>>> We are happy to announce that TinkerPop 3.2.1 is ready for release -
> >>>> note the lack of "-incubating" everywhere.  :)
> >>>>
> >>>> The release artifacts can be found at this location:
> >>>> https://dist.apache.org/repos/dist/dev/tinkerpop/3.2.1/
> >>>>
> >>>> The source distribution is provided by:
> >>>> apache-tinkerpop-3.2.1-src.zip
> >>>>
> >>>> Two binary distributions are provided for user convenience:
> >>>> apache-gremlin-console-3.2.1-bin.zip
> >>>> apache-gremlin-server-3.2.1-bin.zip
> >>>>
> >>>> The GPG key used to sign the release artifacts is available at:
> >>>> https://dist.apache.org/repos/dist/dev/tinkerpop/KEYS
> >>>>
> >>>> The online docs can be found here:
> >>>> http://tinkerpop.apache.org/docs/3.2.1/reference/ (user docs)
> >>>> http://tinkerpop.apache.org/docs/3.2.1/upgrade/ (upgrade docs)
> >>>> http://tinkerpop.apache.org/javadocs/3.2.1/core/ (core javadoc)
> >>>> http://tinkerpop.apache.org/javadocs/3.2.1/full/ (full javadoc)
> >>>>
> >>>> The tag in Apache Git can be found here:
> >>>>
> >>>
> 

[RESULT][VOTE] TinkerPop 3.1.3 Release

2016-07-21 Thread Stephen Mallette
This vote is now closed with a total of 4 +1s, no +0s and no -1s. The
results are:

BINDING VOTES:

+1  (4 -- Stephen Mallette, Daniel Kuppitz, Dylan Millikin, Ted Wilmes)
0   (0)
-1  (0)

I will wait to officially release 3.1.3 to go alongside the close of vote
on 3.2.1 tomorrow.

Thank you very much,

Stephen

On Wed, Jul 20, 2016 at 10:48 AM, Ted Wilmes  wrote:

> validate-distribution.sh looked good from my end along with manual tests
> and a review of the generated javadoc and other docs.
>
> VOTE: +1
>
> --Ted
>
> On Tue, Jul 19, 2016 at 1:50 PM, Dylan Millikin 
> wrote:
>
> > Added tests for all the new features in the PHP driver.
> > Installed gremlin-neo4j and ran the tests against a combination of
> > TinkerGraph, Neo4j, sasl, etc.
> >
> > All tests pass.
> > VOTE: +1
> >
> > On Mon, Jul 18, 2016 at 5:16 PM, Daniel Kuppitz  wrote:
> >
> > > $ bin/validate-distribution.sh 3.1.3
> > >
> > > Validating binary distributions
> > >
> > > * downloading Apache Gremlin Console
> > > (apache-gremlin-console-3.1.3-bin.zip)... OK
> > > * validating signatures and checksums ...
> > >   * PGP signature ... OK
> > >   * MD5 checksum ... OK
> > >   * SHA1 chacksum ... OK
> > > * unzipping Apache Gremlin Console ... OK
> > > * validating Apache Gremlin Console's docs ... OK
> > > * validating Apache Gremlin Console's binaries ... OK
> > > * validating Apache Gremlin Console's legal files ...
> > >   * LICENSE ... OK
> > >   * NOTICE ... OK
> > > * validating Apache Gremlin Console's plugin directory ... OK
> > > * validating Apache Gremlin Console's lib directory ... OK
> > > * testing script evaluation ... OK
> > >
> > > * downloading Apache Gremlin Server
> > > (apache-gremlin-server-3.1.3-bin.zip)... OK
> > > * validating signatures and checksums ...
> > >   * PGP signature ... OK
> > >   * MD5 checksum ... OK
> > >   * SHA1 chacksum ... OK
> > > * unzipping Apache Gremlin Server ... OK
> > > * validating Apache Gremlin Server's docs ... OK
> > > * validating Apache Gremlin Server's binaries ... OK
> > > * validating Apache Gremlin Server's legal files ...
> > >   * LICENSE ... OK
> > >   * NOTICE ... OK
> > > * validating Apache Gremlin Server's plugin directory ... OK
> > > * validating Apache Gremlin Server's lib directory ... OK
> > >
> > > Validating source distribution
> > >
> > > > * downloading Apache Tinkerpop 3.1.3
> > > > (apache-tinkerpop-3.1.3-src.zip)... OK
> > > * validating signatures and checksums ...
> > >   * PGP signature ... OK
> > >   * MD5 checksum ... OK
> > >   * SHA1 chacksum ... OK
> > > * unzipping Apache Tinkerpop 3.1.3 ... OK
> > > OK
> > >
> > > Looks good. I will do a few more manual tests tomorrow, but for now...
> > >
> > > VOTE: +1
> > >
> > > > Oh, and I remember that I already mentioned it last time: We should fix
> > > > those typos in the script (chacksum -> checksum, Tinkerpop -> TinkerPop)
> > >
> > > Cheers,
> > > Daniel
> > >
> > >
> > > > On Mon, Jul 18, 2016 at 10:32 PM, Stephen Mallette <spmalle...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > We are happy to announce that TinkerPop 3.1.3 is ready for release -
> > > > > note the lack of "-incubating" everywhere.  :)
> > > > >
> > > > > The release artifacts can be found at this location:
> > > > > https://dist.apache.org/repos/dist/dev/tinkerpop/3.1.3/
> > > > >
> > > > > The source distribution is provided by:
> > > > > apache-tinkerpop-3.1.3-src.zip
> > > > >
> > > > > Two binary distributions are provided for user convenience:
> > > > > apache-gremlin-console-3.1.3-bin.zip
> > > > > apache-gremlin-server-3.1.3-bin.zip
> > > > >
> > > > > The GPG key used to sign the release artifacts is available at:
> > > > > https://dist.apache.org/repos/dist/dev/tinkerpop/KEYS
> > > > >
> > > > > The online docs can be found here:
> > > > > http://tinkerpop.apache.org/docs/3.1.3/reference/ (user docs)
> > > > > http://tinkerpop.apache.org/docs/3.1.3/upgrade/ (upgrade docs)
> > > > > http://tinkerpop.apache.org/javadocs/3.1.3/core/ (core javadoc)
> > > > > http://tinkerpop.apache.org/javadocs/3.1.3/full/ (full javadoc)
> > > > >
> > > > > The tag in Apache Git can be found here:
> > > > >
> > > > > https://git-wip-us.apache.org/repos/asf?p=tinkerpop.git;a=tag;h=0f960ff0823246176343468b746b6e3ac2ade23c
> > > > >
> > > > > The release notes are available here:
> > > > >
> > > > > https://github.com/apache/tinkerpop/blob/3.1.3/CHANGELOG.asciidoc#tinkerpop-313-release-date-july-18-2016
> > > > >
> > > > > The [VOTE] will be open for the next 72 hours --- closing Thursday
> > > > > (July 21, 2016) at 4:30 pm EST.
> > > >
> > > > My vote is +1.
> > > >
> > > > Thank you very much,
> > > > Stephen
> > > >
> > >
> >
>


[jira] [Commented] (TINKERPOP-1383) publish-docs.sh might publish to current too early

2016-07-21 Thread stephen mallette (JIRA)

[ 
https://issues.apache.org/jira/browse/TINKERPOP-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15388427#comment-15388427
 ] 

stephen mallette commented on TINKERPOP-1383:
-

+1

> publish-docs.sh might publish to current too early
> --
>
> Key: TINKERPOP-1383
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1383
> Project: TinkerPop
>  Issue Type: Bug
>  Components: build-release
>Affects Versions: 3.1.3
>Reporter: stephen mallette
>Assignee: Daniel Kuppitz
> Fix For: 3.1.4, 3.2.2
>
>
> In the standard release flow of things, the release manager runs 
> `bin/publish-docs.sh` right before VOTE which means that `/current` gets 
> updated at least 3 days before we announce the release. 
> Maybe updating current should be a specific command? 
> {code}
> bin/publish-docs.sh --update userName
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] interrupt

2016-07-21 Thread pieter-gmail
Ok, np, it's not serious, Postgres is the important one for me anyhow and
it is behaving. I'll investigate how to tell Postgres to cancel the
query. Just stopping the traversal is not quite good enough as every now
and again we have queries on Postgres that persist even if the Java
thread dies.
Thanks,
Pieter

On 21/07/2016 22:16, Stephen Mallette wrote:
>> For every traversal that starts it notifies the caller via the reference
>> object about the traversal.
>
> that's the tricky bit. you'd have to have some global tracking of spawned
> traversals to know that and it would have to be bound to the Thread that
> started it I guess. That information isn't going to be available out of a
> standard JSR-223 ScriptEngine.eval() call. We are making some changes to
> ScriptEngine where we extend upon it for purposes of Gremlin. Maybe there's
> opportunity in those changes to make a change like this somewhere in that
> work (though how that would happen is still murky to me).
>
> If we need changes to the ScriptEngine to even think about doing this, it
> may be a bit of a way off before we can make much progress here. I don't
> expect to see all the ScriptEngine work I had in mind done until 3.3.x as
> it must include some breaking changes to some public APIs to happen.
>
>
>
>
>
> On Thu, Jul 21, 2016 at 4:01 PM, pieter-gmail 
> wrote:
>
>> Well no, the problem is Thread.interrupted() is not reliable. Does not
>> really matter who the caller is, GremlinServer or other.
>> Just about every 3rd party library I can see might reset the flag
>> meaning that the check will randomly return false or true. Something as
>> trivial as a logger might even reset the flag. It seems to me interrupt
>> is more for code that actually calls wait/join/sleep and handles
>> any subsequent InterruptedException as it pleases.
>>
>> All I can think of for GremlinServer is a way more complex multi
>> threaded solution.
>> The ScriptEngine.eval passes in a reference object and returns
>> immediately. For every traversal that starts it notifies the caller via
>> the reference object about the traversal. The caller then uses that
>> traversal to interrupt it. Plus some more logic to know when the script
>> is done.
>>
>> Ok had another idea but kinda want to try it first as it might be
>> nonsense. Basically keep retrying the Thread.interrupt() till the thread
>> via exceptions bubbles to the top of the stack and gets handled
>> appropriately.
>>
>> On 21/07/2016 18:47, Stephen Mallette wrote:
>>> thanks for all that pieter. the primary reason for traversal interruption
>>> in the first place was so that gremlin server would have a chance to kill
>>> traversals that were running too long. Without a solution to that problem,
>>> I'm not sure what to do here. just tossing ideas around - could we still
>>> check for thread interruption as an additional way to interrupt a
>>> Traversal. maybe instead of:
>>>
>>> if (Thread.interrupted()) throw new TraversalInterruptedException();
>>>
>>> we need:
>>>
>>> if (Thread.interrupted()) this.traversal.interrupt();
>>>
>>> that would then trigger whatever interrupt logic the traversal had?
>>>
>>> If we need to do a better job with AbstractStep, please create a JIRA
>>> (and/or submit a PR) so we don't forget to make some improvements there.
>>>
>>> On Thu, Jul 21, 2016 at 12:37 PM, pieter-gmail 
>>> wrote:
>>>
>>>> I just did a global IntelliJ search in the Sqlg project.
>>>>
>>>> HSQLDB has 13 catch (InterruptedException e) clauses. All of them
>>>> swallow the exception and none resets the interrupt flag.
>>>>
>>>> PostgreSQL JDBC driver has 3 catch (InterruptedException e) clauses. 2
>>>> swallow the exception without resetting the interrupt flag and one
>>>> throws an exception.
>>>>
>>>> The rest,
>>>>
>>>> logback, 7 catch (InterruptedException e), 1 resets the flag while the
>>>> rest swallow the exception without resetting the interrupt flag
>>>>
>>>> google guava about 25 catch (InterruptedException e), all reset the
>>>> interrupt flag
>>>>
>>>> hazelcast 85 catch (InterruptedException e), too many to count but some
>>>> reset the interrupt flag and some don't
>>>>
>>>> mchange c3p0 pool 7 catch (InterruptedException e), 4 throw an exception
>>>> without resetting the interrupt flag and 3 swallow the exception without
>>>> resetting the interrupt flag.
>>>>
>>>> mchange common 8 catch (InterruptedException e), 2 throw an exception
>>>> without resetting the interrupt flag and 6 completely swallow it without
>>>> resetting.
>>>>
>>>> commons-io 8 catch (InterruptedException e), 1 resets the interrupt
>>>> flag, 7 swallow the exception without resetting the interrupt flag
>>>>
>>>> jline 3 catch (InterruptedException e), all swallow the exception without
>>>> resetting the flag.
>>>>
>>>>
>>>> All in all I don't think using interrupt will be a reliable strategy to
>>>> use.


>> 

Re: [DISCUSS] interrupt

2016-07-21 Thread Stephen Mallette
> For every traversal that starts it notifies the caller via the reference
> object about the traversal.

that's the tricky bit. you'd have to have some global tracking of spawned
traversals to know that and it would have to be bound to the Thread that
started it I guess. That information isn't going to be available out of a
standard JSR-223 ScriptEngine.eval() call. We are making some changes to
ScriptEngine where we extend upon it for purposes of Gremlin. Maybe there's
opportunity in those changes to make a change like this somewhere in that
work (though how that would happen is still murky to me).

If we need changes to the ScriptEngine to even think about doing this, it
may be a bit of a way off before we can make much progress here. I don't
expect to see all the ScriptEngine work I had in mind done until 3.3.x as
it must include some breaking changes to some public APIs to happen.





On Thu, Jul 21, 2016 at 4:01 PM, pieter-gmail 
wrote:

> Well no, the problem is Thread.interrupted() is not reliable. Does not
> really matter who the caller is, GremlinServer or other.
> Just about every 3rd party library I can see might reset the flag
> meaning that the check will randomly return false or true. Something as
> trivial as a logger might even reset the flag. It seems to me interrupt
> is more for code that actually calls wait/join/sleep and handles
> any subsequent InterruptedException as it pleases.
>
> All I can think of for GremlinServer is a way more complex multi
> threaded solution.
> The ScriptEngine.eval passes in a reference object and returns
> immediately. For every traversal that starts it notifies the caller via
> the reference object about the traversal. The caller then uses that
> traversal to interrupt it. Plus some more logic to know when the script
> is done.
>
> Ok had another idea but kinda want to try it first as it might be
> nonsense. Basically keep retrying the Thread.interrupt() till the thread
> via exceptions bubbles to the top of the stack and gets handled
> appropriately.
>
> On 21/07/2016 18:47, Stephen Mallette wrote:
> > thanks for all that pieter. the primary reason for traversal interruption
> > in the first place was so that gremlin server would have a chance to kill
> > traversals that were running too long. Without a solution to that problem,
> > I'm not sure what to do here. just tossing ideas around - could we still
> > check for thread interruption as an additional way to interrupt a
> > Traversal. maybe instead of:
> >
> > if (Thread.interrupted()) throw new TraversalInterruptedException();
> >
> > we need:
> >
> > if (Thread.interrupted()) this.traversal.interrupt();
> >
> > that would then trigger whatever interrupt logic the traversal had?
> >
> > If we need to do a better job with AbstractStep, please create a JIRA
> > (and/or submit a PR) so we don't forget to make some improvements there.
> >
> > On Thu, Jul 21, 2016 at 12:37 PM, pieter-gmail 
> > wrote:
> >
> >> I just did a global IntelliJ search in the Sqlg project.
> >>
> >> HSQLDB has 13 catch (InterruptedException e) clauses. All of them
> >> swallow the exception and none resets the interrupt flag.
> >>
> >> PostgreSQL JDBC driver has 3 catch (InterruptedException e) clauses. 2
> >> swallow the exception without resetting the interrupt flag and one
> >> throws an exception.
> >>
> >> The rest,
> >>
> >> logback, 7 catch (InterruptedException e), 1 resets the flag while the
> >> rest swallow the exception without resetting the interrupt flag
> >>
> >> google guava about 25 catch (InterruptedException e), all reset the
> >> interrupt flag
> >>
> >> hazelcast 85 catch (InterruptedException e), too many to count but some
> >> reset the interrupt flag and some don't
> >>
> >> mchange c3p0 pool 7 catch (InterruptedException e), 4 throw an exception
> >> without resetting the interrupt flag and 3 swallow the exception without
> >> resetting the interrupt flag.
> >>
> >> mchange common 8 catch (InterruptedException e), 2 throw an exception
> >> without resetting the interrupt flag and 6 completely swallow it without
> >> resetting.
> >>
> >> commons-io 8 catch (InterruptedException e), 1 resets the interrupt
> >> flag, 7 swallow the exception without resetting the interrupt flag
> >>
> >> jline 3 catch (InterruptedException e), all swallow the exception without
> >> resetting the flag.
> >>
> >>
> >> All in all I don't think using interrupt will be a reliable strategy to
> >> use.
> >>
> >> http://stackoverflow.com/questions/10401947/methods-that-clear-the-thread-interrupt-flag
> >> says that it is good practise to always reset the flag. It might be good
> >> but it is not common.
> >> From the above rather quick search only google guava respected that good
> >> practice.
> >>
> >> AbstractStep code
> >> if (Thread.interrupted()) throw new TraversalInterruptedException();
> >>
> >> will also reset the interrupt flag potentially making someone else's

Re: [DISCUSS] interrupt

2016-07-21 Thread pieter-gmail
Well no, the problem is Thread.interrupted() is not reliable. Does not
really matter who the caller is, GremlinServer or other.
Just about every 3rd party library I can see might reset the flag
meaning that the check will randomly return false or true. Something as
trivial as a logger might even reset the flag. It seems to me interrupt
is more for code that actually calls wait/join/sleep and handles
any subsequent InterruptedException as it pleases.
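
A minimal sketch of the failure mode described above. The class and method
names are made up for illustration and are not part of TinkerPop or of any
library named in this thread. A blocking call such as Thread.sleep() clears
the interrupt flag when it throws InterruptedException, so a callee that
swallows the exception leaves nothing for a later Thread.interrupted() check
to see, while a callee that calls Thread.currentThread().interrupt() in its
catch block keeps the flag visible to its caller.

```java
class InterruptFlagDemo {

    /** Stand-in for third-party code that swallows the interrupt. */
    static void swallowingLibraryCall() {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            // Swallowed: sleep() cleared the flag when it threw, and nothing sets it again.
        }
    }

    /** Stand-in for third-party code that restores the flag for its caller. */
    static void restoringLibraryCall() {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // set the flag again
        }
    }

    /** Interrupts the current thread, runs the call, and reports whether the flag survived. */
    static boolean flagSurvives(Runnable libraryCall) {
        Thread.currentThread().interrupt();
        libraryCall.run();
        // Thread.interrupted() reads AND clears the flag, so no state leaks out of this method.
        return Thread.interrupted();
    }

    public static void main(String[] args) {
        System.out.println("swallowing: " + flagSurvives(InterruptFlagDemo::swallowingLibraryCall));
        System.out.println("restoring:  " + flagSurvives(InterruptFlagDemo::restoringLibraryCall));
    }
}
```

With the swallowing callee the check reports false even though the thread was
interrupted, which is the "randomly return false or true" behaviour seen from
the caller's side.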

All I can think of for GremlinServer is a way more complex multi
threaded solution.
The ScriptEngine.eval passes in a reference object and returns
immediately. For every traversal that starts it notifies the caller via
the reference object about the traversal. The caller then uses that
traversal to interrupt it. Plus some more logic to know when the script
is done.
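
The reference-object idea could look roughly like the sketch below. Everything
here is hypothetical: TraversalRegistry and CancellableTraversal are invented
names, the traversal is reduced to a counting loop, and nothing is wired into
a real ScriptEngine. It only shows the shape of the bookkeeping: each
traversal announces itself to the registry, and the caller cancels through
the registry instead of through Thread.interrupt().

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

class TraversalRegistryDemo {

    /** Hypothetical stand-in for a traversal that honours its own cancel flag. */
    static final class CancellableTraversal {
        private final AtomicBoolean cancelled = new AtomicBoolean(false);

        void interrupt() {
            cancelled.set(true);
        }

        /** Runs up to maxSteps, checking the flag before each step; returns the steps completed. */
        int iterate(int maxSteps) {
            int done = 0;
            while (done < maxSteps && !cancelled.get()) {
                done++;
            }
            return done;
        }
    }

    /** Hypothetical "reference object" handed to the script evaluation. */
    static final class TraversalRegistry {
        private final Set<CancellableTraversal> running = ConcurrentHashMap.newKeySet();

        void started(CancellableTraversal t) {
            running.add(t);
        }

        void finished(CancellableTraversal t) {
            running.remove(t);
        }

        /** Cancels every traversal still registered; returns how many were signalled. */
        int interruptAll() {
            running.forEach(CancellableTraversal::interrupt);
            return running.size();
        }
    }

    public static void main(String[] args) {
        TraversalRegistry registry = new TraversalRegistry();
        CancellableTraversal t = new CancellableTraversal();
        registry.started(t);                              // the traversal notifies the caller
        System.out.println("signalled: " + registry.interruptAll());
        System.out.println("steps after cancel: " + t.iterate(1_000));
        registry.finished(t);                             // part of the "script is done" logic
    }
}
```

The cancel flag belongs to the traversal rather than to the thread, so code
that clears the thread's interrupt flag cannot lose it.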

Ok had another idea but kinda want to try it first as it might be
nonsense. Basically keep retrying the Thread.interrupt() till the thread
via exceptions bubbles to the top of the stack and gets handled
appropriately.
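
A rough sketch of the retry idea, under stated assumptions: the worker below
is a made-up stand-in for code that swallows a couple of interrupts before it
finally gives up, and the attempt count and wait time are arbitrary. The
helper keeps calling interrupt() and waiting briefly until the target thread
has actually terminated.

```java
class RetryInterruptDemo {

    /** Interrupts target repeatedly until it terminates or attempts run out; true if it stopped. */
    static boolean interruptUntilDead(Thread target, int maxAttempts, long waitMillis) {
        for (int i = 0; i < maxAttempts && target.isAlive(); i++) {
            target.interrupt();
            try {
                target.join(waitMillis); // give the target a moment to unwind
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore our own flag and stop retrying
                break;
            }
        }
        return !target.isAlive();
    }

    /** Hypothetical worker that swallows the first swallowCount interrupts, then exits on the next. */
    static Thread stubbornWorker(int swallowCount) {
        Thread worker = new Thread(() -> {
            int swallowed = 0;
            while (true) {
                try {
                    Thread.sleep(60_000);
                } catch (InterruptedException e) {
                    if (swallowed++ >= swallowCount) {
                        return; // finally lets the interrupt through
                    }
                }
            }
        });
        worker.setDaemon(true); // never keep the JVM alive if the demo goes wrong
        return worker;
    }

    public static void main(String[] args) {
        Thread worker = stubbornWorker(2);
        worker.start();
        System.out.println("stopped: " + interruptUntilDead(worker, 10, 50));
    }
}
```

This only helps when the swallowing code eventually returns to something that
reacts to an interrupt. A loop that swallows every interrupt forever would
still outlast any retry budget.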

On 21/07/2016 18:47, Stephen Mallette wrote:
> thanks for all that pieter. the primary reason for traversal interruption
> in the first place was so that gremlin server would have a chance to kill
> traversals that were running too long. Without a solution to that problem,
> I'm not sure what to do here. just tossing ideas around - could we still
> check for thread interruption as an additional way to interrupt a
> Traversal. maybe instead of:
>
> if (Thread.interrupted()) throw new TraversalInterruptedException();
>
> we need:
>
> if (Thread.interrupted()) this.traversal.interrupt();
>
> that would then trigger whatever interrupt logic the traversal had?
>
> If we need to do a better job with AbstractStep, please create a JIRA
> (and/or submit a PR) so we don't forget to make some improvements there.
>
> On Thu, Jul 21, 2016 at 12:37 PM, pieter-gmail 
> wrote:
>
>> I just did a global IntelliJ search in the Sqlg project.
>>
>> HSQLDB has 13 catch (InterruptedException e) clauses. All of them
>> swallow the exception and none resets the interrupt flag.
>>
>> PostgreSQL JDBC driver has 3 catch (InterruptedException e) clauses. 2
>> swallow the exception without resetting the interrupt flag and one
>> throws an exception.
>>
>> The rest,
>>
>> logback, 7 catch (InterruptedException e), 1 resets the flag while the
>> rest swallow the exception without resetting the interrupt flag
>>
>> google guava about 25 catch (InterruptedException e), all reset the
>> interrupt flag
>>
>> hazelcast 85 catch (InterruptedException e), too many to count but some
>> reset the interrupt flag and some don't
>>
>> mchange c3p0 pool 7 catch (InterruptedException e), 4 throw an exception
>> without resetting the interrupt flag and 3 swallow the exception without
>> resetting the interrupt flag.
>>
>> mchange common 8 catch (InterruptedException e), 2 throw an exception
>> without resetting the interrupt flag and 6 completely swallow it without
>> resetting.
>>
>> commons-io 8 catch (InterruptedException e), 1 resets the interrupt
>> flag, 7 swallow the exception without resetting the interrupt flag
>>
>> jline 3 catch (InterruptedException e), all swallow the exception without
>> resetting the flag.
>>
>>
>> All in all I don't think using interrupt will be a reliable strategy to
>> use.
>>
>> http://stackoverflow.com/questions/10401947/methods-that-clear-the-thread-interrupt-flag
>> says that it is good practise to always reset the flag. It might be good
>> but it is not common.
>> From the above rather quick search only google guava respected that good
>> practice.
>>
>> AbstractStep code
>> if (Thread.interrupted()) throw new TraversalInterruptedException();
>>
>> will also reset the interrupt flag potentially making someone else's
>> Thread.interrupted() check fail.
>>
>>
>> All that said I do not have a solution for GremlinServer not having
>> access to the traversal.
>>
>> Thanks
>> Pieter
>>
>>
>>
>>
>>
>>
>> On 21/07/2016 17:09, Stephen Mallette wrote:
>>> I don't recall all the issues with doing traversal interruption with a
>>> flag. I suppose it could work in the same way that thread interruption
>>> works now. I will say that I'm hesitant to say that we should change this
>>> on the basis of this being a problem general to databases as we've only
>>> seen it so far in HSQLDB. If it was shown to be a problem in other graphs
>>> I'd be more inclined to see a change. Not sure if any other graph
>>> providers out there can attest to a problem with the thread interruption
>>> approach but it would be nice to hear so if they did.
>>>
>>> Of course, I think you alluded to the bigger problem, which is that Gremlin
>>> Server uses thread interruption to kill script executions and iterations
>>> that exceed timeouts. So, the problem there is that, if someone submits a
>>> script like this:
>>>
>>> t = g.V()
>>> x = t.toList()
>>>
>>> that script gets pushed into a 

[jira] [Updated] (TINKERPOP-1383) publish-docs.sh might publish to current too early

2016-07-21 Thread stephen mallette (JIRA)

 [ 
https://issues.apache.org/jira/browse/TINKERPOP-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stephen mallette updated TINKERPOP-1383:

Summary: publish-docs.sh might publish to current too early  (was: 
publish-docs.sh might publish to current to early)

> publish-docs.sh might publish to current too early
> --
>
> Key: TINKERPOP-1383
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1383
> Project: TinkerPop
>  Issue Type: Bug
>  Components: build-release
>Affects Versions: 3.1.3
>Reporter: stephen mallette
>Assignee: Daniel Kuppitz
> Fix For: 3.1.4, 3.2.2
>
>
> In the standard release flow of things, the release manager runs 
> `bin/publish-docs.sh` right before VOTE which means that `/current` gets 
> updated at least 3 days before we announce the release. 
> Maybe updating current should be a specific command? 
> {code}
> bin/publish-docs.sh --update userName
> {code}





[jira] [Created] (TINKERPOP-1383) publish-docs.sh might publish to current to early

2016-07-21 Thread stephen mallette (JIRA)
stephen mallette created TINKERPOP-1383:
---

 Summary: publish-docs.sh might publish to current to early
 Key: TINKERPOP-1383
 URL: https://issues.apache.org/jira/browse/TINKERPOP-1383
 Project: TinkerPop
  Issue Type: Bug
  Components: build-release
Affects Versions: 3.1.3
Reporter: stephen mallette
Assignee: Daniel Kuppitz
 Fix For: 3.1.4, 3.2.2


In the standard release flow of things, the release manager runs 
`bin/publish-docs.sh` right before VOTE which means that `/current` gets 
updated at least 3 days before we announce the release. 

Maybe updating current should be a specific command? 

{code}
bin/publish-docs.sh --update userName
{code}





Re: [DISCUSS] Returning Side Effects

2016-07-21 Thread Stephen Mallette
Your way made me think that if you wrote your traversal like that, you
would return the side-effects twice - once in your traversal as part of the
standard result and then again as a side-effect.  Not sure what that means
- just a thought.

While I'm thinking thoughts that may or may not be obvious, it also occurs
to me that the downside for a GLV retrieving data that way is that the
result of the traversal won't be streamed back. It will aggregate the
result (and the side-effects naturally) in memory and then return that all
as a whole.

On Thu, Jul 21, 2016 at 11:24 AM, Daniel Kuppitz  wrote:

> If you really want to have your result and your side-effects returned by a
> single request, you could do something like this:
>
> gremlin> g.V(1,2,4).aggregate("names").by("name").aggregate("ages").by("age").fold().as("data").select("data", "names", "ages")
> ==>[data:[v[1], v[2], v[4]], names:[marko, vadas, josh], ages:[29, 27, 32]]
> gremlin> g.V(1,2,4).aggregate("names").by("name").aggregate("ages").by("age").fold().project("data", "se").by().by(cap("names","ages"))
> ==>[data:[v[1], v[2], v[4]], se:[names:[marko, vadas, josh], ages:[29, 27, 32]]]
> gremlin> g.V(1,2,4).aggregate("names").by("name").fold().project("data", "se").by().by(cap("names"))
> ==>[data:[v[1], v[2], v[4]], se:[marko, vadas, josh]]
>
> I'm not saying it would be bad to have Gremlin Server handle that for you,
> just wanted to show that it's actually pretty easy to get the data and the
> side-effects without using the traversal admin methods (hence it should
> work for all GLVs).
>
> Cheers,
> Daniel
>
>
> On Thu, Jul 21, 2016 at 4:51 PM, Stephen Mallette 
> wrote:
>
> > As we look to build out GLVs and expand Gremlin into other programming
> > languages, one of the important aspects of doing this should be to consider
> > consistency across GLVs. We should try to prevent capabilities of Java from
> > being lost in Python, JS, etc.
> >
> > As we look at both RemoteGraph in Java and gremlin-python we find that
> > there is no way to get traversal side-effects. If you write a Traversal and
> > want side-effects from it, you have to write your traversal to return them
> > so that it comes back as part of the result set. Since RemoteGraph and
> > gremlin-python don't really allow you to directly "submit a script" it's
> > not as though you can execute a traversal once for both the result and the
> > side-effect and package them together in a single request as you might do
> > with a simple script request:
> >
> > $ curl -X POST -d "{\"gremlin\":\"t=g.V(1).values('name').aggregate('x');[v:t.toList(),se:t.getSideEffects().get('x')]\"}" http://localhost:8182
> >
> > {"requestId":"3d3258b2-e421-459a-bf53-ea1e58ece4aa","status":{"message":"","code":200,"attributes":{}},"result":{"data":[{"v":["marko"]},{"se":["marko"]}],"meta":{}}}
> >
> > I'm thinking that we could alter things in a non-breaking way to allow
> > optional return of side-effect data so that there is a way to have this all
> > streamed back without the need for the little workaround I just
> > demonstrated. For REST I think we could just include a sideEffect request
> > parameter that allowed for a list of side-effect keys to return. Perhaps
> > a "*" could indicate that all should be returned. The side-effects
> > could be serialized into a key sibling to "data" called "sideEffect".
> >
> > I think a similar approach could be used for websockets and NIO where we
> > could amend the protocol to accept that sideEffect parameter. We would
> > first stream results (marked with meta data to specify a "result") and then
> > stream side effects (again marked with meta data as such).
> >
> > I considered caching the Traversal instances so that a future request could
> > get the side effects, but for a variety of reasons I abandoned that (the
> > cache meant more heap and trying to get the right balance, new transactions
> > would have to be opened if the side-effect contained graph elements, etc.)
> >
> > I like the approach of just maintaining our single request-response model
> > with the changes I proposed above. It seems to provide the least impact with
> > no new dependencies, is backward compatible and could be completely
> > optional to RemoteConnections.
> >
>


Re: [DISCUSS] interrupt

2016-07-21 Thread Stephen Mallette
Thanks for all that, Pieter. The primary reason for traversal interruption
in the first place was so that Gremlin Server would have a chance to kill
traversals that were running too long. Without a solution to that problem,
I'm not sure what to do here. Just tossing ideas around: could we still
check for thread interruption as an additional way to interrupt a
Traversal? Maybe instead of:

if (Thread.interrupted()) throw new TraversalInterruptedException();

we need:

if (Thread.interrupted()) this.traversal.interrupt();

that would then trigger whatever interrupt logic the traversal had?

If we need to do a better job with AbstractStep, please create a JIRA
(and/or submit a PR) so we don't forget to make some improvements there.
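
To make the idea above concrete, here is a small self-contained Java sketch.
It is not TinkerPop code: SketchTraversal, its interrupt()/isInterrupted()
methods and InterruptedTraversalException are hypothetical stand-ins for the
proposed Traversal.interrupt() and for TraversalInterruptedException. The
step-level check treats a thread interrupt as one more way of interrupting
the traversal, and the traversal's own flag decides whether to stop.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch; none of these types exist in TinkerPop.
class TraversalInterruptSketch {

    // Stand-in for TraversalInterruptedException.
    static class InterruptedTraversalException extends RuntimeException {
    }

    // Stand-in for a traversal that carries its own interrupt flag.
    static class SketchTraversal {
        private final AtomicBoolean interrupted = new AtomicBoolean(false);

        void interrupt() {
            interrupted.set(true);
        }

        boolean isInterrupted() {
            return interrupted.get();
        }
    }

    // What a step might run before processing the next traverser.
    static void checkInterrupt(SketchTraversal traversal) {
        // Thread.interrupted() reads and clears the thread's flag, so the
        // information is moved onto the traversal rather than dropped.
        if (Thread.interrupted()) {
            traversal.interrupt();
        }
        if (traversal.isInterrupted()) {
            throw new InterruptedTraversalException();
        }
    }

    // True if a thread interrupt ends up stopping the traversal.
    static boolean stopsOnThreadInterrupt() {
        SketchTraversal traversal = new SketchTraversal();
        Thread.currentThread().interrupt();
        try {
            checkInterrupt(traversal);
            return false;
        } catch (InterruptedTraversalException e) {
            return traversal.isInterrupted();
        }
    }

    // True if interrupting the traversal directly stops it as well.
    static boolean stopsOnDirectInterrupt() {
        SketchTraversal traversal = new SketchTraversal();
        traversal.interrupt();
        try {
            checkInterrupt(traversal);
            return false;
        } catch (InterruptedTraversalException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("thread interrupt stops traversal: " + stopsOnThreadInterrupt());
        System.out.println("direct interrupt stops traversal: " + stopsOnDirectInterrupt());
    }
}
```

This still depends on the thread flag surviving until the next check, so it
would not help when a third-party library has already cleared it.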

On Thu, Jul 21, 2016 at 12:37 PM, pieter-gmail 
wrote:

> I just did a global IntelliJ search in the Sqlg project.
>
> HSQLDB has 13 catch (InterruptedException e) clauses. All of them
> swallow the exception and none restores the interrupt flag.
>
> The PostgreSQL JDBC driver has 3 catch (InterruptedException e) clauses. 2
> swallow the exception without restoring the interrupt flag and one
> throws an exception.
>
> The rest:
>
> logback: 7 catch (InterruptedException e) clauses; 1 restores the flag
> while the rest swallow the exception without restoring it.
>
> Google Guava: about 25 catch (InterruptedException e) clauses; all restore
> the interrupt flag.
>
> Hazelcast: 85 catch (InterruptedException e) clauses, too many to check
> individually, but some restore the interrupt flag and some don't.
>
> mchange c3p0 pool: 7 catch (InterruptedException e) clauses; 4 throw an
> exception without restoring the interrupt flag and 3 swallow the exception
> without restoring it.
>
> mchange common: 8 catch (InterruptedException e) clauses; 2 throw an
> exception without restoring the interrupt flag and 6 swallow it completely
> without restoring the flag.
>
> commons-io: 8 catch (InterruptedException e) clauses; 1 restores the
> interrupt flag, 7 swallow the exception without restoring it.
>
> jline: 3 catch (InterruptedException e) clauses; all swallow the exception
> without restoring the flag.
>
>
> All in all I don't think using interrupt will be a reliable strategy to
> use.
>
> http://stackoverflow.com/questions/10401947/methods-that-clear-the-thread-interrupt-flag
> says that it is good practice to always restore the flag. It might be good
> practice but it is not common.
> From the above rather quick search only Google Guava respected that good
> practice.
>
> The AbstractStep code
>
> if (Thread.interrupted()) throw new TraversalInterruptedException();
>
> will also clear the interrupt flag, potentially making someone else's
> Thread.interrupted() check fail.
>
>
> All that said I do not have a solution for GremlinServer not having
> access to the traversal.
>
> Thanks
> Pieter
>
>
>
>
>
>
> On 21/07/2016 17:09, Stephen Mallette wrote:
> > I don't recall all the issues with doing traversal interruption with a
> > flag. I suppose it could work in the same way that thread interruption
> > works now. I will say that I'm hesitant to say that we should change this
> > on the basis of this being a problem general to databases as we've only
> > seen it so far in HSQLDB. If it was shown to be a problem in other graphs
> > I'd be more inclined to see a change. Not sure if any other graph
> > providers out there can attest to a problem with the thread interruption
> > approach but it would be nice to hear if they did.
> >
> > Of course, I think you alluded to the bigger problem, which is that
> > Gremlin Server uses thread interruption to kill script executions and
> > iterations that exceed timeouts. So, the problem there is that, if someone
> > submits a script like this:
> >
> > t = g.V()
> > x = t.toList()
> >
> > that script gets pushed into a ScriptEngine.eval() method. That method
> > blocks until it is complete. Under that situation, Gremlin Server doesn't
> > have access to the Traversal to call interrupt on it. "t" is iterating
> > via toList() and there is no way to stop it. Not sure what we could do
> > about situations like that.
> >
> > On Wed, Jul 20, 2016 at 4:00 PM, pieter-gmail 
> > wrote:
> >
> >> The current interrupt implementation is failing on Sqlg's HSQLDB
> >> implementation.
> >> The reason for this is that HSQLDB itself relies on Thread.interrupt()
> >> for its own internal logic. When TinkerPop interrupts the thread it
> >> thinks it has to do with its own logic and as a result the interrupt
> >> flag is reset and no exception is thrown.
> >>
> >> Reading the Thread.interrupt javadocs it says that wait(), join() and
> >> sleep() will all clear the interrupt flag and throw an InterruptedException.
> >> This makes TinkerPop's reliance on the flag being set somewhat fragile.
> >> All of those methods I suspect are common with database io code and
> >> TinkerPop being a high level database layer makes it susceptible to 3rd
> >> party interpretations of interrupt semantics.
> >>
> 

Re: [DISCUSS] interrupt

2016-07-21 Thread pieter-gmail
I just did a global IntelliJ search in the Sqlg project.

HSQLDB has 13 catch (InterruptedException e) clauses. All of them
swallow the exception and none restores the interrupt flag.

The PostgreSQL JDBC driver has 3 catch (InterruptedException e) clauses. 2
swallow the exception without restoring the interrupt flag and one
throws an exception.

The rest:

logback: 7 catch (InterruptedException e) clauses; 1 restores the flag
while the rest swallow the exception without restoring it.

Google Guava: about 25 catch (InterruptedException e) clauses; all restore
the interrupt flag.

Hazelcast: 85 catch (InterruptedException e) clauses, too many to check
individually, but some restore the interrupt flag and some don't.

mchange c3p0 pool: 7 catch (InterruptedException e) clauses; 4 throw an
exception without restoring the interrupt flag and 3 swallow the exception
without restoring it.

mchange common: 8 catch (InterruptedException e) clauses; 2 throw an
exception without restoring the interrupt flag and 6 swallow it completely
without restoring the flag.

commons-io: 8 catch (InterruptedException e) clauses; 1 restores the
interrupt flag, 7 swallow the exception without restoring it.

jline: 3 catch (InterruptedException e) clauses; all swallow the exception
without restoring the flag.


All in all I don't think using interrupt will be a reliable strategy to
use.

http://stackoverflow.com/questions/10401947/methods-that-clear-the-thread-interrupt-flag
says that it is good practice to always restore the flag. It might be good
practice but it is not common.
From the above rather quick search only Google Guava respected that good
practice.

The AbstractStep code

if (Thread.interrupted()) throw new TraversalInterruptedException();

will also clear the interrupt flag, potentially making someone else's
Thread.interrupted() check fail.
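
For anyone who wants to see the flag behaviour described above in isolation,
the following is a minimal plain-JDK demonstration (the class and method
names are made up for this mail). It uses only Thread.interrupt(),
Thread.interrupted(), Thread.isInterrupted() and Thread.sleep().

```java
// Plain-JDK demo of how the interrupt flag gets lost or preserved.
class InterruptFlagDemo {

    // Swallowing InterruptedException: sleep() clears the flag before it
    // throws, so nothing is left for later checks to see.
    static boolean flagAfterSwallow() {
        Thread.currentThread().interrupt();
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            // swallowed, flag not restored
        }
        return Thread.currentThread().isInterrupted();
    }

    // The recommended pattern: restore the flag in the catch block.
    static boolean flagAfterRestore() {
        Thread.currentThread().interrupt();
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        boolean flag = Thread.currentThread().isInterrupted();
        Thread.interrupted(); // clear again so the caller is not affected
        return flag;
    }

    // Thread.interrupted() is a read-and-clear operation: a second check
    // right after the first one no longer sees the interrupt.
    static boolean secondCheckMissesInterrupt() {
        Thread.currentThread().interrupt();
        boolean first = Thread.interrupted();
        boolean second = Thread.interrupted();
        return first && !second;
    }

    public static void main(String[] args) {
        System.out.println("flag after swallow: " + flagAfterSwallow());
        System.out.println("flag after restore: " + flagAfterRestore());
        System.out.println("second check misses interrupt: " + secondCheckMissesInterrupt());
    }
}
```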


All that said I do not have a solution for GremlinServer not having
access to the traversal.

Thanks
Pieter






On 21/07/2016 17:09, Stephen Mallette wrote:
> I don't recall all the issues with doing traversal interruption with a
> flag. I suppose it could work in the same way that thread interruption
> works now. I will say that I'm hesitant to say that we should change this
> on the basis of this being a problem general to databases as we've only
> seen it so far in HSQLDB. If it was shown to be a problem in other graphs
> I'd be more inclined to see a change. Not sure if any other graph
> providers out there can attest to a problem with the thread interruption
> approach but it would be nice to hear if they did.
>
> Of course, I think you alluded to the bigger problem, which is that Gremlin
> Server uses thread interruption to kill script executions and iterations
> that exceed timeouts. So, the problem there is that, if someone submits a
> script like this:
>
> t = g.V()
> x = t.toList()
>
> that script gets pushed into a ScriptEngine.eval() method. That method
> blocks until it is complete. Under that situation, Gremlin Server doesn't
> have access to the Traversal to call interrupt on it. "t" is iterating via
> toList() and there is no way to stop it. Not sure what we could do about
> situations like that.
>
> On Wed, Jul 20, 2016 at 4:00 PM, pieter-gmail 
> wrote:
>
>> The current interrupt implementation is failing on Sqlg's HSQLDB
>> implementation.
>> The reason for this is that HSQLDB itself relies on Thread.interrupt()
>> for its own internal logic. When TinkerPop interrupts the thread it
>> thinks it has to do with its own logic and as a result the interrupt
>> flag is reset and no exception is thrown.
>>
>> Reading the Thread.interrupt javadocs it says that wait(), join() and
>> sleep() will all clear the interrupt flag and throw an InterruptedException.
>> This makes TinkerPop's reliance on the flag being set somewhat fragile.
>> All of those methods I suspect are common with database io code and
>> TinkerPop being a high level database layer makes it susceptible to 3rd
>> party interpretations of interrupt semantics.
>>
>> In some ways the TraversalInterruptionTest itself had to carefully reset
>> the flag with its usage of Thread.sleep().
>>
>> My proposal is to mark the traversal itself as interrupted rather than
>> the thread and keep the logic contained to TinkerPop's space.
>>
>> Another benefit is that the traversal.interrupt() can raise an event
>> that implementations can listen to. On receipt of the event they would
>> then be able to send a separate request to the database to cancel a
>> particular query. In my case it would be a nice way for Sqlg to tell
>> Postgresql or HSQLDB to cancel a particular query (the latest one the
>> traversal executed).
>>
>> In many ways the semantics are the same. Currently for client code
>> wanting to interrupt a particular traversal it needs to have a reference
>> to the thread the traversal is executing in. Now instead it needs to
>> keep a reference to executing traversals and interrupt them directly.
>>
>> Add Traversal.interrupt() and 

Re: [DISCUSS] interrupt

2016-07-21 Thread Stephen Mallette
I don't recall all the issues with doing traversal interruption with a
flag. I suppose it could work in the same way that thread interruption
works now. I will say that I'm hesitant to say that we should change this
on the basis of this being a problem general to databases as we've only
seen it so far in HSQLDB. If it was shown to be a problem in other graphs
I'd be more inclined to see a change. Not sure if any other graph
providers out there can attest to a problem with the thread interruption
approach but it would be nice to hear if they did.

Of course, I think you alluded to the bigger problem, which is that Gremlin
Server uses thread interruption to kill script executions and iterations
that exceed timeouts. So, the problem there is that, if someone submits a
script like this:

t = g.V()
x = t.toList()

that script gets pushed into a ScriptEngine.eval() method. That method
blocks until it is complete. Under that situation, Gremlin Server doesn't
have access to the Traversal to call interrupt on it. "t" is iterating via
toList() and there is no way to stop it. Not sure what we could do about
situations like that.
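
The timeout mechanism described above can be sketched with nothing but
java.util.concurrent. This is an illustration, not Gremlin Server's actual
implementation: the class and method names are invented, and the busy loop
merely stands in for a blocking eval() call. The caller only holds a Future
for the evaluation, so after the timeout the one thing it can do is
cancel(true), which interrupts the worker thread. That only helps if the
running code actually observes the thread's interrupt flag.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative only; not how Gremlin Server is actually written.
class EvalTimeoutSketch {

    // Returns true if the long-running "evaluation" stopped because the
    // thread interrupt sent after the timeout was observed by the worker.
    static boolean interruptedAfterTimeout(long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch stopped = new CountDownLatch(1);
        try {
            Future<?> evaluation = executor.submit(() -> {
                // Stand-in for a script iterating a traversal: it keeps
                // going until it notices the thread's interrupt flag.
                while (!Thread.currentThread().isInterrupted()) {
                    // iterate
                }
                stopped.countDown();
            });
            try {
                evaluation.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return false; // finished on its own, no timeout
            } catch (TimeoutException e) {
                // No handle on the traversal itself, only on the thread.
                evaluation.cancel(true);
            } catch (ExecutionException e) {
                return false;
            }
            return stopped.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("worker stopped by interrupt: " + interruptedAfterTimeout(50));
    }
}
```

If the worker (or a library it calls) swallowed the interrupt and cleared the
flag, the loop in this sketch would never exit, which is the failure mode
discussed in this thread.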

On Wed, Jul 20, 2016 at 4:00 PM, pieter-gmail 
wrote:

> The current interrupt implementation is failing on Sqlg's HSQLDB
> implementation.
> The reason for this is that HSQLDB itself relies on Thread.interrupt()
> for its own internal logic. When TinkerPop interrupts the thread, HSQLDB
> thinks it has to do with its own logic and as a result the interrupt
> flag is cleared and no exception is thrown.
>
> Reading the Thread.interrupt javadocs it says that wait(), join() and
> sleep() will all clear the interrupt flag and throw an InterruptedException.
> This makes TinkerPop's reliance on the flag being set somewhat fragile.
> All of those methods I suspect are common in database I/O code and
> TinkerPop being a high level database layer makes it susceptible to 3rd
> party interpretations of interrupt semantics.
>
> In some ways the TraversalInterruptionTest itself had to carefully reset
> the flag with its usage of Thread.sleep().
>
> My proposal is to mark the traversal itself as interrupted rather than
> the thread and keep the logic contained to TinkerPop's space.
>
> Another benefit is that the traversal.interrupt() can raise an event
> that implementations can listen to. On receipt of the event they would
> then be able to send a separate request to the database to cancel a
> particular query. In my case it would be a nice way for Sqlg to tell
> Postgresql or HSQLDB to cancel a particular query (the latest one the
> traversal executed).
>
> In many ways the semantics are the same. Currently for client code
> wanting to interrupt a particular traversal it needs to have a reference
> to the thread the traversal is executing in. Now instead it needs to
> keep a reference to executing traversals and interrupt them directly.
>
> Add Traversal.interrupt() and Traversal.isInterrupted(boolean clearInterrupted)
>
> Caveat, I am not familiar with GremlinServer nor the complications
> around interrupt there so perhaps I am missing something.
>
> Thanks
> Pieter
>
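
As a rough illustration of the quoted proposal, here is a self-contained Java
sketch of a traversal that raises an event when interrupted. Everything in it
is hypothetical (SketchTraversal, onInterrupt, the listener type); the
proposal only names interrupt() and isInterrupted(boolean). A provider such
as Sqlg could register a listener that asks the database to cancel the
running query, for example through java.sql.Statement.cancel(); here the
listener just counts calls so the example runs without a database.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of traversal-level interruption with an event.
class InterruptEventSketch {

    static class SketchTraversal {
        private final AtomicBoolean interrupted = new AtomicBoolean(false);
        private final List<Runnable> listeners = new CopyOnWriteArrayList<>();

        // Assumed registration hook; not an existing TinkerPop method.
        void onInterrupt(Runnable listener) {
            listeners.add(listener);
        }

        // Marks the traversal (not the thread) as interrupted and notifies
        // listeners once per transition from "not interrupted".
        void interrupt() {
            if (interrupted.compareAndSet(false, true)) {
                listeners.forEach(Runnable::run);
            }
        }

        // Mirrors the proposed isInterrupted(boolean): optionally clears
        // the flag after reading it.
        boolean isInterrupted(boolean clearInterrupted) {
            return clearInterrupted ? interrupted.getAndSet(false) : interrupted.get();
        }
    }

    // Interrupts the same traversal twice and reports how often the
    // "cancel the query" listener ran.
    static int cancelCallsAfterTwoInterrupts() {
        SketchTraversal traversal = new SketchTraversal();
        AtomicInteger cancelCalls = new AtomicInteger();
        // A real provider might call Statement.cancel() here instead.
        traversal.onInterrupt(cancelCalls::incrementAndGet);
        traversal.interrupt();
        traversal.interrupt();
        return cancelCalls.get();
    }

    public static void main(String[] args) {
        System.out.println("cancel calls: " + cancelCallsAfterTwoInterrupts());
    }
}
```

Because the flag lives on the traversal, third-party code that clears the
thread's interrupt status cannot erase it.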


[jira] [Commented] (TINKERPOP-1380) dedup() doesn't dedup in rare cases

2016-07-21 Thread Daniel Kuppitz (JIRA)

[ 
https://issues.apache.org/jira/browse/TINKERPOP-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15387444#comment-15387444
 ] 

Daniel Kuppitz commented on TINKERPOP-1380:
---

A simple test case we could add is this one:

{code}
gremlin> TinkerFactory.createModern().traversal().withComputer().V().repeat(both()).until(cyclicPath()).aggregate("x").cap("x")
==>{v[1]=21, v[2]=1, v[3]=21, v[4]=21, v[6]=1, v[5]=1}
==>{v[1]=21, v[2]=1, v[3]=21, v[4]=21, v[6]=1, v[5]=1}
{code}

The test should verify that the result size is 1.

> dedup() doesn't dedup in rare cases
> ---
>
>                 Key: TINKERPOP-1380
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1380
>             Project: TinkerPop
>          Issue Type: Bug
>          Components: process
>    Affects Versions: 3.2.1
>            Reporter: Daniel Kuppitz
>             Fix For: 3.2.2
>
>
> I stumbled across this issue when I tried to solve a problem on the mailing 
> list. It seems like a lot of steps need to be involved in order to make it 
> reproducible.
> {code}
> gremlin> :set max-iteration 10
> gremlin> 
> gremlin> g = TinkerFactory.createModern().traversal().withComputer()
> ==>graphtraversalsource[tinkergraph[vertices:6 edges:6], graphcomputer]
> gremlin> g.V().repeat(both()).until(cyclicPath()).path().aggregate("x").cap("x").unfold().dedup()
> ==>[v[1], v[2], v[1]]
> ==>[v[1], v[2], v[1]]
> ==>[v[1], v[3], v[1]]
> ==>[v[1], v[3], v[1]]
> ==>[v[1], v[4], v[1]]
> ==>[v[1], v[4], v[1]]
> ==>[v[2], v[1], v[2]]
> ==>[v[2], v[1], v[2]]
> ==>[v[3], v[1], v[3]]
> ==>[v[3], v[1], v[3]]
> ...
> {code}
> I can't reproduce it w/o using {{repeat()}}, {{aggregate()}} or {{cap()}}. It 
> is reproducible without {{path()}} though. And then it even gets a little 
> worse; check this out:
> {code}
> gremlin> g.V().repeat(both()).until(cyclicPath()).aggregate("x").cap("x").unfold().dedup()
> ==>v[1]
> ==>v[1]
> ==>v[2]
> ==>v[2]
> ==>v[3]
> ==>v[3]
> ==>v[4]
> ==>v[4]
> ==>v[5]
> ==>v[5]
> ...
> gremlin> g.V().repeat(both()).until(cyclicPath()).aggregate("x").cap("x").unfold().dedup().dedup()
> java.lang.RuntimeException: java.lang.IllegalStateException: java.lang.IllegalArgumentException: The memory can only be set() during vertex program setup and terminate: x
> Display stack trace? [yN]
> {code}
> The exception occurs only in OLAP mode, but also for more meaningful patterns 
> ({{.dedup().dedup()}} really doesn't make much sense).
> For a better / larger example see: 
> https://groups.google.com/d/msg/gremlin-users/NMXExuvDjt0/ps7bJDYwAQAJ



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (TINKERPOP-1380) dedup() doesn't dedup in rare cases

2016-07-21 Thread Daniel Kuppitz (JIRA)

[ 
https://issues.apache.org/jira/browse/TINKERPOP-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15387407#comment-15387407
 ] 

Daniel Kuppitz edited comment on TINKERPOP-1380 at 7/21/16 9:08 AM:


I inspected the side-effects in my original traversal and they actually look 
good. But what I found is that {{cap("p")}} emits 2 results.

{code}
gremlin> g.withComputer().V().emit(cyclicPath().or().not(both())).repeat(both()).until(cyclicPath()).
gremlin>   aggregate("p").by(path()).cap("p")
==>{[v[0]]=1, [v[1], v[2], v[1]]=1, [v[2], v[1], v[2]]=1}
==>{[v[0]]=1, [v[1], v[2], v[1]]=1, [v[2], v[1], v[2]]=1}
{code}

By explicitly limiting it to 1 result, {{dedup()}} gives the expected result:

{code}
gremlin> g.withComputer().V().emit(cyclicPath().or().not(both())).repeat(both()).until(cyclicPath()).
gremlin>   aggregate("p").by(path()).cap("p").limit(1).unfold().limit(local, 1).map(
gremlin>     __.as("v").select("p").unfold().
gremlin>       filter(unfold().where(eq("v"))).
gremlin>       unfold().dedup().order().by(id).fold()
gremlin>   ).dedup()
==>[v[1], v[2]]
==>[v[0]]
{code}

So it looks like it's more of a {{cap()}} problem. Perhaps we should change the 
title of this ticket and also split it into 3 separate tickets.

We have:

* {{dedup()}} sometimes doesn't dedup
* some usages of {{dedup()}} lead to exceptions
* {{cap()}} in OLAP emits 2 results instead of just 1
* side-effects get duplicated

I thought about the last one a bit longer and I think it's not a bug. The 
"problem" is that the original traversal was already executed; the result is 
confusing, but it is actually what should be expected. Maybe {{clone()}} should 
throw an exception if the traversal was already iterated.


was (Author: dkuppitz):
I inspected the side-effects in my original traversal and the actually look 
good. But what I found is that {{cap("p")}} emits 2 results.

{code}
gremlin> 
g.withComputer().V().emit(cyclicPath().or().not(both())).repeat(both()).until(cyclicPath()).
gremlin>   aggregate("p").by(path()).cap("p")
==>{[v[0]]=1, [v[1], v[2], v[1]]=1, [v[2], v[1], v[2]]=1}
==>{[v[0]]=1, [v[1], v[2], v[1]]=1, [v[2], v[1], v[2]]=1}
{code}

By explicitly limiting it to 1 result, {{dedup()}} gives the expected result:

{code}
gremlin> 
g.withComputer().V().emit(cyclicPath().or().not(both())).repeat(both()).until(cyclicPath()).
gremlin>   aggregate("p").by(path()).cap("p").limit(1).unfold().limit(local, 
1).map(
gremlin> __.as("v").select("p").unfold().
gremlin> filter(unfold().where(eq("v"))).
gremlin> unfold().dedup().order().by(id).fold()
gremlin>   ).dedup()
==>[v[1], v[2]]
==>[v[0]]
{code}

So it looks like it's more of a {{cap()}} problem. Perhaps we should change the 
title of this ticket and also split it into 3 separate tickets.

We have:

* {{dedup()}} sometimes doesn't dedup
* some usages if {{dedup()}} lead to exceptions
* {{cap}} in OLAP emits 2 results instead of just 1
* side-effects get duplicated

I thought about the last one a bit longer and I think it's not a bug. The 
"problem" is, that the original traversal was already executed; the result is 
confusing, but actually what should be expected. Maybe {{clone()}} should throw 
an exception if the traversal was already iterated.

> dedup() doesn't dedup in rare cases
> ---
>
> Key: TINKERPOP-1380
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1380
> Project: TinkerPop
>  Issue Type: Bug
>  Components: process
>Affects Versions: 3.2.1
>Reporter: Daniel Kuppitz
> Fix For: 3.2.2
>
>
> I stumbled across this issue when I tried to solve a problem on the mailing 
> list. It seems like a lot of steps need to be involved in order to make it 
> reproducible.
> {code}
> gremlin> :set max-iteration 10
> gremlin> 
> gremlin> g = TinkerFactory.createModern().traversal().withComputer()
> ==>graphtraversalsource[tinkergraph[vertices:6 edges:6], graphcomputer]
> gremlin> 
> g.V().repeat(both()).until(cyclicPath()).path().aggregate("x").cap("x").unfold().dedup()
> ==>[v[1], v[2], v[1]]
> ==>[v[1], v[2], v[1]]
> ==>[v[1], v[3], v[1]]
> ==>[v[1], v[3], v[1]]
> ==>[v[1], v[4], v[1]]
> ==>[v[1], v[4], v[1]]
> ==>[v[2], v[1], v[2]]
> ==>[v[2], v[1], v[2]]
> ==>[v[3], v[1], v[3]]
> ==>[v[3], v[1], v[3]]
> ...
> {code}
> I can't reproduce it w/o using {{repeat()}}, {{aggregate()}} or {{cap()}}. It 
> is reproducible without {{path()}} though. And then it even gets a little 
> worse; check this out:
> {code}
> gremlin> 
> g.V().repeat(both()).until(cyclicPath()).aggregate("x").cap("x").unfold().dedup()
> ==>v[1]
> ==>v[1]
> ==>v[2]
> ==>v[2]
> ==>v[3]
> ==>v[3]
> ==>v[4]
> ==>v[4]
> ==>v[5]
> ==>v[5]
> ...
> gremlin> 
> g.V().repeat(both()).until(cyclicPath()).aggregate("x").cap("x").unfold().dedup().dedup()
> java.lang.RuntimeException: 

[jira] [Commented] (TINKERPOP-1380) dedup() doesn't dedup in rare cases

2016-07-21 Thread Daniel Kuppitz (JIRA)

[ 
https://issues.apache.org/jira/browse/TINKERPOP-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15387407#comment-15387407
 ] 

Daniel Kuppitz commented on TINKERPOP-1380:
---

I inspected the side-effects in my original traversal and they actually look 
good. But what I found is that {{cap("p")}} emits 2 results.

{code}
gremlin> 
g.withComputer().V().emit(cyclicPath().or().not(both())).repeat(both()).until(cyclicPath()).
gremlin>   aggregate("p").by(path()).cap("p")
==>{[v[0]]=1, [v[1], v[2], v[1]]=1, [v[2], v[1], v[2]]=1}
==>{[v[0]]=1, [v[1], v[2], v[1]]=1, [v[2], v[1], v[2]]=1}
{code}

By explicitly limiting it to 1 result, {{dedup()}} gives the expected result:

{code}
gremlin> 
g.withComputer().V().emit(cyclicPath().or().not(both())).repeat(both()).until(cyclicPath()).
gremlin>   aggregate("p").by(path()).cap("p").limit(1).unfold().limit(local, 
1).map(
gremlin> __.as("v").select("p").unfold().
gremlin> filter(unfold().where(eq("v"))).
gremlin> unfold().dedup().order().by(id).fold()
gremlin>   ).dedup()
==>[v[1], v[2]]
==>[v[0]]
{code}

So it looks like it's more of a {{cap()}} problem. Perhaps we should change the 
title of this ticket and also split it into 3 separate tickets.

We have:

* {{dedup()}} sometimes doesn't dedup
* some usages of {{dedup()}} lead to exceptions
* {{cap()}} in OLAP emits 2 results instead of just 1
* side-effects get duplicated

I thought about the last one a bit longer and I think it's not a bug. The 
"problem" is that the original traversal was already executed; the result is 
confusing, but it is actually what should be expected. Maybe {{clone()}} should 
throw an exception if the traversal was already iterated.

> dedup() doesn't dedup in rare cases
> ---
>
> Key: TINKERPOP-1380
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1380
> Project: TinkerPop
>  Issue Type: Bug
>  Components: process
>Affects Versions: 3.2.1
>Reporter: Daniel Kuppitz
> Fix For: 3.2.2
>
>
> I stumbled across this issue when I tried to solve a problem on the mailing 
> list. It seems like a lot of steps need to be involved in order to make it 
> reproducible.
> {code}
> gremlin> :set max-iteration 10
> gremlin> 
> gremlin> g = TinkerFactory.createModern().traversal().withComputer()
> ==>graphtraversalsource[tinkergraph[vertices:6 edges:6], graphcomputer]
> gremlin> 
> g.V().repeat(both()).until(cyclicPath()).path().aggregate("x").cap("x").unfold().dedup()
> ==>[v[1], v[2], v[1]]
> ==>[v[1], v[2], v[1]]
> ==>[v[1], v[3], v[1]]
> ==>[v[1], v[3], v[1]]
> ==>[v[1], v[4], v[1]]
> ==>[v[1], v[4], v[1]]
> ==>[v[2], v[1], v[2]]
> ==>[v[2], v[1], v[2]]
> ==>[v[3], v[1], v[3]]
> ==>[v[3], v[1], v[3]]
> ...
> {code}
> I can't reproduce it w/o using {{repeat()}}, {{aggregate()}} or {{cap()}}. It 
> is reproducible without {{path()}} though. And then it even gets a little 
> worse; check this out:
> {code}
> gremlin> 
> g.V().repeat(both()).until(cyclicPath()).aggregate("x").cap("x").unfold().dedup()
> ==>v[1]
> ==>v[1]
> ==>v[2]
> ==>v[2]
> ==>v[3]
> ==>v[3]
> ==>v[4]
> ==>v[4]
> ==>v[5]
> ==>v[5]
> ...
> gremlin> 
> g.V().repeat(both()).until(cyclicPath()).aggregate("x").cap("x").unfold().dedup().dedup()
> java.lang.RuntimeException: java.lang.IllegalStateException: 
> java.lang.IllegalArgumentException: The memory can only be set() during 
> vertex program setup and terminate: x
> Display stack trace? [yN]
> {code}
> The exception occurs only in OLAP mode, but also for more meaningful patterns 
> ({{.dedup().dedup()}} really doesn't make much sense).
> For a better / larger example see: 
> https://groups.google.com/d/msg/gremlin-users/NMXExuvDjt0/ps7bJDYwAQAJ


