Re: [DISCUSS] Returning Side Effects

2016-08-08 Thread Stephen Mallette
so - in retrospect the unified streaming model for results and side-effects
wasn't so awesome and things needed to get re-worked a bit. I changed it to
the other option we had which was to cache the side-effects on the server
and then return them on demand when requested. So the benefit is that we
get to only return data when requested:

gremlin> graph ='conf/')
==>remotegraph[DriverServerConnection-localhost/ [graph=g]]
gremlin> g = graph.traversal()
==>graphtraversalsource[remotegraph[DriverServerConnection-localhost/ [graph=g]], standard]
gremlin> t = g.V(1).aggregate('a').outE("knows").aggregate("b").inV()
gremlin> se = t.getSideEffects();[]
gremlin> se.keys()  // request the keys from the server and cache locally for future calls
gremlin> se.get('a')  // get "a" side-effect from the server and cache locally for future calls against "a"
gremlin> se.get('b')  // get "b" side-effect from the server and cache locally for future calls against "b"

The downside is that we have to hold the side-effects on the server in a
cache so there is some cost in doing that. I think that cost can be
mitigated though if a Traversal is treated like a resource that we close().
The close() can then trigger something to release the side-effects on the
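
A minimal sketch of the on-demand model described above, written in plain Python rather than against the real driver. InMemoryServer and LazySideEffects are hypothetical names, not TinkerPop classes. The sketch assumes only what this message states: side-effects are cached on the server, fetched per key when requested, cached locally afterwards, and released when the traversal is closed.

```python
class InMemoryServer:
    """Hypothetical stand-in for the server-side side-effect cache."""

    def __init__(self):
        self.cache = {}   # request id -> {side-effect key: value}
        self.calls = 0    # counts simulated round trips

    def store(self, request_id, side_effects):
        self.cache[request_id] = dict(side_effects)

    def keys(self, request_id):
        self.calls += 1
        return set(self.cache[request_id])

    def gather(self, request_id, key):
        self.calls += 1
        return self.cache[request_id][key]

    def close(self, request_id):
        # Release the cached side-effects for this traversal.
        self.cache.pop(request_id, None)


class LazySideEffects:
    """Hypothetical client-side view: fetch on demand, then cache locally."""

    def __init__(self, server, request_id):
        self._server = server
        self._request_id = request_id
        self._keys = None
        self._values = {}

    def keys(self):
        if self._keys is None:            # first call goes to the server
            self._keys = self._server.keys(self._request_id)
        return self._keys

    def get(self, key):
        if key not in self._values:       # later calls are served locally
            self._values[key] = self._server.gather(self._request_id, key)
        return self._values[key]

    def close(self):
        self._server.close(self._request_id)

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()
        return False
```

Treating the object as a context manager is one way to make the close() call hard to forget; whether the real API would expose it that way is an assumption.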

As far as the protocol goes, I didn't change a whole lot from what I
previously described actually but there are now multiple ops on the
TraversalOpProcessor for drivers to implement:

+ bytecode - send Traversal bytecode here and get the results from the
+ keys - get the keys from the side-effects on a previously executed
Traversal streamed back in the same way we ship back traversal results
currently (cached by the request id of the original traversal sent to
+ gather - get the side-effects for a key - streamed back in the same way we
ship traversal results currently with same extra meta-data described in the
previous thread (i.e. the sideEffect/aggregateTo keys/values)
+ close - kill a particular set of side effects in the cache
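
To make the four ops concrete, here is a rough sketch of a dispatcher in plain Python. It is not the TraversalOpProcessor implementation. The class name, the argument names ("requestId", "bytecode", "key") and the execute callable are assumptions made for illustration; only the op names and the idea of caching side-effects by the original request id come from the list above.

```python
class SideEffectOpProcessor:
    """Hypothetical dispatcher for the bytecode/keys/gather/close ops."""

    def __init__(self, execute):
        # execute: callable taking bytecode, returning (results, side_effects)
        self._execute = execute
        self._cache = {}  # original request id -> side-effects dict

    def handle(self, op, args):
        request_id = args["requestId"]
        if op == "bytecode":
            results, side_effects = self._execute(args["bytecode"])
            # Keep the side-effects around for later keys/gather requests.
            self._cache[request_id] = dict(side_effects)
            return list(results)
        if op == "keys":
            return sorted(self._cache[request_id])
        if op == "gather":
            return self._cache[request_id][args["key"]]
        if op == "close":
            # Drop this traversal's side-effects from the cache.
            self._cache.pop(request_id, None)
            return None
        raise ValueError("unknown op: " + op)
```

After a close, a keys or gather request for the same id fails with a KeyError in this sketch; how the real server would report that case is not specified in this thread.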

The implementation should be fairly straightforward as the same streaming
protocol is used for return of results of keys/gather as is used for
bytecode (which is all the same as returning results from

On Thu, Jul 28, 2016 at 7:12 PM, Stephen Mallette 

> I have a rough cut of "returning side-effects" working on TINKERPOP-1278
> branch. I didn't bother making this change for REST at this time as I felt
> like it was more important and useful to have it run for websockets/NIO as
> the drivers that would ultimately power a RemoteConnection are generally
> written for that interface.
> gremlin> graph ='conf/')
> ==>remotegraph[DriverServerConnection-localhost/ [graph=g]]
> gremlin>  g = graph.traversal()
> ==>graphtraversalsource[remotegraph[DriverServerConnection-localhost/
> [graph=g]], standard]
> gremlin> t = g.V(1).aggregate('a').outE("knows").aggregate("b").inV()
> ==>v[2]
> ==>v[4]
> gremlin> t.getSideEffects().get('a')
> ==>v[1]
> gremlin> t.getSideEffects().get('b')
> ==>e[7][1-knows->2]
> ==>e[8][1-knows->4]
> It was more effort than i expected to get this to work mostly because of
> my attempts to do it all without breaking change. It was also interesting
> (and nice) to see that the protocol didn't need to change structurally for
> this to work, however, drivers will need to adjust a bit to deal with the
> side-effects now streaming back following results. Note that this only
> matters for those drivers who support submitting Traversals as Bytecode
> (which I assume is "none" of them) and existing script submissions should
> still have the same behavior and thus a terminating stream with the final
> result (side effects left on the server as always).
> To allow for side-effects to come back I added two pieces of metadata to a
> ResponseMessage:
> 1. sideEffect - which is the value of the side effect key. for instance in
> the above example, there would be values for "a" and "b" at different
> points in the stream
> 2. aggregateTo - which will be one of map, list, bulkset, or none. the
> significance here is that we needed a way to tell the client how a batch
> of results should be re-assembled. recall that Gremlin Server iterates
> everything. If you return a String it puts the String into an Iterator for
> the response. There needed to be a way to say that a particular sideeffect
> was converted to iterator so that it could be re-assembled (or not) to what
> the original type was.
> As for the streaming model, Gremlin Server iterates the results first and
> then the side effects by key. Recall that a ResponseMessage batches up
> results returned from the server based on iteration size. I've arranged it
> so that a ResponseMes

Re: [DISCUSS] Returning Side Effects

2016-07-28 Thread Stephen Mallette
I have a rough cut of "returning side-effects" working on TINKERPOP-1278
branch. I didn't bother making this change for REST at this time as I felt
like it was more important and useful to have it run for websockets/NIO as
the drivers that would ultimately power a RemoteConnection are generally
written for that interface.

gremlin> graph ='conf/')
==>remotegraph[DriverServerConnection-localhost/ [graph=g]]
gremlin>  g = graph.traversal()
==>graphtraversalsource[remotegraph[DriverServerConnection-localhost/ [graph=g]], standard]
gremlin> t = g.V(1).aggregate('a').outE("knows").aggregate("b").inV()
==>v[2]
==>v[4]
gremlin> t.getSideEffects().get('a')
==>v[1]
gremlin> t.getSideEffects().get('b')
==>e[7][1-knows->2]
==>e[8][1-knows->4]

It was more effort than I expected to get this to work, mostly because of my
attempts to do it all without a breaking change. It was also interesting (and
nice) to see that the protocol didn't need to change structurally for this
to work; however, drivers will need to adjust a bit to deal with the
side-effects now streaming back following results. Note that this only
matters for those drivers that support submitting Traversals as Bytecode
(which I assume is "none" of them) and existing script submissions should
still have the same behavior and thus a terminating stream with the final
result (side effects left on the server as always).

To allow for side-effects to come back I added two pieces of metadata to a
ResponseMessage:

1. sideEffect - which is the value of the side-effect key. For instance, in
the above example, there would be values for "a" and "b" at different
points in the stream
2. aggregateTo - which will be one of map, list, bulkset, or none. The
significance here is that we needed a way to tell the client how a batch
of results should be re-assembled. Recall that Gremlin Server iterates
everything. If you return a String it puts the String into an Iterator for
the response. There needed to be a way to say that a particular side-effect
was converted to an iterator so that it could be re-assembled (or not) to what
the original type was.
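
As an illustration of how a client might use aggregateTo, here is a small Python sketch. The four values (map, list, bulkset, none) come from the description above; everything else is an assumption: that map entries arrive as (key, value) pairs, that a bulkset arrives as one item per occurrence, and that collections.Counter is an acceptable stand-in for a BulkSet. The function name reassemble is hypothetical.

```python
from collections import Counter


def reassemble(aggregate_to, items):
    """Rebuild a side-effect from the flat items streamed for one key."""
    items = list(items)
    if aggregate_to == "list":
        return items
    if aggregate_to == "map":
        # Assumption: each streamed item is a (key, value) pair.
        return dict(items)
    if aggregate_to == "bulkset":
        # Assumption: one item per occurrence; Counter stands in for BulkSet.
        return Counter(items)
    if aggregate_to == "none":
        # The server wrapped a single value in an iterator; unwrap it.
        if len(items) != 1:
            raise ValueError("expected exactly one item for aggregateTo=none")
        return items[0]
    raise ValueError("unknown aggregateTo: " + aggregate_to)
```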

As for the streaming model, Gremlin Server iterates the results first and
then the side effects by key. Recall that a ResponseMessage batches up
results returned from the server based on iteration size. I've arranged it
so that a ResponseMessage will never mix results with side effects or one
side-effect key with another key. In this way, it's easy to tie the
sideEffect/aggregateTo values to the data within the message. That made it
pretty easy for me to assemble the stream of side-effects into something
useful on the client side.
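
The ordering and batching rule described above can be sketched in a few lines of Python. This is an illustration, not the server code: the message shape (a dict with "data" and "meta"), the function names, and the (aggregateTo, values) tuples used as input are assumptions. What it tries to show is only the rule itself: results first, then side-effects key by key, with no message mixing results and side-effects or two different keys.

```python
from itertools import islice


def batches(iterable, size):
    """Yield lists of at most `size` items until the input is exhausted."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def stream(results, side_effects, batch_size):
    """Yield response messages: results first, then each side-effect key."""
    for chunk in batches(results, batch_size):
        yield {"data": chunk, "meta": {}}
    for key, (aggregate_to, values) in side_effects.items():
        # A new key always starts a new message, so batches never mix keys.
        for chunk in batches(values, batch_size):
            yield {
                "data": chunk,
                "meta": {"sideEffect": key, "aggregateTo": aggregate_to},
            }
```

Because every message carries at most one sideEffect key, a client can group the data by that key as messages arrive and re-assemble each side-effect once the stream ends.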

There is still a lot to do here:

1. Lots of code cleanup to say the least - some of the basic interfaces,
classes, etc. that I added may see some change as I review with a fresh mind
2. I'd like to make it optional to return side-effects so that drivers or
users can choose to opt out of the expense of sending that information back
if it isn't needed
3. Piggy-backing on 2, as mentioned earlier in this thread, I think it
would be nice if you could actively state as a user which side-effects you
wanted sent back when you submit the traversal. Not sure where that would
be specified right now given the way everything is hooked together.
4. Documentation is non-existent at this point beyond what I've tried to
lay out in this thread so I gotta get to that when all the change settles
down. I assume that won't happen until Marko gets back from his time off as
I suspect he'll think of a few extra things to do in making this all work

Anyway, please let me know if there are any thoughts on this approach.

On Fri, Jul 22, 2016 at 6:24 PM, Stephen Mallette 

> Yes, I expected to return results first and then stream the side-effects.
> On Fri, Jul 22, 2016 at 5:05 PM, Dylan Millikin 
> wrote:
>> > Perhaps nicer than doing all that trickery with transactions would be to
>> self-detach the vertex ahead of time
>> This was the original idea, I never dove too deep into it as the
>> sideEffects were applied mid traversal and extra filtering/SEs still had
>> to
>> occur. I wasn't sure it was actually possible and the transaction hack
>> allowed me to move on.
>> As for the GLV limitations, it's mostly going to be network overhead.
>> Unfortunately one round trip with the server is costly and I know that
>> we've ended up having to be creative in order to limit the round trips by
>> concatenating scripts for each query. A GLV approach would need some
>> careful planning and probably a multiline byteCode feature. But I digress
>> that's not what this thread is about.
>> In the spirit of GLVs returning side effects how would your original
>> proposition stream over the network? Would you get all data first and then
>> SE? I'm guessing you would want to stream the SEs as well.
>> On Fri, Jul 22, 2016 at 4:42 PM, Step

Re: [DISCUSS] Returning Side Effects

2016-07-22 Thread Stephen Mallette
Yes, I expected to return results first and then stream the side-effects.

On Fri, Jul 22, 2016 at 5:05 PM, Dylan Millikin 

> > Perhaps nicer than doing all that trickery with transactions would be to
> self-detach the vertex ahead of time
> This was the original idea, I never dove too deep into it as the
> sideEffects were applied mid traversal and extra filtering/SEs still had to
> occur. I wasn't sure it was actually possible and the transaction hack
> allowed me to move on.
> As for the GLV limitations, it's mostly going to be network overhead.
> Unfortunately one round trip with the server is costly and I know that
> we've ended up having to be creative in order to limit the round trips by
> concatenating scripts for each query. A GLV approach would need some
> careful planning and probably a multiline byteCode feature. But I digress
> that's not what this thread is about.
> In the spirit of GLVs returning side effects how would your original
> proposition stream over the network? Would you get all data first and then
> SE? I'm guessing you would want to stream the SEs as well.
> On Fri, Jul 22, 2016 at 4:42 PM, Stephen Mallette 
> wrote:
> > > You can take the case of a group count as a really simple example.
> >
> > So you want the side-effect in the Vertex itself so you can use it with
> the
> > ORM. Interesting. Perhaps nicer than doing all that trickery with
> > transactions would be to self-detach the vertex ahead of time (i.e.
> create
> > a DetachedVertex) and add the property you want. As indirect as that
> > sounds, that seems more direct to me than the "fake" transaction. Not
> sure
> > that what I'm doing here will help you with that problem.
> >
> > > I'll add that I'm looking at this from a non-GLV perspective so I'm
> > disregarding object mapping done through GraphSONv2.0 typing in favor of
> a
> > format guaranteed result set (say that either only contains vertices,
> >  edges, or a combination of both).
> >
> > Also interesting. Not sure that kind of serialization has a place in
> > TinkerPop where we encourage folks to return everything under the sun by
> > using Gremlin to return data in a form that suits their required end
> > result. if this is the outcome you want, I think that my suggestion with
> > self-detaching is probably on the right track. Maybe consider a custom
> > serializer that coerces all results to graph elements. That would take
> > care of all the embedded objects and the whole lot.
> >
> > > The reason for this is that GLV is too
> > inefficient for larger projects so a more traditional script->result
> > approach is required.
> >
> > I'm hijacking my own thread by going too deep down this path, but I think
> > we should strive toward a solution for GLVs to be robust enough for
> > developers to be successful with TinkerPop in the language of their
> choice.
> > Just like we'll never get rid of all lambdas in Gremlin, we will probably
> > never quite get rid of script->result for all use cases (but, again, like
> > lambdas the goal will be to get quite close). I find it quite interesting
> > that we might be able to figure out how a python dev could write Gremlin
> in
> > python that would remotely execute on the server seamlessly, however it's
> > also interesting that that same GLV code could be treated as server-side
> to
> > be accessed by from a python client. In that way, heavy complex logic
> (the
> > type you are talking about) could be written in python and then accessed
> > from python on the client. In short, i think that it would be better to
> > prefer to think of the work around GLVs as "how to make Gremlin good in
> > other languages" rather than the more narrow view of just "remoting
> > traversals".  If we go wider, we might come up with some good ideas to
> > really broaden access to TinkerPop and graphs in a very big way.
> >
> > We already have a really big improvement with "remoting" as compared to
> > good 'ol RexsterGraph - so that's something  - haha  ;)
> >
> >
> >
> >
> >
> >
> > On Fri, Jul 22, 2016 at 3:17 PM, Dylan Millikin <
> > wrote:
> >
> > > Yeah sorry I left out an important part. This is especially an issue
> when
> > > you're dealing with an ORM layer that's expecting results of a specific
> > > type (for example vertices).
> > > You can take the case of a group count as a really simple example. Your
> > > result set could be :
> > >
> > > [{count:5, vertex:v[1]}, {count:3, vertex:v[2]}, {count:1,
> vertex:v[3]}]
> > > and this is easy enough to do with gremlin. But unless this is built
> into
> > > the ORM itself chances are you'll need to implement the object mapping
> > > yourself.
> > >
> > > The alternative is to add "count" as a property of vertex and then you
> > can
> > > leverage all available features from your ORM such as filtering,
> > ordering,
> > > etc... Actually, the way we did it above we can also do those directly
> in
> > > gremlin as well.
> > >
> > > This is

Re: [DISCUSS] Returning Side Effects

2016-07-22 Thread Dylan Millikin
> Perhaps nicer than doing all that trickery with transactions would be to
> self-detach the vertex ahead of time

This was the original idea, I never dove too deep into it as the
sideEffects were applied mid traversal and extra filtering/SEs still had to
occur. I wasn't sure it was actually possible and the transaction hack
allowed me to move on.

As for the GLV limitations, it's mostly going to be network overhead.
Unfortunately one round trip with the server is costly and I know that
we've ended up having to be creative in order to limit the round trips by
concatenating scripts for each query. A GLV approach would need some
careful planning and probably a multiline byteCode feature. But I digress;
that's not what this thread is about.

In the spirit of GLVs returning side effects how would your original
proposition stream over the network? Would you get all data first and then
SE? I'm guessing you would want to stream the SEs as well.

On Fri, Jul 22, 2016 at 4:42 PM, Stephen Mallette 

> > You can take the case of a group count as a really simple example.
> So you want the side-effect in the Vertex itself so you can use it with the
> ORM. Interesting. Perhaps nicer than doing all that trickery with
> transactions would be to self-detach the vertex ahead of time (i.e. create
> a DetachedVertex) and add the property you want. As indirect as that
> sounds, that seems more direct to me than the "fake" transaction. Not sure
> that what I'm doing here will help you with that problem.
> > I'll add that I'm looking at this from a non-GLV perspective so I'm
> disregarding object mapping done through GraphSONv2.0 typing in favor of a
> format guaranteed result set (say that either only contains vertices,
>  edges, or a combination of both).
> Also interesting. Not sure that kind of serialization has a place in
> TinkerPop where we encourage folks to return everything under the sun by
> using Gremlin to return data in a form that suits their required end
> result. if this is the outcome you want, I think that my suggestion with
> self-detaching is probably on the right track. Maybe consider a custom
> serializer that coerces all results to graph elements. That would take
> care of all the embedded objects and the whole lot.
> > The reason for this is that GLV is too
> inefficient for larger projects so a more traditional script->result
> approach is required.
> I'm hijacking my own thread by going too deep down this path, but I think
> we should strive toward a solution for GLVs to be robust enough for
> developers to be successful with TinkerPop in the language of their choice.
> Just like we'll never get rid of all lambdas in Gremlin, we will probably
> never quite get rid of script->result for all use cases (but, again, like
> lambdas the goal will be to get quite close). I find it quite interesting
> that we might be able to figure out how a python dev could write Gremlin in
> python that would remotely execute on the server seamlessly, however it's
> also interesting that that same GLV code could be treated as server-side to
> be accessed by from a python client. In that way, heavy complex logic (the
> type you are talking about) could be written in python and then accessed
> from python on the client. In short, i think that it would be better to
> prefer to think of the work around GLVs as "how to make Gremlin good in
> other languages" rather than the more narrow view of just "remoting
> traversals".  If we go wider, we might come up with some good ideas to
> really broaden access to TinkerPop and graphs in a very big way.
> We already have a really big improvement with "remoting" as compared to
> good 'ol RexsterGraph - so that's something  - haha  ;)
> On Fri, Jul 22, 2016 at 3:17 PM, Dylan Millikin 
> wrote:
> > Yeah sorry I left out an important part. This is especially an issue when
> > you're dealing with an ORM layer that's expecting results of a specific
> > type (for example vertices).
> > You can take the case of a group count as a really simple example. Your
> > result set could be :
> >
> > [{count:5, vertex:v[1]}, {count:3, vertex:v[2]}, {count:1, vertex:v[3]}]
> > and this is easy enough to do with gremlin. But unless this is built into
> > the ORM itself chances are you'll need to implement the object mapping
> > yourself.
> >
> > The alternative is to add "count" as a property of vertex and then you
> can
> > leverage all available features from your ORM such as filtering,
> ordering,
> > etc... Actually, the way we did it above we can also do those directly in
> > gremlin as well.
> >
> > This is a simple case, but once it gets more complicated with
> hierarchical
> > data, the option of implementing the object mapping yourself is just a
> > headache and often times less efficient than just rolling back a
> > transaction.
> >
> > Dunno if that was clear enough this time around.
> >
> > I'll add that I'm looking at this from a non-GLV perspective so I'm
> > di

Re: [DISCUSS] Returning Side Effects

2016-07-22 Thread Stephen Mallette
> You can take the case of a group count as a really simple example.

So you want the side-effect in the Vertex itself so you can use it with the
ORM. Interesting. Perhaps nicer than doing all that trickery with
transactions would be to self-detach the vertex ahead of time (i.e. create
a DetachedVertex) and add the property you want. As indirect as that
sounds, that seems more direct to me than the "fake" transaction. Not sure
that what I'm doing here will help you with that problem.
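
To illustrate the self-detach idea in isolation, here is a sketch in plain Python. It does not use DetachedVertex or any TinkerPop API; vertices are modelled as plain dicts, and the function names are hypothetical. The point is only that the extra property lands on a copy, so nothing has to be written to the graph and rolled back.

```python
import copy


def detach_with_properties(vertex, extra):
    """Return a detached copy of `vertex` carrying the extra properties."""
    detached = copy.deepcopy(vertex)      # the original stays untouched
    detached["properties"].update(extra)
    return detached


def attach_counts(rows):
    """Turn [{count, vertex}, ...] rows into vertices with a count property."""
    return [
        detach_with_properties(row["vertex"], {"count": row["count"]})
        for row in rows
    ]
```

With a group count result shaped like the one discussed in this thread, attach_counts would give an ORM a list of vertex-shaped objects, each with "count" as an ordinary property.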

> I'll add that I'm looking at this from a non-GLV perspective so I'm
> disregarding object mapping done through GraphSONv2.0 typing in favor of a
> format guaranteed result set (say that either only contains vertices,
> edges, or a combination of both).

Also interesting. Not sure that kind of serialization has a place in
TinkerPop where we encourage folks to return everything under the sun by
using Gremlin to return data in a form that suits their required end
result. If this is the outcome you want, I think that my suggestion with
self-detaching is probably on the right track. Maybe consider a custom
serializer that coerces all results to graph elements. That would take
care of all the embedded objects and the whole lot.

> The reason for this is that GLV is too
> inefficient for larger projects so a more traditional script->result
> approach is required.

I'm hijacking my own thread by going too deep down this path, but I think
we should strive toward a solution for GLVs to be robust enough for
developers to be successful with TinkerPop in the language of their choice.
Just like we'll never get rid of all lambdas in Gremlin, we will probably
never quite get rid of script->result for all use cases (but, again, like
lambdas the goal will be to get quite close). I find it quite interesting
that we might be able to figure out how a Python dev could write Gremlin in
Python that would remotely execute on the server seamlessly; however, it's
also interesting that that same GLV code could be treated as server-side to
be accessed from a Python client. In that way, heavy complex logic (the
type you are talking about) could be written in Python and then accessed
from Python on the client. In short, I think that it would be better to
think of the work around GLVs as "how to make Gremlin good in
other languages" rather than the more narrow view of just "remoting
traversals".  If we go wider, we might come up with some good ideas to
really broaden access to TinkerPop and graphs in a very big way.

We already have a really big improvement with "remoting" as compared to
good 'ol RexsterGraph - so that's something  - haha  ;)

On Fri, Jul 22, 2016 at 3:17 PM, Dylan Millikin 

> Yeah sorry I left out an important part. This is especially an issue when
> you're dealing with an ORM layer that's expecting results of a specific
> type (for example vertices).
> You can take the case of a group count as a really simple example. Your
> result set could be :
> [{count:5, vertex:v[1]}, {count:3, vertex:v[2]}, {count:1, vertex:v[3]}]
> and this is easy enough to do with gremlin. But unless this is built into
> the ORM itself chances are you'll need to implement the object mapping
> yourself.
> The alternative is to add "count" as a property of vertex and then you can
> leverage all available features from your ORM such as filtering, ordering,
> etc... Actually, the way we did it above we can also do those directly in
> gremlin as well.
> This is a simple case, but once it gets more complicated with hierarchical
> data, the option of implementing the object mapping yourself is just a
> headache and often times less efficient than just rolling back a
> transaction.
> Dunno if that was clear enough this time around.
> I'll add that I'm looking at this from a non-GLV perspective so I'm
> disregarding object mapping done through GraphSONv2.0 typing in favor of a
> format guaranteed result set (say that either only contains vertices,
>  edges, or a combination of both). The reason for this is that GLV is too
> inefficient for larger projects so a more traditional script->result
> approach is required.
> On Fri, Jul 22, 2016 at 2:09 PM, Stephen Mallette 
> wrote:
> > hi dylan, could you please provide a more concrete example of the problem
> > you're facing?
> >
> > On Fri, Jul 22, 2016 at 1:24 PM, Dylan Millikin <
> > wrote:
> >
> > > I'm going to confirm that this is actually a common issue.
> > > One thing to keep in mind is that often times the sideEffects are
> > directly
> > > linked to returned elements on a 1 --> n basis which neither of the
> above
> > > really help with. That is to say that if you're streaming your results
> > > you'll need the sideEffects that relate to the streamed element.
> > >
> > > There is no easy way of handling this currently. Especially if you
> order
> > > your results and get unordered sideEffect results.
> > > One way we've found to work around this is very h

Re: [DISCUSS] Returning Side Effects

2016-07-22 Thread Dylan Millikin
Yeah sorry I left out an important part. This is especially an issue when
you're dealing with an ORM layer that's expecting results of a specific
type (for example vertices).
You can take the case of a group count as a really simple example. Your
result set could be:

[{count:5, vertex:v[1]}, {count:3, vertex:v[2]}, {count:1, vertex:v[3]}]
and this is easy enough to do with gremlin. But unless this is built into
the ORM itself chances are you'll need to implement the object mapping
yourself.

The alternative is to add "count" as a property of vertex and then you can
leverage all available features from your ORM such as filtering, ordering,
etc... Actually, the way we did it above we can also do those directly in
gremlin as well.

This is a simple case, but once it gets more complicated with hierarchical
data, the option of implementing the object mapping yourself is just a
headache and often times less efficient than just rolling back a
transaction.

Dunno if that was clear enough this time around.

I'll add that I'm looking at this from a non-GLV perspective so I'm
disregarding object mapping done through GraphSONv2.0 typing in favor of a
format guaranteed result set (say that either only contains vertices,
 edges, or a combination of both). The reason for this is that GLV is too
inefficient for larger projects so a more traditional script->result
approach is required.

On Fri, Jul 22, 2016 at 2:09 PM, Stephen Mallette 

> hi dylan, could you please provide a more concrete example of the problem
> you're facing?
> On Fri, Jul 22, 2016 at 1:24 PM, Dylan Millikin 
> wrote:
> > I'm going to confirm that this is actually a common issue.
> > One thing to keep in mind is that often times the sideEffects are
> directly
> > linked to returned elements on a 1 --> n basis which neither of the above
> > really help with. That is to say that if you're streaming your results
> > you'll need the sideEffects that relate to the streamed element.
> >
> > There is no easy way of handling this currently. Especially if you order
> > your results and get unordered sideEffect results.
> > One way we've found to work around this is very hacky, not efficient and
> > only works for non mutating queries:
> >
> > - we start a transaction
> > - we append the sideEffect data to the elements we're emitting (say as
> > properties of a vertex)
> > - get the full result set with sideEffects as properties of the result
> > elements.
> > - rollback transaction so properties are not persisted to the graph.
> >
> > A truly wicked succession of events born from absolute desperation.
> > I enquired a while back about the ability to treat elements as detached
> > from the graph in order to do the above without the transaction handling.
> > But I never followed up.
> >
> > I figured I would put this out there as another case where non-Java
> > languages struggle.
> >
> > On Thu, Jul 21, 2016 at 1:19 PM, Stephen Mallette 
> > wrote:
> >
> > > Your way made me think that if you wrote your traversal like that, you
> > > would return the side-effects twice - once in your traversal as part of
> > the
> > > standard result and then again as a side-effect.  Not sure what that
> > means
> > > - just a thought.
> > >
> > > While I'm thinking thoughts that may or may not be obvious, it also
> > occurs
> > > to me that the downside for a GLV retrieving data that way is that the
> > > result of the traversal won't be streamed back. It will aggregate the
> > > result (and the side-effects naturally) in memory and then return that
> > all
> > > as a whole.
> > >
> > > On Thu, Jul 21, 2016 at 11:24 AM, Daniel Kuppitz 
> > wrote:
> > >
> > > > If you really want to have your result and your side-effects returned
> > by
> > > a
> > > > single request, you could do something like this:
> > > >
> > > > gremlin>
> > > >
> > > >
> > >
> >
> g.V(1,2,4).aggregate("names").by("name").aggregate("ages").by("age")*.fold().as("data").select("data",
> > > > "names", "ages")*
> > > > ==>[data:[v[1], v[2], v[4]], names:[marko, vadas, josh], ages:[29,
> 27,
> > > 32]]
> > > > gremlin>
> > > >
> > > >
> > >
> >
> g.V(1,2,4).aggregate("names").by("name").aggregate("ages").by("age")*.fold().project("data",
> > > > "se").by().by(cap("names","ages"))*
> > > > ==>[data:[v[1], v[2], v[4]], se:[names:[marko, vadas, josh],
> ages:[29,
> > > 27,
> > > > 32]]]
> > > > gremlin>
> > g.V(1,2,4).aggregate("names").by("name")*.fold().project("data",
> > > > "se").by().by(cap("names"))*
> > > > ==>[data:[v[1], v[2], v[4]], se:[marko, vadas, josh]]
> > > >
> > > > I'm not saying it would be bad to have Gremlin Server handle that for
> > > you,
> > > > just wanted to show that it's actually pretty easy to get the data
> and
> > > the
> > > > side-effects without using the traversal admin methods (hence it
> should
> > > > work for all GLVs).
> > > >
> > > > Cheers,
> > > > Daniel
> > > >
> > > >
> > > > On Thu, Jul 21, 2016 at 4:51 PM, Stephen Mallette <
> >>
> >

Re: [DISCUSS] Returning Side Effects

2016-07-22 Thread Stephen Mallette
hi dylan, could you please provide a more concrete example of the problem
you're facing?

On Fri, Jul 22, 2016 at 1:24 PM, Dylan Millikin 

> I'm going to confirm that this is actually a common issue.
> One thing to keep in mind is that often times the sideEffects are directly
> linked to returned elements on a 1 --> n basis which neither of the above
> really help with. That is to say that if you're streaming your results
> you'll need the sideEffects that relate to the streamed element.
> There is no easy way of handling this currently. Especially if you order
> your results and get unordered sideEffect results.
> One way we've found to work around this is very hacky, not efficient and
> only works for non mutating queries:
> - we start a transaction
> - we append the sideEffect data to the elements we're emitting (say as
> properties of a vertex)
> - get the full result set with sideEffects as properties of the result
> elements.
> - rollback transaction so properties are not persisted to the graph.
> A truly wicked succession of events born from absolute desperation.
> I enquired a while back about the ability to treat elements as detached
> from the graph in order to do the above without the transaction handling.
> But I never followed up.
> I figured I would put this out there as another case where non-Java
> languages struggle.
> On Thu, Jul 21, 2016 at 1:19 PM, Stephen Mallette 
> wrote:
> > Your way made me think that if you wrote your traversal like that, you
> > would return the side-effects twice - once in your traversal as part of
> the
> > standard result and then again as a side-effect.  Not sure what that
> means
> > - just a thought.
> >
> > While I'm thinking thoughts that may or may not be obvious, it also
> occurs
> > to me that the downside for a GLV retrieving data that way is that the
> > result of the traversal won't be streamed back. It will aggregate the
> > result (and the side-effects naturally) in memory and then return that
> all
> > as a whole.
> >
> > On Thu, Jul 21, 2016 at 11:24 AM, Daniel Kuppitz 
> wrote:
> >
> > > If you really want to have your result and your side-effects returned
> > > by a single request, you could do something like this:
> > >
> > > gremlin> g.V(1,2,4).aggregate("names").by("name").aggregate("ages").by("age").fold().as("data").select("data", "names", "ages")
> > > ==>[data:[v[1], v[2], v[4]], names:[marko, vadas, josh], ages:[29, 27, 32]]
> > > gremlin> g.V(1,2,4).aggregate("names").by("name").aggregate("ages").by("age").fold().project("data", "se").by().by(cap("names","ages"))
> > > ==>[data:[v[1], v[2], v[4]], se:[names:[marko, vadas, josh], ages:[29, 27, 32]]]
> > > gremlin> g.V(1,2,4).aggregate("names").by("name").fold().project("data", "se").by().by(cap("names"))
> > > ==>[data:[v[1], v[2], v[4]], se:[marko, vadas, josh]]
> > >
> > > I'm not saying it would be bad to have Gremlin Server handle that for
> > > you, just wanted to show that it's actually pretty easy to get the data
> > > and the side-effects without using the traversal admin methods (hence it
> > > should work for all GLVs).
> > >
> > > Cheers,
> > > Daniel
> > >
> > >
> > > On Thu, Jul 21, 2016 at 4:51 PM, Stephen Mallette wrote:
> > >
> > > > As we look to build out GLVs and expand Gremlin into other programming
> > > > languages, one of the important aspects of doing this should be to
> > > > consider consistency across GLVs. We should try to prevent capabilities
> > > > of Java from being lost in Python, JS, etc.
> > > >
> > > > As we look at both RemoteGraph in Java and gremlin-python we find that
> > > > there is no way to get traversal side-effects. If you write a Traversal
> > > > and want side-effects from it, you have to write your traversal to
> > > > return them so that it comes back as part of the result set. Since
> > > > RemoteGraph and gremlin-python don't really allow you to directly
> > > > "submit a script" it's not as though you can execute a traversal once
> > > > for both the result and the side-effect and package them together in a
> > > > single request as you might do with a simple script request:
> > > >
> > > > $ curl -X POST -d "{\"gremlin\":\"t=g.V(1).values('name').aggregate('x');[v:t.toList(),se:t.getSideEffects().get('x')]\"}" http://localhost:8182
> > > >
> > > > {"requestId":"3d3258b2-e421-459a-bf53-ea1e58ece4aa","status":{"message":"","code":200,"attributes":{}},"result":{"data":[{"v":["marko"]},{"se":["marko"]}],"meta":{}}}
> > > >
> > > > I'm thinking that we could alter things in a non-breaking way to allow
> > > > optional return of side-effect data so that there is a way to have this
> > > > all streamed back without the need for the little workaround I just
> > > > demonstrated. For REST I thi

Re: [DISCUSS] Returning Side Effects

2016-07-22 Thread Dylan Millikin
I'm going to confirm that this is actually a common issue.
One thing to keep in mind is that the sideEffects are often directly
linked to returned elements on a 1 --> n basis, which neither of the above
really helps with. That is to say, if you're streaming your results
you'll need the sideEffects that relate to the streamed element.

There is no easy way of handling this currently, especially if you order
your results and get unordered sideEffect results.
One way we've found to work around this is very hacky, inefficient, and
only works for non-mutating queries:

- we start a transaction
- we append the sideEffect data to the elements we're emitting (say as
properties of a vertex)
- get the full result set with sideEffects as properties of the result
elements.
- roll back the transaction so properties are not persisted to the graph.

A truly wicked succession of events born from absolute desperation.
I enquired a while back about the ability to treat elements as detached
from the graph in order to do the above without the transaction handling.
But I never followed up.

I figured I would put this out there as another case where non-Java
languages struggle.
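
To make the 1 --> n pairing problem concrete, here is a minimal Python
sketch. It assumes, hypothetically, that each side-effect entry carries the
id of the element it belongs to; nothing in Gremlin Server guarantees that,
and the name join_side_effects, the tuple layout and the sample data are all
invented for illustration.

```python
from collections import defaultdict


def join_side_effects(results, side_effects):
    """Attach side-effect entries to the result they belong to.

    results: ordered list of dicts, each with an "id" key (assumed shape).
    side_effects: list of (element_id, value) pairs in arbitrary order
    (assumed shape; a real side-effect need not carry the element id).
    """
    by_id = defaultdict(list)
    for element_id, value in side_effects:
        by_id[element_id].append(value)
    # Keep the order of the results; the side-effect order does not matter.
    return [(r, by_id.get(r["id"], [])) for r in results]


# Results arrive ordered by name; side-effects arrive in traversal order.
results = [{"id": 4, "name": "josh"}, {"id": 1, "name": "marko"}]
side_effects = [(1, "knows:vadas"), (1, "knows:josh"), (4, "created:lop")]
joined = join_side_effects(results, side_effects)
```

Without a shared key such as the element id, the client has no way to
rebuild this pairing once the results have been reordered.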

On Thu, Jul 21, 2016 at 1:19 PM, Stephen Mallette wrote:

> Your way made me think that if you wrote your traversal like that, you
> would return the side-effects twice - once in your traversal as part of the
> standard result and then again as a side-effect.  Not sure what that means
> - just a thought.
> While I'm thinking thoughts that may or may not be obvious, it also occurs
> to me that the downside for a GLV retrieving data that way is that the
> result of the traversal won't be streamed back. It will aggregate the
> result (and the side-effects naturally) in memory and then return that all
> as a whole.
> On Thu, Jul 21, 2016 at 11:24 AM, Daniel Kuppitz wrote:
> > If you really want to have your result and your side-effects returned by
> > a single request, you could do something like this:
> >
> > gremlin> g.V(1,2,4).aggregate("names").by("name").aggregate("ages").by("age").fold().as("data").select("data", "names", "ages")
> > ==>[data:[v[1], v[2], v[4]], names:[marko, vadas, josh], ages:[29, 27, 32]]
> > gremlin> g.V(1,2,4).aggregate("names").by("name").aggregate("ages").by("age").fold().project("data", "se").by().by(cap("names","ages"))
> > ==>[data:[v[1], v[2], v[4]], se:[names:[marko, vadas, josh], ages:[29, 27, 32]]]
> > gremlin> g.V(1,2,4).aggregate("names").by("name").fold().project("data", "se").by().by(cap("names"))
> > ==>[data:[v[1], v[2], v[4]], se:[marko, vadas, josh]]
> >
> > I'm not saying it would be bad to have Gremlin Server handle that for
> > you, just wanted to show that it's actually pretty easy to get the data
> > and the side-effects without using the traversal admin methods (hence it
> > should work for all GLVs).
> >
> > Cheers,
> > Daniel
> >
> >
> > On Thu, Jul 21, 2016 at 4:51 PM, Stephen Mallette wrote:
> >
> > > As we look to build out GLVs and expand Gremlin into other programming
> > > languages, one of the important aspects of doing this should be to
> > > consider consistency across GLVs. We should try to prevent capabilities
> > > of Java from being lost in Python, JS, etc.
> > >
> > > As we look at both RemoteGraph in Java and gremlin-python we find that
> > > there is no way to get traversal side-effects. If you write a Traversal
> > > and want side-effects from it, you have to write your traversal to
> > > return them so that it comes back as part of the result set. Since
> > > RemoteGraph and gremlin-python don't really allow you to directly
> > > "submit a script" it's not as though you can execute a traversal once
> > > for both the result and the side-effect and package them together in a
> > > single request as you might do with a simple script request:
> > >
> > > $ curl -X POST -d "{\"gremlin\":\"t=g.V(1).values('name').aggregate('x');[v:t.toList(),se:t.getSideEffects().get('x')]\"}" http://localhost:8182
> > >
> > > {"requestId":"3d3258b2-e421-459a-bf53-ea1e58ece4aa","status":{"message":"","code":200,"attributes":{}},"result":{"data":[{"v":["marko"]},{"se":["marko"]}],"meta":{}}}
> > >
> > > I'm thinking that we could alter things in a non-breaking way to allow
> > > optional return of side-effect data so that there is a way to have this
> > > all streamed back without the need for the little workaround I just
> > > demonstrated. For REST I think we could just include a sideEffect
> > > request parameter that allowed for a list of side-effect keys to return.
> > > Perhaps a "*" could indicate that all should be returned.  The
> > > side-effects could be serialized into a key sibling to "data" called
> > > "sideEffect".
> > >
> > > I think a similar approach could be used for websockets and NIO where
> > > we could amend the protocol to accept that sideEffect parameter.

Re: [DISCUSS] Returning Side Effects

2016-07-21 Thread Stephen Mallette
Your way made me think that if you wrote your traversal like that, you
would return the side-effects twice - once in your traversal as part of the
standard result and then again as a side-effect.  Not sure what that means
- just a thought.

While I'm thinking thoughts that may or may not be obvious, it also occurs
to me that the downside for a GLV retrieving data that way is that the
result of the traversal won't be streamed back. It will aggregate the
result (and the side-effects naturally) in memory and then return that all
as a whole.
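
The difference can be sketched in plain Python, with a generator standing in
for a streamed traversal. The names stream_results and folded_result are
hypothetical and no Gremlin Server is involved; this only illustrates when
the first result becomes usable.

```python
def stream_results(vertices):
    """Yield one result at a time, as a streamed traversal would."""
    for v in vertices:
        yield v


def folded_result(vertices, names):
    """Build the whole payload first, as fold() plus cap() would."""
    return {"data": list(vertices), "se": {"names": list(names)}}


vertices = ["v[1]", "v[2]", "v[4]"]
names = ["marko", "vadas", "josh"]

# Streaming: the first element is usable before the rest has been read.
first = next(stream_results(vertices))

# Folding: nothing is usable until the complete map has been built.
whole = folded_result(vertices, names)
```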

On Thu, Jul 21, 2016 at 11:24 AM, Daniel Kuppitz  wrote:

> If you really want to have your result and your side-effects returned by a
> single request, you could do something like this:
> gremlin> g.V(1,2,4).aggregate("names").by("name").aggregate("ages").by("age").fold().as("data").select("data", "names", "ages")
> ==>[data:[v[1], v[2], v[4]], names:[marko, vadas, josh], ages:[29, 27, 32]]
> gremlin> g.V(1,2,4).aggregate("names").by("name").aggregate("ages").by("age").fold().project("data", "se").by().by(cap("names","ages"))
> ==>[data:[v[1], v[2], v[4]], se:[names:[marko, vadas, josh], ages:[29, 27, 32]]]
> gremlin> g.V(1,2,4).aggregate("names").by("name").fold().project("data", "se").by().by(cap("names"))
> ==>[data:[v[1], v[2], v[4]], se:[marko, vadas, josh]]
> I'm not saying it would be bad to have Gremlin Server handle that for you,
> just wanted to show that it's actually pretty easy to get the data and the
> side-effects without using the traversal admin methods (hence it should
> work for all GLVs).
> Cheers,
> Daniel
> On Thu, Jul 21, 2016 at 4:51 PM, Stephen Mallette wrote:
> > As we look to build out GLVs and expand Gremlin into other programming
> > languages, one of the important aspects of doing this should be to
> > consider consistency across GLVs. We should try to prevent capabilities of
> > Java from being lost in Python, JS, etc.
> >
> > As we look at both RemoteGraph in Java and gremlin-python we find that
> > there is no way to get traversal side-effects. If you write a Traversal
> > and want side-effects from it, you have to write your traversal to return
> > them so that it comes back as part of the result set. Since RemoteGraph
> > and gremlin-python don't really allow you to directly "submit a script"
> > it's not as though you can execute a traversal once for both the result
> > and the side-effect and package them together in a single request as you
> > might do with a simple script request:
> >
> > $ curl -X POST -d "{\"gremlin\":\"t=g.V(1).values('name').aggregate('x');[v:t.toList(),se:t.getSideEffects().get('x')]\"}" http://localhost:8182
> >
> > {"requestId":"3d3258b2-e421-459a-bf53-ea1e58ece4aa","status":{"message":"","code":200,"attributes":{}},"result":{"data":[{"v":["marko"]},{"se":["marko"]}],"meta":{}}}
> >
> > I'm thinking that we could alter things in a non-breaking way to allow
> > optional return of side-effect data so that there is a way to have this
> > all streamed back without the need for the little workaround I just
> > demonstrated. For REST I think we could just include a sideEffect request
> > parameter that allowed for a list of side-effect keys to return. Perhaps
> > a "*" could indicate that all should be returned.  The side-effects
> > could be serialized into a key sibling to "data" called "sideEffect".
> >
> > I think a similar approach could be used for websockets and NIO where we
> > could amend the protocol to accept that sideEffect parameter. We would
> > first stream results (marked with meta data to specify a "result") and
> > then stream side effects (again marked with meta data as such).
> >
> > I considered caching the Traversal instances so that a future request
> > could get the side effects, but for a variety of reasons I abandoned that
> > (the cache meant more heap and trying to get the right balance, new
> > transactions would have to be opened if the side-effect contained graph
> > elements, etc.)
> >
> > I like the approach of just maintaining our single request-response model
> > with the changes I proposed above. It seems to provide the least impact
> > with no new dependencies, is backward compatible and could be completely
> > optional to RemoteConnections.

Re: [DISCUSS] Returning Side Effects

2016-07-21 Thread Daniel Kuppitz
If you really want to have your result and your side-effects returned by a
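single request, you could do something like this:

gremlin> g.V(1,2,4).aggregate("names").by("name").aggregate("ages").by("age").fold().as("data").select("data", "names", "ages")
==>[data:[v[1], v[2], v[4]], names:[marko, vadas, josh], ages:[29, 27, 32]]
gremlin> g.V(1,2,4).aggregate("names").by("name").aggregate("ages").by("age").fold().project("data", "se").by().by(cap("names","ages"))
==>[data:[v[1], v[2], v[4]], se:[names:[marko, vadas, josh], ages:[29, 27, 32]]]
gremlin> g.V(1,2,4).aggregate("names").by("name").fold().project("data", "se").by().by(cap("names"))
==>[data:[v[1], v[2], v[4]], se:[marko, vadas, josh]]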

I'm not saying it would be bad to have Gremlin Server handle that for you,
just wanted to show that it's actually pretty easy to get the data and the
side-effects without using the traversal admin methods (hence it should
work for all GLVs).


On Thu, Jul 21, 2016 at 4:51 PM, Stephen Mallette wrote:

> As we look to build out GLVs and expand Gremlin into other programming
> languages, one of the important aspects of doing this should be to consider
> consistency across GLVs. We should try to prevent capabilities of Java from
> being lost in Python, JS, etc.
> As we look at both RemoteGraph in Java and gremlin-python we find that
> there is no way to get traversal side-effects. If you write a Traversal and
> want side-effects from it, you have to write your traversal to return them
> so that it comes back as part of the result set. Since RemoteGraph and
> gremlin-python don't really allow you to directly "submit a script" it's
> not as though you can execute a traversal once for both the result and the
> side-effect and package them together in a single request as you might do
> with a simple script request:
> $ curl -X POST -d
> "{\"gremlin\":\"t=g.V(1).values('name').aggregate('x');[v:t.toList(),se:t.getSideEffects().get('x')]\"}"
> http://localhost:8182
> {"requestId":"3d3258b2-e421-459a-bf53-ea1e58ece4aa","status":{"message":"","code":200,"attributes":{}},"result":{"data":[{"v":["marko"]},{"se":["marko"]}],"meta":{}}}
> I'm thinking that we could alter things in a non-breaking way to allow
> optional return of side-effect data so that there is a way to have this all
> streamed back without the need for the little workaround I just
> demonstrated. For REST I think we could just include a sideEffect request
> parameter that allowed for a list of side-effect keys to return. Perhaps
> a "*" could indicate that all should be returned.  The side-effects
> could be serialized into a key sibling to "data" called "sideEffect".
> I think a similar approach could be used for websockets and NIO where we
> could amend the protocol to accept that sideEffect parameter. We would
> first stream results (marked with meta data to specify a "result") and then
> stream side effects (again marked with meta data as such).
> I considered caching the Traversal instances so that a future request could
> get the side effects, but for a variety of reasons I abandoned that (the
> cache meant more heap and trying to get the right balance, new transactions
> would have to be opened if the side-effect contained graph elements, etc.)
> I like the approach of just maintaining our single request-response model
> with the changes I proposed above. It seems to provide the least impact with
> no new dependencies, is backward compatible and could be completely
> optional to RemoteConnections.
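
A sketch of the REST shape proposed in the quoted message, written in Python.
The sideEffect request parameter and the "sideEffect" response key are the
proposal's own names; everything else (the helper names, the exact payload
layout, the sample values) is assumed for illustration and is not an existing
Gremlin Server API.

```python
import json


def build_request(gremlin, side_effect_keys):
    """Build a request body carrying the proposed sideEffect parameter."""
    return json.dumps({"gremlin": gremlin, "sideEffect": list(side_effect_keys)})


def split_response(body):
    """Return (data, side_effects) from a response using the proposed layout."""
    result = json.loads(body)["result"]
    # "sideEffect" is assumed to sit next to "data", as the proposal suggests.
    return result["data"], result.get("sideEffect", {})


request = build_request("g.V(1).values('name').aggregate('x')", ["x"])

# Hand-written response in the assumed layout; not captured from a server.
response = json.dumps({
    "status": {"code": 200},
    "result": {"data": ["marko"], "sideEffect": {"x": ["marko"]}, "meta": {}},
})
data, side_effects = split_response(response)
```

A client that never sends the parameter would simply see no "sideEffect" key,
which is what would keep such a change backward compatible.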