Re: [DISCUSS] Returning Side Effects
so - in retrospect the unified streaming model for results and side-effects wasn't so awesome, and things needed to get reworked a bit. I changed it to the other option we had, which was to cache the side-effects on the server and return them on demand when requested. The benefit is that we only return data when it is requested:

gremlin> graph = RemoteGraph.open('conf/remote-graph.properties')
==>remotegraph[DriverServerConnection-localhost/127.0.0.1:8182 [graph=g]]
gremlin> g = graph.traversal()
==>graphtraversalsource[remotegraph[DriverServerConnection-localhost/127.0.0.1:8182 [graph=g]], standard]
gremlin> t = g.V(1).aggregate('a').outE("knows").aggregate("b").inV()
==>v[2]
==>v[4]
gremlin> se = t.getSideEffects();[]
gremlin> se.keys() // request the keys from the server and cache them locally for future calls
==>a
==>b
gremlin> se.get('a') // get the "a" side-effect from the server and cache it locally for future calls against "a"
==>v[1]
gremlin> se.get('b') // get the "b" side-effect from the server and cache it locally for future calls against "b"
==>e[7][1-knows->2]
==>e[8][1-knows->4]

The downside is that we have to hold the side-effects in a cache on the server, so there is some cost to doing that. I think that cost can be mitigated, though, if a Traversal is treated like a resource that we close(). The close() can then trigger the release of the side-effects on the server.
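On the client, the on-demand model above might look something like the following minimal sketch. It is not the actual RemoteGraph or driver code: `SideEffectSource` and `LazySideEffects` are hypothetical names, and `SideEffectSource` stands in for whatever actually sends the keys/gather/close requests to Gremlin Server. Each key is requested at most once and then served from a local cache, and `close()` is where the server would be told to release its copy.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical transport to the server. A real driver would send the
// "keys", "gather" and "close" requests over the wire.
interface SideEffectSource {
    Set<String> keys();
    List<Object> gather(String key);
    void close();
}

// Hypothetical client-side view of a remote traversal's side-effects:
// each piece of data is requested from the server at most once.
class LazySideEffects implements AutoCloseable {
    private final SideEffectSource source;
    private final Map<String, List<Object>> values = new HashMap<>();
    private Set<String> cachedKeys; // null until the first keys() call
    private boolean closed;

    LazySideEffects(SideEffectSource source) {
        this.source = source;
    }

    // First call goes to the server; later calls are answered locally.
    Set<String> keys() {
        if (cachedKeys == null) {
            ensureOpen();
            cachedKeys = source.keys();
        }
        return cachedKeys;
    }

    // Fetches the side-effect for a key once, then serves the local copy.
    List<Object> get(String key) {
        List<Object> cached = values.get(key);
        if (cached != null) return cached;
        ensureOpen();
        List<Object> fetched = source.gather(key);
        values.put(key, fetched);
        return fetched;
    }

    // Tells the server it can drop its cached side-effects. Values already
    // fetched stay readable; anything else can no longer be requested.
    @Override
    public void close() {
        if (!closed) {
            closed = true;
            source.close();
        }
    }

    private void ensureOpen() {
        if (closed) throw new IllegalStateException("side-effects were already released on the server");
    }
}
```

Keeping already-fetched values readable after `close()` is a choice made in this sketch, not something settled in the thread.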
As far as the protocol goes, I actually didn't change much from what I previously described, but there are now multiple ops on the TraversalOpProcessor for drivers to implement:

+ bytecode - send Traversal bytecode here and get the results of the Traversal
+ keys - get the side-effect keys of a previously executed Traversal (cached by the request id of the original traversal sent to bytecode), streamed back the same way we currently ship back traversal results
+ gather - get the side-effects for a key, streamed back the same way we currently ship traversal results, with the same extra metadata described previously in this thread (i.e. the sideEffect/aggregateTo keys/values)
+ close - release a particular set of side-effects from the cache

The implementation should be fairly straightforward, as the same streaming protocol is used to return the results of keys/gather as is used for bytecode (which is all the same as returning results from the Standard/SessionOpProcessor).
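A rough sketch of the server side of the bytecode/keys/gather/close ops listed above, under the assumption that side-effects are simply held in memory, keyed by the request id of the original bytecode request. `SideEffectCache` and its methods are hypothetical, and for brevity the sketch returns whole values where the real ops stream them back in batches.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical server-side store, keyed by the request id of the original
// "bytecode" request. Not TinkerPop's actual implementation.
class SideEffectCache {
    private final Map<UUID, Map<String, Object>> byRequest = new ConcurrentHashMap<>();

    // After a "bytecode" request has streamed its results, keep its side-effects.
    void put(UUID requestId, Map<String, Object> sideEffects) {
        byRequest.put(requestId, new HashMap<>(sideEffects));
    }

    // Routes the follow-up ops; "gather" needs a key, the others ignore it.
    Object handle(String op, UUID requestId, String key) {
        switch (op) {
            case "keys":
                return keys(requestId);
            case "gather":
                return gather(requestId, key);
            case "close":
                return close(requestId);
            default:
                throw new IllegalArgumentException("unsupported op: " + op);
        }
    }

    List<String> keys(UUID requestId) {
        return new ArrayList<>(lookup(requestId).keySet());
    }

    Object gather(UUID requestId, String key) {
        Map<String, Object> sideEffects = lookup(requestId);
        if (!sideEffects.containsKey(key)) {
            throw new IllegalArgumentException("no side-effect named " + key);
        }
        return sideEffects.get(key);
    }

    // Releases the memory held for this request; false if nothing was cached.
    boolean close(UUID requestId) {
        return byRequest.remove(requestId) != null;
    }

    private Map<String, Object> lookup(UUID requestId) {
        Map<String, Object> sideEffects = byRequest.get(requestId);
        if (sideEffects == null) {
            throw new IllegalStateException("no cached side-effects for request " + requestId);
        }
        return sideEffects;
    }
}
```

Entries in this sketch live until a close request arrives, which is the server-side cost mentioned above: a client that never closes leaves its side-effects cached.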
Re: [DISCUSS] Returning Side Effects
I have a rough cut of "returning side-effects" working on the TINKERPOP-1278 branch. I didn't bother making this change for REST at this time, as I felt it was more important and useful to have it work for websockets/NIO: the drivers that would ultimately power a RemoteConnection are generally written for that interface.

gremlin> graph = RemoteGraph.open('conf/remote-graph.properties')
==>remotegraph[DriverServerConnection-localhost/127.0.0.1:8182 [graph=g]]
gremlin> g = graph.traversal()
==>graphtraversalsource[remotegraph[DriverServerConnection-localhost/127.0.0.1:8182 [graph=g]], standard]
gremlin> t = g.V(1).aggregate('a').outE("knows").aggregate("b").inV()
==>v[2]
==>v[4]
gremlin> t.getSideEffects().get('a')
==>v[1]
gremlin> t.getSideEffects().get('b')
==>e[7][1-knows->2]
==>e[8][1-knows->4]

It was more effort than I expected to get this to work, mostly because of my attempts to do it all without a breaking change. It was also interesting (and nice) to see that the protocol didn't need to change structurally for this to work; however, drivers will need to adjust a bit to deal with side-effects now streaming back after the results. Note that this only matters for drivers that support submitting Traversals as Bytecode (which I assume is "none" of them). Existing script submissions should still have the same behavior, and thus a terminating stream with the final result (side-effects left on the server as always).

To allow side-effects to come back, I added two pieces of metadata to a ResponseMessage:

1. sideEffect - the side-effect key that the data in the message belongs to. For instance, in the example above there would be values for "a" and "b" at different points in the stream.
2. aggregateTo - one of map, list, bulkset, or none. The significance here is that we needed a way to tell the client how a batch of results should be reassembled. Recall that Gremlin Server iterates everything: if you return a String, it puts the String into an Iterator for the response. There needed to be a way to say that a particular side-effect was converted to an Iterator so that it could be reassembled (or not) into its original type.

As for the streaming model, Gremlin Server iterates the results first and then the side-effects by key. Recall that a ResponseMessage batches up results returned from the server based on the iteration size. I've arranged it so that a ResponseMessage will never mix results with side-effects, or one side-effect key with another. In this way, it's easy to tie the sideEffect/aggregateTo values to the data within the message, which made it pretty easy for me to assemble the stream of side-effects into something useful on the client side.

There is still a lot to do here:

1. Lots of code cleanup, to say the least - some of the basic interfaces, classes, etc. that I added may see some change as I review them with a fresh mind tomorrow.
2. I'd like to make returning side-effects optional, so that drivers or users can opt out of the expense of sending that information back if it isn't needed.
3. Piggy-backing on 2, as mentioned earlier in this thread, I think it would be nice if you could explicitly state as a user which side-effects you want sent back when you submit the traversal. I'm not sure where that would be specified right now, given the way everything is hooked together.
4. Documentation is non-existent at this point beyond what I've tried to lay out in this thread, so I'll get to that when all the change settles down. I assume that won't happen until Marko gets back from his time off, as I suspect he'll think of a few extra things to do to make this all work well.

Anyway, please let me know if there are any thoughts on this approach.
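The batching rule and the aggregateTo metadata described above can be illustrated with a small, self-contained sketch. `Batch` is a hypothetical stand-in for a ResponseMessage carrying the sideEffect/aggregateTo metadata (a null sideEffect meaning traversal results), and `SideEffectStream` is likewise hypothetical. Only the list, map and none shapes are handled; bulkset is left out because rebuilding it would need TinkerPop's BulkSet. Streaming a map as its entries is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a ResponseMessage: one batch of data plus the two
// metadata fields. A null sideEffect means the batch holds traversal results.
final class Batch {
    final String sideEffect;
    final String aggregateTo;
    final List<Object> data;

    Batch(String sideEffect, String aggregateTo, List<Object> data) {
        this.sideEffect = sideEffect;
        this.aggregateTo = aggregateTo;
        this.data = data;
    }
}

final class SideEffectStream {
    // Server side: results first, then each side-effect key in turn. Every
    // source is batched separately, so a batch never mixes results with
    // side-effects or one side-effect key with another.
    static List<Batch> toBatches(List<Object> results, Map<String, Object> sideEffects, int batchSize) {
        List<Batch> out = new ArrayList<>();
        addBatches(out, null, null, results.iterator(), batchSize);
        for (Map.Entry<String, Object> e : sideEffects.entrySet()) {
            Object value = e.getValue();
            if (value instanceof Map) {
                // Assumption: a map is streamed as its entries.
                addBatches(out, e.getKey(), "map", ((Map<?, ?>) value).entrySet().iterator(), batchSize);
            } else if (value instanceof List) {
                addBatches(out, e.getKey(), "list", ((List<?>) value).iterator(), batchSize);
            } else {
                // A single value (e.g. a String) is wrapped so it can be streamed too.
                addBatches(out, e.getKey(), "none", Collections.singletonList(value).iterator(), batchSize);
            }
        }
        return out;
    }

    private static void addBatches(List<Batch> out, String key, String aggregateTo,
                                   Iterator<?> items, int batchSize) {
        List<Object> current = new ArrayList<>();
        while (items.hasNext()) {
            current.add(items.next());
            if (current.size() == batchSize) {
                out.add(new Batch(key, aggregateTo, current));
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) out.add(new Batch(key, aggregateTo, current));
    }

    // Client side: collect each key's items across batches, then rebuild the
    // original shape from aggregateTo. Result batches are skipped here.
    static Map<String, Object> reassemble(List<Batch> batches) {
        Map<String, List<Object>> items = new LinkedHashMap<>();
        Map<String, String> shapes = new LinkedHashMap<>();
        for (Batch b : batches) {
            if (b.sideEffect == null) continue;
            items.computeIfAbsent(b.sideEffect, k -> new ArrayList<>()).addAll(b.data);
            shapes.put(b.sideEffect, b.aggregateTo);
        }
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<Object>> e : items.entrySet()) {
            List<Object> data = e.getValue();
            switch (shapes.get(e.getKey())) {
                case "list":
                    out.put(e.getKey(), data);
                    break;
                case "map":
                    Map<Object, Object> map = new LinkedHashMap<>();
                    for (Object o : data) {
                        Map.Entry<?, ?> entry = (Map.Entry<?, ?>) o;
                        map.put(entry.getKey(), entry.getValue());
                    }
                    out.put(e.getKey(), map);
                    break;
                case "none":
                    out.put(e.getKey(), data.get(0));
                    break;
                default:
                    throw new IllegalArgumentException("unsupported aggregateTo: " + shapes.get(e.getKey()));
            }
        }
        return out;
    }
}
```

Because no batch mixes sources, the client only has to look at each message's metadata to know where its data belongs; it never has to split a batch.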
Re: [DISCUSS] Returning Side Effects
Yes, I expected to return results first and then stream the side-effects.
Re: [DISCUSS] Returning Side Effects
> Perhaps nicer than doing all that trickery with transactions would be to self-detach the vertex ahead of time

This was the original idea. I never dove too deep into it because the sideEffects were applied mid-traversal and extra filtering/side-effects still had to occur. I wasn't sure it was actually possible, and the transaction hack allowed me to move on.

As for the GLV limitations, it's mostly going to be network overhead. Unfortunately, a round trip to the server is costly, and I know we've ended up having to be creative to limit round trips by concatenating scripts for each query. A GLV approach would need some careful planning and probably a multi-line bytecode feature. But I digress; that's not what this thread is about.

In the spirit of GLVs returning side-effects, how would your original proposal stream over the network? Would you get all the data first and then the side-effects? I'm guessing you would want to stream the side-effects as well.
Re: [DISCUSS] Returning Side Effects
> You can take the case of a group count as a really simple example.

So you want the side-effect in the Vertex itself so you can use it with the ORM. Interesting. Perhaps nicer than doing all that trickery with transactions would be to self-detach the vertex ahead of time (i.e. create a DetachedVertex) and add the property you want. As indirect as that sounds, it seems more direct to me than the "fake" transaction. I'm not sure that what I'm doing here will help you with that problem.

> I'll add that I'm looking at this from a non-GLV perspective so I'm disregarding object mapping done through GraphSONv2.0 typing in favor of a format guaranteed result set (say that either only contains vertices, edges, or a combination of both).

Also interesting. I'm not sure that kind of serialization has a place in TinkerPop, where we encourage folks to return everything under the sun by using Gremlin to shape data into a form that suits their required end result. If this is the outcome you want, I think my suggestion of self-detaching is probably on the right track. Maybe consider a custom serializer that coerces all results to graph elements. That would take care of all the embedded objects and the whole lot.

> The reason for this is that GLV is too inefficient for larger projects so a more traditional script->result approach is required.

I'm hijacking my own thread by going too deep down this path, but I think we should strive toward a solution where GLVs are robust enough for developers to be successful with TinkerPop in the language of their choice. Just as we'll never get rid of all lambdas in Gremlin, we will probably never quite get rid of script->result for all use cases (but, again, as with lambdas, the goal will be to get quite close).
I find it quite interesting that we might be able to figure out how a Python dev could write Gremlin in Python that would execute remotely on the server seamlessly; however, it's also interesting that the same GLV code could be treated as server-side code to be accessed from a Python client. In that way, heavy, complex logic (the type you are talking about) could be written in Python and then accessed from Python on the client. In short, I think it would be better to think of the work around GLVs as "how to make Gremlin good in other languages" rather than the narrower view of just "remoting traversals". If we go wider, we might come up with some good ideas to really broaden access to TinkerPop and graphs in a very big way.

We already have a really big improvement with "remoting" as compared to good 'ol RexsterGraph - so that's something - haha ;)
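The self-detach suggestion could be sketched in plain Java as follows. This assumes a simplified in-memory copy of a vertex rather than TinkerPop's actual DetachedVertex class. `DetachedVertexCopy` and `GroupCountDecorator` are hypothetical names. The group count lands on a copy as a "count" property, so nothing is written to the graph and no transaction needs to be rolled back.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a detached vertex: an in-memory copy
// that is not connected to the graph, so adding properties never touches storage.
final class DetachedVertexCopy {
    final Object id;
    final String label;
    final Map<String, Object> properties;

    DetachedVertexCopy(Object id, String label, Map<String, Object> properties) {
        this.id = id;
        this.label = label;
        this.properties = Collections.unmodifiableMap(new LinkedHashMap<>(properties));
    }

    // Returns a new copy with one extra property; this instance is unchanged.
    DetachedVertexCopy withProperty(String key, Object value) {
        Map<String, Object> props = new LinkedHashMap<>(properties);
        props.put(key, value);
        return new DetachedVertexCopy(id, label, props);
    }
}

final class GroupCountDecorator {
    // Folds a group-count side-effect (vertex id -> count) into the vertices as
    // a "count" property, so an ORM expecting vertices can consume the result.
    static List<DetachedVertexCopy> withCounts(List<DetachedVertexCopy> vertices, Map<Object, Long> counts) {
        List<DetachedVertexCopy> out = new ArrayList<>();
        for (DetachedVertexCopy v : vertices) {
            Long count = counts.get(v.id);
            out.add(count == null ? v : v.withProperty("count", count));
        }
        return out;
    }
}
```

Vertices without an entry in the group count are passed through unchanged here; whether that or a default of zero suits an ORM better is left open.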
Re: [DISCUSS] Returning Side Effects
Yeah, sorry, I left out an important part. This is especially an issue when you're dealing with an ORM layer that's expecting results of a specific type (for example, vertices). You can take the case of a group count as a really simple example. Your result set could be:

[{count:5, vertex:v[1]}, {count:3, vertex:v[2]}, {count:1, vertex:v[3]}]

and this is easy enough to do with Gremlin. But unless this is built into the ORM itself, chances are you'll need to implement the object mapping yourself.

The alternative is to add "count" as a property of the vertex, and then you can leverage all available features of your ORM such as filtering, ordering, etc. (Actually, with the result shape above, we can also do those directly in Gremlin.)

This is a simple case, but once it gets more complicated with hierarchical data, implementing the object mapping yourself is just a headache and oftentimes less efficient than just rolling back a transaction.

Dunno if that was clear enough this time around.

I'll add that I'm looking at this from a non-GLV perspective, so I'm disregarding object mapping done through GraphSONv2.0 typing in favor of a result set with a guaranteed format (say, one that contains only vertices, only edges, or a combination of both). The reason for this is that GLV is too inefficient for larger projects, so a more traditional script->result approach is required.
Re: [DISCUSS] Returning Side Effects
Hi Dylan, could you please provide a more concrete example of the problem you're facing?

On Fri, Jul 22, 2016 at 1:24 PM, Dylan Millikin wrote:
> I'm going to confirm that this is actually a common issue.
Re: [DISCUSS] Returning Side Effects
I'm going to confirm that this is actually a common issue. One thing to keep in mind is that oftentimes the sideEffects are directly linked to returned elements on a 1-to-n basis, which neither of the above approaches really helps with. That is to say, if you're streaming your results, you'll need the sideEffects that relate to each streamed element.

There is no easy way of handling this currently, especially if you order your results and get unordered sideEffect results. One way we've found to work around this is very hacky, not efficient and only works for non-mutating queries:

- we start a transaction
- we append the sideEffect data to the elements we're emitting (say as properties of a vertex)
- we get the full result set with sideEffects as properties of the result elements
- we roll back the transaction so the properties are not persisted to the graph

A truly wicked succession of events born from absolute desperation. I enquired a while back about the ability to treat elements as detached from the graph in order to do the above without the transaction handling, but I never followed up.

I figured I would put this out there as another case where non-Java languages struggle.

On Thu, Jul 21, 2016 at 1:19 PM, Stephen Mallette wrote:
> Your way made me think that if you wrote your traversal like that, you would return the side-effects twice - once in your traversal as part of the standard result and then again as a side-effect.
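The transaction workaround described in the steps above can be sketched in Python against a toy in-memory store. `ToyTransactionalGraph`, `results_with_side_effects` and the `sideEffect` property name are hypothetical stand-ins, not TinkerPop or gremlin-python APIs. The sketch only shows the shape of the trick: each emitted element carries its own side-effect data, and the rollback discards every write, which is also why it is limited to read-only queries.

```python
import copy
from typing import Any, Callable, Dict, Iterable, List

Vertex = Dict[str, Any]


class ToyTransactionalGraph:
    """Hypothetical in-memory stand-in for a transactional graph (not a TinkerPop API)."""

    def __init__(self, vertices: Dict[int, Vertex]) -> None:
        # Work on a private copy so the caller's data is never touched.
        self.vertices = copy.deepcopy(vertices)
        self._snapshot: Dict[int, Vertex] = {}

    def begin(self) -> None:
        # Remember the committed state so rollback() can restore it.
        self._snapshot = copy.deepcopy(self.vertices)

    def rollback(self) -> None:
        # Discard every change made since begin(), including any "real" writes.
        self.vertices = self._snapshot


def results_with_side_effects(
    graph: ToyTransactionalGraph,
    ids: Iterable[int],
    side_effect_for: Callable[[int], Any],
) -> List[Vertex]:
    id_list = list(ids)
    graph.begin()  # 1. start a transaction
    try:
        for vid in id_list:
            # 2. append the side-effect data to the emitted element as a property
            graph.vertices[vid]["sideEffect"] = side_effect_for(vid)
        # 3. read the full result set, side-effects included, in result order
        return [dict(graph.vertices[vid]) for vid in id_list]
    finally:
        graph.rollback()  # 4. roll back so nothing is persisted


graph = ToyTransactionalGraph({1: {"name": "marko"}, 2: {"name": "vadas"}})
rows = results_with_side_effects(graph, [2, 1], lambda vid: vid * 10)
print(rows)               # [{'name': 'vadas', 'sideEffect': 20}, {'name': 'marko', 'sideEffect': 10}]
print(graph.vertices[1])  # {'name': 'marko'}
```

Because the side-effect value rides on the element itself, ordering the results cannot separate an element from its side data, which is the 1-to-n association problem being described.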
Re: [DISCUSS] Returning Side Effects
Your way made me think that if you wrote your traversal like that, you would return the side-effects twice - once in your traversal as part of the standard result and then again as a side-effect. Not sure what that means - just a thought.

While I'm thinking thoughts that may or may not be obvious, it also occurs to me that the downside for a GLV retrieving data that way is that the result of the traversal won't be streamed back. Because of the fold(), the result (and, naturally, the side-effects) will be aggregated in memory and then returned all at once.

On Thu, Jul 21, 2016 at 11:24 AM, Daniel Kuppitz wrote:
> If you really want to have your result and your side-effects returned by a single request, you could do something like this:
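The streaming concern can be illustrated with plain Python generators. This is only an analogy: none of these functions are Gremlin or driver APIs. With a streamed response the first result is available after a single element has been computed, whereas a fold()-style step has to compute every element before it can emit its one list.

```python
from typing import Iterable, Iterator, List


def traversal(items: Iterable[str], computed: List[str]) -> Iterator[str]:
    """Stand-in for a traversal that records each result as it is computed."""
    for item in items:
        computed.append(item)
        yield item


def streamed(results: Iterator[str]) -> Iterator[str]:
    """Streamed response: each result can go to the client as soon as it exists."""
    yield from results


def folded(results: Iterator[str]) -> Iterator[List[str]]:
    """fold()-style response: one list, emitted only after every result is computed."""
    yield list(results)


names = ["marko", "vadas", "josh"]

computed: List[str] = []
first = next(streamed(traversal(names, computed)))
print(first, computed)  # marko ['marko']

computed = []
first = next(folded(traversal(names, computed)))
print(first, computed)  # ['marko', 'vadas', 'josh'] ['marko', 'vadas', 'josh']
```

For a large result the folded form holds everything in memory at once, which is the cost being pointed out for GLVs that rely on this workaround.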
Re: [DISCUSS] Returning Side Effects
If you really want to have your result and your side-effects returned by a single request, you could do something like this:

gremlin> g.V(1,2,4).aggregate("names").by("name").aggregate("ages").by("age").fold().as("data").select("data", "names", "ages")
==>[data:[v[1], v[2], v[4]], names:[marko, vadas, josh], ages:[29, 27, 32]]
gremlin> g.V(1,2,4).aggregate("names").by("name").aggregate("ages").by("age").fold().project("data", "se").by().by(cap("names","ages"))
==>[data:[v[1], v[2], v[4]], se:[names:[marko, vadas, josh], ages:[29, 27, 32]]]
gremlin> g.V(1,2,4).aggregate("names").by("name").fold().project("data", "se").by().by(cap("names"))
==>[data:[v[1], v[2], v[4]], se:[marko, vadas, josh]]

I'm not saying it would be bad to have Gremlin Server handle that for you, just wanted to show that it's actually pretty easy to get the data and the side-effects without using the traversal admin methods (hence it should work for all GLVs).

Cheers,
Daniel

On Thu, Jul 21, 2016 at 4:51 PM, Stephen Mallette wrote:
> As we look to build out GLVs and expand Gremlin into other programming languages, one of the important aspects of doing this should be to consider consistency across GLVs. We should try to prevent capabilities of Java from being lost in Python, JS, etc.
>
> As we look at both RemoteGraph in Java and gremlin-python we find that there is no way to get traversal side-effects. If you write a Traversal and want side-effects from it, you have to write your traversal to return them so that it comes back as part of the result set. Since RemoteGraph and gremlin-python don't really allow you to directly "submit a script", it's not as though you can execute a traversal once for both the result and the side-effect and package them together in a single request as you might do with a simple script request:
>
> $ curl -X POST -d "{\"gremlin\":\"t=g.V(1).values('name').aggregate('x');[v:t.toList(),se:t.getSideEffects().get('x')]\"}" http://localhost:8182
> {"requestId":"3d3258b2-e421-459a-bf53-ea1e58ece4aa","status":{"message":"","code":200,"attributes":{}},"result":{"data":[{"v":["marko"]},{"se":["marko"]}],"meta":{}}}
>
> I'm thinking that we could alter things in a non-breaking way to allow optional return of side-effect data so that there is a way to have this all streamed back without the need for the little workaround I just demonstrated. For REST I think we could just include a sideEffect request parameter that allowed for a list of side-effect keys to return. Perhaps a "*" could indicate that all should be returned. The side-effects could be serialized into a key sibling to "data" called "sideEffect".
>
> I think a similar approach could be used for websockets and NIO where we could amend the protocol to accept that sideEffect parameter. We would first stream results (marked with meta data to specify a "result") and then stream side effects (again marked with meta data as such).
>
> I considered caching the Traversal instances so that a future request could get the side effects, but for a variety of reasons I abandoned that (the cache meant more heap and trying to get the right balance, new transactions would have to be opened if the side-effect contained graph elements, etc.)
>
> I like the approach of just maintaining our single request-response model with the changes I proposed above. It seems to provide the least impact with no new dependencies, is backward compatible and could be completely optional to RemoteConnections.
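A rough sketch of how the proposed sideEffect parameter might behave, in Python with only the standard library. Everything here is hypothetical: the proposal was not specified at this level of detail, and the function names and the meta fields "type" and "key" are assumptions. It builds the request body (letting json.dumps do the quote escaping that the curl example does by hand), filters side-effects with "*" meaning all keys, places them in a "sideEffect" key next to "data" for REST, and for websockets/NIO yields result messages first and side-effect messages after, one item per message for simplicity.

```python
import json
from typing import Any, Dict, Iterator, List, Optional


def build_request(gremlin: str, side_effect_keys: Optional[List[str]] = None) -> str:
    """Build a REST request body; json.dumps handles the escaping done by hand in curl."""
    body: Dict[str, Any] = {"gremlin": gremlin}
    if side_effect_keys is not None:
        body["sideEffect"] = side_effect_keys  # proposed, optional parameter
    return json.dumps(body)


def select_side_effects(side_effects: Dict[str, Any], requested: List[str]) -> Dict[str, Any]:
    """Keep only the requested side-effect keys; "*" selects all of them."""
    if "*" in requested:
        return dict(side_effects)
    return {key: side_effects[key] for key in requested if key in side_effects}


def rest_response(request_id: str, data: List[Any], side_effects: Dict[str, Any],
                  requested: Optional[List[str]]) -> Dict[str, Any]:
    """REST: side-effects go in a "sideEffect" key that is a sibling of "data"."""
    result: Dict[str, Any] = {"data": data, "meta": {}}
    if requested:
        result["sideEffect"] = select_side_effects(side_effects, requested)
    return {"requestId": request_id,
            "status": {"message": "", "code": 200, "attributes": {}},
            "result": result}


def stream_messages(data: List[Any], side_effects: Dict[str, Any],
                    requested: Optional[List[str]]) -> Iterator[Dict[str, Any]]:
    """Websockets/NIO: stream results first, then side-effects, each tagged in meta."""
    for item in data:
        yield {"data": [item], "meta": {"type": "result"}}  # hypothetical meta field
    for key, value in select_side_effects(side_effects, requested or []).items():
        yield {"data": [value], "meta": {"type": "sideEffect", "key": key}}


side_effects = {"x": ["marko"]}
print(build_request("g.V(1).values('name').aggregate('x')", ["x"]))
print(rest_response("example-id", ["marko"], side_effects, ["*"])["result"])
for message in stream_messages(["marko"], side_effects, ["x"]):
    print(message["meta"])
```

Keeping the side-effects in a separate, optional key is what would make the change non-breaking: a client that never sends sideEffect gets the same response shape it gets today.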