Re: [Neo4j] Gremlin help

2011-10-25 Thread Nuo Yan
On Tue, Oct 25, 2011 at 12:35 PM, Peter Neubauer <
peter.neuba...@neotechnology.com> wrote:

> Yes,
> that is true. We are still in QA with 1.5 GA; expect it during the
> next few weeks, as we are hunting down potential HA issues. I hope it is
> OK to wait a few more days?
>


Sure, no problem. :)





Re: [Neo4j] Gremlin help

2011-10-25 Thread Peter Neubauer
Yes,
that is true. We are still in QA with 1.5 GA; expect it during the
next few weeks, as we are hunting down potential HA issues. I hope it is
OK to wait a few more days?

Cheers,

/peter neubauer

GTalk:      neubauer.peter
Skype       peter.neubauer
Phone       +46 704 106975
LinkedIn   http://www.linkedin.com/in/neubauer
Twitter      http://twitter.com/peterneubauer

http://www.neo4j.org              - NOSQL for the Enterprise.
http://startupbootcamp.org/    - Öresund - Innovation happens HERE.




Re: [Neo4j] Gremlin help

2011-10-25 Thread Nuo Yan
Hi Marko,

I believe the 1.5 milestone release has Gremlin 1.3 and Blueprints 1.0, but
until the 1.5 stable release is out I'm going to be using 1.4.x. Version 1.4.2
only has Gremlin 1.2 and doesn't appear to have setTransactionBufferSize.


Re: [Neo4j] Gremlin help

2011-10-25 Thread Peter Neubauer
Cool. Keep it coming Nuo!

Cheers,

/peter neubauer


Re: [Neo4j] Gremlin help

2011-10-25 Thread Marko Rodriguez
Hi,

Note that with Blueprints 1.0, you do not have to deal with a commit manager. 
You can do:

graph.setTransactionBufferSize(50);

...and then simply do your traversal. No manager.incrCounter() needed. I believe
the latest Neo4j release uses Gremlin 1.3 and Blueprints 1.0. Peter, can you confirm?

Take care,
Marko.

http://markorodriguez.com
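
For anyone following along, the idea behind a transaction buffer of size 50 (or a commit manager created with 50) can be pictured with a toy model. This is only a sketch in plain Groovy: BufferedCommitter, mutate() and flush() are hypothetical names made up for illustration and are not part of Blueprints or Gremlin. It just counts how many commits happen when mutations are batched.

```groovy
// Hypothetical stand-in for a buffered transaction: NOT the Blueprints API.
class BufferedCommitter {
    final int bufferSize
    int pending = 0      // mutations since the last commit
    int commits = 0      // number of commits performed so far

    BufferedCommitter(int bufferSize) { this.bufferSize = bufferSize }

    // Call once per mutation; commits automatically when the buffer is full.
    void mutate() {
        pending++
        if (pending >= bufferSize) flush()
    }

    // Commit whatever is pending (call once at the end for the remainder).
    void flush() {
        if (pending > 0) { commits++; pending = 0 }
    }
}

def committer = new BufferedCommitter(50)
120.times { committer.mutate() }   // 120 mutations: two full buffers, 20 left over
committer.flush()                  // final partial commit
println committer.commits          // 3 commits instead of 120
```

The final flush() plays the role that manager.close() plays in the commit-manager version: without it, the last partial batch would never be committed.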


Re: [Neo4j] Gremlin help

2011-10-25 Thread Nuo Yan
For the record, in case someone else has a similar need, I came up with the
following query, which does what I described in my last email below (I'm still
on Gremlin 1.2, so still using the commit manager):

manager = TransactionalGraphHelper.createCommitManager(g, 50);
g.v(1).out('foo').
  transform{[it, it.name, it.outE('bar').count()]}.
  aggregate().cap.next().
  groupBy{it[1]}.
  each{key, value ->
    value.sort{a,b -> b[2] <=> a[2]}.
      eachWithIndex{a,i -> if(i > 0) {g.removeVertex(a[0]); manager.incrCounter()}}
  }
manager.close();

After going through this I have a much better understanding of Gremlin. Thanks,
Peter and Marko.
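
To see what the groupBy / sort / eachWithIndex part of the query does without a graph at hand, here is a minimal plain-Groovy sketch. It assumes each child vertex is modelled as a [id, name, barDegree] list, using the example values from this thread, and it collects ids into a list (removed, a made-up name) where the real query would call g.removeVertex.

```groovy
// Plain-Groovy model of the query above: no graph, just [id, name, barDegree] rows.
// The ids, names and degrees are the example values from this thread.
def rows = [
    [2, 'abc', 15],
    [3, 'abc', 20],
    [4, 'xyz', 15],
    [5, 'xyz', 25],
]

def removed = []   // stands in for g.removeVertex(...)

rows.groupBy { it[1] }.each { name, group ->
    // Highest 'bar' degree first; everything after index 0 is a duplicate to drop.
    group.sort(false) { a, b -> b[2] <=> a[2] }.eachWithIndex { row, i ->
        if (i > 0) removed << row[0]
    }
}

def kept = rows.collect { it[0] } - removed
println "removed: ${removed}, kept: ${kept}"   // removed: [2, 4], kept: [3, 5]
```

Sorting in descending order inside each group means the vertex to keep is always at index 0, so the "if(i > 0)" test is all that is needed to pick the ones to delete.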


___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Gremlin help

2011-10-22 Thread Nuo Yan
Thanks very much, Marko. I researched the query one step at a time and gained
much more knowledge about Gremlin.

However, I want to do something a little different. Instead of comparing the
"name" property of the child nodes to the source node, I want to compare the
child nodes with their siblings (only the first level under the source node)
and, if there are duplicates, keep only the one with the highest degree on the
"bar" relationship. (The source node doesn't have a "name" property.)

For example,

v(1) --foo--> v(2) name: "abc" --bar--> (15 nodes)
v(1) --foo--> v(3) name: "abc" --bar--> (20 nodes)
v(1) --foo--> v(4) name: "xyz" --bar--> (15 nodes)
v(1) --foo--> v(5) name: "xyz" --bar--> (25 nodes)

would become:

v(1) --foo--> v(3) name: "abc" --bar--> (20 nodes)
v(1) --foo--> v(5) name: "xyz" --bar--> (25 nodes)

So instead of doing

g.v(1).sideEffect{x = it.getProperty('name')}.out('foo').filter{it.getProperty('name').equals(x)}

I proposed doing:

g.v(1).out("foo").transform{[it, it.name, it.out("bar").count]}.aggregate.cap

to get an array of the first-level child nodes, their names, and their degree
of "bar" edges, like [v(2), "abc", 15], [v(3), "abc", 20], [v(4), "xyz", 15],
[v(5), "xyz", 25].

And then I can sort the array by the name property, and iterate through that
array to delete nodes that have a smaller count based on the count value
specified in each sub array.
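
A minimal plain-Groovy sketch of this sort-and-scan idea, assuming the rows are [id, name, barDegree] lists with the example values above. The variable names are made up for illustration, and toDelete only collects the ids that would be removed; nothing here touches a graph.

```groovy
// Rows are [id, name, barDegree]; values are the example from this message.
def rows = [[2, 'abc', 15], [3, 'abc', 20], [4, 'xyz', 15], [5, 'xyz', 25]]

// Sort by name, then by degree descending, so the row to keep leads each run of equal names.
def sorted = rows.sort(false) { a, b -> a[1] <=> b[1] ?: b[2] <=> a[2] }

def toDelete = []
def previousName = null
sorted.each { row ->
    if (row[1] == previousName) toDelete << row[0]   // same name as the leader: duplicate
    previousName = row[1]
}
println toDelete   // [2, 4]
```

The secondary sort on degree matters: sorting by name alone would group the duplicates together but would not say which row in each run is the one to keep.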

But since my Gremlin knowledge is still very limited, before digging too
much into this proposed solution I want to verify with you that it would
work, and see if you have a better or easier approach (e.g. maybe one
simple method I could make use of that I'm not aware of). Thanks very much
again.


On Sat, Oct 22, 2011 at 9:40 AM, Marko Rodriguez wrote:

> Hi,
>
> > Currently I'm doing the following in my own code with multiple requests
> > to the standalone neo4j server. I wonder if it's possible to achieve in one
> > gremlin query/script so that I can post the gremlin query to the server as 1
> > request and done. What I'm trying to achieve is:
> >
> > Start from one given node (e.g. v1), get all of the nodes connected
> > through a given type of relationship (e.g. relationship "foo"), within all
> > of these nodes, see if their "name" property has the same value, and if so,
> > delete the node (and the "foo" relationship connected to it) with smaller
> > outgoing degree (on a specific type of relationship, say, "bar"). If there
> > are more than two nodes with the same "name" property, only keep the one
> > with biggest outgoing degree (on type "bar").
>
>
> The query below is to warm you up. It will delete all vertices that are
> 'foo' related to the source vertex and have the same property value as it.
> Given that you are mutating the graph, you will want to deal with
> transaction buffers so you don't do one transaction per mutation:
> https://github.com/tinkerpop/blueprints/wiki/Graph-Transactions
>
> g.v(1).sideEffect{x =
> it.getProperty('name')}.out('foo').filter{it.getProperty('name').equals(x)}.sideEffect{g.removeVertex(it)}
>
> -
>
> To handle the smaller counts, you can do:
>
> g.v(1).sideEffect{x =
> it.getProperty('name')}.out('foo').filter{it.getProperty('name').equals(x)}.transform{[it,
> it.outE('bar').count()]}.filter{it[1] > 0}.aggregate.cap.next().sort{a,b ->
> b[1] <=> a[1]}.eachWithIndex{a,i -> if(i > 0) g.removeVertex(a[0])}
>
> There you go! One big fatty Gremlin query to solve your problem.
>
> I would recommend going through each step and seeing what it returns so you
> understand what is going on. Again, given that you are mutating the
> graph, be sure to be wise about transactions.
>
> Enjoy!
> Marko.
>
> http://markorodriguez.com
>
> ___
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>


Re: [Neo4j] Gremlin help

2011-10-22 Thread Marko Rodriguez
Hi,

> Currently I'm doing the following in my own code with multiple requests to 
> the standalone neo4j server. I wonder if it's possible to achieve in one 
> gremlin query/script so that I can post the gremlin query to the server as 1 
> request and done. What I'm trying to achieve is:
> 
> Start from one given node (e.g. v1), get all of the nodes connected through a 
> given type of relationship (e.g. relationship "foo"), within all of these 
> nodes, see if their "name" property has the same value, and if so, delete the 
> node (and the "foo" relationship connected to it) with smaller outgoing 
> degree (on a specific type of relationship, say, "bar"). If there are more 
> than two nodes with the same "name" property, only keep the one with biggest 
> outgoing degree (on type "bar").


The query below is to warm you up. It will delete all vertices that are
'foo' related to the source vertex and have the same property value as it.
Given that you are mutating the graph, you will want to deal with transaction
buffers so you don't do one transaction per mutation:
https://github.com/tinkerpop/blueprints/wiki/Graph-Transactions

g.v(1).sideEffect{x = it.getProperty('name')}.out('foo').filter{it.getProperty('name').equals(x)}.sideEffect{g.removeVertex(it)}
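
As a rough illustration of why buffering matters, the sketch below counts how
many commits a run of mutations would trigger when they are batched. The
`BufferedCommits` class is a hypothetical helper written for this example and
is not part of the Blueprints API; it only counts, it does not talk to a graph.

```groovy
// Hypothetical helper, NOT the Blueprints API: it counts mutations and
// records one "commit" each time the buffer fills up.
class BufferedCommits {
    int bufferSize
    int pending = 0
    int commits = 0

    // Run one mutation and commit once bufferSize mutations have piled up.
    void mutate(Closure change) {
        change()
        if (++pending == bufferSize) { commit() }
    }

    // Flush whatever is pending; a no-op when nothing is buffered.
    void commit() {
        if (pending > 0) { commits++; pending = 0 }
    }
}

def tx = new BufferedCommits(bufferSize: 50)
def removed = []
(1..120).each { id -> tx.mutate { removed << id } }  // stand-in for removeVertex
tx.commit()  // flush the final partial batch

println tx.commits
```

With a buffer of 50, the 120 simulated removals end up in three commits
instead of 120 separate ones.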

-

To handle the smaller counts, you can do:

g.v(1).sideEffect{x = it.getProperty('name')}.out('foo').filter{it.getProperty('name').equals(x)}.transform{[it, it.outE('bar').count()]}.filter{it[1] > 0}.aggregate.cap.next().sort{a,b -> b[1] <=> a[1]}.eachWithIndex{a,i -> if(i > 0) g.removeVertex(a[0])}

There you go! One big fatty Gremlin query to solve your problem. 

I would recommend going through each step and seeing what it returns so you
understand what is going on. Again, given that you are mutating the graph,
be sure to be wise about transactions.
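
One way to follow the tail end of the query step by step is to replay it on
plain Groovy lists. The sketch below assumes the transform step has already
produced [vertex id, count] pairs; the ids and counts are made-up sample data
(borrowed from the "abc" vertices in the original question), and removal is
simulated by collecting ids instead of calling g.removeVertex.

```groovy
// Hypothetical [vertex id, "bar" out-degree] pairs standing in for the
// output of transform{[it, it.outE('bar').count()]}; no live graph here.
def pairs = [[2, 15], [3, 5], [4, 8], [6, 0]]

// Step 1: filter{it[1] > 0} drops vertices that have no 'bar' edges at all.
def withBar = pairs.findAll { it[1] > 0 }

// Step 2: sort descending by degree so the vertex to keep comes first.
def sorted = withBar.sort { a, b -> b[1] <=> a[1] }

// Step 3: eachWithIndex skips index 0 and handles every later entry;
// the ids are collected here where the query would remove the vertex.
def removed = []
sorted.eachWithIndex { a, i -> if (i > 0) removed << a[0] }

println sorted
println removed
```

Because of the count filter in step 1, a vertex with no 'bar' edges never
reaches the sort and is therefore never removed.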

Enjoy!
Marko.

http://markorodriguez.com



Re: [Neo4j] Gremlin help

2011-10-22 Thread Peter Neubauer
Nuo,
In principle this looks OK, except that you will have to take care not to
delete nodes that are part of the current traversal, since that would
change your traversal result while it is still running.
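
A common way to stay clear of that problem is to split the work into a
read-only pass that decides what to delete and a second pass that does the
deleting. The sketch below shows the pattern on a plain Groovy map; the
`children` data is hypothetical and stands in for the nodes under v1.

```groovy
// Hypothetical in-memory stand-in for the children of v1:
// vertex id -> "bar" out-degree.
def children = [2: 15, 3: 5, 4: 8]

// First pass, read only: pick the entry to keep and list the rest,
// without touching the map that is being inspected.
def keep = children.max { it.value }.key
def doomed = children.keySet().findAll { it != keep }

// Second pass: mutate only after the inspection has finished.
doomed.each { children.remove(it) }

println children
```

The same idea is what collecting results with aggregate and cap before
calling removeVertex achieves in the Gremlin queries in this thread.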

Dunno the Groovy expression for this, but if you can do it in Java, you can
do it in Groovy, for instance
http://docs.neo4j.org/chunked/snapshot/gremlin-plugin.html#rest-api-send-an-arbitrary-groovy-script---lucene-sorting


HTH

/peter

On Friday, October 21, 2011, Nuo Yan  wrote:
> Hi Marko and Gremlin gurus:
>
> Currently I'm doing the following in my own code with multiple requests to
> the standalone neo4j server. I wonder if it's possible to achieve in one
> gremlin query/script so that I can post the gremlin query to the server as 1
> request and done. What I'm trying to achieve is:
>
> Start from one given node (e.g. v1), get all of the nodes connected through
> a given type of relationship (e.g. relationship "foo"), within all of these
> nodes, see if their "name" property has the same value, and if so, delete
> the node (and the "foo" relationship connected to it) with smaller outgoing
> degree (on a specific type of relationship, say, "bar"). If there are more
> than two nodes with the same "name" property, only keep the one with biggest
> outgoing degree (on type "bar").
>
>
> For example, for the following graph:
>
> v1 --foo--> v2("name" => "abc") --"bar"--> (15 nodes)
> v1 --foo--> v3("name" => "abc") --"bar"--> (5 nodes)
> v1 --foo--> v4("name" => "abc") --"bar"--> (8 nodes)
> v1 --foo--> v5("name" => "xyz")--"bar"-->(16 nodes)
> v1 --foo--> v6("name" => "abc")--"not_bar"--> (20 nodes)
>
> Ideally, after running the gremlin script, it should be:
>
> v1 --foo--> v2("name" => "abc") --"bar"--> (15 nodes)
> v1 --foo--> v5("name" => "xyz")--"bar"-->(16 nodes)
> v1 --foo--> v6("name" => "abc")--"not_bar"--> (20 nodes)
>
> with v3 and v4 (and the "foo" relationships connecting them to v1) deleted
> because they have the same "name" attribute as v2 but a smaller outgoing
> "bar" degree.
>
> Is this possible to achieve relatively easily with Gremlin?



[Neo4j] Gremlin help

2011-10-21 Thread Nuo Yan
Hi Marko and Gremlin gurus:

Currently I'm doing the following in my own code with multiple requests to
the standalone neo4j server. I wonder if it's possible to achieve in one
gremlin query/script so that I can post the gremlin query to the server as 1
request and done. What I'm trying to achieve is:

Start from one given node (e.g. v1), get all of the nodes connected through
a given type of relationship (e.g. relationship "foo"), within all of these
nodes, see if their "name" property has the same value, and if so, delete
the node (and the "foo" relationship connected to it) with smaller outgoing
degree (on a specific type of relationship, say, "bar"). If there are more
than two nodes with the same "name" property, only keep the one with biggest
outgoing degree (on type "bar").


For example, for the following graph:

v1 --foo--> v2("name" => "abc") --"bar"--> (15 nodes)
v1 --foo--> v3("name" => "abc") --"bar"--> (5 nodes)
v1 --foo--> v4("name" => "abc") --"bar"--> (8 nodes)
v1 --foo--> v5("name" => "xyz")--"bar"-->(16 nodes)
v1 --foo--> v6("name" => "abc")--"not_bar"--> (20 nodes)

Ideally, after running the gremlin script, it should be:

v1 --foo--> v2("name" => "abc") --"bar"--> (15 nodes)
v1 --foo--> v5("name" => "xyz")--"bar"-->(16 nodes)
v1 --foo--> v6("name" => "abc")--"not_bar"--> (20 nodes)

with v3 and v4 (and the "foo" relationships connecting them to v1) deleted
because they have the same "name" attribute as v2 but a smaller outgoing
"bar" degree.

Is this possible to achieve relatively easily with Gremlin?