Re: [Neo4j] Gremlin performance?

Marko Rodriguez Sat, 23 Jul 2011 12:00:06 -0700

Cool.

Good luck. Check out:
        https://github.com/tinkerpop/gremlin/wiki/Path-Optimizations


See ya,
Marko.

http://markorodriguez.com

On Jul 23, 2011, at 10:44 AM, Michael Hunger wrote:

> Especially in comparison to the raw Neo4j performance.
> 
> It would be interesting to see both code and times side by side.
> 
> Cheers
> 
> Michael
> 
> On 23.07.2011 at 18:40, Josef Holy wrote:
> 
>> Thanks a lot, guys, for the quick and comprehensive answers!
>> 
>> I was assuming that Gremlin only assembles the Pipes pipeline, which shouldn't 
>> impact the overall performance too much. We've 'hit the ceiling' with 
>> implementing various custom traversals with the native Neo4j APIs: the 
>> algorithms are quite lengthy and thus quite hard to maintain and tune. We 
>> are hoping that the expressiveness of Gremlin (+Groovy) could make things 
>> easier, even if in exchange for a little performance. The numbers Marko 
>> provided are promising!
>> 
>> Will give Gremlin a shot and report back some real numbers.
>> 
>> Thanks a lot!
>> 
>> 
>> Cheers!
>> 
>> Josef.
>> 
>> On Saturday, 23 July 2011 at 17:25, Marko Rodriguez wrote:
>>> Hi,
>>> 
>>> Finally, one point to add.
>>> 
>>> If I only need to do a ShortestPath over a particular edge type or a "find 
>>> all paths" between two vertices and I'm using Neo4j as the graph backend, 
>>> then I will drop down and use Neo4j's Algo library. This is because their 
>>> ShortestPath implementation is bi-directional (efficient) and I would have 
>>> to write that myself in Gremlin, as Gremlin doesn't provide "out of the box" 
>>> textbook algorithm support.
>>> 
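To make the bi-directional point concrete, here is a minimal sketch in plain Java. It is not Neo4j's Algo library and not Gremlin: the class `BidirectionalBfs`, its methods, and the sample graph are hypothetical names made up for illustration, and it assumes an unweighted, undirected graph stored as an adjacency map. The search grows one breadth-first level at a time from both ends and stops as soon as the two sides touch, so each side only explores about half the path depth.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

public class BidirectionalBfs {

    /** Returns the number of edges on a shortest path, or -1 if none exists. */
    public static int shortestPathLength(Map<Integer, List<Integer>> graph, int start, int goal) {
        if (start == goal) {
            return 0;
        }
        // The distance maps double as "visited" sets for each direction.
        Map<Integer, Integer> distFromStart = new HashMap<>();
        Map<Integer, Integer> distFromGoal = new HashMap<>();
        Queue<Integer> startFrontier = new ArrayDeque<>();
        Queue<Integer> goalFrontier = new ArrayDeque<>();
        distFromStart.put(start, 0);
        distFromGoal.put(goal, 0);
        startFrontier.add(start);
        goalFrontier.add(goal);

        while (!startFrontier.isEmpty() && !goalFrontier.isEmpty()) {
            // Always expand the smaller frontier by one full level.
            int found = startFrontier.size() <= goalFrontier.size()
                    ? expandLevel(graph, startFrontier, distFromStart, distFromGoal)
                    : expandLevel(graph, goalFrontier, distFromGoal, distFromStart);
            if (found >= 0) {
                return found;
            }
        }
        return -1; // one side ran out of vertices: no path
    }

    /** Expands one BFS level; returns the path length if the two searches meet, else -1. */
    private static int expandLevel(Map<Integer, List<Integer>> graph, Queue<Integer> frontier,
                                   Map<Integer, Integer> ownDist, Map<Integer, Integer> otherDist) {
        int levelSize = frontier.size();
        for (int i = 0; i < levelSize; i++) {
            int node = frontier.remove();
            List<Integer> neighbors = graph.getOrDefault(node, Collections.<Integer>emptyList());
            for (int neighbor : neighbors) {
                if (otherDist.containsKey(neighbor)) {
                    // The other search already reached this vertex: join the two halves.
                    return ownDist.get(node) + 1 + otherDist.get(neighbor);
                }
                if (!ownDist.containsKey(neighbor)) {
                    ownDist.put(neighbor, ownDist.get(node) + 1);
                    frontier.add(neighbor);
                }
            }
        }
        return -1;
    }

    /** Hypothetical sample graph with undirected edges 1-2, 2-3, 3-4, 1-5, 5-4. */
    public static Map<Integer, List<Integer>> sampleGraph() {
        Map<Integer, List<Integer>> g = new HashMap<>();
        g.put(1, Arrays.asList(2, 5));
        g.put(2, Arrays.asList(1, 3));
        g.put(3, Arrays.asList(2, 4));
        g.put(4, Arrays.asList(3, 5));
        g.put(5, Arrays.asList(1, 4));
        return g;
    }
}
```

Calling `BidirectionalBfs.shortestPathLength(BidirectionalBfs.sampleGraph(), 1, 4)` finds the two-edge route through vertex 5. Restricting the search to one edge type, as described above, would amount to filtering the neighbor list before the loop.
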
>>> TinkerPop plans an algo library for standard graph algorithms whose paths 
>>> are defined by Pipes/Gremlin, but as of yet, it doesn't exist.
>>> See http://markorodriguez.com/2011/02/08/property-graph-algorithms/
>>> 
>>> Thanks,
>>> Marko.
>>> 
>>> http://markorodriguez.com
>>> 
>>> On Jul 23, 2011, at 9:16 AM, Marko Rodriguez wrote:
>>> 
>>>> Hey,
>>>> 
>>>> Groovy is only used to compile a statement like "g.v(1).out.in.blah" into a 
>>>> Pipes pipeline, which is native Java. As such, once the compilation is 
>>>> complete (milliseconds), it is simply native Java (this is not completely 
>>>> true, as there are some Gremlin-specific pipes). Next, for the relationship 
>>>> between Blueprints Neo4jGraph and native EmbeddedGraphDatabase, see this 
>>>> from some time ago:
>>>> 
>>>> http://groups.google.com/group/gremlin-users/msg/c94dfef8352f68d3
>>>> 
>>>> In short, traversing 29.6 million things took:
>>>> 5.6 seconds via EmbeddedGraphDatabase
>>>> 6.0 seconds via Neo4jGraph
>>>> 
>>>> ** As an aside, the same experiment was run for OrientDB and took 7.2 
>>>> seconds (native OrientDB) vs. 7.9 seconds (Blueprints OrientGraph).
>>>> http://groups.google.com/group/gremlin-users/msg/ff5c03e188efcffe
>>>> 
>>>> There is more discussion in that particular thread if you are interested.
>>>> 
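The "compile once, then run as plain Java" idea can be sketched without any graph library. The code below is not the TinkerPop Pipes API: `MiniPipeline`, `flatMapPipe`, `outThenIn`, and the three-edge toy graph are all hypothetical names invented for this illustration. It only assumes that a pipeline is a chain of lazy Java iterators, each pulling from the one before it, so a DSL's job ends once the chain is assembled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.Function;

public class MiniPipeline {

    // Toy directed graph with edges 1->2, 1->3, 4->3, indexed in both directions.
    static final Map<Integer, List<Integer>> OUT = new HashMap<>();
    static final Map<Integer, List<Integer>> IN = new HashMap<>();

    static {
        addEdge(1, 2);
        addEdge(1, 3);
        addEdge(4, 3);
    }

    static void addEdge(int from, int to) {
        OUT.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        IN.computeIfAbsent(to, k -> new ArrayList<>()).add(from);
    }

    /** One lazy pipeline stage: maps each incoming element to zero or more outgoing ones. */
    public static <S, E> Iterator<E> flatMapPipe(Iterator<S> source, Function<S, Iterable<E>> step) {
        return new Iterator<E>() {
            private Iterator<E> current = Collections.emptyIterator();

            @Override
            public boolean hasNext() {
                // Pull from the upstream stage only when the current batch is used up.
                while (!current.hasNext() && source.hasNext()) {
                    current = step.apply(source.next()).iterator();
                }
                return current.hasNext();
            }

            @Override
            public E next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current.next();
            }
        };
    }

    /** Rough analogue of "start at a vertex, follow outgoing edges, then incoming edges". */
    public static List<Integer> outThenIn(int start) {
        // Assembly step: this is the only part a DSL would be responsible for.
        Iterator<Integer> source = Collections.singletonList(start).iterator();
        Iterator<Integer> out = flatMapPipe(source, v -> OUT.getOrDefault(v, Collections.emptyList()));
        Iterator<Integer> in = flatMapPipe(out, v -> IN.getOrDefault(v, Collections.emptyList()));

        // Evaluation step: plain Java iteration over the assembled chain.
        List<Integer> result = new ArrayList<>();
        while (in.hasNext()) {
            result.add(in.next());
        }
        return result;
    }
}
```

Here `MiniPipeline.outThenIn(1)` walks 1 -> {2, 3} and then back along incoming edges, yielding vertex 1 twice and vertex 4 once. Nothing is computed until the final loop pulls elements through the chain.
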
>>>> Finally, with respect to production, I have many clients that use Gremlin 
>>>> in production. Here are the benefits of doing so:
>>>> 1. Traversal descriptions are concise and expressive.
>>>>    - any arbitrary graph computation can be represented and evaluated.
>>>>    - in language theoretic terms, it can recognize Turing complete paths.
>>>> 2. Traversal descriptions can be expressed as classes in Groovy and are 
>>>>    thus IDE friendly.
>>>>    - syntax highlighting, easy to write test cases/debug, etc.
>>>>    - See slides 234 and 235 from 
>>>>      http://www.slideshare.net/slidarko/the-pathology-of-graph-databases
>>>> 
>>>> Thanks,
>>>> Marko.
>>>> 
>>>> http://markorodriguez.com
>>>> 
>>>> On Jul 23, 2011, at 2:30 AM, Michael Hunger wrote:
>>>> 
>>>>> If you look at the comments on the post:
>>>>> 
>>>>> Groovy is only that slow if you implement all the algorithm details in 
>>>>> Groovy!
>>>>> 
>>>>> Gremlin uses Blueprints, which is written in Java. Gremlin is just a DSL 
>>>>> on top of that API, so it is only used for the construction of the 
>>>>> underlying pipeline.
>>>>> 
>>>>> Anyway, the easiest way to see if that holds true is to write a PoC for 
>>>>> _your_ domain; I think general statements are difficult.
>>>>> 
>>>>> But Marko probably has some nice performance benchmarks at hand.
>>>>> 
>>>>> Michael
>>>>> 
>>>>> On 23.07.2011 at 09:51, Josef Holy wrote:
>>>>> 
>>>>>> Hi all,
>>>>>> 
>>>>>> Does anyone on this list have practical experience with using Gremlin for 
>>>>>> traversing the EmbeddedGraphDatabase in a production environment? What 
>>>>>> interests me is how it performs compared to traversal algorithms 
>>>>>> written directly against the Neo4j APIs (using Traverser, 
>>>>>> TraversalDescription, etc.). 
>>>>>> 
>>>>>> As Gremlin runs on top of Groovy + Pipes + Blueprints, I would expect it 
>>>>>> to be much slower than the pure Neo4j Java APIs (but really SO much slower? 
>>>>>> http://stronglytypedblog.blogspot.com/2009/07/java-vs-scala-vs-groovy-performance.html ).
>>>>>> 
>>>>>> 
>>>>>> Thanks for any comments/experiences!
>>>>>> 
>>>>>> 
>>>>>> Josef.
>>>>>> 
>>>>>> _______________________________________________
>>>>>> Neo4j mailing list
>>>>>> [email protected]
>>>>>> https://lists.neo4j.org/mailman/listinfo/user
