Re: [Neo4j] Gremlin performance?

Michael Hunger Sat, 23 Jul 2011 09:44:57 -0700

Especially in comparison to the raw Neo4j performance.

It would be interesting to see both the code and the timings side by side.


Cheers

Michael

On 23.07.2011 at 18:40, Josef Holy wrote:

> Thanks a lot guys for quick and comprehensive answers!
> 
> I was assuming that Gremlin is used only to assemble Pipes, which shouldn't 
> impact the overall performance too much. We've 'hit the ceiling' with 
> implementing various custom traversals with the native Neo4j APIs: the 
> algorithms are quite lengthy and thus quite hard to maintain and tune. We are 
> hoping that the expressiveness of Gremlin (+Groovy) could make things easier, 
> even if in exchange for a little performance. The numbers you provided, Marko, 
> are promising!
> 
> Will give Gremlin a shot and report back some real numbers.
> 
> Thanks a lot!
> 
> 
> Cheers!
> 
> Josef.
> 
> On Saturday, 23 July 2011 at 17:25, Marko Rodriguez wrote:
>> Hi,
>> 
>> Finally, one point to add.
>> 
>> If I only need to do a ShortestPath over a particular edge type or a "find 
>> all paths" between two vertices and I'm using Neo4j as the graph backend, 
>> then I will drop down and use Neo4j's Algo library. This is because their 
>> ShortestPath implementation is bi-directional (efficient) and I would have to 
>> write that myself in Gremlin, as Gremlin doesn't provide "out of the box" 
>> textbook algorithm support.
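
To illustrate why a bi-directional search is considered efficient, here is a minimal sketch of a bidirectional breadth-first search in plain Java. It is not Neo4j's implementation and uses no Neo4j or Gremlin API. The class name, the adjacency-map representation and the helper `undirected` are assumptions made up for this example, and the graph is assumed to be undirected and unweighted. The idea is that two small searches growing from both ends usually touch far fewer vertices than one large search growing from the source alone.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Neo4j, Blueprints, Pipes or Gremlin.
final class BidirectionalBfsSketch {

    /** Builds an undirected adjacency map from {a, b} edge pairs. */
    static Map<Integer, List<Integer>> undirected(int[][] edges) {
        Map<Integer, List<Integer>> adj = new HashMap<>();
        for (int[] e : edges) {
            adj.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
            adj.computeIfAbsent(e[1], k -> new ArrayList<>()).add(e[0]);
        }
        return adj;
    }

    /** Returns the number of edges on a shortest path, or -1 if unreachable. */
    static int shortestPathLength(Map<Integer, List<Integer>> adj, int source, int target) {
        if (source == target) return 0;
        Map<Integer, Integer> distS = new HashMap<>();
        Map<Integer, Integer> distT = new HashMap<>();
        Deque<Integer> frontS = new ArrayDeque<>();
        Deque<Integer> frontT = new ArrayDeque<>();
        distS.put(source, 0);
        frontS.add(source);
        distT.put(target, 0);
        frontT.add(target);

        while (!frontS.isEmpty() && !frontT.isEmpty()) {
            // Always grow the smaller frontier; this keeps both searches small.
            boolean expandS = frontS.size() <= frontT.size();
            Deque<Integer> front = expandS ? frontS : frontT;
            Map<Integer, Integer> mine = expandS ? distS : distT;
            Map<Integer, Integer> other = expandS ? distT : distS;

            int best = -1;
            // Expand one complete level before deciding, so the minimum is exact.
            for (int i = front.size(); i > 0; i--) {
                int u = front.poll();
                for (int v : adj.getOrDefault(u, Collections.emptyList())) {
                    if (other.containsKey(v)) {
                        // The two searches meet on the edge u-v.
                        int len = mine.get(u) + 1 + other.get(v);
                        if (best < 0 || len < best) best = len;
                    } else if (!mine.containsKey(v)) {
                        mine.put(v, mine.get(u) + 1);
                        front.add(v);
                    }
                }
            }
            if (best >= 0) return best;
        }
        return -1; // one side ran out of vertices without meeting the other
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> g =
                undirected(new int[][] {{1, 2}, {2, 3}, {3, 4}, {1, 5}, {5, 4}});
        System.out.println(shortestPathLength(g, 1, 4)); // prints 2 (path 1-5-4)
    }
}
```

For a directed graph the backward search would need to follow incoming edges instead, which this sketch does not attempt.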
>> 
>> TinkerPop plans an algo library for standard graph algorithms whose paths 
>> are defined by Pipes/Gremlin, but as of yet, it doesn't exist.
>> See http://markorodriguez.com/2011/02/08/property-graph-algorithms/
>> 
>> Thanks,
>> Marko.
>> 
>> http://markorodriguez.com
>> 
>> On Jul 23, 2011, at 9:16 AM, Marko Rodriguez wrote:
>> 
>>> Hey,
>>> 
>>> Groovy is only used to compile a statement like "g.v(1).out.in.blah" into a 
>>> Pipes pipeline, which is native Java. As such, once the compilation is 
>>> complete (milliseconds), it is simply native Java (this is not completely 
>>> true, as there are some Gremlin-specific pipes). Next, for the relationship 
>>> between Blueprints Neo4jGraph and native EmbeddedGraphDatabase, see this from 
>>> some time ago:
>>> 
>>> http://groups.google.com/group/gremlin-users/msg/c94dfef8352f68d3
>>> 
>>> In short, traversing 29.6 million things took:
>>> 5.6 seconds via EmbeddedGraphDatabase
>>> 6.0 seconds via Neo4jGraph
>>> 
>>> ** As an aside, the same experiment was run for OrientDB, with 7.2 (native 
>>> OrientDB) vs. 7.9 (Blueprints OrientGraph).
>>> http://groups.google.com/group/gremlin-users/msg/ff5c03e188efcffe
>>> 
>>> There is more discussion in that particular thread if you are interested.
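
To put the figures quoted above into relative terms, the small sketch below computes the overhead of the Blueprints wrapper over the native API from the four numbers given in the message (5.6 vs. 6.0 for Neo4j, 7.2 vs. 7.9 for OrientDB). The class and method names are made up for this illustration, and the calculation says nothing about other workloads.

```java
import java.util.Locale;

// Hypothetical helper for illustration; only the four timings come from the thread.
final class OverheadSketch {

    /** Relative slowdown of the wrapped run over the native run, in percent. */
    static double overheadPercent(double nativeTime, double wrappedTime) {
        return (wrappedTime - nativeTime) / nativeTime * 100.0;
    }

    public static void main(String[] args) {
        // EmbeddedGraphDatabase vs. Neo4jGraph
        System.out.println(String.format(Locale.ROOT,
                "Neo4j overhead: %.1f%%", overheadPercent(5.6, 6.0)));
        // Native OrientDB vs. OrientGraph
        System.out.println(String.format(Locale.ROOT,
                "OrientDB overhead: %.1f%%", overheadPercent(7.2, 7.9)));
    }
}
```

On these numbers the wrapper costs roughly 7% on Neo4j and roughly 10% on OrientDB.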
>>> 
>>> Finally, with respect to production, I have many clients that use Gremlin 
>>> in production. Here are the benefits of doing so:
>>> 1. Traversal descriptions are concise and expressive.
>>> - any arbitrary graph computation can be represented and evaluated.
>>> - in language theoretic terms, it can recognize Turing complete paths.
>>> 2. Traversal descriptions can be expressed as classes in Groovy and are thus 
>>> IDE friendly.
>>> - syntax highlighting, easy to write test cases/debug, etc.
>>> - See slides 234 and 235 from 
>>> http://www.slideshare.net/slidarko/the-pathology-of-graph-databases
>>> 
>>> Thanks,
>>> Marko.
>>> 
>>> http://markorodriguez.com
>>> 
>>> On Jul 23, 2011, at 2:30 AM, Michael Hunger wrote:
>>> 
>>>> If you look at the comments on the post: Groovy is only that slow if you 
>>>> implement all the algorithm details in Groovy!
>>>> 
>>>> Gremlin uses Blueprints, which is written in Java. Gremlin is just a DSL on 
>>>> top of that API, so it is only used to construct the underlying 
>>>> pipeline.
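
The point that the DSL only constructs the pipeline can be sketched in plain Java. The `Pipe` interface, `flatMap` helper and sample graph below are hypothetical stand-ins invented for this example; they are not the real Pipes or Blueprints API. The sketch only shows the general shape: the steps are composed once, and the actual traversal is then ordinary lazy Java iteration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Hypothetical illustration only; not the TinkerPop Pipes API.
final class PipelineSketch {

    /** A stand-in for a pipe: lazily turns one iterator into another. */
    interface Pipe<S, E> extends Function<Iterator<S>, Iterator<E>> {}

    /** Lazily expands each input into zero or more outputs, like following edges. */
    static <S, E> Pipe<S, E> flatMap(Function<S, Iterable<E>> step) {
        return in -> new Iterator<E>() {
            private Iterator<E> current = Collections.emptyIterator();

            @Override
            public boolean hasNext() {
                // Pull the next input only when the current expansion is exhausted.
                while (!current.hasNext() && in.hasNext()) {
                    current = step.apply(in.next()).iterator();
                }
                return current.hasNext();
            }

            @Override
            public E next() {
                if (!hasNext()) throw new NoSuchElementException();
                return current.next();
            }
        };
    }

    /** A tiny directed graph stored as vertex id -> outgoing neighbour ids. */
    static Map<Integer, List<Integer>> sampleGraph() {
        Map<Integer, List<Integer>> out = new HashMap<>();
        out.put(1, Arrays.asList(2, 3));
        out.put(2, Arrays.asList(4));
        out.put(3, Arrays.asList(4, 5));
        return out;
    }

    /** Composes "out, then out" once, then runs it from the start vertex. */
    static List<Integer> twoHopsOut(Map<Integer, List<Integer>> out, int start) {
        Pipe<Integer, Integer> outStep =
                flatMap(v -> out.getOrDefault(v, Collections.emptyList()));
        // Construction: cheap, done once. Nothing is traversed yet.
        Function<Iterator<Integer>, Iterator<Integer>> pipeline = outStep.andThen(outStep);
        // Evaluation: plain Java iteration over the composed pipeline.
        Iterator<Integer> results = pipeline.apply(Collections.singletonList(start).iterator());
        List<Integer> collected = new ArrayList<>();
        while (results.hasNext()) collected.add(results.next());
        return collected;
    }

    public static void main(String[] args) {
        System.out.println(twoHopsOut(sampleGraph(), 1)); // prints [4, 4, 5]
    }
}
```

Vertex 4 appears twice because it is reached over two different two-step paths; the sketch does not deduplicate.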
>>>> 
>>>> Anyway, the easiest way to see if that holds true is to write a PoC for 
>>>> _your_ domain; I think general statements are difficult.
>>>> 
>>>> But probably Marko has some nice performance benchmarks at hand.
>>>> 
>>>> Michael
>>>> 
>>>> On 23.07.2011 at 09:51, Josef Holy wrote:
>>>> 
>>>>> Hi all,
>>>>> 
>>>>> does anyone on this list have practical experience with using Gremlin for 
>>>>> traversing the EmbeddedGraphDatabase in a production environment? What 
>>>>> interests me is how it performs compared to the traversal algorithms 
>>>>> written directly against Neo4j APIs (using Traverser, 
>>>>> TraversalDescription, ..etc). 
>>>>> 
>>>>> As Gremlin runs on top of Groovy + Pipes + Blueprints, I would expect it 
>>>>> to be much slower than pure Neo4j Java APIs (but really SO much slower? 
>>>>> http://stronglytypedblog.blogspot.com/2009/07/java-vs-scala-vs-groovy-performance.html ).
>>>>> 
>>>>> 
>>>>> Thanks for any comments/experiences!
>>>>> 
>>>>> 
>>>>> Josef.
>>>>> 
>>>>> _______________________________________________
>>>>> Neo4j mailing list
>>>>> [email protected]
>>>>> https://lists.neo4j.org/mailman/listinfo/user