[Neo4j] check for existing relationship between two nodes
Hi All

I have a requirement where I must check if there is an already existing relationship between two nodes (say N1 and N2). Right now, I'm doing it as follows:

    boolean found = false;
    final Iterable<Relationship> currentRels = N1.getRelationships(RelTypes.KNOWS, Direction.OUTGOING);
    for (Relationship rel : currentRels) {
        found = rel.getEndNode().equals(N2);
        if (found) {
            // do something - like add some property to the existing relationship
            break;
        }
    }
    if (!found) {
        // create new relationship between N1 and N2
    }

This means, for a high volume of data, all the relations going out of N1 will be retrieved and checked - and this seems costly. I'm using the 1.0 API, and wasn't able to find anything that would directly check whether N1 has an outgoing relationship with N2 - like N1.hasRelationship(N2, Direction.OUTGOING) - or something similar.

I think there was a similar mail some time ago. Have there been any updates lately which allow such checks? Or, is there any other direct way to do this with the 1.0 API?

Regards
Arijit

--
And when the night is cloudy,
There is still a light that shines on me,
Shine on until tomorrow, let it be.

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] check for existing relationship between two nodes
Looping through relationships manually is the way to go. However there's a new component in https://svn.neo4j.org/laboratory/components/lucene-index/ which can index relationships and do fast lookups on whether or not a relationship (with a certain attribute even) exists between two nodes. You'll need to go with the latest kernel then as well (as seen in https://svn.neo4j.org/laboratory/components/lucene-index/pom.xml).

2010/7/30, Arijit Mukherjee ariji...@gmail.com:
Hi All I have a requirement where I must check if there is an already existing relationship between two nodes (say N1 and N2).

--
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com
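To make the trade-off in this exchange concrete, here is a minimal sketch in plain Python 3 of the two lookup strategies: scanning every outgoing relationship of N1, versus keeping an index keyed by (start, type, end). It does not use the Neo4j API or the lucene-index component; TinyGraph, rel_index, touch_or_create and the "seen" property are hypothetical names invented for illustration.

```python
from collections import defaultdict


class TinyGraph:
    """In-memory stand-in for a graph store (hypothetical, not the Neo4j API)."""

    def __init__(self):
        # start node -> list of (rel_type, end node, properties)
        self.outgoing = defaultdict(list)
        # (start, rel_type, end) -> properties; plays the role of a relationship index
        self.rel_index = {}

    def create_relationship(self, start, rel_type, end):
        props = {}
        self.outgoing[start].append((rel_type, end, props))
        self.rel_index[(start, rel_type, end)] = props
        return props

    def find_by_scan(self, start, rel_type, end):
        # Same shape as the loop in the question: walk every outgoing relationship.
        for r_type, r_end, props in self.outgoing[start]:
            if r_type == rel_type and r_end == end:
                return props
        return None

    def find_by_index(self, start, rel_type, end):
        # One dictionary lookup, independent of how many relationships start has.
        return self.rel_index.get((start, rel_type, end))


def touch_or_create(graph, start, rel_type, end):
    """Update the existing relationship, or create it if none exists."""
    props = graph.find_by_index(start, rel_type, end)
    if props is None:
        props = graph.create_relationship(start, rel_type, end)
    props["seen"] = props.get("seen", 0) + 1
    return props
```

The scan costs time proportional to the number of relationships leaving N1, while the indexed lookup stays roughly constant at the price of maintaining the index on every write. Whether that price is worth paying depends on how many relationships a node typically has.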
Re: [Neo4j] check for existing relationship between two nodes
Thanx Mattias. Can I download a tar.gz or zip file from somewhere? I'm not using Maven in my projects yet... I mean I'm not very comfortable with it.

Arijit

On 30 July 2010 17:33, Mattias Persson matt...@neotechnology.com wrote:
Looping through relationships manually is the way to go. However there's a new component in https://svn.neo4j.org/laboratory/components/lucene-index/ which can index relationships and do fast lookups on whether or not a relationship (with a certain attribute even) exists between two nodes. You'll need to go with the latest kernel then as well (as seen in https://svn.neo4j.org/laboratory/components/lucene-index/pom.xml).
Re: [Neo4j] Python Newbie Struggling Slightly
Tom,

I just installed the python bindings and tried this very simple setup to create a database and populate it within a transaction:

    #!/usr/bin/env python
    import sys, random, csv
    from time import sleep
    from random import randint
    import neo4j

    graphdb = neo4j.GraphDatabase("db")
    with graphdb.transaction:
        person = graphdb.node(name="Person")
        peter = graphdb.node(name="Peter")
        peter.IS_A(person)
        print peter['name']
    graphdb.shutdown()

If you try this, is that working for you?

Cheers,

/peter neubauer

COO and Sales, Neo Technology
GTalk: neubauer.peter
Skype peter.neubauer
Phone +46 704 106975
LinkedIn http://www.linkedin.com/in/neubauer
Twitter http://twitter.com/peterneubauer
http://www.neo4j.org - Your high performance graph database.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.

On Tue, Jul 27, 2010 at 5:25 PM, Tom Smith tas...@york.ac.uk wrote:

Hello,

I really like the look of neo4j and would like to use it in a project I'm working on http://pppeoplepppowered.blogspot.com/ . I'm new to graphs and how to work with them, it's all new, but I'm really drawn to them having banged my head on SQL schemas for years (and years).

My problem with working with python neo4j is that there aren't enough simple (enough) examples. This sort of thing is great ... http://blog.neo4j.org/2010/03/modeling-categories-in-graph-database.html ... but it doesn't explain a few things and goes wrong in places. For example, from the example, if I try to execute...

    computers_node = graphdb.node(Name="Computers")

... I get...

    jpype._jexception.RuntimeExceptionPyRaisable: org.neo4j.graphdb.NotInTransactionException: No transaction found for current thread

So I need to add...

    with db.transaction:
        computers_node = graphdb.node(Name="Computers")

... which raises the issue of transactions and database connections. I'm unsure when to use transactions and how to use connections. Either way, I regularly seem to bump into a...

    jpype._jexception.RuntimeExceptionPyRaisable: org.neo4j.kernel.impl.transaction.TransactionFailureException: Could not create data source [nioneodb], see nested exception for cause of error

This happens because I like to use the interpreter and learn what objects can do, and later create .py files and import them. It's not clear if I can have more than one db connection open at once, or if I should open and shut down the database every time... or is it better to have one database connection hanging around in a file somewhere?

At the moment I'm trying to create a script that takes a csv file of people, adds them, then tries to get the data out somehow, like this...

    import stuff.neo_utils as neo # see below
    neo.import_people() #import a csv
    7051 #the id of the root_person
    neo.people(7051) #get the people out via the root_person
    #nothing!
    neo.people(8224) # the id of the last_person
    Node id=7051

My question is this... am I doing it all wrong? Could someone create a very simple example that, say, populates a graph, gets data out, manipulates that data and then searches that data (say for an attribute, or to see if it exists etc) in a single python file? So that I can begin to build up my understanding.

Thanks for listening,
tom

    #!/usr/bin/env python
    import sys, random, csv
    from time import sleep
    from random import randint
    import neo4j

    class Person(neo4j.Traversal):
        types = [ neo4j.Outgoing.is_a ]
        order = neo4j.BREADTH_FIRST
        stop = neo4j.STOP_AT_END_OF_GRAPH
        returnable = neo4j.RETURN_ALL_BUT_START_NODE

    def people(person_root_id ):
        try: graphdb.shutdown()
        except: pass
        graphdb = neo4j.GraphDatabase( neo_db )
        with graphdb.transaction:
            person_root = graphdb.node[person_root_id]
            for person_node in Person(person_root):
                try:
                    print "%s %s (@depth=%s)" % ( person_node['family_name'], person_node['email'], person_node.depth)
                except:
                    print person_node
        graphdb.shutdown( )

    # The data is like this...
    #tas...@york.ac.uk Staff Mr T Smith |Computing Service: | |Vanbrugh College V/C/011| |+44 1904 433847| https://www.york.ac.uk/directory/user.yrk/searchdetail.cfm?scope=staffref=M95%27%22%3DYBD8%5B%3ANEJ%27S%27I%2AX%20%20F%2D6Y2%3D%20SR%21A%409%2C%40E2%3D%205%2EFMOM6A%3A%3EWIHV4T%5D%5E%3B%0A%2B4%2D%2A%3EG%2D%2F6EUS%22BI0%20%0Areferrer=searchResults

    def import_people(name="Untitled", file='/Users/tomsmith/pppeoplepppowered/staff/staff.csv', ):
        'load a lot of people into the database, connecting each to a root Person object by a is_a relationship, spurious I know '
        graphdb = neo4j.GraphDatabase( neo_db )
        with graphdb.transaction:
            person_root = graphdb.node(name="Person") # create a root node of sorts
            person_root_id = person_root.id
            csvReader = csv.reader(open(file), delimiter=' ', quotechar='|')
            for row in csvReader:
Re: [Neo4j] Python Newbie put another way...
Tom,

just tried this, works for me (notice there is no #!/usr/bin/env python at the beginning of the file). If this and the last trivial snippet I sent do not work, I would suspect some hiccups in the setup and installation of JPype and the Neo4j bindings? I did (on Mac OS X, Python 2.6)

1. download http://sourceforge.net/projects/jpype/files/
2. unzip and run
   python setup.py build
   python setup.py install
3. svn export https://svn.neo4j.org/components/neo4j.py/trunk neo4j-python
4. cd neo4j-python
   sudo python setup.py install
5. run the above script with
   python test.py

Cheers,

/peter neubauer

On Tue, Jul 27, 2010 at 6:39 PM, Tom Smith tas...@york.ac.uk wrote:

This doesn't work (from the tutorial page)... any ideas where I'm going wrong? Thanks...

    import neo4j
    from neo4j.util import Subreference

    graphdb = neo4j.GraphDatabase( test_neo4j_db )

    class SubCategoryProducts(neo4j.Traversal):
        types = [neo4j.Outgoing.SUBCATEGORY, neo4j.Outgoing.PRODUCT]
        def isReturnable(self, pos):
            if pos.is_start: return False
            return pos.last_relationship.type == 'PRODUCT'

    def attributes(product_node):
        for category in categories(product_node):
            for attr in category.ATTRIBUTE:
                yield attr

    class categories(neo4j.Traversal):
        types = [neo4j.Incoming.PRODUCT, neo4j.Incoming.SUBCATEGORY]
        def isReturnable(self, pos):
            return not pos.is_start

    with graphdb.transaction:
        attribute_subref_node = Subreference.Node.ATTRIBUTE_ROOT(graphdb)
        #attribute_subref_node.ATTRIBUTE_TYPE("Price") #Fails, do I need to pass an object?
        #attribute_subref_node.ATTRIBUTE_TYPE("Length")
        #attribute_subref_node.ATTRIBUTE_TYPE("Name")
        category_subref_node = Subreference.Node.CATEGORY_ROOT(graphdb, Name="Products")
        computers_node = graphdb.node(Name="Computers")

        #create some categories
        electronics_node = graphdb.node(Name="Laptops")
        electronics_node.SUBCATEGORY(computers_node)
        netbooks_node = graphdb.node(Name="Netbooks")
        netbooks_node.SUBCATEGORY(computers_node)
        desktops_node = graphdb.node(Name="Netbooks")
        desktops_node.SUBCATEGORY(computers_node)

        #create some products
        little_dell = graphdb.node(Name="Little Dell", Colour="red", Price=210)
        little_dell.is_a( netbooks_node )
        print little_dell.id
        little_acer = graphdb.node(Name="Little Acer", Colour="grey")
        little_acer.is_a( netbooks_node )
        print little_acer.id
        little_eee = graphdb.node(Name="Little EEE", Colour="white", Price=200 )
        little_eee.is_a( netbooks_node )
        print little_eee.id

        for rel in computers_node.SUBCATEGORY.outgoing:
            print rel.end['Name']

        for prod in SubCategoryProducts(computers_node):
            print prod['Name']
            for attr in attributes(prod):
                print attr['Name'], " of type ", attr.end['Name']
            print

    graphdb.shutdown()
Re: [Neo4j] check for existing relationship between two nodes
The latest snapshots of things can be found at http://m2.neo4j.org/ and this component (jar-file) can be found in http://m2.neo4j.org/org/neo4j/neo4j-lucene-index/0.1-SNAPSHOT/

2010/7/30, Arijit Mukherjee ariji...@gmail.com:
Thanx Mattias. Can I download a tar.gz or zip file from somewhere? I'm not using Maven in my projects yet...I mean I'm not very comfortable with it. Arijit

--
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com
[Neo4j] Neo4j 1.1 released
Gentlemen,

Neo4j 1.1 has arrived - http://bit.ly/neo4j11 (changelog at http://dist.neo4j.org/CHANGES.txt)! Change your Maven pom.xml and rejoice :)

Happy hacking,

/peter neubauer
[Neo4j] Python Struggler...
On 30 Jul 2010, at 15:35, user-requ...@lists.neo4j.org wrote:

    graphdb = neo4j.GraphDatabase("db")
    with graphdb.transaction:
        person = graphdb.node(name="Person")
        peter = graphdb.node(name="Peter")
        peter.IS_A(person)
        print peter['name']
    graphdb.shutdown()

Yes that works great... I think I'm having problems iterating the data once it is in. I'm trying to save data in from a simple web crawler... A and B below work OK. But C and D are hopeless. I'm assuming I don't HAVE to create a Traversal class to simply iterate through records... or is that not so?

thanks,
Tom

A. DO SOME SETUP STUFF

    db = neo4j.GraphDatabase("example_db")
    pages = db.index("pages", create=True) # create an index called 'pages'
    url = 'http://pypi.python.org/'

    def create_page(url, code='' ):
        page_node = pages[url] # does this page exist yet?
        if not page_node:
            page_node = db.node(url=url, code=code) # create a page
            pages[url] = page_node # Add to index
            print "Created: ", url
        else:
            print "Exists already: ", url
        return page_node

B. ADD SOME PAGES

    with db.transaction:
        create_page( 'http://pypi.python.org/' )
        create_page( 'http://diveintopython.org/' )
        create_page( 'http://pypi.python.org/' )
        create_page( 'http://stackoverflow.com/questions/tagged/python')

C. NOW GET SOME OUT

    def get_one(url):
        with db.transaction:
            node = pages[url]
            if node == None:
                print "Node is none!"
            return node

    print get_one( url ) # fails

D. TRY TO ITERATE

    def list_all_pages( ):
        'Just iterate through the pages to make sure the data is in there...'
        with db.transaction:
            for node in db.node:
                # er...

    def delete_one( url ):
        ''
[Neo4j] [ANN] Grails Neo4j plugin 0.3 released
Hi,

today I released an update of the Grails Neo4j plugin (http://www.grails.org/plugin/neo4j). The main changes are:

* compatibility with Grails 1.3.x. Be aware, Grails 1.3 - 1.3.3 are suffering from http://jira.codehaus.org/browse/GRAILS-6427, so either use Grails 1.2.x, or be brave and use a recent git build of Grails 1.3.4-SNAPSHOT
* usage of Neo4j 1.1 (released today just a few hours ago, so get it while it's hot).

all changes:

** Bug
* [GRAILSPLUGINS-2302] - home link broken in the org.codehaus.groovy.grails.plugins.neo4j.Neo4jController views
* [GRAILSPLUGINS-2303] - Problems with annotation Neo4jEntity
* [GRAILSPLUGINS-2345] - upgrade to Neo4j 1.1
* [GRAILSPLUGINS-2346] - domainclass.get() throws exception if id is not invalid
* [GRAILSPLUGINS-2347] - domainClass.findAllByField(value) fails

** Improvement
* [GRAILSPLUGINS-2349] - provide compatibility for Grails 1.3.x

Regards,
Stefan
Re: [Neo4j] Stability of Iterators Across Transactions?
Very interesting question, Alex. Since you've potentially mutated the collection you're iterating, the correct thing to do is to invalidate the iterator, but I do see the need to meet your functional requirements of incremental updates.

While I'm not sure it would be ideal, one approach you could possibly take is to do the deletes on another worker thread. You could use a variety of techniques to get the work to those worker thread(s) (queues, etc.), and in theory, the iterator would continue on with its work on the old view of the graph. While this could be a big consistency problem for many applications, if you knew that in your application it would not be, it might work. I'm not sure what happens to iterators in the case of other-thread commits.

-----Original Message-----
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Alex D'Amour
Sent: Friday, July 30, 2010 1:21 PM
To: Neo user discussions
Subject: [Neo4j] Stability of Iterators Across Transactions?

Hello all,

I have an application where I have a node that has several hundred thousand relationships (this probably needs to be changed). In the application I iterate over these relationships, and delete a large subset of them. Because there are so many writes, I want to commit the transaction every few thousand deletions. The problem is that the getAllRelationships iterator seems to halt after the first transaction commit.

Clearly, I should reduce the number of relationships that are connected to this node, but is this the expected behavior? Should iterators be made stable across transactions, or are they only supposed to be guaranteed within a transaction?

Thanks,
Alex
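The worker-thread idea in this reply can be sketched roughly as follows, in plain Python 3 with a dictionary standing in for the relationship store. This is only an illustration under stated assumptions: it does not use the Neo4j API, it says nothing about how Neo4j iterators actually behave when another thread commits, and BATCH_SIZE, delete_in_batches and prune are hypothetical names. The reader walks a snapshot taken up front, hands ids to a queue, and the worker "commits" once per batch.

```python
import queue
import threading

BATCH_SIZE = 1000  # hypothetical commit interval
DONE = object()    # sentinel that tells the worker to stop


def delete_in_batches(store, work, commits):
    """Worker: delete queued relationship ids, 'committing' once per batch."""
    pending = 0
    while True:
        rel_id = work.get()
        if rel_id is DONE:
            break
        store.pop(rel_id, None)
        pending += 1
        if pending == BATCH_SIZE:
            commits.append(pending)  # stand-in for committing a transaction
            pending = 0
    if pending:
        commits.append(pending)  # commit the final, partial batch


def prune(store, should_delete):
    """Queue every matching relationship for deletion; return the batch sizes."""
    work = queue.Queue()
    commits = []
    worker = threading.Thread(target=delete_in_batches, args=(store, work, commits))
    worker.start()
    # Iterate over a snapshot so that deletions cannot disturb the iteration.
    for rel_id, rel in list(store.items()):
        if should_delete(rel):
            work.put(rel_id)
    work.put(DONE)
    worker.join()
    return commits
```

The snapshot is what keeps the reader stable here. As the reply notes, reading an old view is only acceptable when the application can tolerate that inconsistency.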
Re: [Neo4j] [ANN] Grails Neo4j plugin 0.3 released
Stefan, that was record time man! Congrats!

Cheers,

/peter neubauer

On Fri, Jul 30, 2010 at 7:10 PM, Stefan Armbruster ml...@armbruster-it.de wrote:
Hi, today I released an update of the Grails Neo4j plugin (http://www.grails.org/plugin/neo4j).
[Neo4j] Stability of Iterators Across Transactions?
Hello all,

I have an application where I have a node that has several hundred thousand relationships (this probably needs to be changed). In the application I iterate over these relationships, and delete a large subset of them. Because there are so many writes, I want to commit the transaction every few thousand deletions. The problem is that the getAllRelationships iterator seems to halt after the first transaction commit.

Clearly, I should reduce the number of relationships that are connected to this node, but is this the expected behavior? Should iterators be made stable across transactions, or are they only supposed to be guaranteed within a transaction?

Thanks,
Alex
Re: [Neo4j] Stumped by performance issue in traversal - would take a month to run!
Hi,

so I got 2GB more RAM and noticed that after adding some more memory map and increasing the heap space, my small query went from 6hrs to 3min. Quite reasonable! But the larger one that would take a month would still take a month. So I've been performance testing parts of it.

The algorithm as in my first post showed *no* performance improvement on more RAM. But individual parts:

- Traversing only (first three lines) was much speedier, but still seems slow. 1.5 million traversals (15 out of 7000 items) took 23sec. It shaves off a few seconds if I run this twice and time it the second time, or if I don't print any node properties as I traverse. (Does Neo4J load ALL the properties for a node if one is accessed?) Even with a double run and not reading node properties, it still takes 16sec, which would make traversal take two hours. I thought Neo4J was supposed to do ~1m traversals/sec, this is doing about 100k. Why? (And in fact on the other query it was getting about 800,000 traversals/sec.) Is one of Traversers vs. getRelationship iterators faster when getting all relationships of a type at depth 1?
- Searching for relationships between A and B (but not writing to them) takes it from 20s to 91s. Yuck. Maybe edge indexing is the way to avoid that?
- Incrementing a property on the root node for every A, B pair takes it from 20s to 61s (57s if it's all in one transaction). THAT seems weird. I imagine it has something to do with logging changes? Any way that can be turned off for a particular property (like it could be marked 'volatile' during a transaction or something)?

I'm much more hopeful with the extra RAM but it's still kind of slow. Suggestions?

Thanks,
Jeff Klann

On Wed, Jul 28, 2010 at 11:20 AM, Jeff Klann jkl...@iupui.edu wrote:

Hi,

I have an algorithm running on my little server that is very very slow. It's a recommendation traversal (for all A and B in the catalog of items: for each item A, how many customers also purchased another item in the catalog B). It's processed 90 items in about 8 hours so far! Before I dive deeper into trying to figure out the performance problem, I thought I'd email the list to see if more experienced people have ideas.

Some characteristics of my datastore: its size is pretty moderate for a database application. 7500 items, not sure how many customers and purchases (how can I find the size of an index?) but probably ~1 million customers. The relationshipstore + nodestore 500mb. (Propertystore is huge but I don't access it much in traversals.)

The possibilities I see are:

1) *Neo4J is just slow.* Probably not slower than Postgres which I was using previously, but maybe I need to switch to a distributed map-reduce db in the cloud and give up the very nice graph modeling approach? I didn't think this would be a problem, because my data size is pretty moderate and Neo4J is supposed to be fast.

2) *I just need more RAM.* I definitely need more RAM - I have a measly 1GB currently. But would this get my 20day traversal down to a few hours? Doesn't seem like it'd have THAT much impact. I'm running Linux and nothing much else besides Neo4j, so I've got 650m physical RAM. Using 300m heap, about 300m memory-map.

3) *There's some secret about Neo4J performance I don't know.* Is there something I'm unaware that Neo4J is doing? When I access a property, does it load a chunk of properties I don't care about? For the current node/edge or others? I turned off log rotation and I commit after each item A. Are there other performance tips I might have missed?

4) *My algorithm is inefficient.* It's a fairly naive algorithm and maybe there are some optimizations I can do. It looks like:

    For each item A in the catalog:
        For each customer C that has purchased that item:
            For each item B that customer purchased:
                Update the co-occurrence edge between A and B.
                (If the edge exists, add one to its weight. If it doesn't exist, create it with weight one.)

This is O(n^2) worst case, but practically it'll be much better due to the sparseness of purchases. The large number of customers slows it down, though. The slowest part, I suspect, is the last line. It's a lot of finding and re-finding edges between As and Bs and updating the edge properties. I don't see much way around it, though. I wrote another version that avoids this but is always O(n^2), and it takes about 15 minutes per A to check against all B (which would also take a month). The version above seems to be averaging 3 customers/sec, which doesn't seem that slow until you realize that some of these items were purchased by thousands of customers.

I'd hate to give up on Neo4J. I really like the graph database concept. But can it handle data? I hope someone sees something I'm doing wrong.

Thanks,
Jeff Klann
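One way to avoid repeatedly finding and updating the same edge is to accumulate the co-occurrence counts in memory and write each edge only once at the end. The sketch below shows that idea in plain Python 3. It is not the algorithm from the post and does not touch Neo4j. It assumes the purchases are available as (customer, item) pairs and that the pair counts fit in RAM, and co_occurrence is a hypothetical name. Unlike the loop in the post, it counts each unordered pair once per customer rather than visiting A to B and B to A separately.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_occurrence(purchases):
    """Count, for each unordered item pair, how many customers bought both.

    `purchases` is an iterable of (customer, item) pairs.
    """
    # Group items by customer; a set ignores repeat purchases of the same item.
    baskets = defaultdict(set)
    for customer, item in purchases:
        baskets[customer].add(item)

    counts = Counter()
    for items in baskets.values():
        # Each customer contributes one count to every pair in their basket.
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
    return counts
```

With the counts in hand, each co-occurrence edge could be created a single time with its final weight, which replaces many small read-modify-write updates with one write per pair.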