Bruno, it would probably be easiest for you to put the code in a public repository (e.g. on GitHub) so that people can look at it, fork it and refactor it towards the correct solution.
I saw that you used both the BatchInserter API and the transactional Neo4j API calls. Please note that the batch inserter is intended for fast insertion of large amounts of data (millions or billions of nodes and relationships) into the graph. It is neither transactional nor thread-safe, and it has to be shut down properly, otherwise the database can be corrupted. The regular Neo4j API is the preferred one for day-to-day work with the graph database.

You could also put your benchmark online for people to look at and comment on.

Cheers

Michael

On 11.08.2011, at 19:37, Bruno Paiva Lima da Silva wrote:

> Hello,
>
> My name is Bruno Paiva Lima da Silva, and I am a PhD student at LIRMM in
> Montpellier, France, investigating graph-based conjunctive query
> answering. A short description of my thesis can be found at
> http://bplsilva.com/en/ ; for more details, please see my presentations
> at http://bplsilva.com/en/research/talks/ .
>
> I am writing to you because I need further information about your
> storage system. In my work I aim to compare several storage systems for
> conjunctive querying. To this end, I have implemented a common, abstract
> interface in Java that uses the "logical" representation (as defined in
> first-order logic) of a factual piece of knowledge. The formula is then
> sent to the different storage systems (e.g. DEX, HyperGraph, MySQL,
> Neo4j, OrientDB, SQLite and others), yours among them, and the storage
> and querying times are logged for further analysis and comparison.
>
> I am testing this architecture with RDF files of increasing size, from
> 10k triples up to 5M triples (and more).
>
> The main reason for this e-mail, besides letting you know that I am
> using your system for research, is to find out whether I am using it
> correctly for our purpose, so that the results I obtain when running my
> tests are valid.
>
> In particular, I would like to know whether I am using Neo4j in the best
> way in the following seven functions. Below is how we currently
> implement each of them.
>
> - (1) Creating a new graph
>
>     public Neo4jGraph(String s) throws Exception {
>         super(s);
>         directory = "alaska-data/neo4j/" + s;
>         graph = new EmbeddedGraphDatabase(directory);
>     }
>
> - (2) Adding a new node to the graph
>
>     public long addTerm(Object label) {
>         Map<String,Object> properties = new HashMap<String,Object>();
>         properties.put("label", label.toString());
>         long newNode = inserter.createNode(properties);
>         batchIndex.add(newNode, properties);
>         batchIndex.flush();
>         return newNode;
>     }
>
> - (3) Adding a new edge to the graph
>
>     public void addAtom(Object predicateLabel, ArrayList<Object> termObjects)
>             throws Exception {
>         Long n1 = getNodeByLabel(termObjects.get(0));
>         Long n2 = getNodeByLabel(termObjects.get(1));
>
>         if (n1 == null) { n1 = addTerm(termObjects.get(0)); }
>         if (n2 == null) { n2 = addTerm(termObjects.get(1)); }
>
>         if (n1 != n2) {
>             inserter.createRelationship(n1, n2,
>                     DynamicRelationshipType.withName(predicateLabel.toString()), null);
>         }
>     }
>
> - (4) Retrieving all the nodes of the graph
>
>     public ArrayList<ITerm> getTerms() throws Exception {
>         ArrayList<ITerm> terms = new ArrayList<ITerm>();
>         for (Node n : graph.getAllNodes()) {
>             if (n.getId() != 0) {
>                 Term newTerm = new Term(n.getProperty("label"));
>                 terms.add(newTerm);
>             }
>         }
>         return terms;
>     }
>
> - (5) Retrieving all the edges of the graph
>
>     public ArrayList<IAtom> getAtoms() throws Exception {
>         ArrayList<IAtom> atomsToReturn = new ArrayList<IAtom>();
>
>         for (Node n : graph.getAllNodes()) {
>             Iterable<Relationship> rel = n.getRelationships(Direction.OUTGOING);
>
>             for (Relationship r : rel) {
>                 String predFullName = r.getType().toString();
>                 int predStart = predFullName.indexOf("[") + 1;
>                 String predName =
>                         predFullName.substring(predStart, predFullName.length() - 1);
>                 Predicate pred = new Predicate(predName, 2);
>
>                 ArrayList<ITerm> atomTerms = new ArrayList<ITerm>();
>
>                 ITerm nt1 = new Term(r.getStartNode().getProperty("label"));
>                 ITerm nt2 = new Term(r.getEndNode().getProperty("label"));
>                 atomTerms.add(nt1);
>                 atomTerms.add(nt2);
>
>                 IAtom newAtom = new Atom(pred, atomTerms);
>                 atomsToReturn.add(newAtom);
>             }
>         }
>         return atomsToReturn;
>     }
>
> - (6) Retrieving a node by its label
>
>     public Long getNodeByLabel(Object label) {
>         IndexHits<Long> hits = batchIndex.get("label", label.toString());
>         if (hits.size() == 0) { return null; }
>         else { return hits.getSingle(); }
>     }
>
> - (7) Identifying whether there is an edge between two nodes or not
>
>     public boolean areConnected(ITerm t1, ITerm t2, Predicate p, int pos)
>             throws Exception {
>         Direction dir;
>         if (pos == 0) { dir = Direction.OUTGOING; }
>         else { dir = Direction.INCOMING; }
>
>         long l = getNodeByLabel(t1.getLabel().toString());
>         Node n = graph.getNodeById(l);
>         Iterable<Relationship> rel = n.getRelationships(dir);
>
>         for (Relationship r : rel) {
>             String predFullName = r.getType().toString();
>             int predStart = predFullName.indexOf("[") + 1;
>             String predName =
>                     predFullName.substring(predStart, predFullName.length() - 1);
>
>             if (predName.equals(p.getLabelToString())) {
>                 if (dir == Direction.OUTGOING) {
>                     String otherTerm = r.getEndNode().getProperty("label").toString();
>                     if (otherTerm.equals(t2.getLabel().toString())) { return true; }
>                 }
>                 else {
>                     String otherTerm = r.getStartNode().getProperty("label").toString();
>                     if (otherTerm.equals(t2.getLabel().toString())) { return true; }
>                 }
>             }
>         }
>         return false;
>     }
>
> With that in mind, I would ask you to read the small pieces of code in
> this e-mail carefully and tell me whether they can be improved,
> principally in terms of the number of operations, execution time and
> memory usage.
>
> By the way, do not hesitate to contact me if you are interested in the
> results.
>
> Thank you,
>
> Bruno Paiva Lima da Silva
> PhD Student
> GraphIK Research Team
> LIRMM - Montpellier, France
> _______________________________________________
> Neo4j mailing list
> https://lists.neo4j.org/mailman/listinfo/user
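Two details in (3) may be worth checking, independently of the API choice. First, `n1` and `n2` are `java.lang.Long` objects, so `n1 != n2` compares object references rather than node ids. Two `Long` objects holding the same id are not guaranteed to be the same instance (in practice only small values are cached), so once ids grow the check can pass even when both terms resolve to the same node, and a self-loop gets created. Second, both labels are looked up before either term is added, so an atom whose two terms share a label that is not yet indexed creates two nodes with that label. After that, a lookup of the label in (6) returns two hits instead of one. The sketch below is a hypothetical, Neo4j-free stand-in: `AtomLoaderSketch`, `getOrCreate` and the in-memory map that replaces the label index are all assumptions. It resolves the terms one after the other and compares primitive `long` ids:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, Neo4j-free stand-in for addAtom in (3): the HashMap plays the
 * role of the label index and nextId plays the role of inserter.createNode().
 */
class AtomLoaderSketch {
    /** A relationship between two node ids, with its type name. */
    static final class Edge {
        final String type;
        final long from;
        final long to;

        Edge(String type, long from, long to) {
            this.type = type;
            this.from = from;
            this.to = to;
        }
    }

    private final Map<String, Long> idByLabel = new HashMap<String, Long>();
    private final List<Edge> edges = new ArrayList<Edge>();
    private long nextId = 1; // starts at 1 only to mirror (4), which skips node 0

    /** Returns the id for a label, creating a "node" only if none exists yet. */
    long getOrCreate(String label) {
        Long id = idByLabel.get(label);
        if (id == null) {
            id = nextId++;
            idByLabel.put(label, id);
        }
        return id; // unboxed: callers only ever see primitive longs
    }

    /** Resolves the two terms one after the other, then compares values. */
    void addAtom(String predicate, String label1, String label2) {
        long n1 = getOrCreate(label1);
        long n2 = getOrCreate(label2); // already sees a node created for label1
        if (n1 != n2) { // primitive comparison: values, not object references
            edges.add(new Edge(predicate, n1, n2));
        }
    }

    int nodeCount() { return idByLabel.size(); }

    int edgeCount() { return edges.size(); }
}
```

With `addAtom("knows", "alice", "bob")` followed by `addAtom("knows", "carol", "carol")`, the sketch ends up with three nodes and a single edge. In the real code, the equivalent changes would be to unbox the ids (or compare them with `equals`) and to look up the second term only after the first one has been added.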

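In (5) and (7), the predicate name is recovered by parsing `r.getType().toString()`, which assumes that the string looks like `Something[name]`. If it contains no `[`, `indexOf` returns -1, `predStart` becomes 0, and the `substring` call silently drops the last character of the name. The `RelationshipType` interface already exposes the name through `name()`, so `r.getType().name()` avoids the parsing altogether. The small sketch below reproduces only the parsing step on plain strings to show this failure mode. `PredicateNameParsing` and `parsePredicateName` are hypothetical names, and no claim is made about what Neo4j's `toString()` actually returns:

```java
/** Reproduces the bracket parsing used in (5) and (7), on plain strings. */
class PredicateNameParsing {
    static String parsePredicateName(String predFullName) {
        // indexOf returns -1 when there is no '[', so predStart becomes 0
        int predStart = predFullName.indexOf("[") + 1;
        // always drops the last character, assuming it is a closing ']'
        return predFullName.substring(predStart, predFullName.length() - 1);
    }
}
```

`parsePredicateName("Type[knows]")` yields `"knows"`, but `parsePredicateName("knows")` yields `"know"`.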

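In (7), `long l = getNodeByLabel(...)` unboxes the `Long` returned by (6), which is `null` when the label is not indexed. Calling `areConnected` with an unknown term therefore throws a `NullPointerException` instead of returning `false`. The loop also reads the `label` property of every neighbour. An alternative worth trying is to resolve `t2` to its node id once and compare `r.getOtherNode(n).getId()` with it. Neo4j's `Node` also offers `getRelationships(RelationshipType, Direction)`, which lets the database filter by type. The sketch below applies both ideas to a hypothetical in-memory model (`ConnectionCheckSketch`, `addNode` and `addEdge` are illustrative names, not the Neo4j API):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of the check in (7); not the Neo4j API. */
class ConnectionCheckSketch {
    /** An outgoing edge: relationship type name plus end node id. */
    static final class Edge {
        final String type;
        final long to;

        Edge(String type, long to) {
            this.type = type;
            this.to = to;
        }
    }

    private final Map<String, Long> idByLabel = new HashMap<String, Long>();
    private final Map<Long, List<Edge>> outgoing = new HashMap<Long, List<Edge>>();

    void addNode(String label, long id) { idByLabel.put(label, id); }

    void addEdge(long from, String type, long to) {
        List<Edge> edges = outgoing.get(from);
        if (edges == null) {
            edges = new ArrayList<Edge>();
            outgoing.put(from, edges);
        }
        edges.add(new Edge(type, to));
    }

    /** pos == 0: t1 is the start node; otherwise t1 is the end node. */
    boolean areConnected(String t1, String t2, String predicate, int pos) {
        Long id1 = idByLabel.get(t1);
        Long id2 = idByLabel.get(t2);
        if (id1 == null || id2 == null) {
            return false; // unknown term: no edge, instead of an unboxing NPE
        }
        long from = (pos == 0) ? id1 : id2;
        long to = (pos == 0) ? id2 : id1;
        List<Edge> edges = outgoing.get(from);
        if (edges == null) {
            return false;
        }
        for (Edge e : edges) {
            // both terms are resolved, so compare ids instead of label strings
            if (e.to == to && e.type.equals(predicate)) {
                return true;
            }
        }
        return false;
    }
}
```

The `pos` handling mirrors (7): `pos == 0` asks for an edge from `t1` to `t2`, and any other value asks for an edge from `t2` to `t1`.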