Re: [Neo4j] Further information about Neo4J

Michael Hunger Thu, 11 Aug 2011 12:31:14 -0700

Bruno,

it would probably be easiest for you to put the code in a public repository
(e.g. GitHub) where people can look at it, fork it, and refactor it toward
the correct solution.


I saw that you used both the BatchInserter API and the transactional Neo4j API calls.

Please note that the BatchInserter is intended for fast insertion of large
amounts of data (millions or billions of nodes and relationships) into the
graph. It is neither transactional nor thread-safe, and it has to be shut
down properly so as not to corrupt the database.
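To make the "shut down properly" point concrete, here is a minimal sketch of the try/finally shape that guarantees shutdown even when loading fails halfway. `BulkLoader` and `LoadJob` are hypothetical stand-ins so the example runs without Neo4j; in the real 1.x API the object would be the `BatchInserter` and the call in `finally` its `shutdown()`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a non-transactional bulk loader such as
// Neo4j's BatchInserter: the store is only consistent after shutdown().
class BulkLoader {
    private final List<String> pending = new ArrayList<String>();
    private boolean shutDown = false;

    long createNode(String label) {
        if (shutDown) throw new IllegalStateException("loader already shut down");
        pending.add(label);
        return pending.size() - 1; // id = insertion position
    }

    void shutdown() { shutDown = true; } // the real loader would flush and close its files here

    boolean isShutDown() { return shutDown; }
}

class LoadJob {
    // Loads all labels; shutdown() runs whether loading succeeds or throws.
    static void load(BulkLoader loader, List<String> labels) {
        try {
            for (String label : labels) {
                if (label == null) throw new IllegalArgumentException("null label");
                loader.createNode(label);
            }
        } finally {
            loader.shutdown(); // always reached
        }
    }
}
```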

The regular (transactional) Neo4j API is the preferred one for day-to-day work with the graph database.
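In the Neo4j 1.x API, writes go through `graphDb.beginTx()`, then `tx.success()` once the work is done, then `tx.finish()` in a `finally` block; `finish()` commits only if `success()` was called and rolls back otherwise. The sketch below models just that contract with a hypothetical `SketchTx` class, so it runs without Neo4j on the classpath; it is an illustration of the idiom, not Neo4j's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the 1.x transaction contract:
// finish() commits the buffered writes only if success() was called.
class SketchTx {
    private final List<String> committed;
    private final List<String> buffer = new ArrayList<String>();
    private boolean success = false;

    SketchTx(List<String> committed) { this.committed = committed; }

    void write(String item) { buffer.add(item); }

    void success() { success = true; }

    void finish() {
        if (success) committed.addAll(buffer); // commit
        buffer.clear();                        // without success(): rollback
    }
}

class TxIdiom {
    // success() is the last statement in try; finish() always runs.
    static void addLabels(List<String> store, List<String> labels) {
        SketchTx tx = new SketchTx(store);
        try {
            for (String label : labels) {
                if (label.isEmpty()) throw new IllegalArgumentException("empty label");
                tx.write(label);
            }
            tx.success();
        } finally {
            tx.finish();
        }
    }
}
```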

You could also put your benchmark online for people to look at and comment on.

Cheers 

Michael

On 11.08.2011 at 19:37, Bruno Paiva Lima da Silva wrote:

> Hello,
> 
> My name is Bruno Paiva Lima da Silva, and I am a PhD student at LIRMM in
> Montpellier, France, investigating graph-based conjunctive query
> answering. A very quick description of my thesis can be found at [
> http://bplsilva.com/en/ ], or, for more details, please see my
> presentations [ http://bplsilva.com/en/research/talks/ ].
> 
> The reason I am writing to you is that I need further information
> regarding your storage system. In my work I aim to compare several
> different storage systems for conjunctive querying. To this end I have
> implemented a common, abstract interface in Java that uses the
> "logical" representation (as defined in first-order logic) of a factual
> piece of knowledge. The formula is then sent to different storage
> systems (e.g. DEX, HyperGraph, MySQL, Neo4J, OrientDB, Sqlite, and
> others), one of them being yours, and the storage and querying
> times are logged for further analysis and comparison.
> 
> I am testing this architecture with RDF files of increasing size,
> ranging from 10k triples up to 5M triples (and more).
> 
> The main reason for this e-mail, besides letting you know that I am
> using your system in a research context, is to find out whether I am
> using it correctly for our purpose, so as to ensure the validity of the
> results I obtain when I run my tests.
> 
> I am particularly interested in knowing whether I am using the best
> approach with Neo4J in 7 of our functions. Below you will find the
> way we implement each of them as of today.
> 
> - (1) Creating a new graph
> 
> public Neo4jGraph(String s) throws Exception {
>         super(s);
>         directory = "alaska-data/neo4j/" + s;
>         graph = new EmbeddedGraphDatabase(directory);
> }
> 
> 
> - (2) Adding a new node to the graph
> 
> public long addTerm(Object label) {
>         Map<String,Object> properties = new HashMap<String,Object>();
>         properties.put("label", label.toString());
>         long newNode = inserter.createNode(properties);
>         batchIndex.add(newNode, properties);
>         batchIndex.flush();
>         return newNode;
> }
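`addTerm` flushes the batch index after every insert, which is what makes the new entry visible to the next `getNodeByLabel` lookup. During a single-threaded bulk load, one option is to keep the label-to-id mapping in a plain `HashMap` instead and write the index once at the end. A minimal sketch, assuming all labels fit in memory; `LabelCache` and its `IdSource` hook are hypothetical names, with `IdSource` standing in for `inserter.createNode(...)`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory label -> node-id cache for a single-threaded bulk load:
// each lookup is one HashMap access instead of an index query plus a flush.
class LabelCache {
    interface IdSource { long createNode(String label); } // stands in for inserter.createNode

    private final Map<String, Long> ids = new HashMap<String, Long>();
    private final IdSource source;

    LabelCache(IdSource source) { this.source = source; }

    // Returns the id already assigned to this label, creating the node on first sight.
    long getOrCreate(Object label) {
        String key = label.toString();
        Long id = ids.get(key);
        if (id == null) {
            id = source.createNode(key);
            ids.put(key, id);
        }
        return id;
    }

    int size() { return ids.size(); }
}
```

After the load, the cached entries could be written to the index in one pass before shutting the inserter down.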
> 
> - (3) Adding a new edge to the graph
> 
> public void addAtom(Object predicateLabel, ArrayList<Object> termObjects)
>                 throws Exception {
>         Long n1 = getNodeByLabel(termObjects.get(0));
>         Long n2 = getNodeByLabel(termObjects.get(1));
> 
>         if (n1 == null) { n1 = addTerm(termObjects.get(0)); }
>         if (n2 == null) { n2 = addTerm(termObjects.get(1)); }
> 
>         // equals() compares the ids; != on two Long objects compares references
>         if (!n1.equals(n2)) {
>                 inserter.createRelationship(n1, n2,
>                         DynamicRelationshipType.withName(predicateLabel.toString()),
>                         null);
>         }
> }
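Comparing the two ids in `addAtom` needs value equality. `getNodeByLabel` returns a boxed `Long`, and `!=` between two `Long` objects compares references, not values. Common JVMs cache boxes for small values (-128..127), so the problem stays hidden while ids are small; for larger node ids, two equal values are usually different objects and a reference check no longer detects a self-loop. The program below isolates the comparison; `sameId` is a hypothetical helper name.

```java
// Boxed Long comparison: == and != test object identity, equals() tests the value.
class LongCompare {
    // Value comparison that works for any id (returns false if a is null).
    static boolean sameId(Long a, Long b) {
        return a != null && a.equals(b);
    }

    public static void main(String[] args) {
        Long small1 = 42L, small2 = 42L;     // autoboxed, inside the usual cache range
        Long big1 = 100000L, big2 = 100000L; // autoboxed, outside that range

        // Identity results depend on box caching, so they are not reliable.
        System.out.println("small1 == small2:   " + (small1 == small2));
        System.out.println("big1 == big2:       " + (big1 == big2));
        // Value comparison is reliable.
        System.out.println("sameId(big1, big2): " + sameId(big1, big2)); // true
    }
}
```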
> 
> 
> 
> - (4) Retrieving all the nodes of the graph
> 
> public ArrayList<ITerm> getTerms() throws Exception {
>         ArrayList<ITerm> terms = new ArrayList<ITerm>();
>         for (Node n : graph.getAllNodes()) {
>                 if (n.getId() != 0) { // skip the reference node (id 0)
>                         terms.add(new Term(n.getProperty("label")));
>                 }
>         }
>         return terms;
> }
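Since memory usage is one of the concerns: `getTerms` builds the complete `ArrayList` before returning, so with millions of nodes every `Term` is held in memory at once. If callers only iterate over the result, a lazily filtered and mapped `Iterable` keeps a single element in flight. A plain-Java sketch under that assumption; `LazyIterables.filterMap` and its `Pred`/`Fn` interfaces are hypothetical helpers, and in the Neo4j setting the source would be `graph.getAllNodes()` with the filter `n.getId() != 0`.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical lazy filter + map over an Iterable: each element is converted
// only when the caller asks for it, so no full result list is ever built.
class LazyIterables {
    interface Fn<A, B> { B apply(A a); }
    interface Pred<A> { boolean test(A a); }

    static <A, B> Iterable<B> filterMap(final Iterable<A> source,
                                        final Pred<A> keep, final Fn<A, B> fn) {
        return new Iterable<B>() {
            public Iterator<B> iterator() {
                final Iterator<A> it = source.iterator();
                return new Iterator<B>() {
                    private A pending;             // next element that passed the filter
                    private boolean ready = false; // true when pending is valid

                    public boolean hasNext() {
                        while (!ready && it.hasNext()) {
                            A candidate = it.next();
                            if (keep.test(candidate)) { pending = candidate; ready = true; }
                        }
                        return ready;
                    }

                    public B next() {
                        if (!hasNext()) throw new NoSuchElementException();
                        ready = false;
                        return fn.apply(pending);
                    }

                    public void remove() { throw new UnsupportedOperationException(); }
                };
            }
        };
    }
}
```

The trade-off is that the result can only be consumed by iterating; callers that need random access or `size()` would still have to copy it into a list.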
> 
> - (5) Retrieving all the edges of the graph
> 
> public ArrayList<IAtom> getAtoms() throws Exception {
>         ArrayList<IAtom> atomsToReturn = new ArrayList<IAtom>();
> 
>         for (Node n : graph.getAllNodes()) {
>                 for (Relationship r : n.getRelationships(Direction.OUTGOING)) {
>                         // name() returns the type name directly, so there is
>                         // no need to parse it out of toString()
>                         Predicate pred = new Predicate(r.getType().name(), 2);
> 
>                         ArrayList<ITerm> atomTerms = new ArrayList<ITerm>();
>                         atomTerms.add(new Term(r.getStartNode().getProperty("label")));
>                         atomTerms.add(new Term(r.getEndNode().getProperty("label")));
> 
>                         atomsToReturn.add(new Atom(pred, atomTerms));
>                 }
>         }
>         return atomsToReturn;
> }
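`RelationshipType.name()` is the dependable way to get the predicate name. Slicing `toString()` between `[` and the last character only works when `toString()` happens to have the form `Something[name]`, which is an implementation detail rather than a documented contract. The snippet below re-implements that slicing as `parseFromToString` (a hypothetical name) to show what it returns for a string without brackets: `indexOf` yields -1, so the slice starts at 0 and silently drops the last character.

```java
// Re-implementation of the toString()-slicing approach to recover a type name.
class TypeNameParsing {
    static String parseFromToString(String predFullName) {
        int predStart = predFullName.indexOf("[") + 1; // 0 when there is no '['
        return predFullName.substring(predStart, predFullName.length() - 1);
    }

    public static void main(String[] args) {
        // Works when the string looks like "Impl[name]" ...
        System.out.println(parseFromToString("RelationshipTypeImpl[knows]")); // knows
        // ... but truncates the name otherwise.
        System.out.println(parseFromToString("knows"));                       // know
    }
}
```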
> 
> - (6) Retrieving a node by its label
> 
> public Long getNodeByLabel(Object label) {
>         // getSingle() returns null when there is no hit
>         return batchIndex.get("label", label.toString()).getSingle();
> }
> 
> - (7) Identifying whether there is an edge between two nodes or not
> 
> public boolean areConnected(ITerm t1, ITerm t2, Predicate p, int pos)
>                 throws Exception {
>         Direction dir = (pos == 0) ? Direction.OUTGOING : Direction.INCOMING;
> 
>         Long id = getNodeByLabel(t1.getLabel().toString());
>         if (id == null) { return false; } // unknown term: it has no edges
>         Node n = graph.getNodeById(id);
>         String target = t2.getLabel().toString();
>         RelationshipType type = DynamicRelationshipType.withName(p.getLabelToString());
> 
>         // only relationships of the requested type and direction are visited
>         for (Relationship r : n.getRelationships(type, dir)) {
>                 // getOtherNode(n) is the end node for OUTGOING and the
>                 // start node for INCOMING relationships
>                 Object other = r.getOtherNode(n).getProperty("label");
>                 if (target.equals(other.toString())) { return true; }
>         }
>         return false;
> }
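`areConnected` walks the relationships of one node for every check. If the same graph is queried many times, one alternative (a swapped-in technique, not something Neo4j provides) is to build a hash set of `(start, predicate, end)` keys once, so each check becomes a single lookup at the cost of extra memory. A minimal sketch, assuming node ids and predicate names as plain Java values; `EdgeSet` is a hypothetical class, and its `pos` argument mirrors the convention of `areConnected`.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical set of directed, labelled edges for constant-time
// "is there an edge of type p from a to b?" checks.
class EdgeSet {
    private final Set<String> edges = new HashSet<String>();

    // Tab-separated key; unambiguous as long as predicate names contain no tab.
    private static String key(long start, String predicate, long end) {
        return start + "\t" + predicate + "\t" + end;
    }

    void add(long start, String predicate, long end) {
        edges.add(key(start, predicate, end));
    }

    // pos == 0: t1 is the start node (OUTGOING); otherwise t1 is the end node (INCOMING).
    boolean areConnected(long t1, long t2, String predicate, int pos) {
        return pos == 0 ? edges.contains(key(t1, predicate, t2))
                        : edges.contains(key(t2, predicate, t1));
    }
}
```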
> 
> With that in mind, I would ask you to read carefully the small pieces of
> code included in this e-mail and tell me whether there is a way to
> improve them, principally in terms of reducing the number of operations,
> execution time and memory usage.
> 
> By the way, do not hesitate to contact me if you are further interested 
> in the results obtained.
> 
> Thank you,
> 
> Bruno Paiva Lima da Silva
> PhD Student
> GraphIK Research Team
> LIRMM - Montpellier, France
> _______________________________________________
> Neo4j mailing list
> [email protected]
> https://lists.neo4j.org/mailman/listinfo/user
