Hi Erik,

I don't think there's much to be done about it. An on-disk Lucene index (and, I'd imagine, not even an in-memory one) will never be as fast as a HashMap or a similar approach. You are calling setCacheCapacity on the index, which should bring cache hits pretty close to HashMap performance, but unfortunately there's a bug that prevents results from ending up in the cache. It's in my backlog.
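If you go the HashMap route in the meantime, the idea could be sketched roughly as below: a small read-through cache that only asks the index on a miss. This is only an illustration, not anything shipped with Neo4j. The class name IdLookupCache and its methods are made up for this example, and the loader function is a stand-in for whatever does the real lookup (in your code that would presumably be something like id -> ni.get("id", id).getSingle()). It is not thread-safe and never evicts, which should be fine for roughly 1M entries held by a single thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

/**
 * Hypothetical read-through cache keyed by the application-level "id".
 * The loader stands in for the slow lookup (e.g. an index query).
 * Not thread-safe; entries are never evicted.
 */
final class IdLookupCache<V> {
    private final Map<Integer, V> cache = new HashMap<>();
    private final IntFunction<V> loader;
    private int misses = 0;

    IdLookupCache(IntFunction<V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, calling the loader only on a miss. */
    V get(int id) {
        V value = cache.get(id);
        if (value == null) {
            misses++;
            value = loader.apply(id);
            if (value != null) {
                cache.put(id, value); // null results are not cached
            }
        }
        return value;
    }

    /** Number of times the loader had to be consulted. */
    int misses() {
        return misses;
    }

    /** Number of entries currently held in memory. */
    int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        // Stand-in loader; a real one would query the index instead.
        IdLookupCache<String> nodes = new IdLookupCache<>(id -> "node-" + id);
        for (int pass = 0; pass < 2; pass++) {
            for (int i = 0; i < 100000; i++) {
                nodes.get(i);
            }
        }
        System.out.println("loader calls: " + nodes.misses());
    }
}
```

With something like this the second pass over the same ids never touches the index at all, at the cost of keeping every looked-up value on the heap.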
2011/7/29 Erik Fäßler <erik.faess...@uni-jena.de>

> Hi all,
>
> I've been doing preliminary evaluations of some Neo4j operations, one of
> which arises from a specific need in my application:
> My method gets a List of node ids (stored in the nodes' properties) and
> needs to retrieve exactly these nodes from the GraphDB. This should happen as
> fast as possible, of course. I used an index for the ids. My code is as
> follows:
>
>     private static final int SAMPLE_SIZE = 100000;
>
>     ...
>
>     GraphDatabaseService graphDb = new EmbeddedGraphDatabase("tmp/graphdb");
>
>     Transaction t = graphDb.beginTx();
>     IndexManager im = graphDb.index();
>     Index<Node> ni = im.forNodes("nodes");
>     ((LuceneIndex<Node>) ni).setCacheCapacity("nodes", 500000);
>     for (int i = 0; i < SAMPLE_SIZE; ++i) {
>         Node n = graphDb.createNode();
>         n.setProperty("id", i);
>         ni.add(n, "id", n.getProperty("id"));
>     }
>     t.success();
>     t.finish();
>
>     long time = System.currentTimeMillis();
>     for (int i = 0; i < SAMPLE_SIZE; ++i) {
>         Node n = ni.get("id", i).getSingle();
>     }
>     System.out.println(System.currentTimeMillis() - time);
>
> It works, but is rather slow. If I run the last loop a second time, the
> Lucene cache kicks in and cuts the required time in half. But even then it
> still takes a while (2000 ms on my machine).
> When I do the exact same thing with a HashMap, for example, the same loop
> (the one with the call Node n = ni.get("id", i).getSingle();) takes about 10 ms.
>
> I know HashMaps have other drawbacks, such as memory consumption. For my
> use case this wouldn't be a problem, however, as I would only have to
> cache about 1M nodes, which is perfectly possible in a HashMap. My main
> question is: Have I done something wrong in my usage of the Lucene index?
> Can it be sped up somehow? Or will I always get better performance from
> a HashMap in cases like this, where I have a large number of single
> queries?
>
> Thank you and best regards,
>
> Erik
>
> _______________________________________________
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user

--
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com