Hi Erik,

I don't think there's much to be done about it. An on-disk Lucene index (and,
I imagine, even an in-memory one) will never be as fast as a HashMap or a
similar approach. You're calling setCacheCapacity on the index, which should
bring cache hits pretty close to HashMap performance, but unfortunately
there's a bug that prevents results from ending up in the cache. It's in my
backlog.
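
Until that is fixed, a small read-through cache in front of the index is one
possible workaround. The following is only a minimal sketch, not part of
Neo4j: the class name ReadThroughCache and its misses() counter are made up
for illustration, and the slow lookup is passed in as a plain function so the
example stays self-contained. It is not thread-safe and never evicts entries.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical helper (not a Neo4j class): asks the slow lookup once per key
 * and serves all later requests for that key from a HashMap.
 */
final class ReadThroughCache<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> slowLookup;
    private int misses = 0;

    ReadThroughCache(Function<K, V> slowLookup) {
        this.slowLookup = slowLookup;
    }

    /** Returns the cached value, falling back to the slow lookup on a miss. */
    V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            misses++;
            value = slowLookup.apply(key);
            // Null results (key not found) are not cached, so they are retried.
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }

    /** Number of times the slow lookup was actually invoked. */
    int misses() {
        return misses;
    }
}
```

In your case the slow lookup would presumably be something along the lines of
id -> ni.get("id", id).getSingle(), with the key type Integer and the value
type Node. With about 1M entries the map should fit in memory, as you say.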

2011/7/29 Erik Fäßler <erik.faess...@uni-jena.de>

> Hi all,
>
> I've been doing preliminary evaluations of some Neo4j operations, one of
> which arises from a specific need in my application:
> My method gets a List of node ids (stored in the nodes' properties) and
> needs to retrieve exactly these nodes from the GraphDB. This should happen as
> fast as possible, of course. I used an index for the ids. My code is as
> follows:
>
> private static final int SAMPLE_SIZE = 100000;
>
> ...
>
> GraphDatabaseService graphDb = new EmbeddedGraphDatabase("tmp/graphdb");
>
>                Transaction t = graphDb.beginTx();
>                IndexManager im = graphDb.index();
>                Index<Node> ni = im.forNodes("nodes");
>                ( (LuceneIndex<Node>) ni ).setCacheCapacity( "nodes", 500000 );
>                for (int i = 0; i < SAMPLE_SIZE; ++i) {
>                        Node n = graphDb.createNode();
>                        n.setProperty("id", i);
>                        ni.add(n, "id", n.getProperty("id"));
>                }
>                t.success();
>                t.finish();
>
>                long time = System.currentTimeMillis();
>                for (int i = 0; i < SAMPLE_SIZE; ++i) {
>                        Node n = ni.get("id", i).getSingle();
>                }
>                System.out.println(System.currentTimeMillis() - time);
>
>
> It works, but is rather slow. If I run the last loop a second time, the
> Lucene cache kicks in and cuts the required time in half. But it still
> takes a while (2000ms on my machine).
> When I do the exact same thing with a HashMap, for example, the same loop
> (the one with the call Node n = ni.get("id", i).getSingle();) takes about 10ms.
>
> I know HashMaps have other drawbacks, such as memory consumption. For my
> use case this wouldn't be a problem, however, as I would only have to
> cache about 1M nodes, which is perfectly possible in a HashMap. My main
> question is: Have I done something wrong in my usage of the Lucene index?
> Can it be sped up somehow? Or will I always get better performance from a
> HashMap in cases like this, where I have a large number of single
> queries?
>
> Thank you and best regards,
>
>        Erik
>
>
> _______________________________________________
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>



-- 
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com