In my OrientDB-based application, I need to do an INSERT-IF-NOT-EXISTS 
operation using the Java (TinkerPop) API.

I have created a vertex type "Identifier." It has a single property, 
"identifier," which contains a URI (effectively a String for purposes of 
this discussion).

I have also created an index like this:

    ParametersBuilder builder = new ParametersBuilder();
    builder.add("class", "Identifier");
    builder.add("type", "UNIQUE_HASH_INDEX");
    graph.createKeyIndex("identifier", Vertex.class, builder.build());


Then I perform the INSERT-IF-NOT-EXISTS operation in a loop like this. 
The snippet uses the Google Guava library and is a simplification of our 
real application:

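    int n = 10000;
    for (int i = 0; i < n; i++)
    {
        // Build the URI for this iteration (string concatenation converts the int).
        String myUriStr = "http://example.org/" + i;

        // Look up any existing vertex with this identifier.
        Iterable<Vertex> vertices = graph.getVertices("identifier", myUriStr);

        // Returns null when nothing matches; the one-argument form of
        // getOnlyElement would throw NoSuchElementException instead.
        Vertex vertex = Iterables.getOnlyElement(vertices, null);

        if (null == vertex)
        {
            // Create vertex
            ...
        }

        // Use vertex
        ...
    }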


What I am seeing is that the throughput of this loop rapidly diminishes as 
more vertices are added, like this (with the throughput relative to the 
n=1,000 baseline):


    n=1,000     throughput=100%
    n=2,000     throughput=58.8%
    n=5,000     throughput=29.7%
    n=10,000    throughput=16.5%
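
For anyone who wants to reproduce the numbers, here is a minimal sketch of how such relative figures can be computed. It assumes "throughput" means loop iterations per second of wall-clock time, which is one reasonable reading of the figures above. The class and method names are hypothetical and are not taken from our application, and the timings in main are placeholders, not measurements.

```java
// Hypothetical helper for illustration only; not part of the real test harness.
final class RelativeThroughput {

    // Iterations per second for a run of `ops` iterations that took `elapsedNanos`.
    static double opsPerSecond(long ops, long elapsedNanos) {
        if (ops < 0 || elapsedNanos <= 0) {
            throw new IllegalArgumentException("ops must be >= 0 and elapsedNanos > 0");
        }
        return ops * 1e9 / elapsedNanos;
    }

    // Throughput of a run expressed as a percentage of the baseline run's throughput.
    static double percentOfBaseline(long baselineOps, long baselineNanos,
                                    long ops, long elapsedNanos) {
        return 100.0 * opsPerSecond(ops, elapsedNanos)
                / opsPerSecond(baselineOps, baselineNanos);
    }

    // Times a single execution of a workload with System.nanoTime().
    static long timeNanos(Runnable workload) {
        long start = System.nanoTime();
        workload.run();
        // Clamp to 1 ns so a very fast workload never yields a zero duration.
        return Math.max(1L, System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Placeholder timings (assumed values, not real measurements):
        // the baseline runs 1,000 iterations in 1 s, the second run 2,000 in 4 s.
        long baselineNanos = 1_000_000_000L;
        long runNanos = 4_000_000_000L;
        double percent = percentOfBaseline(1_000, baselineNanos, 2_000, runNanos);
        System.out.println("relative throughput (%): " + percent);
    }
}
```

In a real measurement, the Runnable passed to timeNanos would wrap the INSERT-IF-NOT-EXISTS loop for a given n, starting from a fresh database each time.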


This obviously suggests that indexing is not working, so I tried a SQL 
EXPLAIN command.

*explain select from identifier where identifier='http://example.org/1'*
documentReads=1
fullySortedByIndex=false
documentAnalyzedCompatibleClass=1
recordReads=1
fetchingFromTargetElapsed=0
indexIsUsedInOrderBy=false
compositeIndexUsed=1
current=Identifier#153:0{identifier:http://example.org/1,out_id:[size=1]} v2
involvedIndexes=[Identifier.identifier]
limit=-1
evaluated=1
user=#5:0
elapsed=2.387001
resultType=collection
resultSize=1 
 

The documentation at http://orientdb.com/docs/master/SQL-Explain.html does 
not seem to be 100% current on how to interpret the output of the EXPLAIN 
command, but my interpretation is that the query did recognize and use the 
index that I created.

I also tried some profiling (with JProfiler) and see a hot spot 
at com.tinkerpop.blueprints.impls.orient.OrientElementIterator.hasNext.

All of this is with OrientDB running in embedded mode, on a fairly high-end 
Linux machine and with a fresh, empty database at the beginning of each 
test.

I have to believe I am doing something wrong to see such a rapid drop-off 
in query performance under such relatively small data volumes.

I have been struggling with this off and on for several days now, and it's 
time to ask for help. Has anyone else encountered a similar issue? What can 
I do to address this?

Thanks in advance!

-- John
