In my OrientDB-based application, I need to do an INSERT-IF-NOT-EXISTS
operation using the Java (TinkerPop) API.
I have created a vertex type, "Identifier". It has a single property,
"identifier", which contains a URI (effectively a String for purposes of
this discussion).
I have also created an index like this:
    ParametersBuilder builder = new ParametersBuilder();
    builder.add("class", "Identifier");
    builder.add("type", "UNIQUE_HASH_INDEX");
    graph.createKeyIndex("identifier", Vertex.class, builder.build());
Then, I perform the INSERT-IF-NOT-EXISTS operation in a loop like this.
This snippet is using the Google Guava libraries and is obviously a
simplification of our real application:
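    int n = 10000;
    for (int i = 0; i < n; i++)
    {
        // String concatenation converts the int; a primitive has no toString()
        String myUriStr = "http://example.org/" + i;
        Iterable<Vertex> vertices = graph.getVertices("identifier", myUriStr);
        // The two-argument form returns null instead of throwing when empty
        Vertex vertex = Iterables.getOnlyElement(vertices, null);
        if (null == vertex)
        {
            // Create vertex
            ...
        }
        // Use vertex
        ...
    }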
What I am seeing is that the throughput of this loop rapidly diminishes as
more vertices are added, like this (with the throughput relative to the
n=1,000 baseline):
n=1,000 throughput=100%
n=2,000 throughput=58.8%
n=5,000 throughput=29.7%
n=10,000 throughput=16.5%
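To put those figures another way, here is a rough sketch that inverts each
relative throughput to get the relative average cost per operation. It
assumes that throughput means operations per second averaged over the whole
run; the class and method names are made up for illustration only.

```java
import java.util.Locale;

public class ThroughputCheck {
    // Relative average time per operation, assuming throughput is
    // operations per second averaged over the whole run (an assumption).
    static double relativeCostPerOp(double throughputPercent) {
        return 100.0 / throughputPercent;
    }

    public static void main(String[] args) {
        // Figures copied from the measurements above
        int[] n = {1000, 2000, 5000, 10000};
        double[] throughputPercent = {100.0, 58.8, 29.7, 16.5};
        for (int i = 0; i < n.length; i++) {
            System.out.printf(Locale.ROOT,
                    "n=%d relative cost per operation=%.2f%n",
                    n[i], relativeCostPerOp(throughputPercent[i]));
        }
    }
}
```

Under that assumption, the average cost per operation comes out at roughly
1.7x, 3.4x and 6.1x the baseline for n=2,000, 5,000 and 10,000 respectively.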
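A lookup through a working hash index should take roughly constant time
regardless of n, so this drop-off suggests that the index is not being used.
To check, I tried a SQL EXPLAIN command: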
explain select from identifier where identifier='http://example.org/1'
documentReads=1
fullySortedByIndex=false
documentAnalyzedCompatibleClass=1
recordReads=1
fetchingFromTargetElapsed=0
indexIsUsedInOrderBy=false
compositeIndexUsed=1
current=Identifier#153:0{identifier:http://example.org/1,out_id:[size=1]} v2
involvedIndexes=[Identifier.identifier]
limit=-1
evaluated=1
user=#5:0
elapsed=2.387001
resultType=collection
resultSize=1
The documentation at http://orientdb.com/docs/master/SQL-Explain.html does
not seem to be 100% current on how to interpret the output of the EXPLAIN
command, but my interpretation is that the query did recognize and use the
index that I created.
I also tried some profiling (with JProfiler) and saw a hot spot in
com.tinkerpop.blueprints.impls.orient.OrientElementIterator.hasNext.
All of this is with OrientDB running in embedded mode, on a fairly high-end
Linux machine and with a fresh, empty database at the beginning of each
test.
I have to believe I am doing something wrong to see such a rapid drop-off
in query performance with such relatively small data volumes.
I have been struggling with this for several days off-and-on now and it's
time to ask for help. Has anyone else encountered a similar issue? What can
I do to address this?
Thanks in advance!
-- John