Hello,
To give some context: a while ago I was testing OrientDB for my workplace by
inserting 10 million documents into a database. Each document contained a
'hash' field of type String. At first I had a unique index on that 'hash'
field, but I changed it to a non-unique index when I ran into an odd
situation.
Assumptions:
- The client used a remote connection to the server
- Only 1 server was used, but it was configured for a distributed cluster
The first time I ran this test, with the unique index on the 'hash' field, I
got some collision exceptions, which shouldn't have been possible since I was
using SHA-256 hashes and the inputs to the hash algorithm were unique. After
switching to a non-unique index on the 'hash' field, I saw several quorum
exceptions thrown throughout the insertions, even though my write quorum is
set to 1 and there is only 1 node in the cluster.
After inspecting the database with the console, I noticed there were 7 extra
Hash documents. I inserted 10 million Hash documents starting from an empty
database, but the 'info' command listed 10,000,007. The number of extra
documents varied each time I ran the test. I later added the input to the
SHA-256 hash as a field on each document and wrote some Java code to
double-check whether these extra documents were duplicates. They were. I know
for sure that I did not insert them myself.
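The duplicate check itself boiled down to counting 'hash' values that appear more than once. A minimal sketch of that logic (the class and method names here are illustrative; in the real test the strings came from the 'hash' field of each Hash document, e.g. by iterating db.browseClass("Hash") over the remote connection):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateCheck {

    // Counts how many entries are repeats of a value already seen.
    // In the actual test, 'hashes' was fed from the 'hash' field of
    // each Hash document in the database.
    public static int countDuplicates(Iterable<String> hashes) {
        Set<String> seen = new HashSet<>();
        int duplicates = 0;
        for (String h : hashes) {
            if (!seen.add(h)) {
                duplicates++;
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("a1", "b2", "a1", "c3", "a1");
        System.out.println(countDuplicates(sample)); // prints 2
    }
}
```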
Does anybody have any idea what the problem could be? I suspect it has to do
with the fact that my server is set up to be distributed but has no other
servers in the cluster to talk to, but I'm not entirely sure. All I know is
that my boss doesn't like that OrientDB is inserting duplicate data. I'm sure
a unique index would work around the problem, but it makes him feel like he
can't trust OrientDB to keep an accurate record of the data we insert. The
data we're planning to store in OrientDB has to be accurate, and duplicate
data could be bad for our application.
Thanks for reading!
My OrientDB server uses the following distributed configuration file:
{
  "autoDeploy": true,
  "hotAlignment": false,
  "executionMode": "undefined",
  "readQuorum": 1,
  "writeQuorum": 1,
  "failureAvailableNodesLessQuorum": false,
  "readYourWrites": true,
  "clusters": {
    "internal": {},
    "index": {},
    "*": {
      "servers": ["<NEW_NODE>"]
    }
  }
}
Here's a simplified version of my program that inserted the documents:
// This created the schema for the Hash document
String database = "remote:127.0.0.1/hashdb";
OPartitionedDatabasePool pool =
    new OPartitionedDatabasePool(database, "root", "asdf123$");
ODatabaseDocumentTx db = pool.acquire();
try {
    OSchemaProxy schema = db.getMetadata().getSchema();
    OClass strHash = schema.createClass("Hash");
    strHash.createProperty("hash", OType.STRING)
           .setMandatory(true).setNotNull(true);
    strHash.createProperty("index", OType.INTEGER)
           .setMandatory(true).setNotNull(true);
    strHash.createIndex("hash", OClass.INDEX_TYPE.NOTUNIQUE, "hash");
    /*
    for (int i = 1; i < 16; i++) {
        strHash.createProperty(String.format("field%d", i), OType.STRING);
    }
    */
    schema.save();
} catch (Exception e) {
    e.printStackTrace();
}
db.close();
pool.close();
// This part was used for the insertion of the Hash documents.
OPartitionedDatabasePool pool =
    new OPartitionedDatabasePool("remote:127.0.0.1/hashdb", "root", "asdf123$");
for (int i = 0; i < TOTAL_HASHES; i++) {
    ODatabaseDocumentTx db = pool.acquire();
    try {
        ODocument hash = new ODocument("Hash");
        String hashInput = getSHAHash(i);
        hash.field("hash", hashInput);
        hash.field("index", i);
        hash.save();
    } catch (Exception e) {
        e.printStackTrace();
        db.rollback();
        continue;
    } finally {
        db.commit();
        db.close();
    }
}
pool.close();