I've been asked to kick the tires on OrientDB as a possible graph DB
solution for an upcoming project at my company. To do so, I've spun up an
EC2 instance using the OrientDB marketplace AMI on an m4.xlarge box with an
EBS drive (picked as a general-purpose box, since I couldn't find any
hardware recommendations in the documentation or through searches). I've got
the database running, but when I try to use the ETL bulk import tools with
CSV files, I'm seeing what I consider very poor performance compared to the
claims I have read. The best I've seen is about 5K records/second loaded.
The existing documentation leaves a bit to be desired, so I was hoping
someone might be able to offer some insight.

Here are some details (I've scaled things back to try to understand where I
may have gone wrong):
- One file of roughly 2 million records with two tab-separated columns (a
record key and a text field), e.g. ABC\tString here
- A class schema predefined outside the ETL config script, with those two
fields and an index on the id field
- This ETL script, based on one in the documentation, which I am running on
the EC2 box itself (I am using a remote: connection because the project will
eventually be a distributed DB, even though client and server are on the
same box right now):
{
  "source": {
    "file": {
      "path": "/user/poc1_Datasets/organization.tsv"
    }
  },
  "extractor": {
    "row": {}
  },
  "transformers": [
    {
      "csv": {
        "separator": "\t"
      }
    },
    {
      "vertex": {
        "class": "Organization"
      }
    }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "remote:localhost/DataSpine1",
      "dbType": "graph",
      "wal": false,
      "tx": false,
      "batchCommit": 25000
    }
  }
}
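For anyone who wants to reproduce this without my data, here is a minimal
Python sketch that writes a synthetic file of the same shape (a key column
and a text column, tab-separated). The function name write_sample_tsv, the
ORG key prefix, the 40-character random text and the header names id and
text are all placeholders of mine, not the real dataset; I am also assuming
the file starts with a header line of column names.

```python
import csv
import random
import string


def write_sample_tsv(path: str, rows: int, seed: int = 0) -> None:
    """Write a header line plus `rows` lines of key<TAB>text to `path`.

    The column names and value formats are hypothetical stand-ins for the
    real organization data.
    """
    rng = random.Random(seed)  # seeded so repeated runs produce the same file
    alphabet = string.ascii_letters + " "
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(["id", "text"])  # assumed header line
        for i in range(rows):
            key = f"ORG{i:07d}"  # unique record key
            text = "".join(rng.choices(alphabet, k=40))
            writer.writerow([key, text])


if __name__ == "__main__":
    # Roughly the size of the file described above.
    write_sample_tsv("organization_sample.tsv", 2_000_000)
```

Pointing the "path" in the ETL config at the generated file should give a
load of comparable size.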
The final output of the ETL loader in this case was:
END ETL PROCESSOR
+ extracted 1,822,150 rows (3,904 rows/sec) - 1,822,150 rows -> loaded 1,822,149 vertices (3,907 vertices/sec) Total time: 520411ms [0 warnings, 0 errors]
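As a sanity check on that summary line, here is a minimal Python sketch that
recomputes the average load rate as extracted rows divided by total
wall-clock time. The function name average_rate and the regular expressions
are my own and only match the line printed above; none of this is part of
the OrientDB tooling.

```python
import re

# The summary line exactly as printed by the ETL run above.
SUMMARY = (
    "+ extracted 1,822,150 rows (3,904 rows/sec) - 1,822,150 rows -> loaded "
    "1,822,149 vertices (3,907 vertices/sec) Total time: 520411ms "
    "[0 warnings, 0 errors]"
)


def average_rate(summary: str) -> float:
    """Return rows/sec averaged over the whole run.

    Uses the extracted row count and the total time in milliseconds,
    both parsed from the summary line.
    """
    rows_match = re.search(r"extracted ([\d,]+) rows", summary)
    time_match = re.search(r"Total time: (\d+)ms", summary)
    if rows_match is None or time_match is None:
        raise ValueError("summary line is not in the expected format")
    rows = int(rows_match.group(1).replace(",", ""))
    total_seconds = int(time_match.group(1)) / 1000.0
    return rows / total_seconds


if __name__ == "__main__":
    print(f"{average_rate(SUMMARY):.0f} rows/sec averaged over the whole run")
```

For this run it works out to roughly 3,500 rows/sec over the whole load, a
little below the 3,904 rows/sec printed in the summary, so the two figures
may not be measured the same way.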
Does using the remote: protocol really hurt performance that much? I
believe the AMI is configured so that the data sits on the EBS drive.
Should I try to find an instance type that would let me use local ephemeral
storage instead? Any insights you could provide would be appreciated.
Curt