Hi, guys. I'm conducting a proof-of-concept for a large bank (Luca, we had a phone conference on August 5...) and I'm trying to bulk-insert a humongous amount of data: 1 million vertices and 1 billion edges.
Firstly, I'm impressed by how easy it was to configure a cluster. However, batch-insert performance is poor, and it seems to get considerably worse as I add more data: it starts at about 2k vertices-and-edges per second and deteriorates to about 500/second after only about 3 million edges have been added, which takes ~30 minutes. Needless to say, 1 billion payments (edges) would take over a week at this rate. This is a show-stopper for us.

My data model is simply payments between accounts, stored in one large file with 3 fields per line:

FROM_ACCOUNT TO_ACCOUNT AMOUNT

In the test data I generated, I have 1 million accounts and 1 billion payments randomly distributed between pairs of accounts. I have 2 classes in OrientDB: ACCOUNTS (extending V) and PAYMENT (extending E), and there is a UNIQUE_HASH_INDEX on ACCOUNTS for the account number (a string).

We're using OrientDB 2.2.7. My batch size is 5k, and I'm using the "remote" protocol to connect to our cluster. I'm on JDK 8, and my 3 boxes are beefy machines (32 cores each) but without SSDs. I wrote the importing code myself using the Graph API and did nothing 'clever' (I think). The client code has been given lots of memory, and jstat shows it is not GCing excessively.

So, my questions are:

1. What kind of performance can I realistically expect, and can I improve what I have at the moment?
2. What kind of degradation should I expect as the graph grows?

Thanks, guys.

Phillip

--
You received this message because you are subscribed to the Google Groups "OrientDB" group.
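For reference, here is a minimal, dependency-free sketch of the client-side batching pattern described above: parsing "FROM_ACCOUNT TO_ACCOUNT AMOUNT" lines and grouping them into fixed-size batches, with one commit per batch rather than per edge. This is an illustrative reconstruction, not the poster's actual import code; the class and method names are invented, the OrientDB calls are omitted, and the batch size is shrunk for the demo (the real import uses 5k).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Illustrative sketch (not the original import code): parse payment lines
 * of the form "FROM_ACCOUNT TO_ACCOUNT AMOUNT" and group them into
 * fixed-size batches. In the real importer, each batch would be followed
 * by a single graph.commit() instead of committing per edge.
 */
public class BatchImportSketch {

    // 5_000 in the described import; tiny here so the demo is readable.
    static final int BATCH_SIZE = 3;

    // One payment record, i.e. one edge between two account vertices.
    static class Payment {
        final String from, to;
        final double amount;
        Payment(String from, String to, double amount) {
            this.from = from;
            this.to = to;
            this.amount = amount;
        }
    }

    // Split the input lines into batches of at most batchSize payments.
    static List<List<Payment>> toBatches(List<String> lines, int batchSize) {
        List<List<Payment>> batches = new ArrayList<>();
        List<Payment> current = new ArrayList<>(batchSize);
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            current.add(new Payment(f[0], f[1], Double.parseDouble(f[2])));
            if (current.size() == batchSize) {
                batches.add(current);               // batch full: would commit here
                current = new ArrayList<>(batchSize);
            }
        }
        if (!current.isEmpty()) {
            batches.add(current);                   // final partial batch
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "ACC1 ACC2 10.50",
            "ACC2 ACC3 99.00",
            "ACC3 ACC1 5.25",
            "ACC1 ACC3 1.00");
        List<List<Payment>> batches = toBatches(lines, BATCH_SIZE);
        System.out.println(batches.size());        // prints 2
        System.out.println(batches.get(0).size()); // prints 3
        System.out.println(batches.get(1).size()); // prints 1
    }
}
```

One design note on why this pattern matters here: with a UNIQUE_HASH_INDEX on the account number, every edge insert implies two index lookups (source and target vertex), so per-edge round-trips over the "remote" protocol dominate; batching amortizes the commit cost but not the lookups.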