Luca,

Thanks for taking the time to reply. In answer to your question, yes and no. I was running the ETL tool on an instance that had only 2 cores, so there was really only one core available for the tool to use (hence the 150/sec result for one thread). I then wrote a simple Spark-based loading program (using OrientGraphNoTx and declaring the massive-insert intent) and ran it as a job on my AWS Spark cluster, which made the parallelism easy to control. I was able to run up to 8 worker nodes (basically 8 threads) before I started seeing exceptions come back from the calls, for a load rate of approximately 1,042 recs/sec (roughly 130 recs/sec/thread). I should note that this rate was for creating edges between existing vertices from a file that contained our internal ids for the nodes. The code had to look up the RIDs based on those keys (which were indexed) and then create the link (basically the same work our ETL config file was set up to do in our earlier runs).
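For anyone trying to reproduce this, here is a rough sketch of the shape of that job: split the input rows across N workers, resolve each endpoint key to a RID through an index, then create the edge. It is only an illustration under stated assumptions. A plain dict stands in for the indexed key property, a Python thread pool stands in for the Spark workers, and "creating" an edge just records the RID pair. None of the helper names (`build_index`, `link_partition`, `parallel_link`, `per_thread_rate`) are OrientDB or Spark APIs; they are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the index on the internal-id property:
# maps our internal key -> OrientDB record id (RID). Not the OrientDB API.
def build_index(vertices):
    return {key: rid for key, rid in vertices}

def link_partition(rows, index):
    """Look up both endpoint RIDs by key, then 'create' the edge.

    Returns (edges, missing): edges as (out_rid, in_rid) pairs, plus the
    rows whose keys were not found in the index.
    """
    edges, missing = [], []
    for src_key, dst_key in rows:
        src, dst = index.get(src_key), index.get(dst_key)
        if src is None or dst is None:
            missing.append((src_key, dst_key))
        else:
            edges.append((src, dst))  # a real job would add the edge here
    return edges, missing

def partition(rows, n):
    # Simple round-robin split; a stand-in for spreading the file across n workers
    return [rows[i::n] for i in range(n)]

def parallel_link(rows, index, workers):
    edges, missing = [], []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in partition order
        for e, m in pool.map(lambda part: link_partition(part, index),
                             partition(rows, workers)):
            edges.extend(e)
            missing.extend(m)
    return edges, missing

def per_thread_rate(total_per_sec, threads):
    # e.g. 1,042 recs/sec over 8 workers is about 130 recs/sec/worker
    return total_per_sec / threads
```

The per-worker figure is the number to watch: at about 130 recs/sec/worker versus about 150 recs/sec for the single ETL thread, each worker is doing slightly less than the lone thread did, so the gain comes almost entirely from running more of them at once.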
Glad to hear that you are going to provide some guidance on cloud deployment recommendations. Having that type of info would have been helpful during this exercise.

Curt

On Friday, June 24, 2016 at 12:33:52 PM UTC-4, l.garulli wrote:
>
> Hi guys,
>
> A couple of weeks ago we created an internal division in OrientDB to take
> care of AWS (and other clouds). Soon we will publish some metrics about
> OrientDB and Amazon AWS server configurations, so it will be much easier to
> choose the right hw/sw configuration for your workload.
>
> Back to your first question, I think the ETL is slow because it does not run
> in parallel. Have you tried the "parallel" option?
>
> Best Regards,
>
> Luca Garulli
> Founder & CEO
> OrientDB LTD <http://orientdb.com/>
>
> On 24 June 2016 at 10:57, Curt Kohler <[email protected]> wrote:
>
>> Sorry, I should have been more explicit. I moved over to the r3
>> instance types to leverage the attached SSD ephemeral drives instead of the
>> networked EBS drive, to take possible network issues out of the picture.
>> I didn't notice anything specific in iostat when running the loads.
>>
>> On Tuesday, June 21, 2016 at 12:12:33 AM UTC-4, Francisco Reyes wrote:
>>>
>>> On Monday, June 20, 2016 at 9:56:11 AM UTC-4, Curt Kohler wrote:
>>>>
>>>> eventually solved the issue). When we were finally able to run the
>>>> files successfully, we were seeing throughput in the range of about 150
>>>> edges/sec (running with one thread).
>>>
>>> Curt,
>>>
>>> New OrientDB user here, but I was wondering if you checked iostat to see
>>> if it was an issue with the disk subsystem. Also, is the disk SSD? Is the
>>> disk using provisioned IOPS?
>>
>> --
>>
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "OrientDB" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
