Re: [orientdb] Re: Performance of Distributed (3 nodes) cluster with one billion edges

2016-10-02 Thread Phillip Henry
Done :) Phill On Friday, September 30, 2016 at 8:52:30 AM UTC+1, Andrey Lomakin wrote: > > Hi Philip, > Could you send thread dump for 2.2.10 version? > > Fri, 30 Sep 2016, 8:47 Phillip Henry : > >> Hi, Andrey. >> >> I was using 2.2.10 but just to be sure, I ran it a second

Re: [orientdb] Re: Performance of Distributed (3 nodes) cluster with one billion edges

2016-09-30 Thread Andrey Lomakin
Hi Philip, Could you send thread dump for 2.2.10 version? Fri, 30 Sep 2016, 8:47 Phillip Henry : > Hi, Andrey. > > I was using 2.2.10 but just to be sure, I ran it a second time making sure > that 2.2.10 was the first thing in my classpath and I am afraid that I saw > it

Re: [orientdb] Re: Performance of Distributed (3 nodes) cluster with one billion edges

2016-09-29 Thread Phillip Henry
Hi, Andrey. I was using 2.2.10 but just to be sure, I ran it a second time making sure that 2.2.10 was the first thing in my classpath and I am afraid that I saw it again. It's quite predictable (anywhere between 200 and 250 million edges). Regards, Phillip On Monday, September 26, 2016 at

Re: [orientdb] Re: Performance of Distributed (3 nodes) cluster with one billion edges

2016-09-29 Thread Phillip Henry
Hi, guys. Has there been any movement on this? I've run this twice now (carefully truncating all tables before each run) and seen similar results. Regards, Phillip On Tuesday, September 27, 2016 at 8:15:13 PM UTC+1, Phillip Henry wrote: > > Yes, using 2.2.10. > > > So it looks like 128GB

Re: [orientdb] Re: Performance of Distributed (3 nodes) cluster with one billion edges

2016-09-27 Thread Phillip Henry
Yes, using 2.2.10. > So it looks like 128GB wasn't enough. Correct. Just ran it on the larger box and it completed. > do you have the stack trace? I'll send the snippet to the same email address I did last time. > Do you mean with the batch API? Did you call the end() ? Yes, I call

Re: [orientdb] Re: Performance of Distributed (3 nodes) cluster with one billion edges

2016-09-27 Thread Luca Garulli
On 27 September 2016 at 08:03, Phillip Henry wrote: > Hi, Luca. > > I've now tried OGraphBatchInsert. It is indeed much faster at about 4.5 > hours for the billion payments. Slower than Neo but we can live with that. > Hi Phillip, Good to know it worked. Are you using last

Re: [orientdb] Re: Performance of Distributed (3 nodes) cluster with one billion edges

2016-09-27 Thread Phillip Henry
Hi, Luca. I've now tried OGraphBatchInsert. It is indeed much faster at about 4.5 hours for the billion payments. Slower than Neo but we can live with that. However, I'm having trouble getting a full run. I'm getting OutOfMemory errors with -XX:MaxDirectMemorySize=512G and combinations of:

Re: [orientdb] Re: Performance of Distributed (3 nodes) cluster with one billion edges

2016-09-26 Thread Andrey Lomakin
Hi, I have looked at your thread dump; we had already identified and fixed your issue in version 2.2.9. So if you use 2.2.10 (the latest one), you will not experience this problem. I strongly recommend using version 2.2.10 because several deadlocks were fixed in version 2.2.9; also, 2.2.10 contains a few

Re: [orientdb] Re: Performance of Distributed (3 nodes) cluster with one billion edges

2016-09-23 Thread Luca Garulli
On 23 September 2016 at 11:23, Phillip Henry wrote: > > will there not be potential contention when the "to" vertex is updated? > > Ah, just re-read your post and you've already answered this. My apologies. > Yes, the idea is that with millions and millions of vertices,

Re: [orientdb] Re: Performance of Distributed (3 nodes) cluster with one billion edges

2016-09-23 Thread Phillip Henry
> will there not be potential contention when the "to" vertex is updated? Ah, just re-read your post and you've already answered this. My apologies. Phill On Friday, September 23, 2016 at 4:51:50 PM UTC+1, Phillip Henry wrote: > > Hi, Luca. > > > How many GB? > > The input file is 22gb of text.

Re: [orientdb] Re: Performance of Distributed (3 nodes) cluster with one billion edges

2016-09-23 Thread Phillip Henry
Hi, Luca. > How many GB? The input file is 22gb of text. > If the file is ordered ... You are only sorting by the first account. The second account can be anywhere in the entire range. My understanding is that both vertices are updated when an edge is written. If this is true, will there not

Re: [orientdb] Re: Performance of Distributed (3 nodes) cluster with one billion edges

2016-09-23 Thread Luca Garulli
On 23 September 2016 at 03:50, Phillip Henry wrote: > > How big is your file the sort cannot write? > > One bil-ee-on lines... :-P > How many GB? > > ...This should help a lot. > > The trouble is that the size of a block of contiguous accounts in the real > data is

Re: [orientdb] Re: Performance of Distributed (3 nodes) cluster with one billion edges

2016-09-23 Thread Phillip Henry
> How big is your file the sort cannot write? One bil-ee-on lines... :-P > ...This should help a lot. The trouble is that the size of a block of contiguous accounts in the real data is non-uniform (even if it might be with my test data). Therefore, it is highly likely a contiguous block of

Re: [orientdb] Re: Performance of Distributed (3 nodes) cluster with one billion edges

2016-09-23 Thread Luca Garulli
On 23 September 2016 at 00:49, Phillip Henry wrote: > Hi, Luca. > Hi Phillip. > I have: > > 4. sorting is an overhead, albeit outside of Orient. Using the Unix sort > command failed with "No space left on device". Oops. OK, so I ran my > program to generate the data

Re: [orientdb] Re: Performance of Distributed (3 nodes) cluster with one billion edges

2016-09-22 Thread Phillip Henry
Hi, Luca. I have: 1. turned off other nodes in the cluster so there should be no replication during import now nor quorums either. 2. average number of edges is about 1000 per vertex. This will be a similar amount in the real data but the distribution around this mean will be much different

[orientdb] Re: Performance of Distributed (3 nodes) cluster with one billion edges

2016-09-15 Thread Phillip Henry
Hi, Luca. Thanks for getting back to me so quickly. In answer to your questions: 1 & 2. Yes, those numbers are from using "remote" protocol and 3 servers on 3 different boxes. 3. Yes, default configuration. Apart from adding an index for ACCOUNTS, I did nothing further. 4. Good question.