Re: [orientdb] Re: Performance of Distributed (3 nodes) cluster with one billion edges

Phillip Henry Thu, 29 Sep 2016 22:45:30 -0700

Hi, guys.

Has there been any movement on this? I've run this twice now (carefully 
truncating all tables before each run) and seen similar results.


Regards,

Phillip

On Tuesday, September 27, 2016 at 8:15:13 PM UTC+1, Phillip Henry wrote:
>
> Yes, using 2.2.10.
>
> > So it looks like 128GB wasn't enough. 
>
> Correct. Just ran it on the larger box and it completed.
>
> > do you have the stack trace?
>
> I'll send the snippet to the same email address I did last time.
>
> > Do you mean with the batch API? Did you call end()?
>
> Yes, I call OGraphBatchInsert.begin before I write 
> and OGraphBatchInsert.end after each and every batch of 100. I keep one 
> instance of OGraphBatchInsert per thread for the entirety of the run.
>
> I've let it run to completion but I see a mere 1955 accounts (I was 
> expecting a million) and, strangely, 1 000 272 324 payments (I was expecting 
> exactly one billion).
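
To put the two counts side by side, here is a small sketch of the arithmetic. The figures are the ones quoted above; the dictionary keys and the `discrepancy` helper are hypothetical names used only for illustration.

```python
# Counts reported in the thread: expected vs. what the database showed.
expected = {"accounts": 1_000_000, "payments": 1_000_000_000}
observed = {"accounts": 1_955, "payments": 1_000_272_324}


def discrepancy(expected, observed):
    """Return observed minus expected for every key (negative = missing)."""
    return {key: observed[key] - expected[key] for key in expected}


diff = discrepancy(expected, observed)
for key, delta in diff.items():
    pct = 100.0 * delta / expected[key]
    print(f"{key}: {delta:+,} ({pct:+.4f}% of expected)")
```

So accounts are short by 998,045 while payments overshoot by 272,324, which is about 0.027% of the expected billion.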
>
> Regards,
>
> Phill
>
> On Tuesday, September 27, 2016 at 6:07:05 PM UTC+1, l.garulli wrote:
>>
>> On 27 September 2016 at 08:03, Phillip Henry <phill...@gmail.com> wrote:
>>
>>> Hi, Luca.
>>>
>>> I've now tried OGraphBatchInsert. It is indeed much faster at about 4.5 
>>> hours for the billion payments. Slower than Neo but we can live with that.
>>>
>>
>> Hi Phillip,
>> Good to know it worked. You are using the latest v2.2.10, right?
>>  
>>
>>> However, I'm having trouble getting a full run.
>>>
>>> I'm getting OutOfMemory errors with -XX:MaxDirectMemorySize=512G and 
>>> combinations of:
>>>
>>> -Xmx51443M -Dstorage.diskCache.bufferSize=60059
>>> -Xmx90G -Dstorage.diskCache.bufferSize=10240
>>> -Xmx90G -Dstorage.diskCache.bufferSize=8192
>>>
>>
>>> So, I've now increased MaxDirectMemorySize to 999G but with no success 
>>> (this box has 128GB of memory).
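
As a rough check on the combinations listed above, the sketch below adds the heap (-Xmx) to the disk cache (storage.diskCache.bufferSize, assumed here to be in megabytes) and compares the sum with 128GB taken as 128 * 1024 MB. This is only a back-of-the-envelope model: it ignores the OS, thread stacks, GC overhead and anything else the process allocates, and -XX:MaxDirectMemorySize is treated as a cap rather than an allocation. The names `configs` and `budget` are hypothetical.

```python
# Physical memory of the smaller box, assuming 128GB means 128 * 1024 MB.
PHYSICAL_RAM_MB = 128 * 1024

# (label, heap in MB, disk cache in MB) for the three combinations tried.
configs = [
    ("-Xmx51443M", 51443, 60059),
    ("-Xmx90G", 90 * 1024, 10240),
    ("-Xmx90G", 90 * 1024, 8192),
]


def budget(heap_mb, disk_cache_mb, ram_mb=PHYSICAL_RAM_MB):
    """Return (heap + disk cache, RAM left over) in MB."""
    committed = heap_mb + disk_cache_mb
    return committed, ram_mb - committed


rows = [(label, cache, *budget(heap, cache)) for label, heap, cache in configs]
for label, cache, committed, headroom in rows:
    print(f"{label} + bufferSize={cache}: {committed} MB committed, "
          f"{headroom} MB headroom")
```

Under these assumptions the three combinations commit 111502 MB, 102400 MB and 100352 MB, leaving between roughly 19 GB and 30 GB for everything else on the box.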
>>>
>>> On a box with about 250GB where I can increase the heap, it ran to the 
>>> end. There were some SEVERE messages at the beginning that said "Previous 
>>> maximum cache size was 3474813 current maximum cache is 278528. Cache state for 
>>> storage /home/d3956122/OrientDB/databases/MyPayments3 will not be restored" 
>>> and some "Exception during commit of active transaction... Database 
>>>  /home/d3956122/OrientDB/databases/MyPayments3 is closed". But after that 
>>> things seemed to go well. I hope these initial errors are not too serious?
>>>
>>
>> So it looks like 128GB wasn't enough. 
>>
>> About the two exceptions, let me ask the team first. About the second 
>> one, "Exception during commit of active transaction... Database 
>>  /home/d3956122/OrientDB/databases/MyPayments3 is closed", do you have the 
>> stack trace?
>>  
>>
>>>
>>> Unfortunately, there was a hiccup in my coding where the app doesn't 
>>> naturally die when all the writing is over (I never stop the thread pool - 
>>> oops). However, it was a day or two later when I killed it. After I killed 
>>> the processes, I was surprised to see only a few hundred vertices in the 
>>> plocal database when I was expecting to see one million (the number of 
>>> edges was much closer to what I expected). At what point are the vertices 
>>> flushed? Can I flush them via the API?
>>>
>>
>> Do you mean with the batch API? Did you call end()?
>>  
>>
>>> Regards,
>>>
>>> Phillip
>>>
>>
>>
>> Best Regards,
>>
>> Luca Garulli
>> Founder & CEO
>> OrientDB LTD <http://orientdb.com/>
>>
>>
>>  
>>
>>>
>>>
>>> On Friday, September 23, 2016 at 6:27:02 PM UTC+1, l.garulli wrote:
>>>>
>>>> On 23 September 2016 at 11:23, Phillip Henry <phill...@gmail.com> 
>>>> wrote:
>>>>
>>>>> > will there not be potential contention when the "to" vertex is 
>>>>> updated?
>>>>>
>>>>> Ah, just re-read your post and you've already answered this. My 
>>>>> apologies.
>>>>>
>>>>
>>>> Yes, the idea is that with millions and millions of vertices, the 
>>>> chance of a collision on the target nodes is very low, unless you 
>>>> have supernodes that recur in most of the relationships.
>>>>  
>>>>
>>>>>
>>>>>> > OGraphBatchInsert ... keeps everything in RAM before flushing
>>>>>>
>>>>>> I assume I will still have to write retry code in the event of a 
>>>>>> collision (see above)?
>>>>>>
>>>>>
>>>> No, in this case the batch insert will manage this for you.
>>>>  
>>>>
>>>> Luca
>>>>  
>>>>
>>> -- 
>>>
>>> --- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "OrientDB" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to orient-databa...@googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>

