> will there not be potential contention when the "to" vertex is updated?

Ah, just re-read your post and you've already answered this. My apologies.

Phill

On Friday, September 23, 2016 at 4:51:50 PM UTC+1, Phillip Henry wrote:
>
> Hi, Luca.
>
> > How many GB?
>
> The input file is 22gb of text.
>
> > If the file is ordered ...
>
> You are only sorting by the first account. The second account can be 
> anywhere in the entire range. My understanding is that both vertices are 
> updated when an edge is written. If this is true, will there not be 
> potential contention when the "to" vertex is updated?
>
> > OGraphBatchInsert ... keeps everything in RAM before flushing
>
> I assume I will still have to write retry code in the event of a collision 
> (see above)?
>
> You can use support --at- orientdb.com ... 
>
> Sent.
>
> Regards,
>
> Phill
>
> On Friday, September 23, 2016 at 4:06:49 PM UTC+1, l.garulli wrote:
>>
>> On 23 September 2016 at 03:50, Phillip Henry <phill...@gmail.com> wrote:
>>
>>> > How big is your file the sort cannot write?
>>>
>>> One bil-ee-on lines... :-P
>>>
>>
>> How many GB?
>>  
>>
>>> > ...This should help a lot. 
>>>
>>> The trouble is that the size of a block of contiguous accounts in the
>>> real data is not uniform (even if it might be with my test data).
>>> Therefore, it is highly likely that a contiguous block of account numbers
>>> will span 2 or more batches. This will lead to a lot of contention. In your
>>> example, if Account 2 spills over into the next batch, chances are I'll
>>> have to roll back that batch.
>>>
>>> Don't you also have a problem that if X, Y, Z and W in your example are 
>>> account numbers in the next batch, you'll also get contention? Admittedly, 
>>> randomization doesn't solve this problem either.
>>>
>>
>> If the file is ordered, you could have X threads (where X is the number
>> of cores) that parse the file in parallel rather than sequentially. For
>> example, with 4 threads, you could start the parsing this way:
>>
>> Thread 1, starts from 0
>> Thread 2, starts from length * 1/4
>> Thread 3, starts from length * 2/4
>> Thread 4, starts from length * 3/4
>>  
>> Of course each thread should scan forward to the next CR+LF (the line
>> terminator) if it's a CSV. It requires a few lines of code, but you could
>> avoid many conflicts.
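>>
>> A rough, untested sketch of that split (ChunkedLoader and insertPayment()
>> are just placeholder names): each worker seeks to its own offset and skips
>> forward to the next line boundary before it starts parsing.
>>
>> import java.io.File;
>> import java.io.RandomAccessFile;
>>
>> public class ChunkedLoader implements Runnable {
>>     private final String path;
>>     private final long start;
>>     private final long end;
>>
>>     ChunkedLoader(String path, long start, long end) {
>>         this.path = path;
>>         this.start = start;
>>         this.end = end;
>>     }
>>
>>     public void run() {
>>         try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
>>             file.seek(start);
>>             if (start > 0)
>>                 file.readLine(); // skip the partial line: the previous chunk owns it
>>             long pos = file.getFilePointer();
>>             String line;
>>             while (pos <= end && (line = file.readLine()) != null) {
>>                 insertPayment(line); // parse "FROM TO AMOUNT" and write it to OrientDB
>>                 pos = file.getFilePointer();
>>             }
>>         } catch (Exception e) {
>>             e.printStackTrace();
>>         }
>>     }
>>
>>     private void insertPayment(String line) { /* your OrientDB insert here */ }
>>
>>     public static void main(String[] args) throws Exception {
>>         String path = args[0];
>>         int threads = 4;
>>         long len = new File(path).length();
>>         for (int i = 0; i < threads; i++)
>>             new Thread(new ChunkedLoader(path, len * i / threads, len * (i + 1) / threads)).start();
>>     }
>> }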
>>
>>
>>> > you can use the special Batch Importer: OGraphBatchInsert
>>>
>>> Would this not be subject to the same contention problems?
>>> At what point is it flushed to disk? (Obviously, it can't live in heap 
>>> forever).
>>>
>>
>> It keeps everything in RAM before flushing. Up to a few hundred million
>> vertices/edges should be fine if you have a lot of heap, like 58GB (and
>> 4GB of DISKCACHE). It depends on the number of attributes you have.
>>  
>>
>>> > You should definitely use transactions with a batch size of 100 items. 
>>>
>>> I thought I read somewhere else (can't find the link at the moment) that 
>>> you said only use transactions when using the remote protocol?
>>>
>>
>> This was true before v2.2. With v2.2 the management of transactions is
>> parallel and very lightweight. Transactions work well with graphs because
>> every addEdge() operation is 2 updates, so having a TX that works like a
>> batch really helps.
>>  
>>
>>>
>>> > Please use the latest 2.2.10. ... try to define 50GB of DISKCACHE and 14GB 
>>> of Heap
>>>
>>> Will do on the next run.
>>>
>>> > If it happens again, could you please send a thread dump?
>>>
>>> I have the full thread dump but it's on my work machine so can't post it 
>>> in this forum (all access to Google Groups is banned by the bank so I am 
>>> writing this on my personal computer). Happy to email them to you. Which 
>>> email shall I use?
>>>
>>
>> You can use support --at- orientdb.com, referring to this thread in the
>> subject.
>>  
>>
>>>
>>> Phill
>>>
>>
>>
>> Best Regards,
>>
>> Luca Garulli
>> Founder & CEO
>> OrientDB LTD <http://orientdb.com/>
>>
>> Want to share your opinion about OrientDB?
>> Rate & review us at Gartner's Software Review 
>> <https://www.gartner.com/reviews/survey/home>
>>
>>  
>>
>>> On Friday, September 23, 2016 at 7:41:29 AM UTC+1, l.garulli wrote:
>>>
>>>> On 23 September 2016 at 00:49, Phillip Henry <phill...@gmail.com> 
>>>> wrote:
>>>>
>>>>> Hi, Luca.
>>>>>
>>>>
>>>> Hi Phillip.
>>>>  
>>>>
>>>>> I have:
>>>>>
>>>>> 4. sorting is an overhead, albeit outside of Orient. Using the Unix 
>>>>> sort command failed with "No space left on device". Oops. OK, so I ran my 
>>>>> program to generate the data again, this time it is ordered by the first 
>>>>> account number. Performance was much slower as there appeared to be a lot 
>>>>> of contention for this account (ie, all writes were contending for this 
>>>>> account, even if the other account had less contention). More randomized 
>>>>> data was faster.
>>>>>
>>>>
>>>> How big is the file that the sort cannot write? Anyway, if you have the
>>>> accounts sorted, you should use transactions of about 100 items, where the
>>>> bank account and its edges are in the same transaction. This should help a
>>>> lot.
>>>> Example:
>>>>
>>>> Account 1 -> Payment 1 -> Account X
>>>> Account 1 -> Payment 2 -> Account Y
>>>> Account 1 -> Payment 3 -> Account Z
>>>> Account 2 -> Payment 1 -> Account X
>>>> Account 2 -> Payment 1 -> Account W
>>>>
>>>> If the transaction batch is 5 (I suggest you start with 100), all these
>>>> operations are executed in one transaction. If another thread has:
>>>>
>>>> Account 99 -> Payment 1 -> Account W
>>>>
>>>> It could conflict because of the shared Account W.
>>>>
>>>> If you can export the account IDs as incremental numbers, you can use the
>>>> special Batch Importer, OGraphBatchInsert. Example:
>>>>
>>>> OGraphBatchInsert batch =
>>>>     new OGraphBatchInsert("plocal:/temp/mydb", "admin", "admin");
>>>> batch.begin();
>>>>
>>>> // CREATE AN EDGE BETWEEN VERTICES 0 AND 1. IF THE VERTICES DON'T EXIST,
>>>> // THEY ARE CREATED IMPLICITLY
>>>> batch.createEdge(0L, 1L, null);
>>>> batch.createEdge(1L, 2L, null);
>>>> batch.createEdge(2L, 0L, null);
>>>>
>>>> batch.createVertex(3L); // CREATE A NON-CONNECTED VERTEX
>>>>
>>>> Map<String, Object> vertexProps = new HashMap<String, Object>();
>>>> vertexProps.put("foo", "foo");
>>>> vertexProps.put("bar", 3);
>>>> batch.setVertexProperties(0L, vertexProps); // SET PROPERTIES FOR VERTEX 0
>>>> batch.end();
>>>>
>>>> This is blazing fast, but it uses heap, so run it with a lot of it.
>>>>  
>>>>
>>>>>
>>>>> 6. I've multithreaded my loader. The details are now:
>>>>>
>>>>> - using plocal
>>>>> - using 30 threads
>>>>> - not using transactions (OrientGraphFactory.getNoTx)
>>>>>
>>>>
>>>> You should definitely use transactions with a batch size of 100 items.
>>>> This speeds things up.
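>>>>
>>>> As a rough sketch (Payment, batch and getOrCreateAccount() are
>>>> placeholders for your own record type and vertex lookup), each 100-item
>>>> batch can be one transaction that is simply retried on a conflict:
>>>>
>>>> OrientGraphFactory factory =
>>>>     new OrientGraphFactory("plocal:/temp/mydb", "admin", "admin");
>>>>
>>>> void insertBatch(List<Payment> batch) { // ~100 payments per call
>>>>   while (true) {
>>>>     OrientGraph graph = factory.getTx();
>>>>     try {
>>>>       for (Payment p : batch) {
>>>>         OrientVertex from = getOrCreateAccount(graph, p.from);
>>>>         OrientVertex to = getOrCreateAccount(graph, p.to);
>>>>         from.addEdge("PAYMENT", to); // each addEdge() updates both vertices
>>>>       }
>>>>       graph.commit(); // the whole batch is one transaction
>>>>       return;
>>>>     } catch (OConcurrentModificationException e) {
>>>>       graph.rollback(); // another thread touched a shared vertex: retry the batch
>>>>     } finally {
>>>>       graph.shutdown();
>>>>     }
>>>>   }
>>>> }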
>>>>  
>>>>
>>>>> - retrying forever upon write collisions.
>>>>> - using Orient 2.2.7.
>>>>>
>>>>
>>>> Please use the latest 2.2.10.
>>>>  
>>>>
>>>>> - using -XX:MaxDirectMemorySize=258040m
>>>>>
>>>>
>>>> This is not really important, it's just an upper bound for the JVM.
>>>> Please set it to 512GB so you can forget about it. The 2 most important
>>>> values are DISKCACHE and JVM heap. The sum must be lower than the
>>>> available RAM on the server before you run OrientDB.
>>>>
>>>> If you have 64GB, try to define 50GB of DISKCACHE and 14GB of Heap.
>>>>
>>>> If you use the Batch Importer, you should use more Heap and less 
>>>> DISKCACHE.
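>>>>
>>>> As a sketch only (assuming the loader runs embedded with plocal;
>>>> DISK_CACHE_SIZE is the storage.diskCache.bufferSize setting, in MB), the
>>>> split above would look something like:
>>>>
>>>> // JVM options for the loader process on a 64GB machine:
>>>> //   java -Xmx14g -XX:MaxDirectMemorySize=512g ... MyLoader
>>>> // and, before the database is opened, ~50GB of disk cache:
>>>> OGlobalConfiguration.DISK_CACHE_SIZE.setValue(50 * 1024);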
>>>>  
>>>>
>>>>> The good news is I've achieved an initial write throughput of about 
>>>>> 30k/second.
>>>>>
>>>>> The bad news is I've tried several runs and only been able to achieve 
>>>>> 200mil < number of writes < 300mil.
>>>>>
>>>>> The first time I tried it, the loader deadlocked. Using jstack showed
>>>>> that the deadlock was between 3 threads at:
>>>>> - 
>>>>> OOneKeyEntryPerKeyLockManager.acquireLock(OOneKeyEntryPerKeyLockManager.java:173)
>>>>> - 
>>>>> OPartitionedLockManager.acquireExclusiveLock(OPartitionedLockManager.java:210)
>>>>> - 
>>>>> OOneKeyEntryPerKeyLockManager.acquireLock(OOneKeyEntryPerKeyLockManager.java:171)
>>>>>
>>>>
>>>> If it happens again, could you please send a thread dump?
>>>>  
>>>>
>>>>> The second time it failed was due to a NullPointerException at 
>>>>> OByteBufferPool.java:297. I've looked at the code and the only way I can 
>>>>> see this happening is if OByteBufferPool.allocateBuffer throws an error 
>>>>> (perhaps an OutOfMemoryError in java.nio.Bits.reserveMemory). This 
>>>>> StackOverflow posting (
>>>>> http://stackoverflow.com/questions/8462200/examples-of-forcing-freeing-of-native-memory-direct-bytebuffer-has-allocated-us)
>>>>>  
>>>>> seems to indicate that this can happen if the underlying 
>>>>> DirectByteBuffer's 
>>>>> Cleaner doesn't have its clean() method called. 
>>>>>
>>>>
>>>> This is because the database was bigger than this setting:
>>>> -XX:MaxDirectMemorySize=258040m. Please set it to 512GB (see above).
>>>>  
>>>>
>>>>> Alternatively, I followed the SO suggestion and lowered the heap space 
>>>>> to a mere 1gb (it was 50gb) to make the GC more active. Unfortunately, 
>>>>> after a good start, the job is still running some 15 hours later with a 
>>>>> hugely reduced write throughput (~ 7k/s). Jstat shows 4292 full GCs 
>>>>> taking 
>>>>> a total time of 4597s - not great but not hugely awful either. At this 
>>>>> rate, the remaining 700mil or so payments are going to take another 30 
>>>>> hours.
>>>>>
>>>>
>>>> See above the suggested settings.
>>>>  
>>>>
>>>>> 7. Even with the highest throughput I have achieved, 30k writes per 
>>>>> second, I'm looking at about 20 hours of loading. We've taken the same 
>>>>> data 
>>>>> and, after trial and error that was not without its own problems, put it 
>>>>> into Neo4J in 37 minutes. This is a significant difference. It appears 
>>>>> that 
>>>>> they are approaching the problem differently to avoid contention on 
>>>>> updating the vertices during an edge write.
>>>>>
>>>>
>>>> With all these suggestions you should be able to get much better
>>>> numbers. If you can use the Batch Importer, the numbers should be close to
>>>> Neo4j.
>>>>  
>>>>
>>>>>
>>>>> Thoughts?
>>>>>
>>>>> Regards,
>>>>>
>>>>> Phillip
>>>>>
>>>>>
>>>>
>>>> Best Regards,
>>>>
>>>> Luca Garulli
>>>> Founder & CEO
>>>> OrientDB LTD <http://orientdb.com/>
>>>>
>>>> Want to share your opinion about OrientDB?
>>>> Rate & review us at Gartner's Software Review 
>>>> <https://www.gartner.com/reviews/survey/home>
>>>>
>>>>
>>>>  
>>>>
>>>>>
>>>>> On Thursday, September 15, 2016 at 10:06:44 PM UTC+1, l.garulli wrote:
>>>>>>
>>>>>> On 15 September 2016 at 09:54, Phillip Henry <phill...@gmail.com> 
>>>>>> wrote:
>>>>>>
>>>>>>> Hi, Luca.
>>>>>>>
>>>>>>
>>>>>> Hi Phillip,
>>>>>>
>>>>>> 3. Yes, default configuration. Apart from adding an index for 
>>>>>>> ACCOUNTS, I did nothing further.
>>>>>>>
>>>>>>
>>>>>> OK, so you have writeQuorum="majority", which means 2 synchronous
>>>>>> writes and 1 asynchronous write per transaction.
>>>>>>  
>>>>>>
>>>>>>> 4. Good question. With real data, we expect it to be as you suggest: 
>>>>>>> some nodes with the majority of the payments (eg, supermarkets). 
>>>>>>> However, 
>>>>>>> for the test data, payments were assigned randomly and, therefore, 
>>>>>>> should 
>>>>>>> be uniformly distributed.
>>>>>>>
>>>>>>
>>>>>> What's your average number of edges per vertex? <10, <50, <200,
>>>>>> <1000?
>>>>>>  
>>>>>>
>>>>>>> 2. Yes, I tried plocal minutes after posting (d'oh!). I saw a good 
>>>>>>> improvement. It started about 3 times faster and got faster still 
>>>>>>> (about 10 
>>>>>>> times faster) by the time I checked this morning on a job running 
>>>>>>> overnight. However, even though it is now running at about 7k 
>>>>>>> transactions 
>>>>>>> per second, a billion edges is still going to take about 40 hours. So, 
>>>>>>> I 
>>>>>>> ask myself: is there any way I can make it faster still?
>>>>>>>
>>>>>>
>>>>>> What's missing here is the use of an AUTO-SHARDING INDEX. Example:
>>>>>>
>>>>>> accountClass.createIndex("Account.number",
>>>>>>     OClass.INDEX_TYPE.UNIQUE.toString(),
>>>>>>     (OProgressListener) null, (ODocument) null,
>>>>>>     "AUTOSHARDING", new String[] { "number" });
>>>>>>
>>>>>> In this way you should get more parallelism, because the index is
>>>>>> distributed across all the shards (clusters) of the Account class. You
>>>>>> should have 32 of them by default because you have 32 cores.
>>>>>>
>>>>>> Please let me know whether, by sorting the from_accounts and with this
>>>>>> change, it's much faster.
>>>>>>
>>>>>> This is the best you can get out of the box. To push the numbers up it's
>>>>>> slightly more complicated: you should make sure that transactions run in
>>>>>> parallel and aren't serialized. This is possible by playing with
>>>>>> internal OrientDB settings (mainly the distributed workerThreads) and by
>>>>>> having many clusters per class (you could try 128 first and see how
>>>>>> it goes).
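>>>>>>
>>>>>> A minimal sketch of adding extra clusters to the Account class (128 is
>>>>>> just the figure above; db is an already-open database instance):
>>>>>>
>>>>>> OClass account = db.getMetadata().getSchema().getClass("Account");
>>>>>> int existing = account.getClusterIds().length;
>>>>>> for (int i = existing; i < 128; i++)
>>>>>>   account.addCluster("account_" + i); // each new cluster joins the class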
>>>>>>  
>>>>>>
>>>>>>> I assume when I start the servers up in distributed mode once more, 
>>>>>>> the data will then be distributed across all nodes in the cluster?
>>>>>>>
>>>>>>
>>>>>> That's right.
>>>>>>  
>>>>>>
>>>>>>> 3. I'll return to concurrent, remote inserts when this job has 
>>>>>>> finished. Hopefully, a smaller batch size will mean there is no 
>>>>>>> degradation 
>>>>>>> in performance either... FYI: with a somewhat unscientific approach, I 
>>>>>>> was 
>>>>>>> polling the server JVM with JStack and saw only a single thread doing 
>>>>>>> all 
>>>>>>> the work and it *seemed* to spend a lot of its time in ODirtyManager on 
>>>>>>> collection manipulation.
>>>>>>>
>>>>>>
>>>>>> I think it's because you didn't use the AUTO-SHARDING index.
>>>>>> Furthermore, running distributed unfortunately means the tree RidBag is
>>>>>> not available (we will support it in the future), so every change to the
>>>>>> edges takes a lot of CPU to unmarshal and marshal the entire edge list
>>>>>> every time you update a vertex. Hence my recommendation about sorting
>>>>>> the vertices.
>>>>>>  
>>>>>>
>>>>>>> I totally appreciate that performance tuning is an empirical 
>>>>>>> science, but do you have any opinions as to which would probably be 
>>>>>>> faster: 
>>>>>>> single-threaded plocal or multithreaded remote? 
>>>>>>>
>>>>>>
>>>>>> With v2.2 you can go parallel by using the tips above. Replication
>>>>>> certainly has a cost. I'm sure you can go much faster with just one
>>>>>> node and then start the other 2 nodes to have the database replicated
>>>>>> automatically, at least for the first massive insertion.
>>>>>>  
>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Phillip
>>>>>>>
>>>>>>
>>>>>> Luca
>>>>>>
>>>>>>  
>>>>>>
>>>>>>>
>>>>>>> On Wednesday, September 14, 2016 at 3:48:56 PM UTC+1, Phillip Henry 
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi, guys.
>>>>>>>>
>>>>>>>> I'm conducting a proof-of-concept for a large bank (Luca, we had a 
>>>>>>>> 'phone conf on August 5...) and I'm trying to bulk insert a humongous 
>>>>>>>> amount of data: 1 million vertices and 1 billion edges.
>>>>>>>>
>>>>>>>> Firstly, I'm impressed by how easy it was to configure a 
>>>>>>>> cluster. However, the performance of batch inserting is bad (and seems 
>>>>>>>> to 
>>>>>>>> get considerably worse as I add more data). It starts at about 2k 
>>>>>>>> vertices-and-edges per second and deteriorates to about 500/second 
>>>>>>>> after 
>>>>>>>> only about 3 million edges have been added. This also takes ~ 30 
>>>>>>>> minutes. 
>>>>>>>> Needless to say, 1 billion payments (edges) will take over a week at
>>>>>>>> this rate.
>>>>>>>>
>>>>>>>> This is a show-stopper for us.
>>>>>>>>
>>>>>>>> My data model is simply payments between accounts and I store it in 
>>>>>>>> one large file. It's just 3 fields and looks like:
>>>>>>>>
>>>>>>>> FROM_ACCOUNT TO_ACCOUNT AMOUNT
>>>>>>>>
>>>>>>>> In the test data I generated, I had 1 million accounts and 1 
>>>>>>>> billion payments randomly distributed between pairs of accounts.
>>>>>>>>
>>>>>>>> I have 2 classes in OrientDB: ACCOUNTS (extending V) and PAYMENT 
>>>>>>>> (extending E). There is a UNIQUE_HASH_INDEX on ACCOUNTS for the 
>>>>>>>> account 
>>>>>>>> number (a string).
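>>>>>>>>
>>>>>>>> Roughly, the setup looks like this (just a sketch; the property name
>>>>>>>> "number" and the connection URL are placeholders):
>>>>>>>>
>>>>>>>> OrientGraphNoTx g = new OrientGraphNoTx("remote:host1/payments", "admin", "admin");
>>>>>>>> OrientVertexType accounts = g.createVertexType("ACCOUNTS");
>>>>>>>> accounts.createProperty("number", OType.STRING);
>>>>>>>> accounts.createIndex("ACCOUNTS.number",
>>>>>>>>     OClass.INDEX_TYPE.UNIQUE_HASH_INDEX, "number");
>>>>>>>> g.createEdgeType("PAYMENT");
>>>>>>>> g.shutdown();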
>>>>>>>>
>>>>>>>> We're using OrientDB 2.2.7.
>>>>>>>>
>>>>>>>> My batch size is 5k and I am using the "remote" protocol to connect 
>>>>>>>> to our cluster.
>>>>>>>>
>>>>>>>> I'm using JDK 8 and my 3 boxes are beefy machines (32 cores each) 
>>>>>>>> but without SSDs. I wrote the importing code myself but did nothing 
>>>>>>>> 'clever' (I think) and used the Graph API. This client code has been 
>>>>>>>> given 
>>>>>>>> lots of memory and using jstat I can see it is not excessively GCing.
>>>>>>>>
>>>>>>>> So, my questions are:
>>>>>>>>
>>>>>>>> 1. what kind of performance can I realistically expect and can I 
>>>>>>>> improve what I have at the moment?
>>>>>>>>
>>>>>>>> 2. what kind of degradation should I expect as the graph grows?
>>>>>>>>
>>>>>>>> Thanks, guys.
>>>>>>>>
>>>>>>>> Phillip
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
