Hi,
I have looked at your thread dump. We have already identified and fixed this
issue in version 2.2.9, so if you use 2.2.10 (the latest), you will not
experience this problem.

I strongly recommend upgrading to 2.2.10: several deadlocks were fixed in
2.2.9, and 2.2.10 also contains a few minor optimizations.

On Fri, Sep 23, 2016 at 6:51 PM Phillip Henry <phillhe...@gmail.com> wrote:

> Hi, Luca.
>
> > How many GB?
>
> The input file is 22gb of text.
>
> > If the file is ordered ...
>
> You are only sorting by the first account. The second account can be
> anywhere in the entire range. My understanding is that both vertices are
> updated when an edge is written. If this is true, will there not be
> potential contention when the "to" vertex is updated?
>
> > OGraphBatchInsert ... keeps everything in RAM before flushing
>
> I assume I will still have to write retry code in the event of a collision
> (see above)?
>
> > You can use support --at- orientdb.com ...
>
> Sent.
>
> Regards,
>
> Phill
>
> On Friday, September 23, 2016 at 4:06:49 PM UTC+1, l.garulli wrote:
>
>> On 23 September 2016 at 03:50, Phillip Henry <phill...@gmail.com> wrote:
>>
>>> > How big is your file the sort cannot write?
>>>
>>> One bil-ee-on lines... :-P
>>>
>>
>> How many GB?
>>
>>
>>> > ...This should help a lot.
>>>
>>> The trouble is that the size of a block of contiguous accounts in the
>>> real data is not-uniform (even if it might be with my test data).
>>> Therefore, it is highly likely a contiguous block of account numbers will
>>> span 2 or more batches. This will lead to a lot of contention. In your
>>> example, if Account 2 spills over into the next batch, chances are I'll
>>> have to rollback that batch.
>>>
>>> Don't you also have a problem that if X, Y, Z and W in your example are
>>> account numbers in the next batch, you'll also get contention? Admittedly,
>>> randomization doesn't solve this problem either.
>>>
>>
>> If the file is ordered, you could have X threads (where X is the number
>> of cores) parse the file non-sequentially. For example, with 4 threads
>> you could start the parsing this way:
>>
>> Thread 1 starts from 0
>> Thread 2 starts from length * 1/4
>> Thread 3 starts from length * 2/4
>> Thread 4 starts from length * 3/4
>>
>> Of course, each thread should first skip forward to the next line break
>> (CR+LF) if it's a CSV. It requires a few lines of code, but you could
>> avoid many conflicts.
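>> The offset computation described above can be sketched in plain Java (no
>> OrientDB dependency; the class name is illustrative, and '\n' is used as
>> the line terminator for simplicity):

```java
import java.io.IOException;
import java.io.RandomAccessFile;

public class SplitOffsets {
    // Compute one start offset per thread: each tentative offset (length*i/N)
    // is advanced past the next '\n' so every thread begins on a line boundary.
    static long[] startOffsets(RandomAccessFile file, int threads) throws IOException {
        long length = file.length();
        long[] offsets = new long[threads]; // offsets[0] stays 0
        for (int i = 1; i < threads; i++) {
            long pos = length * i / threads;
            file.seek(pos);
            while (pos < length && file.read() != '\n') {
                pos++;
            }
            offsets[i] = Math.min(pos + 1, length); // byte after the newline
        }
        return offsets;
    }
}
```

>> Each thread then reads from its own offset up to the next thread's offset.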
>>
>>
>>> > you can use the special Batch Importer: OGraphBatchInsert
>>>
>>> Would this not be subject to the same contention problems?
>>> At what point is it flushed to disk? (Obviously, it can't live in heap
>>> forever).
>>>
>>
>> It keeps everything in RAM before flushing. Up to a few hundred million
>> vertices/edges should be fine if you have a lot of heap, like 58GB (and
>> 4GB of DISKCACHE). It depends on the number of attributes you have.
>>
>>
>>> > You should definitely use transactions with a batch size of 100 items.
>>>
>>> I thought I read somewhere else (can't find the link at the moment) that
>>> you said only use transactions when using the remote protocol?
>>>
>>
>> This was true before v2.2. With v2.2 the management of transactions is
>> parallel and very light. Transactions work well with graphs because every
>> addEdge() operation is 2 updates, so having a TX that works like a batch
>> really helps.
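>> The batch-plus-retry pattern this implies can be sketched generically in
>> plain Java (ConflictException stands in for OrientDB's
>> OConcurrentModificationException, and the commit callback stands in for
>> the addEdge() calls plus the transaction commit):

```java
import java.util.List;
import java.util.function.Consumer;

public class BatchRetry {
    /** Stand-in for OrientDB's OConcurrentModificationException. */
    public static class ConflictException extends RuntimeException {}

    // Commit items in batches of batchSize; when a commit conflicts, replay
    // the same batch until it succeeds. Returns the number of retries.
    static <T> int applyInBatches(List<T> items, int batchSize,
                                  Consumer<List<T>> commitBatch) {
        int retries = 0;
        for (int i = 0; i < items.size(); i += batchSize) {
            List<T> batch = items.subList(i, Math.min(i + batchSize, items.size()));
            while (true) {
                try {
                    commitBatch.accept(batch); // e.g. addEdge() calls + commit()
                    break;
                } catch (ConflictException e) {
                    retries++;                 // conflict: replay the batch
                }
            }
        }
        return retries;
    }
}
```

>> Smaller batches mean cheaper replays on conflict; larger batches amortize
>> the commit overhead, which is why 100 is a reasonable starting point.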
>>
>>
>>>
>>> > Please use last 2.2.10. ... try to define 50GB of DISKCACHE and 14GB
>>> of Heap
>>>
>>> Will do on the next run.
>>>
>>> > If happens again, could you please send a thread dump?
>>>
>>> I have the full thread dump but it's on my work machine so can't post it
>>> in this forum (all access to Google Groups is banned by the bank so I am
>>> writing this on my personal computer). Happy to email them to you. Which
>>> email shall I use?
>>>
>>
>> You can use support --at- orientdb.com, referring to this thread in the
>> subject.
>>
>>
>>>
>>> Phill
>>>
>>
>>
>> Best Regards,
>>
>> Luca Garulli
>> Founder & CEO
>> OrientDB LTD <http://orientdb.com/>
>>
>> Want to share your opinion about OrientDB?
>> Rate & review us at Gartner's Software Review
>> <https://www.gartner.com/reviews/survey/home>
>>
>>
>>
>>> On Friday, September 23, 2016 at 7:41:29 AM UTC+1, l.garulli wrote:
>>>
>>>> On 23 September 2016 at 00:49, Phillip Henry <phill...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi, Luca.
>>>>>
>>>>
>>>> Hi Phillip.
>>>>
>>>>
>>>>> I have:
>>>>>
>>>>> 4. sorting is an overhead, albeit outside of Orient. Using the Unix
>>>>> sort command failed with "No space left on device". Oops. OK, so I ran my
>>>>> program to generate the data again, this time it is ordered by the first
>>>>> account number. Performance was much slower as there appeared to be a lot
>>>>> of contention for this account (ie, all writes were contending for this
>>>>> account, even if the other account had less contention). More randomized
>>>>> data was faster.
>>>>>
>>>>
>>>> How big is the file the sort cannot write? Anyway, if you have the
>>>> accounts sorted, you should use transactions of about 100 items, where a
>>>> bank account and its edges are in the same transaction. This should help
>>>> a lot. Example:
>>>>
>>>> Account 1 -> Payment 1 -> Account X
>>>> Account 1 -> Payment 2 -> Account Y
>>>> Account 1 -> Payment 3 -> Account Z
>>>> Account 2 -> Payment 1 -> Account X
>>>> Account 2 -> Payment 1 -> Account W
>>>>
>>>> If the transaction batch is 5 (I suggest starting with 100), all these
>>>> operations are executed in one transaction. If another thread has:
>>>>
>>>> Account 99 -> Payment 1 -> Account W
>>>>
>>>> it could conflict because of the shared Account W.
>>>>
>>>> If you can export Account IDs that are numeric and incremental, you can
>>>> use the special Batch Importer, OGraphBatchInsert. Example:
>>>>
>>>> OGraphBatchInsert batch =
>>>>     new OGraphBatchInsert("plocal:/temp/mydb", "admin", "admin");
>>>> batch.begin();
>>>>
>>>> batch.createEdge(0L, 1L, null); // CREATE AN EDGE BETWEEN VERTICES 0 AND 1.
>>>> batch.createEdge(1L, 2L, null); // IF THE VERTICES DON'T EXIST, THEY ARE
>>>> batch.createEdge(2L, 0L, null); // CREATED IMPLICITLY
>>>>
>>>> batch.createVertex(3L); // CREATE A NON-CONNECTED VERTEX
>>>>
>>>> Map<String, Object> vertexProps = new HashMap<String, Object>();
>>>> vertexProps.put("foo", "foo");
>>>> vertexProps.put("bar", 3);
>>>> batch.setVertexProperties(0L, vertexProps); // SET PROPERTIES FOR VERTEX 0
>>>> batch.end();
>>>>
>>>> This is blazing fast, but it uses heap, so run it with plenty of it.
>>>>
>>>>
>>>>>
>>>>> 6. I've multithreaded my loader. The details are now:
>>>>>
>>>>> - using plocal
>>>>> - using 30 threads
>>>>> - not using transactions (OrientGraphFactory.getNoTx)
>>>>>
>>>>
>>>> You should definitely use transactions with a batch size of 100 items.
>>>> This speeds things up.
>>>>
>>>>
>>>>> - retrying forever upon write collisions.
>>>>> - using Orient 2.2.7.
>>>>>
>>>>
>>>> Please use the latest 2.2.10.
>>>>
>>>>
>>>>> - using -XX:MaxDirectMemorySize=258040m
>>>>>
>>>>
>>>> This is not really important; it's just an upper bound for the JVM.
>>>> Please set it to 512GB so you can forget about it. The 2 most important
>>>> values are DISKCACHE and JVM heap. Their sum must be lower than the RAM
>>>> available on the server before you run OrientDB.
>>>>
>>>> If you have 64GB, try to define 50GB of DISKCACHE and 14GB of Heap.
>>>>
>>>> If you use the Batch Importer, you should use more Heap and less
>>>> DISKCACHE.
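>>>> For example, on a 64GB box the numbers above might translate into flags
>>>> like these (a sketch; storage.diskCache.bufferSize is expressed in
>>>> megabytes, and the server launch command itself is omitted):

```shell
# 14GB heap, 50GB disk cache, generous direct-memory ceiling
JAVA_OPTS="-Xmx14g \
  -XX:MaxDirectMemorySize=512g \
  -Dstorage.diskCache.bufferSize=51200"
```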
>>>>
>>>>
>>>>> The good news is I've achieved an initial write throughput of about
>>>>> 30k/second.
>>>>>
>>>>> The bad news is I've tried several runs and only been able to achieve
>>>>> 200mil < number of writes < 300mil.
>>>>>
>>>>> The first time I tried it, the loader deadlocked. Using jstat showed
>>>>> that the deadlock was between 3 threads at:
>>>>> -
>>>>> OOneKeyEntryPerKeyLockManager.acquireLock(OOneKeyEntryPerKeyLockManager.java:173)
>>>>> -
>>>>> OPartitionedLockManager.acquireExclusiveLock(OPartitionedLockManager.java:210)
>>>>> -
>>>>> OOneKeyEntryPerKeyLockManager.acquireLock(OOneKeyEntryPerKeyLockManager.java:171)
>>>>>
>>>>
>>>> If happens again, could you please send a thread dump?
>>>>
>>>>
>>>>> The second time it failed was due to a NullPointerException at
>>>>> OByteBufferPool.java:297. I've looked at the code and the only way I can
>>>>> see this happening is if OByteBufferPool.allocateBuffer throws an error
>>>>> (perhaps an OutOfMemoryError in java.nio.Bits.reserveMemory). This
>>>>> StackOverflow posting (
>>>>> http://stackoverflow.com/questions/8462200/examples-of-forcing-freeing-of-native-memory-direct-bytebuffer-has-allocated-us)
>>>>> seems to indicate that this can happen if the underlying 
>>>>> DirectByteBuffer's
>>>>> Cleaner doesn't have its clean() method called.
>>>>>
>>>>
>>>> This is because the database was bigger than this setting:
>>>> -XX:MaxDirectMemorySize=258040m. Please set it to 512GB (see above).
>>>>
>>>>
>>>>> Alternatively, I followed the SO suggestion and lowered the heap space
>>>>> to a mere 1gb (it was 50gb) to make the GC more active. Unfortunately,
>>>>> after a good start, the job is still running some 15 hours later with a
>>>>> hugely reduced write throughput (~ 7k/s). Jstat shows 4292 full GCs taking
>>>>> a total time of 4597s - not great but not hugely awful either. At this
>>>>> rate, the remaining 700mil or so payments are going to take another 30
>>>>> hours.
>>>>>
>>>>
>>>> See above the suggested settings.
>>>>
>>>>
>>>>> 7. Even with the highest throughput I have achieved, 30k writes per
>>>>> second, I'm looking at about 20 hours of loading. We've taken the same 
>>>>> data
>>>>> and, after trial and error that was not without its own problems, put it
>>>>> into Neo4J in 37 minutes. This is a significant difference. It appears 
>>>>> that
>>>>> they are approaching the problem differently to avoid contention on
>>>>> updating the vertices during an edge write.
>>>>>
>>>>
>>>> With all these suggestions you should see much better numbers. If you
>>>> can use the Batch Importer, the numbers should be close to Neo4j's.
>>>>
>>>>
>>>>>
>>>>> Thoughts?
>>>>>
>>>>> Regards,
>>>>>
>>>>> Phillip
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>>
>>>>> On Thursday, September 15, 2016 at 10:06:44 PM UTC+1, l.garulli wrote:
>>>>>>
>>>>>> On 15 September 2016 at 09:54, Phillip Henry <phill...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi, Luca.
>>>>>>>
>>>>>>
>>>>>> Hi Phillip,
>>>>>>
>>>>>> 3. Yes, default configuration. Apart from adding an index for
>>>>>>> ACCOUNTS, I did nothing further.
>>>>>>>
>>>>>>
>>>>>> Ok, so you have writeQuorum="majority", which means 2 synchronous
>>>>>> writes and 1 asynchronous write per transaction.
>>>>>>
>>>>>>
>>>>>>> 4. Good question. With real data, we expect it to be as you suggest:
>>>>>>> some nodes with the majority of the payments (eg, supermarkets). 
>>>>>>> However,
>>>>>>> for the test data, payments were assigned randomly and, therefore, 
>>>>>>> should
>>>>>>> be uniformly distributed.
>>>>>>>
>>>>>>
>>>>>> What's your average in terms of number of edges? <10, <50, <200,
>>>>>> <1000?
>>>>>>
>>>>>>
>>>>>>> 2. Yes, I tried plocal minutes after posting (d'oh!). I saw a good
>>>>>>> improvement. It started about 3 times faster and got faster still 
>>>>>>> (about 10
>>>>>>> times faster) by the time I checked this morning on a job running
>>>>>>> overnight. However, even though it is now running at about 7k 
>>>>>>> transactions
>>>>>>> per second, a billion edges is still going to take about 40 hours. So, I
>>>>>>> ask myself: is there anyway I can make it faster still?
>>>>>>>
>>>>>>
>>>>>> Here you are missing the AUTO-SHARDING INDEX. Example:
>>>>>>
>>>>>> accountClass.createIndex("Account.number",
>>>>>>     OClass.INDEX_TYPE.UNIQUE.toString(), (OProgressListener) null,
>>>>>>     (ODocument) null, "AUTOSHARDING", new String[] { "number" });
>>>>>>
>>>>>> In this way you should get more parallelism, because the index is
>>>>>> distributed across all the shards (clusters) of the Account class. You
>>>>>> should have 32 of them by default because you have 32 cores.
>>>>>>
>>>>>> Please let me know if it's much faster after sorting the from_accounts
>>>>>> and making this change.
>>>>>>
>>>>>> This is the best you can get out of the box. Pushing the numbers up is
>>>>>> slightly more complicated: you should make sure that transactions go in
>>>>>> parallel and aren't serialized. This is possible by playing with
>>>>>> internal OrientDB settings (mainly the distributed workerThreads) and by
>>>>>> having many clusters per class (you could try with 128 first and see how
>>>>>> it goes).
>>>>>>
>>>>>>
>>>>>>> I assume when I start the servers up in distributed mode once more,
>>>>>>> the data will then be distributed across all nodes in the cluster?
>>>>>>>
>>>>>>
>>>>>> That's right.
>>>>>>
>>>>>>
>>>>>>> 3. I'll return to concurrent, remote inserts when this job has
>>>>>>> finished. Hopefully, a smaller batch size will mean there is no 
>>>>>>> degradation
>>>>>>> in performance either... FYI: with a somewhat unscientific approach, I 
>>>>>>> was
>>>>>>> polling the server JVM with JStack and saw only a single thread doing 
>>>>>>> all
>>>>>>> the work and it *seemed* to spend a lot of its time in ODirtyManager on
>>>>>>> collection manipulation.
>>>>>>>
>>>>>>
>>>>>> I think it's because you didn't use the AUTO-SHARDING index.
>>>>>> Furthermore, running distributed unfortunately means the tree-based
>>>>>> RidBag is not available (we will support it in the future), so every
>>>>>> change to the edges takes a lot of CPU to unmarshal and marshal the
>>>>>> entire edge list every time you update a vertex. That's the reason for
>>>>>> my recommendation about sorting the vertices.
>>>>>>
>>>>>>
>>>>>>> I totally appreciate that performance tuning is an empirical
>>>>>>> science, but do you have any opinions as to which would probably be 
>>>>>>> faster:
>>>>>>> single-threaded plocal or multithreaded remote?
>>>>>>>
>>>>>>
>>>>>> With v2.2 you can go in parallel by using the tips above. For sure,
>>>>>> replication has a cost. I'm sure you can go much faster with just one
>>>>>> node and then start the other 2 nodes to have the database replicated
>>>>>> automatically, at least for the first massive insertion.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Phillip
>>>>>>>
>>>>>>
>>>>>> Luca
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> On Wednesday, September 14, 2016 at 3:48:56 PM UTC+1, Phillip Henry
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi, guys.
>>>>>>>>
>>>>>>>> I'm conducting a proof-of-concept for a large bank (Luca, we had a
>>>>>>>> 'phone conf on August 5...) and I'm trying to bulk insert a humongous
>>>>>>>> amount of data: 1 million vertices and 1 billion edges.
>>>>>>>>
>>>>>>>> Firstly, I'm impressed by how easy it was to configure a cluster.
>>>>>>>> However, the performance of batch inserting is bad (and seems to get
>>>>>>>> considerably worse as I add more data). It starts at about 2k
>>>>>>>> vertices-and-edges per second and deteriorates to about 500/second
>>>>>>>> after only about 3 million edges have been added. This alone takes
>>>>>>>> ~30 minutes. Needless to say, 1 billion payments (edges) will take
>>>>>>>> over a week at this rate.
>>>>>>>>
>>>>>>>> This is a show-stopper for us.
>>>>>>>>
>>>>>>>> My data model is simply payments between accounts and I store it in
>>>>>>>> one large file. It's just 3 fields and looks like:
>>>>>>>>
>>>>>>>> FROM_ACCOUNT TO_ACCOUNT AMOUNT
>>>>>>>>
>>>>>>>> In the test data I generated, I had 1 million accounts and 1
>>>>>>>> billion payments randomly distributed between pairs of accounts.
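>>>>>>>> A generator for data of this shape might look like this (a sketch;
>>>>>>>> the class name, "ACC"-prefixed account format, and amount range are
>>>>>>>> made up, and the counts are scaled down for testing):

```java
import java.io.PrintWriter;
import java.util.Random;

public class GenPayments {
    // Emit `payments` rows of "FROM_ACCOUNT TO_ACCOUNT AMOUNT", with both
    // accounts drawn uniformly at random from `accounts` distinct accounts.
    static void generate(PrintWriter out, int accounts, long payments, long seed) {
        Random rnd = new Random(seed);
        for (long i = 0; i < payments; i++) {
            int from = rnd.nextInt(accounts);
            int to = rnd.nextInt(accounts);
            int amount = 1 + rnd.nextInt(10_000); // hypothetical amount range
            out.println("ACC" + from + " ACC" + to + " " + amount);
        }
    }
}
```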
>>>>>>>>
>>>>>>>> I have 2 classes in OrientDB: ACCOUNTS (extending V) and PAYMENT
>>>>>>>> (extending E). There is a UNIQUE_HASH_INDEX on ACCOUNTS for the account
>>>>>>>> number (a string).
>>>>>>>>
>>>>>>>> We're using OrientDB 2.2.7.
>>>>>>>>
>>>>>>>> My batch size is 5k and I am using the "remote" protocol to connect
>>>>>>>> to our cluster.
>>>>>>>>
>>>>>>>> I'm using JDK 8 and my 3 boxes are beefy machines (32 cores each)
>>>>>>>> but without SSDs. I wrote the importing code myself but did nothing
>>>>>>>> 'clever' (I think) and used the Graph API. This client code has been 
>>>>>>>> given
>>>>>>>> lots of memory and using jstat I can see it is not excessively GCing.
>>>>>>>>
>>>>>>>> So, my questions are:
>>>>>>>>
>>>>>>>> 1. what kind of performance can I realistically expect and can I
>>>>>>>> improve what I have at the moment?
>>>>>>>>
>>>>>>>> 2. what kind of degradation should I expect as the graph grows?
>>>>>>>>
>>>>>>>> Thanks, guys.
>>>>>>>>
>>>>>>>> Phillip
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>
>>>>>>> ---
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "OrientDB" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>> send an email to orient-databa...@googlegroups.com.
>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
-- 
Best regards,
Andrey Lomakin, R&D lead.
OrientDB Ltd

twitter: @Andrey_Lomakin
linkedin: https://ua.linkedin.com/in/andreylomakin
blogger: http://andreylomakin.blogspot.com/

