On 23 September 2016 at 00:49, Phillip Henry <phillhe...@gmail.com> wrote:

> Hi, Luca.
>

Hi Phillip.


> I have:
>
> 4. sorting is an overhead, albeit outside of Orient. Using the Unix sort
> command failed with "No space left on device". Oops. OK, so I ran my
> program to generate the data again, this time it is ordered by the first
> account number. Performance was much slower as there appeared to be a lot
> of contention for this account (ie, all writes were contending for this
> account, even if the other account had less contention). More randomized
> data was faster.
>

How big is the file that sort cannot write? Anyway, if you have the accounts
sorted, you can group the writes into transactions of about 100 items, where
a bank account and its edges are in the same transaction. This should help a
lot. Example:

Account 1 -> Payment 1 -> Account X
Account 1 -> Payment 2 -> Account Y
Account 1 -> Payment 3 -> Account Z
Account 2 -> Payment 1 -> Account X
Account 2 -> Payment 1 -> Account W

If the transaction batch size is 5 (I suggest you start with 100), all of the
operations above are executed in one transaction. If another thread has:

Account 99 -> Payment 1 -> Account W

it could conflict, because Account W is shared.
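
To make this concrete, here is a rough sketch of the batched, retry-on-conflict
loop I have in mind, using the Graph API. The Account/Payment classes and the
"number" index come from your model; the URL, credentials, class name and the
parsed row format are just placeholders, so adapt them to your loader:

import java.util.List;
import com.orientechnologies.orient.core.exception.OConcurrentModificationException;
import com.tinkerpop.blueprints.Vertex;
import com.tinkerpop.blueprints.impls.orient.OrientGraph;
import com.tinkerpop.blueprints.impls.orient.OrientGraphFactory;

public class BatchedPaymentLoader {
  private static final int BATCH = 100; // start with 100 items per transaction

  // rows already sorted by from-account: { fromAccount, toAccount, amount }
  public static void load(List<String[]> rows) {
    OrientGraphFactory factory = new OrientGraphFactory("plocal:/temp/mydb", "admin", "admin");
    try {
      for (int i = 0; i < rows.size(); i += BATCH)
        writeBatch(factory, rows.subList(i, Math.min(i + BATCH, rows.size())));
    } finally {
      factory.close();
    }
  }

  private static void writeBatch(OrientGraphFactory factory, List<String[]> batch) {
    while (true) {
      OrientGraph graph = factory.getTx(); // transactional graph
      try {
        for (String[] row : batch) {
          Vertex from = getOrCreateAccount(graph, row[0]);
          Vertex to = getOrCreateAccount(graph, row[1]);
          graph.addEdge(null, from, to, "Payment").setProperty("amount", Double.parseDouble(row[2]));
        }
        graph.commit(); // the whole batch goes in one transaction
        return;
      } catch (OConcurrentModificationException e) {
        graph.rollback(); // another thread touched a shared account: retry the whole batch
      } finally {
        graph.shutdown();
      }
    }
  }

  private static Vertex getOrCreateAccount(OrientGraph graph, String number) {
    for (Vertex v : graph.getVertices("Account.number", number)) // uses your index
      return v;
    return graph.addVertex("class:Account", "number", number);
  }
}

With the input sorted by from-account, conflicts should mostly come from the
"to" side, as in the Account W case above.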

If you can export the accounts with IDs that are numeric and incremental, you
can use the special batch importer, OGraphBatchInsert. Example:

import java.util.HashMap;
import java.util.Map;
// in 2.2.x OGraphBatchInsert ships with the orientdb-graphdb module
import com.orientechnologies.orient.graph.batch.OGraphBatchInsert;

OGraphBatchInsert batch = new OGraphBatchInsert("plocal:/temp/mydb",
    "admin", "admin");
batch.begin();

batch.createEdge(0L, 1L, null); // CREATE AN EDGE BETWEEN VERTICES 0 AND 1.
                                // IF THE VERTICES DON'T EXIST, THEY ARE
                                // CREATED IMPLICITLY
batch.createEdge(1L, 2L, null);
batch.createEdge(2L, 0L, null);

batch.createVertex(3L); // CREATE A NON-CONNECTED VERTEX

Map<String, Object> vertexProps = new HashMap<String, Object>();
vertexProps.put("foo", "foo");
vertexProps.put("bar", 3);
batch.setVertexProperties(0L, vertexProps); // SET PROPERTIES ON VERTEX 0
batch.end();

This is blazing fast, but it works on the heap, so run it with a lot of heap.


>
> 6. I've multithreaded my loader. The details are now:
>
> - using plocal
> - using 30 threads
> - not using transactions (OrientGraphFactory.getNoTx)
>

You should definitely use transactions with a batch size of 100 items, as in
the sketch above. This speeds things up.


> - retrying forever upon write collisions.
> - using Orient 2.2.7.
>

Please use the latest release, 2.2.10.


> - using -XX:MaxDirectMemorySize=258040m
>

This is not really important; it's just an upper bound for the JVM. Please
set it to 512GB so you can forget about it. The two most important values are
the DISKCACHE and the JVM heap: their sum must be lower than the RAM
available on the server before you run OrientDB.

If you have 64GB, try to define 50GB of DISKCACHE and 14GB of Heap.

If you use the Batch Importer, you should use more Heap and less DISKCACHE.
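
As plain JVM options that would look roughly like this (where exactly you put
them depends on how you launch the server or your loader; the disk cache
value is expressed in MB):

-Xmx14g                                # JVM heap
-Dstorage.diskCache.bufferSize=51200   # DISKCACHE, ~50GB
-XX:MaxDirectMemorySize=512g           # upper bound only, set it high and forget it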


> The good news is I've achieved an initial write throughput of about
> 30k/second.
>
> The bad news is I've tried several runs and only been able to achieve
> 200mil < number of writes < 300mil.
>
> The first time I tried it, the loader deadlocked. Using jstat showed that
> the deadlock was between 3 threads at:
> - OOneKeyEntryPerKeyLockManager.acquireLock(OOneKeyEntryPerKeyLockManager.java:173)
> - OPartitionedLockManager.acquireExclusiveLock(OPartitionedLockManager.java:210)
> - OOneKeyEntryPerKeyLockManager.acquireLock(OOneKeyEntryPerKeyLockManager.java:171)
>

If it happens again, could you please send a thread dump?
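For example:

jstack -l <pid> > threads.txt

or send kill -3 to the JVM process and copy the dump from the console output.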


> The second time it failed was due to a NullPointerException at
> OByteBufferPool.java:297. I've looked at the code and the only way I can
> see this happening is if OByteBufferPool.allocateBuffer throws an error
> (perhaps an OutOfMemoryError in java.nio.Bits.reserveMemory). This
> StackOverflow posting (http://stackoverflow.com/questions/8462200/examples-of-forcing-freeing-of-native-memory-direct-bytebuffer-has-allocated-us)
> seems to indicate that this can happen if the underlying DirectByteBuffer's
> Cleaner doesn't have its clean() method called.
>

This is because the database grew bigger than the
-XX:MaxDirectMemorySize=258040m setting. Please set it to 512GB (see above).


> Alternatively, I followed the SO suggestion and lowered the heap space to
> a mere 1gb (it was 50gb) to make the GC more active. Unfortunately, after a
> good start, the job is still running some 15 hours later with a hugely
> reduced write throughput (~ 7k/s). Jstat shows 4292 full GCs taking a total
> time of 4597s - not great but not hugely awful either. At this rate, the
> remaining 700mil or so payments are going to take another 30 hours.
>

See the suggested settings above.


> 7. Even with the highest throughput I have achieved, 30k writes per
> second, I'm looking at about 20 hours of loading. We've taken the same data
> and, after trial and error that was not without its own problems, put it
> into Neo4J in 37 minutes. This is a significant difference. It appears that
> they are approaching the problem differently to avoid contention on
> updating the vertices during an edge write.
>

With all these suggestions you should be able to get much better numbers. If
you can use the Batch Importer, the numbers should be close to Neo4j's.


>
> Thoughts?
>
> Regards,
>
> Phillip
>
>

Best Regards,

Luca Garulli
Founder & CEO
OrientDB LTD <http://orientdb.com/>

Want to share your opinion about OrientDB?
Rate & review us at Gartner's Software Review
<https://www.gartner.com/reviews/survey/home>




>
> On Thursday, September 15, 2016 at 10:06:44 PM UTC+1, l.garulli wrote:
>>
>> On 15 September 2016 at 09:54, Phillip Henry <phill...@gmail.com> wrote:
>>
>>> Hi, Luca.
>>>
>>
>> Hi Phillip,
>>
>> 3. Yes, default configuration. Apart from adding an index for ACCOUNTS, I
>>> did nothing further.
>>>
>>
>> Ok, so you have writeQuorum="majority", which means 2 synchronous writes
>> and 1 asynchronous write per transaction.
>>
>>
>>> 4. Good question. With real data, we expect it to be as you suggest:
>>> some nodes with the majority of the payments (eg, supermarkets). However,
>>> for the test data, payments were assigned randomly and, therefore, should
>>> be uniformly distributed.
>>>
>>
>> What's your average in terms of number of edges? <10, <50, <200, <1000?
>>
>>
>>> 2. Yes, I tried plocal minutes after posting (d'oh!). I saw a good
>>> improvement. It started about 3 times faster and got faster still (about 10
>>> times faster) by the time I checked this morning on a job running
>>> overnight. However, even though it is now running at about 7k transactions
>>> per second, a billion edges is still going to take about 40 hours. So, I
>>> ask myself: is there any way I can make it faster still?
>>>
>>
>> What's missing here is the AUTO-SHARDING INDEX. Example:
>>
>> accountClass.createIndex("Account.number",
>>     OClass.INDEX_TYPE.UNIQUE.toString(), (OProgressListener) null,
>>     (ODocument) null, "AUTOSHARDING", new String[] { "number" });
>>
>> In this way you should go more in parallel, because the index is
>> distributed across all the shards (clusters) of the Account class. You
>> should have 32 of them by default, because you have 32 cores.
>>
>> Please let me know if sorting the from_accounts, together with this
>> change, makes it much faster.
>>
>> This is the best you can have out of the box. Pushing the numbers up is
>> slightly more complicated: you should make sure that transactions run in
>> parallel and aren't serialized. This is possible by playing with
>> internal OrientDB settings (mainly the distributed workerThreads) and by
>> having many clusters per class (you could try 128 first and see how
>> it goes).
>>
>>
>>> I assume when I start the servers up in distributed mode once more, the
>>> data will then be distributed across all nodes in the cluster?
>>>
>>
>> That's right.
>>
>>
>>> 3. I'll return to concurrent, remote inserts when this job has finished.
>>> Hopefully, a smaller batch size will mean there is no degradation in
>>> performance either... FYI: with a somewhat unscientific approach, I was
>>> polling the server JVM with JStack and saw only a single thread doing all
>>> the work and it *seemed* to spend a lot of its time in ODirtyManager on
>>> collection manipulation.
>>>
>>
>> I think it's because you didn't use the AUTO-SHARDING index. Furthermore,
>> running distributed unfortunately means the tree-based RidBag is not
>> available (we will support it in the future), so every change to the edges
>> takes a lot of CPU to unmarshall and marshall the entire edge list every
>> time you update a vertex. Hence my recommendation about sorting the
>> vertices.
>>
>>
>>> I totally appreciate that performance tuning is an empirical science,
>>> but do you have any opinions as to which would probably be faster:
>>> single-threaded plocal or multithreaded remote?
>>>
>>
>> With v2.2 you can go in parallel by using the tips above. Replication
>> certainly has a cost: I'm sure you can go much faster with just one node,
>> at least for the first massive insertion, and then start the other 2 nodes
>> to have the database replicated automatically.
>>
>>
>>>
>>> Regards,
>>>
>>> Phillip
>>>
>>
>> Luca
>>
>>
>>
>>>
>>> On Wednesday, September 14, 2016 at 3:48:56 PM UTC+1, Phillip Henry
>>> wrote:
>>>>
>>>> Hi, guys.
>>>>
>>>> I'm conducting a proof-of-concept for a large bank (Luca, we had a
>>>> 'phone conf on August 5...) and I'm trying to bulk insert a humongous
>>>> amount of data: 1 million vertices and 1 billion edges.
>>>>
>>>> Firstly, I'm impressed about how easy it was to configure a cluster.
>>>> However, the performance of batch inserting is bad (and seems to get
>>>> considerably worse as I add more data). It starts at about 2k
>>>> vertices-and-edges per second and deteriorates to about 500/second after
>>>> only about 3 million edges have been added. This also takes ~ 30 minutes.
>>>> Needless to say that 1 billion payments (edges) will take over a week at
>>>> this rate.
>>>>
>>>> This is a show-stopper for us.
>>>>
>>>> My data model is simply payments between accounts and I store it in one
>>>> large file. It's just 3 fields and looks like:
>>>>
>>>> FROM_ACCOUNT TO_ACCOUNT AMOUNT
>>>>
>>>> In the test data I generated, I had 1 million accounts and 1 billion
>>>> payments randomly distributed between pairs of accounts.
>>>>
>>>> I have 2 classes in OrientDB: ACCOUNTS (extending V) and PAYMENT
>>>> (extending E). There is a UNIQUE_HASH_INDEX on ACCOUNTS for the account
>>>> number (a string).
>>>>
>>>> We're using OrientDB 2.2.7.
>>>>
>>>> My batch size is 5k and I am using the "remote" protocol to connect to
>>>> our cluster.
>>>>
>>>> I'm using JDK 8 and my 3 boxes are beefy machines (32 cores each) but
>>>> without SSDs. I wrote the importing code myself but did nothing 'clever' (I
>>>> think) and used the Graph API. This client code has been given lots of
>>>> memory and using jstat I can see it is not excessively GCing.
>>>>
>>>> So, my questions are:
>>>>
>>>> 1. what kind of performance can I realistically expect and can I
>>>> improve what I have at the moment?
>>>>
>>>> 2. what kind of degradation should I expect as the graph grows?
>>>>
>>>> Thanks, guys.
>>>>
>>>> Phillip
>>>>
>>>>
>>>>
>>

