On 23 September 2016 at 03:50, Phillip Henry <[email protected]> wrote:
>> How big is your file the sort cannot write?
>
> One bil-ee-on lines... :-P

How many GB?

>> ...This should help a lot.
>
> The trouble is that the size of a block of contiguous accounts in the real data is not uniform (even if it might be with my test data). Therefore, it is highly likely that a contiguous block of account numbers will span two or more batches. This will lead to a lot of contention. In your example, if Account 2 spills over into the next batch, chances are I'll have to roll back that batch.
>
> Don't you also have a problem that if X, Y, Z and W in your example are account numbers in the next batch, you'll also get contention? Admittedly, randomization doesn't solve this problem either.

If the file is ordered, you could have X threads (where X is the number of cores) that parse the file non-sequentially. For example, with 4 threads you could start the parsing this way:

Thread 1 starts from 0
Thread 2 starts from length * 1/4
Thread 3 starts from length * 2/4
Thread 4 starts from length * 3/4

Of course, each parser should read on to the next CR+LF if it's a CSV. It requires a few lines of code, but you could avoid many conflicts.
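A minimal sketch of that chunked parsing in plain Java (the class and method names are illustrative, not part of OrientDB; it assumes one payment record per line): each worker seeks to its byte offset, discards the partial line it lands on (the previous worker owns it), and keeps reading until it has passed the end of its own chunk.

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class ChunkedLoader {

    static void parseChunk(String path, long start, long end) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            file.seek(start);
            if (start > 0) {
                file.readLine(); // skip to the start of the next complete line
            }
            String line;
            while (file.getFilePointer() <= end && (line = file.readLine()) != null) {
                handleLine(line); // e.g. parse "FROM_ACCOUNT TO_ACCOUNT AMOUNT" and insert it
            }
        }
    }

    static void handleLine(String line) {
        // placeholder: parse the record and write it to the database
    }

    public static void main(String[] args) throws Exception {
        String path = args[0];
        int workers = Runtime.getRuntime().availableProcessors();
        long length = new File(path).length();
        long chunkSize = length / workers;

        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            long start = i * chunkSize;
            long end = (i == workers - 1) ? length : (i + 1) * chunkSize;
            threads[i] = new Thread(() -> {
                try {
                    parseChunk(path, start, end);
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
    }
}

(RandomAccessFile.readLine() reads a byte at a time, so a real loader would use buffered reads per chunk; the offset arithmetic is the point here.)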
>> you can use the special Batch Importer: OGraphBatchInsert
>
> Would this not be subject to the same contention problems?
> At what point is it flushed to disk? (Obviously, it can't live in heap forever.)

It keeps everything in RAM before flushing. Up to a few hundred million vertices/edges should be fine if you have a lot of heap, like 58GB (and 4GB of DISKCACHE). It depends on the number of attributes you have.

>> You should definitely use transactions with a batch size of 100 items.
>
> I thought I read somewhere else (can't find the link at the moment) that you said only to use transactions when using the remote protocol?

This was true before v2.2. With v2.2 the management of the transaction is parallel and very light. Transactions work well with graphs because every addEdge() operation is 2 updates, and having a TX that works like a batch really helps.
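A minimal sketch of such a batched, retrying loader thread using the Graph API: it assumes the ACCOUNTS vertices already exist and are indexed on a "number" property (that property name is an assumption for illustration), commits in batches and retries the whole batch on an optimistic-locking conflict, matching the "retry forever upon write collisions" approach described later in this thread.

import com.orientechnologies.orient.core.exception.OConcurrentModificationException;
import com.tinkerpop.blueprints.Vertex;
import com.tinkerpop.blueprints.impls.orient.OrientGraph;
import com.tinkerpop.blueprints.impls.orient.OrientGraphFactory;

import java.util.List;

public class BatchedLoader {

    // A payment record as read from the file.
    static class Payment {
        final String from, to;
        final double amount;
        Payment(String from, String to, double amount) {
            this.from = from; this.to = to; this.amount = amount;
        }
    }

    // Commits one batch (e.g. ~100 payments) and retries it if another thread
    // touched one of the same account vertices.
    static void loadBatch(OrientGraphFactory factory, List<Payment> batch) {
        while (true) {
            OrientGraph graph = factory.getTx();
            try {
                for (Payment p : batch) {
                    Vertex from = graph.getVertices("ACCOUNTS.number", p.from).iterator().next();
                    Vertex to   = graph.getVertices("ACCOUNTS.number", p.to).iterator().next();
                    from.addEdge("PAYMENT", to).setProperty("amount", p.amount);
                }
                graph.commit();
                return;               // batch committed, we're done
            } catch (OConcurrentModificationException e) {
                graph.rollback();     // optimistic conflict: retry the whole batch
            } finally {
                graph.shutdown();
            }
        }
    }
}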
>> Please use the latest 2.2.10. ... try to define 50GB of DISKCACHE and 14GB of Heap
>
> Will do on the next run.

>> If it happens again, could you please send a thread dump?
>
> I have the full thread dump but it's on my work machine so can't post it in this forum (all access to Google Groups is banned by the bank, so I am writing this on my personal computer). Happy to email them to you. Which email shall I use?

You can use support --at- orientdb.com, referring to this thread in the subject.

> Phill

Best Regards,

Luca Garulli
Founder & CEO
OrientDB LTD <http://orientdb.com/>

Want to share your opinion about OrientDB?
Rate & review us at Gartner's Software Review <https://www.gartner.com/reviews/survey/home>


> On Friday, September 23, 2016 at 7:41:29 AM UTC+1, l.garulli wrote:
>
>> On 23 September 2016 at 00:49, Phillip Henry <[email protected]> wrote:
>>
>>> Hi, Luca.
>>
>> Hi Phillip.
>>
>>> I have:
>>>
>>> 4. sorting is an overhead, albeit outside of Orient. Using the Unix sort command failed with "No space left on device". Oops. OK, so I ran my program to generate the data again; this time it is ordered by the first account number. Performance was much slower as there appeared to be a lot of contention for this account (i.e., all writes were contending for this account, even if the other account had less contention). More randomized data was faster.
>>
>> How big is your file the sort cannot write?
>>
>> Anyway, if you have the accounts sorted, you should have transactions of about 100 items where the bank account and edges are in the same transaction. This should help a lot. Example:
>>
>> Account 1 -> Payment 1 -> Account X
>> Account 1 -> Payment 2 -> Account Y
>> Account 1 -> Payment 3 -> Account Z
>> Account 2 -> Payment 1 -> Account X
>> Account 2 -> Payment 1 -> Account W
>>
>> If the transaction batch is 5 (I suggest you start with 100), all these operations are executed in one transaction. If another thread has:
>>
>> Account 99 -> Payment 1 -> Account W
>>
>> it could go into conflict because of the shared Account W.
>>
>> If you can export Account IDs that are numeric and incremental, you can use the special Batch Importer: OGraphBatchInsert. Example:
>>
>> OGraphBatchInsert batch = new OGraphBatchInsert("plocal:/temp/mydb", "admin", "admin");
>> batch.begin();
>>
>> batch.createEdge(0L, 1L, null); // CREATE AN EDGE BETWEEN VERTICES 0 AND 1.
>>                                 // IF THE VERTICES DON'T EXIST, THEY ARE CREATED IMPLICITLY
>> batch.createEdge(1L, 2L, null);
>> batch.createEdge(2L, 0L, null);
>>
>> batch.createVertex(3L); // CREATE A NON-CONNECTED VERTEX
>>
>> Map<String, Object> vertexProps = new HashMap<String, Object>();
>> vertexProps.put("foo", "foo");
>> vertexProps.put("bar", 3);
>> batch.setVertexProperties(0L, vertexProps); // SET PROPERTIES FOR VERTEX 0
>>
>> batch.end();
>>
>> This is blazing fast, but it uses heap, so run it with a lot of it.
>>
>>> 6. I've multithreaded my loader. The details are now:
>>>
>>> - using plocal
>>> - using 30 threads
>>> - not using transactions (OrientGraphFactory.getNoTx)
>>
>> You should definitely use transactions with a batch size of 100 items. This speeds things up.
>>
>>> - retrying forever upon write collisions.
>>> - using Orient 2.2.7.
>>
>> Please use the latest, 2.2.10.
>>
>>> - using -XX:MaxDirectMemorySize=258040m
>>
>> This is not really important; it's just an upper bound for the JVM. Please set it to 512GB so you can forget about it. The 2 most important values are DISKCACHE and JVM heap. Their sum must be lower than the RAM available on the server before you run OrientDB.
>>
>> If you have 64GB, try to define 50GB of DISKCACHE and 14GB of Heap.
>>
>> If you use the Batch Importer, you should use more Heap and less DISKCACHE.
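A sketch of that sizing advice for a 64GB box running the loader embedded (plocal): a moderate JVM heap, a generous MaxDirectMemorySize, and most of the RAM left to OrientDB's disk cache. The heap and direct-memory sizes are plain JVM flags (e.g. java -Xmx14g -XX:MaxDirectMemorySize=512g MyLoader); if memory serves, the disk cache can also be set programmatically via OGlobalConfiguration.DISK_CACHE_SIZE, but treat that constant name as an assumption to verify against the 2.2 javadoc.

import com.orientechnologies.orient.core.config.OGlobalConfiguration;

public class MemorySettings {
    public static void apply() {
        // OrientDB's disk cache size is expressed in megabytes: 51200 MB = 50 GB.
        // (Constant name from memory; verify against the OrientDB 2.2 documentation.)
        OGlobalConfiguration.DISK_CACHE_SIZE.setValue(51200);
    }
}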
>>> The good news is I've achieved an initial write throughput of about 30k/second.
>>>
>>> The bad news is I've tried several runs and only been able to achieve 200mil < number of writes < 300mil.
>>>
>>> The first time I tried it, the loader deadlocked. Using jstat showed that the deadlock was between 3 threads at:
>>> - OOneKeyEntryPerKeyLockManager.acquireLock(OOneKeyEntryPerKeyLockManager.java:173)
>>> - OPartitionedLockManager.acquireExclusiveLock(OPartitionedLockManager.java:210)
>>> - OOneKeyEntryPerKeyLockManager.acquireLock(OOneKeyEntryPerKeyLockManager.java:171)
>>
>> If it happens again, could you please send a thread dump?
>>
>>> The second time it failed was due to a NullPointerException at OByteBufferPool.java:297. I've looked at the code and the only way I can see this happening is if OByteBufferPool.allocateBuffer throws an error (perhaps an OutOfMemoryError in java.nio.Bits.reserveMemory). This StackOverflow posting (http://stackoverflow.com/questions/8462200/examples-of-forcing-freeing-of-native-memory-direct-bytebuffer-has-allocated-us) seems to indicate that this can happen if the underlying DirectByteBuffer's Cleaner doesn't have its clean() method called.
>>
>> This is because the database was bigger than this setting: -XX:MaxDirectMemorySize=258040m. Please set it to 512GB (see above).
>>
>>> Alternatively, I followed the SO suggestion and lowered the heap space to a mere 1GB (it was 50GB) to make the GC more active. Unfortunately, after a good start, the job is still running some 15 hours later with a hugely reduced write throughput (~7k/s). Jstat shows 4292 full GCs taking a total time of 4597s - not great but not hugely awful either. At this rate, the remaining 700mil or so payments are going to take another 30 hours.
>>
>> See the suggested settings above.
>>
>>> 7. Even with the highest throughput I have achieved, 30k writes per second, I'm looking at about 20 hours of loading. We've taken the same data and, after trial and error that was not without its own problems, put it into Neo4J in 37 minutes. This is a significant difference. It appears that they are approaching the problem differently to avoid contention on updating the vertices during an edge write.
>>
>> With all these suggestions you should be able to get much better numbers. If you can use the Batch Importer, the numbers should be close to Neo4j's.
>>
>>> Thoughts?
>>>
>>> Regards,
>>>
>>> Phillip
>>
>> Best Regards,
>>
>> Luca Garulli
>> Founder & CEO
>> OrientDB LTD <http://orientdb.com/>
>>
>>> On Thursday, September 15, 2016 at 10:06:44 PM UTC+1, l.garulli wrote:
>>>
>>>> On 15 September 2016 at 09:54, Phillip Henry <[email protected]> wrote:
>>>>
>>>>> Hi, Luca.
>>>>
>>>> Hi Phillip,
>>>>
>>>>> 3. Yes, default configuration. Apart from adding an index for ACCOUNTS, I did nothing further.
>>>>
>>>> Ok, so you have writeQuorum="majority", which means 2 synchronous writes and 1 asynchronous per transaction.
>>>>
>>>>> 4. Good question. With real data, we expect it to be as you suggest: some nodes with the majority of the payments (eg, supermarkets). However, for the test data, payments were assigned randomly and, therefore, should be uniformly distributed.
>>>>
>>>> What's your average in terms of number of edges? <10, <50, <200, <1000?
>>>>
>>>>> 2. Yes, I tried plocal minutes after posting (d'oh!). I saw a good improvement. It started about 3 times faster and got faster still (about 10 times faster) by the time I checked this morning on a job running overnight. However, even though it is now running at about 7k transactions per second, a billion edges is still going to take about 40 hours. So, I ask myself: is there any way I can make it faster still?
>>>>
>>>> What's missing here is the AUTO-SHARDING INDEX. Example:
>>>>
>>>> accountClass.createIndex("Account.number",
>>>>     OClass.INDEX_TYPE.UNIQUE.toString(), (OProgressListener) null, (ODocument) null,
>>>>     "AUTOSHARDING", new String[] { "number" });
>>>>
>>>> In this way you should get more parallelism, because the index is distributed across all the shards (clusters) of the Account class. You should have 32 of them by default because you have 32 cores.
>>>>
>>>> Please let me know if, by sorting the from_accounts and with this change, it's much faster.
>>>>
>>>> This is the best you can have out of the box. Pushing the numbers up further is slightly more complicated: you should make sure that transactions go in parallel and aren't serialized. This is possible by playing with internal OrientDB settings (mainly the distributed workerThreads) and by having many clusters per class (you could try 128 first and see how it goes).
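A sketch of pre-creating extra clusters for the Account class before the bulk load, as suggested above; the class and cluster names are illustrative and it assumes the class already exists.

import com.orientechnologies.orient.core.metadata.schema.OClass;
import com.orientechnologies.orient.core.metadata.schema.OSchema;
import com.tinkerpop.blueprints.impls.orient.OrientGraphNoTx;

public class AddClusters {
    public static void main(String[] args) {
        OrientGraphNoTx graph = new OrientGraphNoTx("plocal:/temp/mydb", "admin", "admin");
        try {
            OSchema schema = graph.getRawGraph().getMetadata().getSchema();
            OClass accountClass = schema.getClass("Account");
            // Grow from the default (one cluster per core) to 128 clusters.
            for (int i = accountClass.getClusterIds().length; i < 128; i++) {
                accountClass.addCluster("account_" + i);
            }
        } finally {
            graph.shutdown();
        }
    }
}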
>>>>> I assume when I start the servers up in distributed mode once more, the data will then be distributed across all nodes in the cluster?
>>>>
>>>> That's right.
>>>>
>>>>> 3. I'll return to concurrent, remote inserts when this job has finished. Hopefully, a smaller batch size will mean there is no degradation in performance either... FYI: with a somewhat unscientific approach, I was polling the server JVM with JStack and saw only a single thread doing all the work and it *seemed* to spend a lot of its time in ODirtyManager on collection manipulation.
>>>>
>>>> I think it's because you didn't use the AUTO-SHARDING index. Furthermore, running distributed unfortunately means the tree-based RidBag is not available (we will support it in the future), so every change to the edges takes a lot of CPU to unmarshal and marshal the entire edge list every time you update a vertex. Hence my recommendation to sort the vertices.
>>>>
>>>>> I totally appreciate that performance tuning is an empirical science, but do you have any opinions as to which would probably be faster: single-threaded plocal or multithreaded remote?
>>>>
>>>> With v2.2 you can go in parallel by using the tips above. For sure, the replication has a cost. I'm sure you can go much faster with just one node and then start the other 2 nodes to have the database replicated automatically, at least for the first massive insertion.
>>>>
>>>>> Regards,
>>>>>
>>>>> Phillip
>>>>
>>>> Luca
>>>>
>>>>> On Wednesday, September 14, 2016 at 3:48:56 PM UTC+1, Phillip Henry wrote:
>>>>>
>>>>>> Hi, guys.
>>>>>>
>>>>>> I'm conducting a proof-of-concept for a large bank (Luca, we had a 'phone conf on August 5...) and I'm trying to bulk insert a humongous amount of data: 1 million vertices and 1 billion edges.
>>>>>>
>>>>>> Firstly, I'm impressed by how easy it was to configure a cluster. However, the performance of batch inserting is bad (and seems to get considerably worse as I add more data). It starts at about 2k vertices-and-edges per second and deteriorates to about 500/second after only about 3 million edges have been added. This also takes ~30 minutes. Needless to say, 1 billion payments (edges) will take over a week at this rate.
>>>>>>
>>>>>> This is a show-stopper for us.
>>>>>>
>>>>>> My data model is simply payments between accounts and I store it in one large file. It's just 3 fields and looks like:
>>>>>>
>>>>>> FROM_ACCOUNT TO_ACCOUNT AMOUNT
>>>>>>
>>>>>> In the test data I generated, I had 1 million accounts and 1 billion payments randomly distributed between pairs of accounts.
>>>>>>
>>>>>> I have 2 classes in OrientDB: ACCOUNTS (extending V) and PAYMENT (extending E). There is a UNIQUE_HASH_INDEX on ACCOUNTS for the account number (a string).
>>>>>>
>>>>>> We're using OrientDB 2.2.7.
>>>>>>
>>>>>> My batch size is 5k and I am using the "remote" protocol to connect to our cluster.
>>>>>>
>>>>>> I'm using JDK 8 and my 3 boxes are beefy machines (32 cores each) but without SSDs. I wrote the importing code myself but did nothing 'clever' (I think) and used the Graph API. This client code has been given lots of memory and, using jstat, I can see it is not excessively GCing.
>>>>>>
>>>>>> So, my questions are:
>>>>>>
>>>>>> 1. what kind of performance can I realistically expect and can I improve what I have at the moment?
>>>>>>
>>>>>> 2. what kind of degradation should I expect as the graph grows?
>>>>>>
>>>>>> Thanks, guys.
>>>>>>
>>>>>> Phillip
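For reference, a sketch of the schema described in the original post above (ACCOUNTS extends V, PAYMENT extends E, UNIQUE_HASH_INDEX on the account number); the property name "number" is an assumption for illustration, as the post only says the account number is a string.

import com.orientechnologies.orient.core.metadata.schema.OClass;
import com.orientechnologies.orient.core.metadata.schema.OType;
import com.tinkerpop.blueprints.impls.orient.OrientGraphNoTx;

public class CreateSchema {
    public static void main(String[] args) {
        OrientGraphNoTx graph = new OrientGraphNoTx("plocal:/temp/mydb", "admin", "admin");
        try {
            // Vertex class for accounts with a unique hash index on the account number.
            OClass accounts = graph.createVertexType("ACCOUNTS");
            accounts.createProperty("number", OType.STRING);
            accounts.createIndex("ACCOUNTS.number", OClass.INDEX_TYPE.UNIQUE_HASH_INDEX, "number");

            // Edge class for payments between accounts.
            graph.createEdgeType("PAYMENT");
        } finally {
            graph.shutdown();
        }
    }
}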
