Hi Phillip,

Could you send a thread dump for version 2.2.10?

Fri, 30 Sep 2016, 8:47 Phillip Henry <phillhe...@gmail.com>:
> Hi, Andrey.
>
> I was using 2.2.10 but, just to be sure, I ran it a second time, making sure that 2.2.10 was the first thing in my classpath, and I am afraid that I saw it again. It's quite predictable (anywhere between 200 and 250 million edges).
>
> Regards,
>
> Phillip
>
> On Monday, September 26, 2016 at 9:06:44 AM UTC+1, Andrey Lomakin wrote:
>
>> Hi,
>> I have looked at your thread dump. We had already identified and fixed your issue in version 2.2.9.
>> So if you use 2.2.10 (the latest one), you will not experience this problem.
>>
>> I strongly recommend using version 2.2.10 because several deadlocks were fixed in 2.2.9, and 2.2.10 also contains a few minor optimizations.
>>
>> On Fri, Sep 23, 2016 at 6:51 PM Phillip Henry <phill...@gmail.com> wrote:
>>
>>> Hi, Luca.
>>>
>>> > How many GB?
>>>
>>> The input file is 22 GB of text.
>>>
>>> > If the file is ordered ...
>>>
>>> You are only sorting by the first account. The second account can be anywhere in the entire range. My understanding is that both vertices are updated when an edge is written. If this is true, will there not be potential contention when the "to" vertex is updated?
>>>
>>> > OGraphBatchInsert ... keeps everything in RAM before flushing
>>>
>>> I assume I will still have to write retry code in the event of a collision (see above)?
>>>
>>> > You can use support --at- orientdb.com ...
>>>
>>> Sent.
>>>
>>> Regards,
>>>
>>> Phill
>>>
>>> On Friday, September 23, 2016 at 4:06:49 PM UTC+1, l.garulli wrote:
>>>
>>>> On 23 September 2016 at 03:50, Phillip Henry <phill...@gmail.com> wrote:
>>>>
>>>>> > How big is the file that sort cannot write?
>>>>>
>>>>> One bil-ee-on lines... :-P
>>>>
>>>> How many GB?
>>>>
>>>>> > ...This should help a lot.
>>>>>
>>>>> The trouble is that the size of a block of contiguous accounts in the real data is non-uniform (even if it might be with my test data).
>>>>> Therefore, it is highly likely that a contiguous block of account numbers will span 2 or more batches. This will lead to a lot of contention. In your example, if Account 2 spills over into the next batch, chances are I'll have to roll back that batch.
>>>>>
>>>>> Don't you also have a problem that if X, Y, Z and W in your example are account numbers in the next batch, you'll also get contention? Admittedly, randomization doesn't solve this problem either.
>>>>
>>>> If the file is ordered, you could have X threads (where X is the number of cores) that parse the file non-sequentially. For example, with 4 threads, you could start the parsing in this way:
>>>>
>>>> Thread 1 starts from 0
>>>> Thread 2 starts from length * 1/4
>>>> Thread 3 starts from length * 2/4
>>>> Thread 4 starts from length * 3/4
>>>>
>>>> Of course, each thread should skip forward to the next line break (CR+LF) before parsing if it's a CSV. It requires some lines of code, but you could avoid many conflicts.
>>>>
>>>>> > you can use the special Batch Importer: OGraphBatchInsert
>>>>>
>>>>> Would this not be subject to the same contention problems? At what point is it flushed to disk? (Obviously, it can't live in heap forever).
>>>>
>>>> It keeps everything in RAM before flushing. Up to a few hundred million vertices/edges should be fine if you have a lot of heap, like 58GB (and 4GB of DISKCACHE). It depends on the number of attributes you have.
>>>>
>>>>> > You should definitely be using transactions with a batch size of 100 items.
>>>>>
>>>>> I thought I read somewhere else (can't find the link at the moment) that you said to only use transactions when using the remote protocol?
>>>>
>>>> This was true before v2.2. With v2.2 the management of the transaction is parallel and very light.
>>>> Transactions work well with graphs because every addEdge() operation is 2 updates, and having a TX that works like a batch really helps.
>>>>
>>>>> > Please use the latest, 2.2.10. ... try to define 50GB of DISKCACHE and 14GB of Heap
>>>>>
>>>>> Will do on the next run.
>>>>>
>>>>> > If it happens again, could you please send a thread dump?
>>>>>
>>>>> I have the full thread dump but it's on my work machine, so I can't post it in this forum (all access to Google Groups is banned by the bank, so I am writing this on my personal computer). Happy to email it to you. Which email shall I use?
>>>>
>>>> You can use support --at- orientdb.com, referring to this thread in the subject.
>>>>
>>>>> Phill
>>>>
>>>> Best Regards,
>>>>
>>>> Luca Garulli
>>>> Founder & CEO
>>>> OrientDB LTD <http://orientdb.com/>
>>>>
>>>> Want to share your opinion about OrientDB?
>>>> Rate & review us at Gartner's Software Review <https://www.gartner.com/reviews/survey/home>
>>>>
>>>>> On Friday, September 23, 2016 at 7:41:29 AM UTC+1, l.garulli wrote:
>>>>>
>>>>>> On 23 September 2016 at 00:49, Phillip Henry <phill...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi, Luca.
>>>>>>
>>>>>> Hi Phillip.
>>>>>>
>>>>>>> I have:
>>>>>>>
>>>>>>> 4. Sorting is an overhead, albeit outside of Orient. Using the Unix sort command failed with "No space left on device". Oops. OK, so I ran my program to generate the data again, this time ordered by the first account number. Performance was much slower as there appeared to be a lot of contention for this account (i.e., all writes were contending for this account, even if the other account had less contention). More randomized data was faster.
>>>>>>
>>>>>> How big is the file that sort cannot write?
>>>>>> Anyway, if you have the accounts sorted, you should have transactions of about 100 items where the bank account and edges are in the same transaction. This should help a lot. Example:
>>>>>>
>>>>>> Account 1 -> Payment 1 -> Account X
>>>>>> Account 1 -> Payment 2 -> Account Y
>>>>>> Account 1 -> Payment 3 -> Account Z
>>>>>> Account 2 -> Payment 1 -> Account X
>>>>>> Account 2 -> Payment 2 -> Account W
>>>>>>
>>>>>> If the transaction batch size is 5 (I suggest you start with 100), all these operations are executed in one transaction. If another thread has:
>>>>>>
>>>>>> Account 99 -> Payment 1 -> Account W
>>>>>>
>>>>>> it could conflict because of the shared Account W.
>>>>>>
>>>>>> If you can export Account IDs that are numeric and incremental, you can use the special Batch Importer: OGraphBatchInsert. Example:
>>>>>>
>>>>>> OGraphBatchInsert batch = new OGraphBatchInsert("plocal:/temp/mydb", "admin", "admin");
>>>>>> batch.begin();
>>>>>>
>>>>>> // CREATE AN EDGE BETWEEN VERTICES 0 AND 1. IF THE VERTICES
>>>>>> // DON'T EXIST, THEY ARE CREATED IMPLICITLY
>>>>>> batch.createEdge(0L, 1L, null);
>>>>>> batch.createEdge(1L, 2L, null);
>>>>>> batch.createEdge(2L, 0L, null);
>>>>>>
>>>>>> batch.createVertex(3L); // CREATE A NON-CONNECTED VERTEX
>>>>>>
>>>>>> Map<String, Object> vertexProps = new HashMap<String, Object>();
>>>>>> vertexProps.put("foo", "foo");
>>>>>> vertexProps.put("bar", 3);
>>>>>> batch.setVertexProperties(0L, vertexProps); // SET PROPERTIES FOR VERTEX 0
>>>>>> batch.end();
>>>>>>
>>>>>> This is blazing fast, but uses Heap, so run it with a lot of it.
>>>>>>
>>>>>>> 6. I've multithreaded my loader. The details are now:
>>>>>>>
>>>>>>> - using plocal
>>>>>>> - using 30 threads
>>>>>>> - not using transactions (OrientGraphFactory.getNoTx)
>>>>>>
>>>>>> You should definitely be using transactions with a batch size of 100 items. This speeds things up.
>>>>>>> - retrying forever upon write collisions.
>>>>>>> - using Orient 2.2.7.
>>>>>>
>>>>>> Please use the latest, 2.2.10.
>>>>>>
>>>>>>> - using -XX:MaxDirectMemorySize=258040m
>>>>>>
>>>>>> This is not really important; it's just an upper bound for the JVM. Please set it to 512GB so you can forget about it. The 2 most important values are DISKCACHE and JVM heap. Their sum must be lower than the RAM available on the server before you run OrientDB.
>>>>>>
>>>>>> If you have 64GB, try to define 50GB of DISKCACHE and 14GB of Heap.
>>>>>>
>>>>>> If you use the Batch Importer, you should use more Heap and less DISKCACHE.
>>>>>>
>>>>>>> The good news is I've achieved an initial write throughput of about 30k/second.
>>>>>>>
>>>>>>> The bad news is I've tried several runs and only been able to achieve 200mil < number of writes < 300mil.
>>>>>>>
>>>>>>> The first time I tried it, the loader deadlocked. Using jstat showed that the deadlock was between 3 threads at:
>>>>>>> - OOneKeyEntryPerKeyLockManager.acquireLock(OOneKeyEntryPerKeyLockManager.java:173)
>>>>>>> - OPartitionedLockManager.acquireExclusiveLock(OPartitionedLockManager.java:210)
>>>>>>> - OOneKeyEntryPerKeyLockManager.acquireLock(OOneKeyEntryPerKeyLockManager.java:171)
>>>>>>
>>>>>> If it happens again, could you please send a thread dump?
>>>>>>
>>>>>>> The second time it failed was due to a NullPointerException at OByteBufferPool.java:297. I've looked at the code and the only way I can see this happening is if OByteBufferPool.allocateBuffer throws an error (perhaps an OutOfMemoryError in java.nio.Bits.reserveMemory).
>>>>>>> This StackOverflow posting (http://stackoverflow.com/questions/8462200/examples-of-forcing-freeing-of-native-memory-direct-bytebuffer-has-allocated-us) seems to indicate that this can happen if the underlying DirectByteBuffer's Cleaner doesn't have its clean() method called.
>>>>>>
>>>>>> This is because the database was bigger than this setting: -XX:MaxDirectMemorySize=258040m. Please set this to 512GB (see above).
>>>>>>
>>>>>>> Alternatively, I followed the SO suggestion and lowered the heap space to a mere 1 GB (it was 50 GB) to make the GC more active. Unfortunately, after a good start, the job is still running some 15 hours later with a hugely reduced write throughput (~ 7k/s). Jstat shows 4292 full GCs taking a total time of 4597s - not great but not hugely awful either. At this rate, the remaining 700mil or so payments are going to take another 30 hours.
>>>>>>
>>>>>> See the suggested settings above.
>>>>>>
>>>>>>> 7. Even with the highest throughput I have achieved, 30k writes per second, I'm looking at about 20 hours of loading. We've taken the same data and, after trial and error that was not without its own problems, put it into Neo4J in 37 minutes. This is a significant difference. It appears that they are approaching the problem differently to avoid contention on updating the vertices during an edge write.
>>>>>>
>>>>>> With all these suggestions you should be able to get much better numbers. If you can use the Batch Importer, the numbers should be close to Neo4j's.
>>>>>>
>>>>>>> Thoughts?
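The advice in this exchange (commit in transactions of about 100 items, and retry a batch when a write collision occurs) can be sketched independently of OrientDB. What follows is only a minimal illustration: `BatchedLoader`, `WriteConflictException` and the `commitBatch` callback are hypothetical names introduced here and are not OrientDB API. In a real loader the callback would presumably wrap whatever begin/commit/rollback calls the graph API offers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchedLoader {

    /** Hypothetical stand-in for whatever exception the storage layer raises on a write collision. */
    public static class WriteConflictException extends RuntimeException {
        public WriteConflictException(String message) {
            super(message);
        }
    }

    /** Splits the items into consecutive batches of at most batchSize elements. */
    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    /**
     * Hands each batch to commitBatch (one call is meant to be one transaction).
     * A batch that fails with WriteConflictException is retried as a whole,
     * at most maxRetries times. Returns the total number of retries performed.
     */
    public static <T> int load(List<T> items, int batchSize, int maxRetries,
                               Consumer<List<T>> commitBatch) {
        int retries = 0;
        for (List<T> batch : partition(items, batchSize)) {
            int attempt = 0;
            while (true) {
                try {
                    commitBatch.accept(batch);
                    break; // batch committed, move on to the next one
                } catch (WriteConflictException e) {
                    if (attempt >= maxRetries) {
                        throw e; // give up instead of retrying forever
                    }
                    attempt++;
                    retries++;
                }
            }
        }
        return retries;
    }
}
```

Bounding the retries, rather than retrying forever as in the loader described above, is a design choice made for this sketch so that a persistent conflict surfaces as an error.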
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Phillip
>>>>>>
>>>>>> Best Regards,
>>>>>>
>>>>>> Luca Garulli
>>>>>>
>>>>>>> On Thursday, September 15, 2016 at 10:06:44 PM UTC+1, l.garulli wrote:
>>>>>>>>
>>>>>>>> On 15 September 2016 at 09:54, Phillip Henry <phill...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi, Luca.
>>>>>>>>
>>>>>>>> Hi Phillip,
>>>>>>>>
>>>>>>>>> 3. Yes, default configuration. Apart from adding an index for ACCOUNTS, I did nothing further.
>>>>>>>>
>>>>>>>> OK, so you have writeQuorum="majority", which means 2 synchronous writes and 1 asynchronous write per transaction.
>>>>>>>>
>>>>>>>>> 4. Good question. With real data, we expect it to be as you suggest: some nodes with the majority of the payments (e.g., supermarkets). However, for the test data, payments were assigned randomly and, therefore, should be uniformly distributed.
>>>>>>>>
>>>>>>>> What's your average in terms of number of edges? <10, <50, <200, <1000?
>>>>>>>>
>>>>>>>>> 2. Yes, I tried plocal minutes after posting (d'oh!). I saw a good improvement. It started about 3 times faster and got faster still (about 10 times faster) by the time I checked this morning on a job running overnight. However, even though it is now running at about 7k transactions per second, a billion edges is still going to take about 40 hours. So, I ask myself: is there any way I can make it faster still?
>>>>>>>>
>>>>>>>> What's missing here is the use of the AUTO-SHARDING index.
>>>>>>>> Example:
>>>>>>>>
>>>>>>>> accountClass.createIndex("Account.number", OClass.INDEX_TYPE.UNIQUE.toString(), (OProgressListener) null, (ODocument) null,
>>>>>>>>     "AUTOSHARDING", new String[] { "number" });
>>>>>>>>
>>>>>>>> In this way you should go more in parallel, because the index is distributed across all the shards (clusters) of the Account class. You should have 32 of them by default because you have 32 cores.
>>>>>>>>
>>>>>>>> Please let me know whether sorting by from_account, together with this change, makes it much faster.
>>>>>>>>
>>>>>>>> This is the best you can have out of the box. To push the numbers up, it's slightly more complicated: you should be sure that transactions go in parallel and aren't serialized. This is possible by playing with internal OrientDB settings (mainly the distributed workerThreads) and by having many clusters per class (you could try with 128 first and see how it goes).
>>>>>>>>
>>>>>>>>> I assume when I start the servers up in distributed mode once more, the data will then be distributed across all nodes in the cluster?
>>>>>>>>
>>>>>>>> That's right.
>>>>>>>>
>>>>>>>>> 3. I'll return to concurrent, remote inserts when this job has finished. Hopefully, a smaller batch size will mean there is no degradation in performance either... FYI: with a somewhat unscientific approach, I was polling the server JVM with JStack and saw only a single thread doing all the work and it *seemed* to spend a lot of its time in ODirtyManager on collection manipulation.
>>>>>>>>
>>>>>>>> I think it's because you didn't use the AUTO-SHARDING index.
>>>>>>>> Furthermore, running distributed unfortunately means the tree ridbag is not available (we will support it in the future), so every change to the edges takes a lot of CPU to unmarshal and marshal the entire edge list every time you update a vertex. That's why I recommended sorting the vertices.
>>>>>>>>
>>>>>>>>> I totally appreciate that performance tuning is an empirical science, but do you have any opinions as to which would probably be faster: single-threaded plocal or multithreaded remote?
>>>>>>>>
>>>>>>>> With v2.2 you can go in parallel by using the tips above. For sure, the replication has a cost. I'm sure you can go much faster with just one node and then start the other 2 nodes to have the database replicated automatically. At least for the first massive insertion.
>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>> Phillip
>>>>>>>>
>>>>>>>> Luca
>>>>>>>>
>>>>>>>>> On Wednesday, September 14, 2016 at 3:48:56 PM UTC+1, Phillip Henry wrote:
>>>>>>>>>>
>>>>>>>>>> Hi, guys.
>>>>>>>>>>
>>>>>>>>>> I'm conducting a proof-of-concept for a large bank (Luca, we had a 'phone conf on August 5...) and I'm trying to bulk insert a humongous amount of data: 1 million vertices and 1 billion edges.
>>>>>>>>>>
>>>>>>>>>> Firstly, I'm impressed by how easy it was to configure a cluster. However, the performance of batch inserting is bad (and seems to get considerably worse as I add more data). It starts at about 2k vertices-and-edges per second and deteriorates to about 500/second after only about 3 million edges have been added. This also takes ~ 30 minutes.
>>>>>>>>>> Needless to say, 1 billion payments (edges) will take over a week at this rate.
>>>>>>>>>>
>>>>>>>>>> This is a show-stopper for us.
>>>>>>>>>>
>>>>>>>>>> My data model is simply payments between accounts and I store it in one large file. It's just 3 fields and looks like:
>>>>>>>>>>
>>>>>>>>>> FROM_ACCOUNT TO_ACCOUNT AMOUNT
>>>>>>>>>>
>>>>>>>>>> In the test data I generated, I had 1 million accounts and 1 billion payments randomly distributed between pairs of accounts.
>>>>>>>>>>
>>>>>>>>>> I have 2 classes in OrientDB: ACCOUNTS (extending V) and PAYMENT (extending E). There is a UNIQUE_HASH_INDEX on ACCOUNTS for the account number (a string).
>>>>>>>>>>
>>>>>>>>>> We're using OrientDB 2.2.7.
>>>>>>>>>>
>>>>>>>>>> My batch size is 5k and I am using the "remote" protocol to connect to our cluster.
>>>>>>>>>>
>>>>>>>>>> I'm using JDK 8 and my 3 boxes are beefy machines (32 cores each) but without SSDs. I wrote the importing code myself but did nothing 'clever' (I think) and used the Graph API. This client code has been given lots of memory and, using jstat, I can see it is not excessively GCing.
>>>>>>>>>>
>>>>>>>>>> So, my questions are:
>>>>>>>>>>
>>>>>>>>>> 1. What kind of performance can I realistically expect and can I improve what I have at the moment?
>>>>>>>>>>
>>>>>>>>>> 2. What kind of degradation should I expect as the graph grows?
>>>>>>>>>>
>>>>>>>>>> Thanks, guys.
>>>>>>>>>>
>>>>>>>>>> Phillip
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> ---
>>>>>>>>> You received this message because you are subscribed to the Google Groups "OrientDB" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to orient-databa...@googlegroups.com.
>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
--
Best regards,
Andrey Lomakin, R&D lead.
OrientDB Ltd

twitter: @Andrey_Lomakin
linkedin: https://ua.linkedin.com/in/andreylomakin
blogger: http://andreylomakin.blogspot.com/
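Luca's suggestion in the thread, to start each parser thread at a fraction of the file length and then skip forward to the next line break, could look roughly like the sketch below. It only computes the byte ranges and does not parse or insert anything. The class and method names (`ChunkSplitter`, `lineAlignedOffsets`) are made up for this illustration, and it assumes lines end in LF or CR+LF.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

public class ChunkSplitter {

    /**
     * Returns chunks + 1 byte offsets into the file. Chunk i covers the range
     * [offsets[i], offsets[i + 1]). Every interior offset is moved forward to
     * the first byte after the next LF, so no line is split across two chunks.
     */
    public static long[] lineAlignedOffsets(String path, int chunks) throws IOException {
        if (chunks <= 0) {
            throw new IllegalArgumentException("chunks must be positive");
        }
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            long length = file.length();
            long[] offsets = new long[chunks + 1]; // offsets[0] stays 0
            offsets[chunks] = length;
            for (int i = 1; i < chunks; i++) {
                // Never start before the previous chunk's start (matters for very long lines).
                long guess = Math.max(length * i / chunks, offsets[i - 1]);
                offsets[i] = nextLineStart(file, guess, length);
            }
            return offsets;
        }
    }

    /** Finds the first line start at or after the given position. */
    private static long nextLineStart(RandomAccessFile file, long from, long length)
            throws IOException {
        if (from <= 0) {
            return 0;
        }
        if (from >= length) {
            return length;
        }
        // Begin one byte earlier: if that byte is an LF, 'from' is already a line start.
        file.seek(from - 1);
        int b;
        while ((b = file.read()) != -1) {
            if (b == '\n') {
                return file.getFilePointer();
            }
        }
        return length; // no further line break: the chunk ends at end of file
    }
}
```

Each worker thread would then read only its own range, for example thread i reading from `offsets[i]` up to `offsets[i + 1]`. Some chunks can come out empty when a single line spans more than one nominal chunk.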