Andrey, THANK YOU! I will give this a try as soon as I can.
I will also do some profiling to see where I really need my JVM heap size to be.

-- John

> On May 5, 2017, at 05:05, Andrey Lomakin <[email protected]> wrote:
>
> Hi John,
> If you wish, you can use this build until we make the official release:
> https://drive.google.com/file/d/0B2oZq2xVp841T2diVGtTcmZ5OTQ/view?usp=sharing
>
> On Fri, May 5, 2017 at 11:58 AM Andrey Lomakin <[email protected]> wrote:
> Hi John,
>
> I suppose you encountered issue
> https://github.com/orientechnologies/orientdb/issues/7390
> We will provide a release soon.
>
> Also, please do not use such a huge heap size. We use the heap only to keep
> temporary data, so I suggest you lower the heap size to give ODB the chance
> to use more direct memory.
>
> On Fri, May 5, 2017 at 10:51 AM Luigi Dell'Aquila <[email protected]> wrote:
> Hi John,
>
> How are you doing the import? Are you working in a transaction? Some code
> will help us understand where the problem is.
>
> Thanks
>
> Luigi
>
> 2017-05-05 3:53 GMT+02:00 John J. Szucs <[email protected]>:
> Hello, OrientDB community! It's me again with another question.
>
> I am still working on my project and have encountered another serious
> challenge: it seems that writing to indices (especially edge indices?) can
> cause OrientDB's direct (non-JVM) memory usage to grow without bounds until
> the system effectively grinds to a halt due to swap.
>
> The specific use case is building a graph based on (English) Wikipedia. There
> are approximately 17.4M vertices representing pages (including articles,
> categories, and various meta pages). These vertices are connected by
> approximately 65M (at last count) edges. There are a few super-nodes. For
> example, the vertex representing https://en.wikipedia.org/wiki/United_States
> has (at last count) 306K incoming edges and 822 outgoing edges. However, the
> degree of the vertices roughly follows a Zipf distribution and the vast
> majority of vertices have only a few (<10) total (in and out) edges. There
> are also some other vertex and edge types for lexical data, but I think those
> are secondary to the issue.
>
> Per previous discussion here and on StackOverflow, I have added automatic
> edge indices on in, out, or the composite of the two to optimize edge
> queries. When I run the process to extract, transform, and load the data from
> Wikipedia's XML dumps (using my own ETL code, not OrientDB's), after 24-48
> hours, the Linux System Monitor shows that physical memory usage has reached
> 99.9% and then swap usage begins to grow. At this point, the process is
> effectively halted by swap thrashing.
>
> I am running this on a Fedora 25 Linux VM with 64GB RAM and 16 CPU cores
> allocated. The JVM settings are as follows:
>
> -Xmx32g -Xms32g -server -XX:+PerfDisableSharedMem -XX:+UseG1GC
> -XX:MaxDirectMemorySize=64413m -Dstorage.wal.syncOnPageFlush=false
>
> The MaxDirectMemorySize parameter is recommended by OrientDB itself, during
> start-up with the "out-of-memory errors" warning. It does seem odd to me that
> Xmx+MaxDirectMemorySize>available RAM, but I'm more of a deep R&D (not
> DevOps) guy, so I'm just accepting that unless someone advises me otherwise.
>
> If I disable the edge indices, then the process runs fine and completes in a
> "reasonable" (for it) amount of time: 2-3 days. Of course, if I do this, my
> run-time performance suffers intolerably.
>
> I am running this with OrientDB 2.2.19. I was able to quickly get my code to
> build with 3.0 M1, but some of the unit tests fail and I am under far too
> much pressure about this issue from my leadership to try to troubleshoot them
> right now.
>
> What can I do to solve this issue? Thanks in advance for your help!
>
> -- John
>
> --
> ---
> You received this message because you are subscribed to the Google Groups
> "OrientDB" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>
> --
> Best regards,
> Andrey Lomakin, R&D lead.
> OrientDB Ltd
>
> twitter: @Andrey_Lomakin
> linkedin: https://ua.linkedin.com/in/andreylomakin
> blogger: http://andreylomakin.blogspot.com/
>
> --
> ---
> You received this message because you are subscribed to a topic in the Google
> Groups "OrientDB" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/orient-database/p0JF5IGsqcs/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> [email protected].
> For more options, visit https://groups.google.com/d/optout.
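
The arithmetic behind the quoted JVM settings (a 32 GB heap plus a 64413 MB direct-memory cap on a 64 GB VM) can be checked with a minimal Java sketch. The class and method names are hypothetical, and the 8 GB heap and 4 GB OS reserve in the second half are illustrative assumptions only; the thread suggests lowering the heap but does not name a value.

```java
public class MemoryBudget {

    /** True when heap plus direct memory exceeds physical RAM (all values in MB). */
    public static boolean overcommitted(long totalRamMb, long heapMb, long directMb) {
        return heapMb + directMb > totalRamMb;
    }

    /**
     * Direct-memory budget (MB) left after subtracting the heap and a
     * reserve for the OS and other processes from physical RAM.
     */
    public static long directMemoryMb(long totalRamMb, long heapMb, long osReserveMb) {
        long budget = totalRamMb - heapMb - osReserveMb;
        if (budget <= 0) {
            throw new IllegalArgumentException("heap and reserve leave no room for direct memory");
        }
        return budget;
    }

    public static void main(String[] args) {
        long totalRamMb = 64L * 1024; // 64 GB VM, from the thread
        long heapMb = 32L * 1024;     // -Xmx32g, from the thread
        long directMb = 64413;        // -XX:MaxDirectMemorySize=64413m, from the thread

        // 32768 + 64413 = 97181 MB, which is more than the 65536 MB available.
        System.out.println("overcommitted: " + overcommitted(totalRamMb, heapMb, directMb));

        // Assumed values for illustration: 8 GB heap, 4 GB left for the OS.
        long assumedHeapMb = 8L * 1024;
        long assumedReserveMb = 4L * 1024;
        long suggested = directMemoryMb(totalRamMb, assumedHeapMb, assumedReserveMb);
        System.out.println("-Xmx8g -Xms8g -XX:MaxDirectMemorySize=" + suggested + "m");
    }
}
```

With the settings from the thread, the first check reports that the two limits together exceed physical RAM, which is consistent with the swap thrashing described. Any real values should come from profiling the heap, as John plans to do.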
Begin forwarded message:

> From: Andrey Lomakin <[email protected]>
> Subject: Re: [orientdb] Indices and Memory Usage
> Date: May 5, 2017 at 05:05:01 EDT
> To: "[email protected]" <[email protected]>
> Reply-To: [email protected]
