Re: [orientdb] Indices and Memory Usage

John J. Szucs Fri, 05 May 2017 06:52:44 -0700

Andrey,

THANK YOU! I will give this a try as soon as I can.


I will also do some profiling to see where I really need my JVM heap size to be.

— John

> On May 5, 2017, at 05:05, Andrey Lomakin <[email protected]> wrote:
> 
> Hi John,
> If you wish, you can use this build until we make the official release:
> https://drive.google.com/file/d/0B2oZq2xVp841T2diVGtTcmZ5OTQ/view?usp=sharing
>  
> 
> On Fri, May 5, 2017 at 11:58 AM Andrey Lomakin <[email protected] 
> <mailto:[email protected]>> wrote:
> Hi John,
> 
> I suppose you have encountered issue 
> https://github.com/orientechnologies/orientdb/issues/7390.
> We will provide a release soon.
> 
> Also, please do not use such a huge heap size. We use the heap only to keep 
> temporary data, so I suggest you lower the heap size to give ODB the chance 
> to use more direct memory.
> 
> On Fri, May 5, 2017 at 10:51 AM Luigi Dell'Aquila <[email protected] 
> <mailto:[email protected]>> wrote:
> Hi John,
> 
> How are you doing the import? Are you working in a transaction? Some code 
> would help us understand where the problem is.
> 
> Thanks
> 
> Luigi
> 
> 
> 2017-05-05 3:53 GMT+02:00 John J. Szucs <[email protected] 
> <mailto:[email protected]>>:
> Hello, OrientDB community! It's me again with another question.
> 
> I am still working on my project and have encountered another serious 
> challenge: it seems that writing to indices (especially edge indices?) can 
> cause OrientDB's direct (non-JVM) memory usage to grow without bounds until 
> the system effectively grinds to a halt due to swap.
> 
> The specific use case is building a graph based on (English) Wikipedia. There 
> are approximately 17.4M vertices representing pages (including articles, 
> categories, and various meta pages). These vertices are connected by 
> approximately 65M (at last count) edges. There are a few super-nodes. For 
> example, the vertex representing https://en.wikipedia.org/wiki/United_States 
> has (at last count) 306K incoming edges and 822 outgoing edges. However, the 
> degree of the vertices roughly follows a Zipf distribution and the vast 
> majority of vertices have only a few (<10) total (in and out) edges. There 
> are also some other vertex and edge types for lexical data, but I think those 
> are secondary to the issue.
> 
> Per previous discussion here and on StackOverflow, I have added automatic 
> edge indices on in, out, or the composite of the two to optimize edge 
> queries. When I run the process to extract, transform, and load the data from 
> Wikipedia's XML dumps (using my own ETL code, not OrientDB's), after 24-48 
> hours, the Linux System Monitor shows that physical memory usage has reached 
> 99.9% and then swap usage begins to grow. At this point, the process is 
> effectively halted by swap thrashing.
> 
> I am running this on a Fedora 25 Linux VM with 64GB RAM and 16 CPU cores 
> allocated. The JVM settings are as follows:
> 
> -Xmx32g -Xms32g -server -XX:+PerfDisableSharedMem -XX:+UseG1GC 
> -XX:MaxDirectMemorySize=64413m -Dstorage.wal.syncOnPageFlush=false
> 
> The MaxDirectMemorySize parameter is recommended by OrientDB itself, during 
> start-up with the "out-of-memory errors" warning. It does seem odd to me that 
> Xmx+MaxDirectMemorySize>available RAM, but I'm more of a deep R&D (not 
> DevOps) guy, so I'm just accepting that unless someone advises me otherwise.
> 
> If I disable the edge indices, then the process runs fine and completes in a 
> "reasonable" (for it) amount of time: 2-3 days. Of course, if I do this, my 
> run-time performance suffers intolerably.
> 
> I am running this with OrientDB 2.2.19. I was able to quickly get my code to 
> build with 3.0 M1, but some of the unit tests fail and I am under far too 
> much pressure about this issue from my leadership to try to troubleshoot them 
> right now.
> 
> What can I do to solve this issue? Thanks in advance for your help!
> 
> -- John
> 
> -- 
> 
> --- 
> You received this message because you are subscribed to the Google Groups 
> "OrientDB" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected] 
> <mailto:[email protected]>.
> For more options, visit https://groups.google.com/d/optout 
> <https://groups.google.com/d/optout>.
> 
> 
> -- 
> Best regards,
> Andrey Lomakin, R&D lead. 
> OrientDB Ltd
> 
> twitter: @Andrey_Lomakin 
> linkedin: https://ua.linkedin.com/in/andreylomakin 
> <https://ua.linkedin.com/in/andreylomakin>
> blogger: http://andreylomakin.blogspot.com/ 
> <http://andreylomakin.blogspot.com/> 
> 
> -- 
> 
> --- 
> You received this message because you are subscribed to a topic in the Google 
> Groups "OrientDB" group.
> To unsubscribe from this topic, visit 
> https://groups.google.com/d/topic/orient-database/p0JF5IGsqcs/unsubscribe 
> <https://groups.google.com/d/topic/orient-database/p0JF5IGsqcs/unsubscribe>.
> To unsubscribe from this group and all its topics, send an email to 
> [email protected] 
> <mailto:[email protected]>.
> For more options, visit https://groups.google.com/d/optout 
> <https://groups.google.com/d/optout>.
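
For what it is worth, the mismatch between the JVM settings quoted above and the VM's RAM can be checked with a few lines of arithmetic. This is only a rough sketch: the helper names (size_mib, memory_budget) are made up for illustration, "64GB" is assumed to mean 64 GiB, and the calculation ignores everything else the JVM and the OS need. Note also that -Xmx and -XX:MaxDirectMemorySize are ceilings rather than reservations, so the sum only shows how far the process is allowed to grow, not how much it actually uses.

```python
import re

# Multipliers for JVM size suffixes, expressed in MiB.
_UNIT_MIB = {"k": 1 / 1024, "m": 1, "g": 1024}


def size_mib(flag_value):
    """Convert a JVM size such as '32g' or '64413m' to MiB."""
    match = re.fullmatch(r"(\d+)([kmg])", flag_value.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized JVM size: {flag_value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_MIB[unit]


def memory_budget(jvm_args, ram_mib):
    """Compare the heap and direct-memory ceilings against physical RAM."""
    heap = re.search(r"-Xmx(\S+)", jvm_args)
    direct = re.search(r"-XX:MaxDirectMemorySize=(\S+)", jvm_args)
    heap_mib = size_mib(heap.group(1)) if heap else 0
    direct_mib = size_mib(direct.group(1)) if direct else 0
    total_mib = heap_mib + direct_mib
    return {
        "heap_mib": heap_mib,
        "direct_mib": direct_mib,
        "total_mib": total_mib,
        # Positive value: the two ceilings together exceed physical RAM.
        "overcommit_mib": total_mib - ram_mib,
    }


# JVM settings exactly as quoted in the thread.
JVM_ARGS = (
    "-Xmx32g -Xms32g -server -XX:+PerfDisableSharedMem -XX:+UseG1GC "
    "-XX:MaxDirectMemorySize=64413m -Dstorage.wal.syncOnPageFlush=false"
)

# Assumption: the VM's "64GB RAM" means 64 GiB.
budget = memory_budget(JVM_ARGS, ram_mib=64 * 1024)

if __name__ == "__main__":
    for name, value in budget.items():
        print(f"{name}: {value}")
```

With these inputs the two ceilings add up to 97181 MiB against 65536 MiB of RAM, so the process may grow about 31645 MiB past physical memory before either limit stops it. That is consistent with the swap thrashing described above and with the advice to lower the heap size, although it does not by itself say what the right heap size is.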




