Re: [orientdb] Indices and Memory Usage

Luigi Dell'Aquila Fri, 05 May 2017 00:51:37 -0700

Hi John,

How are you doing the import? Are you working in a transaction? Some code
would help us understand where the problem is.


Thanks

Luigi


2017-05-05 3:53 GMT+02:00 John J. Szucs <[email protected]>:

> Hello, OrientDB community! It's me again with another question.
>
> I am still working on my project and have encountered another serious
> challenge: it seems that writing to indices (especially edge indices?) can
> cause OrientDB's direct (non-JVM) memory usage to grow without bounds until
> the system effectively grinds to a halt due to swap.
>
> The specific use case is building a graph based on (English) Wikipedia.
> There are approximately 17.4M vertices representing pages (including
> articles, categories, and various meta pages). These vertices are connected
> by approximately 65M (at last count) edges. There are a few super-nodes.
> For example, the vertex representing
> https://en.wikipedia.org/wiki/United_States has (at last count) 306K
> incoming edges and 822 outgoing
> edges. However, the degree of the vertices roughly follows a Zipf
> distribution and the vast majority of vertices have only a few (<10) total
> (in and out) edges. There are also some other vertex and edge types for
> lexical data, but I think those are secondary to the issue.
>
> Per previous discussion here and on StackOverflow, I have added automatic
> edge indices on in, out, or the composite of the two to optimize edge
> queries. When I run the process to extract, transform, and load the data
> from Wikipedia's XML dumps (using my own ETL code, not OrientDB's), after
> 24-48 hours, the Linux System Monitor shows that physical memory usage has
> reached 99.9% and then swap usage begins to grow. At this point, the
> process is effectively halted by swap thrashing.
>
> I am running this on a Fedora 25 Linux VM with 64GB RAM and 16 CPU cores
> allocated. The JVM settings are as follows:
>
> -Xmx32g -Xms32g -server -XX:+PerfDisableSharedMem -XX:+UseG1GC
> -XX:MaxDirectMemorySize=64413m -Dstorage.wal.syncOnPageFlush=false
>
> The MaxDirectMemorySize parameter is recommended by OrientDB itself,
> during start-up with the "out-of-memory errors" warning. It does seem odd
> to me that Xmx+MaxDirectMemorySize>available RAM, but I'm more of a deep
> R&D (not DevOps) guy, so I'm just accepting that unless someone advises me
> otherwise.
>
> If I disable the edge indices, then the process runs fine and completes in
> a "reasonable" (for it) amount of time: 2-3 days. Of course, if I do this,
> my run-time performance suffers intolerably.
>
> I am running this with OrientDB 2.2.19. I was able to quickly get my code
> to build with 3.0 M1, but some of the unit tests fail and I am under far
> too much pressure about this issue from my leadership to try to
> troubleshoot them right now.
>
> What can I do to solve this issue? Thanks in advance for your help!
>
> -- John
>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups
> "OrientDB" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>

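A note on the JVM settings quoted above: the observation that
Xmx+MaxDirectMemorySize exceeds available RAM can be checked with simple
arithmetic. The sketch below is only an illustration, not OrientDB code. It
assumes the VM's "64GB" means 64 GiB, and the helper names (parse_size_mib,
memory_budget) are made up for this example. Both flags are upper limits
rather than reservations, so the result shows what the process is allowed to
use, not what it will use.

```python
MIB = 1024 ** 2
UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size_mib(value: str) -> int:
    """Convert a JVM-style size such as '32g' or '64413m' to whole MiB."""
    number, suffix = int(value[:-1]), value[-1].lower()
    return number * UNITS[suffix] // MIB


def memory_budget(heap: str, direct: str, ram: str) -> tuple[int, int]:
    """Return (combined ceiling, excess over physical RAM), both in MiB."""
    ceiling = parse_size_mib(heap) + parse_size_mib(direct)
    return ceiling, ceiling - parse_size_mib(ram)


# Values taken from the quoted message: -Xmx32g and
# -XX:MaxDirectMemorySize=64413m; RAM assumed to be 64 GiB.
ceiling_mib, overcommit_mib = memory_budget("32g", "64413m", "64g")
print(f"ceiling: {ceiling_mib} MiB, over RAM by: {overcommit_mib} MiB")
```

Under these assumptions the combined ceiling is 97181 MiB against 65536 MiB of
RAM, so the two limits together permit roughly 30.9 GiB more than the VM has,
before counting the operating system and other processes.
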
