Hello, OrientDB community! It's me again with another question.

I am still working on my project and have encountered another serious 
challenge: it seems that writing to indices (especially edge indices?) can 
cause OrientDB's direct (off-heap) memory usage to grow without bound until 
the system effectively grinds to a halt from swap thrashing.

The specific use case is building a graph based on (English) Wikipedia. 
There are approximately 17.4M vertices representing pages (including 
articles, categories, and various meta pages). These vertices are connected 
by approximately 65M (at last count) edges. There are a few super-nodes. 
For example, the vertex representing 
https://en.wikipedia.org/wiki/United_States has (at last count) 306K 
incoming edges and 822 outgoing edges. However, the degree of the vertices 
roughly follows a Zipf distribution and the vast majority of vertices have 
only a few (<10) total (in and out) edges. There are also some other vertex 
and edge types for lexical data, but I think those are secondary to the 
issue.
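
As a quick sanity check on those figures, here is a small sketch that uses 
only the approximate counts quoted above (the variable names are mine, just 
for illustration):

```python
# Approximate counts quoted above; all are rounded "at last count" values.
vertex_count = 17_400_000       # page vertices
edge_count = 65_000_000         # edges between pages
us_incoming_edges = 306_000     # incoming edges on the United_States vertex

# Every edge adds one "out" and one "in", so total degree sums to 2 * edges.
mean_total_degree = 2 * edge_count / vertex_count

# Share of all edges that terminate at the single United_States super-node.
us_in_share = us_incoming_edges / edge_count

print(f"mean total degree per vertex: {mean_total_degree:.2f}")
print(f"United_States share of all edge targets: {us_in_share:.2%}")
```

The mean total degree works out to about 7.5, which fits with most vertices 
having fewer than 10 edges while a handful of super-nodes carry far more.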

Per previous discussion here and on StackOverflow, I have added automatic 
edge indices on in, out, or the composite of the two to optimize edge 
queries. When I run the process to extract, transform, and load the data 
from Wikipedia's XML dumps (using my own ETL code, not OrientDB's), after 
24-48 hours, the Linux System Monitor shows that physical memory usage has 
reached 99.9% and then swap usage begins to grow. At this point, the 
process is effectively halted by swap thrashing.

I am running this on a Fedora 25 Linux VM with 64GB RAM and 16 CPU cores 
allocated. The JVM settings are as follows:

-Xmx32g -Xms32g -server -XX:+PerfDisableSharedMem -XX:+UseG1GC 
-XX:MaxDirectMemorySize=64413m -Dstorage.wal.syncOnPageFlush=false

The MaxDirectMemorySize value is the one OrientDB itself recommends at 
start-up, in its warning about out-of-memory errors. It does seem odd to me 
that -Xmx plus MaxDirectMemorySize (roughly 95GB combined) exceeds the 64GB 
of RAM available, but I'm more of a deep R&D (not DevOps) guy, so I'm just 
accepting that unless someone advises me otherwise.
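
To make that arithmetic concrete, here is a minimal sketch of the memory 
ceilings implied by the settings above. It assumes the VM's "64GB" means 
64 GiB and relies on the JVM reading the `m` and `g` suffixes as MiB and 
GiB. The helper `parse_jvm_size_mib` is my own, not part of any tool:

```python
MIB_PER_GIB = 1024


def parse_jvm_size_mib(value: str) -> int:
    """Convert a JVM size string such as '32g' or '64413m' to MiB."""
    multipliers = {"m": 1, "g": MIB_PER_GIB}
    number, suffix = value[:-1], value[-1].lower()
    return int(number) * multipliers[suffix]


heap_mib = parse_jvm_size_mib("32g")        # -Xmx32g
direct_mib = parse_jvm_size_mib("64413m")   # -XX:MaxDirectMemorySize=64413m
ram_mib = 64 * MIB_PER_GIB                  # assumption: VM RAM is 64 GiB

# Upper bound on heap plus direct memory; it ignores metaspace, thread
# stacks, and other JVM overhead, so the real footprint can be higher.
ceiling_mib = heap_mib + direct_mib
overcommit_mib = ceiling_mib - ram_mib

print(f"heap + direct ceiling: {ceiling_mib / MIB_PER_GIB:.1f} GiB")
print(f"ceiling exceeds RAM by: {overcommit_mib / MIB_PER_GIB:.1f} GiB")
```

If I have this right, the two limits together allow about 94.9 GiB, which 
is about 30.9 GiB more than the VM has, so nothing in these settings stops 
the process from being pushed into swap.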

If I disable the edge indices, then the process runs fine and completes in 
a "reasonable" (for it) amount of time: 2-3 days. Of course, if I do this, 
my run-time performance suffers intolerably.

I am running this with OrientDB 2.2.19. I was able to quickly get my code 
to build with 3.0 M1, but some of the unit tests fail and I am under far 
too much pressure about this issue from my leadership to try to 
troubleshoot them right now.

What can I do to solve this issue? Thanks in advance for your help!

-- John
