Hello, OrientDB community! It's me again with another question. I am still working on my project and have run into another serious problem: writing to indices (especially edge indices?) seems to make OrientDB's direct (off-heap) memory usage grow without bound, until swapping effectively grinds the system to a halt.
The specific use case is building a graph from (English) Wikipedia. There are approximately 17.4M vertices representing pages (including articles, categories, and various meta pages). These vertices are connected by approximately 65M edges (at last count). There are a few super-nodes. For example, the vertex representing https://en.wikipedia.org/wiki/United_States has (at last count) 306K incoming edges and 822 outgoing edges. However, vertex degree roughly follows a Zipf distribution, and the vast majority of vertices have only a few (<10) total (in and out) edges. There are also some other vertex and edge types for lexical data, but I think those are secondary to the issue.

Per previous discussion here and on StackOverflow, I have added automatic edge indices on in, out, or the composite of the two to optimize edge queries. I run the process that extracts, transforms, and loads the data from Wikipedia's XML dumps (using my own ETL code, not OrientDB's). After 24-48 hours, the Linux System Monitor shows that physical memory usage has reached 99.9%, and then swap usage begins to grow. At that point, swap thrashing effectively halts the process.

I am running this on a Fedora 25 Linux VM with 64GB RAM and 16 CPU cores allocated. The JVM settings are as follows:

-Xmx32g -Xms32g -server -XX:+PerfDisableSharedMem -XX:+UseG1GC -XX:MaxDirectMemorySize=64413m -Dstorage.wal.syncOnPageFlush=false

The MaxDirectMemorySize value is the one OrientDB itself recommends at start-up, in its "out-of-memory errors" warning. It does seem odd to me that -Xmx plus MaxDirectMemorySize exceeds the available RAM, but I'm more of a deep R&D (not DevOps) guy, so I'm just accepting that unless someone advises me otherwise.

If I disable the edge indices, the process runs fine and completes in a "reasonable" (for it) amount of time: 2-3 days. Of course, if I do this, my run-time performance suffers intolerably. I am running this with OrientDB 2.2.19.
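To make the heap-plus-direct-memory concern concrete, here is a minimal Java sketch that adds up the two configured ceilings and compares them with the VM's RAM. The class and method names (MemoryBudget, toMiB, worstCaseMiB) are hypothetical helpers of my own. The sketch assumes HotSpot's binary k/m/g/t suffixes and treats "64GB" as 64 GiB. It only sums configured limits; it ignores metaspace, thread stacks, and other native overhead, and it says nothing about what OrientDB actually allocates.

```java
import java.util.Locale;

/**
 * Rough memory-budget check for the JVM flags above.
 * Assumption: size suffixes are binary units (k = KiB, m = MiB, g = GiB, t = TiB),
 * as HotSpot interprets them for -Xmx and -XX:MaxDirectMemorySize.
 * Only configured ceilings are summed; real usage is not measured.
 */
public class MemoryBudget {

    /** Converts a HotSpot-style size such as "32g" or "64413m" to MiB. */
    static long toMiB(String size) {
        String s = size.trim().toLowerCase(Locale.ROOT);
        char unit = s.charAt(s.length() - 1);
        if (Character.isDigit(unit)) {
            // No suffix: the value is in bytes.
            return Long.parseLong(s) / (1024L * 1024L);
        }
        long n = Long.parseLong(s.substring(0, s.length() - 1));
        switch (unit) {
            case 'k': return n / 1024L;
            case 'm': return n;
            case 'g': return n * 1024L;
            case 't': return n * 1024L * 1024L;
            default: throw new IllegalArgumentException("Unknown unit: " + size);
        }
    }

    /** Worst case if both ceilings were reached: heap limit plus direct-memory limit, in MiB. */
    static long worstCaseMiB(String xmx, String maxDirect) {
        return toMiB(xmx) + toMiB(maxDirect);
    }

    public static void main(String[] args) {
        long heap = toMiB("32g");        // -Xmx32g
        long direct = toMiB("64413m");   // -XX:MaxDirectMemorySize=64413m
        long ram = toMiB("64g");         // RAM allocated to the VM (assumed 64 GiB)
        long total = heap + direct;
        System.out.printf(Locale.ROOT, "heap=%d MiB, direct=%d MiB, total=%d MiB, RAM=%d MiB%n",
                heap, direct, total, ram);
        System.out.printf(Locale.ROOT, "over-commit: %d MiB (%.1f GiB)%n",
                total - ram, (total - ram) / 1024.0);
    }
}
```

With these flags it reports 97181 MiB of combined ceilings against 65536 MiB of RAM, an over-commit of 31645 MiB (about 30.9 GiB). That does not show OrientDB actually uses that much, only that the configured limits would not stop the process from exhausting physical memory before the JVM itself complains.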
I was able to quickly get my code to build with 3.0 M1, but some of the unit tests fail, and I am under far too much pressure about this issue from my leadership to troubleshoot them right now. What can I do to solve this issue?

Thanks in advance for your help!

-- John
