Re: [orientdb] Indices and Memory Usage

John J. Szucs Fri, 05 May 2017 09:57:36 -0700

Andrey,

THANK YOU! I will give this a try as soon as I can.


I will also do some JVM profi

— John

On May 5, 2017, at 05:05, Andrey Lomakin <[email protected]> wrote:

Hi John,
If you wish, you can use this build until we make an official release:
https://drive.google.com/file/d/0B2oZq2xVp841T2diVGtTcmZ5OTQ/view?usp=sharing

On Fri, May 5, 2017 at 11:58 AM Andrey Lomakin <[email protected]>
wrote:

> Hi John,
>
> I suppose you encountered issue
> https://github.com/orientechnologies/orientdb/issues/7390
> We will provide a release soon.
>
> Also, please do not use such a huge heap size. We use the heap only to keep
> temporary data, so I suggest you lower the heap size to give ODB the chance
> to use more direct memory.
>
> On Fri, May 5, 2017 at 10:51 AM Luigi Dell'Aquila <
> [email protected]> wrote:
>
>> Hi John,
>>
>> How are you doing the import? Are you working in a transaction? Some code
>> would help us understand where the problem is.
>>
>> Thanks
>>
>> Luigi
>>
>>
>> 2017-05-05 3:53 GMT+02:00 John J. Szucs <[email protected]>:
>>
>>> Hello, OrientDB community! It's me again with another question.
>>>
>>> I am still working on my project and have encountered another serious
>>> challenge: it seems that writing to indices (especially edge indices?) can
>>> cause OrientDB's direct (non-JVM) memory usage to grow without bounds until
>>> the system effectively grinds to a halt due to swap.
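One way to watch the growth described above from inside the JVM is the standard BufferPoolMXBean, which reports the bytes held by the JVM's "direct" buffer pool. The sketch below is only an illustration: the class name DirectMemoryProbe is hypothetical, and it is an assumption that OrientDB's off-heap allocations show up in this pool (memory obtained by other native means would not).

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical helper, not part of OrientDB.
class DirectMemoryProbe {
    /** Bytes currently held by the JVM's "direct" buffer pool, or -1 if not reported. */
    static long directBytesUsed() {
        List<BufferPoolMXBean> pools =
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long before = directBytesUsed();
        // Allocate 16 MiB off-heap so the change is visible in the pool statistics.
        ByteBuffer buffer = ByteBuffer.allocateDirect(16 * 1024 * 1024);
        long after = directBytesUsed();
        System.out.println("heap max (bytes):      " + Runtime.getRuntime().maxMemory());
        System.out.println("direct before (bytes): " + before);
        System.out.println("direct after (bytes):  " + after);
        System.out.println("buffer capacity:       " + buffer.capacity());
    }
}
```

Logging directBytesUsed() periodically during the load would show whether this pool is what keeps growing, or whether the memory is being consumed elsewhere.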
>>>
>>> The specific use case is building a graph based on (English) Wikipedia.
>>> There are approximately 17.4M vertices representing pages (including
>>> articles, categories, and various meta pages). These vertices are connected
>>> by approximately 65M (at last count) edges. There are a few super-nodes.
>>> For example, the vertex representing
>>> https://en.wikipedia.org/wiki/United_States has (at last count) 306K incoming
>>> edges and 822 outgoing edges. However, the degree of the vertices roughly
>>> follows a Zipf distribution and the vast majority of vertices have only a
>>> few (<10) total (in and out) edges. There are also some other vertex and
>>> edge types for lexical data, but I think those are secondary to the issue.
>>>
>>> Per previous discussion here and on StackOverflow, I have added
>>> automatic edge indices on in, out, or the composite of the two to optimize
>>> edge queries. When I run the process to extract, transform, and load the
>>> data from Wikipedia's XML dumps (using my own ETL code, not OrientDB's),
>>> after 24-48 hours, the Linux System Monitor shows that physical memory
>>> usage has reached 99.9% and then swap usage begins to grow. At this point,
>>> the process is effectively halted by swap thrashing.
>>>
>>> I am running this on a Fedora 25 Linux VM with 64GB RAM and 16 CPU cores
>>> allocated. The JVM settings are as follows:
>>>
>>> -Xmx32g -Xms32g -server -XX:+PerfDisableSharedMem -XX:+UseG1GC
>>> -XX:MaxDirectMemorySize=64413m -Dstorage.wal.syncOnPageFlush=false
>>>
>>> The MaxDirectMemorySize parameter is recommended by OrientDB itself,
>>> during start-up with the "out-of-memory errors" warning. It does seem odd
>>> to me that Xmx+MaxDirectMemorySize>available RAM, but I'm more of a
>>> deep R&D (not DevOps) guy, so I'm just accepting that unless someone
>>> advises me otherwise.
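The arithmetic behind that observation can be checked with a short sketch. It assumes the VM's "64GB" means 64 GiB and that the JVM suffixes k, m, and g are binary units; the class and method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper for illustration only.
class MemoryBudget {
    /** Parses a JVM-style size such as "32g" or "64413m" into bytes (binary units). */
    static long parseSize(String text) {
        String s = text.trim().toLowerCase(Locale.ROOT);
        char unit = s.charAt(s.length() - 1);
        long multiplier;
        switch (unit) {
            case 'k': multiplier = 1L << 10; break;
            case 'm': multiplier = 1L << 20; break;
            case 'g': multiplier = 1L << 30; break;
            default:  return Long.parseLong(s); // no suffix: plain byte count
        }
        return Long.parseLong(s.substring(0, s.length() - 1)) * multiplier;
    }

    /** Bytes by which heap + direct limits exceed physical RAM (negative means headroom). */
    static long overcommit(String heap, String direct, String ram) {
        return parseSize(heap) + parseSize(direct) - parseSize(ram);
    }

    public static void main(String[] args) {
        // Values taken from the settings quoted above; "64g" of RAM is an assumption.
        long over = overcommit("32g", "64413m", "64g");
        System.out.printf("over-commit: %d MiB%n", over / (1L << 20));
    }
}
```

With these inputs the two limits add up to 97181 MiB against 65536 MiB of RAM, so the program prints an over-commit of 31645 MiB. Both flags are ceilings rather than reservations, so the excess only matters if the heap and the direct memory are both actually filled.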
>>>
>>> If I disable the edge indices, then the process runs fine and completes
>>> in a "reasonable" (for it) amount of time: 2-3 days. Of course, if I do
>>> this, my run-time performance suffers intolerably.
>>>
>>> I am running this with OrientDB 2.2.19. I was able to quickly get my
>>> code to build with 3.0 M1, but some of the unit tests fail and I am under
>>> far too much pressure about this issue from my leadership to try to
>>> troubleshoot them right now.
>>>
>>> What can I do to solve this issue? Thanks in advance for your help!
>>>
>>> -- John
>>>
>>> --
>>>
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "OrientDB" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
> --
> Best regards,
> Andrey Lomakin, R&D lead.
> OrientDB Ltd
>
> twitter: @Andrey_Lomakin
> linkedin: https://ua.linkedin.com/in/andreylomakin
> blogger: http://andreylomakin.blogspot.com/
>

-- 

---
You received this message because you are subscribed to a topic in the
Google Groups "OrientDB" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/orient-database/p0JF5IGsqcs/unsubscribe.
To unsubscribe from this group and all its topics, send an email to
[email protected].
For more options, visit https://groups.google.com/d/optout.
