Re: [orientdb] Optimizing for maximum performance

Andrey Lomakin Mon, 17 Mar 2014 02:12:08 -0700

Hi,
No, you are doing fine.


On Fri, Mar 14, 2014 at 3:38 PM, <[email protected]> wrote:

> Hi,
>
> I use hash index for entity-references as I do not need range queries and
> they are a lot faster, right?
>
> Currently I do the following:
>
>    1. Fetch all entities that are involved:  .getValues(<iKeys>,true)
>    2. Fetch each vertex found with graph.getVertex(<id>)
>    3. Store all the existing vertexes in Map/cache (or create temporary
>    vertexes if missing)
>    4. Use the cache when I need to create an edge on the event.vertex
>
> Is there a faster way to get all the vertexes than repeating step 2 for
> every entry found in the index?
>
> Regards,
>   -Stefán
>
>
>
> On Friday, 14 March 2014 13:23:59 UTC, Andrey Lomakin wrote:
>
>> Hi Stefan,
>> Which index type do you use: the hash index or the default index?
>> Do you need range queries?
>>
>>
>> On Fri, Mar 14, 2014 at 1:51 PM, <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I have been following OrientDB for some years now and I'm finally an
>>> active user.
>>> I'm not an avid user yet, but after numerous delays I have spent the
>>> past few weeks solely on OrientDB development.
>>>
>>> The project that I'm working on is quite demanding, as it involves
>>> storing event information that piles up, 70+ million events daily, where
>>> each event references several entities/vertexes.
>>> It may even be crazy to try to store this in a graph, but that remains to
>>> be seen.
>>>
>>> I have been reading everything on optimization here and in the GitHub
>>> documentation and feel that it's a popular topic that not many have
>>> mastered.
>>>
>>> In the following days I hope to optimize my code and share the results
>>> and the tricks along the way, eventually contributing a small tutorial.
>>> My hope is to get good advice from the community during this journey and
>>> hopefully give something back.
>>>
>>> Current status:
>>>
>>>    - 500 events are saved in approx. 520 ms. I need to double this, to
>>>    roughly 2k events per second, to be on the safe side.
>>>    - Each event produces ~5 vertexes/edges on average (so each batch is
>>>    really about 2.5k records)
>>>    - Since 1.7-rc1-SNAPSHOT the performance seems sustainable but it
>>>    degraded quickly with older releases. (Thank you for linkbag!)
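
The figures in the status list above can be cross-checked with a little arithmetic. The following is a minimal sketch, not code from the thread: the class and method names are hypothetical, and the daily average assumes events arrive evenly over 24 hours, which the post does not state.

```java
// Back-of-the-envelope throughput check using the figures quoted above.
// ThroughputEstimate and its methods are illustrative names only.
final class ThroughputEstimate {
    /** Items per second when `count` items are processed in `millis` ms. */
    static double perSecond(long count, double millis) {
        return count * 1000.0 / millis;
    }

    /** Average items per second needed to absorb `perDay` items (even arrival assumed). */
    static double dailyAverage(long perDay) {
        return perDay / 86_400.0; // 86,400 seconds in a day
    }

    public static void main(String[] args) {
        double current = perSecond(500, 520);        // measured: 500 events in ~520 ms
        double records = perSecond(500 * 5, 520);    // ~5 vertexes/edges per event
        double average = dailyAverage(70_000_000L);  // "70+ million daily" as a lower bound
        System.out.printf("current: %.0f events/s (%.0f records/s)%n", current, records);
        System.out.printf("daily average needed: %.0f events/s%n", average);
        System.out.printf("headroom at a 2k events/s target: %.1fx%n", 2000 / average);
    }
}
```

With these inputs the measured rate is roughly 960 events per second against an average need of about 810, so the current margin is thin, which is consistent with the wish to double throughput.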
>>>
>>> The current approach is:
>>>
>>>    1. Load a batch of 500 events (json) into very light container
>>>    objects (POJOs) (~10ms)
>>>    2. Begin transaction
>>>    3. Resolve all referenced vertexes in one go (getValues(iKeys,true))
>>>    -> graph.getVertex( id ) -> store in a map/cache or create temporary
>>>    vertexes if missing (~45 ms)
>>>    4. Loop through all the events (~215 ms)
>>>       1. create temporary vertex
>>>       2. add required edges (uses prepared cache for vertex lookup)
>>>       3. save the event vertex
>>>    5. Commit transaction (~250 ms)
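
The resolve-once, cache, then link pattern in steps 3 and 4 above can be sketched in plain Java. This is only an illustration under stated assumptions: `FakeGraph`, `Event`, `resolveAll` and `createVertex` are hypothetical in-memory stand-ins, not OrientDB classes or methods, and the transaction of steps 2 and 5 is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Stand-in types only: nothing here is part of the OrientDB API.
final class BatchSketch {
    /** Hypothetical lightweight event container (the POJO of step 1). */
    static final class Event {
        final String id;
        final List<String> entityKeys;
        Event(String id, List<String> entityKeys) {
            this.id = id;
            this.entityKeys = entityKeys;
        }
    }

    /** Hypothetical in-memory stand-in for the graph and its reference index. */
    static final class FakeGraph {
        final Map<String, String> vertexByKey = new HashMap<>(); // entity key -> vertex id
        final List<String[]> edges = new ArrayList<>();          // {fromVertex, toVertex}
        int lookups = 0;                                         // counts bulk index calls

        /** One bulk lookup for many keys, modelling the single index call of step 3. */
        Map<String, String> resolveAll(Set<String> keys) {
            lookups++;
            Map<String, String> found = new HashMap<>();
            for (String key : keys) {
                String vertex = vertexByKey.get(key);
                if (vertex != null) found.put(key, vertex);
            }
            return found;
        }

        String createVertex(String key) {
            String id = "v:" + key;
            vertexByKey.put(key, id);
            return id;
        }
    }

    static void saveBatch(FakeGraph graph, List<Event> batch) {
        // Step 3: collect every referenced key, resolve them in one go, cache the result.
        Set<String> keys = new LinkedHashSet<>();
        for (Event e : batch) keys.addAll(e.entityKeys);
        Map<String, String> cache = graph.resolveAll(keys);
        for (String key : keys) cache.computeIfAbsent(key, graph::createVertex); // create if missing

        // Step 4: one vertex per event; edge targets come from the cache only.
        for (Event e : batch) {
            String eventVertex = graph.createVertex("event:" + e.id);
            for (String key : e.entityKeys) {
                graph.edges.add(new String[] {eventVertex, cache.get(key)});
            }
        }
    }

    public static void main(String[] args) {
        FakeGraph graph = new FakeGraph();
        graph.vertexByKey.put("a", "v:a0"); // an entity that already exists
        saveBatch(graph, Arrays.asList(
                new Event("1", Arrays.asList("a", "b")),
                new Event("2", Arrays.asList("b", "c"))));
        System.out.println("bulk lookups: " + graph.lookups + ", edges: " + graph.edges.size());
    }
}
```

The point of the sketch is that the index is consulted once per batch rather than once per edge; whether the per-vertex load that follows the lookup can also be batched depends on the real API and is not modelled here.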
>>>
>>> Current limitations/factors:
>>>
>>>    1. The whole process is single threaded
>>>    2. The reference index used to look up all vertexes in one go is
>>>    non-unique (making it unique will perhaps speed things up a bit)
>>>    3. Loading vertexes from the index is now done per returned item:
>>>    first getValues(iKeys,true), then graph.getVertex(id) for each result.
>>>    I'm not sure if this fetches anything or if it's only lazy loaded
>>>    when used in step 4.2 (above).
>>>    4. Only the most basic indexes are in place (adding new ones will
>>>    likely affect write performance)
>>>    5. I'm somewhat concerned with the space this takes as currently all
>>>    data is stored as strings (will be changed to binary in version 2 of
>>>    OrientDB)
>>>
>>> I'm sorry that I cannot share the whole code or the classified data, but
>>> I will share meaningful code examples once I'm further along.
>>>
>>> Please feel free to add comments or suggestions. I will update this
>>> thread each time a significant change is made or performance is
>>> increased measurably.
>>>
>>> Regards,
>>>   -Stefán
>>>
>>> --
>>>
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "OrientDB" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>>
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>
>>
>> --
>> Best regards,
>> Andrey Lomakin.
>>
>> Orient Technologies
>> the Company behind OrientDB
>>
>



-- 
Best regards,
Andrey Lomakin.

Orient Technologies
the Company behind OrientDB
