Hi Stefan,
Which index type do you use: a hash index or the default index?
Do you need range queries?
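
The reason for asking: a hash index can only answer exact-key lookups, while a tree-based index keeps its keys sorted and can also serve range scans. Below is a minimal sketch of that difference. It uses plain Java collections (HashMap and TreeMap) as stand-ins, so the class and method names are illustrative and are not OrientDB's index API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Stand-ins only: HashMap and TreeMap model the two index families,
// they are not OrientDB's index implementation.
class IndexKindSketch {

    // Exact-key lookup: works with either kind of index.
    static String lookup(Map<Long, String> index, long key) {
        return index.get(key);
    }

    // Range scan: needs sorted keys, so only the tree-based map offers it.
    static SortedMap<Long, String> range(TreeMap<Long, String> index, long from, long to) {
        return index.subMap(from, to); // from inclusive, to exclusive
    }

    public static void main(String[] args) {
        Map<Long, String> hash = new HashMap<>();
        TreeMap<Long, String> tree = new TreeMap<>();
        for (long ts = 100; ts <= 500; ts += 100) {
            hash.put(ts, "event-" + ts);
            tree.put(ts, "event-" + ts);
        }
        System.out.println(lookup(hash, 300L));      // exact match on one key
        System.out.println(range(tree, 200L, 400L)); // entries with keys 200 and 300
    }
}
```

If the lookups are always by exact key, the hash variant is usually the cheaper choice for writes; range scans need the sorted variant.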


On Fri, Mar 14, 2014 at 1:51 PM, <[email protected]> wrote:

> Hi,
>
> I have been following OrientDB for some years now and I'm finally an
> active user.
> I'm not an avid user yet, but after numerous delays I have spent the past
> few weeks focused solely on OrientDB development.
>
> The project that I'm working on is quite demanding, as it involves storing
> events that pile up at a rate of 70+ million per day, where each event
> references several entities/vertexes.
> It may even be crazy to try to store this in a graph, but that remains to
> be seen.
>
> I have been reading everything on optimization here and in the GitHub
> documentation, and I feel that it's a popular topic that not many have
> mastered.
>
> In the following days I hope to optimize my code and share the results and
> the tricks along the way, eventually contributing a small tutorial.
> My hope is to get good advice from the community during this journey and
> hopefully give something back.
>
> Current status:
>
>    - 500 events are saved in approx. 520 ms. I need to roughly double this
>    throughput to be on the safe side (2k events per second)
>    - Each event produces ~5 vertexes/edges on average (so each batch is
>    really about 2.5k records)
>    - Since 1.7-rc1-SNAPSHOT the performance seems sustainable but it
>    degraded quickly with older releases. (Thank you for linkbag!)
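
A quick back-of-the-envelope check of the figures quoted above may help frame the target. The sketch below uses only the numbers from the message (500 events per 520 ms, 70 million events per day, ~5 records per event, 2k events per second). The daily average assumes events arrive evenly over 24 hours, which real traffic rarely does; the class and method names are illustrative.

```java
class ThroughputSketch {

    // Events per second for a batch of `events` that takes `millis` ms.
    static double eventsPerSecond(int events, double millis) {
        return events * 1000.0 / millis;
    }

    // Time budget in ms for one batch of `events` at a target rate.
    static double batchBudgetMillis(int events, double targetPerSecond) {
        return events * 1000.0 / targetPerSecond;
    }

    public static void main(String[] args) {
        double current = eventsPerSecond(500, 520);     // about 961 events/s
        double dailyAverage = 70_000_000 / 86_400.0;    // about 810 events/s (even arrival assumed)
        double budget = batchBudgetMillis(500, 2_000);  // 250 ms per 500-event batch
        double recordsPerSecond = current * 5;          // about 4808 records/s at ~5 records per event

        System.out.println("current events/s:  " + current);
        System.out.println("daily average:     " + dailyAverage);
        System.out.println("batch budget (ms): " + budget);
        System.out.println("records/s:         " + recordsPerSecond);
    }
}
```

In other words, the current rate sits only a little above the average daily load, and reaching 2k events per second means bringing a 500-event batch down from about 520 ms to about 250 ms.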
>
> The current approach is:
>
>    1. Load a batch of 500 events (json) into very light container objects
>    (POJOs) (~10ms)
>    2. Begin transaction
>    3. Resolve all referenced vertexes in one go: getValues(iKeys,true) on
>    the index, then graph.getVertex( id ) for each result; store them in a
>    map/cache, or create temporary vertexes if missing (~45 ms)
>    4. Loop through all the events (~215 ms)
>       1. create temporary vertex
>       2. add required edges (uses prepared cache for vertex lookup)
>       3. save the event vertex
>    5. Commit transaction (~250 ms)
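
The shape of the steps above can be sketched roughly as follows. This is an in-memory model only: `Event`, `Store`, `findExisting` and `saveBatch` are hypothetical names invented for illustration, and the pending map merely plays the role of the open transaction. It is not the OrientDB or Blueprints API and says nothing about their actual cost.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class BatchSketch {

    // Hypothetical light container for one parsed event (step 1).
    static final class Event {
        final String id;
        final List<String> refs;
        Event(String id, List<String> refs) { this.id = id; this.refs = refs; }
    }

    // Hypothetical stand-in for the graph: vertex key -> keys it points to.
    static final class Store {
        final Map<String, Set<String>> vertices = new HashMap<>();
        int bulkLookups = 0;

        // One bulk lookup for many keys; returns only the keys that exist.
        Set<String> findExisting(Set<String> keys) {
            bulkLookups++;
            Set<String> found = new LinkedHashSet<>(keys);
            found.retainAll(vertices.keySet());
            return found;
        }
    }

    static void saveBatch(Store store, List<Event> batch) {
        // Step 3: collect every referenced key and resolve them in one go.
        Set<String> wanted = new LinkedHashSet<>();
        for (Event e : batch) wanted.addAll(e.refs);
        Set<String> cache = store.findExisting(wanted);

        // Step 2: pending writes stand in for the open transaction.
        Map<String, Set<String>> pending = new HashMap<>();
        for (String key : wanted) {
            // Referenced vertex is missing: create a temporary one.
            if (!cache.contains(key)) pending.put(key, new LinkedHashSet<>());
        }

        // Step 4: one vertex per event, with edges to its references.
        for (Event e : batch) {
            pending.put(e.id, new LinkedHashSet<>(e.refs));
        }

        // Step 5: "commit" by applying all pending writes at once.
        store.vertices.putAll(pending);
    }
}
```

The point of the pattern is that the lookup cost is paid once per batch rather than once per event, so the per-event loop only touches the prepared cache.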
>
> Current limitations/factors:
>
>    1. The whole process is single threaded
>    2. The reference index used to look up all vertexes in one go is
>    non-unique (making it unique will perhaps speed things up a bit)
>    3. Loading vertexes from the index currently takes two steps:
>    (1) getValues(iKeys,true), then (2) graph.getVertex(id) for each
>    returned item.
>    I'm not sure if this fetches anything or if it's only lazy loaded when
>    used in step 4.2 (above).
>    4. Only the most basic indexes are in place (adding new ones will likely
>    slow down writes)
>    5. I'm somewhat concerned with the space this takes as currently all
>    data is stored as strings (will be changed to binary in version 2 of
>    OrientDB)
>
> I'm sorry that I cannot share the whole code or the classified data, but I
> will share meaningful code examples once I'm further along.
>
> Please feel free to add comments or suggestions. I will update this thread
> each time a significant change is made or performance improves
> measurably.
>
> Regards,
>   -Stefán
>
> --
>
> ---
> You received this message because you are subscribed to the Google Groups
> "OrientDB" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>



-- 
Best regards,
Andrey Lomakin.

Orient Technologies
the Company behind OrientDB
