Hi,

I have been following OrientDB for some years now and I'm finally an active 
user.
I'm not an avid user yet, but after numerous delays I have spent the past 
few weeks focused solely on OrientDB development.

The project that I'm working on is quite demanding, as it involves storing 
events that pile up at a rate of 70+ million per day, where each event 
references several entities/vertexes. 
It may even be crazy to try to store this in a graph, but that remains to 
be seen.

I have been reading everything on optimization here and in the GitHub 
documentation, and I feel that it's a popular topic that not many have 
mastered.

In the following days I hope to optimize my code and share the results and 
the tricks along the way, eventually contributing a small tutorial.
My hope is to get good advice from the community during this journey and 
hopefully give something back.

Current status:

   - 500 events are saved in approx. 520 ms. I need to roughly double this, 
   to 2k events per second, to be on the safe side.
   - Each event produces ~5 vertexes/edges on average (so each batch is 
   really about 2.5k records)
   - Since 1.7-rc1-SNAPSHOT the performance seems sustainable but it 
   degraded quickly with older releases. (Thank you for linkbag!)
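
As a rough sanity check on these numbers, the arithmetic can be written out as below. This is only a sketch: it treats "70+ million daily" as exactly 70 million spread evenly over 24 hours (an assumption, since real traffic will have peaks), and the class and method names are made up for illustration.

```java
// Back-of-the-envelope throughput figures derived from the numbers above.
// Assumption: events arrive evenly over the day; real load will be burstier.
final class ThroughputEstimate {
    static final long EVENTS_PER_DAY = 70_000_000L; // lower bound ("70+ million")
    static final int BATCH_SIZE = 500;               // events per batch
    static final double BATCH_MILLIS = 520.0;        // measured time per batch
    static final int RECORDS_PER_EVENT = 5;          // ~5 vertexes/edges per event

    // Average ingest rate needed just to keep up with the daily volume.
    static double averageRequiredPerSecond() {
        return EVENTS_PER_DAY / 86_400.0;
    }

    // Rate currently achieved by a single-threaded batch.
    static double currentEventsPerSecond() {
        return BATCH_SIZE / (BATCH_MILLIS / 1000.0);
    }

    // Same rate expressed in stored records (vertexes plus edges).
    static double currentRecordsPerSecond() {
        return currentEventsPerSecond() * RECORDS_PER_EVENT;
    }

    // Factor by which the current rate must grow to reach a target rate.
    static double speedupNeeded(double targetEventsPerSecond) {
        return targetEventsPerSecond / currentEventsPerSecond();
    }

    public static void main(String[] args) {
        System.out.println(String.format("required (avg): %.0f events/s", averageRequiredPerSecond()));
        System.out.println(String.format("current:        %.0f events/s", currentEventsPerSecond()));
        System.out.println(String.format("current:        %.0f records/s", currentRecordsPerSecond()));
        System.out.println(String.format("speedup to 2k:  %.2fx", speedupNeeded(2000.0)));
    }
}
```

Under these assumptions the average requirement is roughly 810 events per second and the current rate is roughly 960, so the 2k target leaves headroom of about 2.5x over the daily average.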

The current approach is:

   1. Load a batch of 500 events (json) into very light container objects 
   (POJOs) (~10ms)
   2. Begin transaction
   3. Resolve all referenced vertexes in one go: getValues(iKeys,true) 
   -> graph.getVertex(id) -> store in a map/cache, or create temporary 
   vertexes if missing (~45 ms)
   4. Loop through all the events (~215 ms)
      1. create temporary vertex
      2. add required edges (uses prepared cache for vertex lookup)
      3. save the event vertex 
   5. Commit transaction (~250 ms)
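
The shape of steps 3 and 4 can be sketched as follows. This is not the real code and it does not use the OrientDB API: FakeGraph is a hypothetical in-memory stand-in, lookupAll only plays the role of the bulk index call, and all names are invented for illustration. Transaction begin/commit (steps 2 and 5) are left out because the stand-in has no transactions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of the batch flow; FakeGraph is NOT the OrientDB API.
final class BatchIngestSketch {

    // Light container for one parsed event (step 1).
    static final class Event {
        final String id;
        final List<String> refs; // keys of the entities this event references
        Event(String id, List<String> refs) { this.id = id; this.refs = refs; }
    }

    // Hypothetical in-memory store: vertex keys plus "from->to" edge strings.
    static final class FakeGraph {
        final Set<String> vertices = new HashSet<>();
        final List<String> edges = new ArrayList<>();
        int bulkLookups = 0;

        // One lookup for many keys, standing in for the bulk index call.
        Map<String, String> lookupAll(Set<String> keys) {
            bulkLookups++;
            Map<String, String> found = new HashMap<>();
            for (String key : keys) {
                if (vertices.contains(key)) found.put(key, key);
            }
            return found;
        }
    }

    static void ingest(FakeGraph graph, List<Event> batch) {
        // Step 3: collect every referenced key and resolve them in one go.
        Set<String> keys = new HashSet<>();
        for (Event event : batch) keys.addAll(event.refs);
        Map<String, String> cache = graph.lookupAll(keys);

        // Create the vertexes that were missing and add them to the cache.
        for (String key : keys) {
            if (!cache.containsKey(key)) {
                graph.vertices.add(key);
                cache.put(key, key);
            }
        }

        // Step 4: one vertex per event, edges resolved from the cache only.
        for (Event event : batch) {
            graph.vertices.add(event.id);
            for (String ref : event.refs) {
                graph.edges.add(event.id + "->" + cache.get(ref));
            }
        }
    }

    public static void main(String[] args) {
        FakeGraph graph = new FakeGraph();
        graph.vertices.add("user:1"); // pre-existing entity
        ingest(graph, Arrays.asList(
                new Event("e1", Arrays.asList("user:1", "host:a")),
                new Event("e2", Arrays.asList("host:a"))));
        System.out.println(graph.vertices.size() + " vertexes, " + graph.edges.size() + " edges");
    }
}
```

The point of the cache is that the loop in step 4 never goes back to the index. The step timings quoted above add up to 10 + 45 + 215 + 250 = 520 ms, so the loop and the commit together account for roughly 89% of each batch.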

Current limitations/factors:

   1. The whole process is single threaded
   2. The reference index used to look up all vertexes in one go is 
   non-unique (making it unique will perhaps speed things up a bit)
   3. Loading vertexes from the index is currently done in two steps: first 
   getValues(iKeys,true), then graph.getVertex(id) for each returned item. 
   I'm not sure if this fetches anything or if it's only lazy loaded when 
   used in step 4.2 (above).
   4. Only the most basic indexes are in place (adding new ones will likely 
   hurt write performance)
   5. I'm somewhat concerned with the space this takes as currently all 
   data is stored as strings (will be changed to binary in version 2 of 
   OrientDB)

I'm sorry that I cannot share the whole code or the classified data, but I 
will share meaningful code examples once I'm further along.

Please feel free to add comments or suggestions. I will update this thread 
each time a significant change is made or performance improves measurably.

Regards,
  -Stefán
