Hi,
I have been following OrientDB for some years now and I'm finally an active
user.
I'm not an avid user yet, but after numerous delays I have spent the past
few weeks focused solely on OrientDB development.
The project that I'm working on is quite demanding, as it involves storing
event information that piles up at 70+ million events daily, where each
event references several entities/vertexes.
It may even be crazy to try to store this in a graph, but that remains to
be seen.
I have been reading everything on optimization here and in the GitHub
documentation, and I feel that it's a popular topic that not many have
mastered.
In the following days I hope to optimize my code and share the results and
the tricks along the way, eventually contributing a small tutorial.
My hope is to get good advice from the community during this journey and
hopefully give something back.
Current status:
- 500 events are saved in approx. 520 ms, i.e. about 960 events per
second. I need to roughly double this to be on the safe side (2k events
per second).
- Each event produces ~5 vertexes/edges on average (so each batch is
really about 2.5k records)
- Since 1.7-rc1-SNAPSHOT the performance seems sustainable but it
degraded quickly with older releases. (Thank you for linkbag!)
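
To make the numbers above easy to re-check as they change, here is a small back-of-the-envelope sketch. It only uses the figures quoted in this post (70 million events per day, 500 events per 520 ms); the class and method names are made up for illustration and have nothing to do with OrientDB.

```java
// Back-of-the-envelope throughput check using the figures quoted in this post.
// ThroughputCheck and its methods are hypothetical helper names.
final class ThroughputCheck {
    static final long SECONDS_PER_DAY = 24L * 60 * 60;

    /** Average events per second needed to absorb a given daily volume. */
    static double requiredPerSecond(long eventsPerDay) {
        return (double) eventsPerDay / SECONDS_PER_DAY;
    }

    /** Events per second achieved when one batch takes batchMillis to save. */
    static double achievedPerSecond(int batchSize, double batchMillis) {
        return batchSize * 1000.0 / batchMillis;
    }

    public static void main(String[] args) {
        double required = requiredPerSecond(70_000_000L); // daily average, no peaks
        double current = achievedPerSecond(500, 520);     // single-threaded batch rate
        System.out.printf("required avg: %.0f events/s%n", required);
        System.out.printf("current:      %.0f events/s%n", current);
        System.out.printf("headroom:     %.2fx%n", current / required);
    }
}
```

The daily average works out to roughly 810 events per second, so the current rate only just covers it with no room for peaks, which is why I'm aiming for about double.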
The current approach is:
1. Load a batch of 500 events (json) into very light container objects
(POJOs) (~10ms)
2. Begin transaction
3. Resolve all referenced vertexes in one go (getValues(iKeys,true))
-> graph.getVertex( id ) -> store in a map/cache or create temporary
vertexes if missing (~45 ms)
4. Loop through all the events (~215 ms)
   1. create temporary vertex
   2. add required edges (uses prepared cache for vertex lookup)
   3. save the event vertex
5. Commit transaction (~250 ms)
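
Since I can't share the real code yet, here is a minimal sketch of the shape of steps 3 and 4: collect every referenced key in the batch, resolve them with one bulk lookup, create whatever is missing, and then wire up the events from the per-batch cache. Everything in it is a stand-in: InMemoryStore, Event, findAll, createVertex and addEdge are hypothetical names and NOT the OrientDB API, and the transaction begin/commit (steps 2 and 5) is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class BatchCacheSketch {

    /** Hypothetical light container for one parsed event (step 1). */
    static final class Event {
        final String id;
        final List<String> refKeys;
        Event(String id, List<String> refKeys) { this.id = id; this.refKeys = refKeys; }
    }

    /** Hypothetical in-memory stand-in for the database; not the OrientDB API. */
    static final class InMemoryStore {
        final Map<String, String> verticesByKey = new HashMap<>();
        final List<String> edges = new ArrayList<>();
        int bulkLookups = 0;

        /** One call for all keys, playing the role of the bulk index lookup in step 3. */
        Map<String, String> findAll(Set<String> keys) {
            bulkLookups++;
            Map<String, String> found = new HashMap<>();
            for (String k : keys) {
                String v = verticesByKey.get(k);
                if (v != null) found.put(k, v);
            }
            return found;
        }

        String createVertex(String key) {
            String v = "V(" + key + ")";
            verticesByKey.put(key, v);
            return v;
        }

        void addEdge(String from, String to) { edges.add(from + "->" + to); }
    }

    /** Steps 3 and 4: resolve every referenced vertex once, then link the events. */
    static void saveBatch(InMemoryStore store, List<Event> batch) {
        // Collect the distinct keys referenced anywhere in the batch.
        Set<String> keys = new LinkedHashSet<>();
        for (Event e : batch) keys.addAll(e.refKeys);

        // Step 3: one bulk lookup, then create the vertexes that are missing.
        Map<String, String> cache = store.findAll(keys);
        for (String k : keys) {
            if (!cache.containsKey(k)) cache.put(k, store.createVertex(k));
        }

        // Step 4: create each event vertex and add its edges from the cache only.
        for (Event e : batch) {
            String eventVertex = store.createVertex("event:" + e.id);
            for (String k : e.refKeys) store.addEdge(eventVertex, cache.get(k));
        }
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        store.createVertex("user:1"); // pretend this vertex already exists
        List<Event> batch = List.of(
            new Event("a", List.of("user:1", "host:9")),
            new Event("b", List.of("user:1")));
        saveBatch(store, batch);
        System.out.println(store.edges.size() + " edges, "
            + store.bulkLookups + " bulk lookup(s)");
    }
}
```

The point of the cache is that the number of lookups depends on the number of batches, not on the number of events or edges in them.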
Current limitations/factors:
1. The whole process is single threaded
2. The reference index used to look up all vertexes in one go is
non-unique (making it unique will perhaps speed things up a bit)
3. Loading vertexes from the index is currently done in two steps:
first getValues(iKeys,true) returns the matching items, then
graph.getVertex(id) is called for each returned item.
I'm not sure if this fetches anything or if it's only lazy loaded when
used in step 4.2 (above).
4. Only the most basic indexes are in place (adding new will likely
affect write performance)
5. I'm somewhat concerned with the space this takes as currently all
data is stored as strings (will be changed to binary in version 2 of
OrientDB)
I'm sorry that I cannot share the whole code or the classified data, but I
will share meaningful code examples once I'm further along.
Please feel free to add comments or suggestions. I will update this thread
each time a significant change is made or performance is
increased measurably.
Regards,
-Stefán