Hi Stefan,

Which indexes do you use: hash index or the default index? Do you need range queries?

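The question about range queries presumably matters because a hash-based index can only answer exact-key lookups, while a sorted, tree-based index can also return every key between two bounds. A minimal plain-Java analogy follows. It is not OrientDB code: `HashMap` and `TreeMap` only stand in for the two kinds of index, and the class name, keys, and values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Plain-Java analogy, NOT the OrientDB API:
// HashMap stands in for a hash index, TreeMap for a sorted (tree-based) index.
final class IndexKindSketch {

    // Exact-match lookup: either kind of index can answer this.
    static String lookup(Map<String, String> index, String key) {
        return index.get(key);
    }

    // Range query: relies on keys being kept in sorted order,
    // so only the tree-based variant can support it.
    static List<String> range(NavigableMap<String, String> index, String from, String to) {
        return new ArrayList<>(index.subMap(from, true, to, true).values());
    }

    public static void main(String[] args) {
        Map<String, String> hashIndex = new HashMap<>();
        NavigableMap<String, String> treeIndex = new TreeMap<>();

        // Hypothetical reference keys mapped to hypothetical record ids.
        String[][] entries = {
            {"ref-003", "#12:2"}, {"ref-001", "#12:0"},
            {"ref-004", "#12:3"}, {"ref-002", "#12:1"}
        };
        for (String[] e : entries) {
            hashIndex.put(e[0], e[1]);
            treeIndex.put(e[0], e[1]);
        }

        // Both structures answer a point lookup.
        System.out.println(lookup(hashIndex, "ref-002"));
        System.out.println(lookup(treeIndex, "ref-002"));

        // Only the sorted structure can answer "all keys from ref-002 to ref-003".
        System.out.println(range(treeIndex, "ref-002", "ref-003"));
    }
}
```

If the workload only ever resolves vertexes by exact key, as the batch lookup described in the quoted message suggests, the range capability goes unused and a hash index may be the cheaper choice. That would need to be measured.
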
On Fri, Mar 14, 2014 at 1:51 PM, <[email protected]> wrote:

> Hi,
>
> I have been following OrientDB for some years now and I'm finally an
> active user.
> I'm not an avid user yet, but after numerous delays the past few weeks
> have been focused solely on OrientDB development.
>
> The project that I'm working on is quite demanding, as it includes
> storing event information that piles up, 70+ million daily, where each
> event references several entities/vertexes.
> It may even be crazy to try to store this in a graph, but that remains
> to be seen.
>
> I have been reading everything on optimization here and in the github
> documentation and feel that it's a popular topic that not many have
> mastered.
>
> In the following days I hope to optimize my code and share the results
> and the tricks along the way, eventually contributing a small tutorial.
> My hope is to get good advice from the community during this journey
> and hopefully give something back.
>
> Current status:
>
>    - 500 events are saved in approx. 520 ms. (I need to double this to
>      be on the safe side (2k events per second))
>    - Each event produces ~5 vertexes/edges on average (so each batch is
>      really about 2.5k records)
>    - Since 1.7-rc1-SNAPSHOT the performance seems sustainable, but it
>      degraded quickly with older releases. (Thank you for linkbag!)
>
> The current approach is:
>
>    1. Load a batch of 500 events (json) into very light container
>       objects (POJOs) (~10 ms)
>    2. Begin transaction
>    3. Resolve all referenced vertexes in one go (getValues(iKeys,true))
>       -> graph.getVertex( id ) -> store in a map/cache or create
>       temporary vertexes if missing (~45 ms)
>    4. Loop through all the events (~215 ms)
>       1. create temporary vertex
>       2. add required edges (uses prepared cache for vertex lookup)
>       3. save the event vertex
>    5. Commit transaction (~250 ms)
>
> Current limitations/factors:
>
>    1. The whole process is single threaded
>    2. The reference index used to look up all vertexes in one go is
>       non-unique (making it unique will perhaps speed things up a bit)
>    3. Loading vertexes from the index is now done for each returned
>       item:
>       1 -> getValues(iKeys,true)
>       2 -> graph.getVertex(id)
>       I'm not sure if this fetches anything or if it's only lazy loaded
>       when used in step 4.2 (above).
>    4. Only the most basic indexes are in place (adding new ones will
>       likely affect write performance)
>    5. I'm somewhat concerned with the space this takes, as currently
>       all data is stored as strings (will be changed to binary in
>       version 2 of OrientDB)
>
> I'm sorry that I cannot share the whole code or the classified data,
> but I will share meaningful code examples once I'm further along.
>
> Please feel free to add comments or suggestions. I will update this
> thread each time a significant change is made or performance is
> increased measurably.
>
> Regards,
> -Stefán
>
> --
>
> ---
> You received this message because you are subscribed to the Google
> Groups "OrientDB" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>

--
Best regards,
Andrey Lomakin.

Orient Technologies
the Company behind OrientDB
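
For reference, the timings quoted in the message can be turned into throughput figures with a few lines of arithmetic. The sketch below uses only the numbers stated in the message (500 events per batch, roughly 10 + 45 + 215 + 250 ms per phase, about 5 records per event, a 2k events per second target). The class and method names are hypothetical, and the per-phase times are the poster's approximations, not measurements made here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Back-of-the-envelope arithmetic on the figures quoted in the message.
// Names are illustrative only; nothing here touches OrientDB.
final class BatchThroughput {

    // Events per second achieved when `events` are saved in `batchMillis`.
    static double eventsPerSecond(int events, double batchMillis) {
        return events * 1000.0 / batchMillis;
    }

    // Time budget per batch (ms) needed to reach a target rate.
    static double batchBudgetMillis(int events, double targetEventsPerSecond) {
        return events * 1000.0 / targetEventsPerSecond;
    }

    public static void main(String[] args) {
        int eventsPerBatch = 500;      // from the message
        int recordsPerEvent = 5;       // "~5 vertexes/edges on average"
        double targetRate = 2000.0;    // "2k events per second"

        // Approximate phase timings from the message, in milliseconds.
        Map<String, Double> phases = new LinkedHashMap<>();
        phases.put("load JSON into POJOs", 10.0);
        phases.put("resolve referenced vertexes", 45.0);
        phases.put("loop over events", 215.0);
        phases.put("commit transaction", 250.0);

        double total = 0.0;
        for (double ms : phases.values()) {
            total += ms;
        }

        double rate = eventsPerSecond(eventsPerBatch, total);
        System.out.printf("batch time: %.0f ms%n", total);
        System.out.printf("current rate: %.0f events/s (%.0f records/s)%n",
                rate, rate * recordsPerEvent);
        System.out.printf("budget for target: %.0f ms per batch%n",
                batchBudgetMillis(eventsPerBatch, targetRate));

        // Share of the batch time spent in each phase.
        for (Map.Entry<String, Double> e : phases.entrySet()) {
            System.out.printf("%-30s %5.1f%%%n", e.getKey(), 100.0 * e.getValue() / total);
        }
    }
}
```

On these figures the commit alone takes about as long as the whole batch would be allowed to take at the target rate, so the commit and the event loop are the phases where savings would have to come from.
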
