Hi,
No, what you are doing is fine.

On Fri, Mar 14, 2014 at 3:38 PM, <[email protected]> wrote:

> Hi,
>
> I use a hash index for entity references as I do not need range queries and
> they are a lot faster, right?
>
> Currently I do the following:
>
>    1. Fetch all entities that are involved: .getValues(<iKeys>,true)
>    2. Fetch each vertex found with graph.getVertex(<id>)
>    3. Store all the existing vertexes in a Map/cache (or create temporary
>    vertexes if missing)
>    4. Use the cache when I need to create an edge on the event.vertex
>
> Is there a faster way to get to all the vertexes than to repeat step 2 for
> all found entries in the index?
>
> Regards,
> -Stefán
>
> On Friday, 14 March 2014 13:23:59 UTC, Andrey Lomakin wrote:
>
>> Hi Stefan,
>> Which index do you use: hash index or default index?
>> Do you need range queries?
>>
>> On Fri, Mar 14, 2014 at 1:51 PM, <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I have been following OrientDB for some years now and I'm finally an
>>> active user.
>>> I'm not an avid user yet, but after numerous delays the past few weeks
>>> have been focused solely on OrientDB development.
>>>
>>> The project that I'm working on is quite demanding as it includes
>>> storing event information that piles up, 70+ million daily, where each
>>> event references several entities/vertexes.
>>> It may even be crazy to try to store this in a graph but that remains to
>>> be seen.
>>>
>>> I have been reading everything on optimization here and in the github
>>> documentation and feel that it's a popular topic that not many have
>>> mastered.
>>>
>>> In the following days I hope to optimize my code and share the results
>>> and the tricks along the way, eventually contributing a small tutorial.
>>> My hope is to get good advice from the community during this journey and
>>> hopefully give something back.
>>>
>>> Current status:
>>>
>>>    - 500 events are saved in approx. 520ms. (I need to double this to
>>>    be on the safe side (2k events per second))
>>>    - Each event produces ~5 vertexes/edges on average (so each batch is
>>>    really about 2.5k records)
>>>    - Since 1.7-rc1-SNAPSHOT the performance seems sustainable but it
>>>    degraded quickly with older releases. (Thank you for linkbag!)
>>>
>>> The current approach is:
>>>
>>>    1. Load a batch of 500 events (json) into very light container
>>>    objects (POJOs) (~10ms)
>>>    2. Begin transaction
>>>    3. Resolve all referenced vertexes in one go (getValues(iKeys,true))
>>>    -> graph.getVertex( id ) -> store in a map/cache or create temporary
>>>    vertexes if missing (~45 ms)
>>>    4. Loop through all the events (~215 ms)
>>>       1. create temporary vertex
>>>       2. add required edges (uses prepared cache for vertex lookup)
>>>       3. save the event vertex
>>>    5. Commit transaction (~250 ms)
>>>
>>> Current limitations/factors:
>>>
>>>    1. The whole process is single threaded
>>>    2. The reference index used to look up all vertexes in one go is
>>>    non-unique (making it unique will perhaps speed things up a bit)
>>>    3. Loading vertexes from the index is now done for each returned item:
>>>    1-> (getValues(iKeys,true)) 2-> graph.getVertex(id)
>>>    I'm not sure if this fetches anything or if it's only lazy loaded
>>>    when used in step 4.2 (above).
>>>    4. Only the most basic indexes are in place (adding new ones will
>>>    likely affect write performance)
>>>    5. I'm somewhat concerned with the space this takes as currently all
>>>    data is stored as strings (will be changed to binary in version 2 of
>>>    OrientDB)
>>>
>>> I'm sorry that I cannot share the whole code or the classified data but
>>> I will share meaningful code examples once I'm further along.
>>>
>>> Please feel free to add comments or suggestions. I will update this
>>> thread each time a significant change is made or performance is
>>> increased measurably.
>>>
>>> Regards,
>>> -Stefán
>>>
>>> --
>>>
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "OrientDB" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>>
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>> --
>> Best regards,
>> Andrey Lomakin.
>>
>> Orient Technologies
>> the Company behind OrientDB
>>

--
Best regards,
Andrey Lomakin.

Orient Technologies
the Company behind OrientDB
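
For readers following the thread, the "resolve everything in one go, then cache" step can be sketched roughly as below. This is only an illustration of the pattern in plain Java, not OrientDB code: `VertexCacheSketch`, `resolveAll`, `bulkLookup` and `createMissing` are hypothetical names, and the bulk lookup is assumed to stand in for the index call plus the per-id vertex fetch described above.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: none of these names come from the OrientDB API.
class VertexCacheSketch {

    /**
     * Resolves all keys with a single bulk lookup, then creates a
     * placeholder for every key the lookup did not return.
     */
    static <K, V> Map<K, V> resolveAll(Collection<K> keys,
                                       Function<Collection<K>, Map<K, V>> bulkLookup,
                                       Function<K, V> createMissing) {
        // One round trip for all keys (assumed to wrap the index lookup).
        Map<K, V> cache = new HashMap<>(bulkLookup.apply(keys));
        // Fill the gaps so later edge creation never has to query again.
        for (K key : keys) {
            cache.computeIfAbsent(key, createMissing);
        }
        return cache;
    }

    public static void main(String[] args) {
        // Pretend only "a" already exists in the store.
        Map<String, String> existing = new HashMap<>();
        existing.put("a", "vertex:a");

        List<String> referenced = Arrays.asList("a", "b", "c");
        Map<String, String> cache =
                resolveAll(referenced, keys -> existing, key -> "temp:" + key);

        System.out.println(cache.get("a"));
        System.out.println(cache.get("b"));
    }
}
```

In the event loop, each edge would then be created from `cache.get(key)` instead of a fresh lookup, which is the role the prepared cache plays in step 4.2 of the approach quoted above.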
