Hi, I use a hash index for entity references, as I do not need range queries and hash indexes are a lot faster, right?

Currently I do the following:

1. Fetch all entities that are involved: .getValues(<iKeys>,true)
2. Fetch each vertex found with graph.getVertex(<id>)
3. Store all the existing vertexes in a Map/cache (or create temporary vertexes if missing)
4. Use the cache when I need to create an edge on the event.vertex

Is there a faster way to get to all the vertexes than to repeat step 2 for every entry found in the index?

Regards,
-Stefán

On Friday, 14 March 2014 13:23:59 UTC, Andrey Lomakin wrote:
>
> Hi Stefan,
> Which indexes do you use: hash index or default index?
> Do you need range queries?
>
>
> On Fri, Mar 14, 2014 at 1:51 PM, <[email protected]> wrote:
>
>> Hi,
>>
>> I have been following OrientDB for some years now and I'm finally an
>> active user.
>> I'm not an avid user yet, but after numerous delays the past few weeks
>> have solely been focused on OrientDB development.
>>
>> The project that I'm working on is quite demanding as it includes storing
>> event information that piles up, 70+ million daily, where each event
>> references several entities/vertexes.
>> It may even be crazy to try to store this in a graph, but that remains to
>> be seen.
>>
>> I have been reading everything on optimization here and in the GitHub
>> documentation and feel that it's a popular topic that not many have
>> mastered.
>>
>> In the following days I hope to optimize my code and share the results
>> and the tricks along the way, eventually contributing a small tutorial.
>> My hope is to get good advice from the community during this journey and
>> hopefully give something back.
>>
>> Current status:
>>
>>    - 500 events are saved in approx. 520 ms. (I need to double this to be
>>    on the safe side (2k events per second))
>>    - Each event produces ~5 vertexes/edges on average (so each batch is
>>    really about 2.5k records)
>>    - Since 1.7-rc1-SNAPSHOT the performance seems sustainable, but it
>>    degraded quickly with older releases. (Thank you for linkbag!)
>>
>> The current approach is:
>>
>>    1. Load a batch of 500 events (JSON) into very light container
>>    objects (POJOs) (~10 ms)
>>    2. Begin transaction
>>    3. Resolve all referenced vertexes in one go (getValues(iKeys,true))
>>    -> graph.getVertex( id ) -> store in a map/cache or create temporary
>>    vertexes if missing (~45 ms)
>>    4. Loop through all the events (~215 ms)
>>       1. create temporary vertex
>>       2. add required edges (uses prepared cache for vertex lookup)
>>       3. save the event vertex
>>    5. Commit transaction (~250 ms)
>>
>> Current limitations/factors:
>>
>>    1. The whole process is single threaded
>>    2. The reference index used to look up all vertexes in one go is
>>    non-unique (making it unique will perhaps speed things up a bit)
>>    3. Loading vertexes from the index is now done for each returned item:
>>       1-> (getValues(iKeys,true))
>>       2-> graph.getVertex(id)
>>    I'm not sure if this fetches anything or if it's only lazy loaded
>>    when used in step 4.2 (above).
>>    4. Only the most basic indexes are in place (adding new ones will likely
>>    affect write performance)
>>    5. I'm somewhat concerned with the space this takes, as currently all
>>    data is stored as strings (will be changed to binary in version 2 of
>>    OrientDB)
>>
>> I'm sorry that I cannot share the whole code or the classified data, but
>> I will share meaningful code examples once I'm further along.
>>
>> Please feel free to add comments or suggestions. I will update this
>> thread each time a significant change is made or performance is
>> increased measurably.
>>
>> Regards,
>> -Stefán
>>
>> --
>>
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "OrientDB" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
>
> --
> Best regards,
> Andrey Lomakin.
>
> Orient Technologies
> the Company behind OrientDB
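
The lookup pattern described at the top of this thread (resolve all referenced keys once, cache the results, create temporary vertexes for misses, then build edges from the cache) could be sketched roughly as follows. This is only an illustration in Python with plain in-memory stand-ins: `Vertex`, `resolve_vertices`, `build_batch`, and the dict that plays the role of the index are all hypothetical names, not OrientDB or Blueprints API, and the sketch makes no claim about what `getValues` or `graph.getVertex` do internally.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """Hypothetical stand-in for a graph vertex; not an OrientDB class."""
    key: str
    persisted: bool = True
    edges: list = field(default_factory=list)


def resolve_vertices(index, keys):
    """Resolve every referenced key once; create temporary vertexes for misses.

    `index` is a plain dict standing in for the entity-reference index.
    """
    cache = {}
    for key in set(keys):  # de-duplicate so each key is looked up only once
        vertex = index.get(key)
        if vertex is None:
            # Missing entity: temporary vertex, to be saved with the batch.
            vertex = Vertex(key, persisted=False)
        cache[key] = vertex
    return cache


def build_batch(events, index):
    """Create one event vertex per event and link it using only the cache."""
    all_keys = [ref for event in events for ref in event["refs"]]
    cache = resolve_vertices(index, all_keys)  # one lookup pass per batch
    event_vertices = []
    for event in events:
        event_vertex = Vertex(event["id"], persisted=False)
        for ref in event["refs"]:
            event_vertex.edges.append(cache[ref])  # no per-edge index lookup
        event_vertices.append(event_vertex)
    return event_vertices, cache


# Made-up sample data: two indexed entities, one reference that is missing.
index = {"user:1": Vertex("user:1"), "host:a": Vertex("host:a")}
events = [
    {"id": "e1", "refs": ["user:1", "host:a"]},
    {"id": "e2", "refs": ["user:1", "host:b"]},  # host:b is not indexed yet
]
event_vertices, cache = build_batch(events, index)
```

In this sketch, both events end up pointing at the same cached `user:1` object, and `host:b` is created once as a temporary vertex. Whether the real per-vertex fetch can be avoided depends on what the index returns, which is the open question in the thread.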

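
To put the timings quoted in the original post side by side, here is a small back-of-the-envelope calculation in Python. It uses only the figures stated in the thread (70 million events per day, 500 events per 520 ms batch, a 2k events per second target, and the per-step timings); the variable names are made up for illustration, and 70 million is taken as a lower bound since the post says "70+".

```python
SECONDS_PER_DAY = 24 * 60 * 60

daily_events = 70_000_000   # "70+ million daily", so a lower bound
batch_size = 500            # events per batch
batch_ms = 520              # measured time per batch
target_rate = 2_000         # desired events per second

# Per-step timings reported for one batch, in milliseconds.
step_ms = {"load": 10, "resolve": 45, "build": 215, "commit": 250}

avg_required = daily_events / SECONDS_PER_DAY      # about 810 events/s
current_rate = batch_size / (batch_ms / 1000)      # about 962 events/s
max_batch_ms = batch_size / target_rate * 1000     # 250 ms per batch

# Fraction of the batch time spent in each step.
share = {step: ms / batch_ms for step, ms in step_ms.items()}

print(f"average load: {avg_required:.0f} events/s")
print(f"current rate: {current_rate:.0f} events/s")
print(f"batch budget at target: {max_batch_ms:.0f} ms")
for step, fraction in share.items():
    print(f"{step:>8}: {fraction:.1%}")
```

By these numbers the current rate is already above the daily average, but reaching 2k events per second means a 500-event batch has to fit in 250 ms. The resolve step is under 9% of the batch time, while the event loop and the commit together account for roughly 89%, so a faster index lookup alone would save at most about 45 ms per batch.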