Re: [orientdb] Optimizing for maximum performance

stefan Fri, 14 Mar 2014 06:39:09 -0700

Hi,

I use a hash index for entity references, as I do not need range queries and 
hash indexes are a lot faster, right?


Currently I do the following:

   1. Fetch all entities that are involved:  .getValues(<iKeys>,true) 
   2. Fetch each vertex found with graph.getVertex(<id>) 
   3. Store all the existing vertexes in a map/cache (or create temporary 
   vertexes if missing)
   4. Use the cache when I need to create an edge on the event.vertex

Is there a faster way to get to all the vertexes than to repeat step 2 for 
all found entries in the index?
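
To make the question concrete, here is a minimal sketch of steps 1-4 in plain 
Java. It does not use the OrientDB API: indexLookup and loadVertex are 
hypothetical stand-ins for the index call and graph.getVertex(<id>), Vertex is 
a placeholder class, and the sketch assumes one record id per key (which a 
non-unique index does not guarantee). It only shows the shape of the cache: 
keys are de-duplicated first, so each vertex is loaded at most once per batch.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

class VertexCacheSketch {

    /** Placeholder for a graph vertex; NOT an OrientDB class. */
    static final class Vertex {
        final String key;
        final boolean temporary; // true if the index had no entry for the key

        Vertex(String key, boolean temporary) {
            this.key = key;
            this.temporary = temporary;
        }
    }

    /**
     * Builds the per-batch cache described in steps 1-3.
     *
     * @param keys        entity keys referenced by the batch (may repeat)
     * @param indexLookup hypothetical stand-in for the bulk index call (step 1)
     * @param loadVertex  hypothetical stand-in for graph.getVertex(id) (step 2)
     */
    static Map<String, Vertex> resolve(
            Collection<String> keys,
            Function<Collection<String>, Map<String, Object>> indexLookup,
            Function<Object, Vertex> loadVertex) {
        // De-duplicate so that a key referenced by many events is resolved once.
        Set<String> uniqueKeys = new LinkedHashSet<>(keys);

        // Step 1: one bulk lookup for the whole batch.
        Map<String, Object> idsByKey = indexLookup.apply(uniqueKeys);

        // Steps 2 and 3: load each found vertex, or create a temporary one.
        Map<String, Vertex> cache = new HashMap<>();
        for (String key : uniqueKeys) {
            Object id = idsByKey.get(key);
            cache.put(key, id != null ? loadVertex.apply(id) : new Vertex(key, true));
        }
        return cache;
    }
}
```

Step 4 then becomes a plain cache.get(key) while creating edges. Whether the 
per-id load in step 2 can itself be batched is exactly what I am asking about.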

Regards,
  -Stefán

On Friday, 14 March 2014 13:23:59 UTC, Andrey Lomakin wrote:
>
> Hi Stefan,
> Which indexes do you use: hash index or default index?
> Do you need range queries?
>
>
> On Fri, Mar 14, 2014 at 1:51 PM, <[email protected]> wrote:
>
>> Hi,
>>
>> I have been following OrientDB for some years now and I'm finally an 
>> active user.
>> I'm not an avid user yet but, after numerous delays, the past few weeks 
>> have been focused solely on OrientDB development.
>>
>> The project that I'm working on is quite demanding as it includes storing 
>> event information that piles up, 70+ million daily, where each event 
>> references several entities/vertexes. 
>> It may even be crazy to try to store this in a graph but that remains to 
>> be seen.
>>
>> I have been reading everything on optimization here and in the GitHub 
>> documentation and feel that it's a popular topic that not many have 
>> mastered.
>>
>> In the following days I hope to optimize my code and share the results 
>> and the tricks along the way, eventually contributing a small tutorial.
>> My hope is to get good advice from the community during this journey and 
>> hopefully give something back.
>>
>> Current status:
>>
>>    - 500 events are saved in approx. 520 ms. I need to double this 
>>    throughput to be on the safe side (2k events per second).
>>    - Each event produces ~5 vertexes/edges on average (so each batch is 
>>    really about 2.5k records) 
>>    - Since 1.7-rc1-SNAPSHOT the performance seems sustainable but it 
>>    degraded quickly with older releases. (Thank you for linkbag!)
>>
>> The current approach is:
>>
>>    1. Load a batch of 500 events (json) into very light container 
>>    objects (POJOs) (~10ms)
>>    2. Begin transaction
>>    3. Resolve all referenced vertexes in one go (getValues(iKeys,true)) 
>>    -> graph.getVertex( id ) -> store in a map/cache or create temporary 
>>    vertexes if missing (~45 ms) 
>>    4. Loop through all the events (~215 ms)
>>       1. create temporary vertex
>>       2. add required edges (uses prepared cache for vertex lookup) 
>>       3. save the event vertex 
>>    5. Commit transaction (~250 ms)
>>
>> Current limitations/factors:
>>
>>    1. The whole process is single threaded
>>    2. The reference index used to look up all vertexes in one go is 
>>    non-unique (making it unique will perhaps speed things up a bit) 
>>    3. Loading vertexes from the index is now done for each returned item 
>>    in two steps: 1) getValues(iKeys,true), 2) graph.getVertex(id). 
>>    I'm not sure if this fetches anything or if it's only lazy loaded 
>>    when used in step 4.2 (above). 
>>    4. Only the most basic indexes are in place (adding new ones will 
>>    likely affect write performance)
>>    5. I'm somewhat concerned about the space this takes, as currently all 
>>    data is stored as strings (will be changed to binary in version 2 of 
>>    OrientDB) 
>>
>> I'm sorry that I cannot share the whole code or the classified data but 
>> I will share meaningful code examples once I'm further along.
>>
>> Please feel free to add comments or suggestions. I will update this 
>> thread each time a significant change is made or performance is 
>> increased measurably.
>>
>> Regards,
>>   -Stefán
>>
>> -- 
>>
>> --- 
>> You received this message because you are subscribed to the Google Groups 
>> "OrientDB" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected] <javascript:>.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
>
> -- 
> Best regards,
> Andrey Lomakin.
>
> Orient Technologies
> the Company behind OrientDB
>
> 
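
For reference, the figures quoted in the original post above can be checked 
with a little arithmetic. The sketch below is only that: the class and method 
names are made up for illustration, and the inputs are the numbers from the 
post (step timings of ~10, ~45, ~215 and ~250 ms, batches of 500 events, and 
70 million events per day).

```java
class ThroughputSketch {

    /** Sums the per-step timings of one batch, in milliseconds. */
    static int totalMillis(int... stepMillis) {
        int total = 0;
        for (int ms : stepMillis) {
            total += ms;
        }
        return total;
    }

    /** Events per second achieved when batchSize events take batchMillis. */
    static double eventsPerSecond(int batchSize, int batchMillis) {
        return batchSize * 1000.0 / batchMillis;
    }

    /** Average events per second needed to keep up with a daily volume. */
    static double averagePerSecond(long eventsPerDay) {
        return eventsPerDay / 86_400.0; // seconds in a day
    }

    public static void main(String[] args) {
        int batchMillis = totalMillis(10, 45, 215, 250);
        System.out.printf("batch time: %d ms%n", batchMillis);
        System.out.printf("current rate: %.0f events/s%n",
                eventsPerSecond(500, batchMillis));
        System.out.printf("average rate needed: %.0f events/s%n",
                averagePerSecond(70_000_000L));
    }
}
```

The four steps add up to the reported 520 ms, which is roughly 960 events per 
second against an average of about 810 per second for 70 million events a day. 
Doubling the current rate gives about 1.9k per second, in line with the 2k 
target. The event loop and the commit together account for 465 of the 520 ms, 
so those two steps are where most of the time goes.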

