Re: [orientdb] Titan-Cassandra Combination vs OrientDB

LSP Mon, 13 Jan 2014 07:50:06 -0800

Hello Andrey,

Thanks for the detailed explanation. I am currently building a prototype 
with OrientDB. I expect to run into challenges along the way with modeling 
techniques and with setting up the relationships my application requires. 
I will try to share any insights I gain with the community.



Thanks
LSP

On Monday, January 6, 2014 2:51:49 AM UTC-6, Andrey Lomakin wrote:
>
> Hi,
> Well, I will try to answer your question.
>
> There are several primary differences between Cassandra and OrientDB.
>
> 1. Primary key handling.
>
> Cassandra:
>
> Cassandra is designed for high write performance, so it uses LSM trees 
> as the underlying data structure for the primary key index.
> This means it achieves high write performance by mitigating random I/O 
> overhead.
> But the trade-offs of this performance gain include:
> 1. Memory consumption.
> 2. Disk space consumption.
> 3. Read performance somewhat slower than with the B-tree index typical 
> of a DBMS.
>
> You can think of an LSM tree as several sorted arrays that are stored on 
> disk and merged by a background process.
> So to retrieve an entry you have to look through all of those arrays.
> As a result the lookup complexity is N * log(M), where N is the number of 
> sorted arrays and M is the number of records per array.
> To avoid the N multiplier Cassandra uses Bloom filters. A Bloom filter 
> tells you, with some probability, whether your key is contained in a 
> sorted array, so you either search that array or skip it.
> If I remember correctly they use counting filters, so they require at 
> least 3 bits of additional memory per entry. For 100 billion entries that 
> is 300 billion bits, or about 37.5 GB of theoretical overhead (without 
> implementation overhead). 
> If you are going to update your records, you still have to look through 
> several sorted arrays.
>
> So for a Cassandra primary key lookup the best-case complexity is log(M) 
> and the worst case is N * log(M).
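
The lookup path described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Cassandra's implementation: the names BloomFilter, SortedRun and lookup are made up for the example, the filter is a plain (non-counting) Bloom filter, and the sorted arrays live in memory rather than on disk.

```python
import bisect
import hashlib


class BloomFilter:
    """Plain (non-counting) Bloom filter over string keys."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, key):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


class SortedRun:
    """One immutable sorted array of (key, value) pairs plus its filter."""

    def __init__(self, items):
        self.entries = sorted(items.items())
        self.keys = [k for k, _ in self.entries]
        self.bloom = BloomFilter()
        for k in self.keys:
            self.bloom.add(k)

    def get(self, key):
        # Binary search: O(log M) for M entries in this run.
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.entries[i][1]
        return None


def lookup(runs, key):
    """Search runs from newest to oldest; return (value, runs_searched)."""
    searched = 0
    for run in runs:
        if not run.bloom.might_contain(key):
            continue  # the filter says the key is definitely not in this run
        searched += 1
        value = run.get(key)
        if value is not None:
            return value, searched
    return None, searched


runs = [
    SortedRun({"k3": "new"}),             # newest run
    SortedRun({"k1": "a", "k3": "old"}),
    SortedRun({"k2": "b"}),               # oldest run
]
```

Calling lookup(runs, "k3") finds the key in the newest run after a single binary search, which is the log(M) best case. A key that lives only in the oldest run, or is absent, costs one binary search for every run whose filter reports a possible match, up to the N * log(M) worst case.
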
>
> OrientDB:
>
> OrientDB uses a list-based data structure in which the list index serves 
> as the primary key.
> As a result lookup complexity is always O(1). When you create records, the 
> I/O operations are mostly append-only, so write speed does not degrade.
> But record updates use random I/O, so they are slower than record creation 
> operations.
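
A toy model of this idea, assuming nothing about OrientDB internals: the Cluster class below is hypothetical and keeps records in an in-memory Python list, where a record's position in the list plays the role of its primary key.

```python
class Cluster:
    """Toy record store: a record's primary key is its position in a list."""

    def __init__(self):
        self._records = []

    def create(self, record):
        # Append-only write; the new position becomes the record's key.
        self._records.append(record)
        return len(self._records) - 1

    def read(self, position):
        # Direct index access: no search is involved.
        return self._records[position]

    def update(self, position, record):
        # Overwrites an arbitrary slot, the analogue of a random write.
        self._records[position] = record


cluster = Cluster()
first = cluster.create({"name": "sample-1"})
second = cluster.create({"name": "sample-2"})
cluster.update(first, {"name": "sample-1", "checked": True})
```

Reads go straight to a slot by position, creates only ever touch the end of the list, and updates are the one operation that lands at an arbitrary position.
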
>
> To avoid random I/O overhead during updates we are considering a new 
> cluster implementation. It uses a much simpler data structure than the 
> current one (which means faster) and an append-only approach - 
> https://github.com/orientechnologies/orientdb/issues/1600 .
>
> 2. Secondary key handling.
>
> Cassandra:
>
> As far as I know, Cassandra secondary indexes are limited. You can use 
> hash indexes, and as I remember they are meant for data with low 
> cardinality, such as color names, sex and so on (but you should recheck 
> this, I am not a Cassandra expert).
>
> OrientDB:
>
> OrientDB has 2 types of indexes: hash index and sb-tree (B-tree based). 
> The first guarantees at most 1 I/O operation for reads and at most 3 I/O 
> operations for writes; the second has log(M) complexity. 
> In OrientDB you can index almost everything. For example, you can index 
> an embedded map by value and then run containsValue SQL queries that use 
> the index.
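
To illustrate what indexing an embedded map by value buys you, here is a small sketch in plain Python. It does not use OrientDB or its SQL; the records, the "tags" field and both helper functions are invented for the example, and a dict stands in for a hash index.

```python
from collections import defaultdict

# Hypothetical records, each with an embedded map stored under "tags".
records = [
    {"name": "r0", "tags": {"color": "red", "size": "L"}},
    {"name": "r1", "tags": {"color": "blue"}},
    {"name": "r2", "tags": {"trim": "red"}},
]

# Build the index: every value in an embedded map points to record positions.
by_value = defaultdict(set)
for position, record in enumerate(records):
    for value in record["tags"].values():
        by_value[value].add(position)


def contains_value_scan(value):
    """Without an index: inspect the embedded map of every record."""
    return [i for i, r in enumerate(records) if value in r["tags"].values()]


def contains_value_indexed(value):
    """With the index: one hash lookup returns the matching positions."""
    return sorted(by_value.get(value, ()))
```

Both functions return the same positions for a given value; the indexed version avoids touching records that cannot match.
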
>
> But OrientDB indexes suffer from random I/O, which means that you will 
> probably need more nodes in the cluster for big data.
> We have several issues open to fix this disadvantage - 
> https://github.com/orientechnologies/orientdb/issues/1756 
> https://github.com/orientechnologies/orientdb/issues/1757
>
> 3. Server cluster support.
>
> The primary difference is in the scalability options. OrientDB does not 
> use a DHT in its cluster, which means that you have to migrate your data 
> from one cluster to a bigger one manually.
> But records can be distributed between nodes using different strategies; 
> round robin is the default one. 
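
Round-robin placement itself is simple to picture. The sketch below is generic Python, not OrientDB code; the function name and the node names are assumptions made for the example.

```python
from itertools import cycle


def assign_round_robin(record_ids, nodes):
    """Hand each new record to the next node in turn."""
    node_cycle = cycle(nodes)
    placement = {node: [] for node in nodes}
    for record_id in record_ids:
        placement[next(node_cycle)].append(record_id)
    return placement


placement = assign_round_robin(range(7), ["node-a", "node-b", "node-c"])
```

With seven records and three nodes, the first node receives three records and the other two receive two each. Unlike a DHT, nothing in this scheme maps a key back to a node, which is why growing the cluster requires moving data explicitly.
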
>
> 4. Model.
> The OrientDB model is more powerful than the Blueprints model (but Titan 
> may provide additional extensions). We support one-to-many relations using 
> not only edges but also the LINKLIST, LINKSET and LINKMAP data structures.
> OrientDB also supports embedded documents and the multi-value property 
> types List, Set and Map, and the OrientDB SQL language has operators for 
> all of these collections.
>
> I hope this information helps. 
>
> But please note that we are not Cassandra or Titan experts. It would be 
> better to ask questions about concrete OrientDB features so that you can 
> compare both implementations.
>
> On Fri, Jan 3, 2014 at 12:04 AM, LSP wrote:
>
>> Hi All, 
>>
>> We are currently in the process of building a statistical analysis 
>> system. As a part of technology evaluation and due diligence we are 
>> comparing the Titan-Cassandra combination with OrientDB.
>>
>> There was a topic in these forums that compared Cassandra and OrientDB 
>> (last update in October 2012). The comparison was quite succinct within the 
>> applicable context and the points therein have been factored in as a part 
>> of the due diligence. The biggest difference is obviously the fact that the 
>> comparison was between a columnar DB and a graph DB. The inclusion of Titan 
>> in this discussion makes it an apples-to-apples comparison. Besides, a lot 
>> has changed between October 2012 and January 2014 for OrientDB (Hazelcast 
>> support, multi-master support etc.).
>>
>> Following is a high level summary of the scale requirements and internal 
>> design consensus we have:
>>
>>    1. 500-750 billion live samples per year (at this point in time we do 
>>    not have visibility into whether all of these will necessarily 
>>    translate into vertices per se). 
>>    2. A federated model/system is acceptable.
>>    3. Over and above the 500-750 billion live samples, the application 
>>    will have a couple of million records (just in case an additional drop 
>>    created chaos in the ocean :) )
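
For a sense of scale, the yearly volume can be converted to an average ingest rate. This is simple arithmetic under the assumption that samples arrive evenly across a 365-day year, which real traffic rarely does; the function name is made up for the example.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000


def average_rate_per_second(samples_per_year):
    """Average samples per second if arrivals were spread evenly."""
    return samples_per_year / SECONDS_PER_YEAR


low = average_rate_per_second(500e9)
high = average_rate_per_second(750e9)
```

That works out to roughly 16,000 to 24,000 samples per second on average, before accounting for any peaks.
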
>>    
>>
>> Given that we can store JSON data in Cassandra (with the knowledge that 
>> marshalling and unmarshalling will induce latency) and Titan can provide 
>> graph relationships, what, in the estimation of this community, tips the 
>> scales in favor of OrientDB?
>>
>> At the time of this writing, I have only managed to scratch the surface 
>> and I am relatively new to NoSQL and Big Data systems in general. So, if 
>> the question lacks clarity/depth, please let me know and I will share any 
>> additional information required.
>>
>> Thanks
>> LSP
>> PS - Wishing you all a happy new year and a great 2014.
>>
>> -- 
>>  
>> --- 
>> You received this message because you are subscribed to the Google Groups 
>> "OrientDB" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>
>
>
> -- 
> Best regards,
> Andrey Lomakin.
>
> Orient Technologies
> the Company behind OrientDB
>
>  

