Hello Andrey,

Thanks for the detailed explanation. I am currently building a prototype with OrientDB. I expect that along the way I will run into challenges with modeling techniques and with setting up the relationships my application requires. I will share any insights I gain with the community.
Thanks
LSP

On Monday, January 6, 2014 2:51:49 AM UTC-6, Andrey Lomakin wrote:
>
> Hi,
> Well, I will try to answer your question.
>
> There are several primary differences between Cassandra and OrientDB.
>
> 1. Primary key handling.
>
> Cassandra:
>
> Cassandra is designed for high write performance, so it uses LSM trees as
> the underlying data structure for the primary key index.
> This means it achieves high write performance by mitigating random I/O
> overhead.
> But the trade-offs of this performance gain include:
> 1. Memory consumption.
> 2. Disk space consumption.
> 3. Read performance a bit slower than that of the B-tree index typical
>    for a DBMS.
>
> You can think of an LSM tree as several sorted arrays that are stored on
> disk and merged by a background process.
> So to retrieve an entry you have to look through all of those arrays.
> As a result the complexity is N * log(M), where N is the number of sorted
> arrays and M is the number of records in an array.
> To avoid the N multiplier Cassandra uses bloom filters. A bloom filter
> detects, with some probability, whether your key is contained in a sorted
> array, so you either search that array or skip it.
> If I remember correctly they use counting ones, so they require at least
> 3 bits of additional memory per entry, or about 37.5 GB of theoretical
> overhead (without implementation overhead) for 100 billion entries.
> If you are going to update your records you still have to look through
> several sorted arrays.
>
> So for a Cassandra primary key lookup the best-case complexity is log(M)
> and the worst case is N * log(M).
>
> OrientDB:
>
> OrientDB uses a list-based data structure in which the list index serves
> as the primary key.
> As a result lookup complexity is always O(1). When you create records the
> I/O operations are mostly append-only, so write speed does not degrade.
> But record updates use random I/O, so they are slower than record
> creation operations.
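
To make the Cassandra lookup cost described above concrete, here is a minimal Python sketch. It is not Cassandra's code: the class names (`BloomFilter`, `SortedRun`), the filter size, and the hash scheme are all assumptions chosen for illustration, and it uses a plain bloom filter rather than a counting one. It models N sorted arrays, each searched in O(log M), and shows how a per-array filter lets a lookup skip arrays that cannot hold the key.

```python
import bisect
import hashlib


class BloomFilter:
    """Plain (non-counting) bloom filter; the sizes are arbitrary assumptions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


class SortedRun:
    """One immutable sorted array of (key, value) pairs with its own filter."""

    def __init__(self, items):
        self.items = sorted(items, key=lambda kv: kv[0])
        self.keys = [k for k, _ in self.items]
        self.bloom = BloomFilter()
        for k in self.keys:
            self.bloom.add(k)

    def get(self, key):
        i = bisect.bisect_left(self.keys, key)  # binary search, O(log M)
        if i < len(self.keys) and self.keys[i] == key:
            return self.items[i][1]
        return None


def lookup(runs, key, use_bloom=True):
    """Search the runs in order; return (value, number of runs binary-searched)."""
    searched = 0
    for run in runs:
        if use_bloom and not run.bloom.might_contain(key):
            continue  # the filter rules this run out, so skip the search
        searched += 1
        value = run.get(key)
        if value is not None:
            return value, searched
    return None, searched


# Four runs of 100 keys each; "k350" lives in the last run.
runs = [SortedRun([(f"k{i}", i) for i in range(j * 100, j * 100 + 100)])
        for j in range(4)]
print(lookup(runs, "k350", use_bloom=False))
print(lookup(runs, "k350", use_bloom=True))
```

Without the filters the lookup has to binary-search all four runs before it finds the key, which is the N * log(M) case. With the filters it usually searches only the run that holds the key, although a false positive can still cost an extra search.
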
>
> To avoid random I/O overhead during updates we are considering a new
> cluster implementation. It uses a much simpler data structure than the
> current one (which means faster) and an append-only approach -
> https://github.com/orientechnologies/orientdb/issues/1600 .
>
> 2. Secondary key handling.
>
> Cassandra:
>
> As far as I know Cassandra secondary indexes are limited. You can use
> hash indexes, and as I remember they are meant for data with low
> cardinality like color names, sex and so on (but you should recheck it,
> I am not a Cassandra expert).
>
> OrientDB:
>
> OrientDB has 2 types of indexes: hash index and sb-tree (b-tree based).
> The first guarantees at most 1 I/O operation per read and at most 3 I/O
> operations per write; the second has log(M) complexity.
> In OrientDB you can index almost everything. For example, you can index
> an embedded map by value and then perform containsValue SQL queries using
> the index.
>
> But OrientDB indexes suffer from random I/O, which means that you will
> probably need more nodes in the cluster in the case of big data.
> We have several issues open to fix this disadvantage -
> https://github.com/orientechnologies/orientdb/issues/1756
> https://github.com/orientechnologies/orientdb/issues/1757
>
> 3. Server cluster support.
>
> The primary difference is the scalability options. OrientDB does not use
> a DHT in its cluster, which means that you have to migrate your data from
> one cluster to a bigger one manually.
> But records can be distributed between nodes using different strategies;
> round robin is the default one.
>
> 4. Model.
> The OrientDB model is more powerful than the Blueprints model (but maybe
> Titan provides additional extensions). We support one-to-many relations
> using not only edges but also the LINKLIST, LINKSET and LINKMAP data
> structures.
> OrientDB also supports embedded documents and the multi-value property
> types List, Set and Map, and the OrientDB SQL language has operators to
> support all of these collections.
>
> Hope this information will help you.
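
The idea of indexing an embedded map by value, mentioned under secondary key handling above, can be pictured with a small Python sketch. This is only an illustration of the concept, not OrientDB's implementation or API: `MapValueIndex`, `scan_contains_value`, and the sample record ids and field names are all hypothetical. The index maps each value found in a record's embedded map to the ids of the records that contain it, so a containsValue-style query becomes one dictionary lookup instead of a scan over every record.

```python
from collections import defaultdict


class MapValueIndex:
    """Hypothetical inverted index: embedded-map value -> set of record ids."""

    def __init__(self):
        self._by_value = defaultdict(set)

    def put(self, record_id, embedded_map):
        # Register the record under every value stored in its embedded map.
        for value in embedded_map.values():
            self._by_value[value].add(record_id)

    def contains_value(self, value):
        # One hash lookup; returns a copy so callers cannot mutate the index.
        return set(self._by_value.get(value, ()))


def scan_contains_value(records, value):
    """Baseline without an index: inspect the map of every record."""
    return {rid for rid, m in records.items() if value in m.values()}


# Sample records keyed by made-up record ids.
records = {
    "#9:0": {"home": "Rome", "work": "Milan"},
    "#9:1": {"home": "Kyiv"},
    "#9:2": {"home": "Rome"},
}

index = MapValueIndex()
for rid, embedded in records.items():
    index.put(rid, embedded)

print(sorted(index.contains_value("Rome")))
```

Both approaches return the same record ids; the difference is that the scan touches every record on each query, while the index pays its cost on every write.
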
>
> But please note that we are not Cassandra or Titan experts, and it would
> be better to ask questions about concrete OrientDB features so that you
> are able to compare both implementations.
>
> On Fri, Jan 3, 2014 at 12:04 AM, LSP <[email protected]> wrote:
>
>> Hi All,
>>
>> We are currently in the process of building a statistical analysis system.
>> As part of technology evaluation and due diligence we are drawing a
>> comparison between the Titan-Cassandra combination and OrientDB.
>>
>> There was a topic in these forums that compared Cassandra and OrientDB
>> (last update in October 2012). The comparison was quite succinct within the
>> applicable context, and the points therein have been factored in as part
>> of the due diligence. The biggest difference is obviously the fact that the
>> comparison was between a columnar DB and a graph DB. The inclusion of Titan
>> in this discussion makes it an apples-to-apples comparison. Besides, a lot
>> has changed between October 2012 and January 2014 for OrientDB (Hazelcast
>> support, multi-master support, etc.).
>>
>> Following is a high-level summary of the scale requirements and the internal
>> design consensus we have:
>>
>> 1. 500-750 billion live samples per year (at this point in time we do
>>    not have visibility into whether all of this will necessarily translate
>>    into vertices per se).
>> 2. A federated model/system is acceptable.
>> 3. Over and above the 500-750 billion live samples, the application
>>    will have a couple of million records (just in case an additional drop
>>    created chaos in the ocean :) ).
>>
>> Given that we can store JSON data in Cassandra (with the knowledge that
>> marshalling and unmarshalling will induce latency) and that Titan can provide
>> graph relationships, what, in the estimation of this community, tips the
>> scales in favor of OrientDB?
>>
>> At the time of this writing, I have only managed to scratch the surface
>> and I am relatively new to NoSQL and Big Data systems in general. So, if
>> the question lacks clarity/depth, please let me know and I will share any
>> additional information required.
>>
>> Thanks
>> LSP
>> PS - Wishing you all a happy new year and a great 2014.
>>
>> --
>>
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "OrientDB" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> For more options, visit https://groups.google.com/groups/opt_out.
>
> --
> Best regards,
> Andrey Lomakin.
>
> Orient Technologies
> the Company behind OrientDB
