Hi,
Thank you!

On Mon, Jan 13, 2014 at 5:48 PM, LSP <[email protected]> wrote:

> Hello Andrey,
>
> Thanks for the detailed explanation. I am currently building a prototype
> with OrientDB. I am confident that along the way I will be challenged by
> modeling techniques and by setting up the relationships my application
> may mandate. I will try to share any insights I gain along the way with
> the community.
>
> Thanks
> LSP
>
> On Monday, January 6, 2014 2:51:49 AM UTC-6, Andrey Lomakin wrote:
>
>> Hi,
>> Well, I will try to answer your question.
>>
>> There are several primary differences between Cassandra and OrientDB.
>>
>> 1. Primary key handling.
>>
>> Cassandra:
>>
>> Cassandra is designed to achieve high write performance, so it uses LSM
>> trees as the underlying data structure for the primary key index.
>> This means that it achieves high write performance by mitigating
>> random I/O overhead.
>> But the trade-offs of this performance gain include:
>> 1. Memory consumption.
>> 2. Disk space consumption.
>> 3. Read performance that is a bit slower than that of the B-tree index
>> typical for a DBMS.
>>
>> You can think of an LSM tree as several sorted arrays which are stored
>> on disk and merged by a background process.
>> So if you want to retrieve an entry you have to look through all those
>> arrays. As a result you get a complexity of N * log(M), where N is the
>> number of sorted arrays and M is the number of records in an array.
>> To avoid the N multiplier Cassandra uses bloom filters. A bloom filter
>> tells you either that your key is definitely not in a sorted array, so
>> you can skip that array, or that it probably is, so you have to search
>> that array.
>> If I remember correctly they use counting ones, so they require at
>> least 3 bits of additional memory per entry, or about 37.5 GB of
>> theoretical overhead (without implementation overhead) for 100 billion
>> entries.
>> If you are going to make updates to your records you still have to look
>> through several sorted arrays.
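
The lookup cost described above can be made concrete with a small sketch. It is only an illustration of the idea (several sorted arrays, each guarded by a bloom filter), not Cassandra's actual implementation. The names `BloomFilter`, `SortedRun` and `lookup` are hypothetical, the filter is a plain one rather than a counting one, and the filter size and hash count are arbitrary choices.

```python
import bisect
import hashlib


class BloomFilter:
    """Plain (non-counting) Bloom filter; the sizes are illustrative only."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "probably present".
        return all(self.bits[pos] for pos in self._positions(key))


class SortedRun:
    """One immutable sorted array of (key, value) pairs plus its filter."""

    def __init__(self, items):
        self.items = sorted(items, key=lambda kv: kv[0])
        self.keys = [k for k, _ in self.items]
        self.bloom = BloomFilter()
        for k in self.keys:
            self.bloom.add(k)

    def get(self, key):
        i = bisect.bisect_left(self.keys, key)  # binary search, O(log M)
        if i < len(self.keys) and self.keys[i] == key:
            return self.items[i][1]
        return None


def lookup(runs, key):
    """Search runs newest-first; return (value, number_of_runs_searched)."""
    searched = 0
    for run in runs:
        if not run.bloom.might_contain(key):
            continue  # skip this run without touching its array
        searched += 1
        value = run.get(key)
        if value is not None:
            return value, searched
    return None, searched
```

With `runs = [SortedRun([("b", 2)]), SortedRun([("a", 1), ("b", 0)])]`, the call `lookup(runs, "b")` returns `(2, 1)`: the newest run holds the key, so only one binary search is needed. Without the filters every miss would cost one binary search per run, which is the N * log(M) case.
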
>>
>> So for a Cassandra primary key lookup the best-case complexity is
>> log(M) and the worst case is N * log(M).
>>
>> OrientDB:
>>
>> OrientDB uses a list-based data structure which uses the list index as
>> the primary key.
>> As a result lookup complexity is always O(1). When you create records
>> the I/O operations are mostly append-only, so you will not have write
>> speed degradation.
>> But record updates use random I/O, so they are slower than record
>> creation operations.
>>
>> To avoid random I/O overhead during updates we are considering a new
>> cluster implementation. It uses a much simpler data structure than the
>> current one (which means faster) and uses an append-only approach -
>> https://github.com/orientechnologies/orientdb/issues/1600 .
>>
>> 2. Secondary key handling.
>>
>> Cassandra:
>>
>> As far as I know Cassandra secondary indexes are limited. You can use
>> hash indexes and, as I remember, they are meant for data with low
>> cardinality like color names, sex and so on (but you should recheck
>> this, I am not a Cassandra expert).
>>
>> OrientDB:
>>
>> OrientDB has 2 types of indexes: hash index and sb-tree (b-tree based).
>> The first guarantees at most 1 I/O operation for reads and at most 3
>> I/O operations for writes; the second has log(M) complexity.
>> In OrientDB you can index almost everything. For example, you can index
>> an embedded map by value and then perform containsValue SQL queries
>> using indexes.
>>
>> But OrientDB indexes suffer from random I/O, which means that you will
>> probably need more nodes in the cluster in the case of big data.
>> We have several issues open to fix this disadvantage -
>> https://github.com/orientechnologies/orientdb/issues/1756
>> https://github.com/orientechnologies/orientdb/issues/1757
>>
>> 3. Server cluster support.
>>
>> The primary difference is in the scalability options. OrientDB does not
>> use a DHT in its cluster, which means that you have to migrate your
>> data from one cluster to a bigger one manually.
>> But records can be distributed between nodes using different
>> strategies; round robin is the default one.
>>
>> 4. Model.
>> The OrientDB model is more powerful than the Blueprints model (but
>> maybe Titan provides additional extensions). We support one-to-many
>> relations using not only edges but also the LINKLIST, LINKSET and
>> LINKMAP data structures.
>> OrientDB also supports embedded documents and the multi-value property
>> types List, Set and Map. The OrientDB SQL language has operators to
>> support all these collections.
>>
>> Hope this information will help you.
>>
>> But please note that we are not Cassandra or Titan experts, and it
>> would be better to ask questions about concrete OrientDB features so
>> that you are able to compare both implementations.
>>
>> On Fri, Jan 3, 2014 at 12:04 AM, LSP <[email protected]> wrote:
>>
>>> Hi All,
>>>
>>> We are currently in the process of building a statistical analysis
>>> system. As a part of technology evaluation and due diligence we are
>>> drawing a comparison between the Titan-Cassandra combination and
>>> OrientDB.
>>>
>>> There was a topic in these forums that compared Cassandra and OrientDB
>>> (last update in October 2012). The comparison was quite succinct
>>> within the applicable context and the points therein have been
>>> factored in as a part of the due diligence. The biggest difference is
>>> obviously the fact that the comparison was between a columnar DB and
>>> a graph DB. The inclusion of Titan in this discussion makes it an
>>> apples-to-apples comparison. Besides, a lot has changed between
>>> October 2012 and January 2014 for OrientDB (Hazelcast support,
>>> multi-master support etc.)
>>>
>>> Following is a high-level summary of the scale requirements and the
>>> internal design consensus we have:
>>>
>>>    1. 500-750 billion live samples per year (at this point in time we
>>>    do not have visibility into whether all of this will necessarily
>>>    translate into vertices per se).
>>>    2. A federated model/system is acceptable.
>>>    3. Over and above the 500-750 billion live samples, the application
>>>    will have a couple of million records (just in case an additional
>>>    drop created chaos in the ocean :) )
>>>
>>> Given that we can store JSON data in Cassandra (with the knowledge
>>> that marshalling and unmarshalling will induce latency) and Titan can
>>> provide the graph relationships, what, in the estimation of this
>>> community, tips the scales in favor of OrientDB?
>>>
>>> At the time of this writing, I have only managed to scratch the
>>> surface and I am relatively new to NoSQL and Big Data systems in
>>> general. So, if the question lacks clarity/depth, please let me know
>>> and I will share any additional information required.
>>>
>>> Thanks
>>> LSP
>>> PS - Wishing you all a happy new year and a great 2014.
>>>
>>> --
>>>
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "OrientDB" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>>
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>
>>
>> --
>> Best regards,
>> Andrey Lomakin.
>>
>> Orient Technologies
>> the Company behind OrientDB
>>

--
Best regards,
Andrey Lomakin.

Orient Technologies
the Company behind OrientDB
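
The bloom filter memory estimate from the reply can be checked with simple arithmetic and scaled to the 500-750 billion samples per year mentioned in the original question. The sketch below assumes the figure of 3 bits per entry quoted in the thread and counts only the filter bits themselves, with no implementation overhead. The function name `filter_overhead_gb` is hypothetical, and GB here means 10^9 bytes.

```python
def filter_overhead_gb(entries, bits_per_entry=3):
    """Theoretical filter size in GB (10**9 bytes) for a number of entries."""
    total_bits = entries * bits_per_entry
    return total_bits / 8 / 1e9  # bits -> bytes -> GB


if __name__ == "__main__":
    for entries in (100_000_000_000, 500_000_000_000, 750_000_000_000):
        print(f"{entries:>15,d} entries: {filter_overhead_gb(entries)} GB")
```

At 3 bits per entry, 100 billion entries come to 37.5 GB, and one year of samples at the stated volume comes to between 187.5 GB and 281.25 GB of filter data, before any implementation overhead.
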
