Hi, and thanks. I understand now that this is more of an infrastructure feature than anything else, and as such it's great :).
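Andrey's explanation, quoted below, of how an append-only cluster keeps appending while still reusing space can be modelled with a toy in-memory structure. This is a minimal sketch under stated assumptions, not OrientDB's implementation. `AppendOnlyCluster`, its methods and the entry-count segment capacity are all hypothetical. Creates, updates and deletes are all appended (a delete appends a tombstone). `compact` copies only the newest live version of each record from one segment into another and then empties the source segment so that new appends reuse it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the append-only cluster described in this thread (hypothetical names;
// not OrientDB code). Segment capacity is counted in entries, not bytes.
public class AppendOnlyCluster {
    // One appended entry; a null payload is a delete marker (tombstone).
    static final class Entry {
        final long rid;
        final String payload;
        Entry(long rid, String payload) { this.rid = rid; this.payload = payload; }
    }

    private final int segmentCapacity;
    private final List<List<Entry>> segments = new ArrayList<>();
    // rid -> {segment, offset} of the newest version; deleted records have no entry.
    private final Map<Long, int[]> latest = new HashMap<>();
    private int active = 0; // segment that currently receives appends

    public AppendOnlyCluster(int segmentCount, int segmentCapacity) {
        this.segmentCapacity = segmentCapacity;
        for (int i = 0; i < segmentCount; i++) segments.add(new ArrayList<Entry>());
    }

    public void put(long rid, String payload) { append(rid, payload); } // create or update
    public void delete(long rid) { append(rid, null); }

    public String get(long rid) {
        int[] loc = latest.get(rid);
        return loc == null ? null : segments.get(loc[0]).get(loc[1]).payload;
    }

    public int segmentSize(int segment) { return segments.get(segment).size(); }

    // Every change is appended to the active segment; nothing is overwritten in place.
    private void append(long rid, String payload) {
        if (segments.get(active).size() == segmentCapacity) active = findEmptySegment();
        List<Entry> seg = segments.get(active);
        seg.add(new Entry(rid, payload));
        if (payload == null) latest.remove(rid);
        else latest.put(rid, new int[] {active, seg.size() - 1});
    }

    private int findEmptySegment() {
        for (int i = 0; i < segments.size(); i++) {
            if (segments.get(i).isEmpty()) return i;
        }
        throw new IllegalStateException("no free segment: compaction is needed");
    }

    // Background step: copy only the live entries of `source` into `target`,
    // then empty `source` so that new appends reuse its space.
    public void compact(int source, int target) {
        if (source == target) throw new IllegalArgumentException("source and target must differ");
        List<Entry> src = segments.get(source);
        List<Entry> dst = segments.get(target);
        for (int i = 0; i < src.size(); i++) {
            Entry e = src.get(i);
            int[] loc = latest.get(e.rid);
            if (loc == null || loc[0] != source || loc[1] != i) continue; // outdated or deleted
            if (dst.size() == segmentCapacity) throw new IllegalStateException("target segment is full");
            dst.add(e);
            latest.put(e.rid, new int[] {target, dst.size() - 1});
        }
        src.clear();
        active = source; // resume appending into the freed segment
    }
}
```

In Andrey's terms, segment 0 is the "1st segment" that fills up and segment 1 is the "2nd" one: after `compact(0, 1)` only live versions survive in segment 1, and new appends go to the emptied segment 0. A real storage engine would also have to persist segments and run compaction concurrently with writers, which this sketch ignores.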
I have been wondering whether it would be beneficial for OrientDB to know that a document is immutable (append-only), both in terms of its content and of the indexes involved. We will be storing business event information (slightly related to BAM) that should never, ever be changed or deleted. It would be great if the database could accommodate or guarantee that in some way, especially if it also brought speed, performance or compression benefits.

Very best regards,
-Stefan Baxter

On Tuesday, 21 January 2014 17:35:33 UTC, Andrey Lomakin wrote:
>
> Hi Stefan,
>
> Roughly speaking, an append-only cluster means that all changes are appended: deletes and updates, and of course additions, are append-only.
> But a cluster consists of logical segments, and those segments are defragmented in the background so that their space is reused.
>
> So let's suppose we have a cluster file with, say, 3 segments of 150MB each (actually a lot of data).
> Then we have a situation like:
>
> 1st segment is empty.
> 2nd segment is empty.
>
> We do creations, updates, deletes and so on, so we have:
> 1st segment is full.
> 2nd segment is half empty.
>
> In the background we defragment the 1st segment: we load it into memory, extract only the useful data (dropping everything made outdated by updates and deletes) and put it into the 2nd segment.
> As a result we have:
>
> 1st segment is empty.
> 2nd segment is full.
>
> and we start to work with (add data to) the 1st segment once again.
>
> So virtually you always append data.
> It gives the following advantages:
> 1. You work without random I/O (only a small fraction of operations will suffer from random I/O).
> 2. It is more scalable from a multithreading point of view: you only append data, so reads do not compete with writes.
>
> From the user's perspective, all operations are supported.
>
> On Tue, Jan 21, 2014 at 7:03 PM, <[email protected]> wrote:
>
>> Hi,
>>
>> I'm a bit curious about the "append-only cluster", as append-only is a part of our use case.
>> In our case there will be some information (some document classes) that will be append-only, while other information can be updated.
>>
>> Will you have a way to support a mixed mode like that, and what do you think the benefits of append-only will be in terms of speed/performance?
>>
>> Regards,
>> -Stefan Baxter
>>
>> On Tuesday, 21 January 2014 08:38:27 UTC, Andrey Lomakin wrote:
>>
>>> Hi Jun,
>>> Both of the issues you described are fixed in the https://github.com/orientechnologies/orientdb/tree/rid-set-sbtree branch (we do not support remote storage yet), but as I can see you use embedded storage anyway.
>>> Could you use plocal storage for your tests?
>>>
>>> About memory consumption: OrientDB uses heap and direct memory (it consumes 4GB by default). If you would like to decrease the amount of consumed memory, you can set the storage.diskCache.bufferSize property (in megabytes).
>>> Also, about the blueprints-orient-graph-2.5.0-SNAPSHOT dependency: it is not needed any more, because the Blueprints implementation is embedded in graphdb, so please drop this dependency.
>>>
>>> P.S. And finally, about the comparison with Neo4j insertion speed: we have a proposal for an append-only cluster which should improve insertion speed.
>>> P.S.2 Looking forward to your feedback!
>>>
>>> On Fri, Jan 17, 2014 at 10:56 PM, Jun Xu <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm evaluating different graph database products and am new to OrientDB. One use case I'm testing now is loading data into a graph database. The use case is basically building a graph with half a million vertices and a few million edges. I'm using OrientDB 1.6.4 on a CentOS 5.10 Linux box with 8GB of memory and JDK 1.7.0_40. The Blueprints versions are blueprints-core-2.5.0-SNAPSHOT and blueprints-orient-graph-2.5.0-SNAPSHOT.
>>>>
>>>> I use OrientGraph to build the graph.
>>>> During initialization, it creates an OrientGraph instance ("plocal" or "local" storage engine) and creates a few key indices using createKeyIndex on vertices. The building process does index-based lookups (OrientGraph.getVertices()) on vertices and, depending on whether the vertices exist or not, it either creates them and sets properties, or creates edges and sets properties on the edges. There are no global index-based lookups on edges; edges are always reached via vertices. I load the data in batches (each batch probably has a few hundred operations, like looking up a vertex, creating a vertex, getting all edges of a vertex, creating an edge, setting a property, etc.) and commit the transaction at the end of each batch. After processing around 300 batches, a "Maximum lock count exceeded" exception was thrown. I tried both the "local" and "plocal" storage engines and got the same exception. I searched this group and learned that OrientDB used to have this bug in very old versions, but I'm using the latest version (1.6.4).
>>>>
>>>> Since the exception was thrown during transaction commit, I switched to the OrientGraphNoTx interface. Without transactions enabled, I did not get the "Maximum lock count exceeded" exception, but I noticed that the process was really hungry for memory. Giving the JVM 4GB of maximum memory, the speed was OK, although still slower than Neo4j for the same process. I did not let the process finish once I saw the memory usage growing to 3GB. I restarted the process giving the JVM only 1GB of maximum memory, and after running for two and a half hours it threw an OutOfMemoryError. With Neo4j, by contrast, the whole loading process finished using 1GB of maximum memory, with quite good performance.
>>>>
>>>> Another thing I noticed was that the database size on disk is much bigger than with Neo4j.
>>>> Halfway through the loading process, the OrientDB database directory is already at 4GB, while for Neo4j the database directory is only 1.6GB after the whole loading process has finished.
>>>>
>>>> I actually really like the way OrientDB is designed: the mix of document and graph features, and the binary protocol on remote interfaces. I would really appreciate it if you could help me get around the hurdles mentioned above. I might have done something wrong, or maybe there is some tuning that can be done.
>>>>
>>>> Thanks.
>>>> Jun
>>>>
>>>> --
>>>>
>>>> ---
>>>> You received this message because you are subscribed to the Google Groups "OrientDB" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>>
>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>
>>> --
>>> Best regards,
>>> Andrey Lomakin.
>>>
>>> Orient Technologies
>>> the Company behind OrientDB
>>
>
> --
> Best regards,
> Andrey Lomakin.
>
> Orient Technologies
> the Company behind OrientDB
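Andrey's memory advice in the thread names the `storage.diskCache.bufferSize` property (in megabytes). A minimal sketch of applying it from an embedded Java loader follows. It assumes, as OrientDB's configuration documentation describes for global settings, that the property can be supplied as a JVM system property and must be set before OrientDB is first loaded. The `OrientMemoryConfig` class and the 2048MB figure are illustrative only and do not come from the thread.

```java
public class OrientMemoryConfig {
    // Property named in the thread; its value is a size in megabytes.
    static final String DISK_CACHE_KEY = "storage.diskCache.bufferSize";

    // Assumption: OrientDB reads this system property when it is first loaded,
    // so this must run before any database or graph is opened.
    static void limitDiskCache(int megabytes) {
        if (megabytes <= 0) {
            throw new IllegalArgumentException("disk cache size must be positive: " + megabytes);
        }
        System.setProperty(DISK_CACHE_KEY, Integer.toString(megabytes));
    }

    public static void main(String[] args) {
        limitDiskCache(2048); // illustrative: 2GB instead of the 4GB default mentioned in the thread
        System.out.println(DISK_CACHE_KEY + "=" + System.getProperty(DISK_CACHE_KEY));
        // ...then open the graph as before, e.g. new OrientGraph("plocal:...")
    }
}
```

Passing `-Dstorage.diskCache.bufferSize=2048` on the `java` command line should have the same effect. Because Andrey says OrientDB uses both heap and direct memory, the process footprint is roughly the `-Xmx` heap plus this cache, which is worth budgeting for on an 8GB machine like Jun's.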

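Jun's "Maximum lock count exceeded" message is the text of the `Error` that the JDK's `java.util.concurrent.locks.ReentrantReadWriteLock` throws when a single lock's hold count would exceed 65,535. The thread does not say which lock OrientDB 1.6.4 overflowed, so the following is only an assumption-labelled illustration of the JDK limit itself, using a hypothetical `LockCountDemo` class. It re-acquires one write lock from a single thread until the JDK refuses, then releases every hold.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockCountDemo {
    // Re-acquires the same write lock from one thread until the JDK refuses,
    // then releases every hold and returns how many acquisitions succeeded.
    static int maxReentrantWriteHolds() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        int held = 0;
        try {
            while (true) {
                lock.writeLock().lock();
                held++;
            }
        } catch (Error e) {
            // ReentrantReadWriteLock signals the overflow with an Error, not an Exception.
            if (!"Maximum lock count exceeded".equals(e.getMessage())) {
                throw e;
            }
        } finally {
            for (int i = 0; i < held; i++) {
                lock.writeLock().unlock();
            }
        }
        return held;
    }

    public static void main(String[] args) {
        System.out.println("write lock holds before overflow: " + maxReentrantWriteHolds());
    }
}
```

Jun hit the error after roughly 300 batches of a few hundred operations each, which is the same order of magnitude as 65,535 acquisitions. That is consistent with (but does not prove) a lock being re-acquired once per operation and not fully released on commit. The fix Andrey points to in the rid-set-sbtree branch remains the actual remedy.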