Hi and thanks,

I understand now that this is more of an infrastructure feature than 
anything else and as such it's great :).

I have been wondering whether it would be beneficial for OrientDB to 
know that a document is immutable (append only) in terms of its content and 
the indexes involved.

We will be storing Business Events information (slightly related to BAM) 
that should never, ever be changed or deleted. It would be great if the 
database could accommodate/guarantee that in some way, especially if it also 
has some speed/performance/compression benefits.

Very best regards,
  -Stefan Baxter

On Tuesday, 21 January 2014 17:35:33 UTC, Andrey Lomakin wrote:
>
> Hi Stefan,
>
> Roughly speaking, an append-only cluster means that all changes are 
> appended: deletes, updates and, of course, additions.
> But the cluster consists of logical segments, and those segments are 
> defragmented in the background so that their space is reused.
>
> So let's suppose we have a cluster file with, let's say, 3 segments of 150MB 
> each (actually a lot of data).
> Then we have a situation like:
>
> 1-st segment is empty. 
> 2-nd segment is empty.
>
>
> We do creations, updates, deletes and so on,
> so we have:
> 1-st segment is full
> 2-nd segment is half empty
>
> In the background we defragment the 1-st segment: we load it into 
> memory, extract only the useful data (dropping everything made obsolete by 
> updates and deletes) and put it into the 2-nd segment.
> So as a result we have:
>
> 1-st segment is empty
> 2-nd segment is full
>
> and we start to work with (add data to) the 1-st segment once again.
>
> So virtually you always append data.
> This gives the following advantages:
> 1. You work without random I/O (only a small fraction of operations will 
> suffer from random I/O).
> 2. It is more scalable from a multithreading point of view: you only append 
> data, so reads do not compete with writes.
>
> From the user's perspective all operations are supported.
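
The segment rotation described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not OrientDB's implementation: the class `AppendOnlyCluster`, its methods, and the in-memory lists standing in for 150MB segment files are all hypothetical, and a delete here merely drops the index entry, whereas a real store would presumably append a marker of its own.

```python
class AppendOnlyCluster:
    """Toy append-only cluster: fixed-size segments plus an index of live records.

    Hypothetical illustration only; names and layout are assumptions.
    """

    def __init__(self, segment_count=2, segment_capacity=4):
        self.segments = [[] for _ in range(segment_count)]
        self.capacity = segment_capacity
        self.active = 0   # segment that received the most recent append
        self.index = {}   # key -> (segment number, offset) of the live version

    def _pick_segment(self, exclude=None):
        # Start at the active segment and take the first one with free space.
        for step in range(len(self.segments)):
            candidate = (self.active + step) % len(self.segments)
            if candidate != exclude and len(self.segments[candidate]) < self.capacity:
                return candidate
        raise RuntimeError("no free space; defragment a segment first")

    def _append(self, key, value, exclude=None):
        self.active = self._pick_segment(exclude)
        segment = self.segments[self.active]
        segment.append((key, value))
        self.index[key] = (self.active, len(segment) - 1)

    def put(self, key, value):
        """Create or update: both simply append a new version."""
        self._append(key, value)

    def delete(self, key):
        """Delete: the old version stays in its segment but is no longer referenced."""
        self.index.pop(key, None)

    def get(self, key):
        segment_no, offset = self.index[key]
        return self.segments[segment_no][offset][1]

    def defragment(self, segment_no):
        """Copy the live records of one segment elsewhere, then reuse its space."""
        live = [
            (key, value)
            for offset, (key, value) in enumerate(self.segments[segment_no])
            if self.index.get(key) == (segment_no, offset)
        ]
        for key, value in live:
            self._append(key, value, exclude=segment_no)
        self.segments[segment_no] = []  # stale versions are dropped here


cluster = AppendOnlyCluster(segment_count=2, segment_capacity=4)
cluster.put("a", 1)
cluster.put("b", 2)
cluster.put("a", 3)    # update: appended, the first version of "a" becomes stale
cluster.put("b", 4)    # segment 0 is now full
cluster.put("c", 5)    # goes to segment 1
cluster.delete("b")    # both versions of "b" are now stale
cluster.defragment(0)  # only ("a", 3) is still live and moves to segment 1
# Segment 0 is empty again and can take new appends once segment 1 fills up.
```

Every write in this model lands at the tail of a segment, and stale versions only cost space until the segment holding them is defragmented.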
>
>
>
>
> On Tue, Jan 21, 2014 at 7:03 PM, <[email protected]> wrote:
>
>> Hi,
>>
>> I'm a bit curious about the "append only cluster", as append-only is a 
>> part of our use case. 
>> In our case there will be some information (some document classes) that 
>> will be append only while others can be updated.
>>
>> Will you have a way to support mixed mode like that and what do you think 
>> the benefits of append-only will be in terms of speed/performance?
>>
>> Regards,
>>  -Stefan Baxter
>>
>>
>>
>> On Tuesday, 21 January 2014 08:38:27 UTC, Andrey Lomakin wrote:
>>
>>> Hi Jun,
>>> Both of the issues you described are fixed in the 
>>> https://github.com/orientechnologies/orientdb/tree/rid-set-sbtree branch 
>>> (we do not support remote storage yet), but as far as I can see you use 
>>> embedded storage anyway.
>>> Could you use plocal storage for your tests?
>>>
>>> About memory consumption: OrientDB uses heap and direct memory (it 
>>> consumes 4GB by default). If you would like to decrease the amount of 
>>> consumed memory, you can set the storage.diskCache.bufferSize property (in 
>>> megabytes).
>>> Also, the blueprints-orient-graph-2.5.0-SNAPSHOT dependency is not needed 
>>> any more: the Blueprints implementation is embedded in graphdb, so please 
>>> drop this dependency.
>>>
>>>
>>> P.S. And finally, about the comparison to Neo4J insertion speed: we have a 
>>> proposal for an append-only cluster, which should improve insertion speed.
>>> P.S.2 Looking forward to your feedback!
>>>
>>>
>>>
>>> On Fri, Jan 17, 2014 at 10:56 PM, Jun Xu <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm evaluating different graph database products and am new to 
>>>> OrientDB. One use case I'm testing now is loading data into a graph 
>>>> database. The use case is basically building a graph with half a million 
>>>> vertices and a few million edges. I'm using OrientDB 1.6.4 on a CentOS 
>>>> 5.10 Linux box with 8GB of memory, and the JDK is 1.7.0_40. The 
>>>> Blueprints versions are blueprints-core-2.5.0-SNAPSHOT 
>>>> and blueprints-orient-graph-2.5.0-SNAPSHOT.
>>>>
>>>> I use OrientGraph to build the graph. During initialization, it creates 
>>>> an OrientGraph instance ("plocal" or "local" storage engine) and creates a 
>>>> few key indices on vertices using createKeyIndex. The building process 
>>>> does index-based lookups (OrientGraph.getVertices()) on vertices and, 
>>>> depending on whether the vertices exist or not, either creates them and 
>>>> sets their properties, or creates edges and sets properties on the edges. 
>>>> There are no global index-based lookups on edges; edges are always reached 
>>>> via vertices. I load the data in batches (each batch probably has a few 
>>>> hundred operations such as looking up a vertex, creating a vertex, getting 
>>>> all edges of a vertex, creating an edge, setting a property, etc.) and 
>>>> commit the transaction at the end of each batch. After processing around 
>>>> 300 batches, a "Maximum lock count exceeded" exception was thrown. I tried 
>>>> both the "local" and "plocal" storage engines and got the same exception. 
>>>> I searched this group and learned that OrientDB used to have this bug in 
>>>> very old versions, but I'm using the latest version (1.6.4).
>>>>
>>>> Since the exception was thrown in the transaction commit, I changed to 
>>>> the OrientGraphNoTx interface. Without transactions enabled, I did not get 
>>>> the "Maximum lock count exceeded" exception, but I noticed that the process 
>>>> was really hungry for memory. Giving the JVM 4GB of max memory, the speed 
>>>> was OK, although still slower than Neo4j for the same process. I did not 
>>>> let the process finish once I saw the memory usage grow to 3GB. I restarted 
>>>> the process giving the JVM only 1GB of maximum memory and, after it had run 
>>>> for two and a half hours, an OutOfMemoryError was thrown. With Neo4j, 
>>>> the whole loading process finished using 1GB of maximum memory with 
>>>> quite good performance.
>>>>
>>>> Another thing I noticed was that the database size on disk is much 
>>>> bigger than with Neo4j. Halfway through the loading process, the 
>>>> OrientDB DB directory is already at 4GB, while for Neo4j the DB directory 
>>>> size is only 1.6GB after the whole loading process is finished.
>>>>
>>>> I actually really like the way OrientDB is designed, the mix of 
>>>> document and graph features and the binary protocol on remote interfaces. 
>>>> I would really appreciate it if you could help me get around the hurdles 
>>>> mentioned above. I might have done something wrong, or maybe there is 
>>>> some tuning that can be done. 
>>>>
>>>> Thanks.
>>>> Jun
>>>>
>>>> -- 
>>>>  
>>>> --- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "OrientDB" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>>
>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>
>>>
>>>
>>>
>>> -- 
>>> Best regards,
>>> Andrey Lomakin.
>>>
>>> Orient Technologies
>>> the Company behind OrientDB
>>>
>>
>
>
>
> -- 
> Best regards,
> Andrey Lomakin.
>
> Orient Technologies
> the Company behind OrientDB
>
>  

