[orientdb] Re: OrientDB Storage Overhead

Eric24 Sat, 28 May 2016 08:57:06 -0700

Thanks Scott. The RECORD* params definitely address the discrepancy between 
the data written and my observed disk space growth (I left them at their 
default, which the docs say is 1.2). So, setting these to 1 would 
essentially remove any fluff from the record. That's potentially good, and 
since it's settable on a per-cluster basis, it's something that can easily 
be left to the database architect, based on their knowledge of how a 
particular class/cluster will be used (i.e. zero/few updates vs. lots of 
frequent updates, as well as the kind of updates).
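
To put rough numbers on that trade-off, here is a minimal Python sketch. It
assumes the reserved space is simply the record size multiplied by the grow
factor and rounded up to a whole byte; that rule and the function names are
illustrative assumptions, not OrientDB's actual allocation code.

```python
import math
from fractions import Fraction


def allocated_bytes(record_size: int, grow_factor: float = 1.2) -> int:
    """Bytes reserved for one record under the assumed rule
    ceil(record_size * grow_factor). Fraction avoids float rounding noise."""
    factor = Fraction(str(grow_factor))
    if factor < 1:
        raise ValueError("grow_factor must be >= 1")
    return math.ceil(record_size * factor)


def padding_overhead(record_size: int, record_count: int,
                     grow_factor: float = 1.2) -> int:
    """Total padding bytes across record_count records of equal size."""
    per_record = allocated_bytes(record_size, grow_factor) - record_size
    return per_record * record_count


if __name__ == "__main__":
    for factor in (1.0, 1.2):
        total = padding_overhead(100, 1_000_000_000, factor)
        print(f"grow factor {factor}: {total / 1e9:.1f} GB of padding")
```

Under this model, 100-byte records at the 1.2 default carry 20 bytes of
padding each, which is about 20 GB across one billion records; with a factor
of 1 the padding drops to zero.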

In my particular case, at the time the record is initially written, the 
"whole" record will be known, but there will be two possible update 
scenarios: 1) Data updated "in place" with no size change (i.e. updating 
the value of an INTEGER); and 2) adding a lightweight edge. I'll assume 
that scenario #1 does not fragment the record, since its storage size 
doesn't change (is that right?), but what happens when adding an edge (or 
several)? In that case, I assume that each edge will increase the size of 
the record, but by how much?
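
The second scenario can be framed as a small calculation once the per-edge
cost is known. The sketch below is only a way of reasoning about it:
`bytes_per_edge` is a hypothetical placeholder, since the actual size added
by a lightweight edge is exactly the open question here, and the function
name is made up for illustration.

```python
def edges_that_fit(record_size: int, allocated_size: int,
                   bytes_per_edge: int) -> int:
    """How many edges could be added before the record outgrows the space
    reserved for it, assuming each edge adds a fixed number of bytes."""
    if bytes_per_edge <= 0:
        raise ValueError("bytes_per_edge must be positive")
    # Spare space is whatever was reserved beyond the record's current size.
    spare = max(allocated_size - record_size, 0)
    return spare // bytes_per_edge


if __name__ == "__main__":
    # Hypothetical numbers: a 100-byte record with 120 bytes reserved,
    # and a guessed cost of 10 bytes per edge.
    print(edges_that_fit(100, 120, 10))
```

With no padding at all (allocated size equal to record size), the result is
zero for any edge cost, so under this model the first added edge would
already push the record past its reserved space.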

What might be ideal is a way to specify on a per-record (or per-cluster) 
basis a specific number of padding bytes, when this is known in advance. 
Here again, for many database applications, adding padding is probably a 
good idea (although MongoDB's 2X recommendation seems pretty wasteful), but 
for applications that are storing billions of records, that overhead adds 
up quickly (disk space may be "cheap", but anything multiplied by 1B is 
still a lot).

Any thoughts on how COMPRESSION may help (or hurt) this? I assume it would 
be very efficient at removing fluff from a record, but I've also seen 
comments that would suggest that COMPRESSION isn't very efficient. My guess 
is that the "padding" is applied after the compression, since the point of 
the padding is to leave some free physical space in the disk storage (i.e. 
compressing after padding would result in most updated records taking up 
more physical space, which defeats the purpose). Can anyone from Orient 
explain this in more detail, specifically how COMPRESSION relates to the 
physical disk storage and padding?

--Eric

On Saturday, May 28, 2016 at 1:13:55 AM UTC-5, scott molinari wrote:
>
> I can't help much, but I do remember reading that the records are padded 
> with space. You can find that info here (towards the bottom). 
>
> http://orientdb.com/docs/2.1/plocal-storage-engine.html 
>
> I know this kind of "pre-allocation" technique is necessary to allow for 
> flexible schema i.e. adding properties to records later on or updating 
> records with more data than was there before. As I understand the reason 
> for record "pre-allocation", it is needed because, if the space taken by 
> the record would be exactly the size of the record, then adding data to it 
> (making the record size larger) would cause the database to have to move 
> the record on disk, instead of updating it directly. You can imagine, if 
> you then update a lot of records this way, you'd end up with a huge mess 
> fast and the database would slow down considerably. So, in order to avoid 
> that, the database pre-allocates space per record. ODB has the setting 
> RECORD_GROW_FACTOR. In MongoDB, they recommend and set as default what they 
> call "powersOfTwo". In other words, the database doubles the initial size 
> of the document on disk. This is what is explained in the example in the 
> docs.
>
> As I take it from the docs, the settings for record size can be changed 
> through configuration. If you know your record size will never change, you 
> could drop the values to "1". However, I could imagine, if you do that and 
> then you do update and increase the data size even a little in a good 
> number of records, that will not sit well with the database. Though, I am 
> no expert on that. 
>
> I'd also like to know the overhead values of the data types otherwise. 
> Would be great basic knowledge of the database. If one of the nice gents 
> from Orient would lay it out here, I'd be even glad to add it to the 
> documentation. It would be a great addition to this table: 
> http://orientdb.com/docs/latest/Types.html
>
> Scott
>  
>
