I do agree that the flush interval must be configurable (I think it already is).

> I've upgraded to 0.94.11. Here is my "worst-case scenario":
> - let's say each regionserver has 3 GB memstore
> - let's say compaction max filesize is ~200 GB, min. 2 files, max 10 files.
> - let's say memstore is growing "slowly" (1 GB / hour per RS)
> Then, automatically flushing every hour will lead to 1 GB storefiles,
> being compacted into storefiles of 2 GB, 3 GB, 4.... up to 200 GB.

Nope. The reality is even worse than one could imagine.

Memstore 'size' is the estimated Java heap usage, which includes per-object
overhead (mostly from KeyValue objects).
For small rows, the estimated heap size is close to 5x the size of the
serialized memstore (the store file is the smaller one, of course).
If you enable compression (2-3x), your store file will be 10-15 times smaller
than your memstore: not 1 GB, but 70-100 MB.
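
To make the arithmetic above concrete, here is a rough sketch. The function
name and its parameters are made up for illustration (they are not HBase
APIs), and the 5x and 2-3x factors are just the estimates quoted above, not
measured values:

```python
def estimated_store_file_mb(memstore_heap_mb, heap_to_file_ratio=5.0,
                            compression_ratio=1.0):
    """Rough on-disk size of a flushed store file, in MB.

    memstore_heap_mb   -- memstore size as accounted on the Java heap
    heap_to_file_ratio -- assumed heap size / serialized size (~5x for small rows)
    compression_ratio  -- assumed compression factor (1.0 means no compression)
    """
    return memstore_heap_mb / (heap_to_file_ratio * compression_ratio)


# A 1 GB memstore flushed with 2x and 3x compression.
for compression in (2.0, 3.0):
    size_mb = estimated_store_file_mb(1024, 5.0, compression)
    print("compression %.0fx -> about %.0f MB on disk" % (compression, size_mb))
```

With these assumed factors, a 1 GB memstore ends up as roughly 68-102 MB on
disk, which is where the 70-100 MB figure comes from.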

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: [email protected]

________________________________________
From: Adrien Mogenet [[email protected]]
Sent: Tuesday, December 10, 2013 2:16 PM
To: user
Subject: Re: Default value for Periodic Flusher

Hi guys,

I've upgraded to 0.94.11. Here is my "worst-case scenario":

- let's say each regionserver has 3 GB memstore
- let's say compaction max filesize is ~200 GB, min. 2 files, max 10 files.
- let's say memstore is growing "slowly" (1 GB / hour per RS)

Then, automatically flushing every hour will lead to 1 GB storefiles,
being compacted into storefiles of 2 GB, 3 GB, 4.... up to 200 GB.
Sometimes my write load becomes very low, and the periodic flusher will flush
perhaps 1 MB of data. That will trigger a minor compaction of hundreds of
gigabytes + 1 MB, which seems like a lot of IO just to merge 1 MB of data.
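
As a back-of-the-envelope illustration of that concern, the sketch below
computes how many MB a compaction rewrites per MB of newly flushed data. It
takes the scenario at face value: the helper name is hypothetical, 200 GB
stands in for "hundreds of gigabytes", and it does not model how HBase
actually selects files for a minor compaction, which may well skip the large
file.

```python
def rewrite_amplification(existing_mb, flushed_mb):
    """MB rewritten by one compaction per MB of newly flushed data.

    Assumes (hypothetically) that the compaction merges the whole existing
    store file with the freshly flushed one.
    """
    return (existing_mb + flushed_mb) / flushed_mb


existing_mb = 200 * 1024  # assumed 200 GB store file

# A tiny 1 MB periodic flush versus a 1 GB flush against the same file.
tiny_flush = rewrite_amplification(existing_mb, 1)
hourly_flush = rewrite_amplification(existing_mb, 1024)
print("1 MB flush: %.0fx rewrite, 1 GB flush: %.0fx rewrite"
      % (tiny_flush, hourly_flush))
```

Under these assumptions the 1 MB flush costs about a thousand times more
rewritten data per flushed MB than the 1 GB flush does.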

Previously (i.e. without the periodic flusher) the memstore was creating 3 GB
storefiles, and thus creating (after minor compactions) 3 GB, 6 GB, 9 GB...
up to 200 GB storefiles. And if the memstore is growing slowly, it won't
generate small storefiles on HDFS. I think that looks like a more reasonable
IO load, doesn't it?

I fully agree that the Periodic Flusher is relevant, but I don't think it's
suitable for everyone. Do you share my opinion wrt. my workload?


On Sun, Dec 8, 2013 at 10:36 PM, Ted Yu <[email protected]> wrote:

