[jira] [Commented] (HBASE-3149) Make flush decisions per column family

2014-12-02 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231835#comment-14231835
 ] 

Jonathan Hsieh commented on HBASE-3149:
---

The JIRA reflects the latest status; I believe this issue has been waiting for 
someone to pick it up and complete it for about a year. 

 Make flush decisions per column family
 --

 Key: HBASE-3149
 URL: https://issues.apache.org/jira/browse/HBASE-3149
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Karthik Ranganathan
Assignee: Gaurav Menghani
Priority: Critical
 Fix For: 0.89-fb

 Attachments: 3149-trunk-v1.txt, Per-CF-Memstore-Flush.diff


  Today, the flush decision is made using the aggregate size of all column 
 families. When large and small column families co-exist, this causes many 
 small flushes of the smaller CF. We need to make per-CF flush decisions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-3149) Make flush decisions per column family

2014-12-02 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231843#comment-14231843
 ] 

Ted Yu commented on HBASE-3149:
---

Please see HBASE-10201



[jira] [Commented] (HBASE-3149) Make flush decisions per column family

2014-12-01 Thread Brian Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230448#comment-14230448
 ] 

Brian Johnson commented on HBASE-3149:
--

What is the status of this in trunk? 



[jira] [Commented] (HBASE-3149) Make flush decisions per column family

2013-12-20 Thread Gaurav Menghani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854725#comment-13854725
 ] 

Gaurav Menghani commented on HBASE-3149:


[~saint@gmail.com] Yes, we have deployed this, with selective flushing 
disabled for now, since we haven't seen any aggregate benefit yet. The 
heuristics I was thinking about concern which column families to flush when 
no column family is above the per-CF flush threshold. E.g., if the memstore 
limit is 128 MB and the flush threshold for a CF is 32 MB, there might be 7-8 
CFs, none of which is above 32 MB. 

In that case, there are a couple of heuristics you can choose from, such as 
flushing the top N column families, or flushing only as many column families 
as needed to free up 1/4 of the memstore. The main benefit I see is that much 
less time will be spent compacting the smaller CFs, since far fewer files would 
be created. This is offset by bigger column families being flushed earlier than 
before and producing smaller files than they would without this change, but 
with the right heuristics we can find a good balance.
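The two fallback heuristics mentioned above can be sketched as follows. This is a minimal illustration, not the 0.89-fb patch: the class and method names are hypothetical, memstore sizes are passed in as a plain map, and "largest first" is an assumed ordering for the "free up 1/4 of the memstore" variant.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: choose column families to flush when none is above
// the per-CF threshold. Not taken from any HBase patch.
final class FallbackFlushHeuristics {

    // Families sorted by memstore size, largest first (List.sort is stable for ties).
    static List<String> bySizeDescending(Map<String, Long> memstoreSizes) {
        List<String> families = new ArrayList<>(memstoreSizes.keySet());
        families.sort((a, b) -> Long.compare(memstoreSizes.get(b), memstoreSizes.get(a)));
        return families;
    }

    // Heuristic 1: flush the N largest column families.
    static List<String> topN(Map<String, Long> memstoreSizes, int n) {
        List<String> families = bySizeDescending(memstoreSizes);
        return new ArrayList<>(families.subList(0, Math.min(n, families.size())));
    }

    // Heuristic 2: flush the fewest families (largest first) that together
    // free at least `fraction` of the memstore limit.
    static List<String> freeFraction(Map<String, Long> memstoreSizes, long memstoreLimit, double fraction) {
        long target = (long) (memstoreLimit * fraction);
        List<String> chosen = new ArrayList<>();
        long freed = 0;
        for (String cf : bySizeDescending(memstoreSizes)) {
            if (freed >= target) {
                break;
            }
            chosen.add(cf);
            freed += memstoreSizes.get(cf);
        }
        return chosen;
    }

    // Example in the spirit of the comment: eight CFs, all below 32 MB, 128 MB in total.
    static Map<String, Long> exampleSizes() {
        long mb = 1024L * 1024L;
        long[] sizesMb = {30, 25, 20, 15, 14, 10, 8, 6};
        Map<String, Long> sizes = new LinkedHashMap<>();
        for (int i = 0; i < sizesMb.length; i++) {
            sizes.put("cf" + (i + 1), sizesMb[i] * mb);
        }
        return sizes;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = exampleSizes();
        long limit = 128L * 1024L * 1024L;
        System.out.println("top 3: " + topN(sizes, 3));
        System.out.println("free 1/4: " + freeFraction(sizes, limit, 0.25));
    }
}
```

With these made-up sizes, topN(sizes, 3) picks cf1, cf2 and cf3, while freeFraction(sizes, limit, 0.25) stops after cf1 and cf2 (55 MB freed, which meets the 32 MB target). The second heuristic adapts the number of flushed families to how skewed the sizes are.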



[jira] [Commented] (HBASE-3149) Make flush decisions per column family

2013-12-18 Thread Gaurav Menghani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852025#comment-13852025
 ] 

Gaurav Menghani commented on HBASE-3149:


Ted has volunteered to port this to trunk in a separate JIRA. I will be working 
on different heuristics to see the benefits that we get.



[jira] [Commented] (HBASE-3149) Make flush decisions per column family

2013-12-18 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852042#comment-13852042
 ] 

stack commented on HBASE-3149:
--

[~gaurav.menghani] Gaurav, have you deployed this change?  If so, what do you 
see in operation?  When you talk about different heuristics, what are you 
thinking?  Thanks boss.



[jira] [Commented] (HBASE-3149) Make flush decisions per column family

2013-10-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13804528#comment-13804528
 ] 

Hadoop QA commented on HBASE-3149:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12610129/Per-CF-Memstore-Flush.diff
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 12 new 
or modified tests.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7618//console

This message is automatically generated.

 Make flush decisions per column family
 --

 Key: HBASE-3149
 URL: https://issues.apache.org/jira/browse/HBASE-3149
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Karthik Ranganathan
Assignee: Gaurav Menghani
Priority: Critical
 Fix For: 0.89-fb

 Attachments: Per-CF-Memstore-Flush.diff


 Today, the flush decision is made using the aggregate size of all column 
 families. When large and small column families co-exist, this causes many 
 small flushes of the smaller CF. We need to make per-CF flush decisions.





[jira] [Commented] (HBASE-3149) Make flush decisions per column family

2013-10-24 Thread Gaurav Menghani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13804537#comment-13804537
 ] 

Gaurav Menghani commented on HBASE-3149:


The basic idea is to maintain, for each column family's memstore, the smallest 
LSN (log sequence number) among the edits it currently holds. When we decide to 
flush a set of memstores, we find the smallest LSN among the memstores that we 
are not flushing, say X, and conclude that we can remove the logs for any edits 
with LSN less than X. We choose a particular memstore to be flushed if it 
occupies more than 't' bytes, where the global memstore size threshold is 'T' 
(t/T = 1/4 in our configuration). If no memstore holds >= t bytes but the total 
size of all the memstores is above T, we flush all the memstores. 
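A minimal Java sketch of the selection rule and log-truncation point described above, assuming each memstore is summarized by its size and the smallest LSN it holds. The Store class, the method names and the example numbers are hypothetical; the sketch selects on >= t so that selection is the exact complement of the fall-back condition, and it ignores edits that arrive while a flush is in progress.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of per-CF flush selection; not the actual patch.
final class SelectiveFlushSketch {

    // Per-CF memstore summary: current size and smallest LSN among its edits.
    static final class Store {
        final long sizeBytes;
        final long oldestLsn;

        Store(long sizeBytes, long oldestLsn) {
            this.sizeBytes = sizeBytes;
            this.oldestLsn = oldestLsn;
        }
    }

    // Select every store holding at least t = T * ratio bytes; if none qualifies
    // but the total is above T, select all stores.
    static List<String> selectStores(Map<String, Store> stores, long threshold, double ratio) {
        long t = (long) (threshold * ratio);
        long total = 0;
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Store> e : stores.entrySet()) {
            total += e.getValue().sizeBytes;
            if (e.getValue().sizeBytes >= t) {
                selected.add(e.getKey());
            }
        }
        if (selected.isEmpty() && total > threshold) {
            selected.addAll(stores.keySet());
        }
        return selected;
    }

    // X = smallest LSN among stores NOT being flushed; log entries with LSN < X are
    // no longer needed for recovery. Long.MAX_VALUE: no unflushed store pins the log.
    static long logTruncationPoint(Map<String, Store> stores, List<String> flushing) {
        long x = Long.MAX_VALUE;
        for (Map.Entry<String, Store> e : stores.entrySet()) {
            if (!flushing.contains(e.getKey())) {
                x = Math.min(x, e.getValue().oldestLsn);
            }
        }
        return x;
    }

    static Map<String, Store> example() {
        long mb = 1024L * 1024L;
        Map<String, Store> stores = new LinkedHashMap<>();
        stores.put("big", new Store(100 * mb, 100));
        stores.put("small1", new Store(6 * mb, 5));
        stores.put("small2", new Store(30 * mb, 50));
        return stores; // 136 MB in total
    }

    public static void main(String[] args) {
        long threshold = 128L * 1024L * 1024L; // T = 128 MB, so t = 32 MB
        Map<String, Store> stores = example();
        List<String> flushing = selectStores(stores, threshold, 0.25);
        System.out.println("flush " + flushing + ", drop log entries with LSN < "
                + logTruncationPoint(stores, flushing));
    }
}
```

In this example only big reaches t, and the oldest edit in small1 (LSN 5) is what limits how much of the log can be discarded, which is the HLog concern Lars George raises in his comment.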




[jira] [Commented] (HBASE-3149) Make flush decisions per column family

2013-10-14 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13794254#comment-13794254
 ] 

stack commented on HBASE-3149:
--

Thanks [~gaurav.menghani]

 Make flush decisions per column family
 --

 Key: HBASE-3149
 URL: https://issues.apache.org/jira/browse/HBASE-3149
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Karthik Ranganathan
Assignee: Gaurav Menghani
Priority: Critical

 Today, the flush decision is made using the aggregate size of all column 
 families. When large and small column families co-exist, this causes many 
 small flushes of the smaller CF. We need to make per-CF flush decisions.





[jira] [Commented] (HBASE-3149) Make flush decisions per column family

2013-06-10 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680110#comment-13680110
 ] 

Sergey Shelukhin commented on HBASE-3149:
-

Any update since last comment?

 Make flush decisions per column family
 --

 Key: HBASE-3149
 URL: https://issues.apache.org/jira/browse/HBASE-3149
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Karthik Ranganathan
Assignee: Himanshu Vashishtha
Priority: Critical
 Fix For: 0.92.3


 Today, the flush decision is made using the aggregate size of all column 
 families. When large and small column families co-exist, this causes many 
 small flushes of the smaller CF. We need to make per-CF flush decisions.



[jira] [Commented] (HBASE-3149) Make flush decisions per column family

2013-02-13 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13577618#comment-13577618
 ] 

Himanshu Vashishtha commented on HBASE-3149:


This is a useful feature; I'm working on it.



[jira] [Commented] (HBASE-3149) Make flush decisions per column family

2012-02-22 Thread Nicolas Spiegelberg (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213858#comment-13213858
 ] 

Nicolas Spiegelberg commented on HBASE-3149:


@Lars/Stack: note that the number of StoreFiles needed to store N amount of 
data is O(log N) with the existing compaction algorithm.  This means that 
setting the compaction min size to a low value will not result in significantly 
more files.  Furthermore, what's hurting performance is not the number of files 
but the size of each file.  The extra files will be very small and take up only 
a minority of the space in the LRU cache.  Every time you unnecessarily compact 
files, you have to repopulate that StoreFile in the LRU cache and incur a lot of 
disk reads in addition to the obvious increase in writes.  This is all to say that 
I would recommend defaulting it that low, because the downsides are minimal and 
the benefit can be substantial IO gains.
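One way to see the O(log N) claim, as a rough bound rather than a statement about the exact 0.92 code: assume the ratio-based rule described in the hbase.hstore.compaction.min.size property quoted in this thread, a compaction ratio r, at least 3 files per compaction, no cap on the number of files per compaction, and every store file containing at least one flush of f bytes. After each selection pass, every file except the two newest is then larger than r times the combined size of the files newer than it. Numbering the n files oldest first, with sizes s_i, suffix sums S_i = s_i + ... + s_{n-1}, and N = S_0:

```latex
\begin{aligned}
s_i > r\,S_{i+1} &\;\Longrightarrow\; S_i = s_i + S_{i+1} > (1+r)\,S_{i+1}, \qquad 0 \le i \le n-3,\\
N = S_0 &> (1+r)^{n-2}\,S_{n-2} \;\ge\; (1+r)^{n-2}\cdot 2f,\\
&\;\Longrightarrow\; n < 2 + \log_{1+r}\frac{N}{2f} = O(\log N).
\end{aligned}
```

For example, with r = 1.2 and N = 1000f this gives n < 2 + log_{2.2} 500, roughly 9.9, i.e. at most 9 store files.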

bq. At the same time, I think this issue is still worth some time; if there are lots of 
CFs and only one is filling, it's silly to flush the others as we do now because 
one is over the threshold.

Why is this silly?  With cache-on-write, the data is still cached in memory.  
It's just migrated from the MemCache to the BlockCache, which has comparable 
performance.  Furthermore, BlockCache data is compressed, so it then takes up 
less space.  Flushing also minimizes the number of HLogs and decreases recovery 
time.  Flushing would be bad if it meant we weren't optimally using the global 
MemStore size, but we currently are.

bq. This surely seems a specific setting for this use-case, and there are 
others that need a slightly different setting. If you mix those two on the same 
cluster, then having only one global setting to adjust this seems restrictive? 
Should this be a setting per table, like the flush size?

I think this is a better default, not that it's a one-size-fits-all setting.  I agree 
that this should be toggleable on a per-CF basis, hence HBASE-5335.

 Make flush decisions per column family
 --

 Key: HBASE-3149
 URL: https://issues.apache.org/jira/browse/HBASE-3149
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Karthik Ranganathan
Assignee: Nicolas Spiegelberg
Priority: Critical
 Fix For: 0.92.1


 Today, the flush decision is made using the aggregate size of all column 
 families. When large and small column families co-exist, this causes many 
 small flushes of the smaller CF. We need to make per-CF flush decisions.





[jira] [Commented] (HBASE-3149) Make flush decisions per column family

2012-02-22 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214372#comment-13214372
 ] 

stack commented on HBASE-3149:
--

@Nicolas I think I follow.  I opened HBASE-5461.  Let me try it.

bq. Why is this silly? 

Because I was seeing a plethora of small files as a problem, but given your 
explanation above, I think I grok that it's not the many small files that are the 
problem; it's that with the way-high min size, our selection was too inclusive, 
and so we end up doing loads of rewriting.





[jira] [Commented] (HBASE-3149) Make flush decisions per column family

2012-02-22 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214382#comment-13214382
 ] 

Lars Hofhansl commented on HBASE-3149:
--

Thanks for explaining, Nicolas.
I wonder if a good default would be some fraction of the flushsize. Maybe 
1/4*flushsize, or something.





[jira] [Commented] (HBASE-3149) Make flush decisions per column family

2012-02-21 Thread Lars George (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13212938#comment-13212938
 ] 

Lars George commented on HBASE-3149:


bq. At the same time, I think this issue is still worth some time; if there are lots of 
CFs and only one is filling, it's silly to flush the others as we do now because 
one is over the threshold.

I thought so too. Setting hbase.hstore.compaction.min.size to 4MB, with 
the flush size at 256MB, means you will never compact flush files larger 
than 4MB. In other words, only if you are flushing small files (say, from a 
small, dependent column family) do you run a minor compaction on them. For 
the larger family you typically do not run those at all, right?

This surely seems a specific setting for this use-case, and there are others 
that need a slightly different setting. If you mix those two on the same 
cluster, then having only one global setting to adjust this seems restrictive? 
Should this be a setting per table, like the flush size?

It still seems to me that decoupling is something we should have available as well. 
But I thought about it for a while and discussed it with various people: it 
seems that decoupling brings its own set of issues; for example, you might end 
up with too many HLog files because the small family is flushed only rarely. 
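The HLog concern can be made concrete with a small sketch, assuming each log file is summarized by the highest sequence id it contains and each family by the sequence id of its oldest unflushed edit. A log file can only be archived once all of its edits are persisted in store files, so a rarely flushed family holds back the log file containing its oldest unflushed edit and every later one. Class and method names, and the numbers, are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: how many log files can be archived, given per-family
// oldest unflushed sequence ids.
final class LogPinningSketch {

    // A log file is archivable only if its highest sequence id is below the
    // oldest unflushed sequence id of every column family.
    static int archivableLogs(long[] logMaxSeqIds, Map<String, Long> oldestUnflushedSeqId) {
        long pin = Long.MAX_VALUE;
        for (long id : oldestUnflushedSeqId.values()) {
            pin = Math.min(pin, id);
        }
        int count = 0;
        for (long maxId : logMaxSeqIds) {
            if (maxId < pin) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Ten log files of 1000 edits each: highest ids 1000, 2000, ..., 10000.
        long[] logs = new long[10];
        for (int i = 0; i < logs.length; i++) {
            logs[i] = 1000L * (i + 1);
        }
        Map<String, Long> oldest = new LinkedHashMap<>();
        oldest.put("large", 9500L); // flushed recently
        oldest.put("small", 120L);  // not flushed since sequence id 120
        System.out.println("archivable with small CF pinning: " + archivableLogs(logs, oldest));
        oldest.remove("small");     // small CF flushed, nothing left in its memstore
        System.out.println("archivable after small CF flush: " + archivableLogs(logs, oldest));
    }
}
```

Here no log file can be archived while the small family still holds edit 120; once it is flushed, the nine files ending below 9500 become archivable.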

 Make flush decisions per column family
 --

 Key: HBASE-3149
 URL: https://issues.apache.org/jira/browse/HBASE-3149
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Karthik Ranganathan
Assignee: Nicolas Spiegelberg
Priority: Critical
 Fix For: 0.92.1


 Today, the flush decision is made using the aggregate size of all column 
 families. When large and small column families co-exist, this causes many 
 small flushes of the smaller CF. We need to make per-CF flush decisions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3149) Make flush decisions per column family

2012-02-20 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13212166#comment-13212166
 ] 

stack commented on HBASE-3149:
--

@Nicolas I wonder about this... hbase.hstore.compaction.min.size.  When we 
compact, don't we have to take adjacent files as part of our ACID guarantees?  
Wouldn't this frustrate that?  (I'll take a look... tomorrow).  I'm wondering 
because I want to figure out how to make it so we favor reference files, so they 
are always included in a compaction.

 Make flush decisions per column family
 --

 Key: HBASE-3149
 URL: https://issues.apache.org/jira/browse/HBASE-3149
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Karthik Ranganathan
Assignee: Nicolas Spiegelberg
 Fix For: 0.92.1


 Today, the flush decision is made using the aggregate size of all column 
 families. When large and small column families co-exist, this causes many 
 small flushes of the smaller CF. We need to make per-CF flush decisions.





[jira] [Commented] (HBASE-3149) Make flush decisions per column family

2012-02-19 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13211614#comment-13211614
 ] 

Lars Hofhansl commented on HBASE-3149:
--

@Nicolas: Interesting bit about hstore.compaction.min.size. I'm curious, is 4MB 
something that works specifically for your setup, or would you generally 
recommend setting it that low?
It probably has to do with whether compression is enabled, how many CFs and 
relative sizes, etc.

Maybe instead of defaulting it to flushsize, we could default it to flushsize/2 
or flushsize/4...?


 Make flush decisions per column family
 --

 Key: HBASE-3149
 URL: https://issues.apache.org/jira/browse/HBASE-3149
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Karthik Ranganathan
Assignee: Nicolas Spiegelberg

 Today, the flush decision is made using the aggregate size of all column 
 families. When large and small column families co-exist, this causes many 
 small flushes of the smaller CF. We need to make per-CF flush decisions.





[jira] [Commented] (HBASE-3149) Make flush decisions per column family

2012-02-18 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13211204#comment-13211204
 ] 

stack commented on HBASE-3149:
--

Thanks @Nicolas (and thanks @Mubarak; this sounds like something to indeed get 
into 0.92).

At the same time, I think this issue is still worth some time; if there are lots of CFs 
and only one is filling, it's silly to flush the others as we do now because one 
is over the threshold.





[jira] [Commented] (HBASE-3149) Make flush decisions per column family

2012-02-17 Thread Mubarak Seyed (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210671#comment-13210671
 ] 

Mubarak Seyed commented on HBASE-3149:
--

@Nicolas,
Is there any update on this issue? We have a production use-case wherein 80% of 
the data goes to one CF and the remaining 20% goes to two other CFs. I can collaborate 
with you if you are interested in pursuing a patch. Thanks.





[jira] [Commented] (HBASE-3149) Make flush decisions per column family

2012-02-17 Thread Nicolas Spiegelberg (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210687#comment-13210687
 ] 

Nicolas Spiegelberg commented on HBASE-3149:


@Mubarak: I think you are probably more interested in tuning the compaction 
settings.  The initial reason for this JIRA was higher network IO.  The actual 
problem was that the min unconditional compact size was too high and caused bad 
compaction decisions.  We fixed this by lowering the min size from the default 
of the flush size (256MB, for us) to 4MB.

{code}
  <property>
    <name>hbase.hstore.compaction.min.size</name>
    <value>4194304</value>
    <description>
      The minimum compaction size. All files below this size are always
      included into a compaction, even if outside compaction ratio times
      the total size of all files added to compaction so far.
    </description>
  </property>
{code}

We identified this a while ago and I thought we were going to change the 
default for 0.92, but it looks like it's still in the Store.java code :(   A 
better use of your time would be to verify that this reduces your IO and write 
up a JIRA to change the default.
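To see why the min size matters, here is a simplified sketch of ratio-based selection consistent with the property description above: starting from the oldest file, skip any file that is larger than both the min size and ratio times the sum of the newer files, then compact the rest if enough files remain. It is not a copy of Store.java; it ignores the cap on files per compaction, reference files and major compactions, and assumes a ratio of 1.2 and a minimum of 3 files (assumed defaults for hbase.hstore.compaction.ratio and hbase.hstore.compactionThreshold). File sizes are in MB and made up.

```java
// Simplified, hypothetical sketch of ratio-based minor compaction selection.
final class CompactionSelectionSketch {

    // Files are ordered oldest first. Returns the index of the oldest file to
    // compact (files [start, n) are compacted), or -1 if too few would be selected.
    static int selectStart(long[] sizes, long minCompactSize, double ratio, int minFiles) {
        int n = sizes.length;
        long[] newerSum = new long[n + 1]; // newerSum[i] = sizes[i] + ... + sizes[n - 1]
        for (int i = n - 1; i >= 0; i--) {
            newerSum[i] = sizes[i] + newerSum[i + 1];
        }
        int pos = 0;
        // Skip old files that are above the min size AND above ratio * (sum of newer files).
        while (pos < n - minFiles + 1
                && sizes[pos] > Math.max(minCompactSize, (long) (newerSum[pos + 1] * ratio))) {
            pos++;
        }
        return (n - pos >= minFiles) ? pos : -1;
    }

    // Total size rewritten by compacting files [start, n); 0 if nothing is selected.
    static long rewritten(long[] sizes, int start) {
        if (start < 0) {
            return 0;
        }
        long sum = 0;
        for (int i = start; i < sizes.length; i++) {
            sum += sizes[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] sizesMb = {100, 20, 12, 12, 12}; // oldest first, e.g. a small CF's store files
        for (long minSizeMb : new long[] {256, 4}) {
            int start = selectStart(sizesMb, minSizeMb, 1.2, 3);
            System.out.println("min.size=" + minSizeMb + "MB -> compact from index " + start
                    + ", rewriting " + rewritten(sizesMb, start) + "MB");
        }
    }
}
```

With a 256MB min size every file is below the threshold, so the 100MB file is rewritten again together with the new flushes (156MB rewritten); with 4MB it is left alone until the newer files grow large enough relative to it (56MB rewritten). This is consistent with the high flush:compaction ratios reported for the small families in this thread.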





[jira] [Commented] (HBASE-3149) Make flush decisions per column family

2012-02-17 Thread Mubarak Seyed (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210698#comment-13210698
 ] 

Mubarak Seyed commented on HBASE-3149:
--

Thanks Nicolas. Will try with 4 MB and create a JIRA.





[jira] Commented: (HBASE-3149) Make flush decisions per column family

2011-01-31 Thread Schubert Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12988784#comment-12988784
 ] 

Schubert Zhang commented on HBASE-3149:
---

This JIRA is very useful in practice. 
In HBase, horizontal partitioning by rowkey ranges makes regions, and vertical 
partitioning by column family makes stores. Together, these horizontal and vertical 
partitioning schemes carve the data into rectangles: the stores in HBase.

The memstore is based on the store, so flushes and compactions should also be 
based on the store. The memstoreSize in HRegion should be in HStore.

For flexible configuration, I think we should be able to configure the 
memstore size (i.e. hbase.hregion.memstore.flush.size) at the column-family level 
(when creating a table). And if possible, I want maxStoreSize to also be 
configurable per column family.





[jira] Commented: (HBASE-3149) Make flush decisions per column family

2011-01-19 Thread ryan rawson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12983972#action_12983972
 ] 

ryan rawson commented on HBASE-3149:


OK, got it. As long as we don't generate a seqid per family, we are all good.


 Make flush decisions per column family
 --

 Key: HBASE-3149
 URL: https://issues.apache.org/jira/browse/HBASE-3149
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Karthik Ranganathan
Assignee: Nicolas Spiegelberg

 Today, the flush decision is made using the aggregate size of all column 
 families. When large and small column families co-exist, this causes many 
 small flushes of the smaller CF. We need to make per-CF flush decisions.




[jira] Commented: (HBASE-3149) Make flush decisions per column family

2011-01-18 Thread Nicolas Spiegelberg (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12983312#action_12983312
 ] 

Nicolas Spiegelberg commented on HBASE-3149:


Some interesting stats. We did some rough calculations internally to see what 
effect an uneven distribution of data into column families was having on our 
network IO. Our data distribution for 3 column families was 1:1:20. When we 
looked at the flush:minor-compaction ratio for each of the store files, the 
large column family had a 1:2 ratio but the small CFs both had a 1:20 ratio! We 
are looking at roughly a 10% network IO decrease if we can bring those other 2 
CFs down to a 1:2 ratio as well.
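Back-of-the-envelope arithmetic for why the small families suffer, assuming a single region-wide 64 MB flush threshold (the size used in the 3x64MB figure in this thread) and writes arriving in the quoted 1:1:20 proportion. The 22 x 64 MB write volume is an arbitrary round number; the sketch only models flushed file sizes and does not reproduce the network IO estimate.

```python
from typing import Dict, Tuple

MB = 1024 * 1024


def aggregate_policy(ratio: Dict[str, int], flush_size: int,
                     bytes_written: int) -> Dict[str, Tuple[float, float]]:
    """(file size per flush, number of flushes) per family when the region
    flushes every family as soon as the *sum* of its memstores hits flush_size."""
    total = sum(ratio.values())
    flushes = bytes_written / flush_size
    return {cf: (flush_size * share / total, flushes) for cf, share in ratio.items()}


def per_family_policy(ratio: Dict[str, int], flush_size: int,
                      bytes_written: int) -> Dict[str, Tuple[float, float]]:
    """Same figures when each family flushes only at its *own* flush_size."""
    total = sum(ratio.values())
    return {cf: (float(flush_size), bytes_written * share / total / flush_size)
            for cf, share in ratio.items()}


ratio = {"small1": 1, "small2": 1, "large": 20}
written = 22 * 64 * MB
for name, policy in (("aggregate", aggregate_policy), ("per-family", per_family_policy)):
    for cf, (size, n) in policy(ratio, 64 * MB, written).items():
        print(f"{name:10} {cf:6} {n:4.0f} flushes of {size / MB:5.1f} MB")
```

Under these assumptions each small family writes 22 files of about 2.9 MB under the aggregate policy instead of one 64 MB file, and every one of those tiny files later has to be compacted.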

 Make flush decisions per column family
 --

 Key: HBASE-3149
 URL: https://issues.apache.org/jira/browse/HBASE-3149
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Karthik Ranganathan

 Today, the flush decision is made using the aggregate size of all column 
 families. When large and small column families co-exist, this causes many 
 small flushes of the smaller CF. We need to make per-CF flush decisions.




[jira] Commented: (HBASE-3149) Make flush decisions per column family

2011-01-18 Thread ryan rawson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12983510#action_12983510
 ] 

ryan rawson commented on HBASE-3149:


if you are going to generate a sequence id for every CF, then we will
need to create and use a new synthetic ID for atomic views.


 Make flush decisions per column family
 --

 Key: HBASE-3149
 URL: https://issues.apache.org/jira/browse/HBASE-3149
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Karthik Ranganathan
Assignee: Nicolas Spiegelberg

 Today, the flush decision is made using the aggregate size of all column 
 families. When large and small column families co-exist, this causes many 
 small flushes of the smaller CF. We need to make per-CF flush decisions.




[jira] Commented: (HBASE-3149) Make flush decisions per column family

2011-01-18 Thread Nicolas Spiegelberg (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12983523#action_12983523
 ] 

Nicolas Spiegelberg commented on HBASE-3149:


@ryan: the main work in step #3 isn't HBASE-2856 work. It's roughly changing
HLog.lastSeqWritten from Map<region, long> to Map<store, long>, plus all the
refactoring of the HLog code that this entails.
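A rough model of the change above, assuming lastSeqWritten records the oldest unflushed sequence id so that old logs can be discarded. Keying it by store rather than by region lets one family flush without losing the record for its siblings, while a single counter still hands out one sequence id per edit regardless of family, which is the condition raised about not generating a seqid per family. SeqTracker and its methods are illustrative names, not HLog's actual API.

```python
import itertools
from typing import Dict, Optional, Tuple

Store = Tuple[str, str]  # (region, family)


class SeqTracker:
    """Illustrative stand-in for a per-store lastSeqWritten map.

    One shared counter assigns sequence ids to every edit, whatever its
    family, so edits keep a single ordering (no per-family seqid).
    """

    def __init__(self) -> None:
        self._next_seq = itertools.count(1)
        # Oldest unflushed sequence id per (region, family) store.
        self.last_seq_written: Dict[Store, int] = {}

    def append(self, region: str, family: str) -> int:
        seq = next(self._next_seq)
        # Only the first unflushed edit in a store matters for log truncation.
        self.last_seq_written.setdefault((region, family), seq)
        return seq

    def flush_store(self, region: str, family: str) -> None:
        # Flushing one family no longer clears the entry for its siblings.
        self.last_seq_written.pop((region, family), None)

    def oldest_unflushed(self) -> Optional[int]:
        # Logs containing only edits older than this id are safe to discard.
        return min(self.last_seq_written.values(), default=None)
```

With a region-keyed map, flushing "big" would have dropped the region's entry even though "small" still held unflushed edits; keyed by store, the oldest unflushed id correctly moves to the first edit in "small".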

 Make flush decisions per column family
 --

 Key: HBASE-3149
 URL: https://issues.apache.org/jira/browse/HBASE-3149
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Karthik Ranganathan
Assignee: Nicolas Spiegelberg

 Today, the flush decision is made using the aggregate size of all column 
 families. When large and small column families co-exist, this causes many 
 small flushes of the smaller CF. We need to make per-CF flush decisions.




[jira] Commented: (HBASE-3149) Make flush decisions per column family

2010-10-26 Thread Karthik Ranganathan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12925090#action_12925090
 ] 

Karthik Ranganathan commented on HBASE-3149:


Yes, agreed that the memory implication is different.

Eventually, would it not be better to enforce the memory limit through a
combination of flush sizes and a cap on the number of regions we create? Ideally
we should allow different flush sizes for different CFs, since their KV sizes
could be very different...

Shall I just make this an option in the conf for now, with the default left as
it is?
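One way such a conf option could behave, sketched under the assumption of a hypothetical boolean switch (called per_family_flush here; it is not a real HBase property) that defaults to the current aggregate behavior.

```python
from typing import Dict, List


def stores_to_flush(memstore_sizes: Dict[str, int], flush_size: int,
                    per_family_flush: bool = False) -> List[str]:
    """Pick which families of one region to flush.

    per_family_flush=False keeps the current behavior: once the region's
    total memstore size reaches flush_size, every family is flushed.
    per_family_flush=True flushes only families that reached it on their own.
    """
    if not per_family_flush:
        total = sum(memstore_sizes.values())
        return sorted(memstore_sizes) if total >= flush_size else []
    return sorted(cf for cf, size in memstore_sizes.items() if size >= flush_size)
```

For a region holding 3 MB, 3 MB and 60 MB with a 64 MB flush size, the default flushes all three families (two of them as tiny files), while the per-family switch flushes nothing yet and lets the region hold 66 MB, which is exactly the memory trade-off discussed in this thread.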

 Make flush decisions per column family
 --

 Key: HBASE-3149
 URL: https://issues.apache.org/jira/browse/HBASE-3149
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Karthik Ranganathan

 Today, the flush decision is made using the aggregate size of all column 
 families. When large and small column families co-exist, this causes many 
 small flushes of the smaller CF. We need to make per-CF flush decisions.




[jira] Commented: (HBASE-3149) Make flush decisions per column family

2010-10-25 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12924705#action_12924705
 ] 

Jean-Daniel Cryans commented on HBASE-3149:
---

I have been thinking about this one for some time... I think it makes sense in
many ways, since a common problem with multiple CFs is that during the initial
import the user ends up with thousands of small store files because one family
grows faster and triggers the flushes, which in turn generates incredible
compaction churn. On the other hand, it means we would almost treat a family
as a region, e.g. one region with 3 CFs could hold up to 3x64MB in its memstores.
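The 3x64MB bound generalizes in a simple way. Assuming every family flushes independently at the same flush size, and ignoring any server-wide memstore limit, a region can hold roughly families x flush size just below the flush point, and a server multiplies that by its region count. The region count in the usage note is an arbitrary example.

```python
MB = 1024 * 1024


def worst_case_memstore_bytes(num_regions: int, num_families: int,
                              flush_size: int, per_family_flush: bool) -> int:
    """Approximate upper bound on memstore memory before any flush triggers.

    Aggregate policy: a region flushes once the sum of its memstores hits
    flush_size, so each region holds at most about flush_size.
    Per-family policy: each family can grow to flush_size on its own.
    """
    per_region = flush_size * (num_families if per_family_flush else 1)
    return num_regions * per_region
```

For 100 regions with 3 families and a 64 MB flush size, the bound grows from 6400 MB under the aggregate policy to 19200 MB with per-family flushing.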

 Make flush decisions per column family
 --

 Key: HBASE-3149
 URL: https://issues.apache.org/jira/browse/HBASE-3149
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Karthik Ranganathan

 Today, the flush decision is made using the aggregate size of all column 
 families. When large and small column families co-exist, this causes many 
 small flushes of the smaller CF. We need to make per-CF flush decisions.
