[jira] [Commented] (HBASE-14383) Compaction improvements
[ https://issues.apache.org/jira/browse/HBASE-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14966112#comment-14966112 ]

Vladimir Rodionov commented on HBASE-14383:
-------------------------------------------

Some patches have been submitted recently, including HBASE-14388.

> Compaction improvements
> -----------------------
>
>                 Key: HBASE-14383
>                 URL: https://issues.apache.org/jira/browse/HBASE-14383
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>             Fix For: 2.0.0
>
>
> Still major issue in many production environments. The general recommendation
> - disabling region splitting and major compactions to reduce unpredictable
> IO/CPU spikes, especially during peak times and running them manually during
> off peak times. Still do not resolve the issues completely.
> h3. Flush storms
> * rolling WAL events across cluster can be highly correlated, hence flushing
> memstores, hence triggering minor compactions, that can be promoted to major
> ones. These events are highly correlated in time if there is a balanced
> write-load on the regions in a table.
> * the same is true for memstore flushing due to periodic memstore flusher
> operation.
> Both above may produce *flush storms* which are as bad as *compaction
> storms*.
> What can be done here. We can spread these events over time by randomizing
> (with jitter) several config options:
> # hbase.regionserver.optionalcacheflushinterval
> # hbase.regionserver.flush.per.changes
> # hbase.regionserver.maxlogs
> h3. ExploringCompactionPolicy max compaction size
> One more optimization can be added to ExploringCompactionPolicy. To limit
> size of a compaction there is a config parameter one could use
> hbase.hstore.compaction.max.size. It would be nice to have two separate
> limits: for peak and off peak hours.
> h3. ExploringCompactionPolicy selection evaluation algorithm
> Too simple? Selection with more files always wins, selection of smaller size
> wins if number of files is the same.
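
The jitter idea in the issue description (randomizing settings such as hbase.regionserver.optionalcacheflushinterval so that region servers do not all act at the same moment) could look roughly like the sketch below. It is only an illustration under stated assumptions: the class JitteredSetting, the method withJitter and the 20% jitter fraction are hypothetical names and values, not part of HBase, and the one-hour base interval is just an example input.

```java
import java.util.Random;

/** Hypothetical helper for illustration only; this class is not part of HBase. */
final class JitteredSetting {
    private JitteredSetting() {}

    /**
     * Scales a configured base value by a random factor drawn uniformly
     * from [1 - jitter, 1 + jitter), so that servers sharing the same
     * configuration stop triggering the event in lockstep.
     */
    static long withJitter(long base, double jitter, Random rnd) {
        if (base < 0 || jitter < 0.0 || jitter > 1.0) {
            throw new IllegalArgumentException("base must be >= 0 and jitter in [0, 1]");
        }
        // nextDouble() is in [0, 1), so the offset below is in [-jitter, +jitter).
        double factor = 1.0 + jitter * (2.0 * rnd.nextDouble() - 1.0);
        return Math.round(base * factor);
    }

    public static void main(String[] args) {
        long baseIntervalMs = 3_600_000L; // one hour, used here as an example value
        Random rnd = new Random();
        for (int server = 0; server < 3; server++) {
            // Each simulated region server gets its own effective interval.
            System.out.println("rs-" + server + ": " + withJitter(baseIntervalMs, 0.2, rnd) + " ms");
        }
    }
}
```

Each server would compute its effective value once at startup; with a jitter fraction of 0.2 and a one-hour base, the effective intervals fall between 48 and 72 minutes.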
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14383) Compaction improvements
[ https://issues.apache.org/jira/browse/HBASE-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933979#comment-14933979 ]

Vladimir Rodionov commented on HBASE-14383:
-------------------------------------------

Can somebody take a look at HBASE-14468? It is a totally separate feature, a new compaction policy, and it has passed QA.
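
On the compaction policy side, the issue description summarizes the ExploringCompactionPolicy selection rule as "selection with more files always wins, selection of smaller size wins if number of files is the same" and asks whether that is too simple. A minimal sketch of the rule exactly as worded there is shown below. The class SelectionRule and its methods are hypothetical and are not taken from the HBase source, which may apply additional checks; file sizes are plain numbers chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the rule as worded in the issue description; not HBase source. */
final class SelectionRule {
    private SelectionRule() {}

    /** Sums the sizes of the files in one candidate selection. */
    static long totalSize(List<Long> fileSizes) {
        long sum = 0;
        for (long size : fileSizes) {
            sum += size;
        }
        return sum;
    }

    /**
     * Returns true if the candidate beats the current best selection:
     * more files always wins; with equal file counts, smaller total size wins.
     */
    static boolean isBetter(List<Long> best, List<Long> candidate) {
        if (candidate.size() != best.size()) {
            return candidate.size() > best.size();
        }
        return totalSize(candidate) < totalSize(best);
    }

    public static void main(String[] args) {
        List<Long> twoSmall = Arrays.asList(10L, 10L);
        List<Long> threeLarge = Arrays.asList(500L, 500L, 500L);
        // Three large files beat two small ones purely on file count.
        System.out.println(isBetter(twoSmall, threeLarge)); // prints "true"
    }
}
```

The example in main shows why the rule may be considered too simple: a selection that rewrites 1500 units of data is preferred over one that rewrites 20, only because it contains one more file.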
[jira] [Commented] (HBASE-14383) Compaction improvements
[ https://issues.apache.org/jira/browse/HBASE-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14910049#comment-14910049 ]

Vladimir Rodionov commented on HBASE-14383:
-------------------------------------------

A patch for HBASE-14468 (FIFO compaction policy) was submitted. It includes HBASE-14467 (and its sub-task).
[jira] [Commented] (HBASE-14383) Compaction improvements
[ https://issues.apache.org/jira/browse/HBASE-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909570#comment-14909570 ]

Vladimir Rodionov commented on HBASE-14383:
-------------------------------------------

Patch for HBASE-14467 was submitted. Can somebody take a look? Thanks.
[jira] [Commented] (HBASE-14383) Compaction improvements
[ https://issues.apache.org/jira/browse/HBASE-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876130#comment-14876130 ]

Vladimir Rodionov commented on HBASE-14383:
-------------------------------------------

{quote}
Hmm, does this policy mean that we may end up not flushing data even with periodic flusher? The periodic flusher should be like a force flush to be effective.
{quote}
For small stores, a flush happens only if they hold data that is older than the periodic flush interval. This is how it works today. In theory, if you have a small heap and a large number of regions, you won't be able to load data fast without being totally blocked periodically: all memstores are < 16MB, so they will be flushed only once an hour.
[jira] [Commented] (HBASE-14383) Compaction improvements
[ https://issues.apache.org/jira/browse/HBASE-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876121#comment-14876121 ]

Andrew Purtell commented on HBASE-14383:
----------------------------------------

There are a couple of suggestions above to set maxlogs dynamically. I concur. In that case there is little (or no) point in leaving it as something a user can set to a fixed value. Remove it. One config knob down thanks to autotuning, a ton more to go. (smile)
[jira] [Commented] (HBASE-14383) Compaction improvements
[ https://issues.apache.org/jira/browse/HBASE-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804474#comment-14804474 ]

Lars Hofhansl commented on HBASE-14383:
---------------------------------------

I think so. maxlogs is really a function of heap available for the memstores and the HDFS block size used. Something like:
{{maxlogs = memstore heap / (HDFS blocksize * 0.95)}}
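
As a worked example of the formula proposed in this comment, maxlogs = memstore heap / (HDFS blocksize * 0.95), the sketch below computes a suggested value from the two inputs. The class MaxLogsEstimate and the method suggestedMaxLogs are hypothetical names, not HBase APIs, and rounding the result up is an assumption made here so that the WAL files can cover at least the whole memstore heap.

```java
/** Hypothetical calculator for illustration only; this class is not part of HBase. */
final class MaxLogsEstimate {
    private MaxLogsEstimate() {}

    /**
     * Applies maxlogs = memstore heap / (HDFS block size * rollFraction).
     * The result is rounded up (an assumption of this sketch) so that the
     * combined WAL capacity is never smaller than the memstore heap.
     */
    static int suggestedMaxLogs(long memstoreHeapBytes, long hdfsBlockSizeBytes, double rollFraction) {
        if (memstoreHeapBytes < 0 || hdfsBlockSizeBytes <= 0 || rollFraction <= 0.0) {
            throw new IllegalArgumentException("sizes and rollFraction must be positive");
        }
        double walFileBytes = hdfsBlockSizeBytes * rollFraction;
        return (int) Math.ceil(memstoreHeapBytes / walFileBytes);
    }

    public static void main(String[] args) {
        long memstoreHeap = 4L * 1024 * 1024 * 1024; // 4 GB available to memstores (example value)
        long blockSize = 128L * 1024 * 1024;         // 128 MB HDFS block size (example value)
        // 4 GB / (128 MB * 0.95) is about 33.7, which rounds up to 34.
        System.out.println(suggestedMaxLogs(memstoreHeap, blockSize, 0.95)); // prints "34"
    }
}
```

A region server could evaluate this once during initialization from its actual heap and block size instead of relying on a fixed configured number.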
[jira] [Commented] (HBASE-14383) Compaction improvements
[ https://issues.apache.org/jira/browse/HBASE-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804687#comment-14804687 ]

Vladimir Rodionov commented on HBASE-14383:
-------------------------------------------

{quote}
Where is this code?
{quote}
In FlushLargeStoresPolicy. We flush only stores that are 16MB in size or greater; otherwise we check whether the store is old enough to be flushed.
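
The per-store decision described in this comment (flush stores of 16MB or more, otherwise flush only if the store is old enough) can be sketched as follows. This is a simplified illustration of the behavior as described in the thread, not the actual FlushLargeStoresPolicy code: the class SmallStoreFlushCheck, its method and both constants are hypothetical, and "old enough" is assumed to mean older than a one-hour periodic flush interval.

```java
/** Hypothetical sketch of the described decision; not the real FlushLargeStoresPolicy. */
final class SmallStoreFlushCheck {
    private SmallStoreFlushCheck() {}

    /** Size at or above which a store is always selected (16MB, as stated in the thread). */
    static final long LOWER_BOUND_BYTES = 16L * 1024 * 1024;

    /** Assumed periodic flush interval of one hour, in milliseconds. */
    static final long PERIODIC_INTERVAL_MS = 3_600_000L;

    /**
     * Large stores are flushed right away; small stores are flushed only
     * once their oldest unflushed edit is older than the periodic interval.
     */
    static boolean shouldFlush(long memstoreSizeBytes, long oldestEditAgeMs) {
        if (memstoreSizeBytes >= LOWER_BOUND_BYTES) {
            return true;
        }
        return oldestEditAgeMs > PERIODIC_INTERVAL_MS;
    }

    public static void main(String[] args) {
        long fiveMb = 5L * 1024 * 1024;
        System.out.println(shouldFlush(LOWER_BOUND_BYTES, 0L)); // large store, prints "true"
        System.out.println(shouldFlush(fiveMb, 60_000L));       // small and recent, prints "false"
        System.out.println(shouldFlush(fiveMb, 3_600_001L));    // small but old, prints "true"
    }
}
```

Under this rule a region whose stores all stay below the size bound is flushed only by age, so its edits can remain in the WAL for up to a full interval.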
[jira] [Commented] (HBASE-14383) Compaction improvements
[ https://issues.apache.org/jira/browse/HBASE-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804682#comment-14804682 ]

Enis Soztutar commented on HBASE-14383:
---------------------------------------

bq. flush policy ignores all files less than 15MB.
Where is this code? I could not find anything in the periodic or non-periodic flush requests that prevents flush requests.

bq. maxlogs is really a function of heap available for the memstores and the HDFS block size used. Something like: maxlogs = memstore heap / (HDFS blocksize * 0.95)
This assumes that all memstores are getting updates. If a memstore stops getting updates, it will not flush for ~0.5 hour (expected) unless it is the biggest memstore left.

bq. Can we just default it to that? Maybe with 10% padding.
Maybe we can instead set the limit to 2x or 3x.
[jira] [Commented] (HBASE-14383) Compaction improvements
[ https://issues.apache.org/jira/browse/HBASE-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804741#comment-14804741 ]

Enis Soztutar commented on HBASE-14383:
---------------------------------------

bq. In a FlushLargeStoresPolicy, We flush only stores with 16MB in size and greater, otherwise we check if if it is old enough to be flushed.
Hmm, does this policy mean that we may end up not flushing data even with the periodic flusher? The periodic flusher should act like a forced flush to be effective.
[jira] [Commented] (HBASE-14383) Compaction improvements
[ https://issues.apache.org/jira/browse/HBASE-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746749#comment-14746749 ]

stack commented on HBASE-14383:
-------------------------------

I'd be for upping the max logs number (I have seen cases where it ran away up into the thousands, so some guard would be good).
[jira] [Commented] (HBASE-14383) Compaction improvements
[ https://issues.apache.org/jira/browse/HBASE-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746608#comment-14746608 ]

Elliott Clark commented on HBASE-14383:
---------------------------------------

bq. Also, not related to compactions, but I have seen cases where there are not enough regions per region server to fill the whole memstore space with the 128MB flush size, a few active regions and big heaps.
I'm hopeful that we can raise that per-memstore limit a lot and let the auto-tuning and the periodic flusher work together, so that all of the space is better used while MTTR is not too bad.
[jira] [Commented] (HBASE-14383) Compaction improvements
[ https://issues.apache.org/jira/browse/HBASE-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746786#comment-14746786 ]

Vladimir Rodionov commented on HBASE-14383:
-------------------------------------------

{quote}
MTTR depends not on the max number of WAL files but on the current load and the PMF interval.
{quote}
and on memstore size, of course.
[jira] [Commented] (HBASE-14383) Compaction improvements
[ https://issues.apache.org/jira/browse/HBASE-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746783#comment-14746783 ]

Vladimir Rodionov commented on HBASE-14383:
-------------------------------------------

[~saint@gmail.com]:
{quote}
I'd be for upping the max logs number (have seen cases where it ran away up to the thousands so some guard would be good)
{quote}
That is a very degenerate case. I have thought about this: it is possible to have many CFs in a table and very small flush files. By default, the flush policy ignores all files smaller than 15MB. Imagine that all the files in a region's memstores selected for flushing are smaller than 15MB => there will be no flush and the number of WALs will keep growing (indefinitely, by the way).

We probably need *hbase.regionserver.maxlogs* as a safeguard against runaway WALs during a prolonged burst load, when the data ingested per RS in one PMF flush interval (1h) is much greater than the overall memstore capacity.

I agree we have to up the default value of *hbase.regionserver.maxlogs*, but it should be set during RS init and not statically. We have to make sure that the overall WAL capacity is not less than the overall memstore capacity. Ideally it should be large enough to make the event (max number exceeded) very rare.

MTTR depends not on the max number of WAL files but on the current load and the PMF interval.
[jira] [Commented] (HBASE-14383) Compaction improvements
[ https://issues.apache.org/jira/browse/HBASE-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746788#comment-14746788 ]

Vladimir Rodionov commented on HBASE-14383:
---
{quote}
flush policy ignores all files less than 15MB.
{quote}
Correction: memstores, not files.
[jira] [Commented] (HBASE-14383) Compaction improvements
[ https://issues.apache.org/jira/browse/HBASE-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746847#comment-14746847 ]

stack commented on HBASE-14383:
---
bq. That is very degenerate case.

I do not disagree. A pain to fix after the fact, though.

bq. By default, flush policy ignores all files less than 15MB.

This seems wrong if one of these memstores has an old edit that is holding up our freeing/GC'ing of WALs.

bq. I agree we have to up default value of hbase.regionserver.maxlogs but set during RS init and not statically.

That sounds reasonable...

bq. MTTR depends not on a max number of WAL files but on a current load and PMF interval.

Disagree. Many WALs mean a longer MTTR than few WALs.
[jira] [Commented] (HBASE-14383) Compaction improvements
[ https://issues.apache.org/jira/browse/HBASE-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746573#comment-14746573 ]

Vladimir Rodionov commented on HBASE-14383:
---
[~enis], [~saint@gmail.com], [~lhofhansl], [~apurtell] what do you think? Can we retire *hbase.regionserver.maxlogs*?
[jira] [Commented] (HBASE-14383) Compaction improvements
[ https://issues.apache.org/jira/browse/HBASE-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746597#comment-14746597 ]

Enis Soztutar commented on HBASE-14383:
---
bq. Can we retire hbase.regionserver.maxlogs?

I am in favor of that, or of keeping it as a safety net but with a much higher default (128?). With the default settings
{code}
hbase.regionserver.maxlogs=32
hbase.regionserver.hlog.blocksize=128MB
hbase.regionserver.logroll.multiplier=0.95
{code}
we can only have 32 * 128MB * 0.95 = 3891MB (about 3.8GB) of WAL entries. So, if you are running with a 32GB heap and a 0.4 memstore size (12.8GB of memstore), most of the memstore space is just left unused.

Also, not related to compactions, but I have seen cases where there are not enough regions per region server to fill the whole memstore space: a 128MB flush size, a few active regions and big heaps. We do not allow a memstore to grow beyond the flush limit, to guard against long flushes and long MTTR times. But my feeling is that maybe we can have a dynamically adjustable flush size that takes a min and a max flush size into account and delays triggering the flush if there is more space.
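One possible reading of the dynamically adjustable flush size suggested above is a per-region fair share of the global memstore limit, clamped between a minimum and a maximum flush size. The sketch below is purely hypothetical: the class, the method, its parameters and the example bounds (128MB minimum, 512MB maximum) are assumptions made for illustration and do not correspond to an existing HBase API.

```java
// Hypothetical sketch: pick a flush threshold per region from the space available.
final class AdaptiveFlushSize {
    static long effectiveFlushSize(long globalMemstoreLimit, int activeRegions,
                                   long minFlushSize, long maxFlushSize) {
        if (activeRegions <= 0) {
            // No writers: nothing competes for space, so allow the upper bound.
            return maxFlushSize;
        }
        // Space each actively written region could use if the limit were shared evenly.
        long fairShare = globalMemstoreLimit / activeRegions;
        // Clamp so flushes never get tiny (many small files) or huge (long flushes, long MTTR).
        return Math.max(minFlushSize, Math.min(maxFlushSize, fairShare));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long limit = 12800 * mb; // roughly 0.4 of a 32GB heap
        // 10 active regions: fair share is 1280MB, capped at the 512MB maximum
        System.out.println(effectiveFlushSize(limit, 10, 128 * mb, 512 * mb) / mb);
        // 400 active regions: fair share is 32MB, raised to the 128MB minimum
        System.out.println(effectiveFlushSize(limit, 400, 128 * mb, 512 * mb) / mb);
    }
}
```

The upper bound is what keeps the guard against long flushes and long MTTR in place; the lower bound keeps the current behaviour when a server hosts many active regions.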
[jira] [Commented] (HBASE-14383) Compaction improvements
[ https://issues.apache.org/jira/browse/HBASE-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746564#comment-14746564 ]

Vladimir Rodionov commented on HBASE-14383:
---
Need feedback on the usefulness of the *hbase.regionserver.maxlogs* configuration setting.

LogRoller runs periodically (1h by default) and does two things:
# Archives old logs (WAL files whose WALEdits have all been flushed already).
# Then checks the number of active WAL files and, if it exceeds hbase.regionserver.maxlogs, flushes all regions which have edits in the oldest WAL file.

rollWriter from FSHLog:
{code}
@Override
public byte [][] rollWriter(boolean force) throws FailedLogCloseException, IOException {
  rollWriterLock.lock();
  try {
    // Return if nothing to flush.
    if (!force && (this.writer != null && this.numEntries.get() <= 0)) return null;
    byte [][] regionsToFlush = null;
    if (this.closed) {
      LOG.debug("WAL closed. Skipping rolling of writer");
      return regionsToFlush;
    }
    if (!closeBarrier.beginOp()) {
      LOG.debug("WAL closing. Skipping rolling of writer");
      return regionsToFlush;
    }
    TraceScope scope = Trace.startSpan("FSHLog.rollWriter");
    try {
      Path oldPath = getOldPath();
      Path newPath = getNewPath();
      // Any exception from here on is catastrophic, non-recoverable so we currently abort.
      Writer nextWriter = this.createWriterInstance(newPath);
      FSDataOutputStream nextHdfsOut = null;
      if (nextWriter instanceof ProtobufLogWriter) {
        nextHdfsOut = ((ProtobufLogWriter)nextWriter).getStream();
        // If a ProtobufLogWriter, go ahead and try and sync to force setup of pipeline.
        // If this fails, we just keep going it is an optimization, not the end of the world.
        preemptiveSync((ProtobufLogWriter)nextWriter);
      }
      tellListenersAboutPreLogRoll(oldPath, newPath);
      // NewPath could be equal to oldPath if replaceWriter fails.
      newPath = replaceWriter(oldPath, newPath, nextWriter, nextHdfsOut);
      tellListenersAboutPostLogRoll(oldPath, newPath);
      // Can we delete any of the old log files?
      if (getNumRolledLogFiles() > 0) {
        cleanOldLogs();
        regionsToFlush = findRegionsToForceFlush();
      }
    } finally {
      closeBarrier.endOp();
      assert scope == NullScope.INSTANCE || !scope.isDetached();
      scope.close();
    }
    return regionsToFlush;
  } finally {
    rollWriterLock.unlock();
  }
}
{code}
There is a clear duplication in functionality between LogRoller (LR) and PeriodicMemstoreFlusher (PMF). PMF already takes care of old memstores and flushes them, so there is no need to call regionsToFlush = findRegionsToForceFlush() in rollWriter, and hence no need for the *hbase.regionserver.maxlogs* config option. PMF periodically flushes the oldest memstores and LogRoller periodically archives old WAL files. That is it.
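The second LogRoller step described in the comment above (force-flush the regions that still have edits in the oldest WAL once the file count exceeds hbase.regionserver.maxlogs) can be modelled in a few lines. This is a deliberately simplified stand-in, not the FSHLog code: the class, the method and the list-of-sets representation of WAL files are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the max-logs check; real WAL bookkeeping is far more involved.
final class WalFlushModel {
    /**
     * @param walsOldestFirst one entry per rolled WAL file, oldest first; each entry
     *                        holds the regions that still have unflushed edits in it
     * @param maxLogs         the configured limit on the number of WAL files
     * @return regions to flush so that the oldest WAL can be archived
     */
    static Set<String> regionsToForceFlush(List<Set<String>> walsOldestFirst, int maxLogs) {
        if (walsOldestFirst.size() <= maxLogs) {
            // Under the limit: nothing is forced.
            return Collections.emptySet();
        }
        // Over the limit: only the oldest WAL is targeted.
        return walsOldestFirst.get(0);
    }

    public static void main(String[] args) {
        List<Set<String>> wals = Arrays.asList(
            new HashSet<>(Arrays.asList("regionA", "regionB")), // oldest WAL
            new HashSet<>(Arrays.asList("regionB")),
            new HashSet<>(Arrays.asList("regionC")));           // newest WAL
        System.out.println(regionsToForceFlush(wals, 2)); // regions pinned by the oldest WAL
        System.out.println(regionsToForceFlush(wals, 3)); // empty: limit not exceeded
    }
}
```

In this model the check fires only when WALs outnumber the limit, whereas a periodic flusher acts on memstore age alone, which is the overlap the comment points at.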
[jira] [Commented] (HBASE-14383) Compaction improvements
[ https://issues.apache.org/jira/browse/HBASE-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736029#comment-14736029 ]

Heng Chen commented on HBASE-14383:
---
{quote}
One more optimization can be added to ExploringCompactionPolicy. To limit size of a compaction there is a config parameter one could use hbase.hstore.compaction.max.size. It would be nice to have two separate limits: for peak and off peak hours.
{quote}
Does this issue meet your needs? https://issues.apache.org/jira/browse/HBASE-8329
[jira] [Commented] (HBASE-14383) Compaction improvements
[ https://issues.apache.org/jira/browse/HBASE-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736108#comment-14736108 ]

Vladimir Rodionov commented on HBASE-14383:
---
No, it is different. That issue is about a compaction throughput limit.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)