[jira] [Commented] (HBASE-6497) Revisit HLog sizing and roll parameters

2012-08-06 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429350#comment-13429350
 ] 

Jean-Daniel Cryans commented on HBASE-6497:
---

bq. Less parallelization per RS. If you have a lot of RSes, lowering file count 
does help reduce HBase RPCs too?

I'm not sure I understand what you mean. HBase RPCs in which context?

 Revisit HLog sizing and roll parameters
 ---

 Key: HBASE-6497
 URL: https://issues.apache.org/jira/browse/HBASE-6497
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Lars George

 The last major update to the HLog sizing and roll features was done in 
 HBASE-1394. I am proposing to revisit these settings to overcome recent 
 issues where the HLog becomes a major bottleneck.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6497) Revisit HLog sizing and roll parameters

2012-08-03 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428544#comment-13428544
 ] 

Harsh J commented on HBASE-6497:


bq. Less parallelization during distributed splitting since the unit of 
distribution is a file.

Less parallelization per RS. If you have a lot of RSes, lowering file count 
does help reduce HBase RPCs too?





[jira] [Commented] (HBASE-6497) Revisit HLog sizing and roll parameters

2012-08-02 Thread Lars George (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427130#comment-13427130
 ] 

Lars George commented on HBASE-6497:


The goal in designing a proper HBase schema is to maximize heap usage across 
all regions, which can lead to the situation where the WALs (aka HLogs) are 
required to be kept for a considerable amount of time. 

The last iteration on WAL properties added a configurable block size, as well 
as a threshold percentage to roll the log before it completely fills the single 
HDFS block (see HBASE-1394).
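
To make the roll rule above concrete, here is a minimal sketch of how I read it: the log is rolled once it reaches a fraction of the configured block size. The function names are hypothetical, and the 0.95 multiplier and the block sizes are assumed values for illustration, not taken from this issue.

```python
# Sketch only: models "roll the log before it fills one HDFS block".
# The 0.95 multiplier and the block sizes below are assumptions.

MB = 1024 * 1024


def roll_threshold_bytes(block_size_bytes: int, multiplier: float = 0.95) -> int:
    """Size at which the log would be rolled: a fraction of the block size."""
    return int(block_size_bytes * multiplier)


def should_roll(current_log_bytes: int, block_size_bytes: int,
                multiplier: float = 0.95) -> bool:
    """True once the current log has reached the roll threshold."""
    return current_log_bytes >= roll_threshold_bytes(block_size_bytes, multiplier)


if __name__ == "__main__":
    for block_mb in (64, 128, 512):
        threshold = roll_threshold_bytes(block_mb * MB)
        print(f"{block_mb} MB block: roll at about {threshold / MB:.1f} MB")
```

With a multiplier below 1.0 the log always stays inside a single block, which is the behaviour the questions below ask about.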

I am questioning whether this is still an issue, maybe even in the light of recent 
improvements in log performance, for example HBASE-5699 and HBASE-4608.

At the least, I would like to figure out whether we should increase the WAL size to 
512 MB, to avoid getting into early flushing situations that impact the overall 
I/O. Isn't HBASE-1364 helping to split larger logs (not by dividing the log files 
themselves, but by distributing them across the region servers)? I am not sure 
whether the log splitting prefers block-local nodes first, so that there is no 
remote reading, though.

Questions:

# Is there a need to keep the logs small (typically 64-128 MB depending on the 
HDFS config)?
# Should we go multiple blocks?
# Do we still need the logroll multiplier?
# Should we increase the maxlogs number (default is 32)?





[jira] [Commented] (HBASE-6497) Revisit HLog sizing and roll parameters

2012-08-02 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427615#comment-13427615
 ] 

Jean-Daniel Cryans commented on HBASE-6497:
---

bq. Is there a need to keep the logs small (typically 64-128 MB depending on the 
HDFS config)?

bq. If current is 128 MB x 32 = 4096 MB (4 GB) of logs approx. before full 
flush, then let's change that to have fewer than 32 files (reduces NN RPCs 
during recovery and increases the sequential read length), going to 8 maxlogs at 
512 MB default size (8 x 512 = 4096 again).
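
The arithmetic in the quote can be checked with a small sketch. `WalLayout` is a hypothetical helper introduced only for illustration; the two layouts are the ones named in the quote.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WalLayout:
    """Hypothetical description of a WAL setup: per-log size and max log count."""
    log_size_mb: int
    max_logs: int

    @property
    def total_mb(self) -> int:
        """Approximate volume of logs retained before a full flush is forced."""
        return self.log_size_mb * self.max_logs


current = WalLayout(log_size_mb=128, max_logs=32)
proposed = WalLayout(log_size_mb=512, max_logs=8)

if __name__ == "__main__":
    for name, layout in (("current", current), ("proposed", proposed)):
        print(f"{name}: {layout.max_logs} files x {layout.log_size_mb} MB "
              f"= {layout.total_mb} MB")
```

Both layouts retain the same total volume; what changes is the number of files, which is what the points below are about.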

Issues with bigger files while having fewer of them:

 - Less parallelization during distributed splitting since the unit of 
distribution is a file.
 - Fewer opportunities to get rid of logs without having to force flush regions. 
The worst case would be having max 1 file meaning that when you roll you need 
to force flush everything that hasn't been flushed yet.
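
The second point can be illustrated with a simplified model, not the actual region server logic: assume a log can be dropped only once every edit in it has been flushed, i.e. its highest sequence id is below the oldest unflushed sequence id of every region. The function name, the region names, and all numbers are made up for the example.

```python
from typing import Dict, List


def archivable_logs(log_max_seq: List[int],
                    oldest_unflushed: Dict[str, int]) -> List[int]:
    """Return indices of logs whose edits have all been flushed.

    log_max_seq[i] is the highest sequence id written to log i.
    oldest_unflushed maps a region name to the lowest sequence id that is
    still only in its memstore (assumed bookkeeping for this sketch).
    """
    if not oldest_unflushed:
        # Nothing is pending anywhere, so every log can go.
        return list(range(len(log_max_seq)))
    floor = min(oldest_unflushed.values())
    return [i for i, max_seq in enumerate(log_max_seq) if max_seq < floor]


# The same 800 edits, written either to 8 small logs or to 2 big ones.
small_logs = [100, 200, 300, 400, 500, 600, 700, 800]
big_logs = [400, 800]
pending = {"region-a": 350, "region-b": 620}

if __name__ == "__main__":
    print("small logs that can be dropped:", archivable_logs(small_logs, pending))
    print("big logs that can be dropped:", archivable_logs(big_logs, pending))
```

In this model three of the small logs can be dropped without any flush, while neither of the big logs can, because the first big log still holds edits from sequence id 350 onwards.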
