[jira] [Commented] (HBASE-5222) Stopping replication via the stop_replication command in hbase shell on a slave cluster isn't acknowledged in the replication sink
[ https://issues.apache.org/jira/browse/HBASE-5222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236692#comment-13236692 ]

Josh Wymer commented on HBASE-5222:
-----------------------------------

@HV, @JD: Please correct me if I'm wrong here. If you stop replication on the master, the logs are no longer stored to be pushed downstream as they would be with replication enabled. Instead they are cleaned up based on the default timeout. If we need to stop replicating to a slave cluster for maintenance, etc., we don't want the master throwing away non-replicated logs (thinking it has no need to keep them). The bug, however, causes the slave to keep accepting logs even while disabled, although the other processes on the slave cluster respect the disabled flag.

Stopping replication via the stop_replication command in hbase shell on a slave cluster isn't acknowledged in the replication sink
-----------------------------------

Key: HBASE-5222
URL: https://issues.apache.org/jira/browse/HBASE-5222
Project: HBase
Issue Type: Bug
Components: replication, shell
Affects Versions: 0.90.4
Reporter: Josh Wymer

After running stop_replication in the hbase shell on our slave cluster we saw replication continue for weeks. It turns out that the replication sink is missing a check of the replication state and therefore continued to write.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
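The missing check described in this report can be pictured with a toy model. This is not HBase code: `ToySink`, `replicateEntries`, `appliedCount`, and the shared flag are hypothetical names chosen for illustration, and the sketch only assumes that the sink can read some "replication enabled" state before it writes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Toy stand-in for a replication sink; all names here are hypothetical. */
final class ToySink {
    // Shared state that a stop_replication-style command would flip to false.
    private final AtomicBoolean replicating;
    private final List<String> applied = new ArrayList<>();

    ToySink(AtomicBoolean replicating) {
        this.replicating = replicating;
    }

    /** Applies the batch only while replication is enabled; returns whether it was applied. */
    boolean replicateEntries(List<String> entries) {
        if (!replicating.get()) {
            return false; // the kind of guard the report says was missing
        }
        applied.addAll(entries);
        return true;
    }

    /** Number of entries written so far. */
    int appliedCount() {
        return applied.size();
    }
}
```

Whether a rejected batch should be dropped or retried later by the sending cluster is a separate design question that this sketch does not settle.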
[jira] [Commented] (HBASE-3489) .oldlogs not being cleaned out
[ https://issues.apache.org/jira/browse/HBASE-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188014#comment-13188014 ]

Josh Wymer commented on HBASE-3489:
-----------------------------------

After turning replication off on the slave cluster, the .oldlogs were cleaned up. So it appears that HBase thinks the slave cluster intends to replicate as well and therefore doesn't clean the logs.

.oldlogs not being cleaned out
------------------------------

Key: HBASE-3489
URL: https://issues.apache.org/jira/browse/HBASE-3489
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.0
Environment: 10 Nodes Write Heavy Cluster
Reporter: Wayne
Attachments: oldlog.txt

The .oldlogs folder is never being cleaned up. The hbase.master.logcleaner.ttl has been set to clean up the old logs, but the cleanup is never kicking in. The limit of 10 files is not the problem. After running for 5 days not a single log file has been deleted, and the logcleaner is set to 2 days (down from the default of 7 days). Presumably the replication changes, which keep these logs around in case they are needed, are blocking the cleanup. There is no replication defined (knowingly).
[jira] [Commented] (HBASE-3489) .oldlogs not being cleaned out
[ https://issues.apache.org/jira/browse/HBASE-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187172#comment-13187172 ]

Josh Wymer commented on HBASE-3489:
-----------------------------------

We are seeing this on our replication cluster using 0.90.4. The /hbase/.oldlogs is filled with logs that are ~ 1 month old.
[jira] [Commented] (HBASE-5140) TableInputFormat subclass to allow N number of splits per region during MR jobs
[ https://issues.apache.org/jira/browse/HBASE-5140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181550#comment-13181550 ]

Josh Wymer commented on HBASE-5140:
-----------------------------------

We also talked about other methods, such as taking the first 8 bytes of the keys and converting them to a long. This could indeed be solved by an interface.

TableInputFormat subclass to allow N number of splits per region during MR jobs
-----------------------------------

Key: HBASE-5140
URL: https://issues.apache.org/jira/browse/HBASE-5140
Project: HBase
Issue Type: New Feature
Components: mapreduce
Reporter: Josh Wymer
Priority: Trivial
Original Estimate: 72h
Remaining Estimate: 72h

In regard to [HBASE-5138|https://issues.apache.org/jira/browse/HBASE-5138], I am working on a subclass of the TableInputFormat class that overrides getSplits in order to generate N splits per region and/or N splits per job. The idea is to convert the startKey and endKey for each region from byte[] to BigDecimal, take the difference, divide by N, convert back to byte[], and generate splits on the resulting values. Assuming your keys are evenly distributed, this should generate splits with nearly the same number of rows each. Any suggestions on this issue are welcome.
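The split arithmetic described above can be sketched in plain Java. This is an illustration only, not the proposed TableInputFormat subclass: `KeyRangeSplitter`, `pad`, `toFixedWidth`, and `splitPoints` are hypothetical names, the sketch uses BigInteger rather than the BigDecimal mentioned in the description (the keys are treated as whole numbers), and it assumes both keys are non-empty and compared as unsigned big-endian bytes.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: computes interior split keys between two region boundaries. */
final class KeyRangeSplitter {

    /** Right-pads a key with zero bytes so both boundaries have the same width. */
    static byte[] pad(byte[] key, int width) {
        return Arrays.copyOf(key, width);
    }

    /** Converts a non-negative value back to exactly width bytes, big-endian. */
    static byte[] toFixedWidth(BigInteger value, int width) {
        byte[] raw = value.toByteArray(); // may carry a leading sign byte or be shorter than width
        byte[] out = new byte[width];
        int copy = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - copy, out, width - copy, copy);
        return out;
    }

    /** Returns the n - 1 interior boundaries that divide [startKey, endKey) into n ranges. */
    static List<byte[]> splitPoints(byte[] startKey, byte[] endKey, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        int width = Math.max(startKey.length, endKey.length);
        // Signum 1 makes BigInteger read the bytes as an unsigned magnitude.
        BigInteger start = new BigInteger(1, pad(startKey, width));
        BigInteger end = new BigInteger(1, pad(endKey, width));
        if (start.compareTo(end) >= 0) {
            throw new IllegalArgumentException("startKey must sort before endKey");
        }
        BigInteger span = end.subtract(start);
        List<byte[]> points = new ArrayList<>();
        for (int i = 1; i < n; i++) {
            BigInteger offset = span.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n));
            points.add(toFixedWidth(start.add(offset), width));
        }
        return points;
    }
}
```

For example, splitting the one-byte range from 0 to 100 into four parts yields the boundaries 25, 50, and 75. Two empty keys are rejected here because both read as zero, and a span smaller than n produces repeated boundaries.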
[jira] [Commented] (HBASE-5140) TableInputFormat subclass to allow N number of splits per region during MR jobs
[ https://issues.apache.org/jira/browse/HBASE-5140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181565#comment-13181565 ]

Josh Wymer commented on HBASE-5140:
-----------------------------------

One glaring issue is the lack of start and end keys for one-region tables. To get the start key we could do a quick scan of the first row and take its key. For the last region of a table, I'm not sure how we'll determine the end key other than by setting it to the max value of whatever data type (e.g. long) we are using for the split calculations. Any suggestions other than this?
[jira] [Commented] (HBASE-5140) TableInputFormat subclass to allow N number of splits per region during MR jobs
[ https://issues.apache.org/jira/browse/HBASE-5140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181579#comment-13181579 ]

Josh Wymer commented on HBASE-5140:
-----------------------------------

Correct, but on a table with one region, for example, getStartEndKeys() returns two empty byte[]. The last region (or only region) of the table returns an empty byte[] as the end key, which lets the scan run to the end of the table. Therefore, we don't know the upper bound byte[] to use in order to determine the long (or int, etc.) value we want to use for split calculations. So we must either have an efficient way to get the last key in this case or arbitrarily set the long to its max value (since in any case nothing could be higher) and use that number to make the calculations. This obviously won't work for unbounded data types like BigDecimal and is a partial solution at best.
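The fallback discussed in this thread, reading the first 8 bytes of a key as a long and substituting the type's max value for an empty end key, might look like the following. This is a sketch under stated assumptions, not HBase code: `LongKeyBounds` and its methods are hypothetical names, an empty start key is mapped to 0, and keys are assumed to start with a byte below 0x80 so that signed long order matches byte order.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical helper: maps region boundary keys to long values for split calculations. */
final class LongKeyBounds {

    /** Reads the first 8 bytes of a key as a big-endian long, zero-padding shorter keys. */
    static long prefixAsLong(byte[] key) {
        return ByteBuffer.wrap(Arrays.copyOf(key, Long.BYTES)).getLong();
    }

    /** An empty start key means "beginning of the table", so use the lowest value. */
    static long lowerBound(byte[] startKey) {
        return startKey.length == 0 ? 0L : prefixAsLong(startKey);
    }

    /** An empty end key means "end of the table", so fall back to the max long. */
    static long upperBound(byte[] endKey) {
        return endKey.length == 0 ? Long.MAX_VALUE : prefixAsLong(endKey);
    }

    /** Returns the n - 1 interior split values between the two boundaries. */
    static long[] splitPoints(byte[] startKey, byte[] endKey, int n) {
        long lo = lowerBound(startKey);
        long hi = upperBound(endKey);
        if (n < 1 || lo < 0 || hi <= lo) {
            throw new IllegalArgumentException("need n >= 1 and 0 <= start < end");
        }
        long step = (hi - lo) / n; // step * i never exceeds hi - lo, so no overflow
        long[] points = new long[n - 1];
        for (int i = 1; i < n; i++) {
            points[i - 1] = lo + step * i;
        }
        return points;
    }
}
```

With two empty keys and n = 2, the single split point is Long.MAX_VALUE / 2. If the real keys occupy only a small part of that range, nearly all rows land in the first split, which is why the comment calls this a partial solution at best.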