[jira] [Commented] (HBASE-5222) Stopping replication via the stop_replication command in hbase shell on a slave cluster isn't acknowledged in the replication sink

2012-03-23 Thread Josh Wymer (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236692#comment-13236692
 ] 

Josh Wymer commented on HBASE-5222:
---

@HV, @JD: Please correct me if I'm wrong here. If you stop replication on the 
master, the logs are no longer retained to be pushed downstream as they would 
be with replication enabled. Instead, they are cleaned up once the default 
timeout expires. If we need to stop replicating to a slave cluster for 
maintenance, etc., we don't want the master throwing away non-replicated logs 
(thinking it has no need to keep them). The bug, however, causes the slave to 
keep accepting logs even while disabled, although the other processes on the 
slave cluster respect the disabled flag.

 Stopping replication via the stop_replication command in hbase shell on a 
 slave cluster isn't acknowledged in the replication sink
 

 Key: HBASE-5222
 URL: https://issues.apache.org/jira/browse/HBASE-5222
 Project: HBase
  Issue Type: Bug
  Components: replication, shell
Affects Versions: 0.90.4
Reporter: Josh Wymer

 After running stop_replication in the hbase shell on our slave cluster we 
 saw replication continue for weeks. Turns out that the replication sink is 
 missing a check to get the replication state and therefore continued to write.
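
 To make the missing check concrete, here is a minimal sketch of a sink that 
 consults the replication state before writing. GuardedSink and its methods 
 are hypothetical names for illustration only, not the HBase ReplicationSink 
 class, and the shared AtomicBoolean is an assumed stand-in for however the 
 cluster tracks the state set by stop_replication.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a replication sink; not the HBase class.
final class GuardedSink {
    // Assumed to mirror the state toggled by start/stop replication commands.
    private final AtomicBoolean replicating;
    private final List<String> applied = new ArrayList<>();

    GuardedSink(AtomicBoolean replicating) {
        this.replicating = replicating;
    }

    // Applies a batch of edits only while replication is enabled.
    // Returns true if the batch was written, false if it was rejected.
    boolean replicateEntries(List<String> edits) {
        if (!replicating.get()) {
            return false; // the kind of check the issue reports as missing
        }
        applied.addAll(edits);
        return true;
    }

    // Edits written so far, exposed for inspection.
    List<String> applied() {
        return applied;
    }
}
```

 The sketch rejects the batch rather than silently dropping it, so the sender 
 can tell that the edits were not applied.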

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3489) .oldlogs not being cleaned out

2012-01-17 Thread Josh Wymer (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188014#comment-13188014
 ] 

Josh Wymer commented on HBASE-3489:
---

After turning replication off on the slave cluster, the .oldlogs were cleaned 
up. So it appears that HBase thinks the slave cluster intends to replicate as 
well, and therefore doesn't clean the logs.

 .oldlogs not being cleaned out
 --

 Key: HBASE-3489
 URL: https://issues.apache.org/jira/browse/HBASE-3489
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.0
 Environment: 10 Nodes Write Heavy Cluster
Reporter: Wayne
 Attachments: oldlog.txt


 The .oldlogs folder is never being cleaned up. 
 hbase.master.logcleaner.ttl has been set to clean up the old logs, but the 
 cleanup never kicks in. The limit of 10 files is not the problem. After 
 running for 5 days, not a single log file has been deleted, even though the 
 logcleaner is set to 2 days (down from the default of 7 days). We assume that 
 the replication changes, which keep these logs around in case they are 
 needed, are blocking the cleanup. No replication is defined 
 (knowingly).
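
 The suspicion above, that a replication-related check can hold logs back even 
 after the TTL has expired, can be illustrated with a small sketch. It assumes 
 the cleaner asks a chain of checks and deletes a log only when every check 
 agrees. LogCheck, TtlCheck, ReplicationCheck and OldLogCleaner are 
 hypothetical names, not HBase classes.

```java
import java.util.List;
import java.util.Set;

// Hypothetical veto interface: every check must agree before a log is deleted.
interface LogCheck {
    boolean isDeletable(String logName, long modifiedMs, long nowMs);
}

// Deletable once the file is older than the configured TTL (milliseconds).
final class TtlCheck implements LogCheck {
    private final long ttlMs;

    TtlCheck(long ttlMs) {
        this.ttlMs = ttlMs;
    }

    public boolean isDeletable(String logName, long modifiedMs, long nowMs) {
        return nowMs - modifiedMs > ttlMs;
    }
}

// Keeps any log that a replication queue still references.
final class ReplicationCheck implements LogCheck {
    private final Set<String> queued;

    ReplicationCheck(Set<String> queued) {
        this.queued = queued;
    }

    public boolean isDeletable(String logName, long modifiedMs, long nowMs) {
        return !queued.contains(logName);
    }
}

final class OldLogCleaner {
    // A single veto is enough to keep the file.
    static boolean canDelete(List<LogCheck> checks, String logName,
                             long modifiedMs, long nowMs) {
        for (LogCheck check : checks) {
            if (!check.isDeletable(logName, modifiedMs, nowMs)) {
                return false;
            }
        }
        return true;
    }
}
```

 Under this model, a log that a replication check still considers queued is 
 never deleted, however small the TTL is set.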
  





[jira] [Commented] (HBASE-3489) .oldlogs not being cleaned out

2012-01-16 Thread Josh Wymer (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187172#comment-13187172
 ] 

Josh Wymer commented on HBASE-3489:
---

We are seeing this on our replication cluster using 0.90.4. The /hbase/.oldlogs 
directory is filled with logs that are about a month old.





[jira] [Commented] (HBASE-5140) TableInputFormat subclass to allow N number of splits per region during MR jobs

2012-01-06 Thread Josh Wymer (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181550#comment-13181550
 ] 

Josh Wymer commented on HBASE-5140:
---

We also talked about other methods such as using the first 8 bytes of the keys 
and converting to a long. This could indeed be solved by an interface.
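
As a rough sketch of the "first 8 bytes to a long" idea: KeyPrefix is a 
hypothetical helper, and right-padding short keys with zero bytes is an 
assumption of this sketch, not something we settled on.

```java
import java.nio.ByteBuffer;

final class KeyPrefix {
    // Interprets the first 8 bytes of a row key as a big-endian long.
    // Keys shorter than 8 bytes are right-padded with zero bytes (assumption).
    static long toLong(byte[] key) {
        byte[] padded = new byte[Long.BYTES];
        System.arraycopy(key, 0, padded, 0, Math.min(key.length, Long.BYTES));
        return ByteBuffer.wrap(padded).getLong();
    }
}
```

Row keys compare as unsigned bytes, so the resulting values should be compared 
with Long.compareUnsigned rather than with the signed operators.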

 TableInputFormat subclass to allow N number of splits per region during MR 
 jobs
 ---

 Key: HBASE-5140
 URL: https://issues.apache.org/jira/browse/HBASE-5140
 Project: HBase
  Issue Type: New Feature
  Components: mapreduce
Reporter: Josh Wymer
Priority: Trivial
   Original Estimate: 72h
  Remaining Estimate: 72h

 In regard to [HBASE-5138|https://issues.apache.org/jira/browse/HBASE-5138], I 
 am working on a subclass of the TableInputFormat class that overrides 
 getSplits in order to generate N splits per region and/or N 
 splits per job. The idea is to convert the startKey and endKey for each 
 region from byte[] to BigDecimal, take the difference, divide by N, convert 
 back to byte[] and generate splits on the resulting values. Assuming your 
 keys are uniformly distributed, this should generate splits with nearly the 
 same number of rows per split. Any suggestions on this issue are welcome.
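
 The arithmetic described above could look roughly like the sketch below. It 
 uses BigInteger instead of the BigDecimal mentioned in the description, 
 because the keys are treated as whole unsigned numbers. RegionSplitter and 
 its methods are hypothetical names, and both keys are assumed to be 
 non-empty.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RegionSplitter {
    // Returns the n-1 interior boundaries dividing [startKey, endKey) into n ranges.
    // Keys are right-padded with zero bytes to a common width before conversion.
    static List<byte[]> splitPoints(byte[] startKey, byte[] endKey, int n) {
        int width = Math.max(startKey.length, endKey.length);
        BigInteger start = new BigInteger(1, Arrays.copyOf(startKey, width));
        BigInteger end = new BigInteger(1, Arrays.copyOf(endKey, width));
        BigInteger range = end.subtract(start);
        List<byte[]> points = new ArrayList<>();
        for (int i = 1; i < n; i++) {
            BigInteger offset = range.multiply(BigInteger.valueOf(i))
                                     .divide(BigInteger.valueOf(n));
            points.add(toFixedWidth(start.add(offset), width));
        }
        return points;
    }

    // Converts back to exactly `width` bytes, dropping BigInteger's leading
    // sign byte or left-padding with zeros as needed.
    private static byte[] toFixedWidth(BigInteger value, int width) {
        byte[] raw = value.toByteArray();
        byte[] out = new byte[width];
        int copy = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - copy, out, width - copy, copy);
        return out;
    }
}
```

 For example, splitting the range from {0x00} to {0x64} four ways yields the 
 boundaries {0x19}, {0x32} and {0x4B}.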





[jira] [Commented] (HBASE-5140) TableInputFormat subclass to allow N number of splits per region during MR jobs

2012-01-06 Thread Josh Wymer (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181565#comment-13181565
 ] 

Josh Wymer commented on HBASE-5140:
---

One glaring issue is the lack of start and end keys for one-region tables. To 
get the start key, we could do a quick scan of the first row and take its key. 
For the last region of a table, I'm not sure how we'll determine the end 
key other than setting it to the max value of whatever data type (e.g. long) we 
are using for the split calculations. Any suggestions other than this?





[jira] [Commented] (HBASE-5140) TableInputFormat subclass to allow N number of splits per region during MR jobs

2012-01-06 Thread Josh Wymer (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181579#comment-13181579
 ] 

Josh Wymer commented on HBASE-5140:
---

Correct, but for example on a table with one region, getStartEndKeys() returns 
two empty byte[]. The last region (or only region) of the table will return an 
empty byte[] as the end key, allowing the scan to run to the end of the table. 
Therefore, we don't know the upper bound byte[] to use in order to determine 
the long (or int, etc.) value we want to use for split calculations. So we must 
either have an efficient way to get the last key in this case or arbitrarily 
set the long to its max value (since in any case nothing could be higher) and 
use that number to make the calculations. This obviously won't work for 
unbounded data types like BigDecimal and is a partial solution at best.
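
A minimal sketch of the "fall back to the max value" option, under a few 
assumptions: OpenEndedSplits is a hypothetical class, keys are reduced to 
their first 8 bytes, and the values are treated as unsigned, so the max is 
all bits set (-1L) rather than Long.MAX_VALUE.

```java
import java.nio.ByteBuffer;

final class OpenEndedSplits {
    // Returns n-1 interior boundaries as unsigned 64-bit values.
    // An empty start key maps to 0; an empty end key maps to the largest
    // unsigned value, since no key prefix can be higher.
    static long[] boundaries(byte[] startKey, byte[] endKey, int n) {
        long lo = startKey.length == 0 ? 0L : prefix(startKey);
        long hi = endKey.length == 0 ? -1L : prefix(endKey);
        // hi - lo is the unsigned distance as long as hi >= lo (unsigned).
        long step = Long.divideUnsigned(hi - lo, n);
        long[] out = new long[n - 1];
        for (int i = 1; i < n; i++) {
            out[i - 1] = lo + step * i;
        }
        return out;
    }

    // First 8 bytes of the key as a big-endian long, zero-padded on the right.
    private static long prefix(byte[] key) {
        byte[] padded = new byte[Long.BYTES];
        System.arraycopy(key, 0, padded, 0, Math.min(key.length, Long.BYTES));
        return ByteBuffer.wrap(padded).getLong();
    }
}
```

If the real keys occupy only a small part of that range, most of the 
resulting splits will be empty, which is why this is a partial solution at 
best.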
