[jira] Commented: (HBASE-3243) Disable Table closed region on wrong host

2010-11-29 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964918#action_12964918
 ] 

Jonathan Gray commented on HBASE-3243:
--

+1 to your proposal, stack.

I've also gone through this several times and have not been able to come up 
with anything besides what is contained in these patches.

Todd, do your best to break the new RC :)

 Disable Table closed region on wrong host
 -

 Key: HBASE-3243
 URL: https://issues.apache.org/jira/browse/HBASE-3243
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.0
Reporter: Todd Lipcon
Priority: Blocker
 Fix For: 0.90.0

 Attachments: hbase-3243-logs.tar.bz2, HBASE-3243-v1.patch, hri.diff


 I ran some YCSB benchmarks which resulted in about 150 regions worth of data 
 overnight. Then I disabled the table, and the master for some reason closed 
 one region on the wrong server. The server ignored this, but the region 
 remained open on a different server, which later flipped out when it tried to 
 flush due to hlog accumulation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3243) Disable Table closed region on wrong host

2010-11-29 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964924#action_12964924
 ] 

stack commented on HBASE-3243:
--

Did you intend to do this in your patch, Jon?

{code}
@@ -1359,11 +1361,6 @@
 }
 synchronized (this.regions) {
   this.regions.remove(hri);
-}
-synchronized (this.regionPlans) {
-  this.regionPlans.remove(hri.getEncodedName());
-}
-synchronized (this.servers) {
   for (List<HRegionInfo> regions : this.servers.values()) {
     for (int i=0;i<regions.size();i++) {
       if (regions.get(i).equals(hri)) {
{code}




[jira] Commented: (HBASE-3243) Disable Table closed region on wrong host

2010-11-29 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964926#action_12964926
 ] 

Jonathan Gray commented on HBASE-3243:
--

Yes. The old code was wrong: we never use this.servers as a lock; we always 
manipulate this.regions and this.servers under the this.regions lock.

The removal from regionPlans was just moved underneath it; I didn't take that 
part out. It's in the next chunk of the diff.
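
The locking rule described above can be sketched as follows. This is a minimal illustration, not the code from the patch: the class AssignmentSketch, its methods, and the use of plain strings for regions and servers are all assumptions made for the example. The only point it carries over from the comment is that both maps are read and written while holding the lock on the regions map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified stand-in for the master's assignment bookkeeping.
final class AssignmentSketch {
  // region name -> server name; this map doubles as the lock for BOTH maps
  private final TreeMap<String, String> regions = new TreeMap<>();
  // server name -> names of the regions hosted there
  private final Map<String, List<String>> servers = new HashMap<>();

  void assign(String region, String server) {
    synchronized (regions) {
      String previous = regions.put(region, server);
      if (previous != null) {
        // Drop the region from the server that held it before.
        List<String> old = servers.get(previous);
        if (old != null) {
          old.remove(region);
        }
      }
      servers.computeIfAbsent(server, s -> new ArrayList<>()).add(region);
    }
  }

  void offline(String region) {
    synchronized (regions) {
      // Both maps change inside one critical section, so no reader can
      // observe the region in one map but not in the other.
      String server = regions.remove(region);
      if (server != null) {
        List<String> hosted = servers.get(server);
        if (hosted != null) {
          hosted.remove(region);
        }
      }
    }
  }

  String serverOf(String region) {
    synchronized (regions) {
      return regions.get(region);
    }
  }

  List<String> regionsOn(String server) {
    synchronized (regions) {
      List<String> hosted = servers.get(server);
      // Return a copy so callers never iterate the live list unlocked.
      return hosted == null ? new ArrayList<>() : new ArrayList<>(hosted);
    }
  }
}
```

Using one lock for both structures avoids having to reason about lock ordering between them, at the cost of some extra contention.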




[jira] Commented: (HBASE-3243) Disable Table closed region on wrong host

2010-11-29 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965020#action_12965020
 ] 

Todd Lipcon commented on HBASE-3243:


Sounds good to me.

 Disable Table closed region on wrong host
 -

 Key: HBASE-3243
 URL: https://issues.apache.org/jira/browse/HBASE-3243
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.0
Reporter: Todd Lipcon
 Fix For: 0.90.1

 Attachments: 3243-v2.patch, hbase-3243-logs.tar.bz2, 
 HBASE-3243-v1.patch, hri.diff


 I ran some YCSB benchmarks which resulted in about 150 regions worth of data 
 overnight. Then I disabled the table, and the master for some reason closed 
 one region on the wrong server. The server ignored this, but the region 
 remained open on a different server, which later flipped out when it tried to 
 flush due to hlog accumulation.




[jira] Commented: (HBASE-3243) Disable Table closed region on wrong host

2010-11-22 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934535#action_12934535
 ] 

Jonathan Gray commented on HBASE-3243:
--

Well, try running again with my patch. Or you could even run it again without 
the patch to see if it happens again, so we can get another set of logs. I 
guess run it with the patch, and if it never happens again we can punt the 
issue or resolve it until we see it again.




[jira] Commented: (HBASE-3243) Disable Table closed region on wrong host

2010-11-18 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933628#action_12933628
 ] 

stack commented on HBASE-3243:
--

This is an odd one.  I don't see anything jumping out at me.  Need to dig more.




[jira] Commented: (HBASE-3243) Disable Table closed region on wrong host

2010-11-17 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932894#action_12932894
 ] 

Jonathan Gray commented on HBASE-3243:
--

Dug into the code around compareTo() and the values instance. Pretty sure we 
are not touching that while a cluster is running, and definitely not during 
disabling. We've actually even taken the 'offline' flag out of META as well; 
it's now a node in ZK that signals the state of a table 
(enabling/disabling/disabled).

It is interesting that the HRI comparator uses the full HTD comparator. It 
should probably just compare on the tableName itself, though considering HTD 
is a member, it might make sense in some circumstances to do the full 
HTD.compareTo()?

Looking around the code, it does look like there are multiple places where 
we're using regions without a lock in the disable table path. Pretty sure this 
is the cause. Patch soon.
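
To illustrate the comparator idea raised above, here is a minimal sketch of a map key that compares only on fields that never change after construction. RegionKeySketch and its three fields are hypothetical and much simpler than the real HRegionInfo; the sketch only shows the general shape of comparing on identity rather than on a full, possibly mutable descriptor.

```java
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical, simplified region key; the real HRegionInfo has more fields.
final class RegionKeySketch implements Comparable<RegionKeySketch> {
  private final String tableName;
  private final String startKey;
  private final long regionId;

  RegionKeySketch(String tableName, String startKey, long regionId) {
    this.tableName = tableName;
    this.startKey = startKey;
    this.regionId = regionId;
  }

  @Override
  public int compareTo(RegionKeySketch o) {
    // Only immutable identity fields take part in the ordering.
    int c = tableName.compareTo(o.tableName);
    if (c != 0) {
      return c;
    }
    c = startKey.compareTo(o.startKey);
    if (c != 0) {
      return c;
    }
    return Long.compare(regionId, o.regionId);
  }

  @Override
  public boolean equals(Object obj) {
    return obj instanceof RegionKeySketch
        && compareTo((RegionKeySketch) obj) == 0;
  }

  @Override
  public int hashCode() {
    return Objects.hash(tableName, startKey, regionId);
  }
}

final class RegionKeyDemo {
  /** Looks up a region with a freshly built key equal to a stored one. */
  static String lookup() {
    TreeMap<RegionKeySketch, String> regions = new TreeMap<>();
    regions.put(new RegionKeySketch("usertable", "", 1L), "server-a");
    regions.put(new RegionKeySketch("usertable", "m", 2L), "server-b");
    return regions.get(new RegionKeySketch("usertable", "m", 2L));
  }
}
```

Because every compared field is final, an entry's position in a sorted map cannot shift after insertion, whatever happens to other table metadata.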




[jira] Commented: (HBASE-3243) Disable Table closed region on wrong host

2010-11-17 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932947#action_12932947
 ] 

Jonathan Gray commented on HBASE-3243:
--

This is very weird.  Can you put up the full logs somewhere?




[jira] Commented: (HBASE-3243) Disable Table closed region on wrong host

2010-11-17 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933094#action_12933094
 ] 

Todd Lipcon commented on HBASE-3243:


bq. Looking at this more, I'm not sure synchronization is the issue here 
because TreeMap appears to only be not thread-safe when there are mutations. 
The two critical pieces of code where a conflict could happen are where we read 
the server a region is assigned to, and where we set the server a region is 
assigned to

What about removals? I thought I saw a couple places with remove() that were 
unsynchronized. Will take a look at your patch momentarily.

bq. This is very weird. Can you put up the full logs somewhere

Yep, will upload them here, it's just fake data, nothing secret.
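
On the question of removals raised in this comment: the TreeMap Javadoc counts adding or deleting a mapping as a structural modification, which must be synchronized externally against every concurrent access, reads included. A minimal sketch of one way to respect that is below. RegionDirectory, its methods, and the "table,startKey" naming of regions are assumptions for illustration, not HBase code. Readers take a copy under the lock and then work on the copy.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified bookkeeping; not the real AssignmentManager.
final class RegionDirectory {
  // "table,startKey" -> server name; the map also serves as the lock object
  private final TreeMap<String, String> regions = new TreeMap<>();

  void put(String regionName, String server) {
    synchronized (regions) {
      regions.put(regionName, server);
    }
  }

  void remove(String regionName) {
    synchronized (regions) {
      // Structural change: must not overlap with any reader.
      regions.remove(regionName);
    }
  }

  /** Copies the entries of one table while holding the lock. */
  Map<String, String> snapshotOfTable(String tableName) {
    String from = tableName + ",";
    String to = from + Character.MAX_VALUE;
    synchronized (regions) {
      // The copy is made inside the lock; callers iterate it outside.
      return new TreeMap<>(regions.subMap(from, to));
    }
  }
}
```

A caller such as a disable-table routine could iterate the returned snapshot and send close requests without holding the lock, since later removals do not affect the copy.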




[jira] Commented: (HBASE-3243) Disable Table closed region on wrong host

2010-11-16 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932818#action_12932818
 ] 

Todd Lipcon commented on HBASE-3243:


No idea if it's related but it seems like AssignmentManager's {{regions}} 
member is accessed without synchronization sometimes... could end up getting 
some incorrect data due to this.




[jira] Commented: (HBASE-3243) Disable Table closed region on wrong host

2010-11-16 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932825#action_12932825
 ] 

Todd Lipcon commented on HBASE-3243:


Just grasping at straws here... but another thought is this:
The keys of the {{regions}} map are HRegionInfo, where compareTo() includes 
calling down to compareTo() on HTableDescriptor. HTableDescriptor's compareTo 
eventually delegates to hashcode on the {{values}} instance... do we end up 
changing the {{values}} instance when a table gets disabled? Perhaps this is 
then changing the sort order in the TreeMap and confusing things?
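
The hazard speculated about here can be reproduced in isolation. The sketch below is a generic demonstration and makes no claim about what HBase actually does: MutableRegionKey and its descriptorHash field are hypothetical stand-ins for a key whose compareTo() depends on state that can change after the key is in the map.

```java
import java.util.TreeMap;

// Hypothetical key whose ordering depends on mutable state.
final class MutableRegionKey implements Comparable<MutableRegionKey> {
  final String tableName;
  int descriptorHash; // mutable on purpose, to show the hazard

  MutableRegionKey(String tableName, int descriptorHash) {
    this.tableName = tableName;
    this.descriptorHash = descriptorHash;
  }

  @Override
  public int compareTo(MutableRegionKey other) {
    int c = tableName.compareTo(other.tableName);
    return c != 0 ? c : Integer.compare(descriptorHash, other.descriptorHash);
  }
}

final class MutableKeyDemo {
  /** Returns true if an entry becomes unreachable once its key is mutated. */
  static boolean entryIsLostAfterMutation() {
    TreeMap<MutableRegionKey, String> regions = new TreeMap<>();
    MutableRegionKey first = new MutableRegionKey("usertable", 1);
    regions.put(first, "server-a");
    regions.put(new MutableRegionKey("usertable", 2), "server-b");
    regions.put(new MutableRegionKey("usertable", 3), "server-c");

    // Change the key in place; the tree is NOT re-sorted.
    first.descriptorHash = 10;

    // Neither the old nor the new key value leads to the entry any more,
    // although the map still reports three entries.
    boolean missingByOldValue =
        regions.get(new MutableRegionKey("usertable", 1)) == null;
    boolean missingByNewValue =
        regions.get(new MutableRegionKey("usertable", 10)) == null;
    return missingByOldValue && missingByNewValue && regions.size() == 3;
  }

  public static void main(String[] args) {
    System.out.println("entry lost: " + entryIsLostAfterMutation());
  }
}
```

The entry is still stored, but the search path computed from the mutated ordering no longer passes through its node, so lookups miss it.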
