[jira] Commented: (HBASE-3413) DNS Configs may completely break HBase cluster

2011-01-05 Thread ryan rawson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977685#action_12977685
 ] 

ryan rawson commented on HBASE-3413:


How would we ensure the UUID would be generated the same upon every 
regionserver startup?

Another thing to consider is that DNS names are inserted into META tables, and 
these are used by clients to find the machine.  If we detect a DNS change we 
would have to do a bunch of fancy work to ensure the META table is correct, no?



 DNS Configs may completely break HBase cluster
 --

 Key: HBASE-3413
 URL: https://issues.apache.org/jira/browse/HBASE-3413
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.0
 Environment: all
Reporter: Mathias Herberts

 I recently experienced a cluster malfunction which was caused by a change in 
 DNS config for services co-hosted on the machines running region servers.
 The RS are specified using IP addresses in the 'regionservers' file. Those 
 machines are 1.example.com to N.example.com (there are A RRs for those names 
 to each of the N IP addresses in 'regionservers').
 Until recently, the PTR RRs for the RS IPs were those x.example.com names.
 Then a service was deployed on some of the x.example.com machines, and new A 
 RRs were added for svc.example.com which point to each of the IPs used for 
 the service.
 Jointly new PTR records were added too for the given IPs. Those PTR records 
 have 'svc.example.com' as their PTRDATA, and this is causing the HBase 
 cluster to get completely confused.
 Since it is perfectly legal to have multiple PTR records, it seems important 
 to make the canonicalization of RS more robust to DNS tweaks.
 Maybe generating a UUID when a RS is started would help; this UUID could be 
 used to register the RS in ZK and we would not rely on DNS for obtaining a 
 stable canonical name (which may not even exist...).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3413) DNS Configs may completely break HBase cluster

2011-01-05 Thread Mathias Herberts (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977689#action_12977689
 ] 

Mathias Herberts commented on HBASE-3413:
-

The idea of using a UUID was to be able to detect that two RS accessed with 
different DNS names were truly the same (with the same UUID).

As for using names in META tables, I've still to understand why we do that 
instead of using IPs.
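
To make the discussion concrete, here is a minimal sketch (plain JDK, not HBase code) of one way the UUID could stay stable across restarts, answering the question above: generate it once, persist it on the RS's local disk, and register under that id in ZK.  The file name and ZK path below are purely illustrative.

{code}
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.util.UUID;

// Hypothetical sketch: keep a region server's identity stable across restarts by
// persisting a generated UUID on local disk instead of relying on reverse DNS.
public final class RegionServerId {

  /** Returns the persisted id, generating and storing one on first startup. */
  public static UUID loadOrCreate(File idFile) throws IOException {
    if (idFile.exists()) {
      String s = new String(Files.readAllBytes(idFile.toPath()),
          Charset.forName("UTF-8")).trim();
      return UUID.fromString(s);
    }
    UUID id = UUID.randomUUID();
    Files.write(idFile.toPath(), id.toString().getBytes(Charset.forName("UTF-8")));
    return id;
  }

  public static void main(String[] args) throws IOException {
    // The RS would then register under something like /hbase/rs/<uuid> in ZooKeeper.
    System.out.println("region server id: " + loadOrCreate(new File("rs-id.txt")));
  }
}
{code}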

 DNS Configs may completely break HBase cluster
 --

 Key: HBASE-3413
 URL: https://issues.apache.org/jira/browse/HBASE-3413
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.0
 Environment: all
Reporter: Mathias Herberts

 I recently experienced a cluster malfunction which was caused by a change in 
 DNS config for services co-hosted on the machines running region servers.
 The RS are specified using IP addresses in the 'regionservers' file. Those 
 machines are 1.example.com to N.example.com (there are A RRs for those names 
 to each of the N IP addresses in 'regionservers').
 Until recently, the PTR RRs for the RS IPs were those x.example.com names.
 Then a service was deployed on some of the x.example.com machines, and new A 
 RRs were added for svc.example.com which point to each of the IPs used for 
 the service.
 Jointly new PTR records were added too for the given IPs. Those PTR records 
 have 'svc.example.com' as their PTRDATA, and this is causing the HBase 
 cluster to get completely confused.
 Since it is perfectly legal to have multiple PTR records, it seems important 
 to make the canonicalization of RS more robust to DNS tweaks.
 Maybe generating a UUID when a RS is started would help; this UUID could be 
 used to register the RS in ZK and we would not rely on DNS for obtaining a 
 stable canonical name (which may not even exist...).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HBASE-3415) When scanners have readers updated we should use original file selection algorithm rather than include all files

2011-01-05 Thread Jonathan Gray (JIRA)
When scanners have readers updated we should use original file selection 
algorithm rather than include all files


 Key: HBASE-3415
 URL: https://issues.apache.org/jira/browse/HBASE-3415
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.20.7, 0.90.0
Reporter: Jonathan Gray
 Fix For: 0.90.1


Currently when a {{StoreScanner}} is instantiated we use a {{getScanner(scan, 
columns)}} call that looks at things like bloom filters and memstore only 
flags.  But when we get a changed readers notification, we use {{getScanner()}} 
which just grabs everything.

We should always use the original file selection algorithm.
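
As a rough illustration of the intent (stand-in types, not the actual StoreScanner code), the reader-refresh path would reuse the same scan-aware selection used at construction time instead of taking every store file:

{code}
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: the changed-readers path should call the same
// scan-aware selection used when the scanner was first built.
class ScannerResetSketch {

  /** Minimal stand-in for a store file reader. */
  interface StoreFileLike {
    /** True if bloom filters, timestamps, etc. suggest the file may hold data for this scan. */
    boolean mayContain(byte[] row, byte[] family, byte[] qualifier);
  }

  /** Scan-aware selection: consult per-file checks rather than returning allFiles unfiltered. */
  static List<StoreFileLike> selectFiles(List<StoreFileLike> allFiles,
                                         byte[] row, byte[] family, byte[] qualifier) {
    List<StoreFileLike> chosen = new ArrayList<StoreFileLike>();
    for (StoreFileLike f : allFiles) {
      if (f.mayContain(row, family, qualifier)) {
        chosen.add(f);
      }
    }
    return chosen;
  }
}
{code}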

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HBASE-3415) When scanners have readers updated we should use original file selection algorithm rather than include all files

2011-01-05 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-3415:
-

Attachment: HBASE-3415-v1.patch

First go.  There are other bugs in this code around updating readers, especially 
with intra-row scans.  Going to file more jiras.

 When scanners have readers updated we should use original file selection 
 algorithm rather than include all files
 

 Key: HBASE-3415
 URL: https://issues.apache.org/jira/browse/HBASE-3415
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.20.7, 0.90.0
Reporter: Jonathan Gray
 Fix For: 0.90.1

 Attachments: HBASE-3415-v1.patch


 Currently when a {{StoreScanner}} is instantiated we use a {{getScanner(scan, 
 columns)}} call that looks at things like bloom filters and memstore only 
 flags.  But when we get a changed readers notification, we use 
 {{getScanner()}} which just grabs everything.
 We should always use the original file selection algorithm.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HBASE-3416) For intra-row scanning, the update readers notification resets the query matcher and can lead to incorrect behavior

2011-01-05 Thread Jonathan Gray (JIRA)
For intra-row scanning, the update readers notification resets the query 
matcher and can lead to incorrect behavior
---

 Key: HBASE-3416
 URL: https://issues.apache.org/jira/browse/HBASE-3416
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.20.7, 0.90.0
Reporter: Jonathan Gray
 Fix For: 0.90.1


In {{StoreScanner.resetScannerStack()}}, which is called on the first 
{{next()}} call after readers have been updated, we do a query matcher reset.  
Normally this is not an issue because the query matcher does not need to 
maintain state between rows.  However, if doing intra-row scanning w/ the 
specified limit, we could have the query matcher reset in the middle of reading 
a row.  This could lead to incorrect behavior (too many versions coming back, 
etc).
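
A toy illustration (not the HBase ScanQueryMatcher itself) of why a mid-row reset is dangerous: the matcher tracks how many versions of each column it has already emitted, and clearing that count in the middle of a row lets extra versions through.

{code}
import java.util.HashMap;
import java.util.Map;

// Toy matcher: include() enforces max versions per column; reset() is only safe
// between rows because it forgets the per-column counts.
class VersionCountingMatcher {
  private final int maxVersions;
  private final Map<String, Integer> seen = new HashMap<String, Integer>();

  VersionCountingMatcher(int maxVersions) {
    this.maxVersions = maxVersions;
  }

  /** Returns true if this cell should be returned, respecting max versions per column. */
  boolean include(String column) {
    Integer count = seen.get(column);
    int c = (count == null) ? 0 : count.intValue();
    if (c >= maxVersions) {
      return false;
    }
    seen.put(column, c + 1);
    return true;
  }

  /** Meant to be called between rows; calling it mid-row re-counts the same column's versions. */
  void reset() {
    seen.clear();
  }
}
{code}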

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

2011-01-05 Thread Jonathan Gray (JIRA)
CacheOnWrite is using the temporary output path for block names, need to use a 
more consistent block naming scheme
--

 Key: HBASE-3417
 URL: https://issues.apache.org/jira/browse/HBASE-3417
 Project: HBase
  Issue Type: Bug
  Components: io, regionserver
Affects Versions: 0.92.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical
 Fix For: 0.92.0


Currently the block names used in the block cache are built using the 
filesystem path.  However, for cache on write, the path is a temporary output 
file.

The original COW patch actually made some modifications to block naming stuff 
to make it more consistent but did not do enough.  Should add a separate method 
somewhere for generating block names using some more easily mocked scheme 
(rather than just raw path as we generate a random unique file name twice, once 
for tmp and then again when moved into place).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3415) When scanners have readers updated we should use original file selection algorithm rather than include all files

2011-01-05 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977815#action_12977815
 ] 

stack commented on HBASE-3415:
--

Am I missing something?  You remove the getScanner(scan, columns) call in the 
patch, which is the thing you would seem to want to preserve, going by your 
comment above.

 When scanners have readers updated we should use original file selection 
 algorithm rather than include all files
 

 Key: HBASE-3415
 URL: https://issues.apache.org/jira/browse/HBASE-3415
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.20.7, 0.90.0
Reporter: Jonathan Gray
 Fix For: 0.90.1

 Attachments: HBASE-3415-v1.patch


 Currently when a {{StoreScanner}} is instantiated we use a {{getScanner(scan, 
 columns)}} call that looks at things like bloom filters and memstore only 
 flags.  But when we get a changed readers notification, we use 
 {{getScanner()}} which just grabs everything.
 We should always use the original file selection algorithm.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HBASE-3406) Region stuck in transition after RS failed while opening

2011-01-05 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3406:
-

Fix Version/s: (was: 0.90.0)
   0.90.1

Moving to 0.90.1

Without more context, I cannot explain how the in-memory state has OPENING for 
the node while the znode content is M_ZK_REGION_OFFLINE.

 Region stuck in transition after RS failed while opening
 

 Key: HBASE-3406
 URL: https://issues.apache.org/jira/browse/HBASE-3406
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.0
Reporter: Todd Lipcon
Priority: Critical
 Fix For: 0.90.1


 I had a RS fail due to GC pause while it was in the midst of opening a 
 region, apparently. This got the region stuck in the following repeating 
 sequence in the master log:
 2011-01-03 17:24:33,884 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPENING for 
 too long, reassigning 
 region=usertable,user991629466,1293747979500.c6a54b4d07a44e113b3a4d2ab22daa70.
 2011-01-03 17:24:33,885 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: 
 master:6-0x12ce26f6c0600e3 Retrieved 113 byte(s) of data from znode 
 /hbase/unassigned/c6a54b4d07a44e113b3a4d2ab22daa70; 
 data=region=usertable,user991629466,1293747979500.c6a54b4d07a44e113b3a4d2ab22daa70.,
  server=haus03.sf.cloudera.com:6, state=M_ZK_REGION_OFFLINE
 2011-01-03 17:24:43,886 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  usertable,user991629466,1293747979500.c6a54b4d07a44e113b3a4d2ab22daa70. 
 state=OPENING, ts=1293840977790
 2011-01-03 17:24:43,886 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPENING for 
 too long, reassigning 
 region=usertable,user991629466,1293747979500.c6a54b4d07a44e113b3a4d2ab22daa70.
 2011-01-03 17:24:43,887 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: 
 master:6-0x12ce26f6c0600e3 Retrieved 113 byte(s) of data from znode 
 /hbase/unassigned/c6a54b4d07a44e113b3a4d2ab22daa70; 
 data=region=usertable,user991629466,1293747979500.c6a54b4d07a44e113b3a4d2ab22daa70.,
  server=haus03.sf.cloudera.com:6, state=M_ZK_REGION_OFFLINE
 etc... repeating every 10 seconds. Eventually I ran hbck -fix which forced it 
 to OFFLINE in ZK and it reassigned just fine.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HBASE-3418) Increment operations can break when qualifiers are split between memstore/snapshot and storefiles

2011-01-05 Thread Jonathan Gray (JIRA)
Increment operations can break when qualifiers are split between 
memstore/snapshot and storefiles
-

 Key: HBASE-3418
 URL: https://issues.apache.org/jira/browse/HBASE-3418
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical
 Fix For: 0.90.1, 0.92.0


Doing investigation around some observed resetting counter behavior.

An optimization was added to check memstore/snapshots first and then check 
storefiles if not all counters were found.  However it looks like this 
introduced a bug when columns for a given row/family in a single increment 
operation are spread across memstores and storefiles.

The results from get operations on both memstores and storefiles are appended 
together but when processed are expected to be fully sorted.  This can lead to 
invalid results.

Need to sort the combined result of memstores + storefiles.
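
A minimal sketch of the fix direction described here (generic types, not the actual HBase classes): the memstore and storefile results are concatenated, so they have to be re-sorted into KeyValue order before the increment logic walks them.

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Sketch: merge results from the two sources and sort with the KeyValue ordering
// before processing; without the sort, qualifiers split across memstore and
// storefiles show up out of order and the increment math goes wrong.
class CombineAndSortSketch {
  static <KV> List<KV> combine(List<KV> fromMemstore, List<KV> fromFiles,
                               Comparator<KV> kvOrder) {
    List<KV> combined = new ArrayList<KV>(fromMemstore.size() + fromFiles.size());
    combined.addAll(fromMemstore);
    combined.addAll(fromFiles);
    Collections.sort(combined, kvOrder);
    return combined;
  }
}
{code}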

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HBASE-3418) Increment operations can break when qualifiers are split between memstore/snapshot and storefiles

2011-01-05 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-3418:
-

Attachment: HBASE-3418-v1.patch

Unit test which reproduces bad behavior and small fix which seems to work / 
fixes test.

 Increment operations can break when qualifiers are split between 
 memstore/snapshot and storefiles
 -

 Key: HBASE-3418
 URL: https://issues.apache.org/jira/browse/HBASE-3418
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical
 Fix For: 0.90.1, 0.92.0

 Attachments: HBASE-3418-v1.patch


 Doing investigation around some observed resetting counter behavior.
 An optimization was added to check memstore/snapshots first and then check 
 storefiles if not all counters were found.  However it looks like this 
 introduced a bug when columns for a given row/family in a single increment 
 operation are spread across memstores and storefiles.
 The results from get operations on both memstores and storefiles are appended 
 together but when processed are expected to be fully sorted.  This can lead 
 to invalid results.
 Need to sort the combined result of memstores + storefiles.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3418) Increment operations can break when qualifiers are split between memstore/snapshot and storefiles

2011-01-05 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977864#action_12977864
 ] 

Todd Lipcon commented on HBASE-3418:


Can we put this in 0.90?  Seems like inaccurate counters are a bad problem!

 Increment operations can break when qualifiers are split between 
 memstore/snapshot and storefiles
 -

 Key: HBASE-3418
 URL: https://issues.apache.org/jira/browse/HBASE-3418
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical
 Fix For: 0.90.1, 0.92.0

 Attachments: HBASE-3418-v1.patch


 Doing investigation around some observed resetting counter behavior.
 An optimization was added to check memstore/snapshots first and then check 
 storefiles if not all counters were found.  However it looks like this 
 introduced a bug when columns for a given row/family in a single increment 
 operation are spread across memstores and storefiles.
 The results from get operations on both memstores and storefiles are appended 
 together but when processed are expected to be fully sorted.  This can lead 
 to invalid results.
 Need to sort the combined result of memstores + storefiles.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HBASE-3418) Increment operations can break when qualifiers are split between memstore/snapshot and storefiles

2011-01-05 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-3418:
-

Fix Version/s: (was: 0.90.1)
   0.90.0

Yeah, looks like we're doing at least one more RC.

 Increment operations can break when qualifiers are split between 
 memstore/snapshot and storefiles
 -

 Key: HBASE-3418
 URL: https://issues.apache.org/jira/browse/HBASE-3418
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical
 Fix For: 0.90.0, 0.92.0

 Attachments: HBASE-3418-v1.patch


 Doing investigation around some observed resetting counter behavior.
 An optimization was added to check memstore/snapshots first and then check 
 storefiles if not all counters were found.  However it looks like this 
 introduced a bug when columns for a given row/family in a single increment 
 operation are spread across memstores and storefiles.
 The results from get operations on both memstores and storefiles are appended 
 together but when processed are expected to be fully sorted.  This can lead 
 to invalid results.
 Need to sort the combined result of memstores + storefiles.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HBASE-3419) If re-transition to OPENING during log replay fails, server aborts. Instead, should just cancel region open.

2011-01-05 Thread Jonathan Gray (JIRA)
If re-transition to OPENING during log replay fails, server aborts.  Instead, 
should just cancel region open.
-

 Key: HBASE-3419
 URL: https://issues.apache.org/jira/browse/HBASE-3419
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Affects Versions: 0.90.0, 0.92.0
Reporter: Jonathan Gray
Priority: Critical
 Fix For: 0.90.1, 0.92.0


The {{Progressable}} used on region open to tickle the ZK OPENING node to 
prevent the master from timing out a region open operation will currently abort 
the RegionServer if this fails for some reason.  However it could be normal 
for an RS to have a region open operation aborted by the master, so we should 
just handle it as we do elsewhere, by reverting the open.

We had a cluster trip over some other issue (for some reason, the tickle was 
not happening within 30 seconds, so the master was timing out every time).  
Because of the abort on BadVersion, this eventually led to every single RS 
aborting itself, eventually taking down the cluster.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3419) If re-transition to OPENING during log replay fails, server aborts. Instead, should just cancel region open.

2011-01-05 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977870#action_12977870
 ] 

Jonathan Gray commented on HBASE-3419:
--

Currently the tickle happens on a number-of-replayed-edits interval (it does not 
count skipped edits).  This is probably not the best idea since edits can be 
wildly different sizes (in this case, an all-increment cluster where there are 
very high numbers of small edits).

The tickle is really about time, not number of edits.  Maybe a Chore instead, set 
at 1/2 the master timeout?  Or some other way of doing it based on time instead 
of edits?
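
A hedged sketch of that time-based alternative (names like Progressable here are local stand-ins, not the exact HBase wiring): report progress whenever roughly half the master's timeout has elapsed, regardless of how many edits were replayed or skipped.

{code}
// Illustrative only: tickle on the clock rather than per N replayed edits.
class TimedProgressReporter {
  interface Progressable {
    void progress();
  }

  private final Progressable sink;
  private final long intervalMs;   // e.g. half of the master's OPENING timeout
  private long lastReport = System.currentTimeMillis();

  TimedProgressReporter(Progressable sink, long masterTimeoutMs) {
    this.sink = sink;
    this.intervalMs = masterTimeoutMs / 2;
  }

  /** Call once per replayed (or skipped) edit; only actually tickles when the interval passes. */
  void maybeReport() {
    long now = System.currentTimeMillis();
    if (now - lastReport >= intervalMs) {
      sink.progress();
      lastReport = now;
    }
  }
}
{code}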

 If re-transition to OPENING during log replay fails, server aborts.  Instead, 
 should just cancel region open.
 -

 Key: HBASE-3419
 URL: https://issues.apache.org/jira/browse/HBASE-3419
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Affects Versions: 0.90.0, 0.92.0
Reporter: Jonathan Gray
Priority: Critical
 Fix For: 0.90.1, 0.92.0


 The {{Progressable}} used on region open to tickle the ZK OPENING node to 
 prevent the master from timing out a region open operation will currently 
 abort the RegionServer if this fails for some reason.  However it could be 
 normal for an RS to have a region open operation aborted by the master, so we 
 should just handle it as we do elsewhere, by reverting the open.
 We had a cluster trip over some other issue (for some reason, the tickle was 
 not happening within 30 seconds, so the master was timing out every time).  
 Because of the abort on BadVersion, this eventually led to every single RS 
 aborting itself, eventually taking down the cluster.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HBASE-3420) Handling a big rebalance, we can queue multiple instances of a Close event; messes up state

2011-01-05 Thread stack (JIRA)
Handling a big rebalance, we can queue multiple instances of a Close event; 
messes up state
---

 Key: HBASE-3420
 URL: https://issues.apache.org/jira/browse/HBASE-3420
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.0
Reporter: stack
 Fix For: 0.90.1


This is pretty ugly.  In short, on a heavily loaded cluster, we are queuing 
multiple instances of region close.  They all try to run, confusing state.

Long version:

I have a messy cluster.  It's 16k regions on 8 servers.  One node has 5k or so 
regions on it.  Heaps are 1G all around.  My master had OOME'd.  Not sure why 
but not too worried about it for now.  So, new master comes up and is trying to 
rebalance the cluster:

{code}
2011-01-05 00:48:07,385 INFO org.apache.hadoop.hbase.master.LoadBalancer: 
Calculated a load balance in 14ms. Moving 3666 regions off of 6 overloaded 
servers onto 3 less loaded servers
{code}

The balancer ends up sending many closes to a single overloaded server; they are 
taking so long that the close times out in RIT.  We then do this:

{code}
  case CLOSED:
    LOG.info("Region has been CLOSED for too long, " +
        "retriggering ClosedRegionHandler");
    AssignmentManager.this.executorService.submit(
        new ClosedRegionHandler(master, AssignmentManager.this,
            regionState.getRegion()));
    break;
{code}

We queue a new close (Should we?).

We time out a few more times (9 times) and each time we queue a new close.

Eventually the close succeeds, the region gets assigned a new location.

Then the next close pops off the eventhandler queue.

Here is the telltale signature of stuff gone amiss:

{code}
2011-01-05 00:52:19,379 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Forcing OFFLINE; 
was=TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. 
state=OPEN, ts=1294188709030
{code}

Notice how state is OPEN when we are forcing offline (It was actually just 
successfully opened).  We end up assigning same server because plan was still 
around:

{code}
2011-01-05 00:52:20,705 WARN 
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Attempted open 
of TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. but 
already online on this server
{code}

But later when plan is cleared, we assign new server and we have dbl-assignment.





-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HBASE-3412) HLogSplitter should handle missing HLogs

2011-01-05 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans resolved HBASE-3412.
---

  Resolution: Fixed
Assignee: Jean-Daniel Cryans
Hadoop Flags: [Reviewed]

Committed to branch and trunk, thanks for the review Stack!

 HLogSplitter should handle missing HLogs
 

 Key: HBASE-3412
 URL: https://issues.apache.org/jira/browse/HBASE-3412
 Project: HBase
  Issue Type: Bug
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.90.0

 Attachments: HBASE-3412-2.patch, HBASE-3412.patch


 In build #48 (https://hudson.apache.org/hudson/job/hbase-0.90/48/), 
 TestReplication failed because of missing rows on the slave cluster. The 
 reason is that a region server that was killed was able to archive a log at 
 the same time the master was trying to recover it:
 {noformat}
 [MASTER_META_SERVER_OPERATIONS-vesta.apache.org:47907-0] util.FSUtils(625):
  Recovering file 
 hdfs://localhost:50121/user/hudson/.logs/vesta.apache.org,58598,1294117333857/vesta.apache.org%3A58598.1294117406909
 ...
 [RegionServer:0;vesta.apache.org,58598,1294117333857.logRoller] wal.HLog(740):
  moving old hlog file 
 /user/hudson/.logs/vesta.apache.org,58598,1294117333857/vesta.apache.org%3A58598.1294117406909
  whose highest sequenceid is 422 to 
 /user/hudson/.oldlogs/vesta.apache.org%3A58598.1294117406909
 ...
 [MASTER_META_SERVER_OPERATIONS-vesta.apache.org:47907-0] 
 master.MasterFileSystem(204):
  Failed splitting 
 hdfs://localhost:50121/user/hudson/.logs/vesta.apache.org,58598,1294117333857
  java.io.IOException: Failed to open 
 hdfs://localhost:50121/user/hudson/.logs/vesta.apache.org,58598,1294117333857/vesta.apache.org%3A58598.1294117406909
  for append
 Caused by: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException:
  org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException:
  No lease on 
 /user/hudson/.logs/vesta.apache.org,58598,1294117333857/vesta.apache.org%3A58598.1294117406909
  File does not exist. [Lease.  Holder: DFSClient_-986975908, pendingcreates: 
 1]
 {noformat}
 We should probably just handle the fact that a file could have been archived 
 (maybe even check in .oldlogs to be sure) and move on to the next log.
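
A sketch of that suggestion, assuming a Hadoop FileSystem handle and the log Paths to split; splitOne() is a hypothetical per-file stand-in, not the actual HLogSplitter code:

{code}
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hedged sketch: tolerate a log that the dying region server archived while
// recovery was in progress, optionally confirming it landed in .oldlogs.
class SkipMissingLogsSketch {
  static void splitAll(FileSystem fs, List<Path> logs, Path oldLogsDir) throws IOException {
    for (Path log : logs) {
      if (!fs.exists(log)) {
        Path archived = new Path(oldLogsDir, log.getName());
        System.out.println("Skipping " + log
            + (fs.exists(archived) ? " (found in .oldlogs)" : " (missing)"));
        continue;
      }
      splitOne(fs, log);
    }
  }

  /** Placeholder for the actual per-file split. */
  static void splitOne(FileSystem fs, Path log) {
  }
}
{code}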

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3420) Handling a big rebalance, we can queue multiple instances of a Close event; messes up state

2011-01-05 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977890#action_12977890
 ] 

stack commented on HBASE-3420:
--

It's a timeout of a close.  Here is the sequence:

{code}
2011-01-05 00:49:37,670 INFO org.apache.hadoop.hbase.master.HMaster: balance 
hri=TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041., 
src=sv2borg181,60020,1294096110452, dest=sv2borg188,60020,1294187735582
2011-01-05 00:49:37,670 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Starting unassignment of region 
TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. (offlining)
2011-01-05 00:49:37,671 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Sent CLOSE to serverName=sv2borg181,60020,1294096110452, load=(requests=0, 
regions=0, usedHeap=0, maxHeap=0) for region 
TestTable,0487405776,1294125523541.  
b1fa38bb610943e9eadc604babe4d041.
2011-01-05 00:49:38,310 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: 
master:6-0x12d3de9e7c60e37 Retrieved 112 byte(s) of data from znode 
/hbase/unassigned/b1fa38bb610943e9eadc604babe4d041 and set watcher; 
region=TestTable,0487405776,1294125523541.   
b1fa38bb610943e9eadc604babe4d041., server=sv2borg181,60020,1294096110452, 
state=RS_ZK_REGION_CLOSED
2011-01-05 00:49:38,385 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling new unassigned node: 
/hbase/unassigned/b1fa38bb610943e9eadc604babe4d041 
(region=TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041., 
server=sv2borg181,60020,  1294096110452, state=RS_ZK_REGION_CLOSED)
2011-01-05 00:49:38,385 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_CLOSED, server=sv2borg181,60020,1294096110452, 
region=b1fa38bb610943e9eadc604babe4d041
2011-01-05 00:50:12,412 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
Regions in transition timed out:  
TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. 
state=CLOSED, ts=1294188578211
2011-01-05 00:50:12,412 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
Region has been CLOSED for too long, retriggering ClosedRegionHandler
{code}


 Handling a big rebalance, we can queue multiple instances of a Close event; 
 messes up state
 ---

 Key: HBASE-3420
 URL: https://issues.apache.org/jira/browse/HBASE-3420
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.0
Reporter: stack
 Fix For: 0.90.1


 This is pretty ugly.  In short, on a heavily loaded cluster, we are queuing 
 multiple instances of region close.  They all try to run, confusing state.
 Long version:
 I have a messy cluster.  It's 16k regions on 8 servers.  One node has 5k or so 
 regions on it.  Heaps are 1G all around.  My master had OOME'd.  Not sure why 
 but not too worried about it for now.  So, new master comes up and is trying 
 to rebalance the cluster:
 {code}
 2011-01-05 00:48:07,385 INFO org.apache.hadoop.hbase.master.LoadBalancer: 
 Calculated a load balance in 14ms. Moving 3666 regions off of 6 overloaded 
 servers onto 3 less loaded servers
 {code}
 The balancer ends up sending many closes to a single overloaded server; they are 
 taking so long that the close times out in RIT.  We then do this:
 {code}
   case CLOSED:
     LOG.info("Region has been CLOSED for too long, " +
         "retriggering ClosedRegionHandler");
     AssignmentManager.this.executorService.submit(
         new ClosedRegionHandler(master, AssignmentManager.this,
             regionState.getRegion()));
     break;
 {code}
 We queue a new close (Should we?).
 We time out a few more times (9 times) and each time we queue a new close.
 Eventually the close succeeds, the region gets assigned a new location.
 Then the next close pops off the eventhandler queue.
 Here is the telltale signature of stuff gone amiss:
 {code}
 2011-01-05 00:52:19,379 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. 
 state=OPEN, ts=1294188709030
 {code}
 Notice how state is OPEN when we are forcing offline (It was actually just 
 successfully opened).  We end up assigning same server because plan was still 
 around:
 {code}
 2011-01-05 00:52:20,705 WARN 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Attempted 
 open of TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. 
 but already online on this server
 {code}
 But later when plan is cleared, we assign new server and we have 
 dbl-assignment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3418) Increment operations can break when qualifiers are split between memstore/snapshot and storefiles

2011-01-05 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977896#action_12977896
 ] 

stack commented on HBASE-3418:
--

+1 on patch.

 Increment operations can break when qualifiers are split between 
 memstore/snapshot and storefiles
 -

 Key: HBASE-3418
 URL: https://issues.apache.org/jira/browse/HBASE-3418
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical
 Fix For: 0.90.0, 0.92.0

 Attachments: HBASE-3418-v1.patch


 Doing investigation around some observed resetting counter behavior.
 An optimization was added to check memstore/snapshots first and then check 
 storefiles if not all counters were found.  However it looks like this 
 introduced a bug when columns for a given row/family in a single increment 
 operation are spread across memstores and storefiles.
 The results from get operations on both memstores and storefiles are appended 
 together but when processed are expected to be fully sorted.  This can lead 
 to invalid results.
 Need to sort the combined result of memstores + storefiles.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3420) Handling a big rebalance, we can queue multiple instances of a Close event; messes up state

2011-01-05 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977897#action_12977897
 ] 

stack commented on HBASE-3420:
--

Looking more, the CLOSED event had been queued over on the master but tens of 
seconds elapsed before it had a chance to run (this was a rebalance of 
thousands of regions on a constrained server).  Meantime, we were requeuing 
ClosedRegionHandlers every ten seconds as the CLOSED state timed out in RIT.

I'm going to post a patch that removes adding a new CRH to the event queue on 
timeout of CLOSED.  Either the original queued CRH will run or the server will 
crash, and region state will be altered appropriately at that time.
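
Roughly, the change described here amounts to something like the following toy sketch (names invented for illustration, not the committed diff): the timeout monitor only refreshes the RegionState's timestamp so the timeout stops firing, and the single ClosedRegionHandler already on the event queue does the real work.

{code}
// Toy illustration of "don't requeue, just push the timeout forward".
class ClosedTimeoutSketch {
  static class RegionStateStub {
    long timestamp = System.currentTimeMillis();

    void touch() {
      timestamp = System.currentTimeMillis();
    }
  }

  static void onClosedTimeout(RegionStateStub state) {
    System.out.println("Region has been CLOSED for too long; waiting on queued handler");
    state.touch();   // instead of submitting another ClosedRegionHandler
  }
}
{code}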

 Handling a big rebalance, we can queue multiple instances of a Close event; 
 messes up state
 ---

 Key: HBASE-3420
 URL: https://issues.apache.org/jira/browse/HBASE-3420
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.0
Reporter: stack
 Fix For: 0.90.1


 This is pretty ugly.  In short, on a heavily loaded cluster, we are queuing 
 multiple instances of region close.  They all try to run, confusing state.
 Long version:
 I have a messy cluster.  It's 16k regions on 8 servers.  One node has 5k or so 
 regions on it.  Heaps are 1G all around.  My master had OOME'd.  Not sure why 
 but not too worried about it for now.  So, new master comes up and is trying 
 to rebalance the cluster:
 {code}
 2011-01-05 00:48:07,385 INFO org.apache.hadoop.hbase.master.LoadBalancer: 
 Calculated a load balance in 14ms. Moving 3666 regions off of 6 overloaded 
 servers onto 3 less loaded servers
 {code}
 The balancer ends up sending many closes to a single overloaded server; they are 
 taking so long that the close times out in RIT.  We then do this:
 {code}
   case CLOSED:
     LOG.info("Region has been CLOSED for too long, " +
         "retriggering ClosedRegionHandler");
     AssignmentManager.this.executorService.submit(
         new ClosedRegionHandler(master, AssignmentManager.this,
             regionState.getRegion()));
     break;
 {code}
 We queue a new close (Should we?).
 We time out a few more times (9 times) and each time we queue a new close.
 Eventually the close succeeds, the region gets assigned a new location.
 Then the next close pops off the eventhandler queue.
 Here is the telltale signature of stuff gone amiss:
 {code}
 2011-01-05 00:52:19,379 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. 
 state=OPEN, ts=1294188709030
 {code}
 Notice how state is OPEN when we are forcing offline (It was actually just 
 successfully opened).  We end up assigning same server because plan was still 
 around:
 {code}
 2011-01-05 00:52:20,705 WARN 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Attempted 
 open of TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. 
 but already online on this server
 {code}
 But later when plan is cleared, we assign new server and we have 
 dbl-assignment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HBASE-3420) Handling a big rebalance, we can queue multiple instances of a Close event; messes up state

2011-01-05 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3420:
-

Attachment: 3420.txt

This should address the most egregious issue turned up by these logs.  Another 
thing to add is a maximum number of regions to assign per balance.  We should 
add that too.  Will make a new issue for that once this goes in.

 Handling a big rebalance, we can queue multiple instances of a Close event; 
 messes up state
 ---

 Key: HBASE-3420
 URL: https://issues.apache.org/jira/browse/HBASE-3420
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.0
Reporter: stack
 Fix For: 0.90.1

 Attachments: 3420.txt


 This is pretty ugly.  In short, on a heavily loaded cluster, we are queuing 
 multiple instances of region close.  They all try to run, confusing state.
 Long version:
 I have a messy cluster.  It's 16k regions on 8 servers.  One node has 5k or so 
 regions on it.  Heaps are 1G all around.  My master had OOME'd.  Not sure why 
 but not too worried about it for now.  So, new master comes up and is trying 
 to rebalance the cluster:
 {code}
 2011-01-05 00:48:07,385 INFO org.apache.hadoop.hbase.master.LoadBalancer: 
 Calculated a load balance in 14ms. Moving 3666 regions off of 6 overloaded 
 servers onto 3 less loaded servers
 {code}
 The balancer ends up sending many closes to a single overloaded server; they are 
 taking so long that the close times out in RIT.  We then do this:
 {code}
   case CLOSED:
     LOG.info("Region has been CLOSED for too long, " +
         "retriggering ClosedRegionHandler");
     AssignmentManager.this.executorService.submit(
         new ClosedRegionHandler(master, AssignmentManager.this,
             regionState.getRegion()));
     break;
 {code}
 We queue a new close (Should we?).
 We time out a few more times (9 times) and each time we queue a new close.
 Eventually the close succeeds, the region gets assigned a new location.
 Then the next close pops off the eventhandler queue.
 Here is the telltale signature of stuff gone amiss:
 {code}
 2011-01-05 00:52:19,379 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. 
 state=OPEN, ts=1294188709030
 {code}
 Notice how state is OPEN when we are forcing offline (It was actually just 
 successfully opened).  We end up assigning same server because plan was still 
 around:
 {code}
 2011-01-05 00:52:20,705 WARN 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Attempted 
 open of TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. 
 but already online on this server
 {code}
 But later when plan is cleared, we assign new server and we have 
 dbl-assignment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3420) Handling a big rebalance, we can queue multiple instances of a Close event; messes up state

2011-01-05 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977906#action_12977906
 ] 

Jonathan Gray commented on HBASE-3420:
--

So this just updates the timestamp.  Seems like it would be equivalent to 
logging and doing a NO-OP on CLOSED timeout (only point of updating timestamp 
is to prevent another timeout).  I guess this is fine since we will get a log 
message once per timeout period though.

So once the CRH runs, the RegionState goes to OFFLINE huh?  Makes sense then.

+1

and +1 on a maxregionstobalanceatonce or the like

 Handling a big rebalance, we can queue multiple instances of a Close event; 
 messes up state
 ---

 Key: HBASE-3420
 URL: https://issues.apache.org/jira/browse/HBASE-3420
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.0
Reporter: stack
 Fix For: 0.90.1

 Attachments: 3420.txt


 This is pretty ugly.  In short, on a heavily loaded cluster, we are queuing 
 multiple instances of region close.  They all try to run, confusing state.
 Long version:
 I have a messy cluster.  It's 16k regions on 8 servers.  One node has 5k or so 
 regions on it.  Heaps are 1G all around.  My master had OOME'd.  Not sure why 
 but not too worried about it for now.  So, new master comes up and is trying 
 to rebalance the cluster:
 {code}
 2011-01-05 00:48:07,385 INFO org.apache.hadoop.hbase.master.LoadBalancer: 
 Calculated a load balance in 14ms. Moving 3666 regions off of 6 overloaded 
 servers onto 3 less loaded servers
 {code}
 The balancer ends up sending many closes to a single overloaded server; they are 
 taking so long that the close times out in RIT.  We then do this:
 {code}
   case CLOSED:
     LOG.info("Region has been CLOSED for too long, " +
         "retriggering ClosedRegionHandler");
     AssignmentManager.this.executorService.submit(
         new ClosedRegionHandler(master, AssignmentManager.this,
             regionState.getRegion()));
     break;
 {code}
 We queue a new close (Should we?).
 We time out a few more times (9 times) and each time we queue a new close.
 Eventually the close succeeds, the region gets assigned a new location.
 Then the next close pops off the eventhandler queue.
 Here is the telltale signature of stuff gone amiss:
 {code}
 2011-01-05 00:52:19,379 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. 
 state=OPEN, ts=1294188709030
 {code}
 Notice how state is OPEN when we are forcing offline (It was actually just 
 successfully opened).  We end up assigning same server because plan was still 
 around:
 {code}
 2011-01-05 00:52:20,705 WARN 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Attempted 
 open of TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. 
 but already online on this server
 {code}
 But later when plan is cleared, we assign new server and we have 
 dbl-assignment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3420) Handling a big rebalance, we can queue multiple instances of a Close event; messes up state

2011-01-05 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977929#action_12977929
 ] 

stack commented on HBASE-3420:
--

@Ted Balancing works differently in 0.90.  Before, when a RS would heartbeat, 
we'd give it a set of regions to open/close in the response.  The new region 
assignment goes via ZK.  The balancer looks at total cluster state and comes up 
w/ a plan.  It then starts the plan rolling, which instigates a cascade of 
closings done via ZK.
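
For readers following along, a very rough sketch of that flow (all names illustrative, not the real LoadBalancer/AssignmentManager API): the balancer computes a set of region moves from the cluster-wide load, and each move starts with an unassign whose close/open handshake is driven through ZooKeeper.

{code}
import java.util.List;

// Illustrative only: plan first, then kick off unassigns that proceed via ZK events.
class BalanceFlowSketch {
  static class RegionPlan {
    String region;
    String source;
    String destination;
  }

  interface AssignmentLike {
    /** Posts the close to the source server; progress is observed through ZK znodes. */
    void unassign(RegionPlan plan);
  }

  static void executePlans(List<RegionPlan> plans, AssignmentLike am) {
    for (RegionPlan p : plans) {
      am.unassign(p);   // the destination is reused when the region is reassigned
    }
  }
}
{code}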

 Handling a big rebalance, we can queue multiple instances of a Close event; 
 messes up state
 ---

 Key: HBASE-3420
 URL: https://issues.apache.org/jira/browse/HBASE-3420
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.0
Reporter: stack
 Fix For: 0.90.1

 Attachments: 3420.txt


 This is pretty ugly.  In short, on a heavily loaded cluster, we are queuing 
 multiple instances of region close.  They all try to run, confusing state.
 Long version:
 I have a messy cluster.  It's 16k regions on 8 servers.  One node has 5k or so 
 regions on it.  Heaps are 1G all around.  My master had OOME'd.  Not sure why 
 but not too worried about it for now.  So, new master comes up and is trying 
 to rebalance the cluster:
 {code}
 2011-01-05 00:48:07,385 INFO org.apache.hadoop.hbase.master.LoadBalancer: 
 Calculated a load balance in 14ms. Moving 3666 regions off of 6 overloaded 
 servers onto 3 less loaded servers
 {code}
 The balancer ends up sending many closes to a single overloaded server; they are 
 taking so long that the close times out in RIT.  We then do this:
 {code}
   case CLOSED:
     LOG.info("Region has been CLOSED for too long, " +
         "retriggering ClosedRegionHandler");
     AssignmentManager.this.executorService.submit(
         new ClosedRegionHandler(master, AssignmentManager.this,
             regionState.getRegion()));
     break;
 {code}
 We queue a new close (Should we?).
 We time out a few more times (9 times) and each time we queue a new close.
 Eventually the close succeeds, the region gets assigned a new location.
 Then the next close pops off the eventhandler queue.
 Here is the telltale signature of stuff gone amiss:
 {code}
 2011-01-05 00:52:19,379 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. 
 state=OPEN, ts=1294188709030
 {code}
 Notice how state is OPEN when we are forcing offline (It was actually just 
 successfully opened).  We end up assigning same server because plan was still 
 around:
 {code}
 2011-01-05 00:52:20,705 WARN 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Attempted 
 open of TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. 
 but already online on this server
 {code}
 But later when plan is cleared, we assign new server and we have 
 dbl-assignment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3420) Handling a big rebalance, we can queue multiple instances of a Close event; messes up state

2011-01-05 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977930#action_12977930
 ] 

Jonathan Gray commented on HBASE-3420:
--

It's unrelated to the notion of checkins (which is almost completely gone 
now) so not sure why we would reuse this config param.  We could set per-RS 
limits but that would probably require significantly more hack-up of the 
balancing algo.

 Handling a big rebalance, we can queue multiple instances of a Close event; 
 messes up state
 ---

 Key: HBASE-3420
 URL: https://issues.apache.org/jira/browse/HBASE-3420
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.0
Reporter: stack
 Fix For: 0.90.1

 Attachments: 3420.txt


 This is pretty ugly.  In short, on a heavily loaded cluster, we are queuing 
 multiple instances of region close.  They all try to run, confusing state.
 Long version:
 I have a messy cluster.  It's 16k regions on 8 servers.  One node has 5k or so 
 regions on it.  Heaps are 1G all around.  My master had OOME'd.  Not sure why 
 but not too worried about it for now.  So, new master comes up and is trying 
 to rebalance the cluster:
 {code}
 2011-01-05 00:48:07,385 INFO org.apache.hadoop.hbase.master.LoadBalancer: 
 Calculated a load balance in 14ms. Moving 3666 regions off of 6 overloaded 
 servers onto 3 less loaded servers
 {code}
 The balancer ends up sending many closes to a single overloaded server; they are 
 taking so long that the close times out in RIT.  We then do this:
 {code}
   case CLOSED:
     LOG.info("Region has been CLOSED for too long, " +
         "retriggering ClosedRegionHandler");
     AssignmentManager.this.executorService.submit(
         new ClosedRegionHandler(master, AssignmentManager.this,
             regionState.getRegion()));
     break;
 {code}
 We queue a new close (Should we?).
 We time out a few more times (9 times) and each time we queue a new close.
 Eventually the close succeeds, the region gets assigned a new location.
 Then the next close pops off the eventhandler queue.
 Here is the telltale signature of stuff gone amiss:
 {code}
 2011-01-05 00:52:19,379 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. 
 state=OPEN, ts=1294188709030
 {code}
 Notice how state is OPEN when we are forcing offline (It was actually just 
 successfully opened).  We end up assigning same server because plan was still 
 around:
 {code}
 2011-01-05 00:52:20,705 WARN 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Attempted 
 open of TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. 
 but already online on this server
 {code}
 But later when plan is cleared, we assign new server and we have 
 dbl-assignment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

2011-01-05 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977943#action_12977943
 ] 

Jonathan Gray commented on HBASE-3417:
--

One idea from discussion with stack is to use a UUID for the filename.  That 
way we can generate it once for the temporary file and then just move it into 
place without doing a rename.  We would then just use UUID + blockNumber as 
the blockName.
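
A minimal sketch of that naming idea (illustrative, not the eventual patch): the store file gets a UUID once, the temporary file and the final file share it, and cached blocks are keyed by the UUID plus the block's ordinal so cache-on-write and read-time names agree.

{code}
import java.util.UUID;

// Sketch: a path-independent block cache key built from a per-file UUID.
class BlockNameSketch {
  static String blockName(UUID fileId, int blockNumber) {
    return fileId.toString() + "_" + blockNumber;
  }

  public static void main(String[] args) {
    UUID fileId = UUID.randomUUID();          // generated once per store file
    System.out.println(blockName(fileId, 0)); // same key whether writing to tmp or reading in place
  }
}
{code}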

 CacheOnWrite is using the temporary output path for block names, need to use 
 a more consistent block naming scheme
 --

 Key: HBASE-3417
 URL: https://issues.apache.org/jira/browse/HBASE-3417
 Project: HBase
  Issue Type: Bug
  Components: io, regionserver
Affects Versions: 0.92.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical
 Fix For: 0.92.0


 Currently the block names used in the block cache are built using the 
 filesystem path.  However, for cache on write, the path is a temporary output 
 file.
 The original COW patch actually made some modifications to block naming stuff 
 to make it more consistent but did not do enough.  Should add a separate 
 method somewhere for generating block names using some more easily mocked 
 scheme (rather than just raw path as we generate a random unique file name 
 twice, once for tmp and then again when moved into place).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3420) Handling a big rebalance, we can queue multiple instances of a Close event; messes up state

2011-01-05 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977968#action_12977968
 ] 

stack commented on HBASE-3420:
--

Ok... with this patch in place, master was able to join the cluster w/o 
aborting and live through the rebalance (all regions cleared from RIT).  I'm 
going to commit.

 Handling a big rebalance, we can queue multiple instances of a Close event; 
 messes up state
 ---

 Key: HBASE-3420
 URL: https://issues.apache.org/jira/browse/HBASE-3420
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.0
Reporter: stack
 Fix For: 0.90.1

 Attachments: 3420.txt


 This is pretty ugly.  In short, on a heavily loaded cluster, we are queuing 
 multiple instances of region close.  They all try to run, confusing state.
 Long version:
 I have a messy cluster.  It's 16k regions on 8 servers.  One node has 5k or so 
 regions on it.  Heaps are 1G all around.  My master had OOME'd.  Not sure why 
 but not too worried about it for now.  So, new master comes up and is trying 
 to rebalance the cluster:
 {code}
 2011-01-05 00:48:07,385 INFO org.apache.hadoop.hbase.master.LoadBalancer: 
 Calculated a load balance in 14ms. Moving 3666 regions off of 6 overloaded 
 servers onto 3 less loaded servers
 {code}
 The balancer ends up sending many closes to a single overloaded server; they are 
 taking so long that the close times out in RIT.  We then do this:
 {code}
   case CLOSED:
     LOG.info("Region has been CLOSED for too long, " +
         "retriggering ClosedRegionHandler");
     AssignmentManager.this.executorService.submit(
         new ClosedRegionHandler(master, AssignmentManager.this,
             regionState.getRegion()));
     break;
 {code}
 We queue a new close (Should we?).
 We time out a few more times (9 times) and each time we queue a new close.
 Eventually the close succeeds, the region gets assigned a new location.
 Then the next close pops off the eventhandler queue.
 Here is the telltale signature of stuff gone amiss:
 {code}
 2011-01-05 00:52:19,379 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. 
 state=OPEN, ts=1294188709030
 {code}
 Notice how state is OPEN when we are forcing offline (It was actually just 
 successfully opened).  We end up assigning same server because plan was still 
 around:
 {code}
 2011-01-05 00:52:20,705 WARN 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Attempted 
 open of TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. 
 but already online on this server
 {code}
 But later when plan is cleared, we assign new server and we have 
 dbl-assignment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HBASE-3420) Handling a big rebalance, we can queue multiple instances of a Close event; messes up state

2011-01-05 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-3420.
--

   Resolution: Fixed
Fix Version/s: (was: 0.90.1)
   0.90.0
 Assignee: stack
 Hadoop Flags: [Reviewed]

Committed to branch and trunk.

 Handling a big rebalance, we can queue multiple instances of a Close event; 
 messes up state
 ---

 Key: HBASE-3420
 URL: https://issues.apache.org/jira/browse/HBASE-3420
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.0
Reporter: stack
Assignee: stack
 Fix For: 0.90.0

 Attachments: 3420.txt


 This is pretty ugly.  In short, on a heavily loaded cluster, we are queuing 
 multiple instances of region close.  They all try to run, confusing state.
 Long version:
 I have a messy cluster.  It's 16k regions on 8 servers.  One node has 5k or so 
 regions on it.  Heaps are 1G all around.  My master had OOME'd.  Not sure why 
 but not too worried about it for now.  So, new master comes up and is trying 
 to rebalance the cluster:
 {code}
 2011-01-05 00:48:07,385 INFO org.apache.hadoop.hbase.master.LoadBalancer: 
 Calculated a load balance in 14ms. Moving 3666 regions off of 6 overloaded 
 servers onto 3 less loaded servers
 {code}
 The balancer ends up sending many closes to a single overloaded server; they are 
 taking so long that the close times out in RIT.  We then do this:
 {code}
   case CLOSED:
     LOG.info("Region has been CLOSED for too long, " +
         "retriggering ClosedRegionHandler");
     AssignmentManager.this.executorService.submit(
         new ClosedRegionHandler(master, AssignmentManager.this,
             regionState.getRegion()));
     break;
 {code}
 We queue a new close (Should we?).
 We time out a few more times (9 times) and each time we queue a new close.
 Eventually the close succeeds, the region gets assigned a new location.
 Then the next close pops off the eventhandler queue.
 Here is the telltale signature of stuff gone amiss:
 {code}
 2011-01-05 00:52:19,379 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. 
 state=OPEN, ts=1294188709030
 {code}
 Notice how state is OPEN when we are forcing offline (It was actually just 
 successfully opened).  We end up assigning same server because plan was still 
 around:
 {code}
 2011-01-05 00:52:20,705 WARN 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Attempted 
 open of TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. 
 but already online on this server
 {code}
 But later when plan is cleared, we assign new server and we have 
 dbl-assignment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HBASE-3421) Very wide rows -- 30M plus -- cause us OOME

2011-01-05 Thread stack (JIRA)
Very wide rows -- 30M plus -- cause us OOME
---

 Key: HBASE-3421
 URL: https://issues.apache.org/jira/browse/HBASE-3421
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.0
Reporter: stack


From the list, see 'jvm oom' in 
http://mail-archives.apache.org/mod_mbox/hbase-user/201101.mbox/browser, it 
looks like wide rows -- 30M or so -- cause OOME during compaction.  We should 
check it out. Can the scanner used during compactions use the 'limit' when 
nexting?  If so, this should save our OOME'ing (or, we need to add to the next 
a max size rather than count of KVs).
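A rough sketch of the "max size rather than count of KVs" idea, assuming nothing about HBase's actual scanner API (SizeBoundedBatcher and Sizer are invented names): cap each batch drained from a scanner by accumulated bytes so a 30MB row is never materialized in a single next call.

{code}
// Sketch only: bound a compaction batch by bytes instead of by KV count.
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class SizeBoundedBatcher {
  /** Reports the heap footprint of one KeyValue-like object. */
  public interface Sizer<T> {
    long heapSize(T item);
  }

  /** Drain at most maxBatchBytes worth of items from the scanner into one batch. */
  public static <T> List<T> nextBatch(Iterator<T> scanner, Sizer<T> sizer,
      long maxBatchBytes) {
    List<T> batch = new ArrayList<T>();
    long bytes = 0;
    while (scanner.hasNext() && bytes < maxBatchBytes) {
      T item = scanner.next();
      batch.add(item);
      bytes += sizer.heapSize(item); // stop on size, not on number of KVs
    }
    return batch;
  }
}
{code}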

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HBASE-3422) Balancer will willingly try to rebalance thousands of regions in one go; needs an upper bound added.

2011-01-05 Thread stack (JIRA)
Balancer will willingly try to rebalance thousands of regions in one go; needs an 
upper bound added.
--

 Key: HBASE-3422
 URL: https://issues.apache.org/jira/browse/HBASE-3422
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.0
Reporter: stack


 See HBASE-3420.  Therein, a wonky cluster had 5k regions on one server and < 1k 
 on others.  Balancer ran and wanted to redistribute 3k+ all in one go.  Madness.

 If a load of rebalancing is to be done, it should be done somewhat piecemeal.  At 
 a minimum, we need an upper bound on the number of regions to rebalance at a time.
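A minimal sketch of the proposed cap (illustrative only; BoundedBalancePlan and maxPerRun are invented names, not an existing HBase class or config): trim the balancer's plan so a single run never moves more than a bounded number of regions, leaving the rest for later runs.

{code}
// Sketch only: cap how many region moves one balancer run is allowed to make.
import java.util.List;

public class BoundedBalancePlan {
  /** Trim a plan so a single balancer run never moves more than maxPerRun regions. */
  public static <P> List<P> capMoves(List<P> plannedMoves, int maxPerRun) {
    if (plannedMoves.size() <= maxPerRun) {
      return plannedMoves;
    }
    // Carry out only the first maxPerRun moves now; later runs pick up the rest.
    return plannedMoves.subList(0, maxPerRun);
  }
}
{code}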

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3409) Failed server shutdown processing when retrying hlog split

2011-01-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977981#action_12977981
 ] 

Hudson commented on HBASE-3409:
---

Integrated in HBase-TRUNK #1703 (See 
[https://hudson.apache.org/hudson/job/HBase-TRUNK/1703/])


 Failed server shutdown processing when retrying hlog split
 --

 Key: HBASE-3409
 URL: https://issues.apache.org/jira/browse/HBASE-3409
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.0
Reporter: Todd Lipcon
Assignee: stack
Priority: Blocker
 Fix For: 0.90.0

 Attachments: 3409.txt


 2011-01-04 01:14:17,353 WARN org.apache.hadoop.hbase.master.MasterFileSystem: 
 Retrying splitting because of:
 org.apache.hadoop.hbase.regionserver.wal.OrphanHLogAfterSplitException: 
 Discovered orphan hlog after split. Maybe the HRegionServer was not dead when 
 we started
 at 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:286)
 at 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:187)
 at 
 org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:196)
 at 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:96)
 at 
 org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:619)
 2011-01-04 01:14:17,353 ERROR org.apache.hadoop.hbase.executor.EventHandler: 
 Caught throwable while processing event M_META_SERVER_SHUTDOWN
 java.lang.IllegalStateException: An HLogSplitter instance may only be used 
 once
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:145)
 at 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:170)
 at 
 org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:199)
 at 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:96)
 at 
 org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:619)
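The second stack trace points at the shape of the fix: the retry loop reuses an HLogSplitter even though an instance may only be used once. Below is a minimal retry sketch that builds a fresh splitter per attempt, written against assumed interfaces (Splitter, SplitterFactory); it is an illustration, not the attached 3409.txt patch.

{code}
// Sketch only: construct a new splitter for every retry attempt.
public class RetrySplit {
  public interface Splitter {
    void splitOnce() throws Exception;
  }

  public interface SplitterFactory {
    Splitter newSplitter();
  }

  public static void splitWithRetries(SplitterFactory factory, int maxRetries)
      throws Exception {
    for (int attempt = 0; attempt < maxRetries; attempt++) {
      try {
        factory.newSplitter().splitOnce(); // a fresh instance every attempt
        return;
      } catch (Exception e) {
        if (attempt == maxRetries - 1) {
          throw e; // out of retries, surface the failure
        }
        // otherwise fall through and retry with a brand-new splitter
      }
    }
  }
}
{code}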

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3412) HLogSplitter should handle missing HLogs

2011-01-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977978#action_12977978
 ] 

Hudson commented on HBASE-3412:
---

Integrated in HBase-TRUNK #1703 (See 
[https://hudson.apache.org/hudson/job/HBase-TRUNK/1703/])
HBASE-3412  HLogSplitter should handle missing HLogs


 HLogSplitter should handle missing HLogs
 

 Key: HBASE-3412
 URL: https://issues.apache.org/jira/browse/HBASE-3412
 Project: HBase
  Issue Type: Bug
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.90.0

 Attachments: HBASE-3412-2.patch, HBASE-3412.patch


 In build #48 (https://hudson.apache.org/hudson/job/hbase-0.90/48/), 
 TestReplication failed because of missing rows on the slave cluster. The 
 reason is that a region server that was killed was able to archive a log at 
 the same time the master was trying to recover it:
 {noformat}
 [MASTER_META_SERVER_OPERATIONS-vesta.apache.org:47907-0] util.FSUtils(625):
  Recovering file 
 hdfs://localhost:50121/user/hudson/.logs/vesta.apache.org,58598,1294117333857/vesta.apache.org%3A58598.1294117406909
 ...
 [RegionServer:0;vesta.apache.org,58598,1294117333857.logRoller] wal.HLog(740):
  moving old hlog file 
 /user/hudson/.logs/vesta.apache.org,58598,1294117333857/vesta.apache.org%3A58598.1294117406909
  whose highest sequenceid is 422 to 
 /user/hudson/.oldlogs/vesta.apache.org%3A58598.1294117406909
 ...
 [MASTER_META_SERVER_OPERATIONS-vesta.apache.org:47907-0] 
 master.MasterFileSystem(204):
  Failed splitting 
 hdfs://localhost:50121/user/hudson/.logs/vesta.apache.org,58598,1294117333857
  java.io.IOException: Failed to open 
 hdfs://localhost:50121/user/hudson/.logs/vesta.apache.org,58598,1294117333857/vesta.apache.org%3A58598.1294117406909
  for append
 Caused by: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException:
  org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException:
  No lease on 
 /user/hudson/.logs/vesta.apache.org,58598,1294117333857/vesta.apache.org%3A58598.1294117406909
  File does not exist. [Lease.  Holder: DFSClient_-986975908, pendingcreates: 
 1]
 {noformat}
 We should probably just handle the fact that a file could have been archived 
 (maybe even check in .oldlogs to be sure) and move on to the next log.
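A small sketch of that suggestion using the Hadoop FileSystem API; the helper name skipIfArchived is invented and this is not the attached patch: if the log is gone from .logs, check .oldlogs and skip it rather than failing the whole split.

{code}
// Sketch only: skip a log that was archived out from under the split.
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MissingLogCheck {
  /** Returns true if the log should be skipped because it is no longer in .logs. */
  public static boolean skipIfArchived(FileSystem fs, Path logInLogsDir,
      Path oldLogsDir) throws IOException {
    if (fs.exists(logInLogsDir)) {
      return false; // still there; split it normally
    }
    Path archived = new Path(oldLogsDir, logInLogsDir.getName());
    if (fs.exists(archived)) {
      return true;  // the region server archived it while we were recovering
    }
    // Neither location has it; treat it as gone and move on to the next log.
    return true;
  }
}
{code}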

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3402) Web UI shows two META regions

2011-01-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977979#action_12977979
 ] 

Hudson commented on HBASE-3402:
---

Integrated in HBase-TRUNK #1703 (See 
[https://hudson.apache.org/hudson/job/HBase-TRUNK/1703/])


 Web UI shows two META regions
 -

 Key: HBASE-3402
 URL: https://issues.apache.org/jira/browse/HBASE-3402
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Affects Versions: 0.90.0
Reporter: Todd Lipcon
Assignee: stack
Priority: Critical
 Fix For: 0.90.0

 Attachments: two-metas.png


 Running 0...@r1052112 I see two regions for META on the same server. Both 
 have start key '-' and end key '-'.
 Things seem to work OK, but it's very strange.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3419) If re-transition to OPENING during log replay fails, server aborts. Instead, should just cancel region open.

2011-01-05 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977983#action_12977983
 ] 

stack commented on HBASE-3419:
--

Chatting about this up on IRC, the tickle does not happen if we are skipping 
edits.  That's wrong.  We should tickle even if we skip edits.

Regarding making progressable a Chore, I'd say not exactly.  Progressable is 
about whether or not progress is being made.  We don't want the tickle to happen 
if we are stuck on HDFS.  Chatting w/ Jon, the tickle should happen not after 
N edits but after P milliseconds AS LONG AS we're making progress.

Also, killing the regionserver if we fail to replay recovered.edits in time is wrong.  
Instead we should fail the region open.
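A minimal sketch of that timing rule (ReplayTickleSketch, ProgressReporter and reportIntervalMs are assumptions, not HBase API): tickle at most once per interval, and only right after an edit was applied, so being stuck on HDFS never produces a tickle.

{code}
// Sketch only: time-based tickle, emitted only while edits are being applied.
public class ReplayTickleSketch {
  public interface ProgressReporter {
    void progress();
  }

  public static void replay(Iterable<Runnable> edits, ProgressReporter reporter,
      long reportIntervalMs) {
    long lastReport = System.currentTimeMillis();
    for (Runnable applyEdit : edits) {
      applyEdit.run();                         // progress was made on this edit
      long now = System.currentTimeMillis();
      if (now - lastReport >= reportIntervalMs) {
        reporter.progress();                   // tickle the ZK OPENING node
        lastReport = now;
      }
    }
  }
}
{code}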

 If re-transition to OPENING during log replay fails, server aborts.  Instead, 
 should just cancel region open.
 -

 Key: HBASE-3419
 URL: https://issues.apache.org/jira/browse/HBASE-3419
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Affects Versions: 0.90.0, 0.92.0
Reporter: Jonathan Gray
Priority: Critical
 Fix For: 0.90.1, 0.92.0


 The {{Progressable}} used on region open to tickle the ZK OPENING node to 
 prevent the master from timing out a region open operation will currently 
 abort the RegionServer if this fails for some reason.  However it could be 
 normal for an RS to have a region open operation aborted by the master, so 
 should just handle as it does other places by reverting the open.
 We had a cluster trip over some other issue (for some reason, the tickle was 
 not happening in < 30 seconds, so master was timing out every time).  Because 
 of the abort on BadVersion, this eventually led to every single RS aborting 
 itself eventually taking down the cluster.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HBASE-3419) If re-transition to OPENING during log replay fails, server aborts. Instead, should just cancel region open.

2011-01-05 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray reassigned HBASE-3419:


Assignee: Jonathan Gray

Working on implementing what stack outlined above.

 If re-transition to OPENING during log replay fails, server aborts.  Instead, 
 should just cancel region open.
 -

 Key: HBASE-3419
 URL: https://issues.apache.org/jira/browse/HBASE-3419
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Affects Versions: 0.90.0, 0.92.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical
 Fix For: 0.90.1, 0.92.0


 The {{Progressable}} used on region open to tickle the ZK OPENING node to 
 prevent the master from timing out a region open operation will currently 
 abort the RegionServer if this fails for some reason.  However it could be 
 normal for an RS to have a region open operation aborted by the master, so 
 should just handle as it does other places by reverting the open.
 We had a cluster trip over some other issue (for some reason, the tickle was 
 not happening in < 30 seconds, so master was timing out every time).  Because 
 of the abort on BadVersion, this eventually led to every single RS aborting 
 itself eventually taking down the cluster.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3421) Very wide rows -- 30M plus -- cause us OOME

2011-01-05 Thread Nicolas Spiegelberg (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12978016#action_12978016
 ] 

Nicolas Spiegelberg commented on HBASE-3421:


Note that you can limit the number of StoreFiles that can be compacted at one 
time...

Store.java#204:  this.maxFilesToCompact =
conf.getInt("hbase.hstore.compaction.max", 10)

30M * 10 SF == 300MB.  What is your RAM capacity?  You are likely stuck on a 
merging outlier that exists in every SF.  I would run:

bin/hbase org.apache.hadoop.hbase.io.hfile.HFile -f FILE_NAME -p |sed 
's/V:.*$//g'|less

on the HFiles in that Store to see what your high watermark is.

 Very wide rows -- 30M plus -- cause us OOME
 ---

 Key: HBASE-3421
 URL: https://issues.apache.org/jira/browse/HBASE-3421
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.0
Reporter: stack

 From the list, see 'jvm oom' in 
 http://mail-archives.apache.org/mod_mbox/hbase-user/201101.mbox/browser, it 
 looks like wide rows -- 30M or so -- cause OOME during compaction.  We 
 should check it out. Can the scanner used during compactions use the 'limit' 
 when nexting?  If so, this should save our OOME'ing (or, we need to add to 
 the next a max size rather than count of KVs).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HBASE-3419) If re-transition to OPENING during log replay fails, server aborts. Instead, should just cancel region open.

2011-01-05 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-3419:
-

Attachment: HBASE-3419-v1.patch

As outlined.

Had to add a new {{CancelableProgressable}} interface because we needed to be 
able to tell the caller to cancel the operation.
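Roughly, the shape such an interface takes is sketched below (an outline only; see the attached patch for the real definition): progress() reports that work is still happening, and the boolean return gives the callee a way to be told to cancel the region open instead of aborting the server.

{code}
// Outline only; not the patch's exact definition.
public interface CancelableProgressable {
  /**
   * Report that progress is being made. Returning false asks the code doing
   * the work to cancel the operation (for example, the master has already
   * given up on this open) rather than abort the whole region server.
   */
  boolean progress();
}
{code}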

 If re-transition to OPENING during log replay fails, server aborts.  Instead, 
 should just cancel region open.
 -

 Key: HBASE-3419
 URL: https://issues.apache.org/jira/browse/HBASE-3419
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Affects Versions: 0.90.0, 0.92.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical
 Fix For: 0.90.1, 0.92.0

 Attachments: HBASE-3419-v1.patch


 The {{Progressable}} used on region open to tickle the ZK OPENING node to 
 prevent the master from timing out a region open operation will currently 
 abort the RegionServer if this fails for some reason.  However it could be 
 normal for an RS to have a region open operation aborted by the master, so 
 should just handle as it does other places by reverting the open.
 We had a cluster trip over some other issue (for some reason, the tickle was 
 not happening in < 30 seconds, so master was timing out every time).  Because 
 of the abort on BadVersion, this eventually led to every single RS aborting 
 itself eventually taking down the cluster.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3403) Region orphaned after failure during split

2011-01-05 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12978035#action_12978035
 ] 

stack commented on HBASE-3403:
--

bq. + cluster.getMaster().catalogJanitorSwitch(false);

I added the above switch to avoid the unlikely but *perhaps* possible case of a 
split, a compaction in each daughter, and a run of the catalogjanitor happening before we 
get our edit of .META. in.  Just trying to do all I can to avoid having to debug 
failed tests up on hudson.

NP on changing name of method.  Will do.

bq. Does this change introduce a new bug?

Yes.  That could happen.  Unlikely, but perhaps.  Let me spin a new patch.




 Region orphaned after failure during split
 --

 Key: HBASE-3403
 URL: https://issues.apache.org/jira/browse/HBASE-3403
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.0
Reporter: Todd Lipcon
Priority: Blocker
 Fix For: 0.90.0

 Attachments: 3403.txt, broken-split.txt, 
 hbck-fix-missing-in-meta.txt, master-logs.txt.gz


 ERROR: Region 
 hdfs://haus01.sf.cloudera.com:11020/hbase-normal/usertable/2ad8df700eea55f70e02ea89178a65a2
  on HDFS, but not listed in META or deployed on any region server.
 ERROR: Found inconsistency in table usertable
 Not sure how I got into this state, will look through logs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HBASE-3419) If re-transition to OPENING during log replay fails, server aborts. Instead, should just cancel region open.

2011-01-05 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-3419:
-

Attachment: HBASE-3419-v2.patch

Squashed the v1 patch with another patch.  v2 is just this stuff.

 If re-transition to OPENING during log replay fails, server aborts.  Instead, 
 should just cancel region open.
 -

 Key: HBASE-3419
 URL: https://issues.apache.org/jira/browse/HBASE-3419
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Affects Versions: 0.90.0, 0.92.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical
 Fix For: 0.90.1, 0.92.0

 Attachments: HBASE-3419-v1.patch, HBASE-3419-v2.patch


 The {{Progressable}} used on region open to tickle the ZK OPENING node to 
 prevent the master from timing out a region open operation will currently 
 abort the RegionServer if this fails for some reason.  However it could be 
 normal for an RS to have a region open operation aborted by the master, so 
 should just handle as it does other places by reverting the open.
 We had a cluster trip over some other issue (for some reason, the tickle was 
 not happening in < 30 seconds, so master was timing out every time).  Because 
 of the abort on BadVersion, this eventually led to every single RS aborting 
 itself eventually taking down the cluster.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3421) Very wide rows -- 30M plus -- cause us OOME

2011-01-05 Thread Nicolas Spiegelberg (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12978054#action_12978054
 ] 

Nicolas Spiegelberg commented on HBASE-3421:


For interested parties...

From: Ted Yu
Hi,
I used the command you suggested in HBASE-3421 on a table and got:

K: 0012F2157E58883070B9814047048E8B/v:_/1283909035492/Put/vlen=1308 
K: 0041A80A545C4CBF412865412065BF5E/v:_/1283909035492/Put/vlen=1311 
K: 00546F4AA313020E551E049E848949C6/v:_/1283909035492/Put/vlen=1866 
K: 0068CC263C81CE65B65FC5425EFEBBCD/v:_/1283909035492/Put/vlen=1191 
K: 006DB8745D6D1B624F77E0F06C177C0B/v:_/1283909035492/Put/vlen=1021 
K: 006F9037BD7A8F081B54C5B03756C143/v:_/1283909035492/Put/vlen=1382 
...

Can you briefly describe what conclusion can be drawn here ?

~~~
From: Nicolas Spiegelberg

You're basically seeing all the KeyValues in that HFile.  The format is 
basically:

K: KeyValue.toString()

If you look at KeyValue.toString(), you'll see that the format is roughly:

row/family:qualifier/timestamp/type/value_length

So, it looks like you only have one qualifier per row and each row is roughly 
~1500 bytes of data.  For the user with the 30K columns per row, you should see 
an output that contains a ton of lines with the same row.  If you grep that 
row, cut the number after vlen=, and sum the values, you can see the size of 
your rows on a per-HFile basis.
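For reference, the grep/cut/sum step can also be done with a small scratch class like the following (file name, row key, and class name are placeholders): it parses the HFile -p dump shown earlier and sums the vlen= values for one row to estimate that row's size in the HFile.

{code}
// Scratch class: sum vlen= values for one row in an "HFile -f ... -p" dump.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class RowSizeFromHFileDump {
  public static void main(String[] args) throws IOException {
    String dumpFile = args[0]; // output of the HFile -p command above
    String rowKey = args[1];   // the wide row to measure
    long total = 0;
    BufferedReader in = new BufferedReader(new FileReader(dumpFile));
    try {
      String line;
      while ((line = in.readLine()) != null) {
        if (!line.startsWith("K: " + rowKey + "/")) {
          continue; // only count KVs belonging to the chosen row
        }
        int idx = line.lastIndexOf("vlen=");
        if (idx >= 0) {
          total += Long.parseLong(line.substring(idx + "vlen=".length()).trim());
        }
      }
    } finally {
      in.close();
    }
    System.out.println("approximate row size in this HFile: " + total + " bytes");
  }
}
{code}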


 Very wide rows -- 30M plus -- cause us OOME
 ---

 Key: HBASE-3421
 URL: https://issues.apache.org/jira/browse/HBASE-3421
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.0
Reporter: stack

 From the list, see 'jvm oom' in 
 http://mail-archives.apache.org/mod_mbox/hbase-user/201101.mbox/browser, it 
 looks like wide rows -- 30M or so -- cause OOME during compaction.  We 
 should check it out. Can the scanner used during compactions use the 'limit' 
 when nexting?  If so, this should save our OOME'ing (or, we need to add to 
 the next a max size rather than count of KVs).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

2011-01-05 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-3417:
-

Attachment: HBASE-3417-v1.patch

Changes storefile names to be UUIDs.  Makes it so we use the same name for the 
tmp file and the permanent file.  Updates a regex which now matches against a 
32-char word string instead of digits.  Changes HFile to use the file name for 
block cache block names rather than the full path.

 CacheOnWrite is using the temporary output path for block names, need to use 
 a more consistent block naming scheme
 --

 Key: HBASE-3417
 URL: https://issues.apache.org/jira/browse/HBASE-3417
 Project: HBase
  Issue Type: Bug
  Components: io, regionserver
Affects Versions: 0.92.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-3417-v1.patch


 Currently the block names used in the block cache are built using the 
 filesystem path.  However, for cache on write, the path is a temporary output 
 file.
 The original COW patch actually made some modifications to block naming stuff 
 to make it more consistent but did not do enough.  Should add a separate 
 method somewhere for generating block names using some more easily mocked 
 scheme (rather than just raw path as we generate a random unique file name 
 twice, once for tmp and then again when moved into place).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HBASE-3423) hbase-env.sh over-rides HBASE_OPTS incorrectly.

2011-01-05 Thread Ted Dunning (JIRA)
hbase-env.sh over-rides HBASE_OPTS incorrectly.
---

 Key: HBASE-3423
 URL: https://issues.apache.org/jira/browse/HBASE-3423
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.0
Reporter: Ted Dunning
 Fix For: 0.90.0


conf/hbase-env.sh has the following line:

   export HBASE_OPTS="-ea -XX:+HeapDumpOnOutOfMemoryError 
-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"

This should be

   export HBASE_OPTS="$HBASE_OPTS -ea -XX:+HeapDumpOnOutOfMemoryError 
-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

2011-01-05 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-3417:
-

Attachment: HBASE-3417-v2.patch

Makes it so we don't have to parse fileName for each block when doing 
CacheOnWrite.

 CacheOnWrite is using the temporary output path for block names, need to use 
 a more consistent block naming scheme
 --

 Key: HBASE-3417
 URL: https://issues.apache.org/jira/browse/HBASE-3417
 Project: HBase
  Issue Type: Bug
  Components: io, regionserver
Affects Versions: 0.92.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-3417-v1.patch, HBASE-3417-v2.patch


 Currently the block names used in the block cache are built using the 
 filesystem path.  However, for cache on write, the path is a temporary output 
 file.
 The original COW patch actually made some modifications to block naming stuff 
 to make it more consistent but did not do enough.  Should add a separate 
 method somewhere for generating block names using some more easily mocked 
 scheme (rather than just raw path as we generate a random unique file name 
 twice, once for tmp and then again when moved into place).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

2011-01-05 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12978122#action_12978122
 ] 

stack commented on HBASE-3417:
--

As discussed up on IRC, this is not backward compatible:

{code}
+Pattern.compile("^(\\w{32})(?:\\.(.+))?$");
{code}

You can do a range IIRC 20-32 (was old length 20 chars?)

The below is a little bit messy:

{code}
+return new Path(dir, UUID.randomUUID().toString().replaceAll("-", "")
+  + ((suffix == null || suffix.length() <= 0) ? "" : suffix));
{code}

Up on IRC, I was thinking we should base64 it because then it'd be more compact.  See 
http://stackoverflow.com/questions/772802/storing-uuid-as-base64-string.  There 
is also in hbase util a Base64#encodeBytes method that will take the 128 UUID 
bits and emit them as base64 (possible to get it all down to 22 chars).  But 
looking at the base64 vocabulary, http://en.wikipedia.org/wiki/Base64, it 
includes '+' and '/', which are illegal in a URL and in an hdfs filepath.  Base32 
(http://en.wikipedia.org/wiki/Base32)?  That won't work either; it has to be 
multiples of 40 bits.

Maybe leave it as it comes out of UUID.toString w/ hyphens.  Then it's plain it's 
a UUID and it's easier to read?
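For comparison, a small standalone sketch of the two encodings being weighed (illustration only; it uses the JDK's javax.xml.bind.DatatypeConverter, a Java 6/7-era class, and newer JDKs would need java.util.Base64 or commons-codec instead): the 32-char hex form from the patch is path-safe, while the ~22-char base64 form can contain '+' and '/'.

{code}
// Sketch only: UUID as 32 hex chars versus ~22 base64 chars.
import java.nio.ByteBuffer;
import java.util.UUID;
import javax.xml.bind.DatatypeConverter;

public class StoreFileNameSketch {
  /** 32 hex chars, no '+', '/' or '=', so safe in an HDFS path. */
  public static String hexName(UUID id) {
    return id.toString().replaceAll("-", "");
  }

  /** 22 chars once '=' padding is dropped, but may contain '+' and '/'. */
  public static String base64Name(UUID id) {
    ByteBuffer buf = ByteBuffer.allocate(16);
    buf.putLong(id.getMostSignificantBits());
    buf.putLong(id.getLeastSignificantBits());
    return DatatypeConverter.printBase64Binary(buf.array()).replace("=", "");
  }

  public static void main(String[] args) {
    UUID id = UUID.randomUUID();
    System.out.println(hexName(id));    // 32-char, filename-friendly form
    System.out.println(base64Name(id)); // shorter, but not filename-friendly
  }
}
{code}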



 CacheOnWrite is using the temporary output path for block names, need to use 
 a more consistent block naming scheme
 --

 Key: HBASE-3417
 URL: https://issues.apache.org/jira/browse/HBASE-3417
 Project: HBase
  Issue Type: Bug
  Components: io, regionserver
Affects Versions: 0.92.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-3417-v1.patch, HBASE-3417-v2.patch


 Currently the block names used in the block cache are built using the 
 filesystem path.  However, for cache on write, the path is a temporary output 
 file.
 The original COW patch actually made some modifications to block naming stuff 
 to make it more consistent but did not do enough.  Should add a separate 
 method somewhere for generating block names using some more easily mocked 
 scheme (rather than just raw path as we generate a random unique file name 
 twice, once for tmp and then again when moved into place).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

2011-01-05 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12978124#action_12978124
 ] 

Jonathan Gray commented on HBASE-3417:
--

I changed regex to be {{([0-9a-z]+)}}

I kind of like how it is.  It looks just like the encoded region names used for 
region directory names.

 CacheOnWrite is using the temporary output path for block names, need to use 
 a more consistent block naming scheme
 --

 Key: HBASE-3417
 URL: https://issues.apache.org/jira/browse/HBASE-3417
 Project: HBase
  Issue Type: Bug
  Components: io, regionserver
Affects Versions: 0.92.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-3417-v1.patch, HBASE-3417-v2.patch


 Currently the block names used in the block cache are built using the 
 filesystem path.  However, for cache on write, the path is a temporary output 
 file.
 The original COW patch actually made some modifications to block naming stuff 
 to make it more consistent but did not do enough.  Should add a separate 
 method somewhere for generating block names using some more easily mocked 
 scheme (rather than just raw path as we generate a random unique file name 
 twice, once for tmp and then again when moved into place).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme

2011-01-05 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12978125#action_12978125
 ] 

Jonathan Gray commented on HBASE-3417:
--

Old random file name was using rand.nextLong() so it could be any length >= 1.

 CacheOnWrite is using the temporary output path for block names, need to use 
 a more consistent block naming scheme
 --

 Key: HBASE-3417
 URL: https://issues.apache.org/jira/browse/HBASE-3417
 Project: HBase
  Issue Type: Bug
  Components: io, regionserver
Affects Versions: 0.92.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-3417-v1.patch, HBASE-3417-v2.patch


 Currently the block names used in the block cache are built using the 
 filesystem path.  However, for cache on write, the path is a temporary output 
 file.
 The original COW patch actually made some modifications to block naming stuff 
 to make it more consistent but did not do enough.  Should add a separate 
 method somewhere for generating block names using some more easily mocked 
 scheme (rather than just raw path as we generate a random unique file name 
 twice, once for tmp and then again when moved into place).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3420) Handling a big rebalance, we can queue multiple instances of a Close event; messes up state

2011-01-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12978156#action_12978156
 ] 

Hudson commented on HBASE-3420:
---

Integrated in HBase-TRUNK #1705 (See 
[https://hudson.apache.org/hudson/job/HBase-TRUNK/1705/])


 Handling a big rebalance, we can queue multiple instances of a Close event; 
 messes up state
 ---

 Key: HBASE-3420
 URL: https://issues.apache.org/jira/browse/HBASE-3420
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.0
Reporter: stack
Assignee: stack
 Fix For: 0.90.0

 Attachments: 3420.txt


 This is pretty ugly.  In short, on a heavily loaded cluster, we are queuing 
 multiple instances of region close.  They all try to run, confusing state.
 Long version:
 I have a messy cluster.  It's 16k regions on 8 servers.  One node has 5k or so 
 regions on it.  Heaps are 1G all around.  My master had OOME'd.  Not sure why 
 but not too worried about it for now.  So, new master comes up and is trying 
 to rebalance the cluster:
 {code}
 2011-01-05 00:48:07,385 INFO org.apache.hadoop.hbase.master.LoadBalancer: 
 Calculated a load balance in 14ms. Moving 3666 regions off of 6 overloaded 
 servers onto 3 less loaded servers
 {code}
 The balancer ends up sending many closes to a single overloaded server; the closes are 
 taking so long that the close times out in RIT.  We then do this:
 {code}
   case CLOSED:
 LOG.info("Region has been CLOSED for too long, " +
     "retriggering ClosedRegionHandler");
 AssignmentManager.this.executorService.submit(
 new ClosedRegionHandler(master, AssignmentManager.this,
 regionState.getRegion()));
 break;
 {code}
 We queue a new close (Should we?).
 We time out a few more times (9 times) and each time we queue a new close.
 Eventually the close succeeds, the region gets assigned a new location.
 Then the next close pops off the eventhandler queue.
 Here is the telltale signature of stuff gone amiss:
 {code}
 2011-01-05 00:52:19,379 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. 
 state=OPEN, ts=1294188709030
 {code}
 Notice how state is OPEN when we are forcing offline (It was actually just 
 successfully opened).  We end up assigning the same server because the plan was still 
 around:
 {code}
 2011-01-05 00:52:20,705 WARN 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Attempted 
 open of TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. 
 but already online on this server
 {code}
 But later, when the plan is cleared, we assign a new server and we have a double assignment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3423) hbase-env.sh over-rides HBASE_OPTS incorrectly.

2011-01-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12978157#action_12978157
 ] 

Hudson commented on HBASE-3423:
---

Integrated in HBase-TRUNK #1705 (See 
[https://hudson.apache.org/hudson/job/HBase-TRUNK/1705/])
HBASE-3423 hbase-env.sh overrides HBASE_OPTS incorrectly


 hbase-env.sh over-rides HBASE_OPTS incorrectly.
 ---

 Key: HBASE-3423
 URL: https://issues.apache.org/jira/browse/HBASE-3423
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.0
Reporter: Ted Dunning
 Fix For: 0.90.0, 0.92.0


 conf/hbase-env.sh has the following line:
export HBASE_OPTS="-ea -XX:+HeapDumpOnOutOfMemoryError 
 -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"
 This should be
export HBASE_OPTS="$HBASE_OPTS -ea -XX:+HeapDumpOnOutOfMemoryError 
 -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3379) Log splitting slowed by repeated attempts at connecting to downed datanode

2011-01-05 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12978160#action_12978160
 ] 

Hairong Kuang commented on HBASE-3379:
--

Stack, HBASE-3285 should be able to fix the problem by avoiding this code path. 
This is the solution that our fb internal trunk uses.

 Log splitting slowed by repeated attempts at connecting to downed datanode
 --

 Key: HBASE-3379
 URL: https://issues.apache.org/jira/browse/HBASE-3379
 Project: HBase
  Issue Type: Bug
  Components: wal
Reporter: stack
Priority: Critical

 Testing if I kill RS and DN on a node, log splitting takes longer as we 
 doggedly try connecting to the downed DN to get WAL blocks.  Here's the cycle 
 I see:
 {code}
 2010-12-21 17:34:48,239 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery 
 for block blk_900551257176291912_1203821 failed  because recovery from 
 primary datanode 10.20.20.182:10010 failed 5 times.Pipeline was 
 10.20.20.184:10010, 10.20.20.186:10010, 10.20.20.182:10010. Will retry...
 2010-12-21 17:34:50,240 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: /10.20.20.182:10020. Already tried 0 time(s).
 2010-12-21 17:34:51,241 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: /10.20.20.182:10020. Already tried 1 time(s).
 2010-12-21 17:34:52,241 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: /10.20.20.182:10020. Already tried 2 time(s).
 2010-12-21 17:34:53,242 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: /10.20.20.182:10020. Already tried 3 time(s).
 2010-12-21 17:34:54,243 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: /10.20.20.182:10020. Already tried 4 time(s).
 2010-12-21 17:34:55,243 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: /10.20.20.182:10020. Already tried 5 time(s).
 2010-12-21 17:34:56,244 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: /10.20.20.182:10020. Already tried 6 time(s).
 2010-12-21 17:34:57,245 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: /10.20.20.182:10020. Already tried 7 time(s).
 2010-12-21 17:34:58,245 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: /10.20.20.182:10020. Already tried 8 time(s).
 2010-12-21 17:34:59,246 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: /10.20.20.182:10020. Already tried 9 time(s).
 2010-12-21 17:34:59,246 WARN org.apache.hadoop.hdfs.DFSClient: Failed 
 recovery attempt #5 from primary datanode 10.20.20.182:10010
 java.net.ConnectException: Call to /10.20.20.182:10020 failed on connection 
 exception: java.net.ConnectException: Connection refused
 at org.apache.hadoop.ipc.Client.wrapException(Client.java:767)
 at org.apache.hadoop.ipc.Client.call(Client.java:743)
 at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
 at $Proxy8.getProtocolVersion(Unknown Source)
 at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
 at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:346)
 at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:383)
 ...
 {code}
 because recovery from primary datanode is done 5 times (hardcoded).  Within 
 these retries we'll do
 {code}
 this.maxRetries = conf.getInt("ipc.client.connect.max.retries", 10);
 {code}
 We should get the hardcoded 5 attempts fixed and we should document 
 ipc.client.connect.max.retries as an important config.  We should recommend 
 bringing it down from the default.
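A minimal illustration of that recommendation (the value 3 is only an example, not tested advice): lower ipc.client.connect.max.retries in the Configuration so each of the five hardcoded recovery attempts gives up on a dead datanode sooner.

{code}
// Example only: lower the IPC connect retries from the default of 10.
import org.apache.hadoop.conf.Configuration;

public class IpcRetryTuning {
  public static Configuration tuned() {
    Configuration conf = new Configuration();
    conf.setInt("ipc.client.connect.max.retries", 3); // default is 10
    return conf;
  }
}
{code}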

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.