[jira] Commented: (HBASE-3255) Allow Export tool to choose subset of rows

2010-11-22 Thread Lars George (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934436#action_12934436
 ] 

Lars George commented on HBASE-3255:


Hi Ted, 

Isn't this a dupe of your HBASE-2495? And for the change to Import, please 
create a new JIRA with the details of what you suggest.

Lars

 Allow Export tool to choose subset of rows
 --

 Key: HBASE-3255
 URL: https://issues.apache.org/jira/browse/HBASE-3255
 Project: HBase
  Issue Type: Improvement
  Components: util
Affects Versions: 0.20.6
Reporter: Ted Yu

 org.apache.hadoop.hbase.mapreduce.Export should allow user to specify a 
 subset of rows.
 This capability would help develop a solution for problems that produce 
 unwanted rows (in the .META. table, for example) that must be deleted.
 One such case is https://issues.apache.org/jira/browse/HBASE-3251
 We can export the dangling row(s) from .META., delete them, and later import 
 them into (another) HBase instance.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3243) Disable Table closed region on wrong host

2010-11-22 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934535#action_12934535
 ] 

Jonathan Gray commented on HBASE-3243:
--

Well try running again with my patch.  Or you could even run it again without 
to see if it happens again and we could get another set of logs.  I guess run 
it with the patch and then if it doesn't ever happen again we can punt the 
issue or resolve it until we see it again.

 Disable Table closed region on wrong host
 -

 Key: HBASE-3243
 URL: https://issues.apache.org/jira/browse/HBASE-3243
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.0
Reporter: Todd Lipcon
Priority: Blocker
 Fix For: 0.90.0

 Attachments: hbase-3243-logs.tar.bz2, HBASE-3243-v1.patch


 I ran some YCSB benchmarks which resulted in about 150 regions worth of data 
 overnight. Then I disabled the table, and the master for some reason closed 
 one region on the wrong server. The server ignored this, but the region 
 remained open on a different server, which later flipped out when it tried to 
 flush due to hlog accumulation.




[jira] Created: (HBASE-3258) EOF when version file is empty

2010-11-22 Thread Jean-Daniel Cryans (JIRA)
EOF when version file is empty
--

 Key: HBASE-3258
 URL: https://issues.apache.org/jira/browse/HBASE-3258
 Project: HBase
  Issue Type: Bug
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.90.0, 0.92.0


I somehow was able to get an empty hbase.version file on a test machine and 
when I start HBase I see:

{noformat}
starting master, logging to 
/data/jdcryans/git/hbase/bin/../logs/hbase-jdcryans-master-hbasedev.out
Exception in thread master-hbasedev:6 java.lang.NullPointerException
at 
org.apache.hadoop.hbase.master.HMaster.stopServiceThreads(HMaster.java:559)
at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:286)
{noformat}

And in the master's log:

{noformat}
2010-11-22 10:08:43,003 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled 
exception. Starting shutdown.
java.io.EOFException
at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
at java.io.DataInputStream.readUTF(DataInputStream.java:572)
at org.apache.hadoop.hbase.util.FSUtils.getVersion(FSUtils.java:151)
at org.apache.hadoop.hbase.util.FSUtils.checkVersion(FSUtils.java:170)
at 
org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:226)
at 
org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:104)
at 
org.apache.hadoop.hbase.master.MasterFileSystem.init(MasterFileSystem.java:89)
at 
org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:337)
at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:273)
2010-11-22 10:08:43,006 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
{noformat}

I thought that kind of issue was solved a long time ago, but somehow it's 
back. I'll fix it by handling the EOF and will also look at that ugly NPE.
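
For illustration only, here is a minimal sketch of what "handling the EOF" could look like. It is not the actual FSUtils code: the class name VersionReadSketch and the method readVersionOrNull are made up for this example, and it assumes the version was written with DataOutputStream.writeUTF (which matches the readUTF frame in the trace above).

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch, not the actual FSUtils code: read a version string
// written with DataOutputStream.writeUTF, treating an empty or truncated
// file as "no version" instead of letting EOFException escape.
final class VersionReadSketch {

    // Returns the version string, or null if the stream ends before a
    // complete UTF record could be read.
    static String readVersionOrNull(InputStream in) throws IOException {
        try (DataInputStream data = new DataInputStream(in)) {
            return data.readUTF();
        } catch (EOFException e) {
            return null;
        }
    }

    public static void main(String[] args) throws IOException {
        // An empty stream stands in for the empty hbase.version file.
        String version = readVersionOrNull(new ByteArrayInputStream(new byte[0]));
        System.out.println(version == null ? "no version recorded" : version);
    }
}
```

A caller could then decide whether a null result means "rewrite the file" or "abort with a clear message", which is a policy question this sketch leaves open.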




[jira] Commented: (HBASE-3258) EOF when version file is empty

2010-11-22 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934547#action_12934547
 ] 

Jean-Daniel Cryans commented on HBASE-3258:
---

For the record, the reason is that I was testing 0.90 with 0.20-append and, 
since the two are currently incompatible at the data transfer level (ugly ugly), 
the master is able to create the file but unable to write to it. This HDFS-724 
situation is bad.

 EOF when version file is empty
 --

 Key: HBASE-3258
 URL: https://issues.apache.org/jira/browse/HBASE-3258
 Project: HBase
  Issue Type: Bug
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.90.0, 0.92.0





[jira] Created: (HBASE-3259) Can't kill the region servers when they wait on the master or the cluster state znode

2010-11-22 Thread Jean-Daniel Cryans (JIRA)
Can't kill the region servers when they wait on the master or the cluster state 
znode
-

 Key: HBASE-3259
 URL: https://issues.apache.org/jira/browse/HBASE-3259
 Project: HBase
  Issue Type: Bug
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.90.0, 0.92.0


In a situation like HBASE-3258, it's easy to get the region servers stuck 
waiting for either the master or the cluster state znode, since the wait has no 
timeout. You have to kill -9 them to shut them down. This is very bad 
for usability.




[jira] Commented: (HBASE-3259) Can't kill the region servers when they wait on the master or the cluster state znode

2010-11-22 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934556#action_12934556
 ] 

Jonathan Gray commented on HBASE-3259:
--

Like you said, maybe this is bad for usability, not sure this is blocking or a 
bug.

You want to make it so you can just 'kill' without -9?  Or you want to add 
timeout on RS on startup?

The former seems no different for usability.  The latter might be okay but not 
sure it's expected behavior.  What would the default timeout be?

 Can't kill the region servers when they wait on the master or the cluster 
 state znode
 -

 Key: HBASE-3259
 URL: https://issues.apache.org/jira/browse/HBASE-3259
 Project: HBase
  Issue Type: Bug
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.90.0, 0.92.0





[jira] Commented: (HBASE-3259) Can't kill the region servers when they wait on the master or the cluster state znode

2010-11-22 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934561#action_12934561
 ] 

Jean-Daniel Cryans commented on HBASE-3259:
---

bq. Like you said, maybe this is bad for usability, not sure this is blocking 
or a bug.

I foresee that a majority of our new users will hit this issue if they have any 
sort of trouble setting up their cluster, so I think this is a blocker.

bq. You want to make it so you can just 'kill' without -9?

Not just kill, but also hbase-daemon.sh stop regionserver since it also 
hangs. Imagine a few machines in that state where you have to manually kill -9 
every one of them.

bq. Or you want to add timeout on RS on startup?

A timeout on the blocking wait, after which we retry until either the data is 
available or the region server is stopped. Something like 1 or 2 seconds.

I'm currently writing the patch.
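
To make the idea concrete, here is a rough sketch of a bounded wait that retries until the data shows up or a stop is requested. It is not the HBASE-3259 patch: the class StoppableWaitSketch, the method name, and the use of a CountDownLatch and AtomicBoolean in place of the real ZooKeeper trackers are all assumptions for this example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch, not the HBASE-3259 patch: wait for data in short,
// bounded slices so a stop request is noticed between attempts.
final class StoppableWaitSketch {

    // Returns true once dataAvailable has counted down, or false if
    // stopped was set first. Each blocking wait lasts at most timeoutMs.
    static boolean waitUntilAvailableOrStopped(
            CountDownLatch dataAvailable, AtomicBoolean stopped, long timeoutMs)
            throws InterruptedException {
        while (!stopped.get()) {
            if (dataAvailable.await(timeoutMs, TimeUnit.MILLISECONDS)) {
                return true;
            }
            // Timed out: loop around, re-check the stop flag, and retry.
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch masterAddress = new CountDownLatch(1);
        AtomicBoolean stopped = new AtomicBoolean(false);
        // Simulate a stop request arriving while the data never shows up.
        Thread stopper = new Thread(() -> stopped.set(true));
        stopper.start();
        boolean available = waitUntilAvailableOrStopped(masterAddress, stopped, 50);
        stopper.join();
        System.out.println(available ? "data available" : "stopped while waiting");
    }
}
```

With a timeout of 1 or 2 seconds, a plain kill or a stop script would be noticed within roughly one timeout period instead of never.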

 Can't kill the region servers when they wait on the master or the cluster 
 state znode
 -

 Key: HBASE-3259
 URL: https://issues.apache.org/jira/browse/HBASE-3259
 Project: HBase
  Issue Type: Bug
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.90.0, 0.92.0





[jira] Commented: (HBASE-3258) EOF when version file is empty

2010-11-22 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934576#action_12934576
 ] 

Todd Lipcon commented on HBASE-3258:


When we create the .version file, we should create it in a tmp location and 
then move it into place. It's probably empty in the case that a server crashes 
between writing and closing.
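
A minimal sketch of the write-to-tmp-then-move pattern follows. It uses java.nio.file on a local filesystem rather than the Hadoop FileSystem API that HBase actually goes through, and the class name, the ".tmp" suffix, and the placeholder version string "7" are assumptions for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch on the local filesystem via java.nio.file, not the
// Hadoop FileSystem API that HBase actually uses.
final class AtomicVersionWriteSketch {

    // Writes content to a temporary sibling file, then moves it into place,
    // so readers never observe a created-but-empty target file.
    static void writeViaTemp(Path target, String content) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
        // ATOMIC_MOVE fails with AtomicMoveNotSupportedException where the
        // filesystem cannot rename atomically.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("version-sketch");
        Path target = dir.resolve("hbase.version");
        writeViaTemp(target, "7"); // "7" is a placeholder version string
        System.out.println(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
    }
}
```

If the process dies between the write and the move, only the temporary file is left behind, and the target either does not exist or holds complete content.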

 EOF when version file is empty
 --

 Key: HBASE-3258
 URL: https://issues.apache.org/jira/browse/HBASE-3258
 Project: HBase
  Issue Type: Bug
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.90.0, 0.92.0

 Attachments: HBASE-3258.patch





[jira] Updated: (HBASE-3259) Can't kill the region servers when they wait on the master or the cluster state znode

2010-11-22 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-3259:
--

Attachment: HBASE-3259.patch

Small refactoring in HRS to handle the timeout and check whether the server is 
stopped. I thought of doing it down in ZooKeeperNodeTracker, but I'm not sure 
we want that behavior everywhere.

 Can't kill the region servers when they wait on the master or the cluster 
 state znode
 -

 Key: HBASE-3259
 URL: https://issues.apache.org/jira/browse/HBASE-3259
 Project: HBase
  Issue Type: Bug
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.90.0, 0.92.0

 Attachments: HBASE-3259.patch





[jira] Commented: (HBASE-3256) Coprocessors: Coprocessor host and observer for HMaster

2010-11-22 Thread Gary Helmling (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934581#action_12934581
 ] 

Gary Helmling commented on HBASE-3256:
--

Some additional details on the code changes here:

# add a {{org.apache.hadoop.hbase.coprocessor.MasterObserver}} interface 
defining pre/post methods for createTable, deleteTable, modifyTable, addColumn, 
modifyColumn, deleteColumn, enable/disable, move, balance, and shutdown
# extract a common base class from the current 
{{org.apache.hadoop.hbase.regionserver.CoprocessorHost}} to 
{{org.apache.hadoop.hbase.coprocessor.CoprocessorHost}}
# rename the existing region-specific 
{{org.apache.hadoop.hbase.regionserver.CoprocessorHost}} to 
{{RegionCoprocessorHost}}
# add a new {{org.apache.hadoop.hbase.master.MasterCoprocessorHost}} for 
HMaster integration
# refactor the current 
{{org.apache.hadoop.hbase.coprocessor.CoprocessorEnvironment}} into a base 
interface with {{RegionCoprocessorEnvironment}} and 
{{MasterCoprocessorEnvironment}} extensions
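
The pre/post hook shape described in the list above can be sketched roughly as follows. This is not the interface from the patch: MasterObserverSketch, MasterHostSketch, and the single createTable pair with a String table name are hypothetical simplifications chosen to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the pre/post hook shape; the method names and
// signatures are illustrative, not the interface proposed in the patch.
interface MasterObserverSketch {
    default void preCreateTable(String tableName) {}
    default void postCreateTable(String tableName) {}
}

// Minimal stand-in for a master-side coprocessor host that calls every
// registered observer before and after an administrative operation.
final class MasterHostSketch {
    private final List<MasterObserverSketch> observers = new ArrayList<>();

    void register(MasterObserverSketch observer) {
        observers.add(observer);
    }

    void createTable(String tableName, Runnable doCreate) {
        for (MasterObserverSketch o : observers) {
            o.preCreateTable(tableName);
        }
        doCreate.run();
        for (MasterObserverSketch o : observers) {
            o.postCreateTable(tableName);
        }
    }

    public static void main(String[] args) {
        MasterHostSketch host = new MasterHostSketch();
        host.register(new MasterObserverSketch() {
            @Override public void preCreateTable(String t) { System.out.println("pre " + t); }
            @Override public void postCreateTable(String t) { System.out.println("post " + t); }
        });
        host.createTable("t1", () -> System.out.println("create t1"));
    }
}
```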



 Coprocessors: Coprocessor host and observer for HMaster
 ---

 Key: HBASE-3256
 URL: https://issues.apache.org/jira/browse/HBASE-3256
 Project: HBase
  Issue Type: Sub-task
Reporter: Andrew Purtell
Assignee: Gary Helmling
 Fix For: 0.92.0


 Implement a coprocessor host for HMaster. Hook observers into administrative 
 operations performed on tables: create, alter, assignment, load balance, and 
 allow observers to modify base master behavior. Support automatic loading of 
 coprocessor implementation. 
 Consider refactoring the master coprocessor host and regionserver coprocessor 
 host into a common base class. 




[jira] Created: (HBASE-3260) Coprocessors: Lifecycle management

2010-11-22 Thread Andrew Purtell (JIRA)
Coprocessors: Lifecycle management
--

 Key: HBASE-3260
 URL: https://issues.apache.org/jira/browse/HBASE-3260
 Project: HBase
  Issue Type: Sub-task
Reporter: Andrew Purtell







[jira] Created: (HBASE-3261) NPE out of HRS.run at startup when clock is out of sync

2010-11-22 Thread Jean-Daniel Cryans (JIRA)
NPE out of HRS.run at startup when clock is out of sync
---

 Key: HBASE-3261
 URL: https://issues.apache.org/jira/browse/HBASE-3261
 Project: HBase
  Issue Type: Bug
Reporter: Jean-Daniel Cryans
 Fix For: 0.90.0, 0.92.0


This is what I get when I start a region server that's not properly sync'ed:

{noformat}
Exception in thread regionserver60020 java.lang.NullPointerException
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:603)
at java.lang.Thread.run(Thread.java:637)
{noformat}

In this case the line was:
{noformat}
hlogRoller.interruptIfNecessary();
{noformat}

I guess we could add a bunch of other null checks.

The end result is the same, the RS dies, but I think it's misleading.
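
As a sketch of the kind of null check meant here (not actual HRegionServer code; the class and method names are invented for the example), the shutdown path could guard each helper before touching it:

```java
// Hypothetical sketch, not HRegionServer code: guard each helper before
// stopping it, since startup may have aborted before it was created.
final class ShutdownNullCheckSketch {

    // Interrupts the thread if it exists; returns whether it did anything.
    static boolean interruptIfPresent(Thread worker) {
        if (worker == null) {
            return false; // never initialized, nothing to stop
        }
        worker.interrupt();
        return true;
    }

    public static void main(String[] args) {
        Thread logRoller = null; // startup failed before this was assigned
        System.out.println(interruptIfPresent(logRoller));
    }
}
```

That way the original startup failure stays the visible error, rather than a secondary NPE from the cleanup code.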




[jira] Updated: (HBASE-3260) Coprocessors: Lifecycle management

2010-11-22 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-3260:
--

  Description: 
Considering extending CPs to the master, we have no equivalent to pre/postOpen 
and pre/postClose as on the regionserver. We also should consider how to 
resolve dependencies and initialization ordering if loading coprocessors that 
depend on others. 

OSGi (http://en.wikipedia.org/wiki/OSGi) has a lifecycle API and is familiar to 
many Java programmers, so we propose to borrow its terminology and state 
machine.

A lifecycle layer manages coprocessors as they are dynamically installed, 
started, stopped, updated and uninstalled. Coprocessors rely on the framework 
for dependency resolution and class loading. In turn, the framework calls up to 
lifecycle management methods in the coprocessor as needed.

A coprocessor transitions between the states below over its lifetime:

||State||Description||
|UNINSTALLED|The coprocessor implementation is not installed. This is the 
default implicit state.|
|INSTALLED|The coprocessor implementation has been successfully installed|
|STARTING|A coprocessor instance is being started.|
|ACTIVE|The coprocessor instance has been successfully activated and is 
running.|
|STOPPING|A coprocessor instance is being stopped.|

See attached state diagram. Transitions to STOPPING will only happen as the 
region is being closed. If a coprocessor throws an unhandled exception, this 
will cause the RegionServer to close the region, stopping all coprocessor 
instances on it. 

The INSTALLED to STARTING and ACTIVE to STOPPING transitions would go through 
upcall methods into the coprocessor via the CoprocessorLifecycle interface:

{code:java}
public interface CoprocessorLifecycle {
  void start(CoprocessorEnvironment env) throws IOException; 
  void stop(CoprocessorEnvironment env) throws IOException;
}
{code}
Fix Version/s: 0.92.0
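
The state table in the description could be modeled along these lines. This is a sketch under assumptions: only the INSTALLED to STARTING and ACTIVE to STOPPING edges are stated in the text, the other edges are guessed by analogy with the OSGi lifecycle because the attached diagram is not reproduced here, and LifecycleSketch is a made-up class name.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the lifecycle as a table of allowed transitions.
// Only INSTALLED -> STARTING and ACTIVE -> STOPPING come from the text;
// the remaining edges are assumed by analogy with the OSGi lifecycle.
final class LifecycleSketch {
    enum State { UNINSTALLED, INSTALLED, STARTING, ACTIVE, STOPPING }

    private static final Map<State, Set<State>> ALLOWED = new EnumMap<>(State.class);
    static {
        ALLOWED.put(State.UNINSTALLED, EnumSet.of(State.INSTALLED));                 // assumed
        ALLOWED.put(State.INSTALLED, EnumSet.of(State.STARTING, State.UNINSTALLED)); // STARTING is from the text
        ALLOWED.put(State.STARTING, EnumSet.of(State.ACTIVE));                       // assumed
        ALLOWED.put(State.ACTIVE, EnumSet.of(State.STOPPING));                       // from the text
        ALLOWED.put(State.STOPPING, EnumSet.of(State.INSTALLED));                    // assumed
    }

    private State state = State.UNINSTALLED; // the default implicit state

    State state() {
        return state;
    }

    // Moves to the next state, rejecting edges that are not in the table.
    void transitionTo(State next) {
        if (!ALLOWED.get(state).contains(next)) {
            throw new IllegalStateException(state + " -> " + next + " is not allowed");
        }
        state = next;
    }

    public static void main(String[] args) {
        LifecycleSketch cp = new LifecycleSketch();
        cp.transitionTo(State.INSTALLED);
        cp.transitionTo(State.STARTING);
        cp.transitionTo(State.ACTIVE);
        System.out.println(cp.state());
    }
}
```

In a real host, the start and stop upcalls would presumably run while the coprocessor sits in STARTING and STOPPING respectively; the sketch only tracks the state itself.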

 Coprocessors: Lifecycle management
 --

 Key: HBASE-3260
 URL: https://issues.apache.org/jira/browse/HBASE-3260
 Project: HBase
  Issue Type: Sub-task
Reporter: Andrew Purtell
 Fix For: 0.92.0





[jira] Updated: (HBASE-3260) Coprocessors: Lifecycle management

2010-11-22 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-3260:
--

Attachment: statechart.png

 Coprocessors: Lifecycle management
 --

 Key: HBASE-3260
 URL: https://issues.apache.org/jira/browse/HBASE-3260
 Project: HBase
  Issue Type: Sub-task
Reporter: Andrew Purtell
 Fix For: 0.92.0

 Attachments: statechart.png





[jira] Commented: (HBASE-3261) NPE out of HRS.run at startup when clock is out of sync

2010-11-22 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934664#action_12934664
 ] 

Jonathan Gray commented on HBASE-3261:
--

Yeah this is something I had to add a lot of checks for in HMaster as well.  +1 
on adding null checks before we stop/interrupt stuff.

 NPE out of HRS.run at startup when clock is out of sync
 ---

 Key: HBASE-3261
 URL: https://issues.apache.org/jira/browse/HBASE-3261
 Project: HBase
  Issue Type: Bug
Reporter: Jean-Daniel Cryans
 Fix For: 0.90.0, 0.92.0





[jira] Commented: (HBASE-3227) Edit of log messages before branching...

2010-11-22 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934688#action_12934688
 ] 

HBase Review Board commented on HBASE-3227:
---

Message from: Nicolas nspiegelb...@facebook.com

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1212/#review1971
---



trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
http://review.cloudera.org/r/1212/#comment6227

I'd suggest keeping the store name in this debug message since we're 
considering thread pools for compactions...


- Nicolas





 Edit of log messages before branching...
 

 Key: HBASE-3227
 URL: https://issues.apache.org/jira/browse/HBASE-3227
 Project: HBase
  Issue Type: Improvement
Reporter: stack
 Fix For: 0.90.0







[jira] Updated: (HBASE-3262) TestHMasterRPCException uses non-ephemeral port for master

2010-11-22 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-3262:
-

Attachment: HBASE-3262-v1.patch

Uses ephemeral port for master and cleans up unused imports causing warnings.

 TestHMasterRPCException uses non-ephemeral port for master
 --

 Key: HBASE-3262
 URL: https://issues.apache.org/jira/browse/HBASE-3262
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
 Fix For: 0.90.0

 Attachments: HBASE-3262-v1.patch


 TestHMasterRPCException instantiates an HMaster but doesn't use an ephemeral 
 port, which can cause the test to fail if the port is already in use.




[jira] Created: (HBASE-3262) TestHMasterRPCException uses non-ephemeral port for master

2010-11-22 Thread Jonathan Gray (JIRA)
TestHMasterRPCException uses non-ephemeral port for master
--

 Key: HBASE-3262
 URL: https://issues.apache.org/jira/browse/HBASE-3262
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
 Fix For: 0.90.0
 Attachments: HBASE-3262-v1.patch

TestHMasterRPCException instantiates an HMaster but doesn't use an ephemeral 
port, which can cause the test to fail if the port is already in use.




[jira] Commented: (HBASE-3262) TestHMasterRPCException uses non-ephemeral port for master

2010-11-22 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934724#action_12934724
 ] 

Jonathan Gray commented on HBASE-3262:
--

Maybe we should push this setting of port 0 for the master/RS ports into the 
constructor of HBaseTestingUtility?
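
For anyone unfamiliar with the port 0 trick, here is a small standalone sketch using only java.net.ServerSocket. It does not touch HBaseTestingUtility or any HBase configuration key, and the class name is made up.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Sketch of what binding to port 0 means: the OS picks a free port, so
// two tests on the same machine do not fight over a fixed port number.
final class EphemeralPortSketch {

    // Binds to port 0 and reports the port the OS actually assigned.
    static int bindEphemeralAndGetPort() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("OS-assigned port: " + bindEphemeralAndGetPort());
    }
}
```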

 TestHMasterRPCException uses non-ephemeral port for master
 --

 Key: HBASE-3262
 URL: https://issues.apache.org/jira/browse/HBASE-3262
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
 Fix For: 0.90.0

 Attachments: HBASE-3262-v1.patch





[jira] Commented: (HBASE-2888) Review all our metrics

2010-11-22 Thread Alex Baranau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934763#action_12934763
 ] 

Alex Baranau commented on HBASE-2888:
-

As per the short discussion here: http://search-hadoop.com/m/DZcdNHsTOe2, here 
are some extra things we might want to expose:

1. Split stats. 
JMX already exposes flush and compaction data (time spent and data amount). 
Should we also add stats for splits, since they affect HBase behaviour too?

2. Flush/Compaction/Split rate.
For flush and compaction we expose only time spent and data amount, but 
we might also want to show something like an operation rate (number of 
actions). Based on the flush/compaction/split rate one can judge whether some 
configuration is properly set (e.g. hbase.hregion.memstore.flush.size).

3. Events log.
I also think it would be very useful for ops to be able to watch events 
(like splits, flushes, compactions) on a web interface or in JMX and know when 
they occurred, i.e. an events log. One could then go to the web page and see 
what might have caused performance degradation during a particular period of 
time. Currently we have to (and do) go to the log files for that kind of info.
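
To illustrate points 2 and 3, here is a rough sketch of a bounded in-memory events log from which a per-window operation count can be derived. It is not an existing HBase metrics class; EventLogSketch, its event types, and the fixed-capacity eviction policy are all assumptions for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch, not an HBase metrics class: a bounded in-memory log
// of flush/compaction/split events that a web page or JMX bean could read.
final class EventLogSketch {
    enum Type { FLUSH, COMPACTION, SPLIT }

    static final class Event {
        final Type type;
        final long timestampMs;

        Event(Type type, long timestampMs) {
            this.type = type;
            this.timestampMs = timestampMs;
        }
    }

    private final int capacity;
    private final Deque<Event> events = new ArrayDeque<>();

    EventLogSketch(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.capacity = capacity;
    }

    // Appends an event, dropping the oldest one once the log is full.
    synchronized void record(Type type, long timestampMs) {
        if (events.size() == capacity) {
            events.removeFirst();
        }
        events.addLast(new Event(type, timestampMs));
    }

    // Counts retained events of one type with fromMs <= timestamp < toMs;
    // dividing by the window length gives an operation rate.
    synchronized int countInWindow(Type type, long fromMs, long toMs) {
        int count = 0;
        for (Event e : events) {
            if (e.type == type && e.timestampMs >= fromMs && e.timestampMs < toMs) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        EventLogSketch log = new EventLogSketch(100);
        log.record(Type.FLUSH, 1_000);
        log.record(Type.FLUSH, 2_000);
        log.record(Type.SPLIT, 2_500);
        System.out.println("flushes in window: " + log.countInWindow(Type.FLUSH, 0, 3_000));
    }
}
```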



 Review all our metrics
 --

 Key: HBASE-2888
 URL: https://issues.apache.org/jira/browse/HBASE-2888
 Project: HBase
  Issue Type: Improvement
  Components: master
Reporter: Jean-Daniel Cryans
 Fix For: 0.92.0


 HBase publishes a bunch of metrics, some useful some wasteful, that should be 
 improved to deliver a better ops experience. Examples:
  - Block cache hit ratio converges at some point and stops moving
  - fsReadLatency goes down when compactions are running
  - storefileIndexSizeMB is the exact same number once a system is serving 
 production load
 We could use new metrics too.




[jira] Commented: (HBASE-2888) Review all our metrics

2010-11-22 Thread Alex Baranau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934765#action_12934765
 ] 

Alex Baranau commented on HBASE-2888:
-

In general, does it make sense to create a separate issue to review the way we 
expose particular metrics/stats? E.g. should we show a particular metric on the 
web interface or just put it into JMX, in what form (in the case of web), etc.?

 Review all our metrics
 --

 Key: HBASE-2888
 URL: https://issues.apache.org/jira/browse/HBASE-2888
 Project: HBase
  Issue Type: Improvement
  Components: master
Reporter: Jean-Daniel Cryans
 Fix For: 0.92.0

