date:20120403

[
https://issues.apache.org/jira/browse/HBASE-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245016#comment-13245016
]

Hadoop QA commented on HBASE-5451:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12521102/5305v7.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified
tests.
Please justify why no new tests are needed for this
patch.
Also please list what manual steps were performed to
verify this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests:

org.apache.hadoop.hbase.mapreduce.TestMultithreadedTableMapper
org.apache.hadoop.hbase.mapreduce.TestImportTsv
org.apache.hadoop.hbase.mapred.TestTableMapReduce
org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
org.apache.hadoop.hbase.mapreduce.TestTableMapReduce

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/1375//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/1375//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/1375//console

This message is automatically generated.

Switch RPC call envelope/headers to PBs
---

Key: HBASE-5451
URL: https://issues.apache.org/jira/browse/HBASE-5451
Project: HBase
Issue Type: Sub-task
Components: ipc, master, migration, regionserver
Affects Versions: 0.94.0
Reporter: Todd Lipcon
Assignee: Devaraj Das
Fix For: 0.96.0

Attachments: 5305v7.txt, rpc-proto.2.txt, rpc-proto.3.txt,
rpc-proto.patch.1_2, rpc-proto.r5.txt

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5656) LoadIncrementalHFiles createTable should detect and set compression algorithm

2012-04-03 Thread Cosmin Lehene (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245018#comment-13245018
 ] 

Cosmin Lehene commented on HBASE-5656:
--

Lars, if we change the hcd default compression from NONE to LZO but write the HFile
explicitly without compression, this will create a table that actually has
compression, which is not what we want.

I guess if we want to be defensive, we could add a reader.getCompression() !=
hcd.getCompression() condition.
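A minimal Java sketch of that defensive condition. Assumptions: `Compression`, `FamilySpec` and `CompressionDetector` are hypothetical stand-ins, not HBase's `HColumnDescriptor` or HFile reader API, and leaving the descriptor untouched when a family's files disagree is only an illustrative policy. The detected algorithm is applied only when it differs from what the descriptor already holds, so a changed default (e.g. LZO) is not kept for a table whose HFiles were written uncompressed.

```java
import java.util.List;

// Hypothetical stand-in for the compression algorithms an HFile can declare.
enum Compression { NONE, GZ, LZO, SNAPPY }

// Hypothetical stand-in for a column family descriptor (not HColumnDescriptor).
final class FamilySpec {
  private final String name;
  private Compression compression;

  FamilySpec(String name, Compression defaultCompression) {
    this.name = name;
    this.compression = defaultCompression;
  }

  String getName() { return name; }
  Compression getCompression() { return compression; }
  void setCompression(Compression c) { this.compression = c; }
}

final class CompressionDetector {
  /**
   * Applies the compression found in a family's HFiles to its descriptor.
   * Returns true only if the descriptor was changed.
   */
  static boolean applyDetected(FamilySpec family, List<Compression> fileCompressions) {
    if (fileCompressions.isEmpty()) {
      return false; // no files, nothing to detect
    }
    Compression detected = fileCompressions.get(0);
    for (Compression c : fileCompressions) {
      if (c != detected) {
        return false; // files disagree: leave the descriptor alone (illustrative policy)
      }
    }
    // The defensive condition: only override when reader and descriptor differ.
    if (detected != family.getCompression()) {
      family.setCompression(detected);
      return true;
    }
    return false;
  }
}
```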


 LoadIncrementalHFiles createTable should detect and set compression algorithm
 -

 Key: HBASE-5656
 URL: https://issues.apache.org/jira/browse/HBASE-5656
 Project: HBase
  Issue Type: Bug
  Components: util
Affects Versions: 0.92.1
Reporter: Cosmin Lehene
Assignee: Cosmin Lehene
 Fix For: 0.92.2, 0.94.0, 0.96.0

 Attachments: 5656-simple.txt, HBASE-5656-0.92.patch, 
 HBASE-5656-0.92.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 LoadIncrementalHFiles doesn't set compression when creating the table.
 This can be detected from the files within each family dir.


[jira] [Commented] (HBASE-5451) Switch RPC call envelope/headers to PBs


[ 
https://issues.apache.org/jira/browse/HBASE-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245022#comment-13245022
 ] 

stack commented on HBASE-5451:
--

I ran the unusual test failures locally and they pass. Will commit tomorrow
unless there is an objection.


[jira] [Updated] (HBASE-5451) Switch RPC call envelope/headers to PBs


 [ 
https://issues.apache.org/jira/browse/HBASE-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5451:
-

Status: Open  (was: Patch Available)


[jira] [Updated] (HBASE-5451) Switch RPC call envelope/headers to PBs

2012-04-03 Thread Kannan Muthukkaruppan (Commented) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5451:
-

Attachment: 5305v7.txt

Retry

 Switch RPC call envelope/headers to PBs
 ---

 Key: HBASE-5451
 URL: https://issues.apache.org/jira/browse/HBASE-5451
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Affects Versions: 0.94.0
Reporter: Todd Lipcon
Assignee: Devaraj Das
 Fix For: 0.96.0

 Attachments: 5305v7.txt, 5305v7.txt, rpc-proto.2.txt, 
 rpc-proto.3.txt, rpc-proto.patch.1_2, rpc-proto.r5.txt





[jira] [Commented] (HBASE-3444) Bytes.toBytesBinary and Bytes.toStringBinary() should be reversible

2012-04-03 Thread Simon Kelly (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245089#comment-13245089
 ] 

Simon Kelly commented on HBASE-3444:


Also results in java.lang.StringIndexOutOfBoundsException in 
Bytes.toBytesBinary if the byte[] ends in a '\'

{code}
byte[] bytes = new byte[]{'a', 'b', 'c', '\\'};
Bytes.toBytesBinary(Bytes.toStringBinary(bytes));
{code}

 Bytes.toBytesBinary and Bytes.toStringBinary()  should be reversible
 

 Key: HBASE-3444
 URL: https://issues.apache.org/jira/browse/HBASE-3444
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani
Priority: Minor

 Bytes.toStringBinary() doesn't escape '\', so the transformation isn't reversible:
 byte[] a = {'\\', 'x', '0', '0'};
 Bytes.toBytesBinary(Bytes.toStringBinary(a)) won't be equal to a.
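One way to make the pair reversible is to escape the backslash itself, so a literal '\' can never be mistaken for the start of a \xHH sequence, and to bounds-check before looking past a trailing '\' (the StringIndexOutOfBoundsException case above). The Java sketch assumes a simplified notion of "printable" (ASCII 0x20 to 0x7E); `BinaryEscape` is a hypothetical class, not a patch to `Bytes`.

```java
import java.io.ByteArrayOutputStream;

// Sketch of a reversible binary <-> string encoding in the spirit of
// Bytes.toStringBinary/toBytesBinary. Assumption: printable means ASCII 0x20..0x7E;
// every other byte, AND the backslash itself, is written as \xHH.
final class BinaryEscape {
  static String toStringBinary(byte[] b) {
    StringBuilder sb = new StringBuilder();
    for (byte x : b) {
      int ch = x & 0xFF;
      if (ch >= 0x20 && ch <= 0x7E && ch != '\\') {
        sb.append((char) ch);
      } else {
        sb.append(String.format("\\x%02X", ch)); // '\' becomes \x5C
      }
    }
    return sb.toString();
  }

  static byte[] toBytesBinary(String in) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (int i = 0; i < in.length(); i++) {
      char ch = in.charAt(i);
      // Check bounds before peeking at i+1..i+3, so a trailing '\' cannot throw.
      if (ch == '\\' && i + 3 < in.length() && in.charAt(i + 1) == 'x'
          && isHex(in.charAt(i + 2)) && isHex(in.charAt(i + 3))) {
        out.write(Integer.parseInt(in.substring(i + 2, i + 4), 16));
        i += 3; // skip the 'x' and both hex digits
      } else {
        out.write((byte) ch); // anything else is taken literally
      }
    }
    return out.toByteArray();
  }

  private static boolean isHex(char c) {
    return "0123456789abcdefABCDEF".indexOf(c) >= 0;
  }
}
```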


[jira] [Commented] (HBASE-5313) Restructure hfiles layout for better compression


[ 
https://issues.apache.org/jira/browse/HBASE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245093#comment-13245093
 ] 

Kannan Muthukkaruppan commented on HBASE-5313:
--

Yongqiang: Any updates on this effort/investigation? I noticed HBASE-5674, which
you created and which goes after a specific part (timestamps), but was curious
where things stand with respect to this JIRA.

 Restructure hfiles layout for better compression
 

 Key: HBASE-5313
 URL: https://issues.apache.org/jira/browse/HBASE-5313
 Project: HBase
  Issue Type: Improvement
  Components: io
Reporter: dhruba borthakur
Assignee: dhruba borthakur

 An HFile block contains a stream of key-values. Can we organize these KVs
 on disk in a better way so that we get much greater compression ratios?
 One option (thanks Prakash) is to store all the keys at the beginning of the
 block (let's call this the key-section) and then store all their
 corresponding values towards the end of the block. This would allow us to
 skip decompressing the values entirely when we are scanning and skipping over
 rows in the block.
 Any other ideas?
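The key-section idea can be sketched as a toy block format in Java. Assumptions: this is not HBase's HFile block encoding, compression is not modeled, and the per-key (valueOffset, valueLen) index is a hypothetical choice. It only shows that a lookup can walk the key section and copy value bytes solely for the matching key.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Toy layout: count | (keyLen, key, valueOffset, valueLen)* | value bytes...
final class KeySectionBlock {
  static byte[] encode(byte[][] keys, byte[][] values) {
    if (keys.length != values.length) {
      throw new IllegalArgumentException("keys and values must pair up");
    }
    int keySection = 4; // entry count
    int valueSection = 0;
    for (int i = 0; i < keys.length; i++) {
      keySection += 4 + keys[i].length + 8; // keyLen + key + valueOffset + valueLen
      valueSection += values[i].length;
    }
    ByteBuffer buf = ByteBuffer.allocate(keySection + valueSection);
    buf.putInt(keys.length);
    int valueOffset = keySection; // values start right after the key section
    for (int i = 0; i < keys.length; i++) {
      buf.putInt(keys[i].length).put(keys[i]);
      buf.putInt(valueOffset).putInt(values[i].length);
      valueOffset += values[i].length;
    }
    for (byte[] v : values) {
      buf.put(v);
    }
    return buf.array();
  }

  /** Scans only the key section; value bytes are touched only for a match. */
  static byte[] get(byte[] block, byte[] wanted) {
    ByteBuffer buf = ByteBuffer.wrap(block);
    int count = buf.getInt();
    for (int i = 0; i < count; i++) {
      byte[] key = new byte[buf.getInt()];
      buf.get(key);
      int valueOffset = buf.getInt();
      int valueLen = buf.getInt();
      if (Arrays.equals(key, wanted)) {
        return Arrays.copyOfRange(block, valueOffset, valueOffset + valueLen);
      }
    }
    return null; // key not in this block
  }
}
```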


[jira] [Commented] (HBASE-5536) Make it clear that hbase 0.96 requires hadoop 1.0.0 at least; we will no longer work on older versions

2012-04-03 Thread Jonathan Hsieh (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245345#comment-13245345
 ] 

Jonathan Hsieh commented on HBASE-5536:
---

bq. That'd be nice. Agree would be sweet if we didn't have to do reflection at 
all going forward.

Or at least to remove the reflection required for pre-hadoop 1.0.0. We will
probably eventually need it again for hadoop 0.23 hdfs...
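For context, this kind of reflection usually follows a probe-and-fall-back pattern: look up a method by name at runtime and call whichever variant the running library provides. A minimal Java sketch, assuming public no-argument methods; `ReflectiveCall` is hypothetical and the candidate method names are supplied by the caller (for example, a newer name first and an older one second).

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Arrays;

// Probe-and-fall-back: call the first public no-arg method that exists on target.
final class ReflectiveCall {
  static Object invokeFirstAvailable(Object target, String... methodNames) {
    for (String name : methodNames) {
      Method m;
      try {
        m = target.getClass().getMethod(name); // public, no-arg methods only
      } catch (NoSuchMethodException e) {
        continue; // absent in this library version; try the next candidate
      }
      try {
        return m.invoke(target);
      } catch (InvocationTargetException e) {
        // The method exists but threw; surface the real cause.
        throw new RuntimeException("Call to " + name + " failed", e.getCause());
      } catch (IllegalAccessException e) {
        throw new IllegalStateException("Cannot access " + name, e);
      }
    }
    throw new UnsupportedOperationException(
        "None of " + Arrays.toString(methodNames) + " exist on " + target.getClass().getName());
  }
}
```

Each supported version adds another string to the candidate list, which is the maintenance cost the comment hopes to shed.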

 Make it clear that hbase 0.96 requires hadoop 1.0.0 at least; we will no 
 longer work on older versions
 --

 Key: HBASE-5536
 URL: https://issues.apache.org/jira/browse/HBASE-5536
 Project: HBase
  Issue Type: Task
Reporter: stack
Priority: Blocker
 Fix For: 0.96.0


 Looks like there is pretty much consensus that depending on 1.0.0 in 0.96 
 should be fine?  See 
 http://search-hadoop.com/m/dSbVW14EsUb2/discuss+0.96subj=RE+DISCUSS+Have+hbase+require+at+least+hadoop+1+0+0+in+hbase+0+96+0+


[jira] [Commented] (HBASE-4398) If HRegionPartitioner is used in MapReduce, client side configurations are overwritten by hbase-site.xml.

2012-04-03 Thread Takuya Ueshin (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245363#comment-13245363
 ] 

Takuya Ueshin commented on HBASE-4398:
--

Rolled back in trunk?

 If HRegionPartitioner is used in MapReduce, client side configurations are 
 overwritten by hbase-site.xml.
 -

 Key: HBASE-4398
 URL: https://issues.apache.org/jira/browse/HBASE-4398
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.90.4
Reporter: Takuya Ueshin
Assignee: Takuya Ueshin
 Fix For: 0.92.2, 0.94.0

 Attachments: HBASE-4398.patch


 If HRegionPartitioner is used in MapReduce, client side configurations are 
 overwritten by hbase-site.xml.
 We can reproduce the problem with the following steps:
 {noformat}
 - Add HRegionPartitioner.class as the 4th argument of
 TableMapReduceUtil#initTableReducerJob()
 at around line 133
 in src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableMapReduce.java
 - Change or remove the hbase.zookeeper.property.clientPort property
 in hbase-site.xml (for example, change it to 12345).
 - Run testMultiRegionTable()
 {noformat}
 Then I got the following error messages:
 {noformat}
 2011-09-12 22:28:51,020 DEBUG [Thread-832] zookeeper.ZKUtil(93): hconnection opening connection to ZooKeeper with ensemble (localhost:12345)
 2011-09-12 22:28:51,022 INFO  [Thread-832] zookeeper.RecoverableZooKeeper(89): The identifier of this process is 43200@imac.local
 2011-09-12 22:28:51,123 WARN  [Thread-832] zookeeper.RecoverableZooKeeper(161): Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
 2011-09-12 22:28:51,123 INFO  [Thread-832] zookeeper.RecoverableZooKeeper(173): The 1 times to retry ZooKeeper after sleeping 1000 ms
  =
 2011-09-12 22:29:02,418 ERROR [Thread-832] mapreduce.HRegionPartitioner(125): java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2e54e48d closed
 2011-09-12 22:29:02,422 WARN  [Thread-832] mapred.LocalJobRunner$Job(256): job_local_0001
 java.lang.NullPointerException
     at org.apache.hadoop.hbase.mapreduce.HRegionPartitioner.setConf(HRegionPartitioner.java:128)
     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
     at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:527)
     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
 {noformat}
 I think HTable should connect to ZooKeeper at port 21818, as configured on the
 client side, instead of 12345 from hbase-site.xml.
 This might be caused by HBaseConfiguration.addHbaseResources(conf); in
 HRegionPartitioner#setConf(Configuration).
 It might also mean that all client-side configurations that are also set
 in hbase-site.xml are overwritten because of this problem.
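The suspected cause is an ordering problem: if site-file values are layered on after the job's own settings, they win. A minimal model using plain `Map`s, not Hadoop's `Configuration` API or the real behavior of `addHbaseResources`; `ConfMerge` is hypothetical, "site" stands for what hbase-site.xml contributes and "client" for what the job set explicitly.

```java
import java.util.HashMap;
import java.util.Map;

// Models only the precedence of two layers of key/value settings.
final class ConfMerge {
  /** Problematic order: site values are applied last and clobber client settings. */
  static Map<String, String> siteWins(Map<String, String> client, Map<String, String> site) {
    Map<String, String> merged = new HashMap<>(client);
    merged.putAll(site);
    return merged;
  }

  /** Intended order: site values act as defaults; client settings take precedence. */
  static Map<String, String> clientWins(Map<String, String> client, Map<String, String> site) {
    Map<String, String> merged = new HashMap<>(site);
    merged.putAll(client);
    return merged;
  }
}
```

With the port values from the report, the first order yields 12345 and the second keeps the client's 21818.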


[jira] [Commented] (HBASE-5451) Switch RPC call envelope/headers to PBs

2012-04-03 Thread Devaraj Das (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245369#comment-13245369
 ] 

Devaraj Das commented on HBASE-5451:


Thank you, Stack!


[jira] [Updated] (HBASE-5451) Switch RPC call envelope/headers to PBs

2012-04-03 Thread ramkrishna.s.vasudevan (Commented) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5451:
-

Status: Patch Available  (was: Open)


[jira] [Commented] (HBASE-5583) Master restart on create table with splitkeys does not recreate table with all the splitkey regions

[
https://issues.apache.org/jira/browse/HBASE-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245371#comment-13245371
]

ramkrishna.s.vasudevan commented on HBASE-5583:
---

First step: update the status in the following nodes to CREATETABLE:
/hbase/table/tableName
/hbase/table/tableName/currentStep

- Create nodes holding the HRegionInfo in the constructor of the create table
handler, one node for each region given by the split keys.

- /hbase/table/tableName/currentStep takes these values:
CREATED_SPLIT_KEYS (after the creation of all split key nodes)
CREATING_TD
ADD_TO_META
ASSIGN_USER_REGIONS

If all steps are done, the state of the table will be 'ENABLED'.

- Add each HRegionInfo to META and, on success, delete the created nodes one by
one.

- If a master failover happens:

Check which step the node is in.

If failed in ASSIGN_USER_REGIONS:
Just reassign all the regions.

If failed in ADD_TO_META:
Check which regions have not yet been added to META, then do the assignment for
all the regions.
We can be sure that the nodes still present in ZK are exactly the ones for
which the META entry has not been updated.

If failed in CREATING_TD:
If the table descriptor is not found, create it again; otherwise go to the next step.

If failed in CREATED_SPLIT_KEYS:
Delete the node so that at least the next creation attempt succeeds. Just log
it, as we cannot throw an error back to the client.
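The recovery rules above amount to a switch on the recorded step. A minimal Java sketch, assuming the step names from the proposal; `CreateTableStep` and `CreateTableRecovery` are hypothetical, and the returned strings stand in for the real ZK and META operations, which are not modeled.

```java
// Steps recorded in /hbase/table/<tableName>/currentStep, in proposal order.
enum CreateTableStep { CREATED_SPLIT_KEYS, CREATING_TD, ADD_TO_META, ASSIGN_USER_REGIONS }

final class CreateTableRecovery {
  /** Decides what the new master should do for a table whose creation was interrupted. */
  static String recover(CreateTableStep failedIn, boolean descriptorExists) {
    switch (failedIn) {
      case ASSIGN_USER_REGIONS:
        return "reassign all regions";
      case ADD_TO_META:
        // Nodes still in ZK are exactly the regions whose META entry is missing.
        return "add remaining regions to META, then assign all regions";
      case CREATING_TD:
        return descriptorExists ? "continue with ADD_TO_META" : "recreate table descriptor";
      case CREATED_SPLIT_KEYS:
        // The client cannot be told at this point, so only log.
        return "delete nodes and log";
      default:
        throw new IllegalArgumentException("Unknown step " + failedIn);
    }
  }
}
```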

===

We need to add one more API in the client, 'isTableAvailable()', that also accepts
the split keys, so that the user can use it to check whether the table was really
created with all the split keys.
Currently isTableAvailable() just checks if at least one region is assigned
to any RS.
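A check along those lines could compare the split keys against the start keys of the table's regions. Java sketch, assuming the caller already has the start keys of the assigned regions (reading them from META is out of scope); `SplitKeyCheck` is a hypothetical helper, not the proposed client API. It relies on N split keys yielding N + 1 regions, the first with an empty start key.

```java
import java.util.Arrays;
import java.util.List;

final class SplitKeyCheck {
  /** True only if every region implied by the split keys has an assigned start key. */
  static boolean isTableAvailable(List<byte[]> assignedStartKeys, byte[][] splitKeys) {
    // N split keys produce N + 1 regions.
    if (assignedStartKeys.size() != splitKeys.length + 1) {
      return false;
    }
    if (!containsKey(assignedStartKeys, new byte[0])) {
      return false; // the first region starts at the empty key
    }
    for (byte[] split : splitKeys) {
      if (!containsKey(assignedStartKeys, split)) {
        return false;
      }
    }
    return true;
  }

  private static boolean containsKey(List<byte[]> keys, byte[] wanted) {
    for (byte[] k : keys) {
      if (Arrays.equals(k, wanted)) {
        return true;
      }
    }
    return false;
  }
}
```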

The current changes do not guarantee that the user table will be created
synchronously with the client's createTable call,
but they ensure that if a master failover happens while a table is being created,
the table does not end up with fewer regions while the user still thinks table
creation was successful.

We now distinguish whether a table was in the create flow or in the enable
flow.
Previously, only the ENABLING state was used in the master failover scenario.

Please provide your input.

Master restart on create table with splitkeys does not recreate table with
all the splitkey regions
---

Key: HBASE-5583
URL: https://issues.apache.org/jira/browse/HBASE-5583
Project: HBase
Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Fix For: 0.96.0

- Create a table using split keys
- Master goes down before all regions are added to META
- On master restart the table is enabled again, but with fewer regions
than specified in the split keys
The client will get an exception anyway if I had called the synchronous create
table, but the table-exists check will say the table exists.
Is this scenario to be handled by the client only, or can we have some mechanism
on the master side for this? Please suggest.

[jira] [Created] (HBASE-5704) HBASE-4398 mistakenly rolled back on trunk

2012-04-03 Thread stack (Created) (JIRA)

HBASE-4398 mistakenly rolled back on trunk
--

 Key: HBASE-5704
 URL: https://issues.apache.org/jira/browse/HBASE-5704
 Project: HBase
  Issue Type: Bug
Reporter: stack





[jira] [Updated] (HBASE-5704) HBASE-4398 mistakenly rolled back on trunk


 [ 
https://issues.apache.org/jira/browse/HBASE-5704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5704:
-

Attachment: 5704.txt

A code change made while checking why the trunk build was broken got committed
by mistake; this patch puts back HBASE-4398.

 HBASE-4398 mistakenly rolled back on trunk
 --

 Key: HBASE-5704
 URL: https://issues.apache.org/jira/browse/HBASE-5704
 Project: HBase
  Issue Type: Bug
Reporter: stack
 Fix For: 0.96.0

 Attachments: 5704.txt





[jira] [Resolved] (HBASE-5704) HBASE-4398 mistakenly rolled back on trunk

2012-04-03 Thread stack (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-5704.
--

   Resolution: Fixed
Fix Version/s: 0.96.0
 Assignee: stack

Committed to trunk.

 HBASE-4398 mistakenly rolled back on trunk
 --

 Key: HBASE-5704
 URL: https://issues.apache.org/jira/browse/HBASE-5704
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Fix For: 0.96.0

 Attachments: 5704.txt





[jira] [Commented] (HBASE-4398) If HRegionPartitioner is used in MapReduce, client side configurations are overwritten by hbase-site.xml.


[ 
https://issues.apache.org/jira/browse/HBASE-4398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245381#comment-13245381
 ] 

stack commented on HBASE-4398:
--

@Takuya Sorry about that. A temporary disable, made while trying to figure out why
the trunk build was broken, got committed. I fixed trunk over in HBASE-5704.
Thanks for noticing this.


[jira] [Commented] (HBASE-3444) Bytes.toBytesBinary and Bytes.toStringBinary() should be reversible


[ 
https://issues.apache.org/jira/browse/HBASE-3444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245384#comment-13245384
 ] 

stack commented on HBASE-3444:
--

Do you have a patch for us Simon?  Thanks.


[jira] [Updated] (HBASE-3444) Bytes.toBytesBinary and Bytes.toStringBinary() should be reversible


 [ 
https://issues.apache.org/jira/browse/HBASE-3444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3444:
-

Tags: noob


[jira] [Updated] (HBASE-5701) Put RegionServerDynamicStatistics under RegionServer in MBean hierarchy rather than have it as a peer.


 [ 
https://issues.apache.org/jira/browse/HBASE-5701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5701:
-

   Resolution: Fixed
Fix Version/s: 0.94.0
 Hadoop Flags: Incompatible change,Reviewed  (was: Incompatible change)
   Status: Resolved  (was: Patch Available)

Committed to 0.94 branch and trunk. Thanks for the review, Enis.

 Put RegionServerDynamicStatistics under RegionServer in MBean hierarchy 
 rather than have it as a peer.
 --

 Key: HBASE-5701
 URL: https://issues.apache.org/jira/browse/HBASE-5701
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Fix For: 0.94.0

 Attachments: 5701.txt, 5701.txt, screenshot-1.jpg, screenshot-2.jpg





[jira] [Updated] (HBASE-5587) Remove dns.interface configuration options


 [ 
https://issues.apache.org/jira/browse/HBASE-5587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5587:
-

Affects Version/s: 0.96.0

Good candidate for the singularity release.

 Remove dns.interface configuration options 
 ---

 Key: HBASE-5587
 URL: https://issues.apache.org/jira/browse/HBASE-5587
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.96.0
Reporter: Eli Collins

 Are the {{hbase.*.dns.interface}} configuration options used or needed?  Per 
 HBASE-4109 it looks like these never really worked, at least in cases where 
 the hostname with a trailing dot doesn't resolve. The reason I asked is that 
 while these were introduced in Hadoop, I don't think they're actually used, 
 nor am I convinced bypassing the host for DNS lookups is a good idea (leads 
 to painful bugs where default Java DNS lookups differ with these lookups). 
 HBase started using these via a similar feature in HBASE-1279 and HBASE-1279.
 I filed HADOOP-8156 to remove the API which HBase uses, which is obviously an 
 incompatible change and would need to be worked around here if you wanted to 
 keep this functionality in HBase, i.e. *if* that were to get checked into 
 Hadoop we'd first need to get you on your own DNS class. Either way I'll 
 update DNS' InterfaceAudience annotation to indicate HBase is a user.


[jira] [Commented] (HBASE-5584) Coprocessor hooks can be called in the respective handlers


[ 
https://issues.apache.org/jira/browse/HBASE-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245392#comment-13245392
 ] 

stack commented on HBASE-5584:
--

@Andrew One for you when you have a moment

 Coprocessor hooks can be called in the respective handlers
 --

 Key: HBASE-5584
 URL: https://issues.apache.org/jira/browse/HBASE-5584
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.96.0

 Attachments: HBASE-5584-1.patch, HBASE-5584-2.patch, HBASE-5584.patch


 The following points can be changed w.r.t. coprocessors:
 - Call preCreate, postCreate, preEnable, postEnable, etc. in their
 respective handlers.
 - Currently they are called in the HMaster, which makes the post APIs
 asynchronous w.r.t. the handlers.
 - The same applies to the balancer.
 With the current behaviour, once we are in postEnable (for example), we
 anyway need to wait for the main enable handler to complete.
 We should ensure that we don't wait in the main thread, so again we need to
 spawn a thread and wait on that.
 On the other hand, if the pre and post APIs are called in the handlers, only
 that handler thread will be used in the pre/post APIs.
 If the above plan is OK, I can prepare a patch for all such related
 changes.
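The proposal can be sketched in Java: the handler itself calls the pre hook, does the work, then calls the post hook, so postEnable runs only after the enable work is done and on the handler's own thread, with no extra waiting thread. Assumption: `TableObserver` and `EnableTableHandler` here are hypothetical simplifications, not HBase's `MasterObserver` or its real enable-table handler.

```java
import java.util.List;

// Hypothetical, simplified observer with one pre/post hook pair.
interface TableObserver {
  void preEnable(String table);
  void postEnable(String table);
}

// Hooks run inside the handler, in order, on whichever thread executes it.
final class EnableTableHandler implements Runnable {
  private final String table;
  private final TableObserver observer;
  private final List<String> log;

  EnableTableHandler(String table, TableObserver observer, List<String> log) {
    this.table = table;
    this.observer = observer;
    this.log = log;
  }

  @Override
  public void run() {
    observer.preEnable(table);
    log.add("enable " + table); // stands in for assigning regions and updating state
    observer.postEnable(table); // the work above has already completed here
  }
}
```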


[jira] [Commented] (HBASE-5217) Reenable the thrift tests, and add a new one for getRegionInfo

[
https://issues.apache.org/jira/browse/HBASE-5217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245398#comment-13245398
]

stack commented on HBASE-5217:
--

@Alex Is this dependent on another issue being committed first? In
doTestGetRegionInfo the indentation uses tabs; it should be two spaces like
the rest of the file. With your changes, are we going to start a cluster each
time? (Was the testAll method trying to avoid starting a cluster each time?)

Reenable the thrift tests, and add a new one for getRegionInfo
--

Key: HBASE-5217
URL: https://issues.apache.org/jira/browse/HBASE-5217
Project: HBase
Issue Type: Improvement
Reporter: Alex Newman
Assignee: Alex Newman
Priority: Minor
Attachments: 0001-Fixing-thrift-tests-v2.patch,
0001-Fixing-thrift-tests.patch,
0002-HBASE-5217.-Reenable-the-thrift-tests-and-add-a-new-.patch,
-hbase-posix4e #92 Console [Jenkins].pdf

At some point we disabled tests for the thrift server. In addition, it looks
like getRegionInfo no longer works. I'd like to reenable the tests and add one
for getRegionInfo. I had to write this to test my changes in HBASE-2600 anyway,
so I figured I would break it out. We shouldn't commit it until we have fixed
getting the RegionInfo from the Thrift server.

[jira] [Commented] (HBASE-5618) SplitLogManager - prevent unnecessary attempts to resubmits


[ 
https://issues.apache.org/jira/browse/HBASE-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245399#comment-13245399
 ] 

stack commented on HBASE-5618:
--

+1 on patch

 SplitLogManager - prevent unnecessary attempts to resubmits
 ---

 Key: HBASE-5618
 URL: https://issues.apache.org/jira/browse/HBASE-5618
 Project: HBase
  Issue Type: Improvement
  Components: wal, zookeeper
Reporter: Prakash Khemani
 Attachments: 
 0001-HBASE-5618-SplitLogManager-prevent-unnecessary-attem.patch


 Currently, once a watch fires indicating that the task node has been updated 
 (heartbeated) by the worker, the SplitLogManager still takes quite some time 
 before it updates the last-heard-from time. This is because the manager 
 currently schedules another getDataSetWatch() and only updates the task's 
 last-heard-from time after that finishes.
 This leads to a large number of zk-BadVersion warnings when resubmission is 
 continuously attempted and fails.
 Two changes should be made:
 (1) On a resubmission failure because of BadVersion, the task's lastUpdate 
 time should be updated.
 (2) The task's lastUpdate time should be updated as soon as the 
 nodeDataChanged() watch fires, without waiting for getDataSetWatch() to 
 complete.
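
The two proposed changes can be sketched against a simplified model, assuming hypothetical `SplitTask` and `SplitLogManagerSketch` classes driven by plain `long` timestamps rather than the real SplitLogManager and ZooKeeper client. Change (2) is the timestamp update in `nodeDataChanged`; change (1) is the update on a BadVersion failure in `resubmit`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified view of one split-log task as the manager tracks it.
class SplitTask {
  long lastUpdate;  // last time we heard from the worker (ms)
  int version;      // znode version the manager last read

  SplitTask(long now) {
    this.lastUpdate = now;
  }
}

// Hypothetical manager sketch; not the real SplitLogManager.
class SplitLogManagerSketch {
  private final Map<String, SplitTask> tasks = new HashMap<>();
  private final long timeoutMs;
  int resubmitAttempts = 0;

  SplitLogManagerSketch(long timeoutMs) {
    this.timeoutMs = timeoutMs;
  }

  void addTask(String path, long now) {
    tasks.put(path, new SplitTask(now));
  }

  // Change (2): record the heartbeat as soon as the watch fires, instead of
  // after the follow-up getDataSetWatch() round trip (not modelled here).
  void nodeDataChanged(String path, long now) {
    SplitTask task = tasks.get(path);
    if (task != null) {
      task.lastUpdate = now;
    }
  }

  // Conditional resubmit against the znode's current version.
  boolean resubmit(String path, int currentZnodeVersion, long now) {
    SplitTask task = tasks.get(path);
    resubmitAttempts++;
    if (currentZnodeVersion != task.version) {
      // BadVersion: the worker wrote to the node since we last read it.
      // Change (1): treat that as a sign of life and restart the timeout window.
      task.lastUpdate = now;
      return false;
    }
    task.version++;  // our own conditional write bumps the version
    task.lastUpdate = now;
    return true;
  }

  // Periodic check: resubmit only tasks that have been silent past the timeout.
  void checkTimeouts(Map<String, Integer> znodeVersions, long now) {
    for (Map.Entry<String, SplitTask> e : tasks.entrySet()) {
      if (now - e.getValue().lastUpdate > timeoutMs) {
        resubmit(e.getKey(), znodeVersions.get(e.getKey()), now);
      }
    }
  }
}

class SplitLogTimeoutDemo {
  public static void main(String[] args) {
    SplitLogManagerSketch m = new SplitLogManagerSketch(1000);
    m.addTask("task", 0);
    Map<String, Integer> zk = new HashMap<>();
    zk.put("task", 1);               // the worker heartbeated, moving the znode to version 1
    m.nodeDataChanged("task", 900);  // watch fires at t=900
    m.checkTimeouts(zk, 1500);       // 600 ms of silence: no resubmit
    m.checkTimeouts(zk, 2000);       // 1100 ms: one attempt, fails with BadVersion
    m.checkTimeouts(zk, 2100);       // window was restarted, so no immediate retry
    System.out.println(m.resubmitAttempts);
  }
}
```

`main` prints `1`: the heartbeat at t=900 suppresses a resubmit at t=1500, and the single BadVersion failure at t=2000 restarts the timeout window, so the check at t=2100 does not retry.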


[jira] [Updated] (HBASE-5618) SplitLogManager - prevent unnecessary attempts to resubmits


 [ 
https://issues.apache.org/jira/browse/HBASE-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5618:
-

Status: Patch Available  (was: Open)


[jira] [Commented] (HBASE-5606) SplitLogManger async delete node hangs log splitting when ZK connection is lost


[ 
https://issues.apache.org/jira/browse/HBASE-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245407#comment-13245407
 ] 

stack commented on HBASE-5606:
--

ping on this patch.   What we thinking here?  Patch seems clean enough.  Jimmy, 
you think there might be other places where we need to do similar?  Prakash, 
what you think of Jimmy's suggestion of sync'ing the delete (I think I know 
what you are going to say!).

 SplitLogManger async delete node hangs log splitting when ZK connection is 
 lost 
 

 Key: HBASE-5606
 URL: https://issues.apache.org/jira/browse/HBASE-5606
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0
Reporter: Gopinathan A
Priority: Critical
 Fix For: 0.92.2

 Attachments: 
 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch, 
 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch


 1. One region server died; the ServerShutdownHandler detected it and started 
 distributed log splitting.
 2. All tasks failed because the ZK connection was lost, so all the tasks were 
 deleted asynchronously.
 3. ServerShutdownHandler retried the log splitting.
 4. The asynchronous deletion from step 2 finally took effect, on the new task.
 5. This left the SplitLogManager in a hanging state.
 This leads to the .META. region not being assigned for a long time.
 {noformat}
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(55413,79):2012-03-14 19:28:47,932 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up splitlog task at znode /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(89303,79):2012-03-14 19:34:32,387 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up splitlog task at znode /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 {noformat}
 {noformat}
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(80417,99):2012-03-14 19:34:31,196 DEBUG org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(89456,99):2012-03-14 19:34:32,497 DEBUG org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 {noformat}
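
The race in steps 2 to 4 can be reproduced in a small in-memory model. This is a sketch under stated assumptions: `SplitLogZnodes` and `SplitLogRetrySketch` are hypothetical, deletes are modelled as a queue that a simulated callback thread drains, and a second queued delete of the same path is assumed in order to match the two "deleted" log lines above. The second method shows one possible guard, roughly Jimmy's suggestion of letting the delete complete before retrying; it is not necessarily what the attached patch does.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory model of the /hbase/splitlog znodes: deletes are queued
// and only take effect when the (simulated) ZK callback thread applies them.
class SplitLogZnodes {
  private final Set<String> nodes = new HashSet<>();
  private final Deque<String> pendingDeletes = new ArrayDeque<>();

  void create(String path) { nodes.add(path); }

  boolean exists(String path) { return nodes.contains(path); }

  void deleteAsync(String path) { pendingDeletes.add(path); }

  // Applies only the oldest queued delete.
  void applyOneDelete() {
    String path = pendingDeletes.poll();
    if (path != null) {
      nodes.remove(path);
    }
  }

  // Applies every queued delete ("wait until the deletes have completed").
  void applyAllDeletes() {
    while (!pendingDeletes.isEmpty()) {
      applyOneDelete();
    }
  }
}

class SplitLogRetrySketch {
  // Ordering from the report; the second queued delete is an assumption.
  static boolean retryWithoutWaiting(SplitLogZnodes zk, String task) {
    zk.create(task);        // task put up
    zk.deleteAsync(task);   // tasks failed on ZK connection loss: async delete
    zk.deleteAsync(task);   // a second delete of the same path still queued (assumed)
    zk.applyOneDelete();    // first delete lands (cf. 19:34:31,196)
    zk.create(task);        // retry puts the task up again (cf. 19:34:32,387)
    zk.applyAllDeletes();   // stale delete removes the NEW task (cf. 19:34:32,497)
    return zk.exists(task); // false: no task left for any worker to pick up
  }

  // Roughly Jimmy's suggestion: let the delete finish before resubmitting.
  static boolean retryAfterDeletesComplete(SplitLogZnodes zk, String task) {
    zk.create(task);
    zk.deleteAsync(task);
    zk.deleteAsync(task);
    zk.applyAllDeletes();   // block until every outstanding delete of the old task is done
    zk.create(task);
    return zk.exists(task); // true
  }

  public static void main(String[] args) {
    System.out.println(retryWithoutWaiting(new SplitLogZnodes(), "task") + " "
        + retryAfterDeletesComplete(new SplitLogZnodes(), "task"));
  }
}
```

`main` prints `false true`: without waiting, the stale delete removes the re-created task, which matches the hang described in step 5.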


[jira] [Commented] (HBASE-5606) SplitLogManger async delete node hangs log splitting when ZK connection is lost

2012-04-03 Thread Chinna Rao Lalam (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245423#comment-13245423
 ] 

Chinna Rao Lalam commented on HBASE-5606:
-

The patch looks clean to me too.



[jira] [Commented] (HBASE-5451) Switch RPC call envelope/headers to PBs

[
https://issues.apache.org/jira/browse/HBASE-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245425#comment-13245425
]

Hadoop QA commented on HBASE-5451:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12521110/5305v7.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests:

org.apache.hadoop.hbase.mapreduce.TestMultithreadedTableMapper
org.apache.hadoop.hbase.master.TestDistributedLogSplitting
org.apache.hadoop.hbase.mapreduce.TestImportTsv
org.apache.hadoop.hbase.mapred.TestTableMapReduce
org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
org.apache.hadoop.hbase.mapreduce.TestTableMapReduce

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/1376//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/1376//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/1376//console


Switch RPC call envelope/headers to PBs
---

Attachments: 5305v7.txt, 5305v7.txt, rpc-proto.2.txt,
rpc-proto.3.txt, rpc-proto.patch.1_2, rpc-proto.r5.txt

[jira] [Commented] (HBASE-5701) Put RegionServerDynamicStatistics under RegionServer in MBean hierarchy rather than have it as a peer.


[ 
https://issues.apache.org/jira/browse/HBASE-5701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245448#comment-13245448
 ] 

Hudson commented on HBASE-5701:
---

Integrated in HBase-0.94 #83 (See 
[https://builds.apache.org/job/HBase-0.94/83/])
HBASE-5701 Put RegionServerDynamicStatistics under RegionServer in MBean 
hierarchy rather than have it as a peer (Revision 1308970)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerDynamicStatistics.java


 Put RegionServerDynamicStatistics under RegionServer in MBean hierarchy 
 rather than have it as a peer.
 --

 Key: HBASE-5701
 URL: https://issues.apache.org/jira/browse/HBASE-5701
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Fix For: 0.94.0

 Attachments: 5701.txt, 5701.txt, screenshot-1.jpg, screenshot-2.jpg





[jira] [Updated] (HBASE-5451) Switch RPC call envelope/headers to PBs

2012-04-03 Thread jirapos...@reviews.apache.org (Commented) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5451:
-

  Resolution: Fixed
Hadoop Flags: Incompatible change,Reviewed
  Status: Resolved  (was: Patch Available)

Ran the failed tests locally and they pass.  Committed to trunk.  Thanks for the 
patch, Devaraj.

 Switch RPC call envelope/headers to PBs
 ---

 Key: HBASE-5451
 URL: https://issues.apache.org/jira/browse/HBASE-5451
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Affects Versions: 0.94.0
Reporter: Todd Lipcon
Assignee: Devaraj Das
 Fix For: 0.96.0

 Attachments: 5305v7.txt, 5305v7.txt, rpc-proto.2.txt, 
 rpc-proto.3.txt, rpc-proto.patch.1_2, rpc-proto.r5.txt





[jira] [Assigned] (HBASE-5606) SplitLogManger async delete node hangs log splitting when ZK connection is lost

2012-04-03 Thread Zhihong Yu (Assigned) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu reassigned HBASE-5606:
-

Assignee: Prakash Khemani

 SplitLogManger async delete node hangs log splitting when ZK connection is 
 lost 
 

 Key: HBASE-5606
 URL: https://issues.apache.org/jira/browse/HBASE-5606
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0
Reporter: Gopinathan A
Assignee: Prakash Khemani
Priority: Critical
 Fix For: 0.92.2

 Attachments: 
 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch, 
 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch



[jira] [Commented] (HBASE-5444) Add PB-based calls to HMasterRegionInterface


[ 
https://issues.apache.org/jira/browse/HBASE-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245463#comment-13245463
 ] 

jirapos...@reviews.apache.org commented on HBASE-5444:
--



bq.  On 2012-04-03 06:07:58, Michael Stack wrote:
bq.   src/main/java/org/apache/hadoop/hbase/ipc/HMasterRegionInterface.java, line 78
bq.   https://reviews.apache.org/r/4463/diff/2/?file=95093#file95093line78
bq.  
bq.   Radical!
bq.   
bq.   Jimmy didn't remove his interface.  Maybe he should have?

I haven't removed HRegionInterface yet.  I can remove it after the admin part 
is also completed.


bq.  On 2012-04-03 06:07:58, Michael Stack wrote:
bq.   src/main/proto/hbase.proto, line 116
bq.   https://reviews.apache.org/r/4463/diff/2/?file=95105#file95105line116
bq.  
bq.   I think jimmy made one of these already.  Coordinate.  I like the name of yours...

Sure, I can rename mine.


- Jimmy


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4463/#review6651
---


On 2012-03-23 19:55:28, Gregory Chanan wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4463/
bq.  ---
bq.  
bq.  (Updated 2012-03-23 19:55:28)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Adds PB-based calls replacing HMasterRegionInterface.
bq.  
bq.  There are some temporary hacks, e.g. converting PB-based ServerLoad to existing HServerLoad so I didn't need to convert ClusterStatus (which brings in a lot of other changes).  That will be cleaned up in HBASE-5445.
bq.  
bq.  
bq.  This addresses bug HBASE-5444.
bq.  https://issues.apache.org/jira/browse/HBASE-5444
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    pom.xml 10b13ef 
bq.    src/main/jamon/org/apache/hadoop/hbase/tmpl/master/MasterStatusTmpl.jamon 69434f7 
bq.    src/main/jamon/org/apache/hadoop/hbase/tmpl/regionserver/RSStatusTmpl.jamon ae76204 
bq.    src/main/java/org/apache/hadoop/hbase/ClusterStatus.java 9df4c10 
bq.    src/main/java/org/apache/hadoop/hbase/HConstants.java 347 
bq.    src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java cbfa489 
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HBaseRpcMetrics.java 0db2760 
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java 2916d68 
bq.    src/main/java/org/apache/hadoop/hbase/ipc/HMasterRegionInterface.java fd97830 
bq.    src/main/java/org/apache/hadoop/hbase/ipc/Invocation.java f1f06b0 
bq.    src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java d47ef10 
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java cd1755f 
bq.    src/main/java/org/apache/hadoop/hbase/master/MXBean.java 7f44dc2 
bq.    src/main/java/org/apache/hadoop/hbase/master/MXBeanImpl.java 45b8fe7 
bq.    src/main/java/org/apache/hadoop/hbase/master/MasterDumpServlet.java be63838 
bq.    src/main/java/org/apache/hadoop/hbase/master/ServerManager.java cbd55f7 
bq.    src/main/java/org/apache/hadoop/hbase/protobuf/MasterRegionInterface.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/protobuf/PBHelper.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java e0af8fb 
bq.    src/main/proto/MasterRegionProtocol.proto PRE-CREATION 
bq.    src/main/proto/hbase.proto PRE-CREATION 
bq.    src/main/resources/hbase-webapps/master/table.jsp 811df46 
bq.    src/test/java/org/apache/hadoop/hbase/MiniHBaseCluster.java 6af9188 
bq.    src/test/java/org/apache/hadoop/hbase/coprocessor/TestClassLoading.java 368a0e5 
bq.    src/test/java/org/apache/hadoop/hbase/io/TestHbaseObjectWritable.java f2f8ee3 
bq.    src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java 841649a 
bq.    src/test/java/org/apache/hadoop/hbase/master/TestMXBean.java 379f70c 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/TestServerCustomProtocol.java e99d251 
bq.  
bq.  Diff: https://reviews.apache.org/r/4463/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Ran jenkins job, all unit tests passed.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Gregory
bq.  
bq.



 Add PB-based calls to HMasterRegionInterface
 

 Key: HBASE-5444
 URL: https://issues.apache.org/jira/browse/HBASE-5444
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Reporter: Todd Lipcon
Assignee: Gregory Chanan




[jira] [Commented] (HBASE-5618) SplitLogManager - prevent unnecessary attempts to resubmits

[
https://issues.apache.org/jira/browse/HBASE-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245466#comment-13245466
]

Hadoop QA commented on HBASE-5618:
--

-1 overall. Here are the results of testing the latest attachment

http://issues.apache.org/jira/secure/attachment/12520020/0001-HBASE-5618-SplitLogManager-prevent-unnecessary-attem.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.master.TestSplitLogManager

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/1377//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/1377//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/1377//console


[jira] [Commented] (HBASE-5601) Add per-column-family data block cache hit ratios

[
https://issues.apache.org/jira/browse/HBASE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245469#comment-13245469
]

stack commented on HBASE-5601:
--

bq. Make Per-cf and global block cache metrics naming consistent
(hbase.regionserver.blockCacheHitCount vs
hbase.RegionServerDynamicStatistics.tbl.usertable.cf.ycsb.bt.Data.fsBlockReadCacheHitCnt)

Is it really named the latter? Please fix (Proposal looks good).

bq. Rename hbase.RegionServerDynamicStatistics.XXX -> hbase.regionserver.dyn.XXX

hbase-5701 just committed did this (sort-of).

bq. Rename dynamic metric names
tbl.usertable.cf.ycsb.bt.Data.fsBlockReadCacheHitCnt ->
tbl=usertable.cf=ycsb.bt=Data.at=scan.BlockCacheHitCount. Here "at" stands for
access type, and can be scan or compaction.

I don't follow the above. Something didn't make it across.

bq. For at=compaction, we probably do not need per-block type (Data, Index,
...) metric values, shall we aggregate them?

Sure, on compaction.

bq. Shall we extend per-cf metrics into HBASE-5533 (read/write latency
histograms)?

That would be sweet but maybe for later.

bq. With slab cache, we may no longer have one global block cache. Shall we
enable per-cache type metrics? This can imply renaming
hbase.regionserver.blockCache -> hbase.regionserver.lrublockcache,
slabblockcache, etc.

Having them distinctly named makes sense to me.

bq. Add a configuration for disabling per-cf metrics.

Yes. There could be an overwhelming amount.

bq. Thoughts?

Sounds excellent. Sounds like work that would be checked in with the
singularity, 0.96?
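
A sketch of the proposed naming scheme, assuming a hypothetical `CfMetricNames` helper (not an existing HBase class). It emits the `tbl=...cf=...bt=...at=...` form from the proposal, drops the block-type component for compaction as suggested, and computes the per-column-family hit ratio the issue asks for as hits / (hits + misses).

```java
import java.util.Locale;

// Hypothetical helper for the naming scheme proposed above:
//   tbl=<table>.cf=<family>[.bt=<blockType>].at=<accessType>.<metric>
class CfMetricNames {
  enum AccessType { SCAN, COMPACTION }

  static String name(String table, String family, String blockType,
                     AccessType access, String metric) {
    StringBuilder sb = new StringBuilder();
    sb.append("tbl=").append(table).append(".cf=").append(family);
    // Compaction values would be aggregated across block types (as suggested in
    // the thread), so the bt= component is only emitted for scans.
    if (access == AccessType.SCAN) {
      sb.append(".bt=").append(blockType);
    }
    sb.append(".at=").append(access.name().toLowerCase(Locale.ROOT));
    sb.append('.').append(metric);
    return sb.toString();
  }

  // Hit ratio = hits / (hits + misses); 0 when there have been no accesses.
  static double hitRatio(long hits, long misses) {
    long total = hits + misses;
    return total == 0 ? 0.0 : (double) hits / total;
  }

  public static void main(String[] args) {
    System.out.println(name("usertable", "ycsb", "Data", AccessType.SCAN, "BlockCacheHitCount"));
    System.out.println(name("usertable", "ycsb", null, AccessType.COMPACTION, "BlockCacheHitCount"));
    System.out.println(hitRatio(3, 1));
  }
}
```

For example, `name("usertable", "ycsb", "Data", AccessType.SCAN, "BlockCacheHitCount")` yields `tbl=usertable.cf=ycsb.bt=Data.at=scan.BlockCacheHitCount`, and `hitRatio(3, 1)` is `0.75`.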

Add per-column-family data block cache hit ratios
-

Key: HBASE-5601
URL: https://issues.apache.org/jira/browse/HBASE-5601
Project: HBase
Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Enis Soztutar

In addition to the overall block cache hit ratio it would be extremely useful
to have per-column-family data block cache hit ratio metrics.

[jira] [Commented] (HBASE-5618) SplitLogManager - prevent unnecessary attempts to resubmits


[ 
https://issues.apache.org/jira/browse/HBASE-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245485#comment-13245485
 ] 

stack commented on HBASE-5618:
--

That test failure doesn't look too good.


[jira] [Commented] (HBASE-5702) MasterSchemaChangeTracker.excludeRegionServerForSchemaChanges leaks a MonitoredTask per call


[ 
https://issues.apache.org/jira/browse/HBASE-5702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245496#comment-13245496
 ] 

Zhihong Yu commented on HBASE-5702:
---

The following method in ServerManager.java doesn't check the current value of the 
hbase.instant.schema.alter.enabled config:
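{code}
  // Excludes the server from schema-change tracking; note that nothing here
  // checks hbase.instant.schema.alter.enabled before calling the tracker.
  private void excludeRegionServerFromSchemaChanges(final ServerName serverName) {
    this.services.getSchemaChangeTracker()
        .excludeRegionServerForSchemaChanges(serverName.getServerName());
  }
{code}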
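
A minimal sketch of the check Zhihong describes, assuming hypothetical stand-ins: a `Map` instead of Hadoop's `Configuration`, a `RecordingSchemaChangeTracker` instead of the real tracker, and a default of `false` when `hbase.instant.schema.alter.enabled` is unset (the real default may differ). Reading the flag on every call means the current value is honoured.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the schema change tracker; it just records calls.
class RecordingSchemaChangeTracker {
  final List<String> excluded = new ArrayList<>();

  void excludeRegionServerForSchemaChanges(String serverName) {
    excluded.add(serverName);
  }
}

// Hypothetical sketch of a config-gated version of the method quoted above.
class ServerManagerSketch {
  static final String INSTANT_ALTER_KEY = "hbase.instant.schema.alter.enabled";

  private final Map<String, String> conf;  // stand-in for a Hadoop Configuration
  private final RecordingSchemaChangeTracker tracker;

  ServerManagerSketch(Map<String, String> conf, RecordingSchemaChangeTracker tracker) {
    this.conf = conf;
    this.tracker = tracker;
  }

  // Assumption: the flag is treated as false when unset.
  private boolean instantAlterEnabled() {
    return Boolean.parseBoolean(conf.getOrDefault(INSTANT_ALTER_KEY, "false"));
  }

  // Package-private (not private) only so the sketch can be exercised directly.
  void excludeRegionServerFromSchemaChanges(String serverName) {
    if (!instantAlterEnabled()) {
      return;  // feature disabled: don't touch the tracker at all
    }
    tracker.excludeRegionServerForSchemaChanges(serverName);
  }

  public static void main(String[] args) {
    RecordingSchemaChangeTracker tracker = new RecordingSchemaChangeTracker();
    Map<String, String> conf = new HashMap<>();
    new ServerManagerSketch(conf, tracker).excludeRegionServerFromSchemaChanges("rs1,60020,1");
    conf.put(INSTANT_ALTER_KEY, "true");
    new ServerManagerSketch(conf, tracker).excludeRegionServerFromSchemaChanges("rs2,60020,1");
    System.out.println(tracker.excluded);  // only rs2 was passed to the tracker
  }
}
```

`main` prints `[rs2,60020,1]`: only the call made while the flag was `true` reaches the tracker, so no exclusion task would be created while the feature is off.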

 MasterSchemaChangeTracker.excludeRegionServerForSchemaChanges leaks a 
 MonitoredTask per call
 

 Key: HBASE-5702
 URL: https://issues.apache.org/jira/browse/HBASE-5702
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.94.0, 0.96.0


 This bug is so easy to reproduce I'm wondering why it hasn't been reported 
 yet. Stop any number of region servers on a 0.94/6 cluster and you'll see in 
 the master interface one task per stopped region server saying the following:
 |Processing schema change exclusion for region server = 
 sv4r27s44,62023,1333402175340|RUNNING (since 5sec ago)|No schema change in 
 progress. Skipping exclusion for server = sv4r27s44,62023,1333402175340 
 (since 5sec ago)|
 It's going to stay there until the master cleans it up:
 bq. WARN org.apache.hadoop.hbase.monitoring.TaskMonitor: Status Processing 
 schema change exclusion for region server = sv4r27s44,62023,1333402175340: 
 status=No schema change in progress. Skipping exclusion for server = 
 sv4r27s44,62023,1333402175340, state=RUNNING, startTime=1333404636419, 
 completionTime=-1 appears to have been leaked
 It's not clear to me why it's using a MonitoredTask in the first place. 
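
The leak itself can be sketched as follows. The "leaky" shape is inferred from the status text in the report (a task left RUNNING with "No schema change in progress"), not taken from the actual source, and `StatusTask`/`StatusRegistry` are hypothetical stand-ins for a monitored task and the task monitor. Completing the task in a `finally` block covers every exit path; dropping the monitored task entirely, as the reporter suggests, would also avoid the leak.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, minimal stand-in for a monitored status task.
class StatusTask {
  enum State { RUNNING, COMPLETE }

  State state = State.RUNNING;
  String status = "";

  void setStatus(String s) { status = s; }

  void markComplete(String s) {
    status = s;
    state = State.COMPLETE;
  }
}

// Hypothetical registry that, like a task monitor, keeps every task it hands out.
class StatusRegistry {
  final List<StatusTask> tasks = new ArrayList<>();

  StatusTask createStatus() {
    StatusTask task = new StatusTask();
    tasks.add(task);
    return task;
  }

  long stillRunning() {
    return tasks.stream().filter(t -> t.state == StatusTask.State.RUNNING).count();
  }
}

class ExclusionSketch {
  // Leaky shape (inferred): the early "nothing to do" path updates the status
  // but never completes the task, so it stays RUNNING until it is reported as leaked.
  static void excludeLeaky(StatusRegistry reg, String server, boolean inProgress) {
    StatusTask task = reg.createStatus();
    if (!inProgress) {
      task.setStatus("No schema change in progress. Skipping exclusion for server = " + server);
      return;
    }
    task.markComplete("Excluded " + server);
  }

  // Fixed shape: every exit path goes through finally and completes the task.
  static void excludeFixed(StatusRegistry reg, String server, boolean inProgress) {
    StatusTask task = reg.createStatus();
    try {
      if (!inProgress) {
        task.setStatus("No schema change in progress. Skipping exclusion for server = " + server);
        return;
      }
      task.setStatus("Excluded " + server);  // the real exclusion work would go here
    } finally {
      task.markComplete(task.status);
    }
  }

  public static void main(String[] args) {
    StatusRegistry leaky = new StatusRegistry();
    excludeLeaky(leaky, "rs1,60020,1", false);
    StatusRegistry fixed = new StatusRegistry();
    excludeFixed(fixed, "rs1,60020,1", false);
    System.out.println(leaky.stillRunning() + " " + fixed.stillRunning());
  }
}
```

`main` prints `1 0`: one task stays RUNNING in the leaky version, none in the fixed one.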


[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase

[
https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245528#comment-13245528
]

Keith Turner commented on HBASE-4821:
-

I am an Accumulo developer; there is some cruft in our test dir. The two most
successful cluster tests we have are continuous ingest and random walk. We have
found lots of bugs w/ these tests. I wrote a Gora version of continuous ingest
that should run against HBase. The readme on GitHub has a nice description.

https://github.com/keith-turner/goraci/

The accumulo version of continuous ingest can be found here.

http://svn.apache.org/repos/asf/accumulo/tags/1.4.0/test/system/continuous/

This dir contains an old set of OpenOffice slides that also give an overview
of continuous ingest. At the end of the slides is the beginning of the idea of
the random walk test. I am not sure if we have a nice description of random walk
anywhere. It is a fairly simple test framework: you write test nodes in Java
and link the nodes together in a graph using XML. You start a test client on
each node in a cluster, and the test client just does a random walk of the test
graph. We have found a ton of bugs in 1.3 and 1.4 using random walk.
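To make the random walk idea concrete, here is a minimal sketch. It is not Accumulo's framework: the `TestNode` interface, the `RandomWalk` class, and the adjacency map (standing in for the XML graph definition) are all hypothetical. Each node performs one step of work against the system under test; the client then picks one outgoing edge uniformly at random and continues. Seeding the random generator is an assumption added here so that a failing walk can be replayed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical test node: one step of work against the system under test.
// A failed check would throw a RuntimeException and end the walk.
interface TestNode {
  void visit(Map<String, Object> state);
}

class RandomWalk {
  private final Map<String, TestNode> nodes = new HashMap<>();
  private final Map<String, List<String>> edges = new HashMap<>();

  void addNode(String name, TestNode node) {
    nodes.put(name, node);
    edges.putIfAbsent(name, new ArrayList<>());
  }

  // In the framework described above this wiring comes from XML; here it is code.
  void addEdge(String from, String to) {
    edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
  }

  // Visits up to `steps` nodes starting at `start` and returns the path taken.
  List<String> walk(String start, int steps, long seed) {
    Random rand = new Random(seed); // seeded so a failing walk can be replayed
    Map<String, Object> state = new HashMap<>(); // shared by nodes during one walk
    List<String> path = new ArrayList<>();
    String current = start;
    for (int i = 0; i < steps; i++) {
      nodes.get(current).visit(state);
      path.add(current);
      List<String> next = edges.get(current);
      if (next.isEmpty()) {
        break; // dead end: stop this walk
      }
      current = next.get(rand.nextInt(next.size()));
    }
    return path;
  }
}
```

Running one such client per cluster node, each with its own seed, gives many independent interleavings of the same small set of operations.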

Actually, the Accumulo features page may be the only place we give an overview
of random walk. I noticed that our random walk README only tells you how to run
it, not what it is. Below is a link to the random walk test, but like I said,
it's not very informative.

http://svn.apache.org/repos/asf/accumulo/tags/1.4.0/test/system/randomwalk/

The actual Java code is at the link below; the framework and test node code
are all there.

http://svn.apache.org/repos/asf/accumulo/tags/1.4.0/src/server/src/main/java/org/apache/accumulo/server/test/randomwalk/

The short description of randomwalk I mentioned is here.

http://accumulo.apache.org/notable_features.html#testing

If anyone is interested in generalizing random walk so that HBase could use it
too, let me know.

One last thing: we tested Accumulo for over a month on a 10-node cluster using
continuous ingest, random walk, and the Agitator. Below are some of the bugs
we found during that time period.

[Bugs found in 1.4 testing|https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=truejqlQuery=labels+%3D+14_qa_bug]

A fully automated comprehensive distributed integration test for HBase
--

Key: HBASE-4821
URL: https://issues.apache.org/jira/browse/HBASE-4821
Project: HBase
Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Critical

To properly verify that a particular version of HBase is good for production
deployment we need a better way to do real cluster testing after incremental
changes. Running unit tests is good, but we also need to deploy HBase to a
cluster, run integration tests, load tests, Thrift server tests, kill some
region servers, kill the master, and produce a report. All of this needs to
happen in 20-30 minutes with minimal manual intervention. I think this way we
can combine agile development with high stability of the codebase. I am
envisioning a high-level framework written in a scripting language (e.g.
Python) that would abstract external operations such as deploy to test
cluster, kill a particular server, run load test A, run load test B
(we already have a few kinds of load tests implemented in Java, and we could
write a Thrift load test in Python). This tool should also produce
intermediate output, allowing us to catch problems early and restart the test.
No implementation has yet been done. Any ideas or suggestions are welcome.

[jira] [Resolved] (HBASE-5700) [89-fb] Fix TestMiniClusterLoad* test failures

2012-04-03 Thread Mikhail Bautin (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin resolved HBASE-5700.
---

Resolution: Fixed

Committed internally

 [89-fb] Fix TestMiniClusterLoad* test failures
 --

 Key: HBASE-5700
 URL: https://issues.apache.org/jira/browse/HBASE-5700
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D2583.1.patch


 Porting TestMiniClusterLoad* tests to 89-fb in HBASE-5679 uncovered certain 
 problems with mini-cluster setup in 89-fb that need to be fixed.


[jira] [Commented] (HBASE-5217) Reenable the thrift tests, and add a new one for getRegionInfo

2012-04-03 Thread Alex Newman (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245537#comment-13245537
]

Alex Newman commented on HBASE-5217:

This *is* dependent on the other being committed first. Sorry about
the spacing; I will fix it when I am back from Vegas. I think the reason this
patch had them consolidated was:
{noformat}
-   * Runs all of the tests under a single JUnit test method. We
-   * consolidate all testing to one method because HBaseClusterTestCase
-   * is prone to OutOfMemoryExceptions when there are three or more
-   * JUnit test methods.
-   *
-   * @throws Exception
{noformat}
I think this is mostly fixed now though.

Reenable the thrift tests, and add a new one for getRegionInfo
--

[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase

[
https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245536#comment-13245536
]

stack commented on HBASE-4821:
--

@Keith Welcome. Thanks for the nice fat comment. I'm not sure I want to run
your randomwalk test if it's going to generate that many bugs in HBase (smile).
Let us take a looksee at the continuous ingest. Good on you, Keith.

A fully automated comprehensive distributed integration test for HBase
--

Key: HBASE-4821
URL: https://issues.apache.org/jira/browse/HBASE-4821
Project: HBase
Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Critical

[jira] [Commented] (HBASE-5700) [89-fb] Fix TestMiniClusterLoad* test failures

2012-04-03 Thread Phabricator (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245540#comment-13245540
 ] 

Phabricator commented on HBASE-5700:


mbautin has committed the revision [jira] [HBASE-5700] [89-fb] Fix 
TestMiniClusterLoad* test failures.

REVISION DETAIL
  https://reviews.facebook.net/D2583

COMMIT
  https://reviews.facebook.net/rHBASEEIGHTNINEFBBRANCH1309066


 [89-fb] Fix TestMiniClusterLoad* test failures
 --

 Key: HBASE-5700
 URL: https://issues.apache.org/jira/browse/HBASE-5700
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D2583.1.patch



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

[
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245545#comment-13245545
]

Keith Turner commented on HBASE-5487:
-

I am an Accumulo developer. CompactRange is our operation to force a range of
tablets (regions) to major compact all of their files into one file. The
TableRangeOp will merge a range of tablets into one tablet. TableRangeOp can
also delete a range of rows from a table efficiently: it inserts split points
at the rows you want to delete, drops the tablets, and then merges what's left.
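The delete-range trick can be sketched with an in-memory model: a table is a sorted map of rows, partitioned into tablets by a sorted set of split points. Inserting split points at both ends of the range makes the doomed rows occupy whole tablets, which can be dropped without rewriting any other data; a merge then removes the now-redundant boundaries. `TabletTable` and its methods are hypothetical, not Accumulo's or HBase's API, and this sketch uses start-inclusive, end-exclusive tablets (HBase-style) rather than Accumulo's own boundary convention.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

class TabletTable {
  // Row -> value; a stand-in for the table's data files.
  final TreeMap<String, String> rows = new TreeMap<>();
  // Tablet boundaries: tablet i spans [split_i, split_{i+1}); "" is the table start.
  final TreeSet<String> splits = new TreeSet<>(Collections.singleton(""));

  // Current tablets as {start, end} pairs; a null end means "end of table".
  List<String[]> tablets() {
    List<String[]> out = new ArrayList<>();
    for (String s : splits) {
      out.add(new String[] {s, splits.higher(s)});
    }
    return out;
  }

  // Deletes rows in [start, end); both bounds must be real keys with start < end.
  void deleteRange(String start, String end) {
    if (start.compareTo(end) >= 0) {
      throw new IllegalArgumentException("start must sort before end");
    }
    boolean hadStart = splits.contains(start);
    boolean hadEnd = splits.contains(end);
    // 1. Insert split points so [start, end) is covered by whole tablets.
    splits.add(start);
    splits.add(end);
    // 2. Drop every tablet that lies entirely inside the range.
    for (String[] t : tablets()) {
      if (t[0].compareTo(start) >= 0 && t[1] != null && t[1].compareTo(end) <= 0) {
        rows.subMap(t[0], true, t[1], false).clear();
      }
    }
    // 3. Merge: the range is now empty, so its interior boundaries are redundant,
    //    and the split points added in step 1 can be removed again.
    splits.subSet(start, false, end, false).clear();
    if (!hadStart) splits.remove(start);
    if (!hadEnd) splits.remove(end);
  }
}
```

For example, with rows `a` through `f` and splits at `c` and `e`, `deleteRange("b", "d")` leaves rows `a, d, e, f` and a single remaining split at `e`.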

Generic framework for Master-coordinated tasks
--

Key: HBASE-5487
URL: https://issues.apache.org/jira/browse/HBASE-5487
Project: HBase
Issue Type: New Feature
Components: master, regionserver, zookeeper
Affects Versions: 0.94.0
Reporter: Mubarak Seyed
Labels: noob

Need a framework to execute master-coordinated tasks in a fault-tolerant
manner.
Master-coordinated tasks such as online schema change and delete-range
(deleting region(s) based on start/end key) can make use of this framework.
The advantages of the framework are:
1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for
master-coordinated tasks
2. Ability to abstract the common functions across Master - ZK and RS - ZK
3. Easy to plug in new master-coordinated tasks without adding code to core
components
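One possible shape for such a framework, sketched under assumptions: each task type implements a small hypothetical `CoordinatedTask` interface made of idempotent steps, and a generic `TaskExecutor` records the index of the next step after each step completes. An in-memory map stands in for durable state such as a znode per task. A master that fails over can resume from the recorded step instead of restarting the task, and new task types plug in by registering an implementation rather than by editing the master, ZK tracker, or region server code. None of these names exist in HBase.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical plug-in contract: a task is an ordered list of idempotent steps.
interface CoordinatedTask {
  String type();

  int stepCount();

  // Must be safe to re-run if the master dies after the step but before progress is recorded.
  void runStep(int step, String taskId);
}

class TaskExecutor {
  private final Map<String, CoordinatedTask> registry = new HashMap<>();
  // Stand-in for durable state (e.g. one znode per task): taskId -> next step to run.
  final Map<String, Integer> progress = new HashMap<>();

  void register(CoordinatedTask task) {
    registry.put(task.type(), task);
  }

  // Runs or resumes a task; stops after `maxSteps` steps to simulate a master crash.
  // Returns true once every step has completed.
  boolean run(String type, String taskId, int maxSteps) {
    CoordinatedTask task = registry.get(type);
    if (task == null) {
      throw new IllegalArgumentException("No task registered for type " + type);
    }
    int next = progress.getOrDefault(taskId, 0);
    int executed = 0;
    while (next < task.stepCount()) {
      if (executed == maxSteps) {
        return false; // "crashed"; the recorded progress survives for the next master
      }
      task.runStep(next, taskId);
      next++;
      progress.put(taskId, next); // record progress only after the step succeeds
      executed++;
    }
    return true;
  }
}
```

Recording progress only after a step succeeds means a crash can at worst repeat one step, which is why the steps themselves have to be idempotent.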

[jira] [Updated] (HBASE-5635) If getTaskList() returns null splitlogWorker is down. It wont serve any requests.

2012-04-03 Thread Chinna Rao Lalam (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HBASE-5635:


Attachment: HBASE-5635.2.patch

 If getTaskList() returns null splitlogWorker is down. It wont serve any 
 requests. 
 --

 Key: HBASE-5635
 URL: https://issues.apache.org/jira/browse/HBASE-5635
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.1
Reporter: Kristam Subba Swathi
 Attachments: HBASE-5635.1.patch, HBASE-5635.2.patch, HBASE-5635.patch


 During the HLog split operation, if all the ZooKeeper servers are down, the
 paths will be returned as null and the split worker thread will exit.
 This region server will then not be able to acquire any other tasks, since
 the split worker thread has exited.
 Please find the relevant code below for more details:
 {code}
 private List<String> getTaskList() {
   for (int i = 0; i < zkretries; i++) {
     try {
       return (ZKUtil.listChildrenAndWatchForNewChildren(this.watcher,
           this.watcher.splitLogZNode));
     } catch (KeeperException e) {
       LOG.warn("Could not get children of znode " +
           this.watcher.splitLogZNode, e);
       try {
         Thread.sleep(1000);
       } catch (InterruptedException e1) {
         LOG.warn("Interrupted while trying to get task list ...", e1);
         Thread.currentThread().interrupt();
         return null;
       }
     }
   }
   // All zkretries attempts failed: null is returned and the worker exits.
   return null;
 }
 {code}
 in org.apache.hadoop.hbase.regionserver.SplitLogWorker.
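The failure mode is that a bounded retry loop returns `null` once ZooKeeper has been unreachable for `zkretries` attempts, and the worker thread then exits. A minimal sketch of the alternative the issue points toward: keep retrying, logging each retry, until either the listing succeeds or the worker is asked to stop, so a ZooKeeper outage delays the worker instead of killing it. `RetryingTaskLister`, `ChildLister`, the `stopped` flag, and the fixed sleep are hypothetical simplifications; the real code calls `ZKUtil.listChildrenAndWatchForNewChildren` and catches `KeeperException`.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

class RetryingTaskLister {
  // Hypothetical stand-in for the ZooKeeper call; it may throw while ZK is down.
  interface ChildLister {
    List<String> list() throws Exception;
  }

  private final ChildLister lister;
  private final long sleepMs;
  final AtomicBoolean stopped = new AtomicBoolean(false);

  RetryingTaskLister(ChildLister lister, long sleepMs) {
    this.lister = lister;
    this.sleepMs = sleepMs;
  }

  // Returns the children, or null only if the worker was stopped or interrupted.
  List<String> getTaskList() {
    while (!stopped.get()) {
      try {
        List<String> children = lister.list();
        if (children != null) {
          return children;
        }
      } catch (Exception e) {
        System.err.println("Could not get children of znode: " + e.getMessage());
      }
      try {
        System.err.println("Retrying listChildren after sleeping " + sleepMs + "ms");
        Thread.sleep(sleepMs);
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return null;
      }
    }
    return null;
  }
}
```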
  


[jira] [Commented] (HBASE-5635) If getTaskList() returns null splitlogWorker is down. It wont serve any requests.

2012-04-03 Thread Chinna Rao Lalam (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1324#comment-1324
 ] 

Chinna Rao Lalam commented on HBASE-5635:
-

Updated the patch with log message for retry and corrected the typo.

 If getTaskList() returns null splitlogWorker is down. It wont serve any 
 requests. 
 --

 Key: HBASE-5635
 URL: https://issues.apache.org/jira/browse/HBASE-5635
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.1
Reporter: Kristam Subba Swathi
 Attachments: HBASE-5635.1.patch, HBASE-5635.2.patch, HBASE-5635.patch



[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase

[
https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245557#comment-13245557
]

Keith Turner commented on HBASE-4821:
-

Eric Newton has been experimenting with running goraci against HBase. One
issue he ran into was that gora-hbase uses auto flush on every HTable
connection. This really slowed down ingest. He modified the gora code locally
so it would not do this. Eric posted a question on the gora user list asking
why it behaved this way. The Gora API has a flush call.
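Why per-put auto flush hurts bulk ingest, in a self-contained model: with auto flush every put is its own round trip, while a client-side write buffer batches puts and sends them when the buffer fills or when the caller flushes explicitly. `BufferedWriterModel` and its RPC counter are hypothetical. In the HBase client of this era the corresponding knobs are `HTable.setAutoFlush(false)` (optionally with `setWriteBufferSize`) plus an explicit `flushCommits()`; presumably Gora's own flush call could map onto the latter, but that mapping is an assumption here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client model that counts simulated round trips to the server.
class BufferedWriterModel {
  private final boolean autoFlush;
  private final int bufferLimit;
  private final List<String> buffer = new ArrayList<>();
  int rpcCount = 0;
  int rowsSent = 0;

  BufferedWriterModel(boolean autoFlush, int bufferLimit) {
    this.autoFlush = autoFlush;
    this.bufferLimit = bufferLimit;
  }

  void put(String row) {
    buffer.add(row);
    // Auto flush sends every put immediately; otherwise wait for a full buffer.
    if (autoFlush || buffer.size() >= bufferLimit) {
      flush();
    }
  }

  // One simulated RPC carrying every buffered row.
  void flush() {
    if (buffer.isEmpty()) {
      return;
    }
    rpcCount++;
    rowsSent += buffer.size();
    buffer.clear();
  }
}
```

The trade-off is durability on the client side: rows still sitting in the buffer are lost if the client dies before flushing, so the caller has to flush at the points where it needs the data to be on the server.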

A fully automated comprehensive distributed integration test for HBase
--

Key: HBASE-4821
URL: https://issues.apache.org/jira/browse/HBASE-4821
Project: HBase
Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Critical

[jira] [Assigned] (HBASE-5705) Introduce Protocol Buffer RPC engine

2012-04-03 Thread Devaraj Das (Assigned) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das reassigned HBASE-5705:
--

Assignee: Devaraj Das

 Introduce Protocol Buffer RPC engine
 

 Key: HBASE-5705
 URL: https://issues.apache.org/jira/browse/HBASE-5705
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Reporter: Devaraj Das
Assignee: Devaraj Das

 Introduce Protocol Buffer RPC engine in the RPC core. Protocols that are PB 
 aware can be made to go through this RPC engine. The approach, in my current 
 thinking, would be similar to HADOOP-7773.


[jira] [Created] (HBASE-5705) Introduce Protocol Buffer RPC engine

2012-04-03 Thread Devaraj Das (Created) (JIRA)

Introduce Protocol Buffer RPC engine


 Key: HBASE-5705
 URL: https://issues.apache.org/jira/browse/HBASE-5705
 Project: HBase
  Issue Type: Sub-task
Reporter: Devaraj Das




[jira] [Created] (HBASE-5706) Dropping fs latency stats since buffer is full spam

2012-04-03 Thread Jean-Daniel Cryans (Created) (JIRA)

Dropping fs latency stats since buffer is full spam
-

 Key: HBASE-5706
 URL: https://issues.apache.org/jira/browse/HBASE-5706
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Priority: Minor
 Fix For: 0.94.0, 0.96.0


I see tons of this while running tests (note that it's a WARN):

{noformat}
2012-04-03 18:54:47,172 WARN org.apache.hadoop.hbase.io.hfile.HFile: Dropping 
fs latency stats since buffer is full
{noformat}

While the code says this:

{noformat}
  // we don't want to fill up the logs with this message, so only log it 
  // once every 30 seconds at most
  // I also want to avoid locks on the 'critical path' (the common case will be
  // uncontended) - hence the CAS
  private static void logDroppedLatencyStat() {
{noformat}

It doesn't seem like this message is actionable, and even though it's printed
at most once every 30 seconds, it's still very spammy.

We should get rid of it or make it more useful (I don't know which).
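The rate limiting the quoted comment describes (log at most once every 30 seconds, no lock on the common path, a CAS instead) typically looks like the sketch below. This is a reconstruction of the technique, not the body of HBase's `logDroppedLatencyStat()`; the `RateLimitedWarning` class, the injectable clock, and the dropped-count reporting are assumptions. Reporting how many stats were dropped since the last message is one way the warning could be made more useful, as the issue asks.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

class RateLimitedWarning {
  private static final long INTERVAL_MS = 30_000L;

  private final LongSupplier clock; // injectable so tests can control time
  private final AtomicLong lastLoggedAt;
  private final AtomicLong droppedSinceLastLog = new AtomicLong();

  RateLimitedWarning(LongSupplier clock) {
    this.clock = clock;
    // Start one interval in the past so the very first drop is reported.
    this.lastLoggedAt = new AtomicLong(clock.getAsLong() - INTERVAL_MS);
  }

  // Returns the message that was logged, or null if this call was suppressed.
  String onDroppedStat() {
    droppedSinceLastLog.incrementAndGet();
    long now = clock.getAsLong();
    long last = lastLoggedAt.get();
    // Only the thread that wins the CAS logs; the others return without blocking.
    if (now - last >= INTERVAL_MS && lastLoggedAt.compareAndSet(last, now)) {
      long dropped = droppedSinceLastLog.getAndSet(0);
      String msg = "Dropped " + dropped + " fs latency stats since buffer is full";
      System.err.println("WARN " + msg);
      return msg;
    }
    return null;
  }
}
```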


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks


[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245596#comment-13245596
 ] 

Keith Turner commented on HBASE-5487:
-

The description of FATE given in this ticket is pretty good.  The following 
resources may provide a little more info.

http://people.apache.org/~kturner/accumulo14_15.pdf

http://mail-archives.apache.org/mod_mbox/incubator-accumulo-dev/201202.mbox/%3CCAGUtCHpcHTDue-C_2RyDkZm0diW=zojd7-bzcgszqdtidzn...@mail.gmail.com%3E

 Generic framework for Master-coordinated tasks
 --

 Key: HBASE-5487
 URL: https://issues.apache.org/jira/browse/HBASE-5487
 Project: HBase
  Issue Type: New Feature
  Components: master, regionserver, zookeeper
Affects Versions: 0.94.0
Reporter: Mubarak Seyed
  Labels: noob



[jira] [Commented] (HBASE-643) Rename tables

[
https://issues.apache.org/jira/browse/HBASE-643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245620#comment-13245620
]

Keith Turner commented on HBASE-643:

Accumulo supports this feature by using table ids. Table ids are generated
using ZooKeeper and are never reused (base 36 numbers are used to keep them
short and readable). A mapping from table id to table name is stored in
ZooKeeper. To rename a table, lock the table and change the mapping in
ZooKeeper.

Accumulo originally did not use table ids; it stored the table name in meta and
HDFS. Now it uses the table id in HDFS and meta. We were discussing renaming
tables, and it seemed so complicated. Then someone thought of this table id
solution; it was such an elegant solution and made the problem trivial.

Although table ids were implemented to support table renaming, they had the
nice side effect of making HDFS and meta entries much shorter.
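A minimal in-memory sketch of the table-id scheme described above: ids come from a counter that is never decremented, so they are never reused, and are rendered in base 36; the id-to-name mapping is the only place the name lives, so a rename just rewrites that mapping under a lock. `TableIdRegistry` is hypothetical, and `synchronized` methods over plain maps stand in for the ZooKeeper counter, table lock, and znodes Accumulo actually uses.

```java
import java.util.HashMap;
import java.util.Map;

class TableIdRegistry {
  private long nextId = 1; // only ever incremented, so ids are never reused
  private final Map<String, String> idToName = new HashMap<>();
  private final Map<String, String> nameToId = new HashMap<>();

  synchronized String create(String name) {
    if (nameToId.containsKey(name)) {
      throw new IllegalArgumentException("Table exists: " + name);
    }
    String id = Long.toString(nextId++, 36); // base 36 keeps ids short
    idToName.put(id, name);
    nameToId.put(name, id);
    return id;
  }

  // Data paths and meta rows are keyed by id, so only this mapping changes.
  synchronized void rename(String oldName, String newName) {
    String id = nameToId.get(oldName);
    if (id == null) {
      throw new IllegalArgumentException("No such table: " + oldName);
    }
    if (nameToId.containsKey(newName)) {
      throw new IllegalArgumentException("Table exists: " + newName);
    }
    nameToId.remove(oldName);
    nameToId.put(newName, id);
    idToName.put(id, newName);
  }

  synchronized void delete(String name) {
    String id = nameToId.remove(name);
    if (id != null) {
      idToName.remove(id);
    }
  }

  synchronized String idOf(String name) { return nameToId.get(name); }

  synchronized String nameOf(String id) { return idToName.get(id); }
}
```

Applied to the request below, a user who uploaded `mytable_2` could rename it to `mytable` without touching any data, because the files and meta rows refer only to the id.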

Rename tables
-

Key: HBASE-643
URL: https://issues.apache.org/jira/browse/HBASE-643
Project: HBase
Issue Type: New Feature
Reporter: Michael Bieniosek
Attachments: copy_table.rb, rename_table.rb

It would be nice to be able to rename tables, if this is possible. Some of
our internal users are doing things like: upload table mytable - realize
they screwed up - upload table mytable_2 - decide mytable_2 looks better -
have to go on using mytable_2 instead of originally desired table name.

[jira] [Assigned] (HBASE-5702) MasterSchemaChangeTracker.excludeRegionServerForSchemaChanges leaks a MonitoredTask per call

2012-04-03 Thread Subbu M Iyer (Assigned) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subbu M Iyer reassigned HBASE-5702:
---

Assignee: Subbu M Iyer

I will take a look at it and address it as soon as possible.

 MasterSchemaChangeTracker.excludeRegionServerForSchemaChanges leaks a 
 MonitoredTask per call
 

 Key: HBASE-5702
 URL: https://issues.apache.org/jira/browse/HBASE-5702
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: Jean-Daniel Cryans
Assignee: Subbu M Iyer
Priority: Critical
 Fix For: 0.94.0, 0.96.0




[jira] [Commented] (HBASE-5702) MasterSchemaChangeTracker.excludeRegionServerForSchemaChanges leaks a MonitoredTask per call

2012-04-03 Thread Subbu M Iyer (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245650#comment-13245650
 ] 

Subbu M Iyer commented on HBASE-5702:
-

JD,

Can you share the other issues with MonitoredTask that you mentioned? I
believe the problems you have seen are all related to MonitoredTask reporting
during the instant schema change process, not to the schema change process
itself. Is that right? Please let me know so we can categorize and address
the issues accordingly.

 MasterSchemaChangeTracker.excludeRegionServerForSchemaChanges leaks a 
 MonitoredTask per call
 

 Key: HBASE-5702
 URL: https://issues.apache.org/jira/browse/HBASE-5702
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: Jean-Daniel Cryans
Assignee: Subbu M Iyer
Priority: Critical
 Fix For: 0.94.0, 0.96.0




[jira] [Updated] (HBASE-5599) The hbck tool can not fix the six scenarios, it is NO_VERSION_FILE, NOT_IN_META_OR_DEPLOYED, NOT_IN_META, SHOULD_NOT_BE_DEPLOYED, FIRST_REGION_STARTKEY_NOT_EMPTY, HOLE_IN

2012-04-03 Thread Jonathan Hsieh (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jonathan Hsieh updated HBASE-5599:
--

Assignee: fulin wang
Status: Patch Available (was: Open)

The hbck tool can not fix the six scenarios, it is NO_VERSION_FILE,
NOT_IN_META_OR_DEPLOYED, NOT_IN_META, SHOULD_NOT_BE_DEPLOYED,
FIRST_REGION_STARTKEY_NOT_EMPTY, HOLE_IN_REGION_CHAIN.

Key: HBASE-5599
URL: https://issues.apache.org/jira/browse/HBASE-5599
Project: HBase
Issue Type: New Feature
Components: hbck
Affects Versions: 0.90.6
Reporter: fulin wang
Assignee: fulin wang
Fix For: 0.90.6

Attachments: hbase-5599-0.90.patch, hbase-5599-0.90_v2.patch,
hbase-5599-0.90_v3.patch, hbase-5599-0.90_v5.patch, hbase-5599-0.90_v6.patch,
hbase-5599-0.92_v5.patch, hbase-5599-0.94_v5.patch, hbase-5599-trunk_v5.patch

The hbck tool cannot fix the following six scenarios.
1. Version file does not exist in root dir.
Fix: I try to create a version file with the 'FSUtils.setVersion' method.

2. [REGIONNAME][KEY] on HDFS, but not listed in META or deployed on any
region server.
Fix: I read the region info from the HDFS file and write it to the
'.META.' table.

3. [REGIONNAME][KEY] not in META, but deployed on [SERVERNAME]
Fix: I read the region info from the HDFS file and write it to the
'.META.' table.

4. [REGIONNAME] should not be deployed according to META, but is deployed on
[SERVERNAME]
Fix: Close this region.

5. First region should start with an empty key. You need to create a new
region and regioninfo in HDFS to plug the hole.
Fix: The region info is in neither HDFS nor .META., so an empty region is
created for this error.

6. There is a hole in the region chain between [KEY] and [KEY]. You need to
create a new regioninfo and region dir in HDFS to plug the hole.
Fix: The region info is in neither HDFS nor .META., so an empty region is
created for this hole.
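Scenarios 5 and 6 both come down to walking the region chain in start-key order. A minimal sketch of that check, assuming regions are reduced to `[startKey, endKey)` pairs where the empty string means the start or end of the table; `RegionChainChecker` and `Region` are hypothetical, and string keys stand in for HBase's byte-array row keys. Each reported gap is a span where hbck would need to create an empty region.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class RegionChainChecker {
  static final class Region {
    final String start; // "" means start of table
    final String end;   // "" means end of table

    Region(String start, String end) {
      this.start = start;
      this.end = end;
    }
  }

  // Returns the [start, end) gaps that would need an empty region to plug them.
  static List<String[]> findHoles(List<Region> regions) {
    List<Region> sorted = new ArrayList<>(regions);
    sorted.sort(Comparator.comparing((Region r) -> r.start));
    List<String[]> holes = new ArrayList<>();
    String expected = ""; // the first region must start with the empty key (scenario 5)
    for (Region r : sorted) {
      if (r.start.compareTo(expected) > 0) {
        holes.add(new String[] {expected, r.start}); // gap before this region (scenario 6)
      }
      if (r.end.isEmpty()) {
        return holes; // reached the last region; any later region would be an overlap
      }
      if (r.end.compareTo(expected) > 0) {
        expected = r.end;
      }
    }
    // No region reaches the end of the table.
    holes.add(new String[] {expected, ""});
    return holes;
  }
}
```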

[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2012-04-03 Thread Enis Soztutar (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245672#comment-13245672
 ] 

Enis Soztutar commented on HBASE-5487:
--

Keith, thanks a lot for the refs; we badly need something like FATE.
Mubarak, or anybody else: does anyone plan to work on this?

 Generic framework for Master-coordinated tasks
 --

 Key: HBASE-5487
 URL: https://issues.apache.org/jira/browse/HBASE-5487
 Project: HBase
  Issue Type: New Feature
  Components: master, regionserver, zookeeper
Affects Versions: 0.94.0
Reporter: Mubarak Seyed
  Labels: noob



[jira] [Commented] (HBASE-5692) Add real action time for HLogPrettyPrinter


[ 
https://issues.apache.org/jira/browse/HBASE-5692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245678#comment-13245678
 ] 

Hudson commented on HBASE-5692:
---

Integrated in HBase-TRUNK #2705 (See 
[https://builds.apache.org/job/HBase-TRUNK/2705/])
HBASE-5692 Add real action time for HLogPrettyPrinter (Revision 1308609)

 Result = FAILURE
stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogPrettyPrinter.java


 Add real action time for HLogPrettyPrinter
 --

 Key: HBASE-5692
 URL: https://issues.apache.org/jira/browse/HBASE-5692
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Xing Shi
Priority: Minor
 Fix For: 0.96.0

 Attachments: 5692v2.patch, HBASE-5665-trunk.v2.patch, HBASE-5692.patch


 Currently HLogPrettyPrinter prints each entry with the cell timestamp but without the real operation time:
 {quote}
 Sequence 4 from region ee9877dfd55624f50b20acf572416a88 in table t5
   Action:
 row: r
 column: f3:q
 at time: Thu Jan 01 08:02:03 CST 1970
 {quote}
 It would be more useful to also show the real operation time, like this:
 {quote}
 Sequence 4 from region ee9877dfd55624f50b20acf572416a88 in table t5 at time: 
 Sun Apr 01 10:42:53 CST 2012
   Action:
 row: r
 column: f3:q
 timestamp: Thu Jan 01 08:02:03 CST 1970
 {quote}
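The proposal separates two clocks: the time the edit was written to the WAL (the "real op time") and the cell's own timestamp, which a client may set to any value (here a tiny one, hence the 1970 date). A minimal sketch of printing both, assuming a hypothetical `WalEntry` holding the write time in milliseconds; in HBase the write time would come from the WAL entry's key, but that mapping is an assumption here. The sketch formats in UTC with a fixed pattern so its output does not depend on the machine's time zone, whereas the output quoted above matches `java.util.Date.toString()` in the local zone.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class WalEntryPrinter {
  // Hypothetical, trimmed-down WAL entry: one cell plus the time the edit was logged.
  static final class WalEntry {
    final long sequence;
    final String region;
    final String table;
    final long writeTimeMs;     // when the edit was appended to the WAL
    final String row;
    final String column;
    final long cellTimestampMs; // the cell's own, possibly client-chosen, timestamp

    WalEntry(long sequence, String region, String table, long writeTimeMs,
             String row, String column, long cellTimestampMs) {
      this.sequence = sequence;
      this.region = region;
      this.table = table;
      this.writeTimeMs = writeTimeMs;
      this.row = row;
      this.column = column;
      this.cellTimestampMs = cellTimestampMs;
    }
  }

  // UTC and a fixed pattern keep the output independent of the local time zone.
  private static String formatTime(long ms) {
    SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.US);
    f.setTimeZone(TimeZone.getTimeZone("UTC"));
    return f.format(new Date(ms)) + " UTC";
  }

  static String format(WalEntry e) {
    return "Sequence " + e.sequence + " from region " + e.region + " in table " + e.table
        + " at time: " + formatTime(e.writeTimeMs) + "\n"
        + "  Action:\n"
        + "    row: " + e.row + "\n"
        + "    column: " + e.column + "\n"
        + "    timestamp: " + formatTime(e.cellTimestampMs);
  }
}
```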


[jira] [Commented] (HBASE-5599) The hbck tool can not fix the six scenarios, it is NO_VERSION_FILE, NOT_IN_META_OR_DEPLOYED, NOT_IN_META, SHOULD_NOT_BE_DEPLOYED, FIRST_REGION_STARTKEY_NOT_EMPTY, HOLE_

2012-04-03 Thread Jonathan Hsieh (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245688#comment-13245688
]

Jonathan Hsieh commented on HBASE-5599:
---

Thanks for the updates, the new patch looks pretty good. A few things to
follow up on before commit:

- If adding a test case is hard, let's file a new issue to add an HBCK test case
for the SHOULD_NOT_BE_DEPLOYED case.

- Change the text here to be "Try to fix missing hbase.version file in hdfs."
{code}
System.err.println(" -fixVersionFile Try to fix the hbase.version missing.");
{code}

- I noticed the Merge.java changes are still present in the newer patch. Is
that intended? They don't seem necessary or related to this issue.

This should be good to commit for 0.90, if you address these issues.

Could you provide an updated version for trunk?

Thanks!


[jira] [Commented] (HBASE-5706) Dropping fs latency stats since buffer is full spam


[ 
https://issues.apache.org/jira/browse/HBASE-5706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245692#comment-13245692
 ] 

stack commented on HBASE-5706:
--

Assigning Shaneal so he'll at least take a look at it. Make a recommendation,
boss, and one of us can fix it. Good on you.

 Dropping fs latency stats since buffer is full spam
 -

 Key: HBASE-5706
 URL: https://issues.apache.org/jira/browse/HBASE-5706
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Shaneal Manek
Priority: Minor
 Fix For: 0.94.0, 0.96.0


 I see tons of this while running tests (note that it's a WARN):
 {noformat}
 2012-04-03 18:54:47,172 WARN org.apache.hadoop.hbase.io.hfile.HFile: Dropping 
 fs latency stats since buffer is full
 {noformat}
 While the code says this:
 {noformat}
   // we don't want to fill up the logs with this message, so only log it 
   // once every 30 seconds at most
   // I also want to avoid locks on the 'critical path' (the common case will be
   // uncontended) - hence the CAS
   private static void logDroppedLatencyStat() {
 {noformat}
 It doesn't seem like this message is actionable, and even though it's printed 
 only every 30 seconds it's still very spammy.
 We should get rid of it or make it more useful (I don't know which).
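The quoted comment describes a pattern: log at most once per interval, and use a compare-and-set instead of a lock. The sketch below shows that pattern roughly. It is not the actual HFile code. The class name, the injected clock and tryAcquire() are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

public class RateLimitedWarning {
    private static final long INTERVAL_MS = 30_000L; // "once every 30 seconds at most"
    private static final long NEVER = Long.MIN_VALUE;  // sentinel: nothing logged yet

    private final AtomicLong lastLoggedMs = new AtomicLong(NEVER);
    private final LongSupplier clock; // injectable so the gate can be tested without sleeping

    RateLimitedWarning(LongSupplier clock) {
        this.clock = clock;
    }

    /** Returns true if this caller should emit the warning now; never blocks. */
    boolean tryAcquire() {
        long now = clock.getAsLong();
        long last = lastLoggedMs.get();
        if (last != NEVER && now - last < INTERVAL_MS) {
            return false; // common case: one volatile read, no lock, no write
        }
        // Several threads may get here at once; only the one whose CAS succeeds logs.
        return lastLoggedMs.compareAndSet(last, now);
    }

    public static void main(String[] args) {
        RateLimitedWarning gate = new RateLimitedWarning(System::currentTimeMillis);
        for (int i = 0; i < 1_000; i++) {
            if (gate.tryAcquire()) {
                System.out.println("WARN Dropping fs latency stats since buffer is full");
            }
        }
    }
}
```

The gate limits how often the line is printed, but it does not make the message any more actionable. Throttling alone does not settle the question raised above.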

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (HBASE-5706) Dropping fs latency stats since buffer is full spam

2012-04-03 Thread stack (Assigned) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack reassigned HBASE-5706:


Assignee: Shaneal Manek


[jira] [Commented] (HBASE-4821) A fully automated comprehensive distributed integration test for HBase

[
https://issues.apache.org/jira/browse/HBASE-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245705#comment-13245705
]

Keith Turner commented on HBASE-4821:
-

I noticed an early comment about the Python code in the Accumulo test dir. This
code lives in test/auto, and we call these functional tests. They are probably
similar to some HBase unit tests. They support tests that run against a live
instance of Accumulo: the test framework starts an Accumulo instance, runs a
Python or Java test against that instance, and then shuts the instance down.
Running all of the functional tests takes 1 to 2 hours.

This test framework was written before random walk, and it ensures that basic
functionality works. For example, there's a test to verify that adding split
points to a table continues to work. Since we implemented random walk, I have
found myself writing a lot more random walk tests and fewer functional tests.
The reason for this is that a functional test usually tests a feature while the
system is in one state, whereas a random walk test exercises the same feature
with the system in many different states. For example, a random walk test that
adds split points to a table will try to do that when the table and system are
in many different states. It may try to add the split while a tablet/region is
migrating, splitting, minor compacting, major compacting, offline, etc. So the
likelihood of finding a bug in addsplits() with random walk is much greater
than with the functional test. The functional test will detect whether the
feature is completely broken; random walk can detect whether the feature is
broken under certain circumstances.
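Here is a minimal random walk sketch in Java that makes the distinction concrete. ToyTable, the State enum and walk() are hypothetical. The point is the structure. A seeded walk alternates random state transitions with calls to the feature under test. It checks an invariant after each call, so the feature gets exercised in every state the walk visits.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Random;
import java.util.TreeSet;

public class RandomWalkSketch {
    // States named in the comment above; the "system" below is a toy, not Accumulo or HBase.
    enum State { ONLINE, MIGRATING, SPLITTING, MINOR_COMPACTING, MAJOR_COMPACTING, OFFLINE }

    // Toy system under test: a table that records split points.
    static final class ToyTable {
        State state = State.ONLINE;
        final TreeSet<String> splits = new TreeSet<>();

        void addSplit(String key) {
            splits.add(key); // the feature under test; a real bug might drop keys in some state
        }
    }

    /** Runs a seeded random walk; returns how many addSplit calls ran in each state. */
    static Map<State, Integer> walk(long seed, int steps) {
        Random rnd = new Random(seed); // same seed, same walk, so failures can be replayed
        ToyTable table = new ToyTable();
        TreeSet<String> expected = new TreeSet<>(); // the test's own model of the correct result
        Map<State, Integer> coverage = new EnumMap<>(State.class);
        State[] states = State.values();
        for (int i = 0; i < steps; i++) {
            if (rnd.nextBoolean()) {
                // Random transition: put the system into some (possibly unusual) state.
                table.state = states[rnd.nextInt(states.length)];
            } else {
                // Exercise the feature in whatever state the walk has reached.
                String key = "row" + rnd.nextInt(1_000);
                table.addSplit(key);
                expected.add(key);
                coverage.merge(table.state, 1, Integer::sum);
                if (!table.splits.equals(expected)) {
                    throw new IllegalStateException(
                        "split points diverged in state " + table.state + " at step " + i);
                }
            }
        }
        return coverage;
    }

    public static void main(String[] args) {
        System.out.println(walk(42L, 10_000));
    }
}
```

Because the walk is seeded, a failing run can be replayed exactly. The returned map shows how often addSplit ran in each state, which is a cheap way to confirm that the walk actually covers the states you care about.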

A fully automated comprehensive distributed integration test for HBase
--

Key: HBASE-4821
URL: https://issues.apache.org/jira/browse/HBASE-4821
Project: HBase
Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Critical

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

2012-04-03 Thread Matteo Bertozzi (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245720#comment-13245720
 ] 

Matteo Bertozzi commented on HBASE-5666:


Still looking at the 0.90 code...
The new ZooKeeperWatcher (>= 0.92) calls ZKUtil.createAndFailSilent() to
create the base node and the others only if it is called by the HMaster
(canCreateBaseZNode = true), whereas before the code path was the same for
everyone.

So now, if the HMaster has not yet reached the point where it creates the base
node by the time the HRegionServer checks whether the base node exists, the
region server crashes.

If we want to keep the previous logic (the first one that arrives creates the
base node and the others), we can remove the canCreateBaseZNode flag; otherwise
we can use HBASE-5666-v4.patch to wait and retry on checkExists().

What do you think?
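The first option, where whoever arrives first creates the base node, works only if creation is idempotent. The winner creates the znode, and everyone else treats "already exists" as success. Below is a minimal sketch in which a ConcurrentHashMap stands in for ZooKeeper. FakeZk is hypothetical. createAndFailSilent() here only borrows the name of the ZKUtil method mentioned above and is not its real signature.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class CreateAndFailSilentSketch {
    // Stand-in for the ZooKeeper namespace: znode path -> data. Hypothetical, not the ZK client API.
    static final class FakeZk {
        private final ConcurrentMap<String, byte[]> nodes = new ConcurrentHashMap<>();

        /** Atomic create; returns false when the node already exists (like a NodeExists error). */
        boolean create(String path, byte[] data) {
            return nodes.putIfAbsent(path, data) == null;
        }

        boolean exists(String path) {
            return nodes.containsKey(path);
        }
    }

    /**
     * Creates the node if absent and silently ignores "already exists", so any process
     * (master or region server) can call it at startup no matter who arrives first.
     * Returns true only for the caller that actually created the node.
     */
    static boolean createAndFailSilent(FakeZk zk, String path) {
        return zk.create(path, new byte[0]);
    }

    public static void main(String[] args) throws InterruptedException {
        FakeZk zk = new FakeZk();
        Thread master = new Thread(() -> createAndFailSilent(zk, "/hbase"));
        Thread regionServer = new Thread(() -> createAndFailSilent(zk, "/hbase"));
        master.start();
        regionServer.start();
        master.join();
        regionServer.join();
        System.out.println(zk.exists("/hbase")); // true, whichever thread won the race
    }
}
```

The toy skips details that real ZooKeeper imposes, such as requiring parent znodes to exist. It only illustrates why the ordering between master and region server stops mattering once creation is idempotent.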

 RegionServer doesn't retry to check if base node is available
 -

 Key: HBASE-5666
 URL: https://issues.apache.org/jira/browse/HBASE-5666
 Project: HBase
  Issue Type: Bug
  Components: regionserver, zookeeper
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, 
 HBASE-5666-v3.patch, HBASE-5666-v4.patch, hbase-1-regionserver.log, 
 hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, 
 hbase-regionserver.log, hbase-zookeeper.log


 I have a script that starts HBase and a couple of region servers in distributed 
 mode (hbase.cluster.distributed = true)
 {code}
 $HBASE_HOME/bin/start-hbase.sh
 $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
 {code}
 but the region servers are not able to start.
 It seems that during RS startup the base znode is not yet available, and
 HRegionServer.initializeZooKeeper() checks only once whether the base node is
 available.
 {code}
 2012-03-28 21:54:05,013 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value 
 configured in 'zookeeper.znode.parent'. There could be a mismatch with the 
 one configured in the master.
 2012-03-28 21:54:08,598 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 localhost,60202,133296824: Initialization of RS failed.  Hence aborting 
 RS.
 java.io.IOException: Received the shutdown message while waiting.
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
   at java.lang.Thread.run(Thread.java:662)
 {code}


[jira] [Commented] (HBASE-5599) The hbck tool can not fix the six scenarios, it is NO_VERSION_FILE, NOT_IN_META_OR_DEPLOYED, NOT_IN_META, SHOULD_NOT_BE_DEPLOYED, FIRST_REGION_STARTKEY_NOT_EMPTY, HOLE_

[
https://issues.apache.org/jira/browse/HBASE-5599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245725#comment-13245725
]

Hadoop QA commented on HBASE-5599:
--

-1 overall. Here are the results of testing the latest attachment

http://issues.apache.org/jira/secure/attachment/12520573/hbase-5599-0.90_v6.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 6 new or modified tests.

-1 patch. The patch command could not apply the patch.

Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/1378//console

This message is automatically generated.


[jira] [Created] (HBASE-5707) Move clusterid and clusterup (shutdown) znodes over to pb

2012-04-03 Thread stack (Created) (JIRA)

Move clusterid and clusterup (shutdown) znodes over to pb
-

 Key: HBASE-5707
 URL: https://issues.apache.org/jira/browse/HBASE-5707
 Project: HBase
  Issue Type: Task
Reporter: stack





[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available


[ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245734#comment-13245734
 ] 

Zhihong Yu commented on HBASE-5666:
---

Patch v4 looks good. Minor comments below:

Since a timeout of 0 is treated specially in this method:
{code}
+  public static int checkExists(ZooKeeperWatcher zkw, String znode, int timeout)
{code}
the javadoc should mention the special value of 0.
{code}
+   * @return true if baseznode exists.
+   * false if doesnot exists.
+   */
+  public boolean checkIfBaseNodeAvailable(int timeout) {
{code}
The false return doesn't have to be mentioned. If you want to keep it, it 
should read 'if baseznode does not exist'.
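The sketch below addresses both javadoc comments, using a self-contained example rather than the real ZKUtil code. It assumes that a timeout of 0 means "check once, do not retry"; the actual semantics in patch v4 are not quoted here. The BooleanSupplier probe and the millisecond parameters are hypothetical.

```java
import java.util.function.BooleanSupplier;

public class BaseNodeWaitSketch {
    /**
     * Waits for the base znode, re-checking until it appears or the timeout elapses.
     *
     * @param exists    existence probe; a hypothetical stand-in for a ZooKeeper exists() call
     * @param timeoutMs how long to keep retrying, in milliseconds. A value of 0 is special:
     *                  the node is checked exactly once and the method returns immediately.
     * @param pollMs    pause between checks, in milliseconds
     * @return true if the base znode exists
     */
    static boolean checkIfBaseNodeAvailable(BooleanSupplier exists, long timeoutMs, long pollMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            if (exists.getAsBoolean()) {
                return true;
            }
            long remaining = deadline - System.currentTimeMillis();
            if (timeoutMs == 0 || remaining <= 0) {
                return false; // single check requested, or out of time
            }
            try {
                Thread.sleep(Math.min(pollMs, remaining));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Simulate the master creating the base node about 200 ms after the region server starts.
        boolean ok = checkIfBaseNodeAvailable(() -> System.currentTimeMillis() - start > 200, 5_000, 50);
        System.out.println("base node available: " + ok);
    }
}
```

The @return line names only the true case, and the special meaning of 0 is stated where the parameter is documented. Both are the changes requested above.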


[jira] [Updated] (HBASE-5707) Move clusterid and clusterup (shutdown) znodes over to pb


 [ 
https://issues.apache.org/jira/browse/HBASE-5707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5707:
-

Attachment: 5707.txt

{code}
M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
  Get the clusterid via utility in the ClusterId class rather than
  get it directly.
M src/main/java/org/apache/hadoop/hbase/zookeeper/ClusterId.java
M src/main/java/org/apache/hadoop/hbase/zookeeper/ClusterStatusTracker.java
  Serialize and deserialize using new ClusterId pb message.
  Include the PBUF preamble on znode content.
M src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java
  Rename getRootRegionZNodeContent to getZNodeData so it aligns in name
  with the other cases where we compose the znode content up in ClusterId
  and in ClusterStatusTracker.
M src/main/protobuf/ZooKeeper.proto
  Add new messages for ClusterId and ClusterUp
{code}
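Here is a minimal sketch of the "PBUF preamble" idea. It assumes the preamble is the four ASCII bytes P, B, U, F written ahead of the serialized message. The helper names are hypothetical. A plain byte[] stands in for the output of a generated message's toByteArray(), so the example runs without protobuf on the classpath.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PbMagicSketch {
    // Four-byte marker written ahead of the serialized message (assumed to be ASCII "PBUF").
    static final byte[] PB_MAGIC = "PBUF".getBytes(StandardCharsets.US_ASCII);

    /** Prefixes serialized protobuf bytes (e.g. message.toByteArray()) with the marker. */
    static byte[] prependMagic(byte[] pbBytes) {
        byte[] out = new byte[PB_MAGIC.length + pbBytes.length];
        System.arraycopy(PB_MAGIC, 0, out, 0, PB_MAGIC.length);
        System.arraycopy(pbBytes, 0, out, PB_MAGIC.length, pbBytes.length);
        return out;
    }

    /** True if znode data starts with the marker, i.e. it was written in the pb format. */
    static boolean hasMagic(byte[] data) {
        if (data == null || data.length < PB_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < PB_MAGIC.length; i++) {
            if (data[i] != PB_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Strips the marker so the remainder can be handed to the generated parseFrom(). */
    static byte[] stripMagic(byte[] data) {
        if (!hasMagic(data)) {
            throw new IllegalArgumentException("missing PBUF preamble");
        }
        return Arrays.copyOfRange(data, PB_MAGIC.length, data.length);
    }

    public static void main(String[] args) {
        // Bytes of a message whose field 1 is the string "abc" (tag 0x0a, length 3).
        byte[] znode = prependMagic(new byte[] {0x0a, 0x03, 'a', 'b', 'c'});
        System.out.println(hasMagic(znode) + " " + stripMagic(znode).length); // true 5
    }
}
```

Checking the prefix lets a reader tell pb-encoded znode content apart from content written in an older format before it tries to parse the rest.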

 Move clusterid and clusterup (shutdown) znodes over to pb
 -

 Key: HBASE-5707
 URL: https://issues.apache.org/jira/browse/HBASE-5707
 Project: HBase
  Issue Type: Task
Reporter: stack
 Attachments: 5707.txt





[jira] [Updated] (HBASE-5707) Move clusterid and clusterup (shutdown) znodes over to pb

2012-04-03 Thread jirapos...@reviews.apache.org (Commented) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5707:
-

Status: Patch Available  (was: Open)


[jira] [Commented] (HBASE-5707) Move clusterid and clusterup (shutdown) znodes over to pb


[ 
https://issues.apache.org/jira/browse/HBASE-5707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245737#comment-13245737
 ] 

jirapos...@reviews.apache.org commented on HBASE-5707:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4627/
---

Review request for hbase.


Summary
---

Move two more znodes to serialize their content as pb.

Here are the changes:

{code}
M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
  Get the clusterid via utility in the ClusterId class rather than
  get it directly.
M src/main/java/org/apache/hadoop/hbase/zookeeper/ClusterId.java
M src/main/java/org/apache/hadoop/hbase/zookeeper/ClusterStatusTracker.java
  Serialize and deserialize using new ClusterId pb message.
  Include the PBUF preamble on znode content.
M src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java
  Rename getRootRegionZNodeContent to getZNodeData so it aligns in name
  with the other cases where we compose the znode content up in ClusterId
  and in ClusterStatusTracker.
M src/main/protobuf/ZooKeeper.proto
  Add new messages for ClusterId and ClusterUp
{code}


This addresses bug hbase-5707.
https://issues.apache.org/jira/browse/hbase-5707


Diffs
-

  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java aa30969 
  src/main/java/org/apache/hadoop/hbase/protobuf/generated/ZooKeeperProtos.java 
8ff87fe 
  src/main/java/org/apache/hadoop/hbase/zookeeper/ClusterId.java 3fa83e6 
  src/main/java/org/apache/hadoop/hbase/zookeeper/ClusterStatusTracker.java 
61e7367 
  src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java 
6b2ea57 
  src/main/protobuf/ZooKeeper.proto 20f8eb0 

Diff: https://reviews.apache.org/r/4627/diff


Testing
---


Thanks,

Michael




[jira] [Commented] (HBASE-5688) Convert zk root-region-server znode content to pb


[ 
https://issues.apache.org/jira/browse/HBASE-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245746#comment-13245746
 ] 

Hudson commented on HBASE-5688:
---

Integrated in HBase-TRUNK #2706 (See 
[https://builds.apache.org/job/HBase-TRUNK/2706/])
HBASE-5688 Convert zk root-region-server znode content to pb; DELETE RLE... 
causing build to fail because of RAT warnings (Revision 1308697)
HBase-5688 Convert zk root-region-server znode content to pb (Revision 1308681)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/RootLocationEditor.java


 Convert zk root-region-server znode content to pb
 -

 Key: HBASE-5688
 URL: https://issues.apache.org/jira/browse/HBASE-5688
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
 Fix For: 0.96.0

 Attachments: 5688.addendum.txt, 5688.txt, 5688v4.txt, 5688v5.txt


 Move the root-region-server znode content from the versioned bytes that 
 ServerName.getVersionedBytes outputs to instead be pb.


[jira] [Commented] (HBASE-5451) Switch RPC call envelope/headers to PBs


[ 
https://issues.apache.org/jira/browse/HBASE-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245749#comment-13245749
 ] 

Hudson commented on HBASE-5451:
---

Integrated in HBase-TRUNK #2706 (See 
[https://builds.apache.org/job/HBase-TRUNK/2706/])
HBASE-5451 Switch RPC call envelope/headers to PBs (Revision 1309019)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/DataOutputOutputStream.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/generated/RPCProtos.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/security/User.java
* /hbase/trunk/src/main/protobuf/RPC.proto


 Switch RPC call envelope/headers to PBs
 ---

 Key: HBASE-5451
 URL: https://issues.apache.org/jira/browse/HBASE-5451
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Affects Versions: 0.94.0
Reporter: Todd Lipcon
Assignee: Devaraj Das
 Fix For: 0.96.0

 Attachments: 5305v7.txt, 5305v7.txt, rpc-proto.2.txt, 
 rpc-proto.3.txt, rpc-proto.patch.1_2, rpc-proto.r5.txt





[jira] [Commented] (HBASE-5701) Put RegionServerDynamicStatistics under RegionServer in MBean hierarchy rather than have it as a peer.


[ 
https://issues.apache.org/jira/browse/HBASE-5701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245747#comment-13245747
 ] 

Hudson commented on HBASE-5701:
---

Integrated in HBase-TRUNK #2706 (See 
[https://builds.apache.org/job/HBase-TRUNK/2706/])
HBASE-5701 Put RegionServerDynamicStatistics under RegionServer in MBean 
hierarchy rather than have it as a peer (Revision 1308971)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerDynamicStatistics.java


 Put RegionServerDynamicStatistics under RegionServer in MBean hierarchy 
 rather than have it as a peer.
 --

 Key: HBASE-5701
 URL: https://issues.apache.org/jira/browse/HBASE-5701
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Fix For: 0.94.0

 Attachments: 5701.txt, 5701.txt, screenshot-1.jpg, screenshot-2.jpg





[jira] [Commented] (HBASE-4398) If HRegionPartitioner is used in MapReduce, client side configurations are overwritten by hbase-site.xml.


[ 
https://issues.apache.org/jira/browse/HBASE-4398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245750#comment-13245750
 ] 

Hudson commented on HBASE-4398:
---

Integrated in HBase-TRUNK #2706 (See 
[https://builds.apache.org/job/HBase-TRUNK/2706/])
HBASE-5704 HBASE-4398 mistakenly rolled back on trunk (Revision 1308964)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/HRegionPartitioner.java


 If HRegionPartitioner is used in MapReduce, client side configurations are 
 overwritten by hbase-site.xml.
 -

 Key: HBASE-4398
 URL: https://issues.apache.org/jira/browse/HBASE-4398
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.90.4
Reporter: Takuya Ueshin
Assignee: Takuya Ueshin
 Fix For: 0.92.2, 0.94.0

 Attachments: HBASE-4398.patch


 If HRegionPartitioner is used in MapReduce, client side configurations are 
 overwritten by hbase-site.xml.
 We can reproduce the problem with the following steps:
 {noformat}
 - Add HRegionPartitioner.class to the 4th argument of
 TableMapReduceUtil#initTableReducerJob()
 at around line 133
 in src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableMapReduce.java
 - Change or remove the hbase.zookeeper.property.clientPort property
 in hbase-site.xml (for example, change it to 12345).
 - Run testMultiRegionTable()
 {noformat}
 Then I got the following error messages:
 {noformat}
 2011-09-12 22:28:51,020 DEBUG [Thread-832] zookeeper.ZKUtil(93): hconnection 
 opening connection to ZooKeeper with ensemble (localhost:12345)
 2011-09-12 22:28:51,022 INFO  [Thread-832] 
 zookeeper.RecoverableZooKeeper(89): The identifier of this process is 
 43200@imac.local
 2011-09-12 22:28:51,123 WARN  [Thread-832] 
 zookeeper.RecoverableZooKeeper(161): Possibly transient ZooKeeper exception: 
 org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
 = ConnectionLoss for /hbase/master
 2011-09-12 22:28:51,123 INFO  [Thread-832] 
 zookeeper.RecoverableZooKeeper(173): The 1 times to retry ZooKeeper after 
 sleeping 1000 ms
  =
 2011-09-12 22:29:02,418 ERROR [Thread-832] mapreduce.HRegionPartitioner(125): 
 java.io.IOException: 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2e54e48d
  closed
 2011-09-12 22:29:02,422 WARN  [Thread-832] mapred.LocalJobRunner$Job(256): 
 job_local_0001
 java.lang.NullPointerException
at 
 org.apache.hadoop.hbase.mapreduce.HRegionPartitioner.setConf(HRegionPartitioner.java:128)
at 
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.init(MapTask.java:527)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
 {noformat}
 I think HTable should connect to ZooKeeper at port 21818, configured on the
 client side, instead of 12345 from hbase-site.xml. It might be caused by
 HBaseConfiguration.addHbaseResources(conf); in
 HRegionPartitioner#setConf(Configuration).
 This might also mean that every client-side configuration that is also set
 in hbase-site.xml gets overwritten because of this problem.
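Whatever the exact mechanism inside HBaseConfiguration.addHbaseResources(), the symptom comes down to merge order. The minimal sketch below uses plain maps as stand-ins for Hadoop's Configuration, and its method names are hypothetical. Whichever source is applied last wins.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfPrecedenceSketch {
    /** Problematic order: site-file values are applied on top of the client's settings. */
    static Map<String, String> siteOverClient(Map<String, String> client, Map<String, String> site) {
        Map<String, String> merged = new HashMap<>(client);
        merged.putAll(site); // hbase-site.xml wins, clobbering client values
        return merged;
    }

    /** Intended order: start from the site file, then re-apply the client's explicit settings. */
    static Map<String, String> clientOverSite(Map<String, String> client, Map<String, String> site) {
        Map<String, String> merged = new HashMap<>(site);
        merged.putAll(client); // client-side values win; site-only keys are still present
        return merged;
    }

    public static void main(String[] args) {
        String key = "hbase.zookeeper.property.clientPort";
        Map<String, String> client = new HashMap<>();
        client.put(key, "21818");
        Map<String, String> site = new HashMap<>();
        site.put(key, "12345");
        System.out.println(siteOverClient(client, site).get(key)); // 12345
        System.out.println(clientOverSite(client, site).get(key)); // 21818
    }
}
```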


[jira] [Commented] (HBASE-5704) HBASE-4398 mistakenly rolled back on trunk


[ 
https://issues.apache.org/jira/browse/HBASE-5704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245748#comment-13245748
 ] 

Hudson commented on HBASE-5704:
---

Integrated in HBase-TRUNK #2706 (See 
[https://builds.apache.org/job/HBase-TRUNK/2706/])
HBASE-5704 HBASE-4398 mistakenly rolled back on trunk (Revision 1308964)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/HRegionPartitioner.java


 HBASE-4398 mistakenly rolled back on trunk
 --

 Key: HBASE-5704
 URL: https://issues.apache.org/jira/browse/HBASE-5704
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Fix For: 0.96.0

 Attachments: 5704.txt





[jira] [Commented] (HBASE-3134) [replication] Add the ability to enable/disable streams


[ 
https://issues.apache.org/jira/browse/HBASE-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245745#comment-13245745
 ] 

Hudson commented on HBASE-3134:
---

Integrated in HBase-TRUNK #2706 (See 
[https://builds.apache.org/job/HBase-TRUNK/2706/])
HBASE-3134 [replication] Add the ability to enable/disable streams 
(Teruyoshi Zenmyo) (Revision 1308675)

 Result = FAILURE
larsh : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/replication/ReplicationAdmin.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/ReplicationPeer.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceInterface.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
* /hbase/trunk/src/main/ruby/hbase/replication_admin.rb
* /hbase/trunk/src/main/ruby/shell/commands/disable_peer.rb
* /hbase/trunk/src/main/ruby/shell/commands/enable_peer.rb
* /hbase/trunk/src/main/ruby/shell/commands/list_peers.rb
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/replication/ReplicationSourceDummy.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/replication/TestReplication.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSourceManager.java


 [replication] Add the ability to enable/disable streams
 ---

 Key: HBASE-3134
 URL: https://issues.apache.org/jira/browse/HBASE-3134
 Project: HBase
  Issue Type: New Feature
  Components: replication
Reporter: Jean-Daniel Cryans
Assignee: Teruyoshi Zenmyo
Priority: Minor
  Labels: replication
 Fix For: 0.94.0

 Attachments: 3134-v2.txt, 3134-v3.txt, 3134-v4.txt, 3134.txt, 
 HBASE-3134.patch, HBASE-3134.patch, HBASE-3134.patch, HBASE-3134.patch


 This jira was initially in the scope of HBASE-2201, but was pushed out since 
 it has low value compared to the required effort (and we want to ship 
 0.90.0 rather soonish).
 We need to design a way to enable/disable replication streams in a 
 determinate fashion.
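The sketch below shows what a per-stream enable/disable switch could look like. Peer, Source and shipOnce() are hypothetical toys, loosely named after the ReplicationPeer and ReplicationSource classes touched by the commit above. The sketch assumes that a disabled peer keeps its backlog rather than dropping edits, so re-enabling it resumes without loss.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class PeerToggleSketch {
    // Per-peer switch, flipped by something like the disable_peer / enable_peer shell commands.
    static final class Peer {
        final AtomicBoolean enabled = new AtomicBoolean(true);
        final List<String> received = new ArrayList<>();
    }

    // Toy replication source: queues edits and ships them only while its peer is enabled.
    static final class Source {
        private final Peer peer;
        private final Deque<String> queue = new ArrayDeque<>();

        Source(Peer peer) {
            this.peer = peer;
        }

        void append(String edit) {
            queue.addLast(edit);
        }

        /** Ships everything queued if the peer is enabled; returns the number of edits shipped. */
        int shipOnce() {
            if (!peer.enabled.get()) {
                return 0; // disabled: keep the backlog, ship nothing
            }
            int shipped = 0;
            while (!queue.isEmpty()) {
                peer.received.add(queue.pollFirst());
                shipped++;
            }
            return shipped;
        }
    }

    public static void main(String[] args) {
        Peer peer = new Peer();
        Source source = new Source(peer);
        peer.enabled.set(false);               // e.g. after disable_peer
        source.append("edit-1");
        source.append("edit-2");
        System.out.println(source.shipOnce()); // 0: backlog kept while disabled
        peer.enabled.set(true);                // e.g. after enable_peer
        System.out.println(source.shipOnce()); // 2
        System.out.println(peer.received);     // [edit-1, edit-2]
    }
}
```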


[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

2012-04-03 Thread Matteo Bertozzi (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245758#comment-13245758
 ] 

Matteo Bertozzi commented on HBASE-5666:


The problem that I see with patch v4 and the if (timeout == 0) special case
is that exists() is different for ZooKeeper and RecoverableZooKeeper.

RecoverableZooKeeper has some internal retry logic for CONNECTIONLOSS,
SESSIONEXPIRED, and OPERATIONTIMEOUT. To keep the code simple, we can add this
logic to ZKUtil.checkExists(); that way we can remove the special case and
remove the code in RecoverableZooKeeper.
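The sketch below shows what moving that retry logic into a checkExists()-style helper could look like. The Code enum mirrors the names of the ZooKeeper KeeperException.Code constants mentioned above. It is a local stand-in, so the example runs without a ZooKeeper dependency. ZkError, ZkCall and withRetries() are hypothetical.

```java
import java.util.EnumSet;
import java.util.Set;

public class TransientRetrySketch {
    // Local mirror of a few ZooKeeper error codes; NONODE is included as a non-retryable example.
    enum Code { CONNECTIONLOSS, SESSIONEXPIRED, OPERATIONTIMEOUT, NONODE }

    static final class ZkError extends Exception {
        final Code code;

        ZkError(Code code) {
            super(code.name());
            this.code = code;
        }
    }

    // The codes the comment says RecoverableZooKeeper currently retries internally.
    static final Set<Code> RETRYABLE =
        EnumSet.of(Code.CONNECTIONLOSS, Code.SESSIONEXPIRED, Code.OPERATIONTIMEOUT);

    interface ZkCall<T> {
        T call() throws ZkError;
    }

    /** Runs the call, retrying only retryable codes, for at most maxAttempts attempts in total. */
    static <T> T withRetries(ZkCall<T> op, int maxAttempts) throws ZkError {
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (ZkError e) {
                if (!RETRYABLE.contains(e.code) || attempt >= maxAttempts) {
                    throw e; // permanent error, or out of attempts
                }
                // A real implementation would back off here before retrying.
            }
        }
    }

    public static void main(String[] args) throws ZkError {
        int[] attempts = {0};
        // Fails twice with CONNECTIONLOSS, then reports that the node exists.
        Boolean exists = withRetries(() -> {
            if (++attempts[0] < 3) throw new ZkError(Code.CONNECTIONLOSS);
            return true;
        }, 5);
        System.out.println(exists + " after " + attempts[0] + " attempts"); // true after 3 attempts
    }
}
```

With the classification in one place, a caller such as checkExists() gets the same retry behaviour no matter which client object sits underneath. That is the simplification proposed above.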


[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

2012-04-03 Thread Matteo Bertozzi (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245759#comment-13245759
 ] 

Matteo Bertozzi commented on HBASE-5666:


The problem I see with patch v4 and the if (timeout == 0) special case is 
that exists() behaves differently for ZooKeeper and RecoverableZooKeeper.

RecoverableZooKeeper has some internal retry logic for CONNECTIONLOSS, 
SESSIONEXPIRED, and OPERATIONTIMEOUT. To keep the code simple, we can add this 
logic to ZKUtil.checkExists(); that way we can remove the special case and 
remove the corresponding code in RecoverableZooKeeper.
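A sketch of what "retry only on the recoverable codes in one helper" could look 
like. ZkError and ZkOpException are hypothetical stand-ins that only mirror the 
code names in the comment; the sketch does not use HBase or ZooKeeper classes, 
and withRetries is not the ZKUtil.checkExists() signature.
{code:java}
import java.util.EnumSet;
import java.util.Set;
import java.util.concurrent.Callable;

// Hypothetical stand-in for ZooKeeper error codes (names only).
enum ZkError { CONNECTIONLOSS, SESSIONEXPIRED, OPERATIONTIMEOUT, NONODE }

// Hypothetical exception carrying one of the codes above.
class ZkOpException extends Exception {
    final ZkError code;
    ZkOpException(ZkError code) { super(code.name()); this.code = code; }
}

final class RetryingCheck {
    // The codes the comment lists as worth retrying.
    static final Set<ZkError> RETRYABLE =
        EnumSet.of(ZkError.CONNECTIONLOSS, ZkError.SESSIONEXPIRED, ZkError.OPERATIONTIMEOUT);

    private RetryingCheck() {}

    // Runs op, retrying up to maxAttempts times, but only on retryable codes.
    static <T> T withRetries(Callable<T> op, int maxAttempts) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (ZkOpException e) {
                if (!RETRYABLE.contains(e.code) || attempt >= maxAttempts) {
                    throw e;  // not recoverable, or out of attempts
                }
                // a real implementation would back off here before retrying
            }
        }
    }
}
{code}
With retries centralized like this, callers no longer need their own 
timeout == 0 branch.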

[jira] [Commented] (HBASE-5606) SplitLogManger async delete node hangs log splitting when ZK connection is lost

2012-04-03 Thread Jimmy Xiang (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245763#comment-13245763
 ] 

Jimmy Xiang commented on HBASE-5606:


It is OK with me. Hopefully there are no other such places.

 SplitLogManger async delete node hangs log splitting when ZK connection is 
 lost 
 

 Key: HBASE-5606
 URL: https://issues.apache.org/jira/browse/HBASE-5606
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0
Reporter: Gopinathan A
Assignee: Prakash Khemani
Priority: Critical
 Fix For: 0.92.2

 Attachments: 
 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch, 
 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch


 1. One RS died; the ServerShutdownHandler detected it and started 
 distributed log splitting.
 2. All tasks failed because the ZK connection was lost, so all the tasks were 
 deleted asynchronously.
 3. ServerShutdownHandler retried the log splitting.
 4. The asynchronous deletion from step 2 finally executed against the new task.
 5. This left the SplitLogManager in a hanging state.
 As a result, the .META. region was not assigned for a long time.
 {noformat}
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(55413,79):2012-03-14 
 19:28:47,932 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up 
 splitlog task at znode 
 /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(89303,79):2012-03-14 
 19:34:32,387 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up 
 splitlog task at znode 
 /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 {noformat}
 {noformat}
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(80417,99):2012-03-14 
 19:34:31,196 DEBUG 
 org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted 
 /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(89456,99):2012-03-14 
 19:34:32,497 DEBUG 
 org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted 
 /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 {noformat}
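 A toy, single-threaded model of steps 2 to 4 and the two "deleted" log lines: 
 a stale delete for the old task lands after the task has been re-created at 
 the same path. ToyStore, deleteIfId, SplitLogRace and the path are 
 hypothetical. ZooKeeper's own conditional delete(path, version) compares the 
 node's data version, not a creation id, so this only illustrates why an 
 unconditional delete of a reused path is unsafe; it is not the patch's fix.
 {code:java}
import java.util.HashMap;
import java.util.Map;

// Toy in-memory "znode" store; NOT ZooKeeper. Each create gets a fresh id.
class ToyStore {
    private final Map<String, Long> nodes = new HashMap<>();
    private long nextId = 1;

    long create(String path) { long id = nextId++; nodes.put(path, id); return id; }
    boolean exists(String path) { return nodes.containsKey(path); }

    // Unconditional delete: removes whatever node currently lives at path.
    void delete(String path) { nodes.remove(path); }

    // Guarded delete: removes the node only if it is the one we meant to delete.
    boolean deleteIfId(String path, long expectedId) {
        Long current = nodes.get(path);
        if (current == null || current != expectedId) return false;
        nodes.remove(path);
        return true;
    }
}

final class SplitLogRace {
    // Returns true if the re-submitted task survives the late, stale delete.
    static boolean newTaskSurvives(boolean guarded) {
        ToyStore zk = new ToyStore();
        String task = "/splitlog/wal-1";          // hypothetical task path
        long oldId = zk.create(task);             // first attempt puts up the task
        Runnable staleDelete = guarded            // a second copy of the delete is still pending
            ? () -> zk.deleteIfId(task, oldId)
            : () -> zk.delete(task);
        zk.delete(task);                          // first delete removes the old task
        zk.create(task);                          // retried log splitting re-creates it
        staleDelete.run();                        // the pending delete finally executes
        return zk.exists(task);
    }
}
 {code}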

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5606) SplitLogManger async delete node hangs log splitting when ZK connection is lost


[ 
https://issues.apache.org/jira/browse/HBASE-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245767#comment-13245767
 ] 

Zhihong Yu commented on HBASE-5606:
---

Will integrate later today if there is no objection.

Re: [jira] [Created] (HBASE-5706) Dropping fs latency stats since buffer is full spam

2012-04-03 Thread Andrew Purtell

Best regards,

    - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via 
Tom White)

 From: Jean-Daniel Cryans (Created) (JIRA) j...@apache.org
To: issues@hbase.apache.org 
Sent: Tuesday, April 3, 2012 11:58 AM
Subject: [jira] [Created] (HBASE-5706) Dropping fs latency stats since buffer 
is full spam
 
Dropping fs latency stats since buffer is full spam
-

                 Key: HBASE-5706
                 URL: https://issues.apache.org/jira/browse/HBASE-5706
             Project: HBase
          Issue Type: Improvement
            Reporter: Jean-Daniel Cryans
            Priority: Minor
             Fix For: 0.94.0, 0.96.0


I see tons of this while running tests (note that it's a WARN):

{noformat}
2012-04-03 18:54:47,172 WARN org.apache.hadoop.hbase.io.hfile.HFile: Dropping 
fs latency stats since buffer is full
{noformat}

While the code says this:

{noformat}
  // we don't want to fill up the logs with this message, so only log it 
  // once every 30 seconds at most
  // I also want to avoid locks on the 'critical path' (the common case will be
  // uncontended) - hence the CAS
  private static void logDroppedLatencyStat() {
{noformat}

It doesn't seem like this message is actionable, and even though it's printed 
only every 30 seconds, it's still very spammy.

We should get rid of it or make it more useful (I don't know which).
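The quoted comment describes a lock-free throttle: log at most once every 30 
seconds and use a compare-and-set rather than a lock. A minimal sketch of that 
pattern, assuming the caller passes the current time in milliseconds; 
RateLimitedWarning and tryAcquire are hypothetical names, not the HFile code.
{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Lock-free "at most once per interval" gate.
final class RateLimitedWarning {
    private final long intervalMs;
    // Time of the last emitted message; Long.MIN_VALUE means "never logged".
    private final AtomicLong lastLoggedMs = new AtomicLong(Long.MIN_VALUE);

    RateLimitedWarning(long intervalMs) { this.intervalMs = intervalMs; }

    // Returns true if the caller should log now. At most one thread wins the
    // CAS per interval; losers skip logging without blocking.
    boolean tryAcquire(long nowMs) {
        long last = lastLoggedMs.get();
        if (last != Long.MIN_VALUE && nowMs - last < intervalMs) {
            return false;
        }
        return lastLoggedMs.compareAndSet(last, nowMs);
    }
}
{code}
Even when throttled, every dropped stat still pays for a volatile read and a 
branch on the hot path.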

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5706) Dropping fs latency stats since buffer is full spam

2012-04-03 Thread Andrew Purtell (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245770#comment-13245770
 ] 

Andrew Purtell commented on HBASE-5706:
---

Interesting, this came up today here too. I've been playing around with a 
slightly modified version of this patch. Under high load, the FS latency stat 
buffers can fill up before the metrics thread drains them; that doesn't seem 
like an exceptional condition. Removing logDroppedLatencyStat() and the logging 
around it gets rid of a number of branches and a CAS from a hot code path, so 
that's what we did.
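A sketch of the approach described above: treat a full buffer as routine and 
drop the sample silently, with no logging. It assumes a bounded 
java.util.concurrent.ArrayBlockingQueue; LatencyBuffer, record and drain are 
hypothetical names. ArrayBlockingQueue itself takes an internal lock, so this 
shows the control flow only, not a lock-free buffer.
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Bounded latency buffer where "full" is normal under load.
final class LatencyBuffer {
    private final BlockingQueue<Long> samples;

    LatencyBuffer(int capacity) { samples = new ArrayBlockingQueue<>(capacity); }

    // Hot path: offer() returns false when full; the result is deliberately ignored.
    void record(long latencyNanos) { samples.offer(latencyNanos); }

    // Called periodically by a metrics thread to empty the buffer.
    List<Long> drain() {
        List<Long> out = new ArrayList<>();
        samples.drainTo(out);
        return out;
    }
}
{code}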

 Dropping fs latency stats since buffer is full spam
 -

 Key: HBASE-5706
 URL: https://issues.apache.org/jira/browse/HBASE-5706
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Assignee: Shaneal Manek
Priority: Minor
 Fix For: 0.94.0, 0.96.0


 I see tons of this while running tests (note that it's a WARN):
 {noformat}
 2012-04-03 18:54:47,172 WARN org.apache.hadoop.hbase.io.hfile.HFile: Dropping 
 fs latency stats since buffer is full
 {noformat}
 While the code says this:
 {noformat}
   // we don't want to fill up the logs with this message, so only log it 
   // once every 30 seconds at most
   // I also want to avoid locks on the 'critical path' (the common case will be
   // uncontended) - hence the CAS
   private static void logDroppedLatencyStat() {
 {noformat}
 It doesn't seem like this message is actionable, and even though it's printed 
 only every 30 seconds, it's still very spammy.
 We should get rid of it or make it more useful (I don't know which).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5706) Dropping fs latency stats since buffer is full spam

2012-04-03 Thread Andrew Purtell (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245775#comment-13245775
 ] 

Andrew Purtell commented on HBASE-5706:
---

Best regards,

    - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via 
Tom White)

[jira] [Updated] (HBASE-5706) Dropping fs latency stats since buffer is full spam

2012-04-03 Thread Andrew Purtell (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-5706:
--

Comment: was deleted

(was: Best regards,

    - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via 
Tom White))

[jira] [Commented] (HBASE-5618) SplitLogManager - prevent unnecessary attempts to resubmits

2012-04-03 Thread Prakash Khemani (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245790#comment-13245790
 ] 

Prakash Khemani commented on HBASE-5618:


Sorry for the test failure. Fixed and verified.

 SplitLogManager - prevent unnecessary attempts to resubmits
 ---

 Key: HBASE-5618
 URL: https://issues.apache.org/jira/browse/HBASE-5618
 Project: HBase
  Issue Type: Improvement
  Components: wal, zookeeper
Reporter: Prakash Khemani
Assignee: Prakash Khemani
 Attachments: 
 0001-HBASE-5618-SplitLogManager-prevent-unnecessary-attem.patch, 
 0001-HBASE-5618-SplitLogManager-prevent-unnecessary-attem.patch


 Currently, once a watch fires indicating that the task node has been updated 
 (heartbeated) by the worker, the SplitLogManager still takes quite some time 
 before it updates the last-heard-from time. This is because the manager 
 currently schedules another getDataSetWatch() and only updates the task's 
 last-heard-from time after that call finishes.
 This leads to a large number of zk-BadVersion warnings when resubmission is 
 attempted continuously and keeps failing.
 Two changes should be made:
 (1) On a resubmission failure caused by BadVersion, the task's lastUpdate 
 time should be bumped.
 (2) The task's lastUpdate time should be bumped as soon as the 
 nodeDataChanged() watch fires, without waiting for getDataSetWatch() to 
 complete.
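 A sketch of the two proposed changes as plain task bookkeeping, assuming 
 times are passed in as milliseconds. SplitTask and its methods are 
 hypothetical, not the SplitLogManager code; treating BadVersion as a 
 heartbeat rests on the assumption that the version changed because the 
 worker updated the node.
 {code:java}
// Hypothetical per-task bookkeeping for the two changes above.
final class SplitTask {
    volatile long lastUpdateMs;

    SplitTask(long nowMs) { lastUpdateMs = nowMs; }

    // Change (2): bump as soon as the nodeDataChanged() watch fires,
    // without waiting for a follow-up data read to complete.
    void onNodeDataChanged(long nowMs) { lastUpdateMs = nowMs; }

    // Change (1): a BadVersion on resubmit most likely means a worker touched
    // the node, so count it as a heartbeat instead of retrying right away.
    void onResubmitBadVersion(long nowMs) { lastUpdateMs = nowMs; }

    // Resubmit only if nothing has been heard for longer than timeoutMs.
    boolean shouldResubmit(long nowMs, long timeoutMs) {
        return nowMs - lastUpdateMs > timeoutMs;
    }
}
 {code}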

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5618) SplitLogManager - prevent unnecessary attempts to resubmits

2012-04-03 Thread Prakash Khemani (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Khemani updated HBASE-5618:
---

Attachment: 0001-HBASE-5618-SplitLogManager-prevent-unnecessary-attem.patch

[jira] [Assigned] (HBASE-5618) SplitLogManager - prevent unnecessary attempts to resubmits

2012-04-03 Thread Prakash Khemani (Assigned) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Khemani reassigned HBASE-5618:
--

Assignee: Prakash Khemani

[jira] [Commented] (HBASE-5702) MasterSchemaChangeTracker.excludeRegionServerForSchemaChanges leaks a MonitoredTask per call

2012-04-03 Thread Jean-Daniel Cryans (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245793#comment-13245793
]

Jean-Daniel Cryans commented on HBASE-5702:
---

Yeah, when I opened this JIRA it had a narrow scope, but I think it could be
much bigger. Basically, if you try to instant-alter a table, you'll see that
about two or three tasks are leaked (sorry I can't be more precise at the
moment; I'm testing something else right now).

I can see some issues just looking at the code:

MasterSchemaChangeTracker
- processCompletedSchemaChanges: creates a task, sets the status, never closes
it. I think it's a misuse of MonitoredTask; a task is normally something
long-running that needs to report progress.
- processAlterStatus: creates a task but stops it right away; it also creates
one and then kills it.
- handleFailedOrExpiredSchemaChanges: creates a task, sets the status, never
closes it. It also has an extra whitespace.
- createSchemaChangeNode: creates a task then closes/aborts it right away

SchemaChangeTracker
- handleSchemaChange: creates a task, sets the status, never closes it.
- reportAndLogSchemaRefreshError: can create task and set a status but doesn't
close it.

There really should be only one task throughout the alter process.
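The last point can be sketched as one task per alter: created once, updated by 
each step, and closed in a finally block so no path leaks it. Tasks and 
AlterTable are hypothetical stand-ins, not HBase's TaskMonitor/MonitoredTask 
API.
{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal task registry; not HBase's TaskMonitor.
final class Tasks {
    static final List<Task> open = new ArrayList<>();

    static Task create(String description) {
        Task t = new Task(description);
        open.add(t);
        return t;
    }

    static final class Task {
        final String description;
        String status = "";
        Task(String description) { this.description = description; }
        void setStatus(String s) { status = s; }
        void complete() { open.remove(this); }
    }
}

final class AlterTable {
    // One task for the whole alter; every step reports on it, and the
    // finally block guarantees it is closed even if a step throws.
    static void alter(String table, Runnable... steps) {
        Tasks.Task task = Tasks.create("Altering " + table);
        try {
            for (int i = 0; i < steps.length; i++) {
                task.setStatus("step " + (i + 1) + " of " + steps.length);
                steps[i].run();
            }
        } finally {
            task.complete();
        }
    }
}
{code}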

MasterSchemaChangeTracker.excludeRegionServerForSchemaChanges leaks a
MonitoredTask per call

Key: HBASE-5702
URL: https://issues.apache.org/jira/browse/HBASE-5702
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.0
Reporter: Jean-Daniel Cryans
Assignee: Subbu M Iyer
Priority: Critical
Fix For: 0.94.0, 0.96.0

This bug is so easy to reproduce that I'm wondering why it hasn't been reported
yet. Stop any number of region servers on a 0.94/6 cluster and you'll see in
the master interface one task per stopped region server saying the following:
|Processing schema change exclusion for region server =
sv4r27s44,62023,1333402175340|RUNNING (since 5sec ago)|No schema change in
progress. Skipping exclusion for server = sv4r27s44,62023,1333402175340
(since 5sec ago)|
It will stay there until the master cleans it up:
bq. WARN org.apache.hadoop.hbase.monitoring.TaskMonitor: Status Processing
schema change exclusion for region server = sv4r27s44,62023,1333402175340:
status=No schema change in progress. Skipping exclusion for server =
sv4r27s44,62023,1333402175340, state=RUNNING, startTime=1333404636419,
completionTime=-1 appears to have been leaked
It's not clear to me why it's using a MonitoredTask in the first place.

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available


[ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245795#comment-13245795
 ] 

Zhihong Yu commented on HBASE-5666:
---

Interesting idea above.
+1 on removing the special case of timeout == 0.

[jira] [Updated] (HBASE-5666) RegionServer doesn't retry to check if base node is available

2012-04-03 Thread Zhihong Yu (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5666:
--

Comment: was deleted

(was: The problem I see with patch v4 and the if (timeout == 0) special case 
is that exists() behaves differently for ZooKeeper and RecoverableZooKeeper.

RecoverableZooKeeper has some internal retry logic for CONNECTIONLOSS, 
SESSIONEXPIRED, and OPERATIONTIMEOUT. To keep the code simple, we can add this 
logic to ZKUtil.checkExists(); that way we can remove the special case and 
remove the corresponding code in RecoverableZooKeeper.)

[jira] [Commented] (HBASE-5606) SplitLogManger async delete node hangs log splitting when ZK connection is lost

2012-04-03 Thread Prakash Khemani (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245797#comment-13245797
 ] 

Prakash Khemani commented on HBASE-5606:



Making the deletes synchronous doesn't eliminate the race condition, at least 
in theory. A master could send the delete to the ZK server it is connected to 
and then die; the next master could still run into the pending delete.





 SplitLogManger async delete node hangs log splitting when ZK connection is 
 lost 
 

 Key: HBASE-5606
 URL: https://issues.apache.org/jira/browse/HBASE-5606
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0
Reporter: Gopinathan A
Assignee: Prakash Khemani
Priority: Critical
 Fix For: 0.92.2

 Attachments: 
 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch, 
 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch


 1. One rs died, the servershutdownhandler found it out and started the 
 distributed log splitting;
 2. All tasks are failed due to ZK connection lost, so the all the tasks were 
 deleted asynchronously;
 3. Servershutdownhandler retried the log splitting;
 4. The asynchronously deletion in step 2 finally happened for new task
 5. This made the SplitLogManger in hanging state.
 This leads to .META. region not assigened for long time
 {noformat}
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(55413,79):2012-03-14 
 19:28:47,932 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up 
 splitlog task at znode 
 /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(89303,79):2012-03-14 
 19:34:32,387 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up 
 splitlog task at znode 
 /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 {noformat}
 {noformat}
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(80417,99):2012-03-14 
 19:34:31,196 DEBUG 
 org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted 
 /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(89456,99):2012-03-14 
 19:34:32,497 DEBUG 
 org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted 
 /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 {noformat}
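The interleaving in steps 1-5 can be replayed with a toy, single-threaded model. The logs above show the same order: a delete callback fires at 19:34:31,196, the task is put up again at the same znode at 19:34:32,387, and a second delete callback fires at 19:34:32,497. In this sketch, ToyZk and its methods are hypothetical stand-ins, not the ZooKeeper client API or HBase's SplitLogManager code. The second queued delete is an assumption chosen to match the two "deleted" log lines. The model does not claim that this is how the real code queued it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class SplitLogRaceDemo {
    // Replays steps 1-5; returns true if the retried task's znode still exists.
    static boolean retriedTaskSurvives() {
        ToyZk zk = new ToyZk();
        String task = "/hbase/splitlog/example-wal"; // hypothetical task path

        zk.createTask(task);          // step 1: splitting starts, task is put up
        zk.deleteAsync(task);         // step 2: task failed, delete issued asynchronously
        zk.deleteAsync(task);         // assumption: the same delete is queued a second time
        zk.applyOnePendingDelete();   // first delete lands (cf. 19:34:31,196)
        zk.createTask(task);          // step 3: retry puts the task up again (cf. 19:34:32,387)
        zk.applyOnePendingDelete();   // step 4: leftover delete hits the NEW task (cf. 19:34:32,497)
        return zk.znodes.containsKey(task); // step 5: manager now waits on a vanished task
    }

    public static void main(String[] args) {
        System.out.println("retried task survives: " + retriedTaskSurvives());
    }
}

// Hypothetical in-memory stand-in for a ZooKeeper ensemble; not the real client API.
class ToyZk {
    // Live znodes: path -> data.
    final Map<String, String> znodes = new HashMap<>();
    // Deletes the client has issued but the server has not applied yet.
    final Deque<String> pendingDeletes = new ArrayDeque<>();

    void createTask(String path) {
        // Like a real create, fail if the node already exists.
        if (znodes.containsKey(path)) {
            throw new IllegalStateException("node exists: " + path);
        }
        znodes.put(path, "UNASSIGNED");
    }

    // Returns immediately; the delete is applied later by applyOnePendingDelete().
    void deleteAsync(String path) {
        pendingDeletes.add(path);
    }

    // Applies the oldest queued delete; returns false if nothing was queued.
    boolean applyOnePendingDelete() {
        String path = pendingDeletes.poll();
        if (path == null) {
            return false;
        }
        znodes.remove(path);
        return true;
    }
}
```

Running main prints "retried task survives: false". Prakash's comment above points out that a delete sent by a master that then dies would sit in the same pending queue. Making the call synchronous on the client side therefore does not, by itself, remove the stale entry.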

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Created] (HBASE-5708) [89-fb] Make MiniMapRedCluster directory a subdirectory of target/test

2012-04-03 Thread Mikhail Bautin (Created) (JIRA)

[89-fb] Make MiniMapRedCluster directory a subdirectory of target/test
--

 Key: HBASE-5708
 URL: https://issues.apache.org/jira/browse/HBASE-5708
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Priority: Minor


Some map-reduce-based tests are failing when executed concurrently in 89-fb 
because the mini-map-reduce cluster uses /tmp/hadoop-username for temporary 
data, which all concurrent runs share.
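By default, Hadoop's hadoop.tmp.dir is /tmp/hadoop-${user.name}, so every test run by the same user shares one directory. Below is a minimal sketch of the direction in the title. It assumes each run gets a fresh directory under target/test. The helper name newRunScratchDir() and the UUID naming are illustrative, not the 89-fb code. How the path is handed to the mini MapReduce cluster is also left as an assumption; one option is a configuration property such as hadoop.tmp.dir.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

// Hypothetical helper: one scratch directory per test run under target/test,
// instead of the shared /tmp/hadoop-<username> default.
class TestScratchDirs {
    static Path newRunScratchDir() {
        // A random UUID keeps concurrently running builds from colliding.
        Path dir = Paths.get("target", "test", UUID.randomUUID().toString()).toAbsolutePath();
        try {
            Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return dir;
    }

    public static void main(String[] args) {
        // Two "concurrent" runs get distinct directories.
        System.out.println(newRunScratchDir());
        System.out.println(newRunScratchDir());
    }
}
```

Keeping the directory under target/ also means "mvn clean" removes it along with the rest of the build output.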

[jira] [Commented] (HBASE-5606) SplitLogManger async delete node hangs log splitting when ZK connection is lost

2012-04-03 Thread Zhihong Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245804#comment-13245804
 ] 

Zhihong Yu commented on HBASE-5606:
---

Integrated to 0.92, 0.94 and trunk.

Thanks for the patch, Prakash.

Thanks for the review, Stack, Jimmy and Chinna.

 SplitLogManger async delete node hangs log splitting when ZK connection is 
 lost 
 

 Key: HBASE-5606
 URL: https://issues.apache.org/jira/browse/HBASE-5606
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0
Reporter: Gopinathan A
Assignee: Prakash Khemani
Priority: Critical
 Fix For: 0.92.2

 Attachments: 
 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch, 
 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch


[jira] [Updated] (HBASE-5618) SplitLogManager - prevent unnecessary attempts to resubmits

2012-04-03 Thread Prakash Khemani (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Khemani updated HBASE-5618:
---

Attachment: 0001-HBASE-5618-SplitLogManager-prevent-unnecessary-attem.patch

patch without directory prefixes

 SplitLogManager - prevent unnecessary attempts to resubmits
 ---

 Key: HBASE-5618
 URL: https://issues.apache.org/jira/browse/HBASE-5618
 Project: HBase
  Issue Type: Improvement
  Components: wal, zookeeper
Reporter: Prakash Khemani
Assignee: Prakash Khemani
 Attachments: 
 0001-HBASE-5618-SplitLogManager-prevent-unnecessary-attem.patch, 
 0001-HBASE-5618-SplitLogManager-prevent-unnecessary-attem.patch, 
 0001-HBASE-5618-SplitLogManager-prevent-unnecessary-attem.patch


 Currently, once a watch fires indicating that the task node has been updated 
 (heartbeated) by the worker, the SplitLogManager still takes quite some time 
 before it updates the last-heard-from time. This is because the manager 
 currently schedules another getDataSetWatch() and only updates the task's 
 last-heard-from time after that finishes.
 This leads to a large number of zk BadVersion warnings when resubmission is 
 continuously attempted and fails.
 Two changes should be made:
 (1) On a resubmission failure caused by BadVersion, the task's lastUpdate 
 time should be updated.
 (2) The task's lastUpdate time should be updated as soon as the 
 nodeDataChanged() watch fires, without waiting for getDataSetWatch() to 
 complete.
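Below is a minimal sketch of the two proposed changes. It assumes a timeout-based check decides when a task is resubmitted. Timestamps are passed in explicitly so the behavior is deterministic. TaskTracker and its methods are hypothetical names, not the actual SplitLogManager API. The deferred getDataSetWatch() call is only marked with a comment.

```java
import java.util.HashMap;
import java.util.Map;

class TaskTrackerDemo {
    public static void main(String[] args) {
        TaskTracker t = new TaskTracker(1000);
        String task = "/hbase/splitlog/task-1"; // hypothetical task path
        t.taskCreated(task, 0);
        t.onNodeDataChanged(task, 900);                // worker heartbeat
        System.out.println(t.shouldResubmit(task, 1500)); // false
        System.out.println(t.shouldResubmit(task, 2000)); // true
    }
}

// Hypothetical tracker of each task's lastUpdate ("last heard from") time.
class TaskTracker {
    private final long timeoutMs;
    private final Map<String, Long> lastUpdate = new HashMap<>();

    TaskTracker(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    void taskCreated(String path, long now) {
        lastUpdate.put(path, now);
    }

    // Change (2): update lastUpdate as soon as the nodeDataChanged() watch fires.
    void onNodeDataChanged(String path, long now) {
        lastUpdate.put(path, now);
        // ... the follow-up getDataSetWatch() would be scheduled here; its
        // completion no longer gates the lastUpdate bump.
    }

    // Change (1): a resubmit that failed with BadVersion means the worker
    // touched the node, so treat it as a heartbeat too.
    void onResubmitBadVersion(String path, long now) {
        lastUpdate.put(path, now);
    }

    // Resubmit only when nothing has been heard for longer than the timeout.
    boolean shouldResubmit(String path, long now) {
        Long last = lastUpdate.get(path);
        return last != null && now - last > timeoutMs;
    }
}
```

With a 1000 ms timeout and a heartbeat at t=900, the task is not resubmitted at t=1500 (600 ms since last heard from), but it is resubmitted at t=2000 (1100 ms). Suppose lastUpdate were only updated after getDataSetWatch() completed. Then the check at t=1500 would still see t=0 and attempt an unnecessary resubmission.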

[jira] [Commented] (HBASE-5708) [89-fb] Make MiniMapRedCluster directory a subdirectory of target/test


[ 
https://issues.apache.org/jira/browse/HBASE-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245810#comment-13245810
 ] 

Nicolas Spiegelberg commented on HBASE-5708:


As a preemptive comment, have you figured out whether this issue is fixed on 
the trunk?

 [89-fb] Make MiniMapRedCluster directory a subdirectory of target/test
 --

 Key: HBASE-5708
 URL: https://issues.apache.org/jira/browse/HBASE-5708
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Priority: Minor

[jira] [Commented] (HBASE-5708) [89-fb] Make MiniMapRedCluster directory a subdirectory of target/test

2012-04-03 Thread Mikhail Bautin (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245815#comment-13245815
 ] 

Mikhail Bautin commented on HBASE-5708:
---

It is fixed in trunk, with a lot of refactoring done around test directory 
setup. Right now I am not looking at backporting all of that, only at fixing 
this particular issue.

 [89-fb] Make MiniMapRedCluster directory a subdirectory of target/test
 --

 Key: HBASE-5708
 URL: https://issues.apache.org/jira/browse/HBASE-5708
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Priority: Minor

[jira] [Updated] (HBASE-5606) SplitLogManger async delete node hangs log splitting when ZK connection is lost

2012-04-03 Thread Zhihong Yu (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5606:
--

Fix Version/s: 0.96.0
   0.94.0
 Hadoop Flags: Reviewed

 SplitLogManger async delete node hangs log splitting when ZK connection is 
 lost 
 

 Key: HBASE-5606
 URL: https://issues.apache.org/jira/browse/HBASE-5606
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0
Reporter: Gopinathan A
Assignee: Prakash Khemani
Priority: Critical
 Fix For: 0.92.2, 0.94.0, 0.96.0

 Attachments: 
 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch, 
 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch


[jira] [Updated] (HBASE-5606) SplitLogManger async delete node hangs log splitting when ZK connection is lost

2012-04-03 Thread Zhihong Yu (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5606:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 SplitLogManger async delete node hangs log splitting when ZK connection is 
 lost 
 

 Key: HBASE-5606
 URL: https://issues.apache.org/jira/browse/HBASE-5606
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0
Reporter: Gopinathan A
Assignee: Prakash Khemani
Priority: Critical
 Fix For: 0.92.2, 0.94.0, 0.96.0

 Attachments: 
 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch, 
 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2012-04-03 Thread Mubarak Seyed (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245829#comment-13245829
 ] 

Mubarak Seyed commented on HBASE-5487:
--

@Enis
I am not working on this right now. Thanks.

 Generic framework for Master-coordinated tasks
 --

 Key: HBASE-5487
 URL: https://issues.apache.org/jira/browse/HBASE-5487
 Project: HBase
  Issue Type: New Feature
  Components: master, regionserver, zookeeper
Affects Versions: 0.94.0
Reporter: Mubarak Seyed
  Labels: noob

 Need a framework to execute master-coordinated tasks in a fault-tolerant 
 manner. 
 Master-coordinated tasks such as online schema change and delete-range 
 (deleting region(s) based on start/end key) can make use of this framework.
 The advantages of the framework are:
 1. Eliminates repeated code in the Master, ZooKeeper tracker and region server 
 for master-coordinated tasks
 2. Abstracts the common functions across Master - ZK and RS - ZK
 3. Makes it easy to plug in new master-coordinated tasks without adding code 
 to core components
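One way to picture advantages 1-3 is a single plug-in contract plus a shared driver. Everything below is hypothetical, including CoordinatedTask, TaskRegistry and the "delete-range" name. The ZooKeeper coordination the real framework would share is collapsed into a loop over servers so the example runs on its own. Returning the failed servers stands in for the fault-tolerance hook, where the master would retry them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TaskRegistryDemo {
    public static void main(String[] args) {
        TaskRegistry registry = new TaskRegistry();
        // A plug-in task type; "delete-range" is only an example name.
        registry.register(new CoordinatedTask() {
            @Override public String name() { return "delete-range"; }
            @Override public boolean executeOnServer(String server) {
                return !server.equals("rs2"); // pretend rs2 fails its part
            }
        });
        System.out.println(registry.run("delete-range", Arrays.asList("rs1", "rs2", "rs3")));
    }
}

// Hypothetical plug-in contract for a master-coordinated task.
interface CoordinatedTask {
    String name();
    // Work one region server does for this task; returns true on success.
    boolean executeOnServer(String server);
}

// Hypothetical shared driver: the code every task type would otherwise duplicate.
class TaskRegistry {
    private final Map<String, CoordinatedTask> tasks = new LinkedHashMap<>();

    // New task types are registered here instead of being wired into core components.
    void register(CoordinatedTask task) {
        tasks.put(task.name(), task);
    }

    // Runs the task on each server and returns the servers whose part failed,
    // so the master can retry them.
    List<String> run(String taskName, List<String> servers) {
        CoordinatedTask task = tasks.get(taskName);
        if (task == null) {
            throw new IllegalArgumentException("unknown task: " + taskName);
        }
        List<String> failed = new ArrayList<>();
        for (String server : servers) {
            if (!task.executeOnServer(server)) {
                failed.add(server);
            }
        }
        return failed;
    }
}
```

Running main prints [rs2], which is the list of servers the master would need to retry.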

[jira] [Commented] (HBASE-5618) SplitLogManager - prevent unnecessary attempts to resubmits

[
https://issues.apache.org/jira/browse/HBASE-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245833#comment-13245833
]

Hadoop QA commented on HBASE-5618:
--

-1 overall. Here are the results of testing the latest attachment

http://issues.apache.org/jira/secure/attachment/12521225/0001-HBASE-5618-SplitLogManager-prevent-unnecessary-attem.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.regionserver.TestColumnSeeking

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/1380//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/1380//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/1380//console

This message is automatically generated.

SplitLogManager - prevent unnecessary attempts to resubmits
---

Key: HBASE-5618
URL: https://issues.apache.org/jira/browse/HBASE-5618
Project: HBase
Issue Type: Improvement
Components: wal, zookeeper
Reporter: Prakash Khemani
Assignee: Prakash Khemani
Attachments:
0001-HBASE-5618-SplitLogManager-prevent-unnecessary-attem.patch,
0001-HBASE-5618-SplitLogManager-prevent-unnecessary-attem.patch,
0001-HBASE-5618-SplitLogManager-prevent-unnecessary-attem.patch

[jira] [Commented] (HBASE-5707) Move clusterid and clusterup (shutdown) znodes over to pb