[jira] [Created] (HBASE-14800) Expose checkAndMutate via Thrift2

2015-11-12 Thread Josh Elser (JIRA)
Josh Elser created HBASE-14800:
--

 Summary: Expose checkAndMutate via Thrift2
 Key: HBASE-14800
 URL: https://issues.apache.org/jira/browse/HBASE-14800
 Project: HBase
  Issue Type: Improvement
  Components: Thrift
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 2.0.0


Had a user ask why checkAndMutate wasn't exposed via Thrift2.

I see no good reason (since checkAndPut and checkAndDelete are already there), 
so let's add it.
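For context, checkAndMutate is a per-row compare-and-set: the mutations are applied only if a designated cell currently holds an expected value, atomically per row. A minimal Python sketch of those semantics (an illustrative model, not the HBase or Thrift2 API; all names are invented):

```python
# Toy model of checkAndMutate semantics. `table` is a dict of
# row -> {(family, qualifier): value}; not real HBase, just the contract.

def check_and_mutate(table, row, family, qualifier, expected, mutations):
    """Apply `mutations` to `row` only if cell (family, qualifier) holds `expected`."""
    cells = table.setdefault(row, {})
    if cells.get((family, qualifier)) != expected:
        return False  # check failed; nothing is mutated
    for (fam, qual), value in mutations.items():
        if value is None:
            cells.pop((fam, qual), None)  # a None value models a delete
        else:
            cells[(fam, qual)] = value    # otherwise a put
    return True
```

checkAndPut and checkAndDelete are then just the special cases where `mutations` holds a single put or a single delete.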



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HBASE-14498) Master stuck in infinite loop when all Zookeeper servers are unreachable

2015-11-12 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reopened HBASE-14498:


> Master stuck in infinite loop when all Zookeeper servers are unreachable
> 
>
> Key: HBASE-14498
> URL: https://issues.apache.org/jira/browse/HBASE-14498
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Reporter: Y. SREENIVASULU REDDY
>Assignee: Pankaj Kumar
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.4
>
> Attachments: HBASE-14498-V2.patch, HBASE-14498-V3.patch, 
> HBASE-14498-V4.patch, HBASE-14498.patch
>
>
> We met a weird scenario in our production environment.
> In an HA cluster,
> > the active Master (HM1) is not able to connect to any Zookeeper server 
> > (due to a network breakdown between the master machine and the Zookeeper 
> > servers).
> {code}
> 2015-09-26 15:24:47,508 INFO 
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host:2181)] 
> zookeeper.ClientCnxn: Client session timed out, have not heard from server in 
> 33463ms for sessionid 0x104576b8dda0002, closing socket connection and 
> attempting reconnect
> 2015-09-26 15:24:47,877 INFO 
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host1:2181)] 
> client.FourLetterWordMain: connecting to ZK-Host1 2181
> 2015-09-26 15:24:48,236 INFO [main-SendThread(ZK-Host1:2181)] 
> client.FourLetterWordMain: connecting to ZK-Host1 2181
> 2015-09-26 15:24:49,879 WARN 
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Can not get the principle name from server ZK-Host1
> 2015-09-26 15:24:49,879 INFO 
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Opening socket connection to server 
> ZK-Host1/ZK-IP1:2181. Will not attempt to authenticate using SASL (unknown 
> error)
> 2015-09-26 15:24:50,238 WARN [main-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Can not get the principle name from server ZK-Host1
> 2015-09-26 15:24:50,238 INFO [main-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Opening socket connection to server 
> ZK-Host1/ZK-Host1:2181. Will not attempt to authenticate using SASL (unknown 
> error)
> 2015-09-26 15:25:17,470 INFO [main-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Client session timed out, have not heard from server in 
> 30023ms for sessionid 0x2045762cc710006, closing socket connection and 
> attempting reconnect
> 2015-09-26 15:25:17,571 WARN [master/HM1-Host/HM1-IP:16000] 
> zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, 
> quorum=ZK-Host:2181,ZK-Host1:2181,ZK-Host2:2181, 
> exception=org.apache.zookeeper.KeeperException$ConnectionLossException: 
> KeeperErrorCode = ConnectionLoss for /hbase/master
> 2015-09-26 15:25:17,872 INFO [main-SendThread(ZK-Host:2181)] 
> client.FourLetterWordMain: connecting to ZK-Host 2181
> 2015-09-26 15:25:19,874 WARN [main-SendThread(ZK-Host:2181)] 
> zookeeper.ClientCnxn: Can not get the principle name from server ZK-Host
> 2015-09-26 15:25:19,874 INFO [main-SendThread(ZK-Host:2181)] 
> zookeeper.ClientCnxn: Opening socket connection to server ZK-Host/ZK-IP:2181. 
> Will not attempt to authenticate using SASL (unknown error)
> {code}
> > Since HM1 was not able to connect to any ZK server, the session timeout 
> > didn't happen on the Zookeeper server side and HM1 didn't abort.
> > On Zookeeper session timeout, the standby master (HM2) registered itself 
> > as the active master.
> > HM2 keeps waiting for region servers to report in as part of active 
> > master initialization.
> {noformat} 
> 2015-09-26 15:24:44,928 | INFO | HM2-Host:21300.activeMasterManager | Waiting 
> for region servers count to settle; currently checked in 0, slept for 0 ms, 
> expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval 
> of 1500 ms. | 
> org.apache.hadoop.hbase.master.ServerManager.waitForRegionServers(ServerManager.java:1011)
> ---
> ---
> 2015-09-26 15:32:50,841 | INFO | HM2-Host:21300.activeMasterManager | Waiting 
> for region servers count to settle; currently checked in 0, slept for 483913 
> ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, 
> interval of 1500 ms. | 
> org.apache.hadoop.hbase.master.ServerManager.waitForRegionServers(ServerManager.java:1011)
> {noformat}
> > At the other end, region servers are reporting to HM1 at a 3 second 
> > interval. A region server retrieves the master location from Zookeeper 
> > only when it couldn't connect to the master (ServiceException).
> By the current design, region servers will not report to HM2 unless HM1 
> aborts, so HM2 will exit (InitializationMonitor) and again wait for region 
> servers in a loop.
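The failure mode above can be modeled in a few lines: if the client only retries on connection loss and never bounds the retries by the session timeout, the master loops forever instead of aborting so the standby can take over. A hedged sketch (invented names, not the HBase code):

```python
import time

def wait_for_zookeeper(is_connected, session_timeout_ms, retry_interval_s=0.01):
    """Retry until connected, but give up once the ZK session timeout has elapsed."""
    deadline = time.monotonic() + session_timeout_ms / 1000.0
    while time.monotonic() < deadline:
        if is_connected():
            return True
        time.sleep(retry_interval_s)  # real code would back off between retries
    return False  # caller should abort the master here rather than loop again
```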





[jira] [Updated] (HBASE-14800) Expose checkAndMutate via Thrift2

2015-11-12 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-14800:
---
Attachment: HBASE-14800.001.patch

Just so I don't forget about it sitting on my laptop: a tentative patch that 
adds checkAndMutate. It's entirely too massive due to HBASE-14172; need to get 
that fixed up first.

> Expose checkAndMutate via Thrift2
> -
>
> Key: HBASE-14800
> URL: https://issues.apache.org/jira/browse/HBASE-14800
> Project: HBase
>  Issue Type: Improvement
>  Components: Thrift
>Reporter: Josh Elser
>Assignee: Josh Elser
> Fix For: 2.0.0
>
> Attachments: HBASE-14800.001.patch
>
>
> Had a user ask why checkAndMutate wasn't exposed via Thrift2.
> I see no good reason (since checkAndPut and checkAndDelete are already 
> there), so let's add it.





[jira] [Updated] (HBASE-14800) Expose checkAndMutate via Thrift2

2015-11-12 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-14800:
---
Status: Patch Available  (was: Open)

> Expose checkAndMutate via Thrift2
> -
>
> Key: HBASE-14800
> URL: https://issues.apache.org/jira/browse/HBASE-14800
> Project: HBase
>  Issue Type: Improvement
>  Components: Thrift
>Reporter: Josh Elser
>Assignee: Josh Elser
> Fix For: 2.0.0
>
> Attachments: HBASE-14800.001.patch
>
>
> Had a user ask why checkAndMutate wasn't exposed via Thrift2.
> I see no good reason (since checkAndPut and checkAndDelete are already 
> there), so let's add it.





[jira] [Updated] (HBASE-14803) Add some debug logs to StoreFileScanner

2015-11-12 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-14803:

Attachment: HBASE-14803.v0-trunk.patch

> Add some debug logs to StoreFileScanner
> ---
>
> Key: HBASE-14803
> URL: https://issues.apache.org/jira/browse/HBASE-14803
> Project: HBase
>  Issue Type: Bug
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Minor
>  Labels: beginner
> Fix For: 1.2.0
>
> Attachments: HBASE-14803.v0-trunk.patch
>
>
> To validate some behaviors I had to add some logs to StoreFileScanner.
> I think they can be interesting for other people who need to debug, so I'm 
> sharing the modifications here.





[jira] [Updated] (HBASE-14803) Add some debug logs to StoreFileScanner

2015-11-12 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-14803:

Status: Patch Available  (was: Open)

> Add some debug logs to StoreFileScanner
> ---
>
> Key: HBASE-14803
> URL: https://issues.apache.org/jira/browse/HBASE-14803
> Project: HBase
>  Issue Type: Bug
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Minor
>  Labels: beginner
> Fix For: 1.2.0
>
> Attachments: HBASE-14803.v0-trunk.patch
>
>
> To validate some behaviors I had to add some logs to StoreFileScanner.
> I think they can be interesting for other people who need to debug, so I'm 
> sharing the modifications here.





[jira] [Updated] (HBASE-14789) Enhance the current spark-hbase connector

2015-11-12 Thread Zhan Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhan Zhang updated HBASE-14789:
---
Summary: Enhance the current spark-hbase connector  (was: Provide an 
alternative spark-hbase connector)

> Enhance the current spark-hbase connector
> -
>
> Key: HBASE-14789
> URL: https://issues.apache.org/jira/browse/HBASE-14789
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zhan Zhang
>Assignee: Zhan Zhang
> Attachments: shc.pdf
>
>
> This JIRA is to provide user an option to choose different Spark-HBase 
> implementation based on requirements.





[jira] [Commented] (HBASE-14355) Scan different TimeRange for each column family

2015-11-12 Thread churro morales (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002947#comment-15002947
 ] 

churro morales commented on HBASE-14355:


Thanks [~stack], sure thing I will regenerate the PB for branch-1 and put up a 
patch.



> Scan different TimeRange for each column family
> ---
>
> Key: HBASE-14355
> URL: https://issues.apache.org/jira/browse/HBASE-14355
> Project: HBase
>  Issue Type: New Feature
>  Components: Client, regionserver, Scanners
>Reporter: Dave Latham
>Assignee: churro morales
> Fix For: 2.0.0, 1.3.0, 0.98.17
>
> Attachments: HBASE-14355-v1.patch, HBASE-14355-v10.patch, 
> HBASE-14355-v11.patch, HBASE-14355-v2.patch, HBASE-14355-v3.patch, 
> HBASE-14355-v4.patch, HBASE-14355-v5.patch, HBASE-14355-v6.patch, 
> HBASE-14355-v7.patch, HBASE-14355-v8.patch, HBASE-14355-v9.patch, 
> HBASE-14355.patch
>
>
> At present the Scan API supports only table level time range. We have 
> specific use cases that will benefit from per column family time range. (See 
> background discussion at 
> https://mail-archives.apache.org/mod_mbox/hbase-user/201508.mbox/%3ccaa4mzom00ef5eoxstk0hetxeby8mqss61gbvgttgpaspmhq...@mail.gmail.com%3E)
> There are a couple of choices that would be good to validate.  First, how to 
> update the Scan API to support family- and table-level time ranges.  One 
> proposal would be to add Scan.setTimeRange(byte[] family, long minTime, long 
> maxTime), then store it in a Map keyed by family.  When executing the scan, 
> if a family has a specified TimeRange, then use it; otherwise fall back to 
> the table-level TimeRange.  Clients using the new API against old region 
> servers would not get the families correctly filtered.  Old clients sending 
> scans to new region servers would work correctly.
> The other question is how to get StoreFileScanner.shouldUseScanner to match 
> up the proper family and time range.  It has the Scan available but does not 
> currently know which family it is part of.  One option would be to pass the 
> column family down each constructor path.  Another would be to alter 
> shouldUseScanner to pass down the specific TimeRange to use (similar to how 
> it currently passes down the columns to use, which also appears to be a 
> workaround for not having the family available).
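The per-family fallback described above is simple to state precisely; a hedged sketch follows (invented names, not the actual Scan/StoreFileScanner API):

```python
# family_ranges maps a column family to its (min_ts, max_ts); families with
# no entry fall back to the table-level range, as the proposal describes.

def effective_time_range(family, family_ranges, table_range):
    return family_ranges.get(family, table_range)

def should_use_scanner(file_min_ts, file_max_ts, family, family_ranges, table_range):
    """Keep a store file only if its timestamps can overlap the wanted range."""
    lo, hi = effective_time_range(family, family_ranges, table_range)
    return file_max_ts >= lo and file_min_ts <= hi
```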





[jira] [Updated] (HBASE-14355) Scan different TimeRange for each column family

2015-11-12 Thread churro morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

churro morales updated HBASE-14355:
---
Attachment: HBASE-14355.branch-1.patch

[~stack] here is a patch for branch-1 with the test fixed and the protobuf 
files regenerated. 



> Scan different TimeRange for each column family
> ---
>
> Key: HBASE-14355
> URL: https://issues.apache.org/jira/browse/HBASE-14355
> Project: HBase
>  Issue Type: New Feature
>  Components: Client, regionserver, Scanners
>Reporter: Dave Latham
>Assignee: churro morales
> Fix For: 2.0.0, 1.3.0, 0.98.17
>
> Attachments: HBASE-14355-v1.patch, HBASE-14355-v10.patch, 
> HBASE-14355-v11.patch, HBASE-14355-v2.patch, HBASE-14355-v3.patch, 
> HBASE-14355-v4.patch, HBASE-14355-v5.patch, HBASE-14355-v6.patch, 
> HBASE-14355-v7.patch, HBASE-14355-v8.patch, HBASE-14355-v9.patch, 
> HBASE-14355.branch-1.patch, HBASE-14355.patch
>
>
> At present the Scan API supports only table level time range. We have 
> specific use cases that will benefit from per column family time range. (See 
> background discussion at 
> https://mail-archives.apache.org/mod_mbox/hbase-user/201508.mbox/%3ccaa4mzom00ef5eoxstk0hetxeby8mqss61gbvgttgpaspmhq...@mail.gmail.com%3E)
> There are a couple of choices that would be good to validate.  First, how to 
> update the Scan API to support family- and table-level time ranges.  One 
> proposal would be to add Scan.setTimeRange(byte[] family, long minTime, long 
> maxTime), then store it in a Map keyed by family.  When executing the scan, 
> if a family has a specified TimeRange, then use it; otherwise fall back to 
> the table-level TimeRange.  Clients using the new API against old region 
> servers would not get the families correctly filtered.  Old clients sending 
> scans to new region servers would work correctly.
> The other question is how to get StoreFileScanner.shouldUseScanner to match 
> up the proper family and time range.  It has the Scan available but does not 
> currently know which family it is part of.  One option would be to pass the 
> column family down each constructor path.  Another would be to alter 
> shouldUseScanner to pass down the specific TimeRange to use (similar to how 
> it currently passes down the columns to use, which also appears to be a 
> workaround for not having the family available).





[jira] [Commented] (HBASE-14355) Scan different TimeRange for each column family

2015-11-12 Thread churro morales (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003014#comment-15003014
 ] 

churro morales commented on HBASE-14355:


[~apurtell] we patched this for our 0.98 cluster.  Would you be interested in a 
patch for pushing into 0.98 as well?  

> Scan different TimeRange for each column family
> ---
>
> Key: HBASE-14355
> URL: https://issues.apache.org/jira/browse/HBASE-14355
> Project: HBase
>  Issue Type: New Feature
>  Components: Client, regionserver, Scanners
>Reporter: Dave Latham
>Assignee: churro morales
> Fix For: 2.0.0, 1.3.0, 0.98.17
>
> Attachments: HBASE-14355-v1.patch, HBASE-14355-v10.patch, 
> HBASE-14355-v11.patch, HBASE-14355-v2.patch, HBASE-14355-v3.patch, 
> HBASE-14355-v4.patch, HBASE-14355-v5.patch, HBASE-14355-v6.patch, 
> HBASE-14355-v7.patch, HBASE-14355-v8.patch, HBASE-14355-v9.patch, 
> HBASE-14355.branch-1.patch, HBASE-14355.patch
>
>
> At present the Scan API supports only table level time range. We have 
> specific use cases that will benefit from per column family time range. (See 
> background discussion at 
> https://mail-archives.apache.org/mod_mbox/hbase-user/201508.mbox/%3ccaa4mzom00ef5eoxstk0hetxeby8mqss61gbvgttgpaspmhq...@mail.gmail.com%3E)
> There are a couple of choices that would be good to validate.  First, how to 
> update the Scan API to support family- and table-level time ranges.  One 
> proposal would be to add Scan.setTimeRange(byte[] family, long minTime, long 
> maxTime), then store it in a Map keyed by family.  When executing the scan, 
> if a family has a specified TimeRange, then use it; otherwise fall back to 
> the table-level TimeRange.  Clients using the new API against old region 
> servers would not get the families correctly filtered.  Old clients sending 
> scans to new region servers would work correctly.
> The other question is how to get StoreFileScanner.shouldUseScanner to match 
> up the proper family and time range.  It has the Scan available but does not 
> currently know which family it is part of.  One option would be to pass the 
> column family down each constructor path.  Another would be to alter 
> shouldUseScanner to pass down the specific TimeRange to use (similar to how 
> it currently passes down the columns to use, which also appears to be a 
> workaround for not having the family available).





[jira] [Created] (HBASE-14801) Enhance the Spark-HBase connector catalog with json format

2015-11-12 Thread Zhan Zhang (JIRA)
Zhan Zhang created HBASE-14801:
--

 Summary: Enhance the Spark-HBase connector catalog with json format
 Key: HBASE-14801
 URL: https://issues.apache.org/jira/browse/HBASE-14801
 Project: HBase
  Issue Type: Improvement
Reporter: Zhan Zhang
Assignee: Zhan Zhang








[jira] [Commented] (HBASE-14801) Enhance the Spark-HBase connector catalog with json format

2015-11-12 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003057#comment-15003057
 ] 

Ted Malaska commented on HBASE-14801:
-

I have no problem with this; I think it looks a lot prettier than what I did on 
the first draft.

Does anyone else have any thoughts on this?  We don't want to change this too 
many times once it gets into users' hands, so let's agree that this JSON format 
is what we want long term.
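For readers following along, a hedged example of what a JSON catalog for the connector could look like; the field names below are invented for illustration and may not match what the patch finally defines:

```python
import json

# A table catalog as a JSON document: where the table lives, which logical
# column is the row key, and how each column maps to an HBase family/qualifier.
catalog = json.loads("""
{
  "table": {"namespace": "default", "name": "exampleTable"},
  "rowkey": "key",
  "columns": {
    "col0": {"cf": "rowkey", "col": "key",  "type": "string"},
    "col1": {"cf": "cf1",    "col": "col1", "type": "int"}
  }
}
""")
```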

> Enhance the Spark-HBase connector catalog with json format
> --
>
> Key: HBASE-14801
> URL: https://issues.apache.org/jira/browse/HBASE-14801
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zhan Zhang
>Assignee: Zhan Zhang
>






[jira] [Commented] (HBASE-14789) Enhance the current spark-hbase connector

2015-11-12 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003062#comment-15003062
 ] 

Ted Malaska commented on HBASE-14789:
-

Adding a JIRA for changing the table definition to JSON.

> Enhance the current spark-hbase connector
> -
>
> Key: HBASE-14789
> URL: https://issues.apache.org/jira/browse/HBASE-14789
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zhan Zhang
>Assignee: Zhan Zhang
> Attachments: shc.pdf
>
>
> This JIRA is to optimize the RDD construction in the current connector 
> implementation.





[jira] [Commented] (HBASE-14498) Master stuck in infinite loop when all Zookeeper servers are unreachable

2015-11-12 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002879#comment-15002879
 ] 

stack commented on HBASE-14498:
---

How does the test replicate what the original description describes? It is a 
tricky scenario. Thanks for reporting it. I am afraid that we may not have 
actually fixed the scenario described.

isConnected is the name of a method you would invoke to check a boolean named 
connected. It is not what you should name a variable.

Is this right?
connWaitTimeOut = this.conf.getLong("zookeeper.session.timeout", 9) * 2 / 3;

IIRC, you ask zk for a session timeout and it may give you something other than 
what you asked for (it is a while since I dug in here)
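For reference, the ZooKeeper server clamps a requested session timeout into [2*tickTime, 20*tickTime] (the server defaults), so a wait derived from the requested value, like the 2/3 computation quoted above, can disagree with the negotiated timeout. A hedged sketch of the arithmetic (function names invented):

```python
def negotiated_session_timeout(requested_ms, tick_time_ms):
    """Clamp like the ZK server defaults: [2*tickTime, 20*tickTime]."""
    return max(2 * tick_time_ms, min(requested_ms, 20 * tick_time_ms))

def connection_wait_ms(session_timeout_ms):
    """The 2/3-of-session-timeout wait being questioned above."""
    return session_timeout_ms * 2 // 3
```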

You drop the prefix here:

LOG.debug("Received Disconnected from ZooKeeper.");

The prefix helps debugging... otherwise these zk logs are hard to trace to 
their origin.

Every call into a disconnect is going to spawn a new one of these unnamed 
threads?

Did you see the below message in your log output?

LOG.debug(prefix("Received Disconnected from ZooKeeper, ignoring"));

The idea is that we could disconnect but we'll keep trying to reconnect for zk 
session timeout and may succeed? Has the zk session timeout expired when we get 
this disconnect message?  Should we abort as soon as we get one of these (I 
wonder why we have the comment that says abort when we get such a message but 
we don't actually? Because the abort is done elsewhere?)

Thanks.


> Master stuck in infinite loop when all Zookeeper servers are unreachable
> 
>
> Key: HBASE-14498
> URL: https://issues.apache.org/jira/browse/HBASE-14498
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Reporter: Y. SREENIVASULU REDDY
>Assignee: Pankaj Kumar
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.4
>
> Attachments: HBASE-14498-V2.patch, HBASE-14498-V3.patch, 
> HBASE-14498-V4.patch, HBASE-14498.patch
>
>
> We met a weird scenario in our production environment.
> In an HA cluster,
> > the active Master (HM1) is not able to connect to any Zookeeper server 
> > (due to a network breakdown between the master machine and the Zookeeper 
> > servers).
> {code}
> 2015-09-26 15:24:47,508 INFO 
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host:2181)] 
> zookeeper.ClientCnxn: Client session timed out, have not heard from server in 
> 33463ms for sessionid 0x104576b8dda0002, closing socket connection and 
> attempting reconnect
> 2015-09-26 15:24:47,877 INFO 
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host1:2181)] 
> client.FourLetterWordMain: connecting to ZK-Host1 2181
> 2015-09-26 15:24:48,236 INFO [main-SendThread(ZK-Host1:2181)] 
> client.FourLetterWordMain: connecting to ZK-Host1 2181
> 2015-09-26 15:24:49,879 WARN 
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Can not get the principle name from server ZK-Host1
> 2015-09-26 15:24:49,879 INFO 
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Opening socket connection to server 
> ZK-Host1/ZK-IP1:2181. Will not attempt to authenticate using SASL (unknown 
> error)
> 2015-09-26 15:24:50,238 WARN [main-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Can not get the principle name from server ZK-Host1
> 2015-09-26 15:24:50,238 INFO [main-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Opening socket connection to server 
> ZK-Host1/ZK-Host1:2181. Will not attempt to authenticate using SASL (unknown 
> error)
> 2015-09-26 15:25:17,470 INFO [main-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Client session timed out, have not heard from server in 
> 30023ms for sessionid 0x2045762cc710006, closing socket connection and 
> attempting reconnect
> 2015-09-26 15:25:17,571 WARN [master/HM1-Host/HM1-IP:16000] 
> zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, 
> quorum=ZK-Host:2181,ZK-Host1:2181,ZK-Host2:2181, 
> exception=org.apache.zookeeper.KeeperException$ConnectionLossException: 
> KeeperErrorCode = ConnectionLoss for /hbase/master
> 2015-09-26 15:25:17,872 INFO [main-SendThread(ZK-Host:2181)] 
> client.FourLetterWordMain: connecting to ZK-Host 2181
> 2015-09-26 15:25:19,874 WARN [main-SendThread(ZK-Host:2181)] 
> zookeeper.ClientCnxn: Can not get the principle name from server ZK-Host
> 2015-09-26 15:25:19,874 INFO [main-SendThread(ZK-Host:2181)] 
> zookeeper.ClientCnxn: Opening socket connection to server ZK-Host/ZK-IP:2181. 
> Will not attempt to authenticate using SASL (unknown error)
> {code}
> > Since HM1 was not able to connect to any ZK server, the session timeout 
> > didn't happen on the Zookeeper server side and HM1 didn't abort.
> > On Zookeeper session timeout standby master (HM2) registered himself as an 
> > active 

[jira] [Updated] (HBASE-14796) Enhance the Gets in the connector

2015-11-12 Thread Zhan Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhan Zhang updated HBASE-14796:
---
Summary: Enhance the Gets in the connector  (was: Provide an alternative 
spark-hbase SQL implementations for Gets)

> Enhance the Gets in the connector
> -
>
> Key: HBASE-14796
> URL: https://issues.apache.org/jira/browse/HBASE-14796
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Malaska
>Assignee: Zhan Zhang
>Priority: Minor
>
> Currently the Spark module's Spark SQL implementation gets records from 
> HBase on the driver if something like the following is found in the SQL:
> rowkey = 123
> The original reasoning was that normal SQL will not have many equality 
> operations in a single WHERE clause.
> Zhan had brought up two points that have value.
> 1. The SQL may be generated and may have many equality statements in it, so 
> moving the work to an executor protects the driver from load.
> 2. In the current implementation the driver is connecting to HBase, and 
> exceptions may cause trouble for the whole Spark application, not just a 
> single task execution.
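The trade-off in point 1 amounts to batching the equality keys across executors instead of fetching them all on the driver. A hedged sketch of the batching step (invented names, not the connector code):

```python
def partition_keys(rowkeys, num_executors):
    """Round-robin rowkeys into one batch per executor, so each executor
    performs its own gets and the driver stays off the HBase data path."""
    batches = [[] for _ in range(num_executors)]
    for i, key in enumerate(rowkeys):
        batches[i % num_executors].append(key)
    return batches
```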





[jira] [Updated] (HBASE-14795) Enhance the spark-hbase scan operations

2015-11-12 Thread Zhan Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhan Zhang updated HBASE-14795:
---
Summary: Enhance the spark-hbase scan operations  (was: Provide an 
alternative spark-hbase SQL implementations for Scan)

> Enhance the spark-hbase scan operations
> ---
>
> Key: HBASE-14795
> URL: https://issues.apache.org/jira/browse/HBASE-14795
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Malaska
>Assignee: Zhan Zhang
>Priority: Minor
>
> This is a sub-jira of HBASE-14789.  This jira focuses on replacing 
> TableInputFormat with a more custom scan implementation that will make the 
> following use case more effective.
> Use case:
> When you have multiple scan ranges on a single table within a single query, 
> TableInputFormat will scan the outer range spanning all the scan start and 
> end keys, where this implementation can be more targeted.
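The inefficiency can be made concrete: a single TableInputFormat-style scan covers the outer envelope of all ranges and therefore reads rows that fall between them. A hedged sketch (invented names; ranges are half-open [start, stop)):

```python
def outer_envelope(ranges):
    """The single covering range a one-scan approach would use."""
    return (min(start for start, _ in ranges), max(stop for _, stop in ranges))

def wanted_rows(rows, ranges):
    """Rows actually requested by the individual ranges."""
    return [row for row in rows
            if any(start <= row < stop for start, stop in ranges)]
```

With ranges ("a","b") and ("d","f"), the envelope is ("a","f") even though rows "b" and "c" were never requested; per-range scans skip them.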





[jira] [Updated] (HBASE-14789) Enhance the current spark-hbase connector

2015-11-12 Thread Zhan Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhan Zhang updated HBASE-14789:
---
Description: This JIRA is to optimize the RDD construction in the current 
connector implementation.  (was: This JIRA is to provide user an option to 
choose different Spark-HBase implementation based on requirements.)

> Enhance the current spark-hbase connector
> -
>
> Key: HBASE-14789
> URL: https://issues.apache.org/jira/browse/HBASE-14789
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zhan Zhang
>Assignee: Zhan Zhang
> Attachments: shc.pdf
>
>
> This JIRA is to optimize the RDD construction in the current connector 
> implementation.





[jira] [Created] (HBASE-14803) Add some debug logs to StoreFileScanner

2015-11-12 Thread Jean-Marc Spaggiari (JIRA)
Jean-Marc Spaggiari created HBASE-14803:
---

 Summary: Add some debug logs to StoreFileScanner
 Key: HBASE-14803
 URL: https://issues.apache.org/jira/browse/HBASE-14803
 Project: HBase
  Issue Type: Bug
Reporter: Jean-Marc Spaggiari
Assignee: Jean-Marc Spaggiari
Priority: Minor
 Fix For: 1.2.0


To validate some behaviors I had to add some logs to StoreFileScanner.

I think they can be interesting for other people who need to debug, so I'm 
sharing the modifications here.





[jira] [Commented] (HBASE-14498) Master stuck in infinite loop when all Zookeeper servers are unreachable

2015-11-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002780#comment-15002780
 ] 

Hudson commented on HBASE-14498:


SUCCESS: Integrated in HBase-Trunk_matrix #460 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/460/])
HBASE-14498 Master stuck in infinite loop when all Zookeeper servers are 
(tedyu: rev b677f2e65d07194702fc181c8fd777804fa967ae)
* 
hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
* 
hbase-client/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperWatcher.java


> Master stuck in infinite loop when all Zookeeper servers are unreachable
> 
>
> Key: HBASE-14498
> URL: https://issues.apache.org/jira/browse/HBASE-14498
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Reporter: Y. SREENIVASULU REDDY
>Assignee: Pankaj Kumar
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.4
>
> Attachments: HBASE-14498-V2.patch, HBASE-14498-V3.patch, 
> HBASE-14498-V4.patch, HBASE-14498.patch
>
>
> We met a weird scenario in our production environment.
> In an HA cluster,
> > the active Master (HM1) is not able to connect to any Zookeeper server 
> > (due to a network breakdown between the master machine and the Zookeeper 
> > servers).
> {code}
> 2015-09-26 15:24:47,508 INFO 
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host:2181)] 
> zookeeper.ClientCnxn: Client session timed out, have not heard from server in 
> 33463ms for sessionid 0x104576b8dda0002, closing socket connection and 
> attempting reconnect
> 2015-09-26 15:24:47,877 INFO 
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host1:2181)] 
> client.FourLetterWordMain: connecting to ZK-Host1 2181
> 2015-09-26 15:24:48,236 INFO [main-SendThread(ZK-Host1:2181)] 
> client.FourLetterWordMain: connecting to ZK-Host1 2181
> 2015-09-26 15:24:49,879 WARN 
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Can not get the principle name from server ZK-Host1
> 2015-09-26 15:24:49,879 INFO 
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Opening socket connection to server 
> ZK-Host1/ZK-IP1:2181. Will not attempt to authenticate using SASL (unknown 
> error)
> 2015-09-26 15:24:50,238 WARN [main-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Can not get the principle name from server ZK-Host1
> 2015-09-26 15:24:50,238 INFO [main-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Opening socket connection to server 
> ZK-Host1/ZK-Host1:2181. Will not attempt to authenticate using SASL (unknown 
> error)
> 2015-09-26 15:25:17,470 INFO [main-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Client session timed out, have not heard from server in 
> 30023ms for sessionid 0x2045762cc710006, closing socket connection and 
> attempting reconnect
> 2015-09-26 15:25:17,571 WARN [master/HM1-Host/HM1-IP:16000] 
> zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, 
> quorum=ZK-Host:2181,ZK-Host1:2181,ZK-Host2:2181, 
> exception=org.apache.zookeeper.KeeperException$ConnectionLossException: 
> KeeperErrorCode = ConnectionLoss for /hbase/master
> 2015-09-26 15:25:17,872 INFO [main-SendThread(ZK-Host:2181)] 
> client.FourLetterWordMain: connecting to ZK-Host 2181
> 2015-09-26 15:25:19,874 WARN [main-SendThread(ZK-Host:2181)] 
> zookeeper.ClientCnxn: Can not get the principle name from server ZK-Host
> 2015-09-26 15:25:19,874 INFO [main-SendThread(ZK-Host:2181)] 
> zookeeper.ClientCnxn: Opening socket connection to server ZK-Host/ZK-IP:2181. 
> Will not attempt to authenticate using SASL (unknown error)
> {code}
> > Since HM1 was not able to connect to any ZK, the session timeout didn't 
> > happen on the Zookeeper server side and HM1 didn't abort.
> > On Zookeeper session timeout, the standby master (HM2) registered itself as the 
> > active master. 
> > HM2 keeps waiting for region servers to report to it as part of active 
> > master initialization.
> {noformat} 
> 2015-09-26 15:24:44,928 | INFO | HM2-Host:21300.activeMasterManager | Waiting 
> for region servers count to settle; currently checked in 0, slept for 0 ms, 
> expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval 
> of 1500 ms. | 
> org.apache.hadoop.hbase.master.ServerManager.waitForRegionServers(ServerManager.java:1011)
> ---
> ---
> 2015-09-26 15:32:50,841 | INFO | HM2-Host:21300.activeMasterManager | Waiting 
> for region servers count to settle; currently checked in 0, slept for 483913 
> ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, 
> interval of 1500 ms. | 
> org.apache.hadoop.hbase.master.ServerManager.waitForRegionServers(ServerManager.java:1011)
> {noformat}
> > At other end, region servers are reporting to HM1 on 3 sec 

[jira] [Assigned] (HBASE-13707) CellCounter uses too many counters

2015-11-12 Thread NIDHI GAMBHIR (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

NIDHI GAMBHIR reassigned HBASE-13707:
-

Assignee: NIDHI GAMBHIR

> CellCounter uses too many counters
> -
>
> Key: HBASE-13707
> URL: https://issues.apache.org/jira/browse/HBASE-13707
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 1.0.1
>Reporter: Jean-Marc Spaggiari
>Assignee: NIDHI GAMBHIR
>Priority: Minor
>  Labels: beginner
>
> CellCounter creates a counter per row... so it quickly becomes too many.
> We should provide an option to drop the per-row statistics and count only 
> cells overall for the table.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14355) Scan different TimeRange for each column family

2015-11-12 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002913#comment-15002913
 ] 

stack commented on HBASE-14355:
---

I pushed the patch to master branch.

branch-1 looks like it needs pb regenerated. Mind doing this [~churromorales] 
and then I'll push it back on that branch too... thanks.

> Scan different TimeRange for each column family
> ---
>
> Key: HBASE-14355
> URL: https://issues.apache.org/jira/browse/HBASE-14355
> Project: HBase
>  Issue Type: New Feature
>  Components: Client, regionserver, Scanners
>Reporter: Dave Latham
>Assignee: churro morales
> Fix For: 2.0.0, 1.3.0, 0.98.17
>
> Attachments: HBASE-14355-v1.patch, HBASE-14355-v10.patch, 
> HBASE-14355-v11.patch, HBASE-14355-v2.patch, HBASE-14355-v3.patch, 
> HBASE-14355-v4.patch, HBASE-14355-v5.patch, HBASE-14355-v6.patch, 
> HBASE-14355-v7.patch, HBASE-14355-v8.patch, HBASE-14355-v9.patch, 
> HBASE-14355.patch
>
>
> At present the Scan API supports only table level time range. We have 
> specific use cases that will benefit from per column family time range. (See 
> background discussion at 
> https://mail-archives.apache.org/mod_mbox/hbase-user/201508.mbox/%3ccaa4mzom00ef5eoxstk0hetxeby8mqss61gbvgttgpaspmhq...@mail.gmail.com%3E)
> There are a couple of choices that would be good to validate.  First - how to 
> update the Scan API to support family and table level updates.  One proposal 
> would be to add Scan.setTimeRange(byte[] family, long minTime, long maxTime), 
> then store it in a Map<byte[], TimeRange>.  When executing the scan, if a 
> family has a specified TimeRange, then use it, otherwise fall back to using 
> the table level TimeRange.  Clients using the new API against old region 
> servers would not get the families correctly filtered.  Old clients sending 
> scans to new region servers would work correctly.
> The other question is how to get StoreFileScanner.shouldUseScanner to match 
> up the proper family and time range.  It has the Scan available but doesn't 
> currently have available which family it is a part of.  One option would be 
> to try to pass down the column family in each constructor path.  Another 
> would be to instead alter shouldUseScanner to pass down the specific 
> TimeRange to use (similar to how it currently passes down the columns to use 
> which also appears to be a workaround for not having the family available). 
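The per-family fallback proposed above can be sketched in plain Java. This is only an illustration of the lookup logic; the class and method names (`FamilyTimeRanges`, `resolve`) are hypothetical and not part of any patch:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed per-family time range lookup:
// a family-specific [min, max) range wins; otherwise the scan-wide
// (table level) range is used. Not the actual HBase Scan API.
public class FamilyTimeRanges {
    private final long[] tableRange;                       // scan-wide default
    private final Map<String, long[]> familyRanges = new HashMap<>();

    public FamilyTimeRanges(long min, long max) {
        this.tableRange = new long[] { min, max };
    }

    // Analogous to the proposed Scan.setTimeRange(byte[] family, min, max).
    public void setTimeRange(String family, long min, long max) {
        familyRanges.put(family, new long[] { min, max });
    }

    // Used when executing the scan: per-family range if one was set,
    // else fall back to the table-level range.
    public long[] resolve(String family) {
        return familyRanges.getOrDefault(family, tableRange);
    }
}
```

The same shape (a map keyed by family, consulted with a default) is what the `Map<byte[], TimeRange>` proposal in the description amounts to.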



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14782) FuzzyRowFilter skips valid rows

2015-11-12 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001848#comment-15001848
 ] 

Heng Chen commented on HBASE-14782:
---

Thanks [~vrodionov] for your test code.

The reason can be seen in the patch:
{code}
-// NOT FOUND -> seek next using hint
+// NOT FOUND -> it means this row has been passed, so we jump to next row
 lastFoundIndex = -1;
-return ReturnCode.SEEK_NEXT_USING_HINT;
+return ReturnCode.NEXT_ROW;
{code}

FuzzyRowFilter should jump to the next row if the current row does not match. 
Currently, if there is no match, FuzzyRowFilter will always return 
SEEK_NEXT_USING_HINT.

I am not sure what the difference is between StoreScanner.seekAsDirection and 
StoreScanner.seekToNextRow, but currently, if we go down the 
StoreScanner.seekAsDirection path (FuzzyRowFilter returns SEEK_NEXT_USING_HINT), 
StoreScanner.heap.peek() will return null, so the heap will be set to null in 
StoreScanner.close.
The related code in StoreScanner.next is shown below:
{code}
LOOP: do {
 ..
ScanQueryMatcher.MatchCode qcode = matcher.match(cell);
qcode = optimize(qcode, cell);
switch(qcode) {
 ...
case SEEK_NEXT_ROW:
  // This is just a relatively simple end of scan fix, to short-cut end
  // us if there is an endKey in the scan.
  if (!matcher.moreRowsMayExistAfter(cell)) {
return 
scannerContext.setScannerState(NextState.NO_MORE_VALUES).hasMoreValues();
  }
  seekToNextRow(cell);
  break;
   
case SEEK_NEXT_USING_HINT:
  Cell nextKV = matcher.getNextKeyHint(cell);
  if (nextKV != null) {
seekAsDirection(nextKV);
  } else {
heap.next();
  }
  break;
default:
  throw new RuntimeException("UNEXPECTED");
}
  } while((cell = this.heap.peek()) != null);

  if (count > 0) {
return 
scannerContext.setScannerState(NextState.MORE_VALUES).hasMoreValues();
  }
  close(false); // heap will be set to null, which causes the remaining rows
  // not to be processed.
  return 
scannerContext.setScannerState(NextState.NO_MORE_VALUES).hasMoreValues();
{code}








> FuzzyRowFilter skips valid rows
> ---
>
> Key: HBASE-14782
> URL: https://issues.apache.org/jira/browse/HBASE-14782
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Attachments: HBASE-14782.patch
>
>
> The issue may affect not only master branch, but previous releases as well.
> This is from one of our customers:
> {quote}
> We are experiencing a problem with the FuzzyRowFilter for HBase scan. We 
> think that it is a bug. 
> Fuzzy filter should pick a row if it matches the filter criteria, irrespective of 
> other rows present in the table, but the filter is dropping a row depending on some 
> other row present in the table. 
> Details/Step to reproduce/Sample outputs below: 
> Missing row key: \x9C\x00\x044\x00\x00\x00\x00 
> Causing row key: \x9C\x00\x03\xE9e\xBB{X\x1Fwts\x1F\x15vRX 
> Prerequisites 
> 1. Create a test table. HBase shell command -- create 'fuzzytest','d' 
> 2. Insert some test data. HBase shell commands: 
> • put 'fuzzytest',"\x9C\x00\x044\x00\x00\x00\x00",'d:a','junk' 
> • put 'fuzzytest',"\x9C\x00\x044\x01\x00\x00\x00",'d:a','junk' 
> • put 'fuzzytest',"\x9C\x00\x044\x00\x01\x00\x00",'d:a','junk' 
> • put 'fuzzytest',"\x9C\x00\x044\x00\x00\x01\x00",'d:a','junk' 
> • put 'fuzzytest',"\x9C\x00\x044\x00\x01\x00\x01",'d:a','junk' 
> • put 'fuzzytest',"\x9B\x00\x044e\xBB\xB2\xBB",'d:a','junk' 
> • put 'fuzzytest',"\x9D\x00\x044e\xBB\xB2\xBB",'d:a','junk' 
> Now when you run the code, you will find \x9C\x00\x044\x00\x00\x00\x00 in 
> output because it matches filter criteria. (Refer how to run code below) 
> Insert the row key causing bug: 
> HBase shell command: put 
> 'fuzzytest',"\x9C\x00\x03\xE9e\xBB{X\x1Fwts\x1F\x15vRX",'d:a','junk' 
> Now when you run the code, you will not find \x9C\x00\x044\x00\x00\x00\x00 in 
> output even though it still matches filter criteria. 
> {quote}
> Verified the issue on master.
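For reference, the matching contract the report relies on can be sketched in plain Java: a fuzzy (pattern, mask) pair where a 0 mask byte means the position must match and a non-zero byte means "don't care". This is a simplified illustration, not the HBase implementation:

```java
// Simplified, hypothetical sketch of fuzzy row matching semantics:
// mask[i] == 0 -> row[i] must equal pattern[i]; otherwise "don't care".
// The key point of the bug report: this check depends only on the row
// itself, so whether a row is emitted should never depend on which other
// rows happen to exist in the table.
public class FuzzyMatchSketch {
    public static boolean matches(byte[] row, byte[] pattern, byte[] mask) {
        if (row.length < pattern.length) {
            return false;
        }
        for (int i = 0; i < pattern.length; i++) {
            if (mask[i] == 0 && row[i] != pattern[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Under this contract, whether \x9C\x00\x044\x00\x00\x00\x00 is returned cannot legitimately depend on the presence of the "causing" row, which is why the skipping behavior points at the scanner's seeking rather than at the match itself.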



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14498) Master stuck in infinite loop when all Zookeeper servers are unreachable

2015-11-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003226#comment-15003226
 ] 

Hudson commented on HBASE-14498:


FAILURE: Integrated in HBase-1.1-JDK8 #1681 (See 
[https://builds.apache.org/job/HBase-1.1-JDK8/1681/])
HBASE-14498 Master stuck in infinite loop when all Zookeeper servers (tedyu: 
rev 1ed7e7111cedbb348418bcfeaa02428cace69b74)
* 
hbase-client/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperWatcher.java
* 
hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
HBASE-14498 Revert for on-going review (tedyu: rev 
520a7b325d6d72dc4a05a0f34616c1b801542103)
* 
hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
* 
hbase-client/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperWatcher.java


> Master stuck in infinite loop when all Zookeeper servers are unreachable
> 
>
> Key: HBASE-14498
> URL: https://issues.apache.org/jira/browse/HBASE-14498
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Reporter: Y. SREENIVASULU REDDY
>Assignee: Pankaj Kumar
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.4
>
> Attachments: HBASE-14498-V2.patch, HBASE-14498-V3.patch, 
> HBASE-14498-V4.patch, HBASE-14498.patch
>
>

[jira] [Commented] (HBASE-14803) Add some debug logs to StoreFileScanner

2015-11-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003225#comment-15003225
 ] 

Hadoop QA commented on HBASE-14803:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12772060/HBASE-14803.v0-trunk.patch
  against master branch at commit 789f8a5a70242c16ce10bc95401c51c7d04debfa.
  ATTACHMENT ID: 12772060

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
1730 checkstyle errors (more than the master's current 1727 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.regionserver.TestStoreFile

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16503//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16503//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16503//artifact/patchprocess/checkstyle-aggregate.html

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16503//console

This message is automatically generated.

> Add some debug logs to StoreFileScanner
> ---
>
> Key: HBASE-14803
> URL: https://issues.apache.org/jira/browse/HBASE-14803
> Project: HBase
>  Issue Type: Bug
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Minor
>  Labels: beginner
> Fix For: 1.2.0
>
> Attachments: HBASE-14803.v0-trunk.patch
>
>
> To validate some behaviors I had to add some logs to StoreFileScanner.
> I think they can be interesting for other people doing debugging, so I'm 
> sharing the modifications here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14498) Master stuck in infinite loop when all Zookeeper servers are unreachable

2015-11-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003224#comment-15003224
 ] 

Hudson commented on HBASE-14498:


SUCCESS: Integrated in HBase-1.2-IT #280 (See 
[https://builds.apache.org/job/HBase-1.2-IT/280/])
HBASE-14498 Revert for on-going review (tedyu: rev 
1db8abf707b760f588932a9ca137c4d9d96e3ab1)
* 
hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
* 
hbase-client/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperWatcher.java


> Master stuck in infinite loop when all Zookeeper servers are unreachable
> 
>
> Key: HBASE-14498
> URL: https://issues.apache.org/jira/browse/HBASE-14498
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Reporter: Y. SREENIVASULU REDDY
>Assignee: Pankaj Kumar
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.4
>
> Attachments: HBASE-14498-V2.patch, HBASE-14498-V3.patch, 
> HBASE-14498-V4.patch, HBASE-14498.patch
>
>

[jira] [Commented] (HBASE-14800) Expose checkAndMutate via Thrift2

2015-11-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003278#comment-15003278
 ] 

Hadoop QA commented on HBASE-14800:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12772050/HBASE-14800.001.patch
  against master branch at commit 789f8a5a70242c16ce10bc95401c51c7d04debfa.
  ATTACHMENT ID: 12772050

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
1729 checkstyle errors (more than the master's current 1727 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+lastComparison = 
Boolean.valueOf(isSetBloomFilterType()).compareTo(other.isSetBloomFilterType());
+lastComparison = 
Boolean.valueOf(isSetBloomFilterVectorSize()).compareTo(other.isSetBloomFilterVectorSize());
+lastComparison = 
Boolean.valueOf(isSetBloomFilterNbHashes()).compareTo(other.isSetBloomFilterNbHashes());
+lastComparison = 
Boolean.valueOf(isSetBlockCacheEnabled()).compareTo(other.isSetBlockCacheEnabled());
+  public AsyncMethodCallback getResultHandler(final AsyncFrameBuffer 
fb, final int seqid) {
+  public AsyncMethodCallback getResultHandler(final AsyncFrameBuffer 
fb, final int seqid) {
+  public AsyncMethodCallback getResultHandler(final 
AsyncFrameBuffer fb, final int seqid) {
+  public AsyncMethodCallback getResultHandler(final AsyncFrameBuffer 
fb, final int seqid) {
+  public AsyncMethodCallback getResultHandler(final AsyncFrameBuffer 
fb, final int seqid) {
+  public AsyncMethodCallback getResultHandler(final 
AsyncFrameBuffer fb, final int seqid) {

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16500//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16500//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16500//artifact/patchprocess/checkstyle-aggregate.html

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16500//console

This message is automatically generated.

> Expose checkAndMutate via Thrift2
> -
>
> Key: HBASE-14800
> URL: https://issues.apache.org/jira/browse/HBASE-14800
> Project: HBase
>  Issue Type: Improvement
>  Components: Thrift
>Reporter: Josh Elser
>Assignee: Josh Elser
> Fix For: 2.0.0
>
> Attachments: HBASE-14800.001.patch
>
>
> Had a user ask why checkAndMutate wasn't exposed via Thrift2.
> I see no good reason (since checkAndPut and checkAndDelete are already 
> there), so let's add it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14799) Commons-collections object deserialization remote command execution vulnerability

2015-11-12 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-14799:
---
Attachment: HBASE-14799-0.94.patch

> Commons-collections object deserialization remote command execution 
> vulnerability 
> --
>
> Key: HBASE-14799
> URL: https://issues.apache.org/jira/browse/HBASE-14799
> Project: HBase
>  Issue Type: Bug
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Critical
> Fix For: 0.94.28, 0.98.17
>
> Attachments: HBASE-14799-0.94.patch, HBASE-14799-0.94.patch, 
> HBASE-14799-0.98.patch
>
>
> Read: 
> http://foxglovesecurity.com/2015/11/06/what-do-weblogic-websphere-jboss-jenkins-opennms-and-your-application-have-in-common-this-vulnerability/
> TL;DR: If you have commons-collections on your classpath and accept and 
> process Java object serialization data, then you probably have an exploitable 
> remote command execution vulnerability. 
> 0.94 and earlier HBase releases are vulnerable because we might read in and 
> rehydrate serialized Java objects out of RPC packet data in 
> HbaseObjectWritable using ObjectInputStream#readObject (see 
> https://hbase.apache.org/0.94/xref/org/apache/hadoop/hbase/io/HbaseObjectWritable.html#714)
>  and we have commons-collections on the classpath on the server.
> 0.98 also carries some limited exposure to this problem through inclusion of 
> backwards compatible deserialization code in 
> HbaseObjectWritableFor96Migration. This is used by the 0.94-to-0.98 migration 
> utility, and by the AccessController when reading permissions from the ACL 
> table serialized in legacy format by 0.94. Unprivileged users cannot run the 
> tool nor access the ACL table.
> Unprivileged users can however attack a 0.94 installation. An attacker might 
> be able to use the method discussed on that blog post to capture valid HBase 
> RPC payloads for 0.94 and prior versions, rewrite them to embed an exploit, 
> and replay them to trigger a remote command execution with the privileges of 
> the account under which the HBase RegionServer daemon is running.
> We need to make a patch release of 0.94 that changes HbaseObjectWritable to 
> disallow processing of random Java object serializations. This will be a 
> compatibility break that might affect old style coprocessors, which quite 
> possibly may rely on this catch-all in HbaseObjectWritable for custom object 
> (de)serialization. We can introduce a new configuration setting, 
> "hbase.allow.legacy.object.serialization", defaulting to false.
> To be thorough, we can also use the new configuration setting  
> "hbase.allow.legacy.object.serialization" (defaulting to false) in 0.98 to 
> prevent the AccessController from falling back to the vulnerable legacy code. 
> This turns out to not affect the ability to migrate permissions because 
> TablePermission implements Writable (which is safe), not Serializable. 
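The mitigation described above, refusing arbitrary Java object serialization unless explicitly allowed, can be illustrated with a look-ahead `ObjectInputStream`. This is a generic defensive pattern, not the code from the HBASE-14799 patch:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectStreamClass;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Generic look-ahead deserialization sketch: reject any class that is not
// on an explicit allow list, before it is instantiated. This illustrates
// the idea behind removing the catch-all ObjectInputStream#readObject
// handling; the allow list here is purely illustrative.
public class AllowListObjectInputStream extends ObjectInputStream {
    private static final Set<String> ALLOWED =
        new HashSet<>(Arrays.asList("java.lang.String"));

    public AllowListObjectInputStream(InputStream in) throws IOException {
        super(in);
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        // Called before the object is built, so gadget chains such as the
        // commons-collections one are refused before any code can run.
        if (!ALLOWED.contains(desc.getName())) {
            throw new InvalidClassException(
                "Deserialization of " + desc.getName() + " is not allowed");
        }
        return super.resolveClass(desc);
    }
}
```

A config flag like the proposed "hbase.allow.legacy.object.serialization" would then simply decide whether such filtering (or the legacy catch-all path) is used.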



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14791) [0.98] CopyTable is extremely slow when moving delete markers

2015-11-12 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003336#comment-15003336
 ] 

Lars Hofhansl commented on HBASE-14791:
---

Looks good!

Two questions:
# Would subclassing (as opposed to the delegation used in the patch) save us a 
bunch of code?
# Would the change from HTable to HTableInterface break compatibility for folks 
subclassing TableOutputFormat? (This would also not be an issue if we do the 
subclassing from #1.)


> [0.98] CopyTable is extremely slow when moving delete markers
> -
>
> Key: HBASE-14791
> URL: https://issues.apache.org/jira/browse/HBASE-14791
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.16
>Reporter: Lars Hofhansl
>Assignee: Alex Araujo
> Attachments: HBASE-14791-0.98-v1.patch
>
>
> We found that some of our CopyTable jobs run for many hours, even when there 
> isn't that much data to copy.
> [~vik.karma] did his magic and found that the issue is with copying delete 
> markers (we use raw mode to also move deletes across).
> Looking at the code in 0.98, it's immediately obvious that deletes (unlike 
> puts) are not batched and hence are sent to the other side one by one, causing 
> a network RTT for each delete marker.
> It looks like trunk is doing the right thing (using BufferedMutator for all 
> mutations in TableOutputFormat), so this is likely only a 0.98 (and 1.0, 1.1, 
> 1.2?) issue.
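The cost difference the report describes, one round trip per delete versus batched writes, can be modelled with a toy write buffer. The names here are illustrative, not the HBase API; flushing once per mutation stands in for the 0.98 delete path, buffering for the trunk BufferedMutator behaviour:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of write buffering: each flush() stands in for one network
// round trip. A flush size of 1 models unbatched deletes (one RTT each);
// a larger flush size models BufferedMutator-style batching.
public class BufferedWriterSketch {
    private final List<String> buffer = new ArrayList<>();
    private final int flushSize;
    int flushCount = 0;   // counts simulated network round trips

    BufferedWriterSketch(int flushSize) {
        this.flushSize = flushSize;
    }

    void mutate(String mutation) {
        buffer.add(mutation);
        if (buffer.size() >= flushSize) {
            flush();
        }
    }

    void flush() {
        if (!buffer.isEmpty()) {
            buffer.clear();   // "send" the batch
            flushCount++;
        }
    }
}
```

With 100 delete markers, the unbatched path pays 100 round trips while the buffered path pays one, which is the gap the multi-hour CopyTable runs were hitting.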



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14802) Replaying server crash recovery procedure after a failover causes incorrect handling of deadservers

2015-11-12 Thread Ashu Pachauri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashu Pachauri updated HBASE-14802:
--
Attachment: HBASE-14802.patch

> Replaying server crash recovery procedure after a failover causes incorrect 
> handling of deadservers
> ---
>
> Key: HBASE-14802
> URL: https://issues.apache.org/jira/browse/HBASE-14802
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 2.0.0, 1.2.0, 1.2.1
>Reporter: Ashu Pachauri
>Assignee: Ashu Pachauri
> Attachments: HBASE-14802.patch
>
>
> The way dead servers are processed is that a ServerCrashProcedure is launched 
> for a server after it is added to the dead servers list. 
> Every time a server is added to the dead list, a counter "numProcessing" is 
> incremented, and it is decremented when a crash recovery procedure finishes. 
> Since adding a dead server and recovering it are two separate events, this can 
> cause inconsistencies.
> If a master failover occurs in the middle of the crash recovery, the 
> numProcessing counter resets, but the ServerCrashProcedure is replayed by the 
> new master. This causes the counter to go negative and makes the master think 
> that dead servers are still in the process of recovery. 
> This has ramifications for the balancer: it ceases to run after such a 
> failover.
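The counter inconsistency described above can be reproduced with a minimal model (field and method names are illustrative, not the actual master code):

```java
// Minimal model of the reported inconsistency: the counter is incremented
// when a server joins the dead list and decremented when its crash
// procedure finishes. A master failover resets the counter, but the
// replayed ServerCrashProcedure still decrements it, driving it negative,
// so the master keeps believing recovery is in progress and the balancer
// never runs again.
public class DeadServerCounterSketch {
    int numProcessing = 0;

    void serverDied()       { numProcessing++; }   // added to dead list
    void recoveryFinished() { numProcessing--; }   // crash procedure done
    void masterFailover()   { numProcessing = 0; } // new master starts fresh

    boolean deadServersInProgress() { return numProcessing != 0; }
}
```

The sequence serverDied -> masterFailover -> recoveryFinished leaves the counter at -1, which is exactly the "balancer ceases to run" state.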



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14771) RpcServer.getRemoteAddress always returns null.

2015-11-12 Thread Appy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003108#comment-15003108
 ] 

Appy commented on HBASE-14771:
--

[~a72877] Let's drive it to the end. We only need to copy-paste the code from 
your last comment.

> RpcServer.getRemoteAddress always returns null.
> ---
>
> Key: HBASE-14771
> URL: https://issues.apache.org/jira/browse/HBASE-14771
> Project: HBase
>  Issue Type: Bug
>  Components: IPC/RPC
>Affects Versions: 1.2.0
>Reporter: Abhishek Kumar
>Assignee: Abhishek Kumar
>Priority: Minor
> Attachments: HBASE-14771-V1.patch, HBASE-14771.patch
>
>
> RpcServer.getRemoteAddress always returns null, because the Call object is 
> initialized with a null address. This happens because the Call constructor 
> uses RpcServer.getRemoteIp() before the RpcServer thread-local 'CurCall' is 
> set in CallRunner.run:
> {noformat}
> // --- RpcServer.java ---
> protected void processRequest(byte[] buf) throws IOException, 
> InterruptedException {
>  .
> // Call object getting initialized here with address 
> // obtained from RpcServer.getRemoteIp()
> Call call = new Call(id, this.service, md, header, param, cellScanner, this, 
> responder,
>   totalRequestSize, traceInfo, RpcServer.getRemoteIp());
>   scheduler.dispatch(new CallRunner(RpcServer.this, call));
>  }
> // The getRemoteIp method reads the address from the thread-local 'CurCall',
> // which is set in CallRunner.run; calling it before that point, as above,
> // returns null
> // --- CallRunner.java ---
> public void run() {
>   .   
>   Pair resultPair = null;
>   RpcServer.CurCall.set(call);
>   ..
> }
> // Using 'this.addr' in place of getRemoteIp method in RpcServer.java seems 
> to be fixing this issue
> Call call = new Call(id, this.service, md, header, param, cellScanner, this, 
> responder,
>   totalRequestSize, traceInfo, this.addr);
> {noformat}
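The ordering bug quoted above can be reproduced with plain Java, independent of HBase: a value read from a thread-local inside a constructor is null when the thread-local is only populated later, while passing the already-known value directly works. The names below mirror the report (Call, CurCall) but this is a simplified sketch, not the real RpcServer.

```java
// Simplified sketch of the CurCall ordering bug (not the real RpcServer code).
public class ThreadLocalOrdering {
    static final ThreadLocal<String> CUR_CALL = new ThreadLocal<>();

    static class Call {
        final String remoteAddress;
        Call(String remoteAddress) { this.remoteAddress = remoteAddress; }
    }

    // Buggy: reads the thread-local before anyone has set it, so the
    // constructed Call captures null (as RpcServer.getRemoteIp() does here).
    static Call buggyProcessRequest() {
        return new Call(CUR_CALL.get()); // CUR_CALL not set yet -> null
    }

    // Fixed: pass the address the caller already holds (this.addr in the
    // report) instead of going through the not-yet-set thread-local.
    static Call fixedProcessRequest(String addr) {
        return new Call(addr);
    }
}
```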





[jira] [Commented] (HBASE-14367) Add normalization support to shell

2015-11-12 Thread Romil Choksi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003237#comment-15003237
 ] 

Romil Choksi commented on HBASE-14367:
--

I am trying to create a new table with NORMALIZATION_ENABLED set to true, but 
it seems the NORMALIZATION_ENABLED argument is ignored, and the attribute is 
not displayed when running desc on that table:

{code}
hbase(main):020:0> create 'test-table-4', 'cf', {NORMALIZATION_ENABLED => 
'true'}
An argument ignored (unknown or overridden): NORMALIZATION_ENABLED
0 row(s) in 4.2670 seconds

=> Hbase::Table - test-table-4
hbase(main):021:0> desc 'test-table-4'
Table test-table-4 is ENABLED   


test-table-4


COLUMN FAMILIES DESCRIPTION 


{NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', 
KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', 
COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOC
KCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}   


1 row(s) in 0.0430 seconds
{code}

However, running an alter command on the table does set the 
NORMALIZATION_ENABLED attribute:
{code}
hbase(main):022:0> alter 'test-table-4', {NORMALIZATION_ENABLED => 'true'}
Unknown argument ignored: NORMALIZATION_ENABLED
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 2.3640 seconds

hbase(main):023:0> desc 'test-table-4'
Table test-table-4 is ENABLED   


test-table-4, {TABLE_ATTRIBUTES => {NORMALIZATION_ENABLED => 'true'}


COLUMN FAMILIES DESCRIPTION 


{NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', 
KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', 
COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOC
KCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}   


1 row(s) in 0.0190 seconds
{code}

I think it would be better to have a single-step process that enables 
normalization while creating the table itself, rather than a two-step process 
of altering the table afterwards.

> Add normalization support to shell
> --
>
> Key: HBASE-14367
> URL: https://issues.apache.org/jira/browse/HBASE-14367
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer, shell
>Affects Versions: 1.1.2
>Reporter: Lars George
>Assignee: Mikhail Antonov
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: HBASE-14367-branch-1.2.v1.patch, 
> HBASE-14367-branch-1.2.v2.patch, HBASE-14367-branch-1.2.v3.patch, 
> HBASE-14367-branch-1.v1.patch, HBASE-14367-v1.patch, HBASE-14367.patch
>
>
> https://issues.apache.org/jira/browse/HBASE-13103 adds support for setting a 
> normalization flag per {{HTableDescriptor}}, along with the server side chore 
> to do the work.
> What is lacking is to easily set this from the shell, right now you need to 
> use the Java API to modify the descriptor. This issue is to add the flag as a 
> known attribute key and/or other means to toggle this per table.





[jira] [Updated] (HBASE-14803) Add some debug logs to StoreFileScanner

2015-11-12 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-14803:

Attachment: HBASE-14803.v1-trunk.patch

Good catch! Updated patch attached.

> Add some debug logs to StoreFileScanner
> ---
>
> Key: HBASE-14803
> URL: https://issues.apache.org/jira/browse/HBASE-14803
> Project: HBase
>  Issue Type: Bug
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Minor
>  Labels: beginner
> Fix For: 1.2.0
>
> Attachments: HBASE-14803.v0-trunk.patch, HBASE-14803.v1-trunk.patch
>
>
> To validate some behaviors I had to add some logs to StoreFileScanner.
> I think they can be useful to others when debugging, so I'm sharing the 
> modifications here.





[jira] [Assigned] (HBASE-14804) HBase shell's create table command ignores 'NORMALIZATION_ENABLED' attribute

2015-11-12 Thread Appy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Appy reassigned HBASE-14804:


Assignee: Appy

> HBase shell's create table command ignores 'NORMALIZATION_ENABLED' attribute
> 
>
> Key: HBASE-14804
> URL: https://issues.apache.org/jira/browse/HBASE-14804
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Affects Versions: 1.1.2
>Reporter: Romil Choksi
>Assignee: Appy
>
> I am trying to create a new table with NORMALIZATION_ENABLED set to true, but 
> it seems the NORMALIZATION_ENABLED argument is ignored, and the attribute is 
> not displayed when running desc on that table:
> {code}
> hbase(main):020:0> create 'test-table-4', 'cf', {NORMALIZATION_ENABLED => 
> 'true'}
> An argument ignored (unknown or overridden): NORMALIZATION_ENABLED
> 0 row(s) in 4.2670 seconds
> => Hbase::Table - test-table-4
> hbase(main):021:0> desc 'test-table-4'
> Table test-table-4 is ENABLED 
>   
> 
> test-table-4  
>   
> 
> COLUMN FAMILIES DESCRIPTION   
>   
> 
> {NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', 
> KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 
> 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOC
> KCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} 
>   
> 
> 1 row(s) in 0.0430 seconds
> {code}
> However, running an alter command on the table does set the 
> NORMALIZATION_ENABLED attribute:
> {code}
> hbase(main):022:0> alter 'test-table-4', {NORMALIZATION_ENABLED => 'true'}
> Unknown argument ignored: NORMALIZATION_ENABLED
> Updating all regions with the new schema...
> 1/1 regions updated.
> Done.
> 0 row(s) in 2.3640 seconds
> hbase(main):023:0> desc 'test-table-4'
> Table test-table-4 is ENABLED 
>   
> 
> test-table-4, {TABLE_ATTRIBUTES => {NORMALIZATION_ENABLED => 'true'}  
>   
> 
> COLUMN FAMILIES DESCRIPTION   
>   
> 
> {NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', 
> KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 
> 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOC
> KCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} 
>   
> 
> 1 row(s) in 0.0190 seconds
> {code}
> I think it would be better to have a single-step process that enables 
> normalization while creating the table itself, rather than a two-step process 
> of altering the table afterwards.





[jira] [Commented] (HBASE-14172) Upgrade existing thrift binding using thrift 0.9.2 compiler.

2015-11-12 Thread Pankaj Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003388#comment-15003388
 ] 

Pankaj Kumar commented on HBASE-14172:
--

Shall we upgrade to Thrift v0.9.3? 

> Upgrade existing thrift binding using thrift 0.9.2 compiler.
> 
>
> Key: HBASE-14172
> URL: https://issues.apache.org/jira/browse/HBASE-14172
> Project: HBase
>  Issue Type: Improvement
>Reporter: Srikanth Srungarapu
>Priority: Minor
> Attachments: HBASE-14172-branch-1.patch, HBASE-14172.patch
>
>






[jira] [Created] (HBASE-14802) Replaying server crash recovery procedure after a failover causes incorrect handling of deadservers

2015-11-12 Thread Ashu Pachauri (JIRA)
Ashu Pachauri created HBASE-14802:
-

 Summary: Replaying server crash recovery procedure after a 
failover causes incorrect handling of deadservers
 Key: HBASE-14802
 URL: https://issues.apache.org/jira/browse/HBASE-14802
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 2.0.0, 1.2.0, 1.2.1
Reporter: Ashu Pachauri
Assignee: Ashu Pachauri


Dead servers are processed by launching a ServerCrashProcedure for a server 
after it is added to the dead servers list. 
Every time a server is added to the dead list, a counter "numProcessing" is 
incremented, and it is decremented when a crash recovery procedure finishes. 
Since adding a dead server and recovering it are two separate events, 
inconsistencies can arise.

If a master failover occurs in the middle of crash recovery, the 
numProcessing counter resets but the ServerCrashProcedure is replayed by the 
new master. This causes the counter to go negative and makes the master think 
that dead servers are still in the process of recovery. 
As a consequence, the balancer ceases to run after such a failover.





[jira] [Commented] (HBASE-14803) Add some debug logs to StoreFileScanner

2015-11-12 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003098#comment-15003098
 ] 

Jerry He commented on HBASE-14803:
--

+1
The scanner implementations lack debug logging overall, which makes things 
harder for new folks to master.

> Add some debug logs to StoreFileScanner
> ---
>
> Key: HBASE-14803
> URL: https://issues.apache.org/jira/browse/HBASE-14803
> Project: HBase
>  Issue Type: Bug
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Minor
>  Labels: beginner
> Fix For: 1.2.0
>
> Attachments: HBASE-14803.v0-trunk.patch
>
>
> To validate some behaviors I had to add some logs to StoreFileScanner.
> I think they can be useful to others when debugging, so I'm sharing the 
> modifications here.





[jira] [Commented] (HBASE-14498) Master stuck in infinite loop when all Zookeeper servers are unreachable

2015-11-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003134#comment-15003134
 ] 

Hudson commented on HBASE-14498:


SUCCESS: Integrated in HBase-1.3-IT #308 (See 
[https://builds.apache.org/job/HBase-1.3-IT/308/])
HBASE-14498 Revert for on-going review (tedyu: rev 
3e551ea538dc1f9dd5ae0ce53900c1e57a53acdb)
* 
hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
* 
hbase-client/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperWatcher.java


> Master stuck in infinite loop when all Zookeeper servers are unreachable
> 
>
> Key: HBASE-14498
> URL: https://issues.apache.org/jira/browse/HBASE-14498
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Reporter: Y. SREENIVASULU REDDY
>Assignee: Pankaj Kumar
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.4
>
> Attachments: HBASE-14498-V2.patch, HBASE-14498-V3.patch, 
> HBASE-14498-V4.patch, HBASE-14498.patch
>
>
> We met a weird scenario in our production environment.
> In an HA cluster,
> > Active Master (HM1) is not able to connect to any Zookeeper server (due to 
> > N/w breakdown on master machine network with Zookeeper servers).
> {code}
> 2015-09-26 15:24:47,508 INFO 
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host:2181)] 
> zookeeper.ClientCnxn: Client session timed out, have not heard from server in 
> 33463ms for sessionid 0x104576b8dda0002, closing socket connection and 
> attempting reconnect
> 2015-09-26 15:24:47,877 INFO 
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host1:2181)] 
> client.FourLetterWordMain: connecting to ZK-Host1 2181
> 2015-09-26 15:24:48,236 INFO [main-SendThread(ZK-Host1:2181)] 
> client.FourLetterWordMain: connecting to ZK-Host1 2181
> 2015-09-26 15:24:49,879 WARN 
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Can not get the principle name from server ZK-Host1
> 2015-09-26 15:24:49,879 INFO 
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Opening socket connection to server 
> ZK-Host1/ZK-IP1:2181. Will not attempt to authenticate using SASL (unknown 
> error)
> 2015-09-26 15:24:50,238 WARN [main-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Can not get the principle name from server ZK-Host1
> 2015-09-26 15:24:50,238 INFO [main-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Opening socket connection to server 
> ZK-Host1/ZK-Host1:2181. Will not attempt to authenticate using SASL (unknown 
> error)
> 2015-09-26 15:25:17,470 INFO [main-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Client session timed out, have not heard from server in 
> 30023ms for sessionid 0x2045762cc710006, closing socket connection and 
> attempting reconnect
> 2015-09-26 15:25:17,571 WARN [master/HM1-Host/HM1-IP:16000] 
> zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, 
> quorum=ZK-Host:2181,ZK-Host1:2181,ZK-Host2:2181, 
> exception=org.apache.zookeeper.KeeperException$ConnectionLossException: 
> KeeperErrorCode = ConnectionLoss for /hbase/master
> 2015-09-26 15:25:17,872 INFO [main-SendThread(ZK-Host:2181)] 
> client.FourLetterWordMain: connecting to ZK-Host 2181
> 2015-09-26 15:25:19,874 WARN [main-SendThread(ZK-Host:2181)] 
> zookeeper.ClientCnxn: Can not get the principle name from server ZK-Host
> 2015-09-26 15:25:19,874 INFO [main-SendThread(ZK-Host:2181)] 
> zookeeper.ClientCnxn: Opening socket connection to server ZK-Host/ZK-IP:2181. 
> Will not attempt to authenticate using SASL (unknown error)
> {code}
> > Since HM1 was not able to connect to any ZK server, the session timeout did 
> > not occur on the Zookeeper server side and HM1 did not abort.
> > On Zookeeper session timeout, the standby master (HM2) registered itself as 
> > the active master. 
> > HM2 keeps waiting for region servers to report in as part of active master 
> > initialization.
> {noformat} 
> 2015-09-26 15:24:44,928 | INFO | HM2-Host:21300.activeMasterManager | Waiting 
> for region servers count to settle; currently checked in 0, slept for 0 ms, 
> expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval 
> of 1500 ms. | 
> org.apache.hadoop.hbase.master.ServerManager.waitForRegionServers(ServerManager.java:1011)
> ---
> ---
> 2015-09-26 15:32:50,841 | INFO | HM2-Host:21300.activeMasterManager | Waiting 
> for region servers count to settle; currently checked in 0, slept for 483913 
> ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, 
> interval of 1500 ms. | 
> org.apache.hadoop.hbase.master.ServerManager.waitForRegionServers(ServerManager.java:1011)
> {noformat}
> > At the other end, region servers are reporting to HM1 at a 3-second 
> > interval. Here the region server retrieves the master 
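The logs above show the master reconnecting to ZooKeeper indefinitely without ever aborting. One generic way to avoid such a hang is to bound the retry loop so the process can fail fast and let the standby take over. This is an illustrative, hypothetical sketch, not the HBase master's actual retry logic:

```java
import java.util.function.BooleanSupplier;

// Hypothetical bounded-retry sketch: keep trying to connect, but give up
// after a fixed number of attempts so the caller can abort and fail over,
// instead of looping forever as in the scenario above.
public class BoundedRetry {
    public static boolean connectWithLimit(BooleanSupplier tryConnect,
                                           int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (tryConnect.getAsBoolean()) {
                return true; // connected
            }
        }
        return false; // caller should abort / let the standby master take over
    }
}
```

In a real system each attempt would also sleep with backoff between tries; that is omitted here to keep the control flow visible.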

[jira] [Commented] (HBASE-14803) Add some debug logs to StoreFileScanner

2015-11-12 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003269#comment-15003269
 ] 

Heng Chen commented on HBASE-14803:
---

{quote}
+  static final Log LOG = LogFactory.getLog(HStore.class);
{quote}

Why HStore.class, and not StoreFileScanner.class?
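The class passed to the logger factory becomes the logger's name, which is what log output and per-logger level configuration key on, so debug lines from StoreFileScanner would be attributed to HStore if the wrong class were used. The sketch below uses java.util.logging with hypothetical stand-in classes; the commons-logging LogFactory in the actual patch behaves analogously:

```java
import java.util.logging.Logger;

// The logger name is derived from the class you pass in, so messages from
// StoreFileScanner would carry HStore's name if the wrong class is used.
// HStore and StoreFileScanner here are empty stand-ins, not the HBase classes.
public class LoggerNaming {
    static class HStore {}
    static class StoreFileScanner {
        // Wrong: messages would be attributed to HStore.
        static final Logger WRONG = Logger.getLogger(HStore.class.getName());
        // Right: messages carry this class's own name.
        static final Logger LOG = Logger.getLogger(StoreFileScanner.class.getName());
    }
}
```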

> Add some debug logs to StoreFileScanner
> ---
>
> Key: HBASE-14803
> URL: https://issues.apache.org/jira/browse/HBASE-14803
> Project: HBase
>  Issue Type: Bug
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Minor
>  Labels: beginner
> Fix For: 1.2.0
>
> Attachments: HBASE-14803.v0-trunk.patch
>
>
> To validate some behaviors I had to add some logs to StoreFileScanner.
> I think they can be useful to others when debugging, so I'm sharing the 
> modifications here.





[jira] [Commented] (HBASE-14355) Scan different TimeRange for each column family

2015-11-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003284#comment-15003284
 ] 

Hadoop QA commented on HBASE-14355:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12772054/HBASE-14355.branch-1.patch
  against branch-1 branch at commit 789f8a5a70242c16ce10bc95401c51c7d04debfa.
  ATTACHMENT ID: 12772054

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 12 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
3774 checkstyle errors (more than the master's current 3773 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+  "ualifier\030\002 
\003(\014\"\271\003\n\003Get\022\013\n\003row\030\001 \002(\014\022 \n\006c" +
+  "ount\030\002 \001(\005\022\016\n\006exists\030\003 
\001(\010\022\024\n\005stale\030\004 \001(" +
+  "l\022\013\n\003row\030\001 \002(\014\022\024\n\014service_name\030\002 
\002(\t\022\023\n\013" +
+  new java.lang.String[] { "Row", "Column", "Attribute", "Filter", 
"TimeRange", "MaxVersions", "CacheBlocks", "StoreLimit", "StoreOffset", 
"ExistenceOnly", "ClosestRowBefore", "Consistency", "CfTimeRange", });
+  new java.lang.String[] { "Column", "Attribute", "StartRow", 
"StopRow", "Filter", "TimeRange", "MaxVersions", "CacheBlocks", "BatchSize", 
"MaxResultSize", "StoreLimit", "StoreOffset", "LoadColumnFamiliesOnDemand", 
"Small", "Reversed", "Consistency", "Caching", "AllowPartialResults", 
"CfTimeRange", });

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16502//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16502//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16502//artifact/patchprocess/checkstyle-aggregate.html

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16502//console

This message is automatically generated.

> Scan different TimeRange for each column family
> ---
>
> Key: HBASE-14355
> URL: https://issues.apache.org/jira/browse/HBASE-14355
> Project: HBase
>  Issue Type: New Feature
>  Components: Client, regionserver, Scanners
>Reporter: Dave Latham
>Assignee: churro morales
> Fix For: 2.0.0, 1.3.0, 0.98.17
>
> Attachments: HBASE-14355-v1.patch, HBASE-14355-v10.patch, 
> HBASE-14355-v11.patch, HBASE-14355-v2.patch, HBASE-14355-v3.patch, 
> HBASE-14355-v4.patch, HBASE-14355-v5.patch, HBASE-14355-v6.patch, 
> HBASE-14355-v7.patch, HBASE-14355-v8.patch, HBASE-14355-v9.patch, 
> HBASE-14355.branch-1.patch, HBASE-14355.patch
>
>
> At present the Scan API supports only table level time range. We have 
> specific use cases that will benefit from per column family time range. (See 
> background discussion at 
> https://mail-archives.apache.org/mod_mbox/hbase-user/201508.mbox/%3ccaa4mzom00ef5eoxstk0hetxeby8mqss61gbvgttgpaspmhq...@mail.gmail.com%3E)
> There are a couple of choices that would be good to validate.  First - how to 
> update the Scan API to support family and table level updates.  One proposal 
> would be to add Scan.setTimeRange(byte family, long minTime, long maxTime), 
> then store it in a Map.  When executing the scan, if a 
> family has a specified TimeRange, then use it, otherwise fall back to using 
> the table level TimeRange.  Clients using the new API against old region 
> servers would not get the families correctly filtered.  Old clients sending 
> scans to new region servers would work correctly.
> The other question is how to get StoreFileScanner.shouldUseScanner to match 
> up the 
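The fallback lookup proposed above (use the per-family TimeRange if one was set, otherwise fall back to the table-level range) is easy to model. This is an illustrative sketch of the lookup logic only; the class and method names are hypothetical simplifications (String keys instead of byte[]), not the actual Scan API:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model of the proposed fallback: a per-family time range wins,
// otherwise the table-level range applies. Not the actual Scan class.
public class FamilyTimeRanges {
    static class TimeRange {
        final long min, max;
        TimeRange(long min, long max) { this.min = min; this.max = max; }
        boolean contains(long ts) { return ts >= min && ts < max; }
    }

    private final TimeRange tableRange;
    private final Map<String, TimeRange> familyRanges = new HashMap<>();

    FamilyTimeRanges(long min, long max) {
        this.tableRange = new TimeRange(min, max);
    }

    // Analogous to the proposed Scan.setTimeRange(family, minTime, maxTime).
    void setTimeRange(String family, long min, long max) {
        familyRanges.put(family, new TimeRange(min, max));
    }

    // Per-family range if present, else fall back to the table-level range.
    TimeRange rangeFor(String family) {
        return familyRanges.getOrDefault(family, tableRange);
    }
}
```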

[jira] [Comment Edited] (HBASE-14799) Commons-collections object deserialization remote command execution vulnerability

2015-11-12 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003329#comment-15003329
 ] 

Andrew Purtell edited comment on HBASE-14799 at 11/13/15 1:10 AM:
--

I investigated the test failures and found some issues. 

The first is we never added efficient support for serializing our Pair type. We 
rely on generic object serialization for it. I fixed this problem. 
Unfortunately I cannot be 100% backwards compatible. We can't just whitelist 
Pair. A Pair can hold any other type of object. We get to see that we have a 
Pair, but not the types contained within until after deserialization, and 
that's too late. Therefore I've added a type code for Pair and special-case 
handling for it, like we do with List. Older peers will not understand this 
change. The APIs affected are HMasterInterface#getAlterStatus and 
HRegionInterface#bulkLoadHFiles. Sorry, this cannot be helped if we are to 
avoid the risk of exploit. Thankfully, only two APIs are affected, and they 
are not commonly used. To reiterate, 100% compatibility won't be possible. If 
that is required, then we must close this as Won't Fix. 

I also discovered we are generically serializing the java.lang.* types in some 
cases. However, we can handle the primitive types in a backwards-compatible 
way by simply unboxing, so I do this where we can. Newer peers will be able to 
communicate with older peers without issue. If older peers elect to send 
object-serialized primitives, though, newer peers will reject the message 
unless configured to accept legacy serialization. This is intended behavior.

I'm still working through 0.94 tests.
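The look-ahead problem described above (you learn an object is a Pair before you learn what it contains) is the general weakness of Java object serialization that whitelisting tries to mitigate. A common generic defense, sketched here for illustration (this is not the HBase patch), is to override ObjectInputStream#resolveClass: it is invoked for every class descriptor in the stream, so a non-whitelisted class is rejected before it can be instantiated.

```java
import java.io.*;
import java.util.*;

// Generic whitelisting ObjectInputStream (illustrative; not the HBase patch).
// resolveClass runs for each class encountered in the stream, so a class
// outside the whitelist is rejected before any of its code can execute.
public class WhitelistedObjectInputStream extends ObjectInputStream {
    private static final Set<String> ALLOWED = new HashSet<>(Arrays.asList(
        "java.util.ArrayList", "java.lang.Integer", "java.lang.Number"));

    public WhitelistedObjectInputStream(InputStream in) throws IOException {
        super(in);
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        if (!ALLOWED.contains(desc.getName())) {
            throw new InvalidClassException("Rejected class: " + desc.getName());
        }
        return super.resolveClass(desc);
    }

    // Helper: serialize then deserialize through the whitelist; returns the
    // deserialized object, or null if the stream was rejected.
    public static Object roundTrip(Object o) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(o);
            }
            try (ObjectInputStream ois = new WhitelistedObjectInputStream(
                     new ByteArrayInputStream(bos.toByteArray()))) {
                return ois.readObject();
            }
        } catch (InvalidClassException rejected) {
            return null; // class not on the whitelist
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Note that this technique only restricts which classes may be deserialized; it does not solve the Pair problem above, where a whitelisted container can still carry arbitrary contents unless those contents are also checked class by class.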


was (Author: apurtell):
I investigated the test failures and found some issues. 

The first is we never added efficient support for serializing our Pair type. We 
rely on generic object serialization for it. I fixed this problem. 
Unfortunately I cannot be 100% backwards compatible. We can't just whitelist 
Pair. A Pair can hold any other type of object. We get to see that we have a 
Pair, but not the types contained within until after deserialization, and 
that's too late. Therefore I've added a code for Pair and special case handling 
for it, like we do with List. Older peers will not understand this change. The 
APIs affected are HMasterInterface#getAlterStatus and 
HRegionInterface#bulkLoadHFiles. Sorry, cannot be helped and avoid risk of 
exploit. However, thankfully it's only two APIs that are not super commonly 
used. 

I also discovered we are generically serializing the java.lang.* types. However 
we will handle the primitive types in a backwards compatible way if we simply 
unbox, so I do this where we can. Newer peers will be able to communicate with 
older peers without issue. If older peers elect send object-serialized 
primitives, though, newer peers will reject the message unless configured to 
accept legacy serialization. This is intended behavior.

I'm still working through 0.94 tests.

> Commons-collections object deserialization remote command execution 
> vulnerability 
> --
>
> Key: HBASE-14799
> URL: https://issues.apache.org/jira/browse/HBASE-14799
> Project: HBase
>  Issue Type: Bug
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Critical
> Fix For: 0.94.28, 0.98.17
>
> Attachments: HBASE-14799-0.94.patch, HBASE-14799-0.98.patch
>
>
> Read: 
> http://foxglovesecurity.com/2015/11/06/what-do-weblogic-websphere-jboss-jenkins-opennms-and-your-application-have-in-common-this-vulnerability/
> TL;DR: If you have commons-collections on your classpath and accept and 
> process Java object serialization data, then you probably have an exploitable 
> remote command execution vulnerability. 
> 0.94 and earlier HBase releases are vulnerable because we might read in and 
> rehydrate serialized Java objects out of RPC packet data in 
> HbaseObjectWritable using ObjectInputStream#readObject (see 
> https://hbase.apache.org/0.94/xref/org/apache/hadoop/hbase/io/HbaseObjectWritable.html#714)
>  and we have commons-collections on the classpath on the server.
> 0.98 also carries some limited exposure to this problem through inclusion of 
> backwards compatible deserialization code in 
> HbaseObjectWritableFor96Migration. This is used by the 0.94-to-0.98 migration 
> utility, and by the AccessController when reading permissions from the ACL 
> table serialized in legacy format by 0.94. Unprivileged users cannot run the 
> tool nor access the ACL table.
> Unprivileged users can however attack a 0.94 installation. An attacker might 
> be able to use the method discussed on that blog post to capture valid HBase 
> RPC payloads for 0.94 and prior versions, rewrite them to embed an exploit, 
> and replay them to trigger a 

[jira] [Commented] (HBASE-14367) Add normalization support to shell

2015-11-12 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003332#comment-15003332
 ] 

Jean-Marc Spaggiari commented on HBASE-14367:
-

This is a defect. Please open a different JIRA. Should be pretty easy to fix.

> Add normalization support to shell
> --
>
> Key: HBASE-14367
> URL: https://issues.apache.org/jira/browse/HBASE-14367
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer, shell
>Affects Versions: 1.1.2
>Reporter: Lars George
>Assignee: Mikhail Antonov
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: HBASE-14367-branch-1.2.v1.patch, 
> HBASE-14367-branch-1.2.v2.patch, HBASE-14367-branch-1.2.v3.patch, 
> HBASE-14367-branch-1.v1.patch, HBASE-14367-v1.patch, HBASE-14367.patch
>
>
> https://issues.apache.org/jira/browse/HBASE-13103 adds support for setting a 
> normalization flag per {{HTableDescriptor}}, along with the server side chore 
> to do the work.
> What is lacking is to easily set this from the shell, right now you need to 
> use the Java API to modify the descriptor. This issue is to add the flag as a 
> known attribute key and/or other means to toggle this per table.





[jira] [Created] (HBASE-14804) HBase shell's create table command ignores 'NORMALIZATION_ENABLED' attribute

2015-11-12 Thread Romil Choksi (JIRA)
Romil Choksi created HBASE-14804:


 Summary: HBase shell's create table command ignores 
'NORMALIZATION_ENABLED' attribute
 Key: HBASE-14804
 URL: https://issues.apache.org/jira/browse/HBASE-14804
 Project: HBase
  Issue Type: Bug
  Components: shell
Affects Versions: 1.1.2
Reporter: Romil Choksi


I am trying to create a new table with NORMALIZATION_ENABLED set to true, but 
it seems the NORMALIZATION_ENABLED argument is ignored, and the attribute is 
not displayed when running desc on that table:
hbase(main):020:0> create 'test-table-4', 'cf', {NORMALIZATION_ENABLED => 
'true'}
An argument ignored (unknown or overridden): NORMALIZATION_ENABLED
0 row(s) in 4.2670 seconds

=> Hbase::Table - test-table-4
hbase(main):021:0> desc 'test-table-4'
Table test-table-4 is ENABLED   


test-table-4


COLUMN FAMILIES DESCRIPTION 


{NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', 
KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', 
COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOC
KCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}   


1 row(s) in 0.0430 seconds
However, running an alter command on the table does set the 
NORMALIZATION_ENABLED attribute:
hbase(main):022:0> alter 'test-table-4', {NORMALIZATION_ENABLED => 'true'}
Unknown argument ignored: NORMALIZATION_ENABLED
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 2.3640 seconds

hbase(main):023:0> desc 'test-table-4'
Table test-table-4 is ENABLED   


test-table-4, {TABLE_ATTRIBUTES => {NORMALIZATION_ENABLED => 'true'}


COLUMN FAMILIES DESCRIPTION 


{NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', 
KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', 
COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOC
KCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}   


1 row(s) in 0.0190 seconds
I think it would be better to have a single-step process that enables 
normalization while creating the table itself, rather than a two-step process 
of altering the table afterwards.





[jira] [Comment Edited] (HBASE-14791) [0.98] CopyTable is extremely slow when moving delete markers

2015-11-12 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003336#comment-15003336
 ] 

Lars Hofhansl edited comment on HBASE-14791 at 11/13/15 1:42 AM:
-

Looks good! [~alexaraujo]

Two questions:
# Would subclassing (as opposed to delegation as used in the patch) save us a 
bunch of code?
# Would the change from HTable to HTableInterface break compatibility for folks 
subclassing TableOutputFormat? (This would also not be an issue if we do the 
subclassing from #1.)



was (Author: lhofhansl):
Looks good!

Two questions:
# Would subclassing (as opposed to delegation as used in the patch) save us a 
bunch of code?
# Would the change from HTable to HTableInterface break compatibility for folks 
subclasses TableOutputFormat? (would not be an issue too if we do the 
subclassing of #1)


> [0.98] CopyTable is extremely slow when moving delete markers
> -
>
> Key: HBASE-14791
> URL: https://issues.apache.org/jira/browse/HBASE-14791
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.16
>Reporter: Lars Hofhansl
>Assignee: Alex Araujo
> Attachments: HBASE-14791-0.98-v1.patch
>
>
> We found that some of our copy table jobs run for many hours, even when there 
> isn't that much data to copy.
> [~vik.karma] did his magic and found that the issue is with copying delete 
> markers (we use raw mode to also move deletes across).
> Looking at the code in 0.98 it's immediately obvious that deletes (unlike 
> puts) are not batched and hence sent to the other side one by one, causing a 
> network RTT for each delete marker.
> Looks like in trunk it's doing the right thing (using BufferedMutators for 
> all mutations in TableOutputFormat). So likely only a 0.98 (and 1.0, 1.1, 
> 1.2?) issue.
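The round-trip arithmetic behind this report can be sketched in plain Java. The class below is a toy stand-in, not HBase's BufferedMutator or TableOutputFormat; the "RPC" is just a counter, and all names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the problem described above: in 0.98, each delete marker costs
// one network round trip, while trunk's TableOutputFormat buffers mutations
// (BufferedMutator) and ships them in batches. Nothing here is HBase API;
// the "RPC" is just a counter, and all names are invented for illustration.
class BatchingSketch {
    private final List<String> buffer = new ArrayList<>();
    private final int batchSize;
    int rpcCount = 0; // round trips actually issued

    BatchingSketch(int batchSize) {
        this.batchSize = batchSize;
    }

    void mutate(String mutation) {
        buffer.add(mutation);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        rpcCount++; // one round trip carries the whole batch
        buffer.clear();
    }

    // Unbatched (batchSize = 1) degenerates to one RTT per delete marker,
    // which is exactly the slowness reported for CopyTable with raw deletes.
    static int rpcsFor(int mutations, int batchSize) {
        BatchingSketch sink = new BatchingSketch(batchSize);
        for (int i = 0; i < mutations; i++) {
            sink.mutate("delete-" + i);
        }
        sink.flush();
        return sink.rpcCount;
    }
}
```

With a batch of 100, a million delete markers cost ten thousand round trips instead of a million, which is why batching all mutation types (not just puts) matters here.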



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14803) Add some debug logs to StoreFileScanner

2015-11-12 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-14803:

Status: Open  (was: Patch Available)

> Add some debug logs to StoreFileScanner
> ---
>
> Key: HBASE-14803
> URL: https://issues.apache.org/jira/browse/HBASE-14803
> Project: HBase
>  Issue Type: Bug
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Minor
>  Labels: beginner
> Fix For: 1.2.0
>
> Attachments: HBASE-14803.v0-trunk.patch
>
>
> To validate some behaviors I had to add some logs into StoreFileScanner.
> I think it can be interesting for other people looking for debugging. So 
> sharing the modifications here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14799) Commons-collections object deserialization remote command execution vulnerability

2015-11-12 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003329#comment-15003329
 ] 

Andrew Purtell commented on HBASE-14799:


I investigated the test failures and found some issues. 

The first is we never added efficient support for serializing our Pair type. We 
rely on generic object serialization for it. I fixed this problem. 
Unfortunately I cannot be 100% backwards compatible. We can't just whitelist 
Pair. A Pair can hold any other type of object. We get to see that we have a 
Pair, but not the types contained within until after deserialization, and 
that's too late. Therefore I've added a code for Pair and special case handling 
for it, like we do with List. Older peers will not understand this change. The 
APIs affected are HMasterInterface#getAlterStatus and 
HRegionInterface#bulkLoadHFiles. Sorry, this cannot be helped if we are to 
avoid the risk of exploit. Thankfully, however, it's only two APIs, and they 
are not super commonly used. 

I also discovered we are generically serializing the java.lang.* types. However 
we will handle the primitive types in a backwards compatible way if we simply 
unbox, so I do this where we can. Newer peers will be able to communicate with 
older peers without issue. If older peers elect to send object-serialized 
primitives, though, newer peers will reject the message unless configured to 
accept legacy serialization. This is intended behavior.

I'm still working through 0.94 tests.
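The "reject before deserialization, because afterward is too late" point above is the general look-ahead deserialization pattern. A minimal JDK-only sketch, assuming nothing about the actual HBASE-14799 patch (the whitelist contents here are illustrative only):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Set;

// Look-ahead deserialization: reject a class by name *before* any object is
// instantiated. This is the generic mitigation pattern for gadget-chain
// attacks, not the actual HBASE-14799 patch.
class WhitelistObjectInputStream extends ObjectInputStream {
    private static final Set<String> ALLOWED = new HashSet<>(Arrays.asList(
            "java.lang.Integer", "java.lang.Number")); // illustrative whitelist

    WhitelistObjectInputStream(InputStream in) throws IOException {
        super(in);
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        // Called while reading the class descriptor, before instantiation.
        if (!ALLOWED.contains(desc.getName())) {
            throw new InvalidClassException("Unauthorized class: " + desc.getName());
        }
        return super.resolveClass(desc);
    }

    private static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(o);
        oos.flush();
        return bos.toByteArray();
    }

    // A whitelisted type round-trips normally.
    static boolean acceptsInteger() {
        try {
            Object o = new WhitelistObjectInputStream(
                    new ByteArrayInputStream(serialize(42))).readObject();
            return Integer.valueOf(42).equals(o);
        } catch (Exception e) {
            return false;
        }
    }

    // Anything off the whitelist is rejected during class resolution.
    static boolean rejectsHashMap() {
        try {
            new WhitelistObjectInputStream(
                    new ByteArrayInputStream(serialize(new HashMap<String, String>())))
                    .readObject();
            return false;
        } catch (InvalidClassException expected) {
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
```

As noted above for Pair, this only works per class descriptor; a container type on the whitelist can still smuggle arbitrary element types, which is why Pair needed its own type code and special-case handling.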

> Commons-collections object deserialization remote command execution 
> vulnerability 
> --
>
> Key: HBASE-14799
> URL: https://issues.apache.org/jira/browse/HBASE-14799
> Project: HBase
>  Issue Type: Bug
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Critical
> Fix For: 0.94.28, 0.98.17
>
> Attachments: HBASE-14799-0.94.patch, HBASE-14799-0.98.patch
>
>
> Read: 
> http://foxglovesecurity.com/2015/11/06/what-do-weblogic-websphere-jboss-jenkins-opennms-and-your-application-have-in-common-this-vulnerability/
> TL;DR: If you have commons-collections on your classpath and accept and 
> process Java object serialization data, then you probably have an exploitable 
> remote command execution vulnerability. 
> 0.94 and earlier HBase releases are vulnerable because we might read in and 
> rehydrate serialized Java objects out of RPC packet data in 
> HbaseObjectWritable using ObjectInputStream#readObject (see 
> https://hbase.apache.org/0.94/xref/org/apache/hadoop/hbase/io/HbaseObjectWritable.html#714)
>  and we have commons-collections on the classpath on the server.
> 0.98 also carries some limited exposure to this problem through inclusion of 
> backwards compatible deserialization code in 
> HbaseObjectWritableFor96Migration. This is used by the 0.94-to-0.98 migration 
> utility, and by the AccessController when reading permissions from the ACL 
> table serialized in legacy format by 0.94. Unprivileged users cannot run the 
> tool nor access the ACL table.
> Unprivileged users can however attack a 0.94 installation. An attacker might 
> be able to use the method discussed on that blog post to capture valid HBase 
> RPC payloads for 0.94 and prior versions, rewrite them to embed an exploit, 
> and replay them to trigger a remote command execution with the privileges of 
> the account under which the HBase RegionServer daemon is running.
> We need to make a patch release of 0.94 that changes HbaseObjectWritable to 
> disallow processing of random Java object serializations. This will be a 
> compatibility break that might affect old style coprocessors, which quite 
> possibly may rely on this catch-all in HbaseObjectWritable for custom object 
> (de)serialization. We can introduce a new configuration setting, 
> "hbase.allow.legacy.object.serialization", defaulting to false.
> To be thorough, we can also use the new configuration setting  
> "hbase.allow.legacy.object.serialization" (defaulting to false) in 0.98 to 
> prevent the AccessController from falling back to the vulnerable legacy code. 
> This turns out to not affect the ability to migrate permissions because 
> TablePermission implements Writable, which is safe, not Serializable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14221) Reduce the number of time row comparison is done in a Scan

2015-11-12 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003338#comment-15003338
 ] 

Lars Hofhansl commented on HBASE-14221:
---

The LoserTree did not work out in all cases, but in HBASE-9969 Matt has an 
alternate implementation of KeyValueHeap, which I thought was nice for two 
reasons:
# it saves some compares, and
# the implementation is our own, so we can tweak it more later (it has always 
bothered me a bit that _the_ central data structure for HBase's mergesort is 
just the Java standard PriorityQueue :) )
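For readers unfamiliar with the structure being discussed: KeyValueHeap's merge over sorted scanners is essentially a k-way merge on java.util.PriorityQueue. A simplified sketch, with plain Integer iterators standing in for KeyValueScanner and all names invented:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Simplified picture of the KeyValueHeap idea: k sorted scanners merged
// through java.util.PriorityQueue, ordered by each scanner's current head.
class HeapMergeSketch {
    static List<Integer> merge(List<Iterator<Integer>> scanners) {
        PriorityQueue<PeekingIter> heap =
                new PriorityQueue<>(Comparator.comparingInt((PeekingIter p) -> p.head));
        for (Iterator<Integer> it : scanners) {
            if (it.hasNext()) {
                heap.add(new PeekingIter(it));
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            PeekingIter top = heap.poll();  // scanner with the smallest head
            out.add(top.head);
            if (top.advance()) {
                heap.add(top);              // re-heapify: O(log k) compares per cell
            }
        }
        return out;
    }

    // Wraps an iterator so its next element can be inspected without consuming it.
    static final class PeekingIter {
        final Iterator<Integer> it;
        int head;

        PeekingIter(Iterator<Integer> it) {
            this.it = it;
            this.head = it.next();
        }

        boolean advance() {
            if (!it.hasNext()) {
                return false;
            }
            head = it.next();
            return true;
        }
    }
}
```

The poll/re-offer cycle on every next() is where the per-cell compares come from, and owning that structure (as in HBASE-9969) is what opens the door to shaving them.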


> Reduce the number of time row comparison is done in a Scan
> --
>
> Key: HBASE-14221
> URL: https://issues.apache.org/jira/browse/HBASE-14221
> Project: HBase
>  Issue Type: Sub-task
>  Components: Scanners
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: 14221-0.98-takeALook.txt, HBASE-14221.patch, 
> HBASE-14221_1.patch, HBASE-14221_1.patch, HBASE-14221_6.patch, 
> withmatchingRowspatch.png, withoutmatchingRowspatch.png
>
>
> When we tried to do some profiling with the PE tool, we found this.
> Currently we do row comparisons in 3 places in a simple Scan case.
> 1) ScanQueryMatcher
> {code}
>int ret = this.rowComparator.compareRows(curCell, cell);
> if (!this.isReversed) {
>   if (ret <= -1) {
> return MatchCode.DONE;
>   } else if (ret >= 1) {
> // could optimize this, if necessary?
> // Could also be called SEEK_TO_CURRENT_ROW, but this
> // should be rare/never happens.
> return MatchCode.SEEK_NEXT_ROW;
>   }
> } else {
>   if (ret <= -1) {
> return MatchCode.SEEK_NEXT_ROW;
>   } else if (ret >= 1) {
> return MatchCode.DONE;
>   }
> }
> {code}
> 2) In StoreScanner next() while starting to scan the row
> {code}
> if (!scannerContext.hasAnyLimit(LimitScope.BETWEEN_CELLS) || 
> matcher.curCell == null ||
> isNewRow || !CellUtil.matchingRow(peeked, matcher.curCell)) {
>   this.countPerRow = 0;
>   matcher.setToNewRow(peeked);
> }
> {code}
> Particularly to see if we are in a new row.
> 3) In HRegion
> {code}
>   scannerContext.setKeepProgress(true);
>   heap.next(results, scannerContext);
>   scannerContext.setKeepProgress(tmpKeepProgress);
>   nextKv = heap.peek();
> moreCellsInRow = moreCellsInRow(nextKv, currentRowCell);
> {code}
> Here again there are cases where we need to be careful for a MultiCF case. I 
> was trying to solve this for the MultiCF case, but it has a lot of cases to 
> solve. But at least for a single CF case I think these comparisons can be 
> reduced. 
> So for a single CF case, in the SQM we are able to find if we have crossed a 
> row using the code pasted above. That comparison is definitely needed.
> Now in case of a single CF the HRegion is going to have only one element in 
> the heap and so the 3rd comparison can surely be avoided if the 
> StoreScanner.next() was over due to MatchCode.DONE caused by SQM.
> Coming to the 2nd compareRows that we do in StoreScanner.next() - even that 
> can be avoided if we know that the previous next() call was over due to a new 
> row. Doing all this I found that compareRows, which was 19% in the profiler, 
> got reduced to 13%. Initially we can solve for the single CF case, which can 
> be extended to MultiCF cases.
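The single-CF saving described above can be illustrated with a toy sketch: the matcher already compares rows, so the caller can reuse that verdict instead of issuing a second compareRows on the same cell pair. Strings stand in for cells; every name here is invented, not HBase API.

```java
// Toy illustration of the proposal: ScanQueryMatcher (SQM) already compares
// rows, so StoreScanner/HRegion can reuse that verdict rather than re-compare.
class RowCompareSketch {
    static int compares = 0;

    // Stand-in for CellComparator#compareRows; counts invocations.
    static boolean sameRow(String a, String b) {
        compares++;
        return a.equals(b);
    }

    // Current shape: matcher compares, then the caller compares again.
    static int scanComparingTwice(String[] cells) {
        compares = 0;
        for (int i = 1; i < cells.length; i++) {
            boolean newRow = !sameRow(cells[i - 1], cells[i]);      // in SQM
            boolean callerCheck = !sameRow(cells[i - 1], cells[i]); // in StoreScanner
        }
        return compares;
    }

    // Proposed shape: the matcher's verdict is propagated, halving the compares.
    static int scanReusingVerdict(String[] cells) {
        compares = 0;
        for (int i = 1; i < cells.length; i++) {
            boolean newRow = !sameRow(cells[i - 1], cells[i]);      // in SQM
            boolean callerCheck = newRow;                           // no re-compare
        }
        return compares;
    }
}
```

This only models the easy single-CF case; with multiple column families the heap holds several scanners and the verdict cannot be propagated so directly, which matches the difficulty described above.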



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HBASE-14799) Commons-collections object deserialization remote command execution vulnerability

2015-11-12 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003329#comment-15003329
 ] 

Andrew Purtell edited comment on HBASE-14799 at 11/13/15 1:11 AM:
--

I investigated the test failures and found some issues. 

The first is we never added efficient support for serializing our Pair type. We 
rely on generic object serialization for it. I fixed this problem. 
Unfortunately I cannot be 100% backwards compatible. We can't just whitelist 
Pair. A Pair can hold any other type of object. We get to see that we have a 
Pair, but not the types contained within until after deserialization, and 
that's too late. Therefore I've added a code for Pair and special case handling 
for it, like we do with List. Older peers will not understand this change. The 
APIs affected are HMasterInterface#getAlterStatus and 
HRegionInterface#bulkLoadHFiles. Sorry, this cannot be helped if we are to 
avoid the risk of exploit. Thankfully, however, it's only two APIs, and they are not super commonly 
used. To reiterate, 100% compatibility won't be possible. If that is required, 
then we must close this as Wont Fix. 

I also discovered we are generically serializing the java.lang.* types in some 
cases. However we will handle the primitive types in a backwards compatible way 
if we simply unbox, so I do this where we can. Newer peers will be able to 
communicate with older peers without issue. If older peers elect to send 
object-serialized boxed types instead of primitives, newer peers will reject 
the message unless configured to accept legacy serialization. This is intended 
behavior.

I'm still working through 0.94 tests.


was (Author: apurtell):
I investigated the test failures and found some issues. 

The first is we never added efficient support for serializing our Pair type. We 
rely on generic object serialization for it. I fixed this problem. 
Unfortunately I cannot be 100% backwards compatible. We can't just whitelist 
Pair. A Pair can hold any other type of object. We get to see that we have a 
Pair, but not the types contained within until after deserialization, and 
that's too late. Therefore I've added a code for Pair and special case handling 
for it, like we do with List. Older peers will not understand this change. The 
APIs affected are HMasterInterface#getAlterStatus and 
HRegionInterface#bulkLoadHFiles. Sorry, cannot be helped and avoid risk of 
exploit. However, thankfully it's only two APIs that are not super commonly 
used. To reiterate, 100% compatibility won't be possible. If that is required, 
then we must close this as Wont Fix. 

I also discovered we are generically serializing the java.lang.* types in some 
cases. However we will handle the primitive types in a backwards compatible way 
if we simply unbox, so I do this where we can. Newer peers will be able to 
communicate with older peers without issue. If older peers elect to send 
object-serialized primitives, though, newer peers will reject the message 
unless configured to accept legacy serialization. This is intended behavior.

I'm still working through 0.94 tests.

> Commons-collections object deserialization remote command execution 
> vulnerability 
> --
>
> Key: HBASE-14799
> URL: https://issues.apache.org/jira/browse/HBASE-14799
> Project: HBase
>  Issue Type: Bug
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Critical
> Fix For: 0.94.28, 0.98.17
>
> Attachments: HBASE-14799-0.94.patch, HBASE-14799-0.98.patch
>
>
> Read: 
> http://foxglovesecurity.com/2015/11/06/what-do-weblogic-websphere-jboss-jenkins-opennms-and-your-application-have-in-common-this-vulnerability/
> TL;DR: If you have commons-collections on your classpath and accept and 
> process Java object serialization data, then you probably have an exploitable 
> remote command execution vulnerability. 
> 0.94 and earlier HBase releases are vulnerable because we might read in and 
> rehydrate serialized Java objects out of RPC packet data in 
> HbaseObjectWritable using ObjectInputStream#readObject (see 
> https://hbase.apache.org/0.94/xref/org/apache/hadoop/hbase/io/HbaseObjectWritable.html#714)
>  and we have commons-collections on the classpath on the server.
> 0.98 also carries some limited exposure to this problem through inclusion of 
> backwards compatible deserialization code in 
> HbaseObjectWritableFor96Migration. This is used by the 0.94-to-0.98 migration 
> utility, and by the AccessController when reading permissions from the ACL 
> table serialized in legacy format by 0.94. Unprivileged users cannot run the 
> tool nor access the ACL table.
> Unprivileged users can however attack a 0.94 installation. An attacker might 
> be able to use the method discussed on 

[jira] [Comment Edited] (HBASE-14799) Commons-collections object deserialization remote command execution vulnerability

2015-11-12 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003329#comment-15003329
 ] 

Andrew Purtell edited comment on HBASE-14799 at 11/13/15 1:11 AM:
--

I investigated the test failures and found some issues. 

The first is we never added efficient support for serializing our Pair type. We 
rely on generic object serialization for it. I fixed this problem. 
Unfortunately I cannot be 100% backwards compatible. We can't just whitelist 
Pair. A Pair can hold any other type of object. We get to see that we have a 
Pair, but not the types contained within until after deserialization, and 
that's too late. Therefore I've added a code for Pair and special case handling 
for it, like we do with List. Older peers will not understand this change. The 
APIs affected are HMasterInterface#getAlterStatus and 
HRegionInterface#bulkLoadHFiles. Sorry, this cannot be helped if we are to 
avoid the risk of exploit. Thankfully, however, it's only two APIs, and they are not super commonly 
used. To reiterate, 100% compatibility won't be possible. If that is required, 
then we must close this as Wont Fix. 

I also discovered we are generically serializing the java.lang.* types in some 
cases. However we will handle the primitive types in a backwards compatible way 
if we simply unbox, so I do this where we can. Newer peers will be able to 
communicate with older peers without issue. If older peers elect to send 
object-serialized primitives, though, newer peers will reject the message 
unless configured to accept legacy serialization. This is intended behavior.

I'm still working through 0.94 tests.


was (Author: apurtell):
I investigated the test failures and found some issues. 

The first is we never added efficient support for serializing our Pair type. We 
rely on generic object serialization for it. I fixed this problem. 
Unfortunately I cannot be 100% backwards compatible. We can't just whitelist 
Pair. A Pair can hold any other type of object. We get to see that we have a 
Pair, but not the types contained within until after deserialization, and 
that's too late. Therefore I've added a code for Pair and special case handling 
for it, like we do with List. Older peers will not understand this change. The 
APIs affected are HMasterInterface#getAlterStatus and 
HRegionInterface#bulkLoadHFiles. Sorry, cannot be helped and avoid risk of 
exploit. However, thankfully it's only two APIs that are not super commonly 
used. To reiterate, 100% compatibility won't be possible. If that is required, 
then we must close this as Wont Fix. 

I also discovered we are generically serializing the java.lang.* types in some 
cases. However we will handle the primitive types in a backwards compatible way 
if we simply unbox, so I do this where we can. Newer peers will be able to 
communicate with older peers without issue. If older peers elect send 
object-serialized primitives, though, newer peers will reject the message 
unless configured to accept legacy serialization. This is intended behavior.

I'm still working through 0.94 tests.

> Commons-collections object deserialization remote command execution 
> vulnerability 
> --
>
> Key: HBASE-14799
> URL: https://issues.apache.org/jira/browse/HBASE-14799
> Project: HBase
>  Issue Type: Bug
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Critical
> Fix For: 0.94.28, 0.98.17
>
> Attachments: HBASE-14799-0.94.patch, HBASE-14799-0.98.patch
>
>
> Read: 
> http://foxglovesecurity.com/2015/11/06/what-do-weblogic-websphere-jboss-jenkins-opennms-and-your-application-have-in-common-this-vulnerability/
> TL;DR: If you have commons-collections on your classpath and accept and 
> process Java object serialization data, then you probably have an exploitable 
> remote command execution vulnerability. 
> 0.94 and earlier HBase releases are vulnerable because we might read in and 
> rehydrate serialized Java objects out of RPC packet data in 
> HbaseObjectWritable using ObjectInputStream#readObject (see 
> https://hbase.apache.org/0.94/xref/org/apache/hadoop/hbase/io/HbaseObjectWritable.html#714)
>  and we have commons-collections on the classpath on the server.
> 0.98 also carries some limited exposure to this problem through inclusion of 
> backwards compatible deserialization code in 
> HbaseObjectWritableFor96Migration. This is used by the 0.94-to-0.98 migration 
> utility, and by the AccessController when reading permissions from the ACL 
> table serialized in legacy format by 0.94. Unprivileged users cannot run the 
> tool nor access the ACL table.
> Unprivileged users can however attack a 0.94 installation. An attacker might 
> be able to use the method discussed on that blog post to 

[jira] [Updated] (HBASE-14804) HBase shell's create table command ignores 'NORMALIZATION_ENABLED' attribute

2015-11-12 Thread Romil Choksi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Romil Choksi updated HBASE-14804:
-
Description: 
I am trying to create a new table and set NORMALIZATION_ENABLED to true, but 
it seems like the argument NORMALIZATION_ENABLED is being ignored, and the 
attribute NORMALIZATION_ENABLED is not displayed on doing a desc command on 
that table:
{code}
hbase(main):020:0> create 'test-table-4', 'cf', {NORMALIZATION_ENABLED => 
'true'}
An argument ignored (unknown or overridden): NORMALIZATION_ENABLED
0 row(s) in 4.2670 seconds

=> Hbase::Table - test-table-4
hbase(main):021:0> desc 'test-table-4'
Table test-table-4 is ENABLED   


test-table-4


COLUMN FAMILIES DESCRIPTION 


{NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', 
KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', 
COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', 
BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}


1 row(s) in 0.0430 seconds
{code}

However, on doing an alter command on that table we can set the 
NORMALIZATION_ENABLED attribute for that table
{code}
hbase(main):022:0> alter 'test-table-4', {NORMALIZATION_ENABLED => 'true'}
Unknown argument ignored: NORMALIZATION_ENABLED
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 2.3640 seconds

hbase(main):023:0> desc 'test-table-4'
Table test-table-4 is ENABLED   


test-table-4, {TABLE_ATTRIBUTES => {NORMALIZATION_ENABLED => 'true'}


COLUMN FAMILIES DESCRIPTION 


{NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', 
KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', 
COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', 
BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}


1 row(s) in 0.0190 seconds
{code}
I think it would be better to have a single-step process to enable 
normalization while creating the table itself, rather than a two-step process 
of altering the table later on to enable normalization.

  was:
I am trying to create a new table and set the NORMALIZATION_ENABLED as true, 
but seems like the argument NORMALIZATION_ENABLED is being ignored. And the 
attribute NORMALIZATION_ENABLED is not displayed on doing a desc command on 
that table
hbase(main):020:0> create 'test-table-4', 'cf', {NORMALIZATION_ENABLED => 
'true'}
An argument ignored (unknown or overridden): NORMALIZATION_ENABLED
0 row(s) in 4.2670 seconds

=> Hbase::Table - test-table-4
hbase(main):021:0> desc 'test-table-4'
Table test-table-4 is ENABLED   


test-table-4


COLUMN FAMILIES DESCRIPTION 


{NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', 
KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', 
COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', 
BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}


1 row(s) in 0.0430 seconds

[jira] [Commented] (HBASE-14804) HBase shell's create table command ignores 'NORMALIZATION_ENABLED' attribute

2015-11-12 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003389#comment-15003389
 ] 

Jean-Marc Spaggiari commented on HBASE-14804:
-

Patch is done ;) Uploading.

> HBase shell's create table command ignores 'NORMALIZATION_ENABLED' attribute
> 
>
> Key: HBASE-14804
> URL: https://issues.apache.org/jira/browse/HBASE-14804
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Affects Versions: 1.1.2
>Reporter: Romil Choksi
>Assignee: Appy
>
> I am trying to create a new table and set NORMALIZATION_ENABLED to true, but 
> it seems like the argument NORMALIZATION_ENABLED is being ignored, and the 
> attribute NORMALIZATION_ENABLED is not displayed on doing a desc command on 
> that table:
> {code}
> hbase(main):020:0> create 'test-table-4', 'cf', {NORMALIZATION_ENABLED => 
> 'true'}
> An argument ignored (unknown or overridden): NORMALIZATION_ENABLED
> 0 row(s) in 4.2670 seconds
> => Hbase::Table - test-table-4
> hbase(main):021:0> desc 'test-table-4'
> Table test-table-4 is ENABLED 
>   
> 
> test-table-4  
>   
> 
> COLUMN FAMILIES DESCRIPTION   
>   
> 
> {NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', 
> KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', 
> COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', 
> BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} 
>   
> 
> 1 row(s) in 0.0430 seconds
> {code}
> However, on doing an alter command on that table we can set the 
> NORMALIZATION_ENABLED attribute for that table
> {code}
> hbase(main):022:0> alter 'test-table-4', {NORMALIZATION_ENABLED => 'true'}
> Unknown argument ignored: NORMALIZATION_ENABLED
> Updating all regions with the new schema...
> 1/1 regions updated.
> Done.
> 0 row(s) in 2.3640 seconds
> hbase(main):023:0> desc 'test-table-4'
> Table test-table-4 is ENABLED 
>   
> 
> test-table-4, {TABLE_ATTRIBUTES => {NORMALIZATION_ENABLED => 'true'}  
>   
> 
> COLUMN FAMILIES DESCRIPTION   
>   
> 
> {NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', 
> KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', 
> COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', 
> BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} 
>   
> 
> 1 row(s) in 0.0190 seconds
> {code}
> I think it would be better to have a single-step process to enable 
> normalization while creating the table itself, rather than a two-step process 
> of altering the table later on to enable normalization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14804) HBase shell's create table command ignores 'NORMALIZATION_ENABLED' attribute

2015-11-12 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-14804:

Status: Patch Available  (was: Open)

It's a .rb file, so the build will not be able to test it, but I tested it on 
trunk and it works with:
{code}
create 'test-table-4', {NORMALIZATION_ENABLED => 'TRUE'}, { NAME => 'cf' }
{code}

> HBase shell's create table command ignores 'NORMALIZATION_ENABLED' attribute
> 
>
> Key: HBASE-14804
> URL: https://issues.apache.org/jira/browse/HBASE-14804
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Affects Versions: 1.1.2
>Reporter: Romil Choksi
>Assignee: Appy
> Attachments: HBASE-14804.v0-trunk.patch
>
>
> I am trying to create a new table and set NORMALIZATION_ENABLED to true, but 
> it seems like the argument NORMALIZATION_ENABLED is being ignored, and the 
> attribute NORMALIZATION_ENABLED is not displayed on doing a desc command on 
> that table:
> {code}
> hbase(main):020:0> create 'test-table-4', 'cf', {NORMALIZATION_ENABLED => 
> 'true'}
> An argument ignored (unknown or overridden): NORMALIZATION_ENABLED
> 0 row(s) in 4.2670 seconds
> => Hbase::Table - test-table-4
> hbase(main):021:0> desc 'test-table-4'
> Table test-table-4 is ENABLED 
>   
> 
> test-table-4  
>   
> 
> COLUMN FAMILIES DESCRIPTION   
>   
> 
> {NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', 
> KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', 
> COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', 
> BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} 
>   
> 
> 1 row(s) in 0.0430 seconds
> {code}
> However, on doing an alter command on that table we can set the 
> NORMALIZATION_ENABLED attribute for that table
> {code}
> hbase(main):022:0> alter 'test-table-4', {NORMALIZATION_ENABLED => 'true'}
> Unknown argument ignored: NORMALIZATION_ENABLED
> Updating all regions with the new schema...
> 1/1 regions updated.
> Done.
> 0 row(s) in 2.3640 seconds
> hbase(main):023:0> desc 'test-table-4'
> Table test-table-4 is ENABLED 
>   
> 
> test-table-4, {TABLE_ATTRIBUTES => {NORMALIZATION_ENABLED => 'true'}  
>   
> 
> COLUMN FAMILIES DESCRIPTION   
>   
> 
> {NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', 
> KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', 
> COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', 
> BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} 
>   
> 
> 1 row(s) in 0.0190 seconds
> {code}
> I think it would be better to have a single-step process to enable 
> normalization while creating the table itself, rather than a two-step process 
> of altering the table later on to enable normalization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14804) HBase shell's create table command ignores 'NORMALIZATION_ENABLED' attribute

2015-11-12 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-14804:

Attachment: HBASE-14804.v0-trunk.patch

> HBase shell's create table command ignores 'NORMALIZATION_ENABLED' attribute
> 
>
> Key: HBASE-14804
> URL: https://issues.apache.org/jira/browse/HBASE-14804
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Affects Versions: 1.1.2
>Reporter: Romil Choksi
>Assignee: Appy
> Attachments: HBASE-14804.v0-trunk.patch
>
>
> I am trying to create a new table and set NORMALIZATION_ENABLED to true, but 
> it seems like the argument NORMALIZATION_ENABLED is being ignored, and the 
> attribute NORMALIZATION_ENABLED is not displayed on doing a desc command on 
> that table:
> {code}
> hbase(main):020:0> create 'test-table-4', 'cf', {NORMALIZATION_ENABLED => 
> 'true'}
> An argument ignored (unknown or overridden): NORMALIZATION_ENABLED
> 0 row(s) in 4.2670 seconds
> => Hbase::Table - test-table-4
> hbase(main):021:0> desc 'test-table-4'
> Table test-table-4 is ENABLED 
>   
> 
> test-table-4  
>   
> 
> COLUMN FAMILIES DESCRIPTION   
>   
> 
> {NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', 
> KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', 
> COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', 
> BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} 
>   
> 
> 1 row(s) in 0.0430 seconds
> {code}
> However, on doing an alter command on that table we can set the 
> NORMALIZATION_ENABLED attribute for that table
> {code}
> hbase(main):022:0> alter 'test-table-4', {NORMALIZATION_ENABLED => 'true'}
> Unknown argument ignored: NORMALIZATION_ENABLED
> Updating all regions with the new schema...
> 1/1 regions updated.
> Done.
> 0 row(s) in 2.3640 seconds
> hbase(main):023:0> desc 'test-table-4'
> Table test-table-4 is ENABLED 
>   
> 
> test-table-4, {TABLE_ATTRIBUTES => {NORMALIZATION_ENABLED => 'true'}  
>   
> 
> COLUMN FAMILIES DESCRIPTION   
>   
> 
> {NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', 
> KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', 
> COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', 
> BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} 
>   
> 
> 1 row(s) in 0.0190 seconds
> {code}
> I think it would be better to have a single-step process to enable 
> normalization while creating the table itself, rather than a two-step process 
> of altering the table later on to enable normalization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14355) Scan different TimeRange for each column family

2015-11-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003201#comment-15003201
 ] 

Hudson commented on HBASE-14355:


SUCCESS: Integrated in HBase-Trunk_matrix #461 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/461/])
HBASE-14355 Scan different TimeRange for each column family (Churro Morales) 
(stack: rev 290ecbe829662775daf7153cc0729a5465d7fb32)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/DefaultMemStore.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/Get.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java
* hbase-protocol/src/main/protobuf/HBase.proto
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/Scan.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
* hbase-common/src/main/java/org/apache/hadoop/hbase/io/TimeRange.java
* hbase-protocol/src/main/protobuf/Client.proto
* 
hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ClientProtos.java
* 
hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/HBaseProtos.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/NonLazyKeyValueScanner.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/client/Query.java


> Scan different TimeRange for each column family
> ---
>
> Key: HBASE-14355
> URL: https://issues.apache.org/jira/browse/HBASE-14355
> Project: HBase
>  Issue Type: New Feature
>  Components: Client, regionserver, Scanners
>Reporter: Dave Latham
>Assignee: churro morales
> Fix For: 2.0.0, 1.3.0, 0.98.17
>
> Attachments: HBASE-14355-v1.patch, HBASE-14355-v10.patch, 
> HBASE-14355-v11.patch, HBASE-14355-v2.patch, HBASE-14355-v3.patch, 
> HBASE-14355-v4.patch, HBASE-14355-v5.patch, HBASE-14355-v6.patch, 
> HBASE-14355-v7.patch, HBASE-14355-v8.patch, HBASE-14355-v9.patch, 
> HBASE-14355.branch-1.patch, HBASE-14355.patch
>
>
> At present the Scan API supports only table level time range. We have 
> specific use cases that will benefit from per column family time range. (See 
> background discussion at 
> https://mail-archives.apache.org/mod_mbox/hbase-user/201508.mbox/%3ccaa4mzom00ef5eoxstk0hetxeby8mqss61gbvgttgpaspmhq...@mail.gmail.com%3E)
> There are a couple of choices that would be good to validate.  First - how to
> update the Scan API to support family- and table-level settings.  One proposal
> would be to add Scan.setTimeRange(byte[] family, long minTime, long maxTime),
> then store the ranges in a Map keyed by family.  When executing the scan, if a
> family has a specified TimeRange, then use it, otherwise fall back to using
> the table-level TimeRange.  Clients using the new API against old region
> servers would not get the families correctly filtered.  Old clients sending
> scans to new region servers would work correctly.
> The other question is how to get StoreFileScanner.shouldUseScanner to match
> up the proper family and time range.  It has the Scan available, but not
> which family it is scanning.  One option would be to pass the column family
> down each constructor path.  Another would be to alter shouldUseScanner to
> pass down the specific TimeRange to use (similar to how it currently passes
> down the columns to use, which also appears to be a workaround for not having
> the family available).
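The fall-back semantics proposed in the description (use the family-specific TimeRange if set, otherwise the table-level one) can be sketched with stand-in types. These are hypothetical classes, not the real HBase Scan or TimeRange API.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the per-family TimeRange fallback proposed in HBASE-14355.
class TimeRangeFallback {
    // Minimal stand-in for org.apache.hadoop.hbase.io.TimeRange.
    static final class TimeRange {
        final long min, max;
        TimeRange(long min, long max) { this.min = min; this.max = max; }
        boolean includes(long ts) { return ts >= min && ts < max; }
    }

    private final TimeRange tableRange;                        // table-level default
    private final Map<String, TimeRange> familyRanges = new HashMap<>();

    TimeRangeFallback(long min, long max) { this.tableRange = new TimeRange(min, max); }

    // Analogue of the proposed Scan.setTimeRange(byte[] family, min, max).
    void setTimeRange(String family, long min, long max) {
        familyRanges.put(family, new TimeRange(min, max));
    }

    // If the family has a specific TimeRange use it, else fall back to table level.
    TimeRange rangeFor(String family) {
        return familyRanges.getOrDefault(family, tableRange);
    }

    public static void main(String[] args) {
        TimeRangeFallback scan = new TimeRangeFallback(0, 100);
        scan.setTimeRange("cf1", 50, 60);
        System.out.println(scan.rangeFor("cf1").includes(55)); // true: family range
        System.out.println(scan.rangeFor("cf2").includes(55)); // true: table fallback
        System.out.println(scan.rangeFor("cf1").includes(10)); // false: outside cf1 range
    }
}
```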





[jira] [Commented] (HBASE-14498) Master stuck in infinite loop when all Zookeeper servers are unreachable

2015-11-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003202#comment-15003202
 ] 

Hudson commented on HBASE-14498:


SUCCESS: Integrated in HBase-Trunk_matrix #461 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/461/])
HBASE-14498 Revert for on-going review (tedyu: rev 
789f8a5a70242c16ce10bc95401c51c7d04debfa)
* 
hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
* 
hbase-client/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperWatcher.java


> Master stuck in infinite loop when all Zookeeper servers are unreachable
> 
>
> Key: HBASE-14498
> URL: https://issues.apache.org/jira/browse/HBASE-14498
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Reporter: Y. SREENIVASULU REDDY
>Assignee: Pankaj Kumar
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.4
>
> Attachments: HBASE-14498-V2.patch, HBASE-14498-V3.patch, 
> HBASE-14498-V4.patch, HBASE-14498.patch
>
>
> We met a weird scenario in our production environment.
> In an HA cluster:
> > The active master (HM1) is not able to connect to any ZooKeeper server (due
> > to a network breakdown between the master machine and the ZooKeeper servers).
> {code}
> 2015-09-26 15:24:47,508 INFO 
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host:2181)] 
> zookeeper.ClientCnxn: Client session timed out, have not heard from server in 
> 33463ms for sessionid 0x104576b8dda0002, closing socket connection and 
> attempting reconnect
> 2015-09-26 15:24:47,877 INFO 
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host1:2181)] 
> client.FourLetterWordMain: connecting to ZK-Host1 2181
> 2015-09-26 15:24:48,236 INFO [main-SendThread(ZK-Host1:2181)] 
> client.FourLetterWordMain: connecting to ZK-Host1 2181
> 2015-09-26 15:24:49,879 WARN 
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Can not get the principle name from server ZK-Host1
> 2015-09-26 15:24:49,879 INFO 
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Opening socket connection to server 
> ZK-Host1/ZK-IP1:2181. Will not attempt to authenticate using SASL (unknown 
> error)
> 2015-09-26 15:24:50,238 WARN [main-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Can not get the principle name from server ZK-Host1
> 2015-09-26 15:24:50,238 INFO [main-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Opening socket connection to server 
> ZK-Host1/ZK-Host1:2181. Will not attempt to authenticate using SASL (unknown 
> error)
> 2015-09-26 15:25:17,470 INFO [main-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Client session timed out, have not heard from server in 
> 30023ms for sessionid 0x2045762cc710006, closing socket connection and 
> attempting reconnect
> 2015-09-26 15:25:17,571 WARN [master/HM1-Host/HM1-IP:16000] 
> zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, 
> quorum=ZK-Host:2181,ZK-Host1:2181,ZK-Host2:2181, 
> exception=org.apache.zookeeper.KeeperException$ConnectionLossException: 
> KeeperErrorCode = ConnectionLoss for /hbase/master
> 2015-09-26 15:25:17,872 INFO [main-SendThread(ZK-Host:2181)] 
> client.FourLetterWordMain: connecting to ZK-Host 2181
> 2015-09-26 15:25:19,874 WARN [main-SendThread(ZK-Host:2181)] 
> zookeeper.ClientCnxn: Can not get the principle name from server ZK-Host
> 2015-09-26 15:25:19,874 INFO [main-SendThread(ZK-Host:2181)] 
> zookeeper.ClientCnxn: Opening socket connection to server ZK-Host/ZK-IP:2181. 
> Will not attempt to authenticate using SASL (unknown error)
> {code}
> > Since HM1 was not able to connect to any ZK, session timeout didn't happen
> > at the ZooKeeper server side and HM1 didn't abort.
> > On ZooKeeper session timeout, the standby master (HM2) registered itself as
> > the active master.
> > HM2 keeps on waiting for region servers to report to it as part of active
> > master initialization.
> {noformat} 
> 2015-09-26 15:24:44,928 | INFO | HM2-Host:21300.activeMasterManager | Waiting 
> for region servers count to settle; currently checked in 0, slept for 0 ms, 
> expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval 
> of 1500 ms. | 
> org.apache.hadoop.hbase.master.ServerManager.waitForRegionServers(ServerManager.java:1011)
> ---
> ---
> 2015-09-26 15:32:50,841 | INFO | HM2-Host:21300.activeMasterManager | Waiting 
> for region servers count to settle; currently checked in 0, slept for 483913 
> ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, 
> interval of 1500 ms. | 
> org.apache.hadoop.hbase.master.ServerManager.waitForRegionServers(ServerManager.java:1011)
> {noformat}
> > At the other end, region servers are reporting to HM1 at a 3-second
> > interval. Here region server 
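The fix direction for the infinite loop described above is to bound the reconnect attempts and abort once they are exhausted, instead of retrying forever. A minimal self-contained sketch of that pattern, with hypothetical names (not the actual RecoverableZooKeeper code):

```java
// Bounded retry loop with exponential backoff: gives up after a cap so the
// caller (e.g. a master) can abort rather than spin forever on ConnectionLoss.
class BoundedRetry {
    interface ZkCall { void run() throws Exception; }

    // Returns true if the call eventually succeeded, false if retries ran out.
    static boolean withRetries(ZkCall call, int maxRetries, long baseBackoffMs)
            throws InterruptedException {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                call.run();
                return true;
            } catch (Exception connectionLoss) {
                if (attempt == maxRetries) {
                    return false; // caller should abort instead of looping forever
                }
                Thread.sleep(baseBackoffMs * (1L << attempt)); // exponential backoff
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate a quorum that stays unreachable: every call throws.
        boolean ok = withRetries(() -> { throw new Exception("ConnectionLoss"); }, 3, 1);
        System.out.println(ok ? "connected" : "aborting master"); // prints "aborting master"
    }
}
```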

[jira] [Assigned] (HBASE-14804) HBase shell's create table command ignores 'NORMALIZATION_ENABLED' attribute

2015-11-12 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari reassigned HBASE-14804:
---

Assignee: Jean-Marc Spaggiari  (was: Appy)

> HBase shell's create table command ignores 'NORMALIZATION_ENABLED' attribute
> 
>
> Key: HBASE-14804
> URL: https://issues.apache.org/jira/browse/HBASE-14804
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Affects Versions: 1.2.0, 1.1.2
>Reporter: Romil Choksi
>Assignee: Jean-Marc Spaggiari
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14804.v0-trunk.patch
>
>
> I am trying to create a new table with NORMALIZATION_ENABLED set to true, but
> the NORMALIZATION_ENABLED argument appears to be ignored, and the attribute is
> not displayed when running a desc command on that table:
> {code}
> hbase(main):020:0> create 'test-table-4', 'cf', {NORMALIZATION_ENABLED => 
> 'true'}
> An argument ignored (unknown or overridden): NORMALIZATION_ENABLED
> 0 row(s) in 4.2670 seconds
> => Hbase::Table - test-table-4
> hbase(main):021:0> desc 'test-table-4'
> Table test-table-4 is ENABLED
> test-table-4
> COLUMN FAMILIES DESCRIPTION
> {NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false',
> KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER',
> COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true',
> BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> 1 row(s) in 0.0430 seconds
> {code}
> However, running an alter command on that table does set the
> NORMALIZATION_ENABLED attribute:
> {code}
> hbase(main):022:0> alter 'test-table-4', {NORMALIZATION_ENABLED => 'true'}
> Unknown argument ignored: NORMALIZATION_ENABLED
> Updating all regions with the new schema...
> 1/1 regions updated.
> Done.
> 0 row(s) in 2.3640 seconds
> hbase(main):023:0> desc 'test-table-4'
> Table test-table-4 is ENABLED
> test-table-4, {TABLE_ATTRIBUTES => {NORMALIZATION_ENABLED => 'true'}}
> COLUMN FAMILIES DESCRIPTION
> {NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false',
> KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER',
> COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true',
> BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> 1 row(s) in 0.0190 seconds
> {code}
> I think it would be better to have a single-step process that enables
> normalization while creating the table itself, rather than a two-step process
> of altering the table afterwards to enable it.





[jira] [Updated] (HBASE-14804) HBase shell's create table command ignores 'NORMALIZATION_ENABLED' attribute

2015-11-12 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-14804:

Affects Version/s: 1.2.0

> HBase shell's create table command ignores 'NORMALIZATION_ENABLED' attribute
> 
>
> Key: HBASE-14804
> URL: https://issues.apache.org/jira/browse/HBASE-14804
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Affects Versions: 1.2.0, 1.1.2
>Reporter: Romil Choksi
>Assignee: Appy
>  Labels: beginner
> Attachments: HBASE-14804.v0-trunk.patch
>
>
> I am trying to create a new table with NORMALIZATION_ENABLED set to true, but
> the NORMALIZATION_ENABLED argument appears to be ignored, and the attribute is
> not displayed when running a desc command on that table:
> {code}
> hbase(main):020:0> create 'test-table-4', 'cf', {NORMALIZATION_ENABLED => 
> 'true'}
> An argument ignored (unknown or overridden): NORMALIZATION_ENABLED
> 0 row(s) in 4.2670 seconds
> => Hbase::Table - test-table-4
> hbase(main):021:0> desc 'test-table-4'
> Table test-table-4 is ENABLED
> test-table-4
> COLUMN FAMILIES DESCRIPTION
> {NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false',
> KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER',
> COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true',
> BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> 1 row(s) in 0.0430 seconds
> {code}
> However, running an alter command on that table does set the
> NORMALIZATION_ENABLED attribute:
> {code}
> hbase(main):022:0> alter 'test-table-4', {NORMALIZATION_ENABLED => 'true'}
> Unknown argument ignored: NORMALIZATION_ENABLED
> Updating all regions with the new schema...
> 1/1 regions updated.
> Done.
> 0 row(s) in 2.3640 seconds
> hbase(main):023:0> desc 'test-table-4'
> Table test-table-4 is ENABLED
> test-table-4, {TABLE_ATTRIBUTES => {NORMALIZATION_ENABLED => 'true'}}
> COLUMN FAMILIES DESCRIPTION
> {NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false',
> KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER',
> COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true',
> BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> 1 row(s) in 0.0190 seconds
> {code}
> I think it would be better to have a single-step process that enables
> normalization while creating the table itself, rather than a two-step process
> of altering the table afterwards to enable it.





[jira] [Updated] (HBASE-14804) HBase shell's create table command ignores 'NORMALIZATION_ENABLED' attribute

2015-11-12 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-14804:

Labels: beginner  (was: )

> HBase shell's create table command ignores 'NORMALIZATION_ENABLED' attribute
> 
>
> Key: HBASE-14804
> URL: https://issues.apache.org/jira/browse/HBASE-14804
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Affects Versions: 1.2.0, 1.1.2
>Reporter: Romil Choksi
>Assignee: Appy
>  Labels: beginner
> Attachments: HBASE-14804.v0-trunk.patch
>
>
> I am trying to create a new table with NORMALIZATION_ENABLED set to true, but
> the NORMALIZATION_ENABLED argument appears to be ignored, and the attribute is
> not displayed when running a desc command on that table:
> {code}
> hbase(main):020:0> create 'test-table-4', 'cf', {NORMALIZATION_ENABLED => 
> 'true'}
> An argument ignored (unknown or overridden): NORMALIZATION_ENABLED
> 0 row(s) in 4.2670 seconds
> => Hbase::Table - test-table-4
> hbase(main):021:0> desc 'test-table-4'
> Table test-table-4 is ENABLED
> test-table-4
> COLUMN FAMILIES DESCRIPTION
> {NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false',
> KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER',
> COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true',
> BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> 1 row(s) in 0.0430 seconds
> {code}
> However, running an alter command on that table does set the
> NORMALIZATION_ENABLED attribute:
> {code}
> hbase(main):022:0> alter 'test-table-4', {NORMALIZATION_ENABLED => 'true'}
> Unknown argument ignored: NORMALIZATION_ENABLED
> Updating all regions with the new schema...
> 1/1 regions updated.
> Done.
> 0 row(s) in 2.3640 seconds
> hbase(main):023:0> desc 'test-table-4'
> Table test-table-4 is ENABLED
> test-table-4, {TABLE_ATTRIBUTES => {NORMALIZATION_ENABLED => 'true'}}
> COLUMN FAMILIES DESCRIPTION
> {NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false',
> KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER',
> COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true',
> BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> 1 row(s) in 0.0190 seconds
> {code}
> I think it would be better to have a single-step process that enables
> normalization while creating the table itself, rather than a two-step process
> of altering the table afterwards to enable it.





[jira] [Updated] (HBASE-14804) HBase shell's create table command ignores 'NORMALIZATION_ENABLED' attribute

2015-11-12 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-14804:

Priority: Minor  (was: Major)

> HBase shell's create table command ignores 'NORMALIZATION_ENABLED' attribute
> 
>
> Key: HBASE-14804
> URL: https://issues.apache.org/jira/browse/HBASE-14804
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Affects Versions: 1.2.0, 1.1.2
>Reporter: Romil Choksi
>Assignee: Appy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14804.v0-trunk.patch
>
>
> I am trying to create a new table with NORMALIZATION_ENABLED set to true, but
> the NORMALIZATION_ENABLED argument appears to be ignored, and the attribute is
> not displayed when running a desc command on that table:
> {code}
> hbase(main):020:0> create 'test-table-4', 'cf', {NORMALIZATION_ENABLED => 
> 'true'}
> An argument ignored (unknown or overridden): NORMALIZATION_ENABLED
> 0 row(s) in 4.2670 seconds
> => Hbase::Table - test-table-4
> hbase(main):021:0> desc 'test-table-4'
> Table test-table-4 is ENABLED
> test-table-4
> COLUMN FAMILIES DESCRIPTION
> {NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false',
> KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER',
> COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true',
> BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> 1 row(s) in 0.0430 seconds
> {code}
> However, running an alter command on that table does set the
> NORMALIZATION_ENABLED attribute:
> {code}
> hbase(main):022:0> alter 'test-table-4', {NORMALIZATION_ENABLED => 'true'}
> Unknown argument ignored: NORMALIZATION_ENABLED
> Updating all regions with the new schema...
> 1/1 regions updated.
> Done.
> 0 row(s) in 2.3640 seconds
> hbase(main):023:0> desc 'test-table-4'
> Table test-table-4 is ENABLED
> test-table-4, {TABLE_ATTRIBUTES => {NORMALIZATION_ENABLED => 'true'}}
> COLUMN FAMILIES DESCRIPTION
> {NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false',
> KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER',
> COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true',
> BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> 1 row(s) in 0.0190 seconds
> {code}
> I think it would be better to have a single-step process that enables
> normalization while creating the table itself, rather than a two-step process
> of altering the table afterwards to enable it.





[jira] [Updated] (HBASE-14189) CF Level BC setting should override global one.

2015-11-12 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen updated HBASE-14189:
--
Attachment: HBASE-14189_v6.patch

Updated patch per [~anoop.hbase]'s great suggestions.

> CF Level BC setting should override global one.
> ---
>
> Key: HBASE-14189
> URL: https://issues.apache.org/jira/browse/HBASE-14189
> Project: HBase
>  Issue Type: Improvement
>  Components: BlockCache
>Affects Versions: 2.0.0
>Reporter: Heng Chen
>Assignee: Heng Chen
> Attachments: HBASE-14189.patch, HBASE-14189_v1.patch, 
> HBASE-14189_v2.patch, HBASE-14189_v3.patch, HBASE-14189_v4.patch, 
> HBASE-14189_v5.patch, HBASE-14189_v6.patch
>
>
> The original description is ambiguous, so I will rewrite it.
> Let's look at the {{CacheConfig}} constructor first:
> {code}
>   public CacheConfig(Configuration conf, HColumnDescriptor family) {
> this(CacheConfig.instantiateBlockCache(conf),
> family.isBlockCacheEnabled(),
> family.isInMemory(),
> // For the following flags we enable them regardless of per-schema 
> settings
> // if they are enabled in the global configuration.
> conf.getBoolean(CACHE_BLOCKS_ON_WRITE_KEY,
> DEFAULT_CACHE_DATA_ON_WRITE) || family.isCacheDataOnWrite(),
> conf.getBoolean(CACHE_INDEX_BLOCKS_ON_WRITE_KEY,
> DEFAULT_CACHE_INDEXES_ON_WRITE) || family.isCacheIndexesOnWrite(),
> conf.getBoolean(CACHE_BLOOM_BLOCKS_ON_WRITE_KEY,
> DEFAULT_CACHE_BLOOMS_ON_WRITE) || family.isCacheBloomsOnWrite(),
> conf.getBoolean(EVICT_BLOCKS_ON_CLOSE_KEY,
> DEFAULT_EVICT_ON_CLOSE) || family.isEvictBlocksOnClose(),
> conf.getBoolean(CACHE_DATA_BLOCKS_COMPRESSED_KEY, 
> DEFAULT_CACHE_DATA_COMPRESSED),
> conf.getBoolean(PREFETCH_BLOCKS_ON_OPEN_KEY,
> DEFAULT_PREFETCH_ON_OPEN) || family.isPrefetchBlocksOnOpen(),
> conf.getBoolean(HColumnDescriptor.CACHE_DATA_IN_L1,
> HColumnDescriptor.DEFAULT_CACHE_DATA_IN_L1) || 
> family.isCacheDataInL1(),
> 
> conf.getBoolean(DROP_BEHIND_CACHE_COMPACTION_KEY,DROP_BEHIND_CACHE_COMPACTION_DEFAULT)
>  );
>   }
> {code}
> If we dig into it, we will see that {{CacheConfig.cacheDataOnRead}} is used
> to hold {{family.isBlockCacheEnabled()}}.
> I think that is confusing, given the comment on {{cacheDataOnRead}}:
> {code}
>   /**
>* Whether blocks should be cached on read (default is on if there is a
>* cache but this can be turned off on a per-family or per-request basis).
>* If off we will STILL cache meta blocks; i.e. INDEX and BLOOM types.
>* This cannot be disabled.
>*/
>   private boolean cacheDataOnRead;
> {code}
> So I think we should use another variable to represent
> {{family.isBlockCacheEnabled()}}.
> The second point is that we use 'or' to decide whether {{cacheDataOnWrite}}
> is on or off when both the CF and the global configuration set it:
> {code}
> conf.getBoolean(CACHE_BLOCKS_ON_WRITE_KEY,
> DEFAULT_CACHE_DATA_ON_WRITE) || family.isCacheDataOnWrite()
> {code}
> IMO the CF-level setting should override the global setting.
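The difference between the two resolution policies discussed above can be shown with a small self-contained sketch: the current OR of global and CF settings versus the proposed CF-level override. This is a hypothetical helper, not the real CacheConfig code.

```java
// Two ways to resolve cacheDataOnWrite when both a global conf value
// and a column-family value exist.
class CacheSettingResolution {
    // Current behavior: enabled if either the global conf or the CF enables it,
    // so a CF can never turn the feature off once it is on globally.
    static boolean orPolicy(boolean global, boolean family) {
        return global || family;
    }

    // Proposed behavior: a CF-level setting, when present, overrides the
    // global one; absence (null) falls back to the global default.
    static boolean overridePolicy(boolean global, Boolean familyOrNull) {
        return familyOrNull != null ? familyOrNull : global;
    }

    public static void main(String[] args) {
        // Global says "cache on write"; the CF explicitly disables it.
        System.out.println(orPolicy(true, false));       // true: CF cannot turn it off
        System.out.println(overridePolicy(true, false)); // false: CF wins
        System.out.println(overridePolicy(true, null));  // true: falls back to global
    }
}
```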





[jira] [Commented] (HBASE-14805) status should show the master in shell

2015-11-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003458#comment-15003458
 ] 

Hadoop QA commented on HBASE-14805:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12772098/hbase-14805_v1.patch
  against master branch at commit 789f8a5a70242c16ce10bc95401c51c7d04debfa.
  ATTACHMENT ID: 12772098

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 javac{color}.  The patch appears to cause mvn compile goal to 
fail with Hadoop version 2.5.0.

Compilation errors resume:
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean) on 
project hbase-server: Failed to clean project: Failed to delete 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase/hbase-server/target
 -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :hbase-server


Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16505//console

This message is automatically generated.

> status should show the master in shell
> --
>
> Key: HBASE-14805
> URL: https://issues.apache.org/jira/browse/HBASE-14805
> Project: HBase
>  Issue Type: Improvement
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: hbase-14805_v1.patch
>
>
> {{status 'simple'}} or {{'detailed'}} only shows the regionservers and
> regions, not the active master. In fact, it seems there is no way to learn
> the active master from the shell at all.





[jira] [Commented] (HBASE-14223) Meta WALs are not cleared if meta region was closed and RS aborts

2015-11-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003454#comment-15003454
 ] 

Hadoop QA commented on HBASE-14223:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12772099/hbase-14223_v2-branch-1.patch
  against branch-1 branch at commit 789f8a5a70242c16ce10bc95401c51c7d04debfa.
  ATTACHMENT ID: 12772099

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 19 new 
or modified tests.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16506//console

This message is automatically generated.

> Meta WALs are not cleared if meta region was closed and RS aborts
> -
>
> Key: HBASE-14223
> URL: https://issues.apache.org/jira/browse/HBASE-14223
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.4, 1.0.4
>
> Attachments: HBASE-14223logs, hbase-14223_v0.patch, 
> hbase-14223_v1-branch-1.patch, hbase-14223_v2-branch-1.patch
>
>
> When an RS opens meta and later closes it, the WAL (FSHLog) is not closed.
> The last WAL file just sits there in the RS WAL directory. If the RS stops
> gracefully, the WAL file for meta is deleted. Otherwise, if the RS aborts,
> the WAL for meta is not cleaned up. It is also not split (which is correct),
> since the master determines that the RS no longer hosts meta at the time of
> the RS abort.
> From a cluster after running ITBLL with CM, I see a lot of {{-splitting}} 
> directories left uncleaned: 
> {code}
> [root@os-enis-dal-test-jun-4-7 cluster-os]# sudo -u hdfs hadoop fs -ls 
> /apps/hbase/data/WALs
> Found 31 items
> drwxr-xr-x   - hbase hadoop  0 2015-06-05 01:14 
> /apps/hbase/data/WALs/hregion-58203265
> drwxr-xr-x   - hbase hadoop  0 2015-06-05 07:54 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433489308745-splitting
> drwxr-xr-x   - hbase hadoop  0 2015-06-05 09:28 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433494382959-splitting
> drwxr-xr-x   - hbase hadoop  0 2015-06-05 10:01 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433498252205-splitting
> ...
> {code}
> The directories contain WALs from meta: 
> {code}
> [root@os-enis-dal-test-jun-4-7 cluster-os]# sudo -u hdfs hadoop fs -ls 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting
> Found 2 items
> -rw-r--r--   3 hbase hadoop 201608 2015-06-05 03:15 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433470511501.meta
> -rw-r--r--   3 hbase hadoop  44420 2015-06-05 04:36 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433474111645.meta
> {code}
> The RS hosted the meta region for some time: 
> {code}
> 2015-06-05 03:14:28,692 INFO  [PostOpenDeployTasks:1588230740] 
> zookeeper.MetaTableLocator: Setting hbase:meta region location in ZooKeeper 
> as os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285
> ...
> 2015-06-05 03:15:17,302 INFO  
> [RS_CLOSE_META-os-enis-dal-test-jun-4-5:16020-0] regionserver.HRegion: Closed 
> hbase:meta,,1.1588230740
> {code}
> In between, a WAL is created: 
> {code}
> 2015-06-05 03:15:11,707 INFO  
> [RS_OPEN_META-os-enis-dal-test-jun-4-5:16020-0-MetaLogRoller] wal.FSHLog: 
> Rolled WAL 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433470511501.meta
>  with entries=385, filesize=196.88 KB; new WAL 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433474111645.meta
> {code}
> When CM killed the region server later master did not see these WAL files: 
> {code}
> ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:46,075 
> INFO  [MASTER_SERVER_OPERATIONS-os-enis-dal-test-jun-4-3:16000-0] 
> master.SplitLogManager: started splitting 2 logs in 
> [hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting]
>  for [os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285]
> ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:47,300 
> INFO  [main-EventThread] wal.WALSplitter: Archived processed log 
> 

[jira] [Updated] (HBASE-14223) Meta WALs are not cleared if meta region was closed and RS aborts

2015-11-12 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-14223:
--
Attachment: hbase-14223_v3-branch-1.patch

rebased.

> Meta WALs are not cleared if meta region was closed and RS aborts
> -
>
> Key: HBASE-14223
> URL: https://issues.apache.org/jira/browse/HBASE-14223
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.4, 1.0.4
>
> Attachments: HBASE-14223logs, hbase-14223_v0.patch, 
> hbase-14223_v1-branch-1.patch, hbase-14223_v2-branch-1.patch, 
> hbase-14223_v3-branch-1.patch
>
>
> When an RS opens meta and later closes it, the WAL (FSHLog) is not closed.
> The last WAL file just sits there in the RS WAL directory. If the RS stops
> gracefully, the WAL file for meta is deleted. Otherwise, if the RS aborts,
> the WAL for meta is not cleaned up. It is also not split (which is correct),
> since the master determines that the RS no longer hosts meta at the time of
> the RS abort.
> From a cluster after running ITBLL with CM, I see a lot of {{-splitting}} 
> directories left uncleaned: 
> {code}
> [root@os-enis-dal-test-jun-4-7 cluster-os]# sudo -u hdfs hadoop fs -ls 
> /apps/hbase/data/WALs
> Found 31 items
> drwxr-xr-x   - hbase hadoop  0 2015-06-05 01:14 
> /apps/hbase/data/WALs/hregion-58203265
> drwxr-xr-x   - hbase hadoop  0 2015-06-05 07:54 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433489308745-splitting
> drwxr-xr-x   - hbase hadoop  0 2015-06-05 09:28 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433494382959-splitting
> drwxr-xr-x   - hbase hadoop  0 2015-06-05 10:01 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433498252205-splitting
> ...
> {code}
> The directories contain WALs from meta: 
> {code}
> [root@os-enis-dal-test-jun-4-7 cluster-os]# sudo -u hdfs hadoop fs -ls 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting
> Found 2 items
> -rw-r--r--   3 hbase hadoop 201608 2015-06-05 03:15 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433470511501.meta
> -rw-r--r--   3 hbase hadoop  44420 2015-06-05 04:36 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433474111645.meta
> {code}
> The RS hosted the meta region for some time: 
> {code}
> 2015-06-05 03:14:28,692 INFO  [PostOpenDeployTasks:1588230740] 
> zookeeper.MetaTableLocator: Setting hbase:meta region location in ZooKeeper 
> as os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285
> ...
> 2015-06-05 03:15:17,302 INFO  
> [RS_CLOSE_META-os-enis-dal-test-jun-4-5:16020-0] regionserver.HRegion: Closed 
> hbase:meta,,1.1588230740
> {code}
> In between, a WAL is created: 
> {code}
> 2015-06-05 03:15:11,707 INFO  
> [RS_OPEN_META-os-enis-dal-test-jun-4-5:16020-0-MetaLogRoller] wal.FSHLog: 
> Rolled WAL 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433470511501.meta
>  with entries=385, filesize=196.88 KB; new WAL 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433474111645.meta
> {code}
> When CM killed the region server later master did not see these WAL files: 
> {code}
> ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:46,075 
> INFO  [MASTER_SERVER_OPERATIONS-os-enis-dal-test-jun-4-3:16000-0] 
> master.SplitLogManager: started splitting 2 logs in 
> [hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting]
>  for [os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285]
> ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:47,300 
> INFO  [main-EventThread] wal.WALSplitter: Archived processed log 
> hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285.default.1433475074436
>  to 
> hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/oldWALs/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285.default.1433475074436
> ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:50,497 
> INFO  [main-EventThread] wal.WALSplitter: Archived processed log 
> 

[jira] [Commented] (HBASE-14804) HBase shell's create table command ignores 'NORMALIZATION_ENABLED' attribute

2015-11-12 Thread Appy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003479#comment-15003479
 ] 

Appy commented on HBASE-14804:
--

Just spent some redundant time on this because I wasn't aware that you were 
working on it in the background too. But that's cool. Just make sure to assign 
issues to yourself if you're planning to work on them. :)

As for the patch, lgtm.
Couple of suggestions to consider:
- This bug resulted from code duplication. Consider refactoring the common code 
into a separate function to prevent the same thing in the future. (You'll uncover 
another bug in the process.)
- If you absolutely want to run the unit tests, you can create a temporary 
[TestShell|https://github.com/apache/hbase/blob/dff86542d558394cc87ede256bd5432d071ed73f/hbase-shell/src/test/java/org/apache/hadoop/hbase/client/TestShell.java]
 in your repo.
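For illustration, the suggested refactor looks roughly like this, sketched in Java (the real shell code is Ruby, and `parseTableAttributes` and the `KNOWN` set are hypothetical names, not HBase code): both create and alter route their attribute maps through one shared helper, so a key like NORMALIZATION_ENABLED cannot be accepted by one path and silently dropped by the other.

```java
import java.util.*;

public class TableArgParser {
  // Keys both commands should accept; NORMALIZATION_ENABLED included.
  private static final Set<String> KNOWN =
      new HashSet<>(Arrays.asList("NORMALIZATION_ENABLED", "DURABILITY"));

  // Single shared entry point: returns recognized attributes, warns on the rest.
  static Map<String, String> parseTableAttributes(Map<String, String> args) {
    Map<String, String> recognized = new HashMap<>();
    for (Map.Entry<String, String> e : args.entrySet()) {
      if (KNOWN.contains(e.getKey())) {
        recognized.put(e.getKey(), e.getValue());
      } else {
        System.out.println("An argument ignored (unknown or overridden): " + e.getKey());
      }
    }
    return recognized;
  }

  public static void main(String[] a) {
    Map<String, String> args = new HashMap<>();
    args.put("NORMALIZATION_ENABLED", "true");
    // Both the create and the alter code paths would call the same helper.
    Map<String, String> out = parseTableAttributes(args);
    System.out.println(out.get("NORMALIZATION_ENABLED"));
  }
}
```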


> HBase shell's create table command ignores 'NORMALIZATION_ENABLED' attribute
> 
>
> Key: HBASE-14804
> URL: https://issues.apache.org/jira/browse/HBASE-14804
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Affects Versions: 1.2.0, 1.1.2
>Reporter: Romil Choksi
>Assignee: Jean-Marc Spaggiari
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14804.v0-trunk.patch
>
>
> I am trying to create a new table and set NORMALIZATION_ENABLED to true, 
> but it seems the NORMALIZATION_ENABLED argument is being ignored, and the 
> attribute is not displayed when running a desc command on 
> that table
> {code}
> hbase(main):020:0> create 'test-table-4', 'cf', {NORMALIZATION_ENABLED => 
> 'true'}
> An argument ignored (unknown or overridden): NORMALIZATION_ENABLED
> 0 row(s) in 4.2670 seconds
> => Hbase::Table - test-table-4
> hbase(main):021:0> desc 'test-table-4'
> Table test-table-4 is ENABLED
> test-table-4
> COLUMN FAMILIES DESCRIPTION
> {NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', 
> KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 
> 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', 
> BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> 1 row(s) in 0.0430 seconds
> {code}
> However, on doing an alter command on that table we can set the 
> NORMALIZATION_ENABLED attribute for that table
> {code}
> hbase(main):022:0> alter 'test-table-4', {NORMALIZATION_ENABLED => 'true'}
> Unknown argument ignored: NORMALIZATION_ENABLED
> Updating all regions with the new schema...
> 1/1 regions updated.
> Done.
> 0 row(s) in 2.3640 seconds
> hbase(main):023:0> desc 'test-table-4'
> Table test-table-4 is ENABLED
> test-table-4, {TABLE_ATTRIBUTES => {NORMALIZATION_ENABLED => 'true'}
> COLUMN FAMILIES DESCRIPTION
> {NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', 
> KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 
> 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', 
> BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> 1 row(s) in 0.0190 seconds
> {code}
> I think it would be better to have a single-step process to enable 
> normalization while creating the table itself, rather than a two-step process 
> of altering the table later on to enable normalization



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14790) Implement a new DFSOutputStream for logging WAL only

2015-11-12 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003477#comment-15003477
 ] 

Sean Busbey commented on HBASE-14790:
-

We don't need to sync in a different thread. That's old code I've yet to
see benchmark justification for.

-- 
Sean
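For context, the multi-threaded sync pattern being questioned can be sketched roughly as follows (illustrative names, not the actual FSHLog code; the runnable stands in for a real `hflush` on the output stream):

```java
import java.util.concurrent.*;

public class MultiSyncer {
  private final ExecutorService syncers;

  MultiSyncer(int nThreads) {            // the "why 5, not 10 or 100?" knob
    this.syncers = Executors.newFixedThreadPool(nThreads);
  }

  // Hand the flush to a syncer thread; the caller blocks only on its own future,
  // so multiple callers don't serialize on a single in-flight flush.
  Future<?> requestSync(Runnable hflush) {
    return syncers.submit(hflush);
  }

  void shutdown() {
    syncers.shutdown();
  }

  public static void main(String[] args) throws Exception {
    MultiSyncer s = new MultiSyncer(5);
    Future<?> f = s.requestSync(() -> { /* stand-in for stream.hflush() */ });
    f.get();                              // wait for this caller's sync
    s.shutdown();
    System.out.println("synced");
  }
}
```

Whether fanning flushes out across threads actually beats a single inline sync is exactly the benchmark question raised above.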



> Implement a new DFSOutputStream for logging WAL only
> 
>
> Key: HBASE-14790
> URL: https://issues.apache.org/jira/browse/HBASE-14790
> Project: HBase
>  Issue Type: Improvement
>Reporter: Duo Zhang
>
> The original {{DFSOutputStream}} is very powerful and aims to serve all 
> purposes. But in fact, we do not need most of the features if we only want to 
> log WAL. For example, we do not need pipeline recovery since we could just 
> close the old logger and open a new one. And also, we do not need to write 
> multiple blocks since we could also open a new logger if the old file is too 
> large.
> And the most important thing is that it is hard to handle all the corner 
> cases to avoid data loss or data inconsistency (such as HBASE-14004) when 
> using the original DFSOutputStream due to its complicated logic. And the 
> complicated logic also forces us to use some magical tricks to increase 
> performance. For example, we need to use multiple threads to call {{hflush}} 
> when logging, and now we use 5 threads. But why 5 and not 10 or 100?
> So here, I propose we should implement our own {{DFSOutputStream}} when 
> logging WAL. For correctness, and also for performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14223) Meta WALs are not cleared if meta region was closed and RS aborts

2015-11-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003599#comment-15003599
 ] 

Hadoop QA commented on HBASE-14223:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12772103/hbase-14223_v3-branch-1.patch
  against branch-1 branch at commit 789f8a5a70242c16ce10bc95401c51c7d04debfa.
  ATTACHMENT ID: 12772103

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 19 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
3776 checkstyle errors (more than the master's current 3773 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16507//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16507//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16507//artifact/patchprocess/checkstyle-aggregate.html

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16507//console

This message is automatically generated.

> Meta WALs are not cleared if meta region was closed and RS aborts
> -
>
> Key: HBASE-14223
> URL: https://issues.apache.org/jira/browse/HBASE-14223
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.4, 1.0.4
>
> Attachments: HBASE-14223logs, hbase-14223_v0.patch, 
> hbase-14223_v1-branch-1.patch, hbase-14223_v2-branch-1.patch, 
> hbase-14223_v3-branch-1.patch
>
>

[jira] [Updated] (HBASE-14798) NPE reporting server load causes regionserver abort; causes TestAcidGuarantee to fail

2015-11-12 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-14798:
--
Attachment: 14798.patch

Address the NPE. Add protection against other possible NPEs. Undid some 
duplicated code.
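The shape of the guard is roughly the following (hypothetical names, not the actual HRegion.getOldestHfileTs code): treat a null or mid-close store-file list as "nothing readable" instead of dereferencing it while building the server load report.

```java
import java.util.*;

public class OldestHfileTs {
  // Returns the oldest timestamp seen, or Long.MAX_VALUE when nothing is
  // readable; never throws on a null store list or null file entries.
  static long oldestHfileTs(List<List<Long>> storesToFileTs) {
    long oldest = Long.MAX_VALUE;
    if (storesToFileTs == null) {
      return oldest;                      // store map not initialized yet
    }
    for (List<Long> files : storesToFileTs) {
      if (files == null) continue;        // store mid-close: skip, don't NPE
      for (Long ts : files) {
        if (ts != null && ts < oldest) oldest = ts;
      }
    }
    return oldest;
  }

  public static void main(String[] args) {
    List<List<Long>> stores = new ArrayList<>();
    stores.add(null);                                   // a closing store
    stores.add(Arrays.asList(1433470511501L, null));    // one readable ts
    System.out.println(oldestHfileTs(stores));
  }
}
```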

> NPE reporting server load causes regionserver abort; causes TestAcidGuarantee 
> to fail
> -
>
> Key: HBASE-14798
> URL: https://issues.apache.org/jira/browse/HBASE-14798
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Reporter: stack
>Assignee: stack
> Attachments: 14798.patch
>
>
> Below crashed out a RS. Caused TestAcidGuarantees to fail because then there 
> were no RSes to assign to... 
> {code}
> 2015-11-11 11:36:23,092 ERROR 
> [B.defaultRpcServer.handler=4,queue=0,port=58655] 
> master.MasterRpcServices(388): Region server 
> asf907.gq1.ygridcore.net,55184,1447241756717 reported a fatal error:
> ABORTING region server asf907.gq1.ygridcore.net,55184,1447241756717: 
> Unhandled: null
> Cause:
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getOldestHfileTs(HRegion.java:1643)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.createRegionLoad(HRegionServer.java:1503)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.buildServerLoad(HRegionServer.java:1210)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:1153)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:969)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:156)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:108)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:140)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:360)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
>   at 
> org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:307)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:138)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Here is the failure: 
> https://builds.apache.org/view/H-L/view/HBase/job/HBase-Trunk_matrix/457/jdk=latest1.8,label=Hadoop/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.TestAcidGuarantees-output.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14798) NPE reporting server load causes regionserver abort; causes TestAcidGuarantee to fail

2015-11-12 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-14798:
--
Status: Patch Available  (was: Open)

> NPE reporting server load causes regionserver abort; causes TestAcidGuarantee 
> to fail
> -
>
> Key: HBASE-14798
> URL: https://issues.apache.org/jira/browse/HBASE-14798
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Reporter: stack
>Assignee: stack
> Attachments: 14798.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14802) Replaying server crash recovery procedure after a failover causes incorrect handling of deadservers

2015-11-12 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003643#comment-15003643
 ] 

stack commented on HBASE-14802:
---

[~ashu210890]  Thanks for the find. Makes sense. How did you discover this 
issue?  I wonder why we've not seen it before? 

On the patch, yeah, would be good if we could keep procids encapsulated. Let me 
take a look and see if I can do something on top of your patch. Let me see 
if I can do a test too.
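A minimal sketch of the guarded bookkeeping, with illustrative names (`numProcessing`, `recoveryFinished`; this is not the actual DeadServer code): clamp the decrement at zero so a ServerCrashProcedure replayed after a failover reset cannot drive the counter negative and leave the balancer stuck.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class DeadServerCounter {
  private final AtomicInteger numProcessing = new AtomicInteger();

  void serverAdded() {
    numProcessing.incrementAndGet();
  }

  // Guarded decrement: a replayed ServerCrashProcedure finishing after a
  // master failover (which reset the counter to 0) must not go negative.
  void recoveryFinished() {
    numProcessing.updateAndGet(n -> Math.max(0, n - 1));
  }

  boolean areDeadServersInProgress() {
    return numProcessing.get() > 0;
  }

  public static void main(String[] args) {
    DeadServerCounter c = new DeadServerCounter();
    // New master after failover: counter starts at 0, yet a replayed
    // procedure for a pre-failover crash still completes.
    c.recoveryFinished();
    System.out.println(c.areDeadServersInProgress());
  }
}
```

An alternative design, as hinted above, is to keep the counter fully encapsulated behind the procedure lifecycle so adds and finishes cannot get out of step in the first place.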

> Replaying server crash recovery procedure after a failover causes incorrect 
> handling of deadservers
> ---
>
> Key: HBASE-14802
> URL: https://issues.apache.org/jira/browse/HBASE-14802
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 2.0.0, 1.2.0, 1.2.1
>Reporter: Ashu Pachauri
>Assignee: Ashu Pachauri
> Attachments: HBASE-14802.patch
>
>
> The way dead servers are processed is that a ServerCrashProcedure is launched 
> for a server after it is added to the dead servers list. 
> Every time a server is added to the dead list, a counter "numProcessing" is 
> incremented and it is decremented when a crash recovery procedure finishes. 
> Since, adding a dead server and recovering it are two separate events, it can 
> cause inconsistencies.
> If a master failover occurs in the middle of the crash recovery, the 
> numProcessing counter resets but the ServerCrashProcedure is replayed by the 
> new master. This causes the counter to go negative and makes the master think 
> that dead servers are still in the process of recovery. 
> This has ramifications for the balancer: it ceases to run after 
> such a failover.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-14807) TestWALLockup is flakey

2015-11-12 Thread stack (JIRA)
stack created HBASE-14807:
-

 Summary: TestWALLockup is flakey
 Key: HBASE-14807
 URL: https://issues.apache.org/jira/browse/HBASE-14807
 Project: HBase
  Issue Type: Bug
  Components: flakey, test
Reporter: stack
Assignee: stack


Fails frequently. 

Looks like this:

{code}
2015-11-12 10:38:51,812 DEBUG [Time-limited test] regionserver.HRegion(3882): 
Found 0 recovered edits file(s) under 
/home/jenkins/jenkins-slave/workspace/HBase-1.2/jdk/latest1.7/label/Hadoop/hbase-server/target/test-data/8b8f8f12-1819-47e3-b1f1-8ffa789438ad/data/default/testLockupWhenSyncInMiddleOfZigZagSetup/c8694b53368f3301a8d370089120388d
2015-11-12 10:38:51,821 DEBUG [Time-limited test] 
regionserver.FlushLargeStoresPolicy(56): 
hbase.hregion.percolumnfamilyflush.size.lower.bound is not specified, use 
global config(16777216) instead
2015-11-12 10:38:51,880 DEBUG [Time-limited test] wal.WALSplitter(729): Wrote 
region 
seqId=/home/jenkins/jenkins-slave/workspace/HBase-1.2/jdk/latest1.7/label/Hadoop/hbase-server/target/test-data/8b8f8f12-1819-47e3-b1f1-8ffa789438ad/data/default/testLockupWhenSyncInMiddleOfZigZagSetup/c8694b53368f3301a8d370089120388d/recovered.edits/2.seqid
 to file, newSeqId=2, maxSeqId=0
2015-11-12 10:38:51,881 INFO  [Time-limited test] regionserver.HRegion(868): 
Onlined c8694b53368f3301a8d370089120388d; next sequenceid=2
2015-11-12 10:38:51,994 ERROR [sync.1] wal.FSHLog$SyncRunner(1226): Error 
syncing, request close of WAL
java.io.IOException: FAKE! Failed to replace a bad datanode...SYNC
at 
org.apache.hadoop.hbase.regionserver.TestWALLockup$1DodgyFSLog$1.sync(TestWALLockup.java:162)
at 
org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:1222)
at java.lang.Thread.run(Thread.java:745)
2015-11-12 10:38:51,997 DEBUG [Thread-4] regionserver.LogRoller(139): WAL roll 
requested
2015-11-12 10:38:52,019 DEBUG [flusher] 
regionserver.FlushLargeStoresPolicy(100): Since none of the CFs were above the 
size, flushing all.
2015-11-12 10:38:52,192 INFO  [Thread-4] 
regionserver.TestWALLockup$1DodgyFSLog(129): LATCHED
java.lang.InterruptedException: sleep interrupted
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.hbase.util.Threads.sleep(Threads.java:146)
at 
org.apache.hadoop.hbase.regionserver.TestWALLockup.testLockupWhenSyncInMiddleOfZigZagSetup(TestWALLockup.java:245)
2015-11-12 10:39:18,609 INFO  [main] regionserver.TestWALLockup(91): Cleaning 
test directory: 
/home/jenkins/jenkins-slave/workspace/HBase-1.2/jdk/latest1.7/label/Hadoop/hbase-server/target/test-data/8b8f8f12-1819-47e3-b1f1-8ffa789438ad
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.lang.Thread.run(Thread.java:745)

{code}

... then times out after being locked up for 30 seconds.  Writes 50+MB of logs 
while spinning.

Reported as this:

{code}
---
Test set: org.apache.hadoop.hbase.regionserver.TestWALLockup
---
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 198.23 sec <<< 
FAILURE! - in org.apache.hadoop.hbase.regionserver.TestWALLockup
testLockupWhenSyncInMiddleOfZigZagSetup(org.apache.hadoop.hbase.regionserver.TestWALLockup)
  Time elapsed: 0.049 sec  <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 30000 
milliseconds
at org.apache.log4j.Category.callAppenders(Category.java:205)
at org.apache.log4j.Category.forcedLog(Category.java:391)
at org.apache.log4j.Category.log(Category.java:856)
at 
org.apache.commons.logging.impl.Log4JLogger.debug(Log4JLogger.java:155)
at 
org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1386)
at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1352)
at 

[jira] [Updated] (HBASE-14807) TestWALLockup is flakey

2015-11-12 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-14807:
--
Attachment: 14807.patch

Down timeout from 30 seconds to 15. It runs fast. Let it fail before it 
generates loads of spew.

The appends/puts had no edit in them so were skirting a code path; only on occasion 
would we go the intended route. We also had a flush in there to add complexity... but 
mostly it was just a noop because we didn't have anything in the memstore.

Address both of the above. This fixes the hang we were seeing, which was us waiting 
on a flush to be done 'flushing' when no flush was happening.
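The fail-fast idea can be sketched with a bounded wait (illustrative, not the test's actual code): the latch stands in for "flush done", and a lockup surfaces as a quick timeout instead of spinning for the full test timeout while spewing logs. The patch uses 15 seconds; the example uses 50 ms so it returns quickly.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class BoundedWait {
  public static void main(String[] args) throws Exception {
    // Never counted down: simulates the flush that never happens.
    CountDownLatch flushDone = new CountDownLatch(1);
    // Bounded wait instead of an unbounded spin on the flush.
    boolean done = flushDone.await(50, TimeUnit.MILLISECONDS);
    System.out.println(done ? "flushed" : "timed out, fail fast");
  }
}
```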



> TestWALLockup is flakey
> ---
>
> Key: HBASE-14807
> URL: https://issues.apache.org/jira/browse/HBASE-14807
> Project: HBase
>  Issue Type: Bug
>  Components: flakey, test
>Reporter: stack
>Assignee: stack
> Attachments: 14807.patch
>
>

[jira] [Updated] (HBASE-14807) TestWALLockup is flakey

2015-11-12 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-14807:
--
Status: Patch Available  (was: Open)

> TestWALLockup is flakey
> ---
>
> Key: HBASE-14807
> URL: https://issues.apache.org/jira/browse/HBASE-14807
> Project: HBase
>  Issue Type: Bug
>  Components: flakey, test
>Reporter: stack
>Assignee: stack
> Attachments: 14807.patch
>
>
> ---
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 198.23 sec 
> <<< FAILURE! - in org.apache.hadoop.hbase.regionserver.TestWALLockup
> testLockupWhenSyncInMiddleOfZigZagSetup(org.apache.hadoop.hbase.regionserver.TestWALLockup)
>   Time elapsed: 0.049 sec  <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 3 
> milliseconds
>   at org.apache.log4j.Category.callAppenders(Category.java:205)
>   at org.apache.log4j.Category.forcedLog(Category.java:391)
>   at 

[jira] [Commented] (HBASE-14790) Implement a new DFSOutputStream for logging WAL only

2015-11-12 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003638#comment-15003638
 ] 

stack commented on HBASE-14790:
---

[~wheat9]
bq. The potential issue I see is that the DN might mask the failures and 
introduce additional delays in the pipeline. 

I can see this project running up against this sort of problem, yes. The DN 
thinking it knows better is a good point.

> Implement a new DFSOutputStream for logging WAL only
> 
>
> Key: HBASE-14790
> URL: https://issues.apache.org/jira/browse/HBASE-14790
> Project: HBase
>  Issue Type: Improvement
>Reporter: Duo Zhang
>
> The original {{DFSOutputStream}} is very powerful and aims to serve all 
> purposes. But in fact, we do not need most of the features if we only want to 
> log WAL. For example, we do not need pipeline recovery since we could just 
> close the old logger and open a new one. And also, we do not need to write 
> multiple blocks since we could also open a new logger if the old file is too 
> large.
> And most importantly, it is hard to handle all the corner 
> cases to avoid data loss or data inconsistency (such as HBASE-14004) when 
> using the original DFSOutputStream due to its complicated logic. And the 
> complicated logic also forces us to use some magical tricks to increase 
> performance. For example, we need to use multiple threads to call {{hflush}} 
> when logging, and now we use 5 threads. But why 5, not 10 or 100?
> So here, I propose we should implement our own {{DFSOutputStream}} when 
> logging WAL. For correctness, and also for performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14805) status should show the master in shell

2015-11-12 Thread Romil Choksi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003447#comment-15003447
 ] 

Romil Choksi commented on HBASE-14805:
--

[~enis]
Have you tried the zk_dump command from the hbase shell? It lists the active 
master, backup masters, and region servers.

{code}
hbase(main):060:0> zk_dump
HBase is rooted at /hbase-secure
Active master address: hbase-dalm20-rc-2.novalocal,2,1447301843054
Backup master addresses:
Region server holding hbase:meta: 
hbase-dalm20-rc-7.novalocal,16020,1447301860073
Region servers:
 hbase-dalm20-rc-5.novalocal,16020,1447301868926
 hbase-dalm20-rc-1.novalocal,16020,1447301859425
 hbase-dalm20-rc-2.novalocal,16020,1447301856988
{code}

> status should show the master in shell
> --
>
> Key: HBASE-14805
> URL: https://issues.apache.org/jira/browse/HBASE-14805
> Project: HBase
>  Issue Type: Improvement
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: hbase-14805_v1.patch
>
>
> {{status 'simple'}} or {{'detailed'}} only shows the regionservers and 
> regions, but not the active master. Actually, there is no way to know about 
> the active masters from the shell it seems. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14803) Add some debug logs to StoreFileScanner

2015-11-12 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003526#comment-15003526
 ] 

Anoop Sam John commented on HBASE-14803:


Would it be better still if we could add the reason the file got skipped 
(time range or bloom)? It will involve more lines of code, as we would have 
to do a boolean check and return after every single check.
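The check-and-return structure described above can be sketched as follows. This is a hypothetical illustration, not the actual HBase patch: the class and method names are invented, and in real HBase code the strings would be `LOG.debug(...)` calls inside `StoreFileScanner`.

```java
// Hypothetical sketch: to log *why* a store file is skipped, each condition
// must be tested separately with an early return, instead of one combined
// boolean expression that loses the reason.
public class SkipReasonSketch {

  // Returns a description of the scanner's decision for a store file.
  static String shouldSeek(boolean inTimeRange, boolean passesBloom) {
    if (!inTimeRange) {
      return "skipped: outside scan time range"; // would be LOG.debug(...) in HBase
    }
    if (!passesBloom) {
      return "skipped: bloom filter excludes row/column";
    }
    return "seek";
  }

  public static void main(String[] args) {
    System.out.println(shouldSeek(false, true));
    System.out.println(shouldSeek(true, false));
    System.out.println(shouldSeek(true, true));
  }
}
```

The cost Anoop mentions is visible here: one `return a && b;` becomes several statements, but each exit point can now carry its own message.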

> Add some debug logs to StoreFileScanner
> ---
>
> Key: HBASE-14803
> URL: https://issues.apache.org/jira/browse/HBASE-14803
> Project: HBase
>  Issue Type: Bug
>Reporter: Jean-Marc Spaggiari
>Assignee: Jean-Marc Spaggiari
>Priority: Minor
>  Labels: beginner
> Fix For: 1.2.0
>
> Attachments: HBASE-14803.v0-trunk.patch, HBASE-14803.v1-trunk.patch
>
>
> To validate some behaviors I had to add some logs into StoreFileScanner.
> I think it can be interesting for other people looking to debug, so I am 
> sharing the modifications here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14804) HBase shell's create table command ignores 'NORMALIZATION_ENABLED' attribute

2015-11-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003543#comment-15003543
 ] 

Hadoop QA commented on HBASE-14804:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12772096/HBASE-14804.v0-trunk.patch
  against master branch at commit 789f8a5a70242c16ce10bc95401c51c7d04debfa.
  ATTACHMENT ID: 12772096

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16504//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16504//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16504//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16504//console

This message is automatically generated.

> HBase shell's create table command ignores 'NORMALIZATION_ENABLED' attribute
> 
>
> Key: HBASE-14804
> URL: https://issues.apache.org/jira/browse/HBASE-14804
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Affects Versions: 1.2.0, 1.1.2
>Reporter: Romil Choksi
>Assignee: Jean-Marc Spaggiari
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14804.v0-trunk.patch
>
>
> I am trying to create a new table and set NORMALIZATION_ENABLED to true, 
> but the NORMALIZATION_ENABLED argument seems to be ignored. The 
> NORMALIZATION_ENABLED attribute is also not displayed when running a desc 
> command on that table:
> {code}
> hbase(main):020:0> create 'test-table-4', 'cf', {NORMALIZATION_ENABLED => 
> 'true'}
> An argument ignored (unknown or overridden): NORMALIZATION_ENABLED
> 0 row(s) in 4.2670 seconds
> => Hbase::Table - test-table-4
> hbase(main):021:0> desc 'test-table-4'
> Table test-table-4 is ENABLED 
>   
> 
> test-table-4  
>   
> 
> COLUMN FAMILIES DESCRIPTION   
>   
> 
> {NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', 
> KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 
> 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOC
> KCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} 
>   
> 
> 1 row(s) in 0.0430 seconds
> {code}
> However, on doing an alter command on that table we can set the 
> NORMALIZATION_ENABLED attribute for that table
> {code}
> hbase(main):022:0> alter 'test-table-4', {NORMALIZATION_ENABLED => 'true'}
> Unknown argument ignored: NORMALIZATION_ENABLED
> Updating all regions with the new schema...
> 1/1 regions 

[jira] [Commented] (HBASE-14790) Implement a new DFSOutputStream for logging WAL only

2015-11-12 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003560#comment-15003560
 ] 

Haohui Mai commented on HBASE-14790:


Making the errors in the pipeline visible to HBase allows it to detect 
failures and to recover from them much faster, which has a lot of benefits in 
terms of reducing the latency of HBase.

An Exokernel-style writer will eventually allow HBase to write to HDFS in 
parallel, further reducing the latency by 3x.

I would suggest (1) implementing the writer in the HDFS project to reduce the 
cost of maintenance, and (2) making it event-driven so that it is reusable when 
building today's {{DFSOutputStream}}. It's much harder to do so today as there 
is a lot of synchronization happening for throttling, etc.

It is relatively straightforward to implement the current client-side pipeline 
protocol without handling failures. The potential issue I see is that the DN 
might mask the failures and introduce additional delays in the pipeline. To 
fully get the benefits it might require changing the protocol. That being 
said, the project suddenly becomes much riskier when it requires changes on the 
server side.

A less risky route is to combine the effort with the HTTP/2 initiatives of HDFS 
which allows full control on both the client and the server side. Thoughts?

> Implement a new DFSOutputStream for logging WAL only
> 
>
> Key: HBASE-14790
> URL: https://issues.apache.org/jira/browse/HBASE-14790
> Project: HBase
>  Issue Type: Improvement
>Reporter: Duo Zhang
>
> The original {{DFSOutputStream}} is very powerful and aims to serve all 
> purposes. But in fact, we do not need most of the features if we only want to 
> log WAL. For example, we do not need pipeline recovery since we could just 
> close the old logger and open a new one. And also, we do not need to write 
> multiple blocks since we could also open a new logger if the old file is too 
> large.
> And most importantly, it is hard to handle all the corner 
> cases to avoid data loss or data inconsistency (such as HBASE-14004) when 
> using the original DFSOutputStream due to its complicated logic. And the 
> complicated logic also forces us to use some magical tricks to increase 
> performance. For example, we need to use multiple threads to call {{hflush}} 
> when logging, and now we use 5 threads. But why 5, not 10 or 100?
> So here, I propose we should implement our own {{DFSOutputStream}} when 
> logging WAL. For correctness, and also for performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12593) Tags and Tag dictionary to work with BB

2015-11-12 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003596#comment-15003596
 ] 

Anoop Sam John commented on HBASE-12593:


What do you think abt above approach [~stack]?

> Tags and Tag dictionary to work with BB
> ---
>
> Key: HBASE-12593
> URL: https://issues.apache.org/jira/browse/HBASE-12593
> Project: HBase
>  Issue Type: Sub-task
>  Components: regionserver, Scanners
>Reporter: ramkrishna.s.vasudevan
>Assignee: Anoop Sam John
>
> Adding the subtask so that we don't forget it. Came up while reviewing the 
> items required for this parent task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14790) Implement a new DFSOutputStream for logging WAL only

2015-11-12 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003595#comment-15003595
 ] 

Duo Zhang commented on HBASE-14790:
---

HTTP/2 has its own problems: we haven't finished the read path yet, and the 
write protocol is much more complex than the read protocol... Also, if we plan 
to do it in HDFS, I think we should make it more general, and it is better to 
design a good event-driven FileSystem interface from the beginning. I do not 
think either of these is easy...

So my plan is to first implement a simple version in HBase that is only 
compatible with hadoop 2.x, make sure it has some benefits, and actually ship 
it with HBase. Then we could start implementing a more general and more 
powerful event-driven FileSystem in HDFS. When the new FileSystem is out, we 
could move HBase to it and drop the old simple version.

What do you think? [~wheat9]

Thanks.

> Implement a new DFSOutputStream for logging WAL only
> 
>
> Key: HBASE-14790
> URL: https://issues.apache.org/jira/browse/HBASE-14790
> Project: HBase
>  Issue Type: Improvement
>Reporter: Duo Zhang
>
> The original {{DFSOutputStream}} is very powerful and aims to serve all 
> purposes. But in fact, we do not need most of the features if we only want to 
> log WAL. For example, we do not need pipeline recovery since we could just 
> close the old logger and open a new one. And also, we do not need to write 
> multiple blocks since we could also open a new logger if the old file is too 
> large.
> And most importantly, it is hard to handle all the corner 
> cases to avoid data loss or data inconsistency (such as HBASE-14004) when 
> using the original DFSOutputStream due to its complicated logic. And the 
> complicated logic also forces us to use some magical tricks to increase 
> performance. For example, we need to use multiple threads to call {{hflush}} 
> when logging, and now we use 5 threads. But why 5, not 10 or 100?
> So here, I propose we should implement our own {{DFSOutputStream}} when 
> logging WAL. For correctness, and also for performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14798) NPE reporting server load causes regionserver abort; causes TestAcidGuarantee to fail

2015-11-12 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003611#comment-15003611
 ] 

stack commented on HBASE-14798:
---

This new metric added to region load came in in January, via 
"HBASE-12859 New master API to track major compaction completion." It is 
present in most branch-1 releases.

> NPE reporting server load causes regionserver abort; causes TestAcidGuarantee 
> to fail
> -
>
> Key: HBASE-14798
> URL: https://issues.apache.org/jira/browse/HBASE-14798
> Project: HBase
>  Issue Type: Sub-task
>  Components: test
>Reporter: stack
>Assignee: stack
> Attachments: 14798.patch
>
>
> The exception below crashed out a RS. It caused TestAcidGuarantees to fail 
> because there were then no RSs to assign to... 
> {code}
> 2015-11-11 11:36:23,092 ERROR 
> [B.defaultRpcServer.handler=4,queue=0,port=58655] 
> master.MasterRpcServices(388): Region server 
> asf907.gq1.ygridcore.net,55184,1447241756717 reported a fatal error:
> ABORTING region server asf907.gq1.ygridcore.net,55184,1447241756717: 
> Unhandled: null
> Cause:
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getOldestHfileTs(HRegion.java:1643)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.createRegionLoad(HRegionServer.java:1503)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.buildServerLoad(HRegionServer.java:1210)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:1153)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:969)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:156)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:108)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:140)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:360)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
>   at 
> org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:307)
>   at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:138)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Here is the failure: 
> https://builds.apache.org/view/H-L/view/HBase/job/HBase-Trunk_matrix/457/jdk=latest1.8,label=Hadoop/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.TestAcidGuarantees-output.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14790) Implement a new DFSOutputStream for logging WAL only

2015-11-12 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003636#comment-15003636
 ] 

stack commented on HBASE-14790:
---

[~busbey]

bq. That's old code I've yet to see benchmark justification for.

Here is where having multiple syncers is first challenged: 
https://issues.apache.org/jira/browse/HBASE-8755?focusedCommentId=13830604=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13830604
 It is followed by numbers that show that throughput is better with 5. I 
remember playing with syncers after the ringbuffer went in and arriving again 
at 5 syncers as best for throughput.

I'll be happy to see them go. Chatting w/ [~Apache9], having a single thread 
start the syncs with callbacks to take care of letting blocked handlers go 
sounds cleaner. I'm not sure how it will look just yet. We'll want to keep our 
group commits fat and we'll want to minimize inter-thread 
communication/blocking.
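The single-syncer-with-callbacks idea mentioned above can be sketched roughly as below. This is a hypothetical illustration of the pattern under discussion, not HBase code: class and method names are invented, and the actual `hflush` to HDFS is elided to a comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: handler threads enqueue sync requests as futures; a
// single syncer thread batches everything pending into one "fat" group commit
// and completes all futures at once, releasing the blocked handlers.
public class GroupCommitSketch {
  private final List<CompletableFuture<Void>> pending = new ArrayList<>();

  // Called by a handler thread; the returned future completes when its edit
  // has been made durable by some later sync.
  public synchronized CompletableFuture<Void> requestSync() {
    CompletableFuture<Void> f = new CompletableFuture<>();
    pending.add(f);
    return f;
  }

  // Called by the single syncer thread: one sync covers every pending request.
  public void syncOnce() {
    List<CompletableFuture<Void>> batch;
    synchronized (this) {
      batch = new ArrayList<>(pending);
      pending.clear();
    }
    // ... the actual hflush() to HDFS would happen here ...
    batch.forEach(f -> f.complete(null)); // release all blocked handlers at once
  }

  public static void main(String[] args) {
    GroupCommitSketch wal = new GroupCommitSketch();
    CompletableFuture<Void> a = wal.requestSync();
    CompletableFuture<Void> b = wal.requestSync();
    wal.syncOnce(); // one sync satisfies both waiters
    System.out.println(a.isDone() && b.isDone());
  }
}
```

The design keeps group commits fat (every pending request rides one sync) while the only inter-thread touch point is the short lock around the pending list.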

[~wheat9]
bq. A less risky route is to combine the effort with the HTTP/2 initiatives of 
HDFS which allows full control on both the client and the server side. Thoughts?

I'm with [~Apache9]. Let's not broaden the scope of an already involved project 
by mixing in http/2. Let's also get a success here in hbase first -- we have a 
very particular need, and we have tooling and mechanisms to verify perf and 
correct behavior -- using a subset of the write API in an async way, and then 
move to the general case with this feather in our hat (Einstein published the 
special theory of relativity before he did the general).





> Implement a new DFSOutputStream for logging WAL only
> 
>
> Key: HBASE-14790
> URL: https://issues.apache.org/jira/browse/HBASE-14790
> Project: HBase
>  Issue Type: Improvement
>Reporter: Duo Zhang
>
> The original {{DFSOutputStream}} is very powerful and aims to serve all 
> purposes. But in fact, we do not need most of the features if we only want to 
> log WAL. For example, we do not need pipeline recovery since we could just 
> close the old logger and open a new one. And also, we do not need to write 
> multiple blocks since we could also open a new logger if the old file is too 
> large.
> And most importantly, it is hard to handle all the corner 
> cases to avoid data loss or data inconsistency (such as HBASE-14004) when 
> using the original DFSOutputStream due to its complicated logic. And the 
> complicated logic also forces us to use some magical tricks to increase 
> performance. For example, we need to use multiple threads to call {{hflush}} 
> when logging, and now we use 5 threads. But why 5, not 10 or 100?
> So here, I propose we should implement our own {{DFSOutputStream}} when 
> logging WAL. For correctness, and also for performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14802) Replaying server crash recovery procedure after a failover causes incorrect handling of deadservers

2015-11-12 Thread Ashu Pachauri (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003648#comment-15003648
 ] 

Ashu Pachauri commented on HBASE-14802:
---

[~stack] I actually encountered one of its side effects, i.e. the balancer 
fails to run in such a scenario. When I debugged my way through it, I found 
this issue. I did not try, but this should be reproducible in integration 
tests too, by expiring a few RSs at once and failing over the master shortly 
after that.

[~stack] [~mbertozzi] I also had the same concern when I started working on 
this; there might be a more elegant solution to this problem. I am also not 
very familiar with this bit of code, but it seems that we need DeadServer to be 
aware of the fact that a separate procedure, which can be replayed many times 
over, is processing the recovery; that should be possible in ways other than 
leaking the procID.

> Replaying server crash recovery procedure after a failover causes incorrect 
> handling of deadservers
> ---
>
> Key: HBASE-14802
> URL: https://issues.apache.org/jira/browse/HBASE-14802
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 2.0.0, 1.2.0, 1.2.1
>Reporter: Ashu Pachauri
>Assignee: Ashu Pachauri
> Attachments: HBASE-14802.patch
>
>
> The way dead servers are processed is that a ServerCrashProcedure is launched 
> for a server after it is added to the dead servers list. 
> Every time a server is added to the dead list, a counter "numProcessing" is 
> incremented, and it is decremented when a crash recovery procedure finishes. 
> Since adding a dead server and recovering it are two separate events, this 
> can cause inconsistencies.
> If a master failover occurs in the middle of the crash recovery, the 
> numProcessing counter resets, but the ServerCrashProcedure is replayed by the 
> new master. This causes the counter to go negative and makes the master think 
> that dead servers are still in the process of recovery. 
> This has ramifications for the balancer: it ceases to run after 
> such a failover.
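The counter inconsistency described in the report can be sketched as follows. This is a hypothetical illustration of the failure mode, not the actual DeadServer/ServerCrashProcedure code: field and method names are simplified stand-ins.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: adding a dead server increments a counter; finishing
// its recovery decrements it. A master failover resets the counter to 0, but
// the replayed crash procedure still decrements it, driving it negative, so
// the new master forever believes recoveries are in progress.
public class DeadServerSketch {
  int numProcessing = 0;
  final Set<String> deadServers = new HashSet<>();

  void add(String server)    { deadServers.add(server); numProcessing++; }
  void finish(String server) { numProcessing--; }

  // The balancer is skipped while this returns true.
  boolean areDeadServersInProgress() { return numProcessing != 0; }

  public static void main(String[] args) {
    DeadServerSketch master = new DeadServerSketch();
    master.add("rs1");                 // numProcessing = 1
    // -- master failover: the new master starts with a fresh counter --
    DeadServerSketch newMaster = new DeadServerSketch();
    newMaster.finish("rs1");           // replayed procedure decrements: -1
    System.out.println(newMaster.numProcessing);              // negative
    System.out.println(newMaster.areDeadServersInProgress()); // balancer stays off
  }
}
```

The sketch makes the pairing problem concrete: increment and decrement happen on different master instances, so the decrement must be made idempotent (or tied to the procedure that did the increment) rather than applied unconditionally on replay.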



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14189) CF Level BC setting should override global one.

2015-11-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002053#comment-15002053
 ] 

Hadoop QA commented on HBASE-14189:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12771945/HBASE-14189_v6.patch
  against master branch at commit 1f62a487284b57fca505bc1b3d04c1f86b2e7d76.
  ATTACHMENT ID: 12771945

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 12 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16498//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16498//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16498//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16498//console

This message is automatically generated.

> CF Level BC setting should override global one.
> ---
>
> Key: HBASE-14189
> URL: https://issues.apache.org/jira/browse/HBASE-14189
> Project: HBase
>  Issue Type: Improvement
>  Components: BlockCache
>Affects Versions: 2.0.0
>Reporter: Heng Chen
>Assignee: Heng Chen
> Attachments: HBASE-14189.patch, HBASE-14189_v1.patch, 
> HBASE-14189_v2.patch, HBASE-14189_v3.patch, HBASE-14189_v4.patch, 
> HBASE-14189_v5.patch, HBASE-14189_v6.patch
>
>
> The original description is ambiguous. I think I will rewrite it.
> Let's look at the {{CacheConfig}} constructor first
> {code}
>   public CacheConfig(Configuration conf, HColumnDescriptor family) {
> this(CacheConfig.instantiateBlockCache(conf),
> family.isBlockCacheEnabled(),
> family.isInMemory(),
> // For the following flags we enable them regardless of per-schema 
> settings
> // if they are enabled in the global configuration.
> conf.getBoolean(CACHE_BLOCKS_ON_WRITE_KEY,
> DEFAULT_CACHE_DATA_ON_WRITE) || family.isCacheDataOnWrite(),
> conf.getBoolean(CACHE_INDEX_BLOCKS_ON_WRITE_KEY,
> DEFAULT_CACHE_INDEXES_ON_WRITE) || family.isCacheIndexesOnWrite(),
> conf.getBoolean(CACHE_BLOOM_BLOCKS_ON_WRITE_KEY,
> DEFAULT_CACHE_BLOOMS_ON_WRITE) || family.isCacheBloomsOnWrite(),
> conf.getBoolean(EVICT_BLOCKS_ON_CLOSE_KEY,
> DEFAULT_EVICT_ON_CLOSE) || family.isEvictBlocksOnClose(),
> conf.getBoolean(CACHE_DATA_BLOCKS_COMPRESSED_KEY, 
> DEFAULT_CACHE_DATA_COMPRESSED),
> conf.getBoolean(PREFETCH_BLOCKS_ON_OPEN_KEY,
> DEFAULT_PREFETCH_ON_OPEN) || family.isPrefetchBlocksOnOpen(),
> conf.getBoolean(HColumnDescriptor.CACHE_DATA_IN_L1,
> HColumnDescriptor.DEFAULT_CACHE_DATA_IN_L1) || 
> family.isCacheDataInL1(),
> 
> conf.getBoolean(DROP_BEHIND_CACHE_COMPACTION_KEY,DROP_BEHIND_CACHE_COMPACTION_DEFAULT)
>  );
>   }
> {code}
> If we dig into it, we will see that {{CacheConfig.cacheDataOnRead}} is used 
> to accept {{family.isBlockCacheEnabled()}}.
> I think this is confusing, given the comment on {{cacheDataOnRead}}:
> {code}
>   /**
>* Whether blocks should be cached on read (default is on if there is a
>* cache but this can be turned off on a per-family or per-request basis).
>* If off we will STILL cache meta blocks; i.e. INDEX and BLOOM types.
>* This cannot be disabled.
>*/
>   private boolean cacheDataOnRead;
> {code}
> So I think we should use another variable to represent 
> {{family.isBlockCacheEnabled()}}.
> The 
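The separation being proposed can be sketched roughly as below. This is a hypothetical illustration, not the actual HBase `CacheConfig`: the class and field names are invented, and the real class carries many more flags.

```java
// Hypothetical sketch: keep the per-family block-cache flag in its own field
// instead of overloading cacheDataOnRead, so the two meanings stay distinct
// and meta blocks (INDEX/BLOOM) remain cacheable regardless.
public class CacheFlagsSketch {
  final boolean globalCacheOnRead;   // from the global configuration
  final boolean familyBlockCache;    // from the column family descriptor

  CacheFlagsSketch(boolean globalCacheOnRead, boolean familyBlockCache) {
    this.globalCacheOnRead = globalCacheOnRead;
    this.familyBlockCache = familyBlockCache;
  }

  // The CF-level setting overrides the global one for data blocks...
  boolean shouldCacheDataOnRead() {
    return globalCacheOnRead && familyBlockCache;
  }

  // ...but meta blocks (INDEX and BLOOM) are still cached either way,
  // matching the "This cannot be disabled" comment quoted above.
  boolean shouldCacheMetaOnRead() {
    return true;
  }

  public static void main(String[] args) {
    CacheFlagsSketch cc = new CacheFlagsSketch(true, false); // BC off on the CF
    System.out.println(cc.shouldCacheDataOnRead());
    System.out.println(cc.shouldCacheMetaOnRead());
  }
}
```

With two fields, a reader of the code can tell whether a `false` means "the family disabled its block cache" or "caching on read is globally off", which is exactly the ambiguity the comment complains about.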

[jira] [Updated] (HBASE-13153) Bulk Loaded HFile Replication

2015-11-12 Thread Ashish Singhi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Singhi updated HBASE-13153:
--
Attachment: HBASE-13153-v13.patch

> Bulk Loaded HFile Replication
> -
>
> Key: HBASE-13153
> URL: https://issues.apache.org/jira/browse/HBASE-13153
> Project: HBase
>  Issue Type: New Feature
>  Components: Replication
>Reporter: sunhaitao
>Assignee: Ashish Singhi
> Fix For: 2.0.0
>
> Attachments: HBASE-13153-v1.patch, HBASE-13153-v10.patch, 
> HBASE-13153-v11.patch, HBASE-13153-v12.patch, HBASE-13153-v13.patch, 
> HBASE-13153-v2.patch, HBASE-13153-v3.patch, HBASE-13153-v4.patch, 
> HBASE-13153-v5.patch, HBASE-13153-v6.patch, HBASE-13153-v7.patch, 
> HBASE-13153-v8.patch, HBASE-13153-v9.patch, HBASE-13153.patch, HBase Bulk 
> Load Replication-v1-1.pdf, HBase Bulk Load Replication-v2.pdf, HBase Bulk 
> Load Replication-v3.pdf, HBase Bulk Load Replication.pdf, HDFS_HA_Solution.PNG
>
>
> Currently we plan to use the HBase Replication feature to deal with a 
> disaster tolerance scenario. But we encountered an issue: we use bulkload 
> very frequently, and because bulkload bypasses the write path it does not 
> generate a WAL, so the data is not replicated to the backup cluster. It's 
> inappropriate to bulkload twice, on both the active cluster and the backup 
> cluster. So I advise making some modifications to the bulkload feature to 
> enable bulkloading to both the active cluster and the backup cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13153) Bulk Loaded HFile Replication

2015-11-12 Thread Ashish Singhi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002066#comment-15002066
 ] 

Ashish Singhi commented on HBASE-13153:
---

Attached patch (v13) addressing Ted's comments.
Please review. If there are no further reviews, please commit this; I have 
already spent a lot of effort rebasing it! Any further issues I can fix as 
part of sub-tasks.

> Bulk Loaded HFile Replication
> -
>
> Key: HBASE-13153
> URL: https://issues.apache.org/jira/browse/HBASE-13153
> Project: HBase
>  Issue Type: New Feature
>  Components: Replication
>Reporter: sunhaitao
>Assignee: Ashish Singhi
> Fix For: 2.0.0
>
> Attachments: HBASE-13153-v1.patch, HBASE-13153-v10.patch, 
> HBASE-13153-v11.patch, HBASE-13153-v12.patch, HBASE-13153-v13.patch, 
> HBASE-13153-v2.patch, HBASE-13153-v3.patch, HBASE-13153-v4.patch, 
> HBASE-13153-v5.patch, HBASE-13153-v6.patch, HBASE-13153-v7.patch, 
> HBASE-13153-v8.patch, HBASE-13153-v9.patch, HBASE-13153.patch, HBase Bulk 
> Load Replication-v1-1.pdf, HBase Bulk Load Replication-v2.pdf, HBase Bulk 
> Load Replication-v3.pdf, HBase Bulk Load Replication.pdf, HDFS_HA_Solution.PNG
>
>
> Currently we plan to use the HBase Replication feature for disaster 
> tolerance. But we hit an issue: we use bulk load very frequently, and because 
> bulk load bypasses the write path it generates no WAL, so the data is never 
> replicated to the backup cluster. Bulk loading twice, once on the active 
> cluster and once on the backup cluster, is inappropriate. So I propose 
> modifying the bulk load feature so that a bulk load reaches both the active 
> and the backup cluster.





[jira] [Updated] (HBASE-14664) Master failover issue: Backup master is unable to start if active master is killed and started in short time interval

2015-11-12 Thread Samir Ahmic (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samir Ahmic updated HBASE-14664:

Status: Open  (was: Patch Available)

Removing "/hbase/meta-region-server" to avoid the backup master startup 
failure will cause issues in the recovery process, so I will look for another 
solution to this issue.
A simple workaround is to set "hbase.balancer.tablesOnMaster" to "none".

> Master failover issue: Backup master is unable to start if active master is 
> killed and started in short time interval
> -
>
> Key: HBASE-14664
> URL: https://issues.apache.org/jira/browse/HBASE-14664
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 2.0.0
>Reporter: Samir Ahmic
>Assignee: Samir Ahmic
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-14664.patch, HBASE-14664.patch
>
>
> I noticed this issue while running IntegrationTestDDLMasterFailover. It can 
> be reproduced simply by executing the following on the active master (tested 
> on a cluster with two masters and three regionservers):
> {code}
> $ kill -9 master_pid; hbase-daemon.sh  start master
> {code} 
> Logs show that the new active master is trying to locate the hbase:meta 
> table on the restarted master:
> {code}
> 2015-10-21 19:28:20,804 INFO  [hnode2:16000.activeMasterManager] 
> zookeeper.MetaTableLocator: Failed verification of hbase:meta,,1 at 
> address=hnode1,16000,1445447051681, 
> exception=org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is 
> not running yet
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.checkOpen(RSRpcServices.java:1092)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegionInfo(RSRpcServices.java:1330)
> at 
> org.apache.hadoop.hbase.master.MasterRpcServices.getRegionInfo(MasterRpcServices.java:1525)
> at 
> org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:22233)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2136)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:106)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> at java.lang.Thread.run(Thread.java:745)
> 2015-10-21 19:28:20,805 INFO  [hnode2:16000.activeMasterManager] 
> master.HMaster: Meta was in transition on hnode1,16000,1445447051681
> 2015-10-21 19:28:20,805 INFO  [hnode2:16000.activeMasterManager] 
> master.AssignmentManager: Processing {1588230740 state=OPEN, 
> ts=1445448500598, server=hnode1,16000,1445447051681
> {code}
> and because of the above, the master is unable to read the hbase:meta table:
> {code}
> 2015-10-21 19:28:49,429 INFO  [hconnection-0x6e9cebcc-shared--pool6-t1] 
> client.AsyncProcess: #2, table=hbase:meta, attempt=10/351 failed=1ops, last 
> exception: org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: 
> org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not 
> running yet
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.checkOpen(RSRpcServices.java:1092)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2083)
> at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32462)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2136)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:106)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> which means the master is unable to complete startup.
> I have also noticed that in this case the /hbase/meta-region-server znode 
> always points at the restarted active master (hnode1 in my cluster).
> I was able to work around this issue by repeating the same scenario with:
> {code}
> $ kill -9 master_pid; hbase zkcli rmr /hbase/meta-region-server; 
> hbase-daemon.sh start master
> {code}
> So the issue is probably caused by a stale value in the 
> /hbase/meta-region-server znode. I will try to create a patch based on the 
> above.
>  





[jira] [Commented] (HBASE-14793) Allow limiting size of block into L1 block cache.

2015-11-12 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002720#comment-15002720
 ] 

Andrew Purtell commented on HBASE-14793:


bq. Emm, what about a metric to count the blocks which couldn't be cached 
because of this new threshold. Can that be useful? Maybe for tuning? what do 
you think?

+1, I'd like that

> Allow limiting size of block into L1 block cache.
> -
>
> Key: HBASE-14793
> URL: https://issues.apache.org/jira/browse/HBASE-14793
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.1.2
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: HBASE-14793-v1.patch, HBASE-14793.patch
>
>
> G1GC does really badly with long lived large objects. Lets allow limiting the 
> size of a block that can be kept in the block cache.
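The cap plus the rejection metric discussed in the comments might look roughly 
like the sketch below. This is a hedged illustration, not HBase's actual 
LruBlockCache API; the class and method names are invented for clarity.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: admit a block to the on-heap (L1) cache only if it is
// under a configurable size cap, and count rejections so the cap can be tuned.
class SizeCappedCache {
    private final long maxBlockSize;                      // cap, e.g. a few MB for G1GC friendliness
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();
    private final AtomicLong rejectedBlocks = new AtomicLong();

    SizeCappedCache(long maxBlockSize) { this.maxBlockSize = maxBlockSize; }

    /** Returns true if the block was cached, false if it exceeded the cap. */
    boolean cacheBlock(String key, byte[] block) {
        if (block.length > maxBlockSize) {
            rejectedBlocks.incrementAndGet();             // metric for tuning the threshold
            return false;
        }
        cache.put(key, block);
        return true;
    }

    long getRejectedBlocks() { return rejectedBlocks.get(); }
}
```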





[jira] [Commented] (HBASE-14790) Implement a new DFSOutputStream for logging WAL only

2015-11-12 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003450#comment-15003450
 ] 

Enis Soztutar commented on HBASE-14790:
---

Sounds like a good idea to drop the pipeline recovery and the single-writer 
assumption (though we do call sync() from different sync threads). How are we 
going to maintain a custom output stream for HDFS, though? Shouldn't this go 
directly into HDFS? [~jingzhao], [~wheat9] FYI. 

> Implement a new DFSOutputStream for logging WAL only
> 
>
> Key: HBASE-14790
> URL: https://issues.apache.org/jira/browse/HBASE-14790
> Project: HBase
>  Issue Type: Improvement
>Reporter: Duo Zhang
>
> The original {{DFSOutputStream}} is very powerful and aims to serve all 
> purposes, but in fact we do not need most of its features if we only want to 
> log the WAL. For example, we do not need pipeline recovery, since we can just 
> close the old logger and open a new one. We also do not need to write 
> multiple blocks, since we can likewise open a new logger if the old file 
> grows too large.
> Most importantly, it is hard to handle all the corner cases needed to avoid 
> data loss or data inconsistency (such as HBASE-14004) when using the 
> original DFSOutputStream, due to its complicated logic. That complicated 
> logic also forces us to use some magical tricks to increase performance. For 
> example, we need multiple threads calling {{hflush}} when logging; today we 
> use 5 threads, but why 5 and not 10 or 100?
> So I propose we implement our own {{DFSOutputStream}} for logging the WAL, 
> for correctness and also for performance.
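The "no pipeline recovery" idea from the description can be sketched as 
follows: on a write error, abandon the failing logger and roll to a fresh one 
instead of repairing the pipeline. This is a hedged illustration under stated 
assumptions; Writer stands in for a single WAL file, and none of these names 
are the proposed HBase classes.

```java
import java.io.IOException;

// Hypothetical sketch: a WAL appender that never repairs a broken writer;
// it just closes the bad file and opens a new one, then retries the append.
class RollingWalSketch {
    interface Writer {
        void append(byte[] entry) throws IOException;
        void close();
    }
    interface WriterFactory {
        Writer create();
    }

    private final WriterFactory factory;
    private Writer current;
    int rolls;                                   // how many times we rolled on error

    RollingWalSketch(WriterFactory factory) {
        this.factory = factory;
        this.current = factory.create();
    }

    void append(byte[] entry) throws IOException {
        try {
            current.append(entry);
        } catch (IOException e) {
            current.close();                     // no recovery: abandon the bad file...
            current = factory.create();          // ...and open a new logger
            rolls++;
            current.append(entry);               // retry once on the fresh writer
        }
    }
}
```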





[jira] [Commented] (HBASE-14805) status should show the master in shell

2015-11-12 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003461#comment-15003461
 ] 

Enis Soztutar commented on HBASE-14805:
---

Thanks [~romil.choksi] for the pointer. Yeah, {{zk_dump}} gives a dump of the 
cluster from zookeeper data. The 'status' command works on top of the 
{{ClusterStatus}} objects obtained from the master. 

> status should show the master in shell
> --
>
> Key: HBASE-14805
> URL: https://issues.apache.org/jira/browse/HBASE-14805
> Project: HBase
>  Issue Type: Improvement
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: hbase-14805_v1.patch
>
>
> {{status 'simple'}} or {{'detailed'}} only shows the regionservers and 
> regions, but not the active master. Actually, there is no way to know about 
> the active masters from the shell it seems. 





[jira] [Commented] (HBASE-14172) Upgrade existing thrift binding using thrift 0.9.2 compiler.

2015-11-12 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003509#comment-15003509
 ] 

Josh Elser commented on HBASE-14172:


bq. Do we know the compatibility story?

Thrift traditionally guarantees wire compatibility between versions; the code 
generated by Thrift is not necessarily backwards compatible. I assume this is 
still the case for 0.9.3. Of course, like all software, whether this was 
actually achieved in 0.9.3 should be taken with a grain of salt absent 
sufficient testing (we've historically had some pain in Accumulo when 
upgrading Thrift, although often due to performance issues rather than wire 
compatibility). I personally haven't yet looked at what 0.9.3 offers over the 
previous versions.

> Upgrade existing thrift binding using thrift 0.9.2 compiler.
> 
>
> Key: HBASE-14172
> URL: https://issues.apache.org/jira/browse/HBASE-14172
> Project: HBase
>  Issue Type: Improvement
>Reporter: Srikanth Srungarapu
>Priority: Minor
> Attachments: HBASE-14172-branch-1.patch, HBASE-14172.patch
>
>






[jira] [Updated] (HBASE-14799) Commons-collections object deserialization remote command execution vulnerability

2015-11-12 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-14799:
---
Attachment: HBASE-14799-0.94.patch

Fix for TestHbaseObjectWritable#testGetNextObjectCode

> Commons-collections object deserialization remote command execution 
> vulnerability 
> --
>
> Key: HBASE-14799
> URL: https://issues.apache.org/jira/browse/HBASE-14799
> Project: HBase
>  Issue Type: Bug
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Critical
> Fix For: 0.94.28, 0.98.17
>
> Attachments: HBASE-14799-0.94.patch, HBASE-14799-0.94.patch, 
> HBASE-14799-0.94.patch, HBASE-14799-0.98.patch
>
>
> Read: 
> http://foxglovesecurity.com/2015/11/06/what-do-weblogic-websphere-jboss-jenkins-opennms-and-your-application-have-in-common-this-vulnerability/
> TL;DR: If you have commons-collections on your classpath and accept and 
> process Java object serialization data, then you probably have an exploitable 
> remote command execution vulnerability. 
> 0.94 and earlier HBase releases are vulnerable because we might read in and 
> rehydrate serialized Java objects out of RPC packet data in 
> HbaseObjectWritable using ObjectInputStream#readObject (see 
> https://hbase.apache.org/0.94/xref/org/apache/hadoop/hbase/io/HbaseObjectWritable.html#714)
>  and we have commons-collections on the classpath on the server.
> 0.98 also carries some limited exposure to this problem through inclusion of 
> backwards compatible deserialization code in 
> HbaseObjectWritableFor96Migration. This is used by the 0.94-to-0.98 migration 
> utility, and by the AccessController when reading permissions from the ACL 
> table serialized in legacy format by 0.94. Unprivileged users cannot run the 
> tool nor access the ACL table.
> Unprivileged users can however attack a 0.94 installation. An attacker might 
> be able to use the method discussed on that blog post to capture valid HBase 
> RPC payloads for 0.94 and prior versions, rewrite them to embed an exploit, 
> and replay them to trigger a remote command execution with the privileges of 
> the account under which the HBase RegionServer daemon is running.
> We need to make a patch release of 0.94 that changes HbaseObjectWritable to 
> disallow processing of random Java object serializations. This will be a 
> compatibility break that might affect old style coprocessors, which quite 
> possibly may rely on this catch-all in HbaseObjectWritable for custom object 
> (de)serialization. We can introduce a new configuration setting, 
> "hbase.allow.legacy.object.serialization", defaulting to false.
> To be thorough, we can also use the new configuration setting  
> "hbase.allow.legacy.object.serialization" (defaulting to false) in 0.98 to 
> prevent the AccessController from falling back to the vulnerable legacy code. 
> This turns out to not affect the ability to migrate permissions because 
> TablePermission implements Writable, which is safe, not Serializable. 
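The general mitigation for this class of bug is a "look-ahead" 
ObjectInputStream that rejects any class not on an explicit allowlist before 
it is instantiated. The sketch below shows that standard defense pattern; it 
is not the actual HBase patch, and the allowlist contents are illustrative.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectStreamClass;
import java.util.Set;

// Hypothetical sketch: deserialization is refused for any class outside the
// allowlist, so a gadget chain in commons-collections can never be rehydrated.
class AllowlistObjectInputStream extends ObjectInputStream {
    private static final Set<String> ALLOWED =
        Set.of("java.lang.Integer", "java.lang.Number");  // illustrative allowlist

    AllowlistObjectInputStream(InputStream in) throws IOException { super(in); }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        // Called for every class descriptor BEFORE any object is constructed.
        if (!ALLOWED.contains(desc.getName())) {
            throw new InvalidClassException(
                "Unauthorized deserialization attempt: " + desc.getName());
        }
        return super.resolveClass(desc);
    }
}
```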





[jira] [Updated] (HBASE-14771) RpcServer.getRemoteAddress always returns null.

2015-11-12 Thread Abhishek Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar updated HBASE-14771:
---
Attachment: HBASE-14771-V2.patch

I was on leave for two days :). Updated patch attached, thanks.

> RpcServer.getRemoteAddress always returns null.
> ---
>
> Key: HBASE-14771
> URL: https://issues.apache.org/jira/browse/HBASE-14771
> Project: HBase
>  Issue Type: Bug
>  Components: IPC/RPC
>Affects Versions: 1.2.0
>Reporter: Abhishek Kumar
>Assignee: Abhishek Kumar
>Priority: Minor
> Attachments: HBASE-14771-V1.patch, HBASE-14771-V2.patch, 
> HBASE-14771.patch
>
>
> RpcServer.getRemoteAddress always returns null because the Call object is 
> initialized with null. This seems to happen because RpcServer.getRemoteIp() 
> is used in the Call constructor before the RpcServer thread-local 'CurCall' 
> has been set in CallRunner.run:
> {noformat}
> // --- RpcServer.java ---
> protected void processRequest(byte[] buf) throws IOException, 
> InterruptedException {
>  .
> // Call object getting initialized here with address 
> // obtained from RpcServer.getRemoteIp()
> Call call = new Call(id, this.service, md, header, param, cellScanner, this, 
> responder,
>   totalRequestSize, traceInfo, RpcServer.getRemoteIp());
>   scheduler.dispatch(new CallRunner(RpcServer.this, call));
>  }
> // getRemoteIp method gets address from threadlocal 'CurCall' which 
> // gets set in CallRunner.run and calling it before this as in above case, 
> will return null
> // --- CallRunner.java ---
> public void run() {
>   .   
>   Pair resultPair = null;
>   RpcServer.CurCall.set(call);
>   ..
> }
> // Using 'this.addr' in place of getRemoteIp method in RpcServer.java seems 
> to be fixing this issue
> Call call = new Call(id, this.service, md, header, param, cellScanner, this, 
> responder,
>   totalRequestSize, traceInfo, this.addr);
> {noformat}
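The bug pattern reported above can be reduced to a minimal sketch: a 
thread-local "current call" that is read in a constructor before any call has 
been dispatched yields null, so the remote address must be passed in 
explicitly. Names below mirror the report but are illustrative, not the real 
RpcServer/CallRunner classes.

```java
import java.net.InetAddress;

// Hypothetical sketch of the race: CUR_CALL is only set later, in the
// equivalent of CallRunner.run, so constructor-time reads see null.
class CurCallSketch {
    static final ThreadLocal<CurCallSketch> CUR_CALL = new ThreadLocal<>();
    final InetAddress remoteAddress;

    // Fix from the report: pass the already-known address (this.addr) directly.
    CurCallSketch(InetAddress addr) { this.remoteAddress = addr; }

    // Equivalent of RpcServer.getRemoteIp(): null until CUR_CALL has been set.
    static InetAddress getRemoteIp() {
        CurCallSketch call = CUR_CALL.get();
        return call == null ? null : call.remoteAddress;
    }
}
```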





[jira] [Created] (HBASE-14806) Missing sources.jar for several modules when building HBase

2015-11-12 Thread Duo Zhang (JIRA)
Duo Zhang created HBASE-14806:
-

 Summary: Missing sources.jar for several modules when building 
HBase
 Key: HBASE-14806
 URL: https://issues.apache.org/jira/browse/HBASE-14806
 Project: HBase
  Issue Type: Bug
Reporter: Duo Zhang


Introduced by HBASE-14085. The problem is that, for example, in 
hbase-common/pom.xml we have
{code:title=pom.xml}
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-source-plugin</artifactId>
  <configuration>
    <excludeResources>true</excludeResources>
    <includes>
      <include>src/main/java</include>
      <include>${project.build.outputDirectory}/META-INF</include>
    </includes>
  </configuration>
</plugin>
{code}
But in fact the path inside the {{include}} tag is relative to the source 
directories, not the project directory, so maven-source-plugin always ends with
{noformat}
No sources in project. Archive not created.
{noformat}





[jira] [Created] (HBASE-14805) status should show the master in shell

2015-11-12 Thread Enis Soztutar (JIRA)
Enis Soztutar created HBASE-14805:
-

 Summary: status should show the master in shell
 Key: HBASE-14805
 URL: https://issues.apache.org/jira/browse/HBASE-14805
 Project: HBase
  Issue Type: Improvement
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 1.2.0, 1.3.0


{{status 'simple'}} or {{'detailed'}} only shows the regionservers and regions, 
but not the active master. Actually, there is no way to know about the active 
masters from the shell it seems. 






[jira] [Commented] (HBASE-14790) Implement a new DFSOutputStream for logging WAL only

2015-11-12 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003469#comment-15003469
 ] 

Duo Zhang commented on HBASE-14790:
---

The new implementation will be event-driven, which means we can use a callback 
on sync, so we no longer need multiple sync threads.

Implementing this directly in HDFS would be a big project, since it introduces 
a new style of API. Even if we only implement a new writer at first, I think 
it is still very important to design a good general interface first.

So I think we should implement this in HBase first and collect some perf 
results; then we can start talking about moving it into HDFS.

Thanks.
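The event-driven sync idea can be sketched with a future completed by a single 
I/O event thread, so callers register interest instead of blocking in a pool of 
sync threads. This is a hedged illustration; the class and method names are 
invented, not the proposed HBase or HDFS API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch: sync() returns a future; the pipeline's ack event
// completes every pending future via callback, with no dedicated sync threads.
class WalSyncSketch {
    private final ConcurrentLinkedQueue<CompletableFuture<Long>> pending =
        new ConcurrentLinkedQueue<>();
    private volatile long ackedLength;            // highest length acked so far

    // Caller thread: register interest in durability; do not block.
    CompletableFuture<Long> sync() {
        CompletableFuture<Long> f = new CompletableFuture<>();
        pending.add(f);
        return f;
    }

    // I/O event thread: an ack advances the acked length and fires callbacks.
    void onAck(long newAckedLength) {
        ackedLength = newAckedLength;
        CompletableFuture<Long> f;
        while ((f = pending.poll()) != null) {
            f.complete(newAckedLength);
        }
    }

    long ackedLength() { return ackedLength; }
}
```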

> Implement a new DFSOutputStream for logging WAL only
> 
>
> Key: HBASE-14790
> URL: https://issues.apache.org/jira/browse/HBASE-14790
> Project: HBase
>  Issue Type: Improvement
>Reporter: Duo Zhang
>
> The original {{DFSOutputStream}} is very powerful and aims to serve all 
> purposes, but in fact we do not need most of its features if we only want to 
> log the WAL. For example, we do not need pipeline recovery, since we can just 
> close the old logger and open a new one. We also do not need to write 
> multiple blocks, since we can likewise open a new logger if the old file 
> grows too large.
> Most importantly, it is hard to handle all the corner cases needed to avoid 
> data loss or data inconsistency (such as HBASE-14004) when using the 
> original DFSOutputStream, due to its complicated logic. That complicated 
> logic also forces us to use some magical tricks to increase performance. For 
> example, we need multiple threads calling {{hflush}} when logging; today we 
> use 5 threads, but why 5 and not 10 or 100?
> So I propose we implement our own {{DFSOutputStream}} for logging the WAL, 
> for correctness and also for performance.





[jira] [Commented] (HBASE-14172) Upgrade existing thrift binding using thrift 0.9.2 compiler.

2015-11-12 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003465#comment-15003465
 ] 

Enis Soztutar commented on HBASE-14172:
---

Do we know the compatibility story? 

> Upgrade existing thrift binding using thrift 0.9.2 compiler.
> 
>
> Key: HBASE-14172
> URL: https://issues.apache.org/jira/browse/HBASE-14172
> Project: HBase
>  Issue Type: Improvement
>Reporter: Srikanth Srungarapu
>Priority: Minor
> Attachments: HBASE-14172-branch-1.patch, HBASE-14172.patch
>
>






[jira] [Commented] (HBASE-14498) Master stuck in infinite loop when all Zookeeper servers are unreachable

2015-11-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003418#comment-15003418
 ] 

Hudson commented on HBASE-14498:


SUCCESS: Integrated in HBase-1.3 #364 (See 
[https://builds.apache.org/job/HBase-1.3/364/])
HBASE-14498 Revert for on-going review (tedyu: rev 
3e551ea538dc1f9dd5ae0ce53900c1e57a53acdb)
* 
hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
* 
hbase-client/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperWatcher.java


> Master stuck in infinite loop when all Zookeeper servers are unreachable
> 
>
> Key: HBASE-14498
> URL: https://issues.apache.org/jira/browse/HBASE-14498
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Reporter: Y. SREENIVASULU REDDY
>Assignee: Pankaj Kumar
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.4
>
> Attachments: HBASE-14498-V2.patch, HBASE-14498-V3.patch, 
> HBASE-14498-V4.patch, HBASE-14498.patch
>
>
> We hit a weird scenario in our production environment.
> In an HA cluster:
> > The active master (HM1) is not able to connect to any Zookeeper server 
> > (due to a network breakdown between the master machine and the Zookeeper 
> > servers).
> {code}
> 2015-09-26 15:24:47,508 INFO 
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host:2181)] 
> zookeeper.ClientCnxn: Client session timed out, have not heard from server in 
> 33463ms for sessionid 0x104576b8dda0002, closing socket connection and 
> attempting reconnect
> 2015-09-26 15:24:47,877 INFO 
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host1:2181)] 
> client.FourLetterWordMain: connecting to ZK-Host1 2181
> 2015-09-26 15:24:48,236 INFO [main-SendThread(ZK-Host1:2181)] 
> client.FourLetterWordMain: connecting to ZK-Host1 2181
> 2015-09-26 15:24:49,879 WARN 
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Can not get the principle name from server ZK-Host1
> 2015-09-26 15:24:49,879 INFO 
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Opening socket connection to server 
> ZK-Host1/ZK-IP1:2181. Will not attempt to authenticate using SASL (unknown 
> error)
> 2015-09-26 15:24:50,238 WARN [main-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Can not get the principle name from server ZK-Host1
> 2015-09-26 15:24:50,238 INFO [main-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Opening socket connection to server 
> ZK-Host1/ZK-Host1:2181. Will not attempt to authenticate using SASL (unknown 
> error)
> 2015-09-26 15:25:17,470 INFO [main-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Client session timed out, have not heard from server in 
> 30023ms for sessionid 0x2045762cc710006, closing socket connection and 
> attempting reconnect
> 2015-09-26 15:25:17,571 WARN [master/HM1-Host/HM1-IP:16000] 
> zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, 
> quorum=ZK-Host:2181,ZK-Host1:2181,ZK-Host2:2181, 
> exception=org.apache.zookeeper.KeeperException$ConnectionLossException: 
> KeeperErrorCode = ConnectionLoss for /hbase/master
> 2015-09-26 15:25:17,872 INFO [main-SendThread(ZK-Host:2181)] 
> client.FourLetterWordMain: connecting to ZK-Host 2181
> 2015-09-26 15:25:19,874 WARN [main-SendThread(ZK-Host:2181)] 
> zookeeper.ClientCnxn: Can not get the principle name from server ZK-Host
> 2015-09-26 15:25:19,874 INFO [main-SendThread(ZK-Host:2181)] 
> zookeeper.ClientCnxn: Opening socket connection to server ZK-Host/ZK-IP:2181. 
> Will not attempt to authenticate using SASL (unknown error)
> {code}
> > Since HM1 was not able to connect to any ZK server, the session timeout 
> > didn't happen on the Zookeeper server side and HM1 didn't abort.
> > On Zookeeper session timeout, the standby master (HM2) registered itself 
> > as the active master.
> > HM2 keeps waiting for region servers to report to it as part of active 
> > master initialization.
> {noformat} 
> 2015-09-26 15:24:44,928 | INFO | HM2-Host:21300.activeMasterManager | Waiting 
> for region servers count to settle; currently checked in 0, slept for 0 ms, 
> expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval 
> of 1500 ms. | 
> org.apache.hadoop.hbase.master.ServerManager.waitForRegionServers(ServerManager.java:1011)
> ---
> ---
> 2015-09-26 15:32:50,841 | INFO | HM2-Host:21300.activeMasterManager | Waiting 
> for region servers count to settle; currently checked in 0, slept for 483913 
> ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, 
> interval of 1500 ms. | 
> org.apache.hadoop.hbase.master.ServerManager.waitForRegionServers(ServerManager.java:1011)
> {noformat}
> > At the other end, region servers are reporting to HM1 at a 3-second 
> > interval. Here the region server retrieves the master 

[jira] [Commented] (HBASE-14498) Master stuck in infinite loop when all Zookeeper servers are unreachable

2015-11-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003414#comment-15003414
 ] 

Hudson commented on HBASE-14498:


FAILURE: Integrated in HBase-1.2 #367 (See 
[https://builds.apache.org/job/HBase-1.2/367/])
HBASE-14498 Revert for on-going review (tedyu: rev 
1db8abf707b760f588932a9ca137c4d9d96e3ab1)
* 
hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
* 
hbase-client/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperWatcher.java


> Master stuck in infinite loop when all Zookeeper servers are unreachable
> 
>
> Key: HBASE-14498
> URL: https://issues.apache.org/jira/browse/HBASE-14498
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Reporter: Y. SREENIVASULU REDDY
>Assignee: Pankaj Kumar
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.4
>
> Attachments: HBASE-14498-V2.patch, HBASE-14498-V3.patch, 
> HBASE-14498-V4.patch, HBASE-14498.patch
>
>
> We hit a weird scenario in our production environment.
> In an HA cluster:
> > The active master (HM1) is not able to connect to any Zookeeper server 
> > (due to a network breakdown between the master machine and the Zookeeper 
> > servers).
> {code}
> 2015-09-26 15:24:47,508 INFO 
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host:2181)] 
> zookeeper.ClientCnxn: Client session timed out, have not heard from server in 
> 33463ms for sessionid 0x104576b8dda0002, closing socket connection and 
> attempting reconnect
> 2015-09-26 15:24:47,877 INFO 
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host1:2181)] 
> client.FourLetterWordMain: connecting to ZK-Host1 2181
> 2015-09-26 15:24:48,236 INFO [main-SendThread(ZK-Host1:2181)] 
> client.FourLetterWordMain: connecting to ZK-Host1 2181
> 2015-09-26 15:24:49,879 WARN 
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Can not get the principle name from server ZK-Host1
> 2015-09-26 15:24:49,879 INFO 
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Opening socket connection to server 
> ZK-Host1/ZK-IP1:2181. Will not attempt to authenticate using SASL (unknown 
> error)
> 2015-09-26 15:24:50,238 WARN [main-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Can not get the principle name from server ZK-Host1
> 2015-09-26 15:24:50,238 INFO [main-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Opening socket connection to server 
> ZK-Host1/ZK-Host1:2181. Will not attempt to authenticate using SASL (unknown 
> error)
> 2015-09-26 15:25:17,470 INFO [main-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Client session timed out, have not heard from server in 
> 30023ms for sessionid 0x2045762cc710006, closing socket connection and 
> attempting reconnect
> 2015-09-26 15:25:17,571 WARN [master/HM1-Host/HM1-IP:16000] 
> zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, 
> quorum=ZK-Host:2181,ZK-Host1:2181,ZK-Host2:2181, 
> exception=org.apache.zookeeper.KeeperException$ConnectionLossException: 
> KeeperErrorCode = ConnectionLoss for /hbase/master
> 2015-09-26 15:25:17,872 INFO [main-SendThread(ZK-Host:2181)] 
> client.FourLetterWordMain: connecting to ZK-Host 2181
> 2015-09-26 15:25:19,874 WARN [main-SendThread(ZK-Host:2181)] 
> zookeeper.ClientCnxn: Can not get the principle name from server ZK-Host
> 2015-09-26 15:25:19,874 INFO [main-SendThread(ZK-Host:2181)] 
> zookeeper.ClientCnxn: Opening socket connection to server ZK-Host/ZK-IP:2181. 
> Will not attempt to authenticate using SASL (unknown error)
> {code}
> > Since HM1 was not able to connect to any ZK server, the session timeout 
> > didn't happen on the Zookeeper server side and HM1 didn't abort.
> > On Zookeeper session timeout, the standby master (HM2) registered itself 
> > as the active master.
> > HM2 keeps waiting for region servers to report to it as part of active 
> > master initialization.
> {noformat} 
> 2015-09-26 15:24:44,928 | INFO | HM2-Host:21300.activeMasterManager | Waiting 
> for region servers count to settle; currently checked in 0, slept for 0 ms, 
> expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval 
> of 1500 ms. | 
> org.apache.hadoop.hbase.master.ServerManager.waitForRegionServers(ServerManager.java:1011)
> ---
> ---
> 2015-09-26 15:32:50,841 | INFO | HM2-Host:21300.activeMasterManager | Waiting 
> for region servers count to settle; currently checked in 0, slept for 483913 
> ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, 
> interval of 1500 ms. | 
> org.apache.hadoop.hbase.master.ServerManager.waitForRegionServers(ServerManager.java:1011)
> {noformat}
> > At the other end, region servers are reporting to HM1 at a 3-second 
> > interval. Here the region server retrieves the master 

[jira] [Updated] (HBASE-14805) status should show the master in shell

2015-11-12 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-14805:
--
Attachment: hbase-14805_v1.patch

Simple patch: 

{code}
hbase(main):008:0* status 'simple' 
active master:  os-enis-hbase-nov-10-6.novalocal:16000 1447370319440
1 backup masters
os-enis-hbase-nov-10-3.novalocal:16000 1447378876173
7 live servers
os-enis-hbase-nov-10-6.novalocal:16020 1447375439730
requestsPerSecond=0.0, numberOfOnlineRegions=6, usedHeapMB=122, 
maxHeapMB=2007, numberOfStores=66, numberOfStorefiles=10, 
storefileUncompressedSizeMB=6733, storefileSizeMB=4652, 
compressionRatio=0.6909, memstoreSizeMB=0, storefileIndexSizeMB=0, 
readRequestsCount=41780440, writeRequestsCount=0, rootIndexSizeKB=4, 
totalStaticIndexSizeKB=6753, totalStaticBloomSizeKB=69156, 
totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN, 
coprocessors=[SecureBulkLoadEndpoint]
os-enis-hbase-nov-10-7.novalocal:16020 1447374876274
..
{code}

> status should show the master in shell
> --
>
> Key: HBASE-14805
> URL: https://issues.apache.org/jira/browse/HBASE-14805
> Project: HBase
>  Issue Type: Improvement
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: hbase-14805_v1.patch
>
>
> {{status 'simple'}} or {{'detailed'}} only shows the regionservers and 
> regions, but not the active master. Actually, there is no way to know about 
> the active masters from the shell it seems. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14805) status should show the master in shell

2015-11-12 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-14805:
--
Fix Version/s: 2.0.0
   Status: Patch Available  (was: Open)

> status should show the master in shell
> --
>
> Key: HBASE-14805
> URL: https://issues.apache.org/jira/browse/HBASE-14805
> Project: HBase
>  Issue Type: Improvement
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: hbase-14805_v1.patch
>
>
> {{status 'simple'}} or {{'detailed'}} only shows the regionservers and 
> regions, but not the active master. In fact, there seems to be no way to 
> learn the active master from the shell. 





[jira] [Updated] (HBASE-14223) Meta WALs are not cleared if meta region was closed and RS aborts

2015-11-12 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-14223:
--
Attachment: hbase-14223_v2-branch-1.patch

Updated branch-1 patch. This is the same as v1, but I've added a CM action for 
moving the meta table. This action helps test the case where meta is moved as a 
normal operation (instead of the meta server being killed).

I've run ITBLL writing 450M records without any issues. 

The master patch is similar, but it does not include the test, since meta is 
always colocated in the master branch. 
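The leftover meta WALs this issue describes are easy to spot mechanically: they sit under a {{-splitting}} directory and carry the {{.meta}} suffix. A small illustrative helper (not part of the patch):

```python
def find_stale_meta_wals(wal_paths):
    """Return the WAL paths that sit under a `-splitting` directory AND
    end with the `.meta` suffix -- the combination left behind when an
    RS that previously hosted meta aborts. Illustrative only."""
    return [p for p in wal_paths
            if "-splitting/" in p and p.endswith(".meta")]
```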



> Meta WALs are not cleared if meta region was closed and RS aborts
> -
>
> Key: HBASE-14223
> URL: https://issues.apache.org/jira/browse/HBASE-14223
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.4, 1.0.4
>
> Attachments: HBASE-14223logs, hbase-14223_v0.patch, 
> hbase-14223_v1-branch-1.patch, hbase-14223_v2-branch-1.patch
>
>
> When an RS opens meta and later closes it, the WAL (FSHLog) is not closed. 
> The last WAL file just sits there in the RS WAL directory. If the RS stops 
> gracefully, the WAL file for meta is deleted. If the RS aborts, however, the 
> WAL for meta is not cleaned up. It is also not split (which is correct), since 
> the master determines that the RS no longer hosts meta at the time of the abort. 
> From a cluster after running ITBLL with CM, I see a lot of {{-splitting}} 
> directories left uncleaned: 
> {code}
> [root@os-enis-dal-test-jun-4-7 cluster-os]# sudo -u hdfs hadoop fs -ls 
> /apps/hbase/data/WALs
> Found 31 items
> drwxr-xr-x   - hbase hadoop  0 2015-06-05 01:14 
> /apps/hbase/data/WALs/hregion-58203265
> drwxr-xr-x   - hbase hadoop  0 2015-06-05 07:54 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433489308745-splitting
> drwxr-xr-x   - hbase hadoop  0 2015-06-05 09:28 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433494382959-splitting
> drwxr-xr-x   - hbase hadoop  0 2015-06-05 10:01 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433498252205-splitting
> ...
> {code}
> The directories contain WALs from meta: 
> {code}
> [root@os-enis-dal-test-jun-4-7 cluster-os]# sudo -u hdfs hadoop fs -ls 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting
> Found 2 items
> -rw-r--r--   3 hbase hadoop 201608 2015-06-05 03:15 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433470511501.meta
> -rw-r--r--   3 hbase hadoop  44420 2015-06-05 04:36 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433474111645.meta
> {code}
> The RS hosted the meta region for some time: 
> {code}
> 2015-06-05 03:14:28,692 INFO  [PostOpenDeployTasks:1588230740] 
> zookeeper.MetaTableLocator: Setting hbase:meta region location in ZooKeeper 
> as os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285
> ...
> 2015-06-05 03:15:17,302 INFO  
> [RS_CLOSE_META-os-enis-dal-test-jun-4-5:16020-0] regionserver.HRegion: Closed 
> hbase:meta,,1.1588230740
> {code}
> In between, a WAL is created: 
> {code}
> 2015-06-05 03:15:11,707 INFO  
> [RS_OPEN_META-os-enis-dal-test-jun-4-5:16020-0-MetaLogRoller] wal.FSHLog: 
> Rolled WAL 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433470511501.meta
>  with entries=385, filesize=196.88 KB; new WAL 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433474111645.meta
> {code}
> When CM later killed the region server, the master did not see these WAL files: 
> {code}
> ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:46,075 
> INFO  [MASTER_SERVER_OPERATIONS-os-enis-dal-test-jun-4-3:16000-0] 
> master.SplitLogManager: started splitting 2 logs in 
> [hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting]
>  for [os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285]
> ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:47,300 
> INFO  [main-EventThread] wal.WALSplitter: Archived processed log 
> hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285.default.1433475074436
>  to 
> 

  1   2   >