date:20130611

JOB M THOMAS created HBASE-8728:
---

 Summary: HBase table schema 
 Key: HBASE-8728
 URL: https://issues.apache.org/jira/browse/HBASE-8728
 Project: HBase
  Issue Type: Task
Reporter: JOB M THOMAS




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-8728) HBase table schema


 [ 
https://issues.apache.org/jira/browse/HBASE-8728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

JOB M THOMAS updated HBASE-8728:


Description: 


Hi friends,

This is my data in a file.

125829086 Llandovery 501
125829087 Tamil 461
125829088 throbless 736
125829089 pondside 195
125829090 oxyterpene 791
125829091 subofficer 416
125829092 paleornithology 734
125829093 kenno 80
125829094 oratorship 565
125829095 Cimmerianism 499
125829096 jharal 985
125829097 genii 330
125829098 qualminess 340
125829099 blurredness 57
125829100 topline 803


I have to create an HBase table for this. The first number can be used as the 
row key, and the second and third fields as two columns in HBase.

Please help me create the table.

I have searched a lot on Google but have not found any solution for creating a 
table with one column family and two columns under it.

Please help me...
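
The layout asked for above (first field as row key, the other two fields as columns under a single column family) can be sketched in plain Java. This is only a model of the mapping, not HBase client code: the family name "cf" and the qualifiers "word" and "count" are assumptions made for illustration, and nothing here talks to a cluster.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java model of the requested layout; it does not call the HBase client API.
class SchemaSketch {
    // Hypothetical names: one column family "cf" holding two qualifiers.
    static final String FAMILY = "cf";
    static final String WORD = "word";
    static final String COUNT = "count";

    /** One logical row: a row key plus "family:qualifier" -> value cells. */
    static final class Row {
        final String rowKey;
        final Map<String, String> cells = new LinkedHashMap<>();

        Row(String rowKey) {
            this.rowKey = rowKey;
        }
    }

    /** Splits "rowkey field2 field3" on whitespace and builds the row. */
    static Row parse(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected 3 fields: " + line);
        }
        Row row = new Row(parts[0]);
        row.cells.put(FAMILY + ":" + WORD, parts[1]);
        row.cells.put(FAMILY + ":" + COUNT, parts[2]);
        return row;
    }

    public static void main(String[] args) {
        Row row = parse("125829086 Llandovery 501");
        System.out.println(row.rowKey + " -> " + row.cells);
    }
}
```

In HBase only the column family is declared when the table is created; the two qualifiers come into existence when cells are written, which is why the sketch stores them as ordinary map keys.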









[jira] [Commented] (HBASE-8728) HBase table schema

[
https://issues.apache.org/jira/browse/HBASE-8728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680258#comment-13680258
]

Anoop Sam John commented on HBASE-8728:
---

Please have a look at the ImportTsv tool, which supports your need. And please
don't raise JIRA tickets for this kind of help. You can send mail to the user@
mailing list, and people there can help you out.


[jira] [Created] (HBASE-8729) distributedLogReplay may hang during chained region server failure

Jeffrey Zhong created HBASE-8729:


 Summary: distributedLogReplay may hang during chained region server failure
 Key: HBASE-8729
 URL: https://issues.apache.org/jira/browse/HBASE-8729
 Project: HBase
  Issue Type: Bug
  Components: MTTR
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 0.98.0, 0.95.2


In a test, half of the cluster (in terms of region servers) was down, and some 
log replays incurred chained RS failures (the receiving RS of a log replay 
failed again). 

By default, we allow only 3 concurrent SSH (ServerShutdownHandler) handlers, 
controlled by:
{code}
this.executorService.startExecutorService(ExecutorType.MASTER_SERVER_OPERATIONS,
  conf.getInt("hbase.master.executor.serverops.threads", 3));
{code}

If all 3 SSH handlers are doing logReplay (a blocking call) and one of the 
receiving RSs fails again, then logReplay will hang: the regions of the newly 
failed RS can't be re-assigned to another live RS (no further SSH handler will 
be processed because of the max threads setting), and the existing log replays 
will keep routing replay traffic to the dead RS.

The fix is to submit logReplay work to a separate type of executor queue so 
that it does not block SSH region assignment. logReplay can then route traffic 
to a live RS after retries and move forward. 
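
The hang described above is a general thread-pool starvation pattern, and it can be reproduced with the JDK alone. The following is a minimal sketch under stated assumptions, not the HBase executor code: the class and method names are hypothetical, a CountDownLatch stands in for "the region was re-assigned", and each replay task simply blocks until that latch opens.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the starvation; none of these names come from HBase.
class ExecutorStarvationSketch {
    static final int HANDLERS = 3; // default handler count quoted in the issue

    /**
     * Submits HANDLERS blocking "log replay" tasks, then one "reassign" task.
     * Every replay waits until the reassign task has run. Returns true if all
     * replays finish within the timeout.
     */
    static boolean run(ExecutorService replayPool, ExecutorService sshPool, long timeoutMs) {
        CountDownLatch reassigned = new CountDownLatch(1);
        CountDownLatch replaysDone = new CountDownLatch(HANDLERS);
        for (int i = 0; i < HANDLERS; i++) {
            replayPool.execute(() -> {
                try {
                    // Blocks like a replay that keeps targeting a dead RS.
                    reassigned.await();
                    replaysDone.countDown();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        // Region re-assignment for the newly failed RS; this unblocks the replays.
        sshPool.execute(reassigned::countDown);
        try {
            return replaysDone.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** One shared pool: the replays hold every thread, so reassign never runs. */
    static boolean sharedPoolCompletes() {
        ExecutorService shared = Executors.newFixedThreadPool(HANDLERS);
        try {
            return run(shared, shared, 500);
        } finally {
            shared.shutdownNow();
        }
    }

    /** Separate pools: reassign runs on its own threads and the replays finish. */
    static boolean separatePoolsComplete() {
        ExecutorService ssh = Executors.newFixedThreadPool(HANDLERS);
        ExecutorService replay = Executors.newFixedThreadPool(HANDLERS);
        try {
            return run(replay, ssh, 5000);
        } finally {
            ssh.shutdownNow();
            replay.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("shared pool completes:   " + sharedPoolCompletes());
        System.out.println("separate pools complete: " + separatePoolsComplete());
    }
}
```

With the shared pool the re-assignment task sits in the queue behind three blocked workers, so the call times out; with a dedicated pool for replay work it runs immediately, which is the shape of the fix proposed in the description.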


[jira] [Resolved] (HBASE-8728) HBase table schema


 [ 
https://issues.apache.org/jira/browse/HBASE-8728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-8728.
---

Resolution: Invalid


[jira] [Commented] (HBASE-8728) HBase table schema


[ 
https://issues.apache.org/jira/browse/HBASE-8728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680262#comment-13680262
 ] 

JOB M THOMAS commented on HBASE-8728:
-

How do I join the user@ mailing list?
How do I send my doubts?






[jira] [Updated] (HBASE-8729) distributedLogReplay may hang during chained region server failure


 [ 
https://issues.apache.org/jira/browse/HBASE-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-8729:
-

Status: Patch Available  (was: Open)


[jira] [Updated] (HBASE-8729) distributedLogReplay may hang during chained region server failure


 [ 
https://issues.apache.org/jira/browse/HBASE-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-8729:
-

Attachment: hbase-8729.patch


[jira] [Commented] (HBASE-8728) HBase table schema

[
https://issues.apache.org/jira/browse/HBASE-8728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680264#comment-13680264
]

Anoop Sam John commented on HBASE-8728:
---

You can send a mail to user-subscr...@hbase.apache.org to subscribe to this
list. A blank mail is enough, and it will add your mail ID to the users
list. You will then get mails.

For your doubts, you can send email to this ID: u...@hbase.apache.org

Thanks


[jira] [Commented] (HBASE-8728) HBase table schema


[ 
https://issues.apache.org/jira/browse/HBASE-8728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680283#comment-13680283
 ] 

JOB M THOMAS commented on HBASE-8728:
-

thanks anoop.



[jira] [Commented] (HBASE-8667) Master and Regionserver not able to communicate if both bound to different network interfaces on the same machine.

2013-06-11 Thread rajeshbabu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-8667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680316#comment-13680316
 ] 

rajeshbabu commented on HBASE-8667:
---

[~stack]
bq. Our workaround was having the regionserver take the name the master 
proffered after checkin. This seemed to get rid of an all-too-common problem 
seen in hbase deploys
Then we need to initialize the RPC server in the RS with the hostname received 
from the master after checkin, right? Otherwise we will have this issue.

 Master and Regionserver not able to communicate if both bound to different 
 network interfaces on the same machine.
 --

 Key: HBASE-8667
 URL: https://issues.apache.org/jira/browse/HBASE-8667
 Project: HBase
  Issue Type: Bug
  Components: IPC/RPC
Reporter: rajeshbabu
 Fix For: 0.98.0, 0.95.2, 0.94.9

 Attachments: HBASE-8667_Trunk.patch, HBASE-8667_Trunk-V2.patch


 While testing the HBASE-8640 fix, I found that a master and regionserver 
 running on different interfaces do not communicate properly.
 I have two interfaces on my machine, 1) lo and 2) eth0, and the default 
 hostname interface is lo.
 I configured the master IPC address to the IP of the eth0 interface and 
 started the master and regionserver on the same machine.
 1) The master RPC server is bound to eth0 and the RS RPC server is bound to lo.
 2) Since the RPC client does not bind to any IP address, when the RS reports 
 its startup it gets registered with the eth0 IP address (but it should 
 actually register as localhost).
 Here are the RS logs:
 {code}
 2013-05-31 06:05:28,608 WARN  [regionserver60020] org.apache.hadoop.hbase.regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
 2013-05-31 06:05:31,609 INFO  [regionserver60020] org.apache.hadoop.hbase.regionserver.HRegionServer: Attempting connect to Master server at 192.168.0.100,6,1369960497008
 2013-05-31 06:05:31,609 INFO  [regionserver60020] org.apache.hadoop.hbase.regionserver.HRegionServer: Telling master at 192.168.0.100,6,1369960497008 that we are up with port=60020, startcode=1369960502544
 2013-05-31 06:05:31,618 DEBUG [regionserver60020] org.apache.hadoop.hbase.regionserver.HRegionServer: Config from master: hbase.rootdir=hdfs://localhost:2851/hbase
 2013-05-31 06:05:31,618 DEBUG [regionserver60020] org.apache.hadoop.hbase.regionserver.HRegionServer: Config from master: fs.default.name=hdfs://localhost:2851
 2013-05-31 06:05:31,618 INFO  [regionserver60020] org.apache.hadoop.hbase.regionserver.HRegionServer: Master passed us a different hostname to use; was=localhost, but now=192.168.0.100
 {code}
 Here are master logs:
 {code}
 2013-05-31 06:05:31,615 INFO  [IPC Server handler 9 on 6] org.apache.hadoop.hbase.master.ServerManager: Registering server=192.168.0.100,60020,1369960502544
 {code}
 Since the master has the wrong RPC server address for the RS, META is not getting assigned.
 {code}
 2013-05-31 06:05:34,362 DEBUG [master-192.168.0.100,6,1369960497008] org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for .META.,,1.1028785192 so generated a random one; hri=.META.,,1.1028785192, src=, dest=192.168.0.100,60020,1369960502544; 1 (online=1, available=1) available servers, forceNewPlan=false
 -
 org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of .META.,,1.1028785192 to 192.168.0.100,60020,1369960502544, trying to assign elsewhere instead; try=1 of 10
 java.net.ConnectException: Connection refused
   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
   at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
   at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:511)
   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:481)
   at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupConnection(RpcClient.java:549)
   at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:813)
   at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1422)
   at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1315)
   at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1532)
   at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1587)
   at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.openRegion(AdminProtos.java:15039)
   at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:627)
   at

[jira] [Commented] (HBASE-8721) fix for bug that delete can mask puts that happened after the delete was entered

2013-06-11 Thread Feng Honghua (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-8721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680323#comment-13680323
]

Feng Honghua commented on HBASE-8721:
-

[~sershe]

If we want to keep the behaviour that a delete can mask puts that happened
after the delete, then to fix the inconsistency caused by major compaction, the
only alternative is to keep the delete markers forever, as you said.

But I think the root cause of the inconsistency is the arguable behaviour that
a delete can mask puts that happened after it. A more intuitive and more
reasonable behaviour is that a delete masks only puts that happened before it
and has no impact on puts that happened after it. (This is independent of the
other behaviour, that the timestamp determines which KV survives under version
semantics.) If we choose this adjusted behaviour, we can fix the inconsistency
with just the help of MVCC and collect the delete markers during major
compaction as before (no need to keep them forever to fix the inconsistency).

An obvious and ridiculous drawback of letting a delete mask puts that happened
after it is this: an end user puts a KV and gets a success response, but it
turns out he can't read that KV back, just because someone (maybe himself,
without realizing it) once made a delete that masks this KV. That sounds
really uncanny and weird.

Turning back to the scenarios where the timestamp is used as just another
ordinary dimension without time semantics: in those cases we declare max(int)
for the versions, and the timestamp isn't used to control version count but as
an ordinary dimension to locate a cell. Each cell has a single version, so
there is no problem.

I agree we can introduce a config knob to enable the new behaviour.
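
To make the two behaviours discussed in this comment concrete, here is a small in-memory simulation. It is a sketch under simplifying assumptions, not the patch and not HBase code: a single cell, a plain arrival counter standing in for the MVCC number, and hypothetical class and method names. The flag `seqAware` switches between "a delete masks every put with timestamp <= T" and "a delete masks only puts that arrived before it".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-cell model; seq is an arrival counter standing in for MVCC.
class DeleteMaskSketch {
    static final class Entry {
        final boolean isDelete;
        final long ts;
        final long seq;
        final String value;

        Entry(boolean isDelete, long ts, long seq, String value) {
            this.isDelete = isDelete;
            this.ts = ts;
            this.seq = seq;
            this.value = value;
        }
    }

    private final List<Entry> entries = new ArrayList<>();
    private final boolean seqAware; // true = a delete masks only earlier puts
    private long nextSeq = 0;

    DeleteMaskSketch(boolean seqAware) {
        this.seqAware = seqAware;
    }

    void put(long ts, String value) {
        entries.add(new Entry(false, ts, nextSeq++, value));
    }

    /** Writes a delete marker covering every timestamp <= ts. */
    void delete(long ts) {
        entries.add(new Entry(true, ts, nextSeq++, null));
    }

    private boolean masked(Entry put) {
        for (Entry d : entries) {
            if (!d.isDelete || put.ts > d.ts) {
                continue;
            }
            if (!seqAware || put.seq < d.seq) {
                return true;
            }
        }
        return false;
    }

    /** Returns the visible value with the highest timestamp, or null if none. */
    String get() {
        Entry best = null;
        for (Entry e : entries) {
            if (e.isDelete || masked(e)) {
                continue;
            }
            if (best == null || e.ts > best.ts || (e.ts == best.ts && e.seq > best.seq)) {
                best = e;
            }
        }
        return best == null ? null : best.value;
    }

    /** Drops masked puts and then every delete marker, like a major compaction. */
    void majorCompact() {
        List<Entry> kept = new ArrayList<>();
        for (Entry e : entries) {
            if (!e.isDelete && !masked(e)) {
                kept.add(e);
            }
        }
        entries.clear();
        entries.addAll(kept);
    }

    /** delete(T) followed by put(ts = T), optionally with a compaction in between. */
    static String scenario(boolean seqAware, boolean compactBetween) {
        DeleteMaskSketch cell = new DeleteMaskSketch(seqAware);
        cell.put(5, "old");
        cell.delete(10);
        if (compactBetween) {
            cell.majorCompact();
        }
        cell.put(10, "new");
        return cell.get();
    }

    public static void main(String[] args) {
        System.out.println("timestamp-only, no compaction:   " + scenario(false, false));
        System.out.println("timestamp-only, with compaction: " + scenario(false, true));
        System.out.println("seq-aware, no compaction:        " + scenario(true, false));
        System.out.println("seq-aware, with compaction:      " + scenario(true, true));
    }
}
```

Under the timestamp-only rule the same delete-then-put sequence reads back nothing or "new" depending on whether a major compaction happened in between, which is the inconsistency under discussion. Under the sequence-aware rule both orderings read back "new".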

fix for bug that delete can mask puts that happened after the delete was
entered

Key: HBASE-8721
URL: https://issues.apache.org/jira/browse/HBASE-8721
Project: HBase
Issue Type: Bug
Components: regionserver
Reporter: Feng Honghua
Attachments: HBASE-8721-0.94-V0.patch

This fix aims at the bug mentioned in http://hbase.apache.org/book.html 5.8.2.1:
Deletes mask puts, even puts that happened after the delete was entered.
Remember that a delete writes a tombstone, which only disappears after the
next major compaction has run. Suppose you do a delete of everything <= T.
After this you do a new put with a timestamp <= T. This put, even if it
happened after the delete, will be masked by the delete tombstone. Performing
the put will not fail, but when you do a get you will notice the put had
no effect. It will start working again after the major compaction has run.
These issues should not be a problem if you use always-increasing versions
for new puts to a row. But they can occur even if you do not care about time:
just do delete and put immediately after each other, and there is some chance
they happen within the same millisecond.

[jira] [Commented] (HBASE-8729) distributedLogReplay may hang during chained region server failure


[ 
https://issues.apache.org/jira/browse/HBASE-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680420#comment-13680420
 ] 

Ted Yu commented on HBASE-8729:
---

{code}
+this.executorService.startExecutorService(ExecutorType.MASTER_LOG_REPLAY_OPERATIONS,
+  conf.getInt("hbase.master.executor.serverops.threads", 15));
{code}
Did you intend to introduce a new config param for log replay operations?

There are several syntax errors in the class javadoc for EventHandler.
{code}
+sinkConf.setInt(HConstants.HBASE_RPC_TIMEOUT_KEY, HConstants.DEFAULT_HBASE_RPC_TIMEOUT / 2);
{code}
Can you add a comment explaining the above change?
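
The first question concerns the new log replay executor reading the same key as the server-operations executor. A minimal sketch of giving it a key of its own follows, assuming a plain Map in place of the Hadoop Configuration object. The key name "hbase.master.executor.logreplayops.threads" and the helper names are hypothetical and only illustrate the idea; the defaults 3 and 15 are the values quoted in this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; a Map stands in for the Hadoop Configuration object.
class LogReplayConfigSketch {
    // Key quoted in the issue description.
    static final String SERVER_OPS_THREADS = "hbase.master.executor.serverops.threads";
    // Hypothetical separate key for the log replay executor.
    static final String LOG_REPLAY_THREADS = "hbase.master.executor.logreplayops.threads";

    /** Minimal stand-in for a getInt(key, default) lookup. */
    static int getInt(Map<String, String> conf, String key, int defaultValue) {
        String raw = conf.get(key);
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }

    static int serverOpsThreads(Map<String, String> conf) {
        return getInt(conf, SERVER_OPS_THREADS, 3);
    }

    static int logReplayThreads(Map<String, String> conf) {
        return getInt(conf, LOG_REPLAY_THREADS, 15);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(SERVER_OPS_THREADS, "5");
        // Tuning the server-ops pool leaves the log replay pool at its own default.
        System.out.println("serverops threads:  " + serverOpsThreads(conf));
        System.out.println("log replay threads: " + logReplayThreads(conf));
    }
}
```

If both executors read the same key, any operator who sets it overrides the two different defaults at once, so the two pools could no longer be sized independently.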


[jira] [Updated] (HBASE-8727) Adding a KijiCon notice in the news section of the site


 [ 
https://issues.apache.org/jira/browse/HBASE-8727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-8727:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

We don't add notices for other folks' meetups, but we're making an exception in 
this case.   Good on you J.

 Adding a KijiCon notice in the news section of the site
 ---

 Key: HBASE-8727
 URL: https://issues.apache.org/jira/browse/HBASE-8727
 Project: HBase
  Issue Type: Bug
  Components: site
Reporter: Jonathan Natkins
 Attachments: HBASE-8727.diff





[jira] [Comment Edited] (HBASE-8729) distributedLogReplay may hang during chained region server failure


[ 
https://issues.apache.org/jira/browse/HBASE-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680420#comment-13680420
 ] 

Ted Yu edited comment on HBASE-8729 at 6/11/13 4:23 PM:


{code}
+this.executorService.startExecutorService(ExecutorType.MASTER_LOG_REPLAY_OPERATIONS,
+  conf.getInt("hbase.master.executor.serverops.threads", 15));
{code}
Did you intend to introduce a new config param for log replay operations?

There are several syntax errors in the class javadoc for LogReplayHandler.
{code}
+sinkConf.setInt(HConstants.HBASE_RPC_TIMEOUT_KEY, HConstants.DEFAULT_HBASE_RPC_TIMEOUT / 2);
{code}
Can you add a comment explaining the above change?

  was (Author: yuzhih...@gmail.com):
{code}
+this.executorService.startExecutorService(ExecutorType.MASTER_LOG_REPLAY_OPERATIONS,
+  conf.getInt("hbase.master.executor.serverops.threads", 15));
{code}
Did you intend to introduce a new config param for log replay operations?

There are several syntax errors in the class javadoc for EventHandler.
{code}
+sinkConf.setInt(HConstants.HBASE_RPC_TIMEOUT_KEY, HConstants.DEFAULT_HBASE_RPC_TIMEOUT / 2);
{code}
Can you add a comment explaining the above change?
  

[jira] [Updated] (HBASE-8729) distributedLogReplay may hang during chained region server failure


 [ 
https://issues.apache.org/jira/browse/HBASE-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-8729:
--

Attachment: 8729-v2.patch


[jira] [Commented] (HBASE-8687) When moving region with region_mover.rb, there is long stack trace for RegionMovedException


[ 
https://issues.apache.org/jira/browse/HBASE-8687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13680475#comment-13680475
 ] 

stack commented on HBASE-8687:
--

Did the script keep going?  Did it select a new location and move the region 
there?  Was it moving the region to where the region was already sitting, and 
was that why the exception was thrown?

 When moving region with region_mover.rb, there is long stack trace for 
 RegionMovedException
 ---

 Key: HBASE-8687
 URL: https://issues.apache.org/jira/browse/HBASE-8687
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Reporter: Ted Yu
Priority: Minor

 When doing a graceful rolling restart of region servers, I saw the following in the output:
 {code}
 2013-06-04 20:44:40,135 DEBUG [main] client.ClientScanner: Scan table=usertable, startRow=user8129671889902366092
 2013-06-04 20:44:40,141 DEBUG [main] client.ClientScanner: Scan table=.META., startRow=usertable,user8129671889902366092,00
 2013-06-04 20:44:40,158 INFO  [main] region_mover: Moving region 13168d8b86f1ace9472f60555207a707 (2 of 2) to server=hor8n09.gq1.ygridcore.net,60020,1370378675859
 2013-06-04 20:44:40,405 DEBUG [main] client.ClientScanner: Scan table=usertable, startRow=user8129671889902366092
 2013-06-04 20:44:40,407 WARN  [main] client.ServerCallable: Call exception, tries=0, numRetries=100
 org.apache.hadoop.hbase.exceptions.RegionMovedException: Region moved to: hostname=hor8n09.gq1.ygridcore.net port=60020 startCode=1370378675859. As of locationSeqNum=194375.
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
   at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
   at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
   at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:230)
   at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:299)
   at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:147)
   at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:55)
   at org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:174)
   at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:215)
   at org.apache.hadoop.hbase.client.ClientScanner.init(ClientScanner.java:130)
   at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:585)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.jruby.javasupport.JavaMethod.invokeDirectWithExceptionHandling(JavaMethod.java:450)
   at org.jruby.javasupport.JavaMethod.invokeDirect(JavaMethod.java:311)
   at org.jruby.java.invokers.InstanceMethodInvoker.call(InstanceMethodInvoker.java:59)
   at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:167)
   at homes.hortonzy.hbase_minus_0_dot_95_dot_1.bin.region_mover.method__6$RUBY$isSuccessfulScan(/homes/hortonzy/hbase-0.95.1/bin/region_mover.rb:121)
   at homes$hortonzy$hbase_minus_0_dot_95_dot_1$bin$region_mover$method__6$RUBY$isSuccessfulScan.call(homes$hortonzy$hbase_minus_0_dot_95_dot_1$bin$region_mover$method__6$RUBY$isSuccessfulScan:65535)
   at homes$hortonzy$hbase_minus_0_dot_95_dot_1$bin$region_mover$method__6$RUBY$isSuccessfulScan.call(homes$hortonzy$hbase_minus_0_dot_95_dot_1$bin$region_mover$method__6$RUBY$isSuccessfulScan:65535)
   at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:201)
   at homes.hortonzy.hbase_minus_0_dot_95_dot_1.bin.region_mover.method__8$RUBY$move(/homes/hortonzy/hbase-0.95.1/bin/region_mover.rb:164)
   at homes$hortonzy$hbase_minus_0_dot_95_dot_1$bin$region_mover$method__8$RUBY$move.call(homes$hortonzy$hbase_minus_0_dot_95_dot_1$bin$region_mover$method__8$RUBY$move:65535)
   at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:181)
   at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:69)
   at

[jira] [Commented] (HBASE-8687) When moving region with region_mover.rb, there is long stack trace for RegionMovedException


[ 
https://issues.apache.org/jira/browse/HBASE-8687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680480#comment-13680480
 ] 

Ted Yu commented on HBASE-8687:
---

bq. Did the script keep going?
Yes.

bq. Was it moving the region to where the region was already sitting and that 
was why the exception?
I checked cluster status afterwards: region servers came back up and cluster 
was balanced.
So I think the exception was a red herring.


[jira] [Commented] (HBASE-8687) When moving region with region_mover.rb, there is long stack trace for RegionMovedException


[ 
https://issues.apache.org/jira/browse/HBASE-8687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680477#comment-13680477
 ] 

stack commented on HBASE-8687:
--

Looking in code, the RegionMovedException uses a cache of regions recently 
moved to point out where the region has gone to.  Was the server above that 
threw the exception hor8n09 or not?  If it was, then that is odd.  If the region 
is still on this server, we should be fixing up the recently moved cache.


[jira] [Commented] (HBASE-8687) When moving region with region_mover.rb, there is long stack trace for RegionMovedException


[ 
https://issues.apache.org/jira/browse/HBASE-8687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680495#comment-13680495
 ] 

stack commented on HBASE-8687:
--

The region_mover.rb script does spew a bunch, which can disorientate an 
operator.  No harm cleaning up some of it.

 When moving region with region_mover.rb, there is long stack trace for 
 RegionMovedException
 ---

 Key: HBASE-8687
 URL: https://issues.apache.org/jira/browse/HBASE-8687
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Reporter: Ted Yu
Priority: Minor

 When gracefully rolling restart region servers, I saw the following in output:
 {code}
 2013-06-04 20:44:40,135 DEBUG [main] client.ClientScanner: Scan 
 table=usertable, startRow=user8129671889902366092
 2013-06-04 20:44:40,141 DEBUG [main] client.ClientScanner: Scan table=.META., 
 startRow=usertable,user8129671889902366092,00
 2013-06-04 20:44:40,158 INFO  [main] region_mover: Moving region 
 13168d8b86f1ace9472f60555207a707 (2 of 2) to 
 server=hor8n09.gq1.ygridcore.net,60020,1370378675859
 2013-06-04 20:44:40,405 DEBUG [main] client.ClientScanner: Scan 
 table=usertable, startRow=user8129671889902366092
 2013-06-04 20:44:40,407 WARN  [main] client.ServerCallable: Call exception, 
 tries=0, numRetries=100
 org.apache.hadoop.hbase.exceptions.RegionMovedException: Region moved to: 
 hostname=hor8n09.gq1.ygridcore.net port=60020 startCode=1370378675859. As of 
 locationSeqNum=194375.
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
   at 
 org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
   at 
 org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
   at 
 org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:230)
   at 
 org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:299)
   at 
 org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:147)
   at 
 org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:55)
   at 
 org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:174)
   at 
 org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:215)
   at 
 org.apache.hadoop.hbase.client.ClientScanner.init(ClientScanner.java:130)
   at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:585)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.jruby.javasupport.JavaMethod.invokeDirectWithExceptionHandling(JavaMethod.java:450)
   at org.jruby.javasupport.JavaMethod.invokeDirect(JavaMethod.java:311)
   at 
 org.jruby.java.invokers.InstanceMethodInvoker.call(InstanceMethodInvoker.java:59)
   at 
 org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:167)
   at 
 homes.hortonzy.hbase_minus_0_dot_95_dot_1.bin.region_mover.method__6$RUBY$isSuccessfulScan(/homes/hortonzy/hbase-0.95.1/bin/region_mover.rb:121)
   at 
 homes$hortonzy$hbase_minus_0_dot_95_dot_1$bin$region_mover$method__6$RUBY$isSuccessfulScan.call(homes$hortonzy$hbase_minus_0_dot_95_dot_1$bin$region_mover$method__6$RUBY$isSuccessfulScan:65535)
   at 
 homes$hortonzy$hbase_minus_0_dot_95_dot_1$bin$region_mover$method__6$RUBY$isSuccessfulScan.call(homes$hortonzy$hbase_minus_0_dot_95_dot_1$bin$region_mover$method__6$RUBY$isSuccessfulScan:65535)
   at 
 org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:201)
   at 
 homes.hortonzy.hbase_minus_0_dot_95_dot_1.bin.region_mover.method__8$RUBY$move(/homes/hortonzy/hbase-0.95.1/bin/region_mover.rb:164)
   at 
 homes$hortonzy$hbase_minus_0_dot_95_dot_1$bin$region_mover$method__8$RUBY$move.call(homes$hortonzy$hbase_minus_0_dot_95_dot_1$bin$region_mover$method__8$RUBY$move:65535)
   at 
 org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:181)
   at 
 org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:69)
   at 
 homes.hortonzy.hbase_minus_0_dot_95_dot_1.bin.region_mover.block_6$RUBY$__for__(/homes/hortonzy/hbase-0.95.1/bin/region_mover.rb:381)

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680501#comment-13680501
 ] 

Ted Yu commented on HBASE-6295:
---

Putting patch on cluster, I saw a lot of the following in the log:
{code}
2013-06-11 16:51:19,806 INFO  [HBaseWriterThread_11] client.AsyncProcess: won: 
Waiting for number of tasks to be equals or less than 0, currently it's 1
2013-06-11 16:51:19,807 INFO  [HBaseWriterThread_18] client.AsyncProcess: won: 
Waiting for number of tasks to be equals or less than 0, currently it's 1
2013-06-11 16:51:19,807 INFO  [HBaseWriterThread_15] client.AsyncProcess: won: 
Waiting for number of tasks to be equals or less than 0, currently it's 1
{code}
I think the above log should be at TRACE level.

 Possible performance improvement in client batch operations: presplit and 
 send in background
 

 Key: HBASE-6295
 URL: https://issues.apache.org/jira/browse/HBASE-6295
 Project: HBase
  Issue Type: Improvement
  Components: Client, Performance
Affects Versions: 0.95.2
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
  Labels: noob
 Fix For: 0.98.0

 Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch


 Today the batch algorithm is:
 {noformat}
 for Operation o: List<Op>{
   add o to todolist
   if todolist > maxsize or o last in list
     split todolist per location
     send split lists to region servers
     clear todolist
     wait
 }
 {noformat}
 We could:
 - create the final object immediately instead of an intermediate array
 - split per location immediately
 - instead of sending when the list as a whole is full, send it when there is 
 enough data for a single location
 It would be:
 {noformat}
 for Operation o: List<Op>{
   get location
   add o to todo location.todolist
   if (location.todolist > maxLocationSize)
     send location.todolist to region server 
     clear location.todolist
     // don't wait, continue the loop
 }
 send remaining
 wait
 {noformat}
 It's not trivial to write once you add error management: the retried list must 
 be shared with the operations added to the todolist. But it's doable.
 It's mainly interesting for 'big' writes.
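The per-location loop proposed above can be sketched in Java roughly as follows. This is only an illustration under stated assumptions: `Op`, `PerLocationBatcher`, `submit`, `send` and `maxLocationSize` are hypothetical names, not HBase client APIs; "sending" is simulated by recording each batch in a list; the flush happens once a location's list reaches the limit; and the error management and retry sharing mentioned in the description are left out entirely.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PerLocationBatcher {
    // Hypothetical operation: a row key plus the server that hosts it.
    static final class Op {
        final String row;
        final String location;
        Op(String row, String location) { this.row = row; this.location = location; }
    }

    private final int maxLocationSize;
    // One pending list per location, filled as operations arrive.
    private final Map<String, List<Op>> pending = new HashMap<>();
    // Stand-in for the network: every "sent" batch is recorded here.
    final List<List<Op>> sent = new ArrayList<>();

    PerLocationBatcher(int maxLocationSize) { this.maxLocationSize = maxLocationSize; }

    void submit(List<Op> ops) {
        for (Op o : ops) {
            List<Op> todo = pending.computeIfAbsent(o.location, k -> new ArrayList<>());
            todo.add(o);
            // Flush this location as soon as it has enough data; do not wait.
            if (todo.size() >= maxLocationSize) {
                send(o.location);
            }
        }
        // Send whatever is left; copy the keys because send() mutates the map.
        for (String location : new ArrayList<>(pending.keySet())) {
            send(location);
        }
    }

    private void send(String location) {
        List<Op> batch = pending.remove(location);
        if (batch != null && !batch.isEmpty()) {
            sent.add(batch);
        }
    }

    public static void main(String[] args) {
        PerLocationBatcher batcher = new PerLocationBatcher(2);
        List<Op> ops = new ArrayList<>();
        ops.add(new Op("r1", "rsA"));
        ops.add(new Op("r2", "rsB"));
        ops.add(new Op("r3", "rsA"));
        batcher.submit(ops);
        System.out.println("batches sent: " + batcher.sent.size());
    }
}
```

A real implementation would send asynchronously and wait once at the end; the sketch only shows the grouping and the early flush per location.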


[jira] [Updated] (HBASE-8705) RS holding META when restarted in a single node setup may hang infinitely without META assignment


 [ 
https://issues.apache.org/jira/browse/HBASE-8705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-8705:
--

Attachment: HBASE-8705.patch

A simple patch that just retries in case of META. What do you guys think about 
it?  It does nothing but reintroduce the logic where the assignment was 
attempted maxAttempts times.  This just does that for META when no 
region plan is available, but with a sleep.
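As a rough illustration of the idea (not the attached patch), a bounded retry with a sleep between attempts could look like the Java sketch below. `MetaAssignRetry`, `waitForPlan`, `planSupplier`, `maxAttempts` and `sleepMillis` are hypothetical names introduced here; the supplier stands in for whatever looks up a region plan and returns null when no region server is available yet.

```java
import java.util.function.Supplier;

class MetaAssignRetry {
    /**
     * Asks planSupplier for a plan up to maxAttempts times, sleeping between
     * attempts. Returns the plan, or null if none became available or the
     * thread was interrupted while sleeping.
     */
    static <T> T waitForPlan(Supplier<T> planSupplier, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            T plan = planSupplier.get();
            if (plan != null) {
                return plan;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt status and give up.
                    Thread.currentThread().interrupt();
                    return null;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated lookup: a region server only shows up on the third attempt.
        String plan = waitForPlan(() -> ++calls[0] < 3 ? null : "rs1,60020", 5, 10L);
        System.out.println(plan + " after " + calls[0] + " attempts");
    }
}
```

The caller still has to decide what to do when null comes back after the last attempt, which is the case the original code handled by returning without assignment.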

 RS holding META when restarted in a single node setup may hang infinitely 
 without META assignment
 -

 Key: HBASE-8705
 URL: https://issues.apache.org/jira/browse/HBASE-8705
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.95.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Minor
 Fix For: 0.98.0

 Attachments: HBASE-8705.patch


 This bug may be minor, as it is likely to happen only in a single node setup.
 I restarted the RS holding META. The master tried assigning META using 
 MetaSSH, but it tried this before the new RS came up.
 So, as no region plan is found, 
 {code}
   if (plan == null) {
     LOG.warn("Unable to determine a plan to assign " + region);
     if (tomActivated) {
       this.timeoutMonitor.setAllRegionServersOffline(true);
     } else {
       regionStates.updateRegionState(region, 
         RegionState.State.FAILED_OPEN);
     }
     return;
   }
 {code}
 we just return without assignment.  And this being META, the small cluster 
 just hangs.


[jira] [Comment Edited] (HBASE-8705) RS holding META when restarted in a single node setup may hang infinitely without META assignment


[ 
https://issues.apache.org/jira/browse/HBASE-8705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680508#comment-13680508
 ] 

ramkrishna.s.vasudevan edited comment on HBASE-8705 at 6/11/13 6:41 PM:


A simple patch that just retries in case of META. What do you guys think about 
it?  It does nothing but reintroduce the logic where the assignment was 
attempted maxAttempts times.  This just does that for META when no 
region plan is available, but with a sleep.

  was (Author: ram_krish):
A simple patch that just retries incase of META. What you guys think about 
it. 
It is nothing but reintroducing the logic where the assignment was attempted 
for maxAttempts number of times.  This just does that for META incase of not 
region plan available but with a sleep.
  

[jira] [Updated] (HBASE-8705) RS holding META when restarted in a single node setup may hang infinitely without META assignment


 [ 
https://issues.apache.org/jira/browse/HBASE-8705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-8705:
--

Status: Patch Available  (was: Open)


[jira] [Created] (HBASE-8730) Update TestEnvironmentEdgeManager to fix error

2013-06-11 Thread Shane Hogan (JIRA)

Shane Hogan created HBASE-8730:
--

 Summary: Update TestEnvironmentEdgeManager to fix error
 Key: HBASE-8730
 URL: https://issues.apache.org/jira/browse/HBASE-8730
 Project: HBase
  Issue Type: Test
  Components: test
Affects Versions: 0.89-fb
Reporter: Shane Hogan
Priority: Trivial
 Fix For: 0.89-fb


Fixes a small issue with the test.

Fixing the unit test's false assumption that the delegate
starts out being the default delegate. This assumption is violated
if another part of the code calls injectEdge with something
other than the defaultEnvironmentEdge.
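
The kind of fix described can be illustrated with a small self-contained Java sketch. It does not use the real HBase classes: `EdgeManagerSketch`, `Edge`, `DefaultEdge` and `defaultAfterReset` are hypothetical stand-ins that only mimic a manager holding a static, replaceable delegate. The point is that a test should reset the delegate explicitly instead of assuming nobody injected a different edge earlier.

```java
class EdgeManagerSketch {
    // Hypothetical clock abstraction with a single method.
    interface Edge {
        long currentTimeMillis();
    }

    static final class DefaultEdge implements Edge {
        public long currentTimeMillis() {
            return System.currentTimeMillis();
        }
    }

    // Static, process-wide delegate: any earlier caller may have replaced it.
    private static volatile Edge delegate = new DefaultEdge();

    static void injectEdge(Edge edge) {
        delegate = (edge != null) ? edge : new DefaultEdge();
    }

    static void reset() {
        delegate = new DefaultEdge();
    }

    static Edge getDelegate() {
        return delegate;
    }

    /** Checks for the default edge only after an explicit reset. */
    static boolean defaultAfterReset() {
        reset(); // do not assume the delegate is still the default
        return getDelegate() instanceof DefaultEdge;
    }

    public static void main(String[] args) {
        // Simulate another part of the code leaving a custom edge behind.
        injectEdge(() -> 42L);
        System.out.println(getDelegate() instanceof DefaultEdge); // prints false
        System.out.println(defaultAfterReset());                  // prints true
    }
}
```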


[jira] [Resolved] (HBASE-8664) Small fix ups for memory size outputs in UI


 [ 
https://issues.apache.org/jira/browse/HBASE-8664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-8664.
--

   Resolution: Fixed
Fix Version/s: 0.98.0
 Hadoop Flags: Reviewed

Committed to trunk and 0.95.  Thanks for review Enis.

 Small fix ups for memory size outputs in UI
 ---

 Key: HBASE-8664
 URL: https://issues.apache.org/jira/browse/HBASE-8664
 Project: HBase
  Issue Type: Bug
  Components: UI
Reporter: stack
Assignee: stack
 Fix For: 0.98.0, 0.95.1

 Attachments: ui.txt


 This issue goes in the 'polish' category.  On regionserver ui, we were 
 listing raw bytes for heap size, memstore size, etc.  I put in place 
 StringUtils.humanReadableInt (looked to see if bootstrap could do it for us 
 but doesn't seem so, not w/o plugin).  I then made all the megabytes and 
 kilobytes match StringUtils.humanReadableInt with its 'm' instead of 'MB' and 
 'k' instead of 'KB'.  Removed a stray KB that was in the wrong place too.
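
For readers unfamiliar with this style of output, the following Java sketch shows one way to turn raw byte counts into short strings with 'k', 'm' and 'g' suffixes. It is an assumption-laden illustration, not the code of StringUtils.humanReadableInt: the class name `HumanReadableSize`, the one-decimal rounding and the rejection of negative values are choices made here.

```java
import java.util.Locale;

class HumanReadableSize {
    private static final long KILO = 1024L;
    private static final long MEGA = KILO * 1024L;
    private static final long GIGA = MEGA * 1024L;

    /** Formats a non-negative byte count using powers of 1024 and lowercase suffixes. */
    static String format(long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("bytes must be non-negative: " + bytes);
        }
        if (bytes < KILO) {
            return Long.toString(bytes); // small values are shown as raw bytes
        }
        if (bytes < MEGA) {
            return String.format(Locale.ROOT, "%.1fk", bytes / (double) KILO);
        }
        if (bytes < GIGA) {
            return String.format(Locale.ROOT, "%.1fm", bytes / (double) MEGA);
        }
        return String.format(Locale.ROOT, "%.1fg", bytes / (double) GIGA);
    }

    public static void main(String[] args) {
        System.out.println(format(512L));
        System.out.println(format(1536L));
        System.out.println(format(5L * 1024L * 1024L));
    }
}
```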


[jira] [Updated] (HBASE-8696) Fixup for logs that show when running hbase-it tests.

[
https://issues.apache.org/jira/browse/HBASE-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

stack updated HBASE-8696:
-

Attachment: 8696v2.txt

Update to address Sergey's feedback and then a bunch more changes. Here is
a commit message:

{code}
Tighten up logs. Mostly shorten thread names, use encoded name for region
in RegionStates logging rather than full toString of the HRI. Cleanup in
the file archiving so we log less.

Add means of asking for more than one regionserver when running standalone.
For example, below will start 5 regionservers in the standalone process (need
to suppress startup of the info servers to avoid complaint that port already
in use)

$ ./bin/start-hbase.sh -Dhbase.regionserver.info.port=-1 --localRegionServers=5

M bin/start-hbase.sh
Allow passing extraneous args provided when in local mode.
Useful when asking for more than one regionserver to be
started in the local process.

M hbase-client/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
Add a short name method used when logging region name in logs
(Just prints out the encoded name)

M hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClientScanner.java
Was printing table name as bytes...toString it.

M
hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
Record time at which an exception was thrown so that when we dump out all
exceptions on failure, we can see the expanse during which retries were
operating.

M
hbase-client/src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedWithDetailsException.java
Print out time at which exception was thrown when doing summary of a list of
exceptions.

M
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ZooKeeperRegistry.java
Small fixups.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
M hbase-client/src/main/java/org/apache/hadoop/hbase/master/RegionState.java
Change the messages so don't output full HRI#toString just encoded region name
so lines are not unreadably long.

M hbase-it/src/test/java/org/apache/hadoop/hbase/HBaseClusterManager.java
Only log if a change.

M hbase-it/src/test/java/org/apache/hadoop/hbase/IngestIntegrationTestBase.java
Minor fixups.

M hbase-server/src/main/java/org/apache/hadoop/hbase/backup/HFileArchiver.java
Make some logging trace especially duplicated logging.

M hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java
Tighten up thread names; instead of 'IPC Server listener on PORT',
do RpcServer.listener,port=PORT.

M
hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java
Fix table name (was bytes)

M
hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
Make stuff trace.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/CatalogJanitor.java
Tighten thread name (make it like the others).

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
Tighten thread names.

M
hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMasterCommandLine.java
Add being able to set how many masters in a process and regionservers.

M
hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java
Print encoded name rather than full region name.

M
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
Fix hostname compare (was comparing hostname to servername which never
matched)
{code}

Fixup for logs that show when running hbase-it tests.
-

Key: HBASE-8696
URL: https://issues.apache.org/jira/browse/HBASE-8696
Project: HBase
Issue Type: Improvement
Reporter: stack
Assignee: stack
Fix For: 0.95.1

Attachments: 8696v2.txt, 8698.txt

I've been staring at logs trying to figure why hbase-it tests fail.
Here are some more log cleanups that come of my frustration trying to read
our emissions.

[jira] [Commented] (HBASE-8696) Fixup for logs that show when running hbase-it tests.


[ 
https://issues.apache.org/jira/browse/HBASE-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680546#comment-13680546
 ] 

stack commented on HBASE-8696:
--

I put it up on rb here: https://reviews.apache.org/r/11805/


[jira] [Commented] (HBASE-8696) Fixup for logs that show when running hbase-it tests.

2013-06-11 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680552#comment-13680552
 ] 

Hadoop QA commented on HBASE-8696:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12587275/8696v2.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified tests.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6003//console

This message is automatically generated.

 Fixup for logs that show when running hbase-it tests.
 -

 Key: HBASE-8696
 URL: https://issues.apache.org/jira/browse/HBASE-8696
 Project: HBase
  Issue Type: Improvement
Reporter: stack
Assignee: stack
 Fix For: 0.95.1

 Attachments: 8696v2.txt, 8698.txt


 I've been staring at logs trying to figure why hbase-it tests fail.
 Here are some more log cleanups that come of my frustration trying to read 
 our emissions.


[jira] [Commented] (HBASE-8705) RS holding META when restarted in a single node setup may hang infinitely without META assignment


[ 
https://issues.apache.org/jira/browse/HBASE-8705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680555#comment-13680555
 ] 

stack commented on HBASE-8705:
--

+1 Seems innocuous and could help...



 RS holding META when restarted in a single node setup may hang infinitely 
 without META assignment
 -

 Key: HBASE-8705
 URL: https://issues.apache.org/jira/browse/HBASE-8705
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.95.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Minor
 Fix For: 0.98.0

 Attachments: HBASE-8705.patch


 This bug may be minor, as it is likely to happen only in a single node setup.
 I restarted the RS holding META. The master tried assigning META using 
 MetaSSH, but it tried this before the new RS came up.
 So, as no region plan is found, 
 {code}
   if (plan == null) {
     LOG.warn("Unable to determine a plan to assign " + region);
     if (tomActivated) {
       this.timeoutMonitor.setAllRegionServersOffline(true);
     } else {
       regionStates.updateRegionState(region,
         RegionState.State.FAILED_OPEN);
     }
     return;
   }
 {code}
 we just return without assignment.  And this being META, the small cluster 
 just hangs.


[jira] [Updated] (HBASE-7679) implement store file management for stripe compactions


 [ 
https://issues.apache.org/jira/browse/HBASE-7679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-7679:
-

Attachment: 8696v3.txt

Rebase

 implement store file management for stripe compactions
 --

 Key: HBASE-7679
 URL: https://issues.apache.org/jira/browse/HBASE-7679
 Project: HBase
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: 8696v3.txt, HBASE-7667-and-7603-v0-incomplete.patch, 
 HBASE-7667-and-7603-v0-incomplete.patch, HBASE-7667-and-7603-v1.patch, 
 HBASE-7667-and-7603-v1.patch, HBASE-7667-v1.patch, HBASE-7667-v1.patch, 
 HBASE-7667-v2.patch, HBASE-7667-v2.patch, HBASE-7667-v3.patch, 
 HBASE-7679-v10.patch, HBASE-7679-v11.patch, HBASE-7679-v12.patch, 
 HBASE-7679-v12.patch, HBASE-7679-v13.patch, HBASE-7679-v13.patch, 
 HBASE-7679-v14.patch, HBASE-7679-v15.patch, HBASE-7679-v16.patch, 
 HBASE-7679-v4.patch, HBASE-7679-v5.patch, HBASE-7679-v6.patch, 
 HBASE-7679-v7-.patch, HBASE-7679-v7.patch, HBASE-7679-v8.patch, 
 HBASE-7679-v9.patch





[jira] [Updated] (HBASE-8729) distributedLogReplay may hang during chained region server failure


 [ 
https://issues.apache.org/jira/browse/HBASE-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-8729:
--

Attachment: 8729-v2.patch

 distributedLogReplay may hang during chained region server failure
 --

 Key: HBASE-8729
 URL: https://issues.apache.org/jira/browse/HBASE-8729
 Project: HBase
  Issue Type: Bug
  Components: MTTR
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 0.98.0, 0.95.2

 Attachments: 8729-v2.patch, 8729-v2.patch, hbase-8729.patch


 In a test, half of the cluster (in terms of region servers) was down and some log 
 replay incurred chained RS failures (the receiving RS of a log replay failed 
 again). 
 By default, we only allow 3 concurrent SSH handlers (controlled by 
 {code}this.executorService.startExecutorService(ExecutorType.MASTER_SERVER_OPERATIONS, conf.getInt("hbase.master.executor.serverops.threads", 3));{code}).
 If all 3 SSH handlers are doing logReplay (a blocking call) and one of the receiving 
 RSs fails again, then logReplay will hang because regions of the newly failed 
 RS can't be re-assigned to another live RS (no SSH handler will be processed 
 due to the max threads setting) and the existing log replay will keep routing replay 
 traffic to the dead RS.
 The fix is to submit logReplay work into a separate type of executor queue in 
 order not to block SSH region assignment, so that logReplay can route traffic 
 to a live RS after retries and move forward. 
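
 The starvation described above can be reproduced in miniature with plain java.util.concurrent. This is only a sketch under stated assumptions: the class and method names (ExecutorStarvationSketch, replaysFinish) are hypothetical, no HBase classes are involved, and the "reassignment" is reduced to releasing a latch. It is not the code from the attached patch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; not HBase code.
class ExecutorStarvationSketch {

  /**
   * Starts `handlers` blocking "replay" tasks, then queues one "reassignment"
   * task on the server-ops pool. Returns true if all replays finish in time.
   */
  static boolean replaysFinish(int handlers, boolean separateReplayPool) {
    ExecutorService serverOps = Executors.newFixedThreadPool(handlers);
    // Either share the server-ops pool (the hang) or use a dedicated pool (the fix).
    ExecutorService replayPool =
        separateReplayPool ? Executors.newFixedThreadPool(handlers) : serverOps;
    CountDownLatch reassigned = new CountDownLatch(1);
    CountDownLatch replaysDone = new CountDownLatch(handlers);
    try {
      for (int i = 0; i < handlers; i++) {
        replayPool.execute(() -> {
          try {
            reassigned.await();        // blocking call: replay waits for reassignment
            replaysDone.countDown();
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }
      // Reassignment always needs a free server-ops thread.
      serverOps.execute(reassigned::countDown);
      return replaysDone.await(500, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      serverOps.shutdownNow();
      replayPool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println("shared pool finishes:   " + replaysFinish(3, false));
    System.out.println("separate pool finishes: " + replaysFinish(3, true));
  }
}
```

 With a shared pool the three blocked tasks occupy every thread, so the queued reassignment never runs and the call times out; with a separate pool the reassignment gets a thread and the replays complete.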


[jira] [Created] (HBASE-8731) Use the JDK 1.7 in the precommit env for trunk

2013-06-11 Thread Nicolas Liochon (JIRA)

Nicolas Liochon created HBASE-8731:
--

 Summary: Use the JDK 1.7 in the precommit env for trunk
 Key: HBASE-8731
 URL: https://issues.apache.org/jira/browse/HBASE-8731
 Project: HBase
  Issue Type: Improvement
  Components: build
Affects Versions: 0.98.0
Reporter: Nicolas Liochon
Assignee: Giridharan Kesavan
 Fix For: 0.98.0


HBase today uses the jdk 1.6. In the past it created issues when we tried to 
use 1.7 for the core build while the precommit was on 1.6.

Having the precommit on 1.7 would solve this.

The best approach is to start with trunk. 0.95 will likely come next, and maybe, 
one day, 0.94.



[jira] [Updated] (HBASE-8729) distributedLogReplay may hang during chained region server failure


 [ 
https://issues.apache.org/jira/browse/HBASE-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-8729:
--

Attachment: (was: 8729-v2.patch)

 distributedLogReplay may hang during chained region server failure
 --

 Key: HBASE-8729
 URL: https://issues.apache.org/jira/browse/HBASE-8729
 Project: HBase
  Issue Type: Bug
  Components: MTTR
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 0.98.0, 0.95.2

 Attachments: 8729-v2.patch, hbase-8729.patch


 In a test, half cluster(in terms of region servers) was down and some log 
 replay had incurred chained RS failures(receiving RS of a log replay failed 
 again). 
 Since by default, we only allow 3 concurrent SSH handlers(controlled by 
 {code}this.executorService.startExecutorService(ExecutorType.MASTER_SERVER_OPERATIONS, conf.getInt("hbase.master.executor.serverops.threads", 3));{code}).
 If all 3 SSH handlers are doing logReplay(blocking call) and one of receiving 
 RS fails again then logReplay will hang because regions of the newly failed 
 RS can't be re-assigned to another live RS(no ssh handler will be processed 
 due to max threads setting) and existing log replay will keep routing replay 
 traffic to the dead RS.
 The fix is to submit logReplay work into a separate type of executor queue in 
 order not to block SSH region assignment so that logReplay can route traffic 
 to a live RS after retries and move forward. 


[jira] [Created] (HBASE-8732) Changing Encoding on Column Families errors out

Elliott Clark created HBASE-8732:


 Summary: Changing Encoding on Column Families errors out
 Key: HBASE-8732
 URL: https://issues.apache.org/jira/browse/HBASE-8732
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.98.0, 0.95.1
Reporter: Elliott Clark





[jira] [Updated] (HBASE-8726) Create an Integration Test for online schema change


 [ 
https://issues.apache.org/jira/browse/HBASE-8726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-8726:
-

Attachment: HBASE-8726-0.patch

Here's a pretty simple test that uses ChaosMonkey to try and modify column 
families.

 Create an Integration Test for online schema change
 ---

 Key: HBASE-8726
 URL: https://issues.apache.org/jira/browse/HBASE-8726
 Project: HBase
  Issue Type: Bug
  Components: Admin
Reporter: Elliott Clark
Assignee: Elliott Clark
 Attachments: HBASE-8726-0.patch


 With table locks in place it should be time to start really testing online 
 table schema changes.


[jira] [Updated] (HBASE-8732) Changing Encoding on Column Families errors out


 [ 
https://issues.apache.org/jira/browse/HBASE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-8732:
-

Description: Getting an error when opening a scanner on a file that has no 
encoding.

 Changing Encoding on Column Families errors out
 ---

 Key: HBASE-8732
 URL: https://issues.apache.org/jira/browse/HBASE-8732
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.98.0, 0.95.1
Reporter: Elliott Clark

 Getting an error when opening a scanner on a file that has no encoding.


[jira] [Updated] (HBASE-8726) Create an Integration Test for online schema change


 [ 
https://issues.apache.org/jira/browse/HBASE-8726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-8726:
-

Affects Version/s: 0.95.1
   0.98.0
   Status: Patch Available  (was: Open)

 Create an Integration Test for online schema change
 ---

 Key: HBASE-8726
 URL: https://issues.apache.org/jira/browse/HBASE-8726
 Project: HBase
  Issue Type: Bug
  Components: Admin
Affects Versions: 0.98.0, 0.95.1
Reporter: Elliott Clark
Assignee: Elliott Clark
 Attachments: HBASE-8726-0.patch


 With table locks in place it should be time to start really testing online 
 table schema changes.


[jira] [Commented] (HBASE-8732) Changing Encoding on Column Families errors out


[ 
https://issues.apache.org/jira/browse/HBASE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680608#comment-13680608
 ] 

Elliott Clark commented on HBASE-8732:
--

Getting this error:

{code}
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException: 
java.io.IOException: Could not seek StoreFileScanner[HFileScanner for reader 
reader=hdfs://localhost:57053/user/eclark/hbase/IntegrationTestModifyColumns/d2c63aa3399aaf7e40bf7d045c0bb1ca/test_cf/d020ed015d9b4c73b08b06192095e4be,
 compression=none, cacheConf=CacheConfig:enabled [cacheDataOnRead=true] 
[cacheDataOnWrite=false] [cacheIndexesOnWrite=false] [cacheBloomsOnWrite=false] 
[cacheEvictOnClose=false] [cacheCompressed=false], 
firstKey=1115ec15a4637bb614390d16c81ea881-105445/test_cf:0/1370980134228/Put, 
lastKey=221cdbd49831660e254edeb0c4b51109-102317/test_cf:0/1370980122463/Put, 
avgKeyLen=59, avgValueLen=100, entries=6441, length=1089866, cur=null] to key 
1a860448b5d2824f0a7163839fe04f6e-109693/test_cf:/LATEST_TIMESTAMP/DeleteFamily/vlen=0/mvcc=0
at 
org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:154)
at 
org.apache.hadoop.hbase.regionserver.StoreScanner.init(StoreScanner.java:160)
at 
org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:1623)
at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.init(HRegion.java:3507)
at 
org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:1705)
at 
org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1697)
at 
org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1674)
at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4452)
at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4427)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2743)
at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:20926)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2122)
at 
org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1829)
Caused by: java.io.IOException: Cached block under key 
d020ed015d9b4c73b08b06192095e4be_590914_FAST_DIFF has wrong encoding: null 
(expected: FAST_DIFF)
at 
org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:319)
at 
org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254)
at 
org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:469)
at 
org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:490)
at 
org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:222)
at 
org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:142)
... 12 more

at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1336)
at 
org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1540)
at 
org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1597)
at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.get(ClientProtos.java:21331)
at 
org.apache.hadoop.hbase.protobuf.ProtobufUtil.get(ProtobufUtil.java:1233)
... 8 more
{code}

 Changing Encoding on Column Families errors out
 ---

 Key: HBASE-8732
 URL: https://issues.apache.org/jira/browse/HBASE-8732
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.98.0, 0.95.1
Reporter: Elliott Clark

 Getting an error when opening a scanner on a file that has no encoding.


[jira] [Commented] (HBASE-7679) implement store file management for stripe compactions


[ 
https://issues.apache.org/jira/browse/HBASE-7679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680615#comment-13680615
 ] 

Sergey Shelukhin commented on HBASE-7679:
-

this appears to be the wrong JIRA

 implement store file management for stripe compactions
 --

 Key: HBASE-7679
 URL: https://issues.apache.org/jira/browse/HBASE-7679
 Project: HBase
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: 8696v3.txt, HBASE-7667-and-7603-v0-incomplete.patch, 
 HBASE-7667-and-7603-v0-incomplete.patch, HBASE-7667-and-7603-v1.patch, 
 HBASE-7667-and-7603-v1.patch, HBASE-7667-v1.patch, HBASE-7667-v1.patch, 
 HBASE-7667-v2.patch, HBASE-7667-v2.patch, HBASE-7667-v3.patch, 
 HBASE-7679-v10.patch, HBASE-7679-v11.patch, HBASE-7679-v12.patch, 
 HBASE-7679-v12.patch, HBASE-7679-v13.patch, HBASE-7679-v13.patch, 
 HBASE-7679-v14.patch, HBASE-7679-v15.patch, HBASE-7679-v16.patch, 
 HBASE-7679-v4.patch, HBASE-7679-v5.patch, HBASE-7679-v6.patch, 
 HBASE-7679-v7-.patch, HBASE-7679-v7.patch, HBASE-7679-v8.patch, 
 HBASE-7679-v9.patch





[jira] [Commented] (HBASE-8726) Create an Integration Test for online schema change


[ 
https://issues.apache.org/jira/browse/HBASE-8726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680623#comment-13680623
 ] 

Sergey Shelukhin commented on HBASE-8726:
-

Some comments are stale (e.g. the one mentioning kills for CHAOS_EVERY_MS).

{code} new AddColumnPolicy(tableName, new HBaseAdmin(util.getConfiguration())), 
{code}
passing HBaseAdmin is not necessary, Action class has context that has admin, 
as well as other random stuff.

Action is called policy which is kind of confusing.

You are making online changes enabled by default, is this intended in this JIRA?



 Create an Integration Test for online schema change
 ---

 Key: HBASE-8726
 URL: https://issues.apache.org/jira/browse/HBASE-8726
 Project: HBase
  Issue Type: Bug
  Components: Admin
Affects Versions: 0.98.0, 0.95.1
Reporter: Elliott Clark
Assignee: Elliott Clark
 Attachments: HBASE-8726-0.patch


 With table locks in place it should be time to start really testing online 
 table schema changes.


[jira] [Commented] (HBASE-8726) Create an Integration Test for online schema change


[ 
https://issues.apache.org/jira/browse/HBASE-8726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680640#comment-13680640
 ] 

stack commented on HBASE-8726:
--

Would be cool if we could enable it as default (if it passes these tests).

Patch looks good to me (caveat the suggestions [~sershe] makes).



 Create an Integration Test for online schema change
 ---

 Key: HBASE-8726
 URL: https://issues.apache.org/jira/browse/HBASE-8726
 Project: HBase
  Issue Type: Bug
  Components: Admin
Affects Versions: 0.98.0, 0.95.1
Reporter: Elliott Clark
Assignee: Elliott Clark
 Attachments: HBASE-8726-0.patch


 With table locks in place it should be time to start really testing online 
 table schema changes.


[jira] [Updated] (HBASE-7679) implement store file management for stripe compactions


 [ 
https://issues.apache.org/jira/browse/HBASE-7679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-7679:
-

Attachment: (was: 8696v3.txt)

 implement store file management for stripe compactions
 --

 Key: HBASE-7679
 URL: https://issues.apache.org/jira/browse/HBASE-7679
 Project: HBase
  Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBASE-7667-and-7603-v0-incomplete.patch, 
 HBASE-7667-and-7603-v0-incomplete.patch, HBASE-7667-and-7603-v1.patch, 
 HBASE-7667-and-7603-v1.patch, HBASE-7667-v1.patch, HBASE-7667-v1.patch, 
 HBASE-7667-v2.patch, HBASE-7667-v2.patch, HBASE-7667-v3.patch, 
 HBASE-7679-v10.patch, HBASE-7679-v11.patch, HBASE-7679-v12.patch, 
 HBASE-7679-v12.patch, HBASE-7679-v13.patch, HBASE-7679-v13.patch, 
 HBASE-7679-v14.patch, HBASE-7679-v15.patch, HBASE-7679-v16.patch, 
 HBASE-7679-v4.patch, HBASE-7679-v5.patch, HBASE-7679-v6.patch, 
 HBASE-7679-v7-.patch, HBASE-7679-v7.patch, HBASE-7679-v8.patch, 
 HBASE-7679-v9.patch





[jira] [Updated] (HBASE-8696) Fixup for logs that show when running hbase-it tests.


 [ 
https://issues.apache.org/jira/browse/HBASE-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-8696:
-

Attachment: 8696v3.txt

Rebase

 Fixup for logs that show when running hbase-it tests.
 -

 Key: HBASE-8696
 URL: https://issues.apache.org/jira/browse/HBASE-8696
 Project: HBase
  Issue Type: Improvement
Reporter: stack
Assignee: stack
 Fix For: 0.95.1

 Attachments: 8696v2.txt, 8696v3.txt, 8698.txt


 I've been staring at logs trying to figure why hbase-it tests fail.
 Here are some more log cleanups that come of my frustration trying to read 
 our emissions.


[jira] [Commented] (HBASE-8726) Create an Integration Test for online schema change


[ 
https://issues.apache.org/jira/browse/HBASE-8726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680646#comment-13680646
 ] 

Elliott Clark commented on HBASE-8726:
--

bq.passing HBaseAdmin is not necessary, Action class has context that has 
admin, as well as other random stuff.

The context class is all private with a comment about how whoever wrote the 
actions wanted the internals to be private so I went the route of passing an 
admin.

bq.Action is called policy which is kind of confusing.
True.  I'll rename those.

bq.You are making online changes enabled by default, is this intended in this 
JIRA?
Yes, when we can get this to run stably for hours I would like to make it 
default.  Until then I don't think committing this is right yet.

This test exposed HBASE-8732 in the first 10 mins. So I expect there are still 
more bugs before we can make it default.

 Create an Integration Test for online schema change
 ---

 Key: HBASE-8726
 URL: https://issues.apache.org/jira/browse/HBASE-8726
 Project: HBase
  Issue Type: Bug
  Components: Admin
Affects Versions: 0.98.0, 0.95.1
Reporter: Elliott Clark
Assignee: Elliott Clark
 Attachments: HBASE-8726-0.patch


 With table locks in place it should be time to start really testing online 
 table schema changes.


[jira] [Commented] (HBASE-8729) distributedLogReplay may hang during chained region server failure


[ 
https://issues.apache.org/jira/browse/HBASE-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680659#comment-13680659
 ] 

stack commented on HBASE-8729:
--

Why is it M_MASTER_LOG_REPLAY rather than just M_LOG_REPLAY?  (Don't M mean 
MASTER?)

Make this name shorter: +  MASTER_LOG_REPLAY_OPERATIONS(7).  M_LOG_REPLAY_OPS.  
It is name of thread and shows all over logs so terse is better.

Should be its own config?

+  this.executorService.startExecutorService(ExecutorType.MASTER_LOG_REPLAY_OPERATIONS,
+  conf.getInt("hbase.master.executor.serverops.threads", 15));

... rather than serverops?

Rather than a log replay handler, should we instead have M_SERVER_SHUTDOWN be 
its own type... and then make N executor slots for server shutdown handling 
rather than for log replay?  That would make the exit of the server shutdown 
handler nicer, in that when we leave it we have processed the server, rather 
than, as in this patch, going off to another executor for 
completion.



 distributedLogReplay may hang during chained region server failure
 --

 Key: HBASE-8729
 URL: https://issues.apache.org/jira/browse/HBASE-8729
 Project: HBase
  Issue Type: Bug
  Components: MTTR
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 0.98.0, 0.95.2

 Attachments: 8729-v2.patch, hbase-8729.patch


 In a test, half cluster(in terms of region servers) was down and some log 
 replay had incurred chained RS failures(receiving RS of a log replay failed 
 again). 
 Since by default, we only allow 3 concurrent SSH handlers(controlled by 
 {code}this.executorService.startExecutorService(ExecutorType.MASTER_SERVER_OPERATIONS, conf.getInt("hbase.master.executor.serverops.threads", 3));{code}).
 If all 3 SSH handlers are doing logReplay(blocking call) and one of receiving 
 RS fails again then logReplay will hang because regions of the newly failed 
 RS can't be re-assigned to another live RS(no ssh handler will be processed 
 due to max threads setting) and existing log replay will keep routing replay 
 traffic to the dead RS.
 The fix is to submit logReplay work into a separate type of executor queue in 
 order not to block SSH region assignment so that logReplay can route traffic 
 to a live RS after retries and move forward. 


[jira] [Updated] (HBASE-8726) Create an Integration Test for online schema change


 [ 
https://issues.apache.org/jira/browse/HBASE-8726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-8726:
-

Fix Version/s: 0.95.2

Adding to 0.95.2.

 Create an Integration Test for online schema change
 ---

 Key: HBASE-8726
 URL: https://issues.apache.org/jira/browse/HBASE-8726
 Project: HBase
  Issue Type: Bug
  Components: Admin
Affects Versions: 0.98.0, 0.95.1
Reporter: Elliott Clark
Assignee: Elliott Clark
 Fix For: 0.95.2

 Attachments: HBASE-8726-0.patch


 With table locks in place it should be time to start really testing online 
 table schema changes.


[jira] [Updated] (HBASE-8732) Changing Encoding on Column Families errors out


 [ 
https://issues.apache.org/jira/browse/HBASE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-8732:
-

 Priority: Critical  (was: Major)
Fix Version/s: 0.95.2

 Changing Encoding on Column Families errors out
 ---

 Key: HBASE-8732
 URL: https://issues.apache.org/jira/browse/HBASE-8732
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.98.0, 0.95.1
Reporter: Elliott Clark
Priority: Critical
 Fix For: 0.95.2


 Getting an error when opening a scanner on a file that has no encoding.


[jira] [Updated] (HBASE-8706) Some improvement in snapshot

2013-06-11 Thread Matteo Bertozzi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-8706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-8706:
---

Attachment: HBASE-8706-v4.patch

Added some fixes around the use of wakeTime/keepAlive/timeout.

Patch looks good to me; any other comments?

 Some improvement in snapshot
 

 Key: HBASE-8706
 URL: https://issues.apache.org/jira/browse/HBASE-8706
 Project: HBase
  Issue Type: Bug
  Components: snapshots
Affects Versions: 0.94.8, 0.95.0
Reporter: binlijin
 Attachments: HBASE-8706-2.patch, HBASE-8706-3.patch, 
 HBASE-8706.patch, HBASE-8706-v4.patch


 (1) The timeout for Procedure cannot be configured.
 {code}
 Procedure's timeout
 ProcedureCoordinator
   final static long TIMEOUT_MILLIS_DEFAULT = 60000;
   createProcedure(ForeignExceptionDispatcher fed, String procName, byte[] procArgs,
       List<String> expectedMembers) {
     // build the procedure
     return new Procedure(this, fed, WAKE_MILLIS_DEFAULT, TIMEOUT_MILLIS_DEFAULT,
         procName, procArgs, expectedMembers);
   }
 RegionServerSnapshotManager:
   /** Conf key for max time to keep threads in snapshot request pool waiting */
   public static final String SNAPSHOT_TIMEOUT_MILLIS_KEY = "hbase.snapshot.region.timeout";
   /** Keep threads alive in request pool for max of 60 seconds */
   public static final long SNAPSHOT_TIMEOUT_MILLIS_DEFAULT = 60000;
   public Subprocedure buildSubprocedure(SnapshotDescription snapshot) {
     long timeoutMillis = conf.getLong(SNAPSHOT_TIMEOUT_MILLIS_KEY,
         SNAPSHOT_TIMEOUT_MILLIS_DEFAULT);
     case FLUSH:
       SnapshotSubprocedurePool taskManager =
         new SnapshotSubprocedurePool(rss.getServerName().toString(), conf);
   }
 {code}
 (2) TakeSnapshotHandler:
 after snapshotRegions we should call monitor.rethrowException() to check whether 
 an exception occurred; if one did, we can skip verifySnapshot.
 (3) Too many error messages are logged when an error happens in some places.


[jira] [Commented] (HBASE-8721) fix for bug that delete can mask puts that happened after the delete was entered

[
https://issues.apache.org/jira/browse/HBASE-8721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680676#comment-13680676
]

stack commented on HBASE-8721:
--

[~fenghh] On keeping deleted cells, it is an option I believe. See
http://hbase.apache.org/book.html#cf.keep.deleted

[~fenghh] Agree that the way delete works is uncanny, in that a put made after
a delete can go unseen.

Thank you for looking into this.

You are using mvcc when it should rather be sequenceid that you are
using? Is that so? mvcc is used for cloaking memstore state, doing a reveal only
after all that makes up a transaction has been written across the row.
sequenceid is given when we add something to the WAL, and it is used to ensure
ordering when doing WAL replays.

fix for bug that delete can mask puts that happened after the delete was
entered

Key: HBASE-8721
URL: https://issues.apache.org/jira/browse/HBASE-8721
Project: HBase
Issue Type: Bug
Components: regionserver
Reporter: Feng Honghua
Attachments: HBASE-8721-0.94-V0.patch

[jira] [Commented] (HBASE-8699) Parameter to DistributedFileSystem#isFileClosed should be of type Path


[ 
https://issues.apache.org/jira/browse/HBASE-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680692#comment-13680692
 ] 

Ted Yu commented on HBASE-8699:
---

[~stack]:
What do you think of the patch ?

Thanks

 Parameter to DistributedFileSystem#isFileClosed should be of type Path
 --

 Key: HBASE-8699
 URL: https://issues.apache.org/jira/browse/HBASE-8699
 Project: HBase
  Issue Type: Bug
  Components: wal
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: 8699-v1.txt


 Here is current code of FSHDFSUtils#isFileClosed():
 {code}
   boolean isFileClosed(final DistributedFileSystem dfs, final Path p) {
     try {
       Method m = dfs.getClass().getMethod("isFileClosed", new Class<?>[] {String.class});
       return (Boolean) m.invoke(dfs, p.toString());
 {code}
 We look for isFileClosed method with parameter type of String.
 However, from 
 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
  (branch-2):
 {code}
   public boolean isFileClosed(Path src) throws IOException {
 {code}
 The parameter type is Path.
 This means we would get a NoSuchMethodException.
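
 The mismatch can be illustrated without Hadoop on the classpath. The sketch below uses hypothetical stand-in classes (FakePath, FakeFileSystem) in place of Path and DistributedFileSystem; only Class#getMethod and NoSuchMethodException are real JDK API. It shows that a reflective lookup must name the exact declared parameter type.

```java
// Hypothetical illustration; FakePath and FakeFileSystem are stand-ins, not Hadoop classes.
class ReflectionLookupSketch {

  /** Stand-in for a path-like parameter type. */
  static class FakePath {
    final String value;
    FakePath(String value) { this.value = value; }
  }

  /** Stand-in for a file system whose method takes FakePath, not String. */
  static class FakeFileSystem {
    public boolean isFileClosed(FakePath src) { return true; }
  }

  /** Returns true if a public method with this name and exact parameter type exists. */
  static boolean hasMethod(Class<?> owner, String name, Class<?> paramType) {
    try {
      owner.getMethod(name, paramType);
      return true;
    } catch (NoSuchMethodException e) {
      return false;  // lookup by the wrong parameter type ends up here
    }
  }

  public static void main(String[] args) {
    System.out.println(hasMethod(FakeFileSystem.class, "isFileClosed", String.class));
    System.out.println(hasMethod(FakeFileSystem.class, "isFileClosed", FakePath.class));
  }
}
```

 Looking the method up with String.class fails, while looking it up with the declared parameter type succeeds, which is the shape of the fix the issue title asks for.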


[jira] [Commented] (HBASE-8664) Small fix ups for memory size outputs in UI


[ 
https://issues.apache.org/jira/browse/HBASE-8664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680696#comment-13680696
 ] 

Hudson commented on HBASE-8664:
---

Integrated in hbase-0.95 #236 (See 
[https://builds.apache.org/job/hbase-0.95/236/])
HBASE-8664 Small fix ups for memory size outputs in UI (Revision 1491903)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/branches/0.95/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/master/RegionServerListTmpl.jamon
* 
/hbase/branches/0.95/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/regionserver/RegionListTmpl.jamon
* 
/hbase/branches/0.95/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/regionserver/ServerMetricsTmpl.jamon


 Small fix ups for memory size outputs in UI
 ---

 Key: HBASE-8664
 URL: https://issues.apache.org/jira/browse/HBASE-8664
 Project: HBase
  Issue Type: Bug
  Components: UI
Reporter: stack
Assignee: stack
 Fix For: 0.98.0, 0.95.1

 Attachments: ui.txt


 This issue goes in the 'polish' category.  On regionserver ui, we were 
 listing raw bytes for heap size, memstore size, etc.  I put in place 
 StringUtils.humanReadableInt (looked to see if bootstrap could do it for us 
 but doesn't seem so, not w/o plugin).  I then made all the megabytes and 
 kilobytes match StringUtils.humanReadableInt with its 'm' instead of 'MB' and 
 'k' instead of KB.  Removed a stray KB that was in the wrong place too.


[jira] [Commented] (HBASE-8699) Parameter to DistributedFileSystem#isFileClosed should be of type Path


[ 
https://issues.apache.org/jira/browse/HBASE-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680708#comment-13680708
 ] 

Elliott Clark commented on HBASE-8699:
--

bq.Is there a reliable way to detect hadoop version ? I am not aware of one.
That's what the hadoop-compat modules are there for.  Anything hadoop 2+ will 
have hbase-hadoop2-compat on the cp. That seems like a good solution.

 Parameter to DistributedFileSystem#isFileClosed should be of type Path
 --

 Key: HBASE-8699
 URL: https://issues.apache.org/jira/browse/HBASE-8699
 Project: HBase
  Issue Type: Bug
  Components: wal
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: 8699-v1.txt


 Here is current code of FSHDFSUtils#isFileClosed():
 {code}
   boolean isFileClosed(final DistributedFileSystem dfs, final Path p) {
     try {
       Method m = dfs.getClass().getMethod("isFileClosed", new Class<?>[] {String.class});
       return (Boolean) m.invoke(dfs, p.toString());
 {code}
 We look for isFileClosed method with parameter type of String.
 However, from 
 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
  (branch-2):
 {code}
   public boolean isFileClosed(Path src) throws IOException {
 {code}
 The parameter type is of Path.
 This means we would get NoSuchMethodException.


[jira] [Commented] (HBASE-3787) Increment is non-idempotent but client retries RPC


[ 
https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680709#comment-13680709
 ] 

stack commented on HBASE-3787:
--

Yeah, can't cache KV.

Can we have something for one server first?

 Increment is non-idempotent but client retries RPC
 --

 Key: HBASE-3787
 URL: https://issues.apache.org/jira/browse/HBASE-3787
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.94.4, 0.95.2
Reporter: dhruba borthakur
Assignee: Sergey Shelukhin
Priority: Critical
 Fix For: 0.95.1

 Attachments: HBASE-3787-partial.patch, HBASE-3787-v0.patch, 
 HBASE-3787-v1.patch, HBASE-3787-v2.patch, HBASE-3787-v3.patch, 
 HBASE-3787-v4.patch, HBASE-3787-v5.patch, HBASE-3787-v5.patch


 The HTable.increment() operation is non-idempotent. The client retries the 
 increment RPC a few times (as specified by configuration) before throwing an 
 error to the application. This makes it possible that the same increment call 
 be applied twice at the server.
 For increment operations, is it better to use 
 HConnectionManager.getRegionServerWithoutRetries()? Another  option would be 
 to enhance the IPC module to make the RPC server correctly identify if the 
 RPC is a retry attempt and handle accordingly.
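
 The second option (letting the server recognize a retry) can be sketched on a
 single server as below. This is a minimal illustration under stated
 assumptions, not the attached patch: DedupingCounter and the client-supplied
 requestId are hypothetical, and the map of applied requests is unbounded here
 for brevity where a real server would have to expire entries.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

final class DedupingCounter {
  private final AtomicLong value = new AtomicLong();
  // Result already returned for each request id (hypothetical, never expired).
  private final Map<Long, Long> applied = new ConcurrentHashMap<>();

  /** Applies the increment once per requestId; a retry gets the earlier result. */
  long increment(long requestId, long delta) {
    // computeIfAbsent runs the mapping function at most once per key.
    return applied.computeIfAbsent(requestId, id -> value.addAndGet(delta));
  }

  long get() { return value.get(); }
}
```

 A retried RPC carrying the same requestId then returns the first result
 instead of incrementing again.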


[jira] [Updated] (HBASE-8696) Fixup for logs that show when running hbase-it tests.


 [ 
https://issues.apache.org/jira/browse/HBASE-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-8696:
-

Attachment: 8696v4.txt

Update w/ Sergey comments addressed.

 Fixup for logs that show when running hbase-it tests.
 -

 Key: HBASE-8696
 URL: https://issues.apache.org/jira/browse/HBASE-8696
 Project: HBase
  Issue Type: Improvement
Reporter: stack
Assignee: stack
 Fix For: 0.95.1

 Attachments: 8696v2.txt, 8696v3.txt, 8696v4.txt, 8698.txt


 I've been staring at logs trying to figure why hbase-it tests fail.
 Here are some more log cleanups that come of my frustration trying to read 
 our emissions.


[jira] [Commented] (HBASE-8699) Parameter to DistributedFileSystem#isFileClosed should be of type Path


[ 
https://issues.apache.org/jira/browse/HBASE-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680712#comment-13680712
 ] 

Ted Yu commented on HBASE-8699:
---

Currently hadoop 1.2.0 contains DistributedFileSystem#isFileClosed that HBase 
can use.

Should 1.2.0 be covered ?

 Parameter to DistributedFileSystem#isFileClosed should be of type Path
 --

 Key: HBASE-8699
 URL: https://issues.apache.org/jira/browse/HBASE-8699
 Project: HBase
  Issue Type: Bug
  Components: wal
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: 8699-v1.txt


 Here is current code of FSHDFSUtils#isFileClosed():
 {code}
   boolean isFileClosed(final DistributedFileSystem dfs, final Path p) {
     try {
       Method m = dfs.getClass().getMethod("isFileClosed", new Class<?>[] {String.class});
       return (Boolean) m.invoke(dfs, p.toString());
 {code}
 We look for isFileClosed method with parameter type of String.
 However, from 
 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
  (branch-2):
 {code}
   public boolean isFileClosed(Path src) throws IOException {
 {code}
 The parameter type is of Path.
 This means we would get NoSuchMethodException.


[jira] [Commented] (HBASE-8664) Small fix ups for memory size outputs in UI


[ 
https://issues.apache.org/jira/browse/HBASE-8664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680721#comment-13680721
 ] 

Hudson commented on HBASE-8664:
---

Integrated in HBase-TRUNK #4173 (See 
[https://builds.apache.org/job/HBase-TRUNK/4173/])
HBASE-8664 Small fix ups for memory size outputs in UI (Revision 1491902)

 Result = SUCCESS
stack : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/master/RegionServerListTmpl.jamon
* 
/hbase/trunk/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/regionserver/RegionListTmpl.jamon
* 
/hbase/trunk/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/regionserver/ServerMetricsTmpl.jamon


 Small fix ups for memory size outputs in UI
 ---

 Key: HBASE-8664
 URL: https://issues.apache.org/jira/browse/HBASE-8664
 Project: HBase
  Issue Type: Bug
  Components: UI
Reporter: stack
Assignee: stack
 Fix For: 0.98.0, 0.95.1

 Attachments: ui.txt


 This issue goes in the 'polish' category.  On regionserver ui, we were 
 listing raw bytes for heap size, memstore size, etc.  I put in place 
 StringUtils.humanReadableInt (looked to see if bootstrap could do it for us 
 but doesn't seem so, not w/o plugin).  I then made all the megabytes and 
 kilobytes match StringUtils.humanReadableInt with its 'm' instead of 'MB' and 
 'k' instead of KB.  Removed a stray KB that was in the wrong place too.


[jira] [Commented] (HBASE-8699) Parameter to DistributedFileSystem#isFileClosed should be of type Path


[ 
https://issues.apache.org/jira/browse/HBASE-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680719#comment-13680719
 ] 

Ted Yu commented on HBASE-8699:
---

bq. Anything hadoop 2+ will have hbase-hadoop2-compat on the cp

lib/hbase-hadoop2-compat-0.95.1.jar would be on the classpath. Does it reveal 
the underlying hadoop version ?

 Parameter to DistributedFileSystem#isFileClosed should be of type Path
 --

 Key: HBASE-8699
 URL: https://issues.apache.org/jira/browse/HBASE-8699
 Project: HBase
  Issue Type: Bug
  Components: wal
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: 8699-v1.txt


 Here is current code of FSHDFSUtils#isFileClosed():
 {code}
   boolean isFileClosed(final DistributedFileSystem dfs, final Path p) {
     try {
       Method m = dfs.getClass().getMethod("isFileClosed", new Class<?>[] {String.class});
       return (Boolean) m.invoke(dfs, p.toString());
 {code}
 We look for isFileClosed method with parameter type of String.
 However, from 
 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
  (branch-2):
 {code}
   public boolean isFileClosed(Path src) throws IOException {
 {code}
 The parameter type is of Path.
 This means we would get NoSuchMethodException.


[jira] [Commented] (HBASE-8699) Parameter to DistributedFileSystem#isFileClosed should be of type Path


[ 
https://issues.apache.org/jira/browse/HBASE-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680725#comment-13680725
 ] 

stack commented on HBASE-8699:
--

What Elliott said.

org.apache.hadoop.util.VersionInfo.getVersion() will give you hadoop version... 
(over in hadoop-one-compat, if 1.2, change test result?)

 Parameter to DistributedFileSystem#isFileClosed should be of type Path
 --

 Key: HBASE-8699
 URL: https://issues.apache.org/jira/browse/HBASE-8699
 Project: HBase
  Issue Type: Bug
  Components: wal
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: 8699-v1.txt


 Here is current code of FSHDFSUtils#isFileClosed():
 {code}
   boolean isFileClosed(final DistributedFileSystem dfs, final Path p) {
     try {
       Method m = dfs.getClass().getMethod("isFileClosed", new Class<?>[] {String.class});
       return (Boolean) m.invoke(dfs, p.toString());
 {code}
 We look for isFileClosed method with parameter type of String.
 However, from 
 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
  (branch-2):
 {code}
   public boolean isFileClosed(Path src) throws IOException {
 {code}
 The parameter type is of Path.
 This means we would get NoSuchMethodException.


[jira] [Commented] (HBASE-8732) Changing Encoding on Column Families errors out


[ 
https://issues.apache.org/jira/browse/HBASE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680722#comment-13680722
 ] 

Elliott Clark commented on HBASE-8732:
--

It seems like FastDiff is the culprit here.  If I change the test to not use 
fast diff then it passes.

 Changing Encoding on Column Families errors out
 ---

 Key: HBASE-8732
 URL: https://issues.apache.org/jira/browse/HBASE-8732
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.98.0, 0.95.1
Reporter: Elliott Clark
Priority: Critical
 Fix For: 0.95.2


 Getting an error when opening a scanner on a file that has no encoding.


[jira] [Commented] (HBASE-8706) Some improvement in snapshot


[ 
https://issues.apache.org/jira/browse/HBASE-8706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680728#comment-13680728
 ] 

stack commented on HBASE-8706:
--

Skimmed the patch.  lgtm.

 Some improvement in snapshot
 

 Key: HBASE-8706
 URL: https://issues.apache.org/jira/browse/HBASE-8706
 Project: HBase
  Issue Type: Bug
  Components: snapshots
Affects Versions: 0.94.8, 0.95.0
Reporter: binlijin
 Attachments: HBASE-8706-2.patch, HBASE-8706-3.patch, 
 HBASE-8706.patch, HBASE-8706-v4.patch


 (1) The timeout for Procedure cannot be configured.
 {code}
 Procedure's timeout
 ProcedureCoordinator
   final static long TIMEOUT_MILLIS_DEFAULT = 60000;
   createProcedure(ForeignExceptionDispatcher fed, String procName, byte[] procArgs,
       List<String> expectedMembers) {
     // build the procedure
     return new Procedure(this, fed, WAKE_MILLIS_DEFAULT, TIMEOUT_MILLIS_DEFAULT,
         procName, procArgs, expectedMembers);
   }
 RegionServerSnapshotManager:
   /** Conf key for max time to keep threads in snapshot request pool waiting */
   public static final String SNAPSHOT_TIMEOUT_MILLIS_KEY = "hbase.snapshot.region.timeout";
   /** Keep threads alive in request pool for max of 60 seconds */
   public static final long SNAPSHOT_TIMEOUT_MILLIS_DEFAULT = 60000;
   public Subprocedure buildSubprocedure(SnapshotDescription snapshot) {
     long timeoutMillis = conf.getLong(SNAPSHOT_TIMEOUT_MILLIS_KEY,
         SNAPSHOT_TIMEOUT_MILLIS_DEFAULT);
     case FLUSH:
       SnapshotSubprocedurePool taskManager =
         new SnapshotSubprocedurePool(rss.getServerName().toString(), conf);
   }
 {code}
 (2) TakeSnapshotHandler: after snapshotRegions we should call 
 monitor.rethrowException() to check whether an exception occurred; if one 
 did, we can skip verifySnapshot.
 (3) Too many error messages are emitted when an error happens in some places.
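
 For point (1), a configurable timeout could look roughly like the sketch
 below. It is only an illustration: java.util.Properties stands in for the
 HBase Configuration object, and the key name
 "example.procedure.timeout.millis" is hypothetical. The 60-second default
 mirrors the "max of 60 seconds" comment in the excerpt above.

```java
import java.util.Properties;

final class ProcedureTimeouts {
  // Hypothetical key name, chosen only for this example.
  static final String TIMEOUT_KEY = "example.procedure.timeout.millis";
  // 60 seconds, matching the comment in the excerpt.
  static final long TIMEOUT_MILLIS_DEFAULT = 60000L;

  /** Reads the timeout from conf, falling back to the default when unset. */
  static long timeoutMillis(Properties conf) {
    String raw = conf.getProperty(TIMEOUT_KEY);
    return raw == null ? TIMEOUT_MILLIS_DEFAULT : Long.parseLong(raw.trim());
  }
}
```

 The value returned here would then be passed where TIMEOUT_MILLIS_DEFAULT is
 passed today.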


[jira] [Commented] (HBASE-8665) bad compaction priority behavior in queue can cause store to be blocked

[
https://issues.apache.org/jira/browse/HBASE-8665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680736#comment-13680736
]

Sergey Shelukhin commented on HBASE-8665:
-

[~saint@gmail.com] ping?

bad compaction priority behavior in queue can cause store to be blocked
---

Key: HBASE-8665
URL: https://issues.apache.org/jira/browse/HBASE-8665
Project: HBase
Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Attachments: HBASE-8665-v0.patch

Note that this can be solved by bumping up the number of compaction threads
but still it seems like this priority inversion should be dealt with.
There's a store with 1 big file and 3 flushes (1 2 3 4) sitting around and
minding its own business when it decides to compact. Compaction (2 3 4) is
created and put in queue, it's low priority, so it doesn't get out of the
queue for some time - other stores are compacting. Meanwhile more files are
flushed and at (1 2 3 4 5 6 7) it decides to compact (5 6 7). This compaction
now has higher priority than the first one. After that if the load is high it
enters vicious cycle of compacting and compacting files as they arrive, with
store being blocked on and off, with the (2 3 4) compaction staying in queue
for up to ~20 minutes (that I've seen).
I wonder why we do this thing where we queue compaction and compact
separately. Perhaps we should take snapshot of all store priorities, then do
select in order and execute the first compaction we find. This will need
starvation safeguard too but should probably be better.
Btw, exploring compaction policy may be more prone to this, as it can select
files from the middle, not just beginning, which, given the treatment of
already selected files that was not changed from the old ratio-based one (all
files with lower seqNums than the ones selected are also ineligible for
further selection), will make more files ineligible (e.g. imagine with 10
blocking files, with 8 present (1-8), (6 7 8) being selected and getting
stuck). Today I see the case that would also apply to old policy, but
yesterday I saw file distribution something like this: 4,5g, 2,1g, 295,9m,
113,3m, 68,0m, 67,8m, 1,1g, 295,1m, 100,4m, unfortunately w/o enough logs to
figure out how it resulted.
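
One possible starvation safeguard is to let a queued request's priority improve
with the time it has waited. The sketch below is only an illustration of that
idea, not what the attached patch does: AgingCompactionQueue, Request and
msPerStep are hypothetical names, and "lower number = more urgent" is an
assumed convention.

```java
import java.util.List;

final class AgingCompactionQueue {
  static final class Request {
    final String name;
    final int priority;       // lower number = more urgent (assumed convention)
    final long enqueuedAtMs;
    Request(String name, int priority, long enqueuedAtMs) {
      this.name = name;
      this.priority = priority;
      this.enqueuedAtMs = enqueuedAtMs;
    }
  }

  /** Effective priority improves by one step per msPerStep spent waiting. */
  static long effectivePriority(Request r, long nowMs, long msPerStep) {
    return r.priority - (nowMs - r.enqueuedAtMs) / msPerStep;
  }

  /** Returns the request with the lowest effective priority, or null if empty. */
  static Request pick(List<Request> queue, long nowMs, long msPerStep) {
    Request best = null;
    for (Request r : queue) {
      if (best == null
          || effectivePriority(r, nowMs, msPerStep) < effectivePriority(best, nowMs, msPerStep)) {
        best = r;
      }
    }
    return best;
  }
}
```

With aging, a (2 3 4) request that has waited long enough is picked ahead of a
freshly queued (5 6 7) request, even though the latter started out more urgent.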

[jira] [Commented] (HBASE-8667) Master and Regionserver not able to communicate if both bound to different network interfaces on the same machine.


[ 
https://issues.apache.org/jira/browse/HBASE-8667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680733#comment-13680733
 ] 

stack commented on HBASE-8667:
--

bq. Then we need to initialize rpc server in RS with the hostname received from 
master after checkin right? Otherwise we will have this issue.

The regionserver just takes the name and uses it in subsequent communication w/ 
the master -- it does not change where it is bound based on the name the master 
gave it.

Are you suggesting that regionserver only set up an rpcserver after it has 
gotten name from master?  What if this disagrees w/ what the operator told us 
to use in the configuration?

Isn't what we have here a setup problem; we have regionserver on localhost and 
master on an ip?  Can you have regionserver bind to same ip?


 Master and Regionserver not able to communicate if both bound to different 
 network interfaces on the same machine.
 --

 Key: HBASE-8667
 URL: https://issues.apache.org/jira/browse/HBASE-8667
 Project: HBase
  Issue Type: Bug
  Components: IPC/RPC
Reporter: rajeshbabu
 Fix For: 0.98.0, 0.95.2, 0.94.9

 Attachments: HBASE-8667_Trunk.patch, HBASE-8667_Trunk-V2.patch


 While testing HBASE-8640 fix found that master and regionserver running on 
 different interfaces are not communicating properly.
 I have two interfaces 1) lo 2) eth0 in my machine and default hostname 
 interface is lo.
 I have configured master ipc address to ip of eth0 interface.
 Started master and regionserver on the same machine.
 1) master rpc server bound to eth0 and RS rpc server bound to lo
 2) Since the rpc client is not binding to any ip address, when the RS reports 
 startup it gets registered with the eth0 ip address (but it should actually 
 register as localhost)
 Here are RS logs:
 {code}
 2013-05-31 06:05:28,608 WARN  [regionserver60020] 
 org.apache.hadoop.hbase.regionserver.HRegionServer: reportForDuty failed; 
 sleeping and then retrying.
 2013-05-31 06:05:31,609 INFO  [regionserver60020] 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Attempting connect to 
 Master server at 192.168.0.100,6,1369960497008
 2013-05-31 06:05:31,609 INFO  [regionserver60020] 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Telling master at 
 192.168.0.100,6,1369960497008 that we are up with port=60020, 
 startcode=1369960502544
 2013-05-31 06:05:31,618 DEBUG [regionserver60020] 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Config from master: 
 hbase.rootdir=hdfs://localhost:2851/hbase
 2013-05-31 06:05:31,618 DEBUG [regionserver60020] 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Config from master: 
 fs.default.name=hdfs://localhost:2851
 2013-05-31 06:05:31,618 INFO  [regionserver60020] 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Master passed us a 
 different hostname to use; was=localhost, but now=192.168.0.100
 {code}
 Here are master logs:
 {code}
 2013-05-31 06:05:31,615 INFO  [IPC Server handler 9 on 6] 
 org.apache.hadoop.hbase.master.ServerManager: Registering 
 server=192.168.0.100,60020,1369960502544
 {code}
 Since master has wrong rpc server address of RS, META is not getting assigned.
 {code}
 2013-05-31 06:05:34,362 DEBUG [master-192.168.0.100,6,1369960497008] 
 org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan 
 was found (or we are ignoring an existing plan) for .META.,,1.1028785192 so 
 generated a random one; hri=.META.,,1.1028785192, src=, 
 dest=192.168.0.100,60020,1369960502544; 1 (online=1, available=1) available 
 servers, forceNewPlan=false
 -
 org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of 
 .META.,,1.1028785192 to 192.168.0.100,60020,1369960502544, trying to assign 
 elsewhere instead; try=1 of 10
 java.net.ConnectException: Connection refused
   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
   at 
 sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
   at 
 org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:511)
   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:481)
   at 
 org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupConnection(RpcClient.java:549)
   at 
 org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:813)
   at 
 org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1422)
   at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1315)
   at 
 org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1532)
   at

[jira] [Commented] (HBASE-8699) Parameter to DistributedFileSystem#isFileClosed should be of type Path


[ 
https://issues.apache.org/jira/browse/HBASE-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680739#comment-13680739
 ] 

Ted Yu commented on HBASE-8699:
---

See if I understand correctly. We utilize this method:
{code}
  public static String getVersion() {
{code}
and check the return String for certain releases we know 
DistributedFileSystem#isFileClosed(Path ) is present.
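
A version check along those lines might be sketched as follows. This is an
assumption-laden illustration: HadoopVersionCheck is a hypothetical helper, the
string it parses would come from org.apache.hadoop.util.VersionInfo.getVersion(),
and the 1.2 threshold is taken from the comment above that hadoop 1.2.0
contains the method. Whether every later release has it is not verified here.

```java
final class HadoopVersionCheck {
  /**
   * Returns true if the version string is at least major.minor.
   * Non-numeric suffixes such as "-SNAPSHOT" or "-alpha" are ignored.
   */
  static boolean atLeast(String version, int major, int minor) {
    String[] parts = version.split("\\.");
    int maj = leadingInt(parts[0]);
    int min = parts.length > 1 ? leadingInt(parts[1]) : 0;
    return maj > major || (maj == major && min >= minor);
  }

  /** Parses the leading run of digits in s, or 0 if there is none. */
  private static int leadingInt(String s) {
    int end = 0;
    while (end < s.length() && Character.isDigit(s.charAt(end))) {
      end++;
    }
    return end == 0 ? 0 : Integer.parseInt(s.substring(0, end));
  }
}
```

For example, HadoopVersionCheck.atLeast(version, 1, 2) could gate the
reflective call.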

 Parameter to DistributedFileSystem#isFileClosed should be of type Path
 --

 Key: HBASE-8699
 URL: https://issues.apache.org/jira/browse/HBASE-8699
 Project: HBase
  Issue Type: Bug
  Components: wal
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: 8699-v1.txt


 Here is current code of FSHDFSUtils#isFileClosed():
 {code}
   boolean isFileClosed(final DistributedFileSystem dfs, final Path p) {
     try {
       Method m = dfs.getClass().getMethod("isFileClosed", new Class<?>[] {String.class});
       return (Boolean) m.invoke(dfs, p.toString());
 {code}
 We look for isFileClosed method with parameter type of String.
 However, from 
 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
  (branch-2):
 {code}
   public boolean isFileClosed(Path src) throws IOException {
 {code}
 The parameter type is of Path.
 This means we would get NoSuchMethodException.


[jira] [Commented] (HBASE-8729) distributedLogReplay may hang during chained region server failure


[ 
https://issues.apache.org/jira/browse/HBASE-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680741#comment-13680741
 ] 

Jeffrey Zhong commented on HBASE-8729:
--

[~saint@gmail.com] Thanks for the good comments!
I'll address your first two comments in the next patch(Ted addressed the second 
one already in the v2 patch). The interesting point is your last comment:
{quote}
Rather than a log replay handler, should we instead have M_SERVER_SHUTDOWN be 
its own type... and then make N executor slots for server shutdown handling 
rather than for log replay? Would then make the exit of server shutdown 
handler nicer in that when we leave it, we have processed the server rather 
than as we have in this patch where we go off to another executor for 
completion?
{quote}
If we don't introduce the new log replay handler, setting N is tricky: its 
value has to be big enough that we don't run into the issue described in this 
JIRA. The other alternative (not clean, and error prone) is to use one pool 
while limiting logReplay to at most MaxThreads - 3 slots so that it cannot 
block all threads in the pool. What do you think? Thanks.




 distributedLogReplay may hang during chained region server failure
 --

 Key: HBASE-8729
 URL: https://issues.apache.org/jira/browse/HBASE-8729
 Project: HBase
  Issue Type: Bug
  Components: MTTR
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 0.98.0, 0.95.2

 Attachments: 8729-v2.patch, hbase-8729.patch


 In a test, half cluster(in terms of region servers) was down and some log 
 replay had incurred chained RS failures(receiving RS of a log replay failed 
 again). 
 By default, we only allow 3 concurrent SSH handlers (controlled by 
 {code}this.executorService.startExecutorService(ExecutorType.MASTER_SERVER_OPERATIONS,conf.getInt("hbase.master.executor.serverops.threads",
  3));{code}).
 If all 3 SSH handlers are doing logReplay(blocking call) and one of receiving 
 RS fails again then logReplay will hang because regions of the newly failed 
 RS can't be re-assigned to another live RS(no ssh handler will be processed 
 due to max threads setting) and existing log replay will keep routing replay 
 traffic to the dead RS.
 The fix is to submit logReplay work into a separate type of executor queue in 
 order not to block SSH region assignment so that logReplay can route traffic 
 to a live RS after retries and move forward. 
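
 The hang and the proposed fix can be illustrated with plain
 java.util.concurrent executors. The sketch below is a toy model, not HBase
 code: SeparatePoolsDemo is a hypothetical name, the blocking "replay" tasks
 simply wait on a latch, and the "reassignment" task releases it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class SeparatePoolsDemo {
  /**
   * Fills the replay pool with blocking tasks that wait for a reassignment
   * task. Returns true if all replays finish within the time limit.
   */
  static boolean replaysFinish(int replayThreads, boolean separateAssignPool) {
    ExecutorService replayPool = Executors.newFixedThreadPool(replayThreads);
    ExecutorService assignPool =
        separateAssignPool ? Executors.newSingleThreadExecutor() : replayPool;
    CountDownLatch assigned = new CountDownLatch(1);
    CountDownLatch done = new CountDownLatch(replayThreads);
    try {
      for (int i = 0; i < replayThreads; i++) {
        replayPool.execute(() -> {
          try {
            assigned.await();   // blocking "replay" waits for reassignment
            done.countDown();
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }
      // With a shared pool this task sits in the queue behind the blocked replays.
      assignPool.execute(assigned::countDown);
      return done.await(500, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      replayPool.shutdownNow();
      assignPool.shutdownNow();
    }
  }
}
```

 With a shared pool of 3 threads the reassignment never runs and the replays
 time out; with a separate pool they complete.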


[jira] [Commented] (HBASE-8344) Improve the assignment when node failures happen to choose the secondary RS as the new primary RS


[ 
https://issues.apache.org/jira/browse/HBASE-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680745#comment-13680745
 ] 

Enis Soztutar commented on HBASE-8344:
--

Looks good to go. 

 Improve the assignment when node failures happen to choose the secondary RS 
 as the new primary RS
 -

 Key: HBASE-8344
 URL: https://issues.apache.org/jira/browse/HBASE-8344
 Project: HBase
  Issue Type: Sub-task
Reporter: Devaraj Das
Assignee: Devaraj Das
Priority: Critical
 Fix For: 0.95.2

 Attachments: hbase-8344-1.txt, hbase-8344-2.1.txt, 
 hbase-8344-2.2.txt, hbase-8344-2.3.txt, hbase-8344-2.4.txt, 
 hbase-8344-2.5.txt, hbase-8344-2.6.txt, hbase-8344-2.6.txt, 
 hbase-8344-2.7.txt, hbase-8344-2.7.txt





[jira] [Commented] (HBASE-8721) fix for bug that delete can mask puts that happened after the delete was entered

[
https://issues.apache.org/jira/browse/HBASE-8721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680744#comment-13680744
]

Sergey Shelukhin commented on HBASE-8721:
-

bq. But I think the inconsistency issue's root cause is the arguable behaviour
that delete can mask puts that happened after the delete. A more intuitive and
more reasonable behaviour is that a delete can only mask puts happened before
it, and has no impact on puts happened after it.

This would be inconsistent with puts happening after puts being masked by
earlier puts, depending on timestamp; as in my example above. Timestamp's
express purpose is the version, by default if you don't set it, it will be
taken from server time. If you are setting explicit timestamps, you are
explicitly telling HBase that it should withhold judgement about versions
because you know what happens logically before and after in your system. If you
are using timestamp otherwise for some convenience, you are misusing it.
If this version semantic is removed, timestamp becomes simply a long tucked
onto a KeyValue and should be removed; after all, we don't have a string or a
boolean also added to KeyValue so that people could use them for their own
purposes. HBase already has columns and column families to do that. Timestamp
has very explicit semantics and purpose right now. If you want time-based
behavior then don't set timestamps and HBase will use time-based behavior.
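
The version semantics can be shown with a toy model of a single cell. This is a
simplified sketch, not HBase code: VersionedCell is a hypothetical class, and it
ignores memstore, flushes and compactions. Visibility depends only on
timestamps, never on arrival order.

```java
import java.util.ArrayList;
import java.util.List;

final class VersionedCell {
  private static final class Put {
    final long ts;
    final String value;
    Put(long ts, String value) { this.ts = ts; this.value = value; }
  }

  private final List<Put> puts = new ArrayList<>();
  private long deleteTs = Long.MIN_VALUE; // newest delete marker seen, if any

  void put(long ts, String value) { puts.add(new Put(ts, value)); }

  void delete(long ts) { deleteTs = Math.max(deleteTs, ts); }

  /** Newest put by timestamp that the delete marker does not cover, or null. */
  String get() {
    Put best = null;
    for (Put p : puts) {
      if (p.ts <= deleteTs) {
        continue; // masked by the delete, regardless of arrival order
      }
      if (best == null || p.ts > best.ts) {
        best = p;
      }
    }
    return best == null ? null : best.value;
  }
}
```

A put with timestamp 15 that arrives after a delete with timestamp 20 stays
masked, just as a put with timestamp 5 that arrives after a put with timestamp
30 stays hidden behind it.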

fix for bug that delete can mask puts that happened after the delete was
entered

Key: HBASE-8721
URL: https://issues.apache.org/jira/browse/HBASE-8721
Project: HBase
Issue Type: Bug
Components: regionserver
Reporter: Feng Honghua
Attachments: HBASE-8721-0.94-V0.patch

[jira] [Commented] (HBASE-3787) Increment is non-idempotent but client retries RPC


[ 
https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680751#comment-13680751
 ] 

Sergey Shelukhin commented on HBASE-3787:
-

refer to the attached patch ;) I can remove the WAL part

 Increment is non-idempotent but client retries RPC
 --

 Key: HBASE-3787
 URL: https://issues.apache.org/jira/browse/HBASE-3787
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.94.4, 0.95.2
Reporter: dhruba borthakur
Assignee: Sergey Shelukhin
Priority: Critical
 Fix For: 0.95.1

 Attachments: HBASE-3787-partial.patch, HBASE-3787-v0.patch, 
 HBASE-3787-v1.patch, HBASE-3787-v2.patch, HBASE-3787-v3.patch, 
 HBASE-3787-v4.patch, HBASE-3787-v5.patch, HBASE-3787-v5.patch


 The HTable.increment() operation is non-idempotent. The client retries the 
 increment RPC a few times (as specified by configuration) before throwing an 
 error to the application. This makes it possible that the same increment call 
 be applied twice at the server.
 For increment operations, is it better to use 
 HConnectionManager.getRegionServerWithoutRetries()? Another  option would be 
 to enhance the IPC module to make the RPC server correctly identify if the 
 RPC is a retry attempt and handle accordingly.


[jira] [Commented] (HBASE-8721) fix for bug that delete can mask puts that happened after the delete was entered

[
https://issues.apache.org/jira/browse/HBASE-8721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680750#comment-13680750
]

Sergey Shelukhin commented on HBASE-8721:
-

(columns, or part of rowkey, as the case seems to be from your description)

fix for bug that delete can mask puts that happened after the delete was
entered

Key: HBASE-8721
URL: https://issues.apache.org/jira/browse/HBASE-8721
Project: HBase
Issue Type: Bug
Components: regionserver
Reporter: Feng Honghua
Attachments: HBASE-8721-0.94-V0.patch

[jira] [Commented] (HBASE-8700) IntegrationTestBigLinkedList can fail due to random number collision


[ 
https://issues.apache.org/jira/browse/HBASE-8700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680759#comment-13680759
 ] 

Enis Soztutar commented on HBASE-8700:
--

Can we make the # command line args change backwards compatible? 
I also wanted to pre-split the table at creation to reduce the runtime. It 
becomes a little bit easier with this change. Should we do a follow up? 

 IntegrationTestBigLinkedList can fail due to random number collision
 

 Key: HBASE-8700
 URL: https://issues.apache.org/jira/browse/HBASE-8700
 Project: HBase
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBASE-8700-v0.patch, HBASE-8700-v1.patch


 The test can fail due to random number collision, claiming there are 
 unreferenced elements for obvious reasons (we rewrite some link). Original 
 Accumulo test has one-stage generation so it doesn't count unreferenced 
 elements as failures, only undefined ones. With 200m longs out of half-long 
 range the probability of collision is approx 0.2%.
 Moreover, without some way to debug, it's hard to debug what keys should be 
 looked at in such cases
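
The 0.2% figure can be checked with the usual birthday-bound approximation, p ~= 1 - exp(-n(n-1)/(2N)). The sketch below assumes that "half-long range" means the 2^63 non-negative long values; the class and method names are illustrative only.

```java
/** Birthday-bound estimate of at least one collision among n random keys. */
final class CollisionEstimate {
    /** Approximates p = 1 - exp(-n(n-1) / (2N)) for n keys drawn from N values. */
    static double collisionProbability(double n, double range) {
        // expm1 keeps precision when the exponent is close to zero.
        return -Math.expm1(-n * (n - 1) / (2.0 * range));
    }

    public static void main(String[] args) {
        double n = 200_000_000d;        // 200m generated longs
        double range = Math.pow(2, 63); // assumption: non-negative half of the long range
        System.out.printf("%.4f%%%n", 100 * collisionProbability(n, range));
    }
}
```

With these inputs the estimate comes out at roughly 0.2%, in line with the description.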


[jira] [Commented] (HBASE-8344) Improve the assignment when node failures happen to choose the secondary RS as the new primary RS

2013-06-11 Thread Nick Dimiduk (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680763#comment-13680763
 ] 

Nick Dimiduk commented on HBASE-8344:
-

+1

 Improve the assignment when node failures happen to choose the secondary RS 
 as the new primary RS
 -

 Key: HBASE-8344
 URL: https://issues.apache.org/jira/browse/HBASE-8344
 Project: HBase
  Issue Type: Sub-task
Reporter: Devaraj Das
Assignee: Devaraj Das
Priority: Critical
 Fix For: 0.95.2

 Attachments: hbase-8344-1.txt, hbase-8344-2.1.txt, 
 hbase-8344-2.2.txt, hbase-8344-2.3.txt, hbase-8344-2.4.txt, 
 hbase-8344-2.5.txt, hbase-8344-2.6.txt, hbase-8344-2.6.txt, 
 hbase-8344-2.7.txt, hbase-8344-2.7.txt





[jira] [Updated] (HBASE-4811) Support reverse Scan


 [ 
https://issues.apache.org/jira/browse/HBASE-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4811:
--

Attachment: 4811-trunk-v10.txt

 Support reverse Scan
 

 Key: HBASE-4811
 URL: https://issues.apache.org/jira/browse/HBASE-4811
 Project: HBase
  Issue Type: Improvement
  Components: Client
Affects Versions: 0.20.6, 0.94.7
Reporter: John Carrino
Assignee: Liang Xie
 Attachments: 4811-trunk-v10.txt, 4811-trunk-v5.patch, 
 HBase-4811-0.94.3modified.txt, HBase-4811-0.94-v2.txt, 
 hbase-4811-trunkv1.patch, hbase-4811-trunkv4.patch, hbase-4811-trunkv6.patch, 
 hbase-4811-trunkv7.patch, hbase-4811-trunkv8.patch, hbase-4811-trunkv9.patch


 All the documentation I find about HBase says that if you want forward and 
 reverse scans you should just build 2 tables and one be ascending and one 
 descending.  Is there a fundamental reason that HBase only supports forward 
 Scan?  It seems like a lot of extra space overhead and coding overhead (to 
 keep them in sync) to support 2 tables.  
 I am assuming this has been discussed before, but I can't find the 
 discussions anywhere about it or why it would be infeasible.


[jira] [Resolved] (HBASE-8724) [0.94] ExportSnapshot should not use hbase.tmp.dir as a staging dir on hdfs


 [ 
https://issues.apache.org/jira/browse/HBASE-8724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar resolved HBASE-8724.
--

  Resolution: Fixed
Hadoop Flags: Reviewed

Thanks for the reviews. I've committed this to 0.94. 

 [0.94] ExportSnapshot should not use hbase.tmp.dir as a staging dir on hdfs
 ---

 Key: HBASE-8724
 URL: https://issues.apache.org/jira/browse/HBASE-8724
 Project: HBase
  Issue Type: Bug
  Components: mapreduce, snapshots
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.94.9

 Attachments: hbase-8724_v1.patch


 On 0.94, ExportSnapshot uses hbase.tmp.dir as the job's staging directory 
 on hdfs. However, hbase.tmp.dir is by definition a local directory, thus 
 should not be used as an hdfs directory for the job. 
 Trunk uses JobUtil.getStagingDir() which gets the staging dir from 
 JobSubmissionFiles class in Hadoop, so trunk is fine. 
 We've discovered this since it fails the test on windows, but this is not 
 windows-specific as per above (like specifying hbase.tmp.dir as 
 /var/hbase/tmp/ etc)


[jira] [Updated] (HBASE-8344) Improve the assignment when node failures happen to choose the secondary RS as the new primary RS

2013-06-11 Thread Devaraj Das (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated HBASE-8344:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed. Thanks for the reviews, folks.

 Improve the assignment when node failures happen to choose the secondary RS 
 as the new primary RS
 -

 Key: HBASE-8344
 URL: https://issues.apache.org/jira/browse/HBASE-8344
 Project: HBase
  Issue Type: Sub-task
Reporter: Devaraj Das
Assignee: Devaraj Das
Priority: Critical
 Fix For: 0.95.2

 Attachments: hbase-8344-1.txt, hbase-8344-2.1.txt, 
 hbase-8344-2.2.txt, hbase-8344-2.3.txt, hbase-8344-2.4.txt, 
 hbase-8344-2.5.txt, hbase-8344-2.6.txt, hbase-8344-2.6.txt, 
 hbase-8344-2.7.txt, hbase-8344-2.7.txt





[jira] [Commented] (HBASE-8696) Fixup for logs that show when running hbase-it tests.


[ 
https://issues.apache.org/jira/browse/HBASE-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680776#comment-13680776
 ] 

Sergey Shelukhin commented on HBASE-8696:
-

+1

 Fixup for logs that show when running hbase-it tests.
 -

 Key: HBASE-8696
 URL: https://issues.apache.org/jira/browse/HBASE-8696
 Project: HBase
  Issue Type: Improvement
Reporter: stack
Assignee: stack
 Fix For: 0.95.1

 Attachments: 8696v2.txt, 8696v3.txt, 8696v4.txt, 8698.txt


 I've been staring at logs trying to figure why hbase-it tests fail.
 Here are some more log cleanups that come of my frustration trying to read 
 our emissions.


[jira] [Commented] (HBASE-8700) IntegrationTestBigLinkedList can fail due to random number collision


[ 
https://issues.apache.org/jira/browse/HBASE-8700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680796#comment-13680796
 ] 

Enis Soztutar commented on HBASE-8700:
--

After an offline discussion with Sergey, it seems that this is already backwards 
compatible with regard to the command line args. +1 on commit. 

 IntegrationTestBigLinkedList can fail due to random number collision
 

 Key: HBASE-8700
 URL: https://issues.apache.org/jira/browse/HBASE-8700
 Project: HBase
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBASE-8700-v0.patch, HBASE-8700-v1.patch


 The test can fail due to random number collision, claiming there are 
 unreferenced elements for obvious reasons (we rewrite some link). Original 
 Accumulo test has one-stage generation so it doesn't count unreferenced 
 elements as failures, only undefined ones. With 200m longs out of half-long 
 range the probability of collision is approx 0.2%.
 Moreover, without some way to debug, it's hard to debug what keys should be 
 looked at in such cases


[jira] [Commented] (HBASE-8702) Make WALEditCodec pluggable

2013-06-11 Thread Jesse Yates (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-8702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680799#comment-13680799
 ] 

Jesse Yates commented on HBASE-8702:


Thanks Sergey! I'm planning on committing to trunk tomorrow, unless there are 
objections.

 Make WALEditCodec pluggable
 ---

 Key: HBASE-8702
 URL: https://issues.apache.org/jira/browse/HBASE-8702
 Project: HBase
  Issue Type: Improvement
  Components: Replication, wal
Reporter: Jesse Yates
Assignee: Jesse Yates
 Fix For: 0.98.0, 0.95.2, 0.94.9

 Attachments: hbase-8702-0.94-v0.patch, hbase-8702-trunk-v0.patch, 
 hbase-8702-trunk-v1.patch


 WALEditCodec needs to be pluggable to support alternative serialization 
 mechanisms. 
 The open question here is whether to support the alternative codec when doing 
 replication - both clusters would need the codec on the classpath, which has 
 additional overhead and also will be a little bit complicated when making the 
 WAL serialization backwards compatible in 0.94. 
 This is the follow-up to HBASE-8636.


[jira] [Commented] (HBASE-8700) IntegrationTestBigLinkedList can fail due to random number collision


[ 
https://issues.apache.org/jira/browse/HBASE-8700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680803#comment-13680803
 ] 

Sergey Shelukhin commented on HBASE-8700:
-

latter - maybe
former - they are

 IntegrationTestBigLinkedList can fail due to random number collision
 

 Key: HBASE-8700
 URL: https://issues.apache.org/jira/browse/HBASE-8700
 Project: HBase
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBASE-8700-v0.patch, HBASE-8700-v1.patch


 The test can fail due to random number collision, claiming there are 
 unreferenced elements for obvious reasons (we rewrite some link). Original 
 Accumulo test has one-stage generation so it doesn't count unreferenced 
 elements as failures, only undefined ones. With 200m longs out of half-long 
 range the probability of collision is approx 0.2%.
 Moreover, without some way to debug, it's hard to debug what keys should be 
 looked at in such cases


[jira] [Commented] (HBASE-8617) Introducing a new config to disable writes during recovering


[ 
https://issues.apache.org/jira/browse/HBASE-8617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680808#comment-13680808
 ] 

Jeffrey Zhong commented on HBASE-8617:
--

[~ted_yu] are you good on v2 patch? Thanks.

 Introducing a new config to disable writes during recovering 
 -

 Key: HBASE-8617
 URL: https://issues.apache.org/jira/browse/HBASE-8617
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.98.0, 0.95.1
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Attachments: HBASE-8617.patch, HBASE-8617-v2.patch


 In distributedLogReplay (HBASE-7006), we allow writes even while a region is 
 recovering. This may cause undesired behavior when applications (or deployments) 
 are already near their write capacity, because distributedLogReplay generates 
 more write traffic to the remaining region servers.
 The new config hbase.regionserver.disallow.writes.when.recovering tries to 
 address this situation so that recovery won't be affected by 
 normal application write traffic.
 The default value of this config is false (meaning writes are allowed during recovery).
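
The intended behavior of the flag can be pictured as a simple gate in front of the write path. This is a hedged sketch, not the patch: the class RecoveringWriteGateSketch and its methods are hypothetical, and only the meaning of the flag (false allows writes during recovery) is taken from the description.

```java
/** Illustrative write gate; names are hypothetical and not taken from the patch. */
final class RecoveringWriteGateSketch {
    // Mirrors hbase.regionserver.disallow.writes.when.recovering (default false).
    private final boolean disallowWritesWhenRecovering;
    private volatile boolean recovering;

    RecoveringWriteGateSketch(boolean disallowWritesWhenRecovering) {
        this.disallowWritesWhenRecovering = disallowWritesWhenRecovering;
    }

    void setRecovering(boolean recovering) {
        this.recovering = recovering;
    }

    /** Writes are rejected only when the region is recovering and the flag is set. */
    boolean acceptsWrite() {
        return !(recovering && disallowWritesWhenRecovering);
    }
}
```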


[jira] [Commented] (HBASE-8617) Introducing a new config to disable writes during recovering


[ 
https://issues.apache.org/jira/browse/HBASE-8617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680811#comment-13680811
 ] 

Ted Yu commented on HBASE-8617:
---

+1

 Introducing a new config to disable writes during recovering 
 -

 Key: HBASE-8617
 URL: https://issues.apache.org/jira/browse/HBASE-8617
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.98.0, 0.95.1
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Attachments: HBASE-8617.patch, HBASE-8617-v2.patch


 In distributedLogReplay (HBASE-7006), we allow writes even while a region is 
 recovering. This may cause undesired behavior when applications (or deployments) 
 are already near their write capacity, because distributedLogReplay generates 
 more write traffic to the remaining region servers.
 The new config hbase.regionserver.disallow.writes.when.recovering tries to 
 address this situation so that recovery won't be affected by 
 normal application write traffic.
 The default value of this config is false (meaning writes are allowed during recovery).


[jira] [Updated] (HBASE-8652) Number of compacting KVs is not reset at the end of compaction


 [ 
https://issues.apache.org/jira/browse/HBASE-8652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-8652:
--

Component/s: Compaction

 Number of compacting KVs is not reset at the end of compaction
 --

 Key: HBASE-8652
 URL: https://issues.apache.org/jira/browse/HBASE-8652
 Project: HBase
  Issue Type: Bug
  Components: Compaction
Reporter: Ted Yu
Priority: Minor

 Looking at master:60010/master-status#compactStas , I noticed that 'Num. 
 Compacting KVs' column stays unchanged at non-zero value(s).
 In DefaultCompactor#compact(), we have this at the beginning:
 {code}
 this.progress = new CompactionProgress(fd.maxKeyCount);
 {code}
 But progress.totalCompactingKVs is not reset at the end of compact().
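
One way to picture a fix is to reset the counters in a finally block so the UI stops showing stale values once compact() returns. The sketch below is an assumption, not the committed change: the Progress class is a stand-in for CompactionProgress, and reset() is a hypothetical helper.

```java
/** Illustrative only: a stand-in for the compactor, not DefaultCompactor itself. */
final class CompactionProgressSketch {
    /** Stand-in for CompactionProgress; reset() is a hypothetical helper. */
    static final class Progress {
        long totalCompactingKVs;
        long currentCompactedKVs;

        Progress(long totalCompactingKVs) {
            this.totalCompactingKVs = totalCompactingKVs;
        }

        void reset() {
            totalCompactingKVs = 0;
            currentCompactedKVs = 0;
        }
    }

    Progress progress = new Progress(0);

    void compact(long maxKeyCount) {
        progress = new Progress(maxKeyCount);
        try {
            for (long i = 0; i < maxKeyCount; i++) {
                progress.currentCompactedKVs++; // pretend to rewrite one KV
            }
        } finally {
            progress.reset(); // clear the counters even if the compaction fails
        }
    }
}
```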


[jira] [Commented] (HBASE-8664) Small fix ups for memory size outputs in UI


[ 
https://issues.apache.org/jira/browse/HBASE-8664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680832#comment-13680832
 ] 

Hudson commented on HBASE-8664:
---

Integrated in hbase-0.95-on-hadoop2 #129 (See 
[https://builds.apache.org/job/hbase-0.95-on-hadoop2/129/])
HBASE-8664 Small fix ups for memory size outputs in UI (Revision 1491903)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/branches/0.95/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/master/RegionServerListTmpl.jamon
* 
/hbase/branches/0.95/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/regionserver/RegionListTmpl.jamon
* 
/hbase/branches/0.95/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/regionserver/ServerMetricsTmpl.jamon


 Small fix ups for memory size outputs in UI
 ---

 Key: HBASE-8664
 URL: https://issues.apache.org/jira/browse/HBASE-8664
 Project: HBase
  Issue Type: Bug
  Components: UI
Reporter: stack
Assignee: stack
 Fix For: 0.98.0, 0.95.1

 Attachments: ui.txt


 This issue goes in the 'polish' category.  On regionserver ui, we were 
 listing raw bytes for heap size, memstore size, etc.  I put in place 
 StringUtils.humanReadableInt (looked to see if bootstrap could do it for us 
 but doesn't seem so, not w/o plugin).  I then made all the megabytes and 
 kilobytes match StringUtils.humanReadableInt with its 'm' instead of 'MB' and 
 'k' instead of KB.  Removed a stray KB that was in the wrong place too.
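
For readers unfamiliar with the 'k'/'m' style, the following is a rough stand-in for that kind of formatting. It is not Hadoop's StringUtils.humanReadableInt; the class name, the one-decimal rounding and the 1024-based thresholds are assumptions made for illustration.

```java
import java.util.Locale;

/** Illustrative byte-count formatter using 'k', 'm' and 'g' suffixes. */
final class HumanReadableSizeSketch {
    static String format(long bytes) {
        long abs = Math.abs(bytes);
        if (abs < 1024L) {
            return Long.toString(bytes); // small values are printed as raw bytes
        }
        if (abs < 1024L * 1024) {
            return String.format(Locale.ROOT, "%.1fk", bytes / 1024.0);
        }
        if (abs < 1024L * 1024 * 1024) {
            return String.format(Locale.ROOT, "%.1fm", bytes / (1024.0 * 1024));
        }
        return String.format(Locale.ROOT, "%.1fg", bytes / (1024.0 * 1024 * 1024));
    }
}
```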


[jira] [Commented] (HBASE-8344) Improve the assignment when node failures happen to choose the secondary RS as the new primary RS


[ 
https://issues.apache.org/jira/browse/HBASE-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680831#comment-13680831
 ] 

Hudson commented on HBASE-8344:
---

Integrated in hbase-0.95-on-hadoop2 #129 (See 
[https://builds.apache.org/job/hbase-0.95-on-hadoop2/129/])
HBASE-8344. Improves the assignment when node failures happen to choose the 
secondary RS as the new primary RS (Revision 1491996)

 Result = FAILURE
ddas : 
Files : 
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/FavoredNodeAssignmentHelper.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/FavoredNodeLoadBalancer.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/FavoredNodes.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestRegionPlacement.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestFavoredNodeAssignmentHelper.java


 Improve the assignment when node failures happen to choose the secondary RS 
 as the new primary RS
 -

 Key: HBASE-8344
 URL: https://issues.apache.org/jira/browse/HBASE-8344
 Project: HBase
  Issue Type: Sub-task
Reporter: Devaraj Das
Assignee: Devaraj Das
Priority: Critical
 Fix For: 0.95.2

 Attachments: hbase-8344-1.txt, hbase-8344-2.1.txt, 
 hbase-8344-2.2.txt, hbase-8344-2.3.txt, hbase-8344-2.4.txt, 
 hbase-8344-2.5.txt, hbase-8344-2.6.txt, hbase-8344-2.6.txt, 
 hbase-8344-2.7.txt, hbase-8344-2.7.txt





[jira] [Commented] (HBASE-8724) [0.94] ExportSnapshot should not use hbase.tmp.dir as a staging dir on hdfs


[ 
https://issues.apache.org/jira/browse/HBASE-8724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680843#comment-13680843
 ] 

Hudson commented on HBASE-8724:
---

Integrated in HBase-0.94-security #164 (See 
[https://builds.apache.org/job/HBase-0.94-security/164/])
HBASE-8724 [0.94] ExportSnapshot should not use hbase.tmp.dir as a staging 
dir on hdfs (Revision 1491993)

 Result = SUCCESS
enis : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java


 [0.94] ExportSnapshot should not use hbase.tmp.dir as a staging dir on hdfs
 ---

 Key: HBASE-8724
 URL: https://issues.apache.org/jira/browse/HBASE-8724
 Project: HBase
  Issue Type: Bug
  Components: mapreduce, snapshots
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.94.9

 Attachments: hbase-8724_v1.patch


 On 0.94, ExportSnapshot uses hbase.tmp.dir as the job's staging directory 
 on hdfs. However, hbase.tmp.dir is by definition a local directory, thus 
 should not be used as an hdfs directory for the job. 
 Trunk uses JobUtil.getStagingDir() which gets the staging dir from 
 JobSubmissionFiles class in Hadoop, so trunk is fine. 
 We've discovered this since it fails the test on windows, but this is not 
 windows-specific as per above (like specifying hbase.tmp.dir as 
 /var/hbase/tmp/ etc)


[jira] [Commented] (HBASE-8724) [0.94] ExportSnapshot should not use hbase.tmp.dir as a staging dir on hdfs


[ 
https://issues.apache.org/jira/browse/HBASE-8724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680849#comment-13680849
 ] 

Hudson commented on HBASE-8724:
---

Integrated in HBase-0.94 #1010 (See 
[https://builds.apache.org/job/HBase-0.94/1010/])
HBASE-8724 [0.94] ExportSnapshot should not use hbase.tmp.dir as a staging 
dir on hdfs (Revision 1491993)

 Result = SUCCESS
enis : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java


 [0.94] ExportSnapshot should not use hbase.tmp.dir as a staging dir on hdfs
 ---

 Key: HBASE-8724
 URL: https://issues.apache.org/jira/browse/HBASE-8724
 Project: HBase
  Issue Type: Bug
  Components: mapreduce, snapshots
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.94.9

 Attachments: hbase-8724_v1.patch


 On 0.94, ExportSnapshot uses hbase.tmp.dir as the job's staging directory 
 on hdfs. However, hbase.tmp.dir is by definition a local directory, thus 
 should not be used as an hdfs directory for the job. 
 Trunk uses JobUtil.getStagingDir() which gets the staging dir from 
 JobSubmissionFiles class in Hadoop, so trunk is fine. 
 We've discovered this since it fails the test on windows, but this is not 
 windows-specific as per above (like specifying hbase.tmp.dir as 
 /var/hbase/tmp/ etc)


[jira] [Commented] (HBASE-8344) Improve the assignment when node failures happen to choose the secondary RS as the new primary RS


[ 
https://issues.apache.org/jira/browse/HBASE-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680856#comment-13680856
 ] 

Hudson commented on HBASE-8344:
---

Integrated in HBase-TRUNK #4174 (See 
[https://builds.apache.org/job/HBase-TRUNK/4174/])
HBASE-8344. Improves the assignment when node failures happen to choose the 
secondary RS as the new primary RS (Revision 1491994)

 Result = FAILURE
ddas : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/FavoredNodeAssignmentHelper.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/FavoredNodeLoadBalancer.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/FavoredNodes.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestRegionPlacement.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestFavoredNodeAssignmentHelper.java


 Improve the assignment when node failures happen to choose the secondary RS 
 as the new primary RS
 -

 Key: HBASE-8344
 URL: https://issues.apache.org/jira/browse/HBASE-8344
 Project: HBase
  Issue Type: Sub-task
Reporter: Devaraj Das
Assignee: Devaraj Das
Priority: Critical
 Fix For: 0.95.2

 Attachments: hbase-8344-1.txt, hbase-8344-2.1.txt, 
 hbase-8344-2.2.txt, hbase-8344-2.3.txt, hbase-8344-2.4.txt, 
 hbase-8344-2.5.txt, hbase-8344-2.6.txt, hbase-8344-2.6.txt, 
 hbase-8344-2.7.txt, hbase-8344-2.7.txt





[jira] [Commented] (HBASE-8344) Improve the assignment when node failures happen to choose the secondary RS as the new primary RS


[ 
https://issues.apache.org/jira/browse/HBASE-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680864#comment-13680864
 ] 

Hudson commented on HBASE-8344:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #564 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/564/])
HBASE-8344. Improves the assignment when node failures happen to choose the 
secondary RS as the new primary RS (Revision 1491994)

 Result = FAILURE
ddas : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/FavoredNodeAssignmentHelper.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/FavoredNodeLoadBalancer.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/FavoredNodes.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestRegionPlacement.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestFavoredNodeAssignmentHelper.java


 Improve the assignment when node failures happen to choose the secondary RS 
 as the new primary RS
 -

 Key: HBASE-8344
 URL: https://issues.apache.org/jira/browse/HBASE-8344
 Project: HBase
  Issue Type: Sub-task
Reporter: Devaraj Das
Assignee: Devaraj Das
Priority: Critical
 Fix For: 0.95.2

 Attachments: hbase-8344-1.txt, hbase-8344-2.1.txt, 
 hbase-8344-2.2.txt, hbase-8344-2.3.txt, hbase-8344-2.4.txt, 
 hbase-8344-2.5.txt, hbase-8344-2.6.txt, hbase-8344-2.6.txt, 
 hbase-8344-2.7.txt, hbase-8344-2.7.txt





[jira] [Commented] (HBASE-8664) Small fix ups for memory size outputs in UI


[ 
https://issues.apache.org/jira/browse/HBASE-8664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680865#comment-13680865
 ] 

Hudson commented on HBASE-8664:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #564 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/564/])
HBASE-8664 Small fix ups for memory size outputs in UI (Revision 1491902)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/master/RegionServerListTmpl.jamon
* 
/hbase/trunk/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/regionserver/RegionListTmpl.jamon
* 
/hbase/trunk/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/regionserver/ServerMetricsTmpl.jamon


 Small fix ups for memory size outputs in UI
 ---

 Key: HBASE-8664
 URL: https://issues.apache.org/jira/browse/HBASE-8664
 Project: HBase
  Issue Type: Bug
  Components: UI
Reporter: stack
Assignee: stack
 Fix For: 0.98.0, 0.95.1

 Attachments: ui.txt


 This issue goes in the 'polish' category.  On regionserver ui, we were 
 listing raw bytes for heap size, memstore size, etc.  I put in place 
 StringUtils.humanReadableInt (looked to see if bootstrap could do it for us 
 but doesn't seem so, not w/o plugin).  I then made all the megabytes and 
 kilobytes match StringUtils.humanReadableInt with its 'm' instead of 'MB' and 
 'k' instead of KB.  Removed a stray KB that was in the wrong place too.


[jira] [Commented] (HBASE-4811) Support reverse Scan

2013-06-11 Thread Lars Hofhansl (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680867#comment-13680867
 ] 

Lars Hofhansl commented on HBASE-4811:
--

v9/v10 is much nicer. Few comments:
* do we need NonReversedNonLazyKeyValueScanner? Could add unsupported 
implementations for these methods to NonLazyKeyValueScanner.

* Instead of leaking backwardSeek and seekToLastRow out of the Reversed* 
classes, should we have an initScan() (or maybe setup()) method on the scanners 
that does the right thing? I.e. a ReversedScanner would do the 
seekToLastRow/backwardSeek stuff, and a normal scanner would just seek.

* This:
{code}
+  @Override
+  public synchronized boolean reseek(KeyValue kv) throws IOException {
+    checkReseek();
+    return heap.backwardSeek(kv);
+  }
{code}
and this:
{code}
+  @Override
+  public boolean backwardSeek(KeyValue key) throws IOException {
+    checkReseek();
+    return this.heap.backwardSeek(key);
+  }
{code}
is weird. It should either scan backwards or not? If we do what I suggested in 
the previous point, we would not need this, I think.

That way only MemstoreScanner and StoreFileScanner would be special. And they 
have to be special, because they are opened ahead of time (well, at least 
StoreFileScanner is).
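
The initScan() suggestion could look roughly like the sketch below: each scanner positions itself, so callers never call backwardSeek or seekToLastRow directly. All names here (RowScanner, ForwardScanner, ReversedScanner, drain) are hypothetical and operate on an in-memory list; this is not the patch's code.

```java
import java.util.List;

/** Illustrative only: direction-specific setup hidden behind initScan(). */
final class InitScanSketch {
    /** Hypothetical contract: initScan() positions the scanner, next() yields rows. */
    interface RowScanner {
        void initScan();

        String next(); // returns null when the scanner is exhausted
    }

    static final class ForwardScanner implements RowScanner {
        private final List<String> rows;
        private int pos;

        ForwardScanner(List<String> rows) {
            this.rows = rows;
        }

        public void initScan() {
            pos = 0; // a normal scanner just seeks to the start
        }

        public String next() {
            return pos < rows.size() ? rows.get(pos++) : null;
        }
    }

    static final class ReversedScanner implements RowScanner {
        private final List<String> rows;
        private int pos;

        ReversedScanner(List<String> rows) {
            this.rows = rows;
        }

        public void initScan() {
            pos = rows.size() - 1; // stands in for the seekToLastRow/backwardSeek work
        }

        public String next() {
            return pos >= 0 ? rows.get(pos--) : null;
        }
    }

    /** The caller is direction-agnostic: it only calls initScan() and next(). */
    static String drain(RowScanner scanner) {
        scanner.initScan();
        StringBuilder out = new StringBuilder();
        for (String row = scanner.next(); row != null; row = scanner.next()) {
            out.append(row);
        }
        return out.toString();
    }
}
```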

Sorry for being a pain in the ***.

 Support reverse Scan
 

 Key: HBASE-4811
 URL: https://issues.apache.org/jira/browse/HBASE-4811
 Project: HBase
  Issue Type: Improvement
  Components: Client
Affects Versions: 0.20.6, 0.94.7
Reporter: John Carrino
Assignee: Liang Xie
 Attachments: 4811-trunk-v10.txt, 4811-trunk-v5.patch, 
 HBase-4811-0.94.3modified.txt, HBase-4811-0.94-v2.txt, 
 hbase-4811-trunkv1.patch, hbase-4811-trunkv4.patch, hbase-4811-trunkv6.patch, 
 hbase-4811-trunkv7.patch, hbase-4811-trunkv8.patch, hbase-4811-trunkv9.patch


 All the documentation I find about HBase says that if you want forward and 
 reverse scans you should just build 2 tables and one be ascending and one 
 descending.  Is there a fundamental reason that HBase only supports forward 
 Scan?  It seems like a lot of extra space overhead and coding overhead (to 
 keep them in sync) to support 2 tables.  
 I am assuming this has been discussed before, but I can't find the 
 discussions anywhere about it or why it would be infeasible.


[jira] [Commented] (HBASE-8344) Improve the assignment when node failures happen to choose the secondary RS as the new primary RS


[ 
https://issues.apache.org/jira/browse/HBASE-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680873#comment-13680873
 ] 

Hudson commented on HBASE-8344:
---

Integrated in hbase-0.95 #237 (See 
[https://builds.apache.org/job/hbase-0.95/237/])
HBASE-8344. Improves the assignment when node failures happen to choose the 
secondary RS as the new primary RS (Revision 1491996)

 Result = SUCCESS
ddas : 
Files : 
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/FavoredNodeAssignmentHelper.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/FavoredNodeLoadBalancer.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/FavoredNodes.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestRegionPlacement.java
* 
/hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestFavoredNodeAssignmentHelper.java


 Improve the assignment when node failures happen to choose the secondary RS 
 as the new primary RS
 -

 Key: HBASE-8344
 URL: https://issues.apache.org/jira/browse/HBASE-8344
 Project: HBase
  Issue Type: Sub-task
Reporter: Devaraj Das
Assignee: Devaraj Das
Priority: Critical
 Fix For: 0.95.2

 Attachments: hbase-8344-1.txt, hbase-8344-2.1.txt, 
 hbase-8344-2.2.txt, hbase-8344-2.3.txt, hbase-8344-2.4.txt, 
 hbase-8344-2.5.txt, hbase-8344-2.6.txt, hbase-8344-2.6.txt, 
 hbase-8344-2.7.txt, hbase-8344-2.7.txt





[jira] [Updated] (HBASE-8700) IntegrationTestBigLinkedList can fail due to random number collision


 [ 
https://issues.apache.org/jira/browse/HBASE-8700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-8700:


Attachment: HBASE-8700-0.94.patch

94 patch

 IntegrationTestBigLinkedList can fail due to random number collision
 

 Key: HBASE-8700
 URL: https://issues.apache.org/jira/browse/HBASE-8700
 Project: HBase
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBASE-8700-0.94.patch, HBASE-8700-v0.patch, 
 HBASE-8700-v1.patch


 The test can fail due to random number collision, claiming there are 
 unreferenced elements for obvious reasons (we rewrite some link). Original 
 Accumulo test has one-stage generation so it doesn't count unreferenced 
 elements as failures, only undefined ones. With 200m longs out of half-long 
 range the probability of collision is approx 0.2%.
 Moreover, without some debugging aid, it's hard to tell which keys should be 
 looked at in such cases.
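
The 0.2% figure can be checked with the standard birthday approximation, P ≈ 1 - exp(-n(n-1)/(2N)). The sketch below assumes that "half-long range" means the 2^63 non-negative long values, which the issue text does not spell out; the class and method names are illustrative only.

```java
/** Sketch: birthday-bound estimate of at least one collision among random keys. */
final class CollisionEstimate {

    /**
     * Approximate probability that n values drawn uniformly from a space of
     * the given size contain at least one duplicate.
     */
    static double collisionProbability(double n, double space) {
        // -expm1(-x) equals 1 - exp(-x) but stays accurate for small x.
        return -Math.expm1(-n * (n - 1) / (2.0 * space));
    }

    public static void main(String[] args) {
        double n = 200e6;               // "200m longs" from the issue text
        double space = Math.pow(2, 63); // assumption: non-negative longs only
        System.out.printf("%.2f%%%n", 100 * collisionProbability(n, space));
    }
}
```

With these inputs the exponent is about 4e16 / 1.8e19, or roughly 0.0022, which agrees with the "approx 0.2%" in the description.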


[jira] [Updated] (HBASE-8541) implement flush-into-stripes in stripe compactions

[
https://issues.apache.org/jira/browse/HBASE-8541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Sergey Shelukhin updated HBASE-8541:

Status: Patch Available (was: Open)

implement flush-into-stripes in stripe compactions
--

Key: HBASE-8541
URL: https://issues.apache.org/jira/browse/HBASE-8541
Project: HBase
Issue Type: Improvement
Reporter: Sergey Shelukhin
Attachments: HBASE-8541-latest-with-dependencies.patch,
HBASE-8541-v0.patch

Flush will be able to flush into multiple files under this design, avoiding
L0 I/O amplification.
I have the patch which is missing just one feature - support for concurrent
flushes and stripe changes. This can be done via extensive try-locking of
stripe changes and flushes, or advisory flags without blocking flushes,
dumping conflicting flushes into L0 in case of (very rare) collisions. For
file loading for the latter, a set-cover-like problem needs to be solved to
determine optimal stripes. That will also address Jimmy's concern of getting
rid of metadata, btw. However currently I don't have time for that. I plan to
attach the try-locking patch first, but this won't happen for a couple weeks
probably and should not block main reviews. Hopefully this will be added on
top of main reviews.
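
As a rough picture of flushing into multiple files: if each stripe owns a contiguous row-key range, sorted flush output can be routed per stripe by finding the greatest stripe start key that does not exceed the row key. The sketch below is only an illustration under that assumption; it uses String keys and in-memory lists in place of files, and none of the names come from the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeMap;

/** Sketch: route sorted row keys to stripes identified by their start keys. */
final class StripeFlushSketch {

    /**
     * Groups keys by the stripe whose start key is the greatest one that is
     * less than or equal to the key. The first stripe should start at "".
     */
    static Map<String, List<String>> splitByStripe(List<String> sortedKeys,
                                                   NavigableSet<String> stripeStarts) {
        Map<String, List<String>> perStripe = new TreeMap<>();
        for (String key : sortedKeys) {
            String stripe = stripeStarts.floor(key);
            if (stripe == null) {
                throw new IllegalArgumentException("no stripe covers key " + key);
            }
            perStripe.computeIfAbsent(stripe, s -> new ArrayList<>()).add(key);
        }
        return perStripe;
    }
}
```

Each list in the result stands for one flushed file, so no flushed data needs to land in L0 unless a conflict forces it.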

[jira] [Updated] (HBASE-8541) implement flush-into-stripes in stripe compactions

[
https://issues.apache.org/jira/browse/HBASE-8541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Sergey Shelukhin updated HBASE-8541:

Attachment: HBASE-8541-latest-with-dependencies.patch
HBASE-8541-v0.patch

First cut of the patch. This is what I used for perf testing, so it's verified
on cluster. It's based on previous stripe compaction patches up to HBASE-8000

implement flush-into-stripes in stripe compactions
--

[jira] [Commented] (HBASE-8715) HBase should support IO QOS


[ 
https://issues.apache.org/jira/browse/HBASE-8715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680895#comment-13680895
 ] 

Sergey Shelukhin commented on HBASE-8715:
-

HBase uses HDFS as main backing storage, so this will have to go thru it to 
actual file system level; does there need to be an HDFS JIRA to plumb this thru?

 HBase should support IO QOS
 ---

 Key: HBASE-8715
 URL: https://issues.apache.org/jira/browse/HBASE-8715
 Project: HBase
  Issue Type: New Feature
Reporter: Pritam Damania
Priority: Minor

 The operating system exposes system calls like ioprio_set/get to set 
 priorities for various threads doing IO.
 HBase can use this to accordingly prioritize operations like 
 flushes/compactions/WAL write etc to use the disk bandwidth more efficiently.


[jira] [Commented] (HBASE-4955) Use the official versions of surefire & junit

2013-06-11 Thread Nicolas Liochon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680897#comment-13680897
 ] 

Nicolas Liochon commented on HBASE-4955:


Surefire 2.15 is available. I will give it a try 'soon'.

 Use the official versions of surefire & junit
 -

 Key: HBASE-4955
 URL: https://issues.apache.org/jira/browse/HBASE-4955
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0
 Environment: all
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
Priority: Critical
 Attachments: 4955.v1.patch, 4955.v2.patch, 4955.v2.patch, 
 4955.v2.patch, 4955.v2.patch, 4955.v3.patch, 4955.v3.patch, 4955.v3.patch, 
 4955.v4.patch, 4955.v4.patch, 4955.v4.patch, 4955.v4.patch, 4955.v4.patch, 
 4955.v4.patch, 4955.v5.patch, 8204.v4.patch


 We currently use private versions for Surefire & JUnit since HBASE-4763.
 This JIRA tracks what we need to move to official versions.
 Surefire 2.11 is just out, but, after some tests, it does not contain 
 everything we need.
 JUnit: Could be for JUnit 4.11. Issue to monitor:
 https://github.com/KentBeck/junit/issues/359: fixed in our version, no 
 feedback for an integration on trunk
 Surefire: Could be for Surefire 2.12. Issues to monitor are:
 329 (category support): fixed, we use the official implementation from the 
 trunk
 786 (@Category with forkMode=always): fixed, we use the official 
 implementation from the trunk
 791 (incorrect elapsed time on test failure): fixed, we use the official 
 implementation from the trunk
 793 (incorrect time in the XML report): not fixed (reopened) on trunk, fixed 
 in our version
 760 (does not take into account the test method): fixed in trunk, not fixed 
 in our version
 798 (print immediately the test class name): not fixed in trunk, not fixed in 
 our version
 799 (Allow test parallelization when forkMode=always): not fixed in trunk, 
 not fixed in our version
 800 (redirectTestOutputToFile not taken into account): not yet fixed on 
 trunk, fixed in our version
 800 & 793 are the most important to monitor; they are the only ones that are 
 fixed in our version but not on trunk.


[jira] [Commented] (HBASE-8706) Some improvement in snapshot

2013-06-11 Thread binlijin (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-8706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680896#comment-13680896
 ] 

binlijin commented on HBASE-8706:
-

Patch looks good to me too.

 Some improvement in snapshot
 

 Key: HBASE-8706
 URL: https://issues.apache.org/jira/browse/HBASE-8706
 Project: HBase
  Issue Type: Bug
  Components: snapshots
Affects Versions: 0.94.8, 0.95.0
Reporter: binlijin
 Attachments: HBASE-8706-2.patch, HBASE-8706-3.patch, 
 HBASE-8706.patch, HBASE-8706-v4.patch


 (1) The timeout for Procedure cannot be configured.
 {code}
 // Procedure's timeout is hard-coded in ProcedureCoordinator:
   final static long TIMEOUT_MILLIS_DEFAULT = 60000;
   createProcedure(ForeignExceptionDispatcher fed, String procName, byte[] procArgs,
       List<String> expectedMembers) {
     // build the procedure
     return new Procedure(this, fed, WAKE_MILLIS_DEFAULT, TIMEOUT_MILLIS_DEFAULT,
         procName, procArgs, expectedMembers);
   }
 // RegionServerSnapshotManager reads its timeout from the configuration:
   /** Conf key for max time to keep threads in snapshot request pool waiting */
   public static final String SNAPSHOT_TIMEOUT_MILLIS_KEY = "hbase.snapshot.region.timeout";
   /** Keep threads alive in request pool for max of 60 seconds */
   public static final long SNAPSHOT_TIMEOUT_MILLIS_DEFAULT = 60000;
   public Subprocedure buildSubprocedure(SnapshotDescription snapshot) {
     long timeoutMillis = conf.getLong(SNAPSHOT_TIMEOUT_MILLIS_KEY,
         SNAPSHOT_TIMEOUT_MILLIS_DEFAULT);
     case FLUSH:
       SnapshotSubprocedurePool taskManager =
           new SnapshotSubprocedurePool(rss.getServerName().toString(), conf);
   }
 {code}
 (2) TakeSnapshotHandler: after snapshotRegions we should call 
 monitor.rethrowException() to check whether there is an exception; if there 
 is, we can skip verifySnapshot.
 (3) Too many error messages are logged when an error happens in some places.
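
Point (1) asks for the procedure timeout to come from configuration rather than a constant. The sketch below shows one way such a lookup could work, assuming a 60-second default as in the code comment above. It uses java.util.Properties as a stand-in for the HBase Configuration object, and the key name is hypothetical, not taken from the patch.

```java
import java.util.Properties;

/** Sketch: read a timeout from configuration, falling back to a default. */
final class TimeoutConfigSketch {

    // Hypothetical key, for illustration only; not the key used by HBase.
    static final String PROCEDURE_TIMEOUT_KEY = "example.procedure.timeout.millis";

    // 60 seconds, matching the default described in the issue's code comment.
    static final long TIMEOUT_MILLIS_DEFAULT = 60000L;

    /** Returns the configured long value, or the default if absent or malformed. */
    static long getLong(Properties conf, String key, long defaultValue) {
        String raw = conf.getProperty(key);
        if (raw == null) {
            return defaultValue;
        }
        try {
            return Long.parseLong(raw.trim());
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }

    /** Timeout that would be passed to the Procedure instead of the constant. */
    static long procedureTimeoutMillis(Properties conf) {
        return getLong(conf, PROCEDURE_TIMEOUT_KEY, TIMEOUT_MILLIS_DEFAULT);
    }
}
```

This mirrors how buildSubprocedure already calls conf.getLong with SNAPSHOT_TIMEOUT_MILLIS_KEY and its default.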


[jira] [Commented] (HBASE-8700) IntegrationTestBigLinkedList can fail due to random number collision


[ 
https://issues.apache.org/jira/browse/HBASE-8700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680899#comment-13680899
 ] 

Hudson commented on HBASE-8700:
---

Integrated in HBase-TRUNK #4175 (See 
[https://builds.apache.org/job/HBase-TRUNK/4175/])
HBASE-8700 IntegrationTestBigLinkedList can fail due to random number 
collision (Revision 1492034)

 Result = FAILURE
sershe : 
Files : 
* 
/hbase/trunk/hbase-it/src/test/java/org/apache/hadoop/hbase/test/IntegrationTestBigLinkedList.java


 IntegrationTestBigLinkedList can fail due to random number collision
 

 Key: HBASE-8700
 URL: https://issues.apache.org/jira/browse/HBASE-8700
 Project: HBase
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBASE-8700-0.94.patch, HBASE-8700-v0.patch, 
 HBASE-8700-v1.patch


 The test can fail due to random number collision, claiming there are 
 unreferenced elements for obvious reasons (we rewrite some link). Original 
 Accumulo test has one-stage generation so it doesn't count unreferenced 
 elements as failures, only undefined ones. With 200m longs out of half-long 
 range the probability of collision is approx 0.2%.
 Moreover, without some debugging aid, it's hard to tell which keys should be 
 looked at in such cases.


[jira] [Commented] (HBASE-8705) RS holding META when restarted in a single node setup may hang infinitely without META assignment


[ 
https://issues.apache.org/jira/browse/HBASE-8705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680901#comment-13680901
 ] 

ramkrishna.s.vasudevan commented on HBASE-8705:
---

Thanks Stack.  Will wait for another day before committing this.

 RS holding META when restarted in a single node setup may hang infinitely 
 without META assignment
 -

 Key: HBASE-8705
 URL: https://issues.apache.org/jira/browse/HBASE-8705
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.95.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Minor
 Fix For: 0.98.0

 Attachments: HBASE-8705.patch


 This bug may be minor, as it is likely to happen in a single node setup.
 I restarted the RS holding META. The master tried assigning META using 
 MetaSSH, but it tried this before the new RS came up.
 So, as no region plan is found, 
 {code}
   if (plan == null) {
     LOG.warn("Unable to determine a plan to assign " + region);
     if (tomActivated) {
       this.timeoutMonitor.setAllRegionServersOffline(true);
     } else {
       regionStates.updateRegionState(region, RegionState.State.FAILED_OPEN);
     }
     return;
   }
 {code}
 we just return without assignment.  And this being the META, the small 
 cluster just hangs.
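
One generic way to avoid giving up on the first missing plan is a bounded retry that waits for a region server to register. The sketch below illustrates that pattern only; it is not the fix attached to this issue, and the helper name, parameters, and use of a Supplier are all assumptions.

```java
import java.util.function.Supplier;

/** Sketch: retry a lookup a bounded number of times until it yields a result. */
final class PlanRetrySketch {

    /**
     * Calls attempt up to maxAttempts times, sleeping between tries.
     * Returns the first non-null result, or null if none was produced
     * or the thread was interrupted while waiting.
     */
    static <T> T retryUntilNonNull(Supplier<T> attempt, int maxAttempts, long sleepMillis) {
        for (int i = 0; i < maxAttempts; i++) {
            T result = attempt.get();
            if (result != null) {
                return result;
            }
            if (i + 1 < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt status for the caller and stop retrying.
                    Thread.currentThread().interrupt();
                    return null;
                }
            }
        }
        return null;
    }
}
```

In the scenario described, the supplier would stand for the plan lookup, which starts returning a plan once the restarted RS has come up.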


[jira] [Commented] (HBASE-8700) IntegrationTestBigLinkedList can fail due to random number collision

2013-06-11 Thread Hadoop QA (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-8700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680911#comment-13680911
]

Hadoop QA commented on HBASE-8700:
--

{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12587355/HBASE-8700-0.94.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author
tags.

{color:green}+1 tests included{color}. The patch appears to include 3 new
or modified tests.

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/6011//console

This message is automatically generated.

IntegrationTestBigLinkedList can fail due to random number collision

Key: HBASE-8700
URL: https://issues.apache.org/jira/browse/HBASE-8700
Project: HBase
Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Attachments: HBASE-8700-0.94.patch, HBASE-8700-v0.patch,
HBASE-8700-v1.patch

The test can fail due to random number collision, claiming there are
unreferenced elements for obvious reasons (we rewrite some link). Original
Accumulo test has one-stage generation so it doesn't count unreferenced
elements as failures, only undefined ones. With 200m longs out of half-long
range the probability of collision is approx 0.2%.
Moreover, without some debugging aid, it's hard to tell which keys should be
looked at in such cases.

[jira] [Commented] (HBASE-8667) Master and Regionserver not able to communicate if both bound to different network interfaces on the same machine.