[jira] [Assigned] (HBASE-5840) Open Region FAILED_OPEN doesn't clear the TaskMonitor Status, keeps showing the old status

2012-04-20 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-5840:
-

Assignee: ramkrishna.s.vasudevan

 Open Region FAILED_OPEN doesn't clear the TaskMonitor Status, keeps showing 
 the old status
 --

 Key: HBASE-5840
 URL: https://issues.apache.org/jira/browse/HBASE-5840
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0
Reporter: Gopinathan A
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.96.0, 0.94.1


 The TaskMonitor status is not cleared when a region goes to FAILED_OPEN, so 
 the monitor keeps showing the old status.
 This misleads the user.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-5809) Avoid move api to take the destination server same as the source server.

2012-04-20 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-5809:
-

Assignee: rajeshbabu

 Avoid move api to take the destination server same as the source server.
 

 Key: HBASE-5809
 URL: https://issues.apache.org/jira/browse/HBASE-5809
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.1
Reporter: ramkrishna.s.vasudevan
Assignee: rajeshbabu
Priority: Minor
  Labels: patch
 Fix For: 0.96.0

 Attachments: HBASE-5809.patch


 Currently, move accepts any specified destination, and even when the 
 destination is the same as the source we still unassign and reassign the 
 region.  This can cause problems due to RegionAlreadyInTransitionException, 
 leaving the region hanging in RIT for a long time.  We can avoid this by not 
 allowing a move whose destination is the same as the source.
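The proposed guard can be sketched as follows (the names and signature are illustrative, not the real HMaster.move API): skip the unassign/assign cycle when the requested destination equals the region's current server.

```java
public class MoveGuardSketch {
    // Returns true only when the move is worth performing.
    static boolean shouldMove(String currentServer, String destServer) {
        // A move to the hosting server would just trigger a pointless
        // unassign/assign cycle and risk RegionAlreadyInTransitionException.
        if (destServer == null) {
            return true;  // null destination means "pick one at random"
        }
        return !destServer.equals(currentServer);
    }

    public static void main(String[] args) {
        System.out.println(shouldMove("rs1,60020,1", "rs1,60020,1")); // same server
        System.out.println(shouldMove("rs1,60020,1", "rs2,60020,1")); // different server
    }
}
```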





[jira] [Assigned] (HBASE-5545) region can't be opened for a long time. Because the creating File failed.

2012-04-18 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-5545:
-

Assignee: ramkrishna.s.vasudevan  (was: gaojinchao)

 region can't be opened for a long time. Because the creating File failed.
 -

 Key: HBASE-5545
 URL: https://issues.apache.org/jira/browse/HBASE-5545
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.6
Reporter: gaojinchao
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.90.7, 0.92.2, 0.94.0

 Attachments: HBASE-5545.patch, HBASE-5545.patch


 Scenario:
 
 1. The file is created.
 2. While writing data, all datanodes may crash, so the write fails.
 3. Even if close is called in the finally block, it also fails and throws an 
 exception, because writing the data failed.
 4. If the RS then tries to create the same file again, 
 AlreadyBeingCreatedException is thrown.
 Suggestion to handle this scenario:
 ---
 1. Check for the existence of the file; if it exists, delete it and create a 
 new file. 
 Here the delete call does not check whether the file is open or 
 closed.
 Overwrite option:
 
 1. The overwrite option applies only when you are overwriting a closed 
 file.
 2. If the file is not closed, the same AlreadyBeingCreatedException is 
 thrown even with the overwrite option.
 This is expected behaviour, to prevent multiple clients from writing to the 
 same file.
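Suggestion 1 can be sketched like this, with java.io.File standing in for Hadoop's FileSystem so the example is self-contained (the real fix would use fs.exists()/fs.delete()/fs.create() on the HDFS path): delete any leftover file before recreating it, so a half-written file from a crashed attempt cannot trigger AlreadyBeingCreatedException.

```java
import java.io.File;
import java.io.IOException;

public class RecreateFileSketch {
    static File recreate(File f) throws IOException {
        if (f.exists()) {
            // Delete regardless of whether the previous writer closed it.
            if (!f.delete()) {
                throw new IOException("Could not delete stale file " + f);
            }
        }
        if (!f.createNewFile()) {
            throw new IOException("Could not create " + f);
        }
        return f;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("regioninfo", ".tmp");
        recreate(f);  // file already exists: it is deleted and recreated
        System.out.println(f.exists());
        f.delete();
    }
}
```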
 Region server logs:
 org.apache.hadoop.ipc.RemoteException: 
 org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to 
 create file /hbase/test1/12c01902324218d14b17a5880f24f64b/.tmp/.regioninfo 
 for 
 DFSClient_hb_rs_158-1-131-48,20020,1331107668635_1331107669061_-252463556_25 
 on client 158.1.132.19 because current leaseholder is trying to recreate file.
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:1570)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1440)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1382)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:658)
 at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:547)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1137)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1133)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1131)
 at org.apache.hadoop.ipc.Client.call(Client.java:961)
 at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:245)
 at $Proxy6.create(Unknown Source)
 at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at $Proxy6.create(Unknown Source)
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.init(DFSClient.java:3643)
 at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:778)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:364)
 at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:630)
 at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:611)
 at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:518)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.checkRegioninfoOnFilesystem(HRegion.java:424)
 at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:340)
 at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2672)
 at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2658)
 at 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:330)
 at 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:116)
 at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:158)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 [2012-03-07 20:51:45,858] [WARN ] 
 [RS_OPEN_REGION-158-1-131-48,20020,1331107668635-23] 
 [com.huawei.isap.ump.ha.client.RPCRetryAndSwitchInvoker 131] Retrying the 
 method call: public abstract void 
 

[jira] [Assigned] (HBASE-5816) Balancer and ServerShutdownHandler concurrently reassigning the same region

2012-04-18 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-5816:
-

Assignee: ramkrishna.s.vasudevan

 Balancer and ServerShutdownHandler concurrently reassigning the same region
 ---

 Key: HBASE-5816
 URL: https://issues.apache.org/jira/browse/HBASE-5816
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.6
Reporter: Maryann Xue
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Attachments: HBASE-5816.patch


 The first assign thread exits successfully after updating the RegionState to 
 PENDING_OPEN, while the second assign enters assign() immediately afterwards 
 and fails the RegionState check in setOfflineInZooKeeper(). This causes the 
 master to abort.
 In the case below, the two concurrent assigns occurred when the AM tried to 
 assign a region to a dying/dead RS, while the ServerShutdownHandler 
 spontaneously tried to assign the same region (from the region plan).
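The race boils down to a missing mutual-exclusion claim on the region. An illustrative sketch (not the actual AssignmentManager code): only one of two racing callers, balancer or ServerShutdownHandler, should win the right to assign; the loser backs off instead of forcing OFFLINE on a region that is already PENDING_OPEN.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class SingleAssignSketch {
    // Encoded region names currently being assigned.
    private final Set<String> inFlight = ConcurrentHashMap.newKeySet();

    boolean tryStartAssign(String encodedRegionName) {
        // add() is atomic: it returns false if another thread already
        // claimed this region, so the second caller simply backs off.
        return inFlight.add(encodedRegionName);
    }

    void finishAssign(String encodedRegionName) {
        inFlight.remove(encodedRegionName);
    }

    public static void main(String[] args) {
        SingleAssignSketch am = new SingleAssignSketch();
        System.out.println(am.tryStartAssign("fe38fe31"));  // first caller wins
        System.out.println(am.tryStartAssign("fe38fe31"));  // second caller must back off
        am.finishAssign("fe38fe31");
    }
}
```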
 2012-04-17 05:44:57,648 INFO org.apache.hadoop.hbase.master.HMaster: balance 
 hri=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b., 
 src=hadoop05.sh.intel.com,60020,1334544902186, 
 dest=xmlqa-clv16.sh.intel.com,60020,1334612497253
 2012-04-17 05:44:57,648 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. 
 (offlining)
 2012-04-17 05:44:57,648 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 serverName=hadoop05.sh.intel.com,60020,1334544902186, load=(requests=0, 
 regions=0, usedHeap=0, maxHeap=0) for region 
 TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.
 2012-04-17 05:44:57,666 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned 
 node: /hbase/unassigned/fe38fe31caf40b6e607a3e6bbed6404b 
 (region=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.,
  server=hadoop05.sh.intel.com,60020,1334544902186, state=RS_ZK_REGION_CLOSING)
 2012-04-17 05:52:58,984 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. 
 state=CLOSED, ts=1334612697672, 
 server=hadoop05.sh.intel.com,60020,1334544902186
 2012-04-17 05:52:58,984 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x236b912e9b3000e Creating (or updating) unassigned node for 
 fe38fe31caf40b6e607a3e6bbed6404b with OFFLINE state
 2012-04-17 05:52:59,096 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for 
 region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.; 
 plan=hri=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.,
  src=hadoop05.sh.intel.com,60020,1334544902186, 
 dest=xmlqa-clv16.sh.intel.com,60020,1334612497253
 2012-04-17 05:52:59,096 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. to 
 xmlqa-clv16.sh.intel.com,60020,1334612497253
 2012-04-17 05:54:19,159 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. 
 state=PENDING_OPEN, ts=1334613179096, 
 server=xmlqa-clv16.sh.intel.com,60020,1334612497253
 2012-04-17 05:54:59,033 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of 
 TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. to 
 serverName=xmlqa-clv16.sh.intel.com,60020,1334612497253, load=(requests=0, 
 regions=0, usedHeap=0, maxHeap=0), trying to assign elsewhere instead; retry=0
 java.net.SocketTimeoutException: Call to /10.239.47.87:60020 failed on socket 
 timeout exception: java.net.SocketTimeoutException: 12 millis timeout 
 while waiting for channel to be ready for read. ch : 
 java.nio.channels.SocketChannel[connected local=/10.239.47.89:41302 
 remote=/10.239.47.87:60020]
 at 
 org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:805)
 at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:778)
 at 
 org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:283)
 at $Proxy7.openRegion(Unknown Source)
 at 
 org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:573)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1127)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:912)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:892)
  

[jira] [Assigned] (HBASE-5782) Not all the regions are getting assigned after the log splitting.

2012-04-15 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-5782:
-

Assignee: ramkrishna.s.vasudevan

 Not all the regions are getting assigned after the log splitting.
 -

 Key: HBASE-5782
 URL: https://issues.apache.org/jira/browse/HBASE-5782
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.94.0
Reporter: Gopinathan A
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.94.0


 Create a table with 1000 splits; after region assignment, kill the 
 regionserver that hosts the META table.
 A few regions are missing after the log splitting and region assignment. 
 The HBCK report shows that multiple region holes were created.
 The same scenario was verified multiple times on 0.92.1 with no issues.





[jira] [Assigned] (HBASE-5651) [findbugs] Address wait/notify synchronization/inconsistency in sync

2012-03-29 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-5651:
-

Assignee: ramkrishna.s.vasudevan

 [findbugs] Address wait/notify synchronization/inconsistency in sync
 

 Key: HBASE-5651
 URL: https://issues.apache.org/jira/browse/HBASE-5651
 Project: HBase
  Issue Type: Sub-task
  Components: scripts
Reporter: Jonathan Hsieh
Assignee: ramkrishna.s.vasudevan

 See 
 https://builds.apache.org/job/PreCommit-HBASE-Build/1313//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html#Warnings_MT_CORRECTNESS
 fix classes IS, LI, MWM, NN, SWL, UG, UW





[jira] [Assigned] (HBASE-5595) Fix NoSuchMethodException in 0.92 when running on local filesystem

2012-03-18 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-5595:
-

Assignee: Uma Mahesh

Thanks for the patch, Uma. Added you to the contributor list.

 Fix NoSuchMethodException in 0.92 when running on local filesystem
 --

 Key: HBASE-5595
 URL: https://issues.apache.org/jira/browse/HBASE-5595
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Uma Mahesh
Priority: Critical
 Fix For: 0.92.2

 Attachments: HBASE-5595.patch


 Fix this ugly exception that shows up when running 0.92.1 on the local 
 filesystem:
 {code}
 2012-03-16 10:54:48,351 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: 
 getNumCurrentReplicas--HDFS-826 not available; 
 hdfs_out=org.apache.hadoop.fs.FSDataOutputStream@301abf87
 java.lang.NoSuchMethodException: 
 org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.getNumCurrentReplicas()
 at java.lang.Class.getDeclaredMethod(Class.java:1937)
 at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.getGetNumCurrentReplicas(HLog.java:425)
 at org.apache.hadoop.hbase.regionserver.wal.HLog.init(HLog.java:408)
 at org.apache.hadoop.hbase.regionserver.wal.HLog.init(HLog.java:331)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateHLog(HRegionServer.java:1229)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.setupWALAndReplication(HRegionServer.java:1218)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:937)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:648)
 at java.lang.Thread.run(Thread.java:680)
 {code}
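The noise comes from probing for the HDFS-826 method reflectively. A sketch of the defensive lookup (hypothetical names; the real method in HLog is getGetNumCurrentReplicas): probe for getNumCurrentReplicas and degrade quietly when the output stream class, such as ChecksumFileSystem's local-FS stream, does not provide it, instead of printing a stack trace.

```java
import java.lang.reflect.Method;

public class ReplicaProbeSketch {
    static Method findGetNumCurrentReplicas(Object out) {
        try {
            Method m = out.getClass().getMethod("getNumCurrentReplicas");
            m.setAccessible(true);
            return m;
        } catch (NoSuchMethodException e) {
            // Expected on the local filesystem: callers should log once at
            // INFO without the exception, rather than dumping this trace.
            return null;
        }
    }

    public static void main(String[] args) {
        // A plain output stream has no getNumCurrentReplicas, just like
        // the local ChecksumFSOutputSummer in the report above.
        Object localStream = new java.io.ByteArrayOutputStream();
        System.out.println(findGetNumCurrentReplicas(localStream) == null);
    }
}
```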





[jira] [Assigned] (HBASE-5595) Fix NoSuchMethodException in 0.92 when running on local filesystem

2012-03-18 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-5595:
-

Assignee: Uma Maheswara Rao G  (was: Uma Mahesh)

 Fix NoSuchMethodException in 0.92 when running on local filesystem
 --

 Key: HBASE-5595
 URL: https://issues.apache.org/jira/browse/HBASE-5595
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Uma Maheswara Rao G
Priority: Critical
 Fix For: 0.92.2

 Attachments: HBASE-5595.patch


 Fix this ugly exception that shows up when running 0.92.1 on the local 
 filesystem:
 {code}
 2012-03-16 10:54:48,351 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: 
 getNumCurrentReplicas--HDFS-826 not available; 
 hdfs_out=org.apache.hadoop.fs.FSDataOutputStream@301abf87
 java.lang.NoSuchMethodException: 
 org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.getNumCurrentReplicas()
 at java.lang.Class.getDeclaredMethod(Class.java:1937)
 at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.getGetNumCurrentReplicas(HLog.java:425)
 at org.apache.hadoop.hbase.regionserver.wal.HLog.init(HLog.java:408)
 at org.apache.hadoop.hbase.regionserver.wal.HLog.init(HLog.java:331)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateHLog(HRegionServer.java:1229)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.setupWALAndReplication(HRegionServer.java:1218)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:937)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:648)
 at java.lang.Thread.run(Thread.java:680)
 {code}





[jira] [Assigned] (HBASE-5531) Maven hadoop profile (version 23) needs to be updated with latest 23 snapshot

2012-03-06 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-5531:
-

Assignee: ramkrishna.s.vasudevan

 Maven hadoop profile (version 23) needs to be updated with latest 23 snapshot
 -

 Key: HBASE-5531
 URL: https://issues.apache.org/jira/browse/HBASE-5531
 Project: HBase
  Issue Type: Bug
  Components: build
Affects Versions: 0.92.2
Reporter: Laxman
Assignee: ramkrishna.s.vasudevan
  Labels: build
 Fix For: 0.92.2, 0.94.0, 0.96.0

 Attachments: HBASE-5531-trunk.patch, HBASE-5531.patch


 The current profile still points to 0.23.1-SNAPSHOT. 
 The build fails because 0.23.1 has already been released and its snapshot is 
 no longer available.
 We can update this to 0.23.2-SNAPSHOT.





[jira] [Assigned] (HBASE-5531) Maven hadoop profile (version 23) needs to be updated with latest 23 snapshot

2012-03-06 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-5531:
-

Assignee: Laxman  (was: ramkrishna.s.vasudevan)

 Maven hadoop profile (version 23) needs to be updated with latest 23 snapshot
 -

 Key: HBASE-5531
 URL: https://issues.apache.org/jira/browse/HBASE-5531
 Project: HBase
  Issue Type: Bug
  Components: build
Affects Versions: 0.92.2
Reporter: Laxman
Assignee: Laxman
  Labels: build
 Fix For: 0.92.2, 0.94.0, 0.96.0

 Attachments: HBASE-5531-trunk.patch, HBASE-5531.patch


 The current profile still points to 0.23.1-SNAPSHOT. 
 The build fails because 0.23.1 has already been released and its snapshot is 
 no longer available.
 We can update this to 0.23.2-SNAPSHOT.





[jira] [Assigned] (HBASE-5510) Change in LB.randomAssignment(List<ServerName> servers) API

2012-03-02 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-5510:
-

Assignee: ramkrishna.s.vasudevan

 Change in LB.randomAssignment(List<ServerName> servers) API
 ---

 Key: HBASE-5510
 URL: https://issues.apache.org/jira/browse/HBASE-5510
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.0
Reporter: Anoop Sam John
Assignee: ramkrishna.s.vasudevan

  The LB has a randomAssignment(List<ServerName> servers) API which the AM 
 uses to assign a region from a downed RS. [This is also used in other cases, 
 such as a call to the assign() API from a client.]
  I feel it would be better to pass the HRegionInfo into this method as well: 
 when the LB makes a choice for a region assignment after one RS goes down, 
 it would be nice for the LB to know which region it is doing this server 
 selection for.
 +Scenario+
  When one RS is down, we want its regions moved to other RSs, but with a 
 certain set of regions staying together. We have a custom load balancer, but 
 with the current LB interface this is not possible. Another way is to allow 
 random assignment of the regions at RS-down time and later rebalance them as 
 I need with a cluster balance. But this may assign regions first to one RS 
 and then move them again to another, and for some time period my business 
 use case cannot be satisfied.
  I have also seen an issue in JIRA about making sure that the ROOT and META 
 regions always sit on specific RSs. With the current LB API this won't be 
 possible in the future.
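The proposed signature change can be sketched as follows, with plain strings standing in for HRegionInfo and ServerName so the example is self-contained (illustrative types, not the real LoadBalancer interface): passing the region in lets a custom balancer pin specific regions, such as ROOT and META, to chosen servers.

```java
import java.util.List;
import java.util.Random;

public class BalancerApiSketch {
    interface LoadBalancer {
        // Proposed: the region is passed in, so the balancer knows
        // what it is placing, not just where it could place it.
        String randomAssignment(String regionName, List<String> servers);
    }

    // A custom balancer that pins catalog regions to the first server.
    static class PinningBalancer implements LoadBalancer {
        private final Random rand = new Random();
        public String randomAssignment(String regionName, List<String> servers) {
            if (regionName.startsWith("-ROOT-") || regionName.startsWith(".META.")) {
                return servers.get(0);
            }
            return servers.get(rand.nextInt(servers.size()));
        }
    }

    public static void main(String[] args) {
        LoadBalancer lb = new PinningBalancer();
        System.out.println(lb.randomAssignment(".META.,,1", List.of("rs1", "rs2", "rs3")));
    }
}
```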





[jira] [Assigned] (HBASE-5323) Need to handle assertion error while splitting log through ServerShutDownHandler by shutting down the master

2012-02-03 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-5323:
-

Assignee: ramkrishna.s.vasudevan

 Need to handle assertion error while splitting log through 
 ServerShutDownHandler by shutting down the master
 

 Key: HBASE-5323
 URL: https://issues.apache.org/jira/browse/HBASE-5323
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.94.0, 0.90.7


 We know that while parsing the HLog we expect the proper length from HDFS.
 In WALReaderFSDataInputStream:
 {code}
   assert(realLength >= this.length);
 {code}
 We bail out if the above condition is not satisfied.  But if 
 SSH.splitLog() hits this problem, the error lands in the run method of 
 EventHandler.  This kills the SSH thread, so no further assignment 
 happens; if ROOT and META need to be assigned, they cannot be.
 I think in this condition we should abort the master by catching such errors.
 Please do suggest.
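The suggested handling can be sketched like this (hypothetical names, not the actual ServerShutdownHandler code): catch Throwable, which includes AssertionError, around the split-log step so an assertion failure aborts the master deliberately instead of silently killing the handler thread.

```java
public class SplitLogGuardSketch {
    interface Master { void abort(String why, Throwable cause); }

    static void processShutdown(Master master, Runnable splitLog) {
        try {
            splitLog.run();
        } catch (Throwable t) {  // AssertionError is an Error, not an Exception
            master.abort("Log splitting failed; ROOT/META may be unassignable", t);
        }
    }

    public static void main(String[] args) {
        final boolean[] aborted = {false};
        Master m = (why, cause) -> aborted[0] = true;
        processShutdown(m, () -> { throw new AssertionError("realLength < length"); });
        System.out.println(aborted[0]);
    }
}
```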





[jira] [Assigned] (HBASE-5120) Timeout monitor races with table disable handler

2012-01-11 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-5120:
-

Assignee: ramkrishna.s.vasudevan

 Timeout monitor races with table disable handler
 

 Key: HBASE-5120
 URL: https://issues.apache.org/jira/browse/HBASE-5120
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Zhihong Yu
Assignee: ramkrishna.s.vasudevan
Priority: Blocker
 Fix For: 0.94.0, 0.92.1

 Attachments: HBASE-5120.patch, HBASE-5120_1.patch, 
 HBASE-5120_2.patch, HBASE-5120_3.patch, HBASE-5120_4.patch, HBASE-5120_5.patch


 Here is what J-D described here:
 https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176
 I think I will retract from my statement that it used to be extremely racy 
 and caused more troubles than it fixed, on my first test I got a stuck 
 region in transition instead of being able to recover. The timeout was set to 
 2 minutes to be sure I hit it.
 First the region gets closed
 {quote}
 2012-01-04 00:16:25,811 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 sv4r5s38,62023,1325635980913 for region 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 {quote}
 2 minutes later it times out:
 {quote}
 2012-01-04 00:18:30,026 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 state=PENDING_CLOSE, ts=1325636185810, server=null
 2012-01-04 00:18:30,026 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
 PENDING_CLOSE for too long, running forced unassign again on 
 region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 2012-01-04 00:18:30,027 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 (offlining)
 {quote}
 100ms later the master finally gets the event:
 {quote}
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, 
 region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for 1a4b111bcc228043e89f59c4c3f6a791
 2012-01-04 00:18:30,129 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so 
 deleting ZK node and removing from regions in transition, skipping assignment 
 of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Deleting existing unassigned node for 
 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED
 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Successfully deleted unassigned node for 
 region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED
 {quote}
 At this point everything is fine, the region was processed as closed. But 
 wait, remember that line where it said it was going to force an unassign?
 {quote}
 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db03587 Creating unassigned node for 
 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state
 2012-01-04 00:18:30,328 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Server null returned 
 java.lang.NullPointerException: Passed server is null for 
 1a4b111bcc228043e89f59c4c3f6a791
 {quote}
 Now the master is confused, it recreated the RIT znode but the region doesn't 
 even exist anymore. It even tries to shut it down but is blocked by NPEs. Now 
 this is what's going on.
 The late ZK notification that the znode was deleted (but it got recreated 
 after):
 {quote}
 2012-01-04 00:19:33,285 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: The znode of region 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. has been 
 deleted.
 {quote}
 Then it prints this, and much later tries to unassign it again:
 {quote}
 2012-01-04 00:19:46,607 DEBUG 
 org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
 to clear regions in transition; 
 test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
 state=PENDING_CLOSE, ts=1325636310328, server=null
 ...
 2012-01-04 00:20:39,623 DEBUG 
 org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
 to clear regions in transition; 
 

[jira] [Assigned] (HBASE-4988) MetaServer crash cause all splitting regionserver abort

2012-01-09 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-4988:
-

Assignee: chunhui shen

 MetaServer crash cause all splitting regionserver abort
 ---

 Key: HBASE-4988
 URL: https://issues.apache.org/jira/browse/HBASE-4988
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: hbase-4988v1.patch


 If the metaserver crashes now, all splitting regionservers will abort 
 themselves, because of this code:
 {code}
 this.journal.add(JournalEntry.PONR);
 MetaEditor.offlineParentInMeta(server.getCatalogTracker(),
 this.parent.getRegionInfo(), a.getRegionInfo(), 
 b.getRegionInfo());
 {code}
 If the JournalEntry is PONR, the split's rollback will abort the 
 regionserver itself.
 This is terrible in a heavy-write environment when the metaserver crashes.
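An illustrative sketch of the PONR (point of no return) check, simplified from the split-transaction journal idea (not the actual SplitTransaction code): once PONR has been journaled, a plain rollback is no longer safe, so the caller should retry or hand off rather than abort every splitting regionserver because one META edit failed transiently.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class SplitJournalSketch {
    enum JournalEntry { STARTED_SPLITTING, CREATED_DAUGHTERS, PONR }

    static boolean canRollback(Deque<JournalEntry> journal) {
        // Past the PONR, the safe options are retrying the META edit or
        // handing off to the master, not a blind rollback-and-abort.
        return !journal.contains(JournalEntry.PONR);
    }

    public static void main(String[] args) {
        Deque<JournalEntry> journal = new ArrayDeque<>();
        journal.add(JournalEntry.STARTED_SPLITTING);
        System.out.println(canRollback(journal));  // before PONR
        journal.add(JournalEntry.PONR);
        System.out.println(canRollback(journal));  // after PONR
    }
}
```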





[jira] [Assigned] (HBASE-5121) MajorCompaction may affect scan's correctness

2012-01-04 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-5121:
-

Assignee: chunhui shen

 MajorCompaction may affect scan's correctness
 -

 Key: HBASE-5121
 URL: https://issues.apache.org/jira/browse/HBASE-5121
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: hbase-5121.patch


 In our test there are two families' KeyValues for one row, and we found an 
 infrequent problem in scan's next() when a major compaction happens 
 concurrently.
 In the client's two consecutive calls to scan.next():
 1. The first next() returns a result where family A is null.
 2. The second next() returns a result where family B is null.
 Both results carry the same row. If there are more families, the scenario 
 gets even stranger.
 The reason is that storescanner.peek() changes after a major compaction when 
 there are delete-type KeyValues. This change means the RegionScanner's heap 
 of KeyValue scanners is no longer guaranteed to be sorted.
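The broken invariant can be reproduced with a plain JDK priority queue. This is a generic demonstration, not HBase's KeyValueHeap: a heap orders elements only at insertion time, so changing an element's sort key in place (which is what a shifted peek() amounts to) silently breaks ordering until the element is re-inserted.

```java
import java.util.PriorityQueue;

// Generic demonstration (not HBase code) of the heap invariant problem
// described above: a PriorityQueue orders elements only at insertion
// time, so mutating an element's sort key in place breaks ordering.
public class HeapInvariantDemo {
    static class Scanner {
        int peek;                      // stands in for storescanner.peek()
        Scanner(int p) { peek = p; }
    }

    // Returns the peek value at the heap's head after mutating an
    // element's key, with or without the remove/re-add fix applied.
    static int headAfterMutation(boolean reinsert) {
        PriorityQueue<Scanner> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a.peek, b.peek));
        Scanner a = new Scanner(1), b = new Scanner(2);
        heap.add(a);
        heap.add(b);
        a.peek = 5;                    // key changed while still queued
        if (reinsert) {
            heap.remove(a);            // the fix pattern: remove and
            heap.add(a);               // re-insert so the heap re-sorts
        }
        return heap.peek().peek;
    }

    public static void main(String[] args) {
        System.out.println(headAfterMutation(false)); // 5: stale ordering
        System.out.println(headAfterMutation(true));  // 2: correct ordering
    }
}
```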

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-5097) RegionObserver implementation whose preScannerOpen and postScannerOpen Impl return null can stall the system initialization through NPE

2011-12-28 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-5097:
-

Assignee: ramkrishna.s.vasudevan

 RegionObserver implementation whose preScannerOpen and postScannerOpen Impl 
 return null can stall the system initialization through NPE
 ---

 Key: HBASE-5097
 URL: https://issues.apache.org/jira/browse/HBASE-5097
 Project: HBase
  Issue Type: Bug
  Components: coprocessors
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan

 In HRegionServer.java openScanner():
 {code}
 r.prepareScanner(scan);
 RegionScanner s = null;
 if (r.getCoprocessorHost() != null) {
   s = r.getCoprocessorHost().preScannerOpen(scan);
 }
 if (s == null) {
   s = r.getScanner(scan);
 }
 if (r.getCoprocessorHost() != null) {
   s = r.getCoprocessorHost().postScannerOpen(scan, s);
 }
 {code}
 If our implementation of postScannerOpen returns null, the RegionScanner is 
 null, and so a NullPointerException is thrown:
 {code}
 java.lang.NullPointerException
   at 
 java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.addScanner(HRegionServer.java:2282)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2272)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326)
 {code}
 Marking this defect as a blocker; please feel free to change the priority if 
 I am wrong. Also correct me if my way of trying out coprocessors without 
 implementing postScannerOpen is mistaken. I am just a learner.
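One possible guard, sketched below with simplified stand-in types (this is not the actual HBase patch, and the interfaces are reduced to what the snippet needs): never let a hook that returns null clobber the scanner that was already obtained, so addScanner() cannot see null.

```java
// Illustrative sketch (simplified types, not the actual HBase patch):
// guard against a coprocessor hook returning null before the scanner
// is registered, so addScanner() never sees a null RegionScanner.
public class ScannerOpenSketch {
    interface RegionScanner {}
    interface CoprocessorHost {
        RegionScanner preScannerOpen();
        RegionScanner postScannerOpen(RegionScanner s);
    }

    static RegionScanner openScanner(CoprocessorHost host,
                                     RegionScanner defaultScanner) {
        RegionScanner s = null;
        if (host != null) {
            s = host.preScannerOpen();
        }
        if (s == null) {
            s = defaultScanner;          // stands in for r.getScanner(scan)
        }
        if (host != null) {
            RegionScanner post = host.postScannerOpen(s);
            if (post != null) {          // a null from the hook must not
                s = post;                // clobber the scanner we already have
            }
        }
        return s;                        // never null here
    }

    public static void main(String[] args) {
        RegionScanner def = new RegionScanner() {};
        CoprocessorHost nullReturning = new CoprocessorHost() {
            public RegionScanner preScannerOpen() { return null; }
            public RegionScanner postScannerOpen(RegionScanner s) { return null; }
        };
        System.out.println(openScanner(nullReturning, def) == def);
    }
}
```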

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-5086) Reopening a region on a RS can leave it in PENDING_OPEN

2011-12-23 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-5086:
-

Assignee: ramkrishna.s.vasudevan

 Reopening a region on a RS can leave it in PENDING_OPEN
 ---

 Key: HBASE-5086
 URL: https://issues.apache.org/jira/browse/HBASE-5086
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.1


 I got this twice during the same test.
 If the region servers are slow enough and you run an online alter, it's 
 possible for the RS to change the znode status to CLOSED and have the master 
 send an OPEN before the region server is able to remove the region from its 
 list of RITs.
 This is what the master sees:
 {quote}
 011-12-21 22:24:09,498 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. 
 (offlining)
 2011-12-21 22:24:09,498 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db033f7 Creating unassigned node for 
 43123e2e3fc83ec25fe2a76b4f09077f in a CLOSING state
 2011-12-21 22:24:09,524 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 sv4r25s44,62023,1324494325099 for region 
 test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
 2011-12-21 22:24:15,656 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=sv4r25s44,62023,1324494325099, 
 region=43123e2e3fc83ec25fe2a76b4f09077f
 2011-12-21 22:24:15,656 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for 43123e2e3fc83ec25fe2a76b4f09077f
 2011-12-21 22:24:15,656 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. 
 state=CLOSED, ts=1324506255629, server=sv4r25s44,62023,1324494325099
 2011-12-21 22:24:15,656 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x134589d3db033f7 Creating (or updating) unassigned node for 
 43123e2e3fc83ec25fe2a76b4f09077f with OFFLINE state
 2011-12-21 22:24:15,663 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for 
 test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. destination 
 server is + sv4r25s44,62023,1324494325099
 2011-12-21 22:24:15,663 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for 
 region test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.; 
 plan=hri=test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., 
 src=, dest=sv4r25s44,62023,1324494325099
 2011-12-21 22:24:15,663 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. to 
 sv4r25s44,62023,1324494325099
 2011-12-21 22:24:15,664 ERROR 
 org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment in: 
 sv4r25s44,62023,1324494325099 due to 
 org.apache.hadoop.hbase.regionserver.RegionAlreadyInTransitionException: 
 Received:OPEN for the 
 region:test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. ,which 
 we are already trying to CLOSE.
 {quote}
 After that the master abandons.
 And the region server:
 {quote}
 2011-12-21 22:24:09,523 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Received close region: 
 test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
 2011-12-21 22:24:09,523 DEBUG 
 org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing 
 close of test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
 2011-12-21 22:24:09,524 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 Closing test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.: 
 disabling compactions & flushes
 2011-12-21 22:24:09,524 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
 Running close preflush of 
 test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
 2011-12-21 22:24:09,524 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 Started memstore flush for 
 test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., current 
 region memstore size 40.5m
 2011-12-21 22:24:09,524 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 Finished snapshotting 
 test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., commencing 
 wait for mvcc, flushsize=42482936
 2011-12-21 22:24:13,368 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 Renaming flushed file at 
 hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/.tmp/87d6944c54c7417e9a34a9f9542bcb72
  to 
 

[jira] [Assigned] (HBASE-4951) master process can not be stopped when it is initializing

2011-12-08 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-4951:
-

Assignee: ramkrishna.s.vasudevan

 master process can not be stopped when it is initializing
 -

 Key: HBASE-4951
 URL: https://issues.apache.org/jira/browse/HBASE-4951
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.3
Reporter: xufeng
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.92.0, 0.90.5


 It is easy to reproduce with the following steps:
 step1: start the master process (do not start any regionserver process in 
 the cluster). The master will wait for the regionservers to check in:
 org.apache.hadoop.hbase.master.ServerManager: Waiting on regionserver(s) to 
 checkin
 step2: stop the master with the shell command bin/hbase master stop
 result: the master process will never die, because the 
 catalogTracker.waitForRoot() method blocks until the root region is assigned.
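The usual fix pattern for this class of hang is a bounded wait that re-checks a stop flag. The sketch below is illustrative (JDK-only; the method and flag names are assumptions, not the actual patch), showing how a stop request can break out of an otherwise indefinite wait for root assignment.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of the fix pattern (illustrative, not the actual patch): a
// polling wait that re-checks a stop flag, so "hbase master stop" can
// interrupt the wait for the root region instead of blocking forever.
public class StoppableWaitSketch {
    // Polls 'condition' until it holds or 'stopped' is set; returns
    // whether the condition was met before a stop was requested.
    static boolean waitFor(AtomicBoolean condition, AtomicBoolean stopped,
                           long pollMillis) throws InterruptedException {
        while (!condition.get()) {
            if (stopped.get()) {
                return false;        // shut down promptly on stop request
            }
            Thread.sleep(pollMillis);
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean rootAssigned = new AtomicBoolean(false);
        AtomicBoolean stopped = new AtomicBoolean(false);
        stopped.set(true);           // simulate "bin/hbase master stop"
        System.out.println(waitFor(rootAssigned, stopped, 10)); // false
    }
}
```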

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-4729) Clash between region unassign and splitting kills the master

2011-11-29 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-4729:
-

Assignee: stack  (was: ramkrishna.s.vasudevan)

Assigning to your name Stack.

 Clash between region unassign and splitting kills the master
 

 Key: HBASE-4729
 URL: https://issues.apache.org/jira/browse/HBASE-4729
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: stack
Priority: Critical
 Fix For: 0.92.0, 0.94.0

 Attachments: 4729-v2.txt, 4729-v3.txt, 4729-v4.txt, 4729-v5.txt, 
 4729-v6-092.txt, 4729-v6-trunk.txt, 4729.txt


 I was running an online alter while regions were splitting, and suddenly the 
 master died and left my table half-altered (haven't restarted the master yet).
 What killed the master:
 {quote}
 2011-11-02 17:06:44,428 FATAL org.apache.hadoop.hbase.master.HMaster: 
 Unexpected ZK exception creating node CLOSING
 org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
 NodeExists for /hbase/unassigned/f7e1783e65ea8d621a4bc96ad310f101
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:110)
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
 at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
 at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:459)
 at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:441)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndWatch(ZKUtil.java:769)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKAssign.createNodeClosing(ZKAssign.java:568)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1722)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1661)
 at org.apache.hadoop.hbase.master.BulkReOpen$1.run(BulkReOpen.java:69)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 {quote}
 A znode was created because the region server was splitting the region 4 
 seconds before:
 {quote}
 2011-11-02 17:06:40,704 INFO 
 org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of 
 region TestTable,0012469153,1320253135043.f7e1783e65ea8d621a4bc96ad310f101.
 2011-11-02 17:06:40,704 DEBUG 
 org.apache.hadoop.hbase.regionserver.SplitTransaction: 
 regionserver:62023-0x132f043bbde0710 Creating ephemeral node for 
 f7e1783e65ea8d621a4bc96ad310f101 in SPLITTING state
 2011-11-02 17:06:40,751 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:62023-0x132f043bbde0710 Attempting to transition node 
 f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to 
 RS_ZK_REGION_SPLITTING
 ...
 2011-11-02 17:06:44,061 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:62023-0x132f043bbde0710 Successfully transitioned node 
 f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to 
 RS_ZK_REGION_SPLIT
 2011-11-02 17:06:44,061 INFO 
 org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the 
 master to process the split for f7e1783e65ea8d621a4bc96ad310f101
 {quote}
 Now that the master is dead the region server is spewing those last two lines 
 like mad.
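The race above can be modeled with a map standing in for ZooKeeper. This is an illustrative sketch only (the method and state names are assumptions, not the committed fix): when unassign's attempt to create the CLOSING znode finds the node already present in SPLITTING state, that collision should be handled rather than escalated to a FATAL master error.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative sketch (not the actual patch): when unassign races with
// a split, the region's znode may already exist in SPLITTING state, so
// a failed create should be handled instead of killing the master.
public class UnassignVsSplitSketch {
    enum NodeState { CLOSING, RS_ZK_REGION_SPLITTING }

    static final ConcurrentMap<String, NodeState> znodes = new ConcurrentHashMap<>();

    // Returns true if the CLOSING node was created, false if the node
    // already exists (e.g. mid-split) and the unassign should back off.
    static boolean tryCreateClosingNode(String region) {
        NodeState prev = znodes.putIfAbsent(region, NodeState.CLOSING);
        if (prev == NodeState.RS_ZK_REGION_SPLITTING) {
            // Was: unhandled NodeExistsException -> FATAL in HMaster.
            return false;
        }
        return prev == null;
    }

    public static void main(String[] args) {
        znodes.put("f7e1783e", NodeState.RS_ZK_REGION_SPLITTING);
        System.out.println(tryCreateClosingNode("f7e1783e"));   // false, no abort
        System.out.println(tryCreateClosingNode("otherRegion")); // true
    }
}
```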

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-4880) Region is on service before completing openRegionHanlder, may cause data loss

2011-11-27 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-4880:
-

Assignee: chunhui shen

 Region is on service before completing openRegionHanlder, may cause data loss
 -

 Key: HBASE-4880
 URL: https://issues.apache.org/jira/browse/HBASE-4880
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: hbase-4880.patch


 OpenRegionHandler in the regionserver proceeds through the following steps:
 {code}
 1. openRegion()  (through it, closed = false, closing = false)
 2. addToOnlineRegions(region)
 3. update the .META. table
 4. update the ZK node state to RS_ZK_REGION_OPENED
 {code}
 Note that the region is in service before step 4, which means a client can 
 put data to this region after step 3.
 What happens if step 4 fails? OpenRegionHandler#cleanupFailedOpen runs, which 
 closes the region, and the master assigns this region to another regionserver.
 If closing the region fails, the data put between step 3 and step 4 may be 
 lost, because the region has been opened on another regionserver and has 
 received new data. The lost edits may then not be recoverable through 
 replayRecoveredEdit(), because the edit's LogSeqId is smaller than the 
 current region SeqId.
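The reordering the report implies can be sketched minimally: take traffic only after the ZK transition to OPENED succeeds, so no edits can land in the window where the open might still be rolled back. This is an illustrative model with assumed names, not the attached patch.

```java
// Illustrative sketch of the reordering the report implies: serve
// traffic only after the RS_ZK_REGION_OPENED transition succeeds, so
// no edits land in the window where the open might be rolled back.
public class OpenRegionOrderSketch {
    static boolean online = false;

    // Returns whether the open completed; 'zkTransitionSucceeds' stands
    // in for the step-4 ZK update's outcome.
    static boolean openRegion(boolean zkTransitionSucceeds) {
        // steps 1-3: open region, update .META. -- region NOT yet serving
        if (!zkTransitionSucceeds) {
            // cleanupFailedOpen(): nothing was written, nothing to lose
            return false;
        }
        online = true;   // serve traffic only after ZK says OPENED
        return true;
    }

    public static void main(String[] args) {
        System.out.println(openRegion(false) + " online=" + online);
        System.out.println(openRegion(true) + " online=" + online);
    }
}
```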

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-4841) If I call split fast enough, while inserting, rows disappear.

2011-11-22 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-4841:
-

Assignee: ramkrishna.s.vasudevan

 If I call split fast enough, while inserting, rows disappear. 
 --

 Key: HBASE-4841
 URL: https://issues.apache.org/jira/browse/HBASE-4841
 Project: HBase
  Issue Type: Bug
Reporter: Alex Newman
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Attachments: 1, log, log2


 I'll attach a unit test for this. Basically, if you call split while 
 inserting data, you can get to the point where the cluster becomes unstable 
 or rows disappear. The unit test gives you some flexibility over:
 - How many rows
 - How wide the rows are
 - The frequency of the splits
 The default settings crash the unit tests or cause them to fail on my 
 laptop. On my MacBook Air I could actually turn down the total number of 
 rows and the frequency of the splits, which is surprising. I think this is 
 because the MacBook Air has much better IO than my backup Acer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager

2011-11-21 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-4308:
-

Assignee: ramkrishna.s.vasudevan

 Race between RegionOpenedHandler and AssignmentManager
 --

 Key: HBASE-4308
 URL: https://issues.apache.org/jira/browse/HBASE-4308
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0


 When the master is processing a ZK event for REGION_OPENED, it calls delete() 
 on the znode before it removes the node from RegionsInTransition. If the 
 notification of that delete comes back into AssignmentManager before the 
 region is removed from RIT, you see an error like:
 2011-08-30 17:43:29,537 WARN  [main-EventThread] 
 master.AssignmentManager(861): Node deleted but still in RIT: 
 .META.,,1.1028785192 state=OPEN, ts=1314751409532, 
 server=todd-w510,55655,1314751396840
 Not certain if it causes issues, but it's a concerning log message.
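The ordering issue can be modeled with two in-memory sets (illustrative names; sets stand in for ZooKeeper and the master's RIT map). The warning fires exactly when the znode is gone but the region is still in RIT; clearing RIT before deleting the znode closes that window.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the ordering issue (illustrative, not master code): if the
// znode is deleted before the region leaves regionsInTransition, the
// delete notification can observe the region still in RIT.
public class RitOrderingSketch {
    static final Set<String> znodes = ConcurrentHashMap.newKeySet();
    static final Set<String> regionsInTransition = ConcurrentHashMap.newKeySet();

    // The delete-notification handler's check behind the warning message.
    static boolean nodeDeletedButStillInRit(String region) {
        return !znodes.contains(region) && regionsInTransition.contains(region);
    }

    public static void main(String[] args) {
        // Problematic order: delete znode first, region still in RIT.
        znodes.add("meta");
        regionsInTransition.add("meta");
        znodes.remove("meta");
        System.out.println(nodeDeletedButStillInRit("meta")); // warning fires
        regionsInTransition.remove("meta");

        // Safe order: clear RIT first, then delete the znode.
        znodes.add("r2");
        regionsInTransition.add("r2");
        regionsInTransition.remove("r2");
        znodes.remove("r2");
        System.out.println(nodeDeletedButStillInRit("r2"));   // no warning
    }
}
```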

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-4796) Race between SplitRegionHandlers for the same region kills the master

2011-11-15 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-4796:
-

Assignee: ramkrishna.s.vasudevan

 Race between SplitRegionHandlers for the same region kills the master
 -

 Key: HBASE-4796
 URL: https://issues.apache.org/jira/browse/HBASE-4796
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0, 0.94.0


 I just saw that multiple SplitRegionHandlers can be created for the same 
 region because of the RS tickling, but it becomes deadly when more than one 
 is trying to delete the znode at the same time:
 {quote}
 2011-11-16 02:25:28,778 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_SPLIT, server=sv4r7s38,62023,1321410237387, 
 region=f80b6a904048a99ce88d61420b8906d1
 2011-11-16 02:25:28,780 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_SPLIT, server=sv4r7s38,62023,1321410237387, 
 region=f80b6a904048a99ce88d61420b8906d1
 2011-11-16 02:25:28,796 DEBUG 
 org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handling SPLIT 
 event for f80b6a904048a99ce88d61420b8906d1; deleting node
 2011-11-16 02:25:28,798 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x132f043bbde094b Deleting existing unassigned node for 
 f80b6a904048a99ce88d61420b8906d1 that is in expected state RS_ZK_REGION_SPLIT
 2011-11-16 02:25:28,804 DEBUG 
 org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handling SPLIT 
 event for f80b6a904048a99ce88d61420b8906d1; deleting node
 2011-11-16 02:25:28,806 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x132f043bbde094b Deleting existing unassigned node for 
 f80b6a904048a99ce88d61420b8906d1 that is in expected state RS_ZK_REGION_SPLIT
 2011-11-16 02:25:28,821 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:62003-0x132f043bbde094b Successfully deleted unassigned node for 
 region f80b6a904048a99ce88d61420b8906d1 in expected state RS_ZK_REGION_SPLIT
 2011-11-16 02:25:28,821 INFO 
 org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handled SPLIT 
 report); 
 parent=TestTable,006304,1321409743253.f80b6a904048a99ce88d61420b8906d1. 
 daughter 
 a=TestTable,006304,1321410325564.e0f5d201683bcabe14426817224334b8.daughter
  b=TestTable,007054,1321410325564.1b82eeb5d230c47ccc51c08256134839.
 2011-11-16 02:25:28,829 WARN 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node 
 /hbase/unassigned/f80b6a904048a99ce88d61420b8906d1 already deleted, and this 
 is not a retry
 2011-11-16 02:25:28,830 FATAL org.apache.hadoop.hbase.master.HMaster: Error 
 deleting SPLIT node in ZK for transition ZK node 
 (f80b6a904048a99ce88d61420b8906d1)
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
 NoNode for /hbase/unassigned/f80b6a904048a99ce88d61420b8906d1
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
   at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:728)
   at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:107)
   at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:884)
   at 
 org.apache.hadoop.hbase.zookeeper.ZKAssign.deleteNode(ZKAssign.java:506)
   at 
 org.apache.hadoop.hbase.zookeeper.ZKAssign.deleteNode(ZKAssign.java:453)
   at 
 org.apache.hadoop.hbase.master.handler.SplitRegionHandler.process(SplitRegionHandler.java:95)
   at 
 org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:168)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 {quote}
 Stack and I came up with the solution that we just need to handle that 
 exception, because handleSplitReport is an in-memory operation.
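The agreed direction can be sketched with a map standing in for ZooKeeper (illustrative names, not the committed patch): since handleSplitReport is purely in-memory, the handler that loses the delete race can treat "node already deleted" as success instead of letting the NoNodeException go FATAL.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch of the agreed fix direction (illustrative, map stands in for
// ZooKeeper): a second handler racing to delete the same split znode
// treats "already deleted" as benign instead of a FATAL error.
public class TolerantDeleteSketch {
    static final ConcurrentMap<String, String> znodes = new ConcurrentHashMap<>();

    // Returns normally whether or not this handler won the delete race.
    static void deleteSplitNode(String region) {
        String removed = znodes.remove(region);
        if (removed == null) {
            // Was: NoNodeException -> FATAL in HMaster. Now: ignored.
            System.out.println("Node already deleted, ignoring");
        }
    }

    public static void main(String[] args) {
        znodes.put("f80b6a90", "RS_ZK_REGION_SPLIT");
        deleteSplitNode("f80b6a90");   // first handler wins the race
        deleteSplitNode("f80b6a90");   // second handler: no exception
    }
}
```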

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-4729) Race between online altering and splitting kills the master

2011-11-04 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-4729:
-

Assignee: ramkrishna.s.vasudevan

 Race between online altering and splitting kills the master
 ---

 Key: HBASE-4729
 URL: https://issues.apache.org/jira/browse/HBASE-4729
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0, 0.94.0


 I was running an online alter while regions were splitting, and suddenly the 
 master died and left my table half-altered (haven't restarted the master yet).
 What killed the master:
 {quote}
 2011-11-02 17:06:44,428 FATAL org.apache.hadoop.hbase.master.HMaster: 
 Unexpected ZK exception creating node CLOSING
 org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
 NodeExists for /hbase/unassigned/f7e1783e65ea8d621a4bc96ad310f101
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:110)
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
 at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
 at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:459)
 at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:441)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndWatch(ZKUtil.java:769)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKAssign.createNodeClosing(ZKAssign.java:568)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1722)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1661)
 at org.apache.hadoop.hbase.master.BulkReOpen$1.run(BulkReOpen.java:69)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 {quote}
 A znode was created because the region server was splitting the region 4 
 seconds before:
 {quote}
 2011-11-02 17:06:40,704 INFO 
 org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of 
 region TestTable,0012469153,1320253135043.f7e1783e65ea8d621a4bc96ad310f101.
 2011-11-02 17:06:40,704 DEBUG 
 org.apache.hadoop.hbase.regionserver.SplitTransaction: 
 regionserver:62023-0x132f043bbde0710 Creating ephemeral node for 
 f7e1783e65ea8d621a4bc96ad310f101 in SPLITTING state
 2011-11-02 17:06:40,751 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:62023-0x132f043bbde0710 Attempting to transition node 
 f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to 
 RS_ZK_REGION_SPLITTING
 ...
 2011-11-02 17:06:44,061 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:62023-0x132f043bbde0710 Successfully transitioned node 
 f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to 
 RS_ZK_REGION_SPLIT
 2011-11-02 17:06:44,061 INFO 
 org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the 
 master to process the split for f7e1783e65ea8d621a4bc96ad310f101
 {quote}
 Now that the master is dead the region server is spewing those last two lines 
 like mad.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-4462) Properly treating SocketTimeoutException

2011-10-17 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-4462:
-

Assignee: ramkrishna.s.vasudevan

 Properly treating SocketTimeoutException
 

 Key: HBASE-4462
 URL: https://issues.apache.org/jira/browse/HBASE-4462
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.90.5


 SocketTimeoutException is currently treated like any IOE inside of 
 HCM.getRegionServerWithRetries and I think this is a problem. This method 
 should only do retries in cases where we are pretty sure the operation will 
 complete, but with STE we already waited for (by default) 60 seconds and 
 nothing happened.
 I found this while debugging Douglas Campbell's problem on the mailing list 
 where it seemed like he was using the same scanner from multiple threads, but 
 actually it was just the same client doing retries while the first run didn't 
 even finish yet (that's another problem). You could see the first scanner, 
 then up to two other handlers waiting for it to finish in order to run 
 (because of the synchronization on RegionScanner).
 So what should we do? We could treat STE as a DoNotRetryException and let the 
 client deal with it, or we could retry only once.
 There's also the option of having a different behavior for get/put/icv/scan, 
 the issue with operations that modify a cell is that you don't know if the 
 operation completed or not (same when a RS dies hard after completing let's 
 say a Put but just before returning to the client).
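The "treat STE as a DoNotRetryException" option can be sketched as a retry loop that rethrows SocketTimeoutException immediately while still retrying other IOExceptions. This is one of the options discussed, modeled with assumed names, not the behavior of HCM.getRegionServerWithRetries as shipped.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

// Sketch of one discussed option (not a committed fix): retry generic
// IOExceptions, but rethrow SocketTimeoutException immediately, since
// the client already waited out the full socket timeout once.
public class RetryPolicySketch {
    static <T> T withRetries(Callable<T> op, int maxRetries) throws Exception {
        if (maxRetries <= 0) throw new IllegalArgumentException("maxRetries");
        IOException last = null;
        for (int i = 0; i < maxRetries; i++) {
            try {
                return op.call();
            } catch (SocketTimeoutException ste) {
                throw ste;           // do-not-retry: surface to the caller
            } catch (IOException ioe) {
                last = ioe;          // transient failure: retry
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        try {
            withRetries(() -> {
                calls[0]++;
                throw new SocketTimeoutException("60s elapsed");
            }, 10);
        } catch (SocketTimeoutException expected) {
            System.out.println("gave up after " + calls[0] + " call(s)");
        }
    }
}
```

A per-operation policy (retry gets and scans, but not puts/ICVs whose outcome is unknown) would slot in where the catch blocks decide.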

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-4459) HbaseObjectWritable code is a byte, we will eventually run out of codes

2011-10-12 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-4459:
-

Assignee: ramkrishna.s.vasudevan

 HbaseObjectWritable code is a byte, we will eventually run out of codes
 ---

 Key: HBASE-4459
 URL: https://issues.apache.org/jira/browse/HBASE-4459
 Project: HBase
  Issue Type: Bug
  Components: io
Reporter: Jonathan Gray
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.92.0


 There are about 90 classes/codes in HbaseObjectWritable currently and 
 Byte.MAX_VALUE is 127.  In addition, anyone wanting to add custom classes but 
 not break compatibility might want to leave a gap before using codes and 
 that's difficult in such limited space.
 Eventually we should get rid of this pattern that makes compatibility 
 difficult (better client/server protocol handshake) but we should probably at 
 least bump this to a short for 0.94.
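A quick JDK demonstration of the limit described above: codes stored in a byte top out at Byte.MAX_VALUE (127) and silently wrap on overflow, while widening the field to a short raises the ceiling to 32767 and leaves room for gaps.

```java
// Demonstration of the code-space limit: a byte caps the code space at
// 127 and wraps silently past that, while a short allows 32767 codes.
public class CodeWidthDemo {
    public static void main(String[] args) {
        System.out.println(Byte.MAX_VALUE);   // 127 -- ~90 codes already used
        System.out.println(Short.MAX_VALUE);  // 32767 -- room to leave gaps

        byte overflow = (byte) (Byte.MAX_VALUE + 1);
        System.out.println(overflow);         // -128: codes silently wrap
    }
}
```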

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira