[jira] [Assigned] (HBASE-5809) Avoid move api to take the destination server same as the source server.

2012-04-20 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-5809:
-

Assignee: rajeshbabu

> Avoid move api to take the destination server same as the source server.
> 
>
> Key: HBASE-5809
> URL: https://issues.apache.org/jira/browse/HBASE-5809
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.92.1
>Reporter: ramkrishna.s.vasudevan
>Assignee: rajeshbabu
>Priority: Minor
>  Labels: patch
> Fix For: 0.96.0
>
> Attachments: HBASE-5809.patch
>
>
> Currently move accepts any destination specified, and if the destination is 
> the same as the source we still do an unassign and assign.  Here we can run 
> into RegionAlreadyInTransitionException and thus leave the region hanging in 
> RIT for a long time.  We can avoid this by not allowing the move in that 
> scenario.
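
A minimal sketch of the guard described above, assuming a move-style entry point that sees both the source and the requested destination; the class and method names are illustrative stand-ins, not the actual HMaster API:

{code}
// Sketch: reject a move whose destination equals the source, instead of doing
// a pointless unassign/assign that can leave the region stuck in RIT.
public final class MoveGuard {
  /** Returns true if the move should proceed; false if it is a no-op. */
  public static boolean shouldMove(String sourceServer, String destServer) {
    if (destServer == null) {
      return true;                       // let the balancer pick a destination
    }
    if (destServer.equals(sourceServer)) {
      return false;                      // destination == source: skip the move
    }
    return true;
  }
}
{code}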

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-5840) Open Region FAILED_OPEN doesn't clear the TaskMonitor Status, keeps showing the old status

2012-04-20 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-5840:
-

Assignee: ramkrishna.s.vasudevan

> Open Region FAILED_OPEN doesn't clear the TaskMonitor Status, keeps showing 
> the old status
> --
>
> Key: HBASE-5840
> URL: https://issues.apache.org/jira/browse/HBASE-5840
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.94.0
>Reporter: Gopinathan A
>Assignee: ramkrishna.s.vasudevan
> Fix For: 0.96.0, 0.94.1
>
>
> The TaskMonitor status will not be cleared when a region goes to FAILED_OPEN, 
> so it keeps showing the old status.
> This misleads the user.
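
A minimal sketch of the cleanup the report asks for, assuming the open path tracks progress through a monitored-task handle; the interface and handler below are illustrative stand-ins, not the actual HBase monitoring API:

{code}
// Sketch: finish the task-monitor entry in every exit path, so a FAILED_OPEN
// does not keep showing the last in-progress status.
interface TaskStatus {                   // stand-in for the monitoring handle
  void setStatus(String msg);
  void markComplete(String msg);
  void abort(String msg);
}

final class OpenRegionStatusSketch {
  static void openRegion(TaskStatus status) {
    boolean opened = false;
    try {
      status.setStatus("Opening region");
      // ... actual open work would go here ...
      opened = true;
    } finally {
      if (opened) {
        status.markComplete("Region opened");
      } else {
        status.abort("Region open failed"); // clears the stale status
      }
    }
  }
}
{code}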

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-5816) Balancer and ServerShutdownHandler concurrently reassigning the same region

2012-04-18 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-5816:
-

Assignee: ramkrishna.s.vasudevan

> Balancer and ServerShutdownHandler concurrently reassigning the same region
> ---
>
> Key: HBASE-5816
> URL: https://issues.apache.org/jira/browse/HBASE-5816
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.90.6
>Reporter: Maryann Xue
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Attachments: HBASE-5816.patch
>
>
> The first assign thread exits with success after updating the RegionState to 
> PENDING_OPEN, while the second assign follows immediately into "assign" and 
> fails the RegionState check in setOfflineInZooKeeper(). This causes the 
> master to abort.
> In the case below, the two concurrent assigns occurred when the AM tried to 
> assign a region to a dying/dead RS, and meanwhile the ServerShutdownHandler 
> tried to assign this region (from the region plan) on its own.
> 2012-04-17 05:44:57,648 INFO org.apache.hadoop.hbase.master.HMaster: balance 
> hri=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b., 
> src=hadoop05.sh.intel.com,60020,1334544902186, 
> dest=xmlqa-clv16.sh.intel.com,60020,1334612497253
> 2012-04-17 05:44:57,648 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
> region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. 
> (offlining)
> 2012-04-17 05:44:57,648 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
> serverName=hadoop05.sh.intel.com,60020,1334544902186, load=(requests=0, 
> regions=0, usedHeap=0, maxHeap=0) for region 
> TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.
> 2012-04-17 05:44:57,666 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned 
> node: /hbase/unassigned/fe38fe31caf40b6e607a3e6bbed6404b 
> (region=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.,
>  server=hadoop05.sh.intel.com,60020,1334544902186, state=RS_ZK_REGION_CLOSING)
> 2012-04-17 05:52:58,984 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
> was=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. 
> state=CLOSED, ts=1334612697672, 
> server=hadoop05.sh.intel.com,60020,1334544902186
> 2012-04-17 05:52:58,984 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:6-0x236b912e9b3000e Creating (or updating) unassigned node for 
> fe38fe31caf40b6e607a3e6bbed6404b with OFFLINE state
> 2012-04-17 05:52:59,096 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for 
> region TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.; 
> plan=hri=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b.,
>  src=hadoop05.sh.intel.com,60020,1334544902186, 
> dest=xmlqa-clv16.sh.intel.com,60020,1334612497253
> 2012-04-17 05:52:59,096 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
> TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. to 
> xmlqa-clv16.sh.intel.com,60020,1334612497253
> 2012-04-17 05:54:19,159 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
> was=TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. 
> state=PENDING_OPEN, ts=1334613179096, 
> server=xmlqa-clv16.sh.intel.com,60020,1334612497253
> 2012-04-17 05:54:59,033 WARN 
> org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of 
> TABLE_ORDER_CUSTOMER,,1334017820846.fe38fe31caf40b6e607a3e6bbed6404b. to 
> serverName=xmlqa-clv16.sh.intel.com,60020,1334612497253, load=(requests=0, 
> regions=0, usedHeap=0, maxHeap=0), trying to assign elsewhere instead; retry=0
> java.net.SocketTimeoutException: Call to /10.239.47.87:60020 failed on socket 
> timeout exception: java.net.SocketTimeoutException: 12 millis timeout 
> while waiting for channel to be ready for read. ch : 
> java.nio.channels.SocketChannel[connected local=/10.239.47.89:41302 
> remote=/10.239.47.87:60020]
> at 
> org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:805)
> at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:778)
> at 
> org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:283)
> at $Proxy7.openRegion(Unknown Source)
> at 
> org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:573)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1127)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:912)
> at 
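
For context, a minimal sketch of the kind of guard that would prevent the second, concurrent assign described above; the state enum and method are illustrative stand-ins, not the actual AssignmentManager API:

{code}
// Sketch: skip a second, concurrent assign if another handler already moved
// the region to PENDING_OPEN/OPENING, instead of failing the OFFLINE check
// later and aborting the master.
enum SketchRegionState { OFFLINE, PENDING_OPEN, OPENING, OPEN, PENDING_CLOSE, CLOSED }

final class AssignGuard {
  static boolean safeToForceOffline(SketchRegionState current) {
    switch (current) {
      case PENDING_OPEN:
      case OPENING:
      case OPEN:
        return false;  // another assign is already in flight or finished
      default:
        return true;   // CLOSED/OFFLINE: safe to force OFFLINE and assign
    }
  }
}
{code}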

[jira] [Assigned] (HBASE-5545) region can't be opened for a long time. Because the creating File failed.

2012-04-17 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-5545:
-

Assignee: ramkrishna.s.vasudevan  (was: gaojinchao)

> region can't be opened for a long time. Because the creating File failed.
> -
>
> Key: HBASE-5545
> URL: https://issues.apache.org/jira/browse/HBASE-5545
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.90.6
>Reporter: gaojinchao
>Assignee: ramkrishna.s.vasudevan
> Fix For: 0.90.7, 0.92.2, 0.94.0
>
> Attachments: HBASE-5545.patch, HBASE-5545.patch
>
>
> Scenario:
> 
> 1. A file is created.
> 2. While writing data, all the datanodes might have crashed, so writing data 
> will fail.
> 3. Now even if close is called in the finally block, close will also fail and 
> throw an exception because writing the data failed.
> 4. After this, if the RS tries to create the same file again, 
> AlreadyBeingCreatedException is thrown.
> Suggestion to handle this scenario:
> ---
> 1. Check for the existence of the file; if it exists, delete the file and 
> create a new file. 
> Here the delete call for the file will not check whether the file is open or 
> closed.
> Overwrite option:
> 
> 1. The overwrite option applies only if you are trying to overwrite a 
> closed file.
> 2. If the file is not closed, then even with the overwrite option the same 
> AlreadyBeingCreatedException will be thrown.
> This is the expected behaviour, to avoid multiple clients writing to the same 
> file.
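
A minimal sketch of suggestion 1 above using the plain Hadoop FileSystem API; the helper class and recovery policy are illustrative, not the actual HRegion code:

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

final class RegionInfoFileSketch {
  // Sketch of suggestion 1: if a previous, half-written file exists (e.g. the
  // lease holder died before close), delete it and create a fresh file instead
  // of hitting AlreadyBeingCreatedException on the re-create.
  static FSDataOutputStream createFresh(FileSystem fs, Path file)
      throws IOException {
    if (fs.exists(file)) {
      fs.delete(file, false);       // delete ignores whether it was ever closed
    }
    return fs.create(file, false);  // overwrite=false: fail if recreated meanwhile
  }
}
{code}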
> Region server logs:
> org.apache.hadoop.ipc.RemoteException: 
> org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to 
> create file /hbase/test1/12c01902324218d14b17a5880f24f64b/.tmp/.regioninfo 
> for 
> DFSClient_hb_rs_158-1-131-48,20020,1331107668635_1331107669061_-252463556_25 
> on client 158.1.132.19 because current leaseholder is trying to recreate file.
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:1570)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1440)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1382)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:658)
> at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:547)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1137)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1133)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1131)
> at org.apache.hadoop.ipc.Client.call(Client.java:961)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:245)
> at $Proxy6.create(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at $Proxy6.create(Unknown Source)
> at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:3643)
> at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:778)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:364)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:630)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:611)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:518)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.checkRegioninfoOnFilesystem(HRegion.java:424)
> at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:340)
> at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2672)
> at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2658)
> at 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:330)
> at 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:116)
> at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:158)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> [2012-03-07 20:51:45,858] [WARN ] 
> [RS_OPEN_REGION-158-1-131-48,20020,1331107668635-23] 
> [com.huawei.isap.ump.ha.client.RPCRetryAndSwitchInvo

[jira] [Assigned] (HBASE-5782) Not all the regions are getting assigned after the log splitting.

2012-04-15 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-5782:
-

Assignee: ramkrishna.s.vasudevan

> Not all the regions are getting assigned after the log splitting.
> -
>
> Key: HBASE-5782
> URL: https://issues.apache.org/jira/browse/HBASE-5782
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.94.0
>Reporter: Gopinathan A
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 0.94.0
>
>
> Create a table with 1000 splits; after the region assignment, kill the 
> regionserver which contains the META table.
> A few regions are missing after the log splitting and region assignment. 
> The HBCK report shows multiple region holes have been created.
> The same scenario was verified multiple times in 0.92.1 with no issues.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-5651) [findbugs] Address wait/notify synchronization/inconsistency in sync

2012-03-29 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-5651:
-

Assignee: ramkrishna.s.vasudevan

> [findbugs] Address wait/notify synchronization/inconsistency in sync
> 
>
> Key: HBASE-5651
> URL: https://issues.apache.org/jira/browse/HBASE-5651
> Project: HBase
>  Issue Type: Sub-task
>  Components: scripts
>Reporter: Jonathan Hsieh
>Assignee: ramkrishna.s.vasudevan
>
> See 
> https://builds.apache.org/job/PreCommit-HBASE-Build/1313//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html#Warnings_MT_CORRECTNESS
> Fix classes IS, LI, MWM, NN, SWL, UG, UW.
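
For context, a minimal, self-contained sketch of the wait/notify shape these findbugs categories expect (condition, wait, and notify all guarded by the same monitor, with wait() in a loop):

{code}
// Sketch: the wait/notify shape findbugs accepts -- the condition and the
// wait/notifyAll are guarded by the same lock, and wait() sits in a loop so
// spurious wakeups and missed notifications are handled.
final class SyncSketch {
  private final Object lock = new Object();
  private boolean ready = false;

  void awaitReady() throws InterruptedException {
    synchronized (lock) {
      while (!ready) {          // loop, never a bare if
        lock.wait();
      }
    }
  }

  void signalReady() {
    synchronized (lock) {
      ready = true;
      lock.notifyAll();         // notify under the same monitor
    }
  }
}
{code}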

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-5595) Fix NoSuchMethodException in 0.92 when running on local filesystem

2012-03-18 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-5595:
-

Assignee: Uma Maheswara Rao G  (was: Uma Mahesh)

> Fix NoSuchMethodException in 0.92 when running on local filesystem
> --
>
> Key: HBASE-5595
> URL: https://issues.apache.org/jira/browse/HBASE-5595
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: Uma Maheswara Rao G
>Priority: Critical
> Fix For: 0.92.2
>
> Attachments: HBASE-5595.patch
>
>
> Fix this ugly exception that shows up when running 0.92.1 on the local 
> filesystem:
> {code}
> 2012-03-16 10:54:48,351 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: 
> getNumCurrentReplicas--HDFS-826 not available; 
> hdfs_out=org.apache.hadoop.fs.FSDataOutputStream@301abf87
> java.lang.NoSuchMethodException: 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.getNumCurrentReplicas()
> at java.lang.Class.getDeclaredMethod(Class.java:1937)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLog.getGetNumCurrentReplicas(HLog.java:425)
> at org.apache.hadoop.hbase.regionserver.wal.HLog.<init>(HLog.java:408)
> at org.apache.hadoop.hbase.regionserver.wal.HLog.<init>(HLog.java:331)
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateHLog(HRegionServer.java:1229)
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.setupWALAndReplication(HRegionServer.java:1218)
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:937)
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:648)
> at java.lang.Thread.run(Thread.java:680)
> {code}
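
A minimal sketch of a reflective probe that treats the missing method as the normal case on the local filesystem; the class below is an illustrative stand-in, only the probed method name comes from the report:

{code}
import java.io.OutputStream;
import java.lang.reflect.Method;

final class ReplicaProbeSketch {
  // Look up getNumCurrentReplicas() if the wrapped stream provides it
  // (HDFS-826); return null quietly otherwise, since local-filesystem streams
  // such as ChecksumFSOutputSummer simply do not have the method.
  static Method findGetNumCurrentReplicas(OutputStream wrappedStream) {
    try {
      Method m = wrappedStream.getClass()
          .getDeclaredMethod("getNumCurrentReplicas");
      m.setAccessible(true);
      return m;
    } catch (NoSuchMethodException e) {
      return null;   // expected on the local filesystem; log at DEBUG at most
    } catch (SecurityException e) {
      return null;
    }
  }
}
{code}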

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-5595) Fix NoSuchMethodException in 0.92 when running on local filesystem

2012-03-18 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-5595:
-

Assignee: Uma Mahesh

Thanks for the patch, Uma.  Added you to the contributor list.

> Fix NoSuchMethodException in 0.92 when running on local filesystem
> --
>
> Key: HBASE-5595
> URL: https://issues.apache.org/jira/browse/HBASE-5595
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: Uma Mahesh
>Priority: Critical
> Fix For: 0.92.2
>
> Attachments: HBASE-5595.patch
>
>
> Fix this ugly exception that shows up when running 0.92.1 on the local 
> filesystem:
> {code}
> 2012-03-16 10:54:48,351 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: 
> getNumCurrentReplicas--HDFS-826 not available; 
> hdfs_out=org.apache.hadoop.fs.FSDataOutputStream@301abf87
> java.lang.NoSuchMethodException: 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.getNumCurrentReplicas()
> at java.lang.Class.getDeclaredMethod(Class.java:1937)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLog.getGetNumCurrentReplicas(HLog.java:425)
> at org.apache.hadoop.hbase.regionserver.wal.HLog.<init>(HLog.java:408)
> at org.apache.hadoop.hbase.regionserver.wal.HLog.<init>(HLog.java:331)
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateHLog(HRegionServer.java:1229)
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.setupWALAndReplication(HRegionServer.java:1218)
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:937)
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:648)
> at java.lang.Thread.run(Thread.java:680)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-5531) Maven hadoop profile (version 23) needs to be updated with latest 23 snapshot

2012-03-06 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-5531:
-

Assignee: ramkrishna.s.vasudevan

> Maven hadoop profile (version 23) needs to be updated with latest 23 snapshot
> -
>
> Key: HBASE-5531
> URL: https://issues.apache.org/jira/browse/HBASE-5531
> Project: HBase
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.92.2
>Reporter: Laxman
>Assignee: ramkrishna.s.vasudevan
>  Labels: build
> Fix For: 0.92.2, 0.94.0, 0.96.0
>
> Attachments: HBASE-5531-trunk.patch, HBASE-5531.patch
>
>
> The current profile is still pointing to 0.23.1-SNAPSHOT. 
> This fails to build because 0.23.1 has already been released and its snapshot 
> is not available anymore.
> We can update this to 0.23.2-SNAPSHOT.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-5531) Maven hadoop profile (version 23) needs to be updated with latest 23 snapshot

2012-03-06 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-5531:
-

Assignee: Laxman  (was: ramkrishna.s.vasudevan)

> Maven hadoop profile (version 23) needs to be updated with latest 23 snapshot
> -
>
> Key: HBASE-5531
> URL: https://issues.apache.org/jira/browse/HBASE-5531
> Project: HBase
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.92.2
>Reporter: Laxman
>Assignee: Laxman
>  Labels: build
> Fix For: 0.92.2, 0.94.0, 0.96.0
>
> Attachments: HBASE-5531-trunk.patch, HBASE-5531.patch
>
>
> The current profile is still pointing to 0.23.1-SNAPSHOT. 
> This fails to build because 0.23.1 has already been released and its snapshot 
> is not available anymore.
> We can update this to 0.23.2-SNAPSHOT.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-5510) Change in LB.randomAssignment(List servers) API

2012-03-02 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-5510:
-

Assignee: ramkrishna.s.vasudevan

> Change in LB.randomAssignment(List servers) API
> ---
>
> Key: HBASE-5510
> URL: https://issues.apache.org/jira/browse/HBASE-5510
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.92.0
>Reporter: Anoop Sam John
>Assignee: ramkrishna.s.vasudevan
>
> In the LB there is a randomAssignment(List servers) API which will be 
> used by the AM to assign a region from a downed RS. [This will also be used 
> in other cases, like a call to the assign() API from a client.]
> I feel it would be better to also pass the HRegionInfo into this method. 
> When the LB is making a choice for a region assignment because one RS is 
> down, it would be nice if the LB knew for which region it is doing this 
> server selection.
> +Scenario+
> While one RS is down, we want the regions to get moved to other RSs, but with 
> a set of regions staying together. We have a custom load balancer, but with 
> the current LB interface this is not possible. Another way is to allow a 
> random assignment of the regions at RS-down time and later, with a cluster 
> balance, rebalance the regions as needed. But this might make regions get 
> assigned first to one RS and then move again to another, and for some period 
> my business use case cannot be satisfied.
> I have also seen an issue in JIRA which speaks about making sure that the 
> Root and META regions always sit on some specific RSs. With the current LB 
> API this won't be possible in the future.
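
A minimal sketch of the proposed API change, with stand-in types for the HBase region and server classes; the only point is the extra region parameter:

{code}
import java.util.List;

// Stand-ins for the real HBase types, to keep the sketch self-contained.
interface RegionInfoStandIn {}
interface ServerNameStandIn {}

// Sketch of the proposed API: also pass the region being assigned, so a custom
// balancer can make a per-region choice (e.g. keep a group of regions
// together, or pin ROOT/META to specific servers).
interface LoadBalancerSketch {
  ServerNameStandIn randomAssignment(RegionInfoStandIn region,
                                     List<ServerNameStandIn> servers);
}
{code}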

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-5323) Need to handle assertion error while splitting log through ServerShutDownHandler by shutting down the master

2012-02-03 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-5323:
-

Assignee: ramkrishna.s.vasudevan

> Need to handle assertion error while splitting log through 
> ServerShutDownHandler by shutting down the master
> 
>
> Key: HBASE-5323
> URL: https://issues.apache.org/jira/browse/HBASE-5323
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.5
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 0.94.0, 0.90.7
>
>
> We know that while parsing the HLog we expect the proper length from HDFS.
> In WALReaderFSDataInputStream
> {code}
>   assert(realLength >= this.length);
> {code}
> We bail out if the above condition is not satisfied.  But if 
> SSH.splitLog() hits this problem, it lands in the run method of 
> EventHandler.  This kills the SSH thread and so further assignment does not 
> happen.  If ROOT and META are to be assigned, they cannot be.
> I think in this condition we should abort the master by catching such errors.
> Please do suggest.
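
A minimal sketch of the handling suggested above, assuming the shutdown handler can reach a master abort hook; all names are illustrative stand-ins:

{code}
// Sketch: if log splitting dies with an Error (e.g. the length assertion),
// abort the master explicitly instead of silently losing the SSH thread and
// leaving ROOT and META unassigned.
final class SplitLogAbortSketch {
  interface MasterServices {                 // stand-in abort hook
    void abort(String why, Throwable cause);
  }

  static void splitLogSafely(MasterServices master, Runnable splitLog) {
    try {
      splitLog.run();
    } catch (Throwable t) {
      master.abort("Log splitting failed in ServerShutdownHandler", t);
    }
  }
}
{code}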

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-5120) Timeout monitor races with table disable handler

2012-01-11 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-5120:
-

Assignee: ramkrishna.s.vasudevan

> Timeout monitor races with table disable handler
> 
>
> Key: HBASE-5120
> URL: https://issues.apache.org/jira/browse/HBASE-5120
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Zhihong Yu
>Assignee: ramkrishna.s.vasudevan
>Priority: Blocker
> Fix For: 0.94.0, 0.92.1
>
> Attachments: HBASE-5120.patch, HBASE-5120_1.patch, 
> HBASE-5120_2.patch, HBASE-5120_3.patch, HBASE-5120_4.patch, HBASE-5120_5.patch
>
>
> Here is what J-D described here:
> https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176
> I think I will retract my statement that it "used to be extremely racy 
> and caused more troubles than it fixed"; on my first test I got a stuck 
> region in transition instead of being able to recover. The timeout was set to 
> 2 minutes to be sure I hit it.
> First the region gets closed
> {quote}
> 2012-01-04 00:16:25,811 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
> sv4r5s38,62023,1325635980913 for region 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> {quote}
> 2 minutes later it times out:
> {quote}
> 2012-01-04 00:18:30,026 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
> out:  test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> state=PENDING_CLOSE, ts=1325636185810, server=null
> 2012-01-04 00:18:30,026 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
> PENDING_CLOSE for too long, running forced unassign again on 
> region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> 2012-01-04 00:18:30,027 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
> region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> (offlining)
> {quote}
> 100ms later the master finally gets the event:
> {quote}
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, 
> region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
> event for 1a4b111bcc228043e89f59c4c3f6a791
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so 
> deleting ZK node and removing from regions in transition, skipping assignment 
> of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Deleting existing unassigned node for 
> 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED
> 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Successfully deleted unassigned node for 
> region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED
> {quote}
> At this point everything is fine, the region was processed as closed. But 
> wait, remember that line where it said it was going to force an unassign?
> {quote}
> 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Creating unassigned node for 
> 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state
> 2012-01-04 00:18:30,328 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Server null returned 
> java.lang.NullPointerException: Passed server is null for 
> 1a4b111bcc228043e89f59c4c3f6a791
> {quote}
> Now the master is confused, it recreated the RIT znode but the region doesn't 
> even exist anymore. It even tries to shut it down but is blocked by NPEs. Now 
> this is what's going on.
> The late ZK notification that the znode was deleted (but it got recreated 
> after):
> {quote}
> 2012-01-04 00:19:33,285 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: The znode of region 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. has been 
> deleted.
> {quote}
> Then it prints this, and much later tries to unassign it again:
> {quote}
> 2012-01-04 00:19:46,607 DEBUG 
> org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
> to clear regions in transition; 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> state=PENDING_CLOSE, ts=1325636310328, server=null
> ...
> 2012-01-04 00:20:39,623 DEBUG 
> org.apache.hadoop.hbase.master.handler.DeleteTable

[jira] [Assigned] (HBASE-4988) MetaServer crash cause all splitting regionserver abort

2012-01-09 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-4988:
-

Assignee: chunhui shen

> MetaServer crash cause all splitting regionserver abort
> ---
>
> Key: HBASE-4988
> URL: https://issues.apache.org/jira/browse/HBASE-4988
> Project: HBase
>  Issue Type: Bug
>Reporter: chunhui shen
>Assignee: chunhui shen
> Attachments: hbase-4988v1.patch
>
>
> If the meta server crashes now,
> all the splitting regionservers will abort themselves,
> because of this code:
> {code}
> this.journal.add(JournalEntry.PONR);
> MetaEditor.offlineParentInMeta(server.getCatalogTracker(),
> this.parent.getRegionInfo(), a.getRegionInfo(), 
> b.getRegionInfo());
> {code}
> If the JournalEntry is PONR, the split's rollback will abort the regionserver.
> This is terrible in a heavy-write environment when the meta server crashes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-5121) MajorCompaction may affect scan's correctness

2012-01-04 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-5121:
-

Assignee: chunhui shen

> MajorCompaction may affect scan's correctness
> -
>
> Key: HBASE-5121
> URL: https://issues.apache.org/jira/browse/HBASE-5121
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: chunhui shen
>Assignee: chunhui shen
> Attachments: hbase-5121.patch
>
>
> In our test there are keyvalues for two families in one row.
> We found an infrequent problem in scan's next() if a majorCompaction happens 
> concurrently.
> In two consecutive calls to scan.next() from the client:
> 1. The first next() returns a result where family A is null.
> 2. The second next() returns a result where family B is null.
> The two next() results have the same row.
> If there are more families, I think the scenario gets even stranger...
> We found the reason is that storescanner.peek() changes after the 
> majorCompaction if there are delete-type KeyValues.
> This change means the PriorityQueue backing the RegionScanner's heap 
> is no longer guaranteed to be sorted.
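
The last point is the core of the problem: a heap orders elements only when they are inserted, so if the key an element compares by changes in place (here, the store scanner's peek() after a major compaction), the heap silently stops being sorted. A minimal, self-contained illustration with a plain java.util.PriorityQueue:

{code}
import java.util.Comparator;
import java.util.PriorityQueue;

final class HeapMutationDemo {
  static final class Box { int key; Box(int k) { key = k; } }

  public static void main(String[] args) {
    PriorityQueue<Box> heap =
        new PriorityQueue<>(Comparator.comparingInt((Box b) -> b.key));
    Box a = new Box(1);
    heap.add(a);
    heap.add(new Box(2));
    heap.add(new Box(3));

    // Mutate an element's key in place, the way a major compaction changes
    // what a store scanner's peek() returns: the heap is NOT re-ordered.
    a.key = 10;

    while (!heap.isEmpty()) {
      System.out.println(heap.poll().key);   // prints 10, 2, 3 -- not sorted
    }
  }
}
{code}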

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-5097) RegionObserver implementation whose preScannerOpen and postScannerOpen Impl return null can stall the system initialization through NPE

2011-12-28 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-5097:
-

Assignee: ramkrishna.s.vasudevan

> RegionObserver implementation whose preScannerOpen and postScannerOpen Impl 
> return null can stall the system initialization through NPE
> ---
>
> Key: HBASE-5097
> URL: https://issues.apache.org/jira/browse/HBASE-5097
> Project: HBase
>  Issue Type: Bug
>  Components: coprocessors
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>
> In HRegionServer.java openScanner()
> {code}
>   r.prepareScanner(scan);
>   RegionScanner s = null;
>   if (r.getCoprocessorHost() != null) {
> s = r.getCoprocessorHost().preScannerOpen(scan);
>   }
>   if (s == null) {
> s = r.getScanner(scan);
>   }
>   if (r.getCoprocessorHost() != null) {
> s = r.getCoprocessorHost().postScannerOpen(scan, s);
>   }
> {code}
> If the postScannerOpen implementation returns null, the RegionScanner is null 
> and we get a NullPointerException:
> {code}
> java.lang.NullPointerException
>   at 
> java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.addScanner(HRegionServer.java:2282)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2272)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
>   at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326)
> {code}
> Marking this defect as a blocker.  Please feel free to change the priority if 
> I am wrong.  Also correct me if my way of trying out coprocessors without 
> implementing postScannerOpen is wrong.  I am just a learner.
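
A minimal sketch of the null guard this implies, keeping the pre/post hook shape of the quoted openScanner() code; the interfaces are illustrative stand-ins for the coprocessor host:

{code}
import java.io.IOException;

final class ScannerOpenSketch {
  interface Scanner {}
  interface CoprocessorHooks {                 // stand-in for the host
    Scanner preScannerOpen() throws IOException;
    Scanner postScannerOpen(Scanner s) throws IOException;
  }

  static Scanner openScanner(CoprocessorHooks hooks, Scanner defaultScanner)
      throws IOException {
    Scanner s = hooks.preScannerOpen();
    if (s == null) {
      s = defaultScanner;                      // r.getScanner(scan)
    }
    s = hooks.postScannerOpen(s);
    if (s == null) {
      // A RegionObserver returned null from postScannerOpen: fail with a clear
      // message instead of NPE-ing later in addScanner().
      throw new IOException(
          "Coprocessor postScannerOpen returned a null scanner");
    }
    return s;
  }
}
{code}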

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-5086) Reopening a region on a RS can leave it in PENDING_OPEN

2011-12-23 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-5086:
-

Assignee: ramkrishna.s.vasudevan

> Reopening a region on a RS can leave it in PENDING_OPEN
> ---
>
> Key: HBASE-5086
> URL: https://issues.apache.org/jira/browse/HBASE-5086
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Jean-Daniel Cryans
>Assignee: ramkrishna.s.vasudevan
> Fix For: 0.92.1
>
>
> I got this twice during the same test.
> If the region servers are slow enough and you run an online alter, it's 
> possible for the RS to change the znode status to CLOSED and have the master 
> send an OPEN before the region server is able to remove the region from its 
> list of RITs.
> This is what the master sees:
> {quote}
> 011-12-21 22:24:09,498 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
> region test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. 
> (offlining)
> 2011-12-21 22:24:09,498 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db033f7 Creating unassigned node for 
> 43123e2e3fc83ec25fe2a76b4f09077f in a CLOSING state
> 2011-12-21 22:24:09,524 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
> sv4r25s44,62023,1324494325099 for region 
> test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
> 2011-12-21 22:24:15,656 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_CLOSED, server=sv4r25s44,62023,1324494325099, 
> region=43123e2e3fc83ec25fe2a76b4f09077f
> 2011-12-21 22:24:15,656 DEBUG 
> org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
> event for 43123e2e3fc83ec25fe2a76b4f09077f
> 2011-12-21 22:24:15,656 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
> was=test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. 
> state=CLOSED, ts=1324506255629, server=sv4r25s44,62023,1324494325099
> 2011-12-21 22:24:15,656 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db033f7 Creating (or updating) unassigned node for 
> 43123e2e3fc83ec25fe2a76b4f09077f with OFFLINE state
> 2011-12-21 22:24:15,663 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for 
> test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. destination 
> server is + sv4r25s44,62023,1324494325099
> 2011-12-21 22:24:15,663 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for 
> region test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.; 
> plan=hri=test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., 
> src=, dest=sv4r25s44,62023,1324494325099
> 2011-12-21 22:24:15,663 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
> test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. to 
> sv4r25s44,62023,1324494325099
> 2011-12-21 22:24:15,664 ERROR 
> org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment in: 
> sv4r25s44,62023,1324494325099 due to 
> org.apache.hadoop.hbase.regionserver.RegionAlreadyInTransitionException: 
> Received:OPEN for the 
> region:test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. ,which 
> we are already trying to CLOSE.
> {quote}
> After that the master abandons.
> And the region server:
> {quote}
> 2011-12-21 22:24:09,523 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: Received close region: 
> test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
> 2011-12-21 22:24:09,523 DEBUG 
> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing 
> close of test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
> 2011-12-21 22:24:09,524 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
> Closing test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.: 
> disabling compactions & flushes
> 2011-12-21 22:24:09,524 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
> Running close preflush of 
> test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
> 2011-12-21 22:24:09,524 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
> Started memstore flush for 
> test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., current 
> region memstore size 40.5m
> 2011-12-21 22:24:09,524 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
> Finished snapshotting 
> test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., commencing 
> wait for mvcc, flushsize=42482936
> 2011-12-21 22:24:13,368 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
> Renaming flushed file at 
> hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/.tmp/87d6944c54c7417e9a34a9f9542bcb72

[jira] [Assigned] (HBASE-4951) master process can not be stopped when it is initializing

2011-12-08 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-4951:
-

Assignee: ramkrishna.s.vasudevan

> master process can not be stopped when it is initializing
> -
>
> Key: HBASE-4951
> URL: https://issues.apache.org/jira/browse/HBASE-4951
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.90.3
>Reporter: xufeng
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 0.92.0, 0.90.5
>
>
> It is easy to reproduce with the following steps:
> Step 1: Start the master process (do not start a regionserver process in the 
> cluster). The master will wait for regionservers to check in:
> org.apache.hadoop.hbase.master.ServerManager: Waiting on regionserver(s) to 
> checkin
> Step 2: Stop the master with the shell command bin/hbase master stop.
> Result: the master process will never die, because the catalogTracker.waitForRoot() 
> method blocks until the root region is assigned.
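
A minimal sketch of one way to make that wait stoppable: wait in bounded slices and re-check the stop flag instead of blocking indefinitely; all names are illustrative stand-ins, not the actual CatalogTracker API:

{code}
// Sketch: poll in short, bounded waits and bail out as soon as the master has
// been asked to stop, so "hbase master stop" can take effect while the master
// is still waiting for the root region to be assigned.
final class StoppableWaitSketch {
  interface Stoppable { boolean isStopped(); }
  interface RootTracker {
    /** Waits up to timeoutMs; returns true once root is assigned. */
    boolean waitForRoot(long timeoutMs) throws InterruptedException;
  }

  static boolean waitForRootOrStop(RootTracker tracker, Stoppable master)
      throws InterruptedException {
    while (!master.isStopped()) {
      if (tracker.waitForRoot(1000)) {
        return true;          // root assigned, keep initializing
      }
    }
    return false;             // stop requested before root showed up
  }
}
{code}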

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-4729) Clash between region unassign and splitting kills the master

2011-11-29 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-4729:
-

Assignee: stack  (was: ramkrishna.s.vasudevan)

Assigning this to you, Stack.

> Clash between region unassign and splitting kills the master
> 
>
> Key: HBASE-4729
> URL: https://issues.apache.org/jira/browse/HBASE-4729
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Jean-Daniel Cryans
>Assignee: stack
>Priority: Critical
> Fix For: 0.92.0, 0.94.0
>
> Attachments: 4729-v2.txt, 4729-v3.txt, 4729-v4.txt, 4729-v5.txt, 
> 4729-v6-092.txt, 4729-v6-trunk.txt, 4729.txt
>
>
> I was running an online alter while regions were splitting, and suddenly the 
> master died and left my table half-altered (haven't restarted the master yet).
> What killed the master:
> {quote}
> 2011-11-02 17:06:44,428 FATAL org.apache.hadoop.hbase.master.HMaster: 
> Unexpected ZK exception creating node CLOSING
> org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
> NodeExists for /hbase/unassigned/f7e1783e65ea8d621a4bc96ad310f101
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:110)
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
> at 
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:459)
> at 
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:441)
> at 
> org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndWatch(ZKUtil.java:769)
> at 
> org.apache.hadoop.hbase.zookeeper.ZKAssign.createNodeClosing(ZKAssign.java:568)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1722)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1661)
> at org.apache.hadoop.hbase.master.BulkReOpen$1.run(BulkReOpen.java:69)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> {quote}
> A znode was created because the region server was splitting the region 4 
> seconds before:
> {quote}
> 2011-11-02 17:06:40,704 INFO 
> org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of 
> region TestTable,0012469153,1320253135043.f7e1783e65ea8d621a4bc96ad310f101.
> 2011-11-02 17:06:40,704 DEBUG 
> org.apache.hadoop.hbase.regionserver.SplitTransaction: 
> regionserver:62023-0x132f043bbde0710 Creating ephemeral node for 
> f7e1783e65ea8d621a4bc96ad310f101 in SPLITTING state
> 2011-11-02 17:06:40,751 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> regionserver:62023-0x132f043bbde0710 Attempting to transition node 
> f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to 
> RS_ZK_REGION_SPLITTING
> ...
> 2011-11-02 17:06:44,061 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> regionserver:62023-0x132f043bbde0710 Successfully transitioned node 
> f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to 
> RS_ZK_REGION_SPLIT
> 2011-11-02 17:06:44,061 INFO 
> org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the 
> master to process the split for f7e1783e65ea8d621a4bc96ad310f101
> {quote}
> Now that the master is dead the region server is spewing those last two lines 
> like mad.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-4880) Region is on service before completing openRegionHanlder, may cause data loss

2011-11-27 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-4880:
-

Assignee: chunhui shen

> Region is on service before completing openRegionHanlder, may cause data loss
> -
>
> Key: HBASE-4880
> URL: https://issues.apache.org/jira/browse/HBASE-4880
> Project: HBase
>  Issue Type: Bug
>Reporter: chunhui shen
>Assignee: chunhui shen
> Attachments: hbase-4880.patch
>
>
> OpenRegionHandler in the regionserver proceeds through the following steps:
> {code}
> 1. openregion() (through it, closed = false, closing = false)
> 2. addToOnlineRegions(region)
> 3. update .META. table 
> 4. update ZK's node state to RS_ZK_REGION_OPENED
> {code}
> We can see that the region is in service before step 4.
> That means a client could put data to this region after step 3.
> What happens if step 4 fails?
> OpenRegionHandler#cleanupFailedOpen will be executed, which closes the 
> region, and the master assigns this region to another regionserver.
> If closing the region fails, the data put between step 3 and step 4 
> may be lost, because the region has been opened on another regionserver and 
> had new data put into it. Therefore, the data may not be recovered through 
> replayRecoveredEdit(), because the edit's LogSeqId is smaller than the 
> current region SeqId.
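
A minimal sketch of one possible ordering implied by the report (not necessarily the fix that was committed): make the region externally visible only after the ZK transition succeeds; the step methods are stand-ins for the real handler steps:

{code}
// Sketch: hold back the externally visible steps (online list, .META. update)
// until the RS_ZK_REGION_OPENED transition succeeds, so no client write can
// land in a window where the open may still be rolled back.
final class OpenOrderingSketch {
  interface Steps {                        // stand-ins for the handler steps
    void openRegion();                     // closed = false, closing = false
    boolean transitionZkToOpened();        // step 4 in the report
    void addToOnlineRegions();             // step 2
    void updateMeta();                     // step 3
    void cleanupFailedOpen();
  }

  static void process(Steps s) {
    s.openRegion();
    if (!s.transitionZkToOpened()) {
      s.cleanupFailedOpen();               // nothing could have been written yet
      return;
    }
    s.addToOnlineRegions();
    s.updateMeta();
  }
}
{code}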

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-4841) If I call split fast enough, while inserting, rows disappear.

2011-11-22 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-4841:
-

Assignee: ramkrishna.s.vasudevan

> If I call split fast enough, while inserting, rows disappear. 
> --
>
> Key: HBASE-4841
> URL: https://issues.apache.org/jira/browse/HBASE-4841
> Project: HBase
>  Issue Type: Bug
>Reporter: Alex Newman
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Attachments: 1, log, log2
>
>
> I'll attach a unit test for this. Basically, if you call split while 
> inserting data, you can get to the point where the cluster becomes 
> unstable or rows will disappear. The unit test gives you some flexibility 
> over:
> - How many rows
> - How wide the rows are
> - The frequency of the split. 
> The default settings crash the unit tests or cause them to fail on my 
> laptop. On my MacBook Air I could actually turn down the number of total 
> rows and the frequency of the splits, which is surprising. I think this is 
> because the MacBook Air has much better IO than my backup Acer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager

2011-11-21 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-4308:
-

Assignee: ramkrishna.s.vasudevan

> Race between RegionOpenedHandler and AssignmentManager
> --
>
> Key: HBASE-4308
> URL: https://issues.apache.org/jira/browse/HBASE-4308
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Todd Lipcon
>Assignee: ramkrishna.s.vasudevan
> Fix For: 0.92.0
>
>
> When the master is processing a ZK event for REGION_OPENED, it calls delete() 
> on the znode before it removes the node from RegionsInTransition. If the 
> notification of that delete comes back into AssignmentManager before the 
> region is removed from RIT, you see an error like:
> 2011-08-30 17:43:29,537 WARN  [main-EventThread] 
> master.AssignmentManager(861): Node deleted but still in RIT: 
> .META.,,1.1028785192 state=OPEN, ts=1314751409532, 
> server=todd-w510,55655,1314751396840
> Not certain if it causes issues, but it's a concerning log message.
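
A minimal sketch of the reordering the description points at: clear the in-memory RIT entry before deleting the znode, so the delete notification can never observe the region still in RIT; the names are illustrative stand-ins:

{code}
// Sketch: by the time the znode-deleted watcher fires, the region should
// already be gone from the regions-in-transition map.
final class OpenedHandlerOrderSketch {
  interface Master {                       // stand-in for the master internals
    void removeFromRegionsInTransition(String regionName);
    void deleteOpenedZnode(String regionName);
  }

  static void handleOpened(Master master, String regionName) {
    master.removeFromRegionsInTransition(regionName);  // in-memory state first
    master.deleteOpenedZnode(regionName);              // then the ZK delete
  }
}
{code}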

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-4796) Race between SplitRegionHandlers for the same region kills the master

2011-11-15 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-4796:
-

Assignee: ramkrishna.s.vasudevan

> Race between SplitRegionHandlers for the same region kills the master
> -
>
> Key: HBASE-4796
> URL: https://issues.apache.org/jira/browse/HBASE-4796
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Jean-Daniel Cryans
>Assignee: ramkrishna.s.vasudevan
> Fix For: 0.92.0, 0.94.0
>
>
> I just saw that multiple SplitRegionHandlers can be created for the same 
> region because of the RS tickling, but it becomes deadly when more than 1 are 
> trying to delete the znode at the same time:
> {quote}
> 2011-11-16 02:25:28,778 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_SPLIT, server=sv4r7s38,62023,1321410237387, 
> region=f80b6a904048a99ce88d61420b8906d1
> 2011-11-16 02:25:28,780 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_SPLIT, server=sv4r7s38,62023,1321410237387, 
> region=f80b6a904048a99ce88d61420b8906d1
> 2011-11-16 02:25:28,796 DEBUG 
> org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handling SPLIT 
> event for f80b6a904048a99ce88d61420b8906d1; deleting node
> 2011-11-16 02:25:28,798 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x132f043bbde094b Deleting existing unassigned node for 
> f80b6a904048a99ce88d61420b8906d1 that is in expected state RS_ZK_REGION_SPLIT
> 2011-11-16 02:25:28,804 DEBUG 
> org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handling SPLIT 
> event for f80b6a904048a99ce88d61420b8906d1; deleting node
> 2011-11-16 02:25:28,806 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x132f043bbde094b Deleting existing unassigned node for 
> f80b6a904048a99ce88d61420b8906d1 that is in expected state RS_ZK_REGION_SPLIT
> 2011-11-16 02:25:28,821 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x132f043bbde094b Successfully deleted unassigned node for 
> region f80b6a904048a99ce88d61420b8906d1 in expected state RS_ZK_REGION_SPLIT
> 2011-11-16 02:25:28,821 INFO 
> org.apache.hadoop.hbase.master.handler.SplitRegionHandler: Handled SPLIT 
> report); 
> parent=TestTable,006304,1321409743253.f80b6a904048a99ce88d61420b8906d1. 
> daughter 
> a=TestTable,006304,1321410325564.e0f5d201683bcabe14426817224334b8.daughter
>  b=TestTable,007054,1321410325564.1b82eeb5d230c47ccc51c08256134839.
> 2011-11-16 02:25:28,829 WARN 
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node 
> /hbase/unassigned/f80b6a904048a99ce88d61420b8906d1 already deleted, and this 
> is not a retry
> 2011-11-16 02:25:28,830 FATAL org.apache.hadoop.hbase.master.HMaster: Error 
> deleting SPLIT node in ZK for transition ZK node 
> (f80b6a904048a99ce88d61420b8906d1)
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
> NoNode for /hbase/unassigned/f80b6a904048a99ce88d61420b8906d1
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>   at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:728)
>   at 
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:107)
>   at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:884)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZKAssign.deleteNode(ZKAssign.java:506)
>   at 
> org.apache.hadoop.hbase.zookeeper.ZKAssign.deleteNode(ZKAssign.java:453)
>   at 
> org.apache.hadoop.hbase.master.handler.SplitRegionHandler.process(SplitRegionHandler.java:95)
>   at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:168)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
> {quote}
> Stack and I came up with the solution that we just need to manage that 
> exception, because handleSplitReport is an in-memory thing.
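
A minimal sketch of that handling: treat a missing SPLIT znode as already processed rather than fatal; the exception and delete hook are stand-ins for the real ZKAssign call:

{code}
// Sketch: two SplitRegionHandlers may race to delete the same SPLIT znode; the
// loser should just note that the node is already gone, not abort the master.
final class SplitNodeDeleteSketch {
  static class NoNodeException extends Exception {}   // stand-in for ZK's error
  interface ZkDelete {
    void deleteSplitNode(String region) throws NoNodeException;
  }

  static void deleteSplitNodeQuietly(ZkDelete zk, String region) {
    try {
      zk.deleteSplitNode(region);
    } catch (NoNodeException e) {
      // Another handler for the same split already deleted it: benign.
    }
  }
}
{code}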

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-4729) Race between online altering and splitting kills the master

2011-11-04 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-4729:
-

Assignee: ramkrishna.s.vasudevan

> Race between online altering and splitting kills the master
> ---
>
> Key: HBASE-4729
> URL: https://issues.apache.org/jira/browse/HBASE-4729
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Jean-Daniel Cryans
>Assignee: ramkrishna.s.vasudevan
> Fix For: 0.92.0, 0.94.0
>
>
> I was running an online alter while regions were splitting, and suddenly the 
> master died and left my table half-altered (haven't restarted the master yet).
> What killed the master:
> {quote}
> 2011-11-02 17:06:44,428 FATAL org.apache.hadoop.hbase.master.HMaster: 
> Unexpected ZK exception creating node CLOSING
> org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
> NodeExists for /hbase/unassigned/f7e1783e65ea8d621a4bc96ad310f101
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:110)
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
> at 
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:459)
> at 
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:441)
> at 
> org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndWatch(ZKUtil.java:769)
> at 
> org.apache.hadoop.hbase.zookeeper.ZKAssign.createNodeClosing(ZKAssign.java:568)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1722)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1661)
> at org.apache.hadoop.hbase.master.BulkReOpen$1.run(BulkReOpen.java:69)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> {quote}
> A znode was created because the region server was splitting the region 4 
> seconds before:
> {quote}
> 2011-11-02 17:06:40,704 INFO 
> org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of 
> region TestTable,0012469153,1320253135043.f7e1783e65ea8d621a4bc96ad310f101.
> 2011-11-02 17:06:40,704 DEBUG 
> org.apache.hadoop.hbase.regionserver.SplitTransaction: 
> regionserver:62023-0x132f043bbde0710 Creating ephemeral node for 
> f7e1783e65ea8d621a4bc96ad310f101 in SPLITTING state
> 2011-11-02 17:06:40,751 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> regionserver:62023-0x132f043bbde0710 Attempting to transition node 
> f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to 
> RS_ZK_REGION_SPLITTING
> ...
> 2011-11-02 17:06:44,061 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> regionserver:62023-0x132f043bbde0710 Successfully transitioned node 
> f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to 
> RS_ZK_REGION_SPLIT
> 2011-11-02 17:06:44,061 INFO 
> org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the 
> master to process the split for f7e1783e65ea8d621a4bc96ad310f101
> {quote}
> Now that the master is dead the region server is spewing those last two lines 
> like mad.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-4462) Properly treating SocketTimeoutException

2011-10-17 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-4462:
-

Assignee: ramkrishna.s.vasudevan

> Properly treating SocketTimeoutException
> 
>
> Key: HBASE-4462
> URL: https://issues.apache.org/jira/browse/HBASE-4462
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.4
>Reporter: Jean-Daniel Cryans
>Assignee: ramkrishna.s.vasudevan
> Fix For: 0.90.5
>
>
> SocketTimeoutException is currently treated like any IOE inside of 
> HCM.getRegionServerWithRetries and I think this is a problem. This method 
> should only do retries in cases where we are pretty sure the operation will 
> complete, but with STE we already waited for (by default) 60 seconds and 
> nothing happened.
> I found this while debugging Douglas Campbell's problem on the mailing list 
> where it seemed like he was using the same scanner from multiple threads, but 
> actually it was just the same client doing retries while the first run didn't 
> even finish yet (that's another problem). You could see the first scanner, 
> then up to two other handlers waiting for it to finish in order to run 
> (because of the synchronization on RegionScanner).
> So what should we do? We could treat STE as a DoNotRetryException and let the 
> client deal with it, or we could retry only once.
> There's also the option of having a different behavior for get/put/icv/scan, 
> the issue with operations that modify a cell is that you don't know if the 
> operation completed or not (same when a RS dies hard after completing let's 
> say a Put but just before returning to the client).
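
A minimal sketch of the two options listed above, shaped as a generic retry loop (a stand-in for HCM.getRegionServerWithRetries, not the actual client code):

{code}
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

final class TimeoutRetrySketch {
  // Option 1 from the description: surface the timeout to the caller at once
  // instead of running the full retry schedule after already having waited out
  // the socket timeout.  (Option 2 would be to allow exactly one retry.)
  static <T> T callWithRetries(Callable<T> call, int maxRetries)
      throws Exception {
    IOException lastIoe = null;
    for (int attempt = 0; attempt < maxRetries; attempt++) {
      try {
        return call.call();
      } catch (SocketTimeoutException ste) {
        throw ste;             // do not keep retrying a 60-second timeout
      } catch (IOException ioe) {
        lastIoe = ioe;         // other IOEs still go through the retry loop
      }
    }
    throw lastIoe != null ? lastIoe : new IOException("retries exhausted");
  }
}
{code}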

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-4459) HbaseObjectWritable code is a byte, we will eventually run out of codes

2011-10-12 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-4459:
-

Assignee: ramkrishna.s.vasudevan

> HbaseObjectWritable code is a byte, we will eventually run out of codes
> ---
>
> Key: HBASE-4459
> URL: https://issues.apache.org/jira/browse/HBASE-4459
> Project: HBase
>  Issue Type: Bug
>  Components: io
>Reporter: Jonathan Gray
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 0.92.0
>
>
> There are about 90 classes/codes in HbaseObjectWritable currently and 
> Byte.MAX_VALUE is 127.  In addition, anyone wanting to add custom classes but 
> not break compatibility might want to leave a gap before using codes and 
> that's difficult in such limited space.
> Eventually we should get rid of this pattern that makes compatibility 
> difficult (better client/server protocol handshake) but we should probably at 
> least bump this to a short for 0.94.
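
A minimal sketch of the suggested widening using plain DataOutput/DataInput; the class-code table itself is HBase-internal and not reproduced here:

{code}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

final class ClassCodeSketch {
  // Today: one signed byte on the wire, so at most 127 usable class codes.
  static void writeCodeAsByte(DataOutput out, int code) throws IOException {
    out.writeByte(code);
  }

  // Suggested: a short gives 32767 codes, leaving room for reserved gaps for
  // custom classes without running out of space.
  static void writeCodeAsShort(DataOutput out, int code) throws IOException {
    out.writeShort(code);
  }

  static int readCodeAsShort(DataInput in) throws IOException {
    return in.readShort();
  }
}
{code}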

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira