[jira] [Updated] (HBASE-4695) WAL logs get deleted before region server can fully flush

2011-11-01 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4695:
-

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Applied to branches and trunk.  Thanks for the patch, Gao.

I looked at this for a while to see if I could make a test.  It'd be a 
timing-based thing where we'd check that the logs were not moved before the 
flush had completed.  Punting.

 WAL logs get deleted before region server can fully flush
 -

 Key: HBASE-4695
 URL: https://issues.apache.org/jira/browse/HBASE-4695
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.90.4
Reporter: jack levin
Assignee: gaojinchao
Priority: Blocker
 Fix For: 0.92.0, 0.90.5

 Attachments: HBASE-4695_Branch90_V2.patch, HBASE-4695_Trunk_V2.patch, 
 HBASE-4695_branch90_trial.patch, hbase-4695-0.92.txt


 To replicate the problem, do the following:
 1. Check the /hbase/.logs/ directory to see if you have WAL logs for the 
 region server you are shutting down.
 2. Execute kill pid (where pid is the regionserver pid).
 3. Watch the regionserver log as it starts flushing; you will see how many 
 regions are left to flush:
 09:36:54,665 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting 
 on 489 regions to close
 09:56:35,779 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting 
 on 116 regions to close
 4. Check /hbase/.logs/ -- you will notice that it has disappeared.
 5. Check the namenode logs:
 09:26:41,607 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: 
 ugi=root ip=/10.101.1.5 cmd=delete 
 src=/hbase/.logs/rdaa5.prod.imageshack.com,60020,1319749
 Note that if you kill -9 the RS now and it crashes during the flush, you won't 
 have any WAL logs to replay.  We need to make sure that logs are deleted or 
 moved out only when the RS has fully flushed. Otherwise it's possible to lose 
 data.
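
 Steps 3 to 5 above can be cross-checked mechanically. What follows is only a 
 rough sketch, not part of any attached patch: it assumes each log entry sits 
 on a single line (the excerpts above are wrapped by the mailer), that both 
 logs are from the same day, and that timestamps look like HH:MM:SS,mmm. The 
 function name premature_wal_deletes and both regular expressions are made up 
 for illustration; adjust them to your actual log layout.

```python
import re
from datetime import datetime

# Assumed line shapes, modelled on the excerpts quoted in this issue.
WAITING = re.compile(
    r"(\d{2}:\d{2}:\d{2},\d{3}) INFO \S+HRegionServer: "
    r"Waiting on (\d+) regions to close"
)
DELETE = re.compile(
    r"(\d{2}:\d{2}:\d{2},\d{3}) INFO \S+FSNamesystem\.audit: "
    r".*cmd=delete\s+src=(/hbase/\.logs/\S+)"
)


def parse_time(stamp):
    """Parse an 'HH:MM:SS,mmm' stamp (time of day only, no date)."""
    return datetime.strptime(stamp, "%H:%M:%S,%f")


def premature_wal_deletes(regionserver_lines, namenode_lines):
    """Return (timestamp, src) for each audited delete under /hbase/.logs/
    that is stamped earlier than the last regionserver line still reporting
    one or more regions left to close."""
    waiting = [
        parse_time(m.group(1))
        for m in map(WAITING.search, regionserver_lines)
        if m and int(m.group(2)) > 0
    ]
    if not waiting:
        return []  # no evidence that regions were still closing
    last_waiting = max(waiting)

    hits = []
    for line in namenode_lines:
        m = DELETE.search(line)
        if m and parse_time(m.group(1)) < last_waiting:
            hits.append((m.group(1), m.group(2)))
    return hits
```

 Feed it the lines of the regionserver log and of the namenode audit log; any 
 pair it returns is a WAL directory delete that happened while regions were 
 still waiting to close. Deletes of unrelated paths are ignored.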

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4695) WAL logs get deleted before region server can fully flush

2011-10-31 Thread gaojinchao (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaojinchao updated HBASE-4695:
--

Attachment: HBASE-4695_Trunk_V2.patch

 WAL logs get deleted before region server can fully flush
 -

 Key: HBASE-4695
 URL: https://issues.apache.org/jira/browse/HBASE-4695
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.90.4
Reporter: jack levin
Assignee: gaojinchao
Priority: Blocker
 Fix For: 0.90.5

 Attachments: HBASE-4695_Trunk_V2.patch, 
 HBASE-4695_branch90_trial.patch, hbase-4695-0.92.txt






[jira] [Updated] (HBASE-4695) WAL logs get deleted before region server can fully flush

2011-10-31 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4695:
-

Fix Version/s: 0.92.0

 WAL logs get deleted before region server can fully flush
 -

 Key: HBASE-4695
 URL: https://issues.apache.org/jira/browse/HBASE-4695
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.90.4
Reporter: jack levin
Assignee: gaojinchao
Priority: Blocker
 Fix For: 0.92.0, 0.90.5

 Attachments: HBASE-4695_Trunk_V2.patch, 
 HBASE-4695_branch90_trial.patch, hbase-4695-0.92.txt






[jira] [Updated] (HBASE-4695) WAL logs get deleted before region server can fully flush

2011-10-31 Thread gaojinchao (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaojinchao updated HBASE-4695:
--

Attachment: HBASE-4695_Branch90_V2.patch

 WAL logs get deleted before region server can fully flush
 -

 Key: HBASE-4695
 URL: https://issues.apache.org/jira/browse/HBASE-4695
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.90.4
Reporter: jack levin
Assignee: gaojinchao
Priority: Blocker
 Fix For: 0.92.0, 0.90.5

 Attachments: HBASE-4695_Branch90_V2.patch, HBASE-4695_Trunk_V2.patch, 
 HBASE-4695_branch90_trial.patch, hbase-4695-0.92.txt






[jira] [Updated] (HBASE-4695) WAL logs get deleted before region server can fully flush

2011-10-29 Thread gaojinchao (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaojinchao updated HBASE-4695:
--

Attachment: HBASE-4695_branch90_trial.patch

I will verify this patch when I am back at the office.
If you have time, please review it first.

The patch seems simple.


 WAL logs get deleted before region server can fully flush
 -

 Key: HBASE-4695
 URL: https://issues.apache.org/jira/browse/HBASE-4695
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.90.4
Reporter: jack levin
Assignee: gaojinchao
Priority: Blocker
 Fix For: 0.90.5

 Attachments: HBASE-4695_branch90_trial.patch






[jira] [Updated] (HBASE-4695) WAL logs get deleted before region server can fully flush

2011-10-29 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4695:
--

Status: Patch Available  (was: Open)

Submit for patch testing.

 WAL logs get deleted before region server can fully flush
 -

 Key: HBASE-4695
 URL: https://issues.apache.org/jira/browse/HBASE-4695
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.90.4
Reporter: jack levin
Assignee: gaojinchao
Priority: Blocker
 Fix For: 0.90.5

 Attachments: HBASE-4695_branch90_trial.patch, hbase-4695-0.92.txt






[jira] [Updated] (HBASE-4695) WAL logs get deleted before region server can fully flush

2011-10-29 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4695:
--

Attachment: hbase-4695-0.92.txt

Proposed patch for 0.92

 WAL logs get deleted before region server can fully flush
 -

 Key: HBASE-4695
 URL: https://issues.apache.org/jira/browse/HBASE-4695
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.90.4
Reporter: jack levin
Assignee: gaojinchao
Priority: Blocker
 Fix For: 0.90.5

 Attachments: HBASE-4695_branch90_trial.patch, hbase-4695-0.92.txt

