[ https://issues.apache.org/jira/browse/HADOOP-9085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503159#comment-13503159 ]
Steve Loughran commented on HADOOP-9085:
----------------------------------------

Pid recycling is a permanent problem with Unix systems; you are correct that something needs to be done. We can't rely on deleting the pid file on a successful shutdown either, as all forms of killing are "successful", even a server reboot.

I don't think the proposed patch would work, as it's still looking for a file {{$pid}} even though it's no longer needed, and that file is also used in the error text. Better to skip the -f check and use {{$curpid}} in the error. Even after that, it's pretty brittle against unintentional command matches.

What we need to do is move away from pid-file liveness tests altogether. There is a far more robust alternative: once started, the service should take an exclusive write lock on a well-known file. When the process dies, the OS releases this lock automatically. I'll open a JIRA on it.

> start namenode failure,bacause pid of namenode pid file is other process pid or thread id before start namenode
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-9085
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9085
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: bin
>    Affects Versions: 2.0.1-alpha, 2.0.3-alpha
>         Environment: NA
>            Reporter: liaowenrui
>             Fix For: 2.0.1-alpha, 2.0.2-alpha, 2.0.3-alpha
>
>
> If, before the namenode is started, the pid in the namenode pid file belongs to another process or is a thread ID, starting the namenode fails. This is because the hadoop-daemon.sh script checks that pid with kill -0 before starting the namenode. When the pid belongs to another process or is a thread ID, kill -0 returns success, which the script takes to mean the namenode is running. In reality, the namenode is not running.
>
> 2338 is the dead namenode's pid
> 2305 is the datanode's pid
> cqn2:/tmp # kill -0 2338
> cqn2:/tmp # ps -wweLo pid,ppid,tid | grep 2338
> 2305 1 2338
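The session above shows why kill -0 is fooled: on Linux, kill(2) accepts the ID of any thread, not only of a process, so kill -0 2338 succeeds because 2338 is now a thread (TID) inside the datanode process 2305. As a minimal sketch, assuming Linux /proc, a check could additionally require the ID to be a thread-group leader by comparing it with the Tgid field of /proc/<id>/status. The function name is_process_leader is hypothetical, not part of hadoop-daemon.sh, and it only narrows the thread-ID case: a recycled PID that now belongs to an unrelated process still passes, which is why the comment above argues for dropping pid-file liveness tests altogether.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of hadoop-daemon.sh. Linux-only: it reads
# /proc/<id>/status. Succeeds only if $1 names a live process whose thread
# group ID equals $1, i.e. a process rather than a non-leader thread.
is_process_leader() {
  local id=$1 tgid
  # Reject empty or non-numeric input before handing it to kill.
  case $id in ''|*[!0-9]*) return 1 ;; esac
  # kill -0 alone also succeeds for a thread ID of another process.
  kill -0 "$id" 2>/dev/null || return 1
  # For a non-leader thread, Tgid is the owning process's PID, not $id.
  tgid=$(awk '/^Tgid:/ { print $2 }' "/proc/$id/status" 2>/dev/null) || return 1
  [ "$tgid" = "$id" ]
}
```

For the session above, is_process_leader 2338 should fail, since /proc/2338/status would report Tgid 2305.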
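The lock-based alternative proposed in the comment above can be sketched in shell, assuming the util-linux flock(1) utility is available (it is not POSIX). The lock path and the function names hold_service_lock and service_running are hypothetical, not anything Hadoop defines. Liveness is then decided by the kernel, which drops a flock(2) lock once every descriptor referring to the locked open file is closed, including when the process is killed, so neither a stale file nor a recycled PID can fool the check.

```shell
#!/usr/bin/env bash
# Sketch only: assumes util-linux flock(1). LOCK_FILE and both function
# names are hypothetical, not defined by Hadoop.
LOCK_FILE=${LOCK_FILE:-/tmp/hadoop-namenode.lock}

# Run by the service (or by its launcher just before it execs the JVM):
# open the lock file on fd 9 and take an exclusive, non-blocking lock.
# The lock belongs to the open file description, so it lasts as long as
# fd 9 stays open in this process or in any process that inherits it.
hold_service_lock() {
  exec 9>"$LOCK_FILE" || return 2
  flock -n 9   # exit status 1 means another instance already holds it
}

# Run by the start/stop script: probe the lock without keeping it.
# Returns 0 if the service is alive, 1 if not, 2 if the probe failed.
service_running() {
  flock -n "$LOCK_FILE" true
  case $? in
    0) return 1 ;;   # we could take the lock, so nobody holds it
    1) return 0 ;;   # lock is held: a live process owns it
    *) echo "cannot probe $LOCK_FILE" >&2; return 2 ;;
  esac
}
```

A launcher could take the lock and then exec the JVM: because exec keeps fd 9 open, the lock then lives exactly as long as the JVM does. One caveat of this design is that any child the service forks also inherits fd 9, so the lock is released only after every holder of that descriptor has exited.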