[jira] [Commented] (HAWQ-1504) Namenode hangs during restart of docker environment configured using incubator-hawq/contrib/hawq-docker/

2017-07-17 Thread Shubham Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16091098#comment-16091098
 ] 

Shubham Sharma commented on HAWQ-1504:
--

Submitted [PR 1267|https://github.com/apache/incubator-hawq/pull/1267].
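
The fix amounts to pointing the startup check at the directory the namenode actually writes to. A minimal sketch of that idea follows; the directory variable and the -nonInteractive flag are illustrative and not necessarily the exact change in the PR:

{code}
# start-hdfs.sh (sketch): test the directory the namenode actually writes to,
# and format non-interactively so a format run can never block startup.
NAMEDIR=/tmp/hadoop-hdfs/dfs/name   # assumed effective dfs.namenode.name.dir
if [ ! -d "${NAMEDIR}/current" ]; then
  su -l hdfs -c "hdfs namenode -format -nonInteractive"
fi
{code}

With the check aimed at the real metadata directory, the format only runs for a genuinely fresh container, and restarts simply reuse the existing fsimage and edit logs (see the quoted description below for details).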

> Namenode hangs during restart of docker environment configured using 
> incubator-hawq/contrib/hawq-docker/
> 
>
> Key: HAWQ-1504
> URL: https://issues.apache.org/jira/browse/HAWQ-1504
> Project: Apache HAWQ
>  Issue Type: Bug
>  Components: Command Line Tools
>Reporter: Shubham Sharma
>Assignee: Radar Lei
>Priority: Minor
>
> After setting up an environment using the instructions provided under 
> incubator-hawq/contrib/hawq-docker/, restarting the docker containers causes 
> the namenode to hang, because it attempts a namenode -format on every start.
> Steps to reproduce the issue:
> - Navigate to incubator-hawq/contrib/hawq-docker
> - make stop
> - make start
> - docker exec -it centos7-namenode bash
> - ps -ef | grep java
> The process list shows namenode -format running:
> {code}
> [gpadmin@centos7-namenode data]$ ps -ef | grep java
> hdfs1110  1 00:56 ?00:00:06 
> /etc/alternatives/java_sdk/bin/java -Dproc_namenode -Xmx1000m 
> -Dhdfs.namenode=centos7-namenode -Dhadoop.log.dir=/var/log/hadoop/hdfs 
> -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.5.0.0-1245/hadoop 
> -Dhadoop.id.str= -Dhadoop.root.logger=INFO,console 
> -Djava.library.path=:/usr/hdp/2.5.0.0-1245/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.5.0.0-1245/hadoop/lib/native
>  -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true 
> -Dhadoop.security.logger=INFO,NullAppender 
> org.apache.hadoop.hdfs.server.namenode.NameNode -format
> {code}
> Since namenode -format runs in interactive mode and is waiting for a (Yes/No) 
> response at this point, the namenode remains stuck indefinitely, which makes 
> hdfs unavailable.
> Root cause of the problem:
> In the Dockerfiles under 
> incubator-hawq/contrib/hawq-docker/centos6-docker/hawq-test and 
> incubator-hawq/contrib/hawq-docker/centos7-docker/hawq-test, the docker 
> ENTRYPOINT directive executes entrypoint.sh at container startup.
> entrypoint.sh in turn executes start-hdfs.sh, which performs the following 
> check:
> {code}
> if [ ! -d /tmp/hdfs/name/current ]; then
>   su -l hdfs -c "hdfs namenode -format"
> fi
> {code}
> My assumption is that this check looks for the fsimage and edit logs; if they 
> are not present, the script treats this as a first-time initialization and 
> formats the namenode. However, the path /tmp/hdfs/name/current never exists on 
> the namenode, so the format is triggered on every start.
> From the namenode logs it is clear that the fsimage and edit logs are actually 
> written under /tmp/hadoop-hdfs/dfs/name/current (presumably the Hadoop default 
> of ${hadoop.tmp.dir}/dfs/name, since hadoop.tmp.dir defaults to 
> /tmp/hadoop-${user.name}):
> {code}
> 2017-07-18 00:55:20,892 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: 
> No edit log streams selected.
> 2017-07-18 00:55:20,893 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: 
> Planning to load image: 
> FSImageFile(file=/tmp/hadoop-hdfs/dfs/name/current/fsimage_000,
>  cpktTxId=000)
> 2017-07-18 00:55:20,995 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode: Loading 1 INodes.
> 2017-07-18 00:55:21,064 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: Loaded FSImage 
> in 0 seconds.
> 2017-07-18 00:55:21,065 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: 
> Loaded image for txid 0 from 
> /tmp/hadoop-hdfs/dfs/name/current/fsimage_000
> 2017-07-18 00:55:21,084 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Need to save fs image? 
> false (staleImage=false, haEnabled=false, isRollingUpgrade=false)
> 2017-07-18 00:55:21,084 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 1
> {code}
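> One way to confirm where the metadata actually lives (assuming the image does 
> not override dfs.namenode.name.dir in hdfs-site.xml) is to query the effective 
> setting from inside the namenode container:
> {code}
> # prints the effective dfs.namenode.name.dir; on this image it is expected to
> # point at the /tmp/hadoop-hdfs/dfs/name directory seen in the logs above
> hdfs getconf -confKey dfs.namenode.name.dir
> {code}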
> Thus the wrong path in 
> incubator-hawq/contrib/hawq-docker/centos*-docker/hawq-test/start-hdfs.sh 
> causes the namenode to hang on every restart of the containers, making hdfs 
> unavailable.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HAWQ-1504) Namenode hangs during restart of docker environment configured using incubator-hawq/contrib/hawq-docker/

2017-07-17 Thread Shubham Sharma (JIRA)

[ 
https://issues.apache.org/jira/browse/HAWQ-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16090935#comment-16090935
 ] 

Shubham Sharma commented on HAWQ-1504:
--

Submitting a PR shortly
