Hi,
thanks for your help. The HADOOP_PID_DIR variable points to
/var/run/cluster/hadoop, which has hdfs:hadoop as its owner. Three PID files
are created there (datanode, namenode and secure_dn). It looks like the
PID was written but there was a problem reading it.
I did chmod -R 777 on the folder and now the DataNodes are stopped
correctly. However, it only works when I run the start and stop commands
as user hdfs. If I try to start and stop as root (as described
in the documentation), I still get the "no datanode to stop" error.
Is it important to start the DataNode as root? The only thing I noticed
is that the secure_dn PID file is not created when I start the DataNode
as the hdfs user. Is this a problem?
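If it helps, this is roughly how I compare what is in the PID directory with what the stop command seems to expect (just a sketch; the hadoop-<user>-<command>.pid naming is my assumption, I haven't verified it in hadoop-daemon.sh):
#!/bin/bash
# Sketch: list the pid files and their owners to spot a mismatch between
# the user that wrote them and the user running the stop command.
PID_DIR=/var/run/cluster/hadoop
ls -l "$PID_DIR"
for f in "$PID_DIR"/*.pid; do
  echo "$f -> pid $(cat "$f"), owner $(stat -c %U "$f")"
done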
Greets
DK
*From:* Ulul [mailto:[email protected]]
*Sent:* Monday, March 2, 2015 21:50
*To:* [email protected]
*Subject:* Re: AW: Hadoop 2.6.0 - No DataNode to stop
Hi
The hadoop-daemon.sh script prints "no $command to stop" if it
doesn't find the pid file.
You should echo the $pid variable and see if you have a correct pid
file there.
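For instance, something like this (just a sketch; the fallback to /tmp is only what I remember the default pid dir to be):
# Trace the stop command to see which pid file it resolves:
bash -x $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop datanode 2>&1 | grep -i pid
# Or just look at the pid directory directly:
ls -l "${HADOOP_PID_DIR:-/tmp}"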
Ulul
On 02/03/2015 13:53, Daniel Klinger wrote:
Thanks for your help. But unfortunately this didn't do the job.
Here's the shell script I've written to start my cluster (the
scripts on the other node only contain the command to start the
DataNode and the command to start the NodeManager on the
other node, run as the right user (hdfs / yarn)):
#!/bin/bash
# Start HDFS -------------------------------------------------------------------------------------------------------------------------
# Start Namenode
su - hdfs -c "$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start namenode"
wait
# Start all Datanodes
export HADOOP_SECURE_DN_USER=hdfs
su - hdfs -c "$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start datanode"
wait
ssh [email protected] 'bash startDatanode.sh'
wait
# Start Resourcemanager
su - yarn -c "$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager"
wait
# Start Nodemanager on all Nodes
su - yarn -c "$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start nodemanager"
wait
ssh [email protected] 'bash startNodemanager.sh'
wait
# Start Proxyserver
#su - yarn -c "$HADOOP_YARN_HOME/bin/yarn start proxyserver --config $HADOOP_CONF_DIR"
#wait
# Start Historyserver
su - mapred -c "$HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh start historyserver --config $HADOOP_CONF_DIR"
wait
This script generates the following output:
starting namenode, logging to
/var/log/cluster/hadoop/hadoop-hdfs-namenode-hadoop.klinger.local.out
starting datanode, logging to
/var/log/cluster/hadoop/hadoop-hdfs-datanode-hadoop.klinger.local.out
starting datanode, logging to
/var/log/cluster/hadoop/hadoop-hdfs-datanode-hadoop-data.klinger.local.out
starting resourcemanager, logging to
/var/log/cluster/yarn/yarn-yarn-resourcemanager-hadoop.klinger.local.out
starting nodemanager, logging to
/var/log/cluster/yarn/yarn-yarn-nodemanager-hadoop.klinger.local.out
starting nodemanager, logging to
/var/log/cluster/yarn/yarn-yarn-nodemanager-hadoop-data.klinger.local.out
starting historyserver, logging to
/var/log/cluster/mapred/mapred-mapred-historyserver-hadoop.klinger.local.out
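For what it's worth, right after the start script I check which pid files were written and whether each recorded pid is a live process (a quick sketch, using the pid directory from my config):
ls -l /var/run/cluster/hadoop
for f in /var/run/cluster/hadoop/*.pid; do
  ps -fp "$(cat "$f")" > /dev/null || echo "stale pid file: $f"
done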
Below is my stop script and its output:
#!/bin/bash
# Stop HDFS ------------------------------------------------------------------------------------------------
# Stop Namenode
su - hdfs -c "$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop namenode"
# Stop all Datanodes
su - hdfs -c "$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop datanode"
ssh [email protected] 'bash stopDatanode.sh'
# Stop Resourcemanager
su - yarn -c "$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop resourcemanager"
# Stop Nodemanager on all Hosts
su - yarn -c "$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop nodemanager"
ssh [email protected] 'bash stopNodemanager.sh'
# Stop Proxyserver
#su - yarn -c "$HADOOP_YARN_HOME/bin/yarn stop proxyserver --config $HADOOP_CONF_DIR"
# Stop Historyserver
su - mapred -c "$HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh stop historyserver --config $HADOOP_CONF_DIR"
stopping namenode
no datanode to stop
no datanode to stop
stopping resourcemanager
stopping nodemanager
stopping nodemanager
nodemanager did not stop gracefully after 5 seconds: killing with kill -9
stopping historyserver
Is there maybe anything wrong with my commands?
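If it happens again, I could stop the process by hand from the pid file with something like this (a crude sketch; the pid file name is my guess, and kill -TERM is only my assumption of what the stop would normally do):
PID_FILE=/var/run/cluster/hadoop/hadoop-hdfs-datanode.pid   # hypothetical name
if [ -f "$PID_FILE" ] && kill -0 "$(cat "$PID_FILE")" 2>/dev/null; then
  kill -TERM "$(cat "$PID_FILE")"
else
  echo "no readable pid file, or process already gone"
fi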
Greets
DK
*From:* Varun Kumar [mailto:[email protected]]
*Sent:* Monday, March 2, 2015 05:28
*To:* user
*Subject:* Re: Hadoop 2.6.0 - No DataNode to stop
1. Stop the service.
2. Change the permissions for the log and pid directories once again to hdfs.
3. Start the service as hdfs.
This will resolve the issue.
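For example (a sketch; substitute your actual log and pid directories):
# 1. Stop the service
su - hdfs -c "$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop datanode"
# 2. Give the log and pid directories back to hdfs (adjust the paths)
chown -R hdfs:hadoop "$HADOOP_LOG_DIR" "$HADOOP_PID_DIR"
# 3. Start the service as hdfs again
su - hdfs -c "$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start datanode"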
On Sun, Mar 1, 2015 at 6:40 PM, Daniel Klinger
<[email protected]> wrote:
Thanks for your answer.
I put the FQDN of the DataNodes in the slaves file on each
node (one FQDN per line). Here’s the full DataNode log after
the start (the log of the other DataNode is exactly the same):
2015-03-02 00:29:41,841 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: registered
UNIX signal handlers for [TERM, HUP, INT]
2015-03-02 00:29:42,207 INFO
org.apache.hadoop.metrics2.impl.MetricsConfig: loaded
properties from hadoop-metrics2.properties
2015-03-02 00:29:42,312 INFO
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled
snapshot period at 10 second(s).
2015-03-02 00:29:42,313 INFO
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode
metrics system started
2015-03-02 00:29:42,319 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Configured
hostname is hadoop.klinger.local
2015-03-02 00:29:42,327 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Starting
DataNode with maxLockedMemory = 0
2015-03-02 00:29:42,350 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Opened
streaming server at /0.0.0.0:50010
2015-03-02 00:29:42,357 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing
bandwith is 1048576 bytes/s
2015-03-02 00:29:42,358 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Number
threads for balancing is 5
2015-03-02 00:29:42,458 INFO org.mortbay.log: Logging to
org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
org.mortbay.log.Slf4jLog
2015-03-02 00:29:42,462 INFO
org.apache.hadoop.http.HttpRequestLog: Http request log for
http.requests.datanode is not defined
2015-03-02 00:29:42,474 INFO
org.apache.hadoop.http.HttpServer2: Added global filter
'safety'
(class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2015-03-02 00:29:42,476 INFO
org.apache.hadoop.http.HttpServer2: Added filter
static_user_filter
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter)
to context datanode
2015-03-02 00:29:42,476 INFO
org.apache.hadoop.http.HttpServer2: Added filter
static_user_filter
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter)
to context logs
2015-03-02 00:29:42,476 INFO
org.apache.hadoop.http.HttpServer2: Added filter
static_user_filter
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter)
to context static
2015-03-02 00:29:42,494 INFO
org.apache.hadoop.http.HttpServer2: addJerseyResourcePackage:
packageName=org.apache.hadoop.hdfs.server.datanode.web.resources;org.apache.hadoop.hdfs.web.resources,
pathSpec=/webhdfs/v1/*
2015-03-02 00:29:42,499 INFO org.mortbay.log: jetty-6.1.26
2015-03-02 00:29:42,555 WARN org.mortbay.log: Can't reuse
/tmp/Jetty_0_0_0_0_50075_datanode____hwtdwq, using
/tmp/Jetty_0_0_0_0_50075_datanode____hwtdwq_3168831075162569402
2015-03-02 00:29:43,205 INFO org.mortbay.log: Started
[email protected]:50075
<http://[email protected]:50075>
2015-03-02 00:29:43,635 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: dnUserName = hdfs
2015-03-02 00:29:43,635 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: supergroup =
supergroup
2015-03-02 00:29:43,802 INFO
org.apache.hadoop.ipc.CallQueueManager: Using callQueue class
java.util.concurrent.LinkedBlockingQueue
2015-03-02 00:29:43,823 INFO org.apache.hadoop.ipc.Server:
Starting Socket Reader #1 for port 50020
2015-03-02 00:29:43,875 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Opened IPC
server at /0.0.0.0:50020
2015-03-02 00:29:43,913 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Refresh
request received for nameservices: null
2015-03-02 00:29:43,953 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Starting
BPOfferServices for nameservices: <default>
2015-03-02 00:29:43,973 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool
<registering> (Datanode Uuid unassigned) service to
hadoop.klinger.local/10.0.1.148:8020
starting to offer service
2015-03-02 00:29:43,981 INFO org.apache.hadoop.ipc.Server: IPC
Server Responder: starting
2015-03-02 00:29:43,982 INFO org.apache.hadoop.ipc.Server: IPC
Server listener on 50020: starting
2015-03-02 00:29:44,620 INFO
org.apache.hadoop.hdfs.server.common.Storage: DataNode
version: -56 and NameNode layout version: -60
2015-03-02 00:29:44,641 INFO
org.apache.hadoop.hdfs.server.common.Storage: Lock on
/cluster/storage/datanode/in_use.lock acquired by nodename
[email protected]
2015-03-02 00:29:44,822 INFO
org.apache.hadoop.hdfs.server.common.Storage: Analyzing
storage directories for bpid BP-158097147-10.0.1.148-1424966425688
2015-03-02 00:29:44,822 INFO
org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled
2015-03-02 00:29:44,825 INFO
org.apache.hadoop.hdfs.server.common.Storage: Restored 0 block
files from trash.
2015-03-02 00:29:44,829 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Setting up
storage:
nsid=330980018;bpid=BP-158097147-10.0.1.148-1424966425688;lv=-56;nsInfo=lv=-60;cid=CID-a2c81934-b3ce-44aa-b920-436ee2f0d5a7;nsid=330980018;c=0;bpid=BP-158097147-10.0.1.148-1424966425688;dnuuid=a3b6c890-41ca-4bde-855c-015c67e6e0df
2015-03-02 00:29:44,996 INFO
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
Added new volume: /cluster/storage/datanode/current
2015-03-02 00:29:44,998 INFO
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
Added volume - /cluster/storage/datanode/current, StorageType:
DISK
2015-03-02 00:29:45,035 INFO
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
Registered FSDatasetState MBean
2015-03-02 00:29:45,057 INFO
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner:
Periodic Directory Tree Verification scan starting at
1425265856057 with interval 21600000
2015-03-02 00:29:45,064 INFO
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
Adding block pool BP-158097147-10.0.1.148-1424966425688
2015-03-02 00:29:45,071 INFO
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
Scanning block pool BP-158097147-10.0.1.148-1424966425688 on
volume /cluster/storage/datanode/current...
2015-03-02 00:29:45,128 INFO
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
Time taken to scan block pool
BP-158097147-10.0.1.148-1424966425688 on
/cluster/storage/datanode/current: 56ms
2015-03-02 00:29:45,128 INFO
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
Total time to scan all replicas for block pool
BP-158097147-10.0.1.148-1424966425688: 64ms
2015-03-02 00:29:45,128 INFO
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
Adding replicas to map for block pool
BP-158097147-10.0.1.148-1424966425688 on volume
/cluster/storage/datanode/current...
2015-03-02 00:29:45,129 INFO
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
Time to add replicas to map for block pool
BP-158097147-10.0.1.148-1424966425688 on volume
/cluster/storage/datanode/current: 0ms
2015-03-02 00:29:45,134 INFO
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
Total time to add all replicas to map: 5ms
2015-03-02 00:29:45,138 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool
BP-158097147-10.0.1.148-1424966425688 (Datanode Uuid null)
service to hadoop.klinger.local/10.0.1.148:8020
beginning handshake with NN
2015-03-02 00:29:45,316 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool
Block pool BP-158097147-10.0.1.148-1424966425688 (Datanode
Uuid null) service to hadoop.klinger.local/10.0.1.148:8020
successfully registered with NN
2015-03-02 00:29:45,316 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: For namenode
hadoop.klinger.local/10.0.1.148:8020
using DELETEREPORT_INTERVAL of 300000 msec
BLOCKREPORT_INTERVAL of 21600000msec CACHEREPORT_INTERVAL of
10000msec Initial delay: 0msec; heartBeatInterval=3000
2015-03-02 00:29:45,751 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Namenode
Block pool BP-158097147-10.0.1.148-1424966425688 (Datanode
Uuid a3b6c890-41ca-4bde-855c-015c67e6e0df) service to
hadoop.klinger.local/10.0.1.148:8020
trying to claim ACTIVE state with txid=24
2015-03-02 00:29:45,751 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Acknowledging
ACTIVE Namenode Block pool
BP-158097147-10.0.1.148-1424966425688 (Datanode Uuid
a3b6c890-41ca-4bde-855c-015c67e6e0df) service to
hadoop.klinger.local/10.0.1.148:8020
2015-03-02 00:29:45,883 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Sent 1
blockreports 0 blocks total. Took 4 msec to generate and 126
msecs for RPC and NN processing. Got back commands
org.apache.hadoop.hdfs.server.protocol.FinalizeCommand@3d528774
2015-03-02 00:29:45,883 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Got finalize
command for block pool BP-158097147-10.0.1.148-1424966425688
2015-03-02 00:29:45,891 INFO org.apache.hadoop.util.GSet:
Computing capacity for map BlockMap
2015-03-02 00:29:45,891 INFO org.apache.hadoop.util.GSet: VM
type = 64-bit
2015-03-02 00:29:45,893 INFO org.apache.hadoop.util.GSet: 0.5%
max memory 966.7 MB = 4.8 MB
2015-03-02 00:29:45,893 INFO org.apache.hadoop.util.GSet:
capacity = 2^19 = 524288 entries
2015-03-02 00:29:45,894 INFO
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner:
Periodic Block Verification Scanner initialized with interval
504 hours for block pool BP-158097147-10.0.1.148-1424966425688
2015-03-02 00:29:45,900 INFO
org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Added
bpid=BP-158097147-10.0.1.148-1424966425688 to
blockPoolScannerMap, new size=1
dfsadmin -report (run as user hdfs on the NameNode) generated the
following output. It looks like both DataNodes are available:
Configured Capacity: 985465716736 (917.79 GB)
Present Capacity: 929892360192 (866.03 GB)
DFS Remaining: 929892302848 (866.03 GB)
DFS Used: 57344 (56 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Live datanodes (2):
Name: 10.0.1.148:50010
(hadoop.klinger.local)
Hostname: hadoop.klinger.local
Decommission Status : Normal
Configured Capacity: 492732858368 (458.89 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 27942051840 (26.02 GB)
DFS Remaining: 464790777856 (432.87 GB)
DFS Used%: 0.00%
DFS Remaining%: 94.33%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Mar 02 00:38:00 CET 2015
Name: 10.0.1.89:50010
(hadoop-data.klinger.local)
Hostname: hadoop-data.klinger.local
Decommission Status : Normal
Configured Capacity: 492732858368 (458.89 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 27631304704 (25.73 GB)
DFS Remaining: 465101524992 (433.16 GB)
DFS Used%: 0.00%
DFS Remaining%: 94.39%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Mar 02 00:37:59 CET 2015
Any further thoughts?
Greets
DK
*From:* Ulul [mailto:[email protected]]
*Sent:* Sunday, March 1, 2015 13:12
*To:* [email protected]
*Subject:* Re: Hadoop 2.6.0 - No DataNode to stop
Hi
Did you check that your slaves file is correct?
That the datanode process is actually running?
Did you check its log file?
That the datanode is available? (dfsadmin -report, through the WUI)
We need more details.
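Roughly (a sketch; adjust paths to your setup):
cat "$HADOOP_CONF_DIR/slaves"                          # slaves file correct?
jps | grep -i datanode                                 # process running?
tail -n 50 "$HADOOP_LOG_DIR"/hadoop-*-datanode-*.log   # anything in its log?
hdfs dfsadmin -report                                  # datanode registered?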
Ulul
On 28/02/2015 22:05, Daniel Klinger wrote:
Thanks, but I know how to kill a process in Linux. That didn't
answer the question of why the command says "no datanode to stop" instead of
stopping the DataNode:
$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop datanode
*From:* Surbhi Gupta [mailto:[email protected]]
*Sent:* Saturday, February 28, 2015 20:16
*To:* [email protected]
*Subject:* Re: Hadoop 2.6.0 - No DataNode to stop
Issue jps and get the process id, or
try to get the process id of the datanode by issuing
ps -fu <userid> for the user the datanode is
running as.
Then kill the process using kill -9.
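For example (rough sketch, assuming the datanode runs as the hdfs user):
jps | grep -i datanode                    # note the pid
ps -fu hdfs | grep -i datanode            # or find it by user
kill -9 "$(pgrep -u hdfs -f DataNode)"    # last resort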
On 28 Feb 2015 09:38, "Daniel Klinger"
<[email protected]> wrote:
Hello,
I have used a lot of Hadoop distributions. Now I'm trying
to install a pure Hadoop on a little "cluster" for
testing (2 CentOS VMs: 1 Name+DataNode, 1 DataNode). I
followed the instructions on the documentation site:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html.
I'm starting the cluster as described in the
chapter "Operating the Hadoop Cluster" (with different
users). The starting process works great. The
PID files are created in /var/run and you can see that
folders and files are created in the Data- and
NameNode folders. I'm getting no errors in the log files.
When I try to stop the cluster, all services are
stopped (NameNode, ResourceManager etc.). But when I
stop the DataNodes I get the message: "No
DataNode to stop". The PID file and the
in_use.lock file are still there, and if I try to start
the DataNode again I get the error that the
process is already running. When I stop the DataNode
as hdfs instead of root, the PID and in_use files are
removed, but I still get the message: "No
DataNode to stop".
What am I doing wrong?
Greets
dk
--
Regards,
Varun Kumar.P