Hi

As a general rule, you should never run an application daemon as root, since any vulnerability could allow a malicious intruder to gain full control of the system.
The documentation does not advise starting Hadoop as root:
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html#Hadoop_Startup
shows Hadoop being started as a regular user.

I'm puzzled by the fact that root would be barred from accessing a file. The only case I can think of is an NFS mount with root squashing.
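If you want to rule that out, a quick check (just a sketch, using the pid directory you mention below; adjust the path to your setup) is to create a file there as root and look at who ends up owning it:

# run as root on the datanode host
touch /var/run/cluster/hadoop/root_squash_test
ls -l /var/run/cluster/hadoop/root_squash_test   # owner 'nobody' would mean root is squashed
rm -f /var/run/cluster/hadoop/root_squash_test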

And there should be no need to use the 777 bypass as long as you're using the same user to start and stop your daemons.
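Something along these lines should be enough instead of 777 (a sketch, assuming hdfs:hadoop owns the pid directory as you describe and that only the hdfs daemons write their pid files there):

# give the pid directory back to the hdfs user and the hadoop group
chown -R hdfs:hadoop /var/run/cluster/hadoop
chmod 755 /var/run/cluster/hadoop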

Ulul

On 03/03/2015 00:14, Daniel Klinger wrote:

Hi,

thanks for your help. The HADOOP_PID_DIR variable points to /var/run/cluster/hadoop (which has hdfs:hadoop as its owner). Three PID files are created there (datanode, namenode and secure_dn). It looks like the PID was written but there was a read problem.
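For reference, this is roughly what I ran to check which PID files are there and who can read them (path as above; the datanode pid file name is whatever your install uses):

ls -l /var/run/cluster/hadoop/
# the stop command can only work if the user running it can read the matching pid file
cat /var/run/cluster/hadoop/*datanode*.pid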

I did chmod -R 777 on the folder and now the DataNodes are stopped correctly. It only works when I run the start and stop commands as the hdfs user. If I try to start and stop as root (like it's documented in the documentation), I still get the "no datanode to stop" error.

Is it important to start the DN as root? The only thing I noticed is that the secure_dn PID file is not created when I start the DataNode as the hdfs user. Is this a problem?

Greets

DK

*From:* Ulul [mailto:[email protected]]
*Sent:* Monday, 2 March 2015 21:50
*To:* [email protected]
*Subject:* Re: AW: Hadoop 2.6.0 - No DataNode to stop

Hi
The hadoop-daemon.sh script prints "no $command to stop" if it doesn't find the pid file. You should echo the $pid variable and see if you have a correct pid file there.
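If you want to reproduce that check by hand, here is a rough sketch of what the script tests before printing that message (the pid file name is a guess, use the one you actually see in your HADOOP_PID_DIR, which usually defaults to /tmp):

# manual version of the test hadoop-daemon.sh performs on stop
pid="${HADOOP_PID_DIR:-/tmp}/hadoop-hdfs-datanode.pid"   # adjust to your actual pid file
if [ -f "$pid" ] && kill -0 "$(cat "$pid")" 2>/dev/null; then
    echo "pid file found and process $(cat "$pid") can be signalled - stop should work"
else
    echo "this is the case where the script prints 'no datanode to stop'"
fi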
Ulul

On 02/03/2015 13:53, Daniel Klinger wrote:

    Thanks for your help. But unfortunately this didn't do the job.
    Here's the shell script I've written to start my cluster (the
    scripts on the other node only contain the command to start the
    DataNode and the command to start the NodeManager, respectively,
    with the right user (hdfs / yarn)):

    #!/bin/bash

    # Start HDFS -------------------------------------------------------------------------------------------------------------------------

    # Start Namenode
    su - hdfs -c "$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start namenode"
    wait

    # Start all Datanodes
    export HADOOP_SECURE_DN_USER=hdfs
    su - hdfs -c "$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start datanode"
    wait

    ssh [email protected] 'bash startDatanode.sh'
    wait

    # Start Resourcemanager
    su - yarn -c "$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager"
    wait

    # Start Nodemanager on all Nodes
    su - yarn -c "$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start nodemanager"
    wait

    ssh [email protected] 'bash startNodemanager.sh'
    wait

    # Start Proxyserver
    #su - yarn -c "$HADOOP_YARN_HOME/bin/yarn start proxyserver --config $HADOOP_CONF_DIR"
    #wait

    # Start Historyserver
    su - mapred -c "$HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh start historyserver --config $HADOOP_CONF_DIR"
    wait

    This script generates the following output:

    starting namenode, logging to
    /var/log/cluster/hadoop/hadoop-hdfs-namenode-hadoop.klinger.local.out

    starting datanode, logging to
    /var/log/cluster/hadoop/hadoop-hdfs-datanode-hadoop.klinger.local.out

    starting datanode, logging to
    /var/log/cluster/hadoop/hadoop-hdfs-datanode-hadoop-data.klinger.local.out

    starting resourcemanager, logging to
    /var/log/cluster/yarn/yarn-yarn-resourcemanager-hadoop.klinger.local.out

    starting nodemanager, logging to
    /var/log/cluster/yarn/yarn-yarn-nodemanager-hadoop.klinger.local.out

    starting nodemanager, logging to
    /var/log/cluster/yarn/yarn-yarn-nodemanager-hadoop-data.klinger.local.out

    starting historyserver, logging to
    /var/log/cluster/mapred/mapred-mapred-historyserver-hadoop.klinger.local.out

    Here is my stop script and its output:

    #!/bin/bash

    # Stop HDFS ------------------------------------------------------------------------------------------------

    # Stop Namenode
    su - hdfs -c "$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop namenode"

    # Stop all Datanodes
    su - hdfs -c "$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop datanode"

    ssh [email protected] 'bash stopDatanode.sh'

    # Stop Resourcemanager
    su - yarn -c "$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop resourcemanager"

    # Stop Nodemanager on all Hosts
    su - yarn -c "$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop nodemanager"

    ssh [email protected] 'bash stopNodemanager.sh'

    # Stop Proxyserver
    #su - yarn -c "$HADOOP_YARN_HOME/bin/yarn stop proxyserver --config $HADOOP_CONF_DIR"

    # Stop Historyserver
    su - mapred -c "$HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh stop historyserver --config $HADOOP_CONF_DIR"

    stopping namenode

    no datanode to stop

    no datanode to stop

    stopping resourcemanager

    stopping nodemanager

    stopping nodemanager

    nodemanager did not stop gracefully after 5 seconds: killing with
    kill -9

    stopping historyserver

    Is there maybe anything wrong with my commands?

    Greets

    DK

    *From:* Varun Kumar [mailto:[email protected]]
    *Sent:* Monday, 2 March 2015 05:28
    *To:* user
    *Subject:* Re: Hadoop 2.6.0 - No DataNode to stop

    1. Stop the service.

    2. Change the permissions of the log and pid directories once again to hdfs.

    3. Start the service as hdfs.

    This will resolve the issue.
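    A minimal sketch of those three steps for the datanode (the directory paths and the hadoop group are assumptions, take the real ones from your hadoop-env.sh):

    # 1. stop, 2. fix ownership of the log and pid directories, 3. start again as hdfs
    su - hdfs -c "$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop datanode"
    chown -R hdfs:hadoop /var/log/cluster/hadoop /var/run/cluster/hadoop
    su - hdfs -c "$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start datanode"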

    On Sun, Mar 1, 2015 at 6:40 PM, Daniel Klinger
    <[email protected]> wrote:

        Thanks for your answer.

        I put the FQDN of the DataNodes in the slaves file on each
        node (one FQDN per line). Here’s the full DataNode log after
        the start (the log of the other DataNode is exactly the same):

        2015-03-02 00:29:41,841 INFO
        org.apache.hadoop.hdfs.server.datanode.DataNode: registered
        UNIX signal handlers for [TERM, HUP, INT]

        2015-03-02 00:29:42,207 INFO
        org.apache.hadoop.metrics2.impl.MetricsConfig: loaded
        properties from hadoop-metrics2.properties

        2015-03-02 00:29:42,312 INFO
        org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled
        snapshot period at 10 second(s).

        2015-03-02 00:29:42,313 INFO
        org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode
        metrics system started

        2015-03-02 00:29:42,319 INFO
        org.apache.hadoop.hdfs.server.datanode.DataNode: Configured
        hostname is hadoop.klinger.local

        2015-03-02 00:29:42,327 INFO
        org.apache.hadoop.hdfs.server.datanode.DataNode: Starting
        DataNode with maxLockedMemory = 0

        2015-03-02 00:29:42,350 INFO
        org.apache.hadoop.hdfs.server.datanode.DataNode: Opened
        streaming server at /0.0.0.0:50010

        2015-03-02 00:29:42,357 INFO
        org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing
        bandwith is 1048576 bytes/s

        2015-03-02 00:29:42,358 INFO
        org.apache.hadoop.hdfs.server.datanode.DataNode: Number
        threads for balancing is 5

        2015-03-02 00:29:42,458 INFO org.mortbay.log: Logging to
        org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
        org.mortbay.log.Slf4jLog

        2015-03-02 00:29:42,462 INFO
        org.apache.hadoop.http.HttpRequestLog: Http request log for
        http.requests.datanode is not defined

        2015-03-02 00:29:42,474 INFO
        org.apache.hadoop.http.HttpServer2: Added global filter
        'safety'
        (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)

        2015-03-02 00:29:42,476 INFO
        org.apache.hadoop.http.HttpServer2: Added filter
        static_user_filter
        (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter)
        to context datanode

        2015-03-02 00:29:42,476 INFO
        org.apache.hadoop.http.HttpServer2: Added filter
        static_user_filter
        (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter)
        to context logs

        2015-03-02 00:29:42,476 INFO
        org.apache.hadoop.http.HttpServer2: Added filter
        static_user_filter
        (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter)
        to context static

        2015-03-02 00:29:42,494 INFO
        org.apache.hadoop.http.HttpServer2: addJerseyResourcePackage:
        
packageName=org.apache.hadoop.hdfs.server.datanode.web.resources;org.apache.hadoop.hdfs.web.resources,
        pathSpec=/webhdfs/v1/*

        2015-03-02 00:29:42,499 INFO org.mortbay.log: jetty-6.1.26

        2015-03-02 00:29:42,555 WARN org.mortbay.log: Can't reuse
        /tmp/Jetty_0_0_0_0_50075_datanode____hwtdwq, using
        /tmp/Jetty_0_0_0_0_50075_datanode____hwtdwq_3168831075162569402

        2015-03-02 00:29:43,205 INFO org.mortbay.log: Started
        [email protected]:50075

        2015-03-02 00:29:43,635 INFO
        org.apache.hadoop.hdfs.server.datanode.DataNode: dnUserName = hdfs

        2015-03-02 00:29:43,635 INFO
        org.apache.hadoop.hdfs.server.datanode.DataNode: supergroup =
        supergroup

        2015-03-02 00:29:43,802 INFO
        org.apache.hadoop.ipc.CallQueueManager: Using callQueue class
        java.util.concurrent.LinkedBlockingQueue

        2015-03-02 00:29:43,823 INFO org.apache.hadoop.ipc.Server:
        Starting Socket Reader #1 for port 50020

        2015-03-02 00:29:43,875 INFO
        org.apache.hadoop.hdfs.server.datanode.DataNode: Opened IPC
        server at /0.0.0.0:50020

        2015-03-02 00:29:43,913 INFO
        org.apache.hadoop.hdfs.server.datanode.DataNode: Refresh
        request received for nameservices: null

        2015-03-02 00:29:43,953 INFO
        org.apache.hadoop.hdfs.server.datanode.DataNode: Starting
        BPOfferServices for nameservices: <default>

        2015-03-02 00:29:43,973 INFO
        org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool
        <registering> (Datanode Uuid unassigned) service to
        hadoop.klinger.local/10.0.1.148:8020
        starting to offer service

        2015-03-02 00:29:43,981 INFO org.apache.hadoop.ipc.Server: IPC
        Server Responder: starting

        2015-03-02 00:29:43,982 INFO org.apache.hadoop.ipc.Server: IPC
        Server listener on 50020: starting

        2015-03-02 00:29:44,620 INFO
        org.apache.hadoop.hdfs.server.common.Storage: DataNode
        version: -56 and NameNode layout version: -60

        2015-03-02 00:29:44,641 INFO
        org.apache.hadoop.hdfs.server.common.Storage: Lock on
        /cluster/storage/datanode/in_use.lock acquired by nodename
        [email protected]

        2015-03-02 00:29:44,822 INFO
        org.apache.hadoop.hdfs.server.common.Storage: Analyzing
        storage directories for bpid BP-158097147-10.0.1.148-1424966425688

        2015-03-02 00:29:44,822 INFO
        org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled

        2015-03-02 00:29:44,825 INFO
        org.apache.hadoop.hdfs.server.common.Storage: Restored 0 block
        files from trash.

        2015-03-02 00:29:44,829 INFO
        org.apache.hadoop.hdfs.server.datanode.DataNode: Setting up
        storage:
        
nsid=330980018;bpid=BP-158097147-10.0.1.148-1424966425688;lv=-56;nsInfo=lv=-60;cid=CID-a2c81934-b3ce-44aa-b920-436ee2f0d5a7;nsid=330980018;c=0;bpid=BP-158097147-10.0.1.148-1424966425688;dnuuid=a3b6c890-41ca-4bde-855c-015c67e6e0df

        2015-03-02 00:29:44,996 INFO
        org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
        Added new volume: /cluster/storage/datanode/current

        2015-03-02 00:29:44,998 INFO
        org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
        Added volume - /cluster/storage/datanode/current, StorageType:
        DISK

        2015-03-02 00:29:45,035 INFO
        org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
        Registered FSDatasetState MBean

        2015-03-02 00:29:45,057 INFO
        org.apache.hadoop.hdfs.server.datanode.DirectoryScanner:
        Periodic Directory Tree Verification scan starting at
        1425265856057 with interval 21600000

        2015-03-02 00:29:45,064 INFO
        org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
        Adding block pool BP-158097147-10.0.1.148-1424966425688

        2015-03-02 00:29:45,071 INFO
        org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
        Scanning block pool BP-158097147-10.0.1.148-1424966425688 on
        volume /cluster/storage/datanode/current...

        2015-03-02 00:29:45,128 INFO
        org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
        Time taken to scan block pool
        BP-158097147-10.0.1.148-1424966425688 on
        /cluster/storage/datanode/current: 56ms

        2015-03-02 00:29:45,128 INFO
        org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
        Total time to scan all replicas for block pool
        BP-158097147-10.0.1.148-1424966425688: 64ms

        2015-03-02 00:29:45,128 INFO
        org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
        Adding replicas to map for block pool
        BP-158097147-10.0.1.148-1424966425688 on volume
        /cluster/storage/datanode/current...

        2015-03-02 00:29:45,129 INFO
        org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
        Time to add replicas to map for block pool
        BP-158097147-10.0.1.148-1424966425688 on volume
        /cluster/storage/datanode/current: 0ms

        2015-03-02 00:29:45,134 INFO
        org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
        Total time to add all replicas to map: 5ms

        2015-03-02 00:29:45,138 INFO
        org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool
        BP-158097147-10.0.1.148-1424966425688 (Datanode Uuid null)
        service to hadoop.klinger.local/10.0.1.148:8020
        beginning handshake with NN

        2015-03-02 00:29:45,316 INFO
        org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool
        Block pool BP-158097147-10.0.1.148-1424966425688 (Datanode
        Uuid null) service to hadoop.klinger.local/10.0.1.148:8020
        successfully registered with NN

        2015-03-02 00:29:45,316 INFO
        org.apache.hadoop.hdfs.server.datanode.DataNode: For namenode
        hadoop.klinger.local/10.0.1.148:8020
        using DELETEREPORT_INTERVAL of 300000 msec
        BLOCKREPORT_INTERVAL of 21600000msec CACHEREPORT_INTERVAL of
        10000msec Initial delay: 0msec; heartBeatInterval=3000

        2015-03-02 00:29:45,751 INFO
        org.apache.hadoop.hdfs.server.datanode.DataNode: Namenode
        Block pool BP-158097147-10.0.1.148-1424966425688 (Datanode
        Uuid a3b6c890-41ca-4bde-855c-015c67e6e0df) service to
        hadoop.klinger.local/10.0.1.148:8020
        trying to claim ACTIVE state with txid=24

        2015-03-02 00:29:45,751 INFO
        org.apache.hadoop.hdfs.server.datanode.DataNode: Acknowledging
        ACTIVE Namenode Block pool
        BP-158097147-10.0.1.148-1424966425688 (Datanode Uuid
        a3b6c890-41ca-4bde-855c-015c67e6e0df) service to
        hadoop.klinger.local/10.0.1.148:8020

        2015-03-02 00:29:45,883 INFO
        org.apache.hadoop.hdfs.server.datanode.DataNode: Sent 1
        blockreports 0 blocks total. Took 4 msec to generate and 126
        msecs for RPC and NN processing.  Got back commands
        org.apache.hadoop.hdfs.server.protocol.FinalizeCommand@3d528774 

        2015-03-02 00:29:45,883 INFO
        org.apache.hadoop.hdfs.server.datanode.DataNode: Got finalize
        command for block pool BP-158097147-10.0.1.148-1424966425688

        2015-03-02 00:29:45,891 INFO org.apache.hadoop.util.GSet:
        Computing capacity for map BlockMap

        2015-03-02 00:29:45,891 INFO org.apache.hadoop.util.GSet: VM
        type       = 64-bit

        2015-03-02 00:29:45,893 INFO org.apache.hadoop.util.GSet: 0.5%
        max memory 966.7 MB = 4.8 MB

        2015-03-02 00:29:45,893 INFO org.apache.hadoop.util.GSet:
        capacity      = 2^19 = 524288 entries

        2015-03-02 00:29:45,894 INFO
        org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner:
        Periodic Block Verification Scanner initialized with interval
        504 hours for block pool BP-158097147-10.0.1.148-1424966425688

        2015-03-02 00:29:45,900 INFO
        org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Added
        bpid=BP-158097147-10.0.1.148-1424966425688 to
        blockPoolScannerMap, new size=1

        dfsadmin -report (called as user hdfs on the NameNode) generated
        the following output. It looks like both DataNodes are available:

        Configured Capacity: 985465716736 (917.79 GB)

        Present Capacity: 929892360192 (866.03 GB)

        DFS Remaining: 929892302848 (866.03 GB)

        DFS Used: 57344 (56 KB)

        DFS Used%: 0.00%

        Under replicated blocks: 0

        Blocks with corrupt replicas: 0

        Missing blocks: 0

        -------------------------------------------------

        Live datanodes (2):

        Name: 10.0.1.148:50010
        (hadoop.klinger.local)

        Hostname: hadoop.klinger.local

        Decommission Status : Normal

        Configured Capacity: 492732858368 (458.89 GB)

        DFS Used: 28672 (28 KB)

        Non DFS Used: 27942051840 (26.02 GB)

        DFS Remaining: 464790777856 (432.87 GB)

        DFS Used%: 0.00%

        DFS Remaining%: 94.33%

        Configured Cache Capacity: 0 (0 B)

        Cache Used: 0 (0 B)

        Cache Remaining: 0 (0 B)

        Cache Used%: 100.00%

        Cache Remaining%: 0.00%

        Xceivers: 1

        Last contact: Mon Mar 02 00:38:00 CET 2015

        Name: 10.0.1.89:50010
        (hadoop-data.klinger.local)

        Hostname: hadoop-data.klinger.local

        Decommission Status : Normal

        Configured Capacity: 492732858368 (458.89 GB)

        DFS Used: 28672 (28 KB)

        Non DFS Used: 27631304704 (25.73 GB)

        DFS Remaining: 465101524992 (433.16 GB)

        DFS Used%: 0.00%

        DFS Remaining%: 94.39%

        Configured Cache Capacity: 0 (0 B)

        Cache Used: 0 (0 B)

        Cache Remaining: 0 (0 B)

        Cache Used%: 100.00%

        Cache Remaining%: 0.00%

        Xceivers: 1

        Last contact: Mon Mar 02 00:37:59 CET 2015

        Any further thoughts?

        Greets

        DK

        *From:* Ulul [mailto:[email protected]]
        *Sent:* Sunday, 1 March 2015 13:12
        *To:* [email protected]
        *Subject:* Re: Hadoop 2.6.0 - No DataNode to stop

        Hi

        Did you check that your slaves file is correct?
        That the datanode process is actually running?
        Did you check its log file?
        That the datanode is available? (dfsadmin -report, through the web UI)

        We need more detail
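        For instance, something along these lines on the datanode host gives a first picture (paths and the hdfs user name are assumptions, adjust to your install):

        cat "$HADOOP_CONF_DIR/slaves"                      # is the node listed?
        ps -fu hdfs | grep -i datanode                     # is the datanode process running?
        tail -n 50 "$HADOOP_PREFIX"/logs/*datanode*.log    # anything suspicious in the log?
        su - hdfs -c "hdfs dfsadmin -report"               # does the namenode report the datanode?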

        Ulul

        On 28/02/2015 22:05, Daniel Klinger wrote:

            Thanks, but I know how to kill a process in Linux. That didn't answer the question of why the command says "no datanode to stop" instead of stopping the DataNode:

            $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop datanode

            *From:* Surbhi Gupta [mailto:[email protected]]
            *Sent:* Saturday, 28 February 2015 20:16
            *To:* [email protected]
            *Subject:* Re: Hadoop 2.6.0 - No DataNode to stop

            Issue jps and get the process id, or try to get the process id of the datanode another way.

            Issue ps -fu with the userid of the user the datanode is running under.

            Then kill the process using kill -9.
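            For example (the hdfs user name is an assumption, use whatever user your datanode runs as):

            jps                                                            # look for a DataNode entry
            DN_PID=$(ps -fu hdfs | grep '[D]ataNode' | awk '{print $2}')   # pid of the datanode process
            kill -9 "$DN_PID"                                              # last resort, as described above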

            On 28 Feb 2015 09:38, "Daniel Klinger"
            <[email protected]> wrote:

                Hello,

                I have used a lot of Hadoop distributions. Now I'm trying to install plain Hadoop on a small "cluster" for testing (2 CentOS VMs: 1 Name+DataNode, 1 DataNode). I followed the instructions on the documentation site: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html.

                I’m starting the Cluster like it is described in the
                Chapter „Operating the Hadoop Cluster“(with different
                users). The starting process works great. The
                PID-Files are created in /var/run and u can see that
                Folders and Files are created in the Data- and
                NameNode folders. I’m getting no errors in the log-files.

                When I try to stop the cluster, all services are stopped (NameNode, ResourceManager etc.). But when I stop the DataNodes I get the message: "No DataNode to stop". The PID file and the in_use.lock file are still there, and if I try to start the DataNode again I get the error that the process is already running. When I stop the DataNode as hdfs instead of root, the PID and in_use.lock files are removed, but I still get the message: "No DataNode to stop".

                What I’m doing wrong?

                Greets

                dk



--
    Regards,

    Varun Kumar.P

