Why have you set this again at the end of hdfs-site.xml?

<property>
    <name>dfs.namenode.rpc-address</name>
    <value>nn1:8020</value>
  </property>

Remove this and start the NameNode again.
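
A minimal sketch of what I mean, assuming everything else in your hdfs-site.xml stays as you posted it. It is only a fragment: the per-NameNode addresses (the keys ending in .mycluster.nn1, .nn2 and .nn3) are copied from your file, and the plain dfs.namenode.rpc-address entry is simply left out.

```xml
<configuration>
  <!-- Other properties from the posted hdfs-site.xml are unchanged and omitted here. -->

  <!-- Per-NameNode RPC addresses, one for each ID in dfs.ha.namenodes.mycluster. -->
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>nn1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>nn2:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn3</name>
    <value>nn3:8020</value>
  </property>

  <!-- No unsuffixed dfs.namenode.rpc-address property: that entry is removed. -->
</configuration>
```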

Regards
Susheel Kumar
On Tue, 3 Oct 2023, 10:09 pm Harry Jamison,
<harryjamiso...@yahoo.com.invalid> wrote:

> OK here is where I am at now.
>
> When I start the namenodes, they work, but they are all in standby mode.
> When I start my first datanode it seems to kill one of the namenodes (the
> active one I assume)
>
> I am getting 2 different warnings in the namenode
>
> [2023-10-03 09:03:52,162] WARN Unable to initialize
> FileSignerSecretProvider, falling back to use random secrets. Reason: Could
> not read signature secret file: /root/hadoop-http-auth-signature-secret
> (org.apache.hadoop.security.authentication.server.AuthenticationFilter)
>
> [2023-10-03 09:03:52,350] WARN Only one image storage directory
> (dfs.namenode.name.dir) configured. Beware of data loss due to lack of
> redundant storage directories!
> (org.apache.hadoop.hdfs.server.namenode.FSNamesystem)
>
> I am using a journal node, so I am not clear whether I am supposed to have
> multiple dfs.namenode.name.dir directories.
> I thought each namenode has 1 directory.
>
>
> Susheel Kumar Gadalay said that my shared.edits.dir is wrong, but I am
> not clear how it is wrong.
> From here, mine looks right:
>
> https://hadoop.apache.org/docs/r3.3.6/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
>
> This is what is in the logs right before the namenode dies
> [2023-10-03 09:01:22,054] INFO Listener at vmnode3:8020
> (org.apache.hadoop.ipc.Server)
> [2023-10-03 09:01:22,054] INFO Starting Socket Reader #1 for port 8020
> (org.apache.hadoop.ipc.Server)
> [2023-10-03 09:01:22,097] INFO Registered FSNamesystemState,
> ReplicatedBlocksState and ECBlockGroupsState MBeans.
> (org.apache.hadoop.hdfs.server.namenode.FSNamesystem)
> [2023-10-03 09:01:22,119] INFO Number of blocks under construction: 0
> (org.apache.hadoop.hdfs.server.namenode.LeaseManager)
> [2023-10-03 09:01:22,122] INFO Initialized the Default Decommission and
> Maintenance monitor
> (org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminDefaultMonitor)
> [2023-10-03 09:01:22,131] INFO STATE* Leaving safe mode after 0 secs
> (org.apache.hadoop.hdfs.StateChange)
> [2023-10-03 09:01:22,131] INFO STATE* Network topology has 0 racks and 0
> datanodes (org.apache.hadoop.hdfs.StateChange)
> [2023-10-03 09:01:22,131] INFO STATE* UnderReplicatedBlocks has 0 blocks
> (org.apache.hadoop.hdfs.StateChange)
> [2023-10-03 09:01:22,130] INFO Start MarkedDeleteBlockScrubber thread
> (org.apache.hadoop.hdfs.server.blockmanagement.BlockManager)
> [2023-10-03 09:01:22,158] INFO IPC Server Responder: starting
> (org.apache.hadoop.ipc.Server)
> [2023-10-03 09:01:22,159] INFO IPC Server listener on 8020: starting
> (org.apache.hadoop.ipc.Server)
> [2023-10-03 09:01:22,165] INFO NameNode RPC up at: vmnode3/192.168.1.103:8020
> (org.apache.hadoop.hdfs.server.namenode.NameNode)
> [2023-10-03 09:01:22,166] INFO Starting services required for standby
> state (org.apache.hadoop.hdfs.server.namenode.FSNamesystem)
> [2023-10-03 09:01:22,168] INFO Will roll logs on active node every 120
> seconds. (org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer)
> [2023-10-03 09:01:22,171] INFO Starting standby checkpoint thread...
> Checkpointing active NN to possible NNs: [http://vmnode1:9870,
> http://vmnode2:9870]
> Serving checkpoints at http://vmnode3:9870
> (org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer)
> real-time non-blocking time  (microseconds, -R) unlimited
> core file size              (blocks, -c) 0
> data seg size               (kbytes, -d) unlimited
> scheduling priority                 (-e) 0
> file size                   (blocks, -f) unlimited
> pending signals                     (-i) 15187
> max locked memory           (kbytes, -l) 8192
> max memory size             (kbytes, -m) unlimited
> open files                          (-n) 1024
> pipe size                (512 bytes, -p) 8
> POSIX message queues         (bytes, -q) 819200
> real-time priority                  (-r) 0
> stack size                  (kbytes, -s) 8192
> cpu time                   (seconds, -t) unlimited
> max user processes                  (-u) 15187
> virtual memory              (kbytes, -v) unlimited
> file locks                          (-x) unlimited
>
> On Tuesday, October 3, 2023 at 03:54:23 AM PDT, Liming Cui <
> anyone.cui...@gmail.com> wrote:
>
>
> Harry,
>
> Great question.
> I would say that when the same property is set in both core-site.xml and
> hdfs-site.xml, one value ends up overriding the other in some way.
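>
> A rough sketch of a core-site.xml that agrees with your hdfs-site.xml,
> assuming the nameservice ID stays mycluster and that you follow the HA
> guide's placement of fs.defaultFS and ha.zookeeper.quorum in core-site.xml.
> The values are copied from the hdfs-site.xml you posted; treat it as a
> starting point rather than a verified fix.
>
> ```xml
> <configuration>
>   <!-- Point clients at the logical nameservice, not at a single NameNode. -->
>   <property>
>     <name>fs.defaultFS</name>
>     <value>hdfs://mycluster</value>
>   </property>
>   <!-- ZooKeeper quorum used for automatic failover, as in the posted config. -->
>   <property>
>     <name>ha.zookeeper.quorum</name>
>     <value>nn1:2181,nn2:2181,nn3:2181</value>
>   </property>
> </configuration>
> ```
>
> With this in place, fs.defaultFS would be defined once and the two files
> would no longer disagree.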
>
> Glad you found the root cause.
>
> Keep going.
>
> On Tue, Oct 3, 2023 at 10:27 AM Harry Jamison <harryjamiso...@yahoo.com>
> wrote:
>
> Liming
>
> After looking at my config, I think my problem may be that fs.defaultFS
> is inconsistent between hdfs-site.xml and core-site.xml.
> What do hdfs-site.xml and core-site.xml each do, and why is the same
> setting in 2 different places?
> Or do I just have it there mistakenly?
>
> This is what I have in hdfs-site.xml:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> <configuration>
>   <property>
>     <name>fs.defaultFS</name>
>     <value>hdfs://mycluster</value>
>   </property>
>   <property>
>     <name>ha.zookeeper.quorum</name>
>     <value>nn1:2181,nn2:2181,nn3:2181</value>
>   </property>
>
>   <property>
>     <name>dfs.nameservices</name>
>     <value>mycluster</value>
>   </property>
>
>   <property>
>     <name>dfs.ha.namenodes.mycluster</name>
>     <value>nn1,nn2,nn3</value>
>   </property>
>
>   <property>
>     <name>dfs.namenode.rpc-address.mycluster.nn1</name>
>     <value>nn1:8020</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.mycluster.nn2</name>
>     <value>nn2:8020</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.mycluster.nn3</name>
>     <value>nn3:8020</value>
>   </property>
>
>   <property>
>     <name>dfs.namenode.http-address.mycluster.nn1</name>
>     <value>nn1:9870</value>
>   </property>
>   <property>
>     <name>dfs.namenode.http-address.mycluster.nn2</name>
>     <value>nn2:9870</value>
>   </property>
>   <property>
>     <name>dfs.namenode.http-address.mycluster.nn3</name>
>     <value>nn3:9870</value>
>   </property>
>
>   <property>
>     <name>dfs.namenode.shared.edits.dir</name>
>     <value>qjournal://nn1:8485;nn2:8485;nn3:8485/mycluster</value>
>   </property>
>   <property>
>     <name>dfs.client.failover.proxy.provider.mycluster</name>
>     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>   </property>
>
>   <property>
>     <name>dfs.ha.fencing.methods</name>
>     <value>sshfence</value>
>   </property>
>
>   <property>
>     <name>dfs.ha.fencing.ssh.private-key-files</name>
>     <value>/home/harry/.ssh/id_rsa</value>
>   </property>
>
>   <property>
>     <name>dfs.namenode.name.dir</name>
>     <value>file:/hadoop/data/hdfs/namenode</value>
>   </property>
>   <property>
>     <name>dfs.datanode.data.dir</name>
>     <value>file:/hadoop/data/hdfs/datanode</value>
>   </property>
>   <property>
>     <name>dfs.journalnode.edits.dir</name>
>     <value>/hadoop/data/hdfs/journalnode</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address</name>
>     <value>nn1:8020</value>
>   </property>
>
>   <property>
>     <name>dfs.ha.nn.not-become-active-in-safemode</name>
>     <value>true</value>
>   </property>
>
> </configuration>
>
>
>
> In core-site.xml I have this
>
> <?xml version="1.0" encoding="UTF-8"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> <!--
>   Licensed under the Apache License, Version 2.0 (the "License");
>   you may not use this file except in compliance with the License.
>   You may obtain a copy of the License at
>
>     http://www.apache.org/licenses/LICENSE-2.0
>
>   Unless required by applicable law or agreed to in writing, software
>   distributed under the License is distributed on an "AS IS" BASIS,
>   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>   See the License for the specific language governing permissions and
>   limitations under the License. See accompanying LICENSE file.
> -->
>
> <!-- Put site-specific property overrides in this file. -->
>
> <configuration>
>   <property>
>     <name>fs.defaultFS</name>
>     <value>hdfs://nn1:8020</value>
>   </property>
> </configuration>
>
>
> On Tuesday, October 3, 2023 at 12:54:26 AM PDT, Liming Cui <
> anyone.cui...@gmail.com> wrote:
>
>
> Can you show us the configuration files?
> Maybe I can help you with some suggestions.
>
>
> On Tue, Oct 3, 2023 at 9:05 AM Harry Jamison
> <harryjamiso...@yahoo.com.invalid> wrote:
>
> I am trying to set up an HA HDFS cluster, and I am running into a problem.
>
> I am not sure what I am doing wrong. I thought I followed the HA namenode
> guide, but it is not working.
>
>
> Apache Hadoop 3.3.6 – HDFS High Availability
> <https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html>
>
>
>
> I have 2 namenodes and 3 journal nodes, and 3 zookeeper nodes.
>
> After some period of time I see the following and my namenode and journal
> node die.
> I am not sure where the problem is, or how to diagnose what I am doing
> wrong here.  And the logging here does not make sense to me.
>
> Namenode
> Serving checkpoints at http://nn1:9870
> (org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer)
> real-time non-blocking time  (microseconds, -R) unlimited
> core file size              (blocks, -c) 0
> data seg size               (kbytes, -d) unlimited
> scheduling priority                 (-e) 0
> file size                   (blocks, -f) unlimited
> pending signals                     (-i) 15187
> max locked memory           (kbytes, -l) 8192
> max memory size             (kbytes, -m) unlimited
> open files                          (-n) 1024
> pipe size                (512 bytes, -p) 8
> POSIX message queues         (bytes, -q) 819200
> real-time priority                  (-r) 0
> stack size                  (kbytes, -s) 8192
> cpu time                   (seconds, -t) unlimited
> max user processes                  (-u) 15187
> virtual memory              (kbytes, -v) unlimited
> file locks                          (-x) unlimited
> [2023-10-02 23:53:46,693] ERROR RECEIVED SIGNAL 15: SIGTERM
> (org.apache.hadoop.hdfs.server.namenode.NameNode)
> [2023-10-02 23:53:46,701] INFO SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NameNode at nn1/192.168.1.159
> ************************************************************/
> (org.apache.hadoop.hdfs.server.namenode.NameNode)
>
> JournalNode
> [2023-10-02 23:54:19,162] WARN Journal at nn1/192.168.1.159:8485 has no
> edit logs (org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer)
> real-time non-blocking time  (microseconds, -R) unlimited
> core file size              (blocks, -c) 0
> data seg size               (kbytes, -d) unlimited
> scheduling priority                 (-e) 0
> file size                   (blocks, -f) unlimited
> pending signals                     (-i) 15187
> max locked memory           (kbytes, -l) 8192
> max memory size             (kbytes, -m) unlimited
> open files                          (-n) 1024
> pipe size                (512 bytes, -p) 8
> POSIX message queues         (bytes, -q) 819200
> real-time priority                  (-r) 0
> stack size                  (kbytes, -s) 8192
> cpu time                   (seconds, -t) unlimited
> max user processes                  (-u) 15187
> virtual memory              (kbytes, -v) unlimited
> file locks                          (-x) unlimited
>
>
>
>
> --
> *Best*
>
> Liming
>
>
>
> --
> *Best*
>
> Liming
>
