Liming,

After looking at my config, I think my problem may be that fs.defaultFS is inconsistent between hdfs-site.xml and core-site.xml.
What do hdfs-site.xml and core-site.xml each do, and why is the same setting in two different places?
Or do I just have it there by mistake?

This is what I have in hdfs-site.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>nn1:2181,nn2:2181,nn3:2181</value>
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2,nn3</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>nn1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>nn2:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn3</name>
    <value>nn3:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>nn1:9870</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>nn2:9870</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn3</name>
    <value>nn3:9870</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://nn1:8485;nn2:8485;nn3:8485/mycluster</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/harry/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/hadoop/data/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/hadoop/data/hdfs/datanode</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/hadoop/data/hdfs/journalnode</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address</name>
    <value>nn1:8020</value>
  </property>
  <property>
    <name>dfs.ha.nn.not-become-active-in-safemode</name>
    <value>true</value>
  </property>
</configuration>
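
One way to confirm this kind of mismatch mechanically is to load the <property> entries from each file and report any key that is set to different values. The following is only a minimal Python sketch, not a Hadoop tool: the two strings are abbreviated stand-ins for the files in this message, and the helper names load_properties and find_conflicts are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Abbreviated stand-ins for the two files in this thread (assumption:
# only the properties relevant to the comparison are kept).
HDFS_SITE = """<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://mycluster</value></property>
  <property><name>dfs.nameservices</name><value>mycluster</value></property>
</configuration>"""

CORE_SITE = """<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://nn1:8020</value></property>
</configuration>"""


def load_properties(xml_text):
    """Return a {name: value} dict built from Hadoop-style <property> entries."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def find_conflicts(files):
    """Return {key: {filename: value}} for keys whose values differ between files."""
    seen = {}
    for filename, xml_text in files.items():
        for key, value in load_properties(xml_text).items():
            seen.setdefault(key, {})[filename] = value
    # Keep only keys that carry more than one distinct value.
    return {k: v for k, v in seen.items() if len(set(v.values())) > 1}


conflicts = find_conflicts({"hdfs-site.xml": HDFS_SITE, "core-site.xml": CORE_SITE})
for key, by_file in conflicts.items():
    print(key, by_file)
```

To run it against the real files, the embedded strings could be replaced with the contents read from disk (for example via ET.parse(path).getroot() in place of ET.fromstring). The sketch only reports disagreements; it does not say which file takes precedence at runtime.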

In core-site.xml I have this:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://nn1:8020</value>
  </property>
</configuration>

On Tuesday, October 3, 2023 at 12:54:26 AM PDT, Liming Cui <anyone.cui...@gmail.com> wrote:

Can you show us the configuration files? Maybe I can help you with some suggestions.

On Tue, Oct 3, 2023 at 9:05 AM Harry Jamison <harryjamiso...@yahoo.com.invalid> wrote:

I am trying to set up an HA HDFS cluster, and I am running into a problem.
I am not sure what I am doing wrong. I thought I followed the HA namenode guide, but it is not working.

Apache Hadoop 3.3.6 – HDFS High Availability

I have 2 namenodes, 3 journal nodes, and 3 ZooKeeper nodes.

After some period of time I see the following, and my namenode and journal node die.
I am not sure where the problem is, or how to diagnose what I am doing wrong here. The logging here also does not make sense to me.

Namenode

Serving checkpoints at http://nn1:9870 (org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer)
real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) 0
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 15187
max locked memory           (kbytes, -l) 8192
max memory size             (kbytes, -m) unlimited
open files                          (-n) 1024
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 8192
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 15187
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited
[2023-10-02 23:53:46,693] ERROR RECEIVED SIGNAL 15: SIGTERM (org.apache.hadoop.hdfs.server.namenode.NameNode)
[2023-10-02 23:53:46,701] INFO SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at nn1/192.168.1.159
************************************************************/ (org.apache.hadoop.hdfs.server.namenode.NameNode)

JournalNode

[2023-10-02 23:54:19,162] WARN Journal at nn1/192.168.1.159:8485 has no edit logs (org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer)
real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) 0
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 15187
max locked memory           (kbytes, -l) 8192
max memory size             (kbytes, -m) unlimited
open files                          (-n) 1024
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 8192
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 15187
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited

--
Best
Liming