Thanks guys, I figured out what my issue was. I had not set up the SSH key
correctly; it was for my user, but I started the service as root.
Now it is working, except that none of the namenodes transitions to active on
startup, and the datanodes are not starting automatically (I think because no
namenode is active).
I can start everything manually, though.
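For what it's worth, from the HA docs my understanding (an assumption on my part, since I do not see this property in my hdfs-site.xml below) is that automatic active election needs automatic failover enabled, plus a ZKFC process on each namenode host. Roughly:

```xml
<!-- hdfs-site.xml: enable automatic failover via ZooKeeper.
     Without this, namenodes come up in standby and stay there
     until one is promoted manually. -->
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
```

After adding it, the ZooKeeper state is initialized once with `hdfs zkfc -formatZK`, and a ZKFC is started on each namenode host with `hdfs --daemon start zkfc`.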

    On Tuesday, October 3, 2023 at 11:03:33 AM PDT, Susheel Kumar Gadalay 
<skgada...@gmail.com> wrote:  
 
Why have you set this again in hdfs-site.xml, at the end?
<property>
  <name>dfs.namenode.rpc-address</name>
  <value>nn1:8020</value>
</property>
Remove this and start the namenode again.
Regards,
Susheel Kumar

On Tue, 3 Oct 2023, 10:09 pm Harry Jamison,
<harryjamiso...@yahoo.com.invalid> wrote:

 OK, here is where I am at now.
When I start the namenodes, they work, but they are all in standby mode. When I
start my first datanode, it seems to kill one of the namenodes (the active one,
I assume).
I am getting two different warnings in the namenode logs:
[2023-10-03 09:03:52,162] WARN Unable to initialize FileSignerSecretProvider, 
falling back to use random secrets. Reason: Could not read signature secret 
file: /root/hadoop-http-auth-signature-secret 
(org.apache.hadoop.security.authentication.server.AuthenticationFilter)

[2023-10-03 09:03:52,350] WARN Only one image storage directory 
(dfs.namenode.name.dir) configured. Beware of data loss due to lack of 
redundant storage directories! 
(org.apache.hadoop.hdfs.server.namenode.FSNamesystem)

I am using a journal node, so I am not clear whether I am supposed to have
multiple dfs.namenode.name.dir directories. I thought each namenode had one
directory.
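From the property description, my reading (untested) is that the warning is only about local redundancy: each namenode can list several comma-separated local directories that all receive the same fsimage/edits, e.g. (paths hypothetical):

```xml
<!-- Hypothetical: two redundant local metadata directories on one namenode.
     The namenode writes identical fsimage/edit files to each. -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///hadoop/data/hdfs/namenode,file:///mnt/backup/hdfs/namenode</value>
</property>
```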

Susheel Kumar Gadalay said that my dfs.namenode.shared.edits.dir is wrong, but
I am not clear how it is wrong. From the docs, mine looks right:
https://hadoop.apache.org/docs/r3.3.6/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html

This is what is in the logs right before the namenode dies:

[2023-10-03 09:01:22,054] INFO Listener at vmnode3:8020 (org.apache.hadoop.ipc.Server)
[2023-10-03 09:01:22,054] INFO Starting Socket Reader #1 for port 8020 (org.apache.hadoop.ipc.Server)
[2023-10-03 09:01:22,097] INFO Registered FSNamesystemState, ReplicatedBlocksState and ECBlockGroupsState MBeans. (org.apache.hadoop.hdfs.server.namenode.FSNamesystem)
[2023-10-03 09:01:22,119] INFO Number of blocks under construction: 0 (org.apache.hadoop.hdfs.server.namenode.LeaseManager)
[2023-10-03 09:01:22,122] INFO Initialized the Default Decommission and Maintenance monitor (org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminDefaultMonitor)
[2023-10-03 09:01:22,131] INFO STATE* Leaving safe mode after 0 secs (org.apache.hadoop.hdfs.StateChange)
[2023-10-03 09:01:22,131] INFO STATE* Network topology has 0 racks and 0 datanodes (org.apache.hadoop.hdfs.StateChange)
[2023-10-03 09:01:22,131] INFO STATE* UnderReplicatedBlocks has 0 blocks (org.apache.hadoop.hdfs.StateChange)
[2023-10-03 09:01:22,130] INFO Start MarkedDeleteBlockScrubber thread (org.apache.hadoop.hdfs.server.blockmanagement.BlockManager)
[2023-10-03 09:01:22,158] INFO IPC Server Responder: starting (org.apache.hadoop.ipc.Server)
[2023-10-03 09:01:22,159] INFO IPC Server listener on 8020: starting (org.apache.hadoop.ipc.Server)
[2023-10-03 09:01:22,165] INFO NameNode RPC up at: vmnode3/192.168.1.103:8020 (org.apache.hadoop.hdfs.server.namenode.NameNode)
[2023-10-03 09:01:22,166] INFO Starting services required for standby state (org.apache.hadoop.hdfs.server.namenode.FSNamesystem)
[2023-10-03 09:01:22,168] INFO Will roll logs on active node every 120 seconds. (org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer)
[2023-10-03 09:01:22,171] INFO Starting standby checkpoint thread...
Checkpointing active NN to possible NNs: [http://vmnode1:9870, http://vmnode2:9870]
Serving checkpoints at http://vmnode3:9870 (org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer)
real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) 0
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 15187
max locked memory           (kbytes, -l) 8192
max memory size             (kbytes, -m) unlimited
open files                          (-n) 1024
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 8192
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 15187
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited






    On Tuesday, October 3, 2023 at 03:54:23 AM PDT, Liming Cui 
<anyone.cui...@gmail.com> wrote:  
 
 Harry,
Great question. I would say the same configuration appearing in both
core-site.xml and hdfs-site.xml will overwrite each other in some way.
Glad you found the root cause.
Keep going.
On Tue, Oct 3, 2023 at 10:27 AM Harry Jamison <harryjamiso...@yahoo.com> wrote:

 Liming,
After looking at my config, I think my problem may be that fs.defaultFS is
inconsistent between hdfs-site.xml and core-site.xml. What do hdfs-site.xml and
core-site.xml each do, and why would the same setting be in two different
places? Or do I just have it there mistakenly?
This is what I have in hdfs-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>nn1:2181,nn2:2181,nn3:2181</value>
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2,nn3</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>nn1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>nn2:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn3</name>
    <value>nn3:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>nn1:9870</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>nn2:9870</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn3</name>
    <value>nn3:9870</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://nn1:8485;nn2:8485;nn3:8485/mycluster</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/harry/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/hadoop/data/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/hadoop/data/hdfs/datanode</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/hadoop/data/hdfs/journalnode</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address</name>
    <value>nn1:8020</value>
  </property>
  <property>
    <name>dfs.ha.nn.not-become-active-in-safemode</name>
    <value>true</value>
  </property>
</configuration>


In core-site.xml I have this:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://nn1:8020</value>
  </property>
</configuration>
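Re-reading the HA guide, my guess (not yet verified on my cluster) is that with a nameservice defined, fs.defaultFS should name the logical nameservice rather than one physical namenode, and it belongs in core-site.xml only:

```xml
<!-- core-site.xml: point clients at the logical HA nameservice,
     not at a single namenode host. -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>
</property>
```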


    On Tuesday, October 3, 2023 at 12:54:26 AM PDT, Liming Cui 
<anyone.cui...@gmail.com> wrote:  
 
 Can you show us the configuration files? Maybe I can help you with some 
suggestions.

On Tue, Oct 3, 2023 at 9:05 AM Harry Jamison <harryjamiso...@yahoo.com.invalid> 
wrote:

I am trying to set up an HA HDFS cluster, and I am running into a problem.
I am not sure what I am doing wrong; I thought I followed the HA namenode
guide, but it is not working:

Apache Hadoop 3.3.6 – HDFS High Availability
https://hadoop.apache.org/docs/r3.3.6/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html


I have 2 namenodes, 3 journal nodes, and 3 zookeeper nodes.
After some period of time I see the following, and my namenode and journal node
die. I am not sure where the problem is or how to diagnose what I am doing
wrong here, and the logging does not make sense to me.

NameNode:
Serving checkpoints at http://nn1:9870 (org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer)
real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) 0
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 15187
max locked memory           (kbytes, -l) 8192
max memory size             (kbytes, -m) unlimited
open files                          (-n) 1024
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 8192
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 15187
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited

[2023-10-02 23:53:46,693] ERROR RECEIVED SIGNAL 15: SIGTERM (org.apache.hadoop.hdfs.server.namenode.NameNode)
[2023-10-02 23:53:46,701] INFO SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at nn1/192.168.1.159
************************************************************/ (org.apache.hadoop.hdfs.server.namenode.NameNode)

JournalNode:
[2023-10-02 23:54:19,162] WARN Journal at nn1/192.168.1.159:8485 has no edit logs (org.apache.hadoop.hdfs.qjournal.server.JournalNodeSyncer)
real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) 0
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 15187
max locked memory           (kbytes, -l) 8192
max memory size             (kbytes, -m) unlimited
open files                          (-n) 1024
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 8192
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 15187
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited





-- 
Best
Liming  


-- 
Best
Liming  
  
