I am not sure what is wrong with my HBase setup, or where to look to investigate it. I do not see anything wrong in the logs, so I am not sure where to go from here. I was having a problem with ZooKeeper earlier; I may have gotten around that, but I am not 100% sure.
I am getting this error message on the hbasemaster:

2024-11-14T17:08:06,945 WARN [master/dn2:16000:becomeActiveMaster] master.HMaster: hbase:meta,,1.1588230740 is NOT online; state={1588230740 state=OPEN, ts=1731632459118, server=dn1,16020,1729981874197}; ServerCrashProcedures=true. Master startup cannot progress, in holding-pattern until region onlined.
2024-11-14T17:09:06,946 WARN [master/dn2:16000:becomeActiveMaster] master.HMaster: hbase:meta,,1.1588230740 is NOT online; state={1588230740 state=OPEN, ts=1731632459118, server=dn1,16020,1729981874197}; ServerCrashProcedures=true. Master startup cannot progress, in holding-pattern until region onlined.

But my region servers are started. And when I restarted the hbasemaster, I got a connection refused on the region server. Here is a snippet from the regionserver logs:

2024-11-14T16:53:20,274 INFO [regionserver/dn2:16020] hbase.ChoreService: Chore ScheduledChore name=CompactionThroughputTuner, period=60000, unit=MILLISECONDS is enabled.
2024-11-14T16:53:20,275 INFO [regionserver/dn2:16020] regionserver.HRegionServer: CompactionChecker runs every PT10S
2024-11-14T16:53:20,298 INFO [regionserver/dn2:16020] hbase.ChoreService: Chore ScheduledChore name=CompactedHFilesCleaner, period=120000, unit=MILLISECONDS is enabled.
2024-11-14T16:53:20,302 INFO [regionserver/dn2:16020] hbase.ChoreService: Chore ScheduledChore name=CompactionChecker, period=10000, unit=MILLISECONDS is enabled.
2024-11-14T16:53:20,302 INFO [regionserver/dn2:16020] hbase.ChoreService: Chore ScheduledChore name=MemstoreFlusherChore, period=10000, unit=MILLISECONDS is enabled.
2024-11-14T16:53:20,302 INFO [regionserver/dn2:16020] hbase.ChoreService: Chore ScheduledChore name=nonceCleaner, period=360000, unit=MILLISECONDS is enabled.
2024-11-14T16:53:20,303 INFO [regionserver/dn2:16020] hbase.ChoreService: Chore ScheduledChore name=BrokenStoreFileCleaner, period=21600000, unit=MILLISECONDS is enabled.
2024-11-14T16:53:20,303 INFO [regionserver/dn2:16020] hbase.ChoreService: Chore ScheduledChore name=dn2,16020,1731631998737-MobFileCleanerChore, period=86400, unit=SECONDS is enabled.
2024-11-14T16:53:20,318 INFO [regionserver/dn2:16020] regionserver.HeapMemoryManager: Starting, tuneOn=false
2024-11-14T16:53:20,320 INFO [regionserver/dn2:16020] hbase.ChoreService: Chore ScheduledChore name=dn2,16020,1731631998737-HeapMemoryTunerChore, period=60000, unit=MILLISECONDS is enabled.
2024-11-14T16:53:20,321 INFO [regionserver/dn2:16020] regionserver.ChunkCreator: Allocating data MemStoreChunkPool with chunk size 2 MB, max count 2890, initial count 0
2024-11-14T16:53:20,322 INFO [regionserver/dn2:16020] regionserver.ChunkCreator: Allocating index MemStoreChunkPool with chunk size 204.80 KB, max count 3212, initial count 0
2024-11-14T16:53:20,334 INFO [regionserver/dn2:16020] regionserver.Replication: dn2,16020,1731631998737 started
2024-11-14T16:53:20,335 INFO [regionserver/dn2:16020] regionserver.HRegionServer: Serving as dn2,16020,1731631998737, RpcServer on dn2/192.168.1.109:16020, sessionid=0x3000036514f0002
2024-11-14T16:53:20,343 INFO [regionserver/dn2:16020] quotas.RegionServerRpcQuotaManager: Quota support disabled
2024-11-14T16:53:20,343 INFO [regionserver/dn2:16020] quotas.RegionServerSpaceQuotaManager: Quota support disabled, not starting space quota manager.
2024-11-14T16:53:23,377 INFO [regionserver/dn2:16020] wal.AbstractFSWAL: WAL configuration: blocksize=256 MB, rollsize=128 MB, prefix=dn2%2C16020%2C1731631998737, suffix=, logDir=hdfs://dn1/hbase/WALs/dn2,16020,1731631998737, archiveDir=hdfs://dn1/hbase/oldWALs, maxLogs=100
2024-11-14T16:53:23,383 INFO [regionserver/dn2:16020] monitor.StreamSlowMonitor: New stream slow monitor dn2%2C16020%2C1731631998737.1731632003381
2024-11-14T16:53:23,445 INFO [regionserver/dn2:16020] wal.AbstractFSWAL: New WAL /hbase/WALs/dn2,16020,1731631998737/dn2%2C16020%2C1731631998737.1731632003381

It is creating the directory in hdfs:

kal@dn1:/hadoop/deploy/hadoop/bin$ ./hdfs dfs -ls /hbase/WALs
Found 846 items
drwxr-xr-x - root supergroup 0 2024-11-13 18:04 /hbase/WALs/dn1,16020,1731549858040
drwxr-xr-x - root supergroup 0 2024-11-13 22:08 /hbase/WALs/dn1,16020,1731564499960
drwxr-xr-x - root supergroup 0 2024-11-13 23:05 /hbase/WALs/dn1,16020,1731567827103
drwxr-xr-x - root supergroup 0 2024-11-13 23:08 /hbase/WALs/dn1,16020,1731567978031
drwxr-xr-x - root supergroup 0 2024-11-13 23:10 /hbase/WALs/dn1,16020,1731568139080
drwxr-xr-x - root supergroup 0 2024-11-13 23:13 /hbase/WALs/dn1,16020,1731568290018

This is my hbase-site.xml:

<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>false</value>
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>./tmp</value>
  </property>
  <property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://dn1/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>dn1,dn2,dn3</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/hadoop/data/hbase/zookeeper</value>
  </property>
</configuration>

And my regionservers:

dn1
dn2
dn3
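
One thing I can check mechanically is whether hbase-site.xml defines the same property more than once; the file above lists hbase.cluster.distributed twice, once as false and once as true. Here is a minimal sketch in Python, assuming the file contents are available as a string. The function name find_duplicate_properties and the SAMPLE text are hypothetical and only for illustration; they are not part of HBase, and the sketch does not tell you which of the repeated values actually takes effect.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def find_duplicate_properties(xml_text):
    """Return {name: [values in file order]} for property names defined more than once.

    Hypothetical helper for illustration; it only inspects the XML text.
    """
    root = ET.fromstring(xml_text)
    values = defaultdict(list)
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            # A missing <value> element is recorded as an empty string.
            values[name.strip()].append((prop.findtext("value") or "").strip())
    return {name: vals for name, vals in values.items() if len(vals) > 1}


# Shortened stand-in for the hbase-site.xml shown above (assumption: same layout).
SAMPLE = """<configuration>
  <property><name>hbase.cluster.distributed</name><value>false</value></property>
  <property><name>hbase.rootdir</name><value>hdfs://dn1/hbase</value></property>
  <property><name>hbase.cluster.distributed</name><value>true</value></property>
</configuration>"""

if __name__ == "__main__":
    for name, vals in find_duplicate_properties(SAMPLE).items():
        print(name, "is defined", len(vals), "times:", vals)
```

To run it against the real file, the contents could be read with open(path).read() and passed in place of SAMPLE, where path is wherever hbase-site.xml lives on your install.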