Hi,
It seems the clocks on the machines in the cluster are not synchronized
as expected. It might be because of NTP configuration issues. Here is
some information to start troubleshooting with:
http://kudu.apache.org/docs/troubleshooting.html#ntp
That error might appear during tablet bootstrap (so it might happen to
both masters and tservers).
What is the output of the 'ntptime' command when run on the servers?
Also, what is the output of 'ntpq -p localhost'?
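To make the 'ntptime' output easier to compare across servers, here is a
minimal Python sketch that scans captured output for the return codes and
the "maximum error" value. It assumes the usual ntptime text layout (lines
such as "ntp_gettime() returns code 0 (OK)" and "maximum error 12345 us,
estimated error 42 us"); that layout and the helper name summarize_ntptime
are assumptions, not part of Kudu or NTP.

```python
import re
from typing import NamedTuple, Optional


class NtpSummary(NamedTuple):
    return_codes: list            # e.g. [(0, "OK"), (5, "ERROR")]
    max_error_us: Optional[int]   # first "maximum error N us" value, if any
    synchronized: bool            # True only if every return code is 0 (OK)


# Assumed ntptime line shapes (hypothetical parsing, not an official format):
#   ntp_gettime() returns code 0 (OK)
#   maximum error 12345 us, estimated error 42 us, TAI offset 0
_CODE_RE = re.compile(r"returns code (\d+) \((\w+)\)")
_MAXERR_RE = re.compile(r"maximum error (\d+) us")


def summarize_ntptime(text: str) -> NtpSummary:
    """Hypothetical helper: extract sync status from captured ntptime output."""
    codes = [(int(code), name) for code, name in _CODE_RE.findall(text)]
    match = _MAXERR_RE.search(text)
    max_err = int(match.group(1)) if match else None
    # No return codes at all means we cannot claim the clock is synchronized.
    synced = bool(codes) and all(code == 0 for code, _ in codes)
    return NtpSummary(codes, max_err, synced)
```

The text could come from something like
subprocess.run(["ntptime"], capture_output=True, text=True).stdout on each
server; a non-zero return code such as 5 (ERROR) usually means the kernel
considers the clock unsynchronized.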
Best regards,
Alexey
On 5/2/17 12:12 AM, ???????? wrote:
Since the Kudu cluster machines were powered down, I need to restart
kudu-master and kudu-tserver.
The cluster has three masters and three tservers. One of the masters and
all three tservers fail to start with the error message: Bad status:
Invalid argument: Tried to update clock beyond the max. error.
I tried setting max_clock_sync_error_usec larger, but I still get the
same error.
I do not know how to solve it.
Kudu-master start log:
Log file created at: 2017/05/02 14:50:53
Running on machine: hadoopname01vl
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0502 14:50:53.479116 5474 master_main.cc:60] Master server non-default flags:
--fs_data_dirs=/app/kudu/master
--fs_wal_dir=/app/kudu/master
--master_addresses=hadoopname01vl:7051,hadoopdata04vl:7051,hadoopname02vl:7051
--max_clock_sync_error_usec=1500000000
--heap_profile_path=/tmp/kudu-master.5474
--flagfile=/etc/kudu/conf/master.gflagfile
--fromenv=log_dir
--log_dir=/app/kudu/log
Master server version:
kudu 1.2.0-cdh5.10.0
revision 01748528baa06b78e04ce9a799cc60090a821162
build type RELEASE
built by jenkins at 23 Jan 2017 23:49:02 PST on kudu-centos66-17b9.vpc.cloudera.com
build id 2017-01-23_23-14-17
I0502 14:50:53.479230 5474 mem_tracker.cc:140] MemTracker: hard memory limit is 2.988239 GB
I0502 14:50:53.479236 5474 mem_tracker.cc:142] MemTracker: soft memory limit is 1.792943 GB
I0502 14:50:53.480358 5474 master_main.cc:67] Initializing master server...
I0502 14:50:53.480466 5474 hybrid_clock.cc:177] HybridClock initialized. Resolution in nanos?: 1 Wait times tolerance adjustment: 1.0005 Current error: 1109553
I0502 14:50:53.481259 5474 env_posix.cc:1284] Not raising process file limit of 131072; it is already as high as it can go
I0502 14:50:53.481281 5474 file_cache.cc:401] Constructed file cache lbm with capacity 65536
I0502 14:50:53.482020 5474 log_block_manager.cc:1336] Data dir /app/kudu/master/data is on an ext4 filesystem vulnerable to KUDU-1508 with block size 4096
I0502 14:50:53.482035 5474 log_block_manager.cc:1346] Limiting containers on data directory /app/kudu/master/data to 2721 blocks
I0502 14:50:53.484666 5474 fs_manager.cc:251] Opened local filesystem: /app/kudu/master
uuid: "4811dfb33ff444d2b3416d7bbe3c9a38"
format_stamp: "Formatted at 2017-02-20 07:35:54 on hadoopname01vl"
I0502 14:50:53.501610 5474 master_main.cc:70] Starting Master server...
I0502 14:50:53.505748 5474 rpc_server.cc:164] RPC server started. Bound to: 0.0.0.0:7051
I0502 14:50:53.505798 5474 webserver.cc:126] Starting webserver on 0.0.0.0:8051
I0502 14:50:53.505807 5474 webserver.cc:131] Document root: /usr/lib/kudu/www
I0502 14:50:53.505928 5474 webserver.cc:221] Webserver started. Bound to: http://0.0.0.0:8051/
I0502 14:50:53.506609 5543 sys_catalog.cc:119] Verifying existing consensus state
I0502 14:50:53.507067 5543 tablet_bootstrap.cc:381] T 00000000000000000000000000000000 P 4811dfb33ff444d2b3416d7bbe3c9a38: Bootstrap starting.
I0502 14:50:53.507866 5543 tablet_bootstrap.cc:540] T 00000000000000000000000000000000 P 4811dfb33ff444d2b3416d7bbe3c9a38: Time spent opening tablet: real 0.001s user 0.000s sys 0.000s
I0502 14:50:53.507894 5543 tablet_bootstrap.cc:560] T 00000000000000000000000000000000 P 4811dfb33ff444d2b3416d7bbe3c9a38: Previous recovery directory found at /app/kudu/master/wals/00000000000000000000000000000000.recovery: Replaying log files from this location instead of /app/kudu/master/wals/00000000000000000000000000000000
I0502 14:50:53.507917 5543 tablet_bootstrap.cc:567] T 00000000000000000000000000000000 P 4811dfb33ff444d2b3416d7bbe3c9a38: Deleting old log files from previous recovery attempt in /app/kudu/master/wals/00000000000000000000000000000000
I0502 14:50:53.509835 5543 log_util.cc:316] Log segment /app/kudu/master/wals/00000000000000000000000000000000.recovery/wal-000000001 has no footer. This segment was likely being written when the server previously shut down.
I0502 14:50:53.509851 5543 log_reader.cc:160] Log segment /app/kudu/master/wals/00000000000000000000000000000000.recovery/wal-000000001 was likely left in-progress after a previous crash. Will try to rebuild footer by scanning data.
I0502 14:50:53.548249 5543 log_util.cc:570] Scanning /app/kudu/master/wals/00000000000000000000000000000000.recovery/wal-000000001 for valid entry headers following offset 7156830...
I0502 14:50:53.564885 5543 log_util.cc:607] Found no log entry headers
I0502 14:50:53.564929 5543 log_util.cc:219] Ignoring log segment corruption in /app/kudu/master/wals/00000000000000000000000000000000.recovery/wal-000000001 because there are no log entries following the corrupted one. The server probably crashed in the middle of writing an entry to the write-ahead log or downloaded an active log via tablet copy. Error detail: Corruption: CRC mismatch in log entry header: Log file corruption detected. Failed trying to read batch #0 at offset 7156818 for log segment /app/kudu/master/wals/00000000000000000000000000000000.recovery/wal-000000001: Prior entries: [off=7156180 REPLICATE (3.11030)] [off=7156213 COMMIT (3.11030)] [off=7156252 REPLICATE (4.11031)] [off=7156818 REPLICATE (4.11032)]
I0502 14:50:53.564937 5543 log_util.cc:369] Successfully rebuilt footer for segment: /app/kudu/master/wals/00000000000000000000000000000000.recovery/wal-000000001 (valid entries through byte offset 7156818)
I0502 14:50:53.564985 5543 tablet.cc:983] T 00000000000000000000000000000000 Rewinding schema during bootstrap to Schema [
Schema [
0:entry_type[int8 NOT NULL],
1:entry_id[string NOT NULL],
2:metadata[string NOT NULL]
]
I0502 14:50:53.565114 5543 log.cc:351] Log is configured to *not* fsync() on all Append() calls
F0502 14:50:53.717851 5543 tablet_bootstrap.cc:790] Check failed: _s.ok() Bad status: Invalid argument: Tried to update clock beyond the max. error.
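The fatal check at the end of the log fires while bootstrap replays
write-ahead-log entries. As a rough model (an assumption based on the
error text, not Kudu's actual source), the hybrid clock refuses to jump
forward to a timestamp that is further ahead of the local physical clock
than some maximum allowed error. The Python sketch below illustrates that
rule; update_clock, ClockUpdateError, now_us, incoming_us and max_error_us
are hypothetical names.

```python
class ClockUpdateError(ValueError):
    """Raised when an incoming timestamp is too far ahead of the local clock."""


def update_clock(now_us: int, incoming_us: int, max_error_us: int) -> int:
    """Hypothetical model of a hybrid-clock update check (not Kudu's code).

    All values are physical times in microseconds. Returns the clock value
    after the update.
    """
    if incoming_us <= now_us:
        # Incoming timestamp is not ahead of the local clock: nothing to do.
        return now_us
    if incoming_us - now_us > max_error_us:
        # Refuse to jump forward by more than the allowed error.
        raise ClockUpdateError("Tried to update clock beyond the max. error.")
    # Otherwise advance the clock to the incoming timestamp.
    return incoming_us
```

In this model, an entry stamped 30 minutes (1800000000 us) ahead of the
local clock would still be rejected even with the 1500000000 us (25 minute)
value shown in the master's flags above, so a larger bound alone only helps
if it exceeds the actual gap and reaches every failing process.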