Hi,

It seems the clocks on the machines in the cluster are not synchronized as expected, possibly because of NTP configuration issues. The troubleshooting guide has some information to start with: http://kudu.apache.org/docs/troubleshooting.html#ntp

That error might appear during tablet bootstrap (so it might happen to both masters and tservers).

What is the output of the 'ntptime' command when run on the servers? Also, what is the output of 'ntpq -p localhost'?
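If it is easier to gather both outputs in one go, something like the following could be run on each server. This is only a rough sketch: the helper names (run_command, collect_ntp_status) and the 10-second timeout are my own choices for illustration and not part of Kudu or NTP, and it assumes Python 3.7+ is available on the machines.

```python
import shutil
import subprocess

# The two commands asked about above; nothing else is assumed about the hosts.
COMMANDS = (["ntptime"], ["ntpq", "-p", "localhost"])


def run_command(argv, timeout_s=10):
    """Run one command and return its exit code plus combined stdout/stderr."""
    if shutil.which(argv[0]) is None:
        # The NTP tools may simply not be installed on this machine.
        return f"{argv[0]}: not found on PATH"
    try:
        done = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return f"{' '.join(argv)}: timed out after {timeout_s}s"
    except OSError as exc:
        return f"{' '.join(argv)}: could not be started ({exc})"
    return f"exit code {done.returncode}\n{done.stdout}"


def collect_ntp_status():
    """Map each command line to the text it produced."""
    return {" ".join(argv): run_command(argv) for argv in COMMANDS}


if __name__ == "__main__":
    for command, output in collect_ntp_status().items():
        print(f"$ {command}\n{output}")
```

Pasting the printed text from the failing master and from one of the healthy masters would make it easy to compare them.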


Best regards,

Alexey


On 5/2/17 12:12 AM, ???????? wrote:
Because the machines in the Kudu cluster were powered down, I need to restart kudu-master and kudu-tserver. The cluster has three masters and three tservers. One of the masters and all three tservers fail to start with this error message: Bad status: Invalid argument: Tried to update clock beyond the max. error. I tried setting max_clock_sync_error_usec to a larger value, but I still get the same error.
I do not know how to solve it.
kudu-master startup log:

Log file created at: 2017/05/02 14:50:53
Running on machine: hadoopname01vl
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0502 14:50:53.479116 5474 master_main.cc:60] Master server non-default flags:
--fs_data_dirs=/app/kudu/master
--fs_wal_dir=/app/kudu/master
--master_addresses=hadoopname01vl:7051,hadoopdata04vl:7051,hadoopname02vl:7051
--max_clock_sync_error_usec=1500000000
--heap_profile_path=/tmp/kudu-master.5474
--flagfile=/etc/kudu/conf/master.gflagfile
--fromenv=log_dir
--log_dir=/app/kudu/log
Master server version:
kudu 1.2.0-cdh5.10.0
revision 01748528baa06b78e04ce9a799cc60090a821162
build type RELEASE
built by jenkins at 23 Jan 2017 23:49:02 PST on kudu-centos66-17b9.vpc.cloudera.com
build id 2017-01-23_23-14-17
I0502 14:50:53.479230 5474 mem_tracker.cc:140] MemTracker: hard memory limit is 2.988239 GB
I0502 14:50:53.479236 5474 mem_tracker.cc:142] MemTracker: soft memory limit is 1.792943 GB
I0502 14:50:53.480358 5474 master_main.cc:67] Initializing master server...
I0502 14:50:53.480466 5474 hybrid_clock.cc:177] HybridClock initialized. Resolution in nanos?: 1 Wait times tolerance adjustment: 1.0005 Current error: 1109553
I0502 14:50:53.481259 5474 env_posix.cc:1284] Not raising process file limit of 131072; it is already as high as it can go
I0502 14:50:53.481281 5474 file_cache.cc:401] Constructed file cache lbm with capacity 65536
I0502 14:50:53.482020 5474 log_block_manager.cc:1336] Data dir /app/kudu/master/data is on an ext4 filesystem vulnerable to KUDU-1508 with block size 4096
I0502 14:50:53.482035 5474 log_block_manager.cc:1346] Limiting containers on data directory /app/kudu/master/data to 2721 blocks
I0502 14:50:53.484666 5474 fs_manager.cc:251] Opened local filesystem: /app/kudu/master
uuid: "4811dfb33ff444d2b3416d7bbe3c9a38"
format_stamp: "Formatted at 2017-02-20 07:35:54 on hadoopname01vl"
I0502 14:50:53.501610  5474 master_main.cc:70] Starting Master server...
I0502 14:50:53.505748 5474 rpc_server.cc:164] RPC server started. Bound to: 0.0.0.0:7051
I0502 14:50:53.505798 5474 webserver.cc:126] Starting webserver on 0.0.0.0:8051
I0502 14:50:53.505807 5474 webserver.cc:131] Document root: /usr/lib/kudu/www
I0502 14:50:53.505928 5474 webserver.cc:221] Webserver started. Bound to: http://0.0.0.0:8051/
I0502 14:50:53.506609 5543 sys_catalog.cc:119] Verifying existing consensus state
I0502 14:50:53.507067 5543 tablet_bootstrap.cc:381] T 00000000000000000000000000000000 P 4811dfb33ff444d2b3416d7bbe3c9a38: Bootstrap starting.
I0502 14:50:53.507866 5543 tablet_bootstrap.cc:540] T 00000000000000000000000000000000 P 4811dfb33ff444d2b3416d7bbe3c9a38: Time spent opening tablet: real 0.001s user 0.000s sys 0.000s
I0502 14:50:53.507894 5543 tablet_bootstrap.cc:560] T 00000000000000000000000000000000 P 4811dfb33ff444d2b3416d7bbe3c9a38: Previous recovery directory found at /app/kudu/master/wals/00000000000000000000000000000000.recovery: Replaying log files from this location instead of /app/kudu/master/wals/00000000000000000000000000000000
I0502 14:50:53.507917 5543 tablet_bootstrap.cc:567] T 00000000000000000000000000000000 P 4811dfb33ff444d2b3416d7bbe3c9a38: Deleting old log files from previous recovery attempt in /app/kudu/master/wals/00000000000000000000000000000000
I0502 14:50:53.509835 5543 log_util.cc:316] Log segment /app/kudu/master/wals/00000000000000000000000000000000.recovery/wal-000000001 has no footer. This segment was likely being written when the server previously shut down.
I0502 14:50:53.509851 5543 log_reader.cc:160] Log segment /app/kudu/master/wals/00000000000000000000000000000000.recovery/wal-000000001 was likely left in-progress after a previous crash. Will try to rebuild footer by scanning data.
I0502 14:50:53.548249 5543 log_util.cc:570] Scanning /app/kudu/master/wals/00000000000000000000000000000000.recovery/wal-000000001 for valid entry headers following offset 7156830...
I0502 14:50:53.564885  5543 log_util.cc:607] Found no log entry headers
I0502 14:50:53.564929 5543 log_util.cc:219] Ignoring log segment corruption in /app/kudu/master/wals/00000000000000000000000000000000.recovery/wal-000000001 because there are no log entries following the corrupted one. The server probably crashed in the middle of writing an entry to the write-ahead log or downloaded an active log via tablet copy. Error detail: Corruption: CRC mismatch in log entry header: Log file corruption detected. Failed trying to read batch #0 at offset 7156818 for log segment /app/kudu/master/wals/00000000000000000000000000000000.recovery/wal-000000001: Prior entries: [off=7156180 REPLICATE (3.11030)] [off=7156213 COMMIT (3.11030)] [off=7156252 REPLICATE (4.11031)] [off=7156818 REPLICATE (4.11032)]
I0502 14:50:53.564937 5543 log_util.cc:369] Successfully rebuilt footer for segment: /app/kudu/master/wals/00000000000000000000000000000000.recovery/wal-000000001 (valid entries through byte offset 7156818)
I0502 14:50:53.564985 5543 tablet.cc:983] T 00000000000000000000000000000000 Rewinding schema during bootstrap to Schema [
        0:entry_type[int8 NOT NULL],
        1:entry_id[string NOT NULL],
        2:metadata[string NOT NULL]
]
I0502 14:50:53.565114 5543 log.cc:351] Log is configured to *not* fsync() on all Append() calls
F0502 14:50:53.717851 5543 tablet_bootstrap.cc:790] Check failed: _s.ok() Bad status: Invalid argument: Tried to update clock beyond the max. error.

