[jira] [Commented] (KUDU-2260) Log block manager should handle null bytes in metadata on crash
[ https://issues.apache.org/jira/browse/KUDU-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190713#comment-17190713 ]

ASF subversion and git services commented on KUDU-2260:
---

Commit ae776c12895b1f2a60b95749d7a8cb31d0e841ad in kudu's branch refs/heads/master from Attila Bukor
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=ae776c1 ]

Fix a Google+ link

When fixing KUDU-2260, a Google+ link was added in a comment for additional context. Since Google+ has been shut down, this link is now dead. I tried replacing it with a Wayback Machine link, but that doesn't seem to load anymore either.

Unfortunately I couldn't find more formal documentation for this, but at least Alexey managed to find an explanation on Hacker News (that thread also happens to link to the Google+ post). Hopefully it stays online for far longer than Google+.

Change-Id: I7e10e49b08428079d3134e9291328e4508822ecf
Reviewed-on: http://gerrit.cloudera.org:8080/16279
Tested-by: Attila Bukor
Reviewed-by: Grant Henke

> Log block manager should handle null bytes in metadata on crash
> ---
>
> Key: KUDU-2260
> URL: https://issues.apache.org/jira/browse/KUDU-2260
> Project: Kudu
> Issue Type: Bug
> Components: fs
> Reporter: Mike Percy
> Assignee: William Berkeley
> Priority: Major
> Fix For: 1.8.0
>
>
> The log block manager currently may leave null bytes at the end of the
> metadata log file if there is a system crash in the middle of a write. The
> log block manager should detect null bytes at the end of a metadata entry on
> startup and potentially truncate the entry or close the container.
> Currently, it prints an error along the following lines:
> {code}
> F0111 09:30:27.327011 28843 tablet_server_main.cc:64] Check failed: _s.ok()
> Bad status: Corruption: Failed to load FS layout: Could not read records from
> container /data/3/kudu/data/f70391c7c6084e08bbae7448518e0b5e: Data length
> checksum does not match: Incorrect checksum in file
> /data/3/kudu/data/f70391c7c6084e08bbae7448518e0b5e.metadata at offset 372533:
> Checksum does not match. Expected: 0. Actual: 1323915147
> {code}
> At the time of writing, the workaround for this issue is to truncate the
> affected file at the start of the incomplete entry in the file. While this
> may leave orphaned blocks, this should be safe because if the metadata entry
> was never successfully written then it should not have been considered
> durable, either.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
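The manual workaround in the issue description (truncate the affected metadata file at the start of the incomplete entry) can be sketched as below. This is a hypothetical helper, not a Kudu tool: it assumes the tablet server is stopped and that the operator has already worked out where the incomplete entry begins, which is not necessarily the offset printed in the error. The `.bak` copy is an added precaution, not part of the described workaround. As the description notes, orphaned blocks may remain afterwards.

```python
import os
import shutil


def truncate_metadata(path: str, entry_start: int, backup_suffix: str = ".bak") -> str:
    """Cut `path` off at `entry_start`, keeping a copy of the original first.

    Hypothetical helper. `entry_start` must be the offset where the incomplete
    entry begins; finding it is up to the operator. Returns the backup path.
    """
    size = os.path.getsize(path)
    if not 0 <= entry_start <= size:
        raise ValueError(f"offset {entry_start} is outside a file of {size} bytes")
    backup = path + backup_suffix
    shutil.copy2(path, backup)      # keep the original in case the offset is wrong
    os.truncate(path, entry_start)  # drop the incomplete entry and everything after it
    return backup
```

Keeping a copy costs one extra file but makes a wrongly chosen offset recoverable, which matters because truncation discards data irreversibly.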
[jira] [Commented] (KUDU-2260) Log block manager should handle null bytes in metadata on crash
[ https://issues.apache.org/jira/browse/KUDU-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887615#comment-16887615 ]

Adar Dembo commented on KUDU-2260:
--

The Google+ shutdown means that the link Mike provided is now broken.

However, I think we saw this in the wild. Here's a very interesting MRS flush failure on a tserver running Kudu 1.7.0:
{noformat}
I0716 02:23:42.355777 22937 tablet.cc:1153] T f4de49e24eb6420bb41a2391921d341d P 71430a6bb9b74e09b9767dacc6598102: Flush: entering stage 1 (old memrowset already frozen for inserts)
I0716 02:23:42.355800 22937 compaction.cc:914] Selected 1 rowsets to compact:
I0716 02:23:42.355805 22937 compaction.cc:917] memrowset(current size on disk: ~0 bytes)
I0716 02:23:42.355813 22937 tablet.cc:1155] T f4de49e24eb6420bb41a2391921d341d P 71430a6bb9b74e09b9767dacc6598102: Memstore in-memory size: 509719 bytes
I0716 02:23:42.355821 22937 tablet.cc:1444] T f4de49e24eb6420bb41a2391921d341d P 71430a6bb9b74e09b9767dacc6598102: Flush: entering phase 1 (flushing snapshot). Phase 1 snapshot: MvccSnapshot[committed={T|T < 6403105213255942144 or (T in {6403105213255942144})}]
I0716 02:23:42.423743 22937 multi_column_writer.cc:98] Opened CFile writers for 52 column(s)
W0716 02:23:42.695962 22937 log_block_manager.cc:1151] Container /data01/kudu/tserver/data/data/539eeb9a0c4b4c87b9cb2ed727f09a19 being marked read-only: IO error: Failed to Sync() file: /data01/kudu/tserver/data/data/539eeb9a0c4b4c87b9cb2ed727f09a19.metadata: Cannot allocate memory (error 12)
W0716 02:23:42.697571 22937 log_block_manager.cc:1370] Failed to abort block 02936068: IO error: container /data01/kudu/tserver/data/data/539eeb9a0c4b4c87b9cb2ed727f09a19 is read-only: Failed to Sync() file: /data01/kudu/tserver/data/data/539eeb9a0c4b4c87b9cb2ed727f09a19.metadata: Cannot allocate memory (error 12)
W0716 02:23:42.716284 22937 tablet_replica_mm_ops.cc:144] T f4de49e24eb6420bb41a2391921d341d P 71430a6bb9b74e09b9767dacc6598102: failed to flush MRS: IO error: Failed to finish DRS writer: Failed to Sync() file: /data01/kudu/tserver/data/data/539eeb9a0c4b4c87b9cb2ed727f09a19.metadata: Cannot allocate memory (error 12)
F0716 02:23:42.716315 22937 tablet_replica_mm_ops.cc:145] Check failed: tablet->HasBeenStopped() FlushMRS failure is only allowed if the tablet is stopped first
{noformat}

Looks like fdatasync() returned ENOMEM. After that, the container was corrupted with trailing NULL bytes, though it looks like parts of a message header are also in there:
{noformat}
F0716 02:23:59.869529 103240 tablet_server_main.cc:80] Check failed: _s.ok() Bad status: Corruption: Failed to load FS layout: Could not process records in container /data01/kudu/tserver/data/data/2a1552dd97d645689ef8c39a4f027707: Data length checksum does not match: Incorrect checksum in file /data01/kudu/tserver/data/data/2a1552dd97d645689ef8c39a4f027707.metadata at offset 425981: Checksum does not match. Expected: 4647048. Actual: 1699145864

$ hexdump /data01/kudu/tserver/data/data/2a1552dd97d645689ef8c39a4f027707.metadata
...
0067ff0 fee3 e3ab d202 a931 1629 8800 46e8
0068000
*
0068307
{noformat}
[jira] [Commented] (KUDU-2260) Log block manager should handle null bytes in metadata on crash
[ https://issues.apache.org/jira/browse/KUDU-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16516459#comment-16516459 ]

Will Berkeley commented on KUDU-2260:
-

One other important detail: the NULL-byte guarantee holds only when ext4 is mounted with the default data=ordered journaling mode or a stronger one (data=journal). If ext4 is mounted with data=writeback, the end of the file could contain anything. So under default ext4 settings, we can recover from this situation transparently at startup.
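Since the NULL-byte guarantee depends on the ext4 data mode, a startup check could classify each data directory's mount before trusting a NUL tail. The sketch below is hypothetical (nothing in this issue says Kudu does this). It parses /proc/mounts-style text and assumes that an ext4 mount with no explicit `data=` option uses the stock default, `data=ordered`; a filesystem whose default mount options were changed (for example with tune2fs) could make that assumption wrong. It also does not unescape paths (/proc/mounts encodes spaces as `\040`).

```python
from typing import Dict, Optional, Tuple


def parse_mounts(text: str) -> Dict[str, Tuple[str, str]]:
    """Map mount point -> (fstype, options) from /proc/mounts-style text."""
    mounts: Dict[str, Tuple[str, str]] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 4:
            # Fields: device, mount point, fs type, options, dump, pass.
            mounts[fields[1]] = (fields[2], fields[3])
    return mounts


def nul_tail_guarantee_holds(fstype: str, options: str) -> Optional[bool]:
    """True for ext4 in data=ordered or data=journal mode, False for
    data=writeback, None for filesystems this check does not cover."""
    if fstype != "ext4":
        return None
    mode = "ordered"  # assumption: no data= option means the stock ext4 default
    for opt in options.split(","):
        if opt.startswith("data="):  # deliberately does not match data_err=...
            mode = opt[len("data="):]
    return mode in ("ordered", "journal")
```

On Linux, the input would typically come from reading `/proc/mounts`; returning `None` for other filesystems keeps the check from claiming a guarantee it knows nothing about.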
[jira] [Commented] (KUDU-2260) Log block manager should handle null bytes in metadata on crash
[ https://issues.apache.org/jira/browse/KUDU-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16516456#comment-16516456 ]

Mike Percy commented on KUDU-2260:
--

[~wdberkeley] looked into this a bit today after it appeared again in the wild and found [this thread|https://plus.google.com/+KentonVarda/posts/JDwHfAiLGNQ], where Ted Ts'o discusses this situation and notes that ext4 may flush the file size to disk before the data. The one guarantee you get is that when this happens, you will read NULL bytes at the end of the file instead of garbage data. So it seems like we should look for trailing NULL bytes at the end of these files and ignore them when opening log block containers.

One thing that wasn't clear from my reading of that thread is whether writes need to be sector-aligned to avoid torn writes, or whether the filesystem will, in all cases, avoid crossing a sector boundary for a single write shorter than a sector.
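The proposal above (ignore trailing NULL bytes when opening a container) can be sketched as a record scanner. The record layout here is hypothetical and simpler than Kudu's: the error messages suggest Kudu also checksums the length field, which this sketch omits, and it uses zlib's CRC-32 rather than whatever checksum Kudu uses. The rule it implements: a record that fails to parse is dropped only if it runs into the trailing NUL run (or past EOF), i.e. it looks like an incomplete, never-durable write; a bad record followed by non-NUL data is still reported as corruption. The check for "only NULs remain" must come before parsing, because an all-zero header would otherwise decode as a valid empty record (CRC-32 of no bytes is 0).

```python
import struct
import zlib
from typing import List, Tuple

# HYPOTHETICAL record layout, not Kudu's real on-disk format:
#   [u32 LE payload length][payload][u32 LE CRC-32 of payload]
LENGTH = struct.Struct("<I")
CRC = struct.Struct("<I")


class CorruptionError(Exception):
    """A bad record that a trailing run of NUL bytes does not explain."""


def encode_record(payload: bytes) -> bytes:
    """Frame one payload in the hypothetical layout above."""
    return LENGTH.pack(len(payload)) + payload + CRC.pack(zlib.crc32(payload))


def scan_records(data: bytes) -> Tuple[List[bytes], int]:
    """Return (intact records, length the file should be truncated to)."""
    # Offset where the trailing run of NUL bytes begins (len(data) if none).
    nul_tail = len(data.rstrip(b"\x00"))
    records: List[bytes] = []
    off = 0
    while off < len(data):
        if off >= nul_tail:
            # Only NULs remain: the file size reached disk but the data did not.
            break
        end = off + LENGTH.size
        intact = False
        if end <= len(data):
            (length,) = LENGTH.unpack_from(data, off)
            end += length + CRC.size
            if end <= len(data):
                payload = data[off + LENGTH.size : end - CRC.size]
                (stored_crc,) = CRC.unpack_from(data, end - CRC.size)
                intact = stored_crc == zlib.crc32(payload)
        if intact:
            records.append(payload)
            off = end
        elif end > nul_tail:
            # The bad record runs into the NUL tail (or past EOF): treat it as
            # an incomplete write that was never durable and drop it.
            break
        else:
            raise CorruptionError(f"bad record at offset {off}")
    return records, off
```

A partially written header followed by NULs, like the one in the hexdump from the MRS flush failure, is handled by the same rule, because the damaged record's extent reaches into the NUL run.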
[jira] [Commented] (KUDU-2260) Log block manager should handle null bytes in metadata on crash
[ https://issues.apache.org/jira/browse/KUDU-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16324773#comment-16324773 ]

Mike Percy commented on KUDU-2260:
--

I think this happens when the underlying filesystem starts flushing to disk and then the system crashes or loses power. However, I have only seen this in the wild and have not yet reproduced it "in the lab".
[jira] [Commented] (KUDU-2260) Log block manager should handle null bytes in metadata on crash
[ https://issues.apache.org/jira/browse/KUDU-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16324765#comment-16324765 ]

Adar Dembo commented on KUDU-2260:
--

The LBM doesn't preallocate metadata files, and every record is guaranteed to be smaller than both a filesystem block and a disk sector. Each record is also written with a single syscall. So how can this happen?

Note: this is somewhat covered by KUDU-668. I'll link it here.
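The append pattern described above (each record smaller than a sector, written with one syscall) looks roughly like the sketch below. The function name and the 512-byte sector size are assumptions for illustration. The point the sketch makes is that none of these properties bounds what a crash can leave behind: until the sync returns, the record is not durable, and per the Ts'o discussion cited in Mike's comment, ext4 may persist the larger file size before the data, leaving NULs instead of the record.

```python
import os

SECTOR_SIZE = 512  # assumption: a 512-byte logical sector


def append_record(fd: int, record: bytes) -> None:
    """Append one metadata record with a single write() call, then sync it.

    Hypothetical helper; `fd` must be opened with O_APPEND.
    """
    if len(record) > SECTOR_SIZE:
        raise ValueError(f"record of {len(record)} bytes exceeds one sector")
    written = os.write(fd, record)  # one syscall for the whole record
    if written != len(record):
        raise OSError(f"short write: {written} of {len(record)} bytes")
    # Durability point: a crash before this returns may leave the new file
    # size on disk without the record's data.
    sync = getattr(os, "fdatasync", os.fsync)  # fdatasync is missing on some platforms
    sync(fd)
```

A single `write()` only makes the record atomic with respect to other readers of the page cache; crash consistency is decided by how the filesystem orders the size update and the data writeback.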