Bug#833054: e2fsprogs: fsck.ext4 does not repair filesystem
On Wed, Sep 07, 2016 at 10:21:34PM +0100, ael wrote:
> Of course, I can just reformat the card, and I am confident that it will
> then work, but this looks like an opportunity to uncover a "feature"
> lurking somewhere. I imagine that you will ask for a dumpe2fs run next?

Actually, what I'd like is a compressed e2image dump of the file system,
and reproduction instructions so I can try to replicate the problem in
an environment where I can observe it. See "man e2image" and look for
the section on RAW IMAGE FILES for more details.

This will allow you to send me just the metadata of the file system (so
I won't see any of the data blocks), and that should be enough to
replicate the problem. It would leak the directory names to me, which
might or might not be an issue that you would be concerned about.

There is a "scramble" option which will scramble the filenames, but that
will screw up the htree directory structures, so I'd prefer not having
the filenames scrambled unless you feel strongly about the filenames
being privacy sensitive --- in which case an e2image with scrambled
filenames is better than nothing.

Cheers,

- Ted
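For reference, the request above corresponds to the raw-image example in the e2image man page (`e2image -r /dev/hda1 - | bzip2 > hda1.e2i.bz2`), with `-s` being the scramble option mentioned. Below is a minimal Python sketch of the same pipeline. The helper names `e2image_argv` and `dump_metadata` are hypothetical, the device path is a placeholder, and it assumes e2image is installed, the file system is unmounted, and the script runs with enough privilege to read the device.

```python
import bz2
import shutil
import subprocess


def e2image_argv(device, scramble=False):
    """Build the e2image command line for a raw, metadata-only dump to stdout."""
    argv = ["e2image", "-r"]
    if scramble:
        # -s scrambles directory entries; per the thread this breaks htree
        # analysis, so it is off by default.
        argv.append("-s")
    argv += [device, "-"]
    return argv


def dump_metadata(device, out_path, scramble=False):
    """Run e2image and bzip2-compress its output into out_path."""
    proc = subprocess.Popen(e2image_argv(device, scramble),
                            stdout=subprocess.PIPE)
    with bz2.open(out_path, "wb") as out:
        shutil.copyfileobj(proc.stdout, out)
    proc.stdout.close()
    if proc.wait() != 0:
        raise RuntimeError("e2image exited with an error")
```

Usage would be something like `dump_metadata("/dev/sdX1", "sdhc.e2i.bz2")`, where `/dev/sdX1` stands in for the actual card partition or for the image file.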
Bug#833054: e2fsprogs: fsck.ext4 does not repair filesystem
On Sun, Jul 31, 2016 at 03:14:28PM -0400, Theodore Ts'o wrote:
>
> My suggestion at this point is to do an image copy of the entire
> contents of the sdhc card using either dd or ddrescue, to a known-good
> hard drive, and then run fsck.ext4 on that image copy of your
> file system. I suspect the root cause of this is simple hardware
> failure of the sdhc card. So doing an image copy is a good idea
> before you do anything else, since it reduces the risk of (further)
> data loss.

First, my apologies for the long silence: I was away from home.

I have done as you suggested. I took an image of the sdhc card (on
another machine) and wrote it to a file (as it happens, on a USB stick).
I then ran fsck.ext4 on the image, which claimed to correct a problem.
Next I loop-mounted that image. I also ran fsck.ext4 on the sdhc card
itself (on the other machine), which similarly claimed to fix a problem,
and then mounted that as well.

I then tested using rsync as:

    rsync -na /mnt/testing/ /sdhc/
              ^             ^
              |             |
              |             faulty card
              loop mounted image

That is, I asked rsync to do a dry-run copy from the loop-mounted image
to the card, so that it would read the image filesystem and not write to
the card. On a subset of files, I saw the warnings:

    Structure needs cleaning (117)

and /var/log/messages included

    EXT4-fs error: 32 callbacks suppressed

Just to confirm, I used a simple cat of one of the problem files:

    # cat /mnt/testing/lost+found/#528000: Structure needs cleaning

So it seems that fsck.ext4 is unable to correct the problems even on the
image, although it claims to have done so. It is possible that the
original problem with the card was a hardware fault (I have mentioned
that it occasionally gets disturbed in its slot), but whatever happened,
the current state of the file system is confusing fsck.ext4. Or so it
seems to me.
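The read-only check described above (walk the mounted image and note which entries fail with "Structure needs cleaning") can also be scripted. The following is only a sketch: `find_unclean` is a hypothetical helper, it assumes Linux, where that message is errno 117 (EUCLEAN), and it only stats entries rather than reading file contents.

```python
import errno
import os

# EUCLEAN is errno 117 on Linux; its message is "Structure needs cleaning".
EUCLEAN = getattr(errno, "EUCLEAN", 117)


def find_unclean(root):
    """Return paths under root whose stat or directory listing fails with EUCLEAN."""
    bad = []

    def on_error(exc):
        # Called by os.walk when a directory cannot be listed.
        if exc.errno == EUCLEAN:
            bad.append(exc.filename)

    for dirpath, dirnames, filenames in os.walk(root, onerror=on_error):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                os.lstat(path)  # read-only: never modifies the file system
            except OSError as exc:
                if exc.errno == EUCLEAN:
                    bad.append(path)
    return bad
```

Calling `find_unclean("/mnt/testing")` on the loop-mounted image would then list the affected paths without writing anything to it.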
>
> At this point, unless you can replicate the problem after copying the
> file system image off the sdhc to something more reliable (and in
> particular, cheap sd cards in the checkout counter line of "The Micro
> Center", or deep discount cards purchased from a direct-from-a-no-name-
> Shenzhen manufacturer don't count as reliable), I very much doubt it
> is a software bug, but rather a hardware issue.

So yes, I can replicate the problem after copying. I am satisfied that
the USB stick is fine, but I can do similar tests using other disk
hardware if you remain suspicious.

Of course, I can just reformat the card, and I am confident that it will
then work, but this looks like an opportunity to uncover a "feature"
lurking somewhere. I imagine that you will ask for a dumpe2fs run next?

Thanks for the help so far.
Bug#833054: e2fsprogs: fsck.ext4 does not repair filesystem
On Sun, Jul 31, 2016 at 03:14:28PM -0400, Theodore Ts'o wrote:
> On Sun, Jul 31, 2016 at 10:41:29AM +0100, ael wrote:
> > Package: e2fsprogs
> > Version: 1.43.1-1
> > Severity: normal
> >
>
> At this point, unless you can replicate the problem after copying the
> file system image off the sdhc to something more reliable (and in
> particular, cheap sd cards in the checkout counter line of "The Micro
> Center", or deep discount cards purchased from a direct-from-a-no-name-
> Shenzhen manufacturer don't count as reliable), I very much doubt it
> is a software bug, but rather a hardware issue.

Thanks for the reply and analysis. I will follow your suggestions.

The sdhc card was not cheap, and I tested it with f3
(http://oss.digirati.com.br/f3/) to check for a fake before
installation. I believe it is a genuine Kingston 32GB card. I always
check SSD-technology cards or drives before trusting them these days.
But you were right to suggest this, given how many dubious devices there
are around. Thanks again.

I think it is more likely that the card occasionally gets disturbed in
its slot when the netbook gets carried around, so a connector problem is
the more probable cause of a hardware issue. I reseat the card when I
remember, to try to avoid that happening. I don't know whether the
device driver can defend against that sort of problem.

Thank you so much for all your work: I have admired it from a distance
for many years. I will report further when I have followed your
suggestions.
Bug#833054: e2fsprogs: fsck.ext4 does not repair filesystem
On Sun, Jul 31, 2016 at 10:41:29AM +0100, ael wrote:
> Package: e2fsprogs
> Version: 1.43.1-1
> Severity: normal
>
> I have an ext4 filesystem (on an sdhc card) which has been working
> for several years. Yesterday, I became aware that the file system
> was damaged. Systemd seems to try to hide such details from users these
> days, so I am not sure when and why the problem occurred.
>
> I ran fsck.ext4 which made extensive repairs, but eventually declared
> the system clean. But accessing some parts of the file system
> gives a (d?)message:
> " Cannot stat: Structure needs cleaning."
> *and* (I think) a kernel oops:

This was a kernel warning, not an oops. However, it was preceded by an
EXT4-fs error, and unfortunately there were enough of them before that
point that apparently the rate limiters had kicked in:

> Jul 30 21:21:52 elf kernel: [ 938.971363] EXT4-fs error: 2 callbacks suppressed
> Jul 30 21:23:22 elf kernel: [ 1028.614023] [ cut here ]
> Jul 30 21:23:22 elf kernel: [ 1028.614052] WARNING: CPU: 0 PID: 23014 at /build/linux-sI189k/linux-4.6.4/fs/inode.c:279 drop_nlink+0x3f/0x50

My suggestion at this point is to do an image copy of the entire
contents of the sdhc card using either dd or ddrescue, to a known-good
hard drive, and then run fsck.ext4 on that image copy of your
file system. I suspect the root cause of this is simple hardware
failure of the sdhc card. So doing an image copy is a good idea
before you do anything else, since it reduces the risk of (further)
data loss.

> Running fsck.ext4 after such problems claims to clear the filesystem
> again, but the problems remain.
At this point, unless you can replicate the problem after copying the
file system image off the sdhc to something more reliable (and in
particular, cheap sd cards in the checkout counter line of "The Micro
Center", or deep discount cards purchased from a
direct-from-a-no-name-Shenzhen manufacturer don't count as reliable), I
very much doubt it is a software bug, but rather a hardware issue.

Cheers,

- Ted
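dd and ddrescue remain the right tools for the image copy suggested above. Purely to illustrate what an error-tolerant copy does (roughly what `dd conv=noerror,sync` does), here is a minimal Python sketch. `image_copy` is a hypothetical helper, not part of any tool, and it makes no attempt at ddrescue's retry logic.

```python
import os


def image_copy(src_path, dst_path, block_size=1024 * 1024):
    """Copy src to dst block by block; unreadable blocks are written as zeros.

    Returns the byte offsets of blocks that could not be read.
    """
    bad_offsets = []
    with open(src_path, "rb", buffering=0) as src, open(dst_path, "wb") as dst:
        size = src.seek(0, os.SEEK_END)  # also works for Linux block devices
        offset = 0
        while offset < size:
            want = min(block_size, size - offset)
            try:
                src.seek(offset)
                data = src.read(want)
            except OSError:
                data = b""
                bad_offsets.append(offset)
            # Zero-pad failed or short reads so later offsets stay aligned.
            dst.write(data.ljust(want, b"\0"))
            offset += want
    return bad_offsets
```

The copy would be taken from the card's device node to a file on a known-good disk, and fsck.ext4 then run on that file rather than on the card.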
Bug#833054: e2fsprogs: fsck.ext4 does not repair filesystem
Package: e2fsprogs
Version: 1.43.1-1
Severity: normal

I have an ext4 filesystem (on an sdhc card) which has been working
for several years. Yesterday, I became aware that the file system
was damaged. Systemd seems to try to hide such details from users these
days, so I am not sure when and why the problem occurred.

I ran fsck.ext4 which made extensive repairs, but eventually declared
the system clean. But accessing some parts of the file system
gives a (d?)message:
" Cannot stat: Structure needs cleaning."
*and* (I think) a kernel oops:
--
Jul 30 21:21:52 elf kernel: [ 938.971363] EXT4-fs error: 2 callbacks suppressed
Jul 30 21:23:22 elf kernel: [ 1028.614023] [ cut here ]
Jul 30 21:23:22 elf kernel: [ 1028.614052] WARNING: CPU: 0 PID: 23014 at /build/linux-sI189k/linux-4.6.4/fs/inode.c:279 drop_nlink+0x3f/0x50
Jul 30 21:23:22 elf kernel: [ 1028.614060] Modules linked in: cpuid(E) cbc(E) dm_crypt(E) dm_mod(E) uas(E) usb_storage(E) nls_utf8(E) nls_cp437(E) vfat(E) fat(E) xt_recent(E) iptable_nat(E) nf_nat_ipv4(E) xt_comment(E) ipt_REJECT(E) nf_reject_ipv4(E) xt_addrtype(E) xt_mark(E) iptable_mangle(E) xt_tcpudp(E) xt_CT(E) iptable_raw(E) xt_multiport(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) xt_conntrack(E) xt_NFLOG(E) nfnetlink_log(E) xt_LOG(E) nf_log_ipv4(E) nf_log_common(E) nf_nat_tftp(E) nf_nat_snmp_basic(E) nf_conntrack_snmp(E) nf_nat_sip(E) nf_nat_pptp(E) nf_nat_proto_gre(E) nf_nat_irc(E) nf_nat_h323(E) nf_nat_ftp(E) nf_nat_amanda(E) ts_kmp(E) nf_conntrack_amanda(E) nf_nat(E) nf_conntrack_sane(E) nf_conntrack_tftp(E) nf_conntrack_sip(E) nf_conntrack_proto_udplite(E) nf_conntrack_proto_sctp(E) nf_conntrack_pptp(E) nf_conntrack_proto_gre(E) nf_conntrack_netlink(E) nfnetlink(E) nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nf_conntrack_irc(E) nf_conntrack_h323(E) nf_conntrack_ftp(E) nf_conntrack(E) iptable_filter(E) ip_tables(E) x_tables(E) snd_hrtimer(E) snd_seq_midi(E) snd_seq_midi_event(E) snd_rawmidi(E) snd_seq(E) snd_seq_device(E) cpufreq_powersave(E) cpufreq_userspace(E) cpufreq_stats(E) cpufreq_conservative(E) iTCO_wdt(E) iTCO_vendor_support(E) sparse_keymap(E) arc4(E) acerhdf(E) uvcvideo(E) videobuf2_vmalloc(E) videobuf2_memops(E) videobuf2_v4l2(E) videobuf2_core(E) coretemp(E) videodev(E) media(E) joydev(E) evdev(E) serio_raw(E) pcspkr(E) sg(E) ath5k(E) ath(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E) i2c_i801(E) lpc_ich(E) mfd_core(E) rng_core(E) mac80211(E) i915(E) snd_hda_intel(E) snd_hda_codec(E) jmb38x_ms(E) snd_hda_core(E) snd_hwdep(E) snd_pcm_oss(E) memstick(E) cfg80211(E) snd_mixer_oss(E) rfkill(E) drm_kms_helper(E) snd_pcm(E) snd_timer(E) drm(E) snd(E) soundcore(E) i2c_algo_bit(E) shpchp(E) video(E) wmi(E) ac(E) battery(E) acpi_cpufreq(E) tpm_tis(E) tpm(E) button(E) processor(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) loop(E) autofs4(E) ext4(E) ecb(E) xts(E) lrw(E) gf128mul(E) ablk_helper(E) cryptd(E) aes_i586(E) crc16(E) jbd2(E) crc32c_generic(E) mbcache(E) mmc_block(E) sd_mod(E) ata_generic(E) psmouse(E) ata_piix(E) libata(E) scsi_mod(E) ehci_pci(E) sdhci_pci(E) sdhci(E) mmc_core(E) r8169(E) mii(E) uhci_hcd(E) ehci_hcd(E) usbcore(E) usb_common(E)
Jul 30 21:23:22 elf kernel: [ 1028.614449] CPU: 0 PID: 23014 Comm: offlineimap Tainted: GE 4.6.0-1-686-pae #1 Debian 4.6.4-1
Jul 30 21:23:22 elf kernel: [ 1028.614458] Hardware name: Acer AOA110/, BIOS v0.3309 10/06/2008
Jul 30 21:23:22 elf kernel: [ 1028.614466] 00200286 164f904c f4195de4 c12dbeac c106cc4a c1660208
Jul 30 21:23:22 elf kernel: [ 1028.614486] 59e6 c167a04c 0117 c11e513f 0117 0009 c11e513f
Jul 30 21:23:22 elf kernel: [ 1028.614507] c3d2b8d8 0001 f4195df8 c106cd5a 0009
Jul 30 21:23:22 elf kernel: [ 1028.614527] Call Trace:
Jul 30 21:23:22 elf kernel: [ 1028.614549] [] ? dump_stack+0x55/0x79
Jul 30 21:23:22 elf kernel: [ 1028.614564] [] ? __warn+0xea/0x110
Jul 30 21:23:22 elf kernel: [ 1028.614578] [] ? drop_nlink+0x3f/0x50
Jul 30 21:23:22 elf kernel: [ 1028.614591] [] ? drop_nlink+0x3f/0x50
Jul 30 21:23:22 elf kernel: [ 1028.614604] [] ? warn_slowpath_null+0x2a/0x30
Jul 30 21:23:22 elf kernel: [ 1028.614617] [] ? drop_nlink+0x3f/0x50
Jul 30 21:23:22 elf kernel: [ 1028.614660] [] ? ext4_rename2+0x65a/0xcc0 [ext4]
Jul 30 21:23:22 elf kernel: [ 1028.614705] [] ? ext4_tmpfile+0x1c0/0x1c0 [ext4]
Jul 30 21:23:22 elf kernel: [ 1028.614721] [] ? vfs_rename+0x5a0/0x930
Jul 30 21:23:22 elf kernel: [ 1028.614762] [] ? ext4_tmpfile+0x1c0/0x1c0 [ext4]
Jul 30 21:23:22 elf kernel: [ 1028.614777] [] ? SyS_rename+0x366/0x380
Jul 30 21:23:22 elf kernel: [ 1028.614794] [] ? do_fast_syscall_32+0x8d/0x140
Jul 30 21:23:22 elf kernel: [ 1028.614806] [] ? sysenter_past_esp+0x47/0x75
Jul 30 21:23:22 elf kernel: [ 1028.614818] ---[ end trace
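In a log like the one above, the "callbacks suppressed" line means the kernel's rate limiter dropped some EXT4-fs error messages, so the WARNING that follows is not the whole story. A small Python sketch for pulling both kinds of line out of a syslog file is shown below. The function name `summarize` and the regular expressions are assumptions based only on the line formats visible in this report.

```python
import re

# Patterns modelled on the two line types quoted in this report.
SUPPRESSED = re.compile(r"EXT4-fs error: (\d+) callbacks suppressed")
WARNING = re.compile(r"WARNING: CPU: \d+ PID: \d+ at (\S+) (\S+)")


def summarize(lines):
    """Return (total suppressed EXT4-fs errors, [(source location, symbol), ...])."""
    suppressed = 0
    warnings = []
    for line in lines:
        match = SUPPRESSED.search(line)
        if match:
            suppressed += int(match.group(1))
            continue
        match = WARNING.search(line)
        if match:
            warnings.append((match.group(1), match.group(2)))
    return suppressed, warnings
```

It could be fed the lines of /var/log/messages to see how many ext4 errors were hidden and which code paths raised warnings.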