Re: dump trying to access incorrect block numbers?
I have extracted material from the list exchanges related to this and
submitted:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220693

against the kernel for the issue.

I placed emphasis on the SSD-trim related "freeing free block" panics
that "fsck -B" can lead to after it gets the g_vfs_done messages for
unclean ufs file systems but first noted that:

mksnap_ffs /.snap2

was enough to get the g_vfs_done messages.

I figured that the nastiest and most important known consequences were
"fsck -B" being broken for unclean ufs file systems and having later
panics trying to trim based on how it is broken.

I did also mention dump as producing the messages. I referenced. . .

> See also the exchange of list submittals associated
> with:
>
> https://lists.freebsd.org/pipermail/freebsd-current/2017-July/066505.html
>
> and:
>
> https://lists.freebsd.org/pipermail/freebsd-current/2017-July/066508.html

===
Mark Millard
markmi at dsl-only.net

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: dump trying to access incorrect block numbers?
> On 07/08/17 12:28, Rodney W. Grimes wrote:
>
> > Since it has been speculated that this is occurring during the
> > creation of the snapshot, could you try just creating a snapshot
> > using mksnap_ffs and see if any errors occur?
>
> After a short pause with disk activity, the same sorts of errors are
> logged when using "mksnap_ffs /.snap2" where .snap2 did not previously
> exist,

Ok, so this simplifies what needs to be looked at: this is no longer a
dump(8) issue, but a ufs snapshot issue. This should be much easier
for more people to try out.

--
Rod Grimes
rgri...@freebsd.org
Re: dump trying to access incorrect block numbers?
[A normal multi-user boot's fsck activity can do fsck -B activity that
gets the problem.]

On 2017-Jul-8, at 9:45 AM, Mark Millard wrote:

> [I add notes about a problem that happens after the
> "fsck -B". Also forgot to mention: production style
> kernel world builds were in use. And I tried a
> powerpc64 build and it works the same.]
>
> On 2017-Jul-7, at 11:09 PM, Mark Millard wrote:
>
>> [This note has more information than one sent with extra text
>> in the subject but with a partially different "to" list.]
>>
>> Peter Jeremy peter at rulingia.com wrote on
>> Sat Jul 8 02:00:47 UTC 2017 :
>>
>>> When did you first notice this (what SVN revision)?
>>> Do you know what the last good SVN revision was?
>>> Is this a new or old filesystem?
>>> Is the filesystem mounted/active or not when you dump it?
>>> What are the relevant parameters for the filesystem on ada0s3a?
>>> Are you running softupdates, journalling etc?
>>> Which dump(8) phase is reporting the errors?
>>> What are the exact dump and fsck commands you ran?
>>
>> I can add a little information with some contrast
>> and only "fsck -B" in use (with an unclean file
>> system from a prior crash), no dump use. Still:
>> a snapshot is involved in the below.
>>
>> Unfortunately two problems with major consequences
>> for my involved context limit the svn range that I
>> can cover for the activity, the problem version
>> ranges being:
>>
>> -r319722 through -r320651 (fixed by -r320652)
>> (actually this is why I had used "boot -s"
>> in what I report later: I could get to a
>> shell prompt that way instead of crashing
>> before any login prompt; the crashes left
>> the file system in need of repair)
>>
>> -r320509 through -r320561 (fixed by -r320570)
>>
>> So I was using -r320570 to avoid one of the
>> two problems.
>>
>> Context: 32-bit powerpc FreeBSD used on PowerMac G5
>> so-called "Quad-core". (So big-endian as well.)
>> Softupdates, no journalling. Long-in-use file
>> system having lots of FreeBSD version updates
>> and port rebuilds over time.
>>
>> The following is from now, not from the time of the
>> example messages:
>>
>> # dumpfs / | more
>> magic  19540119 (UFS2)  time  Fri Jul 7 22:53:34 2017
>> superblock location  65536  id  [ ]
>> ncg  158  size  25165823  blocks  24372006
>> bsize  32768  shift  15  mask  0x8000
>> fsize  4096  shift  12  mask  0xf000
>> frag  8  shift  3  fsbtodb  3
>> minfree  8%  optim  time  symlinklen  120
>> maxbsize  32768  maxbpg  4096  maxcontig  4  contigsumsize  4
>> nbfree  2130375  ndir  65518  nifree  11769796  nffree  425065
>> bpg  20032  fpg  160256  ipg  80128  unrefs  0
>> nindir  4096  inopb  128  maxfilesize  2252349704110079
>> sbsize  4096  cgsize  32768  csaddr  5048  cssize  4096
>> sblkno  24  cblkno  32  iblkno  40  dblkno  5048
>> cgrotor  127  fmod  0  ronly  0  clean  0
>> metaspace  6408  avgfpdir  64  avgfilesize  16384
>> flags  soft-updates trim
>> fsmnt  /
>> volname  FBSDG4Srootfs  swuid  0  providersize  25165823
>> . . .
>>
>> What I had done that produced the messages was:
>>
>> > leaves root (only) file system not marked clean
>> so fsck -B will actually do something below>
>>
>> boot -s (so: single user mode)
>> # The next 3 lines are the content of a generic, manually-run script.
>> mount -u /
>> mount -a -t ufs (but there is no other file system)
>> swapon -a (there is a swap partition)
>> #
>> fsck -B
>>
>> That "fsck -B" caused the same kinds of lines
>> reported by Michael Butler, happening as fsck
>> makes a snapshot for the background processing
>> to use. (I have camera pictures and could type
>> in some of the lines if needed.)
>>
>> After those lines was text like (typed in from
>> an example camera picture):
>>
>> ** //.snap/fsck_snapshot
>> ** Last Mount on /
>> ** Root file system
>> ** Phase 1 - Check Blocks and Sizes
>> ** Phase 2 - Check Pathnames
>> ** Phase 3 - Check Connectivity
>> ** Phase 4 - Check Reference Counts
>> ** Phase 5 - Check Cyl groups
>> Reclaimed: 0 directories, 1 files, 22680 fragments
>> 780914 files, 4797127 used, 19552199 free (443479 frags, 3288590 blocks,
>> 1.8% fragmentation)
>>
>> * FILE SYSTEM MARKED CLEAN *
>
> [I forgot to mention that the context was a
> production style kernel and world build,
> no invariants or other such.]
>
> Since I'm running a patched -r320570 for the
> issue:
>
> -r319722 through -r320651 (fixed by -r320652)
>
> I went back and forced a power-off without
> shutdown and did the sequence:
>
> boot -s (so: single user mode)
> # The next 3 lines are the content of a generic, manually-run script.
> mount -u /
> mount -a -t ufs (but there is no other file system)
> swapon -a (there is a swap partition)
> #
> fsck -B
>
> but always waited briefly after the fsck -B finished.
>
> Like before the followi
Re: dump trying to access incorrect block numbers?
[I add notes about a problem that happens after the
"fsck -B". Also forgot to mention: production style
kernel world builds were in use. And I tried a
powerpc64 build and it works the same.]

On 2017-Jul-7, at 11:09 PM, Mark Millard wrote:

> [This note has more information than one sent with extra text
> in the subject but with a partially different "to" list.]
>
> Peter Jeremy peter at rulingia.com wrote on
> Sat Jul 8 02:00:47 UTC 2017 :
>
>> When did you first notice this (what SVN revision)?
>> Do you know what the last good SVN revision was?
>> Is this a new or old filesystem?
>> Is the filesystem mounted/active or not when you dump it?
>> What are the relevant parameters for the filesystem on ada0s3a?
>> Are you running softupdates, journalling etc?
>> Which dump(8) phase is reporting the errors?
>> What are the exact dump and fsck commands you ran?
>
> I can add a little information with some contrast
> and only "fsck -B" in use (with an unclean file
> system from a prior crash), no dump use. Still:
> a snapshot is involved in the below.
>
> Unfortunately two problems with major consequences
> for my involved context limit the svn range that I
> can cover for the activity, the problem version
> ranges being:
>
> -r319722 through -r320651 (fixed by -r320652)
> (actually this is why I had used "boot -s"
> in what I report later: I could get to a
> shell prompt that way instead of crashing
> before any login prompt; the crashes left
> the file system in need of repair)
>
> -r320509 through -r320561 (fixed by -r320570)
>
> So I was using -r320570 to avoid one of the
> two problems.
>
> Context: 32-bit powerpc FreeBSD used on PowerMac G5
> so-called "Quad-core". (So big-endian as well.)
> Softupdates, no journalling. Long-in-use file
> system having lots of FreeBSD version updates
> and port rebuilds over time.
>
> The following is from now, not from the time of the
> example messages:
>
> # dumpfs / | more
> magic  19540119 (UFS2)  time  Fri Jul 7 22:53:34 2017
> superblock location  65536  id  [ ]
> ncg  158  size  25165823  blocks  24372006
> bsize  32768  shift  15  mask  0x8000
> fsize  4096  shift  12  mask  0xf000
> frag  8  shift  3  fsbtodb  3
> minfree  8%  optim  time  symlinklen  120
> maxbsize  32768  maxbpg  4096  maxcontig  4  contigsumsize  4
> nbfree  2130375  ndir  65518  nifree  11769796  nffree  425065
> bpg  20032  fpg  160256  ipg  80128  unrefs  0
> nindir  4096  inopb  128  maxfilesize  2252349704110079
> sbsize  4096  cgsize  32768  csaddr  5048  cssize  4096
> sblkno  24  cblkno  32  iblkno  40  dblkno  5048
> cgrotor  127  fmod  0  ronly  0  clean  0
> metaspace  6408  avgfpdir  64  avgfilesize  16384
> flags  soft-updates trim
> fsmnt  /
> volname  FBSDG4Srootfs  swuid  0  providersize  25165823
> . . .
>
> What I had done that produced the messages was:
>
> leaves root (only) file system not marked clean
> so fsck -B will actually do something below>
>
> boot -s (so: single user mode)
> # The next 3 lines are the content of a generic, manually-run script.
> mount -u /
> mount -a -t ufs (but there is no other file system)
> swapon -a (there is a swap partition)
> #
> fsck -B
>
> That "fsck -B" caused the same kinds of lines
> reported by Michael Butler, happening as fsck
> makes a snapshot for the background processing
> to use. (I have camera pictures and could type
> in some of the lines if needed.)
>
> After those lines was text like (typed in from
> an example camera picture):
>
> ** //.snap/fsck_snapshot
> ** Last Mount on /
> ** Root file system
> ** Phase 1 - Check Blocks and Sizes
> ** Phase 2 - Check Pathnames
> ** Phase 3 - Check Connectivity
> ** Phase 4 - Check Reference Counts
> ** Phase 5 - Check Cyl groups
> Reclaimed: 0 directories, 1 files, 22680 fragments
> 780914 files, 4797127 used, 19552199 free (443479 frags, 3288590 blocks, 1.8%
> fragmentation)
>
> * FILE SYSTEM MARKED CLEAN *

[I forgot to mention that the context was a
production style kernel and world build,
no invariants or other such.]

Since I'm running a patched -r320570 for the
issue:

-r319722 through -r320651 (fixed by -r320652)

I went back and forced a power-off without
shutdown and did the sequence:

boot -s (so: single user mode)
# The next 3 lines are the content of a generic, manually-run script.
mount -u /
mount -a -t ufs (but there is no other file system)
swapon -a (there is a swap partition)
#
fsck -B

but always waited briefly after the fsck -B finished.

Like before, the following happens as it tries to trim:
(typed in from camera picture)

panic: ffs_blkfree_cq: freeing free block
cpuid = 2 (varies, of course)
time = (varies)
KDB: stack backtrace
(stack addresses can vary: just an example here)
0xd23b17e0: at kdb_backtrace+0x5c
0xd23b1850: at vpanic+0x1e8
0xd23b18c0: at panic+0x54
Re: dump trying to access incorrect block numbers?
> On 07/07/17 21:53, Peter Jeremy wrote:
> > On 2017-Jul-07 10:44:36 -0400, Michael Butler
> > wrote:
> >> Recent builds doing a backup (dump) cause nonsensical errors in syslog:
> >
> > I can't directly offer any ideas but some more background might help:
> > When did you first notice this (what SVN revision)?
>
> I was stuck on SVN r319721 on the i386 machine while the socket/union
> issue was addressed. That version did not display the problem.
>
> > Do you know what the last good SVN revision was?
> > Is this a new or old filesystem?
>
> old - it's been years since this system was rebuilt.
>
> > Is the filesystem mounted/active or not when you dump it?
>
> Mounted and active.
>
> > What are the relevant parameters for the filesystem on ada0s3a?
>
> imb@toshi:/home/imb> dumpfs /
>
> magic  19540119 (UFS2)  time  Fri Jul 7 22:43:49 2017
> superblock location  65536  id  [ 56c8bf68 1a8b12b5 ]
> ncg  516  size  82575360  blocks  79978821
> bsize  32768  shift  15  mask  0x8000
> fsize  4096  shift  12  mask  0xf000
> frag  8  shift  3  fsbtodb  3
> minfree  8%  optim  time  symlinklen  120
> maxbsize  32768  maxbpg  4096  maxcontig  4  contigsumsize  4
> nbfree  3965346  ndir  98169  nifree  40196026  nffree  453383
> bpg  20035  fpg  160280  ipg  80256  unrefs  0
> nindir  4096  inopb  128  maxfilesize  2252349704110079
> sbsize  4096  cgsize  32768  csaddr  5056  cssize  12288
> sblkno  24  cblkno  32  iblkno  40  dblkno  5056
> cgrotor  253  fmod  0  ronly  0  clean  0
> metaspace  6408  avgfpdir  64  avgfilesize  16384
> flags  soft-updates
> fsmnt  /
> volname  swuid  0  providersize  82575360
>
> [ .. ]
>
> > Are you running softupdates, journalling etc?
>
> soft-updates only.
>
> > Which dump(8) phase is reporting the errors?
>
> The errors occur before the "date of the last level x dump" message -
> presumably, this is while creating the snapshot.
>
> > What are the exact dump and fsck commands you ran?
>
> /sbin/dump 0Lauf - -C 32 /
>
> none of the following report any (unexpected) errors:
>
> fsck -f /
> fsck -f -r /
> fsck -f -Z /
>
> >> I now have two UFS-based systems showing the same symptoms - what's up
> >> with this?
> >
> > Was there anything you did on either filesystem that might have triggered
> > it?
>
> Other than update the kernel, no.

Since it has been speculated that this is occurring during the
creation of the snapshot, could you try just creating a snapshot
using mksnap_ffs and see if any errors occur?

--
Rod Grimes
rgri...@freebsd.org
Re: dump trying to access incorrect block numbers?
On 07/08/17 12:28, Rodney W. Grimes wrote:

> Since it has been speculated that this is occurring during the
> creation of the snapshot, could you try just creating a snapshot
> using mksnap_ffs and see if any errors occur?

After a short pause with disk activity, the same sorts of errors are
logged when using "mksnap_ffs /.snap2" where .snap2 did not previously
exist,

Michael
Re: dump trying to access incorrect block numbers?
[This note has more information than one sent with extra text
in the subject but with a partially different "to" list.]

Peter Jeremy peter at rulingia.com wrote on
Sat Jul 8 02:00:47 UTC 2017 :

> When did you first notice this (what SVN revision)?
> Do you know what the last good SVN revision was?
> Is this a new or old filesystem?
> Is the filesystem mounted/active or not when you dump it?
> What are the relevant parameters for the filesystem on ada0s3a?
> Are you running softupdates, journalling etc?
> Which dump(8) phase is reporting the errors?
> What are the exact dump and fsck commands you ran?

I can add a little information with some contrast
and only "fsck -B" in use (with an unclean file
system from a prior crash), no dump use. Still:
a snapshot is involved in the below.

Unfortunately two problems with major consequences
for my involved context limit the svn range that I
can cover for the activity, the problem version
ranges being:

-r319722 through -r320651 (fixed by -r320652)
(actually this is why I had used "boot -s"
in what I report later: I could get to a
shell prompt that way instead of crashing
before any login prompt; the crashes left
the file system in need of repair)

-r320509 through -r320561 (fixed by -r320570)

So I was using -r320570 to avoid one of the
two problems.

Context: 32-bit powerpc FreeBSD used on PowerMac G5
so-called "Quad-core". (So big-endian as well.)
Softupdates, no journalling. Long-in-use file
system having lots of FreeBSD version updates
and port rebuilds over time.

The following is from now, not from the time of the
example messages:

# dumpfs / | more
magic  19540119 (UFS2)  time  Fri Jul 7 22:53:34 2017
superblock location  65536  id  [ ]
ncg  158  size  25165823  blocks  24372006
bsize  32768  shift  15  mask  0x8000
fsize  4096  shift  12  mask  0xf000
frag  8  shift  3  fsbtodb  3
minfree  8%  optim  time  symlinklen  120
maxbsize  32768  maxbpg  4096  maxcontig  4  contigsumsize  4
nbfree  2130375  ndir  65518  nifree  11769796  nffree  425065
bpg  20032  fpg  160256  ipg  80128  unrefs  0
nindir  4096  inopb  128  maxfilesize  2252349704110079
sbsize  4096  cgsize  32768  csaddr  5048  cssize  4096
sblkno  24  cblkno  32  iblkno  40  dblkno  5048
cgrotor  127  fmod  0  ronly  0  clean  0
metaspace  6408  avgfpdir  64  avgfilesize  16384
flags  soft-updates trim
fsmnt  /
volname  FBSDG4Srootfs  swuid  0  providersize  25165823
. . .

What I had done that produced the messages was:

boot -s (so: single user mode)
# The next 3 lines are the content of a generic, manually-run script.
mount -u /
mount -a -t ufs (but there is no other file system)
swapon -a (there is a swap partition)
#
fsck -B

That "fsck -B" caused the same kinds of lines
reported by Michael Butler, happening as fsck
makes a snapshot for the background processing
to use. (I have camera pictures and could type
in some of the lines if needed.)

After those lines was text like (typed in from
an example camera picture):

** //.snap/fsck_snapshot
** Last Mount on /
** Root file system
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
Reclaimed: 0 directories, 1 files, 22680 fragments
780914 files, 4797127 used, 19552199 free (443479 frags, 3288590 blocks, 1.8% fragmentation)

* FILE SYSTEM MARKED CLEAN *

===
Mark Millard
markmi at dsl-only.net
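A quick way to read the dumpfs fields above is to check that the basic size relations hold. The following is only a sketch in Python: the values are copied from the output above, the 512-byte sector size (DEV_BSIZE) and the reading of "size" as a count of fragments are assumptions, and the helper name check_geometry is made up for illustration.

```python
# Fields copied from the dumpfs output above (PowerMac G5 root file system).
bsize, bshift = 32768, 15      # block size in bytes and its shift
fsize, fshift = 4096, 12       # fragment size in bytes and its shift
frag, fragshift = 8, 3         # fragments per block and its shift
fsbtodb = 3                    # shift from fragment number to disk sector
size_frags = 25165823          # "size" field, assumed to count fragments

DEV_BSIZE = 512                # assumed disk sector size in bytes


def check_geometry():
    """Return (description, ok) pairs for the basic size identities."""
    return [
        ("bsize == 1 << bshift", bsize == 1 << bshift),
        ("fsize == 1 << fshift", fsize == 1 << fshift),
        ("frag == 1 << fragshift", frag == 1 << fragshift),
        ("bsize == fsize * frag", bsize == fsize * frag),
        ("fsize == DEV_BSIZE << fsbtodb", fsize == DEV_BSIZE << fsbtodb),
    ]


for desc, ok in check_geometry():
    print(("ok   " if ok else "FAIL ") + desc)

# Byte length of the file system under the "size is in fragments" assumption.
print("file system bytes:", size_frags * fsize)
```

If these identities hold, the block and fragment geometry in the superblock is at least self-consistent, so the odd read offsets would have to come from somewhere other than these fields.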
Re: dump trying to access incorrect block numbers?
On 07/07/17 21:53, Peter Jeremy wrote:
> On 2017-Jul-07 10:44:36 -0400, Michael Butler
> wrote:
>> Recent builds doing a backup (dump) cause nonsensical errors in syslog:
>
> I can't directly offer any ideas but some more background might help:
> When did you first notice this (what SVN revision)?

I was stuck on SVN r319721 on the i386 machine while the socket/union
issue was addressed. That version did not display the problem.

> Do you know what the last good SVN revision was?
> Is this a new or old filesystem?

old - it's been years since this system was rebuilt.

> Is the filesystem mounted/active or not when you dump it?

Mounted and active.

> What are the relevant parameters for the filesystem on ada0s3a?

imb@toshi:/home/imb> dumpfs /

magic  19540119 (UFS2)  time  Fri Jul 7 22:43:49 2017
superblock location  65536  id  [ 56c8bf68 1a8b12b5 ]
ncg  516  size  82575360  blocks  79978821
bsize  32768  shift  15  mask  0x8000
fsize  4096  shift  12  mask  0xf000
frag  8  shift  3  fsbtodb  3
minfree  8%  optim  time  symlinklen  120
maxbsize  32768  maxbpg  4096  maxcontig  4  contigsumsize  4
nbfree  3965346  ndir  98169  nifree  40196026  nffree  453383
bpg  20035  fpg  160280  ipg  80256  unrefs  0
nindir  4096  inopb  128  maxfilesize  2252349704110079
sbsize  4096  cgsize  32768  csaddr  5056  cssize  12288
sblkno  24  cblkno  32  iblkno  40  dblkno  5056
cgrotor  253  fmod  0  ronly  0  clean  0
metaspace  6408  avgfpdir  64  avgfilesize  16384
flags  soft-updates
fsmnt  /
volname  swuid  0  providersize  82575360

[ .. ]

> Are you running softupdates, journalling etc?

soft-updates only.

> Which dump(8) phase is reporting the errors?

The errors occur before the "date of the last level x dump" message -
presumably, this is while creating the snapshot.

> What are the exact dump and fsck commands you ran?

/sbin/dump 0Lauf - -C 32 /

none of the following report any (unexpected) errors:

fsck -f /
fsck -f -r /
fsck -f -Z /

>> I now have two UFS-based systems showing the same symptoms - what's up
>> with this?
>
> Was there anything you did on either filesystem that might have triggered
> it?

Other than update the kernel, no.

Michael
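One way to see why the reported reads look nonsensical is to compare the logged offsets with the size of the file system in the dumpfs output above. This is a rough sketch in Python, assuming the "size" field counts fragments of "fsize" bytes each (so size * fsize is the byte length); the helper name beyond_end is hypothetical, and only a few of the reported offsets are listed.

```python
# Values copied from the dumpfs output for / on toshi (ada0s3a).
FS_SIZE_FRAGS = 82575360   # "size" field, assumed to count fragments
FSIZE = 4096               # "fsize" field, bytes per fragment
READ_LENGTH = 32768        # length of each read in the logged messages

# A few of the offsets from the g_vfs_done() messages in the original report.
REPORTED_OFFSETS = [
    6050375794688,
    546222112768,
    2142846844928,
    3865520898048,
]


def beyond_end(offset, length, fs_bytes):
    """True if a read of `length` bytes at `offset` runs past the file system."""
    return offset + length > fs_bytes


fs_bytes = FS_SIZE_FRAGS * FSIZE
print("file system bytes:", fs_bytes)
for off in REPORTED_OFFSETS:
    status = "past end" if beyond_end(off, READ_LENGTH, fs_bytes) else "in range"
    print(off, status)
```

Under that assumption the file system is about 338 GB long, and each offset listed here lies well beyond that, which would fit with the reads failing with error = 5 (EIO).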
Re: dump trying to access incorrect block numbers?
On 2017-Jul-07 10:44:36 -0400, Michael Butler wrote:
>Recent builds doing a backup (dump) cause nonsensical errors in syslog:

I can't directly offer any ideas but some more background might help:
When did you first notice this (what SVN revision)?
Do you know what the last good SVN revision was?
Is this a new or old filesystem?
Is the filesystem mounted/active or not when you dump it?
What are the relevant parameters for the filesystem on ada0s3a?
Are you running softupdates, journalling etc?
Which dump(8) phase is reporting the errors?
What are the exact dump and fsck commands you ran?

>I now have two UFS-based systems showing the same symptoms - what's up
>with this?

Was there anything you did on either filesystem that might have triggered it?

--
Peter Jeremy
Re: dump trying to access incorrect block numbers? [It is not just dump that can get such]
On 2017-Jul-7, at 4:49 PM, Michael Butler wrote:

> On 07/07/17 19:02, Mark Millard wrote:
>> Michael Butler imb at protected-networks.net wrote on
>> Fri Jul 7 14:45:12 UTC 2017 :
>>> Recent builds doing a backup (dump) cause nonsensical errors in syslog:
>>>
>>> Jul 7 00:10:24 toshi kernel:
>>> g_vfs_done():ada0s3a[READ(offset=6050375794688, length=32768)]error = 5
>>> Jul 7 00:10:24 toshi kernel:
>>> g_vfs_done():ada0s3a[READ(offset=546222112768, length=32768)]error = 5
>>> Jul 7 00:10:24 toshi kernel:
>
> [ snip ]
>
>> Both dump and fsck likely are using snapshots
>> so the issue is likely tied to ufs snapshots.
>> Maybe it has an INO64 incompleteness that
>> gives the huge offsets. (Wild guess.)
>> If your context was more typical then the issue
>> spans little-endian and big-endian since the
>> powerpc context is big-endian but most usage
>> is little endian.
>
> I'm seeing this on both amd64 and i386 builds (@SVN r320760) when dump tries
> to build a snap-shot. These are both non-debug, non-invariant production boxes

Sounds like there is enough evidence of repeatability,
span of TARGET_ARCH's and systems, and recent enough
range of -r320??? vintages for a bugzilla submittal.

Your TARGET_ARCH's are more main-stream than where
I've tried something that showed the issue. Want to
do the initial submittal?

===
Mark Millard
markmi at dsl-only.net
Re: dump trying to access incorrect block numbers? [It is not just dump that can get such]
On 07/07/17 19:02, Mark Millard wrote:
> Michael Butler imb at protected-networks.net wrote on
> Fri Jul 7 14:45:12 UTC 2017 :
>> Recent builds doing a backup (dump) cause nonsensical errors in syslog:
>>
>> Jul 7 00:10:24 toshi kernel:
>> g_vfs_done():ada0s3a[READ(offset=6050375794688, length=32768)]error = 5
>> Jul 7 00:10:24 toshi kernel:
>> g_vfs_done():ada0s3a[READ(offset=546222112768, length=32768)]error = 5
>> Jul 7 00:10:24 toshi kernel:

[ snip ]

> Both dump and fsck likely are using snapshots
> so the issue is likely tied to ufs snapshots.
> Maybe it has an INO64 incompleteness that
> gives the huge offsets. (Wild guess.)
> If your context was more typical then the issue
> spans little-endian and big-endian since the
> powerpc context is big-endian but most usage
> is little endian.

I'm seeing this on both amd64 and i386 builds (@SVN r320760) when dump
tries to build a snap-shot. These are both non-debug, non-invariant
production boxes,

Michael
Re: dump trying to access incorrect block numbers? [It is not just dump that can get such]
Michael Butler imb at protected-networks.net wrote on
Fri Jul 7 14:45:12 UTC 2017 :

> Recent builds doing a backup (dump) cause nonsensical errors in syslog:
>
> Jul 7 00:10:24 toshi kernel:
> g_vfs_done():ada0s3a[READ(offset=6050375794688, length=32768)]error = 5
> Jul 7 00:10:24 toshi kernel:
> g_vfs_done():ada0s3a[READ(offset=546222112768, length=32768)]error = 5
> Jul 7 00:10:24 toshi kernel:
> g_vfs_done():ada0s3a[READ(offset=2142846844928, length=32768)]error = 5
> Jul 7 00:10:24 toshi last message repeated 7 times
> Jul 7 00:10:24 toshi kernel:
> g_vfs_done():ada0s3a[READ(offset=2226879725568, length=32768)]error = 5
> Jul 7 00:10:24 toshi kernel:
> g_vfs_done():ada0s3a[READ(offset=2941159211008, length=32768)]error = 5
> Jul 7 00:10:24 toshi last message repeated 2 times
> Jul 7 00:10:24 toshi kernel:
> g_vfs_done():ada0s3a[READ(offset=3067208531968, length=32768)]error = 5
> Jul 7 00:10:24 toshi kernel:
> g_vfs_done():ada0s3a[READ(offset=3277290733568, length=32768)]error = 5
> Jul 7 00:10:24 toshi kernel:
> g_vfs_done():ada0s3a[READ(offset=3487372935168, length=32768)]error = 5
> Jul 7 00:10:24 toshi kernel:
> g_vfs_done():ada0s3a[READ(offset=3697455136768, length=32768)]error = 5
> Jul 7 00:10:24 toshi kernel:
> g_vfs_done():ada0s3a[READ(offset=3865520898048, length=32768)]error = 5
>
> FSCK declares nothing to be wrong with the file-system. I even used the
> '-r' inode reclaim option and '-Z' to zero unused blocks to no effect.
>
> I now have two UFS-based systems showing the same symptoms - what's up
> with this?

I've seen these kinds of messages on 32-bit powerpc
-r320570 when using "boot -s" (standalone) and
doing an fsck after making the ufs root file system
writable. (-r320570 could not boot multi-user all
the way without workarounds due to socket software
errors.)

[Context was a production-style kernel build, not
the debug style --but I likely did not try this
for a debug kernel build.]

The messages came out before the following:
(manually retyped from a camera picture)

** //.snap/fsck_snapshot
** Last Mount on /
** Root file system
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
Reclaimed: 0 directories, 1 files, 22680 fragments
780914 files, 4797127 used, 19552199 free (443479 frags, 3288590 blocks, 1.8% fragmentation)

* FILE SYSTEM MARKED CLEAN *

There were a lot of the messages.

I've not checked if anything after -r320570 for
32-bit powerpc shows such or not. (The socket
software problem has an official fix checked in:
-r320652 . But I've not got as far as progressing
to it or beyond yet.) -r320570 was a fix of another
major problem for the use of
__pthread_cleanup_push_imp stubs.

I was not sure if the g_vfs_done notices were a
distinct issue from the other issues of the time
frame or not at the time and did not get as far
as investigating that question at the time.

Both dump and fsck likely are using snapshots
so the issue is likely tied to ufs snapshots.
Maybe it has an INO64 incompleteness that
gives the huge offsets. (Wild guess.)
If your context was more typical then the issue
spans little-endian and big-endian since the
powerpc context is big-endian but most usage
is little endian.

===
Mark Millard
markmi at dsl-only.net
dump trying to access incorrect block numbers?
Recent builds doing a backup (dump) cause nonsensical errors in syslog:

Jul 7 00:10:24 toshi kernel:
g_vfs_done():ada0s3a[READ(offset=6050375794688, length=32768)]error = 5
Jul 7 00:10:24 toshi kernel:
g_vfs_done():ada0s3a[READ(offset=546222112768, length=32768)]error = 5
Jul 7 00:10:24 toshi kernel:
g_vfs_done():ada0s3a[READ(offset=2142846844928, length=32768)]error = 5
Jul 7 00:10:24 toshi last message repeated 7 times
Jul 7 00:10:24 toshi kernel:
g_vfs_done():ada0s3a[READ(offset=2226879725568, length=32768)]error = 5
Jul 7 00:10:24 toshi kernel:
g_vfs_done():ada0s3a[READ(offset=2941159211008, length=32768)]error = 5
Jul 7 00:10:24 toshi last message repeated 2 times
Jul 7 00:10:24 toshi kernel:
g_vfs_done():ada0s3a[READ(offset=3067208531968, length=32768)]error = 5
Jul 7 00:10:24 toshi kernel:
g_vfs_done():ada0s3a[READ(offset=3277290733568, length=32768)]error = 5
Jul 7 00:10:24 toshi kernel:
g_vfs_done():ada0s3a[READ(offset=3487372935168, length=32768)]error = 5
Jul 7 00:10:24 toshi kernel:
g_vfs_done():ada0s3a[READ(offset=3697455136768, length=32768)]error = 5
Jul 7 00:10:24 toshi kernel:
g_vfs_done():ada0s3a[READ(offset=3865520898048, length=32768)]error = 5

FSCK declares nothing to be wrong with the file-system. I even used the
'-r' inode reclaim option and '-Z' to zero unused blocks to no effect.

I now have two UFS-based systems showing the same symptoms - what's up
with this?

Michael
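The offsets in g_vfs_done() lines like those in this message are easier to compare once they are pulled out of the log text. The following is a minimal sketch in Python, assuming the message layout shown in this report; the function name parse_g_vfs_done and the two-line sample are illustrative only and are not part of any FreeBSD tool.

```python
import re

# Pattern for the kernel messages quoted in this thread, for example:
# g_vfs_done():ada0s3a[READ(offset=6050375794688, length=32768)]error = 5
G_VFS_DONE = re.compile(
    r"g_vfs_done\(\):(?P<dev>[^\[]+)\[(?P<op>[A-Z]+)\("
    r"offset=(?P<offset>-?\d+), length=(?P<length>\d+)\)\]"
    r"error = (?P<error>\d+)"
)


def parse_g_vfs_done(text):
    """Return one dict per g_vfs_done() record found in `text`."""
    records = []
    for m in G_VFS_DONE.finditer(text):
        records.append({
            "dev": m.group("dev"),
            "op": m.group("op"),
            "offset": int(m.group("offset")),
            "length": int(m.group("length")),
            "error": int(m.group("error")),
        })
    return records


# Two of the reported lines, joined onto single lines for the sample.
SAMPLE = """\
Jul 7 00:10:24 toshi kernel: g_vfs_done():ada0s3a[READ(offset=6050375794688, length=32768)]error = 5
Jul 7 00:10:24 toshi kernel: g_vfs_done():ada0s3a[READ(offset=546222112768, length=32768)]error = 5
"""

for rec in parse_g_vfs_done(SAMPLE):
    # An offset that is a whole multiple of the length is a block-aligned read.
    aligned = rec["offset"] % rec["length"] == 0
    print(rec["dev"], rec["op"], rec["offset"],
          "aligned" if aligned else "unaligned")
```

Because the pattern is searched across the whole text, it does not matter whether syslog wrapped the "kernel:" prefix and the g_vfs_done() part onto separate lines.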