Re: dump trying to access incorrect block numbers?

2017-07-12 Thread Mark Millard
I have extracted material from the list-exchanges
related to this and submitted:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220693

against the kernel for the issue.

I placed emphasis on the SSD-trim related "freeing
free block" panics that "fsck -B" can lead to
after it gets the g_vfs_done messages for unclean
ufs file systems but first noted that:

mksnap_ffs /.snap2

was enough to get the g_vfs_done messages.

I figured that the nastiest and most important known
consequences were "fsck -B" being broken for unclean
ufs file systems and having later panics trying to
trim based on how it is broken.

I did also mention dump as producing the messages.
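
For anyone who wants to try the reduced reproduction, here is a minimal sketch (not taken from the bug report) that counts how many new g_vfs_done() lines show up in the kernel message buffer around one snapshot attempt. Assumptions: Python 3 is available, the commands run as root on a disposable FreeBSD system, and the helper names are hypothetical. The snapshot path /.snap2 is the one used in the thread.

```python
import subprocess

MARKER = "g_vfs_done()"


def count_gvfs(log_text):
    """Count kernel log lines that mention g_vfs_done()."""
    return sum(1 for line in log_text.splitlines() if MARKER in line)


def new_gvfs_errors(before_text, after_text):
    """Number of g_vfs_done() lines added between two dmesg captures.

    The kernel message buffer can wrap, so the difference is clamped at
    zero instead of being reported as a negative count.
    """
    return max(0, count_gvfs(after_text) - count_gvfs(before_text))


def try_snapshot(snapshot_path="/.snap2"):
    """Take one UFS snapshot and report the new g_vfs_done() line count.

    Needs root on FreeBSD. It is deliberately not called at import time.
    """
    before = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
    subprocess.run(["mksnap_ffs", snapshot_path], check=True)
    after = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
    return new_gvfs_errors(before, after)
```

A non-zero result from try_snapshot() on an affected kernel would match what was reported; zero on an unaffected kernel is the expected contrast.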



I referenced. . .

> See also the exchange of list submittals associated
> with:
> 
> https://lists.freebsd.org/pipermail/freebsd-current/2017-July/066505.html
> 
> and:
> 
> https://lists.freebsd.org/pipermail/freebsd-current/2017-July/066508.html


===
Mark Millard
markmi at dsl-only.net


___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: dump trying to access incorrect block numbers?

2017-07-08 Thread Rodney W. Grimes
> On 07/08/17 12:28, Rodney W. Grimes wrote:
> 
> > Since it has been speculated that this is occurring during the
> > creation of the snapshot, could you try just creating a snapshot
> > using mksnap_ffs and see if any errors occur?
> 
> After a short pause with disk activity, the same sorts of errors are 
> logged when using "mksnap_ffs /.snap2" where .snap2 did not previously 
> exist.

OK, so this simplifies what needs to be looked at: this is no
longer a dump(8) issue but a UFS snapshot issue.

This should be much easier for more people to try out.  

-- 
Rod Grimes rgri...@freebsd.org


Re: dump trying to access incorrect block numbers?

2017-07-08 Thread Mark Millard
[A normal multi-user boot's fsck activity can include
 "fsck -B" activity that hits the problem.]

On 2017-Jul-8, at 9:45 AM, Mark Millard  wrote:

> [I add notes about a problem that happens after the
> "fsck -B". Also forgot to mention: production-style
> kernel and world builds were in use. And I tried a
> powerpc64 build and it behaves the same.]
> 
> On 2017-Jul-7, at 11:09 PM, Mark Millard  wrote:
> 
>> [This note has more information than one sent with extra text
>> in the subject but with a partially different "to" list.]
>> 
>> Peter Jeremy peter at rulingia.com wrote on
>> Sat Jul 8 02:00:47 UTC 2017 :
>> 
>>> When did you first notice this (what SVN revision)?
>>> Do you know what the last good SVN revision was?
>>> Is this a new or old filesystem?
>>> Is the filesystem mounted/active or not when you dump it?
>>> What are the relevant parameters for the filesystem on ada0s3a?
>>> Are you running softupdates, journalling etc?
>>> Which dump(8) phase is reporting the errors?
>>> What are the exact dump and fsck commands you ran?
>> 
>> I can add a little information with some contrast
>> and only "fsck -B" in use (with an unclean file
>> system from a prior crash), no dump use. Still:
>> a snapshot is involved in the below.
>> 
>> Unfortunately two problems with major consequences
>> for my involved context limit the svn range that I
>> can cover for the activity, the problem version
>> ranges being:
>> 
>> -r319722 through -r320651 (fixed by -r320652)
>> (actually this is why I had used "boot -s" 
>> in what I report later: I could get to a
>> shell prompt that way instead of crashing
>> before any login prompt; the crashes left
>> the file system in need of repair)
>> 
>> -r320509 through -r320561 (fixed by -r320570)
>> 
>> So I was using -r320570 to avoid one of the
>> two problems.
>> 
>> 
>> 
>> Context: 32-bit powerpc FreeBSD used on PowerMac G5
>> so-called "Quad-core". (So big-endian as well.)
>> Softupdates, no journalling. Long-in-use file
>> system having lots of FreeBSD versions updates
>> and port rebuilds over the time.
>> 
>> The following is from now, not from the time of the
>> example messages:
>> 
>> # dumpfs / | more
>> magic   19540119 (UFS2) timeFri Jul  7 22:53:34 2017
>> superblock location 65536   id  [  ]
>> ncg 158 size25165823blocks  24372006
>> bsize   32768   shift   15  mask0x8000
>> fsize   4096shift   12  mask0xf000
>> frag8   shift   3   fsbtodb 3
>> minfree 8%  optim   timesymlinklen 120
>> maxbsize 32768  maxbpg  4096maxcontig 4 contigsumsize 4
>> nbfree  2130375 ndir65518   nifree  11769796nffree  425065
>> bpg 20032   fpg 160256  ipg 80128   unrefs  0
>> nindir  4096inopb   128 maxfilesize 2252349704110079
>> sbsize  4096cgsize  32768   csaddr  5048cssize  4096
>> sblkno  24  cblkno  32  iblkno  40  dblkno  5048
>> cgrotor 127 fmod0   ronly   0   clean   0
>> metaspace 6408  avgfpdir 64 avgfilesize 16384
>> flags   soft-updates trim 
>> fsmnt   /
>> volname FBSDG4Srootfs   swuid   0   providersize25165823
>> . . .
>> 
>> 
>> 
>> What I had done that produced the messages was:
>> 
>> > leaves root (only) file system not marked clean
>> so fsck -B will actually do something below>
>> 
>> boot -s (so: single user mode)
>> # The next 3 lines are the content of a generic, manually-run script.
>> mount -u /
>> mount -a -t ufs (but there is no other file system)
>> swapon -a   (there is a swap partition)
>> #
>> fsck -B
>> 
>> That "fsck -B" caused the same kinds of lines
>> reported by Michael Butler, happening as fsck
>> makes a snapshot for the background processing
>> to use. (I have camera pictures and could type
>> in some of the lines if needed.)
>> 
>> After those lines was text like (typed in from
>> an example camera picture):
>> 
>> ** //.snap/fsck_snapshot
>> ** Last Mount on /
>> ** Root file system
>> ** Phase 1 - Check Blocks and Sizes
>> ** Phase 2 - Check Pathnames
>> ** Phase 3 - Check Connectivity
>> ** Phase 4 - Check Reference Counts
>> ** Phase 5 - Check Cyl groups
>> Reclaimed: 0 directories, 1 files, 22680 fragments
>> 780914 files, 4797127 used, 19552199 free (443479 frags, 3288590 blocks, 
>> 1.8% fragmentation)
>> 
>> * FILE SYSTEM MARKED CLEAN *
> 
> [I forgot to mention that the context was a
> production style kernel and world build,
> no invariants or other such.]
> 
> Since I'm running a patched -r320570 for the
> issue:
> 
> -r319722 through -r320651 (fixed by -r320652)
> 
> I went back and forced a power-off without
> shutdown and did the sequence:
> 
> boot -s (so: single user mode)
> # The next 3 lines are the content of a generic, manually-run script.
> mount -u /
> mount -a -t ufs (but there is no other file system)
> swapon -a   (there is a swap partition)
> #
> fsck -B
> 
> but always waited briefly after the fsck -B finished.
> 
> Like before the following happens as it tries to trim: . . .

Re: dump trying to access incorrect block numbers?

2017-07-08 Thread Mark Millard
[I add notes about a problem that happens after the
"fsck -B". Also forgot to mention: production-style
kernel and world builds were in use. And I tried a
powerpc64 build and it behaves the same.]

On 2017-Jul-7, at 11:09 PM, Mark Millard  wrote:

> [This note has more information than one sent with extra text
> in the subject but with a partially different "to" list.]
> 
> Peter Jeremy peter at rulingia.com wrote on
> Sat Jul 8 02:00:47 UTC 2017 :
> 
>> When did you first notice this (what SVN revision)?
>> Do you know what the last good SVN revision was?
>> Is this a new or old filesystem?
>> Is the filesystem mounted/active or not when you dump it?
>> What are the relevant parameters for the filesystem on ada0s3a?
>> Are you running softupdates, journalling etc?
>> Which dump(8) phase is reporting the errors?
>> What are the exact dump and fsck commands you ran?
> 
> I can add a little information with some contrast
> and only "fsck -B" in use (with an unclean file
> system from a prior crash), no dump use. Still:
> a snapshot is involved in the below.
> 
> Unfortunately two problems with major consequences
> for my involved context limit the svn range that I
> can cover for the activity, the problem version
> ranges being:
> 
> -r319722 through -r320651 (fixed by -r320652)
> (actually this is why I had used "boot -s" 
> in what I report later: I could get to a
> shell prompt that way instead of crashing
> before any login prompt; the crashes left
> the file system in need of repair)
> 
> -r320509 through -r320561 (fixed by -r320570)
> 
> So I was using -r320570 to avoid one of the
> two problems.
> 
> 
> 
> Context: 32-bit powerpc FreeBSD used on PowerMac G5
> so-called "Quad-core". (So big-endian as well.)
> Softupdates, no journalling. Long-in-use file
> system having lots of FreeBSD versions updates
> and port rebuilds over the time.
> 
> The following is from now, not from the time of the
> example messages:
> 
> # dumpfs / | more
> magic   19540119 (UFS2) timeFri Jul  7 22:53:34 2017
> superblock location 65536   id  [  ]
> ncg 158 size25165823blocks  24372006
> bsize   32768   shift   15  mask0x8000
> fsize   4096shift   12  mask0xf000
> frag8   shift   3   fsbtodb 3
> minfree 8%  optim   timesymlinklen 120
> maxbsize 32768  maxbpg  4096maxcontig 4 contigsumsize 4
> nbfree  2130375 ndir65518   nifree  11769796nffree  425065
> bpg 20032   fpg 160256  ipg 80128   unrefs  0
> nindir  4096inopb   128 maxfilesize 2252349704110079
> sbsize  4096cgsize  32768   csaddr  5048cssize  4096
> sblkno  24  cblkno  32  iblkno  40  dblkno  5048
> cgrotor 127 fmod0   ronly   0   clean   0
> metaspace 6408  avgfpdir 64 avgfilesize 16384
> flags   soft-updates trim 
> fsmnt   /
> volname FBSDG4Srootfs   swuid   0   providersize25165823
> . . .
> 
> 
> 
> What I had done that produced the messages was:
> 
>  leaves root (only) file system not marked clean
> so fsck -B will actually do something below>
> 
> boot -s (so: single user mode)
> # The next 3 lines are the content of a generic, manually-run script.
> mount -u /
> mount -a -t ufs (but there is no other file system)
> swapon -a   (there is a swap partition)
> #
> fsck -B
> 
> That "fsck -B" caused the same kinds of lines
> reported by Michael Butler, happening as fsck
> makes a snapshot for the background processing
> to use. (I have camera pictures and could type
> in some of the lines if needed.)
> 
> After those lines was text like (typed in from
> an example camera picture):
> 
> ** //.snap/fsck_snapshot
> ** Last Mount on /
> ** Root file system
> ** Phase 1 - Check Blocks and Sizes
> ** Phase 2 - Check Pathnames
> ** Phase 3 - Check Connectivity
> ** Phase 4 - Check Reference Counts
> ** Phase 5 - Check Cyl groups
> Reclaimed: 0 directories, 1 files, 22680 fragments
> 780914 files, 4797127 used, 19552199 free (443479 frags, 3288590 blocks, 1.8% 
> fragmentation)
> 
> * FILE SYSTEM MARKED CLEAN *

[I forgot to mention that the context was a
production style kernel and world build,
no invariants or other such.]

Since I'm running a patched -r320570 for the
issue:

-r319722 through -r320651 (fixed by -r320652)

I went back and forced a power-off without
shutdown and did the sequence:

boot -s (so: single user mode)
# The next 3 lines are the content of a generic, manually-run script.
mount -u /
mount -a -t ufs (but there is no other file system)
swapon -a   (there is a swap partition)
#
fsck -B

but always waited briefly after the fsck -B finished.

Like before the following happens as it tries to trim:
(typed in from camera picture)

panic: ffs_blkfree_cq: freeing free block
cpuid = 2 (varies, of course)
time = (varies)
KDB: stack backtrace
(stack addresses can vary: just an example here)
0xd23b17e0: at kdb_backtrace+0x5c
0xd23b1850: at vpanic+0x1e8
0xd23b18c0: at panic+0x54

Re: dump trying to access incorrect block numbers?

2017-07-08 Thread Rodney W. Grimes
> On 07/07/17 21:53, Peter Jeremy wrote:
> > On 2017-Jul-07 10:44:36 -0400, Michael Butler  
> > wrote:
> >> Recent builds doing a backup (dump) cause nonsensical errors in syslog:
> > 
> > I can't directly offer any ideas but some more background might help:
> > When did you first notice this (what SVN revision)?
> 
> I was stuck on SVN r319721 on the i386 machine while the socket/union 
> issue was addressed. That version did not display the problem.
> 
> > Do you know what the last good SVN revision was?
> > Is this a new or old filesystem?
> 
> old - it's been years since this system was rebuilt.
> 
> > Is the filesystem mounted/active or not when you dump it?
> 
> Mounted and active.
> 
> > What are the relevant parameters for the filesystem on ada0s3a?
> 
> imb@toshi:/home/imb> dumpfs /
> 
> magic   19540119 (UFS2) timeFri Jul  7 22:43:49 2017
> superblock location 65536   id  [ 56c8bf68 1a8b12b5 ]
> ncg 516 size82575360blocks  79978821
> bsize   32768   shift   15  mask0x8000
> fsize   4096shift   12  mask0xf000
> frag8   shift   3   fsbtodb 3
> minfree 8%  optim   timesymlinklen 120
> maxbsize 32768  maxbpg  4096maxcontig 4 contigsumsize 4
> nbfree  3965346 ndir98169   nifree  40196026nffree  453383
> bpg 20035   fpg 160280  ipg 80256   unrefs  0
> nindir  4096inopb   128 maxfilesize 2252349704110079
> sbsize  4096cgsize  32768   csaddr  5056cssize  12288
> sblkno  24  cblkno  32  iblkno  40  dblkno  5056
> cgrotor 253 fmod0   ronly   0   clean   0
> metaspace 6408  avgfpdir 64 avgfilesize 16384
> flags   soft-updates
> fsmnt   /
> volname swuid   0   providersize82575360
> 
>   [ .. ]
> 
> > Are you running softupdates, journalling etc?
> 
> soft-updates only.
> 
> > Which dump(8) phase is reporting the errors?
> 
> The errors occur before the "date of the last level x dump" message - 
> presumably, this is while creating the snapshot.
> 
> > What are the exact dump and fsck commands you ran?
> 
> /sbin/dump 0Lauf - -C 32 /
> 
> none of the following report any (unexpected) errors:
> 
> fsck -f /
> fsck -f -r /
> fsck -f -Z /
> 
> > 
> >> I now have two UFS-based systems showing the same symptoms - what's up
> >> with this?
> > 
> > Was there anything you did on either filesystem that might have triggered 
> > it?
> 
> Other than update the kernel, no.

Since it has been speculated that this is occurring during the
creation of the snapshot, could you try just creating a snapshot
using mksnap_ffs and see if any errors occur?


-- 
Rod Grimes rgri...@freebsd.org


Re: dump trying to access incorrect block numbers?

2017-07-08 Thread Michael Butler

On 07/08/17 12:28, Rodney W. Grimes wrote:

> Since it has been speculated that this is occurring during the
> creation of the snapshot, could you try just creating a snapshot
> using mksnap_ffs and see if any errors occur?

After a short pause with disk activity, the same sorts of errors are 
logged when using "mksnap_ffs /.snap2" where .snap2 did not previously 
exist.


Michael




Re: dump trying to access incorrect block numbers?

2017-07-07 Thread Mark Millard
[This note has more information than one sent with extra text
in the subject but with a partially different "to" list.]

Peter Jeremy peter at rulingia.com wrote on
Sat Jul 8 02:00:47 UTC 2017 :

> When did you first notice this (what SVN revision)?
> Do you know what the last good SVN revision was?
> Is this a new or old filesystem?
> Is the filesystem mounted/active or not when you dump it?
> What are the relevant parameters for the filesystem on ada0s3a?
> Are you running softupdates, journalling etc?
> Which dump(8) phase is reporting the errors?
> What are the exact dump and fsck commands you ran?

I can add a little information with some contrast
and only "fsck -B" in use (with an unclean file
system from a prior crash), no dump use. Still:
a snapshot is involved in the below.

Unfortunately, two problems with major consequences
in my context limit the svn range that I can cover
for this activity. The problem version ranges
are:

-r319722 through -r320651 (fixed by -r320652)
(actually this is why I had used "boot -s" 
in what I report later: I could get to a
shell prompt that way instead of crashing
before any login prompt; the crashes left
the file system in need of repair)

-r320509 through -r320561 (fixed by -r320570)

So I was using -r320570 to avoid one of the
two problems.
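
The two ranges above can be read as a small lookup. The following is only a sketch of that reading: the labels come from how the two fixes are described elsewhere in this thread (the socket software problem fixed by -r320652 and the __pthread_cleanup_push_imp stubs problem fixed by -r320570), and the names KNOWN_PROBLEMS and problems_at are hypothetical.

```python
# (first affected revision, last affected revision, fixing revision),
# copied from the ranges listed in the message.
KNOWN_PROBLEMS = {
    "socket software problem": (319722, 320651, 320652),
    "__pthread_cleanup_push_imp stubs problem": (320509, 320561, 320570),
}


def problems_at(revision):
    """Return the names of the listed problems that affect a revision."""
    return [
        name
        for name, (first_bad, last_bad, _fixed_in) in KNOWN_PROBLEMS.items()
        if first_bad <= revision <= last_bad
    ]
```

Under this reading, problems_at(320570) still lists the socket software problem, which fits the use of "boot -s" and of a patched -r320570 described in the thread.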



Context: 32-bit powerpc FreeBSD used on PowerMac G5
so-called "Quad-core". (So big-endian as well.)
Softupdates, no journalling. Long-in-use file
system having lots of FreeBSD version updates
and port rebuilds over time.

The following is from now, not from the time of the
example messages:

# dumpfs / | more
magic   19540119 (UFS2) time    Fri Jul  7 22:53:34 2017
superblock location     65536   id      [  ]
ncg     158     size    25165823        blocks  24372006
bsize   32768   shift   15      mask    0x8000
fsize   4096    shift   12      mask    0xf000
frag    8       shift   3       fsbtodb 3
minfree 8%      optim   time    symlinklen 120
maxbsize 32768  maxbpg  4096    maxcontig 4     contigsumsize 4
nbfree  2130375 ndir    65518   nifree  11769796        nffree  425065
bpg     20032   fpg     160256  ipg     80128   unrefs  0
nindir  4096    inopb   128     maxfilesize     2252349704110079
sbsize  4096    cgsize  32768   csaddr  5048    cssize  4096
sblkno  24      cblkno  32      iblkno  40      dblkno  5048
cgrotor 127     fmod    0       ronly   0       clean   0
metaspace 6408  avgfpdir 64     avgfilesize 16384
flags   soft-updates trim
fsmnt   /
volname FBSDG4Srootfs   swuid   0       providersize    25165823
. . .



What I had done that produced the messages was:



boot -s (so: single user mode)
# The next 3 lines are the content of a generic, manually-run script.
mount -u /
mount -a -t ufs (but there is no other file system)
swapon -a   (there is a swap partition)
#
fsck -B

That "fsck -B" caused the same kinds of lines
reported by Michael Butler, happening as fsck
makes a snapshot for the background processing
to use. (I have camera pictures and could type
in some of the lines if needed.)

After those lines was text like (typed in from
an example camera picture):

** //.snap/fsck_snapshot
** Last Mount on /
** Root file system
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
Reclaimed: 0 directories, 1 files, 22680 fragments
780914 files, 4797127 used, 19552199 free (443479 frags, 3288590 blocks, 1.8% 
fragmentation)

* FILE SYSTEM MARKED CLEAN *


===
Mark Millard
markmi at dsl-only.net



Re: dump trying to access incorrect block numbers?

2017-07-07 Thread Michael Butler

On 07/07/17 21:53, Peter Jeremy wrote:
> On 2017-Jul-07 10:44:36 -0400, Michael Butler wrote:
>> Recent builds doing a backup (dump) cause nonsensical errors in syslog:
>
> I can't directly offer any ideas but some more background might help:
> When did you first notice this (what SVN revision)?

I was stuck on SVN r319721 on the i386 machine while the socket/union 
issue was addressed. That version did not display the problem.

> Do you know what the last good SVN revision was?
> Is this a new or old filesystem?

old - it's been years since this system was rebuilt.

> Is the filesystem mounted/active or not when you dump it?

Mounted and active.

> What are the relevant parameters for the filesystem on ada0s3a?


imb@toshi:/home/imb> dumpfs /

magic   19540119 (UFS2) time    Fri Jul  7 22:43:49 2017
superblock location     65536   id      [ 56c8bf68 1a8b12b5 ]
ncg     516     size    82575360        blocks  79978821
bsize   32768   shift   15      mask    0x8000
fsize   4096    shift   12      mask    0xf000
frag    8       shift   3       fsbtodb 3
minfree 8%      optim   time    symlinklen 120
maxbsize 32768  maxbpg  4096    maxcontig 4     contigsumsize 4
nbfree  3965346 ndir    98169   nifree  40196026        nffree  453383
bpg     20035   fpg     160280  ipg     80256   unrefs  0
nindir  4096    inopb   128     maxfilesize     2252349704110079
sbsize  4096    cgsize  32768   csaddr  5056    cssize  12288
sblkno  24      cblkno  32      iblkno  40      dblkno  5056
cgrotor 253     fmod    0       ronly   0       clean   0
metaspace 6408  avgfpdir 64     avgfilesize 16384
flags   soft-updates
fsmnt   /
volname         swuid   0       providersize    82575360

 [ .. ]
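
The dumpfs figures above give a quick way to see why the logged reads look nonsensical. The sketch below is an illustration, not part of the original message. It assumes that the dumpfs "size" value counts fragments of "fsize" bytes, and it copies the offsets from the g_vfs_done() lines reported in this thread. The names are hypothetical.

```python
FSIZE = 4096              # dumpfs "fsize": fragment size in bytes
BSIZE = 32768             # dumpfs "bsize": block size in bytes
FSBTODB = 3               # dumpfs "fsbtodb": shift from fragments to disk blocks
FS_SIZE_FRAGS = 82575360  # dumpfs "size", assumed to be in fragments

FS_BYTES = FS_SIZE_FRAGS * FSIZE  # 338228674560 bytes

# Offsets from the g_vfs_done() READ errors quoted in the thread.
LOGGED_OFFSETS = [
    6050375794688, 546222112768, 2142846844928, 2226879725568,
    2941159211008, 3067208531968, 3277290733568, 3487372935168,
    3697455136768, 3865520898048,
]


def is_out_of_range(offset, length=BSIZE, limit=FS_BYTES):
    """True when a read of `length` bytes at `offset` ends past `limit`."""
    return offset + length > limit


out_of_range = [o for o in LOGGED_OFFSETS if is_out_of_range(o)]
```

Under that assumption every logged offset lies beyond the end of the file system, and the smallest one (546222112768) is already past 338228674560 bytes. The logged read length of 32768 matches bsize, and fsize shifted right by fsbtodb gives 512-byte disk blocks.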


> Are you running softupdates, journalling etc?

soft-updates only.

> Which dump(8) phase is reporting the errors?

The errors occur before the "date of the last level x dump" message - 
presumably, this is while creating the snapshot.

> What are the exact dump and fsck commands you ran?

/sbin/dump 0Lauf - -C 32 /

none of the following report any (unexpected) errors:

fsck -f /
fsck -f -r /
fsck -f -Z /

>> I now have two UFS-based systems showing the same symptoms - what's up
>> with this?
>
> Was there anything you did on either filesystem that might have triggered it?

Other than update the kernel, no.

Michael



Re: dump trying to access incorrect block numbers?

2017-07-07 Thread Peter Jeremy
On 2017-Jul-07 10:44:36 -0400, Michael Butler  
wrote:
>Recent builds doing a backup (dump) cause nonsensical errors in syslog:

I can't directly offer any ideas but some more background might help:
When did you first notice this (what SVN revision)?
Do you know what the last good SVN revision was?
Is this a new or old filesystem?
Is the filesystem mounted/active or not when you dump it?
What are the relevant parameters for the filesystem on ada0s3a?
Are you running softupdates, journalling etc?
Which dump(8) phase is reporting the errors?
What are the exact dump and fsck commands you ran?

>I now have two UFS-based systems showing the same symptoms - what's up 
>with this?

Was there anything you did on either filesystem that might have triggered it?

-- 
Peter Jeremy




Re: dump trying to access incorrect block numbers? [It is not just dump that can get such]

2017-07-07 Thread Mark Millard

On 2017-Jul-7, at 4:49 PM, Michael Butler  wrote:

> On 07/07/17 19:02, Mark Millard wrote:
>> Michael Butler imb at protected-networks.net  wrote on
>> Fri Jul 7 14:45:12 UTC 2017 :
>>> Recent builds doing a backup (dump) cause nonsensical errors in syslog:
>>> 
>>> Jul  7 00:10:24 toshi kernel:
>>> g_vfs_done():ada0s3a[READ(offset=6050375794688, length=32768)]error = 5
>>> Jul  7 00:10:24 toshi kernel:
>>> g_vfs_done():ada0s3a[READ(offset=546222112768, length=32768)]error = 5
>>> Jul  7 00:10:24 toshi kernel:
> 
> [ snip ]
> 
>> Both dump and fsck likely are using snapshots
>> so the issue is likely tied to ufs snapshots.
>> Maybe it has an INO64 incompleteness that
>> gives the huge offsets. (Wild guess.)
>> If your context was more typical then the issue
>> spans little-endian and big-endian since the
>> powerpc context is big-endian but most usage
>> is little-endian.
> 
> I'm seeing this on both amd64 and i386 builds (@SVN r320760) when dump tries 
> to build a snapshot. These are both non-debug, non-invariant production boxes.

Sounds like there is enough evidence of repeatability,
span of TARGET_ARCHs and systems, and recent enough
range of -r320??? vintages for a bugzilla submittal.

Your TARGET_ARCHs are more mainstream than where I've
tried something that showed the issue. Want to do the
initial submittal?

===
Mark Millard
markmi at dsl-only.net





Re: dump trying to access incorrect block numbers? [It is not just dump that can get such]

2017-07-07 Thread Michael Butler

On 07/07/17 19:02, Mark Millard wrote:
> Michael Butler imb at protected-networks.net wrote on
> Fri Jul 7 14:45:12 UTC 2017 :
>
>> Recent builds doing a backup (dump) cause nonsensical errors in syslog:
>>
>> Jul  7 00:10:24 toshi kernel:
>> g_vfs_done():ada0s3a[READ(offset=6050375794688, length=32768)]error = 5
>> Jul  7 00:10:24 toshi kernel:
>> g_vfs_done():ada0s3a[READ(offset=546222112768, length=32768)]error = 5
>> Jul  7 00:10:24 toshi kernel:

 [ snip ]

> Both dump and fsck likely are using snapshots
> so the issue is likely tied to ufs snapshots.
> Maybe it has an INO64 incompleteness that
> gives the huge offsets. (Wild guess.)
>
> If your context was more typical then the issue
> spans little-endian and big-endian since the
> powerpc context is big-endian but most usage
> is little-endian.

I'm seeing this on both amd64 and i386 builds (@SVN r320760) when dump 
tries to build a snapshot. These are both non-debug, non-invariant 
production boxes.


Michael



Re: dump trying to access incorrect block numbers? [It is not just dump that can get such]

2017-07-07 Thread Mark Millard
Michael Butler imb at protected-networks.net  wrote on 
Fri Jul 7 14:45:12 UTC 2017 :

> Recent builds doing a backup (dump) cause nonsensical errors in syslog:
> 
> Jul  7 00:10:24 toshi kernel: 
> g_vfs_done():ada0s3a[READ(offset=6050375794688, length=32768)]error = 5
> Jul  7 00:10:24 toshi kernel: 
> g_vfs_done():ada0s3a[READ(offset=546222112768, length=32768)]error = 5
> Jul  7 00:10:24 toshi kernel: 
> g_vfs_done():ada0s3a[READ(offset=2142846844928, length=32768)]error = 5
> Jul  7 00:10:24 toshi last message repeated 7 times
> Jul  7 00:10:24 toshi kernel: 
> g_vfs_done():ada0s3a[READ(offset=2226879725568, length=32768)]error = 5
> Jul  7 00:10:24 toshi kernel: 
> g_vfs_done():ada0s3a[READ(offset=2941159211008, length=32768)]error = 5
> Jul  7 00:10:24 toshi last message repeated 2 times
> Jul  7 00:10:24 toshi kernel: 
> g_vfs_done():ada0s3a[READ(offset=3067208531968, length=32768)]error = 5
> Jul  7 00:10:24 toshi kernel: 
> g_vfs_done():ada0s3a[READ(offset=3277290733568, length=32768)]error = 5
> Jul  7 00:10:24 toshi kernel: 
> g_vfs_done():ada0s3a[READ(offset=3487372935168, length=32768)]error = 5
> Jul  7 00:10:24 toshi kernel: 
> g_vfs_done():ada0s3a[READ(offset=3697455136768, length=32768)]error = 5
> Jul  7 00:10:24 toshi kernel: 
> g_vfs_done():ada0s3a[READ(offset=3865520898048, length=32768)]error = 5
> 
> FSCK declares nothing to be wrong with the file-system. I even used the 
> '-r' inode reclaim option and '-Z' to zero unused blocks to no effect.
> 
> I now have two UFS-based systems showing the same symptoms - what's up 
> with this?

I've seen these kinds of messages on 32-bit powerpc -r320570 when
using "boot -s" (standalone) and doing an fsck after making the
ufs root file system writable. (-r320570 could not boot
multi-user all the way without workarounds due to socket software
errors.) [Context was a production-style kernel build, not the
debug style, but I likely did not try this for a debug kernel
build.]

The messages came out before the following:
(manually retyped from a camera picture)

** //.snap/fsck_snapshot
** Last Mount on /
** Root file system
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
Reclaimed: 0 directories, 1 files, 22680 fragments
780914 files, 4797127 used, 19552199 free (443479 frags, 3288590 blocks, 1.8% 
fragmentation)

* FILE SYSTEM MARKED CLEAN *
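
As a side check on the hand-typed summary line above, here is a small arithmetic sketch. It assumes, from my understanding of fsck_ffs and not from anything stated in this thread, that the "free" total is the free fragment count plus the free block count times the fragments-per-block value (8 for this file system, per the dumpfs "frag" field reported elsewhere in the thread). The function name is hypothetical.

```python
FRAG = 8  # fragments per block (dumpfs "frag"); assumed to apply here


def free_fragments(free_frags, free_blocks, frag=FRAG):
    """Total free space in fragments: loose fragments plus whole blocks."""
    return free_frags + free_blocks * frag


REPORTED_FREE = 19552199                    # "free" as typed in the message
as_typed = free_fragments(443479, 3288590)  # "frags" and "blocks" as typed
balances = as_typed == REPORTED_FREE
```

With the figures as typed the two sides do not balance (26752199 versus 19552199). Since the text was retyped from a camera picture, that more likely points to a transcription slip in one of the numbers than to anything about the bug itself. A block count of 2388590 would balance exactly, but that value is only a guess and is not stated anywhere in the thread.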


There were a lot of the messages.


I've not checked if anything after -r320570
for 32-bit powerpc shows such or not. (The
socket software problem has an official fix
checked in: -r320652 . But I've not got as
far as progressing to it or beyond yet.)

-r320570 was a fix of another major problem
for the use of  __pthread_cleanup_push_imp
stubs.

At the time I was not sure whether the g_vfs_done
notices were a distinct issue from the other
issues of that time frame, and I did not get as
far as investigating that question.


Both dump and fsck likely are using snapshots
so the issue is likely tied to ufs snapshots.
Maybe it has an INO64 incompleteness that
gives the huge offsets. (Wild guess.)

If your context was more typical then the issue
spans little-endian and big-endian since the
powerpc context is big-endian but most usage
is little-endian.

===
Mark Millard
markmi at dsl-only.net



dump trying to access incorrect block numbers?

2017-07-07 Thread Michael Butler

Recent builds doing a backup (dump) cause nonsensical errors in syslog:

Jul  7 00:10:24 toshi kernel: 
g_vfs_done():ada0s3a[READ(offset=6050375794688, length=32768)]error = 5
Jul  7 00:10:24 toshi kernel: 
g_vfs_done():ada0s3a[READ(offset=546222112768, length=32768)]error = 5
Jul  7 00:10:24 toshi kernel: 
g_vfs_done():ada0s3a[READ(offset=2142846844928, length=32768)]error = 5

Jul  7 00:10:24 toshi last message repeated 7 times
Jul  7 00:10:24 toshi kernel: 
g_vfs_done():ada0s3a[READ(offset=2226879725568, length=32768)]error = 5
Jul  7 00:10:24 toshi kernel: 
g_vfs_done():ada0s3a[READ(offset=2941159211008, length=32768)]error = 5

Jul  7 00:10:24 toshi last message repeated 2 times
Jul  7 00:10:24 toshi kernel: 
g_vfs_done():ada0s3a[READ(offset=3067208531968, length=32768)]error = 5
Jul  7 00:10:24 toshi kernel: 
g_vfs_done():ada0s3a[READ(offset=3277290733568, length=32768)]error = 5
Jul  7 00:10:24 toshi kernel: 
g_vfs_done():ada0s3a[READ(offset=3487372935168, length=32768)]error = 5
Jul  7 00:10:24 toshi kernel: 
g_vfs_done():ada0s3a[READ(offset=3697455136768, length=32768)]error = 5
Jul  7 00:10:24 toshi kernel: 
g_vfs_done():ada0s3a[READ(offset=3865520898048, length=32768)]error = 5
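
For anyone collecting these messages from several machines, a minimal parsing sketch follows. It is an illustration only: the regular expression is written against the exact line shape shown above, and the names GVFS_RE and parse_gvfs are hypothetical.

```python
import errno
import re

# Matches lines shaped like:
# g_vfs_done():ada0s3a[READ(offset=6050375794688, length=32768)]error = 5
GVFS_RE = re.compile(
    r"g_vfs_done\(\):(?P<dev>[^\[]+)\[(?P<op>[A-Z]+)\("
    r"offset=(?P<offset>-?\d+), length=(?P<length>\d+)\)\]"
    r"error = (?P<error>\d+)"
)


def parse_gvfs(line):
    """Return the fields of one g_vfs_done() line, or None if no match."""
    match = GVFS_RE.search(line)
    if match is None:
        return None
    error = int(match.group("error"))
    return {
        "dev": match.group("dev"),
        "op": match.group("op"),
        "offset": int(match.group("offset")),
        "length": int(match.group("length")),
        "error": error,
        # Symbolic errno name on the machine running this script.
        "error_name": errno.errorcode.get(error, "unknown"),
    }
```

Because syslog splits each report over two lines here, the function is meant to be applied to the line that carries the g_vfs_done() text. The timestamp line alone yields None.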


FSCK declares nothing to be wrong with the file-system. I even used the 
'-r' inode reclaim option and '-Z' to zero unused blocks to no effect.


I now have two UFS-based systems showing the same symptoms - what's up 
with this?


Michael