Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]
On 2017-Mar-21, at 7:21 PM, Mark Millard wrote: > On 2017-Mar-18, at 9:10 PM, Mark Millard wrote: > >> >> On 2017-Mar-18, at 5:53 PM, Mark Millard wrote: >> >>> A new, significant discovery follows. . . >>> >>> While checking out use of procstat -v I ran >>> into the following common property for the 3 >>> programs that I looked at: >>> >>> A) My small test program that fails for >>> a dynamically allocated space. >>> >>> B) sh reporting Failed assertion: "tsd_booted". >>> >>> C) su reporting Failed assertion: "tsd_booted". >>> >>> Here are example addresses from the area of >>> incorrectly zeroed memory (A then B then C): >>> >>> (lldb) print dyn_region >>> (region *volatile) $0 = 0x40616000 >>> >>> (lldb) print &__je_tsd_booted >>> (bool *) $0 = 0x40618520 >>> >>> (lldb) print &__je_tsd_booted >>> (bool *) $0 = 0x40618520 >> >> That last above was a copy/paste error. Correction: >> >> (lldb) print &__je_tsd_booted >> (bool *) $0 = 0x4061d520 >> >>> The first is from dynamic allocation ending up >>> in the area. The other two are from libc.so.7 >>> globals/statics ending up in the general area. >>> >>> It looks like something is trashing a specific >>> memory area for some reason, rather independently >>> of what the program specifics are. > > I probably should have noted that the processes > involved were: child/parent then grandparent > and then great grandparent. The grandparent > was sh and the great grandparent was su. > > The ancestors in the process tree are being > damaged, not just the instances of the > program that demonstrates the problem. > >>> Other notes: >>> >>> At least for my small program showing failure: >>> >>> Being explicit about the combined conditions for failure >>> for my test program. . . >>> >>> Both tcache enabled and allocations fitting in SMALL_MAXCLASS >>> are required in order to make the program fail. >>> >>> Note: >>> >>> lldb) print __je_tcache_maxclass >>> (size_t) $0 = 32768 >>> >>> which is larger than SMALL_MAXCLASS. I've not observed >>> failures for sizes above SMALL_MAXCLASS but not exceeding >>> __je_tcache_maxclass. >>> >>> Thus tcache use by itself does not seen sufficient for >>> my program to get corruption of its dynamically allocated >>> memory: the small allocation size also matters. >>> >>> >>> Be warned that I can not eliminate the possibility that >>> the trashing changed what region of memory it trashed >>> for larger allocations or when tcache is disabled. >> >> The pine64+ 2GB eventually got into a state where: >> >> /etc/malloc.conf -> tcache:false >> >> made no difference and the failure kept occurring >> with that symbolic link in place. >> >> But after a reboot of the pin46+ 2GB >> /etc/malloc.conf -> tcache:false was again effective >> for my test program. (It was still present from >> before the reboot.) >> >> I checked the .core files and the allocated address >> assigned to dyn_region was the same in the tries >> before and after the reboot. (I had put in an >> additional raise(SIGABRT) so I'd always have >> a core file to look at.) >> >> Apparently /etc/malloc.conf -> tcache:false was >> being ignored before the reboot for some reason? > > I have also discovered that if the child process > in an example like my program does a: > > (void) posix_madvise(dyn_region, region_size, POSIX_MADV_WILLNEED); > > after the fork but before the sleep/swap-out/wait > then the problem does not happen. This is without > any read or write access to the memory between the > fork and sleep/swap-out/wait. > > By contrast such POSIX_MADV_WILLNEED use in the parent > process does not change the failure behavior. I've added another test program to bugzilla 217239 and 217138, one with thousands of 14 KiByte allocations. The test program usually ends up with them all being zeroed in the parent and child of the fork. But I've had a couple of runs where a much smaller prefix was messed up and then there were normal, expected values. #define region_size (14u*1024u) . . . #define num_regions (256u*1024u*1024u/region_size) So num_regions==18724, using up most of 256 MiBytes. Note: each region has its own 14 KiByte allocation. But dyn_regions[1296].array[0] in one example was the first normal value. In another example dyn_regions[2180].array[4096] was the first normal value. The last is interesting for being part way through an allocation's space. That but aligning with a 4 KiByte page size would seem odd for a pure-jemalloc issue. === Mark Millard markmi at dsl-only.net ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]
On 2017-Mar-18, at 9:10 PM, Mark Millardwrote: > > On 2017-Mar-18, at 5:53 PM, Mark Millard wrote: > >> A new, significant discovery follows. . . >> >> While checking out use of procstat -v I ran >> into the following common property for the 3 >> programs that I looked at: >> >> A) My small test program that fails for >> a dynamically allocated space. >> >> B) sh reporting Failed assertion: "tsd_booted". >> >> C) su reporting Failed assertion: "tsd_booted". >> >> Here are example addresses from the area of >> incorrectly zeroed memory (A then B then C): >> >> (lldb) print dyn_region >> (region *volatile) $0 = 0x40616000 >> >> (lldb) print &__je_tsd_booted >> (bool *) $0 = 0x40618520 >> >> (lldb) print &__je_tsd_booted >> (bool *) $0 = 0x40618520 > > That last above was a copy/paste error. Correction: > > (lldb) print &__je_tsd_booted > (bool *) $0 = 0x4061d520 > >> The first is from dynamic allocation ending up >> in the area. The other two are from libc.so.7 >> globals/statics ending up in the general area. >> >> It looks like something is trashing a specific >> memory area for some reason, rather independently >> of what the program specifics are. I probably should have noted that the processes involved were: child/parent then grandparent and then great grandparent. The grandparent was sh and the great grandparent was su. The ancestors in the process tree are being damaged, not just the instances of the program that demonstrates the problem. >> Other notes: >> >> At least for my small program showing failure: >> >> Being explicit about the combined conditions for failure >> for my test program. . . >> >> Both tcache enabled and allocations fitting in SMALL_MAXCLASS >> are required in order to make the program fail. >> >> Note: >> >> lldb) print __je_tcache_maxclass >> (size_t) $0 = 32768 >> >> which is larger than SMALL_MAXCLASS. I've not observed >> failures for sizes above SMALL_MAXCLASS but not exceeding >> __je_tcache_maxclass. >> >> Thus tcache use by itself does not seen sufficient for >> my program to get corruption of its dynamically allocated >> memory: the small allocation size also matters. >> >> >> Be warned that I can not eliminate the possibility that >> the trashing changed what region of memory it trashed >> for larger allocations or when tcache is disabled. > > The pine64+ 2GB eventually got into a state where: > > /etc/malloc.conf -> tcache:false > > made no difference and the failure kept occurring > with that symbolic link in place. > > But after a reboot of the pin46+ 2GB > /etc/malloc.conf -> tcache:false was again effective > for my test program. (It was still present from > before the reboot.) > > I checked the .core files and the allocated address > assigned to dyn_region was the same in the tries > before and after the reboot. (I had put in an > additional raise(SIGABRT) so I'd always have > a core file to look at.) > > Apparently /etc/malloc.conf -> tcache:false was > being ignored before the reboot for some reason? I have also discovered that if the child process in an example like my program does a: (void) posix_madvise(dyn_region, region_size, POSIX_MADV_WILLNEED); after the fork but before the sleep/swap-out/wait then the problem does not happen. This is without any read or write access to the memory between the fork and sleep/swap-out/wait. By contrast such POSIX_MADV_WILLNEED use in the parent process does not change the failure behavior. === Mark Millard markmi at dsl-only.net ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]
On 2017-Mar-18, at 5:53 PM, Mark Millardwrote: > A new, significant discovery follows. . . > > While checking out use of procstat -v I ran > into the following common property for the 3 > programs that I looked at: > > A) My small test program that fails for > a dynamically allocated space. > > B) sh reporting Failed assertion: "tsd_booted". > > C) su reporting Failed assertion: "tsd_booted". > > Here are example addresses from the area of > incorrectly zeroed memory (A then B then C): > > (lldb) print dyn_region > (region *volatile) $0 = 0x40616000 > > (lldb) print &__je_tsd_booted > (bool *) $0 = 0x40618520 > > (lldb) print &__je_tsd_booted > (bool *) $0 = 0x40618520 That last above was a copy/paste error. Correction: (lldb) print &__je_tsd_booted (bool *) $0 = 0x4061d520 > The first is from dynamic allocation ending up > in the area. The other two are from libc.so.7 > globals/statics ending up in the general area. > > It looks like something is trashing a specific > memory area for some reason, rather independently > of what the program specifics are. > > > Other notes: > > At least for my small program showing failure: > > Being explicit about the combined conditions for failure > for my test program. . . > > Both tcache enabled and allocations fitting in SMALL_MAXCLASS > are required in order to make the program fail. > > Note: > > lldb) print __je_tcache_maxclass > (size_t) $0 = 32768 > > which is larger than SMALL_MAXCLASS. I've not observed > failures for sizes above SMALL_MAXCLASS but not exceeding > __je_tcache_maxclass. > > Thus tcache use by itself does not seen sufficient for > my program to get corruption of its dynamically allocated > memory: the small allocation size also matters. > > > Be warned that I can not eliminate the possibility that > the trashing changed what region of memory it trashed > for larger allocations or when tcache is disabled. The pine64+ 2GB eventually got into a state where: /etc/malloc.conf -> tcache:false made no difference and the failure kept occurring with that symbolic link in place. But after a reboot of the pin46+ 2GB /etc/malloc.conf -> tcache:false was again effective for my test program. (It was still present from before the reboot.) I checked the .core files and the allocated address assigned to dyn_region was the same in the tries before and after the reboot. (I had put in an additional raise(SIGABRT) so I'd always have a core file to look at.) Apparently /etc/malloc.conf -> tcache:false was being ignored before the reboot for some reason? === Mark Millard markmi at dsl-only.net ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]
A new, significant discovery follows. . . While checking out use of procstat -v I ran into the following common property for the 3 programs that I looked at: A) My small test program that fails for a dynamically allocated space. B) sh reporting Failed assertion: "tsd_booted". C) su reporting Failed assertion: "tsd_booted". Here are example addresses from the area of incorrectly zeroed memory (A then B then C): (lldb) print dyn_region (region *volatile) $0 = 0x40616000 (lldb) print &__je_tsd_booted (bool *) $0 = 0x40618520 (lldb) print &__je_tsd_booted (bool *) $0 = 0x40618520 The first is from dynamic allocation ending up in the area. The other two are from libc.so.7 globals/statics ending up in the general area. It looks like something is trashing a specific memory area for some reason, rather independently of what the program specifics are. Other notes: At least for my small program showing failure: Being explicit about the combined conditions for failure for my test program. . . Both tcache enabled and allocations fitting in SMALL_MAXCLASS are required in order to make the program fail. Note: lldb) print __je_tcache_maxclass (size_t) $0 = 32768 which is larger than SMALL_MAXCLASS. I've not observed failures for sizes above SMALL_MAXCLASS but not exceeding __je_tcache_maxclass. Thus tcache use by itself does not seen sufficient for my program to get corruption of its dynamically allocated memory: the small allocation size also matters. Be warned that I can not eliminate the possibility that the trashing changed what region of memory it trashed for larger allocations or when tcache is disabled. === Mark Millard markmi at dsl-only.net ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]
[Summary: I've now tested on a rpi3 in addition to a pine64+ 2GB. Both contexts show the problem.] On 2017-Mar-16, at 2:07 AM, Mark Millard wrote: > On 2017-Mar-15, at 11:07 PM, Scott Bennett wrote: > >> Mark Millard wrote: >> >>> [Something strange happened to the automatic CC: fill-in for my original >>> reply. Also I should have mentioned that for my test program if a >>> variant is made that does not fork the swapping works fine.] >>> >>> On 2017-Mar-15, at 9:37 AM, Mark Millard wrote: >>> On 2017-Mar-15, at 6:15 AM, Scott Bennett wrote: > On Tue, 14 Mar 2017 18:18:56 -0700 Mark Millard > wrote: >> On 2017-Mar-14, at 4:44 PM, Bernd Walterwrote: >> >>> On Tue, Mar 14, 2017 at 03:28:53PM -0700, Mark Millard wrote: [test_check() between the fork and the wait/sleep prevents the failure from occurring. Even a small access to the memory at that stage prevents the failure. Details follow.] >>> >>> Maybe a stupid question, since you might have written it somewhere. >>> What medium do you swap to? >>> I've seen broken firmware on microSD cards doing silent data >>> corruption for some access patterns. >> >> The root filesystem is on a USB SSD on a powered hub. >> >> Only the kernel is from the microSD card. >> >> I have several examples of the USB SSD model and have >> never observed such problems in any other context. >> >> [remainder of irrelevant material deleted --SB] > > You gave a very long-winded non-answer to Bernd's question, so I'll > repeat it here. What medium do you swap to? My wording of: The root filesystem is on a USB SSD on a powered hub. was definitely poor. It should have explicitly mentioned the swap partition too: The root filesystem and swap partition are both on the same USB SSD on a powered hub. More detail from dmesg -a for usb: usbus0: 12Mbps Full Speed USB v1.0 usbus1: 480Mbps High Speed USB v2.0 usbus2: 12Mbps Full Speed USB v1.0 usbus3: 480Mbps High Speed USB v2.0 ugen0.1: at usbus0 uhub0: on usbus0 ugen1.1: at usbus1 uhub1: on usbus1 ugen2.1: at usbus2 uhub2: on usbus2 ugen3.1: at usbus3 uhub3: on usbus3 . . . uhub0: 1 port with 1 removable, self powered uhub2: 1 port with 1 removable, self powered uhub1: 1 port with 1 removable, self powered uhub3: 1 port with 1 removable, self powered ugen3.2: at usbus3 uhub4 on uhub3 uhub4: on usbus3 uhub4: MTT enabled uhub4: 4 ports with 4 removable, self powered ugen3.3: at usbus3 umass0 on uhub4 umass0: on usbus3 umass0: SCSI over Bulk-Only; quirks = 0x0100 umass0:0:0: Attached to scbus0 . . . da0 at umass-sim0 bus 0 scbus0 target 0 lun 0 da0: Fixed Direct Access SPC-4 SCSI device da0: Serial Number da0: 40.000MB/s transfers (Edited a bit because there is other material interlaced, even internal to some lines. Also: I removed the serial number of the specific example device.) >> >>Thank you. That presents a much clearer picture. > I will further note that any kind of USB device cannot automatically > be trusted to behave properly. USB devices are notorious, for example, > > [reasons why deleted --SB] > > You should identify where you page/swap to and then try substituting > a different device for that function as a test to eliminate the > possibility > of a bad storage device/controller. If the problem still occurs, that > means there still remains the possibility that another controller or its > firmware is defective instead. It could be a kernel bug, it is true, but > making sure there is no hardware or firmware error occurring is important, > and as I say, USB devices should always be considered suspect unless and > until proven innocent. [FYI: This is a ufs context, not a zfs one.] >> >>Right. It's only a Pi, after all. :-) > > It is a Pine64+ 2GB, not an rpi3. > I'm aware of such things. There is no evidence that has resulted in suggesting the USB devices that I can replace are a problem. Otherwise I'd not be going down this path. I only have access to the one arm64 device (a Pine64+ 2GB) so I've no ability to substitution-test what is on that board. >> >>There isn't even one open port on that hub that you could plug a >> flash drive into temporarily to be the paging device? > > Why do you think that I've never tried alternative devices? It > is just that the result was no evidence that my usually-in-use > SSD is having a special/local problem: the behavior continues > across all such contexts when the Pine64+ 2GB is involved. (Again > I have not had
Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]
Mark Millard wrote: > [Something strange happened to the automatic CC: fill-in for my original > reply. Also I should have mentioned that for my test program if a > variant is made that does not fork the swapping works fine.] > > On 2017-Mar-15, at 9:37 AM, Mark Millard wrote: > > > On 2017-Mar-15, at 6:15 AM, Scott Bennett wrote: > > > >>On Tue, 14 Mar 2017 18:18:56 -0700 Mark Millard > >> wrote: > >>> On 2017-Mar-14, at 4:44 PM, Bernd Walterwrote: > >>> > On Tue, Mar 14, 2017 at 03:28:53PM -0700, Mark Millard wrote: > > [test_check() between the fork and the wait/sleep prevents the > > failure from occurring. Even a small access to the memory at > > that stage prevents the failure. Details follow.] > > Maybe a stupid question, since you might have written it somewhere. > What medium do you swap to? > I've seen broken firmware on microSD cards doing silent data > corruption for some access patterns. > >>> > >>> The root filesystem is on a USB SSD on a powered hub. > >>> > >>> Only the kernel is from the microSD card. > >>> > >>> I have several examples of the USB SSD model and have > >>> never observed such problems in any other context. > >>> > >>> [remainder of irrelevant material deleted --SB] > >> > >>You gave a very long-winded non-answer to Bernd's question, so I'll > >> repeat it here. What medium do you swap to? > > > > My wording of: > > > > The root filesystem is on a USB SSD on a powered hub. > > > > was definitely poor. It should have explicitly mentioned the > > swap partition too: > > > > The root filesystem and swap partition are both on the same > > USB SSD on a powered hub. > > > > More detail from dmesg -a for usb: > > > > usbus0: 12Mbps Full Speed USB v1.0 > > usbus1: 480Mbps High Speed USB v2.0 > > usbus2: 12Mbps Full Speed USB v1.0 > > usbus3: 480Mbps High Speed USB v2.0 > > ugen0.1: at usbus0 > > uhub0: on usbus0 > > ugen1.1: at usbus1 > > uhub1: on usbus1 > > ugen2.1: at usbus2 > > uhub2: on usbus2 > > ugen3.1: at usbus3 > > uhub3: on usbus3 > > . . . > > uhub0: 1 port with 1 removable, self powered > > uhub2: 1 port with 1 removable, self powered > > uhub1: 1 port with 1 removable, self powered > > uhub3: 1 port with 1 removable, self powered > > ugen3.2: at usbus3 > > uhub4 on uhub3 > > uhub4: on > > usbus3 > > uhub4: MTT enabled > > uhub4: 4 ports with 4 removable, self powered > > ugen3.3: at usbus3 > > umass0 on uhub4 > > umass0: on usbus3 > > umass0: SCSI over Bulk-Only; quirks = 0x0100 > > umass0:0:0: Attached to scbus0 > > . . . > > da0 at umass-sim0 bus 0 scbus0 target 0 lun 0 > > da0: Fixed Direct Access SPC-4 SCSI device > > da0: Serial Number > > da0: 40.000MB/s transfers > > > > (Edited a bit because there is other material interlaced, even > > internal to some lines. Also: I removed the serial number of the > > specific example device.) Thank you. That presents a much clearer picture. > > > >>I will further note that any kind of USB device cannot automatically > >> be trusted to behave properly. USB devices are notorious, for example, > >> > >> [reasons why deleted --SB] > >> > >>You should identify where you page/swap to and then try substituting > >> a different device for that function as a test to eliminate the possibility > >> of a bad storage device/controller. If the problem still occurs, that > >> means there still remains the possibility that another controller or its > >> firmware is defective instead. It could be a kernel bug, it is true, but > >> making sure there is no hardware or firmware error occurring is important, > >> and as I say, USB devices should always be considered suspect unless and > >> until proven innocent. > > > > [FYI: This is a ufs context, not a zfs one.] Right. It's only a Pi, after all. :-) > > > > I'm aware of such things. There is no evidence that has resulted in > > suggesting the USB devices that I can replace are a problem. Otherwise > > I'd not be going down this path. I only have access to the one arm64 > > device (a Pine64+ 2GB) so I've no ability to substitution-test what > > is on that board. There isn't even one open port on that hub that you could plug a flash drive into temporarily to be the paging device? You could then try your tests before returning to the normal configuration. If there isn't an open port, then how about plugging a second hub into one of the first hub's ports and moving the displaced device to the second hub? A flash drive could then be plugged in. That kind of configuration is obviously a bad idea for the long run, but just to try your tests it ought to work well enough. (BTW, if a USB storage device containing a paging area drops off=line even momentarily and the system needs to use it, that is the beginning of the end, even though it may take up to a few minutes for everything to lock up. You probably won't be able to do an orderly
Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]
On 2017-Mar-15, at 11:07 PM, Scott Bennett wrote: > Mark Millard wrote: > >> [Something strange happened to the automatic CC: fill-in for my original >> reply. Also I should have mentioned that for my test program if a >> variant is made that does not fork the swapping works fine.] >> >> On 2017-Mar-15, at 9:37 AM, Mark Millard wrote: >> >>> On 2017-Mar-15, at 6:15 AM, Scott Bennett wrote: >>> On Tue, 14 Mar 2017 18:18:56 -0700 Mark Millard wrote: > On 2017-Mar-14, at 4:44 PM, Bernd Walterwrote: > >> On Tue, Mar 14, 2017 at 03:28:53PM -0700, Mark Millard wrote: >>> [test_check() between the fork and the wait/sleep prevents the >>> failure from occurring. Even a small access to the memory at >>> that stage prevents the failure. Details follow.] >> >> Maybe a stupid question, since you might have written it somewhere. >> What medium do you swap to? >> I've seen broken firmware on microSD cards doing silent data >> corruption for some access patterns. > > The root filesystem is on a USB SSD on a powered hub. > > Only the kernel is from the microSD card. > > I have several examples of the USB SSD model and have > never observed such problems in any other context. > > [remainder of irrelevant material deleted --SB] You gave a very long-winded non-answer to Bernd's question, so I'll repeat it here. What medium do you swap to? >>> >>> My wording of: >>> >>> The root filesystem is on a USB SSD on a powered hub. >>> >>> was definitely poor. It should have explicitly mentioned the >>> swap partition too: >>> >>> The root filesystem and swap partition are both on the same >>> USB SSD on a powered hub. >>> >>> More detail from dmesg -a for usb: >>> >>> usbus0: 12Mbps Full Speed USB v1.0 >>> usbus1: 480Mbps High Speed USB v2.0 >>> usbus2: 12Mbps Full Speed USB v1.0 >>> usbus3: 480Mbps High Speed USB v2.0 >>> ugen0.1: at usbus0 >>> uhub0: on usbus0 >>> ugen1.1: at usbus1 >>> uhub1: on usbus1 >>> ugen2.1: at usbus2 >>> uhub2: on usbus2 >>> ugen3.1: at usbus3 >>> uhub3: on usbus3 >>> . . . >>> uhub0: 1 port with 1 removable, self powered >>> uhub2: 1 port with 1 removable, self powered >>> uhub1: 1 port with 1 removable, self powered >>> uhub3: 1 port with 1 removable, self powered >>> ugen3.2: at usbus3 >>> uhub4 on uhub3 >>> uhub4: on >>> usbus3 >>> uhub4: MTT enabled >>> uhub4: 4 ports with 4 removable, self powered >>> ugen3.3: at usbus3 >>> umass0 on uhub4 >>> umass0: on usbus3 >>> umass0: SCSI over Bulk-Only; quirks = 0x0100 >>> umass0:0:0: Attached to scbus0 >>> . . . >>> da0 at umass-sim0 bus 0 scbus0 target 0 lun 0 >>> da0: Fixed Direct Access SPC-4 SCSI device >>> da0: Serial Number >>> da0: 40.000MB/s transfers >>> >>> (Edited a bit because there is other material interlaced, even >>> internal to some lines. Also: I removed the serial number of the >>> specific example device.) > > Thank you. That presents a much clearer picture. >>> I will further note that any kind of USB device cannot automatically be trusted to behave properly. USB devices are notorious, for example, [reasons why deleted --SB] You should identify where you page/swap to and then try substituting a different device for that function as a test to eliminate the possibility of a bad storage device/controller. If the problem still occurs, that means there still remains the possibility that another controller or its firmware is defective instead. It could be a kernel bug, it is true, but making sure there is no hardware or firmware error occurring is important, and as I say, USB devices should always be considered suspect unless and until proven innocent. >>> >>> [FYI: This is a ufs context, not a zfs one.] > > Right. It's only a Pi, after all. :-) It is a Pine64+ 2GB, not an rpi3. >>> >>> I'm aware of such things. There is no evidence that has resulted in >>> suggesting the USB devices that I can replace are a problem. Otherwise >>> I'd not be going down this path. I only have access to the one arm64 >>> device (a Pine64+ 2GB) so I've no ability to substitution-test what >>> is on that board. > > There isn't even one open port on that hub that you could plug a > flash drive into temporarily to be the paging device? Why do you think that I've never tried alternative devices? It is just that the result was no evidence that my usually-in-use SSD is having a special/local problem: the behavior continues across all such contexts when the Pine64+ 2GB is involved. (Again I have not had access to an alternate to the one arm64 board. That limits my substitution testing possibilities.) Why would you expect a Flash drive to be better than another SSD for such testing? (The SSD that I usually use even happens to be a USB 3.0 SSD, capable of USB 3.0 speeds in USB 3.0 contexts. So is
Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]
[Something strange happened to the automatic CC: fill-in for my original reply. Also I should have mentioned that for my test program if a variant is made that does not fork the swapping works fine.] On 2017-Mar-15, at 9:37 AM, Mark Millard wrote: > On 2017-Mar-15, at 6:15 AM, Scott Bennett wrote: > >>On Tue, 14 Mar 2017 18:18:56 -0700 Mark Millard >> wrote: >>> On 2017-Mar-14, at 4:44 PM, Bernd Walterwrote: >>> On Tue, Mar 14, 2017 at 03:28:53PM -0700, Mark Millard wrote: > [test_check() between the fork and the wait/sleep prevents the > failure from occurring. Even a small access to the memory at > that stage prevents the failure. Details follow.] Maybe a stupid question, since you might have written it somewhere. What medium do you swap to? I've seen broken firmware on microSD cards doing silent data corruption for some access patterns. >>> >>> The root filesystem is on a USB SSD on a powered hub. >>> >>> Only the kernel is from the microSD card. >>> >>> I have several examples of the USB SSD model and have >>> never observed such problems in any other context. >>> >>> [remainder of irrelevant material deleted --SB] >> >>You gave a very long-winded non-answer to Bernd's question, so I'll >> repeat it here. What medium do you swap to? > > My wording of: > > The root filesystem is on a USB SSD on a powered hub. > > was definitely poor. It should have explicitly mentioned the > swap partition too: > > The root filesystem and swap partition are both on the same > USB SSD on a powered hub. > > More detail from dmesg -a for usb: > > usbus0: 12Mbps Full Speed USB v1.0 > usbus1: 480Mbps High Speed USB v2.0 > usbus2: 12Mbps Full Speed USB v1.0 > usbus3: 480Mbps High Speed USB v2.0 > ugen0.1: at usbus0 > uhub0: on usbus0 > ugen1.1: at usbus1 > uhub1: on usbus1 > ugen2.1: at usbus2 > uhub2: on usbus2 > ugen3.1: at usbus3 > uhub3: on usbus3 > . . . > uhub0: 1 port with 1 removable, self powered > uhub2: 1 port with 1 removable, self powered > uhub1: 1 port with 1 removable, self powered > uhub3: 1 port with 1 removable, self powered > ugen3.2: at usbus3 > uhub4 on uhub3 > uhub4: on usbus3 > uhub4: MTT enabled > uhub4: 4 ports with 4 removable, self powered > ugen3.3: at usbus3 > umass0 on uhub4 > umass0: on usbus3 > umass0: SCSI over Bulk-Only; quirks = 0x0100 > umass0:0:0: Attached to scbus0 > . . . > da0 at umass-sim0 bus 0 scbus0 target 0 lun 0 > da0: Fixed Direct Access SPC-4 SCSI device > da0: Serial Number > da0: 40.000MB/s transfers > > (Edited a bit because there is other material interlaced, even > internal to some lines. Also: I removed the serial number of the > specific example device.) > >>I will further note that any kind of USB device cannot automatically >> be trusted to behave properly. USB devices are notorious, for example, >> for momentarily dropping off-line and then immediately reconnecting. (ZFS >> reacts very poorly to such events, BTW.) This misbehavior can be caused >> by either processor involved, i.e., the one controlling either the >> upstream or the downstream device. Hubs are really bad about this, but >> any USB device can be guilty. You may have a defective storage device, >> its controller may be defective, or any controller in the chain all the >> way back to the motherboard may be defective or, not defective, but >> corrupted by having been connected to another device with corrupted >> (infected) firmware that tries to flash itself into the firmware flash >> chips in its potential victim. >>Flash memory chips, spinning disks, or {S,}{D,}RAM chips can be >> defective. Without parity bits, the devices may return bad data and lie >> about the presence of corrupted data. That, for example, is where ZFS >> is better than any kind of classical RAID because ZFS keeps checksums on >> everything, so it has a reasonable chance of detecting corruption even >> without parity support and, if there is any redundancy in the pool or the >> data set, fixing the bad data machine. Even having parity generally >> allows only the detection of single-bit errors, but not of repairing them. >>You should identify where you page/swap to and then try substituting >> a different device for that function as a test to eliminate the possibility >> of a bad storage device/controller. If the problem still occurs, that >> means there still remains the possibility that another controller or its >> firmware is defective instead. It could be a kernel bug, it is true, but >> making sure there is no hardware or firmware error occurring is important, >> and as I say, USB devices should always be considered suspect unless and >> until proven innocent. > > [FYI: This is a ufs context, not a zfs one.] > > I'm aware of such things. There is no evidence that has resulted in > suggesting the USB devices that I can replace are a problem. Otherwise > I'd not be going down this path. I
Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]
A single Byte access to a 4K Byte aligned region between the fork and wait/sleep/swap-out prevents that specific 4K Byte region from having the (bad) zeros. Sounds like a page sized unit of behavior to me. Details follow. On 2017-Mar-14, at 3:28 PM, Mark Millardwrote: > [test_check() between the fork and the wait/sleep prevents the > failure from occurring. Even a small access to the memory at > that stage prevents the failure. Details follow.] > > On 2017-Mar-14, at 11:07 AM, Mark Millard wrote: > >> [This is just a correction to the subject-line text to say arm64 >> instead of amd64.] >> >> On 2017-Mar-14, at 12:58 AM, Mark Millard wrote: >> >> [Another correction I'm afraid --about alternative program variations >> this time.] >> >> On 2017-Mar-13, at 11:52 PM, Mark Millard wrote: >> >>> I'm still at a loss about how to figure out what stages are messed >>> up. (Memory coherency? Some memory not swapped out? Bad data swapped >>> out? Wrong data swapped in?) >>> >>> But at least I've found a much smaller/simpler example to demonstrate >>> some problem with in my Pine64+_ 2GB context. >>> >>> The Pine64+ 2GB is the only amd64 context that I have access to. >> >> Someday I'll learn to type arm64 the first time instead of amd64. >> >>> The following program fails its check for data >>> having its expected byte pattern in dynamically >>> allocated memory after a fork/swap-out/swap-in >>> sequence. >>> >>> I'll note that the program sleeps for 60s after >>> forking to give time to do something else to >>> cause the parent and child processes to swap >>> out (RES=0 as seen in top). >> >> The following about the extra test_check() was >> wrong. >> >>> Note the source code line: >>> >>> // test_check(); // Adding this line prevents failure. >>> >>> It seem that accessing the region contents before forking >>> and swapping avoids the problem. But there is a problem >>> if the region was only written-to before the fork/swap. > > There is a place that if a test_check call is put then the > problem does not happen at any stage: I tried putting a > call between the fork and the later wait/sleep code: I changed the byte sequence patterns to avoid zero values since the bad values are zeros: static value_type value(size_t v) { return (value_type)((v&0xFEu)|0x1u); } // value now avoids the zero value since the failures // are zeros. With that I can then test accurately what bytes have bad values vs. do not. I also changed to: void partial_test_check(void) { if (value(0u)!=gbl_region.array[0])raise(SIGABRT); if (value(0u)!=(*dyn_region).array[0]) raise(SIGABRT); } since previously [0] had a zero value and so I'd used [1]. On this basis I'm now using the below. See the comments tied to partial_test_check() calls: extern void test_setup(void); // Sets up the memory byte patterns. extern void test_check(void); // Tests the memory byte patterns. extern void partial_test_check(void); // Tests just [0] of each region // (gbl_region and dyn_region). int main(void) { test_setup(); test_check(); // Before fork() [passes] pid_t pid = fork(); int wait_status = 0;; // After fork; before waitsleep/swap-out. if (0==pid) partial_test_check(); // Even the above is sufficient by // itself to prevent failure for // region_size 1u through // 4u*1024u! // But 4u*1024u+1u and above fail // with this access to memory. // The failing test is of // (*dyn_region).array[4096u]. // This test never fails here. if (0 This suggests to me that the small access is forcing one or more things to > be initialized for memory access that fork is not establishing of itself. > It appears that if established correctly then the swap-out/swap-in > sequence would work okay without needing the manual access to the memory. > > > So far via this test I've not seen any evidence
Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]
On 2017-Mar-14, at 4:44 PM, Bernd Walterwrote: > On Tue, Mar 14, 2017 at 03:28:53PM -0700, Mark Millard wrote: >> [test_check() between the fork and the wait/sleep prevents the >> failure from occurring. Even a small access to the memory at >> that stage prevents the failure. Details follow.] > > Maybe a stupid question, since you might have written it somewhere. > What medium do you swap to? > I've seen broken firmware on microSD cards doing silent data > corruption for some access patterns. The root filesystem is on a USB SSD on a powered hub. Only the kernel is from the microSD card. I have several examples of the USB SSD model and have never observed such problems in any other context. The original issue that started this investigation has been reported by several people on the lists: Failed assertion: "tsd_booted" on arm64 specifically, no other contexts so far as I know. Earlier I had discovered that: A) I could use a swap-in to cause the messages from instances of sh or su that had swapped out earlier. B) The core dumps showed that a large memory region containing the global tsd_booted had all turned to be zero bytes. The assert is just exposing one of those zeros. (tsd_booted is from jemalloc that is in a .so that is loaded.) This prompted me to look for simpler contexts involving swapping that also show memory corruption. So far I've only managed to produce corrupted memory when fork and later swapping are both involved. Being a shared library global is not a requirement for the problem, although such contexts can have an issue. I've not made a simpler example of that yet, although I tried. I have not explored vfork, rfork, or any other alternatives. So far all failure examples end up with zeroed memory when the memory does not match the original pattern from before the fork. At least that is what the core dumps show for all examples that I've looked at. See bugzilla 217138 and 217239. In some respects this example is more analogous to the 217239 context as I remember. My tests on amd64, armv6 (really -mcpu=cortex-a7 so armv7), and powerpc64 have never produced any problems, including never getting the failed assertion. Only arm64. (But I've access to only one arm64 system, a Pine64+ 2GB.) Prior to this I tracked down a different arm64 problem to the fork_trampline code (for the child process) modifying a system register but in a place allowing interrupts that could also change the value. Andrew Turner fixed that one at the time. For this fork/swapping kind of issue I'm not sure that I'll be able to do more than provide the simpler example and the steps that I used. My isolating the internal stage(s) and specific problem(s) at the code level of detail does not seem likely. But whatever is found needs to be able to explain the contrast with an access after the fork but before the swap preventing the failing behavior. So what I've got so far hopefully does provide some hints to someone. === Mark Millard markmi at dsl-only.net ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]
On Tue, Mar 14, 2017 at 03:28:53PM -0700, Mark Millard wrote: > [test_check() between the fork and the wait/sleep prevents the > failure from occurring. Even a small access to the memory at > that stage prevents the failure. Details follow.] Maybe a stupid question, since you might have written it somewhere. What medium do you swap to? I've seen broken firmware on microSD cards doing silent data corruption for some access patterns. -- B.Walterhttp://www.bwct.de Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm. ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]
[test_check() between the fork and the wait/sleep prevents the failure from occurring. Even a small access to the memory at that stage prevents the failure. Details follow.] On 2017-Mar-14, at 11:07 AM, Mark Millardwrote: > [This is just a correction to the subject-line text to say arm64 > instead of amd64.] > > On 2017-Mar-14, at 12:58 AM, Mark Millard wrote: > > [Another correction I'm afraid --about alternative program variations > this time.] > > On 2017-Mar-13, at 11:52 PM, Mark Millard wrote: > >> I'm still at a loss about how to figure out what stages are messed >> up. (Memory coherency? Some memory not swapped out? Bad data swapped >> out? Wrong data swapped in?) >> >> But at least I've found a much smaller/simpler example to demonstrate >> some problem with in my Pine64+_ 2GB context. >> >> The Pine64+ 2GB is the only amd64 context that I have access to. > > Someday I'll learn to type arm64 the first time instead of amd64. > >> The following program fails its check for data >> having its expected byte pattern in dynamically >> allocated memory after a fork/swap-out/swap-in >> sequence. >> >> I'll note that the program sleeps for 60s after >> forking to give time to do something else to >> cause the parent and child processes to swap >> out (RES=0 as seen in top). > > The following about the extra test_check() was > wrong. > >> Note the source code line: >> >> // test_check(); // Adding this line prevents failure. >> >> It seem that accessing the region contents before forking >> and swapping avoids the problem. But there is a problem >> if the region was only written-to before the fork/swap. There is a place that if a test_check call is put then the problem does not happen at any stage: I tried putting a call between the fork and the later wait/sleep code: int main(void) { test_setup(); test_check(); // Before fork() [passes] pid_t pid = fork(); int wait_status = 0;; // test_check(); // After fork(); before wait/sleep. // If used it prevents failure later! if (0
arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]
[This is just a correction to the subject-line text to say arm64 instead of amd64.] On 2017-Mar-14, at 12:58 AM, Mark Millardwrote: [Another correction I'm afraid --about alternative program variations this time.] On 2017-Mar-13, at 11:52 PM, Mark Millard wrote: > I'm still at a loss about how to figure out what stages are messed > up. (Memory coherency? Some memory not swapped out? Bad data swapped > out? Wrong data swapped in?) > > But at least I've found a much smaller/simpler example to demonstrate > some problem with in my Pine64+_ 2GB context. > > The Pine64+ 2GB is the only amd64 context that I have access to. Someday I'll learn to type arm64 the first time instead of amd64. > The following program fails its check for data > having its expected byte pattern in dynamically > allocated memory after a fork/swap-out/swap-in > sequence. > > I'll note that the program sleeps for 60s after > forking to give time to do something else to > cause the parent and child processes to swap > out (RES=0 as seen in top). The following about the extra test_check() was wrong. > Note the source code line: > > // test_check(); // Adding this line prevents failure. > > It seem that accessing the region contents before forking > and swapping avoids the problem. But there is a problem > if the region was only written-to before the fork/swap. This was because I'd carelessly moved some loop variables to globals in a way that depended on the initialization of the globals and the extra call changed those values. I've noted code adjustments below (3 lines). I get the failures with them as well. > Another point is the size of the region matters: <= 14K Bytes > fails and > 14K Bytes works for as much has I have tested. > > > # more swap_testing.c > // swap_testing.c > > // Built via (c++ was clang++ 4.0 in my case): > // > // cc -g -std=c11 -Wpedantic swap_testing.c > // -O0 and -O2 also gets the problem. > > #include // for fork(), sleep(.) > #include // for pid_t > #include// for wait(.) > > extern void test_setup(void); // Sets up the memory byte pattern. > extern void test_check(void); // Tests the memory byte pattern. > > int main(void) > { > test_setup(); test_check(); // This test passes. > > pid_t pid = fork(); > int wait_status = 0;; > > if (0 > if (-1!=wait_status && 0<=pid) > { > if (0==pid) > { > sleep(60); > > // During this manually force this process to > // swap out. I use something like: > > // stress -m 1 --vm-bytes 1800M > > // in another shell and ^C'ing it after top > // shows the swapped status desired. 1800M > // just happened to work on the Pine64+ 2GB > // that I was using. > } > > test_check(); > } > } > > // The memory and test code follows. > > #include // for bool, true, false > #include // for size_t, NULL > #include // for malloc(.), free(.) > > #include // for raise(.), SIGABRT > > #define region_size (14u*1024u) > // Bad dyn_region pattern, parent and child > // processes: > // 256u, 4u*1024u, 8u*1024u, 9u*1024u, > // 12u*1024u, 14u*1024u > > // Works: > // 14u*1024u+1u, 15u*1024u, 16u*1024u, > // 32u*1024u, 256u*1024u*1024u > > typedef volatile unsigned char value_type; > > struct region_struct { value_type array[region_size]; }; > typedef struct region_struct region; > > static regiongbl_region; > static region * volatile dyn_region = NULL; > > static value_type value(size_t v) { return (value_type)v; } > > void test_setup(void) { > dyn_region = malloc(sizeof(region)); > if (!dyn_region) raise(SIGABRT); > > for(size_t i=0u; i (*dyn_region).array[i] = gbl_region.array[i] = value(i); > } > } > > static volatile bool gbl_failed = false; // Until potentially disproved > static volatile size_t gbl_pos = 0u; > > static volatile bool dyn_failed = false; // Until potentially disproved > static volatile size_t dyn_pos = 0u; > > void test_check(void) { gbl_pos = 0u; > while (!gbl_failed && gbl_pos gbl_failed = (value(gbl_pos) != gbl_region.array[gbl_pos]); > gbl_pos++; > } > dyn_pos = 0u; > while (!dyn_failed && dyn_pos dyn_failed = (value(dyn_pos) != (*dyn_region).array[dyn_pos]); > // Note: When the memory pattern fails this case is that > // records the failure. > dyn_pos++; > } > > if (gbl_failed) raise(SIGABRT); > if (dyn_failed) raise(SIGABRT); // lldb reports this line for the __raise > call. > // when it fails (both parent and child > processes). > } I'm not bothering to redo the details below for the line number variations. > Other details from lldb