Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]

2017-03-27 Thread Mark Millard
On 2017-Mar-21, at 7:21 PM, Mark Millard  wrote:

> On 2017-Mar-18, at 9:10 PM, Mark Millard  wrote:
> 
>> 
>> On 2017-Mar-18, at 5:53 PM, Mark Millard  wrote:
>> 
>>> A new, significant discovery follows. . .
>>> 
>>> While checking out use of procstat -v I ran
>>> into the following common property for the 3
>>> programs that I looked at:
>>> 
>>> A) My small test program that fails for
>>> a dynamically allocated space.
>>> 
>>> B) sh reporting Failed assertion: "tsd_booted".
>>> 
>>> C) su reporting Failed assertion: "tsd_booted".
>>> 
>>> Here are example addresses from the area of
>>> incorrectly zeroed memory (A then B then C):
>>> 
>>> (lldb) print dyn_region
>>> (region *volatile) $0 = 0x40616000
>>> 
>>> (lldb) print &__je_tsd_booted
>>> (bool *) $0 = 0x40618520
>>> 
>>> (lldb) print &__je_tsd_booted
>>> (bool *) $0 = 0x40618520
>> 
>> That last above was a copy/paste error. Correction:
>> 
>> (lldb) print &__je_tsd_booted
>> (bool *) $0 = 0x4061d520
>> 
>>> The first is from dynamic allocation ending up
>>> in the area. The other two are from libc.so.7
>>> globals/statics ending up in the general area.
>>> 
>>> It looks like something is trashing a specific
>>> memory area for some reason, rather independently
>>> of what the program specifics are.
> 
> I probably should have noted that the processes
> involved were: child/parent then grandparent
> and then great grandparent. The grandparent
> was sh and the great grandparent was su.
> 
> The ancestors in the process tree are being
> damaged, not just the instances of the
> program that demonstrates the problem.
> 
>>> Other notes:
>>> 
>>> At least for my small program showing failure:
>>> 
>>> Being explicit about the combined conditions for failure
>>> for my test program. . .
>>> 
>>> Both tcache enabled and allocations fitting in SMALL_MAXCLASS
>>> are required in order to make the program fail.
>>> 
>>> Note:
>>> 
>>> (lldb) print __je_tcache_maxclass
>>> (size_t) $0 = 32768
>>> 
>>> which is larger than SMALL_MAXCLASS. I've not observed
>>> failures for sizes above SMALL_MAXCLASS but not exceeding
>>> __je_tcache_maxclass.
>>> 
>>> Thus tcache use by itself does not seem sufficient for
>>> my program to get corruption of its dynamically allocated
>>> memory: the small allocation size also matters.
>>> 
>>> 
>>> Be warned that I can not eliminate the possibility that
>>> the trashing changed what region of memory it trashed
>>> for larger allocations or when tcache is disabled.
>> 
>> The pine64+ 2GB eventually got into a state where:
>> 
>> /etc/malloc.conf -> tcache:false
>> 
>> made no difference and the failure kept occurring
>> with that symbolic link in place.
>> 
>> But after a reboot of the pine64+ 2GB
>> /etc/malloc.conf -> tcache:false was again effective
>> for my test program. (It was still present from
>> before the reboot.)
>> 
>> I checked the .core files and the allocated address
>> assigned to dyn_region was the same in the tries
>> before and after the reboot. (I had put in an
>> additional raise(SIGABRT) so I'd always have
>> a core file to look at.)
>> 
>> Apparently /etc/malloc.conf -> tcache:false was
>> being ignored before the reboot for some reason?
> 
> I have also discovered that if the child process
> in an example like my program does a:
> 
> (void) posix_madvise(dyn_region, region_size, POSIX_MADV_WILLNEED);
> 
> after the fork but before the sleep/swap-out/wait
> then the problem does not happen. This is without
> any read or write access to the memory between the
> fork and sleep/swap-out/wait.
> 
> By contrast such POSIX_MADV_WILLNEED use in the parent
> process does not change the failure behavior.

I've added another test program to bugzilla
217239 and 217138, one with thousands of 14
KiByte allocations.

The test program usually ends up with them all being
zeroed in the parent and child of the fork.

But in a couple of runs only a much smaller
prefix was messed up, after which the normal,
expected values appeared.

#define region_size (14u*1024u)
. . .
#define num_regions (256u*1024u*1024u/region_size)

So num_regions==18724, using up most of 256 MiBytes.

Note: each region has its own 14 KiByte allocation.

But dyn_regions[1296].array[0] in one example was
the first normal value.

In another example dyn_regions[2180].array[4096] was
the first normal value.

The last is interesting for being part way through
an allocation's space. Being part way through an
allocation, yet aligned to the 4 KiByte page size,
would seem odd for a pure-jemalloc issue.

===
Mark Millard
markmi at dsl-only.net

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]

2017-03-21 Thread Mark Millard
On 2017-Mar-18, at 9:10 PM, Mark Millard  wrote:

> 
> On 2017-Mar-18, at 5:53 PM, Mark Millard  wrote:
> 
>> A new, significant discovery follows. . .
>> 
>> While checking out use of procstat -v I ran
>> into the following common property for the 3
>> programs that I looked at:
>> 
>> A) My small test program that fails for
>>  a dynamically allocated space.
>> 
>> B) sh reporting Failed assertion: "tsd_booted".
>> 
>> C) su reporting Failed assertion: "tsd_booted".
>> 
>> Here are example addresses from the area of
>> incorrectly zeroed memory (A then B then C):
>> 
>> (lldb) print dyn_region
>> (region *volatile) $0 = 0x40616000
>> 
>> (lldb) print &__je_tsd_booted
>> (bool *) $0 = 0x40618520
>> 
>> (lldb) print &__je_tsd_booted
>> (bool *) $0 = 0x40618520
> 
> That last above was a copy/paste error. Correction:
> 
> (lldb) print &__je_tsd_booted
> (bool *) $0 = 0x4061d520
> 
>> The first is from dynamic allocation ending up
>> in the area. The other two are from libc.so.7
>> globals/statics ending up in the general area.
>> 
>> It looks like something is trashing a specific
>> memory area for some reason, rather independently
>> of what the program specifics are.

I probably should have noted that the processes
involved were: child/parent then grandparent
and then great grandparent. The grandparent
was sh and the great grandparent was su.

The ancestors in the process tree are being
damaged, not just the instances of the
program that demonstrates the problem.

>> Other notes:
>> 
>> At least for my small program showing failure:
>> 
>> Being explicit about the combined conditions for failure
>> for my test program. . .
>> 
>> Both tcache enabled and allocations fitting in SMALL_MAXCLASS
>> are required in order to make the program fail.
>> 
>> Note:
>> 
>> (lldb) print __je_tcache_maxclass
>> (size_t) $0 = 32768
>> 
>> which is larger than SMALL_MAXCLASS. I've not observed
>> failures for sizes above SMALL_MAXCLASS but not exceeding
>> __je_tcache_maxclass.
>> 
>> Thus tcache use by itself does not seem sufficient for
>> my program to get corruption of its dynamically allocated
>> memory: the small allocation size also matters.
>> 
>> 
>> Be warned that I can not eliminate the possibility that
>> the trashing changed what region of memory it trashed
>> for larger allocations or when tcache is disabled.
> 
> The pine64+ 2GB eventually got into a state where:
> 
> /etc/malloc.conf -> tcache:false
> 
> made no difference and the failure kept occurring
> with that symbolic link in place.
> 
> But after a reboot of the pine64+ 2GB
> /etc/malloc.conf -> tcache:false was again effective
> for my test program. (It was still present from
> before the reboot.)
> 
> I checked the .core files and the allocated address
> assigned to dyn_region was the same in the tries
> before and after the reboot. (I had put in an
> additional raise(SIGABRT) so I'd always have
> a core file to look at.)
> 
> Apparently /etc/malloc.conf -> tcache:false was
> being ignored before the reboot for some reason?

I have also discovered that if the child process
in an example like my program does a:

(void) posix_madvise(dyn_region, region_size, POSIX_MADV_WILLNEED);

after the fork but before the sleep/swap-out/wait
then the problem does not happen. This is without
any read or write access to the memory between the
fork and sleep/swap-out/wait.

By contrast such POSIX_MADV_WILLNEED use in the parent
process does not change the failure behavior.

===
Mark Millard
markmi at dsl-only.net



Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]

2017-03-18 Thread Mark Millard

On 2017-Mar-18, at 5:53 PM, Mark Millard  wrote:

> A new, significant discovery follows. . .
> 
> While checking out use of procstat -v I ran
> into the following common property for the 3
> programs that I looked at:
> 
> A) My small test program that fails for
>   a dynamically allocated space.
> 
> B) sh reporting Failed assertion: "tsd_booted".
> 
> C) su reporting Failed assertion: "tsd_booted".
> 
> Here are example addresses from the area of
> incorrectly zeroed memory (A then B then C):
> 
> (lldb) print dyn_region
> (region *volatile) $0 = 0x40616000
> 
> (lldb) print &__je_tsd_booted
> (bool *) $0 = 0x40618520
> 
> (lldb) print &__je_tsd_booted
> (bool *) $0 = 0x40618520

That last above was a copy/paste error. Correction:

(lldb) print &__je_tsd_booted
(bool *) $0 = 0x4061d520

> The first is from dynamic allocation ending up
> in the area. The other two are from libc.so.7
> globals/statics ending up in the general area.
> 
> It looks like something is trashing a specific
> memory area for some reason, rather independently
> of what the program specifics are.
> 
> 
> Other notes:
> 
> At least for my small program showing failure:
> 
> Being explicit about the combined conditions for failure
> for my test program. . .
> 
> Both tcache enabled and allocations fitting in SMALL_MAXCLASS
> are required in order to make the program fail.
> 
> Note:
> 
> (lldb) print __je_tcache_maxclass
> (size_t) $0 = 32768
> 
> which is larger than SMALL_MAXCLASS. I've not observed
> failures for sizes above SMALL_MAXCLASS but not exceeding
> __je_tcache_maxclass.
> 
> Thus tcache use by itself does not seem sufficient for
> my program to get corruption of its dynamically allocated
> memory: the small allocation size also matters.
> 
> 
> Be warned that I can not eliminate the possibility that
> the trashing changed what region of memory it trashed
> for larger allocations or when tcache is disabled.

The pine64+ 2GB eventually got into a state where:

/etc/malloc.conf -> tcache:false

made no difference and the failure kept occurring
with that symbolic link in place.

But after a reboot of the pine64+ 2GB
/etc/malloc.conf -> tcache:false was again effective
for my test program. (It was still present from
before the reboot.)

I checked the .core files and the allocated address
assigned to dyn_region was the same in the tries
before and after the reboot. (I had put in an
additional raise(SIGABRT) so I'd always have
a core file to look at.)

Apparently /etc/malloc.conf -> tcache:false was
being ignored before the reboot for some reason?


===
Mark Millard
markmi at dsl-only.net



Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]

2017-03-18 Thread Mark Millard
A new, significant discovery follows. . .

While checking out use of procstat -v I ran
into the following common property for the 3
programs that I looked at:

A) My small test program that fails for
   a dynamically allocated space.

B) sh reporting Failed assertion: "tsd_booted".

C) su reporting Failed assertion: "tsd_booted".

Here are example addresses from the area of
incorrectly zeroed memory (A then B then C):

(lldb) print dyn_region
(region *volatile) $0 = 0x40616000

(lldb) print &__je_tsd_booted
(bool *) $0 = 0x40618520

(lldb) print &__je_tsd_booted
(bool *) $0 = 0x40618520

The first is from dynamic allocation ending up
in the area. The other two are from libc.so.7
globals/statics ending up in the general area.

It looks like something is trashing a specific
memory area for some reason, rather independently
of what the program specifics are.


Other notes:

At least for my small program showing failure:

Being explicit about the combined conditions for failure
for my test program. . .

Both tcache enabled and allocations fitting in SMALL_MAXCLASS
are required in order to make the program fail.

Note:

(lldb) print __je_tcache_maxclass
(size_t) $0 = 32768

which is larger than SMALL_MAXCLASS. I've not observed
failures for sizes above SMALL_MAXCLASS but not exceeding
__je_tcache_maxclass.

Thus tcache use by itself does not seem sufficient for
my program to get corruption of its dynamically allocated
memory: the small allocation size also matters.


Be warned that I can not eliminate the possibility that
the trashing changed what region of memory it trashed
for larger allocations or when tcache is disabled.


===
Mark Millard
markmi at dsl-only.net




Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]

2017-03-18 Thread Mark Millard
[Summary: I've now tested on a rpi3 in addition to a
pine64+ 2GB. Both contexts show the problem.]

On 2017-Mar-16, at 2:07 AM, Mark Millard  wrote:

> On 2017-Mar-15, at 11:07 PM, Scott Bennett  wrote:
> 
>> Mark Millard  wrote:
>> 
>>> [Something strange happened to the automatic CC: fill-in for my original
>>> reply. Also I should have mentioned that for my test program if a
>>> variant is made that does not fork the swapping works fine.]
>>> 
>>> On 2017-Mar-15, at 9:37 AM, Mark Millard  wrote:
>>> 
 On 2017-Mar-15, at 6:15 AM, Scott Bennett  wrote:
 
>  On Tue, 14 Mar 2017 18:18:56 -0700 Mark Millard
>  wrote:
>> On 2017-Mar-14, at 4:44 PM, Bernd Walter  wrote:
>> 
>>> On Tue, Mar 14, 2017 at 03:28:53PM -0700, Mark Millard wrote:
 [test_check() between the fork and the wait/sleep prevents the
 failure from occurring. Even a small access to the memory at
 that stage prevents the failure. Details follow.]
>>> 
>>> Maybe a stupid question, since you might have written it somewhere.
>>> What medium do you swap to?
>>> I've seen broken firmware on microSD cards doing silent data
>>> corruption for some access patterns.
>> 
>> The root filesystem is on a USB SSD on a powered hub.
>> 
>> Only the kernel is from the microSD card.
>> 
>> I have several examples of the USB SSD model and have
>> never observed such problems in any other context.
>> 
>> [remainder of irrelevant material deleted  --SB]
> 
>  You gave a very long-winded non-answer to Bernd's question, so I'll
> repeat it here.  What medium do you swap to?
 
 My wording of:
 
 The root filesystem is on a USB SSD on a powered hub.
 
 was definitely poor. It should have explicitly mentioned the
 swap partition too:
 
 The root filesystem and swap partition are both on the same
 USB SSD on a powered hub.
 
 More detail from dmesg -a for usb:
 
 usbus0: 12Mbps Full Speed USB v1.0
 usbus1: 480Mbps High Speed USB v2.0
 usbus2: 12Mbps Full Speed USB v1.0
 usbus3: 480Mbps High Speed USB v2.0
 ugen0.1:  at usbus0
 uhub0:  on usbus0
 ugen1.1:  at usbus1
 uhub1:  on 
 usbus1
 ugen2.1:  at usbus2
 uhub2:  on usbus2
 ugen3.1:  at usbus3
 uhub3:  on 
 usbus3
 . . .
 uhub0: 1 port with 1 removable, self powered
 uhub2: 1 port with 1 removable, self powered
 uhub1: 1 port with 1 removable, self powered
 uhub3: 1 port with 1 removable, self powered
 ugen3.2:  at usbus3
 uhub4 on uhub3
 uhub4:  on 
 usbus3
 uhub4: MTT enabled
 uhub4: 4 ports with 4 removable, self powered
 ugen3.3:  at usbus3
 umass0 on uhub4
 umass0:  on usbus3
 umass0:  SCSI over Bulk-Only; quirks = 0x0100
 umass0:0:0: Attached to scbus0
 . . .
 da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
 da0:  Fixed Direct Access SPC-4 SCSI device
 da0: Serial Number 
 da0: 40.000MB/s transfers
 
 (Edited a bit because there is other material interlaced, even
 internal to some lines. Also: I removed the serial number of the
 specific example device.)
>> 
>>Thank you.  That presents a much clearer picture.
 
>  I will further note that any kind of USB device cannot automatically
> be trusted to behave properly.  USB devices are notorious, for example,
> 
> [reasons why deleted  --SB]
> 
>  You should identify where you page/swap to and then try substituting
> a different device for that function as a test to eliminate the 
> possibility
> of a bad storage device/controller.  If the problem still occurs, that
> means there still remains the possibility that another controller or its
> firmware is defective instead.  It could be a kernel bug, it is true, but
> making sure there is no hardware or firmware error occurring is important,
> and as I say, USB devices should always be considered suspect unless and
> until proven innocent.
 
 [FYI: This is a ufs context, not a zfs one.]
>> 
>>Right.  It's only a Pi, after all. :-)
> 
> It is a Pine64+ 2GB, not an rpi3.
> 
 
 I'm aware of such things. There is no evidence that has resulted in
 suggesting the USB devices that I can replace are a problem. Otherwise
 I'd not be going down this path. I only have access to the one arm64
 device (a Pine64+ 2GB) so I've no ability to substitution-test what
 is on that board.
>> 
>>There isn't even one open port on that hub that you could plug a
>> flash drive into temporarily to be the paging device?
> 
> Why do you think that I've never tried alternative devices? It
> is just that the result was no evidence that my usually-in-use
> SSD is having a special/local problem: the behavior continues
> across all such contexts when the Pine64+ 2GB is involved. (Again
> I have not had 

Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]

2017-03-16 Thread Scott Bennett
Mark Millard  wrote:

> [Something strange happened to the automatic CC: fill-in for my original
> reply. Also I should have mentioned that for my test program if a
> variant is made that does not fork the swapping works fine.]
>
> On 2017-Mar-15, at 9:37 AM, Mark Millard  wrote:
>
> > On 2017-Mar-15, at 6:15 AM, Scott Bennett  wrote:
> > 
> >>On Tue, 14 Mar 2017 18:18:56 -0700 Mark Millard
> >>  wrote:
> >>> On 2017-Mar-14, at 4:44 PM, Bernd Walter  wrote:
> >>> 
>  On Tue, Mar 14, 2017 at 03:28:53PM -0700, Mark Millard wrote:
> > [test_check() between the fork and the wait/sleep prevents the
> > failure from occurring. Even a small access to the memory at
> > that stage prevents the failure. Details follow.]
>  
>  Maybe a stupid question, since you might have written it somewhere.
>  What medium do you swap to?
>  I've seen broken firmware on microSD cards doing silent data
>  corruption for some access patterns.
> >>> 
> >>> The root filesystem is on a USB SSD on a powered hub.
> >>> 
> >>> Only the kernel is from the microSD card.
> >>> 
> >>> I have several examples of the USB SSD model and have
> >>> never observed such problems in any other context.
> >>> 
> >>> [remainder of irrelevant material deleted  --SB]
> >> 
> >>You gave a very long-winded non-answer to Bernd's question, so I'll
> >> repeat it here.  What medium do you swap to?
> > 
> > My wording of:
> > 
> > The root filesystem is on a USB SSD on a powered hub.
> > 
> > was definitely poor. It should have explicitly mentioned the
> > swap partition too:
> > 
> > The root filesystem and swap partition are both on the same
> > USB SSD on a powered hub.
> > 
> > More detail from dmesg -a for usb:
> > 
> > usbus0: 12Mbps Full Speed USB v1.0
> > usbus1: 480Mbps High Speed USB v2.0
> > usbus2: 12Mbps Full Speed USB v1.0
> > usbus3: 480Mbps High Speed USB v2.0
> > ugen0.1:  at usbus0
> > uhub0:  on usbus0
> > ugen1.1:  at usbus1
> > uhub1:  on usbus1
> > ugen2.1:  at usbus2
> > uhub2:  on usbus2
> > ugen3.1:  at usbus3
> > uhub3:  on usbus3
> > . . .
> > uhub0: 1 port with 1 removable, self powered
> > uhub2: 1 port with 1 removable, self powered
> > uhub1: 1 port with 1 removable, self powered
> > uhub3: 1 port with 1 removable, self powered
> > ugen3.2:  at usbus3
> > uhub4 on uhub3
> > uhub4:  on 
> > usbus3
> > uhub4: MTT enabled
> > uhub4: 4 ports with 4 removable, self powered
> > ugen3.3:  at usbus3
> > umass0 on uhub4
> > umass0:  on usbus3
> > umass0:  SCSI over Bulk-Only; quirks = 0x0100
> > umass0:0:0: Attached to scbus0
> > . . .
> > da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
> > da0:  Fixed Direct Access SPC-4 SCSI device
> > da0: Serial Number 
> > da0: 40.000MB/s transfers
> > 
> > (Edited a bit because there is other material interlaced, even
> > internal to some lines. Also: I removed the serial number of the
> > specific example device.)

 Thank you.  That presents a much clearer picture.
> > 
> >>I will further note that any kind of USB device cannot automatically
> >> be trusted to behave properly.  USB devices are notorious, for example,
> >> 
> >>   [reasons why deleted  --SB]
> >> 
> >>You should identify where you page/swap to and then try substituting
> >> a different device for that function as a test to eliminate the possibility
> >> of a bad storage device/controller.  If the problem still occurs, that
> >> means there still remains the possibility that another controller or its
> >> firmware is defective instead.  It could be a kernel bug, it is true, but
> >> making sure there is no hardware or firmware error occurring is important,
> >> and as I say, USB devices should always be considered suspect unless and
> >> until proven innocent.
> > 
> > [FYI: This is a ufs context, not a zfs one.]

 Right.  It's only a Pi, after all. :-)
> > 
> > I'm aware of such things. There is no evidence that has resulted in
> > suggesting the USB devices that I can replace are a problem. Otherwise
> > I'd not be going down this path. I only have access to the one arm64
> > device (a Pine64+ 2GB) so I've no ability to substitution-test what
> > is on that board.

 There isn't even one open port on that hub that you could plug a
flash drive into temporarily to be the paging device?  You could then
try your tests before returning to the normal configuration.  If there
isn't an open port, then how about plugging a second hub into one of
the first hub's ports and moving the displaced device to the second
hub?  A flash drive could then be plugged in.  That kind of configuration
is obviously a bad idea for the long run, but just to try your tests it
ought to work well enough.  (BTW, if a USB storage device containing a
paging area drops off-line even momentarily and the system needs to use
it, that is the beginning of the end, even though it may take up to a few
minutes for everything to lock up.  You probably won't be able to do an
orderly 

Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]

2017-03-16 Thread Mark Millard
On 2017-Mar-15, at 11:07 PM, Scott Bennett  wrote:

> Mark Millard  wrote:
> 
>> [Something strange happened to the automatic CC: fill-in for my original
>> reply. Also I should have mentioned that for my test program if a
>> variant is made that does not fork the swapping works fine.]
>> 
>> On 2017-Mar-15, at 9:37 AM, Mark Millard  wrote:
>> 
>>> On 2017-Mar-15, at 6:15 AM, Scott Bennett  wrote:
>>> 
   On Tue, 14 Mar 2017 18:18:56 -0700 Mark Millard
  wrote:
> On 2017-Mar-14, at 4:44 PM, Bernd Walter  wrote:
> 
>> On Tue, Mar 14, 2017 at 03:28:53PM -0700, Mark Millard wrote:
>>> [test_check() between the fork and the wait/sleep prevents the
>>> failure from occurring. Even a small access to the memory at
>>> that stage prevents the failure. Details follow.]
>> 
>> Maybe a stupid question, since you might have written it somewhere.
>> What medium do you swap to?
>> I've seen broken firmware on microSD cards doing silent data
>> corruption for some access patterns.
> 
> The root filesystem is on a USB SSD on a powered hub.
> 
> Only the kernel is from the microSD card.
> 
> I have several examples of the USB SSD model and have
> never observed such problems in any other context.
> 
> [remainder of irrelevant material deleted  --SB]
 
   You gave a very long-winded non-answer to Bernd's question, so I'll
 repeat it here.  What medium do you swap to?
>>> 
>>> My wording of:
>>> 
>>> The root filesystem is on a USB SSD on a powered hub.
>>> 
>>> was definitely poor. It should have explicitly mentioned the
>>> swap partition too:
>>> 
>>> The root filesystem and swap partition are both on the same
>>> USB SSD on a powered hub.
>>> 
>>> More detail from dmesg -a for usb:
>>> 
>>> usbus0: 12Mbps Full Speed USB v1.0
>>> usbus1: 480Mbps High Speed USB v2.0
>>> usbus2: 12Mbps Full Speed USB v1.0
>>> usbus3: 480Mbps High Speed USB v2.0
>>> ugen0.1:  at usbus0
>>> uhub0:  on usbus0
>>> ugen1.1:  at usbus1
>>> uhub1:  on usbus1
>>> ugen2.1:  at usbus2
>>> uhub2:  on usbus2
>>> ugen3.1:  at usbus3
>>> uhub3:  on usbus3
>>> . . .
>>> uhub0: 1 port with 1 removable, self powered
>>> uhub2: 1 port with 1 removable, self powered
>>> uhub1: 1 port with 1 removable, self powered
>>> uhub3: 1 port with 1 removable, self powered
>>> ugen3.2:  at usbus3
>>> uhub4 on uhub3
>>> uhub4:  on 
>>> usbus3
>>> uhub4: MTT enabled
>>> uhub4: 4 ports with 4 removable, self powered
>>> ugen3.3:  at usbus3
>>> umass0 on uhub4
>>> umass0:  on usbus3
>>> umass0:  SCSI over Bulk-Only; quirks = 0x0100
>>> umass0:0:0: Attached to scbus0
>>> . . .
>>> da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
>>> da0:  Fixed Direct Access SPC-4 SCSI device
>>> da0: Serial Number 
>>> da0: 40.000MB/s transfers
>>> 
>>> (Edited a bit because there is other material interlaced, even
>>> internal to some lines. Also: I removed the serial number of the
>>> specific example device.)
> 
> Thank you.  That presents a much clearer picture.
>>> 
   I will further note that any kind of USB device cannot automatically
 be trusted to behave properly.  USB devices are notorious, for example,
 
  [reasons why deleted  --SB]
 
   You should identify where you page/swap to and then try substituting
 a different device for that function as a test to eliminate the possibility
 of a bad storage device/controller.  If the problem still occurs, that
 means there still remains the possibility that another controller or its
 firmware is defective instead.  It could be a kernel bug, it is true, but
 making sure there is no hardware or firmware error occurring is important,
 and as I say, USB devices should always be considered suspect unless and
 until proven innocent.
>>> 
>>> [FYI: This is a ufs context, not a zfs one.]
> 
> Right.  It's only a Pi, after all. :-)

It is a Pine64+ 2GB, not an rpi3.

>>> 
>>> I'm aware of such things. There is no evidence that has resulted in
>>> suggesting the USB devices that I can replace are a problem. Otherwise
>>> I'd not be going down this path. I only have access to the one arm64
>>> device (a Pine64+ 2GB) so I've no ability to substitution-test what
>>> is on that board.
> 
> There isn't even one open port on that hub that you could plug a
> flash drive into temporarily to be the paging device?

Why do you think that I've never tried alternative devices? It
is just that the result was no evidence that my usually-in-use
SSD is having a special/local problem: the behavior continues
across all such contexts when the Pine64+ 2GB is involved. (Again
I have not had access to an alternate to the one arm64 board.
That limits my substitution testing possibilities.)

Why would you expect a Flash drive to be better than another SSD
for such testing? (The SSD that I usually use even happens to be
a USB 3.0 SSD, capable of USB 3.0 speeds in USB 3.0 contexts. So
is 

Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]

2017-03-15 Thread Mark Millard
[Something strange happened to the automatic CC: fill-in for my original
reply. Also I should have mentioned that for my test program if a
variant is made that does not fork the swapping works fine.]

On 2017-Mar-15, at 9:37 AM, Mark Millard  wrote:

> On 2017-Mar-15, at 6:15 AM, Scott Bennett  wrote:
> 
>>On Tue, 14 Mar 2017 18:18:56 -0700 Mark Millard
>>  wrote:
>>> On 2017-Mar-14, at 4:44 PM, Bernd Walter  wrote:
>>> 
 On Tue, Mar 14, 2017 at 03:28:53PM -0700, Mark Millard wrote:
> [test_check() between the fork and the wait/sleep prevents the
> failure from occurring. Even a small access to the memory at
> that stage prevents the failure. Details follow.]
 
 Maybe a stupid question, since you might have written it somewhere.
 What medium do you swap to?
 I've seen broken firmware on microSD cards doing silent data
 corruption for some access patterns.
>>> 
>>> The root filesystem is on a USB SSD on a powered hub.
>>> 
>>> Only the kernel is from the microSD card.
>>> 
>>> I have several examples of the USB SSD model and have
>>> never observed such problems in any other context.
>>> 
>>> [remainder of irrelevant material deleted  --SB]
>> 
>>You gave a very long-winded non-answer to Bernd's question, so I'll
>> repeat it here.  What medium do you swap to?
> 
> My wording of:
> 
> The root filesystem is on a USB SSD on a powered hub.
> 
> was definitely poor. It should have explicitly mentioned the
> swap partition too:
> 
> The root filesystem and swap partition are both on the same
> USB SSD on a powered hub.
> 
> More detail from dmesg -a for usb:
> 
> usbus0: 12Mbps Full Speed USB v1.0
> usbus1: 480Mbps High Speed USB v2.0
> usbus2: 12Mbps Full Speed USB v1.0
> usbus3: 480Mbps High Speed USB v2.0
> ugen0.1:  at usbus0
> uhub0:  on usbus0
> ugen1.1:  at usbus1
> uhub1:  on usbus1
> ugen2.1:  at usbus2
> uhub2:  on usbus2
> ugen3.1:  at usbus3
> uhub3:  on usbus3
> . . .
> uhub0: 1 port with 1 removable, self powered
> uhub2: 1 port with 1 removable, self powered
> uhub1: 1 port with 1 removable, self powered
> uhub3: 1 port with 1 removable, self powered
> ugen3.2:  at usbus3
> uhub4 on uhub3
> uhub4:  on usbus3
> uhub4: MTT enabled
> uhub4: 4 ports with 4 removable, self powered
> ugen3.3:  at usbus3
> umass0 on uhub4
> umass0:  on usbus3
> umass0:  SCSI over Bulk-Only; quirks = 0x0100
> umass0:0:0: Attached to scbus0
> . . .
> da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
> da0:  Fixed Direct Access SPC-4 SCSI device
> da0: Serial Number 
> da0: 40.000MB/s transfers
> 
> (Edited a bit because there is other material interlaced, even
> internal to some lines. Also: I removed the serial number of the
> specific example device.)
> 
>>I will further note that any kind of USB device cannot automatically
>> be trusted to behave properly.  USB devices are notorious, for example,
>> for momentarily dropping off-line and then immediately reconnecting.  (ZFS
>> reacts very poorly to such events, BTW.)  This misbehavior can be caused
>> by either processor involved, i.e., the one controlling either the
>> upstream or the downstream device.  Hubs are really bad about this, but
>> any USB device can be guilty.  You may have a defective storage device,
>> its controller may be defective, or any controller in the chain all the
>> way back to the motherboard may be defective or, not defective, but
>> corrupted by having been connected to another device with corrupted
>> (infected) firmware that tries to flash itself into the firmware flash
>> chips in its potential victim.
>> Flash memory chips, spinning disks, or {S,}{D,}RAM chips can be
>> defective.  Without parity bits, the devices may return bad data and lie
>> about the presence of corrupted data.  That, for example, is where ZFS
>> is better than any kind of classical RAID because ZFS keeps checksums on
>> everything, so it has a reasonable chance of detecting corruption even
>> without parity support and, if there is any redundancy in the pool or the
>> data set, fixing the bad data.  Even having parity generally
>> allows only the detection of single-bit errors, not their repair.
>> You should identify where you page/swap to and then try substituting
>> a different device for that function as a test to eliminate the possibility
>> of a bad storage device/controller.  If the problem still occurs, that
>> means there still remains the possibility that another controller or its
>> firmware is defective instead.  It could be a kernel bug, it is true, but
>> making sure there is no hardware or firmware error occurring is important,
>> and as I say, USB devices should always be considered suspect unless and
>> until proven innocent.
> 
> [FYI: This is a ufs context, not a zfs one.]
> 
> I'm aware of such things. So far nothing has suggested that the USB
> devices I can replace are a problem. Otherwise
> I'd not be going down this path. I 

Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]

2017-03-14 Thread Mark Millard
A single Byte access to a 4K Byte aligned region between
the fork and wait/sleep/swap-out prevents that specific
4K Byte region from having the (bad) zeros.

Sounds like a page sized unit of behavior to me.

Details follow.

On 2017-Mar-14, at 3:28 PM, Mark Millard  wrote:

> [test_check() between the fork and the wait/sleep prevents the
> failure from occurring. Even a small access to the memory at
> that stage prevents the failure. Details follow.]
> 
> On 2017-Mar-14, at 11:07 AM, Mark Millard  wrote:
> 
>> [This is just a correction to the subject-line text to say arm64
>> instead of amd64.]
>> 
>> On 2017-Mar-14, at 12:58 AM, Mark Millard  wrote:
>> 
>> [Another correction I'm afraid --about alternative program variations
>> this time.]
>> 
>> On 2017-Mar-13, at 11:52 PM, Mark Millard  wrote:
>> 
>>> I'm still at a loss about how to figure out what stages are messed
>>> up. (Memory coherency? Some memory not swapped out? Bad data swapped
>>> out? Wrong data swapped in?)
>>> 
>>> But at least I've found a much smaller/simpler example to demonstrate
>>> some problem with in my Pine64+ 2GB context.
>>> 
>>> The Pine64+ 2GB is the only amd64 context that I have access to.
>> 
>> Someday I'll learn to type arm64 the first time instead of amd64.
>> 
>>> The following program fails its check for data
>>> having its expected byte pattern in dynamically
>>> allocated memory after a fork/swap-out/swap-in
>>> sequence.
>>> 
>>> I'll note that the program sleeps for 60s after
>>> forking to give time to do something else to
>>> cause the parent and child processes to swap
>>> out (RES=0 as seen in top).
>> 
>> The following about the extra test_check() was
>> wrong.
>> 
>>> Note the source code line:
>>> 
>>> // test_check(); // Adding this line prevents failure.
>>> 
>>> It seems that accessing the region contents before forking
>>> and swapping avoids the problem. But there is a problem
>>> if the region was only written-to before the fork/swap.
> 
> There is a place that if a test_check call is put then the
> problem does not happen at any stage: I tried putting a
> call between the fork and the later wait/sleep code:

I changed the byte sequence patterns to avoid
zero values since the bad values are zeros:

static value_type value(size_t v) { return (value_type)((v&0xFEu)|0x1u); }
  // value now avoids the zero value since the failures
  // are zeros.

With that I can then test accurately what bytes have
bad values vs. do not. I also changed to:

void partial_test_check(void) {
if (value(0u)!=gbl_region.array[0]) raise(SIGABRT);
if (value(0u)!=(*dyn_region).array[0]) raise(SIGABRT);
}

since previously [0] had a zero value and so I'd used [1].

On this basis I'm now using the below. See the comments tied
to partial_test_check() calls:

extern void test_setup(void); // Sets up the memory byte patterns.
extern void test_check(void); // Tests the memory byte patterns.
extern void partial_test_check(void); // Tests just [0] of each region
  // (gbl_region and dyn_region).

int main(void) {
test_setup();
test_check(); // Before fork() [passes]

pid_t pid = fork();
int wait_status = 0;;

// After fork; before wait/sleep/swap-out.

if (0==pid) partial_test_check();
 // Even the above is sufficient by
 // itself to prevent failure for
 // region_size 1u through
 // 4u*1024u!
 // But 4u*1024u+1u and above fail
 // with this access to memory.
 // The failing test is of
 // (*dyn_region).array[4096u].
 // This test never fails here.

if (0<pid) { pid_t wpid = wait(&wait_status); }
. . .

> This suggests to me that the small access is forcing one or more things to
> be initialized for memory access that fork is not establishing of itself.
> It appears that if established correctly then the swap-out/swap-in
> sequence would work okay without needing the manual access to the memory.
> 
> 
> So far via this test I've not seen any evidence 

Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]

2017-03-14 Thread Mark Millard
On 2017-Mar-14, at 4:44 PM, Bernd Walter  wrote:

> On Tue, Mar 14, 2017 at 03:28:53PM -0700, Mark Millard wrote:
>> [test_check() between the fork and the wait/sleep prevents the
>> failure from occurring. Even a small access to the memory at
>> that stage prevents the failure. Details follow.]
> 
> Maybe a stupid question, since you might have written it somewhere.
> What medium do you swap to?
> I've seen broken firmware on microSD cards doing silent data
> corruption for some access patterns.

The root filesystem is on a USB SSD on a powered hub.

Only the kernel is from the microSD card.

I have several examples of the USB SSD model and have
never observed such problems in any other context.


The original issue that started this investigation
has been reported by several people on the lists:

Failed assertion: "tsd_booted"

on arm64 specifically, no other contexts so far as
I know. Earlier I had discovered that:

A) I could use a swap-in to cause the messages from
   instances of sh or su that had swapped out earlier.

B) The core dumps showed that a large memory region
   containing the global tsd_booted had all turned
   to be zero bytes. The assert is just exposing one
   of those zeros. (tsd_booted is from jemalloc that
   is in a .so that is loaded.)

This prompted me to look for simpler contexts involving
swapping that also show memory corruption.

So far I've only managed to produce corrupted memory when
fork and later swapping are both involved. Being a shared
library global is not a requirement for the problem,
although such contexts can have an issue. I've not made a
simpler example of that yet, although I tried.

I have not explored vfork, rfork, or any other alternatives.

So far all failure examples end up with zeroed memory when
the memory does not match the original pattern from before
the fork. At least that is what the core dumps show for all
examples that I've looked at.

See bugzilla 217138 and 217239. In some respects this example
is more analogous to the 217239 context as I remember.

My tests on amd64, armv6 (really -mcpu=cortex-a7 so armv7),
and powerpc64 have never produced any problems, including
never getting the failed assertion. Only arm64. (But I've
access to only one arm64 system, a Pine64+ 2GB.)

Prior to this I tracked down a different arm64 problem to
the fork_trampoline code (for the child process) modifying
a system register but in a place allowing interrupts that
could also change the value. Andrew Turner fixed that
one at the time.

For this fork/swapping kind of issue I'm not sure that
I'll be able to do more than provide the simpler
example and the steps that I used. My isolating the
internal stage(s) and specific problem(s) at the code
level of detail does not seem likely.

But whatever is found needs to be able to explain the
contrast with an access after the fork but before the
swap preventing the failing behavior. So what I've got
so far hopefully does provide some hints to someone.

===
Mark Millard
markmi at dsl-only.net

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"



Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]

2017-03-14 Thread Bernd Walter
On Tue, Mar 14, 2017 at 03:28:53PM -0700, Mark Millard wrote:
> [test_check() between the fork and the wait/sleep prevents the
> failure from occurring. Even a small access to the memory at
> that stage prevents the failure. Details follow.]

Maybe a stupid question, since you might have written it somewhere.
What medium do you swap to?
I've seen broken firmware on microSD cards doing silent data
corruption for some access patterns.

-- 
B.Walter  http://www.bwct.de
Modbus/TCP Ethernet I/O modules, ARM-based FreeBSD computers, and more.


Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]

2017-03-14 Thread Mark Millard
[test_check() between the fork and the wait/sleep prevents the
failure from occurring. Even a small access to the memory at
that stage prevents the failure. Details follow.]

On 2017-Mar-14, at 11:07 AM, Mark Millard  wrote:

> [This is just a correction to the subject-line text to say arm64
> instead of amd64.]
> 
> On 2017-Mar-14, at 12:58 AM, Mark Millard  wrote:
> 
> [Another correction I'm afraid --about alternative program variations
> this time.]
> 
> On 2017-Mar-13, at 11:52 PM, Mark Millard  wrote:
> 
>> I'm still at a loss about how to figure out what stages are messed
>> up. (Memory coherency? Some memory not swapped out? Bad data swapped
>> out? Wrong data swapped in?)
>> 
>> But at least I've found a much smaller/simpler example to demonstrate
>> some problem with in my Pine64+ 2GB context.
>> 
>> The Pine64+ 2GB is the only amd64 context that I have access to.
> 
> Someday I'll learn to type arm64 the first time instead of amd64.
> 
>> The following program fails its check for data
>> having its expected byte pattern in dynamically
>> allocated memory after a fork/swap-out/swap-in
>> sequence.
>> 
>> I'll note that the program sleeps for 60s after
>> forking to give time to do something else to
>> cause the parent and child processes to swap
>> out (RES=0 as seen in top).
> 
> The following about the extra test_check() was
> wrong.
> 
>> Note the source code line:
>> 
>> // test_check(); // Adding this line prevents failure.
>> 
>> It seems that accessing the region contents before forking
>> and swapping avoids the problem. But there is a problem
>> if the region was only written-to before the fork/swap.

There is a place that if a test_check call is put then the
problem does not happen at any stage: I tried putting a
call between the fork and the later wait/sleep code:

int main(void) {
test_setup();
test_check(); // Before fork() [passes]

pid_t pid = fork();
int wait_status = 0;;

// test_check(); // After fork(); before wait/sleep. 
 // If used it prevents failure later!

if (0<pid) { pid_t wpid = wait(&wait_status); }

arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]

2017-03-14 Thread Mark Millard
[This is just a correction to the subject-line text to say arm64
instead of amd64.]

On 2017-Mar-14, at 12:58 AM, Mark Millard  wrote:

[Another correction I'm afraid --about alternative program variations
this time.]

On 2017-Mar-13, at 11:52 PM, Mark Millard  wrote:

> I'm still at a loss about how to figure out what stages are messed
> up. (Memory coherency? Some memory not swapped out? Bad data swapped
> out? Wrong data swapped in?)
> 
> But at least I've found a much smaller/simpler example to demonstrate
> some problem with in my Pine64+ 2GB context.
> 
> The Pine64+ 2GB is the only amd64 context that I have access to.

Someday I'll learn to type arm64 the first time instead of amd64.

> The following program fails its check for data
> having its expected byte pattern in dynamically
> allocated memory after a fork/swap-out/swap-in
> sequence.
> 
> I'll note that the program sleeps for 60s after
> forking to give time to do something else to
> cause the parent and child processes to swap
> out (RES=0 as seen in top).

The following about the extra test_check() was
wrong.

> Note the source code line:
> 
>  // test_check(); // Adding this line prevents failure.
> 
> It seems that accessing the region contents before forking
> and swapping avoids the problem. But there is a problem
> if the region was only written-to before the fork/swap.

This was because I'd carelessly moved some loop variables to
globals in a way that depended on the initialization of the
globals and the extra call changed those values.

I've noted code adjustments below (3 lines). I get the failures
with them as well.

> Another point is the size of the region matters: <= 14K Bytes
> fails and > 14K Bytes works for as much as I have tested.
> 
> 
> # more swap_testing.c
> // swap_testing.c
> 
> // Built via (c++ was clang++ 4.0 in my case):
> //
> // cc -g -std=c11 -Wpedantic swap_testing.c
> // -O0 and -O2 also gets the problem.
> 
> #include <unistd.h> // for fork(), sleep(.)
> #include <sys/types.h> // for pid_t
> #include <sys/wait.h> // for wait(.)
> 
> extern void test_setup(void); // Sets up the memory byte pattern.
> extern void test_check(void); // Tests the memory byte pattern.
> 
> int main(void)
> {
>  test_setup();
  test_check(); // This test passes.
> 
>  pid_t pid = fork();
>  int wait_status = 0;;
> 
>  if (0<pid) { pid_t wpid = wait(&wait_status); }
>  if (-1!=wait_status && 0<=pid)
>  {
>  if (0==pid)
>  {
>  sleep(60);
> 
>  // During this manually force this process to
>  // swap out. I use something like:
> 
>  // stress -m 1 --vm-bytes 1800M
> 
>  // in another shell and ^C'ing it after top
>  // shows the swapped status desired. 1800M
>  // just happened to work on the Pine64+ 2GB
>  // that I was using.
>  }
> 
>  test_check();
>  }
> }
> 
> // The memory and test code follows.
> 
> #include <stdbool.h> // for bool, true, false
> #include <stddef.h> // for size_t, NULL
> #include <stdlib.h> // for malloc(.), free(.)
> 
> #include <signal.h> // for raise(.), SIGABRT
> 
> #define region_size (14u*1024u)
>  // Bad dyn_region pattern, parent and child
>  // processes:
>  //  256u, 4u*1024u, 8u*1024u, 9u*1024u,
>  // 12u*1024u, 14u*1024u
> 
>  // Works:
>  // 14u*1024u+1u, 15u*1024u, 16u*1024u,
>  // 32u*1024u, 256u*1024u*1024u
> 
> typedef volatile unsigned char value_type;
> 
> struct region_struct { value_type array[region_size]; };
> typedef struct region_struct region;
> 
> static region gbl_region;
> static region * volatile dyn_region = NULL;
> 
> static value_type value(size_t v) { return (value_type)v; }
> 
> void test_setup(void) {
>  dyn_region = malloc(sizeof(region));
>  if (!dyn_region) raise(SIGABRT);
> 
>  for(size_t i=0u; i<region_size; i++) {
>  (*dyn_region).array[i] = gbl_region.array[i] = value(i);
>  }
> }
> 
> static volatile bool gbl_failed = false; // Until potentially disproved
> static volatile size_t gbl_pos = 0u;
> 
> static volatile bool dyn_failed = false; // Until potentially disproved
> static volatile size_t dyn_pos = 0u;
> 
> void test_check(void) {
  gbl_pos = 0u;
>  while (!gbl_failed && gbl_pos<region_size) {
>  gbl_failed = (value(gbl_pos) != gbl_region.array[gbl_pos]);
>  gbl_pos++;
>  }
> 
  dyn_pos = 0u;
>  while (!dyn_failed && dyn_pos<region_size) {
>  dyn_failed = (value(dyn_pos) != (*dyn_region).array[dyn_pos]);
>  // Note: When the memory pattern fails, this is the
>  //   case that records the failure.
>  dyn_pos++;
>  }
> 
>  if (gbl_failed) raise(SIGABRT);
>  if (dyn_failed) raise(SIGABRT); // lldb reports this line for the __raise call
>  // when it fails (both parent and child processes).
> }

I'm not bothering to redo the details below for the
line number variations.

> Other details from lldb