Re: 11.0-CURRENT (r292413) on a rpi2b: arm-gnueabi-freebsd/bin/ar, _fseeko, and memset vs memory alignment (SCTRL bit[1]=1?): Explains the Bus error?

2015-12-28 Thread Mark Millard

On 2015-Dec-26, at 8:45 AM, Warner Losh  wrote:

> Thanks, it sounds like I fixed a bug, but there’s more.
> 
> What were the specific ports so I can test them here?
> 
> And to be clear, this is a buildworld on the RPi 2 using the cross-built 
> world with CPUTYPE=armv7a or some such, right?
> 
> Warner
> 
>> On Dec 25, 2015, at 9:32 PM, Mark Millard  wrote:
>> 
>> [I am again breaking off another section of older material.]
>> 
>> Mixed news I'm afraid.
>> 
>> The specific couple of ports that I attempted did build, the same ones that 
>> originally got the Bus Error in ar using (indirectly) _fseeko and memset 
>> that I reported. So I expect that you fixed one error.
>> 
>> But when I tried to buildworld, clang++ 3.7 processing 
>> usr/src/lib/clang/libllvmtablegen/ materials quickly got a Bus Error at 
>> nearly the same type of instruction (it has a "!" below that the earlier one 
>> did not), but with r4 holding the misaligned address this time:
>> 
>>> --- _bootstrap-tools-lib/clang/libllvmsupport ---
>>> --- APFloat.o ---
>>> clang++: error: unable to execute command: Bus error (core dumped)
>>> . . .
>>> # gdb clang++ usr/src/lib/clang/libllvmtablegen/clang++.core
>>> . . .
>>> Core was generated by `clang++'.
>>> Program terminated with signal 10, Bus error.
>>> #0  0x00c3bb9c in 
>>> clang::DependentTemplateSpecializationType::DependentTemplateSpecializationType
>>>  ()
>>> [New Thread 22a18000 (LWP 100128/)]
>>> (gdb) x/40i 0x00c3bb60
>>> . . .
>>> 0xc3bb9c 
>>> <_ZN5clang35DependentTemplateSpecializationTypeC2ENS_21ElaboratedTypeKeywordEPNS_19NestedNameSpecifierEPKNS_14IdentifierInfoEjPKNS_16TemplateArgumentENS_8QualTypeE+356>:
>>>   vst1.64   {d16-d17}, [r4]!
>>> . . .
>>> (gdb) info all-registers
>>> r0 0xbfbf81a8   -1077968472
>>> r1 0x22f07e14   586186260
>>> r2 0xc416bc 12850876
>>> r3 0x2  2
>>> r4 0x22f07dfc   586186236
>>> . . .
>> 
>> 
>> Thus it appears that there is more code around that likely generates 
>> pointers that are not sufficiently aligned for the code generation in use 
>> for what is pointed to.
>> 
>> At this point I have no clue if the issue is just inside clang itself vs. if 
>> it is in something that clang is layered on top of. Nor if there is just one 
>> bad thing or many.
>> 
>> Note: I had not yet tried buildworld/buildkernel for the context of the "-f" 
>> option that I was experimenting with earlier. So I do not have a direct 
>> compare and contrast at this point.

Somehow I did not notice your E-mail at the time. Meanwhile I've more evidence. 
. .

[Initial context for notes: Before updating to 11.0-CURRENT -r292756 and its 
clang/clang++ 3.7.1.]

Example c++ program that clang++ got an internal Bus Error for:

> # more main.cc
> #include 
> int
> main ()
> {
> std::ostream *o; return 0;
> }

Of course the include makes the source being processed non-trivial.

Going in a different direction. . . dmesg -a | grep "core dumped" on the rpi2 
showed:

> pid 22238 (msgfmt), uid 0: exited on signal 11 (core dumped)
> pid 22250 (xgettext), uid 0: exited on signal 11 (core dumped)
> pid 22259 (msgmerge), uid 0: exited on signal 11 (core dumped)
> pid 26149 (msgfmt), uid 0: exited on signal 11 (core dumped)
> pid 26161 (xgettext), uid 0: exited on signal 11 (core dumped)
> pid 26170 (msgmerge), uid 0: exited on signal 11 (core dumped)
> pid 28826 (c++), uid 0: exited on signal 10 (core dumped)
> pid 29202 (c++), uid 0: exited on signal 10 (core dumped)
> pid 29282 (c++), uid 0: exited on signal 10 (core dumped)
> pid 29292 (clang++), uid 0: exited on signal 10 (core dumped)

Only the c++/clang++ contexts (same but for name) seemed to be leaving .core 
files behind.

The older log files also showed examples like the following from ports building 
activity:

> /var/log/dmesg.today:pid 18763 (conftest), uid 0: exited on signal 11 (core 
> dumped)
> /var/log/dmesg.today:pid 18916 (conftest), uid 0: exited on signal 11 (core 
> dumped)

(The original ar that I started with showed as well; the records went back that 
far at the time.)

[New -r292756 context. . .]

After the above I updated to:

> $ freebsd-version -ku; uname -aKU
> 11.0-CURRENT
> 11.0-CURRENT
> FreeBSD rpi2 11.0-CURRENT FreeBSD 11.0-CURRENT #4 r292756M: Sun Dec 27 
> 02:55:57 PST 2015 
> root@FreeBSDx64:/usr/obj/clang/arm.armv6/usr/src/sys/RPI2-NODBG  arm 1100092 
> 1100092

in order to pick up clang 3.7.1. I used -fmax-type-align=4 
-mno-unaligned-access in the src.conf file for the buildworld buildkernel 
amd64->rpi2 cross build before installing both parts on the rpi2 media.

On the rpi2 itself the resulting c++/clang++ still gets Bus Error during 
buildworld despite the use of -fmax-type-align=4 -mno-unaligned-access in the 
amd64 hosted cross build (and in the rpi2 attempted rebuild). An example crash 
report is:

> /usr/bin/clang++ -B/usr/local/arm-gnueabi-freebsd/bin -march=armv7a 
> -fmax-type-align=4 

2015-12-28 Thread Mark Millard
[I have both dropped a bunch of older history and started a new break.]

The clang++ Bus Errors are a compiler implementation defect(!), as shown below. 
(Presumes they want clang++ to work in contexts where alignment is required.) 
In summary:

The clang++ source code presumes that there are no alignment criteria to be met 
for TemplateArgument instances from the "arg buffer" for any 
DependentTemplateSpecializationType instance.


The details. . .

I finally have an 11-line example source file (no includes) that crashes clang++ 
on the rpi2. (The example is a partial item from libc++'s .)

> # more main.cc
> template 
> struct __has_rebind
> {
> template  static char __test(typename _Xp::template 
> rebind<_Up>* = 0);
> };
> 
> int
> main ()
> {
> return 0;
> }

The backtrace in clang++ looks like:

> Program terminated with signal 10, Bus error.
> #0  0x00c404d0 in 
> clang::DependentTemplateSpecializationType::DependentTemplateSpecializationType
>  ()
> [New Thread 22a18000 (LWP 100182/)]
> (gdb) bt
> #0  0x00c404d0 in 
> clang::DependentTemplateSpecializationType::DependentTemplateSpecializationType
>  ()
> #1  0x00d86634 in clang::ASTContext::getDependentTemplateSpecializationType ()
> #2  0x00d865d8 in clang::ASTContext::getDependentTemplateSpecializationType ()
> #3  0x00d862d4 in clang::ASTContext::getDependentTemplateSpecializationType ()
> #4  0x00553b7c in clang::Sema::ActOnTypenameType ()
> #5  0x0040cb68 in clang::Parser::TryAnnotateTypeOrScopeToken ()
> #6  0x00471198 in $a.28 ()
> #7  0x00471198 in $a.28 ()
> (gdb) x/1i 0x00c404d0
> 0xc404d0 
> <_ZN5clang35DependentTemplateSpecializationTypeC2ENS_21ElaboratedTypeKeywordEPNS_19NestedNameSpecifierEPKNS_14IdentifierInfoEjPKNS_16TemplateArgumentENS_8QualTypeE+356>:
> 
> vst1.64   {d16-d17}, [r4]!
> (gdb) info all-registers
> r0 0xbfbf9778 -1077962888
> r1 0x22ac59c4 581720516
> r2 0xc45ff8   12869624
> r3 0x22
> r4 0x22ac59ac 581720492
. . .

The code involved is from lib/AST/Type.cpp :

> DependentTemplateSpecializationType::DependentTemplateSpecializationType(
>                          ElaboratedTypeKeyword Keyword,
>                          NestedNameSpecifier *NNS, const IdentifierInfo *Name,
>                          unsigned NumArgs, const TemplateArgument *Args,
>                          QualType Canon)
>   : TypeWithKeyword(Keyword, DependentTemplateSpecialization, Canon, true, true,
>                     /*VariablyModified=*/false,
>                     NNS && NNS->containsUnexpandedParameterPack()),
>     NNS(NNS), Name(Name), NumArgs(NumArgs) {
>   assert((!NNS || NNS->isDependent()) &&
>          "DependentTemplateSpecializatonType requires dependent qualifier");
>   for (unsigned I = 0; I != NumArgs; ++I) {
>     if (Args[I].containsUnexpandedParameterPack())
>       setContainsUnexpandedParameterPack();
> 
>     new (&getArgBuffer()[I]) TemplateArgument(Args[I]);
>   }
> }

The failing code is for the "placement new" in the loop:

A) &getArgBuffer()[I] is not always an address that meets the alignment the 
vst1.64 instruction requires.

but. . .

B) TemplateArgument(Args[I])'s copy construction activity has code (such as the 
vst1.64) requiring a specific alignment when SCTLR bit[1]==1.

C) Nothing here has any explicitly packed data structures.
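
The arithmetic behind (A) and (B) can be checked directly against the r4 
values reported in the two clang++ core files. The following is only a sketch: 
is_aligned is a hypothetical helper name (not something from clang or 
FreeBSD), and it assumes the alignment is a power of two.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true when addr is a multiple of alignment.
// Assumes alignment is a power of two.
constexpr bool is_aligned(std::uintptr_t addr, std::size_t alignment) {
    return (addr & (alignment - 1)) == 0;
}

// r4 from the 11-line example crash (vst1.64 {d16-d17}, [r4]!).
static_assert(!is_aligned(0x22ac59ac, 8), "not on an 8 byte boundary");
static_assert(is_aligned(0x22ac59ac, 4), "but on a 4 byte boundary");

// r4 from the earlier buildworld crash at the same kind of instruction.
static_assert(!is_aligned(0x22f07dfc, 8), "not on an 8 byte boundary");
static_assert(is_aligned(0x22f07dfc, 4), "but on a 4 byte boundary");
```

Both addresses sit on 4 byte boundaries but not on 8 byte boundaries, which 
fits a vst1.64 fault when alignment checking is enforced.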

As for (A):

> class DependentTemplateSpecializationType :
>   public TypeWithKeyword, public llvm::FoldingSetNode {
> . . .
>   const TemplateArgument *getArgBuffer() const {
>     return reinterpret_cast<const TemplateArgument*>(this+1);
>   }
>   TemplateArgument *getArgBuffer() {
>     return reinterpret_cast<TemplateArgument*>(this+1);
>   }

clang++ is over-allocating the space for the 
DependentTemplateSpecializationType objects and using the extra space that is 
afterwards to hold (a somewhat C-style array of) TemplateArgument instances. 
But the logic for this does nothing explicit about alignment of the 
TemplateArgument instance pointers, not even partially via explicitly 
controlling sizeof(DependentTemplateSpecializationType).

This code does not explicitly force any specific minimum TemplateArgument 
alignment, other than 1.
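
One way such a layout can be made alignment-safe is to round the offset of the 
trailing array up to the element type's alignment instead of using this+1 
directly. The sketch below only illustrates that idea. Header, Elem, round_up, 
and trailing_offset are hypothetical stand-ins, not clang's types and not 
clang's actual fix, and the sizes are chosen just to show a mismatch.

```cpp
#include <cstddef>

// Round n up to the next multiple of align (align assumed a power of two).
constexpr std::size_t round_up(std::size_t n, std::size_t align) {
    return (n + align - 1) & ~(align - 1);
}

// Hypothetical stand-ins: a 12 byte header with 4 byte alignment followed
// by elements that need 8 byte alignment.
struct alignas(4) Header { char bytes[12]; };
struct alignas(8) Elem   { char bytes[8];  };

// Byte offset from the start of the header at which the trailing
// array of E can start without violating alignof(E).
template <class H, class E>
constexpr std::size_t trailing_offset() {
    return round_up(sizeof(H), alignof(E));
}

// "this+1" lands at offset sizeof(Header) == 12, which is not a
// multiple of 8; the rounded offset is 16.
static_assert(sizeof(Header) % alignof(Elem) != 0, "this+1 is misaligned");
static_assert(trailing_offset<Header, Elem>() == 16, "padded offset");
```

The allocation itself would also have to start on a boundary suitable for both 
types for the rounded offset to give an aligned address.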

Separately there is the issue that the generated code did not treat the 
pointers returned from the getArgBuffer() methods as "opaque pointer" examples, 
but they are. Having been compiled with -fmax-type-align=4, the code should not 
have required 8 byte alignment (vst1.64). It should have produced code that 
required 4 (or 2 or 1). Quoting for -fmax-type-align=?:

> Instruct the code generator to not enforce a higher alignment than the given 
> number (of bytes) when accessing memory via an opaque pointer or reference


Those pointers certainly are opaque and should be treated as such. The 
"reinterpret_cast" use is a big clue that clang++ should respect.

In other words: I see two clang++ defects in the overall evidence, one of which 
directly leads to the Bus Errors being possible.

The script of the 

2015-12-26 Thread Mark Millard
On 2015-Dec-26, at 9:00 AM, Ian Lepore  wrote:

> On Fri, 2015-12-25 at 17:21 -0800, Mark Millard wrote:
>> In my view "-mno-unaligned-access" is an even bigger hammer than I
>> used. I find no clang statement about what its ABI consequences would
>> be, unlike for what I did: What mix of more padding for alignment vs.
>> more but smaller accesses? But as I remember I've seen "-mno
>> -unaligned-access" in use in ports and the like so its consequences
>> may be familiar material for some folks.
>> 
>> Absent any questions about ABI consequences "-mno-unaligned-access"
>> does well mark the expected SCTLR bit[1] status, far better than what
>> I did. Again: I was covering my ignorance while making any
>> significant investigation/debugging as unlikely as I could.
> 
> After reading the docs more carefully, I think -mno-unaligned-access
> isn't a bigger hammer, it's just a different tool that addresses a
> different problem than the one you ran into, and it's one we need.  In
> particular, it prevents alignment-required accesses to potentially
> unaligned fields in a struct marked as 'packed', which is something we
> rely on (it's why we mark some structs as packed).
> 
> -- Ian
> 
> 

If clang uses the same interpretation as gcc for arm then I agree:

> -munaligned-access
> -mno-unaligned-access
> Enables (or disables) reading and writing of 16- and 32- bit values from 
> addresses that are not 16- or 32- bit aligned. By default unaligned access is 
> disabled for all pre-ARMv6 and all ARMv6-M architectures, and enabled for all 
> other architectures. If unaligned access is not enabled then words in packed 
> data structures are accessed a byte at a time.
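
As an illustration of what "accessed a byte at a time" means for a 32-bit 
field, here is a sketch that assembles the value from four single-byte loads, 
so that no load needs more than 1 byte alignment. The names load_le32 and 
sample are hypothetical, and little-endian byte order is an assumption made 
for the example (the quoted text does not state it).

```cpp
#include <cstdint>

// Read a 32-bit little-endian value using only byte loads, so the
// address p does not need any particular alignment.
constexpr std::uint32_t load_le32(const unsigned char *p) {
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Sample buffer: one leading pad byte, then 0x12345678 stored
// little-endian starting at offset 1, as a packed field might be.
constexpr unsigned char sample[] = {0xff, 0x78, 0x56, 0x34, 0x12};

static_assert(load_le32(sample + 1) == 0x12345678u, "byte-wise load");
```

A single word load from sample + 1 would instead depend on the hardware 
tolerating the unaligned address.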



I see that linux went with SCTLR bit[1] being cleared for >= armv6 for the 
kernel: 
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8428e84d42179c2a00f5f6450866e70d802d1d05


Interestingly clang -cc1 -help only mentions -mno-unaligned-access as a note to 
-mstrict-align:

> # clang++ -cc1 -help | grep align
>   -fmax-type-align=
>   Specify the maximum alignment to enforce on 
> pointers lacking an explicit alignment
>   -fno-bitfield-type-align
>   Ignore bit-field types when aligning structures
>   -fpack-struct=   Specify the default maximum struct packing alignment
>   -mstack-alignment=
>   Set the stack alignment
>   -mstackrealign  Force realign the stack at entry to every function
>   -mstrict-align  Force all memory accesses to be aligned (same as 
> mno-unaligned-access)


Also -munaligned-access is not mentioned at all. Apparently "clang -cc1 -help" 
does not generally document gcc compatibility syntax.

gcc's AArch64 page 
https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html#AArch64-Options only 
mentions -mstrict-align : "Do not assume that unaligned memory references are 
handled by the system". (Not as explicit for interpretation as the 
earlier-quoted arm wording.)

gcc's arm page https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html#ARM-Options 
only mentions -munaligned-access and -mno-unaligned-access (as quoted earlier), 
not -mstrict-align .

powerpc's page at 
https://gcc.gnu.org/onlinedocs/gcc/RS_002f6000-and-PowerPC-Options.html#RS_002f6000-and-PowerPC-Options
 only mentions -mstrict-align and -mno-strict-align : "On System V.4 and 
embedded PowerPC systems do not (do) assume that unaligned memory references 
are handled by the system".

It looks like being compatible for the command line syntax requires separate 
cases across architectures, especially when spanning both clang and gcc.

___
freebsd-toolchain@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-toolchain
To unsubscribe, send any mail to "freebsd-toolchain-unsubscr...@freebsd.org"


2015-12-26 Thread Warner Losh
Thanks, it sounds like I fixed a bug, but there’s more.

What were the specific ports so I can test them here?

And to be clear, this is a buildworld on the RPi 2 using the cross-built world 
with CPUTYPE=armv7a or some such, right?

Warner

> On Dec 25, 2015, at 9:32 PM, Mark Millard  wrote:
> 
> [I am again breaking off another section of older material.]
> 
> Mixed news I'm afraid.
> 
> The specific couple of ports that I attempted did build, the same ones that 
> originally got the Bus Error in ar using (indirectly) _fseeko and memset that 
> I reported. So I expect that you fixed one error.
> 
> But when I tried to buildworld, clang++ 3.7 processing 
> usr/src/lib/clang/libllvmtablegen/ materials quickly got a Bus Error at 
> nearly the same type of instruction (it has a "!" below that the earlier one 
> did not), but with r4 holding the misaligned address this time:
> 
>> --- _bootstrap-tools-lib/clang/libllvmsupport ---
>> --- APFloat.o ---
>> clang++: error: unable to execute command: Bus error (core dumped)
>> . . .
>> # gdb clang++ usr/src/lib/clang/libllvmtablegen/clang++.core
>> . . .
>> Core was generated by `clang++'.
>> Program terminated with signal 10, Bus error.
>> #0  0x00c3bb9c in 
>> clang::DependentTemplateSpecializationType::DependentTemplateSpecializationType
>>  ()
>> [New Thread 22a18000 (LWP 100128/)]
>> (gdb) x/40i 0x00c3bb60
>> . . .
>> 0xc3bb9c 
>> <_ZN5clang35DependentTemplateSpecializationTypeC2ENS_21ElaboratedTypeKeywordEPNS_19NestedNameSpecifierEPKNS_14IdentifierInfoEjPKNS_16TemplateArgumentENS_8QualTypeE+356>:
>>   vst1.64   {d16-d17}, [r4]!
>> . . .
>> (gdb) info all-registers
>> r0 0xbfbf81a8   -1077968472
>> r1 0x22f07e14   586186260
>> r2 0xc416bc 12850876
>> r3 0x2  2
>> r4 0x22f07dfc   586186236
>> . . .
> 
> 
> Thus it appears that there is more code around that likely generates pointers 
> that are not sufficiently aligned for the code generation in use for what is 
> pointed to.
> 
> At this point I have no clue if the issue is just inside clang itself vs. if 
> it is in something that clang is layered on top of. Nor if there is just one 
> bad thing or many.
> 
> Note: I had not yet tried buildworld/buildkernel for the context of the "-f" 
> option that I was experimenting with earlier. So I do not have a direct 
> compare and contrast at this point.
> 
> 
> 
> Older material:
> 
> On 2015-Dec-25, at 5:21 PM, Mark Millard  wrote:
> 
>> On 2015-Dec-25, at 3:42 PM, Warner Losh  wrote:
>> 
>> 
>>> On Dec 25, 2015, at 3:14 PM, Mark Millard  wrote:
>>> 
>>> [I'm going to break much of the earlier "original material" text to tail of 
>>> the message.]
>>> 
>>>> On 2015-Dec-25, at 11:53 AM, Warner Losh  wrote:
>>>> 
>>>> So what happens if we actually fix the underlying bug?
>>>> 
>>>> I see two ways of doing this. In findfp.c, we allocate an array of FILE *
>>>> today like:
>>>> g = (struct glue *)malloc(sizeof(*g) + ALIGNBYTES + n * sizeof(FILE));
>>>> but that assumes that FILE just has normal pointer alignment requirements.
>>>> However, due to the mbstate having int64_t alignment requirements, this is
>>>> wrong. Maybe we need to do something like
>>>> g = (struct glue *)malloc(sizeof(*g) + max(sizeof(int64_t),ALIGNBYTES)
>>>>  + n * sizeof(FILE));
>>>> which wouldn’t change anything on LP64 systems, but would result in proper
>>>> alignment for ILP32 systems. We’d have to fix the loop that uses ALIGN
>>>> afterwards to use roundup. Instead, we’d need to round up to the nearest
>>>> 8-byte aligned offset (or technically, the max of ALIGNBYTES and 8, but
>>>> that’s always 8 on today’s systems). If we do this, we can make sure that
>>>> each file is 8-byte aligned or better. We may need to round up sizeof(FILE)
>>>> to a multiple of 8 as well. I believe that since it has the 8-byte
>>>> alignment for a member, its size must be a multiple of 8, but I’ve not
>>>> chased that belief to ground. If not, we may need another decorator
>>>> (__aligned(8), I think, spelled with the ugly max expression above). That
>>>> way, the contract we’re making with the compiler will always be true.
>>>> ALIGNBYTES is 4 on Arm anyway, so that bit is clearly wrong.
>>>> 
>>>> This wouldn’t be an ABI change, since you can only get a valid FILE * from
>>>> fopen (and friends), plus stdin, stdout, and stderr. Those addresses aren’t
>>>> hard coded into binaries, so even if we have to tweak the last three and
>>>> deal with some ‘fake’ FILE abuse in libc (which I don’t think suffers from
>>>> this issue, btw, given the alignment requirements that would naturally
>>>> follow from something on the stack), we’d still be ahead. At least for all
>>>> CONFORMING implementations[*]...
>>>> 
>>>> TL;DR: Why not make FILE * always 8-byte aligned? The 

2015-12-26 Thread Ian Lepore
On Fri, 2015-12-25 at 17:21 -0800, Mark Millard wrote:
> In my view "-mno-unaligned-access" is an even bigger hammer than I
> used. I find no clang statement about what its ABI consequences would
> be, unlike for what I did: What mix of more padding for alignment vs.
> more but smaller accesses? But as I remember I've seen "-mno
> -unaligned-access" in use in ports and the like so its consequences
> may be familiar material for some folks.
> 
> Absent any questions about ABI consequences "-mno-unaligned-access"
> does well mark the expected SCTLR bit[1] status, far better than what
> I did. Again: I was covering my ignorance while making any
> significant investigation/debugging as unlikely as I could.

After reading the docs more carefully, I think -mno-unaligned-access
isn't a bigger hammer, it's just a different tool that addresses a
different problem than the one you ran into, and it's one we need.  In
particular, it prevents alignment-required accesses to potentially
unaligned fields in a struct marked as 'packed', which is something we
rely on (it's why we mark some structs as packed).

-- Ian



2015-12-25 Thread Mark Millard
On 2015-Dec-24, at 10:39 PM, Mark Millard  wrote:

> [I do not know if this partial crash analysis related to on-arm 
> clang-associated activity is good enough and appropriate to submit or not.]
> 
> The /usr/local/arm-gnueabi-freebsd/bin/ar on the rpi2b involved below came 
> from pkg install activity instead of port building. Used as-is.
> 
> When I just tried my first from-rpi2b builds (ports for a rpi2b), 
> /usr/local/arm-gnueabi-freebsd/bin/ar crashed. I believe that the following 
> suggests an alignment error for the type of instructions that memset for 128 
> bytes was translated to (sizeof(mbstate_t)) in the code used by 
> /usr/local/arm-gnueabi-freebsd/bin/ar. (But I do not know how to check SCTLR 
> bit[1] to be directly sure that alignment was being enforced.)
> 
> The crash was a Bus error in /usr/local/arm-gnueabi-freebsd/bin/ar :
> 
>> libtool: link: /usr/local/arm-gnueabi-freebsd/bin/ar cru .libs/libgnuintl.a  
>> bindtextdom.o dcgettext.o dgettext.o gettext.o finddomain.o hash-string.o 
>> loadmsgcat.o localealias.o textdomain.o l10nflist.o explodename.o 
>> dcigettext.o dcngettext.o dngettext.o ngettext.o pluralx.o plural-exp.o 
>> localcharset.o threadlib.o lock.o relocatable.o langprefs.o localename.o 
>> log.o printf.o setlocale.o version.o xsize.o osdep.o intl-compat.o
>> Bus error (core dumped)
>> *** [libgnuintl.la] Error code 138
> 
> It failed in _fseeko doing a memset that turned into uses of "vst1.64 
> {d16-d17}, [r0]" instructions, for an address in register r0 that ended in 
> 0xa4, so was not aligned to 8 byte boundaries. From what I read such "VSTn 
> (multiple n-element structures)" that have .64 require 8 byte alignment. The 
> evidence of the code and register value follow.
> 
>> # gdb /usr/local/arm-gnueabi-freebsd/bin/ar 
>> /usr/obj/portswork/usr/ports/devel/gettext-tools/work/gettext-0.19.6/gettext-tools/intl/ar.core
>> . . .
>> #0  0x2033adcc in _fseeko (fp=0x20651dcc, offset=, 
>> whence=, ltest=) at 
>> /usr/src/lib/libc/stdio/fseek.c:299
>> 299  memset(&fp->_mbstate, 0, sizeof(mbstate_t));
>> . . .
>> (gdb) x/24i 0x2033adb0
>> 0x2033adb0 <_fseeko+836>:   vmov.i32 q8, #0  ; 0x00000000
>> 0x2033adb4 <_fseeko+840>:   movw    r1, #65503  ; 0xffdf
>> 0x2033adb8 <_fseeko+844>:   stm     r4, {r0, r7}
>> 0x2033adbc <_fseeko+848>:   ldrh    r0, [r4, #12]
>> 0x2033adc0 <_fseeko+852>:   and     r0, r0, r1
>> 0x2033adc4 <_fseeko+856>:   strh    r0, [r4, #12]
>> 0x2033adc8 <_fseeko+860>:   add     r0, r4, #216    ; 0xd8
>> 0x2033adcc <_fseeko+864>:   vst1.64 {d16-d17}, [r0]
>> 0x2033add0 <_fseeko+868>:   add     r0, r4, #200    ; 0xc8
>> 0x2033add4 <_fseeko+872>:   vst1.64 {d16-d17}, [r0]
>> 0x2033add8 <_fseeko+876>:   add     r0, r4, #184    ; 0xb8
>> 0x2033addc <_fseeko+880>:   vst1.64 {d16-d17}, [r0]
>> 0x2033ade0 <_fseeko+884>:   add     r0, r4, #168    ; 0xa8
>> 0x2033ade4 <_fseeko+888>:   vst1.64 {d16-d17}, [r0]
>> 0x2033ade8 <_fseeko+892>:   add     r0, r4, #152    ; 0x98
>> 0x2033adec <_fseeko+896>:   vst1.64 {d16-d17}, [r0]
>> 0x2033adf0 <_fseeko+900>:   add     r0, r4, #136    ; 0x88
>> 0x2033adf4 <_fseeko+904>:   vst1.64 {d16-d17}, [r0]
>> 0x2033adf8 <_fseeko+908>:   add     r0, r4, #120    ; 0x78
>> 0x2033adfc <_fseeko+912>:   vst1.64 {d16-d17}, [r0]
>> 0x2033ae00 <_fseeko+916>:   add     r0, r4, #104    ; 0x68
>> 0x2033ae04 <_fseeko+920>:   vst1.64 {d16-d17}, [r0]
>> 0x2033ae08 <_fseeko+924>:   b       0x2033b070 <_fseeko+1540>
>> 0x2033ae0c <_fseeko+928>:   cmp     r5, #0  ; 0x0
>> (gdb) info all-registers
>> r0   0x20651ea4   543497892
>> r1   0xffdf       65503
>> r2   0x0          0
>> r3   0x0          0
>> r4   0x20651dcc   543497676
>> r5   0x0          0
>> r6   0x0          0
>> r7   0x0          0
>> r8   0x20359df4   540384756
>> r9   0x0          0
>> r10  0x0          0
>> r11  0xbfbfb948   -1077954232
>> r12  0x2037b208   540520968
>> sp   0xbfbfb898   -1077954408
>> lr   0x2035a004   540385284
>> pc   0x2033adcc   540257740
>> f0   0 (raw 0x)
>> f1   0 (raw 0x)
>> f2   0 (raw 0x)
>> f3   0 (raw 0x)
>> f4   0 (raw 0x)
>> f5   0 (raw 0x)
>> f6   0 (raw 0x)
>> f7   0 (raw 0x)
>> fps  0x0          0
>> cpsr 0x60000010   1610612752
> 
> The syntax in use for vst1.64 instructions does not explicitly have the 
> alignment notation. Presuming that the decoding is correct then from what I 
> read the following applies:
> 
>> Home > NEON and VFP Programming > NEON load and store element and structure 
>> instructions > Alignment restrictions in load and store, 

2015-12-25 Thread Mark Millard
[Good News Summary: Rebuilding buildworld/buildkernel for rpi2 11.0-CURRENT 
292413 from amd64 based on adding -fmax-type-align=4 has so far removed the 
crashes during the toolchain activity: no more misaligned accesses in libc's 
_fseeko or elsewhere.]

On 2015-Dec-25, at 12:31 AM, Mark Millard  wrote:

> On 2015-Dec-24, at 10:39 PM, Mark Millard  wrote:
> 
>> [I do not know if this partial crash analysis related to on-arm 
>> clang-associated activity is good enough and appropriate to submit or not.]
>> 
>> The /usr/local/arm-gnueabi-freebsd/bin/ar on the rpi2b involved below came 
>> from pkg install activity instead of port building. Used as-is.
>> 
>> When I just tried my first from-rpi2b builds (ports for a rpi2b), 
>> /usr/local/arm-gnueabi-freebsd/bin/ar crashed. I believe that the following 
>> suggests an alignment error for the type of instructions that memset for 128 
>> bytes was translated to (sizeof(mbstate_t)) in the code used by 
>> /usr/local/arm-gnueabi-freebsd/bin/ar. (But I do not know how to check SCTLR 
>> bit[1] to be directly sure that alignment was being enforced.)
>> 
>> The crash was a Bus error in /usr/local/arm-gnueabi-freebsd/bin/ar :
>> 
>>> libtool: link: /usr/local/arm-gnueabi-freebsd/bin/ar cru .libs/libgnuintl.a 
>>>  bindtextdom.o dcgettext.o dgettext.o gettext.o finddomain.o hash-string.o 
>>> loadmsgcat.o localealias.o textdomain.o l10nflist.o explodename.o 
>>> dcigettext.o dcngettext.o dngettext.o ngettext.o pluralx.o plural-exp.o 
>>> localcharset.o threadlib.o lock.o relocatable.o langprefs.o localename.o 
>>> log.o printf.o setlocale.o version.o xsize.o osdep.o intl-compat.o
>>> Bus error (core dumped)
>>> *** [libgnuintl.la] Error code 138
>> 
>> It failed in _fseeko doing a memset that turned into uses of "vst1.64
>> {d16-d17}, [r0]" instructions, for an address in register r0 that ended in 
>> 0xa4, so was not aligned to 8 byte boundaries. From what I read such "VSTn 
>> (multiple n-element structures)" that have .64 require 8 byte alignment. The 
>> evidence of the code and register value follow.
>> 
>>> # gdb /usr/local/arm-gnueabi-freebsd/bin/ar 
>>> /usr/obj/portswork/usr/ports/devel/gettext-tools/work/gettext-0.19.6/gettext-tools/intl/ar.core
>>> . . .
>>> #0  0x2033adcc in _fseeko (fp=0x20651dcc, offset=, 
>>> whence=, ltest=) at 
>>> /usr/src/lib/libc/stdio/fseek.c:299
>>> 299 memset(&fp->_mbstate, 0, sizeof(mbstate_t));
>>> . . .
>>> (gdb) x/24i 0x2033adb0
>>> 0x2033adb0 <_fseeko+836>:   vmov.i32 q8, #0  ; 0x00000000
>>> 0x2033adb4 <_fseeko+840>:   movw    r1, #65503  ; 0xffdf
>>> 0x2033adb8 <_fseeko+844>:   stm     r4, {r0, r7}
>>> 0x2033adbc <_fseeko+848>:   ldrh    r0, [r4, #12]
>>> 0x2033adc0 <_fseeko+852>:   and     r0, r0, r1
>>> 0x2033adc4 <_fseeko+856>:   strh    r0, [r4, #12]
>>> 0x2033adc8 <_fseeko+860>:   add     r0, r4, #216    ; 0xd8
>>> 0x2033adcc <_fseeko+864>:   vst1.64 {d16-d17}, [r0]
>>> 0x2033add0 <_fseeko+868>:   add     r0, r4, #200    ; 0xc8
>>> 0x2033add4 <_fseeko+872>:   vst1.64 {d16-d17}, [r0]
>>> 0x2033add8 <_fseeko+876>:   add     r0, r4, #184    ; 0xb8
>>> 0x2033addc <_fseeko+880>:   vst1.64 {d16-d17}, [r0]
>>> 0x2033ade0 <_fseeko+884>:   add     r0, r4, #168    ; 0xa8
>>> 0x2033ade4 <_fseeko+888>:   vst1.64 {d16-d17}, [r0]
>>> 0x2033ade8 <_fseeko+892>:   add     r0, r4, #152    ; 0x98
>>> 0x2033adec <_fseeko+896>:   vst1.64 {d16-d17}, [r0]
>>> 0x2033adf0 <_fseeko+900>:   add     r0, r4, #136    ; 0x88
>>> 0x2033adf4 <_fseeko+904>:   vst1.64 {d16-d17}, [r0]
>>> 0x2033adf8 <_fseeko+908>:   add     r0, r4, #120    ; 0x78
>>> 0x2033adfc <_fseeko+912>:   vst1.64 {d16-d17}, [r0]
>>> 0x2033ae00 <_fseeko+916>:   add     r0, r4, #104    ; 0x68
>>> 0x2033ae04 <_fseeko+920>:   vst1.64 {d16-d17}, [r0]
>>> 0x2033ae08 <_fseeko+924>:   b       0x2033b070 <_fseeko+1540>
>>> 0x2033ae0c <_fseeko+928>:   cmp     r5, #0  ; 0x0
>>> (gdb) info all-registers
>>> r0   0x20651ea4   543497892
>>> r1   0xffdf       65503
>>> r2   0x0          0
>>> r3   0x0          0
>>> r4   0x20651dcc   543497676
>>> r5   0x0          0
>>> r6   0x0          0
>>> r7   0x0          0
>>> r8   0x20359df4   540384756
>>> r9   0x0          0
>>> r10  0x0          0
>>> r11  0xbfbfb948   -1077954232
>>> r12  0x2037b208   540520968
>>> sp   0xbfbfb898   -1077954408
>>> lr   0x2035a004   540385284
>>> pc   0x2033adcc   540257740
>>> f0   0 (raw 0x)
>>> f1   0 (raw 0x)
>>> f2   0 (raw 0x)
>>> f3   0 (raw 0x)
>>> f4   0 (raw 0x)
>>> f5   0 (raw 0x)
>>> f6   0 (raw 0x)
>>> f7   0 (raw 0x)
>>> fps  0x0          0

Re: 11.0-CURRENT (r292413) on a rpi2b: arm-gnueabi-freebsd/bin/ar, _fseeko, and memset vs memory alignment (SCTRL bit[1]=1?): Explains the Bus error?

2015-12-25 Thread Mark Millard
[Good News Summary: Rebuilding buildworld/buildkernel for rpi2 11.0-CURRENT 
292413 from amd64 based on adding -fmax-type-align=4 has so far removed the 
crashes during the toolchain activity: no more misaligned accesses in libc's 
_fseeko or elsewhere.]

On 2015-Dec-25, at 12:31 AM, Mark Millard  wrote:

> On 2015-Dec-24, at 10:39 PM, Mark Millard  wrote:
> 
>> [I do not know if this partial crash analysis related to on-arm 
>> clang-associated activity is good enough and appropriate to submit or not.]
>> 
>> The /usr/local/arm-gnueabi-freebsd/bin/ar on the rpi2b involved below came 
>> from pkg install activity instead of port building. Used as-is.
>> 
>> When I just tried my first from-rpi2b builds (ports for a rpi2b), 
>> /usr/local/arm-gnueabi-freebsd/bin/ar crashed. I believe that the following 
>> suggests an alignment error for the type of instructions that memset for 128 
>> bytes was translated to (sizeof(mbstate_t)) in the code used by 
>> /usr/local/arm-gnueabi-freebsd/bin/ar. (But I do not know how to check SCTLR 
>> bit[1] to be directly sure that alignment was being enforced.)
>> 
>> The crash was a Bus error in /usr/local/arm-gnueabi-freebsd/bin/ar :
>> 
>>> libtool: link: /usr/local/arm-gnueabi-freebsd/bin/ar cru .libs/libgnuintl.a 
>>>  bindtextdom.o dcgettext.o dgettext.o gettext.o finddomain.o hash-string.o 
>>> loadmsgcat.o localealias.o textdomain.o l10nflist.o explodename.o 
>>> dcigettext.o dcngettext.o dngettext.o ngettext.o pluralx.o plural-exp.o 
>>> localcharset.o threadlib.o lock.o relocatable.o langprefs.o localename.o 
>>> log.o printf.o setlocale.o version.o xsize.o osdep.o intl-compat.o
>>> Bus error (core dumped)
>>> *** [libgnuintl.la] Error code 138
>> 
>> It failed in _fseeko doing a memset that turned into uses of "vst1.64
>> {d16-d17}, [r0]" instructions, for an address in register r0 that ended in 
>> 0xa4, so was not aligned to 8 byte boundaries. From what I read such "VSTn 
>> (multiple n-element structures)" that have .64 require 8 byte alignment. The 
>> evidence of the code and register value follow.
>> 
>>> # gdb /usr/local/arm-gnueabi-freebsd/bin/ar 
>>> /usr/obj/portswork/usr/ports/devel/gettext-tools/work/gettext-0.19.6/gettext-tools/intl/ar.core
>>> . . .
>>> #0  0x2033adcc in _fseeko (fp=0x20651dcc, offset=, 
>>> whence=, ltest=) at 
>>> /usr/src/lib/libc/stdio/fseek.c:299
>>> 299 memset(>_mbstate, 0, sizeof(mbstate_t));
>>> . . .
>>> (gdb) x/24i 0x2033adb0
>>> 0x2033adb0 <_fseeko+836>:   vmov.i32q8, #0  ; 0x
>>> 0x2033adb4 <_fseeko+840>:   movwr1, #65503  ; 0xffdf
>>> 0x2033adb8 <_fseeko+844>:   stm r4, {r0, r7}
>>> 0x2033adbc <_fseeko+848>:   ldrhr0, [r4, #12]
>>> 0x2033adc0 <_fseeko+852>:   and r0, r0, r1
>>> 0x2033adc4 <_fseeko+856>:   strhr0, [r4, #12]
>>> 0x2033adc8 <_fseeko+860>:   add r0, r4, #216; 0xd8
>>> 0x2033adcc <_fseeko+864>:   vst1.64 {d16-d17}, [r0]
>>> 0x2033add0 <_fseeko+868>:   add r0, r4, #200; 0xc8
>>> 0x2033add4 <_fseeko+872>:   vst1.64 {d16-d17}, [r0]
>>> 0x2033add8 <_fseeko+876>:   add r0, r4, #184; 0xb8
>>> 0x2033addc <_fseeko+880>:   vst1.64 {d16-d17}, [r0]
>>> 0x2033ade0 <_fseeko+884>:   add r0, r4, #168; 0xa8
>>> 0x2033ade4 <_fseeko+888>:   vst1.64 {d16-d17}, [r0]
>>> 0x2033ade8 <_fseeko+892>:   add r0, r4, #152; 0x98
>>> 0x2033adec <_fseeko+896>:   vst1.64 {d16-d17}, [r0]
>>> 0x2033adf0 <_fseeko+900>:   add r0, r4, #136; 0x88
>>> 0x2033adf4 <_fseeko+904>:   vst1.64 {d16-d17}, [r0]
>>> 0x2033adf8 <_fseeko+908>:   add r0, r4, #120; 0x78
>>> 0x2033adfc <_fseeko+912>:   vst1.64 {d16-d17}, [r0]
>>> 0x2033ae00 <_fseeko+916>:   add r0, r4, #104; 0x68
>>> 0x2033ae04 <_fseeko+920>:   vst1.64 {d16-d17}, [r0]
>>> 0x2033ae08 <_fseeko+924>:   b   0x2033b070 <_fseeko+1540>
>>> 0x2033ae0c <_fseeko+928>:   cmp r5, #0  ; 0x0
>>> (gdb) info all-registers
>>> r0 0x20651ea4   543497892
>>> r1 0xffdf   65503
>>> r2 0x0  0
>>> r3 0x0  0
>>> r4 0x20651dcc   543497676
>>> r5 0x0  0
>>> r6 0x0  0
>>> r7 0x0  0
>>> r8 0x20359df4   540384756
>>> r9 0x0  0
>>> r10 0x0  0
>>> r11 0xbfbfb948   -1077954232
>>> r12 0x2037b208   540520968
>>> sp 0xbfbfb898   -1077954408
>>> lr 0x2035a004   540385284
>>> pc 0x2033adcc   540257740
>>> f0 0(raw 0x)
>>> f1 0(raw 0x)
>>> f2 0(raw 0x)
>>> f3 0(raw 0x)
>>> f4 0(raw 0x)
>>> f5 0(raw 0x)
>>> f6 0(raw 0x)
>>> f7 0(raw 0x)
>>> fps 0x0  0

Re: 11.0-CURRENT (r292413) on a rpi2b: arm-gnueabi-freebsd/bin/ar, _fseeko, and memset vs memory alignment (SCTRL bit[1]=1?): Explains the Bus error?

2015-12-25 Thread Warner Losh
So what happens if we actually fix the underlying bug?

I see two ways of doing this. In findfp.c, we allocate an array of FILE today
like:
g = (struct glue *)malloc(sizeof(*g) + ALIGNBYTES + n * sizeof(FILE));
but that assumes that FILE just has normal pointer alignment requirements.
However, due to the mbstate having int64_t alignment requirements, this is
wrong. Maybe we need to do something like
g = (struct glue *)malloc(sizeof(*g) + max(sizeof(int64_t),ALIGNBYTES)
    + n * sizeof(FILE));
which wouldn’t change anything on LP64 systems, but would result in proper
alignment for ILP32 systems. We’d have to fix the loop that uses ALIGN
afterwards to use roundup instead: we’d need to round up to the nearest 8-byte
aligned offset (or technically, the max of ALIGNBYTES and 8, but that’s always
8 on today’s systems). If we do this, we can make sure that each file is 8-byte
aligned or better. We may need to round up sizeof(FILE) to a multiple of 8 as
well. I believe that since it has the 8-byte alignment for a member, its size
must be a multiple of 8, but I’ve not chased that belief to ground. If not, we
may need another decorator (__aligned(8), I think, spelled with the ugly max
expression above). That way, the contract we’re making with the compiler will
always be true. ALIGNBYTES is 4 on Arm anyway, so that bit is clearly wrong.
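
To make the allocation idea above concrete, here is a minimal sketch of the "header plus padded array" pattern with an 8-byte round-up. It is not the findfp.c code: struct fake_file, struct fake_glue, FILE_ALIGN, round_up() and more_glue() are all hypothetical names invented for illustration, and the 8-byte requirement is an assumption taken from the discussion.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the libc definitions. */
struct fake_file {
	int	flags;
	int64_t	mbstate[16];	/* 128 bytes, like the mbstate_t in the trace */
};

struct fake_glue {
	struct fake_glue *next;
	int		  niobs;
	struct fake_file *iobs;
};

#define	FILE_ALIGN	((uintptr_t)8)	/* assumed requirement for this sketch */

/* Round an address up to the next multiple of a power-of-two alignment. */
static uintptr_t
round_up(uintptr_t addr, uintptr_t align)
{
	return ((addr + align - 1) & ~(align - 1));
}

/* Allocate a header followed by n entries, the first on an 8-byte boundary. */
struct fake_glue *
more_glue(int n)
{
	struct fake_glue *g;

	assert(n > 0);
	/* FILE_ALIGN - 1 bytes of slack cover the worst-case round-up. */
	g = malloc(sizeof(*g) + (FILE_ALIGN - 1) +
	    (size_t)n * sizeof(struct fake_file));
	if (g == NULL)
		return (NULL);
	g->next = NULL;
	g->niobs = n;
	g->iobs = (struct fake_file *)round_up((uintptr_t)(g + 1), FILE_ALIGN);
	return (g);
}
```

On ABIs where int64_t has 8-byte alignment, sizeof(struct fake_file) is a multiple of 8, so once the first entry is rounded up every later entry stays aligned too. The caller releases the whole block with free(g).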

This wouldn’t be an ABI change, since you can only get a valid FILE * from
fopen (and friends), plus stdin, stdout, and stderr. Those addresses aren’t
hard coded into binaries, so even if we have to tweak the last three and deal
with some ‘fake’ FILE abuse in libc (which I don’t think suffers from this
issue, btw, given the alignment requirements that would naturally follow from
something on the stack), we’d still be ahead. At least for all CONFORMING
implementations[*]...

TL;DR: Why not make FILE * always 8-byte aligned? The compiler options are a
band-aid.
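
As a rough illustration of the "decorator" route, the sketch below uses the C11 alignas spelling rather than FreeBSD's __aligned() macro (that swap is my assumption, made so the example is portable). The two record types are hypothetical and unrelated to the real FILE layout. It shows that forcing 8-byte alignment on a member also pads the structure size up to a multiple of 8, which is what keeps every element of an array aligned.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record with only 4-byte members: no 8-byte requirement. */
struct plain_rec {
	int32_t	a;
	int32_t	b;
	int32_t	c;
};

/* Same members, but the first one is forced to 8-byte alignment. */
struct aligned_rec {
	alignas(8) int32_t a;
	int32_t	b;
	int32_t	c;
};

/* C requires sizeof to be a multiple of the type's alignment. */
static_assert(alignof(struct aligned_rec) == 8, "alignment request ignored");
static_assert(sizeof(struct aligned_rec) % 8 == 0, "size not padded");

size_t
plain_size(void)
{
	return (sizeof(struct plain_rec));
}

size_t
aligned_size(void)
{
	return (sizeof(struct aligned_rec));
}

size_t
aligned_align(void)
{
	return (alignof(struct aligned_rec));
}
```

With 4-byte int32_t alignment, plain_size() is 12 while aligned_size() is 16: the compiler adds tail padding so that consecutive array elements keep the promised alignment.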

Warner

[*] There’s at least one popular package that has a copy of the FILE structure
in one of its .h files and uses that to do unnatural optimization things, but
even that’s cool, I think, since it never allocates a new one.

> On Dec 25, 2015, at 7:24 AM, Mark Millard  wrote:
> 
> [Good News Summary: Rebuilding buildworld/buildkernel for rpi2 11.0-CURRENT 
> 292413 from amd64 based on adding -fmax-type-align=4 has so far removed the 
> crashes during the toolchain activity: no more misaligned accesses in libc's 
> _fseeko or elsewhere.]
> 
> On 2015-Dec-25, at 12:31 AM, Mark Millard  wrote:
> 
>> On 2015-Dec-24, at 10:39 PM, Mark Millard  wrote:
>> 
>>> [I do not know if this partial crash analysis related to on-arm 
>>> clang-associated activity is good enough and appropriate to submit or not.]
>>> 
>>> The /usr/local/arm-gnueabi-freebsd/bin/ar on the rpi2b involved below came 
>>> from pkg install activity instead of port building. Used as-is.
>>> 
>>> When I just tried my first from-rpi2b builds (ports for a rpi2b), 
>>> /usr/local/arm-gnueabi-freebsd/bin/ar crashed. I believe that the following 
>>> suggests an alignment error for the type of instructions that memset for 
>>> 128 bytes was translated to (sizeof(mbstate_t)) in the code used by 
>>> /usr/local/arm-gnueabi-freebsd/bin/ar. (But I do not know how to check 
>>> SCTLR bit[1] to be directly sure that alignment was being enforced.)
>>> 
>>> The crash was a Bus error in /usr/local/arm-gnueabi-freebsd/bin/ar :
>>> 
>>>> libtool: link: /usr/local/arm-gnueabi-freebsd/bin/ar cru 
>>>> .libs/libgnuintl.a  bindtextdom.o dcgettext.o dgettext.o gettext.o 
>>>> finddomain.o hash-string.o loadmsgcat.o localealias.o textdomain.o 
>>>> l10nflist.o explodename.o dcigettext.o dcngettext.o dngettext.o ngettext.o 
>>>> pluralx.o plural-exp.o localcharset.o threadlib.o lock.o relocatable.o 
>>>> langprefs.o localename.o log.o printf.o setlocale.o version.o xsize.o 
>>>> osdep.o intl-compat.o
>>>> Bus error (core dumped)
>>>> *** [libgnuintl.la] Error code 138
>>> 
>>> It failed in _fseeko doing a memset that turned into uses of "vst1.64   
>>> {d16-d17}, [r0]" instructions, for an address in register r0 that ended in 
>>> 0xa4, so was not aligned to 8 byte boundaries. From what I read such "VSTn 
>>> (multiple n-element structures)" that have .64 require 8 byte alignment. 
>>> The evidence of the code and register value follow.
>>> 
>>>> # gdb /usr/local/arm-gnueabi-freebsd/bin/ar 
>>>> /usr/obj/portswork/usr/ports/devel/gettext-tools/work/gettext-0.19.6/gettext-tools/intl/ar.core
>>>> . . .
>>>> #0  0x2033adcc in _fseeko (fp=0x20651dcc, offset=, 
>>>> whence=, ltest=) at 
>>>> /usr/src/lib/libc/stdio/fseek.c:299
>>>> 299 memset(&fp->_mbstate, 0, sizeof(mbstate_t));
>>>> . . .
>>>> (gdb) x/24i 0x2033adb0
>>>> 0x2033adb0 <_fseeko+836>:  vmov.i32 q8, #0  ; 0x
>>>> 0x2033adb4 <_fseeko+840>:  movw r1, #65503  ; 0xffdf
>>>> 0x2033adb8 <_fseeko+844>:  

Re: 11.0-CURRENT (r292413) on a rpi2b: arm-gnueabi-freebsd/bin/ar, _fseeko, and memset vs memory alignment (SCTRL bit[1]=1?): Explains the Bus error?

2015-12-25 Thread Mark Millard
> On 2015-Dec-25, at 3:42 PM, Warner Losh  wrote:
> 
> 
>> On Dec 25, 2015, at 3:14 PM, Mark Millard  wrote:
>> 
>> [I'm going to break much of the earlier "original material" text to tail of 
>> the message.]
>> 
>>> On 2015-Dec-25, at 11:53 AM, Warner Losh  wrote:
>>> 
>>> So what happens if we actually fix the underlying bug?
>>> 
>>> I see two ways of doing this. In findfp.c, we allocate an array of FILE * 
>>> today like:
>>>  g = (struct glue *)malloc(sizeof(*g) + ALIGNBYTES + n * sizeof(FILE));
>>> but that assumes that FILE just has normal pointer alignment requirements. 
>>> However,
>>> due to the mbstate having int64_t alignment requirements, this is wrong. 
>>> Maybe we
>>> need to do something like
>>> g = (struct glue *)malloc(sizeof(*g) + max(sizeof(int64_t),ALIGNBYTES) 
>>> + n * sizeof(FILE));
>>> which wouldn’t change anything on LP64 systems, but would result in proper 
>>> alignment
>>> for ILP32 systems. We’d have to fix the loop that uses ALIGN afterwards to 
>>> use
>>> roundup instead: we’d need to round up to the nearest 8-byte aligned offset 
>>> (or technically,
>>> the max of ALIGNBYTES and 8, but that’s always 8 on today’s systems). If we 
>>> do this,
>>> we can make sure that each file is 8-byte aligned or better. We may need to 
>>> round up
>>> sizeof(FILE) to a multiple of 8 as well. I believe that since it has the 
>>> 8-byte alignment
>>> for a member, its size must be a multiple of 8, but I’ve not chased that 
>>> belief to ground.
>>> If not, we may need another decorator (__aligned(8), I think, spelled with 
>>> the ugly
>>> max expression above). That way, the contract we’re making with the 
>>> compiler will
>>> always be true. ALIGNBYTES is 4 on Arm anyway, so that bit is clearly 
>>> wrong.
>>> 
>>> This wouldn’t be an ABI change, since you can only get a valid FILE * from 
>>> fopen (and
>>> friends), plus stdin, stdout, and stderr. Those addresses aren’t hard coded 
>>> into binaries,
>>> so even if we have to tweak the last three and deal with some ‘fake’ FILE 
>>> abuse in libc
>>> (which I don’t think suffers from this issue, btw, given the alignment 
>>> requirements that would
>>> naturally follow from something on the stack), we’d still be ahead. At 
>>> least for all CONFORMING
>>> implementations[*]...
>>> 
>>> TL;DR: Why not make FILE * always 8-byte aligned? The compiler options are 
>>> a band-aid.
>>> 
>>> Warner
>>> 
>>> [*] There’s at least one popular package that has a copy of the FILE 
>>> structure in one of its
>>> .h files and uses that to do unnatural optimization things, but even that’s 
>>> cool, I think,
>>> since it never allocates a new one.
>>> 
>> 
>> The ARM documentation mentions cases of 16 byte alignment requirements. I've 
>> no clue if the clang code generation ever creates such code. There might be 
>> wider requirements possible in arm code as well. (I'm not an arm expert.) As 
>> an example of an implication: "The malloc() function returns a pointer to a 
>> block of at least size bytes suitably aligned for any use." In other words: 
>> aligned to some figure that is a multiple of *every* alignment requirement 
>> that the code generator can produce, possibly being the least common 
>> multiple.
>> 
>> "-fmax-type-align=. . ." is a means of controlling/limiting the range of 
>> potential alignments to no more than a fixed, predefined value. Above that 
>> and the code generation has to work in small size accesses and 
>> build-up/split-up bigger values. Using "-fmax-type-align=. . ." allows 
>> defining a figure as part of an ABI that is then not subject to code 
>> generator updates that could increase the maximum alignment figure and break 
>> things: It turns off such new capabilities. Other options need not work that 
>> way to preserve the ABI.
> 
> That’s true, as far as it goes… But I’m not sure it goes far enough. The 
> premise here is that the problem is wide-spread, when in fact I think it is 
> quite narrow.
> 
>> But in the most fundamental terms process wise as far as I can tell. . .
>> 
>> While the FILE case that occurred is a specific example, every 
>> memory-allocation-like operation is at a potential issue for all such 
>> "allocated" objects where the related code generation requires alignment to 
>> avoid Bus Error (given the SCTLR bit[1] in use).
> 
> The problem isn’t general. The problem isn’t malloc. Malloc will generally 
> return the right thing on arm (and if it doesn’t,
> then we need to make sure it does).
> 
> The problem is we get a boatload of FILEs from the system all at once, and 
> those are misaligned because of a bug in the code. One that’s fixed, I 
> believe, in https://reviews.freebsd.org/D4708.
> 
> 
>> How many other places in FreeBSD might sometimes return mis-aligned pointers 
>> for the existing code generation and ABI combination?
> 
> It isn’t an ABI thing, just a code bug thing. The only reason it was an issue 
> 

Re: 11.0-CURRENT (r292413) on a rpi2b: arm-gnueabi-freebsd/bin/ar, _fseeko, and memset vs memory alignment (SCTRL bit[1]=1?): Explains the Bus error?

2015-12-25 Thread Mark Millard
[I am again breaking off another section of older material.]

Mixed news I'm afraid.

The specific couple of ports that I attempted did build, the same ones that 
originally got the Bus Error in ar using (indirectly) _fseeko and memset that I 
reported. So I expect that you fixed one error.

But when I tried to buildworld, clang++ 3.7 processing 
usr/src/lib/clang/libllvmtablegen/ materials quickly got a Bus Error at nearly 
the same type of instruction (it has a "!" below that the earlier one did not), 
but with r4 holding the misaligned address this time:

> --- _bootstrap-tools-lib/clang/libllvmsupport ---
> --- APFloat.o ---
> clang++: error: unable to execute command: Bus error (core dumped)
> . . .
> # gdb clang++ usr/src/lib/clang/libllvmtablegen/clang++.core
> . . .
> Core was generated by `clang++'.
> Program terminated with signal 10, Bus error.
> #0  0x00c3bb9c in 
> clang::DependentTemplateSpecializationType::DependentTemplateSpecializationType
>  ()
> [New Thread 22a18000 (LWP 100128/)]
> (gdb) x/40i 0x00c3bb60
> . . .
> 0xc3bb9c 
> <_ZN5clang35DependentTemplateSpecializationTypeC2ENS_21ElaboratedTypeKeywordEPNS_19NestedNameSpecifierEPKNS_14IdentifierInfoEjPKNS_16TemplateArgumentENS_8QualTypeE+356>:
> 
> vst1.64   {d16-d17}, [r4]!
> . . .
> (gdb) info all-registers
> r0 0xbfbf81a8 -1077968472
> r1 0x22f07e14 586186260
> r2 0xc416bc   12850876
> r3 0x22
> r4 0x22f07dfc 586186236
> . . .


Thus it appears that there is more code around that likely generates pointers 
that are not aligned strictly enough for the code generation in use for what 
they point to.
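
For chasing down where such pointers come from, a debug-time check along the following lines might help. This is only a sketch under my own assumptions: struct wide is a hypothetical 8-byte-aligned type (not anything from clang), and the helper names are invented. It says nothing about which allocator is actually at fault.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type that the compiler may store with 8-byte-aligned accesses. */
struct wide {
	alignas(8) uint64_t words[2];
};

/* True when addr is a multiple of align; align must be a power of two. */
bool
is_aligned_addr(uintptr_t addr, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((addr & (align - 1)) == 0);
}

/* True when p may safely be used as a struct wide pointer. */
bool
wide_ptr_ok(const void *p)
{
	return (is_aligned_addr((uintptr_t)p, alignof(struct wide)));
}
```

Applied to the r4 value from the core file, is_aligned_addr(0x22f07dfc, 8) is false, since 0xfc leaves a remainder of 4 when divided by 8. An assert(wide_ptr_ok(p)) at the point where storage is handed out would catch the bad pointer earlier than the eventual Bus error.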

At this point I have no clue if the issue is just inside clang itself vs. if it 
is in something that clang is layered on top of. Nor if there is just one bad 
thing or many.

Note: I had not yet tried buildworld/buildkernel for the context of the "-f" 
option that I was experimenting with earlier. So I do not have a direct compare 
and contrast at this point.



Older material:

On 2015-Dec-25, at 5:21 PM, Mark Millard  wrote:

> On 2015-Dec-25, at 3:42 PM, Warner Losh  wrote:
> 
> 
>> On Dec 25, 2015, at 3:14 PM, Mark Millard  wrote:
>> 
>> [I'm going to break much of the earlier "original material" text to tail of 
>> the message.]
>> 
>>> On 2015-Dec-25, at 11:53 AM, Warner Losh  wrote:
>>> 
>>> So what happens if we actually fix the underlying bug?
>>> 
>>> I see two ways of doing this. In findfp.c, we allocate an array of FILE * 
>>> today like:
>>> g = (struct glue *)malloc(sizeof(*g) + ALIGNBYTES + n * sizeof(FILE));
>>> but that assumes that FILE just has normal pointer alignment requirements. 
>>> However,
>>> due to the mbstate having int64_t alignment requirements, this is wrong. 
>>> Maybe we
>>> need to do something like
>>> g = (struct glue *)malloc(sizeof(*g) + max(sizeof(int64_t),ALIGNBYTES) 
>>> + n * sizeof(FILE));
>>> which wouldn’t change anything on LP64 systems, but would result in proper 
>>> alignment
>>> for ILP32 systems. We’d have to fix the loop that uses ALIGN afterwards to 
>>> use
>>> roundup instead: we’d need to round up to the nearest 8-byte aligned offset 
>>> (or technically,
>>> the max of ALIGNBYTES and 8, but that’s always 8 on today’s systems). If we 
>>> do this,
>>> we can make sure that each file is 8-byte aligned or better. We may need to 
>>> round up
>>> sizeof(FILE) to a multiple of 8 as well. I believe that since it has the 
>>> 8-byte alignment
>>> for a member, its size must be a multiple of 8, but I’ve not chased that 
>>> belief to ground.
>>> If not, we may need another decorator (__aligned(8), I think, spelled with 
>>> the ugly
>>> max expression above). That way, the contract we’re making with the 
>>> compiler will
>>> always be true. ALIGNBYTES is 4 on Arm anyway, so that bit is clearly 
>>> wrong.
>>> 
>>> This wouldn’t be an ABI change, since you can only get a valid FILE * from 
>>> fopen (and
>>> friends), plus stdin, stdout, and stderr. Those addresses aren’t hard coded 
>>> into binaries,
>>> so even if we have to tweak the last three and deal with some ‘fake’ FILE 
>>> abuse in libc
>>> (which I don’t think suffers from this issue, btw, given the alignment 
>>> requirements that would
>>> naturally follow from something on the stack), we’d still be ahead. At 
>>> least for all CONFORMING
>>> implementations[*]...
>>> 
>>> TL;DR: Why not make FILE * always 8-byte aligned? The compiler options are 
>>> a band-aid.
>>> 
>>> Warner
>>> 
>>> [*] There’s at least one popular package that has a copy of the FILE 
>>> structure in one of its
>>> .h files and uses that to do unnatural optimization things, but even that’s 
>>> cool, I think,
>>> since it never allocates a new one.
>>> 
>> 
>> The ARM documentation mentions cases of 16 byte alignment requirements. I've 
>> no clue if the clang code generation ever creates such 

11.0-CURRENT (r292413) on a rpi2b: arm-gnueabi-freebsd/bin/ar, _fseeko, and memset vs memory alignment (SCTRL bit[1]=1?): Explains the Bus error?

2015-12-24 Thread Mark Millard
[I do not know if this partial crash analysis related to on-arm 
clang-associated activity is good enough and appropriate to submit or not.]

The /usr/local/arm-gnueabi-freebsd/bin/ar on the rpi2b involved below came from 
pkg install activity instead of port building. Used as-is.

When I just tried my first from-rpi2b builds (ports for a rpi2b), 
/usr/local/arm-gnueabi-freebsd/bin/ar crashed. I believe that the following 
suggests an alignment error for the type of instructions that the 128-byte 
memset (sizeof(mbstate_t)) was translated to in the code used by 
/usr/local/arm-gnueabi-freebsd/bin/ar. (But I do not know how to check SCTLR 
bit[1] to be directly sure that alignment was being enforced.)

The crash was a Bus error in /usr/local/arm-gnueabi-freebsd/bin/ar :

> libtool: link: /usr/local/arm-gnueabi-freebsd/bin/ar cru .libs/libgnuintl.a  
> bindtextdom.o dcgettext.o dgettext.o gettext.o finddomain.o hash-string.o 
> loadmsgcat.o localealias.o textdomain.o l10nflist.o explodename.o 
> dcigettext.o dcngettext.o dngettext.o ngettext.o pluralx.o plural-exp.o 
> localcharset.o threadlib.o lock.o relocatable.o langprefs.o localename.o 
> log.o printf.o setlocale.o version.o xsize.o osdep.o intl-compat.o
> Bus error (core dumped)
> *** [libgnuintl.la] Error code 138

It failed in _fseeko doing a memset that turned into uses of "vst1.64   
{d16-d17}, [r0]" instructions, for an address in register r0 that ended in 
0xa4, so was not aligned to 8 byte boundaries. From what I read such "VSTn 
(multiple n-element structures)" that have .64 require 8 byte alignment. The 
evidence of the code and register value follow.

> # gdb /usr/local/arm-gnueabi-freebsd/bin/ar 
> /usr/obj/portswork/usr/ports/devel/gettext-tools/work/gettext-0.19.6/gettext-tools/intl/ar.core
> . . .
> #0  0x2033adcc in _fseeko (fp=0x20651dcc, offset=, 
> whence=, ltest=) at 
> /usr/src/lib/libc/stdio/fseek.c:299
> 299   memset(&fp->_mbstate, 0, sizeof(mbstate_t));
> . . .
> (gdb) x/24i 0x2033adb0
> 0x2033adb0 <_fseeko+836>: vmov.i32 q8, #0  ; 0x
> 0x2033adb4 <_fseeko+840>: movw r1, #65503  ; 0xffdf
> 0x2033adb8 <_fseeko+844>: stm r4, {r0, r7}
> 0x2033adbc <_fseeko+848>: ldrh r0, [r4, #12]
> 0x2033adc0 <_fseeko+852>: and r0, r0, r1
> 0x2033adc4 <_fseeko+856>: strh r0, [r4, #12]
> 0x2033adc8 <_fseeko+860>: add r0, r4, #216; 0xd8
> 0x2033adcc <_fseeko+864>: vst1.64 {d16-d17}, [r0]
> 0x2033add0 <_fseeko+868>: add r0, r4, #200; 0xc8
> 0x2033add4 <_fseeko+872>: vst1.64 {d16-d17}, [r0]
> 0x2033add8 <_fseeko+876>: add r0, r4, #184; 0xb8
> 0x2033addc <_fseeko+880>: vst1.64 {d16-d17}, [r0]
> 0x2033ade0 <_fseeko+884>: add r0, r4, #168; 0xa8
> 0x2033ade4 <_fseeko+888>: vst1.64 {d16-d17}, [r0]
> 0x2033ade8 <_fseeko+892>: add r0, r4, #152; 0x98
> 0x2033adec <_fseeko+896>: vst1.64 {d16-d17}, [r0]
> 0x2033adf0 <_fseeko+900>: add r0, r4, #136; 0x88
> 0x2033adf4 <_fseeko+904>: vst1.64 {d16-d17}, [r0]
> 0x2033adf8 <_fseeko+908>: add r0, r4, #120; 0x78
> 0x2033adfc <_fseeko+912>: vst1.64 {d16-d17}, [r0]
> 0x2033ae00 <_fseeko+916>: add r0, r4, #104; 0x68
> 0x2033ae04 <_fseeko+920>: vst1.64 {d16-d17}, [r0]
> 0x2033ae08 <_fseeko+924>: b   0x2033b070 <_fseeko+1540>
> 0x2033ae0c <_fseeko+928>: cmp r5, #0  ; 0x0
> (gdb) info all-registers
> r0     0x20651ea4   543497892
> r1     0xffdf       65503
> r2     0x0          0
> r3     0x0          0
> r4     0x20651dcc   543497676
> r5     0x0          0
> r6     0x0          0
> r7     0x0          0
> r8     0x20359df4   540384756
> r9     0x0          0
> r10    0x0          0
> r11    0xbfbfb948   -1077954232
> r12    0x2037b208   540520968
> sp     0xbfbfb898   -1077954408
> lr     0x2035a004   540385284
> pc     0x2033adcc   540257740
> f0     0  (raw 0x)
> f1     0  (raw 0x)
> f2     0  (raw 0x)
> f3     0  (raw 0x)
> f4     0  (raw 0x)
> f5     0  (raw 0x)
> f6     0  (raw 0x)
> f7     0  (raw 0x)
> fps    0x0          0
> cpsr   0x60000010   1610612752
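
The alignment arithmetic behind that reading can be checked mechanically. The sketch below only redoes the sums from the gdb output: the r4 value and the eight immediate offsets are copied from the listing above, while the helper names (misalignment, count_misaligned_stores) are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes by which addr overshoots the previous align boundary (power of two). */
uintptr_t
misalignment(uintptr_t addr, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr & (align - 1));
}

/* Values copied from the gdb session: r4 (fp) and the add immediates. */
static const uintptr_t fp_r4 = 0x20651dccu;
static const unsigned store_offsets[] = {
	216, 200, 184, 168, 152, 136, 120, 104
};

/* Count how many of the eight vst1.64 targets are not 8-byte aligned. */
size_t
count_misaligned_stores(void)
{
	size_t i, bad = 0;

	for (i = 0; i < sizeof(store_offsets) / sizeof(store_offsets[0]); i++)
		if (misalignment(fp_r4 + store_offsets[i], 8) != 0)
			bad++;
	return (bad);
}
```

Here misalignment(0x20651ea4, 8) is 4, matching the r0 value at the faulting instruction (r4 + 216). Since fp itself sits 4 bytes past an 8-byte boundary and every offset is a multiple of 8, all eight stores would be off by the same 4 bytes.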

The syntax in use for vst1.64 instructions does not explicitly have the 
alignment notation. Presuming that the decoding is correct then from what I 
read the following applies:

> Home > NEON and VFP Programming > NEON load and store element and structure 
> instructions > Alignment restrictions in load and store, element and 
> structure instructions
> 
> . . . When the alignment is not specified in the instruction, the alignment 
> restriction is controlled by the A