Re: [macppc] clang-10: issue with ppc dag to dag pattern instruction selection

2020-09-06 Thread Todd Mortimer
On Tue, Sep 01, 2020 at 07:37:19PM -0400, George Koehler wrote:
> Moving from bugs to tech.
> 
> cwen reported that base-clang crashed on macppc in graphics/babl and
> emulators/mednafen [1].  I observed that clang crashed on powerpc64 in
> mednafen.  I now propose to backport a commit in llvm 11.x git [2] to
> prevent these crashes.  This change affects other arches.
> 
> [1] https://marc.info/?l=openbsd-bugs=159775378014683=2
> [2] https://github.com/llvm/llvm-project/commit/d4ce862
> 
> FreeBSD on powerpc64 found the problems earlier:
> https://bugs.llvm.org/show_bug.cgi?id=45237 (babl)
> https://bugs.llvm.org/show_bug.cgi?id=45274 (mednafen)
> 
> The problem is with clang -fno-unsafe-math-optimizations, one of the
> new floating-point options in clang-10.  The diff disables the new
> options on all targets except X86 (and SystemZ), so it also affects
> arches other than PowerPC.  The disabled options cause a warning.
> 
> With the diff in clang:
>  - mednafen stops using -fno-unsafe-math-optimizations, because the
>new warning fails the configure test.  I can build and package
>mednafen on macppc (where a single *m68k*.cpp takes over 2 hours),
>but when I try to play Super Mario World (snes), mednafen crashes
>with a SIGSEGV.  The build on powerpc64 stops crashing clang, but
>fails at an altivec error.
>  - babl uses -fno-unsafe-math-optimizations and ignores the new
>warning.  I can build and package babl, and all 27 tests pass on
>both macppc and powerpc64.
>  - I have built src and xenocara on powerpc64.
> 
> The diff is in llvm 11.x.  To backport the diff, I manually applied
> some parts (where the context had changed), and I renamed
> RoundingMode::NearestTiesToEven back to FPR_RoundToNearest.
> 
> Apply the diff in /usr/src/gnu/llvm.  If DiagnosticFrontendKinds.inc
> exists under /usr/obj/gnu/usr.bin/clang/, you must delete the .inc,
> because there is a missing dependency in our Makefiles.
> 
> Is this OK?  Should we do something else?  --George

This is the upstream patch to clang11 almost verbatim, and with it I can
build clang and then use that clang to build a snap, so I think this is
good to go. I certainly do not have alternate suggestions, so
ok mortimer.

Cheers,
Todd

> 
> Index: clang/docs/ClangCommandLineReference.rst
> ===
> RCS file: /cvs/src/gnu/llvm/clang/docs/ClangCommandLineReference.rst,v
> retrieving revision 1.1.1.2
> diff -u -p -r1.1.1.2 ClangCommandLineReference.rst
> --- clang/docs/ClangCommandLineReference.rst  9 Aug 2020 15:51:07 -   
> 1.1.1.2
> +++ clang/docs/ClangCommandLineReference.rst  30 Aug 2020 19:37:51 -
> @@ -818,6 +818,10 @@ Enables the experimental global instruct
>  
>  Enables an experimental new pass manager in LLVM.
>  
> +.. option:: -fexperimental-strict-floating-point
> +
> +Enables the use of non-default rounding modes and non-default exception 
> handling on targets that are not currently ready.
> +
>  .. option:: -ffine-grained-bitfield-accesses, 
> -fno-fine-grained-bitfield-accesses
>  
>  Use separate accesses for consecutive bitfield runs with legal widths and 
> alignments.
> Index: clang/include/clang/Basic/CodeGenOptions.def
> ===
> RCS file: /cvs/src/gnu/llvm/clang/include/clang/Basic/CodeGenOptions.def,v
> retrieving revision 1.1.1.1
> diff -u -p -r1.1.1.1 CodeGenOptions.def
> --- clang/include/clang/Basic/CodeGenOptions.def  3 Aug 2020 14:31:32 
> -   1.1.1.1
> +++ clang/include/clang/Basic/CodeGenOptions.def  30 Aug 2020 19:37:51 
> -
> @@ -58,6 +58,8 @@ CODEGENOPT(DisableLLVMPasses , 1, 0) ///
>   ///< frontend.
>  CODEGENOPT(DisableLifetimeMarkers, 1, 0) ///< Don't emit any lifetime markers
>  CODEGENOPT(DisableO0ImplyOptNone , 1, 0) ///< Don't annonate function with 
> optnone at O0
> +CODEGENOPT(ExperimentalStrictFloatingPoint, 1, 0) ///< Enables the new, 
> experimental
> +  ///< strict floating point.
>  CODEGENOPT(ExperimentalNewPassManager, 1, 0) ///< Enables the new, 
> experimental
>   ///< pass manager.
>  CODEGENOPT(DebugPassManager, 1, 0) ///< Prints debug information for the new
> Index: clang/include/clang/Basic/DiagnosticFrontendKinds.td
> ===
> RCS file: 
> /cvs/src/gnu/llvm/clang/include/clang/Basic/DiagnosticFrontendKinds.td,v
> retrieving revision 1.1.1.1
> diff -u -p -r1.1.1.1 DiagnosticFrontendKinds.td
> --- clang/include/clang/Basic/DiagnosticFrontendKinds.td  3 Aug 2020 
> 14:31:32 -   1.1.1.1
> +++ clang/include/clang/Basic/DiagnosticFrontendKinds.td  30 Aug 2020 
> 19:37:51 -
> @@ -37,6 +37,12 @@ def note_fe_backend_plugin: Note<"%0">, 
>  def warn_fe_override_module : Warning<
>  "overriding the module target 

Re: [macppc 6.7-beta] clang backend error: "adde Constant" issue

2020-04-30 Thread Todd Mortimer
On Thu, Apr 30, 2020 at 12:58:32PM -0400, George Koehler wrote:
> On Wed, 29 Apr 2020 21:08:52 +0200 (CEST)
> Mark Kettenis  wrote:
> 
> > Upstream fixed this issue as well.  Apparently only ADDE can't be
> > legalized (because it is "special") but ADDCARRY can.  Do ypu want to
> > adjust your diff based on that information?
> > 
> > Either way, ok kettenis@
> 
> This adjusted diff changes only ADDE and not ADDCARRY.  I expect it to
> work as well as my previous diff (Tue 28 Apr) on PowerPC, because
> PowerPC doesn't use ADDCARRY.
> 
> After I built macppc clang with the previous diff, I did a successful
> "make build" in /usr/src.  I'm not doing another "make build" with
> this adjusted diff, but I will check that some other stuff builds.

Looks good to me. ok mortimer@

> 
> Index: lib/CodeGen/SelectionDAG/DAGCombiner.cpp
> ===
> RCS file: /cvs/src/gnu/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp,v
> retrieving revision 1.1.1.8
> diff -u -p -r1.1.1.8 DAGCombiner.cpp
> --- lib/CodeGen/SelectionDAG/DAGCombiner.cpp  23 Jun 2019 21:36:48 -  
> 1.1.1.8
> +++ lib/CodeGen/SelectionDAG/DAGCombiner.cpp  30 Apr 2020 16:18:31 -
> @@ -9904,9 +9904,11 @@ SDValue DAGCombiner::visitTRUNCATE(SDNod
>// (trunc adde(X, Y, Carry)) -> (adde trunc(X), trunc(Y), Carry)
>// (trunc addcarry(X, Y, Carry)) -> (addcarry trunc(X), trunc(Y), Carry)
>// When the adde's carry is not used.
> +  // Don't make an illegal adde: LegalizeDAG can't expand nor promote it.
>if ((N0.getOpcode() == ISD::ADDE || N0.getOpcode() == ISD::ADDCARRY) &&
>N0.hasOneUse() && !N0.getNode()->hasAnyUseOfValue(1) &&
> -  (!LegalOperations || TLI.isOperationLegal(N0.getOpcode(), VT))) {
> +  ((!LegalOperations && N0.getOpcode() == ISD::ADDCARRY) ||
> +   TLI.isOperationLegal(N0.getOpcode(), VT))) {
>  SDLoc SL(N);
>  auto X = DAG.getNode(ISD::TRUNCATE, SL, VT, N0.getOperand(0));
>  auto Y = DAG.getNode(ISD::TRUNCATE, SL, VT, N0.getOperand(1));



Re: macppc base-clang -msvr4-struct-return

2020-02-14 Thread Todd Mortimer
On Wed, Feb 12, 2020 at 04:05:38PM -0500, George Koehler wrote:
> This is the everything diff: it includes the clang diff that I sent to
> tech@, plus the major version bumps for libLLVM libc++ libc++abi.
> 
>  - distrib/sets/lists: put the new versions in the base sets.
>  - gnu/llvm/tools/clang: -msvr4-struct-return
>  - gnu/usr.bin/clang/libLLVM, lib/libcxx, lib/libcxxabi: new versions

I think this is good to go.

I built through the ABI change with this diff using the directions
George posted earlier:

> To update from source, you must build clang before you build the
> new versions of base libraries.
> First, change major=2 to major=1 in
> /usr/src/gnu/usr.bin/clang/libLLVM/shlib_version.
> 
> Then build clang:
> 
> # cd /usr/src/gnu/usr.bin/clang
> # make obj
> # make
> # make install
> 
> Then change major=1 back to major=2 in
> /usr/src/gnu/usr.bin/clang/libLLVM/shlib_version.
> Then build userland (including clang again).

Then I built a new snapshot and sysupgraded to that snapshot, all
without incident. The snapshot is in my home dir on cvs
(snapshots/macppc) if anyone wants to avoid the build. I checked the
tarballs and the right versions of the bumped libs are all there.

ok mortimer@


> 
> Index: distrib/sets/lists/base/md.amd64
> ===
> RCS file: /cvs/src/distrib/sets/lists/base/md.amd64,v
> retrieving revision 1.875
> diff -u -p -r1.875 md.amd64
> --- distrib/sets/lists/base/md.amd64  30 Dec 2019 02:20:33 -  1.875
> +++ distrib/sets/lists/base/md.amd64  12 Feb 2020 20:52:32 -
> @@ -81,9 +81,9 @@
>  ./usr/lib/gcc-lib/amd64-unknown-openbsd6.6/4.2.1/collect2
>  ./usr/lib/gcc-lib/amd64-unknown-openbsd6.6/4.2.1/libgcc.a
>  ./usr/lib/gcc-lib/amd64-unknown-openbsd6.6/4.2.1/specs
> -./usr/lib/libLLVM.so.1.0
> -./usr/lib/libc++.so.3.0
> -./usr/lib/libc++abi.so.1.0
> +./usr/lib/libLLVM.so.2.0
> +./usr/lib/libc++.so.4.0
> +./usr/lib/libc++abi.so.2.0
>  ./usr/lib/libcompiler_rt.a
>  ./usr/lib/libstdc++.so.57.0
>  ./usr/libdata/perl5/amd64-openbsd
> Index: distrib/sets/lists/base/md.arm64
> ===
> RCS file: /cvs/src/distrib/sets/lists/base/md.arm64,v
> retrieving revision 1.22
> diff -u -p -r1.22 md.arm64
> --- distrib/sets/lists/base/md.arm64  30 Dec 2019 02:20:34 -  1.22
> +++ distrib/sets/lists/base/md.arm64  12 Feb 2020 20:52:32 -
> @@ -37,9 +37,9 @@
>  ./usr/lib/crtbeginS.o
>  ./usr/lib/crtend.o
>  ./usr/lib/crtendS.o
> -./usr/lib/libLLVM.so.1.0
> -./usr/lib/libc++.so.3.0
> -./usr/lib/libc++abi.so.1.0
> +./usr/lib/libLLVM.so.2.0
> +./usr/lib/libc++.so.4.0
> +./usr/lib/libc++abi.so.2.0
>  ./usr/lib/libcompiler_rt.a
>  ./usr/libdata/perl5/aarch64-openbsd
>  ./usr/libdata/perl5/aarch64-openbsd/.packlist
> Index: distrib/sets/lists/base/md.armv7
> ===
> RCS file: /cvs/src/distrib/sets/lists/base/md.armv7,v
> retrieving revision 1.284
> diff -u -p -r1.284 md.armv7
> --- distrib/sets/lists/base/md.armv7  30 Dec 2019 02:20:34 -  1.284
> +++ distrib/sets/lists/base/md.armv7  12 Feb 2020 20:52:32 -
> @@ -37,9 +37,9 @@
>  ./usr/lib/crtbeginS.o
>  ./usr/lib/crtend.o
>  ./usr/lib/crtendS.o
> -./usr/lib/libLLVM.so.1.0
> -./usr/lib/libc++.so.3.0
> -./usr/lib/libc++abi.so.1.0
> +./usr/lib/libLLVM.so.2.0
> +./usr/lib/libc++.so.4.0
> +./usr/lib/libc++abi.so.2.0
>  ./usr/lib/libcompiler_rt.a
>  ./usr/libdata/perl5/arm-openbsd
>  ./usr/libdata/perl5/arm-openbsd/.packlist
> Index: distrib/sets/lists/base/md.i386
> ===
> RCS file: /cvs/src/distrib/sets/lists/base/md.i386,v
> retrieving revision 1.1254
> diff -u -p -r1.1254 md.i386
> --- distrib/sets/lists/base/md.i386   30 Dec 2019 02:20:34 -  1.1254
> +++ distrib/sets/lists/base/md.i386   12 Feb 2020 20:52:32 -
> @@ -78,9 +78,9 @@
>  ./usr/lib/crtbeginS.o
>  ./usr/lib/crtend.o
>  ./usr/lib/crtendS.o
> -./usr/lib/libLLVM.so.1.0
> -./usr/lib/libc++.so.3.0
> -./usr/lib/libc++abi.so.1.0
> +./usr/lib/libLLVM.so.2.0
> +./usr/lib/libc++.so.4.0
> +./usr/lib/libc++abi.so.2.0
>  ./usr/lib/libcompiler_rt.a
>  ./usr/libdata/perl5/i386-openbsd
>  ./usr/libdata/perl5/i386-openbsd/.packlist
> Index: distrib/sets/lists/base/md.loongson
> ===
> RCS file: /cvs/src/distrib/sets/lists/base/md.loongson,v
> retrieving revision 1.456
> diff -u -p -r1.456 md.loongson
> --- distrib/sets/lists/base/md.loongson   30 Dec 2019 02:20:34 -  
> 1.456
> +++ distrib/sets/lists/base/md.loongson   12 Feb 2020 20:52:32 -
> @@ -48,9 +48,9 @@
>  ./usr/lib/gcc-lib/mips64el-unknown-openbsd6.6/4.2.1/collect2
>  ./usr/lib/gcc-lib/mips64el-unknown-openbsd6.6/4.2.1/libgcc.a
>  ./usr/lib/gcc-lib/mips64el-unknown-openbsd6.6/4.2.1/specs
> -./usr/lib/libLLVM.so.1.0
> -./usr/lib/libc++.so.3.0
> -./usr/lib/libc++abi.so.1.0
> 

Re: macppc base-clang -msvr4-struct-return

2020-02-05 Thread Todd Mortimer
Hi George,

On Tue, Feb 04, 2020 at 08:39:12PM -0500, George Koehler wrote:
> Hello tech list,
> 
> This is a diff for base-clang.  It would change the powerpc target to
> return small structs in registers r3 and r4.  This would fix an
> incompatibility with gcc in OpenBSD macppc.  I fear that if I commit
> the diff, I might break snapshot builds.  I am looking for help.

The diff looks sane to me. I don't see anything that would obviously
break snaps, and the option is limited to only the ppc platform.

I am running it through a release build here (amd64) and will know how
that turned out tomorrow after work, so will be able to confirm that you
won't break snaps on that arch. I don't expect there to be any problem
there. I should be able to test it on macppc over the weekend, so if
you're paranoid you can wait to see how that goes.

Cheers,
Todd



> 
> I posted the diff upstream: https://reviews.llvm.org/D73290
> Then backported it to ports-clang and mailed it to ports@:
>   https://marc.info/?l=openbsd-ports=158026852003182=2
> This is the same diff for base-clang.
> 
> Apply diff in /usr/src/gnu/llvm, then build clang like
> 
> # cd /usr/src/gnu/usr.bin/clang
> # make obj
> # make
> # make install
> 
> My macppc machine, an iMac G3 with 512M RAM, is too slow to build llvm
> and clang.  The iMac is now in the 11th day of building ports-clang
> (but I turned it off overnight), and within the last 200 of over 4400
> targets.  After a .cpp file got stuck while using over 800M swap (on
> hard disk), I set vm.swapencrypt.enable=0 and switched to swap on NFS.
> (For contrast, my amd64 desktop built ports-clang in about 2 hours.)
> 
> cwen@ built ports-clang with my diff on a G4 in "29 hours".
> rgc  built base-clang with the diff in this mail.
> Both got good results from my small test programs in the attachment
> (sret-examples.tar.gz) to my ports@ mail.
> 
> The diff affects all platforms (if they build clang), so if the diff
> would break clang, it might break arches other than macppc.  I built
> and installed base-clang with this diff on amd64 (using make -j4 to
> speed up the build).  Then, with the patched clang, I built an amd64
> release(8) of base and xenocara.  I used my new install66.iso
> to install a new amd64 virtual machine, including the patched clang.
> 
> We build most of OpenBSD macppc with base-gcc, but we use base-clang
> for at least clang itself.  I don't know whether the patched
> base-clang can still build base-clang, whether the patch would break
> the base build on macppc.
> 
> The patch adds -maix-struct-return and -msvr4-struct-return:
> 
> $ clang -msvr-struct-return -x c /dev/null
> clang: error: unknown argument '-msvr-struct-return', did you mean
> '-msvr4-struct-return'?
> $ clang -msvr4-struct-return -x c /dev/null
> clang: error: unsupported option '-msvr4-struct-return' for target
> 'amd64-unknown-openbsd6.6'
> 
> On OpenBSD macppc, gcc defaults to -msvr4-struct-return (to return
> small structs in registers r3 and r4), but clang without the patch
> always acts like gcc -maix-struct-return (to return small structs in
> memory through a sret pointer).  The patch adds the options and
> changes the default to -msvr4-struct-return on OpenBSD (and other
> ELF except Linux).  The options work only on 32-bit powerpc.
> 
> When the patch returns a struct in registers, it coerces the struct
> to an integer; this changes the LLVM IR from clang to not use a sret
> pointer.  The option handling for -m{aix,svr4}-struct-return is mostly
> a copy of that for -f{pcc,reg}-struct-return on i386.
> 
> --George
> 
> Index: tools/clang/include/clang/Driver/Options.td
> ===
> RCS file: /cvs/src/gnu/llvm/tools/clang/include/clang/Driver/Options.td,v
> retrieving revision 1.11
> diff -u -p -r1.11 Options.td
> --- tools/clang/include/clang/Driver/Options.td   23 Jun 2019 22:05:15 
> -  1.11
> +++ tools/clang/include/clang/Driver/Options.td   30 Jan 2020 01:13:29 
> -
> @@ -2238,6 +2238,12 @@ def mlongcall: Flag<["-"], "mlongcall">,
>  Group;
>  def mno_longcall : Flag<["-"], "mno-longcall">,
>  Group;
> +def maix_struct_return : Flag<["-"], "maix-struct-return">,
> +  Group, Flags<[CC1Option]>,
> +  HelpText<"Return all structs in memory (PPC32 only)">;
> +def msvr4_struct_return : Flag<["-"], "msvr4-struct-return">,
> +  Group, Flags<[CC1Option]>,
> +  HelpText<"Return small structs in registers (PPC32 only)">;
>  
>  def mvx : Flag<["-"], "mvx">, Group;
>  def mno_vx : Flag<["-"], "mno-vx">, Group;
> Index: tools/clang/lib/CodeGen/TargetInfo.cpp
> ===
> RCS file: /cvs/src/gnu/llvm/tools/clang/lib/CodeGen/TargetInfo.cpp,v
> retrieving revision 1.1.1.7
> diff -u -p -r1.1.1.7 TargetInfo.cpp
> --- tools/clang/lib/CodeGen/TargetInfo.cpp23 Jun 2019 21:37:39 -  
> 1.1.1.7
> +++ tools/clang/lib/CodeGen/TargetInfo.cpp30 Jan 2020 

Re: Add pcidev for Ryzen 3x ccp

2020-01-01 Thread Todd Mortimer
On Thu, Jan 02, 2020 at 09:33:59AM +1000, Jonathan Matthew wrote:
> On Thu, Jan 02, 2020 at 10:20:52AM +1100, Jonathan Gray wrote:
> > On Wed, Jan 01, 2020 at 11:21:45AM -0500, Todd Mortimer wrote:
> > > On Thu, Jan 02, 2020 at 02:41:11AM +1100, Jonathan Gray wrote:
> > > > On Wed, Jan 01, 2020 at 09:53:39AM -0500, Todd Mortimer wrote:
> > > > > My CPU has a CCP that isn't in the known list, so add it and tell ccp
> > > > > about it.
> > > > > 
> > > > > Tested on Ryzen 3900x.
> > > > > 
> > > > > ok? 
> > > > > 
> > > > > Will commit a pcidevs regen immediately after.
> > > > 
> > > > Are your cpu lines in dmesg 17-3* or 17-7*?
> > > > Ryzen 3900x should be model 7X.
> > > 
> > > 17-71-00.
> > > 
> > > cpu0: AMD Ryzen 9 3900X 12-Core Processor, 3792.88 MHz, 17-71-00
> > > 
> > > I assume then change the constant bits to 17_7X and 17h/7xh ?
> > 
> > Yes, unless that device id is also known to appear on earlier
> > models as well, in which case the first appearance is used.
> 
> Also found here:
> 
> cpu3: AMD EPYC 7502P 32-Core Processor, 2495.32 MHz, 17-31-00
> 
> vendor "AMD", unknown product 0x1486 (class crypto subclass miscellaneous, 
> rev 0x00) at pci8 dev 0 function 1 not configured
> 

In that case, the original 17/3x labelling would be correct, which also
agrees with some of the other pcidev ids in the same numeric range.

I have to admit, I would be just as happy to just label everything 17h
and drop the /[123..]x parts, but I don't really mind one way or the
other.

ok for the original diff then?




Re: Add pcidev for Ryzen 3x ccp

2020-01-01 Thread Todd Mortimer
On Thu, Jan 02, 2020 at 02:41:11AM +1100, Jonathan Gray wrote:
> On Wed, Jan 01, 2020 at 09:53:39AM -0500, Todd Mortimer wrote:
> > My CPU has a CCP that isn't in the known list, so add it and tell ccp
> > about it.
> > 
> > Tested on Ryzen 3900x.
> > 
> > ok? 
> > 
> > Will commit a pcidevs regen immediately after.
> 
> Are your cpu lines in dmesg 17-3* or 17-7*?
> Ryzen 3900x should be model 7X.

17-71-00.

cpu0: AMD Ryzen 9 3900X 12-Core Processor, 3792.88 MHz, 17-71-00

I assume then change the constant bits to 17_7X and 17h/7xh ?

> 
> > 
> > 
> > diff --git a/sys/dev/pci/ccp_pci.c b/sys/dev/pci/ccp_pci.c
> > index 2259594644b..c8dcc8750fc 100644
> > --- a/sys/dev/pci/ccp_pci.c
> > +++ b/sys/dev/pci/ccp_pci.c
> > @@ -47,6 +47,7 @@ static const struct pci_matchid ccp_pci_devices[] = {
> > { PCI_VENDOR_AMD,   PCI_PRODUCT_AMD_AMD64_17_CCP_1 },
> > { PCI_VENDOR_AMD,   PCI_PRODUCT_AMD_AMD64_17_CCP_2 },
> > { PCI_VENDOR_AMD,   PCI_PRODUCT_AMD_AMD64_17_1X_CCP },
> > +   { PCI_VENDOR_AMD,   PCI_PRODUCT_AMD_AMD64_17_3X_CCP },
> >  };
> >  
> >  int
> > diff --git a/sys/dev/pci/pcidevs b/sys/dev/pci/pcidevs
> > index d0a021b89bc..fb918e8de5d 100644
> > --- a/sys/dev/pci/pcidevs
> > +++ b/sys/dev/pci/pcidevs
> > @@ -754,6 +754,7 @@ product AMD AMD64_17_CCP_2  0x1468  AMD64 17h Crypto
> >  product AMD AMD64_17_PCIE_40x1470  AMD64 17h PCIE
> >  product AMD AMD64_17_PCIE_50x1471  AMD64 17h PCIE
> >  product AMD AMD64_17_3X_RC 0x1480  AMD64 17h/3xh Root Complex
> > +product AMD AMD64_17_3X_CCP0x1486  AMD64 17h/3xh Crypto
> >  product AMD AMD64_14_HB0x1510  AMD64 14h Host
> >  product AMD AMD64_14_PCIE_10x1512  AMD64 14h PCIE
> >  product AMD AMD64_14_PCIE_20x1513  AMD64 14h PCIE
> > 
> > 



Add pcidev for Ryzen 3x ccp

2020-01-01 Thread Todd Mortimer
My CPU has a CCP that isn't in the known list, so add it and tell ccp
about it.

Tested on Ryzen 3900x.

ok? 

Will commit a pcidevs regen immediately after.


diff --git a/sys/dev/pci/ccp_pci.c b/sys/dev/pci/ccp_pci.c
index 2259594644b..c8dcc8750fc 100644
--- a/sys/dev/pci/ccp_pci.c
+++ b/sys/dev/pci/ccp_pci.c
@@ -47,6 +47,7 @@ static const struct pci_matchid ccp_pci_devices[] = {
{ PCI_VENDOR_AMD,   PCI_PRODUCT_AMD_AMD64_17_CCP_1 },
{ PCI_VENDOR_AMD,   PCI_PRODUCT_AMD_AMD64_17_CCP_2 },
{ PCI_VENDOR_AMD,   PCI_PRODUCT_AMD_AMD64_17_1X_CCP },
+   { PCI_VENDOR_AMD,   PCI_PRODUCT_AMD_AMD64_17_3X_CCP },
 };
 
 int
diff --git a/sys/dev/pci/pcidevs b/sys/dev/pci/pcidevs
index d0a021b89bc..fb918e8de5d 100644
--- a/sys/dev/pci/pcidevs
+++ b/sys/dev/pci/pcidevs
@@ -754,6 +754,7 @@ product AMD AMD64_17_CCP_2  0x1468  AMD64 17h Crypto
 product AMD AMD64_17_PCIE_40x1470  AMD64 17h PCIE
 product AMD AMD64_17_PCIE_50x1471  AMD64 17h PCIE
 product AMD AMD64_17_3X_RC 0x1480  AMD64 17h/3xh Root Complex
+product AMD AMD64_17_3X_CCP0x1486  AMD64 17h/3xh Crypto
 product AMD AMD64_14_HB0x1510  AMD64 14h Host
 product AMD AMD64_14_PCIE_10x1512  AMD64 14h PCIE
 product AMD AMD64_14_PCIE_20x1513  AMD64 14h PCIE



Re: [PATCH] add support for more Nuvoton chips to lm(4)

2019-12-16 Thread Todd Mortimer
Hi Joe,

On Mon, Dec 16, 2019 at 12:04:57PM -0500, Joe Gidi wrote:
> Hi Todd,
> 
> Thanks for taking the time to review and offer improvements. I'm attaching
> a new diff that incorporates your suggestion for simplifying the matching
> and eliminating the unneeded struct and function. It is definitely a
> cleaner, simpler approach. I also corrected the whitespace issue you
> pointed out.
> 
> I made a few tweaks to the voltage calculations. I actually spent quite a
> bit of time going down the rabbit hole here, and while I made some
> improvements, it is definitely not perfect yet. I removed the division by
> 2 from Vcore; I'm pretty confident this is correct, because I'm now seeing
> the voltage I expect, and the datasheet says:
> 
> "The CPUVCORE pin feeds directly into the ADC with no voltage divider
> since the nominal voltage on this pin is only 1.2V."

I think this is right - I read the docs the same way, and the nnumber is
right.

> I actually installed Windows on this machine so I could run HWiNFO64 and
> see how the sensors looked there; I'm including a screenshot for your
> reference. Under Windows, it looks like the sensors are enumerated and
> displayed in the same order, though VBAT and VTT are skipped. Windows
> appears to have VIN2 and VIN3 labeled as +12V and +5V, though the values I
> see in HWiNFO64 are both slightly low and disagree with the values I see
> in the BIOS. With the current version of my patch, VIN3 looks correct and
> matches the BIOS value for +5V but VIN2 is low by a factor of about 3.5. I
> adjusted VIN4, VIN5 and VIN6 by dividing by 2; this brings their readings
> in line with what I see in HWiNFO64.

This is indeed a rabbit hole. :-) My reading of the datasheet leads me
to believe interpreting the VIN values depends on what is hooked up to
them, so this is just trial and error, as you say below.

> From what I've read from the Linux lm_sensors folks, it's a
> trial-and-error process to adjust these values correctly, and sometimes
> the readings are just garbage. As it stands, the patch is definitely
> better than not having sensors, and these values could always be tweaked
> with subsequent patches. If you have any suggestions for improvements
> here, please let me know.

Patch looks good to me. As you say, tweaking the voltage adjustments is
easy if we find we need to later.

> Thanks again for your help; I hope this patch is able to go in.

I committed it as is just now. Thank you!

Cheers,
Todd



Re: [PATCH] add support for more Nuvoton chips to lm(4)

2019-12-15 Thread Todd Mortimer
Hello,

On Fri, Dec 13, 2019 at 06:32:36PM -0500, Joe Gidi wrote:
> Hi all,
> 
> I recently built a new system with an ASRock B450M Pro4 motherboard.

I literally fired up a new system with the exact same board today and
saw the temp sensors didn't appear to be supported, so I'm very happy
that you posted this!

> This board has a Nuvoton NCT6779D chip to monitor temperatures, fans and
> voltages. OpenBSD's lm(4) currently only supports the Nuvoton NCT6776F
> chip, added by Marco Pfatschbacher in 2011:
> 
> https://marc.info/?l=openbsd-cvs=132318770131497=2
> 
> NetBSD's lm(4) gained support for several other Nuvoton chips in this
> commit by SAITOH Masanobu in 2017:
> 
> http://mail-index.netbsd.org/source-changes/2017/07/11/msg086253.html
> 
> I have adapted the NetBSD code to OpenBSD and confirmed that it appears to
> work correctly with my NCT6779D chip. With the attached patch, I get this
> in dmesg:
> 
> wbsio0 at isa0 port 0x2e/2: NCT6779D rev 0x62
> lm1 at wbsio0 port 0x290/8: NCT6779D
> 
> And here's the sensor data:
> 
> $ sysctl hw.sensors.lm1
> hw.sensors.lm1.temp0=29.00 degC (MB Temperature)
> hw.sensors.lm1.temp1=31.00 degC (CPU Temperature)
> hw.sensors.lm1.temp2=78.00 degC (Aux Temp0)
> hw.sensors.lm1.temp3=98.00 degC (Aux Temp1)
> hw.sensors.lm1.temp4=22.50 degC (Aux Temp2)
> hw.sensors.lm1.temp5=-23.00 degC (Aux Temp3)
> hw.sensors.lm1.fan0=1095 RPM (System Fan)
> hw.sensors.lm1.fan1=739 RPM (CPU Fan)
> hw.sensors.lm1.fan2=400 RPM (Aux Fan0)
> hw.sensors.lm1.fan3=0 RPM (Aux Fan1)
> hw.sensors.lm1.fan4=387 RPM (Aux Fan2)
> hw.sensors.lm1.volt0=0.74 VDC (VCore)
> hw.sensors.lm1.volt1=2.16 VDC (VIN1)
> hw.sensors.lm1.volt2=3.33 VDC (AVCC)
> hw.sensors.lm1.volt3=3.33 VDC (+3.3V)
> hw.sensors.lm1.volt4=21.56 VDC (VIN0)
> hw.sensors.lm1.volt5=0.87 VDC (VIN8)
> hw.sensors.lm1.volt6=0.59 VDC (VIN4)
> hw.sensors.lm1.volt7=3.46 VDC (+3.3VSB)
> hw.sensors.lm1.volt8=0.00 VDC (VBAT)
> hw.sensors.lm1.volt9=0.00 VDC (VTT)
> hw.sensors.lm1.volt10=0.45 VDC (VIN5)
> hw.sensors.lm1.volt11=2.13 VDC (VIN6)
> hw.sensors.lm1.volt12=3.38 VDC (VIN2)
> hw.sensors.lm1.volt13=5.12 VDC (VIN3)
> hw.sensors.lm1.volt14=1.81 VDC (VIN7)
> 
> The motherboard and CPU temperature values look very reasonable; the Aux
> Temp values look like garbage, possibly because there aren't any other
> sensors on this board? Fan values look reasonable. I am unsure about the
> voltage values, but the ones that claim to be 3.3 volts look sane.

I see similar numbers with your patch applied.

The BIOS doesn't show any other temperature values, so it is possble the
Aux Temp sensors aren't hooked up to anything. In any event, the read
procedure looks the same as the others (according to the datasheet), so
I don't think it is wrong.

> This is the first non-trivial patch I'm submitting and my C is pretty
> rusty, so I would appreciate review and corrections. I don't have any
> other systems with different Nuvoton chips to test, so I can't confirm
> that this works for the other chips.
> 
> Could anyone please review this and help me get it into commit-ready shape?

The diff looks pretty good to me - I have a couple of comments inline.
Overall it looks good, and comparing to the NetBSD source it looks like
you got everything.

> Index: sys/dev/ic/lm78.c
> ===
> RCS file: /cvs/src/sys/dev/ic/lm78.c,v
> retrieving revision 1.24
> diff -u -p -u -r1.24 lm78.c
> --- sys/dev/ic/lm78.c 14 Mar 2015 03:38:47 -  1.24
> +++ sys/dev/ic/lm78.c 13 Dec 2019 22:49:03 -
> @@ -67,6 +67,7 @@ void wb_refresh_temp(struct lm_softc *, 
>  void wb_refresh_fanrpm(struct lm_softc *, int);
>  void wb_nct6776f_refresh_fanrpm(struct lm_softc *, int);
>  void wb_w83792d_refresh_fanrpm(struct lm_softc *, int);
> +const char * wb_nct67xx_id2str(uint8_t);
>  
>  void as_refresh_temp(struct lm_softc *, int);
>  
> @@ -80,6 +81,20 @@ struct lm_chip lm_chips[] = {
>   { def_match } /* Must be last */
>  };
>  
> +struct {
> + uint8_t id;
> + const char *str;
> +} nct_chips[] = {
> + {WBSIO_ID_NCT6775F, "NCT6775F"},
> + {WBSIO_ID_NCT6776F, "NCT6776F"},
> + {WBSIO_ID_NCT5104D, "NCT5104D or 610[246]D"},

The existing code prints this as just NCT5104D, so I would leave it
at that, but I don't think we actually need this struct (see below).

> + {WBSIO_ID_NCT6779D, "NCT6779D"},
> + {WBSIO_ID_NCT6791D, "NCT6791D"},
> + {WBSIO_ID_NCT6792D, "NCT6792D"},
> + {WBSIO_ID_NCT6793D, "NCT6793D"},
> + {WBSIO_ID_NCT6795D, "NCT6795D"},
> +};
> +
> @@ -489,6 +541,7 @@ def_match(struct lm_softc *sc)
>  int
>  wb_match(struct lm_softc *sc)
>  {
> + const char *model = NULL;
>   int banksel, vendid, devid;
>  
>   /* Read vendor ID */
> @@ -526,12 +579,46 @@ wb_match(struct lm_softc *sc)
>   lm_setup_sensors(sc, w83627ehf_sensors);
>   break;
>   case WB_CHIPID_W83627DHG:
> - if (sc->sioid == WBSIO_ID_NCT6776F) {
> -

Neuter shm calls from X swrast_dri.so

2019-02-24 Thread Todd Mortimer
A few weeks ago I noticed that firefox tabs were getting killed for
running afoul of pledge(2). It seems that the problem was some calls to
shmget(2) from the X swrast_dri.so lib that seem to have come from the
new mesa code that was recently imported. Since the shm syscalls aren't
covered by any pledge the system was killing the firefox tabs when they
called into X and X went down this code path.

The shm calls are guarded by a #ifdef, so the patch below just
modifies the conditions so OpenBSD does not include the shm function and
falls back to ordinary malloc. With this patch my firefox works again.

The alternative is to add shm to pledge(2) somewhere. I expect that
adding a syscall to pledge for a single program is unwanted, but in this
case it would be for any program using X with this DRI module. A quick
check in xenocara finds a small number of other references to the shm
functions (lib/libXvMC/src/XvMC.c, lib/xcb-util-image/), but I don't
know if we use these.

ok?


Index: lib/mesa/src/gallium/winsys/sw/dri/dri_sw_winsys.c
===
RCS file: /cvs/xenocara/lib/mesa/src/gallium/winsys/sw/dri/dri_sw_winsys.c,v
retrieving revision 1.7
diff -u -p -u -r1.7 dri_sw_winsys.c
--- lib/mesa/src/gallium/winsys/sw/dri/dri_sw_winsys.c  19 Feb 2019 04:24:01 
-  1.7
+++ lib/mesa/src/gallium/winsys/sw/dri/dri_sw_winsys.c  24 Feb 2019 15:16:35 
-
@@ -26,8 +26,9 @@
  *
  **/

-#if !defined(ANDROID) || ANDROID_API_LEVEL >= 26
+#if (!defined(ANDROID) || ANDROID_API_LEVEL >= 26) && !defined(__OpenBSD__)
 /* Android's libc began supporting shm in Oreo */
+/* OpenBSD does not allow shm in programs using pledge(2) */
 #define HAVE_SHM
 #include 
 #include 




lldb - detect retguard prologue

2019-02-17 Thread Todd Mortimer
The diff below teaches the lldb assembly inspector to skip over the
retguard instrumentation when traversing function prologues.

ok?


diff --git 
a/gnu/llvm/tools/lldb/source/Plugins/UnwindAssembly/x86/x86AssemblyInspectionEngine.cpp
 
b/gnu/llvm/tools/lldb/source/Plugins/UnwindAssembly/x86/x86AssemblyInspectionEngine.cpp
index 10a56980594..251635c7e6f 100644
--- 
a/gnu/llvm/tools/lldb/source/Plugins/UnwindAssembly/x86/x86AssemblyInspectionEngine.cpp
+++ 
b/gnu/llvm/tools/lldb/source/Plugins/UnwindAssembly/x86/x86AssemblyInspectionEngine.cpp
@@ -593,6 +593,18 @@ bool x86AssemblyInspectionEngine::ret_pattern_p() {
   return false;
 }

+// movq $0x(%rip), $reg [(0x4c || 0x48) 0x8b ?? ?? ?? ?? ??]
+// xorq $off(%rsp), $reg[(0x4c || 0x48) 0x33 ?? 0x24]
+bool x86AssemblyInspectionEngine::retguard_prologue_p(size_t offset, int 
insn_len) {
+  uint8_t *p = m_cur_insn;
+  if (offset == 0 && insn_len == 7)
+return (*p == 0x48 || *p == 0x4c) && (*(p + 1) == 0x8b);
+  else if (offset == 7 && insn_len == 4)
+return (*p == 0x48 || *p == 0x4c) && (*(p + 1) == 0x33) && (*(p + 3) == 
0x24);
+
+  return false;
+}
+
 uint32_t x86AssemblyInspectionEngine::extract_4(uint8_t *b) {
   uint32_t v = 0;
   for (int i = 3; i >= 0; i--)
@@ -1214,6 +1226,7 @@ bool 
x86AssemblyInspectionEngine::FindFirstNonPrologueInstruction(
 if (push_rbp_pattern_p() || mov_rsp_rbp_pattern_p() ||
 sub_rsp_pattern_p(scratch) || push_reg_p(regno) ||
 mov_reg_to_local_stack_frame_p(regno, scratch) ||
+retguard_prologue_p(offset, insn_len) ||
 (lea_rsp_pattern_p(scratch) && offset == 0)) {
   offset += insn_len;
   continue;
diff --git 
a/gnu/llvm/tools/lldb/source/Plugins/UnwindAssembly/x86/x86AssemblyInspectionEngine.h
 
b/gnu/llvm/tools/lldb/source/Plugins/UnwindAssembly/x86/x86AssemblyInspectionEngine.h
index cec9803c8a4..8ef4ab59c63 100644
--- 
a/gnu/llvm/tools/lldb/source/Plugins/UnwindAssembly/x86/x86AssemblyInspectionEngine.h
+++ 
b/gnu/llvm/tools/lldb/source/Plugins/UnwindAssembly/x86/x86AssemblyInspectionEngine.h
@@ -110,6 +110,7 @@ private:
   bool call_next_insn_pattern_p();
   bool mov_reg_to_local_stack_frame_p(int , int _offset);
   bool ret_pattern_p();
+  bool retguard_prologue_p(size_t offset, int insn_len);
   uint32_t extract_4(uint8_t *b);

   bool instruction_length(uint8_t *insn, int , uint32_t 
buffer_remaining_bytes);



Re: -msave-args : uninitialized variable

2019-02-04 Thread Todd Mortimer
On Mon, Feb 04, 2019 at 04:53:36PM +0100, Sebastien Marie wrote:
> Hi,
>
> Recently, devel/llvm (the port) has copied the -msave-args diff from
> base, and it resulted lang/rust to segfault.
>
> Since, the change has been backouted, but I continued searching the root
> cause as the diff is on base too.
>
> and I think the culprit is SaveArgs variable member in class X86Subtarget.
>
> If -msave-args is used, it is set to `true'.
> If -mno-save-args is used, it is set to `false'.
> But else, it lefts uninitialized.
>
> As the value is a boolean, I changed type from unsigned to bool, and set
> the default value.
>
> I didn't test it on base, but with such patch on devel/llvm, rust is
> able to compile correctly.

ok mortimer@.

The unitialized memory argument makes sense. I would expect that it
would affect more than just the rust port though.

>
> --
> Sebastien Marie
>
> Index: lib/Target/X86/X86Subtarget.h
> ===
> RCS file: /cvs/src/gnu/llvm/lib/Target/X86/X86Subtarget.h,v
> retrieving revision 1.4
> diff -u -p -r1.4 X86Subtarget.h
> --- lib/Target/X86/X86Subtarget.h 30 Jan 2019 03:08:12 -  1.4
> +++ lib/Target/X86/X86Subtarget.h 4 Feb 2019 15:44:11 -
> @@ -401,7 +401,7 @@ protected:
>unsigned stackAlignment = 4;
>
>/// Whether function prologues should save register arguments on the stack.
> -  unsigned SaveArgs;
> +  bool SaveArgs = false;
>
>/// Max. memset / memcpy size that is turned into rep/movs, rep/stos ops.
>///
> @@ -481,7 +481,7 @@ public:
>  return ()->getRegisterInfo();
>}
>
> -  unsigned getSaveArgs() const { return SaveArgs; }
> +  bool getSaveArgs() const { return SaveArgs; }
>
>/// Returns the minimum alignment known to hold of the
>/// stack frame on entry to the function and which must be maintained by 
> every
>



Re: Fix gdb "target kvm" on amd64

2019-01-30 Thread Todd Mortimer
On Tue, Jan 22, 2019 at 02:47:10PM +0900, YASUOKA Masahiko wrote:
> Hi,
>
> Currently gdb "target kvm bsd.0.core" on amd64 doesn't work.  It can't
> read the registers and can't follow the stack frames.  The diff
> follows fixes those problems.
>
> ok? comment?

Sorry for the delay getting to this.

This looks ok to me. The checks look complete, so you'll see the
prologue instructions no matter which register we're using.

ok mortimer@

Now I'll have to go look at egdb and lldb to see if they need the same
treatment. :-)

Thanks!


>
> Fix gdb can handle prologues which has the retguard.  Also teach gdb
> that alltraps_kern() is a trap function.  Original diff from IIJ.
>
> Index: gnu/usr.bin/binutils/gdb/amd64-tdep.c
> ===
> RCS file: /cvs/src/gnu/usr.bin/binutils/gdb/amd64-tdep.c,v
> retrieving revision 1.1.1.2
> diff -u -p -r1.1.1.2 amd64-tdep.c
> --- gnu/usr.bin/binutils/gdb/amd64-tdep.c 27 Dec 2004 13:05:33 -  
> 1.1.1.2
> +++ gnu/usr.bin/binutils/gdb/amd64-tdep.c 22 Jan 2019 05:28:15 -
> @@ -741,13 +741,31 @@ amd64_analyze_prologue (CORE_ADDR pc, CO
>   struct amd64_frame_cache *cache)
>  {
>static unsigned char proto[3] = { 0x48, 0x89, 0xe5 };
> -  unsigned char buf[3];
> +  unsigned char buf[10];
>unsigned char op;
> +  CORE_ADDR pc0 = pc;
>
>if (current_pc <= pc)
>  return current_pc;
>
> -  op = read_memory_unsigned_integer (pc, 1);
> +  op = read_memory_unsigned_integer (pc0, 1);
> +
> +  /* Check for retguard */
> +  if ((op == 0x4c || op == 0x48) && (current_pc <= pc + 11))
> +{
> +  read_memory (pc0 + 1, buf, 10);
> +  pc0 += 11;
> +
> +  /* Check for `movq (__retguard_ ## x)(%rip), %reg;'. */
> +  if (buf[0] != 0x8b)
> + return pc;
> +
> +  /* Check for `xorq off(%rsp), %reg'. */
> +  if ((buf[6] != 0x4c && buf[6] != 0x48)
> +   || buf[7] != 0x33 || buf[9] != 0x24)
> + return pc;
> +  op = read_memory_unsigned_integer (pc0, 1);
> +}
>
>if (op == 0x55)/* pushq %rbp */
>  {
> @@ -757,17 +775,17 @@ amd64_analyze_prologue (CORE_ADDR pc, CO
>cache->sp_offset += 8;
>
>/* If that's all, return now.  */
> -  if (current_pc <= pc + 1)
> +  if (current_pc <= pc0 + 1)
>  return current_pc;
>
>/* Check for `movq %rsp, %rbp'.  */
> -  read_memory (pc + 1, buf, 3);
> +  read_memory (pc0 + 1, buf, 3);
>if (memcmp (buf, proto, 3) != 0)
> - return pc + 1;
> + return pc0 + 1;
>
>/* OK, we actually have a frame.  */
>cache->frameless_p = 0;
> -  return pc + 4;
> +  return pc0 + 4;
>  }
>
>return pc;
> Index: gnu/usr.bin/binutils/gdb/amd64obsd-tdep.c
> ===
> RCS file: /cvs/src/gnu/usr.bin/binutils/gdb/amd64obsd-tdep.c,v
> retrieving revision 1.11
> diff -u -p -r1.11 amd64obsd-tdep.c
> --- gnu/usr.bin/binutils/gdb/amd64obsd-tdep.c 10 Jul 2018 08:57:44 -  
> 1.11
> +++ gnu/usr.bin/binutils/gdb/amd64obsd-tdep.c 22 Jan 2019 05:28:15 -
> @@ -452,6 +452,7 @@ amd64obsd_trapframe_sniffer (const struc
>return (name && ((strcmp (name, "calltrap") == 0)
>  || (name[0] == 'X' && strncmp(name, "Xipi_", 5) != 0)
>  || (strcmp (name, "alltraps") == 0)
> +|| (strcmp (name, "alltraps_kern") == 0)
>  || (strcmp (name, "intr_fast_exit") == 0)
>  || (strcmp (name, "intr_exit_recurse") == 0)));
>  }
>



Re: [patch] Remove unused variable in gnu/llvm/lib/Target/X86/X86FrameLowering.cpp

2018-06-06 Thread Todd Mortimer
On Wed, Jun 06, 2018 at 03:57:44PM +0800, Nan Xiao wrote:
> Hi tech@,
>
> The following patch fixes the build warning:

Committed, thanks!

>
> ..
>
>  variable 'CFIIndex' [-Wunused-variable]
>   unsigned CFIIndex;
>^
> 1 warning generated.
>
>
> Index: X86FrameLowering.cpp
> ===
> RCS file: /cvs/src/gnu/llvm/lib/Target/X86/X86FrameLowering.cpp,v
> retrieving revision 1.2
> diff -u -p -r1.2 X86FrameLowering.cpp
> --- X86FrameLowering.cpp  6 Jun 2018 00:14:29 -   1.2
> +++ X86FrameLowering.cpp  6 Jun 2018 07:34:44 -
> @@ -3185,7 +3185,6 @@ void X86FrameLowering::insertReturnProte
>if (!MFI.hasReturnProtector() || !MFI.hasReturnProtectorTempRegister())
>  return;
>
> -  unsigned CFIIndex;
>MachineBasicBlock::instr_iterator MBBI = MBB.instr_begin();
>DebugLoc MBBDL = MBB.findDebugLoc(MBBI);
>const Function  = MF.getFunction();
> --
> Best Regards
> Nan Xiao
>



Quiet clang warnings in elfrd_size.c

2018-06-01 Thread Todd Mortimer
New clang warns about some differences between Elf64_Addr (aka
uint64) and Elf32_Addr (aka uint32) and the format type provided for
these values in the debug output from distrib/common/elfrd_size.c. The
same code is used when compiling both 32 and 64 bit versions, so we used
ptrdiff_t to get size specific formats. New clang warns on this format
specifier:

In file included from elf32.c:3:
./elfrd_size.c:118:29: warning: format specifies type 'unsigned ptrdiff_t'
  (aka 'unsigned long') but the argument has type 'Elf32_Addr'
  (aka 'unsigned int') [-Wformat]

In file included from elf64.c:3:
./elfrd_size.c:118:29: warning: format specifies type 'unsigned ptrdiff_t'
  (aka 'unsigned long') but the argument has type 'Elf64_Addr'
  (aka 'unsigned long long') [-Wformat]

Since these values are actually unsigned integers of width 32/64 bits,
we can use zx to print them, and cast the elements to size_t to quiet
clang.

ok?


Index: distrib/common/elfrd_size.c
===
RCS file: /cvs/src/distrib/common/elfrd_size.c,v
retrieving revision 1.9
diff -u -p -u -r1.9 elfrd_size.c
--- distrib/common/elfrd_size.c 26 Apr 2018 12:42:50 -  1.9
+++ distrib/common/elfrd_size.c 1 Jun 2018 19:58:57 -
@@ -114,9 +114,9 @@ ELFNAME(find_rd_root_image)(char *file,
fprintf(stderr, "segment %d rd_root_size_off = 0x%llx\n", 
segment,
rd_root_size_off);
if ((ph->p_vaddr - ph->p_paddr) != 0)
-   fprintf(stderr, "root_off v %tx p %tx, diff %tx altered 
%llx\n",
-   ph->p_vaddr, ph->p_paddr,
-   (ph->p_vaddr - ph->p_paddr),
+   fprintf(stderr, "root_off v %zx p %zx, diff %zx altered 
%llx\n",
+   (size_t)ph->p_vaddr, (size_t)ph->p_paddr,
+   (size_t)(ph->p_vaddr - ph->p_paddr),
rd_root_size_off - (ph->p_vaddr - ph->p_paddr));
fprintf(stderr, "rd_root_image_off = 0x%llx\n", 
rd_root_image_off);
}



Re: clang: Avoid EBX/RBX

2017-11-15 Thread Todd Mortimer
Hi tech@,

This is an updated diff that shuffles the allocation order for registers
on i386/amd64. The last one exposed a subtle bug with the way chromium
and libexecinfo interact when creating backtraces. With this diff, make
release seems fine, I didn't see any differences in regress, and
chromium builds. We are still removing about 4500 unique gadgets from
the kernel this way (about 6% of the total unique gadgets).

The problem with the previous diff was that it put RBP before RBX, which
broke some assumptions between chromium and libexecinfo. When building
chromium, the C++ code ended up allocating RBP as a general purpose
register, which broke backtraces from the devel/libexecinfo port, which
is expecting RBP to hold a frame pointer. In the original order, RBP
points to a stack address that holds 0, so it 'worked' by terminating
the backtrace immediately, but this seems like a bit of a fluke. I am
not sure if / how we want to deal with this. Basically, libexecinfo
is using the __builtin_frame_address and __builtin_return_address
compiler builtins, which expect RBP to hold a frame pointer. Any port
that calls backtrace() from code that is using RBP as a general purpose
register will be incorrect / unstable. This seems like an uncommon
problem.

Todd

Index: lib/Target/X86/X86RegisterInfo.td
===
RCS file: /cvs/src/gnu/llvm/lib/Target/X86/X86RegisterInfo.td,v
retrieving revision 1.1.1.4
diff -u -p -u -p -r1.1.1.4 X86RegisterInfo.td
--- lib/Target/X86/X86RegisterInfo.td   4 Oct 2017 20:28:02 -   1.1.1.4
+++ lib/Target/X86/X86RegisterInfo.td   11 Nov 2017 14:04:23 -
@@ -339,8 +339,8 @@ def GR16 : RegisterClass<"X86", [i16], 1
   R8W, R9W, R10W, R11W, R14W, R15W, R12W, R13W)>;

 def GR32 : RegisterClass<"X86", [i32], 32,
- (add EAX, ECX, EDX, ESI, EDI, EBX, EBP, ESP,
-  R8D, R9D, R10D, R11D, R14D, R15D, R12D, R13D)>;
+ (add EAX, ECX, EDX, ESI, EDI,
+  R8D, R9D, R10D, R11D, R14D, R15D, R12D, R13D, 
EBX, EBP, ESP)>;

 // GR64 - 64-bit GPRs. This oddly includes RIP, which isn't accurate, since
 // RIP isn't really a register and it can't be used anywhere except in an
@@ -349,7 +349,7 @@ def GR32 : RegisterClass<"X86", [i32], 3
 // tests because of the inclusion of RIP in this register class.
 def GR64 : RegisterClass<"X86", [i64], 64,
  (add RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11,
-  RBX, R14, R15, R12, R13, RBP, RSP, RIP)>;
+  R14, R15, R12, R13, RBX, RBP, RSP, RIP)>;

 // Segment registers for use by MOV instructions (and others) that have a
 //   segment register as one operand.  Always contain a 16-bit segment

- End forwarded message -



clang: Avoid EBX/RBX

2017-10-22 Thread Todd Mortimer
Hello tech@,

The attached diff changes the order in which clang will allocate
registers on X86, specifically so EBX / RBX are selected last. The
reason is because some instructions using RBX as the destination operand
and either RAX or RCX as the source result in machine code that includes
a C3 or CB byte, which is useful as a return in ROP gadgets. By choosing
EBX/RBX last, there are less of these instructions, and therefore less
of these gadgets.


diff --git lib/Target/X86/X86RegisterInfo.td lib/Target/X86/X86RegisterInfo.td
index 3a61a7247c7..f5106c40445 100644
--- lib/Target/X86/X86RegisterInfo.td
+++ lib/Target/X86/X86RegisterInfo.td
@@ -339,8 +339,8 @@ def GR16 : RegisterClass<"X86", [i16], 16,
   R8W, R9W, R10W, R11W, R14W, R15W, R12W, R13W)>;

 def GR32 : RegisterClass<"X86", [i32], 32,
- (add EAX, ECX, EDX, ESI, EDI, EBX, EBP, ESP,
-  R8D, R9D, R10D, R11D, R14D, R15D, R12D, R13D)>;
+ (add EAX, ECX, EDX, ESI, EDI, EBP, ESP,
+  R8D, R9D, R10D, R11D, R14D, R15D, R12D, R13D, 
EBX)>;

 // GR64 - 64-bit GPRs. This oddly includes RIP, which isn't accurate, since
 // RIP isn't really a register and it can't be used anywhere except in an
@@ -349,7 +349,7 @@ def GR32 : RegisterClass<"X86", [i32], 32,
 // tests because of the inclusion of RIP in this register class.
 def GR64 : RegisterClass<"X86", [i64], 64,
  (add RAX, RCX, RDX, RSI, RDI, R8, R9, R10, R11,
-  RBX, R14, R15, R12, R13, RBP, RSP, RIP)>;
+  R14, R15, R12, R13, RBP, RSP, RIP, RBX)>;

 // Segment registers for use by MOV instructions (and others) that have a
 //   segment register as one operand.  Always contain a 16-bit segment



Re: llvm - xor return pointers

2017-08-15 Thread Todd Mortimer
On Tue, Aug 15, 2017 at 05:12:39PM -0700, Ori Bernstein wrote:
> On Sat, 22 Jul 2017 02:25:29 -0400
> Todd Mortimer <t...@opennet.ca> wrote:
> 
> > xor [rsp], rsp
> > 
> > at the start of each function, and before every RET.
> 
> Wouldn't this break with alloca() or C99 VLAs?
> %rbp may work better, if the frame pointer is
> retained.

If alloca or C99 VLAs would break with this, than RET is broken with
alloca and C99 VLAs. At the time of the RET, %rsp *must* point to the
return address, or else RET will pop the wrong thing. So as long as we
insert the xor as the first instruction in the function preamble, then
we will xor the return pointer before alloca or VLAs have an opportunity
to modify %rsp, so when we later RET, we can xor it back.

Of course, I am happy to be proven wrong. If you can demonstrate a test
case where this breaks things, then please send it along and we can see
about fixing it. 

Cheers,
Todd



llvm - xor return pointers

2017-07-22 Thread Todd Mortimer
Hello tech,

I've been working on llvm/clang some lately, and am experimenting with
the llvm Pass infrastructure. Passes essentially let you perform
arbitrary transforms on the program at various points in the compilation
process.

I've attached a patch that defines a machine function pass that adds:

xor [rsp], rsp

at the start of each function, and before every RET. This means that the
return pointer on the stack is xored with the stack address where it
happens to be. When the function returns, the same transform is applied
again to reverse the process. The consequences of doing this are (a) The
return address is obfuscated on the stack, which mitigates against some
info leaks and (b) All the RETs in the program become hard to use in ROP
gadgets, because the return address is permuted before being popped.
ASLR on the stack makes predicting stack addresses difficult, so if an
attacker is dumping a ROP chain onto the stack that uses these gadgets,
they will need to know the addresses they are writing to on the stack in
order for them to work, which adds an extra hurdle. 

The performance cost here is 2 extra instructions per function call. I
picked rsp to xor with the return address because it is cheap to use,
non-constant, and makes the transform simple and easy to reason about. I
am happy to bikeshed about better choices if people are interested.

I've done some light testing on this, and it seems to work on my test
programs and when I did it to my libc. I am not really sure if this is
interesting enough to warrant more work, but I figured there isn't any
harm in posting it up. So I'm not looking for okays, but figured it
might be interesting to some others on the list.

Todd


diff --git llvm/lib/Target/X86/CMakeLists.txt llvm/lib/Target/X86/CMakeLists.txt
index 9dfd09022bd..dc2d58073fd 100644
--- llvm/lib/Target/X86/CMakeLists.txt
+++ llvm/lib/Target/X86/CMakeLists.txt
@@ -45,6 +45,7 @@ set(sources
   X86MachineFunctionInfo.cpp
   X86OptimizeLEAs.cpp
   X86PadShortFunction.cpp
+  X86GuardStackRet.cpp
   X86RegisterInfo.cpp
   X86SelectionDAGInfo.cpp
   X86ShuffleDecodeConstantPool.cpp
diff --git llvm/lib/Target/X86/X86.h llvm/lib/Target/X86/X86.h
index 2cb80a482d0..4b8a132f648 100644
--- llvm/lib/Target/X86/X86.h
+++ llvm/lib/Target/X86/X86.h
@@ -50,6 +50,10 @@ FunctionPass *createX86IssueVZeroUpperPass();
 /// This will prevent a stall when returning on the Atom.
 FunctionPass *createX86PadShortFunctions();
 
+/// Return a pass that adds xor instructions for return pointers
+/// on the stack
+FunctionPass *createX86GuardStackRet();
+
 /// Return a pass that selectively replaces certain instructions (like add,
 /// sub, inc, dec, some shifts, and some multiplies) by equivalent LEA
 /// instructions, in order to eliminate execution delays in some processors.
diff --git llvm/lib/Target/X86/X86GuardStackRet.cpp 
llvm/lib/Target/X86/X86GuardStackRet.cpp
new file mode 100644
index 000..e7261a85aee
--- /dev/null
+++ llvm/lib/Target/X86/X86GuardStackRet.cpp
@@ -0,0 +1,98 @@
+//=== X86GuardStackRet.cpp - xor return pointers ---===//
+//
+// The LLVM Compiler Infrastructure
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===--===//
+//
+// This file defines a pass that will add an xor (rsp), rsp instruction to
+// each function preamble, and before any ret.
+//
+//===--===//
+
+#include 
+
+#include "X86.h"
+#include "X86InstrInfo.h"
+#include "X86Subtarget.h"
+#include "X86InstrBuilder.h"
+#include "llvm/ADT/Statistic.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/CodeGen/MachineInstrBuilder.h"
+#include "llvm/CodeGen/MachineRegisterInfo.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Function.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/Support/raw_ostream.h"
+#include "llvm/Target/TargetInstrInfo.h"
+
+using namespace llvm;
+
+#define DEBUG_TYPE "x86-guard-ret"
+
+namespace {
+  struct GuardStackRet : public MachineFunctionPass {
+static char ID;
+GuardStackRet() : MachineFunctionPass(ID)
+, STI(nullptr), TII(nullptr) {}
+
+bool runOnMachineFunction(MachineFunction ) override;
+
+MachineFunctionProperties getRequiredProperties() const override {
+  return MachineFunctionProperties().set(
+  MachineFunctionProperties::Property::NoVRegs);
+}
+
+StringRef getPassName() const override {
+  return "X86 Guard RET Instructions";
+}
+
+  private:
+void addXORInst(MachineBasicBlock , MachineInstr );
+
+const X86Subtarget *STI;
+const TargetInstrInfo *TII;
+   bool is64bit;
+  };
+
+  char GuardStackRet::ID = 0;
+}
+
+FunctionPass *llvm::createX86GuardStackRet() {
+  return new GuardStackRet();
+}
+
+/// runOnMachineFunction - Loop over all of the 

clang: emit trap padding between functions

2017-07-17 Thread Todd Mortimer
Hello tech,

The patch below teaches clang to treat padding between functions
differently than padding inside functions. Padding between functions is
completely filled with trap instructions, and padding inside functions
is padded as usual (trapsleds on X86, NOPs on everything else). This
means that the trapsleds between functions no longer start with a short
JMP, but are instead completely filled with int3 (0xCC), which makes them
slightly more ROP unfriendly.

This has been through a bulk build on amd64, and I have been running
base + kernel build with this for about a week without issue, so this
change seems to be stable. I would interested to get feedback from
someone running arm64, since this change does touch some generic code
shared across architectures, but there shouldn't be any binary changes
outside X86.

Any other feedback / bikeshedding is welcome.

ok?

Todd

Index: include/llvm/CodeGen/AsmPrinter.h
===
RCS file: /cvs/src/gnu/llvm/include/llvm/CodeGen/AsmPrinter.h,v
retrieving revision 1.1.1.4
diff -u -p -u -p -r1.1.1.4 AsmPrinter.h
--- include/llvm/CodeGen/AsmPrinter.h   14 Mar 2017 08:08:02 -  1.1.1.4
+++ include/llvm/CodeGen/AsmPrinter.h   10 Jul 2017 01:12:02 -
@@ -297,6 +297,11 @@ public:
   ///
   void EmitAlignment(unsigned NumBits, const GlobalObject *GO = nullptr) const;
 
+  /// Emit an alinment directive to the specified power of two boundary,
+  /// like EmitAlignment, but call EmitTrapToAlignment to fill with
+  /// trap instructions instead of NOPs.
+  void EmitTrapAlignment(unsigned NumBits, const GlobalObject *GO = nullptr) 
const;
+
   /// Lower the specified LLVM Constant to an MCExpr.
   virtual const MCExpr *lowerConstant(const Constant *CV);
 
@@ -354,6 +359,11 @@ public:
   virtual void EmitInstruction(const MachineInstr *) {
 llvm_unreachable("EmitInstruction not implemented");
   }
+
+  /// Emit an alignment directive to the specified power
+  /// of two boundary, but use Trap instructions for alignment
+  /// sections that should never be executed.
+  virtual void EmitTrapToAlignment(unsigned NumBits) const;
 
   /// Return the symbol for the specified constant pool entry.
   virtual MCSymbol *GetCPISymbol(unsigned CPID) const;
Index: lib/CodeGen/AsmPrinter/AsmPrinter.cpp
===
RCS file: /cvs/src/gnu/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp,v
retrieving revision 1.1.1.4
diff -u -p -u -p -r1.1.1.4 AsmPrinter.cpp
--- lib/CodeGen/AsmPrinter/AsmPrinter.cpp   14 Mar 2017 08:08:14 -  
1.1.1.4
+++ lib/CodeGen/AsmPrinter/AsmPrinter.cpp   10 Jul 2017 01:12:02 -
@@ -590,7 +590,7 @@ void AsmPrinter::EmitFunctionHeader() {
 
   EmitLinkage(F, CurrentFnSym);
   if (MAI->hasFunctionAlignment())
-EmitAlignment(MF->getAlignment(), F);
+EmitTrapAlignment(MF->getAlignment(), F);
 
   if (MAI->hasDotTypeDotSizeDirective())
 OutStreamer->EmitSymbolAttribute(CurrentFnSym, MCSA_ELF_TypeFunction);
@@ -1759,6 +1759,33 @@ void AsmPrinter::EmitAlignment(unsigned 
   else
 OutStreamer->EmitValueToAlignment(1u << NumBits);
 }
+
+//===--===//
+/// EmitTrapAlignment - Emit an alignment directive to the specified power of
+/// two boundary, but call EmitTrapToAlignment to fill with Trap instructions
+/// if the Target implements EmitTrapToAlignment.
+void AsmPrinter::EmitTrapAlignment(unsigned NumBits, const GlobalObject *GV) 
const {
+  if (GV)
+NumBits = getGVAlignmentLog2(GV, GV->getParent()->getDataLayout(), 
NumBits);
+
+  if (NumBits == 0) return;   // 1-byte aligned: no need to emit alignment.
+
+  assert(NumBits <
+ static_cast(std::numeric_limits::digits) &&
+ "undefined behavior");
+  EmitTrapToAlignment(NumBits);
+}
+
+//===--===//
+/// EmitTrapToAlignment - Emit an alignment directive to the specified power
+/// of two boundary. This default implementation calls EmitCodeAlignment on
+/// the OutStreamer, but can be overridden by Target implementations.
+void AsmPrinter::EmitTrapToAlignment(unsigned NumBits) const {
+  if (NumBits == 0) return;
+  OutStreamer->EmitCodeAlignment(1u << NumBits);
+}
+
+
 
 
//===--===//
 // Constant emission.
Index: lib/Target/X86/X86AsmPrinter.h
===
RCS file: /cvs/src/gnu/llvm/lib/Target/X86/X86AsmPrinter.h,v
retrieving revision 1.1.1.3
diff -u -p -u -p -r1.1.1.3 X86AsmPrinter.h
--- lib/Target/X86/X86AsmPrinter.h  24 Jan 2017 08:33:28 -  1.1.1.3
+++ lib/Target/X86/X86AsmPrinter.h  10 Jul 2017 01:12:02 -
@@ -113,6 +113,8 @@ public:
 
   void EmitInstruction(const MachineInstr *MI) override;
 
+  void EmitTrapToAlignment(unsigned NumBits) const override;
+
   void 

Re: Trapsleds

2017-06-20 Thread Todd Mortimer
> 2. This patch also hits NOP sleds > 8 bytes on i386. We could also hit
> the NOP sleds between 3 and 7 bytes if there are no objections.

The attached diff implements the same trapsled mechanism for i386 and
amd64 for all padding sequences between 3 and 15 bytes.

I have put this through a kernel and base build on i386 without apparent
ill effect, and the amd64 parts are unchanged from the last diff.

Todd


Index: gas/config/tc-i386.c
===
RCS file: /cvs/src/gnu/usr.bin/binutils-2.17/gas/config/tc-i386.c,v
retrieving revision 1.7
diff -u -p -u -p -r1.7 tc-i386.c
--- gas/config/tc-i386.c4 Jun 2017 20:26:18 -   1.7
+++ gas/config/tc-i386.c21 Jun 2017 00:43:14 -
@@ -505,41 +505,9 @@ i386_align_code (fragP, count)
 {0x90};/* nop  */
   static const char f32_2[] =
 {0x89,0xf6};   /* movl %esi,%esi   */
-  static const char f32_3[] =
-{0x8d,0x76,0x00};  /* leal 0(%esi),%esi*/
-  static const char f32_4[] =
-{0x8d,0x74,0x26,0x00}; /* leal 0(%esi,1),%esi  */
-  static const char f32_5[] =
-{0x90, /* nop  */
- 0x8d,0x74,0x26,0x00}; /* leal 0(%esi,1),%esi  */
-  static const char f32_6[] =
-{0x8d,0xb6,0x00,0x00,0x00,0x00};   /* leal 0L(%esi),%esi   */
-  static const char f32_7[] =
-{0x8d,0xb4,0x26,0x00,0x00,0x00,0x00};  /* leal 0L(%esi,1),%esi */
-  static const char f32_8[] =
-{0x90, /* nop  */
- 0x8d,0xb4,0x26,0x00,0x00,0x00,0x00};  /* leal 0L(%esi,1),%esi */
-  static const char f32_9[] =
-{0x89,0xf6,/* movl %esi,%esi   
*/
- 0x8d,0xbc,0x27,0x00,0x00,0x00,0x00};  /* leal 0L(%edi,1),%edi */
-  static const char f32_10[] =
-{0x8d,0x76,0x00,   /* leal 0(%esi),%esi*/
- 0x8d,0xbc,0x27,0x00,0x00,0x00,0x00};  /* leal 0L(%edi,1),%edi */
-  static const char f32_11[] =
-{0x8d,0x74,0x26,0x00,  /* leal 0(%esi,1),%esi  */
- 0x8d,0xbc,0x27,0x00,0x00,0x00,0x00};  /* leal 0L(%edi,1),%edi */
-  static const char f32_12[] =
-{0x8d,0xb6,0x00,0x00,0x00,0x00,/* leal 0L(%esi),%esi   */
- 0x8d,0xbf,0x00,0x00,0x00,0x00};   /* leal 0L(%edi),%edi   */
-  static const char f32_13[] =
-{0x8d,0xb6,0x00,0x00,0x00,0x00,/* leal 0L(%esi),%esi   */
- 0x8d,0xbc,0x27,0x00,0x00,0x00,0x00};  /* leal 0L(%edi,1),%edi */
-  static const char f32_14[] =
-{0x8d,0xb4,0x26,0x00,0x00,0x00,0x00,   /* leal 0L(%esi,1),%esi */
- 0x8d,0xbc,0x27,0x00,0x00,0x00,0x00};  /* leal 0L(%edi,1),%edi */
   static const char f32_15[] =
-{0xeb,0x0d,0x90,0x90,0x90,0x90,0x90,   /* jmp .+15; lotsa nops */
- 0x90,0x90,0x90,0x90,0x90,0x90,0x90,0x90};
+{0xeb,0x0d,0xCC,0xCC,0xCC,0xCC,0xCC,   /* jmp .+15; lotsa int3 */
+ 0xCC,0xCC,0xCC,0xCC,0xCC,0xCC,0xCC,0xCC};
   static const char f16_3[] =
 {0x8d,0x74,0x00};  /* lea 0(%esi),%esi */
   static const char f16_4[] =
@@ -556,40 +524,31 @@ i386_align_code (fragP, count)
   static const char f16_8[] =
 {0x8d,0xb4,0x00,0x00,  /* lea 0w(%si),%si  */
  0x8d,0xbd,0x00,0x00}; /* lea 0w(%di),%di  */
+  static const char f64_2[] =
+{0x66,0x90};/* data16, nop*/
   static const char *const f32_patt[] = {
-f32_1, f32_2, f32_3, f32_4, f32_5, f32_6, f32_7, f32_8,
-f32_9, f32_10, f32_11, f32_12, f32_13, f32_14, f32_15
+f32_1, f32_2, f32_15, f32_15, f32_15, f32_15, f32_15, f32_15,
+f32_15, f32_15, f32_15, f32_15, f32_15, f32_15, f32_15
   };
   static const char *const f16_patt[] = {
 f32_1, f32_2, f16_3, f16_4, f16_5, f16_6, f16_7, f16_8,
 f32_15, f32_15, f32_15, f32_15, f32_15, f32_15, f32_15
   };
+  static const char *const f64_patt[] = {
+f32_1, f64_2, f32_15, f32_15, f32_15, f32_15, f32_15, f32_15,
+f32_15, f32_15, f32_15, f32_15, f32_15, f32_15, f32_15
+  };
 
   if (count <= 0 || count > 15)
 return;
 
-  /* The recommended way to pad 64bit code is to use NOPs preceded by
- maximally four 0x66 prefixes.  Balance the size of nops.  */
   if (flag_code == CODE_64BIT)
 {
-  int i;
-  int nnops = (count + 3) / 4;
-  int len = count / nnops;
-  int remains = count - nnops * len;
-  int pos = 0;
-
-  for (i = 0; i < remains; i++)
-   {
- memset (fragP->fr_literal + fragP->fr_fix + pos, 0x66, len);
- fragP->fr_literal[fragP->fr_fix + pos + len] = 0x90;
- pos += len + 1;
-   }
-  for (; i < nnops; i++)
-   {
- memset (fragP->fr_literal + fragP->fr_fix + pos, 0x66, len - 1);
- fragP->fr_literal[fragP->fr_fix + pos 

Trapsleds

2017-06-19 Thread Todd Mortimer
Hello tech,

I have attached a patch that converts NOP padding from the assembler
into INT3 padding on amd64. The idea is to remove potentially conveinent
NOP sleds from programs and libraries, which makes it harder for an
attacker to hit any ROP gadgets or other instructions after a NOP sled. 

NOP sleds are used for text alignment in order to get jump targets onto
16 byte boundaries. They can appear both in the middle of a function
and at the end. The trapsleds implemented in this diff convert NOP sleds
longer than 2 bytes from a series of 0x6690 instructions to a 2 byte
short JMP over a series of INT3 instructions that fill the rest of the
gap. Programs that would have normally just slid through the NOP sled
will now jump over. An attacker trying to hit the NOP sled will now get
a core dump.

I have been running this on my system for over a week without any
apparent ill effects. Specifically, there don't appear to be any
performance penalties associated with doing this. A full base build
on a system completely converted over to this took slightly less time to
complete than the same build on a normal system, and my synthetic
testing shows trapsleds perform similarly to nopsleds (performance
difference was <1%, which is within error over multiple runs).

If people like this, I can do up the equivalent diff for clang.

Things that could could be improved:

1. For padding inserted at the end of a function, the JMP is
unnecessary, and could also be a 0x. I am going to have a go at gcc
to see if I can coerce it into distinguishing end-of-function padding
from padding that is intended to be executed. If some kind soul with gcc
experience knows where I should look, any pointers would be welcome - my
previous attempt was not fruitful.

2. This patch also hits NOP sleds > 8 bytes on i386. We could also hit
the NOP sleds between 3 and 7 bytes if there are no objections.

Comments and suggestions are welcome. Thanks to Theo for suggesting it
in the hallway track at BSDCan. 

Todd

Index: gas/config/tc-i386.c
===
RCS file: /cvs/src/gnu/usr.bin/binutils-2.17/gas/config/tc-i386.c,v
retrieving revision 1.7
diff -u -p -u -p -r1.7 tc-i386.c
--- gas/config/tc-i386.c4 Jun 2017 20:26:18 -   1.7
+++ gas/config/tc-i386.c20 Jun 2017 00:36:27 -
@@ -538,8 +538,8 @@ i386_align_code (fragP, count)
 {0x8d,0xb4,0x26,0x00,0x00,0x00,0x00,   /* leal 0L(%esi,1),%esi */
  0x8d,0xbc,0x27,0x00,0x00,0x00,0x00};  /* leal 0L(%edi,1),%edi */
   static const char f32_15[] =
-{0xeb,0x0d,0x90,0x90,0x90,0x90,0x90,   /* jmp .+15; lotsa nops */
- 0x90,0x90,0x90,0x90,0x90,0x90,0x90,0x90};
+{0xeb,0x0d,0xCC,0xCC,0xCC,0xCC,0xCC,   /* jmp .+15; lotsa int3 */
+ 0xCC,0xCC,0xCC,0xCC,0xCC,0xCC,0xCC,0xCC};
   static const char f16_3[] =
 {0x8d,0x74,0x00};  /* lea 0(%esi),%esi */
   static const char f16_4[] =
@@ -556,6 +556,8 @@ i386_align_code (fragP, count)
   static const char f16_8[] =
 {0x8d,0xb4,0x00,0x00,  /* lea 0w(%si),%si  */
  0x8d,0xbd,0x00,0x00}; /* lea 0w(%di),%di  */
+  static const char f64_2[] =
+{0x66,0x90};/* data16, nop*/
   static const char *const f32_patt[] = {
 f32_1, f32_2, f32_3, f32_4, f32_5, f32_6, f32_7, f32_8,
 f32_9, f32_10, f32_11, f32_12, f32_13, f32_14, f32_15
@@ -564,32 +566,21 @@ i386_align_code (fragP, count)
 f32_1, f32_2, f16_3, f16_4, f16_5, f16_6, f16_7, f16_8,
 f32_15, f32_15, f32_15, f32_15, f32_15, f32_15, f32_15
   };
+  static const char *const f64_patt[] = {
+f32_1, f64_2, f32_15, f32_15, f32_15, f32_15, f32_15, f32_15,
+f32_15, f32_15, f32_15, f32_15, f32_15, f32_15, f32_15
+  };
 
   if (count <= 0 || count > 15)
 return;
 
-  /* The recommended way to pad 64bit code is to use NOPs preceded by
- maximally four 0x66 prefixes.  Balance the size of nops.  */
   if (flag_code == CODE_64BIT)
 {
-  int i;
-  int nnops = (count + 3) / 4;
-  int len = count / nnops;
-  int remains = count - nnops * len;
-  int pos = 0;
-
-  for (i = 0; i < remains; i++)
-   {
- memset (fragP->fr_literal + fragP->fr_fix + pos, 0x66, len);
- fragP->fr_literal[fragP->fr_fix + pos + len] = 0x90;
- pos += len + 1;
-   }
-  for (; i < nnops; i++)
-   {
- memset (fragP->fr_literal + fragP->fr_fix + pos, 0x66, len - 1);
- fragP->fr_literal[fragP->fr_fix + pos + len - 1] = 0x90;
- pos += len;
-   }
+  memcpy(fragP->fr_literal + fragP->fr_fix,
+  f64_patt[count -1], count);
+if (count > 2)
+  /* Adjust jump offset */
+  fragP->fr_literal[fragP->fr_fix + 1] = count - 2;
 }
   else
 if (flag_code == CODE_16BIT)


Re: Scheduler hack for multi-threaded processes

2016-03-19 Thread Todd Mortimer
On Sat, Mar 19, 2016 at 01:53:07PM +0100, Martin Pieuchot wrote:
> I experimented with various values for "p_priority" and this one is
> the one that generates fewer # IPIs when watching a HD video on firefox. 
> Because yes, with this diff, now I can.

Same result here: Video in firefox plays nicely with this patch, where
before it was choppy. Running amd64 in a 2xCPU VMware box. 

Todd



Allow custom signify key in install.sub

2016-01-06 Thread Todd Mortimer
Hello tech,

The attached patch adds an autoinstall question to install.sub that lets
the user specify a custom signify key for the SHA256.sig file. 

I like to track -stable for a bunch of my servers, and it's convenient
to make a release and use autoinstall with bsd.rd to keep up to date.
I already use a personal signify key for packages, and it would be
handy to be able to use the same key to sign my patched sets.

I understand if this is an unwanted feature, and the preferred solution
is to just add 'Continue without verification = yes' to
auto_upgrade.conf. But if this might be a useful feature for more than
just me, this patch might be of interest. 

Thank you,
Todd


Index: distrib/miniroot/install.sub
===
RCS file: /cvs/src/distrib/miniroot/install.sub,v
retrieving revision 1.867
diff -u -p -u -p -r1.867 install.sub
--- distrib/miniroot/install.sub27 Dec 2015 18:42:11 -  1.867
+++ distrib/miniroot/install.sub7 Jan 2016 03:55:23 -
@@ -1224,7 +1224,8 @@ __EOT
 # user will know to try again.
 install_files() {
local _src=$1 _files=$2 _f _sets _get_sets _n _col=$COLUMNS \
-   _tmpfs _tmpsrc _cfile _fsrc _unver _t _issue _srclocal
+   _tmpfs _tmpsrc _cfile _fsrc _unver _t _issue _srclocal \
+   _signifykey
 
# Initialize _sets to the list of sets found in _src, and initialize
# _get_sets to the intersection of _sets and DEFAULTSETS.
@@ -1308,8 +1309,10 @@ install_files() {
_issue="Cannot fetch SHA256.sig" && break
 
# Verify signature file with public keys.
-   ! signify -Vep /etc/signify/openbsd-${VERSION}-base.pub \
-   -x "$_cfile.sig" -m "$_cfile" &&
+   _signifykey="/etc/signify/openbsd-${VERSION}-base.pub"
+   $AUTO && ask "Location of signify key" "$_signifykey" &&
+   [[ $resp != none ]] && _signifykey=$resp
+   ! signify -Vep "$_signifykey" -x "$_cfile.sig" -m "$_cfile" &&
_issue="Signature check of SHA256.sig failed" && break
 
for _f in $_get_sets; do
Index: share/man/man8/autoinstall.8
===
RCS file: /cvs/src/share/man/man8/autoinstall.8,v
retrieving revision 1.16
diff -u -p -u -p -r1.16 autoinstall.8
--- share/man/man8/autoinstall.815 May 2015 18:53:05 -  1.16
+++ share/man/man8/autoinstall.87 Jan 2016 03:55:26 -
@@ -180,6 +180,10 @@ A template file for
 autopartitioning is fetched from
 .Ar url
 allowing a custom partition layout for the root disk.
+.It Location of signify key = Ar path
+A path to the
+.Xr signify 1
+public key used to verify the release distribution files.
 .El
 .Sh FILES
 .Bl -tag -width "/etc/dhcpd.confXXX" -compact


Re: ksh another home/end pair

2015-12-29 Thread Todd Mortimer
On Tue, Dec 29, 2015 at 10:02:48PM +0100, Mark Kettenis wrote:
> > From: "Ted Unangst" 
> > Date: Tue, 29 Dec 2015 12:11:25 -0500
> > 
> > In tmux, home and end send different bytes. I don't know why, but I want
> > things to just work. We already have two different keys here, so what's one
> > more? (how many can there be...?)
> 
> Isn't that somehowa tmux bug?  I mean, isn't tmux supposed to be
> compatible with some standard terminal type?

Looking at the tmux and screen terminfo definitions for khome and kend,
it looks like tmux is doing exactly what it is supposed to be doing.
Shouldn't the program be checking the termcap/terminfo definition for
$TERM and interpreting the control sequences appropriately? In this
case, it looks like the program is just hardcoding the SS3 H/F
sequences, which will work for a bunch of terminals, but not all. 

> 
> > Index: emacs.c
> > ===
> > RCS file: /cvs/src/bin/ksh/emacs.c,v
> > retrieving revision 1.62
> > diff -u -p -r1.62 emacs.c
> > --- emacs.c 14 Dec 2015 13:59:42 -  1.62
> > +++ emacs.c 29 Dec 2015 10:41:45 -
> > @@ -1576,6 +1576,8 @@ x_init_emacs(void)
> > kb_add(x_mv_end,NULL, CTRL('['), '[', 'F', 0); /* end */
> > kb_add(x_mv_begin,  NULL, CTRL('['), 'O', 'H', 0); /* home 
> > */
> > kb_add(x_mv_end,NULL, CTRL('['), 'O', 'F', 0); /* end */
> > +   kb_add(x_mv_begin,  NULL, CTRL('['), '[', '1', '~', 0); /* 
> > home */
> > +   kb_add(x_mv_end,NULL, CTRL('['), '[', '4', '~', 0); /* 
> > end */
> >  
> > /* can't be bound */
> > kb_add(x_set_arg,   NULL, CTRL('['), '0', 0);
> > 
> > 
> 



[patch] faq12 - blobs

2015-06-06 Thread Todd Mortimer
Hi tech@

It seems that this question comes up frequently enough that people might be 
tired
of answering it. 

Not sure if this is the right spot in the FAQ to put this, or even if this is 
something 
that people want included in there at all. Rejections, corrections and 
bikeshedding welcome. 

Cheers,
Todd


Index: faq12.html
===
RCS file: /cvs/www/faq/faq12.html,v
retrieving revision 1.117
diff -u -p -u -p -r1.117 faq12.html
--- faq12.html  25 May 2015 03:48:24 -  1.117
+++ faq12.html  6 Jun 2015 15:17:36 -
@@ -41,6 +41,7 @@ Questions/font/h1
   lia href=#ami12.1.7 - My ami(4) card will only support one
logical disk!/a
   lia href=#cryptohw12.1.8 - How do I activate my crypto accelerator 
card?/a
+  lia href=#blobs12.1.9 - Does OpenBSD include any binary-only device 
drivers (blobs)?/a
   /ul
 lia href=#alpha12.2 - DEC Alpha /a
 lia href=#amd6412.3 - AMD 64/a
@@ -280,6 +281,41 @@ much or all of the benefit of offloading
 tasks.
 Your results may vary widely depending on the task you have to
 accomplish.
+
+h3 id=blobs12.1.9 - Does OpenBSD include any binary-only device drivers 
(blobs)?/h3
+
+No. The source code for all of the device drivers in the OpenBSD kernel is
+available in a href=http://www.openbsd.org/anoncvs.html;CVS/a. OpenBSD has
+rejected binary device drivers (a.k.a. blobs) for many years, and this was even
+the subject of the a href=http://www.openbsd.org/lyrics.html#39;3.9 release 
song/a, 
+which was released in 2006. 
+
+p
+Some people are confused about the distinction between device drivers,
+which run in the kernel, and firmware, which runs on the many hardware
+parts that collectively make up your computer. Devices such as hard
+disks, network cards, and even CPUs generally contain firmware that runs on 
the device
+itself and transforms the physical collection of transistors and
+wires into something that acts like a hard disk, network card or CPU.
+This firmware is usually included with the device itself on a ROM chip.
+In some cases the vendor does not include the firmware with the device, 
+and expects the firmware to be loaded onto the device at run time by the
+operating system. For these cases, OpenBSD can load the firmware onto the 
device
+and includes the 
+a 
href=http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man1/fw_update.1;fw_update(1)/a
+utility, which can fetch non-free firmware from the Internet if the vendor has
+made it available. 
+
+p
+Before posting to the mailing lists and objecting to binary firmware,
+please remember that firmware does not run in the kernel (and is therefore not 
a
+part of the operating system), and if it is not loaded onto the device at run
+time, then that usually means that it was loaded from ROM when the device was
+powered on. Users who wish to use only hardware which has freely available
+firmware source code are encouraged to seek out and buy only that hardware. If
+OpenBSD does not yet support that hardware, then users can submit new device 
drivers
+or patches to the a 
href=http://www.openbsd.org/faq/faq2.html#MailLists;tech/a 
+mailing list.
 
 h2 id=alpha12.2 - DEC Alpha/h2
 [nothing yet]




Re: [Patch] httpd - don't leak fcgi file descriptors

2015-05-31 Thread Todd Mortimer
Hi Joerg,

Thanks for getting back to me. 

I cloned the server and upgraded it to the 31 May snapshot, did the
sysmerge and upgraded the packages to the snapshot versions.

The behaviour is still there. It actually appears to be somewhat
more pronounced, and php-fpm hits max_children more quickly than
it does under 5.7-stable. The same patch prevents the php-fpm
processes from going idle on netio, and I have reproduced it against
-current below.

It also seems that httpd on -current has more parallel connections
open to php-fpm at once compared to the same setup on 5.7-stable.

I agree that my patch is more of a workaround, and it would be
better to track down how it is that the client is being passed to
server_fcgi with an open socket. I was going this way when I started
looking at the source, but then I saw that clt-clt_srvevb and
clt-clt_srvbev get the same treatment (free if not null, then
reassign) at the same spot in server_fcgi(), and I figured if it
was good enough for clt_srvevb and clt_srvbev, why not for clt_fd?

I would be happy to look into a proper solution if that would be
better.

Thanks!
Todd

On May 31, 2015, at 2:23, Joerg Jung m...@umaxx.net wrote:

 Hi,
 
 Am 20.05.2015 um 02:06 schrieb Todd Mortimer t...@opennet.ca:
 
 The attached patch fixes a problem I’ve been having with httpd +
 php_fpm + owncloud on 5.7. The patch is against 5.7-release.
 
 Can you try with recent snapshot, and see if issue
 still occurs, please?
 Development happens in -current.
 
 After several days running owncloud with httpd, php_fpm started
 complaining about hitting pm.max_children, and top would show a
 bunch of idle php_fpm processes waiting on netio. Eventually httpd
 would start returning error 500 and owncloud would stop working.
 Restarting php_fpm or httpd would temporarily fix the issue. The
 same server with nginx did not have the same problem.
 
 I’ve had this patch running for 5 days now, and php_fpm isnt leaving
 idle processes lying around anymore. I did run with some debugging
 output to verify that clt-clt_fd is sometimes not -1 when it is
 overwritten with the new socket fd.
 
 IMHO your proposed fix is just a workaround.
 Instead of 'blindly' close()'ing, better approach is to
 figure out where the fd was leaked earlier.
 
 Regards,
 Joerg



Index: server_fcgi.c
===
RCS file: /cvs/src/usr.sbin/httpd/server_fcgi.c,v
retrieving revision 1.53
diff -u -p -u -p -r1.53 server_fcgi.c
--- server_fcgi.c   26 Mar 2015 09:01:51 -  1.53
+++ server_fcgi.c   31 May 2015 22:33:54 -
@@ -31,6 +31,7 @@
 #include stdio.h
 #include time.h
 #include ctype.h
+#include unistd.h
 #include event.h
 
 #include httpd.h
@@ -152,6 +153,9 @@ server_fcgi(struct httpd *env, struct cl
errstr = failed to allocate evbuffer;
goto fail;
}
+
+   if(clt-clt_fd != -1)
+   close(clt-clt_fd);
 
clt-clt_fd = fd;
if (clt-clt_srvbev != NULL)





[Patch] httpd - don't leak fcgi file descriptors

2015-05-19 Thread Todd Mortimer
Hi tech,

The attached patch fixes a problem I’ve been having with httpd +
php_fpm + owncloud on 5.7. The patch is against 5.7-release.

After several days running owncloud with httpd, php_fpm started
complaining about hitting pm.max_children, and top would show a
bunch of idle php_fpm processes waiting on netio. Eventually httpd
would start returning error 500 and owncloud would stop working.
Restarting php_fpm or httpd would temporarily fix the issue. The
same server with nginx did not have the same problem.

I’ve had this patch running for 5 days now, and php_fpm isnt leaving
idle processes lying around anymore. I did run with some debugging
output to verify that clt-clt_fd is sometimes not -1 when it is
overwritten with the new socket fd.

I’m happy to test or revise if needed. 

Thank you!
Todd


Index: server_fcgi.c
===
RCS file: /cvs/src/usr.sbin/httpd/server_fcgi.c,v
retrieving revision 1.52
diff -u -p -u -p -r1.52 server_fcgi.c
--- server_fcgi.c   23 Feb 2015 19:22:43 -  1.52
+++ server_fcgi.c   15 May 2015 22:12:30 -
@@ -31,6 +31,7 @@
#include stdio.h
#include time.h
#include ctype.h
+#include unistd.h
#include event.h

#include httpd.h
@@ -152,6 +153,9 @@ server_fcgi(struct httpd *env, struct cl
   errstr = failed to allocate evbuffer;
   goto fail;
   }
+
+   if (clt-clt_fd != -1)
+   close(clt-clt_fd);

   clt-clt_fd = fd;
   if (clt-clt_srvbev != NULL)