macppc: add ld.script for kernel, ofwboot

2021-05-06 Thread George Koehler
Hello tech list,

I want macppc to switch from ld.bfd to ld.lld, but there is a problem
when lld links ofwboot or the kernel.  I propose to fix it by adding
ld.script for both.  These scripts also work with ld.bfd, so I want to
commit my diff at the end of this mail, ok?

lld sets the symbol "etext" to a nonsense value like 0x1034.  In
ofwboot, wrong "etext" causes freeze, failure to boot kernel.  (Wrong
"etext" in kernel doesn't cause an obvious problem.)  Other lld arches
use an ld.script to set a correct "etext" in kernel.

I copied the ld.script from powerpc64's kernel and made these changes
for macppc's kernel:
 - change "_start" to "start" to match macppc/locore0.S
 - remove PT_DYNAMIC and sections (don't exist in macppc kernel)
 - put .text at 0x00100114 to match Makefile
 - remove symbols like "_erodata" (not used by macppc kernel)
 - add ".rodata.*" and such, so sections and segments look correct

| --- arch/powerpc64/conf/ld.script Sat Jul 18 09:16:32 2020
| +++ arch/macppc/conf/ld.scriptSat Apr 24 11:52:34 2021
| @@ -16,18 +16,17 @@
|   * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
|   */
|  
| -ENTRY(_start)
| +ENTRY(start)
|  
|  PHDRS
|  {
|   text PT_LOAD;
| - dynamic PT_DYNAMIC;
|   openbsd_randomize PT_OPENBSD_RANDOMIZE;
|  }
|  
|  SECTIONS
|  {
| - . = 0x0010;
| + . = 0x00100114;
|   .text :
|   {
|   *(.text)
| @@ -35,49 +34,35 @@
|   PROVIDE (etext = .);
|   PROVIDE (_etext = .);
|  
| - . = ALIGN(4096);
| - .rela.dyn : { *(.rela.dyn) }
| -
| - .dynamic :
| + .rodata :
|   {
| - *(.dynamic)
| - } :dynamic :text
| + *(.rodata .rodata.*)
| + } :text
|  
| - .rodata :
| + .data.rel.ro :
|   {
| - *(.rodata)
|   *(.data.rel.ro)
|   } :text
|  
|   .openbsd.randomdata :
|   {
| - *(.openbsd.randomdata)
| + *(.openbsd.randomdata .openbsd.randomdata.*)
|   } :openbsd_randomize :text
| - PROVIDE (_erodata = .);
|  
| - . = ALIGN(4096);
|   .data :
|   {
|   *(.data)
|   } :text
|  
| - . = ALIGN(4096);
| - .got : { *(.got) }
| - .toc : { *(.toc) }
| + .sbss :
| + {
| + *(.sbss)
| + }
|  
| - PROVIDE (__bss_start = .);
|   .bss :
|   {
|   *(.bss)
|   }
|   PROVIDE (end = .);
|   PROVIDE (_end = .);
| -
| - /DISCARD/ :
| - {
| - *(.dynsym)
| - *(.dynstr)
| - *(.gnu.hash)
| - *(.hash)
| - }
|  }

Then I made these changes for ofwboot:
 - use "_start" and 0x0002 to match Makefile
 - remove randomdata (doesn't exist in ofwboot)

| --- arch/macppc/conf/ld.scriptSat Apr 24 11:52:34 2021
| +++ arch/macppc/stand/ofwboot/ld.script   Sat Apr 24 11:52:34 2021
| @@ -16,17 +16,17 @@
|   * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
|   */
|  
| -ENTRY(start)
| +ENTRY(_start)
|  
|  PHDRS
|  {
|   text PT_LOAD;
| - openbsd_randomize PT_OPENBSD_RANDOMIZE;
|  }
|  
|  SECTIONS
|  {
| - . = 0x00100114;
| + /* Must match RELOC in Makefile */
| + . = 0x0002;
|   .text :
|   {
|   *(.text)
| @@ -43,11 +43,6 @@
|   {
|   *(.data.rel.ro)
|   } :text
| -
| - .openbsd.randomdata :
| - {
| - *(.openbsd.randomdata .openbsd.randomdata.*)
| - } :openbsd_randomize :text
|  
|   .data :
|   {

Index: arch/macppc/conf/Makefile.macppc
===
RCS file: /cvs/src/sys/arch/macppc/conf/Makefile.macppc,v
retrieving revision 1.101
diff -u -p -r1.101 Makefile.macppc
--- arch/macppc/conf/Makefile.macppc28 Jan 2021 17:39:03 -  1.101
+++ arch/macppc/conf/Makefile.macppc6 May 2021 20:01:08 -
@@ -51,7 +51,7 @@ DEBUG?=   -g
 COPTIMIZE?=-O2
 CFLAGS=${DEBUG} ${CWARNFLAGS} ${CMACHFLAGS} ${COPTIMIZE} 
${COPTS} ${PIPE}
 AFLAGS=-D_LOCORE ${CMACHFLAGS}
-LINKFLAGS= -N -Ttext 100114 -e start --warn-common -nopie
+LINKFLAGS= -N -T ld.script --warn-common -nopie
 
 .if ${MACHINE} == "powerpc64"
 CFLAGS+=   -m32
Index: arch/macppc/conf/ld.script
===
RCS file: /cvs/src/sys/arch/macppc/conf/ld.script,v
retrieving revision 1.1
diff -u -p -r1.1 ld.script
--- arch/macppc/conf/ld.script  13 Jun 2017 01:42:52 -  1.1
+++ arch/macppc/conf/ld.script  6 May 2021 20:01:08 -
@@ -0,0 +1,68 @@
+/* $OpenBSD: ld.script,v 1.4 2020/07/18 13:16:32 kettenis Exp $*/
+
+/*
+ * Copyright (c) 2013 Mark Kettenis 
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE 

macppc bsd.mp pmap's hash lock

2021-05-06 Thread George Koehler
Hello tech list,

If you have a macppc with more than one cpu, I would like you to try
this diff in the GENERIC.MP kernel.  I am running it on a dual G5
(without radeondrm and not running X).  I don't know whether I want to
commit this diff.

In late April, my G5's kernel froze very early during boot, while
trying to map the framebuffer.  The problem went away after reordering
and relinking the kernel.  I kept a copy of the bad kernel.  I found
the problem on Tuesday: __ppc_lock() crossed a page boundary.

$ nm -n /bsd.crash | grep __ppc_lock
$ objdump -dlr --start-ad=0x27bf8c /bsd.crash|less

The disassembly had 0x27fbf8c <= __ppc_lock < 0x27c058, so it crossed
pages at 0x27c000; page size = 0x1000 = 4096.  On a G5, the kernel
lazily faults in its own pages.  The page fault at 0x27c000 inside
__ppc_lock caused a recursive call to __ppc_lock.  I believe that the
fault happened here in __ppc_lock:

s = ppc_intr_disable();
if (atomic_cas_ulong(>mpl_count, 0, 1) == 0) {
membar_enter();
mpl->mpl_cpu = curcpu();
}

if (mpl->mpl_cpu == curcpu()) {
//-->   // page fault!  recursive call to __ppc_lock
mpl->mpl_count++;
ppc_intr_enable(s);

This is bad, because the lock is not in a valid state when the page
fault happens.  The code tries ppc_intr_disable to protect the lock,
but this doesn't disable page faults.

The lock is a recursive spinlock that protects the pmap's hash table.
My bad kernel tried to grab the lock to insert the 1st page of the
framebuffer into the hash, and then tried to recursively grab the lock
to insert page 0x27c000 into the hash.  I debugged the problem by
copying the bad kernel and overwriting some asm to insert some extra
printf()s.  The problem went away when my asm caused an earlier access
to page 0x27c000.

When I reorder the kernel, __ppc_lock gets a different address, and
probably doesn't cross pages with a not-valid lock; but I can force
a page fault this way:

__asm volatile("b 1f; . = . + 4096; 1:");

If I insert this asm at my above "//-->" and compile GENERIC.MP, then
it reproduces the freezing problem on my G5.

The problem doesn't happen on a G3 or G4, where the kernel uses block
address translation (!ppc_nobat); I copied my bad kernel to a G4 and
it didn't freeze there.  The problem doesn't happen in bsd.sp, where
the lock doesn't exist.  Our powerpc64 kernel doesn't fault in its own
pages this way.  I have observed the problem only with macppc on G5.

In this diff, I try to fix the problem by shrinking the lock to 32
bits, and using 32-bit atomic ops to keep the lock in a valid state.
This __ppc_lock is no longer the __mp_lock, but is only the pmap's
hash lock, so I also delete some unused functions.

--George

Index: arch/powerpc/include/mplock.h
===
RCS file: /cvs/src/sys/arch/powerpc/include/mplock.h,v
retrieving revision 1.4
diff -u -p -r1.4 mplock.h
--- arch/powerpc/include/mplock.h   15 Apr 2020 08:09:00 -  1.4
+++ arch/powerpc/include/mplock.h   6 May 2021 20:01:08 -
@@ -30,13 +30,13 @@
 #define __USE_MI_MPLOCK
 
 /*
+ * __ppc_lock exists because pte_spill_r() can't use __mp_lock.
  * Really simple spinlock implementation with recursive capabilities.
  * Correctness is paramount, no fancyness allowed.
  */
 
 struct __ppc_lock {
-   volatile struct cpu_info *mpl_cpu;
-   volatile long   mpl_count;
+   volatile unsigned int   mpl_bolt;
 };
 
 #ifndef _LOCORE
@@ -44,10 +44,6 @@ struct __ppc_lock {
 void __ppc_lock_init(struct __ppc_lock *);
 void __ppc_lock(struct __ppc_lock *);
 void __ppc_unlock(struct __ppc_lock *);
-int __ppc_release_all(struct __ppc_lock *);
-int __ppc_release_all_but_one(struct __ppc_lock *);
-void __ppc_acquire_count(struct __ppc_lock *, int);
-int __ppc_lock_held(struct __ppc_lock *, struct cpu_info *);
 
 #endif
 
Index: arch/powerpc/powerpc/lock_machdep.c
===
RCS file: /cvs/src/sys/arch/powerpc/powerpc/lock_machdep.c,v
retrieving revision 1.9
diff -u -p -r1.9 lock_machdep.c
--- arch/powerpc/powerpc/lock_machdep.c 15 Apr 2020 08:09:00 -  1.9
+++ arch/powerpc/powerpc/lock_machdep.c 6 May 2021 20:01:08 -
@@ -1,6 +1,7 @@
 /* $OpenBSD: lock_machdep.c,v 1.9 2020/04/15 08:09:00 mpi Exp $*/
 
 /*
+ * Copyright (c) 2021 George Koehler 
  * Copyright (c) 2007 Artur Grabowski 
  *
  * Permission to use, copy, modify, and distribute this software for any
@@ -22,15 +23,29 @@
 #include 
 
 #include 
-#include 
 
 #include 
 
+/*
+ * If __ppc_lock() crosses a page boundary in the kernel text, then it
+ * may cause a page fault (on G5 with ppc_nobat), and pte_spill_r()
+ * would recursively call __ppc_lock().  The lock must be in a valid
+ * state when the page fault happens.

Re: patch: make ld.so aware of pthread_key_create destructor - Re: multimedia/mpv debug vo=gpu crash on exit

2021-05-06 Thread Anindya Mukherjee
On Thu, May 06, 2021 at 08:00:56AM -0600, Todd C. Miller wrote:
> On Thu, 06 May 2021 09:32:28 +0200, Sebastien Marie wrote:
> 
> > We already take care of such situation with __cxa_thread_atexit_impl
> > (in libc/stdlib/thread_atexit.c), by keeping an additionnal reference
> > on object loaded (it makes ld.so aware that it is still used and so
> > dlclose() doesn't unload it).
> >
> > I used the same idiom for pthread_key_create() and used dlctl(3) in
> > the same way with the destructor address.
> 
> This will set STAT_NODELETE so the DSO will never really get unloaded.
> That's not a problem for atexit() since the process is headed for
> the exit.
> 
> I'm less sure about using it here since we don't have a way to
> unreference the DSO upon pthread_key_delete().
> 
>  - todd

I did a quick investigation on my Linux machine and there mpv seems to
be using libEGL_mesa.so instead of iris_dri.so. In this case I am not
seeing a call to pthread_key_create at the start of video playback
(there are some other places where pthread_key_create is called from but
they don't cause a problem). So, not sure what happens in Linux when
iris_dri.so is used. However, the Linux implementation of
pthread_key_create seems to also not increment the refcount when the
destructor is set so I don't yet see how it's solved there, assuming
iris_dri.so behaves identically.

Regards,
Anindya



Re: [Patch] Add support for Realtek RTL8111FP-CG Ethernet Controller

2021-05-06 Thread Jonathan Gray
On Thu, May 06, 2021 at 12:38:17PM -0400, Stephen Taylor wrote:
> This is my first post to the mailing list and my first time customizing 
> the kernel. I apologize in advance for mistakes I make.
> 
> I have a Lenovo ThinkCenter M75n Nano IoT computer. It uses the Realtek 
> RTL8111FP-CG Ethernet controller, as do other similar small form factor 
> computers. OpenBSD 6.9 does not recognize the controller. dmesg shows:
> 
> re0 at pci2 dev 0 function 1 "Realtek 8168" rev 0x1a: unknown ASIC 
> (0x5480), msi, address 00:00:00:00
> 
> Studying similar posts in the mailing lists, I made the changes shown
> in the diffs below, compiled the kernel, and now OpenBSD recognizes 
> and can use the Ethernet.
> 
> Would someone be willing to review and/or commit this? Thank you!

Thanks, committed.  This id is also used by RTL8117.

> 
>  
> Index: re.c
> ===
> RCS file: /cvs/src/sys/dev/ic/re.c,v
> retrieving revision 1.208
> diff -u -p -u -r1.208 re.c
> --- re.c  12 Dec 2020 11:48:52 -  1.208
> +++ re.c  6 May 2021 16:13:43 -
> @@ -248,6 +248,7 @@ static const struct re_revision {
>   { RL_HWREV_8168E,   "RTL8168E/8111E" },
>   { RL_HWREV_8168E_VL,"RTL8168E/8111E-VL" },
>   { RL_HWREV_8168EP,  "RTL8168EP/8111EP" },
> + { RL_HWREV_8168FP,  "RTL8168FP/8111FP" },
>   { RL_HWREV_8169,"RTL8169" },
>   { RL_HWREV_8169_8110SB, "RTL8169/8110SB" },
>   { RL_HWREV_8169_8110SBL, "RTL8169SBL" },
> @@ -754,6 +755,7 @@ re_attach(struct rl_softc *sc, const cha
>   sc->rl_max_mtu = RL_JUMBO_MTU_9K;
>   break;
>   case RL_HWREV_8168EP:
> + case RL_HWREV_8168FP:
>   case RL_HWREV_8168G:
>   case RL_HWREV_8168GU:
>   case RL_HWREV_8168H:
> Index: rtl81x9reg.h
> ===
> RCS file: /cvs/src/sys/dev/ic/rtl81x9reg.h,v
> retrieving revision 1.101
> diff -u -p -u -r1.101 rtl81x9reg.h
> --- rtl81x9reg.h  11 Apr 2018 08:02:18 -  1.101
> +++ rtl81x9reg.h  6 May 2021 16:13:43 -
> @@ -210,6 +210,7 @@
>  #define RL_HWREV_8168EP  0x5000
>  #define RL_HWREV_8168GU  0x5080
>  #define RL_HWREV_8168H   0x5400
> +#define RL_HWREV_8168FP  0x5480
>  #define RL_HWREV_8411B   0x5c80  
>  #define RL_HWREV_81390x6000
>  #define RL_HWREV_8139A   0x7000
> 
> 



[Patch] Add support for Realtek RTL8111FP-CG Ethernet Controller

2021-05-06 Thread Stephen Taylor
This is my first post to the mailing list and my first time customizing 
the kernel. I apologize in advance for mistakes I make.

I have a Lenovo ThinkCenter M75n Nano IoT computer. It uses the Realtek 
RTL8111FP-CG Ethernet controller, as do other similar small form factor 
computers. OpenBSD 6.9 does not recognize the controller. dmesg shows:

re0 at pci2 dev 0 function 1 "Realtek 8168" rev 0x1a: unknown ASIC 
(0x5480), msi, address 00:00:00:00

Studying similar posts in the mailing lists, I made the changes shown
in the diffs below, compiled the kernel, and now OpenBSD recognizes 
and can use the Ethernet.

Would someone be willing to review and/or commit this? Thank you!

 
Index: re.c
===
RCS file: /cvs/src/sys/dev/ic/re.c,v
retrieving revision 1.208
diff -u -p -u -r1.208 re.c
--- re.c12 Dec 2020 11:48:52 -  1.208
+++ re.c6 May 2021 16:13:43 -
@@ -248,6 +248,7 @@ static const struct re_revision {
{ RL_HWREV_8168E,   "RTL8168E/8111E" },
{ RL_HWREV_8168E_VL,"RTL8168E/8111E-VL" },
{ RL_HWREV_8168EP,  "RTL8168EP/8111EP" },
+   { RL_HWREV_8168FP,  "RTL8168FP/8111FP" },
{ RL_HWREV_8169,"RTL8169" },
{ RL_HWREV_8169_8110SB, "RTL8169/8110SB" },
{ RL_HWREV_8169_8110SBL, "RTL8169SBL" },
@@ -754,6 +755,7 @@ re_attach(struct rl_softc *sc, const cha
sc->rl_max_mtu = RL_JUMBO_MTU_9K;
break;
case RL_HWREV_8168EP:
+   case RL_HWREV_8168FP:
case RL_HWREV_8168G:
case RL_HWREV_8168GU:
case RL_HWREV_8168H:
Index: rtl81x9reg.h
===
RCS file: /cvs/src/sys/dev/ic/rtl81x9reg.h,v
retrieving revision 1.101
diff -u -p -u -r1.101 rtl81x9reg.h
--- rtl81x9reg.h11 Apr 2018 08:02:18 -  1.101
+++ rtl81x9reg.h6 May 2021 16:13:43 -
@@ -210,6 +210,7 @@
 #define RL_HWREV_8168EP0x5000
 #define RL_HWREV_8168GU0x5080
 #define RL_HWREV_8168H 0x5400
+#define RL_HWREV_8168FP0x5480
 #define RL_HWREV_8411B 0x5c80  
 #define RL_HWREV_8139  0x6000
 #define RL_HWREV_8139A 0x7000



Re: services(5): add default ftps ports

2021-05-06 Thread Jan Klemkow
On Thu, May 06, 2021 at 06:36:52PM +0200, Mark Kettenis wrote:
> > From: "Theo de Raadt" 
> > Date: Thu, 06 May 2021 10:26:31 -0600
> > 
> > Jan Klemkow  wrote:
> > 
> > > On Wed, May 05, 2021 at 12:18:43PM -0600, Theo de Raadt wrote:
> > > > I would like a further justification for removing these ports from
> > > > the very limited dynamic reserved space used by bindresvport.
> > > > 
> > > > (but not by rresvport, which appears still stomp over them)
> > > > 
> > > > For tcp, 32 of the 512 are locked out.
> > > > For udp, 19.
> > > > 
> > > > What software is actually using these ports?
> > > > 
> > > > Is that software irrelevant these days?
> > > 
> > > I'm working on a diff to bring ftps with libtls into our ftpd(8).  There
> > > is a "getaddrinfo(NULL, "ftps", , )" call, which uses this
> > > port.  Thus, I made this change.
> > 
> > Hang on -- does the world want ftps support?

I don't know, what "the world" wants.  But, I want ftps.  As far as I
can see, ftps is the only way to bring our ftpd(8) into the 21st
century.

I use ftp in my private local setup.  I also want to use over public
internet in the future, like I did in the past.  Thats why I'm working
on it.
 
> I was going to ask the same thing.  I mean even with encryption the
> FTP protocol still is a bad idea given all the problems with NAT
> traversal and such.

In don't use NAT or packet filters in my setup.  With IPv6 there is no
active FTP problem.



Re: remove gccism from com(4)

2021-05-06 Thread Mark Kettenis
> Date: Thu, 6 May 2021 19:56:57 +
> From: Miod Vallat 
> 
> `return f()' when f is a void function is not allowed by the C standard
> but is a gcc extension.

Thanks.  I probably inadvertedly copied that from the read code.
Committed.


> Index: com.c
> ===
> RCS file: /OpenBSD/src/sys/dev/ic/com.c,v
> retrieving revision 1.173
> diff -u -p -r1.173 com.c
> --- com.c 14 Aug 2020 18:14:11 -  1.173
> +++ com.c 6 May 2021 19:55:13 -
> @@ -1609,9 +1609,9 @@ com_write_reg(struct com_softc *sc, bus_
>   reg <<= sc->sc_reg_shift;
>  
>   if (sc->sc_reg_width == 4)
> - return bus_space_write_4(sc->sc_iot, sc->sc_ioh, reg, value);
> + bus_space_write_4(sc->sc_iot, sc->sc_ioh, reg, value);
>   else
> - return bus_space_write_1(sc->sc_iot, sc->sc_ioh, reg, value);
> + bus_space_write_1(sc->sc_iot, sc->sc_ioh, reg, value);
>  }
>  
>  #ifdef COM_CONSOLE
> @@ -1636,9 +1636,9 @@ comcn_write_reg(bus_size_t reg, uint8_t 
>   reg <<= comcons_reg_shift;
>  
>   if (comcons_reg_width == 4)
> - return bus_space_write_4(comconsiot, comconsioh, reg, value);
> + bus_space_write_4(comconsiot, comconsioh, reg, value);
>   else
> - return bus_space_write_1(comconsiot, comconsioh, reg, value);
> + bus_space_write_1(comconsiot, comconsioh, reg, value);
>  }
>  
>  #endif
> 
> 



Re: services(5): add default ftps ports

2021-05-06 Thread Jan Klemkow
On Wed, May 05, 2021 at 12:18:43PM -0600, Theo de Raadt wrote:
> I would like a further justification for removing these ports from
> the very limited dynamic reserved space used by bindresvport.
> 
> (but not by rresvport, which appears still stomp over them)
> 
> For tcp, 32 of the 512 are locked out.
> For udp, 19.
> 
> What software is actually using these ports?
> 
> Is that software irrelevant these days?

I'm working on a diff to bring ftps with libtls into our ftpd(8).  There
is a "getaddrinfo(NULL, "ftps", , )" call, which uses this
port.  Thus, I made this change.

> Jan Klemkow  wrote:
> > On Wed, May 05, 2021 at 11:09:12AM +0100, Stuart Henderson wrote:
> > > On 2021/05/04 12:07, Jan Klemkow wrote:
> > > > Add missing ftps defaults ports to servies(5).
> > > > 
> > > > Index: services
> > > > ===
> > > > RCS file: /cvs/src/etc/services,v
> > > > retrieving revision 1.99
> > > > diff -u -p -r1.99 services
> > > > --- services18 Feb 2021 02:30:29 -  1.99
> > > > +++ services4 May 2021 10:01:35 -
> > > > @@ -318,6 +318,10 @@ krb_prop   754/tcp hprop   # 
> > > > Kerberos slav
> > > >  krbupdate  760/tcp kreg# BSD Kerberos 
> > > > registration
> > > >  supfilesrv 871/tcp # SUP server
> > > >  swat   901/tcp # Samba Web 
> > > > Administration Tool
> > > > +ftps-data  989/tcp # ftp data over TLS/SSL
> > > > +ftps-data  989/udp # ftp data over TLS/SSL
> > > > +ftps   990/tcp # ftp control over 
> > > > TLS/SSL
> > > > +ftps   990/udp # ftp control over 
> > > > TLS/SSL
> > > 
> > > I'm OK with adding the TCP ones (though ftp-over-tls always makes me
> > > want to rant...). It's not going to run on UDP though so I think those
> > > should not be added.
> > 
> > OK?
> > 
> > Index: services
> > ===
> > RCS file: /cvs/src/etc/services,v
> > retrieving revision 1.99
> > diff -u -p -r1.99 services
> > --- services18 Feb 2021 02:30:29 -  1.99
> > +++ services5 May 2021 12:24:29 -
> > @@ -318,6 +318,8 @@ krb_prop754/tcp hprop   # 
> > Kerberos slav
> >  krbupdate  760/tcp kreg# BSD Kerberos registration
> >  supfilesrv 871/tcp # SUP server
> >  swat   901/tcp # Samba Web 
> > Administration Tool
> > +ftps-data  989/tcp # ftp data over TLS
> > +ftps   990/tcp # ftp control over TLS
> >  supfiledbg 1127/tcp# SUP debugging
> >  support1529/tcp# GNATS, cygnus bug 
> > tracker
> >  datametrics1645/udp
> > 
> 



remove gccism from com(4)

2021-05-06 Thread Miod Vallat
`return f()' when f is a void function is not allowed by the C standard
but is a gcc extension.

Index: com.c
===
RCS file: /OpenBSD/src/sys/dev/ic/com.c,v
retrieving revision 1.173
diff -u -p -r1.173 com.c
--- com.c   14 Aug 2020 18:14:11 -  1.173
+++ com.c   6 May 2021 19:55:13 -
@@ -1609,9 +1609,9 @@ com_write_reg(struct com_softc *sc, bus_
reg <<= sc->sc_reg_shift;
 
if (sc->sc_reg_width == 4)
-   return bus_space_write_4(sc->sc_iot, sc->sc_ioh, reg, value);
+   bus_space_write_4(sc->sc_iot, sc->sc_ioh, reg, value);
else
-   return bus_space_write_1(sc->sc_iot, sc->sc_ioh, reg, value);
+   bus_space_write_1(sc->sc_iot, sc->sc_ioh, reg, value);
 }
 
 #ifdef COM_CONSOLE
@@ -1636,9 +1636,9 @@ comcn_write_reg(bus_size_t reg, uint8_t 
reg <<= comcons_reg_shift;
 
if (comcons_reg_width == 4)
-   return bus_space_write_4(comconsiot, comconsioh, reg, value);
+   bus_space_write_4(comconsiot, comconsioh, reg, value);
else
-   return bus_space_write_1(comconsiot, comconsioh, reg, value);
+   bus_space_write_1(comconsiot, comconsioh, reg, value);
 }
 
 #endif



Re: services(5): add default ftps ports

2021-05-06 Thread Theo de Raadt
Jan Klemkow  wrote:

> > > > I'm working on a diff to bring ftps with libtls into our ftpd(8).  There
> > > > is a "getaddrinfo(NULL, "ftps", , )" call, which uses this
> > > > port.  Thus, I made this change.
> > > 
> > > Hang on -- does the world want ftps support?
> 
> I don't know, what "the world" wants.  But, I want ftps.  As far as I
> can see, ftps is the only way to bring our ftpd(8) into the 21st
> century.

I have a really hard time with that.

The protocol is completely broken, and in a way that adding TLS makes it
even worse.




Re: rpki-client don't clobber errno in mkpath

2021-05-06 Thread Theo de Raadt
I'm confused.

Who has a free() that clobbers errno?


Claudio Jeker  wrote:

> Noticed while looking at the same version in rsync. free() may clobber
> errno so better save the value before calling free().
> Also update the comment, remove all those arguments I removed :)
> 
> -- 
> :wq Claudio
> 
> Index: mkdir.c
> ===
> RCS file: /cvs/src/usr.sbin/rpki-client/mkdir.c,v
> retrieving revision 1.6
> diff -u -p -r1.6 mkdir.c
> --- mkdir.c   29 Mar 2021 04:01:17 -  1.6
> +++ mkdir.c   6 May 2021 16:41:16 -
> @@ -39,15 +39,13 @@
>  
>  /*
>   * mkpath -- create directories.
> - *   path - path
> - *   mode - file mode of terminal directory
> - *   dir_mode - file mode of intermediate directories
> + *   dir - path to create directories for
>   */
>  int
>  mkpath(const char *dir)
>  {
>   char *path, *slash;
> - int done;
> + int done, save_errno;
>  
>   if ((path = strdup(dir)) == NULL)
>   return -1;
> @@ -61,7 +59,9 @@ mkpath(const char *dir)
>   *slash = '\0';
>  
>   if (mkdir(path, 0755) == -1 && errno != EEXIST) {
> + save_errno = errno;
>   free(path);
> + errno = save_errno;
>   return -1;
>   }
>  
> 



rpki-client don't clobber errno in mkpath

2021-05-06 Thread Claudio Jeker
Noticed while looking at the same version in rsync. free() may clobber
errno so better save the value before calling free().
Also update the comment, remove all those arguments I removed :)

-- 
:wq Claudio

Index: mkdir.c
===
RCS file: /cvs/src/usr.sbin/rpki-client/mkdir.c,v
retrieving revision 1.6
diff -u -p -r1.6 mkdir.c
--- mkdir.c 29 Mar 2021 04:01:17 -  1.6
+++ mkdir.c 6 May 2021 16:41:16 -
@@ -39,15 +39,13 @@
 
 /*
  * mkpath -- create directories.
- * path - path
- * mode - file mode of terminal directory
- * dir_mode - file mode of intermediate directories
+ * dir - path to create directories for
  */
 int
 mkpath(const char *dir)
 {
char *path, *slash;
-   int done;
+   int done, save_errno;
 
if ((path = strdup(dir)) == NULL)
return -1;
@@ -61,7 +59,9 @@ mkpath(const char *dir)
*slash = '\0';
 
if (mkdir(path, 0755) == -1 && errno != EEXIST) {
+   save_errno = errno;
free(path);
+   errno = save_errno;
return -1;
}
 



Re: timeout_del_barrier(9): remove kernel lock

2021-05-06 Thread Scott Cheloha
On Wed, May 05, 2021 at 11:05:06AM +1000, David Gwynne wrote:
> On Tue, May 04, 2021 at 11:54:31AM -0500, Scott Cheloha wrote:
> > 
> > [...]
> > 
> > Here is where I get confused.
> > 
> > Why do I want to wait for *all* timeouts in the queue to finish
> > running?
> 
> You don't, you just want to know that whatever timeout was running
> has finished cos you don't know if that currently running timeout is
> yours or not.
> 
> > My understanding of the timeout_del_barrier(9) use case was as
> > follows:
> > 
> > 1. I have a dynamically allocated timeout struct.  The timeout is
> >scheduled to execute at some point in the future.
> > 
> > 2. I want to free the memory allocated to for the timeout.
> > 
> > 3. To safely free the memory I need to ensure the timeout
> >is not executing.  Until the timeout function, i.e.
> >to->to_func(), has returned it isn't necessarily safe to
> >free that memory.
> > 
> > 4. If I call timeout_del_barrier(9), it will only return if the
> >timeout in question is not running.  Assuming the timeout itself
> >is not a periodic timeout that reschedules itself we can then
> >safely free the memory.
> 
> Barriers aren't about references to timeouts, they're about references
> to the work associated with a timeout.
> 
> If you only cared about the timeout struct itself, then you can get
> all the information about whether it's referenced by the timeout
> queues from the return values from timeout_add and timeout_del.
> timeout_add() returns 1 if the timeout subsystem takes the reference
> to the timeout. If the timeout is already scheduled it returns 0
> because it's already got a reference. timeout_del returns 1 if the
> timeout itself was scheduled and it removed it, therefore giving
> up it's reference. If you timeout_del and it returns 0, then it
> wasn't on a queue inside timeouts. Easy.
> 
> What timeout_add and timeout_del don't tell you is whether the work
> referenced by the timeout is currently running. The timeout runners
> copy the function and argument onto the stack when they dequeue a
> timeout to run, and give up the reference to the timeout when the
> mutex around the timeout wheels/cirqs is released. However, the argument
> is still referenced on the stack and the function to process it may be
> about to run or is running. You can't tell that from the timeout_add/del
> return values though.
> 
> We provide two ways to deal with that. One is you have reference
> counters (or similar) on the thing you're deferring to a timeout.
> The other is you use a barrier so you know the work you deferred
> isn't on the timeout runners stack anymore because it's moved on
> to run the barrier work.
> 
> This is consistent with tasks and interrupts.
> 
> > Given this understanding, my approach was to focus on the timeout in
> > question.  So my code just spins until it the timeout is no longer
> > running.
> > 
> > But now I think I am mistaken.
> > 
> > IIUC you're telling me (and showing me, in code) that the goal of
> > timeout_barrier() is to wait for the *whole* softclock() to return,
> > not just the one timeout.
> 
> No, just the currently running timeout.
> 
> Putting the barrier timeout on the end of the proc and todo cirqs
> is because there's no CIRCQ_INSERT_HEAD I can use right now.
> 
> > Why do I want to wait for the whole softclock() or softclock_thread()?
> > Why not just wait for the one timeout to finish?
> 
> Cos I was too lazy to write CIRCQ_INSERT_HEAD last night :D

Okay.  After thinking this over I'm pretty sure we are skinning the
same cat in two different ways.

Your cond_wait(9) approach is fine for both timeout types because
timeout_del_barrier(9) is only safe to call from process context.
When I wrote my first diff I was under the impression it was safe to
call timeout_del_barrier(9) from interrupt context.  But I reread the
manpage and now I see that this is not the case, my bad.

> [...]
> 
> This discussion has some relevance to taskqs too. I was also lazy when I
> implemented taskq_barrier and used task_add to put the barrier onto the
> list of work, which meant it got added to the end. Now DRM relies on
> this behaviour. Maybe I should have pushed to commit the diff below.
> 
> This diff lets taskq barriers run as soon as possible, but adds compat
> to the drm workqueue stuff so they can wait for all pending work to
> finish as they expect.
> 
> P.S. don't call timeout_del from inside a timeout handler. I still think
> timeout_set_proc was a mistake.
> 
> [...]

Let's pick this whole train of thought (taskq, etc.) up in a
different thread.  One thing at a time.

--

Updated patch:

- Keep timeout_barrier(9).

- In timeout_barrier(9) use the barrier timeout for both normal and
  process-context timeouts.  All callers use cond_wait(9) now.

- Remove the kernel lock from the non-process path.  I assume I don't
  need to splx(splschedclock()) anymore?

- Rename "timeout_proc_barrier()" to "timeout_barrier_timeout()"
  

Re: services(5): add default ftps ports

2021-05-06 Thread Mark Kettenis
> From: "Theo de Raadt" 
> Date: Thu, 06 May 2021 10:26:31 -0600
> 
> Jan Klemkow  wrote:
> 
> > On Wed, May 05, 2021 at 12:18:43PM -0600, Theo de Raadt wrote:
> > > I would like a further justification for removing these ports from
> > > the very limited dynamic reserved space used by bindresvport.
> > > 
> > > (but not by rresvport, which appears still stomp over them)
> > > 
> > > For tcp, 32 of the 512 are locked out.
> > > For udp, 19.
> > > 
> > > What software is actually using these ports?
> > > 
> > > Is that software irrelevant these days?
> > 
> > I'm working on a diff to bring ftps with libtls into our ftpd(8).  There
> > is a "getaddrinfo(NULL, "ftps", , )" call, which uses this
> > port.  Thus, I made this change.
> 
> Hang on -- does the world want ftps support?

I was going to ask the same thing.  I mean even with encryption the
FTP protocol still is a bad idea given all the problems with NAT
traversal and such.



Re: services(5): add default ftps ports

2021-05-06 Thread Theo de Raadt
Jan Klemkow  wrote:

> On Wed, May 05, 2021 at 12:18:43PM -0600, Theo de Raadt wrote:
> > I would like a further justification for removing these ports from
> > the very limited dynamic reserved space used by bindresvport.
> > 
> > (but not by rresvport, which appears still stomp over them)
> > 
> > For tcp, 32 of the 512 are locked out.
> > For udp, 19.
> > 
> > What software is actually using these ports?
> > 
> > Is that software irrelevant these days?
> 
> I'm working on a diff to bring ftps with libtls into our ftpd(8).  There
> is a "getaddrinfo(NULL, "ftps", , )" call, which uses this
> port.  Thus, I made this change.

Hang on -- does the world want ftps support?



more rsync cleanup

2021-05-06 Thread Claudio Jeker
As noticed by benno@ the blk.blks buffer is leaked in some cases.
Fix those and cleanup up the pre_* functions a bit more.
I increased the diff context a bit to make the diff easier to read.

-- 
:wq Claudio

Index: uploader.c
===
RCS file: /cvs/src/usr.bin/rsync/uploader.c,v
retrieving revision 1.25
diff -u -p -U6 -r1.25 uploader.c
--- uploader.c  6 May 2021 07:35:22 -   1.25
+++ uploader.c  6 May 2021 15:34:36 -
@@ -191,22 +191,24 @@ pre_link(struct upload *p, struct sess *
 * to overwriting with a symbolic link.
 * If it's a non-directory, we just overwrite it.
 */
 
assert(p->rootfd != -1);
rc = fstatat(p->rootfd, f->path, , AT_SYMLINK_NOFOLLOW);
+
+   if (rc == -1 && errno != ENOENT) {
+   ERR("%s: fstatat", f->path);
+   return -1;
+   }
if (rc != -1 && !S_ISLNK(st.st_mode)) {
if (S_ISDIR(st.st_mode) &&
unlinkat(p->rootfd, f->path, AT_REMOVEDIR) == -1) {
ERR("%s: unlinkat", f->path);
return -1;
}
rc = -1;
-   } else if (rc == -1 && errno != ENOENT) {
-   ERR("%s: fstatat", f->path);
-   return -1;
}
 
/*
 * If the symbolic link already exists, then make sure that it
 * points to the correct place.
 */
@@ -294,22 +296,23 @@ pre_dev(struct upload *p, struct sess *s
 * If it replaces a directory, remove the directory first.
 */
 
assert(p->rootfd != -1);
rc = fstatat(p->rootfd, f->path, , AT_SYMLINK_NOFOLLOW);
 
+   if (rc == -1 && errno != ENOENT) {
+   ERR("%s: fstatat", f->path);
+   return -1;
+   }
if (rc != -1 && !(S_ISBLK(st.st_mode) || S_ISCHR(st.st_mode))) {
if (S_ISDIR(st.st_mode) &&
unlinkat(p->rootfd, f->path, AT_REMOVEDIR) == -1) {
ERR("%s: unlinkat", f->path);
return -1;
}
rc = -1;
-   } else if (rc == -1 && errno != ENOENT) {
-   ERR("%s: fstatat", f->path);
-   return -1;
}
 
/* Make sure existing device is of the correct type. */
 
if (rc != -1) {
if ((f->st.mode & (S_IFCHR|S_IFBLK)) !=
@@ -382,22 +385,23 @@ pre_fifo(struct upload *p, struct sess *
 * mark it from replacement.
 */
 
assert(p->rootfd != -1);
rc = fstatat(p->rootfd, f->path, , AT_SYMLINK_NOFOLLOW);
 
+   if (rc == -1 && errno != ENOENT) {
+   ERR("%s: fstatat", f->path);
+   return -1;
+   }
if (rc != -1 && !S_ISFIFO(st.st_mode)) {
if (S_ISDIR(st.st_mode) &&
unlinkat(p->rootfd, f->path, AT_REMOVEDIR) == -1) {
ERR("%s: unlinkat", f->path);
return -1;
}
rc = -1;
-   } else if (rc == -1 && errno != ENOENT) {
-   ERR("%s: fstatat", f->path);
-   return -1;
}
 
if (rc == -1) {
newfifo = 1;
if (mktemplate(, f->path, sess->opts->recursive) == -1) {
ERRX1("mktemplate");
@@ -458,22 +462,23 @@ pre_sock(struct upload *p, struct sess *
 * mark it from replacement.
 */
 
assert(p->rootfd != -1);
rc = fstatat(p->rootfd, f->path, , AT_SYMLINK_NOFOLLOW);
 
+   if (rc == -1 && errno != ENOENT) {
+   ERR("%s: fstatat", f->path);
+   return -1;
+   }
if (rc != -1 && !S_ISSOCK(st.st_mode)) {
if (S_ISDIR(st.st_mode) &&
unlinkat(p->rootfd, f->path, AT_REMOVEDIR) == -1) {
ERR("%s: unlinkat", f->path);
return -1;
}
rc = -1;
-   } else if (rc == -1 && errno != ENOENT) {
-   ERR("%s: fstatat", f->path);
-   return -1;
}
 
if (rc == -1) {
newsock = 1;
if (mktemplate(, f->path, sess->opts->recursive) == -1) {
ERRX1("mktemplate");
@@ -530,13 +535,14 @@ pre_dir(const struct upload *p, struct s
assert(p->rootfd != -1);
rc = fstatat(p->rootfd, f->path, , AT_SYMLINK_NOFOLLOW);
 
if (rc == -1 && errno != ENOENT) {
ERR("%s: fstatat", f->path);
return -1;
-   } else if (rc != -1 && !S_ISDIR(st.st_mode)) {
+   }
+   if (rc != -1 && !S_ISDIR(st.st_mode)) {
ERRX("%s: not a directory", f->path);
return -1;
} else if (rc != -1) {
/*
 * FIXME: we should fchmod the permissions here as well,
 * as we may locally have shut down writing into the
@@ -682,19 +688,21 

Re: patch: make ld.so aware of pthread_key_create destructor - Re: multimedia/mpv debug vo=gpu crash on exit

2021-05-06 Thread Todd C . Miller
On Thu, 06 May 2021 09:32:28 +0200, Sebastien Marie wrote:

> We already take care of such situation with __cxa_thread_atexit_impl
> (in libc/stdlib/thread_atexit.c), by keeping an additionnal reference
> on object loaded (it makes ld.so aware that it is still used and so
> dlclose() doesn't unload it).
>
> I used the same idiom for pthread_key_create() and used dlctl(3) in
> the same way with the destructor address.

This will set STAT_NODELETE so the DSO will never really get unloaded.
That's not a problem for atexit() since the process is headed for
the exit.

I'm less sure about using it here since we don't have a way to
unreference the DSO upon pthread_key_delete().

 - todd



Re: patch: make ld.so aware of pthread_key_create destructor - Re: multimedia/mpv debug vo=gpu crash on exit

2021-05-06 Thread Hiltjo Posthuma
On Thu, May 06, 2021 at 09:32:28AM +0200, Sebastien Marie wrote:
> Hi,
> 
> Anindya, did a good analysis of the problem with mpv using gpu video
> output backend (it is using EGL and mesa if I correctly followed).
> 
> 
> For people not reading ports@ here a resume: the destructor function
> used in pthread_key_create() needs to be present in memory until
> _rthread_tls_destructors() is called.
> 
> in the case of mesa, eglInitialize() function could load, via
> dlopen(), code which will use pthread_key_create() with destructor.
> 
> once dlclose() is called, the object is unloaded from memory, but a
> reference to destructor is kept, leading to segfault when
> _rthread_tls_destructors() run and use the destructor (because
> pointing to unloaded code).
> 
> We already take care of such situation with __cxa_thread_atexit_impl
> (in libc/stdlib/thread_atexit.c), by keeping an additionnal reference
> on object loaded (it makes ld.so aware that it is still used and so
> dlclose() doesn't unload it).
> 
> I used the same idiom for pthread_key_create() and used dlctl(3) in
> the same way with the destructor address.
> 
> With the following diff, I am not able to reproduce the segfault
> anymore with mpv.
> 
> diff 393e7b397988bb6abe46729de1794883d2b9d5cf /home/semarie/repos/openbsd/src
> blob - 58b39bb7df70f54c708d2e2b11a3978806a86005
> file + lib/libc/thread/rthread_tls.c
> --- lib/libc/thread/rthread_tls.c
> +++ lib/libc/thread/rthread_tls.c
> @@ -22,10 +22,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
> -#include 
> -#include 
> -
>  #include "rthread.h"
>  
>  
> @@ -58,6 +56,9 @@ pthread_key_create(pthread_key_t *key, void (*destruct
>   rkeys[hint].used = 1;
>   rkeys[hint].destructor = destructor;
>  
> + if (destructor != NULL)
> + dlctl(NULL, DL_REFERENCE, destructor);
> + 
>   *key = hint++;
>   if (hint >= PTHREAD_KEYS_MAX)
>   hint = 0;
> 
> 
> I am still unsure if we want to solve this problem this way, and if
> this diff would introduce more problems than it tries to solve.
> 
> At first place, it might be better to not register a destructor when
> using dlopen()...
> 
> Comments would be appreciated.
> 
> Thanks.
> -- 
> Sebastien Marie
> 
> 
> On Wed, May 05, 2021 at 01:20:02AM -0700, Anindya Mukherjee wrote:
> > Hi,
> > 
> > I have been investigating the crash on exit problem with mpv in ports
> > with vo=gpu. I think I made a little bit of progress and thought I'd
> > share my findings.
> > 
> > The crash (SIGSEGV) happens when thread local destructors
> > are called from /usr/src/lib/libc/thread/rthread_tls.c:182 in
> > _rthread_tls_destructors after the gpu thread exits: vo_thread in
> > video/out/vo.c:1067. The crashing call stack looks like this:
> > 
> > #0  0x0176ffdc9680 in ?? ()
> > #1  0x017748d347b5 in _rthread_tls_destructors (thread=0x17798917840) 
> > at /usr/src/lib/libc/thread/rthread_tls.c:182
> > #2  0x017748d98623 in _libc_pthread_exit (retval= > variable: Unhandled dwarf expression opcode 0xa3>) at 
> > /usr/src/lib/libc/thread/rthread.c:150
> > #3  0x017795b22189 in _rthread_start (v= > Unhandled dwarf expression opcode 0xa3>) at 
> > /usr/src/lib/librthread/rthread.c:97
> > #4  0x017748d0c5ba in __tfork_thread () at 
> > /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:84
> > 
> > Note that some of the traces were taken from different runs so there
> > might be some mismatch between the handles/addresses.
> > 
> > It crashes because the destructor is dangling. This mystified me because
> > if I look at the mpv source, there is no thread local data for the gpu
> > thread. Indeed, right after the gpu thread starts running if we look
> > inside the thread structure, local_storage is null. However, if we look
> > at the same thread at the point of the crash, its local_storage is
> > populated:
> > 
> > (gdb) p *(*thread).local_storage
> > $3 = {
> >   keyid = 7,
> >   next = 0x177353442e0,
> >   data = 0x1770c276000
> > }
> > 
> > The keys are indexed by the keyid in the rkeys array, from where the
> > destructor is fetched in _rthread_tls_destructors:
> > 
> > (gdb) p rkeys[7]
> > $6 = {
> >   used = 1,
> >   destructor = 0x43dd33e2680
> > }
> > 
> > This destructor now points to invalid memory. It turns out the thread
> > local storage is being initialised here:
> > 
> > #0  _libc_pthread_key_create (key=0x43dd380da08, destructor=0x43dd33e2680) 
> > at /usr/src/lib/libc/thread/rthread_tls.c:42
> > #1  0x043dd33e2667 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
> > #2  0x043e793f82f7 in pthread_once (once_control=0x43dd380d9f8, 
> > init_routine=0x43e793db3c0 <_libc_pthread_key_create>) at 
> > /usr/src/lib/libc/thread/rthread_once.c:26
> > #3  0x043dd33e24bd in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
> > #4  0x043dd305475f in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
> > #5  0x043dd3036c70 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
> > #6  

patch: make ld.so aware of pthread_key_create destructor - Re: multimedia/mpv debug vo=gpu crash on exit

2021-05-06 Thread Sebastien Marie
Hi,

Anindya, did a good analysis of the problem with mpv using gpu video
output backend (it is using EGL and mesa if I correctly followed).


For people not reading ports@ here a resume: the destructor function
used in pthread_key_create() needs to be present in memory until
_rthread_tls_destructors() is called.

in the case of mesa, eglInitialize() function could load, via
dlopen(), code which will use pthread_key_create() with destructor.

once dlclose() is called, the object is unloaded from memory, but a
reference to destructor is kept, leading to segfault when
_rthread_tls_destructors() run and use the destructor (because
pointing to unloaded code).

We already take care of such situation with __cxa_thread_atexit_impl
(in libc/stdlib/thread_atexit.c), by keeping an additionnal reference
on object loaded (it makes ld.so aware that it is still used and so
dlclose() doesn't unload it).

I used the same idiom for pthread_key_create() and used dlctl(3) in
the same way with the destructor address.

With the following diff, I am not able to reproduce the segfault
anymore with mpv.

diff 393e7b397988bb6abe46729de1794883d2b9d5cf /home/semarie/repos/openbsd/src
blob - 58b39bb7df70f54c708d2e2b11a3978806a86005
file + lib/libc/thread/rthread_tls.c
--- lib/libc/thread/rthread_tls.c
+++ lib/libc/thread/rthread_tls.c
@@ -22,10 +22,8 @@
 #include 
 #include 
 #include 
+#include 
 
-#include 
-#include 
-
 #include "rthread.h"
 
 
@@ -58,6 +56,9 @@ pthread_key_create(pthread_key_t *key, void (*destruct
rkeys[hint].used = 1;
rkeys[hint].destructor = destructor;
 
+   if (destructor != NULL)
+   dlctl(NULL, DL_REFERENCE, destructor);
+   
*key = hint++;
if (hint >= PTHREAD_KEYS_MAX)
hint = 0;


I am still unsure if we want to solve this problem this way, and if
this diff would introduce more problems than it tries to solve.

At first place, it might be better to not register a destructor when
using dlopen()...

Comments would be appreciated.

Thanks.
-- 
Sebastien Marie


On Wed, May 05, 2021 at 01:20:02AM -0700, Anindya Mukherjee wrote:
> Hi,
> 
> I have been investigating the crash on exit problem with mpv in ports
> with vo=gpu. I think I made a little bit of progress and thought I'd
> share my findings.
> 
> The crash (SIGSEGV) happens when thread local destructors
> are called from /usr/src/lib/libc/thread/rthread_tls.c:182 in
> _rthread_tls_destructors after the gpu thread exits: vo_thread in
> video/out/vo.c:1067. The crashing call stack looks like this:
> 
> #0  0x0176ffdc9680 in ?? ()
> #1  0x017748d347b5 in _rthread_tls_destructors (thread=0x17798917840) at 
> /usr/src/lib/libc/thread/rthread_tls.c:182
> #2  0x017748d98623 in _libc_pthread_exit (retval= Unhandled dwarf expression opcode 0xa3>) at 
> /usr/src/lib/libc/thread/rthread.c:150
> #3  0x017795b22189 in _rthread_start (v= Unhandled dwarf expression opcode 0xa3>) at 
> /usr/src/lib/librthread/rthread.c:97
> #4  0x017748d0c5ba in __tfork_thread () at 
> /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:84
> 
> Note that some of the traces were taken from different runs so there
> might be some mismatch between the handles/addresses.
> 
> It crashes because the destructor is dangling. This mystified me because
> if I look at the mpv source, there is no thread local data for the gpu
> thread. Indeed, right after the gpu thread starts running if we look
> inside the thread structure, local_storage is null. However, if we look
> at the same thread at the point of the crash, its local_storage is
> populated:
> 
> (gdb) p *(*thread).local_storage
> $3 = {
>   keyid = 7,
>   next = 0x177353442e0,
>   data = 0x1770c276000
> }
> 
> The keys are indexed by the keyid in the rkeys array, from where the
> destructor is fetched in _rthread_tls_destructors:
> 
> (gdb) p rkeys[7]
> $6 = {
>   used = 1,
>   destructor = 0x43dd33e2680
> }
> 
> This destructor now points to invalid memory. It turns out the thread
> local storage is being initialised here:
> 
> #0  _libc_pthread_key_create (key=0x43dd380da08, destructor=0x43dd33e2680) at 
> /usr/src/lib/libc/thread/rthread_tls.c:42
> #1  0x043dd33e2667 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
> #2  0x043e793f82f7 in pthread_once (once_control=0x43dd380d9f8, 
> init_routine=0x43e793db3c0 <_libc_pthread_key_create>) at 
> /usr/src/lib/libc/thread/rthread_once.c:26
> #3  0x043dd33e24bd in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
> #4  0x043dd305475f in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
> #5  0x043dd3036c70 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
> #6  0x043dd30e3ca3 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
> #7  0x043dd30e4b96 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
> #8  0x043dd302d89a in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
> #9  0x043dd3031162 in ?? () from /usr/X11R6/lib/modules/dri/iris_dri.so
> #10 

Re: simplify the openrsync uploader

2021-05-06 Thread Claudio Jeker
On Wed, May 05, 2021 at 11:34:17PM +0200, Sebastian Benoit wrote:
> Claudio Jeker(cje...@diehard.n-r-g.com) on 2021.05.05 17:53:20 +0200:
> > The rsync uploader (what is the generator in rsync) can be simplified and
> > cleaned up a fair bit.
> > 
> > There is some confusion of non-blocking IO on regular files and the idea
> > to poll() between openat() and fstat(). This is all not needed and
> > therefor a lot of the code handling files can be moved into pre_file.
> > This also removes the UPLOAD_READ_LOCAL state since it is no longer
> > needed.
> > 
> > As a little extra gift this diff also plugs a mem leak in rsync_uploader.
> > The mbuf buffer was not freed and so for every file being checksummed it
> > would leak one of those.
> 
> see below about that.
> 
> Other than that its ok.
> 
> > do {
> > msz = pread(*fileinfd, mbuf, blk.len, offs);
> > -   if (msz < 0) {
> > +   if ((size_t)msz != blk.len && (size_t)msz != blk.rem) {
> > ERR("pread");
> > close(*fileinfd);
> > *fileinfd = -1;
> > +   free(mbuf);
> 
> Isnt a free(blk.blks) needed as well, here and in more spots below?

Will investigate but I think that will be a different diff then.
 
> > return -1;
> > }
> > -   if ((size_t)msz != blk.len && (size_t)msz != blk.rem) {
> > -   /* short read, try again */
> > -   continue;
> > -   }
> > init_blk([i], , offs, i, mbuf, sess);
> > offs += blk.len;
> > LOG3(
> > @@ -947,6 +935,7 @@ rsync_uploader(struct upload *u, int *fi
> > i++;
> > } while (i < blk.blksz);
> >  
> > +   free(mbuf);
> > close(*fileinfd);
> > *fileinfd = -1;
> > LOG3("%s: mapped %jd B with %zu blocks",
> > 
> 

-- 
:wq Claudio