[v5] amd64: simplify TSC sync testing

2022-07-30 Thread Scott Cheloha
Hi,

At the urging of sthen@ and dv@, here is v5.

Two major changes from v4:

- Add the function tc_reset_quality() to kern_tc.c and use it
  to lower the quality of the TSC timecounter if we fail the
  sync test.

  tc_reset_quality() will choose a new active timecounter if,
  after the quality change, the given timecounter is no longer
  the best timecounter.

  The upshot is: if you fail the TSC sync test you should boot
  with the HPET as your active timecounter.  If you don't have
  an HPET you'll be using something else.

- Drop the SMT accomodation from the hot loop.  It hasn't been
  necessary since last year when I rewrote the test to run without
  a mutex.  In the rewritten test, the two CPUs in the hot loop
  are not competing for any resources so they should not be able
  to starve one another.

dv: Could you double-check that this still chooses the right
timecounter on your machine?  If so, I will ask deraadt@ to
put this into snaps to replace v4.

Additional test reports are welcome.  Include your dmesg.

--

I do not see much more I can do to improve this patch.

I am seeking patch review and OKs.

I am especially interested in whether my assumptions in tsc_ap_test()
and tsc_bp_test() are correct.  The whole patch depends on those
assumptions.  Is this a valid way to test for TSC desync?  Or am I
missing membar_producer()/membar_consumer() calls?

Here is the long version of "what" and "why" for this patch.

The patch is attached at the end.

- Computing a per-CPU TSC skew value is error-prone, especially
  on multisocket machines and VMs.  My best guess is that larger
  latencies appear to the skew measurement test as TSC desync,
  and so the TSC is demoted to a kernel timecounter on these
  machines or marked non-monotonic.

  This patch eliminates per-CPU TSC skew values.  Instead of trying
  to measure and correct for TSC desync we only try to detect desync,
  which is less error-prone.  This approach should allow a wider
  variety of machines to use the TSC as a timecounter when running
  OpenBSD.

- In the new sync test, both CPUs repeatedly try to detect whether
  their TSC is trailing the other CPU's TSC.  The upside to this
  approach is that it yields no false positives (if my assumptions
  about AMD64 memory access and instruction serialization are correct).
  The downside to this approach is that it takes more time than the
  current skew measurement test.  Each test round takes 1ms, and
  we run up to two rounds per CPU, so this patch slows boot down
  by 2ms per AP.

- If any CPU fails the sync test, the TSC is marked non-monotonic
  and a different timecounter is activated.  The TC_USER flag
  remains intact.  There is no "middle ground" where we fall back
  to only using the TSC in the kernel.

- Because there is no per-CPU skew value, there is also no concept
  of TSC drift anymore.

- Before running the test, we check for the IA32_TSC_ADJUST
  register and reset it if necessary.  This is a trivial way
  to work around firmware bugs that desync the TSC before we
  reach the kernel.

  Unfortunately, at the moment this register appears to only
  be available on Intel processors and I cannot find an equivalent
  but differently-named MSR for AMD processors.

--

Index: sys/arch/amd64/amd64/tsc.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v
retrieving revision 1.24
diff -u -p -r1.24 tsc.c
--- sys/arch/amd64/amd64/tsc.c  31 Aug 2021 15:11:54 -  1.24
+++ sys/arch/amd64/amd64/tsc.c  31 Jul 2022 03:06:39 -
@@ -36,13 +36,6 @@ int  tsc_recalibrate;
 uint64_t   tsc_frequency;
 inttsc_is_invariant;
 
-#defineTSC_DRIFT_MAX   250
-#define TSC_SKEW_MAX   100
-int64_ttsc_drift_observed;
-
-volatile int64_t   tsc_sync_val;
-volatile struct cpu_info   *tsc_sync_cpu;
-
 u_int  tsc_get_timecount(struct timecounter *tc);
 void   tsc_delay(int usecs);
 
@@ -236,22 +229,12 @@ cpu_recalibrate_tsc(struct timecounter *
 u_int
 tsc_get_timecount(struct timecounter *tc)
 {
-   return rdtsc_lfence() + curcpu()->ci_tsc_skew;
+   return rdtsc_lfence();
 }
 
 void
 tsc_timecounter_init(struct cpu_info *ci, uint64_t cpufreq)
 {
-#ifdef TSC_DEBUG
-   printf("%s: TSC skew=%lld observed drift=%lld\n", ci->ci_dev->dv_xname,
-   (long long)ci->ci_tsc_skew, (long long)tsc_drift_observed);
-#endif
-   if (ci->ci_tsc_skew < -TSC_SKEW_MAX || ci->ci_tsc_skew > TSC_SKEW_MAX) {
-   printf("%s: disabling user TSC (skew=%lld)\n",
-   ci->ci_dev->dv_xname, (long long)ci->ci_tsc_skew);
-   tsc_timecounter.tc_user = 0;
-   }
-
if (!(ci->ci_flags & CPUF_PRIMARY) ||
!(ci->ci_flags & CPUF_CONST_TSC) ||
!(ci->ci_flags & CPUF_INVAR_TSC))
@@ -268,111 +251,264 @@ tsc_timecounter_init(struct cpu_info *ci
calibrate_tsc_freq();
  

Re: echo(1): check for stdio errors

2022-07-30 Thread Todd C . Miller
On Sat, 30 Jul 2022 18:19:02 -0500, Scott Cheloha wrote:

> Bump.  The standard's error cases for fflush(3) are identical to those
> for fclose(3):
>
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/fflush.html
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/fclose.html
>
> Is the fact that our fclose(3) can succeed even if the error flag is
> set a bug?

As far as I can tell, neither fflush() nor fclose() check the status
of the error flag, though they may set it of course.  That is why
I was suggesting an explicit ferror() call at the end.

 - todd



Re: echo(1): check for stdio errors

2022-07-30 Thread Scott Cheloha
On Mon, Jul 11, 2022 at 01:27:23PM -0500, Scott Cheloha wrote:
> On Mon, Jul 11, 2022 at 08:31:04AM -0600, Todd C. Miller wrote:
> > On Sun, 10 Jul 2022 20:58:35 -0900, Philip Guenther wrote:
> > 
> > > Three thoughts:
> > > 1) Since stdio errors are sticky, is there any real advantage to checking
> > > each call instead of just checking the final fclose()?
> 
> My thinking was that we have no idea how many arguments we're going to
> print, so we may as well fail as soon as possible.
> 
> Maybe in more complex programs there would be a code-length or
> complexity-reducing upside to deferring the ferror(3) check until,
> say, the end of a subroutine or something.
> 
> > > [...]
> > 
> > Will that really catch all errors?  From what I can tell, fclose(3)
> > can succeed even if the error flag was set.  The pattern I prefer
> > is to use a final fflush(3) followed by a call to ferror(3) before
> > the fclose(3).
> 
> [...]

Bump.  The standard's error cases for fflush(3) are identical to those
for fclose(3):

https://pubs.opengroup.org/onlinepubs/9699919799/functions/fflush.html
https://pubs.opengroup.org/onlinepubs/9699919799/functions/fclose.html

Is the fact that our fclose(3) can succeed even if the error flag is
set a bug?

Also, can I go ahead with this?  With this patch, echo(1) fails if we
(for example) try to write to a full file system.  So we are certainly
catching more stdio failures:

$ /bin/echo test > /tmp/myfile

/tmp: write failed, file system is full
$ echo $?
0

$ obj/echo test > /tmp/myfile

/tmp: write failed, file system is full
echo: stdout: No space left on device
$ echo $?
1

Progress!  Note that the shell builtin already fails in this case:

$ type echo
echo is a shell builtin
$ echo test > /tmp/myfile

/tmp: write failed, file system is full
jetsam$ echo $?
1

Index: echo.c
===
RCS file: /cvs/src/bin/echo/echo.c,v
retrieving revision 1.10
diff -u -p -r1.10 echo.c
--- echo.c  9 Oct 2015 01:37:06 -   1.10
+++ echo.c  30 Jul 2022 23:10:24 -
@@ -53,12 +53,15 @@ main(int argc, char *argv[])
nflag = 0;
 
while (*argv) {
-   (void)fputs(*argv, stdout);
-   if (*++argv)
-   putchar(' ');
+   if (fputs(*argv, stdout) == EOF)
+   err(1, "stdout");
+   if (*++argv && putchar(' ') == EOF)
+   err(1, "stdout");
}
-   if (!nflag)
-   putchar('\n');
+   if (!nflag && putchar('\n') == EOF)
+   err(1, "stdout");
+   if (fflush(stdout) == EOF || fclose(stdout) == EOF)
+   err(1, "stdout");
 
return 0;
 }



Re: Consistency and cleanup in /share/misc/airport

2022-07-30 Thread Stuart Henderson
On 2022/07/30 22:34, Thomas Wager wrote:
> On Fri, 2022-07-29 at 16:09 -0400, Daniel Dickman wrote:
> 
> > I think they’re called Metropolitan Area Airport Codes:
> > 
> > I found a list here:
> > Metropolitan Area Airport Codes
> > wikitravel.org
> > 
> > 
> > Do you want to submit a revised patch with all the corrections?
> > 
> 
> Thanks, I double checked with wikitravel.org as well. Here is a patch that
> also addresses changes to the Metropolitan Area Airport Codes. Some of them
> do not appear on wikitravel.org and not on iata.org but they are not reused
> either, so I left them as they are (NRW, QLA). For QSF I found its homepage
> here:
> http://www.egsa-constantine.dz/index.php/aeroports/aeroport-de-setif-08-mai-1945
> Its official name is "8 Mai 1945".
> Here is a revised patch:

Due to the rule for this file mentioned in the header, I think you'll
need to find a developer who has been to the added airports that have
replaced the metro areas, i.e. KCK MFW QDV QSF, either that or just
remove them.



Re: Consistency and cleanup in /share/misc/airport

2022-07-30 Thread Thomas Wager
On Fri, 2022-07-29 at 16:09 -0400, Daniel Dickman wrote:

> I think they’re called Metropolitan Area Airport Codes:
> 
> I found a list here:
> Metropolitan Area Airport Codes
> wikitravel.org
> 
> 
> Do you want to submit a revised patch with all the corrections?
> 

Thanks, I double checked with wikitravel.org as well. Here is a patch that
also addresses changes to the Metropolitan Area Airport Codes. Some of them
do not appear on wikitravel.org and not on iata.org but they are not reused
either, so I left them as they are (NRW, QLA). For QSF I found its homepage
here:
http://www.egsa-constantine.dz/index.php/aeroports/aeroport-de-setif-08-mai-1945
Its official name is "8 Mai 1945".
Here is a revised patch:


Index: airport
===
RCS file: /cvs/src/share/misc/airport,v
retrieving revision 1.83
diff -u -p -u -p -r1.83 airport
--- airport 26 Jun 2022 06:28:51 -  1.83
+++ airport 30 Jul 2022 20:23:57 -
@@ -470,7 +470,7 @@ DUD:Momona, Dunedin, New Zealand
 DUJ:Jefferson County, Du Bois, Pennsylvania, USA
 DUQ:Quamichan Lake, Duncan / Quam, British Columbia, Canada
 DUR:King Shaka International, Durban, South Africa
-DUS:Dusseldorf, Germany
+DUS:Duesseldorf, Germany
 DUT:Dutch Harbor, Alaska, USA
 DVO:Mati, Davao, Philippines
 DXB:Dubai, UAE
@@ -656,7 +656,7 @@ HAD:Halmstad, Halland, Sweden
 HAH:Moroni (Hahaya), Comoros
 HAJ:Langenhagen, Hanover, Germany
 HAK:Haikou, China
-HAM:Fuhlsbuettel Hamburg, Germany
+HAM:Hamburg, Germany
 HAN:Noibai, Hanoi, Vietnam
 HAU:Karmoy, Haugesund, Norway
 HAV:Jose Marti, Havana, Cuba
@@ -849,7 +849,7 @@ KBP:Kyiv Borispil International, Kyiv, U
 KBR:Sultan Ismail Petra, Kota Bharu, Malaysia
 KCG:Fisheries, Chignik, Alaska, USA
 KCH:Kuching, Sarawak, Malaysia
-KCK:All Airports around Kansas City, Kansas, USA
+KCK:Kirensk, Russia
 KCL:Lagoon, Chignik, Alaska, USA
 KDG:Kardjali, Bulgaria
 KEF:Keflavik, Reykjavik, Iceland
@@ -940,7 +940,7 @@ LAW:Municipal, Lawton, Oklahoma, USA
 LAX:Los Angeles International, California, USA
 LBA:Leeds/Bradford, England, United Kingdom
 LBB:Lubbock, Texas, USA
-LBC:Lubeck, Germany
+LBC:Luebeck, Germany
 LBE:Westmoreland County, Latrobe, Pennsylvania, USA
 LBF:North Platte Lee Bird Field, Nebraska, USA
 LBL:Glenn L Martin Terminal, Liberal, Kansas, USA
@@ -1088,7 +1088,7 @@ MEY:Meghauli, Nepal
 MFE:Mc Allen/Mission, Texas, USA
 MFM:Macau
 MFN:Milford Sound, New Zealand
-MFW:All Airports around Miami, Ft. Lauderdale and West Palm Beach, Florida, USA
+MFW:Magaruque Island, Mozambique
 MGA:Managua, Nicaragua
 MGM:Montgomery Municipal/Dannelly Field, Alabama, USA
 MGQ:Mogadishu, Somalia
@@ -1412,11 +1412,11 @@ PZE:Penzance, England, United Kingdom
 PZO:Puerto Ordaz, Venezuela
 PZU:Port Sudan, Sudan
 QBF:Vail/Eagle, Colorado, USA
-QDV:All Airports around Denver, Colorado, USA
+QDV:Comte. Rolim Adolfo Amaro, Jundiai, Brazil
 QKB:Breckenridge, Colorado, USA
 QLA:All Airports around Los Angeles, California, USA
 QRO:Queretaro, Mexico
-QSF:All Airports around San Francisco, California, USA
+QSF:8 May 1945, Setif, Algeria
 QSY:Sydney, New South Wales, Australia
 RAB:Lakunai, Rabaul, Papua New Guinea
 RAJ:Rajkot, India
@@ -1672,7 +1672,6 @@ TGM:Tirgu Mures, Romania
 TGU:Toncontin, Tegucigalpa, Honduras
 TGZ:Llano San Juan, Tuxtla Gutierrez, Chiapas, Mexico
 THE:Teresina, Piaui, Brazil
-THF:Tempelhof, Berlin, Germany
 THN:Trollhattan, Sweden
 THR:Mehrabad, Tehran, Iran
 THU:Thule, Pituffik, Greenland
@@ -1851,7 +1850,7 @@ XBR:Brockville, Ontario, Canada
 XCM:Chatham, Ontario, Canada
 XDM:Drummondville, Quebec, Canada
 XFD:Stratford, Ontario, Canada
-XFW:Flugplatz Hamburg-Finkenwerder, Hamburg, Germany
+XFW:Hamburg-Finkenwerder, Hamburg, Germany
 XIY:Xianyang, Xi An, China
 XLV:Niagara Falls, Ontario, Canada
 XLZ:Truro, Nova Scotia, Canada



Re: interface media without netlock

2022-07-30 Thread Mark Kettenis
> Date: Thu, 28 Jul 2022 13:30:12 +0200
> From: Alexander Bluhm 
> 
> Hi,
> 
> The netlock for SIOCSIFMEDIA and SIOCGIFMEDIA ioctl is not necessary.
> Legacy drivers run with kernel lock, interface media is MP safe or
> has kernel lock.
> 
> ixl(4) talks about net lock but in fact has kernel lock.
> 
> Problem is that smtpd(8) periodically checks media status.  This
> stops network if netlock was taken.
> 
> ok?

ok kettenis@

(my reasoning for the ix(4) and ixl(4) below)

> Index: dev/pci/if_ix.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_ix.c,v
> retrieving revision 1.186
> diff -u -p -r1.186 if_ix.c
> --- dev/pci/if_ix.c   27 Jun 2022 15:11:23 -  1.186
> +++ dev/pci/if_ix.c   27 Jul 2022 20:38:00 -
> @@ -1576,6 +1576,8 @@ ixgbe_update_link_status(struct ix_softc
>   struct ifnet*ifp = >arpcom.ac_if;
>   int link_state = LINK_STATE_DOWN;
>  
> + KERNEL_ASSERT_LOCKED();
> +
>   ixgbe_check_link(>hw, >link_speed, >link_up, 0);
>  
>   ifp->if_baudrate = 0;

The driver clearly assumes that the kernel lock is used as an
interlock between interrupt context and normal context.  Could add an
splassert() here as well, since that is obviously a requirement.

> Index: dev/pci/if_ixl.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_ixl.c,v
> retrieving revision 1.83
> diff -u -p -r1.83 if_ixl.c
> --- dev/pci/if_ixl.c  11 Mar 2022 18:00:45 -  1.83
> +++ dev/pci/if_ixl.c  27 Jul 2022 20:38:00 -
> @@ -2069,7 +2069,7 @@ ixl_media_status(struct ifnet *ifp, stru
>  {
>   struct ixl_softc *sc = ifp->if_softc;
>  
> - NET_ASSERT_LOCKED();
> + KERNEL_ASSERT_LOCKED();
>  
>   ifm->ifm_status = sc->sc_media_status;
>   ifm->ifm_active = sc->sc_media_active;
> @@ -3517,7 +3517,9 @@ ixl_link_state_update_iaq(struct ixl_sof
>   return;
>   }
>  
> + KERNEL_LOCK();
>   link_state = ixl_set_link_status(sc, iaq);
> + KERNEL_UNLOCK();
>   mtx_enter(>sc_link_state_mtx);
>   if (ifp->if_link_state != link_state) {
>   ifp->if_link_state = link_state;
> @@ -4488,6 +4490,8 @@ ixl_set_link_status(struct ixl_softc *sc
>   const struct ixl_aq_link_status *status;
>   const struct ixl_phy_type *itype;
>  
> + KERNEL_ASSERT_LOCKED();
> +
>   uint64_t ifm_active = IFM_ETHER;
>   uint64_t ifm_status = IFM_AVALID;
>   int link_state = LINK_STATE_DOWN;
> @@ -4513,7 +4517,6 @@ ixl_set_link_status(struct ixl_softc *sc
>   baudrate = ixl_search_link_speed(status->link_speed);
>  
>  done:
> - /* NET_ASSERT_LOCKED() except during attach */
>   sc->sc_media_active = ifm_active;
>   sc->sc_media_status = ifm_status;
>   sc->sc_ac.ac_if.if_baudrate = baudrate;

Right.  This may be a bit heavy-handed, but it does mean ifm_actiave
and ifm_status are consistent.

> Index: net/if.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/net/if.c,v
> retrieving revision 1.659
> diff -u -p -r1.659 if.c
> --- net/if.c  14 Jul 2022 11:03:15 -  1.659
> +++ net/if.c  27 Jul 2022 21:49:04 -
> @@ -2259,6 +2259,15 @@ forceup:
>   error = ((*ifp->if_ioctl)(ifp, cmd, data));
>   break;
>  
> + case SIOCSIFMEDIA:
> + if ((error = suser(p)) != 0)
> + break;
> + /* FALLTHROUGH */
> + case SIOCGIFMEDIA:
> + /* net lock is not needed */
> + error = ((*ifp->if_ioctl)(ifp, cmd, data));
> + break;
> +
>   case SIOCSETKALIVE:
>   case SIOCDIFPHYADDR:
>   case SIOCSLIFPHYADDR:
> @@ -2268,7 +2277,6 @@ forceup:
>   case SIOCSLIFPHYECN:
>   case SIOCADDMULTI:
>   case SIOCDELMULTI:
> - case SIOCSIFMEDIA:
>   case SIOCSVNETID:
>   case SIOCDVNETID:
>   case SIOCSVNETFLOWID:
> Index: net/if_media.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_media.c,v
> retrieving revision 1.36
> diff -u -p -r1.36 if_media.c
> --- net/if_media.c14 Jul 2022 13:46:25 -  1.36
> +++ net/if_media.c27 Jul 2022 21:49:14 -
> @@ -312,7 +312,7 @@ ifmedia_ioctl(struct ifnet *ifp, struct 
>   /*
>* Get list of available media and current media on interface.
>*/
> - case  SIOCGIFMEDIA:
> + case SIOCGIFMEDIA:
>   {
>   struct ifmediareq *ifmr = (struct ifmediareq *) ifr;
>   size_t nwords;
> 
> 



Re: interface media current data

2022-07-30 Thread Mark Kettenis
> Date: Sat, 30 Jul 2022 18:45:39 +0300
> From: Vitaliy Makkoveev 
> 
> On Wed, Jul 27, 2022 at 08:53:38PM +0200, Mark Kettenis wrote:
> > > Date: Wed, 27 Jul 2022 00:11:19 +0200
> > > From: Alexander Bluhm 
> > > 
> > > On Tue, Jul 26, 2022 at 11:19:27PM +0200, Mark Kettenis wrote:
> > > > > Date: Tue, 26 Jul 2022 18:11:01 +0200
> > > > > From: Alexander Bluhm 
> > > > >
> > > > > On Fri, Jul 22, 2022 at 06:13:04PM +0200, Alexander Bluhm wrote:
> > > > > > But I am fine with committing ifmedia, gem(4) and bge(4) now.  Then
> > > > > > we can decide on a per driver basis.  As long kernel lock is not
> > > > > > removed from the ifmedia layer, this diff is not strictly necessary.
> > > > > > I want to commit it anyway to show how MP in ifmedia should work.
> > > > >
> > > > > This is the part for gem(4) and bge(4).  I have tested them.
> > > > > ifmedia_current() is MP safe and provides the data which all the
> > > > > drivers need.
> > > > >
> > > > > After commiting this, I can look for more hardware to test.
> > > > >
> > > > > ok?
> > > >
> > > > I still don't understand what this fixes.  Most drivers create a list
> > > > of media when the PHY attaches during autoconf.  That list never
> > > > changes, which means that the pointer to the ifmedia struct remains
> > > > valid until the driver detaches.
> > > >
> > > > There is of course a race when you use ifconfig to change the media
> > > > type.  But your new interface doesn't really fix that race.  Code that
> > > > uses ifmedia_current() may still end up with the old information and
> > > > use that at a point where that information is no longer accurate.
> > > 
> > > It does not fix a real bug.  The diff tries to keep internal data
> > > structures within a file, instead of spreading them over all drivers.
> > > MP work is easier if each struct field has a clear lock that is
> > > responsible for it.  70 drivers that should change it only during
> > > attach is not that clear.
> > 
> > But making data structures "mpsafe" is not a goal in itself.  The way
> > the data is used must be safe as well.  If you don't think that
> > through, you run the risk of adding mutexes that don't actually help.
> > And this is starting to smell like one of those to me.
> > 
> > > The ifmedia_list is protected by mutex.  ifm_cur is a pointer to
> > > an element of the list.  It would be natural to protect it with the
> > > same mutex.  ifm_cur is changed by ifmedia_set() and ifmedia_ioctl()
> > > and ifmedia_delete_instance().
> > > 
> > > Look at bnxt_hwrm_port_phy_qcfg().  There the list is changed by
> > > bnxt_media_status() which is called from ifmedia_ioctl().
> > > ixgbe_handle_msf() changes the list during SPF interrupt.  But I
> > > have to admit that these drivers do not access ifm_cur.
> > > 
> > > Why do you refuse to make the ifmedia layer MP safe by itself?
> > > It is just one function and a mutex.  I cannot see harm.
> > 
> > I looked a bit at what NetBSD does.  I don't think what they do is a
> > good idea (in particular having pointers to locks).  But they actually
> > hold a lock across the callbacks.  Which made me realize that if you
> > don't do that you don't really fix the race between changes detected
> > by the PHY and manual changes through ioctl.  And I think the
> > callbacks can sleep.  So using a mutex may simply be wrong.
> > 
> > 
> 
> NetBSD uses spin mutex by default for ifmedia protection. But it could
> use driver's mutex for that purpose. That's why they have pointer to
> lock.
> 
> I don't like to keep lock while callback called. It introduces lock
> inconsistency like we have with (*if_qstart)() callbacks. May be we
> should pass the new media to the media change callback directly, like
> below:
> 
>   if (ifm->new_ifm_change_cb)
>   error = (*ifm->new_ifm_change_cb)(ifm, ifp, match,
>   newmedia);
>   else {
>   oldentry = ifm->ifm_cur;
>   oldmedia = ifm->ifm_media;
>   /* ... */
> 
>   error = (*ifm->ifm_change_cb)(ifp);
>   if (error && error != ENETRESET) {
>   mtx_enter(_mtx);
>   if (ifm->ifm_cur == match) {
>   ifm->ifm_cur = oldentry;
>   ifm->ifm_cur = oldentry;
>   }
>   mtx_leave(_mtx);
>   }
>   }
> 
> This keeps the media change atomicy, and leaves `ifm' protection to the
> driver, but without pointers to locks and callback calls with lock held.

I once more ask you guys.  What problem are you trying to solve?



Re: interface media current data

2022-07-30 Thread Vitaliy Makkoveev
On Wed, Jul 27, 2022 at 08:53:38PM +0200, Mark Kettenis wrote:
> > Date: Wed, 27 Jul 2022 00:11:19 +0200
> > From: Alexander Bluhm 
> > 
> > On Tue, Jul 26, 2022 at 11:19:27PM +0200, Mark Kettenis wrote:
> > > > Date: Tue, 26 Jul 2022 18:11:01 +0200
> > > > From: Alexander Bluhm 
> > > >
> > > > On Fri, Jul 22, 2022 at 06:13:04PM +0200, Alexander Bluhm wrote:
> > > > > But I am fine with committing ifmedia, gem(4) and bge(4) now.  Then
> > > > > we can decide on a per driver basis.  As long kernel lock is not
> > > > > removed from the ifmedia layer, this diff is not strictly necessary.
> > > > > I want to commit it anyway to show how MP in ifmedia should work.
> > > >
> > > > This is the part for gem(4) and bge(4).  I have tested them.
> > > > ifmedia_current() is MP safe and provides the data which all the
> > > > drivers need.
> > > >
> > > > After commiting this, I can look for more hardware to test.
> > > >
> > > > ok?
> > >
> > > I still don't understand what this fixes.  Most drivers create a list
> > > of media when the PHY attaches during autoconf.  That list never
> > > changes, which means that the pointer to the ifmedia struct remains
> > > valid until the driver detaches.
> > >
> > > There is of course a race when you use ifconfig to change the media
> > > type.  But your new interface doesn't really fix that race.  Code that
> > > uses ifmedia_current() may still end up with the old information and
> > > use that at a point where that information is no longer accurate.
> > 
> > It does not fix a real bug.  The diff tries to keep internal data
> > structures within a file, instead of spreading them over all drivers.
> > MP work is easier if each struct field has a clear lock that is
> > responsible for it.  70 drivers that should change it only during
> > attach is not that clear.
> 
> But making data structures "mpsafe" is not a goal in itself.  The way
> the data is used must be safe as well.  If you don't think that
> through, you run the risk of adding mutexes that don't actually help.
> And this is starting to smell like one of those to me.
> 
> > The ifmedia_list is protected by mutex.  ifm_cur is a pointer to
> > an element of the list.  It would be natural to protect it with the
> > same mutex.  ifm_cur is changed by ifmedia_set() and ifmedia_ioctl()
> > and ifmedia_delete_instance().
> > 
> > Look at bnxt_hwrm_port_phy_qcfg().  There the list is changed by
> > bnxt_media_status() which is called from ifmedia_ioctl().
> > ixgbe_handle_msf() changes the list during SPF interrupt.  But I
> > have to admit that these drivers do not access ifm_cur.
> > 
> > Why do you refuse to make the ifmedia layer MP safe by itself?
> > It is just one function and a mutex.  I cannot see harm.
> 
> I looked a bit at what NetBSD does.  I don't think what they do is a
> good idea (in particular having pointers to locks).  But they actually
> hold a lock across the callbacks.  Which made me realize that if you
> don't do that you don't really fix the race between changes detected
> by the PHY and manual changes through ioctl.  And I think the
> callbacks can sleep.  So using a mutex may simply be wrong.
> 
> 

NetBSD uses spin mutex by default for ifmedia protection. But it could
use driver's mutex for that purpose. That's why they have pointer to
lock.

I don't like to keep lock while callback called. It introduces lock
inconsistency like we have with (*if_qstart)() callbacks. May be we
should pass the new media to the media change callback directly, like
below:

if (ifm->new_ifm_change_cb)
error = (*ifm->new_ifm_change_cb)(ifm, ifp, match,
newmedia);
else {
oldentry = ifm->ifm_cur;
oldmedia = ifm->ifm_media;
/* ... */

error = (*ifm->ifm_change_cb)(ifp);
if (error && error != ENETRESET) {
mtx_enter(_mtx);
if (ifm->ifm_cur == match) {
ifm->ifm_cur = oldentry;
ifm->ifm_cur = oldentry;
}
mtx_leave(_mtx);
}
}

This keeps the media change atomicy, and leaves `ifm' protection to the
driver, but without pointers to locks and callback calls with lock held.



Re: vmctl create: accept exactly one file

2022-07-30 Thread Dave Voutila


Klemens Nanni  writes:

> Do not silently ignore additional arguments:
>
>   $ vmctl create
>   usage:  vmctl [-v] create [-b base | -i disk] [-s size] disk
>   $ vmctl create -s 1G 1.img 2.img
>   vmctl: raw imagefile created
>   $ ls *.img
>   1.img
>
>   $ ./obj/vmctl create -s 1G 1.qcow2 2.qcow2
>   usage:  vmctl [-v] create [-b base | -i disk] [-s size] disk
>   $ ./obj/vmctl create -s 1G 1.qcow2
>   vmctl: qcow2 imagefile created
>   $ ls *.qcow2
>   1.qcow2
>
> All other commands seem to properly check argc.
>
> OK?

Makes sense and looks good.

ok dv@

>
> Index: main.c
> ===
> RCS file: /cvs/src/usr.sbin/vmctl/main.c,v
> retrieving revision 1.71
> diff -u -p -r1.71 main.c
> --- main.c13 May 2022 00:17:20 -  1.71
> +++ main.c30 Jul 2022 06:32:05 -
> @@ -587,7 +587,7 @@ ctl_create(struct parse_result *res, int
>   argc -= optind;
>   argv += optind;
>
> - if (argc < 1)
> + if (argc != 1)
>   ctl_usage(res->ctl);
>
>   type = parse_disktype(argv[0], );



Re: randomise arc4random() rekey interval

2022-07-30 Thread Visa Hankala
On Sat, Jul 30, 2022 at 06:40:21PM +1000, Damien Miller wrote:
> On Fri, 29 Jul 2022, Theo de Raadt wrote:
> 
> > The question is what _rs_random_u32() will do when it calls
> > _rs_stir_if_needed().
> >
> > There is one potential problem. lib/libcrypto/arc4random/*.h contains
> > portable wrappers for _rs_forkdetect(), which actually do things.
> > memset(rs, 0, sizeof(*rs)) will trash the rs state. Let's imagine a
> > "fork" has happened same time that bytes run out.
> >
> > _rs_stir()
> > ...
> > rs->rs_count = REKEY_BASE;
> > _rs_random_u32 -> _rs_stir_if_needed -> _rs_forkdetect
> >   - all rs fields are zero'd with memset
> >   - _rs_forkdetect returns
> > 
> > back in _rs_stir_if_needed,
> >  - if (!rs || rs->rs_count <= len)
> >_rs_stir();
> > 
> >
> > So it will recurse once (only once, because a 2nd fork cannot happen).
> > But this is fragile.
> >
> > Alternatives are to get the value direct from getentropy -- with
> > KEYSZ + IVSZ + 4 maybe? Or fetch a value for this random bias early
> > and track it in rs, but don't damage it in the memset? Or split
> > _rs_random_u32() so that a sub-function of it may collect these 4
> > keystream bytes without the _rs_stir_if_needed/_rs_rekey checks?
> 
> I don't see how a fork could trash these - do you mean one that
> happened in a thread or a signal handler? AFAIK arc4random() isn't
> safe in these contexts right now, even without fork().
> 
> Anyway, this version invokes the chacha context directly so there's
> not possibility of _rs_stir() reentrance. It is still not safe against
> something clobbering rsx concurrently (but neither is the existing
> code).
> 
> Index: crypt/arc4random.c
> ===
> RCS file: /cvs/src/lib/libc/crypt/arc4random.c,v
> retrieving revision 1.56
> diff -u -p -r1.56 arc4random.c
> --- crypt/arc4random.c28 Feb 2022 21:56:29 -  1.56
> +++ crypt/arc4random.c30 Jul 2022 08:38:44 -
> @@ -49,6 +49,8 @@
>  #define BLOCKSZ  64
>  #define RSBUFSZ  (16*BLOCKSZ)
>  
> +#define REKEY_BASE   (1024*1024) /* NB. should be a power of 2 */
> +
>  /* Marked MAP_INHERIT_ZERO, so zero'd out in fork children. */
>  static struct _rs {
>   size_t  rs_have;/* valid bytes at end of rs_buf */
> @@ -86,6 +88,7 @@ static void
>  _rs_stir(void)
>  {
>   u_char rnd[KEYSZ + IVSZ];
> + uint32_t rekey_fuzz = 0;
>  
>   if (getentropy(rnd, sizeof rnd) == -1)
>   _getentropy_fail();
> @@ -100,7 +103,10 @@ _rs_stir(void)
>   rs->rs_have = 0;
>   memset(rsx->rs_buf, 0, sizeof(rsx->rs_buf));
>  
> - rs->rs_count = 160;
> + /* rekey interval should not be predictable */
> + chacha_encrypt_bytes(>rs_chacha, (uint8_t *)_fuzz,
> +  (uint8_t *)_fuzz, sizeof(rekey_fuzz));
> + rs->rs_count += REKEY_BASE + (rekey_fuzz % REKEY_BASE);

Replace += with =. With that fixed, OK visa@



Re: randomise arc4random() rekey interval

2022-07-30 Thread Damien Miller
On Fri, 29 Jul 2022, Theo de Raadt wrote:

> The question is what _rs_random_u32() will do when it calls
> _rs_stir_if_needed().
>
> There is one potential problem. lib/libcrypto/arc4random/*.h contains
> portable wrappers for _rs_forkdetect(), which actually do things.
> memset(rs, 0, sizeof(*rs)) will trash the rs state. Let's imagine a
> "fork" has happened same time that bytes run out.
>
> _rs_stir()
> ...
> rs->rs_count = REKEY_BASE;
> _rs_random_u32 -> _rs_stir_if_needed -> _rs_forkdetect
>   - all rs fields are zero'd with memset
>   - _rs_forkdetect returns
> 
> back in _rs_stir_if_needed,
>- if (!rs || rs->rs_count <= len)
>_rs_stir();
> 
>
> So it will recurse once (only once, because a 2nd fork cannot happen).
> But this is fragile.
>
> Alternatives are to get the value direct from getentropy -- with
> KEYSZ + IVSZ + 4 maybe? Or fetch a value for this random bias early
> and track it in rs, but don't damage it in the memset? Or split
> _rs_random_u32() so that a sub-function of it may collect these 4
> keystream bytes without the _rs_stir_if_needed/_rs_rekey checks?

I don't see how a fork could trash these - do you mean one that
happened in a thread or a signal handler? AFAIK arc4random() isn't
safe in these contexts right now, even without fork().

Anyway, this version invokes the chacha context directly so there's
not possibility of _rs_stir() reentrance. It is still not safe against
something clobbering rsx concurrently (but neither is the existing
code).

Index: crypt/arc4random.c
===
RCS file: /cvs/src/lib/libc/crypt/arc4random.c,v
retrieving revision 1.56
diff -u -p -r1.56 arc4random.c
--- crypt/arc4random.c  28 Feb 2022 21:56:29 -  1.56
+++ crypt/arc4random.c  30 Jul 2022 08:38:44 -
@@ -49,6 +49,8 @@
 #define BLOCKSZ64
 #define RSBUFSZ(16*BLOCKSZ)
 
+#define REKEY_BASE (1024*1024) /* NB. should be a power of 2 */
+
 /* Marked MAP_INHERIT_ZERO, so zero'd out in fork children. */
 static struct _rs {
size_t  rs_have;/* valid bytes at end of rs_buf */
@@ -86,6 +88,7 @@ static void
 _rs_stir(void)
 {
u_char rnd[KEYSZ + IVSZ];
+   uint32_t rekey_fuzz = 0;
 
if (getentropy(rnd, sizeof rnd) == -1)
_getentropy_fail();
@@ -100,7 +103,10 @@ _rs_stir(void)
rs->rs_have = 0;
memset(rsx->rs_buf, 0, sizeof(rsx->rs_buf));
 
-   rs->rs_count = 160;
+   /* rekey interval should not be predictable */
+   chacha_encrypt_bytes(>rs_chacha, (uint8_t *)_fuzz,
+(uint8_t *)_fuzz, sizeof(rekey_fuzz));
+   rs->rs_count += REKEY_BASE + (rekey_fuzz % REKEY_BASE);
 }
 
 static inline void



Re: inteldrm: fix build without intagp

2022-07-30 Thread Jonathan Gray
On Sat, Jul 30, 2022 at 07:51:07AM +, Klemens Nanni wrote:
> Pull inteldrm_refcnt out of NINTAGP > 0, otherwise it remains undefined but
> still used in inteldrm_attachhook():
> 
>   if (inteldrm_refcnt == 0) {
>   i915_init();
>   }
>   inteldrm_refcnt++;
> 
> OK?

ok jsg@

> 
> Index: sys/dev/pci/drm/i915/i915_drv.c
> ===
> RCS file: /cvs/src/sys/dev/pci/drm/i915/i915_drv.c,v
> retrieving revision 1.142
> diff -u -p -r1.142 i915_drv.c
> --- sys/dev/pci/drm/i915/i915_drv.c   15 Jul 2022 17:57:26 -  1.142
> +++ sys/dev/pci/drm/i915/i915_drv.c   30 Jul 2022 07:45:48 -
> @@ -2007,15 +2007,15 @@ static const struct drm_driver driver = 
>  
>  #include "intagp.h"
>  
> -#if NINTAGP > 0
> -int  intagpsubmatch(struct device *, void *, void *);
> -int  intagp_print(void *, const char *);
> -
>  /*
>   * some functions are only called once on init regardless of how many times
>   * inteldrm attaches in linux this is handled via module_init()/module_exit()
>   */
>  int inteldrm_refcnt;
> +
> +#if NINTAGP > 0
> +int  intagpsubmatch(struct device *, void *, void *);
> +int  intagp_print(void *, const char *);
>  
>  int
>  intagpsubmatch(struct device *parent, void *match, void *aux)
> 
> 



inteldrm: fix build without intagp

2022-07-30 Thread Klemens Nanni
Pull inteldrm_refcnt out of NINTAGP > 0, otherwise it remains undefined but
still used in inteldrm_attachhook():

if (inteldrm_refcnt == 0) {
i915_init();
}
inteldrm_refcnt++;

OK?

Index: sys/dev/pci/drm/i915/i915_drv.c
===
RCS file: /cvs/src/sys/dev/pci/drm/i915/i915_drv.c,v
retrieving revision 1.142
diff -u -p -r1.142 i915_drv.c
--- sys/dev/pci/drm/i915/i915_drv.c 15 Jul 2022 17:57:26 -  1.142
+++ sys/dev/pci/drm/i915/i915_drv.c 30 Jul 2022 07:45:48 -
@@ -2007,15 +2007,15 @@ static const struct drm_driver driver = 
 
 #include "intagp.h"
 
-#if NINTAGP > 0
-intintagpsubmatch(struct device *, void *, void *);
-intintagp_print(void *, const char *);
-
 /*
  * some functions are only called once on init regardless of how many times
  * inteldrm attaches in linux this is handled via module_init()/module_exit()
  */
 int inteldrm_refcnt;
+
+#if NINTAGP > 0
+intintagpsubmatch(struct device *, void *, void *);
+intintagp_print(void *, const char *);
 
 int
 intagpsubmatch(struct device *parent, void *match, void *aux)



vmctl create: accept exactly one file

2022-07-30 Thread Klemens Nanni
Do not silently ignore additional arguments:

$ vmctl create
usage:  vmctl [-v] create [-b base | -i disk] [-s size] disk
$ vmctl create -s 1G 1.img 2.img
vmctl: raw imagefile created
$ ls *.img
1.img

$ ./obj/vmctl create -s 1G 1.qcow2 2.qcow2
usage:  vmctl [-v] create [-b base | -i disk] [-s size] disk
$ ./obj/vmctl create -s 1G 1.qcow2
vmctl: qcow2 imagefile created
$ ls *.qcow2
1.qcow2

All other commands seem to properly check argc.

OK?

Index: main.c
===
RCS file: /cvs/src/usr.sbin/vmctl/main.c,v
retrieving revision 1.71
diff -u -p -r1.71 main.c
--- main.c  13 May 2022 00:17:20 -  1.71
+++ main.c  30 Jul 2022 06:32:05 -
@@ -587,7 +587,7 @@ ctl_create(struct parse_result *res, int
argc -= optind;
argv += optind;
 
-   if (argc < 1)
+   if (argc != 1)
ctl_usage(res->ctl);
 
type = parse_disktype(argv[0], );



bioctl.8: noauto flag has no effect in bootloaders

2022-07-30 Thread Klemens Nanni
Reading "at boot time" may come off as "in the bootloader", which seems
plausible since we support booting off root on softraid and at leat one
platform automatically assembles all available softraid volumes in the
bootloader, whether they contain the boot disk or not.

But all currently supported flags are exclusively handled by the kernel,
so "noauto" aka. BIOC_SCNOAUTOASSEMBLE is always ignored in bootloaders.

To make this a bit clearer whilst avoiding longer explanations or
implementation details, go with "system startup" as that's known wording
from rc(8) -- this should make it clear enough that it isn't about early
boot(8) code.

Sure, it's the kernel and not /etc/rc handling this flag, but it still
seems like an improvement to me, especially since softraid(4)
consistently says "boot" when talking about general bootloader support.


Anyone else think this makes a difference?

Given I could not immediately answer the question where "noauto" is
handled (or not) even though I'm currently working on softraid code in
one bootloader, I wanted to make this clear for everyone.

Index: bioctl.8
===
RCS file: /cvs/src/sbin/bioctl/bioctl.8,v
retrieving revision 1.109
diff -u -p -r1.109 bioctl.8
--- bioctl.88 Feb 2021 11:20:03 -   1.109
+++ bioctl.830 Jul 2022 06:12:18 -
@@ -196,7 +196,7 @@ Force the operation;
 for example, force the creation of volumes
 with unclean data in the metadata areas.
 .It Cm noauto
-Do not automatically assemble this volume at boot time.
+Do not automatically assemble this volume at system startup time.
 .El
 .It Fl c Ar raidlevel
 Create a new