Re: clean /dev from /etc/daily ?

2020-11-23 Thread Otto Moerbeek
On Mon, Nov 23, 2020 at 01:53:01PM +0100, Solene Rapenne wrote:

> A common mistake when using dd is to create a file in /dev which
> fills up the space of / and may stay silent until / gets filled up
> by something else that will fail.
> 
> Would it be OK to add this in /etc/daily?
> 
> find /dev -type f ! -name MAKEDEV -delete
> 
> AFAIK /dev should have only MAKEDEV as a regular file.
> hier(7) says /dev only has block and character devices
> with the exception of MAKEDEV.
> 

Reporting is good, but deleting is not.

-Otto



Re: Ryzen 5800X hw.setperf vs hw.cpuspeed

2020-11-20 Thread Otto Moerbeek
On Fri, Nov 20, 2020 at 09:48:47AM -0500, Bryan Steele wrote:

> On Fri, Nov 20, 2020 at 03:08:42PM +0100, Mark Kettenis wrote:
> > > Date: Fri, 20 Nov 2020 07:41:20 -0500
> > > From: Bryan Steele 
> > > 
> > > On Fri, Nov 20, 2020 at 09:26:08AM +0100, Otto Moerbeek wrote:
> > > > Hi,
> > > > 
> > > > I got a new Ryzen machine, dmesg below. What I'm observing might be an
> > > > issue with hw.setperf.
> > > > 
> > > > On startup it shows:
> > > > 
> > > > hw.cpuspeed=3800
> > > > hw.setperf=100
> > > > 
> > > > If I lower hw.setperf to zero, the new state is reflected immediately in
> > > > hw.cpuspeed:
> > > > 
> > > > hw.cpuspeed=2200
> > > > hw.setperf=0
> > > > 
> > > > And also sha256 -t becomes slower as expected.
> > > > 
> > > > But if I raise hw.setperf to 100 I'm seeing:
> > > > 
> > > > hw.cpuspeed=2200
> > > > hw.setperf=100
> > > > 
> > > > and sha256 -t is still slow. Only after some time passes (let's say a
> > > > couple of tens of seconds) it does show:
> > > > 
> > > > hw.cpuspeed=3800
> > > > hw.setperf=100
> > > > 
> > > > and sha256 -t is fast again.
> > > > 
> > > > This behaviour is different from my old machine, where setting
> > > > hw.setperf was reflected in hw.cpuspeed immediately both ways.
> > > > 
> > > > Any clue?
> > > > 
> > > > -Otto
> > > 
> > > Hey Otto,
> > > 
> > > Nice machine! :-)
> > > 
> > > I've seen this "sticking" issue before (as have others), but haven't
> > > been able to narrow it down unfortunately. I'm not sure if it's a
> > > bug in the k1x-pstate.c code I wrote, it's some undocumented new
> > > behaviour on newer Ryzen CPUs, or if a MI setperf change happened
> > > at some point that's unhandled..
> > > 
> > > At least on a desktop I'd suggest to leave apmd(8) alone and not do any
> > > manual hw.setperf tweaking, you should have adequate cooling and the
> > > BIOS will automatically adjust the CPU fan to keep it so. I believe
> > > it will also allow it to more quickly move into CPB boost frequencies
> > > if left at P-state L0 (but don't quote me on that).
> > 
> > I would expect this machine to use the acpicpu(4) setperf
> > implementation.  Figuring out if that is indeed the case would
> > probably be step 1 in debugging this.
> 
> I didn't realize there was a setperf implementation in acpicpu(4),
> k1x-pstate depends on acpicpu(4) to gather PSS information, but
> otherwise writes the MSRs out itself rather than calling any ACPI
> methods.
> 
> In identifycpu() we're just matching on family, e.g:
> if (ci->ci_family >= 0x10)
>   setperf_setup = k1x_init;
> 
> The Intel SpeedStep case is below and matches based on a CPUID flag,
> so I don't see when the acpicpu implementation would ever be chosen
> on either..
> 
> -Bryan.
> 

It's k1x_init being called to print the speeds.
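
For reference, a small sketch of why the family match quoted above catches this CPU, assuming the "19-21-00" in the dmesg is family-model-stepping in hex. The decoding follows the usual AMD CPUID rule (extended fields are added only when the base family is 0xf). The struct, the function names and the signature value used in the usage note are illustrative, not taken from the kernel.

```c
#include <stdint.h>

struct cpu_sig {
	unsigned int family;
	unsigned int model;
	unsigned int stepping;
};

/* Decode family/model/stepping from the EAX value of CPUID leaf 1. */
struct cpu_sig
decode_cpu_sig(uint32_t eax)
{
	struct cpu_sig s;
	unsigned int base_family = (eax >> 8) & 0xf;
	unsigned int base_model = (eax >> 4) & 0xf;

	s.stepping = eax & 0xf;
	s.family = base_family;
	s.model = base_model;
	if (base_family == 0xf) {
		/* extended family is added, extended model is prepended */
		s.family += (eax >> 20) & 0xff;
		s.model |= ((eax >> 16) & 0xf) << 4;
	}
	return s;
}

/* Same comparison as the quoted identifycpu() snippet. */
int
family_selects_k1x(unsigned int family)
{
	return family >= 0x10;
}
```

With an assumed signature of 0x00A20F10 this decodes to family 0x19, model 0x21, stepping 0, so the >= 0x10 test is true and k1x_init is chosen.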

At first I thought: I'd like to be able to use APM, I'm not so much
worried about cooling, but a lower power usage when the machine's
idle would be nice. You wouldn't believe the kWh price here ;-)

So I put a Watt meter on it. It turns out the machine uses about 51W
when idle and apm -H or -C does not make any difference. Power usage
goes up to about 157W when building and drops back to 51W when idling.
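
To put those readings in kWh terms, a back-of-the-envelope helper. It assumes a constant draw over a full 365-day year, which a real machine will not have; the function name is made up.

```c
/* Energy in kWh used over one year at a constant draw of `watts`. */
double
annual_kwh(double watts)
{
	return watts * 24.0 * 365.0 / 1000.0;
}
```

At a constant 51W that is roughly 447 kWh per year, and a machine building non-stop at 157W would use roughly 1375 kWh.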

So your advice not to tweak hw.setperf or apm is sound.

BTW: no lap burning risks, it's a desktop CPU in a tower case standing
on the floor ;-)

-Otto



Ryzen 5800X hw.setperf vs hw.cpuspeed

2020-11-20 Thread Otto Moerbeek
Hi,

I got a new Ryzen machine, dmesg below. What I'm observing might be an
issue with hw.setperf.

On startup it shows:

hw.cpuspeed=3800
hw.setperf=100

If I lower hw.setperf to zero, the new state is reflected immediately in
hw.cpuspeed:

hw.cpuspeed=2200
hw.setperf=0

And also sha256 -t becomes slower as expected.

But if I raise hw.setperf to 100 I'm seeing:

hw.cpuspeed=2200
hw.setperf=100

and sha256 -t is still slow. Only after some time passes (let's say a
couple of tens of seconds) it does show:

hw.cpuspeed=3800
hw.setperf=100

and sha256 -t is fast again.

This behaviour is different from my old machine, where setting
hw.setperf was reflected in hw.cpuspeed immediately both ways.

Any clue?

-Otto

OpenBSD 6.8-current (GENERIC.MP) #1: Thu Nov 19 21:01:06 CET 2020
o...@lou.intra.drijf.net:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 34286964736 (32698MB)
avail mem = 33232543744 (31693MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.3 @ 0xe8d60 (55 entries)
bios0: vendor American Megatrends Inc. version "F11d" date 10/29/2020
bios0: Gigabyte Technology Co., Ltd. B550 AORUS ELITE
acpi0 at bios0: ACPI 6.0
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP SSDT SSDT SSDT SSDT FIDT MCFG HPET BGRT IVRS PCCT SSDT CRAT CDIT SSDT SSDT SSDT SSDT WSMT APIC SSDT SSDT SSDT FPDT
acpi0: wakeup devices GPP0(S4) GP12(S4) GP13(S4) XHC0(S4) GP30(S4) GP31(S4) GPP2(S4) GPP3(S4) GPP8(S4) GPP1(S4)
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpimcfg0 at acpi0
acpimcfg0: addr 0xf000, bus 0-127
acpihpet0 at acpi0: 14318180 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD Ryzen 7 5800X 8-Core Processor, 3793.35 MHz, 19-21-00
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,PKU,IBPB,IBRS,STIBP,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu0: 32KB 64b/line 8-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 8-way L2 cache, 32MB 64b/line disabled L3 cache
cpu0: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu0: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=1.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: AMD Ryzen 7 5800X 8-Core Processor, 3792.89 MHz, 19-21-00
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,PKU,IBPB,IBRS,STIBP,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu1: 32KB 64b/line 8-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 8-way L2 cache, 32MB 64b/line disabled L3 cache
cpu1: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu1: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 4 (application processor)
cpu2: AMD Ryzen 7 5800X 8-Core Processor, 3792.89 MHz, 19-21-00
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,PKU,IBPB,IBRS,STIBP,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu2: 32KB 64b/line 8-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 8-way L2 cache, 32MB 64b/line disabled L3 cache
cpu2: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu2: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu2: smt 0, core 2, package 0
cpu3 at mainbus0: apid 6 (application processor)
cpu3: AMD Ryzen 7 5800X 8-Core Processor, 3792.89 MHz, 19-21-00
cpu3: 

Re: diff: tcp ack improvement

2020-11-05 Thread Otto Moerbeek
On Fri, Nov 06, 2020 at 01:10:52AM +0100, Jan Klemkow wrote:

> Hi,
> 
> bluhm and I made some network performance measurements and kernel
> profiling.
> 
> Setup: Linux (iperf) -10gbit-> OpenBSD (relayd) -10gbit-> Linux (iperf)
> 
> We figured out that the kernel uses a huge amount of processing time
> for sending ACKs to the sender on the receiving interface.  After
> receiving a data segment, we send out two ACKs.  The first one in
> tcp_input() directly after receiving.  The second ACK is sent out after
> the userland or the sosplice task has read some data out of the socket
> buffer.
> 
> The first ACK in tcp_input() is sent after receiving every other data
> segment, as described in RFC 1122:
> 
>   4.2.3.2  When to Send an ACK Segment
>   A TCP SHOULD implement a delayed ACK, but an ACK should
>   not be excessively delayed; in particular, the delay
>   MUST be less than 0.5 seconds, and in a stream of
>   full-sized segments there SHOULD be an ACK for at least
>   every second segment.
> 
> This advice is based on the paper "Congestion Avoidance and Control":
> 
>   4 THE GATEWAY SIDE OF CONGESTION CONTROL
>   The 8 KBps senders were talking to 4.3+BSD receivers
>   which would delay an ack for at most one packet (because
>   of an ack's 'clock' role, the authors believe that the
>   minimum ack frequency should be every other packet).
> 
> Sending the first ACK (on every other packet) costs us too much
> processing time.  Thus, we run into a full socket buffer earlier.  The
> first ACK just acknowledges the received data, but does not update the
> window.  The second ACK, caused by the socket buffer reader, also
> acknowledges the data and also updates the window.  So the second ACK
> is worth much more for fast packet processing than the first one.
> 
> The performance improvement ranges from 33% with splicing to 20% without
> splicing:
> 
>                   splicing      relaying
> 
>   current         3.1 GBit/s    2.6 GBit/s
>   w/o first ack   4.1 GBit/s    3.1 GBit/s
> 
> As far as I understand the implementation of other operating systems:
> Linux has implemented a custom TCP_QUICKACK socket option to turn this
> kind of feature on and off.  FreeBSD and NetBSD still depend on it when
> using the New Reno implementation.
> 
> The following diff turns off the direct ACK on every other segment.  We
> have been running this diff in production on our own machines at genua and
> on our products for several months now.  We haven't noticed any problems,
> either with interactive network sessions (ssh) or with bulk traffic.
> 
> Another solution could be a sysctl(3) or an additional socket option,
> similar to Linux, to control this behavior per socket or system wide.
> Also, a counter to ACK every 3rd, 4th... data segment could beat the
> problem.
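
The counter idea from the last quoted paragraph could be sketched roughly as follows. This is a toy model, not the tcp_input() macro: the struct, its fields and the function name are invented, and the PUSH and loopback cases from the diff are left out.

```c
/*
 * Toy model of "ACK every Nth data segment".
 * every == 2 corresponds to the current behaviour (every other segment),
 * every == 0 never ACKs from the input path and leaves it to the socket
 * buffer reader or the delayed ACK timer, as the diff does.
 */
struct ack_policy {
	unsigned int pending;	/* segments received since the last ACK */
	unsigned int every;	/* ACK threshold, 0 disables */
};

/* Called once per received data segment; returns 1 if an ACK goes out now. */
int
segment_needs_ack(struct ack_policy *p)
{
	if (p->every == 0)
		return 0;
	if (++p->pending >= p->every) {
		p->pending = 0;
		return 1;
	}
	return 0;
}
```

For ten segments this gives five immediate ACKs with every = 2, two with every = 4 and none with every = 0.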

I am wondering if you also looked at another scenario: the process
reading the socket is sleeping so the receive buffer fills up without
any acks being sent. Won't that lead to a lot of retransmissions
containing data?

-Otto

> 
> bye,
> Jan
> 
> Index: netinet/tcp_input.c
> ===
> RCS file: /cvs/src/sys/netinet/tcp_input.c,v
> retrieving revision 1.365
> diff -u -p -r1.365 tcp_input.c
> --- netinet/tcp_input.c   19 Jun 2020 22:47:22 -  1.365
> +++ netinet/tcp_input.c   5 Nov 2020 23:00:34 -
> @@ -165,8 +165,8 @@ do { \
>  #endif
>  
>  /*
> - * Macro to compute ACK transmission behavior.  Delay the ACK unless
> - * we have already delayed an ACK (must send an ACK every two segments).
> + * Macro to compute ACK transmission behavior.  Delay the ACK until
> + * a read from the socket buffer or the delayed ACK timer causes one.
>   * We also ACK immediately if we received a PUSH and the ACK-on-PUSH
>   * option is enabled or when the packet is coming from a loopback
>   * interface.
> @@ -176,8 +176,7 @@ do { \
>   struct ifnet *ifp = NULL; \
>   if (m && (m->m_flags & M_PKTHDR)) \
>   ifp = if_get(m->m_pkthdr.ph_ifidx); \
> - if (TCP_TIMER_ISARMED(tp, TCPT_DELACK) || \
> - (tcp_ack_on_push && (tiflags) & TH_PUSH) || \
> + if ((tcp_ack_on_push && (tiflags) & TH_PUSH) || \
>   (ifp && (ifp->if_flags & IFF_LOOPBACK))) \
>   tp->t_flags |= TF_ACKNOW; \
>   else \
> 



Re: dig(1): Extended DNS Error (RFC 8914)

2020-10-30 Thread Otto Moerbeek
On Fri, Oct 30, 2020 at 03:04:03PM +0100, Florian Obser wrote:

Love it,

-Otto

> $ obj/dig @1.1.1.1 dnssec-failed.org
> 
> ; <<>> dig 9.10.8-P1 <<>> @1.1.1.1 dnssec-failed.org
> ; (1 server found)
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 26772
> ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
> 
> ;; OPT PSEUDOSECTION:
> ; EDNS: version: 0, flags:; udp: 1232
> ; EDE: 6 (DNSSEC Bogus)
> ;; QUESTION SECTION:
> ;dnssec-failed.org. IN  A
> 
> ;; Query time: 244 msec
> ;; SERVER: 1.1.1.1#53(1.1.1.1)
> ;; WHEN: Fri Oct 30 14:59:09 CET 2020
> ;; MSG SIZE  rcvd: 52
> 
> Since I'm not aware of a server/query combination that responds with
> UTF-8 encoded EXTENDED-TEXT I didn't implement anything special for
> this so it will use the default renderer that's also used for NSIDs,
> printing a hexdump + printable ascii, e.g.:
> 
> $ dig @k.root-servers.net +nsid . soa
> [...]
> ;; OPT PSEUDOSECTION:
> ; EDNS: version: 0, flags:; udp: 1232
> ; NSID: 6e 73 33 2e 6e 6c 2d 61 6d 73 2e 6b 2e 72 69 70 65 2e 6e 65 74 ("ns3.nl-ams.k.ripe.net")
> 
> OK?
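
For readers who do not have RFC 8914 at hand: the option payload is a 16-bit INFO-CODE in network byte order followed by optional EXTRA-TEXT. A standalone sketch of splitting it, independent of the isc_buffer API used in the diff below (struct ede and ede_parse are made-up names):

```c
#include <stddef.h>
#include <stdint.h>

struct ede {
	uint16_t info_code;		/* e.g. 6 for "DNSSEC Bogus" */
	const uint8_t *extra_text;	/* points into the option, not NUL-terminated */
	size_t extra_len;		/* may be 0 */
};

/*
 * Split an EDE option payload of optlen bytes.
 * Returns 0 on success, -1 if the payload is too short for an INFO-CODE.
 */
int
ede_parse(const uint8_t *opt, size_t optlen, struct ede *out)
{
	if (optlen < 2)
		return -1;
	out->info_code = (uint16_t)((opt[0] << 8) | opt[1]);
	out->extra_text = opt + 2;
	out->extra_len = optlen - 2;
	return 0;
}
```

The payload bytes 00 06 would give info_code 6 with no extra text, matching the "EDE: 6 (DNSSEC Bogus)" line in the output above.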
> 
> diff --git lib/dns/include/dns/message.h lib/dns/include/dns/message.h
> index 65ffcfd4c3f..a70720eee39 100644
> --- lib/dns/include/dns/message.h
> +++ lib/dns/include/dns/message.h
> @@ -104,6 +104,7 @@
>  #define DNS_OPT_COOKIE   10  /*%< COOKIE opt code */
>  #define DNS_OPT_PAD  12  /*%< PAD opt code */
>  #define DNS_OPT_KEY_TAG  14  /*%< Key tag opt code */
> +#define DNS_OPT_EDE  15  /* RFC 8914 */
>  
>  /*%< The number of EDNS options we know about. */
>  #define DNS_EDNSOPTIONS  4
> diff --git lib/dns/message.c lib/dns/message.c
> index 5e0fb167382..9721f9c0ef4 100644
> --- lib/dns/message.c
> +++ lib/dns/message.c
> @@ -2434,6 +2434,68 @@ render_ecs(isc_buffer_t *ecsbuf, isc_buffer_t *target) {
>   return (ISC_R_SUCCESS);
>  }
>  
> +static const char *
> +ede_info_code2str(uint16_t info_code)
> +{
> + if (info_code > 49151)
> + return "Private Use";
> +
> + switch (info_code) {
> + case 0:
> + return "Other Error";
> + case 1:
> + return "Unsupported DNSKEY Algorithm";
> + case 2:
> + return "Unsupported DS Digest Type";
> + case 3:
> + return "Stale Answer";
> + case 4:
> + return "Forged Answer";
> + case 5:
> + return "DNSSEC Indeterminate";
> + case 6:
> + return "DNSSEC Bogus";
> + case 7:
> + return "Signature Expired";
> + case 8:
> + return "Signature Not Yet Valid";
> + case 9:
> + return "DNSKEY Missing";
> + case 10:
> + return "RRSIGs Missing";
> + case 11:
> + return "No Zone Key Bit Set";
> + case 12:
> + return "NSEC Missing";
> + case 13:
> + return "Cached Error";
> + case 14:
> + return "Not Ready";
> + case 15:
> + return "Blocked";
> + case 16:
> + return "Censored";
> + case 17:
> + return "Filtered";
> + case 18:
> + return "Prohibited";
> + case 19:
> + return "Stale NXDomain Answer";
> + case 20:
> + return "Not Authoritative";
> + case 21:
> + return "Not Supported";
> + case 22:
> + return "No Reachable Authority";
> + case 23:
> + return "Network Error";
> + case 24:
> + return "Invalid Data";
> + default:
> + return "Unassigned";
> + }
> +}
> +
>  isc_result_t
>  dns_message_pseudosectiontotext(dns_message_t *msg,
>   dns_pseudosection_t section,
> @@ -2557,6 +2619,20 @@ dns_message_pseudosectiontotext(dns_message_t *msg,
>   ADD_STRING(target, "\n");
>   continue;
>   }
> + } else if (optcode == DNS_OPT_EDE) {
> + uint16_t info_code;
> + ADD_STRING(target, "; EDE");
> + if (optlen >= 2) {
> + info_code =
> + isc_buffer_getuint16();
> + optlen -= 2;
> + snprintf(buf, sizeof(buf), ": %u (",
> + info_code);
> + ADD_STRING(target, buf);
> + ADD_STRING(target,
> + ede_info_code2str(info_code));
> + ADD_STRING(target, ")");
> + }
>   } else {
>   ADD_STRING(target, "; 

tree.h: returning void, legal but weird

2020-10-10 Thread Otto Moerbeek


OK?

-Otto

Index: tree.h
===
RCS file: /cvs/src/sys/sys/tree.h,v
retrieving revision 1.29
diff -u -p -r1.29 tree.h
--- tree.h  30 Jul 2017 19:27:20 -  1.29
+++ tree.h  10 Oct 2020 16:36:15 -
@@ -910,25 +910,25 @@ _name##_RBT_PARENT(struct _type *elm) 
 __unused static inline void\
 _name##_RBT_SET_LEFT(struct _type *elm, struct _type *left)\
 {  \
-   return _rb_set_left(_name##_RBT_TYPE, elm, left);   \
+   _rb_set_left(_name##_RBT_TYPE, elm, left);  \
 }  \
\
 __unused static inline void\
 _name##_RBT_SET_RIGHT(struct _type *elm, struct _type *right)  \
 {  \
-   return _rb_set_right(_name##_RBT_TYPE, elm, right); \
+   _rb_set_right(_name##_RBT_TYPE, elm, right);\
 }  \
\
 __unused static inline void\
 _name##_RBT_SET_PARENT(struct _type *elm, struct _type *parent) \
 {  \
-   return _rb_set_parent(_name##_RBT_TYPE, elm, parent);   \
+   _rb_set_parent(_name##_RBT_TYPE, elm, parent);  \
 }  \
\
 __unused static inline void\
 _name##_RBT_POISON(struct _type *elm, unsigned long poison)\
 {  \
-   return _rb_poison(_name##_RBT_TYPE, elm, poison);   \
+   _rb_poison(_name##_RBT_TYPE, elm, poison);  \
 }  \
\
 __unused static inline int \



Re: random canary bytes for malloc

2020-10-04 Thread Otto Moerbeek
On Tue, Sep 29, 2020 at 08:17:54AM +0200, Otto Moerbeek wrote:

> Hi,
> 
> until now, canary bytes (used by the C option) were the same as the
> bytes used to junk (0xfd).  This means that certain overwrites are not
> detected, like setting the high bit. 
> 
> This makes the byte value used to write canaries random. I do not want
> to complicate the code to handle all combinations of F and C, so 0xfd
> is still accepted as a canary byte.
> 
> Please test with all your favourite combinations of malloc flags.

Any takers apart from tb@ who tested this earlier?

-Otto

> 
> Index: malloc.c
> ===
> RCS file: /cvs/src/lib/libc/stdlib/malloc.c,v
> retrieving revision 1.263
> diff -u -p -r1.263 malloc.c
> --- malloc.c  6 Sep 2020 06:41:03 -   1.263
> +++ malloc.c  10 Sep 2020 10:53:18 -
> @@ -193,7 +193,7 @@ struct malloc_readonly {
>   int def_malloc_junk;/* junk fill? */
>   int malloc_realloc; /* always realloc? */
>   int malloc_xmalloc; /* xmalloc behaviour? */
> - int chunk_canaries; /* use canaries after chunks? */
> + u_int   chunk_canaries; /* use canaries after chunks? */
>   int internal_funcs; /* use better recallocarray/freezero? */
>   u_int   def_malloc_cache;   /* free pages we cache */
>   size_t  malloc_guard;   /* use guard pages after allocations? */
> @@ -468,6 +468,11 @@ omalloc_init(void)
>  
>   while ((mopts.malloc_canary = arc4random()) == 0)
>   ;
> + if (mopts.chunk_canaries)
> + do {
> + mopts.chunk_canaries = arc4random();
> + } while ((u_char)mopts.chunk_canaries == 0 ||
> + (u_char)mopts.chunk_canaries == SOME_FREEJUNK); 
>  }
>  
>  static void
> @@ -938,7 +943,7 @@ fill_canary(char *ptr, size_t sz, size_t
>  
>   if (check_sz > CHUNK_CHECK_LENGTH)
>   check_sz = CHUNK_CHECK_LENGTH;
> - memset(ptr + sz, SOME_JUNK, check_sz);
> + memset(ptr + sz, mopts.chunk_canaries, check_sz);
>  }
>  
>  /*
> @@ -1039,7 +1044,7 @@ validate_canary(struct dir_info *d, u_ch
>   q = p + check_sz;
>  
>   while (p < q) {
> - if (*p != SOME_JUNK) {
> + if (*p != (u_char)mopts.chunk_canaries && *p != SOME_JUNK) {
>   wrterror(d, "chunk canary corrupted %p %#tx@%#zx%s",
>   ptr, p - ptr, sz,
>   *p == SOME_FREEJUNK ? " (double free?)" : "");
> 



dump: better handling of large filesystems

2020-09-29 Thread Otto Moerbeek
Hi,

this fixes an overwrite of spcl.c_addr.  Taken from FreeBSD.

See https://marc.info/?l=openbsd-misc&m=160018252418088&w=2
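
The guard in the diff below amounts to a bounds check before walking c_addr; presumably TS_CLRI and TS_BITS records are skipped because their c_count does not count c_addr entries. A standalone sketch of just the check, with a stand-in array instead of the real spcl record; the function name is invented and the value 512 for TP_NINDIR is an assumption:

```c
#define TP_NINDIR 512	/* assumed capacity of the c_addr array */

/*
 * Count the non-zero entries among the first `count` bytes of addr.
 * Returns -1 instead of reading past the array when count is out of range.
 */
long
count_present_blocks(const char *addr, long count)
{
	long i, blks = 0;

	if (count < 0 || count > TP_NINDIR)
		return -1;
	for (i = 0; i < count; i++)
		if (addr[i] != 0)
			blks++;
	return blks;
}
```

The diff itself calls quit() in the out-of-range case; returning -1 here only keeps the sketch testable.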

-Otto


Index: tape.c
===
RCS file: /cvs/src/sbin/dump/tape.c,v
retrieving revision 1.45
diff -u -p -r1.45 tape.c
--- tape.c  28 Jun 2019 13:32:43 -  1.45
+++ tape.c  26 Sep 2020 06:30:37 -
@@ -330,7 +330,10 @@ flushtape(void)
}
 
blks = 0;
-   if (spcl.c_type != TS_END) {
+   if (spcl.c_type != TS_END && spcl.c_type != TS_CLRI &&
+   spcl.c_type != TS_BITS) {
+   if (spcl.c_count > TP_NINDIR)
+   quit("c_count too large\n");
for (i = 0; i < spcl.c_count; i++)
if (spcl.c_addr[i] != 0)
blks++;



random canary bytes for malloc

2020-09-29 Thread Otto Moerbeek
Hi,

until now, canary bytes (used by the C option) were the same as the
bytes used to junk (0xfd).  This means that certain overwrites are not
detected, like setting the high bit. 

This makes the byte value used to write canaries random. I do not want
to complicate the code to handle all combinations of F and C, so 0xfd
is still accepted as a canary byte.

Please test with all your favourite combinations of malloc flags.
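
To illustrate the high bit case outside of malloc.c, here is a simplified standalone model. It is a sketch, not the libc code: demo_rnd is a fixed sequence standing in for arc4random(), the helpers take the canary as a parameter instead of reading mopts, and the 0xdf value for SOME_FREEJUNK is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SOME_JUNK	0xfd	/* junk byte, as stated above */
#define SOME_FREEJUNK	0xdf	/* assumed value of the free-junk byte */

/* Deterministic stand-in for arc4random(), for demonstration only. */
uint32_t
demo_rnd(void)
{
	static const uint32_t seq[] = { 0x1100, 0x22df, 0x3342 };
	static unsigned int i;

	return seq[i++ % 3];
}

/* Draw until the low byte is neither 0 nor SOME_FREEJUNK. */
uint32_t
pick_canary(uint32_t (*rnd)(void))
{
	uint32_t c;

	do {
		c = rnd();
	} while ((uint8_t)c == 0 || (uint8_t)c == SOME_FREEJUNK);
	return c;
}

void
fill_canary(uint8_t *p, size_t len, uint32_t canary)
{
	memset(p, (uint8_t)canary, len);
}

/* Returns the index of the first corrupted byte, or -1 if all are intact. */
long
validate_canary(const uint8_t *p, size_t len, uint32_t canary)
{
	size_t i;

	for (i = 0; i < len; i++)
		if (p[i] != (uint8_t)canary && p[i] != SOME_JUNK)
			return (long)i;
	return -1;
}
```

With a canary byte of 0x42, setting the high bit turns it into 0xc2 and is caught. With the old fixed 0xfd, OR-ing in 0x80 leaves the byte unchanged and nothing is noticed.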

-Otto

Index: malloc.c
===
RCS file: /cvs/src/lib/libc/stdlib/malloc.c,v
retrieving revision 1.263
diff -u -p -r1.263 malloc.c
--- malloc.c6 Sep 2020 06:41:03 -   1.263
+++ malloc.c10 Sep 2020 10:53:18 -
@@ -193,7 +193,7 @@ struct malloc_readonly {
int def_malloc_junk;/* junk fill? */
int malloc_realloc; /* always realloc? */
int malloc_xmalloc; /* xmalloc behaviour? */
-   int chunk_canaries; /* use canaries after chunks? */
+   u_int   chunk_canaries; /* use canaries after chunks? */
int internal_funcs; /* use better recallocarray/freezero? */
u_int   def_malloc_cache;   /* free pages we cache */
size_t  malloc_guard;   /* use guard pages after allocations? */
@@ -468,6 +468,11 @@ omalloc_init(void)
 
while ((mopts.malloc_canary = arc4random()) == 0)
;
+   if (mopts.chunk_canaries)
+   do {
+   mopts.chunk_canaries = arc4random();
+   } while ((u_char)mopts.chunk_canaries == 0 ||
+   (u_char)mopts.chunk_canaries == SOME_FREEJUNK); 
 }
 
 static void
@@ -938,7 +943,7 @@ fill_canary(char *ptr, size_t sz, size_t
 
if (check_sz > CHUNK_CHECK_LENGTH)
check_sz = CHUNK_CHECK_LENGTH;
-   memset(ptr + sz, SOME_JUNK, check_sz);
+   memset(ptr + sz, mopts.chunk_canaries, check_sz);
 }
 
 /*
@@ -1039,7 +1044,7 @@ validate_canary(struct dir_info *d, u_ch
q = p + check_sz;
 
while (p < q) {
-   if (*p != SOME_JUNK) {
+   if (*p != (u_char)mopts.chunk_canaries && *p != SOME_JUNK) {
wrterror(d, "chunk canary corrupted %p %#tx@%#zx%s",
ptr, p - ptr, sz,
*p == SOME_FREEJUNK ? " (double free?)" : "");



Re: btrace: add boolean AND and OR operators

2020-09-14 Thread Otto Moerbeek
On Mon, Sep 14, 2020 at 03:28:17PM +0200, Jasper Lievisse Adriaanse wrote:

> Hi,
> 
> This diff adds support for the '&' and '|' operators, along with
> a new testcase.
> 
> OK?

The precedence looks funny

I'd guess you want

%left '|'
%left '&'
%left '+' '-'
%left '/' '*'

To avoid surprises.
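
To show what the order above means in practice, here is a tiny precedence-climbing evaluator with the same binding strengths (in yacc, later %left lines bind tighter). It is a standalone sketch, unrelated to bt_parse.y; all names are made up and error handling is minimal.

```c
#include <ctype.h>
#include <stdlib.h>

/* Binding strength, mirroring the %left order above (higher = tighter). */
static int
prec(char op)
{
	switch (op) {
	case '|': return 1;
	case '&': return 2;
	case '+': case '-': return 3;
	case '*': case '/': return 4;
	default: return 0;	/* not a binary operator */
	}
}

static long
apply(char op, long a, long b)
{
	switch (op) {
	case '|': return a | b;
	case '&': return a & b;
	case '+': return a + b;
	case '-': return a - b;
	case '*': return a * b;
	default: return b != 0 ? a / b : 0;	/* '/', dividing by 0 gives 0 here */
	}
}

static long parse_expr(const char **s, int minprec);

static void
skip_ws(const char **s)
{
	while (isspace((unsigned char)**s))
		(*s)++;
}

static long
parse_primary(const char **s)
{
	char *end;
	long v;

	skip_ws(s);
	if (**s == '(') {
		(*s)++;
		v = parse_expr(s, 1);
		skip_ws(s);
		if (**s == ')')
			(*s)++;
		return v;
	}
	v = strtol(*s, &end, 10);
	*s = end;
	return v;
}

static long
parse_expr(const char **s, int minprec)
{
	long lhs = parse_primary(s);

	for (;;) {
		char op;
		int p;

		skip_ws(s);
		op = **s;
		p = prec(op);
		if (p == 0 || p < minprec)
			return lhs;
		(*s)++;
		/* left-associative: the right side only takes tighter operators */
		lhs = apply(op, lhs, parse_expr(s, p + 1));
	}
}

long
eval(const char *s)
{
	return parse_expr(&s, 1);
}
```

With this order eval("2 * 3 & 1") is (2 * 3) & 1 = 0. If '&' bound tighter than '*', as a %left line placed after '/' '*' would make it, the same input would mean 2 * (3 & 1) = 2.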

-Otto

> 
> Index: usr.sbin/btrace/bt_parse.y
> ===
> RCS file: /cvs/src/usr.sbin/btrace/bt_parse.y,v
> retrieving revision 1.16
> diff -u -p -r1.16 bt_parse.y
> --- usr.sbin/btrace/bt_parse.y11 Jul 2020 14:52:14 -  1.16
> +++ usr.sbin/btrace/bt_parse.y14 Sep 2020 15:14:10 -
> @@ -119,6 +119,7 @@ static int yylex(void);
>  
>  %left'+' '-'
>  %left'/' '*'
> +%left'&' '|'
>  %%
>  
>  grammar  : /* empty */
> @@ -172,6 +173,8 @@ term  : '(' term ')'  { $$ = $2; }
>   | term '-' term { $$ = ba_op('-', $1, $3); }
>   | term '/' term { $$ = ba_op('/', $1, $3); }
>   | term '*' term { $$ = ba_op('*', $1, $3); }
> + | term '&' term { $$ = ba_op('&', $1, $3); }
> + | term '|' term { $$ = ba_op('|', $1, $3); }
>   | NUMBER{ $$ = ba_new($1, B_AT_LONG); }
>   | builtin   { $$ = ba_new(NULL, $1); }
>   | gvar  { $$ = bv_get($1); }
> @@ -331,6 +334,12 @@ ba_op(const char op, struct bt_arg *da0,
>   break;
>   case '/':
>   type = B_AT_OP_DIVIDE;
> + break;
> + case '&':
> + type = B_AT_OP_AND;
> + break;
> + case '|':
> + type = B_AT_OP_OR;
>   break;
>   default:
>   assert(0);
> Index: usr.sbin/btrace/bt_parser.h
> ===
> RCS file: /cvs/src/usr.sbin/btrace/bt_parser.h,v
> retrieving revision 1.9
> diff -u -p -r1.9 bt_parser.h
> --- usr.sbin/btrace/bt_parser.h   13 Aug 2020 11:29:39 -  1.9
> +++ usr.sbin/btrace/bt_parser.h   14 Sep 2020 15:14:10 -
> @@ -143,6 +143,8 @@ struct bt_arg {
>   B_AT_OP_MINUS,
>   B_AT_OP_MULT,
>   B_AT_OP_DIVIDE,
> + B_AT_OP_AND,
> + B_AT_OP_OR,
>   }ba_type;
>  };
>  
> Index: usr.sbin/btrace/btrace.c
> ===
> RCS file: /cvs/src/usr.sbin/btrace/btrace.c,v
> retrieving revision 1.24
> diff -u -p -r1.24 btrace.c
> --- usr.sbin/btrace/btrace.c  11 Sep 2020 08:16:15 -  1.24
> +++ usr.sbin/btrace/btrace.c  14 Sep 2020 15:14:10 -
> @@ -812,7 +812,7 @@ stmt_store(struct bt_stmt *bs, struct dt
>   case B_AT_BI_NSECS:
>   bv->bv_value = ba_new(builtin_nsecs(dtev), B_AT_LONG);
>   break;
> - case B_AT_OP_ADD ... B_AT_OP_DIVIDE:
> + case B_AT_OP_ADD ... B_AT_OP_OR:
>   bv->bv_value = ba_new(ba2long(ba, dtev), B_AT_LONG);
>   break;
>   default:
> @@ -992,6 +992,12 @@ baexpr2long(struct bt_arg *ba, struct dt
>   case B_AT_OP_DIVIDE:
>   result = first / second;
>   break;
> + case B_AT_OP_AND:
> + result = first & second;
> + break;
> + case B_AT_OP_OR:
> + result = first | second;
> + break;
>   default:
>   xabort("unsuported operation %d", ba->ba_type);
>   }
> @@ -1025,7 +1031,7 @@ ba2long(struct bt_arg *ba, struct dt_evt
>   case B_AT_BI_RETVAL:
>   val = dtev->dtev_sysretval[0];
>   break;
> - case B_AT_OP_ADD ... B_AT_OP_DIVIDE:
> + case B_AT_OP_ADD ... B_AT_OP_OR:
>   val = baexpr2long(ba, dtev);
>   break;
>   default:
> @@ -1093,7 +1099,7 @@ ba2str(struct bt_arg *ba, struct dt_evt 
>   case B_AT_VAR:
>   str = ba2str(ba_read(ba), dtev);
>   break;
> - case B_AT_OP_ADD ... B_AT_OP_DIVIDE:
> + case B_AT_OP_ADD ... B_AT_OP_OR:
>   snprintf(buf, sizeof(buf) - 1, "%ld", ba2long(ba, dtev));
>   str = buf;
>   break;
> @@ -1152,7 +1158,7 @@ ba2dtflags(struct bt_arg *ba)
>   case B_AT_MF_MAX:
>   case B_AT_MF_MIN:
>   case B_AT_MF_SUM:
> - case B_AT_OP_ADD ... B_AT_OP_DIVIDE:
> + case B_AT_OP_ADD ... B_AT_OP_OR:
>   break;
>   default:
>   xabort("invalid argument type %d", ba->ba_type);
> Index: regress/usr.sbin/btrace/Makefile
> ===
> RCS file: /cvs/src/regress/usr.sbin/btrace/Makefile,v
> retrieving revision 1.4
> diff -u -p -r1.4 Makefile
> --- 

Re: asn1/a_bitstring.c: zeroing after recallocarray

2020-09-02 Thread Otto Moerbeek
On Thu, Sep 03, 2020 at 07:03:14AM +0200, Theo Buehler wrote:

> The memset is not needed as recallocarray(3) does the zeroing already.
> (I also think the a->data == NULL check in the if clause is redundant,
> but I'm just suggesting to remove a bit that confused me)

ok,
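
For anyone reading along without recallocarray(3) available, a rough portable stand-in showing the property the diff relies on: memory beyond the old contents comes back zeroed. zeroing_resize is a made-up helper; unlike the real function it does not wipe the old allocation before freeing it.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Resize an array of oldnmemb elements to newnmemb elements of `size`
 * bytes each. The old contents are kept and any new tail is zero-filled.
 * Returns NULL on failure, in which case ptr is left untouched.
 */
void *
zeroing_resize(void *ptr, size_t oldnmemb, size_t newnmemb, size_t size)
{
	void *n;
	size_t keep;

	/* calloc() zeroes the memory and checks newnmemb * size for overflow */
	if ((n = calloc(newnmemb, size)) == NULL)
		return NULL;
	if (ptr != NULL) {
		keep = oldnmemb < newnmemb ? oldnmemb : newnmemb;
		memcpy(n, ptr, keep * size);
		free(ptr);
	}
	return n;
}
```

Growing a 2-byte buffer to 5 bytes this way leaves bytes 2 to 4 at zero without a separate memset, which is why the removed lines were redundant.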

-Otto

> 
> Index: asn1/a_bitstr.c
> ===
> RCS file: /var/cvs/src/lib/libcrypto/asn1/a_bitstr.c,v
> retrieving revision 1.29
> diff -u -p -U7 -r1.29 a_bitstr.c
> --- asn1/a_bitstr.c   20 Oct 2018 16:07:09 -  1.29
> +++ asn1/a_bitstr.c   15 Jun 2020 12:46:00 -
> @@ -211,16 +211,14 @@ ASN1_BIT_STRING_set_bit(ASN1_BIT_STRING 
>   if ((a->length < (w + 1)) || (a->data == NULL)) {
>   if (!value)
>   return(1); /* Don't need to set */
>   if ((c = recallocarray(a->data, a->length, w + 1, 1)) == NULL) {
>   ASN1error(ERR_R_MALLOC_FAILURE);
>   return 0;
>   }
> - if (w + 1 - a->length > 0)
> - memset(c + a->length, 0, w + 1 - a->length);
>   a->data = c;
>   a->length = w + 1;
>   }
>   a->data[w] = ((a->data[w]) & iv) | v;
>   while ((a->length > 0) && (a->data[a->length - 1] == 0))
>   a->length--;
>  
> 



Re: shrinking and growing reallocs: a theoretical? bad case for performance

2020-09-02 Thread Otto Moerbeek
On Tue, Sep 01, 2020 at 11:56:36AM +0100, Stuart Henderson wrote:

> On 2020/08/31 08:39, Otto Moerbeek wrote:
> > A question from Theo made me think about realloc and come up with a
> > particular bad case for performance. I do not know if it happens in
> > practice, but it was easy to create a test program to hit the case.
> 
> Not very scientific testing (a single attempt at building one port), but
> this seems to help quite a lot when compiling programs written in rust.
> I encourage others to test the diff :-)
> 

It turned out this particular case was a fluke. But I'm still very
interested in cases where it does matter and tests in general as well.

-Otto



Re: shrinking and growing reallocs: a theoretical? bad case for performance

2020-08-31 Thread Otto Moerbeek
On Mon, Aug 31, 2020 at 11:25:51AM -0600, Theo de Raadt wrote:

> > Taking advantage of the sparse address space is smart and as 64-bit
> > is now the norm, that space is even sparser.
> 
> Fundamentally this is moving various forms of pressure to the kernel,
> which does not do the best job yet.

This effect is reduced by making small shrinks a no-op.

> 
> The pivot code in mmap for new mappings isn't entirely bug-free so we've
> avoided turning it on.  The idea of that code is to be as random as
> necessary -- creating "unknowable addresses", but in doing so avoid
> fragmenting the address space excessively.  Excessive fragmentation in turn
> fragments allocation in multi-level page-tables, and that in turn
> results in excessive TLB pressure.  Which is difficult to gauge since things
> keep working, but brings in a big performance cost.
> 
> Basically we were brave to do very high amounts of randomization early on.
> At a cost.  But our work to improve the cost isn't finished.



Re: shrinking and growing reallocs: a theoretical? bad case for performance

2020-08-31 Thread Otto Moerbeek
On Mon, Aug 31, 2020 at 08:28:25AM -0400, David Higgs wrote:

> On Mon, Aug 31, 2020 at 2:41 AM Otto Moerbeek  wrote:
> 
> > Hi,
> >
> > A question from Theo made me think about realloc and come up with a
> > particular bad case for performance. I do not know if it happens in
> > practice, but it was easy to create a test program to hit the case.
> >
> > We're talking about allocations >= a page here. Smaller allocations follow
> > different rules.
> >
> > If an allocation is grown by realloc, it first tries to extend the
> > allocation by mapping pages next to the existing allocation. Since
> > the location of pages is randomized, chances are high that next to an
> > allocation there are unmapped pages so the grow will work out.
> >
> > If realloc needs to shrink the allocation it puts the high pages no
> > longer needed in the malloc cache. There they can be re-used by other
> > allocations. But if that happens, a later grow of the first allocation
> > will fail: the pages are already mapped. So realloc needs to do a new
> > allocation followed by a copy and a cleanup of the original.
> >
> > So this strategy of a shrinking realloc putting unneeded pages into
> > the cache can work against us, plus it has the consequence that use of
> > realloc leads to allocations close to each other: no free guard pages.
> >
> 
> If I am interpreting this correctly, realloc could be used to groom/shape
> the heap such that future allocations are less random and more predictable?
> 
> --david

In a way yes, but that's a consequence of caching pages: new
allocations will come from the cache if possible. But with this diff
there are fewer possibilities. Also note that malloc option S disables
the cache. 

-Otto



shrinking and growing reallocs: a theoretical? bad case for performance

2020-08-31 Thread Otto Moerbeek
Hi,

A question from Theo made me think about realloc and come up with a
particular bad case for performance. I do not know if it happens in
practice, but it was easy to create a test program to hit the case.

We're talking about allocations >= a page here. Smaller allocations
follow different rules.

If an allocation is grown by realloc, it first tries to extend the
allocation by mapping pages next to the existing allocation. Since
the location of pages is randomized, chances are high that next to an
allocation there are unmapped pages so the grow will work out.

If realloc needs to shrink the allocation it puts the high pages no
longer needed in the malloc cache. There they can be re-used by other
allocations. But if that happens, a subsequent grow of the first
allocation will fail: the pages are already mapped. So realloc needs
to do a new allocation followed by a copy and a cleanup of the original.

So this strategy of a shrinking realloc putting unneeded pages into
the cache can work against us, plus it has the consequence that use of
realloc leads to allocations close to each other: no free guard pages.

The program below tests this scenario and runs awfully slowly. The diff
fixes this by applying two strategies. The first already makes a huge
difference, but the second strategy will also reduce the total number
of syscalls at the cost of some more memory use.

1. I do not put high pages of shrinking reallocs into the cache, but
directly unmap them.

2. For small shrinking reallocs, realloc becomes a no-op. Pro: no
syscalls at all; con: the actual allocation is larger, so less
overflow detection. So I do not do this if guard pages are active or
the reduction is larger than the cache size.
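
To make the two strategies concrete, here is a minimal sketch in C. It
is an illustration under stated assumptions, not the code from the
diff: the names choose_shrink_action, cache_pages and guard_active are
hypothetical, and all sizes are counted in pages.

```c
#include <stddef.h>

enum shrink_action {
	SHRINK_NOOP,	/* keep the allocation as is, no syscall */
	SHRINK_UNMAP	/* unmap the high pages directly, bypassing the cache */
};

/*
 * Decide what a shrinking realloc of a page-sized allocation does.
 * Assumes old_pages > new_pages. All names are hypothetical.
 */
static enum shrink_action
choose_shrink_action(size_t old_pages, size_t new_pages, size_t cache_pages,
    int guard_active)
{
	size_t reduction = old_pages - new_pages;

	/*
	 * Strategy 2: a small shrink is a no-op, unless guard pages are
	 * active or the reduction is larger than the cache size.
	 */
	if (!guard_active && reduction <= cache_pages)
		return SHRINK_NOOP;

	/* Strategy 1: never feed the high pages into the cache. */
	return SHRINK_UNMAP;
}
```

With a cache of 64 pages and no guard pages, shrinking from 1024 to
1023 pages would be a no-op in this sketch, while shrinking from 1024
to 512 pages would unmap the unused pages right away.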

Some stats: the first run is -current, the second one is with (an earlier
version of) the diff, both on an armv7 machine. Other systems also show
huge differences.

[otto@wand:19]$ time ./a.out
0m31.68s real 0m10.02s user 0m21.65s system

[otto@wand:33]$ time ./a.out
0m00.16s real 0m00.12s user 0m00.03s system

I do not see any difference for builds. But I can imagine real-life
programs hitting the case.

-Otto



#include <err.h>
#include <stdlib.h>
#include <unistd.h>

void *p;
size_t psz;
#define E(x) if ((x) == NULL) err(1, NULL)

void f(void)
{
	int i;
	void *s[64];

	/* shrink by one page, then let 64 one-page allocations reuse it */
	p = realloc(p, 1023*psz);
	E(p);
	for (i = 0; i < 64; i++) {
		s[i] = malloc(psz);
		E(s[i]);
	}
	/* growing in place is now likely to fail: the next page is in use */
	p = realloc(p, 1024*psz);
	E(p);
	for (i = 0; i < 64; i++)
		free(s[i]);
}

int main()
{
	int i;

	psz = getpagesize();
	p = malloc(1024*psz);
	E(p);
	for (i = 0; i < 1000; i++)
		f();
}


Index: malloc.c
===
RCS file: /cvs/src/lib/libc/stdlib/malloc.c,v
retrieving revision 1.262
diff -u -p -r1.262 malloc.c
--- malloc.c28 Jun 2019 13:32:42 -  1.262
+++ malloc.c31 Aug 2020 06:01:40 -
@@ -728,28 +728,8 @@ unmap(struct dir_info *d, void *p, size_
wrterror(d, "malloc cache overflow");
 }
 
-static void
-zapcacheregion(struct dir_info *d, void *p, size_t len)
-{
-   u_int i;
-   struct region_info *r;
-   size_t rsz;
-
-   for (i = 0; i < d->malloc_cache; i++) {
-   r = &d->free_regions[i];
-   if (r->p >= p && r->p <= (void *)((char *)p + len)) {
-   rsz = r->size << MALLOC_PAGESHIFT;
-   if (munmap(r->p, rsz))
-   wrterror(d, "munmap %p", r->p);
-   r->p = NULL;
-   d->free_regions_size -= r->size;
-   STATS_SUB(d->malloc_used, rsz);
-   }
-   }
-}
-
 static void *
-map(struct dir_info *d, void *hint, size_t sz, int zero_fill)
+map(struct dir_info *d, size_t sz, int zero_fill)
 {
size_t psz = sz >> MALLOC_PAGESHIFT;
struct region_info *r, *big = NULL;
@@ -762,7 +742,7 @@ map(struct dir_info *d, void *hint, size
if (sz != PAGEROUND(sz))
wrterror(d, "map round");
 
-   if (hint == NULL && psz > d->free_regions_size) {
+   if (psz > d->free_regions_size) {
_MALLOC_LEAVE(d);
p = MMAP(sz, d->mmap_flag);
_MALLOC_ENTER(d);
@@ -774,8 +754,6 @@ map(struct dir_info *d, void *hint, size
for (i = 0; i < d->malloc_cache; i++) {
r = &d->free_regions[(i + d->rotor) & (d->malloc_cache - 1)];
if (r->p != NULL) {
-   if (hint != NULL && r->p != hint)
-   continue;
if (r->size == psz) {
p = r->p;
r->p = NULL;
@@ -807,8 +785,6 @@ map(struct dir_info *d, void *hint, size
memset(p, SOME_FREEJUNK, sz);
return p;
}
-   if (hint != NULL)
-   return 

Re: ntpd: go into unsynced mode

2020-08-30 Thread Otto Moerbeek
On Sat, Aug 22, 2020 at 03:51:48PM +0200, Otto Moerbeek wrote:

> Hi,
> 
> At the moment ntpd never goes into unsynced mode if network
> connectivity is lost. The code to do that is only triggered when a
> packet is received, which does not happen.
> 
> This diff fixes that by going into unsynced mode if no time data was
> processed for a while. 
> 
> An earlier version of this diff was tested by naddy@. Compared to that
> version, the needed period of inactivity is now three times as large
> and I set scale to 1, so recovery goes faster.
> 
> Please test and review,

anyone wants to ok?

-Otto


> 
> Index: ntp.c
> ===
> RCS file: /cvs/src/usr.sbin/ntpd/ntp.c,v
> retrieving revision 1.165
> diff -u -p -r1.165 ntp.c
> --- ntp.c 22 Jun 2020 06:11:34 -  1.165
> +++ ntp.c 22 Aug 2020 13:48:34 -
> @@ -89,6 +89,7 @@ ntp_main(struct ntpd_conf *nconf, struct
>   struct stat  stb;
>   struct ctl_conn *cc;
>   time_t   nextaction, last_sensor_scan = 0, now;
> + time_t   last_action = 0, interval;
>   void*newp;
>  
>   if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, PF_UNSPEC,
> @@ -402,6 +403,7 @@ ntp_main(struct ntpd_conf *nconf, struct
>   for (; nfds > 0 && j < idx_clients; j++) {
>   if (pfd[j].revents & (POLLIN|POLLERR)) {
>   nfds--;
> + last_action = now;
>   if (client_dispatch(idx2peer[j - idx_peers],
>   conf->settime, conf->automatic) == -1) {
>   log_warn("pipe write error (settime)");
> @@ -417,8 +419,24 @@ ntp_main(struct ntpd_conf *nconf, struct
>   for (s = TAILQ_FIRST(&conf->ntp_sensors); s != NULL;
>   s = next_s) {
>   next_s = TAILQ_NEXT(s, entry);
> - if (s->next <= getmonotime())
> + if (s->next <= now) {
> + last_action = now;
>   sensor_query(s);
> + }
> + }
> +
> + /*
> +  * Compute maximum of scale_interval(INTERVAL_QUERY_NORMAL),
> +  * if we did not process a time message for three times that
> +  * interval, stop advertising we're synced.
> +  */
> + interval = INTERVAL_QUERY_NORMAL * conf->scale;
> + interval += MAXIMUM(5, interval / 10) - 1;
> + if (conf->status.synced && last_action + 3 * interval < now) {
> + log_info("clock is now unsynced");
> + conf->status.synced = 0;
> + conf->scale = 1;
> + priv_dns(IMSG_UNSYNCED, NULL, 0);
>   }
>   }
>  
> 



Re: ntpd: go into unsynced mode

2020-08-25 Thread Otto Moerbeek
On Tue, Aug 25, 2020 at 07:05:31PM +0200, Matthias Schmidt wrote:

> Hi Otto,
> 
> * Otto Moerbeek wrote:
> > Hi,
> > 
> > At the moment ntpd never goes into unsynced mode if network
> > connectivity is lost. The code to do that is only triggered when a
> > packet is received, which does not happen.
> > 
> > This diff fixes that by going into unsynced mode if no time data was
> > processed for a while. 
> > 
> > An earlier version of this diff was tested by naddy@. Compared to that
> > version, the needed period of inactivity is now three times as large
> > and I set scale to 1, so recovery goes faster.
> > 
> > Please test and review,
> 
> I have your diff running on my laptop, which is sometimes not connected
> to a network, so it should be a good test case.
> 
> I haven't noticed any difference to before, so I count that as a good
> sign :)  I spotted only one thing:  While "ntpctl -s a" says that the
> clock is unsynced I see no message from ntpd in the logs.  Not sure if
> that's on purpose or not, I just noticed it.

Thanks for testing. 

There should be "clock is now unsynced" and "clock is now synced" messages
in /var/log/daemon... here they do appear.

-Otto



ntpd: go into unsynced mode

2020-08-22 Thread Otto Moerbeek
Hi,

At the moment ntpd never goes into unsynced mode if network
connectivity is lost. The code to do that is only triggered when a
packet is received, which does not happen.

This diff fixes that by going into unsynced mode if no time data was
processed for a while. 

An earlier version of this diff was tested by naddy@. Compared to that
version, the needed period of inactivity is now three times as large
and I set scale to 1, so recovery goes faster.
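
The timeout check can be sketched in isolation as follows. This is a
simplified illustration, not the ntpd code itself: should_unsync is a
hypothetical helper, and the value 30 for INTERVAL_QUERY_NORMAL is an
assumption made for the example.

```c
#include <time.h>

#define INTERVAL_QUERY_NORMAL	30	/* seconds; assumed value */
#define MAXIMUM(a, b)		((a) > (b) ? (a) : (b))

/*
 * Return 1 if a clock that is currently marked synced should be
 * declared unsynced at time now, given the time of the last processed
 * time message. Hypothetical helper, mirrors the check in the diff.
 */
static int
should_unsync(int synced, time_t last_action, time_t now, int scale)
{
	time_t interval;

	/* upper bound of the scaled query interval, including jitter */
	interval = INTERVAL_QUERY_NORMAL * scale;
	interval += MAXIMUM(5, interval / 10) - 1;

	/* three such intervals without any time data: give up */
	return synced && last_action + 3 * interval < now;
}
```

Under these assumptions and with scale 1, the interval is 34 seconds,
so the clock is declared unsynced once more than 102 seconds pass
without any processed time data.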

Please test and review,

-Otto

Index: ntp.c
===
RCS file: /cvs/src/usr.sbin/ntpd/ntp.c,v
retrieving revision 1.165
diff -u -p -r1.165 ntp.c
--- ntp.c   22 Jun 2020 06:11:34 -  1.165
+++ ntp.c   22 Aug 2020 13:48:34 -
@@ -89,6 +89,7 @@ ntp_main(struct ntpd_conf *nconf, struct
struct stat  stb;
struct ctl_conn *cc;
time_t   nextaction, last_sensor_scan = 0, now;
+   time_t   last_action = 0, interval;
void*newp;
 
if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, PF_UNSPEC,
@@ -402,6 +403,7 @@ ntp_main(struct ntpd_conf *nconf, struct
for (; nfds > 0 && j < idx_clients; j++) {
if (pfd[j].revents & (POLLIN|POLLERR)) {
nfds--;
+   last_action = now;
if (client_dispatch(idx2peer[j - idx_peers],
conf->settime, conf->automatic) == -1) {
log_warn("pipe write error (settime)");
@@ -417,8 +419,24 @@ ntp_main(struct ntpd_conf *nconf, struct
for (s = TAILQ_FIRST(&conf->ntp_sensors); s != NULL;
s = next_s) {
next_s = TAILQ_NEXT(s, entry);
-   if (s->next <= getmonotime())
+   if (s->next <= now) {
+   last_action = now;
sensor_query(s);
+   }
+   }
+
+   /*
+* Compute maximum of scale_interval(INTERVAL_QUERY_NORMAL),
+* if we did not process a time message for three times that
+* interval, stop advertising we're synced.
+*/
+   interval = INTERVAL_QUERY_NORMAL * conf->scale;
+   interval += MAXIMUM(5, interval / 10) - 1;
+   if (conf->status.synced && last_action + 3 * interval < now) {
+   log_info("clock is now unsynced");
+   conf->status.synced = 0;
+   conf->scale = 1;
+   priv_dns(IMSG_UNSYNCED, NULL, 0);
}
}
 



Re: adjtime(2): distribute skew along arbitrary runtime period

2020-07-16 Thread Otto Moerbeek
On Wed, Jul 15, 2020 at 09:08:29AM -0500, Scott Cheloha wrote:

> Hi,
> 
> adjtime(2) skews the clock at up to 5000ppm per second.  The way this
> actually happens is pretty straightforward: at the start of every UTC
> second we call ntp_update_second() from tc_windup() and reset
> th_adjustment.  th_adjustment is then mixed into the scale for one UTC
> second.  This cycle slowly chips away at th_adjtimedelta, eventually
> reducing it to zero.
> 
> This is fine, except that using UTC for your update period requires
> you to work around how the UTC time can jump forward a huge amount.
> There are two notable jumps:
> 
> 1. The big jump forward to the RTC time during boot.
> 
> 2. The big jump forward to the RTC time after each resume.
> 
> To handle this we have a magic number in the code, LARGE_STEP.  If the
> UTC time jumps more than LARGE_STEP (200) seconds we truncate the
> number of ntp_update_second() calls to 2 to avoid looping endlessly in
> tc_windup().  Here we find a wart: we do 2 calls to account for a
> missed leap second, even though we no longer handle those in the
> kernel.
> 
> The magic number approach is less than ideal because it doesn't handle
> short suspends correctly: suspends shorter than 200 seconds are
> deducted from th_adjtimedelta even though we do not skew the clock
> during suspend.
> 
> Now that the timehands have a concept of "runtime" (time spent not
> suspended) I think it would be nicer if we called ntp_update_second()
> along an arbitrary period on the runtime clock.
> 
> So, this diff:
> 
> When adjtime(2) is called the NTP update period (th_next_ntp_update)
> is changed to align with the current runtime.  Thereafter, once per
> second, ntp_update_second() is called.
> 
> We don't deduct any skew from th_adjtimedelta across a big UTC jump
> (like a suspend) because the runtime clock does not advance while the
> machine is down.
> 
> Another upside is that skew changes via adjtime(2) happen immediately
> instead of being applied up to one second later.  For example, if the
> adjtime(2) skew is cancelled, the skew stops right away instead of
> continuing for up to one second.  This behavior seems more correct to
> me.
> 
> And, obviously, we can get rid of the magic number.
> 
> --
> 
> otto: Does the NTP algorithm *require* us to distribute the adjtime(2)
>   skew as we do?  At the start of the UTC second?  Or can we choose
>   an arbitrary starting point for the period like I do in this diff?
> 
>   My intuition is that this diff shouldn't break anything, and my
>   testing suggests it doesn't, but I'd appreciate a test all the same.

As far as I know, the NTP adjustment algorithm does not depend on a
particular point.
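
For readers who want to see the once-per-second cycle described in the
quoted text, here is a small userland model. It is a sketch under
assumptions, not kernel code: skew_step and seconds_to_converge are
hypothetical names, the outstanding delta is kept in microseconds, and
the 5000 ppm cap is expressed as 5000 microseconds per second.

```c
#include <stdint.h>

#define MAX_SKEW_US	5000	/* 5000 ppm: at most 5000 us per second */

/*
 * One update period: pick the adjustment to apply during the next
 * second and deduct it from the outstanding delta.
 */
static int64_t
skew_step(int64_t *delta_us)
{
	int64_t adj = *delta_us;

	if (adj > MAX_SKEW_US)
		adj = MAX_SKEW_US;
	else if (adj < -MAX_SKEW_US)
		adj = -MAX_SKEW_US;
	*delta_us -= adj;
	return adj;
}

/* Number of one-second periods needed to reduce the delta to zero. */
static int
seconds_to_converge(int64_t delta_us)
{
	int n = 0;

	while (delta_us != 0) {
		skew_step(&delta_us);
		n++;
	}
	return n;
}
```

In this model a one second offset takes 200 update periods to work
off. Nothing in the loop depends on where the period starts, only on
how many periods elapse, which matches the point that the starting
point of the period does not matter.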

-Otto

> 
> Index: kern_tc.c
> ===
> RCS file: /cvs/src/sys/kern/kern_tc.c,v
> retrieving revision 1.62
> diff -u -p -r1.62 kern_tc.c
> --- kern_tc.c 6 Jul 2020 13:33:09 -   1.62
> +++ kern_tc.c 15 Jul 2020 13:56:22 -
> @@ -35,14 +35,6 @@
>  #include 
>  #include 
>  
> -/*
> - * A large step happens on boot.  This constant detects such steps.
> - * It is relatively small so that ntp_update_second gets called enough
> - * in the typical 'missed a couple of seconds' case, but doesn't loop
> - * forever when the time step is large.
> - */
> -#define LARGE_STEP   200
> -
>  u_int dummy_get_timecount(struct timecounter *);
>  
>  int sysctl_tc_hardware(void *, size_t *, void *, size_t);
> @@ -77,6 +69,7 @@ struct timehands {
>   /* These fields must be initialized by the driver. */
>   struct timecounter  *th_counter;/* [W] */
>   int64_t th_adjtimedelta;/* [T,W] */
> + struct bintime  th_next_ntp_update; /* [T,W] */
>   int64_t th_adjustment;  /* [W] */
>   u_int64_t   th_scale;   /* [W] */
>   u_int   th_offset_count;/* [W] */
> @@ -564,12 +557,11 @@ void
>  tc_windup(struct bintime *new_boottime, struct bintime *new_offset,
>  int64_t *new_adjtimedelta)
>  {
> - struct bintime bt;
> + struct bintime diff, runtime, utc;
>   struct timecounter *active_tc;
>   struct timehands *th, *tho;
>   u_int64_t scale;
>   u_int delta, ncount, ogen;
> - int i;
>  
>   if (new_boottime != NULL || new_adjtimedelta != NULL)
>   rw_assert_wrlock(&tc_lock);
> @@ -609,8 +601,8 @@ tc_windup(struct bintime *new_boottime, 
>* accordingly.
>*/
>   if (new_offset != NULL && bintimecmp(&th->th_offset, new_offset, <)) {
> - bintimesub(new_offset, &th->th_offset, &bt);
> - bintimeadd(&th->th_naptime, &bt, &th->th_naptime);
> + bintimesub(new_offset, &th->th_offset, &diff);
> + bintimeadd(&th->th_naptime, &diff, &th->th_naptime);
>   th->th_offset = *new_offset;
>   }
>  
> @@ -633,30 +625,29 @@ tc_windup(struct bintime *new_boottime, 
>*/
>   

Re: fsck_ffs: faster with lots of cylinder groups

2020-07-12 Thread Otto Moerbeek
On Sun, Jul 12, 2020 at 11:07:05AM +0200, Solene Rapenne wrote:

> On Sun, 12 Jul 2020 09:13:47 +0200
> Otto Moerbeek :
> 
> > On Mon, Jun 29, 2020 at 02:30:41PM +0200, Otto Moerbeek wrote:
> > 
> > > On Sun, Jun 21, 2020 at 03:35:21PM +0200, Otto Moerbeek wrote:
> > >   
> > > > Hi,
> > > > 
> > > > both phase 1 and phase 5 need cylinder group metadata.  This diff
> > > > keeps the cg data read in phase 1 in memory to be used by phase 5 if
> > > > possible. From FreeBSD. 
> > > > 
> > > > -Otto
> > > > 
> > > > On an empty 30T filesystem:
> > > > 
> > > > $ time obj/fsck_ffs -f /dev/sd3a
> > > > 2m44.10s real 0m13.21s user 0m07.38s system
> > > > 
> > > > $ time doas obj/fsck_ffs -f /dev/sd3a
> > > > 1m32.81s real 0m12.86s user 0m05.25s system
> > > > 
> > > > The difference will be less if a filesystem is filled up, but still
> > > > nice.
> > > 
> > > Any takers?  
> > 
> > No feedback. I'm getting discouraged in doing more filesystem work...
> > 
> > What to do?
> > 
> > 1) Abandon the diff
> > 2) Commit without ok
> > 
> > I did quite extensive testing, but both options are unsatisfactory.
> > 
> > -Otto
> 
> I'm not sure how to test your diff.
> Would running fsck on a sane filesystem be enough?
> 
> Are you using VMs that you halt to force a
> fsck on them? Would this be a good test too?

I have used both large and small filesystems, clean and with
inconsistencies, both ffs1 and ffs2. Sometimes I create
inconsistencies by power cycling a machine, but I have created faulty
filesystems by carefully overwriting metadata with dd in the past as
well.

In this case running with a restricted ulimit -d to force the fallback
code to kick in is also a good idea.
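
To show why the fallback path deserves its own test, here is a
minimal model of a per-cylinder-group cache with a fallback. It is a
sketch, not the code from the diff: cg_get, read_cg, NCG, CGSIZE and
allow_alloc are hypothetical, the disk read is simulated, and it
assumes the fallback is a single shared buffer that is re-read
whenever a different group is requested.

```c
#include <stdlib.h>
#include <string.h>

#define NCG	4	/* number of cylinder groups in this model */
#define CGSIZE	512	/* size of one cylinder group block in this model */

struct cgbuf {
	int	cg;		/* group held by this buffer, -1 if none */
	char	data[CGSIZE];
};

static struct cgbuf *cache[NCG];		/* one slot per group */
static struct cgbuf fallback = { -1, { 0 } };	/* shared fallback buffer */
static int nreads;				/* simulated disk reads */
static int allow_alloc = 1;			/* 0 simulates a low ulimit -d */

/* Simulated read of cylinder group cg into buffer b. */
static void
read_cg(struct cgbuf *b, int cg)
{
	memset(b->data, cg, sizeof(b->data));
	b->cg = cg;
	nreads++;
}

/* Return a buffer holding cylinder group cg, or NULL if cg is invalid. */
static struct cgbuf *
cg_get(int cg)
{
	if (cg < 0 || cg >= NCG)
		return NULL;
	if (cache[cg] != NULL)
		return cache[cg];	/* read earlier, reuse it */
	if (allow_alloc && (cache[cg] = malloc(sizeof(*cache[cg]))) != NULL) {
		read_cg(cache[cg], cg);
		return cache[cg];
	}
	/* out of memory: share one buffer and re-read when needed */
	if (fallback.cg != cg)
		read_cg(&fallback, cg);
	return &fallback;
}
```

In this model a second request for a cached group costs no read,
while under memory pressure every switch to another group costs one.
That second code path is only reached when allocation fails, which is
the reason for forcing it with a restricted data size limit.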

-Otto



Re: fsck_ffs: faster with lots of cylinder groups

2020-07-12 Thread Otto Moerbeek
On Mon, Jun 29, 2020 at 02:30:41PM +0200, Otto Moerbeek wrote:

> On Sun, Jun 21, 2020 at 03:35:21PM +0200, Otto Moerbeek wrote:
> 
> > Hi,
> > 
> > both phase 1 and phase 5 need cylinder group metadata.  This diff
> > keeps the cg data read in phase 1 in memory to be used by phase 5 if
> > possible. From FreeBSD. 
> > 
> > -Otto
> > 
> > On an empty 30T filesystem:
> > 
> > $ time obj/fsck_ffs -f /dev/sd3a
> > 2m44.10s real 0m13.21s user 0m07.38s system
> > 
> > $ time doas obj/fsck_ffs -f /dev/sd3a
> > 1m32.81s real 0m12.86s user 0m05.25s system
> > 
> > The difference will be less if a filesystem is filled up, but still nice.
> 
> Any takers?

No feedback. I'm getting discouraged in doing more filesystem work...

What to do?

1) Abandon the diff
2) Commit without ok

I did quite extensive testing, but both options are unsatisfactory.

-Otto

> 
> > 
> > Index: fsck.h
> > ===
> > RCS file: /cvs/src/sbin/fsck_ffs/fsck.h,v
> > retrieving revision 1.32
> > diff -u -p -r1.32 fsck.h
> > --- fsck.h  5 Jan 2018 09:33:47 -   1.32
> > +++ fsck.h  21 Jun 2020 12:48:50 -
> > @@ -136,7 +136,6 @@ struct bufarea {
> >  struct bufarea bufhead;/* head of list of other blks in 
> > filesys */
> >  struct bufarea sblk;   /* file system superblock */
> >  struct bufarea asblk;  /* alternate file system superblock */
> > -struct bufarea cgblk;  /* cylinder group blocks */
> >  struct bufarea *pdirbp;/* current directory contents */
> >  struct bufarea *pbp;   /* current inode block */
> >  struct bufarea *getdatablk(daddr_t, long);
> > @@ -148,9 +147,7 @@ struct bufarea *getdatablk(daddr_t, long
> > (bp)->b_flags = 0;
> >  
> >  #definesbdirty()   sblk.b_dirty = 1
> > -#definecgdirty()   cgblk.b_dirty = 1
> >  #definesblock  (*sblk.b_un.b_fs)
> > -#definecgrp(*cgblk.b_un.b_cg)
> >  
> >  enum fixstate {DONTKNOW, NOFIX, FIX, IGNORE};
> >  
> > @@ -275,9 +272,13 @@ struct ufs2_dinode ufs2_zino;
> >  #defineFOUND   0x10
> >  
> >  union dinode *ginode(ino_t);
> > +struct bufarea *cglookup(u_int cg);
> >  struct inoinfo *getinoinfo(ino_t);
> >  void getblk(struct bufarea *, daddr_t, long);
> >  ino_t allocino(ino_t, int);
> > +void *Malloc(size_t);
> > +void *Calloc(size_t, size_t);
> > +void *Reallocarray(void *, size_t, size_t);
> >  
> >  int(*info_fn)(char *, size_t);
> >  char   *info_filesys;
> > Index: inode.c
> > ===
> > RCS file: /cvs/src/sbin/fsck_ffs/inode.c,v
> > retrieving revision 1.49
> > diff -u -p -r1.49 inode.c
> > --- inode.c 16 Sep 2018 02:43:11 -  1.49
> > +++ inode.c 21 Jun 2020 12:48:50 -
> > @@ -370,7 +370,7 @@ setinodebuf(ino_t inum)
> > partialsize = inobufsize;
> > }
> > if (inodebuf == NULL &&
> > -   (inodebuf = malloc((unsigned)inobufsize)) == NULL)
> > +   (inodebuf = Malloc((unsigned)inobufsize)) == NULL)
> > errexit("Cannot allocate space for inode buffer\n");
> >  }
> >  
> > @@ -401,7 +401,7 @@ cacheino(union dinode *dp, ino_t inumber
> > blks = howmany(DIP(dp, di_size), sblock.fs_bsize);
> > if (blks > NDADDR)
> > blks = NDADDR + NIADDR;
> > -   inp = malloc(sizeof(*inp) + (blks ? blks - 1 : 0) * sizeof(daddr_t));
> > +   inp = Malloc(sizeof(*inp) + (blks ? blks - 1 : 0) * sizeof(daddr_t));
> > if (inp == NULL)
> > errexit("cannot allocate memory for inode cache\n");
> > inpp = &inphead[inumber % numdirs];
> > @@ -423,10 +423,10 @@ cacheino(union dinode *dp, ino_t inumber
> > inp->i_blks[NDADDR + i] = DIP(dp, di_ib[i]);
> > if (inplast == listmax) {
> > newlistmax = listmax + 100;
> > -   newinpsort = reallocarray(inpsort,
> > +   newinpsort = Reallocarray(inpsort,
> > (unsigned)newlistmax, sizeof(struct inoinfo *));
> > if (newinpsort == NULL)
> > -   errexit("cannot increase directory list");
> > +   errexit("cannot increase directory list\n");
> > inpsort = newinpsort;
> > listmax = newlistmax;
> > }

Re: Undefined Behavior at jsmn.c

2020-07-12 Thread Otto Moerbeek
On Sun, Jul 12, 2020 at 09:57:02AM +0430, Ali Farzanrad wrote:

> Hi @tech,
> 
> I was comparing jsmn.c in acme-client with jsmn.c in FreeBSD [1].
> I found a switch without a default case which is an undefined behavior:
> 
> @@ -69,6 +69,8 @@
>   case '\t' : case '\r' : case '\n' : case ' ' :
>   case ','  : case ']'  : case '}' :
>   goto found;
> + default:
> + break;
>   }
>   if (js[parser->pos] < 32 || js[parser->pos] >= 127) {
>   parser->pos = start;
> 
> I have patched that undefined behavior + some style fix.

It is bad practice to intermix style changes with bug fixes.
Please post the fix separately.

-Otto

> 
> [1] https://svnweb.freebsd.org/base/head/lib/libpmc/pmu-events/jsmn.c
> 
> Index: jsmn.c
> ===
> RCS file: /cvs/src/usr.sbin/acme-client/jsmn.c,v
> retrieving revision 1.1
> diff -u -p -r1.1 jsmn.c
> --- jsmn.c31 Aug 2016 22:01:42 -  1.1
> +++ jsmn.c12 Jul 2020 05:10:34 -
> @@ -1,31 +1,33 @@
>  /*
> - Copyright (c) 2010 Serge A. Zaitsev
> - 
> - Permission is hereby granted, free of charge, to any person obtaining a copy
> - of this software and associated documentation files (the "Software"), to 
> deal
> - in the Software without restriction, including without limitation the rights
> - to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> - copies of the Software, and to permit persons to whom the Software is
> - furnished to do so, subject to the following conditions:
> - 
> - The above copyright notice and this permission notice shall be included in
> - all copies or substantial portions of the Software.
> - 
> - THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> - IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> - FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> - AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> - LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
> FROM,
> - OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> - THE SOFTWARE.*
> + * Copyright (c) 2010 Serge A. Zaitsev
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a 
> copy
> + * of this software and associated documentation files (the "Software"), to 
> deal
> + * in the Software without restriction, including without limitation the 
> rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL 
> THE
> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
> FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
>   */
> +
>  #include "jsmn.h"
>  
> -/**
> - * Allocates a fresh unused token from the token pull.
> +/*
> + * Allocates a fresh unused token from the token pool.
>   */
> -static jsmntok_t *jsmn_alloc_token(jsmn_parser *parser,
> - jsmntok_t *tokens, size_t num_tokens) {
> +static jsmntok_t *
> +jsmn_alloc_token(jsmn_parser *parser, jsmntok_t *tokens, size_t num_tokens)
> +{
>   jsmntok_t *tok;
>   if (parser->toknext >= num_tokens) {
>   return NULL;
> @@ -39,22 +41,25 @@ static jsmntok_t *jsmn_alloc_token(jsmn_
>   return tok;
>  }
>  
> -/**
> +/*
>   * Fills token type and boundaries.
>   */
> -static void jsmn_fill_token(jsmntok_t *token, jsmntype_t type,
> -int start, int end) {
> +static void
> +jsmn_fill_token(jsmntok_t *token, jsmntype_t type, int start, int end)
> +{
>   token->type = type;
>   token->start = start;
>   token->end = end;
>   token->size = 0;
>  }
>  
> -/**
> +/*
>   * Fills next available token with JSON primitive.
>   */
> -static int jsmn_parse_primitive(jsmn_parser *parser, const char *js,
> - size_t len, jsmntok_t *tokens, size_t num_tokens) {
> +static int
> +jsmn_parse_primitive(jsmn_parser *parser, const char *js,
> +size_t len, jsmntok_t *tokens, size_t num_tokens)
> +{
>   jsmntok_t *token;
>   int start;
>  
> @@ -63,12 +68,19 @@ static int jsmn_parse_primitive(jsmn_par
>   for (; parser->pos < len && 

Re: adjfreq(2): limit adjustment to prevent overflow during tc_windup()

2020-07-03 Thread Otto Moerbeek
On Thu, Jul 02, 2020 at 08:27:58PM -0500, Scott Cheloha wrote:

> Hi,
> 
> When we recompute the scaling factor during tc_windup() there is an
> opportunity for arithmetic overflow/underflow when we add the NTP
> adjustment into the scale:
> 
>649  scale = (u_int64_t)1 << 63;
>650  scale += \
>651  ((th->th_adjustment + th->th_counter->tc_freq_adj) / 
> 1024) * 2199;
>652  scale /= th->th_counter->tc_frequency;
>653  th->th_scale = scale * 2;
> 
> At lines 650 and 651, you will overflow/underflow if
> th->th_counter->tc_freq_adj is sufficiently positive/negative.
> 
> I don't like the idea of checking for that overflow during
> tc_windup().  We can pick a reasonable adjustment range and check for
> it during adjfreq(2) and that should be good enough.
> 
> My strawman proposal is a range of -500000000 to 500000000 parts per
> billion.  We could push the limits a bit, but half a billion seems
> like a nice round number to me.
> 
> On a perfect clock, this means you can effect a 0.5x slowdown or a
> 1.5x speedup via adjfreq(2), but no slower/faster.
> 
> I don't *think* ntpd(8) would ever reach such extreme adjustments
> through its algorithm.  I don't think this will break anyone's working
> setup.
> 
> (Maybe I'm wrong, though.  otto@?)

Right, ntpd is pretty conservative and won't do big adjustments.
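
The bounds in the quoted message can be checked with a few lines of
C. This is a sketch under assumptions: scale_numerator and fixed32 are
hypothetical helpers, the limit of half a billion parts per billion is
taken from the proposal above, and the fixed point conversion uses a
multiplication instead of a left shift so that negative inputs stay
well defined.

```c
#include <stdint.h>

#define ADJTIME_MAX_US	5000LL		/* adjtime(2) cap, us per second */
#define ADJFREQ_LIMIT	500000000LL	/* proposed bound, parts per billion */

/* Convert an integer to 32.32 fixed point without shifting negatives. */
static int64_t
fixed32(int64_t v)
{
	return v * ((int64_t)1 << 32);
}

/*
 * The scale computation from the quoted tc_windup() excerpt, up to but
 * not including the division by the counter frequency.
 */
static uint64_t
scale_numerator(int64_t th_adjustment, int64_t tc_freq_adj)
{
	uint64_t scale;

	scale = (uint64_t)1 << 63;
	scale += (uint64_t)((th_adjustment + tc_freq_adj) / 1024 * 2199);
	return scale;
}
```

Calling scale_numerator(fixed32(ADJTIME_MAX_US * 1000),
fixed32(ADJFREQ_LIMIT)) gives 13881125657334775808, and the same call
with both inputs negated gives 4565618416374775808. Both stay inside
the 64-bit unsigned range, in line with the figures in the proposal.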

-Otto

> 
> Just so we're all clear that the math is sound, here's the result at
> the upper limit of the input range.  Note that adjtime(2) is capped at
> 5000PPM in ntp_update_second(), hence its value here.
> 
>   int64_t th_adjustment = (5000 * 1000) << 32;  /* 21474836480000000 */
>   int64_t tc_freq_adj = 500000000LL << 32;      /* 2147483648000000000 */
> 
>   scale = (u_int64_t)1 << 63                    /* 9223372036854775808 */
>   scale += (th_adjustment + tc_freq_adj) / 1024 * 2199;
>   /*    += (2168958484480000000) / 1024 * 2199; */
>   /*    += 4657753620480000000; */
> 
> 9223372036854775808 + 4657753620480000000 = 13881125657334775808,
> which is less than 18446744073709551616, so we don't have overflow.
> 
> At the negative end of the input range, i.e.
> 
>   int64_t th_adjustment = (-5000 * 1000) << 32;
>   int64_t tc_freq_adj = -500000000LL << 32;
> 
> you have 9223372036854775808 - 4657753620480000000 = 4565618416374775808,
> so no underflow either.
> 
> Thoughts?
> 
> What is the best way to express this range in the documentation?  Do I
> say "parts per billion", or something else?
> 
> Index: sys/kern/kern_time.c
> ===
> RCS file: /cvs/src/sys/kern/kern_time.c,v
> retrieving revision 1.131
> diff -u -p -r1.131 kern_time.c
> --- sys/kern/kern_time.c  22 Jun 2020 18:25:57 -  1.131
> +++ sys/kern/kern_time.c  3 Jul 2020 00:57:49 -
> @@ -391,6 +391,9 @@ sys_settimeofday(struct proc *p, void *v
>   return (0);
>  }
>  
> +#define ADJFREQ_MAX (500000000LL << 32)
> +#define ADJFREQ_MIN (-500000000LL << 32)
> +
>  int
>  sys_adjfreq(struct proc *p, void *v, register_t *retval)
>  {
> @@ -408,6 +411,8 @@ sys_adjfreq(struct proc *p, void *v, reg
>   return (error);
>   if ((error = copyin(freq, &f, sizeof(f))))
>   return (error);
> + if (f < ADJFREQ_MIN || f > ADJFREQ_MAX)
> + return (EINVAL);
>   }
>  
>   rw_enter(&tc_lock, (freq == NULL) ? RW_READ : RW_WRITE);
> Index: lib/libc/sys/adjfreq.2
> ===
> RCS file: /cvs/src/lib/libc/sys/adjfreq.2,v
> retrieving revision 1.7
> diff -u -p -r1.7 adjfreq.2
> --- lib/libc/sys/adjfreq.210 Sep 2015 17:55:21 -  1.7
> +++ lib/libc/sys/adjfreq.23 Jul 2020 00:57:49 -
> @@ -60,6 +60,10 @@ The
>  .Fa freq
>  argument is non-null and the process's effective user ID is not that
>  of the superuser.
> +.It Bq Er EINVAL
> +.Fa freq
> +is less than -500000000 parts-per-billion or greater than 500000000
> +parts-per-billion.
>  .El
>  .Sh SEE ALSO
>  .Xr date 1 ,



Re: fsck_ffs: faster with lots of cylinder groups

2020-06-29 Thread Otto Moerbeek
On Sun, Jun 21, 2020 at 03:35:21PM +0200, Otto Moerbeek wrote:

> Hi,
> 
> both phase 1 and phase 5 need cylinder group metadata.  This diff
> keeps the cg data read in phase 1 in memory to be used by phase 5 if
> possible. From FreeBSD. 
> 
>   -Otto
> 
> On an empty 30T filesystem:
> 
> $ time obj/fsck_ffs -f /dev/sd3a
> 2m44.10s real 0m13.21s user 0m07.38s system
> 
> $ time doas obj/fsck_ffs -f /dev/sd3a
> 1m32.81s real 0m12.86s user 0m05.25s system
> 
> The difference will be less if a filesystem is filled up, but still nice.

Any takers?

-Otto

> 
> Index: fsck.h
> ===
> RCS file: /cvs/src/sbin/fsck_ffs/fsck.h,v
> retrieving revision 1.32
> diff -u -p -r1.32 fsck.h
> --- fsck.h5 Jan 2018 09:33:47 -   1.32
> +++ fsck.h21 Jun 2020 12:48:50 -
> @@ -136,7 +136,6 @@ struct bufarea {
>  struct bufarea bufhead;  /* head of list of other blks in 
> filesys */
>  struct bufarea sblk; /* file system superblock */
>  struct bufarea asblk;/* alternate file system superblock */
> -struct bufarea cgblk;/* cylinder group blocks */
>  struct bufarea *pdirbp;  /* current directory contents */
>  struct bufarea *pbp; /* current inode block */
>  struct bufarea *getdatablk(daddr_t, long);
> @@ -148,9 +147,7 @@ struct bufarea *getdatablk(daddr_t, long
>   (bp)->b_flags = 0;
>  
>  #define  sbdirty()   sblk.b_dirty = 1
> -#define  cgdirty()   cgblk.b_dirty = 1
>  #define  sblock  (*sblk.b_un.b_fs)
> -#define  cgrp(*cgblk.b_un.b_cg)
>  
>  enum fixstate {DONTKNOW, NOFIX, FIX, IGNORE};
>  
> @@ -275,9 +272,13 @@ struct ufs2_dinode ufs2_zino;
>  #define  FOUND   0x10
>  
>  union dinode *ginode(ino_t);
> +struct bufarea *cglookup(u_int cg);
>  struct inoinfo *getinoinfo(ino_t);
>  void getblk(struct bufarea *, daddr_t, long);
>  ino_t allocino(ino_t, int);
> +void *Malloc(size_t);
> +void *Calloc(size_t, size_t);
> +void *Reallocarray(void *, size_t, size_t);
>  
>  int  (*info_fn)(char *, size_t);
>  char *info_filesys;
> Index: inode.c
> ===
> RCS file: /cvs/src/sbin/fsck_ffs/inode.c,v
> retrieving revision 1.49
> diff -u -p -r1.49 inode.c
> --- inode.c   16 Sep 2018 02:43:11 -  1.49
> +++ inode.c   21 Jun 2020 12:48:50 -
> @@ -370,7 +370,7 @@ setinodebuf(ino_t inum)
>   partialsize = inobufsize;
>   }
>   if (inodebuf == NULL &&
> - (inodebuf = malloc((unsigned)inobufsize)) == NULL)
> + (inodebuf = Malloc((unsigned)inobufsize)) == NULL)
>   errexit("Cannot allocate space for inode buffer\n");
>  }
>  
> @@ -401,7 +401,7 @@ cacheino(union dinode *dp, ino_t inumber
>   blks = howmany(DIP(dp, di_size), sblock.fs_bsize);
>   if (blks > NDADDR)
>   blks = NDADDR + NIADDR;
> - inp = malloc(sizeof(*inp) + (blks ? blks - 1 : 0) * sizeof(daddr_t));
> + inp = Malloc(sizeof(*inp) + (blks ? blks - 1 : 0) * sizeof(daddr_t));
>   if (inp == NULL)
>   errexit("cannot allocate memory for inode cache\n");
>   inpp = &inphead[inumber % numdirs];
> @@ -423,10 +423,10 @@ cacheino(union dinode *dp, ino_t inumber
>   inp->i_blks[NDADDR + i] = DIP(dp, di_ib[i]);
>   if (inplast == listmax) {
>   newlistmax = listmax + 100;
> - newinpsort = reallocarray(inpsort,
> + newinpsort = Reallocarray(inpsort,
>   (unsigned)newlistmax, sizeof(struct inoinfo *));
>   if (newinpsort == NULL)
> - errexit("cannot increase directory list");
> + errexit("cannot increase directory list\n");
>   inpsort = newinpsort;
>   listmax = newlistmax;
>   }
> @@ -582,7 +582,8 @@ allocino(ino_t request, int type)
>  {
>   ino_t ino;
>   union dinode *dp;
> - struct cg *cgp = &cgrp;
> + struct bufarea *cgbp;
> + struct cg *cgp;
>   int cg;
>   time_t t;
>   struct inostat *info;
> @@ -602,7 +603,7 @@ allocino(ino_t request, int type)
>   unsigned long newalloced, i;
>   newalloced = MINIMUM(sblock.fs_ipg,
>   MAXIMUM(2 * inostathead[cg].il_numalloced, 10));
> - info = calloc(newalloced, sizeof(struct inostat));
> + info = Calloc(newalloced, sizeof(struct inostat));
>   if (info == NULL) {
>

Re: obsd 6.7 - ntpd error msg

2020-06-22 Thread Otto Moerbeek
On Thu, Jun 18, 2020 at 11:41:17AM +0200, Otto Moerbeek wrote:

> On Thu, Jun 18, 2020 at 09:57:34AM +0200, Salvatore Cuzzilla wrote:
> 
> > Perfect, tnx!
> > 
> > On 18.06.2020 07:58, Otto Moerbeek wrote:
> > > On Wed, Jun 17, 2020 at 10:53:54PM +0200, Salvatore Cuzzilla wrote:
> > > 
> > > > Hi Otto here the logs (multitail) - @22:49:15 I restarted ntpd:
> > > > -
> > > > Jun 17 22:49:23 obsd ntpd[88568]: constraint reply from 188.61.106.24: 
> > > > offset -0.541051
> > > > Jun 17 22:49:46 obsd ntpd[88568]: peer 172.17.1.1 now valid
> > > > 01] /var/log/daemon  <---   
> > > > 
> > > > 
> > > > 
> > > >  248KB - 2020/06/17 22:49:46
> > > > -
> > > > Jun 17 14:00:01 obsd syslogd[80400]: restart
> > > > Jun 17 16:19:41 obsd ntpd[92699]: pipe write error (from main): No such 
> > > > file or directory
> > > > Jun 17 16:21:07 obsd ntpd[29588]: pipe write error (from main): No such 
> > > > file or directory
> > > > Jun 17 17:00:01 obsd syslogd[80400]: restart
> > > > Jun 17 17:01:25 obsd ntpd[96273]: pipe write error (from main): No such 
> > > > file or directory
> > > > Jun 17 17:02:38 obsd ntpd[94737]: pipe write error (from main): No such 
> > > > file or directory
> > > > Jun 17 20:00:01 obsd syslogd[80400]: restart
> > > > Jun 17 22:00:01 obsd syslogd[80400]: restart
> > > > Jun 17 22:49:22 obsd ntpd[40936]: pipe write error (from main): No such 
> > > > file or directory
> > > > 02] /var/log/messages <---  
> > > > 
> > > > 
> > > > 
> > > >  205KB - 2020/06/17 22:49:22
> > > > -
> > > > 22:49:15 -ksh ToTo@obsd ~ $ doas rcctl restart ntpd
> > > > ntpd(ok)
> > > > ntpd(ok)
> > > > 22:49:23 -ksh ToTo@obsd ~ $
> > > 
> > > 
> > > OK, now we're getting somewhere.  It always helps to provide lots of
> > > information from the start.
> > > 
> > > The message is generated by ntpd being stopped.  It is harmless,
> > > though it is actually wrong, it's a pipe read error.
> > > 
> > > So nothing to worry about.  I'll see if the log level should be
> > > changed to debug for this one or maybe another solution.
> > > 
> 
> And now with diff.

I committed a slightly more conservative version of this diff. A dns
read error (which should not happen) still logs at warn level.

-Otto

> 
> Index: ntp.c
> ===
> RCS file: /cvs/src/usr.sbin/ntpd/ntp.c,v
> retrieving revision 1.164
> diff -u -p -r1.164 ntp.c
> --- ntp.c 11 Apr 2020 07:49:48 -0000  1.164
> +++ ntp.c 18 Jun 2020 09:39:03 -0000
> @@ -365,7 +365,7 @@ ntp_main(struct ntpd_conf *nconf, struct
>   if (nfds > 0 && pfd[PFD_PIPE_MAIN].revents & (POLLIN|POLLERR)) {
>   nfds--;
>   if (ntp_dispatch_imsg() == -1) {
> - log_warn("pipe write error (from main)");
> + log_debug("pipe read error (from main)");
>   ntp_quit = 1;
>   }
>   }
> @@ -380,7 +380,7 @@ ntp_main(struct ntpd_conf *nconf, struct
>   if (nfds > 0 && pfd[PFD_PIPE_DNS].revents & (POLLIN|POLLERR)) {
>   nfds--;
>   if (ntp_dispatch_imsg_dns() == -1) {
> - log_warn("pipe write error (from dns engine)");
> + log_debug("pipe read error (from dns engine)");
>   ntp_quit = 1;
>   }
>   }
> 



fsck_ffs: faster with lots of cylinder groups

2020-06-21 Thread Otto Moerbeek
Hi,

both phase 1 and phase 5 need cylinder group metadata.  This diff
keeps the cg data read in phase 1 in memory to be used by phase 5 if
possible. From FreeBSD. 

-Otto

On an empty 30T filesystem:

$ time obj/fsck_ffs -f /dev/sd3a
2m44.10s real 0m13.21s user 0m07.38s system

$ time doas obj/fsck_ffs -f /dev/sd3a
1m32.81s real 0m12.86s user 0m05.25s system

The difference will be less if a filesystem is filled up, but still nice.
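
The caching idea can be sketched roughly as below. This is not the code
from the diff: struct cg_buf, cg_cache_init(), cg_cache_lookup() and
cg_cache_reads() are made-up names, and the "read" is only simulated by
a counter. Returning NULL when memory runs out is one assumed way to
model the "if possible" part, so that a caller could fall back to
re-reading the group.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in; the real fsck_ffs buffer type is richer. */
struct cg_buf {
	unsigned int cgno;	/* which cylinder group this buffer holds */
	int loaded;		/* nonzero once the data has been "read" */
};

static struct cg_buf **cg_cache;	/* one slot per cylinder group */
static unsigned int cg_count;
static unsigned long cg_reads;		/* number of simulated disk reads */

/* Allocate the slot table; returns 0 on success, -1 on failure. */
int
cg_cache_init(unsigned int ncg)
{
	assert(ncg > 0);
	cg_cache = calloc(ncg, sizeof(*cg_cache));
	if (cg_cache == NULL)
		return -1;
	cg_count = ncg;
	cg_reads = 0;
	return 0;
}

/*
 * Return the buffer for group cg, reading it only on first use.
 * Returns NULL if memory runs out (assumed fallback point).
 */
struct cg_buf *
cg_cache_lookup(unsigned int cg)
{
	struct cg_buf *bp;

	assert(cg < cg_count);
	if (cg_cache[cg] != NULL)
		return cg_cache[cg];	/* cache hit: no disk I/O */
	bp = calloc(1, sizeof(*bp));
	if (bp == NULL)
		return NULL;
	bp->cgno = cg;
	bp->loaded = 1;		/* real code would read the cg block here */
	cg_reads++;
	cg_cache[cg] = bp;
	return bp;
}

/* Number of simulated reads so far. */
unsigned long
cg_cache_reads(void)
{
	return cg_reads;
}
```

With this shape, a first pass that calls cg_cache_lookup() for every
group pays for one read per group, and a later pass over the same
groups pays for none.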

-Otto

Index: fsck.h
===
RCS file: /cvs/src/sbin/fsck_ffs/fsck.h,v
retrieving revision 1.32
diff -u -p -r1.32 fsck.h
--- fsck.h  5 Jan 2018 09:33:47 -0000   1.32
+++ fsck.h  21 Jun 2020 12:48:50 -0000
@@ -136,7 +136,6 @@ struct bufarea {
 struct bufarea bufhead;/* head of list of other blks in filesys */
 struct bufarea sblk;   /* file system superblock */
 struct bufarea asblk;  /* alternate file system superblock */
-struct bufarea cgblk;  /* cylinder group blocks */
 struct bufarea *pdirbp;/* current directory contents */
 struct bufarea *pbp;   /* current inode block */
 struct bufarea *getdatablk(daddr_t, long);
@@ -148,9 +147,7 @@ struct bufarea *getdatablk(daddr_t, long
(bp)->b_flags = 0;
 
 #define sbdirty()   sblk.b_dirty = 1
-#define cgdirty()   cgblk.b_dirty = 1
 #define sblock      (*sblk.b_un.b_fs)
-#define cgrp        (*cgblk.b_un.b_cg)
 
 enum fixstate {DONTKNOW, NOFIX, FIX, IGNORE};
 
@@ -275,9 +272,13 @@ struct ufs2_dinode ufs2_zino;
 #define FOUND   0x10
 
 union dinode *ginode(ino_t);
+struct bufarea *cglookup(u_int cg);
 struct inoinfo *getinoinfo(ino_t);
 void getblk(struct bufarea *, daddr_t, long);
 ino_t allocino(ino_t, int);
+void *Malloc(size_t);
+void *Calloc(size_t, size_t);
+void *Reallocarray(void *, size_t, size_t);
 
 int(*info_fn)(char *, size_t);
 char   *info_filesys;
Index: inode.c
===
RCS file: /cvs/src/sbin/fsck_ffs/inode.c,v
retrieving revision 1.49
diff -u -p -r1.49 inode.c
--- inode.c 16 Sep 2018 02:43:11 -0000  1.49
+++ inode.c 21 Jun 2020 12:48:50 -0000
@@ -370,7 +370,7 @@ setinodebuf(ino_t inum)
partialsize = inobufsize;
}
if (inodebuf == NULL &&
-   (inodebuf = malloc((unsigned)inobufsize)) == NULL)
+   (inodebuf = Malloc((unsigned)inobufsize)) == NULL)
errexit("Cannot allocate space for inode buffer\n");
 }
 
@@ -401,7 +401,7 @@ cacheino(union dinode *dp, ino_t inumber
blks = howmany(DIP(dp, di_size), sblock.fs_bsize);
if (blks > NDADDR)
blks = NDADDR + NIADDR;
-   inp = malloc(sizeof(*inp) + (blks ? blks - 1 : 0) * sizeof(daddr_t));
+   inp = Malloc(sizeof(*inp) + (blks ? blks - 1 : 0) * sizeof(daddr_t));
if (inp == NULL)
errexit("cannot allocate memory for inode cache\n");
inpp = [inumber % numdirs];
@@ -423,10 +423,10 @@ cacheino(union dinode *dp, ino_t inumber
inp->i_blks[NDADDR + i] = DIP(dp, di_ib[i]);
if (inplast == listmax) {
newlistmax = listmax + 100;
-   newinpsort = reallocarray(inpsort,
+   newinpsort = Reallocarray(inpsort,
(unsigned)newlistmax, sizeof(struct inoinfo *));
if (newinpsort == NULL)
-   errexit("cannot increase directory list");
+   errexit("cannot increase directory list\n");
inpsort = newinpsort;
listmax = newlistmax;
}
@@ -582,7 +582,8 @@ allocino(ino_t request, int type)
 {
ino_t ino;
union dinode *dp;
-   struct cg *cgp = &cgrp;
+   struct bufarea *cgbp;
+   struct cg *cgp;
int cg;
time_t t;
struct inostat *info;
@@ -602,7 +603,7 @@ allocino(ino_t request, int type)
unsigned long newalloced, i;
newalloced = MINIMUM(sblock.fs_ipg,
MAXIMUM(2 * inostathead[cg].il_numalloced, 10));
-   info = calloc(newalloced, sizeof(struct inostat));
+   info = Calloc(newalloced, sizeof(struct inostat));
if (info == NULL) {
pwarn("cannot alloc %zu bytes to extend inoinfo\n",
sizeof(struct inostat) * newalloced);
@@ -619,7 +620,8 @@ allocino(ino_t request, int type)
inostathead[cg].il_numalloced = newalloced;
info = inoinfo(ino);
}
-   getblk(&cgblk, cgtod(&sblock, cg), sblock.fs_cgsize);
+   cgbp = cglookup(cg);
+   cgp = cgbp->b_un.b_cg;
if (!cg_chkmagic(cgp))
pfatal("CG %d: BAD MAGIC NUMBER\n", cg);
setbit(cg_inosused(cgp), ino % sblock.fs_ipg);
@@ -637,7 +639,7 @@ allocino(ino_t request, int type)
default:

Re: obsd 6.7 - ntpd error msg

2020-06-18 Thread Otto Moerbeek
On Thu, Jun 18, 2020 at 09:57:34AM +0200, Salvatore Cuzzilla wrote:

> Perfect, tnx!
> 
> On 18.06.2020 07:58, Otto Moerbeek wrote:
> > On Wed, Jun 17, 2020 at 10:53:54PM +0200, Salvatore Cuzzilla wrote:
> > 
> > > Hi Otto here the logs (multitail) - @22:49:15 I restarted ntpd:
> > > -
> > > Jun 17 22:49:23 obsd ntpd[88568]: constraint reply from 188.61.106.24: 
> > > offset -0.541051
> > > Jun 17 22:49:46 obsd ntpd[88568]: peer 172.17.1.1 now valid
> > > 01] /var/log/daemon  <--- 
> > >   
> > >   
> > >
> > > 248KB - 2020/06/17 22:49:46
> > > -
> > > Jun 17 14:00:01 obsd syslogd[80400]: restart
> > > Jun 17 16:19:41 obsd ntpd[92699]: pipe write error (from main): No such 
> > > file or directory
> > > Jun 17 16:21:07 obsd ntpd[29588]: pipe write error (from main): No such 
> > > file or directory
> > > Jun 17 17:00:01 obsd syslogd[80400]: restart
> > > Jun 17 17:01:25 obsd ntpd[96273]: pipe write error (from main): No such 
> > > file or directory
> > > Jun 17 17:02:38 obsd ntpd[94737]: pipe write error (from main): No such 
> > > file or directory
> > > Jun 17 20:00:01 obsd syslogd[80400]: restart
> > > Jun 17 22:00:01 obsd syslogd[80400]: restart
> > > Jun 17 22:49:22 obsd ntpd[40936]: pipe write error (from main): No such 
> > > file or directory
> > > 02] /var/log/messages <---
> > >   
> > >   
> > >
> > > 205KB - 2020/06/17 22:49:22
> > > -
> > > 22:49:15 -ksh ToTo@obsd ~ $ doas rcctl restart ntpd
> > > ntpd(ok)
> > > ntpd(ok)
> > > 22:49:23 -ksh ToTo@obsd ~ $
> > 
> > 
> > OK, now we're getting somewhere.  It always helps to provide lots of
> > information from the start.
> > 
> > The message is generated by ntpd being stopped.  It is harmless,
> > though it is actually wrong, it's a pipe read error.
> > 
> > So nothing to worry about.  I'll see if the log level should be
> > changed to debug for this one or maybe another solution.
> > 

And now with diff.

-Otto

Index: ntp.c
===
RCS file: /cvs/src/usr.sbin/ntpd/ntp.c,v
retrieving revision 1.164
diff -u -p -r1.164 ntp.c
--- ntp.c   11 Apr 2020 07:49:48 -0000  1.164
+++ ntp.c   18 Jun 2020 09:39:03 -0000
@@ -365,7 +365,7 @@ ntp_main(struct ntpd_conf *nconf, struct
if (nfds > 0 && pfd[PFD_PIPE_MAIN].revents & (POLLIN|POLLERR)) {
nfds--;
if (ntp_dispatch_imsg() == -1) {
-   log_warn("pipe write error (from main)");
+   log_debug("pipe read error (from main)");
ntp_quit = 1;
}
}
@@ -380,7 +380,7 @@ ntp_main(struct ntpd_conf *nconf, struct
if (nfds > 0 && pfd[PFD_PIPE_DNS].revents & (POLLIN|POLLERR)) {
nfds--;
if (ntp_dispatch_imsg_dns() == -1) {
-   log_warn("pipe write error (from dns engine)");
+   log_debug("pipe read error (from dns engine)");
ntp_quit = 1;
}
}



Re: obsd 6.7 - ntpd error msg

2020-06-17 Thread Otto Moerbeek
On Wed, Jun 17, 2020 at 10:53:54PM +0200, Salvatore Cuzzilla wrote:

> Hi Otto here the logs (multitail) - @22:49:15 I restarted ntpd:
> -
> Jun 17 22:49:23 obsd ntpd[88568]: constraint reply from 188.61.106.24: offset 
> -0.541051
> Jun 17 22:49:46 obsd ntpd[88568]: peer 172.17.1.1 now valid
> 01] /var/log/daemon  <--- 
>   
>   
>248KB - 2020/06/17 
> 22:49:46
> -
> Jun 17 14:00:01 obsd syslogd[80400]: restart
> Jun 17 16:19:41 obsd ntpd[92699]: pipe write error (from main): No such file 
> or directory
> Jun 17 16:21:07 obsd ntpd[29588]: pipe write error (from main): No such file 
> or directory
> Jun 17 17:00:01 obsd syslogd[80400]: restart
> Jun 17 17:01:25 obsd ntpd[96273]: pipe write error (from main): No such file 
> or directory
> Jun 17 17:02:38 obsd ntpd[94737]: pipe write error (from main): No such file 
> or directory
> Jun 17 20:00:01 obsd syslogd[80400]: restart
> Jun 17 22:00:01 obsd syslogd[80400]: restart
> Jun 17 22:49:22 obsd ntpd[40936]: pipe write error (from main): No such file 
> or directory
> 02] /var/log/messages <---
>   
>   
>205KB - 2020/06/17 
> 22:49:22
> -
> 22:49:15 -ksh ToTo@obsd ~ $ doas rcctl restart ntpd
> ntpd(ok)
> ntpd(ok)
> 22:49:23 -ksh ToTo@obsd ~ $


OK, now we're getting somewhere.  It always helps to provide lots of
information from the start.

The message is generated by ntpd being stopped.  It is harmless,
though it is actually wrong, it's a pipe read error.

So nothing to worry about.  I'll see if the log level should be
changed to debug for this one or maybe another solution.
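
To illustrate why this is a read-side event: when the process at the
other end of a pipe goes away, read(2) on the remaining end returns 0
(end of file). The sketch below shows only that. The function name
read_after_writer_closed() is made up, and the idea that the
"No such file or directory" suffix comes from a stale errno value is my
assumption, not something established in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Close the write end of a fresh pipe, then read from the read end.
 * Stores the read(2) return value in *nread and the errno value seen
 * afterwards in *err.  Returns 0 on success, -1 if pipe(2) failed.
 */
int
read_after_writer_closed(ssize_t *nread, int *err)
{
	int fds[2];
	char buf[16];

	assert(nread != NULL && err != NULL);
	if (pipe(fds) == -1)
		return -1;
	close(fds[1]);		/* the "other process" goes away */
	errno = ENOENT;		/* pretend an earlier call left this behind */
	*nread = read(fds[0], buf, sizeof(buf));
	*err = errno;		/* a read returning 0 is not an error, so
				 * errno normally still holds the old value */
	close(fds[0]);
	return 0;
}
```

Here *nread ends up as 0. Code that treats that return value as a
failure and then logs with an errno-based message would print whatever
error happened to be recorded earlier.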

-Otto
> 
> On 17.06.2020 21:18, Otto Moerbeek wrote:
> > On Wed, Jun 17, 2020 at 09:15:22PM +0200, Salvatore Cuzzilla wrote:
> > 
> > > Hi Otto,
> > > 
> > > thanks for helping, really appreciated!
> > > The msg is showing after each restart. My simple conf here below:
> > > -
> > > 21:05:52 -ksh ToTo@obsd ~ $ doas cat /etc/ntpd.conf
> > > # $OpenBSD: ntpd.conf,v 1.14 2015/07/15 20:28:37 ajacoutot Exp $
> > > #
> > > # See ntpd.conf(5) and /etc/examples/ntpd.conf
> > > 
> > > server 172.17.1.1
> > > sensor *
> > > > constraints from "https://www.alfanetti.org"
> > > -
> > 
> > And show the log lines, all of them
> > 
> > -Otto
> > 
> > > 
> > > On 17.06.2020 20:51, Otto Moerbeek wrote:
> > > > On Wed, Jun 17, 2020 at 04:50:46PM +0200, Salvatore Cuzzilla wrote:
> > > >
> > > > > Hi Folks,
> > > > >
> > > > > when I restart ntpd I see this msg in /var/log/daemon:
> > > > >
> > > > > > Jun 17 16:19:41 obsd ntpd[92699]: pipe write error (from main): No such file or directory
> > > > >
> > > > > however, time seems to be in sync:
> > > > >
> > > > > ---
> > > > > 16:37:17 -ksh ToTo@obsd ~ $ ntpctl -sa
> > > > > > 1/1 peers valid, 1/1 sensors valid, constraint offset -1s, clock unsynced
> > > > > >
> > > > > > peer
> > > > > >    wt tl st  next  poll          offset       delay      jitter
> > > > > > 172.17.1.1
> > > > > >     1 10  3 2361s 3069s        -0.008ms     0.716ms     0.137ms
> > > > > >
> > > > > > sensor
> > > > > >    wt gd st  next  poll          offset  correction
> > > > > > vmt0
> > > > > >     1  1  0    7s   15s        27.860ms     0.000ms
> > > > >
> > > > > 16:38:20 -ksh ToTo@obsd ~ $ doas sysctl -a | grep timecounter
> > > > > kern.timecounter.tick=1
> > > > > kern.timecounter.timestepwarnings=0
> > > > > kern.timecounter.hardware=tsc
> > > > > > kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000) acpitimer0(1000)
> > > > > ---
> > > > >
> > > > > anyone else experiencing the same?
> > > > >
> > > > > ---
> > > > > :wq,
> > > > > Salvatore.
> > > > >
> > > >
> > > > Was the message in the log before or after restarting?
> > > > Please show your ntpd.conf
> > > >
> > > > -Otto
> > > >
> > > 
> > > ---
> > > :wq,
> > > Salvatore.
> > 
> 
> ---
> :wq,
> Salvatore.



Re: obsd 6.7 - ntpd error msg

2020-06-17 Thread Otto Moerbeek
On Wed, Jun 17, 2020 at 09:15:22PM +0200, Salvatore Cuzzilla wrote:

> Hi Otto,
> 
> thanks for helping, really appreciated!
> The msg is showing after each restart. My simple conf here below:
> -
> 21:05:52 -ksh ToTo@obsd ~ $ doas cat /etc/ntpd.conf
> # $OpenBSD: ntpd.conf,v 1.14 2015/07/15 20:28:37 ajacoutot Exp $
> #
> # See ntpd.conf(5) and /etc/examples/ntpd.conf
> 
> server 172.17.1.1
> sensor *
> constraints from "https://www.alfanetti.org"
> -

And show the log lines, all of them

    -Otto

> 
> On 17.06.2020 20:51, Otto Moerbeek wrote:
> > On Wed, Jun 17, 2020 at 04:50:46PM +0200, Salvatore Cuzzilla wrote:
> > 
> > > Hi Folks,
> > > 
> > > when I restart ntpd I see this msg in /var/log/daemon:
> > > 
> > > Jun 17 16:19:41 obsd ntpd[92699]: pipe write error (from main): No such file or directory
> > > 
> > > however, time seems to be in sync:
> > > 
> > > ---
> > > 16:37:17 -ksh ToTo@obsd ~ $ ntpctl -sa
> > > 1/1 peers valid, 1/1 sensors valid, constraint offset -1s, clock unsynced
> > > 
> > > peer
> > >    wt tl st  next  poll          offset       delay      jitter
> > > 172.17.1.1
> > >     1 10  3 2361s 3069s        -0.008ms     0.716ms     0.137ms
> > > 
> > > sensor
> > >    wt gd st  next  poll          offset  correction
> > > vmt0
> > >     1  1  0    7s   15s        27.860ms     0.000ms
> > > 
> > > 16:38:20 -ksh ToTo@obsd ~ $ doas sysctl -a | grep timecounter
> > > kern.timecounter.tick=1
> > > kern.timecounter.timestepwarnings=0
> > > kern.timecounter.hardware=tsc
> > > kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000) acpitimer0(1000)
> > > ---
> > > 
> > > anyone else experiencing the same?
> > > 
> > > ---
> > > :wq,
> > > Salvatore.
> > > 
> > 
> > Was the message in the log before or after restarting?
> > Please show your ntpd.conf
> > 
> > -Otto
> > 
> 
> ---
> :wq,
> Salvatore.



Re: obsd 6.7 - ntpd error msg

2020-06-17 Thread Otto Moerbeek
On Wed, Jun 17, 2020 at 04:50:46PM +0200, Salvatore Cuzzilla wrote:

> Hi Folks,
> 
> when I restart ntpd I see this msg in /var/log/daemon:
> 
> Jun 17 16:19:41 obsd ntpd[92699]: pipe write error (from main): No such file or directory
> 
> however, time seems to be in sync:
> 
> ---
> 16:37:17 -ksh ToTo@obsd ~ $ ntpctl -sa
> 1/1 peers valid, 1/1 sensors valid, constraint offset -1s, clock unsynced
> 
> peer
>    wt tl st  next  poll          offset       delay      jitter
> 172.17.1.1
>     1 10  3 2361s 3069s        -0.008ms     0.716ms     0.137ms
> 
> sensor
>    wt gd st  next  poll          offset  correction
> vmt0
>     1  1  0    7s   15s        27.860ms     0.000ms
> 
> 16:38:20 -ksh ToTo@obsd ~ $ doas sysctl -a | grep timecounter
> kern.timecounter.tick=1
> kern.timecounter.timestepwarnings=0
> kern.timecounter.hardware=tsc
> kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000) acpitimer0(1000)
> ---
> 
> anyone else experiencing the same?
> 
> ---
> :wq,
> Salvatore.
> 

Was the message in the log before or after restarting?
Please show your ntpd.conf

-Otto



sparc64: bootblocks vs ofwboot load address

2020-06-05 Thread Otto Moerbeek
Hi,

Miod remarked that the overwriting of the bootblocks is actually a
regression I introduced. So reintroduce the lost comment and load
ofwboot at 0x6000. 

OK?

-Otto

Index: bootblk.fth
===
RCS file: /cvs/src/sys/arch/sparc64/stand/bootblk/bootblk.fth,v
retrieving revision 1.9
diff -u -p -r1.9 bootblk.fth
--- bootblk.fth 2 Apr 2020 06:06:22 -0000   1.9
+++ bootblk.fth 5 Jun 2020 08:09:33 -0000
@@ -716,7 +716,15 @@ create cur-blockno -1 l, -1 l, \ Curren
 2drop
 ;
 
-" load-base " evaluate constant loader-base
+\
+\ According to the 1275 addendum for SPARC processors:
+\ Default load-base is 0x4000.  At least 0x8.0000 or
+\ 512KB must be available at that address.  
+\
+\ The Fcode bootblock can take up up to 8KB (O.K., 7.5KB) 
+\ so load programs at 0x4000 + 0x2000=> 0x6000
+\
+" load-base " evaluate 2000 + constant loader-base
 
 : load-file-signon ( load-file len boot-path len -- load-file len boot-path len )
." Loading file" space 2over type cr ." from device" space 2dup type cr
@@ -821,7 +829,7 @@ create cur-blockno -1 l, -1 l,  \ Curren
 ;
 
 : do-boot ( bootfile -- )
-   ." OpenBSD IEEE 1275 Bootblock 2.0" cr
+   ." OpenBSD IEEE 1275 Bootblock 2.1" cr
 
\ Open boot device
boot-path   ( boot-path len )
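
The arithmetic in the restored comment can be checked with a few lines
of C. This is only a sketch: LOAD_BASE and BOOTBLK_ROOM mirror the
0x4000 and 0x2000 values from the comment, while the function names
loader_base() and bootblk_fits() are invented for the illustration.

```c
#include <assert.h>
#include <stdint.h>

#define LOAD_BASE	0x4000u	/* default load-base per the diff comment */
#define BOOTBLK_ROOM	0x2000u	/* 8KB reserved for the FCode bootblock */

/* Address at which the second-stage loader is placed. */
uint32_t
loader_base(void)
{
	return LOAD_BASE + BOOTBLK_ROOM;
}

/* Nonzero if a bootblock of the given size stays below loader_base(). */
int
bootblk_fits(uint32_t bootblk_size)
{
	assert(bootblk_size > 0);
	return LOAD_BASE + bootblk_size <= loader_base();
}
```

The bootblock sizes that QEMU reported elsewhere in these threads (6882
and 6936 bytes) both pass bootblk_fits(), so a loader placed at 0x6000
does not reach back into them.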



Re: filesystem code integer and many inodes

2020-06-02 Thread Otto Moerbeek
On Fri, May 29, 2020 at 09:30:04AM +0200, Otto Moerbeek wrote:

> On Thu, May 28, 2020 at 12:54:41PM -0600, Todd C. Miller wrote:
> 
> > On Thu, 28 May 2020 20:53:07 +0200, Otto Moerbeek wrote:
> > 
> > > Here's the separate diff for the prefcg loops. From FreeBSD.
> > 
> > OK millert@
> > 
> >  - todd
> > 
> 
> And here's the updated diff against -current. I removed a redundant
> cast in a fs_ipg * fs_ncg multiplication in fsck_ffs. Both are
> u_int32 and we know the product is <= UINT_MAX, so we do not need to
> cast.
> 
> I would like to make some progress here, I have a followup diff to
> speed up Phase 5 of fsck_ffs...

Did anyone look closer at this?

Did anyone test?

-Otto
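
For readers following the cast discussion, here is a small sketch of
the general point. It is not taken from the diff; the function names
are hypothetical, and uint32_t stands in for the 32-bit unsigned fields
mentioned above. A 32-bit product is exact only when the true result
fits in 32 bits; otherwise a widening cast before the multiplication is
needed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Total inode count computed in 32 bits.  Correct only when the
 * caller knows ipg * ncg fits in 32 bits; otherwise the result wraps.
 */
uint32_t
total_inodes_u32(uint32_t ipg, uint32_t ncg)
{
	return ipg * ncg;
}

/* Same product, widened first so that it cannot wrap. */
uint64_t
total_inodes_u64(uint32_t ipg, uint32_t ncg)
{
	return (uint64_t)ipg * ncg;
}

/* Nonzero if the 32-bit product is exact, i.e. no cast is needed. */
int
product_fits_u32(uint32_t ipg, uint32_t ncg)
{
	assert(ipg > 0 && ncg > 0);
	return total_inodes_u64(ipg, ncg) <= UINT32_MAX;
}
```

When some other check already guarantees that the product is at most
UINT_MAX, as stated for the fsck_ffs case, the plain 32-bit
multiplication gives the same answer as the widened one.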


> 
> Index: sbin/clri/clri.c
> ===
> RCS file: /cvs/src/sbin/clri/clri.c,v
> retrieving revision 1.20
> diff -u -p -r1.20 clri.c
> --- sbin/clri/clri.c  28 Jun 2019 13:32:43 -0000  1.20
> +++ sbin/clri/clri.c  29 May 2020 07:23:27 -0000
> @@ -68,7 +68,8 @@ main(int argc, char *argv[])
>   char *fs, sblock[SBLOCKSIZE];
>   size_t bsize;
>   off_t offset;
> - int i, fd, imax, inonum;
> + int i, fd;
> + ino_t imax, inonum;
>  
>   if (argc < 3)
>   usage();
> Index: sbin/dumpfs/dumpfs.c
> ===
> RCS file: /cvs/src/sbin/dumpfs/dumpfs.c,v
> retrieving revision 1.35
> diff -u -p -r1.35 dumpfs.c
> --- sbin/dumpfs/dumpfs.c  17 Feb 2020 16:11:25 -0000  1.35
> +++ sbin/dumpfs/dumpfs.c  29 May 2020 07:23:27 -0000
> @@ -69,7 +69,7 @@ union {
>  #define acg  cgun.cg
>  
>  int  dumpfs(int, const char *);
> -int  dumpcg(const char *, int, int);
> +int  dumpcg(const char *, int, u_int);
>  int  marshal(const char *);
>  int  open_disk(const char *);
>  void pbits(void *, int);
> @@ -163,6 +163,7 @@ dumpfs(int fd, const char *name)
>   size_t size;
>   off_t off;
>   int i, j;
> + u_int cg;
>  
>   switch (afs.fs_magic) {
>   case FS_UFS2_MAGIC:
> @@ -172,7 +173,7 @@ dumpfs(int fd, const char *name)
>   afs.fs_magic, ctime());
>   printf("superblock location\t%jd\tid\t[ %x %x ]\n",
>   (intmax_t)afs.fs_sblockloc, afs.fs_id[0], afs.fs_id[1]);
> - printf("ncg\t%d\tsize\t%jd\tblocks\t%jd\n",
> + printf("ncg\t%u\tsize\t%jd\tblocks\t%jd\n",
>   afs.fs_ncg, (intmax_t)fssize, (intmax_t)afs.fs_dsize);
>   break;
>   case FS_UFS1_MAGIC:
> @@ -198,7 +199,7 @@ dumpfs(int fd, const char *name)
>   printf("cylgrp\t%s\tinodes\t%s\tfslevel %d\n",
>   i < 1 ? "static" : "dynamic",
>   i < 2 ? "4.2/4.3BSD" : "4.4BSD", i);
> - printf("ncg\t%d\tncyl\t%d\tsize\t%d\tblocks\t%d\n",
> + printf("ncg\t%u\tncyl\t%d\tsize\t%d\tblocks\t%d\n",
>   afs.fs_ncg, afs.fs_ncyl, afs.fs_ffs1_size, afs.fs_ffs1_dsize);
>   break;
>   default:
> @@ -223,9 +224,9 @@ dumpfs(int fd, const char *name)
>   (intmax_t)afs.fs_cstotal.cs_ndir,
>   (intmax_t)afs.fs_cstotal.cs_nifree, 
>   (intmax_t)afs.fs_cstotal.cs_nffree);
> - printf("bpg\t%d\tfpg\t%d\tipg\t%d\n",
> + printf("bpg\t%d\tfpg\t%d\tipg\t%u\n",
>   afs.fs_fpg / afs.fs_frag, afs.fs_fpg, afs.fs_ipg);
> - printf("nindir\t%d\tinopb\t%d\tmaxfilesize\t%ju\n",
> + printf("nindir\t%d\tinopb\t%u\tmaxfilesize\t%ju\n",
>   afs.fs_nindir, afs.fs_inopb, 
>   (uintmax_t)afs.fs_maxfilesize);
>   printf("sbsize\t%d\tcgsize\t%d\tcsaddr\t%jd\tcssize\t%d\n",
> @@ -238,10 +239,10 @@ dumpfs(int fd, const char *name)
>   printf("nbfree\t%d\tndir\t%d\tnifree\t%d\tnffree\t%d\n",
>   afs.fs_ffs1_cstotal.cs_nbfree, afs.fs_ffs1_cstotal.cs_ndir,
>   afs.fs_ffs1_cstotal.cs_nifree, afs.fs_ffs1_cstotal.cs_nffree);
> - printf("cpg\t%d\tbpg\t%d\tfpg\t%d\tipg\t%d\n",
> + printf("cpg\t%d\tbpg\t%d\tfpg\t%d\tipg\t%u\n",
>   afs.fs_cpg, afs.fs_fpg / afs.fs_frag, afs.fs_fpg,
>   afs.fs_ipg);
> - printf("nindir\t%d\tinopb\t%d\tnspf\t%d\tmaxfilesize\t%ju\n",
> + printf("nindir\t%d\tinopb\t%u\tnspf\t%d\tmaxfilesize\t%ju\n",
>

Re: sparc64 boot issue on qemu

2020-05-31 Thread Otto Moerbeek
On Sun, May 31, 2020 at 09:49:34AM +0100, Mark Cave-Ayland wrote:

> On 30/05/2020 10:54, Otto Moerbeek wrote:
> 
> > https://cdn.openbsd.org/pub/OpenBSD/snapshots/sparc64/
> > contains the unpatched miniroot.
> > 
> > https://www.drijf.net/openbsd/disk.qcow2
> > 
> > is the disk image based on the miniroot containing the patch in the
> > first post in this thread.
> > 
> > Thanks for looking into this.
> > 
> > Note that we did *not* observe boot failure on any real sparc64
> > hardware. The bootblock changes I did for the 6.7 release were tested
> > on many different machines.
> 
> Thanks for the test case which enables me to reproduce the issue. With
> ?fcode-verbose enabled you see this at the end of the FCode execution:
> 
> ...
> ...
> 5acf :  [ 0x8b7 ]
> 5ad0 : b(lit) [ 0x10 ]
> 5ad6 :  [ 0x81e ]
> 5ad7 : 0= [ 0x34 ]
> 5ad8 : swap [ 0x49 ]
> 5ad9 : drop [ 0x46 ]
> 5ada : b?branch [ 0x14 ]
>(offset) 5
> 5ade : (compile)  [ 0x8c8 ]
> 5adf : (compile) b(>resolve) [ 0xb2 ]
> OpenBSD IEEE 1275 Bootblock 2.0
> Booting from device /pci@1fe,0/pci@1,1/ide@3/ide@1/cdrom@0
> Try superblock read
> FFS v1
> ufs-open complete
> .Looking for ofwboot in directory...
> .
> ..
> ofwboot
> Found it
> .Loading 1a1c8  bytes of file...
> Copying 2000 bytes to 4000
> Copying 2000 bytes to 6000
> Copying 2000 bytes to 8000
> Copying 2000 bytes to a000
> Copying 2000 bytes to c000
> Copying 2000 bytes to e000
> Copying 2000 bytes to 10000
> Copying 2000 bytes to 12000
> Copying 2000 bytes to 14000
> Copying 2000 bytes to 16000
> Copying 2000 bytes to 18000
> Copying 2000 bytes to 1a000
> Copying 2000 bytes to 1c000
> Copying 2000 bytes to 1e000
> 5ae0 : expect [ 0x8a ]
> 
> 
> Now that 0x8a is completely wrong since according to
> https://github.com/openbsd/src/blob/master/sys/arch/sparc64/stand/bootblk/bootblk.fth
> the last instruction should be exit which is 0x33.
> 
> Since the FCode itself is located at load-base (0x4000) it looks to me from
> the above debug that you're loading ofwboot at the same address, overwriting
> the FCode. Once do-boot has finished executing, the FCode interpreter returns
> to execute the exit word which has now been overwritten: so instead of
> returning to the updated client context via exit to execute ofwboot, it
> executes expect which asks for input from the keyboard and then crashes
> because the stack is incorrect.
> 
> My recommendation would be to load ofwboot at 0x6000 instead of 0x4000 which I
> believe will fix the issue. It's interesting you mention that this works on
> real hardware, since it doesn't agree with my reading of the IEEE-1275
> specification so you're certainly relying on some undocumented behaviour here.
> 
> 
> ATB,
> 
> Mark.
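
The overwrite described in the quoted analysis can be reproduced on
paper. The sketch below is an illustration only: CHUNK is the 0x2000
copy size visible in the trace, 0x1a1c8 is the file size the trace
reports, and 6882 is the FCode size QEMU printed in an earlier message
of this thread. The function names are made up.

```c
#include <assert.h>
#include <stdint.h>

#define CHUNK	0x2000u		/* copy granularity seen in the trace */

/* Number of CHUNK-sized copies needed for a file of len bytes. */
uint32_t
chunk_count(uint32_t len)
{
	assert(len > 0);
	return (len + CHUNK - 1) / CHUNK;
}

/* Destination address of the n-th copy (n counts from 0). */
uint32_t
chunk_dest(uint32_t base, uint32_t n)
{
	return base + n * CHUNK;
}

/*
 * Nonzero if copying len bytes in whole chunks starting at base
 * touches any byte of [fcode, fcode + fcode_len).
 */
int
copy_clobbers(uint32_t base, uint32_t len, uint32_t fcode, uint32_t fcode_len)
{
	uint32_t end = base + chunk_count(len) * CHUNK;

	return base < fcode + fcode_len && fcode < end;
}
```

chunk_count(0x1a1c8) gives 14, matching the 14 "Copying" lines, and the
last destination from base 0x4000 is 0x1e000. With base 0x4000 the copy
overlaps the FCode at 0x4000; with base 0x6000 it does not.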

Thanks, the following works indeed. 

-Otto

Index: bootblk.fth
===
RCS file: /cvs/src/sys/arch/sparc64/stand/bootblk/bootblk.fth,v
retrieving revision 1.9
diff -u -p -r1.9 bootblk.fth
--- bootblk.fth 2 Apr 2020 06:06:22 -0000   1.9
+++ bootblk.fth 31 May 2020 13:17:25 -0000
@@ -716,7 +716,8 @@ create cur-blockno -1 l, -1 l,  \ Curren
 2drop
 ;
 
-" load-base " evaluate constant loader-base
+\\ Do not overwrite bootblocks
+" load-base " evaluate 2000 + constant loader-base
 
 : load-file-signon ( load-file len boot-path len -- load-file len boot-path len )
." Loading file" space 2over type cr ." from device" space 2dup type cr



Re: sparc64 boot issue on qemu

2020-05-30 Thread Otto Moerbeek
On Sat, May 30, 2020 at 10:11:08AM +0100, Mark Cave-Ayland wrote:

> On 30/05/2020 10:03, Otto Moerbeek wrote:
> 
> > Hi,
> > 
> > thanks for the hints, but an unpatched 6.7 miniroot still fails to
> > boot for me
> > 
> > qemu-system-sparc64 -machine sun4u -m 1024 -drive \
> > file=miniroot67.img,format=raw -nographic -serial stdio -monitor none
> > 
> > OpenBIOS for Sparc64
> > Configuration device id QEMU version 1 machine id 0
> > kernel cmdline 
> > CPUs: 1 x SUNW,UltraSPARC-IIi
> > UUID: ----
> > Welcome to OpenBIOS v1.1 built on Oct 28 2019 17:08
> >   Type 'help' for detailed information
> > Trying disk:a...
> > Not a bootable ELF image
> > Not a bootable a.out image
> > 
> > Loading FCode image...
> > Loaded 6882 bytes
> > entry point is 0x4000
> > Evaluating FCode...
> > OpenBSD IEEE 1275 Bootblock 2.0
> > ..
> > 
> > And then hangs
> > 
> > While the patched bootblocks do boot (but hang later after
> > 
> > scsibus1 at softraid0: 256 targets
> > 
> > 
> > as before,
> > 
> > -Otto
> 
> Hmmm odd. Is it possible for you to upload your miniroot somewhere for me to
> take a quick look? I don't have a great deal of time right now, but I can run
> it through a debugger to see if anything obvious shows up.
> 
> 
> ATB,
> 
> Mark.

https://cdn.openbsd.org/pub/OpenBSD/snapshots/sparc64/
contains the unpatched miniroot.

https://www.drijf.net/openbsd/disk.qcow2

is the disk image based on the miniroot containing the patch in the
first post in this thread.

Thanks for looking into this.

Note that we did *not* observe boot failure on any real sparc64
hardware. The bootblock changes I did for the 6.7 release were tested
on many different machines.

-Otto




Re: sparc64 boot issue on qemu

2020-05-30 Thread Otto Moerbeek
On Sat, May 30, 2020 at 09:29:36AM +0100, Mark Cave-Ayland wrote:

> On 29/05/2020 23:56, Jason A. Donenfeld wrote:
> 
> > Oh that's a nice observation about `boot disk -V`. Doing so actually
> > got me booting up entirely:
> > 
> > $ qemu-img convert -O qcow2 miniroot66.fs disk.qcow2
> > $ qemu-img resize disk.qcow2 20G
> > $ qemu-system-sparc64 -m 1024 -drive file=disk.qcow2,if=ide -net
> > nic,model=ne2k_pci -net user -boot a -nographic -monitor none -serial
> > stdio
> 
> I think the problem here is that you're asking OpenBIOS to boot from the
> (empty) floppy disk with "-boot a" rather than the qcow2 image which is
> normally attached to the first hard disk "-boot c". As this is the default,
> then I would expect the command line above to work if you simply drop
> "-boot a".
> 
> Also is there a particular reason for using the ne2k_pci NIC instead of the
> default in-built sunhme device? I try and keep the documentation at
> https://wiki.qemu.org/Documentation/Platforms/SPARC as accurate as I can, so
> do look there for latest best practices and command line examples.
> 
> Finally the version of qemu-system-sparc64 you are running can also boot from
> a virtio-blk-pci device (again see the above wiki page for details) if you are
> looking for the best emulated disk performance.
> 
> 
> ATB,
> 
> Mark.

Hi,

thanks for the hints, but an unpatched 6.7 miniroot still fails to
boot for me

qemu-system-sparc64 -machine sun4u -m 1024 -drive \
file=miniroot67.img,format=raw -nographic -serial stdio -monitor none

OpenBIOS for Sparc64
Configuration device id QEMU version 1 machine id 0
kernel cmdline 
CPUs: 1 x SUNW,UltraSPARC-IIi
UUID: ----
Welcome to OpenBIOS v1.1 built on Oct 28 2019 17:08
  Type 'help' for detailed information
Trying disk:a...
Not a bootable ELF image
Not a bootable a.out image

Loading FCode image...
Loaded 6882 bytes
entry point is 0x4000
Evaluating FCode...
OpenBSD IEEE 1275 Bootblock 2.0
..

And then hangs

While the patched bootblocks do boot (but hang later after

scsibus1 at softraid0: 256 targets


as before).

-Otto



sparc64 boot issue on qemu

2020-05-29 Thread Otto Moerbeek
On Thu, May 28, 2020 at 10:11:28AM +0200, Otto Moerbeek wrote:

> On Thu, May 28, 2020 at 01:21:21AM -0600, Jason A. Donenfeld wrote:
> 
> > On Thu, May 28, 2020 at 1:19 AM Otto Moerbeek  wrote:
> > > Of course.., I was running it from a !wxallowed mount. BTW, qemu is in
> > > packages, no need to build it yourself.
> > 
> > Sure, but now I've been somewhat nerd sniped and am playing with this
> > fcode forth implementation in qemu :-P. I wonder if there's something
> > missing in the 64-bit extensions to IEEE 1275, in table.fs...
> 
> OK, can reproduce. I'll see if I can find out something.
> 
>   -Otto
> 

After running the bootblocks in debug mode (using boot disk -V) and
seeing ofwboot was found and loaded, I added some debug code to the
bootblocks and now it correctly starts ofwboot on qemu


Trying disk:a...
Not a bootable ELF image
Not a bootable a.out image

Loading FCode image...
Loaded 6936 bytes
entry point is 0x4000
Evaluating FCode...
OpenBSD IEEE 1275 Bootblock 2.0
..free mem
close boot dev
start loaded program
>> OpenBSD BOOT 1.17
Trying bsd...
open /etc/random.seed: No such file or directory
Booting /pci@1fe,0/pci@1,1/ide@3/ide@0/disk@0:a/bsd
4225784@0x100+1288@0x1407af8+3249436@0x1c0+944868@0x1f1951c 
symbols @ 0xfef50340 139 start=0x100
console is /pci@1fe,0/pci@1,1/ebus@1/su
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2020 OpenBSD. All rights reserved.  https://www.OpenBSD.org
real mem = 2147483648 (2048MB)
avail mem = 2099232768 (2001MB)
random: boothowto does not indicate good seed
mainbus0 at root: OpenBiosTeam,OpenBIOS
cpu0 at mainbus0: SUNW,UltraSPARC-IIi (rev 9.1) @ 100 MHz
cpu0: physical 256K instruction (64 b/l), 16K data (32 b/l), 256K
external (64 b/l)
psycho0 at mainbus0: SUNW,sabre, impl 0, version 0, ign 7c0
psycho0: bus range 0-2, PCI bus 0
psycho0: dvma map c000-dfff
pci0 at psycho0
ppb0 at pci0 dev 1 function 1 "Sun Simba" rev 0x11
pci1 at ppb0 bus 1
ebus0 at pci1 dev 1 function 0 "Sun PCIO EBus2" rev 0x01
clock1 at ebus0 addr 2000-3fff: mk48t59
"power" at ebus0 addr 7240-7243 ivec 0x1 not configured
"fdthree" at ebus0 addr 0- not configured
com0 at ebus0 addr 3f8-3ff ivec 0x2b: ns16550a, 16 byte fifo
com0: console
pckbc0 at ebus0 addr 60-67 ivec 0x29
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0
"Bochs VGA" rev 0x02 at pci1 dev 2 function 0 not configured
pciide0 at pci1 dev 3 function 0 "CMD Technology PCI0646" rev 0x07:
DMA, channel 0 configured to native-PCI, channel 1 configured to native-PCI
pciide0: using ivec 0x7e0 for native-PCI interrupt
wd0 at pciide0 channel 0 drive 0: 
wd0: 16-sector PIO, LBA48, 3MB, 6400 sectors
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2
atapiscsi0 at pciide0 channel 1 drive 0
scsibus0 at atapiscsi0: 2 targets
cd0 at scsibus0 targ 0 lun 0:  removable
cd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2
ppb1 at pci0 dev 1 function 0 "Sun Simba" rev 0x11
pci2 at ppb1 bus 2
ne0 at pci2 dev 0 function 0 "Realtek 8029" rev 0x00: ivec 0x7d0, address 52:54:00:12:34:56
softraid0 at root
scsibus1 at softraid0: 256 targets

It hangs at this point, but that's clearly another issue.

Puzzled...

-Otto

Index: bootblk.fth
===
RCS file: /cvs/src/sys/arch/sparc64/stand/bootblk/bootblk.fth,v
retrieving revision 1.9
diff -u -p -r1.9 bootblk.fth
--- bootblk.fth 2 Apr 2020 06:06:22 -   1.9
+++ bootblk.fth 29 May 2020 11:53:36 -
@@ -850,16 +850,22 @@ create cur-blockno -1 l, -1 l,\ Curren
   " /ofwboot" load-file( -- load-base )
then
 
+   ." free mem" cr
+
\ Free memory for reading disk blocks
cur-block 0<> if
   dev-block dev-blocksize free-mem
then
 
+   ." close boot dev" cr
+
\ Close boot device
boot-ihandle dup -1 <> if
   cif-close -1 to boot-ihandle 
then

+   ." start loaded program" cr
+
dup 0<> if " to load-base init-program" evaluate then
 ;
 



Re: filesystem code integer and many inodes

2020-05-29 Thread Otto Moerbeek
On Fri, May 29, 2020 at 09:30:04AM +0200, Otto Moerbeek wrote:

> On Thu, May 28, 2020 at 12:54:41PM -0600, Todd C. Miller wrote:
> 
> > On Thu, 28 May 2020 20:53:07 +0200, Otto Moerbeek wrote:
> > 
> > > Here's the separate diff for the prefcg loops. From FreeBSD.
> > 
> > OK millert@
> > 
> >  - todd
> > 
> 
> And here's the updated diff against -current. I removed a redundant
> cast in a fs_ipg * fs_ncg multiplication in fsck_ffs. Both are
> u_int32 and we know the product is <= UINT_MAX, so we do not need to
> cast.
> 
> I would like to make some progress here, I have a followup diff to
> speed up Phase 5 of fsck_ffs...

This last line was directed at other tech@ subscribers and not so much
at millert@. Please review and/or test. Thanks!

-Otto



Re: filesystem code integer and many inodes

2020-05-29 Thread Otto Moerbeek
On Thu, May 28, 2020 at 12:54:41PM -0600, Todd C. Miller wrote:

> On Thu, 28 May 2020 20:53:07 +0200, Otto Moerbeek wrote:
> 
> > Here's the separate diff for the prefcg loops. From FreeBSD.
> 
> OK millert@
> 
>  - todd
> 

And here's the updated diff against -current. I removed a redundant
cast in a fs_ipg * fs_ncg multiplication in fsck_ffs. Both are
u_int32 and we know the product is <= UINT_MAX, so we do not need to
cast.

I would like to make some progress here, I have a followup diff to
speed up Phase 5 of fsck_ffs...

-Otto

Index: sbin/clri/clri.c
===
RCS file: /cvs/src/sbin/clri/clri.c,v
retrieving revision 1.20
diff -u -p -r1.20 clri.c
--- sbin/clri/clri.c28 Jun 2019 13:32:43 -  1.20
+++ sbin/clri/clri.c29 May 2020 07:23:27 -
@@ -68,7 +68,8 @@ main(int argc, char *argv[])
char *fs, sblock[SBLOCKSIZE];
size_t bsize;
off_t offset;
-   int i, fd, imax, inonum;
+   int i, fd;
+   ino_t imax, inonum;
 
if (argc < 3)
usage();
Index: sbin/dumpfs/dumpfs.c
===
RCS file: /cvs/src/sbin/dumpfs/dumpfs.c,v
retrieving revision 1.35
diff -u -p -r1.35 dumpfs.c
--- sbin/dumpfs/dumpfs.c17 Feb 2020 16:11:25 -  1.35
+++ sbin/dumpfs/dumpfs.c29 May 2020 07:23:27 -
@@ -69,7 +69,7 @@ union {
 #define acgcgun.cg
 
 intdumpfs(int, const char *);
-intdumpcg(const char *, int, int);
+intdumpcg(const char *, int, u_int);
 intmarshal(const char *);
 intopen_disk(const char *);
 void   pbits(void *, int);
@@ -163,6 +163,7 @@ dumpfs(int fd, const char *name)
size_t size;
off_t off;
int i, j;
+   u_int cg;
 
switch (afs.fs_magic) {
case FS_UFS2_MAGIC:
@@ -172,7 +173,7 @@ dumpfs(int fd, const char *name)
afs.fs_magic, ctime());
printf("superblock location\t%jd\tid\t[ %x %x ]\n",
(intmax_t)afs.fs_sblockloc, afs.fs_id[0], afs.fs_id[1]);
-   printf("ncg\t%d\tsize\t%jd\tblocks\t%jd\n",
+   printf("ncg\t%u\tsize\t%jd\tblocks\t%jd\n",
afs.fs_ncg, (intmax_t)fssize, (intmax_t)afs.fs_dsize);
break;
case FS_UFS1_MAGIC:
@@ -198,7 +199,7 @@ dumpfs(int fd, const char *name)
printf("cylgrp\t%s\tinodes\t%s\tfslevel %d\n",
i < 1 ? "static" : "dynamic",
i < 2 ? "4.2/4.3BSD" : "4.4BSD", i);
-   printf("ncg\t%d\tncyl\t%d\tsize\t%d\tblocks\t%d\n",
+   printf("ncg\t%u\tncyl\t%d\tsize\t%d\tblocks\t%d\n",
afs.fs_ncg, afs.fs_ncyl, afs.fs_ffs1_size, 
afs.fs_ffs1_dsize);
break;
default:
@@ -223,9 +224,9 @@ dumpfs(int fd, const char *name)
(intmax_t)afs.fs_cstotal.cs_ndir,
(intmax_t)afs.fs_cstotal.cs_nifree, 
(intmax_t)afs.fs_cstotal.cs_nffree);
-   printf("bpg\t%d\tfpg\t%d\tipg\t%d\n",
+   printf("bpg\t%d\tfpg\t%d\tipg\t%u\n",
afs.fs_fpg / afs.fs_frag, afs.fs_fpg, afs.fs_ipg);
-   printf("nindir\t%d\tinopb\t%d\tmaxfilesize\t%ju\n",
+   printf("nindir\t%d\tinopb\t%u\tmaxfilesize\t%ju\n",
afs.fs_nindir, afs.fs_inopb, 
(uintmax_t)afs.fs_maxfilesize);
printf("sbsize\t%d\tcgsize\t%d\tcsaddr\t%jd\tcssize\t%d\n",
@@ -238,10 +239,10 @@ dumpfs(int fd, const char *name)
printf("nbfree\t%d\tndir\t%d\tnifree\t%d\tnffree\t%d\n",
afs.fs_ffs1_cstotal.cs_nbfree, afs.fs_ffs1_cstotal.cs_ndir,
afs.fs_ffs1_cstotal.cs_nifree, 
afs.fs_ffs1_cstotal.cs_nffree);
-   printf("cpg\t%d\tbpg\t%d\tfpg\t%d\tipg\t%d\n",
+   printf("cpg\t%d\tbpg\t%d\tfpg\t%d\tipg\t%u\n",
afs.fs_cpg, afs.fs_fpg / afs.fs_frag, afs.fs_fpg,
afs.fs_ipg);
-   printf("nindir\t%d\tinopb\t%d\tnspf\t%d\tmaxfilesize\t%ju\n",
+   printf("nindir\t%d\tinopb\t%u\tnspf\t%d\tmaxfilesize\t%ju\n",
afs.fs_nindir, afs.fs_inopb, afs.fs_nspf,
(uintmax_t)afs.fs_maxfilesize);
printf("sbsize\t%d\tcgsize\t%d\tcgoffset %d\tcgmask\t0x%08x\n",
@@ -261,7 +262,7 @@ dumpfs(int fd, const char *name)
afs.fs_sblkno, afs.fs_cblkno, afs.fs_iblkno, afs.fs_dblkno);
printf("cgrotor\t%d\tfmod\t%d\tronly\t%d\tclean\t%d\n",
afs.fs_cgrotor, afs.fs_fmod, afs.fs_ronly, afs.fs_clean);
-   printf("avgfpdir %d\tavgfilesize %

Re: filesystem code integer and many inodes

2020-05-28 Thread Otto Moerbeek
On Tue, May 26, 2020 at 04:11:50PM +0200, Otto Moerbeek wrote:

> On Tue, May 26, 2020 at 03:54:15PM +0200, Otto Moerbeek wrote:
> 
> > On Tue, May 26, 2020 at 07:51:28AM -0600, Todd C. Miller wrote:
> > 
> > > On Tue, 26 May 2020 12:07:21 +0200, Otto Moerbeek wrote:
> > > 
> > > > Apart from noting the strange Subject:, I'd also like to mention one
> > > > change in the way cylinder groups are scanned. The current code scans
> > > > forward and backward, which causes an uneven distribution of full cgs
> > > > (the upper end of the cgs will get full first). Fix that by always
> > > > scanning forward, wrapping to cg 0 if needed.
> > > 
> > > Should that be a separate commit?  I can't find any problems
> > > with the diff but I haven't tried running with it yet.
> > > 
> > >  - todd
> > 
> > Yeah, I can do that. Note that it must be committed first, since the
> > loop condition is always true if I change the loop var to unsigned.
> > 
> > -Otto
> > 
> 
> And a new diff. I accidentally capitalized a letter just before sending.
> Thanks to naddy for spotting that.

Here's the separate diff for the prefcg loops. From FreeBSD.

OK?

-Otto


Index: ffs_alloc.c
===
RCS file: /cvs/src/sys/ufs/ffs/ffs_alloc.c,v
retrieving revision 1.111
diff -u -p -r1.111 ffs_alloc.c
--- ffs_alloc.c 28 May 2020 15:48:29 -  1.111
+++ ffs_alloc.c 28 May 2020 18:47:53 -
@@ -515,7 +518,7 @@ ffs_dirpref(struct inode *pip)
maxcontigdirs = 1;
 
/*
-* Limit number of dirs in one cg and reserve space for 
+* Limit number of dirs in one cg and reserve space for
 * regular files, but only if we have no deficit in
 * inodes or space.
 *
@@ -524,16 +527,17 @@ ffs_dirpref(struct inode *pip)
 * We scan from our preferred cylinder group forward looking
 * for a cylinder group that meets our criterion. If we get
 * to the final cylinder group and do not find anything,
-* we start scanning backwards from our preferred cylinder
-* group. The ideal would be to alternate looking forward
-* and backward, but tha tis just too complex to code for
-* the gain it would get. The most likely place where the
-* backward scan would take effect is when we start near
-* the end of the filesystem and do not find anything from
-* where we are to the end. In that case, scanning backward
-* will likely find us a suitable cylinder group much closer
-* to our desired location than if we were to start scanning
-* forward from the beginning for the filesystem.
+* we start scanning forwards from the beginning of the
+* filesystem. While it might seem sensible to start scanning
+* backwards or even to alternate looking forward and backward,
+* this approach fails badly when the filesystem is nearly full.
+* Specifically, we first search all the areas that have no space
+* and finally try the one preceding that. We repeat this on
+* every request and in the case of the final block end up
+* searching the entire filesystem. By jumping to the front
+* of the filesystem, our future forward searches always look
+* in new cylinder groups so finds every possible block after
+* one pass over the filesystem.
 */
for (cg = prefcg; cg < fs->fs_ncg; cg++)
if (fs->fs_cs(fs, cg).cs_ndir < maxndir &&
@@ -542,7 +546,7 @@ ffs_dirpref(struct inode *pip)
if (fs->fs_contigdirs[cg] < maxcontigdirs)
goto end;
}
-   for (cg = prefcg - 1; cg >= 0; cg--)
+   for (cg = 0; cg < prefcg; cg++)
if (fs->fs_cs(fs, cg).cs_ndir < maxndir &&
fs->fs_cs(fs, cg).cs_nifree >= minifree &&
fs->fs_cs(fs, cg).cs_nbfree >= minbfree) {
@@ -555,7 +559,7 @@ ffs_dirpref(struct inode *pip)
for (cg = prefcg; cg < fs->fs_ncg; cg++)
if (fs->fs_cs(fs, cg).cs_nifree >= avgifree)
goto end;
-   for (cg = prefcg - 1; cg >= 0; cg--)
+   for (cg = 0; cg < prefcg; cg++)
if (fs->fs_cs(fs, cg).cs_nifree >= avgifree)
goto end;
 end:



Re: WireGuard patchset for OpenBSD, rev. 2

2020-05-28 Thread Otto Moerbeek
On Thu, May 28, 2020 at 01:21:21AM -0600, Jason A. Donenfeld wrote:

> On Thu, May 28, 2020 at 1:19 AM Otto Moerbeek  wrote:
> > Of course... I was running it from a !wxallowed mount. BTW, qemu is in
> > packages, no need to build it yourself.
> 
> Sure, but now I've been somewhat nerd sniped and am playing with this
> fcode forth implementation in qemu :-P. I wonder if there's something
> missing in the 64-bit extensions to IEEE 1275, in table.fs...

OK, can reproduce. I'll see if I can find out something.

-Otto



Re: WireGuard patchset for OpenBSD, rev. 2

2020-05-28 Thread Otto Moerbeek
On Thu, May 28, 2020 at 01:05:59AM -0600, Jason A. Donenfeld wrote:

> On Thu, May 28, 2020 at 12:15 AM Otto Moerbeek  wrote:
> >
> > On Wed, May 27, 2020 at 11:28:09PM -0600, Jason A. Donenfeld wrote:
> >
> > > Hi Otto,
> > >
> > > On Wed, May 27, 2020 at 4:07 AM Otto Moerbeek  wrote:
> > > > Although I'm not terribly interested in bugs that are only seen (so
> > > > far) using emulation, please send me the details on how you set up
> > > > qemu.
> > >
> > > Right, it could very well be a TCG bug. But maybe not. Here's how to
> > > get things [not-]working:
> > >
> > > $ qemu-system-sparc64 --version
> > > QEMU emulator version 5.0.0
> > > $ qemu-img convert -O qcow2 miniroot66.fs disk.qcow2
> > > $ qemu-img resize disk.qcow2 20G
> > > $ qemu-system-sparc64 \
> > >         -machine sun4u \
> > >         -m 1024 \
> > >         -drive file=disk.qcow2,if=ide \
> > >         -net nic,model=ne2k_pci -net user \
> > >         -nographic -serial stdio -monitor none \
> > >         -boot c
> > >
> > > OpenBIOS for Sparc64
> > > [...]
> > > Loading FCode image...
> > > Loaded 5840 bytes
> > > entry point is 0x4000
> > > Evaluating FCode...
> > > OpenBSD IEEE 1275 Bootblock 1.4
> > > ..>> OpenBSD BOOT 1.14
> > > Trying bsd...
> > > [...]
> > > OpenBSD 6.6 (RAMDISK) #84: Sat Oct 12 10:42:12 MDT 2019
> > >    dera...@sparc64.openbsd.org:/usr/src/sys/arch/sparc64/compile/RAMDISK
> > > [...]
> > > Welcome to the OpenBSD/sparc64 6.6 installation program.
> > > (I)nstall, (U)pgrade, (A)utoinstall or (S)hell?
> > >
> > > If you swap out miniroot66.fs for miniroot67.fs, you'll get the error
> > > I sent prior.
> > >
> > > Jason
> > >
> >
> > Does not work for me, error message on OpenBSD/amd64:
> >
> > Could not allocate dynamic translator buffer
> >
> > ktrace snippet:
> >
> > 74960 qemu-system-spar CALL  mmap(0,0x4000,0x7,0x1002,-1,0)
> > 74960 qemu-system-spar RET   mmap -1 errno 91 Not supported
> >
> > It's doing an RWX mapping, which won't fly on OpenBSD.
> >
> >         -Otto
> 
> This sequence worked fine on my OpenBSD box for reproducing the maybe-bug. 
> (See: mount option.) YMMV:
> 
> bart ~ # pkg_add git gmake glib2 bison sdl2 gsed bash xz
> [...]
> bart ~ # ftp -o - https://download.qemu.org/qemu-5.0.0.tar.xz | unxz | tar xf -
> bart ~ # cd qemu-5.0.0/
> bart ~/qemu-5.0.0 # mkdir build && cd build
> bart ~/qemu-5.0.0/build # ../configure && gmake -j$(sysctl -n hw.ncpu)
> [...]
> bart ~/qemu-5.0.0/build # ftp https://cdn.openbsd.org/pub/OpenBSD/6.7/sparc64/miniroot67.fs
> [...]
> bart ~/qemu-5.0.0/build # ./qemu-img convert -O qcow2 miniroot67.fs disk.qcow2
> bart ~/qemu-5.0.0/build # ./qemu-img resize disk.qcow2 20G
> Image resized.
> bart ~/qemu-5.0.0/build # mount
> /dev/sd0a on / type ffs (local, wxallowed)
> bart ~/qemu-5.0.0/build # ./sparc64-softmmu/qemu-system-sparc64 -machine sun4u -m 1024 -drive file=disk.qcow2,if=ide -net nic,model=ne2k_pci -net user -nographic -serial stdio -monitor none -boot c
> OpenBIOS for Sparc64
> Configuration device id QEMU version 1 machine id 0
> kernel cmdline 
> CPUs: 1 x SUNW,UltraSPARC-IIi
> UUID: ----
> Welcome to OpenBIOS v1.1 built on Oct 28 2019 17:08
>   Type 'help' for detailed information
> Trying disk:a...
> Not a bootable ELF image
> Not a bootable a.out image
> Loading FCode image...
> Loaded 6882 bytes
> entry point is 0x4000
> Evaluating FCode...
> OpenBSD IEEE 1275 Bootblock 2.0
> ..reserved fcode word.
> Unhandled Exception 0x0030
> PC = 0xffd0f3ac NPC = 0xffd0f3b0
> Stopping execution
> 
> Jason
> 

Of course... I was running it from a !wxallowed mount. BTW, qemu is in
packages, no need to build it yourself.

-Otto



Re: WireGuard patchset for OpenBSD, rev. 2

2020-05-28 Thread Otto Moerbeek
On Wed, May 27, 2020 at 11:28:09PM -0600, Jason A. Donenfeld wrote:

> Hi Otto,
> 
> On Wed, May 27, 2020 at 4:07 AM Otto Moerbeek  wrote:
> > Although I'm not terribly interested in bugs that are only seen (so
> > far) using emulation, please send me the details on how you set up
> > qemu.
> 
> Right, it could very well be a TCG bug. But maybe not. Here's how to
> get things [not-]working:
> 
> $ qemu-system-sparc64 --version
> QEMU emulator version 5.0.0
> $ qemu-img convert -O qcow2 miniroot66.fs disk.qcow2
> $ qemu-img resize disk.qcow2 20G
> $ qemu-system-sparc64 \
> -machine sun4u \
> -m 1024 \
> -drive file=disk.qcow2,if=ide \
> -net nic,model=ne2k_pci -net user \
> -nographic -serial stdio -monitor none \
> -boot c
> 
> OpenBIOS for Sparc64
> [...]
> Loading FCode image...
> Loaded 5840 bytes
> entry point is 0x4000
> Evaluating FCode...
> OpenBSD IEEE 1275 Bootblock 1.4
> ..>> OpenBSD BOOT 1.14
> Trying bsd...
> [...]
> OpenBSD 6.6 (RAMDISK) #84: Sat Oct 12 10:42:12 MDT 2019
>dera...@sparc64.openbsd.org:/usr/src/sys/arch/sparc64/compile/RAMDISK
> [...]
> Welcome to the OpenBSD/sparc64 6.6 installation program.
> (I)nstall, (U)pgrade, (A)utoinstall or (S)hell?
> 
> If you swap out miniroot66.fs for miniroot67.fs, you'll get the error
> I sent prior.
> 
> Jason
> 

Does not work for me, error message on OpenBSD/amd64:

Could not allocate dynamic translator buffer

ktrace snippet:

74960 qemu-system-spar CALL  mmap(0,0x4000,0x7,0x1002,-1,0)
74960 qemu-system-spar RET   mmap -1 errno 91 Not supported

It's doing an RWX mapping, which won't fly on OpenBSD.
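
To make the failure easier to reproduce outside qemu, here is a minimal
sketch (not qemu's code) that requests the same kind of anonymous private
mapping as in the ktrace output. The helper names try_anon_map() and
try_write_then_exec() are made up for illustration. The second helper shows
a common alternative pattern, map writable first and switch to executable
afterwards; whether qemu can be arranged that way is not something this
thread establishes.

```c
#define _DEFAULT_SOURCE	/* for MAP_ANON on glibc; harmless elsewhere */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Try an anonymous private mapping with the given protection.
 * Returns 0 on success or the errno value reported by mmap(2).
 * (Hypothetical helper, for illustration only.)
 */
static int
try_anon_map(size_t len, int prot)
{
	void *p;

	assert(len > 0);
	p = mmap(NULL, len, prot, MAP_ANON | MAP_PRIVATE, -1, 0);
	if (p == MAP_FAILED)
		return errno;
	munmap(p, len);
	return 0;
}

/*
 * Alternative pattern: map read/write, then change the protection to
 * read/execute, so the region is never writable and executable at once.
 * Returns 0 on success or an errno value.
 */
static int
try_write_then_exec(size_t len)
{
	void *p;
	int error = 0;

	assert(len > 0);
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
	    MAP_ANON | MAP_PRIVATE, -1, 0);
	if (p == MAP_FAILED)
		return errno;
	/* ... generated code would be written into p here ... */
	if (mprotect(p, len, PROT_READ | PROT_EXEC) == -1)
		error = errno;
	munmap(p, len);
	return error;
}
```

Calling try_anon_map(0x4000, PROT_READ | PROT_WRITE | PROT_EXEC) from a
binary on a mount without wxallowed should return the same errno as in the
ktrace snippet above; on other systems it may simply succeed.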

-Otto




Re: WireGuard patchset for OpenBSD, rev. 2

2020-05-27 Thread Otto Moerbeek
On Wed, May 27, 2020 at 03:14:29AM -0600, Jason A. Donenfeld wrote:

> One interesting quirk in doing this on qemu is that the 6.7 and
> -current kernel both crash:
> 
> Loading FCode image...
> Loaded 6882 bytes
> entry point is 0x4000
> Evaluating FCode...
> OpenBSD IEEE 1275 Bootblock 2.0
> Unhandled Exception 0x0030
> PC = 0xffd0f3ac NPC = 0xffd0f3b0
> Stopping execution
> 
> Luckily it works fine on 6.6, so that's where I debugged this issue.
> But this might be a bug worth looking into. Otto's recent bootblk
> patch is a possible culprit, so I've CC'd him.
> 
> Jason
> 

Although I'm not terribly interested in bugs that are only seen (so
far) using emulation, please send me the details on how you set up
qemu.

-Otto



Re: filesystem code integer and many inodes

2020-05-26 Thread Otto Moerbeek
On Tue, May 26, 2020 at 03:54:15PM +0200, Otto Moerbeek wrote:

> On Tue, May 26, 2020 at 07:51:28AM -0600, Todd C. Miller wrote:
> 
> > On Tue, 26 May 2020 12:07:21 +0200, Otto Moerbeek wrote:
> > 
> > > Apart from noting the strange Subject:, I'd also like to mention one
> > > change in the way cylinder groups are scanned. The current code scans
> > > forward and backward, which causes an uneven distribution of full cgs
> > > (the upper end of the cgs will get full first). Fix that by always
> > > scanning forward, wrapping to cg 0 if needed.
> > 
> > Should that be a separate commit?  I can't find any problems
> > with the diff but I haven't tried running with it yet.
> > 
> >  - todd
> 
> Yeah, I can do that. Note that it must be committed first, since the
> loop condition is always true if I change the loop var to unsigned.
> 
>   -Otto
> 

And a new diff. I accidentally capitalized a letter just before sending.
Thanks to naddy for spotting that.

-Otto

Index: sbin/clri/clri.c
===
RCS file: /cvs/src/sbin/clri/clri.c,v
retrieving revision 1.20
diff -u -p -r1.20 clri.c
--- sbin/clri/clri.c28 Jun 2019 13:32:43 -  1.20
+++ sbin/clri/clri.c26 May 2020 09:41:18 -
@@ -68,7 +68,8 @@ main(int argc, char *argv[])
char *fs, sblock[SBLOCKSIZE];
size_t bsize;
off_t offset;
-   int i, fd, imax, inonum;
+   int i, fd;
+   ino_t imax, inonum;
 
if (argc < 3)
usage();
Index: sbin/dumpfs/dumpfs.c
===
RCS file: /cvs/src/sbin/dumpfs/dumpfs.c,v
retrieving revision 1.35
diff -u -p -r1.35 dumpfs.c
--- sbin/dumpfs/dumpfs.c17 Feb 2020 16:11:25 -  1.35
+++ sbin/dumpfs/dumpfs.c26 May 2020 09:41:18 -
@@ -69,7 +69,7 @@ union {
 #define acgcgun.cg
 
 intdumpfs(int, const char *);
-intdumpcg(const char *, int, int);
+intdumpcg(const char *, int, u_int);
 intmarshal(const char *);
 intopen_disk(const char *);
 void   pbits(void *, int);
@@ -163,6 +163,7 @@ dumpfs(int fd, const char *name)
size_t size;
off_t off;
int i, j;
+   u_int cg;
 
switch (afs.fs_magic) {
case FS_UFS2_MAGIC:
@@ -172,7 +173,7 @@ dumpfs(int fd, const char *name)
afs.fs_magic, ctime());
printf("superblock location\t%jd\tid\t[ %x %x ]\n",
(intmax_t)afs.fs_sblockloc, afs.fs_id[0], afs.fs_id[1]);
-   printf("ncg\t%d\tsize\t%jd\tblocks\t%jd\n",
+   printf("ncg\t%u\tsize\t%jd\tblocks\t%jd\n",
afs.fs_ncg, (intmax_t)fssize, (intmax_t)afs.fs_dsize);
break;
case FS_UFS1_MAGIC:
@@ -198,7 +199,7 @@ dumpfs(int fd, const char *name)
printf("cylgrp\t%s\tinodes\t%s\tfslevel %d\n",
i < 1 ? "static" : "dynamic",
i < 2 ? "4.2/4.3BSD" : "4.4BSD", i);
-   printf("ncg\t%d\tncyl\t%d\tsize\t%d\tblocks\t%d\n",
+   printf("ncg\t%u\tncyl\t%d\tsize\t%d\tblocks\t%d\n",
afs.fs_ncg, afs.fs_ncyl, afs.fs_ffs1_size, 
afs.fs_ffs1_dsize);
break;
default:
@@ -223,9 +224,9 @@ dumpfs(int fd, const char *name)
(intmax_t)afs.fs_cstotal.cs_ndir,
(intmax_t)afs.fs_cstotal.cs_nifree, 
(intmax_t)afs.fs_cstotal.cs_nffree);
-   printf("bpg\t%d\tfpg\t%d\tipg\t%d\n",
+   printf("bpg\t%d\tfpg\t%d\tipg\t%u\n",
afs.fs_fpg / afs.fs_frag, afs.fs_fpg, afs.fs_ipg);
-   printf("nindir\t%d\tinopb\t%d\tmaxfilesize\t%ju\n",
+   printf("nindir\t%d\tinopb\t%u\tmaxfilesize\t%ju\n",
afs.fs_nindir, afs.fs_inopb, 
(uintmax_t)afs.fs_maxfilesize);
printf("sbsize\t%d\tcgsize\t%d\tcsaddr\t%jd\tcssize\t%d\n",
@@ -238,10 +239,10 @@ dumpfs(int fd, const char *name)
printf("nbfree\t%d\tndir\t%d\tnifree\t%d\tnffree\t%d\n",
afs.fs_ffs1_cstotal.cs_nbfree, afs.fs_ffs1_cstotal.cs_ndir,
afs.fs_ffs1_cstotal.cs_nifree, 
afs.fs_ffs1_cstotal.cs_nffree);
-   printf("cpg\t%d\tbpg\t%d\tfpg\t%d\tipg\t%d\n",
+   printf("cpg\t%d\tbpg\t%d\tfpg\t%d\tipg\t%u\n",
afs.fs_cpg, afs.fs_fpg / afs.fs_frag, afs.fs_fpg,
afs.fs_ipg);
-   printf("nindir\t%d\tinopb\t%d\tnspf\t%d\tmaxfilesize\t%ju\n",
+   printf("nindir\t%d\tinopb\t%u\tnspf\t%d\tmaxfilesize\t%ju\n"

Re: filesystem code integer and many inodes

2020-05-26 Thread Otto Moerbeek
On Tue, May 26, 2020 at 07:51:28AM -0600, Todd C. Miller wrote:

> On Tue, 26 May 2020 12:07:21 +0200, Otto Moerbeek wrote:
> 
> > Apart from noting the strange Subject:, I'd also like to mention one
> > change in the way cylinder groups are scanned. The current code scans
> > forward and backward, which causes an uneven distribution of full cgs
> > (the upper end of the cgs will get full first). Fix that by always
> > scanning forward, wrapping to cg 0 if needed.
> 
> Should that be a separate commit?  I can't find any problems
> with the diff but I haven't tried running with it yet.
> 
>  - todd

Yeah, I can do that. Note that it must be committed first, since the
loop condition is always true if I change the loop var to unsigned.
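
To spell out the ordering constraint: a minimal sketch, with made-up
function names and no real ffs structures, of the two loop shapes. The old
backward loop relies on "cg >= 0" becoming false, which only works while cg
is signed. With an unsigned cg that comparison can never be false, so the
loop has to be rewritten to count upwards before the type is changed.

```c
#include <assert.h>

/*
 * Old shape: walk from prefcg - 1 down to 0 with a signed index.
 * Returns the number of cylinder groups visited.
 */
static unsigned int
scan_backward_signed(int prefcg)
{
	unsigned int visited = 0;
	int cg;

	assert(prefcg >= 0);
	/* Terminates because cg eventually becomes -1. */
	for (cg = prefcg - 1; cg >= 0; cg--)
		visited++;
	return visited;
}

/*
 * New shape: walk from 0 up to prefcg - 1 with an unsigned index.
 * "cg < prefcg" can become false for an unsigned cg, unlike
 * "cg >= 0", which is true for every unsigned value.
 */
static unsigned int
scan_forward_unsigned(unsigned int prefcg)
{
	unsigned int visited = 0;
	unsigned int cg;

	for (cg = 0; cg < prefcg; cg++)
		visited++;
	return visited;
}
```

Both functions visit the same set of groups, only in a different order.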

-Otto



Re: filesystem code integer and many inodes

2020-05-26 Thread Otto Moerbeek
On Tue, May 26, 2020 at 11:58:39AM +0200, Otto Moerbeek wrote:

> Hi,
> 
> In theory the ffs code supports a maximum of UINT_MAX inodes, but in
> practice, due to integer overflows in the current code, the limit is
> INT_MAX inodes.
> 
> This fixes that, and allows me to create and use filesystems with more
> than INT_MAX inodes. This is partly from FreeBSD code.
> 
> Main change is in fs.h, modifying a few fields to unsigned, most
> notably fs_ipg (inodes per cylinder group) and fs_ncg (number of
> cylinder groups). In various places fs_ipg * fs_ncg is computed, so
> both should be unsigned. I also made sure cg indexes are unsigned and
> ino_t is used for inode numbers.
> 
> Tested on a 30TB partition with various parameters (notably -f x -b y
> and -i z combinations).
> 
> Please test and/or review,

Apart from noting the strange Subject:, I'd also like to mention one
change in the way cylinder groups are scanned. The current code scans
forward and backward, which causes an uneven distribution of full cgs
(the upper end of the cgs will get full first). Fix that by always
scanning forward, wrapping to cg 0 if needed.
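
A minimal sketch of the forward-only scan described here, assuming a plain
array of free-inode counts per cylinder group instead of the real
superblock summary. find_cg_forward(), NO_CG and the parameter names are
hypothetical and only serve the illustration.

```c
#include <assert.h>

#define NO_CG	((unsigned int)-1)	/* hypothetical "nothing found" marker */

/*
 * Return the first cylinder group, starting at prefcg and wrapping to
 * group 0, that has at least minifree free inodes, or NO_CG if none does.
 */
static unsigned int
find_cg_forward(const unsigned int *nifree, unsigned int ncg,
    unsigned int prefcg, unsigned int minifree)
{
	unsigned int cg;

	assert(prefcg < ncg);
	/* First pass: from the preferred group up to the last group. */
	for (cg = prefcg; cg < ncg; cg++)
		if (nifree[cg] >= minifree)
			return cg;
	/* Second pass: wrap to group 0, stop before the preferred group. */
	for (cg = 0; cg < prefcg; cg++)
		if (nifree[cg] >= minifree)
			return cg;
	return NO_CG;
}
```

Every group is looked at once per call, and both loops count upwards, so
the index can be unsigned.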

-Otto



filesystem code integer and many inodes

2020-05-26 Thread Otto Moerbeek
Hi,

In theory the ffs code supports a maximum of UINT_MAX inodes, but in
practice, due to integer overflows in the current code, the limit is
INT_MAX inodes.

This fixes that, and allows me to create and use filesystems with more
than INT_MAX inodes. This is partly from FreeBSD code.

Main change is in fs.h, modifying a few fields to unsigned, most
notably fs_ipg (inodes per cylinder group) and fs_ncg (number of
cylinder groups). In various places fs_ipg * fs_ncg is computed, so
both should be unsigned. I also made sure cg indexes are unsigned and
ino_t is used for inode numbers.
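
As an illustration of where the INT_MAX limit comes from, here is a small
sketch using fixed-width types as stand-ins for the superblock fields. The
function names are made up and the values in the usage note are arbitrary
examples, not taken from a real filesystem. The product is computed in 64
bits so that the check itself cannot overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Total number of inodes: inodes per group times number of groups. */
static uint64_t
inode_count(uint32_t ipg, uint32_t ncg)
{
	assert(ipg > 0 && ncg > 0);	/* at least one group with one inode */
	return (uint64_t)ipg * ncg;
}

/* Does the total still fit when the fields and the product are signed? */
static int
fits_signed32(uint32_t ipg, uint32_t ncg)
{
	return inode_count(ipg, ncg) <= (uint64_t)INT32_MAX;
}

/* Does the total fit once the fields and the product are unsigned? */
static int
fits_unsigned32(uint32_t ipg, uint32_t ncg)
{
	return inode_count(ipg, ncg) <= (uint64_t)UINT32_MAX;
}
```

For example, 32768 inodes per group times 100000 groups is 3276800000
inodes: too many for a signed 32-bit product, but fine for an unsigned one.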

Tested on a 30TB partition with various parameters (notably -f x -b y
and -i z combinations).

Please test and/or review,

-Otto

Index: sbin/clri/clri.c
===
RCS file: /cvs/src/sbin/clri/clri.c,v
retrieving revision 1.20
diff -u -p -r1.20 clri.c
--- sbin/clri/clri.c28 Jun 2019 13:32:43 -  1.20
+++ sbin/clri/clri.c26 May 2020 09:41:18 -
@@ -68,7 +68,8 @@ main(int argc, char *argv[])
char *fs, sblock[SBLOCKSIZE];
size_t bsize;
off_t offset;
-   int i, fd, imax, inonum;
+   int i, fd;
+   ino_t imax, inonum;
 
if (argc < 3)
usage();
Index: sbin/dumpfs/dumpfs.c
===
RCS file: /cvs/src/sbin/dumpfs/dumpfs.c,v
retrieving revision 1.35
diff -u -p -r1.35 dumpfs.c
--- sbin/dumpfs/dumpfs.c17 Feb 2020 16:11:25 -  1.35
+++ sbin/dumpfs/dumpfs.c26 May 2020 09:41:18 -
@@ -69,7 +69,7 @@ union {
 #define acgcgun.cg
 
 intdumpfs(int, const char *);
-intdumpcg(const char *, int, int);
+intdumpcg(const char *, int, u_int);
 intmarshal(const char *);
 intopen_disk(const char *);
 void   pbits(void *, int);
@@ -163,6 +163,7 @@ dumpfs(int fd, const char *name)
size_t size;
off_t off;
int i, j;
+   u_int cg;
 
switch (afs.fs_magic) {
case FS_UFS2_MAGIC:
@@ -172,7 +173,7 @@ dumpfs(int fd, const char *name)
afs.fs_magic, ctime());
printf("superblock location\t%jd\tid\t[ %x %x ]\n",
(intmax_t)afs.fs_sblockloc, afs.fs_id[0], afs.fs_id[1]);
-   printf("ncg\t%d\tsize\t%jd\tblocks\t%jd\n",
+   printf("ncg\t%u\tsize\t%jd\tblocks\t%jd\n",
afs.fs_ncg, (intmax_t)fssize, (intmax_t)afs.fs_dsize);
break;
case FS_UFS1_MAGIC:
@@ -198,7 +199,7 @@ dumpfs(int fd, const char *name)
printf("cylgrp\t%s\tinodes\t%s\tfslevel %d\n",
i < 1 ? "static" : "dynamic",
i < 2 ? "4.2/4.3BSD" : "4.4BSD", i);
-   printf("ncg\t%d\tncyl\t%d\tsize\t%d\tblocks\t%d\n",
+   printf("ncg\t%u\tncyl\t%d\tsize\t%d\tblocks\t%d\n",
afs.fs_ncg, afs.fs_ncyl, afs.fs_ffs1_size, 
afs.fs_ffs1_dsize);
break;
default:
@@ -223,9 +224,9 @@ dumpfs(int fd, const char *name)
(intmax_t)afs.fs_cstotal.cs_ndir,
(intmax_t)afs.fs_cstotal.cs_nifree, 
(intmax_t)afs.fs_cstotal.cs_nffree);
-   printf("bpg\t%d\tfpg\t%d\tipg\t%d\n",
+   printf("bpg\t%d\tfpg\t%d\tipg\t%u\n",
afs.fs_fpg / afs.fs_frag, afs.fs_fpg, afs.fs_ipg);
-   printf("nindir\t%d\tinopb\t%d\tmaxfilesize\t%ju\n",
+   printf("nindir\t%d\tinopb\t%u\tmaxfilesize\t%ju\n",
afs.fs_nindir, afs.fs_inopb, 
(uintmax_t)afs.fs_maxfilesize);
printf("sbsize\t%d\tcgsize\t%d\tcsaddr\t%jd\tcssize\t%d\n",
@@ -238,10 +239,10 @@ dumpfs(int fd, const char *name)
printf("nbfree\t%d\tndir\t%d\tnifree\t%d\tnffree\t%d\n",
afs.fs_ffs1_cstotal.cs_nbfree, afs.fs_ffs1_cstotal.cs_ndir,
afs.fs_ffs1_cstotal.cs_nifree, 
afs.fs_ffs1_cstotal.cs_nffree);
-   printf("cpg\t%d\tbpg\t%d\tfpg\t%d\tipg\t%d\n",
+   printf("cpg\t%d\tbpg\t%d\tfpg\t%d\tipg\t%u\n",
afs.fs_cpg, afs.fs_fpg / afs.fs_frag, afs.fs_fpg,
afs.fs_ipg);
-   printf("nindir\t%d\tinopb\t%d\tnspf\t%d\tmaxfilesize\t%ju\n",
+   printf("nindir\t%d\tinopb\t%u\tnspf\t%d\tmaxfilesize\t%ju\n",
afs.fs_nindir, afs.fs_inopb, afs.fs_nspf,
(uintmax_t)afs.fs_maxfilesize);
printf("sbsize\t%d\tcgsize\t%d\tcgoffset %d\tcgmask\t0x%08x\n",
@@ -261,7 +262,7 @@ dumpfs(int fd, const char *name)
afs.fs_sblkno, afs.fs_cblkno, afs.fs_iblkno, afs.fs_dblkno);
printf("cgrotor\t%d\tfmod\t%d\tronly\t%d\tclean\t%d\n",
afs.fs_cgrotor, afs.fs_fmod, afs.fs_ronly, afs.fs_clean);
-   printf("avgfpdir %d\tavgfilesize %d\n",
+   printf("avgfpdir %u\tavgfilesize %u\n",
afs.fs_avgfpdir, 

Re: Increase default number of devices created for LDOMs on sparc64

2020-05-18 Thread Otto Moerbeek
On Mon, May 18, 2020 at 06:27:04PM -, Miod Vallat wrote:

> 
> > Learning how LDOMs work on this T4-1 and we only create 8 devices
> > (each /dev/ldom* and /dev/ttyV*) by default. The now-commonly-available
> > T4-1 machines can do far more than that pretty easily, so bump up the
> > number created by default from 8 to 16.
> >
> > ok?
> 
> MAKEDEV is a generated file. Edit the second-to-last line of MAKEDEV.md
> to add the extra 8 nodes.
> 

After editing MAKEDEV.md, commit it, run make, and commit the resulting
MAKEDEV file.

-Otto



Re: luna88k: option FFS2 in RAMDISK

2020-05-18 Thread Otto Moerbeek
On Mon, May 18, 2020 at 08:29:57AM +0200, Otto Moerbeek wrote:

> Hi,
> 
> While luna88k cannot boot from ffs2, it should be able to use ffs2
> for non-root filesystems. So enable the RAMDISK to support ffs2.
> 
> Cannot test, no hardware :-(
> 
> ok?
> 
>   -Otto
> 

And now with diff

Index: RAMDISK
===
RCS file: /cvs/src/sys/arch/luna88k/conf/RAMDISK,v
retrieving revision 1.14
diff -u -p -r1.14 RAMDISK
--- RAMDISK 4 Sep 2019 14:29:42 -   1.14
+++ RAMDISK 18 May 2020 06:26:45 -
@@ -13,6 +13,7 @@ optionRAMDISK_HOOKS
 option SCSITERSE
 
 option FFS
+option FFS2
 option INET6
 
 config bsd root rd0a swap on rd0b



luna88k: option FFS2 in RAMDISK

2020-05-18 Thread Otto Moerbeek
Hi,

While luna88k cannot boot from ffs2, it should be able to use ffs2
for non-root filesystems. So enable the RAMDISK to support ffs2.

Cannot test, no hardware :-(

ok?

-Otto



Re: scan_ffs prints negative size

2020-05-16 Thread Otto Moerbeek
On Sat, May 16, 2020 at 02:25:43PM +0200, Denis Fondras wrote:

> Small diff to fix size printing.
> 
> Before :
> $ doas scan_ffs -v sd0
> block 55167 id 758d4818,f2894c98 size -859043093
> 
> After:
> $ doas ./obj/scan_ffs -v sd0
> block 55167 id 758d4818,f2894c98 size 3435924203

I do not think this is right. The field is int32_t, if it is negative,
something is wrong.
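
To illustrate the point: switching the format to %u only reinterprets the
same 32 bits, it does not make the value valid. A minimal sketch, assuming
the field is an int32_t as stated above; as_unsigned() and format_size()
are hypothetical helpers, not part of scan_ffs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* The value "%u" would show for the same bits. */
static uint32_t
as_unsigned(int32_t size)
{
	return (uint32_t)size;
}

/*
 * Format a signed 32-bit size field into buf.
 * Returns 0 if the value is plausible, -1 if it is negative, in which
 * case the text flags it instead of hiding the problem.
 */
static int
format_size(int32_t size, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	if (size < 0) {
		snprintf(buf, len, "invalid (%d)", (int)size);
		return -1;
	}
	snprintf(buf, len, "%d", (int)size);
	return 0;
}
```

With the value from the report, as_unsigned(-859043093) yields 3435924203,
the number shown in the "After" output, while format_size() flags the
field as invalid.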

-Otto

> 
> Index: scan_ffs.c
> ===
> RCS file: /cvs/src/sbin/scan_ffs/scan_ffs.c,v
> retrieving revision 1.23
> diff -u -p -r1.23 scan_ffs.c
> --- scan_ffs.c28 Jun 2019 13:32:46 -  1.23
> +++ scan_ffs.c16 May 2020 12:19:18 -
> @@ -70,7 +70,7 @@ ufsscan(int fd, daddr_t beg, daddr_t end
>   sb = (struct fs*)([n]);
>   if (sb->fs_magic == FS_MAGIC) {
>   if (flags & FLAG_VERBOSE)
> - printf("block %lld id %x,%x size %d\n",
> + printf("block %lld id %x,%x size %u\n",
>   (long long)(blk + (n/512)),
>   sb->fs_id[0], sb->fs_id[1],
>   sb->fs_ffs1_size);
> 



Re: [PATCH] sysupgrade

2020-04-30 Thread Otto Moerbeek
On Wed, Apr 29, 2020 at 10:28:12PM -0500, James Jerkins wrote:

> Hello,
> 
> This patch adds two new options to sysupgrade. The first option is for small 
> box systems like an APU system that only has the base and manual sets 
> installed. The second option is for headless systems without X11 like 
> servers. I have tested this patch from the 6.5 release to 6.6 release to 
> current for both the minimal and no X11 options. In order to test, I did 
> remove the ftp -N option which is not present in the 6.5 or 6.6 releases. I 
> also tested sysupgrade without invoking either new option from 6.5 to 6.6 to 
> current for regression. All of these tests resulted in a successful upgrade.
> 
> I also repeated the above tests from a full install to minimal and base 
> installs and, of course, the system is broken after such an upgrade. While it 
> is possible to check for the presence of clang or xinit to guess if the 
> requested upgrade is safe, I believe it would still only be a guess that 
> couldn't eliminate all the creative ways someone could break their 
> installation. If anyone has a suggestion for how to address this problem I am 
> willing to work on it and submit an updated patch.
> 
> Thank you to all the OpenBSD developers for the incredible work you do every 
> day on OpenBSD and for sharing your work.
> 
> James

*If* we want the ability to upgrade only some of the sets, it should be
done automatically for the sets that are currently installed, not with an
option. The latter is a sure way to end up with partial upgrades.

-Otto

> 
> 
> Index: sysupgrade.sh
> ===
> RCS file: /cvs/src/usr.sbin/sysupgrade/sysupgrade.sh,v
> retrieving revision 1.37
> diff -u -p -u -p -r1.37 sysupgrade.sh
> --- sysupgrade.sh 26 Jan 2020 22:08:36 -  1.37
> +++ sysupgrade.sh 30 Apr 2020 03:07:15 -
> @@ -34,7 +34,7 @@ ug_err()
>  
>  usage()
>  {
> - ug_err "usage: ${0##*/} [-fkn] [-r | -s] [installurl]"
> + ug_err "usage: ${0##*/} [-fkn] [-r | -s] [-x | -z] [installurl]"
>  }
>  
>  unpriv()
> @@ -78,14 +78,18 @@ SNAP=false
>  FORCE=false
>  KEEP=false
>  REBOOT=true
> +NOX11=false
> +MINIMAL=false
>  
> -while getopts fknrs arg; do
> +while getopts fknrsxz arg; do
>   case ${arg} in
>   f)  FORCE=true;;
>   k)  KEEP=true;;
>   n)  REBOOT=false;;
>   r)  RELEASE=true;;
>   s)  SNAP=true;;
> + x)  NOX11=true;;
> + z)  MINIMAL=true;;
>   *)  usage;;
>   esac
>  done
> @@ -96,6 +100,10 @@ if $RELEASE && $SNAP; then
>   usage
>  fi
>  
> +if $MINIMAL && $NOX11; then
> + usage
> +fi
> +
>  set -A _KERNV -- $(sysctl -n kern.version |
>   sed 's/^OpenBSD \([1-9][0-9]*\.[0-9]\)\([^ ]*\).*/\1 \2/;q')
>  
> @@ -152,9 +160,19 @@ if cmp -s /var/db/installed.SHA256 SHA25
>   exit 0
>  fi
>  
> +if $MINIMAL; then
> +# INSTALL.*, bsd*, base*, man*
> + SETS=$(sed -n -e 's/^SHA256 (\(.*\)) .*/\1/' \
> + -e '/^INSTALL\./p;/^bsd/p;/^base/p;/^man/p' SHA256)
> +elif $NOX11; then
> +# INSTALL.*, bsd*, *.tgz without x*
> + SETS=$(sed -n -e 's/^SHA256 (\(.*\)) .*/\1/' \
> + -e '/^INSTALL\./p;/^bsd/p;/^x/d;/\.tgz$/p' SHA256)
> +else
>  # INSTALL.*, bsd*, *.tgz
> -SETS=$(sed -n -e 's/^SHA256 (\(.*\)) .*/\1/' \
> --e '/^INSTALL\./p;/^bsd/p;/\.tgz$/p' SHA256)
> + SETS=$(sed -n -e 's/^SHA256 (\(.*\)) .*/\1/' \
> + -e '/^INSTALL\./p;/^bsd/p;/\.tgz$/p' SHA256)
> +fi
>  
>  OLD_FILES=$(ls)
>  OLD_FILES=$(rmel SHA256 $OLD_FILES)
> 
> 
> Index: sysupgrade.8
> ===
> RCS file: /cvs/src/usr.sbin/sysupgrade/sysupgrade.8,v
> retrieving revision 1.10
> diff -u -p -u -p -r1.10 sysupgrade.8
> --- sysupgrade.8  3 Oct 2019 12:43:58 -   1.10
> +++ sysupgrade.8  30 Apr 2020 03:07:30 -
> @@ -24,6 +24,7 @@
>  .Nm
>  .Op Fl fkn
>  .Op Fl r | s
> +.Op Fl x | z
>  .Op Ar installurl
>  .Sh DESCRIPTION
>  .Nm
> @@ -66,6 +67,16 @@ This is the default if the system is cur
>  .It Fl s
>  Upgrade to a snapshot.
>  This is the default if the system is currently running a snapshot.
> +.It Fl x
> +Perform an upgrade of the kernel and all sets except the X11 sets.
> +This option will render your system
> +.Sy unusable
> +if the current installation includes other sets.
> +.It Fl z
> +Perform an upgrade of the kernel and base and manual sets.
> +This option will render your system
> +.Sy unusable
> +if the current installation includes other sets.
>  .El
>  .Sh FILES
>  .Bl -tag -width "/auto_upgrade.conf" -compact
> 
> 



Re: Install: Invalid group _sndiop in #163 amd64

2020-04-29 Thread Otto Moerbeek
On Wed, Apr 29, 2020 at 11:58:55AM +, Oliver Marugg wrote:

> Yes, I made the mistake using this older snap for install (was on my usb
> stick) with consequences.
> With a today downloaded install.fs to USB Stick and http install no problem
> as you, Otto and Stuart mentioned. thx.
> Sorry, I really should know using latest -current only, but I learnt not
> using an older install.fs to install. Sorry for the noise.

Please always make sure you are running an installer that matches the
sets you are downloading. If not, bad things can happen. This case is
just one of the things that can go wrong.

-Otto

> 
> -- Originalnachricht --
> Von: "Alexandre Ratchov" 
> An: "Otto Moerbeek" 
> Cc: "Oliver Marugg" ; tech@openbsd.org
> Gesendet: 29.04.2020 13:53:46
> Betreff: Re: Install: Invalid group _sndiop in #163 amd64
> 
> > On Wed, Apr 29, 2020 at 01:26:51PM +0200, Otto Moerbeek wrote:
> > >  On Wed, Apr 29, 2020 at 01:02:29PM +0200, Otto Moerbeek wrote:
> > > 
> > >  > On Wed, Apr 29, 2020 at 10:28:37AM +, Oliver Marugg wrote:
> > >  >
> > >  > > Possible typo: group _sndiop should be _sndiod:
> > >  >
> > >  > Nope. _sndiop is the correct name. The real issue is that that group isn't in
> > >  > the /etc/group file used during install.
> > >  >
> > >  > How that can happen I do not know yet.
> > > 
> > >  Just did an install using install67.fs and did not see that error.
> > > 
> > 
> > There was a snap ~1-2 weeks ago with a broken bsd.rd (with /etc/group
> > missing _sndiop).
> > 
> > Now this is fixed.
> 



Re: Install: Invalid group _sndiop in #163 amd64

2020-04-29 Thread Otto Moerbeek
On Wed, Apr 29, 2020 at 01:02:29PM +0200, Otto Moerbeek wrote:

> On Wed, Apr 29, 2020 at 10:28:37AM +, Oliver Marugg wrote:
> 
> > Possible typo: group _sndiop should be _sndiod:
> 
> Nope. _sndiop is the correct name. The real issue is that that group isn't in
> the /etc/group file used during install.
> 
> How that can happen I do not know yet.

Just did an install using install67.fs and did not see that error.

> 
>   -Otto
> 
> > 
> > During fresh install from install.fs amd64 of OpenBSD 6.7-beta (GENERIC.MP)
> > #163: Tue Apr 28 21:35:13 MDT 2020
> > ...
> > Making all device nodes...chgrp: group is invalid: _sndiop
> > chgrp: group is invalid: _sndiop
> >  done.
> > 
> > Multiprocessor machine; using bsd.mp instead of bsd.
> > ...
> > 
> > -oliver
> > 
> 
> 
> 



Re: Install: Invalid group _sndiop in #163 amd64

2020-04-29 Thread Otto Moerbeek
On Wed, Apr 29, 2020 at 10:28:37AM +, Oliver Marugg wrote:

> Possible typo: group _sndiop should be _sndiod:

Nope. _sndiop is the correct name. The real issue is that that group isn't in
the /etc/group file used during install.

How that can happen I do not know yet.

-Otto

> 
> During fresh install from install.fs amd64 of OpenBSD 6.7-beta (GENERIC.MP)
> #163: Tue Apr 28 21:35:13 MDT 2020
> ...
> Making all device nodes...chgrp: group is invalid: _sndiop
> chgrp: group is invalid: _sndiop
>  done.
> 
> Multiprocessor machine; using bsd.mp instead of bsd.
> ...
> 
> -oliver
> 





Re: suggest to run rpki-client hourly

2020-04-17 Thread Otto Moerbeek
On Thu, Apr 16, 2020 at 05:18:15PM -0600, Theo de Raadt wrote:

> Job Snijders  wrote:
> 
> > In cases where rpki-client for some reason ends up taking longer than an
> > hour, the next execution attempt of the command will be skipped. Better
> > to just try again an hour later, this helps avoid concurrent rpki-client
> > processes crossing streams.
> 
> Agree.  As discussed privately rpki-client has safe output functions,
> but the parallel rsync input phase lacks collision prevention.
> 
> > I think 'once an hour' is a reasonable balance between the needs of
> > internet users (the ROAs creators who may depend urgently on an
> > expedient distribution of updated RPKI information); considerations for
> > what the Internet's CA infrastructure realisticly can support; and what
> > network operators are willing to tolerate in churn. We have to hold the
> > throttle open at the right position.
> 
> I agree we should try 1 hour.
> 
> > +#~ *   *   *   *   -s -n rpki-client -v && bgpctl reload
> 
> I would prefer if you use -ns rather than the two seperate options.
> 

ATM crontab -e validation after edit does not like that.

-Otto



Re: switch powerpc to MI mplock

2020-04-12 Thread Otto Moerbeek
On Fri, Apr 10, 2020 at 09:31:24AM +0200, Martin Pieuchot wrote:

> In order to reduce the differences with other architecture and to be able
> to use WITNESS on powerpc I'm proposing the diff below that makes use of
> the MI mp (ticket) lock implementation for powerpc.
> 
> This has been tested by Peter J. Philipp but I'd like to have more tests
> before proceeding.
> 
> As explained previously the pmap code, which is using a recursive
> spinlock to protect the hash, still uses the old lock implementation with
> this diff.
> 
> Please fire your MP macppc and report back :o)

Did a build on a dual cpu G4 with this diff without incident.

-Otto

> 
> Index: arch/powerpc/include/mplock.h
> ===
> RCS file: /cvs/src/sys/arch/powerpc/include/mplock.h,v
> retrieving revision 1.3
> diff -u -p -r1.3 mplock.h
> --- arch/powerpc/include/mplock.h 4 Dec 2017 09:51:03 -   1.3
> +++ arch/powerpc/include/mplock.h 9 Apr 2020 16:21:55 -
> @@ -27,25 +27,27 @@
>  #ifndef _POWERPC_MPLOCK_H_
>  #define _POWERPC_MPLOCK_H_
>  
> +#define __USE_MI_MPLOCK
> +
>  /*
>   * Really simple spinlock implementation with recursive capabilities.
>   * Correctness is paramount, no fancyness allowed.
>   */
>  
> -struct __mp_lock {
> +struct __ppc_lock {
>   volatile struct cpu_info *mpl_cpu;
>   volatile long   mpl_count;
>  };
>  
>  #ifndef _LOCORE
>  
> -void __mp_lock_init(struct __mp_lock *);
> -void __mp_lock(struct __mp_lock *);
> -void __mp_unlock(struct __mp_lock *);
> -int __mp_release_all(struct __mp_lock *);
> -int __mp_release_all_but_one(struct __mp_lock *);
> -void __mp_acquire_count(struct __mp_lock *, int);
> -int __mp_lock_held(struct __mp_lock *, struct cpu_info *);
> +void __ppc_lock_init(struct __ppc_lock *);
> +void __ppc_lock(struct __ppc_lock *);
> +void __ppc_unlock(struct __ppc_lock *);
> +int __ppc_release_all(struct __ppc_lock *);
> +int __ppc_release_all_but_one(struct __ppc_lock *);
> +void __ppc_acquire_count(struct __ppc_lock *, int);
> +int __ppc_lock_held(struct __ppc_lock *, struct cpu_info *);
>  
>  #endif
>  
> Index: arch/powerpc/powerpc/lock_machdep.c
> ===
> RCS file: /cvs/src/sys/arch/powerpc/powerpc/lock_machdep.c,v
> retrieving revision 1.8
> diff -u -p -r1.8 lock_machdep.c
> --- arch/powerpc/powerpc/lock_machdep.c   5 Mar 2020 09:28:31 -   
> 1.8
> +++ arch/powerpc/powerpc/lock_machdep.c   9 Apr 2020 16:21:01 -
> @@ -27,7 +27,7 @@
>  #include 
>  
>  void
> -__mp_lock_init(struct __mp_lock *lock)
> +__ppc_lock_init(struct __ppc_lock *lock)
>  {
>   lock->mpl_cpu = NULL;
>   lock->mpl_count = 0;
> @@ -43,7 +43,7 @@ extern int __mp_lock_spinout;
>  #endif
>  
>  static __inline void
> -__mp_lock_spin(struct __mp_lock *mpl)
> +__ppc_lock_spin(struct __ppc_lock *mpl)
>  {
>  #ifndef MP_LOCKDEBUG
>   while (mpl->mpl_count != 0)
> @@ -55,14 +55,14 @@ __mp_lock_spin(struct __mp_lock *mpl)
>   CPU_BUSY_CYCLE();
>  
>   if (nticks == 0) {
> - db_printf("__mp_lock(%p): lock spun out\n", mpl);
> + db_printf("__ppc_lock(%p): lock spun out\n", mpl);
>   db_enter();
>   }
>  #endif
>  }
>  
>  void
> -__mp_lock(struct __mp_lock *mpl)
> +__ppc_lock(struct __ppc_lock *mpl)
>  {
>   /*
>* Please notice that mpl_count gets incremented twice for the
> @@ -92,18 +92,18 @@ __mp_lock(struct __mp_lock *mpl)
>   }
>   ppc_intr_enable(s);
>  
> - __mp_lock_spin(mpl);
> + __ppc_lock_spin(mpl);
>   }
>  }
>  
>  void
> -__mp_unlock(struct __mp_lock *mpl)
> +__ppc_unlock(struct __ppc_lock *mpl)
>  {
>   int s;
>  
>  #ifdef MP_LOCKDEBUG
>   if (mpl->mpl_cpu != curcpu()) {
> - db_printf("__mp_unlock(%p): not held lock\n", mpl);
> + db_printf("__ppc_unlock(%p): not held lock\n", mpl);
>   db_enter();
>   }
>  #endif
> @@ -118,14 +118,14 @@ __mp_unlock(struct __mp_lock *mpl)
>  }
>  
>  int
> -__mp_release_all(struct __mp_lock *mpl)
> +__ppc_release_all(struct __ppc_lock *mpl)
>  {
>   int rv = mpl->mpl_count - 1;
>   int s;
>  
>  #ifdef MP_LOCKDEBUG
>   if (mpl->mpl_cpu != curcpu()) {
> - db_printf("__mp_release_all(%p): not held lock\n", mpl);
> + db_printf("__ppc_release_all(%p): not held lock\n", mpl);
>   db_enter();
>   }
>  #endif
> @@ -140,13 +140,13 @@ __mp_release_all(struct __mp_lock *mpl)
>  }
>  
>  int
> -__mp_release_all_but_one(struct __mp_lock *mpl)
> +__ppc_release_all_but_one(struct __ppc_lock *mpl)
>  {
>   int rv = mpl->mpl_count - 2;
>  
>  #ifdef MP_LOCKDEBUG
>   if (mpl->mpl_cpu != curcpu()) {
> - db_printf("__mp_release_all_but_one(%p): not held lock\n", mpl);
> + db_printf("__ppc_release_all_but_one(%p): not held lock\n", 
> mpl);
>   

Re: ntpd: prevent duplicate definitions of `conf` and `ibuf_dns`

2020-04-11 Thread Otto Moerbeek
On Sat, Apr 04, 2020 at 04:00:50PM -0700, Michael Forney wrote:

> This prevents a linking error with gcc 10, which enables -fno-common
> by default.
> 
> ISO C requires exactly one definition of objects with external
> linkage throughout the entire program.
> 
> `conf` is already defined in ntpd.c and declared extern in ntpd.h,
> so the definition in parse.y is redundant.
> 
> The two definitions of `ibuf_dns` are distinct and local to their
> respective files, so make them static.
> ---
> It looks like while the ibuf_dns variables are distinct, only one
> or the other is used (one through ntp_main() and the other through
> ntp_dns()). So, an alternative is to add an extern declaration in
> ntpd.h.
> 
> I noticed that there are quite a few places where static could be
> used, but isn't, so I'm not sure which approach is preferred.

Thanks, committed. Opinions differ on static for non-library symbols.
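
For readers unfamiliar with the -fno-common issue described in the quoted
commit message, here is a minimal single-file sketch in C of the three
kinds of declaration involved. The struct and function names
(conf_sketch, globals_start_null) are made up for illustration and are not
ntpd code; in a real program the extern line would sit in a shared header
and the definition in exactly one .c file.

```c
#include <assert.h>
#include <stddef.h>

struct conf_sketch {
	int debug;
};

/* Header part: a declaration only, safe to include from every file. */
extern struct conf_sketch *conf;

/* Exactly one source file provides the definition. */
struct conf_sketch *conf;

/*
 * Internal linkage: another file may have its own, unrelated ibuf_dns
 * without the linker ever seeing two definitions.
 */
static int *ibuf_dns;

/* Objects with static storage duration start out as null pointers. */
static int
globals_start_null(void)
{
	return conf == NULL && ibuf_dns == NULL;
}
```

Without the extern or static keyword, each file that says
"struct conf_sketch *conf;" creates a tentative definition, and with
-fno-common the linker reports those as duplicates.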

-Otto

> 
>  usr.sbin/ntpd/ntp.c | 2 +-
>  usr.sbin/ntpd/ntp_dns.c | 2 +-
>  usr.sbin/ntpd/parse.y   | 1 -
>  3 files changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/usr.sbin/ntpd/ntp.c b/usr.sbin/ntpd/ntp.c
> index ea9a4e92274..9cb74f1d8da 100644
> --- a/usr.sbin/ntpd/ntp.c
> +++ b/usr.sbin/ntpd/ntp.c
> @@ -42,7 +42,7 @@
>  
>  volatile sig_atomic_t ntp_quit = 0;
>  struct imsgbuf   *ibuf_main;
> -struct imsgbuf   *ibuf_dns;
> +static struct imsgbuf*ibuf_dns;
>  struct ntpd_conf *conf;
>  struct ctl_conns  ctl_conns;
>  u_int peer_cnt;
> diff --git a/usr.sbin/ntpd/ntp_dns.c b/usr.sbin/ntpd/ntp_dns.c
> index 2e1a978338a..2dbd79dada6 100644
> --- a/usr.sbin/ntpd/ntp_dns.c
> +++ b/usr.sbin/ntpd/ntp_dns.c
> @@ -39,7 +39,7 @@
>  #include "ntpd.h"
>  
>  volatile sig_atomic_t quit_dns = 0;
> -struct imsgbuf   *ibuf_dns;
> +static struct imsgbuf*ibuf_dns;
>  
>  void sighdlr_dns(int);
>  int  dns_dispatch_imsg(struct ntpd_conf *);
> diff --git a/usr.sbin/ntpd/parse.y b/usr.sbin/ntpd/parse.y
> index 8d7ab09de34..533f67f1b8f 100644
> --- a/usr.sbin/ntpd/parse.y
> +++ b/usr.sbin/ntpd/parse.y
> @@ -57,7 +57,6 @@ int  lgetc(int);
>  int   lungetc(int);
>  int   findeol(void);
>  
> -struct ntpd_conf *conf;
>  struct sockaddr_in   query_addr4;
>  struct sockaddr_in6   query_addr6;
>  int   poolseqnum;
> -- 
> 2.26.0
> 



Re: Include /var/www/tmp into base install

2020-04-08 Thread Otto Moerbeek
On Wed, Apr 08, 2020 at 11:08:41AM +0100, Kevin Chadwick wrote:

> On 2020-04-07 17:12, Andrew Grillet wrote:
> > For me, the "/var is full" problem can be adequately mitigated by mounting
> > a separate partition as /var/tmp.
> 
> Does FFS2 have the same disklabel limit on partitions? I guess they are 
> unrelated.

Unrelated.

> 
> Sometimes users may decide which mount points to edit out during install and
> /var/tmp gives one more for them to understand if it's a problem moving to 
> /var.
> 
> Creating /var/tmp is actually a simpler consideration than removing an OS
> provided /var/tmp
> 
> On web servers, I have /var/www and /var/www/bin as well as others on mount
> points so e.g. /var/www is noexec and optionally read-only. /var/www/tmp is
> sometimes mfs.
> 
> That many mount points obviously doesn't fit so well generically but 
> permissive
> permissions if more mount points were available, might work.
> 
> I also wonder why /var/log is not on it's own partition by default. I almost
> always create it. I guess for smaller disks, more mount points is a pain?
> 
> 
> > More of an issue, although obviously not major - if there are a large
> > number of tmp directories, is making sure that they are all
> > routinely purged. Yes, I know this is down to careless admin practice, but
> > it happened to me earlier this year.
> 
> A smaller partition would actually have less inodes by default ffs settings.
> Something to consider. No idea if/how ffs2 changes that?

With default parameters an FFS2 filesystem will have almost the same
number of inodes as an FFS1 filesystem. Note that disklabel instructs
newfs (via the fsize/bsize fields in the label) to use larger block
sizes for larger partitions, resulting in fewer inodes relative to
size. But within the same size class it's a linear relation.
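
A rough sketch of that relation in C. It assumes the newfs default of one
inode per four fragments (4 * fsize bytes of data space); that default is
an assumption on my part and is not stated in this thread, and
approx_inodes() is a made-up helper, not part of newfs. It also ignores
cylinder group overhead, so treat the numbers as estimates only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical estimate: number of inodes for a partition of part_bytes
 * bytes, assuming one inode per 4 fragments of fsize bytes each.
 */
static uint64_t
approx_inodes(uint64_t part_bytes, uint32_t fsize)
{
	assert(fsize > 0);
	return part_bytes / (4ULL * fsize);
}
```

With the same fsize, doubling the partition doubles the estimate (linear
within a size class); doubling fsize for a larger size class halves the
number of inodes per byte of disk.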

-Otto 



Re: Regarding the understanding of the malloc(3) code

2020-03-27 Thread Otto Moerbeek
On Fri, Mar 27, 2020 at 02:21:44PM +0530, Neeraj Pal wrote:

> On Wed, Mar 25, 2020 at 2:06 AM Otto Moerbeek  wrote:
> 
> > pp points to a page of chunks
> > bp points to the associated meta info: a bitmap that says which chunks
> > in the page are free. The bitmap is an array of shorts, so 16 bits per
> > entry.
> >
> 
>  per entry means for our case bits[1], so only one entry?
> 
> in the code k is first is the chunk number, and then multiplied (by
> > shifting it by bp->shift) to get the byte offset of the chunk inside
> > the chunk page.
> >
> 
> Okay, so, here, we have 3 things:
> 1. pp is the page
> 2. k is the chunknum, before shiting, which means k is the index for the
> chunk in the page pp.
> 3. After shifting, k becomes the byte offset of the chunk inside the pp
> page.
> 
> p->bits is a bit mask. Each short in it holds 16 bits, so the first 16
> > chunks end up in the first short, the next 16 in the 2nd short, etc.
> >
> > The *lp ^= 1 << k line actually flips the bit.
> >
> 
> Okay, 16 chunks because each chunk has different bit and a total of 16 bits
> represents 16 different chunks.
> 
> So, as per the init_chunk_info() function. for the bitmap operations given
> below:
> 840
> 841 /* set all valid bits in the bitmap */
> 842 i = p->total - 1;
> 843 memset(p->bits, 0xff, sizeof(p->bits[0]) * (i / MALLOC_BITS));
> 844 p->bits[i / MALLOC_BITS] = (2U << (i % MALLOC_BITS)) - 1;
> 845 }
> 
> Here, it first calculates the i, which is 256 - 1, that is, 255 or in other
> words 0xff.
> Then, the memset(3) writes the len bytes of "0xff" to p->bits.
> 
> Here, len is sizeof(p-bits[0]) * (i / MALLOC_BITS), where
> sizeof(p->bits[0]) = 2bytes and (i / MALLOC_BYTES) is 255 / 16, that is,
> 15. And total it becomes, 2 * 15 = 30bytes.

For chunk size 256, there will indeed be 16 chunks in a page. i will
*not* be 255 in that case, but 15.  There is no such thing as
MALLOC_BYTES; the macro is MALLOC_BITS, which is 16. Since 15 / 16 is 0,
the memset becomes memset(p->bits, 0xff, 0), a no-op, and the line below
it sets p->bits[0] to (2U << 15) - 1 = 0xffff. So all 16 bits needed are
set to 1.

The debug session below is for chunk size 16, so the numbers are different.
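
To check the arithmetic without a debugger, the two bitmap lines quoted
from init_chunk_info() can be lifted into a small C sketch. init_bitmap()
and PAGE_BYTES are names made up for this example, a 4096 byte page is
assumed, and the rest of struct chunk_info is left out.

```c
#include <assert.h>
#include <string.h>

#define MALLOC_BITS	16	/* bits in one u_short bitmap entry */
#define PAGE_BYTES	4096	/* assumed page size */

/*
 * Same two statements as in the quoted init_chunk_info() snippet.
 * Returns the number of bytes memset() fills. The caller must provide
 * room for (total - 1) / MALLOC_BITS + 1 entries.
 */
static size_t
init_bitmap(unsigned short *bits, int total)
{
	int i = total - 1;
	size_t len = sizeof(bits[0]) * (i / MALLOC_BITS);

	assert(total > 0);
	memset(bits, 0xff, len);
	bits[i / MALLOC_BITS] = (2U << (i % MALLOC_BITS)) - 1;
	return len;
}
```

With total = PAGE_BYTES / 256 = 16 the memset length is 0 and only bits[0]
is written, with value 0xffff. With total = PAGE_BYTES / 16 = 256 the
memset fills 30 bytes (bits[0] to bits[14]) and the last statement writes
bits[15].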

> 
> So, it copies 30 bytes of 0xff to p->bits. But here, the main confusion
> lies, like the bits[1] is of type u_short and as we know the sizeof u_short
> is 2bytes but we are copying the 30 bytes through memset(3).
> 
> Then, in the next line, it is again making the 2 bytes to 0xff through
> calculating the last index of bit array. That is,
> 
> p->bits[255 / 16] = (2U << (255 % 16)) - 1
> p->bits[15] = 65536 - 1 = 65535, which is 0xffff.
> 
> So, after overall calculations, it is copying the 30 bytes over to the
> sizeof(u_short), which is 2 bytes.
> 
> Below are the observations from the debugger:
> 
> openbsd# LD_PRELOAD=/usr/src/lib/libc/obj/libc.so.95.1 gdb -q sample
> (gdb) br main
> Breakpoint 1 at 0x1363: file sample.c, line 7.
> (gdb) r 12345
> Starting program: /root/test/sample 12345
> Breakpoint 1 at 0xfa68fc8a363: file sample.c, line 7.
> Error while reading shared library symbols:
> Dwarf Error: wrong version in compilation unit header (is 4, should be 2)
> [in module /usr/libexec/ld.so]
> 
> Breakpoint 1, main (argc=2, argv=0x7f7ea018) at sample.c:7
> 7 char *buff1, *buff2 = NULL;
> Current language:  auto; currently minimal
> (gdb) s
> 8 buff1 = (char *)malloc(8);
> (gdb) s
> malloc (size=8) at /usr/src/lib/libc/stdlib/malloc.c:1293
> 1293 int saved_errno = errno;
> (gdb) br init_chunk_info
> Breakpoint 2 at 0xfa951dafa20: file /usr/src/lib/libc/stdlib/malloc.c, line
> 832.
> (gdb) c
> Continuing.
> 
> Breakpoint 2, init_chunk_info (d=0xfa9011de110, p=0xfa910ecedb0, bits=4)
> at /usr/src/lib/libc/stdlib/malloc.c:832
> 832 if (bits == 0) {
> (gdb) n
> 838 p->shift = bits;
> (gdb)
> 839 p->total = p->free = MALLOC_PAGESIZE >> p->shift;
> (gdb)
> 840 p->size = 1U << bits;
> (gdb)
> 841 p->offset = howmany(p->total, MALLOC_BITS);
> (gdb)
> 843 p->canary = (u_short)d->canary1;
> (gdb)
> 846 i = p->total - 1;
> (gdb)
> 848 memset(p->bits, 0xff, sizeof(p->bits[0]) * (i / MALLOC_BITS));
> (gdb) x/30wx p->bits
> 0xfa910ecedd4: 0x 0x 0x 0x
> 0xfa910ecede4: 0x 0x 0x 0x
> 0xfa910ecedf4: 0x 0x 0x 0x
> 0xfa910ecee04: 0x 0x 0x 0x
> 0xfa910ecee14: 0x 0x 0x 0x
> 0xfa910ecee24: 0x 0x

Re: Regarding the understanding of the malloc(3) code

2020-03-24 Thread Otto Moerbeek
On Wed, Mar 25, 2020 at 01:54:51AM +0530, Neeraj Pal wrote:

> Hi Otto,
> 
> I am having two small issues or confusions:
> 
> First Query:
> 
>  885 /*
>  886  * Allocate a page of chunks
>  887  */
>  888 static struct chunk_info *
>  889 omalloc_make_chunks(struct dir_info *d, int bits, int listnum)
>  890 {
>  891 struct chunk_info *bp;
>  892 void *pp;
>  893
>  894 /* Allocate a new bucket */
>  895 pp = map(d, NULL, MALLOC_PAGESIZE, 0);
>  896 if (pp == MAP_FAILED)
>  897 return NULL;
>  898
>  899 /* memory protect the page allocated in the malloc(0) case */
>  900 if (bits == 0 && mprotect(pp, MALLOC_PAGESIZE, PROT_NONE) == -1)
>  901 goto err;
>  902
>  903 bp = alloc_chunk_info(d, bits);
>  904 if (bp == NULL)
>  905 goto err;
>  906 bp->page = pp;
>  907
>  908 if (insert(d, (void *)((uintptr_t)pp | (bits + 1)), (uintptr_t)bp,
>  909 NULL))
>  910 goto err;
>  911 LIST_INSERT_HEAD(&d->chunk_dir[bits][listnum], bp, entries);
>  912 return bp;
>  913
>  914 err:
>  915 unmap(d, pp, MALLOC_PAGESIZE, 0, d->malloc_junk);
>  916 return NULL;
>  917 }
> 
> So, actually, as per the comment on line no. 885 in the above code, it is
> mentioned that it will allocate a page of chunks. But, what I have observed
> as from the code that pp is the new bucket means pp is the page which is
> full of chunks or which has chunks. As we can see on line 905, it calls
> alloc_chunk_info() function then inside that it calls init_chunk_info(),
> so, in short, we can say that first, it will allocate some chunk and then
> initialized that allocated chunk and then returns the same.

pp points to a page of chunks.
bp points to the associated meta info: a bitmap that says which chunks
in the page are free. The bitmap is an array of shorts, so 16 bits per entry.

> 
> So, if we compare from here then it means, bp is the allocated and
> initialized chunk and the bp->page = pp, means it stores the page pp to
> bp->page. Then, after the hash table, it returns bp, means it returns the
> allocated-initialized chunk. But at the same time, I was referring the
> https://junk.tintagel.pl/openbsd-daily-malloc-3.txt by @mulander where he
> mentioned that "so bp is a page of chunks". So, I became little confused
> because pp is the page of chunks, which is used in function malloc_bytes()
> where it calculates the page offset and adds it to page bp->page, which is
> used by the user to input or writes stuff, like it returns address bp->page
> + k addr returns by malloc(3).

Again, bp is the meta info; bp->page is the page of chunks itself.

> 
> Second Query:
> 
> And, bp->page is the pp and k is the offset, so, is it possible to get the
> address of the specific chunk because (bp->page + k) belongs to some chunks
> or we can say k is the index of the chunk that is inside the bucket
> bp->page or pp?

In the code, k is first the chunk number, and is then multiplied (by
shifting it by bp->shift) to get the byte offset of the chunk inside
the chunk page.
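
The chunk number to address step can be sketched in C as follows.
chunk_addr() and chunk_num() are hypothetical helpers written for this
example; malloc.c does the same arithmetic inline rather than through
functions with these names.

```c
#include <assert.h>
#include <stddef.h>

/* Address of chunk number k in a chunk page; chunk size is 1 << shift. */
static void *
chunk_addr(void *page, size_t k, unsigned int shift)
{
	return (char *)page + (k << shift);
}

/* Inverse direction: which chunk does a pointer into the page belong to? */
static size_t
chunk_num(void *page, void *ptr, unsigned int shift)
{
	assert((char *)ptr >= (char *)page);
	return (size_t)((char *)ptr - (char *)page) >> shift;
}
```

For 16 byte chunks (shift 4), chunk 5 starts at byte offset 80 in the
page, and any pointer from offset 80 to 95 maps back to chunk 5.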

> 
> 
> And in the structure chunk_info, u_short bits[1] is the bit for tracking
> whether the chunk is free or not. So, it belongs to each and every chunk.
> For example, there are 10 chunks in a page and 5fth chunk is not free then
> it will set that bits[1] to 0 and other 9 will be 1.
> 
> OR
> 
> Is it like bits denote the bits, like in the function init_chunk_info, in
> the end, it copies 0xff bytes to p->bits with size 30. Then it calculates
> p->bits[15] = 65535, so it is like it makes the last bit to 1.

p->bits is a bit mask. Each short in it holds 16 bits, so the first 16
chunks end up in the first short, the next 16 in the 2nd short, etc.

The *lp ^= 1 << k line actually flips the bit.
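
A simplified sketch in C of how such a bitmap is used. A bit that is 1
means the chunk is free, so XOR with a bit known to be 1 clears it and
marks the chunk as in use. claim_chunk() and release_chunk() are names
invented here; this version always takes the first free chunk and does
not try to reproduce the search order used by the real malloc_bytes().

```c
#include <assert.h>

#define MALLOC_BITS	16	/* bits per u_short bitmap entry */

/* Claim the first free chunk; returns its chunk number, or -1 if full. */
static int
claim_chunk(unsigned short *bits, int nshorts)
{
	int i, k;

	for (i = 0; i < nshorts; i++) {
		if (bits[i] == 0)
			continue;	/* all 16 chunks of this short in use */
		for (k = 0; k < MALLOC_BITS; k++)
			if (bits[i] & (1U << k))
				break;
		bits[i] ^= 1U << k;	/* bit was 1 (free), becomes 0 */
		return i * MALLOC_BITS + k;
	}
	return -1;
}

/* Mark a chunk as free again by setting its bit. */
static void
release_chunk(unsigned short *bits, int chunknum)
{
	assert(chunknum >= 0);
	bits[chunknum / MALLOC_BITS] |= 1U << (chunknum % MALLOC_BITS);
}
```

Chunk number n lives in bits[n / 16] at bit position n % 16, which is why
bits[15] is a perfectly normal entry for a page holding 256 chunks.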

> sometimes I also have things in my mind like if bits[1] then how it is
> possible to assignbits[15] it means it performs the operation on bits. that
> I have analyzed by debugging.
> 
> I have referred the paper for the understanding of bits[1] value,
> http://www.ouah.org/BSD-heap-smashing.txt. Actually not have proper
> confidence of understanding on bits logic.
> 
> Apart from the issues discussed above, I mostly understood most of the
> stuff on malloc(3) but for the above still not getting the convinsible
> understainding.
> 
> 
> Regards,
> Neeraj



bootblocks ffs2 support for sparc64

2020-03-24 Thread Otto Moerbeek


Hi,

As some of you know I have been working on the ability to boot from an
ffs2 root partition on as many platforms as possible. Many platforms
are done, but sparc64, landisk, octeon and luna88k remain.

sparc64 uses a bootblock written in Forth that interprets the filesystem
on the boot disk.

This diff takes the bootblk changes netbsd did to support ffs2 and
merges it with the softraid changes stsp@ did a few years back. Result
is bootblocks that can load the 2nd stage bootloader from ffs1, ffs2
and softraid.

Please test that it does not break your current setup with either ffs1 or
softraid. The procedure is:

1. Make sure you have the very recent fgen changes

2. Build and install in sys/arch/sparc64/stand/bootblk

3. installboot  (for softraid setups this is not the
physical disk, but the softraid).

4. Reboot and make sure you are using the right version of the
bootblocks: 2.0.

Note that this does not actually enable ffs2 for installs. That comes
later, first I need to make sure I did not break existing setups.

-Otto

Index: sys/arch/sparc64/stand/bootblk/Makefile
===
RCS file: /cvs/src/sys/arch/sparc64/stand/bootblk/Makefile,v
retrieving revision 1.14
diff -u -p -r1.14 Makefile
--- sys/arch/sparc64/stand/bootblk/Makefile 17 Oct 2017 19:31:56 -  
1.14
+++ sys/arch/sparc64/stand/bootblk/Makefile 24 Mar 2020 07:33:45 -
@@ -8,8 +8,8 @@ S=  ${CURDIR}/../../../..
 # Override normal settings
 #
 
-CLEANFILES=assym.fth.h assym.fth.h.tmp machine \
-   bootblk bootblk.text bootblk.text.tmp
+CLEANFILES=machine ffs.fth.h \
+   bootblk bootblk.text bootblk.text.tmp -.d
 
 NOMAN=
 STRIPFLAG=
@@ -26,17 +26,17 @@ CPPFLAGS=   ${INCLUDES} ${IDENT} ${PARAM}
 
 all: bootblk.text bootblk
 
-assym.fth.h: ${.CURDIR}/genassym.sh genfth.cf
-   sh ${.CURDIR}/genassym.sh ${CC} ${CFLAGS} \
-   ${CPPFLAGS} ${PROF} <${.CURDIR}/genfth.cf >assym.fth.h.tmp && \
-   mv -f assym.fth.h.tmp assym.fth.h
-
-bootblk.text: bootblk.fth assym.fth.h
-   awk '/fload/ { file=$$2; while ((ret = getline "/dev/stderr"; next 
}; !/fload/' \
-   ${.CURDIR}/bootblk.fth >bootblk.text.tmp && \
+ffs.fth.h: ${.CURDIR}/genassym.sh genfth.cf
+   sh ${.CURDIR}/genassym.sh -f ${CC} ${CFLAGS} ${CPPFLAGS} ${PROF} \
+   < ${.CURDIR}/genfth.cf >ffs.fth.h.tmp && \
+   mv -f ffs.fth.h.tmp ffs.fth.h
+
+bootblk.text: bootblk.fth ffs.fth.h
+   awk '/fload/ { print "#include \"" $$2 "\"" }; !/fload/' \
+   ${.CURDIR}/bootblk.fth | /usr/bin/cpp -P > bootblk.text.tmp && \
mv -f bootblk.text.tmp bootblk.text
 
-bootblk: bootblk.fth assym.fth.h
+bootblk: bootblk.fth ffs.fth.h
fgen -o bootblk ${.CURDIR}/bootblk.fth
 
 beforeinstall:
Index: sys/arch/sparc64/stand/bootblk/bootblk.fth
===
RCS file: /cvs/src/sys/arch/sparc64/stand/bootblk/bootblk.fth,v
retrieving revision 1.8
diff -u -p -r1.8 bootblk.fth
--- sys/arch/sparc64/stand/bootblk/bootblk.fth  26 Nov 2014 19:57:41 -  
1.8
+++ sys/arch/sparc64/stand/bootblk/bootblk.fth  24 Mar 2020 07:33:45 -
@@ -1,12 +1,12 @@
-\  $OpenBSD: bootblk.fth,v 1.8 2014/11/26 19:57:41 stsp Exp $
-\  $NetBSD: bootblk.fth,v 1.3 2001/08/15 20:10:24 eeh Exp $
+\  $OpenBSD$
+\  $NetBSD: bootblk.fth,v 1.15 2015/08/20 05:40:08 dholland Exp $
 \
 \  IEEE 1275 Open Firmware Boot Block
 \
 \  Parses disklabel and UFS and loads the file called `ofwboot'
 \
 \
-\  Copyright (c) 1998 Eduardo Horvath.
+\  Copyright (c) 1998-2010 Eduardo Horvath.
 \  All rights reserved.
 \
 \  Redistribution and use in source and binary forms, with or without
@@ -17,11 +17,6 @@
 \  2. Redistributions in binary form must reproduce the above copyright
 \ notice, this list of conditions and the following disclaimer in the
 \ documentation and/or other materials provided with the distribution.
-\  3. All advertising materials mentioning features or use of this software
-\ must display the following acknowledgement:
-\   This product includes software developed by Eduardo Horvath.
-\  4. The name of the author may not be used to endorse or promote products
-\ derived from this software without specific prior written permission
 \
 \  THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
 \  IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 
WARRANTIES
@@ -41,6 +36,8 @@ headers
 
 false value boot-debug?
 
+: KB d# 1024 * ;
+
 \
 \ First some housekeeping:  Open /chosen and set up vectors into
 \  client-services
@@ -61,15 +58,15 @@ defer cif-seek ( low high ihandle -- -1|
 \ defer cif-peer ( phandle -- phandle )
 \ defer cif-getprop ( len adr cstr phandle -- )
 
-: find-cif-method ( method,len -- xf )
+: find-cif-method ( method len -- xf )
cif-phandle 

Re: Regarding the understanding of the malloc(3) code

2020-03-18 Thread Otto Moerbeek
On Wed, Mar 18, 2020 at 03:35:45PM +0530, Neeraj Pal wrote:

> On Wed, 18 Mar, 2020, 12:46 pm Otto Moerbeek,  wrote:
> 
> > There are several types of canaries. They try to detect corruption of
> > various meta data structures. There are also canaries for user allocated
> > data, they are enabled with the C option.
> >
> Yeah, I am using option C through sysctl(8) to understand the Canary
> concept. And understood the user controlled part, that validates during
> free(3).
> 
> But I was thinking the idea behind the writing the canary checks, I know
> canary is something means random cookie which we usually places to detect
> overflow/underflow related vulns. So, I thought if there is now way one can
> corrupt the metadata then is it possible to remove them as may be it will
> improve some performance. But I don't have the idea or main reason for the
> same. So I maybe wrong. So, that's why I asked.

Not all meta-data canaries live in r/o memory.

> 
> >
> > In general addding an int to a pointer calculates an offset, so yes.
> >
> Yeah, I understood.
> 
> Study whats the role of p and k is. Let the code speak. If you fail to
> > understand parts, study further and play with it. You'll learn more
> > from that than asking for confirmation all the time.
> 
> Yeah sure. I understood basic idea about those calculations for p and k.
> That, p is page, here, and k is the offset for the chunk on the page.  I
> think the whole calculations related to that. But due to lots of
> mathematical operations not able to understood some parts,maybe I have to
> read and understand it again and again.
> 
> Actually, after compiling libc with debug symbols, I have written one basic
> sample code and debugging it though gdb and reading the source code side by
> side.
> 
> I am daily learning something new from reading the malloc(3) code. But
> sometimes I am not able to relate or match those thoughts that I got from
> reading codes with the thoughts of the developer that he has while
> development. So that's why I have asked about my understanding from
> developer's point of view.
> 
> Thank you for resolving my queries :)
> 
> Regards
> Neeraj

A thing that also helps is to follow the cvs history of a file. The
first version of my malloc (from 2008) was simpler, and looking at
the diffs through the years gives you great hints at what features
were added over the years, plus a few bugfixes.

-Otto



Re: Regarding the understanding of the malloc(3) code

2020-03-18 Thread Otto Moerbeek
On Wed, Mar 18, 2020 at 07:29:51AM +0530, Neeraj Pal wrote:

> On Fri, Mar 13, 2020, at 11:45 AM Otto Moerbeek  wrote:
> >
> > Please indent your code snippets.
> yeah, my apologies. I shall indent the code snippets.
> 
> >
> > di_info is special. Having a guard page on both sides for regular
> > allocation can be done, but would waste more pages. Note that
> > allocations are already spread throughout the address space, so it is
> > very likely that an allocation is surrounded by unmapped pages.
> >
> Okay, So, as from omalloc_poolinit() code it is not there, but we can do
> but it will be wastage of one-page memory and also entire address space is
> spread with unmapped pages. So, it is very likely that there will some
> unmapped page beside dir_info in the memory.
> 
> >
> > We need two pages to store dir_info.
> >
> >
> > > Now, MMAPNONE maps up to len (8192 + (4096 * 2)) = 16384
> >
> > We allocate 4 pages prot none.
> >
> > > then, mprotecting the pages through p + MALLOC_PAGESIZE + DIR_INFO_RSZ
> - 1
> >
> > the two middle pages are r/w.
> >
> > > d_avail = (8192 - 4824) >> 4 = 3368 >> 4 = 210
> > >
> > > Now, d = (p + MALLOC_PAGESIZE + (random_no_under_210 << 4)
> >
> > di_info ends up on an aligned address somewhere in the middle pages on
> > an offset between 0 and (210<<4) = 0..3360, counting from the start of
> > the two middle pages.
> Thank you for the information. So, it is like [(p + MALLOC_PAGESIZE) +
> (0..3360)]. It is kind of like array, For example, let's suppose, x = p +
> MALLOC_PAGESIZE. So, it will be x[0..3360].
> Am I right?
> >
> > MALLOC_CHUNK_LISTS could be increased at the cost of overhead.
> > MALLOC_MAXSHIFT cannot, it is the shift of the max chunk size that fits in
> > a page.
> Okay. I understood. And, yeah more randomization means more cost overhead.
> 
> Thank you, Otto, for your detailed information.
> 
> Please find the code below:
> 
> 948   /*
> 949    * Allocate a chunk
> 950    */
> 951   static void *
> 952   malloc_bytes(struct dir_info *d, size_t size, void *f)
> 953   {
> 954   u_int i, r;
> 955   int j, listnum;
> 956   size_t k;
> 957   u_short *lp;
> 958   struct chunk_info *bp;
> 959   void *p;
> 960   
> 961   if (mopts.malloc_canary != (d->canary1 ^ (u_int32_t)(uintptr_t)d) ||
> 962   d->canary1 != ~d->canary2)
> 963   wrterror(d, "internal struct corrupt");
> 964   
> 965   j = find_chunksize(size);
> 966   
> 967   r = ((u_int)getrbyte(d) << 8) | getrbyte(d);
> 968   listnum = r % MALLOC_CHUNK_LISTS;
> 969   /* If it's empty, make a page more of that size chunks */
> 970   if ((bp = LIST_FIRST(&d->chunk_dir[j][listnum])) == NULL) {
> 971   bp = omalloc_make_chunks(d, j, listnum);
> 972   if (bp == NULL)
> 973   return NULL;
> 974   }
> 975   
> 976   if (bp->canary != (u_short)d->canary1)
> 977   wrterror(d, "chunk info corrupted");
> 
> 
> Here, in the code mentioned above, we can see that on line 961 and
> line 976. I don't understand why it is checking for
> canaries of malloc_readonly with d and then allocated chunk bp with d,
> because I have seen that validation of canary
> happens in free(3) not in malloc(3). So, it is like there may be some
> cases where one can corrupt these also??

There are several types of canaries. They try to detect corruption of
various meta data structures. There are also canaries for user allocated
data; they are enabled with the C option.
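
As an illustration of the meta data check on the quoted lines 961-962,
here is a minimal stand-alone sketch. The struct toy_dir and the
functions toy_seal() and toy_check() are invented for this example and
are not part of malloc.c; only the two comparisons mirror the quoted
code. The idea is that the first canary is tied to both a secret and the
address of the structure, and the second canary is the bitwise
complement of the first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the canary fields of struct dir_info. */
struct toy_dir {
	uint32_t canary1;
	uint32_t canary2;
};

/* Seal d so that it matches the secret at its current address. */
static void
toy_seal(struct toy_dir *d, uint32_t secret)
{
	assert(d != NULL);
	d->canary1 = secret ^ (uint32_t)(uintptr_t)d;
	d->canary2 = ~d->canary1;
}

/* Return 1 if both canaries are intact, 0 if the struct looks corrupt. */
static int
toy_check(const struct toy_dir *d, uint32_t secret)
{
	if (secret != (d->canary1 ^ (uint32_t)(uintptr_t)d) ||
	    d->canary1 != ~d->canary2)
		return 0;
	return 1;
}
```

In this sketch, flipping a single bit in either canary makes toy_check()
return 0, which corresponds to the point where the quoted code calls
wrterror().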

> 
> 
> 978   
> 979   i = (r / MALLOC_CHUNK_LISTS) & (bp->total - 1);
> 980   
> 981   /* start somewhere in a short */
> 982   lp = &bp->bits[i / MALLOC_BITS];
> 983   if (*lp) {
> 984   j = i % MALLOC_BITS;
> 985   k = ffs(*lp >> j);
> 986   if (k != 0) {
> 987   k += j - 1;
> 988   goto found;
> 989   }
> 990   }
> 991   /* no bit halfway, go to next full short */
> 992   i /= MALLOC_BITS;
> 993   for (;;) {
> 994   if (++i >= bp->total / MALLOC_BITS)
> 995   i = 0;
> 996   lp = &bp->bits[i];
> 997   if (*lp) {
> 998   k = ffs(*lp) - 1;
> 999   break;
> 

Re: Regarding the understanding of the malloc(3) code

2020-03-13 Thread Otto Moerbeek
On Fri, Mar 13, 2020 at 03:43:21AM +0530, Neeraj Pal wrote:

> On Tue, Mar 10, 2020 at 4:03 PM Otto Moerbeek  wrote:
> > There's an off by one in your question :-)
> Yeah, sorry about that, actually in flow of writing the mail forgot to notice.
> 
> > For single-threaded programs, two malloc_dir pools are maintained.
> > One for MAP_CONCEALED memory (#0) and one for regular (#1).
> > For multi-threaded programs more pools are created. This is to avoid
> > contention; accesses to different pools can run concurrently.
> okay, thanks for the information. So, likewise, for multi threaded
> applications, by default the malloc_mutexes is 8, (#0 for
> MAP_CONCEALED and other 7 for regular) as mentioned in the below code:

Please indent your code snippets.

> 
> static void
> omalloc_init(void)
> {
> char *p, *q, b[16];
> int i, j, mib[2];
> size_t sb;
> /*
> * Default options
> */
> mopts.malloc_mutexes = 8;
> mopts.def_malloc_junk = 1;
> ...
> ...
> ...
> 
> 
> 
> > yes. That way both underflow and overflow have a chance to be caught.
> yeah, it's good. but I am not sure about it from the code. I mean from
> the below code snippet it seems that by default (means vm.malloc_conf
> != G) it is not on both sides?
>   dir_info

dir_info is special. Having a guard page on both sides for regular
allocation can be done, but would waste more pages. Note that
allocations are already spread throughout the address space, so it is
very likely that an allocation is surrounded by unmapped pages.

> 
> static void
> omalloc_poolinit(struct dir_info **dp, int mmap_flag)
> {
> char *p;
> size_t d_avail, regioninfo_size;
> struct dir_info *d;
> int i, j;
> /*
> * Allocate dir_info with a guard page on either side. Also
> * randomise offset inside the page at which the dir_info
> * lies (subject to alignment by 1 << MALLOC_MINSHIFT)
> */
> if ((p = MMAPNONE(DIR_INFO_RSZ + (MALLOC_PAGESIZE * 2), mmap_flag)) ==
> MAP_FAILED)
> wrterror(NULL, "malloc init mmap failed");
> mprotect(p + MALLOC_PAGESIZE, DIR_INFO_RSZ, PROT_READ | PROT_WRITE);
> d_avail = (DIR_INFO_RSZ - sizeof(*d)) >> MALLOC_MINSHIFT;
> d = (struct dir_info *)(p + MALLOC_PAGESIZE +
> (arc4random_uniform(d_avail) << MALLOC_MINSHIFT));
> ...
> ...
> ...
> 
> From the above code, my observations are
> sizeof(*d) = 4824
> MALLOC_PAGEMASK = 4095
> DIR_INFO_RSZ = (4824 + 4095) & ~4095 = 8192

We need two pages to store dir_info.

> 
> Now, MMAPNONE maps up to len (8192 + (4096 * 2)) = 16384

We allocate 4 pages prot none.

> then, mprotecting the pages through p + MALLOC_PAGESIZE + DIR_INFO_RSZ - 1

the two middle pages are r/w.

> d_avail = (8192 - 4824) >> 4 = 3368 >> 4 = 210
> 
> Now, d = (p + MALLOC_PAGESIZE + (random_no_under_210 << 4)

dir_info ends up on an aligned address somewhere in the middle pages on
an offset between 0 and (210<<4) = 0..3360, counting from the start of
the two middle pages.
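
The arithmetic discussed above can be replayed in a small sketch. The
toy_* names are made up for this example; the page size of 4096, the
shift of 4 and the struct size of 4824 are the values used in this
thread and will differ on other platforms. One detail worth noting:
arc4random_uniform(n) returns a value below n, so with 210 slots the
slot index runs from 0 to 209 and the largest offset is 209 << 4 = 3344.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGESIZE	4096UL			/* MALLOC_PAGESIZE here */
#define TOY_PAGEMASK	(TOY_PAGESIZE - 1)
#define TOY_MINSHIFT	4			/* 16-byte alignment */

/* Round a size up to whole pages, the way DIR_INFO_RSZ is computed. */
static size_t
toy_round_pages(size_t sz)
{
	return (sz + TOY_PAGEMASK) & ~TOY_PAGEMASK;
}

/* Number of aligned slots in which the struct can start (d_avail). */
static size_t
toy_slots(size_t struct_size)
{
	size_t rsz = toy_round_pages(struct_size);

	assert(struct_size <= rsz);
	return (rsz - struct_size) >> TOY_MINSHIFT;
}

/*
 * Byte offset of the struct from the start of the whole mapping.
 * The first page is the leading guard page, so the offset starts
 * at TOY_PAGESIZE.
 */
static size_t
toy_offset(size_t struct_size, size_t slot)
{
	assert(slot < toy_slots(struct_size));
	return TOY_PAGESIZE + (slot << TOY_MINSHIFT);
}
```

With a struct size of 4824 this gives 210 slots, and even in the last
slot the struct ends before the trailing guard page begins, so both
guard pages stay untouched.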

> 
> where d is the randomized offset inside the page at which dir_info lies,
> So, let's suppose p is 1000 then 1000 + 4096 + (100 << 4), then d will be 6696.
> So, it means [p + MALLOC_PAGESIZE] can be treated as guard page before
> dir_info offset and if yes then after that there is no guard page by
> default, right?

No, there will be a guard page on each side. 

> 
> 
> > The second index of chunk_dir has size MALLOC_CHUNK_LISTS which is 4,
> > not 32.
> Yeah, sorry for incorrect values.
> > More than one list of free chunk pages per chunk size is maintained to
> > allow for more randomization.
> Okay, so in short it means below code will create 12 chunk_info_list
> where i is 0 to 11 and for each and every ith index there is j, so as
> per that,
> chunk_dir[0][0]
> chunk_dir[0][1]
> chunk_dir[0][2]
> chunk_dir[0][3]
> ...
> ...
> ...
> chunk_dir[11][0]
> chunk_dir[11][1]
> chunk_dir[11][2]
> chunk_dir[11][3]
> 
> ...
> ...
> ...
> for (i = 0; i <= MALLOC_MAXSHIFT; i++) {
> LIST_INIT(&d->chunk_info_list[i]);
> for (j = 0; j < MALLOC_CHUNK_LISTS; j++)
> LIST_INIT(&d->chunk_dir[i][j]);
> ...
> 
> 
> So, these many lists simply means it allows more randomization,
> wherever it is used, like also in case of allocating chunk using
> omalloc_make_chunks() in malloc_bytes()
> ...
> ...
> ...
> j = find_chunksize(size);
> r = ((u_int)getrbyte(d) << 8) | getrbyte(d);
> listnum = r % MALLOC_CHUNK_LISTS;
> /* If it's empty, make a page more of that size chunks */
> if ((bp = LIST_FIRST(&d->chunk_dir[j][listnum])) == NULL) {
> bp = omalloc_make_chunks(d, j, listnum);
> if (bp == NULL)
> return NUL

Re: Regarding the understanding of the malloc(3) code

2020-03-10 Thread Otto Moerbeek
On Tue, Mar 10, 2020 at 03:04:00AM +0530, Neeraj Pal wrote:

> Hi there,
> 
> I am reading and learning the internals of malloc(3).
> So, after compiling the debug version of libc and using it for one
> basic sample code for malloc(3).
> 
> Not able to understand some parts of the following code snippet:
> 
> void
> _malloc_init(int from_rthreads)
> {
> u_int i, nmutexes;
> struct dir_info *d;
> 
> _MALLOC_LOCK(1);
> if (!from_rthreads && mopts.malloc_pool[1]) {
> _MALLOC_UNLOCK(1);
> return;
> }
> if (!mopts.malloc_canary)
> omalloc_init();
> 
> nmutexes = from_rthreads ? mopts.malloc_mutexes : 2;
> if (((uintptr_t)&malloc_readonly & MALLOC_PAGEMASK) == 0)
> mprotect(&malloc_readonly, sizeof(malloc_readonly),
> PROT_READ | PROT_WRITE);
> for (i = 0; i < nmutexes; i++) {
> if (mopts.malloc_pool[i])
> continue;
> if (i == 0) {
> omalloc_poolinit(&d, MAP_CONCEAL);
> d->malloc_junk = 2;
> d->malloc_cache = 0;
> } else {
> omalloc_poolinit(&d, 0);
> d->malloc_junk = mopts.def_malloc_junk;
> d->malloc_cache = mopts.def_malloc_cache;
> }
> d->mutex = i;
> mopts.malloc_pool[i] = d;
> }
> 
> if (from_rthreads)
> mopts.malloc_mt = 1;
> else
> mopts.internal_funcs = 1;
> 
> /*
>  * Options have been set and will never be reset.
>  * Prevent further tampering with them.
>  */
> if (((uintptr_t)&malloc_readonly & MALLOC_PAGEMASK) == 0)
> mprotect(&malloc_readonly, sizeof(malloc_readonly), PROT_READ);
> _MALLOC_UNLOCK(1);
> }
> 
> In the above code snippet, could someone please throw some light on the
> following queries
> 1. Use of nmutexes?
> 2. And, why it is looping till nmutexes and calls function
> omalloc_poolinit(&d, MAP_CONCEAL) /* when i == 0 */
> and other calls to omalloc_poolinit(&d, 0) /* when i != 0 */
> So, suppose in the case of nmutexes = 2, I am not sure where are the
> uses of these 3 initialized pools, that is, malloc_pool[0],
> malloc_pool[1] and malloc_pool[2]?

There's an off by one in your question :-)

For single-threaded programs, two malloc_dir pools are maintained.
One for MAP_CONCEALED memory (#0) and one for regular (#1).
For multi-threaded programs more pools are created. This is to avoid
contention; accesses to different pools can run concurrently.

> 
> 
> static void
> omalloc_poolinit(struct dir_info **dp, int mmap_flag)
> {
> char *p;
> size_t d_avail, regioninfo_size;
> struct dir_info *d;
> int i, j;
> 
> /*
>  * Allocate dir_info with a guard page on either side. Also
>  * randomise offset inside the page at which the dir_info
>  * lies (subject to alignment by 1 << MALLOC_MINSHIFT)
>  */
> if ((p = MMAPNONE(DIR_INFO_RSZ + (MALLOC_PAGESIZE * 2), mmap_flag)) ==
> MAP_FAILED)
> wrterror(NULL, "malloc init mmap failed");
> mprotect(p + MALLOC_PAGESIZE, DIR_INFO_RSZ, PROT_READ | PROT_WRITE);
> d_avail = (DIR_INFO_RSZ - sizeof(*d)) >> MALLOC_MINSHIFT;
> d = (struct dir_info *)(p + MALLOC_PAGESIZE +
> (arc4random_uniform(d_avail) << MALLOC_MINSHIFT));
> 
> rbytes_init(d);
> d->regions_free = d->regions_total = MALLOC_INITIAL_REGIONS;
> regioninfo_size = d->regions_total * sizeof(struct region_info);
> d->r = MMAP(regioninfo_size, mmap_flag);
> if (d->r == MAP_FAILED) {
> d->regions_total = 0;
> wrterror(NULL, "malloc init mmap failed");
> }
> for (i = 0; i <= MALLOC_MAXSHIFT; i++) {
> LIST_INIT(&d->chunk_info_list[i]);
> for (j = 0; j < MALLOC_CHUNK_LISTS; j++)
> LIST_INIT(&d->chunk_dir[i][j]);
> ...
> ...
> ...
> In the above code, inside function omalloc_poolinit(), first, it will
> allocate dir_info structure with a guard page on *both sides* like
> [guard page] [dir_info] [guard page]?

yes. That way both underflow and overflow have a chance to be caught.

> 
> And, why it is initializing the list  chunk_info_list to 32 times and
> chunk_dir to 64 times, that is chunk_info_list[0...31] and
> chunk_dir[0...31][0...31]?

The second index of chunk_dir has size MALLOC_CHUNK_LISTS which is 4,
not 32.

More than one list of free chunk pages per chunk size is maintained to
allow for more randomization.
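
To show how the extra lists feed into randomization, here is a small
sketch of the index arithmetic used in malloc_bytes(). The toy_* names
are invented for this example; the value 4 for MALLOC_CHUNK_LISTS is the
one given above. Two random bytes are combined, the low part selects one
of the lists and the remainder selects where the search for a free chunk
starts. The mask only works when the number of chunks per page is a
power of two, which this sketch asserts.

```c
#include <assert.h>

#define TOY_CHUNK_LISTS	4	/* MALLOC_CHUNK_LISTS, as stated above */

/* Combine two random bytes into one 16-bit value. */
static unsigned int
toy_rand16(unsigned char hi, unsigned char lo)
{
	return ((unsigned int)hi << 8) | lo;
}

/* Pick one of the free chunk page lists for a given chunk size. */
static unsigned int
toy_listnum(unsigned int r)
{
	return r % TOY_CHUNK_LISTS;
}

/* Pick the chunk index at which the bitmap search starts. */
static unsigned int
toy_start(unsigned int r, unsigned int total)
{
	assert(total != 0 && (total & (total - 1)) == 0);
	return (r / TOY_CHUNK_LISTS) & (total - 1);
}
```

With more lists, two allocations of the same size are less likely to
come from the same page, at the cost of more partially filled pages.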

-Otto


> 
> Could someone please provide some hints on the above queries?
> 
> Regards,
> Neeraj
> 



heads up: amd64 snap

2020-03-07 Thread Otto Moerbeek
It looks like some BIOSes do not like the recent biosboot changes.
The symptom is a hang in the BIOS.

I reverted them, the next amd64 snap should be ok again.

-Otto




Re: top(1) CPU field width diff

2020-02-27 Thread Otto Moerbeek
On Thu, Feb 27, 2020 at 08:09:49PM +0100, Piotr Durlej wrote:

> Hello,
> 
> the following top(1) patch allows for printing CPU percentages >=100%
> without overflowing the CPU column width.
> 
> https://www.durlej.net/diff/top.diff
> 
> Regards,
> Piotr Durlej
> 

Please post diffs inline. Also, your snprintf calls are wrong.

-Otto



Re: amd64: use ffs2 for filesystems created by install

2020-02-21 Thread Otto Moerbeek
On Thu, Feb 20, 2020 at 09:02:32PM +0100, Otto Moerbeek wrote:

> On Thu, Feb 20, 2020 at 08:20:13PM +0100, Otto Moerbeek wrote:
> 
> > On Thu, Feb 20, 2020 at 07:48:25PM +0100, Otto Moerbeek wrote:
> > 
> > > On Thu, Feb 20, 2020 at 07:27:55PM +0100, Matthias Schmidt wrote:
> > > 
> > > > Hi Otto,
> > > > 
> > > > * Otto Moerbeek wrote:
> > > > > Hi,
> > > > > 
> > > > > This is amd64 only, it contains some i386 pieces, but those are
> > > > > incomplete.
> > > > > 
> > > > > With the diff, install uses ffs2 for the filesystems created during
> > > > > install. All boot loaders (except the floppy one) contain support for
> > > > > loading a kernel from ffs2.
> > > > > 
> > > > > To test, create a snapshot (see release(8)) with this diff and use it
> > > > > to install a new system. You could also use the snap at
> > > > > www.drijf.net/openbsd/66. Note that it is unsigned.
> > > > > 
> > > > > Note that when you manually create an fs, it still will be ffs1 by
> > > > > default. That is to not disturb other platforms. Use -O2 for ffs2.
> > > > > 
> > > > > Please test and provide feedback. One thing you should see is that the
> > > > > newfs is much faster and fsck as well, since ffs2 creates inodes
> > > > > lazily and thus has far fewer inodes to check in the typical case.
> > > > 
> > > > I used your provided snap to do a few installations with VMs.  The
> > > > following things worked as expected:
> > > > 
> > > > * Default install on one disk
> > > > * Install on softraid crypto disk
> > > > * Install on softraid 1 with two disks below
> > > > 
> > > > I verified each time with dumpfs that FFS2 was used indeed.
> > > > 
> > > > I also checked out a large git repo on the first VM into /home and
> > > > pulled the plug to see how fsck behaves.  After reboot, fsck marked / as
> > > > clean and then I saw the message that init changed the secure level from
> > > > 0 to 1, but nothing more happened.  I could type so the system was not
> > > > hanging, however, it was also not checking /home (which I expected).  I
> > > > waited for 5 minutes, pulled the plug again and the fsck worked as
> > > > normal and the system booted to the login prompt.
> > > > 
> > > > I did that multiple times and each time it stopped on the first run.
> > > > After power cycling, everything worked as expected and ... wow, fsck on
> > > > FFS2 is indeed fast.
> > > > 
> > > > Cheers
> > > > 
> > > > Matthias
> > > 
> > > Thanks for testing. I am seeing the same problem if I do a vmctl stop
> > > -f of a VM. After that, / gets fscked followed by a hang.  Another reset
> > > fixes things. Going to check if this happens with a standard snap.
> > > 
> > >   -Otto
> > > 
> > 
> > It does not happen with an ffs1 root. With ffs2, I'm seeing:
> > 
> > WARNING: / was not properly unmounted
> > Automatic boot in progress: starting file system checks.
> > /dev/sd0a (d7c346af87544f00.a): 1806 files, 41159 used, 463552 free
> > (48 frags, 57938 blocks, 0.0% fragmentation)
> > /dev/sd0a (d7c346af87544f00.a): MARKING FILE SYSTEM CLEAN
> > panic: init died (signal 11, exit 0)
> > 
> > If I go to single user mode, I can run fsck -p, but then the / fs is gone...
> > that explains why init would die. Investigating further...
> > 
> > -Otto
> > 
> 
> It looks like something is going wrong with the special case for root
> filesystems, a so-called hot root. See fsck_ffs/main.c
> 
> To be continued.
> 
>   -Otto
> 

Diff below fixes the issue. Snap with it is uploaded. You should be
able to upgrade your test with it.

-Otto

Index: ufs/ffs/ffs_vfsops.c
===
RCS file: /cvs/src/sys/ufs/ffs/ffs_vfsops.c,v
retrieving revision 1.182
diff -u -p -r1.182 ffs_vfsops.c
--- ufs/ffs/ffs_vfsops.c	26 Dec 2019 13:28:49 -0000	1.182
+++ ufs/ffs/ffs_vfsops.c	21 Feb 2020 10:34:01 -0000
@@ -533,8 +533,12 @@ ffs_reload_vnode(struct vnode *vp, void 
return (error);
}
 
-   *ip->i_din1 = *((struct ufs1_dinode *)bp->b_data +
-   ino_to_fsbo(fra->fs, ip->i_number));
+   if (fra->fs->fs_magic == FS_UFS1_MAGIC)
+   *ip->i_din1 = *((struct ufs1_dinode *)bp->b_data +
+   ino_to_fsbo(fra->fs, ip->i_number));
+   else
+   *ip->i_din2 = *((struct ufs2_dinode *)bp->b_data +
+   ino_to_fsbo(fra->fs, ip->i_number));
ip->i_effnlink = DIP(ip, nlink);
brelse(bp);
vput(vp);
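
A short sketch of why the cast matters in the diff above. Pointer
arithmetic on a struct pointer steps in units of the struct size, so
indexing the buffer as ufs1_dinode on an ffs2 filesystem lands on the
wrong bytes. The toy_* names are invented for this example. The sizes
(128 bytes for an ffs1 on-disk inode, 256 bytes for an ffs2 one) and the
magic numbers are quoted from memory of the ufs headers and should be
treated as assumptions here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_UFS1_MAGIC	0x011954U	/* assumed FS_UFS1_MAGIC */
#define TOY_UFS2_MAGIC	0x19540119U	/* assumed FS_UFS2_MAGIC */
#define TOY_DINODE1_SZ	128U		/* assumed sizeof(struct ufs1_dinode) */
#define TOY_DINODE2_SZ	256U		/* assumed sizeof(struct ufs2_dinode) */

/*
 * Byte offset of on-disk inode number `index` inside one filesystem
 * block, which is what "(struct ufsN_dinode *)bp->b_data + index"
 * computes implicitly.
 */
static size_t
toy_dinode_offset(uint32_t magic, unsigned int index)
{
	assert(magic == TOY_UFS1_MAGIC || magic == TOY_UFS2_MAGIC);
	if (magic == TOY_UFS1_MAGIC)
		return (size_t)index * TOY_DINODE1_SZ;
	return (size_t)index * TOY_DINODE2_SZ;
}
```

For the same index the two layouts give different offsets, which fits
the symptom described in the thread: after the reload the in-memory
inode no longer matched what was on disk.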



Re: amd64: use ffs2 for filesystems created by install

2020-02-20 Thread Otto Moerbeek
On Thu, Feb 20, 2020 at 08:20:13PM +0100, Otto Moerbeek wrote:

> On Thu, Feb 20, 2020 at 07:48:25PM +0100, Otto Moerbeek wrote:
> 
> > On Thu, Feb 20, 2020 at 07:27:55PM +0100, Matthias Schmidt wrote:
> > 
> > > Hi Otto,
> > > 
> > > * Otto Moerbeek wrote:
> > > > Hi,
> > > > 
> > > > This is amd64 only, it contains some i386 pieces, but those are
> > > > incomplete.
> > > > 
> > > > With the diff, install uses ffs2 for the filesystems created during
> > > > install. All boot loaders (except the floppy one) contain support for
> > > > loading a kernel from ffs2.
> > > > 
> > > > To test, create a snapshot (see release(8)) with this diff and use it
> > > > to install a new system. You could also use the snap at
> > > > www.drijf.net/openbsd/66. Note that it is unsigned.
> > > > 
> > > > Note that when you manually create an fs, it still will be ffs1 by
> > > > default. That is to not disturb other platforms. Use -O2 for ffs2.
> > > > 
> > > > Please test and provide feedback. One thing you should see is that the
> > > > newfs is much faster and fsck as well, since ffs2 creates inodes
> > > > lazily and thus has far fewer inodes to check in the typical case.
> > > 
> > > I used your provided snap to do a few installations with VMs.  The
> > > following things worked as expected:
> > > 
> > > * Default install on one disk
> > > * Install on softraid crypto disk
> > > * Install on softraid 1 with two disks below
> > > 
> > > I verified each time with dumpfs that FFS2 was used indeed.
> > > 
> > > I also checked out a large git repo on the first VM into /home and
> > > pulled the plug to see how fsck behaves.  After reboot, fsck marked / as
> > > clean and then I saw the message that init changed the secure level from
> > > 0 to 1, but nothing more happened.  I could type so the system was not
> > > hanging, however, it was also not checking /home (which I expected).  I
> > > waited for 5 minutes, pulled the plug again and the fsck worked as
> > > normal and the system booted to the login prompt.
> > > 
> > > I did that multiple times and each time it stopped on the first run.
> > > After power cycling, everything worked as expected and ... wow, fsck on
> > > FFS2 is indeed fast.
> > > 
> > > Cheers
> > > 
> > >   Matthias
> > 
> > Thanks for testing. I am seeing the same problem if I do a vmctl stop
> > -f of a VM. After that, / gets fscked followed by a hang.  Another reset
> > fixes things. Going to check if this happens with a standard snap.
> > 
> > -Otto
> > 
> 
> It does not happen with an ffs1 root. With ffs2, I'm seeing:
> 
> WARNING: / was not properly unmounted
> Automatic boot in progress: starting file system checks.
> /dev/sd0a (d7c346af87544f00.a): 1806 files, 41159 used, 463552 free
> (48 frags, 57938 blocks, 0.0% fragmentation)
> /dev/sd0a (d7c346af87544f00.a): MARKING FILE SYSTEM CLEAN
> panic: init died (signal 11, exit 0)
> 
> If I go to single user mode, I can run fsck -p, but then the / fs is gone...
> that explains why init would die. Investigating further...
> 
>   -Otto
> 

It looks like something is going wrong with the special case for root
filesystems, a so-called hot root. See fsck_ffs/main.c

To be continued.

-Otto



Re: amd64: use ffs2 for filesystems created by install

2020-02-20 Thread Otto Moerbeek
On Thu, Feb 20, 2020 at 07:48:25PM +0100, Otto Moerbeek wrote:

> On Thu, Feb 20, 2020 at 07:27:55PM +0100, Matthias Schmidt wrote:
> 
> > Hi Otto,
> > 
> > * Otto Moerbeek wrote:
> > > Hi,
> > > 
> > > This is amd64 only, it contains some i386 pieces, but those are
> > > incomplete.
> > > 
> > > With the diff, install uses ffs2 for the filesystems created during
> > > install. All boot loaders (except the floppy one) contain support for
> > > loading a kernel from ffs2.
> > > 
> > > To test, create a snapshot (see release(8)) with this diff and use it
> > > to install a new system. You could also use the snap at
> > > www.drijf.net/openbsd/66. Note that it is unsigned.
> > > 
> > > Note that when you manually create an fs, it still will be ffs1 by
> > > default. That is to not disturb other platforms. Use -O2 for ffs2.
> > > 
> > > Please test and provide feedback. One thing you should see is that the
> > > newfs is much faster and fsck as well, since ffs2 creates inodes
> > > lazily and thus has far fewer inodes to check in the typical case.
> > 
> > I used your provided snap to do a few installations with VMs.  The
> > following things worked as expected:
> > 
> > * Default install on one disk
> > * Install on softraid crypto disk
> > * Install on softraid 1 with two disks below
> > 
> > I verified each time with dumpfs that FFS2 was used indeed.
> > 
> > I also checked out a large git repo on the first VM into /home and
> > pulled the plug to see how fsck behaves.  After reboot, fsck marked / as
> > clean and then I saw the message that init changed the secure level from
> > 0 to 1, but nothing more happened.  I could type so the system was not
> > hanging, however, it was also not checking /home (which I expected).  I
> > waited for 5 minutes, pulled the plug again and the fsck worked as
> > normal and the system booted to the login prompt.
> > 
> > I did that multiple times and each time it stopped on the first run.
> > After power cycling, everything worked as expected and ... wow, fsck on
> > FFS2 is indeed fast.
> > 
> > Cheers
> > 
> > Matthias
> 
> Thanks for testing. I am seeing the same problem if I do a vmctl stop
> -f of a VM. After that, / gets fscked followed by a hang.  Another reset
> fixes things. Going to check if this happens with a standard snap.
> 
>   -Otto
> 

It does not happen with an ffs1 root. With ffs2, I'm seeing:

WARNING: / was not properly unmounted
Automatic boot in progress: starting file system checks.
/dev/sd0a (d7c346af87544f00.a): 1806 files, 41159 used, 463552 free
(48 frags, 57938 blocks, 0.0% fragmentation)
/dev/sd0a (d7c346af87544f00.a): MARKING FILE SYSTEM CLEAN
panic: init died (signal 11, exit 0)

If I go to single user mode, I can run fsck -p, but then the / fs is gone...
that explains why init would die. Investigating further...

-Otto



Re: amd64: use ffs2 for filesystems created by install

2020-02-20 Thread Otto Moerbeek
On Thu, Feb 20, 2020 at 07:27:55PM +0100, Matthias Schmidt wrote:

> Hi Otto,
> 
> * Otto Moerbeek wrote:
> > Hi,
> > 
> > This is amd64 only, it contains some i386 pieces, but those are
> > incomplete.
> > 
> > With the diff, install uses ffs2 for the filesystems created during
> > install. All boot loaders (except the floppy one) contain support for
> > loading a kernel from ffs2.
> > 
> > To test, create a snapshot (see release(8)) with this diff and use it
> > to install a new system. You could also use the snap at
> > www.drijf.net/openbsd/66. Note that it is unsigned.
> > 
> > Note that when you manually create an fs, it still will be ffs1 by
> > default. That is to not disturb other platforms. Use -O2 for ffs2.
> > 
> > Please test and provide feedback. One thing you should see is that the
> > newfs is much faster and fsck as well, since ffs2 creates inodes
> > lazily and thus has far fewer inodes to check in the typical case.
> 
> I used your provided snap to do a few installations with VMs.  The
> following things worked as expected:
> 
> * Default install on one disk
> * Install on softraid crypto disk
> * Install on softraid 1 with two disks below
> 
> I verified each time with dumpfs that FFS2 was used indeed.
> 
> I also checked out a large git repo on the first VM into /home and
> pulled the plug to see how fsck behaves.  After reboot, fsck marked / as
> clean and then I saw the message that init changed the secure level from
> 0 to 1, but nothing more happened.  I could type so the system was not
> hanging, however, it was also not checking /home (which I expected).  I
> waited for 5 minutes, pulled the plug again and the fsck worked as
> normal and the system booted to the login prompt.
> 
> I did that multiple times and each time it stopped on the first run.
> After power cycling, everything worked as expected and ... wow, fsck on
> FFS2 is indeed fast.
> 
> Cheers
> 
>   Matthias

Thanks for testing. I am seeing the same problem if I do a vmctl stop
-f of a VM. After that, / gets fscked followed by a hang.  Another reset
fixes things. Going to check if this happens with a standard snap.

-Otto



amd64: use ffs2 for filesystems created by install

2020-02-20 Thread Otto Moerbeek
Hi,

This is amd64 only, it contains some i386 pieces, but those are
incomplete.

With the diff, install uses ffs2 for the filesystems created during
install. All boot loaders (except the floppy one) contain support for
loading a kernel from ffs2.

To test, create a snapshot (see release(8)) with this diff and use it
to install a new system. You could also use the snap at
www.drijf.net/openbsd/66. Note that it is unsigned.

Note that when you manually create an fs, it still will be ffs1 by
default. That is to not disturb other platforms. Use -O2 for ffs2.

Please test and provide feedback. One thing you should see is that the
newfs is much faster and fsck as well, since ffs2 creates inodes
lazily and thus has far fewer inodes to check in the typical case.

-Otto

PS: the snap was built without the new boot version numbers.

Index: distrib/amd64/common/install.md
===
RCS file: /cvs/src/distrib/amd64/common/install.md,v
retrieving revision 1.55
diff -u -p -r1.55 install.md
--- distrib/amd64/common/install.md 28 Jul 2017 18:15:44 -  1.55
+++ distrib/amd64/common/install.md 20 Feb 2020 12:36:11 -
@@ -34,6 +34,8 @@
 MDXAPERTURE=2
 MDXDM=y
 NCPU=$(sysctl -n hw.ncpufound)
+MDFSOPT=-O2
+MDROOTFSOPT=-O2
 
 if dmesg | grep -q 'efifb0 at mainbus0'; then
MDEFI=y
Index: distrib/miniroot/install.sub
===
RCS file: /cvs/src/distrib/miniroot/install.sub,v
retrieving revision 1.1147
diff -u -p -r1.1147 install.sub
--- distrib/miniroot/install.sub	2 Feb 2020 20:33:52 -0000	1.1147
+++ distrib/miniroot/install.sub	20 Feb 2020 12:36:11 -0000
@@ -507,7 +507,7 @@ configure_disk() {
 
# Use machine-dependent newfs options for the root
# partition if defined.
-   _opt=
+   _opt=$MDFSOPT
[[ $_mp == / ]] && _opt=$MDROOTFSOPT
 
newfs -q $_opt ${_pp##/dev/}
@@ -3328,6 +3328,7 @@ umount -af >/dev/null 2>&1
 #
 # The following variables can be provided if required:
 #  MDEFI   - set to 'y' on archs that support GPT partitioning
+#  MDFSOPT - newfs options for non-root partitions
 #  MDROOTFSOPT - newfs options for the root partition
 #  MDSETS  - list of files to add to DEFAULT and ALLSETS
 #  MDSANESETS  - list of files to add to SANESETS
Index: sys/arch/amd64/stand/biosboot/biosboot.S
===
RCS file: /cvs/src/sys/arch/amd64/stand/biosboot/biosboot.S,v
retrieving revision 1.7
diff -u -p -r1.7 biosboot.S
--- sys/arch/amd64/stand/biosboot/biosboot.S	5 Jul 2011 17:38:54 -0000	1.7
+++ sys/arch/amd64/stand/biosboot/biosboot.S	20 Feb 2020 12:36:32 -0000
@@ -108,6 +108,9 @@
  * While this can be calculated as
  * howmany(di_size, fs_bsize) it takes us too
  * many code bytes to do it.
+ * blkskew uint8t  the skew used to parse di_db[]. this is set to four by
+ * installboot for ffs2 (due to 64-bit blocks) and should
+ * be zero for ffs1.
  *
  * All of these are patched directly into the code where they are used
  * (once only, each), to save space.
@@ -121,7 +124,7 @@
  */
 
.globl  inodeblk, inodedbl, fs_bsize_p, fsbtodb, p_offset, nblocks
-   .globl  fs_bsize_s, force_chs
+   .globl  fs_bsize_s, force_chs, blkskew
.type   inodeblk, @function
.type   inodedbl, @function
.type   fs_bsize_p, @function
@@ -130,6 +133,7 @@
.type   p_offset, @function
.type   nblocks, @function
.type   force_chs, @function
+   .type   blkskew, @function
 
 
 /* Clobbers %ax, maybe more */
@@ -460,6 +464,8 @@ load_blocks:
 
/* Get the next filesystem block number into %eax */
lodsl   /* %eax = *(%si++), make sure 0x66 0xad */
+blkskew = .+2
+   addw$0x90, %si  /* adjust %si if needed (for ffs2) */
 
pushal  /* Save all 32-bit registers */
 
Index: sys/arch/amd64/stand/boot/Makefile
===
RCS file: /cvs/src/sys/arch/amd64/stand/boot/Makefile,v
retrieving revision 1.44
diff -u -p -r1.44 Makefile
--- sys/arch/amd64/stand/boot/Makefile  28 Nov 2019 00:17:10 -  1.44
+++ sys/arch/amd64/stand/boot/Makefile  20 Feb 2020 12:36:32 -
@@ -40,6 +40,9 @@ SRCS+=close.c closeall.c cons.c cread.c
fstat.c lseek.c open.c read.c readdir.c stat.c
 SRCS+= elf32.c elf64.c loadfile.c arc4.c
 SRCS+= ufs.c
+.if empty(COPTS:M-DFDBOOT)
+SRCS+= ufs2.c
+.endif
 .if ${SOFTRAID:L} == "yes"
 SRCS+= aes_xts.c bcrypt_pbkdf.c blowfish.c explicit_bzero.c hmac_sha1.c \
pkcs5_pbkdf2.c rijndael.c sha1.c sha2.c softraid.c
Index: 

Re: ffs1 and the future

2020-02-19 Thread Otto Moerbeek
On Wed, Feb 19, 2020 at 07:23:13PM -0600, Scott Cheloha wrote:

> On Wed, Feb 19, 2020 at 05:26:40PM +0100, Otto Moerbeek wrote:
> > On Wed, Feb 19, 2020 at 05:10:11PM +0100, Otto Moerbeek wrote:
> > 
> > > On Wed, Feb 19, 2020 at 10:02:10AM -0600, Scott Cheloha wrote:
> > > 
> > > > On Wed, Feb 19, 2020 at 04:00:34PM +0100, Otto Moerbeek wrote:
> > > > > 
> > > > > [...]
> > > > > 
> > > > > FFS1, the default filesystem, uses 32-bit signed timestamps on disk.
> > > > > That means that in 2038, there's going to be a problem, timestamps
> > > > > will then be interpreted as coming from the start of the 1900's.
> > > > > 
> > > > > FFS2 does not have this limitation, but at the moment, we cannot boot
> > > > > from it. I'm working on that as well, but for now I like to propose a
> > > > > diff that interprets all timestamps in FFS1 as unsigned.
> > > > > 
> > > > > * On disk format does not change
> > > > > * Current timestamp values do not change
> > > > 
> > > > Doesn't this change the interpretation of timestamps before 1970?
> > > > 
> > > > Humor me:
> > > > 
> > > > # date 19690101
> > > > Wed Jan  1 00:00:00 CST 1969
> > > > # touch test && ls -l test
> > > > -rw-r--r--  1 ssc  ssc  0 Jan  1 00:00 test
> > > > # stat test
> > > > 1038 8878266 -rw-r--r-- 1 ssc ssc 0 0 "Jan  1 00:00:58 1969" "Jan  1 00:00:58 1969" "Jan  1 00:00:58 1969" 32768 0 0 test
> > > > 
> > > > ... so what happens to such files?  The timestamps wrap around?
> > > > 
> > > > I'm not sure if that should prevent us from implementing this stopgap,
> > > > but it's worth considering.
> > > > 
> > > > > I have checked various tools like dump(8) and restore(8), they work
> > > > > properly. Code normally works with the fields from struct stat, which
> > > > > is already 64-bit. I can imagine code setting timestamps to -1
> > > > > explicitly, that could cause surprises.
> > > > > 
> > > > > So I'm asking for wider testing of the diff below.
> > > > 
> > > > One small bug below.
> > > > 
> > > > > Index: ufs/ffs/ffs_alloc.c
> > > > > ===
> > > > > RCS file: /cvs/src/sys/ufs/ffs/ffs_alloc.c,v
> > > > > retrieving revision 1.109
> > > > > diff -u -p -r1.109 ffs_alloc.c
> > > > > --- ufs/ffs/ffs_alloc.c   19 Jul 2019 00:24:31 -  1.109
> > > > > +++ ufs/ffs/ffs_alloc.c   16 Feb 2020 19:33:07 -
> > > > > @@ -888,7 +888,8 @@ ffs_fragextend(struct inode *ip, int cg,
> > > > >   return (0);
> > > > >  
> > > > >   cgp = (struct cg *)bp->b_data;
> > > > > - cgp->cg_ffs2_time = cgp->cg_time = time_second;
> > > > > + cgp->cg_ffs2_time = time_second;
> > > > > + cgp->cg_time = time_second;
> > > > 
> > > > You shouldn't re-read time_second here unless you want to introduce a
> > > > possible difference between cg_ffs2_time and cg_time.
> > > > 
> > > > You should also, in general, avoid time_second.  There is a split-read
> > > > bug on 32-bit platforms at the 2038 cross-over.
> > > > 
> > > > That one I'm less certain about, though.  time_second assignment is
> > > > brief compared to the alternative:
> > > > 
> > > > struct timespec now;
> > > > 
> > > > nanotime(&now);
> > > > cgp->cg_ffs2_time = now.tv_sec;
> > > > cgp->cg_time = now.tv_sec;
> > > > 
> > > > ... and the window for the split-read bug is very small...
> > > > 
> > > > At minimum, don't re-read time_second.
> > > 
> > > OK, I'll change that. Be aware the filesystem code is full of time_second.
> > 
> > It's not as bad as I thought.
> 
> The changes from time_second(9) to nanotime(9) are ok cheloha@.
> 
> I'm a bit uneasy about changing the sign of the timestamp if the
> ultimate solution is just "switch to ffs2".
> 
> If that's the case, why not make ffs2 bootable and leave ffs1 as-is?

We could do that, but making ffs2 bootable on all platforms is quite a
job. Also, ffs2 has a larger meta-data overhead. So for small
filesystems (think floppy and other bootblocks) it wastes more space.
Same for mfs filesystems.

I'd like to keep ffs1 usable in the future.

-Otto






Re: ffs1 and the future

2020-02-19 Thread Otto Moerbeek
On Wed, Feb 19, 2020 at 05:10:11PM +0100, Otto Moerbeek wrote:

> On Wed, Feb 19, 2020 at 10:02:10AM -0600, Scott Cheloha wrote:
> 
> > On Wed, Feb 19, 2020 at 04:00:34PM +0100, Otto Moerbeek wrote:
> > > 
> > > [...]
> > > 
> > > FFS1, the default filesystem, uses 32-bit signed timestamps on disk.
> > > That means that in 2038 there's going to be a problem: timestamps
> > > will then be interpreted as coming from the start of the 1900s.
> > > 
> > > FFS2 does not have this limitation, but at the moment, we cannot boot
> > > from it. I'm working on that as well, but for now I'd like to propose a
> > > diff that interprets all timestamps in FFS1 as unsigned.
> > > 
> > > * On-disk format does not change
> > > * Current timestamp values do not change
> > 
> > Doesn't this change the interpretation of timestamps before 1970?
> > 
> > Humor me:
> > 
> > # date 19690101
> > Wed Jan  1 00:00:00 CST 1969
> > # touch test && ls -l test
> > -rw-r--r--  1 ssc  ssc  0 Jan  1 00:00 test
> > # stat test
> > 1038 8878266 -rw-r--r-- 1 ssc ssc 0 0 "Jan  1 00:00:58 1969" "Jan  1 
> > 00:00:58 1969" "Jan  1 00:00:58 1969" 32768 0 0 test
> > 
> > ... so what happens to such files?  The timestamps wrap around?
> > 
> > I'm not sure if that should prevent us from implementing this stopgap,
> > but it's worth considering.
> > 
> > > I have checked various tools like dump(8) and restore(8), they work
> > > properly. Code normally works with the fields from struct stat, which
> > > is already 64-bit. I can imagine code setting timestamps to -1
> > > explicitly, that could cause surprises.
> > > 
> > > So I'm asking for wider testing of the diff below.
> > 
> > One small bug below.
> > 
> > > Index: ufs/ffs/ffs_alloc.c
> > > ===
> > > RCS file: /cvs/src/sys/ufs/ffs/ffs_alloc.c,v
> > > retrieving revision 1.109
> > > diff -u -p -r1.109 ffs_alloc.c
> > > --- ufs/ffs/ffs_alloc.c   19 Jul 2019 00:24:31 -  1.109
> > > +++ ufs/ffs/ffs_alloc.c   16 Feb 2020 19:33:07 -
> > > @@ -888,7 +888,8 @@ ffs_fragextend(struct inode *ip, int cg,
> > >   return (0);
> > >  
> > >   cgp = (struct cg *)bp->b_data;
> > > - cgp->cg_ffs2_time = cgp->cg_time = time_second;
> > > + cgp->cg_ffs2_time = time_second;
> > > + cgp->cg_time = time_second;
> > 
> > You shouldn't re-read time_second here unless you want to introduce a
> > possible difference between cg_ffs2_time and cg_time.
> > 
> > You should also, in general, avoid time_second.  There is a split-read
> > bug on 32-bit platforms at the 2038 cross-over.
> > 
> > That one I'm less certain about, though.  time_second assignment is
> > brief compared to the alternative:
> > 
> > struct timespec now;
> > 
> > > nanotime(&now);
> > cgp->cg_ffs2_time = now.tv_sec;
> > cgp->cg_time = now.tv_sec;
> > 
> > ... and the window for the split-read bug is very small...
> > 
> > At minimum, don't re-read time_second.
> 
> OK, I'll change that. Be aware the filesystem code is full of time_second.

It's not as bad as I thought.

-Otto

Index: ufs/ffs/ffs_alloc.c
===
RCS file: /cvs/src/sys/ufs/ffs/ffs_alloc.c,v
retrieving revision 1.109
diff -u -p -r1.109 ffs_alloc.c
--- ufs/ffs/ffs_alloc.c 19 Jul 2019 00:24:31 -  1.109
+++ ufs/ffs/ffs_alloc.c 19 Feb 2020 16:24:23 -
@@ -871,6 +871,7 @@ ffs_fragextend(struct inode *ip, int cg,
struct fs *fs;
struct cg *cgp;
struct buf *bp;
+   struct timespec now;
daddr_t bno;
int i, frags, bbase;
 
@@ -888,7 +889,9 @@ ffs_fragextend(struct inode *ip, int cg,
return (0);
 
cgp = (struct cg *)bp->b_data;
-   cgp->cg_ffs2_time = cgp->cg_time = time_second;
+   nanotime(&now);
+   cgp->cg_ffs2_time = now.tv_sec;
+   cgp->cg_time = now.tv_sec;
 
bno = dtogd(fs, bprev);
for (i = numfrags(fs, osize); i < frags; i++)
@@ -934,6 +937,7 @@ ffs_alloccg(struct inode *ip, int cg, da
struct fs *fs;
struct cg *cgp;
struct buf *bp;
+   struct timespec now;
daddr_t bno, blkno;
int i, frags, allocsiz;
 
@@ -950,7 +954,9 @@ ffs_alloccg(struct inode *ip, int cg, da
return (0);
}
 
-   cgp->cg_ffs2_time = cgp->cg_time = time_second;
+ 

Re: ffs1 and the future

2020-02-19 Thread Otto Moerbeek
On Wed, Feb 19, 2020 at 10:02:10AM -0600, Scott Cheloha wrote:

> On Wed, Feb 19, 2020 at 04:00:34PM +0100, Otto Moerbeek wrote:
> > 
> > [...]
> > 
> > FFS1, the default filesystem, uses 32-bit signed timestamps on disk.
> > That means that in 2038 there's going to be a problem: timestamps
> > will then be interpreted as coming from the start of the 1900s.
> > 
> > FFS2 does not have this limitation, but at the moment, we cannot boot
> > from it. I'm working on that as well, but for now I'd like to propose a
> > diff that interprets all timestamps in FFS1 as unsigned.
> > 
> > * On-disk format does not change
> > * Current timestamp values do not change
> 
> Doesn't this change the interpretation of timestamps before 1970?
> 
> Humor me:
> 
> # date 19690101
> Wed Jan  1 00:00:00 CST 1969
> # touch test && ls -l test
> -rw-r--r--  1 ssc  ssc  0 Jan  1 00:00 test
> # stat test
> 1038 8878266 -rw-r--r-- 1 ssc ssc 0 0 "Jan  1 00:00:58 1969" "Jan  1 00:00:58 
> 1969" "Jan  1 00:00:58 1969" 32768 0 0 test
> 
> ... so what happens to such files?  The timestamps wrap around?
> 
> I'm not sure if that should prevent us from implementing this stopgap,
> but it's worth considering.
> 
> > I have checked various tools like dump(8) and restore(8), they work
> > properly. Code normally works with the fields from struct stat, which
> > is already 64-bit. I can imagine code setting timestamps to -1
> > explicitly, that could cause surprises.
> > 
> > So I'm asking for wider testing of the diff below.
> 
> One small bug below.
> 
> > Index: ufs/ffs/ffs_alloc.c
> > ===
> > RCS file: /cvs/src/sys/ufs/ffs/ffs_alloc.c,v
> > retrieving revision 1.109
> > diff -u -p -r1.109 ffs_alloc.c
> > --- ufs/ffs/ffs_alloc.c 19 Jul 2019 00:24:31 -  1.109
> > +++ ufs/ffs/ffs_alloc.c 16 Feb 2020 19:33:07 -
> > @@ -888,7 +888,8 @@ ffs_fragextend(struct inode *ip, int cg,
> > return (0);
> >  
> > cgp = (struct cg *)bp->b_data;
> > -   cgp->cg_ffs2_time = cgp->cg_time = time_second;
> > +   cgp->cg_ffs2_time = time_second;
> > +   cgp->cg_time = time_second;
> 
> You shouldn't re-read time_second here unless you want to introduce a
> possible difference between cg_ffs2_time and cg_time.
> 
> You should also, in general, avoid time_second.  There is a split-read
> bug on 32-bit platforms at the 2038 cross-over.
> 
> That one I'm less certain about, though.  time_second assignment is
> brief compared to the alternative:
> 
>   struct timespec now;
> 
>   nanotime(&now);
>   cgp->cg_ffs2_time = now.tv_sec;
>   cgp->cg_time = now.tv_sec;
> 
> ... and the window for the split-read bug is very small...
> 
> At minimum, don't re-read time_second.

OK, I'll change that. Be aware the filesystem code is full of time_second.
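
To illustrate the "read the clock once" point in userland: a minimal sketch,
assuming clock_gettime(2) as a stand-in for nanotime(9). The struct and the
function name cg_stamp() are made up for the example and are not part of the
diff.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the two cylinder group timestamp fields. */
struct cg_stamps {
	int64_t		cg_ffs2_time;	/* 64-bit field */
	uint32_t	cg_time;	/* 32-bit FFS1 field */
};

/*
 * Take one snapshot of the clock and store that same second in both
 * fields, so they cannot disagree even if the clock ticks in between.
 */
static void
cg_stamp(struct cg_stamps *cg)
{
	struct timespec now;
	int rc;

	rc = clock_gettime(CLOCK_REALTIME, &now);
	assert(rc == 0);
	(void)rc;

	cg->cg_ffs2_time = now.tv_sec;
	cg->cg_time = (uint32_t)now.tv_sec;
}
```

Reading a global twice, as in the first version of the diff, leaves a small
window in which the two fields can end up one second apart.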

-Otto

> 
> > bno = dtogd(fs, bprev);
> > for (i = numfrags(fs, osize); i < frags; i++)
> > Index: ufs/ffs/fs.h
> > ===
> > RCS file: /cvs/src/sys/ufs/ffs/fs.h,v
> > retrieving revision 1.42
> > diff -u -p -r1.42 fs.h
> > --- ufs/ffs/fs.h27 Nov 2016 13:27:55 -  1.42
> > +++ ufs/ffs/fs.h16 Feb 2020 19:33:07 -
> > @@ -199,7 +199,7 @@ struct fs {
> > int32_t  fs_dblkno; /* offset of first data / frags */
> > int32_t  fs_cgoffset;   /* cylinder group offset in cylinder */
> > int32_t  fs_cgmask; /* used to calc mod fs_ntrak */
> > -   int32_t  fs_ffs1_time;  /* last time written */
> > +   u_int32_t fs_ffs1_time; /* last time written */
> > int32_t  fs_ffs1_size;  /* # of blocks in fs / frags */
> > int32_t  fs_ffs1_dsize; /* # of data blocks in fs */
> > int32_t  fs_ncg;/* # of cylinder groups */
> > @@ -285,7 +285,7 @@ struct fs {
> > int32_t  fs_avgfpdir;   /* expected # of files per directory */
> > int32_t  fs_sparecon[26];   /* reserved for future constants */
> > u_int32_t fs_flags; /* see FS_ flags below */
> > -   int32_t  fs_fscktime;   /* last time fsck(8)ed */
> > +   u_int32_t fs_fscktime;  /* last time fsck(8)ed */
> > int32_t  fs_contigsumsize;  /* size of cluster summary array */ 
> > int32_t  fs_maxsymlinklen;  /* max length of an internal symlink */
> > int32_t  fs_inodefmt;   /* format of on

Re: ffs1 and the future

2020-02-19 Thread Otto Moerbeek
On Wed, Feb 19, 2020 at 10:02:10AM -0600, Scott Cheloha wrote:

> On Wed, Feb 19, 2020 at 04:00:34PM +0100, Otto Moerbeek wrote:
> > 
> > [...]
> > 
> > FFS1, the default filesystem, uses 32-bit signed timestamps on disk.
> > That means that in 2038 there's going to be a problem: timestamps
> > will then be interpreted as coming from the start of the 1900s.
> > 
> > FFS2 does not have this limitation, but at the moment, we cannot boot
> > from it. I'm working on that as well, but for now I'd like to propose a
> > diff that interprets all timestamps in FFS1 as unsigned.
> > 
> > * On-disk format does not change
> > * Current timestamp values do not change
> 
> Doesn't this change the interpretation of timestamps before 1970?

yes, that seemed obvious to me...

> 
> Humor me:
> 
> # date 19690101
> Wed Jan  1 00:00:00 CST 1969
> # touch test && ls -l test
> -rw-r--r--  1 ssc  ssc  0 Jan  1 00:00 test
> # stat test
> 1038 8878266 -rw-r--r-- 1 ssc ssc 0 0 "Jan  1 00:00:58 1969" "Jan  1 00:00:58 
> 1969" "Jan  1 00:00:58 1969" 32768 0 0 test
> 
> ... so what happens to such files?  The timestamps wrap around?
> 
> I'm not sure if that should prevent us from implementing this stopgap,
> but it's worth considering.
> 
> > I have checked various tools like dump(8) and restore(8), they work
> > properly. Code normally works with the fields from struct stat, which
> > is already 64-bit. I can imagine code setting timestamps to -1
> > explicitly, that could cause surprises.
> > 
> > So I'm asking for wider testing of the diff below.
> 
> One small bug below.
> 
> > Index: ufs/ffs/ffs_alloc.c
> > ===
> > RCS file: /cvs/src/sys/ufs/ffs/ffs_alloc.c,v
> > retrieving revision 1.109
> > diff -u -p -r1.109 ffs_alloc.c
> > --- ufs/ffs/ffs_alloc.c 19 Jul 2019 00:24:31 -  1.109
> > +++ ufs/ffs/ffs_alloc.c 16 Feb 2020 19:33:07 -
> > @@ -888,7 +888,8 @@ ffs_fragextend(struct inode *ip, int cg,
> > return (0);
> >  
> > cgp = (struct cg *)bp->b_data;
> > -   cgp->cg_ffs2_time = cgp->cg_time = time_second;
> > +   cgp->cg_ffs2_time = time_second;
> > +   cgp->cg_time = time_second;
> 
> You shouldn't re-read time_second here unless you want to introduce a
> possible difference between cg_ffs2_time and cg_time.
> 
> You should also, in general, avoid time_second.  There is a split-read
> bug on 32-bit platforms at the 2038 cross-over.
> 
> That one I'm less certain about, though.  time_second assignment is
> brief compared to the alternative:
> 
>   struct timespec now;
> 
>   nanotime(&now);
>   cgp->cg_ffs2_time = now.tv_sec;
>   cgp->cg_time = now.tv_sec;
> 
> ... and the window for the split-read bug is very small...
> 
> At minimum, don't re-read time_second.
> 
> > bno = dtogd(fs, bprev);
> > for (i = numfrags(fs, osize); i < frags; i++)
> > Index: ufs/ffs/fs.h
> > ===
> > RCS file: /cvs/src/sys/ufs/ffs/fs.h,v
> > retrieving revision 1.42
> > diff -u -p -r1.42 fs.h
> > --- ufs/ffs/fs.h27 Nov 2016 13:27:55 -  1.42
> > +++ ufs/ffs/fs.h16 Feb 2020 19:33:07 -
> > @@ -199,7 +199,7 @@ struct fs {
> > int32_t  fs_dblkno; /* offset of first data / frags */
> > int32_t  fs_cgoffset;   /* cylinder group offset in cylinder */
> > int32_t  fs_cgmask; /* used to calc mod fs_ntrak */
> > -   int32_t  fs_ffs1_time;  /* last time written */
> > +   u_int32_t fs_ffs1_time; /* last time written */
> > int32_t  fs_ffs1_size;  /* # of blocks in fs / frags */
> > int32_t  fs_ffs1_dsize; /* # of data blocks in fs */
> > int32_t  fs_ncg;/* # of cylinder groups */
> > @@ -285,7 +285,7 @@ struct fs {
> > int32_t  fs_avgfpdir;   /* expected # of files per directory */
> > int32_t  fs_sparecon[26];   /* reserved for future constants */
> > u_int32_t fs_flags; /* see FS_ flags below */
> > -   int32_t  fs_fscktime;   /* last time fsck(8)ed */
> > +   u_int32_t fs_fscktime;  /* last time fsck(8)ed */
> > int32_t  fs_contigsumsize;  /* size of cluster summary array */ 
> > int32_t  fs_maxsymlinklen;  /* max length of an internal symlink */
> > int32_t  fs_inodefmt;   /* format of on-disk inodes */
> > @@ -376,7 +37

mbr booting from ffs2

2020-02-19 Thread Otto Moerbeek
Hi,

booting from an ffs2 filesystem is a puzzle containing many pieces.
For amd64 and i386 mbr booting, the pieces below are needed.

Lifted from an old bitrig tree. 

Note that this is *not* enough to get things going since boot(8) and
its variants do not support ffs2 yet, but for this diff I'm only
interested in not breaking existing working mbr boot setups.
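
For readers not fluent in the assembly, the blkskew idea can be sketched in C.
This is only an illustration under assumptions: read_blkno() is a made-up
name, the array is taken to be little-endian as on amd64/i386, and for ffs2
only the low 32 bits of each 64-bit entry are used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Fetch entry idx from a raw di_db[] array, mimicking "lodsl" followed
 * by "addw $skew, %si": load 32 bits, then skip skew extra bytes.
 * skew is 0 for ffs1 (32-bit entries) and 4 for ffs2 (64-bit entries).
 */
static uint32_t
read_blkno(const unsigned char *di_db, size_t idx, unsigned int skew)
{
	const unsigned char *p;

	assert(skew == 0 || skew == 4);
	p = di_db + idx * (4 + skew);

	/* Assemble the little-endian 32-bit value byte by byte. */
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	    (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}
```

With skew 0 this walks the array in 4-byte steps exactly as before, which is
why existing ffs1 setups should not notice the change.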

-Otto

Index: sys/arch/amd64/stand/biosboot/biosboot.S
===
RCS file: /cvs/src/sys/arch/amd64/stand/biosboot/biosboot.S,v
retrieving revision 1.7
diff -u -p -r1.7 biosboot.S
--- sys/arch/amd64/stand/biosboot/biosboot.S5 Jul 2011 17:38:54 -   
1.7
+++ sys/arch/amd64/stand/biosboot/biosboot.S19 Feb 2020 15:19:55 -
@@ -108,6 +108,9 @@
  * While this can be calculated as
  * howmany(di_size, fs_bsize) it takes us too
  * many code bytes to do it.
+ * blkskew uint8t  the skew used to parse di_db[]. this is set to four by
+ * installboot for ffs2 (due to 64-bit blocks) and should
+ * be zero for ffs1.
  *
  * All of these are patched directly into the code where they are used
  * (once only, each), to save space.
@@ -121,7 +124,7 @@
  */
 
.globl  inodeblk, inodedbl, fs_bsize_p, fsbtodb, p_offset, nblocks
-   .globl  fs_bsize_s, force_chs
+   .globl  fs_bsize_s, force_chs, blkskew
.type   inodeblk, @function
.type   inodedbl, @function
.type   fs_bsize_p, @function
@@ -130,6 +133,7 @@
.type   p_offset, @function
.type   nblocks, @function
.type   force_chs, @function
+   .type   blkskew, @function
 
 
 /* Clobbers %ax, maybe more */
@@ -460,6 +464,8 @@ load_blocks:
 
/* Get the next filesystem block number into %eax */
lodsl   /* %eax = *(%si++), make sure 0x66 0xad */
+blkskew = .+2
+   addw$0x90, %si  /* adjust %si if needed (for ffs2) */
 
pushal  /* Save all 32-bit registers */
 
Index: sys/arch/i386/stand/biosboot/biosboot.S
===
RCS file: /cvs/src/sys/arch/i386/stand/biosboot/biosboot.S,v
retrieving revision 1.41
diff -u -p -r1.41 biosboot.S
--- sys/arch/i386/stand/biosboot/biosboot.S 5 Jul 2011 17:38:54 -   
1.41
+++ sys/arch/i386/stand/biosboot/biosboot.S 19 Feb 2020 15:19:55 -
@@ -108,6 +108,9 @@
  * While this can be calculated as
  * howmany(di_size, fs_bsize) it takes us too
  * many code bytes to do it.
+ * blkskew uint8t  the skew used to parse di_db[]. this is set to four by
+ * installboot for ffs2 (due to 64-bit blocks) and should
+ * be zero for ffs1.
  *
  * All of these are patched directly into the code where they are used
  * (once only, each), to save space.
@@ -121,7 +124,7 @@
  */
 
.globl  inodeblk, inodedbl, fs_bsize_p, fsbtodb, p_offset, nblocks
-   .globl  fs_bsize_s, force_chs
+   .globl  fs_bsize_s, force_chs, blkskew
.type   inodeblk, @function
.type   inodedbl, @function
.type   fs_bsize_p, @function
@@ -130,6 +133,7 @@
.type   p_offset, @function
.type   nblocks, @function
.type   force_chs, @function
+   .type   blkskew, @function
 
 
 /* Clobbers %ax, maybe more */
@@ -460,6 +464,8 @@ load_blocks:
 
/* Get the next filesystem block number into %eax */
lodsl   /* %eax = *(%si++), make sure 0x66 0xad */
+blkskew = .+2
+   addw$0x90, %si  /* adjust %si if needed (for ffs2) */
 
pushal  /* Save all 32-bit registers */
 
Index: usr.sbin/installboot/i386_installboot.c
===
RCS file: /cvs/src/usr.sbin/installboot/i386_installboot.c,v
retrieving revision 1.33
diff -u -p -r1.33 i386_installboot.c
--- usr.sbin/installboot/i386_installboot.c 2 Sep 2019 16:36:12 -   
1.33
+++ usr.sbin/installboot/i386_installboot.c 19 Feb 2020 15:19:55 -
@@ -2,6 +2,7 @@
 /* $NetBSD: installboot.c,v 1.5 1995/11/17 23:23:50 gwr Exp $ */
 
 /*
+ * Copyright (c) 2013 Pedro Martelletto
  * Copyright (c) 2011 Joel Sing 
  * Copyright (c) 2003 Tom Cosgrove 
  * Copyright (c) 1997 Michael Shalayeff
@@ -82,6 +83,7 @@ struct sym_data pbr_symbols[] = {
{"_inodeblk",   4},
{"_inodedbl",   4},
{"_nblocks",2},
+   {"_blkskew",1},
{NULL}
 };
 
@@ -90,6 +92,10 @@ static u_int findopenbsd(int, struct dis
 static int getbootparams(char *, int, struct disklabel *);
 static char*loadproto(char *, long *);
 static int gpt_chk_mbr(struct dos_partition *, u_int64_t);
+static int sbchk(struct fs *, daddr_t);
+static voidsbread(int, daddr_t, struct fs **, char *);
+

ffs1 and the future

2020-02-19 Thread Otto Moerbeek
Hoi,

FFS1, the default filesystem, uses 32-bit signed timestamps on disk.
That means that in 2038 there's going to be a problem: timestamps
will then be interpreted as coming from the start of the 1900s.

FFS2 does not have this limitation, but at the moment, we cannot boot
from it. I'm working on that as well, but for now I'd like to propose a
diff that interprets all timestamps in FFS1 as unsigned.

* On-disk format does not change
* Current timestamp values do not change

I have checked various tools like dump(8) and restore(8), they work
properly. Code normally works with the fields from struct stat, which
is already 64-bit. I can imagine code setting timestamps to -1
explicitly, that could cause surprises.
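
To make the effect concrete, here is a small userland sketch of the two
readings of the same 32 on-disk bits. The function names are invented for the
example; the kernel diff itself only changes the field types.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Old reading: the 32 on-disk bits are a signed count of seconds since 1970. */
static int64_t
ffs1_time_signed(uint32_t raw)
{
	int32_t s;

	memcpy(&s, &raw, sizeof(s));	/* reuse the same bits, no conversion */
	return s;
}

/* Proposed reading: the same 32 bits taken as unsigned. */
static int64_t
ffs1_time_unsigned(uint32_t raw)
{
	return raw;
}

/* Both readings agree up to 2038; past that only the unsigned one moves on. */
static void
ffs1_time_check(void)
{
	assert(ffs1_time_signed(0x7fffffffu) == ffs1_time_unsigned(0x7fffffffu));
	assert(ffs1_time_signed(0x80000000u) < 0);	/* lands in 1901 */
	assert(ffs1_time_unsigned(0x80000000u) > 0);	/* lands in 2038 */
	/* A stored -1 now reads as a date in 2106 instead of 1969. */
	assert(ffs1_time_unsigned(0xffffffffu) == 4294967295LL);
}
```

The last assertion is the "-1 surprise" mentioned above: any value that used
to mean a time before 1970 is read as a time after 2038 instead.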

So I'm asking for wider testing of the diff below.

-Otto

Index: ufs/ffs/ffs_alloc.c
===
RCS file: /cvs/src/sys/ufs/ffs/ffs_alloc.c,v
retrieving revision 1.109
diff -u -p -r1.109 ffs_alloc.c
--- ufs/ffs/ffs_alloc.c 19 Jul 2019 00:24:31 -  1.109
+++ ufs/ffs/ffs_alloc.c 16 Feb 2020 19:33:07 -
@@ -888,7 +888,8 @@ ffs_fragextend(struct inode *ip, int cg,
return (0);
 
cgp = (struct cg *)bp->b_data;
-   cgp->cg_ffs2_time = cgp->cg_time = time_second;
+   cgp->cg_ffs2_time = time_second;
+   cgp->cg_time = time_second;
 
bno = dtogd(fs, bprev);
for (i = numfrags(fs, osize); i < frags; i++)
Index: ufs/ffs/fs.h
===
RCS file: /cvs/src/sys/ufs/ffs/fs.h,v
retrieving revision 1.42
diff -u -p -r1.42 fs.h
--- ufs/ffs/fs.h27 Nov 2016 13:27:55 -  1.42
+++ ufs/ffs/fs.h16 Feb 2020 19:33:07 -
@@ -199,7 +199,7 @@ struct fs {
int32_t  fs_dblkno; /* offset of first data / frags */
int32_t  fs_cgoffset;   /* cylinder group offset in cylinder */
int32_t  fs_cgmask; /* used to calc mod fs_ntrak */
-   int32_t  fs_ffs1_time;  /* last time written */
+   u_int32_t fs_ffs1_time; /* last time written */
int32_t  fs_ffs1_size;  /* # of blocks in fs / frags */
int32_t  fs_ffs1_dsize; /* # of data blocks in fs */
int32_t  fs_ncg;/* # of cylinder groups */
@@ -285,7 +285,7 @@ struct fs {
int32_t  fs_avgfpdir;   /* expected # of files per directory */
int32_t  fs_sparecon[26];   /* reserved for future constants */
u_int32_t fs_flags; /* see FS_ flags below */
-   int32_t  fs_fscktime;   /* last time fsck(8)ed */
+   u_int32_t fs_fscktime;  /* last time fsck(8)ed */
int32_t  fs_contigsumsize;  /* size of cluster summary array */ 
int32_t  fs_maxsymlinklen;  /* max length of an internal symlink */
int32_t  fs_inodefmt;   /* format of on-disk inodes */
@@ -376,7 +376,7 @@ struct fs {
 struct cg {
int32_t  cg_firstfield; /* historic cyl groups linked list */
int32_t  cg_magic;  /* magic number */
-   int32_t  cg_time;   /* time last written */
+   u_int32_t cg_time;  /* time last written */
int32_t  cg_cgx;/* we are the cgx'th cylinder group */
int16_t  cg_ncyl;   /* number of cyl's this cg */
int16_t  cg_niblk;  /* number of inode blocks this cg */
Index: ufs/ufs/dinode.h
===
RCS file: /cvs/src/sys/ufs/ufs/dinode.h,v
retrieving revision 1.18
diff -u -p -r1.18 dinode.h
--- ufs/ufs/dinode.h30 May 2013 19:19:09 -  1.18
+++ ufs/ufs/dinode.h16 Feb 2020 19:33:07 -
@@ -72,11 +72,11 @@ struct  ufs1_dinode {
u_int32_t inumber;  /*   4: Lfs: inode number. */
} di_u;
u_int64_t   di_size;/*   8: File byte count. */
-   int32_t di_atime;   /*  16: Last access time. */
+   u_int32_t   di_atime;   /*  16: Last access time. */
int32_t di_atimensec;   /*  20: Last access time. */
-   int32_t di_mtime;   /*  24: Last modified time. */
+   u_int32_t   di_mtime;   /*  24: Last modified time. */
int32_t di_mtimensec;   /*  28: Last modified time. */
-   int32_t di_ctime;   /*  32: Last inode change time. */
+   u_int32_t   di_ctime;   /*  32: Last inode change time. */
int32_t di_ctimensec;   /*  36: Last inode change time. */
int32_t di_db[NDADDR];  /*  40: Direct disk blocks. */
int32_t di_ib[NIADDR];  /*  88: Indirect disk blocks. */



dumpfs: don't pick alternate superblock

2020-02-16 Thread Otto Moerbeek
Hi,

If the block size is 64k, the first alternate ffs1 superblock ends up
in a location first looked at by dumpfs.

fsck_ffs(8) (see setup.c) and ffs_mountfs() in
sys/ufs/ffs/ffs_vfsops.c have protection against that case, since we
really want the primary superblock, that's the one that is actually
updated when a fs is used.

So also do that in dumpfs(8).
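
The check can be read as a small predicate. The sketch below is a simplified
restatement, not the dumpfs code: is_primary_sb() is a made-up name, the block
size checks are left out, and the constants are copied from <ufs/ffs/fs.h> so
the fragment stands alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SBLOCK_UFS1	8192
#define SBLOCK_UFS2	65536
#define FS_UFS1_MAGIC	0x011954
#define FS_UFS2_MAGIC	0x19540119

/*
 * Decide whether a superblock read from offset "tried" is the primary.
 * A UFS2 superblock must record the offset it was found at.
 * A UFS1 magic found at the UFS2 offset is an alternate superblock of
 * a filesystem with 64k blocks, so it is rejected.
 */
static bool
is_primary_sb(int32_t magic, int64_t sblockloc, int64_t tried)
{
	assert(tried >= 0);

	if (magic == FS_UFS2_MAGIC)
		return sblockloc == tried;
	if (magic == FS_UFS1_MAGIC)
		return tried != SBLOCK_UFS2;
	return false;
}
```

Without the rejection, whether dumpfs lands on the primary or on the alternate
depends on the order in which the candidate offsets are tried.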

OK?

-Otto

Index: dumpfs.c
===
RCS file: /cvs/src/sbin/dumpfs/dumpfs.c,v
retrieving revision 1.34
diff -u -p -r1.34 dumpfs.c
--- dumpfs.c28 Jun 2019 13:32:43 -  1.34
+++ dumpfs.c16 Feb 2020 12:24:31 -
@@ -139,6 +139,8 @@ open_disk(const char *name)
if (n == SBLOCKSIZE && (afs.fs_magic == FS_UFS1_MAGIC ||
(afs.fs_magic == FS_UFS2_MAGIC &&
afs.fs_sblockloc == sbtry[i])) &&
+   !(afs.fs_magic == FS_UFS1_MAGIC &&
+   sbtry[i] == SBLOCK_UFS2) &&
afs.fs_bsize <= MAXBSIZE &&
afs.fs_bsize >= sizeof(struct fs))
break;



Re: ntpd and 2036

2020-01-30 Thread Otto Moerbeek
On Mon, Jan 20, 2020 at 07:08:26AM +0100, Otto Moerbeek wrote:

> On Fri, Jan 10, 2020 at 03:14:42PM +0100, Otto Moerbeek wrote:
> 
> > Hi,
> > 
> > The NTP protocol uses 32-bit unsigned timestamps counting seconds
> > since 1900. That means that in 2036 the timestamp field will wrap.
> > This diff makes sure ntpd handles that correctly by assuming we are
> > in era 0 unless we see "small" timestamps.
> > 
> > Tested in the future (including wrapping from era 0 to 1) on a couple
> > of machines, including one running xntpd for interoperability.
> > 
> > ok?
> 
> ping...

First post on Jan 10th, zero feedback. I think I'll commit and let the
community do the testing. We'll have 16 years to fix the bugs.
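
For anyone who wants to play with the era logic outside ntpd, here is an
integer-only sketch of what the diff does in lfp_to_d(). The two function
names are invented for the example, the fraction part is ignored, and the
input is assumed to be in host byte order already.

```c
#include <assert.h>
#include <stdint.h>

#define JAN_1970	2208988800ULL		/* 1970 - 1900 in seconds */
#define NTP_ERA		0			/* era assumed by default */
#define SECS_IN_ERA	(UINT32_MAX + 1ULL)

/*
 * Widen the 32-bit seconds field of an NTP timestamp to seconds since
 * 1900. "Small" values (<= INT32_MAX) are taken to belong to the next
 * era, i.e. to lie after the 2036 wrap.
 */
static uint64_t
ntp_secs_since_1900(uint32_t int_part)
{
	uint64_t era = NTP_ERA;

	if (int_part <= INT32_MAX)
		era++;
	return era * SECS_IN_ERA + int_part;
}

/* Same instant expressed as seconds since the Unix epoch. */
static int64_t
ntp_to_unix(uint32_t int_part)
{
	return (int64_t)(ntp_secs_since_1900(int_part) - JAN_1970);
}

static void
ntp_era_check(void)
{
	assert(ntp_to_unix(2208988800u) == 0);		/* 1 Jan 1970, era 0 */
	assert(ntp_to_unix(0) == 2085978496LL);		/* first second of era 1 */
}
```

The price of the pivot is that era 0 values below 2^31 (roughly 1900 to 1968)
can no longer be expressed, which should not matter for a running clock.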

-Otto

> 
> > 
> > Index: client.c
> > ===
> > RCS file: /cvs/src/usr.sbin/ntpd/client.c,v
> > retrieving revision 1.112
> > diff -u -p -r1.112 client.c
> > --- client.c10 Nov 2019 19:24:47 -  1.112
> > +++ client.c10 Jan 2020 14:06:14 -
> > @@ -324,12 +324,6 @@ client_dispatch(struct ntp_peer *p, u_in
> > }
> > }
> >  
> > -   if (T4 < JAN_1970) {
> > -   client_log_error(p, "recvmsg control format", EBADF);
> > -   set_next(p, error_interval());
> > -   return (0);
> > -   }
> > -
> > ntp_getmsg((struct sockaddr *)&p->addr->ss, buf, size, &msg);
> >  
> > if (msg.orgtime.int_partl != p->query->msg.xmttime.int_partl ||
> > @@ -374,16 +368,6 @@ client_dispatch(struct ntp_peer *p, u_in
> > T1 = p->query->xmttime;
> > T2 = lfp_to_d(msg.rectime);
> > T3 = lfp_to_d(msg.xmttime);
> > -
> > -   /*
> > -* XXX workaround: time_t / tv_sec must never wrap.
> > -* around 2020 we will need a solution (64bit time_t / tv_sec).
> > -* consider every answer with a timestamp beyond january 2030 bogus.
> > -*/
> > -   if (T2 > JAN_2030 || T3 > JAN_2030) {
> > -   set_next(p, error_interval());
> > -   return (0);
> > -   }
> >  
> > /* Detect liars */
> > if (!p->trusted && conf->constraint_median != 0 &&
> > Index: ntp.h
> > ===
> > RCS file: /cvs/src/usr.sbin/ntpd/ntp.h,v
> > retrieving revision 1.13
> > diff -u -p -r1.13 ntp.h
> > --- ntp.h   22 Apr 2009 07:42:17 -  1.13
> > +++ ntp.h   10 Jan 2020 14:06:14 -
> > @@ -141,7 +141,19 @@ struct ntp_query {
> >  #defineMODE_RES2   7   /* reserved for private use */
> >  
> >  #defineJAN_19702208988800UL/* 1970 - 1900 in seconds */
> > -#defineJAN_20301893456000UL + JAN_1970 /* 1. 1. 2030 00:00:00 
> > */
> > +
> > +/*
> > + * The era we're in if we have no reason to assume otherwise.
> > + * If lfp_to_d() sees an offset <= INT32_MAX the era is is assumed to be
> > + * NTP_ERA + 1.
> > + * Once the actual year is well into era 1, (after 2036) define NTP_ERA to 
> > 1
> > + * and adapt (remove) the test in lfp_to_d().
> > + * Once more than half of era 1 has elapsed (after 2104), re-inroduce the 
> > test
> > + * to move to era 2 if offset <= INT32_MAX, repeat for each half era.
> > + */
> > +#define NTP_ERA0
> > +
> > +#define SECS_IN_ERA(UINT32_MAX + 1ULL)
> >  
> >  #defineNTP_VERSION 4
> >  #defineNTP_MAXSTRATUM  15
> > Index: util.c
> > ===
> > RCS file: /cvs/src/usr.sbin/ntpd/util.c,v
> > retrieving revision 1.24
> > diff -u -p -r1.24 util.c
> > --- util.c  1 Mar 2017 00:56:30 -   1.24
> > +++ util.c  10 Jan 2020 14:06:14 -
> > @@ -86,12 +86,17 @@ d_to_tv(double d, struct timeval *tv)
> >  double
> >  lfp_to_d(struct l_fixedpt lfp)
> >  {
> > -   double  ret;
> > +   double  base, ret;
> >  
> > lfp.int_partl = ntohl(lfp.int_partl);
> > lfp.fractionl = ntohl(lfp.fractionl);
> >  
> > -   ret = (double)(lfp.int_partl) + ((double)lfp.fractionl / UINT_MAX);
> > +   /* see comment in ntp.h */
> > +   base = NTP_ERA;
> > +   if (lfp.int_partl <= INT32_MAX)
> > +   base++; 
> > +   ret = base * SECS_IN_ERA;
> > +   ret += (double)(lfp.int_partl) + ((double)lfp.fractionl / UINT_MAX);
> >  
> > return (ret);
> >  }
> > @@ -101,6 +106,8 @@ d_to_lfp(double d)
> >  {
> > struct l_fixedptlfp;
> >  
> > +   while (d > SECS_IN_ERA)
> > +   d -= SECS_IN_ERA;
> > lfp.int_partl = htonl((u_int32_t)d);
> > lfp.fractionl = htonl((u_int32_t)((d - (u_int32_t)d) * UINT_MAX));
> >  
> > 
> > 
> 



Re: Update disklabel(8) man page

2020-01-28 Thread Otto Moerbeek
On Tue, Jan 28, 2020 at 06:43:38PM +0100, Martin wrote:

> Hi
> 
> Attached a diff to bring the disklabel man page up to date. Information
> taken directly from editor.c line 95.

Committed, thanks,

-Otto

> 
> Best,
> 
> Martin
> 
> Index: disklabel.8
> ===
> RCS file: /cvs/src/sbin/disklabel/disklabel.8,v
> retrieving revision 1.138
> diff -u -p -r1.138 disklabel.8
> --- disklabel.8   19 Dec 2019 09:38:03 -  1.138
> +++ disklabel.8   28 Jan 2020 17:42:08 -
> @@ -529,7 +529,7 @@ and may vary from architecture to archit
>  swap 10% of disk.   80M \(en 2x max physical memory
>  /tmp  8% of disk.  120M \(en 4G
>  /var 13% of disk.   80M \(en 2x size of crash dump
> -/usr 10% of disk. 1300M \(en 6G
> +/usr 10% of disk. 1500M \(en 6G
>  /usr/X11R63% of disk.  384M \(en 1G
>  /usr/local   15% of disk.1G \(en 20G
>  /usr/src  2% of disk. 1300M \(en 2G
> 



Re: snmpd(8) timer.c garbage collect

2020-01-22 Thread Otto Moerbeek
On Wed, Jan 22, 2020 at 10:51:45AM +0100, Martijn van Duren wrote:

> Trying to wrap my head around some of the snmpd code I found this pearl
> that appears to do nothing more than warm up the room.

Do you really want to get rid of the init of snmpd_env->sc_cpustates ?

-Otto

> 
> OK?
> 
> martijn@
> 
> Index: Makefile
> ===
> RCS file: /cvs/src/usr.sbin/snmpd/Makefile,v
> retrieving revision 1.16
> diff -u -p -r1.16 Makefile
> --- Makefile  11 May 2019 17:46:02 -  1.16
> +++ Makefile  22 Jan 2020 09:50:51 -
> @@ -3,7 +3,7 @@
>  PROG=snmpd
>  MAN= snmpd.8 snmpd.conf.5
>  SRCS=parse.y log.c control.c snmpe.c \
> - mps.c trap.c mib.c smi.c kroute.c snmpd.c timer.c \
> + mps.c trap.c mib.c smi.c kroute.c snmpd.c \
>   pf.c proc.c usm.c agentx.c traphandler.c util.c
>  
>  LDADD=   -levent -lutil -lkvm -lcrypto
> Index: snmpd.h
> ===
> RCS file: /cvs/src/usr.sbin/snmpd/snmpd.h,v
> retrieving revision 1.86
> diff -u -p -r1.86 snmpd.h
> --- snmpd.h   2 Jan 2020 10:55:53 -   1.86
> +++ snmpd.h   22 Jan 2020 09:50:51 -
> @@ -745,9 +745,6 @@ unsigned int   smi_application(struct ber
>  void  smi_debug_elements(struct ber_element *);
>  char *smi_print_element(struct ber_element *);
>  
> -/* timer.c */
> -void  timer_init(void);
> -
>  /* snmpd.c */
>  int   snmpd_socket_af(struct sockaddr_storage *, in_port_t, int);
>  u_longsnmpd_engine_time(void);
> Index: snmpe.c
> ===
> RCS file: /cvs/src/usr.sbin/snmpd/snmpe.c,v
> retrieving revision 1.60
> diff -u -p -r1.60 snmpe.c
> --- snmpe.c   24 Oct 2019 12:39:27 -  1.60
> +++ snmpe.c   22 Jan 2020 09:50:51 -
> @@ -103,7 +103,6 @@ snmpe_init(struct privsep *ps, struct pr
>  
>   kr_init();
>   trap_init();
> - timer_init();
>   usm_generate_keys();
>  
>   /* listen for incoming SNMP UDP/TCP messages */
> Index: timer.c
> ===
> RCS file: timer.c
> diff -N timer.c
> --- timer.c   28 Oct 2016 08:01:53 -  1.7
> +++ /dev/null 1 Jan 1970 00:00:00 -
> @@ -1,169 +0,0 @@
> -/*   $OpenBSD: timer.c,v 1.7 2016/10/28 08:01:53 rzalamena Exp $ */
> -
> -/*
> - * Copyright (c) 2008 Reyk Floeter 
> - *
> - * Permission to use, copy, modify, and distribute this software for any
> - * purpose with or without fee is hereby granted, provided that the above
> - * copyright notice and this permission notice appear in all copies.
> - *
> - * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
> - * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
> - * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
> - * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> - * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
> - * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
> - * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
> - */
> -
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -
> -#include "snmpd.h"
> -#include "mib.h"
> -
> -void  timer_cpu(int, short, void *);
> -int   percentages(int, int64_t *, int64_t *, int64_t *, int64_t *);
> -
> -static int64_t   **cp_time;
> -static int64_t   **cp_old;
> -static int64_t   **cp_diff;
> -struct event   cpu_ev;
> -
> -void
> -timer_cpu(int fd, short event, void *arg)
> -{
> - struct event*ev = (struct event *)arg;
> - struct timeval   tv = { 60, 0 };/* every 60 seconds */
> - int  mib[3] = { CTL_KERN, KERN_CPTIME2, 0 }, n;
> - size_t   len;
> - int64_t *cptime2;
> -
> - len = CPUSTATES * sizeof(int64_t);
> - for (n = 0; n < snmpd_env->sc_ncpu; n++) {
> - mib[2] = n;
> - cptime2 = snmpd_env->sc_cpustates + (CPUSTATES * n);
> - if (sysctl(mib, 3, cp_time[n], &len, NULL, 0) == -1)
> - continue;
> - (void)percentages(CPUSTATES, cptime2, cp_time[n],
> - cp_old[n], cp_diff[n]);
> -#ifdef DEBUG
> - log_debug("timer_cpu: cpu%d %lld%% idle in %llds", n,
> - (cptime2[CP_IDLE] > 1000 ?
> - 1000 : (cptime2[CP_IDLE] / 10)), (long long) tv.tv_sec);
> -#endif
> - }
> -
> - evtimer_add(ev, &tv);
> -}
> -
> -void
> -timer_init(void)
> -{
> - int  mib[] = { CTL_HW, HW_NCPU }, i;
> - size_t   len;
> -
> - len = 

Re: ntpd and 2036

2020-01-19 Thread Otto Moerbeek
On Fri, Jan 10, 2020 at 03:14:42PM +0100, Otto Moerbeek wrote:

> Hi,
> 
> The NTP protocol uses 32-bit unsigned timestamps counting seconds
> since 1900. That means that in 2036 the timestamp field will wrap.
> This diff makes sure ntpd handles that correctly by assuming we are
> in era 0 unless we see "small" timestamps.
> 
> Tested in the future (including wrapping from era 0 to 1) on a couple
> of machines, including one running xntpd for interoperability.
> 
> ok?

ping...

-Otto

> 
> Index: client.c
> ===
> RCS file: /cvs/src/usr.sbin/ntpd/client.c,v
> retrieving revision 1.112
> diff -u -p -r1.112 client.c
> --- client.c  10 Nov 2019 19:24:47 -  1.112
> +++ client.c  10 Jan 2020 14:06:14 -
> @@ -324,12 +324,6 @@ client_dispatch(struct ntp_peer *p, u_in
>   }
>   }
>  
> - if (T4 < JAN_1970) {
> - client_log_error(p, "recvmsg control format", EBADF);
> - set_next(p, error_interval());
> - return (0);
> - }
> -
>   ntp_getmsg((struct sockaddr *)>addr->ss, buf, size, );
>  
>   if (msg.orgtime.int_partl != p->query->msg.xmttime.int_partl ||
> @@ -374,16 +368,6 @@ client_dispatch(struct ntp_peer *p, u_in
>   T1 = p->query->xmttime;
>   T2 = lfp_to_d(msg.rectime);
>   T3 = lfp_to_d(msg.xmttime);
> -
> - /*
> -  * XXX workaround: time_t / tv_sec must never wrap.
> -  * around 2020 we will need a solution (64bit time_t / tv_sec).
> -  * consider every answer with a timestamp beyond january 2030 bogus.
> -  */
> - if (T2 > JAN_2030 || T3 > JAN_2030) {
> - set_next(p, error_interval());
> - return (0);
> - }
>  
>   /* Detect liars */
>   if (!p->trusted && conf->constraint_median != 0 &&
> Index: ntp.h
> ===
> RCS file: /cvs/src/usr.sbin/ntpd/ntp.h,v
> retrieving revision 1.13
> diff -u -p -r1.13 ntp.h
> --- ntp.h 22 Apr 2009 07:42:17 -  1.13
> +++ ntp.h 10 Jan 2020 14:06:14 -
> @@ -141,7 +141,19 @@ struct ntp_query {
>  #define  MODE_RES2   7   /* reserved for private use */
>  
>  #define  JAN_19702208988800UL/* 1970 - 1900 in seconds */
> -#define  JAN_20301893456000UL + JAN_1970 /* 1. 1. 2030 00:00:00 
> */
> +
> +/*
> + * The era we're in if we have no reason to assume otherwise.
> + * If lfp_to_d() sees an offset <= INT32_MAX the era is is assumed to be
> + * NTP_ERA + 1.
> + * Once the actual year is well into era 1, (after 2036) define NTP_ERA to 1
> + * and adapt (remove) the test in lfp_to_d().
> + * Once more than half of era 1 has elapsed (after 2104), re-inroduce the 
> test
> + * to move to era 2 if offset <= INT32_MAX, repeat for each half era.
> + */
> +#define NTP_ERA  0
> +
> +#define SECS_IN_ERA  (UINT32_MAX + 1ULL)
>  
>  #define  NTP_VERSION 4
>  #define  NTP_MAXSTRATUM  15
> Index: util.c
> ===
> RCS file: /cvs/src/usr.sbin/ntpd/util.c,v
> retrieving revision 1.24
> diff -u -p -r1.24 util.c
> --- util.c1 Mar 2017 00:56:30 -   1.24
> +++ util.c10 Jan 2020 14:06:14 -
> @@ -86,12 +86,17 @@ d_to_tv(double d, struct timeval *tv)
>  double
>  lfp_to_d(struct l_fixedpt lfp)
>  {
> - double  ret;
> + double  base, ret;
>  
>   lfp.int_partl = ntohl(lfp.int_partl);
>   lfp.fractionl = ntohl(lfp.fractionl);
>  
> - ret = (double)(lfp.int_partl) + ((double)lfp.fractionl / UINT_MAX);
> + /* see comment in ntp.h */
> + base = NTP_ERA;
> + if (lfp.int_partl <= INT32_MAX)
> + base++; 
> + ret = base * SECS_IN_ERA;
> + ret += (double)(lfp.int_partl) + ((double)lfp.fractionl / UINT_MAX);
>  
>   return (ret);
>  }
> @@ -101,6 +106,8 @@ d_to_lfp(double d)
>  {
>   struct l_fixedptlfp;
>  
> + while (d > SECS_IN_ERA)
> + d -= SECS_IN_ERA;
>   lfp.int_partl = htonl((u_int32_t)d);
>   lfp.fractionl = htonl((u_int32_t)((d - (u_int32_t)d) * UINT_MAX));
>  
> 
> 



ntpd and 2036

2020-01-10 Thread Otto Moerbeek
Hi,

The NTP protocol uses 32-bit unsigned timestamps counting seconds
since 1900. That means that in 2036 the timestamp field will wrap.
This diff makes sure ntpd handles that correctly by assuming we are
in era 0 unless we see "small" timestamps.

Tested in the future (including wrapping from era 0 to 1) on a couple
of machines, including one running xntpd for interoperability.
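
To illustrate the rule, here is a minimal standalone sketch in C. It is
not the code from the diff: the helper names ntp_secs_unwrap() and
ntp_secs_to_unix() are made up for this example, and it works on integer
seconds only, while lfp_to_d() in the diff uses doubles. NTP_ERA,
SECS_IN_ERA and JAN_1970 mirror the constants in ntp.h.

```c
#include <assert.h>
#include <stdint.h>

#define NTP_ERA		0			/* default era, as in the diff */
#define SECS_IN_ERA	(UINT32_MAX + 1ULL)	/* 2^32 seconds per era */
#define JAN_1970	2208988800ULL		/* 1970 - 1900 in seconds */

static_assert(SECS_IN_ERA == 4294967296ULL, "an era is 2^32 seconds");

/*
 * Hypothetical helper: map the 32-bit seconds field of an NTP
 * timestamp (already in host byte order) to seconds since 1900,
 * with the era added back in.
 */
static uint64_t
ntp_secs_unwrap(uint32_t secs)
{
	uint64_t era = NTP_ERA;

	/* "small" values are assumed to belong to the next era */
	if (secs <= INT32_MAX)
		era++;
	return era * SECS_IN_ERA + secs;
}

/* Hypothetical helper: convert to Unix time (seconds since 1970). */
static int64_t
ntp_secs_to_unix(uint32_t secs)
{
	return (int64_t)ntp_secs_unwrap(secs) - (int64_t)JAN_1970;
}
```

With NTP_ERA set to 0, a seconds field of 0 comes out as 2^32 seconds
after 1900, which is the first second of era 1 in February 2036, while
values above INT32_MAX stay in era 0.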

ok?

-Otto

Index: client.c
===
RCS file: /cvs/src/usr.sbin/ntpd/client.c,v
retrieving revision 1.112
diff -u -p -r1.112 client.c
--- client.c10 Nov 2019 19:24:47 -  1.112
+++ client.c10 Jan 2020 14:06:14 -
@@ -324,12 +324,6 @@ client_dispatch(struct ntp_peer *p, u_in
}
}
 
-   if (T4 < JAN_1970) {
-   client_log_error(p, "recvmsg control format", EBADF);
-   set_next(p, error_interval());
-   return (0);
-   }
-
ntp_getmsg((struct sockaddr *)>addr->ss, buf, size, );
 
if (msg.orgtime.int_partl != p->query->msg.xmttime.int_partl ||
@@ -374,16 +368,6 @@ client_dispatch(struct ntp_peer *p, u_in
T1 = p->query->xmttime;
T2 = lfp_to_d(msg.rectime);
T3 = lfp_to_d(msg.xmttime);
-
-   /*
-* XXX workaround: time_t / tv_sec must never wrap.
-* around 2020 we will need a solution (64bit time_t / tv_sec).
-* consider every answer with a timestamp beyond january 2030 bogus.
-*/
-   if (T2 > JAN_2030 || T3 > JAN_2030) {
-   set_next(p, error_interval());
-   return (0);
-   }
 
/* Detect liars */
if (!p->trusted && conf->constraint_median != 0 &&
Index: ntp.h
===
RCS file: /cvs/src/usr.sbin/ntpd/ntp.h,v
retrieving revision 1.13
diff -u -p -r1.13 ntp.h
--- ntp.h   22 Apr 2009 07:42:17 -  1.13
+++ ntp.h   10 Jan 2020 14:06:14 -
@@ -141,7 +141,19 @@ struct ntp_query {
 #defineMODE_RES2   7   /* reserved for private use */
 
 #defineJAN_19702208988800UL/* 1970 - 1900 in seconds */
-#defineJAN_20301893456000UL + JAN_1970 /* 1. 1. 2030 00:00:00 
*/
+
+/*
+ * The era we're in if we have no reason to assume otherwise.
+ * If lfp_to_d() sees an offset <= INT32_MAX the era is is assumed to be
+ * NTP_ERA + 1.
+ * Once the actual year is well into era 1, (after 2036) define NTP_ERA to 1
+ * and adapt (remove) the test in lfp_to_d().
+ * Once more than half of era 1 has elapsed (after 2104), re-inroduce the test
+ * to move to era 2 if offset <= INT32_MAX, repeat for each half era.
+ */
+#define NTP_ERA0
+
+#define SECS_IN_ERA(UINT32_MAX + 1ULL)
 
 #defineNTP_VERSION 4
 #defineNTP_MAXSTRATUM  15
Index: util.c
===
RCS file: /cvs/src/usr.sbin/ntpd/util.c,v
retrieving revision 1.24
diff -u -p -r1.24 util.c
--- util.c  1 Mar 2017 00:56:30 -   1.24
+++ util.c  10 Jan 2020 14:06:14 -
@@ -86,12 +86,17 @@ d_to_tv(double d, struct timeval *tv)
 double
 lfp_to_d(struct l_fixedpt lfp)
 {
-   double  ret;
+   double  base, ret;
 
lfp.int_partl = ntohl(lfp.int_partl);
lfp.fractionl = ntohl(lfp.fractionl);
 
-   ret = (double)(lfp.int_partl) + ((double)lfp.fractionl / UINT_MAX);
+   /* see comment in ntp.h */
+   base = NTP_ERA;
+   if (lfp.int_partl <= INT32_MAX)
+   base++; 
+   ret = base * SECS_IN_ERA;
+   ret += (double)(lfp.int_partl) + ((double)lfp.fractionl / UINT_MAX);
 
return (ret);
 }
@@ -101,6 +106,8 @@ d_to_lfp(double d)
 {
struct l_fixedptlfp;
 
+   while (d > SECS_IN_ERA)
+   d -= SECS_IN_ERA;
lfp.int_partl = htonl((u_int32_t)d);
lfp.fractionl = htonl((u_int32_t)((d - (u_int32_t)d) * UINT_MAX));
 




Re: backgrounded ssh, strange terminal behaviour

2019-12-10 Thread Otto Moerbeek
On Tue, Dec 10, 2019 at 01:56:29PM +, Stuart Henderson wrote:

> Not new (it happens in at least 6.6) but I just noticed this.
> 
> If I run some program via ssh command-line ("ssh localhost sleep 60"
> is good enough), then put it in the background (^Z bg), the terminal
> misses about a third of characters typed.
> 
> typed:123456789012345678901234567890
> accepted: 24680246891235780235790
> 
> Should this be treated as something more than "don't do that then"?
> 

AFAICS this only happens when using a multiplexed (via ControlMaster) connection.

-Otto



Re: [PATCH] correcting in-sane ntpd.conf

2019-12-08 Thread Otto Moerbeek
On Sun, Dec 08, 2019 at 11:15:55AM +0100, List wrote:

> Please excuse that I wasted your time. You're absolutely right.
> 
> The only thing that comes to my mind is that one could add something
> like a small notice that tells the new user to maybe alter his ntpd
> constraints to a "TLS-Provider" that resides in his time zone. 
> A good place for that could be the welcoming mail, which already
> describes some first steps. 

Why? You can travel the internet many times around and still be within
the bounds the constraint checking allows. As for response time, google
anycast is pretty good at that.

-Otto

> 
> 
> On Sat, Dec 07, 2019 at 11:25:48AM -0700, Theo de Raadt wrote:
> > >That might be the case. 
> > >The man page creates the impression that my ntpd will carry out a TLS
> > >Handshake with "https://www.google.com". Out of that handshake (because
> > >it is anycast) you get your approximate local time. Which serves as
> > >vague measuring point for answers by the ntp servers that you are
> > >querying. But the suggestion I made is absolutely 100 % wrong.
> > >
> > >Would it be an option to choose another Anycast resolving address ?
> > >For example akami.net ? 
> > 
> > akami.net has no https.
> > maybe you mean akamai.net?  again, no https.
> > 
> > many akamai services come out of less capable caches, not making the
> > same effective certificate promises as the google front-end.  would
> > you notice if an akamai service did a certificate downgrade? not
> > really.  i don't think the proposal is serious.
> > 
> > as a result we use quad9 and google https because their global
> > adjacency is excellent, and then we are avoiding cloudflare https
> > because we added their ticker in the mix (though their anycast ticker
> > is a very weird thing)
> > 
> > >g Stephan
> > >
> > >On Thu, Dec 05, 2019 at 03:03:43PM -0700, Theo de Raadt wrote:
> > >> I guess you don't understand what is going on there.
> > >> 
> > >> List  wrote:
> > >> 
> > >> > Hello, 
> > >> > 
> > >> > here a diff replacing www.google.com as a default time constraint by
> > >> > www.openbsd.org.
> > >> > It is claimed that OpenBSD would have sane and secure defaults. While
> > >> > www.google.com might be secure it ain't sane from a privacy concerned
> > >> > perspective. Therefore the diff. 
> > >> > 
> > >> > Regards,
> > >> > Stephan
> > >> > 
> > >> > Index: etc/ntpd.conf
> > >> > ===
> > >> > RCS file: /cvs/src/etc/ntpd.conf,v
> > >> > retrieving revision 1.16
> > >> > diff -u -p -r1.16 ntpd.conf
> > >> > --- etc/ntpd.conf   6 Nov 2019 19:04:12 -   1.16
> > >> > +++ etc/ntpd.conf   5 Dec 2019 21:36:57 -
> > >> > @@ -8,4 +8,4 @@ sensor *
> > >> >  
> > >> >   constraint from "9.9.9.9"  # quad9 v4 without DNS
> > >> >constraint from "2620:fe::fe"  # quad9 v6 without DNS
> > >> >-constraints from "www.google.com"  # intentionally not 8.8.8.8
> > >> >+constraints from "www.openbsd.org"  # intentionally not Google
> > >> > 
> > >> 
> > >
> 



Re: un-boolean_t i386's pmap

2019-12-05 Thread Otto Moerbeek
On Thu, Dec 05, 2019 at 04:12:01PM +0100, Martin Pieuchot wrote:

> On 05/12/19(Thu) 11:57, Otto Moerbeek wrote:
> > On Thu, Dec 05, 2019 at 12:38:34PM +0100, Martin Pieuchot wrote:
> > 
> > > ok?
> > 
> > I'm no kernel hacker but I really do not see the point.
> 
> Most of the kernel doesn't use any type for boolean.  The exception is
> UVM which uses its own boolean_t.  This type is inconsistently used in
> some pmap(9) functions as well.
> 
> I'm well aware of the arguments in favor of a boolean type as well as
> the arguments against.  I'm not taking any position, I'm striving for
> coherency.
> 
> On top of that, reducing the dependencies between the UVM and pmap
> layers helps to draw a line.  That's what I'm after. 
> 

I grepped the kernel tree and see that uvm and the various pmap
subdirs are indeed the odd ones out. So go ahead, I'll go back to
userland grumbling about the many examples of wrong boolean usage in
mg ;-)

-Otto



Re: un-boolean_t i386's pmap

2019-12-05 Thread Otto Moerbeek
On Thu, Dec 05, 2019 at 12:38:34PM +0100, Martin Pieuchot wrote:

> ok?

I'm no kernel hacker but I really do not see the point.

boolean_t helps to see if a function is supposed to return a boolean
instead of an error code. I hate reading a function and having to
guess if 0 is supposed to mean success or not.
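
A small sketch of the two conventions side by side, to show where the
ambiguity comes from. Everything here is invented for illustration:
the table, entry_get() and entry_exists() are hypothetical, and
boolean_t is a local stand-in typedef rather than the kernel's
definition.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef int boolean_t;		/* stand-in for the kernel type */
#ifndef TRUE
#define TRUE	1
#define FALSE	0
#endif

/* Hypothetical lookup table used only for this illustration. */
static const int table[] = { 10, 20, 30 };
#define NENTRIES	(sizeof(table) / sizeof(table[0]))

/* Error-code convention: 0 means success, nonzero is an errno value. */
static int
entry_get(size_t idx, int *out)
{
	assert(out != NULL);
	if (idx >= NENTRIES)
		return ENOENT;
	*out = table[idx];
	return 0;
}

/* Boolean convention: the return type says that nonzero means "yes". */
static boolean_t
entry_exists(size_t idx)
{
	return idx < NENTRIES ? TRUE : FALSE;
}
```

Both functions return an int underneath, but a caller writing
"if (entry_get(i, &v))" is testing for failure, while
"if (entry_exists(i))" is testing for success. Only the declared
return type tells the reader which one applies.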

-Otto




> 
> Index: i386/pmap.c
> ===
> RCS file: /cvs/src/sys/arch/i386/i386/pmap.c,v
> retrieving revision 1.204
> diff -u -p -r1.204 pmap.c
> --- i386/pmap.c   18 Jan 2019 01:34:50 -  1.204
> +++ i386/pmap.c   5 Dec 2019 11:23:20 -
> @@ -403,7 +403,7 @@ int pmap_pg_wc = PG_UCMINUS;
>   */
>  
>  uint32_t protection_codes[8];/* maps MI prot to i386 prot 
> code */
> -boolean_t pmap_initialized = FALSE;  /* pmap_init done yet? */
> +int pmap_initialized = 0;/* pmap_init done yet? */
>  
>  /*
>   * MULTIPROCESSOR: special VAs/ PTEs are actually allocated inside a
> @@ -1120,7 +1120,7 @@ pmap_init(void)
>* done: pmap module is up (and ready for business)
>*/
>  
> - pmap_initialized = TRUE;
> + pmap_initialized = 1;
>  }
>  
>  /*
> @@ -1525,7 +1525,7 @@ pmap_deactivate(struct proc *p)
>   * pmap_extract: extract a PA for the given VA
>   */
>  
> -boolean_t
> +int
>  pmap_extract_86(struct pmap *pmap, vaddr_t va, paddr_t *pap)
>  {
>   pt_entry_t *ptes, pte;
> @@ -1535,12 +1535,12 @@ pmap_extract_86(struct pmap *pmap, vaddr
>   pte = ptes[atop(va)];
>   pmap_unmap_ptes_86(pmap);
>   if (!pmap_valid_entry(pte))
> - return (FALSE);
> + return 0;
>   if (pap != NULL)
>   *pap = (pte & PG_FRAME) | (va & ~PG_FRAME);
> - return (TRUE);
> + return 1;
>   }
> - return (FALSE);
> + return 0;
>  }
>  
>  /*
> @@ -1594,7 +1594,7 @@ pmap_zero_phys_86(paddr_t pa)
>   * pmap_zero_page_uncached: the same, except uncached.
>   */
>  
> -boolean_t
> +int
>  pmap_zero_page_uncached_86(paddr_t pa)
>  {
>  #ifdef MULTIPROCESSOR
> @@ -1613,7 +1613,7 @@ pmap_zero_page_uncached_86(paddr_t pa)
>   pagezero(zerova, PAGE_SIZE);/* zero */
>   *zpte = 0;
>  
> - return (TRUE);
> + return 1;
>  }
>  
>  /*
> @@ -2009,7 +2009,7 @@ pmap_page_remove_86(struct vm_page *pg)
>   * pmap_test_attrs: test a page's attributes
>   */
>  
> -boolean_t
> +int
>  pmap_test_attrs_86(struct vm_page *pg, int testbits)
>  {
>   struct pv_entry *pve;
> @@ -2020,7 +2020,7 @@ pmap_test_attrs_86(struct vm_page *pg, i
>   testflags = pmap_pte2flags(testbits);
>  
>   if (pg->pg_flags & testflags)
> - return (TRUE);
> + return 1;
>  
>   mybits = 0;
>   mtx_enter(>mdpage.pv_mtx);
> @@ -2035,20 +2035,20 @@ pmap_test_attrs_86(struct vm_page *pg, i
>   mtx_leave(>mdpage.pv_mtx);
>  
>   if (mybits == 0)
> - return (FALSE);
> + return 0;
>  
>   atomic_setbits_int(>pg_flags, pmap_pte2flags(mybits));
>  
> - return (TRUE);
> + return 1;
>  }
>  
>  /*
>   * pmap_clear_attrs: change a page's attributes
>   *
> - * => we return TRUE if we cleared one of the bits we were asked to
> + * => we return 1 if we cleared one of the bits we were asked to
>   */
>  
> -boolean_t
> +int
>  pmap_clear_attrs_86(struct vm_page *pg, int clearbits)
>  {
>   struct pv_entry *pve;
> @@ -2075,7 +2075,7 @@ pmap_clear_attrs_86(struct vm_page *pg, 
>  
>   opte = ptes[ptei(pve->pv_va)];
>   if (opte & clearbits) {
> - result = TRUE;
> + result = 1;
>   i386_atomic_clearbits_l([ptei(pve->pv_va)],
>   (opte & clearbits));
>   pmap_tlb_shootpage(pve->pv_pmap, pve->pv_va);
> @@ -2276,9 +2276,9 @@ pmap_enter_86(struct pmap *pmap, vaddr_t
>   pt_entry_t *ptes, opte, npte;
>   struct vm_page *ptp;
>   struct pv_entry *pve, *opve = NULL;
> - boolean_t wired = (flags & PMAP_WIRED) != 0;
> - boolean_t nocache = (pa & PMAP_NOCACHE) != 0;
> - boolean_t wc = (pa & PMAP_WC) != 0;
> + int wired = (flags & PMAP_WIRED) != 0;
> + int nocache = (pa & PMAP_NOCACHE) != 0;
> + int wc = (pa & PMAP_WC) != 0;
>   struct vm_page *pg = NULL;
>   int error, wired_count, resident_count, ptp_count;
>  
> @@ -2449,7 +2449,7 @@ enter_now:
>   npte |= PG_PVLIST;
>   if (pg->pg_flags & PG_PMAP_WC) {
>   KASSERT(nocache == 0);
> - wc = TRUE;
> + wc = 1;
>   }
>   pmap_sync_flags_pte_86(pg, npte);
>   }
> @@ -2618,7 +2618,7 @@ pmap_growkernel_86(vaddr_t maxkvaddr)
>  
>   for (/*null*/ ; nkpde < needed_kpde ; nkpde++) {
>  
> - if (uvm.page_init_done == FALSE) {
> + if (uvm.page_init_done == 0) {
>  
>   

Re: unwind and split-horizon DNS

2019-12-01 Thread Otto Moerbeek
On Sat, Nov 30, 2019 at 08:39:36AM +0100, Otto Moerbeek wrote:

> On Fri, Nov 29, 2019 at 11:37:40PM +0100, Björn Ketelaars wrote:
> 
> > On Fri 29/11/2019 21:35, Otto Moerbeek wrote:
> > > On Fri, Nov 29, 2019 at 10:27:57AM +0100, Florian Obser wrote:
> > > 
> > > > On Fri, Nov 29, 2019 at 07:28:20AM +0100, Otto Moerbeek wrote:
> > > > > On Fri, Nov 29, 2019 at 07:02:27AM +0100, Björn Ketelaars wrote:
> > > > > > I experienced no regression while using the free wifi service of the
> > > > > > Dutch railways, which is known to do strange things with DNS.
> > > > > 
> > > > > Thanks for testing. The Dutch railways have been a great inspiration
> > > > > to unwind work, as florian@ can tell you :-)
> > > > 
> > > > They have got to be good at *something*. Not sure if it's their core
> > > > business to annoy the hell out of me, but hey...
> > > > 
> > > > Only joking, overall I'm quite happy with the Dutch railway. I use
> > > > them every work day and they get me where I need to go most of the
> > > > time.
> > > > 
> > > > -- 
> > > > I'm not entirely sure you are real.
> > > > 
> > > 
> > > And here's a rebased diff for your convenience,
> > 
> > The rebased diff results in a different behaviour than the first diff.
> > More precisely, 'force acceptbogus forwarder' is not respected any more
> > resulting in issues with DNSSEC.
> > 
> > I compared the old- and the rebased diff and noticed that some bits have
> > been left out. Functionality is restored after applying the diff below.
> > 
> > 
> > diff --git sbin/unwind/unwind.c sbin/unwind/unwind.c
> > index 5a97dcccec4..4687a7cc122 100644
> > --- sbin/unwind/unwind.c
> > +++ sbin/unwind/unwind.c
> > @@ -675,6 +675,12 @@ merge_config(struct uw_conf *conf, struct uw_conf 
> > *xconf)
> > uw_forwarder, entry);
> > }
> >  
> > +   for (n = RB_MIN(force_tree, >force); n != NULL; n = nxt) {
> > +   nxt = RB_NEXT(force_tree, >force, n);
> > +   RB_REMOVE(force_tree, >force, n);
> > +   RB_INSERT(force_tree, >force, n);
> > +   }
> > +
> > free(xconf);
> >  }
> >  
> 
> Thanks for spotting that. I did myself as well, but sent the wrong
> version... Below the full corrected diff.

Diff has been committed with one grammar change: acceptbogus is now
two words.

Thanks to the testers,

-Otto



Re: unwind and split-horizon DNS

2019-11-29 Thread Otto Moerbeek
On Fri, Nov 29, 2019 at 11:37:40PM +0100, Björn Ketelaars wrote:

> On Fri 29/11/2019 21:35, Otto Moerbeek wrote:
> > On Fri, Nov 29, 2019 at 10:27:57AM +0100, Florian Obser wrote:
> > 
> > > On Fri, Nov 29, 2019 at 07:28:20AM +0100, Otto Moerbeek wrote:
> > > > On Fri, Nov 29, 2019 at 07:02:27AM +0100, Björn Ketelaars wrote:
> > > > > I experienced no regression while using the free wifi service of the
> > > > > Dutch railways, which is known to do strange things with DNS.
> > > > 
> > > > Thanks for testing. The Dutch railways have been a great inspiration
> > > > to unwind work, as florian@ can tell you :-)
> > > 
> > > They have got to be good at *something*. Not sure if it's their core
> > > business to annoy the hell out of me, but hey...
> > > 
> > > Only joking, overall I'm quite happy with the Dutch railway. I use
> > > them every work day and they get me where I need to go most of the
> > > time.
> > > 
> > > -- 
> > > I'm not entirely sure you are real.
> > > 
> > 
> > And here's a rebased diff for your convenience,
> 
> The rebased diff results in a different behaviour than the first diff.
> More precisely, 'force acceptbogus forwarder' is not respected any more
> resulting in issues with DNSSEC.
> 
> I compared the old- and the rebased diff and noticed that some bits have
> been left out. Functionality is restored after applying the diff below.
> 
> 
> diff --git sbin/unwind/unwind.c sbin/unwind/unwind.c
> index 5a97dcccec4..4687a7cc122 100644
> --- sbin/unwind/unwind.c
> +++ sbin/unwind/unwind.c
> @@ -675,6 +675,12 @@ merge_config(struct uw_conf *conf, struct uw_conf *xconf)
>   uw_forwarder, entry);
>   }
>  
> + for (n = RB_MIN(force_tree, >force); n != NULL; n = nxt) {
> + nxt = RB_NEXT(force_tree, >force, n);
> + RB_REMOVE(force_tree, >force, n);
> + RB_INSERT(force_tree, >force, n);
> + }
> +
>   free(xconf);
>  }
>  

Thanks for spotting that. I did so myself as well, but sent the wrong
version... Below is the full corrected diff.

-Otto

Index: frontend.c
===
RCS file: /cvs/src/sbin/unwind/frontend.c,v
retrieving revision 1.41
diff -u -p -r1.41 frontend.c
--- frontend.c  29 Nov 2019 15:22:02 -  1.41
+++ frontend.c  29 Nov 2019 20:32:43 -
@@ -336,6 +336,7 @@ frontend_dispatch_main(int fd, short eve
case IMSG_RECONF_BLOCKLIST_FILE:
case IMSG_RECONF_FORWARDER:
case IMSG_RECONF_DOT_FORWARDER:
+   case IMSG_RECONF_FORCE:
imsg_receive_config(, );
break;
case IMSG_RECONF_END:
Index: parse.y
===
RCS file: /cvs/src/sbin/unwind/parse.y,v
retrieving revision 1.20
diff -u -p -r1.20 parse.y
--- parse.y 28 Nov 2019 10:02:44 -  1.20
+++ parse.y 29 Nov 2019 20:32:43 -
@@ -90,8 +90,9 @@ struct sockaddr_storage   *host_ip(const c
 
 typedef struct {
union {
-   int64_t  number;
-   char*string;
+   int64_t  number;
+   char*string;
+   struct force_treeforce;
} v;
int lineno;
 } YYSTYPE;
@@ -101,12 +102,13 @@ typedef struct {
 %token INCLUDE ERROR
 %token FORWARDER DOT PORT 
 %token AUTHENTICATION NAME PREFERENCE RECURSOR DHCP STUB
-%token BLOCK LIST LOG
+%token BLOCK LIST LOG FORCE ACCEPTBOGUS
 
 %token   STRING
 %token   NUMBER
-%typeyesno port dot prefopt log
+%typeyesno port dot prefopt log acceptbogus
 %typestring authname
+%type force_list
 
 %%
 
@@ -117,6 +119,7 @@ grammar : /* empty */
| grammar uw_pref '\n'
| grammar uw_forwarder '\n'
| grammar block_list '\n'
+   | grammar force '\n'
| grammar error '\n'{ file->errors++; }
;
 
@@ -311,6 +314,63 @@ dot:   DOT { $$ = 
DOT; }
 log:   LOG { $$ = 1; }
|   /* empty */ { $$ = 0; }
;
+
+force  :   FORCE acceptbogus prefopt '{' force_list optnl '}' {
+   struct force_tree_entry *n, *nxt;
+   int error = 0;
+
+   for (n = RB_MIN(force_tree, &$5); n != NULL;
+   n = nxt) {
+   nxt = RB_NEXT(force_tree, >force, n);
+   

Re: unwind and split-horizon DNS

2019-11-29 Thread Otto Moerbeek
On Fri, Nov 29, 2019 at 10:27:57AM +0100, Florian Obser wrote:

> On Fri, Nov 29, 2019 at 07:28:20AM +0100, Otto Moerbeek wrote:
> > On Fri, Nov 29, 2019 at 07:02:27AM +0100, Björn Ketelaars wrote:
> > > I experienced no regression while using the free wifi service of the
> > > Dutch railways, which is known to do strange things with DNS.
> > 
> > Thanks for testing. The Dutch railways have been a great inspiration
> > to unwind work, as florian@ can tell you :-)
> 
> They have got to be good at *something*. Not sure if it's their core
> business to annoy the hell out of me, but hey...
> 
> Only joking, overall I'm quite happy with the Dutch railway. I use
> them every work day and they get me where I need to go most of the
> time.
> 
> -- 
> I'm not entirely sure you are real.
> 

And here's a rebased diff for your convenience,

-Otto

Index: frontend.c
===
RCS file: /cvs/src/sbin/unwind/frontend.c,v
retrieving revision 1.41
diff -u -p -r1.41 frontend.c
--- frontend.c  29 Nov 2019 15:22:02 -  1.41
+++ frontend.c  29 Nov 2019 20:30:19 -
@@ -336,6 +336,7 @@ frontend_dispatch_main(int fd, short eve
case IMSG_RECONF_BLOCKLIST_FILE:
case IMSG_RECONF_FORWARDER:
case IMSG_RECONF_DOT_FORWARDER:
+   case IMSG_RECONF_FORCE:
imsg_receive_config(, );
break;
case IMSG_RECONF_END:
Index: parse.y
===
RCS file: /cvs/src/sbin/unwind/parse.y,v
retrieving revision 1.20
diff -u -p -r1.20 parse.y
--- parse.y 28 Nov 2019 10:02:44 -  1.20
+++ parse.y 29 Nov 2019 20:30:19 -
@@ -90,8 +90,9 @@ struct sockaddr_storage   *host_ip(const c
 
 typedef struct {
union {
-   int64_t  number;
-   char*string;
+   int64_t  number;
+   char*string;
+   struct force_treeforce;
} v;
int lineno;
 } YYSTYPE;
@@ -101,12 +102,13 @@ typedef struct {
 %token INCLUDE ERROR
 %token FORWARDER DOT PORT 
 %token AUTHENTICATION NAME PREFERENCE RECURSOR DHCP STUB
-%token BLOCK LIST LOG
+%token BLOCK LIST LOG FORCE ACCEPTBOGUS
 
 %token   STRING
 %token   NUMBER
-%typeyesno port dot prefopt log
+%typeyesno port dot prefopt log acceptbogus
 %typestring authname
+%type force_list
 
 %%
 
@@ -117,6 +119,7 @@ grammar : /* empty */
| grammar uw_pref '\n'
| grammar uw_forwarder '\n'
| grammar block_list '\n'
+   | grammar force '\n'
| grammar error '\n'{ file->errors++; }
;
 
@@ -311,6 +314,63 @@ dot:   DOT { $$ = 
DOT; }
 log:   LOG { $$ = 1; }
|   /* empty */ { $$ = 0; }
;
+
+force  :   FORCE acceptbogus prefopt '{' force_list optnl '}' {
+   struct force_tree_entry *n, *nxt;
+   int error = 0;
+
+   for (n = RB_MIN(force_tree, &$5); n != NULL;
+   n = nxt) {
+   nxt = RB_NEXT(force_tree, >force, n);
+   n->acceptbogus = $2;
+   n->type = $3;
+   RB_REMOVE(force_tree, &$5, n);
+   if (RB_INSERT(force_tree, >force,
+   n)) {
+   yyerror("%s already in an force "
+   "list", n->domain);
+   error = 1;
+   }
+   }
+   if (error)
+   YYERROR;
+   }
+   ;
+
+acceptbogus:   ACCEPTBOGUS { $$ = 1; }
+   |   /* empty */ { $$ = 0; }
+   ;
+
+force_list:force_list optnl STRING {
+   struct force_tree_entry *e;
+   size_t   len;
+
+   len = strlen($3);
+   e = malloc(sizeof(*e));
+   if (e == NULL)
+   err(1, NULL);
+   if (strlcpy(e->domain, $3, sizeof(e->domain)) >=
+   sizeof(e->domain)) {
+   yyerror("force %s too long", $3);
+   free($3);
+   YYERROR;
+   }
+   

Re: unwind and split-horizon DNS

2019-11-28 Thread Otto Moerbeek
On Fri, Nov 29, 2019 at 07:02:27AM +0100, Björn Ketelaars wrote:

> On Thu 28/11/2019 16:16, Otto Moerbeek wrote:
> > On Thu, Nov 28, 2019 at 03:26:34PM +0100, Otto Moerbeek wrote:
> > 
> > > Hi,
> > > 
> > > In many offices, split horizon DNS is used. This means that if you are
> > > in the office you are supposed to use a specific resolver that will
> > > hand out different results than when asking for the same name on the
> > > rest of the internet.
> > > 
> > > Until now unwind could not really handle that, e.g. in recursing mode,
> > > it would produce the view as from outside of the office. 
> > > 
> > > With this diff, it becomes possible to force using a specific resolver
> > > when resolving names in specific domains.
> > > 
> > > For example, with this unwind.conf:
> > > 
> > > # Office forwarder
> > > forwarder 1.2.3.4 
> > > force forwarder {
> > >   myoffice.com
> > >   dmz.colocation.com
> > > }
> > > 
> > > This will make unwind always use the mentioned forwarder for anything
> > > under myoffice.com or dmz.colocation.com. If the forwarder is dead,
> > > regular resolving is done for these names and www.myoffice.com will
> > > likely return the external address.
> > > 
> > > Often split-horizon DNS breaks DNSSEC for these specific domains. If
> > > that is the case, you can use
> > > 
> > > force acceptbogus forwarder { 
> > >   ... 
> > > }
> > > 
> > > please test this,
> > > 
> > >   -Otto
> > > 
> > > OAIndex: frontend.c
> > 
> > Don't know where that OA is coming from, but it confuses patch, making
> > it skip the first part of the diff. Proper diff below:
> 
> @Home I'm redirecting all DNS requests to a machine with unbound serving
> a couple of local-zones. unwind didn't work for me as these local-zones
> would not resolve because of DNSSEC. With your diff, and the config
> below unwind works perfect.
> 
> forwarder 10.0.0.1
> force acceptbogus forwarder {
>   lan
> }
> 
> I experienced no regression while using the free wifi service of the
> Dutch railways, which is known to do strange things with DNS.

Thanks for testing. The Dutch railways have been a great inspiration
to unwind work, as florian@ can tell you :-)

-Otto



Re: unwind and split-horizon DNS

2019-11-28 Thread Otto Moerbeek
On Thu, Nov 28, 2019 at 03:26:34PM +0100, Otto Moerbeek wrote:

> Hi,
> 
> In many offices, split horizon DNS is used. This means that if you are
> in the office you are supposed to use a specific resolver that will
> hand out different results than when asking for the same name on the
> rest of the internet.
> 
> Until now unwind could not really handle that, e.g. in recursing mode,
> it would produce the view as from outside of the office. 
> 
> With this diff, it becomes possible to force using a specific resolver
> when resolving names in specific domains.
> 
> For example, with this unwind.conf:
> 
> # Office forwarder
> forwarder 1.2.3.4 
> force forwarder {
>   myoffice.com
>   dmz.colocation.com
> }
> 
> This will make unwind always use the mentioned forwarder for anything
> under myoffice.com or dmz.colocation.com. If the forwarder is dead,
> regular resolving is done for these names and www.myoffice.com will
> likely return the external address.
> 
> Often split-horizon DNS breaks DNSSEC for these specific domains. If
> that is the case, you can use
> 
> force acceptbogus forwarder { 
>   ... 
> }
> 
> please test this,
> 
>   -Otto
> 
> OAIndex: frontend.c

Don't know where that OA is coming from, but it confuses patch, making
it skip the first part of the diff. Proper diff below:

-Otto

Index: frontend.c
===
RCS file: /cvs/src/sbin/unwind/frontend.c,v
retrieving revision 1.40
diff -u -p -r1.40 frontend.c
--- frontend.c  27 Nov 2019 17:09:12 -  1.40
+++ frontend.c  28 Nov 2019 14:24:17 -
@@ -336,6 +336,7 @@ frontend_dispatch_main(int fd, short eve
case IMSG_RECONF_BLOCKLIST_FILE:
case IMSG_RECONF_FORWARDER:
case IMSG_RECONF_DOT_FORWARDER:
+   case IMSG_RECONF_FORCE:
imsg_receive_config(, );
break;
case IMSG_RECONF_END:
Index: parse.y
===
RCS file: /cvs/src/sbin/unwind/parse.y,v
retrieving revision 1.20
diff -u -p -r1.20 parse.y
--- parse.y 28 Nov 2019 10:02:44 -  1.20
+++ parse.y 28 Nov 2019 14:24:17 -
@@ -90,8 +90,9 @@ struct sockaddr_storage   *host_ip(const c
 
 typedef struct {
union {
-   int64_t  number;
-   char*string;
+   int64_t  number;
+   char*string;
+   struct force_treeforce;
} v;
int lineno;
 } YYSTYPE;
@@ -101,12 +102,13 @@ typedef struct {
 %token INCLUDE ERROR
 %token FORWARDER DOT PORT 
 %token AUTHENTICATION NAME PREFERENCE RECURSOR DHCP STUB
-%token BLOCK LIST LOG
+%token BLOCK LIST LOG FORCE ACCEPTBOGUS
 
 %token   STRING
 %token   NUMBER
-%typeyesno port dot prefopt log
+%typeyesno port dot prefopt log acceptbogus
 %typestring authname
+%type force_list
 
 %%
 
@@ -117,6 +119,7 @@ grammar : /* empty */
| grammar uw_pref '\n'
| grammar uw_forwarder '\n'
| grammar block_list '\n'
+   | grammar force '\n'
| grammar error '\n'{ file->errors++; }
;
 
@@ -311,6 +314,63 @@ dot:   DOT { $$ = 
DOT; }
 log:   LOG { $$ = 1; }
|   /* empty */ { $$ = 0; }
;
+
+force  :   FORCE acceptbogus prefopt '{' force_list optnl '}' {
+   struct force_tree_entry *n, *nxt;
+   int error = 0;
+
+   for (n = RB_MIN(force_tree, &$5); n != NULL;
+   n = nxt) {
+   nxt = RB_NEXT(force_tree, >force, n);
+   n->acceptbogus = $2;
+   n->type = $3;
+   RB_REMOVE(force_tree, &$5, n);
+   if (RB_INSERT(force_tree, >force,
+   n)) {
+   yyerror("%s already in an force "
+   "list", n->domain);
+   error = 1;
+   }
+   }
+   if (error)
+   YYERROR;
+   }
+   ;
+
+acceptbogus:   ACCEPTBOGUS { $$ = 1; }
+   |   /* empty */ { $$ = 0; }
+   ;
+
+force_list:force_list optnl STRING {
+   struct force_tree_entry *e;
+   size_t   len;
+
+

unwind and split-horizon DNS

2019-11-28 Thread Otto Moerbeek
Hi,

In many offices, split horizon DNS is used. This means that if you are
in the office you are supposed to use a specific resolver that will
hand out different results than when asking for the same name on the
rest of the internet.

Until now unwind could not really handle that, e.g. in recursing mode,
it would produce the view as from outside of the office. 

With this diff, it becomes possible to force using a specific resolver
when resolving names in specific domains.

For example, with this unwind.conf:

# Office forwarder
forwarder 1.2.3.4 
force forwarder {
myoffice.com
dmz.colocation.com
}

This will make unwind always use the mentioned forwarder for anything
under myoffice.com or dmz.colocation.com. If the forwarder is dead,
regular resolving is done for these names and www.myoffice.com will
likely return the external address.
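
What "anything under" a domain means can be sketched as a suffix match
on label boundaries. The helper below is hypothetical and is not taken
from the diff (which keeps the forced domains in an RB tree); it only
illustrates the matching rule, assuming names are compared
case-insensitively and written without a trailing dot.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/*
 * Hypothetical helper, not part of unwind: return 1 if name equals
 * domain or is a subdomain of it, 0 otherwise.
 */
static int
name_in_domain(const char *name, const char *domain)
{
	size_t nlen, dlen;

	assert(name != NULL && domain != NULL);
	nlen = strlen(name);
	dlen = strlen(domain);
	if (dlen == 0 || nlen < dlen)
		return 0;
	/* the tail of name must match domain, ignoring case */
	if (strcasecmp(name + nlen - dlen, domain) != 0)
		return 0;
	/* either an exact match, or the suffix starts right after a dot */
	return nlen == dlen || name[nlen - dlen - 1] == '.';
}
```

With the example config, www.myoffice.com and host.dmz.colocation.com
would match and go to the forwarder, while notmyoffice.com would not,
because the match has to start at a label boundary.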

Often split-horizon DNS breaks DNSSEC for these specific domains. If
that is the case, you can use

force acceptbogus forwarder { 
... 
}

please test this,

-Otto

Index: frontend.c
===
RCS file: /cvs/src/sbin/unwind/frontend.c,v
retrieving revision 1.40
diff -u -p -r1.40 frontend.c
--- frontend.c  27 Nov 2019 17:09:12 -  1.40
+++ frontend.c  28 Nov 2019 14:24:17 -
@@ -336,6 +336,7 @@ frontend_dispatch_main(int fd, short eve
case IMSG_RECONF_BLOCKLIST_FILE:
case IMSG_RECONF_FORWARDER:
case IMSG_RECONF_DOT_FORWARDER:
+   case IMSG_RECONF_FORCE:
imsg_receive_config(&imsg, &nconf);
break;
case IMSG_RECONF_END:
Index: parse.y
===
RCS file: /cvs/src/sbin/unwind/parse.y,v
retrieving revision 1.20
diff -u -p -r1.20 parse.y
--- parse.y 28 Nov 2019 10:02:44 -  1.20
+++ parse.y 28 Nov 2019 14:24:17 -
@@ -90,8 +90,9 @@ struct sockaddr_storage   *host_ip(const c
 
 typedef struct {
union {
-   int64_t  number;
-   char*string;
+   int64_t  number;
+   char*string;
+   struct force_treeforce;
} v;
int lineno;
 } YYSTYPE;
@@ -101,12 +102,13 @@ typedef struct {
 %token INCLUDE ERROR
 %token FORWARDER DOT PORT 
 %token AUTHENTICATION NAME PREFERENCE RECURSOR DHCP STUB
-%token BLOCK LIST LOG
+%token BLOCK LIST LOG FORCE ACCEPTBOGUS
 
 %token <v.string> STRING
 %token <v.number> NUMBER
-%type <v.number> yesno port dot prefopt log
+%type <v.number> yesno port dot prefopt log acceptbogus
 %type <v.string> string authname
+%type <v.force> force_list
 
 %%
 
@@ -117,6 +119,7 @@ grammar : /* empty */
| grammar uw_pref '\n'
| grammar uw_forwarder '\n'
| grammar block_list '\n'
+   | grammar force '\n'
| grammar error '\n'{ file->errors++; }
;
 
@@ -311,6 +314,63 @@ dot:   DOT { $$ = DOT; }
 log:   LOG { $$ = 1; }
|   /* empty */ { $$ = 0; }
;
+
+force  :   FORCE acceptbogus prefopt '{' force_list optnl '}' {
+   struct force_tree_entry *n, *nxt;
+   int error = 0;
+
+   for (n = RB_MIN(force_tree, &$5); n != NULL;
+   n = nxt) {
+   nxt = RB_NEXT(force_tree, &conf->force, n);
+   n->acceptbogus = $2;
+   n->type = $3;
+   RB_REMOVE(force_tree, &$5, n);
+   if (RB_INSERT(force_tree, &conf->force,
+   n)) {
+   yyerror("%s already in an force "
+   "list", n->domain);
+   error = 1;
+   }
+   }
+   if (error)
+   YYERROR;
+   }
+   ;
+
+acceptbogus:   ACCEPTBOGUS { $$ = 1; }
+   |   /* empty */ { $$ = 0; }
+   ;
+
+force_list:force_list optnl STRING {
+   struct force_tree_entry *e;
+   size_t   len;
+
+   len = strlen($3);
+   e = malloc(sizeof(*e));
+   if (e == NULL)
+   err(1, NULL);
+   if (strlcpy(e->domain, $3, sizeof(e->domain)) >=
+   sizeof(e->domain)) {
+   yyerror("force %s too long", $3);
+   free($3);
+   YYERROR;
+   }
+  

Re: unwind: log missing config file

2019-11-19 Thread Otto Moerbeek
On Tue, Nov 19, 2019 at 03:53:16AM +0100, Florian Obser wrote:

> On Tue, Nov 19, 2019 at 12:15:34AM +0100, Klemens Nanni wrote:
> > On Mon, Nov 18, 2019 at 10:19:47PM +0100, Klemens Nanni wrote:
> > > With that, my initial case is no longer misleading;  alternatively, I
> > > can implement the dash semantic, but that's another diff.
> > Hm, that makes the default setup (no /etc/unwind.conf, empty
> > unwind_flags) always print a warning, which is ugly.
> 
> It's not just ugly, it's misleading. We want unwind to be the best it
> can be *without* a config file. We don't want to suggest a config file. We
> want to steer people away from pushing buttons, not towards.
> 
> Btw. I'm not sure the parser can work with stdin, I think it needs to
> seek, which you can't do on stdin? But I might be mistaken on multiple
> levels.

yacc generates a one-token lookahead parser: it needs to look at one
extra token at most to decide what to do next, and it keeps track of
the last token seen itself. The lexers we use need one character of
lookahead, with an occasional ungetc. stdio is guaranteed to handle
that without issues for any stream source. As for parsers or lexers
needing seek, the thought alone! ;-)
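
To illustrate the one-character lookahead, here is a minimal sketch of a lexer loop. It is not the lexer from parse.y; next_token() and its token rules (words plus single '{' and '}' characters) are made up for the example. It only uses getc() and a single ungetc(), which is why the source of the stream does not matter.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/*
 * Hypothetical mini-lexer: read the next token from fp into buf.
 * A token is either a single '{' or '}' or a run of other
 * non-space characters.  Returns the token length, or -1 at end
 * of input.  Words longer than the buffer are silently truncated.
 */
int
next_token(FILE *fp, char *buf, size_t bufsz)
{
	size_t len = 0;
	int c;

	assert(fp != NULL && buf != NULL && bufsz >= 2);

	/* skip leading whitespace */
	while ((c = getc(fp)) != EOF && isspace(c))
		continue;
	if (c == EOF)
		return -1;

	buf[len++] = (char)c;
	if (c != '{' && c != '}') {
		/* word: read until a character that is not part of it */
		while ((c = getc(fp)) != EOF && !isspace(c) &&
		    c != '{' && c != '}') {
			if (len + 1 < bufsz)
				buf[len++] = (char)c;
		}
		/* we read one character too far: push it back */
		if (c != EOF)
			ungetc(c, fp);
	}
	buf[len] = '\0';
	return (int)len;
}
```

On input like "forwarder{" the '{' that ends the word is pushed back and returned as its own token on the next call. That single pushed-back character is all the lexer ever needs from stdio.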

-Otto



Re: [PATCH] [src] bin/ed/README - fix quote/comma

2019-11-17 Thread Otto Moerbeek
On Sun, Nov 17, 2019 at 08:31:00AM +, Raf Czlonka wrote:

> Hi all,
> 
> Pretty straightforward - comma snuck in inside the quoted book title.

This is how I learned it. Maybe a bit old-fashioned, but not wrong.

-Otto


> 
> Regards,
> 
> Raf
> 
> Index: bin/ed/README
> ===
> RCS file: /cvs/src/bin/ed/README,v
> retrieving revision 1.5
> diff -u -p -r1.5 README
> --- bin/ed/README 15 Jun 2018 08:46:24 -  1.5
> +++ bin/ed/README 17 Nov 2019 08:29:14 -
> @@ -16,4 +16,4 @@ The ./test directory contains regression
>  file in that directory explains how to run these.
>  
>  For a description of the ed algorithm, see Kernighan and Plauger's book
> -"Software Tools in Pascal," Addison-Wesley, 1981.
> +"Software Tools in Pascal", Addison-Wesley, 1981.
> 



Re: unwind(8): refactor & simplify refcounting

2019-11-13 Thread Otto Moerbeek
On Tue, Nov 12, 2019 at 05:45:38PM +0100, Florian Obser wrote:

> Did I get this right? I'd appreciate it if someone could give this a
> once over.
> 
> Since resolve() switched to a callback mechanism all uw_resolver objects
> pass through resolve() and either asr_resolve_done() or
> ub_resolve_done().
> With that we can pull resolver_ref() and resolver_unref() into those
> functions to make the reference counting easier.
> Only check_resolver is special since it needs to refcount the to be
> checked resolver. But the resolver doing the actual work is
> automatically refcounted by resolve() and *_resolve_done().
> One last piece of the puzzle is to track the uw_resolver object in
> cb_data so that the *_resolve_done() functions have access to it.
> This also allows us to remove the ad-hoc passing of the resolver in
> query_imsg. Since the callback functions all need access to the
> resolver that did the work we pass it in as first argument.
> 

Reviewed the code, did some tests and it all looks good.

One nit: I would have declared a typedef for the callback function type
to be used in the struct resolver_cb_data and the prototype and the
definition of resolve(), it makes those lines easier to read.  But ok
anyway,
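
Roughly what I mean, as a sketch only. The typedef name resolve_cb_t, the parameter names and the two helper functions are made up for illustration, and struct uw_resolver is a stand-in here, not the real definition from resolver.c.

```c
#include <assert.h>
#include <stddef.h>

/* stand-in for the real struct in resolver.c */
struct uw_resolver {
	int	 id;
};

/* hypothetical name: one typedef instead of repeating the signature */
typedef void (*resolve_cb_t)(struct uw_resolver *, void *, int, void *,
    int, int, char *);

struct resolver_cb_data {
	resolve_cb_t		 cb;
	void			*data;
	struct uw_resolver	*res;
};

/* the prototype of resolve() would then shrink to something like this */
int	resolve(struct uw_resolver *, const char *, int, int, void *,
	    resolve_cb_t);

/* illustration only: hand a result to the stored callback */
void
run_callback(struct resolver_cb_data *cbd, int rcode)
{
	assert(cbd != NULL && cbd->cb != NULL);
	cbd->cb(cbd->res, cbd->data, rcode, NULL, 0, 0, NULL);
}

/* illustration only: a callback that records rcode in *data */
void
record_rcode(struct uw_resolver *res, void *data, int rcode, void *answer,
    int len, int sec, char *why)
{
	(void)res; (void)answer; (void)len; (void)sec; (void)why;
	*(int *)data = rcode;
}
```

The struct member, the prototype and the definition of resolve() then each carry the type name once instead of the full seven-argument signature.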

-Otto


> diff --git resolver.c resolver.c
> index d1dce2dec71..2b7d81d29fc 100644
> --- resolver.c
> +++ resolver.c
> @@ -92,14 +92,11 @@ struct uw_resolver {
>   int64_t  histogram[nitems(histogram_limits)];
>  };
>  
> -struct check_resolver_data {
> - struct uw_resolver  *res;
> - struct uw_resolver  *check_res;
> -};
> -
>  struct resolver_cb_data {
> - void(*cb)(void *, int, void *, int, int, char *);
> - void*data;
> + void(*cb)(struct uw_resolver *, void *, int, void *,
> + int, int, char *);
> + void*data;
> + struct uw_resolver  *res;
>  };
>  
>  __dead void   resolver_shutdown(void);
> @@ -108,9 +105,10 @@ void  resolver_dispatch_frontend(int, short, void *);
>  void  resolver_dispatch_captiveportal(int, short, void *);
>  void  resolver_dispatch_main(int, short, void *);
>  int   resolve(struct uw_resolver *, const char*, int, int,
> -  void*, void (*cb)(void *, int, void *, int, int,
> -  char *));
> -void  resolve_done(void *, int, void *, int, int, char *);
> +  void*, void (*cb)(struct uw_resolver *, void *,
> +  int, void *, int, int, char *));
> +void  resolve_done(struct uw_resolver *, void *, int, void *,
> +  int, int, char *);
>  void  ub_resolve_done(void *, int, void *, int, int, char *,
>int);
>  void  asr_resolve_done(struct asr_result *, void *);
> @@ -129,8 +127,8 @@ void   set_forwarders_oppdot(struct uw_resolver *,
>  void  resolver_check_timo(int, short, void *);
>  void  resolver_free_timo(int, short, void *);
>  void  check_resolver(struct uw_resolver *);
> -void  check_resolver_done(void *, int, void *, int, int,
> -  char *);
> +void  check_resolver_done(struct uw_resolver *, void *, int,
> +  void *, int, int, char *);
>  void  schedule_recheck_all_resolvers(void);
>  int   check_forwarders_changed(struct uw_forwarder_head *,
>struct uw_forwarder_head *);
> @@ -154,8 +152,8 @@ int check_captive_portal_changed(struct uw_conf *,
>struct uw_conf *);
>  void  trust_anchor_resolve(void);
>  void  trust_anchor_timo(int, short, void *);
> -void  trust_anchor_resolve_done(void *, int, void *, int,
> -  int, char *);
> +void  trust_anchor_resolve_done(struct uw_resolver *, void *,
> +  int, void *, int, int, char *);
>  void  add_autoconf_forwarders(struct imsg_rdns_proposal *);
>  void  rem_autoconf_forwarders(struct imsg_rdns_proposal *);
>  struct uw_forwarder  *find_forwarder(struct uw_forwarder_head *,
> @@ -480,14 +478,10 @@ resolver_dispatch_frontend(int fd, short event, void *bula)
>   log_debug("%s: choosing %s", __func__,
>   uw_resolver_type_str[res->type]);
>  
> - query_imsg->resolver = res;
> - resolver_ref(res);
> -
>   clock_gettime(CLOCK_MONOTONIC, &query_imsg->tp);
>  
> - if ((resolve(res, query_imsg->qname, query_imsg->t,
> -

Re: HEADS UP: ntpd changing

2019-11-11 Thread Otto Moerbeek
On Sun, Nov 10, 2019 at 05:03:02PM -0700, Theo de Raadt wrote:

> The ntpd options -s and -S are going to be removed soon, and at startup
> ntpd will print:
> 
> -s option no longer works and will be removed soon.
> Please reconfigure to use constraints or trusted servers.
> 
> Probably after 6.7 we'll delete the warning.  Maybe for 6.8 we'll remove
> -s and -S from getopt, and starting with those options will fail.
> 
> Effective immediately, the -s option stops doing what you expect.  It now
> does nothing.
> 
> Big improvements have happened in ntpd recently.  At startup, ntpd
> aggressively tries to learn from NTP packets validated by constraints,
> and set the time.
> 
> That means a smarter variation of -s is the default, but the information
> is now *VALIDATED* by constraints.
> 
> 2 additional constraints have been added.  If you have upgraded, please
> review /etc/examples/ntpd.conf for modern use
> 
> Those who cannot use https constraints, can instead tag server lines
> with the keyword "trusted", which means you believe MITM attacks are not
> possible on the network to those specific NTP servers.  Do this only on
> servers directly connected over trusted network.  If someone does
> "servers pool.ntp.org trusted", we're going to have a great laugh.
> 
> We're creating something a bit complex, but our goal is for every
> machine to have a close approximation of correct time.  If we get
> there, some good things will happen.  Some serious cargo-culting
> for using -s has gotten in the way (-s performs no MITM checks).
> 

So if you are running current do the following. Likely you can stop
after step 2.

1. remove -s from ntpd_flags

2. check if the default ntpd.conf works for you; it most likely will,
   *including setting the time on boot*. 

3. if you cannot use constraints because https to the world is not possible,
   consider running ntpd on your local net and use that as a peer marked as
   trusted or, if available, use a sensor marked as trusted.

4. Still having problems? Report so we can look at your use case and
   find a solution.
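
For step 3, a minimal sketch of what /etc/ntpd.conf could look like on a host that cannot do https to the outside. The address 192.0.2.10 and the sensor name nmea0 are placeholders, not taken from a real setup; check ntpd.conf(5) on your system for the exact syntax.

```
# ntpd running on a local network you trust; no constraints needed
server 192.0.2.10 trusted

# or, if a time sensor is attached
sensor nmea0 trusted
```

Only mark a server as trusted if it is directly reachable over a network where MITM attacks are not a concern.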

-Otto



Re: Opportunistic DoT for unwind(8)

2019-11-02 Thread Otto Moerbeek
On Fri, Nov 01, 2019 at 10:43:27PM +0100, Remi Locherer wrote:

> On Fri, Nov 01, 2019 at 09:53:28PM +0100, Florian Obser wrote:
> > On Fri, Nov 01, 2019 at 09:45:37PM +0100, Florian Obser wrote:
> > > On Fri, Nov 01, 2019 at 09:35:07PM +0100, Remi Locherer wrote:
> > > > On Thu, Oct 31, 2019 at 08:14:04PM +0100, Otto Moerbeek wrote:
> > > > > Hi,
> > > > > 
> > > > > So here's a new diff that incorporates the bug fix mentioned plus
> > > > > debug printf line changes suggested by Stuart.
> > > > > 
> > > > > Please note that this is a diff on top of very recent current, i.e.
> > > > > florian's work he committed today. That means that you need to be
> > > > > up-to-date (including a recent libc update that was committed a few
> > > > > days ago) to be able to test this version.
> > > > 
> > > > I upgraded to a snapshot from today, updated the source and applied
> > > > > your diff. Then I did the same test as last time using pf to block port 53
> > > > (block return out log inet proto {tcp udp} to !9.9.9.9 port 53).
> > > > 
> > > > > Result: the non functional type asr is selected instead of the forwarder.
> > > > 
> > > > $ doas unwindctl status 
> > > > captive portal is unknown
> > > > 
> > > > selected type status
> > > >  recursor dead
> > > > forwarder validating (OppDoT)
> > > >  dhcp unknown (OppDoT)
> > > >*  asr dead
> > > > $
> > > > $ getent hosts undeadly.org
> > > > $ echo $?
> > > > 2
> > > > $ dig +short undeadly.org @9.9.9.9
> > > > 94.142.241.173
> > > > $
> > > > 
> > > > Without your patch unwind behaves similar regarding the type selection:
> > > > 
> > > > $ doas unwindctl status 
> > > > captive portal is unknown
> > > 
> > > ^ you are creating a not supported configuration.
> > > 
> > > When we are behind a captive portal or don't know yet if we are behind
> > > a captive portal resolving is forced to asr.
> > > 
> > > That might not be very wise if asr is dead but I currently don't see
> > > how this can happen in practice except with a well aimed foot-gun.
> > 
> > Actually, I have an idea how this can happen in practice, please try this:
> > 
> > diff --git resolver.c resolver.c
> > index f59860a5e98..5bbc63f60fa 100644
> > --- resolver.c
> > +++ resolver.c
> > @@ -1282,7 +1282,8 @@ best_resolver(void)
> >  
> > if (captive_portal_state == PORTAL_UNKNOWN || captive_portal_state ==
> > BEHIND) {
> > -   if (resolvers[UW_RES_ASR] != NULL) {
> > +   if (resolvers[UW_RES_ASR] != NULL && resolvers[UW_RES_ASR]->
> > +state != DEAD) {
> > res = resolvers[UW_RES_ASR];
> > goto out;
> > }
> > 
> > 
> 
> Yes, this makes unwind cope with this situation:
> 
> $ unwindctl status 
> not behind captive portal
> 
> selected type status
>  recursor dead
>*forwarder validating
>  dhcp dead
>   asr dead
> $
> 
> OK remi@
> 

And with my diff on top of that?

-Otto


