Re: Concurrent trie-hash map

2018-10-17 Thread Jason Thorpe



> On Oct 16, 2018, at 1:00 PM, Mindaugas Rasiukevicius  wrote:
> 
> Hi,
> 
> I recently implemented thmap [1] -- a concurrent trie-hash map, combining
> the elements of hashing and radix trie.  It is supports lock-free lookups
> and concurrent inserts/deletes.  It is designed to be optimal as a general
> purpose *concurrent* associative array [2][3].

I think this is a great thing to add to kern/ as a general facility.  There are 
applications beyond just the network stack, as well -- any read-mostly hash 
table that has a lock or an array-of-locks around it is a good candidate for 
being replaced by this.

We already have some objects in the kernel that use the "get-ref" / "put-ref" 
pattern for indicating they are being referenced; this is ideal for hooking in 
your G/C mechanism.  We should probably, as a separate exercise, extend that 
pattern to most of the objects managed in our kernel (even if the put-ref is a 
no-op).

-- thorpej



Re: libnv

2018-08-27 Thread Jason Thorpe



> On Aug 27, 2018, at 7:49 AM, Mindaugas Rasiukevicius  wrote:

> I do not think there was any conclusion either, or a conclusion that
> we always want to have static serialisation with a schema (IMO, there
> are pros and cons in both cases).  Worth noting that there are multiple
> libraries already: we already imported libnvpair (with tons of other
> code for ZFS and DTrace) and nobody complained; we still have XDR.

XDR has a separate reason for existing -- it is enshrined in wire protocols.

> While libnv is not a panacea, it solves some actual problems: it has
> intuitive reference counting (symmetric API), it supports accumulated
> error handling, it provides more efficient binary encoding, it does not
> have shared structures with internal locking, it does not have awkward
> API.  As a side note, it also makes code sharing with FreeBSD (as well
> as illumos) much easier.  Meanwhile, there has been little action on
> improving proplib or proposing a tangible replacement of it.

Let me try to address these points one by one:

==> It has intuitive reference counting (symmetric API)

I just sent you a separate mail on this subject.  Actually, the libprop 
reference counting rules are very simple:

"If you create an object, you implicitly own the reference and you are 
responsible for disposing of it.  In all other cases, references must be 
explicitly retained."

That's it.  It's really not that complicated.  Containers (arrays and 
dictionaries) retain their own references explicitly (because they want the 
object the live on not matter what the caller does with its own reference).  
When you "get" an object from the container, you did not create it, and thus 
need to explicitly retain your own (that you, of course, are responsible for 
disposing of later).

Please explain to me how this referencing counting rule is non-intuitive.


==> It supports accumulated error handling

I personally find APIs like this kind of annoying, but if you find it valuable, 
it could be added to libprop with a utility wrapper API.


==> It provides more efficient binary encoding

I'll grant this one, but with a caveat -- I always intended to to add the 
"binary plist" serialization format to libprop, but never got around to it.  It 
would require a refactor of the serialization stuff that's already in libprop, 
but this is a solvable problem and have been thinking of doing it soonish 
anyway because of some other ideas I've been toying with in my head.


==> It does not have shared structures with internal locking

I'll grant that this was probably a mistake initially with dictionaries and 
arrays.  Serialization for these should be handled at the caller's level, 
because in many instances you would need that anyway if you expect multiple 
threads to be accessing the container (to provide correct semantics at the 
caller's layer).  So, as such, it would probably be worth having a discussion 
about whether removing the internal locking would be a safe thing to do.


==> It does not have an awkward API

The individual object APIs themselves can be kind of awkward to use, but that 
was the price to pay for flexibility.  There are already several utility APIs 
that provide a simpler interface, and those could be extended.




> 
> Anyway, I am not proposing that everybody should use libnv.  However,
> it is is quite lightweight and self-contained.  I am already using libnv
> in the upstream NPF.  The conversion to libnv made the code simpler,
> shorter and easier to read.  And I am not planning to maintain proplib
> version: it causes too much headache and the differences are too vast.
> 
> -- 
> Mindaugas

-- thorpej



Re: libnv

2018-08-28 Thread Jason Thorpe



> On Aug 27, 2018, at 2:25 PM, David Holland  wrote:
> 
> On Mon, Aug 27, 2018 at 08:41:43AM -0700, Jason Thorpe wrote:
>>> [proplib]
>> 
>> Let me try to address these points one by one:
> 
> Now that you're back, can you explain a few more basic points about
> proplib?

I'll do my best to remember :-)

> (a) What's its mission? Is it supposed to be a data *storage* library,
> that is, the data lives in proplib and you do proplib accesses to
> extract the bits of it you want as you need them? Or is it supposed to
> be a data *transfer* library, that is, the data lives somewhere else
> and the expectation is that when you receive a proplib bundle that you
> immediately unpack it into its own data structures? (Note that saving
> to disk and loading back later is roughly comparable to sending and
> receiving among entities -- that's not what this question is about.)

It can be either.

Consider the case of properties on device_t's ... it's the storage medium for 
that information.

Or consider the case of a configuration file ... this isn't really the same 
thing as "saving state out to disk and reading it back in later", it's one 
application serializing the information in a standard way to be consumed by 
another ... and how the consumer deals with it is really up to them.

Or consider the case of wanting to collect some information from the kernel for 
consumption by userland (for display purposes, or whatever) ... but you want to 
isolate the receiver from the implementation details of the kernel sender.

Some additional utility functions could make it easier to use in various 
situations, but it's intent is to be "useful wherever" 

> (b) What's its data model? Is it supposed to handle hierarchical
> tree-structured data of the kind you might find in an ordinary
> programming language's native data types? Or is it supposed to support
> graph-structured data where many locations can share references to
> common values or sub-objects?

The former is how it's generally used, but it shares references to common 
sub-objects as a memory saving measure (either by assuming you use the API that 
manipulates the objects directly and do this yourself, or by de-duping certain 
immutable objects, as with prop_number_t).  There is not attempt to detect 
cycles, however, so y'know, exercise caution.

> (c) And also, what's its data model? What was the criterion used to
> determine which atom types it supports?

The data types supported by Apple XML property lists, with the following 
caveats:

-- No attempt to support  or .  There is no reason those could not 
be added, but their use in standalone and kernel environments is not feasible; 
they could certainly be excluded based on #ifdef...

-- I serialized unsigned numbers as hex (and assume that hex numbers that are 
read in are unsigned), and serialize signed as decimal.  I believe macOS will 
read both decimal and hex, but always writes out in decimal.  I think.  I 
should double-check.

I captured bool as a separate object type, as in CoreFoundation (CFBoolean), 
rather than in the number object as in Foundation (NSNumber).

-- thorpej



Re: ATF t_mlock() babylon5 kernel panics

2019-03-13 Thread Jason Thorpe


> On Mar 12, 2019, at 9:09 PM, Robert Elz  wrote:
> 
> The first issue I noticed, is that t_mlock() apparently belives
> the malloc(3) man page, which states:
> 
> The malloc() function allocates size bytes of uninitialized memory.  The
> allocated space is suitably aligned (after possible pointer coercion) for
> storage of any type of object.
> 
> and in particular, those last few words.   The "any type of object" that
> t_mlock wants to store is a "page" - that is a hardware page.

The test employs a bogus understanding of how malloc() is specified.  On x86, 
malloc() should return memory that is 16-byte aligned because that is the 
maximum alignment requirement of the fundamental types used by the compiler.

>   It obtains
> the size of that using:
> 
>   page = sysconf(_SC_PAGESIZE);
> 
> and then does
> 
>buf = malloc(page);
> 
> and if buf is not NULL (which it does check) assumes that it now
> has a correctly page aligned page sized block of memory, in which
> it can run mlock() related tests.
> 
> Something tells me that the "any type of object" does not include this
> one, and that t_mlock should be using posix_memalign() instead to allocate
> its page, so it can specify that it needs a page aligned page.

Correct.  Or mmap() (which always returns page-aligned pointers).

> Again, I am not proposing fixing the test until the kernel issues
> are corrected, but it would be good for someone who knows what alignment
> malloc() really promises to return (across all NetBSD architectures)
> to rewrite the man page to say something more specific than "any type of
> object" !

I've also seen the term "fundamental object" used.  One has to remember that 
malloc() is specified by the C standard, and C has no notion of "pages" or any 
other such silliness that we Unix people assume are fundamental :-)

> NetBSD's mlock() rounds down, so regardless of the alignment of the
> space allocated, the mlock() tests should be working (the locking might
> not be exactly what the test is expecting, but all it is doing is keeping
> pages locked in memory - which pages exactly this test does not really
> care).

POSIX specifically states that mlock() //may// require that the address is 
page-aligned ... Our implementation does not require this:


/*
 * align the address to a page boundary and adjust the size accordingly
 */

pageoff = (addr & PAGE_MASK);
addr -= pageoff;
size += pageoff;
size = (vsize_t)round_page(size);


That is to say, the intent of our implementation is to wire the page where the 
range begins through the page where the range ends.  Note that internally, UVM 
represents all ranges as [start, start+size) (assuming start and size are page 
aligned / rounded).


> On my test setup, the kernel did not panic.   It does however experience
> some other weirdness, some of which is also apparent in the bablylon5
> tests, and others which might be.
> 
> My test system is an amd64 XEN DomU - configired with no swap space, and
> just 1GB RAM.   It typically has a tmpfs mounted limitted to 1/2 GB
> (actually slightly more than that - not sure where I got the number from,
> there may have been a typo... the -s param is -s=537255936 in fstab.
> That oddity should be irrelevant.
> 
> The first thing I noticed was that when I run the t_mlock test in this
> environment, it ends up failing when /tmp has run out of space.   And I
> mean really run out of space, in that it is gone forever, and nothing I
> have thought of so far to try gets any of that space back again.

And there are no files there?  Even an open-unliked file should disappear when 
the offending process exits.

> I assume that jemalloc() (aka malloc() in the test) is doing some kind
> of mmap() that is backed by space on /tmp and continually grabbing more
> until it eventually runs out, and that the kernel never releases that
> space (even after the program that mapped it has exited).   That seems
> sub-optimal, and needs fixing in the kernel, anonymous mmap's (or whatever
> kind jemalloc() is doing) need to be released when there are no more
> processes that can possibly use them.

Well, note that tmpfs also uses anonymous memory.  Is it that "df" on the tmpfs 
is really showing a bunch of space allocated to the tmpfs?

> I did not try umount -f (easier to just reboot...) but a regular umount
> failed (EBUSY) even though there was nothing visibly using anything on
> /tmp (and I killed every possible program, leaving only init - and yes,
> that did include the console shell I use to test things).
> 
> Umounting the tmpfs before running the t_mlock test worked fine (which also
> illustrates that none of the very few daemon processes, nor the shell, etc,
> from my login, are just happening to be using /tmp - and that it is the
> results of the malloc() calls from t_mlock that must be the culprit.
> (While ATF is running, it would be using /tmp as both its working
> directory, 

Re: Importing aiomixer to src

2021-05-06 Thread Jason Thorpe


> On May 6, 2021, at 6:44 AM, nia  wrote:
> 
> I think this version is suitable for importing into NetBSD base
> for continued development. This will also allow us to easily
> keep it in sync if there are any changes to the audio stack.

This is really cool, I would support this!

-- thorpej



Re: draft mcount(3) man page

2021-05-21 Thread Jason Thorpe



> On May 21, 2021, at 5:06 PM, David Holland  wrote:
> 
> The workings of the various *mcount functions are confusing and
> currently undocumented. After wading into it just now for I think the
> third time recently, I wrote the following to ease the process.

This looks awesome, David, thanks for writing it!

Might it be worth noting some of the historical legacy symbol naming issues 
that were are the root of the bug report we were just talking about a little 
while ago?

> 
> Comments?
> 
>   --
> .\"
> .\" Copyright (c) 2021 The NetBSD Foundation, Inc.
> .\" All rights reserved.
> .\"
> .\" This code is derived from software contributed to The NetBSD Foundation
> .\" by David A. Holland.
> .\"
> .\" Redistribution and use in source and binary forms, with or without
> .\" modification, are permitted provided that the following conditions
> .\" are met:
> .\" 1. Redistributions of source code must retain the above copyright
> .\"notice, this list of conditions and the following disclaimer.
> .\" 2. Redistributions in binary form must reproduce the above copyright
> .\"notice, this list of conditions and the following disclaimer in the
> .\"documentation and/or other materials provided with the distribution.
> .\"
> .\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
> .\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 
> LIMITED
> .\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
> .\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
> .\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
> .\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
> .\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
> .\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
> .\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
> .\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
> .\" POSSIBILITY OF SUCH DAMAGE.
> .\"
> .Dd May 21, 2021
> .Dt MCOUNT 3
> .Os
> .Sh NAME
> .Nm mcount ,
> .Nm _mcount ,
> .Nm __mcount
> .Nd profiling hooks
> .Sh LIBRARY
> .Lb libc
> .Sh SYNOPSIS
> .Fn _mcount "/* asmlinkage */"
> .Fn __mcount "u_long frompc" "u_long selfpc"
> .Sh DESCRIPTION
> The
> .Fl pg
> option to
> .Xr cc 1
> causes the compiler to insert calls to a profiling hook
> at the entry to each function it compiles.
> This allows code in the standard library to collect call graph
> information.
> .Pp
> The compiler inserts a call to
> .Fn _mcount ,
> typically written in assembler, which is responsible for saving
> registers and calling
> .Fn __mcount .
> .Pp
> Note however that on some targets, e.g. vax, these names are
> reversed.
> The name of what this manual page refers to as
> .Fn _mcount
> is determined by the compiler, and the name of what this manual
> page refers to as
> .Fn __mcount
> is determined by the macro
> .Dv MCOUNT_DECL
> in
> .In machine/profile.h
> (q.v.).
> .Pp
> The
> .Ar frompc
> argument to
> .Nm __mcount
> is the address the call came from, that is, the original return
> address; the
> .Ar selfpc
> argument is the address of the callee, normally expressed as the
> return address of the call to
> .Nm _mcount .
> Note that
> .Nm _mcount
> itself does not have arguments in the normal sense; it receives stack
> and register contents consistent with the call location inserted by
> the compiler, which is usually at the beginning of the original
> function but may vary by target.
> It is responsibile for extracting the arguments to
> .Nm __mcount
> from that state, in addition to preserving register contents as
> needed.
> .Pp
> The code for
> .Nm _mcount
> is on some machines contained in the macro
> .Dv MCOUNT
> in
> .In machine/profile.h ,
> and on others in its own source file.
> In the former case,
> .Nm __mcount
> can be and often is declared static.
> .Sh RETURN VALUES
> None.
> .Sh RESTRICTIONS
> Both
> .Nm _mcount
> and
> .Nm __mcount
> are internal interfaces and must not be called directly by application
> code.
> The symbols are not even guaranteed to exist unless the
> .Fl pg
> option was given to
> .Xr cc 1 .
> This manual page is provided to document the internal workings for
> developers who need to adjust them.
>   --
> 
> -- 
> David A. Holland
> dholl...@netbsd.org

-- thorpej



Re: fuse_opt.h

2021-06-03 Thread Jason Thorpe


> On Jun 3, 2021, at 7:19 AM, Emmanuel Dreyfus  wrote:
> 
> Is there any reason not to fix it?

Seems like it would be a bad idea NOT to fix it.  I assume one of the goals of 
librefuse was to be ABI compatible?

-- thorpej



Re: CVS commit: src/share/misc

2021-03-30 Thread Jason Thorpe


> On Mar 30, 2021, at 8:03 PM, Christos Zoulas  wrote:
> 
> There are 3 x 'sizeof(' in the tree compared to 'sizeof ' in '*.c' and
> I am counting 'sizeof (' as 'sizeof ':
> 
> 191337 'sizeof('
> 63508 'sizeof '
> 
> I think that it is better to bless the prevailing majority as the rule,
> but we should let others express their opinions first.

"sizeof(...)" is definitely my preference, in no small part for symmetry with 
"offsetof(...)" (or, at least, the historical definition thereof before the 
compiler had a built-in...)

-- thorpej



Re: proposal: remove traditional C support from lint

2021-03-16 Thread Jason Thorpe



> On Mar 16, 2021, at 4:55 PM, Roland Illig  wrote:
> 
> Any objections to removing the -t flag and everything that belongs to it?

No objections from me.

-- thorpej



Re: Q about pthread condition variable usage - mixing interlocks

2021-04-08 Thread Jason Thorpe


> On Apr 8, 2021, at 11:15 AM, Mouse  wrote:
> 
> I'm curious: why do you care?  If, of course, it's anything you can
> talk about.

I have a set of futex-based synchronization objects for libpthread I’ve been 
hacking on off-and-on for a while.

-- thorpej



Re: Q about pthread condition variable usage - mixing interlocks

2021-04-08 Thread Jason Thorpe



> On Apr 8, 2021, at 11:15 AM, Mouse  wrote:
> 
>> Is there ANY situation where, for a given pthread condition variable, that u$
> 
> For what value of "legitimate"?
> 
> I see no reason why this would/should be forbidden, and, indeed, at
> least the 5.2 manpage for pthread_cond_wait seems to imply that it is
> permitted, provided the condvar is idle:

Ah, yes, I should clarify, I am speaking about only when the condition variable 
is NOT idle.  And now that I read IEEE Std 1003.1-2017 again, I see:

“””
During this time, the effect of an attempt by any thread to wait on that 
condition variable using a different mutex is undefined.
“””

I.e. I was wrong that there was language missing from the spec.  I’ll save 
myself some work and, rather than attempt to gracefully handle this sort of 
thing, I’ll just return EPERM when the situation is detected.

Thanks, folks.

-- thorpej



Q about pthread condition variable usage - mixing interlocks

2021-04-08 Thread Jason Thorpe
As far as I can tell, POSIX has no language that covers this situation, so I’m 
tossing it out here for the hive mind...

Is there ANY situation where, for a given pthread condition variable, that 
using a different mutex as the interlock in two different calls to 
pthread_cond_wait() would EVER be legitimate usage?  I cannot come up with any 
scenario in which this would be appropriate, but who knows what some crazy 
applications might do.

POSIX for whatever dumb reason allows pthread_cond_signal() / 
pthread_cond_broadcast() without holding the interlock, but at least there’s 
explicit language about this in the spec (and thankfully the language contains 
a mild finger-wag about it).  But as I said, the spec is silent on this 
particular question, and hopefully it’s because the answer is “NO THAT IS A 
COMPLETELY STUPID THING TO DO AND ANY SELF RESPECTING THREAD LIBRARY WILL 
IMMEDIATELY CAST YOUR APPLICATION INTO THE NINTH CIRCLE OF HELL IF YOU TRY IT”.

What say you, smart people?

Merci d’avance.

-- thorpej



Native timerfd and eventfd

2021-08-05 Thread Jason Thorpe
[Cross-posted to tech-userlevel because this is primarily about an API exposed 
to user-space.]

This past winter, I wrote NetBSD native implementations of the Linux timerfd[1] 
and eventfd[2] APIs, mainly to fill out gaps in COMPAT_LINUX.  Of course, I 
wanted to write ATF unit tests for them, so I needed them to be native 
syscalls, too.

The code to implement these is quite small, and the APIs are not unreasonable 
(eventfd is certainly less cumbersome to use for its intended purpose than a 
pair of pipes, and timerfd is a much better interface than EVFILT_TIMER and 
essentially serves as a descriptor-oriented interface to the regular POSIX 
timer support).  Because of this, I think they should be included as native 
syscalls.

If the consensus is that these are reasonable things to add as native syscalls, 
I’ll whip up man pages for them and do that.  Otherwise, I’ll just relegate 
them to COMPAT_LINUX and toss the ATF unit tests (I see no mechanism to run ATF 
unit tests against COMPAT_* syscalls, and I’m not particularly inclined to 
invent one).

[1] https://man7.org/linux/man-pages/man2/timerfd_create.2.html

[2] https://man7.org/linux/man-pages/man2/eventfd.2.html

-- thorpej



Re: Anyone still using YP support on NetBSD?

2021-09-28 Thread Jason Thorpe


> On Sep 28, 2021, at 7:35 AM, Izumi Tsutsui  wrote:
> 
> FYI,
> https://mail-index.netbsd.org/netbsd-bugs/2018/09/24/msg059139.html

I guess old habits die hard :-)

Ok, well, then I won’t ask if it should be G/C’d :-)

-- thorpej



Anyone still using YP support on NetBSD?

2021-09-23 Thread Jason Thorpe
Anyone?

-- thorpej



Re: Libc support for old BSD-style "sigcontext" signal handlers is broken, let's just kill it.

2021-10-26 Thread Jason Thorpe



> On Oct 26, 2021, at 8:27 AM, Valery Ushakov  wrote:
> 
> I'm not sure why we left the sigcontext version in the tree.  I would
> guess for reference only, so it doesn't really need any fixing, as far
> as I understand.

Presumably it was to keep old binaries that used the sigcontext style would 
keep working.  If it was for reference only, one would think it would have been 
commented as such and the binary objects not actually built.

The commit that broke it had the log message “make these consistent”, which 
doesn’t really convey to me that it was intended to disable them.

-- thorpej



Re: Libc support for old BSD-style "sigcontext" signal handlers is broken, let's just kill it.

2021-10-26 Thread Jason Thorpe


> On Oct 26, 2021, at 6:56 AM, Jason Thorpe  wrote:
> 
> Obviously, the practical impact of this is nil, since no one apparently 
> noticed (and I guess we didn’t break any programs that people were using). We 
> haven’t documented that style of handler for a VERY long time but presumably 
> the intent was for it to continue to work because the code remained and was 
> modified several times over the years. (it was certainly my original intent 
> when I first added support for multiple signal trampolines in a single 
> process ~20 years ago, and it did in fact work for the first few years that 
> code existed).

Oh, I suppose I should also point out that kernel support for the older 
trampoline is in the COMPAT_16 module (the new “siginfo” stuff was added in 
NetBSD 2.0).

-- thorpej



Libc support for old BSD-style "sigcontext" signal handlers is broken, let's just kill it.

2021-10-26 Thread Jason Thorpe
Hey folks —

I was looking into the request for a libc function that determined if a program 
counter was inside a signal trampoline (see 
https://mail-index.netbsd.org/tech-kern/2021/10/15/msg027703.html), and the 
first thing I decided to do as part of this was unify all of the 
__libc_sigaction14() functions … each architecture has its own copy, and 
they’re mostly identical, except for in their support (or not) for old-style 
“sigcontext” handlers and their trampoline ABI versions (VAX had actually rev’d 
its sigcontext trampoline ABI before “siginfo” support was added).  Anyway, a 
few defines in  with the ability to override values in 
 will do the trick.

However, as I was adding an #ifdef conditional to the sigcontext handling in 
the new unified __sigaction14_sigtramp.c (because new architectures added after 
“siginfo” support was added never really needed it, and didn’t define a “struct 
sigcontext”), I realized that in 2006, support for sigcontext handling was 
broken on platforms that previously supported it (each copy gained a 
__LIBC12_SOURCE__ conditional around the sigcontext support, but nothing ever 
defined __LIBC12_SOURCE__ for those files).

I went ahead and listed it out by architecture in my own notes, but the TL;DR 
is that “sigcontext” style signal handlers don’t work on any architecture.

For those not familiar, there are 3 basic types of signal handlers:

1. void handler(int sig);  // simple traditional handler

2. void handler(int sig, int code, struct sigcontext *scp);  // sigcontext 
handler

3. void handler(int sig, struct siginfo *info, ucontext_t *ctx);  // siginfo 
handler

Originally, the code in libc to select the correct trampoline for each type 
selected the “siginfo” trampoline if SA_SIGINFO was set in the sigaction, 
otherwise it used the “sigcontext” trampoline.  The trampoline is responsible 
to restoring the previous program flow, and it needs to know what format the 
saved registers are in to do so, which is why there are two types.

Since Feb 2006 (with one minor exception; don’t worry, it was broken a few 
months later), the “siginfo” trampoline has always been chosen, meaning that 
any old program dynamically linked against libc that uses style “2” above would 
get garbage in the “code” and “scp” arguments.

Obviously, the practical impact of this is nil, since no one apparently noticed 
(and I guess we didn’t break any programs that people were using).  We haven’t 
documented that style of handler for a VERY long time but presumably the intent 
was for it to continue to work because the code remained and was modified 
several times over the years. (it was certainly my original intent when I first 
added support for multiple signal trampolines in a single process ~20 years 
ago, and it did in fact work for the first few years that code existed).

At this point, I don’t see any value in going back and actually fixing 
sigcontext style handlers.  My preference here is to simply remove that 
trampoline from libc.  I do think kernel support should remain, because we 
generally are sticklers for binary compatibility, but we should completely hide 
the “struct sigcontext” definitions from user-space in .

Thoughts?

-- thorpej



Re: Libc support for old BSD-style "sigcontext" signal handlers is broken, let's just kill it.

2021-10-26 Thread Jason Thorpe



> On Oct 26, 2021, at 11:07 AM, Robert Elz  wrote:
> 
>Date:Tue, 26 Oct 2021 18:27:20 +0300
>From:Valery Ushakov 
>Message-ID:  
> 
>  | OTOH the old
>  | *binaries* (using old dynamic libc, or linked with old static libc)
>  | still need the kernel support.
> 
> I think the point is that the "old dynamic libc" is the current libc,
> unless an old binary is so old that it linked against libc.11 or earlier
> (which means way back into the last millenium sometime).
> 
> So not only is kernel support needed, libc (12) support is as well.  Once
> the fabled libc version bump happens this will all change.

I mean, I could certainly fix it (and rev the symbol), just don’t know if it’s 
worth the bother since it’s been broken for so long.

-- thorpej



Re: crypt_r()?

2022-02-15 Thread Jason Thorpe


> On Feb 15, 2022, at 3:30 PM, Joerg Sonnenberger  wrote:
> 
> Am Wed, Feb 16, 2022 at 12:04:16AM +0100 schrieb Niclas Rosenvik:
>> do you mean that the interface should be 
>> crypt_r(const char *key, const char setting, char * storage, size_t
>> *storage_len)
>> where storage can be set to NULL to return the needed storage size in
>> storage_len?
> 
> No. There are two sensible interface contracts here:
> 
> (1) Verification only, which takes the password and the expected hash
> and returns a bool. No separate settings necessary as the hash already
> contains all the necessary parameters.
> 
> (2) Hashing password, which takes the password and the settings and
> returns an allocated string with the resulting hash. This is essentially
> the same interface as crypt(3), but the caller is responsible for
> free(3) the return value. Given that the goal of the crypt(3) interface
> is to be slow, optimizing a memory allocation away saves nothing, except
> making a more complicated interface.

While we’re on the topic of “what we’d like the interfaces to do”…

There really should be a function that takes a user name or ID and a cleartext 
password string, and IPCs to another (trusted system) process to do the 
verification, so that programs that want to verify passwords don’t need root 
privileges.  Same for changing passwords.

Then there’s no need for any of this crypt(3) tomfoolery.

-- thorpej



Re: crypt_r()?

2022-02-15 Thread Jason Thorpe



> On Feb 15, 2022, at 5:13 PM, Mouse  wrote:
> 
>> There really should be a function that takes a user name or ID and a clearte$
> 
> Maybe.  But then you have a lot more failure modes and a lot more
> possible attack surface.  It would also mean that you can't check or
> change passwords in single-user mode without starting the magic daemon;
> that would be a substantial regression as far as user experience goes,
> if nothing else.  And what about checking the root password for
> single-user boot with insecure console?

You put the fallback logic into the function libc that can satisfy the request 
using the Old Way if the helper isn’t available.  Obviously, to satisfy it the 
Old Way, the process would need to have root privileges, but this would be OK 
in the scenario you’re describing.

> It _is_, however, very much in keeping with the "encapsulate
> single-purpose code into a single place" attitude that has brought a
> lot of benefits.  I wonder if there isn't some better way I'm missing.

It’s certainly a lot better than having a big complex program (that exposes 
itself to the network, potentially) require root privileges just to check 
passwords.

-- thorpej



Re: Proposal: remove usr.bin/mkstr

2022-04-09 Thread Jason Thorpe


> On Apr 9, 2022, at 8:49 AM, Robert Elz  wrote:
> Just stop suggesting removing things for no better reason than
> that you see no point keeping them.  If the existence of something
> which seems not to be all that necessary is being a roadblock
> to getting other work done, then by all means, request assistance,
> from which sometimes the solution might be to simply remove the
> roadblock -- other times a suitable alternative might be found.
> But while it is doing no harm, leave it alone.

+1

I’m definitely guilty of asking for things to be removed, but in the case of my 
suggestions, it’s because their presence were actual impediments to progress 
(e.g. keeping round the Token Ring and FDDI code, which no one had apparently 
used in many years, was impeding progress on network stack improvements).

I’m all for keeping old stuff around.

-- thorpej







postinstall(8): Add opensslcerts item to regen /etc/openssl/certs.

2023-08-26 Thread Jason Thorpe


> On Aug 26, 2023, at 1:59 AM, Taylor R Campbell  wrote:
> 
> Module Name: src
> Committed By: riastradh
> Date: Sat Aug 26 05:59:00 UTC 2023
> 
> Modified Files:
> src/usr.sbin/postinstall: postinstall.in
> 
> Log Message:
> postinstall(8): Add opensslcerts item to regen /etc/openssl/certs.
> 
> Works only with destdir /, since it relies on running openssl(1),
> which is not available as a tool or required in the cross-build
> environment.

Maybe there should be a boot-time check in an rc script for an out-of-date 
trust cache?

-- thorpej



Re: installboot option for u-boot bootloader paths

2022-07-05 Thread Jason Thorpe


> On Jul 5, 2022, at 8:28 PM, Brook Milligan  wrote:
> 
> Installboot can install U-Boot boot blocks directly into a system image.  
> Normally, the U-Boot files are searched for in /usr/pkg/share/u-boot, under 
> the expectation that most people will build them with pkgsrc.  However, it is 
> also possible to set an environment variable to a colon-separated search path 
> to accommodate other situations.
> 
> I would like to add a command line option (tentatively -u) to set the search 
> path and override the environment or default.
> 
> Please see the attached patch.  I would appreciate comments and improvements.

Looks good to me!

> Thanks a lot.

Thank *you*!

-- thorpej




Re: Reading I2C HID descriptor from userland

2022-08-20 Thread Jason Thorpe
See responses further down in-line:

> On Aug 20, 2022, at 8:51 AM, Emmanuel Dreyfus  wrote:
> 
> Hello
> 
> I have a ihidev device that fails to find its HID descriptor. In order to
> debug that, I try to retreive it from userspace.
> 
> At kernel boot, I have:
> ihidev1 at iic2 addr 0x5d
> 
> I try the code at the end of this message, inspired from sys/dec/i2c/ihidev.c.
> I am supposed to find the HID descriptor at address sc_hid_desc_addr, which 
> should come frome ACPI.
> 
> The HID over I2C specification is at 
> https://docs.microsoft.com/en-us/previous-versions/windows/hardware/design/dn642101%28v=vs.85%29?redirectedfrom=MSDN
> Section 10 deals with ACPI bindings. It says the HID descriptor
> address is to be find in a _DSM object that "defines a structure
> that contains the HID Descriptor address (2 bytes)"
> 
> acpdump -dt gives me this:
>Method (_DSM, 4, Serialized)  // _DSM: Device-Specific Method
>{
>If ((Arg0 == HIDG))
>{
>Return (HIDD (Arg0, Arg1, Arg2, Arg3, HID2))
>}
> 
>Return (Buffer (One)
>{
> 0x00 // .
>})
>}
> 
> How should this be readen? I found the HIDG definition earlier in the 
> ACPI dump, and my device has Name (HID2, One).
>Name (HIDG, ToUUID ("3cdff6f7-4267-4555-ad05-b30a3d8938de") /* HID I2C 
> Device */)
> (...)
>Method (HIDD, 5, Serialized)
>{
>If ((Arg0 == HIDG))
>{
>If ((Arg2 == Zero))
>{
>If ((Arg1 == One))
>{
>Return (Buffer (One)
>{
> 0x03 
> // .
>})
>}
>}
> 
>If ((Arg2 == One))
>{
>Return (Arg4)
>}
>}
> 
> Hence 1 or 3? Both return garbage. In fact I tried to brute-force the thing, 
> I never get anything relevant.

It’s definitely not going to be 3, because the arguments being passed to the 
HIDD method from the _DSM are:

Arg0 = 3cdff6f7-4267-4555-ad05-b30a3d8938de
Arg1 = 1
Arg2 = 1
Arg3 = NULL
Arg4 = One (I guess?)

What does the ihidev driver think the descriptor address is?  I.e. what’s the 
result of the _DSM evaluation?

Can you provide your entire ACPI dump?

-- thorpej



Re: dynamic linker change to handle multiple PT_LOAD segments

2023-01-05 Thread Jason Thorpe


> On Jan 5, 2023, at 8:53 AM, Christos Zoulas  wrote:
> 
> Hello,
> 
> Our dynamic linker ld_elf.so in map_object.c currently can only handle 2 
> PT_LOAD segments (one for text and one for data); the kernel elf loader does 
> not have this limitation, it can load multiple PT_LOAD segment. The following 
> patch (from FreeBSD) removes this limitation from the dynamic linker. The 
> reason I made this patch now is that the latest binutils (2.39) for amd64 
> sets a maxinum pagesize of 2Mb, and when using relocation read only binaries 
> (ld -z relro) this will cause the binaries to have an extra 2Mbytes for 
> alignment. This can be fixed by building binutils to set a separate code 
> segment by default (ld -z separate-code). Binutils then sets the maximum page 
> size to 4Kb. Setting this option creates 4 PT_LOAD segments, two for text 
> (r--, r-x) and two for data (r--, rw-), which also improves security. This is 
> the default for linux on x86 and the patch also makes it the default for 
> NetBSD x86. The patch also adds -z noseparate-code to the kernel builds so 
> that we don't need to fix the boot loaders. I am planning to commit this 
> soon, so please let me know if you hsve any objections.
> 
> Best,
> 
> christos

Is this ld.elf_so change also going to be put into NetBSD 10?  (Seems like it 
could / should be?). That way, the configuration changes for the toolchain 
(which you will presumable want to upstream) can match against NetBSD 10 for 
the new defaults.

-- thorpej



Re: style(5) proposal: forbid extern in .c

2023-03-15 Thread Jason Thorpe


> On Mar 15, 2023, at 4:23 AM, Taylor R Campbell 
>  wrote:
> 
> Proposal: Forbid extern declarations in .c files.
> 
> extern declarations in .c files invite easily avoided bugs where the
> definition and use have mismatched types, because the compiler doesn't
> have an opportunity to check them.  Fix: Always put the extern
> declaration in a .h file shared by the .c file defining it and the .c
> files using it.
> 
> Pretty simple.  Any objections?

LGTM.  Would be nice if there was a way to get the compiler to enforce this.

-- thorpej



Re: style(5) proposal: forbid extern in .c

2023-03-28 Thread Jason Thorpe



> On Mar 28, 2023, at 5:53 PM, Roland Illig  wrote:
> 
> Am 15.03.2023 um 16:40 schrieb Jason Thorpe:
>> 
>>> On Mar 15, 2023, at 4:23 AM, Taylor R Campbell 
>>>  wrote:
>>> 
>>> Proposal: Forbid extern declarations in .c files.
>>> 
>>> extern declarations in .c files invite easily avoided bugs where the
>>> definition and use have mismatched types, because the compiler doesn't
>>> have an opportunity to check them.  Fix: Always put the extern
>>> declaration in a .h file shared by the .c file defining it and the .c
>>> files using it.
>>> 
>>> Pretty simple.  Any objections?
>> 
>> LGTM.  Would be nice if there was a way to get the compiler to enforce this.
> 
> I implemented this check as a lint warning, for both objects and functions.
> 
> Building NetBSD/amd64 results in 5054 new warnings, which makes this
> warning the second-most popular, right between the 11193 'conversion may
> lose accuracy' and the 4974 'argument unused in function'.

This is great, thanks Roland!  I suspected you would make this happen :-)

-- thorpej



Re: epoll exposure

2023-07-31 Thread Jason Thorpe


> On Jul 31, 2023, at 6:25 PM, Rin Okuyama  wrote:
> 
> epoll(2) for COMPAT_LINUX is implemented in as similar manner to
> NetBSD's. Therefore, it should also have this compatible issue.
> Is is still useful for real Linux applications? Or do you have a
> plan to lift this limitation?

When I first looked at epoll() (back when I wrote eventfd and timerfd), I 
looked at the FreeBSD implementation and specifically chose not to use it 
because it was based on kqueue, and thus has the “kq descriptors not inherited 
across forks” wart.  The extent to which this will impact real Linux 
implementations … honestly I have no idea.

Perhaps the BEST solution is to fix this stupid limitation in kqueue (obviously 
version the system call for backward BSD compatibility).

-- thorpej



Re: epoll exposure

2023-07-31 Thread Jason Thorpe


> On Jul 31, 2023, at 6:53 PM, Theodore Preduta  wrote:
>> 
>> epoll(2) for COMPAT_LINUX is implemented in as similar manner to
>> NetBSD's. Therefore, it should also have this compatible issue.
> 
> The epoll implementation is literally a direct port of the one from
> FreeBSD's Linux compatibility layer, and so has the same limitations.
> 
>> Is is still useful for real Linux applications? Or do you have a
>> plan to lift this limitation?
> 
> As for if it is still useful, yes!  I've tested a bunch of Linux
> binaries (admittedly mostly ones written in Go) and they all worked as
> expected.  So from what I can tell, the fork() limitation is not that
> big of a deal in practice.

I want to make clear — I’m not trying to knock on the work that’s been done 
here… in fact, kudos to you, Theo, for doing this!

I only wanted to raise the concern about it as a *native* call… If it’s exposed 
natively, application developers might expect it to behave the same way, 
including the inherit-across-fork thing… I think if there’s a minor glitch in 
emulation, it’s a little easier to shrug off.  When I did timerfd and eventfd, 
I made sure the behavioral differences between the native NetBSD implementation 
and Linux were limited to things that didn’t make sense on NetBSD anyway, and 
would only be noticed by things that are elbow-deep in weird Linux-specific 
sh*t for other reasons :-)

Anyway, like I said, I think the best way forward is to make it possible for kq 
descriptors to be inherited… it’s a little tricky because of some of the wacky 
stuff kqueue can track, but I think NetBSD can lead on this and define a set of 
semantics that makes sense.

-- thorpej



Re: epoll exposure

2023-07-31 Thread Jason Thorpe


> On Jul 31, 2023, at 1:38 AM, nia  wrote:
> 
> Hey, I regret that epoll was committed without further discussion with
> pkgsrc developers. We have a lot of experience with this already
> (illumos/SmartOS exposes a compatibility epoll) and the situation is
> not entirely great and requires lots of workarounds.

What’s even worse … I think our epoll() emulation isn’t 100% compatible w/ 
Linux’s.  It wouldn’t bother me so much if it were, but I’m pretty sure it’s 
not.

(IIRC, epoll descriptors are inherited across fork, and if ours is emulated 
with kqueue under the covers, then that can’t work…)

-- thorpej



Re: Trivial program size inflation

2023-07-01 Thread Jason Thorpe


> On Jun 30, 2023, at 11:30 PM, tlaro...@polynum.com wrote:
> 
> Le Fri, Jun 30, 2023 at 01:37:10PM -0400, Mouse a écrit :
>> Based on something at work, I was looking at executable sizes.  I
>> eventually tried a program stripped about as far down as I could:
>> 
>> int main(void);
>> int main(void)
>> {
>> return(0);
>> }
>> 
>> and built it -static.  size on the resulting binary:

Ok, so:

the-ripe-vessel:thorpej 10$ cat nop.c
int main(void) { return 0; }
the-ripe-vessel:thorpej 11$ cc -static nop.c
the-ripe-vessel:thorpej 12$ size a.out   
   textdata bss dec hex filename
 581799   28928 2176553 2787280  2a87d0 a.out
the-ripe-vessel:thorpej 13$ cc --version
cc (nb1 20210411) 10.3.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

the-ripe-vessel:thorpej 14$ 

Should be pretty trivial to see what’s in there…

the-ripe-vessel:thorpej 15$ nm a.out | wc -l
1073
the-ripe-vessel:thorpej 16$ nm a.out | grep printf
00470176 T __vfprintf_unlocked_l
0046e96d T _snprintf
0046e9fc T _snprintf_l
00410319 T _snprintf_ss
00474706 T _vfprintf_l
0046e95b T _vsnprintf
0046e720 T _vsnprintf_l
0046d440 T _vsnprintf_ss
004370cf T je_malloc_cprintf
00437151 T je_malloc_printf
00436fcc T je_malloc_snprintf
0043704e T je_malloc_vcprintf
00434da1 T je_malloc_vsnprintf
0046e96d W snprintf
0046e9fc W snprintf_l
00410319 W snprintf_ss
004746b1 T vfprintf
00474706 W vfprintf_l
0046e95b W vsnprintf
0046e720 W vsnprintf_l
0046d440 W vsnprintf_ss
the-ripe-vessel:thorpej 17$ 

Hmmm, who is calling stdio routines here?

the-ripe-vessel:thorpej 19$ nm a.out | grep "je_" | wc -l
 392
the-ripe-vessel:thorpej 20$ 

Oh look, the entirety of jemalloc seems to be included in the binary.  WTF 
knows why that’s happening, but apparently it is, and jemalloc pulls in a ton 
of additional stuff.

Someone should maybe take a look into that.

-- thorpej



Re: Trivial program size inflation

2023-07-01 Thread Jason Thorpe



> On Jul 1, 2023, at 8:51 AM, Mouse  wrote:
> 
> Why TLS??  Is there some networking going on under the hood?

Thread Local Storage

-- thorpej



Re: Trivial program size inflation

2023-07-01 Thread Jason Thorpe


> On Jul 1, 2023, at 8:20 AM, tlaro...@polynum.com wrote:
> 
> This is also what I meant by "static seems to be considered deprecated".

Honestly, I find the obsession with static linking hilariously quaint.  NetBSD 
already bends backwards to an extreme degree by ensuring that old version of 
*system calls* work correctly *in the kernel*.  Some other systems provide the 
ABI contract at the libc boundary, and let the libc <-> kernel interface be 
fluid (keep the compatibility stuff in user-space where it belongs!)

Obviously this is not feasible to do with static binaries, and we have one 
platform that ONLY supports static, but for the rest, I honestly think we 
should officially deprecate static linking in the general case (obviously it 
has some uses in super-constrained environments, but in those cases we are 
often also using crunch’d binaries as well).

-- thorpej



Re: Trivial program size inflation

2023-07-01 Thread Jason Thorpe



> On Jul 1, 2023, at 1:41 PM, Michael van Elst  wrote:
> 
> thor...@me.com (Jason Thorpe) writes:
> 
>> Obviously this is not feasible to do with static binaries
> 
> In the world of go, you skip libc and static binaries are the norm.

Go sounds poorly designed, then.

-- thorpej



Re: Trivial program size inflation

2023-07-02 Thread Jason Thorpe


> On Jul 1, 2023, at 6:20 PM, Emmanuel Dreyfus  wrote:
> 
> On Sat, Jul 01, 2023 at 02:25:03PM +, RVP wrote:
>> Not to forget the code for C++ support. And, of course even static
>> binaries may call dlopen() and friends. So that dl*() and the ELF bits
>> right there.
> 
> At least in 9.3, dlopen() in a static binary does not work. Try using 
> a NSS module from a statically lunked binary to check that.

Right, because the dynamic linker is not loaded into static binaries.  For 
dynamic binaries, the kernel maps it as the program’s interpreter.

-- thorpej



Re: NetBSD 10 and NetBSD 11...

2023-05-17 Thread Jason Thorpe


> On May 17, 2023, at 11:47 AM, tlaro...@polynum.com wrote:
> 
> But the fact that the advertised list of changes for 10 stops at
> February 2023 is probably something the webmaster(s) should look
> at: it's a bit confusing/disturbing...

Those are just the major changes.  There has been a focus on bug fixes and 
other minor improvements since then.  Alas, often those don't warrant calling 
out in the CHANGES file.

Of course, major, potentially-destabilizing changes that are trickling into 
-current now are not going to be pulled into netbsd-10 until they've had more 
bake time, and not at all if they involve changes to the ABI.

-- thorpej



Re: NetBSD 10 and NetBSD 11...

2023-05-17 Thread Jason Thorpe



> On May 17, 2023, at 11:21 AM, tlaro...@polynum.com wrote:
> 
> I don't know on what mailing list to ask this...
> 
> I have seen on the web site that changes to 10 have stopped in february
> and that changes are now for 11...
> 
> Does this mean that 10 will never be released and that the focus is on
> 11 now?

There have been a steady stream of bug fixes from the trunk being pulled into 
the netbsd-10 release branch.  I don't have knowledge of releng@'s plans vis a 
vis a release date.

-- thorpej



Re: Issues with lseek(2) on a block device

2024-02-21 Thread Jason Thorpe


> On Feb 21, 2024, at 2:52 AM, Sad Clouds  wrote:
> 
> Hello, for most operating systems determining the size of a block
> device can be done with:
> 
> lseek(fd, 0, SEEK_END);

On what operating systems does this do what you claim?

> However, on NetBSD this does not seem to work.

It doesn’t work on macOS, either:

thorpej-mbp:thorpej 541$ sudo ./lseek /dev/disk4
Password:
lseek(fd, 4096, SEEK_SET) = 4096 bytes
lseek(fd, 0, SEEK_END)= 0 bytes
thorpej-mbp:thorpej 542$ 

-- thorpej



Re: Issues with lseek(2) on a block device

2024-02-21 Thread Jason Thorpe
(This should really be on tech-kern…)

> On Feb 21, 2024, at 1:20 PM, Taylor R Campbell 
>  wrote:
> 
>> Date: Wed, 21 Feb 2024 10:52:55 +
>> From: Sad Clouds 
>> 
>> Hello, for most operating systems determining the size of a block
>> device can be done with:
>> 
>> lseek(fd, 0, SEEK_END);
>> 
>> However, on NetBSD this does not seem to work.
> 
> Internally, this is happens for more or less the same reason that
> stat(2) doesn't return the size of a block device in st_size.
> 
> Each file system on which device nodes can reside implements its own
> VOP_GETATTR (used by both lseek(2) and stat(2)), and most of them use
> the _inode_ size (more or less) rather than querying the block device
> that the device node represents for its physical size.

Add a d_stat devsw entry point and implement spec_getattr()?  Except you don’t 
want ALL of the results from spec_getattr(), just some of them - rather, specfs 
can’t possibly provide all of the info, just some of it.  So the file system 
that owns the vnode (UFS, in this case) would need to call spec_getattr() to 
get the stuff that specfs can provide, then do its own stuff to overlay the 
usual UFS stuff over it.

-- thorpej



Re: Issues with lseek(2) on a block device

2024-02-25 Thread Jason Thorpe


> On Feb 25, 2024, at 12:29 AM, Robert Elz  wrote:
> 
> ps: as a semi-related anecdote, the first system I ever personally
> installed any unix version onto (way back in the mid 70's) booted from
> a reel to reel (1/2" I think) tape - and by booted, I mean ran with its
> filesystems (or filesystem, everything was in one) on the tape (read only,
> naturally).

HP-UX install media on QIC-24 back in the HP-UX 7/8/9 days (earliest I have 
experience with) was also mounted as the file system.  It’s quite a trip to see 
(well, hear, anyway) a tape drive get randomly-accessed like that.  And, yah, 
it’s not fast.

-- thorpej