rc.d scripts patch needs review

2020-10-24 Thread Rick Macklem
Hi,

I've put D26938 up on phabricator. The patch applies to the
/etc/rc.d scripts mountd and nfsd, to make use of the new
mountd "-R" option committed via r376026.

If anyone can review this, please do so.
(Is there a review group for rc scripts?)

Thanks, rick
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: copyright notice question

2020-10-21 Thread Rick Macklem
Mehmet Erol Sanliturk wrote:
>On Wed, Oct 21, 2020 at 4:04 AM Rick Macklem <rmack...@uoguelph.ca> wrote:
>>Warner Losh wrote:
>>>On Mon, Oct 19, 2020, 7:25 PM Rick Macklem <rmack...@uoguelph.ca> wrote:
>>>>I'll admit I've hesitated to ask this for a long time, but I guess I have 
>>>>to;-)
>>>>I've created two daemons for NFS-over-TLS, using the code in
>>>>/usr/src/usr.sbin/gssd/gssd.c as a starting point.
>>>>--> As such, I left the copyright notice from this file on the two files.
>>>>  (As you can see, it is a 2 clause FreeBSD one, so the terms aren't
>>>>   an issue.)
>>>>
>>>>What I am wondering is if I should be adding my name to it as an
>>>>additional author or something like that?
>>>>(I don't care, but it does seem a little weird that the copyright is held
>>>> by Isilon Inc, since I have no connection to them.)
>>>
>>>Likely. If you changed a substantial amount, then yes. The rule of thumb is
>>>that >50% is a no-brainer yes. Smaller percentages require more nuanced
>>>judgement as to whether the changes are substantial enough to create a new
>>>work. Chances are quite good you fall into one of those categories. Also, if
>>>you have replaced more than ~90%, chances are good no original work remains.
>>>Again, a judgement call. There are more technical legal guidelines, but that
>>>would require a lawyer.
>>>
>>>My hunch is that unless this is something far more trivial than I think, then
>>>I'd add the notice, but retain the old.
>>Well, I'd guess it's somewhere in the 50->90% category.
>>Would just adding a comment below the current copyright notice like:
>>/*
>> * Extensively modified from /usr/src/usr.sbin/gssd/gssd.c for RPC-over-TLS
>> * by Rick Macklem.
>> */
>>be sufficient for the project, or should I put a second copyright notice
>>on it with my name on it? (This will seem odd, since it will have the same
>>terms as the extant one, but if that is what is appropriate for the project..)
>>
>>It is my understanding (I'm obviously not a lawyer, clearly indicated by the
>>size of my bank account;-) that a copyright notice can only be changed by
>>the holder (or with their express permission), but an additional copyright
>>notice can be attached to the work to cover the re-write.
>>Is this correct? (All amateur lawyers, feel free to respond;-)
>>
>>Thanks for your comments, rick
>>
>>>Warner
>>
[copyright comment snipped]
>My opinion is as follows :
>
>At the top of the related sources I would include a message (approximately)
>as follows:
I believe for FreeBSD this would need to be after the main copyright notice,
but that is trivial, I think?


>After svn (or git) modification number(s) ... (including) I have made
>substantial (or significant) modifications (or improvements).
>The copyright of these modifications, under the present license listed below,
>belongs to
>
>Rick Macklem, starting from date.
>(Rick Macklem ... an approximately fixed address ...)
Does anyone know if there are examples of this in other major open
source projects?

I would be very shy of creating a notice that is not exactly what other
FreeBSD files have in them. For one thing, is referring to license terms in
another copyright notice "standard practice"?

I'll admit that, unless there are examples of this elsewhere in the FreeBSD
source tree (or at least in other major open source projects), I would not be
comfortable doing this.

Maybe I'll try asking this question...
Is there a concern that the copyright notice that is on gssd.c could be
considered "not valid" due to the extensive changes made to the code by me?
(If the answer to that is "no", then I don't see that anything needs to be done,
 since the notice includes reasonable terms as already used elsewhere in
 FreeBSD. I have no interest in being a copyright holder for this unless the
 copyright can be invalidated.)
Put another way, "Is there a concern that the extensive changes would allow the
copyright notice to be replaced by something like a GPL?".

rick, who would rather just leave the notice alone



>Each contributor may append such notifications listed on the topmost part.
>When a person reads such sources, she/he very easily understands its
>modification and copyright status without any doubt.
>
>Mehmet Erol Sanliturk


Re: copyright notice question

2020-10-20 Thread Rick Macklem
Warner Losh wrote:
>On Mon, Oct 19, 2020, 7:25 PM Rick Macklem <rmack...@uoguelph.ca> wrote:
>>I'll admit I've hesitated to ask this for a long time, but I guess I have 
>>to;-)
>>I've created two daemons for NFS-over-TLS, using the code in
>>/usr/src/usr.sbin/gssd/gssd.c as a starting point.
>>--> As such, I left the copyright notice from this file on the two files.
>>  (As you can see, it is a 2 clause FreeBSD one, so the terms aren't
>>   an issue.)
>>
>>What I am wondering is if I should be adding my name to it as an
>>additional author or something like that?
>>(I don't care, but it does seem a little weird that the copyright is held
>> by Isilon Inc, since I have no connection to them.)
>>
>Likely. If you changed a substantial amount, then yes. The rule of thumb is
>that >50% is a no-brainer yes. Smaller percentages require more nuanced
>judgement as to whether the changes are substantial enough to create a new
>work. Chances are quite good you fall into one of those categories. Also, if
>you have replaced more than ~90%, chances are good no original work remains.
>Again, a judgement call. There are more technical legal guidelines, but that
>would require a lawyer.
>
>My hunch is that unless this is something far more trivial than I think, then
>I'd add the notice, but retain the old.
Well, I'd guess it's somewhere in the 50->90% category.
Would just adding a comment below the current copyright notice like:
/*
 * Extensively modified from /usr/src/usr.sbin/gssd/gssd.c for RPC-over-TLS
 * by Rick Macklem.
 */
be sufficient for the project, or should I put a second copyright notice
on it with my name on it? (This will seem odd, since it will have the same
terms as the extant one, but if that is what is appropriate for the project..)

It is my understanding (I'm obviously not a lawyer, clearly indicated by the
size of my bank account;-) that a copyright notice can only be changed by
the holder (or with their express permission), but an additional copyright
notice can be attached to the work to cover the re-write.
Is this correct? (All amateur lawyers, feel free to respond;-)

Thanks for your comments, rick

>Warner


Here's what it currently says:
/*-
 * SPDX-License-Identifier: BSD-2-Clause-FreeBSD
 *
 * Copyright (c) 2008 Isilon Inc http://www.isilon.com/
 * Authors: Doug Rabson <d...@rabson.org>
 * Developed with Red Inc: Alfred Perlstein <alf...@freebsd.org>
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *notice, this list of conditions and the following disclaimer in the
 *documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 */

Thanks for any comments, rick


Re: review of new mountd option disabling use of rpcbind

2020-10-20 Thread Rick Macklem
Peter Eriksson wrote:
> Suggestion:
> Add a check for sysctl vfs.nfsd.server_min_nfsvers and if set to 4 or higher -
> automatically enable the “-R” option.
I actually have patches to the /etc/rc.d scripts that both set
vfs.nfsd.server_min_nfsvers=4 and the "-R" option.

The reason I went with an explicit "-R" is that I thought having mountd
magically stop registering with rpcbind might be considered a POLA
violation.
--> With the explicit "-R" option, it will only happen if the "-R" flag is
  set or if nfsv4_server_only="YES" is put in /etc/rc.conf (which is new,
  so it will be expected to result in different behaviour).
A second reason why the explicit "-R" might be preferred is:
if the nfsd is a loadable module, it is loaded by mountd.
However, to set the sysctl, it must be loaded before starting mountd.
(This is done by the /etc/rc.d/mountd script, so it is not a big issue, but
 might affect someone?)

However, nfsd already chooses not to register with rpcbind when
vfs.nfsd.server_min_nfsvers is set to 4, so I can also see an argument for doing
what you suggest, since it is consistent with what nfsd does.

I don't have a strong opinion either way.
What do others think?

Thanks for the comment, rick

> - Peter


> On 20 Oct 2020, at 02:56, Rick Macklem  wrote:
>
> Hi,
>
> I've put a patch up on phabricator that adds a new option to mountd
> which disables use of rpcbind. This can be done for NFSv4 only servers.
> It appears that rpcbind is now considered a security risk by some.
>
> I listed freqlabs@ as a reviewer, but if anyone else would like to review
> it, please do so. (Someone has reviewed the man page update already.
> Thanks bcr@.)
>
> It's D26746.
>
> rick


copyright notice question

2020-10-19 Thread Rick Macklem
I'll admit I've hesitated to ask this for a long time, but I guess I have to;-)
I've created two daemons for NFS-over-TLS, using the code in
/usr/src/usr.sbin/gssd/gssd.c as a starting point.
--> As such, I left the copyright notice from this file on the two files.
  (As you can see, it is a 2 clause FreeBSD one, so the terms aren't
   an issue.)

What I am wondering is if I should be adding my name to it as an
additional author or something like that?
(I don't care, but it does seem a little weird that the copyright is held
 by Isilon Inc, since I have no connection to them.)

Here's what it currently says:
/*-
 * SPDX-License-Identifier: BSD-2-Clause-FreeBSD
 *
 * Copyright (c) 2008 Isilon Inc http://www.isilon.com/
 * Authors: Doug Rabson 
 * Developed with Red Inc: Alfred Perlstein 
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *notice, this list of conditions and the following disclaimer in the
 *documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 */

Thanks for any comments, rick


review of new mountd option disabling use of rpcbind

2020-10-19 Thread Rick Macklem
Hi,

I've put a patch up on phabricator that adds a new option to mountd
which disables use of rpcbind. This can be done for NFSv4 only servers.
It appears that rpcbind is now considered a security risk by some.

I listed freqlabs@ as a reviewer, but if anyone else would like to review
it, please do so. (Someone has reviewed the man page update already.
Thanks bcr@.)

It's D26746.

rick


RFC: gssd needs /usr mounted to start up

2020-10-10 Thread Rick Macklem
Meowthink reported a problem on freebsd-hackers@ where the
gssd would not start up because /usr was not yet mounted.
(I moved the discussion here, hoping to catch more comments.)

He has a separately mounted /usr and, recently, gssd was failing
to start since /usr was not yet mounted when /etc/rc.d/gssd was
executed.
Looking at /etc/rc.d/gssd, this is not surprising, since the REQUIRE
line only lists "root" as a requirement.
I can see a couple of things that can be done, but no obvious ideal
solution:
(A) - Add "mountcritlocal" to the REQUIRE line, which is what
Meowthink has done.
This seems harmless and works for the case of a local filesystem
/usr, but does not work if /usr is an NFS mounted file system.

(B) - Add both "mountcritlocal" and "mountcritremote" to the
REQUIRE line.
This would also fix the case of an NFS mounted /usr, but it also
implies that all NFS entries in /etc/fstab that use "sec=krb5[ip]"
would also need the "late" option specified.
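
To make the two options concrete, here is a rough sketch of what the
rcorder header line in /etc/rc.d/gssd and an affected /etc/fstab entry
might look like. Only the "root", "mountcritlocal" and "mountcritremote"
names, "sec=krb5[ip]" and "late" come from the description above; the
server name, paths and the other mount options are made up for
illustration.

```
# /etc/rc.d/gssd header line, option (A)
# REQUIRE: root mountcritlocal

# /etc/rc.d/gssd header line, option (B)
# REQUIRE: root mountcritlocal mountcritremote

# /etc/fstab entry that would need "late" under option (B)
# (hypothetical server and mount point)
nfs-server:/home   /home   nfs   rw,nfsv4,sec=krb5i,late   0   0
```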

I am thinking that (A) can be done and MFC'd, since it shouldn't
break anything (or cause a POLA violation).
Maybe (B) can be done for head/FreeBSD13 with an entry in the
Release notes, indicating the need for "late" on NFS entries using
"sec=krb5[ip]" in /etc/fstab. (It would result in a POLA violation if
MFC'd, since "sec=krb5[ip]" entries in /etc/fstab would break until
"late" is added.)

I am interested in a solution for this, in part, because the daemons
for NFS over TLS have the same problem.

Any ideas/suggestions, rick
ps: I thought of moving gssd to /sbin, but it uses several libraries,
  including Kerberos ones, that are in /usr/lib.



Re: RFC: should copy_file_range(2) remain Linux compatible or support special files?

2020-10-01 Thread Rick Macklem
Mateusz Guzik wrote:
>On 9/27/20, Alan Somers  wrote:
>> On Sun, Sep 27, 2020 at 7:49 AM Wall, Stephen 
>> wrote:
>>
>>>
>>> > I'll assume you are referring to the "flags" argument when you say
>>> > "param" above.
>>>
>>> Correct, I was misremembering the man page.
>>>
>>> > However, since the Linux man page says it will return EINVAL if
>>> > the "flags" argument is non-zero, you've still introduced an
>>> > incompatibility w.r.t. the Linux behaviour.
>>>
>>> This would be a one-way incompatibility, i.e. code written on linux will
>>> run unaltered on FreeBSD.
>>> If the flag were along the lines of `FREEBSD_COPY_DEVICES` (or whatever,
>>> important part is `FREEBSD`) it will be quite obvious that this code
>>> needs to be adapted to other platforms:
>>> ```
>>> #ifndef FREEBSD_COPY_DEVICES
>>> #define FREEBSD_COPY_DEVICES 0
>>> #endif
>>> ```
>>>
>>> > Why require extra work for so little purpose?
>>>
>>> I'm sorry, I'm not sure what extra work you are referring to.  Specifying
>>> a flag on copy_file_range(2)?  That's trivial.
>>>
>>
>> It's easy to leave out, which could cause a lot of pain for users who don't
>> understand why their application isn't working.
>>
>
>A FreeBSD-specific flag to a Linux-alike syscall is bound to run into
>a conflict at some point, making it a non-starter.
>
>>
>>>
>>> > My opinion is that if we can make it work for character devices, we
>>> > should.
This turns out to be a lot messier than I thought it would be.
For example: /dev/zero cannot be read via VOP_READ() on the vnode.
To read it, you must use dofileread() on the file descriptor.
--> This implies a separate copy loop from the one implemented by
vn_generic_copy_file_range(), which works on vnodes. (And that needs to
remain, because the NFS server only has vnodes and no open file descriptors.)

At least that appears to be the case when I tried it and then looked in
sys/fs/devfs and sys/dev/null when it didn't work.

rick

>>
>> Well, collecting opinions was the point, no? :)
>>
>> What's going to use this function besides system commands?  I think I saw
>> `cp` and `dd` mentioned - I think it unlikely you need to be concerned
>> about their portability.
>>
>
> Userspace RAID-like applications could use it for rebuilds, and they'll
> need it to work on device nodes.  Userspace NFS servers and iSCSI servers
> could obviously use it.  And since the FUSE protocol includes a
> COPY_FILE_RANGE operation, many FUSE daemons could implement that with
> copy_file_range(2).

>I think the first thing to do is check what Linux is doing here, most
>notably they may have other primitives to take care of it and in that
>case it would be best to implement equivalents.
>
>I don't have a strong opinion on VCHR support. I will note there may
>be Linux code expecting to fail with such an argument.
>
>If the Linux-compatible approach mentioned above is not going to work out,
>I think the best thing to do is to add another syscall
>(copy_file_range_np?) which can be tweaked however we see fit.
>
>--
>Mateusz Guzik


Re: RFC: should copy_file_range(2) remain Linux compatible or support special files?

2020-09-26 Thread Rick Macklem
Wall, Stephen wrote:
> Could the as yet unused options param have a bit assigned to trigger the new
> behavior?  Inform the linux community of the addition and let them decide if
> they would like to adopt it as well.
I'll assume you are referring to the "flags" argument when you say "param"
above.

You could. However, since the Linux man page says it will return EINVAL if
the "flags" argument is non-zero, you've still introduced an incompatibility
w.r.t. the Linux behaviour.
It does make it clear that copy_file_range(2) will have the non-Linux behaviour
when the flag is specified, which I think is a good idea.

rick




RFC: should copy_file_range(2) remain Linux compatible or support special files?

2020-09-26 Thread Rick Macklem
I know cross-posting is frowned upon, but I wanted everyone who might
like to comment to see this.

Currently copy_file_range(2) only supports regular files, which is compatible
with the Linux one, where EINVAL is returned when either file descriptor
refers to a non-regular file.

Alan Somers would like to extend the syscall to handle special files.
I think he has a couple of reasons for this (he can correct me):
- When integrating it into "cp", he needed to provide a fallback for
  special files and similar fallbacks would probably be needed for
  other utilities like "dd".
- iSCSI provides a "copy" operation which could be implemented using
  copy_file_range(2)/VOP_COPY_FILE_RANGE() if it supported special files.

kib@ was concerned that a copy from /dev/zero would fill a disk, but
I think that issue can be dealt with by limiting the duration of the syscall
to 1sec (so that the utility can be terminated via SIGTERM or similar).

I am on the fence w.r.t. this, since I modelled it after the Linux one and keeping
it Linux compatible would facilitate portable code, but I understand why
Alan Somers wants to extend it (the iSCSI support seems particularly useful).

Everyone, please comment on this, rick


review of mountd.c change

2020-09-21 Thread Rick Macklem
Hi,

I just put a patch up in phabricator (D26521) which avoids always
malloc()'ng a large MAX_NGROUPS sized groups list by having a
small one in the structure and then only malloc()'ng when a large
groups list is needed.

I've listed kib@ and freqlabs@ as reviewers, but anyone else is
welcome to review it.

The review is probably about the technique I used.
(Alternately, the code could just always malloc() the groups array at the
 correct size instead of using the SMALLNGROUPS one or malloc()'ng.)
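
For readers who have not looked at D26521, the general technique can be
sketched as below: a small array lives inside the structure and the
heap is only used when the list does not fit. This is a minimal
stand-alone illustration and not the mountd.c code; struct cred_sketch,
cred_set_groups(), cred_free_groups() and the value chosen for
SMALLNGROUPS are all assumptions made for the example.

```c
#include <sys/types.h>
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SMALLNGROUPS	16	/* assumed size of the inline list */

/* Hypothetical credential structure; not the one used by mountd.c. */
struct cred_sketch {
	int	 ngroups;
	gid_t	*groups;	/* points at smallgroups or at heap memory */
	gid_t	 smallgroups[SMALLNGROUPS];
};

/* Store a copy of a group list, allocating only when it does not fit. */
int
cred_set_groups(struct cred_sketch *cr, const gid_t *gids, int ngroups)
{
	assert(ngroups >= 0);
	if (ngroups <= SMALLNGROUPS)
		cr->groups = cr->smallgroups;
	else {
		cr->groups = malloc((size_t)ngroups * sizeof(gid_t));
		if (cr->groups == NULL)
			return (-1);
	}
	if (ngroups > 0)
		memcpy(cr->groups, gids, (size_t)ngroups * sizeof(gid_t));
	cr->ngroups = ngroups;
	return (0);
}

/* Release the heap copy, if there is one. */
void
cred_free_groups(struct cred_sketch *cr)
{
	if (cr->groups != cr->smallgroups)
		free(cr->groups);
	cr->groups = NULL;
	cr->ngroups = 0;
}
```

One thing to watch with this layout is that a plain structure copy
leaves the copy's groups pointer aimed at the original's inline array,
so any copy routine has to fix the pointer up.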

Thanks in advance to anyone that reviews it, rick


Re: Plans for git

2020-09-20 Thread Rick Macklem
Christian Weisgerber wrote:
> On 2020-09-19, Zaphod Beeblebrox  wrote:
> 
> > Hrm.  Maybe what I hear others saying, tho, and not entirely being replied
> > to is just a nice concise document of the why.  What I hear you saying is
> > that GIT has momentum and that it's popular... (and I accept that --- it is
> > evidently true), but then I hear handwaving about features, but no list of
> > features that are a clear win/lose.
>
> How about the very basics (that Warner appears to have lost sight
> of)?
>
> Git is a distributed version control system.  You clone a repository
> and apart from pulling and pushing changes to another repository,
> all your work happens with the local repository.  Subversion has a
> central repository and needs to talk to the server all the time.
> Laptop on a plane?  No change of workflow with Git.
Well, I (mostly) lurk on the linux-...@vger.kernel.org mailing list,
where the Linux NFS work gets done.
What I see is the following, when someone has an enhancement/change
for the Linux NFS code.
Do I see one diff with all the changes in it...No.
I see anywhere from a few to over 50 email messages, each with
one little piece of the pie, out of git.

I have no idea how they review this stuff.
If I were stuck doing it, I'd end up creating an unpatched tree, copying it
and applying all the patches to the copy and then creating a single diff
to look at in phabricator, which does display the changes very nicely.

So, I hope that a transition to git does not encourage "lots of small
loosely related commits" to the FreeBSD repository.

Also, I find svnweb useful, mostly to look at the commits done to a
file in temporal (most recent first) order. The global serial revision
number is very nice, but so long as it is easy to see the temporal
ordering of changes, I can live with that.

> And since it's your repository, you can cheaply create your own
> branches, where you can commit your work and have a versioned history
> of it instead of just a flat diff.  I can't overstate the value of
> that.  Whether you work on something that will be pushed back
> upstream or just your private changes, it has a full commit history.
I, on the other hand, will have no use for this. I can easily keep track of
changes I do by naming file.sav, file.sav2,...
I like to carefully merge changes into the repository checkout after I've
tested them, taking a careful look at the changes as I go.
I find most of the subtle bugs (that wouldn't be detected during normal
testing) during this "code inspection".
--> I think anything that encourages another look at the change before
  commit is a good thing.
  Put another way, slow and careful is better than quick and easy, imho.

I'll live with the transition, but to be honest, I know it won't make my
work better or easier, rick

> You can easily revert commits, you can upstream it one by one, you
> can upstream it with history.
>
> When FreeBSD switched from CVS to SVN, there was hope or promise
> of lightweight branches, but that never materialized.  Developers
> still can't have private branches in the FreeBSD repository.  For
> a while, a lot of development happened in a Perforce repository--a
> commercial version control system, whose company had donated a
> license--which offered this feature.  Nowadays, everybody who does
> any but the most trivial development does so in a private Git
> repository anyway.  It only makes sense to interface this directly
> with the FreeBSD repository instead of going through a SVN<>Git
> media break.
>
> > Certainly the only clear things a quick search turns up that seem relevant
> > is that GIT is GPL2.0 and SVN is Apache2.0.  This was enough for LLVM vs
> > GCC and the repository is a core function, but I suppose not a necessary
> > function for forked projects that can't abide, so...
>
> There is a bit of historical precedent: The original BSD work at
> Berkeley was kept in a SCCS repository, a proprietary version control
> system at the time.
>
> And of course the fact that significant FreeBSD development has
> effectively happened in Perforce, then in Git for a long time and
> is just merged back into the Subversion repository.  To put it
> bluntly, the people doing the work have voted with their feet years
> ago.
>
> --
> Christian "naddy" Weisgerber  na...@mips.inka.de


Re: Documentation regarding NFSv4

2020-09-18 Thread Rick Macklem
Russell L. Carter wrote:
>On 2020-09-18 16:28, Rick Macklem wrote:
> > Oh, and I forgot to mention name<->id# mapping.
> > If using AUTH_SYS (not kerberos), then you have the
> > choice of running "nfsuserd" or setting these two sysctls to 1.
> > vfs.nfs.enable_uidtostring=1
> > vfs.nfsd.enable_stringtouid=1
> > --> This makes the server just handle id#s (uid, gid) as numbers in
> > a string. (This is the default for Linux these days although it was
> > frowned upon in the early days.)
> >
> > Running nfsuserd maps uid, gid numbers to/from names using the
> > password and group databases. This must be used for Kerberos mounts.
> >
> > Without the above properly configured, you'll see lots of files owned
> > by "nobody" on the client mounts.
>
>Those sysctls are interesting.  I wasn't aware of them and so I run
>nfsuserd.  What do they do, practically speaking?  My understanding,
>likely wrong, is that nfsuserd should allow different uid/gid
>server->client mappings, possibly different for different clients.
Well, in theory, yes.
In practice, that never really happened.
When NFSv4 was being designed, putting uid/gid numbers in file attributes
was felt to be too POSIX centric, so in file attributes, they are defined
as a string of the form "user@domain" or "group@domain".
What never happened was a good definition of what "domain" was supposed
to be or how clients/servers would handle multiple domains.
--> So, only one "domain" normally works and it is usually the same
  as the domain part of the machine's hostname.

Linux got tired of doing the number->string and string->number
mapping (awkward for NFS mounted root file systems, since the mapping
daemon is not running right away), so they switched to just doing
"uid" and "gid" (ie. the numbers in strings).
--> By setting the sysctls (both for the server), you run Linux compatible
   and don't need to run the nfsuserd (unless you use the -manage-gids
   option on it).

These days Linux is the de-facto standard (unless you are using Windows).

>However I still had to sync uid/gids across machines even though they
>are all running nfsuserd.  Didn't disable nfsuserd because... system
>is working... DFWI.
Well, user authentication is a different story...
- For Kerberos, the kerberos user principal is translated to POSIX
  credentials by the gssd daemon and you don't need a consistent
  uid, gid space, but do need to run nfsuserd, since the "uid" and "gid"
  strings don't work.
- Otherwise, you are using AUTH_SYS, which means the RPC authenticator
  has a uid and gid list in it and the credentials are derived from that.
  (If you run "nfsuserd -manage-gids", then the uid is used to acquire
   a list of gids on the server from its group database. Otherwise, the
   list of gids in the RPC authenticator is used.)
  --> You need a uniform uid space (and a uniform gid space unless you
      are using "nfsuserd -manage-gids").
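
As a rough summary of the choices above for an AUTH_SYS setup, the
configuration might look something like the fragment below. The two
sysctl names are the ones quoted earlier in this thread; putting them in
/etc/sysctl.conf, and the nfsuserd_enable and nfsuserd_flags variables in
/etc/rc.conf, is my assumption about where one would normally set them.

```
# Alternative 1, /etc/sysctl.conf: pass uid/gid numbers as strings
vfs.nfs.enable_uidtostring=1
vfs.nfsd.enable_stringtouid=1

# Alternative 2, /etc/rc.conf: map names via the passwd/group databases
# (this is the one needed for Kerberos mounts)
nfsuserd_enable="YES"
# optional: have the server look up the gid list from the uid
nfsuserd_flags="-manage-gids"
```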

Confusing, yes.

rick
 
>Anyway, naked FreeBSD-stable nfsv4 is rock solid in a clamped down
>arena with a variety of FreeBSD and Debian clients.  Kudos.
>
>Thanks,
>Russell



Re: Documentation regarding NFSv4

2020-09-18 Thread Rick Macklem
Oh, and I forgot to mention name<->id# mapping.
If using AUTH_SYS (not kerberos), then you have the
choice of running "nfsuserd" or setting these two sysctls to 1.
vfs.nfs.enable_uidtostring=1
vfs.nfsd.enable_stringtouid=1
--> This makes the server just handle id#s (uid, gid) as numbers in
    a string. (This is the default for Linux these days although it was
    frowned upon in the early days.)

Running nfsuserd maps uid, gid numbers to/from names using the
password and group databases. This must be used for Kerberos mounts.

Without the above properly configured, you'll see lots of files owned
by "nobody" on the client mounts.

rick

________
From: Rick Macklem 
Sent: Friday, September 18, 2020 7:21 PM
To: Shawn Webb; freebsd-current@freebsd.org; freebsd-sta...@freebsd.org
Subject: Re: Documentation regarding NFSv4

Shawn Webb wrote:
>Hey all,
>
>It appears the Handbook and the nfsv4 manpages don't really agree,
>leading to some confusion as to how to properly set up an NFSv4 server
>on FreeBSD.
>
>Any guidance would be appreciated.
1 - I never look at the Handbook, but do try and maintain the man pages.
 Since you didn't explain the specifics related to your confusion, all I can
 say is that the man pages are probably more correct.

Assuming you already have a running NFSv3 NFS server, all you need to do
is:
- Add a V4: line to your /etc/exports file. This does not "export any file
  systems" (that is done by other lines in /etc/exports, exactly the same as
  for NFSv3). However, it does tell the NFSv4 server where the "root" is for
  NFSv4 clients (i.e., where in the server's file system tree a
  "nfs-server:/" mount done by an NFSv4 client ends up).
- Add nfsv4_server_enable="YES" to your /etc/rc.conf.

Note that, since NFSv4 does allow a mount to cross server mount points (unlike
NFSv3), a client will normally only do a single mount at or near the "root"
specified by the "V4:" line (see "man exports").
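
To make the "V4:" line concrete, here is a small sketch in C that scans exports-style text for that line and extracts the NFSv4 root. It is an illustration only: `find_v4_root()` is a hypothetical checker, and the paths and client name in `example_exports` are placeholders, not a recommended layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Example /etc/exports content; paths and client name are placeholders. */
const char example_exports[] =
    "/export/home -maproot=root client1\n"
    "V4: /export\n";

/*
 * Hypothetical checker: find the "V4:" line and copy the NFSv4 root path
 * into root[]. Returns 0 on success, or -1 if there is no V4: line or
 * the path does not fit in root[].
 */
int
find_v4_root(const char *text, char *root, size_t rootlen)
{
	const char *line = text;

	assert(text != NULL && root != NULL && rootlen > 0);
	while (*line != '\0') {
		const char *eol = strchr(line, '\n');
		size_t len = (eol != NULL) ? (size_t)(eol - line) : strlen(line);

		if (len > 3 && strncmp(line, "V4:", 3) == 0) {
			const char *p = line + 3;
			const char *end = line + len;
			const char *q;
			size_t plen;

			/* Skip white space, then take the first word as the path. */
			while (p < end && (*p == ' ' || *p == '\t'))
				p++;
			for (q = p; q < end && *q != ' ' && *q != '\t'; q++)
				;
			plen = (size_t)(q - p);
			if (plen == 0 || plen >= rootlen)
				return (-1);
			memcpy(root, p, plen);
			root[plen] = '\0';
			return (0);
		}
		if (eol == NULL)
			break;
		line = eol + 1;
	}
	return (-1);
}
```

With the example text, the root comes back as "/export", so a client mounting "nfs-server:/home" would land in /export/home on the server. The other line is an ordinary export, written the same way as for NFSv3.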

If you explain what inconsistencies are in the docs, maybe someone could
fix them.

rick

Thanks,

--
Shawn Webb
Cofounder / Security Engineer
HardenedBSD

GPG Key ID:  0xFF2E67A277F8E1FA
GPG Key Fingerprint: D206 BB45 15E0 9C49 0CF9  3633 C85B 0AF8 AB23 0FB2
https://git-01.md.hardenedbsd.org/HardenedBSD/pubkeys/src/branch/master/Shawn_Webb/03A4CBEBB82EA5A67D9F3853FF2E67A277F8E1FA.pub.asc


Re: Documentation regarding NFSv4

2020-09-18 Thread Rick Macklem
Shawn Webb wrote:
>Hey all,
>
>It appears the Handbook and the nfsv4 manpages don't really agree,
>leading to some confusion as to how to properly set up an NFSv4 server
>on FreeBSD.
>
>Any guidance would be appreciated.
1 - I never look at the Handbook, but do try and maintain the man pages.
 Since you didn't explain the specifics related to your confusion, all I can
 say is that the man pages are probably more correct.

Assuming you already have a running NFSv3 NFS server, all you need to do
is:
- Add a V4: line to your /etc/exports file. This does not "export any file
  systems" (that is done by other lines in /etc/exports, exactly the same as
  for NFSv3). However, it does tell the NFSv4 server where the "root" is for
  NFSv4 clients (i.e., where in the server's file system tree a
  "nfs-server:/" mount done by an NFSv4 client ends up).
- Add nfsv4_server_enable="YES" to your /etc/rc.conf.

Note that, since NFSv4 does allow a mount to cross server mount points (unlike
NFSv3), a client will normally only do a single mount at or near the "root"
specified by the "V4:" line (see "man exports").

If you explain what inconsistencies are in the docs, maybe someone could
fix them.

rick

Thanks,

--
Shawn Webb
Cofounder / Security Engineer
HardenedBSD

GPG Key ID:  0xFF2E67A277F8E1FA
GPG Key Fingerprint: D206 BB45 15E0 9C49 0CF9  3633 C85B 0AF8 AB23 0FB2
https://git-01.md.hardenedbsd.org/HardenedBSD/pubkeys/src/branch/master/Shawn_Webb/03A4CBEBB82EA5A67D9F3853FF2E67A277F8E1FA.pub.asc


Re: rfc: should extant TLS connections be closed when a CRL is updated?

2020-09-16 Thread Rick Macklem
John-Mark Gurney wrote:
>Rick Macklem wrote this message on Fri, Sep 04, 2020 at 01:20 +:
>> The server side NFS over TLS daemon (rpc.tlsservd) can reload an updated
>> CRL (Certificate Revocation List) when a SIGHUP is posted to it.
>> However, it does not SSL_shutdown()/close() extant TCP connections using TLS.
>> (Those would only be closed if the daemon is restarted.)
>>
>> I am now thinking that, maybe, an SSL_shutdown()/close() should be done on
>> all extant TCP connections using NFS over TLS when an updated CRL is loaded,
>> since a connection might have used a revoked certificate for its handshake.
>>
>> What do others think?
>
>IMO, this should scan the existing connections, and only shut them
>down if they are using a revoked Cert.  This is the correct way to
>do things.
>
>I do realize that this is likely not possible, and in reality, the
>ssl library in use should do this automatically, but likely does not.
Well, not exactly "automatically", but X509_CRL_get0_by_cert() checks
to see if a certificate is revoked, so all the code needed to do was
read the CRL file and then loop through the certificates, checking
each one.

>As the library likely does not, we should probably make this an
>option to close all connections upon CRL reload, with it being well
>documented.
>
>Now that option should likely be set to default on, but documented
>such that if you do regular/often CRL reloads, that a user may want
>to turn that off if it's disruptive to their server.
Not necessary, since doing just the revoked ones seems to work.

If you are curious, you can look at the recent commits or code
under head/projects/nfs-over-tls.

If anyone is interested in testing it, you can look at:
https://people.freebsd.org/~rmacklem/nfs-over-tls-setup.txt

Thanks for the useful suggestion, rick

--
  John-Mark Gurney  Voice: +1 415 225 5579

 "All that I will do, has been done, All that I have, has not."



Re: rfc: should extant TLS connections be closed when a CRL is updated?

2020-09-04 Thread Rick Macklem
John-Mark Gurney wrote:
>Rick Macklem wrote this message on Fri, Sep 04, 2020 at 01:20 +:
>> The server side NFS over TLS daemon (rpc.tlsservd) can reload an updated
>> CRL (Certificate Revocation List) when a SIGHUP is posted to it.
>> However, it does not SSL_shutdown()/close() extant TCP connections using TLS.
>> (Those would only be closed if the daemon is restarted.)
>>
>> I am now thinking that, maybe, an SSL_shutdown()/close() should be done on
>> all extant TCP connections using NFS over TLS when an updated CRL is loaded,
>> since a connection might have used a revoked certificate for its handshake.
>>
>> What do others think?
>
>IMO, this should scan the existing connections, and only shut them
>down if they are using a revoked Cert.  This is the correct way to
>do things.
Yes. I agree with you and Stefan that this is the way to go.
(When I test with a single client, I sometimes forget that there might be
 1000s of connections on a production server.)

>I do realize that this is likely not possible, and in reality, the
>ssl library in use should do this automatically, but likely does not.
>As the library likely does not, we should probably make this an
>option to close all connections upon CRL reload, with it being well
>documented.
Well, I haven't looked yet, but I suspect that there are lower level OpenSSL
library functions that can be used to read each entry from the CRL.

If I can do that, it is just comparing the Issuer and Serial# with the ones
associated with the connection (captured when the handshake is done).
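
The comparison described above can be sketched as a loop over the open connections. This is a toy model in C, not the rpc.tlsservd code: `struct crl_entry`, `struct tls_conn` and `close_revoked()` are hypothetical, issuer names are plain strings, and serial numbers are small integers rather than the ASN.1 types OpenSSL uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one CRL entry: issuer name plus serial number. */
struct crl_entry {
	const char *issuer;
	unsigned long serial;
};

/* Hypothetical model of a connection; issuer/serial captured at handshake. */
struct tls_conn {
	const char *issuer;
	unsigned long serial;
	bool open;
};

/* Close only the connections whose certificate is in the CRL. */
size_t
close_revoked(struct tls_conn *conns, size_t nconns,
    const struct crl_entry *crl, size_t ncrl)
{
	size_t closed = 0;

	assert(conns != NULL || nconns == 0);
	assert(crl != NULL || ncrl == 0);
	for (size_t i = 0; i < nconns; i++) {
		if (!conns[i].open)
			continue;
		for (size_t j = 0; j < ncrl; j++) {
			if (conns[i].serial == crl[j].serial &&
			    strcmp(conns[i].issuer, crl[j].issuer) == 0) {
				/* A real daemon would SSL_shutdown()/close() here. */
				conns[i].open = false;
				closed++;
				break;
			}
		}
	}
	return (closed);
}
```

The serial number alone is not enough, since two issuers may hand out the same serial, which is why both fields are compared. Connections that do not match stay open, so a CRL reload does not disturb unaffected clients.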

So long as the lower level ssl library functions are not internal ones,
I am comfortable doing that. (It might make the code a little harder
to maintain, but I suspect what is in OpenSSL3 will be around for a while,
API wise?)

>Now that option should likely be set to default on, but documented
>such that if you do regular/often CRL reloads, that a user may want
>to turn that off if it's disruptive to their server.
I think this is the fallback, if I can't easily read the entries out of the CRL.

Thanks for the good comments (Stefan too), rick

--
  John-Mark Gurney  Voice: +1 415 225 5579

 "All that I will do, has been done, All that I have, has not."



rfc: should extant TLS connections be closed when a CRL is updated?

2020-09-03 Thread Rick Macklem
Hi,

The server side NFS over TLS daemon (rpc.tlsservd) can reload an updated
CRL (Certificate Revocation List) when a SIGHUP is posted to it.
However, it does not SSL_shutdown()/close() extant TCP connections using TLS.
(Those would only be closed if the daemon is restarted.)
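
For readers unfamiliar with the SIGHUP convention, a common way for a daemon to implement it is sketched below. This is a generic pattern and an assumption about structure, not the actual rpc.tlsservd source: the handler only sets a flag, and the main loop does the real work (here, re-reading the CRL) when it sees the flag. The names `install_sighup_handler()` and `take_reload_request()` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L	/* for sigaction() under strict -std= modes */
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Set by the signal handler; polled by the daemon's main loop. */
static volatile sig_atomic_t reload_requested = 0;

static void
sighup_handler(int sig)
{
	(void)sig;
	reload_requested = 1;	/* only async-signal-safe work here */
}

/* Install the SIGHUP handler; returns 0 on success, -1 on failure. */
int
install_sighup_handler(void)
{
	struct sigaction sa;

	sa.sa_handler = sighup_handler;
	sa.sa_flags = 0;
	if (sigemptyset(&sa.sa_mask) != 0)
		return (-1);
	return (sigaction(SIGHUP, &sa, NULL));
}

/*
 * Called from the main loop. Returns true (and clears the flag) if a
 * SIGHUP arrived since the last call; the caller then reloads the CRL.
 */
bool
take_reload_request(void)
{
	if (!reload_requested)
		return (false);
	reload_requested = 0;
	return (true);
}
```

The point where `take_reload_request()` returns true is where the question in this message arises: after the new CRL is loaded, something has to decide what to do with connections that are already established.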

I am now thinking that, maybe, an SSL_shutdown()/close() should be done on
all extant TCP connections using NFS over TLS when an updated CRL is loaded,
since a connection might have used a revoked certificate for its handshake.

What do others think?

Thanks, rick


Re: /sys/modules/ nfscl & nfsd

2020-09-01 Thread Rick Macklem
Julian H. Stacey wrote:
>Hi curr...@freebsd.org,
>
>/sys/modules/ nfscl & nfsd
>With .ctm_status src-cur 14656 .svn_revision 364986
>
>/usr/src/sys/fs/nfsclient/nfs_clkrpc.c:40:10: fatal error: 'opt_kern_tls.h' file not found
># #include "opt_kern_tls.h"
>
># /usr/src/sys/modules/nfsd
>#   /usr/src/sys/fs/nfsserver/nfs_nfsdkrpc.c:41:10: fatal error: 'opt_kern_tls.h' file not found
>
>Avoided for now by manually patching out modules/Makefile
Should be fixed by r365262.

Thanks for reporting it, rick

Cheers,
Julian
--
Julian Stacey, Consultant Sys. Engineer, BSD Linux http://berklix.com/jhs/
Crash Brexit Dec. 2020 paid by speculators. http://berklix.uk/brexit/#money
Contraception V. Global warming, pollution, hunger, contagion, resource wars.


Re: should rpctlssd be called rpc.tlssd?

2020-09-01 Thread Rick Macklem
Gary Jennejohn wrote:
>On Tue, 1 Sep 2020 13:00:33 +0200 (CEST)
>Ronald Klop  wrote:
>> From: Rick Macklem 
>> Date: Tuesday, 1 September 2020 04:37
>> To: "freebsd-current@FreeBSD.org" 
>> Subject: should rpctlssd be called rpc.tlssd?
>> >
>> > This sounds trivial, but I thought I'd ask, in case anyone
>> > has a preference?
>> >
>> > The NFS over TLS code includes two daemons, one for
>> > the client and one for the server.
>> > I have called them rpctlscd and rpctlssd.
>> >
>> > There was/is a tradition in Sun RPC of putting a "." in
>> > the names.
>> > So, should I be calling these daemons:
>> > rpc.tlscd and rpc.tlssd?
>>
>> I don't have an opinion about the rpc* vs rpc.* tradition.
>> But what I do not understand is why the difference between 2 daemons
>> is only reflected in 1 character of their names.  The rest of the
>> name is actually not really significant in keeping them apart.
>>
>
>I had the same reaction.  Maybe something like rpc.tlsclntd and rpc.tlsservd?
Good point. Ben Kaduk thought the second "s" was a typo.

So, unless I hear comments to the contrary, rpc.tlsclntd and rpc.tlsservd it is.

Thanks everyone for your comments, rick
ps: Using a single letter was the old tradition of "shorter is better"
(ls, cp, mv instead of dir, copy, move).
But these aren't commands typed by users, so having more obvious names
seems correct.

--
Gary Jennejohn



should rpctlssd be called rpc.tlssd?

2020-08-31 Thread Rick Macklem
This sounds trivial, but I thought I'd ask, in case anyone
has a preference?

The NFS over TLS code includes two daemons, one for
the client and one for the server.
I have called them rpctlscd and rpctlssd.

There was/is a tradition in Sun RPC of putting a "." in
the names.
So, should I be calling these daemons:
rpc.tlscd and rpc.tlssd?

Thanks, rick


Re: Does FreeBSD have an assigned Internet OID?

2020-08-30 Thread Rick Macklem
Poul-Henning Kamp wrote:
>Rick Macklem writes:
>> Poul-Henning Kamp wrote:
>
>> Is https://reviews.freebsd.org/D26225
>> sufficient to allow me to use 1.3.6.1.4.1.2238.1.1.1 for a user@domain
>> name in this otherName component of subjAltName in the X.509 cert?
>> (I didn't list the UserName as the first item of the subtree. Should I?)
>
>You should add a comment about how suballocations (if allowed) happens
>under that branch.
I think it is easiest to do them in this file.
I have added an entry for it.

>> Do I need to update the date/time for LAST-UPDATED and REVISION
>> when I commit it, I'd guess?
>
>Yes please.
I've updated https://reviews.freebsd.org/D26225

If you could review this, it would be appreciated, rick

--
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.



Re: Does FreeBSD have an assigned Internet OID?

2020-08-28 Thread Rick Macklem
Poul-Henning Kamp wrote:
>Rick Macklem writes:
>> For the NFS over TLS work, I have a need for an Internet OID.
>> (I understand that IETF assigns ones for things like SNMP under
>> 1.3.6.1.4.1...)
>
>See:
>
>/usr/share/snmp/mibs/FREEBSD-MIB.txt
Is https://reviews.freebsd.org/D26225
sufficient to allow me to use 1.3.6.1.4.1.2238.1.1.1 for a user@domain
name in this otherName component of subjAltName in the X.509 cert?
(I didn't list the UserName as the first item of the subtree. Should I?)
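
An OID such as the one above is a path of numeric arcs, and "being under" a subtree just means sharing a leading run of arcs. The sketch below, in C, parses a dotted OID and tests whether it lies under a given prefix. `oid_parse()` and `oid_under()` are hypothetical helpers written for this illustration; the only OID values used are the ones quoted in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Parse a dotted OID such as "1.3.6.1.4.1.2238" into arcs[].
 * Returns the number of arcs, or -1 on malformed input or if arcs[]
 * is too small.
 */
int
oid_parse(const char *s, unsigned long *arcs, size_t maxarcs)
{
	size_t n = 0;

	assert(s != NULL && arcs != NULL);
	while (*s != '\0') {
		char *end;

		if (*s < '0' || *s > '9' || n == maxarcs)
			return (-1);
		arcs[n++] = strtoul(s, &end, 10);
		if (*end == '.') {
			if (end[1] == '\0')
				return (-1);	/* trailing dot */
			s = end + 1;
		} else if (*end == '\0') {
			s = end;
		} else {
			return (-1);
		}
	}
	return (n == 0 ? -1 : (int)n);
}

/* True if the OID string "oid" lies in the subtree rooted at "prefix". */
bool
oid_under(const char *oid, const char *prefix)
{
	unsigned long a[32], p[32];
	int na = oid_parse(oid, a, 32);
	int np = oid_parse(prefix, p, 32);

	if (na < 0 || np < 0 || np > na)
		return (false);
	for (int i = 0; i < np; i++)
		if (a[i] != p[i])
			return (false);
	return (true);
}
```

Here 1.3.6.1.4.1 is the private enterprise arc, and the arcs that follow 2238 are the ones whose allocation is being discussed in the review.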

Do I need to update the date/time for LAST-UPDATED and REVISION
when I commit it, I'd guess?

Thanks, rick

--
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.



Does FreeBSD have an assigned Internet OID?

2020-08-27 Thread Rick Macklem
For the NFS over TLS work, I have a need for an Internet OID.
(I understand that IETF assigns ones for things like SNMP under
1.3.6.1.4.1...)

I'm referring to the long strings of numbers separated by "."s,
where each number is a subtree administered by someone.

If either the project or Foundation has one assigned to them,
that I can acquire a subnumber (or whatever they call the next
layer down), please let me know.

Thanks, rick


review of a change to sosend_generic()

2020-08-16 Thread Rick Macklem
Hi,

I put D25923 up on phabricator a little while ago.
I clicked on a couple of people that I thought might like to
review it.

However, if anyone else would like to review it, please do so.
The review is as much about the concept as the actual implementation.

Thanks, rick

Here is the description of it...
The kernel RPC cannot process non-application data records when
  using TLS.  It must do an upcall to a userspace daemon that will
  call SSL_read() to process them.
  
  This patch adds a new flag called MSG_TLSAPPDATA that the kernel
  RPC can use to tell soreceive() to return ENXIO instead of a non-application
  data record, when that is what is at the top of the receive queue.
  
  The code could use any error return that is not normally returned by
  soreceive(). If some other errno is preferred, that can easily be changed.
  
  I also put the code in #ifdef KERN_TLS/#endif, although it will build without
  that, so that it is recognized as only useful when KERN_TLS is enabled.
  
  The alternative to doing this is to have the kernel RPC re-queue the
  non-application data message after receiving it, but that seems more
  complicated and might introduce message ordering issues when there
  are multiple non-application data records one after another.
  
  I do not know what, if any, changes will be required to support TLS1.3.
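
The behaviour described above can be modelled in userland C as follows. This is a sketch of the idea only, not the patch: `model_receive()`, `struct rcv_queue` and the flag value `MODEL_MSG_TLSAPPDATA` are hypothetical, and the queue is a plain array instead of a socket buffer. The TLS record content type numbers are the standard ones.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* TLS record content types. */
#define TLS_RT_ALERT		21
#define TLS_RT_HANDSHAKE	22
#define TLS_RT_APPDATA		23

/* Hypothetical flag value, for this model only. */
#define MODEL_MSG_TLSAPPDATA	0x1

struct tls_record {
	int type;	/* one of TLS_RT_* */
	int len;	/* payload length */
};

struct rcv_queue {
	const struct tls_record *recs;
	size_t nrecs;
	size_t head;	/* index of the next record to return */
};

/*
 * Model of the receive path. Returns 0 and the next record via *out,
 * EWOULDBLOCK if the queue is empty, or ENXIO (leaving the record at the
 * head of the queue) if the flag is set and that record is not
 * application data.
 */
int
model_receive(struct rcv_queue *q, int flags, struct tls_record *out)
{
	assert(q != NULL && out != NULL);
	if (q->head == q->nrecs)
		return (EWOULDBLOCK);
	if ((flags & MODEL_MSG_TLSAPPDATA) != 0 &&
	    q->recs[q->head].type != TLS_RT_APPDATA)
		return (ENXIO);
	*out = q->recs[q->head++];
	return (0);
}
```

In this model the kernel RPC always passes the flag. On ENXIO it would do the upcall, and the daemon's read (without the flag) consumes the alert or handshake record, after which the kernel RPC can continue. Because the record is never dequeued by the flagged call, no re-queueing is needed and ordering is preserved.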


Re: can buffer cache pages be used in ext_pgs mbufs?

2020-08-13 Thread Rick Macklem
Konstantin Belousov wrote:
>On Tue, Aug 11, 2020 at 03:10:39AM +0000, Rick Macklem wrote:
>> Konstantin Belousov wrote:
>> >On Mon, Aug 10, 2020 at 12:46:00AM +0000, Rick Macklem wrote:
>> >> Konstantin Belousov wrote:
>> >> >On Fri, Aug 07, 2020 at 09:43:14PM -0700, Kirk McKusick wrote:
>> >> >> I do not have the answer to your question, but I am copying Kostik
>> >> >> as if anyone knows the answer, it is probably him.
>> >> >>
>> >> >>   ~Kirk
>> >> >>
>> >> >> =-=-=
>> >> >I do not know the exact answer, this is why I did not follow up on the
>> >> >original question on current@.  In particular, I have no idea about the
>> >> >ext_pgs mechanism.
>> >> >
>> >> >Still I can point one semi-obvious aspect of your proposal.
>> >> >
>> >> >When the buffer is written (with bwrite()), its pages are sbusied and
>> >> >the write mappings of them are invalidated. The end effect is that no
>> >> >modifications to the pages are possible until they are unbusied. This,
>> >> >together with the lock of the buffer that holds the pages, effectively
>> >> >stops all writes either through write(2) or by mmaped regions.
>> >> >
>> >> >In other words, any access for write to the range of file designated by
>> >> >the buffer, causes the thread to block until the pages are unbusied and
>> >> >the buffer is unlocked.  Which in described case would mean, until NFS
>> >> >server responds.
>> >> >
>> >> >If this is fine, then ok.
>> >> For what I am thinking of, I would say that is fine, since the ktls code
>> >> reads the pages to encrypt/send them, but can use other allocated pages
>> >> for the encrypted data.
>> >>
>> >> >Rick, do you know anything about the vm page lifecycle as mb_ext_pgs ?
>> >> Well, the anonymous pages (the only ones I've been using so far) are
>> >> allocated with:
>> >> vm_page_alloc(NULL, 0, VM_ALLOC_NORMAL | VM_ALLOC_NOOBJ |
>> >>VM_ALLOC_NODUMP | VM_ALLOC_WIRED);
>> >>
>> >> and then the m_ext_ext_free function (mb_free_mext_pgs()) does:
>> >> vm_page_unwire_noq(pg);
>> >> vm_page_free(pg);
>> >> on each of them.
>> >>
>> >> m->m_ext_ext_free() is called in tls_encrypt() when it no longer wants the
>> >> pages, but is normally called via m_free(m), which calls mb_free_extpg(m),
>> >> although there are a few other places.
>> >>
>> >> Since m_ext_ext_free is whatever function you want to make it, I suppose
>> >> the answer is "until your m_ext.ext_free" function is called.
>> >>
>> >> At this time, for ktls, if you are using software encryption, the call to
>> >> ktls_encrypt(), which is done before passing the mbufs down to TCP, is
>> >> when it is done with the unencrypted data pages. (I suppose there is no
>> >> absolute guarantee that this happens before the kernel RPC layer times
>> >> out waiting for an RPC reply, but it is almost inconceivable, since this
>> >> happens before the RPC request is passed down to TCP.)
>> >>
>> >> The case I now think is more problematic is the "hardware assist" case.
>> >> Although no hardware/driver yet does this afaik, I suspect that the
>> >> unencrypted data page mbufs could end up stuck in TCP for a long time,
>> >> in case a retransmit is needed.
>> >>
>> >> So, I now think I might need to delay the bufdone() call until the
>> >> m_ext_ext_free() call has been done for the pages, if they are buffer
>> >> cache pages?
>> >> --> Usually I would expect the m_ext_ext_free() call for the mbuf(s) that
>> >>     hold the data to be written to the server to be done long before
>> >>     bufdone() would be called for the buffer that is being written,
>> >>     but there is no guarantee.
>> >>
>> >> Am I correct in assuming that the pages for the buffer will remain valid
>> >> and readable through the direct map until bufdone() is called?
>> >> If I am

Re: can buffer cache pages be used in ext_pgs mbufs?

2020-08-10 Thread Rick Macklem
Konstantin Belousov wrote:
>On Mon, Aug 10, 2020 at 12:46:00AM +0000, Rick Macklem wrote:
>> Konstantin Belousov wrote:
>> >On Fri, Aug 07, 2020 at 09:43:14PM -0700, Kirk McKusick wrote:
>> >> I do not have the answer to your question, but I am copying Kostik
>> >> as if anyone knows the answer, it is probably him.
>> >>
>> >>   ~Kirk
>> >>
>> >> =-=-=
>> >I do not know the exact answer, this is why I did not follow up on the
>> >original question on current@.  In particular, I have no idea about the
>> >ext_pgs mechanism.
>> >
>> >Still I can point one semi-obvious aspect of your proposal.
>> >
>> >When the buffer is written (with bwrite()), its pages are sbusied and
>> >the write mappings of them are invalidated. The end effect is that no
>> >modifications to the pages are possible until they are unbusied. This,
>> >together with the lock of the buffer that holds the pages, effectively
>> >stops all writes either through write(2) or by mmaped regions.
>> >
>> >In other words, any access for write to the range of file designated by
>> >the buffer, causes the thread to block until the pages are unbusied and
>> >the buffer is unlocked.  Which in described case would mean, until NFS
>> >server responds.
>> >
>> >If this is fine, then ok.
>> For what I am thinking of, I would say that is fine, since the ktls code
>> reads the pages to encrypt/send them, but can use other allocated pages for
>> the encrypted data.
>>
>> >Rick, do you know anything about the vm page lifecycle as mb_ext_pgs ?
>> Well, the anonymous pages (the only ones I've been using so far) are
>> allocated with:
>> vm_page_alloc(NULL, 0, VM_ALLOC_NORMAL | VM_ALLOC_NOOBJ |
>>VM_ALLOC_NODUMP | VM_ALLOC_WIRED);
>>
>> and then the m_ext_ext_free function (mb_free_mext_pgs()) does:
>> vm_page_unwire_noq(pg);
>> vm_page_free(pg);
>> on each of them.
>>
>> m->m_ext_ext_free() is called in tls_encrypt() when it no longer wants the
>> pages, but is normally called via m_free(m), which calls mb_free_extpg(m),
>> although there are a few other places.
>>
>> Since m_ext_ext_free is whatever function you want to make it, I suppose the
>> answer is "until your m_ext.ext_free" function is called.
>>
>> At this time, for ktls, if you are using software encryption, the call to
>> ktls_encrypt(), which is done before passing the mbufs down to TCP, is when
>> it is done with the unencrypted data pages. (I suppose there is no absolute
>> guarantee that this happens before the kernel RPC layer times out waiting
>> for an RPC reply, but it is almost inconceivable, since this happens before
>> the RPC request is passed down to TCP.)
>>
>> The case I now think is more problematic is the "hardware assist" case.
>> Although no hardware/driver yet does this afaik, I suspect that the
>> unencrypted data page mbufs could end up stuck in TCP for a long time, in
>> case a retransmit is needed.
>>
>> So, I now think I might need to delay the bufdone() call until the
>> m_ext_ext_free() call has been done for the pages, if they are buffer cache
>> pages?
>> --> Usually I would expect the m_ext_ext_free() call for the mbuf(s) that
>>     hold the data to be written to the server to be done long before
>>     bufdone() would be called for the buffer that is being written,
>>     but there is no guarantee.
>>
>> Am I correct in assuming that the pages for the buffer will remain valid and
>> readable through the direct map until bufdone() is called?
>> If I am correct w.r.t. this, it should work so long as the m_ext_ext_free()
>> calls for the pages happen before the bufdone() call on the bp, I think?
>
>I think there is a further complication with non-anonymous pages.
>You want (or perhaps need) the page content to be immutable and not
>changed while you pass the pages around and give them to ktls for sw or hw
>processing.  Otherwise it could not pass the TLS authentication if a
>page was changed in the process.
>
>Similar issue exists when normal buffer writes are scheduled through
>the strategy(), and you can see that bufwrite() does vfs_busy_pages()
>with clear_modify=1, which does two things:
>- sbusy the pages (sbusy pages can get new read-only mappings, but cannot
>  be mapped rw)
>- pmap_remove_write() on the pages to invalidate all current writeable
>  mappings.
>

can buffer cache pages be used in ext_pgs mbufs?

2020-08-06 Thread Rick Macklem
Hi,

I've been at this game for a while and one of the axioms is...
"Everything is harder than it at first looks."

Currently, when the FreeBSD NFS client does a write, it does:
- VOP_WRITE() copies the data into buffer cache block(s).
--> An nfsiod thread (or sometimes the thread that called VOP_WRITE()),
   copies the data from the buffer cache block into a list of mbuf clusters,
   prepends the NFS and RPC headers, then passes it down to TCP via sosend().

   After the RPC reply is received (or the RPC fails due to timeout):
   - m_freem() is called on the mbuf list.
   - bufdone()/brelse() is called for the buffer cache block.

For TLS, the mbuf list passed into sosend() must be ext_pgs mbufs, so the
mbuf clusters get copied to ext_pgs mbufs with anonymous pages before
the sosend() call.

So, what if the pages associated with the buffer cache block (b_pages)
were entered in the m_epg_pa[] array for the ext_pgs mbufs, instead of
copying the data into mbuf clusters?
- At a glance, this just seems like it would work.
  It looks like the buffer cache pages are wired down until bufdone()/brelse(),
  which happens after m_freem() on the mbuf list.
- There would need to be a custom m_ext.ext_free, but it looks like a no-op.
  (i.e., it does nothing, since the buffer cache code deals with the pages later.)

The only thing I can think of (and I don't understand the vm/memory cache
parts of FreeBSD) is that, since the buffer cache pages are written via copying
into their kva addresses and then read via the direct map of their physical
pages, there might be some sort of memory cache flush needed to ensure the
physical pages are up to date (no data still working its way through 
write-back).
- Is this a problem and how is it handled?

In summary, what am I missing that makes this difficult/impossible to do?

If no one has an answer, I'll just code it up and see what happens.

Thanks for any comments, rick



Re: RFC: ktls and krpc using M_EXTPG mbufs

2020-07-27 Thread Rick Macklem
Andrew Gallatin wrote:
>On 2020-07-19 19:34, Rick Macklem wrote:
>> I spent a little time chasing a problem in the nfs-over-tls code, where it
>> would sometimes end up with corrupted data in the file(s) of a mirrored
>> pNFS configuration.
>>
>> I think the problem was that the code filled the data to be written into
>> anonymous page M_EXTPG mbufs, then did a m_copym() { copy by
>> reference } and used the copies for the mirrored writes.
>> --> In ktls_encrypt(), the encryption was done to the same pages and,
>> sometimes, the encrypted data got encrypted again during the
>> sosend() of the other copy.
>>
>> Although I haven't reproduced it, a regular kernel write RPC could suffer the
>> same consequences if the RPC is retried (it keeps an m_copym() copy
>> of the request in the krpc for an RPC retry).
>>
>> At this time, the code in projects/nfs-over-tls works correctly, since it
>> always fills the data to be written into mbuf clusters, m_copym()s those
>> and then copies those { real copying using memcpy() } via
>> mb_mapped_to_unmapped() just before calling sosend().
>> --> This works, but it would be nice to avoid the mb_mapped_to_unmapped()
>>copying for all the data being written via an NFS over TLS connection.
>>
>> For the TCP_TLS_MODE_SW case:
>> --> The NFS code can fill the written data into anonymous pages on M_EXTPG
>> mbufs.
>> Then, the ktls_encrypt() could be modified to
>> allocate a new set of anonymous pages for the destination side of
>> the encryption (it already does this for the sendfile case) and put those
>> in a new mbuf list.
>> --> This would result in new anonymous pages and mbufs being allocated,
>> but would not do memcpy()s.
>> After encryption, it would just do a m_freem() on the unencrypted list.
>> --> For the krpc client case, this call would only decrement the reference
>>     count on the unencrypted list and it could be used for a retry by the
>>     krpc and then be free'd { m_freem() call } after a reply is received.
>>
>> If doing this for all the sosend()s of anonymous page M_EXTPG mbufs seems
>> like unnecessary overhead, the above could be enabled via a setsockopt()
>> on the socket.
>>
>> What do others think of this?
>
>Several comments:
>
>mb_mapped_to_unmapped() is surprisingly inexpensive.  It was less than
>5% before I converted iflib to M_NOMAP aware.
Hmm. Just wondering what the 5% refers to?
5% difference in throughput for a data stream
5% increase in CPU overheads
or ???

I do agree that, with multiple cores these days, avoiding the memcpy()s in
the client isn't that big a deal.
--> This issue is client side only. The NFS server can generate read and readdir
  replies (the only big ones) in anonymous ext_pgs mbufs now.

>It seems like NFS should be constructing mbufs like sendfile does, and
>pointing mbufs at its pages.  This would cause the crypto code to
>allocate a new set of pages upon encryption.
I suppose the ideal would be to use the pages that already hold the data
in the buffer cache, but I haven't even looked at what it might take to
do that? (The buffer cache block would have to remain busied until the
mbuf is free'd or something like that.)
I kinda plan on looking at this someday...

I suppose I could "pretend" they aren't anonymous pages by not
setting the EPG_ANON_FLAG, but that still wouldn't be enough to
fix this problem.
--> Not only does ktls_encrypt() need to use different pages, it needs
  to allocate new mbuf(s) for them, so that the unencrypted pages
  will still be associated with the mbuf list passed in.
(I don't really see "pretending" the pages aren't anonymous makes much
 difference?)

>> For the hardware offload case:
>> - Can I assume that the anonymous pages in M_EXTPG mbufs will remain
>> unchanged?
>> --> If so, and it won't change to TCP_TLS_MODE_SW, the NFS code could
>> fill the data to be written into M_EXTPG mbufs safely.
>>
>> - And, if so, can I safely use the ktls_session mode field to decide if
>>   offload is happening?
>>   I see the TCP_TXTLS_MODE socket opt which seems to
>>   switch the mode to TCP_TLS_MODE_SW.
>>   When does this happen? Or, can this happen to a session once in use?
>
>Yes.  The intent is to allow something (TCP stack, smart user daemon) to
>look at a connection & move it from hardware to software, if it has a
>lot of TCP re-transmits.
Ok, so I don't think the NFS code should assume the pages will remain
unencrypted, even if it appears hardware assist is being used, unless the
software case is change

RFC: ktls and krpc using M_EXTPG mbufs

2020-07-19 Thread Rick Macklem
I spent a little time chasing a problem in the nfs-over-tls code, where it
would sometimes end up with corrupted data in the file(s) of a mirrored
pNFS configuration.

I think the problem was that the code filled the data to be written into
anonymous page M_EXTPG mbufs, then did a m_copym() { copy by
reference } and used the copies for the mirrored writes.
--> In ktls_encrypt(), the encryption was done to the same pages and,
   sometimes, the encrypted data got encrypted again during the
   sosend() of the other copy.
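
The double encryption described above can be illustrated with a small user-space sketch. This is not kernel code: struct toy_page, toy_copy_by_ref() and toy_encrypt_in_place() are hypothetical stand-ins for an anonymous page, an m_copym() copy by reference and an in-place ktls_encrypt(), and the "cipher" is a trivial byte addition chosen only so the effect is visible.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for an anonymous page shared by two mbuf chains. */
struct toy_page {
	unsigned char data[8];
	int refcnt;		/* bumped by a copy by reference */
};

/* Copy by reference: both "copies" point at the same bytes. */
static struct toy_page *
toy_copy_by_ref(struct toy_page *pg)
{
	pg->refcnt++;
	return (pg);
}

/* Toy in-place "cipher": adds the key to each byte (not real crypto). */
static void
toy_encrypt_in_place(struct toy_page *pg, unsigned char key)
{
	for (size_t i = 0; i < sizeof(pg->data); i++)
		pg->data[i] = (unsigned char)(pg->data[i] + key);
}

/*
 * Send the same payload to two mirrors. Returns 1 if the second send
 * transmitted the same ciphertext as the first, 0 if the shared bytes
 * were encrypted a second time before the second send.
 */
static int
toy_mirrored_send_matches(void)
{
	struct toy_page pg = { .data = "payload", .refcnt = 1 };
	struct toy_page *copy = toy_copy_by_ref(&pg);
	unsigned char first[sizeof(pg.data)];

	toy_encrypt_in_place(&pg, 3);		/* send to mirror 1 */
	memcpy(first, pg.data, sizeof(first));
	toy_encrypt_in_place(copy, 3);		/* send to mirror 2 */
	return (memcmp(first, copy->data, sizeof(first)) == 0);
}
```

In this sketch the second mirror always receives different bytes than the first, because both sends encrypt the same underlying storage.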

Although I haven't reproduced it, a regular kernel write RPC could suffer the
same consequences if the RPC is retried (it keeps an m_copym() copy
of the request in the krpc for an RPC retry).

At this time, the code in projects/nfs-over-tls works correctly, since it
always fills the data to be written into mbuf clusters, m_copym()s those
and then copies those { real copying using memcpy() } via
mb_mapped_to_unmapped() just before calling sosend().
--> This works, but it would be nice to avoid the mb_mapped_to_unmapped()
  copying for all the data being written via an NFS over TLS connection.

For the TCP_TLS_MODE_SW case:
--> The NFS code can fill the written data into anonymous pages on M_EXTPG
   mbufs.
Then, the ktls_encrypt() could be modified to
allocate a new set of anonymous pages for the destination side of
the encryption (it already does this for the sendfile case) and put those
in a new mbuf list.
--> This would result in new anonymous pages and mbufs being allocated,
   but would not do memcpy()s.
After encryption, it would just do a m_freem() on the unencrypted list.
--> For the krpc client case, this call would only decrement the reference
  count on the unencrypted list and it could be used for a retry by the krpc
  and then be free'd { m_freem() call } after a reply is received.
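
A rough user-space sketch of the idea proposed above, assuming a toy reference-counted buffer in place of an M_EXTPG mbuf list. All names here (struct toy_buf, toy_buf_alloc(), toy_buf_free(), toy_encrypt_to_new()) are hypothetical, and the byte-addition "cipher" is only a placeholder. The point is that the cipher writes into newly allocated pages, so the unencrypted source survives for a retry without a separate memcpy() pass.

```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a reference-counted list of anonymous pages. */
struct toy_buf {
	unsigned char *data;
	size_t len;
	int refcnt;
};

static struct toy_buf *
toy_buf_alloc(size_t len)
{
	struct toy_buf *b = malloc(sizeof(*b));

	if (b == NULL)
		return (NULL);
	b->data = malloc(len);
	if (b->data == NULL) {
		free(b);
		return (NULL);
	}
	b->len = len;
	b->refcnt = 1;
	return (b);
}

/* Analogue of m_freem(): drop one reference, free on the last one. */
static void
toy_buf_free(struct toy_buf *b)
{
	if (--b->refcnt == 0) {
		free(b->data);
		free(b);
	}
}

/*
 * Encrypt src into freshly allocated storage. The output is written
 * straight into dst, and src is left unencrypted for a later retry.
 */
static struct toy_buf *
toy_encrypt_to_new(const struct toy_buf *src, unsigned char key)
{
	struct toy_buf *dst = toy_buf_alloc(src->len);

	if (dst == NULL)
		return (NULL);
	for (size_t i = 0; i < src->len; i++)
		dst->data[i] = (unsigned char)(src->data[i] + key);
	return (dst);
}
```

With this shape, a caller that holds an extra reference for a retry can drop the sender's reference after encryption and still encrypt the untouched source again later.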

If doing this for all the sosend()s of anonymous page M_EXTPG mbufs seems
like unnecessary overhead, the above could be enabled via a setsockopt()
on the socket.

What do others think of this?

For the hardware offload case:
- Can I assume that the anonymous pages in M_EXTPG mbufs will remain
   unchanged?
--> If so, and it won't change to TCP_TLS_MODE_SW, the NFS code could
   fill the data to be written into M_EXTPG mbufs safely.

- And, if so, can I safely use the ktls_session mode field to decide if offload
  is happening?
  I see the TCP_TXTLS_MODE socket opt which seems to
  switch the mode to TCP_TLS_MODE_SW.
  When does this happen? Or, can this happen to a session once in use?

Thanks for any/all comments on this, rick

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc

2020-07-03 Thread Rick Macklem
Ryan Libby wrote:
>On Sun, Jun 28, 2020 at 9:57 PM Rick Macklem  wrote:
>>
>> Just in case you were waiting for another email, I have now run several
>> cycles of the kernel build over NFS on a recent head kernel with the
>> one line change and it has not hung.
>>
>> I don't know if this is the correct fix, but it would be nice to get
>> something into head to fix this.
>>
>> If I don't hear anything in the next few days, I'll put it in a PR so it
>> doesn't get forgotten.
>>
>> rick
>
>Thanks for the follow through on this.
>
>I think the patch is not complete.  It looks like the problem is that
>for systems that do not have UMA_MD_SMALL_ALLOC, we do
>uma_zone_set_allocf(vmem_bt_zone, vmem_bt_alloc);
>but we haven't set an appropriate free function.  This is probably why
>UMA_ZONE_NOFREE was originally there.  When NOFREE was removed, it was
>appropriate for systems with uma_small_alloc.
>
>So by default we get page_free as our free function.  That calls
>kmem_free, which calls vmem_free ... but we do our allocs with
>vmem_xalloc.  I'm not positive, but I think the problem is that in
>effect we vmem_xalloc -> vmem_free, not vmem_xfree.
>
>Three possible fixes:
> 1: The one you tested, but this is not best for systems with
>uma_small_alloc.
> 2: Pass UMA_ZONE_NOFREE conditional on UMA_MD_SMALL_ALLOC.
> 3: Actually provide an appropriate vmem_bt_free function.
>
>I think we should just do option 2 with a comment, it's simple and it's
>what we used to do.  I'm not sure how much benefit we would see from
>option 3, but it's more work.
I set hw.physmem to 1Gbyte on my amd64 system (did not have the patch)
and ran 6 cycles of the kernel build over NFS without a hang, so I don't
think any fix is needed for systems that support UMA_MD_SMALL_ALLOC.

The trivial patch for option 2 is attached.
I didn't do a comment, since you understand this and can probably
describe it more correctly.
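
For readers without the attachment, the logic of option 2 might be sketched as follows. This is not the attached patch. The flag values are made up for illustration (the real UMA_ZONE_VM and UMA_ZONE_NOFREE are defined in the kernel's UMA headers), and has_small_alloc is a hypothetical run-time stand-in for a compile-time test of UMA_MD_SMALL_ALLOC.

```c
/*
 * Hypothetical flag values for illustration only; the real ones are
 * defined in the kernel's UMA headers.
 */
#define	TOY_UMA_ZONE_VM		0x1
#define	TOY_UMA_ZONE_NOFREE	0x2

/*
 * Pick the flags for the "vmem btag" zone. has_small_alloc stands in
 * for a compile-time test of UMA_MD_SMALL_ALLOC.
 */
static int
toy_btag_zone_flags(int has_small_alloc)
{
	int flags = TOY_UMA_ZONE_VM;

	/*
	 * Without uma_small_alloc the zone allocates via vmem_bt_alloc()
	 * but has no matching free function, so its slabs are never freed.
	 */
	if (!has_small_alloc)
		flags |= TOY_UMA_ZONE_NOFREE;
	return (flags);
}
```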

Thanks, rick

Ryan

>
> 
> From: owner-freebsd-curr...@freebsd.org  
> on behalf of Rick Macklem 
> Sent: Thursday, June 18, 2020 11:42 PM
> To: Ryan Libby
> Cc: Konstantin Belousov; Jeff Roberson; freebsd-current@freebsd.org
> Subject: Re: r358252 causes intermittent hangs where processes are stuck 
> sleeping on btalloc
>
> Ryan Libby wrote:
> >On Mon, Jun 15, 2020 at 5:06 PM Rick Macklem  wrote:
> >>
> >> Rick Macklem wrote:
> >> >r358098 will hang fairly easily, in 1-3 cycles of the kernel build over NFS.
> >> >I thought this was the culprit, since I did 6 cycles of r358097 without a hang.
> >> >However, I just got a hang with r358097, but it looks rather different.
> >> >The r358097 hang did not have any processes sleeping on btalloc. They
> >> >appeared to be waiting on two different locks in the buffer cache.
> >> >As such, I think it might be a different problem. (I'll admit I should have
> >> >made notes about this one before rebooting, but I was frustrated that
> >> >it happened and rebooted before looking at it in much detail.)
> >> Ok, so I did 10 cycles of the kernel build over NFS for r358096 and never
> >> got a hang.
> >> --> It seems that r358097 is the culprit and r358098 makes it easier
> >>   to reproduce.
> >>   --> Basically runs out of kernel memory.
> >>
> >> It is not obvious if I can revert these two commits without reverting
> >> other ones, since there were a bunch of vm changes after these.
> >>
> >> I'll take a look, but if you guys have any ideas on how to fix this, please
> >> let me know.
> >>
> >> Thanks, rick
> >
> >Interesting.  Could you try re-adding UMA_ZONE_NOFREE to the vmem btag
> >zone to see if that rescues it, on whatever base revision gets you a
> >reliable repro?
> Good catch! That seems to fix it. I've done 8 cycles of kernel build over
> NFS without a hang (normally I'd get one in the first 1-3 cycles).
>
> I don't know if the intent was to delete UMA_ZONE_VM and r358097
> had a typo in it and deleted UMA_ZONE_NOFREE or ???
>
> Anyhow, I just put it back to UMA_ZONE_VM | UMA_ZONE_NOFREE and
> the hangs seem to have gone away.
>
> The small patch I did is attached, in case that isn't what you meant.
>
> I'll run a few more cycles just in case, but I think this fixes it.
>
> Thanks, rick
>
> >
> > Jeff, to fill you in, I have been getting intermittent hangs on a Pentium 4
> > (single core i386) with 1.25Gbytes ram when doing kernel builds using
> > head kernels from this wint

Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc

2020-06-28 Thread Rick Macklem
Just in case you were waiting for another email, I have now run several
cycles of the kernel build over NFS on a recent head kernel with the
one line change and it has not hung.

I don't know if this is the correct fix, but it would be nice to get something
into head to fix this.

If I don't hear anything in the next few days, I'll put it in a PR so it
doesn't get forgotten.

rick


From: owner-freebsd-curr...@freebsd.org  on 
behalf of Rick Macklem 
Sent: Thursday, June 18, 2020 11:42 PM
To: Ryan Libby
Cc: Konstantin Belousov; Jeff Roberson; freebsd-current@freebsd.org
Subject: Re: r358252 causes intermittent hangs where processes are stuck 
sleeping on btalloc

Ryan Libby wrote:
>On Mon, Jun 15, 2020 at 5:06 PM Rick Macklem  wrote:
>>
>> Rick Macklem wrote:
>> >r358098 will hang fairly easily, in 1-3 cycles of the kernel build over NFS.
>> >I thought this was the culprit, since I did 6 cycles of r358097 without a hang.
>> >However, I just got a hang with r358097, but it looks rather different.
>> >The r358097 hang did not have any processes sleeping on btalloc. They
>> >appeared to be waiting on two different locks in the buffer cache.
>> >As such, I think it might be a different problem. (I'll admit I should have
>> >made notes about this one before rebooting, but I was frustrated that
>> >it happened and rebooted before looking at it in much detail.)
>> Ok, so I did 10 cycles of the kernel build over NFS for r358096 and never
>> got a hang.
>> --> It seems that r358097 is the culprit and r358098 makes it easier
>>   to reproduce.
>>   --> Basically runs out of kernel memory.
>>
>> It is not obvious if I can revert these two commits without reverting
>> other ones, since there were a bunch of vm changes after these.
>>
>> I'll take a look, but if you guys have any ideas on how to fix this, please
>> let me know.
>>
>> Thanks, rick
>
>Interesting.  Could you try re-adding UMA_ZONE_NOFREE to the vmem btag
>zone to see if that rescues it, on whatever base revision gets you a
>reliable repro?
Good catch! That seems to fix it. I've done 8 cycles of kernel build over
NFS without a hang (normally I'd get one in the first 1-3 cycles).

I don't know if the intent was to delete UMA_ZONE_VM and r358097
had a typo in it and deleted UMA_ZONE_NOFREE or ???

Anyhow, I just put it back to UMA_ZONE_VM | UMA_ZONE_NOFREE and
the hangs seem to have gone away.

The small patch I did is attached, in case that isn't what you meant.

I'll run a few more cycles just in case, but I think this fixes it.

Thanks, rick

>
> Jeff, to fill you in, I have been getting intermittent hangs on a Pentium 4
> (single core i386) with 1.25Gbytes ram when doing kernel builds using
> head kernels from this winter. (I also saw one when doing a kernel build
> on UFS, so they aren't NFS specific, although easier to reproduce that way.)
> After a typical hang, there will be a bunch of processes sleeping on "btalloc"
> and several processes holding the following lock:
> exclusive sx lock @ vm/vm_map.c:4761
> - I have seen hangs where that is the only lock held by any process except
>    the interrupt thread.
> - I have also seen processes waiting on the following locks:
> kern/subr_vmem.c:1343
> kern/subr_vmem.c:633
>
> I can't be absolutely sure r358098 is the culprit, but it seems to make the
> problem more reproducible.
>
> If anyone has a patch suggestion, I can test it.
> Otherwise, I will continue to test r358097 and earlier, to try and see what hangs
> occur. (I've done 8 cycles of testing of r356776 without difficulties, but that
> doesn't guarantee it isn't broken.)
>
> There is a bunch more of the stuff I got for Kostik and Ryan below.
> I can do "db" when it is hung, but it is a screen console, so I need to
> transcribe the output to email by hand. (ie. If you need something
> specific I can do that, but trying to do everything Kostik and Ryan asked
> for isn't easy.)
>
> rick
>
>
>
> Konstantin Belousov wrote:
> >On Fri, May 22, 2020 at 11:46:26PM +, Rick Macklem wrote:
> >> Konstantin Belousov wrote:
> >> >On Wed, May 20, 2020 at 11:58:50PM -0700, Ryan Libby wrote:
> >> >> On Wed, May 20, 2020 at 6:04 PM Rick Macklem  wrote:
> >> >> >
> >> >> > Hi,
> >> >> >
> >> >> > Since I hadn't upgraded a kernel through the winter, it took me a while
> >> >> > to bisect this, but r358252 seems to be the culprit.
> No longer true. I succeeded in reproducing the hang to-day running a
> r358251 kernel.
>
> I haven't had much luck sofar, 

Re: openzfs-kmod build error

2020-06-23 Thread Rick Macklem
Kostya Berger wrote:
>CURRENT r362292
>sysutils/openzfs-kmod build aborts with error:...
>/usr/ports/sysutils/openzfs-kmod/work/zfs-c0eb5c35e/module/os/freebsd/zfs/zfs_vfsops.c:128:19: error:
>  incompatible pointer types initializing 'vfs_checkexp_t *' (aka 'int (*)(struct
>  mount *, struct sockaddr *, unsigned long *, struct ucred **, int *, int *)') with
>  an expression of type 'int (vfs_t *, struct sockaddr *, int *, struct ucred **, int
>  *, int **)' (aka 'int (struct mount *, struct sockaddr *, int *, struct ucred **,
>  int *, int **)') [-Werror,-Wincompatible-pointer-types]
>.vfs_checkexp = zfs_checkexp,
>^~~~
>/usr/ports/sysutils/openzfs-kmod/work/zfs-c0eb5c35e/module/os/freebsd/zfs/zfs_vfsops.c:1911:56: error:
>  incompatible pointer types passing 'int *' to parameter of type 'uint64_t *'
>  (aka 'unsigned long *') [-Werror,-Wincompatible-pointer-types]
>return (vfs_stdcheckexp(zfsvfs->z_parent->z_vfs, nam, extflagsp,
>  ^
>/usr/src/sys/sys/mount.h:980:17: note: passing argument to parameter here
>vfs_checkexp_t  vfs_stdcheckexp;
>^
>/usr/ports/sysutils/openzfs-kmod/work/zfs-c0eb5c35e/module/os/freebsd/zfs/zfs_vfsops.c:1912:32: error:
>  incompatible pointer types passing 'int **' to parameter of type 'int *';
>  dereference with * [-Werror,-Wincompatible-pointer-types]
>credanonp, numsecflavors, secflavors));
>  ^~
>  *
>/usr/src/sys/sys/mount.h:980:17: note: passing argument to parameter here
>vfs_checkexp_t  vfs_stdcheckexp;
>^
>3 errors generated.
>*** Error code 1
>
>Stop.
Post r362158, the argument types changed. Since they are just passed to
vfs_stdcheckexp(), all that needs to be done is changing the types of
the arguments.

freqlabs@ volunteered to do this upstream, but I don't know if/when
that gets applied to the port?

If you want to fix this yourself, simply replace:
zfs_checkexp(vfs_t *vfsp, struct sockaddr *nam, int *extflagsp,
   struct ucred **credanonp, int *numsecflavors, int **secflavors)

with

zfs_checkexp(vfs_t *vfsp, struct sockaddr *nam, uint64_t *extflagsp,
   struct ucred **credanonp, int *numsecflavors, int *secflavors)

in the two places it exists in zfs_vfsops.c.
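
To show how the changed types fit together, here is a sketch that compiles outside the kernel. The opaque struct declarations and the body of vfs_stdcheckexp() are stubs invented for illustration (the stub's return values are arbitrary). Only the argument types are taken from the compiler output quoted above, and the real zfs_checkexp() passes zfsvfs->z_parent->z_vfs rather than vfsp.

```c
#include <stddef.h>
#include <stdint.h>

/* Opaque stand-ins so the sketch compiles outside the kernel. */
struct mount;
struct sockaddr;
struct ucred;
typedef struct mount vfs_t;

/*
 * Stub standing in for the kernel's vfs_stdcheckexp(), using the
 * post-r362158 argument types. The values it stores are arbitrary.
 */
static int
vfs_stdcheckexp(struct mount *mp, struct sockaddr *nam, uint64_t *extflagsp,
    struct ucred **credanonp, int *numsecflavors, int *secflavors)
{
	(void)mp;
	(void)nam;
	(void)credanonp;
	*extflagsp = 0x100000000ULL;	/* a flag bit that needs 64 bits */
	*numsecflavors = 1;
	secflavors[0] = 1;
	return (0);
}

/* Wrapper with the new types: uint64_t *extflagsp, int *secflavors. */
static int
zfs_checkexp(vfs_t *vfsp, struct sockaddr *nam, uint64_t *extflagsp,
    struct ucred **credanonp, int *numsecflavors, int *secflavors)
{
	/* The arguments are simply handed on, so only the types matter. */
	return (vfs_stdcheckexp(vfsp, nam, extflagsp, credanonp,
	    numsecflavors, secflavors));
}
```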

rick



With kindest regards,
Kostya Berger



Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc

2020-06-18 Thread Rick Macklem
Ryan Libby wrote:
>On Mon, Jun 15, 2020 at 5:06 PM Rick Macklem  wrote:
>>
>> Rick Macklem wrote:
>> >r358098 will hang fairly easily, in 1-3 cycles of the kernel build over NFS.
>> >I thought this was the culprit, since I did 6 cycles of r358097 without a hang.
>> >However, I just got a hang with r358097, but it looks rather different.
>> >The r358097 hang did not have any processes sleeping on btalloc. They
>> >appeared to be waiting on two different locks in the buffer cache.
>> >As such, I think it might be a different problem. (I'll admit I should have
>> >made notes about this one before rebooting, but I was frustrated that
>> >it happened and rebooted before looking at it in much detail.)
>> Ok, so I did 10 cycles of the kernel build over NFS for r358096 and never
>> got a hang.
>> --> It seems that r358097 is the culprit and r358098 makes it easier
>>   to reproduce.
>>   --> Basically runs out of kernel memory.
>>
>> It is not obvious if I can revert these two commits without reverting
>> other ones, since there were a bunch of vm changes after these.
>>
>> I'll take a look, but if you guys have any ideas on how to fix this, please
>> let me know.
>>
>> Thanks, rick
>
>Interesting.  Could you try re-adding UMA_ZONE_NOFREE to the vmem btag
>zone to see if that rescues it, on whatever base revision gets you a
>reliable repro?
Good catch! That seems to fix it. I've done 8 cycles of kernel build over
NFS without a hang (normally I'd get one in the first 1-3 cycles).

I don't know if the intent was to delete UMA_ZONE_VM and r358097
had a typo in it and deleted UMA_ZONE_NOFREE or ???

Anyhow, I just put it back to UMA_ZONE_VM | UMA_ZONE_NOFREE and
the hangs seem to have gone away.

The small patch I did is attached, in case that isn't what you meant.

I'll run a few more cycles just in case, but I think this fixes it.

Thanks, rick

>
> Jeff, to fill you in, I have been getting intermittent hangs on a Pentium 4
> (single core i386) with 1.25Gbytes ram when doing kernel builds using
> head kernels from this winter. (I also saw one when doing a kernel build
> on UFS, so they aren't NFS specific, although easier to reproduce that way.)
> After a typical hang, there will be a bunch of processes sleeping on "btalloc"
> and several processes holding the following lock:
> exclusive sx lock @ vm/vm_map.c:4761
> - I have seen hangs where that is the only lock held by any process except
>    the interrupt thread.
> - I have also seen processes waiting on the following locks:
> kern/subr_vmem.c:1343
> kern/subr_vmem.c:633
>
> I can't be absolutely sure r358098 is the culprit, but it seems to make the
> problem more reproducible.
>
> If anyone has a patch suggestion, I can test it.
> Otherwise, I will continue to test r358097 and earlier, to try and see what hangs
> occur. (I've done 8 cycles of testing of r356776 without difficulties, but that
> doesn't guarantee it isn't broken.)
>
> There is a bunch more of the stuff I got for Kostik and Ryan below.
> I can do "db" when it is hung, but it is a screen console, so I need to
> transcribe the output to email by hand. (ie. If you need something
> specific I can do that, but trying to do everything Kostik and Ryan asked
> for isn't easy.)
>
> rick
>
>
>
> Konstantin Belousov wrote:
> >On Fri, May 22, 2020 at 11:46:26PM +, Rick Macklem wrote:
> >> Konstantin Belousov wrote:
> >> >On Wed, May 20, 2020 at 11:58:50PM -0700, Ryan Libby wrote:
> >> >> On Wed, May 20, 2020 at 6:04 PM Rick Macklem  wrote:
> >> >> >
> >> >> > Hi,
> >> >> >
> >> >> > Since I hadn't upgraded a kernel through the winter, it took me a while
> >> >> > to bisect this, but r358252 seems to be the culprit.
> No longer true. I succeeded in reproducing the hang today running a
> r358251 kernel.
>
> I haven't had much luck so far, but see below for what I have learned.
>
> >> >> >
> >> >> > If I do a kernel build over NFS using my not so big Pentium 4 (single core,
> >> >> > 1.25Gbytes RAM, i386), about every second attempt will hang.
> >> >> > When I do a "ps" in the debugger, I see processes sleeping on btalloc.
> >> >> > If I revert to r358251, I cannot reproduce this.
> As above, this is no longer true.
>
> >> >> >
> >> >> > Any ideas?
> >> >> >
> >> >> > I can easily test any change you might suggest to see if it fixes the
> >> 

does a ZFS change in head require additional work?

2020-06-16 Thread Rick Macklem
Hi,

r362158 changed the arguments for zfs_checkexp() in head.
There were no other changes, since the arguments are simply
passed on to vfs_stdcheckexp().

Is there something else that needs to be done,
such as sending this patch upstream?

rick


Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc

2020-06-15 Thread Rick Macklem
Rick Macklem wrote:
>r358098 will hang fairly easily, in 1-3 cycles of the kernel build over NFS.
>I thought this was the culprit, since I did 6 cycles of r358097 without a hang.
>However, I just got a hang with r358097, but it looks rather different.
>The r358097 hang did not have any processes sleeping on btalloc. They
>appeared to be waiting on two different locks in the buffer cache.
>As such, I think it might be a different problem. (I'll admit I should have
>made notes about this one before rebooting, but I was frustrated that
>it happened and rebooted before looking at it in much detail.)
Ok, so I did 10 cycles of the kernel build over NFS for r358096 and never
got a hang.
--> It seems that r358097 is the culprit and r358098 makes it easier
  to reproduce.
  --> Basically runs out of kernel memory.

It is not obvious if I can revert these two commits without reverting
other ones, since there were a bunch of vm changes after these.

I'll take a look, but if you guys have any ideas on how to fix this, please
let me know.

Thanks, rick

Jeff, to fill you in, I have been getting intermittent hangs on a Pentium 4
(single core i386) with 1.25Gbytes ram when doing kernel builds using
head kernels from this winter. (I also saw one when doing a kernel build
on UFS, so they aren't NFS specific, although easier to reproduce that way.)
After a typical hang, there will be a bunch of processes sleeping on "btalloc"
and several processes holding the following lock:
exclusive sx lock @ vm/vm_map.c:4761
- I have seen hangs where that is the only lock held by any process except
   the interrupt thread.
- I have also seen processes waiting on the following locks:
kern/subr_vmem.c:1343
kern/subr_vmem.c:633

I can't be absolutely sure r358098 is the culprit, but it seems to make the
problem more reproducible.

If anyone has a patch suggestion, I can test it.
Otherwise, I will continue to test r358097 and earlier, to try and see what
hangs occur. (I've done 8 cycles of testing of r356776 without difficulties, but that
doesn't guarantee it isn't broken.)

There is a bunch more of the stuff I got for Kostik and Ryan below.
I can do "db" when it is hung, but it is a screen console, so I need to
transcribe the output to email by hand. (ie. If you need something
specific I can do that, but trying to do everything Kostik and Ryan asked
for isn't easy.)

rick



Konstantin Belousov wrote:
>On Fri, May 22, 2020 at 11:46:26PM +, Rick Macklem wrote:
>> Konstantin Belousov wrote:
>> >On Wed, May 20, 2020 at 11:58:50PM -0700, Ryan Libby wrote:
>> >> On Wed, May 20, 2020 at 6:04 PM Rick Macklem  wrote:
>> >> >
>> >> > Hi,
>> >> >
>> >> > Since I hadn't upgraded a kernel through the winter, it took me a while
>> >> > to bisect this, but r358252 seems to be the culprit.
No longer true. I succeeded in reproducing the hang today running a
r358251 kernel.

I haven't had much luck so far, but see below for what I have learned.

>> >> >
>> >> > If I do a kernel build over NFS using my not so big Pentium 4 (single core,
>> >> > 1.25Gbytes RAM, i386), about every second attempt will hang.
>> >> > When I do a "ps" in the debugger, I see processes sleeping on btalloc.
>> >> > If I revert to r358251, I cannot reproduce this.
As above, this is no longer true.

>> >> >
>> >> > Any ideas?
>> >> >
>> >> > I can easily test any change you might suggest to see if it fixes the
>> >> > problem.
>> >> >
>> >> > If you want more debug info, let me know, since I can easily
>> >> > reproduce it.
>> >> >
>> >> > Thanks, rick
>> >>
>> >> Nothing obvious to me.  I can maybe try a repro on a VM...
>> >>
>> >> ddb ps, acttrace, alltrace, show all vmem, show page would be welcome.
>> >>
>> >> "btalloc" is "We're either out of address space or lost a fill race."
>From what I see, I think it is "out of address space".
For one of the hangs, when I did "show alllocks", everything except the
intr thread, was waiting for the
exclusive sx lock @ vm/vm_map.c:4761

>> >
>> >Yes, I would be not surprised to be out of something on 1G i386 machine.
>> >Please also add 'show alllocks'.
>> Ok, I used an up to date head kernel and it took longer to reproduce a hang.
Go down to Kostik's comment about kern.maxvnodes for the rest of what I've
learned. (The time it takes to reproduce one of these varies greatly, but I usually
get one within 3 cycles of a full kernel build over NFS. I have had it happen
once when doing

Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc

2020-06-09 Thread Rick Macklem
Hope you don't mind the top post, but since this is now an update and somewhat
different, I don't think it makes sense to embed this in the message below.

r358098 will hang fairly easily, in 1-3 cycles of the kernel build over NFS.
I thought this was the culprit, since I did 6 cycles of r358097 without a hang.
However, I just got a hang with r358097, but it looks rather different.
The r358097 hang did not have any processes sleeping on btalloc. They
appeared to be waiting on two different locks in the buffer cache.
As such, I think it might be a different problem. (I'll admit I should have
made notes about this one before rebooting, but I was frustrated that
it happened and rebooted before looking at it in much detail.)

Jeff, to fill you in, I have been getting intermittent hangs on a Pentium 4
(single core i386) with 1.25Gbytes ram when doing kernel builds using
head kernels from this winter. (I also saw one when doing a kernel build
on UFS, so they aren't NFS specific, although easier to reproduce that way.)
After a typical hang, there will be a bunch of processes sleeping on "btalloc"
and several processes holding the following lock:
exclusive sx lock @ vm/vm_map.c:4761
- I have seen hangs where that is the only lock held by any process except
   the interrupt thread.
- I have also seen processes waiting on the following locks:
kern/subr_vmem.c:1343
kern/subr_vmem.c:633

I can't be absolutely sure r358098 is the culprit, but it seems to make the
problem more reproducible.

If anyone has a patch suggestion, I can test it.
Otherwise, I will continue to test r358097 and earlier, to try and see what
hangs occur. (I've done 8 cycles of testing of r356776 without difficulties, but that
doesn't guarantee it isn't broken.)

There is a bunch more of the stuff I got for Kostik and Ryan below.
I can do "db" when it is hung, but it is a screen console, so I need to
transcribe the output to email by hand. (ie. If you need something
specific I can do that, but trying to do everything Kostik and Ryan asked
for isn't easy.)

rick



Konstantin Belousov wrote:
>On Fri, May 22, 2020 at 11:46:26PM +, Rick Macklem wrote:
>> Konstantin Belousov wrote:
>> >On Wed, May 20, 2020 at 11:58:50PM -0700, Ryan Libby wrote:
>> >> On Wed, May 20, 2020 at 6:04 PM Rick Macklem  wrote:
>> >> >
>> >> > Hi,
>> >> >
>> >> > Since I hadn't upgraded a kernel through the winter, it took me a while
>> >> > to bisect this, but r358252 seems to be the culprit.
No longer true. I succeeded in reproducing the hang today running a
r358251 kernel.

I haven't had much luck so far, but see below for what I have learned.

>> >> >
>> >> > If I do a kernel build over NFS using my not so big Pentium 4 (single core,
>> >> > 1.25Gbytes RAM, i386), about every second attempt will hang.
>> >> > When I do a "ps" in the debugger, I see processes sleeping on btalloc.
>> >> > If I revert to r358251, I cannot reproduce this.
As above, this is no longer true.

>> >> >
>> >> > Any ideas?
>> >> >
>> >> > I can easily test any change you might suggest to see if it fixes the
>> >> > problem.
>> >> >
>> >> > If you want more debug info, let me know, since I can easily
>> >> > reproduce it.
>> >> >
>> >> > Thanks, rick
>> >>
>> >> Nothing obvious to me.  I can maybe try a repro on a VM...
>> >>
>> >> ddb ps, acttrace, alltrace, show all vmem, show page would be welcome.
>> >>
>> >> "btalloc" is "We're either out of address space or lost a fill race."
>From what I see, I think it is "out of address space".
For one of the hangs, when I did "show alllocks", everything except the
intr thread, was waiting for the
exclusive sx lock @ vm/vm_map.c:4761

>> >
>> >Yes, I would be not surprised to be out of something on 1G i386 machine.
>> >Please also add 'show alllocks'.
>> Ok, I used an up to date head kernel and it took longer to reproduce a hang.
Go down to Kostik's comment about kern.maxvnodes for the rest of what I've
learned. (The time it takes to reproduce one of these varies greatly, but I usually
get one within 3 cycles of a full kernel build over NFS. I have had it happen
once when doing a kernel build over UFS.)

>> This time, none of the processes are stuck on "btalloc".
> I'll try and give you most of the above, but since I have to type it in by hand
> from the screen, I might not get it all. (I'm no real typist;-)
> > show alllocks
> exclusive lockmgr ufs (ufs) r = 0 locked @ kern/vfs_subr.c: 3259
> 

new mbuf functions to allocate and copy into ext_pgs mbufs with anonymous pages

2020-06-07 Thread Rick Macklem
Hi,

I just put a patch that has a couple of new mbuf handling functions here:
https://reviews.freebsd.org/D25182
I listed glebius@, gallatin@ and jhb@ as possible reviewers, but if anyone
else wants to review or comment on these, please feel free to do so.

rick


getgrouplist duplication of cr_groups[0] as cr_groups[1]

2020-06-03 Thread Rick Macklem
Hi,

During testing of a mountd.c patch I have, I found an "old bug" where the
mountd.c code assumed that getgrouplist() would always duplicate
cr_groups[0] in cr_groups[1].

If I read the commit logs correctly, this was always the case until
r174547 (only 12 years ago), which switched getgrouplist() to
use __getgroupmembership().
Kirk fixed the deduplication code in gr_addgid() in r328304 so that
gr_addgid() would not deduplicate cr_groups[0,1].
However, in the case where the "user" is not also listed in the group
database for the same group as their gid in the password database,
the gid will not be duplicated.
--> It also implies that getgrouplist() can return with ngroups == 1,
  with only the basegid in it.

So, is it required behaviour for getgrouplist(3) to always return with
cr_groups[0] duplicated in cr_groups[1]?

If the duplication is not required, then I can easily fix mountd to
check for the non-duplicated case.
I will probably patch it anyhow, since the one line change will be
harmless even if getgrouplist() is changed to always return the
duplicate of cr_groups[0] in cr_groups[1].
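
A minimal sketch of a check that tolerates both behaviours, assuming the caller wants the base gid to appear only once. compress_basegid_dup() is a hypothetical helper written for this illustration; it is not the mountd.c change itself.

```c
#include <sys/types.h>

/*
 * Hypothetical helper: given a list filled in by getgrouplist(3),
 * drop groups[1] only when it really duplicates the base gid in
 * groups[0]. Returns the new number of groups.
 */
static int
compress_basegid_dup(gid_t *groups, int ngroups)
{
	if (ngroups > 1 && groups[0] == groups[1]) {
		/* Duplicated case: shift the remaining entries down. */
		for (int i = 2; i < ngroups; i++)
			groups[i - 1] = groups[i];
		ngroups--;
	}
	/* Non-duplicated case (including ngroups == 1): left as is. */
	return (ngroups);
}
```

The ngroups > 1 test covers the case noted above where only the base gid is returned.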

rick



Re: vfs_mouse.c breakage?

2020-06-01 Thread Rick Macklem
It also needed .
It is ancient code (that started out in SunOS, if I recall correctly), where they
used things like "bool_t" and set them with TRUE/FALSE (upper case).

Unfortunately, those includes love to include other includes...

Anyhow, I think it is fixed now, rick


From: Pete Wright 
Sent: Monday, June 1, 2020 8:05 PM
To: Rick Macklem; FreeBSD Current
Subject: Re: vfs_mouse.c breakage?




On 6/1/20 2:50 PM, Rick Macklem wrote:
> Pete Wright wrote:
>> Subject: vfs_mouse.c breakage?
> Not sure if the vfs mouse is broken (sorry, I couldn't resist), but...

hah nice - dyslexia + poor eyesight are not good bedfellows :^)
>
> I think it needs a:
> #include 
>
> but it will take a little while for me to test this.
>
> Thanks for reporting it, rick

no prob - adding that include threw some more errors

$ git diff
diff --git a/sys/kern/vfs_mount.c b/sys/kern/vfs_mount.c
index 03f95b2845f9..4282b1938095 100644
--- a/sys/kern/vfs_mount.c
+++ b/sys/kern/vfs_mount.c
@@ -39,6 +39,7 @@
  #include 
  __FBSDID("$FreeBSD$");

+#include 
  #include 
  #include 
  #include 


here's a snippet of the exception:
--- vfs_mount.o ---
In file included from /usr/home/pete/git/freebsd/sys/kern/vfs_mount.c:42:
In file included from /usr/home/pete/git/freebsd/sys/rpc/auth.h:50:
/usr/home/pete/git/freebsd/sys/rpc/xdr.h:105:3: error: type name
requires a specifier or qualifier
 bool_t  (*x_getlong)(struct XDR *, long *);


I'll sit tight for now - thanks for checking it out!

-pete

--
Pete Wright
p...@nomadlogic.org
@nomadlogicLA



Re: vfs_mouse.c breakage?

2020-06-01 Thread Rick Macklem
Pete Wright wrote:
>Subject: vfs_mouse.c breakage?
Not sure if the vfs mouse is broken (sorry, I couldn't resist), but...

I think it needs a:
#include 

but it will take a little while for me to test this.

Thanks for reporting it, rick

>hello - i am having issues building CURRENT after this was applied:
>--- vfs_mount.o ---
>/usr/home/pete/git/freebsd/sys/kern/vfs_mount.c:2360:27: error: use of
>undeclared identifier 'AUTH_SYS'
> exp->ex_secflavors[0] = AUTH_SYS;
> ^
>1 error generated.
>*** [vfs_mount.o] Error code 1
>
>
>was curious if others are seeing this?
>
>cheers,
>-pete
>
>--
>Pete Wright
>p...@nomadlogic.org
>@nomadlogicLA


review of an update to "struct export_args"

2020-05-31 Thread Rick Macklem
Hi,

I have put a patch up on phabricator
https://reviews.freebsd.org/D25088

I have listed kib@ and freqlabs@ as reviewers, but if anyone else
wishes to review it, be my guest.

It updates "struct export_args" to make the ex_flags field 64bits and
the mapped user (is called ex_anon in the current structure) is no longer
limited to 16 additional groups.
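
As a rough sketch of what such a layout could look like (the structure and member names below are made up for illustration and are not taken from the patch): the flags become a 64-bit field and the mapped user's groups become a counted, dynamically sized list instead of a fixed array.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>

/* Illustrative layout only; these member names are hypothetical. */
struct export_args_sketch {
	uint64_t  flags;     /* export flags, widened to 64 bits */
	uid_t     anon_uid;  /* uid of the mapped user */
	int       ngroups;   /* number of entries in groups[] */
	gid_t    *groups;    /* variable-length list, no fixed limit of 16 */
};

/* Replace the group list with a copy of src[0..n-1]; 0 on success. */
static int
sketch_set_groups(struct export_args_sketch *ea, const gid_t *src, int n)
{
	gid_t *g;
	int i;

	if (n <= 0)
		return (-1);
	g = malloc((size_t)n * sizeof(*g));
	if (g == NULL)
		return (-1);
	for (i = 0; i < n; i++)
		g[i] = src[i];
	free(ea->groups);
	ea->groups = g;
	ea->ngroups = n;
	return (0);
}
```

A pointer plus count in place of a fixed-size array is one way to lift the group limit; it also means the kernel has to copy the list in separately, which is part of why a change like this needs review.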

rick


Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc

2020-05-26 Thread Rick Macklem
Konstantin Belousov wrote:
>On Fri, May 22, 2020 at 11:46:26PM +0000, Rick Macklem wrote:
>> Konstantin Belousov wrote:
>> >On Wed, May 20, 2020 at 11:58:50PM -0700, Ryan Libby wrote:
>> >> On Wed, May 20, 2020 at 6:04 PM Rick Macklem  wrote:
>> >> >
>> >> > Hi,
>> >> >
>> >> > Since I hadn't upgraded a kernel through the winter, it took me a while
>> >> > to bisect this, but r358252 seems to be the culprit.
No longer true. I succeeded in reproducing the hang today running a
r358251 kernel.

I haven't had much luck so far, but see below for what I have learned.

>> >> >
>> >> > If I do a kernel build over NFS using my not so big Pentium 4 (single core,
>> >> > 1.25Gbytes RAM, i386), about every second attempt will hang.
>> >> > When I do a "ps" in the debugger, I see processes sleeping on btalloc.
>> >> > If I revert to r358251, I cannot reproduce this.
As above, this is no longer true.

>> >> >
>> >> > Any ideas?
>> >> >
>> >> > I can easily test any change you might suggest to see if it fixes the
>> >> > problem.
>> >> >
>> >> > If you want more debug info, let me know, since I can easily
>> >> > reproduce it.
>> >> >
>> >> > Thanks, rick
>> >>
>> >> Nothing obvious to me.  I can maybe try a repro on a VM...
>> >>
>> >> ddb ps, acttrace, alltrace, show all vmem, show page would be welcome.
>> >>
>> >> "btalloc" is "We're either out of address space or lost a fill race."
From what I see, I think it is "out of address space".
For one of the hangs, when I did "show alllocks", everything except the
intr thread was waiting for the
exclusive sx lock @ vm/vm_map.c:4761

>> >
>> >Yes, I would be not surprised to be out of something on 1G i386 machine.
>> >Please also add 'show alllocks'.
>> Ok, I used an up to date head kernel and it took longer to reproduce a hang.
Go down to Kostik's comment about kern.maxvnodes for the rest of what I've
learned. (The time it takes to reproduce one of these varies greatly, but I usually
get one within 3 cycles of a full kernel build over NFS. I have had it happen
once when doing a kernel build over UFS.)

>> This time, none of the processes are stuck on "btalloc".
> I'll try and give you most of the above, but since I have to type it in by hand
> from the screen, I might not get it all. (I'm no real typist;-)
> > show alllocks
> exclusive lockmgr ufs (ufs) r = 0 locked @ kern/vfs_subr.c:3259
> exclusive lockmgr nfs (nfs) r = 0 locked @ kern/vfs_lookup.c:737
> exclusive sleep mutex kernel arena domain (kernel arena domain) r = 0 locked @ kern/subr_vmem.c:1343
> exclusive lockmgr bufwait (bufwait) r = 0 locked @ kern/vfs_bio.c:1663
> exclusive lockmgr ufs (ufs) r = 0 locked @ kern/vfs_subr.c:2930
> exclusive lockmgr syncer (syncer) r = 0 locked @ kern/vfs_subr.c:2474
> Process 12 (intr) thread 0x.. (108)
> exclusive sleep mutex Giant (Giant) r = 0 locked @ kern/kern_intr.c:1152
>
> > ps
> - Not going to list them all, but here are the ones that seem interesting...
> 18 0 0 0 DL vlruwt 0x11d939cc [vnlru]
> 16 0 0 0 DL (threaded)   [bufdaemon]
> 100069  D  qsleep  [bufdaemon]
> 100074  D  -   [bufspacedaemon-0]
> 100084  D  sdflush  0x11923284 [/ worker]
> - and more of these for the other UFS file systems
> 9 0 0 0   DL psleep  0x1e2f830  [vmdaemon]
> 8 0 0 0   DL (threaded)   [pagedaemon]
> 100067  D   psleep 0x1e2e95c   [dom0]
> 100072  D   launds 0x1e2e968   [laundry: dom0]
> 100073  D   umarcl 0x12cc720   [uma]
> … a bunch of usb and cam ones
> 100025  D   -   0x1b2ee40  [doneq0]
> …
> 12 0 0 0 RL  (threaded)   [intr]
> 17  I [swi6: task queue]
> 18  Run   CPU 0   [swi6: Giant taskq]
> …
> 10  D   swapin 0x1d96dfc[swapper]
> - and a bunch more in D state.
> Does this mean the swapper was trying to swap in?
>
> > acttrace
> - just shows the keyboard
> kdb_enter() at kdb_enter+0x35/frame
> vt_kbdevent() at vt_kbdevent+0x329/frame
> kbdmux_intr() at kbdmux_intr+0x19/frame
> taskqueue_run_locked() at taskqueue_run_locked+0x175/frame
> taskqueue_run() at taskqueue_run+0x44/frame
> taskqueue_swi_giant_run(0) at taskqueue_swi_giant_run+0xe/frame
> ithread_loop() at ithread_loop+0x237/frame
> fork_exit() at fork_exit+0x6c/frame
> fork_trampo

Re: r358252 causes intermittent hangs where processes are stuck sleeping on btalloc

2020-05-22 Thread Rick Macklem
Konstantin Belousov wrote:
>On Wed, May 20, 2020 at 11:58:50PM -0700, Ryan Libby wrote:
>> On Wed, May 20, 2020 at 6:04 PM Rick Macklem  wrote:
>> >
>> > Hi,
>> >
>> > Since I hadn't upgraded a kernel through the winter, it took me a while
>> > to bisect this, but r358252 seems to be the culprit.
>> >
>> > If I do a kernel build over NFS using my not so big Pentium 4 (single core,
>> > 1.25Gbytes RAM, i386), about every second attempt will hang.
>> > When I do a "ps" in the debugger, I see processes sleeping on btalloc.
>> > If I revert to r358251, I cannot reproduce this.
>> >
>> > Any ideas?
>> >
>> > I can easily test any change you might suggest to see if it fixes the
>> > problem.
>> >
>> > If you want more debug info, let me know, since I can easily
>> > reproduce it.
>> >
>> > Thanks, rick
>>
>> Nothing obvious to me.  I can maybe try a repro on a VM...
>>
>> ddb ps, acttrace, alltrace, show all vmem, show page would be welcome.
>>
>> "btalloc" is "We're either out of address space or lost a fill race."
>
>Yes, I would be not surprised to be out of something on 1G i386 machine.
>Please also add 'show alllocks'.
Ok, I used an up to date head kernel and it took longer to reproduce a hang.
This time, none of the processes are stuck on "btalloc".
I'll try and give you most of the above, but since I have to type it in by hand
from the screen, I might not get it all. (I'm no real typist;-)
> show alllocks
exclusive lockmgr ufs (ufs) r = 0 locked @ kern/vfs_subr.c:3259
exclusive lockmgr nfs (nfs) r = 0 locked @ kern/vfs_lookup.c:737
exclusive sleep mutex kernel arena domain (kernel arena domain) r = 0 locked @ kern/subr_vmem.c:1343
exclusive lockmgr bufwait (bufwait) r = 0 locked @ kern/vfs_bio.c:1663
exclusive lockmgr ufs (ufs) r = 0 locked @ kern/vfs_subr.c:2930
exclusive lockmgr syncer (syncer) r = 0 locked @ kern/vfs_subr.c:2474
Process 12 (intr) thread 0x.. (108)
exclusive sleep mutex Giant (Giant) r = 0 locked @ kern/kern_intr.c:1152

> ps
- Not going to list them all, but here are the ones that seem interesting...
18 0 0 0 DL vlruwt 0x11d939cc [vnlru]
16 0 0 0 DL (threaded)   [bufdaemon]
100069  D  qsleep  [bufdaemon]
100074  D  -   [bufspacedaemon-0]
100084  D  sdflush  0x11923284 [/ worker]
- and more of these for the other UFS file systems
9 0 0 0   DL psleep  0x1e2f830  [vmdaemon]
8 0 0 0   DL (threaded)   [pagedaemon]
100067  D   psleep 0x1e2e95c   [dom0]
100072  D   launds 0x1e2e968   [laundry: dom0]
100073  D   umarcl 0x12cc720   [uma]
… a bunch of usb and cam ones
100025  D   -   0x1b2ee40  [doneq0]
…
12 0 0 0 RL  (threaded)   [intr]
17  I [swi6: task queue]
18  Run   CPU 0   [swi6: Giant taskq]
…
10  D   swapin 0x1d96dfc[swapper]
- and a bunch more in D state.
Does this mean the swapper was trying to swap in?

> acttrace
- just shows the keyboard
kdb_enter() at kdb_enter+0x35/frame
vt_kbdevent() at vt_kbdevent+0x329/frame
kbdmux_intr() at kbdmux_intr+0x19/frame
taskqueue_run_locked() at taskqueue_run_locked+0x175/frame
taskqueue_run() at taskqueue_run+0x44/frame
taskqueue_swi_giant_run(0) at taskqueue_swi_giant_run+0xe/frame
ithread_loop() at ithread_loop+0x237/frame
fork_exit() at fork_exit+0x6c/frame
fork_trampoline() at 0x../frame

> show all vmem
vmem 0x.. 'transient arena'
  quantum:   4096
  size:      23592960
  inuse:     0
  free:      23592960
  busy tags: 0
  free tags: 2
               inuse      size   free      size
  16777216         0         0      1  23592960
vmem 0x.. 'buffer arena'
  quantum:   4096
  size:      94683136
  inuse:     94502912
  free:      180224
  busy tags: 1463
  free tags: 3
               inuse      size   free      size
  16384            2     32768      1     16384
  32768           39   1277952      1     32768
  65536         1422  93192192      0         0
  131072           0         0      1    131072
vmem 0x.. 'i386trampoline'
  quantum:   1
  size:      24576
  inuse:     20860
  free:      3716
  busy tags: 9
  free tags: 3
               inuse      size   free      size
  32               1        48      1        52
  64               2       208      0         0
  128              2       280      0         0
  2048             1      2048      1      3664
  4096             2      8192      0         0
  8192             1     10084      0         0
vmem 0x.. 'kernel rwx arena'
  quantum:   4096
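
The per-size-class rows for the 'buffer arena' can be cross-checked against its summary lines. The sketch below assumes each row reads as: size class, in-use tag count, in-use bytes, free tag count, free bytes. That column reading is an interpretation of the hand-typed output, and the structure and function names are made up for the example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of the per-size-class table, as read from the ddb output. */
struct vmem_row {
	uint64_t size_class;
	uint64_t inuse_tags;
	uint64_t inuse_bytes;
	uint64_t free_tags;
	uint64_t free_bytes;
};

/* Rows of the 'buffer arena', under the column reading stated above. */
static const struct vmem_row buffer_arena[] = {
	{ 16384,     2,    32768, 1,  16384 },
	{ 32768,    39,  1277952, 1,  32768 },
	{ 65536,  1422, 93192192, 0,      0 },
	{ 131072,    0,        0, 1, 131072 },
};

/* Sum the busy tag count, in-use bytes and free bytes over all rows. */
static void
vmem_totals(const struct vmem_row *rows, size_t n, uint64_t *busy_tags,
    uint64_t *inuse_bytes, uint64_t *free_bytes)
{
	size_t i;

	*busy_tags = *inuse_bytes = *free_bytes = 0;
	for (i = 0; i < n; i++) {
		*busy_tags += rows[i].inuse_tags;
		*inuse_bytes += rows[i].inuse_bytes;
		*free_bytes += rows[i].free_bytes;
	}
}
```

Under that reading the rows add up to 1463 busy tags, 94502912 bytes in use and 180224 bytes free, which matches the arena's "busy tags", "inuse" and "free" lines, and in-use plus free equals the reported size of 94683136.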

Re: RFC: merging nfs-over-tls changes into head/sys

2020-05-22 Thread Rick Macklem
John Baldwin wrote:
>On 5/21/20 2:01 PM, Rick Macklem wrote:
>> Hi,
>>
>> I have now completed changes to the code in projects/nfs-over-tls, which
>> implements TLS encryption of NFS RPC messages. (This roughly conforms
>> to the internet draft "Towards Remote Procedure Call Encryption By Default",
>> which should soon become an RFC. For now, TLS1.2 is used instead of TLS1.3,
>> since FreeBSD's KERN_TLS does not yet implement TLS1.3.)
>>
>> I'd like to start merging some of the kernel changes into head/sys.
>>
>> The first of these would be creation of the syscall used by the daemons.
>> (The code in projects/nfs-over-tls cheats and uses the syscall for the gssd,
>>  but it needs to have its own syscall so that the gssd daemon can run concurrently
>>  with it. I didn't want testers to need to build userland just to get a syscall stub
>>  in libc.)
>>
>> After this, there are a bunch of changes to the NFS code to add support for
>> ext_pgs mbufs (these are significant patches, but should not affect the
>> non-ext_pgs mbuf case, since they'll be conditional on ND_EXTPGS/M_EXTPGS).
>>
>> Does this sound ok to do?
>>
>> Please let me know if you see problems with me doing this?
>
>I don't see any problems, per se, but I still need to do some changes on my
>end for software KTLS RX before it's ready to merge (I'm hoping to kill
>the iovecs in the kthreads entirely).
Sure. My plan is to merge bits and pieces, because some of it involves parts
of the system like mount exports or changes to soreceive_generic(),
that will require reviews.

To be honest, most of the changes are not specifically nfs-over-tls (or
krpc-over-tls, although NFS is currently the only consumer).
They are things like generating ext_pgs mbuf lists (which can be used for
non-TLS connections, although I'm not sure they are useful for other cases?)
or a better way of handling the krpc client side receive.

I think it will be quite a while before all the kernel bits are in head, but having
the syscall in head (mainly the syscall stub in libc) will make it easier for
testers to set systems up. They may not be FreeBSD types.

No rush on the TLS changes from my perspective. (It would be nice to get
the kernel bits in FreeBSD13. The userland stuff could probably become a
package/port, I think?)

Thanks yet again, for your help with this, rick


--
John Baldwin


RFC: merging nfs-over-tls changes into head/sys

2020-05-21 Thread Rick Macklem
Hi,

I have now completed changes to the code in projects/nfs-over-tls, which
implements TLS encryption of NFS RPC messages. (This roughly conforms
to the internet draft "Towards Remote Procedure Call Encryption By Default",
which should soon become an RFC. For now, TLS1.2 is used instead of TLS1.3,
since FreeBSD's KERN_TLS does not yet implement TLS1.3.)

I'd like to start merging some of the kernel changes into head/sys.

The first of these would be creation of the syscall used by the daemons.
(The code in projects/nfs-over-tls cheats and uses the syscall for the gssd,
 but it needs to have its own syscall so that the gssd daemon can run concurrently
 with it. I didn't want testers to need to build userland just to get a syscall stub
 in libc.)

After this, there are a bunch of changes to the NFS code to add support for
ext_pgs mbufs (these are significant patches, but should not affect the
non-ext_pgs mbuf case, since they'll be conditional on ND_EXTPGS/M_EXTPGS).

Does this sound ok to do?

Please let me know if you see problems with me doing this?

Thanks, rick


r358252 causes intermittent hangs where processes are stuck sleeping on btalloc

2020-05-20 Thread Rick Macklem
Hi,

Since I hadn't upgraded a kernel through the winter, it took me a while
to bisect this, but r358252 seems to be the culprit.

If I do a kernel build over NFS using my not so big Pentium 4 (single core,
1.25Gbytes RAM, i386), about every second attempt will hang.
When I do a "ps" in the debugger, I see processes sleeping on btalloc.
If I revert to r358251, I cannot reproduce this.

Any ideas?

I can easily test any change you might suggest to see if it fixes the
problem.

If you want more debug info, let me know, since I can easily
reproduce it.

Thanks, rick


Re: TLS certificates for NFS-over-TLS floating client

2020-03-28 Thread Rick Macklem
John-Mark Gurney wrote:
>Rick Macklem wrote this message on Thu, Mar 26, 2020 at 14:33 +:
>> John-Mark Gurney wrote:
>> [lots of stuff snipped]
>> >Rick Macklem wrote:
>> >> I had originally planned on some "secret" in the certificate (like a CN name
>> >> that satisfies some regular expression or ???) but others convinced me that
>> >> that wouldn't provide anything beyond knowing that the certificate was
>> >> signed by the appropriate CA, so I have not implemented anything.
>> >
>> >Yeah, having a "secret" in the CN doesn't make sense, but what does make
>> >sense is allowing the exports line to specify what the CN should contain
>> >to be authenticated...
>> >
>> >Say as a corp, you issue personal certificates to everyone.  This is
>> >because you require everyone to sign and/or encrypt their email w/
>> >S/MIME.  Each cert includes the email address of that person, so you
>> >could simply do something like:
>> >/home/alice -tlscert -tlsroot /etc/company.pem -email al...@example.com
>> >
>> >And anyone who has the certificate w/ al...@example.com that was signed
>> >by the public key in /etc/company.pem would be granted access to the
>> >export /home/alice.
>> >
>> >If it allowed ANY cert signed by the CA specified, then that introduces
>> >an authentication problem, as now Mallory, if she is a coworker of Alice,
>> >could also access Alice's home directory...
>> >
>> >IMO, this is one auth feature that MUST be supported...
Here's what I have just coded up:
- If an option is set for the server TLS handshake daemon and it gets a verified
  certificate from the client
  - It looks at the CN and if it is of the form "user@domain", it tries to translate
    "user@domain" to a POSIX  using the same mechanism that
    nfsuserd(8) currently uses. ("user@domain" is what NFSv4 uses for a file Owner.)
  - Then all RPCs on this TCP connection are done using the  above,
    ignoring the authenticator in the RPC message header. (Yes, similar to the
    -mapall exports option.)
I have also added a "-tlscnuser" exports option that would require all clients
to have the above form of certificate. (Without this exports option, the above
would work assuming the daemon option is set, but other mounts would be
allowed as well.)

The problem of handling multiple "user" domains has never been solved.
(That is what your -tlsroot option was intended to do assuming a 1 to 1
 relationship between CA root and username domains, I think?)
The problem is that getpwnam(3) needs to know how to look up user names
for these different "domain" values. (Now, both nfsuserd(8) and this daemon
only strip off the default domain, if it matches, and then hand the rest of
the string to getpwnam(3).)
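
The domain-stripping step described above could be sketched roughly as follows. The helper name cn_to_username() is hypothetical, and the case-insensitive domain comparison is an assumption made for this example, not something taken from the daemon:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/*
 * Hypothetical helper: copy the name to hand to getpwnam(3) into out.
 * "user@defdomain" becomes "user"; a CN with any other domain is copied
 * unchanged. Returns 0 on success, or -1 if cn has no '@', has an empty
 * user part, or the result does not fit in out.
 */
static int
cn_to_username(const char *cn, const char *defdomain, char *out,
    size_t outlen)
{
	const char *at;
	size_t len;

	at = strrchr(cn, '@');
	if (at == NULL || at == cn)
		return (-1);
	/* Assumption: domain names are compared without regard to case. */
	if (strcasecmp(at + 1, defdomain) == 0)
		len = (size_t)(at - cn);
	else
		len = strlen(cn);
	if (len >= outlen)
		return (-1);
	memcpy(out, cn, len);
	out[len] = '\0';
	return (0);
}
```

The resulting string would then be passed to getpwnam(3); a CN from a non-default domain is handed over unchanged, which is exactly where the unsolved multiple-domain lookup problem shows up.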

Although "user@domain" isn't exactly an email address, they often are the
same string in practice.

I have not yet posted to nf...@ietf.org to see what they think.
However, I don't think there are any changes in the draft required to do this.
Also, I think interoperability should be ok, since it is controlled by whoever
issues the certificate for the client and the NFS client will normally just
handle this certificate opaquely.

Btw, the server TLS handshake daemon does do a SSL_CTX_set_client_CA_list(),
so it tells the client which CA (usually only one) that it wants a certificate for.

>> The draft does not address user authentication, only machine authentication.
>> --> ie. The certificate is used to decide if a system can do a mount.
>>   Users are still identified via user credentials in the RPC message header.
>>   For AUTH_SYS, that still means .
>>   Otherwise, you need to use Kerberos (sec=krb5[ip]).
>>   You could use "tls,sec=krb5" for a mount, but the only advantage that
>>   might have over "sec=krb5p" is performance, if there is hardware assist
>>   for TLS or something like that.
>>
>> >Now that I reread your comments, it sounds like any certificate would be
>> >allowed in #2 as long as it is valid, and that would only be marginally
>> >better than verification by IP, and in some ways worse, in that now any
>> >user could pretend to be any other user, or you have to do something
>> >crazy like have a CA per user.
>> The case where I see #2 is useful is where this discussion started some weeks ago.
>> The example I started with was:
>> /home -tls -network 192.168.1.0 -mask 255.255.255.0
>> /home -tlscert
>>
>> This says that machines on the local l

Re: TLS certificates for NFS-over-TLS floating client

2020-03-26 Thread Rick Macklem
Sorry about the top post, but I thought of a few things to add to my
last post to this thread...
1 - I agree that for systems like laptops, the line between machine and
    user authentication is fuzzy.
2 - I do like your idea of having an exports(5) option that specifies a CN
    that identifies a user and then does all RPCs as that user.
    I'm not sure if an issuer needs to be specified (or even can be specified),
    since the CAfile argument to the rpctlssd daemon determines if a certificate
    from a particular CA will verify and the verification must happen before the
    exports(5) export options can be applied. (Basically, certificate verification
    happens via a NULL RPC that does a STARTTLS when a TCP connection to
    the server is first established.)
3 - I'll post to nf...@ietf.org to see what others think of this, since it would not
    require any changes to the draft/RFC.
4 - Although it does require a revision to the export_args structure, I think it is
    worth doing even if others don't implement it.

Again, thanks for your comments, rick


From: owner-freebsd-curr...@freebsd.org on behalf of Rick Macklem
Sent: Thursday, March 26, 2020 10:33 AM
To: John-Mark Gurney
Cc: Alexander Leidinger; freebsd-current@FreeBSD.org
Subject: Re: TLS certificates for NFS-over-TLS floating client


Re: TLS certificates for NFS-over-TLS floating client

2020-03-26 Thread Rick Macklem
John-Mark Gurney wrote:
[lots of stuff snipped]
>Rick Macklem wrote:
>> I had originally planned on some "secret" in the certificate (like a CN name
>> that satisfies some regular expression or ???) but others convinced me that
>> that wouldn't provide anything beyond knowing that the certificate was
>> signed by the appropriate CA, so I have not implemented anything.
>
>Yeah, having a "secret" in the CN doesn't make sense, but what does make
>sense is allowing the exports line to specify what the CN should contain
>to be authenticated...
>
>Say as a corp, you issue personal certificates to everyone.  This is
>because you require everyone to sign and/or encrypt their email w/
>S/MIME.  Each cert includes the email address of that person, so you
>could simply do something like:
>/home/alice -tlscert -tlsroot /etc/company.pem -email al...@example.com
>
>And anyone who has the certificate w/ al...@example.com that was signed
>by the public key in /etc/company.pem would be granted access to the
>export /home/alice.
>
>If it allowed ANY cert signed by the CA specified, then that introduces
>an authentication problem, as now Mallory, if she is a coworker of Alice,
>could also access Alice's home directory...
>
>IMO, this is one auth feature that MUST be supported...
The draft does not address user authentication, only machine authentication.
--> ie. The certificate is used to decide if a system can do a mount.
  Users are still identified via user credentials in the RPC message header.
  For AUTH_SYS, that still means .
  Otherwise, you need to use Kerberos (sec=krb5[ip]).
  You could use "tls,sec=krb5" for a mount, but the only advantage that
  might have over "sec=krb5p" is performance, if there is hardware assist
  for TLS or something like that.

>Now that I reread your comments, it sounds like any certificate would be
>allowed in #2 as long as it is valid, and that would only be marginally
>better than verification by IP, and in some ways worse, in that now any
>user could pretend to be any other user, or you have to do something
>crazy like have a CA per user.
The case where I see #2 is useful is where this discussion started some weeks ago.
The example I started with was:
/home -tls -network 192.168.1.0 -mask 255.255.255.0
/home -tlscert

This says that machines on the local lan can mount and do not need to have
certificates, but must use TLS so data is encrypted on the wire.
Mounts from anywhere else (presumably laptops) are allowed so long as they have a
certificate signed by me (as in the site local CA).
I trust these machines enough that I am willing to allow them to use AUTH_SYS,
which is what 99.9...% of NFS mounts do now.
(So, I'd agree that the site local certificate is not a lot better than IP address
 for client machine identity, just that it is an alternative that can be useful.
 Without TLS, a line like "/home" allows anyone to mount /home from anywhere
 and I think you'd agree that few NFS admins. will want to do that. I'm assuming
 no external firewall for this example.)
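
The effect of the two /home lines can be sketched as a small decision function. Everything named here (the SK_ flags, the rule table, sk_mount_allowed()) is hypothetical and only models the behaviour described above; it also assumes the more specific network entry is tried before the default entry, and that addresses are in host byte order:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-export requirements ("-tls" and "-tlscert"). */
#define SK_REQ_TLS	0x1
#define SK_REQ_CERT	0x2

/* Hypothetical per-connection facts reported by the handshake daemon. */
#define SK_CONN_TLS	0x1	/* connection is using TLS */
#define SK_CONN_CERT_OK	0x2	/* client certificate was verified */

struct sk_export {
	uint32_t net;	/* network, host byte order */
	uint32_t mask;	/* netmask; 0 matches any client */
	int	 req;	/* SK_REQ_* flags */
};

/*
 * /home -tls -network 192.168.1.0 -mask 255.255.255.0
 * /home -tlscert
 */
static const struct sk_export home_rules[] = {
	{ 0xc0a80100u, 0xffffff00u, SK_REQ_TLS },
	{ 0, 0, SK_REQ_CERT },
};

/* Returns 1 if the first matching rule permits the mount, else 0. */
static int
sk_mount_allowed(const struct sk_export *rules, size_t n, uint32_t addr,
    int conn)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if ((addr & rules[i].mask) != rules[i].net)
			continue;
		if (rules[i].req & SK_REQ_CERT)
			return ((conn & SK_CONN_TLS) != 0 &&
			    (conn & SK_CONN_CERT_OK) != 0);
		if (rules[i].req & SK_REQ_TLS)
			return ((conn & SK_CONN_TLS) != 0);
		return (1);
	}
	return (0);
}
```

A client at 192.168.1.7 using TLS without a certificate is accepted by the first rule, while a client at 10.0.0.5 falls through to the default rule and needs a verified certificate.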

Now, your suggestion of identifying a user via the CN and then having the
server do all RPCs for the mount as that user is an interesting one.
My concern w.r.t. implementing it would be interoperability.
Put another way, if other servers such as Linux, Netapp,... don't adopt it
(and they won't until there is a draft/RFC specifying it), it would be
FreeBSD server specific and I'd like to avoid that.
There was some discussion w.r.t. user authentication via certificates
during development of the draft, but they decided to defer that work for
now, so they could get something in place for machine authentication first.
(If I understood the discussion on nf...@ietf.org.)

rick

>I wonder if your use of the term secret was the problem, and not the
>idea...  Anything that goes in the client cert is by definition public...
>TLS prior to 1.3 sends the client cert in clear text...  But keying
>based upon the contents of the cert is fine, as that's the point of
>what a cert is..  It's trusting the CA to say that the CN and other
>fields in the cert corresponds to this user, and you can use parts of
>the cert, like the CN to decide which user the public key in the cert
>corresponds to.

--
  John-Mark Gurney  Voice: +1 415 225 5579

 "All that I will do, has been done, All that I have, has not."


Re: TLS certificates for NFS-over-TLS floating client

2020-03-25 Thread Rick Macklem
John-Mark Gurney wrote:
>Rick Macklem wrote this message on Mon, Mar 23, 2020 at 23:53 +:
>> Alexander Leidinger wrote:
>> John-Mark Gurney  wrote:
>> >>Rick Macklem wrote:
>> >>> to be the best solution. The server can verify that the certificate was issued by
>> >>> the local CA. Unfortunately, if the client is compromised and the certificate is copied
>> >>> to another client, that client would gain access.
>> >>
>> >> This is why CRLs/OCSP is necessary, but there isn't anything you can do
>> >> to prevent that.  This is both a better situation than what we have
>> >> today (no auth/confidentiality), and if you tie the cert to an IP, it's
>> >> of limited use, and better than today...
>> >
>> >There are multiple ways to handle that:
>> >  - First of all, you can just validate based upon "cert is signed by
>> >trusted CA".
>> >  - Then you can require that the CN matches the hostname and the
>> >reverse lookup matches.
>> >  - Or (similar to browsers today) you can ignore the CN and require
>> >that the hostnames of the client matches one of the subject
>> >alternative name (SAN) entries (requires reverse DNS lookup to work
>> >and match).
>> At this point, I have three variants in the code (strictest to less strict):
>> 1 - A "-h" command line option on the server handshake daemon (called rpctlssd).
>>  This requires that all clients have
>>  certificates that validate and have the FQDN acquired via reverse DNS of
>>  the IP address of the client for the TCP connection (getnameinfo(NI_NAMEREQD))
>>  in either the subjectAltName or CommonName. (I call X509_check_host()
>>  and this is what I understand it checks.)
>>  --> This case implies that the NFS server admin. does not "trust" the
>>      client's IP address enough to apply exports(5) line(s) to it to decide to
>>      allow the client to do an NFS mount.
>>  (An NFS server *might* be willing to allow client(s) to mount via any IP address
>>   for the #2 case below and not use this option.)
>> 2 - Without "-h" the rpctlssd daemon passes flags into the kernel to indicate
>>  if the client provided a certificate and whether or not it verified.
>>  Then the "-tlscert" option on the appropriate exports(5) line(s)
>>  indicates that the client must have provided a certificate that verified.
>>  --> For this case (and #3), the server admin. is willing to "trust" the client's
>>      IP address enough to apply exports(5) rules to it.
>>  --> This is the case where a floating (no fixed IP) laptop could have a
>>      certificate signed by a site local CA.
>> 3 - Similar to #2, but uses the "-tls" option on the exports(5) line(s).
>>  In this case, the client must use TLS so that data is encrypted on the wire,
>>  but does not need to have a certificate.
>>  --> The NFS server admin. "trusts" the client IP address enough that they
>>      are willing to allow the client to mount based on that IP, but wants the
>>      data to be encrypted on the wire.
>>      - Avoids the bother of generating certificates for the client(s).
>>
>> A part of this (as discussed in the IETF draft) is to make this easy enough 
>> to
>> use that it does get used. (sec=krb5p achieves both user authentication and
>> data encryption on the wire, but does not get widely used, due to the need
>> to run a KDC, etc).
>>
>> Comments on the above options is welcome, since this does need to be
>> reviewed by users.
>
>Maybe I'm missing the option where the cert needs to be authenticated,
>but matching against IP/dns name does not need to be done.  Or is this
>a case of #2.  I'm just confused by the first point of #2 in that the
>server admin is willing to trust the IP address...
Yes, that is #2. "trust the IP address" probably wasn't a good way to
express it. I was simply trying to say "the NFS server admin. is willing to
use the client IP address to select rules via the exports(5) file".

>I'd like to see where CN or other field is freeform/provided by exports
>entry, and validated to gain access to those resources...  i.e. it
>doesn't matter what IP or DNS name the client is, it's based solely on
>the certificate.  This would allow roaming users..  This may be
>addressed by #

Re: TLS certificates for NFS-over-TLS floating client

2020-03-23 Thread Rick Macklem
Alexander Leidinger wrote:
John-Mark Gurney  wrote:
>>Rick Macklem wrote:
>>> to be the best solution. The server can verify that the certificate was
>>> issued by the local CA. Unfortunately, if the client is compromised and
>>> the certificate is copied to another client, that client would gain access.
>>
>> This is why CRLs/OCSP are necessary, but there isn't anything you can do
>> to prevent that.  This is both a better situation than what we have
>> today (no auth/confidentiality), and if you tie the cert to an IP, it's
>> of limited use, and better than today...
>
>There are multiple ways to handle that:
>  - First of all, you can just validate based upon "cert is signed by  
>trusted CA".
>  - Then you can require that the CN matches the hostname and the  
>reverse lookup matches.
>  - Or (similar to browsers today) you can ignore the CN and require
>that the hostname of the client matches one of the subject
>alternative name (SAN) entries (requires reverse DNS lookup to work
>and match).
At this point, I have three variants in the code (strictest to least strict):
1 - A "-h" command line option on the server handshake daemon (called rpctlssd).
 This requires that all clients have
 certificates that validate and have the FQDN acquired via reverse DNS of
 the IP address of the client for the TCP connection
 (getnameinfo(NI_NAMEREQD))
 in either the subjectAltName or CommonName. (I call X509_check_host()
 and this is what I understand it checks.)
 --> This case implies that the NFS server admin. does not "trust" the
 client's IP address enough to apply exports(5) line(s) to it to
 decide to allow the client to do an NFS mount.
 (An NFS server *might* be willing to allow client(s) to mount via any
  IP address for the #2 case below and not use this option.)
2 - Without "-h" the rpctlssd daemon passes flags into the kernel to indicate
 if the client provided a certificate and whether or not it verified.
 Then the "-tlscert" option on the appropriate exports(5) line(s)
 indicates that the client must have provided a certificate that verified.
 --> For this case (and #3), the server admin. is willing to "trust" the
 client's IP address enough to apply exports(5) rules to it.
 --> This is the case where a floating (no fixed IP) laptop could have a
   certificate signed by a site local CA.
3 - Similar to #2, but uses the "-tls" option on the exports(5) line(s).
 In this case, the client must use TLS so that data is encrypted on the
 wire, but does not need to have a certificate.
 --> The NFS server admin. "trusts" the client IP address enough that they
   are willing to allow the client to mount based on that IP, but wants
   the data to be encrypted on the wire.
   - Avoids the bother of generating certificates for the client(s).

A part of this (as discussed in the IETF draft) is to make this easy enough to
use that it does get used. (sec=krb5p achieves both user authentication and
data encryption on the wire, but does not get widely used, due to the need
to run a KDC, etc).

Comments on the above options are welcome, since this does need to be
reviewed by users.
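
For anyone reviewing, the three variants can be summarized as one small
decision function. This is only a sketch in Python, not the daemon or kernel
code: ClientTLSState, mount_allowed() and the "tlscert"/"tls" strings are
names made up for illustration, and treating an export line with no TLS
option as "no TLS requirement" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ClientTLSState:
    """Hypothetical summary of one client's handshake (illustrative names)."""
    used_tls: bool        # the client completed a TLS handshake
    has_cert: bool        # the client presented a certificate
    cert_verified: bool   # the certificate chained to a trusted CA
    host_matched: bool    # the reverse-DNS FQDN was found in the certificate


def mount_allowed(state: ClientTLSState, daemon_h: bool, export_opt: str) -> bool:
    """Return True if the mount would pass under the given policy.

    daemon_h   -- True when the handshake daemon runs with "-h" (variant 1)
    export_opt -- "tlscert" (variant 2), "tls" (variant 3) or "" (assumed:
                  no TLS requirement on that exports line)
    """
    if daemon_h:
        # Variant 1: every client needs a verified certificate whose name
        # matches the FQDN obtained by reverse DNS of its IP address.
        ok = (state.used_tls and state.has_cert
              and state.cert_verified and state.host_matched)
        if not ok:
            return False
    if export_opt == "tlscert":
        # Variant 2: a certificate that verified, no name check.
        return state.used_tls and state.has_cert and state.cert_verified
    if export_opt == "tls":
        # Variant 3: encryption on the wire only.
        return state.used_tls
    return True
```

With this model, a floating laptop holding a certificate from the site local
CA passes "-tlscert" without "-h", but fails as soon as "-h" is set, because
its reverse DNS name is not in the certificate.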

rick

 
At this point you prevent simple cert theft as additionally you  
require that someone also has to be able to modify the DNS (or NFS  
server hosts file, but then he probably can already add an additional  
cert to the truststore of nfsd).

Additional protection is possible by also adding the IP to the SAN. I  
haven't put an effort into evaluating if either IP or DNS is enough  
from a security point of view, or if it makes a difference if the "IP  
in SAN" case has an additional benefit in terms of security if it is  
in addition to the reverse DNS requirement.
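
To make the name check concrete, here is a rough Python sketch of the
comparison being discussed. It is not what X509_check_host() does internally:
it only handles exact names (no wildcards), and the rule "use the CN only
when there are no DNS entries in the SAN" is my understanding of the default
behaviour, so treat it as an assumption. cert_matches_host() and its
parameters are hypothetical names.

```python
def cert_matches_host(hostname, san_dns, common_name=None):
    """Exact, case-insensitive match of hostname against certificate names.

    hostname    -- FQDN obtained by reverse DNS of the client's IP address
    san_dns     -- list of dNSName entries from subjectAltName (may be empty)
    common_name -- CN from the subject, consulted only when san_dns is empty
    """
    wanted = hostname.rstrip(".").lower()
    if san_dns:
        # SAN entries are present, so the CN is not considered at all.
        return any(wanted == name.rstrip(".").lower() for name in san_dns)
    if common_name is None:
        return False
    return wanted == common_name.rstrip(".").lower()
```

A stolen certificate then only helps an attacker who can also make the
reverse lookup of his address return one of the names in the certificate.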

Yes this makes it more inconvenient to change the IP of a host, but if  
all the policy possibilities are up to the admin to configure (e.g.  
"cert is signed by trusted CA" as a default, and all the other  
possibilities as an option for nfsd), it is up to the admin and the  
security requirement.

In general, all the possibilities can either be distinct, or
cumulative. There is also the possibility that you do not go with
any CA but configure X self-signed certs for X clients as being  
trusted and the cert of the client needs to be an exact match with one  
of the X self-signed certs in the truststore.

Whatever the policy(/ies) is/are in nfsd, I suggest to make it  
explicit in the docs what is required and what is checked for each  
policy. And even if it may be obvious for you Rick, please also print  
out why a client was rejected. Unfortunately I've seen a lot of cases  
where the error

Re: TLS certificates for NFS-over-TLS floating client

2020-03-20 Thread Rick Macklem
Miroslav Lachman wrote:
>Rick Macklem wrote on 2020/03/19 03:09:
>> Miroslav Lachman wrote:
>>>
>> [...]
>
>>> NFS (or any other server) should check the list of revoked certificates
>>> too. Otherwise you will not be able to deny access to a user whom you no
>>> longer want to have access.
>> Yes, good point.
>> I won't claim to understand this stuff, but from what I can see, all that is
>> done is the CRL is appended to the CAfile (the one with the CA certificates
>> in it that is used for certificate verification via
>> SSL_CTX_load_verify_locations()).
>> (https://raymii.org/s/articles/OpenSSL_manually_verify_a_certificate_against_a_CRL.html
>> shows a CAfile and CRLfile being concatenated and then used to verify a
>> certificate.)
>>
>> There is code in sendmail that loads a CRL file separately, but it seems to
>> just put it in the X509 store returned by SSL_CTX_get_cert_store(), which
>> is the one where the CAfile certificates are stored via
>> SSL_CTX_load_verify_locations(), I think?
>> (It just seems easier to append it to CAfile than do this. The sendmail
>>   code uses poorly documented functions where the man page says
>>   "SSL_CTX_load_verify_locations()" normally takes care of this.)
>>
>> Does this sound right? rick
>
>I think it would be better to have it in a separate file as Apache does
>https://httpd.apache.org/docs/current/mod/mod_ssl.html#sslcarevocationfile
>
>Seems more convenient to have CA file write protected (read only) and
>then separate file for list of revoked client certificates, maybe
>somewhere else than CA certificate.
Done. (Actually, the SSL_CTX_load_verify_locations() failed when the CRL was
appended to the CAfile, so I needed to use a separate file to get it working.)

I found X509_load_crl_file(), which does all the glop in sendmail's tls.c file
to do it. (And it looks like the sendmail code only handles a CRL file
with a single entry in it.)
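
For readers who want to try the "separate CRL file" layout without writing C,
here is a minimal sketch using Python's ssl module as a stand-in for the
OpenSSL calls above. It is not the rpctlssd code; make_server_context() and
its arguments are hypothetical names, and the file paths are whatever the
admin. chooses.

```python
import ssl


def make_server_context(cafile=None, capath=None, crlfile=None):
    """Build a server-side context that demands a client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.verify_mode = ssl.CERT_REQUIRED   # the client must present a certificate
    if cafile or capath:
        # Same idea as SSL_CTX_load_verify_locations(ctx, CAfile, CApath).
        ctx.load_verify_locations(cafile=cafile, capath=capath)
    if crlfile:
        # The CRL stays in its own PEM file, so the CA file can be read-only.
        ctx.load_verify_locations(cafile=crlfile)
        # Without this flag a loaded CRL is not consulted during verification.
        ctx.verify_flags |= ssl.VERIFY_CRL_CHECK_LEAF
    return ctx
```

Note that once VERIFY_CRL_CHECK_LEAF is set, verification fails for any
client certificate whose issuer has no CRL loaded, so the flag is only
turned on when a CRL file is actually given.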

Thanks for the comments, rick

Kind regards
Miroslav Lachman
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: TLS certificates for NFS-over-TLS floating client

2020-03-20 Thread Rick Macklem
Jan Bramkamp wrote:
>On 20.03.20 02:44, Russell L. Carter wrote:
>> Here I commit heresy, by A) top posting, and B) by just saying, why
>> not make it easy, first, to tunnel NFSv4 sessions through
>> e.g. net/wireguard or sysutils/spiped?  NFS is point to point.
>> Security infrastructure that actually works understands the shared
>> secret model.
>
>Why not use IPsec in transport mode instead of a tunnel? It avoids
>unnecessary overhead and is already implemented in the kernel. It should
>be enough to "just" require IPsec for TCP port 2049 and run a suitable
>key exchange daemon.
I think the problem with these suggestions is interoperability.
The draft (that should soon become an RFC) describes use of RPC-over-TLS
and since the authors are both Linux NFS developers, I expect Linux to
implement this someday.
Once the Linux client can do it, the NFS server vendors will implement it.

NFS isn't great, but it is supported by a variety of vendors/systems and I
see that as one of its main features.

rick



Re: TLS certificates for NFS-over-TLS floating client

2020-03-19 Thread Rick Macklem
John-Mark Gurney wrote:
>Rick Macklem wrote this message on Wed, Mar 04, 2020 at 03:15 +0000:
>> I am slowly trying to understand TLS certificates and am trying to figure
>> out how to do the following:
>> -> For an /etc/exports file with...
>> /home -tls -network 192.168.1.0 -mask 255.255.255.0
>> /home -tlscert
>
>Are you looking at implementing draft-cel-nfsv4-rpc-tls?
Yes. The code, about 2 weeks out of date (I can only do commits once in a
while these days), can be found in FreeBSD's subversion under
base/projects/nfs-over-tls.

>> This syntax isn't implemented yet, but the thinking is that clients on the
>> 192.168.1 subnet would use TLS, but would not require a certificate.
>> For access from anywhere else, the client(s) would be required to have a
>> certificate.
>>
>> A typical client mounting from outside of the subnet might be my laptop,
>> which is using wifi and has no fixed IP/DNS name.
>> --> How do you create a certificate that the laptop can use, which the NFS
>>server can trust enough to allow the mount?
>> My thinking is that a "secret" value can be put in the certificate that the
>> NFS server can check for.
>> The simplest way would be a fairly long list of random characters in the
>> organizationName and/or organizationUnitName field(s) of the subject name.
>> Alternately, it could be a newly defined extension for X509v3, I think?
>>
>> Now, I'm not sure, but I don't think this certificate can be created via
>> a trust authority such that it would "verify". However, the server can
>> look for the "secret" in the certificate and allow the mount based on that.
>>
>> Does this sound reasonable?
>
>Without a problem statement or what you're trying to accomplish, it's
>hard to say if it is.
The problem I was/am trying to solve is how NFS clients without a
fixed IP/DNS name could have a certificate to allow access to the NFS server.
As suggested by others, having a site local CA created by the NFS admin. seemed
to be the best solution. The server can verify that the certificate was issued
by the local CA. Unfortunately, if the client is compromised and the certificate
is copied to another client, that client would gain access.
--> I've thought of having the client keep the certificate encrypted in a file
   and require the "user" of the client to type in a passphrase to decrypt the
   certificate so that it can be used by the daemon in the client that handles
   the client side of the TLS handshake, but I have not implemented this.
   --> This would at least thwart the simple case of the certificate file
  being copied to a different client and being used to mount the NFS
  server, but if the client is compromised, then the passphrase could
  be captured and...

>> Also, even if the NFS client/server have fixed IP addresses with well known
>> DNS names, it isn't obvious to me how signed certificates can be acquired
>> for them?
>> (Lets Encrypt expects the Acme protocol to work and that seems to be
>>  web site/http specific?)
>
>There are DNS challenges that can be used.  I use them to obtain certs
>for SMTP and SIP servers...  using nsupdate, this is relatively easy to
>automate pushing the challenges to a DNS server, and I now use DNS
>challenges for everything, including https.
Since my internet connection is a single dynamically assigned IP from the phone
company, I doubt this would work for me (which is why I say I don't know how
to do this). I suspect there are ways and it would be nice if you could document
this, so I can put it in a howto document.
- An actual example using the nsupdate command would be nice.
Thanks, rick

> Thanks for any help with this, rick

Let me know if you'd like to hop on a call about this.

--
  John-Mark Gurney  Voice: +1 415 225 5579

 "All that I will do, has been done, All that I have, has not."


Re: TLS certificates for NFS-over-TLS floating client

2020-03-18 Thread Rick Macklem
Miroslav Lachman wrote:
>Hiroki Sato wrote on 2020/03/04 05:35:
>
[...]
>
>>   I do not think it is a good idea to use a certificate with an
>>   embedded secret for authentication and/or authorization.
>>
>>   In the case that the client offers a certificate upon establishing a
>>   TLS connection for authentication purpose, the authenticity will be
>>   checked on the server usually by using the CA certificate which was
>>   used to issue the client certificate.  The CA cert must be put to
>>   somewhere the NFS server can read.
>>
>>   The CA private key is secret.  So if the NFS server can check the client
>>   certificate by the CA certificate, it means the NFS server can
>>   trust the client.  I think no additional information is required.
>
>NFS (or any other server) should check the list of revoked certificates too.
>Otherwise you will not be able to deny access to a user whom you no
>longer want to have access.
Yes, good point.
I won't claim to understand this stuff, but from what I can see, all that is
done is the CRL is appended to the CAfile (the one with the CA certificates
in it that is used for certificate verification via
SSL_CTX_load_verify_locations()).
(https://raymii.org/s/articles/OpenSSL_manually_verify_a_certificate_against_a_CRL.html
shows a CAfile and CRLfile being concatenated and then used to verify a
certificate.)

There is code in sendmail that loads a CRL file separately, but it seems to
just put it in the X509 store returned by SSL_CTX_get_cert_store(), which
is the one where the CAfile certificates are stored via
SSL_CTX_load_verify_locations(), I think?
(It just seems easier to append it to CAfile than do this. The sendmail code
 uses poorly documented functions where the man page says
 "SSL_CTX_load_verify_locations()" normally takes care of this.)

Does this sound right? rick

>   Authorization such as which mount point can be mounted by using the
>   client cert can be implemented by using the CN field or other X.509
>   attributes, of course.  It can be just a clear text.
>
>   I think this is one of the most reliable and straightforward ways
>   because in most cases both NFS servers and the clients are under the
>   sysadmin's control.
>
> rm> Now, I'm not sure, but I don't think this certificate can be created via
> rm> a trust authority such that it would "verify". However, the server can
> rm> look for the "secret" in the certificate and allow the mount based on that.
>
>   In the way described above, to use TLS client authentication, the NFS
>   server admin has to have a certificate which allows to sign another
>   certificate.  This means that the admin must be a CA or trusted
>   authority.
>
>   In practice, one can generate a self-signed certificate by using
>   openssl(1) and use it as its CA certificate.  He can issue
>   certificates signed by it for the NFS clients, and put his CA
>   certificate to somewhere the NFS server can read.

Take a look at easy-rsa
https://www.freshports.org/security/easy-rsa/

It is used, for example, by OpenVPN to create a private CA and sign
certificates of clients. It is a good starting point for understanding what
can work and how.

> rm> Also, even if the NFS client/server have fixed IP addresses with well known
> rm> DNS names, it isn't obvious to me how signed certificates can be acquired
> rm> for them?
> rm> (Lets Encrypt expects the Acme protocol to work and that seems to be
> rm>  web site/http specific?)
>
>   TLS certificates do not always have (or do not need to have) a domain
>   name as an attribute.  Data in attributes are restricted depending on
>   the purpose, so certificates issued by Let's Encrypt can have only
>   domain names (CN or Subject Alternative Name), for example.  An
>   example which is not supported by Let's Encrypt is a certificate for
>   S/MIME email encryption which has an email address.

Kind regards
Miroslav Lachman


Re: when does a server need to use SSL_CTX_set_client_CA_list()?

2020-03-16 Thread Rick Macklem
Alexander Leidinger wrote:
>Quoting Rick Macklem  (from Sun, 15 Mar 2020  
>23:27:58 +):
>
>> As such, it still seems to be a bit of a mystery to me, but it seems that
>> putting all the certificates in a CAfile and not using a CApath directory
>> is the simpler way to go.
>
>If you have multiple CAs in the file, the code needs to search for one  
>which matches. If you use the path, the code just needs to list the  
>directory and check the filename which matches the id of the CA-cert.  
>On a recent -current system where you've never run "certctl
>rehash", have a look into /etc/ssl/certs, then run "certctl rehash",
>and then check /etc/ssl/certs again to see what I mean.
>
>For a program which communicates with a lot of different systems which  
>use different CAs (mailserver, browser), the path makes sense. For a  
>NFS server I wouldn't configure all the Mozilla-accepted CAs. As such  
>a CAfile may be enough, but having the possibility for both allows the  
>user to choose which way he wants to configure his system (e.g. maybe
>he has just one CA in a directory, but for consistency reasons he  
>prefers to specify the path to be able to use one way to configure  
>things).
>
>You can do it either way, technically it doesn't matter. It makes  
>sense to have both possibilities (that would be my preference, to give  
>the user the choice which way he wants to handle it). Having only the  
>file-way would not be stupid (as you can see with wpa and unbound,  
>which are used in a similar way in this regard to how one would use
>NFS). Only the path-way would be less favorable in my opinion.
Well, I can easily provide command line options for both CAfile and CApath.
The part that confuses me is that only CAfile gets used for:
SSL_CTX_set_client_CA_list(ctx, SSL_load_client_CA_file(CAfile))
in the examples I've found, so the CA list that goes to the client doesn't seem
to get set for the CApath case?
As such, there does seem to be a technical difference between using CAfile and
CApath.

And Garrett seems to indicate SSL_CTX_set_client_CA_list() should always be
done.

Note that NFS will often (not always, that's a decision for the NFS admin) want
certificates from clients (something that a web server doesn't normally do).

For now, I'll just provide both command line arguments, but note in the man
page that SSL_CTX_set_client_CA_list() is only done for CAfile.

Thanks for your comments, rick

> I haven't yet decided whether or not I'll specify a command option for
> setting CApath. Sendmail does. wpa and unbound don't?

Sendmail needs to use more than one CA if it wants to validate  
connections from anyone, and it wants to do it in a performant way.  
WIFI and DNS typically only need one CA.

Bye,
Alexander.

-- 
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: when does a server need to use SSL_CTX_set_client_CA_list()?

2020-03-15 Thread Rick Macklem
Ronald Klop wrote:
>On Sat, 14 Mar 2020 02:28:22 +0100, Rick Macklem 
>wrote:
>
>> Hi,
>>
>> Since it is done in sample code, I have an option in the RPC-over-TLS
>> server daemon that does the SSL_CTX_set_client_CA_list() call.
>> When I test, I have not used this option and the code seems to work.
>> Maybe this is because the client only has a single certificate?
>>
>> Here's the lame description I have in the man page for the option:
>> .It Fl C Ar client_cafile
>> If this option is specified, the server calls
>> .Dq SSL_CTX_set_client_CA_list(ctx,SSL_load_client_CA_file(``client_cafile''))
>> during TLS context configuration.
>> I do not know when this is needed, but it appears to be required for
>> certain TLS configurations.
>>
>> Does someone know when this call is needed?
>> Can you explain it? (Just about anything is better than the above;-)
>>
>
>
>grep -r SSL_CTX_set_client_CA_list /usr/src/* gives a couple of matches
>(sendmail, wpa & unbound). Maybe that source gives a hint.
Good point. I had looked at the s_server in openssl, but not the others.
It looks like wpa and unbound do what I was thinking of and use the
CAfile argument for both SSL_CTX_load_verify_locations() and
SSL_CTX_set_client_CA_list(SSL_load_client_CA_file()), setting CApath NULL
for SSL_CTX_load_verify_locations().

Sendmail and the s_server.c in openssl pass both CAfile and CApath arguments
to SSL_CTX_load_verify_locations() and then uses the CAfile argument for
SSL_CTX_set_client_CA_list(SSL_load_client_CA_file()).
This means that SSL_CTX_set_client_CA_list() was only called for the CAfile
case and not the CApath case. (The SSL_CTX_load_verify_locations() man page
notes that the certificates in CApath are only loaded when verification is
being done and only when a certificate is not found in CAfile, but that
doesn't seem to answer when/if CApath gets used. It is a directory of CA
files, but why do it that way instead of putting them all in a single CAfile?)

As such, it still seems to be a bit of a mystery to me, but it seems that
putting all the certificates in a CAfile and not using a CApath directory is
the simpler way to go.

I haven't yet decided whether or not I'll specify a command option for setting
CApath. Sendmail does. wpa and unbound don't?

Thanks for the suggestion, rick

Regards,

Ronald.


> Thanks, rick


Re: when does a server need to use SSL_CTX_set_client_CA_list()?

2020-03-14 Thread Rick Macklem
Garrett Wollman wrote:
>Rick Macklem writes:
>>Since it is done in sample code, I have an option in the RPC-over-TLS
>>server daemon that does the SSL_CTX_set_client_CA_list() call.
>>When I test, I have not used this option and the code seems to work.
>>Maybe this is because the client only has a single certificate?
>
>In general, the server needs to send a list of CAs that it's willing
>to accept for client certificate use, because the server should never
>accept just any old CA; normally, a client will interpret receiving
>the list as a request to send a client certificate issued by one of
>the indicated CAs, but the client can send its certificate even if the
>server doesn't send the list or even if the server sends a list but
>the client certificate isn't issued by a CA on the list.
>
>It's probably a good idea to send the list even if there's only a
>single valid CA, configured by prior agreement; the overhead is
>minimal and it gives an indication to a fussy or confused client what
>is being required of it.
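
The client-side behaviour described above can be sketched in a few lines of
Python. This is only an illustration of the selection logic, not how any TLS
library implements it; choose_client_cert() and the (subject, issuer) tuples
are hypothetical.

```python
def choose_client_cert(advertised_cas, client_certs):
    """Pick the certificate a client might send in the handshake.

    advertised_cas -- CA names the server sent in its request (may be empty)
    client_certs   -- list of (subject, issuer) tuples the client holds
    """
    # Prefer a certificate issued by one of the CAs the server asked for.
    for cert in client_certs:
        if cert[1] in advertised_cas:
            return cert
    # No list, or nothing on it matches: the client may still offer
    # whatever it has, and the server decides whether to accept it.
    return client_certs[0] if client_certs else None
```

With a single client certificate, the result is the same whether or not the
server sends the list, which would explain why testing works without it.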
Ok, so does SSL_CTX_load_verify_locations() set up the server to verify
the certificates and SSL_CTX_set_client_CA_list() set the list of certificate
names sent to the client?

Put another way, should the server normally:
SSL_CTX_load_verify_locations(ctx, cafile, NULL);
and
SSL_CTX_set_client_CA_list(ctx, SSL_load_client_CA_file(cafile));
where cafile is the file with the CA certificates in it?

I currently have the server setting these via separate options and only do the
first one.
If they both use the same file, then I can simplify things and get rid of one of
the options.

Thanks for your help with this, rick

>My recollection is that in the OpenSSL API in particular, if you don't
>set an explicit client CA list, but you *do* set a CA bundle or
>directory to automatically construct the *server's* trust path, then
>the library will just send the name of every single CA it knows about.

-GAWollman



when does a server need to use SSL_CTX_set_client_CA_list()?

2020-03-13 Thread Rick Macklem
Hi,

Since it is done in sample code, I have an option in the RPC-over-TLS
server daemon that does the SSL_CTX_set_client_CA_list() call.
When I test, I have not used this option and the code seems to work.
Maybe this is because the client only has a single certificate?

Here's the lame description I have in the man page for the option:
.It Fl C Ar client_cafile
If this option is specified, the server calls
.Dq SSL_CTX_set_client_CA_list(ctx,SSL_load_client_CA_file(``client_cafile''))
during TLS context configuration.
I do not know when this is needed, but it appears to be required for
certain TLS configurations.

Does someone know when this call is needed?
Can you explain it? (Just about anything is better than the above;-)

Thanks, rick


Re: TLS certificates for NFS-over-TLS floating client

2020-03-05 Thread Rick Macklem
Rick Macklem wrote:
>Benjamin Kaduk wrote:
>>Rick Macklem wrote:
[stuff snipped]
>>> A typical client mounting from outside of the subnet might be my laptop,
>>> which is using wifi and has no fixed IP/DNS name.
>>> --> How do you create a certificate that the laptop can use, which the NFS
>>>server can trust enough to allow the mount?
>>
>>You can give your laptop a certificate for an arbitrary name, provided that
>>the NFS server knows to "validate" that name in an appropriate fashion.  (I
>>don't remember what draft-ietf-nfsv4-rpc-tls says about this validation.)
The draft seems to just refer to RFC5280 and it seems to allow a local CA
(Page 12, 2nd (a) section on page). It does not seem to specify details w.r.t.
validation beyond that.

>As you note below, creating a site local CA is probably appropriate and the
>server should be able to check that the certificates were signed by this.
>(I haven't quite figured out how to do this yet. I think I've created the CA
>and used it to sign a client certificate, but haven't yet gotten the server
>daemon to verify it. (Playing with SSL_CTX_load_verify_locations() to try to
>get it to work.;-)
Just fyi, I got this working. My mistake yesterday was that I created a
certificate for the client that had a SubjectName identical to the IssuerName.
It happened because I test on one machine (client and server) and I used the
hostname as the CN.
--> This resulted in SSL_get_verify_result() returning "self-signed".

Once I created a client certificate with a different CN in the SubjectName, it
validated ok.
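
In case it helps someone else who hits the same thing, here is a simplified,
name-based model of why identical names end up as "self-signed". It is a
Python sketch of the idea only: real verification also checks signatures and
extensions, and verify_leaf(), the dict layout and the "id" fingerprint field
are all made up for illustration.

```python
def verify_leaf(cert, trusted):
    """Name-based model of verifying a client (leaf) certificate.

    cert    -- dict with "subject", "issuer" and "id" (hypothetical fingerprint)
    trusted -- list of CA certificate dicts of the same shape
    """
    if cert["subject"] == cert["issuer"]:
        # The names claim the certificate issued itself, so it is treated
        # as its own root and must itself be in the trusted set.
        if any(ca["id"] == cert["id"] for ca in trusted):
            return "ok"
        return "self-signed"
    # Otherwise look for the issuer among the trusted CA certificates.
    if any(ca["subject"] == cert["issuer"] for ca in trusted):
        return "ok"
    return "unknown issuer"
```

A client certificate with the hostname as CN, signed by a CA that also uses
the hostname as CN, takes the first branch and never gets as far as looking
up the CA.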

Thanks everyone for your help so far, rick

>> My thinking is that a "secret" value can be put in the certificate that the
>> NFS server can check for.
>> The simplest way would be a fairly long list of random characters in the
>> organizationName and/or organizationUnitName field(s) of the subject name.
>> Alternately, it could be a newly defined extension for X509v3, I think?
>
>It would be better to just make a site-local CA and trust everything it
>issues (which, admittedly, is not the greatest option itself.)
I had thought this would be too much work, but it seems fairly straightforward,
so this is what I am now working on.

>> Now, I'm not sure, but I don't think this certificate can be created via
>> a trust authority such that it would "verify". However, the server can
>> look for the "secret" in the certificate and allow the mount based on that.
>>
>> Does this sound reasonable?
>
>I'm not sure what goal you're trying to achieve by this "security through
>obscurity".
Yes. I now see it is the CA stuff that can stay secret.

>> Also, even if the NFS client/server have fixed IP addresses with well known
>> DNS names, it isn't obvious to me how signed certificates can be acquired
>> for them?
>> (Lets Encrypt expects the Acme protocol to work and that seems to be
>>  web site/http specific?)
>
>RFC 8738 specifies the ACME protocol for validating IP addresses.
I had looked at an older RFC, where it seemed to be web site specific.
Since none of my stuff has fixed well known DNS names, I'm not going to
worry about using an established CA for now.

Thanks to everyone for their comments.
I may respond to some of the other posts, but I'm figuring things out for now.

rick

-Ben


Re: TLS certificates for NFS-over-TLS floating client

2020-03-05 Thread Rick Macklem
Benjamin Kaduk wrote:
>On Wed, Mar 04, 2020 at 03:15:48AM +0000, Rick Macklem wrote:
>> Hi,
>>
>> I am slowly trying to understand TLS certificates and am trying to figure
>> out how to do the following:
>> -> For an /etc/exports file with...
>> /home -tls -network 192.168.1.0 -mask 255.255.255.0
>> /home -tlscert
>>
>> This syntax isn't implemented yet, but the thinking is that clients on the
>> 192.168.1 subnet would use TLS, but would not require a certificate.
>> For access from anywhere else, the client(s) would be required to have a
>> certificate.
>
>My gut reaction: that doesn't sound like a good idea.
Yep, I thought that the stuff in the certificate was encrypted in a way that the
client couldn't see it. I now see that isn't the case.

>Trusting the local network to be secure is pretty risky, in general.
Well, for my personal case, the subnet has a few machines plugged into it
around my desk and wifi isn't enabled on the modem/NAT gateway, so
I'm fairly confident the local machines are ok.
To be honest, I don't need encryption on the wire, but since the phone
company uses Huawei technology, I could see some wanting the data
encrypted on the wire, if the data were sensitive.
This case is meant to be easy to do, since the clients don't have to have
certificates.

I am trying to provide whatever people might need/want when I implement
this. The rest is obviously up to them.

>> A typical client mounting from outside of the subnet might be my laptop,
>> which is using wifi and has no fixed IP/DNS name.
>> --> How do you create a certificate that the laptop can use, which the NFS
>>     server can trust enough to allow the mount?
>
>You can give your laptop a certificate for an arbitrary name, provided that
>the NFS server knows to "validate" that name in an appropriate fashion.  (I
>don't remember what draft-ietf-nfsv4-rpc-tls says about this validation.)
As you note below, creating a site local CA is probably appropriate and the
server should be able to check that the certificates were signed by this.
(I haven't quite figured out how to do this yet. I think I've created the CA
and used it to sign a client certificate, but haven't yet gotten the server
daemon to verify it. Playing with SSL_CTX_load_verify_locations() to try to
get it to work.;-)

>> My thinking is that a "secret" value can be put in the certificate that the
>> NFS server can check for.
>> The simplest way would be a fairly long list of random characters in the
>> organizationName and/or organizationUnitName field(s) of the subject name.
>> Alternately, it could be a newly defined extension for X509v3, I think?
>
>It would be better to just make a site-local CA and trust everything it
>issues (which, admittedly, is not the greatest option itself.)
I had thought this would be too much work, but it seems fairly straightforward,
so this is what I am now working on.

>> Now, I'm not sure, but I don't think this certificate can be created via
>> a trust authority such that it would "verify". However, the server can
>> look for the "secret" in the certificate and allow the mount based on that.
>>
>> Does this sound reasonable?
>
>I'm not sure what goal you're trying to achieve by this "security through
>obscurity".
Yes. I now see it is the CA stuff that can stay secret.

>> Also, even if the NFS client/server have fixed IP addresses with well known
>> DNS names, it isn't obvious to me how signed certificates can be acquired
>> for them?
>> (Let's Encrypt expects the ACME protocol to work and that seems to be
>>  web site/http specific?)
>
>RFC 8738 specifies the ACME protocol for validating IP addresses.
I had looked at an older RFC, where it seemed to be web site specific.
Since none of my stuff has fixed well known DNS names, I'm not going to
worry about using an established CA for now.

Thanks to everyone for their comments.
I may respond to some of the other posts, but I'm figuring things out for now.

rick

-Ben
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


TLS certificates for NFS-over-TLS floating client

2020-03-03 Thread Rick Macklem
Hi,

I am slowly trying to understand TLS certificates and am trying to figure
out how to do the following:
-> For an /etc/exports file with...
/home -tls -network 192.168.1.0 -mask 255.255.255.0
/home -tlscert

This syntax isn't implemented yet, but the thinking is that clients on the
192.168.1 subnet would use TLS, but would not require a certificate.
For access from anywhere else, the client(s) would be required to have a
certificate.
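As a rough illustration of the policy described above, the two export lines
can be read as a per-client decision. The following C sketch is not from any
mountd or nfsd code; the enum values and function names are hypothetical, it
handles IPv4 only, and the subnet is hard-coded from the example lines.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policy values; the names are illustrative only. */
enum tls_policy {
	POLICY_TLS_NO_CERT,	/* "-tls": TLS required, no client certificate */
	POLICY_TLS_CERT,	/* "-tlscert": TLS plus a client certificate */
	POLICY_BAD_ADDR		/* address string could not be parsed */
};

/* Return 1 if addr is inside net/mask; all values in host byte order. */
int
in_subnet(uint32_t addr, uint32_t net, uint32_t mask)
{
	return ((addr & mask) == (net & mask));
}

/* Decide which of the two example export lines applies to an IPv4 client. */
enum tls_policy
tls_policy_for(const char *client)
{
	struct in_addr a, net, mask;

	assert(client != NULL);
	if (inet_pton(AF_INET, client, &a) != 1)
		return (POLICY_BAD_ADDR);
	/* Values taken from the example /etc/exports lines. */
	inet_pton(AF_INET, "192.168.1.0", &net);
	inet_pton(AF_INET, "255.255.255.0", &mask);
	if (in_subnet(ntohl(a.s_addr), ntohl(net.s_addr), ntohl(mask.s_addr)))
		return (POLICY_TLS_NO_CERT);
	return (POLICY_TLS_CERT);
}
```

With this reading, a desk machine at 192.168.1.42 would fall under the "-tls"
line, while a laptop at any other address would fall under "-tlscert".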

A typical client mounting from outside of the subnet might be my laptop,
which is using wifi and has no fixed IP/DNS name.
--> How do you create a certificate that the laptop can use, which the NFS
   server can trust enough to allow the mount?
My thinking is that a "secret" value can be put in the certificate that the NFS
server can check for.
The simplest way would be a fairly long list of random characters in the
organizationName and/or organizationUnitName field(s) of the subject name.
Alternately, it could be a newly defined extension for X509v3, I think?

Now, I'm not sure, but I don't think this certificate can be created via
a trust authority such that it would "verify". However, the server can
look for the "secret" in the certificate and allow the mount based on that.

Does this sound reasonable?

Also, even if the NFS client/server have fixed IP addresses with well known
DNS names, it isn't obvious to me how signed certificates can be acquired
for them?
(Let's Encrypt expects the ACME protocol to work and that seems to be
 web site/http specific?)

Thanks for any help with this, rick

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: how to use the ktls

2020-02-03 Thread Rick Macklem
Benjamin Kaduk wrote:
>On Tue, Jan 28, 2020 at 11:01:31PM +0000, Rick Macklem wrote:
>> John Baldwin wrote:
>> [stuff snipped]
>> >I don't know yet. :-/  With the TOE-based TLS I had been testing with, this
>> >doesn't happen because the NIC blocks the data until it gets the key and
>> >then it's always available via KTLS.  With software-based KTLS for RX (which
>> >I'm going to start working on soon), this won't be the case and you will
>> >potentially have some data already read by OpenSSL that needs to be drained
>> >from OpenSSL before you can depend on KTLS.  It's probably only the first
>> >few messages, but I will need to figure out a way that you can tell how much
>> >pending data in userland you need to read via SSL_read() and then pass back
>> >into the kernel before relying on KTLS (it would just be a single chunk of
>> >data after SSL_connect you would have to do this for).
>> I think SSL_read() ends up calling ssl3_read_bytes(..APPLICATION..) and then
>> it throws away non-application data records. (Not sure, ssl3_read_bytes()
>> gets pretty convoluted at a glance.;-)
>
>Yes, SSL_read() interprets the TLS record type and only passes application
>data records through to the application.  It doesn't exactly "throw away"
>the other records, though -- they still get processed, just internally to
>libssl :)
>I expect based on heuristics that the 485 bytes are a NewSessionTicket
>message, but that actual length is very much not a protocol constant and is
>an implementation detail of the TLS server.  (That said, an openssl server
>is going to be producing the same length every time, for a given version of
>openssl, unless you configure it otherwise.)
Well, I looked at the data and it appears to be two application data records,
both of length 234. (These are in the receive queue before the other end does
an SSL_write() and the only data returned by SSL_read() is what a subsequent
SSL_write() has written.)

My hunch is that, once they are unencrypted, they are just padding.
Anyhow, since they are "application data" the receive side of KERN_TLS
should be able to handle them.
--> I don't think I need to do anything after the SSL_connect() in userland
  to deal with these.
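For anyone wanting to repeat this kind of inspection, here is a small C sketch
that walks the TLS record headers in a captured buffer. It relies only on the
standard record layout (1 byte content type, 2 bytes version, 2 bytes
big-endian length, with 23 meaning application data); the function and macro
names are made up for the sketch. If the 234 above is the value of the length
field, each record occupies 239 bytes with its header, and two of them add up
to the 478 bytes seen elsewhere in this thread. That is an inference, not
something confirmed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TLS_HDR_LEN		5	/* type(1) + version(2) + length(2) */
#define TLS_RT_APPLICATION_DATA	23	/* content type for application data */

/*
 * Walk the complete TLS records in buf and count them.  Returns the number
 * of bytes covered by complete records; *nrec gets the record count and
 * *all_appdata is cleared if any record has a different content type.
 */
size_t
tls_scan_records(const uint8_t *buf, size_t len, int *nrec, int *all_appdata)
{
	size_t off = 0, reclen;

	assert(buf != NULL && nrec != NULL && all_appdata != NULL);
	*nrec = 0;
	*all_appdata = 1;
	while (len - off >= TLS_HDR_LEN) {
		/* The length field is big-endian and excludes the header. */
		reclen = ((size_t)buf[off + 3] << 8) | buf[off + 4];
		if (len - off - TLS_HDR_LEN < reclen)
			break;		/* partial record, stop here */
		if (buf[off] != TLS_RT_APPLICATION_DATA)
			*all_appdata = 0;
		off += TLS_HDR_LEN + reclen;
		(*nrec)++;
	}
	return (off);
}
```

The sketch only looks at the outer headers; it cannot tell what is inside the
encrypted payloads.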

Thanks for your help, rick

-Ben
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: easy way to work around a lack of a direct map on i386

2020-01-31 Thread Rick Macklem
Thanks everyone. I should have waited a day, since jhb@ responded
w.r.t. using sf_bufs as well.
For now, we are sticking with a 64bit only solution, since work on the
receive side of KERN_TLS is more critical to getting this going.

rick


From: owner-freebsd-curr...@freebsd.org on behalf of Konstantin Belousov
Sent: Friday, January 31, 2020 7:31 AM
To: Hans Petter Selasky
Cc: Rick Macklem; freebsd-current@FreeBSD.org
Subject: Re: easy way to work around a lack of a direct map on i386

On Fri, Jan 31, 2020 at 10:13:58AM +0100, Hans Petter Selasky wrote:
> On 2020-01-31 00:37, Konstantin Belousov wrote:
> > On Thu, Jan 30, 2020 at 11:23:02PM +0000, Rick Macklem wrote:
> > > Hi,
> > >
> > > The current code for KERN_TLS uses PHYS_TO_DMAP()
> > > to access unmapped external pages on m_ext.ext_pgs
> > > mbufs.
> > > I also need to do this to implement RPC-over-TLS.
> > >
> > > The problem is that some arches, like i386, don't
> > > support PHYS_TO_DMAP().
> > >
> > > Since it appears that there will be at most 4 pages on
> > > one of these mbufs, my thinking was...
> > > - Acquire four pages of kva from the kernel_map during
> > >   booting.
> > > - Then just use pmap_qenter() to fill in the physical page
> > >   mappings for long enough to copy the data.
> > >
> > > Does this sound reasonable?
> > > Is there a better way?
> >
> > Use sfbufs, they should work on all arches.  In essence, they provide an MI
> > interface to DMAP where possible.  I do not remember whether I bumped the
> > limit for i386 after 4/4 went in.
> >
> > There are currently no limits on sfbufs use per subsystem, but I think it
> > is not very likely to cause too much trouble.  The main rule is to not sleep
> > waiting for more sfbufs if you already own one.
>
> In the DRM-KMS LinuxKPI we have:
>
> void *
> kmap(vm_page_t page)
> {
> #ifdef LINUXKPI_HAVE_DMAP
>     vm_offset_t daddr;
>
>     daddr = PHYS_TO_DMAP(VM_PAGE_TO_PHYS(page));
>
>     return ((void *)daddr);
> #else
>     struct sf_buf *sf;
>
>     sched_pin();
>     sf = sf_buf_alloc(page, SFB_NOWAIT | SFB_CPUPRIVATE);
>     if (sf == NULL) {
>         sched_unpin();
>         return (NULL);
>     }
>     return ((void *)sf_buf_kva(sf));
> #endif
> }
>
> void
> kunmap(vm_page_t page)
> {
> #ifdef LINUXKPI_HAVE_DMAP
>     /* NOP */
> #else
>     struct sf_buf *sf;
>
>     /* lookup SF buffer in list */
>     sf = sf_buf_alloc(page, SFB_NOWAIT | SFB_CPUPRIVATE);
>
>     /* double-free */
>     sf_buf_free(sf);
>     sf_buf_free(sf);
>
>     sched_unpin();
> #endif
> }
>
> I think that is the fastest way to do this.

So the kmap address is only valid on the CPU that called the function ?
This is strange, I was not able to find mention of it in references to
kmap.
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


easy way to work around a lack of a direct map on i386

2020-01-30 Thread Rick Macklem
Hi,

The current code for KERN_TLS uses PHYS_TO_DMAP()
to access unmapped external pages on m_ext.ext_pgs
mbufs.
I also need to do this to implement RPC-over-TLS.

The problem is that some arches, like i386, don't
support PHYS_TO_DMAP().

Since it appears that there will be at most 4 pages on
one of these mbufs, my thinking was...
- Acquire four pages of kva from the kernel_map during
  booting.
- Then just use pmap_qenter() to fill in the physical page
  mappings for long enough to copy the data.

Does this sound reasonable?
Is there a better way?

Thanks for your comments, rick
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: how to use the ktls

2020-01-28 Thread Rick Macklem
John Baldwin wrote:
[stuff snipped]
>I don't know yet. :-/  With the TOE-based TLS I had been testing with, this
>doesn't happen because the NIC blocks the data until it gets the key and then
>it's always available via KTLS.  With software-based KTLS for RX (which I'm
>going to start working on soon), this won't be the case and you will
>potentially have some data already read by OpenSSL that needs to be drained
>from OpenSSL before you can depend on KTLS.  It's probably only the first few
>messages, but I will need to figure out a way that you can tell how much
>pending data in userland you need to read via SSL_read() and then pass back
>into the kernel before relying on KTLS (it would just be a single chunk of
>data after SSL_connect you would have to do this for).
I think SSL_read() ends up calling ssl3_read_bytes(..APPLICATION..) and then it
throws away non-application data records. (Not sure, ssl3_read_bytes() gets
pretty convoluted at a glance.;-)

I've found another issue that should keep me amused for a while (this is
becoming an interesting little project;-).
The KERN_TLS needs unmapped pages on the mbuf chain, but that isn't what NFS
generates.
I think I'll have to implement some sort of copy function that creates mbufs
with unmapped pages and then maps them into kernel space for long enough that
the data can be copied, called just before sosend(). Most NFS RPC messages
will easily fit in one page.
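To put rough numbers on "fit in one page", here is a tiny C sketch of the
arithmetic. It is userland code, not kernel code, and both constants are
assumptions: a 4096 byte page, and at most 4 pages per unmapped-page mbuf (the
figure guessed at in the later i386 direct map thread). The function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE	4096	/* assumed page size */
#define SKETCH_PGS_PER_MBUF	4	/* assumed cap on pages per mbuf */

/* Number of pages needed to hold len bytes (rounding up). */
size_t
pages_for(size_t len)
{
	return ((len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE);
}

/* Number of unmapped-page mbufs needed under the assumed per-mbuf cap. */
size_t
mbufs_for(size_t len)
{
	size_t pgs = pages_for(len);

	return ((pgs + SKETCH_PGS_PER_MBUF - 1) / SKETCH_PGS_PER_MBUF);
}
```

Under these assumptions a 200 byte RPC needs one page in one mbuf, while a
64 Kbyte read reply needs 16 pages spread over 4 mbufs.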

Someday, the biggies like server read reply may be able to do what sendfile
does and put the read data in unmapped page mbufs, avoiding the long list of
mbuf clusters that VOP_READ() currently copies the data into.
--> But that's longer term than getting this to work.;-)

Thanks for all your help John, rick

> I'm currently testing with a kernel that doesn't have options KERN_TLS and
> (so long as I get rid of the 478 bytes), it then just does unencrypted RPCs.
>
> So, I guess the big question is can I get access to your WIP code for KTLS
> receive? (I have no idea if I can make progress on it, but I can't do a lot
> more before I have that.)

The WIP only works right now if you have a Chelsio T6 NIC as it uses the T6's
TCP offload engine to do TLS.  If you don't have that gear, ping me off-list.
It would also let you not worry about the SSL_read case for now for initial
testing.

--
John Baldwin
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: how to use the ktls

2020-01-27 Thread Rick Macklem
John Baldwin wrote:
>On 1/26/20 8:08 PM, Rick Macklem wrote:
>> John Baldwin wrote:
>> [stuff snipped]
>>> Hmmm, this might be a fair bit of work indeed.
>>>
>>> Right now KTLS only works for transmit (though I have some WIP for receive).
>>>
>>> KTLS assumes that the initial handshake and key negotiation are handled
>>> by OpenSSL.  OpenSSL uses custom setsockopt() calls to tell the kernel which
>>> session keys to use.
>>>
>>> I think what you would want to do is use something like SSL_connect() in
>>> userspace, and then check to see if KTLS "worked".  If it did, you can tell
>>> the kernel it can write to the socket directly, otherwise you will have to
>>> bounce data back out to userspace to run it through SSL_write() and have
>>> userspace do SSL_read() and then feed data into the kernel.
>>>
>>> The pseudo-code might look something like:
>>>
>>> SSL *s;
>>>
>>> s = SSL_new(...);
>>>
>>> /* fd is the existing TCP socket */
>>> SSL_set_fd(s, fd);
>>> SSL_connect(s);
>>> if (BIO_get_ktls_send(SSL_get_wbio(s))) {
>>>    /* Can use KTLS for transmit. */
>>> }
>>> if (BIO_get_ktls_recv(SSL_get_rbio(s))) {
>>>    /* Can use KTLS for receive. */
>>> }
>>
>> So, I've been making some progress. The first stab at the daemons that do the
>> handshake are now on svn in base/projects/nfs-over-tls/usr.sbin/rpctlscd and
>> rpctlssd.
>>
>> A couple of questions...
>> 1 - I haven't found BIO_get_ktls_send() or BIO_get_ktls_recv(). Are they in
>>     some different library?
>
>They only exist currently in OpenSSL master (which will be OpenSSL 3.0.0
>when it is released).  I have some not-yet-tested WIP changes to backport
>those changes into the base OpenSSL, but it will also add overhead to future
>OpenSSL imports perhaps, so it is something I need to work with secteam@ on
>to decide if it's viable once I have a tested PoC.
>
>I will try to at least provide a patch to the security/openssl port to add a
>KTLS option "soon" that you could use for testing.
John, I wouldn't worry much about this.
The calls are currently #ifdef notnow in the daemon and I'm fine with that.
SSL_connect() has returned 1, so the daemon knows that the handshake is
complete and the kernel code that did the upcall to the daemon can check for
KERN_TLS support.

>> 2 - After a successful SSL_connect(), the receive queue for the socket has
>>     478 bytes of stuff in it. SSL_read() seems to know how to skip over it,
>>     but I haven't figured out a good way to do this. (I currently just do a
>>     recv(..478,0) on the socket.)
>>     Any idea what to do with this? (Or will the receive side of the ktls
>>     figure out how to skip over it?)
>
>I don't know yet. :-/  With the TOE-based TLS I had been testing with, this
>doesn't happen because the NIC blocks the data until it gets the key and then
>it's always available via KTLS.  With software-based KTLS for RX (which I'm
>going to start working on soon), this won't be the case and you will
>potentially have some data already read by OpenSSL that needs to be drained
>from OpenSSL before you can depend on KTLS.  It's probably only the first few
>messages, but I will need to figure out a way that you can tell how much
>pending data in userland you need to read via SSL_read() and then pass back
>into the kernel before relying on KTLS (it would just be a single chunk of
>data after SSL_connect you would have to do this for).
Well, SSL_read() doesn't return these bytes. I think it just throws them away.

I have a simple test client/server where the client sends "HELLO THERE" to the
server and the server replies "GOODBYE" after the SSL_connect()/SSL_accept()
has been done.
--> If the "HELLO THERE"/"GOODBYE" is done with SSL_write()/SSL_read() it works.
however
--> If the above is done with send()/recv(), the server gets the "HELLO THERE",
    but the client gets 485 bytes of data, where the last 7 are "GOODBYE".
    --> If I do a recv( ..478..) in the client right after SSL_connect() it
        works ok.

I do this for testing, since it can then do the NFS mount (unencrypted).
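For this sort of test hack, a single recv() is not guaranteed to return the
full count, so a loop is a bit safer. The C sketch below discards exactly n
bytes from a socket; drain_bytes() is a made-up name for illustration and is
not code from the test program. The count itself (478 here) is specific to
this setup and should not be treated as a protocol constant.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Read and discard exactly n bytes from a connected socket.
 * Returns 0 on success, -1 on error or if the peer closes early.
 */
int
drain_bytes(int fd, size_t n)
{
	char scratch[512];
	ssize_t got;
	size_t want;

	assert(fd >= 0);
	while (n > 0) {
		want = n < sizeof(scratch) ? n : sizeof(scratch);
		got = recv(fd, scratch, want, 0);
		if (got < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, try again */
			return (-1);
		}
		if (got == 0)
			return (-1);	/* EOF before n bytes arrived */
		n -= (size_t)got;
	}
	return (0);
}
```

In the client this would be called as drain_bytes(fd, 478) right after
SSL_connect() returns, before the plain recv() that expects "GOODBYE".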

Looking inside SSL_read() I found:
    /*
     * If we are a client and haven't received the ServerHello etc then we
     * better do that
     */
    ossl_statem_check_finish_init(s, 0);

but all ossl_statem_check_finish_init(s, 0); seems to do is set a variab

Re: how to use the ktls

2020-01-26 Thread Rick Macklem
John Baldwin wrote:
[stuff snipped]
>Hmmm, this might be a fair bit of work indeed.
>
>Right now KTLS only works for transmit (though I have some WIP for receive).
>
>KTLS assumes that the initial handshake and key negotiation are handled by
>OpenSSL.  OpenSSL uses custom setsockopt() calls to tell the kernel which
>session keys to use.
>
>I think what you would want to do is use something like SSL_connect() in
>userspace, and then check to see if KTLS "worked".  If it did, you can tell
>the kernel it can write to the socket directly, otherwise you will have to
>bounce data back out to userspace to run it through SSL_write() and have
>userspace do SSL_read() and then feed data into the kernel.
>
>The pseudo-code might look something like:
>
>SSL *s;
>
>s = SSL_new(...);
>
>/* fd is the existing TCP socket */
>SSL_set_fd(s, fd);
>SSL_connect(s);
>if (BIO_get_ktls_send(SSL_get_wbio(s))) {
>   /* Can use KTLS for transmit. */
>}
>if (BIO_get_ktls_recv(SSL_get_rbio(s))) {
>   /* Can use KTLS for receive. */
>}

So, I've been making some progress. The first stab at the daemons that do the
handshake are now on svn in base/projects/nfs-over-tls/usr.sbin/rpctlscd and
rpctlssd.

A couple of questions...
1 - I haven't found BIO_get_ktls_send() or BIO_get_ktls_recv(). Are they in some
  different library?
2 - After a successful SSL_connect(), the receive queue for the socket has
    478 bytes of stuff in it. SSL_read() seems to know how to skip over it, but
    I haven't figured out a good way to do this. (I currently just do a
    recv(..478,0) on the socket.)
    Any idea what to do with this? (Or will the receive side of the ktls
    figure out how to skip over it?)

I'm currently testing with a kernel that doesn't have options KERN_TLS and
(so long as I get rid of the 478 bytes), it then just does unencrypted RPCs.

So, I guess the big question is can I get access to your WIP code for KTLS
receive? (I have no idea if I can make progress on it, but I can't do a lot more
before I have that.)

Oh, and for anyone out there...
What is the easiest freebie way to test signed certificates?
(I currently am using a self-signed certificate, but I need to test the "real"
 version at some point soon.)

Thanks, rick


--
John Baldwin
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: how to use the ktls

2020-01-13 Thread Rick Macklem
John Baldwin wrote:
>On 1/12/20 8:23 PM, Benjamin Kaduk wrote:
>> On Thu, Jan 09, 2020 at 10:53:38PM +0000, Rick Macklem wrote:
>>> John Baldwin wrote:
>>>> On 1/7/20 3:02 PM, Rick Macklem wrote:
>>>>> Hi,
>>>>>
>>>>> Now that I've completed NFSv4.2 I'm on to the next project, which is
>>>>> making NFS work over TLS.
>>>>> Of course, I know absolutely nothing about TLS, which will make this an
>>>>> interesting exercise for me.
>>>>> I did find simple server code in the OpenSSL doc. which at least gives me
>>>>> a starting point for the initialization stuff.
>>>>> As I understand it, this initialization must be done in userspace?
>>>>>
>>>>> Then somehow, the ktls takes over and does the encryption of the
>>>>> data being sent on the socket via sosend_generic(). Does that sound right?
>>>>>
>>>>> So, how does the kernel know the stuff that the initialization phase
>>>>> (handshake) figures out, or is it magic I don't have to worry about?
>>>>>
>>>>> Don't waste much time replying to this. A few quick hints will keep me
>>>>> going for now. (From what I've seen so far, this TLS stuff isn't simple.
>>>>> And I thought Kerberos was a pain.;-)
>>>>>
>>>>> Thanks in advance for any hints, rick
>>>>
>>>> Hmmm, this might be a fair bit of work indeed.
>>> If it was easy, it wouldn't be fun;-) FreeBSD13 is a ways off and if it
>>> doesn't make that, oh well..
>>>
>>>> Right now KTLS only works for transmit (though I have some WIP for
>>>> receive).
>>> Hopefully your WIP will make progress someday, or I might be able to work
>>> on it.
>>>
>>>> KTLS assumes that the initial handshake and key negotiation are handled
>>>> by OpenSSL.  OpenSSL uses custom setsockopt() calls to tell the kernel
>>>> which session keys to use.
>>> Yea, I figured I'd need a daemon like the gssd for this. The krpc makes it
>>> a little more fun, since it handles TCP connections in the kernel.
>>>
>>>> I think what you would want to do is use something like SSL_connect()
>>>> in userspace, and then check to see if KTLS "worked".
>>> Thanks (and for the code below). I found the simple server code in the
>>> OpenSSL doc, but the client code gets a web page and is quite involved.
>>>
>>>> If it did, you can tell
>>>> the kernel it can write to the socket directly, otherwise you will have to
>>>> bounce data back out to userspace to run it through SSL_write() and have
>>>> userspace do SSL_read() and then feed data into the kernel.
>>> I don't think bouncing the data up/down to/from userland would work well.
>>> I'd say "if it can't be done in the kernel, too bad". The above could be
>>> used for a NULL RPC to see it is working, for the client.
>>
>> So you're saying that we'd only support rpc-over-tls as an NFS client and
>> not as a server, at least until the WIP for ktls read appears?
Actually, I'd say that neither NFS client nor server will work over tls until
the receive side works, since NFS RPCs result in bi-directional traffic.

>To be clear, I have KTLS RX working with TOE right now.  I have a design in my
>head for KTLS RX that would use software and co-processor engines via OCF such
>as aesni(4) and ccr(4) that I hope to implement in the next few months, so KTLS
>RX isn't too far off.  OpenSSL already supports KTLS RX on Linux and the
>FreeBSD patches I already have use the same API.  (Each received TLS frame is
>read via recvmsg() with the TLS header fields in a cmsg.)
Sounds good. It will be a while before I get to the stage where I need it.
I'm currently working on how to give userland access to a socket created in the
kernel, so that a daemon can use it.

Have fun with it, rick

--
John Baldwin
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: how to use the ktls

2020-01-09 Thread Rick Macklem
John Baldwin wrote:
>On 1/7/20 3:02 PM, Rick Macklem wrote:
>> Hi,
>>
>> Now that I've completed NFSv4.2 I'm on to the next project, which is making
>> NFS work over TLS.
>> Of course, I know absolutely nothing about TLS, which will make this an
>> interesting exercise for me.
>> I did find simple server code in the OpenSSL doc. which at least gives me a
>> starting point for the initialization stuff.
>> As I understand it, this initialization must be done in userspace?
>>
>> Then somehow, the ktls takes over and does the encryption of the
>> data being sent on the socket via sosend_generic(). Does that sound right?
>>
>> So, how does the kernel know the stuff that the initialization phase
>> (handshake) figures out, or is it magic I don't have to worry about?
>>
>> Don't waste much time replying to this. A few quick hints will keep me going
>> for now. (From what I've seen so far, this TLS stuff isn't simple. And I
>> thought Kerberos was a pain.;-)
>>
>> Thanks in advance for any hints, rick
>
>Hmmm, this might be a fair bit of work indeed.
If it was easy, it wouldn't be fun;-) FreeBSD13 is a ways off and if it
doesn't make that, oh well..

>Right now KTLS only works for transmit (though I have some WIP for receive).
Hopefully your WIP will make progress someday, or I might be able to work on it.

>KTLS assumes that the initial handshake and key negotiation are handled by
>OpenSSL.  OpenSSL uses custom setsockopt() calls to tell the kernel which
>session keys to use.
Yea, I figured I'd need a daemon like the gssd for this. The krpc makes it a
little more fun, since it handles TCP connections in the kernel.

>I think what you would want to do is use something like SSL_connect() in
>userspace, and then check to see if KTLS "worked".
Thanks (and for the code below). I found the simple server code in the OpenSSL
doc, but the client code gets a web page and is quite involved.

>If it did, you can tell
>the kernel it can write to the socket directly, otherwise you will have to
>bounce data back out to userspace to run it through SSL_write() and have
>userspace do SSL_read() and then feed data into the kernel.
I don't think bouncing the data up/down to/from userland would work well.
I'd say "if it can't be done in the kernel, too bad". The above could be used
for a NULL RPC to see it is working, for the client.

>The pseudo-code might look something like:
>
>SSL *s;
>
>s = SSL_new(...);
>
>/* fd is the existing TCP socket */
>SSL_set_fd(s, fd);
>SSL_connect(s);
>if (BIO_get_ktls_send(SSL_get_wbio(s))) {
>   /* Can use KTLS for transmit. */
>}
>if (BIO_get_ktls_recv(SSL_get_rbio(s))) {
>   /* Can use KTLS for receive. */
>}

Thanks John, rick


--
John Baldwin
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


how to use the ktls

2020-01-07 Thread Rick Macklem
Hi,

Now that I've completed NFSv4.2 I'm on to the next project, which is making NFS
work over TLS.
Of course, I know absolutely nothing about TLS, which will make this an
interesting exercise for me.
I did find simple server code in the OpenSSL doc. which at least gives me a
starting point for the initialization stuff.
As I understand it, this initialization must be done in userspace?

Then somehow, the ktls takes over and does the encryption of the
data being sent on the socket via sosend_generic(). Does that sound right?

So, how does the kernel know the stuff that the initialization phase (handshake)
figures out, or is it magic I don't have to worry about?

Don't waste much time replying to this. A few quick hints will keep me going for
now. (From what I've seen so far, this TLS stuff isn't simple. And I thought
Kerberos was a pain.;-)

Thanks in advance for any hints, rick
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: getting rid of sys/nfs/nfs_lock.c

2019-12-29 Thread Rick Macklem
Dennis Clarke wrote:
>On 12/28/19 7:30 PM, Rick Macklem wrote:
>> Hi,
>>
>> sys/nfs/nfs_lock.c uses Giant. Since it has not been used by default since
>> March 2008, I suspect it can be removed from head without any impact.
>> Post March 2008, the only way this code could be executed is by both
>> building a kernel without "options NFSLOCKD" and deleting nfslockd.ko
>> from the kernel boot directory and then running rpc.lockd on the system.
>>
>> I doubt anyone has been doing both of the above, but if you think it is
>> still useful, please speak up. (I have an untested patch that replaces Giant
>> with a regular mutex. I realized this code is not used when I was trying to
>> test it.;-)
>>
>> Also, if it seems appropriate, I could commit a patch that makes it print out
>> "deprecated and going away before FreeBSD 13" message, but I doubt anyone
>> will ever see it.
>> Should I do such a message and wait a few months for the deletion?
>
>Such a message is a good idea.
>
>I am curious if there is any way in which we would see that message when
>creating an NFS share via ZFS set sharenfs='foo' ?
Only if your kernel was built without "options NFSLOCKD" and you do not
have nfslockd.ko in your kernel boot directory.
Highly unlikely for amd64, since neither of the above would be true unless
you created a custom kernel config and deleted nfslockd.ko from the kernel
boot directory you installed it in.

It is slightly more likely to occur for an arm installation, since many of those
do not configure NFS into the kernel, if you did not have the modules in the
boot directory and you then started rpc.lockd.

rick


--
Dennis Clarke
RISC-V/SPARC/PPC/ARM/CISC
UNIX and Linux spoken
GreyBeard and suspenders optional
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: getting rid of sys/nfs/nfs_lock.c

2019-12-28 Thread Rick Macklem
Oh, I forgot to mention that, post March 2008, this code was replaced by the
in-kernel NLM found in sys/nlm, which is why it has not been in use.



From: owner-freebsd-curr...@freebsd.org on behalf of Rick Macklem
Sent: Saturday, December 28, 2019 7:30 PM
To: freebsd-current@freebsd.org
Subject: getting rid of sys/nfs/nfs_lock.c

Hi,

sys/nfs/nfs_lock.c uses Giant. Since it has not been used by default since
March 2008, I suspect it can be removed from head without any impact.
Post March 2008, the only way this code could be executed is by both
building a kernel without "options NFSLOCKD" and deleting nfslockd.ko
from the kernel boot directory and then running rpc.lockd on the system.

I doubt anyone has been doing both of the above, but if you think it is
still useful, please speak up. (I have an untested patch that replaces Giant
with a regular mutex. I realized this code is not used when I was trying to
test it. ;-)

Also, if it seems appropriate, I could commit a patch that makes it print out a
"deprecated and going away before FreeBSD 13" message, but I doubt anyone
will ever see it.
Should I do such a message and wait a few months for the deletion?

Thanks for your comments, rick
ps: The current patch that prepares the kernel for deletion of
    sys/nfs/nfs_lock.c is in reviews.freebsd.org/D22933.



getting rid of sys/nfs/nfs_lock.c

2019-12-28 Thread Rick Macklem
Hi,

sys/nfs/nfs_lock.c uses Giant. Since it has not been used by default since
March 2008, I suspect it can be removed from head without any impact.
Post March 2008, the only way this code could be executed is by both
building a kernel without "options NFSLOCKD" and deleting nfslockd.ko
from the kernel boot directory and then running rpc.lockd on the system.

I doubt anyone has been doing both of the above, but if you think it is
still useful, please speak up. (I have an untested patch that replaces Giant
with a regular mutex. I realized this code is not used when I was trying to
test it. ;-)

Also, if it seems appropriate, I could commit a patch that makes it print out a
"deprecated and going away before FreeBSD 13" message, but I doubt anyone
will ever see it.
Should I do such a message and wait a few months for the deletion?

Thanks for your comments, rick
ps: The current patch that prepares the kernel for deletion of
    sys/nfs/nfs_lock.c is in reviews.freebsd.org/D22933.



Heads up: Large patch that adds NFSv4.2 has been committed to head/current

2019-12-12 Thread Rick Macklem
Hi,

r355677 is a large patch that adds NFSv4.2 support to the NFS client/server.
It has survived a "make universe" for all arches that would build (some mips
and sparc64 failed for reasons unrelated to this patch).
However, I have not been able to do a build with a recent GCC.
If there are build problems, please let me know.

Although there are a lot of code changes, they should not affect the other
versions of NFS. The patch does add two new sysctls that can be used to
limit the minor versions of NFSv4 supported by the nfsd and, as such, NFSv4.2
can be disabled without reverting this patch.

It does change the internal interface between the NFS modules, so they must
all be upgraded simultaneously. Although arguably not necessary, I will do a
version bump for this.

Hopefully this big patch does not cause you grief, rick


merge of NFSv4.2 support into head/current

2019-11-25 Thread Rick Macklem
Hi,

I have completed development and a testing cycle for the NFSv4.2 (RFC-7862)
code in base/projects/nfsv42 on subversion.

NFSv4.2 is a minor revision to NFSv4.1 and adds support for the following
optional features:
- lseek(SEEK_DATA/SEEK_HOLE)
- posix_fallocate()
- posix_fadvise(POSIX_FADV_WILLNEED/POSIX_FADV_DONTNEED)
- Server side copy of byte ranges between two files on the same NFS mount
  point when the copy_file_range(2) syscall is used.
- Extended attribute support as specified by RFC-8276.
(There are some other optional features, but I do not intend to implement those
 at this time.)

Although this patch is fairly large, it should not affect the other versions of
NFS.

If anyone would like to do testing of it now, all you need is a fairly current
FreeBSD-current system, with the kernel replaced by one built from the
sources found in the above projects area. (And then you specify "minorversion=2"
as a mount option.)

If anyone sees a problem with merging this code into head/current over
the next few weeks, please let me know.

Thanks, rick



re: Reproducable deadlock in NFS client

2019-10-03 Thread Rick Macklem
Hi Peter,

You could try a couple of things:
1 - kib@ just put a patch up on phabricator that reorganizes the handling
  of vnode_pager_setsize().
  D21883
  (If you could test this patch, that might be the best approach.)
or
2 - The only differences between the post r352392 code and the older stuff
 is that it calls vnode_pager_setsize() when the size hasn't changed.
 I can't think of why that might cause a problem, but??
 I have a patch in phabricator D21814 that doesn't do the
 vnode_pager_setsize() call when the size doesn't change.
--> If this patch were to avoid the hang, it could help diagnose the
   problem.
 The other difference is that it called vnode_pager_setsize() when there
 was a small change, but not enough to affect a page boundary. I can't think
 of how this would affect things either, but..
or
If you can't test either of the above patches, you could try reverting both
r352393 and r352457, which would put things back the way they've been
for years, to see if that works ok.

Good luck with it, rick
ps: Btw, capturing "procstat -kk" and "ps axHl" would give you/us more info.
 (The "H" on "ps" shows the iod threads.)
  If you can drop into the debugger when it is hung as above, you could
  capture the stuff listed here:
https://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html



Re: RFC: should lseek(SEEK_DATA/SEEK_HOLE) return ENOTTY?

2019-08-16 Thread Rick Macklem
Ian Lepore wrote:
>On Sun, 2019-08-11 at 09:12 -0600, Alan Somers wrote:
>> On Sun, Aug 11, 2019 at 8:57 AM Ian Lepore  wrote:
>> >
>> > On Sun, 2019-08-11 at 09:04 +0200, Gary Jennejohn wrote:
>> > > On Sun, 11 Aug 2019 02:03:10 +
>> > > Rick Macklem  wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > I've noticed that, if you do a lseek(SEEK_DATA/SEEK_HOLE) on a
>> > > > file
>> > > > that
>> > > > resides in a file system that does not support holes, ENOTTY is
>> > > > returned.
>> > > >
>> > > > This error isn't listed for lseek() and seems a little weird.
>> > > >
>> > >
>> > > ENOTTY is the standard error return for an unimplemented
>> > > ioctl(2),
>> > > and SEEK_HOLE ultimately becomes a call to fo_ioctl().
>> > >
>> > > > I can see a couple of alternatives to this:
>> > > > 1 - Return a different error. Maybe ENXIO?
>> > > > or
>> > > > 2 - Have lseek() do the trivial implementation when the
>> > > > VOP_IOCTL()
>> > > > fails.
>> > > >- For SEEK_DATA, just return the offset given as argument
>> > > > and
>> > > > for SEEK_HOLE
>> > > >   return the file's size as the offset.
>> > > >
>> > > > What do others think? rick
>> > > > ps: The man page should be updated, whatever is done w.r.t.
>> > > > this.
>> > > >
>> > >
>> > > I also vote for option 2
>> > >
>> >
>> > If SEEK_DATA and SEEK_HOLE don't return the standard "ioctl not
>> > supported" error code and return a fake result, how are you
>> > supposed to
>> > determine at runtime whether SEEK_HOLE is supported or not?
>> >
>> > -- Ian
>>
>> pathconf(2) will tell you.
>>
>
>Ahh, I wasn't aware of that.
>
>For option 2, lseek() has to not just return the info, but must also
>actually set the file position accordingly, and has to treat offset >=
>filesize as an error.

I have put a patch for this at https://reviews.freebsd.org/D21299
I listed markj@ as a reviewer, but anyone is welcome to review it, if they'd
like.

Since vn_bmap_seekhole() can return ENOTTY, the above patch follows that
convention as well.

I also have a trivial patch to map errnos not specified for lseek() to EINVAL.
https://reviews.freebsd.org/D21300.
Ditto above w.r.t. reviewing it.

rick
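
For the run-time check mentioned in the quoted text above (pathconf(2)), a
minimal sketch might look like the following. It assumes the FreeBSD
_PC_MIN_HOLE_SIZE name, which pathconf(2) documents as returning a positive
value when the file system reports holes; the helper name fd_reports_holes()
is made up for illustration, and platforms that lack the name are simply
treated as "no hole reporting".

```c
#include <stdbool.h>
#include <unistd.h>

/*
 * Report whether the file system backing fd advertises hole support.
 * _PC_MIN_HOLE_SIZE is not defined on every platform, so its absence
 * is treated as "not supported" in this sketch.
 */
static bool
fd_reports_holes(int fd)
{
#ifdef _PC_MIN_HOLE_SIZE
	long minhole;

	/* A positive value means holes are reported; -1 means no/unknown. */
	minhole = fpathconf(fd, _PC_MIN_HOLE_SIZE);
	return (minhole > 0);
#else
	(void)fd;
	return (false);
#endif
}
```

An application could call this once after open(2) and only rely on
SEEK_DATA/SEEK_HOLE results being "real" when it returns true.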




Re: RFC: should lseek(SEEK_DATA/SEEK_HOLE) return ENOTTY?

2019-08-11 Thread Rick Macklem
Ian Lepore wrote:
>On Sun, 2019-08-11 at 09:12 -0600, Alan Somers wrote:
>> On Sun, Aug 11, 2019 at 8:57 AM Ian Lepore  wrote:
>> >
>> > On Sun, 2019-08-11 at 09:04 +0200, Gary Jennejohn wrote:
>> > > On Sun, 11 Aug 2019 02:03:10 +
>> > > Rick Macklem  wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > I've noticed that, if you do a lseek(SEEK_DATA/SEEK_HOLE) on a
>> > > > file
>> > > > that
>> > > > resides in a file system that does not support holes, ENOTTY is
>> > > > returned.
>> > > >
>> > > > This error isn't listed for lseek() and seems a little weird.
>> > > >
>> > >
>> > > ENOTTY is the standard error return for an unimplemented
>> > > ioctl(2),
>> > > and SEEK_HOLE ultimately becomes a call to fo_ioctl().
That's true and explains why it returns ENOTTY. However, lseek(2) is not
ioctl(2) and it doesn't list ENOTTY as an error.
(Just to make things confusing, lseek(2) using SEEK_DATA/SEEK_HOLE appears to
 be only a POSIX draft at this point, so POSIX doesn't really help w.r.t. what
 errors should be returned for this case.)

>> > >
>> > > > I can see a couple of alternatives to this:
>> > > > 1 - Return a different error. Maybe ENXIO?
>> > > > or
>> > > > 2 - Have lseek() do the trivial implementation when the
>> > > > VOP_IOCTL()
>> > > > fails.
>> > > >- For SEEK_DATA, just return the offset given as argument
>> > > > and
>> > > > for SEEK_HOLE
>> > > >   return the file's size as the offset.
>> > > >
>> > > > What do others think? rick
>> > > > ps: The man page should be updated, whatever is done w.r.t.
>> > > > this.
>> > > >
>> > >
>> > > I also vote for option 2
>> > >
>> >
>> > If SEEK_DATA and SEEK_HOLE don't return the standard "ioctl not
>> > supported" error code and return a fake result, how are you
>> > supposed to
>> > determine at runtime whether SEEK_HOLE is supported or not?
>> >
>> > -- Ian
>>
>> pathconf(2) will tell you.
>>
>
>Ahh, I wasn't aware of that.
>
>For option 2, lseek() has to not just return the info, but must also
>actually set the file position accordingly, and has to treat offset >=
>filesize as an error.
Yes, this check is done below the VOP_IOCTL() layer for the file system
(using vn_bmap_seekhole() or custom code).

I think the easiest way to implement #2 is to create a vop_stdioctl() and put
it into sys/kern/vfs_default.c. It would need to do this check.
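
To make the semantics of #2 concrete, here is a rough userland sketch of the
"trivial implementation" plus the offset check. It is not the proposed
vop_stdioctl(); the function name and the enum are hypothetical, and returning
EINVAL for a negative offset is an assumption of this sketch. ENXIO for an
offset at or beyond end of file is what lseek(2) documents for
SEEK_DATA/SEEK_HOLE.

```c
#include <sys/types.h>
#include <errno.h>

/* Hypothetical stand-ins for SEEK_DATA and SEEK_HOLE. */
enum seek_kind { WANT_DATA, WANT_HOLE };

/*
 * Fallback for a file system with no hole support: the whole file is
 * one data region and the only "hole" is the implicit one at EOF.
 * Returns the new offset, or -1 with errno set.
 */
static off_t
trivial_seek(enum seek_kind kind, off_t offset, off_t filesize)
{
	if (offset < 0) {
		errno = EINVAL;		/* assumption of this sketch */
		return (-1);
	}
	if (offset >= filesize) {
		errno = ENXIO;		/* at or beyond end of file */
		return (-1);
	}
	if (kind == WANT_DATA)
		return (offset);	/* already inside data */
	return (filesize);		/* next hole is at EOF */
}
```

For a 100 byte file, asking for data at offset 10 gives 10, asking for a hole
at offset 10 gives 100, and either request at offset 100 fails with ENXIO.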

Interestingly, I had assumed the discussion would have been between leaving
the errno alone vs changing the errno. I only threw in #2 for completeness
sake.
--> Now, it appears that #2 is the favourite.

I'll wait for more responses before I propose a patch.

Thanks for the comments, rick



RFC: should lseek(SEEK_DATA/SEEK_HOLE) return ENOTTY?

2019-08-10 Thread Rick Macklem
Hi,

I've noticed that, if you do a lseek(SEEK_DATA/SEEK_HOLE) on a file that
resides in a file system that does not support holes, ENOTTY is returned.

This error isn't listed for lseek() and seems a little weird.

I can see a couple of alternatives to this:
1 - Return a different error. Maybe ENXIO?
or
2 - Have lseek() do the trivial implementation when the VOP_IOCTL() fails.
   - For SEEK_DATA, just return the offset given as argument and for SEEK_HOLE
  return the file's size as the offset.

What do others think? rick
ps: The man page should be updated, whatever is done w.r.t. this.



Re: should a copy_file_range(2) syscall be interrupted via a signal

2019-07-07 Thread Rick Macklem
Konstantin Belousov wrote:
>On Fri, Jul 05, 2019 at 08:59:23PM +0000, Rick Macklem wrote:
>> Konstantin Belousov wrote:
>> >On Fri, Jul 05, 2019 at 07:30:54PM +0200, Jilles Tjoelker wrote:
>> >> On Fri, Jul 05, 2019 at 12:28:51AM +0000, Rick Macklem wrote:
>> >> > I have been working on a Linux compatible copy_file_range(2) syscall
>> >> > (the current code can be found at https://reviews.freebsd.org/D20584).
>> >>
>> >> > One outstanding issue is how it should deal with signals. Right now, I
>> >> > have vn_start_write() without PCATCH, so that it won't be interrupted
>> >> > by a signal, but I notice that vn_write() {ie. write syscall } does
>> >> > have PCATCH on vn_start_write() and so does vn_rdwr() when it is
>> >> > called without IO_NODELOCKED.
>> >>
>> >> A regular write() is only interruptible when writing to a terminal,
>> >> pseudo-terminal master, pipe, socket, or, under certain conditions, a
>> >> file on an NFS intr mount. Therefore, applications may not have the code
>> >> to resume interrupted writes to regular files gracefully.
>> Yes, agreed. Since this syscall only works on VREG vnodes, the only weird
>> cases are NFS (and maybe fuse). I'll let asomers@ address the fuse situation.
>>
>> >>
>> >> > I am thinking that copy_file_range(2) should do this also.
>> >> > However, if it returns an error, it is impossible for the caller to
>> >> > know how much of the data range got copied.
>> >>
>> >> A regular write() returns partial success if interrupted by a signal
>> >> when it has already written something. Therefore, the application can
>> >> resume the operation by adjusting pointers and counts.
>> >>
>> >> Something similar applies to "deterministic" errors like [EFBIG] where
>> >> the first call will write as far as possible (if this is not nothing)
>> >> successfully and the next attempt will return the error.
>> >>
>> >> > What do you think the copy_file_range(2) code should do?
>> >>
>> >> I'm not sure it should actually be done, but the need for adjusting
>> >> pointers and counts could be avoided with a little extra kernel and libc
>> >> code. The system call would receive an additional argument pointing to
>> >> an off_t that indicates how many bytes previous calls have already
>> >> written. A libc wrapper would initialize this to 0. With this, the
>> >> system call can be restarted automatically after a signal.
>> >>
>> >> In any case, [EINTR] and the internal ERESTART must not be returned
>> >> unless it is safe to repeat the call with the same (direct) arguments.
>> Well, since the copy_file_range(2) syscall is allowed to return fewer bytes
>> copied than requested and this doesn't mean EOF, it seems that doing that
>> would achieve the result of allowing an application to call it again.
>> (Basically, it must be used in a loop until the bytes of the range have been
>>  copied, since returning fewer bytes copied than requested is a normal
>>  outcome.)
>>
>> >BTW, if the syscall is made interruptible, it should be made cancellable ?
>> Not sure what you mean by "cancellable"? If you mean "terminated by a signal
>> where there has been no change to the output file", then that could only
>> easily be done by returning EINTR before any data has been copied.
>> If you mean something else, then I'd need to know what that is?
>See pthread_setcancelstate(3) for start, but the POSIX 1003.1-2017
>2.9.5 Thread Cancellation is the definitive spec, including the quite
>readable overview.
Ok, thanks. That explains why cancellation of NFSv4.2 Copy operations is
defined the way it is.

>>
>> >I think that PCATCH commonly used for vn_start_write(9) is not the best
>> >decision.  It is safe in the sense explained by Jilles, since its
>> >interruption only happens at the very beginning of the syscall, but it
>> >contradicts the tradition of write(2) to the local fs being not
>> >interruptible.
>> >
>> >I suggest to not make the syscall interruptible by default, and perhaps
>> >only allow it with a flag.  Then you would need to explain that the
>> >syscall is only interruptible between VOPs, it is up to fs to decide if
>> >the VOP_READ/VOP_WRITE is interruptible (e.g. devfs and nfs).
>> This is how it is coded now. The one thing I have

Re: test program for copy_file_range(2)

2019-07-05 Thread Rick Macklem
Alan Somers wrote:
>On Fri, Jul 5, 2019 at 9:11 AM Rick Macklem  wrote:
>>
>> Alan Somers wrote:
>> >On Thu, Jul 4, 2019 at 6:38 PM Rick Macklem  wrote:
>> >>
>> >> I have a little program for testing the copy_file_range(2) syscall I've
>> >> been working on. (The current version is attached, in case anyone is
>> >> interested.)
>> >>
>> >> It takes a few minutes to run on a slow system and uses about 6Gbytes of
>> >> disk space for the file system the output file is on. (It creates 2 files
>> >> to use for testing. The first one is sparse and the second is copied from
>> >> it, but grows as different byte ranges get copied, since "punching holes"
>> >> is done via writes of 0 bytes.)
>> >>
>> >> My question is..
>> >> What needs to be done to include this in FreeBSD?
>> >> I see some stuff under head/tests. I could probably figure out
>> >> what the macros in those files are, but I can only see tests to see if
>> >> arguments are valid and similar. As such, I'm not sure if this is the
>> >> correct place for a test like this?
>> >>
>> >> Thanks for any help with this, rick
>> >
>> >head/tests is for complete automated tests, mostly in ATF format.
>> >Your program sounds more like the kind of helper program that might be
>> >more suitable for head/tools/regression.  Those programs all require
>> >some operator interaction.  If you can automate your program then we
>> >should add it to head/tests/sys.  Does it really need 6GB to get
>> >decent test coverage?
>> Well, I wanted the input file to exceed 4Gb and to have a > 4Gb hole in it,
>> to catch 32bit bugs (I test on i386). This did catch some problems during
>> testing.
>>
>> Then, the program copies (random) ranges of the file to a second file. If
>> the random copy is done over the "big hole" for the case where it hasn't
>> truncated the output file (every second iteration), then it writes a "lot of
>> 0s", growing the output file up to 6Gb of data.
>>
>> I could limit the "random" ranges to not copy the "big hole", but that would
>> avoid testing that case.
>>
>> rick
>
>random ranges are another problem.  Automated tests shouldn't use
>random behavior, because then failures won't be reproducible.  It's
>best to test a set of hand selected edge cases.  If you're going to
>test random ranges too, then the program should use a user-selectable
>random seed (perhaps seeding from the timer if the user doesn't
>specify a seed, and printing the seed that was chosen).
Good points. Now, I'm about as far from an expert on testing as they come, but
the problem in this case is that I have already written the code to handle the
edge cases I recognized. (Ideally the guy who writes the test program isn't the
guy who wrote the code, but...)

By doing the "random" stuff, I am hoping to catch cases that I hadn't
anticipated.
(I put the "random" in quotes, since I use random(3) without seeding it, so I
 actually get the same reproducible results.)
I crank the # of cycles up so that it runs for hours/days/weeks.

I do agree I should add some specific edge cases (which I have already checked)
to the test program.
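
For reference, the seeding scheme suggested in the quoted text could be
sketched roughly as below. This is not taken from the attached test program;
seed_random() and random_offset() are hypothetical names, and the sketch only
shows the idea of taking an optional seed, falling back to the clock, and
printing whichever seed was used so a failing run can be repeated.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/*
 * Seed random(3). Use the caller's string when one is given, else the
 * current time. The seed is printed so a failure can be reproduced.
 */
static unsigned int
seed_random(const char *arg)
{
	unsigned int seed;

	if (arg != NULL)
		seed = (unsigned int)strtoul(arg, NULL, 10);
	else
		seed = (unsigned int)time(NULL);
	printf("random seed: %u\n", seed);
	srandom(seed);
	return (seed);
}

/* Pick a pseudo-random offset in [0, limit); limit must be > 0. */
static long
random_offset(long limit)
{
	return (random() % limit);
}
```

Running twice with the same seed string yields the same sequence of offsets,
which keeps the "random" ranges while making failures reproducible.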

rick



Re: should a copy_file_range(2) syscall be interrupted via a signal

2019-07-05 Thread Rick Macklem
Konstantin Belousov wrote:
>On Fri, Jul 05, 2019 at 07:30:54PM +0200, Jilles Tjoelker wrote:
>> On Fri, Jul 05, 2019 at 12:28:51AM +0000, Rick Macklem wrote:
>> > I have been working on a Linux compatible copy_file_range(2) syscall
>> > (the current code can be found at https://reviews.freebsd.org/D20584).
>>
>> > One outstanding issue is how it should deal with signals. Right now, I
>> > have vn_start_write() without PCATCH, so that it won't be interrupted
>> > by a signal, but I notice that vn_write() {ie. write syscall } does
>> > have PCATCH on vn_start_write() and so does vn_rdwr() when it is
>> > called without IO_NODELOCKED.
>>
>> A regular write() is only interruptible when writing to a terminal,
>> pseudo-terminal master, pipe, socket, or, under certain conditions, a
>> file on an NFS intr mount. Therefore, applications may not have the code
>> to resume interrupted writes to regular files gracefully.
Yes, agreed. Since this syscall only works on VREG vnodes, the only weird cases
are NFS (and maybe fuse). I'll let asomers@ address the fuse situation.

>>
>> > I am thinking that copy_file_range(2) should do this also.
>> > However, if it returns an error, it is impossible for the caller to
>> > know how much of the data range got copied.
>>
>> A regular write() returns partial success if interrupted by a signal
>> when it has already written something. Therefore, the application can
>> resume the operation by adjusting pointers and counts.
>>
>> Something similar applies to "deterministic" errors like [EFBIG] where
>> the first call will write as far as possible (if this is not nothing)
>> successfully and the next attempt will return the error.
>>
>> > What do you think the copy_file_range(2) code should do?
>>
>> I'm not sure it should actually be done, but the need for adjusting
>> pointers and counts could be avoided with a little extra kernel and libc
>> code. The system call would receive an additional argument pointing to
>> an off_t that indicates how many bytes previous calls have already
>> written. A libc wrapper would initialize this to 0. With this, the
>> system call can be restarted automatically after a signal.
>>
>> In any case, [EINTR] and the internal ERESTART must not be returned
>> unless it is safe to repeat the call with the same (direct) arguments.
Well, since the copy_file_range(2) syscall is allowed to return fewer bytes
copied than requested and this doesn't mean EOF, it seems that doing that would
achieve the result of allowing an application to call it again.
(Basically, it must be used in a loop until the bytes of the range have been
 copied, since returning fewer bytes copied than requested is a normal outcome.)
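
A caller-side loop of that kind might look roughly like this. To keep the
sketch portable it does not call copy_file_range(2) itself: copy_chunk() is a
hypothetical stand-in built on pread(2)/pwrite(2) that deliberately copies at
most CHUNK_MAX bytes per call, so the short-return case is always exercised.
In real use the copy_chunk() call would be replaced by the syscall.

```c
#include <sys/types.h>
#include <unistd.h>

#define	CHUNK_MAX	4096	/* small on purpose, to force short copies */

/*
 * Hypothetical stand-in for the syscall: copies up to len bytes, may
 * copy fewer, returns 0 at EOF on the input and -1 on error.
 */
static ssize_t
copy_chunk(int infd, off_t *inoff, int outfd, off_t *outoff, size_t len)
{
	char buf[CHUNK_MAX];
	ssize_t nread, nwritten;

	if (len > sizeof(buf))
		len = sizeof(buf);
	nread = pread(infd, buf, len, *inoff);
	if (nread <= 0)
		return (nread);
	nwritten = pwrite(outfd, buf, (size_t)nread, *outoff);
	if (nwritten < 0)
		return (-1);
	*inoff += nwritten;
	*outoff += nwritten;
	return (nwritten);
}

/*
 * Keep calling until len bytes are copied or EOF is reached.
 * Returns the bytes copied, or -1 if nothing was copied before an error.
 */
static off_t
copy_range_loop(int infd, off_t inoff, int outfd, off_t outoff, off_t len)
{
	off_t left, total = 0;
	ssize_t n;

	while (total < len) {
		left = len - total;
		n = copy_chunk(infd, &inoff, outfd, &outoff,
		    left > CHUNK_MAX ? (size_t)CHUNK_MAX : (size_t)left);
		if (n < 0)
			return (total > 0 ? total : -1);
		if (n == 0)
			break;		/* EOF on the input file */
		total += n;
	}
	return (total);
}
```

Note that a short return is not treated as EOF; only a return of 0 ends the
loop early.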

>BTW, if the syscall is made interruptible, it should be made cancellable ?
Not sure what you mean by "cancellable"? If you mean "terminated by a signal
where there has been no change to the output file", then that could only easily
be done by returning EINTR before any data has been copied.
If you mean something else, then I'd need to know what that is?

>I think that PCATCH commonly used for vn_start_write(9) is not the best
>decision.  It is safe in the sense explained by Jilles, since its interruption
>only happens at the very beginning of the syscall, but it contradicts the
>tradition of write(2) to the local fs being not interruptible.
>
>I suggest to not make the syscall interruptible by default, and perhaps
>only allow it with a flag.  Then you would need to explain that the
>syscall is only interruptible between VOPs, it is up to fs to decide if
>the VOP_READ/VOP_WRITE is interruptible (e.g. devfs and nfs).
This is how it is coded now. The one thing I have noticed is that a
copy_file_range() can take a long time (about 2min for 2Gbytes on the old
hardware I test on). This seems like a long delay for Ctrl-C when you do that
to an application copying a large file. ("cp" and "dd" also take 2min for
2Gbytes, so it isn't a bug in copy_file_range(2). It just introduces a long
delay in response to Ctrl-C.)

rick



Re: should a copy_file_range(2) syscall be interrupted via a signal

2019-07-05 Thread Rick Macklem
Hans Petter Selasky wrote:
>On 2019-07-05 02:28, Rick Macklem wrote:
>> I am thinking that copy_file_range(2) should do this also.
>> However, if it returns an error, it is impossible for the caller to know how
>> much of the data range got copied.
>
>How can you kill a program stuck on copy_file_range(2) w/o catching signals?
Well, if "stuck" means sleeping somewhere inside the VOP_WRITE() call for
the file system, I think it is "stuck" forever, just like write(2), isn't it?

For NFS, the "intr" option might allow write(2) to return EINTR, but it often
takes a forced dismount (actually "umount -N") to get it "unstuck".

However, I think for the case where the signal is detected outside of
VOP_READ()/VOP_WRITE() in the copy loop, it does make sense to terminate
it and I think the suggestion of returning "bytes copied" instead of EINTR is
a good one.
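
As a userland analogy only (the kernel code does not look like this), the
"check for a signal between chunks and return bytes copied" idea can be
sketched as follows. All names here are hypothetical, and fake_copy() merely
pretends to copy a chunk and raises SIGUSR1 part way through so that the
early return can be observed.

```c
#include <sys/types.h>
#include <errno.h>
#include <signal.h>

static volatile sig_atomic_t got_signal;

/* Signal handler: just record that a signal arrived. */
static void
note_signal(int sig)
{
	(void)sig;
	got_signal = 1;
}

/* Hypothetical per-chunk copy routine: returns 0 on success. */
typedef int (*chunk_fn)(off_t off, off_t len);

/* Illustration only: "copies" nothing and raises SIGUSR1 at offset 2048. */
static int
fake_copy(off_t off, off_t len)
{
	(void)len;
	if (off == 2048)
		raise(SIGUSR1);
	return (0);
}

/*
 * Copy len bytes in chunks, looking for a pending signal only between
 * chunks. Partial progress is reported as a byte count; EINTR is only
 * returned when nothing has been copied yet.
 */
static off_t
copy_interruptible(off_t len, off_t chunksz, chunk_fn copy_one)
{
	off_t done = 0, want;

	while (done < len) {
		if (got_signal) {
			if (done > 0)
				return (done);
			errno = EINTR;
			return (-1);
		}
		want = (len - done < chunksz) ? len - done : chunksz;
		if (copy_one(done, want) != 0)
			return (done > 0 ? done : -1);
		done += want;
	}
	return (done);
}
```

The caller sees a short count and can simply call again for the remainder,
the same way it would after any other short copy.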

rick


Re: should a copy_file_range(2) syscall be interrupted via a signal

2019-07-05 Thread Rick Macklem
Mark Johnston wrote:
>On Fri, Jul 05, 2019 at 12:28:51AM +0000, Rick Macklem wrote:
>> Hi,
>>
>> I have been working on a Linux compatible copy_file_range(2) syscall
>> (the current code can be found at https://reviews.freebsd.org/D20584).
>>
>> One outstanding issue is how it should deal with signals.
>> Right now, I have vn_start_write() without PCATCH, so that it won't be
>> interrupted by a signal, but I notice that vn_write() {ie. write syscall }
>> does have PCATCH on vn_start_write() and so does vn_rdwr() when it is called
>> without IO_NODELOCKED.
>>
>> I am thinking that copy_file_range(2) should do this also.
>> However, if it returns an error, it is impossible for the caller to know how
>> much of the data range got copied.
>
>Couldn't copy_file_range() return the number of bytes copied in this
>case?  (The Linux man page notes that short writes are possible.) I
>would expect to see the same error handling that we have in
>dofilewrite(), where certain errnos are squashed.
I think this would be a good approach for local file systems, since I believe
that the only place that EINTR can be generated is the vn_start_write() call,
since vn_rdwr(IO_NODELOCKED) never returns it and the call completes before
returning.

As such, the EINTR happens at a "well known" place in the copy and a return of
the bytes copied should be fine.

Now, for NFS, it gets a little weird...
- For NFSv3, many use the "intr" mount option, which means that a VOP_WRITE()
  can return EINTR and the caller doesn't know if the write succeeded on the NFS
  server or not.
  --> Returning "bytes copied" instead of an error for this case doesn't seem
   appropriate to me, since there is no way to know if the last write
   happened?
However, "intr" is not recommended for NFSv4 and NFSv4.2 is the only case where
there is an RPC to do this on the server.

Maybe nfs_copy_file_range() shouldn't "hide" EINTR, although the local file
systems do so.

I think that sounds like a good approach.
What do others think?

>> What do you think the copy_file_range(2) code should do?
>
>I'd find it surprising if copy_file_range() isn't interruptible.
I'll admit I haven't tested on Linux, so I don't know what happens there.
The Linux man page doesn't mention EINTR, but I don't know what happens
for a Linux "intr" NFS mount. I do have a Linux system for testing, but it is
the same system I have been using to test this syscall on FreeBSD. Maybe I need
to boot/play around with it.

I do think returning "bytes copied" instead of EINTR is a good idea, where
practical.

Thanks for the comments, rick


Re: [Differential] D20584: add a linux compatible copy_file_range(2) syscall

2019-07-05 Thread Rick Macklem
jilles wrote in copy_file_range.2:99
> The Linux man page (from
> http://man7.org/linux/man-pages/man2/copy_file_range.2.html ) says that a
> non-zero flags argument will cause the call to return an [EINVAL] error. I
> think that is better than ignoring the argument completely since it allows
> adding flags more safely (since there will not be existing applications that
> pass in, for example, uninitialized data as flags).

The fun part is that the Linux folks are already discussing adding flags.
I don't know if they are already in Linux-next (or whatever they call their next
release), but it sounded like they were headed that way.

As such, I thought ignoring "flags" would be easier than returning EINVAL for
code that works on Linux.

However, I can see the counter argument, which is "returning EINVAL will
indicate that the Linux flag isn't used on FreeBSD", so that developers will
become aware of that.

What do others think w.r.t. which is the better approach? rick
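
For comparison, the stricter alternative (the one described in the quoted
comment) amounts to a check like the one below. This is only a sketch of that
option; check_copy_flags() is a hypothetical name and not code from the patch.

```c
#include <errno.h>

/*
 * Strict handling of the flags argument: no flags are defined, so any
 * non-zero value is rejected. Returns 0 when acceptable, else EINVAL.
 */
static int
check_copy_flags(unsigned int flags)
{
	if (flags != 0)
		return (EINVAL);
	return (0);
}
```

With this check an application that passes a flag that is not implemented
gets an immediate error instead of silently different behaviour.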


Re: test program for copy_file_range(2)

2019-07-05 Thread Rick Macklem
Alan Somers wrote:
>On Thu, Jul 4, 2019 at 6:38 PM Rick Macklem  wrote:
>>
>> I have a little program for testing the copy_file_range(2) syscall I've been
>> working on. (The current version is attached, in case anyone is interested.)
>>
>> It takes a few minutes to run on a slow system and uses about 6Gbytes of disk
>> space for the file system the output file is on. (It creates 2 files to use
>> for testing. The first one is sparse and the second is copied from it, but
>> grows as different byte ranges get copied, since "punching holes" is done via
>> writes of 0 bytes.)
>>
>> My question is..
>> What needs to be done to include this in FreeBSD?
>> I see some stuff under head/tests. I could probably figure out
>> what the macros in those files are, but I can only see tests to see if
>> arguments are valid and similar. As such, I'm not sure if this is the correct
>> place for a test like this?
>>
>> Thanks for any help with this, rick
>
>head/tests is for complete automated tests, mostly in ATF format.
>Your program sounds more like the kind of helper program that might be
>more suitable for head/tools/regression.  Those programs all require
>some operator interaction.  If you can automate your program then we
>should add it to head/tests/sys.  Does it really need 6GB to get
>decent test coverage?
Well, I wanted the input file to exceed 4Gb and to have a > 4Gb hole in it, to
catch 32bit bugs (I test on i386). This did catch some problems during testing.
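
To illustrate the kind of 32bit bug a > 4Gb file exposes, here is a tiny
sketch (not from the test program; through_32_bits() is a made-up name) of
what happens to a file offset that is mistakenly passed through a 32 bit
variable.

```c
#include <stdint.h>

/*
 * Show what a 64 bit file offset becomes if some code path stores it
 * in a 32 bit unsigned variable: everything above 4Gb is lost.
 */
static int64_t
through_32_bits(int64_t off)
{
	return ((int64_t)(uint32_t)off);
}
```

An offset of 5Gb comes back as 1Gb, while any offset below 4Gb survives
unchanged, which is why a file smaller than 4Gb would never trip over such a
bug.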

Then, the program copies (random) ranges of the file to a second file. If the
random copy is done over the "big hole" for the case where it hasn't truncated
the output file (every second iteration), then it writes a "lot of 0s", growing
the output file up to 6Gb of data.

I could limit the "random" ranges to not copy the "big hole", but that would
avoid testing that case.

rick


test program for copy_file_range(2)

2019-07-04 Thread Rick Macklem
I have a little program for testing the copy_file_range(2) syscall I've been
working on. (The current version is attached, in case anyone is interested.)

It takes a few minutes to run on a slow system and uses about 6Gbytes of disk
space on the file system the output file is on. (It creates 2 files to use for
testing. The first one is sparse and the second is copied from it, but grows
as different byte ranges get copied, since "punching holes" is done via writes
of 0 bytes.)

My question is..
What needs to be done to include this in FreeBSD?
I see some stuff under head/tests. I could probably figure out
what the macros in those files are, but I can only see tests to see if
arguments are valid and similar. As such, I'm not sure if this is the correct
place for a test like this?

Thanks for any help with this, rick
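/*
 * The header names were lost from the archived attachment; the ones
 * listed here are those the code below needs.
 */
#include <sys/types.h>
#include <sys/stat.h>
#include <err.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>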

static char junkbuf[128 * 1024];
/*
 * Write xfer bytes into outfd.
 */
static void
junk_write(int outfd, off_t xfer)
{
	size_t len;
	ssize_t outsiz;

	do {
		if (xfer > sizeof(junkbuf))
			len = sizeof(junkbuf);
		else
			len = xfer;
		outsiz = write(outfd, junkbuf, len);
		if (outsiz != len)
			err(1, "Can't write junk");
		xfer -= outsiz;
	} while (xfer > 0);
}

/* Compare the two files for same data. */
static void
comp_files(int infd, int outfd, off_t seekoff, off_t seekout, off_t xfer)
{
	char buf[128 * 1024], buf2[128 * 1024];
	ssize_t insiz, outsiz;

	if (seekoff == seekout) {
		lseek(infd, 0, SEEK_SET);
		lseek(outfd, 0, SEEK_SET);
		xfer = 0;
	} else {
		lseek(infd, seekoff, SEEK_SET);
		lseek(outfd, seekout, SEEK_SET);
	}
	do {
		insiz = read(infd, buf, sizeof(buf));
		if (insiz < 0)
			err(1, "Can't read infd");
		outsiz = read(outfd, buf2, sizeof(buf2));
		if (outsiz < 0)
			err(1, "Can't read outfd");
		if (xfer == 0) {
			if (insiz < outsiz)
				errx(1, "Premature EOF on infd");
			if (outsiz < insiz)
				errx(1, "Premature EOF on outfd");
		} else if (insiz > outsiz)
			insiz = outsiz;
		if (insiz > 0 && memcmp(buf, buf2, insiz) != 0)
			errx(1, "File data not same");
		if (xfer > 0) {
			xfer -= insiz;
			if (xfer == 0)
				insiz = 0;
		}
	} while (insiz > 0);
}

/*
 * Copy a file range from infd to outfd.
 */
static void
copy_range(int infd, int outfd, off_t xfer)
{
	size_t len;
	ssize_t ret;

	while (xfer > 0) {
		if (xfer > SIZE_T_MAX)
			len = SIZE_T_MAX;
		else
			len = xfer;
		ret = copy_file_range(infd, NULL, outfd, NULL, len, 0);
		if (ret <= 0)
			err(1, "Copy range failed!");
		xfer -= ret;
	}
}

int
main(int argc, char *argv[])
{
	int i, infd, j, outfd;
	struct stat st, outst;
	off_t seekoff, seekout, xfer;
	bool check_alloc;
	char cp;

	if (argc != 3)
		errx(1, "Usage: testcfr <infile> <outfile>");
	/* Fill in junkbuf with the alphabet over and over and over again. */
	cp = 'a';
	for (i = 0; i < sizeof(junkbuf); i++) {
		junkbuf[i] = cp++;
		if (cp > 'z')
			cp = 'a';
	}
	infd = open(argv[1], O_CREAT | O_RDWR, 0644);
	if (infd < 0)
		err(1, "can't open %s", argv[1]);
	outfd = open(argv[2], O_CREAT | O_RDWR, 0644);
	if (outfd < 0)
		err(1, "can't create %s", argv[2]);

	seekoff = 0;
	/*
	 * Create the input file as a sparse file and then copy file ranges
	 * of it to the output file and compare the two files.
	 */
	for (i = 0; i < 2; i++) {
		if (i > 0) {
			seekoff = 1024 * 1024 * 1024;
			seekoff *= 6;
			ftruncate(infd, 0);
			ftruncate(outfd, 0);
		}
		lseek(infd, seekoff, SEEK_SET);
		write(infd, "", 4);
		lseek(infd, 256 * 1024, SEEK_CUR);
		write(infd, "", 4);
		lseek(infd, 512 * 1024, SEEK_CUR);
		write(infd, "", 4);

		lseek(infd, 0, SEEK_SET);
		lseek(outfd, 0, SEEK_SET);
if 

should a copy_file_range(2) syscall be interrupted via a signal

2019-07-04 Thread Rick Macklem
Hi,

I have been working on a Linux compatible copy_file_range(2) syscall
(the current code can be found at https://reviews.freebsd.org/D20584).

One outstanding issue is how it should deal with signals.
Right now, I have vn_start_write() without PCATCH, so that it won't be
interrupted by a signal, but I notice that vn_write() (i.e. the write syscall)
does have PCATCH on vn_start_write() and so does vn_rdwr() when it is called
without IO_NODELOCKED.

I am thinking that copy_file_range(2) should do this also.
However, if it returns an error, it is impossible for the caller to know how
much of the data range got copied.

What do you think the copy_file_range(2) code should do?
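
One possible answer, borrowed from how write(2) treats interruption, is to
fail with EINTR only when nothing has been copied yet and otherwise return the
short byte count. The following is a userland sketch of that convention only,
not the kernel code under review: copy_interruptible() and its stop flag
(which a signal handler would set) are hypothetical names, and it uses plain
read(2)/write(2) so it can run anywhere.

```c
#include <sys/types.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/*
 * Copy up to len bytes from infd to outfd in chunks.  *stop is a flag that
 * a signal handler may set.  If it is set before anything was copied, fail
 * with EINTR.  If it is set (or an error occurs) after some bytes were
 * copied, return the short count so the caller knows where to resume.
 */
static ssize_t
copy_interruptible(int infd, int outfd, size_t len,
    const volatile sig_atomic_t *stop)
{
	static char buf[64 * 1024];
	size_t done, want;
	ssize_t n, w;

	done = 0;
	while (done < len) {
		if (*stop != 0) {
			if (done > 0)
				break;
			errno = EINTR;
			return (-1);
		}
		want = len - done;
		if (want > sizeof(buf))
			want = sizeof(buf);
		n = read(infd, buf, want);
		if (n == 0)
			break;		/* EOF on the input file. */
		if (n < 0 || (w = write(outfd, buf, (size_t)n)) < 0) {
			if (done > 0)
				break;
			return (-1);
		}
		done += (size_t)w;
		if (w < n)
			break;		/* Short write, report what was done. */
	}
	return ((ssize_t)done);
}
```

With this convention a caller simply loops, subtracting the returned count
from the remaining length, and only has to treat -1 as "nothing was copied".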

Thanks, rick
ps: I've used FreeBSD-current@ this time, to see if I get more replies than I
  did using FreeBSD-fs@.


patch to add a Linux compatible copy_file_range(2) syscall

2019-06-09 Thread Rick Macklem
Hi,

I just put a patch in phabricator that is intended to add a Linux compatible
copy_file_range(2) syscall. My main interest in having this is that NFSv4.2 will
know how to do file copying locally on the NFS server, saving all the
reads/writes across the wire.

It copies the file byte range in the kernel. I don't know how the performance
compares with a userland file copy done to a local file system on the machine.
(It would save syscalls, but I have no idea if that will result in a noticeable
 performance difference?)

It is at https://reviews.freebsd.org/D20584
I've listed a few guys as possible reviewers, but if anyone else would like to
review it, feel free to add yourself.

If anyone is able to test this, it would be appreciated. Please let me know how
it goes, rick.


Re: adding a syscall to libc?

2019-06-09 Thread Rick Macklem
Konstantin Belousov wrote:
>On Sat, Jun 08, 2019 at 02:57:27AM +0000, Rick Macklem wrote:
>> Hi,
>>
First off, thanks Kostik for the fine explanation. I agree with Oliver that it
should be captured somewhere like the wiki. I'm no wiki guy, so hopefully
someone else will do this?

>> I've started working on a copy_file_range() syscall for FreeBSD. I think I
>> have the kernel patched and ready for some testing.
>> However, I'm confused about what I need to do in src/lib/libc/sys?
>> - Some syscalls have little .c files, but other ones do not.
>>   When is one of these little .c files needed and, when not needed, what else
>>   needs to be done? (I notice that syscall.mk in src/sys/sys is generated
>>   automagically, but I can't see what else, if anything, needs to be done?)
>Most important is to add the new syscall public symbol to sys/Symbol.map
>into the correct version, FBSD_1.6 for CURRENT-13.  Do not bother with
>__sys_XXX and __XXX aliases.
I could only find a Symbol.map in src/lib/libc/sys. I added it there and it
seems to work. (I am using a stable/12 source tree for testing the
build/userland. I'll check head in case it has moved.)

>'Tiny .c files' are typically used for one of two purposes:
>- Convert raw kernel interface into something expected by userspace,
>  often this conversion uses a more generic and non-standard interface to
>  implement a more usual function.  Examples are open(2) or waitid(2)
>  which are really tiny wrappers around openat(2) and wait6(2) in
>  today's libc.
>- Allow libthr to hook into libc to provide additional services.  Libthr
>  often has to modify the semantics of a raw syscall, and libc contains the
>  tables redirecting to the implementation; the tables are patched on libthr
>  load.  Since the tables must fill entries with some address in case libthr
>  is not loaded, tiny functions which wrap syscalls are created for
>  use in those tables.
>
>I think you do not need any of those complications for a start, in which
>case adding a new syscall consists of the following steps:
Yes, I don't think I need the above.

>- Add the syscall to sys/kern/syscalls.master, and if reasonable,
>  to sys/compat/freebsd32/syscalls.master.
I don't think a 32bit binary on a 64bit system needs this for now.
(At least that's my understanding of what this is used for?)

>- Consider if the syscall makes sense in a capsicumized environment,
>  and if yes, list the syscall in sys/kern/capabilities.conf.  Typically,
>  if the syscall provides access to the global files namespace, it must not
>  be allowed.  On the other hand, if the syscall only operates on already
>  opened file descriptors, then it is suitable (but of course there are a lot
>  of nuances).
It uses open fds, but I think I'll leave it out of capabilities.conf for now. If
there is a need, someone more familiar with capsicum can check it.

>- Add syscall prototype to the user-visible portion of header,
>  hiding it under the proper visibility check.
Hmm, not quite sure what you mean here. It ends up in sys/sysproto.h
automagically. Does it need to go somewhere else too?

>- Add syscall symbol to lib/libc/sys/Symbol.ver.
All I found was lib/libc/sys/Symbol.map and I've added it there.

>- Implement the syscall.  There are some additional details that might
>  require attention:
>  - If a compat32 syscall is going to be implemented, or you know
>    that the Linuxolator needs to implement the same syscall and would
>    like to take advantage of the code, provide
>        int kern_YOURSYSCALL();
>    wrapper and declare it in sys/syscallsubr.h.  Real implementations
>    of host-native and compat32 sys_YOURSYSCALL() should be just
>    decoding of uap members and a call into kern_YOURSYSCALL.
I think it might be useful for the Linuxolator, since it is meant to be Linux
compatible, so I've done this.
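
The kern_YOURSYSCALL() split described above can be mocked up in userland to
show its shape. Everything here is hypothetical: the demo syscall, its
argument structures and the result it computes are invented for the example,
and the struct thread argument that real sys_XXX() functions take is left out.
The point is only that the native and 32bit entry points decode their own
argument layout and then share one implementation.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical argument layouts for the native and 32bit entry points. */
struct demo_args {
	int	fd;
	int64_t	offset;
	size_t	len;
};
struct demo32_args {
	int		fd;
	uint32_t	offset_lo;	/* 64bit offset split into halves */
	uint32_t	offset_hi;
	uint32_t	len;
};

/* The shared implementation, playing the role of kern_YOURSYSCALL(). */
static int
kern_demo(int fd, int64_t offset, size_t len, int64_t *retval)
{
	if (fd < 0 || offset < 0)
		return (EINVAL);
	*retval = offset + (int64_t)len;	/* placeholder "work" */
	return (0);
}

/* Native entry point: decode uap and call the shared code. */
static int
sys_demo(const struct demo_args *uap, int64_t *retval)
{
	return (kern_demo(uap->fd, uap->offset, uap->len, retval));
}

/* 32bit entry point: reassemble the offset, then call the same code. */
static int
freebsd32_demo(const struct demo32_args *uap, int64_t *retval)
{
	int64_t offset;

	offset = (int64_t)(((uint64_t)uap->offset_hi << 32) | uap->offset_lo);
	return (kern_demo(uap->fd, offset, uap->len, retval));
}
```

A Linuxolator entry point would be a third thin decoder calling the same
kern_demo().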

>  - Consider the need to add auditing for the new syscall.
This one I need to look at more closely. I may end up posting to the list
w.r.t. what to do about this. I think I'll leave it out of the first draft for
phabricator.

>- Add man page for the syscall, at lib/libc/sys/YOURSYSCALL.2, and connect
>  it to the build in lib/libc/sys/Makefile.inc.
Yea, I know I have to write a man page. Maybe get to that tomorrow.

>- When creating a review for the change, do not include the diff for generated
>  files after make sysent.  Similarly, when doing the commit, first commit
>  everything non-generated, then do make -C sys/kern sysent (and
>  make -C sys/compat/freebsd32 sysent if appropriate) and commit
>  the generated files in a follow-up.
Righto, I'll do this when it gets to that stage.

Thanks again for the useful answer, rick


adding a syscall to libc?

2019-06-07 Thread Rick Macklem
Hi,

I've started working on a copy_file_range() syscall for FreeBSD. I think I have
the kernel patched and ready for some testing.
However, I'm confused about what I need to do in src/lib/libc/sys?
- Some syscalls have little .c files, but other ones do not.
  When is one of these little .c files needed and, when not needed, what else
  needs to be done? (I notice that syscall.mk in src/sys/sys is generated
  automagically, but I can't see what else, if anything, needs to be done?)

Thanks in advance for your help, rick
ps: I am using the Linux man pages for the syscall ABI. At some point, I'll put
  this in phabricator and post here for comments/review.


Re: RFC w.r.t. toggling debugging on/off for mountd via a signal

2019-05-19 Thread Rick Macklem



Cy Schubert wrote:
[lots of stuff snipped]
>Instead of syslog() calls, DTrace probes are designed for this type of
>instrumentation.

DTrace is way too obscure for me. Never used it, probably never will.
(Remember I'm the guy who still uses "ed" to edit all my text files, because
 screen editors like "vi" are too obscure for me.;-)

If you want to volunteer to replace the syslog() calls in the patch with DTrace
stuff (the patch is attached to PR#237860 and the calls are uses of the macro
called LOGDEBUG) and create a simple explanation of how to enable/disable it,
feel free to do so.

rick


Re: RFC w.r.t. toggling debugging on/off for mountd via a signal

2019-05-18 Thread Rick Macklem
Alan Somers wrote:
>On Sat, May 18, 2019 at 7:59 PM Rick Macklem  wrote:
>>
>> Hi,
>>
>> I've been working with Peter Eriksson on a patch for mountd that adds a new
>> option for incremental updating of exports. This seems to be helping a lot
>> w.r.t. performance on an NFS server with lots (1+) of exported file systems.
>>
>> I have debug syslog() calls in the code, which I/Peter think would be worth
>> keeping in the production code in case someone runs into problems with this
>> new option.
>>
>> As such, I'd like to have the code compiled in by default (not only if DEBUG
>> is defined, as mountd.c has now). I also was thinking it would be nice if the
>> daemon didn't need to be restarted to enable/disable the debugging output,
>> since that breaks NFS mounting during the restart.
>>
>> So, I was thinking of having the debugging output toggled on/off via SIGUSR1.
>>
>> What do you think of this idea?
>> Any other/better ways to do this?
>> Also, would LOG_DAEMON and LOG_DEBUG sound like the correct facility and
>> priority for these syslog() calls?
>>
>> Thanks in advance for any comments, rick
>
>If the debug messages aren't so verbose that they'll slow down
>syslogd, then you can just leave them enabled all the time.  syslogd
>will filter them.  However, if they're super-verbose then SIGUSR1
>sounds reasonable.  I can't think of another daemon with runtime
>selectable logging verbosity like that.
Yes, these are pretty chatty. 5-10 lines for each entry in an exports file.
Multiply that times the number of entries. (Peter's servers are between 2
to 72000+ file systems. Not sure if he has multiple entries/file system.)

To give you a clue, without this patch, it can take from 20sec to over 1min to
reload them when mountd gets a SIGHUP.

It's just that the export file handling code is pretty convoluted, so I think
the patch is ok, but I won't be too surprised if someone finds a problem.

rick



RFC w.r.t. toggling debugging on/off for mountd via a signal

2019-05-18 Thread Rick Macklem
Hi,

I've been working with Peter Eriksson on a patch for mountd that adds a new
option for incremental updating of exports. This seems to be helping a lot
w.r.t. performance on an NFS server with lots (1+) of exported file systems.

I have debug syslog() calls in the code, which I/Peter think would be worth
keeping in the production code in case someone runs into problems with this new
option.

As such, I'd like to have the code compiled in by default (not only if DEBUG is
defined, as mountd.c has now). I also was thinking it would be nice if the
daemon didn't need to be restarted to enable/disable the debugging output,
since that breaks NFS mounting during the restart.

So, I was thinking of having the debugging output toggled on/off via SIGUSR1.

What do you think of this idea?
Any other/better ways to do this?
Also, would LOG_DAEMON and LOG_DEBUG sound like the correct facility and
priority for these syslog() calls?

Thanks in advance for any comments, rick


patch that replaces a single linked list with a hash table of lists (mountd.c) for review

2019-05-15 Thread Rick Macklem
Hi,

I just put a patch for mountd.c in phabricator as D20270, which replaces the
singly linked list of structures for exported file systems with a hash table of
lists.
This is part of what I hope will fix the performance of mountd when reloading
the exports file(s) for a server with a lot of exported file systems.
Peter Eriksson has reported that his file server with 72000+ exported file
systems takes 16sec to reload the exports file(s), which implies that the nfsd
threads are suspended for 16sec whenever this happens.
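
For readers who have not looked at D20270, the general idea of a hash table of
lists can be sketched as follows. This is not the code from the patch: the
exp_entry structure, the use of a 64bit fsid as the key, the table size and
the hash function are all assumptions made for the example.

```c
#include <stdint.h>
#include <stdlib.h>

#define	EXP_HASHSIZE	256	/* hypothetical table size */

/* One exported file system, chained into its hash bucket. */
struct exp_entry {
	uint64_t		fsid;
	struct exp_entry	*next;
};

static struct exp_entry *exp_hash[EXP_HASHSIZE];

/* Map an fsid to a bucket index. */
static unsigned int
exp_bucket(uint64_t fsid)
{
	return ((unsigned int)((fsid ^ (fsid >> 32)) % EXP_HASHSIZE));
}

/* Add an entry at the head of its bucket's list; NULL if out of memory. */
static struct exp_entry *
exp_insert(uint64_t fsid)
{
	struct exp_entry *ep;
	unsigned int b;

	ep = calloc(1, sizeof(*ep));
	if (ep == NULL)
		return (NULL);
	b = exp_bucket(fsid);
	ep->fsid = fsid;
	ep->next = exp_hash[b];
	exp_hash[b] = ep;
	return (ep);
}

/* Look up an entry by walking only the list in its bucket. */
static struct exp_entry *
exp_find(uint64_t fsid)
{
	struct exp_entry *ep;

	for (ep = exp_hash[exp_bucket(fsid)]; ep != NULL; ep = ep->next)
		if (ep->fsid == fsid)
			return (ep);
	return (NULL);
}
```

With a single list, looking up each of N entries while reloading costs on the
order of N * N comparisons; with the table, each lookup only walks roughly
N / EXP_HASHSIZE entries.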

If anyone is willing to review this patch, please do so.

Thanks, rick


Re: Do the pidhashtbl_locks added by r340742 need to be sx locks?

2019-04-10 Thread Rick Macklem
Mateusz Guzik wrote:
>On 4/11/19, Rick Macklem  wrote:
>> Hi,
>>
>> I finally got around to looking at what effect replacing pfind_locked()
>> with pfind() has for the NFSv4 client and it is broken.
>>
>> The problem is that the NFS code needs to call some variant of "pfind()"
>> while holding a mutex lock. The current _pfind() code uses the
>> pidhashtbl_locks, which are "sx" locks.
>>
>> There are a few ways to fix this:
>> 1 - Create a custom version of _pfind() for the NFS client with the sx_X()
>>     calls removed, plus replace the locking of allproc_lock with locking of
>>     all the pidhashtbl_locks, so that the "sx" locks are acquired before the
>>     mutex.
>>     --> Not very efficient, but since it is only done once/sec, I can live
>>     with it.
>> 2 - Similar to the above, but still lock the allproc_lock and use a loop of
>>     FOREACH_PROC_IN_SYSTEM(p) instead of a hash list for the pid in the
>>     custom pfind(). (I don't know if this would be preferable to locking
>>     all the pidhashtbl_locks for other users of pfind()?)
>> 3 - Convert the pidhashtbl_locks to mutexes. Then the NFS client doesn't
>>     need to acquire any proc related locks and it just works.
>>     I can't see anywhere that "sleeps" while holding the pidhashtbl_locks,
>>     so I think they can be converted, although I haven't tried it yet?
>>
>> From my perspective, #3 seems the better solution.
>> What do others think?
>>
>
>Changing the lock type to rwlock may be doable and worthwhile on its own,
>but I don't think it would constitute the right fix.
>
>Preferably there would be an easy to use mechanism which allows
>registering per-process callbacks. Currently it can be somewhat emulated
>with EVENTHANDLERs, but it would give calls for all exiting processes, not
>only the ones of interest. Then there would be no need to periodically
>scan as you would always get notified on process exit.
Long ago when I first did the NFSv4 code for OpenBSD2.6, I had a callback
function pointer in "struct proc" which the NFS code set non-null to get a
callback.
{ The code still has remnants of that because it still has
  nfscl_cleanup_common(), which was code shared by that callback and this
  approach which was used for the Mac OS X port, where I couldn't change
  "struct proc". }
I have never added anything like that for FreeBSD, but I suppose we could look
at doing it that way.
To be honest, since the current code works fine and can be difficult to test
well, I hesitate to change to using a callback.

>Note the current code does not ref processes it is interested in any
>manner and just performs a timestamp check to see if it got the one it
>expected (with pid reuse in mind).
>
>So I think a temporary hack which will do the trick will take the current
>approach further: rely on struct proc being type-stable (i.e. never being
>freed) and also store the pointer. You can always safely PROC_LOCK it, do
>checks to see the proc is alive and has the right timestamp...
Hmm, so you are saying that every element of the proc_zone always has a valid
p_mtx field in it that can be safely PROC_LOCK()'d no matter if the element
refers to a process at that time?
I would also need help with the code to determine if the structure refers to
a process that currently exists with the same pid and creation time.

I suppose saving "p" with the lock/open owner string and then doing what you
suggest is possible, but it would take some work.

For now, I can just grab all the pidhashtbl_locks once/sec and fix head so it
works.

rick


Do the pidhashtbl_locks added by r340742 need to be sx locks?

2019-04-10 Thread Rick Macklem
Hi,

I finally got around to looking at what effect replacing pfind_locked() with
pfind() has for the NFSv4 client and it is broken.

The problem is that the NFS code needs to call some variant of "pfind()" while
holding a mutex lock. The current _pfind() code uses the pidhashtbl_locks,
which are "sx" locks.

There are a few ways to fix this:
1 - Create a custom version of _pfind() for the NFS client with the sx_X() calls
    removed, plus replace the locking of allproc_lock with locking of all the
    pidhashtbl_locks, so that the "sx" locks are acquired before the mutex.
    --> Not very efficient, but since it is only done once/sec, I can live
    with it.
2 - Similar to the above, but still lock the allproc_lock and use a loop of
    FOREACH_PROC_IN_SYSTEM(p) instead of a hash list for the pid in the
    custom pfind(). (I don't know if this would be preferable to locking all
    the pidhashtbl_locks for other users of pfind()?)
3 - Convert the pidhashtbl_locks to mutexes. Then the NFS client doesn't need
    to acquire any proc related locks and it just works.
    I can't see anywhere that "sleeps" while holding the pidhashtbl_locks, so I
    think they can be converted, although I haven't tried it yet?

From my perspective, #3 seems the better solution.
What do others think?

rick


Re: what do jails map 127.0.0.1 to?

2019-02-16 Thread Rick Macklem
Rodney W. Grimes wrote:
[stuff snipped]
> ipv4 127.0.0.1 == ipv6 ::1, see /etc/hosts

Thanks. I've created D19218 with the patch for nfsuserd.c to both check the
mapping of localhost and add support for IPv6.
I've listed bz@ as a reviewer, but if anyone else would like to review it, feel
free to do so.
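
As a small illustration of handling both address families, the following
sketch tests whether a socket address is the literal loopback address for
IPv4 (127.0.0.1) or IPv6 (::1). It is not the code in D19218, and
is_loopback() is a made-up name. Since the question in this thread was what a
jail maps 127.0.0.1 to, a check like this against the literal address is only
a starting point.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdbool.h>

/* Return true if sa holds the IPv4 or IPv6 loopback address. */
static bool
is_loopback(const struct sockaddr *sa)
{
	const struct sockaddr_in *sin;
	const struct sockaddr_in6 *sin6;

	switch (sa->sa_family) {
	case AF_INET:
		sin = (const struct sockaddr_in *)sa;
		return (sin->sin_addr.s_addr == htonl(INADDR_LOOPBACK));
	case AF_INET6:
		sin6 = (const struct sockaddr_in6 *)sa;
		return (IN6_IS_ADDR_LOOPBACK(&sin6->sin6_addr) != 0);
	default:
		return (false);
	}
}
```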

Thanks, rick


