Re: KAUTH_SYSTEM_UNENCRYPTED_SWAP

2020-05-18 Thread Alexander Nasonov
matthew green wrote:
> what's the use-case for disabling encrypted swap later?

It might be too slow on some machines.

> i'd argue we should avoid kauth for this and simply disable
> it always as i've been unable to think of any use case that
> is the only solution.

Always encrypted swap would be even better but ... slow machines.

-- 
Alex


Re: KAUTH_SYSTEM_UNENCRYPTED_SWAP

2020-05-17 Thread Alexander Nasonov
Greg Troxel wrote:
> Kamil Rytarowski  writes:
> 
> > Is it possible to avoid negation in the name?
> >
> > KAUTH_SYSTEM_ENABLE_SWAP_ENCRYPTION
> 
> I think the point is to have one permission to enable it, which is
> perhaps just regular root, and another to disable it if securelevel is
> elevated.
> 
> So perhaps there should be two names, one to enable, one to disable.

Kauth is about security rather than speed or convenience. Disabling
encryption may improve speed but it definitely degrades your security
level. So, you can enable vm.swap_encrypt at any level but you can't
disable it if you care about security.
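
To make the asymmetry concrete, here is a minimal userland sketch of the
policy (not the kernel code). The function name is made up for this mail,
and the "only root may write the sysctl" part is my assumption about the
suser side; the real check goes through kauth_authorize_system() as in
the patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userland model of the proposed policy; hypothetical helper, not kernel code.
 * Returns true when writing newval to vm.swap_encrypt would be allowed.
 */
static bool
swap_encrypt_write_allowed(bool isroot, int securelevel, int newval)
{
	assert(securelevel >= -1);	/* sanity check: -1 is the lowest level */

	if (!isroot)
		return false;		/* assumed: only root may change the knob */
	if (newval != 0)
		return true;		/* enabling is fine at any securelevel */
	return securelevel <= 0;	/* writing 0 is denied once securelevel > 0 */
}
```

So swap_encrypt_write_allowed(true, 1, 1) holds while
swap_encrypt_write_allowed(true, 1, 0) does not.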

-- 
Alex


Re: KAUTH_SYSTEM_UNENCRYPTED_SWAP

2020-05-16 Thread Alexander Nasonov
m...@netbsd.org wrote:
> No objections from me, but I feel like "will commit unless objected"
> should be done on longer time scales. I spend way too much time on
> netbsd and I still have some days I dont get to reading email for
> whatever reason.

It's a small change and we discussed it on source-changes-d and
current-users already. I could have committed it without sending
the patch but it introduces a new public constant and I wasn't
sure how best to name it.

-- 
Alex


KAUTH_SYSTEM_UNENCRYPTED_SWAP

2020-05-16 Thread Alexander Nasonov
The attached patch adds KAUTH_SYSTEM_UNENCRYPTED_SWAP and
forbids changing vm.swap_encrypt from 1 to 0 when
securelevel > 0.

If there are no objections, I'm going to commit it tomorrow.

-- 
Alex
Index: share/man/man9/kauth.9
===
RCS file: /cvsroot/src/share/man/man9/kauth.9,v
retrieving revision 1.112
diff -p -u -u -r1.112 kauth.9
--- share/man/man9/kauth.9  15 Jul 2018 05:16:41 -  1.112
+++ share/man/man9/kauth.9  16 May 2020 19:22:46 -
@@ -25,7 +25,7 @@
 .\" (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
 .\" THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 .\"
-.Dd July 14, 2018
+.Dd May 17, 2020
 .Dt KAUTH 9
 .Os
 .Sh NAME
@@ -488,6 +488,8 @@ Check if changing the RTC offset is allo
 .It Dv KAUTH_REQ_SYSTEM_TIME_TIMECOUNTERS
 Check if manipulating timecounters is allowed.
 .El
+.It Dv KAUTH_SYSTEM_UNENCRYPTED_SWAP
+Check if encrypted swap can be degraded to unencrypted.
 .It Dv KAUTH_SYSTEM_VERIEXEC
 Check if operations on the
 .Xr veriexec 8
Index: share/man/man9/secmodel_securelevel.9
===
RCS file: /cvsroot/src/share/man/man9/secmodel_securelevel.9,v
retrieving revision 1.19
diff -p -u -u -r1.19 secmodel_securelevel.9
--- share/man/man9/secmodel_securelevel.9   18 May 2019 10:21:03 -  1.19
+++ share/man/man9/secmodel_securelevel.9   16 May 2020 19:22:46 -
@@ -26,7 +26,7 @@
 .\" (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
 .\" THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 .\"
-.Dd May 18, 2019
+.Dd May 17, 2020
 .Dt SECMODEL_SECURELEVEL 9
 .Os
 .Sh NAME
@@ -129,6 +129,11 @@ calls are denied.
 .It
 Access to unmanaged memory is denied.
 .It
+The
+.Va vm.swap_encrypt
+.Xr sysctl 8
+variable may not be changed to 0.
+.It
 Only GPIO pins that have been set at
 .Em securelevel
 0 can be accessed.
Index: sys/secmodel/securelevel/secmodel_securelevel.c
===
RCS file: /cvsroot/src/sys/secmodel/securelevel/secmodel_securelevel.c,v
retrieving revision 1.35
diff -p -u -u -r1.35 secmodel_securelevel.c
--- sys/secmodel/securelevel/secmodel_securelevel.c 11 May 2020 19:36:39 -  1.35
+++ sys/secmodel/securelevel/secmodel_securelevel.c 16 May 2020 19:22:47 -
@@ -343,6 +343,11 @@ secmodel_securelevel_system_cb(kauth_cre
result = KAUTH_RESULT_DENY;
break;
 
+   case KAUTH_SYSTEM_UNENCRYPTED_SWAP:
+   if (securelevel > 0)
+   result = KAUTH_RESULT_DENY;
+   break;
+
case KAUTH_SYSTEM_DEBUG:
default:
break;
Index: sys/secmodel/suser/secmodel_suser.c
===
RCS file: /cvsroot/src/sys/secmodel/suser/secmodel_suser.c,v
retrieving revision 1.54
diff -p -u -u -r1.54 secmodel_suser.c
--- sys/secmodel/suser/secmodel_suser.c 16 May 2020 19:12:38 -  1.54
+++ sys/secmodel/suser/secmodel_suser.c 16 May 2020 19:22:47 -
@@ -397,6 +397,11 @@ secmodel_suser_system_cb(kauth_cred_t cr
 
break;
 
+   case KAUTH_SYSTEM_UNENCRYPTED_SWAP:
+   if (isroot)
+   result = KAUTH_RESULT_ALLOW;
+   break;
+
case KAUTH_SYSTEM_VERIEXEC:
switch (req) {
case KAUTH_REQ_SYSTEM_VERIEXEC_ACCESS:
Index: sys/sys/kauth.h
===
RCS file: /cvsroot/src/sys/sys/kauth.h,v
retrieving revision 1.84
diff -p -u -u -r1.84 kauth.h
--- sys/sys/kauth.h 29 Apr 2020 05:54:37 -  1.84
+++ sys/sys/kauth.h 16 May 2020 19:22:47 -
@@ -152,6 +152,7 @@ enum {
KAUTH_SYSTEM_FS_SNAPSHOT,
KAUTH_SYSTEM_INTR,
KAUTH_SYSTEM_KERNADDR,
+   KAUTH_SYSTEM_UNENCRYPTED_SWAP,
 };
 
 /*
Index: sys/uvm/uvm_swap.c
===
RCS file: /cvsroot/src/sys/uvm/uvm_swap.c,v
retrieving revision 1.189
diff -p -u -u -r1.189 uvm_swap.c
--- sys/uvm/uvm_swap.c  10 May 2020 02:38:10 -  1.189
+++ sys/uvm/uvm_swap.c  16 May 2020 19:22:47 -
@@ -2078,12 +2078,34 @@ uvm_swap_decryptpage(struct swapdev *sdp
    explicit_memset(&aes, 0, sizeof aes);
 }
 
+static int
+sysctl_swap_encrypt(SYSCTLFN_ARGS)
+{
+   struct sysctlnode node;
+   int newval, error;
+
+   newval = *(int *)rnode->sysctl_data;
+
+   node = *rnode;
+   node.sysctl_data = &newval;
+   error = sysctl_lookup(SYSCTLFN_CALL(&node));
+   if (error || newp == NULL)
+   return error;
+
+   if (!newval && kauth_authorize_system(l->l_cred,
+   KAUTH_SYSTEM_UNENCRYPTED_SWAP, 0, NULL, NULL, NULL))
+   return EPERM;
+
+   *(int *)rnode->sysctl_data = newval;
+   return 0;
+}
+
 

Re: fexecve

2019-09-08 Thread Alexander Nasonov
Christos Zoulas wrote:
> - We can completely dissallow fexecve in chrooted environments.

Full disk encryption (loaded with cgdroot.kmod) requires a complete
system to be chrooted.

-- 
Alex


ASan: Unauthorized Access

2019-04-06 Thread Alexander Nasonov
Yesterday evening I updated the tree and compiled a GENERIC_KASAN kernel.

It survived a night of three ATF test runs but I see a bunch of
'ASan: Unauthorized Access' stack traces in dmesg, all around
the same time. They all look alike. Is it a known problem?

[ 10325.124325] ASan: Unauthorized Access In 0x80f60ce0: Addr 0x938023b77f80 [4 bytes, read, RedZone]
[ 10325.124325] #0 0x80f60ce0 in kauth_cred_uidmatch
[ 10325.134352] #1 0x816f6dc5 in secmodel_extensions_network_cb
[ 10325.134352] #2 0x80f5f75c in kauth_authorize_action_internal
[ 10325.134352] #3 0x80f61dfa in kauth_authorize_action 
[ 10325.134352] #4 0x80f627a0 in kauth_authorize_network 
[ 10325.134352] #5 0x80aeafd3 in sysctl_inpcblist 
[ 10325.134352] #6 0x80fc447f in sysctl_dispatch 
[ 10325.134352] #7 0x80fc48dc in sys___sysctl 
[ 10325.134352] #8 0x8026b3ae in syscall 
[ 10325.134352] ASan: Unauthorized Access In 0x80f60d05: Addr 0x938023b77f84 [4 bytes, read, RedZone]
[ 10325.134352] #0 0x80f60d05 in kauth_cred_uidmatch
[ 10325.134352] #1 0x816f6dc5 in secmodel_extensions_network_cb
[ 10325.134352] #2 0x80f5f75c in kauth_authorize_action_internal
[ 10325.144377] #3 0x80f61dfa in kauth_authorize_action 
[ 10325.144377] #4 0x80f627a0 in kauth_authorize_network 
[ 10325.144377] #5 0x80aeafd3 in sysctl_inpcblist 
[ 10325.144377] #6 0x80fc447f in sysctl_dispatch 
[ 10325.144377] #7 0x80fc48dc in sys___sysctl 
[ 10325.144377] #8 0x8026b3ae in syscall 
[ 10325.144377] ASan: Unauthorized Access In 0x80f60ce0: Addr 0x93805d15db00 [4 bytes, read, RedZone]
[ 10325.144377] #0 0x80f60ce0 in kauth_cred_uidmatch
[ 10325.144377] #1 0x816f6dc5 in secmodel_extensions_network_cb
[ 10325.144377] #2 0x80f5f75c in kauth_authorize_action_internal
[ 10325.144377] #3 0x80f61dfa in kauth_authorize_action 
[ 10325.144377] #4 0x80f627a0 in kauth_authorize_network 
[ 10325.154403] #5 0x80aeafd3 in sysctl_inpcblist 
[ 10325.154403] #6 0x80fc447f in sysctl_dispatch 
[ 10325.154403] #7 0x80fc48dc in sys___sysctl 
[ 10325.154403] #8 0x8026b3ae in syscall 
[ 10325.154403] ASan: Unauthorized Access In 0x80f60d05: Addr 0x93805d15db04 [4 bytes, read, RedZone]
[ 10325.154403] #0 0x80f60d05 in kauth_cred_uidmatch
[ 10325.154403] #1 0x816f6dc5 in secmodel_extensions_network_cb
[ 10325.154403] #2 0x80f5f75c in kauth_authorize_action_internal
[ 10325.154403] #3 0x80f61dfa in kauth_authorize_action 
[ 10325.154403] #4 0x80f627a0 in kauth_authorize_network 
[ 10325.154403] #5 0x80aeafd3 in sysctl_inpcblist 
[ 10325.154403] #6 0x80fc447f in sysctl_dispatch 
[ 10325.154403] #7 0x80fc48dc in sys___sysctl 
[ 10325.164430] #8 0x8026b3ae in syscall 
[ 10325.164430] ASan: Unauthorized Access In 0x80f60ce0: Addr 0x938023b77f80 [4 bytes, read, RedZone]
[ 10325.164430] #0 0x80f60ce0 in kauth_cred_uidmatch
[ 10325.164430] #1 0x816f6dc5 in secmodel_extensions_network_cb
[ 10325.164430] #2 0x80f5f75c in kauth_authorize_action_internal
[ 10325.164430] #3 0x80f61dfa in kauth_authorize_action 
[ 10325.164430] #4 0x80f627a0 in kauth_authorize_network 
[ 10325.164430] #5 0x80aeafd3 in sysctl_inpcblist 
[ 10325.164430] #6 0x80fc447f in sysctl_dispatch 
[ 10325.164430] #7 0x80fc48dc in sys___sysctl 
[ 10325.164430] #8 0x8026b3ae in syscall 
[ 10325.164430] ASan: Unauthorized Access In 0x80f60d05: Addr 0x938023b77f84 [4 bytes, read, RedZone]
[ 10325.164430] #0 0x80f60d05 in kauth_cred_uidmatch
[ 10325.174456] #1 0x816f6dc5 in secmodel_extensions_network_cb
[ 10325.174456] #2 0x80f5f75c in kauth_authorize_action_internal
[ 10325.174456] #3 0x80f61dfa in kauth_authorize_action 
[ 10325.174456] #4 0x80f627a0 in kauth_authorize_network 
[ 10325.174456] #5 0x80aeafd3 in sysctl_inpcblist 
[ 10325.174456] #6 0x80fc447f in sysctl_dispatch 
[ 10325.174456] #7 0x80fc48dc in sys___sysctl 
[ 10325.174456] #8 0x8026b3ae in syscall 
[ 10325.174456] ASan: Unauthorized Access In 0x80f60ce0: Addr 0x93805d15db00 [4 bytes, read, RedZone]
[ 10325.174456] #0 0x80f60ce0 in kauth_cred_uidmatch
[ 10325.174456] #1 0x816f6dc5 in secmodel_extensions_network_cb
[ 10325.174456] #2 0x80f5f75c in kauth_authorize_action_internal
[ 10325.174456] #3 0x80f61dfa in kauth_authorize_action 
[ 10325.184518] #4 0x80f627a0 in kauth_authorize_network 
[ 10325.184518] #5 0x80aeafd3 in sysctl_inpcblist 
[ 10325.184518] #6 0x80fc447f in sysctl_dispatch 
[ 10325.184518] #7 0x80fc48dc in sys___sysctl 
[ 10325.184518] #8 0x8026b3ae in 

Re: Removing PF

2019-03-31 Thread Alexander Nasonov
Mouse wrote:
> Security is not a boolean.

Some say that security isn't a product, security is a process.

Surgeons among us have a particular view on that process ;-)

Alex


Re: Time to merge the pgoyette-compat branch (take two)

2018-09-12 Thread Alexander Nasonov
Martin Husemann wrote:
> Also I wonder if we could do some nm digging and awk scripts with tsort
> to find potential symbol collisions or missing symbols not properly
> covered by module dependencies.

BTW, I'm working on Lua bindings for elftoolchain [1]. In principle,
it should be a better alternative to nm+awk hacks but I'm not sure if
the code can handle all required functionality. You're welcome to take
a look :-)

[1] https://www.github.com/xmmswap/luaelftoolchain

-- 
Alex


Re: Too many PMC implementations

2018-08-25 Thread Alexander Nasonov
David Holland wrote:
> On Sat, Aug 25, 2018 at 11:26:07AM +0100, Alexander Nasonov wrote:
>  > 1. It's not standardised and it will very likely change in future versions
> 
> That doesn't really matter as long as you're only using one version at
> a time...

If bytecode is generated from a valid Lua program, it indeed takes
very little effort to update to a new version. But updating handcrafted
bytecode may take a bit of time.

>  > 2. There is no bpf_validate for Lua bytecode. In fact, the Lua team abandoned
>  >the idea of bytecode validation a few years ago. From the Lua 5.3 manual:
>  > 
>  >Lua does not check the consistency of binary chunks. Maliciously
>  >crafted binary chunks can crash the interpreter.
> 
> Are we talking about installing untrusted/unprivileged kernel trace
> logic? Because that seems like a bad idea, or at least a very hard
> thing to get right... and if not, it doesn't matter if there's a
> validator.

Lua bytecode is Turing-complete and not validatable, but I'm pretty sure
some subset of it (e.g. no loops, no strings, etc.) can be validated.
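
As an illustration of what I mean by a validatable subset, here is a toy
checker in the spirit of bpf_validate: it accepts a program only if it
ends with a return and every jump goes forward and stays in bounds, so
the program can't loop. The instruction format and the names are invented
for this example and have nothing to do with real Lua or BPF opcodes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum op { OP_NOP, OP_JMP, OP_RET };	/* invented opcodes */

struct insn {
	enum op op;
	size_t off;	/* OP_JMP only: skip this many instructions forward */
};

/* Accept only loop-free programs: forward, in-bounds jumps and a final return. */
static bool
toy_validate(const struct insn *prog, size_t len)
{
	assert(prog != NULL);

	if (len == 0 || prog[len - 1].op != OP_RET)
		return false;	/* must end with a return */
	for (size_t i = 0; i < len; i++) {
		if (prog[i].op != OP_JMP)
			continue;
		/* The target is i + 1 + off; it must be inside the program. */
		if (prog[i].off >= len - i - 1)
			return false;
	}
	return true;
}
```

Because the offset is unsigned, a jump can never go backwards, which is
what makes termination trivial to prove for this subset.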

> (Also, isn't EBPF not really validatable either, or am I mixing it
> up with something else?)

Last I checked, the author(s) of eBPF claimed that it can be validated.

-- 
Alex


Re: Too many PMC implementations

2018-08-25 Thread Alexander Nasonov
Kamil Rytarowski wrote:
> There is already a Lua-powered solution for traces in Linux: ktap. It
> uses nice rules written natively in Lua.. however it seems to be
> abandoned in favor of eBPF.

I see two potential problems with using Lua bytecode:

1. It's not standardised and it will very likely change in future versions

2. There is no bpf_validate for Lua bytecode. In fact, the Lua team abandoned
   the idea of bytecode validation a few years ago. From the Lua 5.3 manual:

   Lua does not check the consistency of binary chunks. Maliciously
   crafted binary chunks can crash the interpreter.

Alex



Re: secmodel_securelevel(9) and machdep.svs.enabled

2018-04-26 Thread Alexander Nasonov
26.04.2018, 13:03, "Thor Lancelot Simon" :
> If that is true it's a serious regression.  Are you talking about the
> pathological case where "options INSECURE" is set?
>
> Thor

My reasoning was based entirely upon reading the secmodel_securelevel(9) and
kmem(4) man pages. I don't know whether it accurately reflects reality.

Alex



Re: secmodel_securelevel(9) and machdep.svs.enabled

2018-04-25 Thread Alexander Nasonov
Alexander Nasonov wrote:
> Thinking a bit more about this, I don't think my patch will prevent
> data leakage from the kernel because /dev/mem and /dev/kmem are
> readable at all securelevels.

There is an important distinction, though. Code in sys/dev/mm.c
can be changed to scramble sensitive pages (e.g. cgd(4) keys) while
Meltdown is a wild beast and it's nearly impossible to control.

-- 
Alex


Re: secmodel_securelevel(9) and machdep.svs.enabled

2018-04-25 Thread Alexander Nasonov
Maxime Villard wrote:
> Yes, it's fine. I've never taken care of securelevel, but your change
> can't be incorrect. Perhaps I would use just KAUTH_MACHDEP_SVS instead
> of KAUTH_MACHDEP_SVS_DISABLE, in case another operation gets added in
> the future, but that doesn't matter.

I don't think securelevel should care about details of SVS. If you
want to introduce levels of SVS, KAUTH_MACHDEP_SVS_DISABLE can still
be used to prevent lowering (instead of disabling SVS completely).
Perhaps the name can be changed to KAUTH_MACHDEP_SVS_DEGRADE or
something similar but it's not that important.

Thinking a bit more about this, I don't think my patch will prevent
data leakage from the kernel because /dev/mem and /dev/kmem are
readable at all securelevels. It can only prevent leakage in some
situations. For example, when root is compromised inside a chroot
and the chroot directory is on a file system mounted with nodev.

-- 
Alex


Re: secmodel_securelevel(9) and machdep.svs.enabled

2018-04-24 Thread Alexander Nasonov
Alexander Nasonov wrote:
> When securelevel is set, should we lock the 1->0 change for
> machdep.svs.enabled (and possibly for other sysctls related
> to recent security mitigations)?

Can I commit the attached patch? (doc update will follow)

-- 
Alex
Index: src/sys/sys/kauth.h
===
RCS file: /cvsroot/src/sys/sys/kauth.h,v
retrieving revision 1.75
diff -p -u -u -r1.75 kauth.h
--- src/sys/sys/kauth.h 28 Aug 2017 00:46:07 -  1.75
+++ src/sys/sys/kauth.h 24 Apr 2018 17:59:13 -
@@ -320,7 +320,8 @@ enum {
KAUTH_MACHDEP_NVRAM,
KAUTH_MACHDEP_UNMANAGEDMEM,
KAUTH_MACHDEP_PXG,
-   KAUTH_MACHDEP_X86PMC
+   KAUTH_MACHDEP_X86PMC,
+   KAUTH_MACHDEP_SVS_DISABLE
 };
 
 /*
Index: src/sys/secmodel/suser/secmodel_suser.c
===
RCS file: /cvsroot/src/sys/secmodel/suser/secmodel_suser.c,v
retrieving revision 1.43
diff -p -u -u -r1.43 secmodel_suser.c
--- src/sys/secmodel/suser/secmodel_suser.c 14 Jun 2017 17:48:41 -  1.43
+++ src/sys/secmodel/suser/secmodel_suser.c 24 Apr 2018 17:59:13 -
@@ -854,6 +854,7 @@ secmodel_suser_machdep_cb(kauth_cred_t c
case KAUTH_MACHDEP_UNMANAGEDMEM:
case KAUTH_MACHDEP_PXG:
case KAUTH_MACHDEP_X86PMC:
+   case KAUTH_MACHDEP_SVS_DISABLE:
if (isroot)
result = KAUTH_RESULT_ALLOW;
break;
Index: src/sys/secmodel/securelevel/secmodel_securelevel.c
===
RCS file: /cvsroot/src/sys/secmodel/securelevel/secmodel_securelevel.c,v
retrieving revision 1.30
diff -p -u -u -r1.30 secmodel_securelevel.c
--- src/sys/secmodel/securelevel/secmodel_securelevel.c 25 Feb 2014 18:30:13 -  1.30
+++ src/sys/secmodel/securelevel/secmodel_securelevel.c 24 Apr 2018 17:59:13 -
@@ -494,6 +494,11 @@ secmodel_securelevel_machdep_cb(kauth_cr
result = KAUTH_RESULT_DENY;
break;
 
+   case KAUTH_MACHDEP_SVS_DISABLE:
+   if (securelevel > 0)
+   result = KAUTH_RESULT_DENY;
+   break;
+
case KAUTH_MACHDEP_CPU_UCODE_APPLY:
if (securelevel > 1)
result = KAUTH_RESULT_DENY;
Index: src/sys/arch/x86/x86/svs.c
===
RCS file: /cvsroot/src/sys/arch/x86/x86/svs.c,v
retrieving revision 1.17
diff -p -u -u -r1.17 svs.c
--- src/sys/arch/x86/x86/svs.c  30 Mar 2018 19:58:05 -  1.17
+++ src/sys/arch/x86/x86/svs.c  24 Apr 2018 17:59:11 -
@@ -38,6 +38,7 @@ __KERNEL_RCSID(0, "$NetBSD: svs.c,v 1.17
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -737,11 +738,13 @@ sysctl_machdep_svs_enabled(SYSCTLFN_ARGS
error = 0;
else
error = EOPNOTSUPP;
-   } else {
-   if (svs_enabled)
+   } else if (svs_enabled) {
+   error = kauth_authorize_machdep(kauth_cred_get(),
+   KAUTH_MACHDEP_SVS_DISABLE, NULL, NULL, NULL, NULL);
+   if (!error)
error = svs_disable();
-   else
-   error = 0;
+   } else {
+   error = 0;
}
 
return error;


secmodel_securelevel(9) and machdep.svs.enabled

2018-04-19 Thread Alexander Nasonov
When securelevel is set, should we lock the 1->0 change for
machdep.svs.enabled (and possibly for other sysctls related
to recent security mitigations)?

-- 
Alex


Re: KASSERT in exec_elf.c for DYN executable when p_align==0

2018-03-18 Thread Alexander Nasonov
Christos Zoulas wrote:
> In article <20180317225722.GA1538@neva>,
> Alexander Nasonov  <al...@yandex.ru> wrote:
> >Coverity (CID 1427746) complains about a division by zero when
> >align is 0 in all PT_LOAD headers.
> >...
> >It would be nice to perform sanity checks of tainted executables
> >instead of panicking.
> 
> Fixed, thanks.

But it doesn't fix CID 1427746. Given that both 0 and 1 specify no alignment,
the fix is simple:

-   for (align = i = 0; i < eh->e_phnum; i++)
+   align = 1;
+   for (i = 0; i < eh->e_phnum; i++)
if (ph[i].p_type == PT_LOAD && ph[i].p_align > align)
align = ph[i].p_align;
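
For illustration, the same idea as a self-contained userland sketch. The
struct and function names are made up for this mail (they are not the
kernel's types); it only shows why starting from 1 avoids the division by
zero and how a power-of-two check would reject a bogus p_align.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the program header fields used here. */
struct seg {
	int is_load;		/* nonzero for a PT_LOAD segment */
	uint64_t p_align;
};

/* 0 and 1 both mean "no alignment"; anything else must be a power of two. */
static int
align_ok(uint64_t a)
{
	return (a & (a - 1)) == 0;
}

/* Largest alignment among loadable segments, never less than 1. */
static uint64_t
max_align(const struct seg *s, size_t n)
{
	uint64_t align = 1;

	for (size_t i = 0; i < n; i++)
		if (s[i].is_load && s[i].p_align > align)
			align = s[i].p_align;
	assert(align != 0);	/* safe to divide by or to use in align - 1 */
	return align;
}
```

With every p_align set to 0, max_align() returns 1 rather than 0.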

Alex


Re: KASSERT in exec_elf.c for DYN executable when p_align==0

2018-03-17 Thread Alexander Nasonov
Alexander Nasonov wrote:
> Steps to reproduce (on amd64 compiled with MKPIE=yes):
> 
> bvi -s 0x0e2 /bin/echo # change 20 to 00
> bvi -s 0x11a /bin/echo # change 20 to 00
> 
> /bin/echo # boom!
> 
> It would be nice to perform sanity checks of tainted executables
> instead of panicking.

Attached is a simple patch. I don't know (yet) if it works.

Alex
Index: exec_elf.c
===
RCS file: /cvsroot/src/sys/kern/exec_elf.c,v
retrieving revision 1.94
diff -p -u -u -r1.94 exec_elf.c
--- exec_elf.c  17 Mar 2018 00:30:50 -  1.94
+++ exec_elf.c  17 Mar 2018 23:10:43 -
@@ -129,7 +129,8 @@ elf_placedynexec(struct exec_package *ep
Elf_Addr align, offset;
int i;
 
-   for (align = i = 0; i < eh->e_phnum; i++)
+   align = 1;
+   for (i = 0; i < eh->e_phnum; i++)
if (ph[i].p_type == PT_LOAD && ph[i].p_align > align)
align = ph[i].p_align;
 
@@ -679,6 +680,12 @@ exec_elf_makecmds(struct lwp *l, struct 
 
for (i = 0; i < eh->e_phnum; i++) {
pp = &ph[i];
+   if (pp->p_type == PT_LOAD &&
+   (pp->p_align & (pp->p_align - 1)) != 0) {
+   DPRINTF("bad alignment %#jx", (uintmax_t)pp->p_align);
+   error = ENOEXEC;
+   goto bad;
+   }
if (pp->p_type == PT_INTERP) {
if (pp->p_filesz < 2 || pp->p_filesz > MAXPATHLEN) {
DPRINTF("bad interpreter namelen %#jx",


KASSERT in exec_elf.c for DYN executable when p_align==0

2018-03-17 Thread Alexander Nasonov
Coverity (CID 1427746) complains about a division by zero when
align is 0 in all PT_LOAD headers.

I tried reproducing the problem but the code in question is inside
'if (offset < epp->ep_vm_minaddr)' and it isn't easily reproducible.

However, I hit KASSERT panic:

"(offset & (align - 1)) == 0" file sys/kern/exec_elf.c, line 139.

Steps to reproduce (on amd64 compiled with MKPIE=yes):

bvi -s 0x0e2 /bin/echo # change 20 to 00
bvi -s 0x11a /bin/echo # change 20 to 00

/bin/echo # boom!

It would be nice to perform sanity checks of tainted executables
instead of panicking.

-- 
Alex


Re: 8.0 crash inside KVM

2018-02-24 Thread Alexander Nasonov
Michael van Elst wrote:
> al...@yandex.ru (Alexander Nasonov) writes:
> 
> >inside ProxMox kvm and rebooted it a couple of times. But on the
> >last reboot, the kernel stuck at something and I sent a reboot from
> >the console. A bit later it panicked.
> 
> If "console" means: issuing the reboot from DDB, then this can be
> expected. If it means: running multiple /sbin/reboot concurrently,
> then it's a bug.

From the ProxMox noVNC console.

-- 
Alex


Re: 8.0 crash inside KVM

2018-02-24 Thread Alexander Nasonov
Alexander Nasonov wrote:
> I don't know if a panic in a VM should be reported to gnats. I'm sending
> the report to tech-kern instead.
> 
> I successfully installed 8.0_BETA amd64 GENERIC, which I downloaded from
> 
> https://nycdn.netbsd.org/pub/NetBSD-daily/netbsd-8/201802240710Z/images/
> 

One more panic on shutdown:

panic: kernel diagnostic assertion "pipe != NULL" failed:
/usr/src/sys/dev/usb/usbdi.c", line 670
Fatal breakpoint trap in supervisor mode
Trap type 1 code 0 rip 0x80224d95 cs 0x8 rflags 0x246 cr2 0 ilevel 0 
rsp 0xfe80028a7d30
curlwp 0xfe8022707500 pi 0.24 lowest kstack 0xe80028a42c0
Stopped in pid 0.24 (system) at netbsd:breakpoint+0x5c: leave

breakpoint() at netbsd:breakpoint+9x5c
vpanic() at netbsd:vpanic+0x140
ch_voltag_convert_in() at netbsd:ch_voltag_convert_in
usbd_abort_pipe() at netbsd:usbd_abort_pipe+0x6b
usbd_kill_pipe() at netbsd:usbd_kill_pipe+0x11
usbd_new_device() at netbsd:usbd_new_device+0x3ec
usb_doattach() at netbsd:usb_doattach+0xa8
config_interrupt_thread() at netbsd:config_interrupt_thread+0x30


Snapshots are available. I can give remote access to noVNC for debugging.

-- 
Alex


8.0 crash inside KVM

2018-02-24 Thread Alexander Nasonov
I don't know if a panic in a VM should be reported to gnats. I'm sending
the report to tech-kern instead.

I successfully installed 8.0_BETA amd64 GENERIC, which I downloaded from

https://nycdn.netbsd.org/pub/NetBSD-daily/netbsd-8/201802240710Z/images/

inside ProxMox kvm and rebooted it a couple of times. But on the
last reboot, the kernel got stuck on something and I sent a reboot from
the console. A bit later it panicked.

Their noVNC console doesn't support copy/paste. The information below is 
incomplete and may be inaccurate.

Mutex error: mutex_destroy,387: assertion failed: !MUTEX_OWNED(mtx->mtx_owner) 
& !MUTEX_HAS_WAITERS(mtx)

lock address: 0xfe81dbfaa0a0
current cpu :  1
current lwp : 0xfe81bdff94a0
owner field : 0x81481c60 wait/spin:   0/0

...

trap type 1 code 0 rip 0x80224d95 cs 0x8 rflags 0x246 cr2 0 ilevel 0 
rsp 0xfe810aeccc60
curlwp 0xfe81bdff94a0 pid 0.15 lowest kstack 0xfe810aec92c0
Stopped in pid 0.15 (system) at netbsd:breakpoint+0x5: leave

breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+x0140
snprintf() at netbsd:snprintf
lockdebug_abort() at netbsd:lockdebug_abort+x06e
mutex_destroy() at netbsd:mutex_destroy+0x3d
disk_destroy() at netbsd:disk_destroy+0x18
wddetach() at netbsd:wddetach+0x10c
config_detach() at netbsd:config_detach+0x10b
config_detach_all() at netbsd:config_detach_all+0x97
cpu_reboot() at netbsd:cpu_reboot+0x173
sysmon_pswitch_event() at netbsd:sysmon_pswitch_event+0x14e
sysmon_task_queue_thread() at netbsd:sysmon_task_queue_thread+x044


Some bits from dmesg:

ahcisata0 at pci0 dev 7 function 0: vendor 8086 product 2922 (rev 0x02)
ahcisata0: interrupting at ioapic0 pin 11
ahcisata0: 64-bit DMA
ahcisata0: AHCI revision 1.0, 6 ports, 32 slots, CAP 
0xc0141f05
atabus2 at ahcisata0 channel 0
atabus3 at ahcisata0 channel 1

...

ahcisata0 port 0: device present, speed: 1.5Gb/s
atapibus0 at atabus1: 2 targets
cd0 at atapibus0 drive 0:  cdrom removable
cd0: 32-bit data port
cd0: drive supports PIO mode 4, DMA mode 2 (using DMA)
uhub0: device problem, disabling port 1
acpi0: power button pressed, shutting down!
syncing disks ... done
cd0: detached
midi0: detached
sysbeep0: detached
atapibus0: detached
pci2: detached
pci1: detached
wd0 at atabus2 drive 0
wd0: 
wd0: drive supports 16-sector PIO transfers, LBA48 addressing
wd0: 128 GB, 266305 cyl, 16 head, 63 sec, 512 bytes/sect x 268435456 sectors
wd0(ahcisata0:0:0) using PIO mode 4, DMA mode 2, Ultra DMA mode 5 (Ultra/100) 
(using DMA)
atabus7: detached
pad0: output 44100Hz, 16-bit, stereo
audio0 at pad0: half duplex, playback, capture, mmap
pad0: Virtual format configured - Format SLINEAR, precision 16, channels 2, 
frequency 44100
pad0: Latency: 139 milliseconds
spkr1 at audio0: PC speaker (synthesized)
atabus6: detached
atabus5: detached
atabus4: detached
atabus3: detached
atabus1: detached
atabus0: detached
WARNNG: 1 error while detecting hardware; check system log.
ppb1: detached
boot device: wd0
ppb0: detached
root on wd0a dumps on wd0bpchb0: detached

Skipping crash dump on recursive panic


Alex


Re: Spectre

2018-01-18 Thread Alexander Nasonov
m...@netbsd.org wrote:
> Considering JITs are a much bigger risk, and how cheap this suggestion
> is, should we use lfence / similar for other architectures within sljit
> (and possibly lua)?

While everyone seems to be concerned about the negative performance impact
of Spectre, I recently worked on preventing speculative loads to
avoid trashing caches and to keep tail performance in good
shape. I found that lfence was ineffective and I had to insert a data
dependency and some extra work to distract the processor. It worked
much better and it improved tail latencies.

If you look at Intel's documentation of lfence, they make it very clear
that lfence doesn't prevent speculative loads.

-- 
Alex


Re: Go binary panics on amd64-current - SIG*unknown

2018-01-07 Thread Alexander Nasonov
Christos Zoulas wrote:
> Index: linux_sigaction.c
> ===
> RCS file: /cvsroot/src/sys/compat/linux/common/linux_sigaction.c,v
> retrieving revision 1.34
> diff -u -u -r1.34 linux_sigaction.c
> --- linux_sigaction.c 17 Oct 2008 20:21:34 -  1.34
> +++ linux_sigaction.c 7 Jan 2018 00:10:43 -
> @@ -84,6 +84,15 @@
>   linux_old_to_native_sigaction(, );
>   }
>   sig = SCARG(uap, signum);
> + /*
> +  * XXX: Linux has 33 realtime signals, the go binary wants to
> +  * reset all of them; nothing else uses the last RT signal, so for
> +  * now ignore it.
> +  */
> + if (sig == LINUX__NSIG) {
> + uprintf("%s: setting signal %d ignored\n", __func__, sig);
> + sig--;  /* back to 63 which is ignored */
> + }
>   if (sig < 0 || sig >= LINUX__NSIG)
>   return (EINVAL);
>   if (sig > 0 && !linux_to_native_signo[sig]) {

It works, but you patched the wrong file. This change should go into
linux_signal.c.

-- 
Alex


Re: Go binary panics on amd64-current - SIG*unknown

2018-01-07 Thread Alexander Nasonov
Ah, I thought I mentioned it in my email. Yes, it's a Linux binary.



Go binary panics on amd64-current - SIG*unknown

2018-01-06 Thread Alexander Nasonov
I downloaded IACA tool from intel.com but I couldn't run it:

$ ktrace ./iaca-v3.0-lin64/iaca
fatal error: rt_sigaction read failure

runtime stack:
runtime.throw(0x75de24, 0x19)
/nfs/iil/disks/kfw/tools/go/go-latest/src/runtime/panic.go:596 +0x95
runtime.getsig(0x40, 0x441c60)
/nfs/iil/disks/kfw/tools/go/go-latest/src/runtime/os_linux.go:427 +0x92
runtime.initsig(0xe59d00)
/nfs/iil/disks/kfw/tools/go/go-latest/src/runtime/signal_unix.go:79 +0x98
runtime.mstart1()
/nfs/iil/disks/kfw/tools/go/go-latest/src/runtime/proc.go:1175 +0xa4
runtime.mstart()
/nfs/iil/disks/kfw/tools/go/go-latest/src/runtime/proc.go:1149 +0x64

goroutine 1 [runnable]:
runtime.main()
/nfs/iil/disks/kfw/tools/go/go-latest/src/runtime/proc.go:106
runtime.goexit()
/nfs/iil/disks/kfw/tools/go/go-latest/src/runtime/asm_amd64.s:2197 +0x1

goroutine 17 [syscall, locked to thread]:
runtime.goexit()
/nfs/iil/disks/kfw/tools/go/go-latest/src/runtime/asm_amd64.s:2197 +0x1


$ kdump

  5477   5477 iaca CALL  rt_sigaction(SIG*unknown 64*,0,0x7f7fe580,8)
  5477   5477 iaca RET   rt_sigaction -1 errno -22 Invalid argument
  5477   5477 iaca CALL  rt_sigaction(SIG*unknown 64*,0,0x7f7fe8b8,8)
  5477   5477 iaca RET   rt_sigaction -1 errno -22 Invalid argument
 
$ uname -a
NetBSD neva 8.99.7 NetBSD 8.99.7 (GENERIC_KASLR) #0: Sat Nov 18 09:54:53 GMT 2017  alnsn@neva:/home/alnsn/netbsd-current/clean/src/sys/arch/amd64/compile/obj/GENERIC_KASLR amd64

I don't know if it matters but I have PAX protections enabled:

$ sysctl -a |grep -w pax | grep enabled
security.pax.mprotect.enabled = 1
security.pax.segvguard.enabled = 1
security.pax.aslr.enabled = 1

-- 
Alex


Re: Restricting rdtsc [was: kernel aslr]

2018-01-05 Thread Alexander Nasonov
Taylor R Campbell wrote:
> > Date: Tue, 28 Mar 2017 16:58:58 +0200
> > From: Maxime Villard 
> > 
> > Having read several papers on the exploitation of cache latency to defeat
> > aslr (kernel or not), it appears that disabling the rdtsc instruction is a
> > good mitigation on x86. However, some applications can legitimately use it,
> > so I would rather suggest restricting it to root instead.
> 
> Put barriers in the way of legitimate applications to thwart
> hypothetical attackers who will... step around them and use another
> time source, of which there are many options in the system?  This
> sounds more like cutting off the nose to spite the face than a good
> mitigation against real attacks.

Old thread but the authors of the spectre paper did exactly what Taylor said:

https://spectreattack.com/spectre.pdf

"JavaScript does not provide access to the rdtscp instruction, and
Chrome intentionally degrades the accuracy of its high-resolution
timer to dissuade timing attacks using performance.now() [1]. However,
the Web Workers feature of HTML5 makes it simple to create a separate
thread that repeatedly decrements a value in a shared memory location
[18, 32]. This approach yielded a high-resolution timer that provided
sufficient resolution."

-- 
Alex


Re: raid and cgd

2017-11-26 Thread Alexander Nasonov
Edgar Fuß wrote:
> RAIDframe operates on units called stripes. FFS mostly operates on FS blocks. 
> If one FS block is not exactly a whole number of stripes (either because of 
> misalignment or because an FS block is smaller than a RAID stripe), then each 
> write of an FS block will force the RAID to do a RMW cycle.
> Further down, RAIDframe will operate on the components in units of stripe
> size divided by (components minus one). If this is not aligned to whatever the
> component can natively modify (think 512e discs), then the component will RMW.
> Doing RMW will dramatically (at least by a factor of three, more likely ten) 
> reduce performance.

Your description matches a paragraph in raidctl(8) that starts with
'Tuning RAID 5 sets is trickier.' while my setup is RAID 1 and the
recommended values for SectPerSU are 32 to 128. I set it to 128 but
I'll try 64 to see if it makes a difference.
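
For what it's worth, the condition you describe for parity RAID can be
written down as a small check. This is only a sketch of the arithmetic:
the function names are mine, and the 512-byte sector size is an
assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bytes per stripe unit for a given SectPerSU, assuming 512-byte sectors. */
static uint64_t
stripe_unit_bytes(uint64_t sect_per_su)
{
	return sect_per_su * 512;
}

/*
 * True when a write of len bytes at byte offset off covers only whole
 * stripes, i.e. a parity RAID would not need a read-modify-write cycle.
 */
static bool
covers_whole_stripes(uint64_t off, uint64_t len, uint64_t stripe_bytes)
{
	assert(stripe_bytes != 0);
	return len != 0 && off % stripe_bytes == 0 && len % stripe_bytes == 0;
}
```

With SectPerSU set to 128 a stripe unit is 64 KiB, so a 32 KiB FS block
never covers a whole one, while SectPerSU 64 gives 32 KiB units.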

-- 
Alex


Re: raid and cgd

2017-11-26 Thread Alexander Nasonov
Edgar Fuß wrote:
> > when raid is configured for the first time, it needs to do something 
> > with every block?
> raidctl -i, yes.

That's what I thought. Thanks for confirming.

> Other than that, you'll need to align your FS blocks with RAID stripes 
> and each component's stripe part with whatever cgd operates on.

Can you please elaborate? Do they have to match for the best performance,
or should one be a multiple of the other?

> Why do you do RAID-on-CGD and not CGD-on-RAID?

Do I?

First, I configured raid0 and then cgd0 as follows:

cgdconfig -s cgd0 /dev/raid0d aes-cbc 256 < /dev/urandom

This was a temporary cgd0 configuration to scrub data.

-- 
Alex


Re: raid and cgd

2017-11-26 Thread Alexander Nasonov
Mouse wrote:
> > My raid is made of two disks dk8 and dk9.  When there is an activity
> > on cgd, dk8 and dk9 shows more than 2x bytes each than cgd.  Why is
> > that?
> 
> Speculation: read-modify-write cycles?
> 
> Speculation: cgd's encryption leading to accessing more of its
> underlying disk than the upper-layer access?

One more: when raid is configured for the first time, it needs to
do something with every block? I'm a newbie and so far I spent more
time learning about raid commands on linux (disgusting!). Time to
read raid(4) and raidctl(8) properly.

-- 
Alex


Re: raid and cgd

2017-11-26 Thread Alexander Nasonov
Alexander Nasonov wrote:
> My raid is made of two disks dk8 and dk9. When there is an activity on cgd,
> dk8 and dk9 shows more than 2x bytes each than cgd. Why is that?
>
>    wd0 1525   71M  66.8   msix2 vec 0 2730 ftarg
>    wd1 1524   71M  59.3   msix2 vec 2   75 wired
>    dk9 1524   71M  59.8
>    dk8 1524   71M  67.3
>  raid0  761   24M  86.9
>   cgd0  761   24M  99.5

It's still scrubbing raid0 (dd if=/dev/zero of=/dev/rcgd0d bs=32k)
but stats now look different and there is no 2-3x difference in
transfer rates anymore.

cgd0 is configured with aes-cbc 256.

Disks: seeks xfers bytes %busy   msix1 vec 4 2048 fmin
   wd0 3586  112M  30.7   msix2 vec 0 2730 ftarg
   dk3   msix2 vec 1  itarg
   wd1 3586  112M  29.6   msix2 vec 2   75 wired
   dk4   msix2 vec 3  pdfre
   dk5   msix2 vec 4  pdscn
   dk6
   dk7
   dk9 3568  111M  31.5
  dk10
   dk0
   dk1
   dk2
   dk8 3568  111M  32.0
  dk11
 raid0 3568  111M  39.0
  cgd0 3550  111M  94.5

-- 
Alex


raid and cgd

2017-11-26 Thread Alexander Nasonov
My raid is made of two disks, dk8 and dk9. When there is activity on cgd,
dk8 and dk9 each show more than 2x the bytes that cgd does. Why is that?

$ systat 2
:vmstat

Disks: seeks xfers bytes %busy   msix1 vec 4 2048 fmin
   wd0 1525   71M  66.8   msix2 vec 0 2730 ftarg
   dk3  1203   0.3   msix2 vec 1  itarg
   wd1 1524   71M  59.3   msix2 vec 2   75 wired
   dk4   msix2 vec 3  pdfre
   dk5   msix2 vec 4  pdscn
   dk6
   dk7
   dk9 1524   71M  59.8
  dk10
   dk0
   dk1
   dk2
   dk8 1524   71M  67.3
  dk11
 raid0  761   24M  86.9
  cgd0  761   24M  99.5

-- 
Alex


Re: Restricting rdtsc [was: kernel aslr]

2017-04-04 Thread Alexander Nasonov
Maxime Villard wrote:
> On 29/03/2017 at 00:49, Alexander Nasonov wrote:
> > I think this should be either all-or-nothing. You either have rdtsc as
> > a time source or you don't. Similar for rdpmc (and other performance
> > counters).
> 
> Well, the idea was to make the availability more fine-grained.
> 
> 
> Seeing the general skepticism that prevails, I guess we can just forget about
> this idea.

There are two more or less independent things: fine-grained time source
and userspace rdtsc. The latter is often used directly when vdso isn't
available. If we implement vdso, I assume that software that needs rdtsc
can be taught to call it via vdso.

With vdso implemented, we can have a flag that enables/disables
vdso globally as well as per process (paxctl?). Independently,
the kernel can be configured to use either fine-grained or hackerproof
time source for regular (non-vdso) system calls.

Alex


Re: Restricting rdtsc [was: kernel aslr]

2017-03-28 Thread Alexander Nasonov
Maxime Villard wrote:
> Having read several papers on the exploitation of cache latency to defeat
> aslr (kernel or not), it appears that disabling the rdtsc instruction is a
> good mitigation on x86. However, some applications can legitimately use it,
> so I would rather suggest restricting it to root instead.

Why does root need it? For ntp? Properly implemented ntp should be
privsep'ed.

I think this should be either all-or-nothing. You either have rdtsc as
a time source or you don't. Similar for rdpmc (and other performance
counters).

-- 
Alex


Re: A Fast Alternative to Modulo Reduction

2016-12-16 Thread Alexander Nasonov
Mouse wrote:
> Abhinav Upadhyay wrote:
> >> Both operations are not _equivalent_ but he proves that the latter
> >> is also a fair mapping of x in the range [0, N) for 32 bit integers.
> 
> Only under certain assumptions about x - loosely put, that the entropy
> in x is distributed equally across all of x's bits.  But, for example,
> if you use the common string hash function that works like h=0 then
> loop{h=(h*37)+*cp++}, short strings will have all 0s in the high bits.

I can confirm that for several common hash functions (murmur, xxh,
jenkins), a distribution of bits isn't good enough to generate an
acyclic graph for chm (cdb uses the chm algorithm) for many inputs
while fast_divide32 reduction works fine most of the time.

Code is here: https://github.com/alnsn/rgph
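
To make Mouse's objection concrete, here is a minimal sketch of the two reductions side by side. The function names are made up for this example and are not taken from rgph or from the NetBSD tree. reduce_mulshift() is the multiply-shift mapping discussed in the thread and reduce_mod() is plain modulo; it is not meant to show how fast_divide32 is implemented.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply-shift reduction: maps x into [0, n) using the high bits of x. */
uint32_t
reduce_mulshift(uint32_t x, uint32_t n)
{
	return (uint32_t)(((uint64_t)x * n) >> 32);
}

/* Plain modulo reduction: uses all bits of x, including the low ones. */
uint32_t
reduce_mod(uint32_t x, uint32_t n)
{
	assert(n != 0);
	return x % n;
}

/* The simple string hash Mouse describes: h = h * 37 + c. */
uint32_t
hash37(const char *s)
{
	uint32_t h = 0;

	while (*s != '\0')
		h = h * 37 + (unsigned char)*s++;
	return h;
}
```

For short strings hash37() stays far below 2^32, so reduce_mulshift() sends all of them to bucket 0, while reduce_mod() still spreads them out.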

Alex


Re: EFI native support on amd64

2016-12-09 Thread Alexander Nasonov
Kimihiro Nonaka wrote:
> I updated the patches.

It works great. Some observations:

- When the kernel starts booting, it prints lines painfully slowly on
  my Asus laptop. When the kernel initialises genfb(4), printing
  suddenly speeds up. If I connect to a Sony TV via HDMI, it is somewhat
  slow but acceptable.
- I can't figure out how to change screen size in boot.cfg. Adding
  text=2 to boot.cfg doesn't work.
- Likewise, I don't know how to change screen size in genfb(4). I will
  google more on the topic.
- Does the bootloader support msdos fs? Currently, I create an ffs
  partition solely for hosting boot.cfg file. It would be nice to
  read boot.cfg file from the boot EFI partition.

Alex


Re: EFI native support on amd64

2016-12-08 Thread Alexander Nasonov
Kimihiro Nonaka wrote:
> 
> I'm working on efiboot.
> 
> http://cdn.netbsd.org/pub/NetBSD/misc/nonaka/efiboot/

Hi Kimihiro,

That's cool! I can boot from your bootloader. The compilation fails,
though:

/home/alnsn/netbsd-current/tooldir.amd64/bin/x86_64--netbsd-ld: cannot
find startprog32.o: No such file or directory

I'm building on amd64 -current.

-- 
Alex


EFI native support on amd64

2016-12-08 Thread Alexander Nasonov
Hi,

I see some pieces of EFI in the base (gnu-efi, grub support) but
no complete native support. If I remember correctly, someone worked
on it but I don't have any link to their work. Are there any patches
I can hack on further?

PS I was able to link a hello world amd64 executable. It printed
gibberish, though, presumably because it expected CHAR16. Otherwise,
coding was quite easy given my lack of knowledge in this area.

Alex


Re: small changes in aesxcbcmac.c

2016-09-26 Thread Alexander Nasonov
Alexander Nasonov wrote:
> The first change shrinks aes_xcbc_mac_init by 183 bytes on amd64
> (from 562 to 379 bytes).
> The second change avoids a comparison with an address that may
> point beyond the end of a buffer.
> The third change is stylistic.
> Alex

If there are no objections I'll commit the code.

PS I noticed some excessive memory copying (often of fixed-size blocks).
Some of them may be needed to prevent side channel attacks by measuring
execution time of cache misses. Data on the stack is more likely to be
in cache but it's not bulletproof. If we rely on this at all, buffers on
the stack should have __cacheline_aligned attribute but I don't see any
in the code.

>  aes_xcbc_mac_result(u_int8_t *addr, void *vctx)
>  {
> - u_char digest[AES_BLOCKSIZE];
> + u_int8_t digest[AES_BLOCKSIZE];
>   aesxcbc_ctx *ctx;
>   int i;

This buffer isn't actually needed. The destination addr can be passed
directly to rijndaelEncrypt() calls inside the function. I didn't
change it because it is the only array in the function and removing it
would disable ssp.

Alex



Re: small changes in aesxcbcmac.c

2016-09-26 Thread Alexander Nasonov
Rhialto wrote:
> On Sun 25 Sep 2016 at 22:01:16 +0100, Alexander Nasonov wrote:
> > -   while (addr + AES_BLOCKSIZE < ep) {
> > +   while (ep - addr > AES_BLOCKSIZE) {
> 
> I think that if ep points beyond the buffer (apart from the
> just-past-the-end location), the subtraction is just as undefined
> behaviour as before...

If I understand the code correctly, ep points to the first byte past the
end which is well defined.
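
The shape of the changed loop can be sketched on its own. This is an illustration, not the code from aesxcbcmac.c: BLOCKSIZE stands in for AES_BLOCKSIZE and count_leading_blocks() is a made-up name. Because ep is the one-past-the-end pointer, ep - addr is always well defined, whereas addr + BLOCKSIZE may point further than one past the end when fewer than BLOCKSIZE bytes remain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCKSIZE 16	/* stands in for AES_BLOCKSIZE */

/*
 * Count the blocks a loop of this shape consumes. The last block, even
 * if it is full, is deliberately left for special processing, hence the
 * strict '>' comparison.
 */
size_t
count_leading_blocks(const uint8_t *addr, size_t len)
{
	const uint8_t *ep;
	size_t n = 0;

	assert(addr != NULL);
	ep = addr + len;	/* one past the end: valid to form */

	/* ep - addr stays within [0, len]; no out-of-range pointer is formed. */
	while (ep - addr > BLOCKSIZE) {
		addr += BLOCKSIZE;
		n++;
	}
	return n;
}
```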

Alex




Re: small changes in aesxcbcmac.c

2016-09-26 Thread Alexander Nasonov
Eric Haszlakiewicz wrote:
> On September 25, 2016 5:01:16 PM EDT, Alexander Nasonov <al...@yandex.ru> wrote:
> >The first change shrinks aes_xcbc_mac_init by 183 bytes on amd64
> >(from 562 to 379 bytes).
> 
> Do you mean it shrinks its stack usage?  Or does that change to static const 
> vars somehow shrink the function itself?

gcc copies each byte to the stack:

80532a10:   c6 45 98 01 movb   $0x1,-0x68(%rbp)
80532a14:   c6 45 99 01 movb   $0x1,-0x67(%rbp)
80532a18:   c6 45 9a 01 movb   $0x1,-0x66(%rbp)
80532a1c:   c6 45 9b 01 movb   $0x1,-0x65(%rbp)
80532a20:   c6 45 9c 01 movb   $0x1,-0x64(%rbp)
80532a24:   c6 45 9d 01 movb   $0x1,-0x63(%rbp)
80532a28:   c6 45 9e 01 movb   $0x1,-0x62(%rbp)
80532a2c:   c6 45 9f 01 movb   $0x1,-0x61(%rbp)
80532a30:   c6 45 a0 01 movb   $0x1,-0x60(%rbp)
80532a34:   c6 45 a1 01 movb   $0x1,-0x5f(%rbp)
80532a38:   c6 45 a2 01 movb   $0x1,-0x5e(%rbp)
80532a3c:   c6 45 a3 01 movb   $0x1,-0x5d(%rbp)
80532a40:   c6 45 a4 01 movb   $0x1,-0x5c(%rbp)
80532a44:   c6 45 a5 01 movb   $0x1,-0x5b(%rbp)
80532a48:   c6 45 a6 01 movb   $0x1,-0x5a(%rbp)
80532a4c:   c6 45 a7 01 movb   $0x1,-0x59(%rbp)
80532a50:   c6 45 a8 02 movb   $0x2,-0x58(%rbp)
80532a54:   c6 45 a9 02 movb   $0x2,-0x57(%rbp)
80532a58:   c6 45 aa 02 movb   $0x2,-0x56(%rbp)

and so on.
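
For readers who want to try this outside the kernel, the two declaration styles can be compared with a sketch like the one below. The function names are invented for the example. Whether the automatic array is really built byte by byte depends on the compiler and flags, so the size difference quoted above should not be expected to reproduce exactly.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCKSIZE 16

/* Automatic array: the initializer may be materialized on the stack per call. */
void
fill_seed_auto(uint8_t *out)
{
	uint8_t seed[BLOCKSIZE] = { 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 };

	assert(out != NULL);
	memcpy(out, seed, sizeof(seed));
}

/* static const array: lives in read-only data, no per-call setup code. */
void
fill_seed_static(uint8_t *out)
{
	static const uint8_t seed[BLOCKSIZE] =
	    { 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 };

	assert(out != NULL);
	memcpy(out, seed, sizeof(seed));
}
```

Both functions produce the same bytes; comparing their sizes with objdump or nm on a given compiler shows whether the per-call initialization is present.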

Alex


small changes in aesxcbcmac.c

2016-09-25 Thread Alexander Nasonov
The first change shrinks aes_xcbc_mac_init by 183 bytes on amd64
(from 562 to 379 bytes).
The second change avoids a comparison with an address that may
point beyond the end of a buffer.
The third change is stylistic.
Alex
--- sys/opencrypto/aesxcbcmac.c.orig    2016-09-25 21:44:25.344941650 +0100
+++ sys/opencrypto/aesxcbcmac.c 2016-09-25 13:21:43.364224984 +0100
@@ -41,9 +41,12 @@
 int
 aes_xcbc_mac_init(void *vctx, const u_int8_t *key, u_int16_t keylen)
 {
-   u_int8_t k1seed[AES_BLOCKSIZE] = { 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 };
-   u_int8_t k2seed[AES_BLOCKSIZE] = { 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2 };
-   u_int8_t k3seed[AES_BLOCKSIZE] = { 3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3 };
+   static const u_int8_t k1seed[AES_BLOCKSIZE] =
+       { 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 };
+   static const u_int8_t k2seed[AES_BLOCKSIZE] =
+       { 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2 };
+   static const u_int8_t k3seed[AES_BLOCKSIZE] =
+       { 3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3 };
    u_int32_t r_ks[(RIJNDAEL_MAXNR+1)*4];
    aesxcbc_ctx *ctx;
    u_int8_t k1[AES_BLOCKSIZE];
@@ -98,7 +101,7 @@
        ctx->buflen = 0;
    }
    /* due to the special processing for M[n], "=" case is not included */
-   while (addr + AES_BLOCKSIZE < ep) {
+   while (ep - addr > AES_BLOCKSIZE) {
        memcpy(buf, addr, AES_BLOCKSIZE);
        for (i = 0; i < sizeof(buf); i++)
            buf[i] ^= ctx->e[i];
@@ -115,7 +118,7 @@
 void
 aes_xcbc_mac_result(u_int8_t *addr, void *vctx)
 {
-   u_char digest[AES_BLOCKSIZE];
+   u_int8_t digest[AES_BLOCKSIZE];
    aesxcbc_ctx *ctx;
    int i;
 


Re: cgdstrategy: divide fault in supervisor mode

2016-09-14 Thread Alexander Nasonov
Michael van Elst wrote:
> Right. This needs to be written differently. Instead of GETCGD_SOFTC()
> use:
> 
>   cs = getcgd_softc(bp->b_dev);
>   if (!cs) {
>   bp->b_error = ENXIO;
>   biodone(bp);
>   return;
> }

I enabled DEBUG in the config and changed cgdstrategy. Same crash:

Stopped in pid 10.1 (mount_ffs) at  netbsd:cgdstrategy+0x2d:    divl    40(%r12),%eax

808edcd8 :
808edcd8:   55                      push   %rbp
808edcd9:   48 89 e5                mov    %rsp,%rbp
808edcdc:   53                      push   %rbx
808edcdd:   48 83 ec 08             sub    $0x8,%rsp
808edce1:   48 89 fb                mov    %rdi,%rbx
808edce4:   f6 05 d5 d0 8e 00 01    testb  $0x1,0x8ed0d5(%rip)  # 811dadc0
808edceb:   75 52                   jne    808edd3f
808edced:   48 8b 7b 38             mov    0x38(%rbx),%rdi
808edcf1:   e8 e5 fd ff ff          callq  808edadb
808edcf6:   48 89 c7                mov    %rax,%rdi
808edcf9:   48 85 c0                test   %rax,%rax
808edcfc:   74 58                   je     808edd56
808edcfe:   48 83 7b 48 00          cmpq   $0x0,0x48(%rbx)
808edd03:   8b 4b 34                mov    0x34(%rbx),%ecx
808edd06:   78 11                   js     808edd19
808edd08:   89 c8                   mov    %ecx,%eax
808edd0a:   31 d2                   xor    %edx,%edx
808edd0c:   f7 77 40                divl   0x40(%rdi)
808edd0f:   85 d2                   test   %edx,%edx
808edd11:   75 06                   jne    808edd19
808edd13:   f6 43 40 03             testb  $0x3,0x40(%rbx)
808edd17:   74 18                   je     808edd31
808edd19:   c7 43 20 16 00 00 00    movl   $0x16,0x20(%rbx)
808edd20:   89 4b 24                mov    %ecx,0x24(%rbx)
808edd23:   48 89 df                mov    %rbx,%rdi
808edd26:   48 83 c4 08             add    $0x8,%rsp
808edd2a:   5b                      pop    %rbx
808edd2b:   5d                      pop    %rbp
808edd2c:   e9 f0 c3 fc ff          jmpq   808ba121
808edd31:   48 89 de                mov    %rbx,%rsi
808edd34:   48 83 c4 08             add    $0x8,%rsp
808edd38:   5b                      pop    %rbx
808edd39:   5d                      pop    %rbp
808edd3a:   e9 a1 2e 00 00          jmpq   808f0be0
808edd3f:   48 63 57 34             movslq 0x34(%rdi),%rdx
808edd43:   48 89 fe                mov    %rdi,%rsi
808edd46:   48 c7 c7 18 15 f9 80    mov    $0x80f91518,%rdi
808edd4d:   31 c0                   xor    %eax,%eax
808edd4f:   e8 4f d8 f8 ff          callq  8087b5a3
808edd54:   eb 97                   jmp    808edced
808edd56:   c7 43 20 06 00 00 00    movl   $0x6,0x20(%rbx)
808edd5d:   eb c4                   jmp    808edd23

808eeb2e:   48 c7 c7 d8 dc 8e 80    mov    $0x808edcd8,%rdi
808eeb35:   5b                      pop    %rbx
808eeb36:   41 5c                   pop    %r12
808eeb38:   5d                      pop    %rbp
808eeb39:   e9 4f db f4 ff          jmpq   8083c68d

808eeb9d:   48 c7 c7 d8 dc 8e 80    mov    $0x808edcd8,%rdi
808eeba4:   5b                      pop    %rbx
808eeba5:   41 5c                   pop    %r12
808eeba7:   5d                      pop    %rbp
808eeba8:   e9 e0 da f4 ff          jmpq   8083c68d

Alex


Re: cgdstrategy: divide fault in supervisor mode

2016-09-14 Thread Alexander Nasonov
Michael van Elst wrote:
> Right. This needs to be written differently. Instead of GETCGD_SOFTC()
> use:
> 
>   cs = getcgd_softc(bp->b_dev);
>   if (!cs) {
>   bp->b_error = ENXIO;
>   biodone(bp);
>   return;
> }


I tried something similar but with bp->b_resid = bp->b_bcount;
instead of biodone(bp); It still crashes.

I'll try your code.

Alex


Re: cgdstrategy: divide fault in supervisor mode

2016-09-14 Thread Alexander Nasonov
Michael van Elst wrote:
> Ah, maybe then:
> 
> --- cgd.c   5 Aug 2016 08:24:46 -   1.110
> +++ cgd.c   13 Sep 2016 21:43:27 -
> @@ -305,13 +305,17 @@
>  static void
>  cgdstrategy(struct buf *bp)
>  {
> -   struct  cgd_softc *cs = getcgd_softc(bp->b_dev);
> -   struct  dk_softc *dksc = &cs->sc_dksc;
> -   struct  disk_geom *dg = &dksc->sc_dkdev.dk_geom;
> +   struct  cgd_softc *cs;
> +   struct  dk_softc *dksc;
> +   struct  disk_geom *dg;
>  
> DPRINTF_FOLLOW(("cgdstrategy(%p): b_bcount = %ld\n", bp,
> (long)bp->b_bcount));
>  
> +   GETCGD_SOFTC(cs, bp->b_dev);
> +   dksc = &cs->sc_dksc;
> +   dg = &dksc->sc_dkdev.dk_geom;
> +

It will not compile because cgdstrategy() returns void.

Alex


Re: cgdstrategy: divide fault in supervisor mode

2016-09-13 Thread Alexander Nasonov
Alexander Nasonov wrote:
> I can examine data at 0x34 offset and it's indeed 0.

Correction: $rdi+0x40 is the correct location.

I also inspected low half of dev_t ($rbx+0x38) and its value was 0x1423
which corresponds to:

brw-r-----  1 root  operator  20, 35 Dec 13  2015 /dev/cgd2d
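
The mapping from 0x1423 to "20, 35" can be checked with a short sketch. The masks below are written from memory to mirror the major()/minor() layout in NetBSD's <sys/types.h>, and the 16-partitions-per-unit figure for amd64 is likewise an assumption; the dev_* and disk_* names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed NetBSD dev_t layout: major in bits 8-19, minor in the rest. */
uint32_t
dev_major(uint32_t dev)
{
	return (dev & 0x000fff00u) >> 8;
}

uint32_t
dev_minor(uint32_t dev)
{
	return ((dev & 0xfff00000u) >> 12) | (dev & 0x000000ffu);
}

/* Inverse of the two functions above. */
uint32_t
dev_make(uint32_t maj, uint32_t min)
{
	assert(maj <= 0xfffu && min <= 0xfffffu);
	return ((maj << 8) & 0x000fff00u) |
	    ((min << 12) & 0xfff00000u) | (min & 0x000000ffu);
}

/* Assumption: 16 partitions per disk unit on amd64. */
#define PARTS_PER_UNIT 16u

uint32_t
disk_unit(uint32_t min)
{
	return min / PARTS_PER_UNIT;
}

uint32_t
disk_part(uint32_t min)
{
	return min % PARTS_PER_UNIT;	/* 0 = 'a', 3 = 'd' */
}
```

Under those assumptions minor 35 is unit 2, partition 3 ('d'), which agrees with the /dev/cgd2d node listed above.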


Alex


Re: cgdstrategy: divide fault in supervisor mode

2016-09-13 Thread Alexander Nasonov
Michael van Elst wrote:
> That would require dg_secsize to be 0 which is difficult to do
> because the drivers initialize the value and the common disk_set_info()
> function fixes up a zero value.

I can reproduce division by zero but not when rebooting. If I take
an unconfigured cgd device, i.e. cgd2 and run

mount /dev/cgd2d /tmp

the kernel will panic instead of returning ENXIO.

> But maybe the dg pointer is bad? Please have a look at the %rdi
> register.

I don't know what was rdi's value when it crashed during reboot but
crashes when mounting /dev/cgd2d all have good kernel-space values.
I can examine data at 0x34 offset and it's indeed 0.

$ crash -M netbsd.12.core
Crash version 7.99.36, image version 7.99.36.
System panicked: dump forced via kernel debugger
Backtrace from time of crash is available.
crash> dmesg|tail
iwm0: 11g rates: 1Mbps 2Mbps 5.5Mbps 11Mbps 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps
acpibat0: normal capacity on 'charge state'
fatal integer divide fault in supervisor mode
trap type 8 code 0 rip 808db36f cs 8 rflags 10246 cr2 4d8000 ilevel 0 rsp fe8116cfba50
curlwp 0xfe836fcbab00 pid 13.1 lowest kstack 0xfe8116cf82c0

dumping to dev 20,17 (offset=212951, size=3119109):
dump
crash> bt
_KERNEL_OPT_NARCNET() at 0
_KERNEL_OPT_NARCNET() at 0
db_reboot_cmd() at db_reboot_cmd
db_command() at db_command+0xeb
db_command_loop() at db_command_loop+0x90
db_trap() at db_trap+0xe3
kdb_trap() at kdb_trap+0xe1
trap() at trap+0x574
--- trap (number 8) ---
cgdstrategy() at cgdstrategy+0x26
bdev_strategy() at bdev_strategy+0x68
spec_strategy() at spec_strategy+0x81
VOP_STRATEGY() at VOP_STRATEGY+0x33
bio_doread() at bio_doread+0x98
bread() at bread+0x1a
ffs_mountfs() at ffs_mountfs+0x170
ffs_mount() at ffs_mount+0x227
VFS_MOUNT() at VFS_MOUNT+0x34
mount_domount() at mount_domount+0x122
do_sys_mount() at do_sys_mount+0x20f
sys___mount50() at sys___mount50+0x33
syscall() at syscall+0x15b
--- syscall (number 410) ---
75c7da:
crash> ps
PIDLID S CPU FLAGS   STRUCT LWP *   NAME WAIT
13   >   1 7   1 0   fe836fcbab00  mount_ffs
12   1 2   1   802   fe811681d2a0  mount
81 2   1   802   fe811681d6c0ksh
21 2   1   802   fe811681dae0ksh
11 2   1   802   fe81163f5680   init
...

Alex


cgdstrategy: divide fault in supervisor mode

2016-09-13 Thread Alexander Nasonov
Someone warned me that adding cgd to dump devices can have bad
consequences. I think I caught one possible bad case yesterday.
I was lucky enough to still have my data.

My setup is quite complicated. I have a small root on wd0a which
does only one thing: to mount a real root on cgd0a and chroot to
it. The rest of the system is on cgd1.

I was in a single-user mode, inside /altroot (iirc), all fs mounted
but I wanted to remount them in read-only mode. Some of them couldn't
be unmounted and I forced umounts with the -f flag. Then I mounted them
with read-only flag. I don't remember exact commands but I have nested
mount points, e.g. /var/log inside /var and I was definitely trying to
remount both inner and outer fs.

All mount/umount worked but when I ran reboot, the system trapped here:

fatal integer divide fault in supervisor mode
trap type 8 code 0 rip 808db36f cs 8 rflags 10246 cr2 efd...
curlwp 0xfe81163b4a40 pid 276.1 lowest kstack 0xfe8117343...
kernel: integer divide fault trap, code=0
Stopped in pid 276.1 (reboot) at    netbsd:cgdstrategy+0x26:    divl    40(%rdi),%eax

This is what I run:

NetBSD neva 7.99.36 NetBSD 7.99.36 (GENERIC) #0: Fri Sep  2 22:04:02 BST 2016  
alnsn@nebeda:/home/alnsn/netbsd-current/clean/src/sys/arch/amd64/compile/obj/GENERIC
 amd64

Sources checked out on Sep 2.

Looking at the assembly, it appears that the fault happened at the
second line of this branch:

if (bp->b_blkno < 0 ||
(bp->b_bcount % dg->dg_secsize) != 0 ||

(offset of b_blkno is 0x48, b_bcount's offset is 0x34).

808db349 :
808db349:   55                      push   %rbp
808db34a:   48 89 e5                mov    %rsp,%rbp
808db34d:   53                      push   %rbx
808db34e:   48 83 ec 08             sub    $0x8,%rsp
808db352:   48 89 fb                mov    %rdi,%rbx
808db355:   48 8b 7f 38             mov    0x38(%rdi),%rdi
808db359:   e8 4d fe ff ff          callq  808db1ab
808db35e:   48 83 7b 48 00          cmpq   $0x0,0x48(%rbx)
808db363:   78 3d                   js     808db3a2
808db365:   48 89 c7                mov    %rax,%rdi
808db368:   8b 4b 34                mov    0x34(%rbx),%ecx
808db36b:   89 c8                   mov    %ecx,%eax
808db36d:   31 d2                   xor    %edx,%edx
808db36f:   f7 77 40                divl   0x40(%rdi)
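
The check behind that divl can be sketched in userland C. This is a hypothetical stand-in, not the cgd(4) code and not a proposed fix: struct xfer and check_transfer() are invented names, and secsize plays the role of dg->dg_secsize. The point is only that the divisor has to be tested before the modulo, because a zero sector size makes the division trap.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for the two struct buf fields the check reads. */
struct xfer {
	int64_t blkno;		/* plays the role of b_blkno */
	int32_t bcount;		/* plays the role of b_bcount */
};

/*
 * Returns 0 if the request looks valid, otherwise an errno value.
 */
int
check_transfer(const struct xfer *x, uint32_t secsize)
{
	assert(x != NULL);

	/* A zero sector size must be rejected before it is used as a divisor. */
	if (secsize == 0)
		return ENXIO;
	if (x->blkno < 0 || x->bcount < 0 ||
	    ((uint32_t)x->bcount % secsize) != 0)
		return EINVAL;
	return 0;
}
```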

Alex


Re: current amd64 crashes in usb_transfer_complete

2016-08-14 Thread Alexander Nasonov
Alexander Nasonov wrote:
> The crash happened only once. I'm not sure that running with these
> will prove anything. I'll try to take steps to reproduce the crash
> again before running with these patches.

Got a different crash in usb_transfer_complete. I was playing with
RPI2 powered from USB on my computer and connected my computer over
uplcom(4).
It likely happened because I unplugged uplcom(4) while minicom was
still hanging up. It didn't happen immediately, though. I was able
to start X and firefox before the kernel crashed. I also replugged
power USB a couple of times but it's a very dumb connection; I'm not sure if
plugging it in triggers any code in the kernel.

I don't know if this crash is related to the first crash. The first
crash happened while I was trying to secure yubikey into the slot.

Here's dmesg of the second crash:

panic: lock error: Mutex: mutex_vector_enter: locking against myself: lock 
0xfe81163ee370 cpu 3 lwp 0xfe8117053b20
cpu3: Begin traceback...
vpanic() at netbsd:vpanic+0x140
snprintf() at netbsd:snprintf
lockdebug_abort() at netbsd:lockdebug_abort+0x63
mutex_vector_enter() at netbsd:mutex_vector_enter+0x369
ucomwritecb() at netbsd:ucomwritecb+0x22
usb_transfer_complete() at netbsd:usb_transfer_complete+0x149
xhci_abort_xfer() at netbsd:xhci_abort_xfer+0x15d
usbd_ar_pipe() at netbsd:usbd_ar_pipe+0x3b
usbd_abort_pipe() at netbsd:usbd_abort_pipe+0x27
ucomclose() at netbsd:ucomclose+0x12c
spec_close() at netbsd:spec_close+0x11a
VOP_CLOSE() at netbsd:VOP_CLOSE+0x33
vn_close() at netbsd:vn_close+0x36
closef() at netbsd:closef+0x54
fd_close() at netbsd:fd_close+0x1ac
sys_close() at netbsd:sys_close+0x20
syscall() at netbsd:syscall+0x15b

crash> show lock 0xfe81163ee370
doesn't show anything because the kernel isn't compiled with LOCKDEBUG.

Alex


Re: current amd64 crashes in usb_transfer_complete

2016-08-12 Thread Alexander Nasonov
Takahiro Hayashi wrote:
> Could you try both of
> 
> http://www.netbsd.org/~skrll/usb.softint.diff
> https://mail-index.netbsd.org/tech-kern/2016/08/09/msg020963.html
> 
> and check if you can still see the problem?

The crash happened only once. I'm not sure that running with these
will prove anything. I'll try to take steps to reproduce the crash
again before running with these patches.

Alex


Re: current amd64 crashes in usb_transfer_complete

2016-08-08 Thread Alexander Nasonov
Alexander Nasonov wrote:
> Got KASSERT when a yubikey device was plugged in.
> 
> $ crash -M netbsd.9.core
> Crash version 7.99.35, image version 7.99.35.
> System panicked: kernel diagnostic assertion "xfer->ux_state == XFER_ONQU" 
> failed: file "/home/alnsn/netbsd-current/clean/src/sys/dev/usb/usbdi.c", line 
> 910
> Backtrace from time of crash is available.
> crash> bt
> _KERNEL_OPT_NARCNET() at 0
> _KERNEL_OPT_NARCNET() at 0
> vpanic() at vpanic+0x149
> cd_play_msf() at cd_play_msf
> usb_transfer_complete() at usb_transfer_complete+0x24e
> xhci_abort_xfer() at xhci_abort_xfer+0x15d
> usbd_ar_pipe() at usbd_ar_pipe+0x3b
> usbd_abort_pipe() at usbd_abort_pipe+0x27
> uhidev_detach() at uhidev_detach+0x32
> config_detach() at config_detach+0xf8
> usb_disconnect_port() at usb_disconnect_port+0xae
> uhub_explore() at uhub_explore+0x1fe
> usb_discover.isra.2() at usb_discover.isra.2+0x4e
> usb_event_thread() at usb_event_thread+0x7c
> 
> 
> Kernel built from src updated about a week ago:
> 
> $ uname -a
> NetBSD neva 7.99.35 NetBSD 7.99.35 (GENERIC) #0: Wed Aug  3 23:02:01 BST 2016 
>  
> alnsn@nebeda.localdomain:/home/alnsn/netbsd-current/clean/src/sys/arch/amd64/compile/obj/GENERIC
>  amd64
> 
> $ gunzip -c /netbsd |ident |grep usbdi.c
>  $NetBSD: usbdi.c,v 1.171 2016/05/17 11:37:50 pooka Exp $

$ dmesg -M /var/crash/netbsd.9.core


1 6 EXIT 0 7
WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6
WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7
WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6
WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7
WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6
WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7
WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6
WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7
WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6
WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7
WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6
usbd_ar_pipe() at WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7
WARNING: SPL NOT LOWERED ON SYSCALL 125 4 EXIT af227c08 6
WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7
WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6
WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7
WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6
WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7
WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6
WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7
WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6
WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7
WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6
WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7
WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6
WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7
WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6
WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7
WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6
WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7
WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6
WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7
WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6
WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7
WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6
WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7
WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6
WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7
WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6
WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7
WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6
WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7
WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6
netbsd:usbd_ar_pipe+0x3b
WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7
WARNING: SPL NOT LOWERED ON SYSCALL 132 4 EXIT af227c08 6
WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7
WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6
WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7
WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6
WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7
WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6
WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7
WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6
WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7
WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6
WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7
WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6
WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7
WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6
WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7
WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6
WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7
WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6
WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7
WARNING: SPL NOT LOWER

current amd64 crashes in usb_transfer_complete

2016-08-08 Thread Alexander Nasonov
Got KASSERT when a yubikey device was plugged in.

$ crash -M netbsd.9.core
Crash version 7.99.35, image version 7.99.35.
System panicked: kernel diagnostic assertion "xfer->ux_state == XFER_ONQU" 
failed: file "/home/alnsn/netbsd-current/clean/src/sys/dev/usb/usbdi.c", line 
910
Backtrace from time of crash is available.
crash> bt
_KERNEL_OPT_NARCNET() at 0
_KERNEL_OPT_NARCNET() at 0
vpanic() at vpanic+0x149
cd_play_msf() at cd_play_msf
usb_transfer_complete() at usb_transfer_complete+0x24e
xhci_abort_xfer() at xhci_abort_xfer+0x15d
usbd_ar_pipe() at usbd_ar_pipe+0x3b
usbd_abort_pipe() at usbd_abort_pipe+0x27
uhidev_detach() at uhidev_detach+0x32
config_detach() at config_detach+0xf8
usb_disconnect_port() at usb_disconnect_port+0xae
uhub_explore() at uhub_explore+0x1fe
usb_discover.isra.2() at usb_discover.isra.2+0x4e
usb_event_thread() at usb_event_thread+0x7c


Kernel built from src updated about a week ago:

$ uname -a
NetBSD neva 7.99.35 NetBSD 7.99.35 (GENERIC) #0: Wed Aug  3 23:02:01 BST 2016  
alnsn@nebeda.localdomain:/home/alnsn/netbsd-current/clean/src/sys/arch/amd64/compile/obj/GENERIC
 amd64

$ gunzip -c /netbsd |ident |grep usbdi.c
 $NetBSD: usbdi.c,v 1.171 2016/05/17 11:37:50 pooka Exp $

Alex


Re: UVM and the NULL page

2016-07-30 Thread Alexander Nasonov
Maxime Villard wrote:
> ...
> You know - as well as I do - that NULL pointer dereferences are quite common,
> and that it is the main way to execute malicious code in kernel mode. Allowing
> NULL is a huge problem on architectures like amd64. The way you are talking
> about compatibility sounds like you are ready to sacrifice the security of
> almost every NetBSD user just to allow a few programs that make use of a 
> mapping
> model which has been known to be flawed for years. This is ridiculous.

I don't think that NULL pointer dereference "is the main way to
execute malicious code in kernel mode". There are so many other ways.

If you're so scared about being hacked, stop writing in C.

-- 
Alex


Re: dump to cgdNb device

2016-06-16 Thread Alexander Nasonov
Alexander Nasonov wrote:
> Taylor R Campbell wrote:
> > Maybe savecore should have an option to specify what the dump device
> > is supposed to be.
> 
> I think it has but I didn't try it.

Ah, I misread your sentence. savecore understands -N but I didn't try
it.

-- 
Alex


Re: dump to cgdNb device

2016-06-16 Thread Alexander Nasonov
Taylor R Campbell wrote:
> Can you ktrace it to see what devices it's actually looking at?
> 
> Every time I try to scrutinize savecore it makes me dizzy.  I think if
> you selected cgd1b as the dump device with `swapctl -D', rather than
> using the default of b, and/or if the running kernel image
> doesn't match the dumped kernel image, savecore may get confused.

Indeed. When I renamed my kernel to /netbsd, savecore worked.

> Maybe savecore should have an option to specify what the dump device
> is supposed to be.

I think it has but I didn't try it.

Alex


Re: dump to cgdNb device

2016-06-16 Thread Alexander Nasonov
Taylor R Campbell wrote:
> So maybe it did get written, but there's just something funny about
> it.  If you pass -v to savecore, can you find what aspect of the dump
> savecore doesn't like?

I don't see any difference between -vf and -f.

savecore: magic number mismatch (0x0 != 0x8fca0101)
savecore: no core dump
savecore: warning: /dev/ksyms version mismatch:
NetBSD 7.99.30 (GENERIC) #0: Sun May 29 20:24:03 BST 2016

alnsn@nebeda.localdomain:/home/alnsn/netbsd-current/clean/src/sys/arch/amd64/compile/obj/GENERIC

and atad

savecore: msgbuf magic incorrect
savecore: reboot after panic: <<< 1K+ of non-printable nonsense >>>
savecore: dump time is zero
savecore: writing core to /var/crash/netbsd.5.core
savecore: writing kernel to /var/crash/netbsd.5
savecore: (null): Bad address
dumplo = 110530048 (215879 * 512)

Alex


Re: dump to cgdNb device

2016-06-16 Thread Alexander Nasonov
Taylor R Campbell wrote:
> Just to double-check (since I am likely to make this mistake myself!),
> you're not using a random-keyed cgd, right?  Random-keyed cgd is great
> for swap while the key is in memory but not so great for dump after
> rebooting!

I have both types but the one I was testing was definitely with a
persistent key.

> Does strings(1) on /dev/rcgd1b show anything vaguely meaningful?

Yes, I see this string:

NetBSD 7.99.30 (CGDDEBUG) #0: Thu Jun 16 19:40:27 BST 2016

alnsn@neva:/home/alnsn/netbsd-current/clean/src/sys/arch/amd64/compile/obj/CGDDEBUG

This is the kernel I was running. The disk was initially filled
with random bits and CGDDEBUG kernel didn't exist back then.

> Hmm...  I might have caused your dump to scribble all over another
> part of the disk, by not adjusting the blkno before passing it along
> to bdev_dump.  I'm not sure about this -- but the blkno adjustments
> certainly need to be reviewed.

I don't see any problem after reboot. I cat'ed all files in the neighbouring
partitions (496M and 3G) to /dev/null and I don't see any errors. They
aren't full, though. One is about 50% full, the other is 23% full.

Alex


Re: dump to cgdNb device

2016-06-16 Thread Alexander Nasonov
Taylor R Campbell wrote:
> Slightly simpler version attached -- device_private is not
> device_lookup; device_private never fails.  (If the device_lookup in
> cgddump failed, we wouldn't have gotten to cgd_dumpblocks anyway.)

It dumped successfully but savecore doesn't see any valid core. If I
run savecore -f, it prints gibberish.

Alex


Re: dump to cgdNb device

2016-06-16 Thread Alexander Nasonov
Michael van Elst wrote:
> al...@yandex.ru (Alexander Nasonov) writes:
> 
> >There is a risk even with hardware devices but it's smaller because less
> >software is involved. Dumping to cgd is a quite important usecase and
> >perhaps we should make an exception. Would it help to RO protect some
> >data structures like private keys?
> 
> You would need to protect all data that is required to dump a block,
> the keys aren't more important than e.g. the disklabel or the
> bus space handle of the disk controller.

True, but you have to protect the disklabel even for "hardware" devices. My
point was about protecting code specific to a "software" device to make
it look more like a "hardware" device.

Alex


dump to cgdNb device

2016-06-15 Thread Alexander Nasonov
Hi,

I set up an encrypted disk cgd1 (aes-cbc 256 on top of wd0g, disklabel
verification) with a dump device cgd1b but I can't dump to it (I enter
ddb and type sync to dump). It prints "device bad".

If I enable CGDDEBUG and set cgddebug to 1, it prints some additional
information (typing from a photo):

dumping to dev 20,17 (offset=215879, size=3119109):getcgd_softc(0x1411): unit =
1
cgdsize(0x1411)
cgdopen(0x1411, 0)
getcgd_softc(0x1411): unit = 1
cgdclose(0x1411, 0)
getcgd_softc(0x1411): unit = 1

dump cgddump(0x1411, 215879, 0x8000d6c3e000, 4096)
getcgd_softc(0x1411, unit = 1
device bad
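
As a side note on reading this output, the device numbers fit together as
sketched below. This is a minimal sketch, assuming NetBSD's usual dev_t
split (major in bits 8..19, low minor byte in bits 0..7) and 16 partitions
per disk as on amd64. The helper names are made up for illustration; they
are not the kernel's macros.

```c
#include <stdint.h>

/* Assumption: 16 partitions per disk unit, as on amd64. */
#define SKETCH_MAXPARTITIONS 16

/* Major number: bits 8..19 of the device number. */
static unsigned
sketch_major(uint64_t dev)
{
	return (unsigned)((dev >> 8) & 0xfff);
}

/* Minor number: low byte plus the bits above the major field. */
static unsigned
sketch_minor(uint64_t dev)
{
	return (unsigned)(((dev >> 12) & 0xfff00) | (dev & 0xff));
}

/* Disk unit: which cgdN the minor number refers to. */
static unsigned
sketch_unit(uint64_t dev)
{
	return sketch_minor(dev) / SKETCH_MAXPARTITIONS;
}

/* Partition letter within that unit. */
static char
sketch_part(uint64_t dev)
{
	return (char)('a' + sketch_minor(dev) % SKETCH_MAXPARTITIONS);
}
```

For 0x1411 this gives major 20, minor 17, unit 1, partition b, which agrees
with "dev 20,17" and "unit = 1" in the output. The offset is in 512-byte
blocks: 215879 * 512 = 110530048 bytes.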

With more verbose debugging it can hang instead of printing "device
bad". For example, if I set cgddebug to 4 and mount cgd1e

dumping to dev 20,17 (offset=215879, size=3119109):getcgd_softc(0x1411): unit =
1
cgdsize(0x1411)
cgdopen(0x1411, 0)
getcgd_softc(0x1411): unit = 1
getcgd_softc(0x1413): unit = 1
cgdstrategy(0xfe836f63b5a8): b_bcount = 512
cgd_diskstart(x0fe836f85cd08, 0xfe836f63b5a8)


Is it expected to work at all?

I'm running:

NetBSD neva 7.99.30 NetBSD 7.99.30 (GENERIC) #0: Sun May 29 20:24:03 BST
2016
alnsn@nebeda.localdomain:/home/alnsn/netbsd-current/clean/src/sys/arch/amd64/compile/obj/GENERIC
amd64

wd0 disklabel (cgd1 is on wd0g):
#        size    offset     fstype [fsize bsize cpg/sgs]
 a:   1049328 134219776 4.2BSD   1024  8192 0  # (Cyl. 133154*- 134195*)
 b:  25168752 135269104   swap # (Cyl. 134195*- 159164*)
 c: 365898416 134219776 unused  0 0# (Cyl. 133154*- 496148)
 d: 500118192 0 unused  0 0# (Cyl.  0 - 496148)
 e:  33554432  2048 Linux Ext2  0 0# (Cyl.  2*-  33290*)
 f: 100663296  33556480 Linux Ext2  0 0# (Cyl.  33290*- 133154*)
 g: 339680336 160437856 4.2BSD   2048 16384 0  # (Cyl. 159164*- 496148)

cgd1 disklabel:

#        size    offset     fstype [fsize bsize cpg/sgs]
 a:   1048576 0 4.2BSD   1024  8192 0  # (Cyl.  0 -511)
 b:  25168752   1048576   swap # (Cyl.512 -  12801*)
 d: 339680336 0 unused  0 0# (Cyl.  0 - 165859*)
 e:   6292188  26217328 4.2BSD   2048 16384 0  # (Cyl.  12801*-  15873*)
 f: 227059614  32509516 4.2BSD   2048 16384 0  # (Cyl.  15873*- 126742*)
 g:  41943040 259569130 4.2BSD   2048 16384 0  # (Cyl. 126742*- 147222*)
 h:  33554432 301512170 4.2BSD   2048 16384 0  # (Cyl. 147222*- 163606*)
 i:   2516582 335066602 4.2BSD   2048 16384 0  # (Cyl. 163606*- 164835*)
 j:   1048576 337583184 4.2BSD   1024  8192 0  # (Cyl. 164835*- 165347*)
 k:   1048576 338631760 4.2BSD   1024  8192 0  # (Cyl. 165347*- 165859*)

Thanks,
Alex


Re: status of Linux ptrace on amd64?

2014-12-04 Thread Alexander Nasonov
Christos Zoulas wrote:
> In article <20141202221949.GA4053@neva>,
> Alexander Nasonov  <al...@yandex.ru> wrote:
> >I don't think source code is available.
>
> It is going to be hard to get it to work

Yeah. I can't make any further progress. Working gdb on 32bit Linux
binary gave me a false hope.

I'm stuck at orig_ax. If I understand correctly, pin (and strace
too) want to know the syscall number of a process stopped in a syscall
and they query orig_ax for it. I use (lt->l_sysent - emul_linux.e_sysent)
to get the syscall number but l_sysent is always NULL. I haven't
looked at whether a traced process is really in a syscall and whether
l_sysent is guaranteed to be synchronised across cpus (I think it's
not, but l_sysent is always NULL in a non-SMP kernel too).
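
The pointer arithmetic above can be sketched as follows. This is only an
illustration with a stand-in structure and a made-up helper name, not the
code from my tree; it shows why a NULL l_sysent has to be caught before the
subtraction.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct sysent. */
struct sketch_sysent {
	int sy_narg;
};

/*
 * Return the index of `entry` within `table`, or -1 when the entry
 * pointer is NULL (the case described above) or outside the table.
 */
static long
sketch_syscall_number(const struct sketch_sysent *table, size_t nentries,
    const struct sketch_sysent *entry)
{
	if (entry == NULL || entry < table || entry >= table + nentries)
		return -1;
	return (long)(entry - table);
}
```

Without the check, subtracting the table base from a NULL pointer yields a
meaningless number rather than an error.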

Anyway, I've got some working code. It's not ptrace related, though.
To make the tool run, I needed to add support for /proc/sys/kernel/osrelease.
The patch is here

http://www.netbsd.org/~alnsn/procfs-sys-kernel-osrelease.diff

Alex


/proc/sys/kernel/osrelease patch

2014-12-04 Thread Alexander Nasonov
[Changing the topic from Re: status of Linux ptrace on amd64?]

Christos Zoulas wrote:
> On Dec 4, 10:15am, m...@netbsd.org (Emmanuel Dreyfus) wrote:
> | That looks fine. Any other reviewer?
>
> The code in the PFSsyskern and PFSsys can be factored out to a function
> (and used perhaps by other directories with single entries).

The procfs code has a mini framework for static entries but it's
limited to the root directory only. Extending it would be a better
approach I think but it would require more work.

> nm[0] assignment can move outside the if like nm[3].

I'd like to move two almost identical copies of the code in
procfs_doversion() and procfs_doosrelease() to a new helper
function. I'll check if your comment still applies after the
change.

Alex


Re: status of Linux ptrace on amd64?

2014-12-02 Thread Alexander Nasonov
Alexander Nasonov wrote:
> Christos Zoulas wrote:
> > Should not be that hard, but what is that tool reading from PEEKUSER?
> > Registers?
>
> In both cases it reads from user_regs_struct, if I understand everything
> correctly. But it's the first step, the tool would definitely try other
> things if PEEKUSER didn't fail. In fact, I'm not sure that thing would
> work at all because it's a dynamic instrumentation tool called Pin. It
> updates code on the fly.

OK, I implemented reading orig_ax by PEEKUSER. It fixed the ptrace failure
but it added two problems:

1. I had to disable compat_linux32 to fix compilation. I need
   to make linux_sys_ptrace_arch multiarch-aware.
2. PEEKUSER didn't change anything. The tool still detaches its child
   and exits. I can't debug to see what's going on, gdb receives SIGUSR1
   from the tool. I suspect PEEKUSER returns an unexpected value.

Alex


Re: status of Linux ptrace on amd64?

2014-12-02 Thread Alexander Nasonov
Christos Zoulas wrote:
> Have you tried porting the tool to NetBSD? What is involved?

I don't think source code is available.

Alex


status of Linux ptrace on amd64?

2014-12-01 Thread Alexander Nasonov
Hi,

While trying to make some Linux instrumentation tool work on -current
amd64 I noticed that ptrace support in compat_linux and compat_linux32
has some notable differences and that compat_linux32 has better support.
For instance, I can debug 32bit Linux binaries with NetBSD's gdb but
any attempt to set a breakpoint on 64bit Linux results in SIGTRAP.

Neither compat_linux32 nor compat_linux works for that particular tool
because they don't support LINUX_PEEKUSER.

So, I wonder how much work it is to improve ptrace in compat_linux and
compat_linux32?

Alex


Re: status of Linux ptrace on amd64?

2014-12-01 Thread Alexander Nasonov
Christos Zoulas wrote:
> Should not be that hard, but what is that tool reading from PEEKUSER?
> Registers?

In both cases it reads from user_regs_struct, if I understand everything
correctly. But it's the first step, the tool would definitely try other
things if PEEKUSER didn't fail. In fact, I'm not sure that thing would
work at all because it's a dynamic instrumentation tool called Pin. It
updates code on the fly.

I've got one more question. Is it possible for a Linux emulated binary
to control a NetBSD native binary with ptrace? I see that it can attach,
pokedata and continue but would it be able to do more advanced things
correctly?

Alex


ACPI warning

2014-11-27 Thread Alexander Nasonov
Hi,

I noticed a strange warning on my console:

ACPI Warning: \_SB_.PCI0.LPCB.SNC_.GSNE: Insufficient arguments - Caller
passed 0, method requires 1 (20131218/nsarguments-263)

What does it mean?

In case it matters, I was playing with linux emulation and
/emul/linux/proc/version.

I'm running -current built in August.

NetBSD neva 7.99.1 NetBSD 7.99.1 (GENERIC) #0: Wed Aug 20 18:54:44 BST
2014
alnsn@neva:/home/alnsn/netbsd-current/src/sys/arch/amd64/compile/obj/GENERIC
amd64

Alex


Re: icache sync private rump component

2014-07-21 Thread Alexander Nasonov
Matt Thomas wrote:
>
> On Jul 19, 2014, at 2:02 AM, Alexander Nasonov <al...@yandex.ru> wrote:
>
> > To compile mips/cache.h in rump kernel, I needed to add -DMIPS3=1
> > to Makefile.rump for mips platforms. This is the only change outside
> > of sljit scope.
>
> the cache instructions are privileged.  There's a sysarch interface
> that you can use the clean the cache.

That's exactly what rumpcomp_sync_icache() hypercall does. I need this
definition to compile cache.c stub.

To draw an analogy, my hack is similar to this hack, except that I
define one baseline cpu while they define all cpus:

#ifdef _KERNEL
#if defined(_MODULAR) || defined(_LKM) || defined(_STANDALONE)
/* Assume all CPU architectures are valid for LKM's and standlone progs */
#define MIPS1   1
#define MIPS3   1
#define MIPS4   1
#define MIPS32  1
#define MIPS32R2        1
#define MIPS64  1
#define MIPS64R2        1
#endif

Alex


Re: icache sync private rump component

2014-07-20 Thread Alexander Nasonov
Justin Cormack wrote:
> On Jul 19, 2014 10:01 AM, Alexander Nasonov <al...@yandex.ru> wrote:
>
> > To compile mips/cache.h in rump kernel, I needed to add -DMIPS3=1
> > to Makefile.rump for mips platforms. This is the only change outside
> > of sljit scope.
>
> You surely can't do that, people may be trying to compile on non mips3
> hardware?

Some mips have different ILP models in userspace and in the kernel.
This means that rump test doesn't necessarily test the code in the
kernel. I came up with the following:

In sys/rump/Makefile.rump:

# Define baseline cpu for mips ports, required for
# rumpcomp_sync_icache() hypercall.
.if !empty(MACHINE_ARCH:Mmips*)
.if !empty(MACHINE_ARCH:Mmips64*)
CPPFLAGS+=  -DMIPS64=1
.else
CPPFLAGS+=  -DMIPS1=1
.endif
.endif

In sys/arch/mips/include/sljitarch.h:

#ifdef _LP64
#define SLJIT_CONFIG_MIPS_64 1
#else
#define SLJIT_CONFIG_MIPS_32 1
#endif

Alex


icache sync private rump component

2014-07-19 Thread Alexander Nasonov
Hi,

I'd like to commit a private rump component that adds a hypercall for
synching icache. This will help us to test bpfjit and npf on arm and
mips platforms.

I already sent a couple of emails to Antti but because he hasn't
replied yet and the branching date is fast approaching, I thought
I'd give a heads up to the community. I'll commit my changes if I
don't hear any objections in the next few days.

Basically, this component resides in librumpkern_sljit and it
exposes one function:

int rumpcomp_sync_icache(void *, uint64_t);

This declaration doesn't go to any public header file because it's
only being used inside librumpkern_sljit.

On arm, rumpcomp_sync_icache() makes an ARM_SYNC_ICACHE sysarch syscall,
while on mips it calls _cacheflush(), which is defined in libc.

On the kernel side, both arm and mips use a global object with a bunch
of function pointers to various cpu-related routines. Those objects
are defined in cpufunc.c and cache.c, respectively.

For the rump kernel, I add bare-bones versions of those objects with
only one non-NULL function pointer, for the icache_sync_range routine.
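
Roughly, the shape is as sketched below. All names here are hypothetical,
the table is cut down to two members, and the hypercall is replaced by the
GCC/Clang __builtin___clear_cache builtin as a portable stand-in for the
sysarch and _cacheflush() calls, so treat it as an illustration rather
than the code to be committed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down table of cpu cache routines. */
struct sketch_cache_ops {
	void (*dcache_wb_range)(uintptr_t, size_t);	/* unused in rump */
	void (*icache_sync_range)(uintptr_t, size_t);
};

/* Stand-in for the hypercall; assumes a GCC-compatible compiler. */
static int
sketch_sync_icache(void *addr, uint64_t len)
{
	char *p = addr;

	__builtin___clear_cache(p, p + len);
	return 0;
}

/* Kernel-side routine that simply forwards to the hypercall. */
static void
sketch_icache_sync_range(uintptr_t va, size_t len)
{
	(void)sketch_sync_icache((void *)va, (uint64_t)len);
}

/* Only the icache routine is filled in; everything else stays NULL. */
static const struct sketch_cache_ops sketch_ops = {
	.dcache_wb_range = NULL,
	.icache_sync_range = sketch_icache_sync_range,
};
```

Code that calls any other member of the table would crash on a NULL
pointer, which is acceptable here because only the sljit code path uses it.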

To compile mips/cache.h in rump kernel, I needed to add -DMIPS3=1
to Makefile.rump for mips platforms. This is the only change outside
of sljit scope.

Alex


Re: icache sync private rump component

2014-07-19 Thread Alexander Nasonov
Justin Cormack wrote:
> On Jul 19, 2014 10:01 AM, Alexander Nasonov <al...@yandex.ru> wrote:
>
> > To compile mips/cache.h in rump kernel, I needed to add -DMIPS3=1
> > to Makefile.rump for mips platforms. This is the only change outside
> > of sljit scope.
>
> You surely can't do that, people may be trying to compile on non mips3
> hardware?

Pre mips3 or post mips3? Or even completely different port like amd64?

If rump kernel worked without setting any of these variables, chances are,
it will work with -DMIPS3. I can actually change it to -DMIPS1.

Alex


Re: icache sync private rump component

2014-07-19 Thread Alexander Nasonov
Alexander Nasonov wrote:
> Justin Cormack wrote:
> > On Jul 19, 2014 10:01 AM, Alexander Nasonov <al...@yandex.ru> wrote:
> >
> > > To compile mips/cache.h in rump kernel, I needed to add -DMIPS3=1
> > > to Makefile.rump for mips platforms. This is the only change outside
> > > of sljit scope.
> >
> > You surely can't do that, people may be trying to compile on non mips3
> > hardware?
>
> Pre mips3 or post mips3? Or even completely different port like amd64?
>
> If rump kernel worked without setting any of these variables, chances are,
> it will work with -DMIPS3. I can actually change it to -DMIPS1.

Wikipedia (https://en.wikipedia.org/wiki/MIPS_architecture) states that
"Each revision is a superset of its predecessors." So, I can use MIPS I
as a base and set -DMIPS1=1 for rump. Does it sound good?

Alex


Re: icache sync private rump component

2014-07-19 Thread Alexander Nasonov
Alexander Nasonov wrote:
> Wikipedia (https://en.wikipedia.org/wiki/MIPS_architecture) states that
> "Each revision is a superset of its predecessors."

This is strictly true for MIPS I to MIPS V but we also have MIPS
32 and MIPS 64. MIPS 32 is based on MIPS II (with some features
from MIPS III, IV and V) while MIPS 64 is based on MIPS V.
So, -DMIPS1=1 should be fine for all ports.

Alex


Re: icache sync private rump component

2014-07-19 Thread Alexander Nasonov
Greg Troxel wrote:
> Why is this private?  If it's generally necessary, it should become part
> of the standard interface.  Is it just that sljit is the only place
> that is currently creating code that is later executed?

I made a change to the standard place about a month ago but Antti
objected and I rolled it back. We discussed it on source-changes-d.

If my understanding is correct, Antti would like to see a generic
interface that handles other related things like W^X protection.
In the end we agreed on creating a private component.

Yes, sljit is the only place that generates code on the fly. Other
executable code is handled by rt-linker.

Alex



Re: icache sync private rump component

2014-07-19 Thread Alexander Nasonov
19.07.2014, 19:57, Greg Troxel <g...@ir.bbn.com>:
> Are you saying that's
> the eventual goal and this is just a temporary hack to make sljit work
> on rump?

Committed code will have a comment stating that it's temporary but because I 
don't know what exactly should be in the new interface I can't promise that I 
can undertake that work.

Alex

PS sorry for the formatting, I'm replying from a tablet.

-- 
Alex


Re: lua: pending patches

2014-07-12 Thread Alexander Nasonov
Lourival Vieira Neto wrote:
> Hi folks,
>
> Here are some pending patches which I want to commit:
> http://www.netbsd.org/~lneto/pending/. Please, could someone review
> them?
>
> Thank you in advance!
>
> --
> 0007:
> lua: updated from 5.1 to 5.3 work3

I will review the changes later but I wonder why the rush to update lua
to a work-in-progress version?

Alex


Re: one time crash in usb_allocmem_flags

2014-02-10 Thread Alexander Nasonov
10.02.14, 12:15, Nick Hudson <sk...@netbsd.org>:
>
> Please file a PR so it doesn't get forgotten about.

Sure, will do.

>
> At first glance it doesn't look like usb_frag_freelist isn't
> protected correctly. It looks more like random corruption. What was the
> value of %edx?

The stack isn't in that function anymore, I'm not sure it shows the right 
values. 'show registers' command prints all zeroes except rbp=fe80ca6f1450 
and rsp=fe80ca6f1410.

-- 
Alex


one time crash in usb_allocmem_flags

2014-02-09 Thread Alexander Nasonov
Hi,

I was running current amd64 (last updated a few weeks ago) when I got
a random crash shortly after switching to X mode. If my analysis is
correct, it crashed in usb_allocmem_flags inside this loop:

	LIST_FOREACH(f, &usb_frag_freelist, next) {
		KDASSERTMSG(usb_valid_block_p(f->block, &usb_blk_fraglist),
		    "%s: usb frag %p: unknown block pointer %p",
		     __func__, f, f->block);
		if (f->block->tag == tag)
			break;
	}

It couldn't access f->block->tag. I wasn't actively using any of
the usb devices at that time. I wonder if it's a known problem or
should I file a PR? Details of the analysis are below.

Thanks,
Alex

crash> dmesg
...
fatal protection fault in supervisor mode
trap type 4 code 0 rip 808515e2 cs 8 rflags 13282 cr2
7f7ff5773020 ilevel 0 rsp fe80ca6f16c0
curlwp 0xfe811a8aaba0 pid 475.1 lowest kstack 0xfe80ca6ee000
panic: trap
cpu2: Begin traceback...
vpanic() at netbsd:vpanic+0x13e
printf_nolog() at netbsd:printf_nolog
startlwp() at netbsd:startlwp
alltraps() at netbsd:alltraps+0x9e
ehci_allocm() at netbsd:ehci_allocm+0x2c
usbd_transfer() at netbsd:usbd_transfer+0x5f
usbd_open_pipe_intr() at netbsd:usbd_open_pipe_intr+0xcb
uhidev_open() at netbsd:uhidev_open+0xb3
wsmouseopen() at netbsd:wsmouseopen+0xf3
cdev_open() at netbsd:cdev_open+0x87
spec_open() at netbsd:spec_open+0x183
VOP_OPEN() at netbsd:VOP_OPEN+0x33
vn_open() at netbsd:vn_open+0x1b0
do_open() at netbsd:do_open+0x102
do_sys_openat() at netbsd:do_sys_openat+0x68
sys_open() at netbsd:sys_open+0x24
syscall() at netbsd:syscall+0x9a
--- syscall (number 5) ---
7f7ff403af3a:
cpu2: End traceback...
rebooting in 10 9 8 7 6 5 4 3 2 1 0

crash> dmesg|grep usb
usb0 at xhci0: USB revision 2.0
usb1 at ehci0: USB revision 2.0
uhub0 at usb0: NetBSD xHCI Root Hub, class 9/0, rev 2.00/1.00, addr 0
uhub1 at usb1: vendor 0x8086 EHCI root hub, class 9/0, rev 2.00/1.00,
addr 1
usbd_transfer() at netbsd:usbd_transfer+0x5f
usbd_open_pipe_intr() at netbsd:usbd_open_pipe_intr+0xcb

crash> x 0x808515e2
usb_allocmem_flags+0xfd:        751a3948


$ objdump -d /netbsd
...
8085158b:   48 c7 c7 60 15 f8 80    mov    $0x80f81560,%rdi
80851592:   e8 69 42 d3 ff          callq  80585800 <mutex_enter>
80851597:   48 8b 05 c2 bf 69 00    mov    0x69bfc2(%rip),%rax        # 80eed560 <usb_frag_freelist>
8085159e:   48 85 c0                test   %rax,%rax
808515a1:   75 3c                   jne    808515df <usb_allocmem_flags+0xfa>

/* You don't need to look at this block */
808515a3:   48 8d 4d c8             lea    -0x38(%rbp),%rcx
808515a7:   45 31 c0                xor    %r8d,%r8d
808515aa:   ba 40 00 00 00          mov    $0x40,%edx
808515af:   be 00 10 00 00          mov    $0x1000,%esi
808515b4:   48 89 df                mov    %rbx,%rdi
808515b7:   e8 f4 fb ff ff          callq  808511b0 <usb_block_allocmem>
808515bc:   89 c3                   mov    %eax,%ebx
808515be:   85 c0                   test   %eax,%eax
808515c0:   75 ac                   jne    8085156e <usb_allocmem_flags+0x89>
808515c2:   48 8b 4d c8             mov    -0x38(%rbp),%rcx
808515c6:   c7 41 38 00 00 00 00    movl   $0x0,0x38(%rcx)
808515cd:   bb 40 00 00 00          mov    $0x40,%ebx
808515d2:   31 d2                   xor    %edx,%edx
808515d4:   eb 57                   jmp    8085162d <usb_allocmem_flags+0x148>
/* end of block. */

/*  LIST_FOREACH(f, &usb_frag_freelist, next) { */
808515d6:   48 8b 40 10             mov    0x10(%rax),%rax
808515da:   48 85 c0                test   %rax,%rax
808515dd:   74 c4                   je     808515a3 <usb_allocmem_flags+0xbe>
808515df:   48 8b 10                mov    (%rax),%rdx
808515e2:   48 39 1a                cmp    %rbx,(%rdx)
808515e5:   75 ef                   jne    808515d6 <usb_allocmem_flags+0xf1>
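
To relate the loop instructions to the C code, here is a minimal sketch
with stand-in structures. The struct and function names are hypothetical
mirrors of what I believe the usb_mem.c layout to be (block pointer first,
list entry third); only the field order matters for the offsets used in
the disassembly.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical mirror of the DMA block; tag is assumed to be first. */
struct sketch_block {
	void *tag;			/* compared against %rbx */
};

/* Hypothetical mirror of the fragment structure. */
struct sketch_frag {
	struct sketch_block *block;	/* first field:  mov (%rax),%rdx */
	unsigned int offs;
	LIST_ENTRY(sketch_frag) next;	/* third field: mov 0x10(%rax),%rax */
};

LIST_HEAD(sketch_fraglist, sketch_frag);

/* Same shape as the loop: stop at the first fragment with a matching tag. */
static struct sketch_frag *
sketch_find(struct sketch_fraglist *head, void *tag)
{
	struct sketch_frag *f;

	LIST_FOREACH(f, head, next) {
		if (f->block->tag == tag)	/* faults if f->block is garbage */
			break;
	}
	return f;	/* NULL when nothing matched */
}
```

If this reading is right, the faulting instruction at 808515e2 is the load
of f->block->tag through %rdx, so f->block itself was apparently not a
valid pointer at that moment.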


crash> ps
PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
475 1 7   2 0   fe811a8aaba0   Xorg
72   1 2   3   902   fe811a709b80  xinit
43   1 2   3   802   fe811a709760 sh
437  1 2   3   802   fe811d311720ksh
420  1 2   2   802   fe811e2b6240  getty
435  1 2   0   802   fe811e2b6a80  getty
429  1 2   3   802   fe811e2b6660  login
412  1 2   0   802   fe811e4c1220  getty
390  1 2   0   802   fe8119a90b60   cron
407  1 2   0   802   fe811d767b00  inetd
342  1 2   3   802   fe811d311300  

Mostly working uts(4) touchscreen

2014-01-04 Thread Alexander Nasonov
The touchscreen of my new notebook was reporting "touchscreen has no
range report" which turned out to be Z-axis related and I thought
that always passing z=0 might solve the problem.

And it indeed mostly solved the problem. I can move the mouse and
click inside the modular Xorg. The only problem is that the mouse
moves two times faster to the right and 4-5 times faster to the bottom.

Basically, it works as if I had a small touchpad at the top left
corner of my screen.

If I'm in a text mode and run 'wsmoused -d /dev/wsmouse1', the
cursor doesn't move at all but it flashes (in the center of the
screen) when I touch the screen.

I wonder where I should make the last change to restore a 1:1 mapping
between movements and screen pixels?
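
For reference, the mapping I'm after is a plain linear rescale of each
axis. The sketch below is a minimal version with a made-up function name;
the raw range has to come from the HID descriptor or from calibration, and
I don't know yet which layer (uts, wsmouse or the X driver) should apply
it.

```c
/*
 * Map a raw axis value in [raw_min, raw_max] to a pixel in
 * [0, pixels - 1].  Values outside the raw range are clamped.
 */
static int
sketch_scale_axis(int raw, int raw_min, int raw_max, int pixels)
{
	if (raw_max <= raw_min || pixels <= 0)
		return 0;
	if (raw < raw_min)
		raw = raw_min;
	if (raw > raw_max)
		raw = raw_max;
	return (int)((long long)(raw - raw_min) * (pixels - 1) /
	    (raw_max - raw_min));
}
```

With an assumed raw range of 0..4095 and a 1366 pixel wide screen, raw 0
maps to pixel 0 and raw 4095 maps to pixel 1365.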

Excerpts from dmesg, xorg.conf and my patch to dev/usb/uts.c are below.

Alex

Section "InputDevice"
    Identifier  "Mouse1"
    Driver      "mouse"
    Option      "Protocol" "wsmouse"
    Option      "Device" "/dev/wsmouse1"
    Option      "ZAxisMapping" "X"
EndSection

$ dmesg |grep uhid
uhidev0 at uhub2 port 3 configuration 1 interface 0
uhidev0: eGalax Inc. eGalaxTouch EXC7910-1031-12.00.03, rev 2.00/31.04, addr 5, 
iclass 3/1
uhidev0: 7 report ids
uhid0 at uhidev0 reportid 3: input=63, output=63, feature=0
uhid1 at uhidev0 reportid 5: input=0, output=0, feature=2
uts0 at uhidev0 reportid 6wsmouse1 at uts0 mux 0
uhid2 at uhidev0 reportid 7: input=0, output=0, feature=256
uhidev0 at uhub2 port 3 configuration 1 interface 0
uhidev0: eGalax Inc. eGalaxTouch EXC7910-1031-12.00.03, rev 2.00/31.04, addr 5, 
iclass 3/1
uhidev0: 7 report ids
uhid0 at uhidev0 reportid 3: input=63, output=63, feature=0
uhid1 at uhidev0 reportid 5: input=0, output=0, feature=2
uts0 at uhidev0 reportid 6wsmouse1 at uts0 mux 0
uhid2 at uhidev0 reportid 7: input=0, output=0, feature=256
uhidev0 at uhub2 port 3 configuration 1 interface 0
uhidev0: eGalax Inc. eGalaxTouch EXC7910-1031-12.00.03, rev 2.00/31.04, addr 4, 
iclass 3/1
uhidev0: 7 report ids
uhid0 at uhidev0 reportid 3: input=63, output=63, feature=0
uhid1 at uhidev0 reportid 5: input=0, output=0, feature=2
uts0 at uhidev0 reportid 6wsmouse1 at uts0 mux 0
uhid2 at uhidev0 reportid 7: input=0, output=0, feature=256

$ cvs diff -u sys/dev/usb/uts.c
Index: sys/dev/usb/uts.c
===
RCS file: /cvsroot/src/sys/dev/usb/uts.c,v
retrieving revision 1.3
diff -p -u -u -r1.3 uts.c
--- sys/dev/usb/uts.c   5 Jan 2013 23:34:21 -   1.3
+++ sys/dev/usb/uts.c   4 Jan 2014 23:08:03 -
@@ -75,6 +75,7 @@ struct uts_softc {
        struct hid_location sc_loc_btn;

        int sc_enabled;
+       int z_enabled;

        int flags;              /* device configuration */
 #define UTS_ABS                0x1     /* absolute position */
@@ -199,13 +200,9 @@ uts_attach(device_t parent, device_t sel
                return;
        }

-       /* requires HID usage Digitizer:In_Range */
-       if (!hid_locate(desc, size, HID_USAGE2(HUP_DIGITIZERS, HUD_IN_RANGE),
-           uha->reportid, hid_input, &sc->sc_loc_z, &flags)) {
-               aprint_error_dev(sc->sc_hdev.sc_dev,
-                   "touchscreen has no range report\n");
-               return;
-       }
+       sc->z_enabled = hid_locate(desc, size,
+           HID_USAGE2(HUP_DIGITIZERS, HUD_IN_RANGE),
+           uha->reportid, hid_input, &sc->sc_loc_z, &flags);

        /* multi-touch support would need HUD_CONTACTID and HUD_CONTACTMAX */

@@ -215,8 +212,12 @@ uts_attach(device_t parent, device_t sel
            sc->sc_loc_x.pos, sc->sc_loc_x.size));
        DPRINTF(("uts_attach: Y\t%d/%d\n",
            sc->sc_loc_y.pos, sc->sc_loc_y.size));
-       DPRINTF(("uts_attach: Z\t%d/%d\n",
-           sc->sc_loc_z.pos, sc->sc_loc_z.size));
+       if (sc->z_enabled) {
+               DPRINTF(("uts_attach: Z\t%d/%d\n",
+                   sc->sc_loc_z.pos, sc->sc_loc_z.size));
+       } else {
+               DPRINTF(("uts_attach: Z disabled\n"));
+       }
 #endif

        a.accessops = &uts_accessops;
@@ -368,7 +369,7 @@ uts_intr(struct uhidev *addr, void *ibuf
        } else
                dy = -hid_get_data(ibuf, &sc->sc_loc_y);

-       dz = hid_get_data(ibuf, &sc->sc_loc_z);
+       dz = sc->z_enabled ? hid_get_data(ibuf, &sc->sc_loc_z) : 0;

        if (hid_get_data(ibuf, &sc->sc_loc_btn))
                buttons |= 1;


Re: BPF memstore and bpf_validate_ext()

2013-12-29 Thread Alexander Nasonov
Mindaugas Rasiukevicius wrote:
> Moreover, the usual byte-code produced by tcpdump/pcap does not
> even use the memory store so your optimisations would most of the
> time be applicable anyway!

This is not always the case. For instance,

# tcpdump -y IEEE802_11 -i urtwn0 -d not tcp
tcpdump: data link type IEEE802_11
(000) ldx      #0x0
(001) txa
(002) add      #24
(003) st       M[0]
(004) ldb      [x + 0]
(005) jset     #0x8             jt 6    jf 11
(006) jset     #0x4             jt 11   jf 7
(007) jset     #0x80            jt 8    jf 11
(008) ld       M[0]
(009) add      #2
(010) st       M[0]
(011) ldb      [0]
(012) jset     #0x4             jt 27   jf 13
(013) ldb      [0]
(014) jset     #0x8             jt 15   jf 27
(015) ldx      M[0]
(016) ldh      [x + 6]
(017) jeq      #0x86dd          jt 18   jf 27
(018) ldx      M[0]
(019) ldb      [x + 14]
(020) jeq      #0x6             jt 37   jf 21
(021) ldx      M[0]
(022) ldb      [x + 14]
(023) jeq      #0x2c            jt 24   jf 27
(024) ldx      M[0]
(025) ldb      [x + 48]
(026) jeq      #0x6             jt 37   jf 27
(027) ldb      [0]
(028) jset     #0x4             jt 38   jf 29
(029) ldb      [0]
(030) jset     #0x8             jt 31   jf 38
(031) ldx      M[0]
(032) ldh      [x + 6]
(033) jeq      #0x800           jt 34   jf 38
(034) ldx      M[0]
(035) ldb      [x + 17]
(036) jeq      #0x6             jt 37   jf 38
(037) ret      #0
(038) ret      #65535
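
To make the use of the memory store explicit: instructions 000-003 put 24
into M[0], 008-010 add 2 to it on one branch, and the later "ldx M[0]"
instructions reload it as an offset. The toy interpreter below is a
hypothetical sketch that runs only that straight-line subset; the opcode
names and structures are made up and are not the real bpf_insn encoding.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MEMWORDS 16	/* same count as BPF_MEMWORDS */

enum sketch_op { S_LDX_IMM, S_TXA, S_ADD_K, S_ST, S_LD_MEM, S_RET_A };

struct sketch_insn {
	enum sketch_op op;
	uint32_t k;
};

/* Straight-line subset of the listing: 000-003 followed by 008-010. */
static const struct sketch_insn sketch_prog[] = {
	{ S_LDX_IMM, 0 },	/* (000) ldx #0x0 */
	{ S_TXA, 0 },		/* (001) txa */
	{ S_ADD_K, 24 },	/* (002) add #24 */
	{ S_ST, 0 },		/* (003) st M[0] */
	{ S_LD_MEM, 0 },	/* (008) ld M[0] */
	{ S_ADD_K, 2 },		/* (009) add #2 */
	{ S_ST, 0 },		/* (010) st M[0] */
	{ S_RET_A, 0 },		/* not in the listing: return A for inspection */
};

/* Run the program; returns A, or 0 on an out-of-range memory index. */
static uint32_t
sketch_run(const struct sketch_insn *p, size_t n,
    uint32_t mem[SKETCH_MEMWORDS])
{
	uint32_t a = 0, x = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if ((p[i].op == S_ST || p[i].op == S_LD_MEM) &&
		    p[i].k >= SKETCH_MEMWORDS)
			return 0;
		switch (p[i].op) {
		case S_LDX_IMM:	x = p[i].k; break;
		case S_TXA:	a = x; break;
		case S_ADD_K:	a += p[i].k; break;
		case S_ST:	mem[p[i].k] = a; break;
		case S_LD_MEM:	a = mem[p[i].k]; break;
		case S_RET_A:	return a;
		}
	}
	return a;
}
```

Running sketch_prog leaves 26 in both A and M[0], so a filter like this
one depends on the memory store keeping its value between instructions.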

Alex


Re: BPF memstore and bpf_validate_ext()

2013-12-20 Thread Alexander Nasonov
Sorry for top-posting. I'm replying from my phone.

I've not looked at linux bpf before. I remember taking a quick look at
the bpf_jit_compile function but I didn't like emitting binary machine code
with macro commands.

I spent a few minutes today looking at the linux code and I noticed a few
interesting things:

- They use negative offsets to access auxiliary data. So, there is a clear
distinction between the local memory store and external data. I don't think
it's a new addition, though.
- They have a big enum of commands. Many of them translate to bpf commands but
there are also special commands like "load protocol number into A". There is a
decoder from bpf but I have no clue how it works.
- Those commands are adapted to work with skbuf data.

Alex


20.12.13, 04:16, David Laight <da...@l8s.co.uk>:
>
> On Fri, Dec 20, 2013 at 01:28:12AM +0200, Mindaugas Rasiukevicius wrote:
> > Alexander Nasonov <al...@yandex.ru> wrote:
> > >
> > > Well, if it wasn't needed for many years in bpf, why do we need it now? ;-)
> > >
> >
> > Because it was decided to use BPF byte-code for more applications and that
> > meant there is a need for improvements.  It is called evolution. :)
>
> Has anyone here looked closely at the changes linux is making to bpf?
>
>   David
>
> --
> David Laight: da...@l8s.co.uk

-- 
Alex


Re: BPF memstore and bpf_validate_ext()

2013-12-19 Thread Alexander Nasonov
Mindaugas Rasiukevicius wrote:
> > > That is great, but we are going circles here.  If a program just needs
> > > some values stored in the memory store - why would you create a COP to
> > > get few integers instead of simply letting the caller to pass them.  It
> > > is the thing which actually has those numbers.

Well, if it wasn't needed for many years in bpf, why do we need it now? ;-)

To answer your question about a program that needs some values stored,
you want a protocol to pass some values from the host C environment to
the embedded BPF environment. You do it in a very ad-hoc manner. It's
not possible to say by looking at a bpf program which memwords are passed
from outside and which are really internal.

BPF is a language and all its extensions should be designed with
this in mind.

> I have already explained the benefits of the external memory store in this
> thread (simple, straightforward way to pass values and have their cache).
> Your main argument seems to be that the external memstore makes it more
> difficult for bpfjit to optimise certain corner cases (which, as been
> pointed out, are not common).

Well, I was arguing about performance because you were concerned about
the performance of your bpf programs.

Your solution lacks a concept. You're mixing things together without
bothering much about how they will interact to achieve a common
goal. I already pointed out that your COP is powerful enough to
copy external data to a memory store. To add to my other point
above, you're mixing local and global memory. Or, to use an
analogy with Lua (which also interprets byte code),
there are local and global variables with a very clear distinction
between them. With your proposal, there is no clear distinction
inside a bpf program between memory words that are internal and
those that are external.

Though, I don't think I want to see global memory in bpf.

> Given that BPF did not have JIT compiler for many years and even a very
> simplistic JIT compilation is a huge win - I do not understand why do you
> consider corner case optimisations as a higher value than the benefits
> provided by the external memstore.

I personally see no benefits in external memstore at all. You can copy
external data with a special copfunc.
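
A minimal sketch of what "copy external data with a special copfunc" could
look like. The types and the function below are hypothetical and do not
match the real bpf_copfunc_t signature; they only show how caller-owned
external data stays separate from the program's own memory store.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MEMWORDS 16

/* Hypothetical argument pack: packet, auxiliary data, scratch memory. */
struct sketch_args {
	const uint8_t *pkt;
	size_t buflen;
	const uint32_t *aux;		/* external values owned by the caller */
	size_t naux;
	uint32_t mem[SKETCH_MEMWORDS];	/* internal to the bpf program */
};

/*
 * Hypothetical coprocessor function: copy aux[idx] into mem[idx] and
 * also return it in A.  Out-of-range indices yield 0.
 */
static uint32_t
sketch_cop_load_aux(struct sketch_args *args, uint32_t idx)
{
	if (idx >= args->naux || idx >= SKETCH_MEMWORDS)
		return 0;
	args->mem[idx] = args->aux[idx];
	return args->mem[idx];
}
```

With this shape, a program that wants an external value has to say so with
an explicit cop instruction, and every memword it never loads this way is
known to be internal.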

> > BPF_COP
> > is powerful enough for getting external data. It's a bit slower but I
> > can give you SLJIT_FAST_CALL interface. If it's still slow for you,
> > you should stop using sljit and consider other alternatives.
>
> That is wonderful, but not the point..

Ok, let me make a point then. If you want to make significant changes to
bpf, you'd better start from scratch and design whatever you want. It'd be
easier than arguing about bpf changes forever.

Alex


Re: BPF memstore and bpf_validate_ext()

2013-12-18 Thread Alexander Nasonov
Mindaugas Rasiukevicius wrote:
> Alexander Nasonov <al...@yandex.ru> wrote:
> > My point is that you mix argument pack with something else. They
> > should be separated.
>
> The external memory store can be used as an argument (and the initial
> values determined as proposed in this thread).  If you want, we can pass
> it as a third argument, I just think it is a pointless indirection level.
> Would even need extra wrapping i.e. more work in bpfjit, but if you want
> that separated - fine.

I'll tell you more. Mixing mem with arguments will not only save
me one register/one indirection level but it will also allow me to use
memory addressing. It's a very attractive solution from a performance
perspective. But it's clearly a hack and I don't want to have hacks
in a public API.

> > I already offered to support SLJIT_FAST_CALL copfuncs in bpfjit.
> > They're much faster than regular copfuncs.  But that means you
> > will need to emit sljit code and you will have a limited number of
> > sljit registers and all other limitations of sljit. You still should
> > be able to copy data from auxiliary argument to memstore and you
> > can do it quite fast. ...
>
> That is great, but we are going circles here.  If a program just needs
> some values stored in the memory store - why would you create a COP to
> get few integers instead of simply letting the caller to pass them.  It
> is the thing which actually has those numbers.

What if your program just needs something else? Does it mean we have
to add a new feature to bpf? No. We have to stop somewhere. BPF_COP
is powerful enough for getting external data. It's a bit slower but I
can give you SLJIT_FAST_CALL interface. If it's still slow for you,
you should stop using sljit and consider other alternatives.

> > It's easy to
> > suggest to have a flag but it's actually a lot of work. You need to
> > write several lines of C code to generate a single instruction.
>
> The flag would basically say treat the memstore as internal i.e. just
> do all the optimisations, because I assure there are no side effects.
> It is a green light for what you said you already want to implement.
> That would be one-liner check or I miss something?

Ok, with my recent change on github, it's indeed a one-line change,
but having two different modes can introduce subtle bugs. If you
forget to set the mode properly, you may have a situation where your
memory is in a bad state only for packets of length 131 (or whatever)
and it's consistent for all other lengths.
As the maintainer of bpfjit, I'm responsible for a robust public
interface.

Alex


Re: BPF memstore and bpf_validate_ext()

2013-12-15 Thread Alexander Nasonov
Mindaugas Rasiukevicius wrote:
> Also, it was you who proposed sljit.

Proposed for what? I implemented bpfjit using sljit, if that's what
you mean. I offered you help with implementing a JIT compiler for
npfcode. It was your idea to add COP/COPX and I agreed to implement
support for it in bpfjit. I never agreed to implement external
memory.

> It can optimise *most* practical
> cases (80-20 rule) and I am happy with that.  I do not understand why
> you are concerned about those rare/unusual cases.  Do you have some
> particular application in mind?  Something else than in our tree?

I don't have any application in mind but I don't understand why you are
pushing two extensions to bpf solely to get a performance benefit for
your cases, and you don't care that bpf loses performance even if there
are no cop instructions in a program at all.

> We can pass the memstore pointer as a separate argument (it would be
> three arguments, fine for sljit), but what's the point..

My point is that you mix the argument pack with something else. They
should be separated.

> Why are you ignoring the fact that your optimisations can still be added
> and be effective?  I already suggested - we can add a flag to indicate that
> the caller does not care about the result in the memory store.

I already offered to support SLJIT_FAST_CALL copfuncs in bpfjit.
They're much faster than regular copfuncs.  But that means you
will need to emit sljit code and you will have a limited number of
sljit registers and all the other limitations of sljit. You should still
be able to copy data from the auxiliary argument to memstore and you
can do it quite fast.

You didn't respond to me about it. If you ignored it because you don't
want to deal with sljit then you're pulling the blanket over to your
side. It's easy to suggest having a flag but it's actually a lot of
work. You need to write several lines of C code to generate a single
instruction.

I don't want to maintain two different modes of code generation.
If you want this flag, go ahead, write the code, write the tests
and everyone will be happy.


> Moreover,
> the usual byte-code produced by tcpdump/pcap does not even use the memory
> store so your optimisations would most of the time be applicable anyway!

Maybe in this case. I don't know all use cases. There are some
IDS/IPS that use bpf but I never looked at them.

In any case, this functionality will have to be tested.

Alex


Re: BPF memstore and bpf_validate_ext()

2013-12-10 Thread Alexander Nasonov
Mindaugas Rasiukevicius wrote:
> Alexander Nasonov al...@yandex.ru wrote:
> > In your case BPF_COP is always the first instruction (or in the
> > first linear block). Because it's the first and it's a function
> > call, it can be moved outside of bpf program and inlined.
> >
> > If this is the case, you don't need BPF_COP at all.
>
> For this particular case - yes, correct.
>
> > Except of course you may run out of memwords one day and you want
> > to have BPF_COP as a fallback.
>
> I still need BPF_COP for other tasks (e.g. NPF_COP_TABLE).  Running out
> of words is unlikely, but COP can certainly be used to handle that too.

I wonder why you need two different features when one is a superset
of the other and you could just use BPF_COP. If the only reason
is performance, external memory is not a win-win proposition.

PS sljit has a non-standard fast call mechanism. If you're really
concerned about performance, you can generate copfuncs and I can
call them from bpfjit.

Alex


Re: BPF memstore and bpf_validate_ext()

2013-12-09 Thread Alexander Nasonov
Mindaugas Rasiukevicius wrote:
> Now that the BPF memory store can be external and can be provided by the
> caller of the BPF program, we can use it to pass some values or reuse it
> as a cache.

I don't think that external memory is needed. You added an auxiliary
argument; what is it for if not for passing some values?

External memory disables several optimizations in bpfjit for most
filter programs even if they don't use BPF_COP.

1. sljit has a limited number of registers especially memory addressable
   registers. I will have to allocate a register to a memstore pointer.
   Because all registers are already assigned, I will have to start
   moving data. The best I can think of is assigning BPFJIT_TMP2 to a
   new pointer but I will need to switch to EREG register in 32bit
   BPF_LD. This register is emulated on some arches, for instance on
   i386.
2. I have (or plan to have) some simple optimizations which aren't
   possible by rewriting a bpf program. For instance, if there are
   multiple loads in a linear block, I will only generate one index
   check.
   Here is an example: LD+W(0) ST(0) LD+W(4) ST(1) LD+W(8) ST(2) ...
   It loads packet words with increasing offsets and stores them
   in memwords for later processing. I generate only one index
   check for the first LD instruction (return 0 if the packet is shorter
   than 12 bytes). I can exit early without storing any word because
   I know that memory will not be available after that bpf program
   returns.
   If you make it external, I will have to generate three index
   checks to make sure that stores are visible to a caller for all
   possible packet lengths.
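
The trade-off in item 2 can be modelled in plain C. This is only a hand-written sketch of the generated code's behaviour, not bpfjit output; the names run_internal, run_external and load_be32 are made up for illustration. With a private memstore a single length check covers all three loads, while a caller-visible memstore needs one check per load so that the stores made before an early exit are still observable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NWORDS 3	/* LD+W(0) ST(0) LD+W(4) ST(1) LD+W(8) ST(2) */

/* Read a 32-bit word in network byte order, like BPF_LD+BPF_W+BPF_ABS. */
static uint32_t
load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Private memstore: one length check covers all three loads. */
static unsigned int
run_internal(const uint8_t *pkt, size_t len)
{
	uint32_t mem[NWORDS];
	size_t i;

	assert(pkt != NULL);
	if (len < 4 * NWORDS)
		return 0;	/* nothing stored, and nobody can tell */
	for (i = 0; i < NWORDS; i++)
		mem[i] = load_be32(pkt + 4 * i);
	(void)mem;		/* later instructions would consume mem[] */
	return 1;
}

/* Caller-visible memstore: every load needs its own length check. */
static unsigned int
run_external(const uint8_t *pkt, size_t len, uint32_t mem[NWORDS])
{
	size_t i;

	assert(pkt != NULL && mem != NULL);
	for (i = 0; i < NWORDS; i++) {
		if (len < 4 * (i + 1))
			return 0;	/* earlier stores must stay visible */
		mem[i] = load_be32(pkt + 4 * i);
	}
	return 1;
}
```

For an 8-byte packet both functions return 0, but run_external has already written the first two words by then, which is exactly the state a single hoisted check could not reproduce.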

> However, this has to be supported by bpf_validate() - we have
> to tell it which words are going to be initialised by the caller, since it
> currently checks and prevents loads of the uninitialised memstore words.
>
> Hence, here is the proposed API function for this:
>
> int bpf_set_initwords(bpf_ctx_t *bc, uint8_t *wordmask, size_t len);

Your description of the new API is too terse to be absolutely
certain but it doesn't look like the proposal I sent to you privately.

I think you can add a mask to each copfunc which indicates which
of the words it loads and which words are stored by the copfunc.

For instance, your npf_cop_l3 will look like

{ .copfunc = npf_cop_l3,
  .loads  = BPF_COP_LOAD_NONE,
  .stores = BPF_COP_STORE(BPF_MW_IPVER) |
            BPF_COP_STORE(BPF_MW_L4OFF) |
            BPF_COP_STORE(BPF_MW_L4PROTO)
},
/* ... other copfuncs ... */

Validation of BPF_COP instruction will look very natural:

/* inside BPF_COP in bpf_validate_ext */
if (bc->copfuncs) {
	if (bc->copfuncs[i].inwords & invalid)
		goto out;
	invalid &= ~bc->copfuncs[i].outwords;
}

It's a bit trickier for BPF_COPX because you need to pre-calculate
the loads/stores masks, but it's doable.

At the moment, bpf_filter_ext() resets the invalid mask. This
translates to .stores = BPF_COP_STORE_ALL.
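
The mask bookkeeping described above can be tried out in isolation. The following is a minimal sketch under stated assumptions: struct cop_desc, MW_BIT and validate_cops are hypothetical names and do not come from the bpf sources, and the program is reduced to a plain sequence of coprocessor calls. A set bit in the invalid mask means the corresponding memword has not been initialised yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MW_BIT(n)	(UINT32_C(1) << (n))	/* hypothetical helper */

/* Hypothetical per-copfunc description: which memwords it reads and writes. */
struct cop_desc {
	uint32_t loads;
	uint32_t stores;
};

/*
 * Walk a sequence of copfunc indices.  Reject the sequence as soon as a
 * copfunc would load a word that is still marked invalid; otherwise clear
 * the invalid bits of the words it stores.
 */
static bool
validate_cops(const struct cop_desc *cops, const size_t *calls,
    size_t ncalls, uint32_t invalid)
{
	size_t i;

	assert(cops != NULL && calls != NULL);
	for (i = 0; i < ncalls; i++) {
		const struct cop_desc *c = &cops[calls[i]];

		if (c->loads & invalid)
			return false;
		invalid &= ~c->stores;
	}
	return true;
}
```

With one copfunc that stores words 0 to 2 and another that loads word 1, the order "store, then load" validates and the reverse order is rejected.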

Alex


Re: [patch] put Lua standard libraries into the kernel

2013-11-28 Thread Alexander Nasonov
Marc Balmer wrote:
> On 27.11.13 22:23, Martin Husemann wrote:
>
> > Can't it be a per-state option, passed by luactl when creating the state?
>
> That is actually an excellent idea.  So what should be the default,
> stdlibs enabled or not enabled?

I'm a bit late to join the conversation but I wanted to emphasize one
quite important thing:

Please keep the look & feel of plain userspace Lua even in kernel space.

If you look at NetBSD C code in the kernel, and compare it with some
other OS's kernel code, NetBSD code will likely look more natural.
I hope that one day we can say the same about Lua code in NetBSD.

There is no need to invent new things if they're already provided by the
language.

So, I don't think we need a special sysctl to control loading of standard
libraries. Lua already gives you luaL_openlibs() and luaL_requiref().

Alex


Re: [patch] put Lua standard libraries into the kernel

2013-11-28 Thread Alexander Nasonov
Marc Balmer wrote:
> If they are, then of course the standard functions like luaL_openlibs()
> are used.  So this is not about a replacement mechanism for
> luaL_openlibs(), but to control whether luaL_openlibs() is called or not.

Yes, I understand this but I don't understand why you need a special
control for it. It's your choice as a programmer to call that function
or not. Why do you need a special variable/sysctl to control this?

Alex


Re: [patch] put Lua standard libraries into the kernel

2013-11-28 Thread Alexander Nasonov
Alexander Nasonov wrote:
> Yes, I understand this but I don't understand why you need a special
> control for it. It's your choice as a programmer to call that function
> or not. Why do you need a special variable/sysctl to control this?

Ok, I missed the luactl(8) part, I guess. I don't know your use case but
this remote orchestration of kernel Lua state from userspace sounds a bit
dodgy to me.

Since you support 'luactl require', you can use it to load base libraries.
There is no need for a boolean flag if you already have a more generic
mechanism in place.

Alex


Re: in which we present an ugly hack to make sys/queue.h CIRCLEQ work

2013-11-21 Thread Alexander Nasonov
matthew green wrote:
> thanks to dholland, apb, joerg, martin, matt, and skrll for this
> least-worst-so-far solution.

I'm pretty sure that __attribute__((__may_alias__)) was on the table,
but I wonder why it was rejected.

Thanks,
Alex


Re: [patch] changing lua_Number to int64_t

2013-11-17 Thread Alexander Nasonov
Mouse wrote:
> Also, using an exact-width type assumes that the hardware/compiler in
> question _has_ such a type.
>
> It's possible that lua, NetBSD, or the combination of the two is
> willing to write off portability to machines where one or both of those
> potential portability issues becomes actual.  But that seems to be
> asking for trouble to me; history is full of "but nobody will ever want
> to port this to one of _those_" that come back to bite people.

I was perfectly fine with long long because it's long enough to
represent all integers in the range [-(2^53-1), 2^53-1].

As Marc pointed out, Lua has a single numeric type which is double
by default. Many Lua libraries don't need FP and they use a subset of
exactly representable integers (not all of them do range checks, though).
Extending the range when porting from userspace to kernel will decrease
the pain factor of porting.

Alex


Re: A Library for Converting Data to and from C Structs for Lua

2013-11-17 Thread Alexander Nasonov
Marc Balmer wrote:
> I came across a small library for converting data to and from C structs
> for Lua, written by Roberto Ierusalimschy:
>
> http://www.inf.puc-rio.br/~roberto/struct/
>
> I plan to import it and to make it available to both lua(1) and lua(4)
> as follows:
>
> The source code will be imported into
> ${NETBSDSRCDIR}/sys/external/mit/struct unaltered and then be modified
> to compile on NetBSD.

Shouldn't it be mit/luastruct?

> Then ${NETBSDSRCDIR}/sys/module/luastruct/ and
> ${NETBSDSRCDIR}/lib/lua/struct/ directories will be added with the
> respective Makefiles etc.

Are you going to make it a kmod? It's overkill, IMO. Not every Lua
module is a kernel module. The framework should support this distinction.

PS I don't see devel/lua-struct. It'd be nice to have it in pkgsrc.

Alex


Re: [patch] changing lua_Number to int64_t

2013-11-16 Thread Alexander Nasonov
Lourival Vieira Neto wrote:
> On Sat, Nov 16, 2013 at 8:52 PM, Christos Zoulas chris...@astron.com wrote:
> > In article 52872b0c.5080...@msys.ch, Marc Balmer  m...@msys.ch wrote:
> > > Changing the number type to int64_t is certainly a good idea.  Two
> > > questions, however:
> >
> > Why not intmax_t?
>
> My only argument is that int64_t has a well-defined width and, AFAIK,
> intmax_t could vary. But I have no strong feelings about this. Do you
> think intmax_t would be better?

int64_t should be enough to cover the range of exactly representable
integers in a userspace Lua program where lua_Number is double.

I don't see a need for a bigger type unless mainstream Lua switches to
long double. I don't expect it to happen any time soon.

PS Why do you still use a shadow copy of luaconf.h? Please add your
changes to the main luaconf.h. If you guard your kernel changes properly
with _KERNEL, they will not affect userspace.

PPS "%"PRId64 may break in C++11; a space between the string literal and
the macro should fix it.
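
The spacing issue comes from C++11 user-defined literals: a string literal immediately followed by an identifier is parsed as one literal with a suffix, so the macro is never expanded. A minimal C sketch of the portable spelling follows; the wrapper name format_i64 is hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format a 64-bit integer.  The space between "%" and PRId64 keeps the
 * macro a separate token, which is also valid when the same header is
 * compiled as C++11.
 */
static int
format_i64(char *buf, size_t len, int64_t v)
{
	assert(buf != NULL);
	return snprintf(buf, len, "%" PRId64, v);
}
```

The spaced form costs nothing in C, so using it everywhere avoids having two spellings in shared headers.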

Alex


Re: Moving Lua source codes

2013-10-21 Thread Alexander Nasonov
Marc Balmer wrote:
> Yes, this is an issue.  Dunno if we need a 'kluac' or so, at the moment
> I'd say loading code from source form is ok.

Supporting binary chunks is more challenging because the binary format
can change completely in a new Lua version. Source code is more stable;
there are often small changes in the language too (like new keywords),
but I'd say they are manageable.

Alex


Re: Lua in-kernel (lbuf library)

2013-10-19 Thread Alexander Nasonov
Terry Moore wrote:
> Just to clarify a bit
>
> > Indeed, we started with Lua because AWK was not embeddable and because of
> > the 1-origin issue.
>
> We thought, mistakenly, that a language that didn't look very much like C
> would cause fewer problems because of the 0-origin / 1-origin difference.
> Apparently, however, it's a deeper habit of thought. Ditto for the string
> escape sequences. Apparently the '[' and ']' for indexing, and the '\' for
> character escapes, trigger very deeply seated patterns.

While I agree that the 0-1 switch is mentally hard [*], if you often need
to access arrays by index in Lua code then you're either solving the wrong
problem or doing something the wrong way.

Lua is designed to be a glue language. If you need arrays (as opposed
to collections which can be iterated over with ipairs/pairs/iterators),
then you're likely doing low-level C stuff in Lua.

Data structures in kernel are often organized as linked lists. To
iterate from Lua, you probably want to write an iterator and also
a lookup function.

Let's say you want to iterate over all ps processes from ddb (if gdb
has python support, why can't we have a cooler thing?). You do this:

ddb> lua on
ddb> for p in processes:match("ps") do print(p.pid) end
906
2245
4935
...
ddb> =processes:find(1).path
/sbin/init
ddb> lua off

You can do a lot without ever accessing elements by integer indices.

> Based on our experience, it seems risky to use Lua to implement code that
> interoperates with kernel-like C code in security-critical contexts. We
> found this risk to be insurmountable. Engineers who are used to zero-origin
> code, and who are looking at zero-origin code for reference, will make
> zero-origin mistakes.

Again, don't do C stuff in Lua. However, there is a different kind of
risk when developing security code in Lua: it's the layers of Lua
itself. Any complex layer introduces a non-negligible risk.

[*] I found it out while developing a mixed Lua-C module which could also
detect LuaJIT and use its zero-based FFI structures.

Alex


Re: Lua in-kernel (lbuf library)

2013-10-15 Thread Alexander Nasonov
Christoph Badura wrote:
> Also, having to switch mentally between zero-based arrays in the kernel C
> code and 1-based arrays in the Lua code makes my head ache.

Yeah, I totally agree here. There are several other reasons why Lua will
not play in the same league as C in the kernel. But for some
projects, the classical module (in C) and scripting (in Lua) separation
works extremely well. This includes complex configurations where you
need to orchestrate many calls to C code, or some complex tasks like
generating code for bpf or the now-defunct npf opcode.

Alex


Re: Lua in-kernel (lbuf library)

2013-10-15 Thread Alexander Nasonov
Lourival Vieira Neto wrote:
> I'm developing a library to handle buffers in Lua, named lbuf. It is
> being developed as part of my efforts to perform experimentation in
> kernel network stack using Lua. Initially, I intended to bind mbuf to
> allow, for example, to write protocols dissectors in Lua. For example,
> calling a Lua function to inspect network packets:
>
> function filter(packet)
>   if packet.field == value then return DROP end
>   return PASS
> end
>
> Thus, I started to design a Lua binding to mbuf inspired by '#pragma
> pack' and bitfields of C lang. Then, I realized that this Lua library
> could be useful to other kernel (and user-space) areas, such as device
> drivers and user-level protocols. So, I started to develop this
> binding generically as an independent library to give random access to
> bits in a buffer. It is just in the early beginning, but I want to
> share some thoughts.

I wonder if you looked at Lua support in Wireshark [1]? Unfortunately,
it's GPL and they even have a special section 'Beware the GPL' on wiki.

[1] http://wiki.wireshark.org/Lua

Alex


Re: Adding Lua to the kernel and moving Lua source codes

2013-10-07 Thread Alexander Nasonov
Jean-Yves Migeon wrote:
> On 07/10/2013 12:05, Alan Barrett wrote:
> > I still haven't seen a use case for in-kernel Lua.  I mean, an
> > example (preferably a working example) of something useful that could
> > not easily be done without in-kernel Lua.  I'd prefer not to see it
> > added to the base system without a use case.
>
> I second the use case.
>
> Not something as polished or finished as possible, but at least
> shows that it is useful (I am well aware of the cause/effect circle
> ie. you cannot prove it without having Lua first available, but
> breaking that vicious circle with a few examples can help).

In the early days of bpfjit when I didn't yet know of sljit, I was
considering ripping off lua code for generating machine instructions
from LuaJIT2 code.

I still believe that rewriting bpfjit in Lua would improve readability.
I even started a rewrite, mostly as a good use case for Lua bindings for
sljit, but it's a low-priority project for me.

Alex


lua_Number in the kernel

2013-10-01 Thread Alexander Nasonov
[ Cc'ing Justin, who seems to be interested in Lua in NetBSD,
  but I'm not sure whether he's subscribed to tech-kern@ ].

Like some other people, I believed that the Lua kernel project was dormant
and was just waiting for any activity before starting a discussion here,
but Marc replied today to an ongoing discussion on developers@. Hence,
my post.

It's very important to decide on the lua_Number type in the kernel early
because it affects nearly all Lua code and also because arithmetic
operations are often a cause of security holes.

In Lua, you can use your own signed type for numbers. The de facto standard
for userspace is double (note that LuaJIT supports _only_ the double type)
but ptrdiff_t is also recommended if for some reason double isn't an
option.

So, ptrdiff_t looks like a reasonable choice for the kernel but when I
was developing sljit binding for Lua [1] and I was trying to make it
robust against overflows for two different types (double and ptrdiff_t)
on both 32bit and 64bit platforms, I ended up creating bindings [2] for
an arbitrary precision library.

The problem is that there are three different ranges for integer
arithmetic:

1. IEEE 754 double is always -(2^53-1) to 2^53-1 regardless of platform,
2. ptrdiff_t on 2s-complement 32bit platform is -2^31 to 2^31-1,
3. ptrdiff_t on 2s-complement 64bit platform is -2^63 to 2^63-1.

Note also that the min values of ptrdiff_t on 2s-complement platforms don't
have regular arithmetic semantics (e.g. negation produces the identical value).

It's very challenging to write code that supports all three options
without resorting to #ifdef (or a Lua equivalent) or splitting a number
into low and high halves.

I'd like to propose that lua_Number in the kernel should always be
int64_t (*). This type will guarantee regular arithmetic rules for
the range (-2^53, 2^53) and for 32-bit signed integer range, in
particular.
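
A small sketch of what "regular arithmetic rules for the range (-2^53, 2^53)" buys with an int64_t carrier. The names K_EXACT_LIMIT, in_exact_range and checked_add are assumptions for illustration and are not part of any Lua or NetBSD API. Because both operands are below 2^53 in magnitude, their sum is below 2^54 and the intermediate int64_t addition cannot overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define K_EXACT_LIMIT	(INT64_C(1) << 53)	/* 2^53, hypothetical name */

/* True if v lies in the open range (-2^53, 2^53). */
static bool
in_exact_range(int64_t v)
{
	return v > -K_EXACT_LIMIT && v < K_EXACT_LIMIT;
}

/*
 * Add two numbers that a double-based Lua could also represent exactly.
 * Returns false if an operand or the result leaves the range.
 */
static bool
checked_add(int64_t a, int64_t b, int64_t *res)
{
	int64_t sum;

	assert(res != NULL);
	if (!in_exact_range(a) || !in_exact_range(b))
		return false;
	sum = a + b;	/* |a|, |b| < 2^53, so no int64_t overflow */
	if (!in_exact_range(sum))
		return false;
	*res = sum;
	return true;
}
```

Every 32-bit signed value passes in_exact_range, so code written for 32-bit arithmetic runs unchanged on this carrier.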

When possible, Lua code can assume 32-bit arithmetic (without
2s-complement irregularities). Some Lua libraries (for instance, BitOp)
assume 32bit. Other libraries that we might port to the kernel are often
written on the assumption that lua_Number is double, and int64_t should be
safe to use too.

When you need the full 64-bit range (signed or unsigned), an arbitrary
precision library can be used instead.

[1] https://github.com/alnsn/luaSljit
[2] https://github.com/alnsn/luaBn
(*) I assume that int64_t type is available on all supported platforms
but it can be emulated by a compiler, of course.

Alex


Re: Importing lua(4), but where in the source tree?

2013-01-09 Thread Alexander Nasonov
Marc Balmer wrote:
> Sure. The full diff is at
> http://www.netbsd.org/~mbalmer/diffs/kernel_lua_010.diff and it's the files
> that the diff now places in sys/modules/lua/ that I think should better go to
> sys/dev/lua/

These placeholder files look hackish. Do you compile Lua with all extras
like turned off? How many in-place changes do you anticipate?

As you move Lua to the kernel space, dist location should be moved to
sys/external.

Alex


Re: Importing lua(4), but where in the source tree?

2013-01-09 Thread Alexander Nasonov
Alexander Nasonov wrote:
> These placeholder files look hackish. Do you compile Lua with all extras
> like turned off? How many in-place changes do you anticipate?

... and how many of these changes will reside outside of luaconf.h?

> As you move Lua to the kernel space, dist location should be moved to
> sys/external.

Alex

