Re: Hurd SSI paper

2019-06-17 Thread Brent W. Baccala
On Sat, Jun 15, 2019 at 11:05 AM Almudena Garcia 
wrote:

> Great job!! I'm reading the paper, looks very good.
>
> I'm working on SMP support, using xAPIC (still 32-bit). Your work makes my
> project more meaningful :-)
> You can find my work here: https://github.com/AlmuHS/GNUMach_SMP
>

I haven't been working on Hurd for the last year or so, but I do pop in and
browse the mailing list from time to time, so I am aware of your work.

If Hurd's going to be a player on the operating system stage, we obviously
have to get SMP (and 64-bit addressing) working, or we won't be taken
seriously.

So, you're doing the "grunt work" to move this project forward, and I
applaud your effort!

agape
brent


Re: Hurd SSI paper

2019-06-17 Thread Brent W. Baccala
On Mon, Jun 17, 2019 at 4:12 AM Joan Lledó  wrote:

> Hi Brent,
>
> I've only read the introduction sections for now, but I found a little
> typo: the title of the paper says: "Building an SSI *Cluter* with
> GNU/Hurd", shouldn't it be "Cluster"?, it lacks the 's'.
>

Perhaps a Freudian slip?

Thank you.

agape
brent


Hurd SSI paper

2019-06-14 Thread Brent W. Baccala
FYI -

I had a request to write a paper documenting my work on Hurd, so I did.
It's called "Building an SSI Cluster with GNU/Hurd" and it's available here:

https://www.freesoft.org/software/hurd/building.pdf

The LaTeX source is also in the github repository (link in the paper).

Feedback is welcome!

agape
brent


Re: Personal Introduction

2018-04-05 Thread Brent W. Baccala
On Thu, Apr 5, 2018 at 7:27 PM, Samuel Thibault <samuel.thiba...@gnu.org>
wrote:

> Brent W. Baccala, le jeu. 05 avril 2018 19:06:23 -0400, a écrit:
>
> Yes, Mach is
> sort of an exception, because it was merely the ground for the whole
> kernel.  But being BSD-licenced, it was not posing problems for future
> re-licensing.


Well, Mach is what we're talking about now, and if being BSD-licensed
doesn't pose a problem, then why should my contributions be a problem if
they're GPL-licensed?

Please make a decision about the 169 line patch I attached to my earlier
email.  You can use it under the GPL, and I'm even willing to assign
copyright on it to the FSF.  But it's not going to be "all past and future
work" on Hurd, or gnumach, or anything else.

I think Charlie Sale deserves some clear guidance on whether or not he can
base a new tracing facility on that patch.

agape
brent


Re: Personal Introduction

2018-04-05 Thread Brent W. Baccala
On Thu, Apr 5, 2018 at 6:53 PM, Samuel Thibault <samuel.thiba...@gnu.org>
wrote:

> Brent W. Baccala, le jeu. 05 avril 2018 18:42:24 -0400, a écrit:
>
> > that the Debian maintainers have threatened to drop us completely unless
> > we get our code upstreamed into the main glibc code base.
>
> Nope. That never happened, you are completely inventing this.
>
> What did happen is that upstream glibc said they'd really want to have
> glibc master actually build on the Hurd (and even cross-build, which is
> not the harder part) so that they can do build tests while revamping the
> code.  That only just makes sense.
>

Sorry, then.  I haven't followed that thread closely.  Thank you for
correcting me.


> > I'm hoping the copyright assignment won't be an issue with the kernel
> > proper, since it's based on Mach and isn't owned by the FSF, but I'm not
> > 100% sure of that.
>
> Mach is covered by copyright assignments too, only Mig is not.
> The fact that the base code copyright isn't owned by the FSF doesn't
> mean the changes can't be.


Well, then I guess FSF is not going to accept my code into gnumach, either.

It sure seems that copyright exceptions are made for big pieces of code
(Mach, LWIP), but not for contributions from individual developers.

agape
brent


Re: Personal Introduction

2018-04-05 Thread Brent W. Baccala
On Thu, Apr 5, 2018 at 11:20 AM, Charlie Sale 
wrote:

> Hey Brent
>
> I would be willing to help with that project. I'll see what I can do to
> contribute.
>
> You said that you had some code written. Where can I find it? Is it in a
> branch on the main tree?
>

No, unfortunately.  Part of the reason is how we interact with Debian.  I
run a Debian/Hurd system, and there are Debian-specific patches that
haven't been incorporated into our main source tree.  Also, when you get a
Debian source tree, it doesn't come with any git version control
information.  If I worked on our git tree, then I'd have to figure out how to
apply the Debian patches to get something that actually runs right.  So, I
tend to work on the Debian-ized source tree, without any version control
per se.  It's not ideal.

I'm attaching a patch with the work I've done so far on the tracing
facility.  It doesn't really do anything yet; it just adds a new special port
to each task and a first crack at a subroutine to create trace messages.  But
if you read the design message I sent to the list, you'll see that the
subroutine isn't as sophisticated as I now think it needs to be.

Also, I should mention that a major issue on the mailing list for the last
two months has been upstreaming glibc.  It's again related to how we
interact with Debian.  There are so many Hurd-specific, Debian-specific
changes to glibc that the Debian maintainers have threatened to drop us
completely unless we get our code upstreamed into the main glibc code
base.  I'm not working on that code, because the Free Software Foundation
wants me to sign a copyright assignment whose language I object to, but if
you're willing to sign the copyright assignment, I'm sure Samuel Thibault
would appreciate some help with that.  It's a bit of a mess, while the
project that I'm proposing is more self-contained, but you should know that
there's at least one other ongoing project we could use help with.

I'm hoping the copyright assignment won't be an issue with the kernel
proper, since it's based on Mach and isn't owned by the FSF, but I'm not
100% sure of that.

Also, I gave a lecture on Hurd a few months ago that might interest you.
It's an hour and a half, and the first part talks about generic HPC issues,
but the second part goes into some detail about how Hurd is designed, and
my own vision for what I'm trying to do with it.  I've got a screencast on
youtube, if you're interested:

https://www.youtube.com/watch?v=JwsuAEF2FYE

> Also, would you recommend developing in a GNU/Hurd environment as opposed
> to a GNU/Linux environment? I tried running Debian GNU/Hurd in qemu, but I
> had some major troubles with that (keyboard didn't work at all). Should I
> request an account on the main Hurd machine?
>

I agree with Samuel that qemu is the way to go.  I use Hurd exclusively in
qemu, with Linux running on my bare metal.  Don't try to run Hurd bare
metal.  We're not likely to have device drivers to support all of your
hardware, plus if you're going to do kernel work, you really want to have
it in a virtual machine so you can run gdb on the kernel.

We think your keyboard should work!  Once you get it up and running,
though, I wouldn't bother with its console for day-to-day work; I ssh into it.


> I am excited to try to help!
>
> Thanks!
> Charlie
>
>
diff -ur -x 'Makefile*' -x debian -x configure -x config.sub -x config.guess gnumach-1.8+git20171101/include/mach/task_special_ports.h mnt/root/gnumach-1.8+git20171101/include/mach/task_special_ports.h
--- gnumach-1.8+git20171101/include/mach/task_special_ports.h	2014-03-17 14:29:32.0 -0400
+++ mnt/root/gnumach-1.8+git20171101/include/mach/task_special_ports.h	2018-03-06 22:31:19.0 -0500
@@ -40,6 +40,8 @@
 #define TASK_EXCEPTION_PORT	3	/* Exception messages for task are
 	   sent to this port. */
 #define TASK_BOOTSTRAP_PORT	4	/* Bootstrap environment for task. */
+#define TASK_TRACE_PORT		5	/* Trace port that receives a copy
+   of all messages sent to/from task. */
 
 /*
  *	Definitions for ease of use
@@ -63,4 +65,10 @@
 #define task_set_bootstrap_port(task, port)	\
 		(task_set_special_port((task), TASK_BOOTSTRAP_PORT, (port)))
 
+#define task_get_trace_port(task, port)	\
+		(task_get_special_port((task), TASK_TRACE_PORT, (port)))
+
+#define task_set_trace_port(task, port)	\
+		(task_set_special_port((task), TASK_TRACE_PORT, (port)))
+
 #endif	/* _MACH_TASK_SPECIAL_PORTS_H_ */
diff -ur -x 'Makefile*' -x debian -x configure -x config.sub -x config.guess gnumach-1.8+git20171101/ipc/ipc_kmsg.c mnt/root/gnumach-1.8+git20171101/ipc/ipc_kmsg.c
--- gnumach-1.8+git20171101/ipc/ipc_kmsg.c	2016-12-19 09:53:33.0 -0500
+++ mnt/root/gnumach-1.8+git20171101/ipc/ipc_kmsg.c	2018-03-07 00:08:16.0 -0500
@@ -2627,6 +2627,73 @@
 	}
 }
 
+/*
+ *	Routine:	ipc_kmsg_trace
+ *	Purpose:

Re: Personal Introduction

2018-04-04 Thread Brent W. Baccala
Charlie -

Welcome to Hurd!

I'm not sure what you consider a small task.  Perhaps you could look at my
March 9th email to this list, entitled "RFC: kernel trace facility".
Briefly, I want to instrument the kernel so that we can trace the messages
going to and from a particular task.  Our current way of doing this (a
program called rpctrace, which you should probably try out for yourself)
leaves a lot to be desired.  I think just about everyone on this list would
agree on the need for such a facility, although its actual design is still
open for debate.

Actual kernel coding is required, which is somewhat rare on this project,
because with a microkernel architecture, so much of what we do is in user
space.

I've started work on it and have a little bit of code written.

Does this look like the right size and complexity for you?  If not, I'll
try to suggest something else.

Thanks for your email, and for any help you can give us!

agape
brent

On Tue, Apr 3, 2018 at 1:36 PM, Charlie Sale 
wrote:

> Hello GNU Hurd
>
> I am new to this list, so I figured I would introduce myself.
>
> After reading this project's website, I am very interested in contributing
> to this project. I have always been interested in learning about/doing some
> kernel development. This seems like an excellent place to start.
>
> While I do need to learn about how the GNU Hurd works, I do have a solid
> foundation in C and I am a quick learner. I would love to contribute to
> this project.
>
> I intend to start reading up on the documentation and digging around in
> the code, but it seems like the best way to learn is start with small
> tasks. Is there a good starting point?
>
> Thanks!
> Charlie
>


RFC: kernel trace facility

2018-03-09 Thread Brent W. Baccala
I've had so many problems with rpctrace that I'm starting work on a kernel
trace facility, and I think the consensus on this list is that we need
something like this.

Here's the idea.

Add a trace port as a new task special port.  The same RPCs that get/set
task ports and exception ports will be used to get/set a task's trace
port.  Task ports will be inherited from the parent on task creation, so
children of traced tasks will run traced by default.

All messages sent and received by the task will be copied to the trace
port, in the same format seen by the task (i.e., the traced task's port
names will be used), prepended by a message header that will identify the
traced task, indicate if the message was sent or received, and include the
return value of the mach_msg() call, to indicate truncated messages using
MACH_RCV_TOO_LARGE.  Out-of-line memory will not be copied.  The traced
message will include the address of the out-of-line memory in the traced
task's memory space, but will not include the out-of-line memory.  In
short, the message will be copied verbatim as seen by the traced task.

Format of the trace messages:

   task_t          task;
   boolean_t       send-or-receive;
   unsigned int    type;
   kern_return_t   retval;
   byte[]          message;
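To make the proposed layout concrete, here is one way the header could look as
a C struct.  This is only a sketch of the proposal: the typedefs are stand-ins
for the real Mach types, and none of the names below come from actual gnumach
code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in typedefs; on a real Hurd system these come from Mach headers. */
typedef uint32_t task_t;        /* send right naming the traced task */
typedef uint32_t boolean_t;
typedef int32_t  kern_return_t;

/* One possible layout for the proposed trace-message header.  The raw
   message bytes, verbatim as the traced task saw them, follow it. */
struct trace_header {
    task_t        task;             /* traced task, in the recipient's port space */
    boolean_t     send_or_receive;  /* direction of the traced message */
    uint32_t      type;             /* normal / task / thread / *-exception */
    kern_return_t retval;           /* result of the traced mach_msg() call */
};
```

With four fixed 32-bit fields the header stays small and the verbatim message
body can simply be appended after it.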

'task' is a send right in the port space of the message recipient, which
means that any task receiving trace messages will be getting send rights to
task ports, but since you need such a send right to request the messages in
the first place, I think that's OK.  I might wrap the two booleans into a
single integer.  'send-or-receive' is obvious, and maybe should get wrapped
into 'type' as a flag bit.  'type' indicates if we're tracing a message
that the task itself exchanged, or a message exchanged on one of its
special ports.  Possible types: 'normal', 'task', 'thread',
'task-exception', 'thread-exception'.

Since it's a debugging tool... send timeouts will trigger delivery of a
trace message with MACH_SEND_TIMED_OUT in the trace header.  Most error
returns from mach_msg() will trigger a traced message indicating the error.

No facility will be provided to edit or block the delivery of messages.
However, the trace operation (and thus mach_msg) will block and wait if the
task port's queue is full.

Resource shortages in the kernel will cause trace messages to be quietly
dropped, with nothing more than a printf() to the console.

All syscalls will check current_task()'s trace port.  If it's not IP_NULL,
return SEND_INTERRUPTED to force an actual RPC message to be generated.

New routines:

ipc_kmsg_trace_copy() will be passed a kmsg, will ikm_alloc a new kmsg with
enough space for the trace header and the old message, will copy the old
kmsg into the new one, and return the new one.  It's expected to be called
right after ipc_kmsg_get(), and right before ipc_kmsg_put().

ipc_kmsg_trace_send() will be passed a trace kmsg and a return code.  The
return code will be inserted into the trace kmsg and the message will be
queued to the task's trace port.
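In user-space terms, the shape of these two routines might look like the
sketch below, with plain malloc standing in for ikm_alloc and a byte buffer
standing in for a kmsg.  Everything here is my reading of the description
above, not actual gnumach code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy trace header; stands in for the real one prepended by the kernel. */
struct trace_header { int task; int direction; int type; int retval; };

/* Shape of ipc_kmsg_trace_copy(): allocate room for the header plus the
   original message, copy the message in after the header, return the copy. */
static unsigned char *trace_copy(const unsigned char *kmsg, size_t len)
{
    unsigned char *t = malloc(sizeof(struct trace_header) + len);
    if (t == NULL)
        return NULL;   /* resource shortage: the trace is quietly dropped */
    memset(t, 0, sizeof(struct trace_header));
    memcpy(t + sizeof(struct trace_header), kmsg, len);
    return t;
}

/* Shape of ipc_kmsg_trace_send(): patch the return code into the header;
   the real routine would then queue the buffer on the task's trace port. */
static void trace_send(unsigned char *trace_kmsg, int retval)
{
    ((struct trace_header *) trace_kmsg)->retval = retval;
}
```

Splitting copy from send is what lets the get path delay until the return
code is known, while the put path can call both back to back.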


mach_msg() and friends will call ipc_kmsg_trace_copy() right after
ipc_kmsg_get() and right before ipc_kmsg_put().  In the get case, we'll
wait until the message has been processed a bit to figure out what return
code should be associated with it, then call ipc_kmsg_trace_send().  In the
put case, we're about to return to user space, so we pretty much know what
our return code is and can call trace_send() right after trace_copy().

ipc_kobject_server() will check its destination to see if it's a task port
or a thread port.  If so, it will call the ipc_kmsg_trace() routines for
both the request and the reply.  This ensures that we'll also see
messages targeted at the task's control ports, even if they come from
another task.  They won't be in the same format, however.  By the time
ipc_kobject_server() runs, the port rights have been translated into kernel
pointers, and that's the format the trace will receive.  Since a message
like vm_map() might include a send right that only exists in some other
task's port space, it doesn't seem like there's too much of an
alternative.  This will leak some kernel addresses, but I don't think
that's too serious, as there's nothing useful the receiver can do with
them, barring some kind of Meltdown-type memory leakage bug, and the
existence of such a bug is a separate issue.

Exceptions are also traced.

How to identify threads?  Maybe add an extra header field to the trace
message to indicate which thread a 'thread' or 'thread-exception' message
is localized to.

Comments?

agape
brent


rpctrace on a statically linked executable

2018-03-05 Thread Brent W. Baccala
Hi -

I'm confused by rpctrace's behavior on a statically linked test program.
Here's the program:

#include <stdio.h>

int main(int argc, char *argv[])
{
  printf("Hello world!\n");
}

Perhaps you've seen it before :-)

When I compile the program with -static, I get an executable that works
fine, but can't be traced:

root@qemu-hurd:~# rpctrace ./hello-world
task134(pid8538)->vm_statistics () = 0 {4096 462739 56607 206740 23868
12642497 0 424184 481063 142824515 37245163 15804468 15747574}
Child 8538 Killed

A little more digging shows that the program is receiving an exception
after the vm_statistics call.  It happens very early, before the C library
has set up an exception handler, so the exception goes to rpctrace itself,
but some code fiddling inside rpctrace shows it:

root@qemu-hurd:~# ./rpctrace ./hello-world
135(task134)-->134(task-1)->vm_statistics () = 0 {4096 462739 56536 206804
23868 12643374 0 424194 481436 142836984 37247781 15804960 15748056}
136(task-1)-->121(task1)->exception_raise (   142<--144(pid8561)
134<--145(pid8561) 1 2 20) = 0x4001 (Operation not permitted)
Child 8562 Killed

I'm stumped.  Any idea why a statically linked program throws an exception
so early, and only when it's being traced?

agape
brent


Re: gnumach debugger advice

2018-02-15 Thread Brent W. Baccala
On Sun, Feb 11, 2018 at 4:42 AM, Samuel Thibault <samuel.thiba...@gnu.org>
wrote:

> Brent W. Baccala, on dim. 11 févr. 2018 01:23:10 -0500, wrote:
> > So how can I figure out where in processor_set_tasks() (or its
> subroutines)
> > that the kernel has blocked?
>
> show all tasks
>
> Should give you the list of tasks, then
>
> show task $task123
>
> shows the list of threads, then
>
> trace/t $task123.4
>
> shows the backtrace of the thread.


Thank you, Samuel.

I had another problem, though - gdb was attached to the program and it was
halted.  That caused a lot of my threads to report their traces as
"Continuation thread_bootstrap_return".  I had to detach gdb, or at least
"continue" the program, before the thread backtraces reported normally.  I
thought I was doing something wrong with the debugger.

So, there was nothing blocked in processor_set_tasks(); I was just confused.

Thanks again for your helpful reply.

agape
brent


gnumach debugger advice

2018-02-10 Thread Brent W. Baccala
Hi -

Can anybody advise me on how to use the gnumach debugger for a particular case?

I've got a subhurd's proc server that is blocked in the kernel RPC
processor_set_tasks().  That thread is holding a global lock in the proc
server and locking up the program while waiting for the RPC to return,
which it isn't doing.

So how can I figure out where in processor_set_tasks() (or its subroutines)
that the kernel has blocked?

agape
brent


attaching rpctrace to running processes

2018-02-09 Thread Brent W. Baccala
Hi -

I've modified rpctrace to attach to running processes and trace them.  I've
added a new set of patches (the 0200 series) to my github repository with
these changes.  I'm still chasing bugs, and it can't detach from processes
without killing them, but it's basically working.

The only big problem is the inability to invisibly swap receive rights.
When rpctrace attaches, it moves all the old port rights to rpctrace, wraps
them, and replaces them with new port rights managed by rpctrace.  This
works fine for everything except a bare receive right with a mach_msg
waiting for messages on it.  Moving such a receive right causes the
mach_msg to return reporting MACH_RCV_PORT_DIED.  I don't see any way
around this without modifying the kernel.  Portsets don't have this
problem; you can pull twenty receive rights out of a portset, put twenty
replacements back in, and it works fine.

For some programs, this isn't a problem.  cat and bash seem to deal with
the error returns to io_read by just retrying the read, and everything is
fine.  Attaching to proc is a hit-or-miss affair, as __pthread_block isn't
as forgiving of error returns, but if you reboot the subhurd and try again,
you can eventually attach to it, like once every half dozen attempts or
so.  I don't understand why, but since my primary goal is tracing proc,
this is good news (I think).

Don't know about others on the list, but I anticipated kernel problems
detaching, and expected attaching to work fine.  Somewhat bummed that it
isn't so simple.

agape
brent


Re: glibc / proc server interaction

2018-02-08 Thread Brent W. Baccala
On Mon, Feb 5, 2018 at 2:42 PM, Brent W. Baccala <cos...@freesoft.org>
wrote:

> On Mon, Feb 5, 2018 at 1:51 AM, Samuel Thibault <samuel.thiba...@gnu.org>
> wrote:
>
>> Hello,
>>
>> Brent W. Baccala, on lun. 05 févr. 2018 00:35:17 -0500, wrote:
>> > How do we know that the proc server knows about the task yet?  It will
>> > get a notification from the kernel that a new task has been created,
>> > but how do we know that the notification has been processed yet?
>>
>> Isn't the notification synchronous ?
>
>
> I don't see how.  The kernel just sends a mach_notify_new_task message,
> and there is no reply.
>

OK, I think I've figured this out.  The proc server's task_find() function
will call add_tasks() if its hash lookup fails, and that function queries
the kernel's task list and adds tasks that it doesn't know about.
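That miss-then-refresh pattern can be sketched in a few lines of plain C.
This is a toy stand-in, not the proc server's actual data structures: a flat
array plays the hash table, a fixed list plays the kernel's task list, and
only the task_find()/add_tasks() names echo the real code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TASKS 16

static int known[MAX_TASKS];   /* toy stand-in for the proc server's task hash */
static int nknown;

/* Toy stand-in for the kernel's authoritative task list. */
static const int kernel_tasks[] = { 101, 102, 103 };

/* Stand-in for add_tasks(): refresh the cache from the kernel's list. */
static void add_tasks(void)
{
    nknown = 0;
    for (size_t i = 0; i < sizeof kernel_tasks / sizeof kernel_tasks[0]; i++)
        known[nknown++] = kernel_tasks[i];
}

static int cached_lookup(int task)
{
    for (int i = 0; i < nknown; i++)
        if (known[i] == task)
            return 1;
    return 0;
}

/* task_find(): on a cache miss, query the kernel and retry, so a lookup
   can't fail just because a new-task notification hasn't arrived yet. */
static int task_find(int task)
{
    if (cached_lookup(task))
        return 1;
    add_tasks();
    return cached_lookup(task);
}
```

The retry after add_tasks() is what makes the asynchronous
mach_notify_new_task message harmless: the lookup falls back to the kernel's
authoritative list instead of depending on notification ordering.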

agape
brent


libpager, proc server, rpctrace, and other meandering

2018-02-02 Thread Brent W. Baccala
Hi -

I'm writing to update the list on my current work and possibly get some
useful suggestions.

I'm trying to test the multi-client libpager in a subhurd, but I'm having
problems even getting the old libpager code to work cleanly in a subhurd,
so I know I've got problems that aren't just in my code.

My problematic test case is to boot a subhurd that's got a full Hurd system
on a second (virtual) hard drive and run dpkg-buildpackage -b on the hurd
package's source tree.  You just need to make a copy of your disk available
on hd1 and run boot /dev/hd1.

It runs for a while, and then the whole system hangs.  I get some invalid
right / invalid name errors from the proc server in its
S_mach_notify_new_task routine (looks like it's getting a bogus task port)
and then everything stops.  It doesn't always stop at the same point, but I
can't get through a package build.

This is what I'd really appreciate some help with, from somebody who knows
the proc server better than me.  Like I said, it fails with just stock
code.  I'm thinking it's some kind of race condition relaying messages
between the two proc servers in the main hurd and the subhurd, but I can't
quite puzzle it out.

So, I thought to myself, why not modify rpctrace so that it can attach to
the proc server and trace it?  Maybe I could modify startup to start proc
with rpctrace wrapped around it, but we eventually want rpctrace to attach
to and detach from running programs, so let me at least get the attach code
written.

I went and wrote an attach subroutine for rpctrace (about 400 lines of
code) that seems to work, but with one major problem.  rpctrace sets up its
receive rights so that they have either a send right or a send-once right
associated with them, and the two cases are completely separate.  If you're
importing ports from an attached process, however, you might end up with
receive rights that have both send and send-once rights attached to them.

OK, so I went and modified rpctrace so that it now has a single port
structure that handles both send and send-once rights.

Now, it turns out that I can't detect which kind of right (send or
send-once) a message comes in on, which makes the code a mess (still don't
have it working).  The reason I can't detect it is that the protected
payload optimization doesn't report which type of right the message came in
on.  So, I think, I'll disable the protected payload optimization by
clearing the protected payload first thing after the port is created.  That
doesn't work either, because libports puts a simulated protected payload
right back in.  I'd sure like to have an option on libports to disable
protected payloads and let me see the unmodified Mach messages, but that
doesn't exist.

Or, for that matter, it would be nice if the kernel reported two different
values in MACH_MSGH_BITS_LOCAL (one for send and one for send once) when
protected payloads are in use.

agape
brent


Re: Hurd lecture

2018-01-19 Thread Brent W. Baccala
On Fri, Jan 19, 2018 at 7:35 AM, Ricardo Wurmus  wrote:

>
> Hi Brent,
>
> > I put a screencast of the lecture on youtube:
> >
> > https://www.youtube.com/watch?v=JwsuAEF2FYE
>
> thank you.  This was very interesting.  The introduction to Mach IPC and
> memory management was especially good.  I wonder if a shorter variant of
> this part of the lecture could be used by new contributors as an
> alternative to reading the Mach kernel postscript books.
>

That's a good idea, and I can leverage the work I've already done with the
graphics.  I'll have to think about what we might want in a second video
that isn't in the first one.  Any suggestions?

> Personally, I'm very interested in a Single System Image Hurd cluster; I
> still have a bunch of unused Sun cluster nodes with x86_64 CPUs, but
> sadly there is no high-speed network to connect them all (just regular
> old 1G network cards).
>
> In your experience, is high-speed network very important or are there
> ways to make it unlikely that memory has to be transferred across nodes?
>

My experience is that we're nowhere close to 1 Gbps vs 40 Gbps being an
issue.

My primary test environment is a virtual machine that talks to itself as
the "remote", and the performance problems there are severe enough that
running remote programs results in a delay that is noticeable to the human
running the program.

Actually, we're barely even at that point.  Stock hurd can't even execute
remote programs.  You either need to patch the exec server so that it reads
binaries and shared libraries instead of memory mapping them, or use the
multi-client libpager that I've been working on for the past six months,
though it's still failing some test cases.

See, for example,
http://lists.gnu.org/archive/html/bug-hurd/2016-08/msg00099.html, or do a
reverse chronological search on bug-hurd for "libpager".

At the very end of that video lecture, I suggested how we might address the
performance problems in "netmsg", after we've got a multi-client libpager,
and after we've got a 64-bit user space, and after we've got SMP support.
(Who cares about a cluster where you can only use 4 GB and one core on each
node?)

First, netmsg needs to be rewritten so that it doesn't serialize all the
Mach traffic over a single TCP session.  That's probably its biggest
bottleneck.

After that, I think we should move our networking drivers into a shared
library, so that netmsg can access the PCI device directly.  That would
avoid a lot of context switches, like netmsg <-> TCP/IP stack <-> network
driver.

And it's not as crazy as it might first seem.  Networking devices are
increasingly virtualized.  Not only can you fiddle a few software options
to add a virtual PCI device to a virtual machine, but Cisco's vNIC cards
can present themselves as anywhere from 1 to 16 PCI devices.  So just
fiddle a few options in your management console, and even your hardware can
now make a new PCI device appear out of thin air.

So, it makes sense to allocate an entire PCI device to netmsg, and let all
your inter-node Mach traffic run over one interface, while your normal
TCP/IP stack has a separate interface.  Of course, we also need to support
configurations where you can't do that, but I think raw PCI access is going
to be the way to go if you really want performance.

Those are my current thoughts on Hurd cluster performance.  In short,
priorities are:

1. multi-client libpager (nearly usable)
2. 64-bit user space
3. SMP
4. rewrite netmsg to avoid TCP serialization (and other issues)
5. raw PCI device access

agape
brent


Re: Hurd lecture

2018-01-19 Thread Brent W. Baccala
On Fri, Jan 19, 2018 at 4:47 AM, Thomas Schwinge 
wrote:

> > Could we add that to the collection of presentations that we've got on
> the
> > documentation page?
>
> If you'd like to, you can add it yourself using the wiki interface at
> , or even just as a
> patch to the web pages repository,
> .  Or, if
> you prefer, I'll take care of adding it later on.


The web editing interface doesn't seem to be working, and I don't have
write access to the git repositories on either darnassus or savannah.

I'd be happy to add them myself, if somebody would give me the correct
permissions.

agape
brent


Hurd lecture

2018-01-18 Thread Brent W. Baccala
Hi -

FYI, I gave an hour and a half Hurd lecture yesterday at Catholic
University in Washington, D.C.

I put a screencast of the lecture on youtube:

https://www.youtube.com/watch?v=JwsuAEF2FYE

The first part of the video is a general discussion of high performance
computing, leading into my own work developing Hurd as a cluster operating
system.

The Mach/Hurd specific part starts at 43 minutes.

I developed some TikZ macros to illustrate Mach/Hurd concepts using LaTeX.
For example, there's an illustration of Mach IPC at 51:20, the memory
object protocol at 1:09:30, and the netmsg server is illustrated at 1:16:25.

Others might find these macros useful for constructing similar
presentations.  If so, I've got the LaTeX source for the slides in my
github hurd repository here:

https://github.com/BrentBaccala/hurd/blob/master/slides/cua.tex

I might revise it a bit; the macros are a bit clunky.

A finished PDF is available here:

https://drive.google.com/open?id=1Wv5dZpbKP17-H1uUOb9hnnBvXBf2LHiL

Could we add that to the collection of presentations that we've got on the
documentation page?

agape
brent


Re: race condition in libports

2018-01-05 Thread Brent W. Baccala
On Fri, Jan 5, 2018 at 7:24 PM, Samuel Thibault <samuel.thiba...@gnu.org>
wrote:

> Hello,
>
> Brent W. Baccala, on ven. 05 janv. 2018 17:45:57 -0500, wrote:
> > I've "fixed" this by making sure we don't remove the hash table entry
> > unless there are exactly two weak references outstanding, but I'm not
> > sure that's the best way to handle it.  It doesn't seem like the
> > dropweak routine should have to be so careful; it shouldn't get called
> > twice like that.
>
> Well, AIUI from the documentation, dropweak should manage by itself how
> many weak references it should drop. I.e. the number of times dropweak
> is called doesn't matter, the first call should drop them, and later
> calls should be fine that there aren't any to drop any more. So it seems
> to me that your proposed fix is correct. I just rewrote it to make sure
> to get coherent counts between hard and weak.


Well, in that case, perhaps it should work by checking to see if the port
is in the hash, rather than by looking at how many weak references it's
got.  Something more like this:

  if ((refcounts_hard_references (&i->pi.refcounts) == 0)
      && (*i->id_hashloc == i))
    {
      /* Nobody got a send right in between, we can remove from the hash. */
      hurd_ihash_locp_remove (&idhash, i->id_hashloc);
      ports_port_deref_weak (&i->pi);
    }

Also, what documentation?  The hurd texinfo file?  That's the only place
that I know of where this is documented.

agape
brent


Re: race condition in libports

2018-01-05 Thread Brent W. Baccala
On Sun, Dec 17, 2017 at 11:41 PM, Brent W. Baccala <cos...@freesoft.org>
wrote:

> On Sun, Dec 17, 2017 at 8:02 AM, Samuel Thibault <samuel.thiba...@gnu.org>
> wrote:
>
>> Hello,
>>
>> Brent W. Baccala, on sam. 16 déc. 2017 21:37:05 -0500, wrote:
> > basically, we're storing ports in an inode-to-port hash, looking them
>> > up when io_identity() gets called, and removing them from the hash
>> > when the class's clean routine gets called.
>>
>> That's the bug: one needs a reference for this.  And it's a weak
>> reference: identity is fine with getting rid of it.  Could you try the
>> attached patch?
>
>
> Makes sense.  I've applied it.  So far, so good.
>

I'm still having problems with this, even with the stock ext2fs/libpager
code.  To reproduce these bugs, I suggest making a copy of the virtual disk
(hd0 -> hd1), booting a subhurd on hd1, then trying to build the entire
hurd package (dpkg-buildpackage -b).  I've yet to successfully build the
hurd packages in a subhurd - there's always some bug or another that gets
triggered.  If not this one, then I've been having problems with the proc
server.

Back to the io_identity code...

Part of the problem is that sizeof(ino_t) is 8 bytes, while
sizeof(hurd_ihash_key_t) is only 4 bytes.  So we need to use the
"generalized key interface" to hash the full 8 bytes.
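The truncation is easy to demonstrate in isolation: two inode numbers that
differ only above bit 31 collapse to the same 32-bit key, which is exactly
why the full 8-byte value has to go through the generalized key interface.
The typedefs below are illustrative stand-ins, not libihash's real
declarations.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t demo_ino_t;   /* plays the role of the 8-byte ino_t */
typedef uint32_t demo_key_t;   /* plays the role of the 4-byte hurd_ihash_key_t */

/* Hashing by truncated key: distinct inodes can become the same key. */
static demo_key_t truncate_key(demo_ino_t ino)
{
    return (demo_key_t) ino;   /* silently drops the top 32 bits */
}
```

On a filesystem large enough to hand out inode numbers above 2^32, such
collisions would make two different files share one hash slot.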

But that doesn't fix everything.  I'm seeing periodic double removal from
the hash table.  It originally manifested as an underflow on the weak
reference count, but I added a line:

   assert_backtrace(hurd_ihash_value_valid(* i->id_hashloc));

right before the hurd_ihash_locp_remove in id_clean, and this assert
periodically triggers.

The most notable thing about it is that it always seems to be triggered by
a NO SENDERS notification with a mscount field less than the mscount in the
portinfo structure:

#8  0x0807ed9d in __assert_fail_backtrace (assertion=0x817c080
"hurd_ihash_value_valid(* i->id_hashloc)",
file=0x817c03c
"/root/hurd-0.9.git20171119/build-deb/../libfshelp/get-identity.c",
line=60,
function=0x817c0a8 <__PRETTY_FUNCTION__.8282> "id_clean") at
./build-deb/../libshouldbeinlibc/assert-backtrace.c:61
#9  0x08073d61 in id_clean (cookie=0x20550ec0) at
./build-deb/../libfshelp/get-identity.c:60
#10 0x0807a20e in ports_port_deref (portstruct=0x20550ec0) at
./build-deb/../libports/port-deref.c:39
#11 0x0807a9bd in internal_demuxer (outheadp=0x22800ef0, inp=0x22802f00) at
./build-deb/../libports/manage-multithread.c:215
#12 synchronized_demuxer (inp=0x22802f00, outheadp=0x22800ef0) at
./build-deb/../libports/manage-multithread.c:239

(gdb) frame 11
#11 0x0807a9bd in internal_demuxer (outheadp=0x22800ef0, inp=0x22802f00) at
./build-deb/../libports/manage-multithread.c:215
215   ports_port_deref (pi);
(gdb) x/8x inp
0x22802f00: 0x00001700  0x00000020  0x00000000  0x20550ec0
0x22802f10: 0x00000000  0x00000046  0x10012002  0x00000002

(gdb) print *pi
$9 = {class = 0x209ae8, refcounts = {references = {hard = 0, weak = 1},
value = 4294967296}, mscount = 24,
  cancel_threshold = 0, flags = 0, port_right = 22723, current_rpcs = 0x0,
bucket = 0x820e628, hentry = 0x84a7cf8,
  ports_htable_entry = 0x205b3610}

Notice that the NO SENDER message had an mscount of 2 (the last word in the
x/8x), but the portinfo structure has an mscount of 24.

Also, there's no hard references and one weak reference in the portinfo
structure.  At this point there should be two weak references - one for the
hash table entry and one for the demuxer.  If the hash table assert didn't
fail, then we'd be getting a weak reference underflow soon.

I've puzzled over this for days, and this is my current theory for what's
happening.

We started with one hard reference (because we have outstanding send
rights) and one weak reference (the hash table).  Then we got a NO SENDERS
message, so we started to drop the hard reference by demoting it to a weak
reference, in libports/port-deref.c.  Now we've got no hards, two weaks,
the entry still in the hash table, and nothing locked (we're about to call
the dropweak_routine).

At this point, other threads preempt us.  Our portinfo structure is still
in the hash table, so more send rights get created (22 more, in the gdb
trace above!), they get consumed, and another NO SENDERS message gets
generated.  It gets to the dropweak_routine with no hard references and
three weak references (the two from before, plus one more from the second
NO SENDERS' demuxer).  It removes the hash table entry (dropping one weak
reference), drops another weak reference (in ports_port_deref), and
finishes running.

Now the first thread (finally) gets to run again.  It's now got no hard
references, one weak reference, and the dropweak_routine runs on a portinfo
structure that's already been removed from the hash.

Re: race condition destroying condition variables

2017-12-27 Thread Brent W. Baccala
On Wed, Dec 27, 2017 at 2:31 PM, Samuel Thibault <samuel.thiba...@gnu.org>
wrote:

> Hello,
>
> Brent W. Baccala, on mar. 26 déc. 2017 23:06:13 -0500, wrote:
> > Also, the Linux source code in nptl/ includes the following comment:
> >
> >   /* If there are waiters which have been already signalled or
> >  broadcasted, but still are using the pthread_cond_t structure,
> >  pthread_cond_destroy needs to wait for them.  */
>
> Ok, so even if Posix explicitly says that it has undefined behavior,
> since nptl behaves fine we should probably behave fine too.
>

Let me clarify - that comment precedes a block of code in
pthread_cond_destroy() that waits for the other threads.

See glibc-2.23/nptl/pthread_cond_destroy.c, around line 50.

That code's been rewritten in glibc-2.25, but the requirement is still
there.

glibc-2.25/nptl/pthread_cond_destroy.c, lines 38-40:

Thus, we can assume that all waiters that are still accessing the condvar
> have been woken.  We wait until they have confirmed to have woken up by
> decrementing __wrefs.


... and then it waits for "wrefs >> 3" to become zero.

My point was that since Linux waits for the other threads in
pthread_cond_destroy(), we should too.

agape
brent


Re: race condition destroying condition variables

2017-12-26 Thread Brent W. Baccala
Well, I've tried both Samuel's (swap space) and Svante's (nocheck)
suggestions and have found that both allow me to successfully build the
glibc packages!

The problem that I've got now is that I've changed the size of the
condition variables by adding an extra field, an integer that tracks the
number of waiting threads.  This means that I have to recompile anything
that depends on the size of a condition variable, which is potentially
everything on the system.

First, I'd like to build the packages using the same scripts used to build
the Debian distribution packages.  Is this available somewhere?

Next, how will modifying the memory layout of condition variables affect
our packaging?  I figure that we'll have to bump libc0.3 to libc0.4, and
that should pretty much do the trick, right?

Finally, I'm wondering if we need to change this at all.  The "__data"
field in struct __pthread_cond seems unused, at least in the libpthreads
directory.  Is there any use for this pointer?  Or can I use those bits?

Also, the Linux source code in nptl/ includes the following comment:

  /* If there are waiters which have been already signalled or
 broadcasted, but still are using the pthread_cond_t structure,
 pthread_cond_destroy needs to wait for them.  */

...which is the conclusion I've come to as well.

agape
brent


Re: race condition destroying condition variables

2017-12-21 Thread Brent W. Baccala
Well, I've got a patch that might work, but I'm having a lot of trouble
testing it.

I can't dpkg-buildpackage the Debian glibc package.

It gets into the test routines, then a bunch of the math tests crash with
SIGSEGVs and SIGILLs, then I get a bunch of kernel errors:

no more room in ee26a908 ((...i386-libc/elf/ld.so.1(2423))
no more room in ee26a908 ((...i386-libc/elf/ld.so.1(2423))
no more room in ee26a908 ((...i386-libc/elf/ld.so.1(2423))
no more room in ee26a908 ((...i386-libc/elf/ld.so.1(2814))
no more room in ee26a908 ((...i386-libc/elf/ld.so.1(2814))
no more room in ee26a908 ((...i386-libc/elf/ld.so.1(2814))

This happens in either tst-vfprintf-width-prec or tst-mallocfork2,
depending on whether I allocate 2 GB or 4 GB to the virtual machine.

Any ideas what the problem might be?

And how are the Debian packages built for downloading?  It must be
something different than what I'm doing...

agape
brent


Re: race condition destroying condition variables

2017-12-19 Thread Brent W. Baccala
On Tue, Dec 19, 2017 at 3:25 AM, Samuel Thibault <samuel.thiba...@gnu.org>
 wrote:

> Brent W. Baccala, on mar. 19 déc. 2017 00:08:44 -0500, wrote:
> > Looks like there's a race condition when we destroy a condition
> variable.  My
> > understanding of the expected behavior is that once all the threads have
> been
> > signaled (i.e, pthread_cond_broadcast is called), the condition variable
> can be
> > safely destroyed with pthread_cond_destroy.
>
> Err, I don't think that POSIX allows to assume that. The fact that
> pthread_cond_broadcast has returned doesn't mean that other threads have
> finished with pthread_cond_wait.
>
>
POSIX seems a little unclear on that, but C++ seems to require it.

POSIX:

"It shall be safe to destroy an initialized condition variable upon which
no threads are currently blocked." [1]

and

"The *pthread_cond_broadcast*() function shall unblock all threads
currently blocked on the specified condition variable *cond*." [2]

A little further down on [1], in an "informative" section, is the following
snippet of code:

(A) pthread_cond_broadcast(&lp->notbusy);
pthread_mutex_unlock(&lp->lm);
(B) pthread_cond_destroy(&lp->notbusy);

...accompanied by the following comment:

"In this example, the condition variable and its list element may be freed
(line B) immediately after all threads waiting for it are awakened (line
A)..."

which is what makes sense to me.  Of course, our pthread_cond_destroy()
does nothing, but assuming that the condvar is now freed in a step (C),
this code won't work reliably in our current implementation.

I found a discussion of this on stackexchange [3], where the top answer
observes that 'The standard only says "shall unblock" and not "on
successful return from this call they will be unblocked".'

The C++ standard, however, is more explicit, in section 30.5.1:

"Only the notification to unblock the wait must happen before destruction".

cppreference.com [4] says:

"It is only safe to invoke the destructor if all threads have been
notified. It is not required that they have exited their respective wait
functions: some threads may still be waiting to reacquire the associated
lock, or may be waiting to be scheduled to run after reacquiring it."

That's the behavior I was counting on.  I'm using C++, and have found that
I can't just notify_all() a condition variable, then destroy it.

And, yes, I've got everything locked so that another thread can't jump in
there with a wait() between those two events.

On Tue, Dec 19, 2017 at 4:17 AM, Richard Braun <rbr...@sceen.net> wrote:

Besides, the threads should also all go through reacquiring the associated
> mutex, usually sitting right next to the condition variable, and usually
> both embedded in a parent object. What you're normally really interested
> in is releasing this parent object, including destroying the mutex, which
> means you also have to wait for all threads to unlock it. One common way
> to deal with this is reference counters on the parent object.


In my case, I've got a lot more condition variables than mutexes, because I
don't want threads waking up for events other than the one they're waiting
for.  One mutex for the entire pager, and a condition variable for every
outstanding lock and write.  So, for example, if a thread is waiting for a
particular write to finish, it waits on that write's condition variable,
which gets signaled and destroyed when the write finishes, while the pager
object and the mutex continue to exist.

I was thinking about wrapping such a counter as you suggest into the
condition variable structure, to ensure that the POSIX informative behavior
and the C++ behavior work reliably.

Increment the counter (atomically) when we're about to block on a condition
variable, and decrement it when we're done the post-block processing.  Then
pthread_cond_destroy() can first check to make sure that the wait queue is
empty (return EBUSY if not), then spin wait for the counter to be zero
otherwise.  I think that should ensure that we can free() the condition
variable after pthread_cond_destroy() has returned, and that, in turn, will
ensure that the C++ destructor works reliably, too.

agape
brent

[1]
http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_cond_destroy.html
[2]
http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_cond_broadcast.html
[3]
https://stackoverflow.com/questions/7598457/when-can-a-cond-var-be-used-to-synchronize-its-own-destruction-unmapping
[4]
http://en.cppreference.com/w/cpp/thread/condition_variable/~condition_variable


race condition destroying condition variables

2017-12-18 Thread Brent W. Baccala
Hi -

Looks like there's a race condition when we destroy a condition variable.
My understanding of the expected behavior is that once all the threads have
been signaled (i.e, pthread_cond_broadcast is called), the condition
variable can be safely destroyed with pthread_cond_destroy.

The problem is in glibc's libpthread/sysdeps/generic/pt-cond-timedwait.c.
After __pthread_block() returns, we spinlock on cond->__lock.  The problem
is that our __pthread_block() is just a mach_msg receive, and our
__pthread_wakeup() (called by pthread_cond_broadcast) is just a mach_msg
send.

So we can do a pthread_cond_broadcast, which will send messages to all
waiting threads, but there's no guarantee that the threads have received
the message; the message could be queued.  Then we destroy the condition
variable, then the thread receives the message and tries on spinlock on a
free'd region of memory.

It looks like the whole reason for that spinlock is to figure out if
somebody else removed us from the wait queue, and to remove ourselves from
the wait queue if they did not (i.e, we timed out).

I'm puzzling about how to fix it, other than by reorganizing my libpager
code so that condition variables don't get destroyed very often.

Also, I'm a bit confused by the management of the source code.  Is the
authoritative copy at git://git.savannah.gnu.org:/hurd/libpthread.git?

agape
brent


Re: race condition in libports

2017-12-17 Thread Brent W. Baccala
On Sun, Dec 17, 2017 at 8:02 AM, Samuel Thibault <samuel.thiba...@gnu.org>
wrote:

> Hello,
>
> Brent W. Baccala, on sam. 16 déc. 2017 21:37:05 -0500, wrote:
> > basically, we're storing ports in a inode-to-port hash, looking them
> > up when io_identity() gets called, and removing them from the hash
> > when the class's clean routine gets called.
>
> That's the bug: one needs a reference for this.  And it's a weak
> reference: identity is fine with getting rid of it.  Could you try the
> attached patch?


Makes sense.  I've applied it.  So far, so good.

Now I feel stupid!  Holding a port in a hash is such a classic example of a
weak reference, but I didn't even think about moving that code out of the
clean function.

Thank you.

agape
brent


race condition in libports

2017-12-16 Thread Brent W. Baccala
Hi -

I'm making good progress on the multi-client libpager.  I've been running
it on my root filesystem for about a month now, with few problems recently.

However, there are still some bugs.  One seems to be in libports.  It
manifests like this:

/hurd/ext2fs.static: ../../libports/../libshouldbeinlibc/refcount.h:171:
refcounts_ref: Assertion '! (r.hard == 1 && r.weak == 0) || !"refcount
detected use-after-free!"' failed.
/hurd/ext2fs.static: ../../libports/complete-deallocate.c:41:
_ports_complete_deallocate: Assertion '! "reacquired reference w/o send
rights"' failed.

gdb indicates that the port in question was generated by
libfshelp/get-identity.c.  That file's a short read; basically, we're
storing ports in a inode-to-port hash, looking them up when io_identity()
gets called, and removing them from the hash when the class's clean routine
gets called.

I think what's happening is that we have a port that loses its last send
right, and after its refcount is decremented but before its clean routine
gets called, another call to io_identity() pulls it out of the hash.  Then
you've got ports_get_right complaining (that's the first line) that it's
incrementing a zero refcount, and ports_port_deref complaining (that's the
second line) that it deallocating a port that now has send rights.

Looking at the tail end of libports/no-senders.c, you'll see that
ports_port_deref gets called after we've dropped the mutex on _ports_lock.
I'm thinking that we need to hold that mutex all the way until the class's
clean routine has returned in order to assure that the refcount get
decremented and the port gets removed from the hash atomically.

Of course, that requires holding a global lock while the clean routine
runs.  It seems to me that only the port in question needs to be locked,
but the individual ports don't seem to have mutexs associated with them.

Any ideas what to do?

agape
brent


Re: hurd vm images don't seem to be booting

2017-12-13 Thread Brent W. Baccala
On Wed, Dec 13, 2017 at 3:39 AM, Samuel Thibault <samuel.thiba...@gnu.org>
wrote:

> Hello,
>
> Brent W. Baccala, on mar. 12 déc. 2017 20:53:05 -0500, wrote:
> > qemu-system-i386 -enable-kvm -m 1024 -nographic -drive file=
> > debian-hurd-20171101.img,format=raw
>
> Why -nographic?  That means no VGA output, thus no wonder no output :)


Oh, yes, that's it...

Forgot that I'd changed my images to use a serial console.

Thanks.

agape
brent


hurd vm images don't seem to be booting

2017-12-12 Thread Brent W. Baccala
Hi -

I've downloaded Hurd images from these two locations:

http://cdimage.debian.org/cdimage/ports/current/hurd-i386/debian-hurd.img.tar.gz

http://people.debian.org/~sthibault/hurd-i386/debian-hurd.img.tar.gz

and I've had the same problem with both.  I can't get the grub boot menu to
load with this VM invocation:

qemu-system-i386 -enable-kvm -m 1024 -nographic -drive
file=debian-hurd-20171101.img,format=raw

which I think is pretty standard.  My own VM images continue to work.

Is something broken in these public downloads, or am I missing a step
somehow?

agape
brent


Re: FSF copyright assignment (was: SIGILL problems with Hurd port of GO in gcc-8, and rpctrace bugs)

2017-11-21 Thread Brent W. Baccala
On Mon, Nov 20, 2017 at 3:33 AM, Samuel Thibault <samuel.thiba...@gnu.org>
wrote:

> Brent W. Baccala, on dim. 19 nov. 2017 20:04:35 -0500, wrote:
> > The   assignment   of   par.   1(a)   above   applies   to   all
> >  past   and   future   works   of   Developer   that
>  constitute
> >  changes   and
> > enhancements to the Program.
> >
> > An obvious reading of this is that everything I do on Hurd for the rest
> of my
> > life will belong to the Free Software Foundation.
>
> "the Program" meaning the FSF repository, not whatever branch you have
> on your disk or whatever.
>

The contract defines "the Program":

1.(a)   Developer   hereby   agrees   to   assign   and   does   hereby
 assign   to   FSF   Developer's   copyright   in   changes   and/or
enhancements to the suite of programs known as GNU HURD (herein called the
Program), including any accompanying
documentation files and supporting files as well as the actual program
code. These changes and/or enhancements are herein
called the Works.


There is no mention here of the FSF repository.  "the Program" is GNU
HURD.  Notice that if the GPL was read the way that you are proposing, as
soon as I made a change on my local disk, the GPL would no longer apply,
since the local copy would no longer be considered "the Program".


> >   I've asked them to change the language and they have refused.
>
> Because they believe it already means what you want.
>

The way a legal contract is supposed to work is that the parties negotiate
until they develop language that everyone agrees on.

Of course, we know that often, legal contracts don't work that way at all.
Some big organization hands you a contract, tells to sign it, and if you
won't, too bad.

And everyone needs to get a lawyer, so it's all very, very expensive.


> > My patches are covered under the GPL; you're free to use them.  Samuel
> can
> > incorporate them into the git repository, if he so chooses.
>
> I can not choose that alone. Being covered by assignments is a GNU
> decision.
>

True enough.  I was needling you a bit.  :-)  Sorry if I caused any offense.

My point is that I've done everything that, say, Linus Torvalds would want
for code to be incorporated into the Linux kernel.

agape
brent


Re: SIGILL problems with Hurd port of GO in gcc-8, and rpctrace bugs.

2017-11-19 Thread Brent W. Baccala
On Fri, Nov 17, 2017 at 3:14 PM, Svante Signell 
wrote:

>
> Thanks a lot for your patches for rpctrace. Now more failing programs
> can be traced, where the standard version fails. There are still some
> examples hanging hard on gsync_wait or entering an infinite loop.
>
> And thank you for writing the first ever documentation of rpctrace :)
>

You're welcome, on both points.

Can you please, please, please consider signing the copyright agreement

with FSF!
>

No, I'm sorry.  This is the offending clause:

The   assignment   of   par.   1(a)   above   applies   to   all   past
>  and   future   works   of   Developer   that   constitute   changes   and
> enhancements to the Program.


An obvious reading of this is that everything I do on Hurd for the rest of
my life will belong to the Free Software Foundation.  I've asked them to
change the language and they have refused.

My patches are covered under the GPL; you're free to use them.  Samuel can
incorporate them into the git repository, if he so chooses.  I'm even
willing to assign copyright on the patches to the FSF, but it has to be a
more limited assignment that what they've proposed.

I have considered this, and there's no way I'll sign a contract with that
clause in it.

Thanks anyway, I'll use your patches for local use. And if you want
> more examples of where rpctrace hangs or loops forever, or testing of
> new versions, please let me know.
>

Sure, let's take a look.  I've spent a good bit of time studying rpctrace,
so if you've got some test cases that uncover bugs, I might be able to
understand them.

agape
brent


Re: SIGILL problems with Hurd port of GO in gcc-8, and rpctrace bugs.

2017-11-16 Thread Brent W. Baccala
On Thu, Nov 16, 2017 at 5:55 PM, Svante Signell 
wrote:

>
> > Perhaps you could try those patches and see if they fix your problem?
> If not,
> > then it's something else that should be investigated further.
>
> Thanks, I will try to apply them: Do I need all 0001 to 0011 and 0101 to
> 0102
> patches?
>

I'm not sure exactly which ones you absolutely need.  The 0100 series is
documentation, so you definitely don't need them.

agape
brent


Re: SIGILL problems with Hurd port of GO in gcc-8, and rpctrace bugs.

2017-11-16 Thread Brent W. Baccala
On Thu, Nov 16, 2017 at 12:39 PM, Samuel Thibault <samuel.thiba...@gnu.org>
wrote:

> Hello,
>
> Brent W. Baccala, on jeu. 16 nov. 2017 12:25:23 -0500, wrote:
> > This isn't in the main Hurd repository because I wouldn't sign the FSF
> > copyright assignment.  They insisted on a clause that said I assign
> copyright
> > on "all past and future works of Developer that constitute changes and
> > enhancements to the Program" and I wouldn't agree to it, because I was
> only
> > willing to assign copyright on specific changes.
>
> Rather, there is disagreement over what the assignment means.  I really
> believe that FSF intends "changes and enhancements to the Program" to
> mean what was actually *committed*, and not changes you could have in
> your own private repository.  Meaning: you only have to say what should
> be commited to decide what is covered by the assignment.
>
>
That was their claim.  I asked them to write it into the agreement and they
wouldn't do it.

Me (15 Nov 2016):

OK, so can you modify 1 (b) in the contract to clarify this point?  I am
> not assigning "all past and future works", but only the ones that I
> submit.  "Anything not contributed is not assigned".  Fine.


Donald Robertson (16 Nov 2016):

There's no need to modify it; that's already how it works. Any
> modification to our forms has to be approved by legal counsel, which is
> a resource we have in quite limited amounts, so I hope you will sign the
> assignment as is.


...and I did not sign it.

agape
brent


Re: SIGILL problems with Hurd port of GO in gcc-8, and rpctrace bugs.

2017-11-16 Thread Brent W. Baccala
On Wed, Nov 15, 2017 at 11:53 AM, Svante Signell 
wrote:

>
> Additionally the rpctrace output hangs hard at gsync_wait() both for the
> ./index0-out-reduced_OK.x and ./index0-out-reduced_nOK.x files.
> Has rpctrace been thoroughly tested on multi-thread applications?
>
> Please, give feedback on this if you have possibility, specifically the
> rpctrace
> bug(s).
>
>
rpctrace has a known problem interacting with system RPCs that block the
sender.

See the discussion thread starting here:

http://lists.gnu.org/archive/html/bug-hurd/2016-09/msg00002.html

My guess is that you're seeing the same problem with gsync_wait() that I
saw with vm_map().  I've got patches to rpctrace to fix this problem, here:

https://github.com/BrentBaccala/hurd

For those of you who've seen this before, note that I've renamed the github
repository.

This isn't in the main Hurd repository because I wouldn't sign the FSF
copyright assignment.  They insisted on a clause that said I assign
copyright on "all past and future works of Developer that constitute
changes and enhancements to the Program" and I wouldn't agree to it,
because I was only willing to assign copyright on specific changes.

So, I've started keeping my own repository.  I just double-checked those
patches, and they apply cleanly again the current Savannah repository (just
warnings about offset and fuzz).

Perhaps you could try those patches and see if they fix your problem?  If
not, then it's something else that should be investigated further.

agape
brent


possible recent performance change?

2017-09-11 Thread Brent W. Baccala
Hi -

I recently updated my Debian Hurd system for the first time in a few
months, and it started showing an annoying behavior running emacs over ssh,
from an Ubuntu host to Hurd on a qemu VM.  The editor would freeze for
about a second, then suddenly catch up with the key strokes I had typed in
the meantime.

I backed out to an earlier snapshot, last updated on June 27, and this
behavior went away.

So, one possibility is that something changed in the Debian archive over
the last two months or so to trigger this.

Any educated guesses as to what that might be, before I dig any deeper?

agape
brent


libpager pagemap

2017-08-31 Thread Brent W. Baccala
Hi -

My work on a multi-client libpager has progressed to the point where I have
pseudo code for the logic and am considering how to implement the pagemap.

I'm trying to achieve three goals with the pagemap:

1. keep pagemap entries as small as possible
2. support arbitrary numbers of clients, and
3. support a single client with minimal overhead

(1) seems important since ext2fs disk pagers cover the entire partition and
can grow quite large.  (2) is pretty much the whole point of my project,
though I have considered relaxing that requirement to set an upper limit
on the number of clients.  (3) seems important since it's the most common
case, but maybe the speed of libpager isn't much of an issue.

I don't see any reasonable way of achieving (2) without making the pagemap
entry at least as big as a pointer (4 bytes).  Currently it's a short (2
bytes), so we'll have to double the size of our pagemaps.

I'm considering several schemes to achieve (3).  Here's one, that handles
up to four clients fairly efficiently.

First, steal a bit from the pointer by aligning the structure on a word
boundary.  Then, if the least significant bit is 1, interpret the pagemap
entry as a bitfield rather than as a pointer to a more complicated
structure.

A 32 bit field (err... make that a 31 bit field) can handle up to four
clients.  The new pagemap entries have an ACCESSLIST, which is an unsorted
set of clients (those who currently have access), and a WAITLIST, which is
a sorted list of clients that have requested access, and a few boolean
flags.

We number our clients 0 through 3, and I'm thinking about moving their
ports into high port number space to easily identify them.  So, the first
client on the first memory object would be on port 0x80000000, the second
client would be on 0x80000001, etc.  The second memory object would have
its ports on 0x80000004 through 0x80000007.  So we can just strip off the
lowest 2 bits to figure out our "client number".  I don't think protected
payloads would help, since we're not receiving messages on these ports;
they're send ports used to communicate with the kernel(s).

ACCESSLIST entries requires two bits - one to indicate if access is
granted, and one to indicate if it's read-only or read-write access.  So,
we use 2*4 = 8 bits for the ACCESSLIST.  The WAITLIST is sorted (client
requests are processed FIFO) and requires four bits per slot - one to
indicate that the slot is in use, two to identify the client, one to
indicate if the request is for read or write access.  We need four slots
for four clients, so 4*4 = 16 bits for the WAITLIST.  Then we need a few
more flag bits.  It all fits.

Once we hit our fifth client, or some oddball condition (like an error
return other than the expected EIO, ENOSPC, EDQUOT), we shift to a pointer
that points to a more complicated, dynamically allocated structure.

The dynamically allocated structures are themselves stored in an rb-tree,
so that if multiple pages are in the same condition (same set of clients
with access, for example), they all point to the same structure.  Every
time we want to change a pagemap structure, we construct the new pagemap
structure in a temporary object, then search the rb-tree for a matching
entry, and either use a pointer to the existing entry (if found), or insert
the temporary object, use a pointer to it, and construct a new temporary
the next time we need to do this.

That can be a bit slow, so I'm thinking about including some extra pointers
in that dynamic structure to cache the most common cases (like moving the
first client on WAITLIST to ACCESSLIST).  Of course, these structures would
only be used if we've got more than four clients, so we're already into a
corner case anyway.  Right now, I'm leaning towards not including any extra
pointers.  I have no experience with large numbers of clients, so
I'm not really sure what should be optimized, anyway.

It's a pretty complex scheme, and thus prone to bugs, which is why I'm
posting to the list for comments.  Does anybody think it would be better to
ditch the bit field trick completely and handle even single clients using
pointers to malloc'ed structures?  The code would be simpler but slower.

agape
brent


Re: libpager

2017-08-10 Thread Brent W. Baccala
On Sun, Aug 6, 2017 at 2:34 PM, Justus Winter <jus...@gnupg.org> wrote:

> "Brent W. Baccala" <cos...@freesoft.org> writes:
>
> >> Are you aware of the pager rework in [0]?  I'm considering to review and
> >> merge these changes.  It may make sense to try to come up with a common
> >> API.  What is the state of your library, is it a drop-in replacement?
> >>
> >> 0: https://git.sceen.net/hurd/hurd.git/log/?h=mplaneta/gsoc12/review_v1
> >
> >
> > I was not aware of this link, exactly.  Richard Braun mentioned a GSoC 12
> > project a week ago, but I thought that it was a gnumach modification.
> > Glancing at this git repository, I see that it's all translator stuff.
>
> Not merely, see e.g.:
>
> https://git.sceen.net/hurd/hurd.git/commit/?h=mplaneta/gsoc12/review_v1&id=2ca8719606eeca99c425b3282017a6412b49213a
>
>
OK, so it looks like Neal Walfield, in 2002, modified libpager to pass in
pointers to its callback functions instead of linking to fixed names in the
translators.  The new callbacks allow for multi-page operations and take
block number arguments instead of byte offsets.  These patches were applied
into the gsoc12 review branch in 2012 but haven't made it into master.  Not
much changed in the libpager logic itself, except that multi-page
operations are supported, even though the kernel still doesn't use them
without further patches.

Other than passing a structure of pointers to pager_create(), the biggest
difference that I see is how the data is returned.  You don't just malloc
an array and return it from pager_read_page().  In the new API, the
read_pages() function must return the data by calling pager_data_supply(),
pager_data_unavailable(), or pager_data_read_error().

The immediate issue that I see with this is that in a multi-client
libpager, you can't just wrap pager_data_supply() around
memory_object_data_supply() because you don't know which clients to supply
the data to.  That means that pager_data_supply() will have to look at the
pagemap to figure exactly what to do with the data it's passed.  I don't
foresee any serious problems, though.

Obviously, changes are required to the filesystem translators to support
the new API.  Some of the translators, like fatfs, were just modified to
run a loop over their old, single page code.  That could even be moved to
the libpager library; i.e, the structure of function pointers could include
both read_page() and read_pages(), and use the old read_page() interface if
read_pages() is NULL.

Perhaps I could design libpager to use the new read_pages() interface, but
continue supporting the old API for now.  Then we can easily transition to
the new interface, perhaps just by changing pager_create() calls to
pager_create_ops(), which would take an extra pointer to a struct pager_ops.

What do you guys think?  Do we like Neal Walfield's API?  Start moving in
that direction?

agape
brent


Re: libpager

2017-08-06 Thread Brent W. Baccala
On Sun, Aug 6, 2017 at 12:40 PM, Justus Winter <jus...@gnupg.org> wrote:

> "Brent W. Baccala" <cos...@freesoft.org> writes:
>
> > but it's obviously got some issues
>
> What kind of issues?
>

I was thinking of a multi-client environment where a disk-backed pager is
also doing non-disk-backed (i.e, cache coherency) operations.  One client
is trying to page in from the disk while two other clients (with write
access) are passing a page back and forth between themselves with no disk
operations required.  The clients doing non-disk stuff would have to wait
for the disk operations to complete.  Now that I think about it more, I
admit that it's a bit of a stretch.


> > and could become a performance bottleneck at some point.  There's no
> > good reason to block all access to page 100 while a disk operation
> > completes on page 1.
>
> Let's wait until it actually becomes a bottleneck...
>

Yes, I agree.


> Are you aware of the pager rework in [0]?  I'm considering to review and
> merge these changes.  It may make sense to try to come up with a common
> API.  What is the state of your library, is it a drop-in replacement?
>
> 0: https://git.sceen.net/hurd/hurd.git/log/?h=mplaneta/gsoc12/review_v1


I was not aware of this link, exactly.  Richard Braun mentioned a GSoC 12
project a week ago, but I thought that it was a gnumach modification.
Glancing at this git repository, I see that it's all translator stuff.
I'll study it.  Is there corresponding gnumach work?

The state of my library is that it's in pseudo code.  It's intended to be a
drop-in replacement for libpager.

agape
brent


libpager

2017-08-06 Thread Brent W. Baccala
Hi -

I've learned some interesting things about libpager that I'd like to share
with the list.

I've found two bugs in the existing code that were "fixed" by the new
demuxer that sequentially services requests on each pager object.

For example, the data return code sets a flag PAGINGOUT in the pagemap
before it starts calling pager_write_page (which could be slow; writing to
a hard drive, say).  Future data returns check the PAGINGOUT flag and wait
on a condition variable if it's set.  The problem is that if multiple
threads start waiting on that, pthreads doesn't guarantee what order they
will run in when the conditional variable is signaled, so the data writes
can get reordered.  If three data returns come in 1, 2, 3 (maybe because
pager_sync is called three times), number 1 starts writing, but if it
doesn't finish quickly enough, 2 and 3 can get reordered.

Except that they can't.  The new demux code queues the second and third
writes.  They don't process until the first one is done.  The pager object
is essentially locked until the pager_write_page() completes.

I went so far as to write a test case to exercise the bug!  Just good
coding practice - develop tests for your known bugs first.  Then I ran it,
and it couldn't reproduce the bug!  Only after thinking about the code more
did I understand why.

I know the demuxer code was rewritten to avoid thread storms, but it's
obviously got some issues and could become a performance bottleneck at some
point.  There's no good reason to block all access to page 100 while a disk
operation completes on page 1.  I'm not looking to re-write it right now,
but I'm curious.  Does anybody remember what characterized the thread
storms?  What conditions triggered them?  What kind of pager operations
were being done?

agape
brent


Re: multi page memory object operations

2017-07-31 Thread Brent W. Baccala
On Mon, Jul 31, 2017 at 5:21 AM, Richard Braun <rbr...@sceen.net> wrote:

> On Sun, Jul 30, 2017 at 10:20:55PM -0400, Brent W. Baccala wrote:
>
> > Does anybody know the history of multi page requests?  Was it ever
> > implemented in the kernel?
>
> Maksym Planeta worked on it as part of the 2012 GSoC. I tried to review
> it for merging, but was never satisfied with some details, too afraid
> of what it would break (in both the API and ABI), and didn't consider
> the gain worth at the time (I always thought improving the page cache
> internally first, without involving the external pagers, would yield
> much better results, and it did). Both Maksym and I went on to do other
> things and the work just lingered, as is often the case. It could make
> sense to revive the work now that paging was globally improved, but
> you'll likely get conflicts, and the work isn't trivial, IOW it really
> needs careful review.


I'm designing the new libpager to handle multi page requests, so I'll just
leave it at that.  If the kernel work gets revived, the filesystem
translators will be able to handle it.

agape
brent


multi page memory object operations

2017-07-30 Thread Brent W. Baccala
Hi -

I've found something puzzling while working on the libpager code.

It seems that the current libpager's data-request and data-unlock routines
only handle single page requests, while the Mach documentation suggests
that the kernel can make multi page requests using these routines.

Looking at the gnumach code, it sure seems that the kernel only makes
single page requests.

Does anybody know the history of multi page requests?  Was it ever
implemented in the kernel?

agape
brent


multi-client libpager

2017-07-17 Thread Brent W. Baccala
Hi -

Just a quick FYI to let you all know that I'm working on a multi-client
libpager.  This will allow execution of binaries on remote nodes in a Hurd
cluster.  I'm well into pseudo coding the logic; you can follow my work in
the netmsg github repository:

g...@github.com:BrentBaccala/netmsg.git

The pseudo code is in the libpager/NOTES file.

I'm not going to assign the copyright to the FSF.  It will remain GPL2+ in
its own github repository.  It should be a drop in replacement for the
standard libpager.

agape
brent


Re: [GSoC 2017] Support for fsysopts and multiple interfaces

2017-06-29 Thread Brent W. Baccala
On Thu, Jun 29, 2017 at 7:13 AM, Joan Lledó 
wrote:

> Hello Brent,
>
> Thanks for your interest in my GSoC. I didn't know the tools you
> propose and think that everything that moves us towards using
> standards is good. What I believe is that probably such a development
> should be done by another developer more experienced than me, besides,
> until the proper libraries are created it'd suppose a big effort that
> mightn't be worth the gain, but it's always fine to know about new
> technologies.
>
> Regards!


Well, my intent was more to document my thoughts than propose anything for
current work.

Justus mentioned ioctl's, I'd overlooked that, so sometimes it's best to
post an email and see what other people have to say.

agape
brent


Re: [GSoC 2017] Support for fsysopts and multiple interfaces

2017-06-28 Thread Brent W. Baccala
Joan -

Thank you for your work on this.  I haven't commented on it until now,
partly because of some email problems, and partly because I haven't been
working on Hurd for the last six months or so.

On Mon, Jun 26, 2017 at 7:28 AM, Joan Lledó 
wrote:

> I've advanced in several fronts during the last two weeks, like the
> initial configuration of the stack from the command line, by using
> libargp, or reading and writing a new configuration in run time,
> through fsysopts. The aim is to support exactly the same options
> pfinet does, so we'll be able to replace pfinet by the LwIP translator
> transparently for the rest of the system.
>

That's good!  As far as I know, using command line style options is the
only way to change configuration in the current pfinet translator.

Leaving that for backwards compatibility is fine, but I think that we also
want to have more of an API to interface with software.

The current state of the art seems to be YANG data models (RFC 6020)
manipulated by either NETCONF (RFC 6241) or RESTCONF (RFC 8040).

NETCONF is transported over an SSH encrypted session and uses an
XML-encoded request/reply format.  RESTCONF is transported over SSL and
uses HTTP verbs with JSON encoded data.  Also, NETCONF supports
transactions, allowing complex configuration changes to be made atomically.

For example, here's a NETCONF request that simultaneously sets an IP
address on an interface and sets the default route to point out that
interface.  The changes are supposed to be atomic, so there's no point at
which an old default route points to an address that's no longer reachable
because the interface has been reconfigured.

<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <edit-config>
    <target>
      <running/>
    </target>
    <config>
      <interfaces xmlns="urn:ietf:params:xml:ns:yang:ietf-interfaces">
        <interface>
          <name>eth</name>
          <ipv4 xmlns="urn:ietf:params:xml:ns:yang:ietf-ip">
            <address>
              <ip>10.1.1.1</ip>
              <netmask>255.255.255.0</netmask>
            </address>
          </ipv4>
        </interface>
      </interfaces>
      <routing xmlns="urn:ietf:params:xml:ns:yang:ietf-routing">
        <control-plane-protocols>
          <control-plane-protocol>
            <type>rt:static</type>
            <static-routes>
              <ipv4 xmlns="urn:ietf:params:xml:ns:yang:ietf-ipv4-unicast-routing">
                <route>
                  <destination-prefix>0.0.0.0/0</destination-prefix>
                  <next-hop>
                    <next-hop-address>10.1.1.254</next-hop-address>
                  </next-hop>
                </route>
              </ipv4>
            </static-routes>
          </control-plane-protocol>
        </control-plane-protocols>
      </routing>
    </config>
  </edit-config>
</rpc>


My example is probably not quite right, and obviously complex enough to
relegate to a library, several actually, at least one for the XML and
another for the YANG.  It's something that could be contributed back to
LwIP.

For our purposes, I envision dropping TCP/SSH and sending XML configuration
snippets (like the one above) over a Mach RPC with a string argument, and
getting the XML encoded response back in return.  Encoding and decoding
large XML strings would obviously present performance issues, but I don't
expect these operations to be run often enough for that to be an issue.

NETCONF also supports retrieving operational data, so you could retrieve a
list of TCP sessions, to implement something like netstat.  A YANG model
hasn't been published for TCP, but there's a TCP MIB and a procedure to
convert MIBs to YANG.  [1]  Something like this would retrieve the TCP
session data needed for a basic netstat:

<rpc message-id="102" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <get>
    <filter type="subtree">
      <TCP-MIB xmlns="urn:ietf:params:xml:ns:yang:smiv2:TCP-MIB">
        <tcpConnTable/>
      </TCP-MIB>
    </filter>
  </get>
</rpc>


Implementing this is a lot of work, and I'm not suggesting it for part of
this GSoC project, but I want to document and discuss what kind of API our
network translators should ultimately support.  Our options, as I see it,
are:

1. some kind of non-standard, Hurd-specific API to set configurations and
retrieve statistics

2. MIBs with Mach RPCs to implement SNMP operations.  Outdated and
non-atomic.

3. YANG models with RESTCONF-like operations.  Would practically require
embedding an http server in the translator.

4. YANG models with NETCONF operations, as described above.

[1] http://www.netconfcentral.org/database_docs

agape
brent


Re: rpctrace / libpager / signal preemptor

2016-12-01 Thread Brent W. Baccala
On Wed, Nov 30, 2016 at 11:12 PM, Samuel Thibault 
wrote:

>
> Err, I'm sorry, did you perhaps miss the fix I made after:
>
> commit 406b031c996ec4cd8c76d251de8b7bf462d8b975
> Author: Samuel Thibault 
> Date:   Sun Nov 20 16:16:24 2016 +0100
>
> Fix SIGBUS code
>

Err, yes, I missed that.  I don't know topgit, so that glibc git repository
is largely incomprehensible to me.

I have to learn how to use topgit, obviously...

agape
brent


Re: rpctrace / libpager / signal preemptor

2016-12-01 Thread Brent W. Baccala
On Wed, Nov 16, 2016 at 9:05 AM, Samuel Thibault 
wrote:

> Samuel Thibault, on Wed 16 Nov 2016 19:50:07 +0100, wrote:
> > Samuel Thibault, on Wed 16 Nov 2016 19:46:52 +0100, wrote:
> > > The attached testcase does get the faulting address.
> >
> > And the attached testcase doesn't.
>
> And is fixed by the attached patch, could you try it?
>

OK, I've finally tested this patch!

It took me a while.  I had a lot of problems building the glibc package, so
I finally stopped trying to build the package and just built the library
with the intent of injecting it into ext2fs using LD_LIBRARY_PATH.  It took
me a while to figure out that settrans clobbers the environment, and that
bug I reported about the /proc environ files returning I/O errors certainly
didn't help.  I finally got the translator loading the library using a
trick that I want to document:

settrans -a mnt /usr/bin/env LD_LIBRARY_PATH=/root/lib /hurd/ext2fs ramdisk

The patch works, but is incomplete.  Samuel's test programs attempt to
access unmapped memory addresses, which generate KERN_MEMORY_FAILURE, but
ext2fs attempts to access mapped addresses backed by a memory manager
returning faults, which generates KERN_MEMORY_ERROR, so we also need this:

--- sysdeps/mach/hurd/i386/exc2signal.c~ 2016-11-09 20:03:52.0 -1000
+++ sysdeps/mach/hurd/i386/exc2signal.c  2016-11-30 01:54:02.0 -1000
@@ -40,6 +40,7 @@
 {
case KERN_INVALID_ADDRESS:
case KERN_MEMORY_FAILURE:
+   case KERN_MEMORY_ERROR:
  *signo = SIGSEGV;
  detail->code = posix ? SEGV_MAPERR : detail->exc_subcode;
  break;

That, along with the other patches, seems to produce an ext2fs that can
handle disk full conditions without dying!

It still exhibits some weird behavior - creating directory entries for the
files that it doesn't have the space for.

Do we have any kind of a test suite for hurd?  This seems like a nice test
- mount a ramdisk and fill it up.  We've uncovered a number of bugs trying
to do that.

agape
brent


signal preemptors

2016-11-25 Thread Brent W. Baccala
Coming back to the subject of signal preemptors...

Why do we need them at all?  Why not just use the existing POSIX signal
facilities?

We install a signal handler, saving the old signal handler, check the
faulting memory address inside the new signal handler, and relay the signal
on to the saved signal handler if it doesn't match.

Also, codesearch.debian.net shows that the only place signal preemptors are
used is in the hurd source tree itself (no great surprise), and the four
places that it's used there (libpager, libstore, libdiskfs, and exec) don't
do anything else with SIGBUS or SIGSEGV, so we don't even have to worry
about nested signal handlers, really.  We can just catch the signals.

Signal preemptors, as a non standard addition to the already complicated
POSIX signal API, raise questions like, what happens when a signal is
preempted while a debugger is attached?  Does the debugger get the signal
before the preemptor, or does the preemptor fire without waiting for the
debugger?

agape
brent


Re: gdb handling of Mach exceptions

2016-11-25 Thread Brent W. Baccala
On Wed, Nov 23, 2016 at 10:03 PM, Brent W. Baccala <cos...@freesoft.org>
wrote:

>
> Any comments?
>

Well, yes, actually.  :-)

gdb's hurd target has a poorly documented command "set noninvasive".  I
don't completely understand it, but...

I'm starting to see the rationale for an "invasive" debugging mode.
"Invasive" means that we debug by wrapping Mach ports - the task port needs
to be wrapped along with the exception ports.  "Non-invasive" means that we
rely on the C library in the program under test to provide some debugging
support.  No port wrapping is done.  Breakpoints, for example, work by
letting the program's message thread generate a SIGTRAP, which gets relayed
to the proc server and is presented to the debugger in response to a wait()
call.

Non-invasive is more efficient, but invasive is more reliable.

I'd code it myself, except that we've got the same old problem with
detaching a process when its ports are wrapped.  This shows, incidentally,
that adding a system call trace facility to the kernel isn't a complete
solution.  We don't just want to trace these messages - we want to
intercept them and only deliver them after a human being has had the chance
to inspect them at a debugger prompt.

agape
brent


Re: C++ vs. glibc/Hurd/Mach headers

2016-11-25 Thread Brent W. Baccala
On Fri, Nov 25, 2016 at 1:46 AM, Thomas Schwinge 
wrote:

> Hi!
>
> Motivation for bringing this up again: GDB has recently switched from
> using a C to a C++ compiler.  GDB, for obvious reasons, needs to access
> low-level Hurd/Mach interfaces.
>
>
I've also had problems compiling hurd code using g++.

In addition to what Thomas has described, the ports library is unusable
with C++ because struct port_info has a member named "class".

Also, the initializer syntax used in /usr/include/refcount.h is unusable
with g++.  For example:

  const union _references op =
{ .references = { .weak = ~0U, .hard = 1} };

generates a compiler error:

sorry, unimplemented: non-trivial designated initializers not supported

To reproduce both problems, just create a file containing the line #include
<hurd/ports.h> and try to compile it with g++.  It doesn't matter if you
put it inside an extern "C" block, either.

agape
brent


Re: gdb and PIE binaries

2016-11-22 Thread Brent W. Baccala
On Fri, Nov 11, 2016 at 7:17 AM, Samuel Thibault 
wrote:

> Hello,
>
> Debian is pushing more and more PIE builds, so that address
> randomization can be done. However, on GNU/Hurd, gdb can't work with
> core files from processes running PIE programs, so one has to pass
> CFLAGS=-no-pie etc. to be able to debug programs, it'll become more and
> more problematic.
>

I've found that I can't even debug PIE executables with gdb, let alone core
files.

I encountered this problem when trying to debug gdb on itself.  Downloading
the gdb source tree and building it with "dpkg-buildpackage -b" produced a
PIE executable that I couldn't set breakpoints on properly.  The Debian
/usr/bin/gdb, though, is not PIE, which makes me wonder if someone
(Samuel?) is compiling our Debian packages without PIE, to avoid this
problem.

agape
brent


gdb handling of Mach exceptions

2016-11-21 Thread Brent W. Baccala
Hi -

I've been trying to use gdb on ext2fs when it runs out of disk space and
starts generating memory access exceptions.

It doesn't behave right.  The memory faults get reported as unknown signals
and the program can't be restarted:

Program received signal ?, Unknown signal.
[Switching to Thread 812.8]
0x080eca46 in memcpy ()
(gdb) cont
Continuing.
warning: Signal ? does not exist on this system.
warning: Pid 812 died with unknown exit status, using SIGKILL.

Program terminated with signal SIGKILL, Killed.
The program no longer exists.
(gdb)

I've tracked the problem down to the file gdb/common/signals.c in the gdb
source tree.  This file includes a lot of statements like this:

  /* Mach exceptions.  Assumes that the values for EXC_ are positive! */
#if defined (EXC_BAD_ACCESS) && defined (_NSIG)
  if (hostsig == _NSIG + EXC_BAD_ACCESS)
return GDB_EXC_BAD_ACCESS;
#endif

The problem is that mach/exception.h isn't included at this point, so
EXC_BAD_ACCESS isn't defined and this code isn't included.

I've added #include  to signals.c, and produced a working
gdb:

Program received signal EXC_BAD_ACCESS, Could not access memory.
[Switching to Thread 769.8]
0x080eca46 in memcpy ()
(gdb) cont
Continuing.

Obviously, tacking a Mach-specific include into signals.c isn't the right
solution, so can somebody suggest a proper fix?

Otherwise, I'll dig into it some more and eventually propose something
myself.

agape
brent


Re: rpctrace / libpager / signal preemptor

2016-11-20 Thread Brent W. Baccala
On Sun, Nov 20, 2016 at 5:37 AM, Samuel Thibault 
wrote:

> Samuel Thibault, on Sun 20 Nov 2016 14:50:50 +0100, wrote:
> > Samuel Thibault, on Wed 16 Nov 2016 20:05:49 +0100, wrote:
> > > Samuel Thibault, on Wed 16 Nov 2016 19:50:07 +0100, wrote:
>
> > > And is fixed by the attached patch, could you try it?
> >
> > It seems to be fixing at least some cases indeed.
>
> With a couple more fixes, I could get a "cp" overflowing run not to
> crash ext2fs. Probably other ways of using room in ext2 need fixing too.
>

Sorry I haven't answered for a few days.  I've been trying to test your
patch by building a new glibc package, and keep having all kinds of
problems with memory exhaustion.  I suppose I could test your patch just by
building the library itself, but the Debian package build calls "make
check" and it's causing me all kinds of grief.  So much so that I wonder if
something hasn't changed recently; I've built glibc before without these
problems.  At least I'm getting a good look at how a Hurd system behaves
when it runs out of swap.

Do you have any idea what other fixes are needed?  I intend to use ext2fs
as a testbed for a new multi-client libpager, and I want to get its known
bugs fixed first.

I've been approaching this problem from a different angle - trying to get
gdb to handle the memory errors properly, so I can debug a signal
preemptor.  I'll describe that in a separate discussion thread.

And I'm still trying to test your patch...

agape
brent


rpctrace / libpager / signal preemptor

2016-11-08 Thread Brent W. Baccala
Hi -

Just a status update about what I'm working on.  My primary goal right now
is to get ext2fs working right when a ramdisk fills up.  dd hangs and/or
crashes ext2fs instead of cleanly erroring out.

First, there's a problem in libpager

--- a/libpager/data-unlock.c
+++ b/libpager/data-unlock.c
@@ -66,16 +66,16 @@ _pager_S_memory_object_data_unlock (struct pager *p,

   if (!err)
 /* We can go ahead and release the lock.  */
 _pager_lock_object (p, offset, length, MEMORY_OBJECT_RETURN_NONE, 0,
VM_PROT_NONE, 0);
   else
 {
   /* Flush the page, and set a bit so that m_o_data_request knows
 to issue an error.  */
   _pager_lock_object (p, offset, length, MEMORY_OBJECT_RETURN_NONE, 1,
- VM_PROT_WRITE, 1);
+   VM_PROT_WRITE, 0);
   _pager_mark_next_request_error (p, offset, length, err);
 }
  out:
   return 0;

The final argument to _pager_lock_object is the 'synchronous' flag.  The
call needs to be asynchronous because libpager is single threaded, at least
in the sense that individual memory objects only process one request at a
time.  In this case, we're processing a data_unlock request, and would have
to handle a lock_completed message before lock_object would return
(synchronously).

Next, there's a problem with the rpctrace code that I recently modified,
specifically the part that synchronizes messages by processing them in
order of their 'seqno'.  Kernel messages seem to have 'seqno' zero.  In
particular, exceptions get indefinitely blocked.  I'm currently using the
following patch (not a complete fix):

--- a/utils/rpctrace.c
+++ b/utils/rpctrace.c
@@ -1232,7 +1232,7 @@ trace_and_forward (mach_msg_header_t *inp,
mach_msg_header_t *outp)

   msgid = msgid_info (inp->msgh_id);

-  while (inp->msgh_seqno != TRACED_INFO (info)->seqno)
+  while (inp->msgh_seqno > TRACED_INFO (info)->seqno)
 {
   pthread_cond_wait (& TRACED_INFO (info)->sequencer, );
 }

Maybe I shouldn't use seqno at all, and sequence messages based on their
arrival order.

Once that's been resolved, then we're back to the problem with signal
preemptors!  libpager/pager-memcpy.c includes the following code:

  void fault (int signo, long int sigcode, struct sigcontext *scp)
{
  assert (scp->sc_error == EKERN_MEMORY_ERROR);
  err = pager_get_error (pager, sigcode - window + offset);
  n -= sigcode - window;
  vm_deallocate (mach_task_self (), window, window_size);
  longjmp (buf, 1);
}

Since sigcode no longer contains the faulting address (it's in the subcode,
remember?) this code calls pager_get_error with a negative second argument
and segfaults in the handler, killing ext2fs.

What's supposed to happen (I think) is that the io_write handler in ext2fs
attempting to write data into the mapped file triggers a data unlock / data
lock / data request / data error sequence, which raises an exception on the
memcpy, which gets caught and we get an error return.  Or maybe not.  Maybe
diskfs_grow() should return an error before we attempt the memcpy.  I don't
understand the ext2fs code well enough to know.

I'm starting to go through the signal preemptor code, trying to figure out
a way to handle that problem.

agape
brent


gsync/libihash hash function

2016-11-04 Thread Brent W. Baccala
Hi -

My recent foray into gsync left me thinking a bit about its hash function:

#define MIX2_LL(x, y)   ((((x) << 5) | ((x) >> 27)) ^ (y))

which seems a bit weak.  The result of gsync_key_hash is then taken modulo
GSYNC_NBUCKETS, which is currently 512.

libihash has a similar issue.  It seems to use no hash function by default,
but provides Murmur3 if you want it, which looks better than just shifting
bits, but also slower.

I've been looking at the CRC32 instruction, which was introduced ten years
ago in SSE 4.2.  According to the Intel Software Developer's Manual, CRC32
works like this:

Starting with an initial value in the first operand (destination operand),
accumulates a CRC32 (polynomial 0x11EDC6F41) value for the second operand
(source operand) and stores the result in the destination operand.  The
source operand can be a register or a memory location.  The destination
operand must be an r32 or r64 register.  If the destination is an r64
register, then the 32-bit result is stored in the least significant double
word and 00000000H is stored in the most significant double word of the
r64 register.

Agner Fog (www.agner.org) reports that this instruction executes in a
single micro-op.

It seems a bit silly to agonize over performance when we've got so many
other issues, but I just wanted to shoot a message to the list to document
my research.  I'm thinking that CRC32 is our best bet to compute hash
functions on the Intel architecture.  Of course, it comes with issues like
detecting whether the processor supports it.

agape
brent


Re: kernel panic in gsync_wait

2016-11-04 Thread Brent W. Baccala
On Fri, Nov 4, 2016 at 12:00 AM, Samuel Thibault <samuel.thiba...@gnu.org>
wrote:

> Brent W. Baccala, on Thu 03 Nov 2016 15:51:04 -1000, wrote:
> > I see... so there must be fallback code (option 2 on my list); I just
> haven't
> > found it.
> >
> > Where is KERN_FAILURE handled in user space?
>
> It's not. Gsync_wait just returns, and the while loop just tries to take
> the lock again.
>

Of course; it's designed to replace a spinlock!  Very clever.

Clever can cause problems, though.  We need to document this.

I've booted a new kernel and the weird ext2fs data corruption problems have
gone away.

I'm currently working on a test case where I create a small ramdisk and
fill it to exhaustion.  'dd' doesn't cleanly error out, though; it hangs.
I'm working through the error handling code in libpager (and friends) and
have already found one problem, but there are others.

I guess I should file it as a bug.

How do we want to handle fixed bugs, like this gsync problem?  Should we
open and close a bug report, just so it's documented in the bug database?
I can open the bug, of course, but I don't have permission to close it...

agape
brent


Re: kernel panic in gsync_wait

2016-11-03 Thread Brent W. Baccala
On Thu, Nov 3, 2016 at 2:18 PM, Samuel Thibault <samuel.thiba...@gnu.org>
wrote:

> Brent W. Baccala, on Thu 03 Nov 2016 14:12:41 -1000, wrote:
> > I suspect that this ultimately affects just about every program on a
> > Hurd system.
>
> Sure, it's used internally by glibc. But see the commit I made to
> gnumach: that makes gnumach return an error. glibc thus doesn't actually
> wait, just spinlock, i.e. behave correctly, just not so efficiently.
>

I see... so there must be fallback code (option 2 on my list); I just
haven't found it.

Where is KERN_FAILURE handled in user space?

agape
brent


Re: kernel panic in gsync_wait

2016-11-03 Thread Brent W. Baccala
Hi -

I've been trying to figure what to do with the gsync code.  It causes
undefined behavior and occasional kernel panics when rpctrace is used on
gsync_wait and gsync_wake calls.  I suspect that this ultimately affects
just about every program on a Hurd system.  Even if a program doesn't call
locking primitives directly, they're used by the C library, right?

On Sun, Oct 30, 2016 at 10:16 PM, Kalle Olavi Niemitalo <k...@iki.fi> wrote:

> "Brent W. Baccala" <cos...@freesoft.org> writes:
>
> > Even if I'm right about the nature of this bug, I don't understand
> gnumach
> > well enough to know how a task should access another task's memory.
>
> vm_copy apparently supports such access; code from there could be
> reused.  But if rpctrace uses gsync_wait on the address space of
> another task and the page has been paged out, then the call could
> end up blocking for the pager, and I don't think you want that.
>

I studied vm_copy and came up with some code, but gsync_wake needs to
actually modify the memory location, so setting up a copy doesn't work.
Plus, vm_copy makes a mapping that is visible to user space and leaves the
user space code responsible for deallocating it.

As for blocking for the pager, I think the code does that already.  As
innocuous as this line of code looks:

 *(unsigned int *)addr = val;

it can trigger a page fault and block the RPC, right?

Here's what I think gsync_wait and gsync_wake need to do if their task
argument isn't the current task:

1. lookup the address in the other task's pagemap, flag the map entry
'shared', and insert a copy of it into the kernel pagemap,

2. access the memory location, triggering a page fault which actually
creates the physical mapping, since Mach doesn't create physical maps until
required, then

3a. either remove the pagemap entry from the kernel map when we're done, or
3b. leave it for future use.

If we choose option (3a), then every wrapped call to gsync_{wait,wake} will
trigger a page fault.  If we choose option (3b), then step 1 needs to be
modified to search the kernel pagemap to see if it already has a copy of
the map entry in question.  I have to think some more to figure how that
mapping ultimately gets removed, and of course there's a memory exhaustion
issue if we add lots of new entries to the kernel map on a 32-bit system.

Another possibility would be to reject the RPC back to user space, and put
the old locking code back into glibc as a fallback.

All this brings us back to...

On Sun, Oct 30, 2016 at 10:16 PM, Kalle Olavi Niemitalo <k...@iki.fi> wrote:

>
> Could gsync_wait be removed from gnumach.defs and replaced with
> only a trap that does not take a task_t parameter and cannot be
> intercepted by rpctrace?
>

I don't think we have anything else quite like gsync - kernel code that
directly accesses user space memory in a different task from the one that
trapped.  Our choices seem to be:

1. add a lot of new complexity to the gsync kernel routines
2. add fallback code to glibc
3. add a new system call as Kalle proposed

Comments?

agape
brent


Re: two more rpctrace patches

2016-11-03 Thread Brent W. Baccala
On Wed, Nov 2, 2016 at 2:39 PM, Brent W. Baccala <cos...@freesoft.org>
wrote:

> On Wed, Nov 2, 2016 at 2:26 PM, Kalle Olavi Niemitalo <k...@iki.fi> wrote:
>
>>
>> Look at how the commit messages in hurd.git at Savannah (not at
>> Debian) are formatted.  "make dist" runs gitlog-to-changelog,
>> which generates a ChangeLog file from those.
>>
>
> OK, I think I see what you want.
>

Was that last patch in the desired format?

Would you like me to reformat the commit messages on the last ten rpctrace
patches and resubmit them?

agape
brent


Re: two more rpctrace patches

2016-11-02 Thread Brent W. Baccala
On Wed, Nov 2, 2016 at 2:26 PM, Kalle Olavi Niemitalo  wrote:

>
> Look at how the commit messages in hurd.git at Savannah (not at
> Debian) are formatted.  "make dist" runs gitlog-to-changelog,
> which generates a ChangeLog file from those.
>

OK, I think I see what you want.


Re: two more rpctrace patches

2016-11-02 Thread Brent W. Baccala
On Wed, Nov 2, 2016 at 3:10 AM, Justus Winter  wrote:

>
> Cool.  Two nitpicks:  1/ Instead of attaching patches, why don't you use
> git send-email.  That is easier for everyone.  2/ The summary line of
> your patches is too long, try to keep it at ~60 chars or so, and we
> require changelog-style descriptions of the changes.
>

OK, I can do all that.

How do you submit changelog-style descriptions?  They don't seem to be kept
in ChangeLog...

agape
brent


Re: rpctrace man page

2016-11-02 Thread Brent W. Baccala
On Tue, Nov 1, 2016 at 5:15 AM, Samuel Thibault 
wrote:

>
> GNU projects usually don't have man pages, but info pages.  The
> doc/hurd.texi indeed doesn't have any part for rpctrace.  It should :)
>
>
What an embarrassing *faux pas*!  It's like fumbling to put the Metro card
in the turnstile, like asking the meaning of the local argot, like
mispronouncing Wahiawa...


> I also notice that "info settrans" doesn't seem to bring to the settrans
> part of the documentation.  That should be fixed.
>

 The attached patches should address both points.

agape
brent
From d89c1aee558a2b83270db908e923d9f3395adcb5 Mon Sep 17 00:00:00 2001
From: Brent Baccala 
Date: Tue, 1 Nov 2016 03:55:27 -1000
Subject: [PATCH 1/2] add rpctrace documentation to info file

---
 doc/hurd.texi | 147 ++
 1 file changed, 147 insertions(+)

diff --git a/doc/hurd.texi b/doc/hurd.texi
index 8428a77..06b5a61 100644
--- a/doc/hurd.texi
+++ b/doc/hurd.texi
@@ -162,6 +162,7 @@ into another language, under the above conditions for modified versions.
 * Networking::  Interconnecting with other machines.
 * Terminal Handling::   Helping people interact with the Hurd.
 * Running Programs::Program execution and process management.
+* Debugging Programs::  Tracing and debugging programs.
 * Authentication::  Verifying user and server privileges.
 * Index::   Guide to concepts, functions, and files.
 
@@ -319,6 +320,10 @@ Networking
 * libpipe::
 * Socket Interface::Network communication I/O protocol.
 
+Debugging Programs
+
+* rpctrace::Trace Mach Remote Procedure Calls
+
 Authentication
 
 * Auth Interface::  Auth ports implement the auth interface.
@@ -4647,6 +4652,148 @@ FIXME: finish
 @section proc
 @section crash
 
+@node Debugging Programs
+@chapter Debugging Programs
+
+@menu
+* rpctrace::Trace Mach Remote Procedure Calls
+@end menu
+
+@node rpctrace
+@section rpctrace
+@pindex rpctrace
+
+@command{rpctrace}
+runs a specified program until it exits, intercepting and tracing its Remote Procedure Calls.
+Child processes are also traced.  Synopsis:
+
+@example
+rpctrace [-E var[=value]] [-i FILE] [-I DIR] [--nostdinc] [-o FILE] [-s SIZE] command [args]
+@end example
+
+Each line in the trace begins with the port to which the RPC is being sent, followed
+by the name of the RPC, its arguments in parentheses, an equal sign, and then the reply.
+
+Mach ports are identified using port numbers internal to @command{rpctrace}
+(not the program being traced),
+and are printed in the format
+@code{@var{DEST}<--@var{SRC}(@var{PID})},
+where @var{SRC} is the port number @command{rpctrace} received the message on,
+@var{DEST} is the port number it is forwarding the message to, and
+@var{PID} identifies which task the source port is associated with.
+Only traced processes are identified by PID; ports sourced from untraced processes
+(and the kernel) are tagged with PID -1.
+
+Consider the following line from @command{rpctrace}:
+
+@example
+110<--536(pid1290)->dir_lookup ("etc/ld.so.cache" 1 0) = 0 1 ""  530<--540(pid1290)
+@end example
+
+Process 1290 has transmitted a @samp{dir_lookup} RPC, which was received by
+@command{rpctrace}
+on port 536 and forwarded to port 110, containing three arguments: a string and two integers.
+A reply message was received containing two integers, a null string, and a send right to
+a Mach port.  If process 1290 now transmits a message to its new send right, it will
+be received by @command{rpctrace} on port 540 and forwarded to port 530.
+
+Task ports and thread ports are recognized by @command{rpctrace}
+and printed in special formats:
+@code{@var{TASK}(@var{PID})} and @code{@var{THREAD}(@var{PID})}.
+Thus, the following line shows process 1290 making an RPC to its own task port
+(though this association is not obvious) and allocating a new receive right,
+which appears on port number 17 (in process 1290's port space, not
+@command{rpctrace}'s).
+
+@example
+task523(pid1290)->mach_port_allocate (1) = 0 pn@{ 17@}
+@end example
+
+If the message immediately following an RPC is not a reply to that RPC, a continuation
+line is printed, using a number that is the port @command{rpctrace}
+is expecting the reply on.  The following sequence shows process 1290 making two
+RPCs (probably from two different threads), and then the two replies being received:
+
+@example
+task523(pid1290)->vm_allocate (0 4096 1) ...525
+task523(pid1290)->task_set_special_port (3  530<--544(pid-1)) ...543
+525... = 0 19619840
+543... = 0
+@end example
+
+Some RPCs (called @dfn{simpleroutines})
+have no reply message, and are printed with a terminating semicolon, e.g.:
+
+@example
+68<--70(pid1731)->memory_object_lock_request (0 4096 2 0 8   98);
+@end example
+
+Port numbers for send-once rights are 

two more rpctrace patches

2016-11-02 Thread Brent W. Baccala
Aloha -

I'm attaching two more patches to rpctrace that close bug 48863.

agape
brent
From 10a2a49e370ca55b6cea4cdc4a54ae106b243817 Mon Sep 17 00:00:00 2001
From: Brent Baccala 
Date: Tue, 1 Nov 2016 01:07:52 -1000
Subject: [PATCH 1/2] rpctrace: don't wrap send-once rights sent to the task
 owning the receive right

---
 utils/rpctrace.c | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/utils/rpctrace.c b/utils/rpctrace.c
index 72ca614..6eb9892 100644
--- a/utils/rpctrace.c
+++ b/utils/rpctrace.c
@@ -708,11 +708,22 @@ rewrite_right (mach_port_t *right, mach_msg_type_name_t *type,
   /* There is no way to know if this send-once right is to the same
 	 receive right as any other send-once or send right we have seen.
 	 Fortunately, it doesn't matter, since the recipient of the
-	 send-once right we pass along can't tell either.  We always just
-	 make a new send-once wrapper object, that will trace the one
-	 message it receives, and then die.  */
-  *type = MACH_MSG_TYPE_MAKE_SEND_ONCE;
-  return TRACED_INFO (new_send_once_wrapper (*right, right, source))->name;
+	 send-once right we pass along can't tell either.  We make a new
+	 send-once wrapper object, that will trace the one message it
+	 receives, and then die, unless the source and destination tasks
+	 are the same.  This can only happen for the reply message to a
+	 mach_port RPC, a special case detected in trace_and_forward(),
+	 in which case we leave the send-once right alone, since we're
+	 passing it to the owner of its corresponding receive right. */
+  if (source == dest)
+	{
+	  return NULL;
+	}
+  else
+	{
+	  *type = MACH_MSG_TYPE_MAKE_SEND_ONCE;
+	  return TRACED_INFO (new_send_once_wrapper (*right, right, source))->name;
+	}
 
 case MACH_MSG_TYPE_PORT_RECEIVE:
   /* We have got a receive right, call it A and the send wrapper for
-- 
2.6.4

From 251d19fb55f99a358ad5cf2097be07a50a9a3057 Mon Sep 17 00:00:00 2001
From: Brent Baccala 
Date: Tue, 1 Nov 2016 01:09:21 -1000
Subject: [PATCH 2/2] rpctrace: don't use reply code unless we've found a
 matching request, and remove unused is_req field from request structure

---
 utils/rpctrace.c | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/utils/rpctrace.c b/utils/rpctrace.c
index 6eb9892..2664ffd 100644
--- a/utils/rpctrace.c
+++ b/utils/rpctrace.c
@@ -189,7 +189,6 @@ struct send_once_info
 /* This structure stores the information of the RPC requests. */
 struct req_info
 {
-  boolean_t is_req;
   mach_msg_id_t req_id;
   mach_port_t reply_port;
   task_t from;
@@ -210,7 +209,6 @@ add_request (mach_msg_id_t req_id, mach_port_t reply_port,
   req->from = from;
   req->to = to;
   req->reply_port = reply_port;
-  req->is_req = TRUE;
 
   req->next = req_head;
   req_head = req;
@@ -1322,18 +1320,18 @@ trace_and_forward (mach_msg_header_t *inp, mach_msg_header_t *outp)
 
   if (msgid_display (msgid))
 {
+  struct req_info *req = NULL;
+
   if (inp->msgh_local_port == MACH_PORT_NULL
 	  && info->type == MACH_MSG_TYPE_MOVE_SEND_ONCE
 	  && inp->msgh_size >= sizeof (mig_reply_header_t)
 	  /* The notification message is considered as a request. */
 	  && (inp->msgh_id > 72 || inp->msgh_id < 64)
   && !memcmp(&((mig_reply_header_t *) inp)->RetCodeType,
-		     &RetCodeType, sizeof (RetCodeType)))
+		     &RetCodeType, sizeof (RetCodeType))
+	  && (req = remove_request (inp->msgh_id - 100,
+				    inp->msgh_remote_port)))
 	{
-	  struct req_info *req = remove_request (inp->msgh_id - 100,
-		 inp->msgh_remote_port);
-	  assert (req);
-	  req->is_req = FALSE;
 	  /* This sure looks like an RPC reply message.  */
 	  mig_reply_header_t *rh = (void *) inp;
 	  print_reply_header ((struct send_once_info *) info, rh, req);
@@ -1352,7 +1350,6 @@ trace_and_forward (mach_msg_header_t *inp, mach_msg_header_t *outp)
   else
 	{
 	  struct task_info *task_info;
-	  struct req_info *req = NULL;
 
 	  /* Print something about the message header.  */
 	  print_request_header ((struct sender_info *) info, inp);
-- 
2.6.4



rpctrace man page

2016-11-01 Thread Brent W. Baccala
Hmm... maybe somebody on the list will fix the gsync bug, since I don't
know how...

...what can I do in the meantime...

How about writing a man page?

agape
brent


rpctrace.1
Description: Binary data


kernel panic in gsync_wait

2016-10-31 Thread Brent W. Baccala
Hi -

My new and improved rpctrace is generating kernel panics when run on
ext2fs.  This happens when rpctrace calls gsync_wait, with ext2fs as the
'task' argument.

Could you guys look at gnumach's kern/gsync.c, at line 212?  It looks to me
like that code tacitly assumes that the 'addr' it's accessing is in the
memory space of the task calling the RPC, instead of the task passed in as
the first argument.

Even if I'm right about the nature of this bug, I don't understand gnumach
well enough to know how a task should access another task's memory.

agape
brent


minor bug in glibc

2016-10-30 Thread Brent W. Baccala
Hi -

I'm using my newly enhanced rpctrace to hunt down a few bugs.

Here's a minor one in glibc that shows up like this on "rpctrace /bin/true":

task337(pid1240)->mach_port_deallocate (pn{  0}) = 0xf ((os/kern) invalid
name)

The trick is to get rpctrace to halt the process when it encounters
something like that, then gdb shows you right where the problem is.  I
haven't worked that up into a publishable patch, yet...

agape
brent
--- sysdeps/mach/hurd/dl-sysdep.c~	2016-08-08 12:55:47.0 -1000
+++ sysdeps/mach/hurd/dl-sysdep.c	2016-10-30 17:25:06.564622626 -1000
@@ -472,7 +472,8 @@
   err = __io_map ((mach_port_t) fd, &memobj_rd, &memobj_wr);
   if (err)
 	return __hurd_fail (err), MAP_FAILED;
-  __mach_port_deallocate (__mach_task_self (), memobj_wr);
+  if (memobj_wr != MACH_PORT_NULL)
+	__mach_port_deallocate (__mach_task_self (), memobj_wr);
 }
 
   mapaddr = (vm_address_t) addr;


Re: [PATCH 8/8] rpctrace: use condition variable to keep messages in sequence

2016-10-30 Thread Brent W. Baccala
On Sat, Oct 29, 2016 at 1:48 PM, Brent W. Baccala <cos...@freesoft.org>
wrote:

> Yes, patch7 without patch8 can reorder messages as they're being resent,
> but patch8 uses the mutex introduced in patch7, so the ordering can't be
> reversed.
>
> They could be combined into a single patch.  Do you want me to prepare a
> combined patch and email it in?
>

OK, I'm attaching the combined patch7+patch8.

I've studied it a bit more, and it has a race condition that looks
unavoidable to me.

rpctrace on ext2fs exposes the problem.  ext2fs maps a region of memory
that it manages itself (using libpager), then calls vm_copy() on that
region, using its own task port.  This causes the kernel to make a
memory_object_data_request back to ext2fs, which makes some more requests
(mach_port_deallocate and vm_allocate) on the task port before replying
with a memory_object_data_supply, which allows the original vm_copy to
return.

So, it's calling vm_copy on its task port, and needs to do some more
operations on the task port before the vm_copy completes.  And just like
the problem I described two months ago ("mach_msg blocking on call to
vm_map"), the mach_msg call sending the vm_copy will block even if you're
not waiting for the reply message.

So... in rpctrace, I want to signal the condition variable (allowing the
next message to be processed) after the message has been sent, but since
mach_msg blocks, I can't do that without blocking the entire task port,
which deadlocks the process.

My solution is to signal the condition variable right before the call to
mach_msg, but this creates a race condition where messages can get
reordered.

As you know, I never liked this blocking behavior of mach_msg, but I just
can't see any way around it now.  If you can suggest something, let me
know...

agape
brent
From f12c28e9b16d7a3c628586b739eee3f4e004f753 Mon Sep 17 00:00:00 2001
From: Brent Baccala <cos...@freesoft.org>
Date: Sun, 30 Oct 2016 16:13:34 -1000
Subject: [PATCH] multithread rpctrace to avoid deadlocks in the kernel

---
 utils/rpctrace.c | 41 ++---
 1 file changed, 38 insertions(+), 3 deletions(-)

diff --git a/utils/rpctrace.c b/utils/rpctrace.c
index c01f8d4..72ca614 100644
--- a/utils/rpctrace.c
+++ b/utils/rpctrace.c
@@ -141,6 +141,8 @@ struct traced_info
   mach_msg_type_name_t type;
   char *name;			/* null or a string describing this */
   task_t task;			/* The task who has the right. */
+  mach_port_seqno_t seqno;  /* next RPC to be processed on this port */
+  pthread_cond_t sequencer; /* used to sequence RPCs when they are processed out-of-order */
 };
 
 /* Each traced port has one receiver info and multiple send wrappers.
@@ -250,6 +252,8 @@ struct port_class *other_class;
 struct port_bucket *traced_bucket;
 FILE *ostream;
 
+pthread_mutex_t tracelock = PTHREAD_RECURSIVE_MUTEX_INITIALIZER_NP;
+
 /* These are the calls made from the tracing engine into
the output formatting code.  */
 
@@ -334,9 +338,13 @@ destroy_receiver_info (struct receiver_info *info)
   while (send_wrapper)
 {
   struct sender_info *next = send_wrapper->next;
+#if 0
+  if (refcounts_hard_references (&TRACED_INFO (send_wrapper)->pi.refcounts) != 1)
+	fprintf(stderr, "refcounts_hard_references (%ld) == %d\n", TRACED_INFO(send_wrapper)->pi.port_right, refcounts_hard_references (&TRACED_INFO (send_wrapper)->pi.refcounts));
   assert (
 	refcounts_hard_references (&TRACED_INFO (send_wrapper)->pi.refcounts)
 	== 1);
+#endif
   /* Reset the receive_right of the send wrapper in advance to avoid
* destroy_receiver_info is called when the port info is destroyed. */
   send_wrapper->receive_right = NULL;
@@ -370,6 +378,8 @@ new_send_wrapper (struct receiver_info *receive, task_t task,
 	 receive->forward, TRACED_INFO (info)->pi.port_right, task2pid (task));
   TRACED_INFO (info)->type = MACH_MSG_TYPE_MOVE_SEND;
   TRACED_INFO (info)->task = task;
+  TRACED_INFO (info)->seqno = 0;
+  pthread_cond_init(& TRACED_INFO(info)->sequencer, NULL);
   info->receive_right = receive;
   info->next = receive->next;
   receive->next = info;
@@ -400,6 +410,8 @@ new_send_once_wrapper (mach_port_t right, mach_port_t *wrapper_right, task_t tas
 			   sizeof *info, &info);
   assert_perror (err);
   TRACED_INFO (info)->name = 0;
+  TRACED_INFO (info)->seqno = 0;
+  pthread_cond_init(& TRACED_INFO(info)->sequencer, NULL);
 }
 
   info->forward = right;
@@ -451,6 +463,8 @@ traced_clean (void *pi)
 {
   struct sender_info *info = pi;
 
+  pthread_mutex_lock (&tracelock);
+
   assert (TRACED_INFO (info)->type == MACH_MSG_TYPE_MOVE_SEND);
   free (TRACED_INFO (info)->name);
 
@@ -466,6 +480,8 @@ traced_clean (void *pi)
 
   info->receive_right = NULL;
 }
+
+  pthread_mutex_unlock (&tracelock);
 }
 
 /* Check if the receive right has been seen. */
@@ 

Re: [PATCH 1/8] remove warning messages on rpctrace from 'asprintf'

2016-10-30 Thread Brent W. Baccala
On Sat, Oct 29, 2016 at 1:40 PM, Brent W. Baccala <cos...@freesoft.org>
wrote:

> On Oct 29, 2016 2:53 AM, "Samuel Thibault" <samuel.thiba...@gnu.org>
> wrote:
> >
> > Hello,
> >
> > > #define easprintf(args...)assert(asprintf (args) != -1)
> >
> > That will be removed when building with -DNDEBUG, not a good thing :)
>
> An excellent point.  I'll revise PATCH 1 tomorrow.
>
OK, I'm attached a revised patch1.  It's a bit longer than the original,
but it rebased successfully.

agape
brent
From 8f7f79220691d9ee4d32514bb7ac63aef91b3860 Mon Sep 17 00:00:00 2001
From: Brent Baccala <cos...@freesoft.org>
Date: Thu, 20 Oct 2016 20:46:32 -1000
Subject: [PATCH] remove warning messages on rpctrace from 'asprintf'

---
 utils/rpctrace.c | 50 --
 1 file changed, 32 insertions(+), 18 deletions(-)

diff --git a/utils/rpctrace.c b/utils/rpctrace.c
index 25d9bc6..3582df5 100644
--- a/utils/rpctrace.c
+++ b/utils/rpctrace.c
@@ -59,6 +59,19 @@ static const struct argp_option options[] =
 
 static const char args_doc[] = "COMMAND [ARG...]";
 static const char doc[] = "Trace Mach Remote Procedure Calls.";
+
+void easprintf(char **strp, const char *fmt, ...)
+{
+  va_list argptr;
+  int retval;
+
+  va_start(argptr, fmt);
+  retval = vasprintf(strp, fmt, argptr);
+  va_end(argptr);
+
+  assert_perror(retval == -1);
+}
+
 
 /* This structure stores the information of the traced task. */
 struct task_info
@@ -353,8 +366,8 @@ new_send_wrapper (struct receiver_info *receive, task_t task,
   assert_perror (err);
 
   TRACED_INFO (info)->name = 0;
-  asprintf (&TRACED_INFO (info)->name, "  %lu<--%lu(pid%d)",
-	receive->forward, TRACED_INFO (info)->pi.port_right, task2pid (task));
+  easprintf (&TRACED_INFO (info)->name, "  %lu<--%lu(pid%d)",
+	 receive->forward, TRACED_INFO (info)->pi.port_right, task2pid (task));
   TRACED_INFO (info)->type = MACH_MSG_TYPE_MOVE_SEND;
   info->task = task;
   info->receive_right = receive;
@@ -978,8 +991,8 @@ wrap_all_threads (task_t task)
 	  thread_send_wrapper = new_send_wrapper (thread_receiver_info,
 		  task, &new_thread_port);
 	  free (TRACED_INFO (thread_send_wrapper)->name);
-	  asprintf (&TRACED_INFO (thread_send_wrapper)->name,
-		"thread%lu(pid%d)", threads[i], task2pid (task));
+	  easprintf (&TRACED_INFO (thread_send_wrapper)->name,
+		 "thread%lu(pid%d)", threads[i], task2pid (task));
 
 	  err = mach_port_insert_right (mach_task_self (),
 	new_thread_port, new_thread_port,
@@ -1031,8 +1044,8 @@ wrap_new_thread (mach_msg_header_t *inp, struct req_info *req)
   mach_port_deallocate (mach_task_self (), reply->child_thread);
 
   free (TRACED_INFO (send_wrapper)->name);
-  asprintf (&TRACED_INFO (send_wrapper)->name, "thread%lu(pid%d)",
-	thread_port, task2pid (req->from));
+  easprintf (&TRACED_INFO (send_wrapper)->name, "thread%lu(pid%d)",
+	 thread_port, task2pid (req->from));
   ports_port_deref (send_wrapper);
 }
 
@@ -1078,11 +1091,11 @@ wrap_new_task (mach_msg_header_t *inp, struct req_info *req)
 
   pid = task2pid (task_port);
   free (TRACED_INFO (task_wrapper1)->name);
-  asprintf (&TRACED_INFO (task_wrapper1)->name, "task%lu(pid%d)",
-	task_port, task2pid (req->from));
+  easprintf (&TRACED_INFO (task_wrapper1)->name, "task%lu(pid%d)",
+	 task_port, task2pid (req->from));
   free (TRACED_INFO (task_wrapper2)->name);
-  asprintf (&TRACED_INFO (task_wrapper2)->name, "task%lu(pid%d)",
-	task_port, pid);
+  easprintf (&TRACED_INFO (task_wrapper2)->name, "task%lu(pid%d)",
+	 task_port, pid);
   ports_port_deref (task_wrapper1);
 }
 
@@ -1226,13 +1239,13 @@ trace_and_forward (mach_msg_header_t *inp, mach_msg_header_t *outp)
 	if (TRACED_INFO (info)->name == 0)
 	  {
 		if (msgid == 0)
-		  asprintf (&TRACED_INFO (info)->name, "reply(%u:%u)",
-			(unsigned int) TRACED_INFO (info)->pi.port_right,
-			(unsigned int) inp->msgh_id);
+		  easprintf (&TRACED_INFO (info)->name, "reply(%u:%u)",
+			 (unsigned int) TRACED_INFO (info)->pi.port_right,
+			 (unsigned int) inp->msgh_id);
 		else
-		  asprintf (&TRACED_INFO (info)->name, "reply(%u:%s)",
-			(unsigned int) TRACED_INFO (info)->pi.port_right,
-			msgid->name);
+		  easprintf (&TRACED_INFO (info)->name, "reply(%u:%s)",
+			 (unsigned int) TRACED_INFO (info)->pi.port_right,
+			 msgid->name);
 	  }
 	break;
 
@@ -1307,7 +1320,8 @@ trace_and_forward (mach_msg_header_t *inp, mach_msg_header_t *outp)
 
 	  /* Print something about the message header.  */
 	  print_request_header ((struct sender_info *) info, inp);
-	  /* It's a nofication message. */
+
+	  /* It's a notification message. */
 	  if (inp->msgh_id <

Re: [PATCH 8/8] rpctrace: use condition variable to keep messages in sequence

2016-10-29 Thread Brent W. Baccala
On Oct 29, 2016 2:55 AM, "Samuel Thibault"  wrote:
>
> Hello,
>
> Better apply this patch before patch7, shouldn't we?  Otherwise there's
> a little git interval during which rpctrace is unreliable.
>
> Samuel

I had thought of the entire patch set being applied monolithically.

Yes, patch7 without patch8 can reorder messages as they're being resent,
but patch8 uses the mutex introduced in patch7, so the ordering can't be
reversed.

They could be combined into a single patch.  Do you want me to prepare a
combined patch and email it in?

agape
brent


Re: [PATCH 1/8] remove warning messages on rpctrace from 'asprintf'

2016-10-29 Thread Brent W. Baccala
On Oct 29, 2016 2:53 AM, "Samuel Thibault"  wrote:
>
> Hello,
>
> > #define easprintf(args...)assert(asprintf (args) != -1)
>
> That will be removed when building with -DNDEBUG, not a good thing :)

An excellent point.  I'll revise PATCH 1 tomorrow.

> Also, I don't see copyright assignment in the FSF records, did you start
> making one?

No, I haven't.  What should I use, request-assign.program, sent to
fsf-records?

   agape
   brent


rpctrace

2016-10-28 Thread Brent W. Baccala
Hi -

I've made some decent progress with rpctrace this week.  I think I pretty
much understand this program now!

In particular, I've gotten it working on ext2fs.  I had to fix the problem
I described earlier with send-once rights, and then deal with a deadlock
situation caused when ext2fs called vm_copy() on a region of memory that it
itself managed.  This caused the kernel to do some memory manager RPCs
before the vm_copy() completed - the same kind of problem I described two
months ago with netmsg.  My solution is to make rpctrace multi-threaded.
It now has a global lock that protects pretty much the entire program, but
gets unlocked right before resending a message, which prevents the
deadlock.  I've also added a condition variable to ensure that messages get
processed in-order, even if two different threads attempt to process two
messages in the wrong order.

So, ext2fs can now be traced.  I'm seeing some bizarre behavior with it,
running on a little ramdisk.  For example, when being traced, for some
reason it doesn't use the memory map interface, but instead does everything
with io_read/io_write, then doesn't perform the last io_write when being
detached and leaves the ramdisk in a dirty state.  I've also seen some
kernel panics in gsync_wait().

All in all, I think it's enough progress to warrant a patch set, even
though I've still got more stuff to investigate here.

agape
brent


Mach "pipe" between parent and child

2016-10-25 Thread Brent W. Baccala
hi -

What is the best way to fork() a child and have a Mach receive right on the
parent connected to a send right on the child?  Or vice versa?

agape
brent


Re: rpctrace design

2016-10-25 Thread Brent W. Baccala
On Mon, Oct 24, 2016 at 1:12 AM, Justus Winter <jus...@gnupg.org> wrote:

>
> "Brent W. Baccala" <cos...@freesoft.org> writes:
> > I read on the website's hurd/debugging/rpctrace page that somebody
> (zhenga)
> > had come with a new version of rpctrace.  Do we have a copy of it around
> > somewhere?
>
> We merged his rpctrace work.  His contributions made multi-task tracing
> possible.
>

OK, thanks.

I've been thinking more about my problem.  I don't think it's as bad as I
was thinking.  The question is what port names do we use when sending a
message to a send-once right?

Well, how can a traced task produce a send-once right?  There's only three
ways that I can think of.

1. It was created from a receive right held by the task.  In this case, we
should use port names corresponding to the task.

2. It was passed in from another task.  In this case, the send-once right
was wrapped by rpctrace, and rpctrace itself will get the message sent to
the right, so it hardly matters what names we use.

3. It was created from a receive right that was then passed to another
task.  When transferring receive rights, rpctrace usually holds the
original receive right and relays a different one, so again, rpctrace gets
the message, and it doesn't matter which names we use.  It looks like
previously unknown receive rights get passed through rpctrace, but it
should be possible to modify the code to wrap them.

So (with that change), if, when we get a send-once right from a task, we
send it port names associated with that task, everything should work right.

I think.
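For reference, case 1 is just the ordinary reply-port pattern.
Schematically (an illustrative fragment using the standard Mach calls,
not code from rpctrace):

```c
mach_msg_header_t msg;
mach_port_t reply;

/* Create a receive right in our own port space... */
mach_port_allocate (mach_task_self (), MACH_PORT_RIGHT_RECEIVE, &reply);

/* ...and name it as the reply port with MAKE_SEND_ONCE; the kernel
   fabricates a send-once right to 'reply' for the recipient.  */
msg.msgh_bits = MACH_MSGH_BITS (MACH_MSG_TYPE_COPY_SEND,
                                MACH_MSG_TYPE_MAKE_SEND_ONCE);
msg.msgh_remote_port = server;   /* some send right we hold */
msg.msgh_local_port = reply;
mach_msg (&msg, MACH_SEND_MSG, sizeof msg, 0, MACH_PORT_NULL,
          MACH_MSG_TIMEOUT_NONE, MACH_PORT_NULL);
```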

agape
brent


rpctrace design

2016-10-24 Thread Brent W. Baccala
Aloha -

I've been trying to debug a problem in rpctrace which causes rpctrace to
crash when I use it to wrap /hurd/ext2fs.

The bug is triggered by a memory_object_lock_request /
memory_object_lock_completed sequence.  Specifically, ext2fs sends a lock
request to the kernel with a send-once reply_to port.  Once the lock is
complete, the kernel sends a memory_object_lock_completed message to the
send-once right, including a send right to the memory control port (for
identification purposes) and that's where the trouble starts in rpctrace.

rpctrace is designed to trace multiple tasks simultaneously, and to
identify which task is doing an RPC, it allocates separate ports for each
task (even if they wrap the same port).  So, if you pass out a send right
to three tasks, three different receive rights will be allocated in
rpctrace.  Which receive right a message comes in on indicates which task
is sending the message.

For this to work right, we need to identify which task is on the receiving
end of a send right.  Not which task the send right came from, mind you,
but the ultimate destination.  In the previous example, when transferring a
copy of the original send right, we need to pick which of the three new
send rights should be transferred, based on the ultimate destination of the
message, to ensure that the right task gets the right version of the port.
There's also a fourth case - we're transferring to a task that we're not
tracing.

All of this complexity is already built into rpctrace.  It plays games like
looking at a task's port space, extracting a send right from each remote
receive right, and checking to see if it matches a local send right, in
order to determine that the local send right's final destination is the
remote receive right on the task in question.  See discover_receive_right().

In my case, problems arise because a send-once right is used to return the
lock completed message, and there's no way to know which task the send-once
right ultimately goes to.  There's a bad pointer deference involved, but
even once that's fixed, how do you know which send right to transfer?  It's
important to get it right, since the send right is used by the memory
manager to identify different clients.

I've patched it up by assuming that the task sending the send-once right is
the ultimate destination, which works in my case, but obviously it isn't
right in general.

The more I think about it, the more I'm thinking that it's a design flaw in
rpctrace.  We need to identify ultimate destinations, but can't do that
reliably.

I read on the website's hurd/debugging/rpctrace page that somebody (zhenga)
had come up with a new version of rpctrace.  Do we have a copy of it around
somewhere?

I could submit the patches that I've got, but they're not right, and I
don't see any way to make them right.  I'm thinking now that the way to fix
it is to redesign rpctrace so that each wrapped task gets a separate
rpctrace task wrapping it.  That way, we should be able to determine which
task makes which RPC without the problems I've described above.

I'm also thinking that I don't want to undertake rewriting rpctrace right
now.  I was just trying to fix it so that I could understand what ext2fs
was doing.

Comments?

agape
brent


commit 341f43d: boot: Ignore EINTR.

2016-10-23 Thread Brent W. Baccala
Dear Justus,

What are the symptoms of this bug?

I've been seeing sporadic behavior where my boot hangs right after "exec"
prints in the "Hurd server bootstrap" line.  Is this patch related to that?

agape
brent


Re: libpager multi-client support

2016-10-18 Thread Brent W. Baccala
On Tue, Oct 18, 2016 at 2:37 AM, Richard Braun  wrote:

>
> From my fifteen years or so experience working in various projects with
> C++ and other languages, my opinion is that C++ is never the best
> choice. It makes you trade lines of code for increased complexity in
> understanding and following the code. You should either go with C for
> excellent control, or another higher level language for e.g. user
> interfaces and tasks that don't require the highest performance.
>

My opinion is that C++ features have to be used very, very judiciously to
avoid the problems you describe.  It's very easy to fall into a "C++
mindset" and try to write everything in "C++ style".  My solution has been
to look at each problem and ask myself how to write the code in the
simplest manner.


> I really don't think the problem you describe would be so hard to solve
> in C.
>

Well, let's see.  To handle arbitrary numbers of clients, we have to
dynamically allocate all those queues and lists.  We have to copy them when
adding new clients or removing old ones (more dynamic allocation).  To
constrain memory utilization, we have to detect when they're no longer used
and deallocate them.

Can it be done in C?  Sure.  But following my principle of asking how to
write the code in the simplest manner, this one is obvious: STL containers
win.

agape
brent


Re: libpager multi-client support

2016-10-18 Thread Brent W. Baccala
Aloha -

I've been thinking more about the data structures involved with the new
libpager code, and they're complex enough that I'd like to write them in
C++11.  The pagemap has to contain a list of all clients with copies of a
page, and a queue of clients waiting to access the page.  To keep the
memory footprint under control, the pagemap itself should be an array of
pointers to structures that are reused if multiple pages have the same
"clientèle", so they also need a usage counter, plus we need to make copies
of them (say, when a new client is added), which requires making copies of
their constituent lists and queues.  STL containers and shared pointers
seem like a good choice.

C linkage allows backward compatible use of the library.

The biggest drawback that I see is that memory usage might get out of
hand.  A std::shared_ptr, for example, is four times bigger than a regular
pointer.  After reading http://gamedev.stackexchange.com/questions/268 I
think it might not be so bad, but I'm not thrilled about it, either.  We're
basically paying for code simplicity with memory.

Also, programs like ext2fs would be linked against the standard C++
library, but I don't know if that's really such a problem.

Obviously, introducing C++ into libpager would probably open the door to
large scale use of C++ in Hurd.  This might not be such a bad thing.  I
suspect that serialization of RPCs could be done with templates,
eliminating a lot of the need for MIG.

What do you guys think?

agape
brent


Re: workflow with Debian patches and Git repositories (was: libpager multi-client support)

2016-10-17 Thread Brent W. Baccala
On Sat, Oct 15, 2016 at 11:17 PM, Kalle Olavi Niemitalo  wrote:

>
> I also have a "hurd-debian" working tree.  I extracted that from
> the Debian source package, so that I got a .pc directory with the
> correct state information.  I then added a .git directory cloned
> from "git://anonscm.debian.org/pkg-hurd/hurd.git", so that I can
> view the history and make local commits.
>
>
This is the step that most raises my hackles.  You tack a .git directory
onto the unpacked Debian source package?

It looks to me like that Debian git tree contains an unpacked snapshot of
the savannah git tree.  Various commits there are labeled "new upstream
snapshot"; I suppose that's how changes to savannah get imported?  And
those other .tar.gz files that make up the Debian source package are
unpacked into it as well?  Are they snapshots taken from the incubator git
tree?

Is the Debian git tree composed exclusively of pieces pulled from various
git trees on savannah?

How does the Debian source package actually get built?  Is there a script?

Could we reorganize the Debian git tree to import the savannah git trees as
submodules?  Or has this approach been deliberately rejected?

And then, of course, the Debian patches are checked into git as files, not
(git) patches.  I was just reading about "git-buildpackage", which manages
Debian patches by converting them back and forth to git patches on a
dedicated branch.  You keep this branch local and rebase it to apply the
patches to a new location.  Sounds a little crazy, but interesting.

Just trying to get my mind around all this.

agape
brent


libpager multi-client support

2016-10-13 Thread Brent W. Baccala
aloha -

I've started looking at adding multi-client support to libpager.

My first question is for advice on managing my workflow.  I'm using the
Debian hurd package, which adds patches on top of a snapshot taken from
savannah's git tree.  I think I want to work on the Debian-ized code, since
that's what's installed on my system, but that leaves me without git.  It's
just a snapshot, not an actual git repository.  Any advice?

Then, should I wait until I've got it working and send the changes to this
list as a comprehensive set of patches?  That seems like what others have
done in the past.

I've reviewed the existing code, and I have a problem with the function
pager_offer_page().

First, the API is problematic, since pager_offer_page() is a call from the
memory server (i.e, ext2fs) into the libpager library, instructing the
library to offer a page to the client (i.e, the kernel) that hasn't been
solicited by the client.  The problem is that the function parameters don't
indicate which client to offer the page to.

Second, I can't find anywhere in the hurd source tree where this function
is actually used.

Third, why would we want this feature?  Why would a memory server ever want
to send an unsolicited page to a client?

So, I propose deprecating pager_offer_page() by simply removing it from
the library.

Any objections?

agape
brent


Re: behavior of NO SENDERS notifications when receive rights move

2016-09-30 Thread Brent W. Baccala
On Fri, Sep 30, 2016 at 9:17 AM, Kalle Olavi Niemitalo  wrote:

>
> A future version of rpctrace might want to move receive rights
> if it were able to attach to a preexisting task.
>

That's an important and interesting application that I hadn't thought of.

Now I'm wondering - how would DEAD NAME notifications be handled?  rpctrace
would want to transfer send rights with the DN notifications attached (so
it could wrap them both), but my experience, and my understanding of the
Mach documentation, is that moving a send right with a DN request triggers
a PORT DELETED notification, which is not what we would want.

Right now, I can't think of any way to circumvent the problem with the
current Mach API.  We could copy the send rights instead of moving them,
which would avoid triggering the notification, but then how would we
interpose rpctrace?  We could swap the ports all around with the target
task halted, and I'm not sure what would happen then.

We'd probably have to modify the kernel to allow rpctrace to transparently
attach like that.

agape
brent


Re: RFC: Revised authentication protocol

2016-09-27 Thread Brent W. Baccala
On Mon, Sep 19, 2016 at 9:52 AM, Olaf Buddenhagen 
wrote:

> I'm a bit confused here: my understanding was that you essentially
> wanted to implement a "single system instance" cluster. I would have
> thought that would imply only a single instance of most servers --
> including auth -- rather than separate ones for each node?...
>

Well, I'm starting with a network transport for Mach messages that should
be usable for remote filesystem and remote process execution.  I've thought
a little bit about how to reconfigure the boot process to use centralized
auth and proc servers, but only a little bit.  Right now, I'm trying to
achieve one of the goals on the translator wish list: "network file system
by just forwarding RPCs".

Also, as Richard noted, it might be best to use distributed processes.  A
cluster filesystem, for example, might be implemented a bit like unionfs,
with the local filesystems on each node underlying a global filesystem.
The global filesystem could be implemented by a group of processes, one per
node.  A process's file objects would generally be handled by the global
filesystem's local process, which would use the underlying local filesystem
for replicated files, cached files, locally created files, etc., and only
hand off to remote nodes for files unavailable on the local filesystem.

I see your point, though.  It would be ironic to modify the auth protocol
only to end up with a cluster using a single auth server.

agape
brent


Re: [PATCH gnumach] RFC: Create a malleable syscall interface.

2016-09-26 Thread Brent W. Baccala
On Thu, Sep 22, 2016 at 12:21 PM, Justus Winter  wrote:

>
> % ~/build/machometer/machometer
> N: 33554432 (1<<25), qlimit: 5
> mach_print:   5s  75us 171.363354ns   5835553.391 (1/s)
>    nullmsg:   6s  33us 188.648701ns   5300858.136 (1/s)
>   producer:  27s  65us 824.034214ns   1213541.844 (1/s)
>   consumer:  27s  65us 824.034214ns   1213541.844 (1/s)
>
>
Where can I find "machometer"?

agape
brent


Re: firmlink deleting files on boot / interpretation of find -xdev switch

2016-09-07 Thread Brent W. Baccala
On Wed, Sep 7, 2016 at 11:49 AM, Richard Braun  wrote:

>
> We really, really don't want to make standard Unix tools aware of
> Hurd-specific stuff, because that allows us to completely reuse the
> work of others. With a trend towards systemd, it's even more likely
> that our efforts will be put into providing some of the stuff specific
> to _others_ system instead.
>

> Personally, I would consider any solution that isn't entirely contained
> in the Hurd (kernel, servers, glibc and related) to be a no-go.
>

OK, I understand.  I personally lean in the direction of adding something
like an "-xtrans" switch to find, telling it not to enter translators,
because that's a lot clearer than usurping the interpretation of existing
switches from systems without translators.  However, I also appreciate the
wisdom in what you say, in which case I revert to my earlier suggestion of
modifying the FTS code in glibc to interpret FTS_XDEV to mean "don't enter
translators".


> > Makes sense.  The parent is where you've got all that information.  Is
> > there no way to retrieve it?
>
> There might, I haven't looked thoroughly, and it could be implemented
> if needed.
>

OK.  I just though I might be overlooking something obvious.


> We'd also have to make sure that remove()/unlink()/rmdir() don't cross
> the file system into the untrusted translator.


How do we do that without modifying programs?  Probably the FTS code;
that's what both rm and find seem to use to traverse directory structures.

Also, I agree with Kalle that not entering translators should be the
default for "rm".  If so, and we modify FTS without touching the programs,
then it also becomes the default for "chmod", "chown", "chcon", "grep", and
"du".  In particular, I don't think we want that for "grep" (not so sure
about the others).

If I understand you, Richard, you'd like to see grep's default be to enter
trusted translators, but not untrusted ones.  Am I correct?

> I'm not sure I understand when you say "More limited in that our trust
> set is finite". Actually, we'd like our trust set _not_ to be finite,
> since we want the system to be extensible, by both the admin and any
> unprivileged user. Again, too rigid.


I meant that we have a standard set of trusted translators in /hurd, and
that set is finite, just like the set of programs in /bin is finite.  We
certainly don't have a means for verifying any old program in a user's bin
directory to see if it's safe to run as root.

Would you like to see a scheme where only a limited set of trusted
translators were accessible to a process, and the user had the ability to
extend the trust set of his own processes?  Something like adding
directories to your own PATH, but this would apply to translators running
under different UIDs, and not just programs that you started yourself?

agape
brent


Re: firmlink deleting files on boot / interpretation of find -xdev switch

2016-09-06 Thread Brent W. Baccala
On Tue, Sep 6, 2016 at 2:05 AM, Richard Braun <rbr...@sceen.net> wrote:

> On Mon, Sep 05, 2016 at 09:55:44PM -1000, Brent W. Baccala wrote:
> > I haven't been able to find any other places on my system where find uses
> > -xdev; just bootclean.sh, but my search has not been exhaustive.
> >
> > Obviously there's been a long history behind this problem, and I'm new on
> > the scene.  Does this change make sense?
>
> Any solution that isn't a global system solution is doomed to fail,
> since this problem may affect any RPC, including those that don't
> exist yet.
>

Let's try to hash this out, please.  I've read the entire "critique" and at
least some of the prior discussion on this list.  My response to it has
been, "we can work this out".

I think you have an excellent point on the need for trusted translators.  I
would suggest, however, that UNIX systems already have a basic solution to
this problem - the /bin, /sbin, and /usr/bin directories.  It's trivial to
write a program that looks like "rm" (or even "ls") but instead wreaks all
kinds of havoc.  The hard part is getting root to run it - that's why we
don't search dot in our PATHs.  It's unsophisticated, but it's simple and
usually works.

Clearly a user could fashion a translator that presented a filesystem in a
very deceptive manner to an unsuspecting superuser.  Maybe we should trust
the translators in /hurd, much like we trust the binaries in /bin, and
fashion our shell prompts and directory listings so they clearly warn us
when we're dealing with a translator that isn't in that trust set.

Yet, I don't think that this is the case here.

Consider the case with symlinks.  If "rm" traversed them, it could be a
big problem.  Let's see... what's the option for that?... oh, there is
none!  Isn't that interesting?  "rm" has no option to follow symlinks!

"find" does, however.  "find -L -delete" is a dangerous combination, and
when run as root will trigger the exact same behavior that we're seeing
with firmlinks in /tmp.

So, part of the solution is just making sure that the system scripts and
binaries do what we want.  That "find" command used to clean /tmp should
not recurse into translators.  It should delete the underlying nodes
instead.

> > On a related note, how do you find the owner of a passive translator?
> > I expected either showtrans or ls to provide that information (perhaps
> > with a verbose switch), but it had eluded me...
>
> I'm not sure you can from outside the parent.
>

Makes sense.  The parent is where you've got all that information.  Is
there no way to retrieve it?


> Besides, note that Justus and I are currently pushing towards the use
> of "translator records" instead of "passive translators" to more
> accurately reflect what they truely are.
>
> The solution, whatever it is, should focus only on determining whether
> a server can be trusted or not. This should affect everything (servers,
> (active) translators and translator records).
>

Yes, we need to clearly determine when a server is trusted.  Yet I think it
has to be both more comprehensive and more limited than that.  More
comprehensive in that we need to determine whether programs can be trusted
as well.  More limited in that our trust set is finite.

In this case, our trust set is "ext2fs", "find", and "bootclean.sh" (plus
shared libraries, the kernel, the shell, etc).  We trust ext2fs to notify
us when it's handing off to a different translator.  We trust "find" to
respect those notifications and not to cross those boundaries.  We trust
"bootclean.sh" to clean /tmp without touching the rest of the filesystem.

This seems doable!  The questions I see are, do we introduce a new switch
to find (-xtrans), and have to maintain a Hurd-specific version of
bootclean.sh, or do we overload a legacy switch (-xdev) and tolerate a more
obscure and confusing option name?  Do we introduce a "-x" switch to rm, to
make "rm -rfx" avoid translators, or do we make this the default behavior,
since "rm -rf" is so ingrained into people's minds that we need to respect
its semantics?

One step at a time.  Can we answer these questions, and fix our boot
sequence to properly clean /tmp?

agape
brent


firmlink deleting files on boot / interpretation of find -xdev switch

2016-09-06 Thread Brent W. Baccala
On Thu, Sep 1, 2016 at 12:38 PM, Richard Braun  wrote:

> This was famously shown with the example of the
> firmlink translator used in /tmp, which would cause the removal of
> any file targeted by the firmlink on /tmp cleanup during system
> startup.
>

I see that.  It seems to still have that problem.  I created a directory
/root/baitdir, and put in it a file named 'bait'.  As a non-privileged
user, I created a firmlink in /tmp to /root/baitdir and rebooted.  Voila!
'bait' vanished.

I took the time to read some of this mailing list's archive on the
subject.  The consensus seems to be that you can't trust unprivileged
translators.  So "find", which is used to clean /tmp, should not, in this
case, cross translator boundaries.

I was thinking at first that we should have something like the "-xdev"
switch; "-xtrans", maybe?

Yet since filesystem mounts are themselves done with translators, what does
"-xdev" mean on Hurd?  I've poked around a bit in the source, and played
with 'stat'.  It seems like several translators take an arbitrary number
and present it as their device number.  Seems like legacy support, and it's
easy for a translator to defeat -xdev by announcing the same device as its
parent.

So, now I'm thinking that find's "-xdev" option shouldn't cross translator
boundaries, and since find uses FTS, and the find call in
/lib/init/bootclean.sh already specifies -xdev, that would require only a
change to glibc.  This would affect any program that uses the FTS library
calls.

Since "rm" also uses FTS, this change would affect rm.  Its
--one-file-system option would have the effect of avoiding recursion into
translators.  It doesn't sound like a bad thing.  In fact, it sounds to me
like that switch might become a lot more useful.  A few slight changes to
rm itself, and we could use "rm -rfx" as a common verb meaning "delete
everything and don't go into translators".

"chmod", "chown", "chcon", "grep", and "mv" also use FTS, but don't provide
options that map through into FTS_XDEV.  "du" uses FTS and does provide
such an option (-x / --one-file-system).  These are the only programs that
I've been able to find on my system that use FTS.

I haven't been able to find any other places on my system where find uses
-xdev; just bootclean.sh, but my search has not been exhaustive.

Obviously there's been a long history behind this problem, and I'm new on
the scene.  Does this change make sense?

On a related note, how do you find the owner of a passive translator?  I
expected either showtrans or ls to provide that information (perhaps with a
verbose switch), but it had eluded me...

agape
brent


RFC: Revised authentication protocol

2016-09-05 Thread Brent W. Baccala
Aloha -

For those of you who have followed my netmsg threads, you may remember that
there were two major issues when I first got it running.  One was
libpager's lack of multi-client support, which has been discussed at
length, and the other was authentication, which we haven't discussed at all.

Here's my proposal for dealing with the authentication issue.

There should be an extra send right passed from the auth server, to the
client, that the client then passes along to the server in its
authentication request, and that the server then passes on to the auth
server in its auth_server_authenticate message.  This is distinct from the
auth_object.  Unlike the auth_object, this right conveys no credentials and
is used primarily to identify the auth server.  Let's call it the
auth_identifier.

In the present scheme, you can see that the auth_identifier gets passed
around in a big circle.  When the auth server gets the
auth_server_authenticate message from the server, it sees that it's gotten
its own auth_identifier back, and things proceed as they do now.

The more interesting case is when the auth_identifier differs.  That
happens when we're dealing with two different auth servers on different
hosts.

First, the auth_identifier allows the auth server to detect this case.

Next, it provides a means for the two auth servers to complete the
rendezvous.  The server's auth server can pass the rendezvous right to the
client's auth server using the auth_identifier.

There's no way for the server's auth server to fool the client's auth
server into thinking that this is a normal server authentication request,
because the only communication between the two is via the auth_identifier.
The client's auth server can clearly distinguish messages coming in over
this port from auth_server_authenticate messages, which would come directly
from a server over a different port.

Finally, the auth servers can use the auth_identifier to speak any kind of
inter-auth-server protocol that they wish.  At the moment, the exact
details of that protocol are beyond the scope of this proposal.  I
envision, perhaps, a public key exchange.  Something that would let
mutually cooperating auth servers decide how to map UIDs/GIDs from one auth
domain to another.  I haven't thought through the details, but I think this
proposal is flexible enough to handle just about anything we'd want to do.

Comments?

agape
brent


Re: mach_msg blocking on call to vm_map

2016-09-01 Thread Brent W. Baccala
On Thu, Sep 1, 2016 at 12:38 PM, Richard Braun  wrote:

> > > Most modern microkernels use synchronous IPC
> > > that just block, and many operations on top of them do block. The
> > > overall system on the other hand must be careful to avoid such
> > > deadlocks.
> >
> > OK, I read the Mach documentation for mach_msg() and concluded that it
> > was like a POSIX read(), that I could operate it in a mode where the
> > kernel absolutely would not block my process, and would return
> > EWOULDBLOCK instead.  That's basically a kernel guarantee, at least as
> > much as it is.
> > (Notice that it doesn't guarantee how long the system call will take - 1
> > ms?  1 s?  1 week? - because it's not a real time system, which is why I
> > say "as much as it is")
>
> Yes, you can think of mach_msg as such a system call. Note that if
> the timeout is 0, it will return immediately instead of blocking,
> which is what a real-time system would do too. Real time systems
> aren't about that at all.
>
> > Are you now saying that's not how it works on Mach/Hurd?  If so,
> > please let me know, because I've been under a big misunderstanding
> > that I need to get cleared up!
>
> I think your mistake here is using MACH_SEND_TIMEOUT instead of
> MACH_RCV_TIMEOUT. Your message certainly was sent, so there is no
> reason to get a timeout error for that.
>

Here's the call:

 mach_msg(msg,                                /* message buffer */
          MACH_SEND_MSG | MACH_SEND_TIMEOUT,  /* send only, with timeout */
          msg->msgh_size,                     /* send size */
          0,                                  /* receive limit (no receive) */
          msg->msgh_remote_port,              /* receive name (unused here) */
          0,                                  /* timeout of zero: don't block */
          MACH_PORT_NULL);                    /* no notify port */

Why should I specify MACH_RCV_TIMEOUT?  It's not a receive call.

The way my code is structured, one thread handles the traffic from the
network to IPC, and a separate thread handles the traffic from IPC to the
network.  This call is in the network-to-IPC thread.  This thread never
receives anything except network traffic, which it blocks for.

I want a non-blocking send.  This one blocks if the message is vm_map, the
memory object passed in is goofed, and the message is targeted at a task
port on the local kernel (and it doesn't have to be the task that calls
mach_msg).


> Now that you know you should be using MACH_RCV_TIMEOUT, you should see
> that no server can block you indefinitely.


Just so we're on the same page here, should that call above block or not?

I just tried it again with MACH_RCV_TIMEOUT; it does the same thing.

agape
brent


Re: mach_msg blocking on call to vm_map

2016-09-01 Thread Brent W. Baccala
On Thu, Sep 1, 2016 at 10:28 AM, Richard Braun  wrote:

>
> I completely disagree.


Thank you, Richard.  Really!  Thank you for disagreeing.  Now we can have a
good discussion about this!


> Most modern microkernels use synchronous IPC
> that just block, and many operations on top of them do block. The
> overall system on the other hand must be careful to avoid such
> deadlocks.
>

OK, I read the Mach documentation for mach_msg() and concluded that it was
like a POSIX read(), that I could operate it in a mode where the kernel
absolutely would not block my process, and would return EWOULDBLOCK
instead.  That's basically a kernel guarantee, at least as much as it is.
(Notice that it doesn't guarantee how long the system call will take - 1
ms?  1 s?  1 week? - because it's not a real time system, which is why I
say "as much as it is")

Are you now saying that's not how it works on Mach/Hurd?  If so, please let
me know, because I've been under a big misunderstanding that I need to get
cleared up!

Can a bunch of screwy translators legitimately cause mach_msg() to block
for some user space thing that might never happen, even if I've supplied
MACH_SEND_TIMEOUT?

Shouldn't it just return with no reply message instead?


> I don't see anything wrong with vm_map misbehaving if the underlying
> implementation is wrong, just fix that implementation, e.g. by
> actually taking the send timeout into account here, or making
> libpager handle multiple clients like you want.
>

Yes, but libpager is in user space.  Isn't one of the great selling points
for Hurd is that we put so much stuff into user space, and the kernel
offers us guarantees (read: "guarantees") that we're protected from
misbehaving stuff in user space?

>
> Queuing the operation would only add tremendous complexity to an
> already extremely complex IPC mechanism where messages are
> allowed to be partially sent... Besides, the Unix semantics
> often require blocking, since the original system calls are,
> well, completely synchronous with respect to user thread
> execution (it's actually the very same thread running kernel
> code). So you'd only add another layer of synchronization,
> and it would block there.
>

> My personal opinion on the matter is that you should only invoke
> remote objects if you trust them.


How pervasive is this in the design?  Is vm_map only one of many RPCs that
can block mach_msg() if some critical system translator is on the blink?


> The original Hurd design,
> however, was explicitly meant to allow clients and servers to
> be "mutually untrusting". I'm not exactly sure what this means
> in practice but it seems that clients can detach themselves from
> servers at any time. So making the timeout work, and allowing the
> transaction to be interrupted (usually with a - hardcoded ? check
> how glibc handles Ctrl-C/SIGINT during an RPC - 3 seconds grace
> delay before brutal interruption) may be the only things required
> to make the behaviour comply with "Hurdish expectations".
>
Thank you for that clarification.  I've figured out that Ctrl-C is handled
by a message.  Does glibc spawn a separate thread to handle those
messages?  Is that why all of the processes on the system have at least two
threads?  That 3 second timeout - what is it, exactly?  I'll have to look
at the code, but this is something I've only partially puzzled out.

agape
brent


mach_msg blocking on call to vm_map

2016-09-01 Thread Brent W. Baccala
Aloha -

I've run into a real kernel-level problem with 'netmsg'.

It's related to the libpager issue.  The problem arises when a process gets
an unworkable memory object and tries to vm_map it.  This causes the
mach_msg() that sent the vm_map to block indefinitely, even though I've
specified MACH_SEND_TIMEOUT with a zero timeout.

More specifically, the process in question is the exec server.  It gets a
memory object from the file server to read a file, then uses it to map the
file into a remote task.  This causes a vm_map to go across the network
connection.  The kernel, upon receiving the vm_map, sends a
memory_object_init message, and then blocks waiting for the reply.

The block occurs in vm_object_copy_strategically(), which is labeled in its
comments "[t]his operation may block".  Almost the first thing it does it
to wait for the memory object to become ready.

In our case, libpager already has a different client, so the memory object
never becomes ready.

The big problem, as I see it, is that mach_msg() is blocking, and that
hangs my entire thread.  It seems to me that these low-level RPC operations
like vm_map can't block, otherwise it would defeat the purpose of
MACH_SEND_TIMEOUT.  So vm_map() should record the mapping and then return,
putting the copy operation on some kind of queue.  I guess.

Any thought on how to resolve this?

agape
brent


problems with a subhurd

2016-08-31 Thread Brent W. Baccala
Aloha -

While testing the exec server, I setup a very minimalist subhurd using just
the most essential files, as opposed to copying the entire filesystem, and
uncovered a number of bugs.

I've refined the process into a shell script (attached) which creates the
subhurd on a ramdisk and then boots it.

At least three bugs become apparent:

1. /hurd/startup doesn't fallback on /bin/sh if it can't exec
/etc/hurd/runsystem.  This is easy to fix - just a missing increment.
Patch attached.

2. /hurd/startup naively assumes that SIGCHLD and waitpid() both work on
init (PID 1), but they don't.

I've been able to patch this up by introducing special cases to check for
HURD_PID_INIT in proc/wait.c's alert_parent (if PID is HURD_PID_INIT then
ignore the p_parent field and treat startup_proc as the parent) and
S_proc_wait (if we're called from procserver, make a special attempt to
reap(init_proc)), but I hesitate to submit this as a patch.  I'm not sure
how we want to do this.  Introduce special cases for init everywhere we've
got a problem with it?  Also, after fixing bug #1, this screws up startup's
attempt to start a new shell if the old one dies.  proc doesn't like having
a second init process started after the first one has died and been
reaped.  Maybe startup shouldn't try to start a second init, even if the
first one dies.  And startup still should have some way to detect if init
dies.

Our current setup is that PID 5 (ext2fs) runs first, then starts PID 2
(startup), which starts PID 1 (init).  Weird.  The cleanest solution, of
course, would be to have proc actually respect these parenting
relationships, then SIGCHLD and waitpid() would work normally.

3. Booting the subhurd, then running "halt -f" from its shell crashes the
parent Hurd.  Here's what the subhurd displays:

# halt -f
startup: notifying ext2fs.static pseudo-root of halt...done
startup: Killing pid 1
startup: Killing pid 3

...and here's what I see on the parent's console:

panic: thread_invoke: thread 9fcbfc80 has unexpected state 86
Debugger invoked: panic
Kernel Breakpoint trap, eip 0x810200f4
Stopped at  Debugger+0x13:   int $3
Debugger(810e015e,7,0,81a2fc90,9fcc6960)+0x13
panic(810e4740,810d9666,9fcbfc80,86,9fcbfc80)+0x79
state_panic(81051d8f,9bbb48dc,0,9fcbfc80,81029300)+0x17
thread_invoke(9fcbfc80,81029300,9b58f698,810292b3)+0x258
thread_run(81029300,9b58f698,0,81029375)+0x49
idle_thread_continue(9bb92568,81028a70,9c092fe4,0,9bd15488)+0x125
db>

Thread 0x9fcbfc80 is a kernel thread, and state 86 is TH_RUN | TH_SUSP |
TH_IDLE.  Not sure how it gets there.

agape
brent


subhurd
Description: Binary data


patch
Description: Binary data


Re: Denial of service attack via libpager

2016-08-29 Thread Brent W. Baccala
On Sun, Aug 28, 2016 at 11:15 PM, Richard Braun <rbr...@sceen.net> wrote:

> On Sun, Aug 28, 2016 at 05:12:35PM -1000, Brent W. Baccala wrote:
>
> > The obvious additional client would be a remote kernel, but as the
> > exploit program that I posted shows, it could just as easily be an
> > unprivileged process.  You don't need much permission to get a memory
> > object, just read access on the file.
>
> OK, this comes from the fact that io_map directly provides memory
> objects indeed... Do we actually want to pass them around ? How
> come calls like memory_object_init (specifically meant to be used
> between the kernel and the pager) can be made from any client ?
>

Good question!

How could we authenticate the kernel to avoid unprivileged access?


> The changes involved here are heavy, which is one reason we'd want
> to avoid them. It also makes the system slightly slower by adding
> a new layer of abstraction. So we may just want to support multiple
> clients at the pager level, but I really don't see the benefit
> other than "it works". You really need to justify why it's a good
> thing that any unprivileged client is allowed to perform memory
> object management calls...
>

I don't see why unprivileged clients should be able to participate in this
protocol.

We need multi-client support so that multiple privileged clients can
participate.

My goal is to build a single system image Hurd cluster.  We need to support
multiple processes mmap'ing the same file, for basic POSIX compatibility.
If those processes are on different nodes, then the file server needs to
handle multiple kernels as paging clients.


> In addition, I've just thought about something else : if we handle
> multiple clients, how do we make sure that two kernels, caching the
> same file, don't just completely corrupt its content ? We'd need
> some kind of cooperation to prevent the file being mapped more than
> once I suppose, right ?
>

Clients with write access can already do that with ordinary io_writes, and
clients without write access can't trash the file at all.  It's a denial
of service issue, not a corruption issue.

Or, well, a complete cache coherence protocol, with a very large
> overhead per operation.
>

That's what I'm talking about!

Let's think about the "very large overhead" for a minute.  If we've only
got a single client, there's no extra overhead at all.  That's the case
we've got now, so we're not losing anything.

If two processes on separate nodes have mmap'ed the same file with write
permissions, you bet that could generate some very large overhead!  The
programmer has to take that into account, and avoid using that technique in
critical sections of code.

Yet it needs to be supported for POSIX compatibility, and in non-critical
code it might not be a problem at all.  Two tasks could probably bat a 4KB
page back and forth a hundred times and you'd never notice, just so long as
they settled down and stopped doing it once they got initialized.

Furthermore, my reading of the memory_object protocol suggests that Mach's
designers already had this in mind.  We don't need to design a cache
coherence protocol, since we've already got one.  We just need to implement
it.

agape
brent


Re: netmsg can now exec files (sort of)

2016-08-29 Thread Brent W. Baccala
I've figured out why the patched exec server didn't work with mmap, and
just opened a bug on it, with a fix attached.

So now I've got a working, mmap-less exec server that burns a lot of extra
RAM (each process gets its own private copy of the C library), but lets me
execute files across a netmsg TCP/IP session.

I think the next logical step is to get it to attempt an mmap, and only
fall back if that doesn't work.

agape
brent


Re: Denial of service attack via libpager

2016-08-28 Thread Brent W. Baccala
On Sun, Aug 28, 2016 at 12:49 PM, Richard Braun  wrote:

>
> I'm really not seeing the relation between "multiple clients" and
> "multiple threads". Libpager must be able to handle multiple clients
> with a single thread, otherwise we don't control scalability and we're
> back to where we were before Justus' rework...
>

Are you referring to the work about two years ago to use a fixed number of
threads?

libpager handles multiple clients with a single thread, but I don't think
it can handle multiple clients for a single file.

In ext2fs/ext2fs.h, we find:

/* ext2fs specific per-file data.  */
struct disknode
{
  ...
  /* This file's pager.  */
  struct pager *pager;
  ...
};

which suggests a single struct pager for each file, right?  And in
libpager/priv.h, we find:

struct pager
{
  ...
  /* Interface ports */
  memory_object_control_t memobjcntl;
  memory_object_name_t memobjname;
  ...
};

which suggests only a single client, with a single pair of control/name
ports, for each struct pager.

I'm saying that the struct pager needs to have a list of multiple clients,
with multiple control/name port pairs.
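For concreteness, here's roughly what I have in mind (a compilable sketch, not a patch against libpager; struct pager_client and the helper names are invented, and mach_port_t is stubbed out so it builds anywhere):

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned int mach_port_t;   /* stand-in so this builds anywhere */

/* One memory_object client (a kernel, local or remote).  The single
   control/name pair that lives in struct pager today would move here. */
struct pager_client {
  mach_port_t memobjcntl;           /* memory_object_control_t */
  mach_port_t memobjname;           /* memory_object_name_t */
  struct pager_client *next;
};

struct pager {
  struct pager_client *clients;     /* was: one memobjcntl/memobjname pair */
  int nclients;
};

/* called on each memory_object_init; one entry per kernel */
struct pager_client *
pager_add_client(struct pager *p, mach_port_t cntl, mach_port_t name)
{
  struct pager_client *c = malloc(sizeof *c);
  c->memobjcntl = cntl;
  c->memobjname = name;
  c->next = p->clients;
  p->clients = c;
  p->nclients++;
  return c;
}

/* operations like lock_request would now loop over every client;
   returns the number of clients visited */
int
pager_for_each_client(struct pager *p, void (*f)(struct pager_client *))
{
  int n = 0;
  for (struct pager_client *c = p->clients; c; c = c->next, n++)
    if (f)
      f(c);
  return n;
}

int main(void)
{
  struct pager p = { NULL, 0 };
  pager_add_client(&p, 101, 102);   /* local kernel */
  pager_add_client(&p, 201, 202);   /* remote kernel, via netmsg */
  assert(p.nclients == 2 && pager_for_each_client(&p, NULL) == 2);
  return 0;
}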


> And again, I think it's much easier and much more helpful to change
> exec and others to _avoid_ mmap, and copy the data in instead,
> possibly (and in this case probably) with zero-copy.
>

I should elaborate on what I found with exec.  After I fixed the problem
with the exec server mmap'ing the library's ELF headers, it just got on a
little bit further in the process, and then croaked when it tried to mmap
the program binary itself.

Assume two hosts, Alice and Bob.  Alice has mounted a remote file system
from Bob and now tries to execute a file residing on Bob.  So we have the
new task and the shared libraries, both on Alice, and the exec server and
the program binary, both on Bob.  Bob's exec server will try to mmap the
shared library headers into its own address space to examine them, which is
a problem.  Once we get past that point, the exec server tries to mmap the
program binary (on Bob) into the new task (on Alice).  Bob's ext2fs
translator now has a new memory object client - Alice's kernel.

So we still have to mmap across the network.  We certainly don't want to
avoid mmap's entirely for program text and (especially) for shared
libraries.  Although I admit that it would be best to detect when the mmap
fails and fall back on ordinary reads.


> Finally, I fail to see how making libpager handle multiple clients
> will solve that issue. The only client should be the local kernel,
> right ?


The obvious additional client would be a remote kernel, but as the exploit
program that I posted shows, it could just as easily be an unprivileged
process.  You don't need much permission to get a memory object, just read
access on the file.

agape
brent


Denial of service attack via libpager

2016-08-28 Thread Brent W. Baccala
Aloha -

I've written a short program (attached) that demonstrates how libpager's
support for only a single client can be used to mount a denial of service
attack against the kernel.

It works by opening a file, grabbing its associated memory object (if it
can), and holding it until you hit CNTL-C.  Nothing more than read access
is required.

If successful, the kernel can not exec the file, because it needs a memory
object to mmap() the file, and the program is already holding libpager's
single memory object.

It seems like once the kernel execs a file, it continues to hold the memory
object, so the attack, to be successful, needs to be against programs that
have never been exec'ed.  It's therefore "best" run on a cleanly booted
system.

An unprivileged user can run "grab-memory-objects /bin/*" and disrupt the
whole works.

Even worse, any attempt to exec one of these files then leaves it in a
state where it can never be exec'ed, even if grab-memory-objects is
killed.  The file remains hosed until shutdown, when we get the following
sequence:

startup: notifying tmpfs none of halt...done
startup: notifying ext2fs device:hd0s1 of halt...(ipc/rcv) timed out
startup: halting Mach (flags 0x8)...

It's a 60 second timeout that must terminate the ext2fs translator
abnormally, because the file system is left dirty.

So, there's several problems here:

1. libpager can't handle multiple clients
2. the kernel can't recover from a failed attempt to get a file's memory
object
3. ext2fs can't cleanly shutdown in this case

I'm continuing to lobby for a multi-client libpager!  I can see that it's
going to raise a lot of locking and concurrency issues, but this program
demonstrates that we've already got problems with the current scheme.  Even
a simple multi-client libpager should allow shared read-only access, which
would prevent an unprivileged user from mounting this attack.  Root, with
write access to the files in /bin, could still do it, though.

agape
brent
/* -*- mode: C++; indent-tabs-mode: nil -*-

   grab-memory-objects - test program to grab and hold as many
   memory objects as possible on a collection of files

   The intent is to exploit libpager's current support for only a
   single client.  By grabbing a file's associated memory object, you
   can mount a denial-of-service attack against the kernel, which
   needs a memory object to mmap() the file for execution.

   Nothing more than read access to the file is required.

   Compile with:

   g++ -std=c++11 -o grab-memory-objects grab-memory-objects.cc

   Basic operation:

   grab-memory-objects /bin/tar (or whatever)

   If the grab is successful, good luck running tar while this program
   is running.  If you try to run tar, you'll never be able to run it
   again, even if you terminate this program, without a reboot.
*/

#include <map>
#include <vector>
#include <iostream>
#include <iterator>
#include <algorithm>

#include <error.h>
#include <fcntl.h>
#include <unistd.h>

extern "C" {
#include <mach.h>
#include <mach_error.h>
#include <hurd.h>
}

/* mach_error()'s first argument isn't declared const, and we usually pass it a string */
#pragma GCC diagnostic ignored "-Wwrite-strings"

void
mach_call(kern_return_t err)
{
  if (err != KERN_SUCCESS)
{
  mach_error("mach_call", err);
}
}

/* "simple" template to print a vector by printing its members, separated by spaces
 *
 * adapted from http://stackoverflow.com/questions/10750057
 */

template <typename T>
std::ostream& operator<< (std::ostream& out, const std::vector<T>& v) {
  if ( !v.empty() ) {
    std::copy (v.begin(), v.end(), std::ostream_iterator<T>(out, " "));
  }
  return out;
}

int
main(int argc, char *argv[])
{
  std::map<const char *, mach_port_t> read_controls;
  std::map<const char *, mach_port_t> write_controls;

  std::vector<mach_port_t> read_locks;
  std::vector<mach_port_t> write_locks;

  if (argc < 2)
{
  error (1, 0, "Usage: %s FILENAME...", argv[0]);
}

  mach_port_t portset;

  mach_call (mach_port_allocate(mach_task_self(), MACH_PORT_RIGHT_PORT_SET, &portset));

  /* 'objname' is a unused port that's passed to the memory manager as an identifier */

  mach_port_t objname;

  mach_call (mach_port_allocate (mach_task_self (), MACH_PORT_RIGHT_RECEIVE, &objname));

  for (int argi=1; argi < argc; argi ++)
{
  const char * const filename = argv[argi];

  file_t node = file_name_lookup (filename, O_RDWR, 0);

  if (node == MACH_PORT_NULL)
{
  node = file_name_lookup (filename, O_RDONLY, 0);
}

  if (node != MACH_PORT_NULL)
{
  mach_port_t rdobj;
  mach_port_t wrobj;

  mach_call(io_map(node, &rdobj, &wrobj));

  /* If we got either kind of memory object (read or write),
   * create a control port, remember its association with the
   * filename, and send it to the memory manager in a
   * memory_object_init message.  We're hoping to receive back
   * a memory_object_ready message.
   *
   * The memory_object_init() client stub makes new send
   * rights on the control and objname ports, so all 

netmsg can now exec files (sort of)

2016-08-26 Thread Brent W. Baccala
Aloha -

I've gotten 'netmsg' to the point where files in the mounted, remote
filesystem can be executed on the local machine.  This isn't remote
execution - it's just copying the files to the local machine and executing
them there.  Nothing more than what you'd expect from NFS or Samba, but it
works.

'libpager', as we've discussed, can't handle multiple clients, and some
extra effort is required to avoid this limitation.  You need a custom exec
server that doesn't attempt any mmap()'s.  It's a simple patch, but not one
we want in the mainline code, except perhaps for testing purposes.  Anybody
see any reason for a --no-mmap flag to the exec server?

Unfortunately, my patched exec server doesn't work as /hurd/exec.  The
system freezes right after the "Hurd server bootstrap" line, and I've spent
the afternoon trying to figure out why.  At first I thought it was a
problem with my exec server executing shell scripts (runsystem.sysv, to be
specific), but I now realize that this is a bug in the standard exec server
executing shell scripts.  I filed a bug report; it has to do with
re-authentication, so I'd rather somebody else sign off on how to fix it.

In the mean time, I'm still wondering why an mmap-less exec server won't
boot my Hurd.

Obviously, this is a Band-Aid.  We don't want a mmap-less exec server; we
want libpager to handle multiple clients, right?

agape
brent


exec server behavior

2016-08-24 Thread Brent W. Baccala
Aloha -

My recent experiments with 'netmsg' have led me to investigate the
operation of Hurd's exec server, and I've got a few questions.

First off, the current obstacle to exec'ing files over netmsg is the need
to operate the memory_object protocol over the TCP/IP connection.  libpager
has a problem with this, because it can't handle multiple clients.  It
needs to, because in a multi-node environment a file server is likely to be
interacting with multiple kernels using memory_object calls.  Also, my test
program can mount denial of service attacks against the kernel by grabbing
libpager's only available client connection for a given file before the
kernel does.

So, I'll modify libpager to handle multiple clients.  Not trivial, but it
seems necessary and correct.

However, the protocol exchange over the network connection isn't what I
expected.  The failing vm_map turns out to be running on the server, not
the client!

Think about this for a minute.  We've mounted a remote file system and
tried to exec a file on it.  I expected the client to be mapping the file,
which is on the server, not the other way around!

The reason is that the failing vm_map arises from the exec server's attempt
to examine the ELF header of /lib/ld.so.  When the client called
file_exec_file_name, it passed a bunch of ports across the network
connection, including INIT_PORT_CRDIR.  The file server then passed that
along to the exec server (running on the server), which examined the file
(no network ops needed; both the exec server and the file are on the
server), determined the need to map /lib/ld.so, did a dir_lookup on
INIT_PORT_CRDIR (this went back across the network), and then tried to map
the file's ELF header into its own memory space.  Not the new process's
memory space, mind you; it maps it into its own memory space!

One question is whether the exec server really needs to do a vm_map to
examine an ELF header.  A simple read would suffice.  Which should be
preferred?

A more serious question is why the exec server is running on the server
side at all.  Shouldn't it be running on the client?  Then the only network
operation it would need is to map the one file that it's trying to execute.

Examining diskfs_S_file_exec reveals some interesting behavior.  The file
server caches a port to its exec server, so the same exec server gets used
by all of its clients!  Furthermore, it gets the exec server in the first
place by looking up _SERVERS_EXEC in its own name space, not the client's!

Shouldn't the exec server be looked for in the client's name space?

While trying to figure all of this out, I googled around and found the
discussion on this list about the EXECSERVERS environment variable, which
has apparently been deprecated in favor of the remap translator.  I can't
get remap to work, and it's real simple, so I suspect that it's currently
broken.  Yet even if it were working, I can't see how it could affect the
choice of exec server, since it only modifies the client's name space!

I'm thinking that maybe the C library's exec() should lookup _SERVERS_EXEC
and pass it to the file server in the file_exec_file_name call.  Then, if
SUID execution is being requested, maybe the file server ignores that and
looks up _SERVERS_EXEC in its own name space.

What do you think?

agape
brent


Re: netmsg

2016-08-24 Thread Brent W. Baccala
On Tue, Aug 23, 2016 at 11:45 PM, Brent W. Baccala <cos...@freesoft.org>
wrote:

>
> Any ideas why this basic sequence wouldn't work?
>
>    node = file_name_lookup("/lib/ld.so", O_RDONLY, 0)
>    io_map(node, &rdobj, &wrobj)
>    /* create control and objname with send/receive rights on both */
>    memory_object_init(rdobj, control, objname)
>
>
The answer is that libpager can only handle a single client.

agape
brent


Re: netmsg

2016-08-24 Thread Brent W. Baccala
On Tue, Aug 23, 2016 at 2:12 AM, Richard Braun  wrote:

>
> > It's got a lot of problems.  No authentication handoff; everything the
> > client requests happens with the permissions of the server.  exec'ing a
> > file doesn't work; the last RPC before the hang is memory_object_init.
> > emacs doesn't work; the last RPC before the hang is io_reauthenticate.
>
> That's a very interesting problem indeed, not sure how to fix it.


I've looked into it a bit more.

The auth server is going to be more trouble than I anticipated because it
depends on using Mach ports for rendezvous.  A send right, after transfer
across a TCP/IP connection, will appear to be a completely different port.
I'll think about it more.

The exec problem, on the other hand, is more of a puzzle.
memory_object_init gets sent, but we never see memory_object_ready in
reply.  Of course, we're getting memory_object_init from a user process and
not the kernel.  So I wrote a little test program to try sending
memory_object_init from a user process and, sure enough, no
memory_object_ready.

Any ideas why this basic sequence wouldn't work?

   node = file_name_lookup("/lib/ld.so", O_RDONLY, 0)
   io_map(node, &rdobj, &wrobj)
   /* create control and objname with send/receive rights on both */
   memory_object_init(rdobj, control, objname)

...and now I expect to see memory_object_ready on control, but it never
happens.

agape
brent


netmsg

2016-08-22 Thread Brent W. Baccala
Aloha -

I've gotten a basic netmsg server/translator running that relays Mach
messages across a TCP/IP connection.

The code is available at g...@github.com:BrentBaccala/netmsg.git

Basic usage:

netmsg -s (server)

settrans -a node netmsg localhost (client)

It's got a lot of problems.  No authentication handoff; everything the
client requests happens with the permissions of the server.  exec'ing a
file doesn't work; the last RPC before the hang is memory_object_init.
emacs doesn't work; the last RPC before the hang is io_reauthenticate.

Nevertheless, ls, cp, mkdir all work through it.

I'm going to keep plugging away at it; let me know what you think.

Richard was right, by the way - it was best to just start over from scratch.

agape
brent


[PATCH] Re: Hurd shutdown problems

2016-08-17 Thread Brent W. Baccala
Aloha -

OK, I seem to have gotten a handle on this thing now.

First, there's a missing mutex unlock in mach_defpager.  I'm attaching two
patches.  One fixes the debug printfs in mach_defpager/default_pager.c,
which obviously haven't been compiled for a while.  Use %p and %lx instead
of %x to silence compiler warnings, and access pthread_mutex_t's internal
structure member __held instead of held when printing mutex state.  The
second patch actually fixes the problem.

Second, the sysvinit scripts are killing mach_defpager during the shutdown
sequence, and this wreaks havoc.  The big culprit is /sbin/killall5, a C
program in the sysvinit-utils package.  It's readproc() function operates
by reading each process's stat file and parsing its startcode and endcode
values (Linux man page proc(5) - the address range of the program text),
and flagging the PID as a 'kernel' process, not to be killed, if these
values are both zero.  Obviously, this doesn't work on Hurd.

I've tinkered with several band-aids - strcmp on the program name, not
killing PIDs below 100, but obviously none of this is suitable to submit as
a patch.  killall5's internal logic is just too Linux specific, IMHO.
What's the Hurdish way to do it?  I'm thinking killall5 should check that
'important' flag on the process and skip processes for which that flag is
set.  Yet, I don't understand what that flag is really intended for.  Does
this make sense?

I think this means changing killall5 so it access the Hurd process server
directly, instead of walking /proc.  Incidentally, the program currently
works by mounting /proc if it isn't mounted already - odd behavior for a
program that's supposed to be shutting things down, not starting them up!
Might have problems getting such a Hurd specific patch into the upstream
code base; who knows?

Also, what should the kernel do if it has problems with the default pager?
After I fixed the mutex bug, I started getting a bunch of
memory_object_data_request failed messages on console.  Still mysterious,
but I guess that's better than nothing!  The error code prints in hex, and
when I looked it up it was MACH_SEND_INVALID_DEST.  Is that what you get
when you send to a dead port?

Yet when the mutex locked up, the result was a silent, locked system.  A
timeout of some kind, accompanied with complaints on console, would be
better, I think, but I don't understand the vm code enough to attempt such
a change right now.

Also, there's this proxy-defpager.  Is that the actual default pager,
acting as front end to mach-defpager?  Yet killall5 seems to be able to
kill proxy-defpager without consequence.  I don't understand.

For me, though, I now have a qemu VM that can cleanly start up, use swap,
and shutdown, so I have real sense of accomplishment!

agape
brent
--- mach-defpager/default_pager.c.dist.almost	2016-08-16 13:11:00.0 -1000
+++ mach-defpager/default_pager.c	2016-08-16 13:11:31.0 -1000
@@ -581,7 +581,7 @@
 	/* be paranoid */
 	if (no_partition(pindex))
 	panic("%sdealloc_page",my_name);
-ddprintf ("pager_dealloc_page(%d,%x,%d)\n",pindex,page,lock_it);
+ddprintf ("pager_dealloc_page(%d,%lx,%d)\n",pindex,page,lock_it);
 	part = partition_of(pindex);
 
 	if (page >= part->total_size)
@@ -1092,7 +1092,7 @@
 #endif
 	if (f_page >= pager->size)
 	  {
-	ddprintf ("%spager_read_offset pager %x: bad page %d >= size %d",
+	ddprintf ("%spager_read_offset pager %p: bad page %ld >= size %d",
 		my_name, pager, f_page, pager->size);
 	pthread_mutex_unlock(&pager->lock);
 	return (union dp_map) (union dp_map *) NO_BLOCK;
@@ -1360,7 +1360,7 @@
 	}
 
 	while (f_page >= pager->size) {
-	  ddprintf ("pager_write_offset: extending: %x %x\n", f_page, pager->size);
+	  ddprintf ("pager_write_offset: extending: %lx %x\n", f_page, pager->size);
 
 	/*
 	 * Paging object must be extended.
@@ -1380,7 +1380,7 @@
 #if	DEBUG_READER_CONFLICTS
 	pager->readers++;
 #endif
-	ddprintf ("pager_write_offset: done extending: %x %x\n", f_page, pager->size);
+	ddprintf ("pager_write_offset: done extending: %lx %x\n", f_page, pager->size);
 	}
 
 	if (INDIRECT_PAGEMAP(pager->size)) {
@@ -1429,7 +1429,7 @@
 	}
 
 	block = mapptr[f_page];
-	ddprintf ("pager_write_offset: block starts as %x[%x] %x\n", mapptr, f_page, block);
+	ddprintf ("pager_write_offset: block starts as %p[%lx] %p\n", mapptr, f_page, block.indirect);
 	if (no_block(block)) {
 	vm_offset_t	off;
 
@@ -1656,7 +1656,7 @@
 	 * Read it, trying for the entire page.
 	 */
 	offset = ptoa(block.block.p_offset);
-ddprintf ("default_read(%x,%x,%x,%d)\n",addr,size,offset,block.block.p_index);
+ddprintf ("default_read(%lx,%x,%lx,%d)\n",addr,size,offset,block.block.p_index);
 	part   = partition_of(block.block.p_index);
 	first_time = TRUE;
 	*out_addr = addr;
@@ -1723,7 +1723,7 @@
 	vm_size_t		wsize;
 	int		rc;
 
-	ddprintf ("default_write: pager offset %x\n", offset);
+	ddprintf ("default_write: pager offset %lx\n", offset);
 
 	/*
 	 

Re: Fwd: Hurd shutdown problems

2016-08-15 Thread Brent W. Baccala
Aloha -

I've updated to the latest Debian kernel package, which includes Samuel's
patch.  (thank you)

This fixes the symbol table problem, but my VM still locks up after a
failed swapoff.

I do, however, get symbolic names displayed correctly from the kernel
debugger at that point.

Obviously, there is another bug, and I will continue to hunt for it.

agape
brent


Re: Fwd: Hurd shutdown problems

2016-08-11 Thread Brent W. Baccala
On Wed, Aug 10, 2016 at 4:33 AM, Richard Braun  wrote:

> On Wed, Aug 10, 2016 at 04:26:35PM +0200, Richard Braun wrote:
> > the boot loader (see MULTIBOOT_FLAGS in boothdr.S), and at
> > some point, late during the boot process, module data are freed
> > using (see free_bootstrap_pages in bootstrap.c). This might
>
> Using vm_page_manage().
>
> --
> Richard Braun
>

The symbol table is far enough away from the module data that I don't think
it's getting freed at that point.

But it does seem to be freed.  Please check my calculations.

Here's the location of the symbol table in virtual memory.

(gdb) print self->start
$15 = (Elf32_Sym *) 0x804fb5ec

Here's its location in physical memory.

(gdb) print *symtab
$23 = {sh_name = 1, sh_type = 2, sh_flags = 0, sh_addr = 5223916, sh_offset
= 5367452, sh_size = 70736, sh_link = 16,
  sh_info = 1663, sh_addralign = 4, sh_entsize = 16}

(gdb) printf "%x\n", 5223916
4fb5ec

Now, with the system fully booted, I find this address's page:

(gdb) print (5223916 - vm_page_segs[0].start)/4096
$44 = 1259

...and now start looking at the page table entries:

(gdb) print vm_page_segs[0].pages[1259].type
$52 = 0
(gdb) print vm_page_segs[0].pages[1260].type
$53 = 0
(gdb) print vm_page_segs[0].pages[1261].type
$54 = 0
(gdb) print vm_page_segs[0].pages[1262].type
$55 = 0

0 is VM_PT_FREE.  It should be VM_PT_RESERVED (1), right?

agape
brent

