Re: Trying to use /bin/sh

2013-09-27 Thread Jilles Tjoelker
On Fri, Sep 27, 2013 at 12:20:32PM -0700, Sean Bruno wrote:
 I was attempting to simply set my EDITOR env variable for use on my
 system today and discovered that I do *not* understand how computers
 work.

 After realizing that I simply do not understand what's going on, I broke
 down and put an echo test into /etc/profile and ~/.profile.

 I can't see why, but logging in via xdm/xfce4 isn't doing any of the
 normal login things and nothing in my .profile is ever executed by any
 shell, ever.

 None of these get executed unless its a login shell.  xfce4 doesn't seem
 to pick these up when starting up, so I have no idea what to do.

 For now, I've put the needed things into a .bashrc and switched my
 loging shell to bash, which is not really what I wanted to do.

sh's model of startup files (only login shells use startup files with
fixed names, other interactive shells only use $ENV) assumes that every
session will load /etc/profile and ~/.profile at some point. This
includes graphical sessions. The ENV file typically contains only shell
options, aliases, function definitions and unexported variables but no
environment variables.

Some graphical environments actually source shell startup files like
~/.profile when logging in. I remember this from CDE for example. It is
important to have some rule where this should happen to avoid doing it
twice or never in strange configurations. As a workaround, I made
~/.xsession a script interpreted by my login shell and source some
startup files. A problem here is that different login shells have
incompatible startup files.

A different direction is to accept that the environment variables in the
X session will be incorrect and make all xterms start login shells (e.g.
xterm -ls).

You may also be able to use setenv in ~/.login_conf.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: [kde-freebsd] virtualbox file dialog problem

2013-08-28 Thread Jilles Tjoelker
, if setting KDE_FORK_SLAVES=1 works around the problem, perhaps
KDE should behave that way automatically if it is called from a process
with issetugid() true.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Importing tradcpp (traditional (KR-style) C macro preprocessor) into base?

2013-06-11 Thread Jilles Tjoelker
On Wed, Jun 12, 2013 at 12:23:44AM +0200, Matthias Andree wrote:
 Am 12.06.2013 00:11, schrieb Baptiste Daroussin:
  I have been working in importing tradcpp (developped by David A.
  Holland from NetBSD) into the ports tree, it is a traditional
  (KR-style) C macro preprocessor BSD licensed. I first worked on it
  so that imake can work properly without gcc.

  I discovered that some part of the base system still needs a
  traditional preprocessor, like (calendar), what I propose it to
  import tradcpp into the base system (not the version in port right
  now but what will become version 0.2).

  It mostly behave like gcpp, and I'm able to properly use calendar
  along with tradcpp with this small patch:
  http://people.freebsd.org/~bapt/tradcpp.diff

  Any objections against me importing it?

 Shouldn't we fix calendar and imake so that they can use a modern cpp,
 instead of going back 25 years?  Or am I missing the point here?

A modern cpp is only suitable for preprocessing C code. It makes various
assumptions about syntax that make it work better for preprocessing
actual C code but worse for text that is not C. Calendar and imake use
the traditional cpp for preprocessing non-C text.

There is already a patch for removing the dependency on cpp in calendar.
I think that makes more sense than importing a traditional cpp (assuming
there are no other users in base that cannot be fixed).

The fix for imake is probably not to use it. Ports that need imake can
continue to use it if it depends on a traditional cpp port.

A further point of confusion is that things that want a traditional
cpp in fact want an equivalent of gcpp -traditional. This is in fact
the point of tradcpp and why the old preprocessors do not work.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: /bin/sh = STDIN functions, var scope messing

2013-05-30 Thread Jilles Tjoelker
On Tue, May 28, 2013 at 11:48:47AM +0200, Václav Zeman wrote:
 On 27 May 2013 21:58, Reid Linnemann wrote:
  from SH(1)

  Note that unlike some other shells, sh executes each process in a pipe-
   line with more than one command in a subshell environment and as a
  child
   of the sh process.

  I'm taking this to mean that redirecting to sh_f has sh_f execute in
  a subshell in which global_scope_var changes, but the original
  shell's copy is uncahnged.
 Curious. Which of the two behaviours is POSIXly correct?

Both. As per XCU 2.12 Shell Execution Environment, each command in a
multi-command pipeline may or may not be executed in a subshell
environment.

Behaviour different from our sh is most often encountered in the various
versions of the real Korn shell (ksh88 and ksh93), which execute the
last command in a pipeline in the current shell environment.

If things like  jobs | cat  work, that can also be explained using this
rule.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: find -delete broken, or just used improperly?

2013-05-24 Thread Jilles Tjoelker
On Tue, May 21, 2013 at 11:06:39AM -0400, John Baldwin wrote:
 On Monday, May 20, 2013 5:47:31 pm Jilles Tjoelker wrote:
  The below patch allows deleting the pathname given to find itself:

  Index: usr.bin/find/function.c
  ===
  --- usr.bin/find/function.c (revision 250661)
  +++ usr.bin/find/function.c (working copy)
  @@ -442,7 +442,8 @@
  errx(1, -delete: forbidden when symlinks are followed);
   
  /* Potentially unsafe - do not accept relative paths whatsoever */
  -   if (strchr(entry-fts_accpath, '/') != NULL)
  +   if (entry-fts_level  FTS_ROOTLEVEL 
  +   strchr(entry-fts_accpath, '/') != NULL)
  errx(1, -delete: %s: relative path potentially not safe,
  entry-fts_accpath);

 I'm curious, how would you instruct a patched find to avoid deleteing
 the /tmp/foo directory (e.g. if you wanted this to be a job that
 pruned empty dirs from /tmp/foo but never pruned the directory
 itself).  Would -mindepth 1 do it?  (Just asking.  I have also found
 this message annoying but most of the jobs I have seen it on probably
 don't want to delete the root path, just descendants.)

-mindepth 1 works, as does cd /tmp/foo  find . -... (-delete silently
ignores . and ..).

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: find -delete broken, or just used improperly?

2013-05-20 Thread Jilles Tjoelker
On Mon, May 20, 2013 at 03:23:16PM -0400, Kurt Lidl wrote:
 OK, maybe I'm missing something obvious, but...

 find(1) says:

  -delete
  Delete found files and/or directories.  Always returns true.
  This executes from the current working directory as find recurses
  down the tree.  It will not attempt to delete a filename with a
  ``/'' character in its pathname relative to ``.'' for security
  reasons.  Depth-first traversal processing is implied by this
  option.  Following symlinks is incompatible with this option.

 However, it fails even when the path is absolute:

 bhyve9# mkdir /tmp/foo
 bhyve9# find /tmp/foo -empty -delete
 find: -delete: /tmp/foo: relative path potentially not safe

 Shouldn't this work?

The relative path refers to a pathname that contains a slash other
than at the beginning or end and may therefore refer to somewhere else
if a directory is concurrently replaced by a symlink.

When -L is not specified and . can be opened, the fts(3) code
underlying find(1) is careful to avoid following symlinks or being
dropped in different locations by moving the directory fts is currently
traversing. If a problematic concurrent modification is detected, fts
will not enter the directory or abort. Files found in the search are
returned via the current working directory and a pathname not containing
a slash.

For paranoia, find(1) verifies this when -delete is used. However, it is
too paranoid about the root of the traversal. It is already assumed that
the initial pathname does not refer to directories or symlinks that
might be replaced by untrusted users; otherwise, the whole traversal
would be unsafe. Therefore, it is not necessary to do the check for
fts_level == FTS_ROOTLEVEL.

The below patch allows deleting the pathname given to find itself:

Index: usr.bin/find/function.c
===
--- usr.bin/find/function.c (revision 250661)
+++ usr.bin/find/function.c (working copy)
@@ -442,7 +442,8 @@
errx(1, -delete: forbidden when symlinks are followed);
 
/* Potentially unsafe - do not accept relative paths whatsoever */
-   if (strchr(entry-fts_accpath, '/') != NULL)
+   if (entry-fts_level  FTS_ROOTLEVEL 
+   strchr(entry-fts_accpath, '/') != NULL)
errx(1, -delete: %s: relative path potentially not safe,
entry-fts_accpath);
 

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Managing userland data pointers in kqueue/kevent

2013-05-19 Thread Jilles Tjoelker
On Wed, May 15, 2013 at 01:34:58PM +0100, Paul LeoNerd Evans wrote:
 On Wed, 15 May 2013 13:29:59 +0100
 Paul LeoNerd Evans leon...@leonerd.org.uk wrote:

  Is that not the exact thing I suggested?

  The extension to create register a kevent to catch these events is
  that you put the EV_DROPWATCH bit flag in the event at the time you
  register it.

  The returned event [that] could have all the appropriate informaiton
  for the event being dropped is that you receive an event with
  EV_DROPPED set on it. It being a real event includes of course the
  udata pointer, so you can handle it.

 In fact, to requote the original PR I wrote[1] on the subject:

 ---

   I propose the addition of a new flag applicable to any kevent watch
   structure, documented thusly:

 The flags field can contain the following values:
 ..
 EV_DROPWATCH Requests that the kernel will send an EV_DROPPED event
  on this watch when it has finished watching it for any
  reason, including EV_DELETE, expiry because of
  EV_ONESHOT, or because the filehandle was closed by
  close(2).
 
 EV_DROPPED   This flag is returned by the kernel if it is now about
  to drop the watch. After this flag has been received,
  no further events will occur on this watch.

   This flag then makes it trivial to build a generic wrapper for kqueue
   that can always manage its memory correctly.

   a) at EV_ADD time, simply set flags |= EV_DROPWATCH

   b) after an event has been processed that included the EV_DROPPED
   flag, free() the pointer given in the udata field.

An important detail is missing: how do you avoid using up all kernel
memory on knotes if someone keeps adding new file descriptors with
EV_ADD | EV_DROPWATCH and closing the file descriptors again without
ever draining the kqueue?

This problem did not use to exist for file descriptor events before: the
number of such knotes was limited to the number of open file
descriptors. However, it does already exist for most of the other event
types. For example, pwait -v will return the exit status even if it was
suspended (^Z) while the process terminated and the parent reaped the
zombie. For EVFILT_TIMER, the worst effect is a denial of service of
EVFILT_TIMER on all other processes in the system. EVFILT_USER does not
appear to check anything and appears to allow arbitrary kernel memory
consumption.

The EVFILT_TIMER needs to keep its global limit and EVFILT_USER needs
something similar.

For the rest, call an event that is no longer associated to a kernel
object (e.g. EVFILT_READ whose file descriptor is closed, EVFILT_PROC
whose process has terminated and been reaped by the parent or EVFILT_AIO
whose I/O request is completed) unbound. The number of events that are
not unbound is limited by existing limits on the other kernel objects. A
possible fix is to reject (such as with [ENOMEM]) adding new events when
there are too many unbound events in the queue already. The application
should then allow kevent() to return pending events first before it adds
new ones. If the kernel returns unbound events in preference to other
events, a kevent() call with nevents = 2 * nchanges cannot result in a
net increase in the number of current and potential unbound events,
since it allows the kernel to return (and forget) as many unbound events
as it may add (nchanges entries are required for EV_ERROR leaving
nchanges for returning other events).

   It is not required that these two flags have distinct values; since
   one is userland-kernel and the other kernel-userland, they could for
   neatness reuse the same bit field.

I think it would be consistent with other EV_* to use the same name and
value for both.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: SystemV IPC. Segment info

2013-04-30 Thread Jilles Tjoelker
On Tue, Apr 30, 2013 at 07:49:40AM +0400, Vagner wrote:
 Tell me please, may I send this to PR, or this changes is not valid?

  A few weeks ago, I ran into problem, which related to SystemV IPC.
  More than 20 processes attached to a segment shared queue.
  Process-initiator for create segment was killed, as process which
  was accessed to segment last. Segment didn't free memory, but tagged
  it as SHMSEG_REMOVED as the result. This is a reason of memory
  overflow (memory assotiated as shm). Moreover, processes, which was
  attached to this segment did't get a new data. I have one resolve. I
  need to restarted all process, which still attached to segment. But
  this reason have a problem. We haven't list of this processes at
  system. Moreover, struct shmid_ds, which described segment, haven't
  this info too.

  This patch is a resolve of problem. It: 
  - added a linked list of structures shmid_pi in struct shmid_ds. PID
  (and last access time) recorded to this struct consistently. Memory
  allocates with ident 'shminfo' for this list of struct shmid_pi.
  - added syscall shminf for get all elements from list shmid_ds.
  - added option [-P] in ipcs(1) for system call shminf.

I think it is strange to maintain pids in the VM system. This makes the
VM system more complex for little reason (because the information is
only needed for monitoring, not normal operation).

Perhaps it is possible to do what you want using procstat -v or a slight
extension to it.

Alternatively, you may find a different way to get rid of your stale
worker processes. For example, before invoking shmctl(IPC_RMID), set a
flag inside the memory segment that all workers should exit and activate
whatever mechanism you use for telling the workers that they need to do
something.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Some improvements to rm(1)

2013-04-26 Thread Jilles Tjoelker
On Thu, Apr 25, 2013 at 10:56:10PM -0400, Eitan Adler wrote:
 On 25 April 2013 22:50, Brooks Davis bro...@freebsd.org wrote:
  On Thu, Apr 25, 2013 at 10:16:32PM -0400, Eitan Adler wrote:
  Anyone have thoughts on the following?

  commit 82c78ba923d8ce4a1bfbb309658c49021c8bb384
  Author: Eitan Adler li...@eitanadler.com
  Date:   Thu Apr 25 22:14:49 2013 -0400

  Take some improvements from DragonFlyBSD:
- add const where appropriate
- add static where appropriate
- fix a whitespace issues
 
  The no-op changes look more correct to me.

  I think the -x option seems a bit odd.  What is the use case?  At a
  first thought, it seems to raise more questions than it resolves.

 It goes along with cp -x, find -x, and others.

 Quick example #1: You have /usr/ports /usr/ports/distfiles as
 different mount points it lets you wipe /usr/ports without wiping your
 distfile cache.

 Quick example #2: You have /usr/src/ null mounted in every user's
 /home/ and you want to wipe one home directory.

Hmm, isn't this already possible using  find -x DIR -delete  ?

There will be an error message 'Device busy' about attempting to delete
the mount point but this does not even affect the exit status and all
files not under the mount point are removed.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Some improvements to rm(1)

2013-04-26 Thread Jilles Tjoelker
On Fri, Apr 26, 2013 at 08:33:54AM -0500, Joshua Isom wrote:
 On 4/26/2013 7:23 AM, Eitan Adler wrote:
  Yes, rm's functionality can be fully replicated by find.

 As well as anything using -R.

Emulating other -R things using find becomes quite slow when you don't
want to impose {PATH_MAX} limits or open up symlink-based race windows
because the only safe option is -execdir UTILITY {} \;. Any find command
based on -exec, -print or -print0 passes pathnames which are subject to
{PATH_MAX} limits and directories concurrently replaced with symlinks.

The construct -execdir ... {} + is unusably broken in older FreeBSD
versions and gives no advantage compared to -execdir ... {} \; in recent
-CURRENT.

With -L, this is not a new problem because symlinks are followed anyway
and the underlying code (fts(3)) always imposes the {PATH_MAX} limit in
that case.

The -delete primary is safe like -execdir.

I'm not entirely sure about this because the rm(1) patch is simple and
the new syntax is fairly clear.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: [patch] pipe2

2013-04-22 Thread Jilles Tjoelker
On Mon, Apr 22, 2013 at 08:31:18PM +0300, Konstantin Belousov wrote:
 On Sun, Apr 21, 2013 at 11:02:43PM +0200, Jilles Tjoelker wrote:
  On Sat, Apr 20, 2013 at 12:48:39AM +0200, Jilles Tjoelker wrote:
   I'm also working on pipe2() (using linuxulator work) and dup3() (patch
   from Jukka A. Ukkonen).

  This is an implementation of pipe2. As with the accept4 patch, make
  sysent needs to be run in sys/kern and sys/compat/freebsd32.

  As a bonus, new architectures might implement pipe(p) as pipe2(p, 0)
  avoiding the need for assembler (but behaviour is different if the
  pointer is invalid).

 I do not see anything wrong with the patch.  That said, I prefer the
 pipe(2) approach of delegating the access to the usermode memory to
 usermode, which avoids the need of the calls to kern_close().

Delegating the access gives better semantics (a signal instead of an
[EFAULT] error) but I don't like adding architecture-specific assembler
wrappers like pipe.S. Also, different calling conventions than expected
from the C level prototype may make things harder to understand.

Perhaps the wrapper can be implemented in C with the below declarations
but it is quite likely that this is not fully portable.

struct pipereturn { register_t a, b; };
struct pipereturn __sys_pipe(void);

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


[patch] pipe2

2013-04-21 Thread Jilles Tjoelker
On Sat, Apr 20, 2013 at 12:48:39AM +0200, Jilles Tjoelker wrote:
 I'm also working on pipe2() (using linuxulator work) and dup3() (patch
 from Jukka A. Ukkonen).

This is an implementation of pipe2. As with the accept4 patch, make
sysent needs to be run in sys/kern and sys/compat/freebsd32.

As a bonus, new architectures might implement pipe(p) as pipe2(p, 0)
avoiding the need for assembler (but behaviour is different if the
pointer is invalid).

I will do a __FreeBSD_version bump once these are in.

commit 256ab750c5e70db85d87c884994eb13038b182a2
Author: Jilles Tjoelker jil...@stack.nl
Date:   Sat Mar 30 23:32:34 2013 +0100

Add pipe2() system call.

diff --git a/include/unistd.h b/include/unistd.h
index dabf178..9df0777 100644
--- a/include/unistd.h
+++ b/include/unistd.h
@@ -533,6 +533,7 @@ char*mktemp(char *);
 #endif
 int nfssvc(int, void *);
 int nlm_syscall(int, int, int, char **);
+int pipe2(int *, int);
 int profil(char *, size_t, vm_offset_t, int);
 int rcmd(char **, int, const char *, const char *, const char *, int *);
 int rcmd_af(char **, int, const char *,
diff --git a/lib/libc/sys/Makefile.inc b/lib/libc/sys/Makefile.inc
index 105f469..8e918bf 100644
--- a/lib/libc/sys/Makefile.inc
+++ b/lib/libc/sys/Makefile.inc
@@ -352,6 +352,7 @@ MLINKS+=pathconf.2 lpathconf.2
 MLINKS+=pdfork.2 pdgetpid.2\
pdfork.2 pdkill.2 \
pdfork.2 pdwait4.2
+MLINKS+=pipe.2 pipe2.2
 MLINKS+=read.2 pread.2 \
read.2 preadv.2 \
read.2 readv.2
diff --git a/lib/libc/sys/Symbol.map b/lib/libc/sys/Symbol.map
index 149fa41..24f4621 100644
--- a/lib/libc/sys/Symbol.map
+++ b/lib/libc/sys/Symbol.map
@@ -393,6 +393,7 @@ FBSD_1.3 {
ffclock_getcounter;
ffclock_getestimate;
ffclock_setestimate;
+   pipe2;
posix_fadvise;
wait6;
 };
diff --git a/lib/libc/sys/pipe.2 b/lib/libc/sys/pipe.2
index 92d137f..bb3db51 100644
--- a/lib/libc/sys/pipe.2
+++ b/lib/libc/sys/pipe.2
@@ -28,7 +28,7 @@
 .\ @(#)pipe.2 8.1 (Berkeley) 6/4/93
 .\ $FreeBSD$
 .\
-.Dd January 30, 2006
+.Dd March 31, 2013
 .Dt PIPE 2
 .Os
 .Sh NAME
@@ -40,6 +40,8 @@
 .In unistd.h
 .Ft int
 .Fn pipe int fildes[2]
+.Ft int
+.Fn pipe2 int fildes[2] int flags
 .Sh DESCRIPTION
 The
 .Fn pipe
@@ -50,6 +52,29 @@ which is an object allowing
 bidirectional data flow,
 and allocates a pair of file descriptors.
 .Pp
+The
+.Fn pipe2
+system call allows control over the attributes of the file descriptors
+via the
+.Fa flags
+argument.
+Values for
+.Fa flags
+are constructed by a bitwise-inclusive OR of flags from the following
+list, defined in
+.In fcntl.h :
+.Bl -tag -width .Dv O_NONBLOCK
+.It Dv O_CLOEXEC
+Set the close-on-exec flag for the new file descriptors.
+.It Dv O_NONBLOCK
+Set the non-blocking flag for the ends of the pipe.
+.El
+.Pp
+If the
+.Fa flags
+argument is 0, the behavior is identical to a call to
+.Fn pipe .
+.Pp
 By convention, the first descriptor is normally used as the
 .Em read end
 of the pipe,
@@ -88,7 +113,9 @@ pipe in one direction.
 .Sh ERRORS
 The
 .Fn pipe
-system call will fail if:
+and
+.Fn pipe2
+system calls will fail if:
 .Bl -tag -width Er
 .It Bq Er EMFILE
 Too many descriptors are active.
@@ -97,6 +124,16 @@ The system file table is full.
 .It Bq Er ENOMEM
 Not enough kernel memory to establish a pipe.
 .El
+.Pp
+The
+.Fn pipe2
+system call will also fail if:
+.Bl -tag -width Er
+.It Bq Er EINVAL
+The
+.Fa flags
+argument is invalid.
+.El
 .Sh SEE ALSO
 .Xr sh 1 ,
 .Xr fork 2 ,
@@ -111,3 +148,8 @@ function appeared in
 .Pp
 Bidirectional pipes were first used on
 .At V.4 .
+.Pp
+The
+.Fn pipe2
+function appeared in
+.Fx 10.0 .
diff --git a/sys/compat/freebsd32/syscalls.master 
b/sys/compat/freebsd32/syscalls.master
index 2cbdf31..be245be 100644
--- a/sys/compat/freebsd32/syscalls.master
+++ b/sys/compat/freebsd32/syscalls.master
@@ -1026,3 +1026,4 @@
struct sockaddr * __restrict name, \
__socklen_t * __restrict anamelen, \
int flags); }
+542AUE_PIPENOPROTO { int pipe2(int *fildes, int flags); }
diff --git a/sys/kern/capabilities.conf b/sys/kern/capabilities.conf
index 0b64503..d2fa51c 100644
--- a/sys/kern/capabilities.conf
+++ b/sys/kern/capabilities.conf
@@ -490,6 +490,7 @@ pdkill
 ## Allow pipe(2).
 ##
 pipe
+pipe2
 
 ##
 ## Allow poll(2), which will be scoped by capability rights.
diff --git a/sys/kern/sys_pipe.c b/sys/kern/sys_pipe.c
index 90c3022..493fee5e 100644
--- a/sys/kern/sys_pipe.c
+++ b/sys/kern/sys_pipe.c
@@ -477,6 +477,24 @@ sys_pipe(struct thread *td, struct pipe_args *uap)
return (0);
 }
 
+int
+sys_pipe2(struct thread *td, struct pipe2_args *uap)
+{
+   int error, fildes[2];
+
+   if (uap-flags  ~(O_CLOEXEC | O_NONBLOCK))
+   return (EINVAL);
+   error = kern_pipe2(td, fildes, uap-flags);
+   if (error)
+   return (error);
+   error

[patch] accept4

2013-04-19 Thread Jilles Tjoelker
The accept4() function, compared to accept(), allows setting the new
file descriptor atomically close-on-exec and explicitly controlling the
non-blocking status on the new socket. (Note that the latter point means
that accept() is not equivalent to any form of accept4().)

The linuxulator's accept4 implementation leaves a race window where the
new file descriptor is not close-on-exec because it calls sys_accept().
This implementation leaves no such race window (by using falloc()
flags). The linuxulator could be fixed and simplified by using the new
code.

The comms/openobex port now compiles again without the
patch-lib_cloexec.h patch.

More information about API extensions to set new file descriptors
atomically close-on-exec is at
https://wiki.freebsd.org/AtomicCloseOnExec

I'm also working on pipe2() (using linuxulator work) and dup3() (patch
from Jukka A. Ukkonen).

commit 2e410d9260d12328b689b0938e09d09649b0460a
Author: Jilles Tjoelker jil...@stack.nl
Date:   Sat Mar 30 21:44:14 2013 +0100

Add accept4() system call.

diff --git a/lib/libc/sys/Makefile.inc b/lib/libc/sys/Makefile.inc
index fef0f3c..105f469 100644
--- a/lib/libc/sys/Makefile.inc
+++ b/lib/libc/sys/Makefile.inc
@@ -270,6 +270,7 @@ MAN+=   sctp_generic_recvmsg.2 \
wait.2 \
write.2
 
+MLINKS+=accept.2 accept4.2
 MLINKS+=access.2 eaccess.2 \
access.2 faccessat.2
 MLINKS+=brk.2 sbrk.2
diff --git a/lib/libc/sys/Symbol.map b/lib/libc/sys/Symbol.map
index 6faa0af..149fa41 100644
--- a/lib/libc/sys/Symbol.map
+++ b/lib/libc/sys/Symbol.map
@@ -378,6 +378,7 @@ FBSD_1.2 {
 };
 
 FBSD_1.3 {
+   accept4;
bindat;
cap_fcntls_get;
cap_fcntls_limit;
@@ -461,6 +462,8 @@ FBSDprivate_1.0 {
__sys_abort2;
_accept;
__sys_accept;
+   _accept4;
+   __sys_accept4;
_access;
__sys_access;
_acct;
diff --git a/lib/libc/sys/accept.2 b/lib/libc/sys/accept.2
index 978b948..e8bccaf 100644
--- a/lib/libc/sys/accept.2
+++ b/lib/libc/sys/accept.2
@@ -41,6 +41,8 @@
 .In sys/socket.h
 .Ft int
 .Fn accept int s struct sockaddr * restrict addr socklen_t * restrict 
addrlen
+.Ft int
+.Fn accept4 int s struct sockaddr * restrict addr socklen_t * restrict 
addrlen int flags
 .Sh DESCRIPTION
 The argument
 .Fa s
@@ -66,6 +68,26 @@ and
 signals from the original socket
 .Fa s .
 .Pp
+The
+.Fn accept4
+system call is similar,
+but the
+.Dv O_NONBLOCK
+property of the new socket is instead determined by the
+.Dv SOCK_NONBLOCK
+flag in the
+.Fa flags
+argument,
+the
+.Dv O_ASYNC
+property is cleared,
+the signal destination is cleared
+and the close-on-exec flag on the new file descriptor can be set via the
+.Dv SOCK_CLOEXEC
+flag in the
+.Fa flags
+argument.
+.Pp
 If no pending connections are
 present on the queue, and the original socket
 is not marked as non-blocking,
@@ -141,13 +163,15 @@ properties and the signal destination being inherited,
 but should set them explicitly using
 .Xr fcntl 2 .
 .Sh RETURN VALUES
-The call returns \-1 on error.
-If it succeeds, it returns a non-negative
+These calls return \-1 on error.
+If they succeed, they return a non-negative
 integer that is a descriptor for the accepted socket.
 .Sh ERRORS
 The
 .Fn accept
-system call will fail if:
+and
+.Fn accept4
+system calls will fail if:
 .Bl -tag -width Er
 .It Bq Er EBADF
 The descriptor is invalid.
@@ -180,6 +204,16 @@ are present to be accepted.
 A connection arrived, but it was closed while waiting
 on the listen queue.
 .El
+.Pp
+The
+.Fn accept4
+system call will also fail if:
+.Bl -tag -width Er
+.It Bq Er EINVAL
+The
+.Fa flags
+argument is invalid.
+.El
 .Sh SEE ALSO
 .Xr bind 2 ,
 .Xr connect 2 ,
@@ -194,3 +228,8 @@ The
 .Fn accept
 system call appeared in
 .Bx 4.2 .
+.Pp
+The
+.Fn accept4
+system call appeared in
+.Fx 10.0 .
diff --git a/lib/libthr/pthread.map b/lib/libthr/pthread.map
index 355edea..bbbd930e 100644
--- a/lib/libthr/pthread.map
+++ b/lib/libthr/pthread.map
@@ -181,6 +181,7 @@ FBSDprivate_1.0 {
___wait;
___waitpid;
__accept;
+   __accept4;
__aio_suspend;
__close;
__connect;
@@ -408,3 +409,7 @@ FBSD_1.2 {
setcontext;
swapcontext;
 };
+
+FBSD_1.3 {
+   accept4;
+};
diff --git a/lib/libthr/thread/thr_syscalls.c b/lib/libthr/thread/thr_syscalls.c
index 2327d74..7a08302 100644
--- a/lib/libthr/thread/thr_syscalls.c
+++ b/lib/libthr/thread/thr_syscalls.c
@@ -101,6 +101,7 @@ extern pid_t__waitpid(pid_t, int *, int);
 extern int __sys_aio_suspend(const struct aiocb * const[], int,
const struct timespec *);
 extern int __sys_accept(int, struct sockaddr *, socklen_t *);
+extern int __sys_accept4(int, struct sockaddr *, socklen_t *, int);
 extern int __sys_connect(int, const struct sockaddr *, socklen_t);
 extern int __sys_fsync(int);
 extern int __sys_msync(void *, size_t, int);
@@ -129,6 +130,7 @@ int ___usleep(useconds_t useconds);
 pid_t

Re: GSOC: Qt front-end for freebsd-update

2013-04-14 Thread Jilles Tjoelker
On Sun, Apr 14, 2013 at 02:11:44AM -0400, Justin Edward Muniz wrote:
  I am excited for this year's Google Summer of Code, and I have a
 project in mind that I am working to propose.

  I am a CS major and have experience with Qt, C++ and shell scripting.
 I have been developing on FreeBSD for several years, and I am looking to
 tackle developing a new Qt front-end for the freebsd-update command.

  I am thinking a rather straight-forward user interface with more
 advanced options available (such as selecting a specific server). I am
 looking for constructive feedback, and also a developer interested in
 mentoring me this Summer.

If you want to work on freebsd-update, I think it will be more useful to
improve freebsd-update itself. Some issues are:

* It is not particularly robust in the face of errors (such as a full
  disk). From reading the documentation, you might get the impression
  that it is better than it actually is. The state for the rollback
  command is only set up after installation of the update so that
  command is not useful for backing out a partially installed update.

* The upgrade feature takes large amounts of time and network bandwidth.

* Less required user interaction would be nice. Think of upgrading many
  machines.

* freebsd-update and pkgng are two upgrade systems. Perhaps
  freebsd-upgrade should be scrapped altogether in favour of a
  pkgng-based solution.

If you want to write GUIs, perhaps a pkgng GUI is more interesting.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: GSOC 2013 project Kernel Size Reduction for Embedded System

2013-04-08 Thread Jilles Tjoelker
On Mon, Apr 08, 2013 at 08:28:04PM +, Amit Rawat wrote:
 GSOC posted the list of selected organization for GSOC 2013 and I am
 highly happy that FreeBSD is among the selected organization.

 I am a third year student interested to work in the field of embedded
 system. I applied last year and the title of my project was  Kernel
 Size Reduction for Embedded System. The link to my last year proposal
 is
 https://www.google-melange.com/gsoc/proposal/review/google/gsoc2012/amitrawat10/1#c8001

 But due to some flaws it doesn't get selected. I am looking to improve my
 proposal for this year and apply again. I explain some portion of my
 project pictorially on my blog

 https://amit10rawat.wordpress.com/2013/02/26/kernel-size-reduction-for-embedded-system/

 I am looking for suggestion and new ideas by which I can reduce the
 size of kernel.

It looks like the overlay idea could be implemented more simply by
taking advantage of the VM system: make part of the kernel code
pageable. Memory formerly occupied by rarely used kernel code can then
be used by userland applications. You will need some sort of backing
store where the kernel code can be read after booting; this is not
normally available.

However, almost no kernel code is safe in a situation where an
instruction fetch may fault. Reading or writing the secondary storage
can easily cause a deadlock. It causes the thread to sleep, which is not
allowed while holding a mutex. It would help if you could wire down
pieces which will need to be used in the near future from a place where
a fault is safe, but this may also be very slow even if the code is
already in memory.

Some other ideas for kernel size reduction:

* Find pieces of code that are required but seem big for what they do
  for you, and try to make them smaller. The proposal should list
  concrete parts.

* Find variables and functions that are required only during kernel
  initialization, place them in a special section and add this section
  to the free memory pool after kernel initialization.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Fwd: Where does FreeBSD tr -C differ from tr -c?

2013-04-07 Thread Jilles Tjoelker
On Sun, Apr 07, 2013 at 09:12:57PM +0200, Cedric Blancher wrote:
 Forwarding to freebsd-hackers@/freebsd-standa...@freebsd.org

 The question remain open and I need help. tr -C is implemented by
 FreeBSD tr -C but I can't find examples (or a testcase) where tr -c
 and tr -C differ.

Reading the rationale of POSIX, here is an example of a difference:

% printf 'a\200'|LC_ALL=en_US.US-ASCII tr -cd '\000-\177'|hd
  61|a|
0001
% printf 'a\200'|LC_ALL=en_US.US-ASCII tr -Cd '\000-\177'|hd 
  61 80 |a.|
0002

Because the bytes 128..255 are not characters in us-ascii, they cannot
be removed with -Cd, only with -cd.

Here is another difference (using LC_CTYPE=en_US.UTF-8, rest C):

% echo $'\U0001a000'|tr -cd '\U0001a000'|hd 
% echo $'\U0001a000'|tr -Cd '\U0001a000'|hd
  f0 9a 80 80   ||
0004

The cause is that iswrune(3) returns false for the unassigned code point
U+0001A000.

This may well contain bugs because Unicode adds new characters from time
to time and our tables seem to be updated very rarely.

POSIX also says things about collation order. You may not have detected
this because FreeBSD does not implement LC_COLLATE for multibyte locales
yet.

 PS: Who wrote tr -C and how can I contact the author?

You can read the Subversion logs but people may no longer be around.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: close(2) while accept(2) is blocked

2013-04-05 Thread Jilles Tjoelker
On Thu, Apr 04, 2013 at 08:43:17PM +0300, Andriy Gapon wrote:
 on 01/04/2013 18:22 John Baldwin said the following:
  I think you need to split the 'struct file' reference count into two
  different counts similar to the how we have vref/vrele vs
  vhold/vdrop for vnodes.  The fget for accept and probably most other
  system calls should probably be equivalent to vhold, whereas things
  like open/dup (and storing an fd in a cmsg) should be more like
  vref.  close() should then be a vrele().

 This model makes perfect sense.
 Unfortunately, ENOTIME to work on this.

This looks like it can work but I don't know whether it's worth the
trouble.

 Meanwhile I am using the following patch specific to local domain
 sockets, accept(2) and shutdown(2).  Turns out that the problematic
 application does both shutdown(RDWR) and close(2), but that doesn't
 help on FreeBSD.
 BTW, this is the application:
 http://thread.gmane.org/gmane.os.freebsd.devel.office/1754
 The patch does help.

 Author: Andriy Gapon a...@icyb.net.ua
 Date:   Thu Mar 28 20:08:13 2013 +0200
 
 [test!] uipc_shutdown: use soisdisconnected instead of socantsendmore
 
 So that in addition to disabling sends we also wake up threads blocked
 in accept (on so_timeo in general).
 
 diff --git a/sys/kern/uipc_usrreq.c b/sys/kern/uipc_usrreq.c
 index 9b60eab..e93d46c 100644
 --- a/sys/kern/uipc_usrreq.c
 +++ b/sys/kern/uipc_usrreq.c
 @@ -1074,7 +1074,7 @@ uipc_shutdown(struct socket *so)
 
   UNP_LINK_WLOCK();
   UNP_PCB_LOCK(unp);
 - socantsendmore(so);
 + soisdisconnected(so);
   unp_shutdown(unp);
   UNP_PCB_UNLOCK(unp);
   UNP_LINK_WUNLOCK();

I think this patch makes shutdown(SHUT_WR) on unix domain sockets act
like shutdown(SHUT_RDWR), i.e. receives are incorrectly denied.

The below patch also makes shutdown(SHUT_RDWR) abort a blocking accept
on a unix domain socket, but it should work for all domains:

Index: sys/kern/uipc_socket.c
===
--- sys/kern/uipc_socket.c  (revision 248873)
+++ sys/kern/uipc_socket.c  (working copy)
@@ -2428,9 +2428,11 @@ soshutdown(struct socket *so, int how)
sorflush(so);
if (how != SHUT_RD) {
error = (*pr-pr_usrreqs-pru_shutdown)(so);
+   wakeup(so-so_timeo);
CURVNET_RESTORE();
return (error);
}
+   wakeup(so-so_timeo);
CURVNET_RESTORE();
return (0);
 }

A blocking accept (and some other operations) is waiting on
so-so_timeo. Once it wakes up, it will detect the SBS_CANTRCVMORE bit.

A spurious wakeup on so-so_timeo appears harmless (sleep retried)
except when lingering on close (SO_LINGER) so this should be fairly
safe.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: close(2) while accept(2) is blocked

2013-03-28 Thread Jilles Tjoelker
On Thu, Mar 28, 2013 at 06:54:31PM +0200, Andriy Gapon wrote:

 So, this started as a simple question, but the answer was quite
 unexpected to me.

 Let's say we have an opened and listen-ed socket and let's assume that
 we know that one thread is blocked in accept(2) and another thread is
 calling close(2). What is going to happen?

 Turns out that practically nothing.  For kernel the close call would
 be almost a nop.
 My understanding is this:
 - when socket is created, its reference count is 1
 - when accept(2) is called, fget in kernel increments the reference
   count (kept in an associated struct file)
 - when close(2) is called, the reference count is decremented

 The reference count is still greater than zero, so fdrop does not call
 fo_close. That means that in the case of a socket soclose is not
 called.

 I am sure that the reference counting in this case is absolutely
 correct with respect to managing kernel side structures.

I agree this is expected and correct from the kernel point of view.

 But I am not that it is correct with respect to hiding the explicit
 close(2) call from other threads that may be waiting on the socket. In
 other words, I am not sure if fo_close is supposed to signify that
 there are no uses of a file, or that userland close-d the file.  Or
 perhaps these should be two different methods.

It would be possible to keep track of the file descriptor number but I
think it is not worth the large amount of extra code.

Keeping track of file descriptor number would be necessary to interrupt
waits after a close or dup2 on the file descriptor that was passed to
the blocking call, even if the object remains open on a different file
descriptor number or in a different process.

Also, most people would use the new functionality incorrectly anyway. A
close() on a file descriptor another thread is using is risky since it
is in most cases impossible to prove that the other thread is in fact
blocked on the file descriptor and not preempted right before making the
system call. In the latter case, the other thread might accept a
connection from a different socket created later. A dup2() of /dev/null
onto the file descriptor would be safer.

 Additional note is that shutdown(2) doesn't wake up the thread in
 accept(2) either.  At least that's true for unix domain sockets.
 Not sure if this is a bug.

I think it is a bug. It works properly for IPv4 TCP sockets.

The resulting error is [ECONNABORTED] for a blocking socket which likely
leads to infinite looping if the thread does not know about the
shutdown(2) (because that error normally means the accept should be
retried later). For a non-blocking socket the error is [EWOULDBLOCK]
which also leads to infinite looping and is certainly wrong because
select/poll do report the socket as readable. Both of these are in
kern_accept() in sys/kern/uipc_syscalls.c.

POSIX does not say which error code we should return here but these two
are almost certainly wrong (it is usable for waking up threads stuck in
accept() if those threads check a variable after every accept() failure
and do not rely on the exact value of errno). Linux returns [EINVAL] for
both blocking and non-blocking sockets, probably from the POSIX error
condition The socket is not accepting connections. In our man page
that error condition is formulated listen(2) has not been called on the
socket descriptor. which is clearly not the case.

Also, I think a non-blocking accept() should immediately fail with the
head-so_error if it is set, rather than returning [EWOULDBLOCK] until
another connection arrives. Likewise, filt_solisten() in
sys/kern/uipc_socket.c only returns true if there is a connection, not
if there was an error or shutdown() has been called. On the other hand,
sopoll_generic() looks correct.

Error reporting on non-blocking accept() might usefully be postponed
until there is a connection or the socket has been shut down, to reduce
context switches.

 But the summary seems to be is that currently it is not possible to
 break a thread out of accept(2) (at least without resorting to
 signals).

Pthread cancellation works better than raw signals for this use case. In
a proper implementation such as in FreeBSD 9.0 or newer and used
properly, it allows avoiding the resource leak that may happen when
calling longjmp() or pthread_exit() in a signal handler just after
accept() has created a new socket.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: [patch] SOCK_CLOEXEC, SOCK_NONBLOCK and MSG_CMSG_CLOEXEC

2013-03-19 Thread Jilles Tjoelker
On Mon, Mar 18, 2013 at 10:59:57PM +0200, Konstantin Belousov wrote:
 On Sun, Mar 17, 2013 at 10:23:53PM +0100, Jilles Tjoelker wrote:
  Here are some more modifications to allow creating file descriptors with
  close-on-exec set. Like in linux/glibc, SOCK_CLOEXEC and SOCK_NONBLOCK
  can be OR'ed in socket() and socketpair()'s type parameter, and
  MSG_CMSG_CLOEXEC to recvmsg() makes file descriptors (SCM_RIGHTS)
  atomically close-on-exec.

  The numerical values for SOCK_CLOEXEC and SOCK_NONBLOCK are as in
  NetBSD. MSG_CMSG_CLOEXEC is the first free bit for MSG_*.

  I do not pass the SOCK_* flags to MAC because this may cause incorrect
  failures and can be done later via fcntl() anyway. I expect audit to
  cope with the new flags.

  For MSG_CMSG_CLOEXEC, I had to change unp_externalize to take a flags
  argument.

 This looks fine to me.

Thanks for the review.

 The only note I have, which is not directly related to your patch,
 is the recvmsg(2) behaviour when the undefined flag is passed.
 The syscall silently ignores the flags. I think this is quite wrong,
 and would cause interesting (security) implications if the program
 using the MSG_CMSG_CLOEXEC is run on older kernel which does not
 interpret the flag.

This is indeed unfortunate and it also affects open(2).

It may have originally been done for reasons of modularity; the
sys_open() and kern_open() need not know about the valid flag bits this
way and a device driver or filesystem can define new bits.

The effect of an older kernel ignoring MSG_CMSG_CLOEXEC is probably not
that bad, considering that running new userland on an old kernel is not
supported. Most compatibility shims for MSG_CMSG_CLOEXEC and similar
flags (such as the one in Wayland) simply contain the race condition, so
you lose anyway with an old kernel. It would be possible to use
closefrom() or an emulation of it or to lock fork() conditionally on the
necessary CLOEXEC flags not being available but it appears that people
do not go to that trouble.

 Might be, we should start returning EINVAL for unknown flag, despite
 SUSv4 not specifying the condition ?

Per POSIX it is valid to do this but it might cause a compatibility
problem if there are applications that pass garbage in flags.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


[patch] SOCK_CLOEXEC, SOCK_NONBLOCK and MSG_CMSG_CLOEXEC

2013-03-17 Thread Jilles Tjoelker
(). */
-   error = socreate(uap-domain, so, uap-type, uap-protocol,
+   error = socreate(uap-domain, so, type, uap-protocol,
td-td_ucred, td);
if (error) {
fdclose(td-td_proc-p_fd, fp, fd, td);
} else {
-   finit(fp, FREAD | FWRITE, DTYPE_SOCKET, so, socketops);
+   finit(fp, FREAD | FWRITE | fflag, DTYPE_SOCKET, so, socketops);
+   if ((fflag  FNONBLOCK) != 0)
+   (void) fo_ioctl(fp, FIONBIO, fflag, td-td_ucred, td);
td-td_retval[0] = fd;
}
fdrop(fp, td);
@@ -648,9 +663,20 @@
struct filedesc *fdp = td-td_proc-p_fd;
struct file *fp1, *fp2;
struct socket *so1, *so2;
-   int fd, error;
+   int fd, error, oflag, fflag;
 
AUDIT_ARG_SOCKET(domain, type, protocol);
+
+   oflag = 0;
+   fflag = 0;
+   if ((type  SOCK_CLOEXEC) != 0) {
+   type = ~SOCK_CLOEXEC;
+   oflag |= O_CLOEXEC;
+   }
+   if ((type  SOCK_NONBLOCK) != 0) {
+   type = ~SOCK_NONBLOCK;
+   fflag |= FNONBLOCK;
+   }
 #ifdef MAC
/* We might want to have a separate check for socket pairs. */
error = mac_socket_check_create(td-td_ucred, domain, type,
@@ -665,12 +691,12 @@
if (error)
goto free1;
/* On success extra reference to `fp1' and 'fp2' is set by falloc. */
-   error = falloc(td, fp1, fd, 0);
+   error = falloc(td, fp1, fd, oflag);
if (error)
goto free2;
rsv[0] = fd;
fp1-f_data = so1;  /* so1 already has ref count */
-   error = falloc(td, fp2, fd, 0);
+   error = falloc(td, fp2, fd, oflag);
if (error)
goto free3;
fp2-f_data = so2;  /* so2 already has ref count */
@@ -686,8 +712,14 @@
 if (error)
goto free4;
}
-   finit(fp1, FREAD | FWRITE, DTYPE_SOCKET, fp1-f_data, socketops);
-   finit(fp2, FREAD | FWRITE, DTYPE_SOCKET, fp2-f_data, socketops);
+   finit(fp1, FREAD | FWRITE | fflag, DTYPE_SOCKET, fp1-f_data,
+   socketops);
+   finit(fp2, FREAD | FWRITE | fflag, DTYPE_SOCKET, fp2-f_data,
+   socketops);
+   if ((fflag  FNONBLOCK) != 0) {
+   (void) fo_ioctl(fp1, FIONBIO, fflag, td-td_ucred, td);
+   (void) fo_ioctl(fp2, FIONBIO, fflag, td-td_ucred, td);
+   }
fdrop(fp1, td);
fdrop(fp2, td);
return (0);

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: [patch] statfs does not detect -t nullfs -o union as a union mount

2013-03-01 Thread Jilles Tjoelker
On Thu, Feb 28, 2013 at 12:21:44AM +0200, Konstantin Belousov wrote:
 On Wed, Feb 27, 2013 at 10:31:42PM +0100, Jilles Tjoelker wrote:
  While testing recent changes to opendir(), I noticed that fstatfs() does
  not return the MNT_UNION flag for a -t nullfs -o union mount. As a
  result, opendir()/readdir() return files that exist in both top and
  bottom directories twice (at least . and ..). Other -o union mounts and
  -t unionfs mounts work correctly in this regard.

  The below patch passes through just the MNT_UNION flag of the nullfs
  mount itself. Perhaps more flags should be passed through.

  commit fce32a779af4eb977c9b96feb6e4f811d89f2881
  Author: Jilles Tjoelker jil...@stack.nl
  Date:   Sat Feb 23 22:22:39 2013 +0100
  
  nullfs: Preserve the MNT_UNION flag of the nullfs mount itself.
  
  This is needed so that opendir() can properly detect a union mount like
  mount -t nullfs -o union dir1 dir2.
  
  diff --git a/sys/fs/nullfs/null_vfsops.c b/sys/fs/nullfs/null_vfsops.c
  index 3724e0a..ff06f57 100644
  --- a/sys/fs/nullfs/null_vfsops.c
  +++ b/sys/fs/nullfs/null_vfsops.c
  @@ -313,7 +313,7 @@ nullfs_statfs(mp, sbp)
   
  /* now copy across the interesting information and fake the rest */
  sbp-f_type = mstat.f_type;
  -   sbp-f_flags = mstat.f_flags;
  +   sbp-f_flags = (sbp-f_flags  MNT_UNION) | mstat.f_flags;
  sbp-f_bsize = mstat.f_bsize;
  sbp-f_iosize = mstat.f_iosize;
  sbp-f_blocks = mstat.f_blocks;

 Would it make sense to preserve more flags from the upper mount ?
 I see a use for MNT_NOEXEC as well, at least.

Yes, preserving MNT_NOEXEC will make -t nullfs -o noexec work better, in
particular rtld's check for libraries loaded via environment variables.

In the same way MNT_RDONLY, MNT_NOSUID and MNT_NOSYMFOLLOW could be
preserved.

On the other hand, MNT_ROOTFS should probably not be passed through from
the underlying filesystem.

This would give
sbp-f_flags = (sbp-f_flags  (MNT_RDONLY | MNT_NOEXEC | MNT_NOSUID |
MNT_UNION | MNT_NOSYMFOLLOW) | (mstat.f_flags  ~MNT_ROOTFS);

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


[patch] statfs does not detect -t nullfs -o union as a union mount

2013-02-27 Thread Jilles Tjoelker
While testing recent changes to opendir(), I noticed that fstatfs() does
not return the MNT_UNION flag for a -t nullfs -o union mount. As a
result, opendir()/readdir() return files that exist in both top and
bottom directories twice (at least . and ..). Other -o union mounts and
-t unionfs mounts work correctly in this regard.

The below patch passes through just the MNT_UNION flag of the nullfs
mount itself. Perhaps more flags should be passed through.

commit fce32a779af4eb977c9b96feb6e4f811d89f2881
Author: Jilles Tjoelker jil...@stack.nl
Date:   Sat Feb 23 22:22:39 2013 +0100

nullfs: Preserve the MNT_UNION flag of the nullfs mount itself.

This is needed so that opendir() can properly detect a union mount like
mount -t nullfs -o union dir1 dir2.

diff --git a/sys/fs/nullfs/null_vfsops.c b/sys/fs/nullfs/null_vfsops.c
index 3724e0a..ff06f57 100644
--- a/sys/fs/nullfs/null_vfsops.c
+++ b/sys/fs/nullfs/null_vfsops.c
@@ -313,7 +313,7 @@ nullfs_statfs(mp, sbp)
 
/* now copy across the interesting information and fake the rest */
sbp-f_type = mstat.f_type;
-   sbp-f_flags = mstat.f_flags;
+   sbp-f_flags = (sbp-f_flags  MNT_UNION) | mstat.f_flags;
sbp-f_bsize = mstat.f_bsize;
sbp-f_iosize = mstat.f_iosize;
sbp-f_blocks = mstat.f_blocks;

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Request for review, time_pps_fetch() enhancement

2013-02-09 Thread Jilles Tjoelker
On Wed, Feb 06, 2013 at 05:58:30PM +0200, Konstantin Belousov wrote:
 On Tue, Feb 05, 2013 at 09:41:38PM -0700, Ian Lepore wrote:
  I'd like feedback on the attached patch, which adds support to our
  time_pps_fetch() implementation for the blocking behaviors described in
  section 3.4.3 of RFC 2783.  The existing implementation can only return
  the most recently captured data without blocking.  These changes add the
  ability to block (forever or with timeout) until a new event occurs.

  Index: sys/kern/kern_tc.c
  ===
  --- sys/kern/kern_tc.c  (revision 246337)
  +++ sys/kern/kern_tc.c  (working copy)
  @@ -1446,6 +1446,50 @@
* RFC 2783 PPS-API implementation.
*/
   
  +static int
  +pps_fetch(struct pps_fetch_args *fapi, struct pps_state *pps)
  +{
  [snip]
  +   aseq = pps-ppsinfo.assert_sequence;
  +   cseq = pps-ppsinfo.clear_sequence;
  +   while (aseq == pps-ppsinfo.assert_sequence 
  +   cseq == pps-ppsinfo.clear_sequence) {
 Note that compilers are allowed to optimize these accesses even over
 the sequential point, which is the tsleep() call. Only accesses to
 volatile objects are forbidden to be rearranged.

 I suggest to add volatile casts to pps in the loop condition.

The memory pointed to by pps is global (other code may have a pointer to
it); therefore, the compiler must assume that the tsleep() call (which
invokes code in a different compilation unit) may modify it.

Because volatile does not make concurrent access by multiple threads
defined either, adding it here only seems to slow down the code
(potentially).

  +   err = tsleep(pps, PCATCH, ppsfch, timo);
  +   if (err == EWOULDBLOCK  fapi-timeout.tv_sec == -1) {
  +   continue;
  +   } else if (err != 0) {
  +   return (err);
  +   }
  +   }
  +   }
-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: use after free in grep?

2012-12-20 Thread Jilles Tjoelker
On Thu, Dec 20, 2012 at 01:19:07PM +0100, Dimitry Andric wrote:
 On 2012-12-20 08:13, Eitan Adler wrote:
  in xrealloc_impl

  338   new_ptr = realloc(ptr, new_size);
  339   if (new_ptr != NULL)
  340 {
  341   hash_table_del(xmalloc_table, ptr);

  ^^^ isn't this a use-after-free of ptr?

 Yes, realloc does not guarantee the realloc'd space will be at the same
 address, so it may free ptr at its discretion.

Even if you somehow know realloc() is not going to move the block, it is
still wrong to use any pointer not derived from its return value to
access the block. Comparing the old and the new pointers (normally or
with memcmp()) does not help; it has an indeterminate result.

See http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm

 Also, there is a memory leak if realloc() returns NULL.  This is a
 very usual mistake when using realloc(). :-)

No, this would be correct if a successful realloc() call did not make
the old pointer indeterminate. The hash table remains unchanged if
realloc() fails.

 Probably, the code should do the hash_table_del() before the realloc(),
 but I am not sure if hash_table_del() will already free ptr.

Yes, and add it back if realloc() fails.

A smarter internal interface to the hash table would avoid freeing and
reallocating hash table entries here (which might fail).

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Fix overlinking in base aka import pkgconf

2012-12-16 Thread Jilles Tjoelker
On Sun, Dec 16, 2012 at 04:03:40PM +0200, Konstantin Belousov wrote:
 On Sun, Dec 16, 2012 at 02:01:00PM +0100, Jeremie Le Hen wrote:
  I think this could be solved by an implicit linker script contained in
  .so and .a files, pointing to the real libraries.

  We have already the SHLIB_LDSCRIPT variable to accomplish this for .so
  files.  It may be possible to do the same for .a files, though this
  would require renaming the real archives to something like .a.0 (here
  I'm just taking a similar scheme to the one used for DSO).

  As a matter of fact, libprocstat.a would contain:

GROUP ( /usr/lib/libprocstat.a.0 /usr/lib/libkvm.a /usr/lib/libutil.a )

  Note that libkvm.a and libutil.a could be linker scripts as well.

  Kib, do you see any problem to this proposition?

 Wouldn't you need to completely rewrite the handling of the .a files
 in the share/mk ? I somewhat dislike the mere thought that .a is not
 an archive any longer.

 Does it make sense from the overhead and complexity POV, for such small
 goal ?

For base this may not make much sense since there is not much need to
run an old base binary against newer base libraries. Apart from that the
only advantage is some form of philosophical correctness.

For ports this may be a good way to keep static linking working after
fixing overlinking caused by libtool's dependency_libs in .la files.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Incorrect use of posix_memalign() (was: Re: svn commit: r243405 - in stable/9: include lib/libc/stdlib)

2012-11-25 Thread Jilles Tjoelker
On Thu, Nov 22, 2012 at 03:19:53PM +, Ed Schouten wrote:
 Author: ed
 Date: Thu Nov 22 15:19:53 2012
 New Revision: 243405
 URL: http://svnweb.freebsd.org/changeset/base/243405

 Log:
   MFC r229848:

 Add aligned_alloc(3).

 The C11 folks reinvented the wheel by introducing an aligned version of
 malloc(3) called aligned_alloc(3), instead of posix_memalign(3). Instead
 of returning the allocation by reference, it returns the address, just
 like malloc(3).

   I'm MFCing this now, as it seems aligned_alloc(3) is needed to make the
   new version of libc++ work, which was merged back to FreeBSD 9 in r243376.

The C11 committee knew about posix_memalign() and had several reasons
for creating a new function; see for example
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1397.htm .

In particular, posix_memalign() is a little annoying to use correctly,
often requiring a temporary variable of type void *. It is tempting to
do something like
  error = posix_memalign((void **)some_ptr, aln, sz);
and some FreeBSD code does this, but it violates strict-aliasing. A
further mostly theoretical objection is that assumes that the
representation of some_ptr and void * are compatible which C does not
guarantee.

The problem can be fixed by adding the temporary pointer variable like
  void *tmp_ptr;
  error = posix_memalign(tmp_ptr, aln, sz);
  some_ptr = tmp_ptr;
or by using aligned_alloc() instead of posix_memalign()
  some_ptr = aligned_alloc(aln, sz);
with error checking against some_ptr instead of error.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


[patch] rtld: fix fd leak with parallel dlopen and fork/exec

2012-11-04 Thread Jilles Tjoelker
);
if (error != 0)
errc(1, error, posix_spawnp);
do
wpid = waitpid(pid, status, 0);
while (wpid == -1  errno == EINTR);
if (wpid == -1)
err(1, waitpid);
if (status != 0)
errx(1, sh -c failed);
}
}

int
main(int argc, char *argv[])
{
pthread_t td;
int error;

if (argc != 2)
{
fprintf(stderr, Usage: %s dso\n, argv[0]);
return 1;
}

error = pthread_create(td, NULL, dlopen_thread, argv[1]);
if (error != 0)
errc(1, error, pthread_create);

filestat_loop();

return 0;
}

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: -lpthread vs -pthread: does -D_REENTRANT matter?

2012-10-21 Thread Jilles Tjoelker
On Sun, Oct 14, 2012 at 04:55:07PM -0400, Eitan Adler wrote:
 On 14 October 2012 10:42, Jilles Tjoelker jil...@stack.nl wrote:

  Because C99 does not specify threading, it allows these transformations.
  In C11, they are forbidden. Passing -pthread disables them as well.

 Is the man page wrong or do I misunderstand?

This option sets flags for both the preprocessor and linker.  It
does not affect the thread safety of object code produced by the
compiler or that of libraries supplied with it.

OK, that would explain why I could not find such things.

I seem to recall GCC must sometimes be instructed not to do
thread-unsafe transformations but I cannot find how to do so.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: -lpthread vs -pthread: does -D_REENTRANT matter?

2012-10-14 Thread Jilles Tjoelker
On Mon, Oct 08, 2012 at 12:17:08PM -0400, Eitan Adler wrote:
 The only difference between -lpthread and -pthread that I could see is
 that the latter also sets -D_REENTRANT.
 However, I can't find any uses of _REENTRANT anywhere outside of a few
 utilities that seem to define it manually.

 Testing with various manually written pthread programs resulted in
 identical binaries, let alone identical results.

 Is there an actual difference between -pthread and -lpthread or is
 this just a historical artifact?

In some cases, -pthread also affects the compiler's code generation. On
some RISC architectures, compilers may try to avoid loads and stores of
less than 32 bits.

For example (untested):
  struct { int n; char a, b, c, d; } *p;
  p-a = p-b = p-c = 0;

The compiler might load p-d and then store the 32 bits containing a, b,
c and d at once. This causes a race condition if p-d is written
concurrently.

Because C99 does not specify threading, it allows these transformations.
In C11, they are forbidden. Passing -pthread disables them as well.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Change vfork() to posix_spawn()?

2012-09-14 Thread Jilles Tjoelker
On Fri, Sep 14, 2012 at 01:45:49PM +0200, Erik Cederstrand wrote:
 Den 14/09/2012 kl. 13.03 skrev Ivan Voras ivo...@freebsd.org:
  On 14/09/2012 09:49, Erik Cederstrand wrote:

  I'm looking through the Clang Analyzer scans on
  http://scan.freebsd.your.org/freebsd-head looking for false
  positives to report back to LLVM. There are quite a list of reports
  suggesting to change vfork() calls to posix_spawn(). Example from
  /bin/rpc:
  http://scan.freebsd.your.org/freebsd-head/bin.rcp/2012-09-12-amd64/report-nsOV80.html#EndPath

  I know nothing about this but I can see fork and posix_spawn have
  been discussed on this list previously. Is this a legitimate
  warning (in this case and in general in FreeBSD base)?

  Currently (on 9-stable at least), posix_spawn() is implemented as a
  wrapper around vfork(), so I doubt replacing one with the other
  would do much.

vfork() returns twice, possibly reanimating variables from the dead.
Calling posix_spawn() limits this issue to the posix_spawn()
implementation only.

For example, in case of unwilling compiler developers, the optimization
level for that file might be lowered or more volatile keywords might be
added. I think it makes more sense to disable various optimizations in
the compiler automatically in functions that call vfork(), longjmp() and
similar functions, but I do not decide what compiler developers do.

 The analyzer added this warning in January. The release notes link to
 this explanation:
 https://www.securecoding.cert.org/confluence/display/seccode/POS33-C.+Do+not+use+vfork()

 I guess this is the important part:

 Because of the implementation of the vfork() function, the parent
 process is suspended while the child process executes. If a user sends
 a signal to the child process, delaying its execution, the parent
 process (which is privileged) is also blocked. This means that an
 unprivileged process can cause a privileged process to halt, which is
 a privilege inversion resulting in a denial of service.

This problem only occurs if privileges are dropped between vfork and
exec, which is uncommon. If no privileges are dropped, the user can
affect the parent directly.

Furthermore, this exact problem does not happen in FreeBSD because child
processes between vfork and exec/exit are not affected by stop signals
(this is stronger than the vfork(2) man page documents).

However, related issues are still present. If there is a signal handler
that blocks for a long time (many functions which do this are
async-signal-safe) for a signal permitted by
security.bsd.conservative_signals, an unprivileged user will be able to
trigger it and delay the thread calling vfork(). A function may also be
async-signal-safe but not suitable for a vforked child (for example,
libthr makes many functions async-signal-safe by postponing signal
handlers which is not good enough if a vforked child is SIGKILL'ed).

An unprivileged user may also trigger priority inversion by lowering the
priority of the child process and consuming CPU time at a higher
priority.

Obviously, the child process should not lower its priority voluntarily
either.

These problems can be fixed in various ways. The direct priority
inversion problem can be fixed by using fork() instead in that case or
by adding a priority inheritance scheme in the kernel for vforked
children (but only for the static priority; the parent's dynamic
priority will increase because it is sleeping).

The privilege manipulation available via POSIX_SPAWN_RESETIDS seems safe
enough. Since it only affects the effective UID/GID, it does not affect
the ability to modify scheduling parameters (real UID) or to send
signals (real or saved UID). Since the seteuid() call itself will set
the issetugid flag to true if it changed anything, it does not affect
the ability to debug before exec.

More general privilege dropping frequently involves frameworks such as
PAM which are not async-signal-safe and certainly not vfork-safe.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: [patch] libc: Do not export .cerror

2012-08-31 Thread Jilles Tjoelker
On Tue, Aug 28, 2012 at 02:03:22PM +0300, Konstantin Belousov wrote:
 On Sat, Aug 25, 2012 at 12:16:55AM +0200, Jilles Tjoelker wrote:
  Not exporting .cerror causes it to be jumped to directly instead of via
  the PLT.

  The below patch is for i386 only and also takes advantage of .cerror's
  new status by not saving and loading %ebx before jumping to it.
  (Therefore, .cerror now saves and loads %ebx itself.) Where there was a
  conditional jump to a jump to .cerror, the conditional jump has been
  changed to jump to .cerror directly (many modern CPUs don't do static
  prediction and in any case it is not much of a benefit anyway).

 Why do you need to save/restore the %ebx at all ? %ebx ==
 __GLOBAL_OFFSET_TABLE__ is only needed when you access GOT, but .cerror
 only works with PLT, which is addressed using the instruction capable of
 relative addressing. The old .cerror does not need it as well, but it is
 just engraved in the function ABI.

On i386, a shared object's PLT entry needs %ebx set up to work properly.
This is because such a PLT entry needs to access the GOT to find the
address to jump to (the first instruction is jmp *d32(%ebx)).

An executable's PLT entry accesses the GOT via absolute addressing and
therefore does not need %ebx.

  The patch decreases the size of libc.so.7 by a few kilobytes.

  Similar changes could be made to other architectures, and there may be
  more symbols that are exported but need not be.
 Sure, would you handle at least amd64 too ?

The below patch handles amd64.

I'm a bit annoyed that most of the syscall stubs are 17 bytes long now
and have the maximum 15 bytes of padding. This means that the patch
provides virtually no gain in code size.

Index: lib/libc/amd64/Symbol.map
===
--- lib/libc/amd64/Symbol.map   (revision 239865)
+++ lib/libc/amd64/Symbol.map   (working copy)
@@ -66,7 +66,6 @@
.curbrk;
.minbrk;
_brk;
-   .cerror;
_end;
__sys_vfork;
_vfork;
Index: lib/libc/amd64/SYS.h
===
--- lib/libc/amd64/SYS.h(revision 239865)
+++ lib/libc/amd64/SYS.h(working copy)
@@ -36,38 +36,20 @@
 #include sys/syscall.h
 #include machine/asm.h
 
-#ifdef PIC
 #defineRSYSCALL(x) ENTRY(__CONCAT(__sys_,x));  
\
.weak CNAME(x); \
.set CNAME(x),CNAME(__CONCAT(__sys_,x));\
.weak CNAME(__CONCAT(_,x)); \
.set CNAME(__CONCAT(_,x)),CNAME(__CONCAT(__sys_,x)); \
-   mov __CONCAT($SYS_,x),%rax; KERNCALL; jb 2f; ret; \
-   2: movq PIC_GOT(HIDENAME(cerror)),%rcx; jmp *%rcx; \
+   mov __CONCAT($SYS_,x),%eax; KERNCALL;   \
+   jb HIDENAME(cerror); ret;   \
END(__CONCAT(__sys_,x))
 
 #definePSEUDO(x)   ENTRY(__CONCAT(__sys_,x));  
\
.weak CNAME(__CONCAT(_,x)); \
.set CNAME(__CONCAT(_,x)),CNAME(__CONCAT(__sys_,x)); \
-   mov __CONCAT($SYS_,x),%rax; KERNCALL; jb 2f; ret ; \
-   2: movq PIC_GOT(HIDENAME(cerror)),%rcx; jmp *%rcx; \
+   mov __CONCAT($SYS_,x),%eax; KERNCALL;   \
+   jb HIDENAME(cerror); ret;   \
END(__CONCAT(__sys_,x))
-#else
-#defineRSYSCALL(x) ENTRY(__CONCAT(__sys_,x));  
\
-   .weak CNAME(x); \
-   .set CNAME(x),CNAME(__CONCAT(__sys_,x));\
-   .weak CNAME(__CONCAT(_,x)); \
-   .set CNAME(__CONCAT(_,x)),CNAME(__CONCAT(__sys_,x)); \
-   mov __CONCAT($SYS_,x),%rax; KERNCALL; jb 2f; ret; \
-   2: jmp HIDENAME(cerror);\
-   END(__CONCAT(__sys_,x))
 
-#definePSEUDO(x)   ENTRY(__CONCAT(__sys_,x));  
\
-   .weak CNAME(__CONCAT(_,x)); \
-   .set CNAME(__CONCAT(_,x)),CNAME(__CONCAT(__sys_,x)); \
-   mov __CONCAT($SYS_,x),%rax; KERNCALL; jb 2f; ret; \
-   2: jmp HIDENAME(cerror);\
-   END(__CONCAT(__sys_,x))
-#endif
-
 #define KERNCALL   movq %rcx, %r10; syscall
Index: lib/libc/amd64/gen/rfork_thread.S
===
--- lib/libc/amd64/gen/rfork_thread.S   (revision 239865)
+++ lib/libc/amd64/gen/rfork_thread.S   (working copy)
@@ -93,12 +93,7 @@
 2:
popq%r12

Re: system() using vfork() or posix_spawn() and libthr

2012-08-26 Thread Jilles Tjoelker
On Sun, Aug 19, 2012 at 03:26:36PM +0300, Konstantin Belousov wrote:
 On Fri, Aug 17, 2012 at 06:08:47PM +0200, Jilles Tjoelker wrote:
  On Fri, Aug 17, 2012 at 03:43:12PM +0300, Konstantin Belousov wrote:
   On Fri, Aug 17, 2012 at 12:39:33AM +0200, Jilles Tjoelker wrote:
On Thu, Aug 16, 2012 at 02:44:26PM +0300, Konstantin Belousov wrote:
 BTW, returning to Jilles proposal, can we call vfork() only for
 single-threaded parent ? I think it gives good boost for 
 single-threaded
 case, and also eliminates the concerns of non-safety.

This would probably fix the safety issues but at a price. There is a
correlation between processes so large that they benefit greatly from
vfork and threaded processes.
   Ok, so I will continue with my patch.

On the other hand, I think direct calls to vfork() in applications are
risky and it may not be possible to support them safely in all
circumstances. However, if libc is calling vfork() such as via popen(),
system() or posix_spawn(), it should be possible even in a
multi-threaded process. In that case, the rtld and libthr problems can
be avoided by using aliases with hidden visibility for all functions the
vforked child needs to call (or any other method that prevents
interposition and hard-codes a displacement into libc.so).

   I do not see how using any aliases could help there. Basically, if mt
   process is not single-threaded for vfork, you can have both some parent
   thread and child enter rtld. This is complete mess.

  If libc calls a function with hidden visibility, this proceeds directly
  (not via the PLT) and rtld is not involved.
 Oh, so you are only concerned with libc there ?

 If spending so much efforts on this issue, IMO it is pity to only fix
 libc and not fix vfork for all consumers.

True. If vfork can be fixed more generally without too much penalty, it
is better to do that. And it looks like that is possible, including
detecting unexpected kills, see below.

  Several ways to do this are in section 2.2.7 Avoid Using Exported
  Symbols of Ulrich Drepper's dsohowto. One of them is something like

  extern __typeof(next) next_int
  __attribute((alias(next), visibility(hidden)));

  in the same source as the definition of the function

  int next(void) { ...; }

  As Drepper notes, the visibility attribute is not strictly required if a
  version script keeps the symbol local but it might lead to better code.
  At least on i386, though, the optimal direct near call instruction is
  generated even without it. For example, _nsdispatch() calls
  libc_dlopen() (kept local by the version script) without going through
  the PLT (from the output of objdump -dS on the libc.so.7 in /usr/obj).

 I am not confident, so this might be a hallucination, but 'optimization'
 in the recent gnu ld might rewrite  the call target to avoid PLT when
 symbol visibility is hidden.

If symbol visibility is hidden, rtld does not know about the symbol's
name; therefore, the only way ld can make it work is to make the call
instruction point directly at the called function, not to a PLT entry.

Different from what I wrote earlier, on i386, there is still a penalty
in omitting the visibility attribute: the compiler will generate code to
load %ebx which may not be necessary.

  In the assembler syscall stubs using code from lib/libc/arch/SYS.h this
  can be done by adding another .set (we currently have foo, _foo and
  __sys_foo for a syscall foo; some syscalls have only _foo and
  __sys_foo) such as __syshidden_foo. The ones that are actually used
  then need prototypes (similar to the __sys_* ones in lib/libthr/?).

  For some reason, the symbol .cerror (HIDENAME(cerror)) is also exported.
  Either this should be kept local or a local uninterposable alias should
  be added and used (as with the syscalls).

I sent a patch for .cerror (i386 only) in a new mailing list thread.

  The function __error() (to get errno's address for the current thread)
  is and must be called via the PLT (because libthr is separate).
  Therefore, we should ensure it is called at least once before vfork so
  calls in the child do not involve rtld. The implementations for the
  various architectures use the TLS register (or memory location for ARM),
  so they seem safe.

  This should suffice to fix posix_spawn() but the _execvpe() used by
  posix_spawnp also uses various string functions. If not all of these
  have already been called earlier, this will not work. Making calls to
  them not go through the PLT seems fairly hard, even though it would make
  sense in general, so perhaps I should instead reimplement it such that
  the parent does almost all of the work.

  An alternative is to write the core of posix_spawn() in assembler using
  system calls directly but I would rather avoid that :(

 BTW, we have very gross inefficiency in our libc syscall stubs on i386.
 Namely, the PIC_PROLOGUE generates 'call 1f; 1f:' sequence, which

[patch] libc: Do not export .cerror

2012-08-24 Thread Jilles Tjoelker
-   jmp PIC_PLT(HIDENAME(cerror))
 END(__sys_getcontext)
 
.section .note.GNU-stack,,%progbits
Index: lib/libc/i386/sys/cerror.S
===
--- lib/libc/i386/sys/cerror.S  (revision 239195)
+++ lib/libc/i386/sys/cerror.S  (working copy)
@@ -48,13 +48,14 @@
.globl  CNAME(__error)
.type   CNAME(__error),@function
 HIDENAME(cerror):
+#ifdef PIC
+   PIC_PROLOGUE
pushl   %eax
-#ifdef PIC
-   /* The caller must execute the PIC prologue before jumping to cerror. */
callPIC_PLT(CNAME(__error))
popl%ecx
PIC_EPILOGUE
 #else
+   pushl   %eax
callCNAME(__error)
popl%ecx
 #endif
Index: lib/libc/i386/sys/sbrk.S
===
--- lib/libc/i386/sys/sbrk.S(revision 239195)
+++ lib/libc/i386/sys/sbrk.S(working copy)
@@ -59,7 +59,7 @@
addl%eax,4(%esp)
mov $SYS_break,%eax
KERNCALL
-   jb  err
+   jb  HIDENAME(cerror)
PIC_PROLOGUE
movlPIC_GOT(HIDENAME(curbrk)),%edx
movl(%edx),%eax
@@ -67,9 +67,6 @@
PIC_EPILOGUE
 back:
ret
-err:
-   PIC_PROLOGUE
-   jmp PIC_PLT(HIDENAME(cerror))
 
 #else /* !PIC */
 
@@ -80,13 +77,11 @@
addl%eax,4(%esp)
mov $SYS_break,%eax
KERNCALL
-   jb  err
+   jb  HIDENAME(cerror)
movlHIDENAME(curbrk),%eax
addl%ecx,HIDENAME(curbrk)
 back:
ret
-err:
-   jmp HIDENAME(cerror)
 #endif /* PIC */
 END(sbrk)
 
Index: lib/libc/i386/sys/Ovfork.S
===
--- lib/libc/i386/sys/Ovfork.S  (revision 239195)
+++ lib/libc/i386/sys/Ovfork.S  (working copy)
@@ -50,8 +50,7 @@
jmp *%ecx
 1:
pushl   %ecx
-   PIC_PROLOGUE
-   jmp PIC_PLT(HIDENAME(cerror))
+   jmp HIDENAME(cerror)
 END(__sys_vfork)
 
.section .note.GNU-stack,,%progbits
Index: lib/libc/i386/sys/ptrace.S
===
--- lib/libc/i386/sys/ptrace.S  (revision 239195)
+++ lib/libc/i386/sys/ptrace.S  (working copy)
@@ -50,11 +50,8 @@
 #endif
mov $SYS_ptrace,%eax
KERNCALL
-   jb  err
+   jb  HIDENAME(cerror)
ret
-err:
-   PIC_PROLOGUE
-   jmp PIC_PLT(HIDENAME(cerror))
 END(ptrace)
 
.section .note.GNU-stack,,%progbits
Index: lib/libc/i386/sys/exect.S
===
--- lib/libc/i386/sys/exect.S   (revision 239195)
+++ lib/libc/i386/sys/exect.S   (working copy)
@@ -47,8 +47,7 @@
pushl   %edx
popf
KERNCALL
-   PIC_PROLOGUE
-   jmp PIC_PLT(HIDENAME(cerror))   /* exect(file, argv, env); */
+   jmp HIDENAME(cerror)/* exect(file, argv, env); */
 END(exect)
 
.section .note.GNU-stack,,%progbits
Index: lib/libc/i386/sys/syscall.S
===
--- lib/libc/i386/sys/syscall.S (revision 239195)
+++ lib/libc/i386/sys/syscall.S (working copy)
@@ -45,11 +45,8 @@
KERNCALL
push%ecx/* need to push a word to keep stack frame intact
   upon return; the word must be the return address. */
-   jb  1f
+   jb  HIDENAME(cerror)
ret
-1:
-   PIC_PROLOGUE
-   jmp PIC_PLT(HIDENAME(cerror))
 END(syscall)
 
.section .note.GNU-stack,,%progbits
-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: system() using vfork() or posix_spawn() and libthr

2012-08-17 Thread Jilles Tjoelker
On Fri, Aug 17, 2012 at 03:43:12PM +0300, Konstantin Belousov wrote:
 On Fri, Aug 17, 2012 at 12:39:33AM +0200, Jilles Tjoelker wrote:
  On Thu, Aug 16, 2012 at 02:44:26PM +0300, Konstantin Belousov wrote:
   BTW, returning to Jilles proposal, can we call vfork() only for
   single-threaded parent ? I think it gives good boost for single-threaded
   case, and also eliminates the concerns of non-safety.

  This would probably fix the safety issues but at a price. There is a
  correlation between processes so large that they benefit greatly from
  vfork and threaded processes.
 Ok, so I will continue with my patch.

  On the other hand, I think direct calls to vfork() in applications are
  risky and it may not be possible to support them safely in all
  circumstances. However, if libc is calling vfork() such as via popen(),
  system() or posix_spawn(), it should be possible even in a
  multi-threaded process. In that case, the rtld and libthr problems can
  be avoided by using aliases with hidden visibility for all functions the
  vforked child needs to call (or any other method that prevents
  interposition and hard-codes a displacement into libc.so).

 I do not see how using any aliases could help there. Basically, if mt
 process is not single-threaded for vfork, you can have both some parent
 thread and child enter rtld. This is complete mess.

If libc calls a function with hidden visibility, this proceeds directly
(not via the PLT) and rtld is not involved.

Several ways to do this are in section 2.2.7 Avoid Using Exported
Symbols of Ulrich Drepper's dsohowto. One of them is something like

extern __typeof(next) next_int
__attribute((alias(next), visibility(hidden)));

in the same source as the definition of the function

int next(void) { ...; }

As Drepper notes, the visibility attribute is not strictly required if a
version script keeps the symbol local but it might lead to better code.
At least on i386, though, the optimal direct near call instruction is
generated even without it. For example, _nsdispatch() calls
libc_dlopen() (kept local by the version script) without going through
the PLT (from the output of objdump -dS on the libc.so.7 in /usr/obj).

In the assembler syscall stubs using code from lib/libc/arch/SYS.h this
can be done by adding another .set (we currently have foo, _foo and
__sys_foo for a syscall foo; some syscalls have only _foo and
__sys_foo) such as __syshidden_foo. The ones that are actually used
then need prototypes (similar to the __sys_* ones in lib/libthr/?).

For some reason, the symbol .cerror (HIDENAME(cerror)) is also exported.
Either this should be kept local or a local uninterposable alias should
be added and used (as with the syscalls).

The function __error() (to get errno's address for the current thread)
is and must be called via the PLT (because libthr is separate).
Therefore, we should ensure it is called at least once before vfork so
calls in the child do not involve rtld. The implementations for the
various architectures use the TLS register (or memory location for ARM),
so they seem safe.

This should suffice to fix posix_spawn() but the _execvpe() used by
posix_spawnp also uses various string functions. If not all of these
have already been called earlier, this will not work. Making calls to
them not go through the PLT seems fairly hard, even though it would make
sense in general, so perhaps I should instead reimplement it such that
the parent does almost all of the work.

An alternative is to write the core of posix_spawn() in assembler using
system calls directly but I would rather avoid that :(

  There may still be a problem in programs that install signal handlers
  because the set of async-signal-safe functions is larger than what can
  be done in a vforked child. Userland can avoid this by masking affected
  signals before calling vfork() and resetting them to SIG_DFL before
  unmasking them. This will add many syscalls if the code does not know
  which signals are affected (such as libc). Alternatively, the kernel
  could map caught signals to the default action for processes with
  P_PPWAIT (just like it does not stop such processes because of signals
  or TTY job control).

 If rtld does not work, then any library function call from a signal handler
 is problematic. My goal is to get a working rtld and possibly malloc.

 BTW, not quite related, it seems that the placement of openat,
 setcontext and swapcontext in the pthread.map is wrong. openat is
 at FBSD_1.1 in libc, and *context are at FBSD_1.0 version, while
 libthr exports them at 1.2. App or library gets linked to arbitrary
 version depending on whether libphread was specified at link time, and
 interposition from libthr does not work.

Oops :( Can this still be fixed (like by exporting identical functions
in multiple versions)?

 Below is the latest version of my patch for vfork, which survives (modified)
 tools/test/pthread_vfork_test. Patch is only for x86 right now.

Why does

Re: system() using vfork() or posix_spawn() and libthr

2012-08-16 Thread Jilles Tjoelker
On Thu, Aug 16, 2012 at 02:44:26PM +0300, Konstantin Belousov wrote:
 My point is that the fact that fork() is called from dynamic context
 that was identified as the signal handler does not mean anything.
 It can be mis-identified for many reasons, which both me and you
 tried to partially enumerate above.

 The really important thing is the atomicity of the current context,
 e.g. synchronous SIGSEGV caused by a language runtime GC is very
 much safe place to call atfork handlers, since runtimes usually cause
 signal generations only at the safe place.

 I do not think that such approach as you described can be completed,
 the _Fork() seems the only robust way.

Agreed, that way (detecting signal handler) lies madness.

If need be, _Fork() is easily implemented and used.

 BTW, returning to Jilles proposal, can we call vfork() only for
 single-threaded parent ? I think it gives good boost for single-threaded
 case, and also eliminates the concerns of non-safety.

This would probably fix the safety issues but at a price. There is a
correlation between processes so large that they benefit greatly from
vfork and threaded processes.

On the other hand, I think direct calls to vfork() in applications are
risky and it may not be possible to support them safely in all
circumstances. However, if libc is calling vfork() such as via popen(),
system() or posix_spawn(), it should be possible even in a
multi-threaded process. In that case, the rtld and libthr problems can
be avoided by using aliases with hidden visibility for all functions the
vforked child needs to call (or any other method that prevents
interposition and hard-codes a displacement into libc.so).

There may still be a problem in programs that install signal handlers
because the set of async-signal-safe functions is larger than what can
be done in a vforked child. Userland can avoid this by masking affected
signals before calling vfork() and resetting them to SIG_DFL before
unmasking them. This will add many syscalls if the code does not know
which signals are affected (such as libc). Alternatively, the kernel
could map caught signals to the default action for processes with
P_PPWAIT (just like it does not stop such processes because of signals
or TTY job control).

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: system() using vfork() or posix_spawn() and libthr

2012-08-14 Thread Jilles Tjoelker
On Tue, Aug 14, 2012 at 11:15:06PM +0800, David Xu wrote:
 But in real word, pthread atfork handlers are not async-signal safe,
 they mostly do mutex locking and unlocking to keep consistent state,
 mutex is not async-signal safe.
 The malloc prefork and postfork handlers happen to work because I have
 some hacking code in library for malloc locks. Otherwise, you even
 can not use fork() in signal handler.

This problem was also reported to the Austin Group at
http://austingroupbugs.net/view.php?id=62

Atfork handlers are inherently async-signal-unsafe.

An interpretation was issued suggesting to remove fork() from the list
of async-signal-safe functions and add a new async-signal-safe function
_Fork() which does not call the atfork handlers.

This change will however not be in POSIX.1-2008 TC1 but only in the next
issue (SUSv5).

A slightly earlier report http://austingroupbugs.net/view.php?id=18 just
requested the _Fork() function because an existing application
deadlocked when calling fork() from a signal handler.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: system() using vfork() or posix_spawn() and libthr

2012-08-11 Thread Jilles Tjoelker
On Fri, Aug 10, 2012 at 10:16:04AM +0800, David Xu wrote:
 On 2012/08/09 18:56, Jilles Tjoelker wrote:
  On Mon, Aug 06, 2012 at 11:25:35AM +0300, Konstantin Belousov wrote:
  On Sun, Aug 05, 2012 at 11:54:32PM +0200, Jilles Tjoelker wrote:
  On Mon, Jul 30, 2012 at 01:53:03PM +0300, Konstantin Belousov wrote:
  On Mon, Jul 30, 2012 at 12:24:08PM +0200, Jilles Tjoelker wrote:
  People sometimes use system() from large address spaces where it would
  improve performance greatly to use vfork() instead of fork().

  A simple approach is to change fork() to vfork(), although I have not
  tried this. It seems safe enough to use sigaction and sigprocmask system
  calls in the vforked process.

  Alternatively, we can have posix_spawn() do the vfork() with signal
  changes. This avoids possible whining from compilers and static
  analyzers about using vfork() in system.c. However, I do not like the
  tricky code for signals and that it adds lines of code.

  This is lightly tested.

  It is interesting to note that for some time our vfork(2) no longer
  stops the whole forked process (parent), only the forking thread is
  waiting for the child exit or exec. I am not sure is this point
  important for system(3), but determined code can notice the difference
  from the fork-vfork switch.

  Neither fork nor vfork call thread_single(SINGLE_BOUNDARY), so this is
  not a difference.
  It is the difference, because vforked child shares parent address space.

  Thread singling may be noticeable from a failing execve() (but only in
  the process doing execve()) and in the rare case of rfork() without
  RFPROC.

  No, other running threads in parent affect vforked child till exec or exit.
  In fact, I would classify this as bug, but not a serious one.

  There are some ugly ways this parallel execution is depended on. If the
  vforked child calls sigaction() while another thread is also in
  sigaction() for that signal, the vforked child needs to wait for the
  other thread to release the lock.

  This uses a per-process lock to synchronize threads in different
  processes, which may not work properly.

  If the vforked child is killed (such as by SIGKILL) while holding the
  lock, the parent is not killed but its _thr_sigact is damaged.

  These problems could be avoided in libthr by skipping the lock in
  _sigaction() if a signal action is being set to SIG_DFL or SIG_IGN and
  the old action is not queried. In those cases, _thr_sigact is not
  touched so no lock is required. This change also helps applications,
  provided they call sigaction() and not signal().

  Alternatively, posix_spawn() and system() could use the sigaction system
  call directly, bypassing libthr (if present). However, this will not
  help applications that call vfork() and sigaction() themselves (such as
  a shell that wants to implement ...  using vfork()).

  posix_spawn() likely still needs some adjustment so that having it reset
  all signals (sigfillset()) to the default action will not cause it to
  [EINVAL] because libthr does not allow changing SIGTHR's disposition.

  Index: lib/libthr/thread/thr_sig.c
  ===
  --- lib/libthr/thread/thr_sig.c (revision 238970)
  +++ lib/libthr/thread/thr_sig.c (working copy)
  @@ -519,8 +519,16 @@
  return (-1);
  }
 
  -   if (act)
  +   if (act) {
  newact = *act;
  +   /*
  +* Short-circuit cases where we do not touch _thr_sigact.
  +* This allows performing these safely in a vforked child.
  +*/
  +   if ((newact.sa_handler == SIG_DFL ||
  +   newact.sa_handler == SIG_IGN)  oact == NULL)
  +   return (__sys_sigaction(sig,newact, NULL));
  +   }
 
  __sys_sigprocmask(SIG_SETMASK,_thr_maskset,oldset);
  _thr_rwl_wrlock(_thr_sigact[sig-1].lock);
 

 Your patch is better than nothing, I don't object.
 The problem is visble to you, but there is also invisible user - rtld.
 If a symbol is never used in parent process, but now it is used
 in a vforked child, the rtld will be involved, if the child
 is killed, the rtld's data structure may be in inconsistent state,
 such as locking or link list etcs...
 I think this problem might be a non-fixable problem.

Hmm. Rtld cannot be fixed like libthr because its data structures are
inherently in userland.

Perhaps signal handling should be different for a vforked child, like
the default action of a signal sent to a thread affects the entire
process and not just the thread. This cannot be implemented in the
calling code because resolving execve() itself also needs rtld (ugly
hacks like performing an execve() call that is guaranteed to fail
aside).

The rtld problem can be avoided specifically by linking with '-z now'.
This might be acceptable for sh and csh; most applications can use
posix_spawn() which would have to become a system call.

-- 
Jilles Tjoelker

Re: system() using vfork() or posix_spawn() and libthr

2012-08-09 Thread Jilles Tjoelker
On Mon, Aug 06, 2012 at 11:25:35AM +0300, Konstantin Belousov wrote:
 On Sun, Aug 05, 2012 at 11:54:32PM +0200, Jilles Tjoelker wrote:
  On Mon, Jul 30, 2012 at 01:53:03PM +0300, Konstantin Belousov wrote:
   On Mon, Jul 30, 2012 at 12:24:08PM +0200, Jilles Tjoelker wrote:
People sometimes use system() from large address spaces where it would
improve performance greatly to use vfork() instead of fork().

A simple approach is to change fork() to vfork(), although I have not
tried this. It seems safe enough to use sigaction and sigprocmask system
calls in the vforked process.

Alternatively, we can have posix_spawn() do the vfork() with signal
changes. This avoids possible whining from compilers and static
analyzers about using vfork() in system.c. However, I do not like the
tricky code for signals and that it adds lines of code.

This is lightly tested.

   It is interesting to note that for some time our vfork(2) no longer
   stops the whole forked process (parent), only the forking thread is
   waiting for the child exit or exec. I am not sure is this point
   important for system(3), but determined code can notice the difference
   from the fork-vfork switch.

  Neither fork nor vfork call thread_single(SINGLE_BOUNDARY), so this is
  not a difference.
 It is the difference, because vforked child shares parent address space.

  Thread singling may be noticeable from a failing execve() (but only in
  the process doing execve()) and in the rare case of rfork() without
  RFPROC.

 No, other running threads in parent affect vforked child till exec or exit.
 In fact, I would classify this as bug, but not a serious one.

There are some ugly ways this parallel execution is depended on. If the
vforked child calls sigaction() while another thread is also in
sigaction() for that signal, the vforked child needs to wait for the
other thread to release the lock.

This uses a per-process lock to synchronize threads in different
processes, which may not work properly.

If the vforked child is killed (such as by SIGKILL) while holding the
lock, the parent is not killed but its _thr_sigact is damaged.

These problems could be avoided in libthr by skipping the lock in
_sigaction() if a signal action is being set to SIG_DFL or SIG_IGN and
the old action is not queried. In those cases, _thr_sigact is not
touched so no lock is required. This change also helps applications,
provided they call sigaction() and not signal().

Alternatively, posix_spawn() and system() could use the sigaction system
call directly, bypassing libthr (if present). However, this will not
help applications that call vfork() and sigaction() themselves (such as
a shell that wants to implement ...  using vfork()).

posix_spawn() likely still needs some adjustment so that having it reset
all signals (sigfillset()) to the default action will not cause it to
[EINVAL] because libthr does not allow changing SIGTHR's disposition.

Index: lib/libthr/thread/thr_sig.c
===
--- lib/libthr/thread/thr_sig.c (revision 238970)
+++ lib/libthr/thread/thr_sig.c (working copy)
@@ -519,8 +519,16 @@
return (-1);
}
 
-   if (act)
+   if (act) {
newact = *act;
+   /*
+* Short-circuit cases where we do not touch _thr_sigact.
+* This allows performing these safely in a vforked child.
+*/
+   if ((newact.sa_handler == SIG_DFL ||
+   newact.sa_handler == SIG_IGN)  oact == NULL)
+   return (__sys_sigaction(sig, newact, NULL));
+   }
 
__sys_sigprocmask(SIG_SETMASK, _thr_maskset, oldset);
_thr_rwl_wrlock(_thr_sigact[sig-1].lock);

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: system() using vfork() or posix_spawn()

2012-08-05 Thread Jilles Tjoelker
On Mon, Jul 30, 2012 at 01:53:03PM +0300, Konstantin Belousov wrote:
 On Mon, Jul 30, 2012 at 12:24:08PM +0200, Jilles Tjoelker wrote:
  People sometimes use system() from large address spaces where it would
  improve performance greatly to use vfork() instead of fork().

  A simple approach is to change fork() to vfork(), although I have not
  tried this. It seems safe enough to use sigaction and sigprocmask system
  calls in the vforked process.

  Alternatively, we can have posix_spawn() do the vfork() with signal
  changes. This avoids possible whining from compilers and static
  analyzers about using vfork() in system.c. However, I do not like the
  tricky code for signals and that it adds lines of code.

  This is lightly tested.

 It is interesting to note that for some time our vfork(2) no longer
 stops the whole forked process (parent), only the forking thread is
 waiting for the child exit or exec. I am not sure is this point
 important for system(3), but determined code can notice the difference
 from the fork-vfork switch.

Neither fork nor vfork call thread_single(SINGLE_BOUNDARY), so this is
not a difference.

Thread singling may be noticeable from a failing execve() (but only in
the process doing execve()) and in the rare case of rfork() without
RFPROC.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


system() using vfork() or posix_spawn()

2012-07-30 Thread Jilles Tjoelker
People sometimes use system() from large address spaces where it would
improve performance greatly to use vfork() instead of fork().

A simple approach is to change fork() to vfork(), although I have not
tried this. It seems safe enough to use sigaction and sigprocmask system
calls in the vforked process.

Alternatively, we can have posix_spawn() do the vfork() with signal
changes. This avoids possible whining from compilers and static
analyzers about using vfork() in system.c. However, I do not like the
tricky code for signals and that it adds lines of code.

This is lightly tested.

Index: lib/libc/stdlib/system.c
===
--- lib/libc/stdlib/system.c(revision 238371)
+++ lib/libc/stdlib/system.c(working copy)
@@ -42,16 +42,21 @@
 #include unistd.h
 #include paths.h
 #include errno.h
+#include spawn.h
 #include un-namespace.h
 #include libc_private.h
 
+extern char **environ;
+
 int
 __system(const char *command)
 {
pid_t pid, savedpid;
-   int pstat;
+   int error, pstat;
struct sigaction ign, intact, quitact;
-   sigset_t newsigblock, oldsigblock;
+   sigset_t newsigblock, oldsigblock, defmask;
+   const char *argv[4];
+   posix_spawnattr_t attr;
 
if (!command)   /* just checking... */
return(1);
@@ -65,28 +70,36 @@
ign.sa_flags = 0;
(void)_sigaction(SIGINT, ign, intact);
(void)_sigaction(SIGQUIT, ign, quitact);
+   (void)sigemptyset(defmask);
+   if ((intact.sa_flags  SA_SIGINFO) != 0 ||
+   intact.sa_handler != SIG_IGN)
+   (void)sigaddset(defmask, SIGINT);
+   if ((quitact.sa_flags  SA_SIGINFO) != 0 ||
+   quitact.sa_handler != SIG_IGN)
+   (void)sigaddset(defmask, SIGQUIT);
(void)sigemptyset(newsigblock);
(void)sigaddset(newsigblock, SIGCHLD);
(void)_sigprocmask(SIG_BLOCK, newsigblock, oldsigblock);
-   switch(pid = fork()) {
-   case -1:/* error */
-   break;
-   case 0: /* child */
-   /*
-* Restore original signal dispositions and exec the command.
-*/
-   (void)_sigaction(SIGINT, intact, NULL);
-   (void)_sigaction(SIGQUIT,  quitact, NULL);
-   (void)_sigprocmask(SIG_SETMASK, oldsigblock, NULL);
-   execl(_PATH_BSHELL, sh, -c, command, (char *)NULL);
-   _exit(127);
-   default:/* parent */
+   argv[0] = sh;
+   argv[1] = -c;
+   argv[2] = command;
+   argv[3] = NULL;
+   if ((error = posix_spawnattr_init(attr)) != 0 ||
+   (error = posix_spawnattr_setsigmask(attr, oldsigblock)) != 0 ||
+   (error = posix_spawnattr_setsigdefault(attr, defmask)) != 0 ||
+   (error = posix_spawnattr_setflags(attr,
+   POSIX_SPAWN_SETSIGDEF | POSIX_SPAWN_SETSIGMASK)) != 0 ||
+   (error = posix_spawn(pid, _PATH_BSHELL, NULL, attr,
+   __DECONST(char **, argv), environ)) != 0) {
+   pid = -1;
+   errno = error;
+   } else {
savedpid = pid;
do {
pid = _wait4(savedpid, pstat, 0, (struct rusage *)0);
} while (pid == -1  errno == EINTR);
-   break;
}
+   posix_spawnattr_destroy(attr);
(void)_sigaction(SIGINT, intact, NULL);
(void)_sigaction(SIGQUIT,  quitact, NULL);
(void)_sigprocmask(SIG_SETMASK, oldsigblock, NULL);

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: [RFC] last(1) with security.bsd.see_other_uids support

2012-06-06 Thread Jilles Tjoelker
On Wed, Jun 06, 2012 at 01:20:12PM +0200, Pawel Jakub Dawidek wrote:
 On Tue, Jun 05, 2012 at 11:31:01PM +0200, Jilles Tjoelker wrote:
  Also, the attack surface of such a daemon may be smaller than that of a
  setuid/setgid program.

 Really? I don't see that. With current patch and setgid to utmp the
 process can only read some files that don't even contain very sensitive
 data (like passwords).

 Any privileged daemon is much bigger threat. Also, do we really want a
 daemon running all the time just to be able to parse utx files?

The daemon would run with non-root privileges just sufficient to read
the utmpx files.

If we have a good way to start it, the attack surface is limited to what
you can do with its socket and this can be cut down tightly.

On the other hand, an attacker can control various process attributes of
a setgid program such as the output file, a subset of signals, rlimits
and a subset of environment variables. For example, last, w and who have
some degree of locale support (time/date formats). Also, in this
particular case, dropping privileges does not help much since the utmpx
file descriptor is almost as valuable as the group credentials.

I agree that leaving a daemon running for this is ugly.

  Alternatively, the daemon could be a setgid program that is spawned by
  the utmpx APIs when needed.

 Still seems a bit too far for my taste. Spawning a daemon somewhere from
 within library doesn't sound like a good idea to me... At least until we
 have something like launchd that can start such services on demand.

The suggested approach is used by old implementations of grantpt(). If
the kernel does not set up the ownership of a new pseudo terminal
properly, grantpt() can invoke a setuid root binary, for example
/usr/libexec/pt_chown.

Similarly, the utmpx APIs might invoke a setgid helper if they cannot
read the files themselves. Communication would be over a pipe.

This has the downside of having a setgid program at all but things like
locale support are handled by the calling (unprivileged) application and
not by the setgid program.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: [RFC] last(1) with security.bsd.see_other_uids support

2012-06-05 Thread Jilles Tjoelker
On Sun, Jun 03, 2012 at 08:42:04PM -0500, Bryan Drewery wrote:
 I've written up a patch to add some privacy to last(1) while still
 giving non-privileged users access to their own login history.

 This is still a work in progress. I am reaching out to make sure my
 approach is proper and to get some input on code sharing. My goal is to
 add this support to w(1) and who(1) as well. FWIW I have been running a
 similar patch on my own shared-hosting systems (pre-utmpx) for a few years.

 Changes:

 * Added utmp group
 * All utmpx files are 660 root:utmp

The utmp group was previously (on other systems) used to allow
applications write access to utmp without making them setuid root
(making them setgid utmp instead).

 * last(1) runs setgid(utmp) and drops this as soon as the utmpx files
 are opened.
 * Users in the wheel or utmp group can see all entries
 * IFF security.bsd.see_other_uids=0: users only see their own entries,
 as well as shutdown/boot/init times.
 * If security.bsd.see_other_uids=1, all entries are always shown, as it
 does now.

 Justifications:

 Why the changes? This makes sense for shared hosting environments where
 jails are not practical. A user should be able to see their own login
 history, to see if someone has been accessing their account, but not to
 see the IPs of other users. The intention is *not* to disallow them to
 see that other users of the system. Obviously they can just cat
 /etc/passwd. This is just about IP privacy.

 Why the setgid? Allow reading the entries, but disallow directly opening
 the utx files. I've seen some shared hosts incorrectly chmod 0
 /usr/bin/last, but still leave the utmp files wide open for reading!

 Why the utmp group? It's consistent with other systems (OpenBSD, Linux),
 and allows giving a user access to see all entries, without granting
 them wheel or allowing a non-privileged user to run as setgid(wheel). It
 also helps mitigates security concerns by using a specific group only
 having extra privilege to utmpx files.

 I originally had not planned for security.bsd.see_other_uids, but
 considering POLA and consistency, it makes sense.

This requires every utmpx access to go through a setgid binary,
regardless of the value of security.bsd.see_other_uids. If called by a
user that is not root or in the utmp group, getutxent(3) and related
APIs become worthless.

While POSIX permits this (security restriction denying all visibility of
utmpx), this is not what applications expect. For example, tcsh and zsh
offer a watch feature that reports on logins and logouts by calling
utmpx APIs.

To avoid this, the utmpx APIs could communicate with a privileged daemon
if the files are not readable. The daemon can check the identity of the
caller via getpeereid(3). (Unfortunately, even if getpeereid() is
bypassed and LOCAL_PEERCRED called directly, only 16 groups can be
queried. Therefore the daemon cannot check the process credential for
the groups but will have to check the group database for the user.)

Also, the attack surface of such a daemon may be smaller than that of a
setuid/setgid program.

Alternatively, the daemon could be a setgid program that is spawned by
the utmpx APIs when needed.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: mmap segmentation fault

2012-04-12 Thread Jilles Tjoelker
On Fri, Apr 13, 2012 at 12:25:32AM +0530, Maninya M wrote:
 I want to allocate memory at a specified address location 'a' of size 'b'.
 I wrote code below to do it, but I'm getting a seg fault. How can I solve
 this?
 How can I get the allocated memory at the required address?

 int main()
 {
 unsigned int *addr,*newaddr;
 unsigned long a=134516736,a1;
 unsigned long b=3895296;
 unsigned long flags =6;
 a1=(a0x);
 printf(%x\n,(void *)a);
 newaddr=(unsigned int *)mmap((void *)a,b,6,MAP_ANONYMOUS|MAP_FIXED,-1,0);

 if(newaddr==MAP_FAILED)
 printf(mmap failed);
 else
 printf(sucess %x,newaddr);
 return 0;
 }

 Output is
 8049000
 Segmentation fault

If this is i386, you're mapping onto the executable itself. If you
really want to map something there, you will have to move your code
somewhere else or manipulate your executable to contain a suitable
memory area at the required address.

Try, for example,
  procstat -v $$

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: a sysctl for process binary osreldate

2012-03-17 Thread Jilles Tjoelker
On Sat, Mar 17, 2012 at 09:30:05PM +0200, Mikolaj Golub wrote:
 I added osrel output to procstat -b option:

 kopusha:~% procstat -b 2975
   PID COMMOSREL PATH
  2975 emacs 101 /usr/local/bin/emacs-23.3

 Would this be ok or someone see a better way?

Hmm, this means that procstat is not supposed to be used from scripts as
it is apparently OK to change its output format like this?

In some ways, querying via ps would be better for scripts since it
allows things like
  ps -p PID -o KEYWORD=
which do not need additional parsing except that many of the newer
things in procstat do not have ps keywords.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: xargs short-circuit

2012-02-14 Thread Jilles Tjoelker
On Tue, Feb 14, 2012 at 01:34:49PM -0500, Matthew Story wrote:
 After reading the man-page, and browsing around the internet for a minute,
 I was just wondering if there is an option in (any) xargs to short-circuit
 on first failure of [utility [arguments]].

 e.g.

 $ jot - 1 10 | xargs -e -n1 sh -c 'echo $*; echo exit 1' worker || echo $?
 1
 1

 such that any non-0 exit code in a child process would cause xargs to stop
 processing.  seems like this would be a nice feature to have.

As per xargs(1), you can do this by having the command exit on a signal
or with a value of 255.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: sem(4) lockup in python?

2012-02-05 Thread Jilles Tjoelker
On Sun, Feb 05, 2012 at 01:32:42PM -0500, Daniel Eischen wrote:
 On Sun, 5 Feb 2012, Ivan Voras wrote:
  On 5 February 2012 11:44, Garrett Cooper yaneg...@gmail.com wrote:

     'make MAKE_JOBS_NUMBER=1' is the workground used right now..

  David Xu suggested that it is a bug in Python - it doesn't set
  process-shared attribute when it calls sem_init(), but i've tried
  patching it (replacing the port patchfile file the one I've attached)
  and I still get the hang.

 I don't understand how process shared semaphores can work.  Perhaps
 I'm dumb and ignorant, but a sem_id_t is an allocated struct.   The
 actual kernel sem_id is inside the struct.  Isn't this the same
 reason pthread_mutex_t and pthread_cond_t cannot be process-shared?

That's how the old implementation works. It does not support
process-shared semaphores although they may happen to work in some
specific cases.

However, in 9.0, sem_t works differently and contains the actual lock
word directly, so that process-shared semaphores work. The
implementation is in lib/libc/gen/sem_new.c. The pshared flag to
sem_init() is not a no-op because it tells the kernel to allow for use
from multiple processes.

Note that the old implementation is still present as well, for
compatibility with old binaries.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


sh(1) vfork patch, with benchmarks

2012-01-25 Thread Jilles Tjoelker
Here is a new version of the vfork patch. The concept is the same,
trying to use vfork in simple common cases. Compared to
http://lists.freebsd.org/pipermail/freebsd-hackers/2011-June/035618.html
I have extended vfork use to some command substitutions and allowed
setting an environment variable SH_DISABLE_VFORK to disable all vfork
use.

The test machine is a 4-core Phenom II X4 in 32-bit mode with 4GB RAM,
stable/8 with newer sh. I have run tests with a dummy environment
variable SH_DISABLE_VFORL=1 (y) and with SH_DISABLE_VFORK=1 (n).

A microbenchmark
sh -c 'x=0; while [ $x -lt 1 ]; do /bin/kill -0 $$; x=$(($x + 1)); done'
is much faster:

x micro-vfork-timings1-n
+ micro-vfork-timings1-y
+--+
|+ |
|+ x   |
| + ++  x  xx  |
|++ ++  xx xx x|
| |_A||___AM_| |
+--+
N   Min   MaxMedian   AvgStddev
x   9  3.86  4.05 4 3.963   0.076648549
+   9  2.52   2.6  2.58 2.572   0.033082389
Difference at 95.0% confidence
-1.39111 +/- 0.0589948
-35.0995% +/- 1.48851%
(Student's t, pooled s = 0.0590315)

A make -j4 buildkernel is about 0.5% faster:

x buildkernel-vfork-timings-n
+ buildkernel-vfork-timings-y
+--+
|   +  x + |
|+  + +* + + +x*+xx+   x   xx x +   x  x x  *+  *  x+ x|
||__|M_AMA|___||
+--+
N   Min   MaxMedian   AvgStddev
x  17 435.3443.65 438.8 439.03588  2.378805
+  17 432.8442.86436.76 437.06412   3.20108
Difference at 95.0% confidence
-1.97176 +/- 1.97034
-0.449112% +/- 0.448789%
(Student's t, pooled s = 2.82007)

In both cases, the difference comes mainly from the system time, but the
user time is also a bit lower (statistically significant).

In a virtual machine with 10-current (default kernel config + capsicum
and procdesc) and the patch, the microbenchmark is similarly much
faster:

x micro-vfork-timings-n
+ micro-vfork-timings-y
+--+
|+ +   |
|+++ x |
|xx x  |
|   xx |
|+   x  xxx|
||_A|   |_A_|  |
+--+
N   Min   MaxMedian   AvgStddev
x  18 15.14 15.85 15.5715.5550.17088868
+  18  9.79 10.14  9.92 9.9127778   0.096820365
Difference at 95.0% confidence
-5.64222 +/- 0.0940703
-36.2727% +/- 0.604759%
(Student's t, pooled s = 0.138883)

Ian Lepore has been kind enough to try an earlier version of this patch
on some sort of ARM board and reports an improvement in boot time from
54 to 51 seconds, and a large difference in microbenchmarks.

commit f55a350fa9c3792e10f93160a93d016a7bfdd630
Author: Jilles Tjoelker jil...@stack.nl
Date:   Mon May 30 00:31:45 2011 +0200

sh: Use vfork in a few common cases.

This uses vfork() for simple commands and command substitutions containing a
single simple command, invoking an external program under certain conditions
(no redirections or variable assignments, non-interactive shell, no job
control).

The use of vfork() can be disabled by setting a variable named
SH_DISABLE_VFORK.

diff --git a/bin/sh/eval.c b/bin/sh/eval.c
index a5f0aff..2d90921 100644
--- a/bin/sh/eval.c
+++ b/bin/sh/eval.c
@@ -921,6 +921,15 @@ evalcommand(union node *cmd, int flags, struct backcmd 
*backcmd)
if (pipe(pip)  0)
error(Pipe call failed: %s, strerror(errno));
}
+   if (cmdentry.cmdtype == CMDNORMAL 
+   cmd

Re: sh(1) vfork patch, with benchmarks

2012-01-25 Thread Jilles Tjoelker
On Wed, Jan 25, 2012 at 11:54:46PM +0100, Jilles Tjoelker wrote:
 [snip]
 x micro-vfork-timings1-n
 + micro-vfork-timings1-y
 +--+
 |+
  |
 |+ x  
  |
 | + ++  x  xx 
  |
 |++ ++  xx xx 
 x|
 | |_A|
 |___AM_| |
 +--+
 N   Min   MaxMedian   AvgStddev
 x   9  3.86  4.05 4 3.963   0.076648549
 +   9  2.52   2.6  2.58 2.572   0.033082389
 Difference at 95.0% confidence
 -1.39111 +/- 0.0589948
 -35.0995% +/- 1.48851%
 (Student's t, pooled s = 0.0590315)

 [snip]

I forgot to mention, the numbers are time in seconds (measured with
/usr/bin/time).

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: intent of tab-completion in /bin/sh in 9.0

2012-01-19 Thread Jilles Tjoelker
On Wed, Jan 18, 2012 at 08:46:18PM -0500, Matthew Story wrote:
 On Wed, Jan 18, 2012 at 5:16 PM, Jilles Tjoelker jil...@stack.nl wrote:
  POSIX itself has gradually adopted ksh features, so seeing more of them
  in future is not unlikely. Most of the new language features in 9.0 are
  either from POSIX.1-2008 or on the roadmap for a new version of POSIX
  (in collaboration with other shell authors).

 Tab completion is a welcome addition, I was unaware that this had been (or
 is slated to be) added to the POSIX specification.  This makes far more
 sense than my proposed explanations.  Thanks for the clarification.

Tab completion is not in POSIX and not planned to be (as far as I know),
although there is some fairly ugly pathname completion functionality
required in 'set -o vi' mode (we do not implement it).

The reason is more like that I noticed that NetBSD had it and found
someone willing to port it and add some small features (escaping special
characters in what is inserted).

In using /bin/sh as login shell on virtual machines (to avoid the need
to install something else from ports), I have found the filename
completion to be remarkably useful.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: intent of tab-completion in /bin/sh in 9.0

2012-01-18 Thread Jilles Tjoelker
On Wed, Jan 18, 2012 at 11:00:44AM -0500, Matthew Story wrote:
 Just noticed that tab-completion in /bin/sh has been added in 9.0
 (verified that it is not there in 8.0, dunno if it's there in 8.2,
 could probably go digging to figure it out).  In addition to the
 command history via up:down (which is present in 8.0) FreeBSD sh
 is now actually a pretty usable interactive shell.  I also noticed
 that the following bit has been removed from the sh(1):

  This version has many features which make it appear similar in some
 respects to the Korn shell, but it is not a Korn shell clone like pdksh.

That sentence is an implicit comparison to the Bourne shell. When it was
written, many other Un*x variants had a version of the Bourne shell as
/bin/sh, and it was not taken for granted to have things like $(...),
$((...)) and ${...#...} in a /bin/sh. Nowadays, this is often taken for
granted; the last OS to have a Bourne shell as /bin/sh is probably
Solaris 10 (11 has ksh93).

On the contrary, our /bin/sh is minimalistic compared to many other
shells used in that role, like bash, pdksh, mksh and ksh93. It (the 9.0
version) has only slightly more features than dash or NetBSD's sh, and
dash has instead some other features.

 Just wondering if the general direction here is attempting to provide a
 minimal POSIX shell, that is useful enough interactively to become the
 default root shell (supplanting csh)?  Or if there is just a general trend
 towards adopting more of the ksh feature-set.

POSIX itself has gradually adopted ksh features, so seeing more of them
in future is not unlikely. Most of the new language features in 9.0 are
either from POSIX.1-2008 or on the roadmap for a new version of POSIX
(in collaboration with other shell authors).

Adding other ksh features is not very likely.

It is certainly possible to use /bin/sh as root's shell, but the
distributed master.passwd entry will probably continue to use /bin/csh
for a long time.

Some plans for sh in 10.0 are in this mailing list post:
http://lists.freebsd.org/pipermail/freebsd-arch/2011-December/011976.html

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: ps -e without procfs(5)

2011-12-04 Thread Jilles Tjoelker
On Sat, Oct 29, 2011 at 01:32:39PM +0300, Mikolaj Golub wrote:
 [KERN_PROC_AUXV requires just p_cansee()]

If we are ever going to do ASLR, the AUXV information tells an attacker
where the stack, executable and RTLD are located, which defeats much of
the point of randomizing the addresses in the first place.

Given that the AUXV information seems to be used by debuggers only
anyway, I think it would be good to move it to p_candebug() now.

The full virtual memory maps (KERN_PROC_VMMAP, procstat -v) are already
under p_candebug().

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: ps -e without procfs(5)

2011-12-04 Thread Jilles Tjoelker
On Sun, Dec 04, 2011 at 10:58:10PM +0200, Mikolaj Golub wrote:
  RNMW Agreed. In general, my view is that p_cansee() should be used for very
  RNMW few of our process inspection APIs. I like your example of ASLR
  RNMW especially, as it illustrates how debugging information can aid even
  RNMW local attacks (i.e., user vs. setuid binary).

 What do you think about recently added kern.proc.ps_strings, which
 returns location of ps_strings structure? It uses p_cansee() too. The
 location is the same for all processes of the same ABI, so this does
 not look like sensitive information, on the other hand it also seems
 to be used by debuggers only.

With stack ASLR, the address will not be the same for every process of
the same ABI and will be sensitive information. Therefore I think this
should be locked down too.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: cron(8) mis-feature with @reboot long after system startup

2011-11-26 Thread Jilles Tjoelker
On Sat, Nov 26, 2011 at 02:58:46PM -0500, APseudoUtopia wrote:
 On Fri, Nov 25, 2011 at 4:36 PM, Christian Kastner deb...@kvr.at wrote:
  On 2011-11-25 08:02, Jason Hellenthal wrote:
  So with that said... is there a way we could actually make this run
  @reboot only ?

  Debian's cron[0] and Fedora's cronie[1] have solved this by touching a
  file on first startup and running @reboot only when this file does not
  yet exist.

 I like this idea, however it has a major caveat: Assuming the shutdown
 scripts remove said file (and the boot scripts create said file), what
 happens in the event that the disk was umount'ed uncleanly? For
 example, a power failure (I know, that's what UPSs are for, but lets
 ignore that for a second). If the system is configured to
 automatically boot after a power failure, the @reboot cron script wont
 run (since the said file still exists...).

The file can be stored in /var/run, which is cleared at boot.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: xlocale patch

2011-11-20 Thread Jilles Tjoelker
On Wed, Sep 21, 2011 at 08:28:54PM +0100, David Chisnall wrote:
 And here's an updated version of the patch.  I've fixed some other
 bugs, including where wcstod() and wcstodl() in trunk return the wrong
 value for any input string starting with spaces, wchar.h's violation
 of POSIX by not declaring FILE, and a few C++ incompatibilities in
 other headers (e.g. putchar being a macro, which breaks things like
 std::putchar(foo)).  All of my libcxxrt and libc++ changes have now
 been pushed upstream, so this should now be repeatable.  
 
 The libunwind port still has an irritating bug in the header, where
 the extern C {} block ends with a semicolon, which causes it to be
 rejected in any C++ program, but with that fixed you can compile
 libcxxrt (I used cmake .. -DCMAKE_CXX_FLAGS=-I/usr/local/include
 -nostdlib -g - hopefully I'll work out how to make CMake add these
 flags automatically soon...).  Libc++ should build out of the box with
 cmake.

I think it is unfortunate that xlocale.h depends on #include order; we
should be moving away from such ordering dependencies, not adding more.
POSIX does not have such dependencies so if they add all these new _l
functions, they will most likely do it differently.

POSIX.1-2008 has some of the _l functions but adds them to the plain
header; for example isalnum_l() is in ctype.h like isalnum(),
nl_langinfo_l() is in langinfo.h like nl_langinfo() and strcasecmp_l()
is in strings.h like strcasecmp(). This appears more sensible to me.
The xlocale.h header can then be an empty file.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Does anyone use nscd?

2011-10-08 Thread Jilles Tjoelker
On Wed, Oct 05, 2011 at 03:54:00PM -0700, Artem Belevich wrote:
 2011/10/5 Dag-Erling Sm?rgrav d...@des.no:
  Michael Bushkov bush...@freebsd.org writes:
  2. Consequences of the aforementioned problem can probably be
  corrected by using _setsockopt(..., SO_NOSIGPIPE) in
  __open_cached_connection() in nscachedcli.c

  That sounds like a workaround rather than a fix...

 Not necessarily. Using SO_NOSIGPIPE is a valid option when someone
 wants to see read/write on a closed socket fail and return -1 with
 errno=EPIPE.

 Quick grep in libc shows that resolver code in
 lib/libc/resolv/res_send.c also sets SO_NOSIGPIPE for exactly that
 reason.

Disabling SIGPIPE is good anyway because a crashing/dying nscd should
not cause applications to terminate. However, if EPIPE/SIGPIPE happens
in normal operation, that is still a bug that should be fixed.

By the way, SO_NOSIGPIPE is not in POSIX.1-2008 while the MSG_NOSIGNAL
flag to send() is. It may be better to replace the write() call with
send() with the MSG_NOSIGNAL flag and drop the setsockopt().

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Re[2]: Sharing device driver between kernel and user space

2011-09-23 Thread Jilles Tjoelker
On Wed, Sep 21, 2011 at 04:23:28PM +0200, Ivan Voras wrote:
 On 21 September 2011 16:09, geoffrey levand geoffrey.lev...@mail.ru wrote:
  Sure i can use the synchronization primitives, the problem is that
  the response to a request sent to PS3 VUART port is not available
  immediately, and i have to disallow kernel access to the PS3 VUART
  while i'm waiting for the response in user space. I send request
  with write syscall from user space and wait for response with read
  syscall. In the period of time between sending request and receiving
  response i could receive some other packets from VUART port, e.g.
  some kind of event notification,  i have to skip them. But kernel
  should not interfer until i get my response.  So i would need to
  lock out the kernel during this time. I think i found a good
  solution for this problem, just use a IOCTL which tells kernel
  device driver to stop processing kernel requests and events,
  something like SET_USER_MODE.  After that i can use it in user
  space.

Perhaps it is better to lock out kernel activity (and further opens)
while userland has the device open.

In any case, a close should probably release the device (possibly after
some sort of cleanup), so that a crash of the userland program does not
necessitate a reboot before the device can be used again.

 Have you read sema(9)?

sema(9) is best avoided in new code. It is not used much in existing
code, and it would be useful to be able to remove some day in order to
reduce redundancy in the provided synchronization primitives.

Instead, see mutex(9) and condvar(9).

The implementation of sema(9) is not very optimized, using a mutex and a
condition variable. (This is unlike the userland sem_post(3) and friends
which only require a single atomic op in the uncontested case, except
for named semaphores in 8.x and earlier.)

 Or if returning EBUSY is acceptable when the resource is in use by
 $whatever, maybe you just need a boolean variable.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Mixing Asynchronous Network I/O and POSIX Threads

2011-09-18 Thread Jilles Tjoelker
On Sun, Sep 18, 2011 at 11:32:08AM -0400, Richard Yao wrote:
 I wrote a program for Linux that uses Asynchronous Network I/O and
 POSIX Threads. It uses a mix of gettid(), fcntl() and F_SETOWN to
 specify which thread handles each connection's SIGIO interrupts.
 gettid() is Linux-specific and I would prefer to do this in a way that
 also works with FreeBSD. Is that possible?

In FreeBSD, you can only F_SETOWN processes or process groups, not
threads, so a direct port is probably not possible. Another Linux
feature not in FreeBSD is F_SETSIG (to send a realtime signal instead of
SIGIO and provide siginfo_t information).

My recommendation is not to use SIGIO and SIGURG at all. If you use
signal handlers, your program is very likely to suffer from race
condition problems unless you unmask the signal only at well-defined
safe points. If you use sigtimedwait() or similar, select()/poll() can
provide similar functionality. Alternatively, if you have many
descriptors to watch, use non-portable facilities like BSD kqueue, Linux
epoll, Solaris ports or portable wrappers around them. Realtime signals
are nasty for this -- the signal queue has a finite size and when it
overflows you need a resync step that is expensive both in CPU and
programmer time.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


fdopendir() wrongly closes passed fd on error, union mess

2011-08-09 Thread Jilles Tjoelker
While trying to use openat()/fdopendir()/fstatat() to improve
performance of sh(1) pathname generation, I noticed that fdopendir()
closes the passed file descriptor if it fails (such as when stable/8
silently ignores O_DIRECTORY even though it is defined in the header
file).

The below patch should fix this problem. I have changed the ugly
DTF_REWIND code so it moves the reopened directory to the same file
descriptor number, so as to minimize change there while ensuring the
correct fd is closed.

A later possibility is to fix the DTF_REWIND problem by reading union
directories from a new descriptor obtained via
  fd2 = openat(fd, ., O_RDONLY | O_DIRECTORY | O_CLOEXEC);
and doing this unconditionally, retiring the DTF_REWIND flag.

Reading the kernel code, it appears that calling getdirentries() on an
open file description in a filesystem mounted with MNT_UNION (mount -o
union) may change that open file description irreversibly to one
pointing to the lower layer. This must not happen to the descriptor
passed to fdopendir() or the descriptor returned via dirfd() since
fchdir() and *at functions may not work properly with such a changed
descriptor.

Note: unionfs does not appear to mangle the open file description like
MNT_UNION does, but it does need the duplicate removal code like
MNT_UNION.

Index: lib/libc/gen/opendir.c
===
--- lib/libc/gen/opendir.c  (revision 224739)
+++ lib/libc/gen/opendir.c  (working copy)
@@ -75,6 +75,8 @@ __opendir2(const char *name, int flags)
 {
int fd;
struct stat statb;
+   DIR *dir;
+   int saved_errno;
 
/*
 * stat() before _open() because opening of special files may be
@@ -89,7 +91,13 @@ __opendir2(const char *name, int flags)
if ((fd = _open(name, O_RDONLY | O_NONBLOCK | O_DIRECTORY)) == -1)
return (NULL);
 
-   return __opendir_common(fd, name, flags);
+   dir = __opendir_common(fd, name, flags);
+   if (dir == NULL) {
+   saved_errno = errno;
+   _close(fd);
+   errno = saved_errno;
+   }
+   return (dir);
 }
 
 static int
@@ -110,6 +118,7 @@ __opendir_common(int fd, const char *name, int fla
int incr;
int saved_errno;
int unionstack;
+   int fd2;
struct stat statb;
 
dirp = NULL;
@@ -199,14 +208,15 @@ __opendir_common(int fd, const char *name, int fla
 * which has also been read -- see fts.c.
 */
if (flags  DTF_REWIND) {
-   (void)_close(fd);
-   if ((fd = _open(name, O_RDONLY | O_DIRECTORY)) == -1) {
+   if ((fd2 = _open(name, O_RDONLY | O_DIRECTORY)) == -1) {
saved_errno = errno;
free(buf);
free(dirp);
errno = saved_errno;
return (NULL);
}
+   (void)_dup2(fd2, fd);
+   _close(fd2);
}
 
/*
@@ -309,7 +319,6 @@ __opendir_common(int fd, const char *name, int fla
 fail:
saved_errno = errno;
free(dirp);
-   (void)_close(fd);
errno = saved_errno;
return (NULL);
 }

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: eliminating a syscall on accept()+ioctl() combo

2011-08-02 Thread Jilles Tjoelker
On Mon, Aug 01, 2011 at 08:11:04AM +0200, Vlad Galu wrote:
 On Jul 31, 2011, at 9:59 PM, Bernard van Gastel wrote:
  I want to reduce the number of syscalls for my networking
  application. The app handles incoming connections with the
  'accept()' system call. Is there a way to specify to accept() that
  the newly created file descriptors should be non-blocking (FIONBIO)?
  This will avoid an ioctl() after the accept(). Thanks!

 You can make your listening socket non-blocking. Newly created file
 descriptors will inherit that property. However, that will require you
 to select()/poll()/kqueue() for that descriptor as well, instead of
 simply blocking in accept().

This is documented FreeBSD behaviour and common across BSDs, but is not
portable. POSIX leaves it unspecified what the non-blocking state of the
new socket is and in fact Linux always makes the new socket blocking
(unless you request non-blocking using their new accept4() call).

Because this portability issue can be very subtle, I suggest not blindly
relying on it.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: invalid argument in select() when peer socket is in FD_SET

2011-07-31 Thread Jilles Tjoelker
On Sun, Jul 31, 2011 at 04:20:08PM +0200, Christoph P.U. Kukulies wrote:
 I posted this on freebsd-questions also but maybe the expert density 
 isn't that high as here in hackers.
 Since I think it may be a design or implementation issue in FreeBSDs' 
 select(), I'm posting it here as well,
 hoping to get an experts' answer.

 I have written a small server to test TCP/IP roundtrip times of the 
 packets in a proprietary protocol and while
 compiling and running this server on different platforms (Windows 
 7/cygwin, UbuntuLinux, FreeBSD 8.0 Release), I found
 that the server produces an error when the listening socket (on which 
 the accept() is performed) is also
 member of the select() fd_set.

 On the other platforms the program works without error, just under 
 FreeBSD I'm getting this invalid argument error.

 Comments appreciated (despite comments about the error checking logic

 [snip]
  tv.tv_sec = 0;
  tv.tv_usec = 500;   /* 5 seconds */
 [snip]
  n = select(nfds, readfds,
 (fd_set *) NULL, /* not interested in write */
 (fd_set *) NULL, /* ...or exceptions */
 tv);/* timeout */

The number of microseconds in a struct timeval must be nonnegative and
less than one million (likewise, the number of nanoseconds in a struct
timespec must be nonnegative and less than 1000 million).

FreeBSD checks this strictly in most functions.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


[PATCH] [RFC] sh(1) vfork support

2011-06-13 Thread Jilles Tjoelker
;
+   TRACE((In parent shell:  child = %d\n, (int)pid));
+   return pid;
+}
+
+
 /*
  * Wait for job to finish.
  *

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: tr A-Z a-z in locales other than C

2011-06-07 Thread Jilles Tjoelker
On Tue, Jun 07, 2011 at 04:24:43AM +0400, Andrey Chernov wrote:
 On Tue, Jun 07, 2011 at 12:41:05AM +0200, Jilles Tjoelker wrote:

  There is a related issue with ranges in regular expressions, glob and
  fnmatch (likewise unspecified by POSIX outside the POSIX locale), but
  this is less likely to cause problems.

 You care about ports, but suggested change is americano-centrism which 
 kills tr usage for national language documents due to impossibility to 
 specify whole national alphabet easily, just by two letters.

Hmm, so that's with translation to a constant, or with the -d and/or -s
options. In such cases, there may be a range for all letters with
collation order, but not with codeset order (mainly if all letters
includes letters with diacritical marks).

In FreeBSD, upper case sorts before lower case, so cases can be
distinguished this way but all letters may require two ranges. In most
other operating systems the cases go together so a single range is
sufficient, but cases cannot be distinguished. Making such things work
on multiple operating systems requires careful testing.

 Moreover, having differently treated regex ranges in tr vs other places 
 you mention will produce additional chaos.

I think this is already inconsistent because some programs do not enable
locale or use different locale code.

With UTF-8 or other multibyte character sets, this is even more so
because functions like isalpha work very poorly by definition and there
is no collation support for such character sets in FreeBSD.

 Back to the ports: it is not hard to run _any_ port's make or configure 
 with LANG=C directly by the ports Mk system to eliminate that problem.

True, but some ports install scripts with problematic tr calls.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: tr A-Z a-z in locales other than C

2011-06-07 Thread Jilles Tjoelker
On Wed, Jun 08, 2011 at 09:56:39AM +1200, Atom Smasher wrote:
 the man page makes it clear...

   Translate the contents of file1 to upper-case.

 tr [:lower:] [:upper:]  file1

   (This should be preferred over the traditional UNIX idiom of ``tr a-z
   A-Z'', since it works correctly in all locales.)

 for any other uses, either build the port with locale specified as C as 
 mentioned, or patch the port so:
   tr '[a-z]' '[A-Z]'
   becomes:
   env LC_ALL=C tr '[a-z]' '[A-Z]'

 the only change that would be appropriate to the tr utility would be a 
 command-line option to select a locale... something like:
   tr -l C '[a-z]' '[A-Z]'

 i don't think anyone would object to that, but it would still require 
 patching some ports under some locales...

That new option would provide zero benefit. If things are going to be
patched anyway then patch them to be standards compliant.

 maybe another option would be modifying tr to recognize other [new] 
 environment variables... TR_LANG, TR_LC_ALL, TR_LC_CTYPE and 
 TR_LC_COLLATE. done that way, things could be set in /etc/make.conf (or 
 sys.mk), not need any patching, and not interfere with other uses of 
 locale.

That would be rather ugly.

If  tr a-z A-Z  is supposed to be deceiving in some locales, then let it
remain so unconditionally.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


tr A-Z a-z in locales other than C

2011-06-06 Thread Jilles Tjoelker
++ = cnt;
-   *p = OOBCH;
-   n = p - s-set;
-
-   s-cnt = 0;
-   s-state = SET;
-   if (n  1)
-   mergesort(s-set, n, sizeof(*(s-set)), charcoll);
+   s-cnt = stopval - s-lastch + 1;
+   s-state = RANGE;
+   --s-lastch;
return (1);
 }
 

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: int64_t and printf

2011-06-05 Thread Jilles Tjoelker
On Sun, Jun 05, 2011 at 02:31:27PM -0400, Sean C. Farley wrote:
 On Sun, 5 Jun 2011, Ben Laurie wrote:
  So, for example int64_t has no printf modifier I am aware of. Likewise 
  its many friends.

  In the past I've handled this by having a define somewhere along the 
  lines of...

  #if something
  # define INT_64_T_FMT %ld
  #else
  # define INT_64_T_FMT %lld
  #endif

  but I have no idea where to put such a thing in FreeBSD. Opinions? 
  Also, I guess I'd really need to do a modifier rather than a format, 
  for full generality.

 You need to include inttypes.h, which includes machine/_inttypes.h. 
 This will provide the appropriate macro which in this case is PRId64.

The macros from inttypes.h are certainly valid, but the style in most
of FreeBSD prefers casting to a type such as intmax_t or uintmax_t for
which a constant format string is available (%jd/%ju). In the particular
case of int64_t, it would seem that long long is better than intmax_t,
as it is possibly shorter but still guaranteed large enough by C99, but
there are objections against this that I do not understand.

Also, note the format strings for size_t, %zu, and for ptrdiff_t, %td.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: [RFC] rcexecr: rcorder in parallel

2011-06-04 Thread Jilles Tjoelker
On Sat, Jun 04, 2011 at 12:45:37PM +0100, Julien Laffaye wrote:
 On Sat, Jun 4, 2011 at 10:10 AM, Buganini bugan...@gmail.com wrote:
  https://github.com/buganini/rcexecr

  Currently it is able to determine the exec/wait order

  There are something I haven't digged in deeply in the self
  modification part.

  patches/ideas are welcome.

 Thanks for doing that!

Yes.

 You should use kqueue(2) instead of waitpid(2) so that you can
 efficiently monitor a pool of processes.
 See pwait(1) for an example.

Hmm, I don't think kqueue() should be used here. Its main advantage is
that it works regardless of parent-child relationships, but that
advantage is not relevant here. On the other hand, waitpid() is still
necessary to get rid of the zombies. Furthermore, waitpid() is standard
while kqueue() is not, and I think non-standard interfaces should only
be used if they provide a real benefit above standard interfaces.

The current approach with waitpid() for specific processes should be
good enough for a proof of concept. It will keep zombies longer than
necessary, particularly for things that are not explicitly depended on.
To avoid this, use waitpid(-1, ...) and maintain more tracking for
processes that have already terminated.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Does FreeBSD have replacement for posix_fadvice() or fcntl(F_RDADVISE)?

2011-05-12 Thread Jilles Tjoelker
On Thu, May 12, 2011 at 11:38:12AM +0400, Lev Serebryakov wrote:
Does FreeBSD have some custom call, which can be used where Linux
  programs uses posix_fadvice() and DARWIN ones fcntl(F_RDADVISE)?

It is like madvise(2) but for file descriptors.

An effect like POSIX_FADV_SEQUENTIAL can be obtained with the
F_READAHEAD or F_RDAHEAD fcntl(2) requests. The implementation is
divided between the generic VFS layer and various filesystems (cd9660,
ext2fs, msdosfs, nfs, xfs and ufs appear to use the information to some
degree).

Setting readahead to 0 with the fcntl requests has little effect: it
resets the state of the sequential access detection heuristic but does
not disable it.

The O_DIRECT open(2) flag has an effect similar to what
POSIX_FADV_NOREUSE should do, except that it changes semantics by adding
alignment restrictions.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: [LIBC] Modfied Version of sscanf

2011-05-01 Thread Jilles Tjoelker
On Sat, Apr 30, 2011 at 06:44:43PM +0200, Martin Möller wrote:
 This is my first email to this list, so hello to all members.
 The current version of sscanf, stops when a whitespace characters occurs in
 a string
 when the %s (string) type is used.

 The following code:

 char name [20], value [20];
 sscanf (Test 2-Test 3, %s-%s, name, value);
 printf (%s-%s\n, name, value);

 outputs total garbage on my FreeBSD-7.0-RELEASE #0 amd64.
 Is there already a way to do this or should we release a new version of
 sscanf, e.g. called sscanfWS.

 This modified version would output: Test 2-Test 3.

I think you should use functions like memchr(), strchr() and strtok_r()
rather than sscanf().

For one, your code has undefined behaviour if the name or the value
exceed 19 bytes. If the input is untrusted, as your follow-up seems to
indicate, this undefined behaviour likely manifests in allowing an
attacker to execute code of his own choosing. Even if you avoid the
buffer overflow using a format string like %19s-%19s it is still not
very good as you may not get an error if the string is too long. Silent
truncation might invalidate security checks done elsewhere and can lead
to hard-to-diagnose bugs.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Basic UTF-8 support for sh(1)

2011-02-25 Thread Jilles Tjoelker
Here is a patch that adds basic UTF-8 support to sh(1). This is enabled
if the locale is set appropriately.

Features:
* ${#var} counts codepoints. (Really, bytes with (b  0xc0) != 0x80.)
* ?, [...] patterns match codepoints instead of bytes. They do not match
  invalid sequences. This is so that ${var#?} removes the first
  codepoint, not the first byte. However, * continues to match any
  string and an invalid sequence matches an identical invalid sequence.
  (This differs from fnmatch(3).)

Internal:
* CTL* bytes are moved to bytes that cannot occur in UTF-8 so that
  mbrtowc(3) can be used directly. The new locations do occur in
  iso-8859-* encodings.

Limitations:
* Only UTF-8 support is added, not any other multibyte encodings. I do
  not want to bloat up sh with mbrtowc(3) and similar everywhere.
* Invalid sequences may not be handled as desired. It seems aborting on
  invalid UTF-8 sequences would break things, so they are let through.
  This also avoids bloating the code up with checking everywhere.
* There is no special treatment for combining characters, accented
  letters may match ? or ?? or even more depending on normalization
  form. This matches other code in FreeBSD and is usually good enough
  because normalization forms that use as few codepoints as possible
  tend to be used.
* IFS remains byte-based as in ksh93 (but unlike bash and zsh).
* Our version of libedit does not support UTF-8 so sh will still be
  rather unpleasant to use interactively with characters not in
  us-ascii.

Is this useful and worth the (small) bloat?

A somewhat related feature is support for \u and \U
sequences in $'...' (this will be added to POSIX, see
http://austingroupbugs.net/view.php?id=249 and I plan to add it to sh).
Ideally, these are converted using iconv(3) but as long as it is not
unconditionally available in base or if it is not supposed to be used,
the codepoints can be encoded in UTF-8 for UTF-8 locales, leaving other
locales with question marks.

-- 
Jilles Tjoelker
Index: parser.h
===
--- parser.h	(revision 218371)
+++ parser.h	(working copy)
@@ -34,16 +34,16 @@
  */
 
 /* control characters in argument strings */
-#define CTLESC '\201'
-#define CTLVAR '\202'
-#define CTLENDVAR '\203'
-#define CTLBACKQ '\204'
+#define CTLESC '\300'
+#define CTLVAR '\301'
+#define CTLENDVAR '\371'
+#define CTLBACKQ '\372'
 #define CTLQUOTE 01		/* ored with CTLBACKQ code if in quotes */
 /*	CTLBACKQ | CTLQUOTE == '\205' */
-#define	CTLARI	'\206'
-#define	CTLENDARI '\207'
-#define	CTLQUOTEMARK '\210'
-#define	CTLQUOTEEND '\211' /* only for ${v+-...} */
+#define	CTLARI	'\374'
+#define	CTLENDARI '\375'
+#define	CTLQUOTEMARK '\376'
+#define	CTLQUOTEEND '\377' /* only for ${v+-...} */
 
 /* variable substitution byte (follows CTLVAR) */
 #define VSTYPE		0x0f	/* type of variable substitution */
Index: sh.1
===
--- sh.1	(revision 218467)
+++ sh.1	(working copy)
@@ -2510,4 +2510,7 @@ was originally written by
 .Sh BUGS
 The
 .Nm
-utility does not recognize multibyte characters.
+utility does not recognize multibyte characters other than UTF-8.
+The line editing library
+.Xr editline 3
+does not recognize multibyte characters.
Index: expand.c
===
--- expand.c	(revision 218371)
+++ expand.c	(working copy)
@@ -52,6 +52,7 @@ __FBSDID($FreeBSD$);
 #include stdlib.h
 #include string.h
 #include unistd.h
+#include wchar.h
 
 /*
  * Routines to expand arguments to commands.  We have to deal with
@@ -111,16 +112,16 @@ static void addfname(char *);
 static struct strlist *expsort(struct strlist *);
 static struct strlist *msort(struct strlist *, int);
 static char *cvtnum(int, char *);
-static int collate_range_cmp(int, int);
+static int collate_range_cmp(wchar_t, wchar_t);
 
 static int
-collate_range_cmp(int c1, int c2)
+collate_range_cmp(wchar_t c1, wchar_t c2)
 {
-	static char s1[2], s2[2];
+	static wchar_t s1[2], s2[2];
 
 	s1[0] = c1;
 	s2[0] = c2;
-	return (strcoll(s1, s2));
+	return (wcscoll(s1, s2));
 }
 
 /*
@@ -665,6 +666,7 @@ evalvar(char *p, int flag)
 	int special;
 	int startloc;
 	int varlen;
+	int varlenb;
 	int easy;
 	int quotes = flag  (EXP_FULL | EXP_CASE | EXP_REDIR);
 
@@ -712,8 +714,15 @@ again: /* jump here after setting a variable with
 		if (special) {
 			varvalue(var, varflags  VSQUOTE, subtype, flag);
 			if (subtype == VSLENGTH) {
-varlen = expdest - stackblock() - startloc;
-STADJUST(-varlen, expdest);
+varlenb = expdest - stackblock() - startloc;
+varlen = varlenb;
+if (localeisutf8) {
+	val = stackblock() + startloc;
+	for (;val != expdest; val++)
+		if ((*val  0xC0) == 0x80)
+			varlen--;
+}
+STADJUST(-varlenb, expdest);
 			}
 		} else {
 			char const *syntax = (varflags  VSQUOTE) ? DQSYNTAX
@@ -721,7 +730,9 @@ again: /* jump here after setting

Re: man 3 getopt char * const argv[] - is const wrong ?

2011-02-13 Thread Jilles Tjoelker
On Sun, Feb 13, 2011 at 01:59:38PM +0100, Matthias Andree wrote:
 Am So, 13.02.2011, 13:20 schrieb Julian H. Stacey:
  Hi Hackers
  Ref.: man 3 getopt
  int getopt(int argc, char * const argv[], const char *optstring);
  Ref.: KR 2nd Ed P.211 last indent, 2nd sentence
  The purpose of const is to announce [objects] that may be
  placed in read-only memory, and perhaps to increas[e] opportunities for
  optimization

const only rarely allows optimization, mostly because a declaration like
const foo *p only says that the foo may not be modified via p, not that
it may not be modified at all.

Another important purpose of const is to allow the programmer to
document what is not changed by a function, and have the compiler check
this documentation.

  optstring is obviously const,
  but I don't see that argv can calim to be const ?

 The const basically states that the argv[] elements (i. e. the char *
 pointers) are const[ant], or, read another way, that getopt promises not
 to mess with the *pointers* but is free to modify the actual strings
 pointed to (with the usual constraints of not assuming they can be
 extended, for instance) - and I don't see the problem in that.

getopt() should not modify the strings themselves either, but there are
at least two reasons why the type is not const char *const argv[]:

1. Elements of argv are written into char *optarg.

2. Many programs declare main's second argument as char *argv[] which
   cannot be converted to const char *const [], other than via a cast.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Removing PATH=...%builtin... from sh

2010-12-25 Thread Jilles Tjoelker
Most ash derivatives have an undocumented feature where the presence of
an entry %builtin in $PATH will cause builtins to be checked at that
point of the PATH search, rather than before looking at any directories
as documented in the man page (very old versions do document this
feature).

I would like to remove this feature from sh, as it complicates the code,
may violate expectations (for example, /usr/bin/alias is very close to a
forkbomb with PATH=/usr/bin:%builtin, only /usr/bin/builtin not being
another link saves it) and appears to be unused (all the %builtin google
code search finds is in some sort of ash source code).

The feature also has a bug in that it does not work properly if there is
a PATH assignment before 'type', 'command -v' or 'command -V' and
%builtin differs between the set and temporary paths. (This worked in
8.x for 'command'; 'type' ignores PATH assignment in 8.x.)

It is true that dash ignores %builtin for the special builtins and the
regular builtins that are found before PATH, using it only for the
other builtins. I could do the same but I think it is not useful enough.

See also
http://lists.freebsd.org/pipermail/svn-src-head/2010-November/022613.html

Which builtins this is about:
POSIX special builtins:
  : . break continue eval exec exit export readonly return set shift
  times trap unset
POSIX commands found before PATH (regular builtins):
  alias bg cd command false fc fg getopts jobs kill pwd read true
  umask unalias wait
Other POSIX utilities that must be builtins:
  hash type ulimit
Other POSIX utilities:
  [ echo printf test
Non-POSIX utilities that must be builtins:
  builtin bind chdir jobid local setvar
Undocumented builtins:
  exp let wordexp

And for completeness one command that should really be a builtin or
otherwise magic:
Missing POSIX commands found before PATH (regular builtins):
  newgrp

-- 
Jilles Tjoelker
Index: bin/sh/exec.c
===
--- bin/sh/exec.c	(revision 216690)
+++ bin/sh/exec.c	(working copy)
@@ -92,7 +92,6 @@
 
 
 static struct tblentry *cmdtable[CMDTABLESIZE];
-static int builtinloc = -1;		/* index in path of %builtin, or -1 */
 int exerrno = 0;			/* Last exec error */
 
 
@@ -245,8 +244,7 @@
 	}
 	while ((name = *argptr) != NULL) {
 		if ((cmdp = cmdlookup(name, 0)) != NULL
-		  (cmdp-cmdtype == CMDNORMAL
-		 || (cmdp-cmdtype == CMDBUILTIN  builtinloc = 0)))
+		  cmdp-cmdtype == CMDNORMAL)
 			delete_cmd_entry();
 		find_command(name, entry, DO_ERR, pathval());
 		if (verbose) {
@@ -337,8 +335,8 @@
 			goto success;
 	}
 
-	/* If %builtin not in path, check for builtin next */
-	if (builtinloc  0  (i = find_builtin(name, spec)) = 0) {
+	/* Check for builtin next */
+	if ((i = find_builtin(name, spec)) = 0) {
 		INTOFF;
 		cmdp = cmdlookup(name, 1);
 		if (cmdp-cmdtype == CMDFUNCTION)
@@ -354,7 +352,7 @@
 	prev = -1;		/* where to start */
 	if (cmdp) {		/* doing a rehash */
 		if (cmdp-cmdtype == CMDBUILTIN)
-			prev = builtinloc;
+			prev = -1;
 		else
 			prev = cmdp-param.index;
 	}
@@ -366,19 +364,7 @@
 		stunalloc(fullname);
 		idx++;
 		if (pathopt) {
-			if (prefix(builtin, pathopt)) {
-if ((i = find_builtin(name, spec))  0)
-	goto loop;
-INTOFF;
-cmdp = cmdlookup(name, 1);
-if (cmdp-cmdtype == CMDFUNCTION)
-	cmdp = loc_cmd;
-cmdp-cmdtype = CMDBUILTIN;
-cmdp-param.index = i;
-cmdp-special = spec;
-INTON;
-goto success;
-			} else if (prefix(func, pathopt)) {
+			if (prefix(func, pathopt)) {
 /* handled below */
 			} else {
 goto loop;	/* ignore unimplemented options */
@@ -485,8 +471,7 @@
 
 	for (pp = cmdtable ; pp  cmdtable[CMDTABLESIZE] ; pp++) {
 		for (cmdp = *pp ; cmdp ; cmdp = cmdp-next) {
-			if (cmdp-cmdtype == CMDNORMAL
-			 || (cmdp-cmdtype == CMDBUILTIN  builtinloc = 0))
+			if (cmdp-cmdtype == CMDNORMAL)
 cmdp-rehash = 1;
 		}
 	}
@@ -506,13 +491,11 @@
 	const char *old, *new;
 	int idx;
 	int firstchange;
-	int bltin;
 
 	old = pathval();
 	new = newval;
 	firstchange = ;	/* assume no change */
 	idx = 0;
-	bltin = -1;
 	for (;;) {
 		if (*old != *new) {
 			firstchange = idx;
@@ -523,19 +506,12 @@
 		}
 		if (*new == '\0')
 			break;
-		if (*new == '%'  bltin  0  prefix(builtin, new + 1))
-			bltin = idx;
 		if (*new == ':') {
 			idx++;
 		}
 		new++, old++;
 	}
-	if (builtinloc  0  bltin = 0)
-		builtinloc = bltin;		/* zap builtins */
-	if (builtinloc = 0  bltin  0)
-		firstchange = 0;
 	clearcmdentry(firstchange);
-	builtinloc = bltin;
 }
 
 
@@ -556,9 +532,7 @@
 		pp = tblp;
 		while ((cmdp = *pp) != NULL) {
 			if ((cmdp-cmdtype == CMDNORMAL 
-			 cmdp-param.index = firstchange)
-			 || (cmdp-cmdtype == CMDBUILTIN 
-			 builtinloc = firstchange)) {
+			 cmdp-param.index = firstchange)) {
 *pp = cmdp-next;
 ckfree(cmdp);
 			} else {
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd

Re: libkvm: consumers of kvm_getprocs for non-live kernels?

2010-11-10 Thread Jilles Tjoelker
On Wed, Nov 10, 2010 at 09:41:52PM +0100, Ulrich Spörlein wrote:
 I have this cleanup of libkvm sitting in my tree and it needs a little
 bit of testing, especially the function kvm_proclist, which is only
 called from kvm_deadprocs which is only called from kvm_getprocs when kd
 is not ALIVE.

 The only consumer in our tree that I can make out is *probably* kgdb, as
 ps(1), top(1), w(1), pkill(1), fstat(1), systat(1), pmcstat(8) and
 bsnmpd don't really work on coredumps

 But, the kgdb file gnu/usr.bin/binutils/gdb/kvm-fbsd.c, where
 kvm_getprocs is probably called on a dead kernel is not even used during
 build!

 So I guess I'm staring at dead code here, any kvm people around that can
 clue me in?

It is a while ago that I last used this, but ps and fstat definitely
worked on crashdumps, to some extent. /usr/sbin/crashinfo uses this.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: [PATCH] update to the latest libedit version and remove libreadline deps

2010-11-05 Thread Jilles Tjoelker
On Fri, Nov 05, 2010 at 04:32:56PM +0100, Baptiste Daroussin wrote:
 I've updated libedit to the latest version available in the netbsd cvs.
 UTF8 support is disabled for now has it seems to be experimental and segfault.
 I also patch and tested all the sources that used to be linked against
 libreadline so that it now uses libedit making libreadline unused (I
 guess, perhaps I have missed some)

 beware that there are collision between libreadline and libedit
 (/usr/include/readline/readline.h) is provided by both of them.

 You can find the patch against current here:
 http://people.freebsd.org/~bapt/update-libedit.patch

This patch reverts various local enhancements to libedit, such as:
* r212191 more expected behaviour of ed-delete-next-char
* r212235 delete key support
* r209217,r209219,r209224 completion improvements for sh(1)
* built-in \033[1~ and \033[4~ support (not sure how useful this is)
* .An macros in man pages

If the csh-style history expansion gets in, a set -o histexpand for sh
could be useful. I think this is not important enough to add all the
code for, but if we have the code anyway why not allow using it. On the
other hand, if we can replace it with stubs and is very simplistic,
perhaps it is better to stub it out and not change sh.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Space character in rc.conf variable

2010-10-30 Thread Jilles Tjoelker
On Sat, Oct 30, 2010 at 01:11:51PM -0700, Doug Barton wrote:
 On 10/30/10 11:56, Harald Servat wrote:
 My SSID has an space character and I don't know how to deal with
 it. So my question is easy, how do I add an space in the SSID?
 I've tried wrapping the SSID using \ , \' also adding \(space),
 changing  for ' and others without success.

 As an example my rc.conf entry looks like
   ifconfig_wlan0=ssid SSID WITH SPACE dhcp
 and after booting I see that ifconfig tried to connect to SSID instead to
  SSID WITH SPACE.

 As I said, if I run ifconfig wlan0 ssid SSID WITH SPACE from
 the command line, works fine.

 The best solution would be to change the SSID to one without spaces. :)

Well, as mentioned elsewhere in the thread write the SSID in hex. You
also need this if *, ? or [ occur in the SSID.

 If you can't or won't do that, one of these should work, please report 
 back which one does:

 ='ssid ssid with space DHCP'
 ='ssid \ssid with space\ DHCP'
 =ssid 'ssid with space' DHCP
 =ssid \'ssid with space\' DHCP

This will not work, as the value is stored in a variable and then split
on IFS characters by expanding the variable outside quotes. There is no
escape mechanism - either the whole variable is split or none of it (if
in double-quotes).

One workaround would be to set IFS to something not containing space,
for example tabnewline, and then delimiting fields with tabs instead
of spaces. Apart from the general ugliness all over the rc.confs
(including the defaults/ one), breakage would probably also happen in
rc.subr and network.subr.

Doug's suggestions point to something like
  eval set -- ${ifconfig_FOO}
  ifconfig $_if $@

This has potential security issues with arbitrary stuff from untrusted
sources (say, some sort of privilege separation where someone may
configure the network but not do arbitrary things) in /etc/rc.conf:
  ifconfig_wlan0='ssid $(chown evilhacker topsecretfile) dhcp'
is perfectly safe now, but will be insecure. (Pathname generation may
lead to information disclosure, but not executing arbitrary commands.)

Also, the syntax for the keywords such as DHCP is too flexible. This
currently does not do much harm other than stopping you from using
DHCP as an ssid or something, but is hard to remove from $@ in the
above code. If they were restricted to appear at the start only, simple
comparisons of $1 and shift would do the job, but removing from the
middle requires constructing a string with quoting characters and then
using eval on that string.

Array support in the shell could make this easier, but not much unless
the rc.conf syntax would be changed as well, like
  ifconfig_wlan0=(ssid SSID WITH SPACE dhcp)
but then changed some more such that DHCP can be used as SSID.

Array support would add a fair bit of code to sh, and even then it will
remain fairly limited and clumsy. For example, an array can only be
returned from a function by passing the name of an existing array to
place the result in and using eval, extending setvar in some strange way
or implementing another ksh93 extension, namerefs.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: sysrc -- a sysctl(8)-like utility for managing /etc/rc.conf et. al.

2010-10-12 Thread Jilles Tjoelker
On Sat, Oct 09, 2010 at 03:39:44PM -0700, Devin Teske wrote:
 On Oct 9, 2010, at 1:25 PM, Garrett Cooper wrote:
[...]
 The second usage (: function) aids in finding the function
 declaration among the usages. See, in Perl, I can simply search for
 sub preceding the function name. In C, I tend to split the return
 type from the function name and ensure that the function name always
 starts in column-1 so I can search for ^funcname to go to the
 declaration opposed to the usages/references. In BASH, `function' is a
 valid keyword and you can say function funcname ( ) BLOCK but
 unfortunately in good ol' bourne shell, function is not an
 understood keyword, ... but really liking this keyword, I decided to
 make use of it in bourne shell by-way-of simply making it a
 non-executed-expression (preceded it with : and terminated it with
 ;).

I think function f() { ... } is a despicable bashism. In POSIX, an empty
pair of parentheses is an unambiguous indication of a function
definition; although FreeBSD sh also accepts () as a subshell containing
an empty command, there is no reason to use that.

In ksh, function f { ... } is different from f() { ... }, so there is a
reason to use it, and function f() { ... } does not exist.

 I originally had been programming in tests for '!' and 'in', but in
 POSIX bourne-shell, they aren't defined (though understood) in the
 keyword table (so type(1) balks in bourne-shell while csh and bash do
 respond to '!' and 'in' queries).

 Since you've pointed out command(1)... I now have a way of checking
 '!'. Though unfortunately, command -v, like type(1), also does not
 like in (in bourne-shell at least).

The type builtin in the original Bourne shell does not know keywords.
The type builtin in FreeBSD sh knows them, but unlike most shells in
is not a keyword -- it is only recognized at the appropriate places in a
case or for command (this is probably a POSIX violation and may change
in the future).

  x$fmt != x ? It seems like it could be simplified to:

  if [ $# -gt 0 ]
  then
 local fmt=$1
 shift 1
 eprintf $fmt\n $@
  fi
 
 I never understood why people don't trust the tools they are using...
 
 `[' is very very similar (if not identical) to test(1)
 
 [ ... ] is the same thing as [ -n ... ] or test -n ...
 [ ! ... ] is the same things as [ -z ... ] or test -z ...
 
 I'll never understand why people have to throw an extra letter in there and 
 then compare it to that letter.
 
 If the variable expands to nothing, go ahead and let it. I've traced
 every possible expansion of variables when used in the following
 manner:

 [ $VAR ] ...

 and it never fails. If $VAR is anything but null, the entire
 expression will evaluate to true.

Right, except in very old implementations if VAR is -t, but those should
be extinct and are not worth coding for.

However, in some fairly recent implementations [ $A = $B ] does not
work as expected for all values of A and B. For example, FreeBSD 7.0
returns true for A='(' and B=')' because it applies the POSIX rules in
the wrong order (certainly, [ '(' $X ')' ] is true for most non-empty
values of X, but this does not apply if X is an operator).

  # depend $name [ $dependency ... ]
  #
  # Add a dependency. Die with error if dependency is not met.
  #
  : dependency checks performed after depend-function declaration
  : function ; depend ( ) # $name [ $dependency ... ]
  {
 local by=$1 arg
 shift 1
  
 for arg in $@; do
 local d

  Wouldn't it be better to declare this outside of the loop (I'm not
  sure how optimal it is to place it inside the loop)?

 I'm assuming you mean the local d statement. There's no restriction
 that says you have to put your variable declarations at the beginning
 of a block (like in C -- even if only within a superficial block { in
 the middle of nowhere } ... like that).

 Meanwhile, in Perl, it's quite a difference to scope it to the loop
 rather than the block. So, it all depends on whichever _looks_ nicer
 to you ^_^

Although this works, I do not recommend it. It makes the impression that
the variable is scoped to the do block, which is not the case as there
is only function scope. Also, FreeBSD sh will register another local
variable record if you make the same variable local again (but it will
reset the variable to the correct value later).

  I'd say that you have bigger fish to try if your shell lacks the
  needed lexicon to parse built-ins like for, do, local, etc :)...

local might be worth checking for as the original Bourne shell and ksh93
don't know it.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Why I can't trace linux process's childs with truss?

2010-09-10 Thread Jilles Tjoelker
On Fri, Sep 10, 2010 at 12:07:05PM -0700, Yuri wrote:
 I am trying to get the log of all system calls that skype makes with 
 truss -f /usr/local/share/skype/skype
 For some reason the resulting log only has the leading process calls and 
 nothing from it's 8 childs.
 Truss doesn't show any 'cloned' processes. Is this a bug in truss that 
 it doesn't follow 'cloned' processes?

 Is there any workaround or other way I can debug skype? strace doesn't 
 work on amd64.
 I am primarily interested why it can't read /dev/video0 device, created 
 by webcamd.

Try using ktrace instead of truss. You will need devel/linux_kdump from
ports to decode the resulting ktrace.out.

Alternatively, if you're familiar with dtrace, you could try that.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: ar(1) format_decimal failure is fatal?

2010-08-29 Thread Jilles Tjoelker
On Sat, Aug 28, 2010 at 07:08:34PM -0400, Benjamin Kaduk wrote:
 I've been working on fixing the OpenAFS network filesystem client for 
 FreeBSD, and it's at the point where I want to use the lazy man's 
 filesystem stress test: buildworld.
 However, quite early in the process, I get the following error:
  stage 1.1: legacy release compatibility shims
 [...]
 === tools/build (obj,includes,depend,all,install)
 [...]
 building static egacy library
 ar: fatal: Numeric user ID too large
 *** Error code 70

 This error appears to be coming from 
 lib/libarchive/archive_write_set_format_ar.c , which seems to only have 
 provisions for outputting a user ID in AR_uid_size = 6 columns.
 Now, AFS user IDs are not tied to unix IDs; there is a separate protection 
 server whose pts IDs are used for access control; these pts IDs are tied 
 to kerberos authentication.  In this case, I was authenticated as 
 daemon/freebuild.mit@athena.mit.edu, with pts ID 33554737, which needs 
 8 columns to display.

 It looks like this macro was so defined in version 1.1 of that file, with 
 commit message 'ar' format support for libarchive, contributed by Kai 
 Wang..  This doesn't make it terribly clear whether the 'ar' format 
 mandates this length, or if it is an implementation decision, so I get to 
 ask: what reasoning (if any) was behind this choice?  Would anything break 
 if it was bumped up to a larger size?  Are there other options for a 
 workaround in my AFS environment?

I wonder if the uid/gid fields are useful at all for ar archives. Ar
archives are usually not extracted, and when they are, the current
user's values seem good enough. The uid/gid also prevent exactly
reproducible builds (together with the timestamp).

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: tiny patch to prevent head from closing pipes

2010-08-29 Thread Jilles Tjoelker
On Sat, Aug 28, 2010 at 06:42:29PM +0400, Anonymous wrote:
 Alexander Best arun...@freebsd.org writes:

  i just had subversion complain about a broken pipe while piping its
  output through awk straight to head [1]. i decided to add a switch
  to head which will tell it to never close a pipe unless the input
  has stopped [2].

 You can do same with sh(1), e.g.

   $ svn log | (IFS=; while read li; do [ $((i+=1)) -le 10 ]  echo $li; 
 done)

 versus

   $ svn log | (IFS=; while read li  [ $((i+=1)) -le 10 ]; do echo $li; 
 done)
   ...
   svn: Write error: Broken pipe

Even easier (and avoiding mangling through read) is
  svn log | { head; cat /dev/null; }
Beware that the cat command will not necessarily get all of the input.
head may buffer reads and it is unable to push the extra data back onto
the pipe.

 But I think subversion should

I think Subversion should not print an error message for broken pipe on
its stdout.

  there's probably a much more efficient way of discarding the input
  without closing the pipe unless the input ceased. it's just a 5
  minute hack in order to see if people find the idea useful or not.
  ;)

 Can you give an example of usefulness that does not involve subversion?

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Help with some makefile hackery

2010-06-25 Thread Jilles Tjoelker
On Tue, Jun 22, 2010 at 11:18:43PM -0700, Patrick Mahan wrote:
 src-kern-tools:
  cd src; ./machine-kernel-toolchain.sh 21 | tee logfile

The pipeline will return the status of 'tee' which is almost always 0.
The exit status of the build script is ignored.

A simple fix is store the status in a file and refer to that afterwards.
Another fix uses a separate file descriptor to pass through the status,
like
  { st=$(
{
  { ./kernel-toolchain.sh 21 3- 4-; echo $? 3; } | tee logfile 4
} 31)
  } 41
but this is fairly complicated.

The issue can be sidestepped entirely by sending the output to a file
only; a developer can use tail -f to follow the build if necessary.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: libc symbol versioning difficulties with iconv integration

2010-06-04 Thread Jilles Tjoelker
On Fri, Jun 04, 2010 at 12:58:34PM +0200, Gabor Kovesdan wrote:
 I'm trying to integrate the result of my last SoC work to the base 
 system but I'm facing some difficulties with libc symbol versioning. I 
 placed the iconv code into an iconv subdirectory inside src/lib/libc and 
 I added a Makefile and a symbol map, just like another parts of libc do 
 but when I try to compile this stuff, I get this error in the linking phase:

 building shared library libc.so.7
 /usr/bin/ld: libc.so.7: undefined versioned symbol namefts_o...@fbsd_1.0
 /usr/bin/ld: failed to set dynamic section sizes: Bad value
 *** Error code 1

 I have no idea what's going wrong because I did everything exactly in 
 the same way as another components do. I don't know why does it break at 
 fts_open(), which is unrelated to iconv, not even used in the iconv 
 code. If I just unhook the iconv part fromt he build, everything goes 
 fine. Any ideas?

 Patch is here: http://kovesdan.org/patches/iconv-libc.diff

There is a  .include bsd.lib.mk  in iconv/Makefile.inc, what happens
if you take that out?

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: /dev/null zero inside chroot for make release

2010-05-15 Thread Jilles Tjoelker
On Sat, May 15, 2010 at 09:12:04AM +0200, Tjado Mäcke wrote:
 Am 13.05.2010 19:44, schrieb Julian H. Stacey:
  Hi Hackers,
  Problem with /dev/null  /dev/zero inside a chroot:
  I wanted to build a release from inside a chroot

  What sort of null  zero should be in chroot ?
  man mknod ... deprecated ...
  Should I be running a devfs (I'm not currently)
  Or a jail ? (I dont really want that level of encapsulation ).

 http://www.ijs.si/software/amavisd/README.chroot

 mknod dev/nullc  2 2   # FreeBSD

This is deprecated as of FreeBSD 5.0 and does not work at all as of
FreeBSD 6.0, see mknod(8). Only device nodes in devfs can be used to
access devices.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: /dev/null zero inside chroot for make release

2010-05-13 Thread Jilles Tjoelker
On Thu, May 13, 2010 at 07:44:58PM +0200, Julian H. Stacey wrote:
 Problem with /dev/null  /dev/zero inside a chroot:
 I wanted to build a release from inside a chroot

 What sort of null  zero should be in chroot ?
 man mknod ... deprecated ...
 Should I be running a devfs (I'm not currently)
 Or a jail ? (I dont really want that level of encapsulation ).

Mount devfs in your chroot. Then use devfs(8) to limit what's visible in
that particular devfs, if you want.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: [patch] [RFC] pathchk quiet flag

2010-04-27 Thread Jilles Tjoelker
On Mon, Apr 26, 2010 at 08:57:31PM +0300, Eitan Adler wrote:
 This path adds a -q flag to stop pathchk from outputting error
 messages but still return an error code.

Why do you need this; can't you do something like
  pathconf -- $pathname 2/dev/null

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


libc NLS, NFS mounted /usr/local, DHCP, no default route causes hangs

2010-04-23 Thread Jilles Tjoelker
The changes to use NLS for strerror sometimes cause one of my virtual
machines to deadlock. This virtual machine runs 9-CURRENT, acquires its
IP address via DHCP (virtualbox host-only networking), has no default
route and has /usr/local and /usr/home NFS mounted.

When the DHCP lease expires such as by resetting the date after a VM
saverestore, one of the route(8) commands executed by
dhclient-script(8) fails and calls strerror(3). Following the default
NLSPATH, catopen(3) looks in /usr/share/nls first; because the catalog
is not there it then tries in /usr/local/share/nls which deadlocks
because the network is not available.

I currently use the attached patch which returns failure on any attempt
to open a catalog for language C, but I think this is not correct.
Am I using a configuration that is not supposed to work (NFS mounted
/usr/local with DHCP in particular), or should this be fixed in some
other way?

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: libc NLS, NFS mounted /usr/local, DHCP, no default route causes hangs

2010-04-23 Thread Jilles Tjoelker
On Fri, Apr 23, 2010 at 04:17:36PM +0200, Jilles Tjoelker wrote:
 The changes to use NLS for strerror sometimes cause one of my virtual
 machines to deadlock. This virtual machine runs 9-CURRENT, acquires its
 IP address via DHCP (virtualbox host-only networking), has no default
 route and has /usr/local and /usr/home NFS mounted.

 When the DHCP lease expires such as by resetting the date after a VM
 saverestore, one of the route(8) commands executed by
 dhclient-script(8) fails and calls strerror(3). Following the default
 NLSPATH, catopen(3) looks in /usr/share/nls first; because the catalog
 is not there it then tries in /usr/local/share/nls which deadlocks
 because the network is not available.

 I currently use the attached patch which returns failure on any attempt
 to open a catalog for language C, but I think this is not correct.
 Am I using a configuration that is not supposed to work (NFS mounted
 /usr/local with DHCP in particular), or should this be fixed in some
 other way?

The patch:

Index: lib/libc/nls/msgcat.c
===
--- lib/libc/nls/msgcat.c   (revision 206760)
+++ lib/libc/nls/msgcat.c   (working copy)
@@ -138,6 +138,9 @@
lang = C;
}
 
+   if (strcmp(lang, C) == 0)
+   NLRETERR(ENOENT);
+
/* Try to get it from the cache first */
RLOCK(NLERR);
SLIST_FOREACH(np, cache, list) {

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: System() returning ECHILD error on FreeBSD 7.2

2010-02-10 Thread Jilles Tjoelker
On Wed, Feb 10, 2010 at 12:44:57PM +0530, Naveen Gujje wrote:
 [SIGCHLD handler that calls waitpid()]

 And, in some other part of the code, we call system() to add an ethernet
 interface. This system() call is returning -1 with errno set to ECHILD,
 though the passed command is executed successfully.  I have noticed that,
 the problem is observed only after we register SigChildHandler. If I have a
 simple statement like system(ls) before and after the call to
 signal(SIGCHLD, SigChildHandler), the call before setting signal handler
 succeeds without errors and the call after setting signal handler returns -1
 with errno set to ECHILD.

 Here, I believe that within the system() call, the child exited before the
 parent got a chance to call _wait4 and thus resulted in ECHILD error. But,
 for the child to exit without notifying the parent, SIGCHLD has to be set to
 SIG_IGN in the parent and this is not the case, because we are already
 setting it to SigChildHandler. If I set SIGCHLD to SIG_DFL before calling
 system() then i don't see this problem.

 I would like to know how setting SIGCHLD to SIG_DFL or SigChildHanlder is
 making the difference.

I think your process is multi-threaded. In a single-threaded process,
system()'s signal masking will ensure it will reap the zombie, leaving
the signal handler with nothing (in fact, as of FreeBSD 7.0 it will not
be called at all unless there are other child processes).

In a multi-threaded process, each thread has its own signal mask and
system() can only affect its own thread's signal mask. If another thread
has SIGCHLD unblocked, the signal handler will race with system() trying
to call waitpid() first.

Possible fixes are using siginfo_t information to only waitpid() child
processes you know about, setting up the signal masks so the bad
situation does not occur (note that the signal mask is inherited across
pthread_create()) and calling fork/execve and managing the child process
exit yourself.

Note that POSIX does not require system() to be thread-safe.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: ps time field jumps backward

2010-02-07 Thread Jilles Tjoelker
On Fri, Feb 05, 2010 at 05:45:01PM -0500, Linda Messerschmidt wrote:
 On Fri, Feb 5, 2010 at 4:28 PM, Dan Nelson dnel...@allantgroup.com wrote:
  Ideally, top and ps would total up all
  the per-thread CPU counts when displaying the per-process numbers, but it
  doesn't seem to.

 It does seem to total them:

 $ ps axHo pid,lwp,time,wchan,comm | awk '$1 == 1647'
  1647 100401   0:00.63 select mysqld
  1647 100466   0:11.08 sigwai mysqld
  1647 100521   0:00.00 ucond  mysqld
 $ ps axo pid,lwp,time,wchan,comm | awk '$1 == 1647'
  1647 100521   0:11.71 ucond  mysqld

 But you put me on the right track.  I ran both side by side for
 awhile, and found that ps/top only sums up those threads that haven't
 already exited.  I.e., once a thread has exited, it's as if its usage
 never happened from the perspective of ps and top's total calculation.

 That seems like undesirable behavior, particularly if it conceals
 CPU-churning behavior by short-lived threads, but possibly very hard
 to change. :(

 I wonder if the system accounting records are more accurate?

It should not be very hard to fix the time field, because the rusage
struct is correct. This can be seen in the ^T status line and, after the
process has terminated, time, times, ps S.

In the kernel code, sys/kern/kern_proc.c, it seems that
fill_kinfo_proc_only() puts in the correct ki_runtime using
p-p_rux.rux_runtime, but fill_kinfo_aggregate() later overwrites this
using the sum of all threads' td-td_runtime. Removing the bogus
calculation from fill_kinfo_aggregate() fixes ps(1)'s time field, but
not the %cpu field, nor anything in top(1). The latter is because top(1)
always requests information about threads and does the same wrong
calculation as fill_kinfo_aggregate() internally.

Fixing the %cpu field needs changes to the scheduler. The schedulers
have functions to propagate CPU usage from terminating child processes
and threads, but this seems to affect scheduling decisions only and not
the %cpu field. Note that the CPU usage is always charged to the most
recently created thread in the process, not necessarily the thread that
called or will call fork(), pthread_create(), waitpid() or
pthread_join().

If the thread charged to could be selected better, it could be useful to
add in the cpu time as well.

Index: sys/kern/kern_proc.c
===
--- sys/kern/kern_proc.c(revision 203549)
+++ sys/kern/kern_proc.c(working copy)
@@ -676,11 +676,9 @@
 
kp-ki_estcpu = 0;
kp-ki_pctcpu = 0;
-   kp-ki_runtime = 0;
FOREACH_THREAD_IN_PROC(p, td) {
thread_lock(td);
kp-ki_pctcpu += sched_pctcpu(td);
-   kp-ki_runtime += cputick2usec(td-td_runtime);
kp-ki_estcpu += td-td_estcpu;
thread_unlock(td);
}

Test program that starts threads that waste about 1 second of cpu time
each.

#include pthread.h
#include stdio.h
#include unistd.h
#include err.h

void *
threadfunc(void *unused)
{
int error;

error = pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);
if (error)
errc(1, error, 
pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS));
for (;;)
;
}

int
main(int argc, char *argv[])
{
int error;
pthread_t td;

for (;;)
{
error = pthread_create(td, NULL, threadfunc, NULL);
if (error != 0)
errc(1, error, pthread_create);
sleep(1);
error = pthread_cancel(td);
if (error != 0)
errc(1, error, pthread_cancel);
error = pthread_join(td, NULL);
if (error != 0)
errc(1, error, pthread_join);
}

return 0;
}

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Unique process id (not pid) and accounting daemon

2010-01-29 Thread Jilles Tjoelker
On Mon, Jan 25, 2010 at 02:33:35AM +0300, cronfy wrote:
 I am trying to create an accounting daemon that would be more precise
 than usual BSD system accounting. It should read the whole process
 tree from time to time (say, every 10 seconds) and log changes in
 usage of CPU, I/O operations and memory per process. After daemon
 notices process exit, it should read /var/account/acct to get a last
 portion of accounting data and make a last entry for the process. Also
 daemon should read /var/account/acct to find information about
 processes that had been running between taking process tree snapshots.

 There is a problem: it is not always possible to link a process in a
 process tree against matching process in an accounting file. Only
 command name, user/group id  and start time will match, but:

  * start time may change (i. e. after ntpdate);
  * command name saved in /var/account/acct is 15 characters max
 (AC_COMM_LEN in sys/sys/acct.h), while command name in the process
 tree is 19 characters max (MAXCOMLEN in sys/sys/param.h).

 To ensure that process in the process tree and process in the
 accounting file are the same, I want to add unique process identifier
 (uint64_t) to 'proc' struct in sys/sys/proc.h and increment it for
 every process fork. I see it is possible to do this just before
 sx_sunlock() in fork1() in sys/kern/kern_fork.c. I'll have to add
 saving of this identifier in kern_acct.c, of course.

 This way I will be extremely easy to remember a process in the process
 tree and find a matching one in the accounting file after it finishes.

 Am I looking in a right direction or should I try some other way?
 Thanks in advance.

Have you looked at audit(4)?

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Greetings... a patch I would like your comments on...

2010-01-22 Thread Jilles Tjoelker
On Fri, Jan 22, 2010 at 07:55:47AM -0800, Randall Stewart wrote:
 On Jan 22, 2010, at 7:33 AM, Ivan Voras wrote:

  On 01/22/10 16:10, Ed Schouten wrote:
  * Ivan Vorasivo...@freebsd.org  wrote:
  This is a good and useful addition! I think Windows has  
  implemented a
  generalization of this (called wait objects or something like  
  that),
  which effectively allows a select()- (or in this case kqueue())-like
  syscall to wait on both file descriptors and condvars (as well as
  probably other MS-style objects). It's useful for multiplexing  
  events
  for dissimilar sources.

  NtWaitForSingleObject(), NtWaitForMultipleObjects(), etc. :-)

  Yes, I was thinking about WaitForMultipleObjects() - I sometimes  
  wished I had it in FreeBSD :)

  I think the hackers@ side of the thread is missing the original link  
  to the patch file offered for review, so here it is:

  http://people.freebsd.org/~rrs/kque_umtx.patch

Cool.

  My kqueue-fu level is too low to be really useful here but from what  
  I've read it looks like a logical and even reasonably clean way of  
  doing it.

 thanks it made sense to me ;-)

  If I read the comment at filt_umtxattach() correctly, in the best  
  case you would need an extension to the kevent structure to add more  
  fields like data  udata (for passing values back and forth between  
  userland and kernel). I agree with this - it would be very  
  convenient for some future purposes (like file modification  
  notification) if the kernel filter could both accept and return a  
  struct of data from/to the userland.

 Yeah, more arguments inside the kevent would allow me to add the  
 COND_CV_WAIT* where a lock and condition are passed
 in as well... But I was hesitant to add more than was already there  
 since doing
 so would cause ABI ripples that I did not want to face.

I don't think passing the lock is needed.

Traditional way (error handling omitted):

pthread_mutex_lock(M);
while (!P)
pthread_cond_wait(C, M);
do_work();
pthread_mutex_unlock(M);

The thread must be registered as a waiter before unlocking the mutex,
otherwise a wakeup could be lost.

Possible kqueue way (error handling omitted):

kq = kqueue();
pthread_mutex_lock(M);
while (!P)
{
pthread_cond_kqueue_register_wait_np(C, kq);
pthread_mutex_unlock(M);
nevents = kevent(kq, NULL, 0, events, 1, NULL);
... handle other events ...
pthread_mutex_lock(M);
}
do_work();
pthread_mutex_unlock(M);
close(kq);

To avoid lost wakeup, the kqueue must be registered as a waiter before
unlocking the mutex. Once it has been registered, it is safe to unlock
the mutex as the kqueue will store the fact that the condition has been
signalled. Hence, pthread_cond_kqueue_register_wait_np() or however it
will be named does not need the mutex explicitly.

kevent() needs to return some sort of identifier of C, possibly via
kevent.udata. In that case the hack with fflags and data remains needed.

Registering multiple condvars in one function call could be nasty, as it
requires all the mutexes to be locked simultaneously.

On the other hand if you do want registration and waiting in the same
function, like kqueue can do with file descriptors, then it is needed to
pass all the locks so they can be unlocked after registering and before
waiting. I think this results in a rather complicated API though.

Adding a mutex to a kqueue probably means that mutexes are used wrongly:
they should not be locked for so long that it is desirable to wait for
one together with other objects (pthread_mutex_timedlock() is meant as a
fail-safe).

Semaphores could also be waited for via kqueue, probably via a version
of sem_trywait() that registers the kqueue as a waiter. Once this is
signalled, the thread can call the function again to obtain the
semaphore or register again if another thread obtained the semaphore
first.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: [PATCH] linprocfs dofilesystems

2010-01-10 Thread Jilles Tjoelker
On Fri, Jan 08, 2010 at 06:19:29PM +0100, Fernando Apesteguía wrote:
 This patch implements the filesystems file in the linux proc fs.
 I have used it for some time without seeing any problems. Let me
 know in case this is useful.

Concrete examples of Linux apps that start working or work better with
this change will be helpful.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: linprocfs Input/output error

2010-01-01 Thread Jilles Tjoelker
On Fri, Jan 01, 2010 at 06:45:33PM +0100, Fernando Apesteguía wrote:
 Hi all, I post here cause I didn't get any answers in freebsd-emulation.

 In 8.0-RELEASE-p1 if I try to execute this:

 cat /compat/linux/proc/cpuinfo  ~/cpuinfo.txt

 I get

 cat: /compat/linux/proc/cpuinfo: Input/output error

 truss shows the read system call returns ERR#5. It is the same with
 other files from linprocfs.

 cat /compat/linux/proc/[any_file] works fine

 What am I missing?

You are not missing anything, this is a bug. A while ago, cat(1) was
changed to use larger buffers when writing to regular files on machines
with sufficient RAM, usually a multiple of MAXPHYS. On the other hand,
pseudofs (the general framework for filesystems such as procfs,
linprocfs and linsysfs) in src/sys/fs/pseudofs/pseudofs_vnops.c
pfs_read() fails any read over MAXPHYS + 1 with EIO. This limit probably
has to do with the allocation of a buffer of that size using sbuf_new(9)
(and so, malloc(9)).

Some sort of limit seems appropriate, but MAXPHYS seems unrelated, and
if the request is too long it should perhaps just truncate it.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


[PATCH] fts(3): allow FTS_LOGICAL without FTS_NOCHDIR

2010-01-01 Thread Jilles Tjoelker
fts(3) currently automatically enables FTS_NOCHDIR if FTS_LOGICAL (-L in
various utilities) is used, according to a comment because symlinks are
too hard. However, the effect of fts_instr=FTS_FOLLOW seems quite
similar to FTS_LOGICAL in chdir mode. Using one more bit in fts_flags, I
made FTS_LOGICAL work with chdir.

This has several advantages:
* it runs faster
* it eliminates the PATH_MAX restriction on deep directory trees
* find -L ... -type l -delete can be made to work without ugliness
  (although it is of course still insecure if run over directory trees
  where untrusted users have write access)

Question: why are the values for fts_flags in the public header file
fts.h, even though they are supposedly private?

In case the mailing list eats the patch, it is also available here:
http://www.stack.nl/~jilles/unix/fts-logical-chdir.patch

-- 
Jilles Tjoelker
Index: include/fts.h
===
--- include/fts.h	(revision 201269)
+++ include/fts.h	(working copy)
@@ -106,6 +106,7 @@
 #define	FTS_DONTCHDIR	 0x01		/* don't chdir .. to the parent */
 #define	FTS_SYMFOLLOW	 0x02		/* followed a symlink to get here */
 #define	FTS_ISW		 0x04		/* this is a whiteout object */
+#define	FTS_DIRSYM	 0x08		/* this is a symlink to a directory */
 	unsigned fts_flags;		/* private flags for FTSENT structure */
 
 #define	FTS_AGAIN	 1		/* read node again */
Index: lib/libc/gen/fts.c
===
--- lib/libc/gen/fts.c	(revision 201269)
+++ lib/libc/gen/fts.c	(working copy)
@@ -141,10 +141,6 @@
 	/* Shush, GCC. */
 	tmp = NULL;
 
-	/* Logical walks turn on NOCHDIR; symbolic links are too hard. */
-	if (ISSET(FTS_LOGICAL))
-		SET(FTS_NOCHDIR);
-
 	/*
 	 * Start out with 1K of path space, and enough, in any case,
 	 * to hold the user's paths.
@@ -355,8 +351,10 @@
 		/* If skipped or crossed mount point, do post-order visit. */
 		if (instr == FTS_SKIP ||
 		(ISSET(FTS_XDEV)  p-fts_dev != sp-fts_dev)) {
-			if (p-fts_flags  FTS_SYMFOLLOW)
+			if (p-fts_flags  FTS_SYMFOLLOW) {
 (void)_close(p-fts_symfd);
+p-fts_symfd = -1;
+			}
 			if (sp-fts_child) {
 fts_lfree(sp-fts_child);
 sp-fts_child = NULL;
@@ -385,6 +383,14 @@
 		 * FTS_STOP or the fts_info field of the node.
 		 */
 		if (sp-fts_child != NULL) {
+			if (p-fts_flags  FTS_DIRSYM 
+			!(p-fts_flags  FTS_SYMFOLLOW)) {
+if ((p-fts_symfd = _open(., O_RDONLY, 0))  0) {
+	p-fts_errno = errno;
+	p-fts_info = FTS_ERR;
+} else
+	p-fts_flags |= FTS_SYMFOLLOW;
+			}
 			if (fts_safe_changedir(sp, p, -1, p-fts_accpath)) {
 p-fts_errno = errno;
 p-fts_flags |= FTS_DONTCHDIR;
@@ -674,8 +680,8 @@
 		nlinks = 0;
 		/* Be quiet about nostat, GCC. */
 		nostat = 0;
-	} else if (ISSET(FTS_NOSTAT)  ISSET(FTS_PHYSICAL)) {
-		if (fts_ufslinks(sp, cur))
+	} else if (ISSET(FTS_NOSTAT)) {
+		if (ISSET(FTS_PHYSICAL)  fts_ufslinks(sp, cur))
 			nlinks = cur-fts_nlink - (ISSET(FTS_SEEDOT) ? 0 : 2);
 		else
 			nlinks = -1;
@@ -707,7 +713,19 @@
 	 */
 	cderrno = 0;
 	if (nlinks || type == BREAD) {
-		if (fts_safe_changedir(sp, cur, dirfd(dirp), NULL)) {
+		if (cur-fts_flags  FTS_DIRSYM 
+		!(cur-fts_flags  FTS_SYMFOLLOW)) {
+			if ((cur-fts_symfd = _open(., O_RDONLY, 0))  0) {
+if (nlinks  type == BREAD)
+	cur-fts_errno = errno;
+cur-fts_flags |= FTS_DONTCHDIR;
+descend = 0;
+cderrno = errno;
+			} else
+cur-fts_flags |= FTS_SYMFOLLOW;
+		}
+		if (!(cur-fts_flags  FTS_DONTCHDIR) 
+		fts_safe_changedir(sp, cur, dirfd(dirp), NULL)) {
 			if (nlinks  type == BREAD)
 cur-fts_errno = errno;
 			cur-fts_flags |= FTS_DONTCHDIR;
@@ -796,7 +814,8 @@
 		} else if (nlinks == 0
 #ifdef DT_DIR
 		|| (nostat 
-		dp-d_type != DT_DIR  dp-d_type != DT_UNKNOWN)
+		dp-d_type != DT_DIR  dp-d_type != DT_UNKNOWN 
+		(dp-d_type != DT_LNK || ISSET(FTS_PHYSICAL)))
 #endif
 		) {
 			p-fts_accpath =
@@ -883,7 +902,7 @@
 	FTSENT *t;
 	dev_t dev;
 	ino_t ino;
-	struct stat *sbp, sb;
+	struct stat *sbp, sb, sb2;
 	int saved_errno;
 
 	/* If user needs stat info, stat buffer already allocated. */
@@ -923,6 +942,20 @@
 
 	if (S_ISDIR(sbp-st_mode)) {
 		/*
+		 * Check if this is actually a symlink to a directory.
+		 * If so, record this fact so we save a file descriptor
+		 * to get back instead of using ...
+		 */
+		if (ISSET(FTS_LOGICAL)  !follow) {
+			if (lstat(p-fts_accpath, sb2)) {
+p-fts_errno = errno;
+goto err;
+			}
+			if (S_ISLNK(sb2.st_mode))
+p-fts_flags |= FTS_DIRSYM;
+		}
+
+		/*
 		 * Set the device/inode.  Used to find cycles and check for
 		 * crossing mount points.  Also remember the link count, used
 		 * in fts_build to limit the number of stat calls.  It is
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

why does _PATH_STDPATH contain the current directory?

2009-12-23 Thread Jilles Tjoelker
/usr/include/paths.h has:
/* All standard utilities path. */
#define _PATH_STDPATH   /usr/bin:/bin:/usr/sbin:/sbin:

The current directory appears to have been added accidentally years ago.
Can I go ahead and take it out (the colon)?

The paths for rescue already do not have the current directory.

The main use for _PATH_STDPATH is to find all standard utilities, such
as for the POSIX-specified confstr(_CS_PATH), 'getconf PATH' and
'command -p'. Some utilities also use it instead of _PATH_DEFPATH for
root, but these values tend to be overridden by /etc/login.conf and/or
shell startup files.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Workaround for ntop as daemon, is it ok?

2009-11-27 Thread Jilles Tjoelker
On Fri, Nov 27, 2009 at 09:19:11AM +0100, Henner Morten Kruse wrote:
 I have just set up an ntop server based on 8.0-RELEASE.

  FreeBSD ntop 8.0-RELEASE FreeBSD 8.0-RELEASE #0: Sat Nov 21 15:48:17 UTC 2009
  r...@almeida.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  i386

 After installing ntop 1.3.10 and all dependencies from the ports ntop
 did work, but when running ntop as a daemon I got permanent and repeating
 warning messages:

 [warn] kevent: Bad file descriptor

 [...]

 I found out that ntop forks another thread for the daemon and kills the
 initial one. The problem with this behaviour is that the kqueue is
 started by the initial thread and the daemon thread doesn't use the
 same file descriptors. So the kqueue is lost.

That's a process, not a thread.

 [...]

 When I change the fork() in line 186 to rfork(RFPROC) everything works
 and I get no more warning messages and procstat reports an existing
 kqueue for the daemonized ntop:

 So my question to this is:
 Is my workaround ok or could this cause any problems? And what is the cause
 of these warnings? Is it a bug or incapability in the kqueue implementation
 or is it caused by bad code in ntop?

Because it refers to other file descriptors, a kqueue is only really
meaningful in the file descriptor table it was created in (this could be
avoided for events that do not refer to file descriptors, and I think
Solaris's ports system does that).

As described in the kqueue(2) man page, calling rfork(2) without RFFDG
will allow sharing the kqueue between the two processes.

As long as the parent process exits quickly, or no tricks with file
descriptor numbers are done, this is pretty safe.

Another fix is to create the kqueue in the child process.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: CFR: Exceedingly minor fixes to libc

2009-11-13 Thread Jilles Tjoelker
On Fri, Nov 13, 2009 at 12:18:49AM -0500, Garrett Wollman wrote:
 If you have a moment, please take a look at the following patch.  It
 contains some very minor fixes to various parts of libc which were
 found by the clang static analyzer.  They fall into a few categories:

 1) Bug fixes in very rare situations (mostly error-handling code that
 has probably never been executed).

 2) Dead store elimination.

 3) Elimination of unused variables.  (Or in a few cases, making use of
 them.)

 Some minor style problems were also fixed in the vicinity.  There
 should be no functional changes except in very unusual conditions.

This looks mostly ok, with a few exceptions.

 [...]
 Index: gen/getpwent.c
 ===
 --- gen/getpwent.c(revision 199242)
 +++ gen/getpwent.c(working copy)
 @@ -257,22 +257,19 @@
 [...]
 @@ -1876,7 +1871,6 @@
   syslog(LOG_ERR,
getpwent memory allocation failure);
   *errnop = ENOMEM;
 - rv = NS_UNAVAIL;
   break;
   }
   st-compat = COMPAT_MODE_NAME;

I think this change hides the wrongness in the handling of malloc errors
while leaving it as broken as it is.

 [...]
 Index: net/getservent.c
 ===
 --- net/getservent.c  (revision 199242)
 +++ net/getservent.c  (working copy)
 @@ -476,11 +476,11 @@
   struct nis_state *st;
   int rv;
  
 - enum nss_lookup_type how;
   char *name;
   char *proto;
   int port;
  
 + enum nss_lookup_type how;
   struct servent *serv;
   char *buffer;
   size_t bufsize;
 @@ -491,7 +491,8 @@
  
   name = NULL;
   proto = NULL;
 - how = (enum nss_lookup_type)mdata;
 +
 + how = (intptr_t)mdata;
   switch (how) {
   case nss_lt_name:
   name = va_arg(ap, char *);

In what way is this an improvement?

 Index: posix1e/acl_delete_entry.c
 ===
 --- posix1e/acl_delete_entry.c(revision 199242)
 +++ posix1e/acl_delete_entry.c(working copy)
 @@ -89,20 +89,20 @@
   return (-1);
   }
  
 - if ((acl-ats_acl.acl_cnt  1) ||
 - (acl-ats_acl.acl_cnt  ACL_MAX_ENTRIES)) {
 + if ((acl_int-acl_cnt  1) ||
 + (acl_int-acl_cnt  ACL_MAX_ENTRIES)) {
   errno = EINVAL;
   return (-1);
   }

If you are changing this code anyway, consider fixing a style bug
(extraneous parentheses) here.

 - for (i = 0; i  acl-ats_acl.acl_cnt;) {
 - if (_entry_matches((acl-ats_acl.acl_entry[i]), entry_d)) {
 + for (i = 0; i  acl_int-acl_cnt;) {
 + if (_entry_matches((acl_int-acl_entry[i]), entry_d)) {
   /* ...shift the remaining entries... */
 - for (j = i; j  acl-ats_acl.acl_cnt - 1; ++j)
 - acl-ats_acl.acl_entry[j] =
 - acl-ats_acl.acl_entry[j+1];
 + for (j = i; j  acl_int-acl_cnt - 1; ++j)
 + acl_int-acl_entry[j] =
 + acl_int-acl_entry[j+1];
   /* ...drop the count and zero the unused entry... */
 - acl-ats_acl.acl_cnt--;
 - bzero(acl-ats_acl.acl_entry[j],
 + acl_int-acl_cnt--;
 + bzero(acl_int-acl_entry[j],
   sizeof(struct acl_entry));
   acl-ats_cur_entry = 0;
   

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: [patch] add pwait utility

2009-11-07 Thread Jilles Tjoelker
On Sat, Nov 07, 2009 at 03:01:36PM +0200, Kostik Belousov wrote:
 On Sat, Nov 07, 2009 at 11:28:32AM +0100, Gary Jennejohn wrote:
  On Fri, 6 Nov 2009 23:24:46 +0100
  Jilles Tjoelker jil...@stack.nl wrote:

   I propose adding a small new utility to /usr/bin: pwait. Similar to the
   Solaris utility of the same name, it waits for any process to terminate.

  Why not /bin so it can be used before /usr is mounted?
 And it seems to make sense to add this functionality to pkill/pgrep
 binary, creating another hardlink to it.

Hmm, pwait's syntax is incompatible: it takes pids (pkill says: use
kill) and the -v option does something totally different.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: help needed to fix contrib/ee crash/exit when receiving SIGWINCH

2009-10-24 Thread Jilles Tjoelker
On Fri, Oct 23, 2009 at 07:56:35PM +0400, Eygene Ryabinkin wrote:
 Hmm, we can transform this code to the following one:
 -
 errno = 0;
 do {
   in = wgetch(text_win);
 } while (errno == EINTR);
 if (in == -1)
   exit(0);
 -
 This won't help with FreeBSD's ncurses, but may be other variants
 will feel much better with such a event loop variant.

That should be:
-
do
in = wgetch(text_win);
while (in == -1  errno == EINTR);
if (in == -1)
exit(0);
-

errno should only be checked after failed function calls or for
functions where it is documented that errno should be used to check for
error conditions (example: readdir).

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: semaphores between processes

2009-10-22 Thread Jilles Tjoelker
On Thu, Oct 22, 2009 at 04:08:11PM -0400, Andrew Gallatin wrote:
 We then moved on to posix semaphores.  Using sem_wait/sem_post
 with the sem_t residing in a shared page seems to work on
 all 3 platforms.  However, the FreeBSD (7-stable) man page
 for sem_init(3) has this scary text regarding the pshared
 value:

   The sem_init() function initializes the unnamed semaphore pointed 
 to by
   sem to have the value value.  A non-zero value for pshared specifies a
   shared semaphore that can be used by multiple processes, which this
   implementation is not capable of.

 Is this text obsolete?  Or is my test just getting lucky?

They work, but only if the processes are related and do not exec and the
parent process initializes the semaphores before forking. This is
because sem_t is a pointer to a malloc'ed structure. For process-shared
semaphores this really only contains an identifier of the kernel
semaphore. This also means process-shared semaphores are slower than
in-process semaphores (libthr implements those using atomics and umtx so
that system calls are only needed if a thread needs to sleep or be
awakened).

This is documented in comments in the source code, but not in man pages
or other documentation.

 Is there recommended way to do this?

Apart from sysv semaphores, perhaps posix named semaphores (sem_open()
etc). These require a 'kldload sem' on older versions though.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: sigwait - differences between Linux FreeBSD

2009-10-09 Thread Jilles Tjoelker
On Thu, Oct 08, 2009 at 01:02:09PM +0300, Kostik Belousov wrote:
 On Thu, Oct 08, 2009 at 11:53:21AM +1100, Stephen Hocking wrote:
  In my efforts to make the xrdp port more robust under FreeBSD, I have
  discovered that sigwait (kind of an analogue to select(2), but for
  signals rather than I/O) re-enables ignored signals in its list under
  Linux, but not FreeBSD. The sesman daemon uses SIGCHLD to clean up
  after a session has exited. Under Linux this works OK, under FreeSBD
  it doesn't. I have worked around it in a very hackish manner (define a
  dummy signal handler and enable it using signal, which means that the
  sigwait call can then be unblocked by it), but am wondering if anyone
  else has run across the same problem, and if so, if they fixed it in
  an elegant manner. Also, does anyone know the correct semantics of
  sigwait under this situation?

 ports@ is the wrong list to discuss the issue in the base system.

 Solaris 10 sigwait(2) manpage says the following:
 If sigwait() is called on an ignored signal, then the occurrence of the
 signal will be ignored, unless sigaction() changes the disposition.

 We have the same behaviour as Solaris, ingored signals are not queued or
 recorded regardeless of the presence of sigwaiting thread.

POSIX permits both behaviours here: a blocked and ignored signal may or
may not be discarded immediately on generation. Making this depend on
whether there is a sigwaiting thread seems broken, and I don't think
Linux does that.

I think your very hackish approach is correct: set up a dummy signal
handler after blocking the signal. 

Additionally, POSIX requires applications to set the SA_SIGINFO flag if
they want queuing. This applies even if the signals are blocked and
received using sigwaitinfo(2) or sigtimedwait(2). The SA_SIGINFO flag
can only be set by setting a handler using sigaction(2). (Note, this
does not mean that all signals are queued if SA_SIGINFO is set. It means
that signals may not be queued if SA_SIGINFO is not set.)

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Problem in bin/sh stripping the * character through ${expansion%}

2009-09-11 Thread Jilles Tjoelker
On Fri, Aug 07, 2009 at 03:26:50AM +0400, Eygene Ryabinkin wrote:
 Thu, Aug 06, 2009 at 11:15:12AM -0700, Doug Barton wrote:
  I came across this problem during a recent portmaster update. When
  trying to strip off the * character using variable expansion in bin/sh
  it doesn't work. Other special characters do work if they are
  properly escaped.

  The attached mini-script clearly shows the problem:

  $ sh sh-strip-problem
  var before stripping: foo\*
  var after stripping: foo\*

  var before stripping: foo\$
  var after stripping: foo\

 According to the sh(1), it is not a problem.  Namely,
  - \* being unquoted at all will produce a lone '*';
  - '*' when treated as the smallest pattern, will result in a stripping
of a zero-length string -- it is the smallest pattern in the case of
'*' that matches anything.

That is indeed an explanation why it works that way, but I think it is
wrong. Generally, the shell command language avoids unnecessary levels
of quoting. In the POSIX spec, Shell Command Language, note the part
about ${x#*} (pattern) and ${x#*} (literal asterisk). Also compare
with  case $something in \*) echo asterisk;; esac  which matches a
literal asterisk.

Two PRs already exist for aspects of stripping: bin/57554 (double
quotes) and bin/117748 (trying to match pattern matching characters
literally).

 In order to strip the trailing star you should use
 -
 var=${var%[*]}
 -
 This gives you the pattern of '[*]' that is properly treated as the
 single star -- it's a weird way to escape the star in the patterns.

This is indeed a good workaround.

-- 
Jilles Tjoelker
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


  1   2   >