Re: question in sosend_generic()

2013-06-08 Thread Robert Watson


On Fri, 7 Jun 2013, vasanth rao naik sabavat wrote:

When sending data out of the socket I don't see in the code where the sb_cc 
is incremented.


sb_cc reflects data appended to the socket buffer; sosend_generic() is 
responsible for arranging copying in and performing flow control, but the 
protocol's own pru_send() routine performs the append.  E.g., tcp_usr_send() 
performs sbappendstream() which actually adds it to the socket buffer. 
Notice that not all protocols actually use the send socket buffer -- for 
example, UNIX domain sockets directly cross-deliver to the receiving socket's
receive buffer.


Is the socket send performed in the same thread of execution or the data is 
copied on to the socket send buffer and a different thread then sends the 
data out of the socket?


Protocols provide their own implementations to handle data moving down the 
stack, so the specifics are protocol-dependent.  In TCP, socket buffer append 
occurs synchronously in the same thread as part of the pru_send() downcall 
from the socket layer.  When data leaves the send socket buffer is quite a 
different question.  For TCP, data may be sent immediately if the various 
windows allow immediate transmit of the data (e.g., flow control, congestion 
control) ... or it may remain enqueued in the send socket buffer until an ACK 
is received that indicates the receiver is ready for more data (e.g., growing 
window size, ACK clocking, etc).  In the steady send state (e.g., filling the 
window) I would expect to see data sent (and later removed) from the socket 
buffer only in an asynchronous context.  Typically, ACK processing occurs in 
one of two threads: device driver interrupt handling (i.e., in the ithread) or 
in the netisr thread for encapsulated or looped back traffic.


Because I see a call to sbwait(&so->so_snd) in sosend_generic(), and I 
don't understand who would wake up this thread?


sbwait() implements blocking for flow/congestion control: when the socket 
buffer fills, the sending thread must wait for space to open up.  Space 
becomes available as a result of successful transmit -- e.g., the sbtruncate() 
of the sending socket buffer when a TCP ACK has been received.  So the thread 
that triggers the wakeup will usually be the ithread or netisr.  In the case 
of UNIX domain sockets, it's actually the receiving thread that triggers the 
wakeup directly.



If the data is not copied on to the socket buffers, then it should 
technically send all data out in the same thread of execution, and future 
socket send calls should see that space is always fully available. In that 
case I don't see a reason why we need to wait on the socket send buffer, as 
there would be no one to actually wake you up.


There are some false assumptions here.  The sending thread will always append 
data [that fits] to the socket buffer, but may have to loop awaiting space for 
all data, depending on blocking/non-blocking status.  Space becomes available 
when the remote endpoint acknowledges receipt, perhaps via a TCP ACK.  You 
might never wake up if the remote endpoint's flow control never lets space 
become available, you've enabled blocking, and no timeout is set.  If you 
fear the recipient may block the sender, then you need to implement some 
timeout mechanism to decide how long you're willing to wait.
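One way an application can bound that wait, sketched here under stated assumptions: set a send timeout with the standard SO_SNDTIMEO socket option, so that a blocking send() that cannot make progress gives up with EAGAIN/EWOULDBLOCK instead of sleeping indefinitely. A local AF_UNIX stream socketpair whose peer never reads stands in for a remote endpoint that never opens its window; the function name and the 100 ms timeout are illustrative choices, not anything from the thread.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>

#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/*
 * Fill a stream socket whose peer never reads, with a 100 ms send timeout.
 * Returns the number of bytes accepted; *errp receives the errno of the
 * send() that finally failed (EAGAIN/EWOULDBLOCK when the timeout fired).
 */
static size_t
fill_until_timeout(int *errp)
{
	struct timeval tv = { .tv_sec = 0, .tv_usec = 100000 };
	char chunk[4096];
	size_t total = 0;
	int sv[2];

	*errp = 0;
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) {
		*errp = errno;
		return (0);
	}
	/* Bound how long a blocking send() may sleep waiting for space. */
	if (setsockopt(sv[0], SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) == -1) {
		*errp = errno;
		goto out;
	}
	memset(chunk, 'x', sizeof(chunk));
	for (;;) {
		ssize_t n = send(sv[0], chunk, sizeof(chunk), 0);

		if (n == -1) {
			*errp = errno;	/* timeout, or some other error */
			break;
		}
		total += (size_t)n;	/* may be a partial count near the limit */
	}
out:
	close(sv[0]);
	close(sv[1]);
	return (total);
}
```

If the timer fires after partial progress, send() typically reports the bytes it did enqueue, so the error surfaces on the following call; the loop handles both cases. poll(2) with a timeout is an alternative that leaves the socket's own blocking behaviour untouched.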



   if (space < resid + clen &&
       (atomic || space < so->so_snd.sb_lowat || space < clen)) {
       if ((so->so_state & SS_NBIO) || (flags & MSG_NBIO)) {
           SOCKBUF_UNLOCK(&so->so_snd);
           error = EWOULDBLOCK;
           goto release;
       }
       error = sbwait(&so->so_snd);
       SOCKBUF_UNLOCK(&so->so_snd);
       if (error)
           goto release;
       goto restart;
   }
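To make the control flow of that snippet concrete, here is a userland model of it (a sketch, not kernel code): a mutex stands in for SOCKBUF_LOCK, pthread_cond_wait() for sbwait(), and a separate "ACK" routine plays the part of the ithread or netisr that drops acknowledged data and wakes sleepers. The struct and function names, the all-or-nothing append (as for an atomic protocol), and the use of ECONNRESET for a severed connection are assumptions made for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Userland model of a send socket buffer; names are hypothetical. */
struct sendbuf {
	pthread_mutex_t	mtx;		/* stands in for SOCKBUF_LOCK() */
	pthread_cond_t	cv;		/* senders sleep here, as in sbwait() */
	size_t		cc;		/* bytes enqueued, like sb_cc */
	size_t		hiwat;		/* capacity, like sb_hiwat */
	bool		severed;	/* connection timed out or was reset */
};

static void
sendbuf_init(struct sendbuf *sb, size_t hiwat)
{
	pthread_mutex_init(&sb->mtx, NULL);
	pthread_cond_init(&sb->cv, NULL);
	sb->cc = 0;
	sb->hiwat = hiwat;
	sb->severed = false;
}

/*
 * Append len bytes all at once.  Returns 0, EMSGSIZE if len can never fit,
 * EWOULDBLOCK if a non-blocking caller finds no space, or ECONNRESET if
 * the connection is (or becomes) severed.
 */
static int
sendbuf_append(struct sendbuf *sb, size_t len, bool nonblock)
{
	int error = 0;

	if (len > sb->hiwat)
		return (EMSGSIZE);	/* waiting could never succeed */
	pthread_mutex_lock(&sb->mtx);
	while (!sb->severed && sb->hiwat - sb->cc < len) {
		if (nonblock) {
			error = EWOULDBLOCK;
			goto out;
		}
		/* Like sbwait(): sleep until an ACK or a teardown wakes us. */
		pthread_cond_wait(&sb->cv, &sb->mtx);
	}
	if (sb->severed)
		error = ECONNRESET;
	else
		sb->cc += len;
out:
	pthread_mutex_unlock(&sb->mtx);
	return (error);
}

/* The "ACK" path: drop acknowledged bytes and wake any sleeping senders. */
static void
sendbuf_ack(struct sendbuf *sb, size_t acked)
{
	pthread_mutex_lock(&sb->mtx);
	assert(acked <= sb->cc);
	sb->cc -= acked;
	pthread_cond_broadcast(&sb->cv);
	pthread_mutex_unlock(&sb->mtx);
}

/* Timeout or RST: wake sleepers so they fail instead of waiting forever. */
static void
sendbuf_sever(struct sendbuf *sb)
{
	pthread_mutex_lock(&sb->mtx);
	sb->severed = true;
	pthread_cond_broadcast(&sb->cv);
	pthread_mutex_unlock(&sb->mtx);
}
```

A blocking caller in sendbuf_append() sleeps only until one of two things happens: an ACK frees space, or the connection is severed, in which case it returns an error rather than hanging.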

In the above code snippet, for a blocking socket, if the space is not
available, could it trigger a deadlock?


You can experience deadlocks between senders and receivers as a result of 
cyclic waits for constrained resources (e.g., buffers).  However, that is a 
property of application design, and applications that are killed will close 
their sockets, releasing resources.  Most application designers attempt to 
avoid deadlock in their designs by ensuring that there is a path to progress, 
even a slow one.


The deadlock you're suggesting in general does not exist -- it would be silly 
to wait for something that could never happen.  Instead, we wait for things 
that generally will happen (e.g., a TCP ACK) or a timeout, which would close 
the connection.  Notice that sbwait() is allowed to fail -- if the connection 
is severed due to a timeout or RST, then it returns immediately with an error.


Robert
___
freebsd-hackers@freebsd.org mailing list

Re: KVERIFY for non-debug invariants?

2012-12-06 Thread Robert Watson


On Wed, 5 Dec 2012, Vijay Singh wrote:

All, KASSERT() is a really neat way of expressing invariants when INVARIANTS 
is defined. However, for regular, non-INVARIANTS code folks have the typical 
if() panic() combos, or private macros. Would a KVERIFY() that does this in 
non-INVARIANTS code make sense?
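A minimal sketch of what such a macro could look like, assuming it keeps KASSERT()'s calling convention of an expression plus a parenthesised printf-style argument list. KVERIFY() is the hypothetical name from the question, not an existing FreeBSD macro. To keep the example runnable in userland, panic() below is a stand-in that logs and counts instead of halting; the kernel's panic(9) never returns.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#ifndef __predict_false
#define	__predict_false(exp)	__builtin_expect((exp), 0)
#endif

/*
 * Userland stand-in for panic(9), for this sketch only: it logs and counts
 * failures instead of halting, so the example can be exercised.
 */
static int panic_count;

static void
panic(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	fprintf(stderr, "panic: ");
	vfprintf(stderr, fmt, ap);
	fprintf(stderr, "\n");
	va_end(ap);
	panic_count++;
}

/*
 * Hypothetical KVERIFY(): same shape as KASSERT(exp, (fmt, ...)), but
 * compiled in unconditionally, whether or not INVARIANTS is defined.
 */
#define	KVERIFY(exp, msg) do {						\
	if (__predict_false(!(exp)))					\
		panic msg;						\
} while (0)
```

Because such a check would be present regardless of INVARIANTS, it seems best suited to cheap tests that must hold in production kernels; expensive checks would still belong under INVARIANTS.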


I'd certainly be fine with something like this.  It might be worth posting to 
arch@ with a code example, as hackers@ has a subset of the potentially 
interested audience.  INVARIANTS has got a bit heavier-weight over the years 
-- the main thing I run into in higher-performance scenarios is its additional 
UMA debugging, which causes a global lock to be acquired during sanity checks. 
It might be worth our pondering adding a new configure option for particularly 
slow invariant tests -- e.g., INVARIANTS_SLOW ... or maybe just 
INVARIANTS_UMA.  However, that's a different issue.


(I sort of feel that things labeled assert should be something we can turn 
on in production... so maybe INVARIANTS/KASSERT mission-creep is the issue.)


Robert
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: A question about creating a system call

2012-11-08 Thread Robert Watson

Hi Dave:

This wiki page may be of value:

http://wiki.freebsd.org/AddingAuditEvents

Robert N M Watson
Computer Laboratory
University of Cambridge

On Thu, 8 Nov 2012, dave jones wrote:


Hello,

I know how to create system calls, but I'm a bit confused about how the
sys/kern/syscalls.master file is laid out. For example, if I have a
foo system call, the following line is added:

532	AUE_NULL	STD	{ int foo(char *str); }

My question is about column two, AUE_NULL: can I replace it with AUE_FOO?
How do I determine whether the system call should be audited or not? Thank you.

Regards,
Dave.


Re: No bus_space_read_8 on x86 ?

2012-10-13 Thread Robert Watson


On Fri, 12 Oct 2012, Carl Delsey wrote:

Indeed -- and on non-x86, where there are uncached direct map segments, and 
TLB entries that disable caching, reading 2x 32-bit vs 1x 64-bit has quite 
different effects in terms of atomicity. Where uncached I/Os are being 
used, those differences may affect semantics significantly -- e.g., if your 
device has a 64-bit memory-mapped FIFO or registers, 2x 32-bit gives you 
two halves of two different 64-bit values, rather than two halves of the 
same value.  As device drivers depend on those atomicity semantics, we 
should (at the bus_space level) offer only the exactly expected semantics, 
rather than trying to patch things up.  If a device driver accessing 64-bit 
fields wants to support doing it using two 32-bit reads, it can figure out 
how to splice it together following bus_space_read_region_4().

I wouldn't make any default behaviour for bus_space_read_8 on i386, just 
amd64. My assumption (which may be unjustified) is that by far the most 
common implementation of a 64-bit register read on i386 would be to read the 
lower 4 bytes first, followed by the upper 4 bytes (or vice versa), and then 
stitch them together.  I think we should provide helper functions for these 
two cases, otherwise I fear our code base will be littered with multiple 
independent implementations of this.


Some driver writer who wants to take advantage of these helper functions 
would do something like

#ifdef i386
#define bus_space_read_8 bus_space_read_8_lower_first
#endif
otherwise, using bus_space_read_8 won't compile for i386 builds.
If these implementations won't work for their case, they are free to write 
their own implementation or take whatever action is necessary.


I guess my question is, are these cases common enough that it is worth 
helping developers by providing functions that do the double read and shifts 
for them, or do we leave them to deal with it on their own at the risk of 
possibly some duplicated code.


I was thinking we might suggest to developers that they use a KPI that 
specifically captures the underlying semantics, so it's clear they understand 
them.  Untested example:


uint64_t v;

/*
 * On 32-bit systems, read the 64-bit statistic using two 32-bit
 * reads.
 *
 * XXX: This will sometimes lead to a race.
 *
 * XXX: Gosh, I wonder if some word-swapping is needed in the merge?
 */
#ifndef __LP64__	/* 32-bit platform */
bus_space_read_region_4(space, handle, offset, (uint32_t *)&v, 2);
#else
v = bus_space_read_8(space, handle, offset);
#endif

The potential need to word swap, however, suggests that you may be right about 
the error-prone nature of manual merging.
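On the word-swapping question in that example: bus_space_read_region_4() stores the word read from offset at the lower address of v, so casting &v to (uint32_t *) puts that word in the low half on a little-endian host and in the high half on a big-endian one. The sketch below contrasts that memcpy-style merge with an explicit shift-and-OR, which gives the same answer on any host. It ignores any byte swapping the bus_space layer itself may perform, it assumes (hypothetically) that the device keeps the low half at the lower offset, and the helper names are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * words[0] was read from offset and words[1] from offset + 4, which is the
 * layout bus_space_read_region_4(..., (uint32_t *)&v, 2) leaves in memory.
 */

/* What the cast amounts to: the result depends on host byte order. */
static uint64_t
merge_via_cast(const uint32_t words[2])
{
	uint64_t v;

	memcpy(&v, words, sizeof(v));
	return (v);
}

/* Explicit merge, assuming the device keeps the low half at the lower offset. */
static uint64_t
merge_explicit(const uint32_t words[2])
{
	return (((uint64_t)words[1] << 32) | words[0]);
}

/* Runtime byte-order probe, used only to state the expectation. */
static int
host_is_little_endian(void)
{
	const uint16_t probe = 1;
	uint8_t first;

	memcpy(&first, &probe, 1);
	return (first == 1);
}
```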


Robert



Re: No bus_space_read_8 on x86 ?

2012-10-12 Thread Robert Watson


On Fri, 12 Oct 2012, John Baldwin wrote:

I believe it was because bus reads weren't guaranteed to be atomic on 
i386. I don't know if that's still the case or a concern, but it was an 
intentional omission.
True.  If you are on a 32-bit system you can read the two 4 byte values 
and then build a 64-bit value.  For 64-bit platforms we should offer 
bus_read_8() however.


I believe there is still no way to perform a 64-bit read on an i386 (or at 
least without messing with SSE instructions), but if you have to read a 
64-bit register, you are stuck with doing two 32-bit reads and 
concatenating them. I figure we may as well provide an implementation for 
those who have to do that as well as the implementation for 64-bit.


I think the problem though is that the way you should glue those two 32-bit 
reads together is device dependent.  I don't think you can provide a 
completely device-neutral bus_read_8() on i386.  We should certainly have it 
on 64-bit platforms, but I think drivers that want to work on 32-bit 
platforms need to explicitly merge the two words themselves.


Indeed -- and on non-x86, where there are uncached direct map segments, and 
TLB entries that disable caching, reading 2x 32-bit vs 1x 64-bit has quite 
different effects in terms of atomicity.  Where uncached I/Os are being used, 
those differences may affect semantics significantly -- e.g., if your device 
has a 64-bit memory-mapped FIFO or registers, 2x 32-bit gives you two halves 
of two different 64-bit values, rather than two halves of the same value.  As 
device drivers depend on those atomicity semantics, we should (at the bus_space 
level) offer only the exactly expected semantics, rather than trying to patch 
things up.  If a device driver accessing 64-bit fields wants to support doing 
it using two 32-bit reads, it can figure out how to splice it together 
following bus_space_read_region_4().
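A small simulation of that atomicity problem, under stated assumptions: a fake free-running 64-bit counter that advances between any two 32-bit accesses, read with two hypothetical splicing helpers (low half first, or high half first). read32_fn stands in for bus_space_read_4(); none of these names are real KPIs. Near a carry out of the low word, each read order returns a value off by roughly 2^32, in opposite directions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for bus_space_read_4(): reads one 32-bit half. */
typedef uint32_t (*read32_fn)(void *ctx, uint32_t off);

static uint64_t
read64_lower_first(read32_fn read32, void *ctx, uint32_t off)
{
	uint32_t lo = read32(ctx, off);
	uint32_t hi = read32(ctx, off + 4);

	return (((uint64_t)hi << 32) | lo);
}

static uint64_t
read64_upper_first(read32_fn read32, void *ctx, uint32_t off)
{
	uint32_t hi = read32(ctx, off + 4);
	uint32_t lo = read32(ctx, off);

	return (((uint64_t)hi << 32) | lo);
}

/*
 * Fake counter register: low half at offset 0, high half at offset 4.  It
 * ticks once after every 32-bit access, standing in for the device updating
 * the value between the two reads.
 */
struct fake_counter {
	uint64_t value;
};

static uint32_t
fake_counter_read32(void *ctx, uint32_t off)
{
	struct fake_counter *c = ctx;
	uint32_t word = (off == 0) ? (uint32_t)c->value :
	    (uint32_t)(c->value >> 32);

	c->value++;
	return (word);
}
```

Which order is right, and whether the device latches one half when the other is read, is device-specific, which is the argument above for leaving the splice to the driver.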


Robert


Re: syslog(3) issues

2012-09-03 Thread Robert Watson

On Mon, 3 Sep 2012, Attilio Rao wrote:

I was trying to use syslog(3) in a port application that uses threads, 
with all of them logging at the LOG_CRIT level. What I see is that when the 
logging gets massive (1000 entries) I cannot find some items within 
/var/log/messages (I know because I also started stamping a sort of 
message ID in order to see what is going on). The missing items are on the 
order of 25% of what should really be there.


Does anyone have a good idea where I can start checking my syslogd 
setup? I have really zero experience with syslogd and maybe I'm missing 
something obvious.


syslog(3)/syslogd(8) use datagram sockets for both local and networked 
logging, and it is possible for those datagram sockets to fill and drop 
messages.  I'm not sure if we have per-socket counters that can easily be 
queried by syslogd, but if we do, it might be beneficial to have syslogd wake 
up once a second and check to see if the counters have changed -- if they 
have, inject a log message indicating how many messages were dropped in the 
last $epsilon.  If we don't have counters along those lines, it might make 
sense to add them.  We might also find that it is appropriate to tune up the 
limits if they no longer seem sensible in the current world order -- they may 
have late 1980s/early 1990s values (or they may not).
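The underlying behaviour is easy to observe with a local datagram socket whose receiver is not draining it: once its queue is full, further sends are refused. The sketch below (function and struct names are illustrative) uses a non-blocking AF_UNIX datagram socketpair and counts refusals, treating EAGAIN/EWOULDBLOCK or ENOBUFS as a drop, since systems may report a full queue differently. It models only the socket, not syslogd or syslog(3); syslog(3) itself returns void, so a logging application gets no per-message status, and how libc reacts internally to such errors is not shown here.

```c
#include <sys/types.h>
#include <sys/socket.h>

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

struct drop_stats {
	size_t	sent;		/* datagrams the socket accepted */
	size_t	dropped;	/* datagrams refused because the queue was full */
};

/*
 * Send nmsgs small datagrams into a local datagram socket nobody reads.
 * Returns 0 on success, -1 on setup failure or an unexpected send error.
 */
static int
flood_unread_dgram_socket(size_t nmsgs, struct drop_stats *st)
{
	char msg[64];
	int flags, sv[2], rv = 0;

	st->sent = st->dropped = 0;
	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
		return (-1);
	/* Non-blocking, so a full queue shows up as an error, not a sleep. */
	flags = fcntl(sv[0], F_GETFL);
	if (flags == -1 || fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) == -1) {
		rv = -1;
		goto out;
	}
	for (size_t i = 0; i < nmsgs; i++) {
		int len = snprintf(msg, sizeof(msg), "message %zu", i);

		if (send(sv[0], msg, (size_t)len, 0) != -1)
			st->sent++;
		else if (errno == EAGAIN || errno == EWOULDBLOCK ||
		    errno == ENOBUFS)
			st->dropped++;	/* receiver's queue is full */
		else {
			rv = -1;
			break;
		}
	}
out:
	close(sv[0]);
	close(sv[1]);
	return (rv);
}
```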


Robert


Re: projects/armv6 merged to HEAD

2012-08-17 Thread Robert Watson


On Thu, 16 Aug 2012, Oleksandr Tymoshenko wrote:

projects/armv6 branch was merged to HEAD and should be considered dead now. 
This patch is a result of a joint effort by many people. Including but not 
limited to:


Amazing work -- many thanks are due to everyone who was involved!

Robert



 Grzegorz Bernacki (gber@)
 Aleksander Dutkowski
 Ben R. Gray (bgray@)
 Olivier Houchard (cognet@)
 Rafal Jaworowski (raj@) and Semihalf team
 Tim Kientzle (kientzle@)
 Jakub Wojciech Klama (jceel@)
 Ian Lepore
 Warner Losh (imp@)
 Damjan Marion (dmarion@)
 Lukasz Plachno
 Stanislav Sedov (stas@)
 Mark Tinguely
 Andrew Turner (andrew@)

Thanks to all who contributed by submitting code,
testing and giving valuable advice.

The code drop includes the following parts:

- General ARMv6/ARMv7 kernel bits (pmap, cache,
   assembler routines, etc...)
- ARM SMP support
- VFP/Neon support
- ARM Generic Interrupt Controller driver
- Improved thread-local storage for CPUs >= ARMv6
- Two new values for TARGET_ARCH: armv6 and armv6eb
- Driver for SMSC LAN95XX and LAN8710A ethernet controllers
- Marvell MV78x60 support (multiuser, ARMADA XP kernel config)
- TI OMAP4 and AM335x support (multiuser, no GPU or graphics
   support, kernel configs for Pandaboard and Beaglebone)
- LPC32x0 support (multiuser, frame buffer works with SSD1289
   LCD controller. Embedded Artists EA3250 kernel config)
- Barebone Nvidia Tegra2 support (timers, interrupts and UART.
   No kernel config)

Hope now that the code is in trunk it will get more attention
and love from developers.

Happy hacking

--
gonzo
___
freebsd-a...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to freebsd-arch-unsubscr...@freebsd.org




Re: sysctl filesystem ?

2012-06-26 Thread Robert Watson

On Tue, 26 Jun 2012, Chris Rees wrote:


as well as we don't depend on /proc for normal operation we shouldn't for, 
say, /proc/sysctl


improvements are welcome, better documentation is welcome, changes to 
what is OK - isn't.

/proc/sysctl might be useful.  Just because Linux uses it doesn't make it a 
bad idea.


One of the problems we've encountered with synthetic file systems is that 
off-the-shelf file system tools (e.g., cp, dd, cat) make simplistic (but not 
unreasonable) assumptions about the static content of files.  This comes up 
frequently with procfs-like systems where the size of, say, memory map data 
can be considerably larger than the perhaps 128-byte, 256-byte, or even 8k 
buffers that might exist in a stock file access tool.  Unless we change all of 
those tools to use buffers much bigger than they currently do, which even 
suggests changing the C library's default buffer sizes for FILE *, that places 
an onus on the file system to provide persistent snapshots of data until it's 
sure that a user process is done -- e.g., over many system calls.


sysctl is not immune to the requirement of atomicity, but it has explicit 
control over it: sysctl is a single system call, rather than an unbounded 
open-read-seek-repeat-etc cycle, and has been carefully crafted to provide 
this and other MIB-like properties, such as a basic data type model so that 
command line tools know how to render content rather than having to guess 
and/or get it wrong.  sysctl has some file-system like properties, but on the 
whole, it's not a file system -- it's much more like an SNMP MIB.
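The usual way to consume a variable-sized sysctl value shows the single-call property in practice: probe the size with a NULL buffer, allocate with some headroom, then fetch the whole value in one call, retrying if it has grown past the estimate in the meantime (sysctl reports that with ENOMEM). This sketch takes the fetch step as a function pointer so it can run anywhere; on FreeBSD it would wrap sysctlbyname(name, buf, lenp, NULL, 0). demo_fetch() is a made-up source that grows once, to exercise the retry path, and the 1/8 headroom is an arbitrary choice.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Fetch with sysctl(3)-style old-value semantics: with buf == NULL, store
 * the current size in *lenp; otherwise copy the whole value in one step,
 * or fail with ENOMEM if *lenp is too small.
 */
typedef int (*fetch_fn)(void *buf, size_t *lenp);

/* Probe, allocate with headroom, fetch; retry if the value outgrew us. */
static void *
fetch_snapshot(fetch_fn fetch, size_t *lenp)
{
	for (;;) {
		size_t len;
		void *buf;
		int saved;

		if (fetch(NULL, &len) == -1)
			return (NULL);
		len += len / 8 + 1;	/* headroom for growth between calls */
		if ((buf = malloc(len)) == NULL)
			return (NULL);
		if (fetch(buf, &len) == 0) {
			*lenp = len;	/* actual size of this snapshot */
			return (buf);
		}
		saved = errno;
		free(buf);
		if (saved != ENOMEM) {
			errno = saved;
			return (NULL);
		}
		/* The value grew past our estimate; probe again. */
	}
}

/* Demo source: a table that grows once, right after the first size probe. */
static size_t demo_size = 16;
static int demo_probes;

static int
demo_fetch(void *buf, size_t *lenp)
{
	if (buf == NULL) {
		*lenp = demo_size;
		if (demo_probes++ == 0)
			demo_size = 80;	/* simulate new entries appearing */
		return (0);
	}
	if (*lenp < demo_size) {
		errno = ENOMEM;
		return (-1);
	}
	memset(buf, 'x', demo_size);
	*lenp = demo_size;
	return (0);
}
```

Contrast this with a synthetic file: a reader looping over read(2) with a small buffer sees the data across many calls, so the file system must hold a consistent snapshot until it believes the reader is finished.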


While you can map anything into anything (including Turing machines), I think 
the sysctl command line tool and API, despite its limitations, is a better 
match for accessing this sort of monitoring and control data than the POSIX 
file API, and would recommend against trying to move to a sysctl file system.


Robert


Re: SMP: protocol control block protection for a multithreaded process (ex: udp).

2012-05-29 Thread Robert Watson


On Tue, 29 May 2012, vasanth rao naik sabavat wrote:


In case of a Multicore cpu system running a multithreaded process.

For protocol control blocks there is no protection provided in FreeBSD 
9. For example, udp_close() and udp_send() access the inp before taking the 
lock. Couldn't this cause the inp inconsistency on a multithreaded process 
running on a multicore cpu system?


Say, if the two threads of a process are concurrently executing socket send 
and socket close on a udp connection (this can happen in case of poorly 
written user code). udp_close() will access the inp on one cpu and 
udp_send() will access the inp on another cpu. Is it possible that 
udp_close() gets the locks first and frees the inp before udp_send() has a 
chance to run?


Am I missing anything?


The life cycle here is complicated and there is some subtlety.  The simple 
answer to your question is that udp_abort() and udp_close() don't free the 
inpcb -- that occurs in udp_detach(), which is called only when the reference 
count on the socket hits 0, which can't happen while udp_send() is in flight, 
as the caller owns a reference maintaining the stability of the socket.


Take a look at the comment at the top of uipc_socket.c for more detailed 
coverage of socket life cycles; for UDP, inpcbs are around for the entire 
life cycle of the socket, so it is always safe to follow so->so_pcb if you 
hold a valid socket reference (either borrowed from a process's file 
descriptor, or held).  For TCP, things are more complex.
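A toy model of the life cycle described above, under simplifying assumptions: one atomic reference count per socket, the file descriptor holding one reference and an in-flight send() holding another, and the pcb freed only when the count reaches zero (the "detach" step), not at close(). In the kernel the in-flight reference is typically held indirectly, for example through the file reference the system call obtains from the descriptor, and the real code is considerably more involved; all names here are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct pcb {
	int	port;		/* protocol state; contents don't matter here */
};

struct sock {
	atomic_int	refs;	/* one per descriptor or in-flight operation */
	struct pcb	*pcb;	/* valid for as long as refs > 0 */
};

static int detach_calls;	/* how many times the "detach" step has run */

static struct sock *
sock_create(void)
{
	struct sock *so;

	if ((so = malloc(sizeof(*so))) == NULL)
		return (NULL);
	if ((so->pcb = malloc(sizeof(*so->pcb))) == NULL) {
		free(so);
		return (NULL);
	}
	so->pcb->port = 0;
	atomic_init(&so->refs, 1);	/* the file descriptor's reference */
	return (so);
}

/* Taken by an operation (e.g. a send) for the duration of its use of so. */
static void
sock_hold(struct sock *so)
{
	atomic_fetch_add(&so->refs, 1);
}

/* Dropped by close() or by a finishing operation; the last one detaches. */
static void
sock_rele(struct sock *so)
{
	if (atomic_fetch_sub(&so->refs, 1) == 1) {
		free(so->pcb);		/* the "detach" step frees the pcb */
		detach_calls++;
		free(so);
	}
}
```

In the usage below, a concurrent close() drops only the descriptor's reference, so the pcb survives until the in-flight send releases its own.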


Robert


Re: SMP: protocol control block protection for a multithreaded process (ex: udp).

2012-05-29 Thread Robert Watson


On Tue, 29 May 2012, vasanth rao naik sabavat wrote:


Can somebody please reply to this email.

Basically, can udp_detach() and udp_send() execute simultaneously for a 
process with multiple threads? If yes, won't the inp reference in udp_send() 
be stale if udp_detach() frees the inp?


You are confusing application-level close() with an actual close in the socket 
implementation.  The socket will remain allocated as long as there are 
consumers using it, which is ensured through a reference count on the socket, 
regardless of close().  That isn't to say that there aren't bugs -- this stuff 
is pretty complex -- but the life cycle and synchronisation models around 
sockets should prevent the scenario you are describing from occurring.


Robert



Thanks,
Vasanth



On Tue, May 29, 2012 at 10:53 AM, vasanth rao naik sabavat 
vasanth.raon...@gmail.com wrote:


Hi,

In case of a Multicore cpu system running a multithreaded process.

For protocol control blocks there is no protection provided in FreeBSD
9. For example, udp_close() and udp_send() access the inp before taking the
lock. Couldn't this cause the inp inconsistency on a multithreaded process
running on a multicore cpu system?

Say, if the two threads of a process are concurrently executing socket
send and socket close on a udp connection (this can happen in case of
poorly written user code).
udp_close() will access the inp on one cpu and udp_send() will access the
inp on another cpu. Is it possible that udp_close() gets the locks first
and frees the inp before udp_send() has a chance to run?

Am I missing anything?

Thanks,
Vasanth








Re: FreeBSD has serious problems with focus, longevity, and lifecycle

2012-01-18 Thread Robert Watson


On Mon, 16 Jan 2012, Julian Elischer wrote:


On 1/16/12 3:32 PM, William Bentley wrote:
I also echo John's sentiments here. Very excellent points made here. Thank 
you for voicing your opinion. I was beginning to think I was the only one 
who felt this way.

[...]

We seem to have lost our way around the release of FreeBSD 7. I am all in 
favor of new features but not at the risk of stability and proper life 
cycle management.


Are John and I the only people who feel this way, or are we among the 
minority?


It pretty much boils down to one thing..  man power..


I disagree.  Resourcing is an issue, but it is not *the* issue.  The real 
issue here is a failure by the release engineering team (which includes me) to 
concurrently perform major and minor releases.  Given that minor releases run 
like clockwork in most cases, this is disappointing.  In the past, there have 
been a lot of good technical and structural obstacles to trying to do 
clockwork releases for both major and minor releases:


- Tight synchronisation of the ports and base release schedule means that the
  base release schedule limits ports productivity.

- Long freezes forced on us by poor revision control support for branching.

None of these really apply any longer -- and in as much as they do, they 
should be addressed.  In particular, I think there's a growing feeling that 
ports should be conducting its own releases out of lockstep with the base 
tree, producing package sets as a primary product at regular intervals 
regardless of the base release schedule.  Likewise, long freezes enforced by 
expensive branching operation in CVS no longer apply due to use of Subversion 
-- it's not perfect, but it's workable.


There's no way to satisfy everyone with any particular maintenance schedule 
and release cycle.  However, it seems clear that the current model with minor 
releases spaced at a year is satisfying no one.  It's easy to point at a 
developer-user divide, but I think that misses the point: most developers 
are users.  A big gap between development branch and shipped features hurts 
the commercial users of FreeBSD that pay for so much of its development, since 
it forces them to support diverging local development and shipping products -- 
ISPs, etc.  There is no incentive for year-long gaps in minor releases.


My view is therefore that we have a social -- which is to say structural -- 
problem.  Regardless of .0 releases, we should be forcing out minor 
releases, which are morally similar to service packs in the vocabulary of 
other vendors: device driver improvements, new CPU support, a steady stream of 
conservative feature development, etc, required to keep older major releases 
viable on contemporary hardware and with contemporary applications.  One known 
problem is relying on a single head release engineer to steer all releases. 
I think this is a mistake, as it makes the whole project's release schedule 
subject to individual unavailability, burnout, etc, as well as increasing the 
risks associated with low bus factor.  I'd like to see us move to a model 
where new release engineers are mentored in from the developer community for 
point releases, ensuring that we increase our expertise, share knowledge about 
release engineering in the broader community, and get new eyes on the process 
which can lead more readily to process improvements.  The role of the head 
release engineer shouldn't be hands-on production of every release, but rather, 
steering of the overall team.


I'd like to see this begin with 8.3, drawing a per-release lead from the 
developer community, and continue with a fixed schedule release of 8.4.  Yes, 
more staffing is needed, but first, what is needed is an improvement in model.


On a related note, the security team owns the freebsd-update mechanism, 
largely for historical reasons (Colin wrote it), but this is actually a bit 
backwards from how you would expect things to run, as we now use 
freebsd-update for upgrades, which are almost never engineered by the security 
team.  Not sure what the fix is there, but it seems related -- perhaps what is 
really called for is breaking out our .0 release engineering entirely from .x 
engineering, with freebsd-update being in the latter.


Robert


Re: FreeBSD has serious problems with focus, longevity, and lifecycle

2012-01-18 Thread Robert Watson

On Wed, 18 Jan 2012, Andriy Gapon wrote:


on 18/01/2012 02:16 Igor Mozolevsky said the following:
Seriously, WTF is the point of having a PR system that allows patches to be 
submitted??! When I submit a patch I fix *your* code (not yours personally, 
but you get my gist).


Let me pretend that I don't get it.  It is as much your code as it is mine 
if you are a user of FreeBSD.  I just happen to have a commit bit at this 
point in time.


No other project requires a non-committer to be so ridiculously persistent 
in order to get a patch through.


There are about 5000 open PRs for FreeBSD base system, maybe more. There are 
only a few dozen active FreeBSD developers.  Maybe fewer for any given 
particular point in time (as opposed to a period of time). And dealing with 
PRs is not always exciting. Need I continue?


P.S. Using GNATS for the PR database doesn't help either, in some technical 
ways.


The structural problem around the PR system for the base system is that there 
isn't a whole lot of incentive for most developers to use it.  I think we can 
reasonably categorise developers into three classes -- some move between or 
span them, of course:


(1) Volunteers.  Due to childhood trauma, they have a desperate urge to write
    operating systems.  Not much incentive to do PRs here, as most refer to
    versions of FreeBSD before their time, aren't great characterisations,
    rarely come with patches, and when they do, the patches are out of date,
    don't apply, have the wrong style, solve the wrong problem, etc.  A
    sweeping generalisation, but you see what I mean.  The only exceptions
    here are our dedicated team of bugmeisters, who get enormous respect from
    me, but they are a tiny minority.

(2) Employees.  They work at a company using FreeBSD as a product, and
    effectively deliver their own CompanyBSD as a further product to their own
    internal customers -- to be put on a web service frontline, to ship as the
    foundation of an appliance, etc.  The key phrase here is "internal
    customers" -- they have their own bug report database, which they respond
    to in a timely way due to the incentives of the workplace, but also
    because they are relevant bug reports for their product goals.

(3) Authors of upstream code.  They don't even work on FreeBSD, but their code
    ends up in FreeBSD, so they also have their own bug report databases, fix
    bugs, and eventually the fixes trickle into FreeBSD.

With the above, the incentives to handle PRs are very weak -- and it's 
compounded by gnats being terrible for both submitters and handlers of bug 
reports.  Contrast this with ports, where the PR database is a key part of the 
workflow.


However, and I am being entirely honest when I say this: FreeBSD works anyway. 
So somehow, we end up with a pretty good OS despite largely ignoring our bug 
report database.  Why?  Well, for (1) it's because volunteers have a strong 
sense of ownership of the code they've written and care about, (2) there's a 
significant internal QA and bug management effort at downstream companies from 
FreeBSD, whose improvements are frequently upstreamed by committers on staff, 
and (3) occurs independently of bugs in our bug report database.


Don't get me wrong: it's a problem that the PR database goes so unloved.  But 
it's a symptom of the construction of *extremely large* volunteer projects in 
which the incentives are not aligned for dealing with PRs most of the time. 
If you want to see something similarly sad, try counting dropped patches on 
the linux-kernel list.  Someone once ported the entire FreeBSD kernel audit 
framework and OpenBSM to Linux, posted on the list saying "here are my 
patches", never heard anything back, and went away.  You can moralise in 
various ways and for various parties in that relationship, but at heart, 
that's pretty similar to a lot of the patches in the PR database; you'll find 
similar stuff in every open source project of scale.  I submitted patches to 
fix several bugs in KDE a decade or so ago ... after five years, the reports 
were closed as out of date.  Yet large open source products *do* work, and 
become the foundations for amazing things.


I think shifting away from Gnats would help as it would make it easier for 
developers to find bugs they care about, users to submit higher-quality 
reports, and so on.  Gnats makes it really hard to manage reports in a useful 
way.


Another possibility is to get some combination of {The FreeBSD Foundation, iX 
Systems, ...} to trawl the bug report database in a more official capacity. 
The problem there is that this will be a high burn-out job.  I'll bring it up 
at the next Foundation board meeting, especially after a bumper year of 
fund-raising, and see what we can do.


Robert

Re: FreeBSD has serious problems with focus, longevity, and lifecycle

2012-01-18 Thread Robert Watson


On Tue, 17 Jan 2012, Andriy Gapon wrote:


on 17/01/2012 00:28 John Kozubik said the following:

we going to run RELEASE software ONLY


My opinion: you've put yourself in a box that is not very compatible with 
the current FreeBSD release strategy.  With your scale and restrictions you 
probably should just use the FreeBSD source and roll your own releases from 
a stable branch of interest (including testing, etc).  Or have your own 
branch where you could cherry-pick interesting changes from any FreeBSD 
branches.  Tools like e.g. git and mercurial make it easy.  Of course, this 
strategy is not as easy as trying to persuade the rest of FreeBSD 
community/project/thing to change its ways, but perhaps a little bit more 
realistic.  You can bond with similarly minded organizations to share 
costs/work/etc.  It's a community-driven project after all.


Suppose for a moment we get the .x release process fixed: we start cutting 
regular point releases from -STABLE on a 6-month cycle (just a strawman). 
freebsd-update's update and upgrade features actually make tracking -STABLE at 
release-engineered time slices plausible.


One reason that's true is that between 5.x and 6.x, the FreeBSD Project 
underwent a substantive change in our approach to binary interfaces.  In 4.x 
and before, the letters "ABI" rarely hit the mailing lists.  In 6.x and later, 
it's a key topic discussed whenever merges to -STABLE come up.  We now really 
care about keeping applications running as the OS moves under them.  We also 
build packages to better-defined ABIs -- not perfectly, but OK.


I think John gets a lot of what he wants if we just fix our release cycle.

Robert


Re: FreeBSD has serious problems with focus, longevity, and lifecycle

2012-01-18 Thread Robert Watson


On Tue, 17 Jan 2012, Doug Barton wrote:

The other thing I think has been missing (as several have pointed out in 
this thread already) is any sort of planning for what should be in the next 
release. The current time-based release schedule is (in large part) a 
reaction to the problems we had in getting 5.0 out the door. However I think 
the pendulum has swung *way* too far in the wrong direction, such that we 
are now afraid to put *any* kind of plan in place for fear that it will 
cause the release schedule to slip. Aside from the obvious folly in that 
(lack of) plan, it fails to take into account the fact that the release 
schedules already slip, often comically far out into the future, and that 
the results are often worse than they would have been otherwise.


Agreed entirely.  There's been an over-swing caused by turning the diagnosis 
"it's like herding cats" into "cats can't be herded, so why try?".  Projects 
like FreeBSD don't work well if there's no consensus on interesting problems 
to solve, directions to run in, etc.  The history of FreeBSD is also full of 
examples of successful collaborative development in which developers decide, 
together, on a direction and run that way.  Sure, it's not the same as "we 
are paying you to do X", but I think many FreeBSD developers like the idea 
that they are 
working on something larger than just their own micro-project, and would 
subscribe (and contribute) to a sensible plan.  In fact, I think we'd find 
that if we were a bit more forthcoming about our plans, we'd have an easier 
time soliciting contributions from people less involved in the project, as it 
would be more obvious how they could get involved.


It strikes me that the first basic plan would be a release schedule, however. 
:-)


Robert


Re: buf_ring(9) API precisions

2011-09-15 Thread Robert Watson

On Thu, 15 Sep 2011, K. Macy wrote:

Why are you making an MD guess, the amount of padding to fit the size of a 
cache line, in MI API ? Strangely enough, you did not make this assumption 
in, say r205488 (picked randomly).


It has been several years, and I haven't done any work in svn in over a 
year, I don't remember. I probably meant to refine it in a later iteration.


If you would like to send me a patch addressing this I'd be more than happy 
to apply it if appropriate. Otherwise, I will deal with it some time after 9 
settles.


Thanks for pointing this out.


I'm not sure if gcc (and friends) allow __aligned(CACHE_LINE_SIZE) to be used 
on individual elements of a struct (causing appropriate padding to be added), 
but that may be one option here.  Of course, that introduces a further 
alignment requirement on the struct itself, so a moderate amount of care would 
need to be used.
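For what it's worth, GCC and Clang do accept the aligned attribute (which is what FreeBSD's __aligned() macro expands to) on individual structure members.  A minimal userspace sketch, assuming a 64-byte cache line in place of the MD CACHE_LINE_SIZE; struct example_ring and its fields are hypothetical, loosely modelled on a producer/consumer ring rather than copied from buf_ring.  It also shows the knock-on effect mentioned above: the struct inherits the 64-byte alignment, so heap instances want an aligned allocator rather than plain malloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 64-byte cache lines; the kernel's CACHE_LINE_SIZE is MD. */
#define EXAMPLE_CACHE_LINE_SIZE 64

/*
 * Hypothetical ring header.  Aligning the first field of each group makes
 * the compiler insert padding, so producer-side and consumer-side fields
 * never share a cache line.
 */
struct example_ring {
	volatile uint32_t prod_head __attribute__((aligned(EXAMPLE_CACHE_LINE_SIZE)));
	volatile uint32_t prod_tail;
	volatile uint32_t cons_head __attribute__((aligned(EXAMPLE_CACHE_LINE_SIZE)));
	volatile uint32_t cons_tail;
	int size;
};

/*
 * The member alignment propagates to the struct itself, so allocate with
 * C11 aligned_alloc().  Its size argument must be a multiple of the
 * alignment; sizeof() satisfies that because the compiler pads the struct
 * out to a multiple of its own alignment.
 */
static struct example_ring *
example_ring_alloc(int size)
{
	struct example_ring *r;

	assert(size > 0);
	r = aligned_alloc(_Alignof(struct example_ring), sizeof(*r));
	if (r == NULL)
		return (NULL);
	r->prod_head = 0;
	r->prod_tail = 0;
	r->cons_head = 0;
	r->cons_tail = 0;
	r->size = size;
	return (r);
}
```

Aligning only the first member of each group is enough; the compiler pads the preceding group out to the next boundary.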


Robert


Re: TIME_WAIT Assassination in FreeBSD???

2011-09-05 Thread Robert Watson

On Sat, 3 Sep 2011, Jarrod Lee Petz wrote:

3. Does FreeBSD handle this situation? How? I can't seem to find much info 
on TIME_WAIT assassination in FreeBSD; it is mentioned in RFC 6056.


I'm not familiar with the RFC side here, but I can confirm that FreeBSD will 
recycle TIMEWAIT connections more quickly than specified when load is very 
high.  This is done on the basis of allocated space; the sysctl:


  net.inet.tcp.maxtcptw

Instructs the stack regarding how much state to retain -- this is implemented 
by adjusting the allocation limit on the tcptw zone.  On my system, it seems 
to auto-tune to about 5000 connections, a value derived from the global limit 
on the number of sockets on the box I'm looking at -- your mileage may vary.


The resource limit case can occur in tcp_twstart(), when uma_zalloc() returns 
NULL on failing to allocate new TIMEWAIT state for a connection.  At that 
point, it forces an early scan of TIMEWAIT connections (which normally happens 
on 2msl intervals) with a 'reuse' argument of 1, authorising premature reuse. 
Without too close an analysis, it appears at face value to implement LRU: we 
reuse storage held by the connection that has been in TIMEWAIT the longest.
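A rough userspace model of that recycling policy, offered as a sketch under stated assumptions rather than a description of the actual TCP code: a fixed-size array stands in for the tcptw zone (its capacity playing the role of net.inet.tcp.maxtcptw), entries are kept in arrival order so the oldest is always at the head, and TW_2MSL, struct tw_entry, tw_expire() and tw_start() are all hypothetical names.  When the pool is still full after normal expiry, the longest-waiting entry is reclaimed early, mirroring the "reuse" path described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TW_MAX   4	/* hypothetical stand-in for net.inet.tcp.maxtcptw */
#define TW_2MSL  60	/* hypothetical 2MSL, in arbitrary time units */

/* Hypothetical TIMEWAIT record: just a local port and entry time. */
struct tw_entry {
	uint16_t lport;
	long     entered;
};

/* Ring of entries in arrival order; the oldest is at 'head'. */
struct tw_pool {
	struct tw_entry ent[TW_MAX];
	int head;
	int count;
	int recycled;	/* number of premature reuses */
};

/* Normal path: drop entries whose 2MSL period has elapsed. */
static void
tw_expire(struct tw_pool *p, long now)
{
	while (p->count > 0 && now - p->ent[p->head].entered >= TW_2MSL) {
		p->head = (p->head + 1) % TW_MAX;
		p->count--;
	}
}

/*
 * Move a connection into TIMEWAIT.  If the pool is full even after
 * expiring old entries, reuse the slot of the connection that has been in
 * TIMEWAIT the longest (the head of the ring).
 */
static void
tw_start(struct tw_pool *p, uint16_t lport, long now)
{
	int slot;

	assert(p->count <= TW_MAX);
	tw_expire(p, now);
	if (p->count == TW_MAX) {
		/* Allocation "failed": evict the oldest entry early. */
		p->head = (p->head + 1) % TW_MAX;
		p->count--;
		p->recycled++;
	}
	slot = (p->head + p->count) % TW_MAX;
	p->ent[slot].lport = lport;
	p->ent[slot].entered = now;
	p->count++;
}
```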


Robert


Re: Dynamic kernel module linking problem

2011-08-26 Thread Robert Watson


On Fri, 26 Aug 2011, Monthadar Al Jaberi wrote:

I have written a dynamic loadable module using DECLARE_MODULE in 
FreeBSD-Current.


And I want to iterate through the ifnet list using following code snippet:


If this is on a recent version of FreeBSD (8.x and later), then you probably 
mean to be using V_ifnet, and you should include if_var.h rather than using an 
extern in order to ensure virtualisation is handled properly.
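The kernel-side fix (V_ifnet, via if_var.h) can only be exercised inside the kernel, where the walk would normally also be done while holding the interface list lock (IFNET_RLOCK()), but the iteration pattern itself can be tried in userspace.  In this sketch struct fake_ifnet and struct fake_ifnethead are hypothetical stand-ins for struct ifnet and the interface list; note that TAILQ_FOREACH_SAFE() takes a pointer to the list head.  The macro is defined locally in case the host's <sys/queue.h> lacks it, as some non-FreeBSD copies do.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/queue.h>

#ifndef TAILQ_FOREACH_SAFE
/* Fallback: save the next pointer before the body may free 'var'. */
#define TAILQ_FOREACH_SAFE(var, head, field, tvar)			\
	for ((var) = (head)->tqh_first;					\
	    (var) != NULL && ((tvar) = (var)->field.tqe_next, 1);	\
	    (var) = (tvar))
#endif

/* Hypothetical stand-in for struct ifnet. */
struct fake_ifnet {
	char if_dname[16];
	TAILQ_ENTRY(fake_ifnet) if_link;
};
TAILQ_HEAD(fake_ifnethead, fake_ifnet);

static void
fake_ifnet_add(struct fake_ifnethead *head, const char *name)
{
	struct fake_ifnet *ifp = calloc(1, sizeof(*ifp));

	assert(ifp != NULL);
	snprintf(ifp->if_dname, sizeof(ifp->if_dname), "%s", name);
	TAILQ_INSERT_TAIL(head, ifp, if_link);
}

/* Print, unlink and free every entry; returns how many were visited. */
static int
fake_ifnet_drain(struct fake_ifnethead *head)
{
	struct fake_ifnet *ifp, *ifp_temp;
	int n = 0;

	TAILQ_FOREACH_SAFE(ifp, head, if_link, ifp_temp) {
		printf("%s\n", ifp->if_dname);
		TAILQ_REMOVE(head, ifp, if_link);
		free(ifp);
		n++;
	}
	return (n);
}
```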


Robert



extern struct ifnethead ifnet;
...
struct ifnet *ifp, *ifp_temp;
TAILQ_FOREACH_SAFE(ifp, &ifnet, if_link, ifp_temp) {
	printf("%s\n", ifp->if_dname);
}

Compilation is fine, but when I load the module I get the following error:

...
/sbin/kldload -v module.ko
link_elf: symbol ifnet undefined
...

What am I doing wrong? Shouldn't kernel be able to link it on its own?

Grateful for any advice.
--
//Monthadar Al Jaberi


Re: Capsicum project: Ideas needed

2011-08-10 Thread Robert Watson


On Thu, 4 Aug 2011, Lars Engels wrote:


I just stumbled upon this rather outdated thread...

On Fri, 8 Jul 2011 15:09:52 +0400, Ilya Bakulin wrote: [...]

wget curl links/lynx
This is Ports software, we may try to modify it and even send patches to 
upstream, or maintain our local patches. I wanted to focus on base system 
components during GSoC, but it doesn't hurt to try to capsicumize these 
tools either.


fetch(1) is similar to wget and curl and is part of the base system, so 
would this be a candidate?


I'd think fetch would be quite a good candidate -- most of its work is done as 
a pipeline between a socket and a file, and sandboxing the gubbins that sits 
in the middle of that pipeline would be quite beneficial.


Robert


Re: MIPS toolchain

2011-07-31 Thread Robert Watson


On Fri, 29 Jul 2011, James Jones wrote:


Does anyone have a prebuilt MIPS tool chain?


For FreeBSD-related MIPS work, I generally use the FreeBSD toolchain target 
followed by the buildenv environment, but that requires first building a 
cross-toolchain using TARGET_ARCH and TARGET.  However, the result is a pretty 
sane compiler, linker, etc, setup for the MIPS of your choice (we tend to use 
mips64eb).


We also use the MIPS-provided SDE toolchain for Linux at the CL, but that 
appears to be out of maintenance, and I haven't found its bug density to be 
any lower, really, than the even more ageing FreeBSD versions of the tools. 
In fact, there are some toolchain bugs I'm running into that manifest only in 
the SDE toolchain and not the FreeBSD toolchain.  (Mind you, Philip has 
commented that in building Uboot for MIPS, he's found FreeBSD bugs that don't 
appear in the SDE toolchain, so mileage varies).


We're greatly looking forward to MIPS support for LLVM, which currently 
appears very premature indeed.  Someone from MIPS appears to be contributing 
to it, however, and we (cl.cam.ac.uk) hope to provide some implementation 
support for that effort in the immediate future.


Robert


Re: MAC Framework, Socket information

2011-07-29 Thread Robert Watson


On Thu, 28 Jul 2011, s wrote:

I need to get some info about the socket being created by the user. What I 
want to do is log all TCP/UDP outgoing connections that are being made. I 
*need* to get the local and remote address, as well as the local and remote 
port. I managed to get all of the remote data, but this is useless to me, if 
I haven't got the local port. Here is what I have already written:


Most MAC Framework entry points are invoked before operations of interest, 
rather than after, because they are intended to perform access control on 
operations.  I think the closest you may be able to get given current entry 
points is logging when the first operation is performed on the connected 
socket: i.e., read, write, sendfile, etc, since it will be established at that 
point (some caution required: you can invoke system calls on sockets before 
and during connect()).


However, I can't help but wonder: would you be better-served by using the 
kernel's audit facilities to track events like socket connection?  Are you 
blending access control and logging in your module, or is this really just 
about logging?
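A userspace analogy (not MAC Framework code) for why a connect-time check can't see the local port: for a TCP socket that hasn't been explicitly bound, the local port is typically chosen as part of connect() itself, so getsockname(2) reports port 0 before the call and the ephemeral port only afterwards.  The helper names are hypothetical, and the sketch assumes loopback networking is available.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Local port of an AF_INET socket in host byte order, or -1 on error. */
static int
local_port(int fd)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);

	if (getsockname(fd, (struct sockaddr *)&sin, &len) == -1)
		return (-1);
	return (ntohs(sin.sin_port));
}

/*
 * Connect an unbound TCP socket to a loopback listener, recording its
 * local port just before and just after connect().  Returns 0 on success.
 */
static int
connect_port_demo(int *before, int *after)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	int lfd, cfd, error = -1;

	assert(before != NULL && after != NULL);
	lfd = socket(AF_INET, SOCK_STREAM, 0);
	cfd = socket(AF_INET, SOCK_STREAM, 0);
	if (lfd == -1 || cfd == -1)
		goto out;

	/* Listener on 127.0.0.1 with a kernel-chosen port. */
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = htons(0);
	if (bind(lfd, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
	    listen(lfd, 1) == -1 ||
	    getsockname(lfd, (struct sockaddr *)&sin, &len) == -1)
		goto out;

	*before = local_port(cfd);	/* not yet bound: typically 0 */
	if (connect(cfd, (struct sockaddr *)&sin, sizeof(sin)) == -1)
		goto out;
	*after = local_port(cfd);	/* ephemeral port chosen by connect() */
	error = 0;
out:
	if (cfd != -1)
		close(cfd);
	if (lfd != -1)
		close(lfd);
	return (error);
}
```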


Robert




static int slog_socket_check_connect(struct ucred *cred,
    struct socket *socket, struct label *socketlabel,
    struct sockaddr *sockaddr)
{
	if (sockaddr->sa_family == AF_INET) {
		struct sockaddr_in sa;
		log(LOG_SECURITY | LOG_DEBUG, "Somebody made a socket: %d:%d (%d)\n",
		    cred->cr_ruid,
		    ntohs(((struct sockaddr_in *)sockaddr)->sin_port),
		    ntohs(((struct in_endpoints *)sockaddr)->ie_lport)
		);
	}
	return 0;
}

--
Pozdrawiam,
Jakub 'samu' Szafrański

Re: Add setacl system call?

2011-07-28 Thread Robert Watson


On Mon, 25 Jul 2011, exorcistkiller wrote:

Another question while I'm reading the code. In ufs_acl.c, in static int 
ufs_getacl_posix1e(struct vop_getacl_args *ap), you commented: "As part of 
the ACL is stored in the inode, and the rest in an EA, assemble both into a 
final ACL product." From what I learned from Kirk's book, ACLs are supposed 
to be stored in extended attribute blocks. So what do you mean by "part of 
the ACL is stored in the inode"? I know extended attribute block data can be 
addressed by inode, but how do you get the ACL directly from the inode?


POSIX.1e ACLs are defined as an extension to permissions: additional user 
entries, group entries, and a mask entry supplement the existing owner, group 
and other permission bits.  Both the APIs and command line tools allow the 
portions of the ACL representing permission bits to be directly manipulated. 
For the purpose of the UFS implementation (and I suspect this to be common in 
other implementations as well), we keep the owner/group/other bits (or 
sometimes the mask bits) in the existing inode permissions field.  All 
additional entries are stored in the extended attribute.  This has some nice 
properties, not least:


(1) stat(2) on the file still only needs to look at the inode, not the extended
attributes, making it faster.
(2) chmod(2) can be implemented by writing out only the inode, also faster.
(3) Files without extended ACLs don't need extended attributes stored.

The inclusion of a mask field in POSIX.1e is motivated similarly: it is what 
allows stat(2) and chmod(2) to not touch extended ACL fields.


This is what the comment means by part of the ACL being stored in the inode, 
and part in the extended attribute: any areas of an ACL that are actually 
permission mask entries go in the existing mode bits in the inode for 
efficiency reasons.
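A minimal model of that split, assuming a simplified in-memory ACL (struct example_acl is hypothetical, not the kernel's struct acl): owner and other entries map straight onto the mode bits, while the group bits hold the mask entry when the ACL is extended and the owning-group entry when it is minimal.  Named user and group entries, which UFS keeps in the extended attribute, never appear in the mode at all, and their effective rights are limited by the mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* Permission bits for one ACL entry: read 4, write 2, execute 1. */
typedef unsigned int perm_t;

/*
 * Hypothetical, simplified POSIX.1e ACL.  Named user/group entries are
 * omitted; for the mode mapping only whether a mask exists matters.
 */
struct example_acl {
	perm_t user_obj;	/* owner */
	perm_t group_obj;	/* owning group */
	perm_t other;
	bool   has_mask;	/* true for an extended ACL */
	perm_t mask;
};

/* The bits kept in the inode mode: what stat(2) reports, chmod(2) writes. */
static mode_t
example_acl_to_mode(const struct example_acl *a)
{
	perm_t group_class = a->has_mask ? a->mask : a->group_obj;

	assert(a->user_obj <= 7 && group_class <= 7 && a->other <= 7);
	return ((mode_t)(a->user_obj << 6 | group_class << 3 | a->other));
}

/* Effective rights of a named or group-class entry, limited by the mask. */
static perm_t
example_effective_perm(const struct example_acl *a, perm_t entry)
{
	return (a->has_mask ? (entry & a->mask) : entry);
}
```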


Robert


Re: HTT vs SMT in x86 SMP topology reporting

2011-07-28 Thread Robert Watson


On Tue, 26 Jul 2011, Andriy Gapon wrote:

Can anybody explain to me why our _x86_ SMP topology discovery and reporting 
code sometimes reports HTT and sometimes SMT? As in "FreeBSD/SMP: %d 
package(s) x %d core(s) x %d HTT threads" vs "FreeBSD/SMP: %d package(s) x %d 
core(s) x %d SMT threads"


As I understand, and quoting Wikipedia (I know, I know), SMT stands for 
simultaneous multithreading and is a generic term for a particular kind of 
hardware multithreading: 
http://en.wikipedia.org/wiki/Simultaneous_multithreading


The only known (to me) implementation of SMT for x86 is Intel's 
Hyper-Threading Technology aka HTT aka HT Technology aka hyperthreading: 
http://en.wikipedia.org/wiki/Hyper-threading 
http://software.intel.com/en-us/articles/intel-hyper-threading-technology-your-questions-answered/?wapkw=%28Intel+Hyper-Threading+Technology%29


Several MIPS platforms we run on support SMT.  Typically this means a set of 
weaker threads sharing a single core, usually context switching as a result 
of memory access stalls in other threads, and perhaps sharing particularly 
expensive CPU features, such as a TLB.  They sometimes come with 
high-performance message-passing facilities between threads, or even between 
cores, to supplement shared memory and IPIs.


It may be that HTT is, among other things, a trademark of Intel.

Robert


Re: Add setacl system call?

2011-07-25 Thread Robert Watson


On Sun, 24 Jul 2011, exorcistkiller wrote:

Hi, I'm working on a course project in which I need to add 3 system calls. 
One of which is setacl(char *name, int type, int idnum, int perms), which 
set acl for a file specified by name. I used newfs as in 
ftp://ftp.tw.freebsd.org/pub/FreeBSD/FreeBSD-current/src/sbin/newfs/ to make 
this new filesystem, named myfs (which really is UFS2) and mounted it.


My question is:
1) where to start with?
2) Is this filesystem actually a userland UFS and I can use functions in
libufs(3)?
3) What about functions in ufs_acl.c? Should the acls be stored on the
extended attributes blocks? Does FreeBSD 8.2 support it?

I know I'm asking stupid questions, but a small hint might help me a lot. 
Thank you so much..


Hi... er.. exorcistkiller...  (*)

This being FreeBSD, you may want to start with the existing programmer 
documentation, which should prove quite useful given your goals.  Try acl(3) 
for userspace, and acl(9) for the kernel.


You are doing this in the context of a course, so the constraints may be 
somewhat artificial.  However, normally my advice to someone wanting to add a 
new ACL implementation to FreeBSD would be to start with our existing 
implementation, which supports both POSIX.1e and NFSv4 ACLs (and is extensible 
to new ACL types without changing the current APIs (much)).  For example, if I 
were going to teach our native system call API about AFS ACLs, I'd start by 
perusing the above man pages and code, including:


  src/bin/*acl*            # Commands for manipulating ACLs
  src/lib/libc/posix1e     # Library routines
  src/sys/kern/*acl*       # File system-independent code
  src/sys/sys/acl.h        # File system-independent header

As you've already found, ufs_acl.c contains the implementation for UFS; ZFS, 
NFS, etc, have similar-looking files with markedly different contents.  In 
general, if something looks file system-independent, we try to put it in the 
centralised files in kern, rather than replicate the code across file systems. 
Roughly half the code in the kern directory has to do with calls *into* the 
file system, and the other half is a library of routines called *by* the file 
system.


Robert


Re: Finding symlink information in MAC Framework

2011-07-25 Thread Robert Watson


On Fri, 15 Jul 2011, s wrote:

I am trying to get some information related to the symlink which is being 
accessed by the user in MAC Framework. Currently I managed to get the 
uid/gid of the owner of the symlink that is being read, but now I need to 
get the same information about the target, that the symlink points to.


static int samplemac_vnode_check_link(struct ucred *cred, struct vnode *vp,
    struct label *vplabel)
{
	int error;
	struct vattr vap;

	error = VOP_GETATTR(vp, &vap, cred);
	if (error)
		return (1);

	if (vap.va_uid != 0) {
		log(LOG_NOTICE, "stub_vnode_check_readlink: %i, gid: %i\n",
		    vap.va_uid, vap.va_gid);
		return (0);
	}

	return (0);
}

And I have no idea how could I do that. Where should I look for that info? 
And what way would be the fastest?


Hi Jakub:

Could you say a bit more about what you're trying to accomplish?  The reason 
it's hard to express what you're trying to do (inspect the target of a symlink 
during a read of the symlink) is that it's not really a coherent concept in 
terms of kernel implementation.  At the point where the access control check 
on readlink is occurring, the string hasn't yet been read from the link, and 
even if it had, you couldn't look up the target object as you're already 
holding locks relating to lookup and read of the symlink itself.  Even if you 
could, there's also a risk of recursion: the symlink could point straight back 
to where you are, etc.  The readlink check is mid-lookup and triggering an 
entirely fresh lookup from there might be quite awkward for a number of such 
reasons.


In general, however, this is not an issue for the policies we've encountered 
thus far: they almost all care only about authorising path segment lookups (in 
which case readlink is just another segment in evaluation), or absolute paths 
to objects reconstructed during the actual operation on the target object, 
etc.  Hence my wondering what you're trying to accomplish -- the first 
question, really, is "is what you're trying to express actually safely 
expressible in a fine-grained, multiprocessing kernel?"


Robert


Re: Issue with 'Unknown Error: -512'

2011-07-25 Thread Robert Watson


On Mon, 18 Jul 2011, Andriy Gapon wrote:

In recent branches (confirmed with 224119) builds compiled with clang 
happen to throw 'Unknown error: -512' in a lot of places, making the system 
unusable. (Untested on gcc compiled systems). Originally I thought the 
problem was with specific programs, then I narrowed it down to file I/O, 
and now I've narrowed it down to open() with O_TRUNC. Without O_TRUNC there 
seems to be no issues whatsoever. With O_TRUNC on open() it fails with that 
'Unknown error: -512' every other time you run the program. Common issues, 
portsnap is affected, making it impossible to fetch/extract ports. As well 
as redirecting output in shells, e.g. `echo 'hi' > test`, fails every other 
try. You have the same issue with text editors like `edit` where it fails 
every other save. There are no issues with `echo 'hi' >> test` as there is 
no O_TRUNC; it only seems to be an O_TRUNC error.


Any tips? Otherwise I'll be looking into this today myself.


Just a hint that you could try using DTrace syscall and fbt providers to see 
where in kernel (if in kernel) that -512 return value originates.


Jon Anderson spotted that here during some Capsicum work -- initially we were 
concerned it was a local patch, but it sounds like it might be less local.  I 
think he saw it on calls to open(2) as well, and I couldn't help but wonder 
(given its recent arrival) if it was an outcome of the change to break falloc 
into two parts, leading to some or other problematic handling of file 
descriptor numbers.  I.e., it's not so much that -512 is being returned, as a 
number that's a bad file descriptor.  (Although now having seen 512 twice on 
two different machines, that particular explanation seems less credible). 
Perhaps this is indeed unrelated to Capsicum, and triggered by a clang bug or 
something else.


I've CC'd Jon, maybe he has gained further insight since we chatted.

Robert


Re: Kernel timers infrastructure

2011-07-25 Thread Robert Watson


On Mon, 25 Jul 2011, Filippo Sironi wrote:

I'm working on a university project that's based on FreeBSD and I'm 
currently hacking the kernel... but I'm a complete newbie. My question is: 
what if I have to call a certain function 10 times per second? I've seen a 
bit of code regarding callout_* functions but I can't get through them. Is 
there anyone who can help me?


Hi Filippo:

I'm not sure if you've found the callout(9) man page yet, but it talks about 
the KPI in some detail.  The basic idea, though, is that you describe a 
regular callout using a function pointer, an opaque data pointer, and how 
long until it should be invoked.  In its more complex incantations, you can 
also specify locks for it to acquire, etc.  The key aspect of the API that 
some people find confusing is that the time interval is described in ticks of 
length 1/hz seconds.  Unless software really wants one invocation per tick 
(generally unlikely), you will want to pass in some constant multiplied or 
divided by hz so that it's appropriately scaled.


You can find two fairly straightforward examples in kern/uipc_domain.c: the 
fast and slow protocol timers.
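The scaling is just arithmetic.  example_ms_to_ticks() below is a hypothetical helper, not part of callout(9): it converts a period in milliseconds to ticks of length 1/hz seconds, rounding up so that a short period never becomes zero ticks.  The kernel-side usage is shown only in a comment, since it needs the kernel environment; for a period of 100 ms (10 times per second) the handler simply re-arms itself with callout_reset().

```c
#include <assert.h>

/*
 * Hypothetical helper: convert a period in milliseconds to callout ticks
 * (each tick is 1/hz seconds), rounding up so short periods never
 * become 0 ticks.
 */
static int
example_ms_to_ticks(int ms, int hz)
{
	long long ticks;

	assert(ms > 0 && hz > 0);
	ticks = ((long long)ms * hz + 999) / 1000;
	return ((int)ticks);
}

/*
 * Kernel-side usage (sketch only, not compiled here): a handler that
 * re-arms itself to run roughly 10 times per second.
 *
 *	static struct callout example_callout;
 *
 *	static void
 *	example_tick(void *arg)
 *	{
 *		... periodic work ...
 *		callout_reset(&example_callout, example_ms_to_ticks(100, hz),
 *		    example_tick, arg);
 *	}
 */
```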


Robert


Re: priv_check() question

2011-07-05 Thread Robert Watson


On Sun, 3 Jul 2011, exorcistkiller wrote:

Hi! I am taking a FreeBSD course this summer and I'm doing a homework. A new 
system call uidkill() is to be added. uidkill(uid_t uid, int signum) sends 
signal specified by signum to all processes owned by uid, excluding the 
calling process itself.


I'm almost done, however I get stuck with priv_check(). If the calling 
process is trying to send signal to processes owned by others, permission 
should be denied. My implementation simply uses an if (p->p_ucred->cr_uid == 
ksi.ksi_uid) to deny it, however priv_check() is required. My question is: 
what privilege a process should have to send signal to processes owned by 
others? PRIV_SIGNAL_DIFFCRED?


The right way to think about privileges in FreeBSD is that they exempt 
subjects (usually processes) from normal access control rules -- typically as 
a result of a root uid.  The access control rules for signalling are captured 
by p_cansignal() and cr_cansignal(), depending on whether the subject is a 
process or a cached credential.  Processes have access to slightly greater 
rights than raw credentials due to additional context -- for example, 
information about parent-child relationships.  These functions then invoke 
further privilege checks if required, perhaps overriding the normal 
requirement that uids match, etc.  kill() implements a couple of broadcast 
modes for signals -- you may want to look at the implementation there to see 
how this is done.
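The overall shape of such a broadcast can be sketched in userspace.  Everything here is hypothetical and heavily simplified: struct example_proc stands in for struct proc, and example_cansignal() stands in for p_cansignal()/cr_cansignal(), reduced to a uid comparison plus a privilege override (the real functions weigh more of the credential than a single uid).  What it shows is the structure: skip the caller, ask the policy function about each candidate, and deliver only where it returns 0, rather than open-coding a uid test alongside a separate priv_check() call.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical, minimal process record. */
struct example_proc {
	pid_t pid;
	uid_t uid;
	int   pending;	/* count of signals "delivered" */
};

/*
 * Hypothetical stand-in for p_cansignal(): 0 if allowed, an errno
 * otherwise.  'privileged' models a privilege check that overrides the
 * normal same-uid rule.
 */
static int
example_cansignal(const struct example_proc *td, const struct example_proc *p,
    bool privileged)
{
	if (td->uid == p->uid || privileged)
		return (0);
	return (EPERM);
}

/*
 * uidkill()-style broadcast: signal every process owned by 'uid' except
 * the caller.  Returns the number signalled, or -EPERM if matching
 * targets existed but none could be signalled.
 */
static int
example_uidkill(struct example_proc *procs, size_t n, size_t self,
    uid_t uid, bool privileged)
{
	int found = 0, sent = 0;

	assert(self < n);
	for (size_t i = 0; i < n; i++) {
		if (i == self || procs[i].uid != uid)
			continue;
		found++;
		if (example_cansignal(&procs[self], &procs[i], privileged) != 0)
			continue;
		procs[i].pending++;
		sent++;
	}
	return (found > 0 && sent == 0 ? -EPERM : sent);
}
```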


Robert



Re: FreeBSD I/OAT (QuickData now?) driver

2011-06-11 Thread Robert Watson


On Mon, 6 Jun 2011, grarpamp wrote:

I know we've got polling. And probably MSI-X in a couple drivers. Pretty 
sure there is still one CPU doing the interrupt work? And none of the 
multiple queue thread spreading tech exists?


Actually, with most recent 10gbps cards, and even 1gbps cards, we process 
inbound data with as many CPUs as the hardware has MSI-X enabled input and 
output queues.  So "a couple" understates things significantly.



   * Through PF_RING, expose the RX queues to the userland so that
the application can spawn one thread per queue hence avoid using
semaphores at all.


I'm probably a bit out of date, but last I checked, PF_RING still implied 
copying, albeit into shared memory buffers.  We support shared memory between 
the kernel and userspace for BPF and have done for quite a while.  However, 
right now a single shared memory buffer is shared for all receive queues on a 
NIC.  We have a Google summer of code student working on this actively right 
now -- my hope is that by the end of the summer we'll have a pretty functional 
system that allows different shared memory buffers to be used for different 
input queues.  In particular, applications will be able to query the set of 
queues available, detect CPU affinity for them, and bind particular shared 
memory rings to particular queues.  It's worth observing that for many types 
of high-performance analysis, BPF's packet filtering and truncation support is 
quite helpful, and if you're going to use multiple hardware threads per input 
queue anyway, you actually get a nice split this way (as long as those threads 
share L2 caches).


Luigi's work on mapping receive rings straight into userspace looks quite 
interesting, but I'm pretty behind currently, so haven't had a chance to read 
his NetMap paper.  The direct mapping of rings approach is what a number of 
high-performance FreeBSD shops have been doing for a while, but none had 
generalised it sufficiently to merge into our base stack.  I hope to see this 
happen in the next year.


Robert


Re: sizeof(function pointer)

2011-06-05 Thread Robert Watson


On Tue, 31 May 2011, m...@freebsd.org wrote:

I am looking into potentially MFC'ing r212367 and related, that adds drains 
to sbufs.  The reason for MFC is that several pieces of new code in CURRENT 
are using the drain functionality and it would make MFCing those changes 
much easier.


The problem is that r212367 added a pointer to a drain function in the sbuf 
(it replaced a pointer to void).  The C standard doesn't guarantee that a 
void * and a function pointer have the same size, though it's true on amd64, 
i386 and I believe PPC.  What I'm wondering is, though not guaranteed by the 
standard, is it *practically* true that sizeof(void *) == 
sizeof(int(*)(void)), such that an MFC won't break binary compatibility for 
any supported architecture?  (The standard does guarantee, though not in 
words, that all function pointers have the same size, since it guarantees 
that pointers to functions can be cast to other pointers to functions and 
back without changing the value).


I think you're OK for MFC purposes, but that in general, we shouldn't assume 
that they are the same size.  I.e., we should use a function pointer type 
where we mean a function pointer type, and never write code that casts a 
function pointer to a regular pointer.  (The change is fine in that respect, 
I believe.)
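A minimal sketch of that rule, using hypothetical names loosely modelled on a buffer with a drain callback (this is not the sbuf(9) API): the callback field has a function pointer type, so nothing ever round-trips a function pointer through void *, while the callback's opaque argument stays a void *, which is what void * is for.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical drain callback type: consume 'len' bytes of 'data'. */
typedef int example_drain_func(void *arg, const char *data, int len);

struct example_buf {
	char                buf[8];
	int                 len;
	example_drain_func *drain;	/* function pointer type, not void * */
	void               *drain_arg;	/* opaque data pointer: void * is fine */
};

/* Append a string, draining through the callback whenever buf fills. */
static int
example_buf_cat(struct example_buf *b, const char *s)
{
	assert(b->drain != NULL);
	for (; *s != '\0'; s++) {
		if (b->len == (int)sizeof(b->buf)) {
			if (b->drain(b->drain_arg, b->buf, b->len) < 0)
				return (-1);
			b->len = 0;
		}
		b->buf[b->len++] = *s;
	}
	return (0);
}

/* Drain whatever remains in the buffer. */
static int
example_buf_finish(struct example_buf *b)
{
	int error = 0;

	if (b->len > 0 && b->drain(b->drain_arg, b->buf, b->len) < 0)
		error = -1;
	b->len = 0;
	return (error);
}

/* Example drain target: accumulate everything into a NUL-terminated string. */
struct collect {
	char   out[64];
	size_t used;
};

static int
collect_drain(void *arg, const char *data, int len)
{
	struct collect *c = arg;

	if (c->used + (size_t)len >= sizeof(c->out))
		return (-1);
	memcpy(c->out + c->used, data, (size_t)len);
	c->used += (size_t)len;
	c->out[c->used] = '\0';
	return (len);
}
```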


I'm doing some research on an experimental architecture where certain types of 
function pointers are 256-bit.  This has some interesting consequences; we 
haven't yet gotten to investigating C language extensions/compatibility, but 
that will follow in the next year or so.  (We also have 256-bit data 
references, similar to pointers, for use in some environments, which will also 
prove interesting.  I'm not yet convinced we'll try to use a general pointer 
type for them, but perhaps instead extend the language to have a qualified 
type of some sort).


Robert


Re: Mount_nfs question

2011-05-31 Thread Robert Watson


On Mon, 30 May 2011, Mark Saad wrote:


 So I am stumped on this one.  I want to know what the IP of each
nfs server that is providing each nfs export. I am running 7.4-RELEASE
When I run mount -t nfs I see something like this

VIP-01:/export/source on /mnt/src
VIP-02:/export/target   on /mnt/target
VIP-01:/export/logs on /mnt/logs
VIP-02:/export/package   on /mnt/pkg

The issue is I use a load-balanced NFS server from Isilon. So VIP-01 could 
be any one of a group of IPs. I am trying to track down a network 
congestion issue and I can't find a way to match the output of lsof and 
netstat to the output of mount -t nfs. Does anyone have any ideas how I 
could track this down? Is there a way to run mount and have it show the IP 
and not the name of the source server?


Unfortunately, there's not a good answer to this question.  nfsstat(1) should 
have a mode that can iterate down active mount points displaying statistics 
and connection information for each, but doesn't.  NFS sockets generally don't 
appear in sockstat(1) either.  However, they should appear in netstat(1), so 
you can at least identify the sockets open to various NFS server IP addresses 
(especially if they are TCP mounts).


Enhancing nfsstat(1) to display more detailed information would, I think, be a 
very useful task for someone to get up to (and perhaps should appear on our 
ideas list).  Something that would be nice to have, in support of this, is a 
way for file systems to provide extended status via a system call that queries 
mountpoints, both portable information that spans file systems, and file 
system-specific data.  Morally, similar to nmount(2) but for statistics rather 
than setting things.  The easier route is to add new sysctls that dump 
per-mountpoint state directly from NFS, but given how much other information 
we'd like to export, it would be great to have a more general mechanism.


(The more adventurous can, with a fairly high degree of safety, use kgdb on 
/dev/mem (read-only) to walk the NFS stack's mount tables, but that's not much 
fun.)


Robert


Re: compiler warnings (was: Re: [rfc] a few kern.mk and bsd.sys.mk related changes)

2011-05-31 Thread Robert Watson


On Tue, 31 May 2011, Alexander Best wrote:


On Mon May 30 11, Dieter BSD wrote:

Chris writes:

Ports need attention. The warnings I get there are frightening.


I find it comforting that they're just that: warnings.

How do they frighten you?


High quality code does not have any warnings.

The most frightening thing is the attitude that "They're just warnings, so
I'll ignore them."  Most compiler warnings should be fatal errors.  And a
lot of the warnings that require a -Wwhatever should be on by default.


please keep in mind that -Wfoo does reflect the ideas of the GNU people 
regarding *proper* code. the warnings themselves are sometimes wrong, 
because they complain about perfectly correct code. so -Wfoo should not be 
considered a code verifier, but in fact what it is: a warning flag. 
sometimes it's correct and indeed reports wrong code, sometimes it is 
completely wrong.


And, it's also worth remembering that warnings change over time, as the 
compiler changes.  One of the known issues building with clang is that large 
quantities of warning-free code under gcc are in fact rife with warnings 
under clang, including the gcc source code itself.  In general, my hope is 
that we can get the FreeBSD base warning-free for a useful set of warnings, 
and on the whole, this is the case.  Pretty much the entire kernel is compiled 
with quite a large number of warning classes enabled, and -Werror set, for 
example.


(One of the other tensions, of course, is the locally maintained vs externally 
maintained tension: fixing warnings in other people's code is useful only if 
you can get them to accept the fixes back -- maintaining large numbers of 
patch sets over time is not sustainable for non-trivial quantities of code, if
you're tracking the upstream vendor.  Ports is the worst possible case, where 
maintaining local patches is quite expensive.  In the FreeBSD base we can do 
a lot better, since we can use revision control and automatic merging to help 
us, but it's still an overhead that has to be reasoned about carefully.)


Robert

Re: GSoC

2011-04-02 Thread Robert Watson


On Fri, 1 Apr 2011, Oleksandr Dudinskyi wrote:

I would like to describe my plan of action more specifically. One of the main
tasks is to find the places where errors are registered, then analyse the
errors (their type), separate out the errors related to disks, and modify the
output format. There are different types of errors, such as soft, hard,
transport, device not ready, recoverable and others. Currently there is a
problem with reporting: most error logs are built as individual files.
Changes are needed in the kernel to provide a database that processes
information from several sources. The current kernel can't report which
specific operations produced errors, which further compounds the consistency
problem. The reporting of driver errors needs to change. Systematising the
format in which errors are recorded is also a priority, so that we know what
the error was and where it occurred.


Hi Oleksandr:

This sounds like a potentially interesting project, but it remains a bit 
abstract to me, which makes me worry about it as a GSoC project.  Strong 
proposals typically have a well-defined and easily characterised objective 
(1-2 sentences), and 3-4 intermediate deliverables.  I worry that what you've 
described may be a bit too researchy for a summer project, but I'm willing to 
be convinced otherwise!  Could you flesh out in a bit more detail how what you 
have in mind would work: are there new daemons? system calls? will you reuse 
existing logging or error-handling infrastructure? what is the namespace for 
errors? how will it affect current operations?  We don't need perfect answers 
to these questions yet, but a slightly more worked out example might help 
resolve my concerns.


Thanks!

Robert


Re: Include file search path

2011-04-02 Thread Robert Watson

On Wed, 30 Mar 2011, Warner Losh wrote:


On Mar 30, 2011, at 9:23 AM, Dimitry Andric wrote:
This is a rather nasty hack, though.  If we can make it work, we should 
probably try using --sysroot instead, or alternatively, -nostdinc and 
adding include dirs by hand.  The same for executable and library search 
paths, although I am not sure if there is a way to completely reset those 
with the current options.


I'm pretty sure that the origins of this hack pre-dates the -sysroot feature 
in gcc.  It works in -current and has for years, so nobody has cared enough 
to even contemplate changing it.


If you can make the sysroot feature work, that would be great, since that 
would allow us to skip the compiler building phase if we were building using 
external compilers.  I have some patches to make that work, but this very 
problem is what I'd worked my way up to.  It works well if you are building 
current on current, but not so well if you are mixing versions (you can mix 
architectures if you are using the xdev feature I put in a while ago, but 
even that has one or two niggles I need to iron out).


Count me as another eager consumer awaiting a nice answer to the general 
cross-compile problem.  I'm really looking for three things:


(1) A bit more intelligence from our build framework regarding not rebuilding
the toolchain quite so many times!  I'd like to be able to do a buildworld
with TARGET_ARCH with significantly improved performance.  Perhaps we can
do this already, in which case a pointer would be welcome.

(2) Working clang/LLVM cross-compile of FreeBSD.  This seems like a basic
requirement to adopt clang/LLVM, and as far as I'm aware that's not yet a
resolved issue?

(3) Making it easy to plug in, first, an external gcc, and second, an
external clang/LLVM.  One worrying point for me on the last one is that we
can't yet build the whole kernel with clang/LLVM, at least for i386/amd64,
so I guess you need both external gcc *and* external clang/LLVM?

We (Cambridge) are currently bringing up FreeBSD on a new soft-core 64-bit 
MIPS platform.  We're already using a non-base gcc for our boot loader work, 
and plan to move to using clang/LLVM later in the year.  The base system seems 
a bit short on detail when it comes to the above, currently.


Robert


Re: Prebind from OpenBSD

2011-03-27 Thread Robert Watson

On Sat, 26 Mar 2011, Jesse Smith wrote:

I'm interested in working on the Port prebind from OpenBSD project 
mentioned on the FreeBSD Ideas page. ( 
http://wiki.freebsd.org/IdeasPage#head-d28cdd95ca1755d5afe63d653cb4926d4bdc99de 
)


There isn't much to go on from the project description and I'm curious what 
FreeBSD devs are looking for specifically. For example, should the entire 
ldconfig program be ported from OpenBSD (it looks like it's close enough to 
FreeBSD's to make that suitable), or should just the prebind code be merged 
into FreeBSD's ldconfig?


Once the project is complete, who should the work be submitted to? Has 
anyone else worked on this and made any progress?


Hi Jesse:

I think the intent of the ideas list entry is more a research project than a 
direct-to-commit project: the question is whether prebinding of some sort 
would observably help performance for important FreeBSD applications or, for 
example, the boot process.  If so, then certainly the OpenBSD prebinding code 
is a possible model -- Mac OS X also has prebinding, of course, and it's done 
quite differently (and probably less reusably from our perspective as they use 
Mach-O rather than ELF); however, there might be interesting ideas as well.


I think therefore I'd structure a project along the following lines: first, 
you want to establish to what extent synchronous waiting on linkage at 
run-time is a significant problem.  It could be that some combination of hwpmc 
and DTrace would be the right tools for this.  I'd especially pay attention to 
boot time, since we know that quite a lot of program execution takes place then as
part of rc.d.  I'd also investigate large applications like Firefox, Chrome, 
KDE, Gnome, etc.  KDE already integrates prebinding tricks in its design, but 
I don't think the others do.


Next, I'd dig a bit more into the areas where it's hurting performance -- can 
you add up all the time spent waiting and cut 10 seconds from boot, or 5 
seconds from Firefox startup?  Or is the best win going to be .2 seconds in 
Firefox?  Does the OpenBSD optimisation actually address the problem we're 
experiencing?  Perhaps perform some experiments with prebinding-like 
behaviour, working up to an implementation.


It's worth remembering that prebinding comes with some baggage as well, of 
course.  Perhaps less relevant in the world of 64-bit address spaces, but 
there are some design trade-offs in this department...


Robert


Re: GSoC

2011-03-27 Thread Robert Watson


On Fri, 25 Mar 2011, Dudinskyi Olexandr wrote:

My name is Dudinskyi Oleksandr. I am a student of National aviation 
university, Ukraine. I want to participate in GSoC 2011 with your 
organization.


My project: Disk device error counters, iostat -e.

I think this project is very necessary in the FreeBSD system.  I am now making a
plan to develop this project.


What can you say about the idea of my project?  And are you in favor of
this project?


My mentor: Andriy Gapon.


Hi Dudinskyi:

It's a little hard to tell from your description exactly what it is you are 
proposing to do.  Could you flesh out the idea some for us, so that we can 
give you feedback?  What is the nature of the problem you want to solve? 
What software changes do you anticipate making?  How will you test your 
changes?


Robert

Re: CFR: FEATURE macros for AUDIT/CAM/IPC/KTR/MAC/NFS/NTP/PMC/SYSV/...

2011-02-12 Thread Robert Watson


On Sat, 12 Feb 2011, Alexander Leidinger wrote:


On Sat, 12 Feb 2011 00:52:48 + (GMT) Robert Watson
rwat...@freebsd.org wrote:

The one comment I'd make is that the MAC case should indicate that "the MAC
Framework" is supported, rather than that mandatory access controls are
present -- the presence of the framework doesn't imply the presence of
mandatory access control policies.


Does
FEATURE(mac, "Mandatory Access Control Framework support");
look better?

Alternatively/additionally we could use mac_framework as the name of the 
feature.


The above seems fine -- while I've been moving to names like mac_framework.h,
it's still "options MAC" and security/mac, etc., and I think that "mac" is the
most consistent option.


Robert


Re: CFR: FEATURE macros for AUDIT/CAM/IPC/KTR/MAC/NFS/NTP/PMC/SYSV/...

2011-02-11 Thread Robert Watson


On Fri, 11 Feb 2011, Alexander Leidinger wrote:

during the last GSoC various FEATURE macros were added to the system.
Before committing them, I would like to get some review (like if macro is in 
the correct file, and for those FEATURES where the description was not taken 
from NOTES if the description is OK).


If nobody complains, I would like to commit this in 1-2 weeks. If you need 
more time to review, just tell me.


Here is the list of affected files (for those impatient ones which do not 
want to look at the attached patch before noticing that they are not 
interested to look at it):


The additions for security/audit and security/mac both seem reasonable; I've 
been meaning to add them myself for quite a while.  There's then some code in
libc that can learn to use this as well, at least for MAC.


The one comment I'd make is that the MAC case should indicate that "the MAC
Framework" is supported, rather than that mandatory access controls are present
-- the presence of the framework doesn't imply the presence of mandatory 
access control policies.


Robert


Re: CFR: FEATURE macros for AUDIT/CAM/IPC/KTR/MAC/NFS/NTP/PMC/SYSV/...

2011-02-11 Thread Robert Watson

On Fri, 11 Feb 2011, Ilya Bakulin wrote:

When I was beginning this GSoC work, I primarily thought about unifying the 
way to determine if particular feature exists in the kernel. Of course there 
should be at least one way to check if the feature is available or not (by 
definition: if I may use some functionality, then the feature is present,
otherwise... Oh, no, maybe I have no permissions to use it? Or something is
terribly wrong with the system configuration? Or?...), but it is better to have
a sort of unified way to get this information without looking for files in 
/dev, parsing `kldstat -v`, etc.


One of the nice things about this is that when a conditionally compiled 
feature introduces a new system call, there can be forward (rather than 
backward) compatibility benefits.  If login(1) had checked for the Audit 
feature before trying audit system calls when we introduced it in 6.x, it 
would have avoided a few people shooting their feet off in the (officially 
unsupported) case where following a kernel and userspace roll-forward, a 
kernel roll-back was required to restore stability.  While we don't support it 
(you shouldn't run a new userspace with an old kernel), the failure mode would 
have been improved.


More abstractly: for a feature like MAC, testing for the presence of the
framework is functionally fairly different from exercising the feature, as most
instances of exercising it work only with modules loaded into the framework,
which is a different goal.  Right now, libc offers a mac_is_present(3) API,
which back-ends into manually testing a system call.  I'd rather it back-ended
into a common feature test framework.
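A minimal sketch of a userspace check along those lines, assuming that FEATURE(name, desc) exports a read-only integer sysctl named kern.features.<name> (the convention FreeBSD's FEATURE() macro follows). The function name kernel_feature_present() is hypothetical, and on non-FreeBSD systems the sketch simply reports 0.

```c
#include <assert.h>
#include <stdio.h>
#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/*
 * Hypothetical helper: return 1 if the running kernel advertises feature
 * "name" via kern.features.<name>, 0 if it does not or cannot be queried.
 */
static int
kernel_feature_present(const char *name)
{
	assert(name != NULL);
#ifdef __FreeBSD__
	char oid[128];
	int value = 0;
	size_t len = sizeof(value);
	int n = snprintf(oid, sizeof(oid), "kern.features.%s", name);

	if (n < 0 || (size_t)n >= sizeof(oid))
		return (0);
	/* A missing sysctl means the feature was not compiled in. */
	if (sysctlbyname(oid, &value, &len, NULL, 0) != 0)
		return (0);
	return (value != 0);
#else
	(void)name;	/* no kern.features tree outside FreeBSD */
	return (0);
#endif
}
```

A wrapper like mac_is_present(3) could then consult kern.features.mac first and fall back to probing the system call only when the sysctl is absent, which is one way to get the forward-compatibility benefit described earlier.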


In many cases, it is of course desirable to test for a feature by using it -- 
a much more pragmatic approach, and generally one preferred in the world of 
autoconf, etc...


Robert


Re: ixgbe DMA question

2011-02-11 Thread Robert Watson


On Fri, 11 Feb 2011, Santosh Rao Gururajan wrote:

I have a host machine with 2 ixgbe NICs. I am trying to pass the frames from 
one NIC to the other with the lowest possible overhead to the host (high 
speed bridge). I am wondering if I can do a rx-ring to tx-ring DMA copy 
without creating a mbuf on the host. Is that possible? What are the risks?


The only real risk is the simple matter of programming, I think.  There's no
reason not to do it except that it involves modifying device drivers, memory
models, etc.  If you do what you describe, and you decide you do want to pass
some frames up the stack, you can always hook up mbufs and use the external
storage free routine to return the memory to the ring.  Jeff Roberson has been
circulating some patches that eliminate the mbuf-cluster relationship in its
current form, instead preferring variable-size mbufs.  I can't help but
wonder whether, with such a patch, that approach would be simpler than what you
propose, offering many of the same performance benefits while making the device
driver changes smaller and still allowing you to direct some packets up the
stack if desired.
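The external-storage idea can be modelled in userspace: a packet descriptor borrows a ring buffer, carries a reference count, and calls a free routine that returns the buffer to the ring when the last reference drops. Everything here (struct rx_ring, struct pkt, ring_put() and the rest) is hypothetical; the real mbuf(9) external-storage interface has different types and signatures, and a real driver would also have to re-post the buffer to the NIC's descriptor ring.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SLOTS	4
#define BUF_SIZE	2048

/* Hypothetical receive ring: fixed buffers plus a stack of free slot indices. */
struct rx_ring {
	unsigned char	bufs[RING_SLOTS][BUF_SIZE];
	int		free_slots[RING_SLOTS];
	int		nfree;
};

/*
 * Hypothetical packet descriptor borrowing a ring buffer, loosely modelled
 * on an mbuf with external storage: a reference count plus a free routine
 * that runs when the last reference goes away.
 */
struct pkt {
	unsigned char	*data;
	size_t		 len;
	int		 refcnt;
	void		(*ext_free)(struct rx_ring *, unsigned char *);
	struct rx_ring	*ring;
};

static void
ring_init(struct rx_ring *r)
{
	for (int i = 0; i < RING_SLOTS; i++)
		r->free_slots[i] = i;
	r->nfree = RING_SLOTS;
}

/* Take a buffer from the ring; returns NULL if the ring is exhausted. */
static unsigned char *
ring_get(struct rx_ring *r)
{
	if (r->nfree == 0)
		return (NULL);
	return (r->bufs[r->free_slots[--r->nfree]]);
}

/* Free routine: hand the buffer back to the ring, not to an allocator. */
static void
ring_put(struct rx_ring *r, unsigned char *buf)
{
	for (int i = 0; i < RING_SLOTS; i++) {
		if (buf == r->bufs[i]) {
			assert(r->nfree < RING_SLOTS);
			r->free_slots[r->nfree++] = i;
			return;
		}
	}
	assert(!"buffer does not belong to this ring");
}

static void
pkt_attach(struct pkt *p, struct rx_ring *r, unsigned char *buf, size_t len)
{
	p->data = buf;
	p->len = len;
	p->refcnt = 1;
	p->ext_free = ring_put;
	p->ring = r;
}

/* Drop one reference; the last release returns the storage to the ring. */
static void
pkt_release(struct pkt *p)
{
	assert(p->refcnt > 0);
	if (--p->refcnt == 0)
		p->ext_free(p->ring, p->data);
}
```

For example, a frame handed both to the bridge path and up the stack would hold two references, and only the second pkt_release() puts the slot back on the ring.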


Robert


Re: Analyzing wired memory?

2011-02-08 Thread Robert Watson


On Tue, 8 Feb 2011, Alan Cox wrote:


On Tue, Feb 8, 2011 at 6:20 AM, Ivan Voras ivo...@freebsd.org wrote:

Is it possible to track by some way what kernel system, process or thread 
has wired memory? (including data exists but needs code to extract it)



No.


I'd like to analyze a system where there is a lot of memory wired but not 
accounted for in the output of vmstat -m and vmstat -z. There are no user 
processes which would lock memory themselves.


Any pointers?


Have you accounted for the buffer cache?


John and I have occasionally talked about making procstat -v work on the 
kernel; conceivably it could also export a wired page count for mappings where 
it makes sense.  Ideally procstat would drill in a bit and allow you to see
things at least at the granularity of "this page range was allocated to UMA".


Robert


Re: Why does printf(9) hang network?

2011-02-05 Thread Robert Watson


On Sat, 5 Feb 2011, dieter...@engineer.com wrote:

Why would doing a printf(9) in a device driver (usb, firewire, probably 
others) cause an obscenely long lockout on 
/usr/src/sys/kern/uipc_sockbuf.c:148 (sx:so_rcv_sx)  ?


Printf(9) alone isn't the problem, adding printfs to chown(2) does not cause 
the problem, but printfs from device drivers do.


Grep says that uipc_sockbuf.c is the only file that locks/unlocks sb_sx. The 
device drivers and printf don't even know that sb_sx exists.


I can't speak to the details of your situation, but one possible explanation
might be this: printf runs at the speed of the console, which for serial
consoles can be extremely slow.  Device driver interrupt threads can preempt
other threads, possibly while those threads hold locks.  That causes them to
hold the locks for much longer, as the threads may not get rescheduled for some
period (for example, until the device driver is done doing a printf), leading
other threads waiting for that lock to wait significantly longer.  This is
especially the case if the other thread was spinning adaptively, in which case
it will then yield, since the holder of the lock has effectively yielded.


You might try forcing all the various threads to run on different CPUs using 
cpuset and see if the variance goes down.  You can also use KTR + schedgraph 
to explore the specific scheduling going on, although be aware that KTR 
can also noticeably perturb scheduling itself.


In general, things shouldn't call kernel printf in steady state operation; if 
they need to log something, they should use log(9) or similar.  printf is 
primarily a tool for printing out device probe information, and for debugging 
purposes: it is not intended to be fast.


Robert



135  int
136  sblock(struct sockbuf *sb, int flags)
137  {
138
139  	KASSERT((flags & SBL_VALID) == flags,
140  	    ("sblock: flags invalid (0x%x)", flags));
141
142  	if (flags & SBL_WAIT) {
143  		if ((sb->sb_flags & SB_NOINTR) ||
144  		    (flags & SBL_NOINTR)) {
145  			sx_xlock(&sb->sb_sx);
146  			return (0);
147  		}
148  		return (sx_xlock_sig(&sb->sb_sx));
149  	} else {
150  		if (sx_try_xlock(&sb->sb_sx) == 0)
151  			return (EWOULDBLOCK);
152  		return (0);
153  	}
154  }
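For readers tracing the three paths in the listing above, here is a userspace model of the same control flow, with a pthread mutex standing in for the sx lock. It is a sketch, not the kernel code: model_sblock(), model_sbunlock() and MODEL_SBL_WAIT are hypothetical names, and the signal-interruptible sx_xlock_sig() path is folded into an ordinary blocking lock. The blocking branch is where a caller would sit behind a lock holder that has been preempted, as described in the reply above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the kernel's SBL_WAIT flag. */
#define MODEL_SBL_WAIT	0x01	/* caller is willing to sleep */

/*
 * Userspace model of sblock(): the waiting path blocks for as long as the
 * current holder keeps the lock; the non-waiting path fails immediately.
 */
static int
model_sblock(pthread_mutex_t *mtx, int flags)
{
	if (flags & MODEL_SBL_WAIT)
		return (pthread_mutex_lock(mtx));
	/* Like sx_try_xlock(): do not wait if someone else holds it. */
	if (pthread_mutex_trylock(mtx) != 0)
		return (EWOULDBLOCK);
	return (0);
}

static void
model_sbunlock(pthread_mutex_t *mtx)
{
	int error = pthread_mutex_unlock(mtx);

	assert(error == 0);
	(void)error;	/* unused when assertions are compiled out */
}
```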

More info at: http://www.freebsd.org/cgi/query-pr.cgi?pr=118093






Re: Creating an LVM-backed FreeBSD DomU in a Linux Dom0

2010-12-28 Thread Robert Watson


On Tue, 28 Dec 2010, Avleen Vig wrote:

After searching high and low and not finding exactly what I wanted (although 
Adrian Chadd's documents came close), I decided to document a lengthy but 
worthwhile procedure:


How to install a FreeBSD DomU guest in a Linux Dom0 Xen host, from scratch, 
with LVM-backed storage (rather than file based), and without the need to 
rely on random kernels and ISO[1]


http://bit.ly/dVhfFe

Hopefully people find it useful :-)


FYI, we now have a xen(4) man page, which will ship in 8.2.  It's not tutorial 
material like your document, but is useful reference material.  I'd like it 
very much if we could get something more along the lines of what you've 
created into the FreeBSD Handbook.


Robert




I haven't yet broached configuring inside the Xen host. Again there is
scattered documentation available. I'll try to bring it together next.

[1] I gave serious thought to uploading my own stuff along with the
other similar things available already, but in the end I thought it
better if people try out how to do it, given that the amount of work
will be almost the same, or even slightly less building it yourself.
Plus there are the usual security and availability concerns.. :)

--
Avleen Vig
Systems Administrator
Personal: www.silverwraith.com




Re: libkvm: consumers of kvm_getprocs for non-live kernels?

2010-11-11 Thread Robert Watson

On Wed, 10 Nov 2010, Ulrich Spörlein wrote:

I have this cleanup of libkvm sitting in my tree and it needs a little bit 
of testing, especially the function kvm_proclist, which is only called from 
kvm_deadprocs which is only called from kvm_getprocs when kd is not ALIVE.


The only consumer in our tree that I can make out is *probably* kgdb, as 
ps(1), top(1), w(1), pkill(1), fstat(1), systat(1), pmcstat(8) and bsnmpd 
don't really work on coredumps.


But, the kgdb file gnu/usr.bin/binutils/gdb/kvm-fbsd.c, where kvm_getprocs 
is probably called on a dead kernel is not even used during build!


So I guess I'm staring at dead code here, any kvm people around that can 
clue me in?


Even if those tools aren't using kvm properly, they should be.  ps(1) at least 
used to work quite well on coredumps, and perhaps still does?


Stas has ongoing work on a libprocstat, you might want to give him a ping. 
I'm not sure if he plans to refactor some of those existing tools to use that 
library or not, but crashdump support is a key goal of it.


Robert

Re: [PATCH] Fix 'implicit declaration' warning and update vgone(9)

2010-10-29 Thread Robert Watson

On Wed, 27 Oct 2010, Benjamin Kaduk wrote:


On Wed, 27 Oct 2010, Kostik Belousov wrote:


On Wed, Oct 27, 2010 at 10:59:56AM -0400, Benjamin Kaduk wrote:
[1] The old (racy) function is osi_TryEvictVCache, here: 
http://git.openafs.org/?p=openafs.git;a=blob;f=src/afs/FBSD/osi_vcache.c;h=c2060c74f0155a610d2ea94f3c7f508e8ca4373a;hb=HEAD
The function looks very strange for much more serious reasons. Why do you 
try to manage the vnode revocation in the filesystem module at all ?


I am still becoming familiar with the AFS code, but I think this is largely 
due to a difference in the vfs structure that AFS has been using and the 
FreeBSD standard.  E.g. vop_inactive/vop_reclaim do not actually free 
filesystem-specific resources, instead keeping a free list of vcache 
entries.  So, the original authors of this AFS code were approaching the 
problem in a somewhat different way. Therefore, there are somewhat-orthogonal
pools of vcaches and vnodes (with some intersection).  If the vcaches are
all in use, there is a routine which tries to shake some loose; if
it can free up vcaches, their associated vnodes also need to be cleaned up 
in some fashion.  It may be that no additional code is actually needed to do 
this, though -- I am not sure.


I have a hazy recollection, from quite a long time ago, that OpenAFS used to 
be a bit special with regard to vnodes -- allocating its own, or something 
along those lines.  I expect it no longer does that, but it could be that it 
feels it owns the vnodes more than your average file system does, which may 
play less well with global management of vnodes.  Derrick would probably have 
more to say on this.


Robert


Re: addition of sysctl nodes after compile time

2010-10-21 Thread Robert Watson


On Tue, 19 Oct 2010, Alexander Best wrote:


does this limitation still exist?


Sysctls can be added dynamically using the sysctl_add_oid(9) KPI, which has 
existed (as far as I'm aware) at least since FreeBSD 4.x.  It could be that 
this KPI provides the functionality required to do what the comment describes.


Robert


Re: Bumping MAXCPU on amd64?

2010-09-23 Thread Robert Watson


On Wed, 22 Sep 2010, Maxim Sobolev wrote:


On 9/22/2010 6:37 AM, John Baldwin wrote:
Unfortunately this can't be MFC'd to 7 as it would destroy the ABI for 
existing klds.


Ah, ok, sorry, I did only check RELENG_7. Can we make it a kernel option 
then?


In principle, yes, but MAXCPU is used to size various kernel data structures 
inspected by userspace crash post-mortem tools, etc.  I've done a bit of work 
to teach some of those tools (in particular, vmstat -z and vmstat -m) to 
extract the value of MAXCPU compiled into the kernel instead of just relying on
the value of MAXCPU present when the command-line tool was compiled.
However, I think a better long-term approach here is to generally eliminate 
sizing based on MAXCPU and instead size based on the number of CPUs present. 
Certain kernel subsystems already do this (UMA, netisr, ...) but others don't 
(malloc(9), ...).  Additional hands on this project would probably help :-).
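A userspace sketch of the general idea of sizing per-CPU state by the CPUs actually present rather than by a compile-time MAXCPU. Here sysconf(_SC_NPROCESSORS_CONF), which is widely available but not strictly POSIX, stands in for the kernel's CPU count; struct pcpu_counter and its helpers are hypothetical names. In the kernel, where CPU IDs can be sparse, sizing by mp_maxid + 1 rather than by the number of CPUs may be the safer choice.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical per-CPU counter whose slot array is sized at run time. */
struct pcpu_counter {
	long		 ncpu;
	uint64_t	*slots;		/* one slot per CPU present */
};

static int
pcpu_counter_init(struct pcpu_counter *c)
{
	long n = sysconf(_SC_NPROCESSORS_CONF);

	if (n < 1)
		n = 1;		/* fall back if the count is unavailable */
	c->slots = calloc((size_t)n, sizeof(*c->slots));
	if (c->slots == NULL)
		return (-1);
	c->ncpu = n;
	return (0);
}

static void
pcpu_counter_add(struct pcpu_counter *c, long cpu, uint64_t v)
{
	assert(cpu >= 0 && cpu < c->ncpu);
	c->slots[cpu] += v;
}

/* Readers sum only the slots that exist on this system. */
static uint64_t
pcpu_counter_fetch(const struct pcpu_counter *c)
{
	uint64_t sum = 0;

	for (long i = 0; i < c->ncpu; i++)
		sum += c->slots[i];
	return (sum);
}

static void
pcpu_counter_free(struct pcpu_counter *c)
{
	free(c->slots);
	c->slots = NULL;
	c->ncpu = 0;
}
```

A post-mortem tool reading such a structure would take ncpu from the structure itself rather than from a constant baked into the tool, which is the mismatch described above for vmstat -z and vmstat -m.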


As John mentioned, the other issue is the use of fixed-width types instead of 
variable-length CPU bitmasks to name cores for IPIs, etc.  There are people 
actively working on this, but it's a non-trivial project as kernel code likes 
to do things like "cpumask & othermask".  My expectation is that this problem
will be solved in 9.0 but I don't see any obvious MFC paths for 8.x due to KBI 
issues.  It could be that this forces our hand in terms of breaking the KBI at 
some point in the 8.x series, unclear...
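To illustrate what replacing a fixed-width mask involves, here is a sketch of a variable-length CPU set stored as an array of words, where the equivalent of "cpumask & othermask" becomes a loop over words. The type and function names are hypothetical, and a kernel version would likely also need atomic variants for concurrent updates.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Bits per storage word. */
#define WORD_BITS	(sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical variable-length CPU set, sized for "ncpu" CPUs at run time. */
struct cpuset_var {
	size_t		 nwords;
	unsigned long	*bits;
};

static int
cpuset_var_init(struct cpuset_var *s, size_t ncpu)
{
	s->nwords = (ncpu + WORD_BITS - 1) / WORD_BITS;
	s->bits = calloc(s->nwords, sizeof(*s->bits));
	return (s->bits == NULL ? -1 : 0);
}

static void
cpuset_var_set(struct cpuset_var *s, size_t cpu)
{
	assert(cpu / WORD_BITS < s->nwords);
	s->bits[cpu / WORD_BITS] |= 1UL << (cpu % WORD_BITS);
}

static bool
cpuset_var_isset(const struct cpuset_var *s, size_t cpu)
{
	assert(cpu / WORD_BITS < s->nwords);
	return ((s->bits[cpu / WORD_BITS] >> (cpu % WORD_BITS)) & 1UL);
}

/* dst = a & b, word by word: the replacement for "cpumask & othermask". */
static void
cpuset_var_and(struct cpuset_var *dst, const struct cpuset_var *a,
    const struct cpuset_var *b)
{
	assert(dst->nwords == a->nwords && a->nwords == b->nwords);
	for (size_t i = 0; i < dst->nwords; i++)
		dst->bits[i] = a->bits[i] & b->bits[i];
}

static void
cpuset_var_free(struct cpuset_var *s)
{
	free(s->bits);
	s->bits = NULL;
	s->nwords = 0;
}
```

The KBI concern in the text shows up directly here: every caller that used to pass a mask by value now passes a pointer to a structure whose size depends on the CPU count.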


Robert


Re: zfs + uma

2010-09-18 Thread Robert Watson


On Fri, 17 Sep 2010, Andre Oppermann wrote:

Although keeping free items around improves performance, it does consume 
memory too.  And the fact that that memory is not freed on lowmem condition 
makes the situation worse.


Interesting.  We may run into related issues with excessive mbuf (cluster) 
caching in the per-cpu buckets as well.


A general solution for that would be appreciated.  Maybe the size of the
free per-cpu buckets should be specified when setting up the UMA zone.  For
certain frequently reused elements we may want to cache more, for others less.


I've been keeping a vague eye out for this over the last few years, and 
haven't spotted many problems in production machines I've inspected.  You can 
use the umastat tool in the tools tree to look at the distribution of memory 
over buckets (etc) in UMA manually.  It would be nice if it had some automated 
statistics on fragmentation however.  Short-lived fragmentation is likely, and 
isn't an issue, so what you want is a tool that monitors over time and reports 
on longer-lived fragmentation.
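One possible shape for the monitoring tool described above, offered as a hypothetical sketch rather than a description of umastat: sample a zone's count of cached free items periodically and take the minimum over a recent window. That minimum is the part of the cache never drawn down at any sampled instant, a rough proxy for longer-lived idle memory, while short-lived spikes do not raise it. How the samples are collected is left open; the function name and the sample array are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: given periodic samples of a zone's cached free item
 * count, return the minimum over the last "window" samples.  Activity
 * between samples is invisible, so this is a heuristic, not an exact
 * measure of fragmentation.
 */
static uint64_t
long_lived_free(const uint64_t *samples, size_t nsamples, size_t window)
{
	uint64_t min;
	size_t start;

	assert(samples != NULL && nsamples > 0 && window > 0);
	start = nsamples > window ? nsamples - window : 0;
	min = samples[start];
	for (size_t i = start + 1; i < nsamples; i++)
		if (samples[i] < min)
			min = samples[i];
	return (min);
}
```

A short window reports caching that has persisted recently, while a window covering the whole run shows whether the cache was ever drained.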


The main fragmentation issue we've had in the past has been due to 
mbuf+cluster caching, which prevented mbufs from being freed usefully in some 
cases.  Jeff's ongoing work on variable-sized mbufs would entirely eliminate 
that problem...


Robert


Re: Intel TurboBoost in practice

2010-07-26 Thread Robert Watson

On Sun, 25 Jul 2010, Alexander Motin wrote:

The numbers that you are showing doesn't show much difference. Have you 
tried buildworld?


If you mean the relative difference -- as I said, it's mostly because of my
CPU. Its maximal boost is 266MHz (8.3%), but 133MHz of that is enabled most
of the time as long as the CPU is not overheated, which it probably isn't, as
it sits on an open table under air conditioning. So the maximal effect I can
expect is 4.2%. In that situation, 2.8% is probably not bad as an illustration
that the feature works and that there is space for further improvements. If I
had a Core i5-750S, I would expect a 33% boost.


Can I recommend the use of ministat(1) and sample sizes of at least 8 runs per 
configuration?
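As a rough illustration of the kind of comparison ministat(1) makes, the sketch below computes each configuration's mean and sample variance and applies a two-sample Student's t test with pooled variance. The names summarize() and differs_at() are hypothetical, and the critical value has to come from a t table for the chosen confidence level and na + nb - 2 degrees of freedom.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one configuration's timings. */
struct sample_stats {
	size_t	n;
	double	mean;
	double	var;	/* sample variance, n - 1 denominator */
};

static struct sample_stats
summarize(const double *x, size_t n)
{
	struct sample_stats s = { n, 0.0, 0.0 };

	assert(x != NULL && n >= 2);
	for (size_t i = 0; i < n; i++)
		s.mean += x[i];
	s.mean /= (double)n;
	for (size_t i = 0; i < n; i++)
		s.var += (x[i] - s.mean) * (x[i] - s.mean);
	s.var /= (double)(n - 1);
	return (s);
}

/*
 * Two-sample Student's t test with pooled variance: true if
 * |mean_a - mean_b| / se > tcrit.  Both sides are squared so no
 * square root (and no libm) is needed.
 */
static bool
differs_at(struct sample_stats a, struct sample_stats b, double tcrit)
{
	double df = (double)(a.n + b.n - 2);
	double pooled = ((double)(a.n - 1) * a.var +
	    (double)(b.n - 1) * b.var) / df;
	double se2 = pooled * (1.0 / (double)a.n + 1.0 / (double)b.n);
	double diff = a.mean - b.mean;

	return (diff * diff > tcrit * tcrit * se2);
}
```

Applied to the two C1 and two C2 buildworld timings quoted below, t is about 7.0 with 2 degrees of freedom: the difference clears the 95% critical value (4.303) but not the 99% one (9.925). With only two runs per configuration the test has very little power, which is one reason to collect at least 8 runs.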


Robert



If you mean the absolute difference, here are the results of four buildworld runs:
hw.acpi.cpu.cx_lowest=C1: 4654.23 sec
hw.acpi.cpu.cx_lowest=C2: 4556.37 sec
hw.acpi.cpu.cx_lowest=C2: 4570.85 sec
hw.acpi.cpu.cx_lowest=C1: 4679.83 sec
Benefit is about 2.1%. Each time results were erased and sources
pre-cached into RAM. Storage was SSD, so disk should not be an issue.

--
Alexander Motin




Re: How to get stack bounds of current process?

2010-05-11 Thread Robert Watson


On Mon, 10 May 2010, Lev Serebryakov wrote:

I'm porting an application from Linux which discovers its stack bounds by
reading and parsing /proc/self/maps.  FreeBSD has /proc/curproc/map, but
I cannot find how to determine which record is for the stack (I've looked into
the implementation of procfs, but it doesn't contain any special processing for
the process stack).


 How could I determine stack bounds of current process on FreeBSD 7/8/9?


The procstat -v command in 8.x and 9.x will give this information based on 
sysctls; we're about to integrate a libprocstat(3) library which will provide 
a public API for this information.  I'd agree with Kostik that you should 
think carefully about whether the application really needs this information 
:-).


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: make pkg_install suite reusable, please

2010-04-11 Thread Robert Watson

On Fri, 9 Apr 2010, Alexander Churanov wrote:


2010/4/9 Leinier Cruz Salfran salfrancl.lis...@gmail.com

I want to ask you one thing: can you make the 'pkg_install' suite reusable, 
i.e., install 'libinstall.a' as a shared object so that it can be reused 
by other developers?


I'd like to add my 50 cents. From my point of view, the true UNIX way is 
re-using whole programs. This provides unbelievable isolation and 
correctness. If you don't want to fork myriads of processes each second, 
then it's probably better to ask for a pipe mode in the pkg_* tools. For 
example, aspell works that way: you start a process, write commands and 
queries, and read the results.


While there are clearly benefits to process isolation, there are countless 
situations in UNIX where I've said to myself, "Oh, I wish I had a libfoo, not 
just a foo command."  This is particularly the case for monitoring tools, 
where third-party applications have a lot of trouble parsing and tracking the 
output of tools like ps(1), etc.  This is why recently we've been working on 
libmemstat(3), libprocstat(3), libnetstat(3), etc -- so that tools can avoid 
rewriting that code as well as avoid the parsing problem.  So I have no 
particular opinion on this tool, but I will say that in general, it would be 
nice if programs were more often thin wrappers around a library that could be 
reused, not just command line tools.
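To make the "thin wrapper around a library" idea concrete, here is a minimal C sketch built around a hypothetical word-count library; wc_count() and wc_format() are invented for illustration and are not a real FreeBSD API. The counting layer returns a structure that a monitoring tool could consume directly, and the command-line layer only formats that structure, so no consumer ever has to parse text output.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* "Library" layer: structured results that other programs can use directly. */
struct wc_counts {
    size_t lines, words, bytes;
};

static void
wc_count(const char *buf, size_t len, struct wc_counts *out)
{
    int in_word = 0;

    assert(out != NULL);
    out->lines = out->words = 0;
    out->bytes = len;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)buf[i];

        if (c == '\n')
            out->lines++;
        if (isspace(c))
            in_word = 0;
        else if (!in_word) {
            in_word = 1;
            out->words++;
        }
    }
}

/* "Command" layer: a thin wrapper that only formats the library's result. */
static int
wc_format(const struct wc_counts *c, char *dst, size_t dstlen)
{
    return snprintf(dst, dstlen, "%zu %zu %zu", c->lines, c->words, c->bytes);
}
```

A third-party tool would call wc_count() and read the fields, while the foo command itself would be little more than wc_count() followed by wc_format() and a write to stdout.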


Robert


Re: make pkg_install suite reusable, please

2010-04-11 Thread Robert Watson

On Fri, 9 Apr 2010, Charlie Kester wrote:

It was a watershed moment in my programming career when I realized that the 
bubbles on those DFD charts we used to use for structured design could be 
whole processes and not just functions in a single, monolithic program. 
Suddenly everything the structured design folks were saying about re-use, 
encapsulation, loose coupling, module cohesion, etc. made a lot more sense 
when viewed from the perspective of simple Unix utilities communicating with 
plain text via pipes. We should encourage that approach as a default, and 
only put things into binary libraries when forced to by performance 
considerations.


Per my e-mail, I'm not sure I entirely agree with this view, although for 
certain types of scripting and programming it makes a lot of sense.  What was 
always missing from this model is a structured way to pass complex data 
between components: streams of one-line ASCII strings work fine, but when you 
want to pass data structures, you end up replicating code to generate and 
parse data between components.  Maybe XML is an answer to this, but more 
likely it's not :-).


There's also the issue of plugging and types: if you support complex types, 
why not have type checking on the plugs?  For example, gzcat | tar -xf - only 
works for certain file types: wouldn't it be nice if type information, as well 
as byte streams, were passed around so that you could do static checking, or even 
negotiation?  But it would be nice to get a clear typing error instead of 
garbage.
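Lacking typed plugs, the closest approximation today is for a consumer to sniff a stream's magic number before trusting it. The following C sketch does that for gzip input (leading bytes 0x1f 0x8b) and bzip2 input (leading bytes "BZh"). The function names are hypothetical, and content sniffing is only a heuristic stand-in for the type checking and negotiation described above.

```c
#include <assert.h>
#include <stddef.h>

enum stream_type { STREAM_UNKNOWN, STREAM_GZIP, STREAM_BZIP2 };

/* Guess a stream's type from its leading magic bytes. */
static enum stream_type
sniff_stream(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len >= 2 && buf[0] == 0x1f && buf[1] == 0x8b)
        return STREAM_GZIP;
    if (len >= 3 && buf[0] == 'B' && buf[1] == 'Z' && buf[2] == 'h')
        return STREAM_BZIP2;
    return STREAM_UNKNOWN;
}

/*
 * A "typed plug": report a clear error instead of passing on bytes the
 * consumer cannot interpret.  Returns 0 if the stream matches what the
 * consumer expects, -1 otherwise.
 */
static int
check_plug(const unsigned char *buf, size_t len, enum stream_type expected)
{
    return sniff_stream(buf, len) == expected ? 0 : -1;
}
```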


This is, BTW, what windowing systems do for copy-and-paste: when you copy from 
one program and paste to another, the two programs negotiate an appropriate 
intermediate format: if the target doesn't support rich text, then it needs to 
be generated as plain text by the source, etc.
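The negotiation step can be sketched as a preference-ordered intersection: the source lists the formats it can produce, best first, and the first one the target also accepts wins. The MIME-style format names used with this C sketch are placeholders; real windowing systems use their own format identifiers and protocols.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the first format in the source's preference order that the
 * target also accepts, or NULL if they share no format.
 */
static const char *
negotiate_format(const char *const offered[], size_t n_offered,
    const char *const accepted[], size_t n_accepted)
{
    assert(n_offered == 0 || offered != NULL);
    assert(n_accepted == 0 || accepted != NULL);
    for (size_t i = 0; i < n_offered; i++)
        for (size_t j = 0; j < n_accepted; j++)
            if (strcmp(offered[i], accepted[j]) == 0)
                return offered[i];
    return NULL;
}
```

A rich-text editor accepting "text/rtf" gets rich text; a terminal accepting only "text/plain" gets plain text from the same source.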


Robert


Re: GSoC 2010

2010-04-11 Thread Robert Watson

On Sat, 10 Apr 2010, jax wrote:

I am Igor Druzhinin and I want to participate in GSoC 2010 in the FreeBSD 
project. I want to propose completing fast syscall support for 
FreeBSD on the x86 platform. I already submitted my proposal on the GSoC 
site a few days ago and tried to contact possible mentors from the technical 
contacts list, but they still have not answered me, so I have decided to try 
here. What do you think about this proposal? Is it still relevant or not? If 
so, who could be my mentor?


Hi Igor--

Due to the volume of proposals, it can take some time to get through them all, 
and the last few weeks have been a bit rife with holidays around the world 
which has slowed down answers to some questions/pings.  I see your proposal in 
the set, and I'll point some appropriate potential mentors at it this week.


Thanks for your proposal!

Robert


Re: Another tool for updating /etc -- lua||other script language bikeshed

2010-03-26 Thread Robert Watson

On Fri, 26 Mar 2010, per...@pluto.rain.com wrote:


Robert Watson rwat...@freebsd.org wrote:

... web browsers [are] basically operating systems at this point ...


Isn't this a bit of an exaggeration?  Not too many browsers have to deal 
with process/thread scheduling, or device drivers, or booting, or file 
system issues -- they rely on the OS for that (as does any other 
application).


I think it's more of an analogy than an exaggeration.  The FreeBSD kernel, 
including device drivers and architectures, is around 3.9 million lines of 
code.  Google's Chromium, including WebKit, is around 4.1 million lines of 
code.  Both provide an extensive runtime environment for applications that run 
on top of them, security domains, storage services, and management models.


I'm not arguing that web browsers are a substitute for our current operating 
system layer: they clearly build on it.  However, in terms of their goals in 
providing an execution environment, user interface, etc, they fill a very 
similar niche by being a general-purpose platform for many specific things.


And, to get back to the point I was making: if you toast your Chromium update 
or get configuration management wrong, then your applications (Google Docs, 
GMail, ...) on ChromeOS won't work any more than if you toasted your /lib or 
/etc in FreeBSD.  For example, if the Chromium configuration files change and 
it forgets about web proxies, Chromium won't be able to call home to pick up a 
fix any more than if etcmerge toasts resolv.conf.


Making updates easy is, to a large extent, about avoiding the creation of 
foot-shooting opportunities.  Some of it is about tools (binary updates, 
merge tools, rollback, etc), but most of it is about avoiding scenarios in which a 
previously valid configuration becomes invalid.  And if we look at problems 
FreeBSD has had with updates in the past, a lot come down to precisely that: for 
example, renaming serial port device names (several times in as many years).


Robert


Re: Another tool for updating /etc -- lua||other script language bikeshed

2010-03-25 Thread Robert Watson


On Wed, 24 Mar 2010, Ivan Voras wrote:

Wouldn't it be nice to have a blessed (i.e. present-in-base) script 
language interpreter with a syntax that has evolved since the 1970s? 
(with a side-glance to C, which *has* evolved since the K&R style).

...
As a possible alternative, or at least to learn about others' opinion on the 
subject, I'd like to suggest Lua (http://www.lua.org/).


I think there are lots of good arguments for Lua in the base, but that 
etcmerge is definitely not one of them :-).  An important goal for a tool 
like etcmerge is a minimal dependency footprint, so that you can use it with 
all the existing versions of FreeBSD floating around and upgrade to new 
versions.  None of those existing versions have lua.  Good arguments for lua 
in the base might include:


- Moving to Lua as the scripting language for the boot loader
- Improving scripting capabilities in the installer

etcmerge sounds very exciting, especially for shops that want a more automated 
upgrade path.  It's easy to upgrade web browsers, and they're basically 
operating systems at this point, so it would be nice if we could offer FreeBSD 
upgrades with similar ease.


Quite a bit of our automated configuration update problem comes down to 
configuration file formats and the way diff/patch perform merges.  Consider 
files like inetd.conf, master.passwd, group, etc: they essentially ensure that 
there will be a conflict if you have any local changes and the vendor (us) 
makes an upstream change.  We used to have this problem with /etc/rc and 
/etc/rc.local, but rc.d has basically eliminated the problem by allowing 
boot-time customization through file insertion rather than file changes.


Choices made in the configuration design for launchd, xinetd, and others avoid 
this mistake.  Perhaps we should be considering similar sorts of redesigns, 
focusing on how configuration files could be reworked to maximize automated 
update support.  Where there's a true semantic conflict, an update conflict 
requiring resolution is fine, but where there's no semantic conflict (i.e., we 
add _anotheruser to the base master.passwd), no upgrade conflict should arise.
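To illustrate the difference between a semantic conflict and a merely textual one, the following C sketch performs a three-way merge keyed on record names (for example, user names in master.passwd) rather than on line positions. It assumes each file has already been parsed into key/value records; struct rec, lookup(), same() and merge3() are hypothetical names. Because a vendor addition of _anotheruser and a local addition of a different user touch different keys, they merge cleanly, while a genuine disagreement about the same key is still reported.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One parsed record, e.g. a master.passwd line keyed by user name. */
struct rec {
    const char *key;
    const char *val;
};

/* Value stored under key, or NULL if the file has no such record. */
static const char *
lookup(const struct rec *r, size_t n, const char *key)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(r[i].key, key) == 0)
            return r[i].val;
    return NULL;
}

/* Equality that treats "record absent" (NULL) as a value of its own. */
static int
same(const char *a, const char *b)
{
    if (a == NULL || b == NULL)
        return a == b;
    return strcmp(a, b) == 0;
}

/*
 * Key-wise three-way merge.  base: previous vendor file; vendor: new
 * vendor file; local: the administrator's current file.  Merged records
 * go to out (capacity cap, count in *nout).  Returns the number of keys
 * in genuine conflict (the local value is kept for those), or -1 if out
 * is too small.
 */
static int
merge3(const struct rec *base, size_t nb, const struct rec *vendor, size_t nv,
    const struct rec *local, size_t nl, struct rec *out, size_t cap,
    size_t *nout)
{
    int conflicts = 0;

    assert(nout != NULL);
    *nout = 0;
    /* Visit every local key, then every vendor key not present locally. */
    for (size_t k = 0; k < nl + nv; k++) {
        const char *key = k < nl ? local[k].key : vendor[k - nl].key;

        if (k >= nl && lookup(local, nl, key) != NULL)
            continue;

        const char *b = lookup(base, nb, key);
        const char *v = lookup(vendor, nv, key);
        const char *l = lookup(local, nl, key);
        const char *pick;

        if (same(l, v) || same(v, b))
            pick = l;           /* both agree, or only the admin changed it */
        else if (same(l, b))
            pick = v;           /* only the vendor changed it */
        else {
            pick = l;           /* both changed it differently: conflict */
            conflicts++;
        }
        if (pick == NULL)
            continue;           /* record deleted by the winning side */
        if (*nout == cap)
            return -1;
        out[*nout].key = key;
        out[*nout].val = pick;
        (*nout)++;
    }
    return conflicts;
}
```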


(And we should definitely keep this in mind as we add new configuration files.)

Robert


RE: mac_mls mac_biba mac_lomac patches to fix ptys_equal mib support for new /dev/pts in FreeBSD 8

2010-03-06 Thread Robert Watson


On Tue, 2 Mar 2010, Selphie Keller wrote:


- (2) Could you let me know how your login.conf + user labels are
configured, and show me the output of ps -axZ | grep sshd?

/etc/login.conf label configurations I use

Staff users: label=mls/2(low-high)
Daemons: label=mls/equal(equal-equal)
Insecure users: label=mls/low(low-low)

If you need the exact data from login.conf I can provide it, but it is a bit 
tricky, as I use tc= to inherit from one class to another and override it, and 
the default class is mls/low.


Am I right in thinking that you have security.mac.biba.revocation_enabled 
and/or security.mac.mls.revocation_enabled set?  Revocation being enabled 
might explain why you're seeing this issue, but other users aren't reporting 
problems.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: Automated kernel crash reporting system

2010-03-05 Thread Robert Watson


On Thu, 4 Mar 2010, sean connolly wrote:

Automatic reporting would end up being a mess given that panics can be 
caused by hardware problems. Having an autoreport check if memtest was run 
before it reports, or having it only run with -CURRENT, might be useful.


Hi Sean, Dan, et al:

I'm not sure I agree with this view.  For releases, it's true that many 
reported panics are a result of bad hardware.  However, on active development 
branches, especially -CURRENT, that's not the case.  An automated scheme to 
track bug reports and find common themes could be incredibly valuable in the 
development environment.


And, to be honest, even if a fair number of reports are due to hardware 
failures, these often have common themes themselves, so it would be quite 
educational to be able to reason about panics on a large scale.  Not to 
mention using it to identify potentially flakey hardware that users could then 
be warned about :-).
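One simple way to find such common themes is to bucket reports by a signature built from the panic message plus the first few frames of the backtrace, skipping frames such as panic() that appear in every report. The C sketch below hashes those pieces with 64-bit FNV-1a. The list of skipped frames and the depth are assumptions, the function names are hypothetical, and a real system would still need the scrubbing step discussed below before reports leave the machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FNV64_OFFSET 14695981039346656037ULL
#define FNV64_PRIME  1099511628211ULL

/* Fold the bytes of s into a running 64-bit FNV-1a hash. */
static uint64_t
fnv1a(uint64_t h, const char *s)
{
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= FNV64_PRIME;
    }
    return h;
}

/* Frames present in (nearly) every panic; they say nothing about the cause. */
static int
is_generic_frame(const char *f)
{
    static const char *const generic[] = {
        "db_trace_self_wrapper", "panic", NULL
    };

    for (size_t i = 0; generic[i] != NULL; i++)
        if (strcmp(f, generic[i]) == 0)
            return 1;
    return 0;
}

/*
 * Signature = hash(panic message, first `depth` non-generic frames).
 * Reports with equal signatures are candidates for the same bug.
 */
static uint64_t
panic_signature(const char *msg, const char *const frames[], size_t nframes,
    size_t depth)
{
    uint64_t h;
    size_t used = 0;

    assert(msg != NULL);
    h = fnv1a(FNV64_OFFSET, msg);
    for (size_t i = 0; i < nframes && used < depth; i++) {
        if (is_generic_frame(frames[i]))
            continue;
        h = fnv1a(h, "|");          /* mark the frame boundary */
        h = fnv1a(h, frames[i]);
        used++;
    }
    return h;
}
```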



Collecting crash reports is widespread in industry for both operating systems 
and applications for these reasons.  Certainly, the crashinfo summary gathered 
on recent FreeBSD versions is an excellent starting point for building such a 
system.  If we were to move ahead with it, we'd need to pay very close 
attention to scrubbing potentially sensitive information from panic reports, 
however.


Robert




Sean





From: jhell jh...@dataix.net
To: Dan Naumov dan.nau...@gmail.com
Cc: FreeBSD Hackers freebsd-hackers@freebsd.org; freebsd-questi...@freebsd.org
Sent: Thu, March 4, 2010 8:06:50 AM
Subject: Re: Automated kernel crash reporting system


On Thu, 4 Mar 2010 07:09, dan.naumov@ wrote:

Hello

I noticed the following on the FreeBSD website:
http://www.freebsd.org/projects/ideas/ideas.html#p-autoreport Has
there been any progress/work done on the automated kernel crash
reporting system? The current ways of enabling and gathering the
information required by developers for investigating panics and
similar issues are unintuitive and user-hostile to say the least and
anything to automate the process would be a very welcome addition.


- Sincerely,
Dan Naumov



Hi Dan,

I am assuming that the output of crashinfo_enable=YES is not what you
are talking about, is it? Are you aware of it?

The info contained in crashinfo.txt.N is pretty informative for
developers; maybe you're talking about another way of submitting it?

Regards,

--

 jhell

___
freebsd-questi...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org






Re: mac_mls mac_biba mac_lomac patches to fix ptys_equal mib support for new /dev/pts in FreeBSD 8

2010-03-02 Thread Robert Watson


On Mon, 1 Mar 2010, Estella Mystagic wrote:

Found issues with sysctl mibs security.mac.biba.ptys_equal, 
security.mac.lomac.ptys_equal, security.mac.mls.ptys_equal, not supporting 
new /dev/pts terminal system in FreeBSD 8, proposed fix for issue.


When using a higher security grade/clearance with mac_mls, it prevents 
writing to /dev/pts/5, as it is labeled mls/low and subjects may not write 
to objects with a lower classification level than their own clearance level.


Feb 25 21:42:16 labyrinth sshd[30965]: error: /dev/pts/5: Permission denied

Feb 25 21:42:16 labyrinth sshd[30965]: error: open /dev/tty failed - could not set controlling tty: Permission denied


Hi Selphie:

Thanks for this patch.  I'll go ahead and merge it, but had two questions:

(1) It looks like you didn't need to set any special label on /dev/ptmx
itself?

(2) Could you let me know how your login.conf + user labels are configured,
and show me the output of ps -axZ | grep sshd?

We need to rethink how we deal with ttys anyway, and I'd like to understand 
how the specific case you're running into comes about.


Robert N M Watson
Computer Laboratory
University of Cambridge



Re: mac_mls mac_biba mac_lomac patches to fix ptys_equal mib support for new /dev/pts in FreeBSD 8

2010-03-02 Thread Robert Watson

On Tue, 2 Mar 2010, Robert Watson wrote:


Thanks for this patch.  I'll go ahead and merge it, but had two questions:


Committed as r204581, thanks!

Robert


Re: unix socket: race on close?

2010-02-18 Thread Robert Watson

On Thu, 18 Feb 2010, Mikolaj Golub wrote:

Below is a simple test code with unix sockets: the client does 
connect()/close() in loop and the server -- accept()/close().


Sometimes close() fails with 'Socket is not connected' error:


Hi Mikolaj:

Thanks for this report, and sorry about not spotting your earlier post to 
freebsd-net.  I've been fairly preoccupied the last month and not keeping up 
with the mailing lists.  Could I ask you to file a PR on this, and forward me 
the PR number so I can claim ownership?  This should prevent it from getting 
lost while I catch up.


In short, your evaluation seems reasonable to me -- have you tried tweaking 
soclose() to ignore ENOTCONN from sodisconnect() to confirm this diagnosis 
fixes all the instances you've been seeing?
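The suggested experiment can be modeled in user space. In this C sketch, which is a hypothetical model and not kernel code, the connection state is tested and cleared under a single lock, so exactly one of two racing closers performs the disconnect; the other sees ENOTCONN, which close then treats as success because the peer has already done the work. In the kernel, the corresponding change would sit in soclose() around the sodisconnect() call, and the real socket locking is more involved than this model.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model: both ends of a connected pair share one state and lock. */
struct conn {
    pthread_mutex_t mtx;
    bool connected;
};

/*
 * Models sodisconnect(): tears down both ends at once, or fails with
 * ENOTCONN if the peer already did so.  Test and update happen under
 * the same lock, so only one caller can win.
 */
static int
model_disconnect(struct conn *c)
{
    int error = 0;

    pthread_mutex_lock(&c->mtx);
    if (!c->connected)
        error = ENOTCONN;
    else
        c->connected = false;
    pthread_mutex_unlock(&c->mtx);
    return error;
}

/* Models the proposed soclose() tweak: losing the race is not an error for close. */
static int
model_close(struct conn *c)
{
    int error = model_disconnect(c);

    return error == ENOTCONN ? 0 : error;
}
```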


Robert N M Watson
Computer Laboratory
University of Cambridge




a.out: parent: close error: 57

or

a.out: child: close error: 57

It looks to me like a race in close(). Looking at uipc_socket.c:soclose():

int
soclose(struct socket *so)
{
    int error = 0;

    KASSERT(!(so->so_state & SS_NOFDREF), ("soclose: SS_NOFDREF on enter"));

    CURVNET_SET(so->so_vnet);
    funsetown(&so->so_sigio);
    if (so->so_state & SS_ISCONNECTED) {
        if ((so->so_state & SS_ISDISCONNECTING) == 0) {
            error = sodisconnect(so);
            if (error)
                goto drop;
        }

Isn't this the problem? so_state is checked for SS_ISCONNECTED and
SS_ISDISCONNECTING without locking, and then sodisconnect() is called, which
disconnects both sockets of the connection. So it looks to me as if, when close()
is called on both ends simultaneously, sodisconnect() may be called for both
ends, and ENOTCONN will be returned for one of them. Or have I missed
something?

We have periodically been observing ENOTCONN errors on unix socket close in
our applications, so it is not just curiosity :-) (I posted about our problem
to freebsd-net@ some time ago, but it did not attract any attention:
http://lists.freebsd.org/pipermail/freebsd-net/2009-December/024047.html).

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <strings.h>
#include <string.h>
#include <unistd.h>
#include <sys/select.h>
#include <err.h>

#define UNIXSTR_PATH "/tmp/mytest.socket"
#define USLEEP  100

int main(int argc, char **argv)
{
    int listenfd, connfd, pid;
    struct sockaddr_un servaddr;

    pid = fork();
    if (-1 == pid)
        errx(1, "fork(): %d", errno);

    if (0 != pid) { /* parent */

        if ((listenfd = socket(AF_LOCAL, SOCK_STREAM, 0)) < 0)
            errx(1, "parent: socket error: %d", errno);

        unlink(UNIXSTR_PATH);
        bzero(&servaddr, sizeof(servaddr));
        servaddr.sun_family = AF_LOCAL;
        strcpy(servaddr.sun_path, UNIXSTR_PATH);

        if (bind(listenfd, (struct sockaddr *) &servaddr,
            sizeof(servaddr)) < 0)
            errx(1, "parent: bind error: %d", errno);

        if (listen(listenfd, 1024) < 0)
            errx(1, "parent: listen error: %d", errno);

        for ( ; ; ) {
            if ((connfd = accept(listenfd, (struct sockaddr *) NULL,
                NULL)) < 0)
                errx(1, "parent: accept error: %d", errno);

            // usleep(USLEEP / 2); // (I) uncomment this or (II) below to avoid the race

            if (close(connfd) < 0)
                errx(1, "parent: close error: %d", errno);
        }

    } else { /* child */

        sleep(1); /* give the parent some time to create the socket */

        for ( ; ; ) {

            if ((connfd = socket(AF_LOCAL, SOCK_STREAM, 0)) < 0)
                errx(1, "child: socket error: %d", errno);

            bzero(&servaddr, sizeof(servaddr));
            servaddr.sun_family = AF_LOCAL;
            strcpy(servaddr.sun_path, UNIXSTR_PATH);

            if (connect(connfd, (struct sockaddr *) &servaddr,
                sizeof(servaddr)) < 0)
                errx(1, "child: connect error %d", errno);

            // usleep(USLEEP); // (II) uncomment this or (I) above to avoid the race

            if (close(connfd) != 0)
                errx(1, "child: close error: %d", errno);

            usleep(USLEEP);
        }
    }

    return 0;
}

--
Mikolaj Golub

Re: PFIL: how to get tcp/ip fields from mbuf

2010-02-01 Thread Robert Watson


On Mon, 1 Feb 2010, Lukasz Jaroszewski wrote:

I am wondering about the most elegant and proper way to get IP header fields 
from an mbuf using PFIL.  I have read Murat Balaban's paper on PFIL_HOOKS, where 
I found an example function.  The question is how I can access IP header fields 
in such a function.


The best reference here is probably firewall source code that already exists 
in the tree.  For IP-layer hooks, you'll need to use the m_pullup() call to 
ensure the bytes you want are contiguously stored, and then mtod() to cast the 
mbuf pointer appropriately.  Although I notice ipfw, at least, doesn't call 
m_pullup() for the base header, as it assumes the calling context will already 
have arranged for it to be contiguous:


static int
ipfw_check_hook(void *arg, struct mbuf **m0, struct ifnet *ifp, int dir,
struct inpcb *inp)
{
...
    if (mtod(*m0, struct ip *)->ip_v == 4)
        ret = ip_dn_io_ptr(m0, dir, args);
...
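For readers without a kernel build handy, the m_pullup()/mtod() pattern can be modeled in user space. In this C sketch, struct seg is a hypothetical stand-in for an mbuf chain and pullup() plays the role of m_pullup(): it guarantees that the first n bytes are contiguous, copying them into scratch space if the first segment is too short. The header is then decoded by byte offset rather than through struct ip, to keep the sketch portable. In the kernel itself you would instead call m_pullup(m, sizeof(struct ip)), which frees the chain and returns NULL on failure, and then use mtod(m, struct ip *).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an mbuf chain: a packet split across segments. */
struct seg {
    const unsigned char *data;
    size_t len;
    const struct seg *next;
};

/*
 * Plays the role of m_pullup(): return a pointer to the first n bytes of
 * the chain laid out contiguously.  If the first segment is long enough,
 * no copy is made; otherwise the bytes are gathered into scratch, which
 * must hold at least n bytes.  Returns NULL if the chain is too short.
 */
static const unsigned char *
pullup(const struct seg *s, size_t n, unsigned char *scratch)
{
    size_t have = 0;

    if (s != NULL && s->len >= n)
        return s->data;
    for (; s != NULL && have < n; s = s->next) {
        size_t take = s->len < n - have ? s->len : n - have;

        memcpy(scratch + have, s->data, take);
        have += take;
    }
    return have == n ? scratch : NULL;
}

struct ipv4_fields {
    unsigned version, header_len, total_len, protocol;
    uint32_t src, dst;                  /* host byte order */
};

/* Decode the fixed 20-byte IPv4 header; returns 0, or -1 if unusable. */
static int
ipv4_decode(const struct seg *chain, struct ipv4_fields *f)
{
    unsigned char scratch[20];
    const unsigned char *h = pullup(chain, sizeof(scratch), scratch);

    assert(f != NULL);
    if (h == NULL)
        return -1;                      /* truncated packet */
    f->version = h[0] >> 4;
    f->header_len = (h[0] & 0x0fu) * 4u;
    if (f->version != 4 || f->header_len < 20)
        return -1;
    f->total_len = ((unsigned)h[2] << 8) | h[3];   /* stored big-endian */
    f->protocol = h[9];
    f->src = (uint32_t)h[12] << 24 | (uint32_t)h[13] << 16 |
        (uint32_t)h[14] << 8 | h[15];
    f->dst = (uint32_t)h[16] << 24 | (uint32_t)h[17] << 16 |
        (uint32_t)h[18] << 8 | h[19];
    return 0;
}
```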

Robert



static int
hisar_chkinput(void *arg, struct mbuf **m, struct ifnet *ifp, int dir,
struct inpcb *inp)
{
    in_bytes += (*m)->m_len;
    return 0;
}
Regards
LVJ.

Re: Contribution to FreeBSD network stack

2010-01-31 Thread Robert Watson


On Sun, 31 Jan 2010, shashidhara none wrote:

I am interested in contributing to the FreeBSD network stack. I found some 
projects at http://wiki.freebsd.org/Networking, but could not figure out 
how to start working on them. Please help.


Hi Shashi--

The FreeBSD network stack is a very large piece of code, and there are lots of 
opportunities to get involved helping to measure and improve its behavior, add 
new features, etc.  Could you say a bit more about your background -- have you 
done much kernel programming and/or network stack programming before?


Robert


Re: Strange network issue in freebsd 8

2010-01-23 Thread Robert Watson


On Sat, 23 Jan 2010, Sherin George wrote:

I am occasionally facing a strange network issue on a FreeBSD 
server.


OS: FreeBSD 8.0-RELEASE - amd64

The server loses its network connection once every few days. I logged into the 
console and verified that the network is up. I even restarted the network service 
using the following command.


I'd suggest sending this e-mail to freebsd-net; there have been significant 
link layer changes in 8.0, and it's possible this is a side effect (and bug) 
from that.  The appropriate people will pick it up on that list.


Also, I notice you're running 8.0-RELEASE, rather than the latest patch level 
(which included some important security improvements and stability 
improvements); you may want to slide forward using freebsd-update or a manual 
rebuild.  You will need to reboot to pick up some of the improvements.


Robert N M Watson
Computer Laboratory
University of Cambridge



/etc/rc.d/netif restart

Still, it didn't fix the problem.

I checked /var/log/messages, but I am not getting any clue.

==
Jan 19 12:10:20 myserver kernel: GEOM_MIRROR: Device gm0: rebuilding provider ad0 finished.
Jan 19 20:20:23 myserver nfsd[732]: select failed: Interrupted system call
Jan 19 20:21:07 myserver nfsd[732]: select failed: Interrupted system call
Jan 23 02:14:33 myserver login: ROOT LOGIN (root) ON ttyv0
Jan 23 02:19:51 myserver kernel: ifa_del_loopback_route: deletion failed
Jan 23 02:19:57 myserver kernel: em0: link state changed to DOWN
Jan 23 02:20:02 myserver kernel: em0: link state changed to UP
Jan 23 02:29:58 myserver reboot: rebooted by root
Jan 23 02:29:58 myserver syslogd: exiting on signal 15
Jan 23 02:31:31 myserver syslogd: kernel boot file is /boot/kernel/kernel
Jan 23 02:31:31 myserver kernel: Copyright (c) 1992-2009 The FreeBSD Project.
Jan 23 02:31:31 myserver kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
Jan 23 02:31:31 myserver kernel: The Regents of the University of California. All rights reserved.
Jan 23 02:31:31 myserver kernel: FreeBSD is a registered trademark of The FreeBSD Foundation.
Jan 23 02:31:31 myserver kernel: FreeBSD 8.0-RELEASE #0: Sat Nov 21 15:02:08 UTC 2009
Jan 23 02:31:31 myserver kernel: r...@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC
Jan 23 02:31:31 myserver kernel: Timecounter i8254 frequency 1193182 Hz quality 0
==

Network, TCP stack all were up. It could even ping the gateway, but traceroute
did not get beyond the gateway.

I believe the issue is not related to anything outside the server, since a reboot
always fixes the issue.

I will be grateful for any advice that can help me in troubleshooting this
problem.

--
Best Regards,
Sherin


Re: yarrow random generator

2009-12-25 Thread Robert Watson


On Thu, 24 Dec 2009, RW wrote:

And also according to Schneier it is a good idea to save state of the PRNG 
and restore it on boot to make it more seeded.


In the default configuration, we save some PRNG output every few minutes 
(using cron) to a file in /var so that it can be re-injected into Yarrow on 
the next boot (done by /etc/rc.d/random).
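The save side of that scheme amounts to copying some generator output into a file only root can read. The following C sketch shows the idea; save_seed() is a hypothetical helper, and the output path, the 4096-byte limit, and the use of /dev/urandom are assumptions for illustration, not a description of what FreeBSD's cron job actually runs.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy n bytes of PRNG output into a seed file that a boot script could
 * later write back into the random device.  Returns 0 on success, -1 on error.
 */
static int
save_seed(const char *path, size_t n)
{
    unsigned char buf[4096];
    size_t got = 0;
    int in, out, rc = -1;

    assert(path != NULL);
    if (n > sizeof(buf))
        return -1;
    if ((in = open("/dev/urandom", O_RDONLY)) < 0)
        return -1;
    /* Mode 0600: a saved seed must not be readable by other users. */
    if ((out = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600)) < 0) {
        (void)close(in);
        return -1;
    }
    while (got < n) {
        ssize_t r = read(in, buf + got, n - got);

        if (r <= 0)
            break;                  /* read error or unexpected EOF */
        got += (size_t)r;
    }
    /* Write the whole seed and force it to disk before reporting success. */
    if (got == n && write(out, buf, n) == (ssize_t)n && fsync(out) == 0)
        rc = 0;
    (void)close(in);
    if (close(out) != 0)
        rc = -1;
    return rc;
}
```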


It isn't handled very well though. The files saved by crontab under /var are 
loaded a bit late in the boot sequence - after encrypted swap.


The main entropy file is loaded earlier, but immediately after the output of ps -fauxww, 
sysctl -a, etc. is dumped into the device, saturating its 4K of buffer 
space.


I can't speak to the specific /dev/random design choices here, but I can say 
that there is a more general issue with swap being required to get to the 
point where you reliably have writable file system access.  This is because 
fsck can be quite memory-heavy, and so swap is started before fsck is started. 
It could well be that the arrival of proper UFS journaling support in the 
immediate future allows more aggressive reordering of the boot process so that 
writable file systems can be assumed much earlier.


I'll point Mark Murray at this thread and see if we can get him to opine some 
on the current design choices and any potential changes to address them.  I 
was interested by your observation that the boot-time dumping of bits into 
/dev/random may overflow the buffering -- indeed, it looks like the 
rate-controlling in effect for other entropy sources may not be appropriate 
for /dev/random.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: yarrow random generator

2009-12-24 Thread Robert Watson

On Thu, 24 Dec 2009, Paul Graphov wrote:

And also according to Schneier it is a good idea to save state of the PRNG 
and restore it on boot to make it more seeded.


In the default configuration, we save some PRNG output every few minutes 
(using cron) to a file in /var so that it can be re-injected into Yarrow on 
the next boot (done by /etc/rc.d/random).


Robert N M Watson
Computer Laboratory
University of Cambridge



2009/12/24 Colin Percival cperc...@freebsd.org


Hi all,

Looks like there's a bug here, but it doesn't matter since this is dead
code: .seeded is initialized to 1 and never modified, so we will never
call into random_yarrow_block.

IIRC this is because there are some places which ask for entropy before
yarrow is seeded but don't actually need *cryptographic* entropy.


Thu, Dec 24, 2009 at 03:45:15PM +0300, Paul Graphov wrote:

I've looked at FreeBSD 8.0 cryptographically secure pseudorandom
numbers generator and have a question. It looks like a bug but I'm
not sure.

In file sys/dev/randomdev.c, function random_read:

    if (!random_systat.seeded)
        error = (*random_systat.block)(flag);

It blocks until the PRNG is seeded. For the software random generator
implementation, the block method looks as follows (sys/dev/randomdev_soft.c):

random_yarrow_block(int flag)
{
    int error = 0;

    mtx_lock(&random_reseed_mtx);

    /* Blocking logic */
    while (random_systat.seeded && !error) {
        if (flag & O_NONBLOCK)
            error = EWOULDBLOCK;
        else {
            printf("Entropy device is blocking.\n");
            error = msleep(&random_systat,
                &random_reseed_mtx,
                PUSER | PCATCH, "block", 0);
        }
    }
    mtx_unlock(&random_reseed_mtx);

    return error;
}

It seems that random_systat.seeded in the while condition should be negated.
Otherwise it either never actually blocks, or it blocks erroneously until the next reseed (under very rare
conditions)
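The intended semantics (block while the generator is not yet seeded, or fail with EWOULDBLOCK for non-blocking callers) can be modeled in user space with a mutex and a condition variable. This C sketch is a hypothetical model, not the kernel code: struct prng_state and its functions are invented names, pthread_cond_wait() stands in for msleep(), and it does not model the signal interruption that PCATCH allows.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space model of the PRNG's seeded flag. */
struct prng_state {
    pthread_mutex_t mtx;
    pthread_cond_t seeded_cv;
    bool seeded;
};

/*
 * Wait until the generator is seeded.  Note the negated test: loop
 * while it is NOT seeded.  Non-blocking callers get EWOULDBLOCK instead.
 */
static int
prng_block(struct prng_state *st, int flag)
{
    int error = 0;

    assert(st != NULL);
    pthread_mutex_lock(&st->mtx);
    while (!st->seeded && error == 0) {
        if (flag & O_NONBLOCK)
            error = EWOULDBLOCK;
        else
            pthread_cond_wait(&st->seeded_cv, &st->mtx);
    }
    pthread_mutex_unlock(&st->mtx);
    return error;
}

/* Called once seeding has happened: set the flag and wake all sleepers. */
static void
prng_mark_seeded(struct prng_state *st)
{
    pthread_mutex_lock(&st->mtx);
    st->seeded = true;
    pthread_cond_broadcast(&st->seeded_cv);
    pthread_mutex_unlock(&st->mtx);
}
```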


--
Colin Percival
Security Officer, FreeBSD | freebsd.org | The power to serve
Founder / author, Tarsnap | tarsnap.com | Online backups for the truly
paranoid




Re: 8.0-RELEASE-p1 Panic panic: sbdrop

2009-12-16 Thread Robert Watson

On Tue, 15 Dec 2009, Linda Messerschmidt wrote:


This is a new one on me:


Hi Linda--

Unfortunately, this has historically been a tricky panic to debug, as it's 
associated with a sanity check that picks up kernel memory corruption that may 
have occurred at a much earlier time.  Without a crashdump, we won't get much 
further.  However, let's see what we can do, perhaps trying to find some 
common configuration element with another past report of the same diagnostic. 
FYI, typically this panic occurs as a result of a concurrency bug in a device 
driver, although it can be a symptom of a more general network stack bug.


Could you tell us a bit more about the network configuration -- especially, 
are you using any tunneling software (such as ipsec), netgraph, or other less 
commonly used network features?  Are you using accept filters?


Robert N M Watson
Computer Laboratory
University of Cambridge



panic: sbdrop
cpuid = 3
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
panic() at panic+0x182
sbdrop_internal() at sbdrop_internal+0x323
soisdisconnected() at soisdisconnected+0xbe
tcp_close() at tcp_close+0x45
tcp_do_segment() at tcp_do_segment+0x122f
tcp_input() at tcp_input+0xc92
ip_input() at ip_input+0xac
netisr_dispatch_src() at netisr_dispatch_src+0x7e
ether_demux() at ether_demux+0x15d
ether_input() at ether_input+0x17b
em_rxeof() at em_rxeof+0x287
em_handle_rxtx() at em_handle_rxtx+0x2f
taskqueue_run() at taskqueue_run+0x93
taskqueue_thread_loop() at taskqueue_thread_loop+0x46
fork_exit() at fork_exit+0x118
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xff8000117d30, rbp = 0 ---

This machine runs squid as a reverse proxy, and this has happened a
couple of times now in the past day.

Unfortunately it's a production machine, so we'll have to go back to
7.2.  I can probably leave it as-is for 24 hours or so if anybody
wants me to check something, but it doesn't have a dump or a debug
kernel and I unfortunately can't put it back in production to provoke
another crash.  :(   But I wanted to at least report this before we
did in case it's useful to anyone.


Re: Superpages on amd64 FreeBSD 7.2-STABLE

2009-12-12 Thread Robert Watson


On Thu, 10 Dec 2009, Nate Eldredge wrote:

What about using posix_spawn(3)?  This is implemented in terms of vfork(), 
so you'll gain the same performance advantages, but it avoids many of 
vfork's pitfalls.  Also, since it's a POSIX standard function, you needn't 
worry that it will go away or change its semantics someday.


Just as a note here: while we implement posix_spawn(3) as a library function, 
Mac OS X implements it as a system call.  As a result, they can implement 
certain spawn flags that we can't, including the ability to have the newly 
created process/image be suspended before its first instruction executes. 
This would be very useful when debugging the runtime linker, among other 
things.  On the other hand, it's quite a complex kernel code path...
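For reference, here is a minimal sketch of the posix_spawn(3) usage Nate suggests. It relies only on the POSIX interface, and the helper name spawn_and_wait() is hypothetical. Nothing in it depends on whether posix_spawn is a library routine (FreeBSD) or a system call (Mac OS X):

```c
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: start `path` with `argv` via posix_spawn(3), wait for
 * it, and return its exit status, or -1 if it could not be started or did
 * not exit normally. */
int
spawn_and_wait(const char *path, char *const argv[])
{
    pid_t pid;
    int status;

    /* posix_spawn returns an error number directly; it does not set errno. */
    if (posix_spawn(&pid, path, NULL, NULL, argv, environ) != 0)
        return (-1);

    /* Retry if the wait is interrupted by a signal. */
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return (-1);
    }
    return (WIFEXITED(status) ? WEXITSTATUS(status) : -1);
}
```

Depending on the implementation, a missing binary is reported either as an error from posix_spawn() itself or as a child exit status of 127, so callers should be prepared for both.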


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: UNIX domain sockets on nullfs still broken?

2009-12-10 Thread Robert Watson

On Mon, 30 Nov 2009, xorquew...@googlemail.com wrote:

jackd (audio/jack) creates a directory in /tmp with a UNIX domain socket in 
it. Clients connect to this socket to communicate with the server.


We currently support the sharing of UNIX domain sockets between file system 
layers on neither nullfs nor unionfs.  In the former case, this is a bug, and 
in the latter case, it is a feature.


The specific nature of the bug is that you can't just copy the socket pointer 
between layers in the vnode stack without additional reference counting (and 
other similar state propagation), so if we allowed inter-layer access it would 
lead to use-after-free panics and similar sorts of problems.


This occurs, BTW, because the socket pointer is directly in struct vnode, and 
not queried by a VOP, which could be forwarded by nullfs down a layer.  The 
fixes here aren't easy, so I would anticipate UNIX domain sockets not working 
across nullfs layers for some time to come.  It's not immediately clear to me 
which approach is the best way to fix it, since it likely requires UNIX domain 
sockets to learn about stacked file systems in some form, which will 
significantly complicate an already complicated relationship.


Robert N M Watson
Computer Laboratory
University of Cambridge



$ jackd -d oss -r 44100 -p 128
$ ls -alF /tmp/jack-11001/default
total 4
drwx------  2 xw  wheel  512 30 Nov 14:19 ./
drwx------  3 xw  wheel  512 30 Nov 14:19 ../
prw-r--r--  1 xw  wheel    0 30 Nov 14:19 jack-ack-fifo-54211-0|
prw-r--r--  1 xw  wheel    0 30 Nov 14:19 jack-ack-fifo-54211-1|
prw-r--r--  1 xw  wheel    0 30 Nov 14:19 jack-ack-fifo-54211-2|
srwxr-xr-x  1 xw  wheel    0 30 Nov 14:19 jack_0=
srwxr-xr-x  1 xw  wheel    0 30 Nov 14:19 jack_ack_0=

$ sudo mount_nullfs /tmp/ /jail/k4m/tmp

In the jail:

k4m$ ls -alF /tmp/jack-11001/default
drwx------  2 xw  wheel  512 30 Nov 14:19 ./
drwx------  3 xw  wheel  512 30 Nov 14:19 ../
prw-r--r--  1 xw  wheel    0 30 Nov 14:19 jack-ack-fifo-54211-0|
prw-r--r--  1 xw  wheel    0 30 Nov 14:19 jack-ack-fifo-54211-1|
prw-r--r--  1 xw  wheel    0 30 Nov 14:19 jack-ack-fifo-54211-2|
srwxr-xr-x  1 xw  wheel    0 30 Nov 14:19 jack_0=
srwxr-xr-x  1 xw  wheel    0 30 Nov 14:19 jack_ack_0=

k4m$ ktrace jack_showtime
jack server not running?

k4m$ kdump | grep '/tmp/jack-11001'
76030 initial thread STRU  struct sockaddr { AF_LOCAL, /tmp/jack-11001/default/jack_0 }
76030 initial thread NAMI  /tmp/jack-11001/default/jack_0
76030 initial thread RET   connect -1 errno 61 Connection refused

$ uname -a
FreeBSD viper.internal.network 8.0-RELEASE FreeBSD 8.0-RELEASE #0: Sat Nov 21 
15:02:08 UTC 2009 r...@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  
amd64

xw




Re: UNIX domain sockets on nullfs still broken?

2009-12-10 Thread Robert Watson

On Mon, 30 Nov 2009, Ivan Voras wrote:

What's the sane solution, then, when the only method of communication is 
unix domain sockets?


It is a security problem. I think the long-term solution would be to add a 
sysctl analogous to security.jail.param.securelevel to handle this.


I don't think there is a workaround right now.


I'm not sure I agree with the above, hence my comments about nullfs and 
unionfs.  I see nullfs as intended to provide references (possibly masked to 
read-only) to the same fundamental object, and unionfs to provide independence 
between different consumers that see objects via different file system 
mounts.  As such, I'd expect UNIX domain sockets to work for inter-jail 
communication when using nullfs, and not work when using unionfs.  It's simply 
a property of the implementation of the linkage between VFS and UNIX domain 
sockets that they are currently both broken (in fact, someone recently tried 
to fix it with union mounts, running into the use-after-free bugs I mentioned, 
but also breaking the semantics in my view).


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: UNIX domain sockets on nullfs still broken?

2009-12-10 Thread Robert Watson


On Tue, 1 Dec 2009, Linda Messerschmidt wrote:


On Mon, Nov 30, 2009 at 10:14 AM, Ivan Voras ivo...@freebsd.org wrote:

What's the sane solution, then, when the only method of communication
is unix domain sockets?


It is a security problem. I think the long-term solution would be to add a
sysctl analogous to security.jail.param.securelevel to handle this.


Out of curiosity, why is allowing access to a Unix domain socket in a 
filesystem to which a jail has explicitly been allowed access more or less 
secure than allowing access to a file or a devfs node in a filesystem to 
which a jail has explicitly been allowed access?


(I seem to have caught this thread rather late in the game due to 
travelling) -- Ivan is wrong about nullfs: it's broken due to a bug, not a 
feature, and that bug is not present when using a single file system.  He's 
thinking of unionfs semantics, where if it worked it would be a bug.  :-)


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: UNIX domain sockets on nullfs still broken?

2009-12-10 Thread Robert Watson

On Thu, 10 Dec 2009, Robert Watson wrote:


On Mon, 30 Nov 2009, xorquew...@googlemail.com wrote:

jackd (audio/jack) creates a directory in /tmp with a UNIX domain socket in 
it. Clients connect to this socket to communicate with the server.


We currently support the sharing of UNIX domain sockets between file system
layers on either nullfs or unionfs.  In the former case, this is a bug, and


Should read neither ... nor.

Robert N M Watson
Computer Laboratory
University of Cambridge



in the latter case, it is a feature.







Re: mprotect(2) clears the flag for whole page which causes program crash.

2009-11-18 Thread Robert Watson


On Tue, 17 Nov 2009, Sharad Chandra wrote:

Is it a known bug or is there any workaround? How will a userland process make 
sure that it will not crash, as malloc(3) can allocate wherever it finds free 
memory?


mprotect(2) operates on pages, so you'll want to use mmap(2) and munmap(2) to 
allocate and free pages directly rather than malloc(3), which manages byte 
ranges from pages managed using those same interfaces.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: mmap(2) segfaults with certain len values and MAP_ANON|MAP_FIXED

2009-10-21 Thread Robert Watson


On Wed, 21 Oct 2009, Alexander Best wrote:

this code serves only one purpose: to trigger a segfault. i don't use the 
code for any other purpose. i was under the impression that mmap() should 
either succeed or fail (tertium non datur). mmap's manual doesn't say 
anything about mmap() causing segfaults.


Have you tried ktracing the application?  I think you'll find that the mmap(2) 
system call succeeded, and that the segfault comes from attempting to execute 
an address in libc on return to userspace, because libc is no longer at that 
address (since you removed its mapping).  You can use procstat -v to inspect 
address space use by processes, but as a general rule you don't want to pass 
anything other than an address of 0x0 to mmap(2) unless you're very carefully 
managing the address space of the process.  Many userspace libraries make use 
of that address space, especially the runtime linker, which begins execution 
in userspace when a binary is started.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: global TCP_NODELAY?

2009-10-12 Thread Robert Watson


On Mon, 12 Oct 2009, Ivan Voras wrote:


2009/10/12 Alfred Perlstein alf...@freebsd.org:

* Ivan Voras ivo...@freebsd.org [091012 04:29] wrote:
I'm trying to work around some extreme brain damageness in PHP (yes, it 
sucks) which doesn't have a way to set TCP_NODELAY on stream sockets so 
I'm wondering what are my other options? Is there a way to set TCP_NODELAY 
system-wide?


Ivan, many people write php extensions, maybe you can do that?


While writing PHP extensions isn't hard (I've done it before), I'm not yet 
convinced it's worth the effort in this case - I don't know if TCP_NODELAY 
will help at all. I'll think about it if time permits.


Create a libc wrapper that calls setsockopt(2) whenever socket(2) is called to 
create a TCP socket in php, and inject it using LD_PRELOAD.  This is similar 
to the trick that SOCKS proxy library wrappers use, is easy to hack together, 
and avoids having to modify the kernel.  If it doesn't work, move on, and if 
it does, change php :-).


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: Need some help understanding a jail system call.

2009-08-31 Thread Robert Watson

On Wed, 26 Aug 2009, Dag-Erling Smørgrav wrote:

Here is the link i used to find this code 
http://www.watson.org/~robert/freebsd/jailng/


You realize that this is eight years old, right?  And that the jail 
infrastructure has been extensively modified since then, and is currently 
being rewritten again?


As DES points out, that jail work has been superseded by other, more 
interesting, work in the base tree.  My suggestion would be to read the 
jail(2) man page, both the current 7.x version and the forthcoming 8.x 
version, which has been substantially enhanced, and disregard the jailng page. 
I should more clearly mark it as being of historic interest only.


Robert N M Watson
Computer Laboratory
University of Cambridge

Re: Common interface for sensors/health monitoring

2009-08-22 Thread Robert Watson

On Sat, 22 Aug 2009, Marc Balmer wrote:

I was looking for the same info a time ago .. something that would allow me 
to gather all the info from the same place, but the only thing I came up 
with was the very same discussion about the sensors framework port and 
nothing else.


Any info on any such project will be greatly appreciated


The OpenBSD sensors framework lacks some desirable features, e.g. event 
capabilities like getting an event if a certain threshold is exceeded.  And 
it probably was used for things that it better had not been (yes, I am the 
culprit for one of these (ab)uses...).


I am sure these features could be added if only the code was in the tree to 
hack on...


One of the things I'd particularly like to see is an alignment between 
kernel/user level monitoring frameworks and the SNMP model (especially 
relating to traps).  The SNMP information model (MIBs, agents, traps, etc) has 
its limitations, but having a compatible model at all layers of the system 
will make it easier to store, manipulate, manage, and report this information 
consistently throughout the OS and larger distributed systems.


Robert


Re: Security: information leaks in /proc enable keystroke recovery

2009-08-17 Thread Robert Watson


On Sun, 16 Aug 2009, David Wagner wrote:

I accept your argument that there is no point trying to defend against 
deliberate communication of information between two cooperating processes 
via some sneaky channel; there is no hope of stopping that in 
general-purpose commodity OS's.  If process X and Y are both colluding to 
send information from X to Y, they will succeed, no matter how hard we try. 
We have no hope of closing all such channels, for general-purpose commodity 
OS's (like FreeBSD or Linux).


Moving beyond EIP/ESP, which are clearly a bad idea:

The OS community has not engaged well with the concerns raised by past 
cache-based crypto side channels, in part because it seemed the least complex 
solution was hardening crypto against having key-driven footprints in the 
cache.  However, the problem they represent (avoiding the use of shared 
resources between mutually untrusting processes, and then mitigating effects 
that remain) definitely sounds like the covert channel problem, with very 
similar concerns extensively discussed in the documents I referred to.


In an interactive system, the scheduling of threads in a process reflects the 
completion of various events: user I/O, network I/O, disk I/O, or perhaps the 
expiration of a timer associated with application-internal events (animations, 
statistics, etc).  Monitoring these from another process is intentionally easy 
on commodity OS's -- there are a variety of monitoring statistics, from the 
already mentioned process/thread execution time, to context switch counters, 
wait channels/addresses, lock states, timestamps on special devices, etc, not 
to mention having CPU sink processes that nice themselves appropriately and 
hang around monitoring execution of other processes/threads/the kernel through 
gaps in their own scheduling.  Some of the intentional mechanisms are specific 
to processes, and easy to block by policy.  Others are global, and blocking 
them begins the slide down the slope of making the system and applications a 
lot harder to analyze and debug, something that sites that frequently host 
large numbers of mutually untrusting users (web farms) may not be willing to 
deal with.


Into the area of techniques that annoy people: my guess is that you may also 
be able to measure the context switching of processes on other CPUs through 
very careful timing of events in the kernel on your local CPU.  For example, 
it's a reasonable bet that using the TSC and carefully selected system 
calls/arguments, you can measure cache line behavior associated with kernel 
scheduler/statistic lines that will be pulled to another CPU when a context 
switch takes place.  For example, consider per-CPU run queue locks or context 
switch statistics, which may in edge cases be pulled to another CPU, such as 
when monitoring takes place.  If they are already local to the attacking CPU, 
no context switch has taken place on the other CPU since you last checked; if 
they're non-local, a context switch has taken place.


Following Colin Percival's paper on cache side channels for RSA, there was a 
lot of discussion about how the OS could help mitigate these problems: do you 
provide security critical sections around cryptography which introduce 
temporary but performance-degrading mutual exclusion of caches based on 
knowledge of the CPU topology, for example.  Identifying and offering similar 
trade-offs between performance and security, avoiding excess complexity, and 
in particular, limiting the scope of those performance losses to only critical 
moments will be key if the security community wants to engage the OS community 
here.  Otherwise I suspect these concerns will pass by, unaddressed, again.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: Security: information leaks in /proc enable keystroke recovery

2009-08-16 Thread Robert Watson


On Sun, 16 Aug 2009, Oliver Pinter wrote:

FreeBSD manages its process files more cautiously than Linux12 : it puts all 
register values into the file /proc/pid/regs that can only be read by the 
owner of a process, which blocks the information used by


This is inaccurate, but largely in an academic sense.  The FreeBSD kernel 
computes debug permissions between two processes from a number of factors, 
including:


- Comparison of uids (effective, real, saved)
- Comparison of gids (effective, real, saved)
- Subset check on the additional group set
- Jail information
- Mandatory access control policies

There are some other checks that fit the pattern of credential comparison less 
well:


- We deny debugger activity during execve(2) as a robustness protection
  against possible race conditions.

- We have global policy controls, security.bsd.see_other_gids and
  security.bsd.see_other_uids, which allow administrators to scope not just
  debugging facilities, but also monitoring facilities.

- We have a global policy control, security.bsd.unprivileged_proc_debug, to
  disable debugging facilities for unprivileged users, on the grounds that (a)
  this is sometimes a desirable system policy, and (b) all UNIX systems have
  historically suffered from significant debugger security vulnerabilities and
  this provides an easy work-around to use if that happens in the future.
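As a concrete illustration, the three global policy controls named above could be set persistently in /etc/sysctl.conf. This is a sketch of one restrictive configuration, not a recommendation. It narrows what unprivileged users can see and debug, but on its own it does not close the other channels discussed in this thread.

```conf
# /etc/sysctl.conf
# Hide processes of other users and other groups from unprivileged
# monitoring interfaces (e.g. ps(1)).
security.bsd.see_other_uids=0
security.bsd.see_other_gids=0
# Disable process debugging facilities for unprivileged users.
security.bsd.unprivileged_proc_debug=0
```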

Which is to say: the UNIX file system permissions appearing in procfs are 
purely decorative -- they roughly summarize, but do not implement, the above 
checks.


procfs is also deprecated in FreeBSD, and has not been mounted by default for 
several major releases.  Instead, the system call interfaces ktrace(2), 
ptrace(2), and sysctl(2) provide access to trace data, process debugging, and 
process state (such as address space layout and file descriptor information). 
Some legacy setgid kvm tools exist that use libkvm and /dev/kmem, but we are 
eliminating these as quickly as we can; they may not follow the same policies 
as those implemented in the kernel.


The see_other_uids and see_other_gids policy sysctls narrow the policy on 
inter-process visibility via monitoring controls -- however, 
additional hardening is required to enforce this policy universally.  For 
example, administrators will also need to limit access to logs, accounting 
data, and so on, for this to be fully effective.


Beyond this, and assuming the correct implementation of the above, we're into 
the grounds of classic trusted OS covert channel analysis, against which no 
COTS UNIX OSes I'm aware of are hardened.  This isn't to dismiss these attacks 
as purely hypothetical -- we've seen some rather compelling examples of covert 
channels being exploited in unexpected and remarkably practical ways in the 
last few years (Steven Murdoch's Hot or Not paper takes the cake in that 
regard, I think).


However, the next step up from ensuring that the kernel doesn't reveal 
information on processes belonging to other users involves scheduler 
hardening, consideration of inter-CPU/core/thread cache interactions, and so 
on -- things of which we don't have a good research understanding, let alone 
a production OS one.  There are tools in FreeBSD that can help with some of 
these issues -- for example, you can use login classes to pin different users 
to different CPU threads, cores, or packages.  However, this leaves the 
implementation of policy up to the administrator, rather than simply allowing 
the administrator to specify the policy that mutually untrusting processes 
can't share CPUs with each other in some window.


Robert N M Watson
Computer Laboratory
University of Cambridge

Re: What's changed between 7.1 and 7.2

2009-07-10 Thread Robert Watson

On Fri, 10 Jul 2009, Ivan Voras wrote:


Robert Watson wrote:


On Wed, 8 Jul 2009, Wojciech Puchar wrote:


i'm getting that crap every time i remount filesystem and on startup.
GEOM_LABEL: Label ufsid/48dd2cbe8423dd9e removed.
GEOM_LABEL: Label for provider mirror/sysa is ufsid/48dd2cbe8423dd9e.
GEOM_LABEL: Label ufsid/48dd2cbe8423dd9e removed.
GEOM_LABEL: Label for provider mirror/sysa is ufsid/48dd2cbe8423dd9e.
GEOM_LABEL: Label ufsid/48dd2cbe8423dd9e removed.
GEOM_LABEL: Label for provider mirror/sysa is ufsid/48dd2cbe8423dd9e.
GEOM_LABEL: Label ufsid/48dd2cbe8423dd9e removed.
GEOM_LABEL: Label for provider mirror/sysa is ufsid/48dd2cbe8423dd9e.

i'm using glabel only to avoid mess about what drive is connected to what 
SATA port.


This is effectively debugging output that slipped into the release and 
shouldn't have.  I believe it's now removed in 8.x, I'm not sure it's been 
MFC'd to 7.x yet.  The output can be entirely ignored and does not reflect 
a problem, just state changes resulting from a volume becoming visible to 
geom, and then the label name being removed following mount.


If it's desirable to MFC it now, I'll do it.

(but it will remove the above output for all glabel labels, not only ufs).


I think it would be widely appreciated.  :-)

Robert N M Watson
Computer Laboratory
University of Cambridge


Re: What's changed between 7.1 and 7.2

2009-07-08 Thread Robert Watson


On Wed, 8 Jul 2009, Wojciech Puchar wrote:


i'm getting that crap every time i remount filesystem and on startup.
GEOM_LABEL: Label ufsid/48dd2cbe8423dd9e removed.
GEOM_LABEL: Label for provider mirror/sysa is ufsid/48dd2cbe8423dd9e.
GEOM_LABEL: Label ufsid/48dd2cbe8423dd9e removed.
GEOM_LABEL: Label for provider mirror/sysa is ufsid/48dd2cbe8423dd9e.
GEOM_LABEL: Label ufsid/48dd2cbe8423dd9e removed.
GEOM_LABEL: Label for provider mirror/sysa is ufsid/48dd2cbe8423dd9e.
GEOM_LABEL: Label ufsid/48dd2cbe8423dd9e removed.
GEOM_LABEL: Label for provider mirror/sysa is ufsid/48dd2cbe8423dd9e.

i'm using glabel only to avoid mess about what drive is connected to what 
SATA port.


This is effectively debugging output that slipped into the release and 
shouldn't have.  I believe it's now removed in 8.x, I'm not sure it's been 
MFC'd to 7.x yet.  The output can be entirely ignored and does not reflect a 
problem, just state changes resulting from a volume becoming visible to geom, 
and then the label name being removed following mount.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: large pages (amd64)

2009-07-02 Thread Robert Watson


On Tue, 30 Jun 2009, Mel Flynn wrote:

It looks like sys/kern/kern_proc.c could call mincore around the loop at 
line 1601 (rev 194498), but I know nothing about the vm subsystem to know 
the implications or locking involved. There's still 16 bytes of spare to 
consume, in the kve_vminfo struct though ;)


Yes, to start with, you could replace the call to pmap_extract() with a 
call to pmap_mincore() and export a Boolean to user space that says, "This 
region of the address space contains one or more superpage mappings."


How about attached?


I like the idea -- there are some style nits that need fixing though. 
Assuming Alan is happy with the VM side of things, I can do the cleanup and 
get it in the tree.


Robert N M Watson
Computer Laboratory
University of Cambridge



% sudo procstat -av|grep 'S '
 PID  STARTEND PRT  RES PRES REF SHD  FL TP PATH
1754 0x2890 0x2ae0 rw- 93850   3   0 --S df
2141 0x2f90 0x3080 rw- 37190   1   0 --S df
2146 0x3eec 0x4fac rwx 17450   1   0 --S df

--
Mel





Re: callout(9) and Giant lock

2009-06-28 Thread Robert Watson


On Sun, 28 Jun 2009, Sebastian Huber wrote:

suppose that a certain time event triggered several callout functions. What 
happens if the first of these callout functions blocks on the Giant lock? 
Does this delay all further callout functions until the Giant lock is 
available for the first callout function? What happens if one of the callout 
functions blocks forever? Does this deadlock the system?


Callouts are marked as MPSAFE or non-MPSAFE when registered.  If non-MPSAFE, 
we will acquire Giant automatically for the callout, but I believe we'll also 
try and sort non-MPSAFE callouts behind MPSAFE ones in execution order to 
minimize latency for MPSAFE callouts.  Most callouts acquire locks of some 
sort, and stalling any callout indefinitely will stall the entire callout 
thread indefinitely, which in turn could lead to a variety of odd behaviors 
and potentially (although not necessarily) deadlock.


In general, we do not allow callouts to block, however, in the sense that 
with INVARIANTS enabled we will actually panic if a callout tries to call 
msleep() or related functions.  Likewise, if another thread sleeps while 
holding Giant, it will automatically release it when it sleeps.


Relatively few kernel subsystems use Giant at this point, FYI, and even fewer 
in FreeBSD 8.  One of our goals for FreeBSD 9 is to eliminate all remaining 
references to the Giant lock in the kernel.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: Disk quota for Jail. Discussion.

2009-05-27 Thread Robert Watson


On Tue, 26 May 2009, Menshikov Konstantin wrote:

Yes. But a jail cannot allocate blocks and inodes above its root path. In 
allocation functions such as ffs_alloc, we have access to the process's ucred 
and can check whether the process is in a jail.


Yes, you can check this for jailed process. Think about non-jailed 
processes that can do allocation below the jail root.


Processes outside the jail are not considered. I do not understand what 
relation these processes have to disk quotas for a jail. Please explain in 
more detail.


Historic UFS quotas are actually not interested in processes at all, really, 
except inasmuch as processes are where exception states are exposed.  UFS 
quotas count blocks and inodes owned by users based on the 'uid' and 'gid' 
fields in the inode.  There's no 'jailid' field, so quotas on this model 
can't capture the notion of per-jail quotas.  In fact, quotacheck relies on 
being able to walk the file system looking only at file system data in order 
to establish initial usage accounting.  You can imagine adding one, or 
managing the uid spaces across jails such that all uids are unique, etc, but 
all of these require some amount of rethinking.


Or, some other model of quota.  Frankly, I've always been a fan of the AFS 
model, now accessible locally via ZFS, in which lightweight volumes with quota 
limits are used for individual user home directories, virtual machines, etc. 
This was hard to do in FreeBSD before ZFS because (a) UFS didn't want to 
resize trivially and (b) having lots and lots of mountpoints and file systems 
wasn't something we made administratively easy.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: compiling system binutils as cross tools

2009-05-21 Thread Robert Watson


On Thu, 21 May 2009, xorquew...@googlemail.com wrote:

How do I compile the system binutils (contrib/binutils) as i386 - x86_64 
cross utils? That is, binutils that will run on an i386 host but will 
produce x86_64 binaries?


I'm trying to produce a bootstrapping compiler for a port and need to get 
these working. I've spent a while reading Makefiles but would rather get 
information from someone who actually knows rather than waste *another* week 
on this stuff.


I'd rather not compile the entire world if it can be avoided.


Not really my area, but if you haven't found 'make toolchain' and 'make 
buildenv' then you might want to take a look.  Typically these will be 
combined with TARGET_ARCH=foo, and in your case foo is 'amd64'.  The former 
builds the toolchain required for the target architecture, and the latter 
creates a shell environment with paths appropriately munged and environment 
variables appropriately set to cross-compile using that toolchain.  Normally 
the toolchain step is part of our integrated buildworld/buildkernel/etc. 
process, but you can also use it for other things with buildenv.


Robert N M Watson
Computer Laboratory
University of Cambridge
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: How to invalidate NFS read cache?

2009-05-12 Thread Robert Watson

On Fri, 8 May 2009, Konrad Heuer wrote:

sporadically, I observe a strange but serious problem in our large NFS 
environment. NFS servers are Linux and OS X with StorNext/Xsan cluster 
filesystems, NFS clients Linux and FreeBSD.


NFS client A changes a file, but NFS client B (running on FreeBSD) still 
sees the old version. On the NFS server itself, everything looks fine.


AFAIK, the FreeBSD kernel invalidates the NFS read cache if the file 
modification time on the server has changed, which should happen here but 
doesn't. Can I force FreeBSD (e.g. by sysctl setting) to read file buffers 
again unconditionally after vfs.nfs.access_cache_timeout seconds have passed?


Hi Konrad:

Normally, NFS clients implement open-to-close consistency, which dictates that 
when a close() occurs on client A, all pending writes on the file should be 
issued to the server before close() returns, so that a signal to client B to 
open() the file can validate its cache before open() returns.
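A minimal userspace sketch of the access pattern that close-to-open consistency is meant to support, assuming ordinary POSIX open()/write()/close()/read(). The helper names `publish()` and `fetch()` are hypothetical. On a local file system the ordering is trivially satisfied; the point is that on NFS, client B's guarantee depends on A's close() completing before B's fresh open().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Client A: write the new contents, then close().  Under close-to-open
 * consistency, close() should not return until dirty data has been
 * issued to the server, so its return value is worth checking. */
int
publish(const char *path, const char *data)
{
	size_t len = strlen(data);
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0)
		return (-1);
	if (write(fd, data, len) != (ssize_t)len) {
		close(fd);
		return (-1);
	}
	return (close(fd));
}

/* Client B: open() afresh after A's close(), then read.  The open() is
 * where the client is expected to revalidate its cache with the server. */
ssize_t
fetch(const char *path, char *buf, size_t buflen)
{
	ssize_t n;
	int fd;

	assert(buf != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return (-1);
	n = read(fd, buf, buflen);
	close(fd);
	return (n);
}
```

Reading through a descriptor that B opened before A's close() falls outside this guarantee.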


This raises the following question: is client A closing the file, and is 
client B then opening it?


If not: relying on writes being visible on client B before the close() on 
A and a fresh open() on B is not guaranteed to work, although we can discuss 
ways to improve behavior with respect to expectation.  Try modifying your 
application and see if it gets the desired behavior, and then we can discuss 
ways to improve what you're seeing.


If you are: this is probably a bug in our caching and/or issuing of NFS RPCs. 
We cache both attribute and access data -- perhaps there is an open() path 
where we issue neither RPC?  In the case of open, we likely should test for a 
valid access cache entry, and if there is one, issue an attribute read, and 
otherwise just issue an access check which will piggyback fresh attribute data 
on the reply.  Perhaps there is a bug here somewhere.


A few other misc questions:

- Could you confirm that you're using NFSv3 on all clients?  Are there any
  special mount options in use?
- What version of FreeBSD are you running?

In FreeBSD 8.x, we now have DTrace probes for all of the above events -- VOPs, 
attribute cache hit/miss/load/flush, access cache hit/miss/load/flush, RPCs, 
etc, which we can use to debug the problem.  I haven't yet MFC'd these to 7.x, 
but if you're able to run a very fresh 7-STABLE, I can probably produce a 
patch to add it for you in a few days.


Robert N M Watson
Computer Laboratory
University of Cambridge
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: How to invalidate NFS read cache?

2009-05-12 Thread Robert Watson


On Tue, 12 May 2009, Robert Watson wrote:

Normally, NFS clients implement open-to-close consistency, which dictates 
that when a close() occurs on client A, all pending writes on the file 
should be issued to the server before close() returns, so that a signal to 
client B to open() the file can validate its cache before open() returns.


This should, of course, read close-to-open consistency -- I plead jetlag 
after an overnight flight back from Boston to the UK :-)


Robert N M Watson
Computer Laboratory
University of Cambridge



___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: NetBSD 5.0 statistics

2009-04-30 Thread Robert Watson


On Thu, 30 Apr 2009, Oliver Pinter wrote:


Is FreeBSD's FS management really that slow?

http://www.netbsd.org/~ad/50/img15.html

Or is the difference between the two CPU schedulers that big?


Also, there's a known and serious performance regression in CAM relating to 
tagged queueing and the generic disk sort routine, introduced in 7.1, which 
will be fixed in 7.2.  I can't speak more generally to the benchmarks -- we'll 
need to run them in a controlled environment and see if we can reproduce the 
results.


Robert N M Watson
Computer Laboratory
University of Cambridge
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: FreeBSD memguard + spinlocks

2009-04-11 Thread Robert Watson

On Sat, 11 Apr 2009, Andrew Brampton wrote:

I'm having a problem with memguard(9) on FreeBSD 7.1 but before I ask about 
that I just need to check my facts about malloc.


When in interrupt context, malloc must be called with M_NOWAIT; this is 
because I can't sleep inside an interrupt. Now when I hold a spinlock 
(MTX_SPIN) I am also not allowed to sleep or obtain a sleepable mutex (such 
as MTX_DEF). So I assume that while holding a spin lock any mallocs I do must 
have the M_NOWAIT flag? This was not clear from the manual pages, but at 
least makes sense to me.


So my problem with memguard stems from the fact that I am locking a 
spinlock, and then I'm calling malloc with M_NOWAIT. But inside 
memguard_alloc a MTX_DEF is acquired causing WITNESS to panic.


So I think memguard is fundamentally flawed and should be using MTX_SPIN 
instead of MTX_DEF; otherwise it can't be called from inside an interrupt or 
when a spin lock is held. But maybe I'm missing something?


Also, on a related note, I see that MTX_SPIN disables interrupts, making it a 
rather heavy spinlock. Is there a lighter spin lock that literally just 
spins? I read that MTX_DEF mutexes are far quicker to acquire, but surely a 
light spinlock would be easier to acquire than sleeping?


I think for the moment I will fix my code by not using MTX_SPIN (since the 
code is not in an interrupt); however, I think memguard should change its 
lock.


Your understanding is mostly right.  The missing bit is this: there are two 
kinds of interrupt contexts -- fast/filter interrupt handlers, which borrow 
the stack and execution context of the kernel thread they preempt, and 
interrupt threads, which get their own complete thread context.


Fast interrupt handlers are allowed only to acquire spin locks, so as to avoid 
deadlock because of the borrowed context.  This means they can't perform any 
sort of sleep, or acquire any locks that might sleep, since the thread they've 
preempted may hold conflicting locks, or be the one that would have woken up 
the sleep that the handler performed.  Almost no code will run in fast 
handlers -- perhaps checking some device registers, doing work on a lockless 
or spinlock-protected queue, and waking up a worker thread.


This is why, BTW, spin locks disable interrupts: they need to control 
preemption by other interrupt handlers to avoid deadlock, but they are not 
intended for use except when either in the scheduler, in a few related IPI 
contexts, or when synchronizing between normal kernel code and a fast handler.


Full interrupt thread contexts are permitted to perform short lock sleeps, 
such as those performed when contending default mutexes, rwlocks, and rmlocks. 
They are permitted to invoke kernel services such as malloc(9), UMA(9), the 
network stack, etc, as long as they use M_NOWAIT and don't invoke msleep(9) or 
similar unbounded sleeps -- again to avoid the possibility of deadlocks, since 
you don't want an interrupt thread sleeping waiting for an event that only it 
can satisfy.
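One way to summarize these rules is as a small C model. The enums and functions are hypothetical names, not kernel APIs. The model assumes, as the memguard report in this thread illustrates for one allocation path, that a malloc(9) call may itself take default mutexes internally, so it is treated as unsafe in a filter handler or with a spin lock held even with M_NOWAIT. It also encodes the standard rule that no unbounded sleep (M_WAITOK) may happen while a default mutex is held.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the execution contexts described above. */
enum exec_ctx {
	CTX_FILTER,	/* fast/filter handler: borrows the preempted thread */
	CTX_ITHREAD,	/* interrupt thread: its own full thread context */
	CTX_THREAD	/* ordinary kernel thread */
};

enum held_locks {
	HELD_NONE,
	HELD_DEFAULT,	/* one or more default (MTX_DEF) mutexes, no spin locks */
	HELD_SPIN	/* one or more MTX_SPIN mutexes */
};

enum malloc_policy {
	MALLOC_FORBIDDEN,	/* don't call malloc(9) here at all */
	MALLOC_NOWAIT_ONLY,	/* only with M_NOWAIT */
	MALLOC_WAITOK_OK	/* an unbounded sleep (M_WAITOK) is acceptable */
};

/* Short lock sleeps (contending a default mutex, rwlock, rmlock) are fine
 * anywhere except a filter handler or while a spin lock is held. */
bool
may_take_default_mutex(enum exec_ctx ctx, enum held_locks held)
{
	assert(ctx == CTX_FILTER || ctx == CTX_ITHREAD || ctx == CTX_THREAD);
	return (ctx != CTX_FILTER && held != HELD_SPIN);
}

enum malloc_policy
malloc_policy_for(enum exec_ctx ctx, enum held_locks held)
{
	if (!may_take_default_mutex(ctx, held))
		return (MALLOC_FORBIDDEN);
	/* ithreads, and anyone holding a default mutex: no unbounded sleeps */
	if (ctx == CTX_ITHREAD || held == HELD_DEFAULT)
		return (MALLOC_NOWAIT_ONLY);
	return (MALLOC_WAITOK_OK);
}
```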


So the first question, really, is whether you are, or mean to be, using a 
fast/filter interrupt handler.  Device drivers will never call memory 
allocation, free, etc., from there, but will defer it to an ithread using the 
filter mechanism in 8.x, or to a task queue or other worker in 7.x and 
earlier.  If you're using a regular INTR_MPSAFE ithread, you should be able to 
use only default mutexes (a single atomic operation if uncontended) without 
disabling interrupts, etc.


Robert N M Watson
Computer Laboratory
University of Cambridge
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: FreeBSD memguard + spinlocks

2009-04-11 Thread Robert Watson


On Sat, 11 Apr 2009, Andrew Brampton wrote:

Thanks very much for your detailed reply. I'm slowly understanding how 
everything in FreeBSD fits together, and I appreciate your help.


I've been given a project to take over, and all of the design decisions were 
made before I started working on it, thus I'm playing catch up. One of the 
decisions was to implement their own version of a spin lock, which literally 
looks something like this:


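lock_aquire() {
 critical_enter();      /* no preemption or CPU migration from here on */
 while (lockHeld) {}    /* spin while another thread holds the lock */
 lockHeld++;            /* test and increment are separate, non-atomic steps */
}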

This was actually the code tripping up MemGuard, as it is inside a critical 
section, which MemGuard is unable to sleep within. This is all running 
inside a kthread_create thread (I'm unsure of the proper name of this type 
of thread).


Anyway, that is why I also asked about a lighter-weight spin lock (perhaps 
similar to this one). I'm tempted to replace this custom spinlock with the 
standard MTX_DEF, however I'm unsure of its impact. The custom spin lock 
seems quick and light to acquire, and it does not concern me that an 
interrupt can potentially interrupt the code.


On a related note, if you change the lock in memguard to a MTX_SPIN, it 
panics the kernel during boot. So that is not an option :) I was only using 
memguard because I suspected memory being used after it was freed. However, 
I think I will either change my locks to MTX_DEF or live without memguard.


I realise I've not really asked any questions, but I would be grateful for 
any insights anyone may have.

Andrew


My advice, unless you're definitely executing code in fast interrupt contexts, 
is to simply use the FreeBSD default mutex primitive instead of either a 
custom-built spinlock or a FreeBSD MTX_SPIN mutex.  The default mutex is 
adaptive, and will spin when contending the lock unless the thread holding the 
lock isn't executing, in which case it will fall back on a context switch. 
Our mutexes also make correct use of memory barriers, which the above example 
code doesn't appear to, and so will work on systems that have weaker memory 
ordering properties.  Using the default mutex scheme also allows you to take 
advantage of WITNESS, our lock order verifier, which proves to be a really 
useful tool when a lot of locks are in flight.


The critical sections you're using above may not have the effect you intend: 
they prevent preemption by another thread, and they prevent migration to 
another CPU, but they don't prevent fast interrupt handlers from executing. 
Any synchronization with a fast interrupt handler needs to be done either 
using spinlocks, or other atomic operations.


Robert N M Watson
Computer Laboratory
University of Cambridge
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: GSoC: Semantic File System

2009-04-07 Thread Robert Watson

On Tue, 7 Apr 2009, Stephan Lichtenauer wrote:


On 02.04.2009 at 19:26, Robert Watson wrote:

In the BeOS model, or my reinterpretation based on something I read a long 
time ago and then presumably had dreams about, the split is a bit 
different: the file system maintains indexes of extended attributes, which 
are written by applications in order to expose searchable material.  For 
example, a mail application might write out each message as a file, and 
attach a series of extended attributes, such as subject line, date, author, 
etc.  These extended attributes are then indexed automatically by the file 
system in order to allow queries to be evaluated.  I don't recall how 
queries and results are expressed, and in particular, whether the queries 
are processed by the file system (possibly exposed via special APIs or the 
name space) or userspace (accessing special files maintained by the kernel 
that are the indexes).


It's also worth observing that one of the authors of BFS was Dominic 
Giampaolo, who now works on Apple's HFS+, and implemented fsevents there as 
part of their Spotlight project.


Maybe you also might be interested that there is a PDF document (formerly 
book) from Dominic available describing the BeOS file system in great 
detail: http://www.haiku-os.org/legacy-docs/practical-file-system-design.pdf


Additionally, there seems to be a GSoC project to create something like 
Spotlight for Haiku, the open source BeOS clone. You could browse through 
the haiku-development mailing list archives at 
http://www.freelists.org/archive/haiku-development; the thread where this 
has been discussed is titled "Need Some GSoC Advice", with the first mail 
from 21 March.


Actually, I have an original copy of the book on the bookshelf behind me. :-)

Robert N M Watson
Computer Laboratory
University of Cambridge
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: GSoC: Semantic File System

2009-04-02 Thread Robert Watson

On Thu, 2 Apr 2009, Gabriele Modena wrote:


On Sun, Mar 22, 2009 at 6:52 PM, Robert Watson rwat...@freebsd.org wrote:
We are certainly not uninterested in projects along these lines, but I 
think the trick will be creating a convincing proposal that argues that (a) 
you can do the work in a summer, (b) there's a compelling usage case for 
including the results in FreeBSD, and (c) find a mentor who can supervise 
you in this project.


Thanks, I will keep it in mind when writing the proposal. How do you suggest 
I proceed in finding a mentor?


By the way, this is a project that I'm very probably going to carry on even 
without GSoC support (even though that would be very useful).


Well, I think the first step is to write the proposal, and we can see about 
shopping it around for a potential mentor.


What sort of semantic file system do you have in mind?  How would you feel 
about a middle-ground project along the lines of Mac OS X Spotlight or 
similar efficient userspace indexing of a file system based on feedback 
from the file system about what has changed, or something BeOS-like, in 
which indexing takes place for extended attributes rather than for 
contents?


At the moment I am also considering a userspace approach similar to 
Spotlight/Beagle, but I don't know how I could propose this as a FreeBSD 
GSoC project.


I think that would make a fine GSoC proposal.  Keep in mind that one of the 
premises of Spotlight is the fsevents kernel feature, and fseventsd, which 
allow Spotlight to subscribe to changes in trees and kick off reindexing as 
required.  Porting the fsevents API to FreeBSD is fairly straightforward, 
with one exception: HFS+ offers a much more reliable notion of vnode-path 
mapping, but it would be interesting to see how well our current vnode-path 
mapping mechanisms would suffice in practice (since a lot of the edge cases 
that don't work well with our mapping system are exactly that -- edge cases).


Between kernel and userspace parts there's quite a bit to do, but one 
possibility would be to borrow the parts we need from Mac OS X, etc.  For 
example: do a literal port of the fsevents mechanism from XNU, provide our own 
implementation that provides a similar API, or provide a new mechanism that 
meets fseventsd's semantic requirements for monitoring.


What I have in mind at the moment would be an indexing based on contents 
rather than extended fs attributes. I did not know about the BeOS semantics 
capabilities, I will surely have a look at that.


I'm probably blending reality with imagination here, but my vague recollection 
is that the model was a slightly different blend of user vs. application 
involvement in indexing.  For systems like Spotlight, there are no 
kernel-maintained indexes; the kernel simply provides a change list so that 
the userspace indexer can go through and apply file type-specific indexers to 
all files that have changed.  So, for example, there are indexers for Word 
files, plain text files, PDFs, and so on.
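The Spotlight-style split can be sketched as a userspace dispatcher: the kernel hands over only a changed path, and userspace picks a type-specific indexer. The suffix table, `reindex_changed()` and the indexer stubs are hypothetical; a real indexer would determine file type more carefully than by name suffix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical indexer signature; returns a tag naming the indexer. */
typedef const char *(*indexer_fn)(const char *path);

static const char *index_text(const char *path) { (void)path; return "text"; }
static const char *index_pdf(const char *path) { (void)path; return "pdf"; }

static const struct {
	const char	*suffix;
	indexer_fn	 fn;
} indexers[] = {
	{ ".txt", index_text },
	{ ".pdf", index_pdf },
};

/* Called for each path in the kernel-supplied change list.  Returns the
 * tag of the indexer that handled it, or NULL if no indexer claims it. */
const char *
reindex_changed(const char *path)
{
	size_t plen;

	assert(path != NULL);
	plen = strlen(path);
	for (size_t i = 0; i < sizeof(indexers) / sizeof(indexers[0]); i++) {
		size_t slen = strlen(indexers[i].suffix);

		if (plen >= slen &&
		    strcmp(path + plen - slen, indexers[i].suffix) == 0)
			return (indexers[i].fn(path));
	}
	return (NULL);
}
```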


In the BeOS model, or my reinterpretation based on something I read a long 
time ago and then presumably had dreams about, the split is a bit different: 
the file system maintains indexes of extended attributes, which are written by 
applications in order to expose searchable material.  For example, a mail 
application might write out each message as a file, and attach a series of 
extended attributes, such as subject line, date, author, etc.  These extended 
attributes are then indexed automatically by the file system in order to allow 
queries to be evaluated.  I don't recall how queries and results are 
expressed, and in particular, whether the queries are processed by the file 
system (possibly exposed via special APIs or the name space) or userspace 
(accessing special files maintained by the kernel that are the indexes).
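A toy C sketch of the BeOS-style alternative, in which the file system updates an index whenever an application writes an indexed extended attribute, and queries run against that index. The structures and functions are hypothetical, kept in memory with a fixed capacity, and store caller-owned string pointers rather than copies; a real file system would keep such indexes on disk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_INDEX_MAX 16	/* hypothetical capacity */

struct attr_entry {
	const char *file;	/* caller-owned strings, not copied */
	const char *attr;
	const char *value;
};

struct attr_index {
	struct attr_entry e[TOY_INDEX_MAX];
	size_t n;
};

/* Hook on the (hypothetical) extended-attribute write path: add or update
 * the entry for (file, attr).  Returns 0, or -1 if the index is full. */
int
index_on_setattr(struct attr_index *ix, const char *file, const char *attr,
    const char *value)
{
	assert(ix != NULL);
	for (size_t i = 0; i < ix->n; i++) {
		if (strcmp(ix->e[i].file, file) == 0 &&
		    strcmp(ix->e[i].attr, attr) == 0) {
			ix->e[i].value = value;	/* attribute rewritten */
			return (0);
		}
	}
	if (ix->n == TOY_INDEX_MAX)
		return (-1);
	ix->e[ix->n].file = file;
	ix->e[ix->n].attr = attr;
	ix->e[ix->n].value = value;
	ix->n++;
	return (0);
}

/* Query: count files whose attr equals value, storing up to max names. */
size_t
index_query(const struct attr_index *ix, const char *attr, const char *value,
    const char **out, size_t max)
{
	size_t found = 0;

	for (size_t i = 0; i < ix->n; i++) {
		if (strcmp(ix->e[i].attr, attr) != 0 ||
		    strcmp(ix->e[i].value, value) != 0)
			continue;
		if (found < max)
			out[found] = ix->e[i].file;
		found++;
	}
	return (found);
}
```

In the mail example above, a mail client would call the equivalent of `index_on_setattr()` once per message for "author", "subject" and so on, and a search would be a query against the index rather than a crawl of file contents.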


It's also worth observing that one of the authors of BFS was Dominic 
Giampaolo, who now works on Apple's HFS+, and implemented fsevents there as 
part of their Spotlight project.


Robert N M Watson
Computer Laboratory
University of Cambridge
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: Improving the kernel/i386 timecounter performance (GSoC proposal)

2009-03-27 Thread Robert Watson


On Fri, 27 Mar 2009, Scott Long wrote:

I've been talking about this for years.  All I need is help with the VM 
magic to create the page on fork.  I also want two pages, one global for 
gettimeofday (and any other global data we can think of) and one per-process 
for static data like getpid/getgid.


FWIW, there are some variations in schemes across OSes -- one extreme is the 
Linux approach, which actually exports a mini shared library in ELF format on 
the shared page, providing implementations of various services (such as 
entering system calls), time stuff, etc.  Less extreme are the shared pages 
offered on Mac OS X, etc.


Robert N M Watson
Computer Laboratory
University of Cambridge



Scott


Sergey Babkin wrote:

   (Sorry for the top quoting). Probably the best implementation of
   gettimeofday() is to have a page in the kernel mapped read-only to all
   the user processes. Put the kernel's idea of time into this page. Then
   getting the time becomes a simple read (OK, two reads, to make sure
   that no update has happened in between).
   The TSC can then be used to add the precision between the ticks of the
   kernel timer: i.e. remember the value of TSC when the last tick
   happened, and the highest rate at which TSC may be ticking at this CPU,
   and export them in the same page. This would guarantee that the time is
   not moving back.
   However there are more issues with TSC. TSC is guaranteed to have the
   same value on all the processors that share the same system bus. But if
   the machine is built of multiple buses with bridges between them, all
   bets are off. Each bus may be stopped, restarted and clocked separately.
   There is no way to tell on which CPU the process is currently running,
   and it may be rescheduled to a different CPU right before or after the
   RDTSC instruction.
   -SB
   Mar 26, 2009 06:55:04 PM, ...@phk.freebsd.dk wrote:
In message 17560ccf0903260551v1f5cba9eu87727c0bae7b...@mail.gmail.com,
Prashant Vaibhav writes:

 The gettimeofday() function's implementation will then be changed to
 read the timestamp counter (TSC) from the processor, and use the reading
 in conjunction with the timing info exported by the kernel to calculate
 and return the time info in proper format.
 I take it as read that you know that there are other relevant functions
 than gettimeofday() and that these must provide a monotonic timescale
 when queried interleaved?
 Be aware that the TSC may not be, and may not stay, synchronized across
 multiple cores.
 Furthermore, the TSC is not constant frequency and in particular not of
 known frequency at all times.
 There are a lot of nasty cases to check, and a nasty interpolation
 required, which, in my tests some years back, totally negated any
 speedup from using the TSC in the first place.
 At the very minimum, you will have to add a quirk table where known good
 {CPU+MOBO+BIOS} combinations can be entered, as we find them.
 This will also pave way for optionally making the FreeBSD kernel
 tickless,
 Rubbish. Timecounters are not even closely associated with the tick or
 ticklessness of the kernel. [1]
  - The TSC frequency might change on certain processors with non-constant
  TSC rate (because of SpeedStep, dynamic freq scaling etc.). The only way
  to combat this is that the kernel be notified every time the processor
  frequency changes. Every cpu frequency driver will need to be updated to
  notify the kernel before and after a cpu freq change.
 That is not good enough: the bios may autonomously change the cpu speed
 and the skew from not knowing exactly _when_ and _how_ the cpu clock
 changed is a significant number of microseconds, plenty of time to make
 strange things happen.
 You will want to study carefully Dave Mills' work to tame the Alpha
 chips' wandering SAW clocks.
 Poul-Henning
 [1] In my mind, reworking the callout system in the kernel would be a
 much better, more needed and much more worthwhile project.
 --
 Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
 ...@freebsd.org | TCP/IP since RFC 956
 FreeBSD committer | BSD since 4.3-tahoe
 Never attribute to malice what can adequately be explained by
 incompetence.

freebsd-curr...@freebsd.org mailing list

Re: Improving the kernel/i386 timecounter performance (GSoC proposal)

2009-03-27 Thread Robert Watson


On Fri, 27 Mar 2009, Scott Long wrote:

I've been talking about this for years.  All I need is help with the VM 
magic to create the page on fork.  I also want two pages, one global for 
gettimeofday (and any other global data we can think of) and one per-process 
for static data like getpid/getgid.


One note though -- the time to do the global page is at execve()-time.

Robert N M Watson
Computer Laboratory
University of Cambridge



Scott



freebsd-curr...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org



Re: Re: Improving the kernel/i386 timecounter performance (GSoC proposal)

2009-03-27 Thread Robert Watson


On Fri, 27 Mar 2009, Sergey Babkin wrote:


  Would not a normal mmap be duplicated on fork? I'd do it as a small
  pseudo-driver that allows one to mmap this page. Then libc would open
  this pseudo-device and mmap it, either in the on-load handler or on the
  first call of gettimeofday(). I think that should be it, no special
  magic necessary.
  The per-process page is more difficult and would require the magic :-)
  Or maybe no magic as such: just mmap the file from the /proc file
  system. Then on fork, in the child, unmap this page, open the new file,
  and map it. vfork will still be tricky :-)
  It also means wasting an extra page per process.


Part of the point of mapping in the page at execve()-time, or fork()-time for 
per-process pages (which I'm not entirely convinced we need yet) is to avoid 
the cost of an extra device open, mmap, etc, for every execve(), which can be 
quite expensive.  I stuck a prototype page mapped from a special device 
exporting time information here a year or two ago:


  http://www.watson.org/~robert/freebsd/20080203-evilmem.diff
  http://www.watson.org/~robert/freebsd/evilmem_test.c

This doesn't do TSC-based adjustment, just drops a timestamp in from the 
callout wheel, but was intended to allow Kris to do a bit of comparative 
benchmarking and decide if it might be a viable approach to invest further 
work in.  Obviously, the above code should never, ever, get near a production 
kernel, since it was a 2-hour hack for experimental purposes.


I think the right way forward is to prototype: map the page in at 
execve()-time in the kernel and pass the address to rtld via ELF auxiliary 
arguments, and have rtld link it (via some means or another), exposing symbols 
or code or whatever, to libc.  If someone wants to make it a dynamic shared 
object in ELF-speak, then I'm all for that as it would minimize the work rtld 
had to do.
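For comparison, this is roughly what an execve()-time mapping advertised through the auxiliary vector looks like on Linux, where the kernel publishes its vDSO (a small ELF shared object) in the `AT_SYSINFO_EHDR` auxv entry. This sketch is Linux-specific, uses `getauxval(3)` from `<sys/auxv.h>`, and only checks the ELF magic; it is not a FreeBSD interface, and `vdso_base()`/`vdso_looks_like_elf()` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>

/* Linux only: address of the vDSO the kernel mapped at execve() time and
 * advertised via the ELF auxiliary vector, or NULL if there is none
 * (getauxval() returns 0 when the entry is absent). */
const unsigned char *
vdso_base(void)
{
	unsigned long addr = getauxval(AT_SYSINFO_EHDR);

	return ((const unsigned char *)(uintptr_t)addr);
}

/* The vDSO is itself an ELF shared object, so it begins with the ELF
 * magic bytes 0x7f 'E' 'L' 'F'. */
int
vdso_looks_like_elf(const unsigned char *p)
{
	assert(p != NULL);
	return (memcmp(p, "\177ELF", 4) == 0);
}
```

rtld (ld.so on Linux) links against this image much as it would against an ordinary shared library, which is the kind of minimal rtld work described above.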


I guess interesting questions are whether (a) it would be desirable to have 
per-page, per-cpu, or per-thread mappings.  If there are non-synchronized 
TSCs, then there might be some interesting advantages to a per-CPU page.
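If TSCs are not synchronized, each CPU needs its own conversion data, and the reader must know which CPU it sampled the TSC on. A minimal sketch with a hypothetical per-CPU table and a mult/shift fixed-point conversion (an assumed scheme chosen for brevity; FreeBSD's timecounter code uses its own scaling). `TOY_NCPU`, `struct tsc_conv` and `tsc_to_ns()` are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NCPU 4	/* hypothetical CPU count */

/* Hypothetical per-CPU data the kernel would export and refresh:
 *   ns = base_ns + (((tsc - base_tsc) * mult) >> shift)          */
struct tsc_conv {
	uint64_t base_tsc;	/* this CPU's TSC at the last kernel update */
	uint64_t base_ns;	/* time in ns at that update */
	uint32_t mult;		/* fixed-point ns-per-tick multiplier */
	uint32_t shift;
};

/* The caller must pass the CPU the TSC was actually read on.  The kernel
 * must refresh base_* often enough that delta * mult fits in 64 bits. */
uint64_t
tsc_to_ns(const struct tsc_conv conv[TOY_NCPU], unsigned cpu, uint64_t tsc)
{
	const struct tsc_conv *c;

	assert(cpu < TOY_NCPU);
	c = &conv[cpu];
	return (c->base_ns + (((tsc - c->base_tsc) * c->mult) >> c->shift));
}
```

For instance, a 1 GHz CPU (mult 1, shift 0) and a 2 GHz CPU (mult 1, shift 1) with unrelated raw TSC values can map to the same nanosecond time, which is only possible if the reader looks up the right CPU's entry. The rescheduling hazard raised earlier in the thread remains: the CPU can change between reading the TSC and consulting the table.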


Robert N M Watson
Computer Laboratory
University of Cambridge


  -SB
  Mar 27, 2009 12:51:56 PM, sco...@samsco.org wrote:

I've been talking about this for years. All I need is help with the VM
magic to create the page on fork. I also want two pages, one global
for gettimeofday (and any other global data we can think of) and
one per-process for static data like getpid/getgid.
Scott
Sergey Babkin wrote:
 (Sorry for the top quoting). Probably the best implementation of
 gettimeofday() is to have a page in the kernel mapped read-only to
 all the user processes. Put the kernel's idea of time into this
 page. Then getting the time becomes a simple read (OK, two reads,
 to make sure that no update has happened in between).
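The "two reads" check amounts to a sequence lock: the writer bumps a generation counter to an odd value before updating and back to an even value afterwards, and a reader retries if it saw an odd value or the counter changed while it was copying. Below is a minimal single-writer sketch using C11 atomics in ordinary process memory; `struct timepage` and its fields are hypothetical, and in the real design the writer would be the kernel (for instance a periodic callout) with the page mapped read-only into processes.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout of a shared time page (not an actual FreeBSD format). */
struct timepage {
    _Atomic uint32_t gen;   /* odd while an update is in progress */
    _Atomic int64_t  sec;
    _Atomic int64_t  nsec;
};

/* Writer side: assumes exactly one writer. */
static void timepage_update(struct timepage *tp, int64_t sec, int64_t nsec)
{
    uint32_t g = atomic_load_explicit(&tp->gen, memory_order_relaxed);

    atomic_store_explicit(&tp->gen, g + 1, memory_order_relaxed); /* mark busy */
    atomic_thread_fence(memory_order_release);  /* gen bump ordered before data */
    atomic_store_explicit(&tp->sec, sec, memory_order_relaxed);
    atomic_store_explicit(&tp->nsec, nsec, memory_order_relaxed);
    atomic_store_explicit(&tp->gen, g + 2, memory_order_release); /* mark done */
}

/* Reader side: loop until a consistent snapshot is observed. */
static void timepage_read(struct timepage *tp, int64_t *sec, int64_t *nsec)
{
    uint32_t g1, g2;

    do {
        g1 = atomic_load_explicit(&tp->gen, memory_order_acquire);
        *sec = atomic_load_explicit(&tp->sec, memory_order_relaxed);
        *nsec = atomic_load_explicit(&tp->nsec, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire); /* data reads before re-check */
        g2 = atomic_load_explicit(&tp->gen, memory_order_relaxed);
    } while ((g1 & 1) != 0 || g1 != g2);
}
```

The reader never writes to the page, which is what allows the mapping to be read-only in user processes.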


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org




Re: Improving the kernel/i386 timecounter performance (GSoC proposal)

2009-03-27 Thread Robert Watson


On Fri, 27 Mar 2009, Poul-Henning Kamp wrote:


In message alpine.bsf.2.00.0903272254460.12...@fledge.watson.org, Robert Watson writes:

I guess interesting questions are whether (a) it would be desirable to have 
per-page, per-cpu, or per-thread mappings.  If there are non-synchronized 
TSCs, then there might be some interesting advantages to a per-CPU page.


Rule #3:
The only thing worse than generalizing from one example is
generalizing from no examples at all.

We can add those mappings when we know why we would want them.


If we believe TSCs won't be synchronized, and don't want to synchronize them 
ourselves, then we'll need different mapping state to get from a TSC stamp to 
a time on different CPUs.  In which case user application threads will need to 
know their CPU in order to use the right conversion data (ideally without a 
system call, since that's part of what we're avoiding here), or use a per-CPU 
mapping and not know (in which case they'll need to detect and handle the very 
rare race in which a thread is preempted and migrated between reading the TSC 
and reading the conversion data). 
I'm not pushing a per-CPU page, but there would be some interesting advantages 
to supporting that.
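A minimal sketch of the "know your CPU" variant: sample the CPU id, read the TSC, sample the CPU id again, and retry if they differ, so that the TSC value is converted with that CPU's record. The conversion formula, `struct tsc_conv`, the `cur_cpu`/`read_tsc` hooks, and the scripted simulation helpers are all hypothetical. This check narrows but does not fully close the window (a thread could migrate away and back between the samples); instructions such as x86 RDTSCP, which return the counter together with an OS-programmed per-CPU value, avoid that. Updates to the conversion records themselves would still need a generation-counter protocol, omitted here.

```c
#include <stdint.h>

#define NCPU_HYPOTHETICAL 4

/* Hypothetical per-CPU conversion: ns = base_ns + ((tsc - base_tsc) * mult >> shift).
 * The multiply can overflow for large deltas; real code bounds the delta. */
struct tsc_conv {
    uint64_t base_tsc;
    uint64_t base_ns;
    uint32_t mult;
    uint32_t shift;
};

/* Stand-ins for "which CPU am I on" and "read this CPU's TSC". */
struct clock_ops {
    unsigned (*cur_cpu)(void *ctx);
    uint64_t (*read_tsc)(void *ctx);
    void *ctx;
};

static uint64_t tsc_to_ns(const struct tsc_conv *c, uint64_t tsc)
{
    return c->base_ns + (((tsc - c->base_tsc) * c->mult) >> c->shift);
}

/* Retry if the thread appears to have migrated around the TSC read. */
static uint64_t percpu_now_ns(const struct tsc_conv conv[NCPU_HYPOTHETICAL],
                              const struct clock_ops *ops, unsigned *retries)
{
    unsigned before, after;
    uint64_t tsc;

    *retries = 0;
    for (;;) {
        before = ops->cur_cpu(ops->ctx);
        tsc = ops->read_tsc(ops->ctx);
        after = ops->cur_cpu(ops->ctx);
        if (before == after)
            return tsc_to_ns(&conv[before % NCPU_HYPOTHETICAL], tsc);
        (*retries)++;
    }
}

/* Simulation helpers: replay a scripted sequence of CPU ids (the last id
 * repeats once the script is exhausted) and a fixed TSC value. */
struct script {
    const unsigned *cpus;
    unsigned ncpus, next_cpu;
    uint64_t tsc;
};

static unsigned scripted_cpu(void *ctx)
{
    struct script *s = ctx;
    unsigned cpu = s->cpus[s->next_cpu];

    if (s->next_cpu + 1 < s->ncpus)
        s->next_cpu++;
    return cpu;
}

static uint64_t scripted_tsc(void *ctx)
{
    return ((struct script *)ctx)->tsc;
}
```

With a script of CPUs {0, 1, 1, 1}, the first attempt sees a migration from CPU 0 to CPU 1 and retries once before converting with CPU 1's record.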


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: Improving the kernel/i386 timecounter performance (GSoC proposal)

2009-03-27 Thread Robert Watson

On Fri, 27 Mar 2009, Poul-Henning Kamp wrote:

In message alpine.bsf.2.00.0903272303040.12...@fledge.watson.org, Robert Watson writes:



In which case user application threads will need to know their CPU [...]


Didn't jemalloc solve that problem once already ?


I think jemalloc implements thread-affinity for arenas rather than 
CPU-affinity in the strict sense, but I may misread.
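To make the distinction concrete, here is a minimal sketch of thread affinity: each thread is bound to an arena the first time it asks and keeps it, regardless of which CPU the scheduler later runs it on. CPU affinity would instead require discovering the current CPU on every call, which is the hard part discussed above. This is not jemalloc's code; the round-robin policy and the names are hypothetical.

```c
#include <stdatomic.h>

#define NARENAS_HYPOTHETICAL 4

static atomic_uint next_arena = 0;       /* shared round-robin cursor */
static _Thread_local int my_arena = -1;  /* -1: this thread has no arena yet */

/* Thread affinity: assign once per thread, then reuse forever. */
static int arena_for_thread(void)
{
    if (my_arena < 0)
        my_arena = (int)(atomic_fetch_add(&next_arena, 1) % NARENAS_HYPOTHETICAL);
    return my_arena;
}
```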


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: does Copyright on source files expire ?

2009-03-26 Thread Robert Watson

On Thu, 26 Mar 2009, ttw+...@cobbled.net wrote:


On 25.03-05:31, David Schultz wrote:
[ ... ]
A person's Copyright doesn't go away just because they die, disappear, or 
fail to respond. If you can't contact them, their heirs, or whomever they 
transferred the Copyright to, you're stuck.


yeah but it's a little like finding something.  if they're not about and not 
reachable there isn't much they can do to stop you using it. if they popup 
and make demands later then you get to choose between re-writes and haggling 
(twenty shekels is standard).


In some countries, such as the US, copyright violation can be a criminal, not 
just civil, matter.


Also, in countries where copyright can be assigned, the holder listed in a 
file may not accurately represent who the current holder is, so while the 
original author may be unreachable, etc, the current holder may be alive and 
kicking.


Robert N M Watson
Computer Laboratory
University of Cambridge


point is you can use it, the actual copyright owner needs to sue
you; not like saying jehovah which may result in action by the agents
of the state.

n.b: using the above opinion may get you crucified.




Re: 2 uni-directional TCP connection good

2009-03-23 Thread Robert Watson


On Sun, 22 Mar 2009, Yoshihiro Ota wrote:


On Fri, 20 Mar 2009, Yoshihiro Ota wrote:

1. With TCP connections, only the sender side can passively detect some 
communication issues when they happen.  By using two connections, you give 
up that ability yourself.  I agree on this one.


Could you expand a bit on this point?  While the connection creation 
process is (usually) asymmetric, once the connection is built it's essentially 
the same state machine on both sides of the connection, and socket 
semantics with respect to the state machine are effectively identical. 
Applications on both sides should be able to detect disconnect, monitor 
connection state using TCP_INFO, etc.


What I meant was that there were cases when a receiver could not tell 
whether no data was coming or communication was interrupted.  Once a 
connection is established, a route is available between a server and a 
client.  Let's say this route is broken for some reason, e.g., someone 
unplugged a cable or a firewall started dropping or rejecting packets between 
the server and client.  A sender may not notice as soon as it happens, but at 
least a sender knows a message was not delivered correctly.  On the other 
hand, the receiver side has no idea, at all or for a while, that a message 
delivery failure has happened unless heartbeat messages are used in an upper 
layer.  The KEEP_ALIVE option seems to be implementation-dependent, such that 
you cannot ensure TCP connection availability every minute.


This is generally considered a robustness property rather than a fragility 
issue, but yes: if you need a liveness property for idle connections with 
TCP, it's something you have to implement at the application layer, and many 
protocols indeed do this.  I don't see that this is an argument for using two 
TCP connections as opposed to one, however.  If you're interested in 
alternative protocols, SCTP allows a number of these protocol 
behaviors to be modified, and includes support for a heartbeat.
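An application-layer liveness check of the kind described can be sketched as two timers per connection: send a heartbeat after an idle interval, and declare the peer dead after a longer period with nothing received. Both ends run the same logic, so either side can detect a silent failure. `struct liveness` and its functions are hypothetical; the caller supplies timestamps (e.g., from clock_gettime(CLOCK_MONOTONIC)), and the timeout is set to several intervals so a delayed heartbeat is not mistaken for a failure.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-connection state for an application-level heartbeat. */
struct liveness {
    uint64_t last_rx_ms;   /* when anything (data or heartbeat) last arrived */
    uint64_t last_tx_ms;   /* when we last sent anything */
    uint64_t interval_ms;  /* send a heartbeat after this much send-side idle time */
    uint64_t timeout_ms;   /* peer considered dead after this much silence */
};

static void liveness_on_receive(struct liveness *l, uint64_t now_ms)
{
    l->last_rx_ms = now_ms;
}

static void liveness_on_send(struct liveness *l, uint64_t now_ms)
{
    l->last_tx_ms = now_ms;
}

/* True if we should send a heartbeat now to keep the peer's timer fresh. */
static bool liveness_need_heartbeat(const struct liveness *l, uint64_t now_ms)
{
    return now_ms - l->last_tx_ms >= l->interval_ms;
}

/* True if the peer has been silent longer than the timeout: the receiver-side
 * detection that TCP alone does not give an idle connection. */
static bool liveness_peer_dead(const struct liveness *l, uint64_t now_ms)
{
    return now_ms - l->last_rx_ms > l->timeout_ms;
}
```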


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: GSoC: Semantic File System

2009-03-22 Thread Robert Watson


On Sat, 21 Mar 2009, Gabriele Modena wrote:


I am an AI master's student at the University of Amsterdam.

One of my current research interests lies in the area of information 
retrieval, and I would like to do a project within my university research 
group starting next June.


I am currently studying background literature about semantic file systems and 
information retrieval over local files.


Being also quite interested in kernel development, I would like to propose a 
proof of concept that implements such techniques. My goal, though, would not 
be just a reimplementation of existing code, but possibly some more 
extensive work that combines techniques already used in other domains of II.


Could this be an interesting Summer of Code proposal for the FreeBSD 
Foundation?


I plan to write down some notes/ideas (and details) I have on a wiki 
starting from next week.


Hi Gabriele--

We are certainly not uninterested in projects along these lines, but I think 
the trick will be (a) creating a convincing proposal that argues you can do 
the work in a summer, (b) showing there's a compelling usage case for 
including the results in FreeBSD, and (c) finding a mentor who can supervise 
you in this project.  What sort of semantic file system do you have in mind?  How would 
you feel about a middle-ground project along the lines of Mac OS X Spotlight 
or similar efficient userspace indexing of a file system based on feedback 
from the file system about what has changed, or something BeOS-like, in which 
indexing takes place for extended attributes rather than for contents?
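A BeOS-like index over extended attributes can be sketched as a table of (file, attribute, value) rows that is updated from change notifications and queried by attribute value, without ever reading file contents. This is a hypothetical, deliberately tiny model (fixed-size rows, linear scans, an invented `audio.artist` attribute); a real index would use a hash table or B-tree keyed on (attribute, value), and on FreeBSD the values could be read with extattr_get_file(2).

```c
#include <stdio.h>
#include <string.h>

#define MAX_ROWS 64

/* One row: file `path` has attribute `name` set to `value`. */
struct xattr_row {
    char path[64];
    char name[32];
    char value[32];
};

struct xattr_index {
    struct xattr_row rows[MAX_ROWS];
    int nrows;
};

/* Apply a "this attribute changed" notification: update the existing row
 * for (path, name) or append a new one. Returns 0, or -1 if full. */
static int xidx_set(struct xattr_index *ix, const char *path,
                    const char *name, const char *value)
{
    struct xattr_row *r;

    for (int i = 0; i < ix->nrows; i++) {
        r = &ix->rows[i];
        if (strcmp(r->path, path) == 0 && strcmp(r->name, name) == 0) {
            snprintf(r->value, sizeof(r->value), "%s", value);
            return 0;
        }
    }
    if (ix->nrows == MAX_ROWS)
        return -1;
    r = &ix->rows[ix->nrows++];
    snprintf(r->path, sizeof(r->path), "%s", path);
    snprintf(r->name, sizeof(r->name), "%s", name);
    snprintf(r->value, sizeof(r->value), "%s", value);
    return 0;
}

/* Query: how many files have attribute `name` equal to `value`? */
static int xidx_count(const struct xattr_index *ix, const char *name,
                      const char *value)
{
    int n = 0;

    for (int i = 0; i < ix->nrows; i++) {
        const struct xattr_row *r = &ix->rows[i];
        if (strcmp(r->name, name) == 0 && strcmp(r->value, value) == 0)
            n++;
    }
    return n;
}
```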


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: 2 uni-directional TCP connection good?

2009-03-20 Thread Robert Watson


On Fri, 20 Mar 2009, Yoshihiro Ota wrote:

1. With TCP connections, only the sender side can passively detect some 
communication issues when they happen.  By using two connections, you give 
up that ability yourself.  I agree on this one.


Could you expand a bit on this point?  While the connection creation process 
is (usually) asymmetric, once the connection is built it's essentially the same 
state machine on both sides of the connection, and socket semantics with 
respect to the state machine are effectively identical.  Applications on both 
sides should be able to detect disconnect, monitor connection state using 
TCP_INFO, etc.
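As a small illustration that either end can query the same connection state, here is a sketch that reads TCP_INFO on both ends of a loopback connection. `struct tcp_info` and its `tcpi_state` field exist on both FreeBSD and Linux, but the numeric state values are OS-specific (FreeBSD reports its TCPS_* values), so the sketch only compares the two ends. `tcp_conn_state()` and the `loopback_pair()` scaffolding are hypothetical helpers written for this example.

```c
#define _DEFAULT_SOURCE  /* glibc: expose struct tcp_info; harmless elsewhere */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/* Read the kernel's view of a TCP connection. Returns 0, or -1 with errno set. */
static int tcp_conn_state(int fd, unsigned *state)
{
    struct tcp_info ti;
    socklen_t len = sizeof(ti);

    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &len) == -1)
        return -1;
    *state = ti.tcpi_state;
    return 0;
}

/* Demo scaffolding: build a connected TCP pair over 127.0.0.1. */
static int loopback_pair(int *cli, int *srv)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof(sin);
    int lfd = socket(AF_INET, SOCK_STREAM, 0);

    if (lfd == -1)
        return -1;
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sin.sin_port = 0;                      /* let the kernel pick a port */
    if (bind(lfd, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
        listen(lfd, 1) == -1 ||
        getsockname(lfd, (struct sockaddr *)&sin, &len) == -1 ||
        (*cli = socket(AF_INET, SOCK_STREAM, 0)) == -1) {
        close(lfd);
        return -1;
    }
    if (connect(*cli, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
        (*srv = accept(lfd, NULL, NULL)) == -1) {
        close(*cli);
        close(lfd);
        return -1;
    }
    close(lfd);
    return 0;
}
```

Once the handshake completes, both the connecting and the accepting socket report the same established state.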


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: SA add notification to externa module

2009-03-17 Thread Robert Watson

On Tue, 17 Mar 2009, srikanth jampala wrote:


   This is my first posting.

I want notifications about SA (security association) add/delete events 
from the kernel in my external kernel module.


How can I do this?

Thanks in advance for your suggestions.


I'm not sure if PF_KEY has an async notification event, but in principle you 
could consume those inside the kernel, not just in a user application.


Alternatively, you might reasonably submit a patch to add an EVENTHANDLER(9) 
event at the right points in the kernel code so that future versions of 
FreeBSD will allow your code to plug in more easily.  We already provide event 
handler hooks for things like process fork/exit, arrival/departure of network 
interfaces, etc.  The trick is to place them at the right points so that 
appropriate locks are held, and you'll want to avoid having your handler code 
change the semantics of the calling site (i.e., don't sleep if that's not 
allowed).
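The register/invoke pattern behind such a hook can be modelled in userspace. The kernel's real interface is the EVENTHANDLER(9) family of macros (EVENTHANDLER_DECLARE, EVENTHANDLER_REGISTER, EVENTHANDLER_INVOKE, EVENTHANDLER_DEREGISTER); the sketch below does not reproduce that API, and `struct sa_event`, the handler list, and the example consumer are hypothetical. It shows the key property discussed above: handlers run synchronously in the caller's context at the hook point.

```c
#include <stddef.h>

/* Hypothetical event payload for an SA add/delete hook. */
struct sa_event {
    int      added;  /* 1 = SA added, 0 = SA deleted */
    unsigned spi;    /* security parameter index */
};

typedef void (*sa_event_fn)(void *arg, const struct sa_event *ev);

#define MAX_HANDLERS 8

struct handler_list {
    struct { sa_event_fn fn; void *arg; } h[MAX_HANDLERS];
    size_t n;
};

/* Returns 0 on success, -1 if the list is full. (No deregistration or
 * locking here; kernel code would need both.) */
static int handler_register(struct handler_list *l, sa_event_fn fn, void *arg)
{
    if (l->n == MAX_HANDLERS)
        return -1;
    l->h[l->n].fn = fn;
    l->h[l->n].arg = arg;
    l->n++;
    return 0;
}

/* Called at the hook point. Handlers run in the caller's context, so they
 * must respect its locks and must not sleep if the caller cannot. */
static void handler_invoke(const struct handler_list *l, const struct sa_event *ev)
{
    for (size_t i = 0; i < l->n; i++)
        l->h[i].fn(l->h[i].arg, ev);
}

/* Example consumer standing in for the external module: count SA additions. */
static void count_adds(void *arg, const struct sa_event *ev)
{
    if (ev->added)
        (*(int *)arg)++;
}
```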


Robert N M Watson
Computer Laboratory
University of Cambridge

