
Sun keyboard on i386?

2011-07-13 Thread Mouse

I have a desk on which (for reasons not immediately relevant) the main
head is an i386 machine (4.0.1).  But this has meant I'm stuck using a
crappy peecee keyboard.

Today, I put together the interface electronics to put one of my good
(Sun type 3) keyboards on one of the serial ports.  It works, in that a
program that talks to the serial port can speak the keyboard's protocol
and get keystrokes and suchlike.

I can, if I have to, bludgeon X into being such a program.  But I
thought I would first try to use the existing kernel code for Sun
keyboards (which would, I would expect, have the additional advantage
of working in the text console).  Looking at the kernel configs, I see
that on sparc64 (and on sparc, though the comments say it's just for
test building) kbd can attach at com, which is convenient because it's
exactly what I want to do.  So I appended a handful of lines to my i386
machine's kernel config, mostly lifted from sparc64:

define firm_events
file dev/sun/event.c  firm_events needs-flag
device kbd: firm_events, wskbddev
file dev/sun/kbd.c  kbd needs-flag
file dev/sun/kbd_tables.c   kbd
file dev/sun/wskbdmap_sun.c kbd & wskbd
attach kbd at com with kbd_tty
file dev/sun/sunkbd.c   kbd_tty
file dev/sun/kbdsun.c   kbd_tty
kbd0 at com0

I had to change an #include and remove another to get the kernel to
compile, and rip a little code out of kbd.c and sunkbd.c to get it to
link, but surprisingly little.  Less than I was expecting.
(Specifically: in sunkbd.c, <machine/kbd.h> -> <sys/dev/kbd_reg.h>,
remove <machine/vuid_event.h>, and rip out both arms of the
if (args->kmta_consdev) test in sunkbd_attach(); in kbd.c, remove
sunkbd_wskbd_cn{getc,pollc,bell} and sunkbd_bell_off, remove
sunkbd_wskbd_consops and the code in kbd_enable that conditionally uses
it.  Exact diffs available if anyone wants.)

But it doesn't work.  I added a printf to sunkbd_match, and it's never
even getting called.  Is there some kind person here who has any idea
why not and can point me in a useful direction?  I daresay it's
something that will be blindingly obvious once I see it.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML       mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: Sun keyboard on i386?

2011-07-13 Thread Mouse

 Oh well.  It would have been a nice hack, but it's sounding like
 more effort than it's worth to me.
 Make it a line discipline, may be?

Possibly.  The kbd-at-com attachment is already close to that,
according to the comments (I haven't looked at the code enough to be
competent to remark on whether the comments are accurate).  The major
difference I see between what I think you're suggesting and the sparc64
way is the use of a userland utility versus autoconf machinery.


Re: Multiple device attachments

2011-07-21 Thread Mouse

 /sys/device.h -- seems to indicate that a device driver can attach to
 multiple parent drivers (e.g. busses, controllers, ?)

 Does anyone know how this is done in practice?

device wdc: ata, wdc_common
attach wdc at isa with wdc_isa
attach wdc at isapnp with wdc_isapnp
attach wdc at ofisa with wdc_ofisa
attach wdc at pcmcia with wdc_pcmcia

com is another example.  So is le.  I'm sure there are plenty of
others.  In some cases they don't even need the "with" stuff, I think.


Re: Multiple device attachments

2011-07-21 Thread Mouse

 The examples you cite seem to indicate that for example the le device
 may attach to many alternative devices (e.g. pci, tc, ?), but only
 one attachment is made when autoconf is complete.

For any particular instance of le, yes.

 I may have read the code examples incorrectly -- please pardon me if
 I did; but what I want to know is --  can a device have multiple
 attachments (more than one parent device) when autoconf is complete.

A device can in the sense that, for example, ne0 and ne1 might not
attach to the same parent, or even the same kind of parent (eg, one ISA
and one PCMCIA).

But a single node, a single instance of a driver (eg, ne0), always has
at most one parent (exactly one, I think, except for the autoconf root
most ports call mainbus).

To put it another way, the autoconf tree is a tree, not a dag.
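
The tree-not-DAG point can be pictured with a toy structure.  This is
only a sketch: struct node and depth() are names invented here for
illustration, not the kernel's real autoconf types.  Each instance has
room for exactly one parent pointer, so ne0 and ne1 can hang off
different parents, but neither can have two.

```c
#include <stddef.h>

/* Hypothetical stand-in for an autoconf instance; not the real struct device. */
struct node {
	const char *name;
	struct node *parent;	/* exactly one slot: NULL only for the root */
};

/* Count the hops from a node up to the root. */
int depth(const struct node *n)
{
	int d = 0;

	while (n->parent != NULL) {
		n = n->parent;
		d++;
	}
	return d;
}
```

Because there is only one parent slot, walking upward from any node
follows a single path and always ends at the root.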


Re: rsync very slow with current kernel (select issue?)

2011-07-27 Thread Mouse

 So select blocks (maybe because there's effectively nothing to read
 at this time), but instead of waking up when there's data ready it
 wakes up when the timeout expires.
 This seems rather similar to something I was looking at back in
 January.  [...]

I had a similar symptom once, which turned out to be fixed by having
both ends of the TCP connection set TCP_NODELAY.  (Just one end might
have been enough, but, since I was in there anyway, I did both.)  This
case doesn't sound quite similar enough for that to be it here, but I
could have missed something.
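
For reference, setting TCP_NODELAY is a single setsockopt(2) call.  A
minimal sketch follows; set_nodelay is a name chosen here, and whether
it helps in the rsync case is not something this example can show.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Turn off Nagle's algorithm on a TCP socket; returns 0 on success, -1 on error. */
int set_nodelay(int fd)
{
	int on = 1;

	return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}
```

Each end of the connection would call this on its own socket.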


Re: Adding linux_link(2) system call (Was: Re: link(2) on a symlink to a directory fails)

2011-07-29 Thread Mouse

 What about adding a linux_link(2) that would do exactly what link(2)
 does but without the FOLLOW flag to NDINIT on the path argument?

How about just fixing link(2) that way?

 If linux_link(2) seems unreasonable, it could be lazy_link(2),
 weak_link(2), braindead_link(2) or whatever.
 You'll also need to update every filesystem to allow this and update
 all the various fsck programs to allow filesystems to be in this
 state.

Hardly.  The most that needs to be done to every filesystem is to
reject these operations.  The filesystem(s) that we want to support
hardlinks to symlinks can then be updated, one at a time, along with
their fscks.

 I'd disagree with this as it seems like a nonsensical thing to do.

Why?

I usually can understand your point of view, even when I disagree with
it, but this time I'm baffled.  What's nonsensical about hardlinking to
a symlink?  Two names pointing to the same inode, which inode happens
to be a symlink - I see nothing nonsensical about that.

Of course, some filesystems may not implement symlinks as (their analog
of) inodes; they presumably will refuse such attempts.  Again, nothing
nonsensical; pretty much everything about symlinks can potentially vary
with the filesystem; this is no different.

What am I missing?


Re: Adding linux_link(2) system call (Was: Re: link(2) on a symlink to a directory fails)

2011-07-29 Thread Mouse

 I'd disagree with this as it seems like a nonsensical thing to do.
 Why?
 Because symlinks are a special type of filesystem object with their
 own semantics

Every filesystem object is. :)

 Also, from a more operational standpoint, because there's no way to
 update a symlink in place, so there's no difference between two
 symlinks and two hard links to the same symlink except confusion and
 the number of inodes used.

(a) You're forgetting that symlinks have other attributes than the
link-to string.  The most obvious is mode bits (which have no effect
unless you mount -o symperm, but (a1) that can be done and (a2) they
can be queried with lstat(2) even if the filesystem doesn't use them),
but there are others, such as owner, or even inumber.

(b) If you have a lot of symlinks, inode usage may actually matter.

(c) I've long thought there should be a way to update a symlink
in-place.

 FWIW, I just asked some linux guys about the linux behavior and the
 answer was "we sell rope".

That would be my answer too, though I'd probably phrase it as "not
preventing you from doing stupid things because it would also prevent
you from doing clever things".


Re: autoclean mode for tmpfs

2011-08-07 Thread Mouse

 How hard would it be to add a mount option for tmpfs to
 automatically drop files after a given timeout?

 Anyone think this is worthwhile?

 Sounds like a job for the userland and cron(8).

Sort of.

Personally?  I'd say that scheduling it should be done by userland, but
that putting the actual removal in the filesystem makes sense.

I'm not sure whether I'd prefer to do it with a new and idiosyncratic
syscall, a vfs.something sysctl, some sort of filesystem-level analog
to ioctl, or what.


Re: autoclean mode for tmpfs

2011-08-07 Thread Mouse

 It's a security FAQ. If you do rm -rf (or nearly any of the other
 obvious/easy alternatives) in a world-writable directory, a hostile
 user can interact with it to erase any file on the system.

I believe that this is partially fixable: provided there is at least
one file descriptor available per directory level, I think it is
possible to safely remove everything but directories.  Most briefly,
fchdir to each directory, stat . and make sure it matches the directory
we thought we chdired into (to avoid doing damage if we lose a symlink
race).  Then delete things using relative-to-. paths and fchdir back
out.  However, since there's no way to make rmdir(2) use NOFOLLOW, we
have to either leave directory structure in place or risk removing an
attacker's choice of empty directories.
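
The "chdir, then check that . is what we expected" step might look
roughly like the following.  It is a sketch under assumptions:
enter_dir_checked and leave_dir are names invented here, one descriptor
is held per directory level as described above, and the actual deletion
of entries is left out.

```c
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Enter the subdirectory name of the current directory, refusing
 * symlinks and detecting a lost race.  Returns a descriptor for the
 * directory we came from (to fchdir back to), or -1 on failure.
 */
int enter_dir_checked(const char *name)
{
	struct stat before, after;
	int back;

	if (lstat(name, &before) != 0 || !S_ISDIR(before.st_mode))
		return -1;	/* missing, or not a real directory */
	back = open(".", O_RDONLY);
	if (back < 0)
		return -1;
	if (chdir(name) != 0) {
		close(back);
		return -1;
	}
	if (stat(".", &after) != 0 ||
	    after.st_dev != before.st_dev || after.st_ino != before.st_ino) {
		/* name was swapped under us: go back and give up */
		if (fchdir(back) != 0) {
			/* nothing more can be done here */
		}
		close(back);
		return -1;
	}
	return back;
}

/* Return to the directory saved by enter_dir_checked(). */
int leave_dir(int back)
{
	int r = fchdir(back);

	close(back);
	return r;
}
```

Between the two calls, entries would be removed using names relative to
the current directory.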

Not that this makes it any easier to do the usual find | xargs rm style
of cleanup, though.  To do it safely in the way I refer to above would
require doing it all inside rm.  Might be worth doing, but quite
possibly better done in the filesystem, to (a) avoid the need for the
file descriptors, (b) delete a file here and a file there rather than
the wholesale destruction of rm -rf (even if I'm right about it being
possible to make it safe against hostile users), and (c) get
directories right.


Re: autoclean mode for tmpfs

2011-08-07 Thread Mouse

 However, since there's no way to make rmdir(2) use NOFOLLOW, [...]
 ?

 lrwx--  1 dholland  notmp  3 Aug  7 12:32 baz -> bar
 valkyrie% rmdir baz
 rmdir: baz: Not a directory

!  Hm, I see that too.

I wonder where I got the idea it followed symlinks.

My apologies for spreading misinformation.


Re: what to do on memory or cache errors?

2011-08-22 Thread Mouse

 besides panicking, of course.

Ideally, I think...

Corrected error: Usually, log and ignore.  Maybe watch for elevated
levels of corrected errors and disable either the containing page or
the containing memory stick, depending on how much the hardware lets
the kernel determine and maybe policy sysctls.  Maybe even allow
paranoid sysadmins to configure "elevated levels of" to mean "any".

Uncorrectable error: Log.  Disable the containing page and/or stick, as
mentioned above.  If it's for the contents of a dirty page, about all
we can do is deliver a memory-error signal.  If it's for a clean page
(including (most) instruction-stream fetches), re-fetch the virtual
page into a new physical page and carry on.

 This is going to involve a lot of help from UVM.

Probably.  Maybe the pmap, too, for things such as figuring out what
regions of RAM would have to be disabled to stop using the affected
memory stick, or the like.

 If uvm_page_error can't correct the error, it would panic.

I'd recommend doing that only for kernel accesses; for userland, I'd
much prefer to blow up at most the process incurring the fault.

 Preemptively, we could have a thread force dirty cache lines to
 memory if they've been in L2 too long (thereby reducing the problem
 to an ECC error on a clean cache line which means you just toss the
 cache-line contents.)

Depends.  Are we talking ECC on L2 cache, or on main memory?  I'd say
the results should be different.

 We can also have a thread that reads all of memory (slowly) thereby
 causing any single bit errors to be corrected before they become
 double-bit errors.

Well, to be detected.  Whether the correct action upon detecting them
is to silently correct them is a policy matter I'd prefer to avoid
wiring into the kernel.

 I'm not familiar enough with UVM internals to actually know what to
 do but I hope someone else reading this is.

Me neither.  I have just about zero idea how implementable any of the
above is; I've been speaking in ideal generalities.  (My idea of ideal
generalities, that is, of course.)


Re: Where are the specific WARNS=n defined?

2011-08-22 Thread Mouse

 [...] gcc errors due to comparison of signed and unsigned values.

 It is best to fix the errors.

What errors?

It is not necessarily an error to compare signed and unsigned values.
In my experience, that warning produces so many more false positives
than useful warnings that I normally shut it off entirely.
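
For readers who haven't met the warning, this is the behaviour it is
about, as a sketch with invented function names.  In a mixed
comparison the int operand is converted to unsigned first, which only
changes the answer when the signed value is negative; when it is known
to be non-negative the two functions below agree, and the warning is a
false positive.

```c
/* What `a < b` means once the usual arithmetic conversions make a unsigned. */
int less_converted(int a, unsigned int b)
{
	return (unsigned int)a < b;
}

/* Comparison by mathematical value: any negative a is below every b. */
int less_by_value(int a, unsigned int b)
{
	return a < 0 || (unsigned int)a < b;
}
```

With a = -1 and b = 1 the first returns 0 and the second returns 1;
with a = 3 and b = 5 both return 1.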


Re: Where are the specific WARNS=n defined?

2011-08-23 Thread Mouse

 [...] gcc errors due to comparison of signed and unsigned values.
 It is best to fix the errors.
 In my experience, that warning produces so many more false positives
 than useful warnings that I normally shut it off entirely.
 and that one time that using it might have warned you about a serious
 vulnerability?

When was that?

Except for a few that also provoked, or would have provoked, the
warning about how a conditional's value is constant due to limited
range of data type, I can't recall ever finding a bug that
-Wsign-compare warned about (or would have warned about).

Ever.

In anyone's code.

Yes, it's possible there is such an occasion lurking in my future.
It's also possible I've forgotten about one in the past.  But I judge
the expected cost of possibly having to track down such a bug directly
to be well below the expected cost (both immediate and in down-the-road
maintenance) of pervasive manual uglification of code to fix
non-errors.


Re: Where are the specific WARNS=n defined?

2011-08-23 Thread Mouse

 It is not necessarily an error to compare signed and unsigned
 values.  [...]
 And it is not an error to put assignments in conditionals, or not
 place parentheses to clarify operator precedence, etc.  It is a
 warning [...].  For some of us this is helpful.  The compiler writers
 try to help protect programmers against common mistakes.  If you
 don't like the warnings you are free to turn them off.

That's what I do - along with a handful of other such warnings.

The question asked what the appropriate action was, whether to turn the
warnings off (the way real kernel compiles apparently do anyway) or
uglify the code to work around the warning [ok, my phrasing].  I
believe the former is better, because in my experience the mistake the
warning warns about is anything but common.


Re: Adding pulse support to gpio(4), gpioctl(8)

2011-08-23 Thread Mouse

 Well, you need to open it first, before you can to ioctl, and if
 only one process can open it, only one process can ioctl it, right?
 Wrong.

Agreed.

 Multiple threads can ioctl and nobody prevents one from having a
 single process with multiple threads (pthreads, if you like).

Not only that, but even without threading, there are at least two ways
I can think of offhand that a file descriptor, once opened, can end up
in multiple processes' open file tables: fork() and SCM_RIGHTS.  (There
are probably others, too.)
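
The fork() case is easy to demonstrate.  In this sketch (the function
name is invented here) a child seeks a descriptor it inherited and the
parent then sees the new offset, because both processes' file tables
refer to the same open file.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Open path, fork, let the child seek the descriptor to offset 5, and
 * return the offset the parent then observes (-1 on any error).
 */
long offset_seen_by_parent(const char *path)
{
	int fd, status;
	pid_t pid;
	off_t off;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	pid = fork();
	if (pid < 0) {
		close(fd);
		return -1;
	}
	if (pid == 0) {
		/* child: same open file, so this seek is visible to the parent */
		_exit(lseek(fd, 5, SEEK_SET) == 5 ? 0 : 1);
	}
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status) ||
	    WEXITSTATUS(status) != 0) {
		close(fd);
		return -1;
	}
	off = lseek(fd, 0, SEEK_CUR);
	close(fd);
	return (long)off;
}
```

An ioctl issued through the inherited descriptor reaches the same
driver instance in just the same way.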


Re: KASSERTMSG fix

2011-09-07 Thread Mouse

 And I am not aware of a solution where you can have two ... in a C
 function.

You can't actually _write_ something like

void foo(const char *, ..., int, const char *, const char *, ...);

but, except for -Wformat issues, you can get that effect with suitable
use of va_* calls within the implementation of foo().  (If you expect
to use vprintf or relatives to consume the first ... list, this
involves unwarranted chumminess with the stdarg implementation.  But if
you walk the first ... list yourself, it's no problem at all.)
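
A small sketch of the walk-it-yourself approach.  The calling
convention is an assumption made for the example: a count, that many
ints, a second count, then that many more ints.  sum_two_lists is an
invented name.

```c
#include <stdarg.h>

/*
 * Behaves like a hypothetical foo(int n1, <n1 ints>, int n2, <n2 ints>):
 * the second count is found by walking the first list by hand.
 * Returns the sum of all listed values (the counts are not included).
 */
int sum_two_lists(int n1, ...)
{
	va_list ap;
	int i, n2, sum = 0;

	va_start(ap, n1);
	for (i = 0; i < n1; i++)
		sum += va_arg(ap, int);		/* first "..." list */
	n2 = va_arg(ap, int);			/* fixed argument in the middle */
	for (i = 0; i < n2; i++)
		sum += va_arg(ap, int);		/* second "..." list */
	va_end(ap);
	return sum;
}
```

The function has to know where the first list ends (here, from n1);
the compiler cannot check any of it, which is the -Wformat caveat in
another guise.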


Re: KASSERTMSG fix

2011-09-08 Thread Mouse

 You can't actually _write_ something like
 void foo(const char *, ..., int, const char *, const char *, ...);
 But you can do:
 void foo(const char *, va_list, const char *, ...)
 if you need to add some extra args.

Yeah, but then you have to pass a va_list, not separate args.  Of
course, for some uses, that's entirely tolerable.

 What would be useful is a format effector that processes a format
 string and a va_list (recursive call inside vsnprintf).

 But adding non-standard effectors is not really a good idea.

I once added such a thing (I think I used %@).  It was easy, but I
never used it very much and never rolled it forward (it was 1.4T I
added it to).  Never even got around to adding it to -Wformat.

As for using nonstandard formats, don't we already do that with %b?


Re: Perform mmap and poll on PUD character devices

2011-09-10 Thread Mouse

 I do not understand why it is desirable to involve additional context
 switches to and from userspace into this data path.

 Instead of writing a bunch of fairly dubious page mapping code [...]
 in the kernel to support user-space daemons handling various virtual
 disk formats, why not put the effort into just doing the various
 desired virtual disk formats in-kernel?

I can't speak for Roger, but it seems to me that an appropriate answer
would be the same reasons you do _anything_ in userspace rather than
the kernel: better insulation of pieces of the system from one
another and ease of changing if you want to run something else instead
strike me as the biggest ones.


Re: MAXNAMLEN vs NAME_MAX

2011-09-24 Thread Mouse

 MAXNAMLEN = 511
 NAME_MAX = 255

 [...]  We want to make them consistent.

Do you want to increase NAME_MAX, or decrease MAXNAMLEN?

 My opinion is that [versioning userland] is not worth the trouble.
 The only programs that can fail are ones that do things like:
  char name[NAME_MAX];
  strcpy(name, d->d_name);

This sounds as though you are contemplating increasing NAME_MAX.

 sizeof(d->d_name) does not change. It is just that d_namelen can be >
 255 (NAME_MAX).  Only programs that use NAME_MAX to store directory
 entries can fail.

Not quite.  Such things can also find their way into code in subtler
ways.  For example, I've written code that knows it can store a
directory entry length in an unsigned char (which amounts to assuming
NAME_MAX <= UCHAR_MAX).  I think all the recent examples of that I've
written have been FFS-specific and therefore safe (if I'm reading
things right, FFS uses a single octet to store directory entry length
on disk), but I'm probably not the only one who's done such stuff.

 My vote is to bump without versioning, what's yours?

I probably agree with you.  But what's the motivation for increasing
NAME_MAX rather than decreasing MAXNAMLEN?


Re: MAXNAMLEN vs NAME_MAX

2011-09-27 Thread Mouse

 Certainly the original 14 byte limit was occasionally a nuisance (but
 even that was better than 8+3 which was typical), but longer than
 255?

I've run into the 255 limit.  On only a few occasions, but definitely
more than zero.  (About three times, I think.)

In my case it is usually files named after URLs; I will typically put
(for example) http://www3.telus.net/~bhilpert/tmp/touchTone1969.gif in
a file called www3.telus.net%~bhilpert%tmp%touchTone1969.gif.  I
regularly see (though seldom want to fetch) URLs long enough to blow
out a 255-character limit under this transformation.
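
A sketch of the transformation, inferred from the single example above
(drop the leading http:// and replace each / with %); the exact rules
used in practice may differ, and url_to_filename is an invented name.

```c
#include <stddef.h>
#include <string.h>

/*
 * Write the file name for url into out (size outlen): drop a leading
 * "http://" and turn every '/' into '%'.  Returns 0 on success, -1 if
 * the result does not fit.
 */
int url_to_filename(const char *url, char *out, size_t outlen)
{
	static const char scheme[] = "http://";
	size_t i, n;

	if (strncmp(url, scheme, sizeof(scheme) - 1) == 0)
		url += sizeof(scheme) - 1;
	n = strlen(url);
	if (n >= outlen)
		return -1;
	for (i = 0; i <= n; i++)	/* <= copies the terminating NUL too */
		out[i] = (url[i] == '/') ? '%' : url[i];
	return 0;
}
```

Calling it with a buffer of NAME_MAX + 1 bytes makes the -1 return
correspond to a name too long for a single pathname component.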

I'm sure other people have their own uses for long pathname components,
too, though I don't know of any offhand.


Re: A simple cpufreq(9)

2011-09-28 Thread Mouse

 If that periodically-threatened pdp10 port (or some other off-size
 port) ever appears, it's not likely to care about the size that
 appears in some other environment (unlike for on-disk structures) and
 using an explicit size will if anything make life more complicated.

Especially if it's a size that doesn't exist on that port.  Is uint32_t
"32 bits" or "at least 32 bits"?  The former may well not exist on a
pdp10 port.
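
In C99's <stdint.h> terms, uint32_t is exactly 32 bits and is optional,
while uint_least32_t is at least 32 bits and always present.  A sketch
of code that would still build where the exact-width type is missing;
freq_t and the function names are invented for the example.

```c
#include <limits.h>
#include <stdint.h>

/* Always available: the smallest type with at least 32 bits. */
typedef uint_least32_t freq_t;	/* hypothetical name */

/* Storage width of freq_t in bits on this platform. */
unsigned int freq_bits(void)
{
	return (unsigned int)(sizeof(freq_t) * CHAR_BIT);
}

/* 1 if the exact-width uint32_t exists here, 0 if not. */
int have_exact_uint32(void)
{
#ifdef UINT32_MAX
	return 1;
#else
	return 0;
#endif
}
```

On a 36-bit machine one would expect freq_bits() to report 36 and
have_exact_uint32() to report 0, though that is untested here.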


Re: A simple cpufreq(9)

2011-09-28 Thread Mouse

 I'm sure at this point someone could put together a 36-bit machine
 out of FPGAs that ran fast enough to be used as a low-volume web
 server, and there are certainly heterogeneity advantages to such a
 platform.  Maybe someone who knows enough about such things should
 actually do this :-)

If I had a source of FPGAs that were decently documented - in
particular, that didn't demand use of a vendor-provided opaque binary
blob to generate the programming data, I'd probably be doing that
(among other things) already.  (Such things may well already exist, but
I haven't found them.  Not that I've put all _that_ much effort into
looking; finding needles in haystacks is not exactly my forte - unless
the needles are bugs and the haystacks are code.)


Re: UNIX kernel notification system

2011-10-03 Thread Mouse

 [Do you really mean to use paragraph-length lines?  I'd suggest
 against it; they impair readability significantly, at least for me.
 Manually rewrapped in the quotes below.]
 less(1) (or more(1)) doesn't take care of you?

Maybe; see below.

 The nice thing about such formatting is that the text can be wrapped
 at relatively arbitrary word boundaries, making it more readably
 displayable on a wider range of display widths (e.g. mobile phones,
 tablets).

That would have been true if the mail were marked format=flowed, which
yours wasn't.  Since it wasn't, the UA has to assume that that single
long line is supposed to be a single long line, and rewrapping it
arbitrarily is wrong.

Actually, my software deals with it moderately poorly.  Depending on
exactly which piece is handling the text in question, it either wraps
with no regard to word boundaries or truncates - I'm not sure whether
this counts as tak[ing] care of it for [me] or not.

When mail displays uglily because my software doesn't know how to
interpret correctly-marked mail, I don't mention it - but when the mail
isn't marked as rewrappable, it is hardly a UA fault to not rewrap it.

Again, I'll be manually repairing the damage for purposes of this email.

 What about embedded?  [...]
 What about machines with multiple keyboard/screen heads [...]
 I'd argue that embedded is a degenerate case of lights-out, [...]

Certainly defensible.

 The multi-bottle+keyboard (and possibly mouse, though last we met,
 there was only one of you ...) is arguably the standard multi-user
 timesharing system set up, with a little more complicated terminal

Hm.

I think I agree.

 The handling could even be look up the appropriate language for this
 message to match what the users of this system know how to read,
 e.g. catgets(3) in NLS message catalogs.  See?  i18n handled!

I think it's more like "i18n handwaved", but never mind.

 (OK, except for the translation part, but I'll put on an MIT X11 hat
 here and say, mechanism, not policy!)

Agreed.  For unclassified text messages, I think just passing the text
message to userland (for display, translation, ignoring, checking
against admin-configured swatch-style patterns, whatever) is about as
good as we're likely to get.

/dev/log basically _is_ that answer, though; that part's already done.


Mail.app idiocy [was Re: UNIX kernel notification system]

2011-10-03 Thread Mouse

 [Do you really mean to use paragraph-length lines?  [...]]
 less(1) (or more(1)) doesn't take care of you?
 [...]
 I assume you're using Mail.app as a user-agent.  Apple used to do
 this right -- they wrapped the lines at 72 columns or thereabouts,
 but then marked the text as format=flowed in the MIME headers, so
 readers wishing to rewrap it could do so.

 In recent versions they've broken this again.  Woo hoo.  Because, you
 know, all the world's a Mac and every user-agent is Mail.app.

There actually is approximately zero chance this will get fixed.  I
know someone who works at Apple (I have no reason to think he's on this
list) who checked their logs and tells me the breakage was done for
Exchange-compatibility reasons.  I would have hoped Apple would have
had the balls to close the bug report with a "cannot fix; it's Exchange
that's broken here" response, or _at least_ provide a "bug
compatibility with Exchange" tickbox, but apparently they prefer to
just ship broken software, silently gulling their users into inflicting
second-hand Microsoft brain damage on the rest of the net.

It doesn't help that most Apple users - heck, most users - aren't
competent to understand the issue and thus don't see what the problem
is.  (I hasten to add that Erik is not on that list of "most [] users"
who I fear are not competent to understand the issue.)

The current Mail.app behaviour is broken enough in its own way.  Try
sending a nicely-laid-out table like

Host    Size    Connection
frodo   74290   10-only
bilbo   81288   10/100
samwise 41442   10-only
sauron  940061  10/100/gig
aragorn 286166  10/100
merry   40447   10/100

to such a user and watch it get converted into

Host Size Connection frodo 74290 10-only bilbo 81288 10/100 samwise
41442 10-only sauron 940061 10/100/gig aragorn 286166 10/100 merry
40447 10/100

One of my correspondents at work has exactly this problem when I send
nicely formatted text.

 This is, admittedly, less evil than the Android mail client's
 unalterable behavior of base64-encoding *all* message data, even that
 which would be perfectly readable as plain ASCII text.

I don't think so.  That at least is correctly labeled and thus _can_ be
mechanically corrected for, by those for whom it needs correction.
This is a case of Mail.app suckering its users into sending out
mislabeled mail without even telling them it's doing so.


Re: fs-independent quotas

2011-10-21 Thread Mouse

 (ufs is unix filesystems, isn't it ?)

On the few occasions when I've seen it expanded (usually in Sun
documentation - my impression is that the name came to BSD from Sun),
the U has been expanded to User.

However, regardless of the expansion, the name has come to refer to
what is perhaps more properly called FFS, and I think using the ufs
name as part of something that is filesystem-independent is a mistake.
If nothing else, it will confuse humans.


Re: fs-independent quotas

2011-10-21 Thread Mouse

 Nor in the tree-based dictionary, or in the multidimensional array.
 No, in an array the unused locations do exist.
 I don't understand this.  If you have a 2-dimensional array
 quota[id][type], and quota[class=group] doesn't exist for this
 filesystem, you have quota[class=group]=NULL and no memory associated
 with it.

Not no memory.  The memory for that pointer, the one that's nil, still
exists.

If you use an array, representing a quota for id=0 and a quota for
id=99 requires 100 array elements, even if 98 of them are
nil.  With a suitable sparse data structure, the memory cost
is...substantially lower.
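
As a sketch of the sparse alternative, here is about the simplest
possible one: a linked list keyed by id.  The struct and function
names are invented for the example, and a real implementation would
more likely use a tree or hash.

```c
#include <stdlib.h>

/* Hypothetical quota record; the fields are invented for illustration. */
struct quota_ent {
	unsigned long id;
	unsigned long limit;
	struct quota_ent *next;
};

/* Set the limit for id, allocating a node only if id is new; 0 or -1. */
int quota_set(struct quota_ent **head, unsigned long id, unsigned long limit)
{
	struct quota_ent *q;

	for (q = *head; q != NULL; q = q->next) {
		if (q->id == id) {
			q->limit = limit;
			return 0;
		}
	}
	q = malloc(sizeof(*q));
	if (q == NULL)
		return -1;
	q->id = id;
	q->limit = limit;
	q->next = *head;
	*head = q;
	return 0;
}

/* Number of nodes actually allocated. */
size_t quota_count(const struct quota_ent *head)
{
	size_t n = 0;

	for (; head != NULL; head = head->next)
		n++;
	return n;
}
```

Quotas for id=0 and id=99 cost two nodes here, against 100 slots in
the array.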


Re: Patch: rework kernel random number subsystem

2011-10-22 Thread Mouse

 The critical values for the statistical tests are set so that
 p=.0001, so there should be one false positive (the null hypothesis
 being that the data _are_ random) in 10,000 rekeyings.  In that case
 the right thing to do is simply to rekey -- though for a hardware
 generator that fails the test, the conservative thing to do, I
 believe, is to detach that particular random source, so that is the
 behavior I intend to leave in place in that case.

Conservative, but not necessarily correct.  Some systems stay up a
long time, and if working hardware RNGs get auto-detached whenever a
1-in-10000 test trips, long-lived systems _will_ lose their RNGs.  I
think this is suboptimal.

Indeed, a hardware RNG that _didn't_ fail that test once in a while
would be suspect.
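
A rough calculation of how quickly that happens, assuming the tests are
independent and each has false-positive rate p, so that the chance of
at least one trip in n tests is 1 - (1 - p)^n.  The function name is
invented for the example.

```c
/*
 * Probability that at least one of n independent tests, each with
 * false-positive rate p, trips on a healthy generator: 1 - (1 - p)^n.
 */
double p_any_false_positive(double p, unsigned long n)
{
	double none = 1.0;	/* probability that no test has tripped so far */
	unsigned long i;

	for (i = 0; i < n; i++)
		none *= 1.0 - p;
	return 1.0 - none;
}
```

With p = .0001 this is roughly 0.63 after 10,000 rekeyings and keeps
climbing toward 1 as n grows.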


4.x -> 5.x locking?

2011-11-08 Thread Mouse

I've got some questions about locking.

I've got some kernel code to move from 4.x to 5.x.  Looking at it, I
can see it effectively assumes the kernel is giantlocked: it assumes
that at most one CPU is executing in the kernel.  (This particular code
never runs in device interrupt context, though it may get called from
callouts and the like.)  I seem to recall being told, when I asked more
or less this question on the lists some time back, that this was a safe
assumption under 4.x - and, indeed the thing seems to work on 4.x.

Thus, my first question: is this also true of 5.x?

I found mutex(9), condvar(9), and the like.  But it is not clear to me
what I need to do to be MP-ready.  Do I need to use the stuff from
mb(9), or membar_ops(3), or what?  It's not clear from the manpages
whether, for example, membar_enter is usable within the kernel; the
reference from mutex(9) seems to imply so, but I've been surprised
before.  It's also not clear whether it would even work; I see no
statements promising that if I, for example, do

        mutex_enter(mtx);
        ...update a data structure...
        membar_sync();
        mutex_exit(mtx);

that the updates will necessarily be visible to another processor which
later takes the same mutex; membar_sync() is specified to synchronize
memory accesses with respect to other memory accesses, not necessarily
with respect to (for example) mutex operations, and it's not clear
whether the "other memory accesses" include accesses by other
processors.  I could have the other processor do a membar_enter() after
taking the mutex, but, again, it's not clear whether the accesses the
manpage talks about refer to "this CPU" or "any CPU".  ("Any CPU" is
more useful here (and probably more expensive), but "this CPU" is what
I'd expect from what I've read of memory barriers in CPU documentation.)

The mb(9) page specifically warns that it does not entail any promises
about pushing stores to visibility by other processors, so I don't
think it's useful here - am I wrong?

And, finally, with reference to the membar_ops(3) page, what does it
mean for a load to reach global visibility?

Re: 4.x -> 5.x locking?

2011-11-09 Thread Mouse

  membar_sync();
  mutex_exit(mtx);
 mutex_enter and mutex_exit are implicit memory barriers (reads and
 writes respectively are not allowed to be reordered).

Oh!  Thank you.  Has that made it into mutex(9) in -current?  If not, I
offer my opinion that it should.

Does mutex_exit also implicitly push writes to main RAM, or whatever
else is necessary to make them visible to other CPUs?  (A reordering
barrier does not necessarily imply a global visibility barrier.)

Re: 4.x -> 5.x locking?

2011-11-09 Thread Mouse

 You can still have non MP-SAFE drivers in netbsd-5 these days.  If
 you do not set D_MPSAFE flag they will be giant-locked[1], [2], [3].

Some of the code is not a device driver, in the sense that it is
entered other than via {b,c}devsw[] entries or interrupts.

 [...mutex...condvar...]
 b) In cv_wait mtx is used this mutex is released before thread went
 to sleep and acquired before it's woken up,

Yes.

 which means that you can safely do required work in side mtx
 guarded producer area.

Not quite.  On modern MP systems, mutual exclusion such as you outline
(which affects flow-of-control only) is not enough; you also need
memory barriers.  Joerg Sonnenberger just said that mutex_enter() and
mutex_exit() include the appropriate reordering barriers.  I don't yet
know whether they include global visibility barriers (data cache pushes
on this CPU and invalidates or snoops on others) or not - you may
have seen my note to the list asking - but those are needed too; if
they are part of the mutex routines, then your skeleton code is
correct, though your explanation omits part of the reason why.

Re: 4.x -> 5.x locking?

2011-11-09 Thread Mouse

 However, since we aren't talking about non-cache-coherent
 architectures (which require even more manual manipulation) it's only
 about access reordering in the memory hierarchy.

I'm not totally clear on what cache coherency is.  Based on these
remarks, I'm going to guess that a cache-coherent architecture is one
on which, as far as the model visible to the programmer (including
kernel programmer) goes, it is not possible to have conflicting data in
two CPUs' caches: either different CPUs don't have distinct caches, or
there is automatic cache update and/or invalidation in hardware (at
least optionally, and if it's optional then NetBSD runs the hardware in
that mode).

Correct?  If so, that completely annuls the hairiest of my worries.

Re: fs-independent quotas

2011-11-13 Thread Mouse

 The arguments that ufs_quota_entry (or whatever its name is) will be
 good enough for any future filesystem is just not true.

You have asserted that.

Proof by repeated assertion is...unconvincing.

Not that I think numeric IDs will be good forever.  But, given the lack
of any _other_ filesystem interfaces that represent such things as
strings, I think they are far enough off that it is much too early to
try to design them in here.

Re: bumping ARG_MAX

2011-11-14 Thread Mouse

valkyrie% grep foo */*/Makefile
 Use: grep -r --include Makefile foo .

That (a) will include Makefiles at other depths than two (which may not
be a problem in the specific example of pkgsrc, but in general makes it
non-equivalent), (b) is grep-specific, and (c) will walk the whole tree
to full depth even if there aren't any more Makefiles for it to find.

I think the point still stands.
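
Points (a) and (c) can be seen in a small Python sketch.  The directory
layout and function names are made up for illustration; glob stands in
for the shell's */*/Makefile expansion and os.walk for a recursive
search.

```python
import glob
import os
import tempfile


def build_tree(root):
    """Create Makefiles at depth 2 and depth 3 under root (made-up layout)."""
    for rel in ("cat/pkg/Makefile", "cat/pkg/sub/Makefile"):
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write("foo\n")


def depth_two(root):
    """Like the shell glob */*/Makefile: exactly two levels down."""
    return sorted(glob.glob(os.path.join(root, "*", "*", "Makefile")))


def recursive(root):
    """Like a recursive search: visits every directory at every depth."""
    found = []
    for dirpath, _dirs, files in os.walk(root):
        if "Makefile" in files:
            found.append(os.path.join(dirpath, "Makefile"))
    return sorted(found)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        build_tree(root)
        print(len(depth_two(root)), len(recursive(root)))
```

The recursive version both finds the deeper Makefile and has to visit
every directory to do so; the fixed-depth pattern does neither.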

Re: Patch: new random pseudodevice

2011-12-09 Thread Mouse

 In what sense are bits really ever taken out?

Revealed to userland, of course.

The idea here is that entropy that has been revealed to userland might
as well not be present.  With good mixing at appropriate points, this
is of questionable truth, but it is, as you said, a very conservative
approach; it amounts to assuming userland has unlimited computational
power available to invert the mixing.  Combined with the conservative
approach to estimating how much entropy was put into the pool, it is a
reasonably good way of making sure that when you ask for strongly
random bits, you get strongly random bits uncorrelated with anyone
else's bits.  Your change loses this property, depending on something I
might call an "entropy stretcher", something which takes some number of
random bits and produces a much larger number of no-longer-random bits:
essentially, even the supposedly-strongly random device becomes just a
PRNG.  (A complex one among PRNGs, but still a PRNG.)

 If there is some kind of correlation between the bits you get from
 the pool now and the bits you got from the pool then, the right
 answer is not to put more bits in and hope the correlation gets
 worse; it is to correct the output function so that finding such a
 correlation is actually cryptographically hard.

It's true that better mixing on output is a good thing.

However, it does not fix the fundamental problem that you can't get out
more information than you put in.  Even a _good_ PRNG can't avoid
correlated output bits, even if the correlation is complex enough to be
hard to exploit.  You are replacing a very conservative and
well-defined concept (the amount of unrevealed information remaining in
the pool), even if a somewhat misleading term ("entropy") is used for
it, with a vague hope/belief that your PRNGs are hard to invert.

 Before, bits were extracted from the pool with a construct nobody
 had really studied, and we counted every bit output as if it had been
 somehow "consumed".  Even though we didn't actually understand what
 "consumed" meant.

Maybe you didn't.  I thought it was perfectly clear: information
exposed to userland reduces the amount of secret random information
content remaining in the pool.
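
As a toy illustration of that bookkeeping (a sketch only, not the
kernel's code; the class and method names are invented here), the
accounting amounts to crediting a conservative estimate on input and
debiting every bit revealed on output:

```python
class PoolAccount:
    """Toy ledger for the unrevealed information left in a pool."""

    def __init__(self):
        self.unrevealed_bits = 0

    def credit(self, estimated_bits):
        """Input arrives: count only the conservative entropy estimate."""
        self.unrevealed_bits += estimated_bits

    def debit(self, requested_bits):
        """Output is revealed: grant no more bits than remain unrevealed."""
        granted = min(requested_bits, self.unrevealed_bits)
        self.unrevealed_bits -= granted
        return granted


if __name__ == "__main__":
    acct = PoolAccount()
    acct.credit(64)
    print(acct.debit(48), acct.debit(48), acct.unrevealed_bits)
```

A reader asking for more than the ledger holds gets a short answer (or,
in a blocking design, waits) rather than stretched output.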

In practice, I doubt your changes weaken it much...yet.

In theory, they are pretty horrible.  Information content is a fairly
well-defined concept, and the old code took a conservative approach to
measuring it and doling it out.  You are replacing that with something
that appears to think it can turn a small amount of information into a
large amount, which is not possible; the information content of the
output of your per-device PRNG cannot be more than the amount of
information you keyed it with, even if the correlations are currently
difficult to see.

I would welcome better mixing on output.  But this information
stretching for the supposedly-strongly-random device is, in my opinion,
just plain broken.

 And note that at least one highly-thought-of modern design for an
 entropy collector (Fortuna) doesn't even _try_ to keep an entropy
 estimate

Because one popular system makes a mistake, we should make the same
mistake?  (Actually, see my last paragraph, below.)

 -- the whole concept is pretty fuzzy when you start trying to count
 how many bits you took out.

Not fuzzy at all.  Read "unrevealed information content" for "entropy"
and it makes a whole lot of sense.  The "number of bits you took out"
is the number of bits of information revealed to elsewhere.  (The
amount of information content, not necessarily the number of bits of
apparent information - if you feed 32 bits of information into a hash
function, you get at most 32 bits of information content out, even if
they're spread across multiple hundreds of output bits.)
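
The parenthetical can be checked by brute force on a small scale.  The
sketch below uses SHA-256 and 8 or 12 input bits instead of 32 purely to
keep the run short; the function name is made up for illustration.

```python
import hashlib


def distinct_outputs(input_bits):
    """Hash every input_bits-bit value; count the distinct 256-bit digests."""
    width = (input_bits + 7) // 8
    digests = {
        hashlib.sha256(value.to_bytes(width, "big")).digest()
        for value in range(2 ** input_bits)
    }
    return len(digests)


if __name__ == "__main__":
    for bits in (8, 12):
        print(bits, distinct_outputs(bits), 2 ** bits)
```

Each digest is 256 bits long, but with k bits of input there can never
be more than 2^k different digests, so the output carries at most k bits
of information.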

It's possible there's something going on here I don't understand, which
invalidates these arguments.  I'd welcome any pointers to such a thing.
But until then, I'm going to stick with the information-theoretical
point of view that you can't get more information out than you put in,
and call this "key a PRNG and then generate more bits of output than
there were in the key" implementation of the supposedly-strongly-random
device broken.

Re: Patch: new random pseudodevice

2011-12-09 Thread Mouse

[I'm pulling together multiple mails from tls here.  The second-level
quotes are from varying people; I've marked their authors according to
the info I have.]

[Mouse]
 Revealed to userland, of course.

 Combined with the conservative approach to estimating how much
 entropy was put into the pool, it is a reasonably good way of making
 sure that when you ask for strongly random bits, you get strongly
 random bits uncorrelated with anyone else's bits.
 Look at the implementation.  It *never* worked that way.

It came reasonably close.

 To cause bits to actually be taken out, you'd have to maintain two
 pools, discard the entire contents of one every time any bits were
 revealed to userspace, and switch to the other.  Or something along
 those lines.  And that's just not how it ever worked.

To be certain of it, yes.  It should have been done that way, and I
would support changing it to be done that way.

What it actually was is a hybrid.

[Alan Barrett]
 Fair enough, but you still seem to be talking about how good a
 CSPRNG it is, whereas my concern is that it's pseudorandom, not
 random.

 So was the output from the old entropy pool.

Only sort of.

 As soon as you start accumulating random bits in any manner that
 leaves the old ones in -- that is, does not entirely eliminate them
 as inputs to the accumulator function -- even after you take them
 out -- that is, disclose the accumulator function's output -- you
 are dealing in precisely what you say you want not to:
 cryptographically secure pseudorandomness.

Not entirely, in two respects: (1) provided you don't draw on the pool
for more bits than it has information content, you are in principle
getting information out.  If the mixing function is good, you will
actually be doing so.  (2) If you are stirring new randomness into the
pool in the meantime, it...complicates things; it is no longer a pure
PRNG.  To an approximation as good as the mixing function is good, you
can draw on the pool as much as you like, provided you never draw
enough to reduce the information content to zero.

I'm not particularly happy with the old mixing function; see below.

I would much prefer to do what you suggest...

 To get the property you seem to want, you basically have to buffer
 the purportedly true randomness into pools, blocks, [...]

...to accumulate entropy into unread blocks and release it to the rest
of the subsystem in blocks, to be consumed as such.  Preferably, it
should be whitened as it's accumulated into blocks.

_That_ would actually be fixing one of the potential problems the
current system has, without introducing more.
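
A minimal Python sketch of that block-at-a-time idea follows.
Everything in it is an assumption made for illustration: the 256-bit
block size, SHA-256 as the whitening step, and the class and method
names.  It is not the design from the patch or from any existing kernel.

```python
import hashlib
from collections import deque

BLOCK_BITS = 256  # assumed block size: one SHA-256 digest per block


class BlockPool:
    """Toy accumulator that releases whitened entropy one whole block at a time."""

    def __init__(self):
        self._hasher = hashlib.sha256()
        self._estimate = 0      # conservative estimate of bits accumulated
        self._ready = deque()   # finished blocks, each handed out only once

    def add(self, sample, estimated_bits):
        """Mix a raw sample in; close the block once enough entropy is claimed."""
        self._hasher.update(sample)
        self._estimate += estimated_bits
        if self._estimate >= BLOCK_BITS:
            self._ready.append(self._hasher.digest())
            self._hasher = hashlib.sha256()  # start afresh; old input is gone
            self._estimate = 0

    def take_block(self):
        """Return one unread block, or None if the caller would have to wait."""
        return self._ready.popleft() if self._ready else None
```

Because each block is built from fresh input and handed out once,
nothing already revealed remains as an input to later output.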

 Let me put it this way: before, you may have thought you were getting
 some kind of true randomness.  You weren't.

But you were, to a decent approximation, assuming the input entropy
estimate was not an overestimate.

You appear to think that anything that's not pure true randomness must
be pure PRNG.  The old randomness pool was neither; it was a hybrid.
If the mixing function is good - and that's the first weakness; we have
only heuristic reasons to think it is - then to a correspondingly good
approximation, it returned real randomness - bits uncorrelated with
anything else - as long as the actual entropy-in-pool was greater than
the number of bits returned.  This is the second weakness; nobody
really knows how good the estimate of the entropy supplied by input is.
Personally, I suspect it's an underestimate, but I have no particular
evidence for that; as long as I'm right, it's OK in this respect.

 Now, you still aren't, but at least what sits between you and the
 entropy source is a lot more clear, and a lot better analyzed

I don't know; it sounds a lot less clear to me.

I'd still rather fix it right.  If you go ahead with inflicting a PRNG
on /dev/random, it really really needs to be prominently marked as
being a PRNG even for the stronger device, since that's a nontrivial
regression over previous versions (which may not have been perfect, but
did return at least _some_ real randomness in the bits it produced, the
exact amount depending on how good the mixing was and how good the
input estimates were).

 The bottom line: pseudorandomness is the best you're going to get.

With your design, perhaps.  But you actually outlined a design that
does not suffer from those problems; why not do that instead?

 It might as well be done in the safest way possible,

That's fine, for /dev/urandom.  If you want to hook what you've
outlined, or something like it, up to /dev/urandom, that's fine.

For /dev/random, I consider it a bug.  That the old system was
partially broken (to an extent nobody really knows) does not excuse
replacing it with something even more broken.

 I have been trying to follow the Yarrow/Fortuna design paradigm of
 rekeying the stream generator from the entropy pool at each
 request.  [And asking, what's a request?]

 However, when applications use /dev/random, we could consider a
 request to be a single read from

Re: Patch: new random pseudodevice

2011-12-09 Thread Mouse

 You are aware of the fact that 99.99% of computers don't have true
 random number generators and the bits you claim that are random are
 not random at all?

Actually, practically all computers have true random number generators.
The first problem is that neither they nor their interfaces are
designed as such, so getting the randomness out of them and into the
system is...interesting.  The second problem is that nobody really
knows just how good the resulting randomness is - that is, while there
is true randomness there, nobody knows just how much information
content there is in each random bit.  (The latter is one reason for
whitening input bits as they are gathered.)

These random number generators are things like the turbulence inside
disk drives and the noise in sound input.

Re: Patch: new random pseudodevice

2011-12-09 Thread Mouse

 a small one, but
even the small one has sufficient entropy for your purposes.

 Notably, [Mouse's opinion] differs from the opinions of the people who
 wrote the several relevant FIPS and X9 standards, who _require_ that
 cryptosystem keys be generated by an approved DRBG (their terminology
 for a CSPRNG) -- though they also impose minimum entropy requirements
 for keying the DRBG itself -- and of SP800-90, which explicitly
 discusses this issue.

I don't know why they chose that.  Perhaps the standards body simply
made a mistake.  Perhaps they had to compromise for any of many
possible reasons.  Perhaps they simply considered the minimum input
entropy to be enough for the purpose the standard is intended for and
just use the PRNG as a whitening and stretching function.  But I value
information-theoretic considerations, such as any deterministic
computation's inability to contain more information in its output than
is present in its input, over any standards body's output.

[Paul Koning, quoting me]
 [uses for RNGs]
 (1) Strong bits suitable for direct use as things like crypto keys.
 Using a PRNG here, even a really good one, is a major fail.  The
 only time it's acceptable is when the data drawn is no larger than
 the PRNG key, and then you might as well return the bits directly.

 I don't think this is correct.

 One thing to keep in mind is that the current standard of quality for
 a cipher is that its output is indistinguishable from a random string
 (up to a length limit, 2^blocksize or 2^(blocksize/2), I'm not sure
 which).

Computationally indistinguishable, today.  It is never theoretically
indistinguishable, as can easily be seen by considering trying all
possible keyings and seeing if any of them match.
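
On a toy scale the exhaustive search looks like the sketch below.  The
generator is an invented stand-in (SHA-256 in counter mode with a
deliberately tiny 12-bit key), not any real cipher or anything from the
patch; the names are made up for illustration.

```python
import hashlib
import os

KEY_BITS = 12  # deliberately tiny so the search finishes instantly


def stretch(key, nbytes):
    """Toy keyed generator: SHA-256 in counter mode (illustration only)."""
    out = b""
    counter = 0
    while len(out) < nbytes:
        block = key.to_bytes(2, "big") + counter.to_bytes(4, "big")
        out += hashlib.sha256(block).digest()
        counter += 1
    return out[:nbytes]


def looks_generated(sample):
    """Return True if some key reproduces the sample exactly."""
    return any(stretch(k, len(sample)) == sample for k in range(2 ** KEY_BITS))


if __name__ == "__main__":
    print(looks_generated(stretch(1234, 64)))  # output of the toy generator
    print(looks_generated(os.urandom(64)))     # independently drawn bytes
```

With a realistic key size the same search is computationally out of
reach, which is exactly the difference between "computationally" and
"theoretically" indistinguishable.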

This is why I'm hammering on the information-theoretic considerations:
they are fundamental, not subject to change with advances in
cryptanalysis.  The security of bits drawn from a properly-designed
entropy pool depends on much weaker assumptions than the security of
bits produced by a PRNG (when the PRNG seed is smaller than the number
of bits produced).

In practice, today, what Thor is proposing (or, in view of what he's
said, perhaps I should say "imposing") is probably good enough for most
purposes.  It is not good enough in theory.  That's why it does not
satisfy me, especially when there is an easy way to get something that
is as good, theoretically, as is available - indeed, Thor himself
outlined it.

[smb]
 In my opinion ([and presumably others']), a CSPRNG is more secure.
 Why?  Because we *know* what it does, all the time.  True RNGs are
 devilishly hard to get right, and are susceptible to all sorts of
 environmental perturbations.  Imagine what would happen if someone
 upgraded the disk to a flash disk or one with a large flash cache

You still need a true RNG (to seed your PRNG), though, or you get
predictable bits.

A CSPRNG makes a good mixing function.  But that's really all it's
doing, because that's all it's capable of doing.

In any case, it's no skin off my nose.  I have plenty of other reasons
for not using whichever NetBSD this ends up in.  I've pointed out the
problems; if NetBSD is determined to carry on regardless, that's its
lookout.

Re: Use consistent errno for read(2) failure on directories

2011-12-10 Thread Mouse

 According to the online OpenGroup specification for read(2)
 available at [1], read(2) on directories is implementation
 dependant.  If unsupported, it shall fail with EISDIR.

 Not all our file systems comply, and return random errno values in
 this case (mostly EINVAL or ENOTSUP).

How does that not comply with "implementation dependent"?  From a
standards-conformance point of view, that's equivalent to "in this
implementation, read(2) on directories is supported: on $FILESYSTEM, it
always returns EINVAL; on $OTHER_FILESYSTEM, it works according to
$REFERENCE; on $THIRD_FILESYSTEM, it always returns EOPNOTSUPP".

This is not to say that it shouldn't be cleaned up.  Just that I don't
think it's actually nonconformant.
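
What a given system and filesystem actually do is easy to probe from
userland.  A small Python sketch (the function name is made up; the
result depends on the OS and on the filesystem holding the directory):

```python
import errno
import os


def errno_from_directory_read(path="."):
    """Try read(2) on a directory; return the errno name, or None on success."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.read(fd, 16)
        return None
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        os.close(fd)


if __name__ == "__main__":
    print(errno_from_directory_read("."))
```

Running it against directories on different filesystems shows whether
they agree on the error, or whether the read succeeds at all.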

Re: Debian OpenSSL desaster (was: Patch: new random pseudodevice)

2011-12-11 Thread Mouse

 [I tried to send this as private mail, but get

 host Sparkle-4.Rodents-Montreal.ORG[216.46.5.7] refused to talk to me:
 550-.de's whois server, whois.denic.de, is completely broken, [...]

I wrote up a point-by-point reply to this, but then realized, this is
tech-kern, not tech-broken-network-governance.  So I'll confine myself
to saying my response is at
{ftp,http}://ftp.rodents-montreal.org/mouse/ccTLD-thoughts.txt for
anyone interested.  (Actually, "will be at"; as I send this mail, I'm
still writing it - the draft is available at
.../ccTLD-thoughts-draft.txt and I'll move it when I'm done.)

As for the content...

 I don't recall full details, but I think it was a Linux distro
 It was the Debian OpenSSL desaster.  In essence, they patched
 OpenSSL's entropy gathering to the point where the PID was the only
 entropy source being used.

Ah.  Yeah, that'll do it.  Thanks for the correction; I'm not surprised
I got some of the details wrong - but the actual incident works just as
well for the argument I was making with it.

Re: Debian OpenSSL desaster (was: Patch: new random pseudodevice)

2011-12-11 Thread Mouse

 [...]
 The short answer is that Mouse likes tilting at windmills. :-)

Eh.  I think that is at least a little of a misstatement.  I don't do
such things because I enjoy doing them.  Quite the opposite.

I do them because I must.

I'm not entirely sure what I mean by that.  It's difficult to explain,
even to myself.

Re: Lost file-system story

2011-12-12 Thread Mouse

 [...], you do indeed seem to think that async-mounted Unix-based
 filesystems should be able to be repaired, at least some of the time,

There's a huge difference between "this isn't promised" and "this never
happens".

They _can_ be repaired...some of the time.  When they can, it is
because, by coincidence, it just so happens that the stuff that got
written produces a filesystem fsck can repair.

 The probability of any Unix-based filesystem being repairable after
 a crash is zero (0) if it has been mounted with MNT_ASYNC, and if
 there was _any_ activity that affected its structure since mount time
 up to the time of the crash.

This is simply false.  I just tried it.  On a 5.1 i386 system, I used
fdisk and disklabel to make a half-gig partition, newfsed it, mounted
it normally, copied a file into it, unmounted it, mounted it async,
removed the file, and hit the power switch.  After the machine came
back up, I tried fsck on the filesystem.  It said it was clean.  I used
fsck -f.  It was happy.  I mounted it and, as far as I can tell, fsck
was correct in thinking the filesystem was OK.  So, there is an
existence-proof-by-example that there are circumstances under which a
filesystem mounted async can be changed and still be left in a state
fsck can repair.

 It still might survive after some types of changes, but it _probably_
 won't.

Right.  But that's not "probability ... is zero (0)".

 Linux ext2 is not a Unix-based filesystem and Linux itself is not a
 Unix-based kernel.

It's about as Unix-based as NetBSD is.  Unless you mean something
strange by "Unix-based" - what _do_ you mean by it?

 For Unix-based filesystems and their repair tools, any probability
 of recovery less than one is as good as if it were zero.

That's not how I feel about it when I've lost a filesystem.  I'll take
a filesystem with a nonzero probability of recovering something useful
from over one that guarantees to trash everything any day (other things
being equal, of course).

Re: ccTLD filtering (was: Debian OpenSSL desaster)

2011-12-12 Thread Mouse

 You can make [your] point, but you won't win against Mouse as he just
 doesn't care outside of his wall [...]

Yeah.

I used to.  Then I realized that it was sucking away a huge amount of
time, energy, and stress tolerance, for, as far as I could tell, zero
benefit to anyone, including me.

Re: Lost file-system story

2011-12-12 Thread Mouse

 They _can_ be repaired...some of the time.

 That's totally irrelevant.

I don't think so, not when I'm replying to a claim otherwise.

 Possibilities other than zero or one are not useful in manual pages,

Then we can throw away fsck, because there is always _some_ chance the
filesystem will be irreparable.  Memory, CPUs, disks, and the
transports between them do fail, occasionally transiently.

Re: RFC: import of posix_spawn GSoC results

2011-12-19 Thread Mouse

 What's clean about importing the VMS process model to Unix?

That's hardly the VMS process model - or at least it wasn't back in the
'80s when I used VMS.  In particular, in the VMS paradigm, the CLI (as
close as VMS gets to the shell) and the program being run run in the
same process.

Re: RFC: import of posix_spawn GSoC results

2011-12-20 Thread Mouse

 What's clean about importing the VMS process model to Unix?
 That's hardly the VMS process model - or at least it wasn't back in
 the '80s when I used VMS.  In particular, in the VMS paradigm, the
 CLI (as close as VMS gets to the shell) and the program being run
 run in the same process.
 Well, the process model as such will not change just because another
 system call is introduced in Unix.

No, of course not.  That's why I think talking about "importing the VMS
process model" is irrelevant - that's not what's happening.

 Also, the CLI is not really related to the process model either.

Well, it is in that the CLI and the program being run run in the same
process, which means that process is very long-lived.  Starting new
processes under VMS is - well, was - a very heavyweight operation, far
more costly than fork() under Unix.  The paradigm was, a process was
created on login and it lived until logout; that the CLI inhabits the
same process as programs you run is relevant only in that it eliminates
one of the principal reasons Unix needs lots of processes.

 However, yes, this system call looks like it comes very close to how
 tasks are run under VMS.  Even down to the name.

LIB$SPAWN, I think it was.  But I don't know about "very close"; aside
from relatively trivial things, like file descriptors having no
particularly close analog, the real problem is that LIB$SPAWN is not
how most things are - were - run under VMS.  Rather, everything runs in
the same process, with each program you run replacing the previous one.
Additional processes are necessary only if you want to run something
detached, which is not common.

Or at least, that's how it was.  With POSIX's dominance, it would not
surprise me if VMS had tried to jump on the bandwagon and support the
Unix paradigm at least to the extent necessary to implement many of the
POSIX interfaces.  You've probably used more recent VMS than I have;
has this happened?

 I won't go into the pros and cons of different ways of starting a new
 process to run something.

And there we are: under VMS, you don't - well, didn't, back in the '80s
when I used it - start a new process to run something.  You ran it in
the same process you ran everything else in.

Re: RFC: import of posix_spawn GSoC results

2011-12-20 Thread Mouse

 From the annals of the POSIX wars:  the rationale for posix_spawn()
 was to support systems without MMUs, where fork() is expensive, and
 vfork() impossible.

I would quibble with calling vfork() `impossible'.  Perhaps I'm missing
something, but vfork() seems particularly well-suited to such a system
to me - the "borrow the VM" semantics strike me as exactly what you
want when context-switching is expensive.  (Though of course the `V' of
`VM' is a bit of a misnomer in that circumstance.)

In any case, even if I'm wrong, vfork isn't impossible, just, at worst,
ludicrously expensive. :-)
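
For anyone who has not met the interface being argued over, here is a
short Python sketch of the posix_spawn() pattern: the child is created
and the new program started in a single call, with no separate
fork()/vfork() step visible to the caller.  It uses Python's
os.posix_spawn wrapper (Python 3.8 or later, POSIX systems only); the
helper name is made up.

```python
import os
import sys


def spawn_and_wait(argv):
    """Start argv[0] via posix_spawn() and return the child's exit status."""
    pid = os.posix_spawn(argv[0], argv, os.environ)
    _pid, status = os.waitpid(pid, 0)
    # Report -1 if the child was killed by a signal instead of exiting.
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


if __name__ == "__main__":
    print(spawn_and_wait([sys.executable, "-c", "raise SystemExit(7)"]))
```

How the call is implemented underneath (fork, vfork, or something else
entirely) is up to the system, which is the point of the interface.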

Re: NetBSD/usermode (Was: CVS commit: src)

2011-12-22 Thread Mouse

 This all seems simple and elegant enough, but it does not (quite) work:

   A) It still requires a new system call on the outer kernel.
  *Perhaps* this could be avoided by using ptrace, which might
  be simpler with this approach because the rule is simple:
  just say no to all system calls.

Well...sort of.

   B) There is no way for the usermode userspace process to
  allocate memory.  I don't really see a clean way to fix
  this:

ptrace can support this too.  It can let sbrk/mmap through, but tell
the usermode kernel as it does so.  Or it can consult with the usermode
kernel first, and then let them through in a possibly modified form.

   2) Using ptrace to allow, but validate, sbrk and mmap
  arguments seems questionable at best.  How would
  this interact with the NetBSD VM system in the
  usermode kernel?

With difficulty. :)  I am confident this could be dealt with; if
necessary, sbrk and mmap could be intercepted and turned into different
calls of some sort.  SysV shm calls?  mmap() of /proc/something?

I've long thought that something akin to SCM_RIGHTS should exist for
passing memory regions between unrelated processes.  That would come in
extremely handy here.  (But given how badly SCM_RIGHTS got botched, it
probably would end up exploding somehow.)

   4) How exactly does the usermode kernel _end_ the
  usermode userspace processes in a clean way?

Use ptrace to force it to call exit(), and let the exit() call through
to the real kernel.  Or just use PT_KILL.

 Working through it makes me really wonder whether there's _any_
 portable way to do this stuff.

Sure - by instruction-level emulation if naught else.  (Not a great
way, but it certainly can work.)

raidframe rebuild question

2011-12-24 Thread Mouse

I've got a bit of a practical issue with raidframe.

The machine is at 4.0.1.  The RAID devices are

raid0: L5 /dev/raid5e /dev/raid6e /dev/raid7e /dev/raid4e /dev/raid9e
       /dev/raid10e /dev/raid11e[failed] /dev/raid12e /dev/raid8e
raid1: L1 /dev/raid2e /dev/raid3e
raid2: L1 /dev/ld0e
raid3: L1 /dev/ld5e /dev/wd3e
raid4: L1 /dev/ld8e
raid5: L1 /dev/ld2e
raid6: L1 /dev/ld4e
raid7: L1 /dev/ld3e
raid8: L1 /dev/wd4e
raid9: L1 /dev/ld1e
raid10: L1 /dev/ld7e
raid11: L1 /dev/ld6e
raid12: L1 /dev/wd2e

Just recently, /dev/ld6e decided it didn't like us any longer.
(Actually, I think it is probably the twe it's connected to, not ld6
itself.)  I manually failed /dev/wd3e in raid3 and added it as a spare
to raid11, but now I find myself stymied as to how to get it to
rebuild.  raid11 is of course failed in raid0; I could raidctl -R it,
but that won't help until raid11 is back in operational shape.  I can't
reconstruct raid11, because it has no operational members.  I can't
unconfigure it (preparatory to reconfiguring it), because it's held
open by raid0.

What's the right way to do this?  Am I stuck needing a reboot?

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: fifo and [acm]time

2011-12-25 Thread Mouse

 On ufs (and tmpfs and perhaps others), reading from or writing to a
 fifo updates its [acm]time [...]

 Note that the same [acm]time updates do not apply to sockets.

Aside from whether this is a good idea, this difference may make sense.

A FIFO is a single shared object; multiple opens result in multiple
references to a single FIFO.

You don't say whether the socket case is SOCK_STREAM or SOCK_DGRAM
sockets.

For SOCK_STREAM, a socket in the filesystem is more like a cloning
device: a connection established using it as a rendezvous point results
in new sockets, distinct from the socket corresponding to the
filesystem entry (though one of them is derived from it).  It is these
new sockets that the I/O occurs on and whose [acm]time would logically
be updated (if they had [acm]times, which they don't, because they
don't have [iv]nodes).

For SOCK_DGRAM, the above argument does not apply.  As a matter of
theory, read()/recv()/etc on such a socket should update the atime and
write()/send()/etc should update the mtime (and, of course, in each
case the ctime as well).  (I/O on the peer socket, the one that doesn't
have a filesystem entry, should not do anything of the sort, because
that socket doesn't have any [iv]node to update the [acm]time of.)
Pragmatically speaking, it's not clear to me that there's enough value
in either stance to make it worth changing whichever behaviour the
implementation happens to provide.

 And what applications would ever rely on the [acm]time of a fifo?

The only value I can see in the [acm]time of either a FIFO or an
AF_LOCAL socket file is to see when the relevant software last did
anything with it.  This is less a matter of an application proper using
the timestamp and more one of a human who's investigating something
looking for relevant (or possibly-relevant) data.
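
As a rough illustration of that kind of after-the-fact look, here is a
minimal C sketch.  The helper name last_activity() is made up for this
example; it simply reports the later of the access and modification
times that stat(2) returns for a path, which for a FIFO would be the
last read or write.

```c
#include <sys/stat.h>
#include <time.h>

/*
 * Hypothetical helper: return the later of the access and modification
 * times recorded for path, or (time_t)-1 if stat() fails.
 */
time_t
last_activity(const char *path)
{
	struct stat st;

	if (stat(path, &st) < 0)
		return (time_t)-1;
	return (st.st_atime > st.st_mtime) ? st.st_atime : st.st_mtime;
}
```

Pointing this at something like /var/spool/postfix/public/pickup would
show when the FIFO was last used, without any process having been
traced at the time.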

 One consequence of this is that in a vanilla NetBSD install, Postfix
 triggers disk I/O every minute when master tickles the pickup daemon
 by writing to the fifo /var/spool/postfix/public/pickup.

Is one inode update per minute enough to be a significant issue?
(Sure, there will be cases where it is enough to matter.  But are they
common enough that it's worth doing anything to NetBSD in consequence,
or are they outliers that call for per-system custom tuning?  I can
think of at least one approach which likely would address this problem,
in cases where it _is_ a problem, already, and that's with no more than
a minute or so of thought.)

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: raidframe rebuild question

2011-12-26 Thread Mouse

 [raidframe woes]
 What's the right way to do this?

 What about creating a (Level 1) raid13 consisting of wd3e, adding (a
 partition on) that as a spare to raid0, and failing raid0's raid11e
 component?

That's probably what I should have done.  I don't seem able to do it
now, though; raidctl -r refuses to remove /dev/wd3e from raid11's
spares.  (It doesn't complain, but wd3e is still listed as a spare when
I check with -G afterwards.)

I should probably go read the code to see if I can figure out what's
really going on here...might be worth setting up a test machine I _can_
reboot casually.  (The machine in question is a production machine and
I'm not in the right city to deal with it personally.)

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: fifo and [acm]time

2011-12-26 Thread Mouse

 The only value I can see in the [acm]time of either a FIFO or an
 AF_LOCAL socket file is to see when the relevant software last did
 anything with it.
 Diagnostic information is useful, but is it useful to store on disk?

In many cases storing it in core is good enough, though I'm sure there
are at least some cases where it needs to go to disk.

 It seems to me that for the investigation you describe, systems such
 as ktrace, dtrace, and filemon would be more appropriate than the
 [acm]time of the inode.

Possibly.  I'm not familiar with dtrace or filemon, but ktrace cannot
produce that information unless the relevant processes were being
traced when they last did I/O.  [acm]times allow after-the-fact
investigation without needing to leave the processes traced during
routine operation.

 However, I suppose they monitor processes, rather than inodes,

There's that too.

 Is one inode update per minute enough to be a significant issue?
 It means the disk must continue spinning and, e.g., will continue to
 draw power from a laptop battery to do so, even when the system is
 functionally idle.

Aren't there lots of things that already do that?

Some of them can be suppressed by various mechanisms (eg, nodevmtime
mounts); one possibility is to use the same or similar mechanisms here.

Another is to address it some other way.  In the case of postfix, the
first thing that occurs to me is to make the FIFO path a symlink into a
tiny mfs mount dedicated to the purpose; updating the mtime of an inode
in a ramdisk is very fast, very cheap, and does not require keeping a
disk spinning.  Depending on whether the relevant support has
bitrotted, it could even be turned into a direct mount of a ramdisk
whose root inode is a FIFO rather than a directory.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: raidframe rebuild question

2011-12-26 Thread Mouse

 i seem to recall that we found some missing close calls inside
 raidframe that are now fixed in -current, and possibly pulled upto
 netbsd-5 and probably not netbsd-4?

Worse than that - see below.

 i think you do need a reboot, unfortunately.

I think so too.

I finally got around to looking at the code, and it turns out the ioctl
backing raidctl -r is totally unimplemented (quoted code here is from
the source tree from which the kernel on that machine was built):

case RAIDFRAME_REMOVE_HOT_SPARE:
	return(retcode);

Not only that, but rf_remove_hot_spare, even were it called, is
unimplemented too:

int
rf_remove_hot_spare(RF_Raid_t *raidPtr, RF_SingleComponent_t *sparePtr)
{
	int spare_number;


	if (raidPtr->numSpare==0) {
		printf("No spares to remove!\n");
		return(EINVAL);
	}

	spare_number = sparePtr->column;

	return(EINVAL); /* XXX not implemented yet */
#if 0
	if (spare_number < 0 || spare_number > raidPtr->numSpare) {
		return(EINVAL);
	}

	/* verify that this spare isn't in use... */




	/* it's gone.. */

	raidPtr->numSpare--;

	return(0);
#endif
}

So, yeah, I don't see any way out of this but a reboot. :(

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: DADHI drivers for Asterisk?

2012-01-08 Thread Mouse

 PCI boards for Asterisk require kernel drivers, [...]
 [...], which have a FreeBSD port here:
 http://svn.digium.com/svn/dahdi/freebsd/trunk

 Anyone started working on porting that to NetBSD?

 Have you realized that these drivers are apparently GPL/LGPL and thus
 not suitable for NetBSD kernel inclusion?

I once started looking at writing a native driver for one of the Digium
FXO/FXS cards, only to find that, as far as I could tell, the only
hardware documentation available was the Linux driver source.  I spent
a little time reading over the Linux drivers, but it was an ugly enough
mess to try to glark hardware interfaces from the driver that I lost
interest before getting anywhere.

If anyone does manage to find hardware doc, I'd be interested.  I'm not
likely to produce a driver soon (I don't expect to have the time), but
I'd like to have the doc against future possibility.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: buffer cache ufs changes (preliminary ffsv2 extattr support)

2012-01-15 Thread Mouse

 I'm working on porting the FreeBSD FFSv2 extended attributes support.
 [...]

 1) Add a new bflag, B_ALTDATA.  [...]
 2) instead of using a new flag, add a new 'int type' member [...]

 Although I've done 1 as a POC, I prefer solution 2 ([...]).  What do
 others think?

As a choice of approach to implementing what you want, I think 2 is
better.  It's far more generalizable.  As a piece of SF I read once
said, the number two is ridiculous and can't possibly exist.  It was
talking about universes, but the basic concept applies here too:
there's very little excuse for any number between one and many.

However, I think that constitutes a good implementation of a bad idea.
This makes a file no longer a long list of octets; it becomes multiple
long lists of octets.  The Mac did this, with resource forks and data
forks, and you may note OS X doesn't do it any longer.  I suspect these
will seem like a good idea for a while, until people start discovering
all the things they break, or that break them, and realize that they
didn't learn from history and thus had to repeat it.

That said, it's no skin off my nose.  I've said my piece, and it won't
be affecting me, pragmatically, either way.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: buffer cache ufs changes (preliminary ffsv2 extattr support)

2012-01-16 Thread Mouse

 This makes a file no longer a long list of octets; it becomes
 multiple long lists of octets.  [...]
 [...] I have always found the idea flaky myself (and sorry for the
 rant):  [...]

Yeah.  I think it's a very interesting direction to take filesystems.

But this, interesting as it is, is research experimentation; we do not
even nearly understand how to fit multi-fork (to adopt the MacOS term)
files into a Unix paradigm (witness all the programs that we don't
understand how to change for this), and investigating non-understood
things is what research _is_.  And I think the master tree for a
(supposedly-)production OS is not the place to be carrying out research
experiments, not even if another such OS is already doing it.

But my opinions seem to correlate negatively with NetBSD's these days.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: NetBSD on current AMD motherboards

2012-01-17 Thread Mouse

 Any RAID controller where the management interface works under
 NetBSD?

Well, it's old enough it's more in the nature of an existence proof by
example, but at work there's a NetBSD machine with a 3ware Escalade
12-port SATA RAID card.  There's a management program that is
depressingly poorly documented, but it does work for us.  However, we
don't actually use the card's RAID facilities, just using it as a
multi-port SATA interface and doing the RAID with RAIDframe; all we use
the management program for is getting lists of disks attached with
serial numbers and suchlike.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: RFC: New bus_space routine: bus_space_sync

2012-01-20 Thread Mouse

 Even if originally intended for something else, [...]

 Why do you think BUS_SPACE_BARRIER_SYNC was intended for something
 else ?  I can't see how a write barrier that doesn't ensure the write
 has reached the target (main or device memory) can be useful.

I can't comment on why someone else thinks something.  But barriers
that have nothing to do with write completion to the target can still
be useful.  There are algorithms that don't require that writes
complete on any particular schedule, but do require that _this_ write
complete before _that_ one.  When faced with write coalescing and
reordering, a write barrier that does nothing but enforce ordering (in
the sequence A-barrier-B, the barrier enforces the constraint that
there is no time at which write B has completed but write A hasn't) can
be useful.

For example, the standard double-buffering trick of "write inactive
copy, then write variable indicating which is the active copy" does not
work if the indicator's write can complete before the
(formerly-)inactive copy's writes complete - but, in many uses, there
is no requirement that those writes, as a sequence, be pushed to their
target at any particular time.
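
A CPU-memory analogue of that trick can be sketched with C11 atomics.
This is only an illustration of an ordering-only barrier, not
bus_space code: the struct, the variable names and the single-writer
assumption are all made up for the example.  The release store on the
index guarantees that the writes to the inactive copy become visible
before the index change does, but says nothing about when any of them
reach their target.

```c
#include <stdatomic.h>

struct config {
	int a;
	int b;
};

static struct config copies[2];
static atomic_int active;	/* index of the active copy: 0 or 1 */

/* Writer (assumed to be the only one): fill the inactive copy, then publish. */
void
publish(int a, int b)
{
	int next = 1 - atomic_load_explicit(&active, memory_order_relaxed);

	copies[next].a = a;
	copies[next].b = b;
	/*
	 * Release store: the two writes above cannot become visible
	 * after this one.  No completion time is implied.
	 */
	atomic_store_explicit(&active, next, memory_order_release);
}

/* Reader: the acquire load pairs with the release store in publish(). */
struct config
snapshot(void)
{
	int cur = atomic_load_explicit(&active, memory_order_acquire);

	return copies[cur];
}
```

As with any two-buffer scheme, a reader that is still copying when the
writer comes around to the same buffer again can see a torn value; the
sketch ignores that, since the point here is only the A-barrier-B
ordering.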

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: RFC: New bus_space routine: bus_space_sync

2012-01-20 Thread Mouse

 I can't see how a write barrier that doesn't ensure the write has
 reached the target (main or device memory) can be useful.

 [...].  But barriers that have nothing to do with write completion
 to the target can still be useful.  [...]

 That's not what the manpage documenting BUS_SPACE_BARRIER_SYNC says.
 Read the manpage.

Oh, what I wrote wasn't about BUS_SPACE_BARRIER_SYNC specifically.  It
was about barriers more generally, in response to "I can't see how a
write barrier that doesn't ensure the write has reached the target
(main or device memory) can be useful."

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: extattr namespaces

2012-02-06 Thread Mouse

 One thing that I'm wondering: what are the character constraints on
 those class names in the Linux API?

 The reason is that if UTF8 is allowed, it'd be possible for two names
 to show as an equivalent representation to humans, while they'd be
 different for the system, [...]

Only if userland insists on rendering the octet sequences as UTF-8
characters.  That would be stupid of it (in security-important
contexts, at least) for this reason if no others.

I think the kernel should be as encoding-agnostic as feasible, just as
it is now for pathname components, file contents, data flowing through
pipes and sockets - pretty much all places where octet strings of any
sort cross the user/kernel boundary.
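
To make the encoding-agnostic stance concrete, here is a small C
sketch.  names_equal() is a hypothetical helper, not an existing
kernel interface; it compares names purely as octet strings.  The two
sample names both render as "é" if something insists on treating them
as UTF-8, yet they are different octet sequences and so different
names.

```c
#include <stddef.h>
#include <string.h>

/* Compare two names as opaque octet strings; no encoding is assumed. */
int
names_equal(const void *a, size_t alen, const void *b, size_t blen)
{
	return alen == blen && memcmp(a, b, alen) == 0;
}

/* U+00E9, a single precomposed code point, encoded as UTF-8. */
const char name_precomposed[] = "\xc3\xa9";
/* "e" followed by U+0301 (combining acute accent), encoded as UTF-8. */
const char name_decomposed[] = "e\xcc\x81";
```

Any confusion between the two arises only at display time, which is a
userland rendering decision.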

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: O_NOACCESS?

2012-02-10 Thread Mouse

 Why not use O_DIRECTORY (which is part of -current) and add that to
 flags?

Backporting that might be a better alternative.  What are its
semantics?

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: O_NOACCESS?

2012-02-11 Thread Mouse

 Why not use O_DIRECTORY (which is part of -current) and add that to
 flags?
 Backporting that might be a better alternative.  What are its
 semantics?
 It means the open will only succeed if the file is a directory.

Worth having, but not sufficient by itself, because it still requires
something in the low two bits, and without something like O_NOACCESS
there is nothing you can pass there that will let you open a directory
you have neither read nor write access to (even if you have search
access to it).

In a private exchange with someone else, I've determined that it
definitely needs more restrictions than I've got on it now, because
what I have lets anyone flock() anything - flock does not require FREAD
or FWRITE - and lets anyone open any device special file (for no
access, but depending on the driver that can still be substantial) and
lets anyone keep a big file from being destroyed by having an open
descriptor on it, in each case requiring no more access than the
ability to name the object (ie, search access on the containing
directory and the path leading to it).

I really should not have needed to have those pointed out to me.

My current plan is to add O_DIRECTORY as well and make O_NOACCESS work
only when combined with it.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: O_NOACCESS?

2012-02-11 Thread Mouse

 Right.  You add O_DIRECTORY to that check.

Ah, I misunderstood.  My apologies.

 if ((flags & (FREAD|FWRITE)) == 0 && (flags & O_DIRECTORY) == 0)
   return EINVAL;
 if ((flags & O_DIRECTORY) != 0 && (flags & (FREAD|FWRITE)) != 0)
   return EINVAL;

Actually, what I have now skips the latter of those two checks, because
I can't see any reason why O_DIRECTORY shouldn't be specifiable with,
eg, O_RDONLY.
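
Written out as a standalone function, that variant (first check kept,
second dropped) might look like the sketch below.  The X_ constants
are stand-in values chosen for the example, not the kernel's real
FREAD, FWRITE and O_DIRECTORY, and check_open_flags() is a made-up
name.

```c
#include <errno.h>

/* Stand-in values for illustration only; the real kernel constants differ. */
#define X_FREAD		0x0001
#define X_FWRITE	0x0002
#define X_O_DIRECTORY	0x0100

/* Return 0 if the flag combination is acceptable, EINVAL otherwise. */
int
check_open_flags(int flags)
{
	/* No access bits at all is allowed only together with O_DIRECTORY. */
	if ((flags & (X_FREAD|X_FWRITE)) == 0 && (flags & X_O_DIRECTORY) == 0)
		return EINVAL;
	/* O_DIRECTORY combined with read and/or write access is accepted. */
	return 0;
}
```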

Am I missing something important there?  After how I missed something
pretty blatantly obvious before, I don't trust myself tonight.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: O_NOACCESS?

2012-02-11 Thread Mouse

 There is no problem.  O_NOACCESS would be 3.  When converted from O_*
 to F* it becomes 0.

And that is indeed what I did.  FFLAGS() and OFLAGS() become more
complex than just adding and subtracting 1, but that's not difficult to
deal with.  (If anyone's curious exactly what I did, look at the three
commits ending with 5215f8f6551df407d7c87c8e6a80c7b04e9ee844 in the git
repo git://git.rodents-montreal.org/Mouse/netbsd-fork/4.0.1/src.)
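
For readers who don't want to dig through the commits, the general
shape of such a conversion can be sketched as follows.  This is an
illustration written for this note, not the code from that
repository; x_fflags(), x_oflags() and the X_ constants are invented
names, and the only assumptions taken from the thread are that
O_NOACCESS is 3 and that it maps to "neither FREAD nor FWRITE".

```c
#define X_O_ACCMODE	3	/* low two bits of the open(2) flags */
#define X_O_NOACCESS	3	/* assumed encoding, per the quoted suggestion */

/* O_* to F*: 0,1,2 become 1,2,3 as before; 3 (no access) becomes 0. */
int
x_fflags(int oflags)
{
	int acc = oflags & X_O_ACCMODE;
	int rest = oflags & ~X_O_ACCMODE;

	return rest | ((acc == X_O_NOACCESS) ? 0 : acc + 1);
}

/* F* to O_*: the inverse mapping on the same two bits. */
int
x_oflags(int fflags)
{
	int acc = fflags & X_O_ACCMODE;
	int rest = fflags & ~X_O_ACCMODE;

	return rest | ((acc == 0) ? X_O_NOACCESS : acc - 1);
}
```

The special case is needed because simply adding 1 to 3 would carry
into the next flag bit.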

 The fact that the O_ flags were not intelligently specified aeons ago
 so that a conversion is required is regrettable, but at this point
 unfixable.

Actually, I disagree.  It is totally fixable.

Well, it's unfixable in the sense that we can't change the past choice.
But it is fixable in that we do not have to remain crippled by that
choice.

Quite aside from someone just having the courage to bite the bullet and
write off compatibility with such ancient code (are there any known
extant examples?), it's possible to do something like

#define O_MODERN 4 /* or whatever */
#define O_RDONLY (_FREAD|O_MODERN)
#define O_WRONLY (_FWRITE|O_MODERN)
#define O_RDWR (_FREAD|_FWRITE|O_MODERN)

In the libc open() stub, check O_MODERN.  If set, just call the
syscall.  If not, call _really_ancient_compat_open or some such (which
latter would be under the control of the relevant COMPAT_* option, and
would perform the historical mapping) - or, maybe, just do the mapping
in libc.
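
A sketch of the "do the mapping in libc" variant, under loudly stated
assumptions: all M_ names and stub_fflags() are invented here, and
M_O_MODERN is given a high bit rather than the placeholder 4 from the
defines above, on the assumption that the marker must be a bit no
historical flag value uses.

```c
/* Illustrative values only; none of these are real system constants. */
#define M_FREAD		0x1
#define M_FWRITE	0x2
#define M_O_MODERN	0x40000000	/* assumed unused by old flag values */
#define M_O_RDONLY	(M_FREAD | M_O_MODERN)
#define M_O_WRONLY	(M_FWRITE | M_O_MODERN)
#define M_O_RDWR	(M_FREAD | M_FWRITE | M_O_MODERN)

/* What an open() stub might hand on as F*-style flags. */
int
stub_fflags(int oflags)
{
	if (oflags & M_O_MODERN)
		return oflags & ~M_O_MODERN;	/* already in F* form */
	/* Old binary: historical 0/1/2 encoding, mapped by adding 1. */
	return oflags + 1;
}
```

New code compiled against the new headers never takes the second
branch; only binaries built with the old 0/1/2 values do.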

I'm not sure it's worth it, though, just for the sake of eliminating
FFLAGS() and OFLAGS().

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: O_NOACCESS?

2012-02-11 Thread Mouse

 There are, however, at least three possible things there's currently
 no open flags for.

 (1) search/lookup on a directory, as described;
 (2) execute on an (executable) regular file;
 (3) really nothing at all.

 #1 and #2 could be legitimately combined (as the --x permission
 setting is combined) into something we could reasonably call O_EXEC.

That actually makes the most sense, I think.  O_NOACCESS as I
implemented it is a quick kludge to graft the effect I want onto the
existing framework.  It definitely is not the rightest answer,
especially with the ugly "works only when O_DIRECTORY is given" `fix'.

 (Note that while there may be no use for #2 in userlevel code, unless
 perhaps if we add an fexecve() call, having it would be convenient in
 the kernel.)

fexecve() makes a lot of sense too.  So would an flink(), and indeed f*
versions of any other call which uses a path just to name an object
rather than as a relevant part of the syscall.

But, taken to its logical conclusion, that also means that all the
pathname-taking calls should have versions which take a directory fd
and a single pathname component.  This would be nice in some respects,
though I'm not sure about bind(2) for AF_LOCAL sockets.  C is not a
right language for where my mind is going with this.

 #3, which is what I'd call O_NOACCESS, is something else though; [...]

 That is, it would let you use open() to create a fd for any path you
 can name, including devices and whatever else, without granting any
 access permissions at all.  And, indeed, without calling device-level
 open() routines and such.

 This would also support what Mouse is trying to do,

Actually, I don't think it would, not without creating other problems.
If it addresses my desire, then it must keep a reference to the
underlying object.  And if it does that, then it can be used as a DoS
by preventing large files (coredumps, logfiles) from being destroyed
upon unlinking them.

I'm not sure that needs fixing.  It does need more thought.

 I have implemented #3 in research kernels and it doesn't cause the
 world to blow up, although it does require some extra logic for
 calling device open routines, and NetBSD in particular might be
 missing checks entirely in certain places (like flock, as previously
 cited) that would need to be added.

What, if anything, did you do about the "anyone who can stat a file can
keep it around consuming diskspace indefinitely" issue?

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: Respawn crashed PUFFS filesystems?

2012-02-11 Thread Mouse

 Of course the feature would be broken in some cases, but we could
 make the thing optional using a vfs.puffs.respawn sysctl, which would
 contain a colon-separated mount points subjected to respawn.

What happens if a mount point contains a colon?

More to the point, I think this puts the information in the wrong
place.  Is there any way it could be set as an option at mount time?
(That's a serious question; I don't know puffs enough to answer it.)

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: Respawn crashed PUFFS filesystems?

2012-02-11 Thread Mouse

 Is there any way it could be set as an option at mount time?
 The problem is that mount(8) passes the options verbatim to
 /sbin/mount_xxx, which is supposed to start the xxx filesystem.  The
 filesystem will parse the options on its own before passing
 appropriate flags to mount(2). We have no way to make sure a third
 party software will not choke on an unexpected option, and no way to
 make it pass the option to mount(2).

As for "choke on an unexpected option", well, third-party software can
choke for any reason or none.  But I don't see any reason we can't
document this as one of the likely options and let anyone who doesn't
handle it, or doesn't pass it back at mount(2) time, deal with user
censure for not supporting a useful and easy-to-support facility.

Alternatively

This could be useful in other contexts, from post-unmount cleanup in
general to auto-remount of non-puffs filesystems.  Perhaps it's
appropriate to add vfsctl(2), with an option which can set a "run this
on unmount" command?  Or maybe a "wait for unmount" operation?

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: vnode_to_path()

2012-02-21 Thread Mouse

 It's not just $ORIGIN that can make use of it.  Imagine for a moment
 getting a backtrace automatically on a segfault.  It's a lot easier
 and more reliable if you get access to the debug sections.  Those are
 normally not mapped though, so you need access to the path.

Actually, you don't.  You need read access to the executable; whether
you get that access via a path or not is irrelevant.
/proc/curproc/file can address this; so could some kind of
get_RO_fd_on_my_executable().

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: vnode_to_path()

2012-02-21 Thread Mouse

 I have a question regarding the vnode_to_path() function [...]

 The problem is that it works if and only if [...].

That's the immediate pragmatic problem.

More serious, I think, is that it exhibits a much more fundamental
confusion: it is confusing objects with names for objects.

 The correct way to handle this is to call getcwd there instead, but
 there's so far no agreement to accept the possible extra overhead on
 every exec call.  Also, there *are* race conditions and it's not at
 all clear what the consequences might be.

The current directory may not have any name, and if it does, it may not
be determinable by the user doing the exec.

 $ORIGIN is a poorly conceived interface, unfortunately.

Not as if _that_'s anything new.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: raidframe questions

2012-02-27 Thread Mouse

 [...raid1...2 x 500G...each disk fails, replaced with larger...]

 I want to know if I can recover my lost 100Go...

 I read that changing raid size is not possible.

That's only semi-true.

It's not possible in that there's no clean well-defined interface for
it, perhaps.  But of course it's possible.  With RAID 1, you've got a
fairly easy case.  Here's how I'd do it:

- Unconfigure the RAID.
- Re-disklabel both disks, enlarging the relevant partitions.
- Patch one component label, increasing the size.
- Configure the RAID with only the patched component, with the other
   one missing.
- Hot-add the other component.
- Let it resync the hot-added component.

When the resync finishes, you should be back in operation with a larger
RAID.  (As with any resync to a hot-added component, you may then want
to unconfigure and reconfigure to get the second drive changed from
used spare to ordinary member.)

Of course, you then have to figure out what to do with the extra space.
The RAID pseudo-disk's disklabel is your friend here; you could give it
a separate filesystem, or, if the filesystem supports it, grow the
existing partition and then grow the filesystem.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: NetBSD-based file servers

2012-03-05 Thread Mouse

 So I need some data for that upcoming discussion. Who is using NetBSD to ope$

Please don't use paragraph-length lines.

 So I need some data for that upcoming discussion.  Who is using
 NetBSD to operate a file server on a scale comparable to or larger
 than ours, i.e. ~200 users, ~1TB storage?  If so, which version on
 what kind of hardware?

I'm not sure what counts as a file server, but at one of my jobs our
main backup host has a dozen 1T (actually about 931G) disks in two
RAIDs, one providing a little under 1T of space and the other providing
about 7¼T of space.  NetBSD 4.0.1 with a few tweaks (stock 4.0.1 had
some 32-bit issues that started breaking things in the TB range) on
peecee-architecture hardware.  Rackmount server, but it's NetBSD/i386.
Fairly old hardware, too; when it was first set up, 2.0 had just been
released.

User count...that depends on how you measure users.  The machine has
very few logins, because end users don't log in to it directly.  It
pulls backups from customer machines.  But, while I haven't counted
(and in some cases am not in a position to count), I'd be surprised if
there were fewer than 200 end users whose data this machine handles.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

sin_zero, redux

2012-03-06 Thread Mouse

Back about two weeks ago I wrote about sin_zero and its relevance to
the radix tree used by AF_INET's routing table.

Sunday, I finally got together the round tuits to try eliminating
sin_zero altogether, this reinforced by remarks during the previous
thread that not everyone has sin_zero and thus code with suitable
portability aspirations won't, or at least shouldn't, use it
explicitly.

Of course, I don't know whether anyone will care.  Consider this a
report back to the community on an experiment, if you will. :)

Working with 4.0.1, because that's what I have easy to build at the
moment, I removed sin_zero from the struct definition in
sys/netinet/in.h.  Then it took fixing only two other files to get the
kernel to build, sys/nfs/nfs_export.c and sys/netinet/raw_ip.c.  Then I
did a sweep for files which textually contained sin_zero and fixed
them; this meant gnu/dist/gdb/sim/arm/main.c,
gnu/dist/gdb6/sim/arm/main.c, share/man/man4/inet.4,
usr.bin/talk/ctl.c, and usr.sbin/nfsd/nfsd.c, and ignoring a few other
occurrences (eg, in RFCs quoted in files under dist/).  All of these
fixes were fairly obvious upon looking at the references.

Then upon attempting a build of the world, I found I had to fix
sbin/routed/output.c and sbin/routed/table.c, which contained
initializers which assume the presence of a compound field after
sin_addr.
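
One way to keep such initializers independent of whatever follows
sin_addr is to use C99 designated initializers, as in the sketch
below.  This is not the change that was made to routed; make_loopback()
is a made-up helper, and on BSD systems a real one would also set
sin_len, which the sketch leaves out so that it builds elsewhere too.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdint.h>

/* Build an AF_INET loopback address without naming sin_zero at all. */
struct sockaddr_in
make_loopback(uint16_t port)
{
	struct sockaddr_in sin = {
		.sin_family = AF_INET,
		.sin_port = htons(port),
		.sin_addr = { .s_addr = htonl(INADDR_LOOPBACK) },
	};

	/* Members not named above, if any exist, are zero-initialized. */
	return sin;
}
```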

Then the world built.

AF_INET doesn't work, and I think I know why.  This also explains why
it didn't work when I just reconfigured the routing table in
inetdomain.  ARP processing uses sockaddr_inarp (netinet/if_inarp.h)
for its routes.  This struct is just like standard sockaddr_in except
that, in place of sin_zero, it has a second struct in_addr and two
16-bit values, and it actually cares what's there, so for inetdomain's
routing table to ignore that data breaks it.  Unwarranted chumminess
with the implementation at its finest; at the very least this deserves
big comments on inetdomain where its routing table is configured, and
on struct sockaddr_in explaining why sin_zero has to be there.  AF_INET
networking seems to work on-subnet anyway, and I'm not sure why, since
that uses ARPs too - perhaps it was total coincidence.

I am relieved to finally (think I) understand why sin_zero is as
necessary as it appears to be.  I still think requiring sin_zero to be
zero for most interfaces (bind(), routing socket messages, etc) to work
is a bug, or at best a misfeature; I think it should be zeroed upon
reception by the kernel from userland rather than letting userland
leave trash there.  But I feel much better understanding why things
broke so mysteriously when I shrank the routing table.
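
The effect of trash in those bytes, and of zeroing them on the way in,
can be shown with a toy model.  Everything here is hypothetical:
struct toy_sin only imitates the shape of sockaddr_in, and comparing
with memcmp() merely stands in for a lookup that is keyed on every
byte of the structure.

```c
#include <stdint.h>
#include <string.h>

/* Toy layout for illustration; not the real struct sockaddr_in. */
struct toy_sin {
	uint8_t  len;
	uint8_t  family;
	uint16_t port;
	uint32_t addr;
	uint8_t  zero[8];
};

/* Whole-structure comparison: every byte takes part in the key. */
int
toy_keys_match(const struct toy_sin *a, const struct toy_sin *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}

/* What "zero it upon reception from userland" could look like. */
void
toy_sanitize(struct toy_sin *sa)
{
	memset(sa->zero, 0, sizeof(sa->zero));
}
```

Two addresses that differ only in the padding fail to match until the
padding is cleared, which is the behaviour a caller of bind() or a
routing socket would otherwise have to avoid by hand.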

I'm now going back to a source tree with sin_zero and will be adding
prominent comments to it explaining why sin_zero is necessary.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: Adding SMBus block transfers to iic(4)

2012-03-11 Thread Mouse

 [...] i2c block-mode transfers [...]

   #define I2C_F_BLOCK 0x20

 Comments?  Suggestions?  Alternatives?

BLOCK has a second, quite different, meaning (as in, blocking I/O).
It may not apply here, but defining a bit in the interface that can be
misunderstood as indicating it does could, at the very least, be
confusing.

Might I suggest BLKMODE instead of BLOCK?  At least to my eye, that's a
lot less ambiguous.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: GSOC 2012 project clarification

2012-04-02 Thread Mouse

 The deletion is generally not at the time of unlink.  It happens
 when the file isn't referenced by anything anymore.

Yeah, but in most cases isn't that at unlink time?  File destruction
_can_ be delayed well beyond the "no names refer to it" point, but at
least in my experience that is very much more the exception than the
rule.
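
For anyone who hasn't seen the exceptional case, a small C sketch of
it: a file is unlinked while a descriptor is still open on it, and its
contents remain readable until the descriptor is closed.  The function
name and the /tmp path are choices made for this example only.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <unistd.h>

/*
 * Create a temporary file, unlink it while it is still open, and
 * report whether its data can still be read back through the
 * descriptor.  Returns 1 if so, 0 if not, -1 on setup failure.
 */
int
survives_unlink(void)
{
	char path[] = "/tmp/unlink-demo.XXXXXX";
	char buf[5];
	int fd, ok;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	if (write(fd, "hello", 5) != 5 || unlink(path) < 0) {
		close(fd);
		return -1;
	}
	/* No name refers to the file now, but the descriptor still does. */
	ok = pread(fd, buf, sizeof(buf), 0) == 5;
	close(fd);	/* last reference dropped; the file can be destroyed */
	return ok;
}
```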

Mouse

Re: add disk size to struct disk?

2012-04-28 Thread Mouse

 Design question: do you expect the checks to be performed in
 userland, so anyone can be free to have overlaps/overflows, or let
 the kernel do the checks and return errors using the size obtained
 through disk(9)?

Speaking as someone who occasionally causes overlaps and such
deliberately: I don't care, as long as, wherever it is, it's easy to
disable the check, or at least downgrade the error to a warning.

To put it another way, I think this is a good time to apply the
principle "Unix does not prevent you from doing stupid things, because
that would also prevent you from doing clever things".

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: How to get a struct mount

2012-04-29 Thread Mouse

 I have a filesystem mounted on /foo; how do I retrieve its struct mount?

 Using namei_simple_user(), I can get a vnode for /foo but its v_mount
 is the one for the root filesystem.  Looking up /foo/ produces an
 error.  (bad address). 

Isn't that what v_mountedhere is for?  Or has that gone away in the
NetBSD version you're using?  (I don't see any indication what version
you're doing this under.)

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: ENOATTR vs ENODATA

2012-04-30 Thread Mouse

 There is a choice to be made about returning ENOATTR or ENODATA [...]

 In order to get the broader compatibility, I suggest patching our
 errno.h to define ENOATTR as ENODATA.  Opinions?

As a code author, I don't like this.

A similar situation already exists with EAGAIN and EWOULDBLOCK: some
systems define only one, some only the other, some both with different
values, and some both with the same value (often one in terms of the
other).

The last of these is rather annoying, because it means that a simple

#ifdef EAGAIN
case EAGAIN:
#endif
#ifdef EWOULDBLOCK
case EWOULDBLOCK:
#endif

produces a compile-time error.  So, my opinion would be to prefer one
of the other alternatives.
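
For reference, a sketch of the extra preprocessor test that portable
code ends up needing when the two names may or may not share a value.
The helper name is_retryable is made up for this example, and the #if
comparison assumes both names are macros expanding to integer
constants, as POSIX requires.

```c
#include <assert.h>
#include <errno.h>

/* Returns 1 if err means "try again later", 0 otherwise. */
static int is_retryable(int err)
{
    assert(err >= 0);    /* errno values are positive; 0 means no error */
    switch (err) {
#ifdef EAGAIN
    case EAGAIN:
        return 1;
#endif
/* Emit the second label only when it would not duplicate the first. */
#if defined(EWOULDBLOCK) && (!defined(EAGAIN) || EWOULDBLOCK != EAGAIN)
    case EWOULDBLOCK:
        return 1;
#endif
    default:
        return 0;
    }
}
```

Every switch that mentions both names needs the same guard, which is
the annoyance; aliasing ENOATTR to ENODATA would create the same
situation for those two.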

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: Thinking about branes for netbsd...

2012-05-02 Thread Mouse

 After spending some time thinking about what would be required to
 implement branes as part of the SMP networking project, [...]

What's a brane in this context?  The only meaning I'm familiar with for
the term is from particle physics and makes no sense here.  I did a
little searching, and, while my Web-fu is admittedly weak, I didn't
find anything the least bit helpful.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

watching dynamic device attachment?

2012-05-03 Thread Mouse

I have an application where I want to watch USB devices come and go.
I've written code based on usb(4), and it works - but the devices in
question are disks, which show up as, for example,

umass0: using SCSI over Bulk-Only
scsibus1 at umass0: 2 targets, 1 lun per target
sd0 at scsibus1 target 0 lun 0: Generic, External, 2.10 disk fixed

So, I'm wondering if there's some way to watch devices come and go
beyond what /dev/usb gives me (which stops with the umass attachment).
Even some way to inspect the current device tree would help; I can
watch umass attach and then query the tree to see what's underneath it.
I have a fuzzy memory of seeing something that looked like device-tree
data, but the memory's too fuzzy to be of much use here.  kern.drivers
appears to be part of what I'd want, but only part; I'd much rather not
have to scrape dmesg output. :/

The machine in question is currently at 4.0.1.  If this is possible
with a more recent version but not with 4.0.1, I might be able to talk
its admins into switching, but I suspect they'd rather not; it _is_ a
production machine.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: watching dynamic device attachment?

2012-05-03 Thread Mouse

 Even some way to inspect the current device tree would help; [...]

 Device properties:

   drvctl -p

That's close.

I don't have an easy way to test it in the particular case at issue
(umass -> scsibus -> sd), since the machine in question is running a
kernel based on GENERIC, so drvctl means a new kernel, and, as a
production machine, I can't casually reboot it now.  But I do have
another 4.0.1 machine I can play with; I built a new kernel for it with
drvctl(4) in it and poked around with drvctl(8).

It doesn't look suitable.  For example, my test machine's disk attaches
via

pci0 at mainbus0 bus 0: configuration mode 1
...
piixide0 at pci0 dev 31 function 2
...
atabus0 at piixide0 channel 0
...
wd0 at atabus0 drive 0: ST9500420AS

but drvctl -p on pci0, piixide0, and atabus0 prints, essentially,
nothing - it prints output, but the output contains an empty dict.  Nothing
that would let me walk the device tree down from the umass attachment
to the relevant sd.

Or is there an option I'm missing?  Neither the manpage nor the source
leads me to think so.

I may be able to extend it a little, to include parent and/or child
data in the result, but if there's something already present that will
let me get the info I want I'd prefer that.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: watching dynamic device attachment?

2012-05-03 Thread Mouse

 drvctl -tl will give you a recursive list of the device tree

Maybe you'd expect it to, but what it actually gives me is

drvctl: unknown option -- t
Usage: drvctl -r [-a attribute] busdevice [locator ...]
   drvctl -d device
   drvctl -p device

So I assume you're talking about something newer than 4.0.1, which
comes back to what I said about switching versions.  I suspect my
tweaking 4.0.1 will be an easier sell to them than switching versions,
especially to a version which isn't even released yet (I have access to
a 5.1 machine, and drvctl still says t is an unknown option there,
though the list of options it shows is longer, so presumably what you
describe won't work before 6.x).

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: choosing the file system block size

2012-05-11 Thread Mouse

 at least if you ignore the space used by the inode.
 I guess I can indeed ignore inodes because the space occupied by them
 doesn't vary with fsbsize, or does it?

I think you're right that an inode's size is independent of the
filesystem's blocksize; possibly also relevant is that inode space is
not available for ordinary data storage even if the inodes in question
are not being used.  (Whether this means you can ignore them depends
on exactly what you care about; you know that better than I.)

 For files large enough to need indirect blocks,
 (a) the size is rounded up to the block size, not the frag size,
 Oops, I didn't know that.

Also, it has to occupy only whole blocks.  (This can lead to an
out-of-space error while there is still space on the disk, if all the
available space is in sub-block fragments but the file being extended
is large enough that it uses only whole blocks.  I have a fuzzy memory
that there's code in the allocation routines that tries to allocate
fragments out of sub-block pieces rather than splitting whole blocks,
in an attempt to reduce this effect.)
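
As a rough sketch of the rounding rule in (a): the function below
estimates the data space a file occupies, assuming the usual 12 direct
block pointers per FFS inode.  The name ffs_data_bytes is invented
here, and the estimate deliberately leaves out the indirect blocks
themselves.

```c
#include <assert.h>
#include <stdint.h>

#define NDADDR 12    /* direct block pointers per inode (assumed FFS value) */

/*
 * Bytes of data space used by a file of the given size: rounded up to the
 * fragment size while the file fits in direct blocks, and to the full block
 * size once indirect blocks are needed.
 */
static uint64_t ffs_data_bytes(uint64_t size, uint32_t bsize, uint32_t fsize)
{
    uint64_t unit;

    assert(fsize > 0 && bsize >= fsize && bsize % fsize == 0);
    unit = (size > (uint64_t)NDADDR * bsize) ? bsize : fsize;
    return (size + unit - 1) / unit * unit;
}
```

On a 1k/8k filesystem this puts a 9000-byte file at 9216 bytes (nine
fragments), while a file one byte past the direct-block limit of 98304
bytes takes a whole thirteenth block.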

 and (b) you also need to account for indirect blocks.
 Ah, yes. Although that's probably negligible

Again, it depends on your purpose.  It is a fairly small fraction,
though; even at its most egregious - a 512/512 filesystem - single
indirect block overhead adds 1/128 (128 = 512/4), double indirect adds
1/16384 above the single indirect overhead, and triple another
1/2097152 above that.  More typical would be, say, a 1k/8k filesystem,
for which the fractions are 1/2048 (2048 = 8192/4), double 1/4194304,
and triple 1/8589934592.  (This is for FFSv1.  I think FFSv2 has 64-bit
block addresses, in which case those fractions increase - 1/64, 1/4096,
and 1/262144 for 512; 1/1024, 1/1048576, and 1/1073741824 for 8k.)
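
The fractions above all come from the same arithmetic, sketched here
in C.  The function name is invented for this example; ptrsize is 4
for 32-bit block addresses and 8 for 64-bit ones.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Denominator of the overhead fraction contributed by indirect blocks at
 * the given level (1 = single, 2 = double, 3 = triple).  Each indirect
 * block holds bsize / ptrsize pointers, so level N costs one extra block
 * per (bsize / ptrsize)^N data blocks.
 */
static uint64_t indirect_denominator(uint32_t bsize, uint32_t ptrsize, int level)
{
    uint64_t per_block, denom = 1;

    assert(ptrsize > 0 && bsize >= ptrsize && level >= 1 && level <= 3);
    per_block = bsize / ptrsize;
    while (level-- > 0)
        denom *= per_block;
    return denom;
}
```

For example, indirect_denominator(8192, 4, 2) gives 4194304, matching
the double-indirect figure for the 1k/8k case.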

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: choosing the file system block size

2012-05-15 Thread Mouse

 What I'm trying to do is to figure out the optimal block/fragment
 size for a filesystem.  My idea is to take the given data set and,
 for various block/fragment sizes, compute the overhead caused by that
 choice.

This is reasonable, if your only metric for "optimal" is amount of
overhead space required.  (Which may be true in your case, but it
seems to me to be worth mentioning anyway.)

 Since I'm not interested in the real amount of space required, I can
 ignore super blocks, cylinder group heads and inodes, since the space
 required by them doesn't vary with the choice of block size.

Yes and no.  There are aspects of cylinder groups which do change with
block size, though I haven't thought about it enough to figure out
whether the proportion of space dedicated to overhead changes (ie,
whether there are multiple effects which cancel out).  I do know that
when I newfs with different block sizes, I often get different numbers
of CGs and thus different proportions of space dedicated to CG heads.

 Now, what else do I need?  [...freelist...cluster map...]  But what
 do I need for the summary information?  What else have I forgotten?

I'm not sure.  I'd suggest reading over the source to newfs and/or
fsck; they know a good deal about that stuff, and are much smaller and
more comprehensible than the filesystem kernel code.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: choosing the file system block size

2012-05-18 Thread Mouse

 I'd suggest reading over the source to newfs and/or fsck; they know
 a good deal about that stuff, and are much smaller and more
 comprehensible than the filesystem kernel code.
 Basically, I gave up on that after realising that both number of
 blocks and number of data blocks were actually in units of --
 fragments!  There seems to be a lot of stuff in that which is
 probably perfectly clear for those actually dealing with FS code, but
 close to incomprehensible for a newcomer in that area.

Heh.  The FFS code is full of delightful little surprises like that.
In fsresize.c, the source to my program which became resize_ffs, there
are a number of minor rants about other filesystem programs, such as
fsck and newfs/mkfs.  Most/all of them are still present in resize_ffs
source as of 4.0.1; I haven't bothered checking anything more recent.

 Is there any good book on the subject?

I don't know.  I don't know of any such book, but I've never looked; my
own knowledge of such things comes from experimentation and code
reading.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: accessing another process' resource limits

2012-05-19 Thread Mouse

 Is there an interface for reading (or even writing) another process'
 ulimits?

Yes.

Command-line: sysctl proc.$PID.rlimit.$RESOURCE.{soft,hard} (use -w to
change them, of course).

API: sysctl(3) with the analogous MIB.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: raw/block device disc troughput

2012-05-24 Thread Mouse

 It seems that I have to update my understanding of raw and block
 devices for discs.  [...performance oddities...]

Mostly I have nothing useful to say here.  But...

 2. I would have expected increasing the block size above MAXPHYS not
 to improve the performance.

There is at least one aspect of performance that will not be cut off by
MAXPHYS, that being syscall overhead.  I don't know your system (you
don't say which port you're running on, for example), but if syscall
overhead for your hardware is not ignorably small compared to the costs
of doing the disk transfer, then doing one syscall per 256K will be
four times as costly in syscall overhead as doing one syscall per 1M,
even if it is no more costly in disk transfers.
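
To put numbers on that, a small C sketch.  It assumes the total is a
multiple of the block size and that each call is split into
MAXPHYS-sized pieces on its way to the disk; the function names are
invented, and the 64K used in the note below is only an assumed
MAXPHYS value.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t ceil_div(uint64_t a, uint64_t b)
{
    assert(b > 0);
    return (a + b - 1) / b;
}

/* Number of write(2) calls needed to move total bytes, bs bytes per call. */
static uint64_t syscalls_needed(uint64_t total, uint64_t bs)
{
    return ceil_div(total, bs);
}

/* Number of device transfers when each call is cut into maxphys pieces. */
static uint64_t transfers_needed(uint64_t total, uint64_t bs, uint64_t maxphys)
{
    return ceil_div(total, bs) * ceil_div(bs, maxphys);
}
```

For 1G of data and a 64K MAXPHYS, both 256K and 1M blocks come out at
16384 transfers, but the 256K case needs 4096 syscalls against 1024.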

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: raw/block device disc troughput

2012-05-24 Thread Mouse

  dd if=/dev/zero od=/dev/[r]sd0b bs=nn, count=xxx.

(I've been assuming od= should be of=)

 The block device will cause readahead at the OS layer.

I thought of that too, but didn't mention it, because it's not
relevant.  dd isn't reading from the disk; it's writing to it.

 I suspect that if you double-buffered at the client application layer
 this effect might disappear,

I suspect this is a significant effect.  I was once using dd to copy
from one disk to another, and both drives happened to have activity
lights on them.  Watching each drive wait for the other convinced me dd
is an inefficient way to do that.  I built a program that uses two
processes, one reading and one writing, with a large chunk of memory
shared between them for buffer space.  Disk-to-disk copies (not on the
same spindle) got significantly faster. :)

In this case, dd has to block after each disk write to wait for its
buffer to be (unnecessarily, as it happens, though it can't know that)
zeroed for the next write.  This both imposes additional delay and
enforces a lack of overlap between each write and the next.

I speculate that the cooked device helps because it means that dd's
write finishes when the bits are in the buffer cache, rather than
waiting for them to hit disk.  Flushing from the buffer cache to the
disk then (a) gets overlapped with zeroing memory for the next cycle
and (b) allows writes to adjacent disk blocks to be collapsed as they
get pushed from the buffer cache to the drive.  Unless the host is
unusually slow, writing to the disk will be the limiting factor here,
meaning the buffer cache will have a large number of writes pending, so
coalescing writes is plausible, even likely.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: Should kqueue descriptors work outsid of the creating process?

2012-05-31 Thread Mouse

 Recently we found out (PR kern/46463) that kqueue() file descriptors,
 which originally were designed to be local process only objects,
 could be passed with SCM_RIGHTS messages to other processes.  [...]

 I propose to not allow sending kqueue file descriptors [...]

 Or are there any legit uses for foreign kqueue()s?

It seems to me, for what it may be worth, that this is asking the
wrong question.  Rather, I would ask whether there are illegitimate
uses for `foreign' kqueue descriptors, and, if not, fix them to be
passable like any other descriptors.

It's certainly possible there are such uses we want to forbid.  I don't
know kqueue well enough to address that point myself.  But your post
doesn't give any particular reason to think there are.

 I don't see any, the alien process could just create its own kqueue()
 and add the same events instead of passing the filedescriptor over.

The same argument could be applied to descriptors on /dev/null, too,
but we don't forbid passing them.

That's a somewhat silly analogy, but I think at its core it's basically
my argument: we shouldn't forbid things by default, and "there are
other ways to accomplish the same effects" isn't reason enough to
prohibit something.
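
For readers who haven't used it, this is roughly what passing a
descriptor with SCM_RIGHTS looks like in C.  It is a generic sketch
for any descriptor and says nothing about how kqueue descriptors in
particular behave; the names send_fd, recv_fd, and fd_cmsg are
invented for this example.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control-message buffer, aligned for struct cmsghdr. */
union fd_cmsg {
    struct cmsghdr hdr;
    char buf[64];
};

/* Set up a message carrying one data byte and room for one descriptor. */
static void fd_msg_init(struct msghdr *msg, struct iovec *iov, char *byte,
    union fd_cmsg *u)
{
    assert(CMSG_SPACE(sizeof(int)) <= sizeof u->buf);
    memset(u, 0, sizeof *u);
    memset(msg, 0, sizeof *msg);
    iov->iov_base = byte;
    iov->iov_len = 1;
    msg->msg_iov = iov;
    msg->msg_iovlen = 1;
    msg->msg_control = u->buf;
    msg->msg_controllen = CMSG_SPACE(sizeof(int));
}

/* Send descriptor fd over the Unix-domain socket sock; 0 on success. */
static int send_fd(int sock, int fd)
{
    struct msghdr msg;
    struct iovec iov;
    union fd_cmsg u;
    struct cmsghdr *cmsg;
    char byte = 0;

    fd_msg_init(&msg, &iov, &byte, &u);
    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof fd);
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from sock; returns it, or -1 on failure. */
static int recv_fd(int sock)
{
    struct msghdr msg;
    struct iovec iov;
    union fd_cmsg u;
    struct cmsghdr *cmsg;
    char byte;
    int fd;

    fd_msg_init(&msg, &iov, &byte, &u);
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof fd);
    return fd;
}
```

The receiver ends up with a new descriptor number referring to the
same open file, much as if it had been dup()ed across the process
boundary.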

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: per-mount maxvnodes

2012-06-07 Thread Mouse

 Therefore comes the idea to have a per-mount maxvnodes.

 I tried implementing it, the biggest problem is how to set the value.

sysctl kern./usr/local.maxvnodes?

It's a little ambiguous, in that it's possible - or at least it was
last time I tried it - to have multiple mounts with the same mounted-on
string.  But that's definitely an unusual case, and I see nothing wrong
with accessing the topmost mount in that case; that's what normal
filesystem accesses will do, after all.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: selectively disabling atime updates?

2012-06-11 Thread Mouse

 I can think of two ways to achieve this (each of which may be absurd
 given better knowledge of fs internals than I have): Either a
 per-process switch disabling atime updates or a way to obtain a
 read-only clone of a block device which can be mounted ro,noatime.

The latter will not work, at least not for FFS and probably not for any
filesystem whose implementation was not specifically designed to
support it.  The problem is that the `read-only' device is changing
behind the filesystem's back.

Unless you mean a read-only *snapshot* of a block device, in which case
you're basically back at snapshots, only at the block device level
instead of the filesystem level.  (Actually, looking at the existing
snapshot support, it's not clear to me that's not exactly what it
already is.)

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: RAID 1 with 3 disks

2012-07-02 Thread Mouse

 Is RAIDframe smarter when it has 3 disks in a RAID 1?

Does RF even support 3-disk RAID 1?  It didn't last time I looked, but
that was long enough ago it could very well have changed since then.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: software interrupts scheduling oddities

2012-07-05 Thread Mouse

 Maybe this has happened to you: you tune your NetBSD router for
 fastest packet-forwarding speed.  Presented with a peak packet load,
 [...] the user interface doesn't get any CPU cycles.  [...]  [I]f
 there is any software interrupt pending, then it will run before any
 user process gets a timeslice.  So if the softint rate is really
 high, then userland will scarcely ever run.  Or that is my current
 understanding.  Is it incorrect?

No, I don't think so.  At least, that's how I'd expect it to work, and I've
occasionally seen behaviour close enough to that to make me think it's
reasonably accurate.

I find your discovery about changing a user process's priority making a
difference surprising.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: Quota on tmpfs

2012-07-17 Thread Mouse

 I don't care about low-level storage and only manage visible file
 sizes.

 Sparse files are [...].  Counting them as if they [weren't sparse]
 [will] [...] or render your new quota system unuseful to a large
 number of users.

How large a number?  I have very little basis for more than wild
guessing here; I rarely use sparse files, and even more rarely use
files sparse enough to make a significant difference from a quota point
of view.

Furthermore, those few uses are generally administrative, the kind of
thing that either is on a non-quota filesystem or is owned by a user
who can be exempted from quotas without harm (eg, root).

Do you have experience or studies indicating that this is another
respect in which I am an outlier?

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: Quota on tmpfs

2012-07-18 Thread Mouse

 I would also guess that sparse files are very rarely used.
 I suspect process 'core' files are written sparse.

I just tried one, and it did not appear so.  But that was just one
test, and quite possibly one of the probably numerous differences
between your test and mine is relevant.  (On 1.4T/sparc and 4.0.1/i386,
I ran sleep 60 and typed my quitc to get a core dump.  In each case,
dd conv=notrunc if=sleep.core of=sleep.core did not change the number
reported by ls -s.)

 I had to uncompress one yesterday and it would have been a lot smaller if
 written as a sparse file.

It may have had large runs of 0x00s, but could that have been because
the process's VM contained them?  That is, was it actually sparse when
written, or was it just a file which happened to contain data such that
some disk could be saved by making it sparse?  You say you uncompressed
it, and most compression programs do not distinguish between a sparse
file and a file with long runs of 0x00s, so that's not evidence for
whether it was dumped sparse.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: Syscall kill(2) called for a zombie process should return 0

2012-07-18 Thread Mouse

 + if (p != NULL && P_ZOMBIE(p)) {
 + mutex_exit(proc_lock);
 + return 0;
 + }
   mutex_exit(proc_lock);
   return ESRCH;

 This is a general question, not necessarily specific to the patch.
 Which is more costly?  Two function calls as above, or storing the
 return value in a variable to return with just one function call to
 mutex_exit?

It depends.  A good optimizer could turn either one into the other,
so it may make no real difference.  If optimization is disabled or
limited, the version quoted above will probably be marginally larger
and, assuming larger code doesn't mean more cache line fills,
marginally faster.  Which is `more costly' depends on what costs you
care about and to what extent.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: disklabel problems on 3TB disc

2012-07-21 Thread Mouse

 hello.  You can put a wedge on the disk or put the raid on the raw
 disk itself.

Can you RAID the raw disk?  I thought you had to use partitions of type
RAID for that, which RAW_PART isn't.  Am I just confused?

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: RAID on raw partitions (was: disklabel problems on 3TB disc)

2012-07-21 Thread Mouse

 You could always have raid sets on raw partitions.

 I thought I just learned from Greg Oster on May 11 (in
 2012053355.38f88...@mickey.usask.ca) that I couldn't have raw
 partitions as RAIDframe components.

There are two meanings of `raw' as applied to disk partitions.

There's `raw' as in the message you mention, which is (eg) /dev/rsd0a
instead of /dev/sd0a.  This is `raw' in that I/O goes more directly to
the disk.  In this sense, you cannot use raw partitions as RAIDframe
members.

And there's `raw' as in RAW_PART, which is (eg) /dev/sd0d instead of
some other /dev/sd0? (on x86; on most other ports, /dev/sd0c).  This is
`raw' in that it bypasses partitioning, allowing access to the whole
disk regardless of partitioning.  In this sense, you can use raw
partitions as RAIDframe members, provided you don't autoconfigure, or
provided you apply the patch that appeared upthread (or a suitable
porting of it if you're not using the version the patch is for).

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: full-disc partition (was: RAID on raw partitions)

2012-07-22 Thread Mouse

 This is `raw' in that it bypasses partitioning allowing access to
 the whole disk regardless of partitioning.

 It looks that I remember incorrectly what ws@ taught me back in the
 days how the ``full disc'' partition works.

 What I remember is that (let's assume sparc) partition ``c'' was
 actually present in the disklabel and it was just by convention that
 one would allocate that 0-$.  The only magic I thought there was is
 that in case the kernel can't find a disklabel, it would invent one
 having a single ``c'' partition spanning the whole disc.

Perhaps that's how it was back in the day.  But now, looking at, for
example, sd.c, I see code that actually does bypass partitioning when
using RAW_PART.  For example,

    if (SDPART(bp->b_dev) == RAW_PART) {
        if (bounds_check_with_mediasize(bp, DEV_BSIZE,
            sd->params.disksize512) <= 0)
            goto done;
    } else {
        if (bounds_check_with_label(&sd->sc_dk, bp,
            (sd->flags & (SDF_WLABEL|SDF_LABELLING)) != 0) <= 0)
            goto done;
    }

(Using RAW_PART also affects various other things; for example, using
RAW_PART prevents the driver from loading the disklabel off the drive
the way it does for other partitions.)

 But then, due to the 32-bit-limitation of the disklabel structure, it
 couldn't make a ``c'' partition spanning 3TB.

 Is my memory wrong?  Has that been changed?

Depends on the version and the disk driver in question.  The above
quote is from sd.c,v 1.258.2.1, the one that shipped with 4.0.1.

Of course, for the full truth, you should check the source to the
version you're using.  Since this is done in the disk drivers, it will
also depend on what driver you're using (sd? wd? ld? etc).

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: disklabel problems on 3TB disc

2012-07-23 Thread Mouse

 or put the raid on the raw disk itself.
 That doesn't work.  It truncates the component capacity to the
 truncated value in the disklabel.

That could easily just be a code bug.

Back some years ago, I had occasion to (for work) set up a RAID of
something like six or seven TB.  The individual drives were well under
the 2T limit, but even so I had some 32-bit bugs to fix.  It's possible
there's another one in the code path that passes the raw disk size to
the raidframe code.

Of course, it also might not be, too; I haven't looked.  I mention it
mostly to say "don't just give up on it"...well, unless for your
purposes custom bugfixes aren't acceptable, or finding them isn't going
to happen, or some such.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: disklabel problems on 3TB disc

2012-07-23 Thread Mouse

 Back some years ago, I had occasion to (for work) set up a RAID of
 something like six or seven TB.  The individual drives were well
 under the 2T limit, but even so I had some 32-bit bugs to fix.

 support for >2TB raidframe was not implemented until this commit:

 date: 2010/11/01 02:35:25;  author: mrg;  state: Exp;  lines: +19 -21
 add support for >2TB raid devices.

 which was later pulled up to netbsd-5.  before this, you could not
 create or use rf devices larger than 2TB, regardless of the size of
 the individual components.

Perhaps not, but it was pretty close.  It took only a few fixes.

I did the first version of this under 2.x, but I no longer have that
tree (or, if I do, I don't know where).  Looking at my 4.0.1 tree, I
see only these changesets touching sys/dev/raidframe:

* aec5aaa Bugfix: when clearing diskwatch on a raid `disk', use the correct 
value for `none set'.
* 096b537 Add diskwatch support to raidframe pseudo-disks.
* 5ba6878 Autoconfiguration rework in raidframe.
* 4d7b7d3 A few 32-bit fixes in raidframe.
* 6174918 Handle shutdown during reconstruction/rebuild better.
* 20a3038 Import newer raidframe code.

The first two are not relevant to this discussion; they're entirely
diskwatch-related.  Taking the other four in chronological order,

* 20a3038 Import newer raidframe code.

This moved five files (rf_reconmap.c, rf_reconmap.h, rf_reconstruct.c,
rf_reconstruct.h, and rf_revent.c) to 2008-05-25 versions (I can give
CVS version numbers, but don't see much point).  Looking at the diffs,
none of them look directly related to 32-bit issues.

* 6174918 Handle shutdown during reconstruction/rebuild better.

This is not 32-bit related.

* 4d7b7d3 A few 32-bit fixes in raidframe.

This is most of the 32-bit stuff.  There are only four hunks in the
diff, all applying to rf_netbsdkintf.c.  Three of them are writing
100ULL instead of 100 when computing percentages; the fourth is

-   if (lp->d_secperunit != rs->sc_size)
+   if ((long)lp->d_secperunit != (long)rs->sc_size)
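
To show why the 100ULL matters, here is a standalone C sketch of a
percentage computation with and without the wider constant.  The
function names and the 32-bit operand type are assumptions for
illustration, not the actual raidframe declarations.

```c
#include <assert.h>
#include <stdint.h>

/* The product is computed in 32 bits and wraps modulo 2^32. */
static uint32_t percent_narrow(uint32_t done, uint32_t total)
{
    assert(total != 0);
    return (uint32_t)(done * 100u) / total;
}

/* The ULL constant forces the product to be computed in 64 bits. */
static uint32_t percent_wide(uint32_t done, uint32_t total)
{
    assert(total != 0);
    return (uint32_t)(done * 100ULL / total);
}
```

With done = 50000000 and total = 100000000, the narrow version wraps
and reports 7 where the wide one reports 50.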

* 5ba6878 Autoconfiguration rework in raidframe.

A quick skim here makes me think none of these are related to 32-bit
issues either.

The resulting 4.0.1 tree does indeed work for a RAID 5 that's about 7T
or so.  Of course, the filesystem is on the d partition of the
resulting RAID, since I didn't do anything to deal with the "disklabel
on >2T disk" issue.  (Actually, it's a RAID 15 - a RAID 5 whose members
are RAID 1s.)  I can supply full diffs if desired, or anyone with git
installed is welcome to clone the repo and look at the aforementioned
changesets. (For those interested, it's at
git://git.rodents-montreal.org/Mouse/netbsd-fork/4.0.1/src.)

It's possible there were other fixes required elsewhere in the tree,
but I don't think so.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Re: pinning down dk? assignment

2012-07-24 Thread Mouse

 Let wd1 disappear and the raid will try to use wd0a (dk0) and sd0a
 (dk1).

This is one reason to use autoconfigured RAIDs when you can.  They are
far more immune (completely immune, in my experience) to confusion from
disks attaching in new orders or at new places.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
