Re: ZFS root mount regression

2019-07-22 Thread Garrett Wollman
< said:

> I am not sure how the original description leads to conclusion that
> problem is related to parallel mounting.  From my point of view it
> sounds like a problem that root pool mounting happens based on name, not
> pool GUID that needs to be passed from the loader.  We have seen problem
> like that ourselves too when boot pool names collide.  So I doubt it is
> a new problem, just nobody got to fixing it yet.

This seems plausible, except that something clearly did change to
result in these servers finding the correct root pool under 11.2 and
finding the wrong one under 11.3.  Can't say what it might be; there's
nothing in UPDATING that would imply a change.

(I've also seen some evidence that the parallel mounting is
problematic and I'll have to figure out how to disable that.)

-GAWollman

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: ZFS root mount regression

2019-07-22 Thread Garrett Wollman
< said:

> Both 11.3-RELEASE announcement and Release Notes mention this:

>> The ZFS filesystem has been updated to implement parallel mounting.

> I strongly suggest reading Release documentation in case of troubles
> after upgrade, at least. Or better, read *before* updating.

Two servers breaking out of thirty-five upgraded is not the sort of
thing that you'd expect to be implied by such a statement, especially
when there's only one filesystem being mounted at the relevant time.

-GAWollman



ZFS root mount regression

2019-07-19 Thread Garrett Wollman
I recently upgraded several file servers from 11.2 to 11.3.  All of
them boot from a ZFS pool called "tank" (the data is in a different
pool).  In a couple of instances (which caused me to have to take a
late-evening 140-mile drive to the remote data center where they are
located), the servers crashed at the root mount phase.  In one case,
it bailed out with error 5 (I believe that's [EIO]) to the usual
mountroot prompt.  In the second case, the kernel panicked instead.

The root cause (no pun intended) on both servers was a disk which was
supplied by the vendor with a label on it that claimed to be part of
the "tank" pool, and for some reason the 11.3 kernel was trying to
mount that (faulted) pool rather than the real one.  The disks and
pool configuration were unchanged from 11.2 (and probably 11.1 as
well) so I am puzzled.

Other than laboriously running "zpool labelclear -f /dev/somedisk" for
every piece of media that comes into my hands, is there anything else
I could have done to avoid this?

-GAWollman



Re: 11.2-STABLE kernel wired memory leak

2019-02-12 Thread Garrett Wollman
In article 
eu...@grosbein.net writes:

>Long story short: 11.2-STABLE/amd64 r335757 leaked over 4600MB kernel
>wired memory over 81 days uptime
>out of 8GB total RAM.

Not a whole lot of evidence yet, but anecdotally I'm seeing the same
thing on some huge-memory NFS servers running releng/11.2.  They seem
to run fine for a few weeks, then mysteriously start swapping
continuously, a few hundred pages a second.  This continues for hours
at a time, and then stops just as mysteriously.  Over time the total
memory dedicated to ZFS ARC goes down but there's no decrease in wired
memory.  I've tried disabling swap, but this seems to make the server
unstable.  I have yet to find any obvious commonality (aside from the
fact that these are all large-memory NFS servers which don't do much
of anything else -- the only software running on them is related to
managing and monitoring the NFS service).

-GAWollman



Re: Trap 12 in vm_page_alloc_after()

2018-11-28 Thread Garrett Wollman
< said:

> If you're using a Skylake, I suspect that you can set the
> hw.skz63_enable tunable to 0 as a workaround, assuming you're not using
> any code that relies on Intel TSX.  (I don't think there's anything in
> the base system that does.)  There are some details in
> https://reviews.freebsd.org/D18374

It is definitely a Skylake (although it took searching to find that
out, since we don't identify processors by Intel codename).  I've
set that tunable, but I won't know whether it helps until the next
(scheduled or unscheduled) reboot.

-GAWollman



Re: Trap 12 in vm_page_alloc_after()

2018-11-25 Thread Garrett Wollman
< 
said:

> On Sun, Nov 18, 2018 at 08:24:38PM -0500, Garrett Wollman wrote:
>> Has anyone seen this before?  It's on a busy NFS server, but hasn't
>> been observed on any of our other NFS servers.
>> 
>> 
>> Fatal trap 12: page fault while in kernel mode

>> --- trap 0xc, rip = 0x809a903d, rsp = 0xfe17eb8d0710, rbp = 
>> 0xfe17eb8d0750 ---
>> vm_page_alloc_after() at vm_page_alloc_after+0x15d/frame 0xfe17eb8d0750

> What is the line number for vm_page_alloc_after+0x15d ?
> Do you have NUMA enabled on 11 ?

If gdb is to be believed, the trap is at line 1687:

        /*
         *  At this point we had better have found a good page.
         */
        KASSERT(m != NULL, ("missing page"));
        free_count = vm_phys_freecnt_adj(m, -1);
>>>>>>  if ((m->flags & PG_ZERO) != 0)
                vm_page_zero_count--;
        mtx_unlock(&vm_page_queue_free_mtx);
        vm_page_alloc_check(m);

The faulting instruction is:

0x809a903d <vm_page_alloc_after+349>:   testb  $0x8,0x5a(%r14)

There are no options matching /numa/i in the configuration.  (This is
a non-debugging configuration so the KASSERT is inoperative, I
assume.)  I have about a dozen other servers with the same kernel and
they're not crashing, but obviously they all have different loads and
sets of active clients.
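
One inference can be drawn from the numbers in this thread, offered as a
sketch rather than a diagnosis: the faulting instruction reads from
0x5a(%r14), and the original panic message reports a fault virtual
address of 0x5a, so the base register must have held zero.  If %r14
holds m at that point (an assumption; the disassembly alone doesn't
prove it), then m was NULL, which is exactly what the inoperative
KASSERT would have caught.  The arithmetic, using a hypothetical helper
name:

```sh
#!/bin/sh
# base_register FAULT_ADDR DISPLACEMENT
# For an operand of the form disp(%reg), the effective address is
# reg + disp, so the register value is the faulting address minus the
# displacement.
base_register() {
    printf '0x%x\n' "$(( $1 - $2 ))"
}

# 0x5a is the reported fault virtual address; 0x5a is also the
# displacement in "testb $0x8,0x5a(%r14)".
base_register 0x5a 0x5a

# The backtrace offset +0x15d, in the decimal form gdb prints:
printf '%d\n' "$(( 0x15d ))"
```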

-GAWollman



Trap 12 in vm_page_alloc_after()

2018-11-18 Thread Garrett Wollman
Has anyone seen this before?  It's on a busy NFS server, but hasn't
been observed on any of our other NFS servers.


Fatal trap 12: page fault while in kernel mode
cpuid = 35; apic id = 35
fault virtual address   = 0x5a
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0x809a903d
stack pointer           = 0x28:0xfe17eb8d0710
frame pointer           = 0x28:0xfe17eb8d0750
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 878 (nfsd: service)
trap number             = 12
panic: page fault
cpuid = 35
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe17eb8d03c0
vpanic() at vpanic+0x177/frame 0xfe17eb8d0420
panic() at panic+0x43/frame 0xfe17eb8d0480
trap_fatal() at trap_fatal+0x35f/frame 0xfe17eb8d04d0
trap_pfault() at trap_pfault+0x49/frame 0xfe17eb8d0530
trap() at trap+0x2c7/frame 0xfe17eb8d0640
calltrap() at calltrap+0x8/frame 0xfe17eb8d0640
--- trap 0xc, rip = 0x809a903d, rsp = 0xfe17eb8d0710, rbp = 
0xfe17eb8d0750 ---
vm_page_alloc_after() at vm_page_alloc_after+0x15d/frame 0xfe17eb8d0750
kmem_back() at kmem_back+0xf2/frame 0xfe17eb8d07c0
kmem_malloc() at kmem_malloc+0x60/frame 0xfe17eb8d07f0
keg_alloc_slab() at keg_alloc_slab+0xe2/frame 0xfe17eb8d0860
keg_fetch_slab() at keg_fetch_slab+0x14e/frame 0xfe17eb8d08b0
zone_fetch_slab() at zone_fetch_slab+0x64/frame 0xfe17eb8d08e0
zone_import() at zone_import+0x3f/frame 0xfe17eb8d0930
uma_zalloc_arg() at uma_zalloc_arg+0x3d9/frame 0xfe17eb8d09a0
zil_alloc_lwb() at zil_alloc_lwb+0x9c/frame 0xfe17eb8d09e0
zil_lwb_write_issue() at zil_lwb_write_issue+0x2f8/frame 0xfe17eb8d0a40
zil_commit_impl() at zil_commit_impl+0x95f/frame 0xfe17eb8d0b80
zfs_freebsd_fsync() at zfs_freebsd_fsync+0xa7/frame 0xfe17eb8d0bb0
VOP_FSYNC_APV() at VOP_FSYNC_APV+0x82/frame 0xfe17eb8d0be0
nfsvno_fsync() at nfsvno_fsync+0xe0/frame 0xfe17eb8d0c50
nfsrvd_commit() at nfsrvd_commit+0xe8/frame 0xfe17eb8d0e20
nfsrvd_dorpc() at nfsrvd_dorpc+0x621/frame 0xfe17eb8d0ff0
nfssvc_program() at nfssvc_program+0x557/frame 0xfe17eb8d11a0
svc_run_internal() at svc_run_internal+0xe09/frame 0xfe17eb8d12e0
svc_thread_start() at svc_thread_start+0xb/frame 0xfe17eb8d12f0
fork_exit() at fork_exit+0x83/frame 0xfe17eb8d1330
fork_trampoline() at fork_trampoline+0xe/frame 0xfe17eb8d1330
--- trap 0xc, rip = 0x80087101a, rsp = 0x7fffe688, rbp = 0x7fffe930 ---


At this point the system was frozen: it did not attempt to reboot
automatically and was not in the debugger.  I had to do a remote reset
via the BMC.  The kernel is 11.2 r336644 (so no errata applied), but
none of the SAs and ENs released so far look like they touch this
region of code.

-GAWollman



NFS server hang with backing store on ZFS and quota nearly exhausted

2016-12-20 Thread Garrett Wollman
I've opened a bug about this before, which I can't cite by number
because bugzilla appears to be down at the moment.  But I had this
problem recur tonight under otherwise idle conditions, so I was able
to get a set of kernel stacks without any confounding RPC activity
going on.  This is on 10.2; we're not scheduled to take these servers
to 10.3 until next week.

Here's the "procstat -kk" output.

  PID    TID COMM             TDNAME           KSTACK
 1055 101965 nfsd             -                mi_switch+0xe1
sleepq_catch_signals+0xab sleepq_wait_sig+0xf _cv_wait_sig+0x16a seltdwait+0xae 
kern_select+0x8fa sys_select+0x54 amd64_syscall+0x357 Xfast_syscall+0xfb 
 1058 101012 nfsd             nfsd: service    mi_switch+0xe1
sleepq_catch_signals+0xab sleepq_wait_sig+0xf _cv_wait_sig+0x16a 
svc_run_internal+0x8be svc_thread_start+0xb fork_exit+0x9a fork_trampoline+0xe 

[Threads with the stack trace above are simply idle and waiting for
incoming requests, and I've deleted the other 5 of them.]

 1058 101688 nfsd             nfsd: service    mi_switch+0xe1
sleepq_catch_signals+0xab sleepq_timedwait_sig+0x10 _cv_timedwait_sig_sbt+0x18b 
svc_run_internal+0x4bd svc_thread_start+0xb fork_exit+0x9a fork_trampoline+0xe 

[Not sure what these threads are doing: obviously they are waiting for
a condvar, but at a different spot in svc_run_internal().  I've
deleted the other 7 of them.]

 1058 101720 nfsd             nfsd: service    mi_switch+0xe1 sleepq_wait+0x3a
_cv_wait+0x16d txg_wait_open+0x85 dmu_tx_wait+0x2ac dmu_tx_assign+0x48 
zfs_freebsd_write+0x544 VOP_WRITE_APV+0x149 nfsvno_write+0x13e 
nfsrvd_write+0x496 nfsrvd_dorpc+0x6f1 nfssvc_program+0x54e 
svc_run_internal+0xd7b svc_thread_start+0xb fork_exit+0x9a fork_trampoline+0xe 
 1058 102015 nfsd             nfsd: master     mi_switch+0xe1 sleepq_wait+0x3a
_cv_wait+0x16d txg_wait_open+0x85 dmu_tx_wait+0x2ac dmu_tx_assign+0x48 
zfs_freebsd_write+0x544 VOP_WRITE_APV+0x149 nfsvno_write+0x13e 
nfsrvd_write+0x496 nfsrvd_dorpc+0x6f1 nfssvc_program+0x54e 
svc_run_internal+0xd7b svc_run+0x1de nfsrvd_nfsd+0x242 nfssvc_nfsd+0x107 
sys_nfssvc+0x9c amd64_syscall+0x357 

Then there are these two threads, both servicing WRITE RPCs, both
sleeping deep inside the ZFS code.  Note that one of them is the
"master" krpc thread in this service pool; I don't know if this
accounts for the fact that requests are not getting processed even
though plenty of idle threads exist.  (Note that zfs_write() does not
appear in the stack due to tail-call optimization.)

I don't know the ZFS code well enough to understand what running out
of quota has to do with this situation (you'd think it would just
return immediately with [EDQUOT]) but perhaps it matters that the
clients are not well-behaved and that the filesystem is often almost
at quota but not quite there yet.

-GAWollman


Re: HOST_NAME_MAX not defined in unistd.h

2016-10-03 Thread Garrett Wollman
In article <20161002111719.6e22f...@x220.alogt.com>, 
erichsfreebsdl...@alogt.com writes:

>'Host names are limited to {HOST_NAME_MAX} characters, not including
>the trailing null, currently 255.'
>
>which makes people assume that HOST_NAME_MAX is defined in unistd.h.

It should not make them assume that.

{SOME_LIMIT}, in curly braces, means the value of the limit,
determined in the usual POSIX way: by checking whether a macro is
defined, and if not, calling sysconf() to determine what the run-time
value of the limit is, and if sysconf() returns -1 with errno
unchanged, doing whatever makes sense when the system does not impose
any limit (which depends on the semantics of the limit in question).
Having a typographical convention avoids having to repeat the usual
description with all of its caveats every time a limit is referenced.

For most limits defined by POSIX, there is a macro _POSIX_LIMIT_NAME
which must be defined to a specific value set out in the standard, and
which indicates the minimum permissible value of the limit.  This
value isn't necessarily useful for what any given program needs,
although it might serve as a reasonable guess.
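
From the shell, the same lookup can be approximated with getconf(1),
which reports the run-time value and prints "undefined" when the
system imposes no fixed limit.  A minimal sketch: the function name is
made up for illustration, and falling back to 255 (the value of
_POSIX_HOST_NAME_MAX) is just the "reasonable guess" mentioned above,
not something the standard requires a program to do.

```sh
#!/bin/sh
# Print the run-time value of {HOST_NAME_MAX}, or a fallback guess when
# the system reports no fixed limit (or getconf is unavailable).
posix_host_name_max_minimum=255   # value of _POSIX_HOST_NAME_MAX

host_name_max() {
    limit=$(getconf HOST_NAME_MAX 2>/dev/null) || limit=undefined
    case $limit in
        ''|undefined|*[!0-9]*) echo "$posix_host_name_max_minimum" ;;
        *)                     echo "$limit" ;;
    esac
}

host_name_max
```

A C program would do the equivalent with an #ifdef on HOST_NAME_MAX
followed by sysconf(_SC_HOST_NAME_MAX).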

-GAWollman



procstat -kk: Cannot allocate memory

2016-03-08 Thread Garrett Wollman
Sometimes when running "procstat -kk", I get the following error:

procstat: sysctl: kern.proc.pid: 1044: Cannot allocate memory

What is the condition that causes this?  Is there a static limit in
procstat, or in the kernel, that needs to be increased?

-GAWollman



Hangs with mrsas?

2016-03-07 Thread Garrett Wollman
I have a new Dell server with a typical Dell hardware RAID.  pciconf
identifies it as "MegaRAID SAS-3 3008 [Fury]"; mfiutil reports:

mfi0 Adapter:
Product Name: PERC H330 Adapter
   Serial Number: 5AT00PI
Firmware: 25.3.0.0016
 RAID Levels:
  Battery Backup: not present
   NVRAM: 32K
  Onboard Memory: 0M
  Minimum Stripe: 64K
  Maximum Stripe: 64K

Since I'm running ZFS I have the RAID functions disabled and the
drives are presented as "system physical drives" ("mfisyspd[0-3]" when
using mfi(4)).  I wanted to use mrsas(4) instead, so that I could have
direct access to the drives' SMART functions, and this seemed to work
after I set the hw.mfi.mrsas_enable tunable, with one major exception:
all drive access would hang after about 12 hours and the machine would
require a hard reset to come back up.

Has anyone seen this before?  The driver in head doesn't appear to be
any newer.

-GAWollman


Re: Have I got this VIMAGE setup correct?

2016-01-03 Thread Garrett Wollman
< said:

>> 2) Stopping jails with virtual network stacks generates warnings from
>> UMA about memory being leaked.

> I'm given to understand that's Known, and presumably Not Quite Trivial
> To Fix.  Since I'm not starting/stopping jails repeatedly as a normal
> runtime thing, I'm ignoring it.  If you were spinning jails up and
> down dynamically dozens of times a day, I'd want to look more closely
> at just what is leaking and why...

It looks like that's what bz@ fixed in r292601 (thanks to rodrigc@ for
pointing me in the right direction).  I haven't looked at how
difficult this would be to backport, but since I'm in the same
situation as you in terms of the frequency of startup/teardown
operations, I'm probably not going to worry too much about it.  Other
relevant changes from HEAD appear to be 292603, 292604, 278766, and
286537 (and again, this is just based on scanning the svn logs, not
actually thinking about the code).

> Is what I'm doing, though I'm creating the epair's and adding them to
> the bridges in the setup script rather than rc.conf (exec.prestart in
> jail.conf), because that makes it a more manageable IME, and since I'm
> already doing a bunch of setup in the script anyway...

For now, I think I'll just use exec.prestart to manually configure a
MAC address.  It would be nice if the LAA MAC addresses we generated
were both random on initial creation (to better avoid duplicates) and
stable over reboot.  (Likewise the bridge(4) MAC address.)  Or
alternatively if we just had rc.conf support for explicitly
configuring the MAC address of every interface, since ifconfig doesn't
let you configure L2 and L3 addresses on the same command line.
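
As a rough sketch of the exec.prestart approach, one way to get an
address that is stable across reboots is to derive it from the jail
name.  The helper name, the 02:00 prefix, and the use of cksum(1) as
the hash are all assumptions made for illustration; this gives
stability but not real randomness, and nothing here checks for
collisions.

```sh
#!/bin/sh
# laa_mac NAME
# Derive a locally administered, unicast MAC address from NAME.
# The first octet 02 sets the locally-administered bit and leaves the
# multicast bit clear; the last four octets come from the POSIX cksum
# CRC of NAME, so the same name always yields the same address.
laa_mac() {
    crc=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
    printf '02:00:%02x:%02x:%02x:%02x\n' \
        "$(( (crc >> 24) & 255 ))" "$(( (crc >> 16) & 255 ))" \
        "$(( (crc >> 8) & 255 ))"  "$(( crc & 255 ))"
}

# Hypothetical use from a jail's exec.prestart script:
#   ifconfig epair0b ether "$(laa_mac myjail)"
laa_mac myjail
```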

Actually, what would be *really* nice -- and I don't know if any of my
network interfaces can do this, but it would give me a reason to buy
hardware that could -- would be if PCI virtual functions could be used
to implement multiple independent network interfaces in the same
kernel (additional units in the same driver).  Then I wouldn't have to
deal with any of this configuration at all.

Failing all of those, having a good, well-documented example in the
handbook would be a Good Thing.

-GAWollman


Have I got this VIMAGE setup correct?

2015-12-22 Thread Garrett Wollman
The consensus when I asked seemed to be that VIMAGE+jail was the right
combination to give every container its own private loopback
interface, so I tried to build that.  I noticed a few things:

1) The kernel prints out a warning message at boot time that VIMAGE is
"highly experimental".  Should I be concerned about running this in
production?

2) Stopping jails with virtual network stacks generates warnings from
UMA about memory being leaked.

3) It wasn't clear (or documented anywhere that I could see) how to
get the host network set up properly.  Obviously I'm not going to have
a vlan for every single jail, so it seemed like what most people were
doing was "bridge" along with a bunch of "epair" interfaces.  I ended
up with the following:

network_interfaces="lo0 bridge0 bce0"
autobridge_interfaces="bridge0"
autobridge_bridge0="bce0 epair0a epair1a"
cloned_interfaces="bridge0 epair0 epair1"
ifconfig_bridge0="inet [deleted] netmask 0xff00"
ifconfig_bridge0_ipv6="inet6 [deleted] prefixlen 64 accept_rtadv"
ifconfig_bce0="up"
ifconfig_epair0a="up"
ifconfig_epair1a="up"

The net.link.bridge.inherit_mac sysctl, which is documented in
bridge(4), doesn't appear to work; I haven't yet verified that I can
create a /etc/start_if.bridge0 to set the MAC address manually without
breaking something else.  The IPv6 stack regularly prints
"in6_if2idlen: unknown link type (209)" to the console, which is
annoying, and IPv6 on the host doesn't entirely work -- it accepts
router advertisements but then gives [ENETUNREACH] trying to actually
send packets to the default gateway.  (IPv6 to the jails *does* work!)

In each of the jails I have to manually configure a MAC address using
/etc/start_if.epairNb to ensure that it's globally unique, but then
everything seems to work.

Does this match up with what other people have been doing?  Anything
I've missed?  Any patches I should pull up to make this setup more
reliable before I roll it out in production?

-GAWollman


auditd zombies in 10.1?

2015-04-16 Thread Garrett Wollman
I notice that systems of ours which were recently upgraded to 10.1 are
accumulating zombies at an alarming rate.  (Well, alarming enough to
cause me to be paged at 4 in the morning, at any rate!)  These zombies
are all children of auditd.  Has anyone else seen or debugged this?

-GAWollman



How to unstick ZFS resilver?

2013-10-14 Thread Garrett Wollman
I have a large (88-drive) zpool in which a drive was recently
replaced.  (The pool has a bunch of duff Toshiba MK2001TRKB drives --
never ever pay money for these! -- and I'm trying to replace them one
by one before they fail completely.)  The resilver on the first drive
replacement has been taking much much too long, and currently it's
stuck in this state:

  pool: export
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Oct  9 14:54:47 2013
86.5T scanned out of 86.8T at 1/s, (scan is slow, no estimated time)
982G resilvered, 99.62% done

The overall progress hasn't changed in twelve hours, even across a
reboot, and the server is fairly lightly loaded.  Searching the Web is
no help; can anyone suggest a remedial action?  (This is on
9.1-RELEASE, with our local patches, and all the drives are SAS.)

In exchange, I offer the following DTrace script which I used to
identify the slow SAS drives:

#!/usr/sbin/dtrace -s

#pragma D option quiet
#pragma D option dynvarsize=2m

inline int TOO_SLOW = 100000000;	/* 100 ms, in nanoseconds */

dtrace:::BEGIN
{
	printf("Tracing... Hit Ctrl-C to end.\n");
}

/* Record the time at which each request is handed to the da driver. */
fbt::dastrategy:entry
{
	start_time[(struct buf *)arg0] = timestamp;
}

/* On completion, count requests that took longer than TOO_SLOW, per unit. */
fbt::dadone:entry
/(this->bp = (struct buf *)args[1]->ccb_h.periph_priv.entries[1].ptr) &&
 start_time[this->bp] && (timestamp - start_time[this->bp]) > TOO_SLOW/
{
	@[strjoin("da", lltostr(args[0]->unit_number))] = count();
	start_time[this->bp] = 0;
}

-GAWollman



Re: Bind in FreeBSD, security advisories

2013-07-30 Thread Garrett Wollman
In article
<1375186900.23467.3223791.24cb3...@webmail.messagingengine.com>,
f...@freebsd.org writes:

>just import Unbound. However, if you can't reach any DNS servers I
>assume you can't reach the roots either, so I don't understand what a
>local recursor will gain you.

There are plenty of situations in which a remote recursive resolver is
untrustworthy.  (Some would say any situation.)  It doesn't have to be
BIND, but people do legitimately want the normal DNS diagnostic
utilities, which sadly have been tied together with BIND for some
years now.  (I don't know why anyone would ever use nslookup(1), but
host(1) and dig(1) are pretty much essential.)

It is a little bit disconcerting to see that big chunks of our BSD
heritage have turned into someone else's commercial product, but that
seems to be the way of the world these days.

-GAWollman



Re: perl upgrade woes -- how to best reconcile?

2013-07-09 Thread Garrett Wollman
In article <op.wzyd6vkx34t...@markf.office.supranet.net>, f...@feld.me writes:

>I've had zero problems with upgrades to Perl, etc after I stopped
>compiling my packages in the host OS and started building the packages via
>poudriere and using pkgng (sysutils/pkg). pkg can detect when a perl
>upgrade is happening and is intelligent enough to reinstall all programs
>that require perl; poudriere is smart enough to rebuild and repackage them
>all. It's a match made in heaven and dead simple to use.

This.

-GAWollman



Re: portupgrade(1) | portmaster(8) -- which is more effective for large upgrade?

2013-06-27 Thread Garrett Wollman
In article <alpine.bsf.2.00.1306270602300.99...@wonkity.com>,
wbl...@wonkity.com writes:
>On Thu, 27 Jun 2013, Garrett Wollman wrote:
>
>> Having just gone through this in two different environments, I can
>> very very strongly recommend doing the following.  It's not the easy
>> button of the TV commercials, but it will make things much much
>> easier in the future.
>
>This is an interesting procedure and should be made into a
>web-accessible document!  Setting up a build machine for a network is a
>fairly common desire, and your procedure looks to be doing everything
>the newest way.

See
http://people.freebsd.org/~wollman/converting-to-pkg-repository.html.

-GAWollman



Re: portupgrade(1) | portmaster(8) -- which is more effective for large upgrade?

2013-06-26 Thread Garrett Wollman
In article <5e20544e3580a75759c3858f31894dc9.authentica...@ultimatedns.net>,
bsd-li...@lcommand.com writes:

> I haven't upgraded my tree(s) for awhile. My last attempt to rebuild
> after an updating src & ports, resulted in nearly installing the
> entire ports tree, which is why I've waited so long. Try as I might,
> I've had great difficulty finding something that will _only_ upgrade
> what I already have installed, _and_ respect the options used during
> the original make && make install, or those options expressed in
> make.conf.

Having just gone through this in two different environments, I can
very very strongly recommend doing the following.  It's not the easy
button of the TV commercials, but it will make things much much
easier in the future.

1) Switch your system to pkgng if you haven't already.  Unfortunately,
this will not result in the right ports being marked as automatic,
so you'll need to do a bit of post-conversion surgery:

# pkg set -A 1 -g '*'
# pkg query -e '%#r==0' '%n-%v: %c'

Then look through the output of "pkg query" to identify the leaf
packages that are the ones you actually wanted explicitly to have
installed.  For each one of those:

# pkg set -A 0 packagename

Create a list of your desired packages:

# pkg query -e '%a==0' '%o' > pkg-list

Clean up the unnecessary local packages:

# pkg autoremove

(You can iterate the last three steps, aborting "pkg autoremove" each
time but the last, until it doesn't offer to remove anything you care
about keeping.)

Repeat this process for each machine, and merge the resulting pkg-list
files using "sort -u".  Make sure that pkgng is enabled for ports in
/etc/make.conf.

2) Install and set up poudriere.  Copy /var/db/ports, /etc/src.conf,
and /etc/make.conf to /usr/local/etc (possibly with local variations
as described in poudriere(8) under the heading "CUSTOMISATION").

3) Run "poudriere options" for each jail and setname (if you created
any sets following the customization section referenced above),
providing the package list you constructed, to make sure that any new
options are configured as you require them.

4) Run "poudriere bulk" for each jail and setname (if you created
any), providing the package list as before.  This will create a pkgng
repository for each jail and set, which you can serve by HTTP (using
your choice of Web server) or SSH (with pkgng 1.1+), and all of these
packages will have been built in a clean jail and (if their
dependencies were specified correctly) will have no library
inconsistencies.

5) Configure your client machines to reference the appropriate
repository created in step (4).

6) Run "pkg upgrade -fy" on all of your machines, and resolve any
inconsistencies by "pkg remove"-ing the offending local package.
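
The list merge from step (1) can be sketched as follows, assuming each
machine's pkg-list has been copied to the build host under a
hypothetical per-host file name.  Only the merge is shown; the
poudriere invocations depend on the jail and set names chosen locally.

```sh
#!/bin/sh
# merge_pkg_lists FILE...
# Combine per-machine package lists (one port origin per line) into a
# single sorted list with duplicates removed.
merge_pkg_lists() {
    sort -u "$@"
}

# Hypothetical usage, with one list collected from each machine:
#   merge_pkg_lists pkg-list.host1 pkg-list.host2 > pkg-list
```

The merged file is then the package list handed to poudriere in steps
(3) and (4).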

That seems like a lot of work, and it is, but having done it, there's
a huge benefit the next time you want to update your systems:

a) Update the ports tree (how you do this depends on how you set up
poudriere -- see the man page).

b) Repeat step (3).

c) Repeat step (4).

d) Check the ports UPDATING file for any warnings about packages you
are about to install.  If it tells you to do "pkg install -fR
somepackage", then do those.

e) Run "pkg upgrade -y" to upgrade any remaining packages.

Even for just three machines it was worth going through this process
-- and worth unifying all of my package sets and options.  Since I now
do one build instead of three, I'm no longer so concerned about
minimizing dependencies; it's no big deal if some X libraries get
installed on my server.

-GAWollman


Re: Reinstalling boot blocks on a ZFS-only system

2013-05-12 Thread Garrett Wollman
In article <20130512205837.ga69...@icarus.home.lan>, j...@koitsu.org writes:

>You may also need to set kern.geom.debugflags=0x10 to inhibit GEOM's
>safety measure / to permit writing to LBA 0; see GEOM(4) and search
>for the word "foot".

If you have set up your partitioning properly (read: following the
clearly recommended best practice on the wiki), there should never,
ever be any reason to do this.  (That is why it's called a DEBUG
flag.)  The necessary and sufficient invocation is:

# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 [a]daX

I have no idea how this works with MBR partitioning, but I would make
one suggestion in that regard: DON'T.  Whatever makes you think you
want to do that, think harder and find another way.

-GAWollman



Re: Reinstalling boot blocks on a ZFS-only system

2013-05-12 Thread Garrett Wollman
In article <20130513032838.ga76...@icarus.home.lan>, j...@koitsu.org writes:
>https://wiki.freebsd.org/RootOnZFS/GPTZFSBoot
>
>5. Install the Protected MBR (pmbr) and gptzfsboot loader

Bug #1: "Protective", not "Protected".

>   Fixit# gpart bootcode -b /mnt2/boot/pmbr -p /mnt2/boot/gptzfsboot -i 1 ad0
>
>   This may fail with an "operation not permitted" error message, since the
>   kernel likes to protect critical parts of the disk. If this happens for
>   you, run:
>
>   Fixit# sysctl kern.geom.debugflags=0x10

I suppose the bit that's missing here is:

    "...and then file a bug report, with severity serious and
    priority high, because this indicates that something is
    seriously broken in the kernel's implementation of GPT
    partitioning."

The only way this step can fail (absent bugs) is if something (other
than gpart) has either the whole-disk device or the partition 1 device
open in exclusive mode, which is a "can't happen" condition at this
stage in an installation.  (Well, it can happen if the disk you are in
the process of destroying has a still-mounted filesystem on it, which
is what the code is supposed to prevent!)

This little bit of cargo-culting used to be necessary for *MBR* and
*bsdlabel* partitioning, before the days of "gpart bootcode", to
update the boot0 and embedded partition-boot (boot1) blocks while the
filesystem was mounted, because the bsdlabel boot blocks are stored in
the first 64k of the root filesystem.  When using GPT, the boot blocks
are stored in the boot partition, which doesn't have a mountable
filesystem on it, so should never be open for write except when "gpart
bootcode" is doing the deed.

-GAWollman



Re: ZFS stalls -- and maybe we should be talking about defaults?

2013-03-04 Thread Garrett Wollman
In article 8c68812328e3483ba9786ef155911...@multiplay.co.uk,
kill...@multiplay.co.uk writes:

> Now interesting you should say that I've seen a stall recently on ZFS
> only box running on 6 x SSD RAIDZ2.
>
> The stall was caused by fairly large mysql import, with nothing else
> running.
>
> Then it happened I thought the machine had wedged, but minutes (not
> seconds) later, everything sprung into action again.

I have certainly seen what you might describe as stalls, caused, so
far as I can tell, by kernel memory starvation.  I've seen it take as
much as a half an hour to recover from these (which is too long for my
users).  Right now I have the ARC limited to 64 GB (on a 96 GB file
server) and that has made it more stable, but it's still not behaving
quite as I would like, and I'm looking to put more memory into the
system (to be used for non-ARC functions).  Looking at my munin
graphs, I find that backups in particular put very heavy pressure on,
doubling the UMA allocations over steady-state, and this takes about
four or five hours to climb back down.  See
http://people.freebsd.org/~wollman/vmstat_z-day.png for an example.

Some of the stalls are undoubtedly caused by internal fragmentation
rather than actual data in use.  (Solaris used to have this issue, and
some hooks were added to allow some amount of garbage collection with
the cooperation of the filesystem.)
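
A minimal sketch of how an ARC cap like the one described above might be
written, assuming it is set through the vfs.zfs.arc_max loader tunable
and that "64 GB" means 64 GiB.  The helper name arc_max_line is made up
for illustration.

```shell
#!/bin/sh
# Print a loader.conf line that caps the ZFS ARC at a given number of GiB.
# Assumption: the cap is applied via the vfs.zfs.arc_max tunable, in bytes.
arc_max_line() {
    gib=$1
    printf 'vfs.zfs.arc_max="%s"\n' "$((gib * 1024 * 1024 * 1024))"
}

# 64 GiB, matching the limit described above.
arc_max_line 64
```

The printed line would go in /boot/loader.conf and take effect at the
next boot; how much memory to leave for non-ARC use depends on the
workload.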

-GAWollman



A useful munin plugin for monitoring memory usage

2013-02-22 Thread Garrett Wollman
We were having some memory starvation issues (which it turned out were
caused by our backup system), and I found it useful to create a munin
plugin to monitor UMA statistics (as displayed by 'vmstat -z').  As
I'm not interested in dealing with github.com, I thought I would share
it with the people most likely to benefit.  Here's an example of the
graph that gets generated:


(hopefully mailman will let the image through).  The plugin itself
(undoubtedly not in the best of style) follows.

-GAWollman

#!/bin/sh
#
# Plugin to monitor FreeBSD Unified Memory Allocator
# (UMA) statistics from vmstat -z
# Based on the nfsd (NFS server statistics) plugin.
#
#%# family=auto
#%# capabilities=autoconf

# Print the name of every UMA zone (skips the header and blank line).
getnames () {
    /usr/bin/vmstat -z | awk -F: 'NR > 2 && NF > 1 { print $1 }'
}

if [ "$1" = "autoconf" ]; then
    if /usr/bin/vmstat -z 2>/dev/null | \
      egrep '^ITEM[[:space:]]+' >/dev/null; then
        echo yes
        exit 0
    else
        echo no
        exit 0
    fi
fi

if [ "$1" = "config" ]; then
    area='AREA'
    echo 'graph_title FreeBSD Unified Memory Allocator'
    echo 'graph_vlabel bytes'
    echo 'graph_total total'
    echo 'graph_category system'
    getnames | while read label; do
        # munin field names may only contain [A-Za-z0-9_]
        a=$(printf %s "$label" | tr -C 'A-Za-z0-9_' '_')
        echo "$a.label" "$label"
        echo "$a.type" GAUGE
        echo "$a.min" 0
        echo "$a.draw" "$area"
        area='STACK'
    done
    exit 0
fi

# value = SIZE * USED; thresholds apply only to zones with a LIMIT
/usr/bin/vmstat -z | awk -F '[:,] +' '
NR > 2 && NF > 1 {
    name = $1
    gsub(/[^A-Za-z0-9_]/, "_", name)
    print name ".value " $2*$4
    if ($3 > 0) {
        print name ".warning " int($2*$3*0.92)
        print name ".critical " int($2*$3*0.95)
    }
}
'


Re: IPMI serial console

2013-02-21 Thread Garrett Wollman
In article 20130221233838.gb92...@icarus.home.lan, j...@koitsu.org writes:

> Wow, that's disappointing.  I wonder if the underlying IPMI firmware has
> a bug relating to using serial port speeds other than 115200.

The bug may be in the BIOS where it claims you can select some other
speed.

Certainly none of the Dell iDRAC systems I've ever seen support
anything other than 115.2, despite there being a speed setting in the
BIOS.  But we're building a custom OS image anyway, so it was no
hardship to put that into /boot.

-GAWollman


Interpreting vmstat -z output

2013-02-09 Thread Garrett Wollman
On a server that's been experiencing some issues, I note the following
in vmstat -z:

ITEM              SIZE  LIMIT     USED   FREE       REQ    FAIL SLEEP

UMA Kegs:          208,     0,     188,    16,      188,      0,    0
UMA Zones:        3456,     0,     188,     0,      188,      0,    0
UMA Slabs:         568,     0, 1209668,  6211, 50929964,      0,    0
UMA RCntSlabs:     568,     0,   50791,     1,    50791,      0,    0
UMA Hash:          256,     0,      78,    12,       80,      0,    0
16 Bucket:         152,     0,     227,    23,      227,      0,    0
32 Bucket:         280,     0,     522,    10,      522,      0,    0
64 Bucket:         536,     0,     783,     1,      783,    156,    0
128 Bucket:       1048,     0,   33513,     0,    33513, 134423,    0
[...]

How should I interpret the failure count for 64 Bucket and 128
Bucket?  Does it represent a problem, or something that needs to be
tuned?  There are no obvious tunables, but the code is not exactly
transparent.  No other zones show failures.
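
For picking the affected zones out of the full listing, a small filter
along these lines may help.  It is only a sketch: it assumes the
eight-column layout shown above (ITEM, SIZE, LIMIT, USED, FREE, REQ,
FAIL, SLEEP), and the function name failing_zones is hypothetical.

```shell
#!/bin/sh
# Print "zone<TAB>failures" for every UMA zone whose FAIL column is non-zero.
# Assumes the layout: ITEM: SIZE, LIMIT, USED, FREE, REQ, FAIL, SLEEP
failing_zones() {
    awk -F '[:,] *' 'NF >= 8 && $7 + 0 > 0 { printf "%s\t%d\n", $1, $7 }'
}

# Usage on a live system:
#   vmstat -z | failing_zones
```

The header and blank lines contain no colon or comma, so they produce a
single field and are skipped.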

-GAWollman


More on odd ZFS not-quite-deadlock

2013-01-30 Thread Garrett Wollman
I posted a few days ago about what I thought was a ZFS-related
almost-deadlock.  I have a bit more information now, but I'm still
puzzled.  Hopefully someone else has seen this before.

While things are in the hung state, a zfs recv is running.  It's
receiving an empty snapshot to one of the many datasets on this file
server.  zfs recv reports that receiving this particular empty
snapshot takes just about half an hour.  When it finally completes,
everything starts working normally again.  (This particular
replication job will no longer be operational in a few hours, so this
may be the last time I can collect information about the issue for a
while.)  The same zfs recv takes only a few seconds 23 hours out of 24.

The kstacks of the processes that appear to possibly be involved look
like this:

  PIDTID COMM TDNAME   KSTACK   
0 100061 kernel   thread taskq mi_switch+0x196 sleepq_wait+0x42 
_sx_slock_hard+0x3bb _sx_slock+0x3d zfs_reclaim_complete+0x38 
taskqueue_run_locked+0x85 taskqueue_thread_loop+0x46 fork_exit+0x11f 
fork_trampoline+0xe 
7 100215 zfskern  arc_reclaim_thre mi_switch+0x196 
sleepq_timedwait+0x42 _cv_timedwait+0x13c arc_reclaim_thread+0x29d 
fork_exit+0x11f fork_trampoline+0xe 
7 100216 zfskern  l2arc_feed_threa mi_switch+0x196 
sleepq_timedwait+0x42 _cv_timedwait+0x13c l2arc_feed_thread+0x1a8 
fork_exit+0x11f fork_trampoline+0xe 
7 100592 zfskern  txg_thread_enter mi_switch+0x196 sleepq_wait+0x42 
_cv_wait+0x121 txg_thread_wait+0x79 txg_quiesce_thread+0xb5 fork_exit+0x11f 
fork_trampoline+0xe 
7 100593 zfskern  txg_thread_enter mi_switch+0x196 
sleepq_timedwait+0x42 _cv_timedwait+0x13c txg_thread_wait+0x3c 
txg_sync_thread+0x269 fork_exit+0x11f fork_trampoline+0xe 
7 100989 zfskern  txg_thread_enter mi_switch+0x196 sleepq_wait+0x42 
_cv_wait+0x121 txg_thread_wait+0x79 txg_quiesce_thread+0xb5 fork_exit+0x11f 
fork_trampoline+0xe 
7 100990 zfskern  txg_thread_enter mi_switch+0x196 
sleepq_timedwait+0x42 _cv_timedwait+0x13c txg_thread_wait+0x3c 
txg_sync_thread+0x269 fork_exit+0x11f fork_trampoline+0xe 
7 101355 zfskern  txg_thread_enter mi_switch+0x196 sleepq_wait+0x42 
_cv_wait+0x121 txg_thread_wait+0x79 txg_quiesce_thread+0xb5 fork_exit+0x11f 
fork_trampoline+0xe 
7 101356 zfskern  txg_thread_enter mi_switch+0x196 
sleepq_timedwait+0x42 _cv_timedwait+0x13c txg_thread_wait+0x3c 
txg_sync_thread+0x269 fork_exit+0x11f fork_trampoline+0xe 
   13 100053 geom g_event  mi_switch+0x196 sleepq_wait+0x42 
_sleep+0x3a8 g_run_events+0x430 fork_exit+0x11f fork_trampoline+0xe 
   13 100054 geom g_up mi_switch+0x196 sleepq_wait+0x42 
_sleep+0x3a8 g_io_schedule_up+0xd8 g_up_procbody+0x5c fork_exit+0x11f 
fork_trampoline+0xe 
   13 100055 geom g_down   mi_switch+0x196 sleepq_wait+0x42 
_sleep+0x3a8 g_io_schedule_down+0x20e g_down_procbody+0x5c fork_exit+0x11f 
fork_trampoline+0xe 
   22 100225 syncer   -mi_switch+0x196 sleepq_wait+0x42 
_cv_wait+0x121 rrw_enter+0xdb zfs_sync+0x63 sync_fsync+0x19d VOP_FSYNC_APV+0x4a 
sync_vnode+0x15e sched_sync+0x1c5 fork_exit+0x11f fork_trampoline+0xe 

93224 102554 zfs  -mi_switch+0x196 sleepq_wait+0x42 
_cv_wait+0x121 zio_wait+0x61 dbuf_read+0x5e5 dnode_next_offset_level+0x28d 
dnode_next_offset+0xb9 dmu_object_next+0x3e dsl_dataset_destroy+0x164 
dmu_recv_end+0x184 zfs_ioc_recv+0x9f4 zfsdev_ioctl+0xe6 devfs_ioctl_f+0x7b 
kern_ioctl+0x115 sys_ioctl+0xf0 amd64_syscall+0x5ea Xfast_syscall+0xf7 

[This is the zfs recv process that is applying the replication package
with an empty snapshot.]

93320 102479 df   -mi_switch+0x196 sleepq_wait+0x42 
_cv_wait+0x121 rrw_enter+0xdb zfs_root+0x40 lookup+0xaa6 namei+0x535 
kern_statfs+0xa4 sys_statfs+0x37 amd64_syscall+0x5ea Xfast_syscall+0xf7 
[7 more like this]

(I've deleted all of the threads that are clearly waiting for some
unrelated event, such as nanosleep() and select().)
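
One way to tally which commands are blocked at a given kernel function
is sketched below.  It assumes one thread per line in the style of
procstat -kk -a output, with columns PID, TID, COMM, TDNAME and KSTACK;
the helper name stuck_in is hypothetical.

```shell
#!/bin/sh
# Count threads per command whose kernel stack mentions a given symbol.
# Assumes one thread per input line: PID TID COMM TDNAME KSTACK...
stuck_in() {
    awk -v sym="$1" '
        index($0, " " sym) { n[$3]++ }
        END { for (c in n) printf "%s %d\n", c, n[c] }
    ' | sort
}

# Usage on a live system:
#   procstat -kk -a | stuck_in rrw_enter
```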

-GAWollman



ZFS deadlock on rrl->rr_ -- look familiar to anyone?

2013-01-28 Thread Garrett Wollman
I just had a big fileserver deadlock in an odd way.  I was
investigating a user's problem, and decided for various reasons to
restart mountd.  It had been complaining like this:

Jan 28 21:06:43 nfs-prod-1 mountd[1108]: can't delete exports for 
/usr/local/.zfs/snapshot/monthly-2013-01: Invalid argument 

for a while, which is odd because /usr/local was never exported.  When
I restarted mountd, it hung waiting on rrl->rr_, but the system may
already have been deadlocked at that point.  procstat reported:

87678 104365 mountd   -mi_switch sleepq_wait _cv_wait 
rrw_enter zfs_root lookup namei vfs_donmount sys_nmount amd64_syscall 
Xfast_syscall 

I was able to run shutdown, and the rc scripts eventually hung in
sync(1) and timed out.  The kernel then hung trying to do the same
thing, but I was able to break into the debugger.  The debugger
interrupted an idle thread, which was not particularly helpful, but I
was able to quickly gather the following information before I had to
reset the machine to restore normal service.

Locked vnodes


0xfe00536383c0: 0xfe00536383c0: tag syncer, type VNON
tag syncer, type VNON
usecount 1, writecount 0, refcount 2 mountedhere 0
usecount 1, writecount 0, refcount 2 mountedhere 0
flags (VI(0x200))
flags (VI(0x200))
lock type syncer: EXCL by thread 0xfe00348cc470 (pid 22)
lock type syncer: EXCL by thread 0xfe00348cc470 (pid 22)

db ps
  pid  ppid  pgrp   uid   state   wmesg wchan    cmd
87996 1 87994 65534  D   rrl->rr_ 0xfe0048ff8108 df
87976 1 87726 0  D+  rrl->rr_ 0xfe0048ff8108 sync
87707 1 87705 65534  D   rrl->rr_ 0xfe0048ff8108 df
87700 1 87698 65534  D   rrl->rr_ 0xfe0048ff8108 df
87678 1 87657 0  D+  rrl->rr_ 0xfe0048ff8108 mountd
87531 1 87529 65534  D   rrl->rr_ 0xfe0048ff8108 df
87387 1 87385 65534  D   rrl->rr_ 0xfe0048ff8108 df
87380 1 87378 65534  D   rrl->rr_ 0xfe0048ff8108 df
87103 1 87101 65534  D   rrl->rr_ 0xfe0048ff8108 df
87096 1 87094 65534  D   rrl->rr_ 0xfe0048ff8108 df
85193 1 85192 0  D   zio->io_ 0xfe10d3e75320 zfs
   24 0 0 0  DL  sdflush  0x80e50878 [softdepflush]
   23 0 0 0  DL  vlruwt   0xfe0048c0a940 [vnlru]
   22 0 0 0  DL  rrl->rr_ 0xfe0048ff8108 [syncer]
   21 0 0 0  DL  psleep   0x80e3c048 [bufdaemon]
   20 0 0 0  DL  pgzero   0x80e5a81c [pagezero]
   19 0 0 0  DL  psleep   0x80e599e8 [vmdaemon]
   18 0 0 0  DL  psleep   0x80e599ac [pagedaemon]
   17 0 0 0  DL  gkt:wait 0x80de6c0c [g_mp_kt]
   16 0 0 0  DL  ipmireq  0xfe00347400b8 [ipmi0: kcs]
9 0 0 0  DL  ccb_scan 0x80dc1360 [xpt_thrd]
8 0 0 0  DL  waiting_ 0x80e41e80 [sctp_iterator]
7 0 0 0  DL  (threaded)  [zfskern]
101355   D   tx->tx_s 0xfe0050342e10 [txg_thread_enter]
101354   D   tx->tx_q 0xfe0050342e30 [txg_thread_enter]
100989   D   tx->tx_s 0xfe004fd27a10 [txg_thread_enter]
100988   D   tx->tx_q 0xfe004fd27a30 [txg_thread_enter]
100593   D   tx->tx_s 0xfe004a8c0a10 [txg_thread_enter]
100592   D   tx->tx_q 0xfe004a8c0a30 [txg_thread_enter]
100216   D   l2arc_fe 0x81228bc0 [l2arc_feed_thread]
100215   D   arc_recl 0x81218d20 [arc_reclaim_thread]
   15 0 0 0  DL  (threaded)  [usb]
[32 uninteresting and identical threads deleted]
6 0 0 0  DL  mps_scan 0xfe00276816a8 [mps_scan2]
5 0 0 0  DL  mps_scan 0xfe0027612ca8 [mps_scan1]
4 0 0 0  DL  mps_scan 0xfe00274ef4a8 [mps_scan0]
   14 0 0 0  DL  -0x80ded764 [yarrow]
3 0 0 0  DL  crypto_r 0x80e4e0a0 [crypto returns]
2 0 0 0  DL  crypto_w 0x80e4e060 [crypto]
   13 0 0 0  DL  (threaded)  [geom]
100055   D   -0x80de6b90 [g_down]
100054   D   -0x80de6b88 [g_up]
100053   D   -0x80de6b78 [g_event]
   12 0 0 0  WL  (threaded)  [intr]
100189   I   [irq1: atkbd0]
100188   I   [swi0: uart uart]
100187   I   [irq19: atapci1]
100186   I   [irq18: atapci0+]
100169   I   

Re: RFC: Suggesting ZFS best practices in FreeBSD

2013-01-25 Thread Garrett Wollman
In article alpine.bsf.2.00.1301252014160.37...@wonkity.com,
wbl...@wonkity.com writes:

> As far as best practices, situations vary so much that I don't know if
> any drive ID method can be recommended.  For a FreeBSD ZFS document, a
> useful sample configuration is going to be small enough that anything
> would work.  A survey of the techniques in use at various data centers
> would be interesting.

My best practice would be: write the label onto the drive itself.
Have it be something that is physically meaningful.  Then the only
issue is making sure to label a new drive properly when you install
it.

On our big file servers, we use a labeling convention of
s${shelf}d${drive}, where ${shelf} is the rack unit where the shelf is
mounted and ${drive} is the slot number marked on the front of the
drive.  I have some scripts to semiautomatically crawl the output of
camcontrol devlist and sas2ircu to determine which drive is located
where.  Of course, we have no choice but to label the drives; that's
how gmultipath works.

For bootable drives, we just use the built-in labeling feature of gpt:
the swap partition is named swap0 or swap1, and the ZFS partition is
named tank0 or tank1 (as appropriate for whether it's the primary or
secondary boot device).  That way, if one fails on boot, we know which
one it is (or rather, we know which one is still alive).
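
As a rough illustration of the s${shelf}d${drive} convention, the label
can be generated from the two numbers and then handed to the labeling
tools.  The function name drive_label, the example shelf and slot
numbers, the device names and the partition index are all placeholders,
not values from the servers described here.

```shell
#!/bin/sh
# Build a physical-location label of the form s<shelf>d<drive>.
drive_label() {
    printf 's%sd%s\n' "$1" "$2"
}

# Hypothetical example: shelf in rack unit 25, drive slot 3.
drive_label 25 3

# The label could then be written with the stock tools, for example
# (device names and partition index are placeholders):
#   gmultipath label -v "$(drive_label 25 3)" /dev/da4 /dev/da28
#   gpart modify -i 2 -l tank0 ada0
```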

-GAWollman



Re: Anothe pkgng question: signing a repository

2012-12-27 Thread Garrett Wollman
In article 20121227162311$6...@grapevine.csail.mit.edu,
rai...@ultra-secure.de writes:

> I'm creating my own repository and have created a key for it.
[...]
> What does pkg expect to be in this file?

A public key.  It does not use X.509 (nor is there any reason why it
should, although I suppose it could be made to at the cost of
significant added complexity and a bootstrapping problem).

-GAWollman


Re: 9.1-RC2 ixgbe lagg problems

2012-10-17 Thread Garrett Wollman
In article
cakvzwmwu2mouq7h5xa1aqxdomtmlj6fg98trsd5xmlfpcp5...@mail.gmail.com,
wynnwil...@gmail.com writes:

> I've tried the 2.4.4 driver from Intel's site, but it still has the
> same problems. Is a lagg using lacp with the ix interfaces working for
> anyone else?

You bet.

lagg0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 9120
        options=401bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
        ether 04:7d:7b:a5:88:f0
        inet 128.30.3.34 netmask 0xffffff00 broadcast 128.30.3.255
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        laggproto lacp lagghash l2,l3,l4
        laggport: ix1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: ix0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>

Configured with:

cloned_interfaces="lagg0"
ifconfig_ix0="mtu 9120 up"
ifconfig_ix1="mtu 9120 up"
ifconfig_lagg0="laggproto lacp laggport ix0 laggport ix1"
ipv4_addrs_lagg0="128.30.x.x/24"

-GAWollman



Re: Kernel panic with geom_multipath + ZFS

2012-09-11 Thread Garrett Wollman
In article 504f4049.9080...@dest-unreach.be, nio...@dest-unreach.be
writes:

> I'm under the illusion that I've found a bug in the FreeBSD kernel, but
> since I'm new to FreeBSD, a quiet voice tells me it's probably a case of
> "you're doing it wrong".

Nope.  It's a known bug in the version of geom_multipath that shipped
with FreeBSD 9.0.  It's fixed in 9.1.

-GAWollman


Re: FreeBSD 9.1-RC1 Available...

2012-08-24 Thread Garrett Wollman
In article 20120825041357.gd1...@glenbarber.us, g...@freebsd.org writes:
> On Sat, Aug 25, 2012 at 04:55:48AM +0100, Ben Morrow wrote:
>> 1. Is it sensible|supported to use freebsd-update to update just the src
>> component, followed by a normal buildworld/buildkernel to update the
>> rest of the system? I would much prefer to avoid having to use svn,
>> especially given that it isn't in the base system.
>
> No.  freebsd-update(8) is a binary system updater.  It does not touch
> your source tree.

It works just fine for that, actually -- PROVIDED that you installed
the source tree the same way (from original installation media or with
a previous freebsd-update invocation).

I don't know what it will do if you've modified the sources.  On the
machines where I do this, I don't touch the sources.

-GAWollman


Deadlock (?) with ZFS, NFS

2012-03-07 Thread Garrett Wollman
This is unfortunately a very difficult issue to report (particularly
because I don't have console access to the machine until I get into
the office to reboot it).  My server was happily serving NFS on top of
a huge ZFS pool when it ground to a halt -- but only partially.  All
of the nfsd threads got stuck in ZFS, and there were a large number of
pending I/Os, but there was nothing apparently wrong with the storage
system itself.  (At a minimum, smartctl could talk to the drives, so
CAM and the mps driver were working enough to get commands to them.)
ssh logins worked fine, but anything that required writing to that
zpool (such as zpool scrub, sync, and reboot) would get stuck somewhere
in ZFS.  zpool status and zfs-stats reported no issues; netstat -p tcp
reported many NFS connections with large unhandled receive buffers
(owing to the nfsds being unable to complete the request they were
working on).  Nothing in the kernel message buffer to indicate a
problem.

Eventually, the machine stopped responding to network requests as
well, although for a while after sshd stopped working, it still
responded to pings.

Here's a snapshot of top(1).  Note that the zfskern{txg_thread_enter}
thread is getting some CPU, although I can't tell if it's making any
progress or just spinning.  zfskern{l2arc_feed_thread} would
occasionally get some CPU as well, but appeared to do nothing.

  PID USERNAME  PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
   11 root      155 ki31     0K   128K CPU6    6  45.7H 100.00% idle{idle: cpu6}
   11 root      155 ki31     0K   128K CPU3    3  45.7H 100.00% idle{idle: cpu3}
   11 root      155 ki31     0K   128K CPU5    5  44.9H 100.00% idle{idle: cpu5}
   11 root      155 ki31     0K   128K CPU7    7  44.7H 100.00% idle{idle: cpu7}
   11 root      155 ki31     0K   128K CPU2    2  44.3H 100.00% idle{idle: cpu2}
   11 root      155 ki31     0K   128K CPU1    1  44.1H 100.00% idle{idle: cpu1}
   11 root      155 ki31     0K   128K RUN     4  43.7H 100.00% idle{idle: cpu4}
   11 root      155 ki31     0K   128K CPU0    0  43.1H  99.46% idle{idle: cpu0}
    5 root       -8    -     0K   128K zio->i  5  67:25   0.98% zfskern{txg_thread_enter}
   12 root      -92    -     0K   800K WAIT    7 297:15   0.00% intr{irq264: ix0:que }
    0 root      -16    0     0K  6144K -       1 297:11   0.00% kernel{zio_write_issue_}
    0 root      -16    0     0K  6144K -       2 297:09   0.00% kernel{zio_write_issue_}
    0 root      -16    0     0K  6144K -       3 297:07   0.00% kernel{zio_write_issue_}
    0 root      -16    0     0K  6144K -       7 296:58   0.00% kernel{zio_write_issue_}
    0 root      -16    0     0K  6144K -       6 296:57   0.00% kernel{zio_write_issue_}
    0 root      -16    0     0K  6144K -       5 296:54   0.00% kernel{zio_write_issue_}
    0 root      -16    0     0K  6144K -       0 296:53   0.00% kernel{zio_write_issue_}
    0 root      -16    0     0K  6144K -       4 296:40   0.00% kernel{zio_write_issue_}
    5 root       -8    -     0K   128K l2arc_  4 220:18   0.00% zfskern{l2arc_feed_threa}
   12 root      -92    -     0K   800K WAIT    2 163:56   0.00% intr{irq259: ix0:que }
   13 root       -8    -     0K    48K -       2  93:35   0.00% geom{g_down}
   12 root      -92    -     0K   800K WAIT    3  85:26   0.00% intr{irq260: ix0:que }
    0 root      -92    0     0K  6144K -       0  77:53   0.00% kernel{ix0 que}
    8 root      -16    -     0K    16K ipmire  5  74:03   0.00% ipmi1: kcs
 1815 root       20    0 10052K   456K zfs     4  72:07   0.00% nfsd{nfsd: master}
 1815 root       20    0 10052K   456K zfs     0  71:51   0.00% nfsd{nfsd: service}
 1815 root       20    0 10052K   456K tx->tx  5  71:45   0.00% nfsd{nfsd: service}
 1815 root       20    0 10052K   456K zfs     2  71:43   0.00% nfsd{nfsd: service}
 1815 root       20    0 10052K   456K zfs     1  71:31   0.00% nfsd{nfsd: service}
 1815 root       20    0 10052K   456K zfs     0  71:25   0.00% nfsd{nfsd: service}
 1815 root       20    0 10052K   456K zfs     6  71:23   0.00% nfsd{nfsd: service}
 1815 root       20    0 10052K   456K zfs     1  71:18   0.00% nfsd{nfsd: service}
 1815 root       20    0 10052K   456K zfs     4  71:15   0.00% nfsd{nfsd: service}
 1815 root       20    0 10052K   456K zfs     1  71:13   0.00% nfsd{nfsd: service}
 1815 root       20    0 10052K   456K zfs     7  71:10   0.00% nfsd{nfsd: service}
 1815 root       20    0 10052K   456K zfs     0  71:10   0.00% nfsd{nfsd: service}
 1815 root       20    0 10052K   456K zfs     6  71:07   0.00% nfsd{nfsd: service}
 1815 root       20    0 10052K   456K zfs     2  71:02   0.00% nfsd{nfsd: service}
 1815 root       20    0 10052K   456K zfs     5  70:58   0.00% nfsd{nfsd: service}
   12 root      -68    -     0K   800K WAIT    2  70:50   0.00% intr{swi2:

Re: FreeBSD 9.0 release - memstick installation fails

2012-03-02 Thread Garrett Wollman
In article 719f8e0e-f88d-48e7-b2b7-aba44b4f4...@free.de,
galla...@free.de wrote:

> Trying to install 9.0 release with a USB stick.
> I use FreeBSD-9.0-RELEASE-amd64-memstick.img
>
> At first the bootup looks promising, but in the end it stops with Root
> mount waiting for: usbus2 usbus1 usbus

The 9.0 memstick image worked (nearly[1]) flawlessly for me, albeit on
a different kind of device.  It would be really nice if it supported
installing directly onto ZFS, though, so that I could avoid doing the
install-on-UFS, copy-to-ZFS, pull-boot-drive-and-reboot,
reinsert-boot-drive-and-add-to-mirror business.  Getting
/boot/zfs/zpool.cache is really tricky with the new livefs
architecture -- I was able to install on a ZFS pool just fine, but I
wasted three hours trying to make it bootable before giving up and
doing it the old-fashioned way.

(That machine is now being tested as an NFS fileserver[2], getting the
snot beat out of it by our most abusive users.)

-GAWollman

[1] The mps driver in 9.0 is no good, and would not allow the system
to boot while even one drive shelf was hooked up.  Booting with
external drives disconnected, then connecting them in multiuser mode,
works fine, although it didn't see one of the SES targets.
After cherry-picking the new LSI-supported driver from 9-stable, it
boots fine and sees all of the SES devices.

[2] 
wollman@zfsnfs(9)$ uptime 
10:17PM  up 4 days, 13:07, 2 users, load averages: 8.54, 8.49, 8.72
wollman@zfsnfs(10)$ zpool status export
  pool: export
 state: ONLINE
  scan: scrub canceled on Thu Mar  1 15:22:42 2012
config:

        NAME                  STATE     READ WRITE CKSUM
        export                ONLINE       0     0     0
          raidz2-0            ONLINE       0     0     0
            multipath/disk2   ONLINE       0     0     0
            multipath/disk3   ONLINE       0     0     0
            multipath/disk4   ONLINE       0     0     0
            multipath/disk5   ONLINE       0     0     0
            multipath/disk6   ONLINE       0     0     0
            multipath/disk7   ONLINE       0     0     0
            multipath/disk8   ONLINE       0     0     0
            multipath/disk9   ONLINE       0     0     0
            multipath/disk10  ONLINE       0     0     0
            multipath/disk11  ONLINE       0     0     0
            multipath/disk12  ONLINE       0     0     0
          raidz2-1            ONLINE       0     0     0
            multipath/disk14  ONLINE       0     0     0
            multipath/disk16  ONLINE       0     0     0
            multipath/disk17  ONLINE       0     0     0
            multipath/disk19  ONLINE       0     0     0
            multipath/disk20  ONLINE       0     0     0
            multipath/disk21  ONLINE       0     0     0
            multipath/disk22  ONLINE       0     0     0
            multipath/disk23  ONLINE       0     0     0
            multipath/disk24  ONLINE       0     0     0
            multipath/disk25  ONLINE       0     0     0
            multipath/disk26  ONLINE       0     0     0
          raidz2-2            ONLINE       0     0     0
            multipath/disk27  ONLINE       0     0     0
            multipath/disk28  ONLINE       0     0     0
            multipath/disk29  ONLINE       0     0     0
            multipath/disk30  ONLINE       0     0     0
            multipath/disk31  ONLINE       0     0     0
            multipath/disk32  ONLINE       0     0     0
            multipath/disk33  ONLINE       0     0     0
            multipath/disk34  ONLINE       0     0     0
            multipath/disk35  ONLINE       0     0     0
            multipath/disk36  ONLINE       0     0     0
            multipath/disk37  ONLINE       0     0     0
          raidz2-3            ONLINE       0     0     0
            multipath/disk38  ONLINE       0     0     0
            multipath/disk39  ONLINE       0     0     0
            multipath/disk40  ONLINE       0     0     0
            multipath/disk41  ONLINE       0     0     0
            multipath/disk42  ONLINE       0     0     0
            multipath/disk43  ONLINE       0     0     0
            multipath/disk44  ONLINE       0     0     0
            multipath/disk45  ONLINE       0     0     0
            multipath/disk46  ONLINE       0     0     0
            multipath/disk47  ONLINE       0     0     0
            multipath/disk48  ONLINE       0     0     0
          raidz2-4            ONLINE       0     0     0
            multipath/disk49  ONLINE       0     0     0
            multipath/disk50  ONLINE       0     0     0
            multipath/disk52  ONLINE       0     0     0
            multipath/disk53  ONLINE       0     0     0
            multipath/disk54  ONLINE       0     0     0
            multipath/disk55  ONLINE       0     0     0
            multipath/disk56  ONLINE       0     0     0
            multipath/disk57  ONLINE       0     0     0
            multipath/disk58  ONLINE       0     0     0

Re: New BSD Installer

2012-02-16 Thread Garrett Wollman
In article 20120217021019.ga61...@icarus.home.lan,
Jeremy Chadwick <free...@jdc.parodius.com> wrote:

> So for version 0.90 of their metadata format, you lose drive capacity by
> about 64-128KBytes, given that the space is needed for metadata.

Which is exactly what geom_mirror does, amazingly enough.  (Except, of
course, that geom_mirror only needs one sector!)  See the function
mirror_label() in /usr/src/sbin/geom/class/mirror/geom_mirror.c.  So
if geom_mirror is not working with GPT, you should look elsewhere for
the culprit.

-GAWollman



Re: SCHED_ULE should not be the default

2011-12-12 Thread Garrett Wollman
In article 4ee6295b.3020...@cran.org.uk, brucecran.org.uk writes:

> This isn't something that can be fixed by tuning ULE? For example for
> desktop applications kern.sched.preempt_thresh should be set to 224 from
> its default.

Where do you get that idea?  I've never seen any evidence for this
proposition (although the claim is repeated often enough).  What are
the specific circumstances that make this useful?  Where did the
number come from?

-GAWollman



Re: SCHED_ULE should not be the default

2011-12-12 Thread Garrett Wollman
In article 4ee6595c.3080...@cran.org.uk, br...@cran.org.uk writes:
> On 12/12/2011 19:23, Garrett Wollman wrote:
>> Where do you get that idea?  I've never seen any evidence for this
>> proposition (although the claim is repeated often enough).  What are
>> the specific circumstances that make this useful?  Where did the
>> number come from?
>
> It's just something I've heard repeated, and people claiming that
> setting it improves performance.
>
> This explains how the value 224 was obtained:
> http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/058686.html

Not so far as I can see.

The message does suggest that it helps if you are running a CPU-hog
GUI, which seems plausible to me, but doesn't justify making it the
default -- particularly when the setting is undocumented.  (It appears
to control how CPU-bound a process can be and still preempt another
even more CPU-bound process, so using this as a desktop performance
fix looks doubly wrong.)

-GAWollman



Re: unable to pwd in ZFS snapshot

2010-12-26 Thread Garrett Wollman
In article alpine.osx.2.00.1012261912460.43...@hotlap.local,
sp...@bway.net writes:

> Other gotchas would be some of the periodic scripts - you don't want
> locate.updatedb traversing all that, or the setuid checks.

locate.updatedb in 9-current doesn't do that, by default.  Arguably
you want the setuid checks to do it, so that you're aware of setuid
executables that are buried in old snapshots -- particularly if you
keep old snapshots of /usr around after a security update.

> Also I know I'm prone to sometimes doing a brute-force find which
> can also dip into those hundreds of snapshot dirs.  In general, I
> think having the directories hidden is a good default.

I could see the logic in having find not descend into .zfs directories
by default (if done in a sufficiently general way), although then
you'd have to introduce a new flag ("yes, really, look at everything!")
for cases when that's not desirable.
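
With the find that exists today, the same effect can be had by pruning
explicitly.  A minimal sketch, assuming the snapshot directories are
all named .zfs; the function name find_skip_zfs is hypothetical.

```shell
#!/bin/sh
# List regular files under a directory without descending into any .zfs.
find_skip_zfs() {
    find "$1" -name .zfs -prune -o -type f -print
}

# Usage:
#   find_skip_zfs /usr
```

Adding a test such as -perm -4000 before -print would restrict the
listing to setuid files, and dropping the -prune clause brings back the
"look at everything" behavior.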

-GAWollman


Re: RFC: Upgrade BIND version in RELENG_7 to BIND 9.6.x

2010-12-18 Thread Garrett Wollman
In article 4d0c49a2.4000...@freebsd.org, do...@freebsd.org writes:

> In order to avoid repeating the scenario where we have a version of BIND
> in the base that is not supported by the vendor I am proposing that we
> upgrade to BIND 9.6-ESV in FreeBSD RELENG_7.

+1

All users are going to want working DNSsec soon, if they don't
already, and that requires 9.6.  (In fact, we should start shipping
with DNSsec enabled by default and the root key pre-configured, if we
aren't already doing so.)

-GAWollman

-- 
Garrett A. Wollman| What intellectual phenomenon can be older, or more oft
woll...@bimajority.org| repeated, than the story of a large research program
Opinions not shared by| that impaled itself upon a false central assumption
my employers. | accepted by all practitioners? - S.J. Gould, 1993


Re: /sbin/reboot

2010-12-09 Thread Garrett Wollman
In article aanlktikggsyrlnds6oihw2u3syjezrrqwdsa9z4t7...@mail.gmail.com, 
amvandem...@gmail.com writes:

> For the correct order, shutdown -r calls reboot which calls init which
> calls rc.shutdown.

No.  shutdown(8) sends a SIGINT to init(8), which runs rc.shutdown and
then calls reboot(2) as its last act.

reboot(8) freezes init(8), then sends a SIGTERM to anything left
running, then sends a SIGKILL to anything left running, then calls
reboot(2) as its last act.

> Doing a shutdown -r is the same as a reboot without the warning to logged in
> users and shutdown handles the logging instead of reboot.

Not even close.

-GAWollman



Re: umass causes panic on 7 amd64

2008-04-17 Thread Garrett Wollman
In article [EMAIL PROTECTED], [EMAIL PROTECTED] writes:

> Eh, I think I saw something like this myself.
> Do you by any chance have that new device sg in your kernel?
> I assume you do (GENERIC) - try to drop it.
> I am not sure if this is some brokenness of that driver or fighting of
> several USB drivers over the same hardware.

In my experience, umass over EHCI has never worked on any machine
ever, going back to 5.x and over multiple kinds of umass devices.  (I
never saw panics, only triple-fault CPU resets.)  My old workaround
was to disable EHCI in the kernel configuration, but in 7.0 umass just
doesn't work (reads corrupt data, even without going through a
filesystem).  The same device works fine in Windows.  (In fact that
was the workaround I had to resort to during a recent trip.)

FireWire has always worked for me, but consumer A/V devices don't come
with FireWire ports, and my 7.0 machine is a USB-only laptop.  (I can
plug a CF card into my FireWire CF reader and get valid data off it on
my 6.2 desktop, whereas the same CF device fails on the laptop under
7.0.  It worked when the laptop was running 6.2.)

-GAWollman


Re: Upgrading to 7.0 - stupid requirements

2008-03-22 Thread Garrett Wollman
In article [EMAIL PROTECTED],
Freddie Cash writes:

> Oh, gods, please, no!  That is one of the things I absolutely hate
> about Debian (and its derivatives).  There are some packages on Debian
> where they use separate text files for each configuration option
> (ProFTPd, for example).  It is a huge mess of directories and files
> that makes it a *royal* PITA to edit at the CLI.
>
> Yes, a scheme like that is better for GUI tools, but it really makes
> things more difficult for non-GUI users/uses (like headless servers
> managed via SSH).

Try managing a few hundred mostly-but-not-entirely-identical machines
and you really begin to appreciate the value of this approach.  It is
orders of magnitude easier to drop one file into the central config
repository that does *one thing* than it is to manage a dozen
not-quite-identical copies of a monolithic configuration file, keeping
in sync the parts that are supposed to be in sync, and keeping the
parts that are supposed to be different, different.
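
As a rough illustration of the one-file-per-setting approach, here is a
minimal Python sketch of how a program might assemble its configuration
from a directory of fragments.  The directory layout, the *.conf suffix,
the key = value syntax, and the load_fragments name are all assumptions
made for the example; this is not how ProFTPd or Debian actually does it.

```python
import os
import tempfile


def load_fragments(conf_dir):
    """Merge every *.conf fragment in conf_dir into one dict of settings.

    Fragments are read in sorted filename order, so a later file
    (e.g. 50-site.conf) overrides an earlier one (e.g. 10-base.conf).
    """
    settings = {}
    for name in sorted(os.listdir(conf_dir)):
        if not name.endswith(".conf"):
            continue
        with open(os.path.join(conf_dir, name)) as fh:
            for line in fh:
                # Drop trailing comments and surrounding whitespace.
                line = line.split("#", 1)[0].strip()
                if not line or "=" not in line:
                    continue
                key, value = line.split("=", 1)
                settings[key.strip()] = value.strip()
    return settings


if __name__ == "__main__":
    # Hypothetical fragments: a shared base plus one per-site override.
    with tempfile.TemporaryDirectory() as conf_dir:
        fragments = [("10-base.conf", "port = 21\n"),
                     ("50-site.conf", "port = 2121\n")]
        for name, text in fragments:
            with open(os.path.join(conf_dir, name), "w") as fh:
                fh.write(text)
        print(load_fragments(conf_dir))
```

With this shape, a machine that differs from its peers gets one extra
small file rather than its own copy of the whole configuration.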

If FreeBSD were able to do this, it might have a bit more traction at
my place of employment.

-GAWollman


Re: is localtime file binary compatible with older releases?

2007-02-27 Thread Garrett Wollman
In article [EMAIL PROTECTED] you write:
> Is the compiled time zone info file binary compatible across FreeBSD
> versions?

It is backwards-compatible across all versions of the Olson timezone
library going back to before there was a FreeBSD, on all platforms,
regardless of architecture (modulo bugs in libc).
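
The portability follows from the on-disk layout: the compiled file
starts with a fixed 44-byte header whose counts are stored as big-endian
32-bit integers, so no host byte order or structure padding leaks into
it.  A minimal Python sketch that decodes that header, assuming the
layout documented in tzfile(5); parse_tzif_header and the dictionary
keys are names invented here, and the demo uses a synthetic header
rather than a real zone file.

```python
import struct

# 4-byte magic, 1-byte version, 15 reserved bytes, six big-endian uint32 counts.
TZIF_HEADER = struct.Struct(">4sc15s6I")


def parse_tzif_header(data):
    """Decode the fixed 44-byte header at the start of a TZif file."""
    if len(data) < TZIF_HEADER.size:
        raise ValueError("too short to be a TZif file")
    magic, version, _reserved, *counts = TZIF_HEADER.unpack_from(data)
    if magic != b"TZif":
        raise ValueError("missing TZif magic")
    names = ("isutcnt", "isstdcnt", "leapcnt", "timecnt", "typecnt", "charcnt")
    header = dict(zip(names, counts))
    header["version"] = version  # b"\x00" for version 1, else b"2", b"3", ...
    return header


if __name__ == "__main__":
    # Synthetic header for illustration; a real one would come from
    # reading the first 44 bytes of a compiled zone file.
    fake = b"TZif" + b"2" + b"\0" * 15 + struct.pack(">6I", 0, 0, 0, 4, 2, 8)
    print(parse_tzif_header(fake))
```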

-GAWollman



Re: send() returns error even though data is sent, TCP connection still alive

2007-01-31 Thread Garrett Wollman
In article [EMAIL PROTECTED],
Jeff Davis  [EMAIL PROTECTED] wrote:

> You should see something like "write failed: host is down" and the
> session will terminate. Of course, when ssh exits, the TCP connection
> closes. The only way to see that it's still open and active is by
> writing (or using) an application that ignores EHOSTDOWN errors from
> write().

I agree that it's a bug.  The only time write() on a stream socket
should return the asynchronous error[1] is when the connection has
been (or is in the process of being) torn down as a result of a
subsequent timeout.  POSIX says that write() and send() on sockets
"may fail" with these errors.

-GAWollman

[1] There are two kinds of error returns in the socket model:
synchronous errors, like synchronous signals, are attributed to the
result of a specific system call, detected prior to syscall return,
and usually represent programming or user error (e.g., attempting to
connect() on an fd that is not a socket).  Asynchronous errors are
detected asynchronously, and merely posted to the socket without being
delivered; they may be delivered on the next socket operation.  See
XSH 2.10.10, Pending Error.
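
An application can look at the pending error described in this footnote
through the SO_ERROR socket option, which returns the posted error and
clears it.  A minimal Python sketch, assuming a POSIX-style sockets API;
take_pending_error and describe are names invented here, and the
socketpair in the demo has nothing posted, so it reports no error.

```python
import os
import socket


def take_pending_error(sock):
    """Return the error posted to the socket (0 if none) and clear it."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)


def describe(err):
    """Human-readable form of an errno value from take_pending_error()."""
    return "no pending error" if err == 0 else os.strerror(err)


if __name__ == "__main__":
    a, b = socket.socketpair()
    try:
        print(describe(take_pending_error(a)))
    finally:
        a.close()
        b.close()
```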



Re: Possibility for FreeBSD 4.11 Extended Support

2006-12-21 Thread Garrett Wollman
[EMAIL PROTECTED] writes:

> -5.x was never really for production use, in the same way 3.x never
> was.

Why do people continue to say this?  Many sites have used, are still
using, and plan to continue to use, 5.x in production.  ftp5/cvsup3
ran 5.x until a few months ago, and I have a netnews transit server
and a Web server that still run 5.5.  I'm slowly moving things off 5.x
for the better support and performance of 6.x, but it's been stable
for me in two fairly tough production applications for quite some
time.

-GAWollman

-- 
Garrett A. Wollman   | The real tragedy of human existence is not that we are
[EMAIL PROTECTED]    | nasty by nature, but that a cruel structural asymmetry
Opinions not those   | grants to rare events of meanness such power to shape
of MIT or CSAIL.     | our history. - S.J. Gould, Ten Thousand Acts of Kindness


Re: new zoneinfo for 5.5-R?

2006-03-23 Thread Garrett Wollman
In [EMAIL PROTECTED], Steve Ames writes:

> For us poor saps in Indiana... could we get a new zoneinfo port
> prior to 5.5-R?

Sorry, I've been unable to devote any attention at all to FreeBSD in
the past three months or so.  I'm hoping to clear the backlog soon,
but I don't think that I'll be able to do it before the release as I
had originally planned.  You can always drop in the new tzdata files
on your existing system.  (I'm hoping at some point in the near future
to create a port so that you don't have to update your system to get
the latest tzdata.)

-GAWollman

-- 
Garrett A. Wollman | As the Constitution endures, persons in every
[EMAIL PROTECTED]  | generation can invoke its principles in their own
Opinions not those | search for greater freedom.
of MIT or CSAIL.   | - A. Kennedy, Lawrence v. Texas, 539 U.S. 558 (2003)


Brief Report: Acer AS5002LMi notebook with FreeBSD

2005-12-03 Thread Garrett Wollman
On Sat, 3 Dec 2005 11:36:28 +0000 (GMT), Robert Watson [EMAIL PROTECTED] said:

> Last week, I ordered myself a new notebook -- for reasons of price, stock,
> features, etc, I went with the Lenovo z60t 2511.

Hmmm.  Last week, I ordered myself a new laptop -- for reasons of
price, availability, minimal Intel proprietary parts content, etc., I
went with the Acer Aspire 5002LMi.

> comes with a 14" display (1280x768 widescreen), 1.86GHz
> Pentium M, 512M of memory, and a 60GB hard disk.

14-inch display (1024x768 normalscreen), 1.6-GHz Turion64, 512 Mbyte
of memory, and an 80-Gbyte hard disk.

> It also has hardware fingerprint scanner, bluetooth, SD card reader,
> broadcom gig-e, firewire, atheros 802.11 chipset, and various other
> neat things.

Mine doesn't have any of those neat things.  The chipset is SiS,
including Fast Ethernet (no Gig), USB2, and dual-screen video.  One
CardBus slot, no legacy ports (but the internal keyboard and touchpad
are PS/2, not USB).  All of the other Acer non-Centrino laptops I had
been able to examine came with Atheros wireless, but this one is
Broadcom.  I haven't tried Project Evil yet; since I had a Cardbus
wireless card available I used that.  There are other builds of this
platform that include Bluetooth support; mine has a button on the
front panel for it but lacks the interface.

> After chatting with Bjoern Zeeb bz, I concluded that I would leave XP on
> the notebook, as well as the IBM maintenance partition.  The advantage to
> keeping these around is that it makes it possible to pick up useful things
> like BIOS updates, and possibly makes getting support easier when things
> go horribly wrong (hasn't happened yet).

Sometimes I need to test/run Windows software, so there was no
question for me.  The machine shipped with XP Pro.  Acer was
considerate and partitioned the drive into two logical volumes (repeat
after me, C: is for Crap and D: is for Data).  This machine required
a BIOS upgrade in order to fix a problem with the Synaptics touchpad.
I was able to tell Windows that I really didn't want the extra
partition, and then I was able to install 6.0 over the network on the
back half of the disk.  The whole install (including KDE) took less
than an hour and was probably the smoothest install I've ever done
using sysinstall.

> (2) X.org 6.8.2 used the VESA video mode 1024x768.

I had interesting video issues with this laptop.  The video mode used
by Windows for 1024x768 made the projector I was testing with deeply
unhappy.  X worked just fine standalone, but if I started it with the
projector connected, it used a weird 1024x576 wide-screen mode which
neither I nor the projector cared much for.  After overriding X's idea
of "Generic VGA Monitor" sync rates, I was able to get 1024x768 to
work on both the LCD and the projector.

> (4) When I loaded the if_ath driver to use the wireless, I got an NMI
>     and panic.

Didn't happen for me.  I did find that it mattered whether I
specifically assigned an SSID to the wireless; when I did not do so, I
was unable to communicate with my infrastructure network.  I won't
have time to debug this before I leave for San Diego on Monday.

> At the end of the day, I have most things working with this notebook
> except for the following:
>
> - I don't yet have the sound driver attaching.

This card has an SiS 7012, which attaches to the snd_ich driver.  On
attach it reports "Unknown AC97 Codec (id = 0x414c4770)" but this does
not seem to cause a failure.  From my research this seems to be an
Avance Logic ALC203; I just added it to ac97.c but haven't had a
chance to test it yet.
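
The id can be picked apart by hand.  A minimal Python sketch, assuming
the usual AC'97 convention that the 32-bit codec id is three ASCII
vendor characters followed by a one-byte device/revision code;
decode_ac97_id is a name invented here and is not part of any driver.

```python
def decode_ac97_id(codec_id):
    """Split a 32-bit AC'97 codec id into (vendor characters, low byte)."""
    # The top three bytes are read as ASCII; the low byte is left as a number.
    vendor_bytes = ((codec_id >> 8) & 0xFFFFFF).to_bytes(3, "big")
    return vendor_bytes.decode("ascii"), codec_id & 0xFF


if __name__ == "__main__":
    vendor, low = decode_ac97_id(0x414C4770)
    print(vendor, hex(low))  # prints: ALG 0x70
```

The "AL" prefix is at least consistent with an Avance Logic part.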

> - Suspend/resume was pretty sad, I don't advise trying it.  I may get a
>   chance to investigate this more while on travel over the next few weeks.

ACPI seems to be somewhat broken on this machine, even after updating
to the latest BIOS.  The CPU-state information looks bogus (and
telling the kernel to use C3 is asking for trouble).  Suspend doesn't.
There's an error in the \_SB_.BAT1._BST method so it can't get the
battery status.  Dump available on request.

-GAWollman



Re: Good, stable gigabit nic?

2005-10-08 Thread Garrett Wollman
In article [EMAIL PROTECTED], [EMAIL PROTECTED] writes:

> My gigabit nic has gone bad (after months of working just fine, it's
> saying "sk0 watchdog timeout" after a day or so of operation - temp
> fix is to reboot) - I'm looking for pointers for a low cost but
> functional gigabit PCI 32 card that runs under 5.4 without issues.
> What works for you?

I have machines with Tigon II (ti), Marvell 88E8001 (sk), i82540 (em),
RTL8169 (re), and BCM5750 (bge) interfaces.  The only ones I've ever
had problems with were the Tigon and Marvell, and the latter was fixed
nearly a year ago.  All except the Tigon and Broadcom are running
under 5-STABLE; the Broadcom is running 6.0b5.

I can't comment on the Level I, Nominal Semidestructor, or VIA chips
as I've never seen any.

-GAWollman

-- 
Garrett A. Wollman | As the Constitution endures, persons in every
[EMAIL PROTECTED]  | generation can invoke its principles in their own
Opinions not those | search for greater freedom.
of MIT or CSAIL.   | - A. Kennedy, Lawrence v. Texas, 539 U.S. 558 (2003)


Re: 5-STABLE softupdates issue?

2004-11-24 Thread Garrett Wollman
On Wed, 24 Nov 2004 20:36:09 +0100, Matthias Andree [EMAIL PROTECTED] said:

> I posted about softupdate problems on a SCSI system with DISABLED WRITE
> CACHE, on a somewhat flakey Micropolis drive that froze and caused

If the hardware is broken, all bets are off, soft updates or no.

-GAWollman



Re: POSIX_C_SOURCE

2003-08-30 Thread Garrett Wollman
In article [EMAIL PROTECTED] you write:

> Any chance that someone will finally commit the fixes to prevent the
> POSIX_C_SOURCE warnings from showing up? I saw a number of posts on this
> topic, but it still seems like it's not officially committed
>
> /usr/include/sys/cdefs.h:273: warning: `_POSIX_C_SOURCE' is not defined
> /usr/include/sys/cdefs.h:279: warning: `_POSIX_C_SOURCE' is not defined

The warnings are wrong,[1] so you should probably ask the GCC people
about that.

-GAWollman

[1] That is to say, any identifier used in a preprocessor expression
(after macro expansion) is defined to have a value of zero, and GCC
should not be complaining about this.



Re: FreeBSD Security Advisory FreeBSD-SA-03:04.sendmail

2003-03-03 Thread Garrett Wollman
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

In article [EMAIL PROTECTED] Barney Wolff writes:

> As of 13:06 EST, the commits had made it to head but were NOT tagged
> with RELENG_4 or RELENG_5_0 from cvsup3.

cvsup3 updates every hour at 15 after.  I'm afraid you were just
unlucky.

- -GAWollman

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.0.7 (FreeBSD)

iD8DBQE+Y6adI+eG6b7tlG4RAm8mAJ9zvDTk24BAwUdcPCyOgunxCaVTTwCfZG3s
4XdMunELySmG5NpUTrOuMnA=
=Cw+h
-END PGP SIGNATURE-

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-stable in the body of the message


Re: cvsup Problem?

2002-08-15 Thread Garrett Wollman

In article mit.lcs.mail.freebsd-stable/[EMAIL PROTECTED] 
you write:
/usr/local/etc/cvsup/prefixes/FreeBSD.cvs/src/contrib/sendmail/BuildTools/OS/A-UX,v:

randy Is anyone else getting this message, and is it being fixed?

I've been told it's only on cvsup3.  Try changing servers until it is
resolved.

And did anybody bother to actually *tell* the admin of cvsup3?

(In case it wasn't clear, that was a rhetorical question.)

-GAWollman

-- 
Garrett A. Wollman    | [G]enes make enzymes, and enzymes control the rates of
[EMAIL PROTECTED]     | chemical processes.  Genes do not make ``novelty-
Opinions not those of | seeking'' or any other complex and overt behavior.
MIT, LCS, CRS, or NSA | - Stephen Jay Gould (1941-2002)




Re: FreeBSD 4.4 upcoming release/timetabling/mirroring

2001-09-10 Thread Garrett Wollman

On Mon, 10 Sep 2001 12:47:32 -0700, Jordan Hubbard [EMAIL PROTECTED] said:

> No, I didn't miss it.  You and Jason are simply asking for information
> I don't have to give you and you're not going to get.

You are trying to tell me you are incapable of running `du -s | mail
freebsd-hubs' before copying the files onto ftp-master?!

You have no place casting aspersions on other people's competence,
then.

> Add to this the fact that 4.4-RELEASE will be the first one
> to incorporate 5 ISO images by default

...which we, and I suspect most mirrors, absolutely WILL NOT carry.

-GAWollman





Re: Problems with mergemaster

2001-06-13 Thread Garrett Wollman

In article 
mit.lcs.mail.freebsd-stable/045401c0f426$4a45d900$[EMAIL PROTECTED] you 
write:

>    *** Problem installing ./foo , it will remain to merge by hand

That's because someone(tm) removed the `-c' options to `install' from
the -stable version of mergemaster, even though the -stable version of
`install' still needs them.

-GAWollman

-- 
Garrett A. Wollman    | O Siem / We are all family / O Siem / We're all the same
[EMAIL PROTECTED]     | O Siem / The fires of freedom
Opinions not those of | Dance in the burning flame
MIT, LCS, CRS, or NSA | - Susan Aglukark and Chad Irschick




Re: sendmail traffic analysis

2001-05-11 Thread Garrett Wollman

In article 20010511172756$[EMAIL PROTECTED] you write:

> I'm not sure how to monitor sendmail with SNMP, and would be
> interested in hearing from others what tools they use to monitor SMTP
> traffic on their FreeBSD systems.

Depends on what and how you want to monitor.  For BIND, I wrote a
little script that stuffs those annoying statistics dumps into an
RRD.  You could conceivably do the same thing with sendmail, although
you would have to collect your own stats by analyzing the log files.

The one place where we use SNMP to monitor sendmail is by using
ucd-snmp's process monitoring feature.  We then use Cricket to monitor
the number of sendmails active.  While this is statistically invalid
(because cricket measures every five minutes exactly) it still gives
us a useful look at what's going on.  (I just looked at my long-term
cricket graphs and learned something which was totally new to me:
there seems to be a new outbreak of the Hybris worm on the seventh day
of every month, although the population seems to have peaked last
February.)
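
The process-count measurement amounts to something like the following
Python sketch, which counts processes by command name from text in the
format printed by `ps -A -o comm`.  It only approximates what ucd-snmp's
process monitoring reports; the count_processes name and the sample
listing are assumptions made for the example.  A poller such as Cricket
would record this figure at each fixed interval.

```python
import os


def count_processes(ps_output, name):
    """Count lines of `ps -o comm` style output whose basename is name."""
    count = 0
    for line in ps_output.splitlines():
        command = line.strip()
        if command and os.path.basename(command) == name:
            count += 1
    return count


if __name__ == "__main__":
    # Canned listing for illustration; in practice this text would be
    # the captured output of `ps -A -o comm`.
    listing = "COMMAND\ninit\nsendmail\nsendmail\nsshd\n"
    print(count_processes(listing, "sendmail"))
```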

-GAWollman

-- 
Garrett A. Wollman    | O Siem / We are all family / O Siem / We're all the same
[EMAIL PROTECTED]     | O Siem / The fires of freedom
Opinions not those of | Dance in the burning flame
MIT, LCS, CRS, or NSA | - Susan Aglukark and Chad Irschick
