Re: mod_fcgid doesn't work in 9-stable jails after upgrade from 8.x

2012-12-20 Thread Attila Nagy

I could finally take the time to look into this, so here's the solution:
Setting jail_sysvipc_allow="YES" in rc.conf is no longer enough. It only
sets the sysctl; the new rc.d/jail script doesn't add the
allow.sysvipc=1 parameter.

So to make it work, you must change the above to (with jailname replaced
by the name of your jail):
jail_jailname_parameters="allow.sysvipc=1"
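With several jails, the per-jail lines can be generated instead of typed by hand. The following is a minimal sketch of a hypothetical migration helper (not part of FreeBSD): it reads rc.conf-style text and, only if the old jail_sysvipc_allow="YES" knob is set, emits one jail_<name>_parameters line per jail in jail_list with allow.sysvipc=1 appended to any existing parameters. It assumes plain name="value" lines and does not understand full sh syntax.

```python
import re

# Matches simple sh assignments such as jail_list="www db" (no expansions).
ASSIGN = re.compile(r'^\s*([A-Za-z_][A-Za-z0-9_]*)="([^"]*)"\s*$')


def sysvipc_parameter_lines(rc_conf: str) -> list[str]:
    """Hypothetical helper: emit jail_<name>_parameters lines with allow.sysvipc=1.

    Assumes every jail is listed in jail_list and that rc.conf only uses
    plain name="value" lines; anything fancier is ignored.
    """
    values = {}
    for line in rc_conf.splitlines():
        m = ASSIGN.match(line)
        if m:
            values[m.group(1)] = m.group(2)  # later assignments win, as in sh

    # Only migrate hosts that relied on the old global knob.
    if values.get("jail_sysvipc_allow", "").upper() != "YES":
        return []

    lines = []
    for name in values.get("jail_list", "").split():
        key = f"jail_{name}_parameters"
        params = values.get(key, "").split()  # keep any existing parameters
        if "allow.sysvipc=1" not in params:
            params.append("allow.sysvipc=1")
        lines.append(f'{key}="{" ".join(params)}"')
    return lines


if __name__ == "__main__":
    example = 'jail_enable="YES"\njail_list="www"\njail_sysvipc_allow="YES"\n'
    print("\n".join(sysvipc_parameter_lines(example)))
```

The output lines are meant to be reviewed and pasted into rc.conf; the helper never edits the file itself.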

PS: it's not specific to 9; stable/8's rc.d/jail has the same new-style
jail invocation and hence the same problem.


On 11/04/2012 07:47 PM, Attila Nagy wrote:

Hi,

I've just tried to upgrade a machine running an older 8-stable to 
9-stable@r242549M without success.
It runs Apache with mod_fcgid in a jail, and the latter can't start,
failing with this error message:
[Sun Nov 04 16:09:12 2012] [emerg] (78)Function not implemented: mod_fcgid: Can't create shared memory for size 1192488 bytes


security.jail.sysvipc_allowed is enabled (it was needed on 8.x too);
nothing else has changed.


There are some reports of this, but for earlier versions, and the
only confirmed solution was sysvipc_allowed, which works on 8.x but
not with the above version.


Any ideas?


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


mod_fcgid doesn't work in 9-stable jails after upgrade from 8.x

2012-11-04 Thread Attila Nagy

Hi,

I've just tried to upgrade a machine running an older 8-stable to 
9-stable@r242549M without success.
It runs Apache with mod_fcgid in a jail, and the latter can't start,
failing with this error message:
[Sun Nov 04 16:09:12 2012] [emerg] (78)Function not implemented: mod_fcgid: Can't create shared memory for size 1192488 bytes


security.jail.sysvipc_allowed is enabled (it was needed on 8.x too);
nothing else has changed.


There are some reports of this, but for earlier versions, and the
only confirmed solution was sysvipc_allowed, which works on 8.x but
not with the above version.


Any ideas?


mpt doesn't propagate read errors and dies on a single sector?

2012-10-20 Thread Attila Nagy

Hi,

I have a Sun X4540 with LSI C1068E based SAS controllers (FW version: 
1.27.02.00-IT).
My problem is that if one drive starts to fail with read errors, the machine
(running stable/9 with ZFS) becomes completely unusable because, it
seems, ZFS can't see that there are read errors on the device: the mpt
driver (controller, kernel?) keeps re-issuing the operation endlessly.


Here is a verbose (dev.mpt.0.debug=7 level) dump:
mpt0: Address Reply:
SCSI IO Request Reply @ 0xff87ffcfdc00
IOC Status    Success
IOCLogInfo    0x
MsgLength     0x09
MsgFlags      0x00
MsgContext    0x000200eb
Bus:          0
TargetID      3
CDBLength     10
SCSI Status:  Check Condition
SCSI State:   (0x0001)AutoSense_Valid
TransferCnt   0x2
SenseCnt      0x0012
ResponseInfo  0x
(da3:mpt0:0:3:0): READ(10). CDB: 28 0 3a 38 5d e 0 1 0 0
(da3:mpt0:0:3:0): CAM status: SCSI Status Error
(da3:mpt0:0:3:0): SCSI status: Check Condition
(da3:mpt0:0:3:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
(da3:mpt0:0:3:0): Info: 0x3a385d1a
(da3:mpt0:0:3:0): Error 5, Unretryable error
SCSI IO Request @ 0xff80003046f0
Chain Offset       0x00
MsgFlags           0x00
MsgContext         0x000200ea
Bus:               0
TargetID           3
SenseBufferLength  32
LUN:               0x0
Control            0x0200  READ  SIMPLEQ
DataLength         0x0002
SenseBufAddr       0x0c65d5e0
CDB[0:10]          28 00 3a 38 5e 0e 00 01 00 00
SE64 0xff87ffd1c430: Addr=0x00010e858000 FlagsLength=0xd302
                     64_BIT_ADDRESSING LAST_ELEMENT END_OF_BUFFER END_OF_LIST
mpt0: Address Reply:
SCSI IO Request Reply @ 0xff87ffcfdd00
IOC Status    Success
IOCLogInfo    0x
MsgLength     0x09
MsgFlags      0x00
MsgContext    0x000200ea
Bus:          0
TargetID      3
CDBLength     10
SCSI Status:  Check Condition
SCSI State:   (0x0001)AutoSense_Valid
TransferCnt   0x2
SenseCnt      0x0012
ResponseInfo  0x
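To see which sectors the drive is failing on, the READ(10) CDBs in the dump can be decoded: per the SCSI Block Commands layout, byte 0 is the opcode (0x28), bytes 2-5 hold the big-endian LBA and bytes 7-8 the transfer length in blocks. A minimal Python sketch, assuming the byte strings are copied as CAM prints them (leading zeros dropped):

```python
def decode_read10(cdb_text: str) -> tuple[int, int]:
    """Decode a SCSI READ(10) CDB into (LBA, transfer length in blocks).

    Accepts bytes as CAM prints them, e.g. "28 0 3a 38 5d e 0 1 0 0",
    where leading zeros may be missing.
    """
    cdb = bytes(int(b, 16) for b in cdb_text.split())
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return lba, blocks


if __name__ == "__main__":
    lba, blocks = decode_read10("28 0 3a 38 5d e 0 1 0 0")
    bad = 0x3a385d1a  # "Info:" field from the sense data
    print(hex(lba), blocks, lba <= bad < lba + blocks)
```

For the first request this gives LBA 0x3a385d0e and 256 blocks, so, assuming the Info field carries the failing LBA (as it normally does for MEDIUM ERROR sense data), the unreadable sector 0x3a385d1a lies inside that request; the second CDB (28 00 3a 38 5e 0e ...) covers the adjacent 256-block range.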

And I get these Check Condition SCSI errors endlessly. If ZFS is enabled
at boot, the machine can't even start because of this (zpool import
never finishes); if I boot without ZFS and try to import, the zpool
command gets stuck in the vdev_g state:

 1163 root  1  200 35440K  5200K vdev_g  6   0:01 0.10% zpool
procstat -k 1163
  PID    TID COMM     TDNAME   KSTACK
 1163 100116 zpool    -        mi_switch
sleepq_timedwait _sleep biowait vdev_geom_read_guid vdev_geom_open 
vdev_open vdev_open_children vdev_raidz_open vdev_open 
vdev_open_children vdev_root_open vdev_open spa_load spa_tryimport 
zfs_ioc_pool_tryimport zfsdev_ioctl devfs_ioctl_f


Could it be that GEOM/ZFS doesn't receive this read error and waits 
indefinitely for the command to complete?




Re: Fatal trap 19, Stopped at bge_init_locked+ and bge booting problems

2012-02-22 Thread Attila Nagy

On 02/23/12 21:44, YongHyeon PYUN wrote:

I have to ask Broadcom for more information about the controller.
Not sure whether I can get some hint at this moment though. :-(
Is there anything I can do? I ask because I have to give this
server back very soon.


Given that you also have USB related errors, could you completely
remove bge(4) from your kernel and see whether it can successfully
boot up?
I think you can add the following entries to /boot/device.hints
without rebuilding the kernel.

hint.bge.0.disabled="1"
hint.bge.1.disabled="1"
hint.bge.2.disabled="1"
hint.bge.3.disabled="1"

This does not help.
Removing bge makes it stop here:
da0 at ciss0 bus 0 scbus0 target 0 lun 0
da0:  Fixed Direct Access SCSI-5 device
da0: 135.168MB/s transfers
da0: Command Queueing enabled
da0: 286070MB (585871964 512 byte sectors: 255H 32S/T 65535C)
panic: bootpc_init: no eligible interfaces
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kdb_backtrace() at kdb_backtrace+0x37
panic() at panic+0x187
bootpc_init() at bootpc_init+0x1205
mi_startup() at mi_startup+0x77
btext() at btext+0x2c
KDB: enter: panic
[ thread pid 0 tid 10 ]
Stopped at  kdb_enter+0x3b: movq $0,0x976972(%rip)
db>

Which is completely OK, because there are really no interfaces to boot 
from. Note that there is no NMI either (maybe because it would happen 
later in the initialization process).

Sadly, I can't boot from disk, but I assume it would work.


Re: Fatal trap 19, Stopped at bge_init_locked+ and bge booting problems

2012-02-22 Thread Attila Nagy

On 02/23/12 05:15, YongHyeon PYUN wrote:

bge0:  mem
0xf6bf-0xf6bf,0xf6be-0xf6be,0xf6bd-0xf6bd irq 32
at device 0.0 on pci3
bge0: CHIP ID 0x05719001; ASIC REV 0x5719; CHIP REV 0x57190; PCI-E
 ^^

This controller is a new one. Probably BCM5719 A1, but I'm not sure.

Yes, it's in a new machine.




bge0: Try again

This message indicates your controller has ASF/IPMI firmware.
Try disabling ASF and see whether it makes any difference.
(Set the hw.bge.allow_asf tunable to 0.)

Oh, I always forget that (it is set on the other machines).
This is what I get with:
machdep.panic_on_nmi: 0
machdep.kdb_on_nmi: 0
hw.bge.allow_asf: 0

bge0:  mem 
0xf6bf-0xf6bf,0xf6be-0xf6be,0xf6bd-0xf6bd irq 32 
at device 0.0 on pci3

bge0: CHIP ID 0x05719001; ASIC REV 0x5719; CHIP REV 0x57190; PCI-E
bge0: Try again
miibus0:  on bge0
ukphy0:  PHY 1 on miibus0
ukphy0:  none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 
1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, 
auto-flow

bge0: Ethernet address: 3c:4a:92:b2:3c:08
pci0:3:0:1: failed to read VPD data.
bge1:  mem 
0xf6bc-0xf6bc,0xf6bb-0xf6bb,0xf6ba-0xf6ba irq 36 
at device 0.1 on pci3

bge1: CHIP ID 0x05719001; ASIC REV 0x5719; CHIP REV 0x57190; PCI-E
miibus1:  on bge1
brgphy0:  PHY 2 on miibus1
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow

bge1: Ethernet address: 3c:4a:92:b2:3c:09
pci0:3:0:2: failed to read VPD data.
bge2:  mem 
0xf6b9-0xf6b9,0xf6b8-0xf6b8,0xf6b7-0xf6b7 irq 32 
at device 0.2 on pci3

bge2: CHIP ID 0x05719001; ASIC REV 0x5719; CHIP REV 0x57190; PCI-E
miibus2:  on bge2
brgphy1:  PHY 3 on miibus2
brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow

bge2: Ethernet address: 3c:4a:92:b2:3c:0a
pci0:3:0:3: failed to read VPD data.
bge3:  mem 
0xf6b6-0xf6b6,0xf6b5-0xf6b5,0xf6b4-0xf6b4 irq 36 
at device 0.3 on pci3

bge3: CHIP ID 0x05719001; ASIC REV 0x5719; CHIP REV 0x57190; PCI-E
miibus3:  on bge3
brgphy2:  PHY 4 on miibus3
brgphy2:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow

bge3: Ethernet address: 3c:4a:92:b2:3c:0b
[...]
da0: 286070MB (585871964 512 byte sectors: 255H 32S/T 65535C)
NMI ISA 60, EISA ff
I/O channel check, likely hardware failure.Sending DHCP Discover packet 
from interface bge0 (3c:4a:92:b2:3c:08)

cd0 at ata3 bus 0 scbus3 target 0 lun 0
cd0:  Removable CD-ROM SCSI-0 device
cd0: 150.000MB/s transfers (SATA 1.x, UDMA5, ATAPI 12bytes, PIO 8192bytes)
cd0: Attempt to query device size failed: NOT READY, Medium not present 
- tray closed

bge0: 11 link states coalesced
bge0: link state changed to DOWN
ugen0.2:  at usbus0
uhub3:  
on usbus0

bge1: 5 link states coalesced
bge1: link state changed to DOWN
bge2: link state changed to DOWN
bge3: link state changed to DOWN
bge0: ugen2.2:  at usbus2
uhub4:  
on usbus2

2 link states coalesced
bge0: link state changed to DOWN
bge1: 4 link states coalesced
bge1: link state changed to DOWN
bge0: 4 link states coalesced
bge0: link state changed to DOWN
Sending DHCP Discover packet from interface bge1 (3c:4a:92:b2:3c:09)
uhub3: 6 ports with 6 removable, self powered
bge0: usb_alloc_device: set address 2 failed (USB_ERR_TIMEOUT, ignored)
6 link states coalesced
bge0: link state changed to DOWN
bge1: 2 link states coalesced
bge1: link state changed to DOWN
Sending DHCP Discover packet from interface bge2 (3c:4a:92:b2:3c:0a)
bge0: 2 link states coalesced
bge0: link state changed to DOWN
bge1: usbd_setup_device_desc: getting device descriptor at addr 2 
failed, USB_ERR_TIMEOUT

10 link states coalesced
bge1: link state changed to DOWN
uhub4: 8 ports with 8 removable, self powered
bge0: 4 link states coalesced
bge0: link state changed to DOWN
bge1: 2 link states coalesced
bge1: link state changed to DOWN
Sending DHCP Discover packet from interface bge3 (3c:4a:92:b2:3c:0b)
bge0: 2 link states coalesced
bge0: link state changed to DOWN
bge1: 2 link states coalesced
bge1: link state changed to DOWN
ugen2.3:  at usbus2
uhub5:  
on usbus2

bge0: watchdog timeout -- resetting
bge0: link state changed to UP
usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_TIMEOUT, 
ignored)

bge1: 2 link states coalesced
bge1: link state changed to DOWN
uhub5: 2 ports with 1 removable, self powered
bge0: link state changed to DOWN
bge1: watchdog timeout -- resetting
bge1: usbd_setup_device_desc: getting device descriptor at addr 2 
failed, USB_ERR_TIMEOUT

2 link states coalesced
bge1: link state changed to DOWN
bge0: 2 link states coalesced
bge0: link state changed to DOWN
ugen1.2:  at usbus1
ukbd0:  on usbus1
kbd2 at ukbd0
ums0:  on usbus1
bge1: 4 link states coalesced
bge1: link state changed to DOWN
bge0: 

Fatal trap 19, Stopped at bge_init_locked+ and bge booting problems

2012-02-22 Thread Attila Nagy

Hi,

I get this on a recent stable/9 system with uhci support removed from 
the kernel config:

da0 at ciss0 bus 0 scbus0 target 0 lun 0
da0:  Fixed Direct Access SCSI-5 device
da0: 135.168MB/s transfers
da0: Command Queueing enabled
da0: 286070MB (585871964 512 byte sectors: 255H 32S/T 65535C)
cd0 at ata3 bus 0 scbus3 target 0 lun 0
cd0:  Removable CD-ROM SCSI-0 device
cd0: 150.000MB/s transfers (SATA 1.x, UDMA5, ATAPI 12bytes, PIO 8192bytes)
cd0: Attempt to query device size failed: NOT READY, Medium not present 
- tray closed

NMI ISA 70, EISA ff
I/O channel check, likely hardware failure.

Fatal trap 19: non-maskable interrupt trap while in kernel mode
cpuid = 0; apic id = 00
instruction pointer = 0x20:0x804543fb
stack pointer   = 0x28:0x81251e40
frame pointer   = 0x28:0x814cf660
code segment        = base 0x0, limit 0xf, type 0x1b
                    = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, IOPL = 0
current process = 0 (swapper)
[ thread pid 0 tid 10 ]
Stopped at  bge_init_locked+0x233b: movl 0x81c(%rsi),%eax
db>

and this with a plain GENERIC kernel:
da0 at ciss0 bus 0 scbus0 target 0 lun 0
da0:  Fixed Direct Access SCSI-5 device
da0: 135.168MB/s transfers
da0: Command Queueing enabled
da0: 286070MB (585871964 512 byte sectors: 255H 32S/T 65535C)
cd0 at ata3 bus 0 scbus3 target 0 lun 0
cd0:  Removable CD-ROM SCSI-0 device
cd0: 150.000MB/s transfers (SATA 1.x, UDMA5, ATAPI 12bytes, PIO 8192bytes)
cd0: Attempt to query device size failed: NOT READY, Medium not present 
- tray closed

NMI ISA 70, EISA ff
I/O channel check, likely hardware failure.

Fatal trap 19: non-maskable interrupt trap while in kernel mode
cpuid = 0; apic id = 00
instruction pointer = 0x20:0x80711dc5
stack pointer   = 0x28:0x81272040
frame pointer   = 0x28:0xff907cf44b40
code segment        = base 0x0, limit 0xf, type 0x1b
                    = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, IOPL = 0
current process = 12 (irq16: uhci0)
[ thread pid 12 tid 100098 ]
Stopped at  uhci_interrupt+0x65: movzwl %ax,%eax
db> KDB: stack backtrace:
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kdb_backtrace() at kdb_backtrace+0x37
mi_switch() at mi_switch+0x27a
turnstile_wait() at turnstile_wait+0x1cb
_mtx_lock_sleep() at _mtx_lock_sleep+0xb0
ukbd_poll() at ukbd_poll+0xbe
kbdmux_poll() at kbdmux_poll+0x3f
sc_cngetc() at sc_cngetc+0xec
cncheckc() at cncheckc+0x4a
cngetc() at cngetc+0x1c
db_readline() at db_readline+0x77
db_read_line() at db_read_line+0x15
db_command_loop() at db_command_loop+0x38
db_trap() at db_trap+0x89
kdb_trap() at kdb_trap+0x101
trap_fatal() at trap_fatal+0x29d
trap() at trap+0x10a
nmi_calltrap() at nmi_calltrap+0x8
--- trap 0x13, rip = 0x80711dc5, rsp = 0x81272040, rbp = 
0xff907cf44b40 ---

uhci_interrupt() at uhci_interrupt+0x65
intr_event_execute_handlers() at intr_event_execute_handlers+0x104
ithread_loop() at ithread_loop+0xa4
fork_exit() at fork_exit+0x11f
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xff907cf44d00, rbp = 0 ---
db>

After disabling stopping on NMI (kdb_on_nmi), I still can't boot from
bge (this is a PXE-booted machine); I get this in an infinite loop:

bge1: link state changed to DOWN
DHCP/BOOTP timeout for server 255.255.255.255
bge1: 3 link states coalesced
bge1: link state changed to UP
bge0: 2 link states coalesced
bge0: link state changed to DOWN
bge0: link state changed to UP
bge1: link state changed to DOWN
bge0: link state changed to DOWN
bge0: link state changed to UP
bge0: link state changed to DOWN
bge1: 2 link states coalesced
bge1: link state changed to DOWN
bge0: link state changed to UP
bge0: link state changed to DOWN
bge0: 2 link states coalesced
bge0: link state changed to DOWN
bge1: 2 link states coalesced
bge1: link state changed to DOWN
bge0: link state changed to UP
bge0: link state changed to DOWN
bge0: link state changed to UP
bge0: link state changed to DOWN

Linux and Windows boot fine on the machine.

dmesg up to the point where it crashes:
Copyright (c) 1992-2012 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 9.0-STABLE #3: Tue Feb 21 11:57:33 CET 2012
r...@boot.lab:/usr/obj/usr/src/sys/BOOTCLNT amd64
CPU: Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz (2693.57-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x206d6  Family = 6  Model = 2d  Stepping = 6
  Features=0xbfebfbff
  Features2=0x15bee3ff
  AMD Features=0x2c100800
  AMD Features2=0x1
  TSC: P-state invariant, performance statistics
real memory  = 34359738368 (32768 MB)
avail memory = 32991268864 (31462 MB)
Event timer "LAPIC

Re: Enabling IPSec panics stable/9 (runs OK on stable/8)

2012-01-05 Thread Attila Nagy

On 01/05/12 11:37, VANHULLEBUS Yvan wrote:

Strange. It may be related to some kind of code optimization.

The line just before is:
saidx = &sav->sah->saidx;

Could you show the value of &sav->sah->saidx?
And also check whether kgdb can print sav->sah->saidx (without the &)?

Oh sorry, the previous console copy was chopped. I've tried sav too:
(kgdb) p sav->sah->saidx
Variable "sav" is not available.


To give you a quick workaround: do you really need ESP+AH?
Most of the time, people who configure ESP+AH in fact just need ESP
with its optional data authentication.

I could live without it, you are right. Thanks for the tip.
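Acting on that tip starts with finding which security policies stack AH on top of ESP, since the crash appears to be in the nested-encapsulation path. Those policies could then be rewritten as ESP-only, with authentication supplied by ESP itself (setkey accepts both -E and -A on an esp SA). A minimal sketch of a hypothetical lint helper, assuming setkey syntax where statements end with ';' and may span several lines:

```python
import re


def nested_esp_ah_policies(setkey_conf: str) -> list[str]:
    """Return spdadd statements whose rules apply both ESP and AH.

    Hypothetical lint helper. Comment stripping is naive: a '#' inside a
    quoted key would also be cut.
    """
    text = "\n".join(line.split("#", 1)[0] for line in setkey_conf.splitlines())
    hits = []
    for stmt in text.split(";"):
        stmt = " ".join(stmt.split())  # collapse line breaks and indentation
        if not stmt.startswith("spdadd"):
            continue
        # Policy rules look like esp/transport/... and ah/transport/...
        protocols = set(re.findall(r"\b(esp|ah)/", stmt))
        if protocols == {"esp", "ah"}:
            hits.append(stmt)
    return hits
```

Run against the ipsec.conf from the original report, it flags the single spdadd policy that combines esp/transport and ah/transport.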


And the crash occurs in some part of the code which deals with
encapsulation in encapsulation.

This may also explain why I never saw that crash.

Could be. Anyway, it's a permitted option, so it would be good to fix it.
I hope I can help you with that somehow. :)

Thanks,


Re: Enabling IPSec panics stable/9 (runs OK on stable/8)

2012-01-05 Thread Attila Nagy

On 01/04/12 17:31, VANHULLEBUS Yvan wrote:

On Wed, Jan 04, 2012 at 04:17:41PM +0100, Attila Nagy wrote:
[...]

#7  0x809bf779 in ipsec_process_done (m=0xfe000c7c7a00,
isr=0xfe001bf54380) at
/data/usr/src/sys/netipsec/ipsec_output.c:170

Here seems to be the problem
Can you do the following (in this order) in kgdb:
frame 7
p saidx
p *saidx

(kgdb) frame 7
#7  0x809bf779 in ipsec_process_done (m=0xfe000c7c7a00,
isr=0xfe001bf54380) at
/data/usr/src/sys/netipsec/ipsec_output.c:170
170 switch (saidx->dst.sa.sa_family) {
(kgdb) p saidx
No symbol "saidx" in current context.



There *is* such a symbol, as confirmed by kgdb's output when you
switched to frame 7 !

Could you check that you are running a correct debug kernel?
The kernel config is GENERIC plus some additions, so it contains the DEBUG=-g
makeoptions.
This is a limited environment (with a lot of programs missing from the 
boot image), but I don't think it should affect that.

kgdb command line is:
./kgdb kernel.debug vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain 
conditions.

Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 4; apic id = 04
fault virtual address   = 0xa0
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x809bf779
stack pointer   = 0x28:0xff80002cd350
frame pointer   = 0x28:0xff80002cd390
code segment        = base 0x0, limit 0xf, type 0x1b
                    = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process = 12 (swi1: netisr 0)
trap number = 12
panic: page fault
cpuid = 4
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kdb_backtrace() at kdb_backtrace+0x37
panic() at panic+0x187
trap_fatal() at trap_fatal+0x290
trap_pfault() at trap_pfault+0x1f9
trap() at trap+0x3df
calltrap() at calltrap+0x8
--- trap 0xc, rip = 0x809bf779, rsp = 0xff80002cd350, rbp = 
0xff80002cd390 ---

ipsec_process_done() at ipsec_process_done+0x119
esp_output_cb() at esp_output_cb+0x1a1
crypto_done() at crypto_done+0x102
swcr_process() at swcr_process+0x1d7
crypto_invoke() at crypto_invoke+0x6b
crypto_dispatch() at crypto_dispatch+0xfb
esp_output() at esp_output+0x5a2
ipsec4_process_packet() at ipsec4_process_packet+0x1f8
ip_ipsec_output() at ip_ipsec_output+0x16a
ip_output() at ip_output+0x526
icmp_reflect() at icmp_reflect+0x339
icmp_input() at icmp_input+0x257
ip_input() at ip_input+0x1de
swi_net() at swi_net+0x14d
intr_event_execute_handlers() at intr_event_execute_handlers+0x104
ithread_loop() at ithread_loop+0xa4
fork_exit() at fork_exit+0x11f
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xff80002cdd00, rbp = 0 ---
Uptime: 12m32s
Dumping 30283 MB (4 chunks)
  chunk 0: 1MB (139 pages) ... ok
  chunk 1: 3070MB (785904 pages)panic: bufwrite: buffer is not busy???
cpuid = 4
 3054 3038 3022 3006 2990 2974 2958 2942 2926 2910 2894 2878 2862 2846 
2830 2814 2798 2782 2766 2750 2734 2718 2702 2686 2670 2654 2638 2622 
2606 2590 2574 2558 2542 2526 2510 2494 2478 2462 2446 2430 2414 2398 
2382 2366 2350 2334 2318 2302 2286 2270 2254 2238  2206 2190 2174 
2158 2142 2126 2110 2094 2078 2062 2046 2030 2014 1998 1982 1966 1950 
1934 1918 1902 1886 1870 1854 1838 1822 1806 1790 1774 1758 1742 1726 
1710 1694 1678 1662 1646 1630 1614 1598 1582 1566 1550 1534 1518 1502 
1486 1470 1454 1438 1422 1406 1390 1374 1358 1342 1326 1310 1294 1278 
1262 1246 1230 1214 1198 1182 1166 1150 1134 1118 1102 1086 1070 1054 
1038 1022 1006 990 974 958 942 926 910 894 878 862 846 830 814 798 782 
766 750 734 718 702 686 670 654 638 622 606 590 574 558 542 526 510 494 
478 462 446 430 414 398 382 366 350 334 318 302 286 270 254 238 222 206 
190 174 158 142 126 110 94 78 62 46 30 14 ... ok

  chunk 2: 1MB (256 pages) ... ok
  chunk 3: 27212MB (6966272 pages) 27197 27181 27165 27149 27133 27117 
27101 27085 27069 27053 27037 27021 27005 26989 26973 26957 26941 26925 
26909 26893 26877 26861 26845 26829 26813 26797 26781 26765 26749 26733 
26717 26701 26685 26669 26653 26637 26621 26605 26589 26573 26557 26541 
26525 26509 26493 26477 26461 26445 26429 26413 26397 26381 26365 26349 
26333 26317 26301 26285 26269 26253 26237 26221 26205 26189 26173 26157 
26141 26125 26109 26093 26077 26061 26045 26029 26013 25997 25981 25965 
25949 25933 25917 25901 25885 25869 25853 25837 25821 25805 25789 25773 

Re: Enabling IPSec panics stable/9 (runs OK on stable/8)

2012-01-04 Thread Attila Nagy

   Hi,
   On 01/04/12 15:51, VANHULLEBUS Yvan wrote:

   I've just upgraded an 8-STABLE box to 9-STABLE (well, just a few commits
   before it was tagged as STABLE), which runs from NFS (PXE-booted).
   It has some IPSec config in ipsec.conf, like this, for several boxes:
   add 172.28.16.4 172.16.248.2 ah 15704 -A hmac-md5 "asdfgh";
   add 172.16.248.2 172.28.16.4 ah 24504 -A hmac-md5 "asdfgh";
   add 172.28.16.4 172.16.248.2 esp 15705 -E blowfish-cbc "hgfdsa";
   add 172.16.248.2 172.28.16.4 esp 24505 -E blowfish-cbc "hgfdsa";
   spdadd 172.28.16.4 172.16.248.2 any -P out ipsec
  esp/transport/172.28.16.4-172.16.248.2/default
  ah/transport/172.28.16.4-172.16.248.2/default;

This is probably unrelated to the crash, but do you really use
static IPsec without IKE keying?

   Yes. :)
   It runs on an intranet, but there's a need to encrypt traffic.



[...]

   kgdb says:
   (kgdb) bt
   #0  doadump (textdump=1) at /data/usr/src/sys/kern/kern_shutdown.c:260
   #1  0x80845705 in kern_reboot (howto=260)
   at /data/usr/src/sys/kern/kern_shutdown.c:442
   #2  0x80845bb1 in panic (fmt=Variable "fmt" is not available.
   )
   at /data/usr/src/sys/kern/kern_shutdown.c:607
   #3  0x80b167a0 in trap_fatal (frame=0xc, eva=Variable "eva" is
   not available.
   )
   at /data/usr/src/sys/amd64/amd64/trap.c:819
   #4  0x80b16ae9 in trap_pfault (frame=0xff80002cd2a0,
   usermode=0)
   at /data/usr/src/sys/amd64/amd64/trap.c:735
   #5  0x80b16faf in trap (frame=0xff80002cd2a0)
   at /data/usr/src/sys/amd64/amd64/trap.c:474
   #6  0x80b012ef in calltrap ()
   at /data/usr/src/sys/amd64/amd64/exception.S:228
   #7  0x809bf779 in ipsec_process_done (m=0xfe000c7c7a00,
   isr=0xfe001bf54380) at
   /data/usr/src/sys/netipsec/ipsec_output.c:170

Here seems to be the problem
Can you do the following (in this order) in kgdb:
frame 7
p saidx
p *saidx

   (kgdb) frame 7
   #7  0x809bf779 in ipsec_process_done (m=0xfe000c7c7a00,
   isr=0xfe001bf54380) at
   /data/usr/src/sys/netipsec/ipsec_output.c:170
   170 switch (saidx->dst.sa.sa_family) {
   (kgdb) p saidx
   No symbol "saidx" in current context.


The last one will probably generate an error since (if you have exactly the
same ipsec_output.c as I have from HEAD) saidx will probably have an
invalid address.

   I have the same as in HEAD.



[...]

   8-STABLE runs fine with the same config.

Strange. I'll review the changes in the IPsec stack which were made in
STABLE/9 and not backported to STABLE/8.

   Oh, sorry, it's not quite an up-to-date 8-STABLE; it's from Sat May 21
   22:05:26 CEST 2011 (csup'd some hours earlier).
   Should I check with a more recent version? Would that help?
   Thanks for helping.


Enabling IPSec panics stable/9 (runs OK on stable/8)

2012-01-04 Thread Attila Nagy

   Hi,
   I've just upgraded an 8-STABLE box to 9-STABLE (well, just a few commits
   before it was tagged as STABLE), which runs from NFS (PXE-booted).
   It has some IPSec config in ipsec.conf, like this, for several boxes:
   add 172.28.16.4 172.16.248.2 ah 15704 -A hmac-md5 "asdfgh";
   add 172.16.248.2 172.28.16.4 ah 24504 -A hmac-md5 "asdfgh";
   add 172.28.16.4 172.16.248.2 esp 15705 -E blowfish-cbc "hgfdsa";
   add 172.16.248.2 172.28.16.4 esp 24505 -E blowfish-cbc "hgfdsa";
   spdadd 172.28.16.4 172.16.248.2 any -P out ipsec
  esp/transport/172.28.16.4-172.16.248.2/default
  ah/transport/172.28.16.4-172.16.248.2/default;
   Running /etc/rc.d/ipsec start instantly panics it with:
   Fatal trap 12: page fault while in kernel mode
   cpuid = 1; apic id = 01
   fault virtual address   = 0xa0
   fault code  = supervisor read data, page not present
   instruction pointer = 0x20:0x809bf779
   stack pointer   = 0x28:0xff80002cd350
   frame pointer   = 0x28:0xff80002cd390
   code segment        = base 0x0, limit 0xf, type 0x1b
                       = DPL 0, pres 1, long 1, def32 0, gran 1
   processor eflags    = interrupt enabled, resume, IOPL = 0
   current process = 12 (swi1: netisr 0)
   trap number = 12
   panic: page fault
   cpuid = 1
   KDB: stack backtrace:
   db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
   kdb_backtrace() at kdb_backtrace+0x37
   panic() at panic+0x187
   trap_fatal() at trap_fatal+0x290
   trap_pfault() at trap_pfault+0x1f9
   trap() at trap+0x3df
   calltrap() at calltrap+0x8
   --- trap 0xc, rip = 0x809bf779, rsp = 0xff80002cd350, rbp
   = 0xff80002cd390 ---
   ipsec_process_done() at ipsec_process_done+0x119
   esp_output_cb() at esp_output_cb+0x1a1
   crypto_done() at crypto_done+0x102
   swcr_process() at swcr_process+0x1d7
   crypto_invoke() at crypto_invoke+0x6b
   crypto_dispatch() at crypto_dispatch+0xfb
   esp_output() at esp_output+0x5a2
   ipsec4_process_packet() at ipsec4_process_packet+0x1f8
   ip_ipsec_output() at ip_ipsec_output+0x16a
   ip_output() at ip_output+0x526
   icmp_reflect() at icmp_reflect+0x339
   icmp_input() at icmp_input+0x257
   ip_input() at ip_input+0x1de
   swi_net() at swi_net+0x14d
   intr_event_execute_handlers() at intr_event_execute_handlers+0x104
   ithread_loop() at ithread_loop+0xa4
   fork_exit() at fork_exit+0x11f
   fork_trampoline() at fork_trampoline+0xe
   --- trap 0, rip = 0, rsp = 0xff80002cdd00, rbp = 0 ---
   Uptime: 3m31s
   The machine in question runs in a VMware virtual environment (if that
   matters); the kernel config is GENERIC plus this:
   options BOOTP
   options BOOTP_NFSV3
   options BOOTP_NFSROOT
   options NFSCLIENT
   device  carp
   device  crypto
   device  cryptodev
   options IPSEC
   options MAC
   options ROUTETABLES=1
   options KDB
   options DDB
   options KDB_UNATTENDED
   kgdb says:
   (kgdb) bt
   #0  doadump (textdump=1) at /data/usr/src/sys/kern/kern_shutdown.c:260
   #1  0x80845705 in kern_reboot (howto=260)
   at /data/usr/src/sys/kern/kern_shutdown.c:442
   #2  0x80845bb1 in panic (fmt=Variable "fmt" is not available.
   )
   at /data/usr/src/sys/kern/kern_shutdown.c:607
   #3  0x80b167a0 in trap_fatal (frame=0xc, eva=Variable "eva" is
   not available.
   )
   at /data/usr/src/sys/amd64/amd64/trap.c:819
   #4  0x80b16ae9 in trap_pfault (frame=0xff80002cd2a0,
   usermode=0)
   at /data/usr/src/sys/amd64/amd64/trap.c:735
   #5  0x80b16faf in trap (frame=0xff80002cd2a0)
   at /data/usr/src/sys/amd64/amd64/trap.c:474
   #6  0x80b012ef in calltrap ()
   at /data/usr/src/sys/amd64/amd64/exception.S:228
   #7  0x809bf779 in ipsec_process_done (m=0xfe000c7c7a00,
   isr=0xfe001bf54380) at
   /data/usr/src/sys/netipsec/ipsec_output.c:170
   #8  0x809ce931 in esp_output_cb (crp=0xfe011103c058)
   at /data/usr/src/sys/netipsec/xform_esp.c:1007
   #9  0x809f4e12 in crypto_done (crp=0xfe011103c058)
   at /data/usr/src/sys/opencrypto/crypto.c:1156
   #10 0x809f89c7 in swcr_process (dev=Variable "dev" is not
   available.
   )
   at /data/usr/src/sys/opencrypto/cryptosoft.c:1054
   #11 0x809f5c9b in crypto_invoke (cap=0xfe000c12f700,
   crp=0xfe011103c058, hint=0) at cryptodev_if.h:53
   #12 0x809f6acb in crypto_dispatch (crp=0xfe011103c058)
   at /data/usr/src/sys/opencrypto/crypto.c:806
   #13 0x809cef82 in esp_output (m=0xfe000c7c7a00,
   isr=0xfe001bf54380, mp=Variable "mp" is not available.
   ) at /data/usr/src/sys/netipsec/xform_esp.c:907
   #14 0x809bfa98 in ipsec4_process_packet (m=0xfe000c7c7a00,
   isr=0xfe001bf54380, flags=Variable "flags" is not available.
   )
   at /data/usr/src/sys/netipsec/ipsec_output.c:580
   #15 0x8096f5da in ip_ips

Re: tmpfs is zero bytes (no free space), maybe a zfs bug?

2011-02-10 Thread Attila Nagy

 On 02/10/2011 05:56 PM, Bruce Cran wrote:

On Wed, 19 Jan 2011 11:09:31 +0100
Attila Nagy  wrote:


On 01/19/11 09:46, Jeremy Chadwick wrote:

On Wed, Jan 19, 2011 at 09:37:35AM +0100, Attila Nagy wrote:

I first noticed this problem on machines with more memory (e.g. 32GB),
but now it happens on 4GB machines too:
tmpfs   0B  0B  0B  100%  /tmp
FreeBSD builder 8.2-PRERELEASE FreeBSD 8.2-PRERELEASE #0: Sat Jan
8 22:11:54 CET 2011

Maybe it's related that I use ZFS on these machines...

Sometimes it grows and shrinks, but generally there is no space
even to create a small file or a socket.

http://lists.freebsd.org/pipermail/freebsd-stable/2011-January/060867.html


Oh crap. :(

I hope somebody can find the time to look into this; it's pretty
annoying...

It's also listed as a bug on OpenSolaris:
http://bugs.opensolaris.org/bugdatabase/view_bug.do;?bug_id=6804661

ZFS is a great innovation, which forces sysadmins to learn kernel and VM 
internals. :-O



Re: tmpfs is zero bytes (no free space), maybe a zfs bug?

2011-02-01 Thread Attila Nagy

On 01/30/11 12:09, Kostik Belousov wrote:

On Wed, Jan 19, 2011 at 05:27:38PM +0100, Ivan Voras wrote:

On 19 January 2011 16:02, Kostik Belousov  wrote:


http://people.freebsd.org/~ivoras/diffs/tmpfs.h.patch

I don't think this is a complete solution but it's a start. If you can,
try it and see if it helps.

This is not a start, and actually a step in the wrong direction.
Tmpfs is wrong now, but the patch would make the wrongness even bigger.

The issue is that the current tmpfs calculation should not depend on the
length of the inactive queue or the amount of free pages. This data only
measures the pressure on the pagedaemon, and has absolutely no relation
to the amount of data that can be put into anonymous objects before the
system runs out of swap.

The vm_lowmem handler is invoked in two situations:
- when KVA cannot satisfy a request for space allocation;
- when the pagedaemon has to start a scan.
Neither situation has any direct correlation with what tmpfs
needs to check, namely "Is there enough swap to hold all my
future anonymous memory requests?".

Perhaps the swap reservation numbers could be useful for tmpfs reporting.
Perhaps tmpfs should also reserve swap explicitly at start, instead
of trying to guess how much can be allocated at a random moment.

Thank you for your explanation! I'm still not very familiar with VM
and VFS. Could you also read my report at
http://www.mail-archive.com/freebsd-current@freebsd.org/msg126491.html
? I'm curious about the fact that there is lots of 'free' memory here
in the same situation.

This is another ugliness in the dynamic calculation. Your wired memory is
around 15GB, which is always greater than available swap + free + inactive.
As a result, tmpfs_mem_info() always returns 0.
In this situation TMPFS_PAGES_MAX() seems to return a negative value, and
then TMPFS_PAGES_AVAIL() clamps it at 0.
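
To make the arithmetic concrete, here is a simplified Python model of what is described above. It is not the tmpfs code: `mem_info`, `pages_max` and `pages_avail` only loosely mirror tmpfs_mem_info(), TMPFS_PAGES_MAX() and TMPFS_PAGES_AVAIL(). The fixed `reserved` page count, the 4KB page size and the memory figures are hypothetical. The model uses signed integers so that the negative intermediate value mentioned above is visible.

```python
PAGE_SIZE = 4096          # assumed page size (amd64)
GB = 1024 ** 3


def pages(nbytes):
    """Convert a byte count to whole pages."""
    return nbytes // PAGE_SIZE


def mem_info(swap_avail, free, inactive, wired):
    """swap + free + inactive minus wired, floored at zero.

    When wired >= swap + free + inactive this is always 0, which is the
    situation described above (about 15GB wired)."""
    size = swap_avail + free + inactive
    return size - wired if size > wired else 0


def pages_max(info, reserved):
    """Hypothetical simplification: the ceiling left after a fixed reserve.
    Signed arithmetic, so it goes negative when info < reserved."""
    return info - reserved


def pages_avail(max_pages, used):
    """What df(1) would show as free: clamped at 0, never negative."""
    return max(0, max_pages - used)


# Made-up numbers in the spirit of the report: 15GB wired versus
# 4GB swap + 2GB free + 1GB inactive.
info = mem_info(pages(4 * GB), pages(2 * GB), pages(1 * GB), pages(15 * GB))
avail = pages_avail(pages_max(info, reserved=1024), used=0)
print(info, avail)  # 0 0
```

Whatever the exact kernel expression is, once wired memory exceeds swap + free + inactive the starting figure is 0. No amount of genuinely unused swap can then make df(1) report free space, which is why Kostik suggests looking at swap reservation instead.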

Well, if nobody can take care of this now, could you please state this 
in the BUGS section of the tmpfs man page?


Thanks,


Re: tmpfs is zero bytes (no free space), maybe a zfs bug?

2011-01-19 Thread Attila Nagy

On 01/19/11 09:46, Jeremy Chadwick wrote:

On Wed, Jan 19, 2011 at 09:37:35AM +0100, Attila Nagy wrote:

I first noticed this problem on machines with more memory (e.g.
32GB), but now it happens on 4GB machines too:
tmpfs   0B  0B  0B   100%   /tmp
FreeBSD builder 8.2-PRERELEASE FreeBSD 8.2-PRERELEASE #0: Sat Jan  8
22:11:54 CET 2011

Maybe it's related to the fact that I use ZFS on these machines...

Sometimes it grows and shrinks, but generally there is not even enough
space to create a small file or a socket.

http://lists.freebsd.org/pipermail/freebsd-stable/2011-January/060867.html


Oh crap. :(

I hope somebody can find the time to look into this, it's pretty annoying...


tmpfs is zero bytes (no free space), maybe a zfs bug?

2011-01-19 Thread Attila Nagy

Hi,

I first noticed this problem on machines with more memory (e.g. 32GB),
but now it happens on 4GB machines too:
tmpfs   0B  0B  0B   100%   /tmp
FreeBSD builder 8.2-PRERELEASE FreeBSD 8.2-PRERELEASE #0: Sat Jan  8
22:11:54 CET 2011


Maybe it's related to the fact that I use ZFS on these machines...

Sometimes it grows and shrinks, but generally there is not even enough
space to create a small file or a socket.



Re: New ZFSv28 patchset for 8-STABLE

2011-01-10 Thread Attila Nagy

 On 12/16/2010 01:44 PM, Martin Matuska wrote:

Hi everyone,

following the announcement of Pawel Jakub Dawidek (p...@freebsd.org) I am
providing a ZFSv28 testing patch for 8-STABLE.

Link to the patch:

http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz

Link to mfsBSD ISO files for testing (i386 and amd64):
 http://mfsbsd.vx.sk/iso/zfs-v28/8.2-beta-zfsv28-amd64.iso
 http://mfsbsd.vx.sk/iso/zfs-v28/8.2-beta-zfsv28-i386.iso

The root password for the ISO files: "mfsroot"
The ISO files work on real systems and in virtualbox.
They contain a full install of FreeBSD 8.2-PRERELEASE with ZFS v28;
simply use the provided "zfsinstall" script.

The patch is against FreeBSD 8-STABLE as of 2010-12-15.

When applying the patch be sure to use correct options for patch(1)
and make sure the file sys/cddl/compat/opensolaris/sys/sysmacros.h gets
deleted:

 # cd /usr/src
 # fetch http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz
 # xz -d stable-8-zfsv28-20101215.patch.xz
 # patch -E -p0 < stable-8-zfsv28-20101215.patch
 # rm sys/cddl/compat/opensolaris/sys/sysmacros.h

I've just got a panic:
http://people.fsn.hu/~bra/freebsd/20110101-zfsv28-fbsd/IMAGE_006.jpg

The panic line for google:
panic: solaris assert: task->ost_magic == TASKQ_MAGIC, file: /usr/src/sys/modules/zfs/../../cddl/compat/opensolaris/kern/opensolaris_taskq.c, line: 150


I hope this is enough for debugging, if it isn't already known. If it's
not, I will try to catch it again and make a dump.


Thanks,


Re: New ZFSv28 patchset for 8-STABLE

2011-01-10 Thread Attila Nagy

 On 01/10/2011 09:57 AM, Pawel Jakub Dawidek wrote:

On Sun, Jan 09, 2011 at 12:52:56PM +0100, Attila Nagy wrote:
[...]

I've finally found the time to read the v28 patch and figured out the
problem: vfs.zfs.l2arc_noprefetch was changed to 1, so it doesn't use
the prefetched data on the L2ARC devices.
This is a major hit in my case. Enabling this again restored the
previous hit rates and lowered the load on the hard disks significantly.

Well, not storing prefetched data on L2ARC vdevs is the default in
Solaris. For some reason it was changed by kmacy@ in r205231. Not sure
why and we can't ask him now, I'm afraid. I just sent an e-mail to

What happened to him?

Brendan Gregg from Oracle who originally implemented L2ARC in ZFS why
this is turned off by default. Once I get answer we can think about
turning it on again.

I think it makes some sense as a stupid form of preferring random IO over
sequential IO in the L2ARC. But if I rely on auto-tuning and leave
prefetch enabled, even a busy mail server will prefetch a lot of blocks,
and I think that's a fine example of random IO (it also makes the
system unusable, but that's another story).


Having this choice is good, and in this case enabling it makes sense
for me. I don't know of any reason why you wouldn't use all of your
L2ARC space (apart from sparing the quickly wearing flash and moving
the disk heads instead), but I'm sure Brendan made this choice for a
good reason.

If you get an answer, please tell us. :)

Thanks,


Re: New ZFSv28 patchset for 8-STABLE

2011-01-10 Thread Attila Nagy

 On 01/10/2011 10:02 AM, Pawel Jakub Dawidek wrote:

On Sun, Jan 09, 2011 at 12:49:27PM +0100, Attila Nagy wrote:

No, it's not related. One of the disks in the RAIDZ2 pool went bad:
(da4:arcmsr0:0:4:0): READ(6). CDB: 8 0 2 10 10 0
(da4:arcmsr0:0:4:0): CAM status: SCSI Status Error
(da4:arcmsr0:0:4:0): SCSI status: Check Condition
(da4:arcmsr0:0:4:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
and it seems it froze the whole zpool. Removing the disk by hand solved
the problem.
I've seen this previously on other machines with ciss.
I wonder why ZFS didn't throw it out of the pool.

Such hangs happen when I/O never returns. ZFS doesn't time out I/O
requests on its own; this is the driver's responsibility. It is still
strange that the driver didn't pass an I/O error up to ZFS. It might as
well be a ZFS bug, but I don't think so.

Indeed, it may be a controller/driver bug. The newly released (last
December) firmware mentions a similar problem. I've upgraded; we'll
see whether it helps the next time a drive goes awry.
I've only seen these errors in dmesg, not in zpool status; there
everything was clean (all zeroes).


BTW, I've swapped those bad drives (da4, which reported the above
errors, and da16, which didn't report anything to the OS but was just
plain bad according to the controller firmware; after its removal
I could offline da4, so it seems it was the real cause, see my previous
e-mail). I ran zpool replace on da4 first, but after some seconds of
thinking, all IO on all disks ceased.
After waiting some minutes, it was still the same, so I rebooted.
Then I noticed that a scrub was going on, so I stopped it.
Then the zpool replace of da4 went fine; it started to resilver the disk.
But another zpool replace (for da16) caused the same problem: some seconds
of IO, then nothing, and it got stuck there.


Has anybody tried replacing two drives simultaneously with the ZFS v28
patch? (This is a stripe of two raidz2 vdevs, and da4 and da16 are in
different ones.)



Re: New ZFSv28 patchset for 8-STABLE

2011-01-09 Thread Attila Nagy

 On 01/09/2011 01:18 PM, Jeremy Chadwick wrote:

On Sun, Jan 09, 2011 at 12:49:27PM +0100, Attila Nagy wrote:

  On 01/09/2011 10:00 AM, Attila Nagy wrote:

On 12/16/2010 01:44 PM, Martin Matuska wrote:

Hi everyone,

following the announcement of Pawel Jakub Dawidek (p...@freebsd.org) I am
providing a ZFSv28 testing patch for 8-STABLE.

Link to the patch:

http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz



I've got an IO hang with dedup enabled (not sure it's related;
I've started to rewrite all data on the pool, which creates a heavy
load):

The processes are in various states:
65747 1001     1  54   10 28620K 24360K tx->tx  0   6:58  0.00% cvsup
80383 1001     1  54   10 40616K 30196K select  1   5:38  0.00% rsync
 1501 www      1  44    0  7304K  2504K zio->i  0   2:09  0.00% nginx
 1479 www      1  44    0  7304K  2416K zio->i  1   2:03  0.00% nginx
 1477 www      1  44    0  7304K  2664K zio->i  0   2:02  0.00% nginx
 1487 www      1  44    0  7304K  2376K zio->i  0   1:40  0.00% nginx
 1490 www      1  44    0  7304K  1852K zfs     0   1:30  0.00% nginx
 1486 www      1  44    0  7304K  2400K zfsvfs  1   1:05  0.00% nginx

And everything which wants to touch the pool is/becomes dead.

Procstat says about one process:
# procstat -k 1497
  PID    TID COMM             TDNAME           KSTACK
 1497 100257 nginx            -                mi_switch sleepq_wait
__lockmgr_args vop_stdlock VOP_LOCK1_APV null_lock VOP_LOCK1_APV
_vn_lock nullfs_root lookup namei vn_open_cred kern_openat
syscallenter syscall Xfast_syscall

No, it's not related. One of the disks in the RAIDZ2 pool went bad:
(da4:arcmsr0:0:4:0): READ(6). CDB: 8 0 2 10 10 0
(da4:arcmsr0:0:4:0): CAM status: SCSI Status Error
(da4:arcmsr0:0:4:0): SCSI status: Check Condition
(da4:arcmsr0:0:4:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
and it seems it froze the whole zpool. Removing the disk by hand
solved the problem.
I've seen this previously on other machines with ciss.
I wonder why ZFS didn't throw it out of the pool.

Hold on a minute.  An unrecoverable read error does not necessarily mean
the drive is bad, it could mean that the individual LBA that was
attempted to be read resulted in ASC 0x11 (MEDIUM ERROR) (e.g. a bad
block was encountered).  I would check SMART stats on the disk (since
these are probably SATA given use of arcmsr(4)) and provide those.
*That* will tell you if the disk is bad.  I'll help you decode the
attribute values if you provide them.

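
Jeremy's point can be checked against the log itself, because the CDB printed by CAM says which block the drive failed to read. Below is a minimal Python sketch that decodes it. It assumes CAM prints the CDB bytes in hex (the leading `8` is the READ(6) opcode 0x08) and uses the READ(6) field layout from the SCSI block command set. The function name is illustrative only.

```python
def decode_read6(cdb_text):
    """Return (lba, blocks) for a READ(6) CDB printed by CAM in hex.

    READ(6) layout: byte 0 = opcode 0x08; low 5 bits of byte 1 plus
    bytes 2-3 = 21-bit LBA; byte 4 = transfer length in blocks
    (0 means 256); byte 5 = control.
    """
    cdb = [int(b, 16) for b in cdb_text.split()]
    if len(cdb) != 6 or cdb[0] != 0x08:
        raise ValueError("not a READ(6) CDB: %r" % cdb_text)
    lba = ((cdb[1] & 0x1F) << 16) | (cdb[2] << 8) | cdb[3]
    blocks = cdb[4] or 256
    return lba, blocks


lba, blocks = decode_read6("8 0 2 10 10 0")
print("LBA", lba, "length", blocks, "blocks")  # LBA 528 length 16 blocks
```

So the failed request was a 16-block read starting at LBA 528 of da4. That is a single spot on the disk, which is why a medium error on its own does not prove that the whole drive is bad.
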
You are right, and I gave incorrect information. There are a lot more
errors for that disk in the logs, and the zpool was frozen.
I tried to offline the given disk. That helped in the ciss case, where
the symptom is the same or similar: there is no IO for ages, then a
little, then nothing for long seconds or minutes, and no errors are
logged. zpool status reported no errors, and dmesg was clean too.
There I could find the bad disk by watching the gstat output: when the
very small amount of IO was done, one disk had response times well
above a second, while the others responded quickly.
There, zpool offline helped. Here it didn't; the command just hung,
like everything else.
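
The gstat approach above, looking for the one disk whose response time is far above its peers, is easy to mechanize. Below is a minimal sketch. It assumes you have already collected a per-disk read latency in milliseconds, for example from the ms/r column of gstat(8). The 100 ms floor, the 10x-median factor, the `slow_disks` name and the sample values are all arbitrary choices for illustration.

```python
from statistics import median


def slow_disks(latency_ms, factor=10.0, floor_ms=100.0):
    """Return disks whose latency is above an absolute floor AND at least
    `factor` times the median latency of all disks (a crude outlier test)."""
    if not latency_ms:
        return []
    typical = median(latency_ms.values())
    return sorted(
        name for name, ms in latency_ms.items()
        if ms >= floor_ms and ms >= factor * typical
    )


# Made-up sample: one provider answering in ~1.5 s, the rest in ~8 ms.
sample = {"d0": 8.1, "d1": 9.0, "d2": 7.5, "d3": 1450.0, "d4": 8.8}
print(slow_disks(sample))  # ['d3']
```

Using the median rather than the mean keeps a single stalled disk from dragging the baseline up with it.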

So what I did then: I got into areca-cli and searched for errors.
One disk was marked as failed and it seemed to be the cause. I removed
it (and did a camcontrol rescan, though I'm not sure whether that was
necessary), and suddenly the zpool offline finished and everything went
back to normal.
But there are two controllers in the system, and now I see that the above
disk is on ctrl 1, while the one I removed is on ctrl 2.
I was misled by their identical positions. So now I have an offlined disk
(which produces read errors, but I couldn't see them in the zpool
output) and another one, which is shown as failed by the RAID controller
and was removed by hand (which resolved the situation):

NAME                 STATE     READ WRITE CKSUM
data                 DEGRADED     0     0     0
  raidz2-0           DEGRADED     0     0     0
    label/disk20-01  ONLINE       0     0     0
    label/disk20-02  ONLINE       0     0     0
    label/disk20-03  ONLINE       0     0     0
    label/disk20-04  ONLINE       0     0     0
    label/disk20-05  OFFLINE      0     0     0
    label/disk20-06  ONLINE       0     0     0
    label/disk20-07  ONLINE       0     0     0
    label/disk20-08  ONLINE       0     0     0
    label/disk20-09  ONLINE       0     0     0
    label/disk20-10  ONLINE       0     0     0
    label/disk20-11  ONLINE       0     0     0
    label/disk20-12  ONLINE       0     0     0
  raidz2-1           DEGRADED     0     0     0
    label/disk21-01  ONLINE       0     0     0
    label/disk21-02  ONLINE       0     0     0
    label/disk21-03  ONLINE

Re: New ZFSv28 patchset for 8-STABLE

2011-01-09 Thread Attila Nagy

 On 01/01/2011 08:09 PM, Artem Belevich wrote:

On Sat, Jan 1, 2011 at 10:18 AM, Attila Nagy  wrote:

What I see:
- increased CPU load
- decreased L2 ARC hit rate, decreased SSD (ad[46]), therefore increased
hard disk load (IOPS graph)


...

Any ideas on what could cause these? I haven't upgraded the pool version and
nothing was changed in the pool or in the file system.

The fact that L2 ARC is full does not mean that it contains the right
data.  Initial L2ARC warm up happens at a much higher rate than the
rate L2ARC is updated after it's been filled initially. Even
accelerated warm-up took almost a day in your case. In order for L2ARC
to warm up properly you may have to wait quite a bit longer. My guess
is that it should slowly improve over the next few days as data goes
through L2ARC and those bits that are hit more often take residence
there. The larger your data set, the longer it will take for L2ARC to
catch the right data.

Do you have similar graphs from pre-patch system just after reboot? I
suspect that it may show similarly abysmal L2ARC hit rates initially,
too.


I've finally found the time to read the v28 patch and figured out the 
problem: vfs.zfs.l2arc_noprefetch was changed to 1, so it doesn't use 
the prefetched data on the L2ARC devices.
This is a major hit in my case. Enabling this again restored the 
previous hit rates and lowered the load on the hard disks significantly.
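
A toy Python model of why this tunable matters so much on a file mirror is shown below. It assumes, as described in this thread, that with `vfs.zfs.l2arc_noprefetch=1` buffers that came in through prefetch are not cached in the L2ARC. The `Buf` type, the `eligible_for_l2arc` helper and the buffer names are hypothetical, and the real eligibility check in arc.c considers more than this one flag.

```python
from dataclasses import dataclass


@dataclass
class Buf:
    name: str
    prefetched: bool  # True if the block was read in by (sequential) prefetch


def eligible_for_l2arc(bufs, l2arc_noprefetch):
    """Names of buffers the L2ARC feed would consider in this toy model."""
    return [b.name for b in bufs if not (l2arc_noprefetch and b.prefetched)]


# A mirror serving large files via sendfile is dominated by prefetched reads.
bufs = [Buf("iso-chunk-1", True), Buf("iso-chunk-2", True), Buf("dir-index", False)]
print(eligible_for_l2arc(bufs, l2arc_noprefetch=1))  # ['dir-index']
print(eligible_for_l2arc(bufs, l2arc_noprefetch=0))  # ['iso-chunk-1', 'iso-chunk-2', 'dir-index']
```

On the live system, turning it back is presumably just `sysctl vfs.zfs.l2arc_noprefetch=0`. Whether that is the right choice depends on how much of the working set arrives through prefetch.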



Re: New ZFSv28 patchset for 8-STABLE

2011-01-09 Thread Attila Nagy

 On 01/09/2011 10:00 AM, Attila Nagy wrote:

 On 12/16/2010 01:44 PM, Martin Matuska wrote:

Hi everyone,

following the announcement of Pawel Jakub Dawidek (p...@freebsd.org) I am
providing a ZFSv28 testing patch for 8-STABLE.

Link to the patch:

http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz 



I've got an IO hang with dedup enabled (not sure it's related; I've
started to rewrite all data on the pool, which creates a heavy load):


The processes are in various states:
65747 1001     1  54   10 28620K 24360K tx->tx  0   6:58  0.00% cvsup
80383 1001     1  54   10 40616K 30196K select  1   5:38  0.00% rsync
 1501 www      1  44    0  7304K  2504K zio->i  0   2:09  0.00% nginx
 1479 www      1  44    0  7304K  2416K zio->i  1   2:03  0.00% nginx
 1477 www      1  44    0  7304K  2664K zio->i  0   2:02  0.00% nginx
 1487 www      1  44    0  7304K  2376K zio->i  0   1:40  0.00% nginx
 1490 www      1  44    0  7304K  1852K zfs     0   1:30  0.00% nginx
 1486 www      1  44    0  7304K  2400K zfsvfs  1   1:05  0.00% nginx

And everything which wants to touch the pool is/becomes dead.

Procstat says about one process:
# procstat -k 1497
  PID    TID COMM             TDNAME           KSTACK
 1497 100257 nginx            -                mi_switch sleepq_wait
__lockmgr_args vop_stdlock VOP_LOCK1_APV null_lock VOP_LOCK1_APV
_vn_lock nullfs_root lookup namei vn_open_cred kern_openat
syscallenter syscall Xfast_syscall

No, it's not related. One of the disks in the RAIDZ2 pool went bad:
(da4:arcmsr0:0:4:0): READ(6). CDB: 8 0 2 10 10 0
(da4:arcmsr0:0:4:0): CAM status: SCSI Status Error
(da4:arcmsr0:0:4:0): SCSI status: Check Condition
(da4:arcmsr0:0:4:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
and it seems it froze the whole zpool. Removing the disk by hand solved 
the problem.

I've seen this previously on other machines with ciss.
I wonder why ZFS didn't throw it out of the pool.


Re: New ZFSv28 patchset for 8-STABLE

2011-01-09 Thread Attila Nagy

 On 12/16/2010 01:44 PM, Martin Matuska wrote:

Hi everyone,

following the announcement of Pawel Jakub Dawidek (p...@freebsd.org) I am
providing a ZFSv28 testing patch for 8-STABLE.

Link to the patch:

http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz

I've got an IO hang with dedup enabled (not sure it's related; I've
started to rewrite all data on the pool, which creates a heavy load):


The processes are in various states:
65747 1001     1  54   10 28620K 24360K tx->tx  0   6:58  0.00% cvsup
80383 1001     1  54   10 40616K 30196K select  1   5:38  0.00% rsync
 1501 www      1  44    0  7304K  2504K zio->i  0   2:09  0.00% nginx
 1479 www      1  44    0  7304K  2416K zio->i  1   2:03  0.00% nginx
 1477 www      1  44    0  7304K  2664K zio->i  0   2:02  0.00% nginx
 1487 www      1  44    0  7304K  2376K zio->i  0   1:40  0.00% nginx
 1490 www      1  44    0  7304K  1852K zfs     0   1:30  0.00% nginx
 1486 www      1  44    0  7304K  2400K zfsvfs  1   1:05  0.00% nginx

And everything which wants to touch the pool is/becomes dead.

Procstat says about one process:
# procstat -k 1497
  PID    TID COMM             TDNAME           KSTACK
 1497 100257 nginx            -                mi_switch sleepq_wait
__lockmgr_args vop_stdlock VOP_LOCK1_APV null_lock VOP_LOCK1_APV
_vn_lock nullfs_root lookup namei vn_open_cred kern_openat
syscallenter syscall Xfast_syscall




Re: New ZFSv28 patchset for 8-STABLE

2011-01-04 Thread Attila Nagy

 On 01/03/2011 10:35 PM, Bob Friesenhahn wrote:


After four days, the L2 hit rate is still hovering around 10-20
percent (it was between 60-90), so I think it's clearly a regression in
the ZFSv28 patch...

And the massive growth in CPU usage can also very nicely be seen...

I've updated the graphs at (switch time can be checked on the zfs-mem 
graph):

http://people.fsn.hu/~bra/freebsd/20110101-zfsv28-fbsd/

There is a new phenomenon: the large IOPS peaks. I use this munin
script on a lot of machines and have never seen anything like this... I'm
not sure whether it's related or not.


It is not so clear that there is a problem.  I am not sure what you 
are using this server for but it is wise 

The IO pattern has changed radically, so for me it's a problem.
to consider that this is the funny time when a new year starts, SPAM 
delivery goes through the roof, and employees and customers behave 
differently.  You chose the worst time of the year to implement the 
change and observe behavior.
It's a free software mirror, ftp.fsn.hu, and I'm sure that the very low
hit rate and the increased CPU usage are not related to the time
when I made the switch.


CPU use is indeed increased somewhat.  A lower loading of the l2arc is 
not necessarily a problem.  The l2arc is usually bandwidth limited 
compared with main store so if bulk data can not be cached in RAM, 
then it is best left in main store.  A smarter l2arc algorithm could 
put only the data producing the expensive IOPS (the ones requiring a 
seek) in the l2arc, lessening the amount of data cached on the device.
That would make sense if I didn't have 100-120 IOPS on the disks (for
7200 RPM disks that's about their maximum, and gstat tells me the same)
and an L2 hit rate as low as 10 percent.
What's smarter? Having a 60-90% hit rate from the SSDs and moving the slow
disk heads less, or having a 10-20% hit rate and killing the disks
with random IO?
If you are right, ZFS tries to be too smart and falls on its face with
this kind of workload.
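
The trade-off above is simple arithmetic: every L2ARC miss turns into a disk read. Below is a back-of-the-envelope sketch. It assumes each L2ARC miss costs exactly one disk read and ignores writes, prefetch and RAID-Z read amplification. The load of 1000 lookups/s is a hypothetical figure; only the hit-rate ranges and the ~100-120 IOPS per disk come from the thread.

```python
def disk_read_iops(l2_lookups_per_s, l2_hit_rate):
    """Reads per second that miss the L2ARC and therefore land on the disks.

    Simplification: one L2ARC miss == one disk read."""
    if not 0.0 <= l2_hit_rate <= 1.0:
        raise ValueError("hit rate must be between 0 and 1")
    return l2_lookups_per_s * (1.0 - l2_hit_rate)


# Hypothetical load of 1000 L2ARC lookups/s at the old and new hit rates.
for rate in (0.75, 0.15):
    print(f"hit rate {rate:.0%}: ~{disk_read_iops(1000, rate):.0f} disk reads/s")
```

At 100-120 random IOPS per 7200 RPM disk, the gap between roughly 250 and 850 reads/s amounts to several disks' worth of random-read capacity, which is the point being argued above.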


BTW, I've checked the v15-v28 patch for arc.c, and I can't see any L2ARC
related change there. I'm not sure whether the hypothetical logic would
be there or in a different file; I haven't read it end to end.




Re: New ZFSv28 patchset for 8-STABLE

2011-01-03 Thread Attila Nagy

 On 01/01/2011 08:09 PM, Artem Belevich wrote:

On Sat, Jan 1, 2011 at 10:18 AM, Attila Nagy  wrote:

What I see:
- increased CPU load
- decreased L2 ARC hit rate, decreased SSD (ad[46]), therefore increased
hard disk load (IOPS graph)


...

Any ideas on what could cause these? I haven't upgraded the pool version and
nothing was changed in the pool or in the file system.

The fact that L2 ARC is full does not mean that it contains the right
data.  Initial L2ARC warm up happens at a much higher rate than the
rate L2ARC is updated after it's been filled initially. Even
accelerated warm-up took almost a day in your case. In order for L2ARC
to warm up properly you may have to wait quite a bit longer. My guess
is that it should slowly improve over the next few days as data goes
through L2ARC and those bits that are hit more often take residence
there. The larger your data set, the longer it will take for L2ARC to
catch the right data.

Do you have similar graphs from pre-patch system just after reboot? I
suspect that it may show similarly abysmal L2ARC hit rates initially,
too.


After four days, the L2 hit rate is still hovering around 10-20 percent
(it was between 60-90), so I think it's clearly a regression in the ZFSv28
patch...

And the massive growth in CPU usage can also very nicely be seen...

I've updated the graphs at (switch time can be checked on the zfs-mem 
graph):

http://people.fsn.hu/~bra/freebsd/20110101-zfsv28-fbsd/

There is a new phenomenon: the large IOPS peaks. I use this munin script
on a lot of machines and have never seen anything like this... I'm not sure
whether it's related or not.



Re: New ZFSv28 patchset for 8-STABLE

2011-01-02 Thread Attila Nagy

 On 01/02/2011 05:06 AM, J. Hellenthal wrote:


On 01/01/2011 13:18, Attila Nagy wrote:

  On 12/16/2010 01:44 PM, Martin Matuska wrote:

Link to the patch:

http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz




I've used this:
http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101223-nopython.patch.xz

on an amd64 server with 8 GB RAM, acting as a file server over
ftp/http/rsync; the content is mounted read-only with nullfs in
jails, and the daemons use sendfile (ftp and http).

The effects can be seen here:
http://people.fsn.hu/~bra/freebsd/20110101-zfsv28-fbsd/
the exact moment of the switch can be seen on zfs_mem-week.png, where
the L2 ARC has been discarded.

What I see:
- increased CPU load
- decreased L2 ARC hit rate, decreased SSD (ad[46]), therefore increased
hard disk load (IOPS graph)

Maybe I could accept the higher system load as normal, because a lot
of things changed between v15 and v28 (though I was hoping that with
the same feature set it would require less CPU), but the radical drop
in the L2ARC hit rate seems to point to a major issue somewhere.
As you can see from the memory stats, I have enough kernel memory to
hold the L2 headers, so the L2 devices got filled up to their maximum
capacity.

Any ideas on what could cause these? I haven't upgraded the pool version
and nothing was changed in the pool or in the file system.


Running arc_summary.pl[1] -p4 should print a summary about your l2arc
and you should also notice in that section that there is a high number
of "SPA Mismatch" mine usually grew to around 172k before I would notice
a crash and I could reliably trigger this while in scrub.

Whatever is causing this desperately needs attention!

I emailed mm@ privately off-list when I noticed this going on but have
not received any feedback as of yet.

It's at zero currently (2 days of uptime):
kstat.zfs.misc.arcstats.l2_write_spa_mismatch: 0
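
To watch the counter discussed here, along with the L2 hit rate from the graphs, without munin, you can parse the output of `sysctl kstat.zfs.misc.arcstats`. Below is a minimal sketch. It assumes sysctl(8)'s usual `name: value` line format and the arcstats counters `l2_hits`, `l2_misses` and `l2_write_spa_mismatch`. The helper names are illustrative, and the sample text is made up except for the mismatch line quoted above.

```python
def parse_arcstats(text):
    """Map 'kstat.zfs.misc.arcstats.<name>: <int>' lines to {name: int}."""
    prefix = "kstat.zfs.misc.arcstats."
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key.startswith(prefix):
            stats[key[len(prefix):]] = int(value.strip())
    return stats


def l2_hit_rate(stats):
    """Fraction of L2ARC lookups that hit; None if there were no lookups."""
    hits = stats.get("l2_hits", 0)
    lookups = hits + stats.get("l2_misses", 0)
    return hits / lookups if lookups else None


sample = """\
kstat.zfs.misc.arcstats.l2_hits: 1500
kstat.zfs.misc.arcstats.l2_misses: 8500
kstat.zfs.misc.arcstats.l2_write_spa_mismatch: 0
"""
stats = parse_arcstats(sample)
print(l2_hit_rate(stats), stats["l2_write_spa_mismatch"])  # 0.15 0
```

On a live system, the text can come from `subprocess.run(["sysctl", "kstat.zfs.misc.arcstats"], capture_output=True, text=True).stdout`. The counters are cumulative since boot, so for a current hit rate, take two samples some minutes apart and apply `l2_hit_rate` to the differences.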



Re: New ZFSv28 patchset for 8-STABLE

2011-01-01 Thread Attila Nagy

 On 01/01/2011 08:09 PM, Artem Belevich wrote:

On Sat, Jan 1, 2011 at 10:18 AM, Attila Nagy  wrote:

What I see:
- increased CPU load
- decreased L2 ARC hit rate, decreased SSD (ad[46]), therefore increased
hard disk load (IOPS graph)


...

Any ideas on what could cause these? I haven't upgraded the pool version and
nothing was changed in the pool or in the file system.

The fact that L2 ARC is full does not mean that it contains the right
data.  Initial L2ARC warm up happens at a much higher rate than the
rate L2ARC is updated after it's been filled initially. Even
accelerated warm-up took almost a day in your case. In order for L2ARC
to warm up properly you may have to wait quite a bit longer. My guess
is that it should slowly improve over the next few days as data goes
through L2ARC and those bits that are hit more often take residence
there. The larger your data set, the longer it will take for L2ARC to
catch the right data.

Do you have similar graphs from pre-patch system just after reboot? I
suspect that it may show similarly abysmal L2ARC hit rates initially,
too.


Sadly no, but I remember seeing increasing hit rates as the
cache grew; that's why I wrote the email after one and a half days.

Currently it's at the same level as it was right after the reboot...

We'll see in a few days.


Re: New ZFSv28 patchset for 8-STABLE

2011-01-01 Thread Attila Nagy

 On 12/16/2010 01:44 PM, Martin Matuska wrote:

Link to the patch:

http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz



I've used this:
http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101223-nopython.patch.xz
on an amd64 server with 8 GB RAM, acting as a file server over
ftp/http/rsync; the content is mounted read-only with nullfs in
jails, and the daemons use sendfile (ftp and http).


The effects can be seen here:
http://people.fsn.hu/~bra/freebsd/20110101-zfsv28-fbsd/
the exact moment of the switch can be seen on zfs_mem-week.png, where 
the L2 ARC has been discarded.


What I see:
- increased CPU load
- decreased L2 ARC hit rate, decreased SSD (ad[46]), therefore increased 
hard disk load (IOPS graph)


Maybe I could accept the higher system load as normal, because a lot
of things changed between v15 and v28 (though I was hoping that with
the same feature set it would require less CPU), but the radical drop
in the L2ARC hit rate seems to point to a major issue somewhere.
As you can see from the memory stats, I have enough kernel memory to 
hold the L2 headers, so the L2 devices got filled up to their maximum 
capacity.


Any ideas on what could cause these? I haven't upgraded the pool version 
and nothing was changed in the pool or in the file system.


Thanks,


Re: 8-STABLE freezes on UDP traffic (DNS), 7.x doesn't

2010-03-31 Thread Attila Nagy
Pyun YongHyeon wrote:
> On Tue, Mar 30, 2010 at 05:57:45PM +0200, Attila Nagy wrote:
>   
>> Jonathan Feally wrote:
>>     
>>> Attila Nagy wrote:
>>>   
>>>>>>>> Bingo, this solved the problem. The current uptime nears four days.
>>>>>>>> Previously I couldn't go further than a day.
>>>>>>>>
>>>>>>>> The machine gets very light TCP load (and other machines which
>>>>>>>> get work
>>>>>>>> well), so I guess it's UDP RX or TX checksum related
>>>>>>>>   
>>>>>>>> 
>>> I also have had my network go dead on a recent 8.0-STABLE on bge
>>> system. Console is alive, but network just stops. I am running it as a
>>> router with untagged on bge0 and nat of traffic on vlan201 tagged on
>>> top of bge1. I haven't had it lock up in 3 days, but I will try the
>>> -txcsum and -rxcsum on both interfaces to see if the problem still
>>> persists or not. I do have a lot of tcp traffic, but there is also
>>> unsolicited udp flying in as well.
>>>   
>> Well, it's a short time to judge from, but with rx,txcsum disabled, the
>> machine froze nearly instantly (less than one hour of uptime), while
>> with tso disabled, it still works.
>> So for now I think tso causes the problems.
>> BTW, now that we are talking about that, I remember that I've disabled
>> it on a lot of machines previously, because I've had strange issues.
>> 
>
> Would you show me the dmesg output(only bce(4) part)?
>   
Sure:
bce0:  mem
0xfa00-0xfbff irq 16 at device 0.0 on pci7
miibus0:  on bce0
brgphy0:  PHY 2 on miibus0
brgphy0:  1000baseSX-FDX, 2500baseSX-FDX, auto
bce0: Ethernet address: 00:1b:78:75:f0:34
bce0: [ITHREAD]
bce0: ASIC (0x57081021); Rev (B2); Bus (PCI-X, 64-bit, 133MHz); B/C
(4.4.1); Flags (MSI|2.5G)
bce1:  mem
0xf600-0xf7ff irq 16 at device 0.0 on pci3
miibus1:  on bce1
brgphy1:  PHY 2 on miibus1
brgphy1:  1000baseSX-FDX, 2500baseSX-FDX, auto
bce1: Ethernet address: 00:1b:78:75:f0:38
bce1: [ITHREAD]
bce1: ASIC (0x57081021); Rev (B2); Bus (PCI-X, 64-bit, 133MHz); B/C
(4.4.1); Flags (MSI|2.5G)

The NIC's firmware is up to date (latest available on HP firmware update
CD).


Re: 8-STABLE freezes on UDP traffic (DNS), 7.x doesn't

2010-03-30 Thread Attila Nagy
Jonathan Feally wrote:
> Attila Nagy wrote:
>>>>>> Bingo, this solved the problem. The current uptime nears four days.
>>>>>> Previously I couldn't go further than a day.
>>>>>>
>>>>>> The machine gets very light TCP load (and other machines which
>>>>>> get work
>>>>>> well), so I guess it's UDP RX or TX checksum related
>>>>>>   
> I also have had my network go dead on a recent 8.0-STABLE on bge
> system. Console is alive, but network just stops. I am running it as a
> router with untagged on bge0 and nat of traffic on vlan201 tagged on
> top of bge1. I haven't had it lock up in 3 days, but I will try the
> -txcsum and -rxcsum on both interfaces to see if the problem still
> persists or not. I do have a lot of tcp traffic, but there is also
> unsolicited udp flying in as well.
Well, it's a short time to judge from, but with rxcsum and txcsum disabled,
the machine froze nearly instantly (less than one hour of uptime), while
with TSO disabled, it still works.
So for now I think TSO causes the problems.
BTW, now that we are talking about it, I remember that I had previously
disabled it on a lot of machines because of strange issues.
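If TSO does turn out to be the trigger, a minimal way to keep it off across reboots is to append -tso to the interface's line in /etc/rc.conf, whose ifconfig_<interface> words are passed to ifconfig(8). A sketch for bce0; the address and netmask are placeholders, not values from this thread:

```sh
# /etc/rc.conf (sketch): configure bce0 with TSO disabled at boot.
# 192.0.2.10/24 is a placeholder; keep your real inet settings here.
ifconfig_bce0="inet 192.0.2.10 netmask 255.255.255.0 -tso"
```

On a running system the same change is `ifconfig bce0 -tso`, and `ifconfig bce0` shows whether TSO4 is still listed among the interface options.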
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: 8-STABLE freezes on UDP traffic (DNS), 7.x doesn't

2010-03-30 Thread Attila Nagy
Pyun YongHyeon wrote:
> On Mon, Mar 29, 2010 at 09:21:42PM +0200, Attila Nagy wrote:
>   
>> Pyun YongHyeon wrote:
>> 
>>> On Mon, Mar 29, 2010 at 12:57:59PM +0200, Attila Nagy wrote:
>>>   
>>>   
>>>> Hi,
>>>>
>>>> Michael Loftis wrote:
>>>> 
>>>> 
>>>>> --On Thursday, March 25, 2010 3:22 PM +0100 Attila Nagy wrote:
>>>>>
>>>>> <...>
>>>>>   
>>>>>   
>>>>>> Both unbound and python accept DNS requests, and it seems that when
>>>>>> interrupt time is at 25%, only unbound is in the *udp state; when it
>>>>>> is at 50%, both programs are in that state.
>>>>>> 
>>>>>> 
>>>>> Try turning off hardware TSO/checksum offload if it's available on your
>>>>> chipset?  ifconfig <interface> -rxcsum -txcsum -tso -- I'm only using
>>>>> nfe chips right now, but w/ the TSO/CSUM on they lock up constantly
>>>>> under high load.  We're pretty sure it's mostly the nfe driver, or the
>>>>> chips themselves, but have never ruled out some generic 8.x hardware
>>>>> offload issues.
>>>>>   
>>>>>   
>>>> Bingo, this solved the problem. The current uptime is nearly four days.
>>>> Previously I couldn't get beyond a day.
>>>>
>>>> The machine gets very light TCP load (and other machines that get it work
>>>> well), so I guess the problem is related to UDP RX or TX checksumming.
>>>>
>>>> 
>>>> 
>>> Hmm, this is an unexpected result. Since you're using UDP, TSO is not
>>> involved in this issue. Now that you have disabled RX/TX checksum
>>> offloading, could you check how many 'bad checksum' and
>>> 'no checksum' counts netstat(1) reports?
>>> To narrow down which side of checksum offloading causes the issue,
>>> would you disable just one side at a time? For instance, disable TX
>>> checksum offloading with RX checksum offloading enabled and see how
>>> bce(4) behaves.
>>> #ifconfig bce0 -txcsum rxcsum
>>> If that shows the same issue, try disabling RX checksum offloading
>>> but enabling TX checksum offloading.
>>> #ifconfig bce0 txcsum -rxcsum
>>>   
>>>   
>> Interesting. During the day, I disabled only HW checksumming and
>> left TSO enabled. It couldn't run for more than a few hours.
>> I have disabled TSO again to see what happens.
>>
>> BTW, of course there is TCP traffic on that interface (DNS is also
>> available over TCP); maybe this causes the problem.
>> 
>
> The only guess I can think of at this moment is incorrect use of
> bus_dma(9) in the TX path. But I'm not sure this is related to the
> issue you're seeing. Would you try the experimental patch at the
> following URL?
> http://people.freebsd.org/~yongari/bce/bce.20100305.diff
> Please make sure to back up your old bce(4) driver before applying
> the patch. I didn't see anything abnormal in testing, but it
> wasn't stress-tested much.
>   
With the default settings (rxcsum, txcsum and tso enabled), it froze in about an hour:
CPU:  0.0% user,  0.0% nice,  0.0% system, 25.0% interrupt, 75.0% idle
  714 bind 4 1020  1200M  1182M *lle3  17:24  0.00% unbound



Re: 8-STABLE freezes on UDP traffic (DNS), 7.x doesn't

2010-03-29 Thread Attila Nagy
Pyun YongHyeon wrote:
> On Mon, Mar 29, 2010 at 12:57:59PM +0200, Attila Nagy wrote:
>   
>> Hi,
>>
>> Michael Loftis wrote:
>> 
>>> --On Thursday, March 25, 2010 3:22 PM +0100 Attila Nagy wrote:
>>>
>>> <...>
>>>   
>>>> Both unbound and python accept DNS requests, and it seems that when
>>>> interrupt time is at 25%, only unbound is in the *udp state; when it
>>>> is at 50%, both programs are in that state.
>>>> 
>>> Try turning off hardware TSO/checksum offload if it's available on your
>>> chipset?  ifconfig <interface> -rxcsum -txcsum -tso -- I'm only using
>>> nfe chips right now, but w/ the TSO/CSUM on they lock up constantly
>>> under high load.  We're pretty sure it's mostly the nfe driver, or the
>>> chips themselves, but have never ruled out some generic 8.x hardware
>>> offload issues.
>>>   
>> Bingo, this solved the problem. The current uptime is nearly four days.
>> Previously I couldn't get beyond a day.
>>
>> The machine gets very light TCP load (and other machines that get it work
>> well), so I guess the problem is related to UDP RX or TX checksumming.
>>
>> 
>
> Hmm, this is an unexpected result. Since you're using UDP, TSO is not
> involved in this issue. Now that you have disabled RX/TX checksum
> offloading, could you check how many 'bad checksum' and
> 'no checksum' counts netstat(1) reports?
> To narrow down which side of checksum offloading causes the issue,
> would you disable just one side at a time? For instance, disable TX
> checksum offloading with RX checksum offloading enabled and see how
> bce(4) behaves.
> #ifconfig bce0 -txcsum rxcsum
> If that shows the same issue, try disabling RX checksum offloading
> but enabling TX checksum offloading.
> #ifconfig bce0 txcsum -rxcsum
>   
Interesting. During the day, I disabled only HW checksumming and
left TSO enabled. It couldn't run for more than a few hours.
I have disabled TSO again to see what happens.

BTW, of course there is TCP traffic on that interface (DNS is also
available over TCP); maybe this causes the problem.
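For the one-side-at-a-time experiment Pyun suggests, it helps to record the checksum counters before and after each ifconfig change. A minimal sketch, assuming the FreeBSD `netstat -s -p udp` wording "with bad checksum" / "with no checksum"; the function names and the log path in the usage line are made up for illustration:

```sh
#!/bin/sh
# udp_csum_counters: read `netstat -s -p udp` output on stdin and print only
# the "with bad checksum" / "with no checksum" lines, leading whitespace removed
udp_csum_counters() {
    awk '/with (bad|no) checksum/ { sub(/^[ \t]+/, ""); print }'
}

# csum_snapshot LABEL: print a timestamped header followed by the counters,
# so runs with different offload settings can be compared side by side
csum_snapshot() {
    echo "== ${1:-snapshot} $(date '+%Y-%m-%d %H:%M:%S')"
    netstat -s -p udp | udp_csum_counters
}

# Usage on FreeBSD (hypothetical log file):
#   ifconfig bce0 -txcsum rxcsum
#   csum_snapshot "txcsum off" >> /var/tmp/csum.log
```

Because the counters are cumulative since boot, the difference between two snapshots is what matters, not the absolute values.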


Re: 8-STABLE freezes on UDP traffic (DNS), 7.x doesn't

2010-03-29 Thread Attila Nagy
Mike Tancsa wrote:
> At 11:39 AM 3/25/2010, Michael Loftis wrote:
>> --On Thursday, March 25, 2010 3:22 PM +0100 Attila Nagy wrote:
>>
>> <...>
>>> Both unbound and python accept DNS requests, and it seems that when
>>> interrupt time is at 25%, only unbound is in the *udp state; when it
>>> is at 50%, both programs are in that state.
>>
>> Try turning off hardware TSO/checksum offload if it's available on your
>> chipset?  ifconfig <interface> -rxcsum -txcsum -tso -- I'm only using
>> nfe chips right now, but w/ the TSO/CSUM on they lock up constantly
>> under high load.  We're pretty sure it's mostly the nfe driver, or
>> the chips themselves, but have never ruled out some generic 8.x
>> hardware offload issues.
>
> There were also a bunch of commits last night for the bce driver. If
> it's the NIC in RELENG_8, perhaps those bug fixes will help.
>
> http://lists.freebsd.org/pipermail/svn-src-stable-8/2010-March/001804.html
>
> http://lists.freebsd.org/pipermail/svn-src-stable-8/2010-March/001803.html
>
I saw them, but they didn't seem related. I've just tested it, and my
assumption was correct: a fresh 8-STABLE also freezes.


Re: 8-STABLE freezes on UDP traffic (DNS), 7.x doesn't

2010-03-29 Thread Attila Nagy
Hi,

Michael Loftis wrote:
>
>
> --On Thursday, March 25, 2010 3:22 PM +0100 Attila Nagy wrote:
>
> <...>
>> Both unbound and python accept DNS requests, and it seems that when
>> interrupt time is at 25%, only unbound is in the *udp state; when it
>> is at 50%, both programs are in that state.
>
> Try turning off hardware TSO/checksum offload if it's available on your
> chipset?  ifconfig <interface> -rxcsum -txcsum -tso -- I'm only using
> nfe chips right now, but w/ the TSO/CSUM on they lock up constantly
> under high load.  We're pretty sure it's mostly the nfe driver, or the
> chips themselves, but have never ruled out some generic 8.x hardware
> offload issues.
Bingo, this solved the problem. The current uptime is nearly four days.
Previously I couldn't get beyond a day.

The machine gets very light TCP load (and other machines that get it work
well), so I guess the problem is related to UDP RX or TX checksumming.




Re: 8-STABLE freezes on UDP traffic (DNS), 7.x doesn't

2010-03-25 Thread Attila Nagy
Pyun YongHyeon wrote:
> On Thu, Mar 25, 2010 at 03:22:04PM +0100, Attila Nagy wrote:
>   
>> Hi,
>>
>> I have some recursive nameservers running unbound on 7.2-STABLE #0:
>> Wed Sep  2 13:37:17 CEST 2009 on a bunch of HP BL460c machines (bce
>> interfaces).
>> These work OK.
>>
>> During the migration to 8.x, I upgraded one of these machines
>> to 8.0-STABLE #25: Tue Mar  9 18:15:34 CET 2010 (the date
>> indicates approximately when the source was checked out from
>> cvsup.hu.freebsd.org; I don't know the exact revision).
>>
>> The first problem was that the machine occasionally lost network access
>> for a few minutes. I could log in on the console and see the processes
>> involved in network I/O in the "keglim" state, but couldn't do any
>> network I/O. This lasted for a few minutes, then everything returned to
>> normal.
>> I fixed this issue by raising kern.ipc.nmbclusters to 51200 (double
>> its default size); since then I haven't seen these blackouts.
>>
>> But now the machine freezes. It can run for about a day, and then it
>> just freezes. I can't even break into the debugger by sending it an NMI.
>> top says:
>> last pid: 92428;  load averages:  0.49,  0.40,  0.38  up 0+21:13:18  07:41:43
>> 43 processes:  2 running, 38 sleeping, 1 zombie, 2 lock
>> CPU:  1.3% user,  0.0% nice,  1.3% system, 26.0% interrupt, 71.3% idle
>> Mem: 1682M Active, 99M Inact, 227M Wired, 5444K Cache, 44M Buf, 5899M Free
>> Swap:
>>
>>   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
>> 45011 bind        4  49    0  1734M  1722M RUN     2  37:42 22.17% unbound
>>   712 bind        3  44    0 70892K 19904K uwait   0  71:07  3.86% python2.6
>>
>> What these freezes have in common seems to be the high interrupt count.
>> Normally, under load, the CPU times look like this:
>> CPU:  3.5% user,  0.0% nice,  1.8% system,  0.4% interrupt, 94.4% idle
>>
>> I observed one "freeze" where top kept running and everything
>> was at 0% except interrupt, which was exactly 25% (the machine has four
>> cores), and another where I could save the following console output:
>> CPU:  0.0% user,  0.0% nice,  0.2% system, 50.0% interrupt, 49.8% idle
>> 
>
> When you see a high number of interrupts, could you check whether they
> come from bce(4)? You can probably use systat(1) to check how many
> interrupts bce(4) generates.
>   
I've tried several times, but haven't yet caught a moment when the
machine was still alive (so the script could run) and the interrupt
count was elevated.
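Catching that moment by hand is hard, so a watcher can do it unattended: sample top(1), and once interrupt time crosses a threshold, log the per-device counters from `vmstat -i` through logger(1). A sketch under stated assumptions: the top(1) "CPU:" line has the format shown in this thread, the relevant `vmstat -i` lines contain "bce", and syslogd forwards to another host so the messages survive a freeze. The 20% default and the function names are arbitrary:

```sh
#!/bin/sh
# interrupt_pct: print the interrupt percentage from the last "CPU:" line of
# top(1) output read on stdin; `top -b -d 2` prints two displays and the last
# one is used, on the assumption that it covers a full sampling interval
interrupt_pct() {
    awk '/^CPU:/ {
            for (i = 2; i < NF; i++)
                if ($(i + 1) ~ /^interrupt/) { v = $i; sub(/%/, "", v) }
         }
         END { if (v != "") print v }'
}

# above_threshold VALUE LIMIT: succeed if VALUE >= LIMIT (decimal-safe via awk)
above_threshold() {
    awk -v v="$1" -v t="$2" 'BEGIN { exit !(v + 0 >= t + 0) }'
}

# watch_interrupts [LIMIT]: poll forever; when interrupt time reaches LIMIT
# percent, send the bce lines of `vmstat -i` to syslog under the tag irqwatch
watch_interrupts() {
    limit=${1:-20}
    while :; do
        pct=$(top -b -d 2 | interrupt_pct)
        if [ -n "$pct" ] && above_threshold "$pct" "$limit"; then
            logger -t irqwatch "interrupt ${pct}%: $(vmstat -i | grep bce | tr -s ' \n' ' ')"
        fi
        sleep 5
    done
}
```

Run it in the background (for example `watch_interrupts 20 &` after sourcing the functions) and check the remote log after the next freeze.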



8-STABLE freezes on UDP traffic (DNS), 7.x doesn't

2010-03-25 Thread Attila Nagy

Hi,

I have some recursive nameservers running unbound on 7.2-STABLE #0:
Wed Sep  2 13:37:17 CEST 2009 on a bunch of HP BL460c machines (bce
interfaces).

These work OK.

During the migration to 8.x, I upgraded one of these machines
to 8.0-STABLE #25: Tue Mar  9 18:15:34 CET 2010 (the date
indicates approximately when the source was checked out from
cvsup.hu.freebsd.org; I don't know the exact revision).


The first problem was that the machine occasionally lost network access
for a few minutes. I could log in on the console and see the processes
involved in network I/O in the "keglim" state, but couldn't do any
network I/O. This lasted for a few minutes, then everything returned to
normal.
I fixed this issue by raising kern.ipc.nmbclusters to 51200 (double
its default size); since then I haven't seen these blackouts.
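For reference, kern.ipc.nmbclusters is normally raised as a boot-time tunable. A sketch of the /boot/loader.conf line using the value from this message (it is this machine's doubled value, not a general recommendation):

```sh
# /boot/loader.conf (sketch): raise the mbuf cluster limit at boot
kern.ipc.nmbclusters="51200"
```

After a reboot, `sysctl kern.ipc.nmbclusters` should report the new limit, and `netstat -m` shows cluster usage against that maximum together with its "requests for mbufs denied" counters, which indicate whether allocations are still failing.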


But now the machine freezes. It can run for about a day, and then it
just freezes. I can't even break into the debugger by sending it an NMI.

top says:
last pid: 92428;  load averages:  0.49,  0.40,  0.38  up 0+21:13:18  07:41:43
43 processes:  2 running, 38 sleeping, 1 zombie, 2 lock
CPU:  1.3% user,  0.0% nice,  1.3% system, 26.0% interrupt, 71.3% idle
Mem: 1682M Active, 99M Inact, 227M Wired, 5444K Cache, 44M Buf, 5899M Free
Swap:

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
45011 bind        4  49    0  1734M  1722M RUN     2  37:42 22.17% unbound
  712 bind        3  44    0 70892K 19904K uwait   0  71:07  3.86% python2.6


What these freezes have in common seems to be the high interrupt count.
Normally, under load, the CPU times look like this:

CPU:  3.5% user,  0.0% nice,  1.8% system,  0.4% interrupt, 94.4% idle

I observed one "freeze" where top kept running and everything
was at 0% except interrupt, which was exactly 25% (the machine has four
cores), and another where I could save the following console output:

CPU:  0.0% user,  0.0% nice,  0.2% system, 50.0% interrupt, 49.8% idle
...(partial, broken line)32M  2423M *udp1  50:16 10.89% unbound
  714 bind        3  44    0 70892K 26852K uwait   3   8:41  4.69% python2.6
61004 root        1  62    0 37428K 10876K *udp1       0:00  1.56% python
  706 root        1  44    0  2696K   624K piperd  1   0:07  0.00% readproctit


Both unbound and python accept DNS requests, and it seems that when
interrupt time is at 25%, only unbound is in the *udp state; when it is
at 50%, both programs are in that state.




Re: NFS on ZFS

2009-06-09 Thread Attila Nagy

Hello,

I've also run into it; it's a pretty "killer" feature. :-O

Is there any chance of a fix?

Thanks,

Kip Macy wrote:

The flags checks are too strict. File a PR. I'll fix it when I get to
it. Sorry.


-Kip

On Wed, May 27, 2009 at 7:24 PM, Mike Andrews wrote:
  

On Tue, 26 May 2009, Mike Andrews wrote:



Takahashi Yoshihiro wrote:
  

Today's stable has a problem creating a new file via NFS on ZFS.

On the NFS server, there is no problem.

% cd /ZFS
% mktemp hoge
hoge
% ls -l hoge
-rw-------  1 nyan  nyan  0  5 26 19:09 hoge


But it's a problem on the NFS client.

# mount server:/ZFS /ZFS
% cd /ZFS
% mktemp hoge
mktemp: mkstemp failed on hoge: Input/output error
% ls -l hoge
----------  1 nyan  wheel  0  5 26 19:09 hoge

The file has the wrong permissions.

This problem occurs only on stable; current is fine.


I'm seeing this too.  It seems so far to be limited to mkstemp() -- just
copying files normally works.  For example /usr/bin/install -S fails,
without -S works, if the target is an NFS+ZFS volume.
  

Anyone?

I've verified that if the NFS server uses UFS2, mkstemp() from an NFS
client to the server works fine, but if the NFS server uses ZFS, the NFS
server returns EIO after creating a file with 000 permissions.

In addition to breaking /usr/bin/install -S, it also breaks rsync over NFS.

I don't yet know if it matters whether the on-disk format is ZFS v6 vs v13.
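To check a given server/client combination without rsync or install -S, one can create a file through mktemp(1), which (judging by the "mkstemp failed" message above) goes through mkstemp(3), and confirm it gets mode 0600. A minimal sketch; the function name and file prefix are made up, and on an affected NFS client the failed call may leave a 000-mode file behind that has to be removed by hand:

```sh
#!/bin/sh
# check_mkstemp_mode DIR: create a temporary file in DIR with mktemp(1) and
# verify that it was created with mode -rw------- (0600); prints OK or FAIL
check_mkstemp_mode() {
    dir=${1:?usage: check_mkstemp_mode DIRECTORY}
    if ! f=$(mktemp "$dir/mkstemp-test.XXXXXX"); then
        echo "FAIL: mktemp could not create a file in $dir" >&2
        return 1
    fi
    mode=$(ls -l "$f" | awk '{ print $1 }')
    rm -f "$f"
    # a trailing ACL/extended-attribute marker after the mode string is allowed
    case $mode in
        -rw-------*) echo "OK: $dir ($mode)"; return 0 ;;
        *) echo "FAIL: $dir got $mode, expected -rw-------" >&2; return 1 ;;
    esac
}

# Usage on the NFS client (hypothetical mount point):
#   check_mkstemp_mode /ZFS
```

Running it once against a UFS2 export and once against a ZFS export from the same client reproduces the comparison described above.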







  




stat() takes 54 msec in a directory with 94k files (even with a big dirhash)

2009-05-12 Thread Attila Nagy

Hello,

I have a strange problem on FreeBSD 7-STABLE (compiled on 7 May, just a
few commits after the release; an earlier kernel did the same).


I'm running several parallel rsyncs from one machine to another (let's call
them source and destination). The source contains maildirs, so some
directories hold a (relatively) large number of files.
The source runs an older FreeBSD (around 6.2) with plain UFS2 file systems
mounted with soft updates.
The destination has a bigger UFS2 file system on top of gjournal,
mounted async.


I've noticed that rsync sometimes stops moving data and the destination
machine gets sluggish. After some testing, I could catch the effect in
action (which wasn't hard, because it sometimes persists for hours).


top shows around 20% system activity (there are two quad-core CPUs) and
0% user. The WCPU field for rsync shows 100%.


ktrace-ing the rsync process, I can see this:
  31639 rsync    0.04 CALL  lstat(0x7fffab70,0x7fffaf70)
  31639 rsync    0.04 NAMI  "hm33/00/16/uid/Maildir/new/1212536121.54673,S=3128"
  31639 rsync    0.054226 STRU  struct stat {dev=100, ino=136943662,
        mode=-rw------- , nlink=1, uid=999, gid=999, rdev=546942760,
        atime=1241807071, stime=1212536121, ctime=1241807071,
        birthtime=1212536121, size=3128, blksize=4096, blocks=8, flags=0x0 }
  31639 rsync    0.13 RET   lstat 0
  31639 rsync    0.18 CALL  lstat(0x7fffab70,0x7fffaf70)
  31639 rsync    0.04 NAMI  "hm33/00/16/uid/Maildir/new/1212537276.69702,S=4634"
  31639 rsync    0.054409 STRU  struct stat {dev=100, ino=136943663,
        mode=-rw------- , nlink=1, uid=999, gid=999, rdev=546942762,
        atime=1241807071, stime=1212537276, ctime=1241807071,
        birthtime=1212537276, size=4634, blksize=4096, blocks=12, flags=0x0 }
  31639 rsync    0.13 RET   lstat 0
  31639 rsync    0.20 CALL  lstat(0x7fffab70,0x7fffaf70)
  31639 rsync    0.05 NAMI  "hm33/00/16/uid/Maildir/new/1212537689.74390,S=3172"
  31639 rsync    0.054230 STRU  struct stat {dev=100, ino=136943664,
        mode=-rw------- , nlink=1, uid=999, gid=999, rdev=546942765,
        atime=1241807071, stime=1212537689, ctime=1241807071,
        birthtime=1212537689, size=3172, blksize=4096, blocks=8, flags=0x0 }
  31639 rsync    0.13 RET   lstat 0

So according to ktrace, each lstat() call takes about 54 milliseconds to
return.
I have tried both the default and a much larger vfs.ufs.dirhash_maxmem
value, but I still get these delays.
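To find the slow lookups in a longer trace without reading it line by line, a small filter can print every record whose time column exceeds a threshold, together with the last path seen in a NAMI record. A sketch assuming the trace was dumped with relative timestamps (`kdump -R`, which is what the timing column above appears to be) and that records keep the "PID COMMAND TIME TYPE ..." layout; the function name is made up:

```sh
#!/bin/sh
# slow_ktrace_entries [SECONDS]: read kdump -R output on stdin and print
# "time type path" for every record at or above the threshold (default 0.01s);
# path is the most recent NAMI argument, i.e. the file being looked up
slow_ktrace_entries() {
    awk -v t="${1:-0.01}" '
        $3 ~ /^[0-9]+\.[0-9]+$/ && $3 + 0 >= t + 0 { print $3, $4, last_path }
        $4 == "NAMI" { last_path = $5 }'
}

# Usage on FreeBSD (31639 is the rsync PID from this report):
#   ktrace -p 31639; sleep 30; ktrace -c -p 31639
#   kdump -R -f ktrace.out | slow_ktrace_entries 0.01
```

The NAMI path is taken as a single field, which is fine for maildir names like the ones above but would be cut short by file names containing spaces.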

Currently I have:
vfs.ufs.dirhash_docheck: 0
vfs.ufs.dirhash_mem: 18589428
vfs.ufs.dirhash_maxmem: 209715200
vfs.ufs.dirhash_minsize: 2560
So dirhash has space to expand.

The directory in question contains 94493 files.

The source machine doesn't show this behaviour.

top's output on the destination machine:
CPU:  0.0% user,  0.0% nice, 22.7% system,  0.0% interrupt, 77.3% idle
Mem: 159M Active, 3032M Inact, 599M Wired, 47M Cache, 399M Buf, 102M Free
Swap: 4096M Total, 4096M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME    WCPU COMMAND
31639 root        1 118    0 50648K 10512K CPU0   0   2:01 100.00% rsync
  634 root        1  -4    0  2536K   628K vlruwk 1   0:20   0.00% supervise
26760 root        1  44    0 25940K  3316K select 1   0:10   0.00% sshd
31640 root        1  75    0 87512K  8324K suspfs 4   0:10   0.00% rsync
31641 root        1  75    0 18904K  7124K suspfs 6   0:10   0.00% rsync
31637 root        1  75    0 40408K  7744K suspfs 4   0:09   0.00% rsync
31636 root        1  44    0 20952K  6288K select 2   0:09   0.00% rsync
31638 root        1  44    0   104M  8912K select 3   0:09   0.00% rsync
31635 root        1  75    0 80344K  7812K suspfs 4   0:09   0.00% rsync
31642 root        1  44    0 17940K  7624K select 1   0:04   0.00% ssh
31646 root        1  45    0 17940K  7656K select 1   0:03   0.00% ssh

All of the rsyncs use the same file system, but different top-level
directories. While this happens, none of the other rsyncs can make progress.


Any ideas about what could be done to work around this?

Thanks,


Re: Mounting devfs over to ZFS from fstab fails

2008-04-02 Thread Attila Nagy

On 2008.03.28. 23:59, Vince wrote:

Attila Nagy wrote:

Hello,

I have some jails running on ZFS, so I have to mount devfs instances into them.

For this purpose, I have some similar lines in /etc/fstab:
devfs   /pool/jail/ldap/dev   devfs   rw  0   0

Where /pool is a ZFS filesystem.


I'm not sure whether it will have any adverse effects, but changing this to
devfs   /pool/jail/ldap/dev   devfs   rw,late  0   0

will probably fix it. My guess is that the corrected error checking in the
latest -stable version catches an error that was previously being
ignored.

see
http://www.freebsd.org/cgi/cvsweb.cgi/src/etc/rc.d/mountcritlocal.diff?only_with_tag=RELENG_7&r1=text&tr1=1.14.2.2&r2=text&tr2=1.14 


which I believe is the MFC for
http://www.freebsd.org/cgi/getmsg.cgi?fetch=1314016+1316331+/usr/local/www/db/text/2008/cvs-all/20080309.cvs-all 



Although I can't see a commit message in cvsweb (I'm still learning it,
though :)

Yes, "late" works around the problem. Thanks.
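The fact that "late" helps suggests an ordering issue: entries without it appear to be mounted by rc.d/mountcritlocal before ZFS datasets are mounted, while "late" entries wait for rc.d/mountlate. A minimal way to check that order on a given system is to run rcorder(8) over /etc/rc.d and filter the result; this sketch assumes the ZFS script is named zfs, and the function name is made up:

```sh
#!/bin/sh
# check_mount_order: read rcorder(8) output (one rc.d script path per line)
# on stdin, print the positions of mountcritlocal, zfs and mountlate, and
# succeed only if zfs comes before mountlate (exit 2 if either is missing)
check_mount_order() {
    awk '{ name = $0; sub(/.*\//, "", name); pos[name] = NR }
         END {
             if (!("zfs" in pos) || !("mountlate" in pos)) {
                 print "zfs or mountlate not found"
                 exit 2
             }
             printf "mountcritlocal=%d zfs=%d mountlate=%d\n",
                 pos["mountcritlocal"], pos["zfs"], pos["mountlate"]
             exit !(pos["zfs"] < pos["mountlate"])
         }'
}

# Usage on FreeBSD:
#   rcorder /etc/rc.d/* | check_mount_order
```

If zfs sorts between mountcritlocal and mountlate, fstab entries that live under ZFS datasets need the "late" option.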


Mounting devfs over to ZFS from fstab fails

2008-03-28 Thread Attila Nagy

Hello,

I have some jails running on ZFS, so I have to mount devfs instances into them.

For this purpose, I have some similar lines in /etc/fstab:
devfs   /pool/jail/ldap/dev   devfs   rw  0   0

Where /pool is a ZFS filesystem.

This worked until today, when I upgraded from a previous 7-STABLE
(FreeBSD 7.0-STABLE #16: Fri Mar  7 14:30:08 CET 2008) to today's
STABLE; now it doesn't.


The boot process fails with something like "WARNING: $true wasn't set,
see man rc.conf" (or something similar; I don't have the exact error
message, but I can reproduce it if needed). The problem is that the rc
scripts try to mount the devfs (and nullfs) file systems onto the
not-yet-mounted ZFS, so /pool/jail/ldap/dev doesn't exist yet.


If I create these directories on the root file system, the OS boots fine,
but of course the devfs mounts then end up beneath ZFS instead of on it
(umount followed by mount -a solves this). There is a similar problem
with the nullfs mounts as well.


AFAIK only the following have changed in rc.d:
./dhclient
./mountcritlocal
./mountlate

None of them seems able to cause this kind of malfunction.

Any ideas?


Re: ng_fec && pseudo-device vlan

2003-02-11 Thread Attila Nagy
Hello,

> Why don't ng_fec (Cisco FastEtherChannel netgraph module) and
> `pseudo-device vlan' (802.1q trunking) work together?
Don't worry. If you ever upgrade to 5.0, not only VLANs on FEC
interfaces but FEC itself will stop working.

> Cisco Switch                                FreeBSD
>
>                            /Port 4/1--fxp0\           /vlan0
> dot1q Trunk - EtherChannel<                >--fec0---<
>                            \Port 4/2--fxp1/           \vlan1
>                                                        \vlan2
>                                                         \vlanN
Nice ASCII art. You can safely s/FreeBSD/Linux/g to get this working :~-(
(BTW, does it work under Linux? I don't use Linux.)

> I am not skilled enough to diagnose the kernel internals to discern why
> this doesn't work.  I am willing to test any patches that may turn up.
ng_fec has been in FreeBSD's tree since 5.0, but it is not yet connected
to the build. I assume it never will be... :(

If anyone is willing to fix it and understands what's going on in the
kernel, I will gladly provide console (serial) access to a machine that
has two fxp NICs and is connected to a FEC-aware switch.

Any takers?

--[ Free Software ISOs - http://www.fsn.hu/?f=download ]--
Attila Nagy e-mail: [EMAIL PROTECTED]
Free Software Network (FSN.HU)phone @work: +361 210 1415 (194)
cell.: +3630 306 6758

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-stable" in the body of the message



Re: union fs

2001-11-15 Thread Attila Nagy

Hello,

> I've been running several dataless/diskless machines with union-fs
> (for the /etc stuff) with no - apparent - problems for several months
> now, so I was wondering if the comment about it being buggy still
> holds?
Try running a program on a union-mounted FS that uses sendfile() to
transfer data (the easiest to try is webfsd from the ports tree). It
will send garbage collected from various places on your hard drive.

------
Attila Nagye-mail:  [EMAIL PROTECTED]
Budapest Polytechnic (BMF.HU)   @work: +361 210 1415 (194)
H-1084 Budapest, Tavaszmezo u. 15-17.   cell.: +3630 306 6758





Re: What's happened with kernel option ATA_ENABLE_ATAPI_DMA?

2001-04-25 Thread Attila Nagy

Hello,

> I'm trying to upgrade to 4.3-STABLE and I've found that my kernel
> doesn't build because its config contains the invalid option
> ATA_ENABLE_ATAPI_DMA. Has this option been removed in 4.3?
Yes.
Try the hw.ata.ata_dma or hw.ata.atapi_dma sysctl instead.
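These knobs are usually set at boot. A sketch for /boot/loader.conf, assuming hw.ata.atapi_dma is honoured as a loader tunable on your 4.3 system (the ata(4) manual page shipped with the release should confirm this); `sysctl hw.ata` shows the values actually in effect:

```sh
# /boot/loader.conf (sketch): use DMA for ATAPI devices such as CD drives,
# in place of the removed ATA_ENABLE_ATAPI_DMA kernel option
hw.ata.atapi_dma="1"
```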

------
Attila Nagye-mail:  [EMAIL PROTECTED]
Budapest Polytechnic (BMF.HU)   @work: +361 210 1415 (194)
H-1084 Budapest, Tavaszmezo u. 15-17.   cell.: +3630 306 6758

