date:20080926

bad NFS/UDP performance

2008-09-26 Thread Danny Braniss

Hi,
There seems to be some serious degradation in performance.
Under 7.0 I get about 90 MB/s (on write), while, on the same machine
under 7.1 it drops to 20!
Any ideas?

thanks,
danny


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]

Re: bad NFS/UDP performance

2008-09-26 Thread Claus Guttesen

There seems to be some serious degradation in performance.
 Under 7.0 I get about 90 MB/s (on write), while, on the same machine
 under 7.1 it drops to 20!
 Any ideas?

Can you compare performance with TCP?

-- 
regards
Claus

When lenity and cruelty play for a kingdom,
the gentler gamester is the soonest winner.

Shakespeare

Re: bad NFS/UDP performance

2008-09-26 Thread Jeremy Chadwick

On Fri, Sep 26, 2008 at 10:04:16AM +0300, Danny Braniss wrote:
 Hi,
   There seems to be some serious degradation in performance.
 Under 7.0 I get about 90 MB/s (on write), while, on the same machine
 under 7.1 it drops to 20!
 Any ideas?

1) Network card driver changes,

2) This could be relevant, but rwatson@ will need to help determine
   that.
   http://lists.freebsd.org/pipermail/freebsd-stable/2008-September/045109.html

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |


Re: vm.kmem_size settings doesn't affect loader?

2008-09-26 Thread Bartosz Stec


Jeremy Chadwick wrote:

On Thu, Sep 25, 2008 at 04:14:02PM +0200, Bartosz Stec wrote:
  

Your options are:

1) Consider increasing it from 512M to something like 1.5GB; do not
increase it past that on RELENG_7, as there isn't support for more than
2GB total.  For example, on a 1GB memory machine, I often recommend
768M.  On 2GB machines, 1536M.  You will need to run -CURRENT if you
want more.

2) Tune ZFS aggressively.  Start by setting vfs.zfs.arc_min=16M
and vfs.zfs.arc_max=64M.

If your machine has some small amount of memory (768MB, 1GB, etc.),
then you probably shouldn't be using ZFS.

  
  
The problem occurred on an i386 machine with 1GB of memory and 7.1-pre (3 HDDs,
40GB, RAIDZ1). I know that i386 is highly unrecommended for ZFS, but
it's just a home box for testing and learning purposes - I just want to
know what I'm doing and what I should expect when I decide to put ZFS on
server machines :) Currently, from posts on freebsd-fs, I conclude that
even with gigs of kmem and using AMD64, we can still experience panics
from kmem_malloc.



The i386 vs. amd64 argument is bogus, if you ask me.  ZFS works on both.
amd64 is recommended because ZFS contains code that makes heavy use of
64-bit values, and because amd64 offers large amounts of addressed
memory without disgusting hacks like PAE.

That said -- yes, even with gigs of kmem and using AMD64, you can
still panic due to kmem exhaustion.  I have fairly decent experience
with this problem, because it haunted me for quite some time.

A large portion of the problem is that kmem_max, on i386 and amd64 (yes,
you read that right) has a 2GB limit on RELENG_7.  I repeat: a 2GB
limit, regardless of i386 or amd64.

This limit has been increased to 512GB on CURRENT, but there are no
plans to MFC those changes, as they are too major.

Let me tell you something I did this weekend.  I had to copy literally
200GB of data from a ZFS raidz1 pool (spread across 3 disks) to two
different places: 1) a UFS2 filesystem on a different disk, and 2)
across a gigE network to a Windows machine.  I had to do this because I
was adding a disk to the vdev, which cannot be done without re-creating
the pool (this is a known problem with ZFS, and has nothing to do with
FreeBSD).

The machine hosting the data runs RELENG_7 with amd64, and contains 4GB
of memory.  However, I've accomplished the same task with only 2GB of
memory as well.

These are the tuning settings I use:

vm.kmem_size=1536M
vm.kmem_size_max=1536M
vfs.zfs.arc_min=16M
vfs.zfs.arc_max=64M

The entire copying process took almost 2 hours.  Not once did I
experience kmem exhaustion.  I can *guarantee* that I would have crashed
the box numerous times had I not tuned the machine with the values
above.
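
A rough sketch of the sizing rule of thumb quoted earlier in this message (768M on a 1GB machine, 1536M on a 2GB machine, never more than 1536M on RELENG_7). It assumes the two examples generalise to three quarters of physical RAM, which the thread does not state. The function names are made up for illustration, and the ARC limits are simply the values given above.

```shell
# Suggest vm.kmem_size in MB for a machine with $1 MB of RAM.
# Assumption (not from the thread): 3/4 of RAM, capped at 1536 MB.
suggest_kmem_mb() {
    kmem=$(( $1 * 3 / 4 ))
    if [ "$kmem" -gt 1536 ]; then
        kmem=1536
    fi
    echo "$kmem"
}

# Print candidate /boot/loader.conf lines for a machine with $1 MB of RAM.
# The ARC limits are the fixed values quoted in the thread.
loader_conf_lines() {
    kmem=$(suggest_kmem_mb "$1")
    echo "vm.kmem_size=\"${kmem}M\""
    echo "vm.kmem_size_max=\"${kmem}M\""
    echo 'vfs.zfs.arc_min="16M"'
    echo 'vfs.zfs.arc_max="64M"'
}
```

For example, "loader_conf_lines 2048" prints the four settings listed above (in quoted loader.conf form) with 1536M for both kmem values. Treat the output as a starting point to review, not as something to paste in blindly.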

  
Manual tuning is hard for me because I'm not familiar
with BSD kernel code or kernel memory management. I'm just an end-user
who loves the concepts of ZFS and is waiting for it to be (more) stable. Of course
I've followed the tuning guide carefully.



I'm an experienced end-user who has very little experience with BSD
kernel code and absolutely no experience with kernel memory management.
Proper tuning is all that's needed, regardless of your knowledge set.

Please try installing 2GB of memory in your i386 box, and then use
the exact loader.conf values I specified above.

  

Thank you for the hints.
Yesterday I added 512 MB of memory to the box (1.5GB in total), and set
vm.kmem_size and vm.kmem_size_max to 1024M. With modules of 1024MB, 512MB,
256MB and 256MB available and 3 memory slots it is hard to reach 2GB of RAM ;)
So far it has survived world cleaning/building/installing, bonnie++
benchmarking, fs scrubbing and general usage. Memory usage seems stable.
If kmem exhaustion unfortunately happens again, I will experiment
with the ARC settings.
IMHO you've patiently explained a lot of ZFS tuning concerns in this thread
and they should be added to the tuning guide - especially the explanation of the ARC
and prefetch settings. Thanks again!


--
Bartosz Stec 



Re: bad NFS/UDP performance

2008-09-26 Thread Danny Braniss

 On Fri, Sep 26, 2008 at 10:04:16AM +0300, Danny Braniss wrote:
  Hi,
  There seems to be some serious degradation in performance.
  Under 7.0 I get about 90 MB/s (on write), while, on the same machine
  under 7.1 it drops to 20!
  Any ideas?
 
 1) Network card driver changes,
could be, but at least iperf/tcp is ok - can't get udp numbers, do you
know of any tool to measure udp performance?
BTW, I also checked on different hardware, and the badness is there.
 
 2) This could be relevant, but rwatson@ will need to help determine
that.

 http://lists.freebsd.org/pipermail/freebsd-stable/2008-September/045109.html

gut feeling is that it's somewhere else:


Writing 16 MB file
       BS  Count  /---- 7.0 -----/  /---- 7.1 ----/
    1*512  32768  0.16s  98.11MB/s  0.43s 37.18MB/s
    2*512  16384  0.17s  92.04MB/s  0.46s 34.79MB/s
    4*512   8192  0.16s 101.88MB/s  0.43s 37.26MB/s
    8*512   4096  0.16s  99.86MB/s  0.44s 36.41MB/s
   16*512   2048  0.16s 100.11MB/s  0.50s 32.03MB/s
   32*512   1024  0.26s  61.71MB/s  0.46s 34.79MB/s
   64*512    512  0.22s  71.45MB/s  0.45s 35.41MB/s
  128*512    256  0.21s  77.84MB/s  0.51s 31.34MB/s
  256*512    128  0.19s  82.47MB/s  0.43s 37.22MB/s
  512*512     64  0.18s  87.77MB/s  0.49s 32.69MB/s
 1024*512     32  0.18s  89.24MB/s  0.47s 34.02MB/s
 2048*512     16  0.17s  91.81MB/s  0.30s 53.41MB/s
 4096*512      8  0.16s 100.56MB/s  0.42s 38.07MB/s
 8192*512      4  0.82s  19.56MB/s  0.80s 19.95MB/s
16384*512      2  0.82s  19.63MB/s  0.95s 16.80MB/s
32768*512      1  0.81s  19.69MB/s  0.96s 16.64MB/s

Average:                 75.86            33.00

The NFS filer is a Network Appliance and is in use, so I get fluctuations in
the measurements, but the relation stays similar: good on 7.0, bad on 7.1.
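
The tool that produced the table is not named in the thread. A rough equivalent of the sweep can be sketched with dd(1): each pass writes the same 16 MB with a block size of N*512 bytes, so the block count is 32768/N. The scratch-file path and the function names below are hypothetical, and dd's own throughput report stands in for the timing columns.

```shell
# Total amount written per pass: 16 MB, as in the table above.
total_bytes=$((16 * 1024 * 1024))

# Number of blocks to write for a block size of ($1 * 512) bytes.
block_count() {
    echo $(( total_bytes / ($1 * 512) ))
}

# Write the scratch file $1 once per block size, doubling the multiplier
# from 1 to 32768. dd prints the elapsed time and rate for each pass.
write_sweep() {
    mult=1
    while [ "$mult" -le 32768 ]; do
        echo "bs=${mult}*512 count=$(block_count "$mult")"
        dd if=/dev/zero of="$1" bs=$((mult * 512)) count="$(block_count "$mult")"
        mult=$((mult * 2))
    done
    rm -f "$1"
}
```

Running something like "write_sweep /mnt/nfs/scratch.dat" once on each release, against the same mount, would give comparable rows. Since the filer is in use, several runs per block size are probably needed before reading much into any single number.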

Cheers,
danny
 


Re: buildworld fails in csh

2008-09-26 Thread Tobias Roth


On 09/25/08 15:14, Andreas Rudisch wrote:

On Thu, 25 Sep 2008 12:49:42 +0200
Tobias Roth [EMAIL PROTECTED] wrote:


heh, that should be RELENG_7.


Update your source tree again and clean up the build dirs.
http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/makeworld.html#Q23.4.14.6.

Could be caused by some left overs from a previous build.


That didn't work. What else could I try?

Thanks,
Tobias


--
Tobias Roth   ||   http://fsck.ch   ||   PGP: 0xCE599B4D
| God is a comedian playing to an audience too afraid to laugh.
|  - Voltaire

Re: bad NFS/UDP performance

2008-09-26 Thread Jeremy Chadwick

On Fri, Sep 26, 2008 at 12:27:08PM +0300, Danny Braniss wrote:
  On Fri, Sep 26, 2008 at 10:04:16AM +0300, Danny Braniss wrote:
   Hi,
 There seems to be some serious degradation in performance.
   Under 7.0 I get about 90 MB/s (on write), while, on the same machine
   under 7.1 it drops to 20!
   Any ideas?
  
  1) Network card driver changes,
 could be, but at least iperf/tcp is ok - can't get udp numbers, do you
 know of any tool to measure udp performance?
 BTW, I also checked on different hardware, and the badness is there.

According to INDEX, benchmarks/iperf does UDP bandwidth testing.

benchmarks/nttcp should as well.

What network card is in use?  If Intel, what driver version (should be
in dmesg).

  2) This could be relevant, but rwatson@ will need to help determine
 that.
 
  http://lists.freebsd.org/pipermail/freebsd-stable/2008-September/045109.html
 
 gut feeling is that it's somewhere else:
 
 Writing 16 MB file
        BS  Count  /---- 7.0 -----/  /---- 7.1 ----/
     1*512  32768  0.16s  98.11MB/s  0.43s 37.18MB/s
     2*512  16384  0.17s  92.04MB/s  0.46s 34.79MB/s
     4*512   8192  0.16s 101.88MB/s  0.43s 37.26MB/s
     8*512   4096  0.16s  99.86MB/s  0.44s 36.41MB/s
    16*512   2048  0.16s 100.11MB/s  0.50s 32.03MB/s
    32*512   1024  0.26s  61.71MB/s  0.46s 34.79MB/s
    64*512    512  0.22s  71.45MB/s  0.45s 35.41MB/s
   128*512    256  0.21s  77.84MB/s  0.51s 31.34MB/s
   256*512    128  0.19s  82.47MB/s  0.43s 37.22MB/s
   512*512     64  0.18s  87.77MB/s  0.49s 32.69MB/s
  1024*512     32  0.18s  89.24MB/s  0.47s 34.02MB/s
  2048*512     16  0.17s  91.81MB/s  0.30s 53.41MB/s
  4096*512      8  0.16s 100.56MB/s  0.42s 38.07MB/s
  8192*512      4  0.82s  19.56MB/s  0.80s 19.95MB/s
 16384*512      2  0.82s  19.63MB/s  0.95s 16.80MB/s
 32768*512      1  0.81s  19.69MB/s  0.96s 16.64MB/s

 Average:                 75.86            33.00
 
 the nfs filer is a NetWork Appliance, and is in use, so i get fluctuations in 
 the
 measurements, but the relation are similar, good on 7.0, bad on 7.1

Do you have any NFS-related tunings in /etc/rc.conf or /etc/sysctl.conf?


Re: buildworld fails in csh

2008-09-26 Thread Jeremy Chadwick

On Fri, Sep 26, 2008 at 11:46:28AM +0200, Tobias Roth wrote:
 On 09/25/08 15:14, Andreas Rudisch wrote:
 On Thu, 25 Sep 2008 12:49:42 +0200
 Tobias Roth [EMAIL PROTECTED] wrote:

 heh, that should be RELENG_7.

 Update your source tree again and clean up the build dirs.
 http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/makeworld.html#Q23.4.14.6.

 Could be caused by some left overs from a previous build.

 That didn't work. What else could I try?

Did you rm -fr /usr/obj/* before rebuilding world?  "That didn't work"
is too ambiguous.

The build is failing because it claims ICONV_CONST is undefined.

ICONV_CONST is found here:

$ grep -r ICONV_CONST /usr/src/contrib/tcsh /usr/src/bin/csh
/usr/src/contrib/tcsh/config.h.in:#undef ICONV_CONST
/usr/src/contrib/tcsh/configure:#define ICONV_CONST $am_cv_proto_iconv_arg1
/usr/src/contrib/tcsh/sh.func.c:ICONV_CONST char *src;
/usr/src/bin/csh/config.h:#define ICONV_CONST const

src/bin/csh/config.h declares it.

The proper include files are only included if HAVE_ICONV is declared,
which it is (in src/bin/csh/Makefile), as you can see from -DHAVE_ICONV.
You might have to end up giving someone access to your box to solve this
problem.


Re: Problems with FreeBSD 7.1 Pre-Release after Upgrade from 7.0

2008-09-26 Thread Christopher Arnold




On Thu, 25 Sep 2008, [EMAIL PROTECTED] wrote:


After cvsuping the source and recompiling the kernel from 7.0

pid 971 (kldstat), uid 0: exited on signal 11 (core dumped)
fuse4bsd: version 0.3.9-pre1, FUSE ABI 7.8
pid 977 (mdconfig), uid 0: exited on signal 11 (core dumped)
pid 978 (mdconfig), uid 0: exited on signal 11 (core dumped)
acpi_ec0: warning: EC done before starting event wait
pid 1371 (kldstat), uid 1001: exited on signal 11 (core dumped)
pid 4485 (kldstat), uid 0: exited on signal 11 (core dumped)


Just checking, have you rebuilt your userland too?
(And I see you use fuse; you might want to rebuild that too.)

/Chris

--
http://www.arnold.se/chris/

Re: buildworld fails in csh

2008-09-26 Thread Tobias Roth


On 09/26/08 11:59, Jeremy Chadwick wrote:

On Fri, Sep 26, 2008 at 11:46:28AM +0200, Tobias Roth wrote:

On 09/25/08 15:14, Andreas Rudisch wrote:

On Thu, 25 Sep 2008 12:49:42 +0200
Tobias Roth [EMAIL PROTECTED] wrote:


heh, that should be RELENG_7.

Update your source tree again and clean up the build dirs.
http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/makeworld.html#Q23.4.14.6.

Could be caused by some left overs from a previous build.

That didn't work. What else could I try?


Did you rm -fr /usr/obj/* before rebuilding world?  That didn't work
is too ambiguous.


I followed the above URL and did what was suggested there. So "That
didn't work" was referring to:


# chflags -R noschg /usr/obj/usr
# rm -rf /usr/obj/usr
# cd /usr/src
# make cleandir
# make cleandir


The build is failing because it claims ICONV_CONST is undefined.

ICONV_CONST is found here:

$ grep -r ICONV_CONST /usr/src/contrib/tcsh /usr/src/bin/csh
/usr/src/contrib/tcsh/config.h.in:#undef ICONV_CONST
/usr/src/contrib/tcsh/configure:#define ICONV_CONST $am_cv_proto_iconv_arg1
/usr/src/contrib/tcsh/sh.func.c:ICONV_CONST char *src;
/usr/src/bin/csh/config.h:#define ICONV_CONST const

src/bin/csh/config.h declares it.

The proper include files are only included if HAVE_ICONV is declared,
which it is (in src/bin/csh/Makefile), as you can see from -DHAVE_ICONV.


Nothing seems to be wrong here really.


You might have to end up giving someone access to your box to solve this
problem.


That will not be possible.

I'll wipe out /usr/src as well and re-cvsup, then build from single user 
mode for minimal intervention by shells and environments and see whether 
that might help.


Thanks,
Tobias


--
Tobias Roth   ||   http://fsck.ch   ||   PGP: 0xCE599B4D
| Percusive Maintenance:
| The art of tuning or repairing equipment by hitting it.

ssh problems when upgrading 5.5 to 6.3

2008-09-26 Thread Christopher Arnold



Hi all,

I'm trying to remotely upgrade a 5.5 system to 6.3 and have run into an
issue with userland not matching my kernel. (Yes, I know I am a bad guy for
even trying to do an upgrade remotely, but this is a dress rehearsal for
future such scenarios.)


Symptoms:
When trying to ssh to the machine with a 6.3 kernel and a 5.5 userland i 
get:

% ssh machine
Password:
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.

And then the motd, after which the session is stuck.

If I run "ssh machine csh" I don't get a prompt but am able to
execute commands. A tail of /var/log/messages reveals:

Sep 26 12:00:36 web sshd[3012]: error: openpty: Invalid argument
Sep 26 12:00:36 web sshd[3015]: error: session_pty_req: session 0 alloc failed

OK, let's do an su and reboot the machine (I used nextboot to try the
new kernel out), but su gives me "su: Sorry" straight away. Looking in
messages I see:

Sep 26 11:14:14 web su: in prompt_echo_off(): tcgetattr(): Operation not 
supported
Sep 26 11:14:14 web su: BAD SU chris to root on tty

OK, I'm totally aware that this is related to running the wrong userland
for the kernel. But I would still like to explore this problem a
bit. Thus these questions:


A) Is this issue related to going directly from 5.5 to 6.3?
That is, could I have avoided these problems by upgrading to
6.0 first and then moving on to 6.3?


B) Do you think I would have been able to do an su or even log in if I
had had /usr/ports/misc/compat5x installed?


C) Does anyone have a creative way to reboot the machine remotely?
You were all waiting for this, weren't you ;-)
(Or is there a way to get su to survive long enough to do a reboot?)




/Chris

HELP DEBUG: FreeBSD 6.3-RELEASE-p3 TIMEOUT - WRITE_DMA + other strange behaviour!

2008-09-26 Thread Anton - Valqk

Hello,
I have a VERY strangely behaving 6.3-p3 box with DMA timeouts and network cards
'dropping traffic'.
Below is a description of the hardware and the things that are happening.
The machine is a Dell OptiPlex PII 300 MHz with 512 MB RAM.
It has 3 NICs:
fxp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        options=8<VLAN_MTU>
        inet 7.8.9.10 netmask 0xf000 broadcast 7.8.9.255
        ether 00:91:21:16:14:bf
        media: Ethernet autoselect (100baseTX full-duplex)
        status: active
rl0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        options=8<VLAN_MTU>
        inet 8.9.10.11 netmask 0xffe0 broadcast 8.9.10.255
        ether 00:02:44:73:2a:fa
        media: Ethernet autoselect (100baseTX full-duplex)
        status: active
xl0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        options=9<RXCSUM,VLAN_MTU>
        inet 192.168.123.2 netmask 0xff00 broadcast 192.168.123.255
        inet 192.168.123.5 netmask 0xff00 broadcast 192.168.123.255
        inet 192.168.123.6 netmask 0xff00 broadcast 192.168.123.255
        ether 00:c0:4f:20:66:a3
        media: Ethernet autoselect (100baseTX full-duplex)
        status: active
fxp0 and rl0 are external links to the world and are plugged into PCI slots;
xl0 is the internal interface and is integrated on the motherboard.
The machine also has one Promise Ultra133 ATA PCI IDE controller plugged into a
PCI slot.
It has 5 disks in it - 4 connected to the Promise card and 1 to the
motherboard IDE.

They are as follows:
ad0 and ad6 are two identical Hitachi disks in a gmirror for the system
and a partition that I keep backups on.

ad4, ad5 and ad7 are storage disks - 500GB Seagates with 8MB cache that I
keep ISOs and similar files on - and are the problematic ones (maybe because
of high traffic compared to the other two?).

What is the problem:
Actually there are two problems:
1. I get a lot of DMA timeouts, mostly on ad5 and ad7, where I keep
files over 4-5MB and read/write very often at 3-8MB/s from the
disk. I don't use ad4, so I cannot tell whether it would time out too, but I
suppose it would (it currently has Linux partitions on it and is not
mounted). I get these errors:
dmesg.today:ad7: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=5554848
dmesg.today:ad7: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=5914112
dmesg.today:ad7: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=14924096
dmesg.today:ad7: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=374303456
dmesg.today:ad7: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=374303456
dmesg.today:g_vfs_done():ad7[WRITE(offset=191643369472, length=131072)]error = 5
dmesg.today:ad5: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=50757760
dmesg.today:ad5: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=50760192
dmesg.today:ad5: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=12032
dmesg.today:ad5: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=50769792

The strange thing is that I've only recently started seeing the g_vfs_done
errors, while the timeout problem has been there from the very start of this
hardware setup.
The machine used to work perfectly with two Hitachi disks connected to ad0 and
ad1 (integrated IDE) and only one NIC, xl0.
The problems started when I plugged in the Promise and the other NIC cards
and started using it as a router, fileserver and backup server (each in
a separate jail, except the pf firewall).
2. The other strange issue is that when (I guess) the timeouts start,
*sometimes*, not every time, I lose the connection to xl0 or fxp0
(sometimes rl0 works and accepts connections from the outside,
sometimes not). When I go to the machine and plug in a monitor, there
are no messages from the kernel and nothing in /var/log/messages or debug.
The strange thing is that when I ping the host from the local net it times
out, yet ifconfig shows the interface connected at 100mbit full duplex and
everything seems OK. I've tried ifconfig xl0 down/up but it doesn't help;
I tried unplugging the cable and the link came back up, but no packets passed:
timeout again!
I rebooted and the NIC came up. These 'drops' have become more and more common
recently, and last night I wasn't able to log in from the LAN for about an
hour, after which the machine came back by itself! It wasn't accessible at all
from the outside either. The strange thing is that it replied to ping, but I
couldn't even open a connection to the ssh port and NAT wasn't working.
Afterwards I remembered that at that time I have a cron job that runs for
about an hour, fetching an online radio broadcast into a file. Weird! The
machine also has rtorrent, apache22 and samba (in a jail) running.

some output from it can be found here:
http://valqk.ath.cx/tmp/dmesg
http://valqk.ath.cx/tmp/vmstat
http://valqk.ath.cx/tmp/smartctl


please give any ideas/hints/solutions!

thanks a lot to everyone!
cheers,
valqk.

Rare problems in upgrade process (corrupted FS?)

2008-09-26 Thread Jordi Espasa Clofent


Hi all,

I'm trying to update a FreeBSD server box from 6.3p11 to 7.0 and I've
run into some strange problems.


1) I do the sync process with csup(1); next I go into
/usr/src/sys/amd64/conf to edit the GENERIC file (I use a customized
kernel) and this file doesn't exist. Hmm. I decided to repeat the
process against another cvsup mirror but I get the same result: the GENERIC
file isn't there.


2) I go to FreeBSD CVSWeb, locate the GENERIC file under the 7_0 tag, and
copy and paste it. Yes, I know: a very nasty process. The big problem
appears when I try to run 'make cleandir' and other targets. I get the
following output:


# pwd
/usr/src
# make cleandir
make: don't know how to make cleandir. Stop
# make buildworld
make: don't know how to make buildworld. Stop
# ls -l /usr/bin/make
-r-xr-xr-x  1 root  wheel  351024 Aug 18 13:19 /usr/bin/make
# file /usr/bin/make
/usr/bin/make: ELF 64-bit LSB executable, AMD x86-64, version 1 
(FreeBSD), for FreeBSD 6.3, statically linked, stripped


¿?¿?¿?¿

* I rebooted the machine (because I suspected a very weird FS problem),
booted into single user mode and ran 'fsck -fy'. Indeed, fsck(8)
found and repaired several errors. One error in particular caught my
attention: SUPERBLOCK.


* After the supposed FS repair I'm back at point 1.

Any clues?

--
Thanks,
Jordi Espasa Clofent

Re: Rare problems in upgrade process (corrupted FS?)

2008-09-26 Thread Jeremy Chadwick

On Fri, Sep 26, 2008 at 12:22:55PM +0200, Jordi Espasa Clofent wrote:
 Hi all,

 I'm traying to update a FreeBSD server box from 6.3p11 to 7.0 and I've  
 found a rare problems.

 1) I do the sync process with csup(1); next I go into  
 /usr/src/sys/amd64/conf to edit the GENERIC file (I use a custimized  
 kernels) and this file doesn't exists. Mmmm I decide to repeat the  
 process againt other cvsup mirror but I get the same results: GENERIC  
 file isn't there.

 2) I go to FreeBSD CVSWeb , locate the GENERIC file under the 7_0 tag,  
 copy and paste. Yes, I know: a very nasty process. The big problem  
 appears when I try to do 'make cleandir' and others. I get the next 
 outputs:

 # pwd
 /usr/src
 # make cleandir
 make: don't know how to make cleandir. Stop
 # make buildworld
 make: don't know how to make buildworld. Stop
 # ls -l /usr/bin/make
 -r-xr-xr-x  1 root  wheel  351024 Aug 18 13:19 /usr/bin/make
 # file /usr/bin/make
 /usr/bin/make: ELF 64-bit LSB executable, AMD x86-64, version 1  
 (FreeBSD), for FreeBSD 6.3, statically linked, stripped

Looks to me like you have no /usr/src/Makefile.

 * After the theorical FS reparation I'm again in the point 1.

None of the information you provided in your above output, however,
shows anything about the filesystem (other than /usr/bin/make).  But
this sounds honestly like some sort of corrupted supdb, or a cvsup
mirror that's broken.

I would do the following:

rm -fr /usr/src/*
rm -fr /var/db/sup/src-all
csup -h cvsupserver -L 2 -g /usr/share/examples/cvsup/stable-supfile

I can assure you /sys/amd64/conf/GENERIC exists, and is on the cvsup
mirrors.
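
After a fresh csup run, a quick sanity check along these lines would confirm that the two files at issue in this thread are actually present before attempting a build. This is only a sketch: the function name is made up, and the defaults (/usr/src, amd64) are taken from the original report.

```shell
# Check that a source tree has the top-level Makefile and the GENERIC
# kernel config. $1 = source root (default /usr/src),
# $2 = architecture (default amd64). Returns non-zero if anything is missing.
check_src_tree() {
    src=${1:-/usr/src}
    arch=${2:-amd64}
    rc=0
    for f in "$src/Makefile" "$src/sys/$arch/conf/GENERIC"; do
        if [ -f "$f" ]; then
            echo "ok: $f"
        else
            echo "MISSING: $f"
            rc=1
        fi
    done
    return "$rc"
}
```

If "check_src_tree" reports either file as missing right after a clean re-sync, the mirror or the local sup database is the more likely culprit than make(1) itself.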

 * I reboot the machine (because of I suspect a very weird FS problem),  
 boot in single user mode and do a 'fsck -fy'. Effectively, the fsck(8)  
 found and repair several errors. Epecially, one error claims my  
 attention: SUPERBLOCK.

Superblock problems wouldn't explain this; there are hundreds of
superblocks available (you wouldn't be able to use your machine if they
were all horked).


Re: buildworld fails in csh

2008-09-26 Thread Jeremy Chadwick

On Fri, Sep 26, 2008 at 12:14:49PM +0200, Tobias Roth wrote:
 On 09/26/08 11:59, Jeremy Chadwick wrote:
 On Fri, Sep 26, 2008 at 11:46:28AM +0200, Tobias Roth wrote:
 On 09/25/08 15:14, Andreas Rudisch wrote:
 On Thu, 25 Sep 2008 12:49:42 +0200
 Tobias Roth [EMAIL PROTECTED] wrote:

 heh, that should be RELENG_7.
 Update your source tree again and clean up the build dirs.
 http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/makeworld.html#Q23.4.14.6.

 Could be caused by some left overs from a previous build.
 That didn't work. What else could I try?

 Did you rm -fr /usr/obj/* before rebuilding world?  That didn't work
 is too ambiguous.

 I followed the above URL and did what was suggested there. So That  
 didn't work was refering to

 # chflags -R noschg /usr/obj/usr
 # rm -rf /usr/obj/usr
 # cd /usr/src
 # make cleandir
 # make cleandir

 The build is failing because it claims ICONV_CONST is undefined.

 ICONV_CONST is found here:

 $ grep -r ICONV_CONST /usr/src/contrib/tcsh /usr/src/bin/csh
 /usr/src/contrib/tcsh/config.h.in:#undef ICONV_CONST
 /usr/src/contrib/tcsh/configure:#define ICONV_CONST $am_cv_proto_iconv_arg1
 /usr/src/contrib/tcsh/sh.func.c:ICONV_CONST char *src;
 /usr/src/bin/csh/config.h:#define ICONV_CONST const

 src/bin/csh/config.h declares it.

 The proper include files are only included if HAVE_ICONV is declared,
 which it is (in src/bin/csh/Makefile), as you can see from -DHAVE_ICONV.

 Nothing seems to be wrong here really.

Being as I just rebuilt world only 2 days ago and I did not run into
this problem, I'm concluding the issue must be with your system.  :-)
It's possible you've done some bizarre tuning in /etc/make.conf or
/etc/src.conf which is somehow breaking the build.
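
One way to check for such tuning is to list only the active (non-comment, non-blank) lines of both files, so that anything unusual stands out. A minimal sketch follows; the function name is hypothetical, and it naively treats every '#' as the start of a comment.

```shell
# Print the active settings in the given make configuration files,
# each prefixed with the file it came from. Missing files are skipped.
active_settings() {
    for f in "$@"; do
        [ -f "$f" ] || continue
        awk -v f="$f" '{
            sub(/#.*/, "")          # drop comments
            sub(/[ \t]+$/, "")      # drop trailing whitespace
            if ($0 != "") print f ": " $0
        }' "$f"
    done
}
```

Typical use would be "active_settings /etc/make.conf /etc/src.conf"; an empty result means the build is running with stock settings.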

 You might have to end up giving someone access to your box to solve this
 problem.

 That will not be possible.

 I'll wipe out /usr/src as well and re-cvsup, then build from single user  
 mode for minimal intervention by shells and environments and see whether  
 that might help.

I don't see how booting single-user is going to help with any of this.

And do not forget to remove /var/db/sup/src-all if you remove all of
/usr/src.  People often forget this fact.


Re: HELP DEBUG: FreeBSD 6.3-RELEASE-p3 TIMEOUT - WRITE_DMA + other strange behaviour!

2008-09-26 Thread Jeremy Chadwick

On Fri, Sep 26, 2008 at 01:12:14PM +0300, Anton - Valqk wrote:
 Hello,
 I have a VERY strange behaving 6-3p3 with DMA tmieouts and network cards
 'dropping traffic'.

The disk errors you see are well-known, but the reasons for them
happening differ per person.  Some people replace cables and the problem
goes away.  Others change controller cards.  Others found no solution
and went to Linux.

http://wiki.freebsd.org/JeremyChadwick/ATA_issues_and_troubleshooting

Here's some facts:

1) The LBAs reported to have problems are scattered, which indicates to
me there are probably not bad blocks on your disks,

2) You have two separate disks showing the above behaviour, decreasing
the probability of it being bad blocks/sectors,

3) Your dmesg.today doesn't include timestamps, so I have to assume the
problems all happen at once or within short moments of one another,
rather than at random moments throughout a 24 hour period,
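
The observations above (which disks are affected, and how widely the LBAs are spread) can be pulled out of a saved dmesg mechanically. The following is a sketch only: the function name is made up, and it assumes lines in the "adN: TIMEOUT - ... LBA=n" form shown in the original report, with or without a "dmesg.today:" prefix from grep.

```shell
# Read dmesg text on stdin and print, per disk, the number of TIMEOUT
# messages and the lowest and highest LBA they mention.
timeout_summary() {
    awk '
        /TIMEOUT - / {
            dev = ""; lba = 0
            for (i = 2; i <= NF; i++) {
                if ($i == "TIMEOUT") { dev = $(i - 1) }
                if ($i ~ /^LBA=/)    { lba = substr($i, 5) + 0 }
            }
            sub(/:$/, "", dev)      # "dmesg.today:ad7:" -> "dmesg.today:ad7"
            sub(/^.*:/, "", dev)    # "dmesg.today:ad7"  -> "ad7"
            if (dev == "") next
            count[dev]++
            if (!(dev in lo) || lba < lo[dev]) lo[dev] = lba
            if (!(dev in hi) || lba > hi[dev]) hi[dev] = lba
        }
        END {
            for (dev in count)
                printf "%s: %d timeouts, LBA %d..%d\n", dev, count[dev], lo[dev], hi[dev]
        }
    ' | sort
}
```

Fed with /var/log/dmesg.today (for example "timeout_summary < /var/log/dmesg.today"), a wide LBA range on more than one disk points away from localized bad sectors, which is the reasoning in points 1 and 2.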

 strange thing is that I'm seeing the g_vfs_done just recently and this
 problem is from the very start of this hardware setup of the machine.

I believe the g_vfs_done issues can either be attributed to the disk
errors you're seeing, or oddities with gmirror/GEOM.  I've seen people
report this before, and GEOM often spits back an error on an
index/offset which seems way too large for it to be realistic.

 The machine used to work with two hitachi disks connected to the ad0 and
 ad1 (integrated ide) and only one - xl0 - nic perfectly.
 The problems started when I plugged in the PROMISE and other nic cards
 and started using it as router, fileserver and backup server (each in
 separate jail, except the pf firewall).
 ...

 2. The other strange issue is that when (I guess) it starts timeouting
 *sometimes* not everytime I'm loosing connection to xl0 or fxp0
 (sometimes the rl0 works and accepts connections from the outside,
 sometimes - not). When I go to the machine and plug a monitor - there
 are no messages from kernel, no logs in /var/log/messages or debug -
 noting. Stange thing is that I ping host from the local net and it time
 outs, ifconfig shows that interface is connected at fd 100mbit and
 everyting seems ok. I've tried ifconfig xl0 down up but doesn't help,
 tried plugging out the cable and it got connected but not packets passed
 - timeout again!

I've looked at your dmesg and vmstat output, and I have a feeling the
problem is an obvious one.

Your system has no APIC (this is not a typo), so your system *must*
share IRQs.  You have ***four*** devices on IRQ 11: a USB controller,
your fxp0 card, your rl0 card, and your xl0 card.
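
For anyone wanting to check their own box for the same condition, a sketch like the one below groups devices by IRQ from the boot messages. It assumes the usual "devN: <description> ... irq N at device X.Y on pciZ" probe lines; the function name is hypothetical.

```shell
# Read dmesg text on stdin and list every IRQ claimed by more than one device.
irq_sharing() {
    awk '
        / irq [0-9]+ / {
            dev = $1
            sub(/:$/, "", dev)
            irq = ""
            for (i = 2; i < NF; i++) {
                if ($i == "irq") { irq = $(i + 1); break }
            }
            if (irq != "" && !seen[irq, dev]++) {
                count[irq]++
                list[irq] = list[irq] " " dev
            }
        }
        END {
            for (irq in count) {
                if (count[irq] > 1)
                    printf "irq %s: %d devices:%s\n", irq, count[irq], list[irq]
            }
        }
    '
}
```

On FreeBSD this would typically be run as "irq_sharing < /var/run/dmesg.boot". A line listing several busy devices (NICs plus a disk controller) on one IRQ is the situation described above.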

 http://valqk.ath.cx/tmp/dmesg
 http://valqk.ath.cx/tmp/vmstat
 http://valqk.ath.cx/tmp/smartctl
 
 please give any ideas/hints/solutions!

I would recommend you start yanking PCI cards out of the system and
see which one solves the problem.  You did state that the problems began
once you added the Promise card (which makes your system have FIVE PCI
cards in it?!?  Sheesh).

I can't imagine you'll have a stable system with that many cards in the
box all sharing a single IRQ -- especially on a board that old.

I'd recommend decreasing the number of cards you have in that system, or
getting a motherboard that has an APIC and preferably some reliable on-board
networking (read: Intel chips).  Toss the rl0 card if possible, and
consider replacing the Promise controller with a different one.

-- 
| Jeremy Chadwickjdc at parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]

Re: Rare problems in upgrade process (corrupted FS?)

2008-09-26 Thread Peter Jeremy

On 2008-Sep-26 12:22:55 +0200, Jordi Espasa Clofent [EMAIL PROTECTED] wrote:
1) I do the sync process with csup(1); next I go into 
/usr/src/sys/amd64/conf to edit the GENERIC file (I use a customized 
kernel) and this file doesn't exist.

You might like to check your CVSup site against
http://www.mavetju.org/unix/freebsd-mirrors/
to confirm it is updating correctly.  GENERIC should exist.

* I rebooted the machine (because I suspected a very weird FS problem), 
booted in single user mode and did a 'fsck -fy'. Indeed, fsck(8) 
found and repaired several errors. One error in particular caught my 
attention: SUPERBLOCK.

It might have been useful if you had kept a record of the exact
messages.  If you repeat the fsck, does it now report any problems?

If you are using an up-to-date CVSup mirror, my next suggestion
would be hardware problems.

-- 
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement
an MTA that is either RFC2821-compliant or matches their claimed behaviour.



Re: Rare problems in upgrade process (corrupted FS?)

2008-09-26 Thread Jordi Espasa Clofent


I would do the following:

rm -fr /usr/src/*
rm -fr /var/db/sup/src-all
csup -h cvsupserver -L 2 -g /usr/share/examples/stable-supfile


I've done it. But the results are, at least, curious...

# csup -h cvsup.de.FreeBSD.org -L 2 -g /usr/share/examples/cvsup/stable-supfile

Parsing supfile /usr/share/examples/cvsup/stable-supfile
Connecting to cvsup.de.FreeBSD.org
Connected to 212.19.57.134
Server software version: SNAP_16_1h
Negotiating file attribute support
Exchanging collection information
Establishing multiplexed-mode data connection
Running
Updating collection src-all/cvs
Shutting down connection to server
Finished successfully

# cd /usr/src ; ls -la
total 0

Nothing exists in /usr/src now.

I've tried again using another mirror and cvsup(1) instead of csup(1). 
Same results: nothing in /usr/src.


It's disconcerting.


I can assure you /sys/amd64/conf/GENERIC exists, and is on the cvsup
mirrors.


Yes, of course. I've checked it from cvsweb.


Superblock problems wouldn't explain this; there are hundreds of
superblocks available (you wouldn't be able to use your machine if they
were all horked).


I supposed so; your words confirm it.

--
Thanks,
Jordi Espasa Clofent

Re: Rare problems in upgrade process (corrupted FS?)

2008-09-26 Thread Peter Jeremy

On 2008-Sep-26 13:23:12 +0200, Jordi Espasa Clofent [EMAIL PROTECTED] wrote:
Connecting to cvsup.de.FreeBSD.org

Edwin's script reports this as up-to-date.

# cd /usr/src ; ls -la
total 0

But something is obviously wrong.  Can you post your supfile please.

-- 
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement
an MTA that is either RFC2821-compliant or matches their claimed behaviour.



Re: Rare problems in upgrade process (corrupted FS?)

2008-09-26 Thread Jeremy Chadwick

On Fri, Sep 26, 2008 at 01:23:12PM +0200, Jordi Espasa Clofent wrote:
 I would do the following:

 rm -fr /usr/src/*
 rm -fr /var/db/sup/src-all
 csup -h cvsupserver -L 2 -g /usr/share/examples/stable-supfile

 I've done it. But the results are, at least, curious...

 # csup -h cvsup.de.FreeBSD.org -L 2 -g /usr/share/examples/cvsup/stable-supfile
 Parsing supfile /usr/share/examples/cvsup/stable-supfile
 Connecting to cvsup.de.FreeBSD.org
 Connected to 212.19.57.134
 Server software version: SNAP_16_1h
 Negotiating file attribute support
 Exchanging collection information
 Establishing multiplexed-mode data connection
 Running
 Updating collection src-all/cvs
 Shutting down connection to server
 Finished successfully

 # cd /usr/src ; ls -la
 total 0

What's df -k have to say about this?  This is truly bizarre.

Can you truss the csup process?  Something like this should work:

truss -o truss.out -s 256 csup {...flags from above...}

Then put truss.out up somewhere where we can get to it?

-- 
| Jeremy Chadwickjdc at parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |


Re: bad NFS/UDP performance

2008-09-26 Thread Gavin Atkinson

On Fri, 2008-09-26 at 10:04 +0300, Danny Braniss wrote:
 Hi,
   There seems to be some serious degradation in performance.
 Under 7.0 I get about 90 MB/s (on write), while, on the same machine
 under 7.1 it drops to 20!
 Any ideas?

The scheduler has been changed to ULE, and NFS has historically been
very sensitive to changes like that.  You could try switching back to
the 4BSD scheduler and seeing if that makes a difference.  If it does,
toggling PREEMPTION would also be interesting to see the results of.

Gavin

Re: Rare problems in upgrade process (corrupted FS?) [SOLVED]

2008-09-26 Thread Jordi Espasa Clofent


Finally I've modified the stable-supfile TAG from

*default release=cvs tag=RELENG_7_0

to

*default release=cvs tag=RELENG_7

and... voilà!... it works!

I then interrupted the csup process (^C) and changed the tag back to

*default release=cvs tag=RELENG_7_0

and it works perfectly.

Maybe it's something as stupid as the first tag being mistyped... but I
think not. I checked it several times.

It's solved, but I don't understand why yet.
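
One way to rule out a mistyped tag is to print what the supfile actually sets instead of reading it by eye. A minimal sketch, assuming the usual `*default key=value` supfile layout; the function name and the sample text are made up for illustration, and this is not part of csup:

```python
def supfile_defaults(text):
    """Collect the settings found on '*default' lines of a supfile."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line.startswith("*default"):
            continue
        for field in line.split()[1:]:
            key, sep, value = field.partition("=")
            # Bare keywords such as "compress" carry no value.
            settings[key] = value if sep else True
    return settings


# Hypothetical excerpt; only the tag line is taken from this thread.
sample = """\
# illustrative supfile fragment
*default release=cvs tag=RELENG_7_0
src-all
"""

print(supfile_defaults(sample)["tag"])
```

Running it against a copy of the real supfile would show whether the tag csup sees is the one intended, including any stray characters.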

--
Thanks,
Jordi Espasa Clofent

Re: Rare problems in upgrade process (corrupted FS?) [SOLVED]

2008-09-26 Thread Jeremy Chadwick

On Fri, Sep 26, 2008 at 02:12:08PM +0200, Jordi Espasa Clofent wrote:
 Finally I've modified the stable-supfile TAG from

 *default release=cvs tag=RELENG_7_0

 to

 *default release=cvs tag=RELENG_7

 and... voilà!... it works!

 I then interrupted the csup process (^C) and changed the tag back to

 *default release=cvs tag=RELENG_7_0

 and it works perfectly.

 Maybe it's something as stupid as the first tag being mistyped... but I
 think not. I checked it several times.
 It's solved, but I don't understand why yet.

The part that doesn't make sense to me is why csup using
/usr/share/examples/cvsup/stable-supfile did not work for you.  That file
contains tag=RELENG_7.

Are you modifying this file?  If so, please don't.  Make a copy of it
somewhere and refer to that location.  /root might be a good place.

The next time you install world, /usr/share/examples will be
overwritten, and you'll lose your changes.

-- 
| Jeremy Chadwickjdc at parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |


Re: HELP DEBUG: FreeBSD 6.3-RELEASE-p3 TIMEOUT - WRITE_DMA + other strange behaviour!

2008-09-26 Thread Anton - Valqk

Thanks Jeremy and Peter,
you are right that the machine has *lots* of hardware in it.
I suspected the power supply and measured the 5 and 12 volt rails -
they seemed OK (5.2 and 11.8) with all the hardware in it.
The shared IRQ is what I suspected too, and that's why I posted
vmstat -i, to hear your opinion.
[I forgot to mention that I've read the wiki and my next step is to patch
the kernel with
http://freenas.svn.sourceforge.net/viewvc/freenas/branches/0.69/build/kernel-patches/ata/files/patch-ata.diff?view=markup
(anything bad to say about this patch, or can I just run it - nothing bad
can happen?)]

Yes, I have 3 NICs (2 on PCI) + the PCI IDE Promise card. In the next few
days I'll get a smart switch with VLANs and leave just the integrated xl0
and fxp0, with both external IPs on it,
but first I'll patch the kernel if Jeremy says it won't hurt (as far as
I can see, it just moves a timeout from a hardcoded value to a sysctl?)...
I have another Promise card that is a RAID controller, but when I
started looking for one I asked here, and the answers recommended the
PROMISE ULTRA ATA133 as a good card for FreeBSD (
http://docs.freebsd.org/cgi/getmsg.cgi?fetch=290848+0+archive/2008/freebsd-stable/20080316.freebsd-stable
)
(hmm, I just saw that Jeremy said of the Promise card: 'Their Ultra133
TX2 card works fine on 33MHz PCI bus machines; don't worry about the
card being 66MHz, it will downthrottle correctly.') so maybe the problem
will be solved if I leave just two NICs and no rl0...
Actually I'm using 6.3 here because I didn't want this to happen and I
was aware of such problems happening on 7-current.

So tests must be done... please just tell me whether the patch will be
helpful, or whether I should try to:

1. remove rl0 and run only one ISP for the test.
2. replace the Ultra 133 card with another one.
3. replace the ATA100 cables (the ones with 80 wires) with
older ones with only 40 wires?
4. ? anything else?


Anton - Valqk wrote:
 Hello,
 I have a VERY strangely behaving 6.3-RELEASE-p3 with DMA timeouts and network
 cards 'dropping traffic'.
 Following is a description of the hardware and the things that are happening.
 The machine is a DELL Optiplex PII 300MHz with 512MB RAM.
 It has 3 NICs:
 fxp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
 options=8<VLAN_MTU>
 inet 7.8.9.10 netmask 0xf000 broadcast 7.8.9.255
 ether 00:91:21:16:14:bf
 media: Ethernet autoselect (100baseTX full-duplex)
 status: active
 rl0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
 options=8<VLAN_MTU>
 inet 8.9.10.11 netmask 0xffe0 broadcast 8.9.10.255
 ether 00:02:44:73:2a:fa
 media: Ethernet autoselect (100baseTX full-duplex)
 status: active
 xl0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
 options=9<RXCSUM,VLAN_MTU>
 inet 192.168.123.2 netmask 0xff00 broadcast 192.168.123.255
 inet 192.168.123.5 netmask 0xff00 broadcast 192.168.123.255
 inet 192.168.123.6 netmask 0xff00 broadcast 192.168.123.255
 ether 00:c0:4f:20:66:a3
 media: Ethernet autoselect (100baseTX full-duplex)
 status: active
 fxp0 and rl0 are external links to the world and are plugged into pci slots
 xl0 is the internal interface and is integrated on motherboard.
 It also has 1 PROMISE ULTRA133 ATA pci IDE controller plugged into the
 pci slot.
 It has 5 disks in it - 4 connected to the PROMISE card and 1 to the
 motherboard ide.

 they are as follows:
 ad0 and ad6 are two identical hitachi disks in gmirror for the system
 and a partition that I keep backups on.

 ad4, ad5 and ad7 are storage disks - seagates 500GB 8mb cache that I
 keep isos etc files on and are the problematic (maybe because of high
 traffic operations compared to the other two?).

 What is the problem:
 Actually there are two problems:
 1. I get a lot of DMA timeouts, mostly on ad5 and ad7, where I keep
 files over 4-5MB and read/write very often at 3-6-8MB/s from the
 disk. I don't use ad4 so I cannot tell if there are going to be timeouts
 there, but I suppose there will be (it currently has Linux partitions on
 it and is not mounted). I get these errors:
 dmesg.today:ad7: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=5554848
 dmesg.today:ad7: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=5914112
 dmesg.today:ad7: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=14924096
 dmesg.today:ad7: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=374303456
 dmesg.today:ad7: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=374303456
 dmesg.today:g_vfs_done():ad7[WRITE(offset=191643369472, length=131072)]error = 5
 dmesg.today:ad5: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=50757760
 dmesg.today:ad5: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=50760192
 dmesg.today:ad5: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=12032
 dmesg.today:ad5: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=50769792
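
The g_vfs_done() offset and the LBA in the FAILURE line can be cross-checked with simple arithmetic. A minimal sketch, assuming 512-byte sectors and that the GEOM offset is counted from the start of ad7 (both are assumptions; the numbers are copied from the log lines above):

```python
SECTOR_BYTES = 512  # assumption: classic 512-byte ATA sectors

failed_lba = 374303456        # from the WRITE_DMA48 FAILURE line
geom_offset = 191643369472    # from the g_vfs_done() line
geom_length = 131072          # from the g_vfs_done() line

# Byte offset of the failed sector, and the sector the GEOM offset maps to.
print(failed_lba * SECTOR_BYTES)       # 191643369472
print(geom_offset // SECTOR_BYTES)     # 374303456
print(geom_length // SECTOR_BYTES)     # 256 sectors in the failed write
print(round(geom_offset / 2**30, 1))   # 178.5 (GiB into the disk)
```

If those assumptions hold, the two messages describe the same failed write, and the offset lies well inside a 500GB disk.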

 strange thing is

Re: bad NFS/UDP performance

2008-09-26 Thread Danny Braniss

 On Fri, Sep 26, 2008 at 12:27:08PM +0300, Danny Braniss wrote:
   On Fri, Sep 26, 2008 at 10:04:16AM +0300, Danny Braniss wrote:
Hi,
There seems to be some serious degradation in performance.
Under 7.0 I get about 90 MB/s (on write), while, on the same machine
under 7.1 it drops to 20!
Any ideas?
   
   1) Network card driver changes,
  could be, but at least iperf/tcp is ok - can't get udp numbers, do you
  know of any tool to measure udp performance?
  BTW, I also checked on different hardware, and the badness is there.
 
 According to INDEX, benchmarks/iperf does UDP bandwidth testing.

I know, but I get about 1mgb, which seems somewhat low :-(

 
 benchmarks/nttcp should as well.
 
 What network card is in use?  If Intel, what driver version (should be
 in dmesg).

bge: Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x9003 
and
bce: Broadcom NetXtreme II BCM5708 1000Base-T (B2)
and intels, but haven't tested there yet.

 
   2) This could be relevant, but rwatson@ will need to help determine
  that.
  
   http://lists.freebsd.org/pipermail/freebsd-stable/2008-September/045109.html
  
  gut feeling is that it's somewhere else:
  
  Writing 16 MB file
         BS  Count  /----- 7.0 ----/  /---- 7.1 ----/
      1*512  32768  0.16s  98.11MB/s  0.43s 37.18MB/s
      2*512  16384  0.17s  92.04MB/s  0.46s 34.79MB/s
      4*512   8192  0.16s 101.88MB/s  0.43s 37.26MB/s
      8*512   4096  0.16s  99.86MB/s  0.44s 36.41MB/s
     16*512   2048  0.16s 100.11MB/s  0.50s 32.03MB/s
     32*512   1024  0.26s  61.71MB/s  0.46s 34.79MB/s
     64*512    512  0.22s  71.45MB/s  0.45s 35.41MB/s
    128*512    256  0.21s  77.84MB/s  0.51s 31.34MB/s
    256*512    128  0.19s  82.47MB/s  0.43s 37.22MB/s
    512*512     64  0.18s  87.77MB/s  0.49s 32.69MB/s
   1024*512     32  0.18s  89.24MB/s  0.47s 34.02MB/s
   2048*512     16  0.17s  91.81MB/s  0.30s 53.41MB/s
   4096*512      8  0.16s 100.56MB/s  0.42s 38.07MB/s
   8192*512      4  0.82s  19.56MB/s  0.80s 19.95MB/s
  16384*512      2  0.82s  19.63MB/s  0.95s 16.80MB/s
  32768*512      1  0.81s  19.69MB/s  0.96s 16.64MB/s
  
  Average:                 75.86MB/s        33.00MB/s
  
  the nfs filer is a NetWork Appliance, and is in use, so i get fluctuations in the
  measurements, but the relation are similar, good on 7.0, bad on 7.1
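
The averages in the table can be re-derived from the per-row figures. A minimal sketch; the numbers are copied from the table above, while the variable names are only illustrative:

```python
# Throughput in MB/s per block size, in table order (1*512 ... 32768*512).
rate_70 = [98.11, 92.04, 101.88, 99.86, 100.11, 61.71, 71.45, 77.84,
           82.47, 87.77, 89.24, 91.81, 100.56, 19.56, 19.63, 19.69]
rate_71 = [37.18, 34.79, 37.26, 36.41, 32.03, 34.79, 35.41, 31.34,
           37.22, 32.69, 34.02, 53.41, 38.07, 19.95, 16.80, 16.64]

avg_70 = sum(rate_70) / len(rate_70)
avg_71 = sum(rate_71) / len(rate_71)

print(f"7.0 average: {avg_70:.2f} MB/s")
print(f"7.1 average: {avg_71:.2f} MB/s")

# Row indexes where 7.1 is at least as fast as 7.0 (0 is the 1*512 row).
not_slower = [i for i, (a, b) in enumerate(zip(rate_70, rate_71)) if b >= a]
print(not_slower)
```

On these figures 7.1 averages less than half of 7.0, and the only row where 7.1 keeps up is 8192*512, where both releases are already slow.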
 
 Do you have any NFS-related tunings in /etc/rc.conf or /etc/sysctl.conf?
 
no, but diffing the sysctl show:

-vfs.nfs.realign_test: 22141777
+vfs.nfs.realign_test: 498351

-vfs.nfsrv.realign_test: 5005908
+vfs.nfsrv.realign_test: 0

+vfs.nfsrv.commit_miss: 0
+vfs.nfsrv.commit_blks: 0

changing them did nothing - or at least with respect to nfs throughput :-)

danny



Re: bad NFS/UDP performance

2008-09-26 Thread Danny Braniss

 On Fri, 2008-09-26 at 10:04 +0300, Danny Braniss wrote:
  Hi,
  There seems to be some serious degradation in performance.
  Under 7.0 I get about 90 MB/s (on write), while, on the same machine
  under 7.1 it drops to 20!
  Any ideas?
 
 The scheduler has been changed to ULE, and NFS has historically been
 very sensitive to changes like that.  You could try switching back to
 the 4BSD scheduler and seeing if that makes a difference.  If it does,
 toggling PREEMPTION would also be interesting to see the results of.
 
 Gavin

I'm testing 7.0-stable vs 7.1-prerelease, and both have ULE.
BTW, the nfs client hosts I'm testing are idle.

danny



Re: bad NFS/UDP performance

2008-09-26 Thread John Baldwin

On Friday 26 September 2008 03:04:16 am Danny Braniss wrote:
 Hi,
   There seems to be some serious degradation in performance.
 Under 7.0 I get about 90 MB/s (on write), while, on the same machine
 under 7.1 it drops to 20!
 Any ideas?
 
 thanks,
   danny

Perhaps use nfsstat to see if 7.1 is performing more on-the-wire requests?  
Also, if you can, do a binary search to narrow down when the regression 
occurred in RELENG_7.

-- 
John Baldwin

Re: bad NFS/UDP performance

2008-09-26 Thread Jeremy Chadwick

On Fri, Sep 26, 2008 at 04:35:17PM +0300, Danny Braniss wrote:
  On Fri, Sep 26, 2008 at 12:27:08PM +0300, Danny Braniss wrote:
On Fri, Sep 26, 2008 at 10:04:16AM +0300, Danny Braniss wrote:
 Hi,
   There seems to be some serious degradation in performance.
 Under 7.0 I get about 90 MB/s (on write), while, on the same machine
 under 7.1 it drops to 20!
 Any ideas?

1) Network card driver changes,
   could be, but at least iperf/tcp is ok - can't get udp numbers, do you
   know of any tool to measure udp performance?
   BTW, I also checked on different hardware, and the badness is there.
  
  According to INDEX, benchmarks/iperf does UDP bandwidth testing.
 
 I know, but I get about 1mgb, which seems somewhat low :-(
 
  
  benchmarks/nttcp should as well.
  
  What network card is in use?  If Intel, what driver version (should be
  in dmesg).
 
 bge: Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x9003 
 and
 bce: Broadcom NetXtreme II BCM5708 1000Base-T (B2)
 and intels, but haven't tested there yet.

Both bge(4) and bce(4) claim to support checksum offloading.  You might
try disabling it (ifconfig ... -txcsum -rxcsum) to see if things
improve.  If not, more troubleshooting is needed.  You might also try
turning off TSO if it's supported (check your ifconfig output for TSO in
the options= section.  Then use ifconfig ... -tso)

  Do you have any NFS-related tunings in /etc/rc.conf or /etc/sysctl.conf?
  
 no, but diffing the sysctl show:
 
   -vfs.nfs.realign_test: 22141777
   +vfs.nfs.realign_test: 498351
 
   -vfs.nfsrv.realign_test: 5005908
   +vfs.nfsrv.realign_test: 0
 
   +vfs.nfsrv.commit_miss: 0
   +vfs.nfsrv.commit_blks: 0
 
 changing them did nothing - or at least with respect to nfs throughput :-)

I'm not sure what any of these do, as NFS is a bit out of my league.
:-)  I'll be following this thread though!

-- 
| Jeremy Chadwickjdc at parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |


Re: bad NFS/UDP performance

2008-09-26 Thread Danny Braniss

 On Fri, Sep 26, 2008 at 12:27:08PM +0300, Danny Braniss wrote:
   On Fri, Sep 26, 2008 at 10:04:16AM +0300, Danny Braniss wrote:
Hi,
There seems to be some serious degradation in performance.
Under 7.0 I get about 90 MB/s (on write), while, on the same machine
under 7.1 it drops to 20!
Any ideas?
   
   1) Network card driver changes,
  could be, but at least iperf/tcp is ok - can't get udp numbers, do you
  know of any tool to measure udp performance?
  BTW, I also checked on different hardware, and the badness is there.
 
 According to INDEX, benchmarks/iperf does UDP bandwidth testing.
 
 benchmarks/nttcp should as well.
 
 What network card is in use?  If Intel, what driver version (should be
 in dmesg).
 
   2) This could be relevant, but rwatson@ will need to help determine
  that.
  
   http://lists.freebsd.org/pipermail/freebsd-stable/2008-September/045109.html
  
  gut feeling is that it's somewhere else:
  
  Writing 16 MB file
         BS  Count  /----- 7.0 ----/  /---- 7.1 ----/
      1*512  32768  0.16s  98.11MB/s  0.43s 37.18MB/s
      2*512  16384  0.17s  92.04MB/s  0.46s 34.79MB/s
      4*512   8192  0.16s 101.88MB/s  0.43s 37.26MB/s
      8*512   4096  0.16s  99.86MB/s  0.44s 36.41MB/s
     16*512   2048  0.16s 100.11MB/s  0.50s 32.03MB/s
     32*512   1024  0.26s  61.71MB/s  0.46s 34.79MB/s
     64*512    512  0.22s  71.45MB/s  0.45s 35.41MB/s
    128*512    256  0.21s  77.84MB/s  0.51s 31.34MB/s
    256*512    128  0.19s  82.47MB/s  0.43s 37.22MB/s
    512*512     64  0.18s  87.77MB/s  0.49s 32.69MB/s
   1024*512     32  0.18s  89.24MB/s  0.47s 34.02MB/s
   2048*512     16  0.17s  91.81MB/s  0.30s 53.41MB/s
   4096*512      8  0.16s 100.56MB/s  0.42s 38.07MB/s
   8192*512      4  0.82s  19.56MB/s  0.80s 19.95MB/s
  16384*512      2  0.82s  19.63MB/s  0.95s 16.80MB/s
  32768*512      1  0.81s  19.69MB/s  0.96s 16.64MB/s
  
  Average:                 75.86MB/s        33.00MB/s
  
  the nfs filer is a NetWork Appliance, and is in use, so i get fluctuations in the
  measurements, but the relation are similar, good on 7.0, bad on 7.1
 
 Do you have any NFS-related tunings in /etc/rc.conf or /etc/sysctl.conf?
 

after more testing, it seems it's related to changes made between Aug 4 and 
Aug 29,
i.e. a kernel built on Aug 4 works fine, while one built on Aug 29 is slow.
I'll now try to narrow the gap.
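
Narrowing the gap is a binary search over checkout dates. A minimal sketch, assuming the regression stays present on every date after it first appears; `first_bad_day` and `pretend_test` are hypothetical names, and the cut-over date inside `pretend_test` is invented purely to exercise the function:

```python
from datetime import date, timedelta


def first_bad_day(good, bad, is_slow):
    """Return the earliest date for which is_slow() is true.

    Assumes `good` tests fast, `bad` tests slow, and the regression
    stays present on every later date once it appears.
    """
    while (bad - good).days > 1:
        mid = good + timedelta(days=(bad - good).days // 2)
        if is_slow(mid):
            bad = mid      # regression is at or before mid
        else:
            good = mid     # regression is after mid
    return bad


# Stand-in for "check out RELENG_7 as of that day, build, run the NFS test".
def pretend_test(day):
    return day >= date(2008, 8, 20)   # invented cut-over, not a real finding


print(first_bad_day(date(2008, 8, 4), date(2008, 8, 29), pretend_test))
```

For the 25-day window between Aug 4 and Aug 29 this needs at most five kernel builds to reach a single day.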

danny



Re: rl0: watchdog timeout + 40, 000 ms ping with 7.1-BETA-i386-disc1.iso

2008-09-26 Thread Julian Stacey

Hi All
Jeremy Chadwick wrote:
 On Thu, Sep 25, 2008 at 05:36:44PM +0200, Julian Stacey wrote:
  Hi stable@,
  I just imported an old tower from a friend. Used to run Linux OK.
  Reset BIOS to defaults, turned off power saving etc, installed
  7.1-BETA-i386-disc1.iso
  I now see 
  rl0: watchdog timeout + 40,000 ms ping outgoing.
  ping incoming fails,
  it's not my net switch, I've moved to different segments etc  all else fine
  
  I'm remaking binaries,  will look around for netstat r whatever
  commands later, meanwhile here's dmesg (via a floppy)
  
  Of course it could somehow be a bad hardware config; it's a new box to me.
 
 It's a new box with hardware from the late 90s?  :-)

Yes, new to me :-)
The offer I got was "Do you want this or shall I dump it?" :-)


  Copyright (c) 1992-2008 The FreeBSD Project.
  Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
  The Regents of the University of California. All rights reserved.
  FreeBSD is a registered trademark of The FreeBSD Foundation.
  FreeBSD 7.1-BETA #0: Sun Sep  7 13:49:18 UTC 2008
  [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC
  Timecounter i8254 frequency 1193182 Hz quality 0
  CPU: Intel Pentium III (651.48-MHz 686-class CPU)
    Origin = "GenuineIntel"  Id = 0x681  Stepping = 1
    Features=0x383f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
  real memory  = 134152192 (127 MB)
  avail memory = 117157888 (111 MB)
  kbd1 at kbdmux0
  ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
  acpi0: AWARD AWRDACPI on motherboard
  acpi0: [ITHREAD]
  ACPI Error (psargs-0459): [INX_] Namespace lookup failure, AE_NOT_FOUND
  ACPI Error (psparse-0626): Method parse/execution failed [\\_SB_.PCI0._PRW] 
  (Node 0xc1bd6700), AE_NOT_FOUND
  acpi0: Power Button (fixed)
  acpi0: reservation of 0, a (3) failed
  acpi0: reservation of 10, 7ef (3) failed
  Timecounter ACPI-safe frequency 3579545 Hz quality 850
  acpi_timer0: 24-bit timer at 3.579545MHz port 0x4008-0x400b on acpi0
  pcib0: ACPI Host-PCI bridge port 
  0xcf8-0xcff,0x4000-0x407f,0x4080-0x40ff,0x5000-0x500f on acpi0
  pci0: ACPI PCI bus on pcib0
  agp0: VIA 82C691 (Apollo Pro) host to PCI bridge on hostb0
  agp0: aperture size is 256M
  pcib1: PCI-PCI bridge at device 1.0 on pci0
  pci1: PCI bus on pcib1
  vgapci0: VGA-compatible display port 0xc000-0xc0ff mem 
  0xe000-0xe7ff,0xed00-0xed00 irq 11 at device 0.0 on pci1
  isab0: PCI-ISA bridge at device 7.0 on pci0
  isa0: ISA bus on isab0
  atapci0: VIA 82C596B UDMA66 controller port 
  0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xd000-0xd00f at device 7.1 on pci0
  ata0: ATA channel 0 on atapci0
  ata0: [ITHREAD]
  ata1: ATA channel 1 on atapci0
  ata1: [ITHREAD]
  uhci0: VIA 83C572 USB controller port 0xd400-0xd41f irq 10 at device 7.2 
  on pci0
  uhci0: [GIANT-LOCKED]
  uhci0: [ITHREAD]
  usb0: VIA 83C572 USB controller on uhci0
  usb0: USB revision 1.0
  uhub0: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1 on usb0
  uhub0: 2 ports with 2 removable, self powered
  pci0: bridge, HOST-PCI at device 7.3 (no driver attached)
  rl0: RealTek 8139 10/100BaseTX port 0xd800-0xd8ff mem 
  0xee00-0xeeff irq 12 at device 10.0 on pci0
  miibus0: MII bus on rl0
  rlphy0: RealTek internal media interface PHY 0 on miibus0
  rlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
  rl0: Ethernet address: 00:08:a1:6d:65:07
  rl0: [ITHREAD]
  pci0: multimedia, audio at device 11.0 (no driver attached)
  cpu0: ACPI CPU on acpi0
  acpi_throttle0: ACPI CPU Throttling on cpu0
  acpi_button0: Power Button on acpi0
  acpi_tz0: Thermal Zone on acpi0
  fdc0: floppy drive controller port 0x3f2-0x3f5,0x3f7 irq 6 drq 2 on acpi0
  fdc0: [FILTER]
  fd0: 1440-KB 3.5 drive on fdc0 drive 0
  sio0: 16550A-compatible COM port port 0x3f8-0x3ff irq 4 flags 0x10 on 
  acpi0
  sio0: type 16550A
  sio0: [FILTER]
  sio1: 16550A-compatible COM port port 0x2f8-0x2ff irq 3 on acpi0
  sio1: type 16550A
  sio1: [FILTER]
  atkbdc0: Keyboard controller (i8042) port 0x60,0x64 irq 1 on acpi0
  atkbd0: AT Keyboard irq 1 on atkbdc0
  kbd0 at atkbd0
  atkbd0: [GIANT-LOCKED]
  atkbd0: [ITHREAD]
  ACPI Error (psargs-0459): [INX_] Namespace lookup failure, AE_NOT_FOUND
  ACPI Error (psparse-0626): Method parse/execution failed [\\_SB_.PCI0._PRW] 
  (Node 0xc1bd6700), AE_NOT_FOUND
  ACPI Error (psargs-0459): [INX_] Namespace lookup failure, AE_NOT_FOUND
  ACPI Error (psparse-0626): Method parse/execution failed [\\_SB_.PCI0._PRW] 
  (Node 0xc1bd6700), AE_NOT_FOUND
  pmtimer0 on isa0
  orm0: ISA Option ROM at iomem 0xc-0xccfff pnpid ORM on isa0
  sc0: System console at flags 0x100 on isa0
  sc0: VGA 16 virtual consoles, flags=0x300
  vga0: Generic ISA VGA at port 0x3c0-0x3df iomem 0xa-0xb on isa0
  Timecounter TSC frequency 651482522 Hz quality 800
  Timecounters tick every 1.000 msec
  ad0: 4110MB QUANTUM FIREBALL SE4.3A API.0C00 at ata0-master

Re: vm.kmem_size settings doesn't affect loader?

2008-09-26 Thread Ben Kelly


On Sep 26, 2008, at 4:43 AM, Bartosz Stec wrote:

Jeremy Chadwick wrote:


These are the tuning settings I use:

vm.kmem_size=1536M
vm.kmem_size_max=1536M
vfs.zfs.arc_min=16M
vfs.zfs.arc_max=64M

Yesterday I added 512 MB of memory to the box (1.5GB in total), and set  
vm.kmem_size and vm.kmem_size_max to 1024M. With modules of 1024MB,  
512MB, 256MB, 256MB available and 3 memory slots it is hard to have  
2GB RAM ;)
Until now it has survived world cleaning/building/installing, bonnie++  
benchmarking, fs scrubbing and general usage. Memory usage seems  
stable. If kmem exhaustion unfortunately happens again I will  
experiment with ARC settings.
IMHO you've explained a lot of ZFS tuning concerns nicely in this  
thread and they should be added to the tuning guide - especially the  
explanation of ARC and prefetch settings. Thanks again!


Did you increase KVA_PAGES in your kernel config as well?

The default of 256 only allows 1GB of kernel memory total.  Setting  
KVA_PAGES to 384 would probably be good for a kmem_size of 1GB.  This  
would leave you with 512MB of space for other things in the  
kernel.  In your kernel config:


   options KVA_PAGES=384

Sorry if you already knew this.  I know it's in the zfs tuning guide.   
I just hadn't seen it mentioned in the thread yet and wanted to make  
sure it wasn't missed.
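
The figures above follow from the size of one KVA_PAGES unit. A minimal sketch of the arithmetic, assuming an i386 kernel without PAE, where each unit covers 4MB (a PAE kernel uses a different unit, so the constant below would not apply); the function name is illustrative only:

```python
MB_PER_KVA_PAGE = 4   # assumption: i386 kernel without PAE


def kernel_va_mb(kva_pages):
    """Total kernel virtual address space, in MB, for a KVA_PAGES value."""
    return kva_pages * MB_PER_KVA_PAGE


kmem_size_mb = 1024   # the vm.kmem_size value discussed in this thread

for pages in (256, 384):
    total = kernel_va_mb(pages)
    # Columns: KVA_PAGES, total kernel VA in MB, MB left beside kmem.
    print(pages, total, total - kmem_size_mb)
```

With the default of 256 a 1GB kmem_size would leave nothing for the rest of the kernel, while 384 leaves the 512MB mentioned above.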


Hope that helps.

- Ben


Re: buildworld fails in csh

2008-09-26 Thread Tobias Roth


On 09/26/08 12:49, Jeremy Chadwick wrote:

Being as I just rebuilt world only 2 days ago and I did not run into
this problem, I'm concluding the issue must be with your system.  :-)
It's possible you've done some bizarre tuning in /etc/make.conf or
/etc/src.conf which is somehow breaking the build.


I checked make.conf already, since that is usually the cause when I have 
such problems. I didn't know about src.conf, I'll have a look at its 
manpage (so, since I don't have one, that can't be the cause of my 
problem either).


I'll wipe out /usr/src as well and re-cvsup, then build from single user  
mode for minimal intervention by shells and environments and see whether  
that might help.


I don't see how booting single-user is going to help with any of this.


I was finally able to do a buildworld by doing it from single user mode.

My guess is that the root of the problem was with either the shell I was 
using or some environment variables. Going to single user mode was just 
the safest way to remove all those possible effects, since I'm not quite 
sure how to do it in another way. But I agree, single user mode itself 
is not likely to help other than that.



And do not forget to remove /var/db/sup/src-all if you remove all of
/usr/src.  People often forget this fact.


I forgot it as well :-)

Thanks,
Tobias

--
Tobias Roth   ||   http://fsck.ch   ||   PGP: 0xCE599B4D
| You can't have everything. Where would you put it?
|  - Steven Wright

7.1-PRERELEASE freezes

2008-09-26 Thread Christian Laursen

Hello,

I decided to give 7.1-PRERELEASE a try on one of my machines to find
out if there might be any problems I should be aware of.

I quickly ran into problems. After a while the system freezes
completely. It seems to be somehow related to the load of the machine
as it doesn't seem to happen when it is idle. I built a kernel with
software watchdog enabled and enabled watchdog which had the nice
effect of turning the freeze into a panic. Hopefully that will be of
some help.

I first encountered the problem using SCHED_ULE and then tried if
SCHED_4BSD made any difference. But the freeze happens with either
scheduler.

I have disabled xorg and the nvidia driver but that doesn't help
either. I can cut down on various other stuff too, but first I hope
that someone here has a more educated guess about what could be the
cause of the freezes.

I have placed the backtraces from the most recent crashes as well as
the dmesg output from the most recent boot at this URL:
http://borderworlds.dk/~xi/7.1-PRERELEASE.freeze.txt

My kernel config is also included.

As far as I can tell the two backtraces are identical and look like
this:

#0  doadump () at pcpu.h:196
#1  0xc05abd03 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#2  0xc05abeff in panic (fmt=Variable fmt is not available.
) at /usr/src/sys/kern/kern_shutdown.c:572
#3  0xc0570d18 in hardclock (usermode=0, pc=3231434181) at /usr/src/sys/kern/kern_clock.c:642
#4  0xc07d194f in clkintr (frame=0xe38e1c68) at /usr/src/sys/i386/isa/clock.c:164
#5  0xc07c0465 in intr_execute_handlers (isrc=0xc0866700, frame=0xe38e1c68) at /usr/src/sys/i386/i386/intr_machdep.c:366
#6  0xc07d0fa8 in atpic_handle_intr (vector=0, frame=0xe38e1c68) at /usr/src/sys/i386/isa/atpic.c:596
#7  0xc07bbf41 in Xatpic_intr0 () at atpic_vector.s:62
#8  0xc09bc5c5 in acpi_cpu_c1 () at /usr/src/sys/modules/acpi/acpi/../../../i386/acpica/acpi_machdep.c:550
#9  0xc09b54f4 in acpi_cpu_idle () at /usr/src/sys/modules/acpi/acpi/../../../dev/acpica/acpi_cpu.c:945
#10 0xc07c35b6 in cpu_idle () at /usr/src/sys/i386/i386/machdep.c:1183
#11 0xc05c9275 in sched_idletd (dummy=0x0) at /usr/src/sys/kern/sched_4bsd.c:1429
#12 0xc05895d6 in fork_exit (callout=0xc05c9260 <sched_idletd>, arg=0x0, frame=0xe38e1d38) at /usr/src/sys/kern/kern_fork.c:804
#13 0xc07bbf10 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:264

I can provide more information as needed.

Any help will be greatly appreciated.

Thanks.

-- 
Christian Laursen

Re: rl0: watchdog timeout + 40, 000 ms ping with 7.1-BETA-i386-disc1.iso

2008-09-26 Thread Julian Stacey

   I'm remaking binaries,

New generic kernel built & installed, & install of all src/ done too.
No improvement.

 Is there reliable way to reproduce the issue?

It's continuous, the machine virtually never does a ping in less
than 10 seconds.

 Anyway, would you try attached patch and let me know result?

Thanks
Done, doesn't help.
Seeing a new message now too:
ping: sendto: No buffer space available.

Output of vmstat -i and pciconf -lv look the same as before

It's a small card. Weighs 46 grams. I was going to write
   I could simply post it to you, & you could keep it if you
   want.  As I had guessed it might be some new kind of card
   unexperienced before,
   RTL8139D, card just says made in China

But I just grabbed another card
   card says Level One.
   chip 8139B
with both patched kernel & original, no improvement.
So I tried a totally different card: xl0 fails too.
I think that 3com xl0 card was OK before in another box,
so I'd guess not an rl problem, Sorry.

Probably not 7.1 either, but probably a BIOS config problem of some sort.

IRQ 12 was listed in Award BIOS as Primary, options were also secondary or
disabled, so I've set it disabled.
PNP OS Yes
Resources: Auto
Reset config data to Enabled (I forgot before after card changes)

Did another restore BIOS factory defaults, no help.
Moved xl0 to another slot (all other 3 slots never used I guess, as
chassis plates not torn off on what I guess is original chassis).
No luck with xl0
I'm out of ideas.


Cheers,
Julian
-- 
Julian Stacey: BSDUnixLinux C Prog Admin SysEng Consult Munich www.berklix.com
  Mail plain ASCII text.  HTML & Base64 text are spam. www.asciiribbon.org

Re: rl0: watchdog timeout + 40, 000 ms ping with 7.1-BETA-i386-disc1.iso

2008-09-26 Thread Julian Stacey

Hi,
Reference:
 From: Julian Stacey [EMAIL PROTECTED] 
 Date: Fri, 26 Sep 2008 19:16:57 +0200 
 Message-id:   [EMAIL PROTECTED] 

Julian Stacey wrote:
I'm remaking binaries,

 New generic kernel built & installed, & install of all src/ done too.
 No improvement.

  Is there reliable way to reproduce the issue?

 It's continuous, the machine virtually never does a ping in less
 than 10 seconds.

  Anyway, would you try attached patch and let me know result?

 Thanks
 Done, doesn't help.
 Seeing a new message now too:
 ping: sendto: No buffer space available.

 Output of vmstat -i and pciconf -lv look the same as before

 It's a small card. Weighs 46 grams. I was going to write
    I could simply post it to you, & you could keep it if you
    want.  As I had guessed it might be some new kind of card
    unexperienced before,
    RTL8139D, card just says made in China

 But I just grabbed another card
    card says Level One.
    chip 8139B
 with both patched kernel & original, no improvement.
 So I tried a totally different card: xl0 fails too.
 I think that 3com xl0 card was OK before in another box,
 so I'd guess not an rl problem, Sorry.

 Probably not 7.1 either, but probably a BIOS config problem of some sort.

 IRQ 12 was listed in Award BIOS as Primary, options were also secondary or
 disabled, so I've set it disabled.
 PNP OS Yes
 Resources: Auto
 Reset config data to Enabled (I forgot before after card changes)

 Did another restore BIOS factory defaults, no help.
 Moved xl0 to another slot (all other 3 slots never used I guess, as
 chassis plates not torn off on what I guess is original chassis).
 No luck with xl0
 I'm out of ideas.

Got it working on xl.
It was an interrupt problem: I turned off lpt, com2 & something else
in the BIOS.
Got to go out now.
I'll go back to rl0 too & report back soon.
Thanks for the help, both!

Cheers,
Julian
-- 
Julian Stacey: BSDUnixLinux C Prog Admin SysEng Consult Munich www.berklix.com
  Mail plain ASCII text.  HTML & Base64 text are spam. www.asciiribbon.org

Re: rl0: watchdog timeout + 40, 000 ms ping with 7.1-BETA-i386-disc1.iso

2008-09-26 Thread Abdullah Ibn Hamad Al-Marri

- Original Message 

 From: Julian Stacey [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]
 Sent: Friday, September 26, 2008 8:16:57 PM
 Subject: Re: rl0: watchdog timeout + 40, 000 ms ping with 
 7.1-BETA-i386-disc1.iso 

I'm remaking binaries,

 New generic kernel built & installed, & install of all src/ done too.
 No improvement.

  Is there reliable way to reproduce the issue?

 It's continuous, the machine virtually never does a ping in less
 than 10 seconds.

  Anyway, would you try attached patch and let me know result?

 Thanks
 Done, doesn't help.
 Seeing a new message now too:
 ping: sendto: No buffer space available.

 Output of vmstat -i and pciconf -lv look the same as before

 It's a small card. Weighs 46 grams. I was going to write
    I could simply post it to you, & you could keep it if you
    want.  As I had guessed it might be some new kind of card
    unexperienced before,
    RTL8139D, card just says made in China

 But I just grabbed another card
    card says Level One.
    chip 8139B
 with both patched kernel & original, no improvement.
 So I tried a totally different card: xl0 fails too.
 I think that 3com xl0 card was OK before in another box,
 so I'd guess not an rl problem, Sorry.

 Probably not 7.1 either, but probably a BIOS config problem of some sort.

 IRQ 12 was listed in Award BIOS as Primary, options were also secondary or
 disabled, so I've set it disabled.
 PNP OS Yes
 Resources: Auto
 Reset config data to Enabled (I forgot before after card changes)

 Did another restore BIOS factory defaults, no help.
 Moved xl0 to another slot (all other 3 slots never used I guess, as
 chassis plates not torn off on what I guess is original chassis).
 No luck with xl0
 I'm out of ideas.

 Cheers,
 Julian
 -- 
 Julian Stacey: BSDUnixLinux C Prog Admin SysEng Consult Munich www.berklix.com
   Mail plain ASCII text.  HTML & Base64 text are spam. www.asciiribbon.org

Just a shot in the dark.

Do you have polling enabled for rl0?

Regards,

-Abdullah Ibn Hamad Al-Marri
Arab Portal
http://www.WeArab.Net/


Re: bad NFS/UDP performance

2008-09-26 Thread Matthew Dillon


:  -vfs.nfs.realign_test: 22141777
:  +vfs.nfs.realign_test: 498351
: 
:  -vfs.nfsrv.realign_test: 5005908
:  +vfs.nfsrv.realign_test: 0
: 
:  +vfs.nfsrv.commit_miss: 0
:  +vfs.nfsrv.commit_blks: 0
: 
: changing them did nothing - or at least with respect to nfs throughput :-)
:
:I'm not sure what any of these do, as NFS is a bit out of my league.
::-)  I'll be following this thread though!
:
:-- 
:| Jeremy Chadwickjdc at parodius.com |

A non-zero nfs_realign_count is bad, it means NFS had to copy the
mbuf chain to fix the alignment.  nfs_realign_test is just the
number of times it checked.  So nfs_realign_test is irrelevant;
it's nfs_realign_count that matters.

Several things can cause NFS payloads to be improperly aligned.
Anything from older network drivers which can't start DMA on a
2-byte boundary, resulting in the 14-byte encapsulation header
causing improper alignment of the IP header & payload, to RPCs
embedded in NFS TCP streams winding up being misaligned.

Modern network hardware either supports 2-byte-aligned DMA, allowing
the encapsulation to be 2-byte aligned so the payload winds up being
4-byte aligned, or supports DMA chaining allowing the payload to be
placed in its own mbuf, or pads, etc.
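
The alignment arithmetic above can be sketched in a few lines of Python.
This is only an illustration, not kernel code: it assumes a plain 14-byte
Ethernet header, a 20-byte IPv4 header without options and an 8-byte UDP
header, and the names (payload_offsets, is_aligned, dma_start) are made up
for the example.

```python
# Illustrative only: byte offsets are measured from a 4-byte-aligned
# receive buffer base. Header sizes are assumptions (no VLAN tag, no
# IP options).
ETHER_HDR_LEN = 14   # dst MAC (6) + src MAC (6) + EtherType (2)
IP_HDR_LEN = 20      # IPv4 header without options
UDP_HDR_LEN = 8


def payload_offsets(dma_start):
    """Return (ip_offset, rpc_offset) for a frame whose first byte is
    written at offset dma_start inside the buffer."""
    ip_offset = dma_start + ETHER_HDR_LEN
    rpc_offset = ip_offset + IP_HDR_LEN + UDP_HDR_LEN
    return ip_offset, rpc_offset


def is_aligned(offset, boundary=4):
    """True when offset falls on a multiple of boundary bytes."""
    return offset % boundary == 0


if __name__ == "__main__":
    # dma_start=0: hardware that can only DMA to an aligned address.
    # dma_start=2: hardware that can start DMA on a 2-byte boundary.
    for dma_start in (0, 2):
        ip_off, rpc_off = payload_offsets(dma_start)
        print("dma_start=%d ip_offset=%d aligned=%s rpc_offset=%d aligned=%s"
              % (dma_start, ip_off, is_aligned(ip_off),
                 rpc_off, is_aligned(rpc_off)))
```

With the frame starting at offset 0 both the IP header and the RPC data
land on a 2-byte (not 4-byte) boundary, which is the case where a copy to
realign would be needed; starting at offset 2 puts both on 4-byte
boundaries.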

--

One thing I would check is to be sure a couple of nfsiod's are running
on the client when doing your tests.  If none are running the RPCs wind
up being more synchronous and less pipelined.  Another thing I would
check is IP fragment reassembly statistics (for UDP) - there should be
none for TCP connections no matter what the NFS I/O size selected.

(It does seem more likely to be scheduler-related, though).

-Matt


[RELENG_6] Works Fine For Me!

2008-09-26 Thread Sean Bruno

Just an effort to test RELENG_6 .  No issues noted on my Dell server.  
Nice work folks!



Copyright (c) 1992-2008 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
   The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 6.4-PRERELEASE #1 r183385M: Fri Sep 26 11:13:48 PDT 2008
   
[EMAIL PROTECTED]:/usr/obj/home/sbruno/bsd/6/sys/GENERIC

ACPI APIC Table: DELL   PE BKC  
Timecounter i8254 frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(TM) CPU 2.80GHz (2793.19-MHz 686-class CPU)
 Origin = GenuineIntel  Id = 0xf41  Stepping = 1
 Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
 Features2=0x641d<SSE3,RSVD2,MON,DS_CPL,CNXT-ID,CX16,xTPR>
 AMD Features=0x2010<NX,LM>
 Logical CPUs per core: 2
real memory  = 1073479680 (1023 MB)
avail memory = 1037283328 (989 MB)
ioapic0: Changing APIC ID to 8
ioapic1: Changing APIC ID to 9
ioapic2: Changing APIC ID to 10
ioapic0 Version 2.0 irqs 0-23 on motherboard
ioapic1 Version 2.0 irqs 32-55 on motherboard
ioapic2 Version 2.0 irqs 64-87 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
hptrr: HPT RocketRAID controller driver v1.1 (Sep 26 2008 11:10:49)
acpi0: DELL PE BKC on motherboard
acpi0: Power Button (fixed)
Timecounter ACPI-fast frequency 3579545 Hz quality 1000
acpi_timer0: 24-bit timer at 3.579545MHz port 0x808-0x80b on acpi0
acpi_hpet0: High Precision Event Timer iomem 0xfed0-0xfed003ff on 
acpi0

Timecounter HPET frequency 14318180 Hz quality 900
cpu0: ACPI CPU on acpi0
pcib0: ACPI Host-PCI bridge port 0xcf8-0xcff on acpi0
pci0: ACPI PCI bus on pcib0
pcib1: ACPI PCI-PCI bridge at device 2.0 on pci0
pci1: ACPI PCI bus on pcib1
pcib2: ACPI PCI-PCI bridge at device 0.0 on pci1
pci2: ACPI PCI bus on pcib2
mpt0: LSILogic 1030 Ultra4 Adapter port 0xec00-0xecff mem 
0xfe9f-0xfe9f,0xfe9e-0xfe9e irq 34 at device 5.0 on pci2

mpt0: [GIANT-LOCKED]
mpt0: MPI Version=1.2.12.0
pcib3: ACPI PCI-PCI bridge at device 0.2 on pci1
pci3: ACPI PCI bus on pcib3
ahc0: Adaptec 29160 Ultra160 SCSI adapter port 0xdc00-0xdcff mem 
0xfe7ff000-0xfe7f irq 37 at device 11.0 on pci3

ahc0: [GIANT-LOCKED]
aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
pcib4: ACPI PCI-PCI bridge at device 4.0 on pci0
pci4: ACPI PCI bus on pcib4
pcib5: ACPI PCI-PCI bridge at device 5.0 on pci0
pci5: ACPI PCI bus on pcib5
pcib6: ACPI PCI-PCI bridge at device 0.0 on pci5
pci6: ACPI PCI bus on pcib6
em0: Intel(R) PRO/1000 Network Connection Version - 6.7.3 port 
0xccc0-0xccff mem 0xfe4e-0xfe4f irq 64 at device 7.0 on pci6

em0: Ethernet address: 00:11:43:e2:ff:fd
pcib7: ACPI PCI-PCI bridge at device 0.2 on pci5
pci7: ACPI PCI bus on pcib7
em1: Intel(R) PRO/1000 Network Connection Version - 6.7.3 port 
0xbcc0-0xbcff mem 0xfe2e-0xfe2f irq 65 at device 8.0 on pci7

em1: Ethernet address: 00:11:43:e2:ff:fe
pcib8: ACPI PCI-PCI bridge at device 6.0 on pci0
pci8: ACPI PCI bus on pcib8
uhci0: Intel 82801EB (ICH5) USB controller USB-A port 0x9ce0-0x9cff 
irq 16 at device 29.0 on pci0

uhci0: [GIANT-LOCKED]
usb0: Intel 82801EB (ICH5) USB controller USB-A on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: Intel 82801EB (ICH5) USB controller USB-B port 0x9cc0-0x9cdf 
irq 19 at device 29.1 on pci0

uhci1: [GIANT-LOCKED]
usb1: Intel 82801EB (ICH5) USB controller USB-B on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2: Intel 82801EB (ICH5) USB controller USB-C port 0x9ca0-0x9cbf 
irq 18 at device 29.2 on pci0

uhci2: [GIANT-LOCKED]
usb2: Intel 82801EB (ICH5) USB controller USB-C on uhci2
usb2: USB revision 1.0
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
ehci0: Intel 82801EB/R (ICH5) USB 2.0 controller mem 
0xfeb0-0xfeb003ff irq 23 at device 29.7 on pci0

ehci0: [GIANT-LOCKED]
usb3: EHCI version 1.0
usb3: companion controllers, 2 ports each: usb0 usb1 usb2
usb3: Intel 82801EB/R (ICH5) USB 2.0 controller on ehci0
usb3: USB revision 2.0
uhub3: Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub3: 6 ports with 6 removable, self powered
uhub4: vendor 0x413c product 0xa001, class 9/0, rev 2.00/0.00, addr 2
uhub4: multiple transaction translators
uhub4: 2 ports with 2 removable, self powered
pcib9: ACPI PCI-PCI bridge at device 30.0 on pci0
pci9: ACPI PCI bus on pcib9
pci9: display, VGA at device 13.0 (no driver attached)
isab0: PCI-ISA bridge at device 31.0 on pci0
isa0: ISA bus on isab0
atapci0: Intel ICH5 UDMA100 controller port 
0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xfc00-0xfc0f at device 31.1 on pci0

ata0: ATA channel 0 on atapci0
ata1: ATA channel 1 on

Re: bad NFS/UDP performance

2008-09-26 Thread David Malone

On Fri, Sep 26, 2008 at 04:35:17PM +0300, Danny Braniss wrote:
 I know, but I get about 1mgb, which seems somewhat low :-(

Since UDP has no way to know how fast to send, you need to tell iperf
how fast to send the packets. I think 1Mbps is the default speed.

David.

Re: HELP DEBUG: FreeBSD 6.3-RELEASE-p3 TIMEOUT - WRITE_DMA + other strange behaviour!

2008-09-26 Thread Peter Jeremy

Hi Anton,

On 2008-Sep-26 15:13:19 +0300, Anton - Valqk [EMAIL PROTECTED] wrote:
you are right that the machine has *lots* of hardware in it,
I was thinking of the power supply as a reason and measured the 5 and 12
volts - seemed to be ok 11.8 and 5.2 with all hardware in it.

A multimeter won't show noise or load spikes.  That said, if the PSU
is reasonably new and running well within its ratings, it shouldn't be
a problem.

1. remove rl0 and run only one isp for the test.

It's definitely worthwhile getting rid of rl(4) cards.  Read the top of
the driver source for reasons.

3. try to replace the ATA100 cables (the ones with 80 wires) with
older ones with only 40 wires?

I wouldn't recommend this.  The 80-wire cables are electrically much
better than the 40-wire ones.  You might like to try a different
cable.  You should verify that the master/slave/MB sockets on the
cable are plugged into the correct device.  If you want to slow down
the ATA bus, I suggest you do it in software.

4. ? anything else?

Try disconnecting some of the disks and see if the problem goes
away - this would help rule out PSU problems.

-- 
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement
an MTA that is either RFC2821-compliant or matches their claimed behaviour.



Re: bad NFS/UDP performance

2008-09-26 Thread Kevin Oberman

David,

You beat me to it.

Danny, read the iperf man page:
   -b, --bandwidth n[KM]
  set  target  bandwidth to n bits/sec (default 1 Mbit/sec).  This
  setting requires UDP (-u).

The page needs updating, though. It should read -b, --bandwidth
n[KMG]. It also does NOT require -u. If you use -b, UDP is assumed.
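
To see why an untuned UDP test reports only about 1 Mbit/s, here is a
rough sketch of the pacing arithmetic. It assumes a 1470-byte UDP datagram
(believed to be the iperf default, but check your version) and ignores
IP/UDP/Ethernet header overhead; the function name is invented for the
example.

```python
# Rough pacing arithmetic for a UDP sender with a fixed target rate.
# ASSUMPTION: 1470-byte datagrams; header overhead is ignored.
UDP_PAYLOAD_BYTES = 1470


def datagrams_per_second(target_bits_per_sec, payload_bytes=UDP_PAYLOAD_BYTES):
    """Datagrams the sender must emit each second to hit the target."""
    return target_bits_per_sec / (payload_bytes * 8)


if __name__ == "__main__":
    # 1 Mbit/s is the documented default; 900 Mbit/s is an arbitrary
    # example of what one might ask for on gigabit Ethernet.
    for rate in (1_000_000, 900_000_000):
        print("%d bit/s -> %.1f datagrams/s"
              % (rate, datagrams_per_second(rate)))
```

At the default target the sender paces itself to roughly 85 datagrams per
second, so the reported throughput says nothing about what the link or
the hosts can sustain until -b is raised.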
-- 
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: [EMAIL PROTECTED]   Phone: +1 510 486-8634
Key fingerprint:059B 2DDF 031C 9BA3 14A4  EADA 927D EBB3 987B 3751



Re: HELP DEBUG: FreeBSD 6.3-RELEASE-p3 TIMEOUT - WRITE_DMA + other strange behaviour!

2008-09-26 Thread Peter Jeremy

On 2008-Sep-26 13:12:14 +0300, Anton - Valqk [EMAIL PROTECTED] wrote:
1. I get a lot of DMA timeouts, mostly on ad5 and ad7 where I keep
...
dmesg.today:ad7: FAILURE - WRITE_DMA48 status=51<READY,DSC,ERROR>
error=10<NID_NOT_FOUND> LBA=374303456

This is a bad sign and suggests dying disk but...

2. The other strange issue is that when (I guess) it starts timing out,
*sometimes*, not every time, I'm losing connection to xl0 or fxp0

You have an awful lot of hardware in this box.  Are you sure the
power supply and cooling is up to scratch?  Sagging power could
cause the problems you report, as could overheating.

-- 
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement
an MTA that is either RFC2821-compliant or matches their claimed behaviour.



sysctl maxfiles

2008-09-26 Thread Aristedes Maniatis


By default FreeBSD 7.0 shipped with the sysctls set to:

kern.maxfiles: 12328
kern.maxfilesperproc: 11095


We recently bumped up against these limits in an unfortunate way and  
we are going to raise them. I have some questions:


* why are the numbers set the way they are? They aren't round numbers,  
they aren't powers of 2. But they were arrived at somehow with  
planning and thought presumably, so when I increase them I'd like to  
know a bit more about why these numbers were chosen.


* why are the numbers so close together? Surely there should be more
gap between max files per process and the max files for the whole
system. What happens with one runaway broken process is that
it hits 11095 and the 1233 files left for everything else are not
enough (on many servers) to allow the admin to login using ssh. That
gets very ugly very quickly.


* Under OSX (both server and client), these numbers are 12288 and  
10240. A bit more of a gap, but not terribly different to FreeBSD.  
Still interesting that someone changed these numbers just slightly.


* why do these controls exist at all? That is, if they were set to  
infinite what part of the system would be exhausted by a runaway  
process which kept opening files? Would the kernel run out of memory?  
What memory setting would be relevant here? I don't want to set  
maxfiles too high and then run out of some other resource which this  
maxfiles was protecting.



Thanks
Ari Maniatis





--
ish
http://www.ish.com.au
Level 1, 30 Wilson Street Newtown 2042 Australia
phone +61 2 9550 5001   fax +61 2 9550 4001
GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A



Re: 7.1-PRERELEASE freezes

2008-09-26 Thread Jeremy Chadwick

On Fri, Sep 26, 2008 at 06:21:01PM +0200, Christian Laursen wrote:
 I decided to give 7.1-PRERELEASE a try on one of my machines to find
 out if there might be any problems I should be aware of.
 
 I quickly ran into problems. After a while the system freezes
 completely. It seems to be somehow related to the load of the machine
 as it doesn't seem to happen when it is idle. I built a kernel with
 software watchdog enabled and enabled watchdog which had the nice
 effect of turning the freeze into a panic. Hopefully that will be of
 some help.
 
 I first encountered the problem using SCHED_ULE and then tried SCHED_4BSD
 to see if it made any difference. But the freeze happens with either
 scheduler.
 
 I have disabled xorg and the nvidia driver but that doesn't help
 either. I can cut down on various other stuff too, but first I hope
 that someone here has a more educated guess about what could be the
 cause of the freezes.
 
 I have placed the backtraces from the most recent crashes as well as
 the dmesg output from the most recent boot at this URL:
 http://borderworlds.dk/~xi/7.1-PRERELEASE.freeze.txt
 
 My kernel config is also included.

 #0  doadump () at pcpu.h:196
 #1  0xc05abd03 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
 #2  0xc05abeff in panic (fmt=Variable "fmt" is not available.) at /usr/src/sys/kern/kern_shutdown.c:572
 #3  0xc0570d18 in hardclock (usermode=0, pc=3231434181) at /usr/src/sys/kern/kern_clock.c:642
 #4  0xc07d194f in clkintr (frame=0xe38e1c68) at /usr/src/sys/i386/isa/clock.c:164
 #5  0xc07c0465 in intr_execute_handlers (isrc=0xc0866700, frame=0xe38e1c68) at /usr/src/sys/i386/i386/intr_machdep.c:366
 #6  0xc07d0fa8 in atpic_handle_intr (vector=0, frame=0xe38e1c68) at /usr/src/sys/i386/isa/atpic.c:596
 #7  0xc07bbf41 in Xatpic_intr0 () at atpic_vector.s:62
 #8  0xc09bc5c5 in acpi_cpu_c1 () at /usr/src/sys/modules/acpi/acpi/../../../i386/acpica/acpi_machdep.c:550
 #9  0xc09b54f4 in acpi_cpu_idle () at /usr/src/sys/modules/acpi/acpi/../../../dev/acpica/acpi_cpu.c:945
 #10 0xc07c35b6 in cpu_idle () at /usr/src/sys/i386/i386/machdep.c:1183
 #11 0xc05c9275 in sched_idletd (dummy=0x0) at /usr/src/sys/kern/sched_4bsd.c:1429
 #12 0xc05895d6 in fork_exit (callout=0xc05c9260 <sched_idletd>, arg=0x0, frame=0xe38e1d38) at /usr/src/sys/kern/kern_fork.c:804
 #13 0xc07bbf10 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:264
 

A couple generic things, although I think jhb@ might be able to figure
out what's going on here:

1) Is this machine running the latest BIOS available?
2) Are you running powerd(8) on this box?
3) Does disabling ACPI (it's a menu option when booting) help?
4) Does removing device cpufreq help?

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |


Re: sysctl maxfiles

2008-09-26 Thread Jeremy Chadwick

On Sat, Sep 27, 2008 at 11:10:01AM +1000, Aristedes Maniatis wrote:
 By default FreeBSD 7.0 shipped with the sysctls set to:

 kern.maxfiles: 12328
 kern.maxfilesperproc: 11095

 We recently bumped up against these limits in an unfortunate way and we 
 are going to raise them. I have some questions:

 * why are the numbers set the way they are? They aren't round numbers,  
 they aren't powers of 2. But they were arrived at somehow with planning 
 and thought presumably, so when I increase them I'd like to know a bit 
 more about why these numbers were chosen.

The values are calculated when the kernel is loaded, based on many other
parameters; you won't find 12328 hard-coded anywhere in the kernel
source, for example.

The Handbook goes over this fact:

http://www.freebsd.org/doc/en/books/handbook/configtuning-kernel-limits.html

By the way, DO NOT let the term "maxusers" make you think that has
something to do with the number of users which can be logged in
simultaneously or added to a box.  It has nothing to do with that.
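
As an illustration of how such values fall out of maxusers, here is a
minimal Python sketch. The formulas are an assumption based on a reading
of sys/kern/subr_param.c in the 7.x tree (NPROC, MAXFILES and the 9/10
ratio), and maxusers=384 is a guess that happens to reproduce the numbers
quoted above; verify against your own source tree before relying on it.

```python
# ASSUMED formulas (not copied from any running kernel):
#   maxproc         = 20 + 16 * maxusers
#   maxfiles        = 2 * maxproc
#   maxfilesperproc = maxfiles * 9 / 10   (integer division)
def derived_limits(maxusers):
    """Return (maxproc, maxfiles, maxfilesperproc) for a given maxusers."""
    maxproc = 20 + 16 * maxusers
    maxfiles = 2 * maxproc
    maxfilesperproc = maxfiles * 9 // 10
    return maxproc, maxfiles, maxfilesperproc


if __name__ == "__main__":
    # maxusers=384 is an assumption chosen to match the reported sysctls.
    maxproc, maxfiles, maxfilesperproc = derived_limits(384)
    print("kern.maxproc:", maxproc)
    print("kern.maxfiles:", maxfiles)
    print("kern.maxfilesperproc:", maxfilesperproc)
    print("left for everything else:", maxfiles - maxfilesperproc)
```

Under these assumptions the per-process limit is always 90% of the
system-wide one, which would explain both the odd-looking numbers and the
narrow 1233-descriptor gap mentioned in the question.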

Anyway, I'd like to know why you have so many fds open simultaneously in
the first place.  We're talking over 11,000 fds actively open at once --
this is not a small number.  What exactly is this machine doing?  Are
you absolutely certain tuning this higher is justified?  Have you looked
into the possibility that you have a program which is exhausting fds by
not closing them when finished?  (Yes, this is quite common; I've seen
bad Java code cause this problem on Solaris.)

 * why are the numbers so close together? Surely there should be more gap 
 between max files per process and the max files for the whole system. 
 What happens with one runaway broken process is that it hits 
 11095 and the 1233 files left for everything else are not enough (on many 
 servers) to allow the admin to login using ssh. That gets very ugly very 
 quickly.

Others will have to comment on this.

 * Under OSX (both server and client), these numbers are 12288 and 10240. 
 A bit more of a gap, but not terribly different to FreeBSD. Still 
 interesting that someone changed these numbers just slightly.

OS X isn't based on FreeBSD 7.  The calculation logic has changed over
time.

 * why do these controls exist at all? That is, if they were set to  
 infinite what part of the system would be exhausted by a runaway process 
 which kept opening files? Would the kernel run out of memory? What memory 
 setting would be relevant here? I don't want to set maxfiles too high and 
 then run out of some other resource which this maxfiles was protecting.

You're asking for trouble setting these values to the equivalent of
unlimited.  Instead of asking "what would happen", you should be asking
"why would I need to do that".

Regarding memory implications, the Handbook goes over it.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |


Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY

2008-09-26 Thread Derek Kuliński

Hello Jeremy,

Sunday, September 21, 2008, 3:07:20 PM, you wrote:

 Consider using background_fsck=no in /etc/rc.conf if you prefer the
 old behaviour.  Otherwise, boot single-user then do the fsck.

Actually what's the advantage of having fsck run in background if it
isn't capable of fixing things?
Isn't it more dangerous to have it like that? i.e. administrator might
not notice the problem; also filesystem could break even further...

-- 
Best regards,
 Derek                            mailto:[EMAIL PROTECTED]

I tried to daydream, but my mind kept wandering.



Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY

2008-09-26 Thread Jeremy Chadwick

On Fri, Sep 26, 2008 at 09:33:41PM -0700, Derek Kuliński wrote:
 Hello Jeremy,
 
 Sunday, September 21, 2008, 3:07:20 PM, you wrote:
 
  Consider using background_fsck=no in /etc/rc.conf if you prefer the
  old behaviour.  Otherwise, boot single-user then do the fsck.
 
 Actually what's the advantage of having fsck run in background if it
 isn't capable of fixing things?
 Isn't it more dangerous to have it like that? i.e. administrator might
 not notice the problem; also filesystem could break even further...

This question should really be directed at a set of different folks,
e.g. actual developers of said stuff (UFS2 and soft updates in
specific), because it's opening up a can of worms.

I believe it has to do with the fact that there is much faith given to
UFS2 soft updates -- the ability to background fsck allows the user to
boot their system and have it up and working (able to log in, etc.) in a
much shorter amount of time[1].  It makes the assumption that everything
will work just fine, which is faulty.

It also gives the impression of a journalled filesystem, which UFS2 soft
updates are not.  gjournal(8) on the other hand, is, and doesn't require
fsck at all[2].

I also think this further adds fuel to the "so why are we enabling soft
updates by default and using UFS2 as a filesystem again?" fire.  I'm
sure someone will respond to this with "So use ZFS and shut up."  *sigh*

[1]: 
http://lists.freebsd.org/pipermail/freebsd-questions/2004-December/069114.html
[2]: http://lists.freebsd.org/pipermail/freebsd-questions/2008-April/173501.html

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |


Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY

2008-09-26 Thread Derek Kuliński

Hello Jeremy,

Friday, September 26, 2008, 10:14:13 PM, you wrote:

 Actually what's the advantage of having fsck run in background if it
 isn't capable of fixing things?
 Isn't it more dangerous to have it like that? i.e. administrator might
 not notice the problem; also filesystem could break even further...

 This question should really be directed at a set of different folks,
 e.g. actual developers of said stuff (UFS2 and soft updates in
 specific), because it's opening up a can of worms.

 I believe it has to do with the fact that there is much faith given to
 UFS2 soft updates -- the ability to background fsck allows the user to
 boot their system and have it up and working (able to log in, etc.) in a
 much shorter amount of time[1].  It makes the assumption that everything
 will work just fine, which is faulty.

As far as I know (at least ideally, when write caching is disabled)
the data should always be consistent, and all fsck is supposed to be
doing is to free unreferenced blocks that were allocated.
Wouldn't it be possible for background fsck to do that while the
filesystem is mounted, and if there's some unrepairable error that
somehow happens (while in theory it should be impossible), just
periodically scream on the emergency log level?

 It also gives the impression of a journalled filesystem, which UFS2 soft
 updates are not.  gjournal(8) on the other hand, is, and doesn't require
 fsck at all[2].

 I also think this further adds fuel to the "so why are we enabling soft
 updates by default and using UFS2 as a filesystem again?" fire.  I'm
 sure someone will respond to this with "So use ZFS and shut up."  *sigh*

I think the reason for using Soft Updates by default is that it was
a pretty hard thing to implement, and (at least in theory) it is supposed
to be as reliable as journaling.

Also, if I remember correctly, PJD said that gjournal is performing
much better with small files, while softupdates is faster with big
ones.

-- 
Best regards,
 Derek                            mailto:[EMAIL PROTECTED]

Programmers are tools for converting caffeine into code.

