Fresh 7.0 Install: Fatal Trap 12 panic when put under load
I am experiencing 'random' reboots interspersed with panics whenever I put a newly installed system under load (make index in /usr/ports is enough). A sample panic is at the end of this email. I have updated to 7.0-RELEASE-p2 using the GENERIC amd64 kernel and it is still the same.

The system is a Gigabyte GA-M56S-S3 motherboard with 4GB of RAM, an Athlon X2 6400+ and 3 x Maxtor SATA 750GB HDDs (only the first is currently in use). The first disk is entirely allocated to FreeBSD using UFS. There is also a Linksys 802.11a/b/g card installed. I have flashed the BIOS to the latest revision (F4e). The onboard RAID is disabled. At the moment there is no exotic software installed.

Although I have been using FreeBSD for a number of years, this is the first time I have experienced regular panics, and I am at a complete loss trying to work out what is wrong. I would be grateful for any advice anyone is willing to give to help me troubleshoot this issue.

Thanks in advance,
John

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x80b0
fault code              = supervisor write data, page not present
instruction pointer     = 0x8:0x804db18c
stack pointer           = 0x10:b1e92450
frame pointer           = 0x10:ffec
code segment            = base 0x0, limit 0xf, type 0x16,
                          DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         =
kernel trap 12 with interrupts disabled

# nm -n /boot/kernel/kernel | grep 804db
804dbac0 t flushbufqueues
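A note on reading that nm(1) output: nm -n prints the kernel symbol table in ascending address order, so the function executing at trap time is the last symbol at or below the instruction pointer. Here the pointer 0x804db18c is *below* flushbufqueues at 804dbac0, so the trap is actually in the symbol preceding it. A sketch of pulling that out, assuming the running kernel still matches /boot/kernel/kernel:

  # show flushbufqueues and the symbol just before it; the trap at
  # 0x804db18c falls inside that preceding function
  nm -n /boot/kernel/kernel | grep -B 1 flushbufqueues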
Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load
On Tue, Jul 15, 2008 at 10:58:19AM +0100, John Sullivan wrote:
> I am experiencing 'random' reboots interspersed with panics whenever I
> put a newly installed system under load (make index in /usr/ports is
> enough). [...] Although I have been using FreeBSD for a number of
> years this is the first time I have experienced regular panics and am
> at a complete loss trying to work out what is wrong. I would be
> grateful for any advice anyone is willing to give to help me
> troubleshoot this issue.

Can the system in question run memtest86+ successfully (no errors) for an hour? It would help diminish (but not entirely rule out) hardware (memory or chipset) issues.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.             PGP: 4BD6C0CB |
RE: Fresh 7.0 Install: Fatal Trap 12 panic when put under load
> Can the system in question run memtest86+ successfully (no errors) for
> an hour? It would help diminish (but not entirely rule out) hardware
> (memory or chipset) issues.

Sorry, I forgot to mention: I ran memtest overnight without any problem reported. I ran Fedora 9 for a month without any issue - FreeBSD 7.0 crashes within an hour.

John
Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load
John Sullivan wrote:
>> Can the system in question run memtest86+ successfully (no errors)
>> for an hour? It would help diminish (but not entirely rule out)
>> hardware (memory or chipset) issues.
>
> Sorry, forgot to mention, I ran memtest overnight without any problem
> reported. I ran Fedora 9 for a month without any issue - FreeBSD 7.0
> crashes within an hour.

Well, that doesn't rule out hardware failure. Different OSes may use different capabilities of the hardware, or just use it in a different way, and that can provoke failures from marginal hardware.

Please collect kgdb/ddb backtraces.

Kris
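For reference, collecting a kgdb backtrace on 7.0 goes roughly as follows. This is only a sketch; the dump device name (ad4s1b) is a placeholder for the machine's actual swap slice:

  # /etc/rc.conf -- dump to swap on panic; savecore(8) recovers the
  # image into /var/crash on the next boot
  dumpdev="/dev/ad4s1b"
  dumpdir="/var/crash"

  # after the next panic and reboot:
  kgdb /boot/kernel/kernel /var/crash/vmcore.0
  (kgdb) backtrace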
Re: Using iscsi with multiple targets
On Mon, 2008-07-14 at 11:29 +0300, Danny Braniss wrote:

FreeBSD 7.0

I have 2 machines with identical configurations/hardware; let's call them A (master) and B (slave). I have installed iscsi-target from ports and have set up 3 targets representing the 3 drives I wish to be connected to from A. The Targets file:

# extents       file            start   length
extent0         /dev/da1        0       465GB
extent1         /dev/da2        0       465GB
extent2         /dev/da3        0       465GB

# target        flags   storage         netmask
target0         rw      extent0         192.168.0.1/24
target1         rw      extent1         192.168.0.1/24
target2         rw      extent2         192.168.0.1/24

I then start up iscsi_target and all is good. Now on A I have set up my /etc/iscsi.conf file as follows:

# cat /etc/iscsi.conf
data1 {
        targetaddress=192.168.0.252
        targetname=iqn.1994-04.org.netbsd.iscsi-target:target0
        initiatorname=iqn.2005-01.il.ac.huji.cs::BSD-2-1.sven.local
}
data2 {
        targetaddress=192.168.0.252
        targetname=iqn.1994-04.org.netbsd.iscsi-target:target1
        initiatorname=iqn.2005-01.il.ac.huji.cs::BSD-2-1.sven.local
}
data3 {
        targetaddress=192.168.0.252
        targetname=iqn.1994-04.org.netbsd.iscsi-target:target2
        initiatorname=iqn.2005-01.il.ac.huji.cs::BSD-2-1.sven.local
}

So far so good; now come the issues. First of all, it would appear that with iscontrol one can only start one named session at a time; for example:

/sbin/iscontrol -n data1
/sbin/iscontrol -n data2
/sbin/iscontrol -n data3

I guess that is ok, except that each invocation of iscontrol resets the other sessions. Here is the camcontrol and dmesg output from running the above 3 commands.

# camcontrol devlist
<AMCC 9550SXU-8L DISK 3.08>    at scbus0 target 0 lun 0 (pass0,da0)
<AMCC 9550SXU-8L DISK 3.08>    at scbus0 target 1 lun 0 (pass1,da1)
<AMCC 9550SXU-8L DISK 3.08>    at scbus0 target 2 lun 0 (pass2,da2)
<AMCC 9550SXU-8L DISK 3.08>    at scbus0 target 3 lun 0 (pass3,da3)
<NetBSD NetBSD iSCSI 0>        at scbus1 target 0 lun 0 (da5,pass5)
<NetBSD NetBSD iSCSI 0>        at scbus1 target 1 lun 0 (da6,pass6)
<NetBSD NetBSD iSCSI 0>        at scbus1 target 2 lun 0 (da4,pass4)

[ /sbin/iscontrol -n data1 ]
da4 at iscsi0 bus 0 target 0 lun 0
da4: <NetBSD NetBSD iSCSI 0> Fixed Direct Access SCSI-3 device

[ /sbin/iscontrol -n data2 ]
(da4:iscsi0:0:0:0): lost device
(da4:iscsi0:0:0:0): removing device entry
da4 at iscsi0 bus 0 target 0 lun 0
da4: <NetBSD NetBSD iSCSI 0> Fixed Direct Access SCSI-3 device
da5 at iscsi0 bus 0 target 1 lun 0
da5: <NetBSD NetBSD iSCSI 0> Fixed Direct Access SCSI-3 device

[ /sbin/iscontrol -n data3 ]
(da4:iscsi0:0:0:0): lost device
(da4:iscsi0:0:0:0): removing device entry
(da5:iscsi0:0:1:0): lost device
(da5:iscsi0:0:1:0): removing device entry
da4 at iscsi0 bus 0 target 2 lun 0
da4: <NetBSD NetBSD iSCSI 0> Fixed Direct Access SCSI-3 device
da5 at iscsi0 bus 0 target 0 lun 0
da5: <NetBSD NetBSD iSCSI 0> Fixed Direct Access SCSI-3 device
da6 at iscsi0 bus 0 target 1 lun 0
da6: <NetBSD NetBSD iSCSI 0> Fixed Direct Access SCSI-3 device

It would appear that rather than appending the new device to the end of the da devices, it starts to do some type of naming queue after the second device. If I am to use these devices in any type of automated setup, how can I make sure that after these commands da6 will always be target 1 (i.e. /dev/da2 on the slave machine)?

Next, there is no startup script for iscontrol - would that simply have to be added to the system, or is there a way with sysctl that it could be done? The plan here is to use gmirror such that /dev/da1 on A is mirrored with /dev/da1 on B using iscsi.
Hi Sven,

I just tried it here, and it seems that at the end all is ok :-) I think the lost/removing/found has something to do with iscontrol calling camcontrol rescan - I will check this later, but the end result is that you should have all /dev/da's. I don't see any reasonably safe way to tie a scsi# (/dev/daN), except to label (see glabel) the disk.

The startup script is, at the moment, not trivial, but I'm attaching it, so someone can suggest improvements :-)

#!/bin/sh

# PROVIDE: iscsi
# REQUIRE: NETWORKING
# BEFORE: DAEMON
# KEYWORD: nojail shutdown
#
# Add the following lines to /etc/rc.conf to enable iscsi:
#
# iscsi_enable=YES
# iscsi_fstab=/etc/fstab.iscsi

. /etc/rc.subr
. /cs/share/etc/rc.subr

name=iscsi
rcvar=`set_rcvar`
command=/sbin/iscontrol

iscsi_enable=${iscsi_enable:-NO}
iscsi_fstab=${iscsi_fstab:-/etc/fstab.iscsi}
iscsi_exports=${iscsi_exports:-/etc/exports.iscsi}
iscsi_debug=${iscsi_debug:-0}
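Regarding the glabel suggestion: a sketch of what that looks like, with made-up label names. Once written, the /dev/label/* names follow the disks no matter which da(4) unit CAM hands out after a rescan:

  # one-time: write a label onto each iSCSI-backed disk (glabel stores
  # it in the disk's last sector, so label before putting data on it)
  glabel label -v data1 /dev/da4
  glabel label -v data2 /dev/da5
  glabel label -v data3 /dev/da6

  # from now on, refer to the stable names:
  ls /dev/label
  data1   data2   data3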
Scour.com invite from S. M. Ibrahim lavlu
Hey, Did you hear about Scour? It is the next gen search engine with Google/Yahoo/MSN results and user comments all on one page. Best of all we get paid for using it by earning points with every search, comment and vote. The points are redeemable for Visa gift cards! It's like earning credit card or airline points just for searching! Hit the link below to join for free and we will both get points! http://scour.com/invite/lavluda/ I know you'll like it! - S. M. Ibrahim lavlu
Multi-machine mirroring choices
With the introduction of zfs to FreeBSD 7.0, a door has opened for more mirroring options, so I would like to get some opinions on what direction I should take for the following scenario.

Basically I have 2 machines that are clones of each other (master and slave) wherein one will be serving up samba shares. Each server has one disk to hold the OS (not mirrored) and then 3 disks, each of which will be its own mountpoint and samba share. The idea is to create a mirror of each of these disks on the slave machine so that in the event the master goes down, the slave can pick up serving the samba shares (I am using CARP as the samba server IP address).

My initial thought was to have the slave set up as an iscsi target and then have the master connect to each drive, then create a gmirror or zpool mirror using local_data1:iscsi_data1, local_data2:iscsi_data2, and local_data3:iscsi_data3. After some feedback (P. French, for example) it would appear as though iscsi may not be the way to go for this, as it locks up when the target goes down, and even though I may be able to remove the target from the mirror, that process may fail as the disk remains in D state. So that leaves me with the following options:

1) ggated/ggatec + gmirror
2) ggated/ggatec + zfs (zpool mirror)
3) zfs send/recv incremental snapshots (ssh)

1) I have been using ggated/ggatec on a set of 6.2-REL boxes and find that ggated tends to fail after some time, leaving me rebuilding the mirror periodically (and gmirror resilvering takes quite some time). Has ggated/ggatec performance and stability improved in 7.0? This combination does work, but it is high maintenance and automating it is a bit painful (in terms of re-establishing the gmirror and rebuilding and making sure the master machine is the one being read from).

2) Noting the issues with ggated/ggatec in (1), would a zpool be better at rebuilding the mirror? I understand that it can better determine which drive of the mirror is out of sync than can gmirror, so a lot of the insert/rebuild manipulations used with gmirror would not be needed here.

3) The send/recv feature of zfs was something I had not even considered until very recently. My understanding is that this would work by a) taking a snapshot of master_data1, b) zfs sending that snapshot to slave_data1, c) via ssh on pipe, receiving that snapshot on slave_data1, and then d) doing incremental snapshots, sending, receiving as in (a)(b)(c). How time/cpu intensive is the snapshot generation, and just how granular could this be done? I would imagine for systems with little traffic/changes this could be practical, but what about systems that may see a lot of files added, modified, deleted on the filesystem(s)?

I would be interested to hear anyone's experience with any (or all) of these methods and the caveats of each. I am leaning towards ggate(dc) + zpool at the moment, assuming that zfs can smartly rebuild the mirror after the slave's ggated processes bug out.

Sven
Re: Multi-machine mirroring choices
On Tue, Jul 15, 2008 at 10:07:14AM -0400, Sven Willenberger wrote:
> 3) The send/recv feature of zfs was something I had not even
> considered until very recently. My understanding is that this would
> work by a) taking a snapshot of master_data1, b) zfs sending that
> snapshot to slave_data1, c) via ssh on pipe, receiving that snapshot
> on slave_data1, and then d) doing incremental snapshots, sending,
> receiving as in (a)(b)(c). How time/cpu intensive is the snapshot
> generation, and just how granular could this be done?

I can speak a bit on ZFS snapshots, because I've used them in the past with good results. Compared to UFS2 snapshots (e.g. dump -L or mksnap_ffs), ZFS snapshots are fantastic. The two main positives for me were:

1) ZFS snapshots take significantly less time to create; I'm talking seconds or minutes vs. 30-45 minutes. I also remember receiving mail from someone (on -hackers? I can't remember -- let me know and I can dig through my mail archives for the specific mail/details) stating something along the lines of "over time, yes, UFS2 snapshots take longer and longer, it's a known design problem".

2) ZFS snapshots, when created, do not cause the system to more or less deadlock until the snapshot is generated; you can continue to use the system during the time the snapshot is being generated. While with UFS2, dump -L and mksnap_ffs will surely disappoint you.

We moved all of our production systems off of using dump/restore solely because of these aspects. We didn't move to ZFS though; we went with rsync, which is great, except for the fact that it modifies file atimes (hope you use Maildir and not classic mbox/mail spools...).

ZFS's send/recv capability (over a network) is something I didn't have time to experiment with, but it looked *very* promising. The method is documented in the manpage as Example 12, and is very simple -- as it should be. You don't have to use SSH either, by the way[1].

One of the annoyances to ZFS snapshots, however, was that I had to write my own script to do snapshot rotations (think incremental dump(8) but using ZFS snapshots).

> I would be interested to hear anyone's experience with any (or all) of
> these methods and caveats of each. I am leaning towards ggate(dc) +
> zpool at the moment assuming that zfs can smartly rebuild the mirror
> after the slave's ggated processes bug out.

I don't have any experience with GEOM gate, so I can't comment on it. But I would highly recommend you discuss the shortcomings with pjd@, because he definitely listens.

However, I must ask you this: why are you doing things the way you are? Why are you using the equivalent of RAID 1 but for entire computers? Is there some reason you aren't using a filer (e.g. NetApp) for your data, thus keeping it centralised? There has been recent discussion of using FreeBSD with ZFS as such, over on freebsd-fs. If you want a link to the thread, I can point you to it.

I'd like to know why you're doing things the way you are. By knowing why, possibly myself or others could recommend solving the problem in a different way -- one that doesn't involve realtime duplication of filesystems via network.

[1]: If you're transferring huge sums of data over a secure link (read: dedicated gigE LAN or a separate VLAN), you'll be disappointed to find that there is no Cipher=none with stock SSH; the closest you'll get is blowfish-cbc. You might be saddened by the fact that the only way you'll get Cipher=none is via the HPN patches, which means you'll be forced to install ports/security/openssh-portable. (I am not a fan of the "overwrite the base system" concept; it's a hack, and I'd rather get rid of the whole base system concept in general -- but that's for another discussion.) My point is, your overall network I/O will be limited by SSH, so if you're pushing lots of data across a LAN, consider something without encryption.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.             PGP: 4BD6C0CB |
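The manpage example Jeremy mentions boils down to something like the following sketch (the dataset, snapshot and host names are invented):

  # initial full copy of the master dataset to the slave:
  zfs snapshot master_data1@monday
  zfs send master_data1@monday | ssh slave zfs recv slave_data1

  # later runs ship only the delta between two snapshots:
  zfs snapshot master_data1@tuesday
  zfs send -i master_data1@monday master_data1@tuesday | \
      ssh slave zfs recv slave_data1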
Re: Multi-machine mirroring choices
Jeremy Chadwick wrote:
> Compared to UFS2 snapshots (e.g. dump -L or mksnap_ffs), ZFS snapshots
> are fantastic. The two main positives for me were:
>
> 1) ZFS snapshots take significantly less time to create; I'm talking
> seconds or minutes vs. 30-45 minutes. I also remember receiving mail
> from someone (on -hackers? I can't remember) stating something along
> the lines of "over time, yes, UFS2 snapshots take longer and longer,
> it's a known design problem".
>
> 2) ZFS snapshots, when created, do not cause the system to more or
> less deadlock until the snapshot is generated; you can continue to use
> the system during the time the snapshot is being generated. While with
> UFS2, dump -L and mksnap_ffs will surely disappoint you.

A "known design problem" in the sense of intentional, yes. They were written to support bg fsck, not as a lightweight filesystem feature for general use.

Kris
Re: Multi-machine mirroring choices
> However, I must ask you this: why are you doing things the way you
> are? Why are you using the equivalent of RAID 1 but for entire
> computers? Is there some reason you aren't using a filer (e.g. NetApp)
> for your data, thus keeping it centralised?

I am not the original poster, but I am doing something very similar and can answer that question for you. Some people get paranoid about the whole single point of failure thing. I originally suggested that we buy a filer and have identical servers so if one breaks we connect the other to the filer, but the response I got was "what if the filer breaks?". So in the end I had to show we have duplicate independent machines, with the data kept symmetrical on them at all times.

It does actually work quite nicely - I have an 'active' database machine and a 'passive'. The passive is only used if the active fails, and the drives are run as a gmirror pair with the remote one being mounted using ggated. It also means I can flip from active to passive when I want to do an OS upgrade on the active machine. Switching takes a few seconds, and this is fine for our setup.

So the answer is that the decision was taken out of my hands - but this is not uncommon, and as a roll-your-own cluster it works very nicely.

-pete.
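A rough sketch of that active/passive layout; the addresses, device names and mount point are all illustrative:

  # on the passive box: export the raw disk to the active one
  echo "192.168.0.1 RW /dev/da1" > /etc/gg.exports
  ggated

  # on the active box: attach the remote disk and mirror it with the
  # local one (a balance algorithm such as "prefer" can keep normal
  # reads on the local disk; see gmirror(8) for how priorities work)
  ggatec create 192.168.0.2 /dev/da1     # shows up as, e.g., /dev/ggate0
  gmirror label -v data /dev/da1 /dev/ggate0
  newfs /dev/mirror/data
  mount /dev/mirror/data /data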
Re: Multi-machine mirroring choices
Sven Willenberger wrote:
> 1) I have been using ggated/ggatec on a set of 6.2-REL boxes and find
> that ggated tends to fail after some time leaving me rebuilding the
> mirror periodically (and gmirror resilvering takes quite some time).
> Has ggated/ggatec performance and stability improved in 7.0? [...]

First, some problems in ggated/ggatec have been fixed between 6.2 and 6.3. Second, you should tune it a little to improve performance and stability. The following reply in an earlier thread is interesting:

http://lists.freebsd.org/pipermail/freebsd-stable/2008-January/039722.html

> 2) Noting the issues with ggated/ggatec in (1), would a zpool be
> better at rebuilding the mirror? I understand that it can better
> determine which drive of the mirror is out of sync than can gmirror so
> a lot of the insert rebuild manipulations used with gmirror would not
> be needed here.

I don't think there's much of a difference between gmirror and a ZFS mirror if used with ggated/ggatec. Of course, ZFS has more advantages, like checksumming, snapshots etc., but also the disadvantage that it requires considerably more memory.

Yet another way would be to use DragonFly's HAMMER file system, which is part of DragonFly BSD 2.0, to be released in a few days. It supports remote mirroring, i.e. mirror source and mirror target can run on different machines. Of course it is still very new and experimental (however, ZFS is marked experimental, too), so you probably don't want to use it on critical production machines. (YMMV, of course.)

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün-
chen, HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd

PI: int f[9814],b,c=9814,g,i;long a=1e4,d,e,h;
main(){for(;b=c,c-=14;i=printf("%04d",e+d/a),e=d%a)
while(g=--b*2)d=h*b+a*(i?f[b]:a/5),h=d/--g,f[b]=d%g;}
Re: how to get more logging from GEOM?
Jo Rhett [EMAIL PROTECTED] wrote:
> About 10 days ago one of my personal machines started hanging at
> random. This is the first bit of instability I've ever experienced on
> this machine (2+ years running):
>
> FreeBSD triceratops.netconsonance.com 6.2-RELEASE-p11 FreeBSD
> 6.2-RELEASE-p11 #0: Wed Feb 13 06:44:57 UTC 2008
> [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC i386
>
> After about 2 weeks of watching it carefully I've learned almost
> nothing. It's not a disk failure (AFAIK), it's not cpu overheat (now
> running healthd without complaints), it's not based on any given
> network traffic... however it does appear to accompany heavy cpu/disk
> activity. It usually dies when indexing my websites at night (but not
> always) and it sometimes dies when compiling programs. Just heavy disk
> isn't enough to do the job, as backups proceed without problems. Heavy
> cpu by itself isn't enough to do it either. But if I start compiling
> things and keep going a while, it will eventually hang.

I had exactly the same problems on a machine a few months ago. It had also been running for about two years, then started freezing when there was high CPU + disk activity. It turned out that the power supply went weak (either the power supply itself or the voltage regulators on the mainboard). Replacing PS + mainboard solved the problem.

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG
Re: Multi-machine mirroring choices
On Tue, 2008-07-15 at 07:54 -0700, Jeremy Chadwick wrote:
> I can speak a bit on ZFS snapshots, because I've used them in the past
> with good results. [...] ZFS's send/recv capability (over a network)
> is something I didn't have time to experiment with, but it looked
> *very* promising. The method is documented in the manpage as Example
> 12, and is very simple -- as it should be. You don't have to use SSH
> either, by the way[1].

The examples do list ssh as the way of initiating the receiving end; I am curious as to what the alternative would be (short of installing openssh-portable and using cipher=no).

> One of the annoyances to ZFS snapshots, however, was that I had to
> write my own script to do snapshot rotations (think incremental
> dump(8) but using ZFS snapshots).

That is what I was afraid of. Using snapshots would seem to involve a bit of housekeeping. Furthermore, it sounds more suited to a system that needs periodic rather than constant backing up (syncing).

> However, I must ask you this: why are you doing things the way you
> are? Why are you using the equivalent of RAID 1 but for entire
> computers? Is there some reason you aren't using a filer (e.g. NetApp)
> for your data, thus keeping it centralised?

Basically I am trying to eliminate the single point of failure. The project prior to this had such a failure that even a raid5 setup could not get out of it. It was determined at that point that a single-machine storage solution would no longer suffice. What I am trying to achieve is having a slave machine that could take over as the file server in the event the master machine goes down. This could be anything from something as simple as the master's network connection going down (CARP to the rescue on the slave) to a complete failure of the master.

While zfs send/recv sounds like a good option for periodic backups, I don't think it will fit my purpose; zpool or gmirror will be a better fit. I see in posts following my initial post that there is reference to improvements in ggate[cd] and/or tcp since 6.2 (and I have moved to 7.0 now), so that bodes well. The question then becomes a matter of which system would be easier to manage in terms of a) the master rebuilding the mirror in the event the slave goes down or ggate[cd] disconnects, and b) having the slave become the master for serving files and mounting the drives that were part of the mirror.

Thanks to the other posters; I see others are doing what I am trying to accomplish, and some additional tuneables for ggate[cd] that I had not yet incorporated were mentioned.

Sven
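For completeness, the CARP piece of that plan is just a shared virtual IP. A minimal 7.0-style sketch with invented addresses and password (requires carp support in the kernel; see carp(4)):

  # /etc/rc.conf on the master:
  cloned_interfaces="carp0"
  ifconfig_carp0="vhid 1 pass sambapass 192.168.0.10/24"

  # /etc/rc.conf on the slave -- same vhid and pass, higher advskew,
  # so it only takes over when the master stops advertising:
  cloned_interfaces="carp0"
  ifconfig_carp0="vhid 1 advskew 100 pass sambapass 192.168.0.10/24"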
Re: Multi-machine mirroring choices
Pete French wrote:
> I am not the original poster, but I am doing something very similar
> and can answer that question for you. Some people get paranoid about
> the whole single point of failure thing. I originally suggested that
> we buy a filer and have identical servers so if one breaks we connect
> the other to the filer, but the response I got was "what if the filer
> breaks?".

You install a filer cluster with two nodes. Then there is no single point of failure. I've done exactly that at customers of my company, i.e. set up NetApp filer clusters. Any disk can fail, any shelf can fail, any filer head can fail. A complete filer can fail. A switch can fail. The system will keep running and doing its job. And yes, we've tested all of that.

Whether filers solve your problems is a different thing. I just pointed out the answer to the question "what if the filer breaks?". I'm not a NetApp salesman. ;-)

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG
Re: Multi-machine mirroring choices
Oliver Fromme wrote:
> Yet another way would be to use DragonFly's HAMMER file system, which
> is part of DragonFly BSD 2.0, to be released in a few days. It
> supports remote mirroring, i.e. mirror source and mirror target can
> run on different machines. Of course it is still very new and
> experimental (however, ZFS is marked experimental, too), so you
> probably don't want to use it on critical production machines.

Let's not get carried away here :)

Kris
Re: installdate of a port/package?
Ronald Klop wrote:
> I just upgraded a machine from FreeBSD 6 to 7. Very nice. But my
> portupgrade -fa failed after a while. How can I know which
> ports/packages are still from FreeBSD 6? Is there a date recorded
> somewhere, or the FreeBSD version of the port/package? The date of the
> files in /var/db/pkg/* is unreliable, because installing a package
> gives these files the date of the files in the package.

Sorry for the late reply, I didn't see this thread earlier.

You can look at the ctime of the +DESC files. That should be the time when the packages were installed. This command will list all packages in the order they were installed:

ls -lcrt /var/db/pkg/*/+DESC

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG
Re: Multi-machine mirroring choices
> You install a filer cluster with two nodes. Then there is no single
> point of failure.

Yes, that would be my choice too. Unfortunately it didn't get done that way. Mind you, the solution we do have is something I am actually pretty happy with - it's cheap and does the job. We never wanted 100% uptime after all, just something so I could get stuff back up and running in about 15 minutes with no loss of data if possible. Would have been nice to get to play with the NetApp stuff though.

-pete.
taskqueue timeout
Hi everyone,

I'm wondering if the problems described in the following link have been resolved:

http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2008-02/msg00211.html

I've got four 500GB SATA disks in a ZFS raidz pool, and all four of them are experiencing the behavior. The problem only happens with extreme disk activity. The box becomes unresponsive (can not SSH etc). Keyboard input is displayed on the console, but the commands are not accepted.

Is there anything I can do to either figure this out, or work around it?

Steve
Re: Multi-machine mirroring choices
On Tue, Jul 15, 2008 at 07:54:26AM -0700, Jeremy Chadwick wrote:
> One of the annoyances to ZFS snapshots, however, was that I had to
> write my own script to do snapshot rotations (think incremental
> dump(8) but using ZFS snapshots).

There is a PR[1] to get something like this in the ports tree. I have no idea how good it is, but I hope to get it in the tree soon.

[1]: http://www.freebsd.org/cgi/query-pr.cgi?pr=ports/125340

-- WXS
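Until something lands in ports, a rotation script doesn't have to be big. A sketch; the dataset name, snapshot naming scheme and retention count are all arbitrary:

  #!/bin/sh
  # keep the newest $keep daily snapshots of $fs, destroy the rest
  fs="tank/data"
  keep=7

  zfs snapshot "${fs}@daily-$(date +%Y%m%d)"

  # list matching snapshots newest-first, skip the $keep we keep,
  # and destroy whatever is left over
  zfs list -H -t snapshot -o name | grep "^${fs}@daily-" | sort -r | \
      sed "1,${keep}d" | while read snap; do
          zfs destroy "${snap}"
      done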
Re: igb doesn't compile in STABLE?
At Mon, 14 Jul 2008 14:53:16 -0700, Jack Vogel wrote:
> Just guessing, did someone change conf/files maybe??

If you build a STABLE kernel with igb AND em then things work, and the kernel uses em. I'm not sure which thing needs to be changed in conf/files or otherwise, though.

Later,
George
Re: igb doesn't compile in STABLE?
Oh, so the problem is if igb alone is defined?

On Tue, Jul 15, 2008 at 10:04 AM, [EMAIL PROTECTED] wrote:
> At Mon, 14 Jul 2008 14:53:16 -0700, Jack Vogel wrote:
>> Just guessing, did someone change conf/files maybe??
>
> If you build a STABLE kernel with igb AND em then things work and the
> kernel uses em. I'm not sure which thing needs to be changed in
> conf/files or otherwise though.
>
> Later,
> George
Re: Multi-machine mirroring choices
Wesley Shields wrote:
> On Tue, Jul 15, 2008 at 07:54:26AM -0700, Jeremy Chadwick wrote:
>> One of the annoyances to ZFS snapshots, however, was that I had to
>> write my own script to do snapshot rotations (think incremental
>> dump(8) but using ZFS snapshots).
>
> There is a PR[1] to get something like this in the ports tree. I have
> no idea how good it is but I hope to get it in the tree soon.
>
> [1]: http://www.freebsd.org/cgi/query-pr.cgi?pr=ports/125340

There is also sysutils/freebsd-snapshot (pkg-descr is out of date, it supports ZFS too). I found it more convenient to just write my own tiny script.

Kris
Re: taskqueue timeout
:Hi everyone,
:
:I'm wondering if the problems described in the following link have been
:resolved:
:
:http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2008-02/msg00211.html
:
:I've got four 500GB SATA disks in a ZFS raidz pool, and all four of them
:are experiencing the behavior.
:
:The problem only happens with extreme disk activity. The box becomes
:unresponsive (can not SSH etc). Keyboard input is displayed on the
:console, but the commands are not accepted.
:
:Is there anything I can do to either figure this out, or work around it?
:
:Steve

If you are getting DMA timeouts, go to this URL:

http://wiki.freebsd.org/JeremyChadwick/ATA_issues_and_troubleshooting

Then I would suggest going into /usr/src/sys/dev/ata (I think, on FreeBSD), locate all instances where request->timeout is set to 5, and change them all to 10.

cd /usr/src/sys/dev/ata
fgrep 'request->timeout' *.c
... change all assignments of 5 to 10 ...

Try that first. If it helps then it is a known issue. Basically a combination of the on-disk write cache and possible ECC corrections, remappings, or excessive remapped sectors can cause the drive to take much longer than normal to complete a request. The default 5-second timeout is insufficient. If it does help, post confirmation to prod the FBSD developers to change the timeouts.

--

If you are NOT getting DMA timeouts then the ZFS lockups may be due to buffer/memory deadlocks. ZFS has knobs for adjusting its memory footprint size. Lowering the footprint ought to solve (most of) those issues.

It's actually somewhat of a hard issue to solve. Filesystems like UFS aren't complex enough to require the sort of dynamic memory allocations deep in the filesystem that ZFS and HAMMER need to do.

-Matt
Matthew Dillon [EMAIL PROTECTED]
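Mechanically, that edit is a one-liner. A sketch (make backups first; the exact source pattern may vary slightly between FreeBSD versions):

  cd /usr/src/sys/dev/ata
  # show where the 5-second default is assigned:
  fgrep 'request->timeout' *.c
  # bump every 5-second assignment to 10, keeping .orig backups:
  sed -i .orig 's/request->timeout = 5;/request->timeout = 10;/' *.c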
softdepflush bad block error has led to negative blocks in free inode and handle_workitem_freeblocks: block count
Hi,

The problem started when I installed a Kodicom 4400 card and started to run zoneminder. Prior to that, no problems with my machine, which now runs:

FreeBSD panix.internal.net 7.0-RELEASE-p3 FreeBSD 7.0-RELEASE-p3 #3: Mon Jul 14 16:35:37 EEST 2008 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC i386

This hardware change happened on Sunday Jul 13. The next day (Jul 14) the morning periodic daily cron job at 03:01 gave:

/var/log/messages.1.bz2:Jul 14 03:01:04 panix kernel: pid 48 (softdepflush), uid 0 inumber 2662656 on /usr: bad block
... (15 times)

The funny thing is that df -h showed a huge negative capacity. Yesterday (Mon Jul 14) I had a crash when I tried to run (by hand) pkg_info. Today (Tue Jul 15) the morning periodic daily cron job resulted in a crash as well, while running find. I speculated that it was one of those cases where bad or overheated memory could cause such problems, and I removed the most suspicious SIMM. After that I didn't get any crashes when trying to run pkg_info or periodic daily/weekly/monthly, but I get the following whenever I run periodic weekly:

panix kernel: free inode /usr/2662656 had -3549356 blocks (negative)

and after a while:

panix kernel: handle_workitem_freeblocks: block count

I suspect that even if I have a healthy system as far as memory is concerned (I hope), the problem with the 2662656 inode is still there. Any thoughts are very welcome.

-- Achilleas Mantzios
Re: igb doesn't compile in STABLE?
At Tue, 15 Jul 2008 10:07:22 -0700, Jack Vogel wrote:
> Oh, so the problem is if igb alone is defined?

Yes.

Best,
George
Re: igb doesn't compile in STABLE?
OK, will put on my todo list :)

On Tue, Jul 15, 2008 at 10:31 AM, [EMAIL PROTECTED] wrote:
> At Tue, 15 Jul 2008 10:07:22 -0700, Jack Vogel wrote:
>> Oh, so the problem is if igb alone is defined?
>
> Yes.
>
> Best,
> George
Re: Multi-machine mirroring choices
:Oliver Fromme wrote:
: > Yet another way would be to use DragonFly's HAMMER file system,
: > which is part of DragonFly BSD 2.0, to be released in a few days.
: > It supports remote mirroring, i.e. mirror source and mirror target
: > can run on different machines. [...]
:
:Let's not get carried away here :)
:
:Kris

Heh. I think it's safe to say that a *NATIVE* uninterrupted and fully cache coherent fail-over feature is not something any of us in BSDland have yet. It's a damn difficult problem that is frankly best solved above the filesystem layer, but with filesystem support for bulk mirroring operations.

HAMMER's native mirroring was the last major feature to go into it before the upcoming release, so it will definitely be more experimental than the rest of HAMMER. This is mainly because it implements a full blown queue-less incremental snapshot and mirroring algorithm, single-master-to-multi-slave. It does it at a very low level, by optimally scanning HAMMER's B-Tree. In other words, the kitchen sink. The B-Tree propagates the highest transaction id up to the root to support incremental mirroring, and that's the bit that is highly experimental and not well tested yet. It's fairly complex because even destroyed B-Tree records and collapses must propagate a transaction id up the tree (so the mirroring code knows what it needs to send to the other end to do comparative deletions on the target). (Transaction ids are bundled together in larger flushes so the actual B-Tree overhead is minimal.)

The rest of HAMMER is shaping up very well for the release. It's phenomenal when it comes to storing backups. Post-release I'll be moving more of our production systems to HAMMER. The only sticky issue we have is filesystem-full handling, but it is more a matter of fine-tuning than anything else.

--

Someone mentioned atime and mtime. For something like ZFS or HAMMER, these fields represent a real problem (atime more than mtime). I'm kinda interested in knowing: does ZFS do block replacement for atime updates?

For HAMMER I don't roll new B-Tree records for atime or mtime updates. I update the fields in-place in the current version of the inode, and all snapshot accesses will lock them (in getattr) to ctime in order to guarantee a consistent result. That way (tar | md5) can be used to validate snapshot integrity.

At the moment, in this first release, the mirroring code does not propagate atime or mtime. I plan to do it, though. Even though I don't roll new B-Tree records for atime/mtime updates, I can still propagate a new transaction id up the B-Tree to make the changes visible to the mirroring code. I'll definitely be doing that for mtime and will have the option to do it for atime as well. But atime still represents a big expense in actual mirroring bandwidth. If someone reads a million files on the master then a million inode records (sans file contents) would end up in the mirroring stream just for the atime update. Ick.

-Matt
Re: taskqueue timeout
Matthew Dillon wrote:
> If you are getting DMA timeouts, go to this URL:

Yes, I am.

> http://wiki.freebsd.org/JeremyChadwick/ATA_issues_and_troubleshooting

I fall under the category of "ATA/SATA DMA timeout issues".

> Then I would suggest going into /usr/src/sys/dev/ata (I think, on
> FreeBSD), locate all instances where request->timeout is set to 5, and
> change them all to 10.
>
> cd /usr/src/sys/dev/ata
> fgrep 'request->timeout' *.c
> ... change all assignments of 5 to 10 ...
>
> Try that first. If it helps then it is a known issue. Basically a
> combination of the on-disk write cache and possible ECC corrections,
> remappings, or excessive remapped sectors can cause the drive to take
> much longer than normal to complete a request. The default 5-second
> timeout is insufficient. If it does help, post confirmation to prod
> the FBSD developers to change the timeouts.

I've just reproduced the problem, and will try hacking the code now to see if the problem goes away. Since the box won't take input, I can't tell the disk usage at the time it dies. However, it seems to appear while running an Amanda backup, and my network throughput hits about ~90 Mbps @ ~5 kpps.

I'll post back with results of the increase of the timeout.

Steve
Re: taskqueue timeout
Matthew Dillon wrote:
> If you are getting DMA timeouts, go to this URL:
>
> http://wiki.freebsd.org/JeremyChadwick/ATA_issues_and_troubleshooting
>
> Then I would suggest going into /usr/src/sys/dev/ata (I think, on
> FreeBSD), locate all instances where request->timeout is set to 5, and
> change them all to 10.
>
> cd /usr/src/sys/dev/ata
> fgrep 'request->timeout' *.c
> ... change all assignments of 5 to 10 ...

Changing 5 to 10 in all cases and rebuilding the kernel does not fix the problem. I'm going to install the patch that allows the values to be changed via sysctl and up it to 15.

This problem happens across all four disks. Does anyone else have any suggestions on what I can check?

Steve
Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load
> Please collect kgdb/ddb backtraces.

kgdb backtrace:

server251# kgdb -c /var/crash/vmcore.0
kgdb: couldn't find a suitable kernel image
server251# kgdb /boot/kernel/kernel /var/crash/vmcore.0
kgdb: kvm_read: invalid address (0xff00010e5468)
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd".

Unread portion of the kernel message buffer:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x64
fault code              = supervisor read instruction, page not present
instruction pointer     = 0x8:0x64
stack pointer           = 0x10:0xb1d7f590
frame pointer           = 0x10:0xff0035d2dcc0
code segment            = base 0x0, limit 0xf, type 0x1b,
                          DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 88622 (make)
trap number             = 12
panic: page fault
cpuid = 0
Uptime: 5h57m22s
Physical memory: 4082 MB
Dumping 444 MB: 429 413 397 381 365 349 333 317 301 285 269 253 237 221 205 189 173 157 141 125 109 93 77 61 45 29 13

#0 doadump () at pcpu.h:194
194     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) list *0x64
No source file for address 0x64.
(kgdb) backtrace
#0  doadump () at pcpu.h:194
#1  0xff0004742440 in ?? ()
#2  0x80477699 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#3  0x80477a9d in panic (fmt=0x104 <Address 0x104 out of bounds>) at /usr/src/sys/kern/kern_shutdown.c:563
#4  0x8072ed44 in trap_fatal (frame=0xff00048ee000, eva=18446742974275512528) at /usr/src/sys/amd64/amd64/trap.c:724
#5  0x8072f115 in trap_pfault (frame=0xb1d7f4e0, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:641
#6  0x8072fa58 in trap (frame=0xb1d7f4e0) at /usr/src/sys/amd64/amd64/trap.c:410
#7  0x807156be in calltrap () at /usr/src/sys/amd64/amd64/exception.S:169
#8  0x0064 in ?? ()
#9  0x8067d3ee in uma_zalloc_arg (zone=0xff00bfed07e0, udata=0x0, flags=-256) at /usr/src/sys/vm/uma_core.c:1835
#10 0x80661ecf in ffs_vget (mp=0xff00047f4978, ino=47884512, flags=2, vpp=0xb1d7f728) at uma.h:277
#11 0x8066d010 in ufs_lookup (ap=0xb1d7f780) at /usr/src/sys/ufs/ufs/ufs_lookup.c:573
#12 0x804dfa89 in vfs_cache_lookup (ap=<Variable "ap" is not available.>) at vnode_if.h:83
#13 0x8077235f in VOP_LOOKUP_APV (vop=0x809e7de0, a=0xb1d7f840) at vnode_if.c:99
#14 0x804e6394 in lookup (ndp=0xb1d7f950) at vnode_if.h:57
#15 0x804e7228 in namei (ndp=0xb1d7f950) at /usr/src/sys/kern/vfs_lookup.c:219
#16 0x804f4717 in kern_stat (td=0xff00048ee000, path=0x8006f7040 <Address 0x8006f7040 out of bounds>, pathseg=<Variable "pathseg" is not available.>) at /usr/src/sys/kern/vfs_syscalls.c:2109
#17 0x804f4987 in stat (td=<Variable "td" is not available.>) at /usr/src/sys/kern/vfs_syscalls.c:2093
#18 0x8072f397 in syscall (frame=0xb1d7fc70) at /usr/src/sys/amd64/amd64/trap.c:852
#19 0x807158cb in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:290
#20 0x0043127c in ?? ()
Previous frame inner to this frame (corrupt stack?)

I really don't understand this - any advice you can give would really be appreciated.

John
Re: taskqueue timeout
Steve Bertrand wrote:
> Matthew Dillon wrote:
>> Then I would suggest going into /usr/src/sys/dev/ata (I think, on
>> FreeBSD), locate all instances where request->timeout is set to 5,
>> and change them all to 10.
>
> Changing 5 to 10 in all cases and rebuilding the kernel does not fix
> the problem.

Went from 10 to 15, and it took quite a bit longer into the backup before the problem cropped back up. Here is what I was seeing at the time it failed. Where netstat and zpool iostat drop off is where I start seeing the errors occur:

# top
last pid: 1069;  load averages: 0.09, 0.17, 0.10   up 0+00:08:31  19:22:39
53 processes: 1 running, 52 sleeping
CPU states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
Mem: 28M Active, 3644K Inact, 301M Wired, 76K Cache, 1634M Free
Swap:

# netstat -w 1 -h
      4.8K     0       11M       3.5K     0      5.4M     0
      4.5K     0       10M       3.3K     0      5.1M     0
      4.9K     0       11M       3.6K     0      5.5M     0
      4.8K     0       11M       3.5K     0      5.4M     0
      4.3K     0      9.5M       3.1K     0      4.8M     0
      5.1K     0       11M       3.7K     0      5.7M     0
      5.0K     0       11M       3.6K     0      5.6M     0
      5.3K     0       12M       3.9K     0      6.0M     0
      4.8K     0       11M       3.5K     0      5.4M     0
      4.7K     0       10M       3.4K     0      5.2M     0
      4.8K     0       11M       3.5K     0      5.4M     0
      4.6K     0       10M       3.4K     0      5.2M     0
      4.1K     0      9.1M       3.0K     0      4.6M     0
      5.3K     0       12M       3.9K     0      6.0M     0
      5.2K     0       12M       3.8K     0      5.8M     0
      4.3K     0      9.5M       3.1K     0      4.8M     0
      4.3K     0      9.6M       3.2K     0      4.9M     0
      5.4K     0       12M       4.0K     0      6.1M     0
      4.8K     0       11M       3.5K     0      5.4M     0
      2.4K     0      5.1M       1.7K     0      2.5M     0
            input        (Total)           output
   packets  errs      bytes    packets  errs      bytes colls
         2     0        120          2     0       316     0
         3     0        180          4     0      1.0K     0
         3     0        180          2     0       316     0
         3     0        180          3     0       658     0
         5     0       1.6K          5     0       942     0
         3     0        254          4     0       840     0
         3     0        180          2     0       316     0

# zpool iostat 1
storage     6.40G  1.81T      0    296      0  37.0M
storage     6.43G  1.81T      0    188      0  14.5M
storage     6.43G  1.81T      0      0      0      0
storage     6.43G  1.81T      0      0      0      0
storage     6.43G  1.81T      0      0      0      0
storage     6.43G  1.81T      0     47      0  5.99M
storage     6.46G  1.81T      0    218      0  18.0M
storage     6.46G  1.81T      0      0      0      0
storage     6.46G  1.81T      0      0      0      0
storage     6.46G  1.81T      9      0   192K      0
storage     6.46G  1.81T      0     59      0  7.39M
storage     6.49G  1.81T      1    250  3.42K  14.9M
storage     6.49G  1.81T      0      0      0      0
storage     6.49G  1.81T      0      0      0      0
storage     6.49G  1.81T      0      0      0      0
storage     6.49G  1.81T      0    141      0  17.5M
storage     6.52G  1.81T      0     74      0   232K
storage     6.52G  1.81T      0      0      0      0
storage     6.52G  1.81T      0      0      0      0
storage     6.52G  1.81T      0      0      0      0
storage     6.52G  1.81T      0    151      0  18.8M
storage     6.52G  1.81T      0    114      0  8.07M
storage     6.52G  1.81T      0      0      0      0
storage     6.52G  1.81T      0      0      0      0
storage     6.52G  1.81T      0      0      0      0
storage     6.52G  1.81T      0      0      0      0

Don't know if this will help anyone or not.

Steve
Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load
[EMAIL PROTECTED] wrote:
> (kgdb) backtrace
> #0  doadump () at pcpu.h:194
> [...]
> #8  0x0064 in ?? ()
> #9  0x8067d3ee in uma_zalloc_arg (zone=0xff00bfed07e0, udata=0x0,
>     flags=-256) at /usr/src/sys/vm/uma_core.c:1835

OK, that is:

	if (zone->uz_ctor != NULL) {
		if (zone->uz_ctor(item, zone->uz_keg->uk_size,

uz_ctor is indeed not null, but it's got 3 bits set. Not impossible that it's bad RAM still. I didn't spot anything that could cause it otherwise, but I don't know this code in detail.

Do all of the panics have the same backtrace?

Kris
Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load
> > #9 0x8067d3ee in uma_zalloc_arg (zone=0xff00bfed07e0, udata=0x0,
> > flags=-256) at /usr/src/sys/vm/uma_core.c:1835
>
> From the frame #9, please do
> p *zone
> I am esp. interested in the value of the uz_ctor member.

I am afraid that you may need to spell out each step for me :-(

(kgdb) p *zone
No symbol "zone" in current context.
(kgdb) list *0x8067d3ee
0x8067d3ee is in uma_zalloc_arg (/usr/src/sys/vm/uma_core.c:1835).
1830                ("uma_zalloc: Bucket pointer mangled."));
1831        cache->uc_allocs++;
1832        critical_exit();
1833 #ifdef INVARIANTS
1834        ZONE_LOCK(zone);
1835        uma_dbg_alloc(zone, NULL, item);
1836        ZONE_UNLOCK(zone);
1837 #endif
1838        if (zone->uz_ctor != NULL) {
1839                if (zone->uz_ctor(item, zone->uz_keg->uk_size,

Is this what you were looking for?

John
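The "No symbol zone in current context" just means kgdb is still sitting in frame 0 (doadump), where the zone argument is not in scope; it only exists inside uma_zalloc_arg. Selecting frame 9 first, as suggested, should do it:

  (kgdb) frame 9
  (kgdb) p *zone
  (kgdb) p zone->uz_ctor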
Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load
On Tue, Jul 15, 2008 at 08:19:15PM +0100, [EMAIL PROTECTED] wrote:

 Please collect kgdb/ddb backtraces.

 kgdb backtrace:

 server251# kgdb -c /var/crash/vmcore.0
 kgdb: couldn't find a suitable kernel image
 server251# kgdb /boot/kernel/kernel /var/crash/vmcore.0
 kgdb: kvm_read: invalid address (0xff00010e5468)
 [GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
 GNU gdb 6.1.1 [FreeBSD]
 Copyright 2004 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain conditions.
 Type "show copying" to see the conditions.
 There is absolutely no warranty for GDB. Type "show warranty" for details.
 This GDB was configured as "amd64-marcel-freebsd".

 Unread portion of the kernel message buffer:

 Fatal trap 12: page fault while in kernel mode
 cpuid = 0; apic id = 00
 fault virtual address = 0x64
 fault code            = supervisor read instruction, page not present
 instruction pointer   = 0x8:0x64
 stack pointer         = 0x10:0xb1d7f590
 frame pointer         = 0x10:0xff0035d2dcc0
 code segment          = base 0x0, limit 0xf, type 0x1b
                       = DPL 0, pres 1, long 1, def32 0, gran 1
 processor eflags      = interrupt enabled, resume, IOPL = 0
 current process       = 88622 (make)
 trap number           = 12
 panic: page fault
 cpuid = 0
 Uptime: 5h57m22s
 Physical memory: 4082 MB
 Dumping 444 MB: 429 413 397 381 365 349 333 317 301 285 269 253 237 221 205 189 173 157 141 125 109 93 77 61 45 29 13

 #0 doadump () at pcpu.h:194
 194     pcpu.h: No such file or directory.
         in pcpu.h
 (kgdb) list *0x64
 No source file for address 0x64.
 (kgdb) backtrace
 #0 doadump () at pcpu.h:194
 #1 0xff0004742440 in ?? ()
 #2 0x80477699 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
 #3 0x80477a9d in panic (fmt=0x104 <Address 0x104 out of bounds>) at /usr/src/sys/kern/kern_shutdown.c:563
 #4 0x8072ed44 in trap_fatal (frame=0xff00048ee000, eva=18446742974275512528) at /usr/src/sys/amd64/amd64/trap.c:724
 #5 0x8072f115 in trap_pfault (frame=0xb1d7f4e0, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:641
 #6 0x8072fa58 in trap (frame=0xb1d7f4e0) at /usr/src/sys/amd64/amd64/trap.c:410
 #7 0x807156be in calltrap () at /usr/src/sys/amd64/amd64/exception.S:169
 #8 0x0064 in ?? ()
 #9 0x8067d3ee in uma_zalloc_arg (zone=0xff00bfed07e0, udata=0x0, flags=-256) at /usr/src/sys/vm/uma_core.c:1835

From the frame #9, please do "p *zone". I am esp. interested in the value of the uz_ctor member. It seems that it became corrupted; its value should be 0, as this seems to be the ffs inode zone. I suspect that gdb would show 0x64 instead. That may be kernel memory corruption, but it might be bad memory as well (double bit inversion?).

 #10 0x80661ecf in ffs_vget (mp=0xff00047f4978, ino=47884512, flags=2, vpp=0xb1d7f728) at uma.h:277
 #11 0x8066d010 in ufs_lookup (ap=0xb1d7f780) at /usr/src/sys/ufs/ufs/ufs_lookup.c:573
 #12 0x804dfa89 in vfs_cache_lookup (ap=<Variable "ap" is not available.>) at vnode_if.h:83
 #13 0x8077235f in VOP_LOOKUP_APV (vop=0x809e7de0, a=0xb1d7f840) at vnode_if.c:99
 #14 0x804e6394 in lookup (ndp=0xb1d7f950) at vnode_if.h:57
 #15 0x804e7228 in namei (ndp=0xb1d7f950) at /usr/src/sys/kern/vfs_lookup.c:219
 #16 0x804f4717 in kern_stat (td=0xff00048ee000, path=0x8006f7040 <Address 0x8006f7040 out of bounds>, pathseg=<Variable "pathseg" is not available.>) at /usr/src/sys/kern/vfs_syscalls.c:2109
 #17 0x804f4987 in stat (td=<Variable "td" is not available.>) at /usr/src/sys/kern/vfs_syscalls.c:2093
 #18 0x8072f397 in syscall (frame=0xb1d7fc70) at /usr/src/sys/amd64/amd64/trap.c:852
 #19 0x807158cb in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:290
 #20 0x0043127c in ?? ()
 Previous frame inner to this frame (corrupt stack?)

 I really don't understand this - any advice you can give would really be appreciated.
Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load
On Tue, Jul 15, 2008 at 08:47:03PM +0100, [EMAIL PROTECTED] wrote:

 #9 0x8067d3ee in uma_zalloc_arg (zone=0xff00bfed07e0, udata=0x0, flags=-256) at /usr/src/sys/vm/uma_core.c:1835

 From the frame #9, please do "p *zone". I am esp. interested in the value of the uz_ctor member. It seems that it became corrupted; its value should be 0, as this seems to be the ffs inode zone. I suspect that gdb would show 0x64 instead.

 I am afraid that you may need to spell out each step for me :-(

 (kgdb) p *zone
 No symbol "zone" in current context.

Do the "frame 9" before "p *zone".

 (kgdb) list *0x8067d3ee
 0x8067d3ee is in uma_zalloc_arg (/usr/src/sys/vm/uma_core.c:1835).
 1830                ("uma_zalloc: Bucket pointer mangled."));
 1831            cache->uc_allocs++;
 1832            critical_exit();
 1833    #ifdef INVARIANTS
 1834            ZONE_LOCK(zone);
 1835            uma_dbg_alloc(zone, NULL, item);
 1836            ZONE_UNLOCK(zone);
 1837    #endif
 1838            if (zone->uz_ctor != NULL) {
 1839                if (zone->uz_ctor(item, zone->uz_keg->uk_size,

 Is this what you were looking for?

No, see above.
Re: taskqueue timeout
:Went from 10-15, and it took quite a bit longer into the backup before
:the problem cropped back up.

    Try 30 or longer. See if you can make the problem go away entirely, then fall back to 5 and see if the problem resumes at its earlier pace.

    It could be temperature related. The drives are being exercised a lot; they could very well be overheating. To find out, add more airflow (a big house fan would do the trick).

    It could be that errors are accumulating on the drives, but it seems unlikely that four drives would exhibit the same problem.

    Also make sure the power supply can handle four drives. Most power supplies that come with consumer boxes can't under full load if you also have a mid- or high-end graphics card installed. Power supplies that come with OEM slap-together enclosures are not usually much better.

    Specifically, look at the +5V and +12V amperage maximums on the power supply, then check the disk labels to see what they draw, then multiply by 2. e.g. if your power supply can do [EMAIL PROTECTED] and you have four drives each taking [EMAIL PROTECTED] (and typically ~half that at 5V), that's 4x2x2 = [EMAIL PROTECTED] and you would probably be ok.

    To test, remove two of the four drives, reformat the ZFS to use just two, and see if the problem reoccurs with just two drives.

                                        -Matt
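Matt's power-budget arithmetic is easy to re-create even though the amperage figures were mangled in the archive. The numbers below are invented purely for illustration (check your actual PSU and drive labels), but the calculation is the one he describes:

    #include <stdio.h>

    /*
     * Hypothetical worked example of the power-supply check above.
     * All amperage figures are assumptions, not real measurements.
     */
    int
    main(void)
    {
            double psu_12v_amps = 16.0;  /* assumed: PSU +12V rail rating */
            double drive_12v_amps = 2.0; /* assumed: per-drive +12V draw  */
            int ndrives = 4;
            double safety_factor = 2.0;  /* the "multiply by 2" margin    */

            double needed = ndrives * drive_12v_amps * safety_factor;

            printf("need %.1fA at +12V, PSU provides %.1fA: %s\n",
                needed, psu_12v_amps,
                needed <= psu_12v_amps ? "probably OK" : "marginal");
            return (0);
    }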
Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load
 Do the "frame 9" before "p *zone".

It's obvious now you say it ;-) You are indeed right:

(kgdb) frame 9
#9 0x8067d3ee in uma_zalloc_arg (zone=0xff00bfed07e0, udata=0x0, flags=-256) at /usr/src/sys/vm/uma_core.c:1835
1835            uma_dbg_alloc(zone, NULL, item);
(kgdb) p *zone
$1 = {uz_name = 0x808084cd "FFS inode", uz_lock = 0xff00bfecf7f0,
  uz_keg = 0xff00bfecf7e0,
  uz_link = {le_next = 0x0, le_prev = 0xff00bfecf830},
  uz_full_bucket = {lh_first = 0xffe01a74c830},
  uz_free_bucket = {lh_first = 0xff00469bf830},
  uz_ctor = 0x64, uz_dtor = 0, uz_init = 0x9a, uz_fini = 0,
  uz_allocs = 17180460407, uz_frees = 504673, uz_fails = 0, uz_fills = 0,
  uz_count = 128,
  uz_cpu = {{uc_freebucket = 0xff000e5d6830,
    uc_allocbucket = 0xff003a5f7000, uc_allocs = 97, uc_frees = 0}}}

Now what does that mean??

I just experienced another panic, but it failed to write to disk :-(. I will force another one and check that the details are the same.

John
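That "p *zone" output is the smoking gun: uz_ctor should be 0 for this zone but reads 0x64, and the earlier trap showed both the fault virtual address and the instruction pointer as 0x64. In other words, the NULL check passed and the kernel called through the garbage pointer. A minimal user-space C sketch of that failure mode (my illustration, not the FreeBSD code; all names are invented):

    #include <stddef.h>
    #include <stdio.h>

    /* Stand-in for a UMA zone; only the constructor hook matters here. */
    typedef int (*zone_ctor_t)(void *item, int size, void *udata, int flags);

    struct fake_zone {
            zone_ctor_t uz_ctor;
    };

    static int
    fake_zalloc(struct fake_zone *zone, void *item, int size)
    {
            /*
             * Mirrors the shape of the check in uma_zalloc_arg(): it
             * guards against an *unset* constructor, but not against a
             * *corrupted* one.  With uz_ctor flipped from 0 to 0x64,
             * the branch is taken and the CPU jumps to address 0x64 --
             * a page fault whose faulting address equals the bogus
             * pointer, exactly as in the trap output above.
             */
            if (zone->uz_ctor != NULL)
                    return (zone->uz_ctor(item, size, NULL, 0));
            return (0);
    }

    int
    main(void)
    {
            struct fake_zone z = { (zone_ctor_t)0x64 };
            char item[64];

            printf("calling through uz_ctor = %p would fault there\n",
                (void *)z.uz_ctor);
            /* fake_zalloc(&z, item, sizeof(item)); -- uncomment to crash */
            (void)item;
            return (0);
    }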
Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load
[EMAIL PROTECTED] wrote:

 #9 0x8067d3ee in uma_zalloc_arg (zone=0xff00bfed07e0, udata=0x0, flags=-256) at /usr/src/sys/vm/uma_core.c:1835

 From the frame #9, please do "p *zone". I am esp. interested in the value of the uz_ctor member. It seems that it became corrupted; its value should be 0, as this seems to be the ffs inode zone. I suspect that gdb would show 0x64 instead.

 I am afraid that you may need to spell out each step for me :-(

 (kgdb) p *zone
 No symbol "zone" in current context.
 (kgdb) list *0x8067d3ee
 0x8067d3ee is in uma_zalloc_arg (/usr/src/sys/vm/uma_core.c:1835).
 1830                ("uma_zalloc: Bucket pointer mangled."));
 1831            cache->uc_allocs++;
 1832            critical_exit();
 1833    #ifdef INVARIANTS
 1834            ZONE_LOCK(zone);
 1835            uma_dbg_alloc(zone, NULL, item);
 1836            ZONE_UNLOCK(zone);
 1837    #endif
 1838            if (zone->uz_ctor != NULL) {
 1839                if (zone->uz_ctor(item, zone->uz_keg->uk_size,

 Is this what you were looking for?

Are you sure that is the same source tree you are running? The 7.0-RELEASE source has the zone->uz_ctor on line 1835, which is consistent with your backtrace.

Kris
Re: taskqueue timeout
Don't want to give conflicting advice, and would suggest you certainly try the 30 sec thing first. I'm already on 10 myself but haven't pushed further.

In my own case I've not had any issue with ZFS in particular since I applied the ZFS zil/prefetch disable loader.conf tunables 10 hours ago. I am observing this now.

For the record: what ATA chipset/motherboard and model of disk have you got? Have you seen any SMART errors (real or otherwise)? What do your 'zpool status' counters look like?

-- Alex

On Tue, 2008-07-15 at 12:55 -0700, Matthew Dillon wrote:

 :Went from 10-15, and it took quite a bit longer into the backup before
 :the problem cropped back up.

 Try 30 or longer. See if you can make the problem go away entirely, then fall back to 5 and see if the problem resumes at its earlier pace. [...]

 -Matt
Konqueror and the Cookiejar
Since upgrading to 7.0-STABLE, I've noticed an occasional problem with Konqueror. I've been recompiling my ports for the past few weeks and have noticed that some sites are complaining about cookies not being enabled. Further investigation has revealed that if I start Konqueror from the terminal prompt, I get an error message:

 khtml (dom): Can't communicate with the cookiejar!

A workaround I've discovered is to run kded first. Konqueror works with cookies after that.

Question: what process is NOT running kded during the startx process? Where is there a log to track this?

-- Paul Horechuk
Think Free
Use Open Source Software
Re: Multi-machine mirroring choices
On Tue, Jul 15, 2008 at 07:10:05PM +0200, Kris Kennaway wrote:

 Wesley Shields wrote:
  On Tue, Jul 15, 2008 at 07:54:26AM -0700, Jeremy Chadwick wrote:
   One of the annoyances to ZFS snapshots, however, was that I had to write my own script to do snapshot rotations (think incremental dump(8) but using ZFS snapshots).

  There is a PR[1] to get something like this in the ports tree. I have no idea how good it is but I hope to get it in the tree soon.

  [1] http://www.freebsd.org/cgi/query-pr.cgi?pr=ports/125340

 There is also sysutils/freebsd-snapshot (pkg-descr is out of date, it supports ZFS too). I found it more convenient to just write my own tiny script.

Thanks for pointing this out -- I had no idea such a port existed!

--
| Jeremy Chadwick                            jdc at parodius.com |
| Parodius Networking                  http://www.parodius.com/ |
| UNIX Systems Administrator            Mountain View, CA, USA  |
| Making life hard for others since 1977.        PGP: 4BD6C0CB  |
Re: Multi-machine mirroring choices
On Tue, Jul 15, 2008 at 11:47:57AM -0400, Sven Willenberger wrote:

 On Tue, 2008-07-15 at 07:54 -0700, Jeremy Chadwick wrote:
  ZFS's send/recv capability (over a network) is something I didn't have time to experiment with, but it looked *very* promising. The method is documented in the manpage as Example 12, and is very simple -- as it should be. You don't have to use SSH either, by the way[1].

 The examples do list ssh as the way of initiating the receiving end; I am curious as to what the alternative would be (short of installing openssh-portable and using cipher=no).

rsh or netcat come to mind. I haven't tried using either, though.

--
| Jeremy Chadwick                            jdc at parodius.com |
| Parodius Networking                  http://www.parodius.com/ |
| UNIX Systems Administrator            Mountain View, CA, USA  |
| Making life hard for others since 1977.        PGP: 4BD6C0CB  |
Re: Multi-machine mirroring choices
Jeremy Chadwick wrote:

 On Tue, Jul 15, 2008 at 11:47:57AM -0400, Sven Willenberger wrote:
  The examples do list ssh as the way of initiating the receiving end; I am curious as to what the alternative would be (short of installing openssh-portable and using cipher=no).

 rsh or netcat come to mind. I haven't tried using either, though.

I wouldn't recommend either, for the obvious reasons: weak or no authentication and integrity protection. Even if the former is not a concern for some reason, then the latter should be (your data stream could be corrupted in transit and you'd never know until you tried to verify or restore the backup).

Kris
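If anyone does push a replication stream over rsh or netcat despite Kris's warning, the integrity half of the problem can at least be mitigated by checksumming the stream on both ends of the pipe and comparing. A rough C sketch of a pass-through CRC-32 filter (illustrative only; a real setup should prefer a cryptographic digest and proper authentication):

    #include <stdio.h>
    #include <stdint.h>

    /* Standard CRC-32 (IEEE 802.3, reflected), bitwise implementation. */
    static uint32_t
    crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
    {
            crc = ~crc;
            while (len--) {
                    crc ^= *buf++;
                    for (int k = 0; k < 8; k++)
                            crc = (crc >> 1) ^ (0xEDB88320 & -(crc & 1));
            }
            return (~crc);
    }

    int
    main(void)
    {
            unsigned char buf[65536];
            size_t n;
            uint32_t crc = 0;

            /* Pass the stream through unchanged, accumulating a checksum. */
            while ((n = fread(buf, 1, sizeof(buf), stdin)) > 0) {
                    crc = crc32_update(crc, buf, n);
                    fwrite(buf, 1, n, stdout);
            }
            fprintf(stderr, "stream crc32: 0x%08x\n", crc);
            return (0);
    }

Run it on each side of the transfer (between zfs send and the sending netcat, and between the receiving netcat and zfs recv) and compare the two checksums printed to stderr; a mismatch means the stream was corrupted in transit.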
Re: taskqueue timeout
Matthew Dillon wrote:

 Try that first. If it helps then it is a known issue. Basically a combination of the on-disk write cache and possible ECC corrections, remappings, or excessive remapped sectors can cause the drive to take much longer than normal to complete a request. The default 5-second timeout is insufficient.

From Western Digital's line of enterprise drives:

 "RAID-specific time-limited error recovery (TLER) - Pioneered by WD, this feature prevents drive fallout caused by the extended hard drive error-recovery processes common to desktop drives."

Western Digital's information sheet on TLER states that they found most RAID controllers will wait 8 seconds for a disk to respond before dropping it from the RAID set. Consequently they changed their enterprise drives to try reading a bad sector for only 7 seconds before returning an error.

Therefore I think the FreeBSD timeout should also be set to 8 seconds instead of 5 seconds. Desktop-targeted drives will not respond for over 10 seconds, up to minutes, so it's not worth setting the FreeBSD timeout any higher.

More info:
http://www.wdc.com/en/library/sata/2579-001098.pdf
http://en.wikipedia.org/wiki/Time-Limited_Error_Recovery

- Andrew
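Andrew's argument boils down to two timers racing: the drive's internal error recovery against the host's request timeout, and the host must lose that race for a merely-slow drive to survive. A toy C model of the reasoning (the durations are taken from his figures; everything else is an assumption for the sake of the example, not the actual ata(4) code):

    #include <stdbool.h>
    #include <stdio.h>

    /*
     * Illustrative model: if the host gives up before the drive's
     * internal error recovery completes, a merely-slow drive is
     * treated as a dead one and dropped.
     */
    struct disk_request {
            int drive_recovery_secs; /* how long the drive may stall */
            int host_timeout_secs;   /* host's request timeout       */
    };

    static bool
    request_survives(const struct disk_request *req)
    {
            return (req->host_timeout_secs > req->drive_recovery_secs);
    }

    int
    main(void)
    {
            /* TLER drive: recovery capped at 7s; desktop drive: 30s+. */
            struct disk_request tler = { 7, 8 };
            struct disk_request desktop = { 30, 8 };

            printf("TLER drive, 8s host timeout: %s\n",
                request_survives(&tler) ? "completes" : "dropped");
            printf("desktop drive, 8s host timeout: %s\n",
                request_survives(&desktop) ? "completes" : "dropped");
            return (0);
    }

This is why an 8-second host timeout suits TLER drives, while no reasonable timeout rescues a desktop drive that may grind on a bad sector for minutes.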
Re: taskqueue timeout
Matthew Dillon wrote:

 :Went from 10-15, and it took quite a bit longer into the backup before
 :the problem cropped back up.

Jumping right into it; there is another post after this one, but I'm going to try to reply inline:

 Try 30 or longer. See if you can make the problem go away entirely, then fall back to 5 and see if the problem resumes at its earlier pace.

I'm sure 30 will either push the issue further out, or into non-existence, but are there any developers here who can say what this timer does? i.e. how does changing this timer affect the performance of the disk subsystem (aside from allowing it to work, of course)? After I'm done responding to this message, I'll be setting the sysctl to 30.

 It could be temperature related. The drives are being exercised a lot; they could very well be overheating. To find out, add more airflow (a big house fan would do the trick).

Temperature is a good thought, but currently, my physical situation has this:

- 2U chassis
- multiple fans in the case
- in my lab (which is essentially beside my desk)
- the case has no lid
- it is 64 degrees with A/C and circulating fans in this area
- hard drives are separated relatively well inside the case

 It could be that errors are accumulating on the drives, but it seems unlikely that four drives would exhibit the same problem.

That's what I'm thinking. All four drives are exhibiting the same errors... or, for all intents and purposes, the machine is coughing up the same errors for all the drives.

 Also make sure the power supply can handle four drives. Most power supplies that come with consumer boxes can't under full load if you also have a mid- or high-end graphics card installed. Power supplies that come with OEM slap-together enclosures are not usually much better.

I currently have a 550W PSU in the 2U chassis, which again, is sitting open. I have more hardware, running in worse conditions with lower-wattage PSUs, that doesn't exhibit this behavior. I need to determine whether this problem is SATA, ZFS, the motherboard or code.

 Specifically, look at the +5V and +12V amperage maximums on the power supply, then check the disk labels to see what they draw, then multiply by 2.

I'm well within specs, even after V/A tests with the meter. The power supply is providing ample wattage to each device.

 To test, remove two of the four drives, reformat the ZFS to use just two, and see if the problem reoccurs with just two drives.

... I knew that was going to come up... my response is "I worked so hard to get this system with ZFS all configured *exactly* how I wanted it."

To test, I'm going to flip to 30 as per Matthew's recommendation, and see how far that takes me. At this time, I'm only testing by backing up one machine on the network. If it fails, I'll clock the time, and then 'reformat' with two drives.

Is there a technical reason this may work better with only two drives?

Is there anyone interested to the point where remote login would be helpful?

Steve
Re: taskqueue timeout
On Tue, Jul 15, 2008 at 10:29:28PM -0400, Steve Bertrand wrote:

 Is there anyone interested to the point where remote login would be helpful?

I believe my FreeBSD wiki page documents what to do if your problem is easily reproducible: contact Scott Long, who has offered to help track down the source of these problems.

I'll reply to the other part of your mail in a bit.

--
| Jeremy Chadwick                            jdc at parodius.com |
| Parodius Networking                  http://www.parodius.com/ |
| UNIX Systems Administrator            Mountain View, CA, USA  |
| Making life hard for others since 1977.        PGP: 4BD6C0CB  |
Re: taskqueue timeout
Alex Trull wrote:

 Don't want to give conflicting advice, and would suggest you certainly try the 30 sec thing first. I'm already on 10 myself but haven't pushed further.

What were you doing, and what did you notice when the problem started? As much as it seems silly, I'm mostly interested in what your network was doing at the time things went sour.

 In my own case I've not had any issue with ZFS in particular since I applied the ZFS zil/prefetch disable loader.conf tunables 10 hours ago. I am observing this now.

For some reason, and with no explanation or science behind it, I don't think this is a ZFS problem, and I'm trying to defend this thought to my peers until I prove otherwise.

I have to be a bit careful about how I adjust loader properties, given that I'm loading from USB and mounting root from a ZFS zpool hard disk. Like my GELI systems, tweaking things can be a bit touchy unless I put a little more planning into it.

 For the record... what ATA chipset/motherboard and model of disk have you got?

I'm not a hardware person per se, but I'm advised to post that the motherboard is:

- XFX nForce 610i with GeForce 7050

If there is more hardware info I can provide, let me know specifically what I should be looking for.

 Have you seen any SMART errors (real or otherwise)? What do your 'zpool status' counters look like?

zpool status is always clean. There are no errors otherwise, even if the box is up for multiple hours straight. The problem occurs only if I throw work at it.

Steve
Re: taskqueue timeout
Jeremy Chadwick wrote:

 On Tue, Jul 15, 2008 at 10:29:28PM -0400, Steve Bertrand wrote:
  Is there anyone interested to the point where remote login would be helpful?

 I believe my FreeBSD wiki page documents what to do if your problem is easily reproducible: contact Scott Long, who has offered to help track down the source of these problems.

Changing to a 30 second timeout made no difference whatsoever. The problem occurred at about the same time during the single backup run.

I'm at a standstill. I'm willing to help provide any information necessary to fix this issue, or provide remote access to the box in question.

scottl@ has been Cc:'d.

Thanks all,

Steve
Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load
From: John Sullivan [EMAIL PROTECTED]
Date: Tue, 15 Jul 2008 10:58:19 +0100
Sender: [EMAIL PROTECTED]

 I am experiencing 'random' reboots interspersed with panics whenever I put a newly installed system under load (make index in /usr/ports is enough). [...] Although I have been using FreeBSD for a number of years, this is the first time I have experienced regular panics and am at a complete loss trying to work out what is wrong. [...]

 Fatal trap 12: page fault while in kernel mode [...]

Could be memory, but I'd also suggest looking at temperatures. I've had overheating systems produce lots of such errors.

--
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: [EMAIL PROTECTED]  Phone: +1 510 486-8634
Key fingerprint: 059B 2DDF 031C 9BA3 14A4 EADA 927D EBB3 987B 3751
Re: taskqueue timeout
:...
:    and see if the problem reoccurs with just two drives.
:
:... I knew that was going to come up... my response is I worked so hard
:to get this system with ZFS all configured *exactly* how I wanted it.
:
:To test, I'm going to flip to 30 as per Matthew's recommendation, and see
:how far that takes me. At this time, I'm only testing by backing up one
:machine on the network. If it fails, I'll clock the time, and then
:'reformat' with two drives.
:
:Is there a technical reason this may work better with only two drives?
:
:Is there anyone interested to the point where remote login would be helpful?
:
:Steve

    This issue is vexing a lot of people. Setting the timeout to 30 will not affect performance, but it will cause a 30-second delay in recovery when (if) the problem occurs, i.e. when the disk stalls it will just sit there doing nothing for 30 seconds, then it will print the timeout message and try to recover.

    It occurs to me that it might be beneficial to actually measure the disk's response time to each request, and then graph it over a period of time. Maybe seeing the issue visually will give some clue as to the actual cause.

                                        -Matt
                                        Matthew Dillon
                                        [EMAIL PROTECTED]
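Matt's measure-and-graph idea is straightforward to prototype in user space. A hedged sketch (mine, not from the thread; the default device path is an assumption) that times sequential reads from a disk and prints elapsed-time/latency pairs suitable for gnuplot:

    #include <fcntl.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    /*
     * Time each read against a disk device and print
     * "elapsed-seconds latency-ms" pairs for graphing.  Run against
     * an idle disk for a baseline, then under backup load to look
     * for the multi-second stalls described in this thread.
     */
    int
    main(int argc, char **argv)
    {
            const char *dev = argc > 1 ? argv[1] : "/dev/ad4"; /* assumed */
            char buf[65536];
            struct timespec start, t0, t1;
            double lat_ms, elapsed;
            /* O_DIRECT bypasses the buffer cache where supported. */
            int fd = open(dev, O_RDONLY | O_DIRECT);

            if (fd < 0) {
                    perror(dev);
                    return (1);
            }
            clock_gettime(CLOCK_MONOTONIC, &start);
            for (;;) {
                    clock_gettime(CLOCK_MONOTONIC, &t0);
                    if (read(fd, buf, sizeof(buf)) <= 0)
                            break;
                    clock_gettime(CLOCK_MONOTONIC, &t1);
                    lat_ms = (t1.tv_sec - t0.tv_sec) * 1e3 +
                        (t1.tv_nsec - t0.tv_nsec) / 1e6;
                    elapsed = (t1.tv_sec - start.tv_sec) +
                        (t1.tv_nsec - start.tv_nsec) / 1e9;
                    printf("%.3f %.3f\n", elapsed, lat_ms);
            }
            close(fd);
            return (0);
    }

Compile with something like "cc -O2 -o disklat disklat.c", redirect its output to a file while the backup load runs, and plot it; a stall will show up as an isolated point seconds above the baseline.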
Re: taskqueue timeout
Andrew Snow wrote:

 From Western Digital's line of enterprise drives:

 "RAID-specific time-limited error recovery (TLER) - Pioneered by WD, this feature prevents drive fallout caused by the extended hard drive error-recovery processes common to desktop drives."

 Therefore I think the FreeBSD timeout should also be set to 8 seconds instead of 5 seconds. Desktop-targeted drives will not respond for over 10 seconds, up to minutes, so it's not worth setting the FreeBSD timeout any higher.

Interesting you say this. To reiterate, I have /boot on a USB thumb drive, and the system is mounted from / on a raidz pool called /storage via loader.conf.

The four drives in question (per the packaging) are:

 - Western Digital Caviar SE16 500GB
 - 7200 RPM, 16MB cache, SATA-300, OEM

Per the packaging on the rest of the hardware:

 # mobo
 - XFX 610i, 7050 GeForce (I *never* use graphics on my FreeBSD boxen; I *only* know/have CLI with no 'windows')

 # memory
 - 2 GB Corsair XMS2 Twin2X 6400C4 memory

 # cpu
 - Intel Pentium DC E2200 2.20GHz OEM
 - 2.20 GHz, 1MB Cache, 800MHz FSB, Allendale, Dual Core, OEM, Socket 775, Processor

 # swap
 - I don't run any, but can/will add an IDE/ATA 7200 200GB drive in the event this problem may be related to ZFS/RAM issues.

Steve
HP Pavilion dv2000 laptop wont boot off install cd
Laptop details: HP Pavilion dv2000 (dv2422ca)

Specifications (taken from http://h10025.www1.hp.com/ewfrf/wc/document?cc=au&docname=c01070158&dlc=en&lc=en&jumpid=reg_R1002_AUEN):

 Product Name: dv2422ca
 Product Number: GM039UA#ABC / GM039UA#ABL
 Microprocessor: 1.8 GHz AMD Turion 64 X2 Dual-Core Mobile Technology TL-56
 Microprocessor Cache: 512KB+512KB L2 Cache
 Memory: 2048 MB DDR2 System Memory (2 Dimm)

I tried to boot from 7.0-RELEASE-amd64, 7.0-RELEASE-i386 and 6.2-RELEASE-i386 install disks (about to try 6.3-RELEASE-amd64). I could not successfully boot the computer using any of the install disks mentioned. Sometimes there would be a memory dump (scrolling infinitely); sometimes I would get the following message(s):

 elf_32_lookup_symbol : corrupt symbol table
 loading required module 'pci'
 ACPI autoload failed - no such file or directory

 int=0006  err=  efl=00010002  eip=0003
 eax=00449130  ebx=  ecx=004f010f  edx=0003fa40
 esi=  edi=  ebp=  esp=000928b0
 cs=0008 ds=0010 es=0010 fs=0010 gs=0010 ss=0010
 cs:eip= f0 53 ff 00 f0 c3 e2 00-f0 53 ff 00 f0 53 ff 00
         f0 54 ff 00 f0 8a a8 00-f0 53 ff 00 f0 a5 fe 00
 ss:esp= 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
         00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
 BTX halted

There is no significant BIOS option in this laptop that I can think of to at least begin to troubleshoot this issue. The laptop works fine for other operating systems as far as I can tell. Initial documentation suggests that this laptop should work; however, I'd like to get some more insight from freebsd-stable before continuing.

If any additional information is required, please let me know.

Cheers,
Kevin K.
Re: taskqueue timeout
Matthew Dillon wrote:

 This issue is vexing a lot of people.

Heh... I can appreciate this. I would like someone to inform me that this can't be guaranteed to be a ZFS problem... if I can get confirmation that others have this issue aside from ZFS, I would feel content.

 Setting the timeout to 30 will not affect performance, but it will cause a 30-second delay in recovery when (if) the problem occurs, i.e. when the disk stalls it will just sit there doing nothing for 30 seconds, then it will print the timeout message and try to recover.

If I have the timeout at >= 30 and the issue still occurs, the problem must be elsewhere.

 It occurs to me that it might be beneficial to actually measure the disk's response time to each request, and then graph it over a period of time. Maybe seeing the issue visually will give some clue as to the actual cause.

I am interested in following through with this, but can't do it on my own. I'm willing to dedicate the box and bandwidth to anyone who can legitimately test this as you state, i.e. I need either guidance or assistance. This box is ready for the taking.

Beyond this box, I can provide legitimate parties other network resources to produce a consistent flow of data, to ensure the ability to easily reproduce the issue locally, on demand.

Steve