Re: zfs arc and amount of wired memory
on 09/02/2012 06:27 Eugene M. Zheganin said the following:
> The output I promised (if it's MORE acceptable in the form of a link to a paste site, just say it):

I prefer links, but both ways are acceptable to me.

Just one more hint on the reporting. The most useful reports are coherent reports. That is, I now have your older reports from top and zfs-stats and I have newer vmstat reports. But I do not have all the reports taken at about the same time, so I don't have a coherent picture of the system state.

--
Andriy Gapon
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: zfs arc and amount of wired memory
on 09/02/2012 10:33 Andriy Gapon said the following:
> on 09/02/2012 06:27 Eugene M. Zheganin said the following:
>> The output I promised (if it's MORE acceptable in the form of a link to a paste site, just say it):
>
> I prefer links, but both ways are acceptable to me.
>
> Just one more hint on the reporting. The most useful reports are coherent reports. That is, I now have your older reports from top and zfs-stats and I have newer vmstat reports. But I do not have all the reports taken at about the same time, so I don't have a coherent picture of the system state.

And please take the reports once the discrepancy between ARC size and wired size is large enough, e.g. 1GB. That's when they are useful.

--
Andriy Gapon
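One way to tell when the gap has reached that size is to subtract the ARC size from the amount of wired memory. The following is only a sketch: the helper name is hypothetical, and the sysctl names in the comment (vm.stats.vm.v_wire_count, hw.pagesize, kstat.zfs.misc.arcstats.size) are assumptions that are not mentioned anywhere in this thread.

```sh
#!/bin/sh
# wired_minus_arc_mb WIRED_PAGES PAGE_SIZE ARC_BYTES
# Prints how many MB of wired memory are not accounted for by the ARC.
# (Hypothetical helper, not part of any FreeBSD tool.)
wired_minus_arc_mb() {
    echo $(( ($1 * $2 - $3) / 1048576 ))
}

# On the affected machine the inputs could come from sysctl (assumed names):
#   wired_minus_arc_mb "$(sysctl -n vm.stats.vm.v_wire_count)" \
#                      "$(sysctl -n hw.pagesize)" \
#                      "$(sysctl -n kstat.zfs.misc.arcstats.size)"
```

A result of 1024 or more would correspond to the "e.g. 1GB" threshold suggested above.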
Re: zfs arc and amount of wired memory
Hi.

On 09.02.2012 14:35, Andriy Gapon wrote:
> And please take the reports once the discrepancy between ARC size and wired size is large enough, e.g. 1GB. That's when they are useful.

Okay, I wrote a short script that captures the output of top -b, zfs-stats -a, vmstat -m and vmstat -z in a timestamped file, and put it in crontab to run every hour. I will provide the files it creates (or a subset, if there are too many) after the system enters a deadlock again. That takes anywhere from one to two weeks.

Thanks.
Eugene.
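Eugene's script is not shown in the thread. A capture script of that kind could look roughly like the sketch below; the function name, the output directory and the crontab path are all assumptions made for illustration.

```sh
#!/bin/sh
# capture_state DIR CMD...
# Runs each CMD (a quoted command line) and appends its output, under a
# header naming the command, to DIR/state-YYYYmmdd-HHMMSS.txt.
# Prints the path of the file it wrote. (Hypothetical helper.)
capture_state() {
    dir=$1
    shift
    out="$dir/state-$(date +%Y%m%d-%H%M%S).txt"
    for cmd in "$@"; do
        printf '===== %s =====\n' "$cmd" >> "$out"
        # Record failures instead of aborting, so one missing tool
        # does not lose the rest of the snapshot.
        sh -c "$cmd" >> "$out" 2>&1 || printf '(command failed)\n' >> "$out"
    done
    printf '%s\n' "$out"
}

# Hourly crontab entry (path and script name are assumptions):
#   0 * * * * /root/bin/capture_state.sh
# with the script ending in:
#   capture_state /var/log/zfs-state "top -b" "zfs-stats -a" "vmstat -m" "vmstat -z"
```

Writing all four outputs into one file per run keeps them taken at about the same time, which is the "coherent report" asked for earlier in the thread.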
Re: zfs arc and amount of wired memory
Hi.

On 09.02.2012 14:35, Andriy Gapon wrote:
> And please take the reports once the discrepancy between ARC size and wired size is large enough, e.g. 1GB. That's when they are useful.

One more thing - this machine is running a debug/ddb kernel, so, just to avoid losing another two weeks: when/if it enters a deadlock, do you (or anyone else) need a crashdump, or anything else I can provide using ddb while it is deadlocked?

Thanks.
Eugene.
Re: zfs arc and amount of wired memory
Hi,

if you are not using USB3 and a fast memory stick, it will be slower than swapping to disk.

Bye,
Alexander.

--
Sent via an Android device, please forgive brevity and typographic and spelling errors.

Freddie Cash fjwc...@gmail.com wrote:
> On Wed, Feb 8, 2012 at 10:25 AM, Eugene M. Zheganin e...@norma.perm.ru wrote:
>> On 08.02.2012 18:15, Alexander Leidinger wrote:
>>> I can't remember having seen any mention of SWAP on ZFS being safe now. So if nobody can provide a reference to a place which says that the problems with SWAP on ZFS are fixed:
>>> 1. do not use SWAP on ZFS
>>> 2. see 1.
>>> 3. check if you see the same problem without SWAP on ZFS (btw. see 1.)
>>
>> So, if swap has to be used, and it has to be backed by something like gmirror so it won't go down with one of the disks, there's no need to use zfs for the system. This makes zfs only useful in cases where you need to store something on a couple+ of terabytes, still having the OS on ufs. Occam's razor and so on.
>
> Or, you plug a USB stick into the back (or even inside the case, as a lot of mobos have internal USB connectors now) and use that for swap.
>
> --
> Freddie Cash
> fjwc...@gmail.com
Re: zfs arc and amount of wired memory
Hi,

this only applies to old systems (slooow disks, no NCQ support), or to very fast USB3 memory sticks. Current hardware (I would say anything from the last 2-3 years at least) is slowed down by USB2.

Bye,
Alexander.

--
Sent via an Android device, please forgive brevity and typographic and spelling errors.

Freddie Cash fjwc...@gmail.com wrote:
> On Wed, Feb 8, 2012 at 10:40 AM, Freddie Cash fjwc...@gmail.com wrote:
>> On Wed, Feb 8, 2012 at 10:25 AM, Eugene M. Zheganin e...@norma.perm.ru wrote:
>>> On 08.02.2012 18:15, Alexander Leidinger wrote:
>>>> I can't remember having seen any mention of SWAP on ZFS being safe now. So if nobody can provide a reference to a place which says that the problems with SWAP on ZFS are fixed:
>>>> 1. do not use SWAP on ZFS
>>>> 2. see 1.
>>>> 3. check if you see the same problem without SWAP on ZFS (btw. see 1.)
>>>
>>> So, if swap has to be used, and it has to be backed by something like gmirror so it won't go down with one of the disks, there's no need to use zfs for the system. This makes zfs only useful in cases where you need to store something on a couple+ of terabytes, still having the OS on ufs. Occam's razor and so on.
>>
>> Or, you plug a USB stick into the back (or even inside the case, as a lot of mobos have internal USB connectors now) and use that for swap.
>
> That also works well for adding L2ARC (cache) to the ZFS pool.
>
> --
> Freddie Cash
> fjwc...@gmail.com
Re: zfs arc and amount of wired memory
Hi,

a possible solution would be to start a wiki page with what you know, e.g. a page which explains that solaris and zio* belong to ZFS. Over time people can extend it with additional info.

Bye,
Alexander.

--
Sent via an Android device, please forgive brevity and typographic and spelling errors.

Jeremy Chadwick free...@jdc.parodius.com wrote:
> On Wed, Feb 08, 2012 at 10:29:36PM +0200, Andriy Gapon wrote:
>> on 08/02/2012 12:31 Eugene M. Zheganin said the following:
>>> Hi.
>>>
>>> On 08.02.2012 02:17, Andriy Gapon wrote:
>>>> [output snipped]
>>>>
>>>> Thank you. I don't see anything suspicious/unusual there.
>>>> Just in case, do you have ZFS dedup enabled by any chance?
>>>>
>>>> I think that examination of vmstat -m and vmstat -z outputs may provide some clues as to what got all that memory wired.
>>>
>>> Nope, I don't have the deduplication feature enabled.
>>
>> OK. So, did you have a chance to inspect vmstat -m and vmstat -z?
>
> Andriy,
>
> Politely -- recommending this to a user is a good choice of action, but the problem is that no user, even an experienced user, is going to know what all of the Types (vmstat -m) or ITEMs (vmstat -z) correlate with on the system. For example, for vmstat -m, the Type name is solaris. For vmstat -z, the ITEMs are named zio_*, but I have a feeling there are more than just those which pertain to ZFS. I'm having to make *assumptions*.
>
> The FreeBSD VM is highly complex and is not easy to understand even remotely. It becomes more complex when you consider that we use terms like wired, active, inactive, cache, and free -- and none of them, in simple English terms, actually represent the words chosen for what they do.
>
> Furthermore, the only definition I've been able to find over the years for how any of these work, what they do/mean, etc. is here:
>
> http://www.freebsd.org/doc/en/books/arch-handbook/vm.html
>
> And this piece of documentation is only useful for people who understand VMs (note: it was written by Matt Dillon, for example). It is not useful for end-users trying to track down what within the kernel is actually eating up memory. vmstat -m is as good as it's going to get, and like I said, with the Type names being borderline ambiguous (depending on what you're looking for -- with VFS and so on it's spread all over the place), this becomes a very tedious task, where the user or admin has to continually ask developers on the mailing lists what it is they're looking at.
>
> --
> | Jeremy Chadwick j...@parodius.com |
> | Parodius Networking http://www.parodius.com/ |
> | UNIX Systems Administrator Mountain View, CA, US |
> | Making life hard for others since 1977. PGP 4BD6C0CB |
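As a rough illustration of what inspecting vmstat -z for ZFS-related zones could involve, the sketch below totals the memory held by zones whose names start with zio_. It assumes the usual "name: SIZE, LIMIT, USED, FREE, ..." line layout and a hypothetical helper name, and, as noted above, zio_* is probably not the complete set of ZFS-related zones, so the result is a lower bound at best.

```sh
#!/bin/sh
# zio_bytes: reads `vmstat -z` output on stdin and prints the total bytes
# held by UMA zones whose name starts with "zio_", computed as
# SIZE * (USED + FREE). Assumes lines of the form
#   name: SIZE, LIMIT, USED, FREE, ...
zio_bytes() {
    awk -F'[:,]' '$1 ~ /^zio_/ { total += $2 * ($4 + $5) }
                  END { printf "%.0f\n", total }'
}

# Usage on a live system:  vmstat -z | zio_bytes
```

For vmstat -m, the corresponding check would be to look at the line for the solaris type.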
Re: zfs arc and amount of wired memory
Hi,

feel free to register with FirstnameLastname in the wiki and tell us about it. We provide write access to people who seriously want to help improve the wiki content.

Bye,
Alexander.

--
Sent via an Android device, please forgive brevity and typographic and spelling errors.

Charles Sprickman sp...@bway.net wrote:
> On Feb 8, 2012, at 7:43 PM, Artem Belevich wrote:
>> On Wed, Feb 8, 2012 at 4:28 PM, Jeremy Chadwick free...@jdc.parodius.com wrote:
>>> On Thu, Feb 09, 2012 at 01:11:36AM +0100, Miroslav Lachman wrote:
>>>> ...
>>>> ARC Size:
>>>>     Current Size:             1769 MB (arcsize)
>>>>     Target Size (Adaptive):   512 MB (c)
>>>>     Min Size (Hard Limit):    512 MB (zfs_arc_min)
>>>>     Max Size (Hard Limit):    3584 MB (zfs_arc_max)
>>>>
>>>> The target size is going down to the min size and after a few more days the system is so slow that I must reboot the machine. Then it runs fine for about 107 days and then it all repeats again.
>>>>
>>>> You can see more on MRTG graphs
>>>> http://freebsd.quip.cz/ext/2012/2012-02-08-kiwi-mrtg-12-15/
>>>> You can see links to other useful information at the top of the page (arc_summary, top, dmesg, fs usage, loader.conf)
>>>>
>>>> There you can see nightly backups (higher CPU load started at 01:13), otherwise the machine is idle.
>>>>
>>>> It corresponds with the ARC target size lowering in the last 5 days
>>>> http://freebsd.quip.cz/ext/2012/2012-02-08-kiwi-mrtg-12-15/local_zfs_arcstats_size.html
>>>>
>>>> And with the ARC metadata cache overflowing the limit in the last 5 days
>>>> http://freebsd.quip.cz/ext/2012/2012-02-08-kiwi-mrtg-12-15/local_zfs_vfs_meta.html
>>>>
>>>> I don't know what's going on and I don't know if it is something known / fixed in newer releases. We are running a few more ZFS systems on 8.2 without this issue. But those systems are in different roles.
>>>
>>> This sounds like the... damn, what is it called... some kind of internal counter or ticks thing within the ZFS code that was discovered to only begin happening after a certain period of time (which correlated to some number of days, possibly 107). I'm sorry that I can't be more specific, but it's been discussed heavily on the lists in the past, and fixes for all of that were committed to RELENG_8. I wish I could remember the name of the function or macro or variable name it pertained to, something like LTHAW or TLOCK or something like that. I would say I don't know why I can't remember, but I do know why I can't remember: because I gave up trying to track all of these problems.
>>>
>>> Does someone else remember this issue? CC'ing Martin who might remember for certain.
>>
>> It's LBOLT. :-)
>>
>> And there was more than one related integer overflow. One of them manifested itself as the L2ARC feeding thread hogging CPU time after about a month of uptime. Another one caused an issue with ARC reclaim after 107 days. See more details in this thread:
>>
>> http://lists.freebsd.org/pipermail/freebsd-fs/2011-May/011584.html
>>
>> --Artem
>
> This would be an excellent piece of information to have on one of the ZFS wiki pages. The 107 day issue exists post-8.2, correct?
>
> Anyone on this cc: list have permissions to edit those pages?
>
> Thanks,
>
> Charles
Re: siisch1: Error while READ LOG EXT
On 2/8/2012 5:46 PM, Alexander Motin wrote:
> READ LOG EXT for NCQ, same as REQUEST SENSE for ATAPI, is sent by each specific controller driver. In this case by the siis_issue_recovery() function in dev/siis/siis.c. In case of proper READ LOG EXT completion, the fetched status is returned to CAM together with the original command.

Hi,
Is there a way to find out which drive is causing these errors? Looking at the logs on the various drives, they all seem to have the odd non-zero value. I suspect it might be a Seagate disk, as smartctl flags it as having bad firmware issues:

    === START OF INFORMATION SECTION ===
    Model Family:     Seagate Barracuda 7200.11
    Device Model:     ST31000333AS
    Serial Number:    9TE14SRV
    LU WWN Device Id: 5 000c50 010a39664
    Firmware Version: SD35
    User Capacity:    1,000,204,886,016 bytes [1.00 TB]
    Sector Size:      512 bytes logical/physical
    Device is:        In smartctl database [for details use: -P show]
    ATA Version is:   8
    ATA Standard is:  ATA-8-ACS revision 4
    Local Time is:    Thu Feb 9 09:40:56 2012 EST

    == WARNING: There are known problems with these drives,
    see the following Seagate web pages:
    http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207931
    http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207951
    http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207957

--
---
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, m...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/
Re: siisch1: Error while READ LOG EXT
On Thu, Feb 09, 2012 at 09:43:01AM -0500, Mike Tancsa wrote:
> On 2/8/2012 5:46 PM, Alexander Motin wrote:
>> READ LOG EXT for NCQ, same as REQUEST SENSE for ATAPI, is sent by each specific controller driver. In this case by the siis_issue_recovery() function in dev/siis/siis.c. In case of proper READ LOG EXT completion, the fetched status is returned to CAM together with the original command.
>
> Hi,
> Is there a way to find out which drive is causing these errors? Looking at the logs on the various drives, they all seem to have the odd non-zero value. I suspect it might be a Seagate disk, as smartctl flags it as having bad firmware issues:
>
>     === START OF INFORMATION SECTION ===
>     Model Family:     Seagate Barracuda 7200.11
>     Device Model:     ST31000333AS
>     Serial Number:    9TE14SRV
>     LU WWN Device Id: 5 000c50 010a39664
>     Firmware Version: SD35
>     User Capacity:    1,000,204,886,016 bytes [1.00 TB]
>     Sector Size:      512 bytes logical/physical
>     Device is:        In smartctl database [for details use: -P show]
>     ATA Version is:   8
>     ATA Standard is:  ATA-8-ACS revision 4
>     Local Time is:    Thu Feb 9 09:40:56 2012 EST
>
>     == WARNING: There are known problems with these drives,
>     see the following Seagate web pages:
>     http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207931
>     http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207951
>     http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207957

The URLs listed are for firmware-level problems with this model of Seagate drive. This is a very famous firmware issue and got a lot of media attention. The bugs in that firmware, however, would not appear as what you are seeing.

You stated in your original mail that you added a port multiplier and then started getting these errors. You then provided SMART output of /dev/ada9, so I made the assumption you had managed to figure out what device was causing the problem.

I have to assume that devices connected on a port multiplier show up on a separate scbusX number. This is from your original mail:

    # camcontrol devlist
    WDC WD2001FASS-00U0B0 01.00101  at scbus0 target 0 lun 0 (pass0,ada0)
    WDC WD2001FASS-00U0B0 01.00101  at scbus0 target 1 lun 0 (pass1,ada1)
    WDC WD2001FASS-00U0B0 01.00101  at scbus0 target 2 lun 0 (pass2,ada2)
    WDC WD2001FASS-00U0B0 01.00101  at scbus0 target 3 lun 0 (pass3,ada3)
    Port Multiplier 47261095 1f06   at scbus0 target 15 lun 0 (pass4,pmp1)
    WDC WD2002FAEX-007BA0 05.01D05  at scbus1 target 0 lun 0 (pass5,ada4)
    WDC WD2002FAEX-007BA0 05.01D05  at scbus1 target 1 lun 0 (pass6,ada5)
    WDC WD2002FAEX-007BA0 05.01D05  at scbus1 target 2 lun 0 (pass7,ada6)
    WDC WD2002FAEX-007BA0 05.01D05  at scbus1 target 3 lun 0 (pass8,ada7)
    WDC WD2002FAEX-007BA0 05.01D05  at scbus1 target 4 lun 0 (pass9,ada8)
    Port Multiplier 37261095 1706   at scbus1 target 15 lun 0 (pass10,pmp0)
    Areca usrvar R001               at scbus4 target 0 lun 0 (pass11,da0)
    Areca backup1 R001              at scbus4 target 0 lun 1 (pass12,da1)
    Areca RAID controller R001      at scbus4 target 16 lun 0 (pass13)
    AMCC 9650SE-2LP DISK 4.10       at scbus5 target 0 lun 0 (pass14,da2)
    ST31000333AS SD35               at scbus6 target 0 lun 0 (pass15,ada9)
    ST31000528AS CC35               at scbus7 target 0 lun 0 (pass16,ada10)
    ST31000340AS SD1A               at scbus8 target 0 lun 0 (pass17,ada11)
    WDC WD1002FAEX-00Z3A0 05.01D05  at scbus11 target 0 lun 0 (pass18,ada12)

Based on this, and assuming my understanding of how this setup works -- and please note I could be wrong, these port multiplier things I have no familiarity with personally -- it looks (to me) like this:

    scbus0  -- Associated with Port Multiplier device pmp1
            -- Disk ada0
            -- Disk ada1
            -- Disk ada2
            -- Disk ada3
    scbus1  -- Associated with Port Multiplier device pmp0
            -- Disk ada4
            -- Disk ada5
            -- Disk ada6
            -- Disk ada7
            -- Disk ada8
    scbus4  -- Appears to be an Areca controller of some kind, in RAID
            -- Disk da0, volume usrvar
            -- Disk da1, volume backup1
    scbus5  -- Not sure what this thing is
            -- Disk or thing da2
    scbus6  -- Disk ada9
    scbus7  -- Disk ada10
    scbus8  -- Disk ada11
    scbus11 -- Disk ada12

So which Port Multiplier did you add? The one at scbus0 or scbus1?

A full dmesg (not just a snippet) would probably be helpful here. What you provided in your first post was too terse, especially given how many disks you have in this system. :-)

I really see no problem with looking at all disks -- specifically disks ada0 through ada3, and ada4 through ada8 -- to determine which one may be having problems. You're welcome to run smartctl -a on each one and put them up on the web, preferably segregated by disk name (e.g. ada0.txt, ada1.txt, etc.) and I can review them all.

--
| Jeremy Chadwick j...@parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems
Re: siisch1: Error while READ LOG EXT
On 2/9/2012 10:22 AM, Jeremy Chadwick wrote:
> I have to assume that devices connected on a port multiplier show up on a separate scbusX number. This is from your original mail:
>
> Based on this, and assuming my understanding of how this setup works -- and please note I could be wrong, these port multiplier things I have no familiarity with personally -- it looks (to me) like this:
>
> scbus0 -- Associated with Port Multiplier device pmp1
>        -- Disk ada0
>        -- Disk ada1
>        -- Disk ada2
>        -- Disk ada3

Correct. This is the original hardware. It too was showing the odd error prior to adding the new set of disks to expand the zfs pool. E.g. here are some errors on the original PM:

    Feb 4 22:55:02 backup3 kernel: siisch0: Timeout on slot 24
    Feb 4 22:55:02 backup3 kernel: siisch0: siis_timeout is 0004 ss 25002a00 rs 25002a00 es sts 80182000 serr
    Feb 4 22:55:02 backup3 kernel: siisch0: ... waiting for slots 24002a00
    Feb 4 22:55:02 backup3 kernel: siisch0: Timeout on slot 13
    Feb 4 22:55:02 backup3 kernel: siisch0: siis_timeout is 0004 ss 25002a00 rs 25002a00 es sts 80182000 serr
    Feb 4 22:55:02 backup3 kernel: siisch0: ... waiting for slots 24000a00
    Feb 4 22:55:02 backup3 kernel: siisch0: Timeout on slot 29
    Feb 4 22:55:02 backup3 kernel: siisch0: siis_timeout is 0004 ss 25002a00 rs 25002a00 es sts 80182000 serr
    Feb 4 22:55:02 backup3 kernel: siisch0: ... waiting for slots 04000a00
    Feb 4 22:55:02 backup3 kernel: siisch0: Timeout on slot 11

> scbus1 -- Associated with Port Multiplier device pmp0
>        -- Disk ada4
>        -- Disk ada5
>        -- Disk ada6
>        -- Disk ada7
>        -- Disk ada8

Correct, this is the new PM. 4 disks in use, and one spare.

> scbus4 -- Appears to be an Areca controller of some kind, in RAID

Yes.

>        -- Disk da0, volume usrvar
>        -- Disk da1, volume backup1
> scbus5 -- Not sure what this thing is

3ware with a pair of faster disks that holds a large DB to slice and dice netflow data.

>        -- Disk or thing da2
> scbus6
> scbus7
> scbus8
> scbus11 -- Disk ada12

Disks off the motherboard.

> So which Port Multiplier did you add? The one at scbus0 or scbus1?

1

    WDC WD2002FAEX-007BA0 05.01D05  at scbus1 target 0 lun 0 (pass5,ada4)
    WDC WD2002FAEX-007BA0 05.01D05  at scbus1 target 1 lun 0 (pass6,ada5)
    WDC WD2002FAEX-007BA0 05.01D05  at scbus1 target 2 lun 0 (pass7,ada6)
    WDC WD2002FAEX-007BA0 05.01D05  at scbus1 target 3 lun 0 (pass8,ada7)
    WDC WD2002FAEX-007BA0 05.01D05  at scbus1 target 4 lun 0 (pass9,ada8)
    Port Multiplier 37261095 1706   at scbus1 target 15 lun 0 (pass10,pmp0)

> A full dmesg (not just a snippet) would probably be helpful here. What you provided in your first post was too terse, especially given how many disks you have in this system. :-)
>
> I really see no problem with looking at all disks -- specifically disks ada0 through ada3, and ada4 through ada8 -- to determine which one may be having problems. You're welcome to run smartctl -a on each one and put them up on the web, preferably segregated by disk name (e.g. ada0.txt, ada1.txt, etc.) and I can review them all.

Actually, I just had a look at another server at our DR site. Its hardware has not changed in a bit, but I did bring the kernel up to date. It's now logging the odd 'READ LOG EXT' error as well. Its kernel is from Jan 22. Prior to that kernel update, I had not seen these errors. Perhaps something in the driver (ahci or cam layer?) has changed?

    Feb 4 11:12:36 offsite kernel: siisch1: Error while READ LOG EXT

The output is in one giant txt file, but each section has the heading of the disk:

    (for i in `jot 10 0`; do echo ada$i == >> d.rep; smartctl -x /dev/ada$i >> d.rep; smartctl -l gplog,0x10 /dev/ada$i >> d.rep; done;)

http://www.tancsa.com/ahci.txt

---Mike

--
---
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, m...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/
Re: serious packet routing issue causing ntpd high load?
Hi Vlad,

Sorry about the delayed response. No, this one just fell through the cracks. Has anyone responded? Does it still exist in 9.x?

--Qing

On Mon, Feb 6, 2012 at 10:16 AM, Vlad Galu d...@dudu.ro wrote:
> Hi Qing,
>
> Any luck with this?
>
> Thanks
> Vlad
>
> On Thu, Nov 3, 2011 at 2:05 PM, Li, Qing qing...@bluecoat.com wrote:
>> This endless route lookup miss message problem is reproducible without FLOWTABLE. The problem is with the multiple FIBs. I cannot reproduce this problem in my home network but the problem is easily seen at work.
>>
>> The route lookup miss itself in a multi-FIB configuration may be normal depending on the actual system configuration. It's the flooding of RTM_MISS messages that is abnormal. For example, if the route to the DNS servers is not configured in all FIBs, then the RTM_MISS message will be generated when a userland application sends to an explicit IP address in a specific FIB.
>>
>> In any case, I can reproduce the issue consistently and am just trying to get a few uninterrupted hours to get it done.
>>
>> --Qing
Re: siisch1: Error while READ LOG EXT
On Thu, Feb 09, 2012 at 07:22:40AM -0800, Jeremy Chadwick wrote:
> I have to assume that devices connected on a port multiplier show up on a separate scbusX number. This is from your original mail:
>
>     # camcontrol devlist
>     WDC WD2001FASS-00U0B0 01.00101  at scbus0 target 0 lun 0 (pass0,ada0)
>     WDC WD2001FASS-00U0B0 01.00101  at scbus0 target 1 lun 0 (pass1,ada1)
>     WDC WD2001FASS-00U0B0 01.00101  at scbus0 target 2 lun 0 (pass2,ada2)
>     WDC WD2001FASS-00U0B0 01.00101  at scbus0 target 3 lun 0 (pass3,ada3)
>     Port Multiplier 47261095 1f06   at scbus0 target 15 lun 0 (pass4,pmp1)
>     WDC WD2002FAEX-007BA0 05.01D05  at scbus1 target 0 lun 0 (pass5,ada4)
>     WDC WD2002FAEX-007BA0 05.01D05  at scbus1 target 1 lun 0 (pass6,ada5)
>     WDC WD2002FAEX-007BA0 05.01D05  at scbus1 target 2 lun 0 (pass7,ada6)
>     WDC WD2002FAEX-007BA0 05.01D05  at scbus1 target 3 lun 0 (pass8,ada7)
>     WDC WD2002FAEX-007BA0 05.01D05  at scbus1 target 4 lun 0 (pass9,ada8)
>     Port Multiplier 37261095 1706   at scbus1 target 15 lun 0 (pass10,pmp0)
>     Areca usrvar R001               at scbus4 target 0 lun 0 (pass11,da0)
>     Areca backup1 R001              at scbus4 target 0 lun 1 (pass12,da1)
>     Areca RAID controller R001      at scbus4 target 16 lun 0 (pass13)
>     AMCC 9650SE-2LP DISK 4.10       at scbus5 target 0 lun 0 (pass14,da2)
>     ST31000333AS SD35               at scbus6 target 0 lun 0 (pass15,ada9)
>     ST31000528AS CC35               at scbus7 target 0 lun 0 (pass16,ada10)
>     ST31000340AS SD1A               at scbus8 target 0 lun 0 (pass17,ada11)
>     WDC WD1002FAEX-00Z3A0 05.01D05  at scbus11 target 0 lun 0 (pass18,ada12)
>
> Based on this, and assuming my understanding of how this setup works -- and please note I could be wrong, these port multiplier things I have no familiarity with personally -- it looks (to me) like this:
>
> scbus5 -- Not sure what this thing is
>        -- Disk or thing da2

3ware 9650SE controller (twa driver, I believe)

Gary
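The bus-to-disk grouping worked out by hand above can also be pulled out of the devlist mechanically. This is a minimal sketch with a hypothetical function name; it assumes every device line contains "at scbusN" and ends with the parenthesised peripheral list, as in the listing quoted above.

```sh
#!/bin/sh
# devs_by_bus: reads `camcontrol devlist` output on stdin and prints one
# "scbusN peripherals" pair per device line. Lines without an "at scbusN"
# field (such as the command prompt line) are skipped.
devs_by_bus() {
    awk '{
        bus = ""
        for (i = 1; i < NF; i++)
            if ($i == "at" && $(i + 1) ~ /^scbus[0-9]+$/) bus = $(i + 1)
        if (bus == "") next
        dev = $NF
        gsub(/[()]/, "", dev)   # "(pass0,ada0)" becomes "pass0,ada0"
        print bus, dev
    }'
}

# Usage on a live system:  camcontrol devlist | devs_by_bus | sort
```

Disks that share a bus with a pmpN entry are the ones sitting behind that port multiplier.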
Re: siisch1: Error while READ LOG EXT
On Thu, Feb 09, 2012 at 11:12:06AM -0500, Mike Tancsa wrote:
{snipping}
>> So which Port Multiplier did you add? The one at scbus0 or scbus1?
>
> 1
>
>     WDC WD2002FAEX-007BA0 05.01D05  at scbus1 target 0 lun 0 (pass5,ada4)
>     WDC WD2002FAEX-007BA0 05.01D05  at scbus1 target 1 lun 0 (pass6,ada5)
>     WDC WD2002FAEX-007BA0 05.01D05  at scbus1 target 2 lun 0 (pass7,ada6)
>     WDC WD2002FAEX-007BA0 05.01D05  at scbus1 target 3 lun 0 (pass8,ada7)
>     WDC WD2002FAEX-007BA0 05.01D05  at scbus1 target 4 lun 0 (pass9,ada8)
>     Port Multiplier 37261095 1706   at scbus1 target 15 lun 0 (pass10,pmp0)

I'll provide analysis for all 5 of these disks below.

>> A full dmesg (not just a snippet) would probably be helpful here. What you provided in your first post was too terse, especially given how many disks you have in this system. :-)
>>
>> I really see no problem with looking at all disks -- specifically disks ada0 through ada3, and ada4 through ada8 -- to determine which one may be having problems. You're welcome to run smartctl -a on each one and put them up on the web, preferably segregated by disk name (e.g. ada0.txt, ada1.txt, etc.) and I can review them all.
>
> Actually, I just had a look at another server at our DR site. Its hardware has not changed in a bit, but I did bring the kernel up to date. It's now logging the odd 'READ LOG EXT' error as well. Its kernel is from Jan 22. Prior to that kernel update, I had not seen these errors. Perhaps something in the driver (ahci or cam layer?) has changed?
>
>     Feb 4 11:12:36 offsite kernel: siisch1: Error while READ LOG EXT

Perhaps, but mav@ would be the authority on that.

> http://www.tancsa.com/ahci.txt

So here are the results of analysis for disks ada4 through ada8:

ada4
-- When the errors below happened is 100% unknown. Just noting that here.
-- SMART attribute 199 shows 13 CRC errors. These would be caused by issues between the disk and the device it's attached to (port multiplier I guess). Causes could be bad SATA cables, bad ports, dirty/dusty ports, or flaky PCB (on the disk itself).
-- SATA PHY log/counters confirm the above problem:

       ID      Size  Value  Description
       0x0001  2     13     Command failed due to ICRC error
       0x0002  2     13     R_ERR response for data FIS
       0x0003  2     13     R_ERR response for device-to-host data FIS

-- Given this behaviour, possibly the ATA commands submitted which experienced errors were NCQ-related.
-- The NCQ command error log does have non-zero values in it. The format of the output is proprietary, sadly, and smartmontools does not know how to decode it. But, compare it to your other drives and you'll see there is non-zero data there.
-- This is a likely candidate for the behaviour seen on this PM.

ada5
-- When the errors below happened is 100% unknown. Just noting that here.
-- SMART attribute 199 shows 11 CRC errors. These would be caused by issues between the disk and the device it's attached to (port multiplier I guess). Causes could be bad SATA cables, bad ports, dirty/dusty ports, or flaky PCB (on the disk itself).
-- SATA PHY log/counters confirm the above problem:

       ID      Size  Value  Description
       0x0001  2     11     Command failed due to ICRC error
       0x0002  2     11     R_ERR response for data FIS
       0x0003  2     11     R_ERR response for device-to-host data FIS

-- Given this behaviour, possibly the ATA commands submitted which experienced errors were NCQ-related.
-- The NCQ command error log does have non-zero values in it. The format of the output is proprietary, sadly, and smartmontools does not know how to decode it. But, compare it to your other drives and you'll see there is non-zero data there.
-- This is a likely candidate for the behaviour seen on this PM.

ada6
-- When the errors below happened is 100% unknown. Just noting that here.
-- SMART attribute 199 shows 8 CRC errors. These would be caused by issues between the disk and the device it's attached to (port multiplier I guess). Causes could be bad SATA cables, bad ports, dirty/dusty ports, or flaky PCB (on the disk itself).
-- SATA PHY log/counters confirm the above problem:

       ID      Size  Value  Description
       0x0001  2     8      Command failed due to ICRC error
       0x0002  2     8      R_ERR response for data FIS
       0x0003  2     8      R_ERR response for device-to-host data FIS

-- Given this behaviour, possibly the ATA commands submitted which experienced errors were NCQ-related.
-- The NCQ command error log does have non-zero values in it. The format of the output is proprietary, sadly, and smartmontools does not know how to decode it. But, compare it to your other drives and you'll see there is non-zero data there.
-- This is a likely candidate for
Re: siisch1: Error while READ LOG EXT
On 2/9/2012 11:34 AM, Jeremy Chadwick wrote:
> You will probably need to track these drives on a regular basis. That is to say, set up some cronjob or similar that logs the above output to a file (appends data to it), specifically output from smartctl -A (not -a and not -x) and smartctl -l sataphy on a per-disk basis. smartd can track SMART attribute changes, but does not track GPLog changes. Make sure to put timestamps in your logs.

Thanks very much for having a look, and for the suggestions. I think this is the way to go to see which drive may have errors incrementing. Alexander, is there a better way you can suggest?

> As for fixing the problem: I have no idea how you would go about this. Use of port multipliers involves additional cables, possibly of shoddy quality, or other components which may not be decent/reliable.

Possibly. Cables are one of those things I am happy to pay extra for to get better quality, but how does one assess the quality of such parts?

> Overall, this is just one of the many reasons why I avoid PMs, as well as avoid eSATA (especially eSATA).

Yeah, at some point it doesn't really work with too many PMs, especially if you can't query the thing to find out where things are bad. I think for the next version of this box I will use the newer generation 3ware SAS/SATA controller.

---Mike

--
---
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, m...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/
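The per-disk logging suggested above could be sketched as follows. The function names, the SMARTCTL override variable and the log directory are assumptions for illustration; only the smartctl -A and smartctl -l sataphy invocations come from the thread.

```sh
#!/bin/sh
# SMARTCTL can be overridden for testing; defaults to the real tool.
: "${SMARTCTL:=smartctl}"

# log_smart DISK LOGDIR
# Appends a timestamped copy of the SMART attributes (-A) and the SATA PHY
# event counters (-l sataphy) for /dev/DISK to LOGDIR/DISK.log.
log_smart() {
    disk=$1
    log="$2/$disk.log"
    {
        printf '### %s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$disk"
        # smartctl's exit status is a bitmask and is non-zero for many
        # non-fatal conditions, so it is deliberately ignored here.
        "$SMARTCTL" -A "/dev/$disk" || true
        "$SMARTCTL" -l sataphy "/dev/$disk" || true
    } >> "$log" 2>&1
}

# icrc_count: reads `smartctl -l sataphy` output on stdin and prints the
# value of counter 0x0001 (command failed due to ICRC error).
icrc_count() {
    awk '$1 == "0x0001" { print $3 }'
}

# Example cron-driven loop over the disks behind the new PM (assumed path):
#   for d in ada4 ada5 ada6 ada7 ada8; do log_smart "$d" /var/log/smart; done
```

Comparing the 0x0001 value between two timestamps for the same disk shows whether its ICRC count is still incrementing.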
known problems with 8.x and HP DL16 G5 server?
Does anyone know of problems with FreeBSD and this system? The kernel we tried to boot seems to stop somewhere in the AHCI probing.
Re: known problems with 8.x and HP DL16 G5 server?
On Thu, Feb 09, 2012 at 01:48:29PM -0800, Julian Elischer wrote:
> does anyone know of problems with freebsd and this system? the kernel We tried to boot seems to stop somewhere in the ahci probing.

Few things:

1) Possible to get full console output (e.g. serial, etc.) from a verbose boot?

2) Can you also provide the exact release/tag/kernel/thing you're trying to install or upgrade to (8.x is a little vague; there are all sorts of changes that happen between tags). For example 8.1 is not going to behave the same necessarily as 8.2.

3) When you say ahci probing, are you booting a standard installation CD/DVD/memstick of, say, 8.2? If so, those won't make use of the AHCI-to-CAM translation layer (and that AHCI code is also different than the native-ATA-AHCI code), so you might try, when booting the system, dropping to the loader prompt and issuing load ahci.ko before typing boot. See if that helps. If it does, great, use it (ahci_load=yes in /boot/loader.conf) permanently (and benefit from things like NCQ too).

4) If it's an Intel ESB2 controller, I believe there were some fixes or identification shims put in place for this in recent RELENG_8, which wouldn't be available in RELENG_8_2 or 8.2-RELEASE CD/DVDs. I could be remembering the wrong controller though. Hmm...

--
| Jeremy Chadwick j...@parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |
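For point 3, a minimal sketch of making the module load permanent once the one-off test from the loader prompt has helped. The helper name enable_ahci_at_boot is hypothetical, and the check-before-append behaviour is an assumption about what is convenient rather than anything prescribed in the thread.

```shell
#!/bin/sh
# One-off test, typed by hand at the loader prompt (not run from a shell):
#   load ahci.ko
#   boot

# enable_ahci_at_boot [FILE]
# Hypothetical helper: append the loader.conf entry unless an ahci_load
# line is already present. FILE defaults to /boot/loader.conf.
enable_ahci_at_boot() {
    conf="${1:-/boot/loader.conf}"
    if grep -q '^ahci_load=' "$conf" 2>/dev/null; then
        return 0
    fi
    echo 'ahci_load="YES"' >> "$conf"
}
```

Calling `enable_ahci_at_boot` as root with no argument would edit /boot/loader.conf itself. Passing a scratch file first is a cheap way to see what would be written.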
Re: serious packet routing issue causing ntpd high load?
----- Original Message -----
From: Qing Li qin...@freebsd.org

Sorry about the delayed response. No, this one just fell through the cracks. Has anyone responded? Does it still exist in 9.x?

We discovered yesterday that adding the following routes fixed the issue. They are present in /etc/rc.d/network_ipv6, but are not active unless ipv6_enable=YES is set:

route add -inet6 :::0.0.0.0 -prefixlen 96 ::1 -reject
route add -inet6 ::0.0.0.0 -prefixlen 96 ::1 -reject

I haven't confirmed it, but this is reported to be set by default on 9.x due to the changes in the rc.d scripts.

Regards
Steve
Re: known problems with 8.x and HP DL16 G5 server?
On Thu, Feb 09, 2012 at 04:02:12PM -0800, Julian Elischer wrote:
> On 2/9/12 1:56 PM, Jeremy Chadwick wrote:
> > On Thu, Feb 09, 2012 at 01:48:29PM -0800, Julian Elischer wrote:
> > > does anyone know of problems with freebsd and this system? the kernel We tried to boot seems to stop somewhere in the ahci probing.
> >
> > Few things:
> >
> > 1) Possible to get full console output (e.g. serial, etc.) from a verbose boot?
>
> It's FreeBSD 8.2 from a TrueNAS/FreeNAS. I'm actually at ix-systems at the moment, but I was hoping someone could save us some time by saying "Oh yeah, merge in change number xx".
>
> > 2) Can you also provide the exact release/tag/kernel/thing you're trying to install or upgrade to (8.x is a little vague; there are all sorts of changes that happen between tags). For example 8.1 is not going to behave the same necessarily as 8.2.
> >
> > 3) When you say ahci probing, are you booting a standard installation CD/DVD/memstick of, say, 8.2? If so, those won't make use of the AHCI-to-CAM translation layer (and that AHCI code is also different than the native-ATA-AHCI code), so you might try, when booting the system, dropping to the loader prompt and issuing load ahci.ko before typing boot. See if that helps. If it does, great, use it (ahci_load=yes in /boot/loader.conf) permanently (and benefit from things like NCQ too).
>
> Let me forward you an image...
>
> > 4) If it's an Intel ESB2 controller, I believe there were some fixes or identification shims put in place for this in recent RELENG_8, which wouldn't be available in RELENG_8_2 or 8.2-RELEASE CD/DVDs. I could be remembering the wrong controller though. Hmm...
>
> That may be what we are looking for. I'll try to get more info.

For others: the last few lines in the kernel log are:

  acpi_hpet0: High Precision Event Timer iomem 0xfed0-0xfed003ff on acpi0
  acpi_hpet0: vend: 0x8086 rev: 0x1 num: 3 hz: 14318180 opts: legacy_route 64-bit
  Timecounter HPET frequency 14318180 Hz quality 900
  acpi: wakeup code va 0xff848311d000 pa 0x4000
  ahc_isa_probe 0: ioport 0xc00 alloc failed

I don't see any indication of AHCI problems here (or AHCI at all). ahc_isa_probe is for the ahc(4) controller -- Adaptec SCSI. A verbose boot might be more helpful.

--
| Jeremy Chadwick j...@parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |
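The distinction drawn above between ahc(4) and AHCI can be checked mechanically on a captured boot log. The following is only a sketch: it assumes the console output was saved to a file (for example over a serial console), and the helper name has_ahci_lines is hypothetical. It relies on ahci(4) controllers and channels reporting as ahciN: and ahcichN:, whereas ahc_isa_probe belongs to ahc(4).

```shell
#!/bin/sh
# has_ahci_lines LOGFILE
# Succeeds if LOGFILE has at least one line emitted by an ahci(4)
# controller (ahci0:, ahci1:, ...) or one of its channels (ahcich0:, ...).
# Lines from ahc(4), such as "ahc_isa_probe 0: ...", do not match.
has_ahci_lines() {
    grep -Eq '^ahci(ch)?[0-9]+:' "$1"
}
```

With a saved log in a hypothetical file boot.log, `has_ahci_lines boot.log || echo "no ahci(4) lines in this log"` gives a quick answer before anyone starts reading AHCI code.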