Re: Benchmark (Phoronix): FreeBSD 9.0-RC2 vs. Oracle Linux 6.1 Server
Hello, Samuel.

You wrote on 15 December 2011, 16:32:47:

> Other benchmarks in the Phoronix suite and their representations are similarly flawed, _ALL_ of these results should be ignored and no time should be wasted by any FreeBSD committer further evaluating this garbage. (Yes, I have been down this rabbit hole).

Here is one problem: we have a choice of three options:

(1) Make FreeBSD look good on benchmarks by fixing FreeBSD
(2) Make FreeBSD look good on benchmarks by fixing Phoronix (communicating with them, convincing them that their benchmarks are unfair / meaningless, etc.)
(3) Lose [potential] userbase.

You know that these benchmarks are bad. I know. But potential (and even some current!) users don't. And it seems that these benchmarks are becoming popular across the Internet.

-- 
// Black Lion AKA Lev Serebryakov l...@freebsd.org
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: VM images for FreeBSD
If anyone is interested, I have put a VirtualBox image here [1]:

FreeBSD-10-amd64-r228694-2011-12-19.vdi.xz

Anyone who wants to test 10 can try it :)

It contains a partial, package-installed system with openbox. The image is configured to run DHCP on em0; you can change this in /etc/rc.conf, as usual. Once you have Internet access working, you can add packages (9 ones) by running /root/addpackage.sh $1

To get X, log in as root and start /root/runx.sh. In a few seconds (there are delays for safety) you should get X with openbox.

BTW, it also contains qt 4.8.0 and qtcreator 2.4.0, so you can test things and help the KDE/Qt team a bit with any feedback. They are installed with default settings in their default prefixes (qt in /usr/local/Trolltech, and qtcreator in / ), so to run something you will probably have to set the correct LD_LIBRARY_PATH. As for qtcreator, I created a script to launch it, placed in root, which is also launched when you start X.

1. http://gits.kiev.ua/FreeBSD/

P.S. As always, I'm looking for anyone who will lend me a hand in enhancing the build scripts.

-- 
Regards, Alexander Yerenkow
Re: Benchmark (Phoronix): FreeBSD 9.0-RC2 vs. Oracle Linux 6.1 Server
On 19/12/2011 08:27, Lev Serebryakov wrote:
> Here is one problem: we have a choice of three options:
>
> (1) Make FreeBSD look good on benchmarks by fixing FreeBSD
> (2) Make FreeBSD look good on benchmarks by fixing Phoronix (communicating with them, convincing them that their benchmarks are unfair / meaningless, etc.)

(2a) Ignore Phoronix, other than explaining concisely why their numbers are complete balderdash. Publish our own benchmarks, done with care and rigour and using well defined, repeatable, peer reviewed methodology that anyone can repeat. Aggressively publicise these results.

> (3) Lose [potential] userbase.

Indeed. Unfortunately performance is /the/ deciding factor in many OS choices, never mind that it is an impossibly complex subject to generalise to a few management-friendly numbers in a one-size-fits-all abstract way. Having only one source of published numbers suggesting that OS Foo is better *even if those numbers are completely bogus* will have a disproportionate effect.

Cheers,
Matthew

-- 
Dr Matthew J Seaman MA, D.Phil.                   7 Priory Courtyard
                                                  Flat 3
PGP: http://www.infracaninophile.co.uk/pgpkey     Ramsgate
JID: matt...@infracaninophile.co.uk               Kent, CT11 9PW
r228700 can't dhclient em0
I updated to r228700 from 228122 and dhclient exits immediately saying that em0 doesn't exist. However ifconfig seems to disagree:

em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=4219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC,VLAN_HWTSO>
	ether 00:24:e8:30:10:9b
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
	media: Ethernet autoselect (100baseTX <full-duplex>)
	status: active
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
	options=3<RXCSUM,TXCSUM>
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

Interestingly, some of the options are different in that version, vs. the working version:

em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC>
	ether 00:24:e8:30:10:9b
	inet 172.17.198.245 netmask 0x broadcast 172.17.255.255
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
	media: Ethernet autoselect (100baseTX <full-duplex>)
	status: active

-- 
[^L] Breadth of IT experience, and depth of knowledge in the DNS.
Yours for the right price. :) http://SupersetSolutions.com/
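For readers comparing the two em0 listings by eye, the difference in the options word can be checked mechanically. The sketch below uses only the hex values and capability names printed in the message above; the variable names are made up for illustration. It shows what differs between the two listings and makes no claim about why dhclient fails.

```python
# Option words copied from the two ifconfig listings for em0.
failing = 0x4219b  # r228700, where dhclient reports that em0 does not exist
working = 0x219b   # the earlier, working build

# XOR keeps only the bits that differ between the two words.
diff = failing ^ working
print(hex(diff))

# Capability names exactly as printed in the two listings.
failing_names = set(
    "RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC,VLAN_HWTSO".split(",")
)
working_names = set(
    "RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC".split(",")
)

# Names present only in the failing listing.
extra = sorted(failing_names - working_names)
print(extra)
```

A single differing bit lines up with the single extra name in the listing, VLAN_HWTSO, so the two option words differ by one capability flag and nothing else.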
Re: Benchmark (Phoronix): FreeBSD 9.0-RC2 vs. Oracle Linux 6.1 Server
Hello, Adrian.

You wrote on 16 December 2011, 20:43:27:

> Guys/girls/fuzzy things - this is 2011; people look at shiny blog sites with graphs rather than mailing lists. Sorry, we lost that battle. :)

My thoughts exactly.

-- 
// Black Lion AKA Lev Serebryakov l...@freebsd.org
Re: svn commit: r228576 - in head: . sys/boot/forth sys/modules sys/modules/carp sys/modules/if_carp
Hi,

we support the official ways to update FreeBSD with delete-old. This means installkernel (or install in the kernel config directory) and freebsd-update. I hope freebsd-update does the right thing and moves the old kernel directory out of the way. We do not support weird cases with delete-old. As such, the entry does not belong in ObsoleteFiles.

Bye,
Alexander.

-- 
Send via an Android device, please forgive brevity and typographic and spelling errors.

Gleb Smirnoff gleb...@freebsd.org wrote:

> Alexander,
>
> On Sat, Dec 17, 2011 at 03:08:43PM +0100, Alexander Leidinger wrote:
> A> we never had a kernel part in the list. Reinstallkernel is not a valid target after updating the sources. The renaming will only take effect after updating. And we already had issues because the list was too long.
> A>
> A> Your entry for the carp module is completely out of question for this list. Please remove it.
>
> The file /boot/kernel/if_carp.ko had been installed on older installations. It is not overwritten now. Thus, it may happen in some weird case that it is left intact. 'make installkernel' is not the only way to upgrade FreeBSD. To cover these potential cases I have added an entry. This entry doesn't hurt anybody or anything.
>
> The argument about the length of the ObsoleteFiles.inc list can't be taken seriously. The fact is that this file is going to keep growing for the foreseeable future. It is never going to get shorter. Thus, if we are getting problems with the list getting too long, then we need to enhance the script that deletes old files, not try to reduce the list by 0.0235% by removing one recent entry whose status is uncertain.
>
> I am adding current@ to CC; maybe someone can take the role of negotiator on this issue, or just has an opinion.
>
> A> Bye,
> A> Alexander.
> A>
> A> -- 
> A> Send via an Android device, please forgive brevity and typographic and spelling errors.
> A>
> A> Gleb Smirnoff gleb...@freebsd.org wrote:
> A> > Alexander,
> A> >
> A> > On Fri, Dec 16, 2011 at 05:49:03PM +0100, Alexander Leidinger wrote:
> A> > > the ObsoleteFiles part is not necessary, please remove. The installkernel moves the old stuff to kernel.old.
> A> >
> A> > I know that it does, and for 99% of people this entry won't be needed. But let it be here for those who install a new kernel some other way, for example 'make reinstallkernel' or even copying by hand.
> A> >
> A> > The superfluous entry in ObsoleteFiles.inc has zero negative impact, anyway.
> A> >
> A> > -- 
> A> > Totus tuus, Glebius.
>
> -- 
> Totus tuus, Glebius.
Re: Benchmark (Phoronix): FreeBSD 9.0-RC2 vs. Oracle Linux 6.1 Server
Hello, Matthew.

You wrote on 19 December 2011, 13:13:09:

>> (1) Make FreeBSD look good on benchmarks by fixing FreeBSD
>> (2) Make FreeBSD look good on benchmarks by fixing Phoronix (communicating with them, convincing them that their benchmarks are unfair / meaningless, etc.)
> (2a) Ignore Phoronix, other than explaining concisely why their numbers are complete balderdash. Publish our own benchmarks, done with care and rigour and using well defined, repeatable, peer reviewed methodology that anyone can repeat. Aggressively publicise these results.

OK, that is a way too, I agree. But in the modern world, unfortunately (for me, and I'm sure for many FreeBSD hackers), the key words are "Aggressively publicise", not "done with care and rigour and using well defined, repeatable, peer reviewed methodology that anyone can repeat".

>> (3) Lose [potential] userbase.
> Indeed. Unfortunately performance is /the/ deciding factor in many OS choices, never mind that it is an impossibly complex subject to generalise to a few management-friendly numbers in a one-size-fits-all abstract way. Having only one source of published numbers suggesting that OS Foo is better *even if those numbers are completely bogus* will have a disproportionate effect.

Yep.

-- 
// Black Lion AKA Lev Serebryakov l...@freebsd.org
Re: Benchmark (Phoronix): FreeBSD 9.0-RC2 vs. Oracle Linux 6.1 Server
On Mon, Dec 19, 2011 at 09:13:09AM +, Matthew Seaman wrote:
> On 19/12/2011 08:27, Lev Serebryakov wrote:
>> Here is one problem: we have a choice of three options:
>>
>> (1) Make FreeBSD look good on benchmarks by fixing FreeBSD
>> (2) Make FreeBSD look good on benchmarks by fixing Phoronix (communicating with them, convincing them that their benchmarks are unfair / meaningless, etc.)
>
> (2a) Ignore Phoronix, other than explaining concisely why their numbers are complete balderdash. Publish our own benchmarks, done with care and rigour and using well defined, repeatable, peer reviewed methodology that anyone can repeat. Aggressively publicise these results.

Slashdot and others don't ignore Phoronix, so (2a) is only an option if you accept (3).

My personal opinion: Phoronix may compare apples to oranges from time to time, and it might be possible to catch up with Linux's results by tweaking some system parameters, but Joe Average expects a fast and working OS out of the box, and after reading a Phoronix benchmark he will probably prefer Linux over FreeBSD.

/me thinks that our userbase is not big enough for us to put off potential new or existing users, so we should question our default config values, or clearly and publicly explain why the results for FreeBSD are slower because of data integrity / security / $other_reasons.

>> (3) Lose [potential] userbase.
>
> Indeed. Unfortunately performance is /the/ deciding factor in many OS choices, never mind that it is an impossibly complex subject to generalise to a few management-friendly numbers in a one-size-fits-all abstract way. Having only one source of published numbers suggesting that OS Foo is better *even if those numbers are completely bogus* will have a disproportionate effect.
Re: Benchmark (Phoronix): FreeBSD 9.0-RC2 vs. Oracle Linux 6.1 Server
On 12/19/11 09:27, Lev Serebryakov wrote:
> Hello, Samuel.
>
> You wrote on 15 December 2011, 16:32:47:
>
>> Other benchmarks in the Phoronix suite and their representations are similarly flawed, _ALL_ of these results should be ignored and no time should be wasted by any FreeBSD committer further evaluating this garbage. (Yes, I have been down this rabbit hole).
>
> Here is one problem: we have a choice of three options:
>
> (1) Make FreeBSD look good on benchmarks by fixing FreeBSD
> (2) Make FreeBSD look good on benchmarks by fixing Phoronix (communicating with them, convincing them that their benchmarks are unfair / meaningless, etc.)
> (3) Lose [potential] userbase.
>
> You know that these benchmarks are bad. I know. But potential (and even some current!) users don't. And it seems that these benchmarks are becoming popular across the Internet.

+1

This is not about some fake way of making a specific OS look good by any means. I'm afraid of (3), which also implies drifting further towards being meaningless and no longer an alternative with unique, remarkable criteria for being chosen as __the__ operating system of first choice for several purposes.

By the way, what such a development could look like is very clear when it comes to GPGPU/HPC, which is highly dependent on the availability of proper graphics card drivers, X11 development and the necessary libraries, APIs and even compilers.

None of the professionals out here, none of those who turn the eyewitness reports of bad performance into very deep-insight talks about what could cause the problem, has obviously ever negotiated with the people on the upper floor when it comes to the choice of the OS. Within my department, the *BSDs aren't even considered an option, even if they would perform best for the specified purpose (which, I regret, is a shrinking basis now that Linux will have ZFS as well).

Sometimes I feel like Don Quixote, fighting against windmills.

Sorry for having brought up this thread, and I beg pardon for putting another scratch into the autoerotic world of the core.
Re: Benchmark (Phoronix): FreeBSD 9.0-RC2 vs. Oracle Linux 6.1 Server
2011/12/19 Lev Serebryakov l...@freebsd.org:
> Hello, Samuel.
>
> You wrote on 15 December 2011, 16:32:47:
>
>> Other benchmarks in the Phoronix suite and their representations are similarly flawed, _ALL_ of these results should be ignored and no time should be wasted by any FreeBSD committer further evaluating this garbage. (Yes, I have been down this rabbit hole).
>
> Here is one problem: we have a choice of three options:
>
> (1) Make FreeBSD look good on benchmarks by fixing FreeBSD
> (2) Make FreeBSD look good on benchmarks by fixing Phoronix (communicating with them, convincing them that their benchmarks are unfair / meaningless, etc.)
> (3) Lose [potential] userbase.
>
> You know that these benchmarks are bad. I know. But potential (and even some current!) users don't. And it seems that these benchmarks are becoming popular across the Internet.
>
> -- 
> // Black Lion AKA Lev Serebryakov l...@freebsd.org

Here is where you completely derail the train; let me paste again what I said before.

...
Take the first test as an example, Blogbench read. This doesn't raise any red flags, right? At least not until you realize that Blogbench isn't a read test, it's a read/write test. So what they have done here is run a read/write test and then thrown away the write results for both platforms and reported only the read results. If you dig down into the actual results, http://openbenchmarking.org/result/1112113-AR-ORACLELIN37 -- you will see two Blogbench numbers, one for read and another for write. These were both taken from the same Blogbench run, so FreeBSD optimizes writes over reads; that's probably a good thing for your data but a bad thing when someone totally misrepresents benchmark results.
...

FreeBSD actually does _BETTER_ (subjectively) in this test than the Linux system when you look at what is really going on. FreeBSD is favoring writes, which is _GOOD_. FreeBSD does not need to be fixed; the benchmarks need to be fixed to represent reality rather than throwing half of the results in the trash.

To be quite frank, fixing FreeBSD to look good on this benchmark will make it a worse real-world OS. But you guys go ahead and foot-shoot over these ridiculous benchmarks all you want.

Sam
a few usb issues related to edge cases
Hans Petter,

I think that I see some issues in the USB code that could cause problems in some edge cases. From easiest to hardest:

1. I think that currently there is a LOR in usb_bus_shutdown. I think that the following patch should fix it:

===
--- a/sys/dev/usb/controller/usb_controller.c
+++ b/sys/dev/usb/controller/usb_controller.c
@@ -479,6 +481,7 @@ usb_bus_shutdown(struct usb_proc_msg *pm)
 
 	bus_generic_shutdown(bus->bdev);
 
+	USB_BUS_UNLOCK(bus);
 	usbd_enum_lock(udev);
 
 	err = usbd_set_config_index(udev, USB_UNCONFIG_INDEX);
@@ -497,6 +500,7 @@ usb_bus_shutdown(struct usb_proc_msg *pm)
 		(bus->methods->set_hw_power_sleep) (bus, USB_HW_POWER_SHUTDOWN);
 
 	usbd_enum_unlock(udev);
+	USB_BUS_LOCK(bus);
 }
 
 static void
===

Otherwise there are a lot of nasty reports like:

lock order reversal: (sleepable after non-sleepable)
 1st 0xff80006b0688 ohci0 (ohci0) @ /usr/src/sys/dev/usb/controller/usb_controller.c:336
 2nd 0xfe00023cf070 USB config SX lock (USB config SX lock) @ /usr/src/sys/dev/usb/usb_device.c:2643

usbd_transfer_unsetup can sleep! with the following non-sleepable locks held:
exclusive sleep mutex ohci0 (ohci0) r = 0 (0xff80006b0688) locked @ /usr/src/sys/dev/usb/controller/usb_controller.c:336

2. Somewhat related to the above. I think that because the USB subsystem implements the shutdown method and detaches all its drivers, the ukbd driver won't be able to properly handle the 'shutdown -h' case where the kernel asks you to press any key to reboot at the end. Depending on which thread wins the race (the one that executes the mainline shutdown code or the USB explore thread that detaches USB devices), there will be either an immediate reboot or a later crash when any key is pressed.

This is not critical, but OTOH perhaps the USB subsystem doesn't have to do the shutdown. As far as I can see, a lot of the drivers just do nothing for the shutdown, for better or for worse.

A side note: perhaps it would be a good idea to pass the 'how' value as an additional parameter to device_shutdown.

3. Looking at usbd_transfer_poll, I see that it touches a lot of locks, including taking the bus lock. As we've discussed before, this is not safe in the particular context where the polling is supposed to be used: the kdb/ddb context. If the lock is already taken by another thread, then instead of being able to use a USB keyboard the user would get an even less debuggable crash.

Also, it seems that usbd_transfer_poll calls into the usual state machine, with various callbacks and dynamically made decisions about whether to execute some actions directly or defer their execution to a different thread. That code also touches locks in various places. I think that it would be preferable to have a method that does the job in a more straightforward way, without touching any locks, ignoring the usual code paths and assuming that no other threads are running in parallel. Ditto for the method to submit a request.

As a side note: we probably need a flag to mark certain things, e.g. the ukbd driver, as non-recoverable, meaning that once they are used in the kdb context there is no safe way to go back to normal system operation.

What do you think? Thank you.

-- 
Andriy Gapon
Re: Benchmark (Phoronix): FreeBSD 9.0-RC2 vs. Oracle Linux 6.1 Server
On Mon, Dec 19, 2011 at 6:49 PM, Samuel J. Greear s...@evilcode.net wrote:
> FreeBSD actually does _BETTER_ (subjectively) in this test than the Linux system when you look at what is really going on. FreeBSD is favoring writes, which is _GOOD_. FreeBSD does not need to be fixed, the benchmarks need to be fixed to represent reality rather than throwing half of the results in the trash.
>
> To be quite frank, fixing FreeBSD to look good on this benchmark will make it a worse real-world OS. But you guys go ahead and foot-shoot over these ridiculous benchmarks all you want.

Would you prefer a blog which allows you to:

A:
- create/write 100 posts/s
- serve/read 1000 posts/s

or

B:
- create/write 80 posts/s
- serve/read 3000 posts/s

?

I would personally choose B.

-- 
O ascii ribbon campaign - stop html mail - www.asciiribbon.org
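Whether A or B is preferable depends on how many reads arrive per write. As a rough sketch only: assume reads and writes compete for one resource and are served one after another, so that handling one write plus k reads takes 1/write_rate + k/read_rate seconds. That serial model and the function names are assumptions made for illustration; they are not how Blogbench measures anything. Under that assumption, the A and B figures from the message above give a break-even read/write ratio:

```python
def seconds_per_cycle(write_rate, read_rate, reads_per_write):
    """Time for one write plus `reads_per_write` reads, served back to back."""
    return 1.0 / write_rate + reads_per_write / read_rate


def break_even(a, b):
    """Reads-per-write ratio at which configurations a and b take equal time."""
    (write_a, read_a), (write_b, read_b) = a, b
    return (1.0 / write_b - 1.0 / write_a) / (1.0 / read_a - 1.0 / read_b)


# (writes per second, reads per second), taken from the A/B question.
A = (100, 1000)
B = (80, 3000)

k = break_even(A, B)
print(round(k, 2))

# One side of the break-even point favours A, the other favours B.
for ratio in (1, 10):
    t_a = seconds_per_cycle(A[0], A[1], ratio)
    t_b = seconds_per_cycle(B[0], B[1], ratio)
    print(ratio, "A" if t_a < t_b else "B")
```

In this simplified model B wins once there are more than about 3.75 reads per write, and A wins for write-heavier mixes, which is one way to frame the disagreement in the thread: the answer depends on the workload rather than on either number alone.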
Re: svn commit: r228576 - in head: . sys/boot/forth sys/modules sys/modules/carp sys/modules/if_carp
On 19. Dec 2011, at 09:18 , Alexander Leidinger wrote:

I think in general Alexander is right here. We usually do not allow for atomic replacements of individual modules in /boot/kernel/ unless you know what you are doing, in which case ObsoleteFiles.inc doesn't seem to be what you are running either.

Also please remember that for a user (not a developer) hitting this means a major version upgrade to 10.x, and that will never keep the same /boot/kernel anyway.

/bz

-- 
Bjoern A. Zeeb                                 You have to have visions!
         Stop bit received. Insert coin for new address family.
Re: Benchmark (Phoronix): FreeBSD 9.0-RC2 vs. Oracle Linux 6.1 Server
IMHO, no offence, as always.

As we were told, Phoronix used a default setup, not a tuned one. So? Will the average user tune it after setup? No, he'll get the same defaults and will expect the same performance as in the tests, and he'll probably get it.

The problem with FreeBSD is not its default settings; some kind of very safe defaults really should be there. The real problem is the lack of a way for average users to choose them (the defaults) during install. For example, a few checkboxes with common sysctl tuning would be perfect, even if they were marked as Experimental or not recommended.

I think it's better to do something in one place (like the installer) than to require almost the same actions in many (hundreds of thousands?... more?...) places (end users forced to read mailing lists/handbooks/forums over and over for the same solutions).

A simple example: many connections for PostgreSQL are not available on FreeBSD out of the box. Just google "postgresql freebsd max connection" and you'll see how many bikesheds have been requested and the same solutions posted again and again :)

FreeBSD currently has a very obscure, closed community. To get in touch, you need to subscribe to several mailing lists and constantly read them. I only recently found out (my shame, of course) from a mailing list that there is a service (pub.allbsd.org) which constantly builds current versions. This is great, but on the freebsd.org homepage there is not a word about it :)

I hope we all do something good about this, and things are going to change.

-- 
Regards, Alexander Yerenkow
Re: r228700 can't dhclient em0
On 2011-12-19 10:17, Doug Barton wrote:
> I updated to r228700 from 228122 and dhclient exits immediately saying that em0 doesn't exist. However ifconfig seems to disagree:
>
> em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
> 	options=4219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC,VLAN_HWTSO>
> 	ether 00:24:e8:30:10:9b
> 	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
> 	media: Ethernet autoselect (100baseTX <full-duplex>)
> 	status: active
> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
> 	options=3<RXCSUM,TXCSUM>
> 	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
>
> Interestingly, some of the options are different in that version, vs. the working version:
>
> em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
> 	options=219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC>
> 	ether 00:24:e8:30:10:9b
> 	inet 172.17.198.245 netmask 0x broadcast 172.17.255.255
> 	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
> 	media: Ethernet autoselect (100baseTX <full-duplex>)
> 	status: active

I saw this too, when my kernel and userland were out of sync (e.g. just after installing a new kernel, and before installworld). I suspect it is caused by the changes in r228571, which cause old ifconfig and dhclient to not recognize any interfaces. I'm not 100% sure though...
Re: Benchmark (Phoronix): FreeBSD 9.0-RC2 vs. Oracle Linux 6.1 Server
On 19 dec 2011, at 12:50, Samuel J. Greear s...@evilcode.net wrote:
> 2011/12/19 Lev Serebryakov l...@freebsd.org:
>> Hello, Samuel.
>>
>> You wrote on 15 December 2011, 16:32:47:
>>
>>> Other benchmarks in the Phoronix suite and their representations are similarly flawed, _ALL_ of these results should be ignored and no time should be wasted by any FreeBSD committer further evaluating this garbage. (Yes, I have been down this rabbit hole).
>>
>> Here is one problem: we have a choice of three options:
>>
>> (1) Make FreeBSD look good on benchmarks by fixing FreeBSD
>> (2) Make FreeBSD look good on benchmarks by fixing Phoronix (communicating with them, convincing them that their benchmarks are unfair / meaningless, etc.)
>> (3) Lose [potential] userbase.
>>
>> You know that these benchmarks are bad. I know. But potential (and even some current!) users don't. And it seems that these benchmarks are becoming popular across the Internet.
>>
>> -- 
>> // Black Lion AKA Lev Serebryakov l...@freebsd.org
>
> Here is where you completely derail the train; let me paste again what I said before.
>
> ...
> Take the first test as an example, Blogbench read. This doesn't raise any red flags, right? At least not until you realize that Blogbench isn't a read test, it's a read/write test. So what they have done here is run a read/write test and then thrown away the write results for both platforms and reported only the read results. If you dig down into the actual results, http://openbenchmarking.org/result/1112113-AR-ORACLELIN37 -- you will see two Blogbench numbers, one for read and another for write. These were both taken from the same Blogbench run, so FreeBSD optimizes writes over reads; that's probably a good thing for your data but a bad thing when someone totally misrepresents benchmark results.
> ...
>
> FreeBSD actually does _BETTER_ (subjectively) in this test than the Linux system when you look at what is really going on. FreeBSD is favoring writes, which is _GOOD_. FreeBSD does not need to be fixed; the benchmarks need to be fixed to represent reality rather than throwing half of the results in the trash.
>
> To be quite frank, fixing FreeBSD to look good on this benchmark will make it a worse real-world OS. But you guys go ahead and foot-shoot over these ridiculous benchmarks all you want.
>
> Sam

I seem to remember that before ULE, people were fleeing to Linux as the OS to run Apache on, since 4BSD didn't scale all too well. That may have changed over time though.

However, ULE could perhaps be made aware of technologies like Turbo Boost, i.e. with few threads, higher performance might be gained by utilizing all virtual cores on one physical core before spreading tasks across different cores.

Just my speculations though :)

Regards
Andreas Nilsson
Re: Benchmark (Phoronix): FreeBSD 9.0-RC2 vs. Oracle Linux 6.1 Server
On 12/19/11 13:21, Andreas Nilsson wrote:
> On 19 dec 2011, at 12:50, Samuel J. Greear s...@evilcode.net wrote:
>> 2011/12/19 Lev Serebryakov l...@freebsd.org:
>>> Hello, Samuel.
>>>
>>> You wrote on 15 December 2011, 16:32:47:
>>>
>>>> Other benchmarks in the Phoronix suite and their representations are similarly flawed, _ALL_ of these results should be ignored and no time should be wasted by any FreeBSD committer further evaluating this garbage. (Yes, I have been down this rabbit hole).
>>>
>>> Here is one problem: we have a choice of three options:
>>>
>>> (1) Make FreeBSD look good on benchmarks by fixing FreeBSD
>>> (2) Make FreeBSD look good on benchmarks by fixing Phoronix (communicating with them, convincing them that their benchmarks are unfair / meaningless, etc.)
>>> (3) Lose [potential] userbase.
>>>
>>> You know that these benchmarks are bad. I know. But potential (and even some current!) users don't. And it seems that these benchmarks are becoming popular across the Internet.
>>>
>>> -- 
>>> // Black Lion AKA Lev Serebryakov l...@freebsd.org
>>
>> Here is where you completely derail the train; let me paste again what I said before.
>>
>> ...
>> Take the first test as an example, Blogbench read. This doesn't raise any red flags, right? At least not until you realize that Blogbench isn't a read test, it's a read/write test. So what they have done here is run a read/write test and then thrown away the write results for both platforms and reported only the read results. If you dig down into the actual results, http://openbenchmarking.org/result/1112113-AR-ORACLELIN37 -- you will see two Blogbench numbers, one for read and another for write. These were both taken from the same Blogbench run, so FreeBSD optimizes writes over reads; that's probably a good thing for your data but a bad thing when someone totally misrepresents benchmark results.
>> ...
>>
>> FreeBSD actually does _BETTER_ (subjectively) in this test than the Linux system when you look at what is really going on. FreeBSD is favoring writes, which is _GOOD_. FreeBSD does not need to be fixed; the benchmarks need to be fixed to represent reality rather than throwing half of the results in the trash.
>>
>> To be quite frank, fixing FreeBSD to look good on this benchmark will make it a worse real-world OS. But you guys go ahead and foot-shoot over these ridiculous benchmarks all you want.
>>
>> Sam
>
> I seem to remember that before ULE, people were fleeing to Linux as the OS to run Apache on, since 4BSD didn't scale all too well. That may have changed over time though.
>
> However, ULE could perhaps be made aware of technologies like Turbo Boost, i.e. with few threads, higher performance might be gained by utilizing all virtual cores on one physical core before spreading tasks across different cores.
>
> Just my speculations though :)
>
> Regards
> Andreas Nilsson

Such a scheduling strategy is definitely necessary on AMD's new Bulldozer architecture, which seems to be very picky about threads locked on the same module. Microsoft just offered a patch for Windows 7 to add such Bulldozer awareness, but they withdrew the patch as invalid two days after the release. The results seem to favour FPU performance over integer performance.

As Samuel Greear wrote, FreeBSD does not look that bad in some of the benchmarks, but there are obviously issues, not least the fact that Phoronix/OpenBenchmarking.org are the only sites offering benchmarks at all. People outside the FreeBSD realm looking for opportunities: what do you think they will look at first? Phoronix/OpenBenchmarking.org made the first step, and they seem to make FreeBSD look bad (in my opinion), whether rightly or not.

Judging by several subjective impressions I have from our heterogeneous environment at the lab, Linux on the same hardware looks much better in several respects.

Oliver
Re: Benchmark (Phoronix): FreeBSD 9.0-RC2 vs. Oracle Linux 6.1 Server
I have already canceled a few replies to this thread, but...

On 19.12.11 15:16, Alexander Yerenkow wrote:
> IMHO, no offence, as always.

I feel obliged to include the same disclaimer :-)

> As we were told, Phoronix used the default setup, not tuned.

Not really. They created some weird test environment, at least for FreeBSD -- who knows, possibly for Linux as well. For example, ZFS is by no means a default file system in FreeBSD. You need to go through manual steps to enable it, to build the pool, file systems etc. This is because ZFS is a very powerful file system and storage manager that needs some thinking before you implement it -- then it may reward you with features not found anywhere else.

Funny, ZFS is available in Linux too, and at least the file system tests might benefit from using one and the same file system. One would expect that ZFS was used for both, in a multiple-disk (way over 4 disks) setup, as one would expect to be the case for a 'server'.

> So? Will the average user tune it after setup? No, he'll get the same defaults, and would expect the same performance as in the tests, and he'll probably get it.

You forget that the FreeBSD type and the Linux type are quite different. This is why both worlds exist.

The FreeBSD way is to understand what you do and configure your environment accordingly. FreeBSD gives you the flexibility to do as you please, and in most of the possible configurations it will work. Maybe not optimally, but it will not break on you. With FreeBSD there is never one true way to do things.

The Linux way, on the other hand, is to follow a HOWTO instruction. The Linux OS is typically optimized for these setups, and as long as you follow the HOWTO you are safe and well performance-wise. If you go way outside the prescriptions in the HOWTO, you may end up losing data, crashing the system or getting extremely poor performance.

I know, things are not that black and white, but this is the general difference.

> But the real problem is the lack of a way to choose them (the defaults) during install, for average users.

Who are the average users? It has been repeatedly said that the PC user is always better off starting with PC-BSD, because it is FreeBSD with safe defaults suitable for a desktop.

> For example, a few checkboxes with common sysctl tuning would be perfect, even if they would be marked as Experimental, or not recommended.

By following this, we push FreeBSD into the Linux style of doing things: someone else decides what is good for you, without having a clue about your circumstances.

> Simple example - many connections for PostgreSQL are not available on FreeBSD out of the box. Just google "postgresql freebsd max connection" and you'll see how many bikesheds were requested there and the same solutions posted again and again :)

Still, PostgreSQL is not part of FreeBSD. The PostgreSQL port clearly says what you need to adjust in your setup in order to use it. As do most other ports.

Computers do what people ask them to do -- we are far from the AI times, when computers will assemble, configure and run themselves the way we think they should.

> FreeBSD currently has a very obscure, closed community.

Some say this is a feature ;-)

> To get in touch, you need to subscribe to several mailing lists and constantly read them. I've just found recently (my shame of course) on a mailing list that there is a service (pub.allbsd.org) which constantly builds current versions. This is great, but on the homepage of freebsd.org there is no word about it :)

There is a Community menu on www.freebsd.org and a Forums entry there. You don't have to use mailing lists, if you prefer forums.

> I hope we all do something good about this, and things are going to change.

Many bright people do a lot of things about all of these issues.

If there is a problem, one needs to understand the problem, what causes the problem and what the implications are. Merely reacting to the symptoms never helps in the long run, as the core problem is not resolved.

So far in this thread there is no evidence of where the problem is. There is no evidence even that there is a real problem -- except that many people get overly excited by benchmarks.

To the last point I could add that, with experience, one learns that the benchmarks done in your environment, with your settings, with your OS version, on your hardware and with your set of applications do not help me much on my hardware/software/configuration -- except if these happen to be very similar.

/usr/ports/benchmarks is your friend.

Daniel
Uneven load on drives in ZFS RAIDZ1
Hi ZFS users,

for quite some time I have observed an uneven distribution of load between drives in a 4 * 2TB RAIDZ1 pool. The following is an excerpt of a longer log of 10 second averages logged with gstat:

dT: 10.001s  w: 10.000s  filter: ^a?da?.$
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0    130    106   4134    4.5     23   1033    5.2    48.8| ada0
    0    131    111   3784    4.2     19   1007    4.0    47.6| ada1
    0     90     66   2219    4.5     24   1031    5.1    31.7| ada2
    1     81     58   2007    4.6     22   1023    2.3    28.1| ada3

 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    1    132    104   4036    4.2     27   1129    5.3    45.2| ada0
    0    129    103   3679    4.5     26   1115    6.8    47.6| ada1
    1     91     61   2133    4.6     30   1129    1.9    29.6| ada2
    0     81     56   1985    4.8     24   1102    6.0    29.4| ada3

 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    1    148    108   4084    5.3     39   2511    7.2    55.5| ada0
    1    141    104   3693    5.1     36   2505   10.4    54.4| ada1
    1    102     62   2112    5.6     39   2508    5.5    35.4| ada2
    0     99     60   2064    6.0     39   2483    3.7    36.1| ada3

This goes on for minutes, without a change of roles (I had assumed that other 10 minute samples might show relatively higher load on another subset of the drives, but it's always the first two, which receive some 50% more read requests than the other two).

The test consisted of minidlna rebuilding its content database for a media collection held on that pool. The unbalanced distribution of requests does not depend on the particular application and the distribution of requests does not change when the drives with highest load approach 100% busy.

This is a -CURRENT built from yesterday's sources, but the problem exists for quite some time (and should definitely be reproducible on -STABLE, too).

The pool consists of a 4 drive raidz1 on an ICH10 (H67) without cache or log devices and without much ZFS tuning (only max. ARC size, should not at all be relevant in this context):

zpool status -v
  pool: raid1
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        raid1       ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada0p2  ONLINE       0     0     0
            ada1p2  ONLINE       0     0     0
            ada2p2  ONLINE       0     0     0
            ada3p2  ONLINE       0     0     0

errors: No known data errors

Cached configuration:
        version: 28
        name: 'raid1'
        state: 0
        txg: 153899
        pool_guid: 10507751750437208608
        hostid: 3558706393
        hostname: 'se.local'
        vdev_children: 1
        vdev_tree:
            type: 'root'
            id: 0
            guid: 10507751750437208608
            children[0]:
                type: 'raidz'
                id: 0
                guid: 7821125965293497372
                nparity: 1
                metaslab_array: 30
                metaslab_shift: 36
                ashift: 12
                asize: 7301425528832
                is_log: 0
                create_txg: 4
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 7487684108701568404
                    path: '/dev/ada0p2'
                    phys_path: '/dev/ada0p2'
                    whole_disk: 1
                    create_txg: 4
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 12000329414109214882
                    path: '/dev/ada1p2'
                    phys_path: '/dev/ada1p2'
                    whole_disk: 1
                    create_txg: 4
                children[2]:
                    type: 'disk'
                    id: 2
                    guid: 2926246868795008014
                    path: '/dev/ada2p2'
                    phys_path: '/dev/ada2p2'
                    whole_disk: 1
                    create_txg: 4
                children[3]:
                    type: 'disk'
                    id: 3
                    guid: 5226543136138409733
                    path: '/dev/ada3p2'
                    phys_path: '/dev/ada3p2'
                    whole_disk: 1
                    create_txg: 4

I'd be interested to know, whether this behavior can be reproduced on other systems with raidz1 pools consisting of 4 or more drives. All it takes is generating some disk load and running the command:

    gstat -I 1000 -f '^a?da?.$'

to obtain 10 second averages.

I have not even tried to look at the scheduling of requests in ZFS, but I'm surprised to see higher than average load on just 2 of the 4 drives, since RAID parity should be evenly spread over all drives and for each file system block a different subset of 3 out of 4 drives should be able to deliver the data without
Re: a few usb issues related to edge cases
On Monday 19 December 2011 13:16:17 Andriy Gapon wrote:
> Hans Petter,
>
> I think that I see some issues in the USB code that could cause problems in some edge cases. From easiest to hardest:

Hi,

> 1. I think that currently there is a LOR in usb_bus_shutdown. I think that the following patch should fix it:
>
> ===
> --- a/sys/dev/usb/controller/usb_controller.c
> +++ b/sys/dev/usb/controller/usb_controller.c
> @@ -479,6 +481,7 @@ usb_bus_shutdown(struct usb_proc_msg *pm)
>   bus_generic_shutdown(bus->bdev);
>
> + USB_BUS_UNLOCK(bus);
>   usbd_enum_lock(udev);
>
>   err = usbd_set_config_index(udev, USB_UNCONFIG_INDEX);
> @@ -497,6 +500,7 @@ usb_bus_shutdown(struct usb_proc_msg *pm)
>   (bus->methods->set_hw_power_sleep) (bus, USB_HW_POWER_SHUTDOWN);
>
>   usbd_enum_unlock(udev);
> + USB_BUS_LOCK(bus);
>  }

You are right! I believe my kernel tests were run without WITNESS.

> 2. Somewhat related to the above. I think that because the USB subsystem implements the shutdown method and detaches all its drivers, then the ukbd driver won't be able to properly handle the 'shutdown -h' case where the kernel asks to press any key to reboot at the end. Depending on which thread wins the race (the one that executes the mainline shutdown code or the USB explore thread that detaches USB devices) there will be either an immediate reboot or a later crash when any key is pressed. This is not critical, but OTOH perhaps the USB subsystem doesn't have to do the shutdown. As far as I can see a lot of the drivers just do nothing for the shutdown, for better or for worse. A side note: perhaps it would be a good idea to pass the 'how' value as an additional parameter to device_shutdown.

The shutdown of USB is done to give USB devices a last chance to turn off or reduce their current consumption. In the old code the host controller itself would be disabled, so the keyboard wouldn't have worked the way you suggest, I believe.

BTW: Shutdown should be executed after any "Press any key to reboot." and shutdown should be given time to complete, hence for USB this needs to happen in sync with the rest of the USB system.

> 3. Looking at usbd_transfer_poll I see that it touches a lot of locks, including taking the bus lock. As we've discussed before, this is not safe in a particular context where the polling is supposed to be used - in the kdb/ddb context. If the lock is already taken by another thread, then instead of being able to use a USB keyboard a user would get an even less debug-able crash. Also, it seems that usbd_transfer_poll calls into the usual state machine with various callbacks and dynamically made decisions about whether to execute some actions directly or defer their execution to a different thread.

This is an optimisation. If the current thread can do the job without a LOR, then we do it right away. Else we let another thread do it. It is possible to have a simpler model, but then you will also get more task switches.

> That code also touches locks in various places. I think that it would be preferable to have a method that does the job in a more straightforward way, without touching any locks, ignoring the usual code paths and assuming that no other threads are running in parallel. Ditto for the method to submit a request.

The current USB code can be run fine without real locks, if you do a few tricks. I have a single-threaded BSD-kernel replacement for this which works like a charm for non-FreeBSD projects. I'm going to paste a few lines FYI:

Why not extend struct mtx to have two fields which are only used in case of system polling (no scheduler running):

    struct mtx {
        xxx;
        int owned_polling = 0;
        struct mtx *parent_polling;
    };

    void
    mtx_init(struct mtx *mtx, const char *name, const char *type, int opt)
    {
        mtx->owned = 0;
        mtx->parent = mtx;
    }

    void
    mtx_lock(struct mtx *mtx)
    {
        mtx = mtx->parent;
        mtx->owned++;
    }

    void
    mtx_unlock(struct mtx *mtx)
    {
        mtx = mtx->parent;
        mtx->owned--;
    }

    int
    mtx_owned(struct mtx *mtx)
    {
        mtx = mtx->parent;
        return (mtx->owned != 0);
    }

    void
    mtx_destroy(struct mtx *mtx)
    {
        /* NOP */
    }

Maybe mtx_init, mtx_lock, mtx_unlock, mtx_owned, mtx_destroy, etc., could be function pointers, which are swapped at panic.

USB is SMP! To run SMP code from a single thread, you need to create a hierarchy of the threads:

1) Callbacks (Giant)
2) Callbacks (non-Giant)
3) Control EP (non-Giant)
4) Explore thread (non-Giant)

When the explore thread is busy, we look for work in the level above and so on. The USB stack implements this principle, which is maybe not documented anywhere btw. If you want more than code, you can hire me to do that.

The mtx-code above I believe is far less work than to make new code which handles the polling case only. The reason for the parent mutex field, is to allow easy
USB testers wanted for system suspend and resume
Hi,

Can someone who has access to computer hardware that supports system suspend and resume please test FreeBSD-10-current after this commit:

http://svn.freebsd.org/changeset/base/228709

Part of the test: Remove any custom rc.d scripts which load/unload ehci/ohci/uhci/xhci during suspend and resume.

--HPS
Re: a few usb issues related to edge cases
First replying just to a couple of points where there seems to be a misunderstanding.

on 19/12/2011 16:30 Hans Petter Selasky said the following:
>> 2. Somewhat related to the above. I think that because the USB subsystem implements the shutdown method and detaches all its drivers, then the ukbd driver won't be able to properly handle the 'shutdown -h' case where the kernel asks to press any key to reboot at the end. Depending on which thread wins the race (the one that executes the mainline shutdown code or the USB explore thread that detaches USB devices) there will be either an immediate reboot or a later crash when any key is pressed. This is not critical, but OTOH perhaps the USB subsystem doesn't have to do the shutdown. As far as I can see a lot of the drivers just do nothing for the shutdown, for better or for worse. A side note: perhaps it would be a good idea to pass the 'how' value as an additional parameter to device_shutdown.
>
> The shutdown of USB is done to give USB devices a last chance to turn off or reduce their current consumption. In the old code the host controller itself would be disabled, so the keyboard wouldn't have worked the way you suggest, I believe.

I am not sure about the old code, I have never checked it. But the atkbd definitely works at this stage.

> BTW: Shutdown should be executed after any "Press any key to reboot." and shutdown should be given time to complete, hence for USB this needs to happen in sync with the rest of the USB system.

Have you actually ever done shutdown -h? In other words, do you know what the system halt is? :)

I am not sure if it would be a good idea to declare a system as halted before shutdown_final hooks are executed. I would rather sacrifice the whole "press a key" interactivity and simply execute hlt. That's because I think that the system halt has a very limited usage, mostly in combination with UPS, where interactivity via console/keyboard is not very important.

BTW, the reason that I suggested to pass 'how' to device_shutdown is to give drivers some choice. E.g. USB could do the whole shutdown thing for the cases of poweroff and reboot, but keep the devices going for halt.

But probably right now we just need to make a decision whether ukbd is going to support system halt or not. If not, then I think that usb_shutdown() must wait until the explore_proc terminates. If yes, then usb_shutdown() should become a noop. Or it could become quite smart to detach/poweroff other devices in such a way that ukbd still stays usable. But that's probably harder to implement.

[snip]
>> As a side note: we probably need a flag to mark certain things such as e.g. the ukbd driver as non-recoverable, meaning that once those are used in the kdb context then there is no safe way to go back to normal system operation.
>
> I think you need to do shutdown _after_ the "Press any key to reboot". A flag won't help.

Umm, this suggestion was about entering and exiting KDB/DDB, not about shutdown/reboot.

P.S. I've just looked at the code in stable/7 and it seems that it didn't actually unconfigure USB and detach device drivers. At least ohci_shutdown and ohci_shutdown are not called on FreeBSD.

--
Andriy Gapon
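The choice discussed above (full USB shutdown for poweroff and reboot, devices kept alive for halt so that ukbd still works) can be written down as a small decision function. This is only an illustration in Python, not kernel code: the `How` enum, `usb_shutdown_plan`, the `ukbd_supports_halt` flag and the returned strings are hypothetical names made up for this sketch, and passing 'how' to device_shutdown is, per the message, a proposal rather than an existing interface.

```python
from enum import Enum


class How(Enum):
    """Hypothetical stand-in for the 'how' value of a shutdown request."""
    REBOOT = "reboot"
    POWEROFF = "poweroff"
    HALT = "halt"


def usb_shutdown_plan(how: How, ukbd_supports_halt: bool) -> str:
    """Return what a USB shutdown method might do for a given 'how'.

    "noop"            -> leave devices attached so ukbd keeps working
    "detach-and-wait" -> detach devices and wait for the explore thread
    """
    # Only a halt with keyboard support keeps the devices going;
    # poweroff and reboot always get the whole shutdown treatment.
    if how is How.HALT and ukbd_supports_halt:
        return "noop"
    return "detach-and-wait"


if __name__ == "__main__":
    for how in How:
        print(how.value, "->", usb_shutdown_plan(how, ukbd_supports_halt=True))
```

The "smart" third option from the message (detach everything except what ukbd needs) is deliberately left out, since the thread itself calls it harder to implement.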
Re: a few usb issues related to edge cases
On Monday 19 December 2011 16:06:13 Andriy Gapon wrote:
> First replying just to a couple of points where there seems to be a misunderstanding.
>
> on 19/12/2011 16:30 Hans Petter Selasky said the following:
>>> 2. Somewhat related to the above. I think that because the USB subsystem implements the shutdown method and detaches all its drivers, then the ukbd driver won't be able to properly handle the 'shutdown -h' case where the kernel asks to press any key to reboot at the end. Depending on which thread wins the race (the one that executes the mainline shutdown code or the USB explore thread that detaches USB devices) there will be either an immediate reboot or a later crash when any key is pressed. This is not critical, but OTOH perhaps the USB subsystem doesn't have to do the shutdown. As far as I can see a lot of the drivers just do nothing for the shutdown, for better or for worse. A side note: perhaps it would be a good idea to pass the 'how' value as an additional parameter to device_shutdown.
>>
>> The shutdown of USB is done to give USB devices a last chance to turn off or reduce their current consumption. In the old code the host controller itself would be disabled, so the keyboard wouldn't have worked the way you suggest, I believe.
>
> I am not sure about the old code, I have never checked it. But the atkbd definitely works at this stage.

ATKBD is no comparison to UKBD :-)

>> BTW: Shutdown should be executed after any "Press any key to reboot." and shutdown should be given time to complete, hence for USB this needs to happen in sync with the rest of the USB system.
>
> Have you actually ever done shutdown -h? In other words, do you know what the system halt is? :)

No, I usually do shutdown -p now.

> I am not sure if it would be a good idea to declare a system as halted before shutdown_final hooks are executed. I would rather sacrifice the whole "press a key" interactivity and simply execute hlt. That's because I think that the system halt has a very limited usage, mostly in combination with UPS, where interactivity via console/keyboard is not very important.
>
> BTW, the reason that I suggested to pass 'how' to device_shutdown is to give drivers some choice. E.g. USB could do the whole shutdown thing for the cases of poweroff and reboot, but keep the devices going for halt.

I see.

> But probably right now we just need to make a decision whether ukbd is going to support system halt or not. If not, then I think that usb_shutdown() must wait until the explore_proc terminates. If yes, then usb_shutdown() should become a noop. Or it could become quite smart to detach/poweroff other devices in such a way that ukbd still stays usable. But that's probably harder to implement.

I will fix that. I see a missing wait there. Can I assume that we are allowed to sleep from device_shutdown() and that system timers still work?

> P.S. I've just looked at the code in stable/7 and it seems that it didn't actually unconfigure USB and detach device drivers. At least ohci_shutdown and ohci_shutdown are not called on FreeBSD.

Hmm.

--HPS
Re: Uneven load on drives in ZFS RAIDZ1
On 12/19/2011 03:22 PM, Stefan Esser wrote:
> Hi ZFS users,
>
> for quite some time I have observed an uneven distribution of load between drives in a 4 * 2TB RAIDZ1 pool. The following is an excerpt of a longer log of 10 second averages logged with gstat:
>
> dT: 10.001s  w: 10.000s  filter: ^a?da?.$
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>     0    130    106   4134    4.5     23   1033    5.2    48.8| ada0
>     0    131    111   3784    4.2     19   1007    4.0    47.6| ada1
>     0     90     66   2219    4.5     24   1031    5.1    31.7| ada2
>     1     81     58   2007    4.6     22   1023    2.3    28.1| ada3
>
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>     1    132    104   4036    4.2     27   1129    5.3    45.2| ada0
>     0    129    103   3679    4.5     26   1115    6.8    47.6| ada1
>     1     91     61   2133    4.6     30   1129    1.9    29.6| ada2
>     0     81     56   1985    4.8     24   1102    6.0    29.4| ada3
>
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>     1    148    108   4084    5.3     39   2511    7.2    55.5| ada0
>     1    141    104   3693    5.1     36   2505   10.4    54.4| ada1
>     1    102     62   2112    5.6     39   2508    5.5    35.4| ada2
>     0     99     60   2064    6.0     39   2483    3.7    36.1| ada3
> ...
> So: Can anybody reproduce this distribution of requests?

I don't have a raidz1 machine, and no time to make you a special raidz1 pool out of spare disks, but on my raidz2 I can only ever see unevenness when a disk is bad, or between different vdevs. But you only have one vdev.

Check that your disks are identical (are they? we can only assume so since you didn't say so). Show us output from:

    smartctl -i /dev/ada0
    smartctl -i /dev/ada1
    smartctl -i /dev/ada2
    smartctl -i /dev/ada3

Since your tests show read ms/r to be pretty even, I guess your disks are not broken. But the ms/w is slightly different. So it seems that the first 2 disks are slower for writing (someone once said that refurbished disks are like this, even if identical), or the hard disk controller ports they use are slower. For example, maybe your motherboard has 6 ports, and you plugged disks 1,2,3 into ports 1,2,3 and disk 4 into port 5. Disks 3 and 4 would have their own channel, but disks 1 and 2 share one.

So if the disks are identical, I would guess your hard disk controller is to blame. To test this, first back it up. Then *fix your setup by using labels*, i.e. use gpt/somelabel0 or gptid/... rather than ada0p2. Check ls /dev/gpt* output for options on what labels you have already. Then try swapping disks around to see if the load changes. Make sure to back up...

Swapping disks (or even removing one depending on controller, etc. when it fails) without labels can be bad. E.g. you have ada1 ada2 ada3 ada4. Someone spills coffee on ada2; it fries and cannot be detected anymore, and you reboot. Now you have ada1 ada2 ada3. Then things are usually still fine (even though ada3 is now ada2 and ada4 is now ada3, because there is some zfs superblock stuff to keep track of things), but if you also had an ada5 that was not part of the pool, or was a spare or a log or something other than another disk in the same vdev as ada1, etc., bad things happen when it becomes ada4. Unfortunately, I don't know exactly what people do to cause the bad things that happen. When this happened to me, it just said my pool was faulted or degraded or something, and set a disk or two to UNAVAIL or FAULTED. I don't remember it automatically resilvering them, but when I read about these problems, it seems like some disks were resilvered afterwards.

And the last thing I can think of is to make sure your partitions are aligned, and identical. Show us output from:

    gpart show

> Any idea, why this is happening and whether something should be changed in ZFS to better distribute the load (leading to higher file system performance)?
>
> Best regards, STefan

--
Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.malo...@brockmann-consult.de
Internet: http://www.brockmann-consult.de
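To put numbers on the "ms/r pretty even, ms/w slightly different" reading above, here is a minimal sketch that averages the two latency columns per disk over the three gstat samples quoted in this thread. The figures are copied from the report; the names `SAMPLES` and `mean_latencies` are made up for this example, and three 10 second samples are far too few to prove anything about the hardware.

```python
# (ms/r, ms/w) per disk for the three 10 second gstat samples in the report.
SAMPLES = {
    "ada0": [(4.5, 5.2), (4.2, 5.3), (5.3, 7.2)],
    "ada1": [(4.2, 4.0), (4.5, 6.8), (5.1, 10.4)],
    "ada2": [(4.5, 5.1), (4.6, 1.9), (5.6, 5.5)],
    "ada3": [(4.6, 2.3), (4.8, 6.0), (6.0, 3.7)],
}


def mean_latencies(samples):
    """Return {disk: (mean ms/r, mean ms/w)} over all samples of that disk."""
    result = {}
    for disk, rows in samples.items():
        reads = [ms_r for ms_r, _ in rows]
        writes = [ms_w for _, ms_w in rows]
        result[disk] = (sum(reads) / len(reads), sum(writes) / len(writes))
    return result


if __name__ == "__main__":
    for disk, (ms_r, ms_w) in mean_latencies(SAMPLES).items():
        print(f"{disk}: mean ms/r = {ms_r:.2f}, mean ms/w = {ms_w:.2f}")
```

On these samples the mean read latencies stay within about half a millisecond of each other, while the mean write latencies of ada0 and ada1 come out higher than those of ada2 and ada3. The individual ms/w values jump around a lot between samples, so this is a hint at best.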
Re: SCHED_ULE should not be the default
On 12/18/11 04:34, Adrian Chadd wrote:
> The trouble is that there's lots of anecdotal evidence, but no one's really gone digging deep into _their_ example of why it's broken. The developers who know this stuff don't see anything wrong. That hints to me it may be something a little more creepy - as an example, the interplay between netisr/swi/taskqueue/callbacks and such. It may be that something is being starved that isn't obviously obvious. It's just a stab in the dark, but it sounds somewhat plausible based on what I've seen ULE do in my network throughput hacking.
>
> I applaud reppie for trying to make it as easy as possible for people to use KTR to provide scheduler traces for him to go digging with, so please, if you have these issues and you can absolutely reproduce them, please follow his instructions and work with him to get him what he needs.

The thing I've seen is that ULE is substantially more enthusiastic about migrating processes between cores than 4BSD. Often, this is a good thing, but it can increase the rate of cache misses, hurting performance for cache-bound processes (I see this particularly in HPC-type scientific workloads). It might be interesting to add some kind of tunable here.

Another more interesting and slightly longer-term possibility, if someone wants a project, would be to integrate scheduling decisions with hwpmc counters, to accumulate statistics on cache hits at each context switch and preferentially keep processes with a high hits/misses ratio on the same thread/cache domain relative to processes with a low one.

-Nathan

P.S. The other thing that could be very interesting from a research and scheduling standpoint would be to integrate heterogeneous SMP support into the operating system, with a FreeBSD-4 Application Processor syscall model. We seem to be going down the road where GPGPU computing has MMUs, timer interrupts, IPIs, etc. (the next AMD Fusions, IBM Cell), as well as potential systems with both x86 and ARM cores. This is something that no operating system currently supports well, and would be a place for BSD to shine. If anyone has a free graduate student...
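The hwpmc idea above (prefer to keep processes with a high cache hit ratio where they are) boils down to a small policy function. The following is a toy sketch in Python under loud assumptions: `ProcStats`, `should_stay` and the 0.9 threshold are invented for illustration, the counters are plain integers rather than anything read from hwpmc, and the ratio used is hits / (hits + misses) instead of hits/misses so that a process with no samples does not cause a division by zero. Nothing here reflects how ULE actually makes migration decisions.

```python
from dataclasses import dataclass


@dataclass
class ProcStats:
    """Hypothetical per-process counters accumulated at context switches."""
    name: str
    cache_hits: int
    cache_misses: int

    def hit_ratio(self) -> float:
        """Fraction of cache accesses that hit; 0.0 if nothing was sampled."""
        total = self.cache_hits + self.cache_misses
        return self.cache_hits / total if total else 0.0


def should_stay(proc: ProcStats, threshold: float = 0.9) -> bool:
    """Toy policy: avoid migrating processes that use the cache well."""
    return proc.hit_ratio() >= threshold


if __name__ == "__main__":
    procs = [
        ProcStats("solver", cache_hits=950, cache_misses=50),
        ProcStats("stream", cache_hits=400, cache_misses=600),
    ]
    for p in procs:
        verdict = "keep on current core" if should_stay(p) else "free to migrate"
        print(f"{p.name}: hit ratio {p.hit_ratio():.2f} -> {verdict}")
```

A real implementation would have to weigh this against load balancing; a threshold like this is the kind of knob the "some kind of tunable" remark points at.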
Re: r228700 can't dhclient em0
On Dec 19, 2011, at 5:24 AM, Dimitry Andric d...@freebsd.org wrote:
> On 2011-12-19 10:17, Doug Barton wrote:
>> I updated to r228700 from 228122 and dhclient exits immediately saying that em0 doesn't exist. However ifconfig seems to disagree:
>>
>> em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>      options=4219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC,VLAN_HWTSO>
>>      ether 00:24:e8:30:10:9b
>>      nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>>      media: Ethernet autoselect (100baseTX <full-duplex>)
>>      status: active
>> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
>>      options=3<RXCSUM,TXCSUM>
>>      nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
>>
>> Interestingly, some of the options are different in that version, vs. the working version:
>>
>> em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>      options=219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC>
>>      ether 00:24:e8:30:10:9b
>>      inet 172.17.198.245 netmask 0x broadcast 172.17.255.255
>>      nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>>      media: Ethernet autoselect (100baseTX <full-duplex>)
>>      status: active
>
> I saw this too, when my kernel and userland were out of sync (e.g. just after installing a new kernel, and before installworld). I suspect it is caused by the changes in r228571, which cause old ifconfig and dhclient to not recognize any interfaces. I'm not 100% sure though.

This makes sense because the structs that describe addresses changed recently.

-Garrett
Re: Uneven load on drives in ZFS RAIDZ1
Hi,

a quick test using `dd if=/dev/zero of=/test ...` shows:

dT: 10.004s  w: 10.000s  filter: ^a?da?.$
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0    378      0      0   12.5    376  36414   11.9    60.6| ada0
    0    380      0      0   12.2    378  36501   11.8    60.0| ada1
    0    382      0      0    7.7    380  36847   11.6    59.2| ada2
    0    375      0      0    7.4    374  36164    9.6    51.3| ada3
    0    377      0      1   10.2    375  36325   10.1    53.3| ada4
   10    391      0      0   39.3    389  38064   15.7    80.2| ada5

Seems to be sufficiently equally distributed for a live system...

zpool status shows:
...
        NAME        STATE     READ WRITE CKSUM
        boot        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada0p3  ONLINE       0     0     0
            ada1p3  ONLINE       0     0     0
            ada2p3  ONLINE       0     0     0
            ada3p3  ONLINE       0     0     0
            ada4p3  ONLINE       0     0     0
            ada5p3  ONLINE       0     0     0
...

The only cases where I've seen (and expected to see) unequal load distributions on ZFS were after extending a nearly full four disk mirror pool by an additional two disks.

Bye/2
---
Michael Reifenberger
mich...@reifenberger.com
http://www.Reifenberger.com
Re: Uneven load on drives in ZFS RAIDZ1
In the last episode (Dec 19), Stefan Esser said:
> for quite some time I have observed an uneven distribution of load between drives in a 4 * 2TB RAIDZ1 pool. The following is an excerpt of a longer log of 10 second averages logged with gstat:
>
> dT: 10.001s  w: 10.000s  filter: ^a?da?.$
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>     0    130    106   4134    4.5     23   1033    5.2    48.8| ada0
>     0    131    111   3784    4.2     19   1007    4.0    47.6| ada1
>     0     90     66   2219    4.5     24   1031    5.1    31.7| ada2
>     1     81     58   2007    4.6     22   1023    2.3    28.1| ada3
> [...]
> zpool status -v
>   pool: raid1
>  state: ONLINE
>   scan: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         raid1       ONLINE       0     0     0
>           raidz1-0  ONLINE       0     0     0
>             ada0p2  ONLINE       0     0     0
>             ada1p2  ONLINE       0     0     0
>             ada2p2  ONLINE       0     0     0
>             ada3p2  ONLINE       0     0     0

Any read from your raidz device will hit three disks (the checksum is applied across the stripe, not on each block, so a full stripe is always read) so I think your extra IOs are coming from somewhere else. What's on p1 on these disks? Could that be the cause of your extra I/Os? Does "zpool iostat -v 10" give you even numbers across all disks?

--
Dan Nelson
dnel...@allantgroup.com
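If reads really were spread evenly over the vdev, each of the four disks would see about a quarter of them. As a rough check of that expectation against the report, the sketch below sums the r/s column over the three quoted gstat samples and prints each disk's share. The numbers are copied from the original post; `READS_PER_SEC` and `read_shares` are names invented for this example, and the result says nothing about where the extra reads come from.

```python
# r/s per disk in the three 10 second gstat samples from the original post.
READS_PER_SEC = {
    "ada0": [106, 104, 108],
    "ada1": [111, 103, 104],
    "ada2": [66, 61, 62],
    "ada3": [58, 56, 60],
}


def read_shares(reads):
    """Return {disk: fraction of all read requests} summed over the samples."""
    totals = {disk: sum(values) for disk, values in reads.items()}
    grand_total = sum(totals.values())
    return {disk: total / grand_total for disk, total in totals.items()}


if __name__ == "__main__":
    even_share = 1 / len(READS_PER_SEC)
    for disk, share in read_shares(READS_PER_SEC).items():
        print(f"{disk}: {share:.1%} of reads (even split would be {even_share:.0%})")
```

On these three samples ada0 and ada1 each carry a little under a third of the reads and ada2 and ada3 each under a fifth, which is the imbalance the thread is trying to explain.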
Re: Uneven load on drives in ZFS RAIDZ1
On Mon, 19 Dec 2011, Peter Maloney wrote:

Swapping disks (or even removing one depending on controller, etc. when it fails) without labels can be bad. eg.

Since ZFS uses (and searches for) its own UUID partition signatures, disk swapping shouldn't matter as long as enough disks are found. Set vfs.zfs.debug=1 during boot to watch what is searched for.

Bye/2
---
Michael Reifenberger
mich...@reifenberger.com
http://www.Reifenberger.com
Re: Uneven load on drives in ZFS RAIDZ1
On Mon, Dec 19, 2011 at 6:22 AM, Stefan Esser s...@freebsd.org wrote:

Hi ZFS users,

for quite some time I have observed an uneven distribution of load between drives in a 4 * 2TB RAIDZ1 pool. The following is an excerpt of a longer log of 10 second averages logged with gstat:

dT: 10.001s  w: 10.000s  filter: ^a?da?.$
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0    130    106   4134    4.5     23   1033    5.2   48.8| ada0
    0    131    111   3784    4.2     19   1007    4.0   47.6| ada1
    0     90     66   2219    4.5     24   1031    5.1   31.7| ada2
    1     81     58   2007    4.6     22   1023    2.3   28.1| ada3

 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    1    132    104   4036    4.2     27   1129    5.3   45.2| ada0
    0    129    103   3679    4.5     26   1115    6.8   47.6| ada1
    1     91     61   2133    4.6     30   1129    1.9   29.6| ada2
    0     81     56   1985    4.8     24   1102    6.0   29.4| ada3

 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    1    148    108   4084    5.3     39   2511    7.2   55.5| ada0
    1    141    104   3693    5.1     36   2505   10.4   54.4| ada1
    1    102     62   2112    5.6     39   2508    5.5   35.4| ada2
    0     99     60   2064    6.0     39   2483    3.7   36.1| ada3

This suggests (note that I said suggests) that there might be a slight difference in the data path speeds or physical media as someone else suggested; look at zpool iostat -v interval though before making a firm statement as to whether or not a drive is truly not performing to your assumed spec. gstat and zpool iostat -v suggest performance though -- they aren't the end-all-be-all for determining drive performance.

If the latency numbers were high enough, I would suggest dd'ing out to the individual drives (i.e. remove the drive from the RAIDZ) to see if there's a noticeable discrepancy, as this can indicate a bad cable, backplane, or drive; from there I would start doing the physical swap routine and see if the issue moves with the drive or stays static with the controller channel and/or chassis slot.

Cheers,
-Garrett
Re: a few usb issues related to edge cases
on 19/12/2011 17:11 Hans Petter Selasky said the following:

I will fix that. I see a missing wait there. Can I assume that we are allowed to sleep from device_shutdown() and that system timers still work?

I don't see any reason why either of these should not be true. Oh, and I see that you've already committed the change - thanks!

--
Andriy Gapon
Re: Uneven load on drives in ZFS RAIDZ1
I have observed similar behavior, even more extreme, on a zpool with dedup enabled. Is dedup enabled on this zpool? It might be that the DDT tables somehow end up unevenly distributed across the disks. My observation was on a 6 disk raidz2.

Daniel
Re: Uneven load on drives in ZFS RAIDZ1
On 19.12.2011 15:36, Olivier Smedts wrote:

2011/12/19 Stefan Esser s...@freebsd.org:

So: Can anybody reproduce this distribution of requests?

Hello,

Stupid question, but are your drives all exactly the same? I noticed ashift: 12 so I think you should have at least one 4k-sector drive, are you sure they're not mixed with 512B per sector drives?

All drives are identical:

SAMSUNG HD204UI 1AQ10001 at scbus3 target 0 lun 0 (ada0,pass2)
SAMSUNG HD204UI 1AQ10001 at scbus4 target 0 lun 0 (ada1,pass3)
SAMSUNG HD204UI 1AQ10001 at scbus5 target 0 lun 0 (ada2,pass4)
SAMSUNG HD204UI 1AQ10001 at scbus6 target 0 lun 0 (ada3,pass5)

These are 4KB sector drives. Everything is correctly aligned and all drives have identical partitions (created by a script that was run once for each drive, so there is no risk of typos leading to differences).

Regards, STefan
Re: Uneven load on drives in ZFS RAIDZ1
On 19.12.2011 16:42, Peter Maloney wrote:

On 12/19/2011 03:22 PM, Stefan Esser wrote:

So: Can anybody reproduce this distribution of requests?

I don't have a raidz1 machine, and no time to make you a special raidz1 pool out of spare disks, but on my raidz2 I can only ever see unevenness when a disk is bad, or between different vdevs. But you only have one vdev.

Thanks for replying.

In my previous raidz1 pool consisting of 3*1TB, one of the drives had to be replaced because it showed lots of recoverable errors when I initially created the pool. The effects were much more drastic than what I see now: given identical request rates, the failed drive was 100% busy when the other drives had busy percentages in the single-digit range.

But the observed differences seem to be caused by a different rate of read requests issued towards the drives (the first two receive 30% of the reads each, while the last two receive 20% each). And this ratio has been stable over months (I had already noticed this in summer, but did not have time to start a thread at that time).

Check that your disks are identical (are they? we can only assume so since you didn't say so).

Yes, all 4 are identical.
Show us output from:

smartctl -i /dev/ada0

Model Family:     SAMSUNG SpinPoint F4 EG (AFT)
Device Model:     SAMSUNG HD204UI
Serial Number:    S2H7JD1B116957
LU WWN Device Id: 5 0024e9 0049bee63
Firmware Version: 1AQ10001
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 6
Local Time is:    Mon Dec 19 19:23:36 2011 CET

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0026   252   252   000    Old_age   Always       -       0
  3 Spin_Up_Time            0x0023   067   067   025    Pre-fail  Always       -       10127
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       254
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       2300
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       1
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       228
181 Program_Fail_Cnt_Total  0x0022   100   100   000    Old_age   Always       -       621067
191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       4
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   064   055   000    Old_age   Always       -       28 (Min/Max 15/48)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   252   252   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       2
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       1
225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       264

smartctl -i /dev/ada1

Model Family:     SAMSUNG SpinPoint F4 EG (AFT)
Device Model:     SAMSUNG HD204UI
Serial Number:    S2H7JD1B116947
LU WWN Device Id: 5 0024e9 0049bee49
Firmware Version: 1AQ10001
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 6
Local Time is:    Mon Dec 19 19:23:22 2011 CET

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0026   252   252   000    Old_age   Always       -       0
  3 Spin_Up_Time            0x0023   067   067   025    Pre-fail  Always       -       10096
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       255
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8
Re: WITHOUT_PROFILE=yes by default
On Dec 2, 2011, at 9:52 AM, Lyndon Nerenberg wrote:

Using profiled libs and gprof to profile your code has been obsolete in FreeBSD on i386 and amd64 for over six years now.

Funny, it still seems to work on my systems.

Worked for me last time I tried as well. I was able to find the problems w/o a hassle. Turning them off is plain wrong. Can we at least ship profiled libraries for the release?

Warner
Re: WITHOUT_PROFILE=yes by default
On Dec 2, 2011, at 3:37 PM, Steve Kargl wrote: On Fri, Dec 02, 2011 at 04:21:14PM +0700, Max Khon wrote: The most important thing is to have reasonable defaults. Having WITH_PROFILE by default does not seem to be a reasonable default to me. Now all users that want to profile anything need to build their own custom FreeBSD? That seems even more nuts to me. Warner ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: VM images for FreeBSD
Hi, Hm, so this lets us create a virtualbox image from what, a set of install tarballs? Or /usr/src build? Adrian ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: VM images for FreeBSD
2011/12/19 Adrian Chadd adr...@freebsd.org

Hi,

Hm, so this lets us create a virtualbox image from what, a set of install tarballs? Or /usr/src build?

I'm using a cross-build and installation from the sources dir (which is then svn-up'ed, and everything runs again). It shouldn't be complex to install to an image from installation media and/or tarballs, but my main idea is to have a rolling image for running some automated tests. Currently I'm setting up a scheme for building and providing images; I will do images with KMS + small graphical programs, with qt + unstable KDE, and probably with BHyVe. I think those are the most useful setups currently. And maybe some image for benchmarking :)

Adrian

--
Regards,
Alexander Yerenkow
Re: Uneven load on drives in ZFS RAIDZ1
On 19.12.2011 17:22, Dan Nelson wrote:

In the last episode (Dec 19), Stefan Esser said:

for quite some time I have observed an uneven distribution of load between drives in a 4 * 2TB RAIDZ1 pool. The following is an excerpt of a longer log of 10 second averages logged with gstat:

dT: 10.001s  w: 10.000s  filter: ^a?da?.$
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0    130    106   4134    4.5     23   1033    5.2   48.8| ada0
    0    131    111   3784    4.2     19   1007    4.0   47.6| ada1
    0     90     66   2219    4.5     24   1031    5.1   31.7| ada2
    1     81     58   2007    4.6     22   1023    2.3   28.1| ada3

[...]

zpool status -v
  pool: raid1
 state: ONLINE
  scan: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        raid1       ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada0p2  ONLINE       0     0     0
            ada1p2  ONLINE       0     0     0
            ada2p2  ONLINE       0     0     0
            ada3p2  ONLINE       0     0     0

Any read from your raidz device will hit three disks (the checksum is applied across the stripe, not on each block, so a full stripe is always read) so I think your extra IOs are coming from somewhere else. What's on p1 on these disks? Could that be the cause of your extra I/Os? Does zpool iostat -v 10 give you even numbers across all disks?

This is a ZFS only system. The first partition on each drive holds just the gptzfsloader.

pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
raid1       4.41T  2.21T    139     72  12.3M   818K
  raidz1    4.41T  2.21T    139     72  12.3M   818K
    ada0p2      -      -    114     17  4.24M   332K
    ada1p2      -      -    106     15  3.82M   305K
    ada2p2      -      -     65     20  2.09M   337K
    ada3p2      -      -     58     18  2.18M   329K

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
raid1       4.41T  2.21T    150     45  12.8M   751K
  raidz1    4.41T  2.21T    150     45  12.8M   751K
    ada0p2      -      -    113     14  4.34M   294K
    ada1p2      -      -    111     14  3.94M   277K
    ada2p2      -      -     62     16  2.23M   294K
    ada3p2      -      -     68     14  2.32M   277K
----------  -----  -----  -----  -----  -----  -----

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
raid1       4.41T  2.21T    157     86  12.3M  6.41M
  raidz1    4.41T  2.21T    157     86  12.3M  6.41M
    ada0p2      -      -    119     39  4.21M  2.24M
    ada1p2      -      -    106     31  3.78M  2.21M
    ada2p2      -      -     81     59  2.23M  2.23M
    ada3p2      -      -     57     39  2.06M  2.22M
----------  -----  -----  -----  -----  -----  -----

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
raid1       4.41T  2.21T    187     45  14.2M  1.04M
  raidz1    4.41T  2.21T    187     45  14.2M  1.04M
    ada0p2      -      -    117     13  4.27M   398K
    ada1p2      -      -    120     12  4.01M   384K
    ada2p2      -      -     89     12  2.97M   403K
    ada3p2      -      -     85     13  2.91M   386K
----------  -----  -----  -----  -----  -----  -----

The same difference of read operations per second as shown by gstat ...

Regards, STefan
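As a rough cross-check of the imbalance, the per-device read operations from the first `zpool iostat -v 10` sample above can be turned into shares of the total. This is only a minimal sketch: the numbers are copied from that sample, while the helper name `read_shares` is made up for illustration.

```python
# Read operations per second, copied from the first `zpool iostat -v 10`
# sample in the message above.
reads_per_sec = {"ada0p2": 114, "ada1p2": 106, "ada2p2": 65, "ada3p2": 58}


def read_shares(ops):
    """Return each device's share of the total read operations, in percent."""
    total = sum(ops.values())
    return {dev: 100.0 * n / total for dev, n in ops.items()}


shares = read_shares(reads_per_sec)
for dev, pct in shares.items():
    print(f"{dev}: {pct:.1f}%")
```

For this one sample the first two partitions each take roughly a third of the reads and the last two less than a fifth each, which is in the same ballpark as the 30% / 30% / 20% / 20% ratio described in the thread.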
Re: Uneven load on drives in ZFS RAIDZ1
On 19.12.2011 17:48, Michael Reifenberger wrote:

On Mon, 19 Dec 2011, Peter Maloney wrote:

Swapping disks (or even removing one depending on controller, etc. when it fails) without labels can be bad. eg.

Since ZFS uses (and searches for) its own UUID partition signatures, disk swapping shouldn't matter as long as enough disks are found. Set vfs.zfs.debug=1 during boot to watch what is searched for.

Bye/2
---
Michael Reifenberger
mich...@reifenberger.com
http://www.Reifenberger.com

Thanks for the info. But I am confused by it, because when my disks moved around randomly on reboot, it really did mess things up. The first few times it happened, there was no issue, but when a spare took the place of a pool disk, it messed things up. I can see the UUIDs when I look at zdb output, so I really have no idea why it messed things up. ... but it did, so I will always caution people anyway. I can't point you to any relevant lines of code that cause the problem, but I know it can happen... and it will when you least expect it. ;)

And I also see the opposite... people talking about their very old pools, with many disks exchanged, and I wonder why mine was so easily messed up and theirs survived so long without labels. I just assumed it was the way the controller arranged the disks. (And by the way, mine orders the disks perfectly consistently now that it is in IT mode, not mostly randomly like before... could be a factor.)

I am always very busy, but when I get the chance (it shouldn't take too long) I will try to recreate it on a virtual machine and try vfs.zfs.debug=1. Thanks for the suggestion.
Re: Uneven load on drives in ZFS RAIDZ1
On 19.12.2011 17:36, Michael Reifenberger wrote:

Hi, a quick test using `dd if=/dev/zero of=/test ...` shows:

dT: 10.004s  w: 10.000s  filter: ^a?da?.$
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0    378      0      0   12.5    376  36414   11.9   60.6| ada0
    0    380      0      0   12.2    378  36501   11.8   60.0| ada1
    0    382      0      0    7.7    380  36847   11.6   59.2| ada2
    0    375      0      0    7.4    374  36164    9.6   51.3| ada3
    0    377      0      1   10.2    375  36325   10.1   53.3| ada4
   10    391      0      0   39.3    389  38064   15.7   80.2| ada5

Thanks! There are surprising differences (ada5 has a queue length of 10 and much higher latency than the other drives).

Seems to be sufficiently equally distributed for a live system...

Hmmm, 50%-55% busy on ada3 and ada4 contrasts with 80% busy on ada5.

zpool status shows:
...
        NAME        STATE     READ WRITE CKSUM
        boot        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada0p3  ONLINE       0     0     0
            ada1p3  ONLINE       0     0     0
            ada2p3  ONLINE       0     0     0
            ada3p3  ONLINE       0     0     0
            ada4p3  ONLINE       0     0     0
            ada5p3  ONLINE       0     0     0
...

The only case where I've seen (and expected to see) an unequal load distribution on ZFS was after extending a nearly full four-disk mirror pool by two additional disks.

In my case the pool was created in its current configuration from disk drives with nearly identical serial numbers. Some of the drives have a few more power-on hours, since I performed some tests with them before moving all data from the old pool to the new one, but otherwise everything should be symmetric.

Best regards, STefan
Re: Uneven load on drives in ZFS RAIDZ1
On 19.12.2011 18:05, Garrett Cooper wrote:

On Mon, Dec 19, 2011 at 6:22 AM, Stefan Esser s...@freebsd.org wrote:

Hi ZFS users,

for quite some time I have observed an uneven distribution of load between drives in a 4 * 2TB RAIDZ1 pool. The following is an excerpt of a longer log of 10 second averages logged with gstat:

dT: 10.001s  w: 10.000s  filter: ^a?da?.$
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0    130    106   4134    4.5     23   1033    5.2   48.8| ada0
    0    131    111   3784    4.2     19   1007    4.0   47.6| ada1
    0     90     66   2219    4.5     24   1031    5.1   31.7| ada2
    1     81     58   2007    4.6     22   1023    2.3   28.1| ada3

 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    1    132    104   4036    4.2     27   1129    5.3   45.2| ada0
    0    129    103   3679    4.5     26   1115    6.8   47.6| ada1
    1     91     61   2133    4.6     30   1129    1.9   29.6| ada2
    0     81     56   1985    4.8     24   1102    6.0   29.4| ada3

 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    1    148    108   4084    5.3     39   2511    7.2   55.5| ada0
    1    141    104   3693    5.1     36   2505   10.4   54.4| ada1
    1    102     62   2112    5.6     39   2508    5.5   35.4| ada2
    0     99     60   2064    6.0     39   2483    3.7   36.1| ada3

This suggests (note that I said suggests) that there might be a slight difference in the data path speeds or physical media as someone else suggested; look at zpool iostat -v interval though before making a firm statement as to whether or not a drive is truly not performing to your assumed spec. gstat and zpool iostat -v suggest performance though -- they aren't the end-all-be-all for determining drive performance.

I doubt there is a difference in the data path speeds, since all drives are connected to the SATA II ports of an Intel H67 chip. The drives seem to perform equally well, just with a ratio of read requests of 30% / 30% / 20% / 20% for ada0 .. ada3. But neither queue length nor command latencies indicate a problem or differences in the drives. It seems that a different number of commands is scheduled for 2 of the 4 drives, compared to the other 2, and that scheduling should be part of the ZFS code.

I'm quite convinced that neither the drives nor the other hardware plays a role, but I'll follow the suggestion to swap drives between controller ports and to observe whether the increased read load moves with the drives (indicating something on disk causes the anomaly) or stays with the SATA ports (indicating that lower numbered ports see higher load).

If the latency numbers were high enough, I would suggest dd'ing out to the individual drives (i.e. remove the drive from the RAIDZ) to see if there's a noticeable discrepancy, as this can indicate a bad cable, backplane, or drive; from there I would start doing the physical swap routine and see if the issue moves with the drive or stays static with the controller channel and/or chassis slot.

I do not expect a hardware problem, since command latencies are very similar over all drives, despite the higher read load on some of them. These are busier by exactly the factor to be expected from the higher command rate alone.

But it seems that others do not observe the asymmetric distribution of requests, which makes me wonder whether I happen to have metadata arranged in such a way that it is always read from ada0 or ada1, but not (or rarely) from ada2 or ada3. That could explain it, including the fact that raidz1 pools over other numbers of drives (e.g. 3 or 6) apparently show a much more symmetric distribution of read requests.

Regards, STefan
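The claim that the busier drives are busy "by exactly the factor to be expected" can be checked roughly by dividing %busy by ops/s for each drive. The following is a minimal sketch using the first gstat sample quoted in this thread; the helper name `busy_per_op` is invented for illustration, and a single 10 second sample is of course only weak evidence.

```python
# (ops/s, %busy) per drive, from the first gstat sample quoted in this thread.
samples = {
    "ada0": (130, 48.8),
    "ada1": (131, 47.6),
    "ada2": (90, 31.7),
    "ada3": (81, 28.1),
}


def busy_per_op(data):
    """Percent busy per operation per second, for each drive."""
    return {dev: busy / ops for dev, (ops, busy) in data.items()}


ratios = busy_per_op(samples)

# How far apart the drives are in cost per operation versus in request rate.
cost_spread = max(ratios.values()) / min(ratios.values())
rate_spread = max(ops for ops, _ in samples.values()) / min(
    ops for ops, _ in samples.values()
)

print(f"cost per op differs by a factor of {cost_spread:.2f}")
print(f"request rate differs by a factor of {rate_spread:.2f}")
```

In this sample the cost per operation varies by less than 10% between drives, while the request rate varies by more than 60%, which fits the reading that the drives behave alike and simply receive different numbers of requests.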
Re: r228700 can't dhclient em0
On 2011-12-19 17:36, Garrett Cooper wrote:

On Dec 19, 2011, at 5:24 AM, Dimitry Andric d...@freebsd.org wrote:

On 2011-12-19 10:17, Doug Barton wrote:

I updated to r228700 from 228122 and dhclient exits immediately saying that em0 doesn't exist. However ifconfig seems to disagree: ...

I saw this too, when my kernel and userland were out of sync (e.g. just after installing a new kernel, and before installworld). I suspect it is caused by the changes in r228571, which cause old ifconfig and dhclient to not recognize any interfaces. I'm not 100% sure though.

This makes sense because the structs that describe addresses changed recently.

It may make sense, but it is very annoying when you want to installworld over NFS, or have any other network access before or during installation. :(
Re: Uneven load on drives in ZFS RAIDZ1
On Dec 19, 2011, at 12:54 PM, Stefan Esser wrote:

On 19.12.2011 18:05, Garrett Cooper wrote:

On Mon, Dec 19, 2011 at 6:22 AM, Stefan Esser s...@freebsd.org wrote:

Hi ZFS users,

for quite some time I have observed an uneven distribution of load between drives in a 4 * 2TB RAIDZ1 pool. The following is an excerpt of a longer log of 10 second averages logged with gstat:

dT: 10.001s  w: 10.000s  filter: ^a?da?.$
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0    130    106   4134    4.5     23   1033    5.2   48.8| ada0
    0    131    111   3784    4.2     19   1007    4.0   47.6| ada1
    0     90     66   2219    4.5     24   1031    5.1   31.7| ada2
    1     81     58   2007    4.6     22   1023    2.3   28.1| ada3

 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    1    132    104   4036    4.2     27   1129    5.3   45.2| ada0
    0    129    103   3679    4.5     26   1115    6.8   47.6| ada1
    1     91     61   2133    4.6     30   1129    1.9   29.6| ada2
    0     81     56   1985    4.8     24   1102    6.0   29.4| ada3

 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    1    148    108   4084    5.3     39   2511    7.2   55.5| ada0
    1    141    104   3693    5.1     36   2505   10.4   54.4| ada1
    1    102     62   2112    5.6     39   2508    5.5   35.4| ada2
    0     99     60   2064    6.0     39   2483    3.7   36.1| ada3

This suggests (note that I said suggests) that there might be a slight difference in the data path speeds or physical media as someone else suggested; look at zpool iostat -v interval though before making a firm statement as to whether or not a drive is truly not performing to your assumed spec. gstat and zpool iostat -v suggest performance though -- they aren't the end-all-be-all for determining drive performance.

I doubt there is a difference in the data path speeds, since all drives are connected to the SATA II ports of an Intel H67 chip. The drives seem to perform equally well, just with a ratio of read requests of 30% / 30% / 20% / 20% for ada0 .. ada3. But neither queue length nor command latencies indicate a problem or differences in the drives. It seems that a different number of commands is scheduled for 2 of the 4 drives, compared to the other 2, and that scheduling should be part of the ZFS code.

I'm quite convinced that neither the drives nor the other hardware plays a role, but I'll follow the suggestion to swap drives between controller ports and to observe whether the increased read load moves with the drives (indicating something on disk causes the anomaly) or stays with the SATA ports (indicating that lower numbered ports see higher load).

If the latency numbers were high enough, I would suggest dd'ing out to the individual drives (i.e. remove the drive from the RAIDZ) to see if there's a noticeable discrepancy, as this can indicate a bad cable, backplane, or drive; from there I would start doing the physical swap routine and see if the issue moves with the drive or stays static with the controller channel and/or chassis slot.

I do not expect a hardware problem, since command latencies are very similar over all drives, despite the higher read load on some of them. These are busier by exactly the factor to be expected from the higher command rate alone.

But it seems that others do not observe the asymmetric distribution of requests, which makes me wonder whether I happen to have metadata arranged in such a way that it is always read from ada0 or ada1, but not (or rarely) from ada2 or ada3. That could explain it, including the fact that raidz1 pools over other numbers of drives (e.g. 3 or 6) apparently show a much more symmetric distribution of read requests.

Basic question: does one set of drives vibrate differently than the other set?

-Garrett
Re: Uneven load on drives in ZFS RAIDZ1
On 19.12.2011 19:03, Daniel Kalchev wrote:

I have observed similar behavior, even more extreme, on a zpool with dedup enabled. Is dedup enabled on this zpool?

Thank you for the report!

Well, I had dedup enabled for a few short tests. But since I have got only 8GB of RAM and dedup seems to require an order of magnitude more to work well, I switched dedup off again after a few hours.

It might be that the DDT tables somehow end up unevenly distributed across the disks. My observation was on a 6 disk raidz2.

Hmmm, there was another report of even distribution of load on a 6 disk raidz1 (but in fact, in that case the first half seems to have got some 10% to 15% higher load than the second half; the sixth drive showed quite different queue length and latencies, and I think these might be caused either by a defect (soft errors) or by another partition being actively used only on that drive).

Regards, STefan
Re: Uneven load on drives in ZFS RAIDZ1
On Dec 19, 2011, at 11:00 PM, Stefan Esser wrote:

On 19.12.2011 19:03, Daniel Kalchev wrote:

I have observed similar behavior, even more extreme, on a zpool with dedup enabled. Is dedup enabled on this zpool?

Thank you for the report! Well, I had dedup enabled for a few short tests. But since I have got only 8GB of RAM and dedup seems to require an order of magnitude more to work well, I switched dedup off again after a few hours.

You will need to get rid of the DDT, as those tables are still read even with dedup (already) disabled. The tables refer to already deduped data.

In my case, I had about 2-3TB of deduped data, with 24GB RAM. There was no shortage of RAM and I could not confirm that the ARC was full, but somehow the pool was placing a heavy read load on one or two disks only (all others nearly idle) -- apparently many small reads.

I resolved my issue by copying the data to a newly created filesystem in the same pool (luckily there was enough space available), then removing the 'deduped' filesystems. That last operation was particularly slow, and at one point I had a spontaneous reboot -- the pool was 'impossible to mount', and as weird as it sounds, I had 'out of swap space' killing the 'zpool list' process. I let it sit for a few hours, until it had cleared itself.

I/O in that pool is back to normal now. There is something terribly wrong with the dedup code.

Well, if your test data is not valuable, you can just delete it. :)

Daniel
Re: Uneven load on drives in ZFS RAIDZ1
On Mon, Dec 19, 2011 at 1:07 PM, Daniel Kalchev dan...@digsys.bg wrote:

On Dec 19, 2011, at 11:00 PM, Stefan Esser wrote:

On 19.12.2011 19:03, Daniel Kalchev wrote:

I have observed similar behavior, even more extreme, on a zpool with dedup enabled. Is dedup enabled on this zpool?

Thank you for the report! Well, I had dedup enabled for a few short tests. But since I have got only 8GB of RAM and dedup seems to require an order of magnitude more to work well, I switched dedup off again after a few hours.

You will need to get rid of the DDT, as those tables are still read even with dedup (already) disabled. The tables refer to already deduped data.

In my case, I had about 2-3TB of deduped data, with 24GB RAM. There was no shortage of RAM and I could not confirm that the ARC was full, but somehow the pool was placing a heavy read load on one or two disks only (all others nearly idle) -- apparently many small reads.

I resolved my issue by copying the data to a newly created filesystem in the same pool (luckily there was enough space available), then removing the 'deduped' filesystems. That last operation was particularly slow, and at one point I had a spontaneous reboot -- the pool was 'impossible to mount', and as weird as it sounds, I had 'out of swap space' killing the 'zpool list' process. I let it sit for a few hours, until it had cleared itself.

I/O in that pool is back to normal now. There is something terribly wrong with the dedup code.

The ZFS manual claims that dedup needs 2GB of memory per TB of data, but in reality it's closer to 5GB of memory per TB of data on average. So if you turn it on for large datasets or pools and don't limit the ARC, it ties your box in knots after it wires down all of the physical memory (even when you're doing a reimport and it's replaying the ZIL -- either on the array or on your dedicated ZIL device). This of course causes your machine to dig into swap and slow to a crawl, and/or blows away your userland (and now you're pretty much SoL).

Bottom line is that dedup is a poorly articulated feature and causes lots of issues if enabled. Compression is a much better feature to enable.

Well, if your test data is not valuable, you can just delete it. :)

+1, but I suggest limiting the ARC first.

Cheers,
-Garrett
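To put the two rules of thumb from this message side by side, here is a back-of-the-envelope sketch. Both per-TB figures are simply the numbers quoted above (2GB per TB attributed to the manual, about 5GB per TB reported from experience); they are not measured values, and the function name is hypothetical.

```python
MANUAL_GB_PER_TB = 2.0    # figure attributed to the ZFS manual in the message
OBSERVED_GB_PER_TB = 5.0  # average figure reported from experience in the message


def ddt_memory_gb(deduped_tb, gb_per_tb):
    """Estimated memory needed for dedup, given the amount of deduped data."""
    return deduped_tb * gb_per_tb


# 2.5 TB is an assumed midpoint of the "2-3TB" mentioned earlier in the thread.
for label, rate in (("manual", MANUAL_GB_PER_TB), ("observed", OBSERVED_GB_PER_TB)):
    print(f"{label}: {ddt_memory_gb(2.5, rate):.1f} GB for 2.5 TB of deduped data")
```

Under these assumptions the estimate for 2.5TB ranges from 5GB to 12.5GB, which would exceed an 8GB machine at the higher rate but still fit into 24GB.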
Re: SCHED_ULE should not be the default
on 19/12/2011 17:50 Nathan Whitehorn said the following:

The thing I've seen is that ULE is substantially more enthusiastic about migrating processes between cores than 4BSD.

Hmm, this seems to be contrary to my theoretical expectations. I thought that with 4BSD all threads that were not in one of the following categories:
- temporarily pinned
- bound to a cpu in the kernel via sched_bind
- belonging to a cpu set which is a strict subset of the total set
were placed onto a common queue that was shared by all cpus. And as such I expected them to get picked up by the cpus semi-randomly. In other words, I thought that it was ULE that took into account cpu/cache affinities while 4BSD was deliberately entirely ignorant of those details.

--
Andriy Gapon
Re: Uneven load on drives in ZFS RAIDZ1
On 19.12.2011 22:00, Garrett Cooper wrote:

On Dec 19, 2011, at 12:54 PM, Stefan Esser wrote:

But it seems that others do not observe the asymmetric distribution of requests, which makes me wonder whether I happen to have metadata arranged in such a way that it is always read from ada0 or ada1, but not (or rarely) from ada2 or ada3. That could explain it, including the fact that raidz1 pools over other numbers of drives (e.g. 3 or 6) apparently show a much more symmetric distribution of read requests.

Basic question: does one set of drives vibrate differently than the other set?

No: all drives are mounted in similar cages in a midi tower case (and since I did not like the temperature rising to 45C last summer, I added case fans to keep the temperature of all drives equally low, too).

But I'll try swapping drives (or rather SATA ports) tomorrow. If the drives are different (hardware or data on the drives), then the higher load will move, but if it's in the ZFS code, then I expect the higher request rate to stay on the first two drives. I'll report the outcome.

(And repeating what I wrote before: the drives seem to behave perfectly well, they just receive different numbers of read requests although the pool appears to be symmetric with regard to all factors that could have an impact. I really doubt this is caused by hardware, else there would be observable differences in latency or queue length.)

Regards, STefan
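The decision rule for the planned swap test can be sketched as follows. This is only an illustration: the drive labels A to D and the "after" numbers are hypothetical, the "before" read rates are taken from the `zpool iostat` sample earlier in the thread, and the sketch assumes the busier drives really were moved to different ports between the two measurements.

```python
def hot_set(sample, field):
    """Return the ports (field=0) or drives (field=1) in the busier half."""
    ranked = sorted(sample, key=lambda entry: entry[2], reverse=True)
    return {entry[field] for entry in ranked[: len(ranked) // 2]}


def classify(before, after):
    """Compare two (port, drive, reads_per_sec) samples taken around a swap."""
    if hot_set(before, 1) == hot_set(after, 1):
        return "load moved with the drives"
    if hot_set(before, 0) == hot_set(after, 0):
        return "load stayed on the same ports"
    return "inconclusive"


# Read rates before the swap, from the thread; drive labels are hypothetical.
before = [("ada0", "A", 114), ("ada1", "B", 106), ("ada2", "C", 65), ("ada3", "D", 58)]

# Two hypothetical outcomes after exchanging A with C and B with D.
after_moved = [("ada0", "C", 65), ("ada1", "D", 58), ("ada2", "A", 114), ("ada3", "B", 106)]
after_stayed = [("ada0", "C", 114), ("ada1", "D", 106), ("ada2", "A", 65), ("ada3", "B", 58)]

print(classify(before, after_moved))
print(classify(before, after_stayed))
```

The first result would point at the drives themselves or the data on them, the second at whatever is tied to the device position, which is the distinction the swap test above is meant to draw.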
Re: Uneven load on drives in ZFS RAIDZ1
On 19.12.2011 22:07, Daniel Kalchev wrote:
On Dec 19, 2011, at 11:00 PM, Stefan Esser wrote:
Well, I had dedup enabled for a few short tests. But since I have got only 8GB of RAM and dedup seems to require an order of magnitude more to be working well, I switched dedup off again after a few hours.
You will need to get rid of the DDT, as those are read nevertheless even with dedup (already) disabled. The tables refer to already deduped data.
Thanks for the hint! Is there an easy way to identify the file systems that ever had dedup enabled? (I don't mind extracting the information from zdb output, in case that is the UI of choice.) I seem to remember that I tried it with my /usr/svn (which obviously had lots of duplicated files), but I do not remember on which other file systems I tried it ... (I've created some 20-25 filesystems on this pool.)
In my case, I had about 2-3TB of deduped data, with 24GB RAM. There was no shortage of RAM and I could not confirm that ARC is full.. but somehow the pool was placing heavy read on one or two disks only (all others, nearly idle) -- apparently many small size reads. I resolved my issue by copying the data to a newly created filesystem in the same pool -- luckily there was enough space available, then removing the 'deduped' filesystems.
This should be easy in the case of /usr/svn, thanks for the suggestion!
That last operation was particularly slow and at one time I had spontaneous reboot -- the pool was 'impossible to mount', and as weird as it sounds, I had 'out of swap space' killing the 'zpool list' process. I let it sit for few hours, until it has cleared itself. I/O in that pool is back to normal now. There is something terribly wrong with the dedup code. Well, if your test data is not valuable, you can just delete it. :)
I could also start over with a clean SVN check-out, but since I've got the free disk space to copy the data over, I'll try that first.
Thanks again and best regards, STefan
Re: SCHED_ULE should not be the default
On Mon Dec 19 11, Nathan Whitehorn wrote:
On 12/18/11 04:34, Adrian Chadd wrote:
The trouble is that there's lots of anecdotal evidence, but no one's really gone digging deep into _their_ example of why it's broken. The developers who know this stuff don't see anything wrong. That hints to me it may be something a little more creepy - as an example, the interplay between netisr/swi/taskqueue/callbacks and such. It may be that something is being starved that isn't obviously obvious. It's just a stab in the dark, but it sounds somewhat plausible based on what I've seen ULE do in my network throughput hacking.
I applaud reppie for trying to make it as easy as possible for people to use KTR to provide scheduler traces for him to go digging with, so please, if you have these issues and you can absolutely reproduce them, please follow his instructions and work with him to get him what he needs.
The thing I've seen is that ULE is substantially more enthusiastic about migrating processes between cores than 4BSD. Often, this is a good thing, but can increase the rate of cache misses, hurting performance for cache-bound processes (I see this particularly in HPC-type scientific workloads). It might be interesting to add some kind of tunable here.
does r228718 have any impact regarding this behaviour?
cheers. alex
Another more interesting and slightly longer-term possibility if someone wants a project would be to integrate scheduling decisions with hwpmc counters, to accumulate statistics on cache hits at each context switch and preferentially keep processes with a high hits/misses ratio on the same thread/cache domain relative to processes with a low one.
-Nathan
P.S. The other thing that could be very interesting from a research and scheduling standpoint would be to integrate heterogeneous SMP support into the operating system, with a FreeBSD-4 Application Processor syscall model. We seem to be going down the road where GPGPU computing has MMUs, timer interrupts, IPIs, etc. (the next AMD Fusions, IBM Cell). This is something that no operating system currently supports well, and would be a place for BSD to shine. If anyone has a free graduate student...
Re: Uneven load on drives in ZFS RAIDZ1
In the last episode (Dec 19), Stefan Esser said:
On 19.12.2011 17:22, Dan Nelson wrote:
In the last episode (Dec 19), Stefan Esser said:
for quite some time I have observed an uneven distribution of load between drives in a 4 * 2TB RAIDZ1 pool. The following is an excerpt of a longer log of 10 second averages logged with gstat:
    dT: 10.001s  w: 10.000s  filter: ^a?da?.$
     L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
        0    130    106   4134    4.5     23   1033    5.2   48.8| ada0
        0    131    111   3784    4.2     19   1007    4.0   47.6| ada1
        0     90     66   2219    4.5     24   1031    5.1   31.7| ada2
        1     81     58   2007    4.6     22   1023    2.3   28.1| ada3
[...]
This is a ZFS only system. The first partition on each drive holds just the gptzfsloader.
    pool        alloc   free   read  write   read  write
    ----------  -----  -----  -----  -----  -----  -----
    raid1       4.41T  2.21T    139     72  12.3M   818K
      raidz1    4.41T  2.21T    139     72  12.3M   818K
        ada0p2      -      -    114     17  4.24M   332K
        ada1p2      -      -    106     15  3.82M   305K
        ada2p2      -      -     65     20  2.09M   337K
        ada3p2      -      -     58     18  2.18M   329K
The same difference of read operations per second as shown by gstat ...
I was under the impression that the parity blocks were scattered evenly across all disks, but from reading vdev_raidz.c, it looks like that isn't always the case. See the comment at the bottom of the vdev_raidz_map_alloc() function; it looks like it will toggle parity between the first two disks in a stripe every 1MB. It's not necessarily the first two disks assigned to the zvol, since stripes don't have to span all disks as long as there's one parity block (a small sync write may just hit two disks, essentially being written mirrored). The imbalance is only visible if you're writing full-width stripes in sequence, so if you write a 1TB file in one long stream, chances are that that file's parity blocks will be concentrated on just two disks, so those two disks will get less I/O on later reads. I don't know why the code toggles parity between just the first two columns; rotating it between all columns would give you an even balance.
Is it always the last two disks that have less load, or does it slowly rotate to different disks depending on the data that you are reading? An interesting test would be to idle the system, run a tar cvf /dev/null /raidz1 in one window, and watch iostat output in another window. If the load moves from disk to disk as tar reads different files, then my parity guess is probably right. If ada0 and ada1 are always busier, then you can ignore me :)
Since it looks like the algorithm ends up creating two half-cold parity disks instead of one cold disk, I bet a 3-disk RAIDZ would exhibit even worse balancing, and a 5-disk set would be more even.
--
Dan Nelson dnel...@allantgroup.com
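The parity guess above can be illustrated with a toy model. This is only a sketch and not the logic in vdev_raidz.c: it assumes every stripe is full-width, that parity would otherwise always sit in column 0, and that parity moves to column 1 whenever bit 20 of the byte offset is set (i.e. every other 1MB). The function name, the 128KB stripe size and the rotate-all alternative are illustrative assumptions. Which pair of disks ends up cold in a real pool depends on where each stripe starts, which this model pins to disk 0.

```python
MB = 1 << 20

def data_reads_per_disk(ndisks, nstripes, stripe_bytes=128 * 1024, rotate_all=False):
    """Count data-column reads per disk for a sequential read of full-width stripes.

    Toy model (assumption, not the real allocator): parity lives in column 0
    and is swapped to column 1 whenever bit 20 of the byte offset is set.
    With rotate_all=True parity instead moves through every column, one per 1MB.
    """
    reads = [0] * ndisks
    for i in range(nstripes):
        offset = i * stripe_bytes
        if rotate_all:
            parity = (offset // MB) % ndisks
        else:
            parity = 1 if offset & MB else 0
        for disk in range(ndisks):
            if disk != parity:  # parity is not read during normal operation
                reads[disk] += 1
    return reads

toggled = data_reads_per_disk(4, 64)                   # 8MB read, parity on disks 0/1 only
rotated = data_reads_per_disk(4, 64, rotate_all=True)  # parity spread over all 4 disks
print(toggled)  # [32, 32, 64, 64]
print(rotated)  # [48, 48, 48, 48]
```

Under these assumptions two disks serve half as many reads as the other two, while rotating parity through all columns evens the load out, which is the shape of the imbalance being discussed.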
can a wrong alignment cause a decrease in a hdd's life expectancy?
hi there,
i'm using a usb hdd with the following specs:
    otaku% sudo smartctl -i /dev/da0
    smartctl 5.42 2011-10-20 r3458 [FreeBSD 10.0-CURRENT amd64] (local build)
    Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
    === START OF INFORMATION SECTION ===
    Model Family:     Western Digital My Passport Essential SE (USB, Adv. Format)
    Device Model:     WDC WD10TMVW-11ZSMS4
    Serial Number:    WD-WXJ1A81C1845
    LU WWN Device Id: 5 0014ee 1af1e4483
    Firmware Version: 01.01A01
    User Capacity:    1,000,204,886,016 bytes [1,00 TB]
    Sector Sizes:     512 bytes logical, 4096 bytes physical
    Device is:        In smartctl database [for details use: -P show]
    ATA Version is:   8
    ATA Standard is:  Exact ATA specification draft version not indicated
    Local Time is:    Mon Dec 19 23:00:43 2011 CET
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled
unfortunately i didn't align it properly using gpart(8)'s -a switch. performance wise it shouldn't cause any issues, because i'm accessing this hdd through usb 2 exclusively. however my concern is that using an alignment of 512 will put an extra workload onto the hdd (doing the conversion to 4096). will this reduce my hdd's life expectancy? in that case i might consider re-partitioning it (with proper alignment settings).
cheers. alex
ps: the hdd only gets mounted read-only!
Failure to compile world
Hi people!
I'm writing here because I'm having issues with compiling world from a
    Symphony# uname -a
    FreeBSD Symphony 9.0-PRERELEASE FreeBSD 9.0-PRERELEASE #2: Fri Dec 16 18:52:44 ART 2011 vertex@Symphony:/usr/obj/usr/src/sys/GENERIC i386
Machine with latest source from that date. I'm using this git mirror (Sorry guys, if I don't tolerate SVN, the less I'll tolerate CVS) → https://github.com/freebsd/freebsd-head (I've pulled and got this as last commit → https://github.com/freebsd/freebsd-head/commit/f700576aa6240ea7133ce4812aec810266bcbfe7 )
With this /etc/make.conf (I have to clean it up btw)
    Symphony# cat /etc/make.conf
    #
    # Make.conf
    WITHOUT_NOUVEAU=
    # added by use.perl 2011-11-10 19:36:38
    PERL_VERSION=5.12.4
    ##
    # Clang for kernel+world
    .if ${.CURDIR:M/usr/src/*} || ${.CURDIR:M/usr/obj/*} || ${.CURDIR:M/sys/*}
    .if !defined(CC) || ${CC} == cc
    CC=clang
    .endif
    .if !defined(CXX) || ${CXX} == c++
    CXX=clang++
    .endif
    .if !defined(CPP) || ${CPP} == cpp
    CPP=clang-cpp
    .endif
    # Don't die on warnings
    NO_WERROR=
    WERROR=
    # Don't forget this when using Jails!
    NO_FSCHG=
    .if defined(WITH_FLAGS)
    WITH_LIBCPLUSPLUS=YES
    .endif
    .endif
And /usr/obj was properly rm -rf'ed
The problem comes with undefined stuff when I'm compiling libc :
    clang -fpic -DPIC -O2 -pipe -I/usr/src/lib/libc/include -I/usr/src/lib/libc/../../include -I/usr/src/lib/libc/i386 -DNLS -D__DBINTERFACE_PRIVATE -I/usr/src/lib/libc/../../contrib/gdtoa -DINET6 -I/usr/obj/usr/src/lib/libc -I/usr/src/lib/libc/resolv -D_ACL_PRIVATE -DPOSIX_MISTAKE -I/usr/src/lib/libc/../../contrib/tzcode/stdtime -I/usr/src/lib/libc/stdtime -I/usr/src/lib/libc/locale -DBROKEN_DES -DPORTMAP -DDES_BUILTIN -I/usr/src/lib/libc/rpc -DYP -DNS_CACHING -DSYMBOL_VERSIONING -std=gnu99 -fstack-protector -Wsystem-headers -Wall -Wno-format-y2k -Wno-uninitialized -Wno-pointer-sign -Wno-tautological-compare -Wno-unused-value -Wno-parentheses-equality -Wno-unused-function -Wno-conversion -Wno-switch-enum -Wno-empty-body -c _fcntl.S -o _fcntl.So
    clang -O2 -pipe -I/usr/src/lib/libc/include -I/usr/src/lib/libc/../../include -I/usr/src/lib/libc/i386 -DNLS -D__DBINTERFACE_PRIVATE -I/usr/src/lib/libc/../../contrib/gdtoa -DINET6 -I/usr/obj/usr/src/lib/libc -I/usr/src/lib/libc/resolv -D_ACL_PRIVATE -DPOSIX_MISTAKE -I/usr/src/lib/libc/../../contrib/tzcode/stdtime -I/usr/src/lib/libc/stdtime -I/usr/src/lib/libc/locale -DBROKEN_DES -DPORTMAP -DDES_BUILTIN -I/usr/src/lib/libc/rpc -DYP -DNS_CACHING -DSYMBOL_VERSIONING -std=gnu99 -fstack-protector -Wsystem-headers -Wall -Wno-format-y2k -Wno-uninitialized -Wno-pointer-sign -Wno-tautological-compare -Wno-unused-value -Wno-parentheses-equality -Wno-unused-function -Wno-conversion -Wno-switch-enum -Wno-empty-body -c _sigwait.S
    clang -fpic -DPIC -O2 -pipe -I/usr/src/lib/libc/include -I/usr/src/lib/libc/../../include -I/usr/src/lib/libc/i386 -DNLS -D__DBINTERFACE_PRIVATE -I/usr/src/lib/libc/../../contrib/gdtoa -DINET6 -I/usr/obj/usr/src/lib/libc -I/usr/src/lib/libc/resolv -D_ACL_PRIVATE -DPOSIX_MISTAKE -I/usr/src/lib/libc/../../contrib/tzcode/stdtime -I/usr/src/lib/libc/stdtime -I/usr/src/lib/libc/locale -DBROKEN_DES -DPORTMAP -DDES_BUILTIN -I/usr/src/lib/libc/rpc -DYP -DNS_CACHING -DSYMBOL_VERSIONING -std=gnu99 -fstack-protector -Wsystem-headers -Wall -Wno-format-y2k -Wno-uninitialized -Wno-pointer-sign -Wno-tautological-compare -Wno-unused-value -Wno-parentheses-equality -Wno-unused-function -Wno-conversion -Wno-switch-enum -Wno-empty-body -c _sigwait.S -o _sigwait.So
    clang -O2 -pipe -I/usr/src/lib/libc/include -I/usr/src/lib/libc/../../include -I/usr/src/lib/libc/i386 -DNLS -D__DBINTERFACE_PRIVATE -I/usr/src/lib/libc/../../contrib/gdtoa -DINET6 -I/usr/obj/usr/src/lib/libc -I/usr/src/lib/libc/resolv -D_ACL_PRIVATE -DPOSIX_MISTAKE -I/usr/src/lib/libc/../../contrib/tzcode/stdtime -I/usr/src/lib/libc/stdtime -I/usr/src/lib/libc/locale -DBROKEN_DES -DPORTMAP -DDES_BUILTIN -I/usr/src/lib/libc/rpc -DYP -DNS_CACHING -DSYMBOL_VERSIONING -std=gnu99 -fstack-protector -Wsystem-headers -Wall -Wno-format-y2k -Wno-uninitialized -Wno-pointer-sign -Wno-tautological-compare -Wno-unused-value -Wno-parentheses-equality -Wno-unused-function -Wno-conversion -Wno-switch-enum -Wno-empty-body -c /usr/src/lib/libc/db/btree/bt_close.c -o bt_close.o
    In file included from /usr/src/lib/libc/db/btree/bt_close.c:44:
    /usr/src/lib/libc/../../include/stdlib.h:79:1: error: unknown type name '_Noreturn'
    _Noreturn void abort(void);
    ^
    /usr/src/lib/libc/../../include/stdlib.h:79:11: error: expected identifier or '('
    _Noreturn void abort(void);
              ^
    /usr/src/lib/libc/../../include/stdlib.h:89:1: error: unknown type name '_Noreturn'
    _Noreturn void exit(int);
    ^
    /usr/src/lib/libc/../../include/stdlib.h:89:11: error: expected identifier or '('
    _Noreturn void exit(int);
              ^
    /usr/src/lib/libc/../../include/stdlib.h:148:1: error: unknown type name '_Noreturn'
    _Noreturn void _Exit(int);
    ^
Re: can a wrong alignment cause a decrease in a hdd's life expectancy?
In message 20111219221617.ga70...@freebsd.org, Alexander Best writes:
ps: the hdd only gets mounted read-only!
There is no known wear-effects in flash storage as long as you only read. You may need to do refresh-writes every 5-10 years to avoid tunnel-leakage bit errors, but most flash controllers use semi-long ECC syndromes and will do so on first bit that gives an read error.
--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
cross-arch building picobsd/nanobsd images ?
Hi,
recently I have tried to build a picobsd image for a different architecture than the current one, with only partial success. In particular, three weeks ago i committed some changes to the picobsd script so now i can build working amd64 images on amd64. However when i try a cross build (e.g. i386 image on an amd64 host) the kernel stops right after trying to mount the root partition. The error message is the following:
    ...
    Timecounter TSC frequency 1858691100 Hz quality 800
    Trying to mount root from ufs:/dev/md0 []...
    panic: mutex Giant owned at .../sys/kern/kern_exit.c:128
    cpuid = 0
    KDB: enter: panic
    [ thread pid 1 tid 11 ]
    Stopped at kdb_enter+0x3b: movl $0,kdb_why
    db>
The backtrace indicates the following (i omit the numbers, as i am manually copying the text)
    kdb_enter
    panic
    _mtx_assert
    exit1
    kern_execve
    sys_execve
    exec_shell_imgact
    fork_exit
    fork_trampoline
    --- trap 0, eip = 0, esp = 0xc3708d60, ebp = 0 ---
any idea on what could be going wrong ?
On a related topic, does anyone have experience on cross-building nanobsd images ?
thanks
luigi
Re: Uneven load on drives in ZFS RAIDZ1
On Dec 19, 2011, at 11:53 PM, Dan Nelson wrote:
Since it looks like the algorithm ends up creating two half-cold parity disks instead of one cold disk, I bet a 3-disk RAIDZ would exhibit even worse balancing, and a 5-disk set would be more even.
There were some experiments a year or two ago with different numbers of disks in raidz and the results suggested that certain numbers of disks had better performance, contrary to the theory that writes should be evenly distributed. Worse, this is in the official theory of how raidz operates…
Perhaps the code can be fixed to spread the writes to all devices in raidz, but compatibility issues need to be considered.
Perhaps DDT is stored in the 'worst case' write size, because it clearly exhibits such poor distribution.
Daniel
Re: can a wrong alignment cause a decrease in a hdd's life expectancy?
Putting it better: http://en.wikipedia.org/wiki/Flash_memory#Read_disturb
Re: can a wrong alignment cause a decrease in a hdd's life expectancy?
On Mon Dec 19 11, Poul-Henning Kamp wrote:
In message 20111219221617.ga70...@freebsd.org, Alexander Best writes:
ps: the hdd only gets mounted read-only!
There is no known wear-effects in flash storage as long as you only read. You may need to do refresh-writes every 5-10 years to avoid tunnel-leakage bit errors, but most flash controllers use semi-long ECC syndromes and will do so on first bit that gives an read error.
this is a regular hdd i believe -- no ssd. at least when i plug it into my usb drive i hear the hdd spinning up and causing vibrations. i don't think that would be the case with an ssd.
--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
Re: can a wrong alignment cause a decrease in a hdd's life expectancy?
In message 20111219224700.ga75...@freebsd.org, Alexander Best writes:
On Mon Dec 19 11, Poul-Henning Kamp wrote:
In message 20111219221617.ga70...@freebsd.org, Alexander Best writes:
ps: the hdd only gets mounted read-only!
There is no known wear-effects in flash storage as long as you only read. You may need to do refresh-writes every 5-10 years to avoid tunnel-leakage bit errors, but most flash controllers use semi-long ECC syndromes and will do so on first bit that gives an read error.
this is a regular hdd i believe -- no ssd. at least when i plug it into my usb drive i hear the hdd spinning up and causing vibrations. i don't think that would be the case with an ssd.
Ahh, sorry, I don't know why I thought it was flash.
--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
Re: can a wrong alignment cause a decrease in a hdd's life expectancy?
On Mon Dec 19 11, Poul-Henning Kamp wrote:
In message 20111219224700.ga75...@freebsd.org, Alexander Best writes:
On Mon Dec 19 11, Poul-Henning Kamp wrote:
In message 20111219221617.ga70...@freebsd.org, Alexander Best writes:
ps: the hdd only gets mounted read-only!
There is no known wear-effects in flash storage as long as you only read. You may need to do refresh-writes every 5-10 years to avoid tunnel-leakage bit errors, but most flash controllers use semi-long ECC syndromes and will do so on first bit that gives an read error.
this is a regular hdd i believe -- no ssd. at least when i plug it into my usb drive i hear the hdd spinning up and causing vibrations. i don't think that would be the case with an ssd.
Ahh, sorry, I don't know why I thought it was flash.
no problem. so will the improper alignment also not cause a life expectancy shortage in case of a hdd (non-flash-based)?
and one other question: the hdd also supports usb 3. will the improper alignment have any effect (speed wise) when connected via usb 3, or is even usb 3 too slow to notice the performance drop due to the improper alignment?
cheers. alex
--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
Re: can a wrong alignment cause a decrease in a hdd's life expectancy?
In message 20111219225633.ga77...@freebsd.org, Alexander Best writes:
no problem. so will the improper alignment also not cause a life expectancy shortage in case of a hdd (non-flash-based)?
Well, theoretically you will have more track-to-track seeks, as some blocks will span cylinders, but I doubt that will have measurable impact on lifetime, compared with the gains you could harvest if you spin it down for even just 1 hour a day...
Read-Only/Read-Write makes no difference that I know of for hard-disks.
and one other question: the hdd also supports usb 3. will the improper alignment have any effect (speed wise) when connected via usb 3, or is even usb 3 too slow to notice the performance drop due to the improper alignment?
Again: I doubt it will be measurable.
--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
Re: can a wrong alignment cause a decrease in a hdd's life expectancy?
On 12/19/2011 2:22 PM, Poul-Henning Kamp wrote:
In message 20111219221617.ga70...@freebsd.org, Alexander Best writes:
ps: the hdd only gets mounted read-only!
There is no known wear-effects in flash storage as long as you only read.
No, sorry, that's not really true.
Re: can a wrong alignment cause a decrease in a hdd's life expectancy?
In message 4eefb9f3.80...@feral.com, Matthew Jacob writes:
On 12/19/2011 2:22 PM, Poul-Henning Kamp wrote:
In message 20111219221617.ga70...@freebsd.org, Alexander Best writes:
ps: the hdd only gets mounted read-only!
There is no known wear-effects in flash storage as long as you only read.
No, sorry, that's not really true.
Pray tell!
There will always be charge leakage, but last I talked to silicon-pushers, that was (almost) entirely independent of read-access and correlated strongly with temperature*duration.
Obviously, if your flash controller lies to you and does needless writes anyway, we are not talking read-only.
Those are the only two effects I know of ?
--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
Re: can a wrong alignment cause a decrease in a hdd's life expectancy?
On Mon Dec 19 11, Jeremy Chadwick wrote:
On Mon, Dec 19, 2011 at 10:56:33PM +, Alexander Best wrote:
On Mon Dec 19 11, Poul-Henning Kamp wrote:
In message 20111219224700.ga75...@freebsd.org, Alexander Best writes:
On Mon Dec 19 11, Poul-Henning Kamp wrote:
In message 20111219221617.ga70...@freebsd.org, Alexander Best writes:
ps: the hdd only gets mounted read-only!
There is no known wear-effects in flash storage as long as you only read. You may need to do refresh-writes every 5-10 years to avoid tunnel-leakage bit errors, but most flash controllers use semi-long ECC syndromes and will do so on first bit that gives an read error.
this is a regular hdd i believe -- no ssd. at least when i plug it into my usb drive i hear the hdd spinning up and causing vibrations. i don't think that would be the case with an ssd.
Ahh, sorry, I don't know why I thought it was flash.
no problem. so will the improper alignment also not cause a life expectancy shortage in case of a hdd (non-flash-based)?
The improper alignment will result in sub-par write performance, and a slight decrease in read performance writes -- but will not impact life expectancy or harm the drive in any way. I recommend strongly that you rectify the situation before you get too carried away with software installations, etc.. And yes I am aware what you have is a mechanical HDD not an SSD (I say this in advance of what I'm about to write).
If you need a safe alignment value, most software on Windows (including Windows 7) pick a value of 2MBytes as the alignment offset, which I believe is LBA 4096, since everything software-wise uses 512-byte sectors. That's calculated via: 2097152 / 512. This number is also evenly divisible by 4096 bytes (which is what you're trying to ensure for performance).
Readers, as well as you, may wonder where the magical 2MByte value comes from, and can you pick something smaller. Yes you can pick something smaller, but the value itself stems from the added complexity of SSDs and NAND erase page size vs. NAND page size. A value of 2MBytes works well on all brands of SSDs on the market (as of this writing).
Which reminds me -- I need to go back and redo most of our systems that use Intel SSDs, since at the time I picked the default offset in sysinstall (LBA 63, thus 64 * 512 = 32KBytes), which though divisible by 4096, is not optimal for NAND erase page size.
I would love to advocate FreeBSD change sysinstall/bsdinstall to use a default offset of 2MBytes, but I imagine that would upset a lot of people who install FreeBSD on limited space devices (CF, etc.). Honestly though, with the size of media these days
thanks a lot for the explanation. i'm going to get another drive, soon, and will then be able to fix the alignment, as i currently have no place where i can backup the data of my current (misaligned) hdd.
and one other question: the hdd also supports usb 3. will the improper alignment have any effect (speed wise) when connected via usb 3, or is even usb 3 too slow to notice the performance drop due to the improper alignment?
USB 3.0 vs. 2.0 vs. eSATA vs. native SATA has no bearing on the situation. Those are transport protocols that define maximum bandwidth.
By the way, the hard disk itself does not support USB 3.0 -- your drive is in an enclosure that contains a SATA-USB3.0 conversion chipset inside. If you open the enclosure, you will find the hard disk is SATA, and probably supports SATA600.
i was aware of this fact. what i meant by speed in connection with usb 3 was the following example-case (please don't take the numbers literally)
1) the drive itself can do 500 mb/sec when aligned properly
2) the drive does 350 mb/sec when aligned improperly (512 boundary)
3) usb 3 can do 100 mb/sec
... so in this case the improper alignment wouldn't have an impact, since even with proper alignment only 100 mb/sec were possible.
however in the following example:
1) 500 mb/sec
2) 100 mb/sec
3) 200 mb/sec
the improper alignment would have an impact, since usb 3 *could* perform at 200 mb/sec with proper alignment, but will drop to 100 mb/sec in the case of improper alignment.
again...please don't take the transfer rates literally. they're most definitely bogus.
cheers. alex
--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |
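The two example cases above boil down to taking the slower of the drive and the bus. A minimal sketch, using the admittedly bogus rates from the message and a hypothetical helper name:

```python
def effective_rate(drive_rate, bus_rate):
    """Sustained throughput is capped by the slower of drive and bus (mb/sec)."""
    return min(drive_rate, bus_rate)

# first case: drive 500 aligned / 350 misaligned, bus 100
first = (effective_rate(500, 100), effective_rate(350, 100))
# second case: drive 500 aligned / 100 misaligned, bus 200
second = (effective_rate(500, 200), effective_rate(100, 200))

print(first)   # (100, 100): the bus hides the misalignment penalty
print(second)  # (200, 100): the misaligned drive becomes the bottleneck
```

In other words, misalignment only shows up in the end-to-end rate when the misaligned drive is slower than the transport.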
Re: can a wrong alignment cause a decrease in a hdd's life expectancy?
On Mon, Dec 19, 2011 at 03:20:10PM -0800, Jeremy Chadwick wrote:
On Mon, Dec 19, 2011 at 10:56:33PM +, Alexander Best wrote:
On Mon Dec 19 11, Poul-Henning Kamp wrote:
In message 20111219224700.ga75...@freebsd.org, Alexander Best writes:
On Mon Dec 19 11, Poul-Henning Kamp wrote:
In message 20111219221617.ga70...@freebsd.org, Alexander Best writes:
ps: the hdd only gets mounted read-only!
There is no known wear-effects in flash storage as long as you only read. You may need to do refresh-writes every 5-10 years to avoid tunnel-leakage bit errors, but most flash controllers use semi-long ECC syndromes and will do so on first bit that gives an read error.
this is a regular hdd i believe -- no ssd. at least when i plug it into my usb drive i hear the hdd spinning up and causing vibrations. i don't think that would be the case with an ssd.
Ahh, sorry, I don't know why I thought it was flash.
no problem. so will the improper alignment also not cause a life expectancy shortage in case of a hdd (non-flash-based)?
The improper alignment will result in sub-par write performance, and a slight decrease in read performance writes -- but will not impact life expectancy or harm the drive in any way.
This should have read ...slight decrease in read performance, not read performance writes. Editing mistake on my part. :-)
--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |
Re: can a wrong alignment cause a decrease in a hdd's life expectancy?
On Mon, Dec 19, 2011 at 10:56:33PM +, Alexander Best wrote: On Mon Dec 19 11, Poul-Henning Kamp wrote: In message 20111219224700.ga75...@freebsd.org, Alexander Best writes: On Mon Dec 19 11, Poul-Henning Kamp wrote: In message 20111219221617.ga70...@freebsd.org, Alexander Best writes: ps: the hdd only gets mounted read-only! There is no known wear-effects in flash storage as long as you only read. You may need to do refresh-writes every 5-10 years to avoid tunnel-leakage bit errors, but most flash controllers use semi-long ECC syndromes and will do so on first bit that gives an read error. this is a regular hdd i believe -- no ssd. at least when i plug it into my usb drive i hear the hdd spinning up and causing vibrations. i don't think that would be the case with an ssd. Ahh, sorry, I don't know why I thought it was flash. no problem. so will the improper alignment also not cause a life expectancy shortage in case of a hdd (non-flash-based)? The improper alignment will result in sub-par write performance, and a slight decrease in read performance writes -- but will not impact life expectancy or harm the drive in any way. I recommend strongly that you rectify the situation before you get too carried away with software installations, etc.. And yes I am aware what you have is a mechanical HDD not an SSD (I say in this advance of what I'm about to write). If you need a safe alignment value, most software on Windows (including Windows 7) pick a value of 2MBytes as the alignment offset, which I believe is LBA 4095, since everything software-wise uses 512-byte sectors. That's calculated via: 2097152 / 512. This number is also evenly divisible by 4096 bytes (which is what you're trying to ensure for performance). Readers, as well as you, may wonder where the magical 2MByte value comes from, and can you pick something smaller. Yes you can pick something smaller, but the value itself stems from the added complexity of SSDs and NAND erase page size vs. NAND page size. 
A value of 2MBytes works well on all brands of SSDs on the market (as of this writing). Which reminds me -- I need to go back and redo most of our systems that use Intel SSDs, since at the time I picked the default offset in sysinstall (LBA 63, thus 64 * 512 = 32KBytes), which though divisible by 4096, is not optimal for NAND erase page size. I would love to advocate that FreeBSD change sysinstall/bsdinstall to use a default offset of 2MBytes, but I imagine that would upset a lot of people who install FreeBSD on limited-space devices (CF, etc.). Honestly though, with the size of media these days and one other question: the hdd also supports usb 3. will the improper alignment have any effect (speed wise) when connected via usb 3, or is even usb 3 too slow to notice the performance drop due to the improper alignment? USB 3.0 vs. 2.0 vs. eSATA vs. native SATA has no bearing on the situation. Those are transport protocols that define maximum bandwidth. By the way, the hard disk itself does not support USB 3.0 -- your drive is in an enclosure that contains a SATA-to-USB 3.0 conversion chipset. If you open the enclosure, you will find the hard disk is SATA, and probably supports SATA600. -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB |
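The offset arithmetic discussed in this message can be checked with a few lines of C. This is only a sketch under the thread's assumption of 512-byte logical sectors; the helper names (offset_to_lba, lba_is_aligned) are made up for illustration and are not part of any FreeBSD tool. The erase block size of a given SSD is not stated in the thread, so the check takes the unit size as a parameter rather than assuming one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Logical sector size assumed throughout the thread. */
#define SECTOR_BYTES 512u

/* Convert a byte offset to an LBA; the offset must be a whole number of sectors. */
static inline uint64_t offset_to_lba(uint64_t offset_bytes)
{
    assert(offset_bytes % SECTOR_BYTES == 0);
    return offset_bytes / SECTOR_BYTES;
}

/* True when a partition starting at `lba` begins on a multiple of `unit_bytes`. */
static inline bool lba_is_aligned(uint64_t lba, uint64_t unit_bytes)
{
    return (lba * SECTOR_BYTES) % unit_bytes == 0;
}
```

For example, offset_to_lba(2097152) gives 4096, and lba_is_aligned(4096, 4096) holds. A start at LBA 64 (64 * 512 = 32768 bytes) is a multiple of 4096 as well, whereas a start at LBA 63 (63 * 512 = 32256 bytes) is not.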
Re: Benchmark (Phoronix): FreeBSD 9.0-RC2 vs. Oracle Linux 6.1 Server
Hi all, just a thought here: On Tue, Dec 20, 2011 at 12:45 AM, Daniel Kalchev dan...@digsys.bg wrote: As we were told, Phoronix used a default setup, not tuned. Not really. They created some weird test environment, at least for FreeBSD -- who knows, possibly for Linux as well. For example, ZFS is by no means a default file system in FreeBSD. You need to go through manual steps to enable it, to build the pool, filesystems etc. Of course the benchmark setup and procedure are strange, but they could be improved, I think. Have a good collection of tuning parameters for popular cases, advertised properly so that they are hard to miss. I am a sysadmin and, over the years, I have had to run file servers, database servers, web servers, tomcats... Well, most of the time I set it up and it just works, because the system in question is not maxed out, not even close to it. But if I want to squeeze the last 20% out of it, the googling starts, and here and there I find hints on how to tune the OS, the file system, what scheduler to use etc. It would be great to have a set of case studies at hand, e.g. under the /usr/share/examples directory, that describe tweaks for a well-performing postgresql server, or mysql, or apache, or a desktop, or... Things I find, for example, in the BSD Magazine. Maybe benchmarks would become more meaningful then. A general remark for people doing benchmarks for comparison: you need a well-informed system engineer for each of the systems you compare. So, if you compare a Linux system with FreeBSD, have two experienced admins who know their OS well. Regards Peter
making crdup()/crcopy() safe??
Hi, A recent NFS client crash: http://glebius.int.ru/tmp/nfs_panic.jpg appears to have happened because some field is bogus when crfree() is called. I've asked Gleb to disassemble crfree() for me, so I can try to see exactly which field causes the crash, however... Basically, the code: newcred = crdup(cred); - does read with newcred; crfree(newcred); -- which crashes at 0x65 into crfree(). Looking at crdup(), it calls crcopy(), which copies 4 pointers and then ref. counts them: cr_uidinfo, cr_ruidinfo, cr_prison and cr_loginclass. It seems some lock should be held while crcopy() does this, so that the pointers don't get deref'd during the copy/ref. count? (Or is there some rule that guarantees these won't change, i.e. no calls to change_euid() or similar?) Is there such a lock and should crdup() use it? Thanks in advance for any info, rick
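To make the concern concrete, here is a userland analogy in C, not kernel code. Every name in it (info_like, cred_like, cred_lock, cred_like_dup, cred_like_free) is hypothetical, and whether the kernel needs or has such a lock is exactly the open question above. The sketch only shows what "copy the pointer and take the reference under one lock" would look like, assuming a single mutex covers both the pointer field and the count.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical ref-counted object standing in for cr_uidinfo and friends. */
struct info_like {
    int refs;
};

/* Hypothetical credential holding one shared pointer. */
struct cred_like {
    struct info_like *uidinfo;
};

/* Assumption: one lock protects both the pointer field and the count. */
static pthread_mutex_t cred_lock = PTHREAD_MUTEX_INITIALIZER;

/* Copy the pointer and take the reference inside one critical section. */
static inline struct cred_like *cred_like_dup(const struct cred_like *src)
{
    struct cred_like *dst = malloc(sizeof(*dst));

    if (dst == NULL)
        return NULL;
    pthread_mutex_lock(&cred_lock);
    dst->uidinfo = src->uidinfo;
    dst->uidinfo->refs++;
    pthread_mutex_unlock(&cred_lock);
    return dst;
}

/* Drop the reference taken by cred_like_dup() and release the copy. */
static inline void cred_like_free(struct cred_like *c)
{
    pthread_mutex_lock(&cred_lock);
    c->uidinfo->refs--;
    pthread_mutex_unlock(&cred_lock);
    free(c);
}
```

Without the lock, another thread could swap src->uidinfo and drop the old object's last reference between the pointer copy and the increment, which is the window the question is about. The lock only closes that window if every writer of the field takes it too.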
Re: Failure to compile world
http://clang.llvm.org/docs/LanguageExtensions.html#__builtin_unreachable http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1453.htm Apparently this is the problem: _Noreturn void abort(void); // [...] more declarations _Noreturn void exit(int); Those noreturns are supposed to be written with GCC syntax, and this can be worked around with: #define _Noreturn __attribute__ ((noreturn)) Maybe the compiler could be checked in the preprocessor, adding that compatibility line so the code compiles correctly. Any opinion on this? Thanks for your time and for reading!
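One way the preprocessor check suggested above could look is sketched below. It is an illustration, not the fix that was committed to the tree. It assumes that a compiler which does not advertise C11 through __STDC_VERSION__ but defines __GNUC__ (GCC and Clang both do) accepts the GNU attribute spelling. The helpers die() and checked_div() are hypothetical and exist only to show the keyword in use.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Assumption: pre-C11 compilers that define __GNUC__ understand the GNU
 * attribute syntax; anything else gets an empty definition.
 */
#if !defined(__STDC_VERSION__) || __STDC_VERSION__ < 201112L
#  if defined(__GNUC__)
#    define _Noreturn __attribute__((__noreturn__))
#  else
#    define _Noreturn /* unknown compiler: drop the annotation */
#  endif
#endif

/* Hypothetical helper: terminates the process and never returns. */
static _Noreturn void die(void)
{
    abort();
}

/* Hypothetical caller: control cannot continue past die(). */
static inline int checked_div(int num, int den)
{
    if (den == 0)
        die();
    return num / den;
}
```

With a C11 compiler the #if block is skipped and the native keyword is used; with an older GNU-compatible compiler the macro maps the same declarations onto the attribute.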
Re: Failure to compile world
A follow-up on this: libc does not build because of a missing SCTP_REMOTE_UDP_ENCAPS_PORT. Apparently the Makefile doesn't add /sys/ to the include paths for libc. My current version (/usr/include/netinet/sctp.h) lacks that definition; the build should look in the headers of the source tree, not the current system headers ... so I just added that to the Makefile ( lib/libc/net/Makefile.inc ). I'm leaving a note in case anyone else experiences the same problem.
Re: can a wrong alignment cause a decrease in a hdd's life expectancy?
On Mon, 19 Dec 2011, Alexander Best wrote: no problem. so will the improper alignment also not cause a life expectancy shortage in case of a hdd (non-flash-based)? and one other question: the hdd also supports usb 3. will the improper alignment have any effect (speed wise) when connected via usb 3, or is even usb 3 too slow to notice the performance drop due to the improper alignment? Many variables: file system, file size, drive firmware... The only reason not to fix it is time. And space for a temporary copy... two, two reasons not to fix it. Benchmark it as-is, back up, realign, restore, benchmark again. Or live with the gnawing, creeping doubt of not knowing for sure. Every day wondering: is that drive slower than it could be just from a simple alignment error? Is every read a mere fraction of its potential? But it's probably fine. No pressure.
Re: Failure to compile world
On 12/20/2011 01:52, Garrett Cooper wrote: On Mon, Dec 19, 2011 at 7:31 PM, Alex Kuster vertexsymph...@zoho.com wrote: A follow-up on this is libc not building because of missing SCTP_REMOTE_UDP_ENCAPS_PORT apparently the Makefile doesn't include /sys/ into the includes of the libc. My current version (/usr/include/netinet/sctp.h) lacks that definition, it should look in the headers of the source, not the current system headers ... so I just added that to the Makefile ( lib/libc/net/Makefile.inc ). I'm leaving note if anyone else experiences the same problem. Please file a PR for this and other similar build issues. The mantra I've gotten in the past is that builds aren't guaranteed to work in subdirs, but I would really like for this to become a reality because I really wouldn't want to have to installworld (or installincludes) a whole system just to get some headers installed for a trivial program in the base system :). Just to make sure though, did you do make depend all, or just make all? Thanks! -Garrett Hi Garrett ... Well, those issues were raised by a simple make buildworld in the traditional /usr/src. When I found the first issue with libc I just went to /usr/src/lib/libc, fixed it and ran make in there; then the second issue appeared, and after that libc built with no problems. Now I'm facing another one, which I'll look into and see how to fix to get a compiling/working system. Thanks for your time! P.S → I didn't know about installincludes, I'll read about that P.S 2 → I never-ever-ever filed a PR