Make ZFS use the physical sector size when computing initial ashift
The attached patch causes ZFS to base the minimum transfer size for a new vdev on the GEOM provider's stripesize (physical sector size) rather than sectorsize (logical sector size), provided that stripesize is a power of two larger than sectorsize and smaller than or equal to VDEV_PAD_SIZE. This should eliminate the need for ivoras@'s gnop trick when creating ZFS pools on Advanced Format drives.

DES
-- 
Dag-Erling Smørgrav - d...@des.no

Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
===================================================================
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c	(revision 253138)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c	(working copy)
@@ -578,6 +578,7 @@
 {
 	struct g_provider *pp;
 	struct g_consumer *cp;
+	u_int sectorsize;
 	size_t bufsize;
 	int error;
 
@@ -661,8 +662,21 @@
 
 	/*
 	 * Determine the device's minimum transfer size.
+	 *
+	 * This is a bit of a hack.  For performance reasons, we would
+	 * prefer to use the physical sector size (reported by GEOM as
+	 * stripesize) as minimum transfer size.  However, doing so
+	 * unconditionally would break existing vdevs.  Therefore, we
+	 * compute ashift based on stripesize when the vdev isn't already
+	 * part of a pool (vdev_asize == 0), and sectorsize otherwise.
 	 */
-	*ashift = highbit(MAX(pp->sectorsize, SPA_MINBLOCKSIZE)) - 1;
+	if (vd->vdev_asize == 0 && pp->stripesize > pp->sectorsize &&
+	    ISP2(pp->stripesize) && pp->stripesize <= VDEV_PAD_SIZE) {
+		sectorsize = pp->stripesize;
+	} else {
+		sectorsize = pp->sectorsize;
+	}
+	*ashift = highbit(MAX(sectorsize, SPA_MINBLOCKSIZE)) - 1;
 
 	/*
 	 * Clear the nowritecache settings, so that on a vdev_reopen()

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
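For readers who want to try the selection rule outside the kernel, here is a minimal userland sketch of the same logic. It is not the patch itself: `pick_ashift`, `is_pow2` and `highbit32` are hypothetical stand-ins for the kernel's code, `ISP2()` and `highbit()`, and the two constants are assumed values (512 bytes and 8 KiB) rather than the definitions from the ZFS headers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; the real ones come from the ZFS headers. */
#define SPA_MINBLOCKSIZE 512u
#define VDEV_PAD_SIZE    8192u

/* True if x is a non-zero power of two (stand-in for ISP2()). */
static int
is_pow2(uint32_t x)
{
	return (x != 0 && (x & (x - 1)) == 0);
}

/* 1-based index of the highest set bit, 0 for x == 0 (stand-in for highbit()). */
static int
highbit32(uint32_t x)
{
	int h = 0;

	while (x != 0) {
		h++;
		x >>= 1;
	}
	return (h);
}

/*
 * Pick the ashift for a vdev.  A vdev that is not yet part of a pool
 * (vdev_asize == 0) may use the physical sector size (stripesize);
 * an existing vdev keeps using the logical sector size.
 */
static int
pick_ashift(uint64_t vdev_asize, uint32_t sectorsize, uint32_t stripesize)
{
	uint32_t size = sectorsize;

	assert(sectorsize != 0);
	if (vdev_asize == 0 && stripesize > sectorsize &&
	    is_pow2(stripesize) && stripesize <= VDEV_PAD_SIZE)
		size = stripesize;
	if (size < SPA_MINBLOCKSIZE)
		size = SPA_MINBLOCKSIZE;
	return (highbit32(size) - 1);
}
```

Under these assumptions a new vdev on a 512-byte-logical, 4096-byte-physical drive gets ashift 12, while the same drive in an existing pool stays at ashift 9.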
Re: Make ZFS use the physical sector size when computing initial ashift
Hi DES,

Unfortunately you need quite a bit more than this to work compatibly.

I've had a patch here that does just this for quite some time, but there's been some discussion about how we want additional control over this, so it's not been committed. If others are interested, I've attached it, as it achieves what we needed here and so may be of use to others too.

There's also a big discussion on illumos about this very subject at the moment, so I'm monitoring that too. Hopefully a clear conclusion will come from that on how people want to proceed, and we'll be able to get a change in that works for everyone.

Regards
Steve

----- Original Message -----
From: Dag-Erling Smørgrav <d...@des.no>
To: freebsd...@freebsd.org; freebsd-hackers@freebsd.org
Cc: ivo...@freebsd.org
Sent: Wednesday, July 10, 2013 10:02 AM
Subject: Make ZFS use the physical sector size when computing initial ashift

> The attached patch causes ZFS to base the minimum transfer size for a
> new vdev on the GEOM provider's stripesize (physical sector size) rather
> than sectorsize (logical sector size), provided that stripesize is a
> power of two larger than sectorsize and smaller than or equal to
> VDEV_PAD_SIZE. This should eliminate the need for ivoras@'s gnop trick
> when creating ZFS pools on Advanced Format drives.

This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmas...@multiplay.co.uk.
Attachment: zzz-zfs-ashift-fix.patch (binary data)
Re: hw.physmem/hw.realmem question
On 3 July 2013 01:45, Wojciech Puchar <woj...@wojtek.tensor.gdynia.pl> wrote:
> AMD Features2=0x1<LAHF>
> TSC: P-state invariant, performance statistics
> real memory = 34359738368 (32768 MB)
> avail memory = 32191340544 (30700 MB)
> 2GB memory disappears too even when you don't set anything.
>
> i asked such a question for other machine some time ago without much
> answer. in your laptop it may be shared graphics memory reserved by
> chipset
>
> still on my dell server
> real memory = 34359738368 (32768 MB)
> avail memory = 33166921728 (31630 MB)
>
> i have over 1GB unavailable and it doesn't have shared graphics memory.
> it would be nice to be able to look exactly how memory is used.

On amd64 about 3% is cut on startup for page structures, see vm_page_startup().

-- 
wbr, pluknet
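To put numbers on the figures quoted above, the following sketch computes how much memory is withheld and what share of the total that is. The helper names are made up for this illustration, and nothing here models what vm_page_startup() actually reserves; it is only the subtraction and the percentage, done in integer arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes the kernel reports as real memory but not as available memory. */
static uint64_t
reserved_bytes(uint64_t real, uint64_t avail)
{
	assert(avail <= real);
	return (real - avail);
}

/* The same quantity as a share of real memory, in hundredths of a percent. */
static uint64_t
reserved_share(uint64_t real, uint64_t avail)
{
	assert(real != 0);
	return (reserved_bytes(real, avail) * 10000 / real);
}
```

With the Dell server's numbers, `reserved_share(34359738368, 33166921728)` yields 347, i.e. about 3.47%, which is in line with the "about 3%" estimate. The first pair of figures works out to 631, about 6.3%, so something besides page structures appears to be involved on that machine.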
Re: Make ZFS use the physical sector size when computing initial ashift
Steven Hartland <kill...@multiplay.co.uk> writes:
> Hi DES, unfortunately you need a quite bit more than this to work
> compatibly.

*chirp* *chirp* *chirp*

DES
-- 
Dag-Erling Smørgrav - d...@des.no
Re: Make ZFS use the physical sector size when computing initial ashift
On Jul 10, 2013, at 11:25 AM, Steven Hartland wrote:
> If others are interested I've attached this as it achieves what we
> needed here so may also be of use for others too.
>
> There's also a big discussion on illumos about this very subject ATM
> so I'm monitoring that too.
>
> Hopefully there will be a nice conclusion come from that how people
> want to proceed and we'll be able to get a change in that works for
> everyone.

Hmm. I wonder if the simplest approach would be the best one. I mean, adding a flag to zpool.

At home I have a playground FreeBSD machine with a ZFS mirror and, you guessed it, I was careless when I purchased the components. I asked for two 1 TB drives and that's what I got, but they are different models: one of them is Advanced Format and the other one classic.

I don't think it's that bad to create a pool on a classic disk using 4 KB blocks, and it's quite likely that replacement disks will be 4 KB in the near future.

Also, if you use SSDs the situation is similar.

Borja.
Re: Make ZFS use the physical sector size when computing initial ashift
There's a lot more to consider when deciding on a way forward, not least that ashift isn't a zpool-wide configuration option but is set per top-level vdev, and the space implications of moving from 512-byte to 4K sectors. See the previous and current discussions on zfs-de...@freebsd.org and z...@lists.illumos.org for details.

Regards
Steve

----- Original Message -----
From: Borja Marcos <bor...@sarenet.es>

> Hmm. I wonder if the simplest approach would be the better. I mean,
> adding a flag to zpool.
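The space consideration mentioned above can be illustrated with a small sketch: every allocation is rounded up to a multiple of 1 << ashift, so small blocks cost more at ashift 12 than at ashift 9. This is a simplification under stated assumptions; `alloc_size` is a hypothetical helper, and it ignores RAID-Z parity, padding and metadata overhead entirely.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round a block's physical size up to the vdev's allocation unit,
 * which is 1 << ashift bytes.
 */
static uint64_t
alloc_size(uint64_t psize, unsigned ashift)
{
	uint64_t unit;

	assert(ashift < 64);
	unit = (uint64_t)1 << ashift;
	return ((psize + unit - 1) & ~(unit - 1));
}
```

For example, a block that compresses to 1536 bytes occupies 1536 bytes at ashift 9 but 4096 bytes at ashift 12, whereas a full 128 KiB block costs the same either way.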
Re: Make ZFS use the physical sector size when computing initial ashift
On 07/10/13 02:02, Dag-Erling Smørgrav wrote:
> The attached patch causes ZFS to base the minimum transfer size for a
> new vdev on the GEOM provider's stripesize (physical sector size)
> rather than sectorsize (logical sector size), provided that stripesize
> is a power of two larger than sectorsize and smaller than or equal to
> VDEV_PAD_SIZE. This should eliminate the need for ivoras@'s gnop trick
> when creating ZFS pools on Advanced Format drives.

I think there are multiple versions of this (I also have one [1]), but the concern is that if one created a pool with ashift=9 and the device now reports ashift=12, the pool becomes unimportable. So there needs to be a way to disable this behavior.

Another thing (not really related to the automatic detection) is that we need a way to manually override this setting from the command line when creating the pool. This is under active discussion on the Illumos mailing list right now.

[1] https://github.com/trueos/trueos/commit/3d2e3a38faad8df4acf442b055c5e98ab873fb26

Cheers,
-- 
Xin LI <delp...@delphij.net> https://www.delphij.net/
FreeBSD - The Power to Serve! Live free or die
Re: Make ZFS use the physical sector size when computing initial ashift
On Jul 10, 2013, at 11:21 AM, Xin Li <delp...@delphij.net> wrote:
> I think there are multiple versions of this (I also have one[1]) but
> the concern is that if one creates a pool with ashift=9, and now
> ashift=12, the pool gets unimportable. So there need a way to disable
> this behavior.
>
> Another thing (not really related to the automatic detection) is that
> we need a way to manually override this setting from command line when
> creating the pool, this is under active discussion at Illumos mailing
> list right now.
>
> [1] https://github.com/trueos/trueos/commit/3d2e3a38faad8df4acf442b055c5e98ab873fb26

I'm sure lots of folks have some solution to this. Here is an old version of what we use at Spectra:

http://people.freebsd.org/~gibbs/zfs_patches/zfs_auto_ashift.diff

The above patch is missing some cleanup that was motivated by my discussions with George Wilson about this change in April. I'll dig that up later tonight. Even if you don't read the full diff, please read the included checkin comment, since it explains the motivation behind this particular solution.

This is on my list of things to upstream in the next week or so, after I add logic to the userspace tools to report whether or not the TLVs in a pool are using an optimal allocation size. This is only possible if you actually make ZFS fully aware of logical, physical, and the configured allocation size. All of the other patches I've seen just treat physical as logical.

--
Justin
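The distinction between logical, physical and configured allocation size can be sketched as follows. This is only an illustration of the idea, not code from the Spectra patch: the struct and both helpers are hypothetical names, and the rule that a vdev is usable when its configured ashift is at least the logical one is an assumption based on the concern raised earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-vdev record of the three sizes, as shifts. */
struct vdev_sizes {
	unsigned logical_ashift;	/* smallest I/O the device accepts */
	unsigned physical_ashift;	/* size the device performs best at */
	unsigned ashift;		/* allocation size the vdev was created with */
};

/* The vdev can be opened: no allocation is smaller than the device allows. */
static bool
vdev_usable(const struct vdev_sizes *v)
{
	assert(v);
	return (v->ashift >= v->logical_ashift);
}

/* The vdev is using an optimal allocation size for this device. */
static bool
vdev_optimal(const struct vdev_sizes *v)
{
	assert(v);
	return (v->ashift >= v->physical_ashift);
}
```

Keeping the two checks separate is what allows a tool to report "usable but suboptimal" for an ashift=9 vdev on an Advanced Format drive, instead of refusing to import it.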
Re: Make ZFS use the physical sector size when computing initial ashift
----- Original Message -----
From: Xin Li

> I think there are multiple versions of this (I also have one[1]) but
> the concern is that if one creates a pool with ashift=9, and now
> ashift=12, the pool gets unimportable. So there need a way to disable
> this behavior.

I've tested my patch in all the configurations I can think of, including importing exported ashift=9 pools, all without issues.

For your example, e.g.:

# Create a 4K pool (min_create_ashift=4K, dev=512)
test:src> sysctl vfs.zfs.min_create_ashift
vfs.zfs.min_create_ashift: 12
test:src> mdconfig -a -t swap -s 128m -S 512 -u 0
test:src> zpool create mdpool md0
test:src> zdb mdpool | grep ashift
ashift: 12
ashift: 12

# Create a 512b pool (min_create_ashift=512, dev=512)
test:src> zpool destroy mdpool
test:src> sysctl vfs.zfs.min_create_ashift=9
vfs.zfs.min_create_ashift: 12 -> 9
test:src> zpool create mdpool md0
test:src> zdb mdpool | grep ashift
ashift: 9
ashift: 9

# Import a 512b pool (min_create_ashift=4K, dev=512)
test:src> zpool export mdpool
test:src> sysctl vfs.zfs.min_create_ashift=12
vfs.zfs.min_create_ashift: 9 -> 12
test:src> zpool import mdpool
test:src> zdb mdpool | grep ashift
ashift: 9
ashift: 9

# Create a 4K pool (min_create_ashift=512, dev=4K)
test:src> zpool destroy mdpool
test:src> mdconfig -d -u 0
test:src> mdconfig -a -t swap -s 128m -S 4096 -u 0
test:src> sysctl vfs.zfs.min_create_ashift=9
vfs.zfs.min_create_ashift: 12 -> 9
test:src> zpool create mdpool md0
test:src> zdb mdpool | grep ashift
ashift: 12
ashift: 12

# Import a 4K pool (min_create_ashift=4K, dev=4K)
test:src> zpool export mdpool
test:src> sysctl vfs.zfs.min_create_ashift=12
vfs.zfs.min_create_ashift: 9 -> 12
test:src> zpool import mdpool
test:src> zdb mdpool | grep ashift
ashift: 12
ashift: 12

> Another thing (not really related to the automatic detection) is that
> we need a way to manually override this setting from command line when
> creating the pool, this is under active discussion at Illumos mailing
> list right now.
>
> [1] https://github.com/trueos/trueos/commit/3d2e3a38faad8df4acf442b055c5e98ab873fb26

Yep, that has been on my list for a while, based on previous discussions on zfs-devel@. I've not had any time recently, but I'm following the illumos thread to see what conclusions they come to.

Regards
Steve
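The behaviour shown in the transcript above can be summarised in a few lines. This is a sketch inferred from the five test cases only, not taken from the attached patch: `create_ashift` and `import_ashift` are hypothetical names, and the rule "take the larger of the device's ashift and the tunable on create, keep the recorded ashift on import" is an assumption that happens to reproduce those results.

```c
#include <assert.h>

/*
 * ashift for a newly created vdev: never below what the device needs,
 * and never below the vfs.zfs.min_create_ashift tunable.
 */
static unsigned
create_ashift(unsigned dev_ashift, unsigned min_create_ashift)
{
	/* Callers pass a device ashift already floored at 9 (512 bytes). */
	assert(dev_ashift >= 9);
	return (dev_ashift > min_create_ashift ? dev_ashift : min_create_ashift);
}

/*
 * ashift for an imported vdev: whatever the pool was created with.
 * The tunable is deliberately ignored, so existing pools keep working.
 */
static unsigned
import_ashift(unsigned pool_ashift, unsigned min_create_ashift)
{
	(void)min_create_ashift;
	return (pool_ashift);
}
```

The point of splitting creation from import is that changing the tunable can never make an existing pool unimportable.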
Re: Make ZFS use the physical sector size when computing initial ashift
On 07/10/13 10:38, Justin T. Gibbs wrote:
[snip]
> I'm sure lots of folks have some solution to this. Here is an old
> version of what we use at Spectra:
>
> http://people.freebsd.org/~gibbs/zfs_patches/zfs_auto_ashift.diff
>
> [...]
>
> This is only possible if you actually make ZFS fully aware of logical,
> physical, and the configured allocation size. All of the other patches
> I've seen just treat physical as logical.

Yes, me too. Your version is superior.

Cheers,
-- 
Xin LI <delp...@delphij.net> https://www.delphij.net/
FreeBSD - The Power to Serve! Live free or die
possible changes from Panzura
I'm going through all the internal changes my current employer has made, categorizing them into "proprietary" and "can feed back to FreeBSD". I will probably send out emails like this several times, seeking feedback on whether a particular patch is considered useful or not. These are versus 8.0 at the moment (this is part of our effort to upgrade).

My first candidates are:

----- internal commit message -----
Add support for dumping kernel dumps in addition to text dumps for kernel panics. Add a new version of savecore to the tree, which knows how to retrieve and save both dumps. Control the new dump behavior via the debug.kerneldump_requested sysctl - disabling this will go back to the old text dump-only behavior.

----- part 2 -----
Have savecore be more optimistic about saving compressed cores - always try, and only bail if we actually run out of space. The pessimistic "only try saving if we've got enough free space to handle the entire dump uncompressed" made it too easy for us to run out of space on our /var/crash partition.
-----

Julian
Re: Make ZFS use the physical sector size when computing initial ashift
----- Original Message -----
From: Justin T. Gibbs

> This is on my list of things to upstream in the next week or so after
> I add logic to the userspace tools to report whether or not the TLVs
> in a pool are using an optimal allocation size. This is only possible
> if you actually make ZFS fully aware of logical, physical, and the
> configured allocation size. All of the other patches I've seen just
> treat physical as logical.

Reading through your patch, it seems that your logical_ashift equates to the current ashift value, which for GEOM devices is based on sectorsize, and your physical_ashift is based on stripesize.

This is almost identical to the approach I used: adding a desired ashift, which equates to your physical_ashift, alongside the standard ashift, i.e. the required (aka logical_ashift) value :)

One issue I did spot in your patch is that you currently expose zfs_max_auto_ashift as a sysctl but don't clamp its value, which would cause problems should a user configure values > 13.

If you're interested in the reason for this, it's explained in the comments in my version, which does a very similar thing with validation.

Regards
Steve
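A clamp of the kind described above might look like the sketch below. It is not code from either patch: `set_max_auto_ashift` and both bounds are assumptions for illustration, with 9 standing for 512-byte sectors and 13 taken from the limit discussed in this thread. A real sysctl handler could equally reject out-of-range values with an error instead of clamping them.

```c
#include <assert.h>

/* Assumed bounds: 512-byte sectors up to the limit named in the thread. */
#define ASHIFT_MIN 9
#define ASHIFT_MAX 13

/*
 * Store a user-supplied value into the tunable, clamped to the range
 * the rest of the code is known to cope with.  Returns the value stored.
 */
static int
set_max_auto_ashift(int *tunable, int requested)
{
	assert(tunable);
	if (requested < ASHIFT_MIN)
		requested = ASHIFT_MIN;
	else if (requested > ASHIFT_MAX)
		requested = ASHIFT_MAX;
	*tunable = requested;
	return (requested);
}
```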
Re: Make ZFS use the physical sector size when computing initial ashift
On Jul 10, 2013, at 1:06 PM, Steven Hartland <kill...@multiplay.co.uk> wrote:
> Reading through your patch it seems that your logical_ashift equates
> to the current ashift values which for geom devices is based off
> sectorsize and your physical_ashift is based stripesize.
>
> This is almost identical to the approach I used adding a desired
> ashift, which equates to your physical_ashift, along side the standard
> ashift i.e. required aka logical_ashift value :)

Yes, the approaches are similar. Our current version records the logical access size in the vdev structure too, which might relate to the issue below.

> One issue I did spot in your patch is that you currently expose
> zfs_max_auto_ashift as a sysctl but don't clamp its value which would
> cause problems should a user configure values > 13.

I would expect the zio pipeline to simply insert an ashift-aligned thunking buffer for these operations, but I haven't tried going past an ashift of 13 in my tests. If it is an issue, it seems the restriction should be based on logical access size, not optimal access size.

--
Justin
Re: Make ZFS use the physical sector size when computing initial ashift
----- Original Message -----
From: Justin T. Gibbs

>> One issue I did spot in your patch is that you currently expose
>> zfs_max_auto_ashift as a sysctl but don't clamp its value which would
>> cause problems should a user configure values > 13.
>
> I would expect the zio pipeline to simply insert an ashift aligned
> thunking buffer for these operations, but I haven't tried going past
> an ashift of 13 in my tests. If it is an issue, it seems the
> restriction should be based on logical access size, not optimal access
> size.

Yes, with your methodology you'll only see the issue if zfs_max_auto_ashift and physical_ashift are both > 13, but this can be the case, for example, on a RAID controller with a large stripe size.

Looking back at my old patch, it too suffers from the same issue, as does the current code base, but there it would only happen if the logical sector size resulted in an ashift > 13, which is going to be much less common ;-)

Regards
Steve
Re: Make ZFS use the physical sector size when computing initial ashift
On Jul 10, 2013, at 1:42 PM, Steven Hartland <kill...@multiplay.co.uk> wrote:
>> I would expect the zio pipeline to simply insert an ashift aligned
>> thunking buffer for these operations, but I haven't tried going past
>> an ashift of 13 in my tests. If it is an issue, it seems the
>> restriction should be based on logical access size, not optimal
>> access size.
>
> Yes with your methodology you'll only see the issue if
> zfs_max_auto_ashift and physical_ashift are both > 13, but this can be
> the case for example on a RAID controller with large stripsize.

I'm not sure I follow. logical_ashift is available in our latest code, as is the physical_ashift. But even without the logical_ashift, why doesn't the zio pipeline properly thunk zio_phys_read() access based on the configured ashift?

--
Justin
Kernel dumps [was Re: possible changes from Panzura]
On Jul 10, 2013, at 11:16 AM, Julian Elischer <jul...@elischer.org> wrote:
> My first candidates are:

Those sound useful. Just out of curiosity, however, since we're on the topic of kernel dumps: Has anyone even looked into the notion of an emergency fall-back network stack to enable remote kernel panic (or system hang) debugging, the way OS X lets you do? I can't tell you the number of times I've NMI'd a Mac and connected to it remotely in a scenario where everything was totally wedged, and just a couple of minutes in kgdb (or now lldb) quickly showed that everything was waiting on a specific lock and the problem became manifestly clear.

The feature also lets you scrape a panic'd machine with automation, running some kgdb scripts against it to glean useful information for later analysis, vs having to have someone schlep the dump image manually to triage.

It's going to be damn hard to live without this now, and if someone else isn't working on it, that's good to know too!

- Jordan
Re: possible changes from Panzura
On 07/10/2013 13:16, Julian Elischer wrote:
> My first candidates are:
>
> ----- internal commit message -----
> Add support for dumping kernel dumps in addition to text dumps for
> kernel panics. Add a new version of savecore to the tree, which knows
> how to retrieve and save both dumps. Control the new dump behavior via
> the debug.kerneldump_requested sysctl - disabling this wil go back to
> the old text dump-only behavior.

I wonder which would be more useful: this, or just dumping the full dump and using crashinfo to create a text summary after reboot. Of course, crashinfo could be enhanced to show anything it's currently missing (relative to the text dump). This would have the advantage of doing less stuff at dump time. Yours would have the advantage that it exists and works. :) Thoughts?

> ----- part 2 -----
> Have savecore be more optimistic about saving compressed cores -
> always try, and only bail if we actually run out of space. The
> pessimistic only try saving if we've got enough free space to handle
> the entire dump uncompressed made it too easy for us to run out of
> space on our /var/crash partition

Yes, please. I've run into this occasionally, but it never annoyed me enough to fix it. Procrastination pays off yet again. ;)

Eric
Re: Make ZFS use the physical sector size when computing initial ashift
- Original Message - From: Justin T. Gibbs ...

One issue I did spot in your patch is that you currently expose zfs_max_auto_ashift as a sysctl but don't clamp its value, which would cause problems should a user configure values > 13.

I would expect the zio pipeline to simply insert an ashift aligned thunking buffer for these operations, but I haven't tried going past an ashift of 13 in my tests. If it is an issue, it seems the restriction should be based on logical access size, not optimal access size.

Yes, with your methodology you'll only see the issue if zfs_max_auto_ashift and physical_ashift are both > 13, but this can be the case, for example, on a RAID controller with a large stripsize.

I'm not sure I follow. logical_ashift is available in our latest code, as is the physical_ashift. But even without the logical_ashift, why doesn't the zio pipeline properly thunk zio_phys_read() access based on the configured ashift?

When I looked at it, which was a long time ago now so please excuse me if I'm a little rusty on the details, zio_phys_read() was working more by luck than judgement, as the offsets passed in were calculated from a valid start + increment based on the size of a structure within vdev_label_offset(), with no ashift logic applied that I could find. The result was that pools created with large ashifts were unstable when I tested.

Regards
Steve
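The clamping concern can be sketched in a few lines of C. This is an illustration only, not the code from either patch: the SKETCH_ limits and function names are made up for the example, with 9 standing in for a 512-byte SPA_MINBLOCKSIZE and 13 for the ceiling discussed above, and the helper mirrors the 1-based highbit() used in the original diff.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative limits; the in-tree values may differ. */
#define SKETCH_MIN_ASHIFT	9	/* 512-byte blocks */
#define SKETCH_MAX_ASHIFT	13	/* 8 KiB, the ceiling discussed above */

/* 1-based position of the highest set bit; 0 when v == 0. */
static int
sketch_highbit(uint64_t v)
{
	int h = 0;

	while (v != 0) {
		h++;
		v >>= 1;
	}
	return (h);
}

/* Derive ashift from a sector size, bounded by a clamped tunable. */
int
sketch_ashift(uint64_t sectorsize, int max_auto_ashift)
{
	int ashift;

	/* Sector sizes are expected to be non-zero powers of two. */
	assert(sectorsize != 0 && (sectorsize & (sectorsize - 1)) == 0);

	/* Clamp the tunable itself, as a sysctl handler could. */
	if (max_auto_ashift < SKETCH_MIN_ASHIFT)
		max_auto_ashift = SKETCH_MIN_ASHIFT;
	if (max_auto_ashift > SKETCH_MAX_ASHIFT)
		max_auto_ashift = SKETCH_MAX_ASHIFT;

	ashift = sketch_highbit(sectorsize) - 1;
	if (ashift < SKETCH_MIN_ASHIFT)
		ashift = SKETCH_MIN_ASHIFT;
	if (ashift > max_auto_ashift)
		ashift = max_auto_ashift;
	return (ashift);
}
```

Under these assumptions a 64 KiB stripe with the tunable set to 16 still yields an ashift of 13, so an out-of-range sysctl value cannot push a new vdev past the supported maximum.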
Re: Kernel dumps [was Re: possible changes from Panzura]
On Jul 10, 2013, at 1:04 PM, asom...@gmail.com wrote:

I don't doubt that it would be useful to have an emergency network stack. But have you ever looked into debugging over firewire?

Absolutely. In fact, before the advent of remote network debugging, FW was totally the debugging method of choice since firewire target DMA lets you do all kinds of useful things (as well as a few things that simply scare the security guys to death ;-) ). My point was more that actually being able to debug a machine over the network is such a step up in terms of convenience/awesomeness that if anyone is thinking of putting any time and attention into this area at all, that's definitely the target to go for.

Looking at http://www.opensource.apple.com/tarballs/xnu/xnu-2050.22.13.tar.gz there's even reasonable documentation on the kernel debugging protocol in xnu/osfmk/kdp. Folks could do worse than try to clone it. The gdb debugger macros in support of it are also in xnu/kgmacros. None of it is going to be 'drop in' for FreeBSD by any stretch of the imagination, but it's always easier to get to a destination when you have a map. :-)

Anyone with a Mac can also run nvram boot-args="debug=0x144" and test-drive it around, just to see how it works in actual practice. See also: https://developer.apple.com/library/mac/#documentation/Darwin/Conceptual/KEXTConcept/KEXTConceptDebugger/debug_tutorial.html

- Jordan
Re: Kernel dumps [was Re: possible changes from Panzura]
On Wed, Jul 10, 2013 at 12:57 PM, Jordan Hubbard j...@mail.turbofuzz.com wrote:

On Jul 10, 2013, at 11:16 AM, Julian Elischer jul...@elischer.org wrote:

My first candidates are:

Those sound useful. Just out of curiosity, however, since we're on the topic of kernel dumps: Has anyone even looked into the notion of an emergency fall-back network stack to enable remote kernel panic (or system hang) debugging, the way OS X lets you do? I can't tell you the number of times I've NMI'd a Mac and connected to it remotely in a scenario where everything was totally wedged and just a couple of minutes in kgdb (or now lldb) quickly showed that everything was waiting on a specific lock and the problem became manifestly clear. The feature also lets you scrape a panic'd machine with automation, running some kgdb scripts against it to glean useful information for later analysis vs having to have someone schlep the dump image manually to triage. It's going to be damn hard to live without this now, and if someone else isn't working on it, that's good to know too!

I don't doubt that it would be useful to have an emergency network stack. But have you ever looked into debugging over firewire? We've had success with it. All of our development machines are connected to a single firewire bus. When one panics, we can remotely debug it with both kdb and ddb. It's not Ethernet, but it's still much faster than a serial port. https://wiki.freebsd.org/DebugWithDcons

- Jordan
Re: Kernel dumps [was Re: possible changes from Panzura]
Those sound useful. Just out of curiosity, however, since we're on the topic of kernel dumps: Has anyone even looked into the notion of an emergency fall-back network stack to enable remote kernel panic (or system hang) debugging, the way OS X lets you do? I can't tell you the number of times I've NMI'd a Mac and connected to it remotely in a scenario where everything was totally wedged and just a couple of minutes in kgdb (or now lldb) quickly showed that everything was waiting on a specific lock and the problem became manifestly clear. The feature also lets you scrape a panic'd machine with automation, running some kgdb scripts against it to glean useful information for later analysis vs having to have someone schlep the dump image manually to triage. It's going to be damn hard to live without this now, and if someone else isn't working on it, that's good to know too!

At a previous employer, we had a system where on a panic it had a totally separate stack capable of just IP/UDP/TFTP and would save its core via TFTP to a server. This isn’t as nice as full remote debugging, but it was a whole lot easier to develop. The caveats I remember were:

1) We didn’t want to implement ARP, so you had to write the MAC address of the “dump server” to the kernel via sysctl before crashing.
2) We also didn’t want to have to deal with routing tables, so you had to manually specify what interface to blast packets out to, also via sysctl.
3) After a panic we didn’t want to rely on interrupt processing working, so it polled the network interface and blocked whenever it needed to. Since this was an embedded system, it wasn’t too big of a deal - only one network driver had to be hacked to support this. Basically a flag that would switch to “disable normal processing, switch to polled fifos for input and output” until reboot.
4) The whole system used only preallocated buffers and its own stack (carved out from memory on boot) so even if the kernel’s malloc was trashed, we could still dump.

I’m not sure this really would scratch your itch, but I believe this took me no more than a day or two to implement. Parts #1 and #2 would be pretty easy, but I’m not sure how generically the kernel could support an emergency network mode that doesn’t require interrupts for every network card out there. Maybe that isn’t as important to you as it was to us.

The whole exercise is much easier if you don’t use TFTP but a custom protocol that doesn’t require the crashing system to receive any packets. If it can just blast away at some random host, oblivious to whether it’s working or not, it’s a lot less code to write.
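The transmit-only idea in the last paragraph, combined with the preallocated buffers from item 4, might look roughly like the following C sketch. Everything here is an assumption made for illustration: the packet layout, the DUMP_ constants and the function names are invented, and handing the buffer to a polled driver is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DUMP_MAGIC	0x44554d50u	/* ASCII "DUMP"; arbitrary choice */
#define DUMP_HDR_LEN	16
#define DUMP_PAYLOAD	1024

/* Preallocated once, so the panic path never touches malloc. */
static uint8_t dump_pkt[DUMP_HDR_LEN + DUMP_PAYLOAD];

/* Store a 32-bit value in network (big-endian) byte order. */
static void
put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Store a 64-bit value in network (big-endian) byte order. */
static void
put_be64(uint8_t *p, uint64_t v)
{
	put_be32(p, (uint32_t)(v >> 32));
	put_be32(p + 4, (uint32_t)v);
}

/*
 * Build one chunk in the static buffer: magic, payload length, then the
 * byte offset of the chunk within the dump.  Because every packet names
 * its own offset, the receiver can place chunks without sending
 * anything back.  Returns the packet length, or 0 if len is too large.
 */
size_t
dump_build_chunk(uint64_t offset, const void *data, size_t len)
{
	assert(data != NULL || len == 0);
	if (len > DUMP_PAYLOAD)
		return (0);
	put_be32(dump_pkt, DUMP_MAGIC);
	put_be32(dump_pkt + 4, (uint32_t)len);
	put_be64(dump_pkt + 8, offset);
	if (len != 0)
		memcpy(dump_pkt + DUMP_HDR_LEN, data, len);
	return (DUMP_HDR_LEN + len);
}

/* Read-only view of the packet most recently built. */
const uint8_t *
dump_packet(void)
{
	return (dump_pkt);
}
```

The sender keeps no retransmission state; a lost packet simply leaves a hole that the receiving side can detect from the offsets it did see.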
Re: Kernel dumps [was Re: possible changes from Panzura]
On Wed, Jul 10, 2013 at 3:50 PM, Jordan Hubbard j...@mail.turbofuzz.com wrote:

Absolutely. In fact, before the advent of remote network debugging, FW was totally the debugging method of choice since firewire target DMA lets you do all kinds of useful things (as well as a few things that simply scare the security guys to death ;-) ). My point was more that actually being able to debug a machine over the network is such a step up in terms of convenience/awesomeness that if anyone is thinking of putting any time and attention into this area at all, that's definitely the target to go for.

Looking at http://www.opensource.apple.com/tarballs/xnu/xnu-2050.22.13.tar.gz there's even reasonable documentation on the kernel debugging protocol in xnu/osfmk/kdp. Folks could do worse than try to clone it. The gdb debugger macros in support of it are also in xnu/kgmacros. None of it is going to be 'drop in' for FreeBSD by any stretch of the imagination, but it's always easier to get to a destination when you have a map. :-)

Anyone with a Mac can also run nvram boot-args="debug=0x144" and test-drive it around, just to see how it works in actual practice. See also: https://developer.apple.com/library/mac/#documentation/Darwin/Conceptual/KEXTConcept/KEXTConceptDebugger/debug_tutorial.html

Speaking of Apple solutions, I've recently used Apple's kgdb with the kernel debug kit and kdp remote debugging to debug a panic'd OS X host. It's really quite nice, because the debug kit comes with a ton of macros, similar to kdb, and you also get the benefit of source debugging. I think FreeBSD would benefit massively from finding some way to share macros between kdb and kgdb, in addition to having an emergency network stack like you suggest. As Alan says, until then, there's firewire, and also gdbsx if your FreeBSD system is running as a Xen guest.

--Will.
Re: Kernel dumps [was Re: possible changes from Panzura]
On Wed, 10 Jul 2013 14:50:19 PDT Jordan Hubbard j...@mail.turbofuzz.com wrote:

On Jul 10, 2013, at 1:04 PM, asom...@gmail.com wrote:

I don't doubt that it would be useful to have an emergency network stack. But have you ever looked into debugging over firewire?

My point was more that actually being able to debug a machine over the network is such a step up in terms of convenience/awesomeness that if anyone is thinking of putting any time and attention into this area at all, that's definitely the target to go for.

You have to use this just once to see how convenient it is! For a previous company James Da Silva did this in 1997 by adding a network console (IIRC in a day or two). A new ethernet type was used + a host specific ethernet multicast address so you could connect from any machine on the same ethernet segment, either as a remote console for the usual console I/O and ddb, or to run remote gdb. Quite insecure, but that didn't matter as this was used in a test network.

There was no emergency network stack; just a polling function added to an ethernet driver, since this had to work even when the kernel was on the operating table under anaesthetic! No new gdb hacks were necessary since the invoking program set things up for it. If I was doing this today, I'd probably still do the same and make sure that the interface used for remote debugging is on an isolated network.
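For readers who have not seen this kind of raw-Ethernet console, the framing could be sketched in C as below. This is not the 1997 implementation: the function names are invented, the way the per-host multicast address is derived (setting the group and locally-administered bits of the station address) is purely an assumption, and 0x88b5 is used because IEEE reserves it as a local experimental EtherType.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NC_ADDR_LEN	6
#define NC_HDR_LEN	14
#define NC_ETHERTYPE	0x88b5	/* IEEE 802 local experimental EtherType 1 */

/*
 * Derive a host-specific group address from the station address.
 * Assumed scheme: set the I/G (multicast) bit and the U/L
 * (locally administered) bit in the first octet.
 */
void
netcons_group_addr(const uint8_t host[NC_ADDR_LEN],
    uint8_t group[NC_ADDR_LEN])
{
	memcpy(group, host, NC_ADDR_LEN);
	group[0] |= 0x01;	/* group (multicast) bit */
	group[0] |= 0x02;	/* locally administered bit */
}

/* Write destination, source and EtherType; returns the header length. */
size_t
netcons_build_header(uint8_t *frame, const uint8_t dst[NC_ADDR_LEN],
    const uint8_t src[NC_ADDR_LEN])
{
	assert(frame != NULL);
	memcpy(frame, dst, NC_ADDR_LEN);
	memcpy(frame + NC_ADDR_LEN, src, NC_ADDR_LEN);
	frame[12] = (uint8_t)(NC_ETHERTYPE >> 8);
	frame[13] = (uint8_t)(NC_ETHERTYPE & 0xff);
	return (NC_HDR_LEN);
}
```

Because no IP layer is involved, any machine on the same segment that listens for the group address and EtherType can attach, which is also why the scheme only makes sense on an isolated network.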
Re: Kernel dumps [was Re: possible changes from Panzura]
On 10/07/2013 23:09, Kevin Day wrote:

Those sound useful. Just out of curiosity, however, since we're on the topic of kernel dumps: Has anyone even looked into the notion of an emergency fall-back network stack to enable remote kernel panic (or system hang) debugging, the way OS X lets you do? I can't tell you the number of times I've NMI'd a Mac and connected to it remotely in a scenario where everything was totally wedged and just a couple of minutes in kgdb (or now lldb) quickly showed that everything was waiting on a specific lock and the problem became manifestly clear. The feature also lets you scrape a panic'd machine with automation, running some kgdb scripts against it to glean useful information for later analysis vs having to have someone schlep the dump image manually to triage. It's going to be damn hard to live without this now, and if someone else isn't working on it, that's good to know too!

At a previous employer, we had a system where on a panic it had a totally separate stack capable of just IP/UDP/TFTP and would save its core via TFTP to a server. This isn’t as nice as full remote debugging, but it was a whole lot easier to develop. The caveats I remember were:

1) We didn’t want to implement ARP, so you had to write the MAC address of the “dump server” to the kernel via sysctl before crashing.
2) We also didn’t want to have to deal with routing tables, so you had to manually specify what interface to blast packets out to, also via sysctl.
3) After a panic we didn’t want to rely on interrupt processing working, so it polled the network interface and blocked whenever it needed to. Since this was an embedded system, it wasn’t too big of a deal - only one network driver had to be hacked to support this. Basically a flag that would switch to “disable normal processing, switch to polled fifos for input and output” until reboot.
4) The whole system used only preallocated buffers and its own stack (carved out from memory on boot) so even if the kernel’s malloc was trashed, we could still dump.

I’m not sure this really would scratch your itch, but I believe this took me no more than a day or two to implement. Parts #1 and #2 would be pretty easy, but I’m not sure how generically the kernel could support an emergency network mode that doesn’t require interrupts for every network card out there. Maybe that isn’t as important to you as it was to us.

The whole exercise is much easier if you don’t use TFTP but a custom protocol that doesn’t require the crashing system to receive any packets. If it can just blast away at some random host, oblivious to whether it’s working or not, it’s a lot less code to write.

There was some work on something similar at one point, not sure what came of it. http://lists.freebsd.org/pipermail/freebsd-current/2010-September/020164.html

Vince