Make ZFS use the physical sector size when computing initial ashift

2013-07-10 Thread Dag-Erling Smørgrav
The attached patch causes ZFS to base the minimum transfer size for a
new vdev on the GEOM provider's stripesize (physical sector size) rather
than sectorsize (logical sector size), provided that stripesize is a
power of two larger than sectorsize and smaller than or equal to
VDEV_PAD_SIZE.  This should eliminate the need for ivoras@'s gnop trick
when creating ZFS pools on Advanced Format drives.
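
A minimal standalone sketch of the selection rule described above, written as userland C rather than as the kernel patch itself. The helper names `highbit32` and `initial_ashift` are invented for this example, and the values given to `SPA_MINBLOCKSIZE` (512) and `VDEV_PAD_SIZE` (8 KiB) are assumptions that should be checked against the ZFS headers.

```c
#include <stdint.h>

/* Assumed values; check the ZFS headers before relying on them. */
#define SPA_MINBLOCKSIZE 512u
#define VDEV_PAD_SIZE    (8u << 10)

/* True if x is a nonzero power of two. */
#define ISP2(x) ((x) != 0 && (((x) & ((x) - 1)) == 0))

/* 1-based index of the highest set bit, 0 if x == 0 (hypothetical helper). */
static int
highbit32(uint32_t x)
{
	int h = 0;

	while (x != 0) {
		h++;
		x >>= 1;
	}
	return (h);
}

/*
 * Pick the ashift for a vdev.  vdev_asize == 0 means the vdev is not yet
 * part of a pool, so the physical sector size (stripesize) may be used.
 */
static int
initial_ashift(uint64_t vdev_asize, uint32_t sectorsize, uint32_t stripesize)
{
	uint32_t size = sectorsize;

	if (vdev_asize == 0 && stripesize > sectorsize &&
	    ISP2(stripesize) && stripesize <= VDEV_PAD_SIZE)
		size = stripesize;
	if (size < SPA_MINBLOCKSIZE)
		size = SPA_MINBLOCKSIZE;
	return (highbit32(size) - 1);
}
```

Under these assumptions, a new vdev reporting 512-byte logical and 4096-byte physical sectors gets ashift 12, while the same device in an existing pool keeps ashift 9.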

DES
-- 
Dag-Erling Smørgrav - d...@des.no

Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
===================================================================
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c	(revision 253138)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c	(working copy)
@@ -578,6 +578,7 @@
 {
 	struct g_provider *pp;
 	struct g_consumer *cp;
+	u_int sectorsize;
 	size_t bufsize;
 	int error;
 
@@ -661,8 +662,21 @@
 
 	/*
 	 * Determine the device's minimum transfer size.
+	 *
+	 * This is a bit of a hack.  For performance reasons, we would
+	 * prefer to use the physical sector size (reported by GEOM as
+	 * stripesize) as minimum transfer size.  However, doing so
+	 * unconditionally would break existing vdevs.  Therefore, we
+	 * compute ashift based on stripesize when the vdev isn't already
+	 * part of a pool (vdev_asize == 0), and sectorsize otherwise.
 	 */
-	*ashift = highbit(MAX(pp->sectorsize, SPA_MINBLOCKSIZE)) - 1;
+	if (vd->vdev_asize == 0 && pp->stripesize > pp->sectorsize &&
+	    ISP2(pp->stripesize) && pp->stripesize <= VDEV_PAD_SIZE) {
+		sectorsize = pp->stripesize;
+	} else {
+		sectorsize = pp->sectorsize;
+	}
+	*ashift = highbit(MAX(sectorsize, SPA_MINBLOCKSIZE)) - 1;
 
 	/*
 	 * Clear the nowritecache settings, so that on a vdev_reopen()
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: Make ZFS use the physical sector size when computing initial ashift

2013-07-10 Thread Steven Hartland

Hi DES, unfortunately you need quite a bit more than this to work compatibly.

I've had a patch here that does just this for quite some time, but there's been
some discussion on how we want additional control over this, so it's not been
committed.

If others are interested, I've attached it, as it achieves what we needed here
and so may be of use to others too.

There's also a big discussion on illumos about this very subject ATM, so I'm
monitoring that too.

Hopefully a nice conclusion will come from that on how people want to
proceed, and we'll be able to get a change in that works for everyone.

   Regards
   Steve
----- Original Message -----
From: Dag-Erling Smørgrav <d...@des.no>
To: freebsd...@freebsd.org; freebsd-hackers@freebsd.org
Cc: ivo...@freebsd.org
Sent: Wednesday, July 10, 2013 10:02 AM
Subject: Make ZFS use the physical sector size when computing initial ashift

> The attached patch causes ZFS to base the minimum transfer size for a
> new vdev on the GEOM provider's stripesize (physical sector size) rather
> than sectorsize (logical sector size), provided that stripesize is a
> power of two larger than sectorsize and smaller than or equal to
> VDEV_PAD_SIZE.  This should eliminate the need for ivoras@'s gnop trick
> when creating ZFS pools on Advanced Format drives.
>
> DES
> --
> Dag-Erling Smørgrav - d...@des.no
>
> ___
> freebsd...@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to freebsd-fs-unsubscr...@freebsd.org




This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 


In the event of misdirection, illegible or incomplete transmission please 
telephone +44 845 868 1337
or return the E.mail to postmas...@multiplay.co.uk.

zzz-zfs-ashift-fix.patch
Description: Binary data

Re: hw.physmem/hw.realmem question

2013-07-10 Thread Sergey Kandaurov
On 3 July 2013 01:45, Wojciech Puchar <woj...@wojtek.tensor.gdynia.pl> wrote:
>   AMD Features2=0x1<LAHF>
>   TSC: P-state invariant, performance statistics
> real memory  = 34359738368 (32768 MB)
> avail memory = 32191340544 (30700 MB)
>
> 2GB memory disappears too even when you don't set anything.
>
> i asked such a question for other machine some time ago without much answer.
>
> in your laptop it may be shared graphics memory reserved by chipset
>
> still on my dell server
>
> real memory  = 34359738368 (32768 MB)
> avail memory = 33166921728 (31630 MB)
>
> i have over 1GB unavailable and it doesn't have shared graphics memory.
>
> it would be nice to be able to look exactly how memory is used.

On amd64 about 3% is cut on startup for page structures, see vm_page_startup().
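
The gap in the dmesg figures quoted above can be checked with a few lines of C. This is only arithmetic on the numbers that appear in this thread; the function name is invented for the example and nothing here inspects vm_page_startup() itself.

```c
#include <stdint.h>

/* Share of real memory that is not reported as available, in percent. */
static double
unavailable_percent(uint64_t real_bytes, uint64_t avail_bytes)
{
	return (100.0 * (double)(real_bytes - avail_bytes) /
	    (double)real_bytes);
}
```

Fed the Dell server figures (34359738368 real, 33166921728 available) this gives roughly 3.5%, which is in line with the estimate above. The first machine's figures (32191340544 available) give roughly 6.3%, so page structures alone would not account for all of that gap.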

-- 
wbr,
pluknet


Re: Make ZFS use the physical sector size when computing initial ashift

2013-07-10 Thread Dag-Erling Smørgrav
Steven Hartland <kill...@multiplay.co.uk> writes:
> Hi DES, unfortunately you need a quite bit more than this to work
> compatibly.

*chirp* *chirp* *chirp*

DES
-- 
Dag-Erling Smørgrav - d...@des.no

Re: Make ZFS use the physical sector size when computing initial ashift

2013-07-10 Thread Borja Marcos

On Jul 10, 2013, at 11:25 AM, Steven Hartland wrote:

> If others are interested I've attached this as it achieves what we needed
> here so may also be of use for others too.
>
> There's also a big discussion on illumos about this very subject ATM so I'm
> monitoring that too.
>
> Hopefully there will be a nice conclusion come from that how people want to
> proceed and we'll be able to get a change in that works for everyone.

Hmm. I wonder if the simplest approach would be the best. I mean, adding a
flag to zpool.

At home I have a playground FreeBSD machine with a ZFS mirror and, you
guessed it, I was careless when I purchased the components. I asked for two
1 TB drives and that's what I got, but they are different models: one of
them Advanced Format and the other one classic.

I don't think it's that bad to create a pool on a classic disk using 4 KB
blocks, and it's quite likely that replacement disks will be 4 KB in the
near future.

Also, if you use SSDs the situation is similar.

Borja.



Re: Make ZFS use the physical sector size when computing initial ashift

2013-07-10 Thread Steven Hartland

There's lots more to consider when deciding on a way forward, not least that
ashift isn't a zpool configuration option but is per top-level vdev, plus the
space considerations of moving from 512b to 4k. See previous and current
discussions on zfs-de...@freebsd.org and z...@lists.illumos.org for details.

   Regards
   Steve

----- Original Message -----
From: Borja Marcos <bor...@sarenet.es>

> On Jul 10, 2013, at 11:25 AM, Steven Hartland wrote:
>
>> If others are interested I've attached this as it achieves what we needed
>> here so may also be of use for others too.
>>
>> There's also a big discussion on illumos about this very subject ATM so I'm
>> monitoring that too.
>>
>> Hopefully there will be a nice conclusion come from that how people want to
>> proceed and we'll be able to get a change in that works for everyone.
>
> Hmm. I wonder if the simplest approach would be the better. I mean, adding a
> flag to zpool.
>
> At home I have a playground FreeBSD machine with a ZFS zmirror, and, you
> guessed it, I was careless when I purchased the components, I asked for two
> 1 TB drives and that I got, but different models, one of them advanced
> format and the other one classic.
>
> I don't think it's that bad to create a pool on a classic disk using 4 KB
> blocks, and it's quite likely that replacement disks will be 4 KB in the
> near future.
>
> Also, if you use SSDs the situation is similar.





Re: Make ZFS use the physical sector size when computing initial ashift

2013-07-10 Thread Xin Li
On 07/10/13 02:02, Dag-Erling Smørgrav wrote:
> The attached patch causes ZFS to base the minimum transfer size for
> a new vdev on the GEOM provider's stripesize (physical sector size)
> rather than sectorsize (logical sector size), provided that
> stripesize is a power of two larger than sectorsize and smaller
> than or equal to VDEV_PAD_SIZE.  This should eliminate the need for
> ivoras@'s gnop trick when creating ZFS pools on Advanced Format
> drives.

I think there are multiple versions of this (I also have one[1]), but
the concern is that if one created a pool with ashift=9 and the device is
now opened with ashift=12, the pool becomes unimportable.  So there needs
to be a way to disable this behavior.

Another thing (not really related to the automatic detection) is that
we need a way to manually override this setting from the command line when
creating the pool.  This is under active discussion on the Illumos mailing
list right now.

[1]
https://github.com/trueos/trueos/commit/3d2e3a38faad8df4acf442b055c5e98ab873fb26

Cheers,
-- 
Xin LI <delp...@delphij.net>    https://www.delphij.net/
FreeBSD - The Power to Serve!   Live free or die

Re: Make ZFS use the physical sector size when computing initial ashift

2013-07-10 Thread Justin T. Gibbs
On Jul 10, 2013, at 11:21 AM, Xin Li <delp...@delphij.net> wrote:

> On 07/10/13 02:02, Dag-Erling Smørgrav wrote:
>> The attached patch causes ZFS to base the minimum transfer size for
>> a new vdev on the GEOM provider's stripesize (physical sector size)
>> rather than sectorsize (logical sector size), provided that
>> stripesize is a power of two larger than sectorsize and smaller
>> than or equal to VDEV_PAD_SIZE.  This should eliminate the need for
>> ivoras@'s gnop trick when creating ZFS pools on Advanced Format
>> drives.
>
> I think there are multiple versions of this (I also have one[1]) but
> the concern is that if one creates a pool with ashift=9, and now
> ashift=12, the pool gets unimportable.  So there need a way to disable
> this behavior.
>
> Another thing (not really related to the automatic detection) is that
> we need a way to manually override this setting from command line when
> creating the pool, this is under active discussion at Illumos mailing
> list right now.
>
> [1]
> https://github.com/trueos/trueos/commit/3d2e3a38faad8df4acf442b055c5e98ab873fb26
>
> Cheers,
> -- 
> Xin LI <delp...@delphij.net>    https://www.delphij.net/
> FreeBSD - The Power to Serve!   Live free or die
 

I'm sure lots of folks have some solution to this.  Here is an
old version of what we use at Spectra:

http://people.freebsd.org/~gibbs/zfs_patches/zfs_auto_ashift.diff

The above patch is missing some cleanup that was motivated by my
discussions with George Wilson about this change in April.  I'll
dig that up later tonight.  Even if you don't read the full diff,
please read the included checkin comment since it explains the
motivation behind this particular solution.

This is on my list of things to upstream in the next week or so after
I add logic to the userspace tools to report whether or not the
TLVs in a pool are using an optimal allocation size.  This is only
possible if you actually make ZFS fully aware of logical, physical,
and the configured allocation size.  All of the other patches I've seen
just treat physical as logical.
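
To make the distinction between the three sizes concrete, here is a small C sketch of the kind of check a userspace tool could perform. It is not taken from the Spectra patch; the struct, its field names, and both functions are hypothetical and only illustrate the idea of tracking logical, physical, and configured sizes separately.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-top-level-vdev record; field names are illustrative. */
struct tlv_ashift {
	uint8_t logical_ashift;		/* smallest I/O the device accepts */
	uint8_t physical_ashift;	/* size the device handles natively */
	uint8_t ashift;			/* allocation size the vdev was created with */
};

/* The configured ashift is usable if it is at least the logical one. */
static bool
tlv_ashift_valid(const struct tlv_ashift *t)
{
	return (t->ashift >= t->logical_ashift);
}

/* It is optimal if it also covers the physical sector size. */
static bool
tlv_ashift_optimal(const struct tlv_ashift *t)
{
	return (tlv_ashift_valid(t) && t->ashift >= t->physical_ashift);
}
```

A vdev created with ashift 9 on a 512-byte-logical, 4096-byte-physical disk would be reported as valid but not optimal, which is the case that cannot be expressed if physical is simply treated as logical.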

--
Justin





Re: Make ZFS use the physical sector size when computing initial ashift

2013-07-10 Thread Steven Hartland


----- Original Message -----
From: Xin Li

> On 07/10/13 02:02, Dag-Erling Smørgrav wrote:
>> The attached patch causes ZFS to base the minimum transfer size for
>> a new vdev on the GEOM provider's stripesize (physical sector size)
>> rather than sectorsize (logical sector size), provided that
>> stripesize is a power of two larger than sectorsize and smaller
>> than or equal to VDEV_PAD_SIZE.  This should eliminate the need for
>> ivoras@'s gnop trick when creating ZFS pools on Advanced Format
>> drives.
>
> I think there are multiple versions of this (I also have one[1]) but
> the concern is that if one creates a pool with ashift=9, and now
> ashift=12, the pool gets unimportable.  So there need a way to disable
> this behavior.


I've tested my patch in all configurations I can think of, including exported
ashift=9 pools being imported, all with no issues.

For your example, e.g.:

# Create a 4K pool (min_create_ashift=4K, dev=512)
test:src> sysctl vfs.zfs.min_create_ashift
vfs.zfs.min_create_ashift: 12
test:src> mdconfig -a -t swap -s 128m -S 512 -u 0
test:src> zpool create mdpool md0
test:src> zdb mdpool | grep ashift
   ashift: 12
   ashift: 12

# Create a 512b pool (min_create_ashift=512, dev=512)
test:src> zpool destroy mdpool
test:src> sysctl vfs.zfs.min_create_ashift=9
vfs.zfs.min_create_ashift: 12 -> 9
test:src> zpool create mdpool md0
test:src> zdb mdpool | grep ashift
   ashift: 9
   ashift: 9

# Import a 512b pool (min_create_ashift=4K, dev=512)
test:src> zpool export mdpool
test:src> sysctl vfs.zfs.min_create_ashift=12
vfs.zfs.min_create_ashift: 9 -> 12
test:src> zpool import mdpool
test:src> zdb mdpool | grep ashift
   ashift: 9
   ashift: 9

# Create a 4K pool (min_create_ashift=512, dev=4K)
test:src> zpool destroy mdpool
test:src> mdconfig -d -u 0
test:src> mdconfig -a -t swap -s 128m -S 4096 -u 0
test:src> sysctl vfs.zfs.min_create_ashift=9
vfs.zfs.min_create_ashift: 12 -> 9
test:src> zpool create mdpool md0
test:src> zdb mdpool | grep ashift
   ashift: 12
   ashift: 12

# Import a 4K pool (min_create_ashift=4K, dev=4K)
test:src> zpool export mdpool
test:src> sysctl vfs.zfs.min_create_ashift=12
vfs.zfs.min_create_ashift: 9 -> 12
test:src> zpool import mdpool
test:src> zdb mdpool | grep ashift
   ashift: 12
   ashift: 12
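
The behaviour shown in the transcript above can be summarised as a small model in C. This is an inference from the five test cases only, not code from the attached patch; the function name and the convention that a stored ashift of 0 means "no existing label" are assumptions made for the example.

```c
#include <stdint.h>

/*
 * Hypothetical model of the transcript above.
 * stored_ashift == 0 stands for "no existing label", i.e. pool creation.
 */
static uint8_t
effective_ashift(uint8_t stored_ashift, uint8_t dev_ashift,
    uint8_t min_create_ashift)
{
	/* Import: keep whatever ashift is already recorded on disk. */
	if (stored_ashift != 0)
		return (stored_ashift);
	/* Create: take the larger of the device and the tunable. */
	return (dev_ashift > min_create_ashift ?
	    dev_ashift : min_create_ashift);
}
```

The point of the model is that the tunable only ever raises the ashift at creation time, so changing it later does not affect the ashift of a pool that is imported.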


> Another thing (not really related to the automatic detection) is that
> we need a way to manually override this setting from command line when
> creating the pool, this is under active discussion at Illumos mailing
> list right now.
>
> [1]
> https://github.com/trueos/trueos/commit/3d2e3a38faad8df4acf442b055c5e98ab873fb26

Yep, this has been on my list for a while, based on previous discussions on
zfs-devel@. I've not had any time recently, but I'm following the illumos
thread to see what conclusions they come to.

   Regards
   Steve




Re: Make ZFS use the physical sector size when computing initial ashift

2013-07-10 Thread Xin Li
On 07/10/13 10:38, Justin T. Gibbs wrote:
[snip]
> I'm sure lots of folks have some solution to this.  Here is an
> old version of what we use at Spectra:
>
> http://people.freebsd.org/~gibbs/zfs_patches/zfs_auto_ashift.diff
>
> The above patch is missing some cleanup that was motivated by my
> discussions with George Wilson about this change in April.  I'll
> dig that up later tonight.  Even if you don't read the full diff,
> please read the included checkin comment since it explains the
> motivation behind this particular solution.
>
> This is on my list of things to upstream in the next week or so
> after I add logic to the userspace tools to report whether or not
> the TLVs in a pool are using an optimal allocation size.  This is
> only possible if you actually make ZFS fully aware of logical,
> physical, and the configured allocation size.  All of the other
> patches I've seen just treat physical as logical.

Yes, me too.  Your version is superior.

Cheers,
-- 
Xin LI <delp...@delphij.net>    https://www.delphij.net/
FreeBSD - The Power to Serve!   Live free or die


possible changes from Panzura

2013-07-10 Thread Julian Elischer
I'm going through all the internal changes my current employer has made,
categorizing them into "proprietary" and "can feed back to FreeBSD".

I will probably send out emails like this several times, seeking feedback on
whether a particular patch is considered useful or not.
These are versus 8.0 at the moment (this is part of our effort to upgrade).

My first candidates are:

----- internal commit message -----
Add support for dumping kernel dumps in addition to text dumps for
kernel panics. Add a new version of savecore to the tree, which knows
how to retrieve and save both dumps. Control the new dump behavior via the
debug.kerneldump_requested sysctl - disabling this will go back to the
old text dump-only behavior.

----- part 2 -----
Have savecore be more optimistic about
saving compressed cores - always try, and only bail if we actually run
out of space. The pessimistic "only try saving if we've got enough free
space to handle the entire dump uncompressed" made it too easy for us to
run out of space on our /var/crash partition.
-----

Julian


Re: Make ZFS use the physical sector size when computing initial ashift

2013-07-10 Thread Steven Hartland
----- Original Message -----
From: Justin T. Gibbs

> I'm sure lots of folks have some solution to this.  Here is an
> old version of what we use at Spectra:
>
>  http://people.freebsd.org/~gibbs/zfs_patches/zfs_auto_ashift.diff
>
> The above patch is missing some cleanup that was motivated by my
> discussions with George Wilson about this change in April.  I'll
> dig that up later tonight.  Even if you don't read the full diff,
> please read the included checkin comment since it explains the
> motivation behind this particular solution.
>
> This is on my list of things to upstream in the next week or so after
> I add logic to the userspace tools to report whether or not the
> TLVs in a pool are using an optimal allocation size.  This is only
> possible if you actually make ZFS fully aware of logical, physical,
> and the configured allocation size.  All of the other patches I've seen
> just treat physical as logical.


Reading through your patch, it seems that your logical_ashift equates to
the current ashift value, which for GEOM devices is based on sectorsize,
and your physical_ashift is based on stripesize.

This is almost identical to the approach I used: adding a desired ashift,
which equates to your physical_ashift, alongside the standard ashift,
i.e. the required aka logical_ashift value :)

One issue I did spot in your patch is that you currently expose
zfs_max_auto_ashift as a sysctl but don't clamp its value, which would
cause problems should a user configure values > 13.

If you're interested in the reason for this, it's explained in the comments
in my version, which does a very similar thing with validation.
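
The kind of validation being asked for could look roughly like the following C sketch. It is not taken from either patch: the function name is invented, the lower bound of 9 is an assumption (512-byte sectors), and the upper bound of 13 is simply the limit mentioned in this message.

```c
/* Assumed bounds: 512-byte minimum, and the limit of 13 discussed here. */
#define ASHIFT_MIN 9
#define ASHIFT_MAX 13

/* Force a user-supplied tunable value into the supported range. */
static int
clamp_max_auto_ashift(int requested)
{
	if (requested < ASHIFT_MIN)
		return (ASHIFT_MIN);
	if (requested > ASHIFT_MAX)
		return (ASHIFT_MAX);
	return (requested);
}
```

A real sysctl handler might instead reject out-of-range values with an error rather than silently clamping them; which of the two is preferable is a design choice this sketch does not settle.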


   Regards
   Steve




Re: Make ZFS use the physical sector size when computing initial ashift

2013-07-10 Thread Justin T. Gibbs
On Jul 10, 2013, at 1:06 PM, Steven Hartland <kill...@multiplay.co.uk> wrote:

> ----- Original Message ----- From: Justin T. Gibbs
>> I'm sure lots of folks have some solution to this.  Here is an
>> old version of what we use at Spectra:
>> http://people.freebsd.org/~gibbs/zfs_patches/zfs_auto_ashift.diff
>> The above patch is missing some cleanup that was motivated by my
>> discussions with George Wilson about this change in April.  I'll
>> dig that up later tonight.  Even if you don't read the full diff,
>> please read the included checkin comment since it explains the
>> motivation behind this particular solution.
>>
>> This is on my list of things to upstream in the next week or so after
>> I add logic to the userspace tools to report whether or not the
>> TLVs in a pool are using an optimal allocation size.  This is only
>> possible if you actually make ZFS fully aware of logical, physical,
>> and the configured allocation size.  All of the other patches I've seen
>> just treat physical as logical.
>
> Reading through your patch it seems that your logical_ashift equates to
> the current ashift values which for geom devices is based off sectorsize
> and your physical_ashift is based stripesize.
>
> This is almost identical to the approach I used adding a desired ashift,
> which equates to your physical_ashift, along side the standard ashift
> i.e. required aka logical_ashift value :)

Yes, the approaches are similar.  Our current version records the logical
access size in the vdev structure too, which might relate to the issue
below.

> One issue I did spot in your patch is that you currently expose
> zfs_max_auto_ashift as a sysctl but don't clamp its value which would
> cause problems should a user configure values > 13.

I would expect the zio pipeline to simply insert an ashift aligned thunking
buffer for these operations, but I haven't tried going past an ashift of 13 in
my tests.  If it is an issue, it seems the restriction should be based on
logical access size, not optimal access size.
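
For illustration, the arithmetic an ashift-aligned thunking (bounce) buffer would need is sketched below in C. This is not how the zio pipeline is implemented; the struct and function are hypothetical and only show how an unaligned request maps onto an aligned range.

```c
#include <stdint.h>

/* Hypothetical description of an ashift-aligned bounce buffer range. */
struct aligned_io {
	uint64_t offset;	/* start rounded down to an ashift boundary */
	uint64_t size;		/* length rounded up to cover the request */
	uint64_t skip;		/* bytes between aligned and requested start */
};

/* Expand [offset, offset + size) to boundaries of (1 << ashift) bytes. */
static struct aligned_io
align_io(uint64_t offset, uint64_t size, unsigned ashift)
{
	uint64_t mask = ((uint64_t)1 << ashift) - 1;
	struct aligned_io a;

	a.offset = offset & ~mask;
	a.skip = offset - a.offset;
	a.size = (a.skip + size + mask) & ~mask;
	return (a);
}
```

With ashift 12, a 512-byte read at offset 512 becomes a 4096-byte read at offset 0, with the caller's data starting 512 bytes into the buffer.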

--
Justin


Re: Make ZFS use the physical sector size when computing initial ashift

2013-07-10 Thread Steven Hartland


----- Original Message -----
From: Justin T. Gibbs

> On Jul 10, 2013, at 1:06 PM, Steven Hartland wrote:
>> ----- Original Message ----- From: Justin T. Gibbs
>>> I'm sure lots of folks have some solution to this.  Here is an
>>> old version of what we use at Spectra:
>>> http://people.freebsd.org/~gibbs/zfs_patches/zfs_auto_ashift.diff
>>> The above patch is missing some cleanup that was motivated by my
>>> discussions with George Wilson about this change in April.  I'll
>>> dig that up later tonight.  Even if you don't read the full diff,
>>> please read the included checkin comment since it explains the
>>> motivation behind this particular solution.
>>>
>>> This is on my list of things to upstream in the next week or so after
>>> I add logic to the userspace tools to report whether or not the
>>> TLVs in a pool are using an optimal allocation size.  This is only
>>> possible if you actually make ZFS fully aware of logical, physical,
>>> and the configured allocation size.  All of the other patches I've seen
>>> just treat physical as logical.
>>
>> Reading through your patch it seems that your logical_ashift equates to
>> the current ashift values which for geom devices is based off sectorsize
>> and your physical_ashift is based stripesize.
>>
>> This is almost identical to the approach I used adding a desired ashift,
>> which equates to your physical_ashift, along side the standard ashift
>> i.e. required aka logical_ashift value :)
>
> Yes, the approaches are similar.  Our current version records the logical
> access size in the vdev structure too, which might relate to the issue
> below.
>
>> One issue I did spot in your patch is that you currently expose
>> zfs_max_auto_ashift as a sysctl but don't clamp its value which would
>> cause problems should a user configure values > 13.
>
> I would expect the zio pipeline to simply insert an ashift aligned thunking
> buffer for these operations, but I haven't tried going past an ashift of 13 in
> my tests.  If it is an issue, it seems the restriction should be based on
> logical access size, not optimal access size.


Yes, with your methodology you'll only see the issue if zfs_max_auto_ashift
and physical_ashift are both > 13, but this can be the case, for example,
on a RAID controller with a large stripe size.

Looking back at my old patch, it too suffers from the same issue, along with
the current code base, but that would only happen if the logical sector size
resulted in an ashift > 13, which is going to be much less common ;-)

   Regards
   Steve




Re: Make ZFS use the physical sector size when computing initial ashift

2013-07-10 Thread Justin T. Gibbs
On Jul 10, 2013, at 1:42 PM, Steven Hartland <kill...@multiplay.co.uk> wrote:

> ----- Original Message ----- From: Justin T. Gibbs
>> On Jul 10, 2013, at 1:06 PM, Steven Hartland wrote:
>>> ----- Original Message ----- From: Justin T. Gibbs
>>>> I'm sure lots of folks have some solution to this.  Here is an
>>>> old version of what we use at Spectra:
>>>> http://people.freebsd.org/~gibbs/zfs_patches/zfs_auto_ashift.diff
>>>> The above patch is missing some cleanup that was motivated by my
>>>> discussions with George Wilson about this change in April.  I'll
>>>> dig that up later tonight.  Even if you don't read the full diff,
>>>> please read the included checkin comment since it explains the
>>>> motivation behind this particular solution.
>>>> This is on my list of things to upstream in the next week or so after
>>>> I add logic to the userspace tools to report whether or not the
>>>> TLVs in a pool are using an optimal allocation size.  This is only
>>>> possible if you actually make ZFS fully aware of logical, physical,
>>>> and the configured allocation size.  All of the other patches I've seen
>>>> just treat physical as logical.
>>> Reading through your patch it seems that your logical_ashift equates to
>>> the current ashift values which for geom devices is based off sectorsize
>>> and your physical_ashift is based stripesize.
>>> This is almost identical to the approach I used adding a desired ashift,
>>> which equates to your physical_ashift, along side the standard ashift
>>> i.e. required aka logical_ashift value :)
>>
>> Yes, the approaches are similar.  Our current version records the logical
>> access size in the vdev structure too, which might relate to the issue
>> below.
>>
>>> One issue I did spot in your patch is that you currently expose
>>> zfs_max_auto_ashift as a sysctl but don't clamp its value which would
>>> cause problems should a user configure values > 13.
>>
>> I would expect the zio pipeline to simply insert an ashift aligned thunking
>> buffer for these operations, but I haven't tried going past an ashift of 13
>> in my tests.  If it is an issue, it seems the restriction should be based on
>> logical access size, not optimal access size.
>
> Yes with your methodology you'll only see the issue if zfs_max_auto_ashift
> and physical_ashift are both > 13, but this can be the case for example
> on a RAID controller with large stripsize.

I'm not sure I follow.  logical_ashift is available in our latest code, as is
the physical_ashift.  But even without the logical_ashift, why doesn't the zio
pipeline properly thunk zio_phys_read() access based on the configured ashift?

--
Justin



Kernel dumps [was Re: possible changes from Panzura]

2013-07-10 Thread Jordan Hubbard

On Jul 10, 2013, at 11:16 AM, Julian Elischer jul...@elischer.org wrote:

> My first candidates are:

Those sound useful.   Just out of curiosity, however, since we're on the topic 
of kernel dumps:  Has anyone even looked into the notion of an emergency 
fall-back network stack to enable remote kernel panic (or system hang) 
debugging, the way OS X lets you do?  I can't tell you the number of times I've 
NMI'd a Mac and connected to it remotely in a scenario where everything was 
totally wedged and just a couple of minutes in kgdb (or now lldb) quickly 
showed that everything was waiting on a specific lock and the problem became 
manifestly clear.

The feature also lets you scrape a panic'd machine with automation, running 
some kgdb scripts against it to glean useful information for later analysis vs 
having to have someone schlep the dump image manually to triage.  It's going to 
be damn hard to live without this now, and if someone else isn't working on it, 
that's good to know too!

- Jordan



Re: possible changes from Panzura

2013-07-10 Thread Eric van Gyzen
On 07/10/2013 13:16, Julian Elischer wrote:
> I'm going through all the internal changes my current employer has made,
> categorizing them
> into "proprietary" and "can feed back to FreeBSD".
>
> I will probably send out emails like this several times seeking feedback on
> whether a particular patch is considered useful or not..
> these are verse 8.0 at the moment.  (this is part of our effort to upgrade)
>
> My first candidates are:
>
> ----- internal commit message -----
> Add support for dumping kernel dumps in addition to text dumps for
> kernel panics. Add a new version of savecore to the tree, which knows
> how to retrieve and save both dumps. Control the new dump behavior via the
> debug.kerneldump_requested sysctl - disabling this wil go back to the
> old text dump-only behavior.

I wonder which would be more useful:  this, or just dumping the full
dump and using crashinfo to create a text summary after reboot.  Of
course, crashinfo could be enhanced to show anything it's currently
missing (relative to the text dump).  This would have the advantage of
doing less stuff at dump time.  Yours would have the advantage that it
exists and works.  :)  Thoughts?

> -- part 2 --
> Have savecore be more optimistic about
> saving compressed cores - always try, and only bail if we actually run
> out of space. The pessimistic "only try saving if we've got enough free
> space to handle the entire dump uncompressed" made it too easy for us to
> run out of space on our /var/crash partition.

Yes, please.  I've run into this occasionally, but it never annoyed me
enough to fix it.  Procrastination pays off yet again.  ;)
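
To make the "part 2" behavior concrete, here is a minimal sketch of an
optimistic save loop.  It is not the Panzura patch or the real savecore
code: save_dump_optimistic(), chunk_fn and the in-memory mem_src source are
hypothetical names invented for illustration.  The only point it shows is
that there is no up-front free-space check, and the loop gives up only when
write(2) actually fails (for example with ENOSPC).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/*
 * Write len bytes to fd, retrying on short writes and EINTR.
 * Returns 0 on success or the errno value (e.g. ENOSPC) on failure.
 */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return errno;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical chunk source: returns bytes produced, 0 at end, -1 on error. */
typedef long (*chunk_fn)(void *ctx, void *buf, size_t cap);

/* Hypothetical in-memory source standing in for the compressed dump. */
struct mem_src {
    const char *data;
    size_t left;
};

static long mem_next(void *ctx, void *buf, size_t cap)
{
    struct mem_src *s = ctx;
    size_t n = s->left < cap ? s->left : cap;

    memcpy(buf, s->data, n);
    s->data += n;
    s->left -= n;
    return (long)n;
}

/* Always try to save; bail only when a write really fails. */
static int save_dump_optimistic(int out_fd, chunk_fn next_chunk, void *ctx)
{
    char buf[4096];
    long n;

    assert(next_chunk != NULL);
    while ((n = next_chunk(ctx, buf, sizeof(buf))) > 0) {
        int err = write_all(out_fd, buf, (size_t)n);
        if (err != 0)
            return err;
    }
    return n < 0 ? EIO : 0;
}
```

A caller would presumably remove the partial file when a nonzero error comes
back; that cleanup is left out here.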

Eric


Re: Make ZFS use the physical sector size when computing initial ashift

2013-07-10 Thread Steven Hartland
- Original Message - 
From: Justin T. Gibbs

...

> One issue I did spot in your patch is that you currently expose
> zfs_max_auto_ashift as a sysctl but don't clamp its value, which would
> cause problems should a user configure values > 13.

I would expect the zio pipeline to simply insert an ashift aligned thunking
buffer for these operations, but I haven't tried going past an ashift of 13 in
my tests.  If it is an issue, it seems the restriction should be based on
logical access size, not optimal access size.


> Yes, with your methodology you'll only see the issue if zfs_max_auto_ashift
> and physical_ashift are both > 13, but this can be the case, for example,
> on a RAID controller with a large stripe size.


I'm not sure I follow.  logical_ashift is available in our latest code, as is
the physical_ashift.  But even without the logical_ashift, why doesn't the zio
pipeline properly thunk zio_phys_read() access based on the configured ashift?


When I looked at it, which was a long time ago now so please excuse me if
I'm a little rusty on the details, zio_phys_read() was working more by luck
than judgement, as the offsets passed in were calculated from a valid start +
increment based on the size of a structure within vdev_label_offset() with no
ashift logic applied that I could find.

The result was that pools created with large ashifts were unstable when I
tested.
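
For illustration only, the clamping and alignment concerns discussed in this
thread might be sketched as below.  This is not the code from either patch:
clamp_ashift(), choose_ashift() and offset_is_aligned() are hypothetical
helpers, and the bounds 9 and 13 are assumptions taken from a 512-byte
SPA_MINBLOCKSIZE and the "values > 13" limit mentioned above.

```c
#include <assert.h>
#include <stdint.h>

#define MIN_ASHIFT 9    /* assumed: log2 of a 512-byte minimum block */
#define MAX_ASHIFT 13   /* assumed: the upper limit discussed in the thread */

/* 1-based index of the highest set bit, 0 for x == 0 (like highbit()). */
static int highbit_u64(uint64_t x)
{
    int bit = 0;

    while (x != 0) {
        bit++;
        x >>= 1;
    }
    return bit;
}

/* Hypothetical: force a user-supplied tunable into the supported range. */
static int clamp_ashift(int requested)
{
    if (requested < MIN_ASHIFT)
        return MIN_ASHIFT;
    if (requested > MAX_ASHIFT)
        return MAX_ASHIFT;
    return requested;
}

/*
 * Hypothetical: prefer the physical sector size, but never exceed the
 * clamped tunable and never drop below the logical sector size.
 */
static int choose_ashift(uint64_t logical, uint64_t physical, int max_auto)
{
    int lo = highbit_u64(logical) - 1;
    int phys = highbit_u64(physical) - 1;
    int limit = clamp_ashift(max_auto);
    int ashift = phys > lo ? phys : lo;

    assert(logical != 0 && physical != 0);
    if (ashift > limit)
        ashift = limit;
    if (ashift < lo)
        ashift = lo;
    if (ashift < MIN_ASHIFT)
        ashift = MIN_ASHIFT;
    return ashift;
}

/* Nonzero when a byte offset is a multiple of (1 << ashift). */
static int offset_is_aligned(uint64_t offset, int ashift)
{
    return (offset & ((UINT64_C(1) << ashift) - 1)) == 0;
}
```

With these assumptions, a 512-byte logical / 64 KiB stripe device is held at
ashift 13 even if the tunable is set to 20, and a check like
offset_is_aligned() is the sort of test one would want on offsets handed to
zio_phys_read().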

   Regards
   Steve




Re: Kernel dumps [was Re: possible changes from Panzura]

2013-07-10 Thread Jordan Hubbard

On Jul 10, 2013, at 1:04 PM, asom...@gmail.com wrote:

> I don't doubt that it would be useful to have an emergency network
> stack.  But have you ever looked into debugging over firewire?

Absolutely.  In fact, before the advent of remote network debugging, FW was 
totally the debugging method of choice since firewire target DMA lets you do 
all kinds of useful things (as well as a few things that simply scare the 
security guys to death ;-) ).

My point was more that actually being able to debug a machine over the network 
is such a step up in terms of convenience/awesomeness that if anyone is 
thinking of putting any time and attention into this area at all, that's 
definitely the target to go for.

Looking at http://www.opensource.apple.com/tarballs/xnu/xnu-2050.22.13.tar.gz 
there's even reasonable documentation on the kernel debugging protocol in 
xnu/osfmk/kdp.  Folks could do worse than try to clone it.  The gdb debugger 
macros in support of it are also in xnu/kgmacros.  None of it is going to be 
'drop in' for FreeBSD by any stretch of the imagination, but it's always easier 
to get to a destination when you have a map. :-)  Anyone with a Mac can also
run nvram boot-args="debug=0x144" and test-drive it around, just to see how it
works in actual practice.  See also:
https://developer.apple.com/library/mac/#documentation/Darwin/Conceptual/KEXTConcept/KEXTConceptDebugger/debug_tutorial.html

- Jordan




Re: Kernel dumps [was Re: possible changes from Panzura]

2013-07-10 Thread asomers
On Wed, Jul 10, 2013 at 12:57 PM, Jordan Hubbard j...@mail.turbofuzz.com wrote:

> On Jul 10, 2013, at 11:16 AM, Julian Elischer jul...@elischer.org wrote:
>
> > My first candidates are:
>
> Those sound useful.   Just out of curiosity, however, since we're on the
> topic of kernel dumps:  Has anyone even looked into the notion of an
> emergency fall-back network stack to enable remote kernel panic (or system
> hang) debugging, the way OS X lets you do?  I can't tell you the number of
> times I've NMI'd a Mac and connected to it remotely in a scenario where
> everything was totally wedged and just a couple of minutes in kgdb (or now
> lldb) quickly showed that everything was waiting on a specific lock and the
> problem became manifestly clear.
>
> The feature also lets you scrape a panic'd machine with automation, running
> some kgdb scripts against it to glean useful information for later analysis
> vs having to have someone schlep the dump image manually to triage.  It's
> going to be damn hard to live without this now, and if someone else isn't
> working on it, that's good to know too!

I don't doubt that it would be useful to have an emergency network
stack.  But have you ever looked into debugging over firewire?  We've
had success with it.  All of our development machines are connected to
a single firewire bus.  When one panics, we can remotely debug it with
both kdb and ddb.  It's not Ethernet, but it's still much faster than
a serial port.
https://wiki.freebsd.org/DebugWithDcons


> - Jordan



Re: Kernel dumps [was Re: possible changes from Panzura]

2013-07-10 Thread Kevin Day

 
 
> Those sound useful.   Just out of curiosity, however, since we're on the
> topic of kernel dumps:  Has anyone even looked into the notion of an
> emergency fall-back network stack to enable remote kernel panic (or system
> hang) debugging, the way OS X lets you do?  I can't tell you the number of
> times I've NMI'd a Mac and connected to it remotely in a scenario where
> everything was totally wedged and just a couple of minutes in kgdb (or now
> lldb) quickly showed that everything was waiting on a specific lock and the
> problem became manifestly clear.
>
> The feature also lets you scrape a panic'd machine with automation, running
> some kgdb scripts against it to glean useful information for later analysis
> vs having to have someone schlep the dump image manually to triage.  It's
> going to be damn hard to live without this now, and if someone else isn't
> working on it, that's good to know too!


At a previous employer, we had a system where on a panic it had a totally 
separate stack capable of just IP/UDP/TFTP and would save its core via TFTP to 
a server. This isn’t as nice as full remote debugging, but it was a whole lot 
easier to develop. The caveats I remember were:

1) We didn’t want to implement ARP, so you had to write the MAC address of the
“dump server” to the kernel via sysctl before crashing.
2) We also didn’t want to have to deal with routing tables, so you had to 
manually specify what interface to blast packets out to, also via sysctl.
3) After a panic we didn’t want to rely on interrupt processing working, so it 
polled the network interface and blocked whenever it needed to. Since this was 
an embedded system, it wasn’t too big of a deal - only one network driver had 
to be hacked to support this. Basically a flag that would switch to “disable 
normal processing, switch to polled fifos for input and output” until reboot.
4) The whole system used only preallocated buffers and its own stack (carved 
out from memory on boot) so even if the kernel’s malloc was trashed, we could 
still dump.
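
A rough sketch of caveats 1 and 4 above, for illustration only.  This is not
the code from that system: dump_cfg, dump_frame and dump_build_frame() are
hypothetical names.  It assumes the destination MAC was stored ahead of time
(so no ARP is needed) and that the frame is assembled in static storage (so
no allocator is touched at panic time).  The IP/UDP headers that would
normally sit inside the payload are left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HDR_LEN       14
#define DUMP_PAYLOAD_MAX  1024
#define ETHERTYPE_IPV4    0x0800u

/* Filled in before any crash, e.g. from sysctl handlers (assumption). */
struct dump_cfg {
    uint8_t dst_mac[6];   /* MAC address of the dump server */
    uint8_t src_mac[6];   /* MAC address of the chosen interface */
};

/* Static storage: usable even if the kernel allocator is trashed. */
static uint8_t dump_frame[ETH_HDR_LEN + DUMP_PAYLOAD_MAX];

/*
 * Build one Ethernet frame in the static buffer.
 * Returns the frame length, or 0 if the payload does not fit.
 */
static size_t dump_build_frame(const struct dump_cfg *cfg,
                               const void *payload, size_t len)
{
    assert(cfg != NULL && payload != NULL);
    if (len > DUMP_PAYLOAD_MAX)
        return 0;

    memcpy(dump_frame, cfg->dst_mac, 6);
    memcpy(dump_frame + 6, cfg->src_mac, 6);
    dump_frame[12] = (uint8_t)(ETHERTYPE_IPV4 >> 8);
    dump_frame[13] = (uint8_t)(ETHERTYPE_IPV4 & 0xff);
    memcpy(dump_frame + ETH_HDR_LEN, payload, len);
    return ETH_HDR_LEN + len;
}
```

The resulting buffer would then be handed to the driver's polled transmit
path described in caveat 3, which is driver-specific and not shown.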

I’m not sure this really would scratch your itch, but I believe this took me no 
more than a day or two to implement. Parts #1 and #2 would be pretty easy, but 
I’m not sure how generically the kernel could support an emergency network mode
that doesn’t require interrupts for every network card out there. Maybe that 
isn’t as important to you as it was to us.

The whole exercise is much easier if you don’t use TFTP but a custom protocol
that doesn’t require the crashing system to receive any packets. If it can just
blast away at some random host, oblivious to whether it’s working or not, it’s
a lot less code to write.
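
One way such a send-only protocol could look, purely as a sketch: each
datagram carries a small header with a sequence number and the byte offset
of its payload within the dump, so the receiver can place chunks and notice
gaps without ever replying.  The header layout, the DUMP_MAGIC tag and the
function names are all invented here; nothing in this thread specifies a
format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DUMP_MAGIC   0x44554d50u  /* ASCII "DUMP"; an arbitrary tag */
#define DUMP_HDR_LEN 16           /* magic + seq + 64-bit offset */

struct dump_hdr {
    uint32_t seq;      /* increments per datagram, lets receiver spot loss */
    uint64_t offset;   /* where this chunk belongs in the dump image */
};

/* Store a 32-bit value in network (big-endian) byte order. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Load a 32-bit value stored in network byte order. */
static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Sender side: fill in the fixed-size header for one chunk. */
static void dump_hdr_encode(uint8_t out[DUMP_HDR_LEN],
                            uint32_t seq, uint64_t offset)
{
    assert(out != NULL);
    put_be32(out, DUMP_MAGIC);
    put_be32(out + 4, seq);
    put_be32(out + 8, (uint32_t)(offset >> 32));
    put_be32(out + 12, (uint32_t)offset);
}

/* Receiver side: returns 0 on success, -1 if the magic does not match. */
static int dump_hdr_decode(const uint8_t in[DUMP_HDR_LEN], struct dump_hdr *h)
{
    assert(in != NULL && h != NULL);
    if (get_be32(in) != DUMP_MAGIC)
        return -1;
    h->seq = get_be32(in + 4);
    h->offset = ((uint64_t)get_be32(in + 8) << 32) | get_be32(in + 12);
    return 0;
}
```

Because every chunk names its own offset, the receiver can write chunks
straight into a file with pwrite(2) in whatever order they arrive; lost
chunks simply leave holes that show up as gaps in the sequence numbers.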




Re: Kernel dumps [was Re: possible changes from Panzura]

2013-07-10 Thread Will Andrews
On Wed, Jul 10, 2013 at 3:50 PM, Jordan Hubbard j...@mail.turbofuzz.com wrote:
> Absolutely.  In fact, before the advent of remote network debugging, FW was
> totally the debugging method of choice since firewire target DMA lets you do
> all kinds of useful things (as well as a few things that simply scare the
> security guys to death ;-) ).
>
> My point was more that actually being able to debug a machine over the
> network is such a step up in terms of convenience/awesomeness that if anyone
> is thinking of putting any time and attention into this area at all, that's
> definitely the target to go for.
>
> Looking at http://www.opensource.apple.com/tarballs/xnu/xnu-2050.22.13.tar.gz
> there's even reasonable documentation on the kernel debugging protocol in
> xnu/osfmk/kdp.  Folks could do worse than try to clone it.  The gdb debugger
> macros in support of it are also in xnu/kgmacros.  None of it is going to be
> 'drop in' for FreeBSD by any stretch of the imagination, but it's always
> easier to get to a destination when you have a map. :-)  Anyone with a Mac
> can also run nvram boot-args="debug=0x144" and test-drive it around, just to
> see how it works in actual practice.  See also:
> https://developer.apple.com/library/mac/#documentation/Darwin/Conceptual/KEXTConcept/KEXTConceptDebugger/debug_tutorial.html


Speaking of Apple solutions, I've recently used Apple's kgdb with the
kernel debug kit & kdp remote debugging, to debug a panic'd OS X host.
It's really quite nice, because the debug kit comes with a ton of
macros, similar to kdb, and you also get the benefit of source
debugging.  I think FreeBSD would benefit massively from finding some
way to share macros between kdb and kgdb, in addition to having an
emergency network stack like you suggest.

As Alan says, until then, there's firewire, and also gdbsx if your
FreeBSD system is running as a Xen guest.

--Will.


Re: Kernel dumps [was Re: possible changes from Panzura]

2013-07-10 Thread Bakul Shah
On Wed, 10 Jul 2013 14:50:19 PDT Jordan Hubbard j...@mail.turbofuzz.com wrote:
 
> On Jul 10, 2013, at 1:04 PM, asom...@gmail.com wrote:
>
> > I don't doubt that it would be useful to have an emergency network
> > stack.  But have you ever looked into debugging over firewire?
>
> My point was more that actually being able to debug a machine over the
> network is such a step up in terms of convenience/awesomeness that if anyone
> is thinking of putting any time and attention into this area at all, that's
> definitely the target to go for.

You have to use this just once to see how convenient it is!

At a previous company, James Da Silva did this in 1997 by
adding a network console (IIRC in a day or two).  A new
ethernet type was used + a host-specific ethernet multicast
address so you could connect from any machine on the same
ethernet segment.  Either as a remote console for the usual
console IO & ddb, or to run remote gdb.  Quite insecure but
that didn't matter as this was used in a test network.  There
was no emergency network stack; just a polling function added
to an ethernet driver since this had to work even when the
kernel was on the operating table under anaesthetic! No new
gdb hacks were necessary since the invoking program set things
up for it.
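
A sketch of the addressing idea, with every specific invented for
illustration (the original code is not available here): the ethertype value
0x88B5 is one of the IEEE "local experimental" ethertypes and stands in for
whatever type was actually used, and deriving the multicast address from a
32-bit host id is an assumption.  The filter is what a polling receive
function might apply to each frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DBG_ETHERTYPE 0x88B5u   /* IEEE local experimental ethertype (stand-in) */
#define ETH_HDR_LEN   14

/*
 * Hypothetical: derive a per-host multicast address from a 32-bit host id.
 * First octet 0x03 sets the group (multicast) bit and the
 * locally-administered bit.
 */
static void dbg_mcast_addr(uint32_t host_id, uint8_t mac[6])
{
    assert(mac != NULL);
    mac[0] = 0x03;
    mac[1] = 0x00;
    mac[2] = (uint8_t)(host_id >> 24);
    mac[3] = (uint8_t)(host_id >> 16);
    mac[4] = (uint8_t)(host_id >> 8);
    mac[5] = (uint8_t)host_id;
}

/* Returns 1 if the frame is addressed to this host's debug console. */
static int dbg_frame_is_ours(const uint8_t *frame, size_t len,
                             const uint8_t mac[6])
{
    if (len < ETH_HDR_LEN)
        return 0;
    if (memcmp(frame, mac, 6) != 0)
        return 0;
    return frame[12] == (DBG_ETHERTYPE >> 8) &&
           frame[13] == (DBG_ETHERTYPE & 0xff);
}
```

Since the address is multicast, any machine on the segment can send to it
without the target having to answer ARP, which fits the "no emergency
network stack" approach.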

If I was doing this today, I'd probably still do the same and
make sure that the interface used for remote debugging is on
an isolated network.


Re: Kernel dumps [was Re: possible changes from Panzura]

2013-07-10 Thread Vincent Hoffman
On 10/07/2013 23:09, Kevin Day wrote:

> > Those sound useful.   Just out of curiosity, however, since we're on the
> > topic of kernel dumps:  Has anyone even looked into the notion of an
> > emergency fall-back network stack to enable remote kernel panic (or system
> > hang) debugging, the way OS X lets you do?  I can't tell you the number of
> > times I've NMI'd a Mac and connected to it remotely in a scenario where
> > everything was totally wedged and just a couple of minutes in kgdb (or now
> > lldb) quickly showed that everything was waiting on a specific lock and the
> > problem became manifestly clear.
> >
> > The feature also lets you scrape a panic'd machine with automation, running
> > some kgdb scripts against it to glean useful information for later analysis
> > vs having to have someone schlep the dump image manually to triage.  It's
> > going to be damn hard to live without this now, and if someone else isn't
> > working on it, that's good to know too!
>
> At a previous employer, we had a system where on a panic it had a totally
> separate stack capable of just IP/UDP/TFTP and would save its core via TFTP
> to a server. This isn’t as nice as full remote debugging, but it was a whole
> lot easier to develop. The caveats I remember were:
>
> 1) We didn’t want to implement ARP, so you had to write the MAC address of
> the “dump server” to the kernel via sysctl before crashing.
> 2) We also didn’t want to have to deal with routing tables, so you had to
> manually specify what interface to blast packets out to, also via sysctl.
> 3) After a panic we didn’t want to rely on interrupt processing working, so
> it polled the network interface and blocked whenever it needed to. Since this
> was an embedded system, it wasn’t too big of a deal - only one network driver
> had to be hacked to support this. Basically a flag that would switch to
> “disable normal processing, switch to polled fifos for input and output”
> until reboot.
> 4) The whole system used only preallocated buffers and its own stack (carved
> out from memory on boot) so even if the kernel’s malloc was trashed, we could
> still dump.
>
> I’m not sure this really would scratch your itch, but I believe this took me
> no more than a day or two to implement. Parts #1 and #2 would be pretty easy,
> but I’m not sure how generically the kernel could support an emergency network
> mode that doesn’t require interrupts for every network card out there. Maybe
> that isn’t as important to you as it was to us.
>
> The whole exercise is much easier if you don’t use TFTP but a custom protocol
> that doesn’t require the crashing system to receive any packets. If it can
> just blast away at some random host, oblivious to whether it’s working or
> not, it’s a lot less code to write.

There was some work on something similar at one point, not sure what
came of it.
http://lists.freebsd.org/pipermail/freebsd-current/2010-September/020164.html

Vince
