Re: [zfs-discuss] zpool throughput: snv 134 vs 138 vs 143

2010-07-22 Thread Chad Cantwell
Hi Garrett,

Since my problem did turn out to be a debug kernel in my own compilations,
I booted back into the Nexenta 3 RC2 CD and let a scrub run for about
half an hour to see whether I just hadn't waited long enough the first
time around.  It never made it past 159 MB/s.  I finally rebooted into my
145 non-debug kernel, and within a few seconds of reimporting the pool
the scrub was up to ~400 MB/s, so it does indeed seem that the Nexenta
CD kernel is either in debug mode or something else is slowing it down.
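
For what it's worth, the way I'm watching the scrub rate is nothing fancier
than polling zpool status and the pool-wide bandwidth ('tank' below is just
a stand-in for the real pool name):

$ zpool status tank      # scrub progress and average rate
$ zpool iostat tank 10   # pool-wide bandwidth every 10 seconds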

Chad

On Wed, Jul 21, 2010 at 09:12:35AM -0700, Garrett D'Amore wrote:
 On Wed, 2010-07-21 at 02:21 -0400, Richard Lowe wrote:
  I built in the normal fashion, with the CBE compilers
  (cc: Sun C 5.9 SunOS_i386 Patch 124868-10 2009/04/30), and 12u1 lint.
  
  I'm not subscribed to zfs-discuss, but have you established whether the
  problematic build is DEBUG? (the bits I uploaded were non-DEBUG).
 
 That would make a *huge* difference.  DEBUG bits have zero optimization,
 and also have a great number of sanity tests included that are absent
 from the non-DEBUG bits.  If these are expensive checks on a hot code
 path, it can have a very nasty impact on performance.
 
 Now that said, I *hope* the bits that Nexenta delivered were *not*
 DEBUG.  But I've seen at least one bug that makes me think we might be
 delivering DEBUG binaries.  I'll check into it.
 
   -- Garrett
 
  


Re: [zfs-discuss] zpool throughput: snv 134 vs 138 vs 143

2010-07-21 Thread Richard Lowe

I built in the normal fashion, with the CBE compilers
(cc: Sun C 5.9 SunOS_i386 Patch 124868-10 2009/04/30), and 12u1 lint.

I'm not subscribed to zfs-discuss, but have you established whether the
problematic build is DEBUG? (the bits I uploaded were non-DEBUG).
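
(If it helps with the comparison: that version string is just what the
compiler prints, so on the build machine -- assuming Studio lives in the
usual /opt/SUNWspro -- the following shows what you're actually building
with:)

$ /opt/SUNWspro/bin/cc -V
cc: Sun C 5.9 SunOS_i386 Patch 124868-10 2009/04/30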

-- Rich



Re: [zfs-discuss] zpool throughput: snv 134 vs 138 vs 143

2010-07-21 Thread Garrett D'Amore
On Wed, 2010-07-21 at 02:21 -0400, Richard Lowe wrote:
 I built in the normal fashion, with the CBE compilers
 (cc: Sun C 5.9 SunOS_i386 Patch 124868-10 2009/04/30), and 12u1 lint.
 
 I'm not subscribed to zfs-discuss, but have you established whether the
 problematic build is DEBUG? (the bits I uploaded were non-DEBUG).

That would make a *huge* difference.  DEBUG bits have zero optimization,
and also have a great number of sanity tests included that are absent
from the non-DEBUG bits.  If these are expensive checks on a hot code
path, it can have a very nasty impact on performance.

Now that said, I *hope* the bits that Nexenta delivered were *not*
DEBUG.  But I've seen at least one bug that makes me think we might be
delivering DEBUG binaries.  I'll check into it.
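
For anyone who wants to check a running system: if I remember right, a
DEBUG kernel turns on the kmem debugging flags by default, so something
like this should tell the two apart (0 on non-DEBUG, 0xf or so on DEBUG).
Treat it as a sketch rather than an official test:

# echo 'kmem_flags/X' | mdb -k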

-- Garrett

 


Re: [zfs-discuss] zpool throughput: snv 134 vs 138 vs 143

2010-07-21 Thread Chad Cantwell
Hi,

My bits were originally debug because I didn't know any better.  I thought
I had then recompiled without debug to test again, but I didn't realize
until just now that the packages end up in a different directory (nightly
vs nightly-nd), so I believe that after compiling non-debug I just
reinstalled the debug bits.  I'm about to test again with an actual
non-debug 142, and after that a non-debug 145, which just came out.
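
In case anyone else trips over the same thing: as far as I can tell, the
DEBUG repo lands in packages/i386/nightly and the non-DEBUG one in
packages/i386/nightly-nd, so the update has to point at the -nd directory
explicitly.  Something like this, assuming onu behaves the way I think it
does (the BE name is just whatever you want to call the new boot
environment):

$ ls $CODEMGR_WS/packages/i386
nightly  nightly-nd
$ onu -t onnv-142-nd -d $CODEMGR_WS/packages/i386/nightly-nd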

Thanks,
Chad

On Wed, Jul 21, 2010 at 02:21:51AM -0400, Richard Lowe wrote:
 
 I built in the normal fashion, with the CBE compilers
 (cc: Sun C 5.9 SunOS_i386 Patch 124868-10 2009/04/30), and 12u1 lint.
 
 I'm not subscribed to zfs-discuss, but have you established whether the
 problematic build is DEBUG? (the bits I uploaded were non-DEBUG).
 
 -- Rich
 


Re: [zfs-discuss] zpool throughput: snv 134 vs 138 vs 143

2010-07-21 Thread Chad Cantwell
It does seem to be faster now that I've really installed the non-debug
bits.  I let it resume a scrub after reboot, and while it's not as fast as
usual (280-300 MB/s vs. 500+), I assume it's just currently checking a part
of the filesystem with smaller files, which reduces the speed, since it's
well past the prior limitation.  I tested 142 non-debug briefly, until the
scrub reached at least 250 MB/s, and then booted into 145 non-debug, where
I'm letting the scrub finish now.  I'll test the Nexenta disc again to be
sure it really was slow, since I don't recall exactly how much time I gave
the scrub to reach its normal speed in my prior tests, although I can't do
that until this evening when I'm home again.

Chad



Re: [zfs-discuss] zpool throughput: snv 134 vs 138 vs 143

2010-07-20 Thread Chad Cantwell
On Mon, Jul 19, 2010 at 07:01:54PM -0700, Chad Cantwell wrote:
 I suppose the easiest way for me to confirm if there is a regression or if my
 compiling is flawed is to just try compiling snv_142 using the same procedure
 and see if it works as well as Rich Lowe's copy or if it's slow like my other
 compilations.
 
 Chad
 

I've just compiled and booted into snv_142, and I experienced the same slow
dd and scrubbing as I did with my 143 and 144 compilations and with the
Nexenta 3 RC2 CD.  So, this would seem to indicate a build
environment/process flaw rather than a regression.

Chad


Re: [zfs-discuss] zpool throughput: snv 134 vs 138 vs 143

2010-07-20 Thread Robert Milkowski

On 20/07/2010 07:59, Chad Cantwell wrote:


I've just compiled and booted into snv_142, and I experienced the same slow
dd and scrubbing as I did with my 143 and 144 compilations and with the
Nexenta 3 RC2 CD.  So, this would seem to indicate a build
environment/process flaw rather than a regression.

   


Are you sure it is not a debug vs. non-debug issue?


--
Robert Milkowski
http://milek.blogspot.com



Re: [zfs-discuss] zpool throughput: snv 134 vs 138 vs 143

2010-07-20 Thread Roy Sigurd Karlsbakk
 I'm surprised you're even getting 400MB/s on the fast
 configurations, with only 16 drives in a Raidz3 configuration.
 To me, 16 drives in Raidz3 (single Vdev) would do about 150MB/sec, as
 your slow speeds suggest.

That'll be for random I/O.  His I/O here is sequential, so the I/O is
spread across the drives.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented
intelligibly.  It is an elementary imperative for all pedagogues to avoid
excessive use of idioms of foreign origin.  In most cases, adequate and
relevant synonyms exist in Norwegian.


Re: [zfs-discuss] zpool throughput: snv 134 vs 138 vs 143

2010-07-20 Thread Chad Cantwell
Yes, I think this might have been it.  I missed the NIGHTLY_OPTIONS variable
in opensolaris.sh and I think it was compiling a debug build.  I'm not sure
what the ramifications of this are or how much slower a debug build should
be, but I'm recompiling a release build now, so hopefully all will be well.
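
In case it saves anyone else the same detour: as far as I can tell from the
nightly documentation, the letters that matter here are D (build DEBUG
bits) and F (skip the non-DEBUG build), and the stock opensolaris.sh has
both, something like the line below (exact option string from memory, so
check your own copy).  Dropping the F is what actually gets you non-debug
bits, which land in packages/i386/nightly-nd:

NIGHTLY_OPTIONS="-FnCDAlmprt"; export NIGHTLY_OPTIONS   # D = DEBUG, F = no non-DEBUG build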

Thanks,
Chad

On Tue, Jul 20, 2010 at 08:39:42AM +0100, Robert Milkowski wrote:
 
 Are you sure it is not a debug vs. non-debug issue?
 
 
 -- 
 Robert Milkowski
 http://milek.blogspot.com
 


Re: [zfs-discuss] zpool throughput: snv 134 vs 138 vs 143

2010-07-20 Thread Chad Cantwell
No, this wasn't it.  A non-debug build with the same NIGHTLY_OPTIONS
as Rich Lowe's 142 build is still very slow...

On Tue, Jul 20, 2010 at 09:52:10AM -0700, Chad Cantwell wrote:
 Yes, I think this might have been it.  I missed the NIGHTLY_OPTIONS variable
 in opensolaris.sh and I think it was compiling a debug build.  I'm not sure
 what the ramifications of this are or how much slower a debug build should
 be, but I'm recompiling a release build now, so hopefully all will be well.
 
 Thanks,
 Chad
 


Re: [zfs-discuss] zpool throughput: snv 134 vs 138 vs 143

2010-07-20 Thread Brent Jones
On Tue, Jul 20, 2010 at 10:29 AM, Chad Cantwell c...@iomail.org wrote:
 No, this wasn't it.  A non-debug build with the same NIGHTLY_OPTIONS
 as Rich Lowe's 142 build is still very slow...


Could it somehow not be compiling 64-bit support?


-- 
Brent Jones
br...@servuhome.net


Re: [zfs-discuss] zpool throughput: snv 134 vs 138 vs 143

2010-07-20 Thread Chad Cantwell
On Tue, Jul 20, 2010 at 10:45:58AM -0700, Brent Jones wrote:
 
 Could it somehow not be compiling 64-bit support?
 
 
 -- 
 Brent Jones
 br...@servuhome.net

I thought about that, but it says when it boots up that it is 64-bit, and
I'm able to run 64-bit binaries.  I wonder if it's compiling for the wrong
processor optimization, though?  Maybe if it is missing some of the newer
SSEx instructions the zpool checksum checking is slowed down significantly?
I don't know how to check for this, though, and it seems strange that it
would slow things down this much.  I'd expect even a non-SSE-enabled binary
to be able to calculate a few hundred MB of checksums per second on a
2.5+ GHz processor.
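
The closest I can come up with is isainfo/psrinfo, which at least shows
what the kernel and CPU report, even if it doesn't prove what the bits were
actually compiled to use:

$ isainfo -kv      # kernel bitness
$ isainfo -x       # instruction-set extensions (sse, sse2, ssse3, ...)
$ psrinfo -pv      # processor model and clock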

Chad


Re: [zfs-discuss] zpool throughput: snv 134 vs 138 vs 143

2010-07-20 Thread Garrett D'Amore
So the next question is, let's figure out what richlowe did
differently. ;-)

- Garrett




Re: [zfs-discuss] zpool throughput: snv 134 vs 138 vs 143

2010-07-20 Thread Marcelo H Majczak
If I can help narrow the variables: I compiled both 137 and 144 (137 is the
minimum required to build 144) using the same recommended compiler and lint,
nightly options, etc.  137 works fine, but 144 suffers the slowness
reported.  System-wise, I'm using only the 32-bit non-debug version on an
old single-core/single-thread Pentium M laptop.

What I notice is that the zpool_$pool daemon has a lot more threads (136 in
total, IIRC), so something changed there, though it's not necessarily
related to the problem.  It also seems to be issuing a lot more writes to
rpool, though I can't tell what.  In my case that causes a lot of read
contention, since my rpool is a USB flash device with no cache.  iostat
reports something like up to 10 writes/20 reads per second.  Up to 137 the
performance has been enough, so far, for my purposes on this laptop.
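
For anyone who wants to compare numbers on their own box, something like
this should show the thread count and the I/O rate -- assuming the per-pool
worker shows up as a kernel process named zpool-<poolname> and that
ps/pgrep behave as on stock builds:

$ ps -o pid,nlwp,comm -p `pgrep zpool-rpool`   # nlwp = number of threads
$ iostat -xn 5                                 # per-device reads/writes per second
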
-- 
This message posted from opensolaris.org


Re: [zfs-discuss] zpool throughput: snv 134 vs 138 vs 143

2010-07-20 Thread Bill Sommerfeld

On 07/20/10 14:10, Marcelo H Majczak wrote:

It also seems to be issuing a lot more
writing to rpool, though I can't tell what. In my case it causes a
lot of read contention since my rpool is a USB flash device with no
cache. iostat says something like up to 10w/20r per second. Up to 137
the performance has been enough, so far, for my purposes on this
laptop.


if pools are more than about 60-70% full, you may be running into 6962304

workaround: add the following to /etc/system, run
bootadm update-archive, and reboot

-cut here-
* Work around 6962304
set zfs:metaslab_min_alloc_size=0x1000
* Work around 6965294
set zfs:metaslab_smo_bonus_pct=0xc8
-cut here-

no guarantees, but it's helped a few systems..
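
(To check where a pool stands, plain 'zpool list' shows the CAP column,
which is the percentage in question; and if I have the syntax right, the
tunable can be confirmed after reboot with mdb -- treat that second line as
a sketch:)

$ zpool list
# echo 'zfs`metaslab_min_alloc_size/E' | mdb -k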

- Bill



Re: [zfs-discuss] zpool throughput: snv 134 vs 138 vs 143

2010-07-20 Thread Garrett D'Amore

Your config makes me think this is an atypical ZFS configuration.  As a
result, I'm not as concerned.  But I think the multithread/concurrency
may be the biggest concern here.  Perhaps the compilers are doing
something different that causes significant cache issues.  (Perhaps the
compilers themselves are in need of an update?)

- Garrett





Re: [zfs-discuss] zpool throughput: snv 134 vs 138 vs 143

2010-07-20 Thread Haudy Kazemi



Could it somehow not be compiling 64-bit support?


--
Brent Jones



I thought about that, but it says when it boots up that it is 64-bit, and
I'm able to run 64-bit binaries.  I wonder if it's compiling for the wrong
processor optimization, though?  Maybe if it is missing some of the newer
SSEx instructions the zpool checksum checking is slowed down significantly?
I don't know how to check for this, though, and it seems strange that it
would slow things down this much.  I'd expect even a non-SSE-enabled binary
to be able to calculate a few hundred MB of checksums per second on a
2.5+ GHz processor.


Chad


Would it be possible to do a closer comparison between Rich Lowe's fast 142
build and your slow 142 build?  For example, run a diff on the source,
build options, and build scripts.  If the build settings are close enough,
a comparison of the generated binaries might be a faster way to narrow
things down (if the optimizations are different, then a resultant binary
comparison probably won't be useful).


You said previously that:

The procedure I followed was basically what is outlined here:
http://insanum.com/blog/2010/06/08/how-to-build-opensolaris

using the SunStudio 12 compilers for ON and 12u1 for lint.
  
Are these the same compiler versions Rich Lowe used?  Maybe there is a 
compiler optimization bug.  Rich Lowe's build readme doesn't tell us 
which compiler he used.

http://genunix.org/dist/richlowe/README.txt


I suppose the easiest way for me to confirm if there is a regression or if my
compiling is flawed is to just try compiling snv_142 using the same procedure
and see if it works as well as Rich Lowe's copy or if it's slow like my other
compilations.

Chad


Another older compilation guide:
http://hub.opensolaris.org/bin/view/Community+Group+tools/building_opensolaris




Re: [zfs-discuss] zpool throughput: snv 134 vs 138 vs 143

2010-07-19 Thread Chad Cantwell
FYI, everyone, I have some more info here.  In short, Rich Lowe's 142 works
correctly (fast) on my hardware, while both my compilations (snv 143, snv
144) and also the Nexenta 3 RC2 kernel (134 with backports) are horribly
slow.

I finally got around to trying Rich Lowe's snv 142 compilation in place of
my own compilation of 143 (and later 144, not mentioned below), and unlike
my own two compilations, his works very fast again on my same zpool (the
scrubbing average increased from the low 100s to over 400 MB/s within a few
minutes after booting into this copy of 142).  I should note that since my
original message, I also tried booting from a Nexenta Core 3.0 RC2 ISO
after realizing it had zpool version 26 support backported into 134 and was
in fact able to read my zpool despite the upgraded version.  Running a
scrub from the F2 shell on the Nexenta CD was also slow, just like the 143
and 144 that I compiled.  So, there seem to be two possibilities.  Either
(and this seems unlikely) there is a problem introduced post-142 which
slows things down, and it occurred in 143 and 144 and was brought back to
134 with Nexenta's backports, or else (more likely) there is something
different or wrong with how I'm compiling the kernel that makes the
hardware not perform up to its specifications with a zpool, and possibly
the Nexenta 3 RC2 ISO has the same problem as my own compilations.

Chad



Re: [zfs-discuss] zpool throughput: snv 134 vs 138 vs 143

2010-07-19 Thread James C. McPherson

On 20/07/10 10:40 AM, Chad Cantwell wrote:

FYI, everyone, I have some more info here.  In short, Rich Lowe's 142 works
correctly (fast) on my hardware, while both my compilations (snv 143, snv
144) and also the Nexenta 3 RC2 kernel (134 with backports) are horribly
slow.



So - what are your env file contents, which closed bins are you using,
which crypto bits are you using, and what changeset is your own workspace
synced with?
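
(For completeness: the quickest way I know to capture most of that is
something like the following, adjusting the workspace path and env file
name to suit -- hg identify gives the changeset, and the env file is
whatever you passed to nightly:)

$ hg -R $CODEMGR_WS identify
$ egrep -v '^#|^$' $CODEMGR_WS/opensolaris.sh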


James C. McPherson
--
Oracle
http://www.jmcp.homeunix.com/blog


Re: [zfs-discuss] zpool throughput: snv 134 vs 138 vs 143

2010-07-19 Thread Garrett D'Amore
On Mon, 2010-07-19 at 17:40 -0700, Chad Cantwell wrote:
 FYI, everyone, I have some more info here.  In short, Rich Lowe's 142 works
 correctly (fast) on my hardware, while both my compilations (snv 143, snv
 144) and also the Nexenta 3 RC2 kernel (134 with backports) are horribly
 slow.

The idea that it's a regression introduced into NCP 3 RC2 is not very
far-fetched at all.

It certainly could stand some more analysis.

- Garrett

 


Re: [zfs-discuss] zpool throughput: snv 134 vs 138 vs 143

2010-07-19 Thread Chad Cantwell
On Tue, Jul 20, 2010 at 10:54:44AM +1000, James C. McPherson wrote:
 So - what are your env file contents, which closed bins are you using,
 which crypto bits are you using, and what changeset is your own workspace
 synced with?
 
 
 James C. McPherson
 --
 Oracle
 http://www.jmcp.homeunix.com/blog


The procedure I followed was basically what is outlined here:
http://insanum.com/blog/2010/06/08/how-to-build-opensolaris

using the SunStudio 12 compilers for ON and 12u1 for lint.

For each build (143, 144) I cloned the exact tag for that build, i.e.:

# hg clone ssh://a...@hg.opensolaris.org/hg/onnv/onnv-gate onnv-b144
# cd onnv-b144
# hg update onnv_144

Then I downloaded the corresponding closed and crypto bins from
http://dlc.sun.com/osol/on/downloads/b143 or
http://dlc.sun.com/osol/on/downloads/b144

The only environment variables I modified from the default opensolaris.sh
file were the basic ones: GATE, CODEMGR_WS, STAFFER, and ON_CRYPTO_BINS,
to point to my work directory for the build, my username, and the relevant
crypto bin:

$ egrep -e '^GATE|^CODEMGR_WS|^STAFFER|^ON_CRYPTO_BINS' opensolaris.sh
GATE=onnv-b144;                         export GATE
CODEMGR_WS=/work/compiling/$GATE;       export CODEMGR_WS
STAFFER=chad;                           export STAFFER
ON_CRYPTO_BINS=$CODEMGR_WS/on-crypto-latest.$MACH.tar.bz2
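
One other thing I should probably double-check: if I remember the download
page correctly, the closed bins also come in DEBUG and non-DEBUG flavours
(on-closed-bins vs. on-closed-bins-nd), so a non-DEBUG build presumably
wants the -nd tarball unpacked into the workspace, roughly (names from
memory, so check against the b144 page):

$ cd $CODEMGR_WS
$ bzcat on-closed-bins-nd.$MACH.tar.bz2 | tar xf -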

I suppose the easiest way for me to confirm if there is a regression or if my
compiling is flawed is to just try compiling snv_142 using the same procedure
and see if it works as well as Rich Lowe's copy or if it's slow like my other
compilations.

Chad



Re: [zfs-discuss] zpool throughput: snv 134 vs 138 vs 143

2010-07-19 Thread Chad Cantwell
On Mon, Jul 19, 2010 at 06:00:04PM -0700, Brent Jones wrote:
 I'm surprised you're even getting 400MB/s on the fast
 configurations, with only 16 drives in a Raidz3 configuration.
 To me, 16 drives in Raidz3 (single Vdev) would do about 150MB/sec, as
 your slow speeds suggest.
 
 -- 
 Brent Jones
 br...@servuhome.net

With which drives and controllers?  For a single dd thread writing a large
file to fill up a new zpool from /dev/zero, in this configuration I can
sustain over 700 MB/s for the duration of the process and can fill up the
~26 TB of usable space overnight.  This is with two 8-port LSI 1068e
controllers and no expanders.  RAIDZ operates similarly to regular RAID,
and you should get striped speeds for sequential access, minus any
inefficiencies and processing time for the parity.  16 disks in raidz3 is
13 disks' worth of striping, so at ~700 MB/s I'm getting about 50%
efficiency after the parity calculations etc., which is fine with me.  I
understand that some people need to have higher performance random I/O to
many

[zfs-discuss] zpool throughput: snv 134 vs 138 vs 143

2010-07-06 Thread Chad Cantwell
Hi all,

I've noticed something strange in the throughput in my zpool between
different snv builds, and I'm not sure if it's an inherent difference
in the build or a kernel parameter that is different in the builds.
I've set up two similar machines and this happens with both of them.
Each system has 16 2TB Samsung HD203WI drives (total) directly connected
to two LSI 3081E-R 1068e cards with IT firmware in one raidz3 vdev.

In both computers, after a fresh installation of snv 134, the throughput
is a maximum of about 300 MB/s during scrub or something like
dd if=/dev/zero bs=1024k of=bigfile.

If I bfu to snv 138, I then get throughput of about 700 MB/s with both
scrub or a single thread dd.
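
(For reference, the test itself is nothing more elaborate than this, with
the file living on the raidz3 pool -- 'tank' is a stand-in for the real
pool name, and the count is just whatever makes the run long enough to
settle:)

$ dd if=/dev/zero of=/tank/bigfile bs=1024k count=32768
$ zpool scrub tank
$ zpool status tank        # watch the reported scrub rate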

I assumed at first this was some sort of bug or regression in 134 that
made it slow.  However, I've now tested also from the fresh 134
installation, compiling the OS/Net build 143 from the mercurial
repository and booting into it, after which the dd throughput is still
only about 300 MB/s just like snv 134.  The scrub throughput in 143
is even slower, rarely surpassing 150 MB/s.  I wonder if the scrubbing
being extra slow here is related to the additional statistics displayed
during the scrub that didn't used to be shown.

Is there some kind of debug option that might be enabled in the 134 build
and persist if I compile snv 143 which would be off if I installed a 138
through bfu?  If not, it makes me think that the bfu to 138 is changing
the configuration somewhere to make it faster rather than fixing a bug or
being a debug flag on or off.  Does anyone have any idea what might be
happening?  One thing I haven't tried is bfu'ing to 138, and from this
faster working snv 138 installing the snv 143 build, which may possibly
create a 143 that performs faster if it's simply a configuration parameter.
I'm not sure offhand if installing source-compiled ON builds from a bfu'd
rpool is supported, although I suppose it's simple enough to try.

Thanks,
Chad Cantwell