[zfs-discuss] slow zfs send

2012-05-07 Thread Karl Rossing

Hi,

I'm seeing slow zfs send performance on a v29 pool: about 25 MB/sec.
bash-3.2# zpool status vdipool
  pool: vdipool
 state: ONLINE
 scan: scrub repaired 86.5K in 7h15m with 0 errors on Mon Feb  6 01:36:23 2012

config:

        NAME                       STATE     READ WRITE CKSUM
        vdipool                    ONLINE       0     0     0
          raidz1-0                 ONLINE       0     0     0
            c0t5000C500103F2057d0  ONLINE       0     0     0  (SEAGATE-ST31000640SS-0003-931.51GB) Promise JBOD
            c0t5000C5000440AA0Bd0  ONLINE       0     0     0  (SEAGATE-ST31000640SS-0003-931.51GB) Promise JBOD
            c0t5000C500103E9FFBd0  ONLINE       0     0     0  (SEAGATE-ST31000640SS-0003-931.51GB) Promise JBOD
            c0t5000C500103E370Fd0  ONLINE       0     0     0  (SEAGATE-ST31000640SS-0003-931.51GB) Promise JBOD
            c0t5000C500103E120Fd0  ONLINE       0     0     0  (SEAGATE-ST31000640SS-0003-931.51GB) Promise JBOD
        logs
          mirror-1                 ONLINE       0     0     0
            c0t500151795955D430d0  ONLINE       0     0     0  (ATA-INTEL SSDSA2VP02-02M5-18.64GB) onboard drive on x4140
            c0t500151795955BDB6d0  ONLINE       0     0     0  (ATA-INTEL SSDSA2VP02-02M5-18.64GB) onboard drive on x4140
        cache
          c0t5001517BB271845Dd0    ONLINE       0     0     0  (ATA-INTEL SSDSA2CW16-0362-149.05GB) onboard drive on x4140
        spares
          c0t5000C500103E368Fd0    AVAIL        (SEAGATE-ST31000640SS-0003-931.51GB) Promise JBOD


The drives are in an external Promise 12-drive JBOD. The JBOD is also
connected to another server, which uses the other six SEAGATE ST31000640SS
drives.


This is on Solaris 10 8/11 (Generic_147441-01). I'm using an LSI 9200 for the
external Promise JBOD and an internal 9200 for the ZIL and L2ARC devices,
which also hosts rpool.

FW versions on both cards are MPTFW-12.00.00.00-IT and MPT2BIOS-7.23.01.00.

I'm wondering why the zfs send could be so slow.  Could the other server
be slowing down the SAS bus?


Karl




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] slow zfs send

2012-05-07 Thread Jim Klimov

2012-05-07 20:45, Karl Rossing wrote:

I'm wondering why the zfs send could be so slow. Could the other server
be slowing down the SAS bus?


I hope other posters will have more relevant suggestions, but
you can check whether the buses are contended by dd'ing from the drives.
At least that would give you a measure of the available sequential
throughput.

During the send you can also monitor 'zpool iostat 1' and the usual
'iostat -xnz 1' to see how busy the disks are and how many IO requests
are issued. The snapshots are likely sent in the order of block age
(TXG number), which for a busy pool may mean heavy fragmentation and
lots of random small IOs...
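
A rough sketch of those checks, using device names from the zpool status
above (the p0 whole-disk path and the dd block size/count are only
illustrative; adjust for your disk labels):

  # Raw sequential read from one pool member, bypassing ZFS:
  dd if=/dev/rdsk/c0t5000C500103F2057d0p0 of=/dev/null bs=1024k count=4096

  # In separate terminals while the send is running:
  zpool iostat -v vdipool 1     # per-vdev bandwidth and IOPS
  iostat -xnz 1                 # per-device service times and % busy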

HTH,
//Jim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] slow zfs send

2012-05-07 Thread Cindy Swearingen

Hi Karl,

I'd like to verify that no dead or dying disk is killing pool
performance, and your zpool status looks good. Jim has replied
with some ideas for checking your individual device performance.

Otherwise, you might be impacted by this CR:

7060894 zfs recv is excruciatingly slow

This CR covers both zfs send/recv ops and should be resolved
in an upcoming Solaris 10 release. It's already available in an
S11 SRU.

Thanks,

Cindy

On 5/7/12 10:45 AM, Karl Rossing wrote:

I'm wondering why the zfs send could be so slow.  Could the other server
be slowing down the SAS bus?


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] slow zfs send

2012-05-07 Thread Cindy Swearingen

Hi Karl,

Someone sitting across the table from me (who saw my posting)
informs me that CR 7060894 would not impact Solaris 10 releases,
so please disregard my comment about CR 7060894.

Thanks,

Cindy

On 5/7/12 11:35 AM, Cindy Swearingen wrote:

Hi Karl,

I'd like to verify that no dead or dying disk is killing pool
performance, and your zpool status looks good. Jim has replied
with some ideas for checking your individual device performance.

Otherwise, you might be impacted by this CR:

7060894 zfs recv is excruciatingly slow

This CR covers both zfs send/recv ops and should be resolved
in an upcoming Solaris 10 release. It's already available in an
S11 SRU.

Thanks,

Cindy



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Hung zfs destroy

2012-05-07 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Ian Collins
 
 On a Solaris 11 (SR3) system I have a zfs destroy process that appears
 to be doing nothing and can't be killed.  It has used 5 seconds of CPU
 in a day and a half, but truss -p won't attach.  No data appears to have
 been removed.  The dataset (but not the pool) is busy.
 
 I thought this was an old problem that was fixed long ago in Solaris 10
 (I had several temporary patches over the years), but it appears to be
 alive and well.

How big is your dataset?  On what type of disks/pool?
zfs destroy does indeed take time (unlike zpool destroy.)  A couple of days
might be normal expected behavior, depending on your configuration.  You
didn't specify if you have dedup...  Dedup will greatly hurt your zfs
destroy speed, too.
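
A hedged sketch of how one might check for that, with a hypothetical
pool/dataset name:

  zfs get dedup tank/mydataset   # "on" means new writes are deduplicated
  zpool get dedupratio tank      # above 1.00x means deduped blocks exist in the pool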

That being said, sometimes things go wrong, and I don't have any suggestion
for you to determine if yours is behaving as expected.  Or not.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] IOzone benchmarking

2012-05-07 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Bob Friesenhahn
 
 Has someone done real-world measurements which indicate that raidz*
 actually provides better sequential read or write than simple
 mirroring with the same number of disks?  While it seems that there
 should be an advantage, I don't recall seeing posted evidence of such.
 If there was a measurable advantage, it would be under conditions
 which are unlikely in the real world.

Apparently I pulled it down at some point, so I don't have a URL for you
anymore, but I did run those tests, and I posted the results.  Long story
short, both raidzN and mirror
configurations behave approximately the way you would hope they do.  That
is...

Approximately, as compared to a single disk (and I *mean* approximately,
because I'm pulling this back from memory the way I chose to remember it,
which is to say, a simplified model that I felt comfortable with):
                  seq rd  seq wr  rand rd  rand wr
2-disk mirror     2x      1x      2x       1x
3-disk mirror     3x      1x      3x       1x
2x 2disk mirr     4x      2x      4x       2x
3x 2disk mirr     6x      3x      6x       3x
3-disk raidz      2x      2x      1x       1x
4-disk raidz      3x      3x      1x       1x
5-disk raidz      4x      4x      1x       1x
6-disk raidz      5x      5x      1x       1x

I went on to test larger and more complex arrangements...  Started getting
things like 1.9x and 1.8x where I would have expected 2x and so forth...
Sorry for being vague now, but the data isn't in front of me anymore.  Might
not ever be again.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] IOzone benchmarking

2012-05-07 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Paul Kraus
 
 Even with incompressible data I measure better performance with
 compression turned on rather than off. 

*cough*

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Hung zfs destroy

2012-05-07 Thread Ian Collins

On 05/ 8/12 08:36 AM, Edward Ned Harvey wrote:

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of Ian Collins

On a Solaris 11 (SR3) system I have a zfs destroy process that appears
to be doing nothing and can't be killed.  It has used 5 seconds of CPU
in a day and a half, but truss -p won't attach.  No data appears to have
been removed.  The dataset (but not the pool) is busy.

I thought this was an old problem that was fixed long ago in Solaris 10
(I had several temporary patches over the years), but it appears to be
alive and well.

How big is your dataset?


Small, 15GB.


  On what type of disks/pool?


Single iSCSI volume.


zfs destroy does indeed take time (unlike zpool destroy.)  A couple of days
might be normal expected behavior, depending on your configuration.  You
didn't specify if you have dedup...  Dedup will greatly hurt your zfs
destroy speed, too.


I've yet to find a system with enough RAM to make dedup worthwhile!

After 5 days, a grand total of 1.2GB has been removed and the process
responded to kill -9 and exited...

I just re-ran the command and it completed in 2 seconds.  Well odd.

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] IOzone benchmarking

2012-05-07 Thread Bob Friesenhahn

On Mon, 7 May 2012, Edward Ned Harvey wrote:


Apparently I pulled it down at some point, so I don't have a URL for you
anymore, but I did, and I posted.  Long story short, both raidzN and mirror
configurations behave approximately the way you would hope they do.  That
is...

Approximately, as compared to a single disk:  And I *mean* approximately,


Yes, I remember your results.

In a few weeks I should be setting up a new system with OpenIndiana 
and 8 SAS disks.  This will give me an opportunity to test again. 
Last time I got to play was back in February 2008 and I did not bother 
to test raidz 
(http://www.simplesystems.org/users/bfriesen/zfs-discuss/2540-zfs-performance.pdf).


The most common benchmarking is sequential read/write, and rarely
read-file/write-file, where the 'file' is a megabyte or two and is
different for each iteration.
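
For example, a minimal IOzone run along those lines might look like this
(the target path is hypothetical and the sizes are arbitrary):

  # Sequential write/rewrite (-i 0) and read/reread (-i 1) of a 2 MB file
  # with a 128 KB record size, on the filesystem under test:
  iozone -i 0 -i 1 -s 2m -r 128k -f /tank/testfs/iozone.tmp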


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] slow zfs send

2012-05-07 Thread Karl Rossing

On 12-05-07 12:18 PM, Jim Klimov wrote:

During the send you can also monitor zpool iostat 1 and usual
iostat -xnz 1 in order to see how busy the disks are and how
many IO requests are issued. The snapshots are likely sent in
the order of block age (TXG number), which for a busy pool may
mean heavy fragmentation and lots of random small IOs..

I have been able to verify that I can get a zfs send at 135MB/sec for a 
striped pool with 2 internal drives on the same server.


Each dataset had about 3-4 snapshots, and there were about 36 datasets.

I deleted the snapshots and the speed may have increased slightly.

Judging from iostat -xnz 1, the number of IOs is very high, so I guess
the drives are badly fragmented.


So is fixing this going to require a zfs pool rebuild?

Karl

                    extended device statistics
    r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    4.0     0.0    16.0  0.0  0.0    0.0    0.0   0   0 c0t500151795955D430d0
    0.0    4.0     0.0    16.0  0.0  0.0    0.0    0.0   0   0 c0t500151795955BDB6d0
    0.0    1.0     0.0     8.0  0.0  0.0    0.0    0.1   0   0 c0t5001517BB271845Dd0
  759.0    0.0  4800.0     0.0  0.0  2.9    0.0    3.8   0  75 c0t5000C500103F2057d0
  887.0    0.0  4738.0     0.0  0.0  1.6    0.0    1.8   0  42 c0t5000C500103E9FFBd0
  915.0    0.0  4628.5     0.0  0.0  1.5    0.0    1.6   0  30 c0t5000C5000440AA0Bd0
  922.0    0.0  4676.5     0.0  0.0  1.0    0.0    1.1   0  26 c0t5000C500103E120Fd0
  970.0    0.0  4276.0     0.0  0.0  1.0    0.0    1.0   0  20 c0t5000C500103E370Fd0

                    extended device statistics
    r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    4.0     0.0    32.0  0.0  0.0    0.0    0.1   0   0 c0t5001517BB271845Dd0
 1363.0    0.0  9007.8     0.0  0.0  2.0    0.0    1.5   1  54 c0t5000C500103F2057d0
 1405.0    0.0 10169.2     0.0  0.0  1.8    0.0    1.3   1  37 c0t5000C500103E9FFBd0
 1448.0    0.0  9884.2     0.0  0.0  1.7    0.0    1.2   1  40 c0t5000C5000440AA0Bd0
 1264.0    0.0  9537.3     0.0  0.0  2.1    0.0    1.7   0  51 c0t5000C500103E120Fd0
 1260.0    0.0  9749.8     0.0  0.0  1.9    0.0    1.5   0  44 c0t5000C500103E370Fd0

                    extended device statistics
    r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    6.0     0.0    24.0  0.0  0.0    0.0    0.0   0   0 c0t500151795955D430d0
    0.0    6.0     0.0    24.0  0.0  0.0    0.0    0.0   0   0 c0t500151795955BDB6d0
 1023.0    0.0  5131.6     0.0  0.0  1.6    0.0    1.6   0  45 c0t5000C500103F2057d0
 1003.0    0.0  5040.1     0.0  0.0  1.5    0.0    1.5   0  36 c0t5000C500103E9FFBd0
  959.0    0.0  5069.1     0.0  0.0  1.7    0.0    1.8   0  46 c0t5000C5000440AA0Bd0
  941.0    0.0  5117.6     0.0  0.0  1.7    0.0    1.8   0  45 c0t5000C500103E120Fd0
 1043.0    0.0  5034.1     0.0  0.0  1.0    0.0    1.0   0  24 c0t5000C500103E370Fd0
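
One common way to address this kind of fragmentation is to rewrite the data
rather than rebuild the pool in place, for example by replicating the
datasets to a fresh pool with send/recv. A rough sketch (the target pool
name here is hypothetical):

  zfs snapshot -r vdipool@migrate
  zfs send -R vdipool@migrate | zfs recv -Fd newpool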




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] slow zfs send

2012-05-07 Thread Bob Friesenhahn

On Mon, 7 May 2012, Karl Rossing wrote:


On 12-05-07 12:18 PM, Jim Klimov wrote:

During the send you can also monitor zpool iostat 1 and usual
iostat -xnz 1 in order to see how busy the disks are and how
many IO requests are issued. The snapshots are likely sent in
the order of block age (TXG number), which for a busy pool may
mean heavy fragmentation and lots of random small IOs..

I have been able to verify that I can get a zfs send at 135MB/sec for a 
striped pool with 2 internal drives on the same server.


I see that there are a huge number of reads and hardly any writes.  Are 
you SURE that deduplication was not enabled for this pool?  This is 
the sort of behavior that one might expect if deduplication was 
enabled without enough RAM or L2 read cache.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] slow zfs send

2012-05-07 Thread Karl Rossing

On 12-05-07 8:45 PM, Bob Friesenhahn wrote:
I see that there are a huge number of reads and hardly any writes.  Are 
you SURE that deduplication was not enabled for this pool?  This is 
the sort of behavior that one might expect if deduplication was 
enabled without enough RAM or L2 read cache.


Bob

After hours the pool is pretty quiet. zpool history does not show dedup
ever being enabled, and zfs get dedup reports that dedup is off.
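
For completeness, those checks might look something like this:

  zpool history vdipool | grep -i dedup   # no output if dedup was never turned on
  zfs get -r dedup vdipool                # should report "off" for every dataset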


Karl




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss