[zfs-discuss] Strange hang during snapshot receive

2012-05-10 Thread Ian Collins
I have an application I have been using to manage data replication for a 
number of years.  Recently we started using a new machine as a staging 
server (not that new, an x4540) running Solaris 11 with a single pool 
built from seven 6-drive raidz vdevs.  No dedup and no reported errors.


On that box, and nowhere else, I see empty snapshots taking 17 or 18 
seconds to write.  Everywhere else they return in under a second.


Using truss and the last published source code, it looks like the pause 
is between a printf and the call to zfs_ioctl, and there aren't any 
other function calls between them:


100.5124  0.0004  open("/dev/zfs", O_RDWR|O_EXCL)             = 10
100.7582  0.0001  read(7, "\0\0\0\0\0\0\0\0ACCBBAF5".., 312)  = 312
100.7586  0.0000  read(7, 0x080464F8, 0)                      = 0
100.7591  0.0000  time()                                      = 1336628656
100.7653  0.0035  ioctl(8, ZFS_IOC_OBJSET_STATS, 0x08040CF0)  = 0
100.7699  0.0022  ioctl(8, ZFS_IOC_OBJSET_STATS, 0x08040900)  = 0
100.7740  0.0016  ioctl(8, ZFS_IOC_OBJSET_STATS, 0x08040580)  = 0
100.7787  0.0026  ioctl(8, ZFS_IOC_OBJSET_STATS, 0x080405B0)  = 0
100.7794  0.0001  write(1, " r e c e i v i n g   i n".., 75)  = 75
118.3551  0.6927  ioctl(8, ZFS_IOC_RECV, 0x08042570)          = 0
118.3596  0.0010  ioctl(8, ZFS_IOC_OBJSET_STATS, 0x08040900)  = 0
118.3598  0.0000  time()                                      = 1336628673
118.3600  0.0000  write(1, " r e c e i v e d   3 1 2".., 45)  = 45

zpool iostat (1-second interval) for the period is:

tank        12.5T  6.58T    175      0   271K      0
tank        12.5T  6.58T    176      0   299K      0
tank        12.5T  6.58T    189      0   259K      0
tank        12.5T  6.58T    156      0   231K      0
tank        12.5T  6.58T    170      0   243K      0
tank        12.5T  6.58T    252      0   295K      0
tank        12.5T  6.58T    179      0   200K      0
tank        12.5T  6.58T    214      0   258K      0
tank        12.5T  6.58T    165      0   210K      0
tank        12.5T  6.58T    154      0   178K      0
tank        12.5T  6.58T    186      0   221K      0
tank        12.5T  6.58T    184      0   215K      0
tank        12.5T  6.58T    218      0   248K      0
tank        12.5T  6.58T    175      0   228K      0
tank        12.5T  6.58T    146      0   194K      0
tank        12.5T  6.58T     99    258   209K  1.50M
tank        12.5T  6.58T    196    296   294K  1.31M
tank        12.5T  6.58T    188    130   229K   776K

Can anyone offer any insight or further debugging tips?

Thanks.

--
Ian.



Re: [zfs-discuss] Strange hang during snapshot receive

2012-05-10 Thread Mike Gerdts
On Thu, May 10, 2012 at 5:37 AM, Ian Collins <i...@ianshome.com> wrote:
 I have an application I have been using to manage data replication for a
 number of years.  Recently we started using a new machine as a staging
 server (not that new, an x4540) running Solaris 11 with a single pool built
 from seven 6-drive raidz vdevs.  No dedup and no reported errors.

 On that box, and nowhere else, I see empty snapshots taking 17 or 18 seconds
 to write.  Everywhere else they return in under a second.

 Using truss and the last published source code, it looks like the pause is
 between a printf and the call to zfs_ioctl, and there aren't any other
 function calls between them:

For each snapshot in a stream, there is one zfs_ioctl() call.  During
that time, the kernel will read the entire substream (that is, for one
snapshot) from the input file descriptor.
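
If you want to confirm where the wall-clock time is going, a DTrace
one-liner along these lines should do it (a rough sketch, assuming the
receiving process shows up as "zfs" in execname; adjust to taste):

  dtrace -n '
      /* stamp each ioctl on the way in */
      syscall::ioctl:entry /execname == "zfs"/ { self->ts = timestamp; }
      /* bucket the latency on the way out */
      syscall::ioctl:return /self->ts/
      {
          @["ioctl latency (ns)"] = quantize(timestamp - self->ts);
          self->ts = 0;
      }'

If an outlier lands out near your 17-18 seconds, the stall really is
inside the receive ioctl; if not, the process is waiting somewhere in
userland before it ever issues ZFS_IOC_RECV.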


 100.5124  0.0004  open("/dev/zfs", O_RDWR|O_EXCL)             = 10
 100.7582  0.0001  read(7, "\0\0\0\0\0\0\0\0ACCBBAF5".., 312)  = 312
 100.7586  0.0000  read(7, 0x080464F8, 0)                      = 0
 100.7591  0.0000  time()                                      = 1336628656
 100.7653  0.0035  ioctl(8, ZFS_IOC_OBJSET_STATS, 0x08040CF0)  = 0
 100.7699  0.0022  ioctl(8, ZFS_IOC_OBJSET_STATS, 0x08040900)  = 0
 100.7740  0.0016  ioctl(8, ZFS_IOC_OBJSET_STATS, 0x08040580)  = 0
 100.7787  0.0026  ioctl(8, ZFS_IOC_OBJSET_STATS, 0x080405B0)  = 0
 100.7794  0.0001  write(1, " r e c e i v i n g   i n".., 75)  = 75
 118.3551  0.6927  ioctl(8, ZFS_IOC_RECV, 0x08042570)          = 0
 118.3596  0.0010  ioctl(8, ZFS_IOC_OBJSET_STATS, 0x08040900)  = 0
 118.3598  0.0000  time()                                      = 1336628673
 118.3600  0.0000  write(1, " r e c e i v e d   3 1 2".., 45)  = 45

 zpool iostat (1-second interval) for the period is:

 tank        12.5T  6.58T    175      0   271K      0
 tank        12.5T  6.58T    176      0   299K      0
 tank        12.5T  6.58T    189      0   259K      0
 tank        12.5T  6.58T    156      0   231K      0
 tank        12.5T  6.58T    170      0   243K      0
 tank        12.5T  6.58T    252      0   295K      0
 tank        12.5T  6.58T    179      0   200K      0
 tank        12.5T  6.58T    214      0   258K      0
 tank        12.5T  6.58T    165      0   210K      0
 tank        12.5T  6.58T    154      0   178K      0
 tank        12.5T  6.58T    186      0   221K      0
 tank        12.5T  6.58T    184      0   215K      0
 tank        12.5T  6.58T    218      0   248K      0
 tank        12.5T  6.58T    175      0   228K      0
 tank        12.5T  6.58T    146      0   194K      0
 tank        12.5T  6.58T     99    258   209K  1.50M
 tank        12.5T  6.58T    196    296   294K  1.31M
 tank        12.5T  6.58T    188    130   229K   776K

 Can anyone offer any insight or further debugging tips?

I have yet to see a time when zpool iostat tells me something useful.
I'd take a look at iostat -xzn 1 or similar output.  It could point
to imbalanced I/O or a particular disk that has abnormally high
service times.
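Something like:

  iostat -xzn 1

while a receive is stalled, and watch the asvc_t (active service time,
in milliseconds) and %b (busy) columns; one disk with much higher
service times than its siblings will drag an entire raidz down.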

Have you installed any SRUs?  If not, you could be seeing:

7060894 zfs recv is excruciatingly slow

which is fixed in Solaris 11 SRU 5.
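
If you're not sure what you are running, something like:

  pkg info entire

should tell you; the version/branch string of the 'entire' incorporation
encodes the SRU level, so you can see whether you're at SRU 5 or later.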

If you are using zones and are using any https pkg(5) origins (such as
https://pkg.oracle.com/solaris/support), I suggest reading
https://forums.oracle.com/forums/thread.jspa?threadID=2380689&tstart=15
before updating to SRU 6 (SRU 5 is fine, however).  The fix for the
problem mentioned in that forums thread should show up in an upcoming
SRU via CR 7157313.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/