Thanks for your help!
Actually I have some more questions. I need to make a decision on the
replication mode for our storage: zfs send-receive, AVS, or even a
Microsoft-internal tool on the iSCSI volumes with independent ZFS
snapshots on both sides.
Initially AVS seemed like a good option, but I can't make it work
on a 100Mbit link with 8 x 1.36TB volumes.
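(For comparison, the zfs send-receive option would look roughly like
this; just a sketch, assuming a zvol zstor/vol under the zstor pool,
passwordless ssh between the hosts, and made-up snapshot names:)
# zfs snapshot zstor/vol@repl-2
# zfs send -i zstor/vol@repl-1 zstor/vol@repl-2 | \
    ssh mtl2.flt2 zfs receive -F zstor/vol
(The incremental stream only carries the blocks changed since the
previous snapshot.)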
Roman,
> A weird issue:
> 1. AVS works for connections on a local switch via a local FreeBSD
> router connected to the switch:
> host1 -> switch -> freebsd router -> switch -> host2
> 2. When trying to emulate replication over a long-distance remote
> connection with the FreeBSD router on the remote side, AVS fails
> with the error:
> sndradm: warning: SNDR: Recovery bitmaps not allocated
First of all, what version of AVS / OpenSolaris are you running? The
reason I ask is that this error message returned from "sndradm" was a
problem partially resolved for AVS on Solaris 10 and for the AVS
bundled with OpenSolaris.
# sndradm -v
Remote Mirror version 11.11
# uname -a
SunOS tor.flt 5.11 snv_101b i86pc i386 i86pc Solaris
The specific issue at hand is that during the first stages of an
"sndradm -u ..." update command, the SNDR secondary node is asked to
send its entire bitmap to the SNDR primary node. The operation is done
via a Solaris RPC call, which has an associated timeout value. If the
amount of time it takes to send this data over the network from the
secondary node to the primary node exceeds the RPC timeout value, the
operation fails with "Recovery bitmaps not allocated".
It's strange that SNDR sends the entire bitmap. What about a large
replicated volume like 1.36TB? dsbitmap reports more than 100,000
blocks for async replication.
There would be constant timeouts on an average 100Mbit link in that case.
SNDR does not replicate the bitmap volume, just the bitmap itself.
There is one bit per 32KB of primary volume size, with 8 bits per
byte, and 512 bytes per block. For a 1.36TB volume that works out to
roughly 11,160 blocks, or about 5.5MB.
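(Worked through with the actual volume size shown further down in the
kstat output, volsize 2925489887 blocks:
# echo '2925489887 * 512 / 32768 / 8 / 512' | bc
11159
which, allowing for a small bitmap header and rounding up, matches the
11161 blocks dsbitmap reports for sync replication below.)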
But dsbitmap shows 100441 blocks for async replication. Am I missing
something?
Required bitmap volume size:
Sync replication: 11161 blocks
Async replication with memory queue: 11161 blocks
Async replication with disk queue: 100441 blocks
Async replication with disk queue and 32 bit refcount: 368281 blocks
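(For reference, these numbers come from running dsbitmap against one
of the data volumes, something like this; -r should be the
remote-mirror sizing option, if I recall it correctly:)
# dsbitmap -r /dev/rdsk/c3t0d0s0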
Good. I kind of figured that this was the problem. What is your
SNDR primary volume size?
After the initial sync started to work (although it's a very slow
process and takes 10-15 minutes to complete), I have the following
situation:
1. Storage (8 x 1.36TB in one raidz2 pool):
[email protected]# sndradm -i
tor2.flt2 /dev/rdsk/c3t0d0s0 /dev/md/rdsk/bmp0 mtl2.flt2
/dev/rdsk/c3t0d0s0 /dev/md/rdsk/bmp0 ip async g zfs-pool
tor2.flt2 /dev/rdsk/c3t1d0s0 /dev/md/rdsk/bmp1 mtl2.flt2
/dev/rdsk/c3t1d0s0 /dev/md/rdsk/bmp1 ip async g zfs-pool
tor2.flt2 /dev/rdsk/c3t2d0s0 /dev/md/rdsk/bmp2 mtl2.flt2
/dev/rdsk/c3t2d0s0 /dev/md/rdsk/bmp2 ip async g zfs-pool
tor2.flt2 /dev/rdsk/c3t3d0s0 /dev/md/rdsk/bmp3 mtl2.flt2
/dev/rdsk/c3t3d0s0 /dev/md/rdsk/bmp3 ip async g zfs-pool
tor2.flt2 /dev/rdsk/c3t4d0s0 /dev/md/rdsk/bmp4 mtl2.flt2
/dev/rdsk/c3t4d0s0 /dev/md/rdsk/bmp4 ip async g zfs-pool
tor2.flt2 /dev/rdsk/c3t5d0s0 /dev/md/rdsk/bmp5 mtl2.flt2
/dev/rdsk/c3t5d0s0 /dev/md/rdsk/bmp5 ip async g zfs-pool
tor2.flt2 /dev/rdsk/c3t6d0s0 /dev/md/rdsk/bmp6 mtl2.flt2
/dev/rdsk/c3t6d0s0 /dev/md/rdsk/bmp6 ip async g zfs-pool
tor2.flt2 /dev/rdsk/c3t7d0s0 /dev/md/rdsk/bmp7 mtl2.flt2
/dev/rdsk/c3t7d0s0 /dev/md/rdsk/bmp7 ip async g zfs-pool
tor2.flt2 /dev/rdsk/c4t1d0s0 /dev/md/rdsk/bmp8 mtl2.flt2
/dev/rdsk/c4t1d0s0 /dev/md/rdsk/bmp8 ip async g zfs-pool
2. The bitmaps are on a mirrored metadevice. They are bigger than you
mentioned, but sized to what dsbitmap reported for these volumes (a
rough metainit sketch follows below, after point 5):
bmp0: Soft Partition
Device: d100
State: Okay
Size: 100441 blocks (49 MB)
Extent Start Block Block count
0 34 100441
3. Network:
tor2.flt2 <---> freebsd router <---> mtl2.flt2
4. Latency:
[email protected]:# ping -s mtl2.flt2
PING 172.0.5.10: 56 data bytes
64 bytes from mtl2.flt2 (172.0.5.10): icmp_seq=0. time=16.822 ms
I'm emulating on FreeBSD the actual delay and speed of the real
circuit, which is 100Mbit and 16ms (a dummynet sketch follows below,
after point 5).
5. The queue during writes at 40Mbit/s on the main host:
[email protected]:/# kstat sndr::setinfo | grep async_block_hwm
async_block_hwm 1402834
async_block_hwm 1402834
async_block_hwm 1402834
async_block_hwm 1402834
async_block_hwm 1402834
async_block_hwm 1402834
async_block_hwm 1402834
async_block_hwm 1402834
async_block_hwm 1402834
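(Regarding point 2: the bitmap soft partitions can be laid out roughly
like this; only a sketch, the submirror slices and the size are
placeholders, and the descriptive metadevice name needs a recent SVM:)
# metainit d101 1 1 c4t0d0s1
# metainit d102 1 1 c4t2d0s1
# metainit d100 -m d101
# metattach d100 d102
# metainit bmp0 -p d100 50m
(Regarding point 4: a dummynet setup that gives 100Mbit with a 16ms
round trip looks roughly like this; em0 and the rule numbers are
placeholders, and a pipe's delay is one-way, hence 8ms per direction:)
# kldload dummynet
# ipfw pipe 1 config bw 100Mbit/s delay 8
# ipfw pipe 2 config bw 100Mbit/s delay 8
# ipfw add 100 pipe 1 ip from any to any in via em0
# ipfw add 200 pipe 2 ip from any to any out via em0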
The problems:
1.
In replication mode, data transmission through the FreeBSD router is
only 2.5 Mbit/s for the replication (RPC) traffic, which is quite a bit
lower than the numbers netio shows:
----------------- Real link, 100Mbit; 16ms delay -------------
TCP connection established.
Packet size 1k bytes: 3239 KByte/s Tx, 5885 KByte/s Rx.
2.
When the initial synchronization (sndradm -nu) happens, its traffic is
almost zero.
But if it's a gigabit connection on the switch, then the sync is pretty
fast, maybe dozens of seconds instead of minutes.
3.
But the real problem is that iSCSI initiator writes stall because of
the SNDR replication. The Windows initiator just hangs and can only be
reset by deleting the target on the server.
And async_block_hwm is very high when there is writing on the pool, and
it gets stuck:
async_block_hwm 1402834
[email protected]:/export/home/roman/zfs# zpool iostat 5
capacity operations bandwidth
pool used avail read write read write
---------- ----- ----- ----- ----- ----- -----
rpool 10.6G 63.4G 0 0 3.27K 2.80K
zstor 8.99G 10.9T 0 4 35 451K
---------- ----- ----- ----- ----- ----- -----
rpool 10.6G 63.4G 0 0 0 0
zstor 8.99G 10.9T 0 0 0 0
---------- ----- ----- ----- ----- ----- -----
rpool 10.6G 63.4G 0 0 0 0
zstor 8.99G 10.9T 0 0 0 0
---------- ----- ----- ----- ----- ----- -----
rpool 10.6G 63.4G 0 0 0 0
zstor 8.99G 10.9T 0 499 0 61.9M
---------- ----- ----- ----- ----- ----- -----
rpool 10.6G 63.4G 0 0 0 0
zstor 8.99G 10.9T 0 21 0 2.67M
---------- ----- ----- ----- ----- ----- -----
rpool 10.6G 63.4G 0 0 0 0
zstor 8.99G 10.9T 0 18 0 2.37M
---------- ----- ----- ----- ----- ----- -----
rpool 10.6G 63.4G 0 0 0 0
zstor 8.99G 10.9T 0 25 0 3.24M
---------- ----- ----- ----- ----- ----- -----
rpool 10.6G 63.4G 0 0 0 0
zstor 8.99G 10.9T 0 18 0 2.27M
---------- ----- ----- ----- ----- ----- -----
rpool 10.6G 63.4G 0 0 0 0
zstor 8.99G 10.9T 0 26 0 3.37M
---------- ----- ----- ----- ----- ----- -----
zpool iostat shows some writes, but the initiator on the Windows box
doesn't respond at this time.
And this is what kstat shows:
[email protected]:# kstat sndr:0:setinfo
module: sndr instance: 0
name: setinfo class: storedge
async_block_hwm 1402834
async_item_hwm 16417
async_queue_blocks 1390128
async_queue_items 16382
async_queue_type memory
async_throttle_delay 8135137
autosync 0
bitmap /dev/md/rdsk/bmp0
bitsset 2674
bmpflags 0
bmp_size 5713920
crtime 11243.852414994
disk_status 0
flags 2054
if_down 0
if_rpc_version 7
maxqfbas 25000000
maxqitems 16384
primary_host tor2.flt2
primary_vol /dev/rdsk/c3t0d0s0
secondary_host mtl2.flt2
secondary_vol /dev/rdsk/c3t0d0s0
snaptime 76822.27391977
syncflags 0
syncpos 2925489600
type_flag 5
volsize 2925489887
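(A couple of things stand out in the kstat output above: the set is on
a memory queue (async_queue_type memory), async_queue_items 16382 is
right at the maxqitems limit of 16384, and async_throttle_delay is
large, which suggests writes on the primary are being throttled while
the queue drains over the slow link. The queue limits, the number of
async flusher threads, and the queue type can be changed with sndradm;
the option letters below are from memory, so treat this as a sketch
and check the sndradm man page:)
# sndradm -g zfs-pool -W 65536                (maxqitems)
# sndradm -g zfs-pool -F 50000000             (maxqfbas)
# sndradm -g zfs-pool -A 4                    (async threads)
# sndradm -g zfs-pool -q a /dev/md/rdsk/dq0   (switch to a disk queue; dq0 is a placeholder)
(dsstat -m sndr 5 is also handy for watching the sync percentage and
the queue while testing, again if I have the options right.)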
So, the question for me now is whether AVS is able to handle burst
writes and replicate the changes over a slow, high-latency link to the
other side without affecting performance on the primary host. It
should also track changes during long periods when the link is down.
Is it still possible to use AVS with 2 hosts containing 8 x 1.36TB
volumes, when the maximum write speed is about 30-40 Mbit/s and the
circuit is a 100Mbit link with 15-20 ms latency? In async mode,
obviously.
Another question: if the link was down for a long time and blocks have
changed a couple of times during the downtime, how does SNDR replicate
sequenced writes?
It doesn't.
SNDR has three modes of operation:
logging mode
(re)synchronization mode
replicating mode
In logging mode and replicating mode, SNDR keeps both the SNDR
primary and secondary volumes in write-consistent (sequenced) order.
During (re)synchronization mode, SNDR updates the secondary volume in
block (actually bitmap) order. There is only a single bit used to
track differences between the primary and secondary volumes, and one
bit is equal to 32KB.
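(As a concrete example, using the bitsset value from the kstat output
above, 2674 bits at 32KB per bit:
# echo '2674 * 32 / 1024' | bc
83
so a resync of that set would move roughly 83MB, but in bitmap order
rather than in the order the application originally wrote the blocks.)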
If the replication environment, volume content, and application
availability require you to be concerned that an SNDR replica is not
write-order consistent during (re)synchronization mode, SNDR supports
an option called ndr_ii, which takes an automatic compact dependent
snapshot of the write-order-consistent SNDR secondary volume. In the
unlikely case that (re)synchronization mode fails and the SNDR primary
volume is lost, the snapshot volume can be restored on the SNDR
secondary.
Does it mean that during (re)synchronization mode the secondary
volumes shouldn't be mounted, because the write order is not
consistent?
Roman