Re: [zfs-discuss] Thinking about spliting a zpool in system and data

2012-01-10 Thread Richard Elling

On Jan 9, 2012, at 7:23 PM, Jesus Cea wrote:

> On 07/01/12 13:39, Jim Klimov wrote:
>> I have transitioned a number of systems roughly by the same
>> procedure as you've outlined. Sadly, my notes are not in English so
>> they wouldn't be of much help directly;
>
> Yes, my Russian is rusty :-).
>
> I have bitten the bullet and spent 3-4 days doing the migration. I
> wrote the details here:
>
> http://www.jcea.es/artic/solaris_zfs_split.htm
>
> The page is written in Spanish, but the terminal transcriptions should
> be useful for everybody.
>
> In the process, maybe somebody finds this interesting too:
>
> http://www.jcea.es/artic/zfs_flash01.htm

Google translate works well for this :-)  Thanks for posting!
 -- richard

-- 

ZFS and performance consulting
http://www.RichardElling.com
illumos meetup, Jan 10, 2012, Menlo Park, CA
http://www.meetup.com/illumos-User-Group/events/41665962/


Re: [zfs-discuss] Thinking about spliting a zpool in system and data

2012-01-10 Thread Jesus Cea

On 10/01/12 21:32, Richard Elling wrote:
> On Jan 9, 2012, at 7:23 PM, Jesus Cea wrote:
> [...]
>> The page is written in Spanish, but the terminal transcriptions
>> should be useful for everybody.
>>
>> In the process, maybe somebody finds this interesting too:
>>
>> http://www.jcea.es/artic/zfs_flash01.htm
>
> Google translate works well for this :-)  Thanks for posting! --
> richard

Speaking of this, there is something that bugs me.

For some reason, sync writes are written to the ZIL only if they are
small. Big writes are far slower, apparently bypassing the ZIL.
Maybe this is a concern about disk bandwidth (since the data would be
written twice), but that is only speculation.

But this happens even when the ZIL is on an SSD. I would expect ZFS to
write sync writes to the SSD even if they are quite big (megabytes).
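One way to check whether those writes are landing on the log devices at
all is to watch per-vdev I/O while testing (the pool name below is just
a placeholder):

# slog traffic shows up under the "logs" section of the output
zpool iostat -v mypool 1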

In the zil.c code I see things like:


/*
 * Define a limited set of intent log block sizes.
 * These must be a multiple of 4KB. Note only the amount used (again
 * aligned to 4KB) actually gets written. However, we can't always just
 * allocate SPA_MAXBLOCKSIZE as the slog space could be exhausted.
 */
uint64_t zil_block_buckets[] = {
    4096,               /* non TX_WRITE */
    8192+4096,          /* data base */
    32*1024 + 4096,     /* NFS writes */
    UINT64_MAX
};

/*
 * Use the slog as long as the logbias is 'latency' and the current
 * commit size is less than the limit or the total list size is less
 * than 2X the limit.
 * Limit checking is disabled by setting zil_slog_limit to UINT64_MAX.
 */
uint64_t zil_slog_limit = 1024 * 1024;
#define USE_SLOG(zilog) (((zilog)->zl_logbias == ZFS_LOGBIAS_LATENCY) && \
    (((zilog)->zl_cur_used < zil_slog_limit) || \
    ((zilog)->zl_itx_list_sz < (zil_slog_limit << 1))))


I have 2GB of ZIL in a mirrored SSD. I can randomly write to it at
240MB/s, so I guess the sync write restriction could be reexamined
when ZFS is using a separate ZIL device with plenty of space to burn
:-). Am I missing anything?

Could I safely change the value of zil_slog_limit in the kernel (via
mdb) when using a separate ZIL device? Would it do what I expect?
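Something along these lines, I suppose (the value is only an example,
and the /etc/system line is my guess, untested):

# read the current value (8-byte hex) on the live kernel
echo "zil_slog_limit/J" | mdb -k

# raise it to 16MB (0x1000000) on the live kernel
echo "zil_slog_limit/Z 0x1000000" | mdb -kw

# and, presumably, in /etc/system to survive a reboot:
#   set zfs:zil_slog_limit = 0x1000000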

My usual database block size is 64KB... :-(. The write-ahead log write
can easily be bigger than 128KB (before and after data, plus some
changes in the parent nodes).

It seems faster to do several writes with several SYNCs than one big
write with a final SYNC. That is quite counterintuitive.

Am I hitting something else, like the write throttle?

PS: I am talking about Solaris 10 U10. My ZFS logbias attribute is
latency.

-- 
Jesus Cea Avion - j...@jcea.es - http://www.jcea.es/


Re: [zfs-discuss] Thinking about spliting a zpool in system and data

2012-01-09 Thread Jesus Cea

On 07/01/12 13:39, Jim Klimov wrote:
> I have transitioned a number of systems roughly by the same
> procedure as you've outlined. Sadly, my notes are not in English so
> they wouldn't be of much help directly;

Yes, my Russian is rusty :-).

I have bitten the bullet and spent 3-4 days doing the migration. I
wrote the details here:

http://www.jcea.es/artic/solaris_zfs_split.htm

The page is written in Spanish, but the terminal transcriptions should
be useful for everybody.

In the process, maybe somebody finds this interesting too:

http://www.jcea.es/artic/zfs_flash01.htm

Sorry, Spanish only too.

> Overall, your plan seems okay and has more failsafes than we've had
> - because longer downtimes were affordable ;) However, when doing
> such low-level stuff, you should make sure that you have remote
> access to your systems (ILOM, KVM, etc.; remotely-controlled PDUs
> for externally enforced

Yes, the migration I did had plenty of safety points (you can go back if
something doesn't work) and, most of the time, the system was in a
state able to survive an accidental reboot. Downtime was minimal, less
than an hour in total (several reboots to validate configurations before
proceeding). I am quite pleased with the uneventful migration, but I
planned it quite carefully. I was worried about hitting bugs in
Solaris/ZFS, but it was very smooth.

The machine is hosted remotely but yes, I have remote-KVM. I can't
boot from remote media, but I have an OpenIndiana release in the SSD,
with VirtualBox installed and the Solaris 10 Update 10 release ISO,
just in case :-).

The only suspicious thing is that I keep swap (32GB) and dump
(4GB) in the data zpool, instead of in the system one. It seems to work
OK. Crossing my fingers for the next Live Upgrade :-).

I read your message after I had migrated, but it was very
interesting. Thanks for taking the time to write it!

Have a nice 2012.

-- 
Jesus Cea Avion - j...@jcea.es - http://www.jcea.es/


Re: [zfs-discuss] Thinking about spliting a zpool in system and data

2012-01-07 Thread Jim Klimov

Hello, Jesus,

  I have transitioned a number of systems roughly by the
same procedure as you've outlined. Sadly, my notes are
not in English so they wouldn't be of much help directly;
but I can report that I had success with similar in-place
manual transitions from mirrored SVM (pre-Solaris 10u4)
to new ZFS root pools, as well as various transitions
of ZFS root pools from one layout to another, on systems
with limited numbers of disk drives (2-4 overall).

  As I've recently reported on the list, I've also done
such a migration for my faulty single-disk rpool at home
via the data pool and back, changing the "copies"
setting en route.

  Overall, your plan seems okay and has more failsafes
than we've had - because longer downtimes were affordable ;)
However, when doing such low-level stuff, you should make
sure that you have remote access to your systems (ILOM,
KVM, etc.; remotely-controlled PDUs for externally enforced
poweroff-poweron are welcome), and that you can boot the
systems over ILOM/rKVM with an image of a LiveUSB/LiveCD/etc
in case of bigger trouble.

  In steps 6-7, where you reboot the system to test
that new rpool works, you might want to keep the zones
down, i.e. by disabling the zones service in the old BE
just before reboot, and zfs-sending this update to the
new small rpool. Also it is likely that in the new BE
(small rpool) your old data from the big rpool won't
get imported by itself and zones (or their services)
wouldn't start correctly anyway before steps 7-8.

---

Below I'll outline our experience from my notes, as it
successfully applied to an even more complicated situation
than yours:

  On many Sol10/SXCE systems with ZFS roots we've also
created a hierarchical layout (separate /var, /usr, /opt
with compression enabled), but this procedure HAS FAILED
for newer OpenIndiana systems. So for OI we have to use
the default single-root layout and only seperate some of
/var/* subdirs (adm, log, mail, crash, cores, ...) in
order to set quotas and higher compression on them.
Such datasets are also kept separate from OS upgrades
and are used in all boot environments without cloning.
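  As a sketch of that kind of split (dataset names, quotas
and compression levels here are just examples, not exactly
what we used):

# zfs create -o mountpoint=/var/log   -o compression=gzip-6 -o quota=4g rpool/varlog
# zfs create -o mountpoint=/var/crash -o compression=gzip-9 -o quota=8g rpool/varcrash
# zfs create -o mountpoint=/var/cores -o compression=gzip-9 -o quota=8g rpool/varcores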

  To simplify things, most of the transitions were done
in off-hours time so it was okay to shut down all the
zones and other services. In some cases for Sol10/SXCE
the procedure involved booting in the Failsafe Boot
mode; for all systems this can be done with the BootCD.

  For usual Solaris 10 and OpenSolaris SXCE maintenance
we did use LiveUpgrade, but at that time its ZFS support
was immature, so we circumvented LU and transitioned
manually. In those cases we used LU to update systems
to the base level supporting ZFS roots (Sol10u4+) while
running from SVM mirrors (one mirror for main root,
another mirror for LU root for new/old OS image).
After the transition to ZFS rpool, we cleared out the
LU settings (/etc/lu/, /etc/lutab) by using defaults
from the most recent SUNWlu* packages, and when booted
from ZFS - we created the current LU BE based on the
current ZFS rpool.

  When the OS was capable of booting from ZFS (sol10u4+,
snv_100 approx), we broke the SVM mirrors, repartitioned
the second disk to our liking (about 4-10Gb for rpool,
rest for data), created the new rpool and dataset
hierarchy we needed, and had it mounted under /zfsroot.

  Note that in our case we used a minimized install
of Solaris which fit under 1-2Gb per BE, we did not use
a separate /dump device and the swap volume was located
in the ZFS data pool (mirror or raidz for 4-disk systems).
Zoneroots were also separate from the system rpool and
were stored in the data pool. This DID yield problems
for LiveUpgrade, so zones were detached before LU and
reattached-with-upgrade after the OS upgrade and disk
migrations.
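  Roughly, for each zone (the zone name here is just a
placeholder):

# zoneadm -z myzone detach
  ... LiveUpgrade / disk migration / boot the new BE ...
# zoneadm -z myzone attach -u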

  Then we copied the root FS data like this:

# cd /zfsroot && ( ufsdump 0f - / | ufsrestore -rf - )

  If the source (SVM) paths like /var, /usr or /boot are
separate UFS filesystems - repeat likewise, changing the
current paths in the command above.

  For non-UFS systems, such as migration from VxFS or
even ZFS (if you need a different layout, compression,
etc. - so ZFS send/recv is not applicable), you can use
Sun cpio (it should carry over extended attributes and
ACLs). For example, if you're booted from the LiveCD
and the old UFS root is mounted in /ufsroot and new
ZFS rpool hierarchy is in /zfsroot, you'd do this:

# cd /ufsroot && ( find . -xdev -depth -print | cpio -pvdm /zfsroot )

  The example above also copies only the data from
current FS, so you need to repeat it for each UFS
sub-fs like /var, etc.

  Another problem we've encountered while cpio'ing live
systems (when not running from failsafe/livecd) is that
find skips mountpoints of sub-fses. While your new ZFS
hierarchy would provide usr, var, opt under /zfspool,
you might need to manually create some others - see the
list in your current df output. Example:

# cd /zfsroot
# mkdir -p tmp proc devices var/run system/contract system/object \
    etc/svc/volatile


Re: [zfs-discuss] Thinking about spliting a zpool in system and data

2012-01-06 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Jesus Cea
>
> Sorry if this list is inappropriate. Pointers welcomed.

Not at all.  This is the perfect forum for your question.


> So I am thinking about splitting my full two-disk zpool in two zpools,
> one for system and other for data. Both using both disks for
> mirroring. So I would have two slices per disk.

Please see the procedure below, which I wrote as notes for myself, to
perform disaster recovery backup/restore of rpool.  This is not DIRECTLY
applicable for you, but it includes all the necessary ingredients to make
your transition successful.  So please read, and modify as necessary for
your purposes.

Many good notes available:
ZFS Troubleshooting Guide
http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide#ZFS_Root_Pool_Recovery

Before you begin:
Because you will restore from a boot CD, there are only a few
compression options available to you: 7z, bzip2, and gzip.
The clear winner in general is 7z with compression level 1.  It's about
as fast as gzip, with compression approx 2x stronger than bzip2.

Because you will restore from a boot CD, the backup needs to be located
somewhere that can be accessed from a CD boot environment, which does
not include ssh/scp. The obvious choice is NFS.  Be aware that the
Solaris NFS client is often not very compatible with Linux NFS servers.

I am assuming there is a Solaris NFS server available because it makes
my job easy while I'm writing this.  ;-)  Note:  You could just as
easily store the backup on a removable disk in something like a zfs
pool.  Just make sure that, whatever way you store it, it's accessible
from the CD boot environment, which might not support a later version
of zpool etc.

Create NFS exports on some other Solaris machine:
share -F nfs -o rw=machine1:machine2,root=machine1:machine2 /backupdir
Also edit the hosts file to match, because forward/reverse DNS must
match for the client.

Create a backup suitable for system recovery.
mount someserver:/backupdir /mnt

Create snapshots:
zfs snapshot rpool@uniquebackupstring
zfs snapshot rpool/ROOT@uniquebackupstring
zfs snapshot rpool/ROOT/machinename_slash@uniquebackupstring

Send snapshots:
Notice: due to bugs, don't do this recursively.  Do it separately, as
outlined here.
Notice:  In some version of zpool/zfs, these bugs were fixed so you can
safely do it recursively.  I don't know what rev is needed.
zfs send rpool@uniquebackupstring | 7z a -mx=1 -si /mnt/rpool.zfssend.7z
zfs send rpool/ROOT@uniquebackupstring | 7z a -mx=1 -si /mnt/rpool_ROOT.zfssend.7z
zfs send rpool/ROOT/machinename_slash@uniquebackupstring | 7z a -mx=1 -si /mnt/rpool_ROOT_machinename_slash.zfssend.7z

It is also wise to capture a list of the pristine zpool and zfs
properties:
echo > /mnt/zpool-properties.txt
for pool in `zpool list | grep -v '^NAME ' | sed 's/ .*//'` ; do
    echo --------------------------------------------- | tee -a /mnt/zpool-properties.txt
    echo zpool get all $pool | tee -a /mnt/zpool-properties.txt
    zpool get all $pool | tee -a /mnt/zpool-properties.txt
done
echo > /mnt/zfs-properties.txt
for fs in `zfs list | grep -v @ | grep -v '^NAME ' | sed 's/ .*//'` ; do
    echo --------------------------------------------- | tee -a /mnt/zfs-properties.txt
    echo zfs get all $fs | tee -a /mnt/zfs-properties.txt
    zfs get all $fs | tee -a /mnt/zfs-properties.txt
done

Notice:  The above will also capture info about dump & swap, which
might be important, so you know what sizes & blocksizes they are.


Now suppose a disaster has happened.  You need to restore.
Boot from the CD.
Choose Solaris
Choose 6.  Single User Shell

To bring up the network:
ifconfig -a plumb
ifconfig -a
(Notice the name of the network adapter.  In my case, it's e1000g0)
ifconfig e1000g0 192.168.1.100/24 up

mount 192.168.1.105:/backupdir /mnt

Verify that you have access to the backup images.  Now prepare your boot
disk as follows:

format -e
(Select the appropriate disk)
fdisk
(no fdisk table exists.  Yes, create default)
partition
(choose to modify a table based on "All Free Hog")
(in my example, I'm using c1t0d0s0 for rpool)

zpool create -f -o failmode=continue -R /a -m legacy rpool c1t0d0s0

7z x /mnt/rpool.zfssend.7z -so | zfs receive -F rpool
(notice: the first one requires -F because it already exists.  The
others don't need this.)

7z x /mnt/rpool_ROOT.zfssend.7z -so | zfs receive rpool/ROOT
7z x /mnt/rpool_ROOT_machinename_slash.zfssend.7z -so | zfs receive rpool/ROOT/machinename_slash

zfs set mountpoint=/rpool rpool
zfs set mountpoint=legacy rpool/ROOT
zfs set mountpoint=/  rpool/ROOT/machinename_slash

zpool set 
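The usual finishing steps at this point are to point the pool at the
boot environment and reinstall the boot blocks; a sketch only, with
example names (verify against the Troubleshooting Guide linked above):

zpool set bootfs=rpool/ROOT/machinename_slash rpool
installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t0d0s0
init 6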

Re: [zfs-discuss] Thinking about spliting a zpool in system and data

2012-01-06 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Fajar A. Nugraha
>
>> c) Currently Solaris decides to activate write caching in the SATA
>> disks, nice. What would happen if I still use the complete disks BUT
>> with two slices instead of one? Would it still have write cache
>> enabled? And yes, I have checked that the cache flush works as
>> expected, because I can only do around one hundred write+sync per
>> second.
>
> You can enable disk cache manually using format.

I'm not aware of any automatic way to make this work correctly.  I wrote
some scripts to run in cron, if you're interested.
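For reference, the manual route is the expert mode of format; from
memory, it looks roughly like this:

format -e
(select the appropriate disk)
format> cache
cache> write_cache
write_cache> display
write_cache> enable
write_cache> quit
cache> quit
format> quit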



Re: [zfs-discuss] Thinking about spliting a zpool in system and data

2012-01-06 Thread Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D.


maybe one can do the following (assume c0t0d0 and c0t1d0):
1) split rpool mirror: zpool split rpool newpool c0t1d0s0
1b) zpool destroy newpool
2) partition 2nd hdd c0t1d0s0 into two slices (s0 and s1)
3) zpool create rpool2 c0t1d0s1
4) use lucreate -c c0t0d0s0 -n new-zfsbe -p c0t1d0s0
5) lustatus
   c0t0d0s0
   new-zfsbe
6) luactivate new-zfsbe
7) init 6
Now you have two BEs, old and new.
You can create dpool on slice 1, add L2ARC and ZIL, and repartition c0t0d0.
If you want, you can create rpool on c0t0d0s0 and a new BE, so everything
will be named rpool for the root pool.
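for example (SSD device names are only placeholders):
zpool create dpool c0t1d0s1
(once c0t0d0 has been repartitioned:)
zpool attach dpool c0t1d0s1 c0t0d0s1
zpool add dpool log mirror c2t0d0s0 c2t1d0s0
zpool add dpool cache c2t0d0s1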


SWAP and DUMP can be on different rpool

good luck


On 1/6/2012 12:32 AM, Jesus Cea wrote:


Sorry if this list is inappropriate. Pointers welcomed.

Using Solaris 10 Update 10, x86-64.

I have been a heavy ZFS user since it became available, and I love the
system. My servers are usually small (two disks) and usually hosted in a
datacenter, so I usually create a ZPOOL used both for system and data.
That is, the entire system contains a single two-disk zpool.

This has worked nicely so far.

But my new servers have SSDs too. Using them for L2ARC is easy enough,
but I cannot use them as ZIL because no separate ZIL device can be
used in root zpools. Ugh, that hurts!

So I am thinking about splitting my full two-disk zpool in two zpools,
one for system and other for data. Both using both disks for
mirroring. So I would have two slices per disk.

I have the system in production in a datacenter I cannot access, but
I have remote KVM access. The servers are in production; I can't
reinstall, but I could be allowed small (minutes-long) downtimes for a
while.

My plan is this:

1. Do a scrub to be sure the data is OK in both disks.

2. Break the mirror. The A disk will keep working, B disk is idle.

3. Partition B disk with two slices instead of current full disk slice.

4. Create a system zpool in B.

5. Snapshot zpool/ROOT in A and zfs send it to system in B.
Repeat several times until we have a recent enough copy. This stream
will contain the OS and the zones root datasets. I have zones.

6. Change GRUB to boot from system instead of zpool. Cross fingers
and reboot. Do I have to touch the bootfs property?

Now ideally I would be able to have system as the zpool root. The
zones would be mounted from the old datasets.

7. If everything is OK, I would zfs send the data from the old zpool
to the new one. After doing a few times to get a recent copy, I would
stop the zones and do a final copy, to be sure I have all data, no
changes in progress.

8. I would change the zone manifest to mount the data in the new zpool.

9. I would restart the zones and be sure everything seems ok.

10. I would restart the computer to be sure everything works.

So, if this doesn't work, I could go back to the old situation
simply by changing the GRUB boot entry to the old zpool.

11. If everything works, I would destroy the original zpool in A,
partition the disk and recreate the mirroring, with B as the source.

12. Reboot to be sure everything is OK.
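Steps 5-6 would look roughly like this (BE and device names below are
placeholders, and the exact commands are still to be verified):

zfs snapshot -r zpool/ROOT@mig1
zfs send -R zpool/ROOT@mig1 | zfs receive -u -d system
(later, an incremental pass to catch up before switching:)
zfs snapshot -r zpool/ROOT@mig2
zfs send -R -i @mig1 zpool/ROOT@mig2 | zfs receive -u -d system
(point the new pool at the boot environment and refresh GRUB:)
zpool set bootfs=system/ROOT/myBE system
installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t1d0s0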

So, my questions:

a) Is this workflow reasonable and would it work? Is the procedure
documented anywhere? Suggestions? Pitfalls?

b) *MUST* SWAP and DUMP ZVOLs reside in the root zpool, or can they
live in a nonsystem zpool? (always plugged and available). I would
like to have a quite small (let's say 30GB, I use Live Upgrade and
quite a few zones) system zpool, but my swap is huge (32GB and yes, I
use it) and I would rather prefer to have SWAP and DUMP in the data
zpool, if possible & supported.

c) Currently Solaris decides to activate write caching in the SATA
disks, nice. What would happen if I still use the complete disks BUT
with two slices instead of one? Would it still have write cache
enabled? And yes, I have checked that the cache flush works as
expected, because I can only do around one hundred write+sync per
second.

Any advice?

-- 
Jesus Cea Avion - j...@jcea.es - http://www.jcea.es/


--
Hung-Sheng Tsao Ph D.
Founder & Principal
HopBit GridComputing LLC

Re: [zfs-discuss] Thinking about spliting a zpool in system and data

2012-01-06 Thread Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D.


correction
On 1/6/2012 3:34 PM, Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D. wrote:


maybe one can do the following (assume c0t0d0 and c0t1d0):
1) split rpool mirror: zpool split rpool newpool c0t1d0s0
1b) zpool destroy newpool
2) partition 2nd hdd c0t1d0s0 into two slices (s0 and s1)
3) zpool create rpool2 c0t1d0s1   <=== should be c0t1d0s0
4) use lucreate -c c0t0d0s0 -n new-zfsbe -p c0t1d0s0   <== rpool2
5) lustatus
   c0t0d0s0
   new-zfsbe
6) luactivate new-zfsbe
7) init 6
Now you have two BEs, old and new.
You can create dpool on slice 1, add L2ARC and ZIL, and repartition c0t0d0.
If you want, you can create rpool on c0t0d0s0 and a new BE, so everything
will be named rpool for the root pool.


SWAP and DUMP can be on different rpool

good luck



Re: [zfs-discuss] Thinking about spliting a zpool in system and data

2012-01-05 Thread Fajar A. Nugraha
On Fri, Jan 6, 2012 at 12:32 PM, Jesus Cea j...@jcea.es wrote:
> So, my questions:
>
> a) Is this workflow reasonable and would it work? Is the procedure
> documented anywhere? Suggestions? Pitfalls?

try 
http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide#Complete_Solaris_ZFS_Root_Pool_Recovery


> b) *MUST* SWAP and DUMP ZVOLs reside in the root zpool, or can they
> live in a nonsystem zpool? (always plugged and available). I would
> like to have a quite small (let's say 30GB, I use Live Upgrade and
> quite a few zones) system zpool, but my swap is huge (32GB and yes, I
> use it) and I would rather prefer to have SWAP and DUMP in the data
> zpool, if possible & supported.

try it? :D

Last time I played around with S11, you could even go without swap and
dump (with some manual setup).
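Moving them to a data pool is basically just creating zvols there and
pointing swap/dumpadm at them; something like this (pool name and sizes
are only examples):

zfs create -V 32g -b 4k dpool/swap
swap -a /dev/zvol/dsk/dpool/swap
(plus a matching /etc/vfstab entry so it comes back after reboot)
zfs create -V 4g dpool/dump
dumpadm -d /dev/zvol/dsk/dpool/dump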


> c) Currently Solaris decides to activate write caching in the SATA
> disks, nice. What would happen if I still use the complete disks BUT
> with two slices instead of one? Would it still have write cache
> enabled? And yes, I have checked that the cache flush works as
> expected, because I can only do around one hundred write+sync per
> second.

You can enable disk cache manually using format.

-- 
Fajar