Re: [zfs-discuss] Wanted: sanity check for a clustered ZFS idea
Hello all,

> Definitely not impossible, but please work on the business case.
> Remember, it is easier to build hardware than software, so your
> software solution must be sufficiently advanced to not be obsoleted
> by the next few hardware generations.
>  -- richard

I guess Richard was correct about the use-case description - I should
detail what I'm thinking about, to give some illustration. Coming from
a software company, though, I tend to think of software as the more
flexible part of the equation - it is something we have a chance to
change, while we use whatever hardware is given to us from above, for
years...

When thinking about the problem and its applications, I have in mind
blade-server farms like the Intel MFSYS25, which include relatively
large internal storage and can optionally take external SAS storage.
We use such server farms as self-contained units (a single chassis
plugged into the customer's network) for a number of projects, and
recently more and more of these deployments have become VMware ESX
farms with shared VMFS. Given my stronger love for things Solaris, I
would love to see ZFS and one of the Solaris-based hypervisors (VBox,
Xen or KVM ports) running there instead. But for things to be as
efficient, ZFS would have to become shared - clustered...

I should elaborate on this hardware, as it is our major use case and
thus a limitation which shapes my approach to clustered ZFS and my
belief about which shortcuts are appropriate. These boxes have a shared
chassis accommodating 6 server blades, each with 2 CPUs and 2 or 4
gigabit ethernet ports. The chassis also has single or dual ethernet
switches to interlink the servers and connect to the external world
(10 external ports each), as well as single or dual storage controllers
and 14 internal HDD bays. External SAS boxes can also be attached to
the storage controller modules, but I haven't yet seen a real setup
like that.

In the normal "Intel use case", the controller(s) implement several
RAID LUNs which are accessible to the servers via SAS (with MPIO in the
case of dual controllers). Usually these LUNs are dedicated to
individual servers - for example, boot/OS volumes. With an additional
license from Intel, shared LUNs can be implemented on the chassis.
These are aimed primarily at VMware farms with clustered VMFS, to use
the available disk space (and the aggregate bandwidth of multiple
spindles) more efficiently, and to aid in VM migration.

To be clearer: modern hypervisors can migrate running virtual machines
between two hosts. Usually (with storage dedicated to each host) they
do this by copying the HDD image files over the IP network from the
"old host" to the "new host", transferring virtual RAM contents,
replumbing virtual networks, and resuming execution "from the same
point" - after just a second-long hiccup to finalize the migration.
With clustered VMFS on shared storage, VMware can migrate VMs faster:
it knows not to copy the HDD image file in vain, since the file is
equally available to the "new host" at the right point in the
migration, just as it was accessible to the "old host".

This is what I hoped to reimplement with VirtualBox, Xen or KVM running
on OpenSolaris derivatives (such as OpenIndiana), with the proposed
"ZFS clustering" using each HDD wholly as an individual LUN, aggregated
into a ZFS pool by the servers themselves.
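To make that concrete, the simplest-case setup might look like the
sketch below on whichever blade assembles the pool. The device names
and layout are invented, and this deliberately ignores the hard part -
arbitrating which host may import and write to the pool at any moment:

# zpool create vmpool mirror c0t0d0 c0t1d0 mirror c0t2d0 c0t3d0
  (each cXtYd0 is one whole HDD exposed by the chassis controller as a
  pass-through LUN visible to all six blades)
# zfs create vmpool/vm-images
  (parent dataset under which each VM gets its own filesystem or zvol)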
For many cases this would also be cheaper, with OpenIndiana and free
hypervisors ;)

As was rightfully noted, with a common ZFS pool as the underlying
storage (as happens in current Sun VDI solutions using a ZFS NAS), VM
image clones can be instantiated quickly and with little use of
resources - cheaper and faster than copying a golden image (see the
snapshot/clone sketch below).

Now, at the risk of being accused of pushing some "marketing" through
the discussion list, I have to state that these servers are relatively
cheap compared to 6 single-unit servers of comparable configuration
plus dual managed ethernet switches plus a SAN with 14 disks and dual
storage controllers. Price is an important factor in many of our
deployments, where these boxes work stand-alone. This usually starts
with a POC, when a pre-configured basic MFSYS with some VMs of our
software arrives at a customer, gets tailored, and works like a "black
box". In a year or so an upgrade may come in the form of added disks,
server blades and RAM. I have never heard even a discussion of adding
external storage - too pricey, and often useless with relatively fixed
VM sizes - hence my desire to get a single ZFS pool available to all
the blades equally. While dedicated storage boxes may be good and
great, they would bump the solution price by orders of magnitude
(StorEdge 7000 series) and are generally out of the question for our
limited deployments.

Thanks to Nico for the concerns about POSIX locking. Hopefully, though,
in the use case I described - serving images of VMs in a manner where
storage, access and migration are efficient - whole datasets (
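The quick instantiation mentioned above is plain ZFS snapshot plus
clone; a minimal sketch, with invented dataset names:

# zfs snapshot vmpool/vm-images/golden@v1
  (freeze the golden image)
# zfs clone vmpool/vm-images/golden@v1 vmpool/vm-images/vm01
  (vm01 starts life as a copy-on-write view of the golden image)

The clone shares all unmodified blocks with its origin snapshot, so a
new VM costs near-zero time and space until it diverges.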
Re: [zfs-discuss] weird bug with Seagate 3TB USB3 drive
In message <4e970387.3040...@oracle.com>, Cindy Swearingen writes:
> Any USB-related messages in /var/adm/messages for this device?

Negative.

cfgadm(1M) shows the drive, and format->fdisk->analyze->read runs
merrily.

John
groenv...@acm.org
Re: [zfs-discuss] weird bug with Seagate 3TB USB3 drive
In message <201110131150.p9dbo8yk011...@acsinet22.oracle.com>,
Casper.Dik@oracle.com writes:
> What is the partition table?

I thought about that, so I reproduced with the legacy SMI label and a
Solaris fdisk partition with ZFS on slice 0. Same result as EFI; once I
export the pool I cannot import it.

John
groenv...@acm.org
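Spelled out, the reproduction is roughly the following; the relabeling
is done interactively inside format(1M), so the first step is
shorthand, and the final error is the same one quoted elsewhere in this
thread:

# format -e c1t0d0
  (write an SMI label, create a Solaris fdisk partition, put the whole
  disk in slice 0)
# zpool create foo c1t0d0s0
# zpool export foo
# zpool import foo
cannot import 'foo': no such pool available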
Re: [zfs-discuss] weird bug with Seagate 3TB USB3 drive
John,

Any USB-related messages in /var/adm/messages for this device?

Thanks,
Cindy

On 10/12/11 11:29, John D Groenveld wrote:
> In message <4e95cb2a.30...@oracle.com>, Cindy Swearingen writes:
>> What is the error when you attempt to import this pool?
>
> "cannot import 'foo': no such pool available"
>
> John
> groenv...@acm.org
>
> # format -e
> Searching for disks...done
>
> AVAILABLE DISK SELECTIONS:
>        0. c1t0d0
>           /pci@0,0/pci108e,6676@2,1/hub@7/storage@2/disk@0,0
>        1. c8t0d0
>           /pci@0,0/pci108e,6676@5/disk@0,0
>        2. c8t1d0
>           /pci@0,0/pci108e,6676@5/disk@1,0
> Specify disk (enter its number): ^C
>
> # zpool create foo c1t0d0
> # zfs create foo/bar
> # zfs list -r foo
> NAME      USED  AVAIL  REFER  MOUNTPOINT
> foo       126K  2.68T    32K  /foo
> foo/bar    31K  2.68T    31K  /foo/bar
> # zpool export foo
> # zfs list -r foo
> cannot open 'foo': dataset does not exist
>
> # truss -t open zpool import foo
> open("/var/ld/ld.config", O_RDONLY)  Err#2 ENOENT
> open("/lib/libumem.so.1", O_RDONLY)  = 3
> open("/lib/libc.so.1", O_RDONLY)  = 3
> open("/lib/libzfs.so.1", O_RDONLY)  = 3
> open("/usr/lib/fm//libtopo.so", O_RDONLY)  = 3
> open("/lib/libxml2.so.2", O_RDONLY)  = 3
> open("/lib/libpthread.so.1", O_RDONLY)  = 3
> open("/lib/libz.so.1", O_RDONLY)  = 3
> open("/lib/libm.so.2", O_RDONLY)  = 3
> open("/lib/libsocket.so.1", O_RDONLY)  = 3
> open("/lib/libnsl.so.1", O_RDONLY)  = 3
> open("/usr/lib//libshare.so.1", O_RDONLY)  = 3
> open("/usr/lib/locale/en_US.UTF-8/LC_MESSAGES/SUNW_OST_SGS.mo", O_RDONLY)  Err#2 ENOENT
> open("/usr/lib/locale/en_US.UTF-8/LC_MESSAGES/SUNW_OST_OSLIB.mo", O_RDONLY)  Err#2 ENOENT
> open("/usr/lib/locale/en_US.UTF-8/en_US.UTF-8.so.3", O_RDONLY)  = 3
> open("/usr/lib/locale/en_US.UTF-8/methods_unicode.so.3", O_RDONLY)  = 3
> open("/dev/zfs", O_RDWR)  = 3
> open("/etc/mnttab", O_RDONLY)  = 4
> open("/etc/dfs/sharetab", O_RDONLY)  = 5
> open("/lib/libavl.so.1", O_RDONLY)  = 6
> open("/lib/libnvpair.so.1", O_RDONLY)  = 6
> open("/lib/libuutil.so.1", O_RDONLY)  = 6
> open64("/dev/rdsk/", O_RDONLY)  = 6
> /3: openat64(6, "c8t0d0s0", O_RDONLY)  = 9
> /3: open("/lib/libadm.so.1", O_RDONLY)  = 15
> /9: openat64(6, "c8t0d0s2", O_RDONLY)  = 13
> /5: openat64(6, "c8t1d0s0", O_RDONLY)  = 10
> /7: openat64(6, "c8t1d0s2", O_RDONLY)  = 14
> /8: openat64(6, "c1t0d0s0", O_RDONLY)  = 7
> /4: openat64(6, "c1t0d0s2", O_RDONLY)  Err#5 EIO
> /8: open("/lib/libefi.so.1", O_RDONLY)  = 15
> /3: openat64(6, "c1t0d0", O_RDONLY)  = 9
> /5: openat64(6, "c1t0d0p0", O_RDONLY)  = 10
> /9: openat64(6, "c1t0d0p1", O_RDONLY)  = 13
> /7: openat64(6, "c1t0d0p2", O_RDONLY)  Err#5 EIO
> /4: openat64(6, "c1t0d0p3", O_RDONLY)  Err#5 EIO
> /7: openat64(6, "c1t0d0s8", O_RDONLY)  = 14
> /2: openat64(6, "c7t0d0s0", O_RDONLY)  = 8
> /6: openat64(6, "c7t0d0s2", O_RDONLY)  = 12
> /1: Received signal #20, SIGWINCH, in lwp_park() [default]
> /3: openat64(6, "c7t0d0p0", O_RDONLY)  = 9
> /4: openat64(6, "c7t0d0p1", O_RDONLY)  = 11
> /5: openat64(6, "c7t0d0p2", O_RDONLY)  = 10
> /6: openat64(6, "c8t0d0p0", O_RDONLY)  = 12
> /6: openat64(6, "c8t0d0p1", O_RDONLY)  = 12
> /6: openat64(6, "c8t0d0p2", O_RDONLY)  Err#5 EIO
> /6: openat64(6, "c8t0d0p3", O_RDONLY)  Err#5 EIO
> /6: openat64(6, "c8t0d0p4", O_RDONLY)  Err#5 EIO
> /6: openat64(6, "c8t1d0p0", O_RDONLY)  = 12
> /8: openat64(6, "c7t0d0p3", O_RDONLY)  = 7
> /6: openat64(6, "c8t1d0p1", O_RDONLY)  = 12
> /6: openat64(6, "c8t1d0p2", O_RDONLY)  Err#5 EIO
> /6: openat64(6, "c8t1d0p3", O_RDONLY)  Err#5 EIO
> /6: openat64(6, "c8t1d0p4", O_RDONLY)  Err#5 EIO
> /9: openat64(6, "c7t0d0p4", O_RDONLY)  = 13
> /7: openat64(6, "c7t0d0s1", O_RDONLY)  = 14
> /1: open("/usr/share/locale/en_US.UTF-8/LC_MESSAGES/SUNW_OST_OSCMD.cat", O_RDONLY)  Err#2 ENOENT
> open("/usr/lib/locale/en_US.UTF-8/LC_MESSAGES/SUNW_OST_OSCMD.mo", O_RDONLY)  Err#2 ENOENT
> cannot import 'foo': no such pool available
Re: [zfs-discuss] weird bug with Seagate 3TB USB3 drive
>> From: casper@oracle.com [mailto:casper@oracle.com]
>>
>> What is the partition table?
>
> He also said this...
>
>> -Original Message-
>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of John D Groenveld
>>
>> # zpool create foo c1t0d0
>
> Which, to me, suggests no partition table.

An EFI partition table (there needs to be some form of label, so there
is always a partition table).

Casper
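The label that "zpool create" wrote can be dumped with the usual
Solaris tools; a sketch only, with the device names taken from John's
transcript:

# prtvtoc /dev/rdsk/c1t0d0s0
  (prints the VTOC or EFI partition table)
# fdisk -W - /dev/rdsk/c1t0d0p0
  (dumps the x86 fdisk table, if any)
# zdb -l /dev/rdsk/c1t0d0s0
  (dumps the four ZFS vdev labels on the slice, if readable)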
Re: [zfs-discuss] weird bug with Seagate 3TB USB3 drive
> From: casper@oracle.com [mailto:casper@oracle.com]
>
> What is the partition table?

He also said this...

> -Original Message-
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of John D Groenveld
>
> # zpool create foo c1t0d0

Which, to me, suggests no partition table.
Re: [zfs-discuss] weird bug with Seagate 3TB USB3 drive
>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Cindy Swearingen
>>
>> In the steps below, you're missing a zpool import step.
>> I would like to see the error message when the zpool import
>> step fails.
>
> I see him doing this...
>
>> > # truss -t open zpool import foo
>
> The following lines are informative, sort of.
>
>> > /8: openat64(6, "c1t0d0s0", O_RDONLY)  = 7
>> > /4: openat64(6, "c1t0d0s2", O_RDONLY)  Err#5 EIO
>
> And the output result is:
>
>> > cannot import 'foo': no such pool available

What is the partition table?

Casper
Re: [zfs-discuss] weird bug with Seagate 3TB USB3 drive
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Cindy Swearingen
>
> In the steps below, you're missing a zpool import step.
> I would like to see the error message when the zpool import
> step fails.

I see him doing this...

> > # truss -t open zpool import foo

The following lines are informative, sort of.

> > /8: openat64(6, "c1t0d0s0", O_RDONLY)  = 7
> > /4: openat64(6, "c1t0d0s2", O_RDONLY)  Err#5 EIO

And the output result is:

> > cannot import 'foo': no such pool available
Re: [zfs-discuss] commercial zfs-based storage replication software?
On 10/13/11 09:27, Fajar A. Nugraha wrote:
> On Tue, Oct 11, 2011 at 5:26 PM, Darren J Moffat wrote:
>> Have you looked at the time-slider functionality that is already in
>> Solaris ?
>
> Hi Darren. Is it available for Solaris 10? I just installed Solaris 10
> u10 and couldn't find it.

No it is not.

>> There is a GUI for configuration of the snapshots
>
> the screenshots that I can find all refer to opensolaris
>
>> and time-slider can be configured to do a 'zfs send' or 'rsync'. The
>> GUI doesn't have the ability to set the 'zfs recv' command but that
>> is set one-time in the SMF service properties.
>
> Is there a reference on how to get/install this functionality on
> Solaris 10?

No because it doesn't exist on Solaris 10.

--
Darren J Moffat
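On Solaris 10 the replication step that time-slider automates can still
be scripted by hand with zfs send/recv; a minimal sketch, with pool,
snapshot and host names invented for illustration:

# zfs snapshot tank/data@2011-10-12
# zfs send tank/data@2011-10-12 | ssh backuphost zfs recv backup/data
  (initial full copy to the remote pool)
# zfs snapshot tank/data@2011-10-13
# zfs send -i tank/data@2011-10-12 tank/data@2011-10-13 | \
      ssh backuphost zfs recv -F backup/data
  (incremental update; assumes the previous snapshot already exists on
  the receiving side)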
Re: [zfs-discuss] commercial zfs-based storage replication software?
On Tue, Oct 11, 2011 at 5:26 PM, Darren J Moffat wrote:
> Have you looked at the time-slider functionality that is already in
> Solaris ?

Hi Darren. Is it available for Solaris 10? I just installed Solaris 10
u10 and couldn't find it.

> There is a GUI for configuration of the snapshots

The screenshots that I can find all refer to opensolaris.

> and time-slider can be configured to do a 'zfs send' or 'rsync'. The
> GUI doesn't have the ability to set the 'zfs recv' command but that is
> set one-time in the SMF service properties.

Is there a reference on how to get/install this functionality on
Solaris 10?

Thanks,
Fajar