Re: [zfs-discuss] Wanted: sanity check for a clustered ZFS idea

2011-10-13 Thread Jim Klimov

Hello all,


Definitely not impossible, but please work on the business case.
Remember, it is easier to build hardware than software, so your
software solution must be sufficiently advanced to not be obsoleted
by the next few hardware generations.
  -- richard


I guess Richard was right about needing a use-case description -
I should detail what I'm thinking about and give some illustration.
Coming from a software company, though, I tend to think of software
as the more flexible part of the equation: it is the part we have a
chance to change, while the hardware is whatever is handed down to
us from above, and we use it for years...

When thinking about the problem and its real-world applications,
I have in mind blade-server farms like the Intel MFSYS25, which
include relatively large internal storage and can optionally take
external SAS storage. We use such server farms as self-contained
units (a single chassis plugged into the customer's network) for a
number of projects, and recently more and more of these deployments
have become VMware ESX farms with shared VMFS. Given my stronger
love for things Solaris, I would love to see ZFS and any of the
Solaris-based hypervisors (VirtualBox, Xen or KVM ports) running
there instead. But for things to be as efficient, ZFS would have
to become shared - clustered...

I think I should elaborate on this hardware, as it tends to be our
major use case, and thus a constraint which shapes my approach to
clustered ZFS and my view of which shortcuts are appropriate.

These boxes have a shared chassis that accommodates 6 server
blades, each with 2 CPUs and 2 or 4 gigabit Ethernet ports. The
chassis also has single or dual Ethernet switches to interlink the
servers and to connect to the outside world (10 external ports
each), as well as single or dual storage controllers and 14
internal HDD bays. External SAS boxes can also be attached to the
storage controller modules, but I haven't yet seen real setups
like that.

In normal "Intel usecase", the controller(s) implement several
RAID LUNs which are accessible to the servers via SAS
(with MPIO in case of dual controllers). Usually these LUNs
are dedicated to servers - for example, boot/OS volumes.

With an additional license from Intel, shared LUNs can be
implemented on the chassis. These are primarily aimed at VMware
farms with clustered VMFS, so that the available disk space (and
the aggregate bandwidth of multiple spindles) is used more
efficiently, and to aid VM migration.

To be clear: modern VM hypervisors can migrate running virtual
machines between two VM hosts.

Usually (with dedicated storage on each server host) they do this
by copying the HDD image files over the IP network from the "old
host" to the "new host", transferring the virtual RAM contents,
replumbing the virtual networks and resuming execution "from the
same point" - after just a second-long hiccup while the running
VM's migration is finalized.

With clustered VMFS on shared storage, VMware can migrate VMs
faster - it knows not to copy the HDD image file in vain, because
the file is equally available to the "new host" at the right point
in the migration, just as it was accessible to the "old host".

This is what I had hoped to reimplement with VirtualBox or Xen or
KVM running on OpenSolaris derivatives (such as OpenIndiana and
others), with the proposed "ZFS clustering" using each HDD wholly
as an individual LUN, aggregated into a ZFS pool by the servers
themselves. In many cases this would also be cheaper, with
OpenIndiana and free hypervisors ;)
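
(To illustrate - just a sketch with hypothetical device names,
where each cXtYdZ stands for one whole internal disk exported by
the chassis controller as a shared LUN visible to every blade:

# zpool create vmpool \
    mirror c2t0d0 c2t1d0 \
    mirror c2t2d0 c2t3d0 \
    mirror c2t4d0 c2t5d0
# zfs create vmpool/images

The point is that redundancy and layout live in ZFS on the blades
themselves rather than in the RAID controller; the clustering part -
arbitrating which blade may write where - is exactly the piece that
does not exist today.)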

As was rightfully noted, with a common ZFS pool as the underlying
storage (as in current Sun VDI solutions using a ZFS NAS), VM image
clones can be instantiated quickly and with little resource cost -
cheaper and faster than copying a golden image.
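
(Again only a sketch, with made-up dataset names: the golden image
is snapshotted once and each VM gets a clone, which shares all
unmodified blocks with the snapshot:

# zfs snapshot vmpool/images/golden@v1
# zfs clone vmpool/images/golden@v1 vmpool/images/vm01
# zfs clone vmpool/images/golden@v1 vmpool/images/vm02

so a new VM costs essentially no time or space up front.)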

Now, at the risk of being accused of pushing "marketing" through
the discussion list, I have to state that these servers are
relatively cheap (compared to 6 single-unit servers of comparable
configuration, plus dual managed Ethernet switches, plus a SAN with
14 disks and dual storage controllers). Price is an important
factor in many of our deployments, where these boxes work
stand-alone.

This usually starts with a POC, when a pre-configured basic MFSYS
with some VMs of our software arrives at the customer's site, gets
tailored and then runs as a "black box". A year or so later an
upgrade may come in the form of added disks, server blades and RAM.
I have never even heard discussions of adding external storage -
too pricey, and often pointless with relatively fixed VM sizes -
hence my desire to make a single ZFS pool available to all the
blades equally. While dedicated storage boxes may be great, they
would bump the solution price by orders of magnitude (StorEdge 7000
series) and are generally out of the question for our limited
deployments.

Thanks to Nico for the concerns about POSIX locking. Hopefully,
though, in the use case I described - serving VM images in a manner
where storage, access and migration are efficient - whole
datasets (

Re: [zfs-discuss] weird bug with Seagate 3TB USB3 drive

2011-10-13 Thread John D Groenveld
In message <4e970387.3040...@oracle.com>, Cindy Swearingen writes:
>Any USB-related messages in /var/adm/messages for this device?

Negative.
cfgadm(1M) shows the drive and format->fdisk->analyze->read
runs merrily.
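
(i.e., roughly:

# cfgadm -al
# format -e c1t0d0
  -> fdisk -> analyze -> read

with no errors or retries reported along the way; the device name
is the same one as in my earlier test.)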

John
groenv...@acm.org


Re: [zfs-discuss] weird bug with Seagate 3TB USB3 drive

2011-10-13 Thread John D Groenveld
In message <201110131150.p9dbo8yk011...@acsinet22.oracle.com>, Casper.Dik@oracle.com writes:
>What is the partition table?

I thought about that so I reproduced with the legacy SMI label
and a Solaris fdisk partition with ZFS on slice 0.
Same result as EFI; once I export the pool I cannot import it.
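
(For the record, the reproduction was along these lines - the
format menu steps are abbreviated and the slice naming is assumed:

# format -e c1t0d0      [label -> SMI, fdisk -> Solaris2 partition]
# zpool create foo c1t0d0s0
# zpool export foo
# zpool import foo
cannot import 'foo': no such pool available
)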

John
groenv...@acm.org


Re: [zfs-discuss] weird bug with Seagate 3TB USB3 drive

2011-10-13 Thread Cindy Swearingen

John,

Any USB-related messages in /var/adm/messages for this device?

Thanks,

Cindy

On 10/12/11 11:29, John D Groenveld wrote:

In message <4e95cb2a.30...@oracle.com>, Cindy Swearingen writes:

What is the error when you attempt to import this pool?


"cannot import 'foo': no such pool available"
John
groenv...@acm.org

# format -e
Searching for disks...done


AVAILABLE DISK SELECTIONS:
   0. c1t0d0 
  /pci@0,0/pci108e,6676@2,1/hub@7/storage@2/disk@0,0
   1. c8t0d0 
  /pci@0,0/pci108e,6676@5/disk@0,0
   2. c8t1d0 
  /pci@0,0/pci108e,6676@5/disk@1,0
Specify disk (enter its number): ^C
# zpool create foo c1t0d0
# zfs create foo/bar
# zfs list -r foo
NAME  USED  AVAIL  REFER  MOUNTPOINT
foo   126K  2.68T32K  /foo
foo/bar31K  2.68T31K  /foo/bar
# zpool export foo
# zfs list -r foo
cannot open 'foo': dataset does not exist
# truss -t open zpool import foo
open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT
open("/lib/libumem.so.1", O_RDONLY) = 3
open("/lib/libc.so.1", O_RDONLY)= 3
open("/lib/libzfs.so.1", O_RDONLY)  = 3
open("/usr/lib/fm//libtopo.so", O_RDONLY)   = 3
open("/lib/libxml2.so.2", O_RDONLY) = 3
open("/lib/libpthread.so.1", O_RDONLY)  = 3
open("/lib/libz.so.1", O_RDONLY)= 3
open("/lib/libm.so.2", O_RDONLY)= 3
open("/lib/libsocket.so.1", O_RDONLY)   = 3
open("/lib/libnsl.so.1", O_RDONLY)  = 3
open("/usr/lib//libshare.so.1", O_RDONLY)   = 3
open("/usr/lib/locale/en_US.UTF-8/LC_MESSAGES/SUNW_OST_SGS.mo", O_RDONLY) Err#2 
ENOENT
open("/usr/lib/locale/en_US.UTF-8/LC_MESSAGES/SUNW_OST_OSLIB.mo", O_RDONLY) 
Err#2 ENOENT
open("/usr/lib/locale/en_US.UTF-8/en_US.UTF-8.so.3", O_RDONLY) = 3
open("/usr/lib/locale/en_US.UTF-8/methods_unicode.so.3", O_RDONLY) = 3
open("/dev/zfs", O_RDWR)= 3
open("/etc/mnttab", O_RDONLY)   = 4
open("/etc/dfs/sharetab", O_RDONLY) = 5
open("/lib/libavl.so.1", O_RDONLY)  = 6
open("/lib/libnvpair.so.1", O_RDONLY)   = 6
open("/lib/libuutil.so.1", O_RDONLY)= 6
open64("/dev/rdsk/", O_RDONLY)  = 6
/3: openat64(6, "c8t0d0s0", O_RDONLY)   = 9
/3: open("/lib/libadm.so.1", O_RDONLY)  = 15
/9: openat64(6, "c8t0d0s2", O_RDONLY)   = 13
/5: openat64(6, "c8t1d0s0", O_RDONLY)   = 10
/7: openat64(6, "c8t1d0s2", O_RDONLY)   = 14
/8: openat64(6, "c1t0d0s0", O_RDONLY)   = 7
/4: openat64(6, "c1t0d0s2", O_RDONLY)   Err#5 EIO
/8: open("/lib/libefi.so.1", O_RDONLY)  = 15
/3: openat64(6, "c1t0d0", O_RDONLY) = 9
/5: openat64(6, "c1t0d0p0", O_RDONLY)   = 10
/9: openat64(6, "c1t0d0p1", O_RDONLY)   = 13
/7: openat64(6, "c1t0d0p2", O_RDONLY)   Err#5 EIO
/4: openat64(6, "c1t0d0p3", O_RDONLY)   Err#5 EIO
/7: openat64(6, "c1t0d0s8", O_RDONLY)   = 14
/2: openat64(6, "c7t0d0s0", O_RDONLY)   = 8
/6: openat64(6, "c7t0d0s2", O_RDONLY)   = 12
/1: Received signal #20, SIGWINCH, in lwp_park() [default]
/3: openat64(6, "c7t0d0p0", O_RDONLY)   = 9
/4: openat64(6, "c7t0d0p1", O_RDONLY)   = 11
/5: openat64(6, "c7t0d0p2", O_RDONLY)   = 10
/6: openat64(6, "c8t0d0p0", O_RDONLY)   = 12
/6: openat64(6, "c8t0d0p1", O_RDONLY)   = 12
/6: openat64(6, "c8t0d0p2", O_RDONLY)   Err#5 EIO
/6: openat64(6, "c8t0d0p3", O_RDONLY)   Err#5 EIO
/6: openat64(6, "c8t0d0p4", O_RDONLY)   Err#5 EIO
/6: openat64(6, "c8t1d0p0", O_RDONLY)   = 12
/8: openat64(6, "c7t0d0p3", O_RDONLY)   = 7
/6: openat64(6, "c8t1d0p1", O_RDONLY)   = 12
/6: openat64(6, "c8t1d0p2", O_RDONLY)   Err#5 EIO
/6: openat64(6, "c8t1d0p3", O_RDONLY)   Err#5 EIO
/6: openat64(6, "c8t1d0p4", O_RDONLY)   Err#5 EIO
/9: openat64(6, "c7t0d0p4", O_RDONLY)   = 13
/7: openat64(6, "c7t0d0s1", O_RDONLY)   = 14
/1: open("/usr/share/locale/en_US.UTF-8/LC_MESSAGES/SUNW_OST_OSCMD.cat", 
O_RDONLY) Err#2 ENOENT
open("/usr/lib/locale/en_US.UTF-8/LC_MESSAGES/SUNW_OST_OSCMD.mo", O_RDONLY) 
Err#2 ENOENT
cannot import 'foo': no such pool available


Re: [zfs-discuss] weird bug with Seagate 3TB USB3 drive

2011-10-13 Thread Casper . Dik

>> From: casper@oracle.com [mailto:casper@oracle.com]
>> 
>> What is the partition table?
>
>He also said this...
>
>
>> -Original Message-
>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of John D Groenveld
>>
>> # zpool create foo c1t0d0
>
>Which, to me, suggests no partition table.

An EFI partition table (there needs to be some form of label so there
is always a partition table).
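
(If the device names from the earlier truss output still apply,
the quickest way to answer that is probably:

# prtvtoc /dev/rdsk/c1t0d0
# zdb -l /dev/rdsk/c1t0d0s0

prtvtoc prints the EFI or SMI partition map, and zdb -l dumps the
ZFS labels on the slice, which also shows whether the pool metadata
is visible on that path at all. The device paths here are my
assumption based on the earlier output.)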

Casper



Re: [zfs-discuss] weird bug with Seagate 3TB USB3 drive

2011-10-13 Thread Edward Ned Harvey
> From: casper@oracle.com [mailto:casper@oracle.com]
> 
> What is the partition table?

He also said this...


> -Original Message-
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of John D Groenveld
>
> # zpool create foo c1t0d0

Which, to me, suggests no partition table.



Re: [zfs-discuss] weird bug with Seagate 3TB USB3 drive

2011-10-13 Thread Casper . Dik

>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Cindy Swearingen
>> 
>> In the steps below, you're missing a zpool import step.
>> I would like to see the error message when the zpool import
>> step fails.
>
>I see him doing this...
>
>
>> > # truss -t open zpool import foo
>
>The following lines are informative, sort of.
>
>
>> > /8: openat64(6, "c1t0d0s0", O_RDONLY)   = 7
>> > /4: openat64(6, "c1t0d0s2", O_RDONLY)   Err#5 EIO
>
>And the output result is:
>
>
>> > cannot import 'foo': no such pool available
>

What is the partition table?

Casper



Re: [zfs-discuss] weird bug with Seagate 3TB USB3 drive

2011-10-13 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Cindy Swearingen
> 
> In the steps below, you're missing a zpool import step.
> I would like to see the error message when the zpool import
> step fails.

I see him doing this...


> > # truss -t open zpool import foo

The following lines are informative, sort of.


> > /8: openat64(6, "c1t0d0s0", O_RDONLY)   = 7
> > /4: openat64(6, "c1t0d0s2", O_RDONLY)   Err#5 EIO

And the output result is:


> > cannot import 'foo': no such pool available




Re: [zfs-discuss] commercial zfs-based storage replication software?

2011-10-13 Thread Darren J Moffat

On 10/13/11 09:27, Fajar A. Nugraha wrote:

On Tue, Oct 11, 2011 at 5:26 PM, Darren J Moffat
  wrote:

Have you looked at the time-slider functionality that is already in Solaris
?


Hi Darren. Is it available for Solaris 10? I just installed Solaris 10
u10 and couldn't find it.


No it is not.


There is a GUI for configuration of the snapshots


the screenshots that I can find all refer to opensolaris


and time-slider can be
configured to do a 'zfs send' or 'rsync'.  The GUI doesn't have the ability
to set the 'zfs recv' command but that is set one-time in the SMF service
properties.
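
(From memory - treat the FMRI and property names below as
assumptions to be verified with svcprop, not as documentation:

# svcs -a | grep time-slider
# svcprop svc:/application/time-slider/plugin:zfs-send
# svccfg -s svc:/application/time-slider/plugin:zfs-send \
    setprop <pg/prop> = astring: '<your receive command or wrapper>'
# svcadm refresh svc:/application/time-slider/plugin:zfs-send

The receive side ends up being whatever command you configure
there, typically an ssh wrapper around 'zfs recv'.)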


Is there a reference on how to get/install this functionality on Solaris 10?


No because it doesn't exist on Solaris 10.

--
Darren J Moffat


Re: [zfs-discuss] commercial zfs-based storage replication software?

2011-10-13 Thread Fajar A. Nugraha
On Tue, Oct 11, 2011 at 5:26 PM, Darren J Moffat
 wrote:
> Have you looked at the time-slider functionality that is already in Solaris
> ?

Hi Darren. Is it available for Solaris 10? I just installed Solaris 10
u10 and couldn't find it.

>
> There is a GUI for configuration of the snapshots

the screenshots that I can find all refer to opensolaris

> and time-slider can be
> configured to do a 'zfs send' or 'rsync'.  The GUI doesn't have the ability
> to set the 'zfs recv' command but that is set one-time in the SMF service
> properties.

Is there a reference on how to get/install this functionality on Solaris 10?

Thanks,

Fajar