Re: [zfs-discuss] Thumper Origins Q

2007-01-24 Thread Bryan Cantrill

On Wed, Jan 24, 2007 at 12:15:21AM -0700, Jason J. W. Williams wrote:
 Wow. That's an incredibly cool story. Thank you for sharing it! Does
 the Thumper today pretty much resemble what you saw then?

Yes, amazingly so:  4-way, 48 spindles, 4u.  The real beauty of the
match between ZFS and Thumper was (and is) that ZFS unlocks new economics
in storage -- smart software achieving high performance and ultra-high
reliability with dense, cheap hardware -- and that Thumper was (and is)
the physical embodiment of those economics.   And without giving away
too much of our future roadmap, suffice it to say that one should expect
much, much more from Sun in this vein:  innovative software and innovative
hardware working together to deliver world-beating systems with undeniable
economics.

And actually, as long as we're talking history, you might be interested to
know the story behind the name Thumper:  Fowler initially suggested the
name as something of a joke, but, as often happens with Fowler,  he tells a
joke with a straight face once too many to one person too many, and next
thing you know it's the plan of record.  I had suggested the name Humper
for the server that became Andromeda (the x8000 series) -- so you could
order a datacenter by asking for (say) two Humpers and five Thumpers.
(And I loved the idea of asking "would you like a Humper for your Thumper?")
But Fowler said the name was too risque (!).  Fortunately the name
Thumper stuck...

- Bryan

--
Bryan Cantrill, Solaris Kernel Development.   http://blogs.sun.com/bmc
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?

2007-01-24 Thread Darren J Moffat

Rainer Heilke wrote:

For the clone another system zfs send/recv might be
useful


Keeping in mind that you only want to send/recv one half of the ZFS mirror...


Huh ?

That doesn't make any sense.  You can't send half a mirror.  When you 
run zfs send it is a read, and ZFS will read the data from all 
available mirrors to help performance.  When it is zfs recv it will 
write to all sides of the mirror on the destination.
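
For illustration, a minimal sketch of sending a copy of a dataset to another
system (the pool, dataset and host names here are hypothetical, not from the
original post):

  # take a snapshot of the dataset and stream it to the other system;
  # send/recv operates on the dataset, never on an individual mirror device
  zfs snapshot tank/data@clone1
  zfs send tank/data@clone1 | ssh otherhost zfs recv tank/data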


What are you actually trying to say here ?

--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thumper Origins Q

2007-01-24 Thread Casper . Dik

Actually, it was meant to hold the entire electronic transcript of the 
George Bush impeachment proceedings ... we were thinking ahead.

Fortunately, larger disks became available in time.

Casper
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thumper Origins Q

2007-01-24 Thread Chris Ridd
On 24/1/07 9:06, Bryan Cantrill [EMAIL PROTECTED] wrote:

 But Fowler said the name was too risque (!).  Fortunately the name
 Thumper stuck...

I assumed it was a reference to Bambi... That's what comes from having small
children :-)

Cheers,

Chris


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thumper Origins Q

2007-01-24 Thread Roland Rambau

Chris,

well, Thumper is actually a reference to Bambi.

The comment about being risque was referring to Humper as
a codename proposed for a related server
(and e.g. leo.org confirms that it has a meaning labelled as [vulg.] :-)

  -- Roland


Chris Ridd schrieb:

On 24/1/07 9:06, Bryan Cantrill [EMAIL PROTECTED] wrote:


But Fowler said the name was too risque (!).  Fortunately the name
Thumper stuck...


I assumed it was a reference to Bambi... That's what comes from having small
children :-)

Cheers,

Chris


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS direct IO

2007-01-24 Thread Roch - PAE
[EMAIL PROTECTED] writes:
   Note also that for most applications, the size of their IO operations
   would often not match the current page size of the buffer, causing
   additional performance and scalability issues.
  
  Thanks for mentioning this, I forgot about it.
  
  Since ZFS's default block size is configured to be larger than a page,
  the application would have to issue page-aligned block-sized I/Os.
  Anyone adjusting the block size would presumably be responsible for
  ensuring that the new size is a multiple of the page size.  (If they
  would want Direct I/O to work...)
  
  I believe UFS also has a similar requirement, but I've been wrong
  before.
  

I believe the UFS requirement is that the I/O be sector
aligned for DIO to be attempted. And Anton did mention that
one of the benefits of DIO is the ability to direct-read a
subpage block. Without UFS/DIO the OS is required to read and
cache the full page, and the extra amount of I/O may lead to
data channel saturation (I don't see latency as an issue
here, right?).
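
(For comparison, a rough sketch of how direct I/O is usually requested on UFS
today -- the device and mount point below are made-up examples, and ZFS
currently has no equivalent knob:

  # mount a UFS filesystem with direct I/O forced for all files on it
  mount -F ufs -o forcedirectio /dev/dsk/c1t0d0s6 /export/dbdata

Applications can also request it per file with directio(3C).)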

This is where I said that such a feature would translate
for ZFS into the ability to read parts of a filesystem block, 
which would only make sense if checksums are disabled.

And for RAID-Z that could mean avoiding I/Os to every disk but 
one in a group, so that's a nice benefit.

So for the performance-minded customer that can't afford
mirroring, is not much of a fan of data integrity, and needs
to do sub-block reads against an uncacheable workload, I can
see such a feature popping up. And this feature is independent of
whether or not the data is DMA'ed straight into the user
buffer.

The other feature is to avoid a bcopy by DMA'ing full
filesystem block reads straight into the user buffer (and verifying
the checksum after). The I/O is high latency; the bcopy adds a small
amount. The kernel memory can be freed/reused straight after
the user read completes. This is where I ask: how much CPU
is lost to the bcopy in workloads that benefit from DIO?
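
(One rough way to get a feel for that -- a sketch, not a rigorous
measurement -- is to sample kernel PCs with DTrace while the read workload
runs and see whether copy routines show up among the hottest functions:

  # sample the kernel at 997 Hz for 10 seconds, print the 20 hottest functions
  dtrace -n 'profile-997 /arg0/ { @hot[func(arg0)] = count(); }
             tick-10s { trunc(@hot, 20); exit(0); }'
)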

At this point, there are lots of projects that will lead to
performance improvements.  The DIO benefits seem like small
change in the context of ZFS.

The quickest return on investment I see for the directio
hint would be to tell ZFS not to grow the ARC when servicing
such requests.


-r



  -j
  
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] panic with zfs

2007-01-24 Thread Ihsan Dogan
Hello,

We're setting up a new mailserver infrastructure and decided to run it
on ZFS. On an E220R with a D1000, I've set up a storage pool with four
mirrors:

--
[EMAIL PROTECTED] # zpool status
  pool: pool0
 state: ONLINE
 scrub: none requested
config:

NAME STATE READ WRITE CKSUM
pool0ONLINE   0 0 0
  mirror ONLINE   0 0 0
c1t0d0   ONLINE   0 0 0
c5t8d0   ONLINE   0 0 0
  mirror ONLINE   0 0 0
c1t1d0   ONLINE   0 0 0
c5t9d0   ONLINE   0 0 0
  mirror ONLINE   0 0 0
c1t2d0   ONLINE   0 0 0
c5t10d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c1t3d0   ONLINE   0 0 0
c5t11d0  ONLINE   0 0 0

errors: No known data errors
--

Before we started to install any software on it, we got the idea to
see how ZFS behaves when something goes wrong. So we pulled out a disk
while a mkfile was running. What happened then was not what we
expected. The system was hanging for more than an hour and finally it
panicked:

--
Jan 23 18:49:26 newponit genunix: [ID 611667 kern.info] NOTICE: glm0:
got SCSI bus reset
Jan 23 18:50:36 newponit scsi: [ID 365881 kern.info] /[EMAIL 
PROTECTED],4000/[EMAIL PROTECTED]
(glm0):
Jan 23 18:50:36 newponitCmd (0x6a3ed10) dump for Target 1 Lun 0:
Jan 23 18:50:36 newponit scsi: [ID 365881 kern.info] /[EMAIL 
PROTECTED],4000/[EMAIL PROTECTED]
(glm0):
Jan 23 18:50:36 newponitcdb=[ 0x2a 0x0 0x2 0x1b 0x2c 0x93 0x0 
0x0 0x1
0x0 ]
Jan 23 18:50:36 newponit scsi: [ID 365881 kern.info] /[EMAIL 
PROTECTED],4000/[EMAIL PROTECTED]
(glm0):
Jan 23 18:50:36 newponitpkt_flags=0xc000 pkt_statistics=0x60 
pkt_state=0x7
Jan 23 18:50:36 newponit scsi: [ID 365881 kern.info] /[EMAIL 
PROTECTED],4000/[EMAIL PROTECTED]
(glm0):
Jan 23 18:50:36 newponitpkt_scbp=0x0 cmd_flags=0x1860
Jan 23 18:50:36 newponit scsi: [ID 107833 kern.warning] WARNING:
/[EMAIL PROTECTED],4000/[EMAIL PROTECTED] (glm0):
Jan 23 18:50:36 newponitDisconnected tagged cmd(s) (1) timeout for
Target 1.0
Jan 23 18:50:36 newponit genunix: [ID 408822 kern.info] NOTICE: glm0:
fault detected in device; service still available
Jan 23 18:50:36 newponit genunix: [ID 611667 kern.info] NOTICE: glm0:
Disconnected tagged cmd(s) (1) timeout for Target 1.0
Jan 23 18:50:36 newponit glm: [ID 401478 kern.warning] WARNING:
ID[SUNWpd.glm.cmd_timeout.6018]
Jan 23 18:50:36 newponit scsi: [ID 107833 kern.warning] WARNING:
/[EMAIL PROTECTED],4000/[EMAIL PROTECTED] (glm0):
Jan 23 18:50:36 newponitgot SCSI bus reset
Jan 23 18:50:36 newponit genunix: [ID 408822 kern.info] NOTICE: glm0:
fault detected in device; service still available
Jan 23 18:50:36 newponit genunix: [ID 611667 kern.info] NOTICE: glm0:
got SCSI bus reset
Jan 23 18:50:36 newponit scsi: [ID 107833 kern.warning] WARNING:
/[EMAIL PROTECTED],4000/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd0):
Jan 23 18:50:36 newponitSCSI transport failed: reason 'timeout': giving 
up
Jan 23 18:50:36 newponit md: [ID 312844 kern.warning] WARNING: md: state
database commit failed
Jan 23 18:50:36 newponit last message repeated 1 time
Jan 23 18:51:38 newponit unix: [ID 836849 kern.notice]
Jan 23 18:51:38 newponit ^Mpanic[cpu2]/thread=3e81600:
Jan 23 18:51:38 newponit unix: [ID 268973 kern.notice] md: Panic due to
lack of DiskSuite state
Jan 23 18:51:38 newponit  database replicas. Fewer than 50% of the total
were available,
Jan 23 18:51:38 newponit  so panic to ensure data integrity.
Jan 23 18:51:38 newponit unix: [ID 10 kern.notice]
Jan 23 18:51:38 newponit genunix: [ID 723222 kern.notice]
02a1003c1230 md:mddb_commitrec_wrapper+a8 (a, 3e81600, 18e9250,
12ecc00, 18e9000, 1)
Jan 23 18:51:38 newponit genunix: [ID 179002 kern.notice]   %l0-3:
0030  0002 06a8e6c8
Jan 23 18:51:38 newponit   %l4-7:  012ecf48
0002 012ecc00
Jan 23 18:51:39 newponit genunix: [ID 723222 kern.notice]
02a1003c12e0 md_mirror:mirror_mark_resync_region+290 (0, 0,
68dacc0, 68da980, 0, 1)
Jan 23 18:51:39 newponit genunix: [ID 179002 kern.notice]   %l0-3:
 068e9e80 0001 
Jan 23 18:51:39 newponit   %l4-7: 0001 
0183d400 0002
Jan 23 18:51:39 newponit genunix: [ID 723222 kern.notice]
02a1003c1390 md_mirror:mirror_write_strategy+5c0 (6885108, 0, 0,
0, 68dad20, 0)
Jan 23 18:51:39 newponit genunix: [ID 179002 kern.notice]   %l0-3:
 030c33b8 

Re: [zfs-discuss] panic with zfs

2007-01-24 Thread Michael Schuster

Ihsan Dogan wrote:

Hello,

We're setting up a new mailserver infrastructure and decided, to run it
on zfs. On a E220R with a D1000, I've setup a storage pool with four
mirrors:

--
[EMAIL PROTECTED] # zpool status
  pool: pool0
 state: ONLINE
 scrub: none requested
config:


[...]


Jan 23 18:51:38 newponit ^Mpanic[cpu2]/thread=3e81600:
Jan 23 18:51:38 newponit unix: [ID 268973 kern.notice] md: Panic due to
lack of DiskSuite state
Jan 23 18:51:38 newponit  database replicas. Fewer than 50% of the total
were available,
Jan 23 18:51:38 newponit  so panic to ensure data integrity.


this message shows (and the rest of the stack proves) that your panic 
happened in SVM. It has NOTHING to do with ZFS. So either you pulled the 
wrong disk, or the disk you pulled also contained SVM volumes (next to ZFS).




--
Michael SchusterSun Microsystems, Inc.
Recursion, n.: see 'Recursion'
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] panic with zfs

2007-01-24 Thread Dennis Clarke

 Hello,

 We're setting up a new mailserver infrastructure and decided, to run it
 on zfs. On a E220R with a D1000, I've setup a storage pool with four
 mirrors:

   Good morning Ihsan ...

   I see that you have everything mirrored here, that's excellent.

   When you pulled a disk, was it a disk that was containing a metadevice or
 was it a disk in the zpool ?  In the case of a metadevice, as you know, the
 system should have kept running fine.  We have probably both done this over
 and over at various sites to demonstrate SVM to people.

   If you pulled out a device in the zpool, well now we are in a whole new
 world and I had heard that there was some *feature* in Solaris now that
 will protect the ZFS file system integrity by simply causing a system to
 panic if the last device in some redundant component was compromised.

   I think you hit a major bug in ZFS personally.

Dennis

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] panic with zfs

2007-01-24 Thread Jason Banham

Afternoon,

The panic looks due to the fact that your SVM state databases aren't
all there, so when we came to update one of them we found there
was <= 50% of the state databases and crashed.

This doesn't look like anything to do with ZFS.
I'd check the output from metadb and see if it looks like
you've got an SVM database on a disk that's also in use by ZFS.
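
Something like the following (a sketch; the pool name is taken from the
original post) makes any overlap easy to spot:

  # show the SVM state database replicas and the devices they live on
  metadb -i
  # show the devices that make up the ZFS pool
  zpool status pool0
  # any c#t#d# device appearing in both outputs is shared between SVM and ZFS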


Jan 23 18:50:36 newponit 	SCSI transport failed: reason 'timeout':  
giving up
Jan 23 18:50:36 newponit md: [ID 312844 kern.warning] WARNING: md:  
state

database commit failed
Jan 23 18:50:36 newponit last message repeated 1 time
Jan 23 18:51:38 newponit unix: [ID 836849 kern.notice]
Jan 23 18:51:38 newponit ^Mpanic[cpu2]/thread=3e81600:
Jan 23 18:51:38 newponit unix: [ID 268973 kern.notice] md: Panic  
due to

lack of DiskSuite state
Jan 23 18:51:38 newponit  database replicas. Fewer than 50% of the  
total

were available,
Jan 23 18:51:38 newponit  so panic to ensure data integrity.




Regards,

Jason
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] panic with zfs

2007-01-24 Thread Ihsan Dogan
Hello Michael,

Am 24.1.2007 14:36 Uhr, Michael Schuster schrieb:

 --
 [EMAIL PROTECTED] # zpool status
   pool: pool0
  state: ONLINE
  scrub: none requested
 config:
 
 [...]
 
 Jan 23 18:51:38 newponit ^Mpanic[cpu2]/thread=3e81600:
 Jan 23 18:51:38 newponit unix: [ID 268973 kern.notice] md: Panic due to
 lack of DiskSuite state
 Jan 23 18:51:38 newponit  database replicas. Fewer than 50% of the total
 were available,
 Jan 23 18:51:38 newponit  so panic to ensure data integrity.
 
 this message shows (and the rest of the stack prove) that your panic
 happened in SVM. It has NOTHING to do with zfs. So either you pulled the
 wrong disk, or the disk you pulled also contained SVM volumes (next to
 ZFS).

I noticed that the panic was in SVM and I'm wondering why the machine
was hanging. SVM is only running on the internal disks (c0) and I pulled
a disk from the D1000:

Jan 23 17:24:14 newponit scsi: [ID 107833 kern.warning] WARNING:
/[EMAIL PROTECTED],4000/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd50):
Jan 23 17:24:14 newponitSCSI transport failed: reason 'incomplete':
retrying command
Jan 23 17:24:14 newponit scsi: [ID 107833 kern.warning] WARNING:
/[EMAIL PROTECTED],4000/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd50):
Jan 23 17:24:14 newponitdisk not responding to selection
Jan 23 17:24:18 newponit scsi: [ID 107833 kern.warning] WARNING:
/[EMAIL PROTECTED],4000/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd50):
Jan 23 17:24:18 newponitdisk not responding to selection

This is clearly the disk with ZFS on it: SVM has nothing to do with this
disk. A minute later, the troubles started with the internal disks:

Jan 23 17:25:26 newponit scsi: [ID 365881 kern.info] /[EMAIL 
PROTECTED],4000/[EMAIL PROTECTED]
(glm0):
Jan 23 17:25:26 newponitCmd (0x6a3ed10) dump for Target 0 Lun 0:
Jan 23 17:25:26 newponit scsi: [ID 365881 kern.info] /[EMAIL 
PROTECTED],4000/[EMAIL PROTECTED]
(glm0):
Jan 23 17:25:26 newponitcdb=[ 0x28 0x0 0x0 0x78 0x6 0x30 0x0 
0x0 0x10
0x0 ]
Jan 23 17:25:26 newponit scsi: [ID 365881 kern.info] /[EMAIL 
PROTECTED],4000/[EMAIL PROTECTED]
(glm0):
Jan 23 17:25:26 newponitpkt_flags=0x4000 pkt_statistics=0x60 
pkt_state=0x7
Jan 23 17:25:26 newponit scsi: [ID 365881 kern.info] /[EMAIL 
PROTECTED],4000/[EMAIL PROTECTED]
(glm0):
Jan 23 17:25:26 newponitpkt_scbp=0x0 cmd_flags=0x860
Jan 23 17:25:26 newponit scsi: [ID 107833 kern.warning] WARNING:
/[EMAIL PROTECTED],4000/[EMAIL PROTECTED] (glm0):
Jan 23 17:25:26 newponitDisconnected tagged cmd(s) (1) timeout for
Target 0.0
Jan 23 17:25:26 newponit genunix: [ID 408822 kern.info] NOTICE: glm0:
fault detected in device; service still available
Jan 23 17:25:26 newponit genunix: [ID 611667 kern.info] NOTICE: glm0:
Disconnected tagged cmd(s) (1) timeout for Target 0.0
Jan 23 17:25:26 newponit glm: [ID 401478 kern.warning] WARNING:
ID[SUNWpd.glm.cmd_timeout.6018]
Jan 23 17:25:26 newponit scsi: [ID 107833 kern.warning] WARNING:
/[EMAIL PROTECTED],4000/[EMAIL PROTECTED] (glm0):
Jan 23 17:25:26 newponitgot SCSI bus reset
Jan 23 17:25:26 newponit genunix: [ID 408822 kern.info] NOTICE: glm0:
fault detected in device; service still available

SVM and ZFS disks are on separate SCSI buses, so theoretically there
should be no impact on the SVM disks when I pull out a ZFS disk.



Ihsan

-- 
[EMAIL PROTECTED]   http://ihsan.dogan.ch/
http://gallery.dogan.ch/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] panic with zfs

2007-01-24 Thread Dennis Clarke

 Hello Michael,

 Am 24.1.2007 14:36 Uhr, Michael Schuster schrieb:

 --
 [EMAIL PROTECTED] # zpool status
   pool: pool0
  state: ONLINE
  scrub: none requested
 config:

 [...]

 Jan 23 18:51:38 newponit ^Mpanic[cpu2]/thread=3e81600:
 Jan 23 18:51:38 newponit unix: [ID 268973 kern.notice] md: Panic due to
 lack of DiskSuite state
 Jan 23 18:51:38 newponit  database replicas. Fewer than 50% of the total
 were available,
 Jan 23 18:51:38 newponit  so panic to ensure data integrity.

 this message shows (and the rest of the stack prove) that your panic
 happened in SVM. It has NOTHING to do with zfs. So either you pulled the
 wrong disk, or the disk you pulled also contained SVM volumes (next to
 ZFS).

 I noticed that the panic was in SVM and I'm wondering, why the machine
 was hanging. SVM is only running on the internal disks (c0) and I pulled
 a disk from the D1000:

  so the device that was affected had nothing to do with SVM at all.

  fine ... I have the exact same config here.  Internal SVM and
 then external ZFS on two disk arrays on two controllers.

 Jan 23 17:24:14 newponit scsi: [ID 107833 kern.warning] WARNING:
 /[EMAIL PROTECTED],4000/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd50):
 Jan 23 17:24:14 newponit  SCSI transport failed: reason 'incomplete':
 retrying command
 Jan 23 17:24:14 newponit scsi: [ID 107833 kern.warning] WARNING:
 /[EMAIL PROTECTED],4000/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd50):
 Jan 23 17:24:14 newponit  disk not responding to selection
 Jan 23 17:24:18 newponit scsi: [ID 107833 kern.warning] WARNING:
 /[EMAIL PROTECTED],4000/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd50):
 Jan 23 17:24:18 newponit  disk not responding to selection

 This is clearly the disk with ZFS on it: SVM has nothing to do with this
 disk. A minute later, the troubles started with the internal disks:

 Okay .. so are we back to looking at ZFS, or ZFS and the SVM components, or
some interaction between these kernel modules?  At this point I have to be
careful not to fall into a pit of blind ignorance as I grope for the
answer.  Perhaps some data would help.  Was there a crash dump in
/var/crash/newponit ?

 Jan 23 17:25:26 newponit scsi: [ID 365881 kern.info] /[EMAIL 
 PROTECTED],4000/[EMAIL PROTECTED]
 (glm0):
 Jan 23 17:25:26 newponit  Cmd (0x6a3ed10) dump for Target 0 Lun 0:
 Jan 23 17:25:26 newponit scsi: [ID 365881 kern.info] /[EMAIL 
 PROTECTED],4000/[EMAIL PROTECTED]
 (glm0):
 Jan 23 17:25:26 newponit  cdb=[ 0x28 0x0 0x0 0x78 0x6 0x30 0x0 
 0x0 0x10
 0x0 ]
 Jan 23 17:25:26 newponit scsi: [ID 365881 kern.info] /[EMAIL 
 PROTECTED],4000/[EMAIL PROTECTED]
 (glm0):
 Jan 23 17:25:26 newponit  pkt_flags=0x4000 pkt_statistics=0x60 
 pkt_state=0x7
 Jan 23 17:25:26 newponit scsi: [ID 365881 kern.info] /[EMAIL 
 PROTECTED],4000/[EMAIL PROTECTED]
 (glm0):
 Jan 23 17:25:26 newponit  pkt_scbp=0x0 cmd_flags=0x860
 Jan 23 17:25:26 newponit scsi: [ID 107833 kern.warning] WARNING:
 /[EMAIL PROTECTED],4000/[EMAIL PROTECTED] (glm0):
 Jan 23 17:25:26 newponit  Disconnected tagged cmd(s) (1) timeout for
 Target 0.0

   so a pile of scsi noise above there .. one would expect that from a
 suddenly missing scsi device.

 Jan 23 17:25:26 newponit genunix: [ID 408822 kern.info] NOTICE: glm0:
 fault detected in device; service still available
 Jan 23 17:25:26 newponit genunix: [ID 611667 kern.info] NOTICE: glm0:
 Disconnected tagged cmd(s) (1) timeout for Target 0.0

  NCR scsi controllers .. what OS revision is this ?   Solaris 10 u 3 ?

  Solaris Nevada snv_55b ?

 Jan 23 17:25:26 newponit glm: [ID 401478 kern.warning] WARNING:
 ID[SUNWpd.glm.cmd_timeout.6018]
 Jan 23 17:25:26 newponit scsi: [ID 107833 kern.warning] WARNING:
 /[EMAIL PROTECTED],4000/[EMAIL PROTECTED] (glm0):
 Jan 23 17:25:26 newponit  got SCSI bus reset
 Jan 23 17:25:26 newponit genunix: [ID 408822 kern.info] NOTICE: glm0:
 fault detected in device; service still available

 SVM and ZFS disks are on a seperate SCSI bus, so theoretically there
 should be any impact on the SVM disks when I pull out a ZFS disk.

  I still feel that you hit a bug in ZFS somewhere.  Under no circumstances
should a Solaris server panic and crash simply because you pulled out a
single disk that was totally mirrored.  In fact .. I will reproduce those
conditions here and then see what happens for me.

Dennis

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] panic with zfs

2007-01-24 Thread Ihsan Dogan
Hello,

Am 24.1.2007 14:40 Uhr, Dennis Clarke schrieb:

 We're setting up a new mailserver infrastructure and decided, to run it
 on zfs. On a E220R with a D1000, I've setup a storage pool with four
 mirrors:
 
Good morning Ihsan ...
 
I see that you have everything mirrored here, thats excellent.
 
When you pulled a disk, was it a disk that was containing a metadevice or
  was it a disk in the zpool ?  In the case of a metadevice, as you know, the
  system should have kept running fine.  We have probably both done this over
  and over at various sites to demonstrate SVM to people.
 
If you pulled out a device in the zpool, well now we are in a whole new
  world and I had heard that there was some *feature* in Solaris now that
  will protect the ZFS file system integrity by simply causing a system to
  panic if the last device in some redundant component was compromised.


The disk was in a zpool. The SVM disks are on a separate SCSI bus, so
they can't disturb each other.

I think you hit a major bug in ZFS personally.

For me it also looks like a bug.


Ihsan


-- 
[EMAIL PROTECTED]   http://ihsan.dogan.ch/
http://gallery.dogan.ch/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] panic with zfs

2007-01-24 Thread Michael Schuster

Ihsan Dogan wrote:


   I think you hit a major bug in ZFS personally.


For me it also looks like a bug.


I think we don't have enough information to judge. If you have a supported 
version of Solaris, open a case and supply all the data (crash dump!) you have.


HTH
--
Michael SchusterSun Microsystems, Inc.
Recursion, n.: see 'Recursion'
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] panic with zfs

2007-01-24 Thread Ihsan Dogan
Hello,

Am 24.1.2007 14:49 Uhr, Jason Banham schrieb:

 The panic looks due to the fact that your SVM state databases aren't
 all there, so when we came to update one of them we found there
was <= 50% of the state databases and crashed.

The metadbs are fine. I haven't touched them at all:

[EMAIL PROTECTED] # metadb
flags   first blk   block count
 a m  p  luo16  8192/dev/dsk/c0t0d0s7
 ap  luo82088192/dev/dsk/c0t0d0s7
 ap  luo16  8192/dev/dsk/c0t1d0s7
 ap  luo82088192/dev/dsk/c0t1d0s7

 This doesn't look like anything to do with ZFS.
 I'd check the output from metadb and see if it looks like
 you've got a SVM database on a disk that's also in use by ZFS.

The question is still: why is the system panicking? I have now pulled out a
different disk, which is for sure on ZFS and not on SVM. The system
still runs, but I can't log in anymore and the console doesn't work at
all anymore. Even if it has nothing to do with ZFS, I don't think this
is normal behavior.


Ihsan


-- 
[EMAIL PROTECTED]   http://ihsan.dogan.ch/
http://gallery.dogan.ch/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] panic with zfs

2007-01-24 Thread Ihsan Dogan
Am 24.1.2007 14:59 Uhr, Dennis Clarke schrieb:

 Jan 23 17:25:26 newponit genunix: [ID 408822 kern.info] NOTICE: glm0:
 fault detected in device; service still available
 Jan 23 17:25:26 newponit genunix: [ID 611667 kern.info] NOTICE: glm0:
 Disconnected tagged cmd(s) (1) timeout for Target 0.0
 
   NCR scsi controllers .. what OS revision is this ?   Solaris 10 u 3 ?
 
   Solaris Nevada snv_55b ?

[EMAIL PROTECTED] # cat /etc/release
   Solaris 10 11/06 s10s_u3wos_10 SPARC
   Copyright 2006 Sun Microsystems, Inc.  All Rights Reserved.
Use is subject to license terms.
   Assembled 14 November 2006
[EMAIL PROTECTED] # uname -a
SunOS newponit 5.10 Generic_118833-33 sun4u sparc SUNW,Ultra-60

 SVM and ZFS disks are on a seperate SCSI bus, so theoretically there
 should be any impact on the SVM disks when I pull out a ZFS disk.
 
   I still feel that you hit a bug in ZFS somewhere.  Under no circumstances
 should a Solaris server panic and crash simply because you pulled out a
 single disk that was totally mirrored.  In fact .. I will reproduce those
 conditions here and then see what happens for me.

And Solaris should not hang at all.



Ihsan

-- 
[EMAIL PROTECTED]   http://ihsan.dogan.ch/
http://gallery.dogan.ch/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Solaris-Supported cards with battery backup

2007-01-24 Thread James F. Hranicky
Since we're talking about various hardware configs, does anyone know
which controllers with battery backup are supported on Solaris? If
we build a big ZFS box I'd like to be able to turn on write caching
on the drives but have them battery-backed in the event of a power
loss. Are 3ware cards going to be supported any time soon?

I checked and there doesn't seem to be a battery backup option
for Thumper. Is that right? Does anyone know if there are plans for
that?

--
| Jim Hranicky, Senior SysAdmin   UF/CISE Department |
| E314D CSE BuildingPhone (352) 392-1499 |
| [EMAIL PROTECTED]  http://www.cise.ufl.edu/~jfh |
--

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] panic with zfs

2007-01-24 Thread Dennis Clarke

 Am 24.1.2007 14:59 Uhr, Dennis Clarke schrieb:

 Jan 23 17:25:26 newponit genunix: [ID 408822 kern.info] NOTICE: glm0:
 fault detected in device; service still available
 Jan 23 17:25:26 newponit genunix: [ID 611667 kern.info] NOTICE: glm0:
 Disconnected tagged cmd(s) (1) timeout for Target 0.0

   NCR scsi controllers .. what OS revision is this ?   Solaris 10 u 3 ?

   Solaris Nevada snv_55b ?

 [EMAIL PROTECTED] # cat /etc/release
Solaris 10 11/06 s10s_u3wos_10 SPARC
Copyright 2006 Sun Microsystems, Inc.  All Rights Reserved.
 Use is subject to license terms.
Assembled 14 November 2006
 [EMAIL PROTECTED] # uname -a
 SunOS newponit 5.10 Generic_118833-33 sun4u sparc SUNW,Ultra-60


   oh dear.

   that's not Solaris Nevada at all.  That is production Solaris 10.

 SVM and ZFS disks are on a seperate SCSI bus, so theoretically there
 should be any impact on the SVM disks when I pull out a ZFS disk.

   I still feel that you hit a bug in ZFS somewhere.  Under no
 circumstances
 should a Solaris server panic and crash simply because you pulled out a
 single disk that was totally mirrored.  In fact .. I will reproduce those
 conditions here and then see what happens for me.

 And Solaris should not hang at all.

  I agree.  We both know this.  You just recently patched a blastwave server
that was running for over 700 days in production and *this* sort of
behavior just does not happen in Solaris.

  Let me see if I can reproduce your config here :

bash-3.2# metastat -p
d0 -m /dev/md/rdsk/d10 /dev/md/rdsk/d20 1
d10 1 1 /dev/rdsk/c0t1d0s0
d20 1 1 /dev/rdsk/c0t0d0s0
d1 -m /dev/md/rdsk/d11 1
d11 1 1 /dev/rdsk/c0t1d0s1
d4 -m /dev/md/rdsk/d14 1
d14 1 1 /dev/rdsk/c0t1d0s7
d5 -m /dev/md/rdsk/d15 1
d15 1 1 /dev/rdsk/c0t1d0s5
d21 1 1 /dev/rdsk/c0t0d0s1
d24 1 1 /dev/rdsk/c0t0d0s7
d25 1 1 /dev/rdsk/c0t0d0s5

bash-3.2# metadb
flags   first blk   block count
 a m  p  luo16  8192/dev/dsk/c0t0d0s4
 ap  luo82088192/dev/dsk/c0t0d0s4
 ap  luo16  8192/dev/dsk/c0t1d0s4
 ap  luo82088192/dev/dsk/c0t1d0s4

bash-3.2# zpool status -v zfs0
  pool: zfs0
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
zfs0ONLINE   0 0 0
  c1t9d0ONLINE   0 0 0
  c1t10d0   ONLINE   0 0 0
  c1t11d0   ONLINE   0 0 0
  c1t12d0   ONLINE   0 0 0
  c1t13d0   ONLINE   0 0 0
  c1t14d0   ONLINE   0 0 0

errors: No known data errors
bash-3.2#

I will add in mirrors to that zpool from another array on another controller
and then yank a disk.  However this machine is on snv_52 at the moment.

Dennis

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] panic with zfs

2007-01-24 Thread Michael Schuster

Dennis Clarke wrote:

Ihsan Dogan wrote:


   I think you hit a major bug in ZFS personally.

For me it also looks like a bug.

I think we don't have enough information to judge. If you have a supported
version of Solaris, open a case and supply all the data (crash dump!) you
have.


I agree we need data.  Everything else is just speculation and wild conjecture.

I am going to create the same conditions here but with snv_55b and then yank
a disk from my zpool.  If I get a similar response then I will *hope* for a
crash dump.

You must be kidding about the "open a case" however.  This is OpenSolaris.


no, I'm not. That's why I said "If you have a supported version of 
Solaris". Also, Ihsan seems to disagree about OpenSolaris:



[EMAIL PROTECTED] # uname -a
SunOS newponit 5.10 Generic_118833-33 sun4u sparc SUNW,Ultra-60


Michael
--
Michael SchusterSun Microsystems, Inc.
Recursion, n.: see 'Recursion'
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Thumper Origins Q

2007-01-24 Thread Peter Eriksson
 too much of our future roadmap, suffice it to say that one should expect
 much, much more from Sun in this vein: innovative software and innovative
 hardware working together to deliver world-beating systems with undeniable
 economics.

Yes please. Now give me a fairly cheap (but still quality) FC-attached JBOD 
utilizing SATA/SAS disks and I'll be really happy! :-)
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] unsubscribe

2007-01-24 Thread Tim Cook
Hi Guys,

I completely forgot to unsubscribe from the zfs list before changing email 
addresses, and I no longer have access to the old one.  Is there someone I can 
contact about manually removing my old address, or updating it with my new one?

Thanks!

--Tim
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Thumper Origins Q

2007-01-24 Thread Jonathan Edwards


On Jan 24, 2007, at 09:25, Peter Eriksson wrote:

too much of our future roadmap, suffice it to say that one should  
expect
much, much more from Sun in this vein: innovative software and  
innovative
hardware working together to deliver world-beating systems with  
undeniable

economics.


Yes please. Now give me a fairly cheap (but still quality) FC- 
attached JBOD utilizing SATA/SAS disks and I'll be really happy! :-)


Could you outline why FC attached instead of network attached (iSCSI  
say) makes more sense to you?  It might help to illustrate the demand  
for an FC target I'm hearing instead of just a network target ..


.je
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Thumper Origins Q

2007-01-24 Thread Tim Cook
I think this will be a hard sell internally, given that it would eat into their 
own StorageTek line.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS direct IO

2007-01-24 Thread Jonathan Edwards


On Jan 24, 2007, at 06:54, Roch - PAE wrote:


[EMAIL PROTECTED] writes:
Note also that for most applications, the size of their IO  
operations

would often not match the current page size of the buffer, causing
additional performance and scalability issues.


Thanks for mentioning this, I forgot about it.

Since ZFS's default block size is configured to be larger than a  
page,

the application would have to issue page-aligned block-sized I/Os.
Anyone adjusting the block size would presumably be responsible for
ensuring that the new size is a multiple of the page size.  (If they
would want Direct I/O to work...)

I believe UFS also has a similar requirement, but I've been wrong
before.



I believe the UFS requirement is that the I/O be sector
aligned for DIO to be attempted. And Anton did mention that
one of the benefit of DIO is the ability to direct-read a
subpage block. Without UFS/DIO the OS is required to read and
cache the full page and the extra amount of I/O may lead to
data channel saturation (I don't see latency as an issue in
here, right ?).


In QFS there are mount options to do automatic type switching
depending on whether or not the IO is sector aligned.  You
essentially set a trigger to switch to DIO if you receive a tunable
number of well aligned IO requests.  This helps tremendously in
certain streaming workloads (particularly write) to reduce overhead.


This is where I said that such a feature would translate
for ZFS into the ability to read parts of a filesystem block
which would only make sense if checksums are disabled.


would it be possible to do checksums a posteriori? .. i suspect that
the checksum portion of the transaction may not be atomic though
and this leads us back towards the older notion of a DIF.


And for RAID-Z that could mean avoiding I/Os to each disks but
one in a group, so that's a nice benefit.

So  for the  performance  minded customer that can't  afford
mirroring, is not  much a fan  of data integrity, that needs
to do subblock reads to an  uncacheable workload, then I can
see a feature popping up. And this feature is independant on
whether   or not the data  is  DMA'ed straight into the user
buffer.


certain streaming write workloads that are time dependent can
fall into this category .. if i'm doing a DMA read directly from a
device's buffer that i'd like to stream - i probably want to avoid
some of the caching layers of indirection that will probably impose
more overhead.

The idea behind allowing an application to advise the filesystem
of how it plans on doing its IO (or the state of its own cache or
buffers or stream requirements) is to prevent the "one cache fits
all" sort of approach that we currently seem to have in the ARC.


The  other  feature,  is to  avoid a   bcopy by  DMAing full
filesystem block reads straight into user buffer (and verify
checksum after). The I/O is high latency, bcopy adds a small
amount. The kernel memory can  be freed/reuse straight after
the user read  completes. This is  where I ask, how much CPU
is lost to the bcopy in workloads that benefit from DIO ?


But isn't the cost more than just the bcopy?  Isn't there additional
overhead in the TLB/PTE from the page invalidation that needs
to occur when you do actually go to write the page out or flush
the page?


At this point, there are lots of projects  that will lead to
performance improvements.  The DIO benefits seems like small
change in the context of ZFS.

The quickest return on  investement  I see for  the  directio
hint would be to tell ZFS to not grow the ARC when servicing
such requests.


How about the notion of multiple ARCs that could be referenced
or fine tuned for various types of IO workload profiles to provide a
more granular approach?  Wouldn't this also keep the page tables
smaller and hopefully more contiguous for atomic operations? Not
sure what this would break ..

.je
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Thumper Origins Q

2007-01-24 Thread Richard Elling

Peter Eriksson wrote:

too much of our future roadmap, suffice it to say that one should expect
much, much more from Sun in this vein: innovative software and innovative
hardware working together to deliver world-beating systems with undeniable
economics.


Yes please. Now give me a fairly cheap (but still quality) FC-attached JBOD 
utilizing SATA/SAS disks and I'll be really happy! :-)


... with write cache and dual redundant controllers?  I think we call that
the Sun StorageTek 3511.
 -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thumper Origins Q

2007-01-24 Thread Bryan Cantrill

 well, Thumper is actually a reference to Bambi

You'd have to ask Fowler, but certainly when he coined it, Bambi was the
last thing on anyone's mind.  I believe Fowler's intention was "one that
thumps" (or, in the unique parlance of a certain Commander-in-Chief,
"one that gives a thumpin'").

- Bryan

--
Bryan Cantrill, Solaris Kernel Development.   http://blogs.sun.com/bmc
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Can you turn on zfs compression when the fs is already populated?

2007-01-24 Thread Neal Pollack

I have an 800GB raidz2 ZFS filesystem.  It already has approx 142GB of data.
Can I simply turn on compression at this point, or do you need to start with
compression at creation time?  If I turn on compression now, what happens to
the existing data?


Thanks,

Neal

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Thumper Origins Q

2007-01-24 Thread Moazam Raja
Well, he did say "fairly cheap". The ST 3511 is about $18.5k. That's  
about the same price as the low-end NetApp FAS250 unit.


-Moazam

On Jan 24, 2007, at 9:40 AM, Richard Elling wrote:


Peter Eriksson wrote:
too much of our future roadmap, suffice it to say that one should  
expect
much, much more from Sun in this vein: innovative software and  
innovative
hardware working together to deliver world-beating systems with  
undeniable

economics.
Yes please. Now give me a fairly cheap (but still quality) FC- 
attached JBOD utilizing SATA/SAS disks and I'll be really happy! :-)


... with write cache and dual redundant controllers?  I think we  
call that

the Sun StorageTek 3511.
 -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Can you turn on zfs compression when the fs is already populated?

2007-01-24 Thread Casper . Dik

I have an 800GB raidz2 zfs filesystem.  It already has approx 142Gb of data.
Can I simply turn on compression at this point, or do you need to start 
with compression
at the creation time?  If I turn on compression now, what happens to the 
existing data?

Yes.  Nothing.

Casper
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thumper Origins Q

2007-01-24 Thread Sean McGrath - Sun Microsystems Ireland
Bryan Cantrill stated:
 
  well, Thumper is actually a reference to Bambi

  I keep thinking of the classic AC/DC song when Fowler and thumpers are
  mentioned..  s/thunder/thumper/

 
 You'd have to ask Fowler, but certainly when he coined it, Bambi was the
 last thing on anyone's mind.  I believe Fowler's intention was one that
 thumps (or, in the unique parlance of a certain Commander-in-Chief,
 one that gives a thumpin').
 
   - Bryan
 
 --
 Bryan Cantrill, Solaris Kernel Development.   http://blogs.sun.com/bmc
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

-- 
Sean.
.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Can you turn on zfs compression when the fs is already populated?

2007-01-24 Thread Dana H. Myers
Neal Pollack wrote:
 I have an 800GB raidz2 zfs filesystem.  It already has approx 142Gb of
 data.
 Can I simply turn on compression at this point, or do you need to start
 with compression at the creation time?

As I understand it, you can turn compression on and off at will.
Data will be written to the disk according to the compression mode,
and either compressed or uncompressed blocks can be read regardless
of the setting.

  If I turn on compression now, what
 happens to the existing data?

Existing (uncompressed) data will remain uncompressed until it is re-written,
at which point it may be compressed.
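
A minimal sketch of doing that (the pool/filesystem name is hypothetical):

  # enable compression; only blocks written from now on are affected
  zfs set compression=on tank/data
  # check the setting and the achieved compression ratio
  zfs get compression,compressratio tank/data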

Dana
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Thumper Origins Q

2007-01-24 Thread Bryan Cantrill

On Wed, Jan 24, 2007 at 09:46:11AM -0800, Moazam Raja wrote:
 Well, he did say fairly cheap. the ST 3511 is about $18.5k. That's  
 about the same price for the low-end NetApp FAS250 unit.

Note that the 3511 is being replaced with the 6140:

  http://www.sun.com/storagetek/disk_systems/midrange/6140/

Also, don't read too much into the prices you see on the website -- that's
the list price, and doesn't reflect any discounting.  If you're interested
in what it _actually_ costs, you should talk to a Sun rep or one of our
channel partners to get a quote.  (And lest anyone attack the messenger:
I'm not defending this system of getting an accurate price, I'm just
describing it.)

- Bryan

--
Bryan Cantrill, Solaris Kernel Development.   http://blogs.sun.com/bmc
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Thumper Origins Q

2007-01-24 Thread Rich Teer
On Wed, 24 Jan 2007, Jonathan Edwards wrote:

  Yes please. Now give me a fairly cheap (but still quality) FC-attached JBOD
  utilizing SATA/SAS disks and I'll be really happy! :-)
 
 Could you outline why FC attached instead of network attached (iSCSI say)
 makes more sense to you?  It might help to illustrate the demand for an FC
 target I'm hearing instead of just a network target ..

Dunno about FC or iSCSI, but what I'd really like to see is a 1U direct
attach 8-drive SAS JBOD, as described (back in May 2006!) here:


http://richteer.blogspot.com/2006/05/sun-storage-product-i-would-like-to.html

Modulo the UltraSCSI 320 stuff perhaps.

Given that other vendors have released something similar, and how strong
Sun's entry-level server offerings are, I can't believe that Sun hasn't
announced something like this, to bring their entry-level storage offerings
up to the bar set by their servers...

-- 
Rich Teer, SCSA, SCNA, SCSECA, OpenSolaris CAB member

President,
Rite Online Inc.

Voice: +1 (250) 979-1638
URL: http://www.rite-group.com/rich
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thumper Origins Q

2007-01-24 Thread Jonathan Edwards


On Jan 24, 2007, at 12:41, Bryan Cantrill wrote:




well, Thumper is actually a reference to Bambi


You'd have to ask Fowler, but certainly when he coined it, Bambi  
was the
last thing on anyone's mind.  I believe Fowler's intention was one  
that

thumps (or, in the unique parlance of a certain Commander-in-Chief,
one that gives a thumpin').


You can take your pick of things that thump here:
http://en.wikipedia.org/wiki/Thumper

given the other name is the X4500 .. it does seem like it should be a  
weapon


---
.je

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Thumper Origins Q

2007-01-24 Thread Joe Little

On 1/24/07, Jonathan Edwards [EMAIL PROTECTED] wrote:


On Jan 24, 2007, at 09:25, Peter Eriksson wrote:

 too much of our future roadmap, suffice it to say that one should
 expect
 much, much more from Sun in this vein: innovative software and
 innovative
 hardware working together to deliver world-beating systems with
 undeniable
 economics.

 Yes please. Now give me a fairly cheap (but still quality) FC-
 attached JBOD utilizing SATA/SAS disks and I'll be really happy! :-)

Could you outline why FC attached instead of network attached (iSCSI
say) makes more sense to you?  It might help to illustrate the demand
for an FC target I'm hearing instead of just a network target ..



I'm not generally for FC-attached storage, but we've documented here
many times how the round trip latency with iSCSI hasn't been the
perfect match with ZFS and NFS (think NAS). You need either IB or FC
right now to make that workable. Some day though.. either with
nvram-backed NFS or cheap 10Gig-E...




.je
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thumper Origins Q

2007-01-24 Thread Bryan Cantrill

 You can take your pick of things that thump here:
 http://en.wikipedia.org/wiki/Thumper

I think it's safe to say that Fowler was thinking more along the lines
of whoever dubbed the M79 grenade launcher -- which you can safely bet
was not named after a fictional bunny...

- Bryan

--
Bryan Cantrill, Solaris Kernel Development.   http://blogs.sun.com/bmc
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thumper Origins Q

2007-01-24 Thread Rich Teer
On Wed, 24 Jan 2007, Sean McGrath - Sun Microsystems Ireland wrote:

 Bryan Cantrill stated:
  
   well, Thumper is actually a reference to Bambi
 
   I keep thinking of the classic AC/DC song when Fowler and thumpers are
   mentioned..  s/thunder/thumper/

Yeah, AC/DC songs seem to be most apropos for Sun at the moment:

* Thumperstruck (the subject of this thread)

* For those about to rock (the successor to the US-IV)

* Back in Black (Sun's return to profitability as announced yesterday)

Although Queen is almost as good:

* We will Rock you

* We are the champions


And what do M$ users have?  Courtesy of the Rolling Stones:

* (I can't get) no satisfaction

* 19th Nervous breakdown

:-)

-- 
Rich Teer, SCSA, SCNA, SCSECA, OpenSolaris CAB member

President,
Rite Online Inc.

Voice: +1 (250) 979-1638
URL: http://www.rite-group.com/rich
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Thumper Origins Q

2007-01-24 Thread Frank Cusack
On January 24, 2007 9:40:41 AM -0800 Richard Elling 
[EMAIL PROTECTED] wrote:

Peter Eriksson wrote:

Yes please. Now give me a fairly cheap (but still quality) FC-attached
JBOD  utilizing SATA/SAS disks and I'll be really happy! :-)


... with write cache and dual redundant controllers?  I think we call that
the Sun StorageTek 3511.


Ah but the 3511 JBOD is not supported for direct attach to a host, nor is it
supported for attachment to a SAN.  You have to have a 3510 or 3511 with
RAID controller to use the 3511 JBOD.  The RAID controller is pretty pricey
on these guys.  $5k each IIRC.

On January 24, 2007 10:04:04 AM -0800 Bryan Cantrill [EMAIL PROTECTED] 
wrote:


On Wed, Jan 24, 2007 at 09:46:11AM -0800, Moazam Raja wrote:

Well, he did say fairly cheap. the ST 3511 is about $18.5k. That's
about the same price for the low-end NetApp FAS250 unit.


Note that the 3511 is being replaced with the 6140:


Which is MUCH nicer but also much pricier.  Also, no non-RAID option.

You can get a 4Gb FC-SATA RAID with 12*750GB drives for about $10k
from third parties.  I doubt we'll ever see that from Sun if for no
other reason just due to the drive markups.  (Which might be justified
based on drive qualification; I'm not making any comment as to whether
the markup is warranted or not, just that it exists and is obscene.)

But you still can't beat thumper overall.  I believe S10U3 has iSCSI
target support?  If so, there you go.  Not on the low end in absolute
$$$ but certainly in $/GB per bits/sec.  Probably better on power too
compared to equivalent solutions.

-frank
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thumper Origins Q

2007-01-24 Thread Rich Teer
On Wed, 24 Jan 2007, Bryan Cantrill wrote:

 I think it's safe to say that Fowler was thinking more along the lines

Presumably, that's John Fowler?

-- 
Rich Teer, SCSA, SCNA, SCSECA, OpenSolaris CAB member

President,
Rite Online Inc.

Voice: +1 (250) 979-1638
URL: http://www.rite-group.com/rich
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Thumper Origins Q

2007-01-24 Thread Frank Cusack
On January 24, 2007 10:02:52 AM -0800 Rich Teer [EMAIL PROTECTED] 
wrote:

Dunno about FC or iSCSI, but what I'd really like to see is a 1U direct
attach 8-drive SAS JBOD, as described (back in May 2006!) here:




http://richteer.blogspot.com/2006/05/sun-storage-product-i-would-like-to.html

The problem with that is the 2.5" drives are too expensive and too small.

-frank
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Thumper Origins Q

2007-01-24 Thread Angelo Rajadurai


On 24 Jan 2007, at 13:04, Bryan Cantrill wrote:



On Wed, Jan 24, 2007 at 09:46:11AM -0800, Moazam Raja wrote:

Well, he did say fairly cheap. the ST 3511 is about $18.5k. That's
about the same price for the low-end NetApp FAS250 unit.


Note that the 3511 is being replaced with the 6140:

  http://www.sun.com/storagetek/disk_systems/midrange/6140/

Also, don't read too much into the prices you see on the website --  
that's
the list price, and doesn't reflect any discounting.  If you're  
interested

in what it _actually_ costs, you should talk to a Sun rep or one of our
channel partners to get a quote.  (And lest anyone attack the  
messenger:

I'm not defending this system of getting an accurate price, I'm just
describing it.)



If your company can qualify as a start-up (4 years old or less, with fewer
than 150 employees) you may want to look at the Sun Startup Essentials
program. It provides Sun hardware at big discounts for startups.

http://www.sun.com/emrkt/startupessentials/

For an idea on the levels of discounts see
http://kalsey.com/2006/11/sun_startup_essentials_pricing/

-Angelo



- Bryan

--- 
---
Bryan Cantrill, Solaris Kernel Development.
http://blogs.sun.com/bmc

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Thumper Origins Q

2007-01-24 Thread Shannon Roddy
Frank Cusack wrote:
 On January 24, 2007 9:40:41 AM -0800 Richard Elling
 [EMAIL PROTECTED] wrote:
 Peter Eriksson wrote:
 Yes please. Now give me a fairly cheap (but still quality) FC-attached
 JBOD  utilizing SATA/SAS disks and I'll be really happy! :-)

 ... with write cache and dual redundant controllers?  I think we call
 that
 the Sun StorageTek 3511.
 
 Ah but the 3511 JBOD is not supported for direct attach to a host, nor
 is it
 supported for attachment to a SAN.  You have to have a 3510 or 3511 with
 RAID controller to use the 3511 JBOD.  The RAID controller is pretty pricey
 on these guys.  $5k each IIRC.


I started looking into the 3511 for a ZFS system and just about
immediately stopped considering it for this reason.  If it is not
supported in JBOD, then I might as well go get a third party JBOD at the
same level of support.


 You can get a 4Gb FC-SATA RAID with 12*750gb drives for about $10k
 from third parties.  I doubt we'll ever see that from Sun if for no
 other reason just due to the drive markups.  (Which might be justified
 based on drive qualification; I'm not making any comment as to whether
 the markup is warranted or not, just that it exists and is obscene.)
 

Yep.  I went with a third party FC/SATA unit which has been flawless as
a direct attach for my ZFS JBOD system.  Paid about $0.70/GB.  And I
still have enough money left over this year to upgrade my network core.
 If I would have gone with Sun, I wouldn't be able to push as many bits
across my network.

I just don't know how people can afford Sun storage, or even if they
can, what drives them to pay such premiums.

Sun is missing out on lots of lower end storage, but perhaps that is by
design.  I am a small shop by many standards, but I would have spent
tens of thousands over the last few years with Sun if they had
reasonably priced storage.  *shrug*  I just need a place to put my bits.
 Doesn't need to be the fastest, bleeding edge stuff.  Just a bucket
that performs reasonably, and preferably one that I can use with ZFS.

-Shannon

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS direct IO

2007-01-24 Thread johansen-osdev
 And this feature is independant on whether   or not the data  is
 DMA'ed straight into the user buffer.

I suppose so, however, it seems like it would make more sense to
configure a dataset property that specifically describes the caching
policy that is desired.  When directio implies different semantics for
different filesystems, customers are going to get confused.

 The  other  feature,  is to  avoid a   bcopy by  DMAing full
 filesystem block reads straight into user buffer (and verify
 checksum after). The I/O is high latency, bcopy adds a small
 amount. The kernel memory can  be freed/reuse straight after
 the user read  completes. This is  where I ask, how much CPU
 is lost to the bcopy in workloads that benefit from DIO ?

Right, except that if we try to DMA into user buffers with ZFS there's a
bunch of other things we need the VM to do on our behalf to protect the
integrity of the kernel data that's living in user pages.  Assume you
have a high-latency I/O and you've locked some user pages for this I/O.
In a pathological case, when another thread tries to access the locked
pages and then also blocks,  it does so for the duration of the first
thread's I/O.  At that point, it seems like it might be easier to accept
the cost of the bcopy instead of blocking another thread.

I'm not even sure how to assess the impact of VM operations required to
change the permissions on the pages before we start the I/O.

 The quickest return on  investement  I see for  the  directio
 hint would be to tell ZFS to not grow the ARC when servicing
 such requests.

Perhaps if we had an option that specifies not to cache data from a
particular dataset, that would suffice.  I think you've filed a CR along
those lines already (6429855)?

-j
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Thumper Origins Q

2007-01-24 Thread Rich Teer
On Wed, 24 Jan 2007, Shannon Roddy wrote:

 Sun is missing out on lots of lower end storage, but perhaps that is by
 design.  I am a small shop by many standards, but I would have spent
 tens of thousands over the last few years with Sun if they had
 reasonably priced storage.  shrug  I just need a place to put my bits.
  Doesn't need to be the fastest, bleeding edge stuff.  Just a bucket
 that performs reasonably, and preferably one that I can use with ZFS.

+1

-- 
Rich Teer, SCSA, SCNA, SCSECA, OpenSolaris CAB member

President,
Rite Online Inc.

Voice: +1 (250) 979-1638
URL: http://www.rite-group.com/rich
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Can you turn on zfs compression when the fs is already populated?

2007-01-24 Thread Anantha N. Srirama
I've used the compression feature for quite a while, and you can flip back and
forth without any problem. When you turn compression on, nothing happens to the
existing data. However, when you start updating your files, all new blocks will
be compressed, so it is possible for a file to be composed of both compressed
and uncompressed blocks!
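
For example (a sketch; the pool/filesystem and file names are made up):

    # Turn compression on for an already-populated filesystem.
    zfs set compression=on tank/export/home
    # Existing blocks stay uncompressed; only newly written blocks compress.
    zfs get compression,compressratio tank/export/home
    # Rewriting a file (e.g. copying it in place) is one way to get its
    # blocks compressed after the fact.
    cp /tank/export/home/big.file /tank/export/home/big.file.tmp
    mv /tank/export/home/big.file.tmp /tank/export/home/big.file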
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Converting home directory from ufs to zfs

2007-01-24 Thread Anantha N. Srirama
No such facility exists to automagically convert an existing UFS filesystem to
ZFS. You have to create a new ZFS pool/filesystem and then move your data.
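
The usual manual route looks roughly like this (a sketch only; pool, device
and path names are made up):

    # Create the new pool and filesystem.
    zpool create tank c1t1d0
    zfs create tank/home
    # Copy the data across, e.g. with ufsdump/ufsrestore...
    ufsdump 0f - /dev/rdsk/c0t0d0s7 | (cd /tank/home && ufsrestore rf -)
    # ...or with a plain recursive copy of the mounted UFS filesystem:
    #   cd /export/home && find . -print | cpio -pdmu /tank/home
    # Then repoint mounts/automounter at the new location and retire the slice.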
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool split

2007-01-24 Thread Dick Davies

On 23/01/07, Darren J Moffat [EMAIL PROTECTED] wrote:


Can you pick another name for this please because that name has already
been suggested for zfs(1) where the argument is a directory in an
existing ZFS file system and the result is that the directory becomes a
new ZFS file system while retaining its contents.


Sorry to jump in on the thread, but -

that's an excellent feature addition, look forward to it.
Will it be accompanied by a 'zfs join'?

--
Rasputin :: Jack of All Trades - Master of Nuns
http://number9.hellooperator.net/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thumper Origins Q

2007-01-24 Thread Casper . Dik

Bryan Cantrill wrote:
 well, Thumper is actually a reference to Bambi
 
 You'd have to ask Fowler, but certainly when he coined it, Bambi was the
 last thing on anyone's mind.  I believe Fowler's intention was one that
 thumps (or, in the unique parlance of a certain Commander-in-Chief,
 one that gives a thumpin').

me, I always thought of calling sandworms.

sandworms use up a lot of space, you see...

And bring in a lot of cash

(IIRC, the worms caused the spice and the spice was mined)

It was my association too.

Casper
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thumper Origins Q

2007-01-24 Thread Dan Mick

[EMAIL PROTECTED] wrote:

Bryan Cantrill wrote:

well, Thumper is actually a reference to Bambi

You'd have to ask Fowler, but certainly when he coined it, Bambi was the
last thing on anyone's mind.  I believe Fowler's intention was one that
thumps (or, in the unique parlance of a certain Commander-in-Chief,
one that gives a thumpin').

me, I always thought of calling sandworms.

sandworms use up a lot of space, you see...


And bring in a lot of cash

(IIRC, the worms caused the spice and the spice was mined)

It was my association too.


...and if you imagine 48 head-positioner arms moving at once,
the vibration would travel through the sand, is all.

Just means it's a good name, I suppose!
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris-Supported cards with battery backup

2007-01-24 Thread Robert Milkowski
Hello James,

Wednesday, January 24, 2007, 3:20:14 PM, you wrote:

JFH Since we're talking about various hardware configs, does anyone know
JFH which controllers with battery backup are supported on Solaris? If
JFH we build a big ZFS box I'd like to be able to turn on write caching
JFH on the drives but have them battery-backed in the event of a power
JFH loss. Are 3ware cards going to be supported any time soon?

JFH I checked and there doesn't seem to be a battery backup option
JFH for Thumper. Is that right? Does anyone know if there plans for
JFH that?

ZFS itself makes sure the transaction is on disk by issuing a write
cache flush command to the disks. So you don't have to worry about it.

-- 
Best regards,
 Robertmailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris-Supported cards with battery backup

2007-01-24 Thread Chad Leigh -- Shire.Net LLC


On Jan 24, 2007, at 1:57 PM, Robert Milkowski wrote:


Hello James,

Wednesday, January 24, 2007, 3:20:14 PM, you wrote:

JFH Since we're talking about various hardware configs, does  
anyone know
JFH which controllers with battery backup are supported on  
Solaris? If
JFH we build a big ZFS box I'd like to be able to turn on write  
caching
JFH on the drives but have them battery-backed in the event of a  
power

JFH loss. Are 3ware cards going to be supported any time soon?

JFH I checked and there doesn't seem to be a battery backup option
JFH for Thumper. Is that right? Does anyone know if there plans for
JFH that?

ZFS itself makes sure the transaction is on disk by issuing a write
cache flush command to the disks. So you don't have to worry about it.



Areca SATA cards are supported on Solaris x86 by Areca (drivers etc  
from them, not from Sun) and they support battery backup.


It is what I am using

Chad


---
Chad Leigh -- Shire.Net LLC
Your Web App and Email hosting provider
chad at shire.net





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris-Supported cards with battery backup

2007-01-24 Thread James F. Hranicky
Robert Milkowski wrote:
 Hello James,
 
 Wednesday, January 24, 2007, 3:20:14 PM, you wrote:
 
 JFH Since we're talking about various hardware configs, does anyone know
 JFH which controllers with battery backup are supported on Solaris? If
 JFH we build a big ZFS box I'd like to be able to turn on write caching
 JFH on the drives but have them battery-backed in the event of a power
 JFH loss. Are 3ware cards going to be supported any time soon?
 
 JFH I checked and there doesn't seem to be a battery backup option
 JFH for Thumper. Is that right? Does anyone know if there plans for
 JFH that?
 
 ZFS itself makes sure the transaction is on disk by issuing a write
 cache flush command to the disks. So you don't have to worry about it.

Ok, does that negate the performance gains of having the write cache
on?

I guess what I'm really asking is: with the problems I and others have
noted with NFS/ZFS, what's currently the best way to get good NFS
performance without sacrificing reliability (i.e., without resorting to
disabling the ZIL, etc.)?

If a battery-backed cache isn't necessary, all the better.

Jim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Synchronous Mount?

2007-01-24 Thread Peter Schuller
 Specifically, I was trying to compare ZFS snapshots with LVM snapshots on
 Linux. One of the tests does writes to an ext3FS (that's on top of an LVM
 snapshot) mounted synchronously, in order to measure the real
 Copy-on-write overhead. So, I was wondering if I could do the same with
 ZFS. Seems not.

Given that ZFS does COW for *all* writes, what does this test actually intend
to show when running on ZFS? Am I missing something, or shouldn't writes to a
clone be as fast as, or even faster than, writes to a non-clone? COW is
performed either way; in the case of the clone, the old data is simply not
freed.
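
A quick way to sanity-check that on ZFS (a sketch; the pool/dataset names and
sizes are made up):

    # Time a write to a fresh filesystem, then the same write to a clone.
    zfs create tank/test
    ptime sh -c 'dd if=/dev/zero of=/tank/test/file bs=128k count=8192; sync'
    zfs snapshot tank/test@snap
    zfs clone tank/test@snap tank/testclone
    # Overwriting the cloned file exercises COW over existing blocks; the
    # old blocks just stay referenced by the snapshot instead of being freed.
    ptime sh -c 'dd if=/dev/zero of=/tank/testclone/file bs=128k count=8192; sync'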

-- 
/ Peter Schuller, InfiDyne Technologies HB

PGP userID: 0xE9758B7D or 'Peter Schuller [EMAIL PROTECTED]'
Key retrieval: Send an E-Mail to [EMAIL PROTECTED]
E-Mail: [EMAIL PROTECTED] Web: http://www.scode.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Synchronous Mount?

2007-01-24 Thread Prashanth Radhakrishnan

  Specifically, I was trying to compare ZFS snapshots with LVM snapshots on
  Linux. One of the tests does writes to an ext3FS (that's on top of an LVM
  snapshot) mounted synchronously, in order to measure the real
  Copy-on-write overhead. So, I was wondering if I could do the same with
  ZFS. Seems not.
 
 Given that ZFS does COW for *all* writes, what does this test actually intend 
 to show when running on ZFS? Am I missing something, or should not writes to 
 a clone be as fast, or even faster, than a write to a non-clone? Given that 
 COW is always performed, but in the case of the clone the old data is not 
 removed.
 

Well, yes - for ZFS. But that's not the case with LVM snapshots. Doing the same
(sync mount) on ZFS was just for comparing them on similar grounds.
Anyway, I figured ZFS performs way better.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re: Thumper Origins Q

2007-01-24 Thread Peter Eriksson
 #1 is speed. You can aggregate 4x1Gbit Ethernet and still not touch 4Gb/sec
 FC.
 #2 is drop-in compatibility. I'm sure people would love to drop this into an
 existing SAN.

#2 is the key for me. And I also have a #3:

  FC has been around a long time now. The HBAs and Switches are (more or less 
:-) debugged and we know how things work...

iSCSI - well, perhaps. But to me that feels like it gets too far away from 
the hardware. I'd like to keep the distance between the disks and ZFS as 
short as possible.

I.e.:
ZFS - HBA - FC switch - JBOD - simple FC-SATA converter - SATA disk
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re[2]: [zfs-discuss] Solaris-Supported cards with battery backup

2007-01-24 Thread Robert Milkowski
Hello James,

Wednesday, January 24, 2007, 10:31:46 PM, you wrote:

JFH Robert Milkowski wrote:
 Hello James,
 
 Wednesday, January 24, 2007, 3:20:14 PM, you wrote:
 
 JFH Since we're talking about various hardware configs, does anyone know
 JFH which controllers with battery backup are supported on Solaris? If
 JFH we build a big ZFS box I'd like to be able to turn on write caching
 JFH on the drives but have them battery-backed in the event of a power
 JFH loss. Are 3ware cards going to be supported any time soon?
 
 JFH I checked and there doesn't seem to be a battery backup option
 JFH for Thumper. Is that right? Does anyone know if there plans for
 JFH that?
 
 ZFS itself makes sure the transaction is on disk by issuing a write
 cache flush command to the disks. So you don't have to worry about it.

JFH Ok, does that negate the performance gains of having the write cache
JFH on?

JFH I guess what I'm really asking is with the problems I and others have
JFH noted with NFS/ZFS, what's currently the best way to get good NFS
JFH performance without sacrificing reliability (i.e., disabling the ZIL,
JFH etc).

JFH If a battery-backed cache isn't necessary, all the better.

I thought you were worried about the write cache in the disks - if you
dedicate whole disks to ZFS on the x4500, the write cache on the disks will
be enabled by default.

But if you are talking about another level of cache then you're right -
currently you can't do that on the x4500.
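
If it's the disks' own write cache you want to look at, something like this is
the usual approach (illustrative only; device names are made up, and the
format cache menu depends on the disk and driver):

    # Give ZFS whole disks (no slice suffix) so it manages the write cache:
    zpool create tank mirror c0t1d0 c0t2d0
    # To inspect or toggle a disk's write cache by hand, use format in
    # expert mode and walk the menus:
    format -e
    #   -> select the disk
    #   -> cache
    #   -> write_cache
    #   -> display (or enable / disable)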


-- 
Best regards,
 Robertmailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] On-failure policies for pools

2007-01-24 Thread Richard Elling

comment below...

Peter Schuller wrote:

In many situations it may not feel worth it to move to a raidz2 just to
avoid this particular case.

I can't think of any, but then again, I get paid to worry about failures
:-)


Given that one of the touted features of ZFS is data integrity, including in
the case of cheap drives, that implies it is of interest to get maximum
integrity out of any given amount of resources.


In your typical home use situation for example, buying 4 drives of decent size 
is pretty expensive considering that it *is* home use. Getting 4 drives for 
the diskspace of 3 is a lot more attractive than 5 drives for the diskspace 
of 3. But given that you do get 4 drives and put them in a raidz, you want as 
much safety as possible, and often you don't care that much about 
availability.


That said, the argument scales. If you're not in a situation like the above, 
you may easily warrant wasting an extra drive on raidz2. But raidz2 without 
this feature is still less safe than raidz2 with the feature. So moving back 
to the idea of getting as much redundancy as possible given a certain set of 
hardware resources, you're still not optimal given your hardware.



Please correct me if I misunderstand your reasoning, are you saying that a
broken disk should not be replaced?


Sorry, no. However, I realize my desire actually requires an additional
feature. The situation I envision is this:


* One disk goes down in a raidz, because the controller suddenly broke 
(platters/heads are fine).


* You replace the disk and start a re-silvering.

* You trigger a bad block. At this point, you are now pretty screwed, unless:

* The pool did not change after the original drive failed, AND broken-drive
assisted resilvering is supported. You go to whatever effort is required to fix
the disk (say, buy another one of the same model and replace the controller,
or hire some company that does this stuff) and re-insert it into the machine.


* At this point you have a drive you can read data off of, but that you
certainly don't trust in general. So you want to start replacing it with the
new drive; if ZFS were then able to resilver to the new drive by using both
the parity data on the other healthy drives in the pool and the disk being
replaced, you're happy.


It is my understanding that zpool replace already does this.  Just don't
remove the failing disk...
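
That is, with the flaky disk still connected, something like (device names
are just examples):

    # Attach the new disk, leave the old one in place, then:
    zpool replace tank c1t2d0 c3t4d0   # old-device new-device
    zpool status -v tank               # watch the resilver progress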

Or let's do a more likely scenario. A disk starts dying because of bad sectors
(the disk has run out of remapping possibilities). You cannot fix this
anymore by re-writing the bad sectors; trying to re-write the sector ends up 
failing with an I/O error and ZFS kicks the disk out of the pool.


Standard procedure at this point is to replace the drive and resilver. But 
once again - you might end up with a bad sector on another drive. Without 
utilizing the existing broken drive, you're screwed. If however you were able 
to take advantage of sectors that *ARE* readable off of the drive, and the 
drive has *NOT* gone out of date since it was kicked out due to additional 
transaction commits, you are once again happy.


(Once again assuming you don't happen to have bad luck and the set of bad 
sectors on the two drives overlap.)


...
I think I was off base previously.  It seems to me that you are really after
the policy for failing/failed disks.  Currently, the only way a drive gets
kicked out is if ZFS cannot open it.  Obviously, if ZFS cannot open the
drive, then you won't be able to read anything from it.

Looking forward, I think that there are several policies which may be desired...

If so, then that is contrary to the 
accepted methods used in most mission critical systems.  There may be other

methods which meet your requirements and are accepted.  For example, one
procedure we see for those sites who are very interested in data retention
is to power off a system when it is degraded to a point (as specified)
where data retention is put at unacceptable risk.


This is kind of what I am after, except that I want to guarantee that not a 
single transaction gets committed once a pool is degraded. Even if an admin 
goes and turns the machine off, the disk will be out of date.


... such as a policy that says if a disk is going bad, go read-only.  I'm
quite sure that most applications won't respond well to such a policy, though.

The theory is that a 
powered down system will stop wearing out.  When the system is serviced,

then it can be brought back online.  Obviously, this is not the case where
data availability is a primary requirement -- data retention has higher
priority.


On the other hand, hardware has a nasty tendency to break in relation to power
cycles...



We can already set a pool (actually the file systems in a pool) to be read
only.


Automatically and *immediately* on a drive failure?


You can listen to sysevents and implement policies.
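
A minimal sketch of such a policy hook (illustrative only: it polls instead of
consuming sysevents, and the pool name is made up):

    #!/bin/sh
    # If the pool is no longer ONLINE, freeze writes by marking every
    # filesystem in it read-only.
    POOL=tank
    if [ "`zpool list -H -o health $POOL`" != "ONLINE" ]; then
            for fs in `zfs list -H -t filesystem -o name -r $POOL`; do
                    zfs set readonly=on $fs
            done
    fi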


There may be something else lurking here that we might be able to take
advantage 

[zfs-discuss] Why replacing a drive generates writes to other disks?

2007-01-24 Thread Robert Milkowski
Hello zfs-discuss,

  Subject says it all.


I first checked - no IO activity at all to the pool named thumper-2.
So I started replacing one drive with 'zpool replace thumper-2 c7t7d0
c4t1d0'.

Now the question is why am I seeing writes to other disks than c7t7d0?

Also, why, in the case of replacing a disk, do we not just copy disk-to-disk?
It would be MUCH faster here. Probably because we're traversing
metadata? But perhaps it could be done in a clever way so we end up
just copying from one disk to another. Checking parity or checksums
isn't necessary here - that's what scrub is for. What we want in most cases
is to replace the drive as fast as possible.

On another Thumper I have a failing drive (port resets, etc.), so over a week
ago I issued a drive replacement. Well, it still hasn't completed even 4% in a
week! The pool config is the same. It's just way too slow and, in the long
term, risky.


  
bash-3.00# zpool status
  pool: thumper-2
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 0,01% done, 350h29m to go
config:

NAME  STATE READ WRITE CKSUM
thumper-2 ONLINE   0 0 0
  raidz2  ONLINE   0 0 0
c0t0d0ONLINE   0 0 0
c1t0d0ONLINE   0 0 0
c4t0d0ONLINE   0 0 0
c6t0d0ONLINE   0 0 0
c7t0d0ONLINE   0 0 0
c0t1d0ONLINE   0 0 0
c1t1d0ONLINE   0 0 0
c5t1d0ONLINE   0 0 0
c6t1d0ONLINE   0 0 0
c7t1d0ONLINE   0 0 0
c0t2d0ONLINE   0 0 0
  raidz2  ONLINE   0 0 0
c1t2d0ONLINE   0 0 0
c5t2d0ONLINE   0 0 0
c6t2d0ONLINE   0 0 0
c7t2d0ONLINE   0 0 0
c0t4d0ONLINE   0 0 0
c1t4d0ONLINE   0 0 0
c4t4d0ONLINE   0 0 0
c6t4d0ONLINE   0 0 0
c7t4d0ONLINE   0 0 0
c0t3d0ONLINE   0 0 0
c1t3d0ONLINE   0 0 0
  raidz2  ONLINE   0 0 0
c4t3d0ONLINE   0 0 0
c5t3d0ONLINE   0 0 0
c6t3d0ONLINE   0 0 0
c7t3d0ONLINE   0 0 0
c0t5d0ONLINE   0 0 0
c1t5d0ONLINE   0 0 0
c4t5d0ONLINE   0 0 0
c5t5d0ONLINE   0 0 0
c6t5d0ONLINE   0 0 0
c7t5d0ONLINE   0 0 0
c0t6d0ONLINE   0 0 0
  raidz2  ONLINE   0 0 0
c1t6d0ONLINE   0 0 0
c4t6d0ONLINE   0 0 0
c5t6d0ONLINE   0 0 0
c6t6d0ONLINE   0 0 0
c7t6d0ONLINE   0 0 0
c0t7d0ONLINE   0 0 0
c1t7d0ONLINE   0 0 0
c4t7d0ONLINE   0 0 0
c5t7d0ONLINE   0 0 0
c6t7d0ONLINE   0 0 0
spare ONLINE   0 0 0
  c7t7d0  ONLINE   0 0 0
  c4t1d0  ONLINE   0 0 0
spares
  c4t1d0  INUSE currently in use
  c4t2d0  AVAIL

errors: No known data errors

  pool: zones
 state: ONLINE
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
zones ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c5t0d0s4  ONLINE   0 0 0
c5t4d0s4  ONLINE   0 0 0

errors: No known data errors
bash-3.00#
  

  
# iostat -xnz 1
[...]
  
extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  114.0    0.0 7232.3    0.0  5.9  0.8   51.9    6.7  74  76 c0t0d0
  132.0    0.0 8320.6    0.0  9.0  1.0   68.4    7.5  95  98 c6t0d0
  123.0    0.0 7807.7    0.0  7.3  0.8   59.3    6.3  76  77 c7t0d0
  115.0    0.0 7296.3    0.0  7.9  0.8   68.7    7.1  80  81 c4t0d0
  100.0    0.0 6336.4    0.0  3.6  0.6   36.3    6.0  56  60 c6t1d0
    0.0  297.0    0.0  151.0  0.0  0.0    0.0    0.2   0   5 c4t1d0
  106.0    0.0 6720.3    0.0  5.3  0.6   50.0    6.1  63  65 c7t1d0
  122.0    0.0 7743.7    0.0  6.9  0.7   56.8    6.0  72  73 c0t1d0
  120.0    0.0 7679.2    0.0  5.6  0.7   46.9    5.7  66  68 c1t1d0
    4.0    0.0  129.5    0.0  0.0 

Re: [zfs-discuss] Re: Thumper Origins Q

2007-01-24 Thread Shannon Roddy
Ben Gollmer wrote:
 On Jan 24, 2007, at 12:37 PM, Shannon Roddy wrote:
 
 I went with a third party FC/SATA unit which has been flawless as
 a direct attach for my ZFS JBOD system.  Paid about $0.70/GB.
 
 What did you use, if you don't mind my asking?
 

Arena Janus 6641.  Turns out I underestimated what I paid per GB.  I
went back and dug up the invoice and I paid just under $1/GB.  My memory
was a little off on the 750 GB drive prices.  I used an LSI Logic FC
card that was listed on the Solaris Ready page, and I am using the LSI
Logic driver.

http://www.sun.com/io_technologies/vendor/lsi_logic_corporation.html

Works fine for our purposes, but again, we don't need screaming bleeding
edge performance either.

-Shannon

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thumper Origins Q

2007-01-24 Thread David Magda

On Jan 24, 2007, at 04:06, Bryan Cantrill wrote:


On Wed, Jan 24, 2007 at 12:15:21AM -0700, Jason J. W. Williams wrote:

Wow. That's an incredibly cool story. Thank you for sharing it! Does
the Thumper today pretty much resemble what you saw then?


Yes, amazingly so:  4-way, 48 spindles, 4u.  The real beauty of the
match between ZFS and Thumper was (and is) that ZFS unlocks new  
economics

in storage -- smart software achieving high performance and ultra-high


If Thumper and ZFS were born independently, how were all those disks  
going to be used without ZFS? It seems logical that the two be mated,  
but AFAIK there is no hardware RAID available in Thumpers.


Was normal software RAID the plan? Treating each disk as a separate  
mount point?


Just curious.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re[2]: [zfs-discuss] Thumper Origins Q

2007-01-24 Thread Robert Milkowski
Hello David,

Thursday, January 25, 2007, 1:47:57 AM, you wrote:

DM On Jan 24, 2007, at 04:06, Bryan Cantrill wrote:

 On Wed, Jan 24, 2007 at 12:15:21AM -0700, Jason J. W. Williams wrote:
 Wow. That's an incredibly cool story. Thank you for sharing it! Does
 the Thumper today pretty much resemble what you saw then?

 Yes, amazingly so:  4-way, 48 spindles, 4u.  The real beauty of the
 match between ZFS and Thumper was (and is) that ZFS unlocks new  
 economics
 in storage -- smart software achieving high performance and ultra-high

DM If Thumper and ZFS were born independently, how were all those disks  
DM going to be used without ZFS? It seems logical that the two be mated,
DM but AFAIK there is no hardware RAID available in Thumpers.

DM Was normal software RAID the plan? Treating each disk as a separate
DM mount point?

I guess Linux was considered, probably with LVM or something else.

-- 
Best regards,
 Robertmailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why replacing a drive generates writes to other disks?

2007-01-24 Thread Brian Hechinger
On Thu, Jan 25, 2007 at 12:39:25AM +0100, Robert Milkowski wrote:
 Hello zfs-discuss,
 
 On another Thumper I have a failing drive (port resets, etc.), so over a week
 ago I issued a drive replacement. Well, it still hasn't completed even 4% in a
 week! The pool config is the same. It's just way too slow and, in the long
 term, risky.

The last time I saw something like this was on a D1000 that had serious
parity issues.  Overall it spent so much time retrying and backing down
the transfer rate that the data path to the disks was so slow as to be
unusable.  Got new cables and the problem went away.

Don't know if that applies to you or not.

-brian
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thumper Origins Q

2007-01-24 Thread Jason J. W. Williams

Hi Wee,

Having snapshots in the filesystem that work so well is really nice.
How are y'all quiescing the DB?

Best Regards,
J

On 1/24/07, Wee Yeh Tan [EMAIL PROTECTED] wrote:

On 1/25/07, Bryan Cantrill [EMAIL PROTECTED] wrote:
 ...
 after all, what was ZFS going to do with that expensive but useless
 hardware RAID controller?  ...

I almost rolled over reading this.

This is exactly what I went through when we moved our database server
out from Vx** to ZFS.  We had a 3510 and were thinking how best to
configure the RAID.  In the end, we ripped out the controller board
and used the 3510 as a JBOD directly attached to the server.  My DBA
was so happy with this setup (especially with the snapshot capability)
he is asking for another such setup.


--
Just me,
Wire ...
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: zpool split

2007-01-24 Thread Richard L. Hamilton
...such that a snapshot (cloned if need be) won't do what you want?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool split

2007-01-24 Thread Dick Davies

On 25/01/07, Adam Leventhal [EMAIL PROTECTED] wrote:

On Wed, Jan 24, 2007 at 08:52:47PM +, Dick Davies wrote:
 that's an excellent feature addition, look forward to it.
 Will it be accompanied by a 'zfs join'?

Out of curiosity, what will you (or anyone else) use this for? If the idea
is to copy datasets to a new pool, why not use zfs send/receive?


To clarify, I'm talking about 'zfs split' as in
breaking /tank/export/home into /tank/export/home/user1,
/tank/export/home/user2, etc.

The 'zfs join' is just an undo, to help me out when I've been overzealous,
every directory in my system has ended up a filesystem, and I have more
automated snapshots than I can stand...
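
Until something like that exists, the manual equivalent of a split is roughly
the following (a sketch; it assumes tank/export/home is itself a dataset and
that user1 is quiesced during the copy):

    # Create the new per-user filesystem next to the existing directory.
    zfs create tank/export/home/user1.new
    # Copy the directory contents into it, preserving attributes.
    (cd /tank/export/home/user1 && find . -print | cpio -pdmu /tank/export/home/user1.new)
    # Swap the old directory out for the new filesystem.
    rm -rf /tank/export/home/user1
    zfs rename tank/export/home/user1.new tank/export/home/user1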

--
Rasputin :: Jack of All Trades - Master of Nuns
http://number9.hellooperator.net/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] X2100 not hotswap, was Re: External drive enclosures + Sun Server for massstorage

2007-01-24 Thread Frank Cusack

On January 23, 2007 8:11:24 PM -0200 Toby Thain [EMAIL PROTECTED] wrote:

Still, would be nice for those of us who bought them. And judging by
other posts on this thread it seems just about everyone assumes hotswap
just works.


hot *plug* :-)

-frank
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss