Re: [zfs-discuss] X4500 device disconnect problem persists

2007-11-13 Thread Lida Horn
The "reset: no matching NCQ I/O found" issue appears to be related to the
error recovery for bad blocks on the disk.  In general it should be harmless, 
but
I have looked into this.  If there is someone out there who;
1) Is hitting this issue, and;
2) Is running recent Solaris Nevada bits (not Solaris 10) and;
3) Is willing to try out an experimental driver

I can provide a new binary (with which I've done some testing already)
which would appear to deal with this issue and do better and quicker error
recovery.  Remember that the underlying problem still appears to be bad blocks
on the disk, so until those blocks are re-written or mapped away there will
still be slow response and error messages generated each and every time those
blocks are read.
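Where the bad blocks actually return corrupt data and the pool has redundancy, a
scrub will rewrite them from a good copy, which usually gives the drive a chance
to remap the failing sectors.  A minimal sketch (the pool name is hypothetical):

# see which files are affected, then force a full read/repair pass
zpool status -v tank
zpool scrub tank
zpool status tank      # scrub progress and repaired error counts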

Regards,
Lida
 
 


Re: [zfs-discuss] zpool status can not detect the vdev removed?

2007-11-13 Thread hex.cookie
And when the system rebooted, I ran zpool status, which told me that one vdev
was corrupt, so I recreated the file that I had removed. After all those
operations, I ran zpool destroy pool, and the system rebooted again... should
Solaris do that?
 
 


[zfs-discuss] in a zpool consist of regular files, when I remove the file vdev, zpool status can not detect?

2007-11-13 Thread Chookiex
I made a file-backed zpool like this:
bash-3.00# zpool status
  pool: filepool
 state: ONLINE
 scrub: none requested
config:
        NAME              STATE     READ WRITE CKSUM
        filepool          ONLINE       0     0     0
          /export/f1.dat  ONLINE       0     0     0
          /export/f2.dat  ONLINE       0     0     0
          /export/f3.dat  ONLINE       0     0     0
        spares
          /export/f4.dat  AVAIL
errors: No known data errors

After this, I ran "rm /export/f1.dat" and then wrote some more data.  The write
operation was normal, but when I checked the status of the zpool it did not
report any exception, even though the file f1.dat really had been removed!

And when I scrubbed the pool, Solaris rebooted...
How should I interpret this?  Should the system reboot when I remove a disk
from the pool?


  



Re: [zfs-discuss] Yager on ZFS

2007-11-13 Thread Jason J. W. Williams
Hi Darren,

> Ah, your "CPU end" was referring to the NFS client cpu, not the storage
> device CPU.  That wasn't clear to me.  The same limitations would apply
> to ZFS (or any other filesystem) when running in support of an NFS
> server.
>
> I thought you were trying to describe a qualitative difference between
> ZFS and WAFL in terms of data checksumming in the on-disk layout.

Eh... NetApp can just open WAFL to neuter the argument... ;-) Or I
suppose you could just run ZFS on top of an iSCSI or FC mount from the
NetApp.

The problem, it seems to me, with criticizing ZFS as not much different
from WAFL is that WAFL is really a networked storage backend, not a
server operating system FS. If all you're using ZFS for is backending
networked storage, the "not much different" criticism holds a fair
amount of water, I think. However, that highlights what's special about
ZFS: it isn't limited to just that use case. It's the first server OS
FS (to my knowledge) to provide all those features in one place, and
that's what makes it revolutionary, because you can truly use its
features in any application with any storage. It's on that basis that I
think placing ZFS and WAFL on equal footing is not a strong argument.

Best Regards,
Jason


Re: [zfs-discuss] Suggestion/Request: ZFS-aware rm command

2007-11-13 Thread Paul Jochum
I agree, being able to delete the snapshot that a clone is attached to would be
a nice feature.  Until we get that, this is what I have done (in case this
helps anyone else):

1) snapshot the filesystem
2) clone the snapshot into a separate pool
3) NFS-mount only the separate pool containing the clones

That way, if I need to delete a file from the backups, I can delete it from the
clones.  Since users only have access to the clones, they will not see the
deleted files.  On the other hand, the deleted files still exist in the
snapshots, so I am never 'corrupting' my backups.  Note that I believe this only
works for limited situations like mine; I am not sure there is a way to prevent
users from having access to their .zfs/snapshot directory when they have local
access to the zfs filesystem.
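A minimal sketch of that workflow (dataset names, share options and the deleted
path are hypothetical; here the clones live in their own dataset tree, which is
the only thing exported over NFS):

# 1) snapshot, 2) clone into a tree that is the only thing NFS-exported
zfs create tank/backups
zfs snapshot tank/home@backup-20071113
zfs clone tank/home@backup-20071113 tank/backups/home-20071113
zfs set sharenfs=ro tank/backups/home-20071113
# a file that must disappear from the backup view is removed from the clone only
rm /tank/backups/home-20071113/user1/secret.doc
# the underlying snapshot remains untouched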

Paul
 
 


Re: [zfs-discuss] zfs on a raid box

2007-11-13 Thread Richard Elling
Paul Boven wrote:
> Hi everyone,
> 
> We're building a storage system that should have about 2TB of storage
> and good sequential write speed. The server side is a Sun X4200 running
> Solaris 10u4 (plus yesterday's recommended patch cluster), the array we
> bought is a Transtec Provigo 510 12-disk array. The disks are SATA, and
> it's connected to the Sun through U320-scsi.

A lot of improvements in this area are in the latest SXCE builds.  Can
you try this test on b77?  I'm not sure what the schedule is for backporting
these changes to S10.
  -- richard

> Now the raidbox was sold to us as doing JBOD and various other raid
> levels, but JBOD turns out to mean 'create a single-disk stripe for
> every drive'. Which works, after a fashion: When using a 12-drive zfs
> with raidz and 1 hotspare, I get 132MB/s write performance, with raidz2
> it's still 112MB/s. If instead I configure the array as a Raid-50
> through the hardware raid controller, I can only manage 72MB/s.
> So at a first glance, this seems a good case for zfs.
> 
> Unfortunately, if I then pull a disk from the zfs array, it will keep
> trying to write to this disk, and will never activate the hot-spare. So
> a zpool status will then show the pool as 'degraded', one drive marked
> as unavailable - and the hot-spare still marked as available. Write
> performance also drops to about 32MB/s.
> 
> If I then try to activate the hot-spare by hand (zpool replace <pool> <bad disk>) the resilvering starts, but never makes it past 10% -
> it seems to restart all the time. As this box is not in production yet,
> and I'm the only user on it, I'm 100% sure that there is nothing
> happening on the zfs filesystem during the resilvering - no reads,
> writes and certainly no snapshots.
> 
> In /var/adm/messages, I see this message repeated several times each minute:
> Nov 12 17:30:52 ddd scsi: [ID 107833 kern.warning] WARNING:
> /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci1000,[EMAIL 
> PROTECTED]/[EMAIL PROTECTED],0 (sd47):
> Nov 12 17:30:52 ddd offline or reservation conflict
> 
> Why isn't this enough for zfs to switch over to the hotspare?
> I've tried disabling (setting to write-thru) the write-cache on the
> array box, but that didn't make any difference to the behaviour either.
> 
> I'd appreciate any insights or hints on how to proceed with this -
> should I even be trying to use zfs in this situation?
> 
> Regards, Paul Boven.


Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-13 Thread Richard Elling
Nathan Kroenert wrote:
> This question triggered some silly questions in my mind:
> 
> Lots of folks are determined that the whole COW to different locations 
> are a Bad Thing(tm), and in some cases, I guess it might actually be...

There is a lot of speculation about this, but no real data.
I've done some experiments on long seeks and didn't see much of a
performance difference, but I wasn't using a database workload.

Note that the many caches and optimizations in the path between the
database and physical medium will make this very difficult to characterize
for a general case.  Needless to say, you'll get better performance on a
device which can handle multiple outstanding I/Os -- avoid PATA disks.
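One rough way to see how many I/Os a device is absorbing at once (just a
sketch, not from the original thread):

# actv = I/Os currently outstanding on each device, asvc_t = active service time
iostat -xzn 5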

> What if ZFS had a pool / filesystem property that caused zfs to do a 
> journaled, but non-COW update so the data's relative location for 
> databases is always the same?
> 
> Or - What if it did a double update: One to a staged area, and another 
> immediately after that to the 'old' data blocks. Still always have 
> on-disk consistency etc, at a cost of double the I/O's...

This is a non-starter.  Two I/Os is worse than one.

> Of course, both of these would require non-sparse file creation for the 
> DB etc, but would it be plausible?
> 
> For very read intensive and position sensitive applications, I guess 
> this sort of capability might make a difference?

We are all anxiously awaiting data...
  -- richard


Re: [zfs-discuss] Yager on ZFS

2007-11-13 Thread A Darren Dunham
On Tue, Nov 13, 2007 at 07:33:20PM -0200, Toby Thain wrote:
> >>> Yup - that's exactly the kind of error that ZFS and WAFL do a
> >>> perhaps uniquely good job of catching.
> >>
> >> WAFL can't catch all: It's distantly isolated from
> >> the CPU end.
> >
> > WAFL will catch everything that ZFS catches, including the kind of  
> > DMA error described above:  it contains validating information  
> > outside the data blocks just as ZFS does.
> 
> Explain how it can do that, when it is isolated from the application  
> by several layers including the network?

Ah, your "CPU end" was referring to the NFS client cpu, not the storage
device CPU.  That wasn't clear to me.  The same limitations would apply
to ZFS (or any other filesystem) when running in support of an NFS
server.

I thought you were trying to describe a qualitative difference between
ZFS and WAFL in terms of data checksumming in the on-disk layout.

-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant         TAOS            http://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
 < This line left intentionally blank to confuse you. >


Re: [zfs-discuss] Yager on ZFS

2007-11-13 Thread Toby Thain

On 11-Nov-07, at 10:19 AM, can you guess? wrote:

>>
>> On 9-Nov-07, at 2:45 AM, can you guess? wrote:
>
> ...
>
>>> This suggests that in a ZFS-style installation without a hardware
>>> RAID controller they would have experienced at worst a bit error
>>> about every 10^14 bits or 12 TB
>>
>> And how about FAULTS?
>> hw/firmware/cable/controller/ram/...
>
> If you had read either the CERN study or what I already said about  
> it, you would have realized that it included the effects of such  
> faults.


...and ZFS is the only prophylactic available.


>
> ...
>
>>>> but I had a box that was randomly
>>>> corrupting blocks during
>>>> DMA.  The errors showed up when doing a ZFS scrub and
>>>> I caught the problem in time.
>>>
>>> Yup - that's exactly the kind of error that ZFS and WAFL do a
>>> perhaps uniquely good job of catching.
>>
>> WAFL can't catch all: It's distantly isolated from
>> the CPU end.
>
> WAFL will catch everything that ZFS catches, including the kind of
> DMA error described above:  it contains validating information
> outside the data blocks just as ZFS does.

Explain how it can do that, when it is isolated from the application  
by several layers including the network?

--Toby

>
> ...
>
>>> CERN was using relatively cheap disks
>>
>> Don't forget every other component in the chain.
>
> I didn't, and they didn't:  read the study.
>
> ...
>
>>> Your position is similar to that of an audiophile enthused about a
>>> measurable but marginal increase in music quality and trying to
>>> convince the hoi polloi that no other system will do:  while other
>>> audiophiles may agree with you, most people just won't consider it
>>> important - and in fact won't even be able to distinguish it at all.
>>
>> Data integrity *is* important.
>
> You clearly need to spend a lot more time trying to understand what  
> you've read before responding to it.
>
> - bill
>
>


Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-13 Thread Nathan Kroenert
This question triggered some silly questions in my mind:

Lots of folks are determined that the whole COW to different locations 
are a Bad Thing(tm), and in some cases, I guess it might actually be...

What if ZFS had a pool / filesystem property that caused zfs to do a 
journaled, but non-COW update so the data's relative location for 
databases is always the same?

Or - What if it did a double update: One to a staged area, and another 
immediately after that to the 'old' data blocks. Still always have 
on-disk consistency etc, at a cost of double the I/O's...

Of course, both of these would require non-sparse file creation for the 
DB etc, but would it be plausible?

For very read intensive and position sensitive applications, I guess 
this sort of capability might make a difference?

Just some stabs in the dark...

Cheers!

Nathan.


Louwtjie Burger wrote:
> Hi
> 
> After a clean database load a database would (should?) look like this,
> if a random stab at the data is taken...
> 
> [8KB-m][8KB-n][8KB-o][8KB-p]...
> 
> The data should be fairly (100%) sequential in layout ... after some
> days though that same spot (using ZFS) would probably look like:
> 
> [8KB-m][   ][8KB-o][   ]
> 
> Is this "pseudo logical-physical" view correct (if blocks n and p were
> updated and with COW relocated somewhere else)?
> 
> Could a utility be constructed to show the level of "fragmentation" ?
> (50% in above example)
> 
> IF the above theory is flawed... how would fragmentation "look/be
> observed/calculated" under ZFS with large Oracle tablespaces?
> 
> Does it even matter what the "fragmentation" is from a performance 
> perspective?


Re: [zfs-discuss] X4500 device disconnect problem persists

2007-11-13 Thread Peter Tribble
On 11/13/07, Dan Poltawski <[EMAIL PROTECTED]> wrote:
> I've just discovered patch 125205-07, which wasn't installed on our system 
> because we don't have SUNWhea..
>
> Has anyone with problems tried this patch, and has it helped at all?

We were having a pretty rough time running S10U4. While I was away on vacation
125205-06 was applied and apparently made some difference, although the
problem doesn't seem to have entirely vanished. (It's gone far enough away that
users aren't complaining, but I think we still want to put the -07 version of
the patch on when we can, and I too would like confirmation that it's helping
and hasn't introduced any other regressions.)

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/


Re: [zfs-discuss] Suggestion/Request: ZFS-aware rm command

2007-11-13 Thread Ross
> But to create a clone you'll need a snapshot so I
> think the problem
> will still be there...

This might be a way around the problem though.  Deleting files from snapshots
sounds like a messy approach in terms of the architecture, but deleting files
from clones would be fine.

So what's needed is a way to separate a clone from its snapshot and make it
standalone.  Once that's done the original snapshot can be deleted, and files
can be deleted from the clone.

You might lose a few of the benefits of snapshots with this approach, but it
would seem a reasonable way of doing this.  Can ideas like this be raised with
the people developing zfs?
 
 


Re: [zfs-discuss] Yager on ZFS

2007-11-13 Thread Jonathan Stewart
can you guess? wrote:
> Vitesse VSC410


> Yes, it will help detect
> hardware faults as well if they happen to occur between RAM and the
> disk (and aren't otherwise detected - I'd still like to know whether
> the 'bad cable' experiences reported here occurred before ATA started
> CRCing its transfers), but while there's anecdotal evidence of such
> problems presented here it doesn't seem to be corroborated by the few
> actual studies that I'm familiar with, so that risk is difficult to
> quantify.

It may not have been a bad cable (and it is a cheap Highpoint card), but I
was running the card in RAID0 and getting random corrupted bytes on
reads that went away when I switched to JBOD.  The data was fine on disk,
but I would get a corrupted byte every 250-500MB, and the only reason I
noticed was because I was using Unison to sync folders and it kept
reporting differences I knew shouldn't exist.  So "bad cable" type
things do happen, and ZFS probably would have helped me notice it sooner.
If I hadn't had another copy of the data I might still have been able to
recover it, but only because most of the files were 1-1.5MB JPEGs and
the errors moved around, so I could have just copied a file repeatedly
until I got a good copy - but that would have been a lot of work.

Jonathan



Re: [zfs-discuss] Nice chassis for ZFS server

2007-11-13 Thread Richard Elling
Mick Russom wrote:
> Sun's "own" v60 and Sun v65 were pure Intel reference servers that worked 
> GREAT!

I'm glad they worked for you.  But I'll note that the critical deficiencies
in those platforms are solved by the newer Sun AMD/Intel/SPARC small form factor
rackmount servers.  The new chassis are far superior to the V60/V65 chassis,
which were not data center class designs even though they were rack-mountable.
  -- richard


Re: [zfs-discuss] zpool status can not detect the vdev removed?

2007-11-13 Thread Eric Schrock
As with any application, if you hold the vnode (or file descriptor) open
and remove the underlying file, you can still write to the file even if
it is removed.  Removing the file only removes it from the namespace;
until the last reference is closed it will continue to exist.

You can use 'zpool online' to trigger a reopen of the device.  If you're
running a recent build of Nevada, you are better off using lofi devices
to simulate device removal, as it is much closer to the real thing.
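For example (a sketch only; file names, sizes and lofi device numbers are
illustrative):

# back a test pool with lofi devices instead of plain files
mkfile 128m /export/d1.img /export/d2.img
lofiadm -a /export/d1.img      # prints e.g. /dev/lofi/1
lofiadm -a /export/d2.img      # prints e.g. /dev/lofi/2
zpool create testpool mirror /dev/lofi/1 /dev/lofi/2
# after simulating a failure and restoring the device, ask ZFS to reopen it
zpool online testpool /dev/lofi/1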

- Eric

On Tue, Nov 13, 2007 at 09:39:27AM -0800, hex.cookie wrote:
> I make a file zpool like this:
> bash-3.00# zpool status
>   pool: filepool
>  state: ONLINE
>  scrub: none requested
> config:
> NAME  STATE READ WRITE CKSUM
> filepool  ONLINE   0 0 0
>   /export/f1.dat  ONLINE   0 0 0
>   /export/f2.dat  ONLINE   0 0 0
>   /export/f3.dat  ONLINE   0 0 0
> spares
>   /export/f4.dat  AVAIL
> errors: No known data errors
>  
> after this, I run "rm /export/f1.dat", and I write something, the write 
> operation is normal, but when I check the status of zpool, it hadn't told me 
> any exception, but the file f1.dat is really removed!  
>  
> and when I scrub the pool, Solaris reboot...
> what should I consider this? If the system would reboot when I get off a disk 
> from the pool?
> 
> 
> 

--
Eric Schrock, FishWorks                    http://blogs.sun.com/eschrock



[zfs-discuss] zpool status can not detect the vdev removed?

2007-11-13 Thread hex.cookie
I made a file-backed zpool like this:
bash-3.00# zpool status
  pool: filepool
 state: ONLINE
 scrub: none requested
config:
        NAME              STATE     READ WRITE CKSUM
        filepool          ONLINE       0     0     0
          /export/f1.dat  ONLINE       0     0     0
          /export/f2.dat  ONLINE       0     0     0
          /export/f3.dat  ONLINE       0     0     0
        spares
          /export/f4.dat  AVAIL
errors: No known data errors
 
After this, I ran "rm /export/f1.dat" and then wrote some more data.  The write
operation was normal, but when I checked the status of the zpool it did not
report any exception, even though the file f1.dat really had been removed!

And when I scrubbed the pool, Solaris rebooted...
How should I interpret this?  Should the system reboot when I remove a disk
from the pool?





Re: [zfs-discuss] Suggestion/Request: ZFS-aware rm command

2007-11-13 Thread Darren J Moffat
Paul Jochum wrote:
> Hi Richard:
> 
> I just tried your suggestion, unfortunately it doesn't work.  Basically:
> make a clone of the snapshot - works bine
> in the clone, remove the directories - works fine
> make a snapshot of the clone - works fine
> destroy the clone - fails, because  ZFS reports that the "filesystem has 
> children"

Have you looked at the "promote" command for zfs(1) ?
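For reference, promote reverses the clone/origin dependency, which is what lets
the original filesystem (and its snapshot) be destroyed afterwards.  A minimal
sketch (dataset names are hypothetical):

# clone, prune, then promote the clone so it owns the snapshot
zfs clone tank/data@snap tank/data-pruned
rm -rf /tank/data-pruned/unwanted-dir
zfs promote tank/data-pruned
zfs list -o name,origin tank/data tank/data-pruned   # origins are now swapped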


-- 
Darren J Moffat


Re: [zfs-discuss] X4500 device disconnect problem persists

2007-11-13 Thread Dan Poltawski
I've just discovered patch 125205-07, which wasn't installed on our system 
because we don't have SUNWhea..

Has anyone with problems tried this patch, and has it helped at all?
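For anyone else checking, a quick sketch (standard Solaris 10 patch tools; the
patch ID is the one mentioned above):

# is any revision of 125205 already installed?
showrev -p | grep 125205
# apply the unpacked patch from the directory that contains it
patchadd 125205-07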
 
 


[zfs-discuss] zfs on a raid box

2007-11-13 Thread Paul Boven
Hi everyone,

We're building a storage system that should have about 2TB of storage
and good sequential write speed. The server side is a Sun X4200 running
Solaris 10u4 (plus yesterday's recommended patch cluster), the array we
bought is a Transtec Provigo 510 12-disk array. The disks are SATA, and
it's connected to the Sun through U320-scsi.

Now the raidbox was sold to us as doing JBOD and various other raid
levels, but JBOD turns out to mean 'create a single-disk stripe for
every drive'. Which works, after a fashion: When using a 12-drive zfs
with raidz and 1 hotspare, I get 132MB/s write performance, with raidz2
it's still 112MB/s. If instead I configure the array as a Raid-50
through the hardware raid controller, I can only manage 72MB/s.
So at a first glance, this seems a good case for zfs.

Unfortunately, if I then pull a disk from the zfs array, it will keep
trying to write to this disk, and will never activate the hot-spare. So
a zpool status will then show the pool as 'degraded', one drive marked
as unavailable - and the hot-spare still marked as available. Write
performance also drops to about 32MB/s.

If I then try to activate the hot-spare by hand (zpool replace <pool> <bad disk>) the resilvering starts, but never makes it past 10% -
it seems to restart all the time. As this box is not in production yet,
and I'm the only user on it, I'm 100% sure that there is nothing
happening on the zfs filesystem during the resilvering - no reads,
writes and certainly no snapshots.
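(A sketch of the manual spare-activation sequence, with placeholder device
names:)

# replace the failed disk with the configured spare, then watch the resilver
zpool replace tank c2t3d0 c2t11d0
zpool status -v tank
# once resilvering completes, the failed device can be detached
zpool detach tank c2t3d0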

In /var/adm/messages, I see this message repeated several times each minute:
Nov 12 17:30:52 ddd scsi: [ID 107833 kern.warning] WARNING:
/[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED]/[EMAIL 
PROTECTED],0 (sd47):
Nov 12 17:30:52 ddd offline or reservation conflict

Why isn't this enough for zfs to switch over to the hotspare?
I've tried disabling (setting to write-thru) the write-cache on the
array box, but that didn't make any difference to the behaviour either.

I'd appreciate any insights or hints on how to proceed with this -
should I even be trying to use zfs in this situation?

Regards, Paul Boven.
-- 
Paul Boven <[EMAIL PROTECTED]> +31 (0)521-596547
Unix/Linux/Networking specialist
Joint Institute for VLBI in Europe - www.jive.nl
VLBI - It's a fringe science


Re: [zfs-discuss] Nice chassis for ZFS server

2007-11-13 Thread Mick Russom
>Internal drives suck. If you go through the trouble of putting in a
>drive, at least make it hot pluggable.

They are all hot-swappable/pluggable on the SSR212MC2. There are two
additional internal 2.5" SAS bonus drives that aren't, but the front 12 are.

I for one think external enclosures are annoying. What's wrong with God-boxes
like this? You will invariably use up more than 2U for every 12 3.5" drives
with **all** other alternatives to this.

>argv! surely this is a clerical error?
No, it's annoying when the best platforms, especially from vendors like Intel
who go a long way to support the product properly over long periods of time, do
not land on the HCL. These are the best platforms to certify.

Hope this box lands on the HCL, it's a beaut.
 
 


Re: [zfs-discuss] Nice chassis for ZFS server

2007-11-13 Thread Mick Russom
Sun did something like this with the v60 and v65 servers, and they should do it 
again with the SSR212MC2.

The heart of the SAS subsystem of the SSR212MC2 is the SRCSAS144E .

This card is interfacing with a Vitesse VSC410 SAS-expander and is plugged into 
a S5000PSL motherboard. 

This card is closely related to the MegaRAID SAS 8208ELP . 

All the drives, 12, in front are SAS/SATA-II hotswappable. 

http://www.intel.com/design/servers/storage/ssr212mc2/index.htm .

This is a pure Intel reference design. This is the most drives that fits into a 
2U **EVER**. This is the best storage product in existence today. 

The SRCSAS144E is a MegaRAID SAS controller. 

Sun's "own" v60 and Sun v65 were pure Intel reference servers that worked 
GREAT! 

Everything works in Linux, want to see?

cat /etc/redhat-release 
CentOS release 5 (Final)

uname -a
Linux localhost.localdomain 2.6.18-8.1.15.el5 #1 SMP Mon Oct 22 08:32:28 EDT 
2007 x86_64 x86_64 x86_64 GNU/Linux

megasas: 00.00.03.05 Mon Oct 02 11:21:32 PDT 2006
megasas: 0x1000:0x0411:0x8086:0x1003: bus 9:slot 14:func 0
ACPI: PCI Interrupt :09:0e.0[A] -> GSI 18 (level, low) -> IRQ 185
megasas: FW now in Ready state
scsi0 : LSI Logic SAS based MegaRAID driver
Vendor: Intel Model: SSR212MC Rev: 01A 
Type: Enclosure ANSI SCSI revision: 05
Vendor: INTEL Model: SRCSAS144E Rev: 1.03
Type: Direct-Access ANSI SCSI revision: 05
SCSI device sda: 2919915520 512-byte hdwr sectors (1494997 MB)
sda: Write Protect is off
sda: Mode Sense: 1f 00 00 08
SCSI device sda: drive cache: write back
SCSI device sda: 2919915520 512-byte hdwr sectors (1494997 MB)
sda: Write Protect is off
sda: Mode Sense: 1f 00 00 08
SCSI device sda: drive cache: write back
sda: sda1 sda2 sda3
sd 0:2:0:0: Attached scsi disk sda
Fusion MPT base driver 3.04.02
Copyright (c) 1999-2005 LSI Logic Corporation
Fusion MPT SAS Host driver 3.04.02
ACPI: PCI Interrupt :04:00.0[A] -> GSI 17 (level, low) -> IRQ 177
mptbase: Initiating ioc0 bringup
ioc0: SAS1064E: Capabilities={Initiator}
PCI: Setting latency timer of device :04:00.0 to 64
scsi1 : ioc0: LSISAS1064E, FwRev=0110h, Ports=1, MaxQ=511, IRQ=177

09:0e.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS
Subsystem: Intel Corporation SRCSAS144E RAID Controller

09:0e.0 0104: 1000:0411
Subsystem: 8086:1003

I know that Google and Yahoo are buying these chassis in droves, and many of 
the other folks I know "in the industry" are seeing massive sales of this box.
 
 


Re: [zfs-discuss] ZFS + DB + "fragments"

2007-11-13 Thread Roch - PAE

Louwtjie Burger writes:
 > Hi
 > 
 > After a clean database load a database would (should?) look like this,
 > if a random stab at the data is taken...
 > 
 > [8KB-m][8KB-n][8KB-o][8KB-p]...
 > 
 > The data should be fairly (100%) sequential in layout ... after some
 > days though that same spot (using ZFS) would probably look like:
 > 
 > [8KB-m][   ][8KB-o][   ]
 > 
 > Is this "pseudo logical-physical" view correct (if blocks n and p were
 > updated and with COW relocated somewhere else)?
 > 

That's the proper view if the ZFS recordsize is tuned to be 8KB.
That's a best practice that might need to be qualified in
the future.
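A minimal sketch of that tuning (dataset name is hypothetical; recordsize must
be set before the database files are created):

zfs create tank/oradata
zfs set recordsize=8k tank/oradata
zfs get recordsize tank/oradata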


 > Could a utility be constructed to show the level of "fragmentation" ?
 > (50% in above example)
 > 

That will need to dive into the internals of ZFS. But
anything is possible.  It's been done for UFS before.


 > IF the above theory is flawed... how would fragmentation "look/be
 > observed/calculated" under ZFS with large Oracle tablespaces?
 > 
 > Does it even matter what the "fragmentation" is from a performance 
 > perspective?

It matters to table scans and how those scans will impact OLTP
workloads. Good blog topic. Stay tuned.




[zfs-discuss] Securing a risky situation with zfs

2007-11-13 Thread Gabriele Bulfon
Hi,
we're having a bad situation with a SAN iSCSI solution in a production
environment of a customer: the storage hardware may panic its kernel because of
a software fault, with the risk of losing data.
We want to give the SAN manufacturer a last chance to correct their
solution: we're going to move the data from the SAN to fresh scsi-attached
disks for the time they need to find the bugs. Once they've certified the
solution to us, we will move the data back onto the SAN.
Here comes the issue: we can't risk having our customer's data on a possibly
faulty SAN again, so we were thinking about reusing the scsi-attached disks
as part of the zfs pool built on the SAN partitions.
The basic idea is to have a zfs mirror of each iSCSI disk on a scsi-attached
disk, so that in case of another panic of the SAN, everything should still
work on the scsi-attached disks.
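Concretely, the idea would look something like this (a sketch only; pool and
device names are placeholders - the first device of each pair is the iSCSI LUN
already in the pool, the second is the local scsi disk):

# turn each single iSCSI vdev into a two-way mirror with a local disk
zpool attach sanpool c4t0d0 c1t2d0
zpool attach sanpool c4t0d1 c1t3d0
zpool status sanpool       # wait for the resilver to finish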
My questions are:
- is this a good idea?
- should I use zfs mirrors or normal solaris mirrors?
- is mirroring the best performance, or should I use zfs raid-z?
- is there any other possibility I don't see?
Last but not least (I know the question is not pertinent, but maybe you can
help):
- The SAN includes 2 Sun Solaris 10 machines and 3 Windows machines... is
there any similar solution for the Windows machines?

Thanx for any help
Gabriele Bulfon.
 
 


Re: [zfs-discuss] Suggestion/Request: ZFS-aware rm command

2007-11-13 Thread Sylvain Dusart
2007/11/13, Paul Jochum <[EMAIL PROTECTED]>:

> (the only option I can think of, is to use clones instead of snapshots in the 
> future, just so that I can delete files in the clones in case I ever need to)

But to create a clone you'll need a snapshot so I think the problem
will still be there...