Re: [zfs-discuss] ZFS RAID-Z1 Degraded Array won't import

2010-04-12 Thread Richard Elling
On Apr 12, 2010, at 8:01 PM, Peter Tripp wrote:

> Hi folks,
> 
> At home I run OpenSolaris x86 with a 4 drive Raid-Z (4x1TB) zpool and it's 
> not in great shape.  A fan stopped spinning and soon after the top disk 
> failed (cause you know, heat rises).  Naturally, OpenSolaris and ZFS didn't 
> skip a beat; I didn't even notice it was dead until I saw the disk activity 
> LED stuck on nearly a week later.
> 
> So I decided I would attach the disks to 2nd system (with working fans) where 
> I could backup the data to tape. So here's where I got dumb...I ran 'zpool 
> export'.  Of course, I never actually ended up attaching the disks to another 
> machine, but ever since that export I've been unable to import the pool at 
> all. I've ordered a replacement 1TB disk, but it hasn't arrived yet. Since I 
> got no errors from the scrub I ran while the array was degraded, I'm pretty 
> confident that the remaining 3 disks have valid data.
> 
> * Should I be able to import a degraded pool?

In general, yes. But it is complaining about corrupted data, which can 
be due to another failure.

> * If not, shouldn't there be a warning when exporting a degraded pool?

What should the warning say?

> * If replace 1TB dead disk with a blank disk, might the import work?

Have you tried simply removing the dead drive?  
Also, the ZFS Troubleshooting Guide has procedures that might help.

> * Are there any tools (or commercial services) for ZFS recovery?

Versions of OpenSolaris after b128 have additional recovery capability
using the "zpool import -F" option.
 -- richard

> 
> I read a blog post (which naturally now I can't find) where someone in 
> similar circumstances was able to import his pool after restoring 
> /etc/zfs/zpool.cache from a backup before the 'zpool export'. Naturally this 
> guy was doing it with ZFS-FUSE under Linux, so it's another step removed, but 
> can someone explain to me the logic & risks of trying such a thing?  Will it 
> work if the zpool.cache comes from 1day/1week/1month old backup?
> 
> So here's what I get...
> pe...@pickle:~$ pfexec zpool import
> pool: andre
>   id: 5771661786439152324
> state: FAULTED
> status: One or more devices are missing from the system.
> action: The pool cannot be imported. Attach the missing
>   devices and try again.
>  see: http://www.sun.com/msg/ZFS-8000-3C
> config:
> 
>         andre       FAULTED  corrupted data
>           raidz1    DEGRADED
>             c5t0d0  ONLINE
>             c5t1d0  ONLINE
>             c5t2d0  ONLINE
>             c5t3d0  UNAVAIL  cannot open
> 
> Any constructive suggestions would be greatly appreciated.
> Thanks
> --Peter

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com 







[zfs-discuss] Fileserver help.

2010-04-12 Thread Daniel
Hi all.

I'm pretty new to the whole OpenSolaris thing; I've been doing a bit of research 
but can't find anything on what I need.

I am thinking of making myself a home file server running OpenSolaris with ZFS 
and utilizing RAID-Z.

I was wondering if there is anything I can get that will allow Windows Media 
Center based hardware (an HTPC or Xbox 360) to stream from my new fileserver?

Any help is appreciated, and remember, I'm new :)


Re: [zfs-discuss] snapshots taking too much space

2010-04-12 Thread Peter Tripp
Though the rsync switch is probably the answer to your problem...

You might want to consider upgrading to Nexenta 3.0, switching checksums from 
fletcher to sha256, and then enabling block-level deduplication. You'd probably 
use fewer GB per snapshot even with rsync running inefficiently.
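A rough sketch of the relevant property changes (the dataset name here is
just a placeholder, and only blocks written after the change are affected):

  zfs set checksum=sha256 tank/backups
  zfs set dedup=on tank/backups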


[zfs-discuss] problem in recovering back data

2010-04-12 Thread MstAsg
Hello,

I have a problem with ZFS. After installing Solaris 10 x86, it worked for a 
while, and then something went wrong with Solaris and it could not be loaded! 
Even failsafe mode didn't resolve the problem. I put in an OpenSolaris CD, 
booted from it, and ran the commands below:
zpool create raidz c2t0d0 c2t1d0 c2t2d0 
zfs create indexes/db1 (&/db2) 
mount -F zfs /dev/dsk/c2t0d0   /name 

Then, after cd /name, I didn't see the data I had. When I retried to boot, the 
error I saw on my system was 'no active partition'. I don't know how, but I 
installed and chose the disks while installing. After installation, I copied 
the boot block to the other disks to avoid boot problems. I hope someone can 
help me with this.
   


Thanks!


Re: [zfs-discuss] ZFS RAID-Z1 Degraded Array won't import

2010-04-12 Thread Daniel Carosone
On Mon, Apr 12, 2010 at 08:01:27PM -0700, Peter Tripp wrote:
> So I decided I would attach the disks to 2nd system (with working fans) where 
> I could backup the data to tape. So here's where I got dumb...I ran 'zpool 
> export'.  Of course, I never actually ended up attaching the disks to another 
> machine, but ever since that export I've been unable to import the pool at 
> all. I've ordered a replacement 1TB disk, but it hasn't arrived yet. Since I 
> got no errors from the scrub I ran while the array was degraded, I'm pretty 
> confident that the remaining 3 disks have valid data.
> 
> * Should I be able to import a degraded pool?

Did you try with -f?  I doubt it will help.

> * If not, shouldn't there be a warning when exporting a degraded pool?

Interesting point.

> * If replace 1TB dead disk with a blank disk, might the import work?

Only if the import is failing because the dead disk is nonresponsive
in a way that makes the import hang.  Otherwise, you'd import the pool
first then replace the drive.
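For reference, a rough sketch of that normal flow, using the pool and device
names from your zpool output (the replacement device name is a placeholder):

  zpool import andre
  zpool replace andre c5t3d0 c6t0d0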

> * Are there any tools (or commercial services) for ZFS recovery?

Dunno about commercial services, zpool and zdb seem to work most of
the time.


> I read a blog post (which naturally now I can't find) where someone
> in similar circumstances was able to import his pool after restoring
> /etc/zfs/zpool.cache from a backup before the 'zpool
> export'. Naturally this guy was doing it with ZFS-FUSE under Linux,
> so it's another step removed, but can someone explain to me the
> logic & risks of trying such a thing?  Will it work if the
> zpool.cache comes from 1day/1week/1month old backup? 

If you have auto-snapshots of your running BE (/etc) from before the
import, that should work fine.  Note that you can pass import an
argument "-c cachefile" so you don't have to interfere with the
current system one.

You'd have to do this on the original system, I think.

The logic is that the cachefile contains copies of the labels of the
missing devices, and can substitute for the devices themselves when
importing a degraded pool (typically at boot).
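A rough sketch of that approach, assuming an old snapshot of the root BE is
browsable under /.zfs/snapshot and predates the export (the snapshot name
here is just a placeholder):

  cp /.zfs/snapshot/yesterday/etc/zfs/zpool.cache /tmp/old-zpool.cache
  zpool import -c /tmp/old-zpool.cache andre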

This is useful enough that I'd like to see some of the reserved area
between the on-disk labels and the first metaslab on each disk used
to store a copy of the cache file / the same data.  That way every pool
member has the information about the other members necessary to import a
degraded pool.  Even if it had to be extracted first with zdb to be
used as a separate zpool.cache as above, it would be helpful for this
scenario.

--
Dan.



Re: [zfs-discuss] How to Catch ZFS error with syslog?

2010-04-12 Thread matthew patton

> > Please can this be on by default? Please?
> 
> There are some situations where many reports may be sent
> per second so 
> it is not necessarily a wise idea for this to be enabled by
> default.

Every implementation of Syslog worth a damn has automatic message throttling 
and coalescing. This is not a defensible excuse. If it's important to the 
system status/health, it belongs in syslog.


  


[zfs-discuss] ZFS RAID-Z1 Degraded Array won't import

2010-04-12 Thread Peter Tripp
Hi folks,

At home I run OpenSolaris x86 with a 4-drive RAID-Z (4x1TB) zpool and it's not 
in great shape.  A fan stopped spinning and soon after the top disk failed 
(cause you know, heat rises).  Naturally, OpenSolaris and ZFS didn't skip a 
beat; I didn't even notice it was dead until I saw the disk activity LED stuck 
on nearly a week later.

So I decided I would attach the disks to a 2nd system (with working fans) where I 
could back up the data to tape. So here's where I got dumb... I ran 'zpool 
export'.  Of course, I never actually ended up attaching the disks to another 
machine, but ever since that export I've been unable to import the pool at all. 
I've ordered a replacement 1TB disk, but it hasn't arrived yet. Since I got no 
errors from the scrub I ran while the array was degraded, I'm pretty confident 
that the remaining 3 disks have valid data.

* Should I be able to import a degraded pool?
* If not, shouldn't there be a warning when exporting a degraded pool?
* If I replace the dead 1TB disk with a blank disk, might the import work?
* Are there any tools (or commercial services) for ZFS recovery?

I read a blog post (which naturally now I can't find) where someone in similar 
circumstances was able to import his pool after restoring /etc/zfs/zpool.cache 
from a backup before the 'zpool export'. Naturally this guy was doing it with 
ZFS-FUSE under Linux, so it's another step removed, but can someone explain to 
me the logic & risks of trying such a thing?  Will it work if the zpool.cache 
comes from 1day/1week/1month old backup?

So here's what I get...
pe...@pickle:~$ pfexec zpool import
  pool: andre
    id: 5771661786439152324
 state: FAULTED
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
        devices and try again.
   see: http://www.sun.com/msg/ZFS-8000-3C
config:

        andre       FAULTED  corrupted data
          raidz1    DEGRADED
            c5t0d0  ONLINE
            c5t1d0  ONLINE
            c5t2d0  ONLINE
            c5t3d0  UNAVAIL  cannot open

Any constructive suggestions would be greatly appreciated.
Thanks
--Peter


Re: [zfs-discuss] Create 1 pool from 3 exising pools in mirror configuration

2010-04-12 Thread Harry Putnam
Daniel Carosone  writes:

[...] snipped welcome info... thanks.

> Oh, another thing, just to make sure before you start, since this is
> evidently older hardware: are you running a 32-bit or 64-bit kernel?
> The 32-bit kernel won't use drives larger than 1TB.

It's an Athlon 64, so I'm good there.



Re: [zfs-discuss] Create 1 pool from 3 exising pools in mirror configuration

2010-04-12 Thread Brandon High
On Mon, Apr 12, 2010 at 4:17 PM, Harry Putnam  wrote:
> If its possible to add a mirrored set as a vdev to a zpool like what
> seems to be happening in (3) above, why wouldn't I just add the two
> new disks as mirrored vdev to z2 to start off, rather than additional
> mirrors, and never remove the original disks of z2.

I started writing it, assuming that you'd discard the original two
drives, then realized you'd said you were going to add the new drives
and keep the old but I didn't clean things up completely.

With that, the process is:
1. Add the new drives as a mirror vdev to the z2 pool. (zpool add
z2 mirror new_disk_0 new_disk_1 - the "mirror" keyword is very, very important.
Without it you'll add two unmirrored vdevs. zpool should complain if
this is the case, though.)
2. Do a zfs send of all datasets from z3 to z2.
3. zpool destroy z3.
4. Add the drives from z3 to z2 as a mirror vdev.

As Daniel suggested, testing the process on VirtualBox / VMWare / etc
is a great idea.

>> If you're so inclined, you could move some datasets from rpool to z2
>> to keep your rpool smaller.
>
> So having some data on rpool (besides the OS I mean) is not
> necessarily a bad thing then?

The pool will resilver faster if it's got less data on it, which may
be important to you.

The rpool only supports mirrored redundancy, and you can't add more
vdevs to it, so the ability to grow it is limited.

It's also good practice to keep your OS install separate from your
data for maintenance reasons. For my home server, the rpool contains
only the OS install and everything else is in a separate pool.

-B

-- 
Brandon High : bh...@freaks.com


Re: [zfs-discuss] Secure delete?

2010-04-12 Thread Eric D. Mudama

On Mon, Apr 12 at 10:50, Bob Friesenhahn wrote:

> On Mon, 12 Apr 2010, Tomas Ögren wrote:
>
>> For flash to overwrite a block, it needs to clear it first.. so yes,
>> clearing it out in the background (after erasing) instead of just before
>> the timing critical write(), you can make stuff go faster.
>
> Yes of course.  Properly built SSDs include considerable extra space
> to support wear leveling, and this same space may be used to store
> erased blocks.  A block which is "overwritten" can simply be written
> to a block allocated from the extra free pool, and the existing block
> can be re-assigned to the free pool and scheduled for erasure.  This
> is a fairly simple recirculating algorithm which just happens to also
> assist with wear management.


The point originally made is that if you eventually write to every LBA
on a drive without TRIM support, your "considerable extra space" will
only include the extra physical blocks that the manufacturer provided
when they sold you the device, and for which you are paying.

The advantage of TRIM, even in high end SSDs, is that it allows you to
effectively have additional "considerable extra space" available to
the device for garbage collection and wear management when not all
sectors are in use on the device.

For most users, with anywhere from 5-15% of their device unused, this
difference is significant and can improve performance greatly in some
workloads.  Without TRIM, the device has no way to use this space for
anything but tracking the data that is no longer active.

Based on the above, I think TRIM has the potential to help every SSD,
not just the "cheap" SSDs.

--
Eric D. Mudama
edmud...@mail.bounceswoosh.org



Re: [zfs-discuss] Create 1 pool from 3 exising pools in mirror configuration

2010-04-12 Thread Daniel Carosone
On Mon, Apr 12, 2010 at 06:17:47PM -0500, Harry Putnam wrote:
> But, I'm too unskilled in solaris and zfs admin to be risking a total
> melt down if I try that before gaining a more thorough understanding.

Grab virtualbox or something similar and set yourself up a test
environment.  In general, and for you in particular, you will learn
the most this way - including learning what not to fear. 

You can also experiment with test pools in files, using a file per
vdev instead of a disk per vdev.   There's no need for these vdevs to
be especially large, in order to practice things like attaching
mirrors or sending between pools.
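For example, a throwaway mirror pool backed by files (paths and sizes are
arbitrary; 64MB is the minimum vdev size):

  mkfile 256m /var/tmp/vdev0 /var/tmp/vdev1
  zpool create testpool mirror /var/tmp/vdev0 /var/tmp/vdev1
  zpool status testpool
  (experiment with attach/detach/add/send here)
  zpool destroy testpool
  rm /var/tmp/vdev0 /var/tmp/vdev1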

> If its possible to add a mirrored set as a vdev to a zpool like what
> seems to be happening in (3) above, why wouldn't I just add the two
> new disks as mirrored vdev to z2 to start off, rather than additional
> mirrors, and never remove the original disks of z2.

If you have enough ports and bays for all the drives, sure.  There was
an assumption from your earlier messages that the 1.5's were to
replace the original drives.  This involves the same consolidation
steps as before, with an add of the new 1.5TB mirror set as an
additional step, basically anywhere in the sequence.

Oh, another thing, just to make sure before you start, since this is
evidently older hardware: are you running a 32-bit or 64-bit kernel?
The 32-bit kernel won't use drives larger than 1TB.

> So having some data on rpool (besides the OS I mean) is not
> necessarily a bad thing then?

Not at all; laptops would be screwed otherwise.

--
Dan.




Re: [zfs-discuss] ZFS Send/Receive Question

2010-04-12 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> 
> I am trying to duplicate a filesystem from one zpool to another zpool.
> I don't care so much about snapshots on the destination side...I am
> more trying to duplicate how RSYNC would copy a filesystem, and then
> only copy incrementals from the source side to the destination side in
> subsequent runs until I can complete a switch-over.  What would be some
> examples of the syntax in zfs send/receive?


Sending from a pool called "tank" on "firsthost" to a pool called "tank" on
"somehost."  Ensure "somehost" has a pool called "tank" and a zfs filesystem
called "firsthost-tank" inside that pool.
Forgive the prolific occurrences of "set readonly=on" below.  I just made
that happen on every send, unconditionally.

zfs snapshot tank@forsync-2010-02-06-00-07-18
zfs send tank@forsync-2010-02-06-00-07-18 | ssh somehost 'zfs receive -F tank/firsthost-tank@forsync-2010-02-06-00-07-18'
ssh somehost 'zfs set readonly=on tank/firsthost-tank'

zfs snapshot tank@forsync-2010-02-07-00-01-05
zfs send -i tank@forsync-2010-02-06-00-07-18 tank@forsync-2010-02-07-00-01-05 | ssh somehost 'zfs receive tank/firsthost-tank@forsync-2010-02-07-00-01-05'
ssh somehost 'zfs set readonly=on tank/firsthost-tank'




Re: [zfs-discuss] Create 1 pool from 3 exising pools in mirror configuration

2010-04-12 Thread Harry Putnam
Brandon High  writes:

> On Mon, Apr 12, 2010 at 7:38 AM, Harry Putnam  wrote:
>> But as someone suggested it might be better to get two more bigger
>> drives.  1t or 1.5t would handle all my data on one pair.
>>
>> Then I guess after moving all the data to a single zpool made up of
>> those 2 new disks, I could then add the freed up drives as vdevs to
>> it?

First off, I'm sorry to keep on with what are actually theoretical
questions, since I'm too chicken to work along with this with real
experiments. It may get a little irritating answering things that
might be obvious if I just tried this stuff as I go along.

But, I'm too unskilled in solaris and zfs admin to be risking a total
melt down if I try that before gaining a more thorough understanding.

> It's probably even easier than that if you have enough ports and bays
> for two more drives. What you'd have to do, roughly, is:
>
> 1. Add both of the new drives as additional mirrors of the z2 pool.
> Wait for resilver to complete.
> 2. Detach the original drives from z2. If autoexpand is set to on, z2
> should now have 1.5TB. Otherwise export / import the pool.
> 3. Add the original drives to z2 as a mirror vdev.

I'm getting a bit confused here.

If its possible to add a mirrored set as a vdev to a zpool like what
seems to be happening in (3) above, why wouldn't I just add the two
new disks as mirrored vdev to z2 to start off, rather than additional
mirrors, and never remove the original disks of z2.

> 4. Do a zfs send of all datasets from z3 to z2.
> 5. zpool destroy z3.
> 6. Add the drives from z3 to z2 a mirror vdev.

[...]

> If you're so inclined, you could move some datasets from rpool to z2
> to keep your rpool smaller.

So having some data on rpool (besides the OS I mean) is not
necessarily a bad thing then?



Re: [zfs-discuss] What happens when unmirrored ZIL log device is removed ungracefully

2010-04-12 Thread Edward Ned Harvey
> Carson Gaspar wrote:
> 
> Does anyone who understands the internals better than care to take a
> stab at what happens if:
> 
> - ZFS writes data to /dev/foo
> - /dev/foo looses power and the data from the above write, not yet
> flushed to rust (say a field tech pulls the wrong drive...)
> - /dev/foo powers back on (field tech quickly goes whoops and plugs it
> back in)

I can't answer as an "internals" guy, but I can say this:  I accidentally
knocked the power off my external drive, which contains a pool.  I quickly
reconnected it.  A few days later I noticed the machine was essentially
nonresponsive, and had to power cycle it.

It is possible that something else happened in the meantime, to put the
machine into a bad state, but at least it's highly suspect that this
happened after I kicked the power.

I never tested this scientifically.



Re: [zfs-discuss] Secure delete?

2010-04-12 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Eric D. Mudama
> 
> I believe the reason strings of bits "leak" on rotating drives you've
> overwritten (other than grown defects) is because of minute off-track
> occurances while writing (vibration, particles, etc.), causing
> off-center writes that can be recovered in the future with the right
> equipment.

That's correct.  In spindle drives, even if you "zero" the drive, the
imprecise positioning of the head is accurate enough for itself to later
read "zeroes" accurately from that location, but if the platters are removed
and placed into special high precision hardware, the data can be
forensically reconstructed by reading the slightly off-track traces.  This
process costs a few thousand per drive, and takes about a week.  So
"zero"ing the drive is good enough data destruction for nearly all people in
nearly all situations, but not good enough if a malicious person were
willing to pay thousands to recover the data.

BTW, during the above process, they have to make intelligent guesses about
when they're picking up formerly erased bits and when they're picking up
noise.  They have to know what to listen for.  So they can identify things
like "that sounds like a jpg file" and so on ... but if the data itself were
encrypted, and the empty space around the useful data were also encrypted,
and then the whole thing was then zeroed, it would be nearly impossible to
recover the encrypted data after zero'ing, because even the intended data
signal would be indistinguishable from noise.  And even if they were able to
get that ... they'd still have to decrypt it.


> Flash doesn't have this "analog positioning" problem.  While each
> electron well is effectively analog, there's no "best guess" work at
> locating the wells.

Although flash doesn't have the tracking issue, it does have a similar
stored history characteristic, which at least theoretically could be used to
read formerly erased data.  Assuming the storage elements are 3-bit
multilevel cells, it means the FG charge level should land into one of 8
bins ... ideally at the precise center of each bin each time.  But in
reality, it never will.  When programming or erasing the element, the tunnel
injection or release is held at a known value for a known time, sufficiently
long enough to bring the FG into the desired bin, but if the final charge
level lands within +/- 5% or even 10% or more, off center from the precise
center of the bin, that doesn't matter in normal operation.  Because it's
still clearly identifiable which bin it's in.  But if a flash device were
"zero"ed or "erased" (all 1's) and a forensic examiner could directly access
the word lines, then using instrumentation of higher precision than the
on-chip A2D's, the former data could be extracted, with a level of
confidence similar to the aforementioned off-track forensic data
reconstruction of a spindle drive.

Problem is, how to access the word lines.  Cuz generally speaking, they
didn't bring 'em out to pins of the chip.  So like I said ... theoretically
possible.  I expect the NSA or CIA could do it.  But the local drive
recovery mom & pop shop ... not so likely.



Re: [zfs-discuss] How to Catch ZFS error with syslog ?

2010-04-12 Thread Bob Friesenhahn

On Tue, 13 Apr 2010, Daniel Carosone wrote:


> On Mon, Apr 12, 2010 at 09:32:50AM -0600, Tim Haley wrote:
>
>> Try explicitly enabling fmd to send to syslog in
>> /usr/lib/fm/fmd/plugins/syslog-msgs.conf
>
> Wow, so useful, yet so well hidden I never even knew to look for it.
>
> Please can this be on by default? Please?


There are some situations where many reports may be sent per second so 
it is not necessarily a wise idea for this to be enabled by default.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] How to Catch ZFS error with syslog ?

2010-04-12 Thread Daniel Carosone
On Mon, Apr 12, 2010 at 09:32:50AM -0600, Tim Haley wrote:
> Try explicitly enabling fmd to send to syslog in
> /usr/lib/fm/fmd/plugins/syslog-msgs.conf

Wow, so useful, yet so well hidden I never even knew to look for it.

Please can this be on by default? Please?

--
Dan.




Re: [zfs-discuss] Create 1 pool from 3 exising pools in mirror configuration

2010-04-12 Thread Brandon High
On Mon, Apr 12, 2010 at 7:38 AM, Harry Putnam  wrote:
> But as someone suggested it might be better to get two more bigger
> drives.  1t or 1.5t would handle all my data on one pair.
>
> Then I guess after moving all the data to a single zpool made up of
> those 2 new disks, I could then add the freed up drives as vdevs to
> it?

It's probably even easier than that if you have enough ports and bays
for two more drives. What you'd have to do, roughly, is:

1. Add both of the new drives as additional mirrors of the z2 pool.
Wait for resilver to complete.
2. Detach the original drives from z2. If autoexpand is set to on, z2
should now have 1.5TB. Otherwise export / import the pool.
3. Add the original drives to z2 as a mirror vdev.
4. Do a zfs send of all datasets from z3 to z2.
5. zpool destroy z3.
6. Add the drives from z3 to z2 as a mirror vdev.

Your z2 should now have about 2.75TB space.
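A rough command sketch of those steps, with hypothetical device names
(c5* = current z2 mirror, c6* = the new drives, c7* = current z3 mirror):

  zpool set autoexpand=on z2          (so z2 grows once the small disks leave;
                                       otherwise export/import afterwards)
  zpool attach z2 c5t0d0 c6t0d0       (step 1: new drives join the mirror;
  zpool attach z2 c5t1d0 c6t1d0        wait for the resilver to finish)
  zpool detach z2 c5t0d0              (step 2)
  zpool detach z2 c5t1d0
  zpool add z2 mirror c5t0d0 c5t1d0   (step 3: old pair becomes a second vdev)
  zfs snapshot -r z3@migrate          (step 4)
  zfs send -R z3@migrate | zfs receive z2/from-z3
  zpool destroy z3                    (step 5)
  zpool add z2 mirror c7t0d0 c7t1d0   (step 6: z3 pair becomes a third vdev)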

If you're so inclined, you could move some datasets from rpool to z2
to keep your rpool smaller.

-B

-- 
Brandon High : bh...@freaks.com


Re: [zfs-discuss] What happens when unmirrored ZIL log device is removed ungracefully

2010-04-12 Thread Carson Gaspar

Carson Gaspar wrote:

Miles Nordin wrote:

"re" == Richard Elling  writes:



How do you handle the case when a hotplug SATA drive is powered off
unexpectedly with data in its write cache?  Do you replay the writes, 
or do they go down the ZFS hotplug write hole?


If zfs never got a positive response to a cache flush, that data is 
still in memory and will be re-written. Unless I greatly misunderstand 
how ZFS works...


If the drive _lies_ about a cache flush, you're screwed (well, you can 
probably roll back a few TXGs...). Don't buy broken drives / bridge 
chipsets.


Hrm... thinking about this some more, I'm not sure what happens if the 
drive comes _back_ after a power loss, quickly enough that ZFS is never 
told about the disappearance (assuming that can happen without a human 
cfgadm'ing it back online - I don't know).


Does anyone who understands the internals better than I do care to take a 
stab at what happens if:


- ZFS writes data to /dev/foo
- /dev/foo loses power and the data from the above write, not yet 
flushed to rust (say a field tech pulls the wrong drive...)
- /dev/foo powers back on (field tech quickly goes whoops and plugs it 
back in)


In the case of a redundant zpool config, when will ZFS notice the 
uberblocks are out of sync and repair? If this is a non-redundant zpool, 
how does the response differ?


--
Carson




Re: [zfs-discuss] What happens when unmirrored ZIL log device is removed ungracefully

2010-04-12 Thread Carson Gaspar

Miles Nordin wrote:

"re" == Richard Elling  writes:



How do you handle the case when a hotplug SATA drive is powered off
unexpectedly with data in its write cache?  Do you replay the writes, 
or do they go down the ZFS hotplug write hole?


If zfs never got a positive response to a cache flush, that data is 
still in memory and will be re-written. Unless I greatly misunderstand 
how ZFS works...


If the drive _lies_ about a cache flush, you're screwed (well, you can 
probably roll back a few TXGs...). Don't buy broken drives / bridge 
chipsets.


--
Carson



Re: [zfs-discuss] What happens when unmirrored ZIL log device is removed ungracefully

2010-04-12 Thread Miles Nordin
> "re" == Richard Elling  writes:
> "dc" == Daniel Carosone  writes:

re> In general, I agree. How would you propose handling nested
re> mounts?

force-unmount them.  (so that they can be manually mounted elsewhere,
if desired, or even in the same place with the middle filesystem
missing and empty directories in between.  In the latter case the nfs
fsid should stay the same so that hard-mounted clients can continue
once a sysadmin forces the remount.  Remember, hard-mounted NFS
clients will do this even hours or days later, and this behavior can
be extremely useful to a batch cluster that's hard to start, or even
just someone who doesn't want to lose his last edit.)

And make force-mounting actually work like it does on Mac OS.

dc> Please look at the pool property "failmode".

It doesn't work, though.  We've been over this.  Failmode applies
after it's decided that the drive is failed, but it can take an
arbitrary time---minutes, hours, or forever---for an underlying driver
to report that a drive is failed up to ZFS, and until then (a) you get
``wait'' no matter what you picked, and (b) commands like 'zpool
status' hang for all pools, where in a resiliently-designed system
they would hang for no pools especially not the pool affected by the
unresponsive device.  One might reasonably want a device state like
HUNG or SLOW or >30SEC in 'zpool status', along with the ability to
'zpool offline' any device at any time and, when doing so, cancel all
outstanding commands to that device to zfs's view as if they'd gotten
failures from the driver even though they're still waiting for
responses from the driver.  That device state doesn't exist partly
because 'zpool status' isn't meant to work well enough to ever return
such a state.  'failmode' is not a real or complete answer so long as
we agree it's reasonable to expect maintenance commands to work all
the time and not freeze up for intervals of 180sec or more.  I understand
most Unixes do act this way, not just Solaris, but it's really not good
enough.

dc> The other part of the issue, when failmode is set to the
dc> default "wait", relates to lower-level drivers and subsystems
dc> recovering reliably to things like removable disks reappearing
dc> after removal. There's surely room for improvement in some
dc> cases there, and perhaps your specific chipsets

How do you handle the case when a hotplug SATA drive is powered off
unexpectedly with data in its write cache?  Do you replay the writes, 
or do they go down the ZFS hotplug write hole?

I don't think this side of the issue is dependent on ``specific
chipsets''.




Re: [zfs-discuss] Secure delete?

2010-04-12 Thread Mattias Pantzare
On Mon, Apr 12, 2010 at 19:19, David Magda  wrote:
> On Mon, April 12, 2010 12:28, Tomas Ögren wrote:
>> On 12 April, 2010 - David Magda sent me these 0,7K bytes:
>>
>>> On Mon, April 12, 2010 10:48, Tomas Ögren wrote:
>>>
>>> > For flash to overwrite a block, it needs to clear it first.. so yes,
>>> > clearing it out in the background (after erasing) instead of just
>>> > before the timing critical write(), you can make stuff go faster.
>>>
>>> Except that ZFS does not overwrite blocks because it is copy-on-write.
>>
>> So CoW will enable infinite storage, so you never have to write on the
>> same place again? Cool.
>
> Your comment was regarding making write()s go faster by pre-clearing
> unused blocks so there's always writable blocks available. Because ZFS
> doesn't go to the same LBAs when writing data, the SSD doesn't have to
> worry about read-modify-write circumstances like it has to with
> traditional file systems.
>
> Given that ZFS probably would not have to go back to "old" blocks until
> it's reached the end of the disk, that should give the SSDs' firmware
> plenty of time to do block-remapping and background erasing--something
> that's done now anyway regardless of whether an SSD supports TRIM or not.
> You don't need TRIM to make ZFS go fast, though it doesn't hurt.

Why would the disk care about if the block was written recently? There
is old data on it that has to be preserved anyway. The SSD does not
know if the old data was important.

ZFS will overwrite just as any other filesystem.

The only thing that makes ZFS SSD friendly is that it tries to make
large writes. But that only works if you have few synchronous writes.


[zfs-discuss] ZFS Send/Receive Question

2010-04-12 Thread Robert Loper
I am trying to duplicate a filesystem from one zpool to another zpool.  I
don't care so much about snapshots on the destination side...I am more
trying to duplicate how RSYNC would copy a filesystem, and then only copy
incrementals from the source side to the destination side in subsequent runs
until I can complete a switch-over.  What would be some examples of the
syntax in zfs send/receive?

-- 
Robert Loper
rlo...@gmail.com


Re: [zfs-discuss] snapshots taking too much space

2010-04-12 Thread Arne Jansen
Paul Archer wrote:
> 
> Because it's easier to change what I'm doing than what my DBA does, I
> decided that I would put rsync back in place, but locally. So I changed
> things so that the backups go to a staging FS, and then are rsync'ed
> over to another FS that I take snapshots on. The only problem is that
> the snapshots are still in the 500GB range.
> 
> So, I need to figure out why these snapshots are taking so much more
> room than they were before.
> 
> This, BTW, is the rsync command I'm using (and essentially the same
> command I was using when I was rsync'ing from the NetApp):
> 
> rsync -aPH --inplace --delete /staging/oracle_backup/
> /backups/oracle_backup/

Try adding --no-whole-file to rsync. rsync disables block-by-block
comparison if used locally by default.
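That is, the same command from your message with the extra flag:

rsync -aPH --inplace --no-whole-file --delete /staging/oracle_backup/ /backups/oracle_backup/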

--Arne

> 
> 
> 
> This is the old system (rsync'ing from a NetApp and taking snapshots):
> zfs list -t snapshot -r bpool/snapback
> NAME   USED  AVAIL  REFER  MOUNTPOINT
> ...
> bpool/snapback@20100310-182713   53.7G  -   868G  -
> bpool/snapback@20100312-000318   59.8G  -   860G  -
> bpool/snapback@20100312-182552   54.0G  -   840G  -
> bpool/snapback@20100313-184834   71.7G  -   884G  -
> bpool/snapback@20100314-123024   17.5G  -   832G  -
> bpool/snapback@20100315-173609   72.6G  -   891G  -
> bpool/snapback@20100316-165527   24.3G  -   851G  -
> bpool/snapback@20100317-171304   56.2G  -   884G  -
> bpool/snapback@20100318-170250   50.9G  -   865G  -
> bpool/snapback@20100319-181131   53.9G  -   874G  -
> bpool/snapback@20100320-183617   80.8G  -   902G  -
> ...
> 
> 
> 
> This is from the new system (backing up directly to one volume,
> rsync'ing to and snapshotting another one):
> 
> r...@backup02:~# zfs list -t snapshot -r bpool/backups/oracle_backup
> NAME                                          USED  AVAIL  REFER  MOUNTPOINT
> bpool/backups/oracle_backup@20100411-023130   479G      -   681G  -
> bpool/backups/oracle_backup@20100411-104428   515G      -   721G  -
> bpool/backups/oracle_backup@20100412-144700      0      -   734G  -
> 
> 
> Thanks for any help,
> 
> Paul


Re: [zfs-discuss] Secure delete?

2010-04-12 Thread Bob Friesenhahn

On Mon, 12 Apr 2010, James Van Artsdalen wrote:


TRIM is not a Windows 7 command but rather a device command.


I only called it the "Windows 7 TRIM command" since that is how almost 
all of the original reports in the media described it.  It seems best 
to preserve this original name (as used by media experts) when 
discussing the feature on a Solaris list. ;-)


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] Secure delete?

2010-04-12 Thread James Van Artsdalen
My point is not to advocate the TRIM command - those issues are already 
well-known - but rather to suggest that the code that sends TRIM is also a good 
place to securely erase data on other media, such as a hard disk.

TRIM is not a Windows 7 command but rather a device command.  FreeBSD's CAM 
layer also provides TRIM support but I don't think any of the filesystems issue 
the request yet.


[zfs-discuss] snapshots taking too much space

2010-04-12 Thread Paul Archer

I've got a bit of a strange problem with snapshot sizes. First, some
background:

For ages our DBA backed up all the company databases to a directory NFS
mounted from a NetApp filer. That directory would then get dumped to tape.

About a year ago, I built an OpenSolaris (technically Nexenta) machine with 24
x 1.5TB drives, for about 24TB of usable space. I am using this to backup OS
images using backuppc.

I was also backing up the DBA's backup volume from the NetApp to the (ZFS)
backup server. This is a combination of rsync + snapshots. The snapshots were
using about 50GB/day. The backup volume is about 600GB total, so this
wasn't bad, especially on a box with 24TB of space available.

I decided to cut out the middleman, and save some of that expensive NetApp
disk space, by having the DBA backup directly to the backup server. I
repointed the NFS mounts on our DB servers to point to the backup server
instead of the NetApp. Then I ran a simple cron job to snapshot that ZFS
filesystem daily.

My problem is that the snapshots started taking around 500GB instead of 50GB.
After a bit of thinking, I realized that the backup system my DBA was using
must have been writing new files and moving them into place, or possibly 
writing a whole new file even if only part changed.
I think this is the problem because ZFS never overwrites files in place. 
Instead it would allocate new blocks. But rsync does a byte-by-byte 
comparison, and only updates the blocks that have changed.


Because it's easier to change what I'm doing than what my DBA does, I decided 
that I would put rsync back in place, but locally. So I changed things so that 
the backups go to a staging FS, and then are rsync'ed over to another FS that 
I take snapshots on. The only problem is that the snapshots are still in the 
500GB range.


So, I need to figure out why these snapshots are taking so much more room than 
they were before.


This, BTW, is the rsync command I'm using (and essentially the same command I 
was using when I was rsync'ing from the NetApp):


rsync -aPH --inplace --delete /staging/oracle_backup/ /backups/oracle_backup/



This is the old system (rsync'ing from a NetApp and taking snapshots):
zfs list -t snapshot -r bpool/snapback
NAME   USED  AVAIL  REFER  MOUNTPOINT
...
bpool/snapback@20100310-182713   53.7G  -   868G  -
bpool/snapback@20100312-000318   59.8G  -   860G  -
bpool/snapback@20100312-182552   54.0G  -   840G  -
bpool/snapback@20100313-184834   71.7G  -   884G  -
bpool/snapback@20100314-123024   17.5G  -   832G  -
bpool/snapback@20100315-173609   72.6G  -   891G  -
bpool/snapback@20100316-165527   24.3G  -   851G  -
bpool/snapback@20100317-171304   56.2G  -   884G  -
bpool/snapback@20100318-170250   50.9G  -   865G  -
bpool/snapback@20100319-181131   53.9G  -   874G  -
bpool/snapback@20100320-183617   80.8G  -   902G  -
...



This is from the new system (backing up directly to one volume, rsync'ing to 
and snapshotting another one):


r...@backup02:~# zfs list -t snapshot -r bpool/backups/oracle_backup
NAME  USED  AVAIL  REFER  MOUNTPOINT
bpool/backups/oracle_backup@20100411-023130   479G      -   681G  -
bpool/backups/oracle_backup@20100411-104428   515G      -   721G  -
bpool/backups/oracle_backup@20100412-144700      0      -   734G  -
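For anyone digging into this, one way to break the usage down per dataset
(assuming the pool version is new enough to have the usedby* properties):

zfs get -r used,usedbysnapshots,usedbydataset bpool/backups/oracle_backup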


Thanks for any help,

Paul


Re: [zfs-discuss] Secure delete?

2010-04-12 Thread David Magda
On Mon, April 12, 2010 12:28, Tomas Ögren wrote:
> On 12 April, 2010 - David Magda sent me these 0,7K bytes:
>
>> On Mon, April 12, 2010 10:48, Tomas Ögren wrote:
>>
>> > For flash to overwrite a block, it needs to clear it first.. so yes,
>> > clearing it out in the background (after erasing) instead of just
>> > before the timing critical write(), you can make stuff go faster.
>>
>> Except that ZFS does not overwrite blocks because it is copy-on-write.
>
> So CoW will enable infinite storage, so you never have to write on the
> same place again? Cool.

Your comment was regarding making write()s go faster by pre-clearing
unused blocks so there's always writable blocks available. Because ZFS
doesn't go to the same LBAs when writing data, the SSD doesn't have to
worry about read-modify-write circumstances like it has to with
traditional file systems.

Given that ZFS probably would not have to go back to "old" blocks until
it's reached the end of the disk, that should give the SSDs' firmware
plenty of time to do block-remapping and background erasing--something
that's done now anyway regardless of whether an SSD supports TRIM or not.
You don't need TRIM to make ZFS go fast, though it doesn't hurt.

There will be no "timing critical" instances as long as there is a decent
amount of free space available and ZFS can simply keep doing an LBA++.
SSDs worked fine without TRIM, it's just that command helps them work more
efficiently.




Re: [zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?

2010-04-12 Thread Kyle McDonald
On 4/12/2010 9:10 AM, Willard Korfhage wrote:
> I upgraded to the latest firmware. When I rebooted the machine, the pool was 
> back, with no errors. I was surprised.
>
> I will work with it more, and see if it stays good. I've done a scrub, so now 
> I'll put more data on it and stress it some more.
>
> If the firmware upgrade fixed everything, then I've got  a question about 
> which I am better off doing: keep it as-is, with the raid card providing 
> redundancy, or turn it all back into pass-through drives and let ZFS handle 
> it, making the Areca card just a really expensive way of getting a bunch of 
> SATA interfaces?
>   

As one of the other posters mentioned, there may be a third way that
might give you something close to "the best of both worlds".

Try using the Areca card to make 12 single-disk RAID 0 LUNs, and then
use those in ZFS.
I'm not sure of the definition of 'passthrough', but if it disables any
battery-backed cache that the card may have, then setting up 12 HW
RAID LUNs instead should give you an improvement by allowing the
card to cache writes.

The one downside of doing this vs. something more like 'jbod' is that if
the controller dies you will need to move the disks to another Areca
controller, whereas with 12 'jbod' connections you could move them to
pretty much any controller you wanted.

 -Kyle



Re: [zfs-discuss] Secure delete?

2010-04-12 Thread Mattias Pantzare
>> OpenSolaris needs support for the TRIM command for SSDs.  This command is
>> issued to an SSD to indicate that a block is no longer in use and the SSD
>> may erase it in preparation for future writes.
>
> There does not seem to be very much `need' since there are other ways that a
> SSD can know that a block is no longer in use so it can be erased.  In fact,
> ZFS already uses an algorithm (COW) which is friendly for SSDs.

What ways would that be?


Re: [zfs-discuss] Secure delete?

2010-04-12 Thread Tomas Ögren
On 12 April, 2010 - David Magda sent me these 0,7K bytes:

> On Mon, April 12, 2010 10:48, Tomas Ögren wrote:
> > On 12 April, 2010 - Bob Friesenhahn sent me these 0,9K bytes:
> >
> >> Zfs is designed for high thoughput, and TRIM does not seem to improve
> >> throughput.  Perhaps it is most useful for low-grade devices like USB
> >> dongles and compact flash.
> >
> > For flash to overwrite a block, it needs to clear it first.. so yes,
> > clearing it out in the background (after erasing) instead of just before
> > the timing critical write(), you can make stuff go faster.
> 
> Except that ZFS does not overwrite blocks because it is copy-on-write.

So CoW will enable infinite storage, so you never have to write on the
same place again? Cool.

/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se


Re: [zfs-discuss] Areca ARC-1680 on OpenSolaris 2009.06?

2010-04-12 Thread Dave Pooser
> What do you mean by overpromised and underdelivered?

Well, when I did a quick Google search this
 was one of the first results I got. (I know, a
different card-- but the same company, and if they fudge compatibility
information on one product)
-- 
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com




Re: [zfs-discuss] Secure delete?

2010-04-12 Thread Bob Friesenhahn

On Mon, 12 Apr 2010, David Magda wrote:


Except that ZFS does not overwrite blocks because it is copy-on-write.


At some time in the (possibly distant) future the ZFS block might 
become free and then the Windows 7 TRIM command could be used to try 
to pre-erase it.  This might help an intermittent benchmark.  Of 
course, the background TRIM commands might clog other on-going 
operations so it might hurt the benchmark.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] Secure delete?

2010-04-12 Thread Bob Friesenhahn

On Mon, 12 Apr 2010, Tomas Ögren wrote:


> For flash to overwrite a block, it needs to clear it first.. so yes,
> clearing it out in the background (after erasing) instead of just before
> the timing critical write(), you can make stuff go faster.


Yes of course.  Properly built SSDs include considerable extra space 
to support wear leveling, and this same space may be used to store 
erased blocks.  A block which is "overwritten" can simply be written 
to a block allocated from the extra free pool, and the existing block 
can be re-assigned to the free pool and scheduled for erasure.  This 
is a fairly simple recirculating algorithm which just happens to 
also assist with wear management.


Filesystem blocks are rarely aligned and sized to match underlying 
FLASH device blocks so FLASH devices would need to implement fancy 
accounting in order to decide when they should actually erase a FLASH 
block.  Erasing a FLASH block may require moving some existing data 
which was still not erased.  It is much easier to allocate a 
completely fresh block, update it as needed, and use some sort of 
ordered "atomic" operation to exchange the blocks so the data always 
exists in some valid state.  Without care, existing data which should 
not be involved in the write may be destroyed due to a power failure.


This is why it is not extremely useful for Solaris to provide support 
for the "Windows 7 TRIM" command.


Really low-grade devices might not have much smarts or do very good 
wear leveling, and these devices might benefit from the Windows 7 TRIM 
command.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] Secure delete?

2010-04-12 Thread David Magda
On Mon, April 12, 2010 10:48, Tomas Ögren wrote:
> On 12 April, 2010 - Bob Friesenhahn sent me these 0,9K bytes:
>
>> Zfs is designed for high thoughput, and TRIM does not seem to improve
>> throughput.  Perhaps it is most useful for low-grade devices like USB
>> dongles and compact flash.
>
> For flash to overwrite a block, it needs to clear it first.. so yes,
> clearing it out in the background (after erasing) instead of just before
> the timing critical write(), you can make stuff go faster.

Except that ZFS does not overwrite blocks because it is copy-on-write.




Re: [zfs-discuss] How to Catch ZFS error with syslog ?

2010-04-12 Thread Tim Haley

On 04/12/10 09:05 AM, J James wrote:

> I have a simple mirror pool with 2 disks. I pulled out one disk to simulate a 
> failed drive. zpool status shows that the pool is in DEGRADED state.
>
> I want syslog to log these type of ZFS errors. I have syslog running and 
> logging all sorts of error to a log server. But this failed disk in ZFS pool 
> did not generate any syslog messages.
>
> ZFS diagnosists engine are online as seen bleow.
>
> hrs1zgpprd1# fmadm config | grep zfs
> zfs-diagnosis 1.0 active ZFS Diagnosis Engine
> zfs-retire 1.0 active ZFS Retire Agent
>
> So, why is it not generating any syslog messages?


Try explicitly enabling fmd to send to syslog in
/usr/lib/fm/fmd/plugins/syslog-msgs.conf
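After editing that file, a quick sketch of how to apply and verify the change
(standard FMA commands; the property names themselves are documented in the
.conf file):

  svcadm restart svc:/system/fmd:default
  fmadm faulty      (current diagnosed faults)
  fmdump -v         (fault log; new entries should now also show up in syslog)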

-tim




[zfs-discuss] How to Catch ZFS error with syslog ?

2010-04-12 Thread J James
I have a simple mirror pool with 2 disks. I pulled out one disk to simulate a 
failed drive. zpool status shows that the pool is in the DEGRADED state.

I want syslog to log these types of ZFS errors. I have syslog running and 
logging all sorts of errors to a log server. But this failed disk in the ZFS 
pool did not generate any syslog messages.

The ZFS diagnosis engines are online, as seen below.

hrs1zgpprd1# fmadm config | grep zfs
zfs-diagnosis 1.0 active ZFS Diagnosis Engine
zfs-retire 1.0 active ZFS Retire Agent

So, why is it not generating any syslog messages?


Re: [zfs-discuss] Secure delete?

2010-04-12 Thread Tomas Ögren
On 12 April, 2010 - Bob Friesenhahn sent me these 0,9K bytes:

> On Sun, 11 Apr 2010, James Van Artsdalen wrote:
>
>> OpenSolaris needs support for the TRIM command for SSDs.  This command 
>> is issued to an SSD to indicate that a block is no longer in use and 
>> the SSD may erase it in preparation for future writes.
>
> There does not seem to be very much `need' since there are other ways  
> that a SSD can know that a block is no longer in use so it can be  
> erased.  In fact, ZFS already uses an algorithm (COW) which is friendly 
> for SSDs.
>
> Zfs is designed for high thoughput, and TRIM does not seem to improve  
> throughput.  Perhaps it is most useful for low-grade devices like USB  
> dongles and compact flash.

For flash to overwrite a block, it needs to clear it first.. so yes,
clearing it out in the background (after erasing) instead of just before
the timing critical write(), you can make stuff go faster.

/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se


Re: [zfs-discuss] Secure delete?

2010-04-12 Thread Bob Friesenhahn

On Sun, 11 Apr 2010, James Van Artsdalen wrote:

> OpenSolaris needs support for the TRIM command for SSDs.  This 
> command is issued to an SSD to indicate that a block is no longer in 
> use and the SSD may erase it in preparation for future writes.


There does not seem to be very much `need' since there are other ways 
that a SSD can know that a block is no longer in use so it can be 
erased.  In fact, ZFS already uses an algorithm (COW) which is 
friendly for SSDs.


Zfs is designed for high throughput, and TRIM does not seem to improve 
throughput.  Perhaps it is most useful for low-grade devices like USB 
dongles and compact flash.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] Create 1 pool from 3 exising pools in mirror configuration

2010-04-12 Thread Harry Putnam
Daniel Carosone  writes:

> For Harry's benefit, the recipe we're talking about here is roughly as
> follows. Your pools z2 and z3, we will merge into z2.   and
>  are the current members of z3.

[...] snipped very handy outline

Thank you.  That kind of walk through is really helpful here.

I haven't actually done any of it yet, still cogitating.

Also though, for my lightish usage... as a home LAN NAS, I kind of wonder
if it's worth trying that.  It would be nice to make better use of the
space I have.

But as someone suggested it might be better to get two more bigger
drives.  1t or 1.5t would handle all my data on one pair.

Then I guess after moving all the data to a single zpool made up of
those 2 new disks, I could then add the freed up drives as vdevs to
it?

So if you have time and inclination another walk through scheme would
be well appreciated.

I think I'd just leave rpool clear out of it.  That mirror is made of
two 500GB IDE drives, which are now almost outdated by the move to SATA.

rpool consists of 2 500GB IDE drives.
z2 consists of 2 500GB SATA drives.
z3 consists of 2 750GB SATA drives.

So I guess I'd add 2 1.5TB SATA drives and create z1 from them.

I think I might know how to migrate all the data, or at least I'm
confident that with a little reading of the OpenSolaris docs I will be
able to do it.

I'm not clear how I'd go about adding the freed-up drives to z1, ending
up with 3 vdevs in z1, each a 2-disk mirror.

Or what commands would then stripe the data across them all.
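
From what I've read so far, I think the commands would look roughly like
this -- device names are made up, and corrections are very welcome:

  # create the new pool as a mirror of the two 1.5 TB drives
  pfexec zpool create z1 mirror c9t0d0 c9t1d0

  # copy the old pools over with recursive snapshots and send/receive
  pfexec zfs snapshot -r z2@migrate
  pfexec zfs send -R z2@migrate | pfexec zfs receive z1/from-z2
  pfexec zfs snapshot -r z3@migrate
  pfexec zfs send -R z3@migrate | pfexec zfs receive z1/from-z3

  # once the copies are verified, destroy the old pools to free their disks
  pfexec zpool destroy z2
  pfexec zpool destroy z3

  # add each freed pair as another mirror vdev; z1 ends up with 3 vdevs
  pfexec zpool add z1 mirror c7t0d0 c7t1d0
  pfexec zpool add z1 mirror c8t0d0 c8t1d0

If I understand correctly, no separate command is needed to stripe across
the vdevs: ZFS spreads new writes over all vdevs in a pool automatically,
although data written before the new vdevs were added stays where it is
until it is rewritten.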

Once done, I would have something close to 2.7 TB available on
z1.

I'm wondering whether, after completing the above migrations, I'd then
need to migrate whatever data is now kept on rpool onto the new z1
and just leave those 2 rpool disks for the OS, even though it would
waste a few hundred GB.  But maybe keeping the periodic hefty
deletes/writes off the system disks is worth it?

Or would there really be any good reason not to just leave some data
on rpool?  Maybe certain kinds of data that don't grow much to speak of.
Is it really that bad having some data stored on the OS (rpool)
drives?  I'm thinking of keeping system (C: drive) disk images of
the Windows LAN machines there, which would mean periodic hefty
deletes/writes, but the amount of data would not really grow too much.
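
If some of that data does end up staying on rpool, I suppose I could at
least fence it off in its own dataset with a quota (and maybe compression)
so the image dumps can never crowd out the OS.  Something roughly like
this, with made-up names and sizes:

  pfexec zfs create rpool/winimages
  pfexec zfs set quota=200G rpool/winimages
  pfexec zfs set compression=on rpool/winimages

That would keep the hefty deletes/writes confined to one dataset that is
easy to watch with 'zfs list' and easy to destroy or move later.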

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Secure delete?

2010-04-12 Thread Pablo Méndez Hernández
Hi James:

On Mon, Apr 12, 2010 at 06:45, James Van Artsdalen wrote:
> OpenSolaris needs support for the TRIM command for SSDs.  This command is 
> issued to an SSD to indicate that a block is no longer in use and the SSD may 
> erase it in preparation for future writes.

That's what this RFE is about:

6859245 Solaris needs to support the TRIM command for solid state drives (ssd)

> A SECURE_FREE dataset property might be added that says that when a block is 
> released to free space (and hence eligible for TRIM), ZFS should overwrite 
> the block to zeros (or better, ones).  If a dataset has such a property set 
> then no "stray" copies of the data exist in free space and deletion of the 
> file and snapshots is sufficient to remove all instances of the data.
>
> If a file exists before such a property is set that's a problem.  If it's 
> really important - and it might be in some cases because of legal mandates - 
> there could be a per-file flag SECURELY_FREED that is set on file creation 
> iff the dataset SECURE_FREE is set and is reset if the file is ever changed 
> while SECURE_FREE is clear - this indicates if any file data "escaped" into 
> free space at some point.  Finally an UNLINK_SECURE call would be needed to 
> avoid race conditions at the end so an app can be sure the data really was 
> securely erased.
>
> PS. It is faster for an SSD to write a block of 0xFF than 0 and it's possible 
> some might make that optimization.  That's why I suggest erase-to-ones rather 
> than erase-to-zero.

-- 

Pablo Méndez Hernández
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Secure delete?

2010-04-12 Thread Eric D. Mudama

On Sun, Apr 11 at 22:45, James Van Artsdalen wrote:


PS. It is faster for an SSD to write a block of 0xFF than 0 and it's
possible some might make that optimization.  That's why I suggest
erase-to-ones rather than erase-to-zero.


Do you have any data to back this up?  While I understand the
underlying hardware implementation of NAND, I am not sure SSDs would
bother optimizing for this case.  A block erase would be just as
effective at hiding data.

I believe the reason strings of bits "leak" on rotating drives you've
overwritten (other than grown defects) is minute off-track
occurrences while writing (vibration, particles, etc.), causing
off-center writes that can be recovered in the future with the right
equipment.

Flash doesn't have this "analog positioning" problem.  While each
electron well is effectively analog, there's no "best guess" work at
locating the wells.

--
Eric D. Mudama
edmud...@mail.bounceswoosh.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?

2010-04-12 Thread David Magda
On Mon, April 12, 2010 09:10, Willard Korfhage wrote:

> If the firmware upgrade fixed everything, then I've got a question about
> which I am better off doing: keeping it as-is, with the raid card providing
> redundancy, or turning it all back into pass-through drives and letting ZFS
> handle it, making the Areca card just a really expensive way of getting a
> bunch of SATA interfaces?

Unless there's a specific feature that the card provides, I'd say that ZFS
would give you more capabilities: scrubbing, reporting, recovery on
checksum errors, more efficient rebuilds (i.e., only copying blocks that
are used). If the hardware ever goes south, you'll also be able to
move the disks to any arbitrary machine and do a 'zpool import'.

At least for DAS, there are very few reasons to use fancy cards nowadays
(also true with Linux and LVM to a certain extent).
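
Roughly speaking, with the disks passed straight through the day-to-day
management looks like this (pool and device names here are made up):

  # ZFS-level redundancy instead of an Areca RAID set
  pfexec zpool create tank raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0

  # periodic end-to-end verification, with self-healing of checksum errors
  pfexec zpool scrub tank
  pfexec zpool status -v tank

  # if the controller or host dies, move the disks elsewhere and just:
  pfexec zpool import tank

With a single hardware-RAID LUN instead, ZFS can still detect corruption on
a scrub, but it has no redundant copy of its own to repair from.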


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?

2010-04-12 Thread Willard Korfhage
I upgraded to the latest firmware. When I rebooted the machine, the pool was 
back, with no errors. I was surprised.

I will work with it more, and see if it stays good. I've done a scrub, so now 
I'll put more data on it and stress it some more.

If the firmware upgrade fixed everything, then I've got a question about which
I am better off doing: keeping it as-is, with the raid card providing redundancy,
or turning it all back into pass-through drives and letting ZFS handle it, making
the Areca card just a really expensive way of getting a bunch of SATA interfaces?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?

2010-04-12 Thread Tonmaus
Upgrading the firmware is a good idea, as there are other issues with Areca
controllers that have only been solved recently; e.g., 1.42 is probably still
affected by a problem with SCSI labels that may cause problems importing a pool.

-Tonmaus
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?

2010-04-12 Thread Willard Korfhage
I was wondering if the controller itself has problems. My card's firmware is 
version 1.42, and the firmware on the website is up to 1.48.

I see that the firmware released last September says

Fix Opensolaris+ZFS to add device to mirror set in JBOD or passthrough mode

and

Fix SATA raid controller seagate HDD error handling

I'm not using mirroring, but I am using Seagate drives. Looks like I should do 
a firmware upgrade.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] CIFS in production and my experience so far, advice needed

2010-04-12 Thread charles
I am looking at OpenSolaris with ZFS and CIFS shares as an option for
large-scale production use with Active Directory.

I have successfully joined the OpenSolaris CIFS server to our Windows AD test
domain and created an SMB share that the Windows Server 2003 machine can see. I
have also created test users on the Windows server, specifying the profile and
home directory path to the SMB share on the OpenSolaris server. This creates a
home directory with the relevant ACLs on ZFS on the fly. All appears to work
fine. I could then in theory apply quotas on OpenSolaris if needed with the
rich, easy-to-use ZFS commands, and even tweak ACLs with /usr/bin/chmod.
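
As a rough sketch (pool, dataset and user names invented here), the per-user
setup would look something like:

  # share the home-directory tree over CIFS and give each user a dataset
  pfexec zfs create -o sharesmb=on tank/home
  pfexec zfs create tank/home/jsmith
  pfexec zfs set quota=10G tank/home/jsmith

though in my testing the home directories were actually created on the fly
by the Windows server rather than pre-created like this.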

In theory I could batch user creation on the Windows server, all home
directories would be on ZFS, and the world would be a better place.

This is all using the ephemeral user model on OpenSolaris, so only the SIDs
are stored permanently on OpenSolaris; the UIDs and usernames are 'just passing
through in memory', with the ACLs importantly remaining associated with the
SIDs. (This has been my experience with ACLs, but I am unsure how durable the
ACLs really are, considering they appear to be associated with temporary UIDs
on OpenSolaris if you interrogate with /usr/bin/ls -V.)
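
For example, this is the kind of interrogation and tweaking I mean (path and
user names invented):

  # show the NFSv4 ACL entries; the uid shown for a domain user is an
  # ephemeral mapping, while the stored identity should be the SID
  /usr/bin/ls -dV /tank/home/jsmith

  # add an explicit entry using the NFSv4 ACL syntax of Solaris chmod
  /usr/bin/chmod A+user:backupadmin:read_data/read_attributes:allow /tank/home/jsmith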

The scale is around 60,000 users all with home directories and a total of 40 
million files, soon to be increasing. Around 600 concurrent users.

This is where I stop to think: are OpenSolaris and CIFS designed to be robust
enough for use like this in production?

There are several OpenSolaris bug reports I have noted against CIFS, for
example that you cannot unmount a share because of 'device busy' (bug id
6819639).

This makes me worry about how robust it really is.

I would dearly love to have a ZFS fileserver for my Windows users, especially
as NTFS is not as scalable, as flexible, or, frankly, as nice.

I have been put off Solaris (10) and Samba because the config looks complex
and you appear to need local identities on the Solaris server, which would be a
major headache with 60,000 users. OpenSolaris and CIFS solve this issue
immediately with idmap and ephemeral users.
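
A quick sanity check of that model is to look at what idmap is actually
handing out; something like this (the exact idmap subcommands and flags may
vary by build, so treat this as a sketch):

  # name-based mapping rules, if any have been configured
  idmap list

  # mappings idmapd has produced so far (ephemeral UIDs/GIDs <-> SIDs)
  idmap dump -n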

What are people's thoughts on the viability of OpenSolaris in production?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?

2010-04-12 Thread Willard Korfhage
Just a message 7 hours earlier that an IRQ being shared by drivers with
different interrupt levels might result in reduced performance.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why would zfs have too many errors when underlying raid array is fine?

2010-04-12 Thread Tonmaus
Hi,

> I started off by setting up all the disks to be
> pass-through disks, and tried to make a raidz2 array
> using all the disks. It would work for a while, then
> suddenly every disk in the array would have too many
> errors and the system would fail.

I had exactly the same experience with my Areca controller. Actually, I
couldn't get it to work unless I put the whole controller in JBOD mode. Neither
12 x "RAID-0 arrays" with single disks nor pass-through was workable. I had
kernel panics and pool corruption all over the place, sometimes with, sometimes
without additional corruption messages from the Areca panel. I am not sure if
this relates to the rest of your problem, though.

Regards,

Tonmaus
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss