Re: [zfs-discuss] Using iSCSI on ZFS with non-native FS - How to backup.

2009-12-05 Thread Orvar Korvar
I think it should work. I have seen blog posts about ZFS, iSCSI and Macs. Just 
google a bit.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Transaction consistency of ZFS

2009-12-05 Thread nxyyt
The "rename trick" may not work here. Even if I renamed the file successfully, 
the data of the file may still reside in the memory instead of flushing back to 
the disk.  If I made any mistake here, please correct me. Thank you!

I'll try to find out whether ZFS always binds the same file to the same 
open transaction group. If so, I guess my assumption here would be true. 
It seems there is only one open transaction group at any time. Can anybody 
give me a definitive answer here?

The ZIL must be flushed back to disk in fsync() order, so the last append to 
the file should appear as the last transaction log entry in the ZIL for this 
file, I think. The assumption should still hold.

fsync or fdatasync may be too heavyweight for my case because it's a 
write-intensive workload. I hope that replicating the data to different machines 
will be a better way to protect it from power outages.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Transaction consistency of ZFS

2009-12-05 Thread Bob Friesenhahn

On Sat, 5 Dec 2009, Damon Atkins wrote:

If a power failure happens you will lose anything in the cache. So you 
could lose the entire file on power failure if the system is not 
busy (i.e. ZFS delays writes unless you do an fsync before closing 
the file).  I would still like to see a file system option "sync on 
close" or even "wait for txg on close".


A memory-mapped file may still be updated even after its file 
descriptor has been closed.  It may be updated as long as any of its 
pages remain mapped.  File updates due to updated pages are usually 
lazy unless msync() is used to flush the pages to backing store.  How 
do you propose that this would be handled?


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Transaction consistency of ZFS

2009-12-05 Thread Damon Atkins
If a power failure happens you will lose anything in the cache. So you could lose 
the entire file on power failure if the system is not busy (i.e. ZFS delays 
writes unless you do an fsync before closing the file).  I would still like to 
see a file system option "sync on close" or even "wait for txg on close".

One of the better methods is to create a temp file, e.g. ".download.filename", 
and rename it to "filename" when the download (or whatever) is successful; or 
create an extra empty file to say it has been completed, e.g. filename.dn. I 
prefer the rename trick.
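
A rough sketch of that pattern in shell ("fetch" and the plain sync call are 
just stand-ins for whatever actually produces and flushes the file):

  fetch > .download.filename        # write to a hidden temporary name first
  sync                              # crude stand-in for an fsync() of the temp file
  mv .download.filename filename    # rename is atomic within a filesystem, so
                                    # readers never see a partial "filename"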
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2009-12-05 Thread Chad Cantwell
I was under the impression that the problem affecting most of us was introduced 
much later than b104,
sometime between ~114 and ~118.  When I first started using my LSI 3081 cards, 
they had the IR firmware
on them, and it caused me all kinds of problems.  The disks showed up but I 
couldn't write to them, I
believe.  Eventually I found that I needed the IT firmware for it to work 
properly, which is what I
have used ever since, but maybe some builds do work with IR firmware?  I 
remember, then, when I was
originally trying to set them up with the IR firmware, Opensolaris saw my two 
cards as one device,
whereas with the IT firmware they were always mpt0 and mpt1.  It could also be 
that the IR firmware works with one card but not well when two cards are 
combined...

Chad

On Sat, Dec 05, 2009 at 02:47:55PM -0800, Calvin Morrow wrote:
> I found this thread after fighting the same problem in Nexenta which uses the 
> OpenSolaris kernel from b104.  Thankfully, I think I have (for the moment) 
> solved my problem.
> 
> Background:
> 
> I have an LSI 3081e-R (1068E based) adapter which experiences the same 
> disconnected command timeout error under relatively light load.  This card 
> connects to a Supermicro chassis using 2 MiniSAS cables to redundant 
> expanders that are attached to 18 SAS drives.  The card ran the latest IT 
> firmware (1.29?).
> 
> This server is a new install, and even installing from the CD to two disks in 
> a mirrored ZFS root would randomly cause the disconnect error.  The system 
> remained unresponsive until after a reboot.
> 
> I tried the workarounds mentioned in this thread, namely using "set 
> mpt:mpt_enable_msi = 0" and "set xpv_psm:xen_support_msi = -1" in 
> /etc/system.  Once I added those lines, the system never really became 
> unresponsive, however there were partial read and partial write messages that 
> littered dmesg.  At one point there appeared to be a disconnect error (cannot 
> confirm) that the system recovered from.
> 
> Eventually, I became desperate and flashed the IR (Integrated Raid) firmware 
> over the top of the IT firmware.  Since then, I have had no errors in dmesg 
> of any kind.
> 
> I even removed the workarounds from /etc/system and still have had no issues. 
>  The mpt driver is exceptionally quiet now.
> 
> I'm interested to know if anyone who has a 1068E based card is having these 
> problems using the IR firmware, or if they all seem to be IT (initiator 
> target) related.
> -- 
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2009-12-05 Thread Calvin Morrow
I found this thread after fighting the same problem in Nexenta which uses the 
OpenSolaris kernel from b104.  Thankfully, I think I have (for the moment) 
solved my problem.

Background:

I have an LSI 3081e-R (1068E based) adapter which experiences the same 
disconnected command timeout error under relatively light load.  This card 
connects to a Supermicro chassis using 2 MiniSAS cables to redundant expanders 
that are attached to 18 SAS drives.  The card ran the latest IT firmware 
(1.29?).

This server is a new install, and even installing from the CD to two disks in a 
mirrored ZFS root would randomly cause the disconnect error.  The system 
remained unresponsive until after a reboot.

I tried the workarounds mentioned in this thread, namely using "set 
mpt:mpt_enable_msi = 0" and "set xpv_psm:xen_support_msi = -1" in /etc/system.  
Once I added those lines, the system never really became unresponsive; however, 
there were partial read and partial write messages that littered dmesg.  At one 
point there appeared to be a disconnect error (cannot confirm) that the 
system recovered from.
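
For reference, a sketch of how such a workaround is typically applied (append 
the lines to /etc/system and reboot):

  printf 'set mpt:mpt_enable_msi = 0\nset xpv_psm:xen_support_msi = -1\n' >> /etc/system
  init 6    # reboot so the settings take effect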

Eventually, I became desperate and flashed the IR (Integrated Raid) firmware 
over the top of the IT firmware.  Since then, I have had no errors in dmesg of 
any kind.

I even removed the workarounds from /etc/system and still have had no issues.  
The mpt driver is exceptionally quiet now.

I'm interested to know if anyone who has a 1068E based card is having these 
problems using the IR firmware, or if they all seem to be IT (initiator target) 
related.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool import - device names not always updated?

2009-12-05 Thread Ragnar Sundblad

On 4 dec 2009, at 18.47, Cindy Swearingen wrote:

> Hi--
> 
> The problem with your test below was creating a pool by using the
> components from another pool. This configuration is not supported.
> 
> We don't have a lot of specific information about using volumes,
> other than for use as iSCSI and COMSTAR devices.
> 
> You might review our ZFS best practices guide, here, for guidelines on
> creating ZFS storage pools:
> 
> http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide
> 
> Cindy

Thank you! It was news to me that ZFS volume dsk/rdsk devices
aren't supported as containers for ZFS file systems. A bit
surprising, too. The ZFS Best Practices Guide doesn't say very
much about this.

I am even more surprised that a file in ZFS *is* supported as backing for
ZFS filesystems when volume devices aren't; that wasn't too obvious
to me.

Can Cindy or someone else please comment on what is and what isn't
supported of the following, which we currently use or plan to use:

- UFS in a ZFS volume, mounted locally?

- a ZFS volume, iSCSI exported (soon to be COMSTAR), locally imported
again, and with a ZFS in it locally mounted/imported?

Thanks!

/ragge


> 
> On 12/03/09 15:26, Ragnar Sundblad wrote:
>> Thank you Cindy for your reply!
>> On 3 dec 2009, at 18.35, Cindy Swearingen wrote:
>>> A bug might exist but you are building a pool based on the ZFS
>>> volumes that are created in another pool. This configuration
>>> is not supported and possible deadlocks can occur.
>> I had absolutely no idea that ZFS volumes weren't supported
>> as ZFS containers. Where can I find information about what
>> is and what isn't supported for ZFS volumes?
>>> If you can retry this example without building a pool on another
>>> pool, like using files to create a pool and can reproduce this,
>>> then please let me know.
>> I retried it with files instead, and it then worked exactly
>> as expected. (Also, it didn't anymore magically remember
>> locations of earlier found volumes in other directories for
>> import, with or without the sleeps.)
>> I don't know if it is of interest to anyone, but I'll
>> include the reworked file based test below.
>> /ragge
>> 
>> #!/bin/bash
>> set -e
>> set -x
>> mkdir /d
>> mkfile 1g /d/f1
>> mkfile 1g /d/f2
>> zpool create pool mirror /d/f1 /d/f2
>> zpool status pool
>> zpool export pool
>> mkdir /d/subdir1
>> mkdir /d/subdir2
>> mv /d/f1 /d/subdir1/
>> mv /d/f2 /d/subdir2/
>> zpool import -d /d/subdir1
>> zpool import -d /d/subdir2
>> zpool import -d /d/subdir1 -d /d/subdir2 pool
>> zpool status pool
>> # cleanup - remove the "# DELETEME_" part
>> # DELETEME_zpool destroy pool
>> # DELETEME_rm -rf /d
>> 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Transaction consistency of ZFS

2009-12-05 Thread Toby Thain


On 5-Dec-09, at 8:32 AM, nxyyt wrote:


Thank you very much for your quick response.

My question is that I want to figure out whether there is data loss
after a power outage. I have replicas on other machines so I can
recover from the data loss. But I need a way to know whether there
is data loss without comparing the different data replicas.


I suppose that if I append a footer to the end of the file before I close
it, I can detect the data loss by validating the footer. Is that a
workaround for me? Or is there a better alternative? In my
scenario, the file is append-only, with no in-place overwrite.


You seem to be looking for fsync() and/or fdatasync(); or, take  
advantage of existing systems with durable commits (e.g. [R]DBMS).


--Toby


--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] mpt errors on snv 127

2009-12-05 Thread Chad Cantwell
Hi all,

Unfortunately for me, there does seem to be a hardware component to my problem. 
 Although my rsync copied almost 4TB of
data with no iostat errors after going back to OpenSolaris 2009.06, I/O on one 
of my mpt cards did eventually hang, with
6 disk lights on and 2 off, until rebooting.  There are a few hardware changes 
made since the last time I did a full
backup, so it's possible that whatever problem was introduced didn't happen 
frequently enough in low i/o usage for
me to detect until now when I was reinstalling and copying massive amounts of 
data back.

The changes I had made since originally installing osol2009.06 several months 
ago are:

- stopped using the onboard Marvell Yukon2 ethernet driver (a 3rd-party 
driver) in favor of an Intel 1000 PT dual port, which necessitated an extra 
PCI-e slot, prompting the following item:
- swapped motherboards between 2 machines (they were similar though, with 
similar onboard hardware, and it shouldn't have been a major change).  
Originally it was an Asus P5Q Deluxe w/3 PCI-e slots, now it is a slightly 
older Asus P5W64 w/4 PCI-e slots.
- the Intel 1000 PT dual port card has been aggregated as aggr0 since it was 
installed (the older Yukon2 was a basic interface)

the above changes were made a while ago, before upgrading opensolaris to 
127, and things seemed to be working fine for at least 2-3 months with rsync 
updating (it never hung, had a fatal zfs error, or lost access to data 
requiring a reboot)

new changes since troubleshooting snv 127 mpt issues:
- upgrade LSI 3081 firmware from 1.28.2 (or was it .02) to 1.29, the latest.  
If this turns out to be an issue, I do have
the previous IT firmware that I was using before which I can flash back.

another, albeit unlikely, factor: when I originally copied all my data to my 
first opensolaris raidz2 pool, I didn't use rsync at all, I used netcat & tar, 
and only set up rsync later for updates.  Perhaps the huge initial single rsync 
of the large tree does something strange that the original initial netcat & tar 
copy did not (I know, unlikely, but I'm grasping at straws here to determine 
what has happened).

I'll work on ruling out the potential sources of hardware problems before I 
report any more on the mpt issues, since
my test case would probably confound things at this point.  I am affected by 
the mpt bugs since I would get the
timeouts almost constantly in snv 127+, but since I'm also apparently affected 
by some other unknown hardware issue,
my data on the mpt problems might lead people in the wrong direction at this 
point.

I will first try to go back to the non-aggregated yukon ethernet, remove the 
intel dual port pci-e network adapter,
then if the problem persists try half of my drives on each LSI controller 
individually to confirm if one controller
has a problem the other does not, or one drive in one set is causing a new 
problem to a particular controller.  I hope
to have some kind of answer at that point and not have to resort to motherboard 
swapping again.

Chad

On Thu, Dec 03, 2009 at 10:44:53PM -0800, Chad Cantwell wrote:
> I eventually performed a few more tests, adjusting some zfs tuning options 
> which had no effect, and trying the
> itmpt driver which someone had said would work, and regardless my system 
> would always freeze quite rapidly in
> snv 127 and 128a.  Just to double check my hardware, I went back to the 
> opensolaris 2009.06 release version, and
> everything is working fine.  The system has been running a few hours and 
> copied a lot of data and not had any
> trouble, mpt syslog events, or iostat errors.
> 
> One thing I found interesting, and I don't know if it's significant or not, 
> is that under the recent builds and
> under 2009.06, I had run "echo '::interrupts' | mdb -k" to check the 
> interrupts used.  (I don't have the printout
> handy for snv 127+, though).
> 
> I have a dual port gigabit Intel 1000 P PCI-e card, which shows up as e1000g0 
> and e1000g1.  In snv 127+, each of
> my e1000g devices shares an IRQ with my mpt devices (mpt0, mpt1) on the IRQ 
> listing, whereas in opensolaris
> 2009.06, all 4 devices are on different IRQs.  I don't know if this is 
> significant, but most of my testing when
> I encountered errors was data transfer via the network, so it could have 
> potentially been interfering with the
> mpt drivers when it was on the same IRQ.  The errors did seem to be less 
> frequent when the server I was copying
> from was linked at 100 instead of 1000 (one of my tests), but that is as 
> likely to be a result of the slower zpool
> throughput as it is to be related to the network traffic.
> 
> I'll probably stay with 2009.06 for now since it works fine for me, but I can 
> try a newer build again once some
> more progress is made in this area and people want to see if its fixed (this 
> machine is mainly to backup another
> array so it's not too big a deal to test later when the mpt drivers are 
> looking better and wipe again

Re: [zfs-discuss] ZFS send | verify | receive

2009-12-05 Thread Mike Gerdts
On Sat, Dec 5, 2009 at 11:32 AM, Bob Friesenhahn wrote:
> On Sat, 5 Dec 2009, dick hoogendijk wrote:
>
>> On Sat, 2009-12-05 at 09:22 -0600, Bob Friesenhahn wrote:
>>
>>> You can also stream into a gzip or lzop wrapper in order to obtain the
>>> benefit of incremental CRCs and some compression as well.
>>
>> Can you give an example command line for this option please?
>
> Something like
>
>  zfs send mysnapshot | gzip -c -3 > /somestorage/mysnap.gz
>
> should work nicely.  Zfs send sends to its standard output so it is just a
> matter of adding another filter program on its output.  This could be
> streamed over ssh or some other streaming network transfer protocol.
>
> Later, you can do 'gzip -t mysnap.gz' on the machine where the snapshot file
> is stored to verify that it has not been corrupted in storage or transfer.
>
> lzop (not part of Solaris) is much faster than gzip but can be used in a
> similar way since it is patterned after gzip.

It seems as though a similar filter could be created to generate and
inject an error-correcting code into the stream.  That is:

zfs send $snap | ecc -i > /somestorage/mysnap.ecc
ecc -o < /somestorage/mysnap.ecc | zfs receive ...

I'm not aware of an existing ecc program, but I can't imagine it
would be hard to create one.  There already seems to be an
implementation of Reed-Solomon encoding in ON that could likely be
used as a starting point.

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/vdev_raidz.c
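
In the meantime, one way to get similar protection for a stored stream might be 
the third-party par2 (Parchive) tool, which generates Reed-Solomon recovery data 
alongside the file (not part of Solaris; the paths and destination dataset below 
are just examples):

  zfs send $snap > /somestorage/mysnap.zfs
  par2 create -r10 /somestorage/mysnap.zfs.par2 /somestorage/mysnap.zfs   # ~10% parity data
  # later: verify/repair the stored stream before receiving it
  par2 repair /somestorage/mysnap.zfs.par2 &&
    zfs receive tank/restored < /somestorage/mysnap.zfs                   # destination is just an example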

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Using iSCSI on ZFS with non-native FS - How to backup.

2009-12-05 Thread Jens Vilstrup
Hi there.

I'm looking at moving my home server to ZFS and adding a second for backup 
purposes.
In the process of researching ZFS I noticed iSCSI.
I'm thinking of creating a zvol, sharing it with iSCSI and using it with my Mac(s).
In this scenario, the fs would obviously have to be HFS+.
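
For concreteness, a rough sketch of that part of the plan (the pool and volume 
names are made up; "shareiscsi" is the pre-COMSTAR target mechanism):

  zfs create -V 200G tank/macvol      # zvol that the Mac will format as HFS+
  zfs set shareiscsi=on tank/macvol   # export the zvol as an iSCSI target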

Now, my question is:
How would I go about replicating this non-native FS to the backup server?
Can I have snapshots of the zvol as if it were a native zfs filesystem?
If yes, do they take up as little space and overhead as "normal" snapshots?

Thanks,
Jens.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS send | verify | receive

2009-12-05 Thread Bob Friesenhahn

On Sat, 5 Dec 2009, dick hoogendijk wrote:


On Sat, 2009-12-05 at 09:22 -0600, Bob Friesenhahn wrote:


You can also stream into a gzip or lzop wrapper in order to obtain the
benefit of incremental CRCs and some compression as well.


Can you give an example command line for this option please?


Something like

  zfs send mysnapshot | gzip -c -3 > /somestorage/mysnap.gz

should work nicely.  Zfs send sends to its standard output so it is 
just a matter of adding another filter program on its output.  This 
could be streamed over ssh or some other streaming network transfer 
protocol.
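
For example, one possible remote variant (the host name and path are made up), 
keeping the gzip wrapper generated on the sending side:

  zfs send mysnapshot | gzip -c -3 | ssh backuphost 'cat > /somestorage/mysnap.gz'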


Later, you can do 'gzip -t mysnap.gz' on the machine where the 
snapshot file is stored to verify that it has not been corrupted in 
storage or transfer.


lzop (not part of Solaris) is much faster than gzip but can be used in 
a similar way since it is patterned after gzip.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs on ssd

2009-12-05 Thread Seth Heeren
Colin Raven wrote:
> On Sat, Dec 5, 2009 at 17:43, Seth Heeren wrote:
>
> Bob Friesenhahn wrote:
> > On Sat, 5 Dec 2009, Seth Heeren wrote:
> >>>
> >>> in the same way, I guess, when running an OS on a SSD boot disk,
> >>> should we still need the same memory swapping mechanisms as we do
> >>> today, considering that in that case, the swap device is
> (nearly) as
> >>> fast as memory itself.
> >> Is it? I think that when you look up the numbers (for server-grade
> >> hardware) you could find an order of magnitude difference. Now
> there are
> >
> > The difference is pretty huge.  Consider 6GB+/second vs
> 140MB/second.
> Not to detract from the point (my own point in fact) but my 2xSSD in
> stripes deliver a peak read throughput of 350Mb/s each time I boot
> :) My
> boot time lands at 11-13 seconds depending on weather conditions.
>
>
> Goodness me, what the heck does weather have to do with the
> performance of an SSD? 
>

It's a figure of speech... nothing. But my network conditions, the
speed at which I log in, the exact delay in detecting the logical volume
groups on my 6 SATA disks in the same system... these times will vary.
Not to mention system updates that trigger actions at boot time, etc.

I'm on Ubuntu Karmic, btw, and the boot fs is (obviously) not on ZFS
(but ext4 minus journalling, with tweaked block sizes/alignment).

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS send | verify | receive

2009-12-05 Thread Seth Heeren


Bob Friesenhahn wrote:
> On Sat, 5 Dec 2009, Sriram Narayanan wrote:
>
>> If feasible, you may want to generate MD5 sums on the streamed output
>> and then use these for verification.
>
> You can also stream into a gzip or lzop wrapper in order to obtain the
> benefit of incremental CRCs and some compression as well.  As long as
> the wrapper is generated on the sending side (and not subject to
> problems like truncation) it should be quite useful for verifying that
> the stream has not been corrupted.
Same deal as with MD5 sums: it doesn't guarantee that the stream is
'receivable' on the receiver.
Now, unless your wrapper is able to retransmit on CRC error, an MD5 would
be vastly superior due to the quality of its error detection.
Using both techniques would be optimal (although I suspect the compression
doesn't help; I should think the send/recv streams are compressed as
it is).
>
> Bob
> -- 
> Bob Friesenhahn
> bfrie...@simple.dallas.tx.us,
> http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs on ssd

2009-12-05 Thread Seth Heeren
Bob Friesenhahn wrote:
> On Sat, 5 Dec 2009, Seth Heeren wrote:
>>>
>>> in the same way, I guess, when running an OS on a SSD boot disk,
>>> should we still need the same memory swapping mechanisms as we do
>>> today, considering that in that case, the swap device is (nearly) as
>>> fast as memory itself.
>> Is it? I think that when you look up the numbers (for server-grade
>> hardware) you could find an order of magnitude difference. Now there are
>
> The difference is pretty huge.  Consider 6GB+/second vs 140MB/second.
Not to detract from the point (my own point in fact) but my 2xSSD in
stripes deliver a peak read throughput of 350Mb/s each time I boot :) My
boot time lands at 11-13 seconds depending on weather conditions.
>
> The interesting thing for the future will be non-volatile main memory,
> with the primary concern being how to firewall damage due to a bug.
> You would be able to turn your computer off and back on and be working
> again almost instantaneously.
>
> Bob
> -- 
> Bob Friesenhahn
> bfrie...@simple.dallas.tx.us,
> http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS send | verify | receive

2009-12-05 Thread dick hoogendijk
On Sat, 2009-12-05 at 09:22 -0600, Bob Friesenhahn wrote:

> You can also stream into a gzip or lzop wrapper in order to obtain the 
> benefit of incremental CRCs and some compression as well.

Can you give an example command line for this option please?

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS send | verify | receive

2009-12-05 Thread Richard Elling

On Dec 4, 2009, at 4:11 PM, Edward Ned Harvey wrote:

Depending on your version of the OS, I think the following post from Richard
Elling will be of great interest to you:
-
http://richardelling.blogspot.com/2009/10/check-integrity-of-zfs-send-streams.html


Thanks!  :-)
No, wait! 

According to that page, if you "zfs receive -n" then you should get a 0 exit
status for success, and 1 for error.

Unfortunately, I've been sitting here and testing just now ...  I created a
"zfs send" datastream, then I made a copy of it and toggled a bit in the
middle to make it corrupt ...

I found that the "zfs receive -n" always returns 0 exit status, even if the
data stream is corrupt.  In order to get the "1" exit status, you have to
get rid of the "-n" which unfortunately means writing the completely
restored filesystem to disk.


I believe it will depend on the nature of the corruption.  Regardless,
the answer is to use zstreamdump.
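
For instance (the path is just an example):

  zstreamdump -v < /somestorage/mysnap.zfs   # parse and summarize the stored stream without receiving it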
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs on ssd

2009-12-05 Thread Richard Elling

On Dec 5, 2009, at 8:09 AM, Andrew Gabriel wrote:


Bob Friesenhahn wrote:
The interesting thing for the future will be non-volatile main  
memory, with the primary concern being how to firewall damage due  
to a bug. You would be able to turn your computer off and back on  
and be working again almost instantaneously.
Some of us are old enough (just) to have used computers back in the  
days when they all did this anyway...

Funny how things go full circle...


:-)
Get the power low enough and we'll never turn them off... you
can even remove the on/off switch entirely.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs on ssd

2009-12-05 Thread Andrew Gabriel

Bob Friesenhahn wrote:
The interesting thing for the future will be non-volatile main memory, 
with the primary concern being how to firewall damage due to a bug. 
You would be able to turn your computer off and back on and be working 
again almost instantaneously.
Some of us are old enough (just) to have used computers back in the days 
when they all did this anyway...

Funny how things go full circle...

--
Andrew
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Transaction consistency of ZFS

2009-12-05 Thread Bob Friesenhahn

On Sat, 5 Dec 2009, Seth Heeren wrote:


Yes. It is my understanding that (at least recent versions) will detect
incomplete transactions and simply rollback to the last consistent
uberblock in case of trouble.

I'm not completely up to speed with regard to the ODF, Uberblocks and
the ZIL; In my recollection the "inspection / selection" of uberblocks
had been in realm of manual recovery with zdb only, until lately. If I'm
not mistaken a automatic 'regress-to-last-known-good-uberblock' function
is new and recent.


Zfs has always rolled back to the last good state.  The manual 
rollback is to deal with the case where the underlying storage 
hardware misbehaved and did not persist the data as instructed but an 
older transaction group did get persisted ok.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs on ssd

2009-12-05 Thread Bob Friesenhahn

On Sat, 5 Dec 2009, Seth Heeren wrote:


in the same way, I guess, when running an OS on a SSD boot disk,
should we still need the same memory swapping mechanisms as we do
today, considering that in that case, the swap device is (nearly) as
fast as memory itself.

Is it? I think that when you look up the numbers (for server-grade
hardware) you could find an order of magnitude difference. Now there are


The difference is pretty huge.  Consider 6GB+/second vs 140MB/second.

The interesting thing for the future will be non-volatile main memory, 
with the primary concern being how to firewall damage due to a bug. 
You would be able to turn your computer off and back on and be working 
again almost instantaneously.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs on ssd

2009-12-05 Thread Neil Perrin



On 12/05/09 01:36, anu...@kqinfotech.com wrote:

Hi,

What you say is probably right with respect to L2ARC, but logging (ZIL or 
database log) is required for consistency purposes.


No, the ZIL is not required for consistency. The pool is fully consistent 
without
the ZIL. See  http://blogs.sun.com/perrin/entry/the_lumberjack for more details.

Neil. 
___

zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS send | verify | receive

2009-12-05 Thread Bob Friesenhahn

On Sat, 5 Dec 2009, Sriram Narayanan wrote:


If feasible, you may want to generate MD5 sums on the streamed output
and then use these for verification.


You can also stream into a gzip or lzop wrapper in order to obtain the 
benefit of incremental CRCs and some compression as well.  As long as 
the wrapper is generated on the sending side (and not subject to 
problems like truncation) it should be quite useful for verifying that 
the stream has not been corrupted.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] How can we help fix MPT driver post build 129

2009-12-05 Thread Rob Nelson
How can we help with what is outlined below?  I can reproduce these errors at 
will, so if anyone at Sun would like an environment to test this situation, let 
me know.

What is the best info to grab for you folks to help here?

Thanks - nola



This is in regard to these threads:

http://www.opensolaris.org/jive/thread.jspa?messageID=421400#421400
http://www.opensolaris.org/jive/thread.jspa?threadID=118947&tstart=0
http://www.opensolaris.org/jive/thread.jspa?threadID=117702&tstart=1
http://www.opensolaris.org/jive/thread.jspa?messageID=437031&tstart=0

And bug IDs: 

6894775 mpt driver timeouts and bus resets under load
6900767 Server hang with LSI 1068E based SAS controller under load

Exec Summary:  Those using the LSI 1068 chipset with the LSI SAS2x IC expander 
have IO errors under load from about build 118 to 129 (last build I tested).

At build 111b, it worked.  If you take the same hardware and load test scripts 
and run under 111b, you're OK; run under ~build 118 and on, and you suffer, for 
example, from:

Dec  5 08:17:04 gb2000-007 scsi: [ID 365881 kern.info] 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:17:04 gb2000-007  Log info 0x3000 received for target 79.
Dec  5 08:17:04 gb2000-007  scsi_status=0x0, ioc_status=0x804b, 
scsi_state=0xc
Dec  5 08:17:07 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:17:07 gb2000-007  SAS Discovery Error on port 4. DiscoveryStatus 
is DiscoveryStatus is |Unaddressable device found|
Dec  5 08:18:09 gb2000-007 scsi: [ID 107833 kern.warning] WARNING: 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:18:09 gb2000-007  Disconnected command timeout for Target 79
Dec  5 08:18:14 gb2000-007 scsi: [ID 365881 kern.info] 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:18:14 gb2000-007  Log info 0x3113 received for target 79.
Dec  5 08:18:14 gb2000-007  scsi_status=0x0, ioc_status=0x8048, 
scsi_state=0xc
Dec  5 08:18:17 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:18:17 gb2000-007  mpt_handle_event_sync: IOCStatus=0x8000, 
IOCLogInfo=0x3000
Dec  5 08:18:17 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:18:17 gb2000-007  mpt_handle_event: IOCStatus=0x8000, 
IOCLogInfo=0x3000
Dec  5 08:18:19 gb2000-007 scsi: [ID 365881 kern.info] 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:18:19 gb2000-007  Log info 0x3000 received for target 79.
Dec  5 08:18:19 gb2000-007  scsi_status=0x0, ioc_status=0x804b, 
scsi_state=0xc
Dec  5 08:18:22 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:18:22 gb2000-007  SAS Discovery Error on port 4. DiscoveryStatus 
is DiscoveryStatus is |Unaddressable device found|
Dec  5 08:19:24 gb2000-007 scsi: [ID 107833 kern.warning] WARNING: 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:19:24 gb2000-007  Disconnected command timeout for Target 79
Dec  5 08:19:29 gb2000-007 scsi: [ID 365881 kern.info] 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:19:29 gb2000-007  Log info 0x3113 received for target 79.
Dec  5 08:19:29 gb2000-007  scsi_status=0x0, ioc_status=0x8048, 
scsi_state=0xc
Dec  5 08:19:32 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:19:32 gb2000-007  mpt_handle_event_sync: IOCStatus=0x8000, 
IOCLogInfo=0x3000
Dec  5 08:19:32 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:19:32 gb2000-007  mpt_handle_event: IOCStatus=0x8000, 
IOCLogInfo=0x3000
Dec  5 08:19:34 gb2000-007 scsi: [ID 365881 kern.info] 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:19:34 gb2000-007  Log info 0x3000 received for target 79.
Dec  5 08:19:34 gb2000-007  scsi_status=0x0, ioc_status=0x804b, 
scsi_state=0xc
Dec  5 08:19:37 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:19:37 gb2000-007  SAS Discovery Error on port 4. DiscoveryStatus 
is DiscoveryStatus is |Unaddressable device found|
Dec  5 08:20:39 gb2000-007 scsi: [ID 107833 kern.warning] WARNING: 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:20:39 gb2000-007  Disconnected command timeout for Target 79
Dec  5 08:20:39 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:20:39 gb2000-007  mpt_handle_event_sync: IOCStatus=0x8000, 
IOCLogInfo=0x31112000
Dec  5 08:20:44 gb2000-007 scsi: [ID 365881 kern.info] 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:20:44 gb2000-007  Log info 0x3113 received for target 79.
Dec  5 08:20:44 gb2000-007  scsi_status=0x0, ioc_status=0x8048, 
scsi_state=0xc
Dec  5 08:20:44 gb2000-007 scsi: [ID 

Re: [zfs-discuss] Transaction consistency of ZFS

2009-12-05 Thread nxyyt
Thank you very much for your quick response.

My question is that I want to figure out whether there is data loss after a power 
outage. I have replicas on other machines so I can recover from the data loss. 
But I need a way to know whether there is data loss without comparing the 
different data replicas.

I suppose that if I append a footer to the end of the file before I close it, I 
can detect data loss by validating the footer. Is that a workaround for me? Or 
is there a better alternative? In my scenario, the file is append-only, with no 
in-place overwrite.
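
A minimal sketch of that footer idea in shell ("produce_data" and the marker are 
placeholders):

  MARK="ENDOFFILE-v1"
  { produce_data; printf '%s' "$MARK"; } > datafile
  # after a crash, a file that does not end with the marker is incomplete:
  tail -c ${#MARK} datafile | grep "$MARK" > /dev/null || echo "datafile is incomplete"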
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Transaction consistency of ZFS

2009-12-05 Thread Seth Heeren
Yes. It is my understanding that (at least recent versions) will detect
incomplete transactions and simply rollback to the last consistent
uberblock in case of trouble.

I'm not completely up to speed with regard to the ODF, Uberblocks and
the ZIL; In my recollection the "inspection / selection" of uberblocks
had been in the realm of manual recovery with zdb only, until lately. If I'm
not mistaken, an automatic 'regress-to-last-known-good-uberblock' function
is new and recent.

I'm not quite sure whether that uberblock-based rollback _is being used
in the context of_ ZIL transaction recovery, or intended in case the ZIL
itself had failed (e.g.: ZIL on ramdisk or ZIL on failed vdev with
insufficient redundancy). I suspect it is separate and works even
without a ZIL. Note that of course this still means that working without
a ZIL or having a loss of the ZIL with a crash/unexpected shutdown of
ZFS will result in data-loss. It just won't (easily) result in a
corrupted zpool because it will try and find a working uberblock at all
times, possibly an older one, lacking the latest changes...

So far my ramblings. I'm sure it contains a few handy pointers where to
look for more solid info...

Seth

nxyyt wrote:
> Hi, everybody,
>
> I'm a newbie to ZFS. I have a specific question about the COW transactions of 
> ZFS.
> Does ZFS keep sequential consistency when it meets a power outage or 
> server crash?
>
> Assume following scenario:
>
> My application has only a single thread and it appends the data to the file 
> continuously. Suppose at time t1, it appends a buf named A to the file. At 
> time t2, which is later than t1, it appends a buf named B to the file. If the 
> server crashes after t2, is it possible that buf B is flushed back to the disk 
> but buf A is not? 
>
> Does ZFS guarantee that data written to a file in sequential or causal order 
> is flushed to disk in the same order? If the write operations to a single file 
> are always bound to the same transaction group, I think the answer should be 
> YES.
>
> Hope anybody can help me clarify it. Thank you very much!
>   


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Transaction consistency of ZFS

2009-12-05 Thread nxyyt
Hi, everybody,

I'm a newbie to ZFS. I have a specific question about the COW transactions of 
ZFS.
Does ZFS keep sequential consistency when it meets a power outage or server 
crash?

Assume following scenario:

My application has only a single thread and it appends data to the file 
continuously. Suppose at time t1, it appends a buf named A to the file. At time 
t2, which is later than t1, it appends a buf named B to the file. If the server 
crashes after t2, is it possible that buf B is flushed back to the disk but buf 
A is not?

Does ZFS guarantee that data written to a file in sequential or causal order is 
flushed to disk in the same order? If the write operations to a single file are 
always bound to the same transaction group, I think the answer should be YES.

Hope anybody can help me clarify it. Thank you very much!
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs on ssd

2009-12-05 Thread Seth Heeren
Selim Daoud wrote:
> I was wondering if there were work done in the area of zfs
> configuration running out of 100% SSD disks.
>
> L2ARC and ZIL have been designed as a way to improve long seek
> times/latencies of rotational disks.
> now if we use only SSD (F5100 or F20) as back end drives for zfs, we
> should not need those additional log/cache mechanisms..or at least
> algorithms managing those caches might need improvement
Given correct tuning ZFS is already pretty solid (pun intended) on SSD.
Any log-structured thing is going to be fast on SSD. ZFS has the unique
property of being able to mix SSD and traditional SAS/SCSI drives for
maximum bang for the buck. If you just want bang, and no buck left, go
ahead and buy a zillion SSD drives instead?

If you don't want/need log or cache, disable these? You might want to
run your ZIL (slog) on ramdisk. Beware, that without a persisted ZIL
there _will_ be dataloss with unexpected shutdowns. I'd go for the
default: without explicit log-vdev(s) the ZIL will reside in the storage
pool itself.
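
For illustration, the kind of knobs involved (the pool, dataset, and device 
names are made up):

  zpool add tank log /dev/ramdisk/slog0   # slog on a ramdisk: fast, but not power-safe
  zfs set secondarycache=none tank/fs     # skip L2ARC caching for an all-SSD dataset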
>
> in the same way, I guess, when running an OS on a SSD boot disk,
> should we still need the same memory swapping mechanisms as we do
> today, considering that in that case, the swap device is (nearly) as
> fast as memory itself.
Is it? I think that when you look up the numbers (for server-grade
hardware) you could find an order of magnitude difference. Now there are
solid state storage cards that employ RAM chips and backup power to
persist the state. These are the fastest in the industry, but I know you
will _never_ want to put your multi-terabyte ZFS pools on those (better
buy a couple of Ferrari's instead).

> To some extension,  log journals found in DB would also not be
> relevant anymore?
>
I beg your pardon? The crux with transaction logs is that they get
_physically committed_ (and synced, that is) before writing the actual
transaction so that the log will survive reboot and the transaction can
be rolled back at reboot. This is crucial for atomicity/integrity. So...
the log is obviously still required to be on disk/SSD.
>
>
> tia,
>
> selim
> 
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>   


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS send | verify | receive

2009-12-05 Thread Seth Heeren
Well what does _that_ verify?

It will verify that no bits provably broke during transport.

It will still leave the chance of getting an incompatible stream, an
incomplete stream (kill the dump), or plain corrupted data. Of course,
the chance of the latter should be extremely small in server-grade hardware.
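
For what it's worth, a sketch of that MD5 approach (the snapshot name and paths 
are placeholders; Solaris digest(1) reads stdin when no file is given):

  zfs send $snap | tee /somestorage/mysnap.zfs | digest -a md5 > /somestorage/mysnap.zfs.md5
  # later, recompute and compare (this checks the stored bits, not receivability):
  digest -a md5 /somestorage/mysnap.zfs ; cat /somestorage/mysnap.zfs.md5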

$0.02

Sriram Narayanan wrote:
> If feasible, you may want to generate MD5 sums on the streamed output
> and then use these for verification.
>
> -- Sriram
>
> On 12/5/09, Edward Ned Harvey  wrote:
>   
>>> Depending on your version of the OS, I think the following post from Richard
>>> Elling
>>> will be of great interest to you:
>>> -
>>> http://richardelling.blogspot.com/2009/10/check-integrity-of-zfs-send-streams.html
>>>   
>> Thanks!  :-)
>> No, wait! 
>>
>> According to that page, if you "zfs receive -n" then you should get a 0 exit
>> status for success, and 1 for error.
>>
>> Unfortunately, I've been sitting here and testing just now ...  I created a
>> "zfs send" datastream, then I made a copy of it and toggled a bit in the
>> middle to make it corrupt ...
>>
>> I found that the "zfs receive -n" always returns 0 exit status, even if the
>> data stream is corrupt.  In order to get the "1" exit status, you have to
>> get rid of the "-n" which unfortunately means writing the completely
>> restored filesystem to disk.
>>
>> I've sent a message to Richard to notify him of the error on his page.  But
>> it would seem, the zstreamdump must be the only way to verify the integrity
>> of a stored data stream.  I haven't tried it yet, and I'm out of time for
>> today...
>>
>>
>> ___
>> zfs-discuss mailing list
>> zfs-discuss@opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>>
>> 
>
>   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs on ssd

2009-12-05 Thread anurag
Hi,

What you say is probably right with respect to L2ARC, but logging (ZIL or 
database log) is required for consistency purposes.

Anurag.
Sent from my BlackBerry® smartphone from !DEA

-Original Message-
From: Selim Daoud 
Date: Sat, 5 Dec 2009 08:59:52 
To: ZFS Discussions
Subject: [zfs-discuss] zfs on ssd

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs on ssd

2009-12-05 Thread Selim Daoud
I was wondering whether any work has been done in the area of zfs configurations
running on 100% SSD disks.

The L2ARC and ZIL were designed as a way to mitigate the long seek
times/latencies of rotational disks.
Now, if we use only SSDs (F5100 or F20) as back-end drives for zfs, we should
not need those additional log/cache mechanisms... or at least the algorithms
managing those caches might need improvement.

In the same way, I guess, when running an OS on an SSD boot disk, should we
still need the same memory swapping mechanisms as we do today, considering
that in that case the swap device is (nearly) as fast as memory itself? To
some extent, log journals found in DBs would also not be relevant
anymore?



tia,

selim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss