Re: [zfs-discuss] ZFS Mountroot and Bootroot Comparison

2007-10-04 Thread Kugutsumen
Please do share how you managed to have a separate ZFS /usr since  
b64; there are dependencies on /usr and they are not documented.  
-kv doesn't help either.  I tried adding /usr/lib/libdisk* to a /usr/lib  
dir on the root partition and failed.

Jurgen also pointed out that there are two related bugs already filed:

Bug ID   6570056
Synopsis/sbin/zpool should not link to files in /usr/lib
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6570056

Bug ID   6494840
Synopsislibzfs should dlopen libiscsitgt rather than linking to it
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6494840

I can take a snapshot on bootroot too ... but after I tried to do a  
rollback from failsafe I couldn't boot anymore, probably because  
there was no straightforward way to rebuild the boot archive.
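
For anyone attempting the same thing, here is a rough sketch of the
rollback-plus-rebuild sequence from the failsafe shell.  The pool and
dataset names are placeholders, and it assumes the root dataset uses a
legacy mountpoint as the old mountroot setups did; treat it as a starting
point rather than a recipe:

    # from failsafe; "rootpool/root" is a hypothetical dataset name
    zpool import -f rootpool
    zfs rollback rootpool/root@known-good
    mount -F zfs rootpool/root /a
    bootadm update-archive -R /a
    umount /a ; zpool export rootpool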

Regarding compression, if I am not mistaken, grub cannot access files  
that are compressed.

Regards,
K.

On 05/10/2007, at 5:55 AM, Andre Wenas wrote:

> Hi,
>
> Using bootroot I can have a separate /usr filesystem since b64. I can  
> also do snapshot, clone and compression.
>
> Rgds,
> Andre W.
>
> Kugutsumen wrote:
>> Lori Alt told me that mountroot was a temporary hack until grub   
>> could boot zfs natively.
>> Since build 62, mountroot support was dropped and I am not  
>> convinced  that this is a mistake.
>>
>> Let's compare the two:
>>
>> Mountroot:
>>
>> Pros:
>>* can have root partition on raid-z: YES
>>* can have root partition on zfs striped mirror: YES
>>* can have usr partition on separate ZFS partition with build  
>> <  72 : YES
>>* can snapshot and rollback root partition: YES
>>* can use copies on root partition on a single root disk (e.g.  
>> a  laptop ): YES
>>* can use compression on root partition: YES
>> Cons:
>>* grub native support: NO (if you use raid-z or striped  
>> mirror,  you will need to have a small UFS partition
>>  to bootstrap the system, but you can use a small usb stick  
>> for  that purpose.)
>>
>> New and "improved" *sigh* bootroot scheme:
>>
>> Pros:
>>* grub native support: YES
>> Cons:
>>* can have root partition on raid-z: NO
>>* can have root partition on zfs striped mirror: NO
>>* can use copies on root partition on a single root disk (e.g.  
>> a  laptop ): NO
>>* can have usr partition on separate ZFS partition with build  
>> <  72 : NO
>>* can snapshot and rollback root partition: NO
>>* can use compression on root partition: NO
>>* No backward compatibility with zfs mountroot.
>>
>> Why did we completely drop support for the old mountroot approach   
>> which is so much more flexible?
>>
>> Kugutsumen
>>
>> ___
>> zfs-discuss mailing list
>> zfs-discuss@opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>>
>

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-04 Thread Jonathan Loran


I've been thinking about this for a while, but Anton's analysis makes me 
think about it even more:


We all love ZFS, right?  It's futuristic in a bold new way, with many 
virtues; I won't preach to the choir.  But gluing it all together requires 
some CPU/memory-intensive operations: checksum generation/validation, 
compression, encryption, data placement and component load balancing, etc.  
Processors have gotten really powerful, much more so than the relative 
gains in disk I/O, which in all honesty is what makes ZFS possible.  My 
question: is anyone working on an offload engine for ZFS?  I can envision 
a highly optimized, pipelined system where writes and reads pass through 
checksum, compression, and encryption ASICs that also place data properly 
on disk.  This could even take the form of a PCIe SATA/SAS card with many 
ports, or other options.  It would make direct I/O, or DMA I/O, possible 
again.  The file system abstraction in ZFS is too important to ignore, and 
too hard to optimize across different load conditions (my rookie opinion), 
to expect any RDBMS to have a clue what to do with it.  I guess what I'm 
saying is that the RDBMS knows which blocks it needs and wants to get them 
in and out quickly, but the mapping to disk is not linear with ZFS the way 
it is with other file systems.  An offload engine could do that 
translation instead.


Just throwing this out there for the purpose of blue sky fluff.

Jon

Anton B. Rang wrote:

5) DMA straight from user buffer to disk avoiding a copy.



This is what the "direct" in "direct i/o" has historically meant.  :-)

  

My line has been that 5) won't help latency much, and
latency is where I think the game is currently played. Now the
disconnect might be because people might feel that the game
is not latency but CPU efficiency: "how many CPU cycles do I
burn to get data from disk to user buffer".



Actually, in many cases it's less about CPU cycles than memory cycles.

For many databases, most of the I/O is writes (reads wind up
cached in memory).  What's the cost of a write?

With direct I/O: CPU writes to memory (spread out over many
transactions), disk DMAs from memory.  We write LPS (log page size)
bytes of data from CPU to memory, we read LPS bytes from memory.
On processors without a cache line zero, we probably read the LPS
data from memory as part of the write.  Total cost = W:LPS, R:2*LPS.

Without direct I/O: The cost of getting the data into the user buffer
remains the same (W:LPS, R:LPS).  We copy the data from user buffer
to system buffer (W:LPS, R:LPS).  Then we push it out to disk.  Total
cost = W:2*LPS, R:3*LPS.  We've nearly doubled the cost, not including
any TLB effects.

On a memory-bandwidth-starved system (which should be nearly all
modern designs, especially with multi-threaded chips like Niagara),
replacing buffered I/O with direct I/O should give you nearly a 2x
improvement in log write bandwidth.  That's without considering
cache effects (which shouldn't be too significant, really, since LPS
should be << the size of L2).

How significant is this?  We'd have to measure; and it will likely
vary quite a lot depending on which database is used for testing.

But note that, for ZFS, the win with direct I/O will be somewhat
less.  That's because you still need to read the page to compute
its checksum.  So for direct I/O with ZFS (with checksums enabled),
the cost is W:LPS, R:2*LPS.  Is saving one page of writes enough to
make a difference?  Possibly not.

Anton
 
 
This message posted from opensolaris.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
  


-- 
Jonathan Loran  -  IT Manager
Space Sciences Laboratory, UC Berkeley
(510) 643-5146   [EMAIL PROTECTED]
AST:7731^29u18e3




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs + iscsi target + vmware esx server

2007-10-04 Thread Adam Bodette
I'm posting here as this seems to be a zfs issue.  We also have an open ticket 
with Sun support, and I've heard that another large Sun customer is also 
reporting this as an issue.

Basic problem: create a zfs file system and set shareiscsi to on.  On a vmware 
esx server, discover that iscsi target.  It shows up as 249 luns.  When 
attempting to then add the storage, the esx server eventually times out; if you 
watch it from the command line you see it checking each lun and crashing out 
before it gets to 249.
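
For reference, the conventional way to export ZFS storage over iSCSI is
through a zvol (as far as I know, shareiscsi only takes effect on volumes);
a rough sketch with placeholder pool/volume names follows.  In our test we
set the property on a plain filesystem as described above:

    zpool create tank c1d1                 # second SATA drive
    zfs create -V 50g tank/esxvol          # hypothetical volume name/size
    zfs set shareiscsi=on tank/esxvol
    iscsitadm list target -v               # what the initiator will be offered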

Test Environment: 
Sun iSCSI target host: Compaq PC with 2 80GB SATA drives, 1GB RAM, 3.2GHz CPU 
running Solaris 10 x86 Update 4 (08/07).  The second drive is set up as a 
storage pool with 1 filesystem created with shareiscsi on.

ESX server: Compaq PC with SCSI HD, 1GB RAM, 3.2GHz CPU running VMware ESX 
Server 3.0.2

Other test scenarios:

We also created an iscsi target via iscsitadm using a spare slice on the 
primary disk.  The esx server sees this target just fine, with a single lun as 
expected.

Trying to get more creative, I did an 'iscsitadm modify admin' and set the base 
directory to my zfs filesystem.  I then used iscsitadm to create a new target 
backed by a file (which would get created on the zfs filesystem).  However, in 
esx server I see the same result: 249 luns.  The one difference this time is 
that the first lun is the size I created the target as, and the other 248 are 
the size of the zfs filesystem.
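
The file-backed variant looked roughly like this (directory, size and
target names are placeholders for what we actually used):

    iscsitadm modify admin -d /tank/vmstore    # backing store directory on zfs
    iscsitadm create target -z 20g esxtarget   # file-backed target of fixed size
    iscsitadm list target -v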

So if zfs is involved it screws up; if it's not, it's fine.

I do not yet have another Solaris 10 Update 4 system to test as an initiator, 
but my Update 3 system sees it just fine in all test scenarios.

It seems to be an issue with how the iscsi target is advertised when zfs is 
involved; the solaris initiator doesn't seem to mind, but esx server sure does.  
Since it works fine when I use a regular disk slice, I think it's something to 
do with zfs, but I wouldn't rule out an issue with esx completely yet.

Any help is greatly appreciated.  We are looking to roll out multiple thumpers 
and vmware servers using the thumpers as their backend storage via iscsi (or 
nfs if we have to, but we would rather go iscsi).

Thanks!
Adam
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS for OSX - it'll be in there.

2007-10-04 Thread Ben Rockwood
Dale Ghent wrote:
> ...and eventually in a read-write capacity:
>
> http://www.macrumors.com/2007/10/04/apple-seeds-zfs-read-write- 
> developer-preview-1-1-for-leopard/
>
> Apple has seeded version 1.1 of ZFS (Zettabyte File System) for Mac  
> OS X to Developers this week. The preview updates a previous build  
> released on June 26, 2007.
>   

Y!  Finally my USB Thumb Drives will work on my MacBook! :)

I wonder if it'll automatically mount the Zpool on my iPod when I sync it.

benr.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS for OSX - it'll be in there.

2007-10-04 Thread Dale Ghent
...and eventually in a read-write capacity:

http://www.macrumors.com/2007/10/04/apple-seeds-zfs-read-write- 
developer-preview-1-1-for-leopard/

Apple has seeded version 1.1 of ZFS (Zettabyte File System) for Mac  
OS X to Developers this week. The preview updates a previous build  
released on June 26, 2007.

/dale
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-04 Thread Anton B. Rang
> 5) DMA straight from user buffer to disk avoiding a copy.

This is what the "direct" in "direct i/o" has historically meant.  :-)

> My line has been that 5) won't help latency much, and
> latency is where I think the game is currently played. Now the
> disconnect might be because people might feel that the game
> is not latency but CPU efficiency: "how many CPU cycles do I
> burn to get data from disk to user buffer".

Actually, in many cases it's less about CPU cycles than memory cycles.

For many databases, most of the I/O is writes (reads wind up
cached in memory).  What's the cost of a write?

With direct I/O: CPU writes to memory (spread out over many
transactions), disk DMAs from memory.  We write LPS (log page size)
bytes of data from CPU to memory, we read LPS bytes from memory.
On processors without a cache line zero, we probably read the LPS
data from memory as part of the write.  Total cost = W:LPS, R:2*LPS.

Without direct I/O: The cost of getting the data into the user buffer
remains the same (W:LPS, R:LPS).  We copy the data from user buffer
to system buffer (W:LPS, R:LPS).  Then we push it out to disk.  Total
cost = W:2*LPS, R:3*LPS.  We've nearly doubled the cost, not including
any TLB effects.

On a memory-bandwidth-starved system (which should be nearly all
modern designs, especially with multi-threaded chips like Niagara),
replacing buffered I/O with direct I/O should give you nearly a 2x
improvement in log write bandwidth.  That's without considering
cache effects (which shouldn't be too significant, really, since LPS
should be << the size of L2).

How significant is this?  We'd have to measure; and it will likely
vary quite a lot depending on which database is used for testing.

But note that, for ZFS, the win with direct I/O will be somewhat
less.  That's because you still need to read the page to compute
its checksum.  So for direct I/O with ZFS (with checksums enabled),
the cost is W:LPS, R:2*LPS.  Is saving one page of writes enough to
make a difference?  Possibly not.
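
To restate the accounting above in one place (LPS = log page size; this is
memory traffic per log write, not wall-clock time):

    Direct I/O:   CPU fills user buffer    W: LPS    R: LPS (write-allocate)
                  disk DMAs that buffer              R: LPS
                  total                    W: LPS    R: 2*LPS   (3*LPS moved)

    Buffered:     CPU fills user buffer    W: LPS    R: LPS
                  copy to system buffer    W: LPS    R: LPS
                  disk DMAs system buffer            R: LPS
                  total                    W: 2*LPS  R: 3*LPS   (5*LPS moved)

So writes double and total memory traffic per log write grows from 3*LPS
to 5*LPS.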

Anton
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] O.T. "patches" for OpenSolaris

2007-10-04 Thread Dick Davies
On 30/09/2007, William Papolis <[EMAIL PROTECTED]> wrote:
> Henk,
>
> By upgrading do you mean rebooting and installing OpenSolaris from DVD or 
> network?
>
> Like, no Patch Manager, just install some quick patches and updates and a quick 
> reboot, right?

You can live upgrade and then do a quick reboot:

  
http://number9.hellooperator.net/articles/2007/08/08/solaris-laptop-live-upgrade
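
The rough shape of that flow, if you haven't used Live Upgrade before
(BE names and the spare slice are placeholders; the exact lucreate
options depend on your disk layout):

    lucreate -n newBE -m /:/dev/dsk/c0d0s4:ufs   # alternate boot environment
    luupgrade -u -n newBE -s /path/to/os_image   # or luupgrade -t ... for patches
    luactivate newBE
    init 6                                       # the quick reboot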


-- 
Rasputin :: Jack of All Trades - Master of Nuns
http://number9.hellooperator.net/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Mountroot and Bootroot Comparison

2007-10-04 Thread Andre Wenas
Hi,

Using bootroot I can have a separate /usr filesystem since b64. I can also 
do snapshot, clone and compression.

Rgds,
Andre W.

Kugutsumen wrote:
> Lori Alt told me that mountroot was a temporary hack until grub  
> could boot zfs natively.
> Since build 62, mountroot support was dropped and I am not convinced  
> that this is a mistake.
>
> Let's compare the two:
>
> Mountroot:
>
> Pros:
>* can have root partition on raid-z: YES
>* can have root partition on zfs striped mirror: YES
>* can have usr partition on separate ZFS partition with build <  
> 72 : YES
>* can snapshot and rollback root partition: YES
>* can use copies on root partition on a single root disk (e.g. a  
> laptop ): YES
>* can use compression on root partition: YES
> Cons:
>* grub native support: NO (if you use raid-z or striped mirror,  
> you will need to have a small UFS partition
>  to bootstrap the system, but you can use a small usb stick for  
> that purpose.)
>
> New and "improved" *sigh* bootroot scheme:
>
> Pros:
>* grub native support: YES
> Cons:
>* can have root partition on raid-z: NO
>* can have root partition on zfs striped mirror: NO
>* can use copies on root partition on a single root disk (e.g. a  
> laptop ): NO
>* can have usr partition on separate ZFS partition with build <  
> 72 : NO
>* can snapshot and rollback root partition: NO
>* can use compression on root partition: NO
>* No backward compatibility with zfs mountroot.
>
> Why did we completely drop support for the old mountroot approach  
> which is so much more flexible?
>
> Kugutsumen
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?

2007-10-04 Thread Nathan Kroenert
Awesome.

Thanks, Eric. :)

This type of feature / fix is quite important to a number of the guys in 
our local OSUG. In particular, they are adamant that they cannot use 
ZFS in production until it stops panicking the whole box for isolated 
filesystem / zpool failures.

This will be a big step. :)

Cheers.

Nathan.

Eric Schrock wrote:
> On Fri, Oct 05, 2007 at 08:20:13AM +1000, Nathan Kroenert wrote:
>> Erik -
>>
>> Thanks for that, but I know the pool is corrupted - that was kind of the 
>> point of the exercise.
>>
>> The bug (at least to me) is ZFS panicking Solaris just trying to import 
>> the dud pool.
>>
>> But, maybe I'm missing your point?
>>
>> Nathan.
> 
> This is a variation on the "read error while writing" problem.  It is a
> known issue and a generic solution (to handle any kind of non-replicated
> writes failing) is in the works (see PSARC 2007/567).
> 
> - Eric
> 
> --
> Eric Schrock, Solaris Kernel Development   http://blogs.sun.com/eschrock
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?

2007-10-04 Thread Eric Schrock
On Fri, Oct 05, 2007 at 08:20:13AM +1000, Nathan Kroenert wrote:
> Erik -
> 
> Thanks for that, but I know the pool is corrupted - that was kind of the 
> point of the exercise.
> 
> The bug (at least to me) is ZFS panicking Solaris just trying to import 
> the dud pool.
> 
> But, maybe I'm missing your point?
> 
> Nathan.

This is a variation on the "read error while writing" problem.  It is a
known issue and a generic solution (to handle any kind of non-replicated
writes failing) is in the works (see PSARC 2007/567).

- Eric

--
Eric Schrock, Solaris Kernel Development   http://blogs.sun.com/eschrock
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?

2007-10-04 Thread Nathan Kroenert
Erik -

Thanks for that, but I know the pool is corrupted - that was kind of the 
point of the exercise.

The bug (at least to me) is ZFS panicking Solaris just trying to import 
the dud pool.

But, maybe I'm missing your point?

Nathan.




eric kustarz wrote:
>>
>> Client A
>>   - import pool make couple-o-changes
>>
>> Client B
>>   - import pool -f  (heh)
>>
>> Client A + B - With both mounting the same pool, touched a couple of
>> files, and removed a couple of files from each client
>>
>> Client A + B - zpool export
>>
>> Client A - Attempted import and dropped the panic.
>>
> 
> ZFS is not a clustered file system.  It cannot handle multiple readers 
> (or multiple writers).  By importing the pool on multiple machines, you 
> have corrupted the pool.
> 
> You purposely did that by adding the '-f' option to 'zpool import'.  
> Without the '-f' option ZFS would have told you that it's already  
> imported on another machine (A).
> 
> There is no bug here (besides admin error :)  ).
> 
> eric
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] chgrp -R hangs all writes to pool

2007-10-04 Thread Stuart Anderson
On Mon, Jul 16, 2007 at 09:36:06PM -0700, Stuart Anderson wrote:
> Running Solaris 10 Update 3 on an X4500 I have found that it is possible
> to reproducibly block all writes to a ZFS pool by running "chgrp -R"
> on any large filesystem in that pool.  As can be seen below in the zpool
> iostat output below, after about 10-sec of running the chgrp command all
> writes to the pool stop, and the pool starts exclusively running a slow
> background task of 1kB reads.
> 
> At this point the chgrp -R command is not killable via root kill -9,
> and in fact even the command "halt -d" does not do anything.
> 

For posterity, this appears to have been fixed in S10U4; at least, I am
unable to reproduce the problem that was easy to trigger with S10U3.

Thanks.

-- 
Stuart Anderson  [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Do we have a successful installation method for patch 120011-14?

2007-10-04 Thread Brian H. Nelson
It was 120272-12 that caused the snmpd.conf problem and was withdrawn. 
120272-13 has replaced it and has that bug fixed.

122660-10 does not have any issues that I am aware of. It is only 
obsolete, not withdrawn. Additionally, it appears that the circular 
patch dependency is by design if you read this BugID:

6574472 U4 feature Ku's need to hard require a patch that enforces 
zoneadmd patch is installed

So hacking the prepatch script for 125547-02/125548-02 to bypass the 
dependency check (as others have recommended) is a BAD THING and you may 
wind up with a broken system.

-Brian


Rob Windsor wrote:
> Yeah, the only thing wrong with that patch is that it eats 
> /etc/sma/snmp/snmpd.conf
>
> All is not lost, your original is copied to 
> /etc/sma/snmp/snmpd.conf.save in the process.
>
> Rob++
>
> Brian H. Nelson wrote:
>   
>> Manually installing the obsolete patch 122660-10 has worked fine for me. 
>> Until sun fixes the patch dependencies, I think that is the easiest way.
>>
>> -Brian
>>
>>
>> 

-- 
---
Brian H. Nelson Youngstown State University
System Administrator   Media and Academic Computing
  bnelson[at]cis.ysu.edu
---

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Do we have a successful installation method for patch 120011-14?

2007-10-04 Thread Rob Windsor
Yeah, the only thing wrong with that patch is that it eats 
/etc/sma/snmp/snmpd.conf

All is not lost, your original is copied to 
/etc/sma/snmp/snmpd.conf.save in the process.

Rob++

Brian H. Nelson wrote:
> Manually installing the obsolete patch 122660-10 has worked fine for me. 
> Until sun fixes the patch dependencies, I think that is the easiest way.
> 
> -Brian
> 
> Bruce Shaw wrote:
>> It fails on my machine because it requires a patch that's deprecated.
>>
>> This email and any files transmitted with it are confidential and intended 
>> solely for the use of the individual or entity to whom they are addressed. 
>> If you have received this email in error please notify the system manager. 
>> This message contains confidential information and is intended only for the 
>> individual named. If you are not the named addressee you should not 
>> disseminate, distribute or copy this e-mail.
>>
>> ___
>> zfs-discuss mailing list
>> zfs-discuss@opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>>   
> 


-- 
Internet: [EMAIL PROTECTED]
Life: [EMAIL PROTECTED]
"They couldn't hit an elephant at this distance."
   -- Major General John Sedgwick
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Convert Raid-Z to Mirror

2007-10-04 Thread Brian King
Update to this. Before destroying the original pool the first time, offline the 
disk you plan on re-using in the new pool. Otherwise when you destroy the 
original pool for the second time it causes issues with the new pool. In fact, 
if you attempt to destroy the new pool immediately after destroying the 
original pool, the system will panic.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Do we have a successful installation method for patch 120011-14?

2007-10-04 Thread Brian H. Nelson
Manually installing the obsolete patch 122660-10 has worked fine for me. 
Until sun fixes the patch dependencies, I think that is the easiest way.

-Brian

Bruce Shaw wrote:
> It fails on my machine because it requires a patch that's deprecated.
>
> This email and any files transmitted with it are confidential and intended 
> solely for the use of the individual or entity to whom they are addressed. If 
> you have received this email in error please notify the system manager. This 
> message contains confidential information and is intended only for the 
> individual named. If you are not the named addressee you should not 
> disseminate, distribute or copy this e-mail.
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>   

-- 
---
Brian H. Nelson Youngstown State University
System Administrator   Media and Academic Computing
  bnelson[at]cis.ysu.edu
---

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-04 Thread Eric Hamilton
I'd like to second a couple of comments made recently:
* If they don't regularly do so, I too encourage the ZFS, Solaris 
performance, and Sun Oracle support teams to sit down and talk about the 
utility of Direct I/O for databases.
* I too suspect that absent Direct I/O (or some ringing endorsement 
from Oracle about how ZFS doesn't need Direct I/O), there will be a 
steady drain of customer escalations regarding its absence, plus FUD and 
other sales inhibitors.

While I realize that Sun has not published a TPC-C result since 2001 and 
offers a different value proposition to customers, performance does 
matter and for some cases Direct I/O can contribute to that.

Historically, every TPC-C database benchmark run can be converted from 
being I/O bound to being CPU bound by adding enough disk spindles and 
enough main memory.  In that context, saving the CPU cycles (and cache 
misses) from a copy are important.

Another historical trend was that for performance, portability across 
different operating systems, and perhaps just because they could, 
databases tended to use as few OS capabilities as possible and to do 
their own resource management.  So for instance databases were often 
benchmarked using raw devices.  Customers on the other hand preferred 
the manageability of filesystems and tended to deploy there.  In that 
context, Direct I/O is an attempt to get the best of both worlds.

Finally, besides UFS Direct I/O on Solaris, other filesystems including 
VxFS also have various forms of Direct I/O: either separate APIs or 
mount options that bypass the cache on large writes, etc.  
Understanding those benefits, both real and advertised, helps in understanding 
the opportunities and shortfalls for ZFS.
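
For comparison, this is roughly what the existing knobs look like on UFS
and VxFS (device paths and mount points below are placeholders; check the
respective man pages for the exact spellings on your release):

    # UFS: bypass the page cache for a whole filesystem
    mount -F ufs -o forcedirectio /dev/dsk/c0t1d0s6 /u01

    # UFS: per-file, from the application, via directio(3C)
    #   directio(fd, DIRECTIO_ON);

    # VxFS: advisory caching modes at mount time
    mount -F vxfs -o mincache=direct,convosync=direct /dev/vx/dsk/dg/vol /u02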

It may be that this is not the most important thing for ZFS performance 
or capability right now (measurement in targeted configurations and 
workloads is the only way to tell), but I'd be highly surprised if there 
isn't something (bypass the cache on really large writes?) that can be 
learned from experience with Direct I/O.

Eric (Hamilton)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-04 Thread Nicolas Williams
On Thu, Oct 04, 2007 at 06:59:56PM +0200, Roch - PAE wrote:
> Nicolas Williams writes:
>  > On Thu, Oct 04, 2007 at 03:49:12PM +0200, Roch - PAE wrote:
>  > > So the DB memory pages should not be _contended_ for. 
>  > 
>  > What if your executable text, and pretty much everything, lives on ZFS?
>  > You don't want to contend for the memory caching those things either.
>  > It's not just the DB's memory you don't want to contend for.
> 
> On the read side, 
> 
> We're talking here about 1000 disks each running 35
> concurrent I/Os of 8K, so a footprint of 250MB, to stage a
> ton of work.

I'm not sure what you mean, but extra copies and memory just to stage
the I/Os is not the same as the systemic memory pressure issue.

Now, I'm _speculating_ as to what the real problem is, but it seems very
likely that putting things in the cache that needn't be there would push
out things that should be there, and since restoring those things to the
cache later would cost I/Os...

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-04 Thread Roch - PAE

Nicolas Williams writes:
 > On Wed, Oct 03, 2007 at 04:31:01PM +0200, Roch - PAE wrote:
 > >  > It does, which leads to the core problem. Why do we have to store the
 > >  > exact same data twice in memory (i.e., once in the ARC, and once in
 > >  > the shared memory segment that Oracle uses)? 
 > > 
 > > We do not retain 2 copies of the same data.
 > > 
 > > If the DB cache is made large enough to consume most of memory,
 > > the ZFS copy will quickly be evicted to stage other I/Os on
 > > their way to the DB cache.
 > > 
 > > What problem does that pose ?
 > 
 > Other things deserving of staying in the cache get pushed out by things
 > that don't deserve being in the cache.  Thus systemic memory pressure
 > (e.g., more on-demand paging of text).
 > 
 > Nico
 > -- 

I agree. That's why I submitted both of these.

6429855 Need way to tell ZFS that caching is a lost cause
6488341 ZFS should avoiding growing the ARC into trouble

-r

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Do we have a successful installation method for patch 120011-14?

2007-10-04 Thread Bruce Shaw
It fails on my machine because it requires a patch that's deprecated.

This email and any files transmitted with it are confidential and intended 
solely for the use of the individual or entity to whom they are addressed. If 
you have received this email in error please notify the system manager. This 
message contains confidential information and is intended only for the 
individual named. If you are not the named addressee you should not 
disseminate, distribute or copy this e-mail.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Mountroot and Bootroot Comparison

2007-10-04 Thread Eric Schrock
Remember that you have to maintain an entirely separate slice with yet
another boot environment.  This causes huge amounts of complexity in
terms of live upgrade, multiple BE management, etc.  The old mountroot
solution was useful for mounting ZFS root, but completely unmaintainable
from an installation and upgrade perspective.  It was dropped because we
could not possibly develop installation, packaging, and upgrade software
that would work across multiple BEs under such a scheme.

- Eric

On Thu, Oct 04, 2007 at 11:27:46PM +0700, Kugutsumen wrote:
> Lori Alt told me that mountroot was a temporary hack until grub  
> could boot zfs natively.
> Since build 62, mountroot support was dropped and I am not convinced  
> that this is a mistake.
> 
> Let's compare the two:
> 
> Mountroot:
> 
> Pros:
>* can have root partition on raid-z: YES
>* can have root partition on zfs striped mirror: YES
>* can have usr partition on separate ZFS partition with build <  
> 72 : YES
>* can snapshot and rollback root partition: YES
>* can use copies on root partition on a single root disk (e.g. a  
> laptop ): YES
>* can use compression on root partition: YES
> Cons:
>* grub native support: NO (if you use raid-z or striped mirror,  
> you will need to have a small UFS partition
>  to bootstrap the system, but you can use a small usb stick for  
> that purpose.)
> 
> New and "improved" *sigh* bootroot scheme:
> 
> Pros:
>* grub native support: YES
> Cons:
>* can have root partition on raid-z: NO
>* can have root partition on zfs striped mirror: NO
>* can use copies on root partition on a single root disk (e.g. a  
> laptop ): NO
>* can have usr partition on separate ZFS partition with build <  
> 72 : NO
>* can snapshot and rollback root partition: NO
>* can use compression on root partition: NO
> 
> Why did we completely drop support for the old mountroot approach  
> which is so much more flexible?

--
Eric Schrock, Solaris Kernel Development   http://blogs.sun.com/eschrock
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-04 Thread Roch - PAE

Nicolas Williams writes:

 > On Thu, Oct 04, 2007 at 03:49:12PM +0200, Roch - PAE wrote:
 > > ...memory utilisation... OK so we should implement the 'lost cause' rfe.
 > > 
 > > In all cases, ZFS must not steal pages from other memory consumers :
 > > 
 > >6488341 ZFS should avoiding growing the ARC into trouble
 > > 
 > > So the DB memory pages should not be _contended_ for. 
 > 
 > What if your executable text, and pretty much everything, lives on ZFS?
 > You don't want to contend for the memory caching those things either.
 > It's not just the DB's memory you don't want to contend for.

On the read side, 

We're talking here about 1000 disks each running 35
concurrent I/Os of 8K, so a footprint of 250MB, to stage a
ton of work.

On the write side we do have to play with the transaction
group, so that will be 5-10 seconds' worth of synchronous
write activity.

But how much memory does a 1000-disk server have?

-r




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] About bug 6486493 (ZFS boot incompatible with

2007-10-04 Thread Eric Schrock
On Thu, Oct 04, 2007 at 05:22:58AM -0700, Ivan Wang wrote:
> > This bug was rendered moot via 6528732 in build
> > snv_68 (and s10_u5).  We
> > now store physical device paths with the vnodes, so
> > even though the
> > SATA framework doesn't correctly support open by
> > devid in early boot, we
> 
> But if I read it right, there is still a problem in the SATA framework (failing 
> ldi_open_by_devid), right?
> If this problem is framework-wide, it might just bite back some time in the 
> future.
> 

Yes, there is still a bug in the SATA framework, in that
ldi_open_by_devid() doesn't work early in boot.  Opening by device path
works so long as you don't recable your boot devices.  If we had open by
devid working in early boot, then this wouldn't be a problem.

- Eric

--
Eric Schrock, Solaris Kernel Development   http://blogs.sun.com/eschrock
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-04 Thread Nicolas Williams
On Thu, Oct 04, 2007 at 03:49:12PM +0200, Roch - PAE wrote:
> ...memory utilisation... OK so we should implement the 'lost cause' rfe.
> 
> In all cases, ZFS must not steal pages from other memory consumers :
> 
>   6488341 ZFS should avoiding growing the ARC into trouble
> 
> So the DB memory pages should not be _contended_ for. 

What if your executable text, and pretty much everything, lives on ZFS?
You don't want to contend for the memory caching those things either.
It's not just the DB's memory you don't want to contend for.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-04 Thread Nicolas Williams
On Wed, Oct 03, 2007 at 04:31:01PM +0200, Roch - PAE wrote:
>  > It does, which leads to the core problem. Why do we have to store the
>  > exact same data twice in memory (i.e., once in the ARC, and once in
>  > the shared memory segment that Oracle uses)? 
> 
> We do not retain 2 copies of the same data.
> 
> If the DB cache is made large enough to consume most of memory,
> the ZFS copy will quickly be evicted to stage other I/Os on
> their way to the DB cache.
> 
> What problem does that pose ?

Other things deserving of staying in the cache get pushed out by things
that don't deserve being in the cache.  Thus systemic memory pressure
(e.g., more on-demand paging of text).

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-04 Thread Roch - PAE

eric kustarz writes:
 > >
 > > Anyhow, in the case of DBs, ARC indeed becomes a vestigial organ. I'm
 > > surprised that this is being met with skepticism considering that
 > > Oracle highly recommends direct IO be used,  and, IIRC, Oracle
 > > performance was the main motivation to adding DIO to UFS back in
 > > Solaris 2.6. This isn't a problem with ZFS or any specific fs per se,
 > > it's the buffer caching they all employ. So I'm a big fan of seeing
 > > 6429855 come to fruition.
 > 
 > The point is that directI/O typically means two things:
 > 1) concurrent I/O
 > 2) no caching at the file system
 > 

In my blog I also mention:

   3) no readahead (though this can be viewed as an implicit consequence of 2))

And someone chimed in with

   4) the ability to do I/O at sector granularity.


I also think that for many, 2) is too weak a form of what they
expect:

   5) DMA straight from user buffer to disk avoiding a copy.


So

   1) Concurrent I/O: we already have this in ZFS.

   2) No caching:
  we could do this by taking a directio hint and evicting the
  ARC buffer immediately after copyout to user space
  for reads, and after txg completion for writes.

   3) No prefetching:
  we have 2 levels of prefetching. The low level was
  fixed recently and should not cause problems for DB loads.
  The high level still needs fixing on its own.
  Then we should take the same hint as 2) to disable it
  altogether. In the meantime we can tune our way into
  this mode.

   4) Sector-sized I/O:
  really foreign to ZFS's design.

   5) Zero copy & more CPU efficiency:
  this, I think, is where the debate is.



My line has been that 5) won't help latency much, and latency is
where I think the game is currently played. Now the
disconnect might be because people might feel that the game
is not latency but CPU efficiency: "how many CPU cycles do I
burn to get data from disk to user buffer". This is a
valid point. Configurations with a very large number
of disks can end up saturated by the filesystem's CPU utilisation.

So I still think that the major area for ZFS perf gains is
on the latency front: block allocation (now much improved
with the separate intent log), I/O scheduling, and other
fixes to the threading & ARC behavior.  But at some point we
can turn our microscope on the CPU efficiency of the
implementation.  The copy will certainly be a big chunk of
the CPU cost per I/O, but I would still like to gather that
data.

Also consider, 50 disks at 200 IOPS of 8K is 80 MB/sec.
That means maybe 1/10th of a single CPU to be saved by
avoiding just the copy. Probably not what people have in
mind.  How many CPUs do you have when attaching 1000 drives
to a host running a 100TB database?  That many drives will barely
occupy 2 cores running the copies.
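
Spelling out that arithmetic (the two-core figure is just a linear
extrapolation from the smaller case, so treat it as a rough estimate):

      50 disks x 200 IOPS x 8 KB  =    80,000 KB/s  ~  80 MB/s
    1000 disks x 200 IOPS x 8 KB  = 1,600,000 KB/s  ~ 1.6 GB/s

If copying ~80 MB/s costs about a tenth of a core, then ~1.6 GB/s of copy
traffic lands in the neighborhood of two cores.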

People want performance and efficiency. Directio is
just an overloaded name that delivered those gains to other
filesystems.

Right now, what I think is worth gathering is the cycles spent
in ZFS per read & write in a large DB environment where the DB
holds 90% of memory.  For comparison with another FS, we
should disable checksums, file prefetching and vdev prefetching,
cap the ARC, turn atime off, and use an 8K recordsize.  A breakdown and
comparison of the CPU cost per layer will be quite
interesting and point to what needs work.
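
A rough sketch of that comparison configuration (the dataset name and the
ARC cap are placeholders, and the /etc/system tunables are unsupported
knobs whose names can change between builds):

    zfs set checksum=off tank/db
    zfs set atime=off tank/db
    zfs set recordsize=8k tank/db      # only affects newly written files

    # /etc/system (takes effect after a reboot)
    set zfs:zfs_prefetch_disable = 1   # file-level prefetch off
    set zfs:zfs_vdev_cache_size = 0    # vdev-level prefetch off
    set zfs:zfs_arc_max = 0x100000000  # cap the ARC at 4GB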

Another interesting thing for me would be: what is your
budget?

"How many cycles per DB read and write are you
willing to spend, and how did you come to that number?"


But, as Eric says, let's develop 2) and I'll try in parallel to
figure out the per-layer breakdown cost.

-r



 > Most file systems (ufs, vxfs, etc.) don't do 1) or 2) without turning  
 > on "directI/O".
 > 
 > ZFS *does* 1.  It doesn't do 2 (currently).
 > 
 > That is what we're trying to discuss here.
 > 
 > Where does the win come from with "directI/O"?  Is it 1), 2), or some  
 > combination?  If its a combination, what's the percentage of each  
 > towards the win?
 > 
 > We need to tease 1) and 2) apart to have a full understanding.  I'm  
 > not against adding 2) to ZFS but want more information.  I suppose  
 > i'll just prototype it and find out for myself.
 > 
 > eric

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS Mountroot and Bootroot Comparison

2007-10-04 Thread Kugutsumen
Lori Alt told me that mountroot was a temporary hack until grub  
could boot zfs natively.
Since build 62, mountroot support was dropped and I am not convinced  
that this is a mistake.

Let's compare the two:

Mountroot:

Pros:
   * can have root partition on raid-z: YES
   * can have root partition on zfs striped mirror: YES
   * can have usr partition on separate ZFS partition with build <  
72 : YES
   * can snapshot and rollback root partition: YES
   * can use copies on root partition on a single root disk (e.g. a  
laptop ): YES
   * can use compression on root partition: YES
Cons:
   * grub native support: NO (if you use raid-z or striped mirror,  
you will need to have a small UFS partition
 to bootstrap the system, but you can use a small usb stick for  
that purpose.)

New and "improved" *sigh* bootroot scheme:

Pros:
   * grub native support: YES
Cons:
   * can have root partition on raid-z: NO
   * can have root partition on zfs striped mirror: NO
   * can use copies on root partition on a single root disk (e.g. a  
laptop ): NO
   * can have usr partition on separate ZFS partition with build <  
72 : NO
   * can snapshot and rollback root partition: NO
   * can use compression on root partition: NO
   * No backward compatibility with zfs mountroot.

Why did we completely drop support for the old mountroot approach  
which is so much more flexible?
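
For readers who have not tried either scheme, the capabilities compared
above map onto ordinary zpool/zfs commands.  A rough sketch, with
hypothetical pool and dataset names, of the things the mountroot column
lets you do:

    zpool create rootpool raidz c0d0s0 c1d0s0 c2d0s0   # raid-z root
    zfs set compression=on rootpool/root
    zfs set copies=2 rootpool/root                     # extra copies on one disk
    zfs snapshot rootpool/root@before-upgrade
    zfs rollback rootpool/root@before-upgrade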

Kugutsumen

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?

2007-10-04 Thread A Darren Dunham
On Thu, Oct 04, 2007 at 08:36:10AM -0600, eric kustarz wrote:
> > Client A
> >   - import pool make couple-o-changes
> >
> > Client B
> >   - import pool -f  (heh)
> >
> > Client A + B - With both mounting the same pool, touched a couple of
> > files, and removed a couple of files from each client
> >
> > Client A + B - zpool export
> >
> > Client A - Attempted import and dropped the panic.
> >
> 
> ZFS is not a clustered file system.  It cannot handle multiple  
> readers (or multiple writers).  By importing the pool on multiple  
> machines, you have corrupted the pool.

Yes.

> You purposely did that by adding the '-f' option to 'zpool import'.   
> Without the '-f' option ZFS would have told you that it's already  
> imported on another machine (A).
> 
> There is no bug here (besides admin error :)  ).

My reading is that the complaint is not about corrupting the pool.  The
complaint is that once a pool has become corrupted, it shouldn't cause a
panic on import.  It seems reasonable to detect this and fail the import
instead.

-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant TAOShttp://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
 < This line left intentionally blank to confuse you. >
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS Crypto Alpha Release

2007-10-04 Thread Darren J Moffat
I'm pleased to announce that the ZFS Crypto project now has Alpha 
release binaries that you can download and try.   Currently we only have 
x86/x64 binaries available; SPARC will be available shortly.

Information on the Alpha release of ZFS Crypto and links for downloading 
the binaries is here:

http://opensolaris.org/os/project/zfs-crypto/phase1/alpha/

Please pay particular note to the important information at the top of 
the above page.

One of the main purposes of this Alpha release is to get feedback so 
that we can complete the design and schedule our second design review 
and our PSARC commitment review.

Note that the feature set is NOT committed at this time, and neither is the 
user interface.

-- 
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?

2007-10-04 Thread eric kustarz
>
> Client A
>   - import pool make couple-o-changes
>
> Client B
>   - import pool -f  (heh)
>
> Client A + B - With both mounting the same pool, touched a couple of
> files, and removed a couple of files from each client
>
> Client A + B - zpool export
>
> Client A - Attempted import and dropped the panic.
>

ZFS is not a clustered file system.  It cannot handle multiple  
readers (or multiple writers).  By importing the pool on multiple  
machines, you have corrupted the pool.

You purposely did that by adding the '-f' option to 'zpool import'.   
Without the '-f' option ZFS would have told you that it's already  
imported on another machine (A).

There is no bug here (besides admin error :)  ).

eric
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-04 Thread Roch - PAE
Jim Mauro writes:
 > 
 > > Where does the win come from with "directI/O"?  Is it 1), 2), or some  
 > > combination?  If its a combination, what's the percentage of each  
 > > towards the win?
 > >   
 > That will vary based on workload (I know, you already knew that ... :^).
 > Decomposing the performance win between what is gained as a result of 
 > single writer
 > lock breakup and no caching is something we can only guess at, because, 
 > at least
 > for UFS, you can't do just one - it's all or nothing.
 > > We need to tease 1) and 2) apart to have a full understanding.  
 > 
 > We can't. We can only guess (for UFS).
 > 
 > My opinion - it's a must-have for ZFS if we're going to get serious 
 > attention
 > in the database space. I'll bet dollars-to-donuts that, over the next 
 > several years,
 > we'll burn many tens-of-millions of dollars on customer support 
 > escalations that
 > come down to memory utilization issues and contention between database
 > specific buffering and the ARC. This is entirely my opinion (not that of 
 > Sun),


...memory utilisation... OK so we should implement the 'lost cause' rfe.

In all cases, ZFS must not steal pages from other memory consumers :

6488341 ZFS should avoiding growing the ARC into trouble

So the DB memory pages should not be _contended_ for. 

-r

 > and I've been wrong before.
 > 
 > Thanks,
 > /jim
 > 
 > 
 > 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-04 Thread Jim Mauro

> Where does the win come from with "directI/O"?  Is it 1), 2), or some  
> combination?  If its a combination, what's the percentage of each  
> towards the win?
>   
That will vary based on workload (I know, you already knew that ... :^).
Decomposing the performance win between what is gained as a result of 
single writer
lock breakup and no caching is something we can only guess at, because, 
at least
for UFS, you can't do just one - it's all or nothing.
> We need to tease 1) and 2) apart to have a full understanding.  

We can't. We can only guess (for UFS).

My opinion - it's a must-have for ZFS if we're going to get serious 
attention
in the database space. I'll bet dollars-to-donuts that, over the next 
several years,
we'll burn many tens-of-millions of dollars on customer support 
escalations that
come down to memory utilization issues and contention between database
specific buffering and the ARC. This is entirely my opinion (not that of 
Sun),
and I've been wrong before.

Thanks,
/jim



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] About bug 6486493 (ZFS boot incompatible with

2007-10-04 Thread Ivan Wang
> This bug was rendered moot via 6528732 in build
> snv_68 (and s10_u5).  We
> now store physical device paths with the vnodes, so
> even though the
> SATA framework doesn't correctly support open by
> devid in early boot, we

But if I read it right, there is still a problem in the SATA framework (failing 
ldi_open_by_devid), right?
If this problem is framework-wide, it might just bite back some time in the 
future.

Ivan.

> can fallback to the device path just fine.  ZFS root
> works great on
> thumper, which uses the marvell SATA driver.
> 
> - Eric
> 
> On Wed, Oct 03, 2007 at 08:10:16AM +, Marc Bevand
> wrote:
> > I would like to test ZFS boot on my home server,
> but according to bug 
> > 6486493 ZFS boot cannot be used if the disks are
> attached to a SATA
> > controller handled by a driver using the new SATA
> framework (which
> > is my case: driver si3124). I have never heard of
> someone having
> > successfully used ZFS boot with the SATA framework,
> so I assume this
> > bug is real and everybody out there playing with
> ZFS boot is doing so
> > with PATA controllers, or SATA controllers
> operating in compatibility
> > mode, or SCSI controllers, right ?
> > 
> > -marc
> > 
> > ___
> > zfs-discuss mailing list
> > zfs-discuss@opensolaris.org
> >
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
> 
> --
> Eric Schrock, Solaris Kernel Development
>   http://blogs.sun.com/eschrock
> _
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?

2007-10-04 Thread Victor Engle
> Perhaps it's the same cause, I don't know...
>
> But I'm certainly not convinced that I'd be happy with a 25K, for
> example, panicking just because I tried to import a dud pool...
>
> I'm ok(ish) with the panic on a failed write to a non-redundant storage.
> I expect it by now...
>

I agree, forcing a panic seems to be pretty severe and may cause as
much grief as it prevents. Why not just stop allowing I/O to the pool
so the sysadmin can gracefully shut down the system? Applications
would be disrupted but no more so than they would be disrupted during
a panic.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zones on zfs

2007-10-04 Thread Neal Miskin
Hi 

I have a Netra T1 with 2 internal disks. I want to install Sol 10 8/07 and build 
2 zones (one as an ftp server and one as an scp server), and I would like the 
system mirrored.

My thoughts are to use SVM to mirror the / partitions, then build a mirrored 
zfs pool using slice 5 on both disks (I know this isn't recommended, but 2 disks 
is all I have). The zones will then be built on the zfs filesystem.
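
Roughly what I have in mind (device names, metadevice numbers and zone
paths below are placeholders for my actual layout):

    # SVM mirror for / (d2 is the mirror, d21/d22 the submirrors)
    metadb -a -f c0t0d0s7 c0t1d0s7
    metainit -f d21 1 1 c0t0d0s0
    metainit d22 1 1 c0t1d0s0
    metainit d2 -m d21
    metaroot d2            # update vfstab/system, then reboot
    metattach d2 d22

    # mirrored zfs pool on slice 5 of both disks, zones on top
    zpool create zfspool mirror c0t0d0s5 c0t1d0s5
    zfs create zfspool/zones
    zonecfg -z ftpzone     # set zonepath=/zfspool/zones/ftpzone, etc.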

Filesystem       size   used  avail  capacity  Mounted on
/dev/md/dsk/d2   5.8G   4.0G   1.7G       70%   /
zfspool          9.6G   220M   9.4G        3%   /zfspool

Does this sound feasible, or are there any better ways of doing this?

Thanks

Neal
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?

2007-10-04 Thread Nathan Kroenert
I think it's a little more sinister than that...

I'm only just trying to import the pool. Not even yet doing any I/O to it...

Perhaps it's the same cause, I don't know...

But I'm certainly not convinced that I'd be happy with a 25K, for 
example, panicking just because I tried to import a dud pool...

I'm ok(ish) with the panic on a failed write to a non-redundant storage. 
I expect it by now...

Cheers!

Nathan.

Victor Engle wrote:
> Wouldn't this be the known feature where a write error to zfs forces a panic?
> 
> Vic
> 
> 
> 
> On 10/4/07, Ben Rockwood <[EMAIL PROTECTED]> wrote:
>> Dick Davies wrote:
>>> On 04/10/2007, Nathan Kroenert <[EMAIL PROTECTED]> wrote:
>>>
>>>
 Client A
   - import pool make couple-o-changes

 Client B
   - import pool -f  (heh)

>>>
 Oct  4 15:03:12 fozzie ^Mpanic[cpu0]/thread=ff0002b51c80:
 Oct  4 15:03:12 fozzie genunix: [ID 603766 kern.notice] assertion
 failed: dmu_read(os, smo->smo_object, offset, size, entry_map) == 0 (0x5
 == 0x0)
 , file: ../../common/fs/zfs/space_map.c, line: 339
 Oct  4 15:03:12 fozzie unix: [ID 10 kern.notice]
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51160
 genunix:assfail3+b9 ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51200
 zfs:space_map_load+2ef ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51240
 zfs:metaslab_activate+66 ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51300
 zfs:metaslab_group_alloc+24e ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b513d0
 zfs:metaslab_alloc_dva+192 ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51470
 zfs:metaslab_alloc+82 ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b514c0
 zfs:zio_dva_allocate+68 ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b514e0
 zfs:zio_next_stage+b3 ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51510
 zfs:zio_checksum_generate+6e ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51530
 zfs:zio_next_stage+b3 ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b515a0
 zfs:zio_write_compress+239 ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b515c0
 zfs:zio_next_stage+b3 ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51610
 zfs:zio_wait_for_children+5d ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51630
 zfs:zio_wait_children_ready+20 ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51650
 zfs:zio_next_stage_async+bb ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51670
 zfs:zio_nowait+11 ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51960
 zfs:dbuf_sync_leaf+1ac ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b519a0
 zfs:dbuf_sync_list+51 ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51a10
 zfs:dnode_sync+23b ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51a50
 zfs:dmu_objset_sync_dnodes+55 ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51ad0
 zfs:dmu_objset_sync+13d ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51b40
 zfs:dsl_pool_sync+199 ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51bd0
 zfs:spa_sync+1c5 ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51c60
 zfs:txg_sync_thread+19a ()
 Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51c70
 unix:thread_start+8 ()
 Oct  4 15:03:12 fozzie unix: [ID 10 kern.notice]

>>>
 Is this a known issue, already fixed in a later build, or should I bug it?

>>> It shouldn't panic the machine, no. I'd raise a bug.
>>>
>>>
 After spending a little time playing with iscsi, I have to say it's
 almost inevitable that someone is going to do this by accident and panic
 a big box for what I see as no good reason. (though I'm happy to be
 educated... ;)

>>> You use ACLs and TPGT groups to ensure 2 hosts can't simultaneously
>>> access the same LUN by accident. You'd have the same problem with
>>> Fibre Channel SANs.
>>>
>> I ran into similar problems when replicating via AVS.
>>
>> benr.
>> ___
>> zfs-discuss mailing list
>> zfs-discuss@opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Re: [zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?

2007-10-04 Thread Victor Engle
Wouldn't this be the known feature where a write error to zfs forces a panic?

Vic



On 10/4/07, Ben Rockwood <[EMAIL PROTECTED]> wrote:
> Dick Davies wrote:
> > On 04/10/2007, Nathan Kroenert <[EMAIL PROTECTED]> wrote:
> >
> >
> >> Client A
> >>   - import pool make couple-o-changes
> >>
> >> Client B
> >>   - import pool -f  (heh)
> >>
> >
> >
> >> Oct  4 15:03:12 fozzie ^Mpanic[cpu0]/thread=ff0002b51c80:
> >> Oct  4 15:03:12 fozzie genunix: [ID 603766 kern.notice] assertion
> >> failed: dmu_read(os, smo->smo_object, offset, size, entry_map) == 0 (0x5
> >> == 0x0)
> >> , file: ../../common/fs/zfs/space_map.c, line: 339
> >> Oct  4 15:03:12 fozzie unix: [ID 10 kern.notice]
> >> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51160
> >> genunix:assfail3+b9 ()
> >> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51200
> >> zfs:space_map_load+2ef ()
> >> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51240
> >> zfs:metaslab_activate+66 ()
> >> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51300
> >> zfs:metaslab_group_alloc+24e ()
> >> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b513d0
> >> zfs:metaslab_alloc_dva+192 ()
> >> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51470
> >> zfs:metaslab_alloc+82 ()
> >> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b514c0
> >> zfs:zio_dva_allocate+68 ()
> >> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b514e0
> >> zfs:zio_next_stage+b3 ()
> >> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51510
> >> zfs:zio_checksum_generate+6e ()
> >> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51530
> >> zfs:zio_next_stage+b3 ()
> >> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b515a0
> >> zfs:zio_write_compress+239 ()
> >> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b515c0
> >> zfs:zio_next_stage+b3 ()
> >> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51610
> >> zfs:zio_wait_for_children+5d ()
> >> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51630
> >> zfs:zio_wait_children_ready+20 ()
> >> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51650
> >> zfs:zio_next_stage_async+bb ()
> >> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51670
> >> zfs:zio_nowait+11 ()
> >> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51960
> >> zfs:dbuf_sync_leaf+1ac ()
> >> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b519a0
> >> zfs:dbuf_sync_list+51 ()
> >> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51a10
> >> zfs:dnode_sync+23b ()
> >> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51a50
> >> zfs:dmu_objset_sync_dnodes+55 ()
> >> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51ad0
> >> zfs:dmu_objset_sync+13d ()
> >> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51b40
> >> zfs:dsl_pool_sync+199 ()
> >> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51bd0
> >> zfs:spa_sync+1c5 ()
> >> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51c60
> >> zfs:txg_sync_thread+19a ()
> >> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51c70
> >> unix:thread_start+8 ()
> >> Oct  4 15:03:12 fozzie unix: [ID 10 kern.notice]
> >>
> >
> >
> >> Is this a known issue, already fixed in a later build, or should I bug it?
> >>
> >
> > It shouldn't panic the machine, no. I'd raise a bug.
> >
> >
> >> After spending a little time playing with iscsi, I have to say it's
> >> almost inevitable that someone is going to do this by accident and panic
> >> a big box for what I see as no good reason. (though I'm happy to be
> >> educated... ;)
> >>
> >
> > You use ACLs and TPGT groups to ensure 2 hosts can't simultaneously
> > access the same LUN by accident. You'd have the same problem with
> > Fibre Channel SANs.
> >
> I ran into similar problems when replicating via AVS.
>
> benr.
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
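
The quoted suggestion to use ACLs and TPGT groups refers to locking the target
down on the Solaris iSCSI target side so that only one initiator can log in to
the LUN. A rough sketch with iscsitadm, assuming the old iscsitgt daemon; the
zvol path, target name, initiator alias, IQN and portal address are
placeholders, and the exact flags (-b, -n, -l, -i, -p) are quoted from memory
and should be verified against iscsitadm(1M) on your build:

  # Create a target backed by a zvol
  iscsitadm create target -b /dev/zvol/rdsk/tank/lun0 tank-lun

  # Register client A's IQN under a local alias, then restrict the target's
  # ACL to that alias so a second host cannot log in by accident
  iscsitadm create initiator -n iqn.1986-03.com.sun:01:clienta clientA
  iscsitadm modify target -l clientA tank-lun

  # Optionally bind the target to a portal group so it is only advertised
  # on one interface
  iscsitadm create tpgt 1
  iscsitadm modify tpgt -i 192.168.1.10 1
  iscsitadm modify target -p 1 tank-lun

As the thread notes, the same discipline applies on Fibre Channel: zoning and
LUN masking, not the filesystem, are what keep two hosts off one LUN.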


Re: [zfs-discuss] When I stab myself with this knife, it hurts... But - should it kill me?

2007-10-04 Thread Ben Rockwood
Dick Davies wrote:
> On 04/10/2007, Nathan Kroenert <[EMAIL PROTECTED]> wrote:
>
>   
>> Client A
>>   - import pool make couple-o-changes
>>
>> Client B
>>   - import pool -f  (heh)
>> 
>
>   
>> Oct  4 15:03:12 fozzie ^Mpanic[cpu0]/thread=ff0002b51c80:
>> Oct  4 15:03:12 fozzie genunix: [ID 603766 kern.notice] assertion
>> failed: dmu_read(os, smo->smo_object, offset, size, entry_map) == 0 (0x5
>> == 0x0)
>> , file: ../../common/fs/zfs/space_map.c, line: 339
>> Oct  4 15:03:12 fozzie unix: [ID 10 kern.notice]
>> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51160
>> genunix:assfail3+b9 ()
>> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51200
>> zfs:space_map_load+2ef ()
>> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51240
>> zfs:metaslab_activate+66 ()
>> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51300
>> zfs:metaslab_group_alloc+24e ()
>> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b513d0
>> zfs:metaslab_alloc_dva+192 ()
>> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51470
>> zfs:metaslab_alloc+82 ()
>> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b514c0
>> zfs:zio_dva_allocate+68 ()
>> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b514e0
>> zfs:zio_next_stage+b3 ()
>> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51510
>> zfs:zio_checksum_generate+6e ()
>> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51530
>> zfs:zio_next_stage+b3 ()
>> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b515a0
>> zfs:zio_write_compress+239 ()
>> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b515c0
>> zfs:zio_next_stage+b3 ()
>> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51610
>> zfs:zio_wait_for_children+5d ()
>> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51630
>> zfs:zio_wait_children_ready+20 ()
>> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51650
>> zfs:zio_next_stage_async+bb ()
>> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51670
>> zfs:zio_nowait+11 ()
>> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51960
>> zfs:dbuf_sync_leaf+1ac ()
>> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b519a0
>> zfs:dbuf_sync_list+51 ()
>> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51a10
>> zfs:dnode_sync+23b ()
>> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51a50
>> zfs:dmu_objset_sync_dnodes+55 ()
>> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51ad0
>> zfs:dmu_objset_sync+13d ()
>> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51b40
>> zfs:dsl_pool_sync+199 ()
>> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51bd0
>> zfs:spa_sync+1c5 ()
>> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51c60
>> zfs:txg_sync_thread+19a ()
>> Oct  4 15:03:12 fozzie genunix: [ID 655072 kern.notice] ff0002b51c70
>> unix:thread_start+8 ()
>> Oct  4 15:03:12 fozzie unix: [ID 10 kern.notice]
>> 
>
>   
>> Is this a known issue, already fixed in a later build, or should I bug it?
>> 
>
> It shouldn't panic the machine, no. I'd raise a bug.
>
>   
>> After spending a little time playing with iscsi, I have to say it's
>> almost inevitable that someone is going to do this by accident and panic
>> a big box for what I see as no good reason. (though I'm happy to be
>> educated... ;)
>> 
>
> You use ACLs and TPGT groups to ensure 2 hosts can't simultaneously
> access the same LUN by accident. You'd have the same problem with
> Fibre Channel SANs.
>   
I ran into similar problems when replicating via AVS.

benr.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
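
Since the real danger is a reflexive "zpool import -f", a small wrapper that
only escalates to the force flag after a failed plain import and an explicit
confirmation can shrink the accident surface. A hypothetical sketch (the
script itself is invented; only the zpool commands are real):

  #!/bin/sh
  # force_import.sh POOL -- refuse to reach for -f silently.
  POOL=$1

  # Try a normal import first; this fails with an explanatory message when
  # the pool was last accessed by, or is still active on, another host.
  if zpool import "$POOL"; then
      echo "'$POOL' imported cleanly, no force needed"
      exit 0
  fi

  # Only force after the operator confirms they know where the pool lives.
  printf "Plain import of '%s' failed; force it anyway? [y/N] " "$POOL"
  read answer
  case "$answer" in
      y|Y) exec zpool import -f "$POOL" ;;
      *)   echo "not forcing import of '$POOL'" >&2; exit 1 ;;
  esac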