[zfs-discuss] Oracle on ZFS best practice? docs? blogs?

2008-10-21 Thread david lacerte
Any recent/new info related to running Oracle 10g and/or 11g on ZFS under 
Solaris 10? Pointers to best-practice docs or blogs would be appreciated.

dave
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS with Fusion-IO?

2008-10-21 Thread Bob Friesenhahn
On Tue, 21 Oct 2008, MC wrote:
>
> There is a fusion-io user here claiming that the performance drops 
> 90% after the device has written to capacity once.  Does fusion-io

Isn't this what is expected from FLASH-based SSD devices?  They have 
to erase before they can write.  The vendor is kind enough to deliver 
a pre-erased drive so the user feels an initial rush of exhilaration 
and feels good about the purchase.

With standard hard-drive interfaces the drive does not know whether written 
data is still useful, because with hard drives there is never a need to 
"free" data on the media; the device can therefore only erase when an 
overwrite is requested.  Usually the SSD's erase block is larger than the 
filesystem block, so more data must be erased than will be written, and some 
pre-existing data has to be read back and restored (potentially placing it 
at risk).  With a higher-level interface (e.g. to ZFS), freed regions could 
be erased as soon as they are added to the filesystem free list, and would 
hopefully already be erased by the time they are used again.

It seems that optical media often support an erase mechanism, so perhaps 
there is something in the IDE/ATA/SATA/SCSI/SAS protocols which can be used 
to explicitly erase blocks.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] L2ARC is Solaris 10?

2008-10-21 Thread Lance
> When will L2ARC be available in Solaris 10?

My Sun SE said L2ARC should be in S10U7.  It was scheduled for S10U6 (shipping 
in a few weeks), but didn't make it in time.  At least S10U6 will have ZIL 
offload and ZFS boot.
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs migration question

2008-10-21 Thread Dave Bevans

Hi,

I have a customer with the following question...

She's trying to combine two 460 GB ZFS disks into one 900 GB ZFS pool. If 
this is possible, how is it done? Is there any documentation on this 
that I can provide to her?
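
If simply pooling the capacity (striping, no redundancy) is acceptable, a 
minimal sketch with hypothetical device names would be:

    zpool create bigpool c1t0d0 c1t1d0   # ~920 GB pool striped across the two 460 GB disks
    zfs create bigpool/data

Mirroring the two disks instead ('zpool create bigpool mirror c1t0d0 c1t1d0') 
would give redundancy but only ~460 GB of space. The ZFS Administration Guide 
covers both layouts.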

--
Regards,
Dave
--
My normal working hours are Sunday through Wednesday from 8PM to 6AM 
Eastern. If you need assistance outside of these hours, please call 
1-800-usa-4sun and request the next available engineer.





Sun Microsystems
Mailstop ubur04-206
75 Network Drive
Burlington, MA  01803

Dave Bevans - Technical Support Engineer
Phone: 1-800-USA-4SUN (800-872-4786)
(opt-2), (case #) (press "0" for the next available engineer)
Email: david.bevans@Sun.com


TSC Systems Group-OS / Hours: 8PM - 6AM EST / Sun - Wed
Submit, Check & Update Cases at the Online Support Center 





This email may contain confidential and privileged material for the sole 
use of the intended recipient. Any review or distribution by others is 
strictly prohibited. If you are not the intended recipient please 
contact the sender and delete all copies.




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS with Fusion-IO?

2008-10-21 Thread MC
> Yes, we've been pleasantly surprised by the demand.
> But, that doesn't mean we're not anxious to expand
> our ability to address such an important market as
>  OpenSolaris and ZFS.
> 
> We're actively working on OpenSolaris drivers.  We
> don't expect it to take long - I'll keep you posted.
> 
> -David Flynn
> 
> CTO Fusion-io
> [EMAIL PROTECTED]

There is a fusion-io user here claiming that the performance drops 90% after 
the device has written to capacity once.  Does fusion-io have a response?  
http://forums.storagereview.net/index.php?s=&showtopic=27190&view=findpost&p=253758
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Success Stories

2008-10-21 Thread Robert Milkowski
Hello Marc,

Tuesday, October 21, 2008, 8:14:17 AM, you wrote:

MB> About 2 years ago I used to run snv_55b with a raidz on top of 5 500GB SATA
MB> drives. After 10 months I ran out of space and added a mirror of 2 250GB
MB> drives to my pool with "zpool add". No pb. I scrubbed it weekly. I only saw 1
MB> CKSUM error one day (ZFS self-healed itself automatically of course). Never
MB> had any pb with that server.

MB> After running again out of space I replaced it with a new system running
MB> snv_82, configured with a raidz on top of 7 750GB drives. To burn in the
MB> machine, I wrote a python script that read random sectors from the drives. I
MB> let it run for 48 hours to subject each disk to 10+ million I/O operations.
MB> After it passed this test, I created the pool and run some more scripts to
MB> create/delete files off it continously. To test disk failures (and SATA
MB> hotplug), I disconnected and reconnected a drive at random while the scripts
MB> were running. The system was always able to redetect the drive immediately
MB> after being plugged in (you need "set sata:sata_auto_online=1" for this to
MB> work). Depending on how long the drive had been disconnected, I either needed
MB> to do a "zpool replace" or nothing at all, for the system to re-add the disk
MB> to the pool and initiate a resilver. After these tests, I trusted the system
MB> enough to move all my data to it, so I rsync'd everything and double-checked
MB> it with MD5 sums.

MB> I have another ZFS server, at work, on which 1 disk someday started acting
MB> weirdly (timeouts). I physically replaced it, and ran "zpool replace". The
MB> resilver completed successfully. On this server, we have seen 2 CKSUM errors
MB> over the last 18 months or so. We read about 3 TB of data every day from it
MB> (daily rsync), that amounts to about 1.5 PB over 18 months. I guess 2 silent
MB> data corruptions while reading that quantity of data is about the expected
MB> error rate of modern SATA drives. (Again ZFS self-healed itself, so this was
MB> completely transparent to us.)

Which means you haven't experienced silent data corruption thanks to
ZFS. :)

-- 
Best regards,
 Robertmailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Setting per-file record size / querying fs/file record size?

2008-10-21 Thread Robert Milkowski
Hello Nicolas,

Monday, October 20, 2008, 10:57:22 PM, you wrote:

NW> I've a report that the mismatch between SQLite3's default block size and
NW> ZFS' causes some performance problems for Thunderbird users.

NW> It'd be great if there was an API by which SQLite3 could set its block
NW> size to match the hosting filesystem or where it could set the DB file's
NW> record size to match the SQLite3/app default block size (1KB).

NW> Is there such an API?  If not, is there an RFE I could add a call record
NW> to?

Maybe it would also be useful to provide an extra API so an application
could set the ZFS recordsize for a given file instead of only the
filesystem-wide property (which would then act as the maximum allowed
value). That way, assuming MySQL, Oracle and others started using such an
extension, Oracle would ask ZFS to match its recordsize to db_block_size
automatically and the out-of-the-box user experience would be much better.
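
Until such an API exists, the closest approximation is the per-filesystem
property, e.g. (a sketch with a hypothetical dataset name; 8K matches
Oracle's common db_block_size):

    zfs set recordsize=8K tank/oradata   # only affects files created after the change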





-- 
Best regards,
 Robert Milkowskimailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Tuning ZFS for Sun Java Messaging Server

2008-10-21 Thread Robert Milkowski
Hello Adam,

Tuesday, October 21, 2008, 2:00:46 PM, you wrote:

ANC> We're using a rather large (3.8TB) ZFS volume for our mailstores on a
ANC> JMS setup. Does anybody have any tips for tuning ZFS for JMS? I'm
ANC> looking for even the most obvious tips, as I am a bit of a novice. Thanks,

Well, it's kind of a broad topic and it depends on the specific
environment. So don't tune for the sake of tuning - try to
understand your problem first. Nevertheless, you should consider things like
the following (in random order):

1. RAID level - you will probably end up with relatively small random
   I/Os, so generally avoid RAID-Z.
   Of course, it could be that RAID-Z in your environment is perfectly
   fine.

2. Depending on your workload and disk subsystem, a ZFS slog (separate
   intent log) on an SSD could help improve performance.

3. Disable atime updates on the ZFS filesystem.

4. Enabling compression like lzjb could in theory help - it depends on
   how well your data compresses, how much CPU you have left, and whether
   you are mostly I/O-bound.

5. ZFS recordsize - probably not worth changing, since in most cases when
   you read anything from an email you will read the entire message
   anyway. Nevertheless, this can easily be checked with DTrace.

6. IIRC JMS keeps an index/db file per mailbox - so just maybe an L2ARC
   on a large SSD would help, assuming it would nicely cache these files -
   this would need to be simulated/tested.

7. Disabling vdev prefetching in ZFS could help - see the ZFS Evil Tuning
   Guide.


Except for #3 and maybe #7, first identify what your problem is and
what you are trying to fix. A command sketch of a few of these items follows below.
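
A minimal sketch of items #2-#4 (the pool name mailpool, the dataset
mailpool/store and the SSD device c2t0d0 are hypothetical):

    zfs set atime=off mailpool/store          # 3: stop atime updates
    zfs set compression=lzjb mailpool/store   # 4: cheap compression; check compressratio later
    zpool add mailpool log c2t0d0             # 2: SSD as a separate intent log (slog)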



-- 
Best regards,
 Robert Milkowskimailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!

2008-10-21 Thread Nigel Smith
Hi Tano,
I hope you can try with the 'iscsisnoop.d' script, so 
we can see if your problem is the same as what Eugene is seeing.

Please can you also check the contents of the file:
/var/svc/log/system-iscsitgt\:default.log
.. just to make sure that the iscsi target is not core dumping & restarting.
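
For example (a sketch; iscsisnoop.d is the script from the solarisinternals
DTrace iSCSI page, and the commands assume the default service instance):

    tail /var/svc/log/system-iscsitgt:default.log   # look for core dumps / restarts
    svcs -p iscsitgt                                # has the target's start time / PID changed?
    ./iscsisnoop.d                                  # leave running while you reproduce the hang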

I've also done a post on the storage-forum on how to
enable a debug log on the iscsi target, which may also give some clues.
http://mail.opensolaris.org/pipermail/storage-discuss/2008-October/006423.html

It may also be worth trying with a smaller target size,
just to see if that is a factor.
(There have in the past been bugs, now fixed, which triggered with 'large' 
targets.)
As I said, it worked OK for me with a 200 GB target.

Many thanks for all your testing. Please bear with us on this one.
If it is a problem with the Solaris iscsi target we need to get to 
the bottom of the root cause.
Following Eugene's report, I'm beginning to fear that some sort of regression
has been introduced into the iscsi target code...
Regards
Nigel Smith
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!

2008-10-21 Thread Tano
one more update:

The common hardware between all my machines so far has been the PERC (PowerEdge 
RAID Controller), also known as the LSI MegaRAID controller.


The 1850 has a PERC 4d/i
the 1900 has a PERC 5/i

I'll be testing the iscsitarget with a SATA controller to test my hypothesis.
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!

2008-10-21 Thread Tano
The PowerEdge 1850 has Intel EtherExpress PRO/1000 internal cards in it. 

However, some new updates: even the Microsoft initiator hung writing a 1.5 
gigabyte file to the iscsi target on the OpenSolaris box.

I've installed the Linux iscsitarget on the same box and will re-attempt the 
iscsi targets with the Microsoft and ESX servers.

I'll also get the DTrace output from the iscsi box later this afternoon. 

Sorry for the delay.
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Setting per-file record size / querying fs/file record size?

2008-10-21 Thread Nicolas Williams
On Tue, Oct 21, 2008 at 03:43:08PM -0400, Bill Sommerfeld wrote:
> On Mon, 2008-10-20 at 16:57 -0500, Nicolas Williams wrote:
> > I've a report that the mismatch between SQLite3's default block size and
> > ZFS' causes some performance problems for Thunderbird users.
> 
> I was seeing a severe performance problem with sqlite3 databases as used
> by evolution (not thunderbird).
> 
> It appears that reformatting the evolution databases to a 32KB database
> page size and setting zfs's record size to a matching 32KB has done
> wonders for evolution performance to a ZFS home directory.
> 
> > It'd be great if there was an API by which SQLite3 could set its block
> > size to match the hosting filesystem or where it could set the DB file's
> > record size to match the SQLite3/app default block size (1KB).
> 
> IMHO some of the fix has to involve sqlite3 using a larger page size by
> default when creating the database -- it seems to be a lot more
> efficient with the larger page size.
> 
> Databases like sqlite3 are being used "under the covers" by growing
> numbers of applications -- it seems like there's a missing interface
> here if we want decent out-of-the-box performance of end-user apps like
> tbird and evolution using databases on zfs.

Agreed.

I've filed:

3452: os_unix.c:unixSectorSize() should use statvfs() to get pref blk size
http://www.sqlite.org/cvstrac/tktview?tn=3452

3454: hardcoded 32KB pagesize max may be too low
http://www.sqlite.org/cvstrac/tktview?tn=3454

and

6762083 sqlite3 default pagesize selection could be better (SQLite3 tickets 
#3452 and 3454)

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Setting per-file record size / querying fs/file record size?

2008-10-21 Thread Bill Sommerfeld
On Mon, 2008-10-20 at 16:57 -0500, Nicolas Williams wrote:
> I've a report that the mismatch between SQLite3's default block size and
> ZFS' causes some performance problems for Thunderbird users.

I was seeing a severe performance problem with sqlite3 databases as used
by evolution (not thunderbird).

It appears that reformatting the evolution databases to a 32KB database
page size and setting zfs's recordsize to a matching 32KB has done
wonders for evolution performance on a ZFS home directory.
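
For reference, a sketch of that rebuild (the dataset and database path are
hypothetical; note that a recordsize change only affects files written after
it is set):

    zfs set recordsize=32K rpool/export/home/bill
    sqlite3 ~/.evolution/mail/folders.db 'PRAGMA page_size=32768; VACUUM;'

The VACUUM rewrites the database using the new 32K page size.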

> It'd be great if there was an API by which SQLite3 could set its block
> size to match the hosting filesystem or where it could set the DB file's
> record size to match the SQLite3/app default block size (1KB).

IMHO some of the fix has to involve sqlite3 using a larger page size by
default when creating the database -- it seems to be a lot more
efficient with the larger page size.

Databases like sqlite3 are being used "under the covers" by growing
numbers of applications -- it seems like there's a missing interface
here if we want decent out-of-the-box performance of end-user apps like
tbird and evolution using databases on zfs.

- Bill
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] resilver being killed by 'zpool status' when root

2008-10-21 Thread Jacob Ritorto
Please pardon the off-topic question, but is there a Solaris backport of the fix?


On Tue, Oct 21, 2008 at 2:15 PM, Victor Latushkin
<[EMAIL PROTECTED]> wrote:
> Blake Irvin wrote:
>> Looks like there is a closed bug for this:
>>
>> http://bugs.opensolaris.org/view_bug.do?bug_id=6655927
>>
>> It's been closed as 'not reproducible', but I can reproduce consistently on 
>> Sol 10 5/08.  How can I re-open this bug?
>
> Have you tried to reproduce it with Nevada build 94 or later? Bug
> 6655927 is closed as not reproducible because that part of the code was
> rewritten as part of fixing 6343667, and problem described in 6655927
> was not reproducible any longer.
>
> If you can reproduce it with build 94 or later, then the bug 6655927
> probably worth revisiting.
>
> victor
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] resilver being killed by 'zpool status' when root

2008-10-21 Thread Victor Latushkin
Blake Irvin wrote:
> Looks like there is a closed bug for this:
> 
> http://bugs.opensolaris.org/view_bug.do?bug_id=6655927
> 
> It's been closed as 'not reproducible', but I can reproduce consistently on 
> Sol 10 5/08.  How can I re-open this bug?

Have you tried to reproduce it with Nevada build 94 or later? Bug 
6655927 is closed as not reproducible because that part of the code was 
rewritten as part of fixing 6343667, and the problem described in 6655927 
was no longer reproducible.

If you can reproduce it with build 94 or later, then bug 6655927 is 
probably worth revisiting.

victor
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!

2008-10-21 Thread Eugene Chupriyanov
I have a very similar problem with snv_99 and Virtual Iron 
(http://www.opensolaris.org/jive/thread.jspa?threadID=79831&tstart=0).

I am using an IBM x3650 server with 6 SAS drives. And what we have in common is 
Broadcom network cards (the BNX driver). From previous experience I know these 
cards had a driver problem in Linux. So, as a wild guess, maybe the problem is 
here? Can you try another card in your server? Unfortunately I don't have a 
compatible spare card to check it..

Regards,
Eugene
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] resilver being killed by 'zpool status' when root

2008-10-21 Thread Blake Irvin
Looks like there is a closed bug for this:

http://bugs.opensolaris.org/view_bug.do?bug_id=6655927

It's been closed as 'not reproducible', but I can reproduce consistently on Sol 
10 5/08.  How can I re-open this bug?

I'm using a pair of Supermicro AOC-SAT2-MV8 on a fully patched install of 
Solaris 10 5/08, with a 9-disk raidz2 pool.

The motherboard is a Supermicro H8DM8-2.
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] resilver being killed by 'zpool status' when root

2008-10-21 Thread Blake Irvin
I've confirmed the problem with automatic resilvers as well.  I will see about 
submitting a bug.
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Setting per-file record size / querying fs/file record size?

2008-10-21 Thread Nicolas Williams
On Mon, Oct 20, 2008 at 04:57:22PM -0500, Nicolas Williams wrote:
> I've a report that the mismatch between SQLite3's default block size and
> ZFS' causes some performance problems for Thunderbird users.
> 
> It'd be great if there was an API by which SQLite3 could set its block
> size to match the hosting filesystem or where it could set the DB file's
> record size to match the SQLite3/app default block size (1KB).
> 
> Is there such an API?  If not, is there an RFE I could add a call record
> to?

To answer one of my own questions: I can use statvfs(2) to discover the
preferred block size of a filesystem.
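
On Solaris the same values are also easy to eyeball from the shell; a quick
sketch, with a hypothetical dataset tank/db mounted at /tank/db:

    df -g /tank/db               # -g prints the entire statvfs structure, including the block sizes
    zfs get recordsize tank/db   # the ZFS property itself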
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Building a 2nd pool, can I do it in stages?

2008-10-21 Thread Bob Friesenhahn

On Tue, 21 Oct 2008, Håvard Krüger wrote:

> Is it possible to build a RaidZ with 3x 1TB disks and 5x 0.5TB 
> disks, and then swap out the 0.5TB disks as time goes by? Is there 
> a documentation/wiki on doing this?


Yes, you can build a raidz vdev with all of these drives but only 
0.5TB will be used from your 1TB drives.  Once you replace *all* of 
the 0.5TB drives with 1TB drives, then the full space of the 1TB 
drives will be used.


Depending on how likely it is that you will replace all of these old 
drives, you might consider using the new drives to add a second vdev 
to the pool instead, so that the disk space on all the existing drives can be 
fully used and you obtain better multiuser performance. Both options are 
sketched below.
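
A sketch of both options, with hypothetical device names (c2t0d0-c2t2d0 being 
the 1TB drives, c2t3d0-c2t7d0 the 0.5TB ones):

    # Option 1: one raidz vdev mixing sizes (the 1TB drives are used only up to 0.5TB each)
    zpool create tank2 raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t7d0
    # later, swap the small drives out one at a time; the extra space shows up
    # once all of them have been replaced (an export/import may be needed)
    zpool replace tank2 c2t3d0 c3t0d0

    # Option 2: two raidz vdevs, so every drive's full capacity is usable now
    zpool create tank2 raidz c2t0d0 c2t1d0 c2t2d0
    zpool add tank2 raidz c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t7d0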


Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Building a 2nd pool, can I do it in stages?

2008-10-21 Thread Håvard Krüger
Hi, my present RaidZ pool is now almost full, so I've recently bought an Adaptec 
3805 to start building a second one. But since these are so expensive I don't have 
enough money left over to buy 8x 1TB disks; I can still buy 3, and I have 5x 
0.5TB disks lying around.

Is it possible to build a RaidZ with 3x 1TB disks and 5x 0.5TB disks, and then 
swap out the 0.5TB disks as time goes by? Is there any documentation/wiki on 
doing this?


(I'm using NexentaCore and not OpenSolaris, but ZFS is still ZFS, isn't it?)
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS scalability in terms of file system count (or lack thereof) in S10U6

2008-10-21 Thread Pramod Batni



On 10/21/08 04:52, Paul B. Henson wrote:
> On Mon, 20 Oct 2008, Pramod Batni wrote:
>
>> Yes, the implementation of the above ioctl walks the list of mounted
>> filesystems 'vfslist' [in this case it walks 5000 nodes of a linked list
>> before the ioctl returns] This in-kernel traversal of the filesystems is
>> taking time.
>
> Hmm, O(n) :(... I guess that is the implementation of getmntent(3C)?

In fact, the problem is that 'zfs create' calls the ioctl way too many
times. getmntent(3C) issues a single ioctl(MNTIOC_GETMNTENT).

> Why does creating a new ZFS filesystem require enumerating all existing
> ones?

This is to determine if any of the filesystems in the dataset are mounted.
The ioctl calls are coming from:

  libc.so.1`ioctl+0x8
  libc.so.1`getmntany+0x200
  libzfs.so.1`is_mounted+0x60
  libshare.so.1`sa_get_zfs_shares+0x118
  libshare.so.1`sa_init+0x330
  libzfs.so.1`zfs_init_libshare+0xac
  libzfs.so.1`zfs_share_proto+0x4c
  zfs`zfs_do_create+0x608
  zfs`main+0x2b0
  zfs`_start+0x108

zfs_init_libshare is walking through a list of filesystems and determining
if each of them is mounted. I think there can be a better way to do this
rather than doing an is_mounted() check on each of the filesystems. In any
case, a bug can be filed on this.

Pramod

>> You could set 'zfs set mountpoint=none <pool>' and then create the
>> filesystems under the <pool>. [In my experiments the number of ioctl's
>> went down drastically.] You could then set a mountpoint for the pool and
>> then issue a 'zfs mount -a'.
>
> That would work for an initial mass creation, but we are going to need to
> create and delete fairly large numbers of file systems over time, this
> workaround would not help for that.
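
For reference, a minimal sketch of the workaround Pramod describes (tank and
the dataset names are hypothetical):

    zfs set mountpoint=none tank          # nothing is mounted while datasets are created
    zfs create tank/user0001              # repeat for each new filesystem
    zfs create tank/user0002
    zfs set mountpoint=/export/home tank  # restore the inherited mountpoint
    zfs mount -a                          # mount everything in one pass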


  
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Tuning ZFS for Sun Java Messaging Server

2008-10-21 Thread Adam N. Copeland
We're using a rather large (3.8TB) ZFS volume for our mailstores on a
JMS setup. Does anybody have any tips for tuning ZFS for JMS? I'm
looking for even the most obvious tips, as I am a bit of a novice. Thanks,

Adam
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Booting 0811 from USB Stick

2008-10-21 Thread Marcelo Leal
Hello all,
 Did you do a regular install onto the USB stick, or did you use the Distribution 
Constructor (DC)? 

 Leal.
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Tool to figure out optimum ZFS recordsize for a Mail server Maildir tree?

2008-10-21 Thread Marcelo Leal
Hello Roch!

> 
> Leave the default recordsize. With 128K recordsize, files smaller than
> 128K are stored as single record tightly fitted to the smallest possible
> # of disk sectors. Reads and writes are then managed with fewer ops.
 ZFS is dynamic on the write side, but what about reads? 
 If I have many small files (smaller than 128K), won't I waste time reading 
128K per file? And once ZFS has allocated a 64K filesystem block for a file, 
for example, and that file then gets bigger, will ZFS keep using 64K blocks?
 
> 
> Not tuning the recordsize is very generally more space efficient and
> more performant.
> Large DB (fixed size aligned accesses to uncacheable working set) is
> the exception here (tuning recordsize helps) and a few other corner
> cases.
> 
> -r
> 
> 
> On 15 Sept 2008, at 04:49, Peter Eriksson wrote:
> 
> > I wonder if there exists some tool that can be used to figure out an
> > optimal ZFS recordsize configuration? Specifically for a mail
> > server using Maildir (one ZFS filesystem per user). Ie, lots of
> > small files (one file per email).
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Solution: Recover after disk labels "failure"

2008-10-21 Thread Oleg Muravskiy
I recovered the pool by doing an export, an import and a scrub.

Apparently you can export a pool with a FAILED device, and import will restore 
the labels from the backup copies. Data errors are still there after the import, 
so you need to scrub the pool. After all that, the filesystem is back with no 
errors/problems.
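
In command form (a sketch; tank is a hypothetical pool name):

    zpool export tank
    zpool import tank       # labels are rebuilt from the backup copies
    zpool scrub tank
    zpool status -v tank    # the previously recorded data errors should be gone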

It would be nice if the documentation mentioned this, namely that before trying 
to replace disks or restore backups, you could try an export/import.

Also, it is not clear what "zpool clear" actually clears (what a nice use of the 
word "clear"!). It does not clear data errors recorded within the pool. In my 
case they were registered when I tried to read data from the pool with one device 
marked as FAILED (when in fact only the label was corrupted; the data itself was 
OK), and they disappeared upon scrub.

So my thanks go to the people on the Internet who share their findings about zfs, 
and to the zfs developers who made such a robust system (I still think it's the 
best of all the [free] systems I have used).
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!

2008-10-21 Thread Nigel Smith
Well, my colleague and I recently got a basic VMware ESX cluster working,
with the Solaris iscsi target, in the lab at work, so I know it does work.

We used ESX 3.5i on two Dell Precision 390 workstations,
booted from USB memory sticks.
We used snv_97 and no special tweaks were required.
We used VMotion to move a running Windows XP guest
from one ESX host to the other.
Windows XP was playing a video feed at the time.
It all worked fine.  We repeated the operation three times.
My colleague is the ESX expert, but I believe it was
update 2 with all the latest patches applied.
But we only had a single iscsi target set up on the Solaris box.
The target size was 200 GB, formatted with VMFS.

Ok, another thing you could try, which may give a clue
to what is going wrong, is to run the 'iscsisnoop.d'
script on the Solaris box.
http://www.solarisinternals.com/wiki/index.php/DTrace_Topics_iSCSI
This is a DTrace script which shows what iscsi target events are happening,
so interesting if it shows anything unusual at the point of failure.

But I'm beginning to think it could be one of your hardware components
that is playing up, though there is no clue so far. It could be anywhere on the path.
Maybe you could check that the Solaris iscsi target works ok under stress
from something other than ESX, like say the Windows iscsi initiator.
Regards
Nigel Smith
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Success Stories

2008-10-21 Thread Marc Bevand
About 2 years ago I used to run snv_55b with a raidz on top of 5 500GB SATA 
drives. After 10 months I ran out of space and added a mirror of 2 250GB 
drives to my pool with "zpool add". No pb. I scrubbed it weekly. I only saw 1 
CKSUM error one day (ZFS self-healed itself automatically of course). Never 
had any pb with that server.

After running again out of space I replaced it with a new system running 
snv_82, configured with a raidz on top of 7 750GB drives. To burn in the 
machine, I wrote a python script that read random sectors from the drives. I 
let it run for 48 hours to subject each disk to 10+ million I/O operations. 
After it passed this test, I created the pool and ran some more scripts to 
create/delete files off it continuously. To test disk failures (and SATA 
hotplug), I disconnected and reconnected a drive at random while the scripts 
were running. The system was always able to redetect the drive immediately 
after being plugged in (you need "set sata:sata_auto_online=1" for this to 
work). Depending on how long the drive had been disconnected, I either needed 
to do a "zpool replace" or nothing at all, for the system to re-add the disk 
to the pool and initiate a resilver. After these tests, I trusted the system 
enough to move all my data to it, so I rsync'd everything and double-checked 
it with MD5 sums.
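
For reference, the two pieces above in command form (tank and c3t2d0 are 
hypothetical names; the /etc/system setting takes effect after a reboot):

    echo 'set sata:sata_auto_online=1' >> /etc/system
    zpool replace tank c3t2d0     # re-add the reattached disk and start a resilver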

I have another ZFS server, at work, on which 1 disk someday started acting 
weirdly (timeouts). I physically replaced it, and ran "zpool replace". The 
resilver completed successfully. On this server, we have seen 2 CKSUM errors 
over the last 18 months or so. We read about 3 TB of data every day from it 
(daily rsync), that amounts to about 1.5 PB over 18 months. I guess 2 silent 
data corruptions while reading that quantity of data is about the expected 
error rate of modern SATA drives. (Again ZFS self-healed itself, so this was 
completely transparent to us.)

-marc


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss