Re: [zfs-discuss] SSD over 10gbe not any faster than 10K SAS over GigE

2009-10-12 Thread Derek Anderson
Thank you for your input, folks.  The MTU 9000 idea worked like a charm.  I have
the Intel X25 also, but the capacity was not what I am after for a 6-device
array.  I have looked at review after review, and that's why I started down the
Intel path, albeit that firmware upgrade in May was a pain to pull off.  I have
seen glowing reviews of both the Samsungs and the Intels.  What tipped me over
the edge was a YouTube video (surely paid for by Samsung).  Check it out:
http://www.youtube.com/watch?v=96dWOEa4Djs

Figuring out how to enable jumbo frames on the ixgbe was fun given my newness
to Sun's platform.
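
(For the archives, on a recent OpenSolaris build the jumbo-frame piece usually
comes down to something like the following; the interface instance, address,
and exact steps are illustrative, not necessarily what was done here:

  # ifconfig ixgbe0 unplumb
  # dladm set-linkprop -p mtu=9000 ixgbe0
  # dladm show-linkprop -p mtu ixgbe0
  # ifconfig ixgbe0 plumb 192.168.1.10/24 up

On older releases you may instead have to set the MTU in the ixgbe driver's
.conf file and reboot.)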

Thanks,

Derek
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD over 10gbe not any faster than 10K SAS over GigE

2009-10-12 Thread Al Hopper
On Fri, Oct 9, 2009 at 9:25 PM, Derek Anderson  wrote:
>
> GigE wasn't giving me the performance I had hoped for, so I sprang for some
> 10GbE cards.  So what am I doing wrong?
>
> My setup is a Dell 2950 without a raid controller, just a SAS6 card.  The
> setup is as such:
> mirror rpool (boot) SAS 10K
> raidz SSD  467 GB on 3 Samsung 256 GB MLC SSDs (220 MB/s each)
>
> To create the raidz I did a simple zpool create SSD raidz c1x c1xx
> c1x.  I have a single 10GbE card with a single IP on it.
>
> I created an NFS filesystem for VMware by using:  zfs create SSD/vmware.  I
> had to set permissions for VMware, anon=0, but that's it.  Below is what zpool
> iostat reads:
>
> File copy 10GbE to SSD -> 40M max
> File copy  1GbE  to SSD ->  5.4M max
> File copy  SAS to SSD internal -> 90M
> File copy SSD to SAS internal -> 55M
>
> Top shows that no matter what I always have 2.5 GB free, and every other test
> says the same thing.  Can anyone tell me why this seems to be slow?  Does 90M
> mean megabytes or megabits?
>
> Thanks,
>

Derek - I think you made a bad choice with the Samsung disks.  I'd
recommend the Intel 160GB drives if it's not too late to return the
Samsungs.  The Intel drives currently offer the best compromise
between different workloads.  There are plenty of SSD reviews, and the
Samsungs always come out poorly in comparison testing.

Regards,

--
Al Hopper  Logical Approach Inc, Plano, TX  a...@logical-approach.com
                  Voice: 972.379.2133 Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] deduplication

2009-10-12 Thread Matty
On Fri, Jul 17, 2009 at 2:42 PM, Brandon High  wrote:

> The keynote was given on Wednesday. Any more willingness to discuss
> dedup on the list now?

The following video contains a de-duplication overview from Bill and Jeff:

https://slx.sun.com/1179275620

Hope this helps,
- Ryan
--
http://prefetch.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS sgid directory interoperability with Linux

2009-10-12 Thread Paul B. Henson
On Mon, 12 Oct 2009, Mark Shellenbaum wrote:

> Does it only fail under NFS or does it only fail when inheriting an ACL?

It only fails over NFS from a Linux client; locally it works fine, and from
a Solaris client it works fine.  It also only seems to fail on directories;
files receive the correct group ownership:

$ uname -a
Linux damien 2.6.27-gentoo-r8 #7 SMP Tue May 26 13:15:08 PDT 2009 x86_64
Dual Core AMD Opteron(tm) Processor 280 AuthenticAMD GNU/Linux

$ id
uid=1005(henson) gid=1012(csupomona)

$ mount | grep henson
kyle.unx.csupomona.edu:/export/user/henson on /user/henson type nfs4
(rw,sec=krb5p,clientaddr=134.71.247.8,sloppy,addr=134.71.247.14)

$ ls -ld .
drwx--s--x 3 henson iit 4 Oct 12 15:58 .

$ touch foo
$ mkdir bar
$ ls -l

total 1
drwxr-sr-x 2 henson csupomona 2 Oct 12 15:58 bar
-rw-r--r-- 1 henson iit   0 Oct 12 15:58 foo

New directory group ownership is wrong whether the containing directory has
an inheritable ACL or not.

I only have ZFS filesystems exported right now, but I assume it would
behave the same for UFS.  The underlying issue seems to be that the Sun NFS
server expects the NFS client to apply the sgid bit itself and create the
new directory with the parent directory's group, while the Linux NFS client
expects the server to enforce the sgid bit.
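
For anyone wanting to reproduce this, a minimal local test looks something like
the following (use any group that is not your primary group; 'iit' here matches
the listing above):

$ mkdir sgidtest
$ chgrp iit sgidtest
$ chmod g+s sgidtest
$ cd sgidtest
$ touch foo
$ mkdir bar
$ ls -l

Run locally or from a Solaris client, both foo and bar end up in group iit; run
from the Linux NFS client, bar comes back owned by the primary group instead.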


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS sgid directory interoperability with Linux

2009-10-12 Thread Mark Shellenbaum

Paul B. Henson wrote:

We're running Solaris 10 with ZFS to provide home and group directory file
space over NFSv4. We've run into an interoperability issue between the
Solaris NFS server and the Linux NFS client regarding the sgid bit on
directories and assigning appropriate group ownership on newly created
subdirectories.

If a directory exists with the sgid bit set owned by a group other than
the user's primary group, new directories created in that directory are
owned by the primary group rather than by the group of the parent
directory.

Evidently, the Solaris NFS server assumes the client will specify the
correct owner of the directory, whereas the Linux NFS client assumes the
server is in charge of implementing the sgid functionality and will assign
the right group itself. As such, with a Solaris server and a Linux client
the functionality is simply broken :(.

This poses a considerable security issue, as the GROUP@ inherited ACL now
provides access to the primary group of the user rather than the intended
group, which as you might imagine is somewhat problematic.

Ideally, it seems that the server should be responsible for this, rather
than the client voluntarily enforcing it. Is this functionality strictly
defined anywhere, or is it implementation dependent? You'd think
something like this would have turned up in an interoperability bake-off at
some point.

Thanks for any information...




Does it only fail under NFS or does it only fail when inheriting an ACL?

I just tried it locally and it appears to work.

# ls -ld test.dir
drwsr-sr-x   2 marks    storage        4 Oct 12 16:45 test.dir

my primary group is "staff"

$ touch file
$ ls -l file
-rw-r--r--   1 marks    storage        0 Oct 12 16:49 file
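
(The directory case, which is the one reported failing over NFS, can be checked
the same way, e.g.:

$ mkdir subdir
$ ls -ld subdir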


   -Mark




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] NFS sgid directory interoperability with Linux

2009-10-12 Thread Paul B. Henson

We're running Solaris 10 with ZFS to provide home and group directory file
space over NFSv4. We've run into an interoperability issue between the
Solaris NFS server and the Linux NFS client regarding the sgid bit on
directories and assigning appropriate group ownership on newly created
subdirectories.

If a directory exists with the sgid bit set owned by a group other than
the user's primary group, new directories created in that directory are
owned by the primary group rather than by the group of the parent
directory.

Evidently, the Solaris NFS server assumes the client will specify the
correct owner of the directory, whereas the Linux NFS client assumes the
server is in charge of implementing the sgid functionality and will assign
the right group itself. As such, with a Solaris server and a Linux client
the functionality is simply broken :(.

This poses a considerable security issue, as the GROUP@ inherited ACL now
provides access to the primary group of the user rather than the intended
group, which as you might imagine is somewhat problematic.

Ideally, it seems that the server should be responsible for this, rather
than the client voluntarily enforcing it. Is this functionality strictly
defined anywhere, or is it implementation dependent? You'd think
something like this would have turned up in an interoperability bake-off at
some point.

Thanks for any information...


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Keep track of meta data on each zfs

2009-10-12 Thread David Dyer-Bennet

On Sat, October 10, 2009 12:02, Harry Putnam wrote:

>
> What do real live administrators who administer important data do about
> meta info like that?

Same thing I do about directories -- I name them meaningfully.  So I've
got /home/ddb which is the home directory for user ddb and is mounted from
/zp1/ddb, and similarly for other users.  And then I've got
//fsfs/public/music and //fsfs/public/installers, which are probably
mounted as one filesystem from /zp1/public but I don't remember for sure. 
They hold music files and software installers.
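
(A sketch of how such a layout comes about; dataset and mountpoint names below
are illustrative, based on the paths above:

  # zfs create -o mountpoint=/home/ddb zp1/ddb
  # zfs create zp1/public
  # zfs list -o name,mountpoint -r zp1

If a name isn't enough, ZFS user properties can hold free-form notes, e.g.
'zfs set info:desc="music and installers" zp1/public', readable later with
'zfs get info:desc zp1/public'.)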

I'm kind of wondering what you're doing, because the confusion sounds
strange to me.
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20

2009-10-12 Thread Mertol Ozyoney
Hi Richard; 

You are right, ZFS is not a shared FS, so it cannot be used for RAC unless
you have a 7000 series disk system.
In Exadata, ASM is used for storage management, where the F20 can serve as a
cache.

Best regards
Mertol 



Mertol Ozyoney 
Storage Practice - Sales Manager

Sun Microsystems, TR
Istanbul TR
Phone +902123352200
Mobile +905339310752
Fax +90212335
Email mertol.ozyo...@sun.com



-Original Message-
From: zfs-discuss-boun...@opensolaris.org
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Richard Elling
Sent: Thursday, September 24, 2009 8:10 PM
To: James Andrewartha
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] Sun Flash Accelerator F20

On Sep 24, 2009, at 12:20 AM, James Andrewartha wrote:

> I'm surprised no-one else has posted about this - part of the Sun  
> Oracle Exadata v2 is the Sun Flash Accelerator F20 PCIe card, with  
> 48 or 96 GB of SLC, a built-in SAS controller and a super-capacitor  
> for cache protection.
http://www.sun.com/storage/disk_systems/sss/f20/specs.xml

At the Exadata-2 announcement, Larry kept saying that it wasn't a disk.  But
there was little else of a technical nature said, though John did have one to
show.

RAC doesn't work with ZFS directly, so the details of the configuration should
prove interesting.
  -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20

2009-10-12 Thread Mertol Ozyoney
Hi James;

The product will be launched very shortly.  You can get pricing from Sun.
Please keep in mind that Logzilla and the F20 are designed with slightly
different tasks in mind.  Logzilla is an extremely fast and reliable write
device, while the F20 can be used for many different loads (read cache,
write cache, or both at the same time).

Mertol 



Mertol Ozyoney 
Storage Practice - Sales Manager

Sun Microsystems, TR
Istanbul TR
Phone +902123352200
Mobile +905339310752
Fax +90212335
Email mertol.ozyo...@sun.com



-Original Message-
From: zfs-discuss-boun...@opensolaris.org
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of James Andrewartha
Sent: Thursday, September 24, 2009 10:21 AM
To: zfs-discuss@opensolaris.org
Subject: [zfs-discuss] Sun Flash Accelerator F20

I'm surprised no-one else has posted about this - part of the Sun Oracle 
Exadata v2 is the Sun Flash Accelerator F20 PCIe card, with 48 or 96 GB of 
SLC, a built-in SAS controller and a super-capacitor for cache protection. 
http://www.sun.com/storage/disk_systems/sss/f20/specs.xml

There's no pricing on the webpage though - does anyone know how it compares 
in price to a logzilla?

-- 
James Andrewartha
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] deduplication

2009-10-12 Thread Mertol Ozyoney
Hi All ;

I am not the right person to talk about the Solaris/ZFS roadmap; however, you
can talk with your Sun account manager about the 7000 series roadmap if you
sign an NDA, which can give you more information.

Best regards
Mertol 

Mertol Ozyoney 
Storage Practice - Sales Manager

Sun Microsystems, TR
Istanbul TR
Phone +902123352200
Mobile +905339310752
Fax +90212335
Email mertol.ozyo...@sun.com



-Original Message-
From: zfs-discuss-boun...@opensolaris.org
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Cyril Plisko
Sent: Thursday, September 17, 2009 9:20 AM
To: Brandon High
Cc: ZFS discuss
Subject: Re: [zfs-discuss] deduplication

2009/9/17 Brandon High :
> 2009/9/11 "C. Bergström" :
>> Can we make a FAQ on this somewhere?
>>
>> 1) There is some legal bla bla between Sun and green-bytes that's tying up
>> the IP around dedup... (someone knock some sense into green-bytes please)
>> 2) there's an acquisition that's got all sorts of delays.. which may very
>> well delay the thing with green-bytes as well..
>
> I know you're trying to help, but your opinion as to the delay is
> hardly authoritative.
>
> Could someone from Sun provide information on data deduplication in
> ZFS, even if just to say it's tied up in litigation at the moment?

I think it should be pretty obvious by now that no one from Sun is going
to tell you a word until it is possible to tell things.  At which point
they will probably tell everything, plus source.

My own opinion of course...

-- 
Regards,
Cyril
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Does ZFS work with SAN-attached devices?

2009-10-12 Thread Miles Nordin
> "sj" == Shawn Joy  writes:

sj> Can you explain, in simple terms, how ZFS now reacts
    sj> to this?

I can't.  :) I think Victor's long message made a lot of sense.  The
failure modes with a SAN are not simple.  At least there is the
difference of whether the target's write buffer was lost after a
transient failure or not, and the current storage stack assumes it's
never lost.

IMHO, SANs are in general broken by design because their software
stacks don't deal predictably with common network failure modes (like
the target rebooting but the initiator staying up).  The standard
that would qualify to me as ``deal predictably'' would be what NFS
provides:

 * writes are double-cached on client and server, so the client can
   replay them if the server crashes.  To my limited knowledge, no SAN
   stack does this.  Expensive SANs can limit the amount of data at
   risk with NVRAM, but it seems like there would always be a little
   bit of data in flight.

   A cost-conscious Solaris iSCSI target will put quite a large amount
   of data at risk between sync-cache commands.

   This is okay, just as it's okay for NFS servers, but only if all the
   initiators reboot whenever the target reboots.

   Doing the client side part of the double-caching is a little tricky
   because I think you really want to do it pretty high in the storage
   stack, maybe in ZFS rather than in the initiator, or else you will
   be triple-caching a TXG (twice on the client, once on the server)
   which can be pretty big.  This means introducing the idea that a
   sync-cache command can fail, and that when it does, none/some/all
   of the writes between the last sync-cache that succeeded and the
   current one that failed may have been silently lost, even if those
   write commands were ack'd as successful when they were issued.

 * the best current practice for NFS mount options is 'hard,intr', meaning
   retry forever if there is a failure (see the example after this list).
   If you want to stop retrying, whatever app was doing the writing gets
   killed.  This rule means any database file that got ``intr'd'' will be
   crash-consistent.

   The SAN equivalent of 'intr' would be force-unmounting the
   filesystem (and force-unmounting implies either killing processes
   with open files or giving persistent errors to any open
   filehandles).  I'm pretty sure no SAN stack does this intentionally
   whenever it's needed---rather it just sort of happens sometimes
   depending on how errors percolate upwards through various
   nested cargo-cult timeouts.

   I guess it would be easy to add to a first order---just make SAN
   targets stay down forever after they bounce until ZFS marks them
   offline.  The tricky part is the complaints you get after: ``how do
   I add this target back without rebooting?'', ``do I really have to
   resilver?  It's happening daily so I'm basically always
   resilvering.'', ``we are going down twice a day because of harmless
   SAN glitches that we never noticed before---is this really
   necessary?''  I think I remember some post that made it sound like
   people were afraid to touch any of the storage exception handling
   because no one knows what cases are really captured by the many
   stupid levels of timeouts and retries.
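
(To make the 'hard,intr' point above concrete, on a Solaris client that best
practice is just a mount option; the host and paths below are made up:

  # mount -F nfs -o hard,intr nfsserver:/export/data /mnt/data

the Linux equivalent being 'mount -t nfs -o hard,intr ...'.)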

In short, to me it sounds like the retry state machines of SAN
initiators are broken by design, across the board.  They make the same
assumption they did for local storage: the only time data in a target
write buffer will get lost is during a crash-reboot.  This is wrong
not only for SANs but also for hot-pluggable drives, which can have
power sags that get wrongly treated the same way as CRC errors on the
data cable.  It's possible to get it right, like NFS is right, but
instead the popular fix with most people is to leave the storage stack
broken and make ZFS more resilient to this type of corruption, like
other filesystems are, because resilience is good, and people are
always twitchy and frightened and not expecting strictly consistent
behavior around their SANs anyway, so the problem is rare.

So far SAN targets have been proprietary, so vendors are free to
conceal this problem with protocol tweaks, expensive NVRAM, and
undefended or fuzzed advice given through their support channels to
their paranoid, accepting sysadmins.  Whatever free and open targets
behaved differently were assumed to be ``immature.''  Hopefully now
that SANs are opening up, this SAN write hole will finally get plugged
somehow,

...maybe with one of the two * points above, and if we were to pick
the second * then we'd probably need some notion of a ``target boot
cookie'' so we only take the 'intr'-like force-unmount path in the
cases where it's really needed.

sj> Do we all agree that creating a zpool out of one device in a
sj> SAN environment is not recommended.

This is still a good question.  The stock response is ``ZFS needs to
manage at least one layer of '', but this problem (SAN
target reboots while initiator does not) isn't unexplai

Re: [zfs-discuss] How to use ZFS on x4270

2009-10-12 Thread Joerg Moellenkamp

Hi,

On 12.10.2009 at 13:29, Richard Elling wrote:


I've not implemented qmail, but it appears to be just an MTA.
These do store-and-forward, so it is unlikely that they need to
use sync calls. It will create a lot of files, but that is usually
done async.


Async I/O for mail servers is a big no-go.  I worked for Canbox, a
large unified messaging provider, during the dot-com boom.  My
experience: you can afford to lose an index, because you can reconstruct
it, but you aren't allowed to lose a single mail.  And that would be
the consequence of using async for the spool.


Regards
 Joerg


--
Joerg Moellenkamp             Tel: (+49 40) 25 15 23 - 460
Principal Field Technologist  Fax: (+49 40) 25 15 23 - 425
Sun Microsystems GmbH         Mobile: (+49 172) 83 18 433
Nagelsweg 55                  mailto:joerg.moellenk...@sun.com
D-20097 Hamburg               Website: http://www.sun.de
                              Blog: http://www.c0t0d0s0.org

Registered office:            Sun Microsystems GmbH
                              Sonnenallee 1
                              D-85551 Kirchheim-Heimstetten
Munich District Court:        HRB 161028
Managing Directors:           Thomas Schröder
                              Wolfgang Engels
                              Wolf Frenkel
Chairman of the Supervisory Board: Martin Häring

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] use zpool directly w/o create zfs

2009-10-12 Thread Cindy Swearingen

Hua,

The behavior below is described here:

http://docs.sun.com/app/docs/doc/819-5461/setup-1?a=view

The top-level /tank file system cannot be destroyed, so it is
less flexible than using descendent datasets.

If you want to create a snapshot or clone and later promote
the clone, then it is best to create separate ZFS
file systems rather than using /tank.
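
A quick sketch of the difference (pool, device, and dataset names are just
examples):

  # zpool create tank c0d1
  # zfs create tank/data
  # zfs snapshot tank/data@before
  # zfs clone tank/data@before tank/data_new
  # zfs promote tank/data_new
  # zfs destroy tank/data

All of this works for a descendent dataset such as tank/data, whereas
'zfs destroy tank' is refused because the top-level dataset can only go
away together with the pool ('zpool destroy tank').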

Cindy

On 10/10/09 17:00, Hua wrote:

I understand that usually a zfs file system needs to be created inside a zpool
to store files/data.

However, a quick test shows that I actually can put files directly inside a
mounted zpool without creating any zfs file system.

After

zpool create -f tank c0d1

I actually can copy/delete any files in /tank.  I can also create directories
inside /tank.

I haven't seen any documentation talking about such a usage.  Just wondering
whether it is allowed, or is there any problem with using a zpool this way?

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to use ZFS on x4270

2009-10-12 Thread Andrew Gabriel

Richard Elling wrote:

On Oct 12, 2009, at 2:12 AM, tak ar wrote:


I'm not aware of email services using sync regularly.  In my experience
with large email services, the response time of the disks used for
database and indexes is the critical factor (for > 600 messages/sec
delivered, caches don't matter :-)  Performance of the disks for the
mail messages themselves is not as critical.


I'm not using a database.  I'm using qmail only.  Sync doesn't matter?


I've not implemented qmail, but it appears to be just an MTA.
These do store-and-forward, so it is unlikely that they need to
use sync calls. It will create a lot of files, but that is usually
done async.


I can't speak for qmail, which I've never used, but MTAs should sync
data to disk before acknowledging receipt, to ensure that in the event
of an unexpected outage, no messages are lost.  (Some of the MTA testing
standards do permit message duplication on unexpected MTA outage, but
never any loss, or at least didn't 10 years ago when I was working in
this area.)  An MTA is basically a transactional database, and (if
properly written) the requirements on the underlying storage will be
quite similar.
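
(If anyone wants to check what qmail actually does on a Solaris box, a quick
DTrace count of sync calls per process is one way; on Solaris, fsync(3C) and
fdatasync(3C) both come through as the fdsync syscall:

  # dtrace -n 'syscall::fdsync:entry { @[execname] = count(); }'

Let it run while mail is flowing, then Ctrl-C to see which processes issue
syncs and how often.  Writes done via O_DSYNC won't show up here, only
explicit fsync()/fdatasync() calls.)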


--
Andrew
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to use ZFS on x4270

2009-10-12 Thread Richard Elling

On Oct 12, 2009, at 2:12 AM, tak ar wrote:


I'm not aware of email services using sync regularly.  In my experience
with large email services, the response time of the disks used for
database and indexes is the critical factor (for > 600 messages/sec
delivered, caches don't matter :-)  Performance of the disks for the
mail messages themselves is not as critical.


I'm not using a database.  I'm using qmail only.  Sync doesn't matter?


I've not implemented qmail, but it appears to be just an MTA.
These do store-and-forward, so it is unlikely that they need to
use sync calls. It will create a lot of files, but that is usually
done async.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to use ZFS on x4270

2009-10-12 Thread tak ar
> I'm not aware of email services using sync regularly.  In my experience
> with large email services, the response time of the disks used for
> database and indexes is the critical factor (for > 600 messages/sec
> delivered, caches don't matter :-)  Performance of the disks for the
> mail messages themselves is not as critical.

I'm not using a database.  I'm using qmail only.  Sync doesn't matter?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] kernel panic on zpool import

2009-10-12 Thread Darren Taylor
I have re-run zdb -l /dev/dsk/c9t4d0s0 as I should have the first time (thanks
Nicolas).

Attached output.
-- 
This message posted from opensolaris.org

# zdb -l /dev/dsk/c9t4d0s0

LABEL 0

version=14
name='tank'
state=0
txg=119170
pool_guid=15136317365944618902
hostid=290968
hostname='lexx'
top_guid=1561201926038510280
guid=11292568128772689834
vdev_tree
type='raidz'
id=0
guid=1561201926038510280
nparity=1
metaslab_array=23
metaslab_shift=35
ashift=9
asize=4000766230528
is_log=0
children[0]
type='disk'
id=0
guid=11292568128772689834
path='/dev/dsk/c9t4d0s0'
devid='id1,s...@n50014ee2588170a5/a'
phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@4,0:a'
whole_disk=1
children[1]
type='disk'
id=1
guid=10678319508898151547
path='/dev/dsk/c9t5d0s0'
devid='id1,s...@n50014ee2032b9b04/a'
phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@5,0:a'
whole_disk=1
children[2]
type='disk'
id=2
guid=16523383997370950474
path='/dev/dsk/c9t6d0s0'
devid='id1,s...@n50014ee2032b9b75/a'
phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@6,0:a'
whole_disk=1
children[3]
type='disk'
id=3
guid=1710422830365926220
path='/dev/dsk/c9t7d0s0'
devid='id1,s...@n50014ee2add68f2c/a'
phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@7,0:a'
whole_disk=1

LABEL 1

version=14
name='tank'
state=0
txg=119170
pool_guid=15136317365944618902
hostid=290968
hostname='lexx'
top_guid=1561201926038510280
guid=11292568128772689834
vdev_tree
type='raidz'
id=0
guid=1561201926038510280
nparity=1
metaslab_array=23
metaslab_shift=35
ashift=9
asize=4000766230528
is_log=0
children[0]
type='disk'
id=0
guid=11292568128772689834
path='/dev/dsk/c9t4d0s0'
devid='id1,s...@n50014ee2588170a5/a'
phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@4,0:a'
whole_disk=1
children[1]
type='disk'
id=1
guid=10678319508898151547
path='/dev/dsk/c9t5d0s0'
devid='id1,s...@n50014ee2032b9b04/a'
phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@5,0:a'
whole_disk=1
children[2]
type='disk'
id=2
guid=16523383997370950474
path='/dev/dsk/c9t6d0s0'
devid='id1,s...@n50014ee2032b9b75/a'
phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@6,0:a'
whole_disk=1
children[3]
type='disk'
id=3
guid=1710422830365926220
path='/dev/dsk/c9t7d0s0'
devid='id1,s...@n50014ee2add68f2c/a'
phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@7,0:a'
whole_disk=1

LABEL 2

version=14
name='tank'
state=0
txg=119170
pool_guid=15136317365944618902
hostid=290968
hostname='lexx'
top_guid=1561201926038510280
guid=11292568128772689834
vdev_tree
type='raidz'
id=0
guid=1561201926038510280
nparity=1
metaslab_array=23
metaslab_shift=35
ashift=9
asize=4000766230528
is_log=0
children[0]
type='disk'
id=0
guid=11292568128772689834
path='/dev/dsk/c9t4d0s0'
devid='id1,s...@n50014ee2588170a5/a'
phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@4,0:a'
whole_disk=1
children[1]
type='disk'
id=1
guid=10678319508898151547
path='/dev/dsk/c9t5d0s0'
devid='id1,s...@n50014ee2032b9b04/a'
phys_path='/p...@0,0/pci1022,9...@2/pci15d9,a...@0/s...@5,0:a'
whole_disk=1
children[2]
type='disk'
id=2
guid=16523383997370950474
path='/dev/dsk/c9t6d0s0'
devid='id

Re: [zfs-discuss] kernel panic on zpool import

2009-10-12 Thread Darren Taylor
Hi Victor, I have tried to re-attach the detail from /var/adm/messages.
-- 
This message posted from opensolaris.org

Oct 11 17:16:55 opensolaris unix: [ID 836849 kern.notice] 
Oct 11 17:16:55 opensolaris ^Mpanic[cpu0]/thread=ff000b6f7c60: 
Oct 11 17:16:55 opensolaris genunix: [ID 361072 kern.notice] zfs: freeing free 
segment (offset=3540185931776 size=22528)
Oct 11 17:16:55 opensolaris unix: [ID 10 kern.notice] 
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f75f0 
genunix:vcmn_err+2c ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f76e0 
zfs:zfs_panic_recover+ae ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7770 
zfs:space_map_remove+13c ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7820 
zfs:space_map_load+260 ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7860 
zfs:metaslab_activate+64 ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7920 
zfs:metaslab_group_alloc+2b7 ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7a00 
zfs:metaslab_alloc_dva+295 ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7aa0 
zfs:metaslab_alloc+9b ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7ad0 
zfs:zio_dva_allocate+3e ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7b00 
zfs:zio_execute+a0 ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7b60 
zfs:zio_notify_parent+a6 ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7b90 
zfs:zio_ready+188 ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7bc0 
zfs:zio_execute+a0 ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7c40 
genunix:taskq_thread+193 ()
Oct 11 17:16:55 opensolaris genunix: [ID 655072 kern.notice] ff000b6f7c50 
unix:thread_start+8 ()
Oct 11 17:16:55 opensolaris unix: [ID 10 kern.notice] 
Oct 11 17:16:55 opensolaris genunix: [ID 672855 kern.notice] syncing file 
systems...
Oct 11 17:16:55 opensolaris genunix: [ID 904073 kern.notice]  done
Oct 11 17:16:56 opensolaris genunix: [ID 111219 kern.notice] dumping to 
/dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
Oct 11 17:17:09 opensolaris genunix: [ID 409368 kern.notice] ^M100% done: 
168706 pages dumped, compression ratio 3.58, 
Oct 11 17:17:09 opensolaris genunix: [ID 851671 kern.notice] dump succeeded
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to use ZFS on x4270

2009-10-12 Thread tak ar
> > Use the BBWC to maintain high IOPS when the X25-E's write cache is
> > disabled?
> 
> It should certainly help.  Note that in this case your relatively
> small battery-backed memory is accepting writes for both the X25-E
> and for the disk storage, so the BBWC memory becomes 1/2 as useful
> and you are wasting some of the RAID card write performance.
> 
> Some people here advocate putting as much battery-backed memory on
> the RAID card as possible (and with multiple RAID cards if possible)
> rather than using a slower slog SSD.  Battery-backed RAM is faster
> than FLASH SSDs.  The only FLASH SSDs which can keep up include their
> own battery-backed (or capacitor-backed) RAM.
> 
> Regardless, if you can decouple your slog I/O path from the main I/O
> path, you should see less latency and more performance.  This
> suggests that you should use a different controller for your X25-E's
> if you can.

OK, I will disable the X25-E's write cache.  But I can't get a different
controller because there is no budget.

> > In a report I have seen, write cache is necessary for wear-leveling.
> > Should I switch off the X25-E's write cache?
> 
> I don't know the answer to that.  Intel does not seem to provide much
> detail.  If you want your slog to protect as much data as possible
> when the system loses power, then it seems that you should disable
> the X25-E write cache, since it is not protected.  Expect a 5X
> reduction in write IOPS performance (e.g. 5000 --> 1000).

I think the data is more important than the performance, so I will disable the 
X25-E's write cache.
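
(For reference, one way to toggle a drive's write cache on Solaris is format's
expert mode; the menu walk below is from memory and the device selection is
interactive, so treat it as a sketch:

  # format -e
  ... select the X25-E ...
  format> cache
  cache> write_cache
  write_cache> display
  write_cache> disable
  write_cache> quit

Some controllers intercept or override this setting, so check 'display'
afterwards.)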

> > The server has a RAID card, so I can use hardware (Adaptec's) RAID (the
> > file system is ZFS).  Should I use ZFS for the RAID?
> 
> Unless the Adaptec firmware is broken so that you can't usefully
> export the disks as "JBOD" devices, then I would use ZFS for the RAID.

OK, I will use ZFS for the RAID (including the boot disk).

> > I think the IOPS is important for a mail server, so the ZIL is useful.
> > The server has 48GB RAM and two (ZFS or hardware mirror) X25-Es (32GB)
> > for the ZIL (slog).  I understand the ZIL needs half of RAM.
> 
> There is a difference between synchronous IOPS and async "IOPS" since
> synchronous writes require that data be written right away while
> async I/O can be written later.  Postponed writes are much more
> efficient.
> 
> If the mail software invokes fsync(2) to flush a mail file to disk,
> then a synchronous write is required.  However, there is still a
> difference between opening a file with the O_DSYNC option (all writes
> are synchronous) and using the fsync(2) call when the file write is
> complete (only pending unwritten data is synchronous).
> 
> A lot depends on how your mail software operates.  Some mail systems
> create a file for each mail message while others concatenate all of
> the messages for one user into one file.
> 
> You may want to defer installing your X25-Es and evaluate performance
> of the mail system with a DTrace tool called 'zilstat', which is
> written by Richard Elling.  This tool will tell you how much and what
> type of synchronous write traffic you have.
> 
> It is currently difficult to remove slog devices, so it is safer to
> add them if you determine they will help rather than reduce
> performance.

I'm using qmail for the mail server on Linux now, and I will migrate it to
Solaris.  I think qmail invokes fsync whenever the server receives a mail
message.  The mail server is used to relay mail received from application
servers.  I think a slog device will be useful.
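
(When the time comes, attaching the mirrored slog is a one-liner; the pool name
and device names below are placeholders for your pool and the two X25-Es:

  # zpool add tank log mirror c2t0d0 c2t1d0
  # zpool status tank

As noted above, slog devices are currently difficult to remove again, so it is
worth confirming with zilstat first that the load really is sync-heavy.)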
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] kernel panic on zpool import

2009-10-12 Thread Victor Latushkin

On 11.10.09 12:59, Darren Taylor wrote:

I have searched the forums and Googled far and wide, but cannot find a fix for
the issue I'm currently experiencing.  Long story short: I'm now at a point
where I cannot even import my zpool (zpool import -f tank) without causing a
kernel panic.

I'm running OpenSolaris snv_111b and the zpool is version 14.


This is the panic from /var/adm/messages (full output attached):


Where is the full stack back trace?  I do not see any attachment.
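
If the attachment keeps getting stripped, the trace can also be pulled straight
from the crash dump and pasted inline.  A sketch, assuming savecore wrote the
dump under /var/crash/<hostname> (run 'savecore -f vmdump.0' there first if you
only have vmdump.0):

  # cd /var/crash/`hostname`
  # mdb unix.0 vmcore.0
  > ::status
  > ::stack
  > $q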

victor



genunix: [ID 361072 kern.notice] zfs: freeing free segment 
(offset=3540185931776 size=22528)

This is the output I get from zpool import;

# zpool import
  pool: tank
    id: 15136317365944618902
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        tank        ONLINE
          raidz1    ONLINE
            c9t4d0  ONLINE
            c9t5d0  ONLINE
            c9t6d0  ONLINE
            c9t7d0  ONLINE
          raidz1    ONLINE
            c9t0d0  ONLINE
            c9t1d0  ONLINE
            c9t2d0  ONLINE
            c9t3d0  ONLINE

I tried pulling back some info via this zdb command, but I'm not sure if I'm on
the right track here (as zpool import seems to see the zpool without issue).
This result is similar for all drives:

# zdb -l /dev/dsk/c9t4d0

LABEL 0

failed to unpack label 0

LABEL 1

failed to unpack label 1

LABEL 2

failed to unpack label 2

LABEL 3

failed to unpack label 3

I can also complete zdb -e tank without issues – it lists all my snapshots and various objects without problem (this is still running on the machine at the moment).


I have put the following into /etc/system;

set zfs:zfs_recover=1
set aok=1 

I've also tried importing the zpool read-only with zpool import -f -o ro tank, but no luck.


I don't know where to go next – am I meant to try and recover using an older
txg?

I would be extremely grateful to anyone who can offer advice on how to resolve this issue, as the pool contains irreplaceable photos.  Unfortunately I have not done any backups for a while, as I thought raidz would be my saviour. :(


please help

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss