Re: [zfs-discuss] Has anyone used a Dell with a PERC H310?

2012-05-06 Thread Greg Mason
I am currently trying to get two of these things running Illumian. I don't have 
any particular performance requirements, so I'm thinking of using some sort of 
supported hypervisor (either RHEL with KVM or VMware ESXi) to get around the 
driver support issues, and passing the disks through to an Illumian guest.

The H310 does indeed support pass-through (the non-RAID mode), but one thing to 
keep in mind is that I was only able to configure a single boot disk. I 
configured the rear two drives into a hardware RAID 1 and set the virtual disk 
as the boot disk so that I can still boot the system if an OS disk fails.

Once Illumos is better supported on the R720 and the PERC H310, I plan to get 
rid of the hypervisor silliness and run Illumos on bare metal.

-Greg

Sent from my iPhone
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Group Quotas

2010-08-18 Thread Greg Mason
 
 Also the linux NFSv4 client is bugged (as in hang-the-whole-machine bugged).
 I am deploying a new osol fileserver for home directories and I'm using NFSv3 
 + automounter (because I am also using one dataset per user, and thus I have 
 to mount each home dir separately).

We are also in the same boat here. I have about 125TB of ZFS storage in 
production currently, running OSOL, across 5 X4540s. We tried the NFSv4 route, 
and crawled back to NFSv3 and the linux automounter because NFSv4 on Linux is 
*that* broken. As in hung-disk-io-that-wedges-the-whole-box broken. We know 
that NFSv3 was never meant for the scale we're using it at, but we have no 
choice in the matter.

On the topic of Linux clients, NFS and ZFS: We've also found that Linux is bad 
at handling lots of mounts/umounts. We will occasionally find a client where 
the automounter requested a mount, but it never actually completed. It'll show 
as mounted in /proc/mounts, but won't *actually* be mounted. A umount -f for 
the affected filesystem fixes this. On ~250 clients in an HPC environment, 
we'll see such an error every week or so.
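
For what it's worth, a minimal sketch of the workaround described above: scan
/proc/mounts on a client for NFS mounts that claim to be mounted but no longer
respond, and force-unmount them so the automounter can retry. The paths, the
timeout command, and the "is it really usable" test are assumptions; adapt
before running this on real clients.

awk '$3 == "nfs" || $3 == "nfs4" {print $2}' /proc/mounts | while read mnt; do
    # a stat that hangs or fails is our (crude) signal that the mount is wedged
    if ! timeout 10 stat -t "$mnt" >/dev/null 2>&1; then
        umount -f "$mnt"
    fi
done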

I'm hoping that recent versions of Linux (e.g. RHEL 6) are a bit better at 
NFSv4, but I'm not holding my breath.

--
Greg Mason
HPC Administrator
Michigan State University
Institute for Cyber Enabled Research
High Performance Computing Center

web: www.icer.msu.edu
email: gma...@msu.edu




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] swap - where is it coming from?

2010-06-09 Thread Greg Eanes
On Wed, Jun 9, 2010 at 8:17 PM, devsk funt...@yahoo.com wrote:
 $ swap -s
 total: 473164k bytes allocated + 388916k reserved = 862080k used, 6062060k 
 available

 $ swap -l
 swapfile             dev    swaplo   blocks     free
 /dev/dsk/c6t0d0s1   215,1         8 12594952 12594952

 Can someone please do the math for me here? I am not able to figure the total.

 What is 473164k bytes allocated? Where is it allocated? In some hidden zfs 
 swap FS in my root pool?
 What's the magic behind the number 473164k?
 What is 388916k reserved?
 862080k+6062060k != 12594952/2 - So, where did the rest of it come from? I 
 just configured one device in /etc/vfstab.
 --
 This message posted from opensolaris.org



man swap

     These numbers include swap space from all configured swap areas as
     listed by the -l option, as well as swap space in the form of
     physical memory.
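
To make the arithmetic concrete (a walk-through using only the numbers quoted
above): swap -l reports 512-byte blocks, while swap -s reports virtual swap,
i.e. disk swap plus reservable physical memory.

expr 12594952 / 2          # = 6297476k of disk-backed swap from swap -l
expr 862080 + 6062060      # = 6924140k total virtual swap (used + available)
expr 6924140 - 6297476     # = 626664k contributed by physical memory on that box
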
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] backup zpool to tape

2010-03-15 Thread Greg
Hey Scott, 
Thanks for the information. I doubt I can drop that kind of cash, but back to 
getting bacula working!

Thanks again,
Greg
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] backup zpool to tape

2010-03-12 Thread Greg
Hey Miles,
Do you have any idea if there is a way to back up a zvol in the manner you speak 
of with Bacula? Is dd a safe way to do this, or are there better methods? 
Otherwise I will just use dd. Thanks again! 

Thanks!
Greg
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] backup zpool to tape

2010-03-12 Thread Greg
Yes it would; however, we only have the restore/verify portion, unless of course 
I am overlooking something. 

Thanks,
Greg
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] backup zpool to tape

2010-03-09 Thread Greg
Thank you for such a thorough look into my issue. As you said, I guess I am 
down to backing up to a zvol and then backing that up to tape. Has anyone 
tried this solution? I would be very interested to find out. Anyone else with 
any other solutions?

Thanks!
Greg
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-25 Thread Greg
uep,
This solution seems like the best and most efficient way of handling large 
filesystems. My biggest question, however, is: when backing this up to tape, can 
it be split across several tapes? I will be using Bacula to back this up. Will 
I need to tar or star this filesystem before writing it to tape? The next 
question is, since I am using a primary and a secondary server with zfs 
send/recv, how would I incorporate this solution to then back the secondary 
SAN up to tape? Will I need double the hard disk space, since I will need a 
file-based copy of the ZFS filesystem? I know it is a lot of questions, but I 
thought the solution would work perfectly in my environment.
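
For concreteness, here is roughly the kind of thing I'm asking about: staging a
send stream as an ordinary file that Bacula can then write to tape (dataset,
snapshot, and path names are made up). A full stream is roughly the size of the
data it references, which is where the extra disk space question comes in.

zfs snapshot tank/data@weekly
zfs send tank/data@weekly > /backup/staging/tank-data-weekly.zfs
# restore later by feeding the file back in:
# zfs receive -F tank/data < /backup/staging/tank-data-weekly.zfs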

Thanks,
Greg
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] opensolaris-vmware

2010-01-11 Thread Greg
Hello All,
I hope this makes sense. I have two OpenSolaris machines with a bunch of hard 
disks; one acts as an iSCSI SAN, and the other is identical other than the hard 
disk configuration. The only things being served are VMware ESXi raw disks, 
which hold either virtual machines or data that the particular virtual machine 
uses. For example, we have Exchange 2007 virtualized, and through its iSCSI 
initiator we are mounting two LUNs, one for the database and another for the 
logs, all on different arrays of course. Anyhow, we are then sending snapshots 
of this data across the SAN network to the other box using snapshot send/recv. 
In case the other box fails, this box can immediately serve all of the iSCSI 
LUNs. The problem (I don't really know if it's a problem) is: when I snapshot a 
running VM, will it come up alive in ESXi, or do I have to accomplish this in a 
different way? These snapshots will then be written to tape with Bacula. I hope 
I am posting this in the correct place. 

Thanks, 
Greg
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] d2d2t

2009-09-17 Thread Greg
Hello all, 
I have an OpenSolaris server on snv_122 which is used as an iSCSI SAN. I am 
then using two ESXi boxes to connect to it, and this is where the storage for 
the virtual machines lies. On here are several VMs, including Linux and Windows 
servers. We have another server which is almost exactly the same, except it has 
fewer hard disks in a raidz2 setup, while the other server is all in RAID 10. 
The VM files for each of the OSes do not need to be changed often, as they will 
not change very much, but the LUNs they connect to do need to be updated: there 
are multiple databases (MySQL and Exchange databases/logs), and there are also 
files from a file server VM. I need to back up the virtual machines as well as 
their respective data, and back up the main server's configuration (i.e. 
COMSTAR settings), making it so that if s*** hits the fan I just point the ESX 
servers to the new box and we are up and running. The next issue is then 
backing this all up to tape and making it so that it is not impossible to 
recover if people do their standard boneheaded things. Does anyone have any 
ideas on how to do this? I was first thinking rsync or zfs 
send/receive.

Thanks,
Greg
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] snv_121 zfs issue

2009-09-11 Thread Greg
Hello all,
I am having a problem when I do a zfs promote or a zfs rollback: I get a 
"dataset is busy" error. I am now doing an image update to see if there was an 
issue with the image I have. Does anyone have any idea how to fix this issue?
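
One hedged first check, in case it helps anyone else: see whether a process is
holding the dataset's mountpoint open (the mountpoint name below is made up):

fuser -cu /tank/myfs
# if something shows up, stopping it (or, as a last resort, fuser -ck on the
# mountpoint) may let the zfs promote/rollback/destroy proceed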

Thanks,
Greg
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] snv_121 zfs issue

2009-09-11 Thread Greg
This also occurs  when I do a zfs destroy.
Thanks!
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] snv_121 zfs issue

2009-09-11 Thread Greg
I have tried to unmount the zfs volume and remount it. However, this does not 
help the issue.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Comstar and ESXi

2009-08-28 Thread Greg
Hello all, 
I am running an OpenSolaris 2009.06 server. I installed COMSTAR and enabled it. 
I have an ESXi 4.0 server connecting to COMSTAR via iSCSI on its own switch. 
(There are two ESXi servers, both of which do this regardless of whether one is 
on or off.) The error I see on ESXi is: Lost connectivity to storage device 
naa.600144f030bc45004a9806980003. Path vmhba33:C0:T0:L0 is down. Affected 
datastores: Unknown. (8/28/2009 11:10:34 AM). This error occurs every 40 
seconds and does not stop. I have disabled the iscsitgt service and all other 
iSCSI services, and enabled only the COMSTAR one. I have created target groups 
and host groups, but the issue continues. Has anyone seen this issue? I can 
give you other error logs if needed. Would I get the same result if I moved to 
Solaris 10 05/09? I also thought it might be ESXi 4, so I updated it, but again 
to no avail. If anyone has any ideas it would be helpful!
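
For reference, a hedged checklist of the sort of thing involved here (exact
service and command names vary by build, so treat this as a sketch, not gospel):

svcs -a | grep -i iscsi          # which iSCSI-related services exist and are enabled
svcadm disable iscsitgt          # legacy (non-COMSTAR) target, if present
stmfadm list-lu -v               # COMSTAR logical units and their state
itadm list-target -v             # COMSTAR iSCSI targets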

Thanks!
Greg
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Ssd for zil on a dell 2950

2009-08-20 Thread Greg Mason
Something our users do quite a bit of is untarring archives with a lot 
of small files. Many small, quick writes are another of the common 
workloads our users have.


Real-world test: our old Linux-based NFS server allowed us to unpack a 
particular tar file (the source for boost 1.37) in around 2-4 minutes, 
depending on load. This machine wasn't special at all, but it had fancy 
SGI disk on the back end, and was using the Linux-specific async NFS option.


We turned up our X4540s, and this same tar unpack took over 17 minutes! 
We disabled the ZIL for testing, and we dropped this to under 1 minute. 
With the X25-E as a slog, we were able to run this test in 2-4 minutes, 
same as the old storage.


That said, I strongly recommend using Richard Elling's zilstat. He's 
posted about it previously on this list. It will help you determine if 
adding a slog device will help your workload or not. I didn't know about 
this script at the time of our testing, so it ended up being some trial 
and error, running various tests on different hardware setups (which 
means creating and destroying quite a few pools).
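
The test itself is nothing fancy; roughly, from an NFS client against the filer
under test (boost 1.37 is just the tarball mentioned above; the paths are made
up):

cd /mnt/nfs-under-test
time tar xf ~/boost_1_37_0.tar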


-Greg

Jorgen Lundman wrote:


Does un-taring something count? It is what I used for our tests.

I tested with ZIL disable, zil cache on /tmp/zil, CF-card (300x) and 
cheap SSD. Waiting for X-25E SSDs to arrive for testing those:


http://mail.opensolaris.org/pipermail/zfs-discuss/2009-July/030183.html

If you want a quick answer, disable ZIL (you need to unmount/mount, 
export/import or reboot) on your ZFS volume and try it. That is the 
theoretical maximum. You can get close to this using various 
technologies, SSD and all that.


I am no expert on this, I knew nothing about it 2 weeks ago.

But for our provisioning engine to untar Movable-Types for customers, 
5 mins to 45secs is quite an improvement. I can get that to 11seconds 
theoretically. (ZIL disable)


Lund


Monish Shah wrote:

Hello Greg,

I'm curious how much performance benefit you gain from the ZIL 
accelerator. Have you measured that?  If not, do you have a gut feel 
about how much it helped?  Also, for what kind of applications does 
it help?


(I know it helps with synchronous writes.  I'm looking for real world 
answers like: Our XYZ application was running like a dog and we 
added an SSD for ZIL and the response time improved by X%.)


Of course, I would welcome a reply from anyone who has experience 
with this, not just Greg.


Monish

- Original Message - From: Greg Mason gma...@msu.edu
To: HUGE | David Stahl dst...@hugeinc.com
Cc: zfs-discuss zfs-discuss@opensolaris.org
Sent: Thursday, August 20, 2009 4:04 AM
Subject: Re: [zfs-discuss] Ssd for zil on a dell 2950


Hi David,

We are using them in our Sun X4540 filers. We are actually using 2 SSDs
per pool, to improve throughput (since the logbias feature isn't in an
official release of OpenSolaris yet). I kind of wish they made an 8G or
16G part, since the 32G capacity is kind of a waste.
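
For reference, a sketch of how a pair of slog devices can be attached to an
existing pool; pool and device names are made up. Two bare log devices are
striped for throughput, while the mirror form trades that for redundancy:

zpool add tank log c4t0d0 c4t1d0
# zpool add tank log mirror c4t0d0 c4t1d0    # alternative: mirrored slog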

We had to go the NewEgg route though. We tried to buy some Sun-branded
disks from Sun, but that's a different story. To summarize, we had to
buy the NewEgg parts to ensure a project stayed on-schedule.

Generally, we've been pretty pleased with them. Occasionally, we've had
an SSD that wasn't behaving well. Looks like you can replace log devices
now though... :) We use the 2.5" to 3.5" SATA adapter from IcyDock, in a
Sun X4540 drive sled. If you can attach a standard SATA disk to a Dell
sled, this approach would most likely work for you as well. Only issue
with using the third-party parts is that the involved support
organizations for the software/hardware will make it very clear that
such a configuration is quite unsupported. That said, we've had pretty
good luck with them.

-Greg




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Ssd for zil on a dell 2950

2009-08-20 Thread Greg Mason



How about the bug removing slog not possible? What if this slog fails? Is 
there a plan for such situation (pool becomes inaccessible in this case)?
  

You can zpool replace a bad slog device now.
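
A sketch of what that looks like (pool and device names made up):

zpool status tank                  # identify the faulted log device
zpool replace tank c4t0d0 c4t2d0   # swap the bad slog for a replacement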

-Greg
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] unexpected behavior with nbmand=on set

2009-08-19 Thread Greg Mason

It's not too often we see good news on the zfs-discuss list, so here's some:

We at the High Performance Computing Center at MSU have finally worked 
out the root cause of a long-standing issue with our OpenSolaris NFS 
servers. It was a minor configuration issue, involving a ZFS file system 
property.


A little backstory: We chose to go with Sun X4540s, running OpenSolaris 
and ZFS for our home directory space. We initially implemented 100TB of 
usable space. All was well for a while, but then some mostly annoying 
issues started popping up:


1. 0-byte files named '4913' were appearing in user directories. We 
discovered that vi was doing:


open(4913)
close(4913)
remove(4913)

The remove() operation would fail intermittently. With assistance from 
the helpful folks at SGI (because we originally thought this was a Linux 
NFSv4 client problem), testing revealed that this behavior is caused by 
the NFS server on Solaris occasionally returning NFS4ERR_FILE_OPEN, 
which is not handled by the client. According to a Linux NFS kernel 
developer, the error is usually due to ordering issues with 
asynchronous RPC calls. 
http://www.linux-nfs.org/Linux-2.6.x/2.6.18/linux-2.6.18-068-handle_nfs4err_file_open.dif 
We applied a patch to the Linux NFSv4 client, which told the client to 
wait and retry when the client received that error.


2. There was also an issue with gedit. When opening then saving an 
already existing file, it did:


open(file)
rename(file,file~)

rename() returned Input/Output Error. After applying the fix for #1, 
rename() hung indefinitely. We also noticed a similar problem with gcc.


Interestingly, running this test locally on the OpenSolaris server, on the 
same file system, resulted in a permission denied error. If 
we mounted this same file system over NFSv4 on another OpenSolaris 
system, we received the same permission denied error.



Yesterday, we discovered the property 'nbmand' was set on the ZFS file 
systems in question. This was a leftover from our initial testing with 
Solaris CIFS. It was set because the documentation at 
http://dlc.sun.com/osol/docs/content/SSMBAG/managingsmbsharestm.html and 
http://204.152.191.100/wiki/index.php/Getting_Started_With_the_Solaris_CIFS_Service 
instructed that nbmand should be turned on when using CIFS. What isn't 
mentioned, however, is that nbmand can adversely affect the behavior of 
NFSv4 and even local file systems. The ZFS admin guide also states that 
nbmand applies only to CIFS clients, when it actually applies to NFSv4 
clients as well as local file system access.
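
Checking and clearing the property is straightforward (the dataset name below
is made up); as far as I know the change only takes effect on remount:

zfs get nbmand tank/home
zfs set nbmand=off tank/home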


I think nbmand is also a bit slow in releasing its locks, which explains 
the behavior of bug number 1. The only tests we've run so far show that 
the slow locking behavior goes away when nbmand is turned off. Would 
filing a bug about this slow behavior of nbmand be the correct thing to 
do at this point? If so, where is the proper place to file this bug? The 
OpenSolaris Bugzilla is where I've been told these bug reports go, 
but I'm not sure whether this should be filed in bugs.opensolaris.org or not.


Disabling nbmand on a test file system resolved both bugs, as well as 
other known issues that our users have been running into. All the 
various known issues this caused can be found at the MSU HPCC wiki: 
https://wiki.hpcc.msu.edu/display/Issues/Known+Issues, under Home 
Directory file system.


-Greg


--
Greg Mason
System Administrator
High Performance Computing Center
Michigan State University
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD's and ZFS...

2009-07-23 Thread Greg Mason
  I think it is a great idea, assuming the SSD has good write performance.
  This one claims up to 230MB/s read and 180MB/s write and it's only $196.
 
  http://www.newegg.com/Product/Product.aspx?Item=N82E16820609393
 
  Compared to this one (250MB/s read and 170MB/s write) which is $699.
 
 Oops. Forgot the link:
 
 http://www.newegg.com/Product/Product.aspx?Item=N82E16820167014
  Are those claims really trustworthy? They sound too good to be true!
 
   -Kyle

Kyle-

The less expensive SSD is an MLC device. The Intel SSD is an SLC device.
That right there accounts for the cost difference. The SLC device (Intel
X25-E) will last quite a bit longer than the MLC device.

-Greg

-- 
Greg Mason
System Administrator
Michigan State University
High Performance Computing Center

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Question about user/group quotas

2009-07-09 Thread Greg Mason
I'm trying to find documentation on how to set and work with user and
group quotas on ZFS. I know it's quite new, but googling around I'm just
finding references to the ZFS quota and refquota properties, which are
filesystem-wide settings, not per user/group.

Also, after reviewing a few bugs, I'm a bit confused about which build
has user quota support. I recall that snv_111 has user quota support,
but not in rquotad. According to bug 6501037, ZFS user quota support is
in snv_114. 

We're preparing to roll out OpenSolaris 2009.06 (snv_111b), and we're
also curious about being able to utilize ZFS user quotas, as we're
having problems with NFSv4 on our clients (SLES 10 SP2). We'd like to be
able to use NFSv3 for now (one large ZFS filesystem, with user quotas
set), until the flaws with our Linux NFS clients can be addressed.
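
For the archives, a sketch of the per-user interface once a build with ZFS user
quotas is running (dataset and user names are made up):

zfs set userquota@alice=10G tank/home
zfs get userquota@alice tank/home
zfs userspace tank/home            # per-user space report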

-- 
Greg Mason
System Administrator
Michigan State University
High Performance Computing Center

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Question about user/group quotas

2009-07-09 Thread Greg Mason
Thanks for the link Richard,

I guess the next question is, how safe would it be to run snv_114 in
production? Running something that would be technically unsupported
makes a few folks here understandably nervous...

-Greg

On Thu, 2009-07-09 at 10:13 -0700, Richard Elling wrote:
 Greg Mason wrote:
  I'm trying to find documentation on how to set and work with user and
  group quotas on ZFS. I know it's quite new, but googling around I'm just
  finding references to a ZFS quota and refquota, which are
  filesystem-wide settings, not per user/group.

 
 Cindy does an excellent job of keeping the ZFS Admin Guide up to date.
 http://opensolaris.org/os/community/zfs/docs/zfsadmin.pdf
 See the section titled Setting User or Group Quotas on a ZFS File System
  -- richard
  Also, after reviewing a few bugs, I'm a bit confused about which build
  has user quota support. I recall that snv_111 has user quota support,
  but not in rquotad. According to bug 6501037, ZFS user quota support is
  in snv_114. 
 
  We're preparing to roll out OpenSolaris 2009.06 (snv_111b), and we're
  also curious about being able to utilize ZFS user quotas, as we're
  having problems with NFSv4 on our clients (SLES 10 SP2). We'd like to be
  able to use NFSv3 for now (one large ZFS filesystem, with user quotas
  set), until the flaws with our Linux NFS clients can be addressed.
 

 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [storage-discuss] Backups

2009-06-25 Thread Greg
I think I am getting closer to an idea of how to back this up. I will do as 
you said to back up the OS: take an image or something of that nature. I will 
take a full backup of the virtual machines every one to three months; however, 
the data that each VM is working with will be mounted separately, so that if a 
virtual machine goes down all that is needed is to restore the last backup of 
the VM and mount the storage, and we should be up and running. Now my only 
worry is how to back up the data that the VMs are accessing. I guess my 
question is this: say I take a full backup every x days, say 7, so weekly 
backups, and I then take snapshots throughout the week. Then something happens, 
like a flood. Once I have all the hardware and that side of things going, can I 
restore from that full backup and then apply the snapshots to it? Will I then 
be up to yesterday backup-wise, or are those snapshots useless and I am only up 
to last week?
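
To make the question concrete, a minimal sketch of a full-plus-incremental
cycle, assuming the streams are stored somewhere off the box (all names made
up); snapshots that only ever existed on the lost pool could not be replayed
this way:

zfs send tank/vm@weekly-full > /safe/vm-weekly-full.zfs
zfs send -i @weekly-full tank/vm@tue > /safe/vm-tue.zfs
# after rebuilding the pool:
zfs receive tank/vm < /safe/vm-weekly-full.zfs
zfs receive tank/vm < /safe/vm-tue.zfs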

Thanks for helping!
Greg
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] SAN server

2009-06-22 Thread Greg
Hey all,
I am working on a SAN server for my office and would like to know about 
hardware recommendations. I am quite confused as to go the raidz route or a 
standard raid route. As for what this will be doing, I will be having a vmware 
esxi server connected via iscsi and it will be running multiple servers, as of 
today for sure Exchange 2007 server, Blackberry enterprise server, several 
linux servers, running mysql databases and web servers. The options known to me 
thus far are:
a. a large raidz array or several raidz arrays
b. a hardware raid 10 array for exchange 2007 and then raidz arrays for 
everything else.
c. several hardware raid 10 arrays
d. none of the above

Once I find this what is the average rebuild time on raidz say a 1tb sata disk. 
And finally what kinds of cards will I get the best performance from for the 
use of raidz. Or does it not really matter. 

Thanks,
Greg
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] importing pool with missing slog followup

2009-06-09 Thread Greg Mason
In my testing, I've seen that trying to duplicate zpool disks with dd
often results in a disk that's unreadable. I believe it has something to
do with the block sizes of dd.

In order to make my own slog backups, I just used cat instead. I plugged
the slog SSD into another system (not a necessary step, but easier in my
case), catted the disk to a file, then put the slog SSD back. I imagine
this needs to be done with the zpool in a cleanly-exported state; I
haven't tested it otherwise. I've also tested replacing an SSD with my
method, just cat the file back to the disk. I've tested this method of
replacing a slog, and the zpool is imported on boot, like nothing
happened, even though the physical hardware has changed.
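
A sketch of both approaches, with device and file names made up; with dd an
explicit block size avoids the 512-byte default, which may be what tripped it
up:

dd if=/dev/rdsk/c4t0d0s0 of=/backup/slog.img bs=1024k
# or simply:
cat /dev/rdsk/c4t0d0s0 > /backup/slog.img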

A question I have is, does zpool replace now work for slog devices as
of snv_111b?

-Greg

On Fri, 2009-06-05 at 20:57 -0700, Paul B. Henson wrote:
 My research into recovering from a pool whose slog goes MIA while the pool
 is off-line resulted in two possible methods, one requiring prior
 preparation and the other a copy of the zpool.cache including data for the
 failed pool.
 
 The first method is to simply dump a copy of the slog device right after
 you make it (just dd if=/dev/dsk/slog of=slog.dump). If the device ever
 failed, theoretically you could restore the image onto a replacement (dd
 if=slog.dump of=/dev/dsk/slog) and import the pool.
 
 My initial testing of that method was promising, however that testing was
 performed by intentionally corrupting the slog device, and restoring the
 copy back onto the original device. However, when I tried restoring the
 slog dump onto a different device, that didn't work out so well. zpool
 import recognized the different device as a log device for the pool, but
 still complained there were unknown missing devices and refused to import
 the pool. It looks like the device serial number is stored as part of the
 zfs label, resulting in confusion when that label is restored onto a
 different device. As such, this method is only usable if the underlying
 fault is simply corruption, and the original device is available to restore
 onto.
 
 The second method is described at:
 
   http://opensolaris.org/jive/thread.jspa?messageID=377018
 
 Unfortunately, the included binary does not run under S10U6, and after half
 an hour or so of trying to get the source code to compile under S10U6 I
 gave up (I found some of the missing header files in the S10U6 grub source
 code package which presumably match the actual data structures in use under
 S10, but there was additional stuff missing which as I started copying it
 out of opensolaris code just started getting messier and messier). Unless
 someone with more zfs-fu than me creates a binary for S10, this approach is
 not going to be viable.
 
 Unofficially I was told that there is expected to be a fix for this issue
 putback into Nevada around July, but whether or not that might be available
 in U8 wasn't said. So, barring any official release of a fix or unofficial
 availability of a workaround for S10, in the (admittedly unlikely) failure
 mode of a slog device failure on an inactive pool, have good backups :).
 
 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [storage-discuss] Supermicro SAS/SATA controllers?

2009-04-15 Thread Greg Mason



And it looks like the Intel fragmentation issue is fixed as well:
http://techreport.com/discussions.x/16739


FYI, Intel recently had a new firmware release. IMHO, odds are that
this will be as common as HDD firmware releases, at least for the
next few years.
http://news.cnet.com/8301-13924_3-10218245-64.html?tag=mncol


It should also be noted that the Intel X25-M != the Intel X25-E. The 
X25-E hasn't had any of the performance and fragmentation issues.


The X25-E is an SLC SSD, the X25-M is an MLC SSD, hence the more complex 
firmware.


-Greg
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs as a cache server

2009-04-09 Thread Greg Mason

Francois,

Your best bet is probably a stripe of mirrors, i.e. a zpool made of many 
mirrors.


This way you have redundancy and fast reads. You'll also enjoy 
pretty quick resilvering in the event of a disk failure.


For even faster reads, you can add dedicated L2ARC cache devices (folks 
typically use SSDs or very fast (15k RPM) SAS drives for this).
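
A sketch of that layout (device names made up):

zpool create tank mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0 mirror c1t4d0 c1t5d0
zpool add tank cache c2t0d0        # optional dedicated L2ARC device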


-Greg

Francois wrote:

Hello list,

What would be the best zpool configuration for a cache/proxy server
(probably based on squid) ?

In other words with which zpool configuration I could expect best
reading performance ? (there'll be some writes too but much less).


Thanks.

--
Francois




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Data size grew.. with compression on

2009-04-09 Thread Greg Mason

Harry,

ZFS will only store a block compressed if it gains more than about 12.5% of 
space by compressing it (I may be off on the exact percentage). If ZFS 
can't get at least that much compression, it doesn't bother and will 
just store the block uncompressed.


Also, the default ZFS compression algorithm is lzjb, not gzip, so you aren't 
going to get the greatest compression possible, but it is quite fast.


Depending on the type of data, it may not compress well at all, leading 
ZFS to store that data completely uncompressed.
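
A quick way to see what compression is actually buying you (dataset name made
up); the gzip levels trade CPU for ratio:

zfs set compression=on tank/data        # default algorithm (lzjb)
zfs get compressratio tank/data
# zfs set compression=gzip-6 tank/data  # heavier compression, more CPU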


-Greg



All good info thanks.  Still one thing doesn't quite work in your line
of reasoning.   The data on the gentoo linux end is uncompressed.
Whereas it is compressed on the zfs side.

A number of the files are themselves compressed formats such as jpg
mpg avi pdf maybe a few more, which aren't going to compress further
to speak of, but thousands of the files are text files (html).  So
compression should show some downsize.

Your calculation appears to be based on both ends being uncompressed.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] GSoC 09 zfs ideas?

2009-03-03 Thread Greg Mason

Just my $0.02, but would pool shrinking be the same as vdev evacuation?

I'm quite interested in vdev evacuation as an upgrade path for 
multi-disk pools. This would be yet another reason to for folks to use 
ZFS at home (you only have to buy cheap disks), but it would also be a 
good to have that ability from an enterprise perspective, as I'm sure 
we've all engineered ourselves into a corner one time or another...


It's a much cleaner, safer, and possibly much faster alternative to 
systematically pulling drives and letting zfs resilver onto a larger 
disk, in order to upgrade a pool in-place, and in production.


Basically, what I'm thinking is:

zpool remove mypool <list of devices/vdevs>

Allow time for ZFS to vacate the vdev(s), and then light up the "OK to 
remove" light on each evacuated disk.


-Greg

Blake Irvin wrote:

Shrinking pools would also solve the right-sizing dilemma.

Sent from my iPhone

On Feb 28, 2009, at 3:37 AM, Joe Esposito j...@j-espo.com wrote:


I'm using opensolaris and zfs at my house for my photography storage
as well as for an offsite backup location for my employer and several
side web projects.

I have an 80g drive as my root drive.  I recently took posesion of 2
74g 10k drives which I'd love to add as a mirror to replace the 80 g
drive.

From what I gather it is only possible if I zfs export my storage
array and reinstall solaris on the new disks.

So I guess I'm hoping zfs shrink and grow commands show up sooner or 
later.


Just a data point.

Joe Esposito
www.j-espo.com

On 2/28/09, C. Bergström cbergst...@netsyncro.com wrote:

Blake wrote:

Gnome GUI for desktop ZFS administration



On Fri, Feb 27, 2009 at 9:13 PM, Blake blake.ir...@gmail.com wrote:


zfs send is great for moving a filesystem with lots of tiny files,
since it just handles the blocks :)



I'd like to see:

pool-shrinking (and an option to shrink disk A when i want disk B to
become a mirror, but A is a few blocks bigger)


This may be interesting... I'm not sure how often you need to shrink a
pool though?  Could this be classified more as a Home or SME level 
feature?

install to mirror from the liveCD gui


I'm not working on OpenSolaris at all, but for when my projects
installer is more ready /we/ can certainly do this..

zfs recovery tools (sometimes bad things happen)


Agreed.. part of what I think keeps zfs so stable though is the complete
lack of dependence on any recovery tools..  It forces customers to bring
up the issue instead of dirty hack and nobody knows.

automated installgrub when mirroring an rpool


This goes back to an installer option?

./C


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs streams data corruption

2009-02-25 Thread Greg Palmer

Miles Nordin wrote:

gp Performing a checkpoint will perform such tasks as making sure
gp that all transactions recorded in the log but not yet written
gp to the database are written out and that the system is not in
gp the middle of a write when you grab the data.

great copying of buzzwords out of a glossary, 
Wasn't copied from a glossary, I just tried to simplify it enough for 
you to understand. I apologize if I didn't accomplish that goal.



but does it change my
claim or not?  My claim is: 


  that SQLite2 should be equally as tolerant of snapshot backups as it
  is of cord-yanking.
  
You're missing the point here Miles. The folks weren't asking for a 
method to confirm their database was able to perform proper error 
recovery and confirm it would survive having the cord yanked out of the 
wall. They were asking for a reliable way to backup their data. The best 
way to do that is not by snapshotting alone. The process of performing 
database backups is well understood and supported throughout the industry.


Relying on the equivalent of crashing the database to perform backups 
isn't how professionals get the job done. There is a reason that 
database vendors do not suggest you back up their databases by pulling the 
plug out of the wall or killing the running process. The best way to 
backup a database is by using a checkpoint. Your comment about 
checkpoints being for systems where snapshots are not available is not 
accurate. That is the normal method of backing up databases under 
Solaris among others. Checkpoints are useful for all systems since they 
guarantee that the database files are consistent and do not require 
recovery which doesn't always work no matter what the glossy brochures 
tell you. Typically they are used in concert with snapshots. Force the 
checkpoint, trigger the snapshot and you're golden.
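
Purely as an illustration of that pattern (PostgreSQL is chosen only as an
example because its hooks are easy to script; it assumes WAL archiving is
already configured, and the dataset and label names are made up):

psql -U postgres -c "SELECT pg_start_backup('nightly');"   # forces a checkpoint
zfs snapshot tank/pgdata@nightly                            # consistent snapshot
psql -U postgres -c "SELECT pg_stop_backup();"              # release the backup mode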


Let's take a simple case of a transaction which consists of three 
database updates within a transaction. One of those writes succeeds, you 
take a snapshot and then the two other writes succeed. Everyone 
concerned with the transaction believes it succeeded but your snapshot 
does not show that. When the database starts up again, the data it will 
have in your snapshot indicates the transaction never succeeded and 
therefore it will roll out the database transaction and you will lose 
that transaction. Well, it will, assuming that all code involved in that 
recovery works flawlessly. Issuing a checkpoint on the other hand causes 
the database to complete the transaction including ensuring consistency 
of the database files before you take your snapshot. NOTE: If you issue 
a checkpoint and then perform a snapshot you will get consistent data 
which does not require the database perform recovery. Matter of fact, 
that's the best way to do it.


Your dismissal of write activity taking place is inaccurate. Snapshots 
take a picture of the file system at a point in time. They have no 
knowledge of whether or not one of three writes required for the 
database to be consistent have completed. (Refer to above example) Data 
does not hit the disk instantly, it takes some finite amount of time in 
between when the write command is issued for it to arrive at the disk. 
Plainly, terminating the writes between when they are issued and before 
they have completed is possible and a matter of timing. The database on the 
other hand does understand when the transaction has completed and allows 
outside processes to take advantage of this knowledge via checkpointing.


All real database systems have flaws in the recovery process and so far 
every database system I've seen has had issues at one time or another. 
If we were in a perfect world it SHOULD work every time but we aren't in 
a perfect world. ZFS promises on disk consistency but as we saw in the 
recent thread about Unreliable for professional usage it is possible 
to have issues. Likewise with database systems.


Regards,
 Greg
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs streams data corruption

2009-02-24 Thread Greg Palmer

Miles Nordin wrote:

Hope this helps untangle some FUD.  Snapshot backups of databases
*are* safe, unless the database or application above it is broken in a
way that makes cord-yanking unsafe too.
  
Actually Miles, what they were asking for is generally referred to as a 
checkpoint and they are used by all major databases for backing up 
files. Performing a checkpoint will perform such tasks as making sure 
that all transactions recorded in the log but not yet written to the 
database are written out and that the system is not in the middle of a 
write when you grab the data. Dragging the discussion of database 
recovery into the discussion seems to me to only be increasing the FUD 
factor.


Regards,
 Greg
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-13 Thread Greg Palmer

Miles Nordin wrote:

gm That implies that ZFS will have to detect removable devices
gm and treat them differently than fixed devices.

please, no more of this garbage, no more hidden unchangeable automatic
condescending behavior.  The whole format vs rmformat mess is just
ridiculous.  And software and hardware developers alike have both
proven themselves incapable of settling on a definition of
``removeable'' that fits with actual use-cases like: FC/iSCSI;
hot-swappable SATA; adapters that have removeable sockets on both ends
like USB-to-SD, firewire CD-ROM's, SATA/SAS port multipliers, and so
on.
  
Since this discussion is taking place in the context of someone removing 
a USB stick I think you're confusing the issue by dragging in other 
technologies. Let's keep this in the context of the posts preceding it 
which is how USB devices are treated. I would argue that one of the 
first design goals in an environment where you can expect people who are 
not computer professionals to be interfacing with computers is to make 
sure that the appropriate safeties are in place and that the system does 
not behave in a manner which a reasonable person might find unexpected.


This is common practice for any sort of professional engineering effort. 
As an example, you aren't going to go out there and find yourself a 
chainsaw being sold new without a guard. It might be removable, but the 
default is to include it. Why? Well because there is a considerable 
chance of damage to the user without it. Likewise with a file system on 
a device which might cache a data write for as long as thirty seconds 
while being easily removable. In this case, the user may write the file 
and seconds later remove the device. Many folks out there behave in this 
manner.


It really doesn't matter to them that they have a copy of the last save 
they did two hours ago, what they want and expect is that the most 
recent data they saved actually be on the USB stick for the to retrieve. 
What you are suggesting is that it is better to lose that data when it 
could have been avoided. I would personally suggest that it is better to 
have default behavior which is not surprising along with more advanced 
behavior for those who have bothered to read the manual. In Windows' 
case, the write cache can be turned on; it is not unchangeable, and 
those who have educated themselves use it. I seldom turn it on unless 
I'm doing heavy I/O to a USB hard drive, otherwise the performance 
difference is just not that great.


Regards,
 Greg
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Write caches on X4540

2009-02-12 Thread Greg Mason

We use several X4540's over here as well, what type of workload do you
have, and how much performance increase did you see by disabling the
write caches?



We see the difference between our tests completing in around 2.5 minutes 
(with write caches) and around a minute and a half without them, in 
one instance.


I'm trying to optimize our machines for a write-heavy environment, as 
our users will undoubtedly hit this limitation of the machines.


-Greg

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Write caches on X4540

2009-02-12 Thread Greg Mason


Are you sure that write cache is back on after restart?



Yes, I've checked with format -e, on each drive.

When disabling the write cache with format, it also gives a warning 
stating this is the case.


What I'm looking for is a faster way to do this than format -e -d <disk> 
-f <script>, for all 48 disks.


From format, after a reboot:

selecting c10t7d0
[disk formatted]
/dev/dsk/c10t7d0s0 is part of active ZFS pool export. Please see zpool(1M).


FORMAT MENU:
        disk       - select a disk
        type       - select (define) a disk type
        partition  - select (define) a partition table
        current    - describe the current disk
        format     - format and analyze the disk
        fdisk      - run the fdisk program
        repair     - repair a defective sector
        label      - write label to the disk
        analyze    - surface analysis
        defect     - defect list management
        backup     - search for backup labels
        verify     - read and display labels
        inquiry    - show vendor, product and revision
        scsi       - independent SCSI mode selects
        cache      - enable, disable or query SCSI disk cache
        volname    - set 8-character volume name
        !<cmd>     - execute <cmd>, then return
        quit
format> cache


CACHE MENU:
        write_cache - display or modify write cache settings
        read_cache  - display or modify read cache settings
        !<cmd>      - execute <cmd>, then return
        quit
cache> write_cache


WRITE_CACHE MENU:
        display     - display current setting of write cache
        enable      - enable write cache
        disable     - disable write cache
        !<cmd>      - execute <cmd>, then return
        quit
write_cache> display
Write Cache is enabled
write_cache> disable
This setting is valid until next reset only. It is not saved permanently.
write_cache> display
Write Cache is disabled
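
A sketch of scripting those same steps across every disk; the command-file
contents and the disk-listing trick are assumptions, so sanity-check them
before trusting this on 48 drives:

cat > /tmp/wc_off.cmd <<'EOF'
cache
write_cache
disable
quit
quit
EOF
for disk in `format </dev/null 2>/dev/null | awk '/^ *[0-9]+\./ {print $2}'`; do
        format -e -d "$disk" -f /tmp/wc_off.cmd
done
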
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-12 Thread Greg Palmer

Ross wrote:

I can also state with confidence that very, very few of the 100 staff working 
here will even be aware that it's possible to unmount a USB volume in windows.  
They will all just pull the plug when their work is saved, and since they all 
come to me when they have problems, I think I can safely say that pulling USB 
devices really doesn't tend to corrupt filesystems in Windows.  Everybody I 
know just waits for the light on the device to go out.
  
The key here is that Windows does not cache writes to the USB drive 
unless you go in and specifically enable them. It caches reads but not 
writes. If you enable them you will lose data if you pull the stick out 
before all the data is written. This is the type of safety measure that 
needs to be implemented in ZFS if it is to support the average user 
instead of just the IT professionals.


Regards,
 Greg
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Write caches on X4540

2009-02-12 Thread Greg Mason
Well, since the write cache flush command is disabled, I would like this 
to happen as early as practically possible in the bootup process, as ZFS 
will not be issuing the cache flush commands to the disks.


I'm not really sure what happens in the case where the write flush 
command is disabled, something makes its way into the write cache, and then 
the cache is disabled. Does this mean the write cache is flushed to disk 
when the cache is disabled? If so, then I guess it's less critical when 
it happens in the bootup process or whether it's permanent...


-Greg

A Darren Dunham wrote:

On Thu, Feb 12, 2009 at 10:33:40AM -0500, Greg Mason wrote:
What I'm looking for is a faster way to do this than format -e -d disk  
-f script, for all 48 disks.


Is the speed critical?  I mean, do you have to pause startup while the
script runs, or does it interfere with data transfer?


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-11 Thread Greg Palmer

Uwe Dippel wrote:

We have seen some unfortunate miscommunication here, and misinterpretation. 
This extends into differences of culture. One of the vocal person in here is 
surely not 'Anti-xyz'; rather I sense his intense desire to further the 
progress by pointing his finger to some potential wounds.
I really don't have a dog in this fight but I think what we've seen here 
is the behavior of a person who is too lazy to read the manual, unable 
to understand the technology they are working with, and unwilling to 
face the consequences of their own behavior. As the Solaris user base 
increases though, the number of people like this will increase. The 
general population do not read the manuals nor do they care how the 
magic box works, they just want it to work. This is entirely appropriate 
for a business user who is using the computer as a means to an end. They 
have their area of expertise, which isn't computers. Of course, it 
really isn't appropriate for a system administrator so I can't generate 
a lot of sympathy for DE personally, especially after the manner in 
which he has behaved in this thread.


Turning Solaris into something that can be used with the same amount of 
thought as a toaster is one of the challenges facing Sun and the 
community in the future. Designing guards to prevent the ignorant from 
harming themselves is a challenge (see quote below).


There are 2 things that are infinite in this world, the universe and 
human stupidity. I'm not sure about the first one - Albert Einstein


Regards,
 Greg
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Write caches on X4540

2009-02-11 Thread Greg Mason

We're using some X4540s, with OpenSolaris 2008.11.

According to my testing, to optimize our systems for our specific 
workload, I've determined that we get the best performance with the 
write cache disabled on every disk, and with zfs:zfs_nocacheflush=1 set 
in /etc/system.
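
For reference, the tunable mentioned above goes into /etc/system like so, and
takes effect after a reboot; a sketch only:

echo "set zfs:zfs_nocacheflush = 1" >> /etc/system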


The only issue is setting the write cache permanently, or at least quickly.

Right now, as it is, I've scripted up format to run on boot, disabling 
the write cache of all disks. This takes around two minutes. I'd like to 
avoid needing to take this time on every bootup (which is more often 
than you'd think, we've got quite a bit of construction happening, which 
necessitates bringing everything down periodically). This would also be 
painful in the event of unplanned downtime for one of our Thors.


so, basically, my question is: Is there a way to quickly or permanently 
disable the write cache on every disk in an X4540?


Thanks,

-Greg
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Send Receive (and why does 'ls' modify a snapshot?)

2009-02-04 Thread Greg Mason
Tony,

I believe you want to use zfs recv -F to force a rollback on the 
receiving side.

I'm wondering if your ls is updating the atime somewhere, which would 
indeed be a change...
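
Rough sketches of both points, with names made up: force the receiving side
back to the most recent common snapshot, and keep ls from dirtying atimes on
the received dataset:

zfs send -i tank/fs@snap1 tank/fs@snap2 | ssh backuphost zfs receive -F tank/fs
zfs set atime=off tank/fs          # on the receiving side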

-Greg
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Add SSD drive as L2ARC(?) cache to existing ZFS raid?

2009-02-03 Thread Greg Mason
Orvar,

With my testing, I've seen a 5x improvement in small file creation 
when working specifically with NFS. This is after I added an SSD for the 
ZIL.

I recommend Richard Elling's zilstat (he posted links earlier). It'll 
let you see if a dedicated device for the ZIL will help your specific 
workload.

My understanding is that you'll get more bang for the buck using an 
SSD for the ZIL rather than the L2ARC. Performing some of your own 
benchmarks is really the only way see what will help improve performance 
for your specific workload. I recommend reading up on the ZFS ARC and 
L2ARC, to help try to determine if testing a dedicated L2ARC device is 
even worthwhile for your uses. I know it wasn't really helpful for me, 
as our read performance is already great.
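
Trying an L2ARC device is at least cheap to undo, which makes that kind of
benchmarking easy (pool and device names made up):

zpool add tank cache c3t0d0
zpool iostat -v tank 5         # watch how much the cache device actually serves
zpool remove tank c3t0d0       # cache devices can be removed again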

As for a specific SSD, I've tested the Intel X25E. It's around $600 or 
so. It's got about half the performance of the snazzy, pricey STEC Zeus 
drives. With the specific workload I was trying to accelerate, I wasn't 
hitting any of the limits of the Intel SSDs (but I was definitely WAY 
past the performance limits of a standard hard disk). Again, all of this 
was for accelerating the ZIL, not for use on the L2ARC, so YMMV.

Fishworks does this. They use an SSD both for the read cache as well as 
the ZIL.

-Greg

Orvar Korvar wrote:
 So are there no guide lines how to add a SSD disk as a home user? Which is 
 the best SSD disk to add? What percentage improvements are typical? Or, will 
 a home user not benefit from adding a SSD drive? It is only enterprise SSD 
 drives that works, together with some esoteric software from Fishworks? It 
 requires Enterprise hardware to get a boost from SSD? Not possible? Or?
 
 No one has done this yet? What does the Fishworks team say?
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Bad sectors arises - discs differ in size - trouble?

2009-02-02 Thread Greg Palmer
Orvar Korvar wrote:
 Ok. Just to confirm: A modern disk has already some spare capacity which is 
 not normally utilized by ZFS, UFS, etc. If the spare capacity is finished, 
 then the disc should be replaced.
   
Yup, that is the case.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] write cache and cache flush

2009-01-30 Thread Greg Mason
A Linux NFS file server, with a few terabytes of fibre-attached disk, 
using XFS.

I'm trying to get these Thors to perform at least as well as the current 
setup. A performance hit is very hard to explain to our users.

 Perhaps I missed something, but what was your previous setup?
 I.e. what did you upgrade from?
 Neil.
 
 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] write cache and cache flush

2009-01-30 Thread Greg Mason
I should also add that this "creating many small files" issue is the 
ONLY case where the Thors are performing poorly, which is why I'm 
focusing on it.

Greg Mason wrote:
 A Linux NFS file server, with a few terabytes of fibre-attached disk, 
 using XFS.
 
 I'm trying to get these Thors to perform at least as well as the current 
 setup. A performance hit is very hard to explain to our users.
 
 Perhaps I missed something, but what was your previous setup?
 I.e. what did you upgrade from?
 Neil.


 
 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] write cache and cache flush

2009-01-30 Thread Greg Mason
Jim Mauro wrote:
 
 This problem only manifests itself when dealing with many small files 
 over NFS. There is no throughput problem with the network.
 But there could be a _latency_ issue with the network.

If there was a latency issue, we would see such a problem with our 
existing file server as well, which we do not. We'd also have much 
greater problems than just file server performance.

So, like I've said, we've ruled out the network as an issue.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] write cache and cache flush

2009-01-30 Thread Greg Mason
 If there was a latency issue, we would see such a problem with our 
 existing file server as well, which we do not. We'd also have much 
 greater problems than just file server performance.
 
 So, like I've said, we've ruled out the network as an issue.

I should also add that I've tested these Thors with the ZIL disabled, 
and they scream! With the cache flush disabled, they also do quite well.

The specific issue i'm trying to solve is the ZIL being slow when using NFS.

I really don't want to have to do something drastic like disabling the 
ZIL to get the performance I need...
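
For completeness, the ZIL-disable test mentioned above was done via a tunable
in this era; a sketch only, and definitely not a recommendation for production:

echo "set zfs:zil_disable = 1" >> /etc/system    # takes effect after a reboot
# or live (filesystems must be remounted afterwards for it to apply):
echo zil_disable/W0t1 | mdb -kw
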
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] write cache and cache flush

2009-01-30 Thread Greg Mason
I'll give this script a shot a little bit later today.

For ZIL sizing, I'm using either 1 or 2 32G Intel X25-E SSDs in my 
tests, which, according to what I've read, is 2-4 times larger than the 
maximum that ZFS can possibly use. We've got 32G of system memory in 
these Thors, and (if I'm not mistaken) the maximum amount of in-play 
data can be 16G, 1/2 the system memory.

Also, because I know people will be asking, has anybody ever tried to 
recover from something like a system crash with a ZFS pool that has the 
ZIL disabled? What kind of nightmares would I be facing in such a 
situation? Would I simply just risk losing that in-play data, or could 
more serious things happen? I know disabling the ZIL is an Extremely Bad 
Idea, but I need to tell people exactly why...

-Greg

Jim Mauro wrote:
  You have SSD's for the ZIL (logzilla) enabled, and ZIL IO
  is what is hurting your performance...Hmmm
 
  I'll ask the stupid question (just to get it out of the way) - is
  it possible that the logzilla is undersized?
 
  Did you gather data using Richard Elling's zilstat (included below)?
 
  Thanks,
  /jim
snip
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Add SSD drive as L2ARC(?) cache to existing ZFS raid?

2009-01-29 Thread Greg Mason
How were you running this test?

Were you running it locally on the machine, or were you running it over 
something like NFS?

What is the rest of your storage like? just direct-attached (SAS or 
SATA, for example) disks, or are you using a higher-end RAID controller?
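
For what it's worth, adding an SSD as a cache (L2ARC) device is cheap to 
try and easy to undo. Something like the following, with "tank" and 
c5t0d0 standing in for the real pool and SSD:

  # zpool add tank cache c5t0d0
  # zpool remove tank c5t0d0     # cache devices can be removed again

Also keep in mind that the L2ARC warms up slowly, so a short benchmark 
run (or a single boot) may not show much benefit until the working set 
has actually migrated onto the SSD.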

-Greg

kristof wrote:
 Kebabber,
 
 You can't expose ZFS filesystems over iSCSI.
 
 You can only expose ZFS volumes (raw volumes) over iSCSI.
 
 PS: 2 weeks ago I did a few tests, using filebench.
 
 I saw little to no improvement using a 32GB Intel X25E SSD.
 
 Maybe this is because filebench is flushing the cache in between tests.
 
 I also compared iscsi boot time (using gpxe as boot loader) ,
 
 We are using raidz storagepool (4disks). here again, adding the X25E as cache 
 device did not speedup the boot proccess. So I did not see real improvement. 
 
 PS: We have 2 master volumes (xp and vista) which we clone to provision 
 additional guests. 
 
 I'm now waiting for new SSD disks (STEC Zeus 18GB and STEC Mach 100GB), since 
 those are used in the Sun 7000 products. I hope they perform better.
 
 Kristof
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] write cache and cache flush

2009-01-29 Thread Greg Mason
So, I'm still beating my head against the wall, trying to find our 
performance bottleneck with NFS on our Thors.

We've got a couple of Intel SSDs configured as dedicated ZIL (log) 
devices. Cache flushing is still enabled, as are the write caches on 
all 48 disk devices.

What I'm thinking of doing is disabling all write caches, and disabling 
the cache flushing.
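
Concretely, what I have in mind is roughly the following (per the Evil 
Tuning Guide; double-check the syntax for your build):

  # echo zfs_nocacheflush/W0t1 | mdb -kw   # stop ZFS issuing cache flushes
  (or "set zfs:zfs_nocacheflush = 1" in /etc/system plus a reboot)

and then turning off the write cache on each drive, which should be 
doable interactively with "format -e" (select the disk, then 
cache -> write_cache -> disable).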

What would this mean for the safety of data in the pool?

And, would this even do anything to address the performance issue?

-Greg
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] write cache and cache flush

2009-01-29 Thread Greg Mason
This problem only manifests itself when dealing with many small files 
over NFS. There is no throughput problem with the network.

I've run tests with the write cache disabled on all disks, and the cache 
flush disabled. I'm using two Intel SSDs for ZIL devices.

This setup is faster than using the two Intel SSDs with write caches 
enabled on all disks, and with the cache flush enabled.

My test used to run around 3.5 to 4 minutes; now it is completing in 
about 2.5 minutes. I still think this is a bit slow, but I have 
quite a bit of testing to perform. I'll keep the list updated with my 
findings.

I've already established both via this list and through other research 
that ZFS has performance issues over NFS when dealing with many small 
files. This may actually be an issue with NFS itself, where 
NVRAM-backed storage is needed for decent performance with small files. 
Typically such an NVRAM cache is supplied by a hardware raid controller 
in a disk shelf.
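
(As a cross-check, one quick way to confirm it really is the synchronous 
write path getting hammered is to count ZIL commits while the small-file 
workload runs, assuming the fbt provider is available on your build:

  # dtrace -n 'fbt::zil_commit:entry { @["zil_commit calls"] = count(); }'

You would expect the count to stay small for a local tar extract but to 
climb with just about every file created over NFS, since the server has 
to commit each create synchronously.)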

I find it very hard to explain to a user why an upgrade is a step down 
in performance. For the users these Thors are going to serve, such a 
drastic performance hit is a deal breaker...

I've done my homework on this: I've ruled out the network as an issue, 
as well as the NFS clients. I've narrowed my particular performance 
problem down to the ZIL, and how well ZFS plays with NFS.

-Greg

Jim Mauro wrote:
 Multiple Thors (more than 2?), with performance problems.
 Maybe it's the common demnominator - the network.
 
 Can you run local ZFS IO loads and determine if performance
 is expected when NFS and the network are out of the picture?
 
 Thanks,
 /jim
 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD drives in Sun Fire X4540 or X4500 for dedicated ZIL device

2009-01-23 Thread Greg Mason
If I'm not mistaken (and somebody please correct me if I'm wrong), the 
Sun 7000 series storage appliances (the Fishworks boxes) use enterprise 
SSDs, with dram caching. One such product is made by STEC.

My understanding is that the Sun appliances use one SSD for the ZIL, and 
one as a read cache. For the 7210 (which is basically a Sun Fire X4540), 
that gives you 46 disks and 2 SSDs.

-Greg


Bob Friesenhahn wrote:
 On Thu, 22 Jan 2009, Ross wrote:
 
 However, now I've written that, Sun use SATA (SAS?) SSD's in their 
 high-end fishworks storage, so I guess it definitely works for some 
 use cases.
 
 But the fishworks (Fishworks is a development team, not a product) 
 write cache device is not based on FLASH.  It is based on DRAM.  The 
 difference is like night and day. Apparently there can also be a read 
 cache which is based on FLASH.
 
 Bob
 ==
 Bob Friesenhahn
 bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 
 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] SSD drives in Sun Fire X4540 or X4500 for dedicated ZIL device

2009-01-22 Thread Greg Mason
We're evaluating the possibility of speeding up NFS operations on our 
X4540s with dedicated log devices. Specifically, we are evaluating 
replacing one or two of our spare SATA disks with SATA SSDs.

Has anybody tried using SSD device(s) as dedicated ZIL devices in an 
X4540? Are there any known technical issues with using an SSD in an X4540?
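
For reference, what I'm picturing is just attaching the SSD(s) to the 
existing pool as log devices, with "tank" and the c4t... names below 
standing in for the real pool and devices:

  # zpool add tank log c4t46d0
  # zpool add tank log mirror c4t46d0 c4t47d0   # or, instead, a mirrored slog

One caveat I'm aware of: on current builds a log device cannot be removed 
from a pool once it has been added, so this is a one-way change.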

-Greg
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS over NFS, poor performance with many small files

2009-01-19 Thread Greg Mason
We're running into a performance problem with ZFS over NFS. When working 
with many small files (i.e. unpacking a tar file with source code), a 
Thor (over NFS) is about 4 times slower than our aging existing storage 
solution, which isn't exactly speedy to begin with (17 minutes versus 3 
minutes).
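
The test itself is nothing fancy: essentially just timing a tar extract 
of a source tree on an NFS client. The paths below are placeholders:

  client$ cd /mnt/thor/scratch
  client$ time tar xf /var/tmp/source-tree.tar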

We took a rough stab in the dark, and started to examine whether or not 
it was the ZIL.

Performing IO tests locally on the Thor shows no real IO problems, but 
running IO tests over NFS, specifically, with many smaller files we see 
a significant performance hit.

Just to rule the ZIL in or out as a factor, we disabled it and ran the 
test again. It completed in just under a minute, around 3 times faster 
than our existing storage. This was more like it!

Are there any tunables for the ZIL to try to speed things up? Or would 
it be best to look into using a high-speed SSD for the log device?

And, yes, I already know that turning off the ZIL is a Really Bad Idea. 
We do, however, need to provide our users with a certain level of 
performance, and what we've got with the ZIL on the pool is completely 
unacceptable.

Thanks for any pointers you may have...

--

Greg Mason
Systems Administrator
Michigan State University
High Performance Computing Center
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS over NFS, poor performance with many small files

2009-01-19 Thread Greg Mason
So, what we're looking for is a way to improve performance, without  
disabling the ZIL, as it's my understanding that disabling the ZIL  
isn't exactly a safe thing to do.

We're looking for the best way to improve performance, without  
sacrificing too much of the safety of the data.

The current solution we are considering is disabling the cache  
flushing (as per a previous response in this thread), and adding one  
or two SSD log devices, as this is similar to the Sun storage  
appliances based on the Thor. Thoughts?

-Greg

On Jan 19, 2009, at 6:24 PM, Richard Elling wrote:

 We took a rough stab in the dark, and started to examine whether or  
 not it was the ZIL.

 It is. I've recently added some clarification to this section in the
 Evil Tuning Guide which might help you to arrive at a better solution.
 http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Disabling_the_ZIL_.28Don.27t.29
 Feedback is welcome.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS over NFS, poor performance with many small files

2009-01-19 Thread Greg Mason

 Good idea.  Thor has a CF slot, too, if you can find a high speed
 CF card.
 -- richard

We're already using the CF slot for the OS. We haven't really found  
any CF cards that would be fast enough anyway. :)


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Using ZFS for replication

2009-01-15 Thread Greg Mason
zfs-auto-snapshot (SUNWzfs-auto-snapshot) is what I'm using. The only 
trick is that on the receiving end, we have to manage our own retention 
of the snapshots we send to our offsite/backup boxes.

zfs-auto-snapshot can handle the sending of snapshots as well.

We're running this in OpenSolaris 2008.11 (snv_100).

Another use I've seen is using zfs-auto-snapshot to take and manage 
snapshots on both ends, using rsync to replicate the data, but that's 
less than ideal for most folks...
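
For anyone curious, the plumbing underneath is just incremental 
send/receive. Stripped down to its essentials it looks something like 
this (the dataset names and the host "backuphost" are placeholders):

  # zfs send -i tank/home@2009-01-14 tank/home@2009-01-15 | \
        ssh backuphost zfs receive -F backup/home

and the retention side on the far end is then just destroying snapshots 
that have aged out, e.g.:

  # ssh backuphost zfs destroy backup/home@2008-12-01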

-Greg

Ian Mather wrote:
 Fairly new to ZFS. I am looking to replicate data between two thumper boxes.
 Found quite a few articles about using zfs incremental snapshot send/receive. 
 Just a cheeky question to see if anyone has anything working in a live 
 environment and is happy to share the scripts, to save me reinventing the 
 wheel. Thanks in advance.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2008-10-09 Thread Greg Shaw
Perhaps I misunderstand, but the issues below are all based on Nevada, 
not Solaris 10.  

Nevada isn't production code.  For real ZFS testing, you must use a 
production release, currently Solaris 10 (update 5, soon to be update 6).

In the last 2 years, I've stored everything in my environment (home 
directory, builds, etc.) on ZFS on multiple types of storage subsystems 
without issues.  All of this has been on Solaris 10, however.

Btw, I completely agree on the panic issue. If I have a large DB 
server with many pools, and one inconsequential pool fails, I lose the 
entire DB server. I'd really like to see an option at the zpool level 
directing what to do when a particular pool fails. Perhaps this 
is in the latest bits; if so, sorry, I'm running old stuff.  :-)
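
(Looking at it again, newer bits do seem to have something along these 
lines: a per-pool "failmode" property that can be set to wait, continue 
or panic, e.g.

   zpool set failmode=continue tank

I haven't tried it myself on anything I care about, though.)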

I also run ZFS on my Mac. While not production quality, some of the 
panics involving external (FireWire, USB, eSATA) drives are very 
irritating. A hiccup due to a jostled cable, and the entire box 
panics. That's frustrating.

Timh Bergström wrote:
 Unfortunately I can only agree with the doubts about running ZFS in
 production environments: I've lost ditto blocks, I've gotten
 corrupted pools and a bunch of other failures, even in
 mirror/raidz/raidz2 setups with or without hardware mirrors/RAID 5/6.
 Plus there is the worry that a sudden crash/reboot will corrupt or even
 destroy the pools, with restore from backup as the only advice. I've
 been lucky so far about getting my pools back, thanks to people like
 Victor.

 What would be needed is a proper fsck for ZFS which can resolve minor
 data corruption; tools for rebuilding, resizing and moving the data
 about on pools are also needed, as is recovery of data from faulted
 pools, like there is for ext2/3/ufs/ntfs.

 All in all, a great FS, but not production ready until the tools are in
 place or it gets really resilient to minor failures and/or
 crashes in both software and hardware. For now I'll stick to XFS/UFS
 and sw/hw RAID and live with the restrictions of such filesystems.

 //T

 2008/10/9 Mike Gerdts [EMAIL PROTECTED]:
   
 On Thu, Oct 9, 2008 at 7:44 AM, Ahmed Kamal
 [EMAIL PROTECTED] wrote:
 

In the past year I've lost more ZFS file systems than I have any other
type of file system in the past 5 years.  With other file systems I
can almost always get some data back.  With ZFS I can't get any back.

   
 Thats scary to hear!

 
 I am really scared now! I was the one trying to quantify ZFS reliability,
 and that is surely bad to hear!
   
 The circumstances where I have lost data have been when ZFS has not
 handled a layer of redundancy.  However, I am not terribly optimistic
 of the prospects of ZFS on any device that hasn't committed writes
 that ZFS thinks are committed.  Mirrors and raidz would also be
 vulnerable to such failures.

 I also have run into other failures that have gone unanswered on the
 lists.  It makes me wary about using zfs without a support contract
  that allows me to escalate to engineering.  Patching-only support
 won't help.

 http://mail.opensolaris.org/pipermail/zfs-discuss/2007-December/044984.html
   Hang only after I mirrored the zpool, no response on the list

 http://mail.opensolaris.org/pipermail/zfs-discuss/2008-June/048255.html
   I think this is fixed around snv_98, but the zfs-discuss list was
   surprisingly silent on acknowledging it as a problem - I had no
   idea that it was being worked until I saw the commit.  The panic
   seemed to be caused by dtrace - core developers of dtrace
   were quite interested in the kernel crash dump.

 http://mail.opensolaris.org/pipermail/zfs-discuss/2008-September/051109.html
   Panic during ON build.  Pool was lost, no response from list.

 --
 Mike Gerdts
 http://mgerdts.blogspot.com/
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

 



   
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] device alias

2007-09-25 Thread Greg Shaw
It would be a manual process.  As with any arbitrary name, it's a useful 
tag, not much more.

James C. McPherson wrote:
 Gregory Shaw wrote:
   
 Hi.  I'd like to request a feature be added to zfs.  Currently, on  
 SAN attached disk, zpool shows up with a big WWN for the disk.   If  
 ZFS (or the zpool command, in particular) had a text field for  
 arbitrary information, it would be possible to add something that  
 would indicate what LUN on what array the disk in question might be.   
 This would make troubleshooting and general understanding of the  
 actual storage layout much simpler, as you'd know something about any  
 disks that are encountering problems.

 Something like:

 zpool status
pool: local
 state: ONLINE
 scrub: scrub completed with 0 errors on Sun Sep 23 04:16:33 2007
 config:

    NAME        STATE     READ WRITE CKSUM  NOTE
    local       ONLINE       0     0     0
      raidz1    ONLINE       0     0     0
        c2t0d0  ONLINE       0     0     0  Internal SATA on left side
        c2t2d0  ONLINE       0     0     0  Internal SATA on right side
        c2t3d0  ONLINE       0     0     0  External SATA disk 1 in box on top
        c2t4d0  ONLINE       0     0     0  External SATA disk 2 in box on top
    spares
      c2t5d0    AVAIL                       External SATA disk 3 in box on top

 errors: No known data errors


  The above would be very useful, should a disk fail, for identifying 
  which device is which.
 

 How would you gather that information?
 How would you ensure that it stayed accurate in
 a hotplug world?



 James C. McPherson
 --
 Senior Kernel Software Engineer, Solaris
 Sun Microsystems
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] device alias

2007-09-25 Thread Greg Shaw


James C. McPherson wrote:
 Bill Sommerfeld wrote:
   
 On Wed, 2007-09-26 at 08:26 +1000, James C. McPherson wrote:
 
 How would you gather that information?
   
 the tools to use would be dependent on the actual storage device in use.
 luxadm for A5x00 and V8x0 internal storage, sccli for 3xxx, etc., etc., 
 

 No consistent interface to use, then, unless another tool
 or cruft gets added to ZFS to make this happen. That would
 seem to defeat one of the major wins of ZFS - storage
 neutrality.


   
I'd be happy with an arbitrary field that could be assigned via a 
command.  Intelligence could be added later if appropriate, but at this 
point, figuring out what-is-where on a big list of disk IDs on a SAN 
device is very difficult.

So, aiming low: a text field that could be assigned. In the future, 
perhaps something that would associate a serial number or something 
similar with that name?
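
In the meantime, a crude workaround is a flat notes file that gets 
grepped when a device acts up. A sketch (the file name and layout are 
arbitrary):

   # cat /etc/zpool-notes
   local  c2t0d0  Internal SATA, left side
   local  c2t3d0  External SATA disk 1, box on top

   # grep c2t3d0 /etc/zpool-notes
   local  c2t3d0  External SATA disk 1, box on top

It works, but it rots the moment someone recables something, which is 
exactly why a field stored in the pool itself would be nicer.
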
 How would you ensure that it stayed accurate in
 a hotplug world?
   
 See above.   

 I'm told that with many jbod arrays, SES has the information.
 

 True, but that's still many, but not all.


 James C. McPherson
 --
 Senior Kernel Software Engineer, Solaris
 Sun Microsystems
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss