Re: [zfs-discuss] Sun X4200 Question...

2013-03-14 Thread Gary Driggs
On Mar 14, 2013, at 5:55 PM, Jim Klimov jimkli...@cos.ru wrote:

 However, recently the VM virtual hardware clocks became way slow.

Does NTP help correct the guest's clock?
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Distro Advice

2013-02-26 Thread Gary Driggs
On Feb 26, 2013, at 12:44 AM, Sašo Kiselkov wrote:

I'd also recommend that you go and subscribe to z...@lists.illumos.org, since
this list is going to get shut down by Oracle next month.


Whose description still reads, "everything ZFS running on illumos-based
distributions."

-Gary
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RFE: Un-dedup for unique blocks

2013-01-22 Thread Gary Mills
On Tue, Jan 22, 2013 at 11:54:53PM +, Edward Ned Harvey 
(opensolarisisdeadlongliveopensolaris) wrote:
  From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
  boun...@opensolaris.org] On Behalf Of Nico Williams
  
  As for swap... really, you don't want to swap.  If you're swapping you
  have problems.  
 
 In solaris, I've never seen it swap out idle processes; I've only
 seen it use swap for the bad bad bad situation.  I assume that's all
 it can do with swap.

You would be wrong.  Solaris uses swap space for paging.  Paging out
unused portions of an executing process from real memory to the swap
device is certainly beneficial.  Swapping out complete processes is a
desperation move, but paging out most of an idle process is a good
thing.
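
For what it's worth, the two cases are easy to tell apart on a live
system.  In vmstat output, the `w' column counts swapped-out LWPs (the
desperation case), while the `pi', `po', and `sr' columns show ordinary
paging activity.  Something like this (a sketch, not a tuning guide):

    vmstat 5 3
    swap -s     (summary of swap space reserved and allocated)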

-- 
-Gary Mills--refurb--Winnipeg, Manitoba, Canada-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sonnet Tempo SSD supported?

2012-12-04 Thread Gary Driggs
On Dec 4, 2012, Eugen Leitl wrote:

 Either way I'll know the hardware support situation soon
 enough.

Have you tried contacting Sonnet?

-Gary
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] LUN sizes

2012-10-29 Thread Gary Mills
On Mon, Oct 29, 2012 at 09:30:47AM -0500, Brian Wilson wrote:
 
 First I'd like to note that contrary to the nomenclature there isn't
 any one SAN product that all operates the same. There are a number
 of different vendor provided solutions that use a FC SAN to deliver
 luns to hosts, and they each have their own limitations. Forgive my
 pedanticism please.
 
 On Sun, Oct 28, 2012 at 04:43:34PM +0700, Fajar A. Nugraha wrote:
  On Sat, Oct 27, 2012 at 9:16 PM, Edward Ned Harvey
  (opensolarisisdeadlongliveopensolaris)
  opensolarisisdeadlongliveopensola...@nedharvey.com wrote:
   From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
   boun...@opensolaris.org] On Behalf Of Fajar A. Nugraha
  
   So my
   suggestion is actually just present one huge 25TB LUN to zfs and let
   the SAN handle redundancy.
 
 You are entering the uncharted waters of ``multi-level disk
 management'' here. Both ZFS and the SAN use redundancy and error-
 checking to ensure data integrity. Both of them also do automatic
 replacement of failing disks. A good SAN will present LUNs that
 behave as perfectly reliable virtual disks, guaranteed to be error
 free. Almost all of the time, ZFS will find no errors. If ZFS does
 find an error, there's no nice way to recover. Most commonly, this
 happens when the SAN is powered down or rebooted while the ZFS host
 is still running.
 
 On your host side, there's also the consideration of ssd/scsi
 queuing. If you're running on only one LUN, you're limiting your
 IOPS to only one IO queue over your FC paths, and if you have that
 throttled (per many storage vendors recommendations about
 ssd:ssd_max_throttle and zfs:zfs_vdev_max_pending), then one LUN
 will throttle your IOPS back on your host. That might also motivate
 you to split into multiple LUNS so your OS doesn't end up
 bottle-necking your IO before it even gets to your SAN HBA.

That's a performance issue rather than a reliability issue.  The other
performance issue to consider is block size.  At the last place I
worked, we used an Iscsi LUN from a Netapp filer.  This LUN reported a
block size of 512 bytes, even though the Netapp itself used a 4K
block size.  This means that the filer was doing the block size
conversion, resulting in much more I/O than the ZFS layer intended.
The fact that Netapp does COW made this situation even worse.
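
(For anyone checking their own setup: the sector size a LUN advertises,
and the ashift that ZFS chose for it, can be inspected with something
like the following.  The device and pool names here are only placeholders.)

    prtvtoc /dev/rdsk/c2t0d0s2 | grep 'bytes/sector'
    zdb -C tank | grep ashift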

My impression was that very few of their customers encountered this
performance problem because almost all of them used their Netapp only
for NFS or CIFS.  Our Netapp was extremely reliable but did not have
the Iscsi LUN performance that we needed.

-- 
-Gary Mills--refurb--Winnipeg, Manitoba, Canada-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Zpool LUN Sizes

2012-10-28 Thread Gary Mills
On Sun, Oct 28, 2012 at 04:43:34PM +0700, Fajar A. Nugraha wrote:
 On Sat, Oct 27, 2012 at 9:16 PM, Edward Ned Harvey
 (opensolarisisdeadlongliveopensolaris)
 opensolarisisdeadlongliveopensola...@nedharvey.com wrote:
  From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
  boun...@opensolaris.org] On Behalf Of Fajar A. Nugraha
 
  So my
  suggestion is actually just present one huge 25TB LUN to zfs and let
  the SAN handle redundancy.
 
  create a bunch of 1-disk volumes and let ZFS handle them as if they're JBOD.
 
 Last time I used IBM's enterprise storage (which was, admittedly, a
 long time ago) you couldn't even do that. And looking at Morris' mail
 address, it should be relevant :)

 ... or probably it's just me who hasn't found how to do that, which is
 why I suggested just using whatever the SAN can present :)

You are entering the uncharted waters of ``multi-level disk
management'' here.  Both ZFS and the SAN use redundancy and error-
checking to ensure data integrity.  Both of them also do automatic
replacement of failing disks.  A good SAN will present LUNs that
behave as perfectly reliable virtual disks, guaranteed to be error
free.  Almost all of the time, ZFS will find no errors.  If ZFS does
find an error, there's no nice way to recover.  Most commonly, this
happens when the SAN is powered down or rebooted while the ZFS host
is still running.

-- 
-Gary Mills--refurb--Winnipeg, Manitoba, Canada-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] What happens when you rm zpool.cache?

2012-10-21 Thread Gary Mills
On Sun, Oct 21, 2012 at 11:40:31AM +0200, Bogdan Ćulibrk wrote:
Follow up question regarding this: is there any way to disable
automatic import of any non-rpool on boot without any hacks of removing
zpool.cache?

Certainly.  Import it with an alternate cache file.  You do this by
specifying the `cachefile' property on the command line.  The `zpool'
man page describes how to do this.
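
For example (pool and file names are illustrative):

    zpool import -o cachefile=/etc/zfs/alternate.cache tank

or, to keep the pool out of any cache file at all, so that it is never
imported automatically at boot:

    zpool import -o cachefile=none tank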

-- 
-Gary Mills--refurb--Winnipeg, Manitoba, Canada-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] New fast hash algorithm - is it needed?

2012-07-11 Thread Gary
On Wed, Jul 11, 2012 at 7:48 AM, Casper Dik wrote:

 Dan Brown seems to think so in Digital Fortress but it just means he
 has no grasp on big numbers.


Or on little else, for that matter. I seem to recall one character in the book
who would routinely slide under a mainframe on his back as if on a
mechanic's dolly, solder CPUs to the motherboard above his face, and perform
all manner of bullshit on-the-fly repairs that never existed even back in
the earliest days of mid-20th-century computing. I don't recall anything
else of a technical nature that made a lick of sense, and the story was only
made more insulting by the mass of alleged super-geniuses who could barely
tie their own shoelaces, etc. etc. Reading the one-star reviews of this
book on Amazon is far more enlightening and entertaining than reading the
actual book. I found it so insulting that I couldn't finish the last 70
pages of the paperback.

-Gary
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [developer] Re: History of EPERM for unlink() of directories on ZFS?

2012-06-26 Thread Gary Mills
On Tue, Jun 26, 2012 at 10:41:14AM -0500, Nico Williams wrote:
 On Tue, Jun 26, 2012 at 9:44 AM, Alan Coopersmith
 alan.coopersm...@oracle.com wrote:
  On 06/26/12 05:46 AM, Lionel Cons wrote:
  On 25 June 2012 11:33,  casper@oracle.com wrote:
  To be honest, I think we should also remove this from all other
  filesystems and I think ZFS was created this way because all modern
  filesystems do it that way.
 
  This may be wrong way to go if it breaks existing applications which
  rely on this feature. It does break applications in our case.
 
  Existing applications rely on the ability to corrupt UFS filesystems?
  Sounds horrible.
 
 My guess is that the OP just wants unlink() of an empty directory to
 be the same as rmdir() of the same.  Or perhaps they want unlink() of
 a non-empty directory to result in a recursive rm...  But if they
 really want hardlinks to directories, then yeah, that's horrible.

This all sounds like a good use for LD_PRELOAD and a tiny library
that intercepts and modernizes system calls.

-- 
-Gary Mills--refurb--Winnipeg, Manitoba, Canada-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] IOzone benchmarking

2012-05-03 Thread Gary
On Thu, May 3, 2012 at 7:47 AM, Edward Ned Harvey wrote:

 Given the amount of ram you have, I really don't think you'll be able to get
 any useful metric out of iozone in this lifetime.

I still think it would be apropos if dedup and compression were being
used. In that case, does filebench have an option for testing either
of those?

-Gary
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] IOzone benchmarking

2012-05-01 Thread Gary Driggs
On May 1, 2012, at 1:41 AM, Ray Van Dolson wrote:

 Throughput:
iozone -m -t 8 -T -r 128k -o -s 36G -R -b bigfile.xls

 IOPS:
iozone -O -i 0 -i 1 -i 2 -e -+n -r 128K -s 288G > iops.txt

Do you expect to be reading or writing 36 or 288Gb files very often on
this array? The largest file size I've used in my still lengthy
benchmarks was 16Gb. If you use the sizes you've proposed, it could
take several days or weeks to complete. Try a web search for iozone
examples if you want more details on the command switches.

-Gary
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] IOzone benchmarking

2012-05-01 Thread Gary
On 5/1/12, Ray Van Dolson wrote:

 The problem is this box has 144GB of memory.  If I go with a 16GB file
 size (which I did), then memory and caching influences the results
 pretty severely (I get around 3GB/sec for writes!).

The idea of benchmarking -- IMHO -- is to vaguely attempt to reproduce
real world loads. Obviously, this is an imperfect science but if
you're going to be writing a lot of small files (e.g. NNTP or email
servers used to be a good real world example) then you're going to
want to benchmark for that. If you're going to want to write a bunch
of huge files (are you writing a lot of 16GB files?) then you'll want
to test for that. Caching anywhere in the pipeline is important for
benchmarks because you aren't going to turn off a cache or remove RAM
in production are you?

-Gary
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Seagate Constellation vs. Hitachi Ultrastar

2012-04-06 Thread Gary Driggs
I've seen a couple of sources that suggest prices should be dropping by
the end of April -- apparently not as low as pre-flood prices, due in
part to a rise in manufacturing costs, but about 10% lower than they're
priced today.

-Gary
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS Comes to OS X Courtesy of Apple's Former Chief ZFS Architect

2012-01-31 Thread Gary Driggs
It looks like the first iteration has finally launched...

http://tenscomplement.com/our-products/zevo-silver-edition

http://www.macrumors.com/2012/01/31/zfs-comes-to-os-x-courtesy-of-apples-former-chief-zfs-architect
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs and iscsi performance help

2012-01-27 Thread Gary Mills
On Fri, Jan 27, 2012 at 03:25:39PM +1100, Ivan Rodriguez wrote:
 
 We have a backup server with a zpool size of 20 TB; we transfer
 information using zfs snapshots every day (we have around 300 filesystems on
 that pool).  The storage is a Dell MD3000i connected by iSCSI, and the pool is
 currently version 10.  The same storage is connected
 to another server with a smaller pool of 3 TB (zpool version 10); that
 server is working fine and speed is good between the storage
 and the server.  However, on the server with the 20 TB pool, performance is
 an issue: after we restart the server,
 performance is good, but over time, let's say a week, the performance
 keeps dropping until we have to
 bounce the server again (same behavior with the new version of Solaris; in
 that case performance drops in 2 days).  There are no errors in the logs, on
 the storage, or in zpool status -v.

This sounds like a ZFS cache problem on the server.  You might check
on how cache statistics change over time.  Some tuning may eliminate
this degradation.  More memory may also help.  Does a scrub show any
errors?  Does the performance drop affect reads or writes or both?
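
To watch the cache, something like this run periodically will show how
the ARC size and hit rates change as the slowdown develops (a sketch for
Solaris 10):

    kstat -p zfs:0:arcstats | egrep 'size|hits|misses'
    echo ::arc | mdb -k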

 We suspect that the pool has some issues; probably there is corruption
 somewhere.  We tested Solaris 10 8/11 with zpool 29,
 although we haven't updated the pool itself.  With the new Solaris the
 performance is even worse, and every time
 that we restart the server we get stuff like this:
 
  SOURCE: zfs-diagnosis, REV: 1.0
  EVENT-ID: 0168621d-3f61-c1fc-bc73-c50efaa836f4
 DESC: All faults associated with an event id have been addressed.
  Refer to http://sun.com/msg/FMD-8000-4M for more information.
  AUTO-RESPONSE: Some system components offlined because of the
 original fault may have been brought back online.
  IMPACT: Performance degradation of the system due to the original
 fault may have been recovered.
  REC-ACTION: Use fmdump -v -u EVENT-ID to identify the repaired components.
 [ID 377184 daemon.notice] SUNW-MSG-ID: FMD-8000-6U, TYPE: Resolved,
 VER: 1, SEVERITY: Minor
 
 And we need to export and import the pool in order to be able to access it.

This is a separate problem, introduced with an upgrade to the Iscsi
service.  The new one has a dependency on the name service (typically
DNS), which means that it isn't available when the zpool import is
done during the boot.  Check with Oracle support to see if they have
found a solution.

-- 
-Gary Mills--refurb--Winnipeg, Manitoba, Canada-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] unable to access the zpool after issue a reboot

2012-01-26 Thread Gary Mills
On Thu, Jan 26, 2012 at 04:36:58PM +0100, Christian Meier wrote:
 Hi Sudheer
 
  3) bash-3.2# zpool status
       pool: pool name
      state: UNAVAIL
     status: One or more devices could not be opened.  There are insufficient
             replicas for the pool to continue functioning.
     action: Attach the missing device and online it using 'zpool online'.
        see: http://www.sun.com/msg/ZFS-8000-3C
       scan: none requested
     config:

             NAME         STATE     READ WRITE CKSUM
             pool name    UNAVAIL      0     0     0  insufficient replicas
               c5t1d1     UNAVAIL      0     0     0  cannot open

This means that, at the time of that import, device c5t1d1 was not
available.  What does `ls -l /dev/rdsk/c5t1d1s0' show for the physical
path?

  And the important thing is when I export & import the zpool, then I
  was able to access it.

Yes, later the device became available.  After the boot, `svcs' will
show you the services listed in order of their completion times.  The
ZFS mount is done by this service:

svc:/system/filesystem/local:default

The zpool import (without the mount) is done earlier.  Check to see
if any of the FC services run too late during the boot.

 As Gary and Bob mentioned, I saw this issue with iSCSI devices.
 Instead of export / import, would a zpool clear also work?
 
 mpathadm list LU
 mpathadm show LU /dev/rdsk/c5t1d1s2

-- 
-Gary Mills--refurb--Winnipeg, Manitoba, Canada-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] unable to access the zpool after issue a reboot

2012-01-24 Thread Gary Mills
On Tue, Jan 24, 2012 at 05:33:39PM +0530, sureshkumar wrote:
 
    I am new to Solaris & I am facing an issue with the dynapath [multipath
    s/w] for Solaris 10u10 x86.

    I am facing an issue with the zpool.

    My problem is that I am unable to access the zpool after issuing a reboot.

I've seen this happen when the zpool was built on an Iscsi LUN.  At
reboot time, the ZFS import was done before the Iscsi driver was able
to connect to its target.  After the system was up, an export and
import was successful.  The solution was to add a new service that
imported the zpool later during the reboot.

-- 
-Gary Mills--refurb--Winnipeg, Manitoba, Canada-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs defragmentation via resilvering?

2012-01-16 Thread Gary Mills
On Mon, Jan 16, 2012 at 09:13:03AM -0600, Bob Friesenhahn wrote:
 On Mon, 16 Jan 2012, Jim Klimov wrote:
 
 I think that in order to create a truly fragmented ZFS layout,
 Edward needs to do sync writes (without a ZIL?) so that every
 block and its metadata go to disk (coalesced as they may be)
 and no two blocks of the file would be sequenced on disk together.
 Although creating snapshots should give that effect...
 
 In my experience, most files on Unix systems are re-written from
 scratch.  For example, when one edits a file in an editor, the editor
 loads the file into memory, performs the edit, and then writes out
 the whole file.  Given sufficient free disk space, these files are
 unlikely to be fragmented.
 
 The cases of slowly written log files or random-access databases are
 the worst for causing fragmentation.

The case I've seen was with an IMAP server with many users.  E-mail
folders were represented as ZFS directories, and e-mail messages as
files within those directories.  New messages arrived randomly in the
INBOX folder, so that those files were written all over the place on
the storage.  Users also deleted many messages from their INBOX
folder, but the files were retained in snapshots for two weeks.  On
IMAP session startup, the server typically had to read all of the
messages in the INBOX folder, making this portion slow.  The server
also had to refresh the folder whenever new messages arrived, making
that portion slow as well.  Performance degraded when the storage
became 50% full.  It would increase markedly when the oldest snapshot
was deleted.

-- 
-Gary Mills--refurb--Winnipeg, Manitoba, Canada-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Does raidzN actually protect against bitrot? If yes - how?

2012-01-15 Thread Gary Mills
On Sun, Jan 15, 2012 at 04:06:33PM +, Peter Tribble wrote:
 On Sun, Jan 15, 2012 at 3:04 PM, Jim Klimov jimkli...@cos.ru wrote:
  Does raidzN actually protect against bitrot?
  That's a kind of radical, possibly offensive, question formula
  that I have lately.
 
 Yup, it does. That's why many of us use it.

There's actually no such thing as bitrot on a disk.  Each sector on
the disk is accompanied by a CRC that's verified by the disk
controller on each read.  It will either return correct data or report
an unreadable sector.  There's nothing in between.

Of course, if something outside of ZFS writes to the disk, then data
belonging to ZFS will be modified.  I've heard of RAID controllers or
SAN devices doing this when they modify the disk geometry or reserved
areas on the disk.

-- 
-Gary Mills--refurb--Winnipeg, Manitoba, Canada-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Any HP Servers recommendation for Openindiana (Capacity Server) ?

2012-01-03 Thread Gary Driggs
I can't comment on their 4U servers, but HP's 12U included SAS
controllers rarely allow JBOD discovery of drives. So I'd recommend an
LSI card and an external storage chassis like those available from
Promise and others.

-Gary
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Any HP Servers recommendation for Openindiana (Capacity Server) ?

2012-01-03 Thread Gary Driggs
On Jan 3, 2012, at 10:36 PM, Eric D. Mudama wrote:

 Supposedly the H200/H700 cards are just their name for the 6gbit LSI SAS 
 cards, but I haven't tested them personally.

They might use the same chipset but their firmware usually doesn't
support JBOD. Unless they've changed in the last couple of years...
Best you can do is try but if you don't see each drive individually
you'll know it's by design and not lack of skill on your part.

-Gary
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Very poor pool performance - no zfs/controller errors?!

2011-12-19 Thread Gary Mills
On Mon, Dec 19, 2011 at 11:58:57AM +, Jan-Aage Frydenbø-Bruvoll wrote:
 
 2011/12/19 Hung-Sheng Tsao (laoTsao) laot...@gmail.com:
  did you run a scrub?
 
 Yes, as part of the previous drive failure. Nothing reported there.
 
 Now, interestingly - I deleted two of the oldest snapshots yesterday,
 and guess what - the performance went back to normal for a while. Now
 it is severely dropping again - after a good while on 1.5-2GB/s I am
 again seeing write performance in the 1-10MB/s range.

That behavior is a symptom of fragmentation.  Writes slow down
dramatically when there are no contiguous blocks available.  Deleting
a snapshot provides some of these, but only temporarily.

-- 
-Gary Mills--refurb--Winnipeg, Manitoba, Canada-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] CPU sizing for ZFS/iSCSI/NFS server

2011-12-12 Thread Gary Driggs
On Dec 12, 2011, at 11:42 AM, \Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D.\ wrote:

 please check out the ZFS appliance 7120 spec 2.4Ghz /24GB memory and ZIL(SSD)

Do those appliances also use the F20 PCIe flash cards? I know the
Exadata storage cells use them but they aren't utilizing ZFS in the
Linux version of the X2-2. Has that changed with the Solaris x86
versions of the appliance? Also, does OCZ or someone make an
equivalent to the F20 now?

-Gary
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving L1ARC cache efficiency with dedup

2011-12-11 Thread Gary Driggs
What kind of drives are we talking about? Even SATA drives are
available according to application type (desktop, enterprise server,
home PVR, surveillance PVR, etc). Then there are drives with SAS and
fibre channel interfaces. Then you've got Winchester platters vs SSD
vs hybrids. But even before considering that and all the other system
factors, throughput for direct-attached storage can vary greatly not
only with interface type and storage tech; even small on-drive
controller firmware differences can introduce variances.
That's why server manufacturers like HP, Dell, et al. prefer that you
replace failed drives with one of theirs instead of something off the
shelf: they usually have firmware that's been fine-tuned in
house or in conjunction with the manufacturer.


On Dec 11, 2011, at 8:25 AM, Edward Ned Harvey
opensolarisisdeadlongliveopensola...@nedharvey.com wrote:

 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Nathan Kroenert

 That reminds me of something I have been wondering about... Why only 12x
 faster? If we are effectively reading from memory - as compared to a
 disk reading at approximately 100MB/s (which is about an average PC HDD
 reading sequentially), I'd have thought it should be a lot faster than
 12x.

 Can we really only pull stuff from cache at only a little over one
 gigabyte per second if it's dedup data?

 Actually, cpu's and memory aren't as fast as you might think.  In a system
 with 12 disks, I've had to write my own dd replacement, because dd
 if=/dev/zero bs=1024k wasn't fast enough to keep the disks busy.  Later, I
 wanted to do something similar, using unique data, and it was simply
 impossible to generate random data fast enough.  I had to tweak my dd
 replacement to write serial numbers, which still wasn't fast enough, so I
 had to tweak my dd replacement to write a big block of static data,
 followed by a serial number, followed by another big block (always smaller
 than the disk block, so it would be treated as unique when hitting the
 pool...)

 1 typical disk sustains 1Gbit/sec.  In theory, 12 should be able to sustain
 12 Gbit/sec.  According to Nathan's email, the memory bandwidth might be 25
 Gbit, of which, you probably need to both read & write, thus making it
 effectively 12.5 Gbit...  I'm sure the actual bandwidth available varies by
 system and memory type.

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] HP JBOD D2700 - ok?

2011-11-30 Thread Gary
I'd be wary of purchasing HP HBAs without getting a firsthand report
from someone that they're compatible. I've seen several HP controllers
that use LSI chip sets but are crippled in that they won't present
drives as JBOD. That said, I've used a few of the HBAs sourced from
LSI resellers and they work wonderfully with ZFS.

-Gary
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS forensics

2011-11-23 Thread Gary Driggs
Is zdb still the only way to dive in to the file system? I've seen the 
extensive work by Max Bruning on this but wonder if there are any tools that 
make this easier...?
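
For context, the kind of spelunking I mean is along these lines; the
pool, dataset, and offsets are only placeholders, and the flags are from
memory, so treat this as a sketch:

    zdb -dddd tank/fs          -- dump object/dnode details for a dataset
    zdb -R tank 0:400000:200   -- read a raw block, given vdev:offset:size in hex

It works, but it's hardly convenient.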

-Gary
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS forensics

2011-11-23 Thread Gary Driggs
On Nov 23, 2011, Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D. wrote:

 did you see this link

Thank you for this. Some of the other refs it lists will come in handy as well.

kind regards,
Gary
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] sd_max_throttle

2011-11-03 Thread Gary
Hi folks,

I'm reading through some I/O performance tuning documents and am
finding some older references to sd_max_throttle kernel/project
settings. Have there been any recent books or documentation written
that talks about this more in depth? It seems to be more appropriate
for FC or DAS but I'm wondering if anyone has had to touch this or
other settings with ZFS appliances they've built...?
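
The settings I keep running across look like the following /etc/system
entries.  Vendor-recommended values vary, so the numbers here are only
placeholders:

    set sd:sd_max_throttle=20
    set ssd:ssd_max_throttle=20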

-Gary
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Does the zpool cache file affect import?

2011-08-29 Thread Gary Mills
I have a system with ZFS root that imports another zpool from a start
method.  It uses a separate cache file for this zpool, like this:

if [ -f $CCACHE ]
then
echo Importing $CPOOL with cache $CCACHE
zpool import -o cachefile=$CCACHE -c $CCACHE $CPOOL
else
echo Importing $CPOOL with device scan
zpool import -o cachefile=$CCACHE $CPOOL
fi

It also exports that zpool from the stop method, which has the side
effect of deleting the cache.  This all works nicely when the server
is rebooted.

What will happen when the server is halted without running the stop
method, so that that zpool is not exported?  I know that there is a
flag in the zpool that indicates when it's been exported cleanly.  The
cache file will exist when the server reboots.  Will the import fail
with the `The pool was last accessed by another system.' error, or
will the import succeed?  Does the cache change the import behavior?
Does it recognize that the server is the same system?  I don't want
to include the `-f' flag in the commands above when it's not needed.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How create a FAT filesystem on a zvol?

2011-07-12 Thread Gary Mills
On Sun, Jul 10, 2011 at 11:16:02PM +0700, Fajar A. Nugraha wrote:
 On Sun, Jul 10, 2011 at 10:10 PM, Gary Mills mi...@cc.umanitoba.ca wrote:
  The `lofiadm' man page describes how to export a file as a block
  device and then use `mkfs -F pcfs' to create a FAT filesystem on it.
 
  Can't I do the same thing by first creating a zvol and then creating
  a FAT filesystem on it?
 
 seems not.
[...]
  Some Solaris tools (like fdisk, or mkfs -F pcfs) need disk geometry
  to function properly. zvols don't provide that. If you want to use
  zvols to work with such tools, the easiest way would be using lofi, or
  exporting the zvol as an iSCSI share and importing it again.

  For example, if you have a 10MB zvol and use lofi, fdisk would show
  this geometry
 
  Total disk size is 34 cylinders
  Cylinder size is 602 (512 byte) blocks
 
 ... which will then be used if you run mkfs -F pcfs -o
 nofdisk,size=20480. Without lofi, the same command would fail with
 
 Drive geometry lookup (need tracks/cylinder and/or sectors/track:
 Operation not supported

So, why can I do it with UFS?

# zfs create -V 10m rpool/vol1
# newfs /dev/zvol/rdsk/rpool/vol1
newfs: construct a new file system /dev/zvol/rdsk/rpool/vol1: (y/n)? y
Warning: 4130 sector(s) in last cylinder unallocated
/dev/zvol/rdsk/rpool/vol1:  20446 sectors in 4 cylinders of 48 tracks, 128 
sectors
10.0MB in 1 cyl groups (14 c/g, 42.00MB/g, 20160 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
 32,

Why is this different from PCFS?
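
For the record, the lofi route described above would look roughly like
this, using a plain file rather than the zvol itself (sizes and paths
are only illustrative):

    mkfile 10m /export/fat.img
    lofiadm -a /export/fat.img                        (prints e.g. /dev/lofi/1)
    mkfs -F pcfs -o nofdisk,size=20480 /dev/rlofi/1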

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] How create a FAT filesystem on a zvol?

2011-07-10 Thread Gary Mills
The `lofiadm' man page describes how to export a file as a block
device and then use `mkfs -F pcfs' to create a FAT filesystem on it.

Can't I do the same thing by first creating a zvol and then creating
a FAT filesystem on it?  Nothing I've tried seems to work.  Isn't the
zvol just another block device?

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] write cache partial-disk pools (was Server with 4 drives, how to configure ZFS?)

2011-06-20 Thread Gary Mills
On Sun, Jun 19, 2011 at 08:03:25AM -0700, Richard Elling wrote:
 On Jun 19, 2011, at 6:28 AM, Edward Ned Harvey wrote:
  From: Richard Elling [mailto:richard.ell...@gmail.com]
  Sent: Saturday, June 18, 2011 7:47 PM
  
  Actually, all of the data I've gathered recently shows that the number of
  IOPS does not significantly increase for HDDs running random workloads.
  However the response time does :-( 
  
  Could you clarify what you mean by that?  
 
 Yes. I've been looking at what the value of zfs_vdev_max_pending should be.
 The old value was 35 (a guess, but a really bad guess) and the new value is
 10 (another guess, but a better guess).  I observe that data from a fast, 
 modern 
 HDD, for  1-10 threads (outstanding I/Os) the IOPS ranges from 309 to 333 
 IOPS. 
 But as we add threads, the average response time increases from 2.3ms to 
 137ms.
 Since the whole idea is to get lower response time, and we know disks are not 
 simple queues so there is no direct IOPS to response time relationship, maybe 
 it
 is simply better to limit the number of outstanding I/Os.

How would this work for a storage device with an intelligent
controller that provides only a few LUNs to the host, even though it
contains a much larger number of disks?  I would expect the controller
to be more efficient with a large number of outstanding IOs because it
could distribute those IOs across the disks.  It would, of course,
require a non-volatile cache to provide fast turnaround for writes.
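
For anyone who wants to experiment with the value Richard mentions, it
can be set in /etc/system or changed on a live system; the 10 below is
just the example value from above:

    set zfs:zfs_vdev_max_pending=10              (in /etc/system, takes effect at boot)
    echo zfs_vdev_max_pending/W0t10 | mdb -kw    (live change)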

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] JBOD recommendation for ZFS usage

2011-05-30 Thread Gary Mills
On Mon, May 30, 2011 at 08:06:31AM +0200, Thomas Nau wrote:
 
 We are looking for JBOD systems which
 (1) hold 20+ 3.3 SATA drives
 (2) are rack mountable
  (3) have all the nice hot-swap stuff
 (4) allow 2 hosts to connect via SAS (4+ lines per host) and see
 all available drives as disks, no RAID volume.
 In a perfect world both hosts would connect each using
 two independent SAS connectors
 
 The box will be used in a ZFS Solaris/based fileserver in a
 fail-over cluster setup. Only one host will access a drive
 at any given time.

I'm using a J4200 array as shared storage for a cluster.  It needs a
SAS HBA in each cluster node.  The disks in the array are visible to
both nodes in the cluster.  Here's the feature list.  I don't know if
it's still available:

Sun Storage J4200 Array:
* Scales up to 48 SAS/SATA disk drives
* Up to 72 Gb/sec of total bandwidth
* Four x4-wide 3 Gb/sec SAS host/uplink ports (48 Gb/sec bandwidth)
* Two x4-wide 3 Gb/sec SAS expansion ports (24 Gb/sec bandwidth)

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best practice for boot partition layout in ZFS

2011-04-06 Thread Gary Mills
On Wed, Apr 06, 2011 at 08:08:06AM -0700, Erik Trimble wrote:
On 4/6/2011 7:50 AM, Lori Alt wrote:
On 04/ 6/11 07:59 AM, Arjun YK wrote:
  
I'm not sure there's a defined best practice.  Maybe someone else
can answer that question.  My guess is that in environments where,
before, a separate ufs /var slice was used, a separate zfs /var
dataset with a quota might now be appropriate.
Lori

Traditionally, the reason for a separate /var was one of two major
items:
(a)  /var was writable, and / wasn't - this was typical of diskless or
minimal local-disk configurations. Modern packaging systems are making
this kind of configuration increasingly difficult.
(b) /var held a substantial amount of data, which needed to be handled
separately from /  - mail and news servers are a classic example
For typical machines nowadays, with large root disks, there is very
little chance of /var suddenly exploding and filling /  (the classic
example of being screwed... wink).  Outside of the above two cases,
about the only other place I can see that having /var separate is a
good idea is for certain test machines, where you expect frequent
memory dumps (in /var/crash) - if you have a large amount of RAM,
you'll need a lot of disk space, so it might be good to limit /var in
this case by making it a separate dataset.

People forget (c), the ability to set different filesystem options on
/var.  You might want to have `setuid=off' for improved security, for
example.
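
Something like this (the dataset name and numbers are only illustrative):

    zfs create -o setuid=off -o devices=off -o quota=8g rpool/var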

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] One LUN per RAID group

2011-02-14 Thread Gary Mills
With ZFS on a Solaris server using storage on a SAN device, is it
reasonable to configure the storage device to present one LUN for each
RAID group?  I'm assuming that the SAN and storage device are
sufficiently reliable that no additional redundancy is necessary on
the Solaris ZFS server.  I'm also assuming that all disk management is
done on the storage device.

I realize that it is possible to configure more than one LUN per RAID
group on the storage device, but doesn't ZFS assume that each LUN
represents an independent disk, and schedule I/O accordingly?  In that
case, wouldn't ZFS I/O scheduling interfere with I/O scheduling
already done by the storage device?

Is there any reason not to use one LUN per RAID group?

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] One LUN per RAID group

2011-02-14 Thread Gary Mills
On Mon, Feb 14, 2011 at 03:04:18PM -0500, Paul Kraus wrote:
 On Mon, Feb 14, 2011 at 2:38 PM, Gary Mills mi...@cc.umanitoba.ca wrote:
 
  Is there any reason not to use one LUN per RAID group?
[...]
 In other words, if you build a zpool with one vdev of 10GB and
 another with two vdev's each of 5GB (both coming from the same array
 and raid set) you get almost exactly twice the random read performance
 from the 2x5 zpool vs. the 1x10 zpool.

This finding is surprising to me.  How do you explain it?  Is it
simply that you get twice as many outstanding I/O requests with two
LUNs?  Is it limited by the default I/O queue depth in ZFS?  After
all, all of the I/O requests must be handled by the same RAID group
once they reach the storage device.

 Also, using a 2540 disk array setup as a 10 disk RAID6 (with 2 hot
 spares), you get substantially better random read performance using 10
 LUNs vs. 1 LUN. While inconvenient, this just reflects the scaling of
  ZFS with number of vdevs and not spindles.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zpool-poolname has 99 threads

2011-01-31 Thread Gary Mills
After an upgrade of a busy server to Oracle Solaris 10 9/10, I notice
a process called zpool-poolname that has 99 threads.  This seems to be
a limit, as it never goes above that.  It is lower on workstations.
The `zpool' man page says only:

  Processes
 Each imported pool has an associated process,  named  zpool-
 poolname.  The  threads  in  this process are the pool's I/O
 processing threads, which handle the compression,  checksum-
 ming,  and other tasks for all I/O associated with the pool.
 This process exists to  provides  visibility  into  the  CPU
 utilization  of the system's storage pools. The existence of
 this process is an unstable interface.

There are several thousand processes doing ZFS I/O on the busy server.
Could this new process be a limitation in any way?  I'd just like to
rule it out before looking further at I/O performance.
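
For what it's worth, the per-thread activity of that process can be
watched with something like this (the pool name is a placeholder):

    prstat -mL -p `pgrep -f zpool-mypool`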

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sliced iSCSI device for doing RAIDZ?

2010-09-24 Thread Gary Mills
On Fri, Sep 24, 2010 at 12:01:35AM +0200, Alexander Skwar wrote:
 
  Suppose they gave you two huge lumps of storage from the SAN, and you
  mirrored them with ZFS.  What would you do if ZFS reported that one of
  its two disks had failed and needed to be replaced?  You can't do disk
  management with ZFS in this situation anyway because those aren't real
  disks.  Disk management all has to be done on the SAN storage device.
 
 Yes. I was rather thinking about RAIDZ instead of mirroring.

I was just using a simpler example.

 Anyway. Without redundancy, ZFS cannot do recovery, can
  it? As far as I understand, it could detect block-level corruption,
  even if there's no redundancy. But it could not correct such a
 corruption.
 
 Or is that a wrong understanding?

That's correct, but it also should never happen.

 If I got the gist of what you wrote, it boils down to how reliable
 the SAN is? But also SANs could have block level corruption,
 no? I'm a bit confused, because of the (perceived?) contra-
 diction to the Best Practices Guide? :)

The real problem is that ZFS was not designed to run in a SAN
environment, that is one where all of the disk management and
sufficient redundancy reside in the storage device on the SAN.  ZFS
certainly can't do any disk management in this situation.  Error
detection and correction is still a debatable issue, one that quickly
becomes exceedingly complex.  The decision rests on probabilities
rather than certainties.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sliced iSCSI device for doing RAIDZ?

2010-09-23 Thread Gary Mills
On Tue, Sep 21, 2010 at 05:48:09PM +0200, Alexander Skwar wrote:
 
 We're using ZFS via iSCSI on a S10U8 system. As the ZFS Best
 Practices Guide http://j.mp/zfs-bp states, it's advisable to use
 redundancy (ie. RAIDZ, mirroring or whatnot), even if the underlying
 storage does its own RAID thing.
 
  Now, our storage does RAID and the storage people say it is
 impossible to have it export iSCSI devices which have no redundancy/
 RAID.

If you have a reliable Iscsi SAN and a reliable storage device, you
don't need the additional redundancy provided by ZFS.

  Actually, where would there be a difference? I mean, those iSCSI
 devices anyway don't represent real disks/spindles, but it's just
 some sort of abstractation. So, if they'd give me 3x400 GB compared
 to 1200 GB in one huge lump like they do now, it could be, that
 those would use the same spots on the real hard drives.

Suppose they gave you two huge lumps of storage from the SAN, and you
mirrored them with ZFS.  What would you do if ZFS reported that one of
its two disks had failed and needed to be replaced?  You can't do disk
management with ZFS in this situation anyway because those aren't real
disks.  Disk management all has to be done on the SAN storage device.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Howto reclaim space under legacy mountpoint?

2010-09-19 Thread Gary Gendel
I moved my home directories to a new disk and then mounted the disk using a 
legacy mount point over /export/home.  Here is the output of the zfs list:

NAME USED  AVAIL  REFER  MOUNTPOINT
rpool   55.8G  11.1G83K  /rpool
rpool/ROOT  21.1G  11.1G19K  legacy
rpool/ROOT/snv-134  21.1G  11.1G  14.3G  /
rpool/dump  1.97G  11.1G  1.97G  -
rpool/export30.8G  11.1G23K  /export
rpool/export/home   30.8G  11.1G  29.3G  legacy
rpool/swap  1.97G  12.9G   144M  -
users   32.8G   881G  31.1G  /export/home

The question is how to remove the files from the original rpool/export/home
dataset (no longer a mount point)?  I'm a bit nervous about doing a:

zfs destroy rpool/export/home

Is this the correct and safe methodology?
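
Before I do, I assume I can confirm which dataset is actually mounted at
/export/home with something like:

    df -h /export/home
    zfs get -r mounted,mountpoint rpool/export users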

Thanks,
Gary
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] can ufs zones and zfs zones coexist on a single global zone

2010-09-17 Thread Gary Dunn
Looking at migrating zones built on an M8000 and M5000 to a new M9000. On the 
M9000 we started building new deployments using ZFS. The environments on the 
M8/M5 are UFS. These are whole-root zones; they will use global zone resources.

Can this be done? Or would a ZFS migration be needed? 

thank you,
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Newbie question

2010-09-05 Thread Gary Gendel
I would like to migrate my home directories to a new mirror.  Currently, I have 
them in rpool:

rpool/export
rpool/export/home

I've created a mirror pool, users.

I figure the steps are:
1) snapshot rpool/export/home
2) send the snapshot to users.
3) unmount rpool/export/home
4) mount pool users to /export/home

So, what are the appropriate commands for these steps?
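
My guess at the commands, based on the man pages (the snapshot name is
just what I'd pick, and I may well be missing something):

    zfs snapshot rpool/export/home@move
    zfs send rpool/export/home@move | zfs receive users/home
    zfs unmount rpool/export/home
    zfs set mountpoint=/export/home users/home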

Thanks,
Gary
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Newbie question

2010-09-05 Thread Gary Gendel
Norm,

Thank you.  I just wanted to double-check to make sure I didn't mess things up.
There were steps that left me head-scratching after reading the man page.  I'll
spend a bit more time re-reading it using the steps outlined so I understand
them fully.

Gary
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS with Equallogic storage

2010-08-22 Thread Gary Mills
On Sat, Aug 21, 2010 at 06:36:37PM -0400, Toby Thain wrote:
 
 On 21-Aug-10, at 3:06 PM, Ross Walker wrote:
 
 On Aug 21, 2010, at 2:14 PM, Bill Sommerfeld bill.sommerf...@oracle.com 
  wrote:
 
 On 08/21/10 10:14, Ross Walker wrote:
 ...
 Would I be better off forgoing resiliency for simplicity, putting  
 all my faith into the Equallogic to handle data resiliency?
 
 IMHO, no; the resulting system will be significantly more brittle.
 
 Exactly how brittle I guess depends on the Equallogic system.
 
 If you don't let zfs manage redundancy, Bill is correct: it's a more  
 fragile system that *cannot* self-heal data errors in the (deep)
 stack. Quantifying the increased risk is a question that Richard
 Elling could probably answer :)

That's because ZFS does not have a way to handle a large class of
storage designs, specifically the ones with raw storage and disk
management being provided by reliable SAN devices.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris startup script location

2010-08-18 Thread Gary Mills
On Wed, Aug 18, 2010 at 12:16:04AM -0700, Alxen4 wrote:
 Is there any way to run a start-up script before a non-root pool is mounted?

 For example, I'm trying to use a ramdisk as a ZIL device (ramdiskadm).
 So I need to create the ramdisk before the actual pool is mounted; otherwise it
 complains that the log device is missing :)

Yes, it's actually quite easy.  You need to create an SMF manifest and
method.  The manifest should make the ZFS mount dependent on it with
the `dependent' and `/dependent' tag pair.  It also needs to be
dependent on the resources it needs, with the `dependency' and
`/dependency' pairs.  It should also specify a `single_instance/' and
`transient' service.  The method script can do whatever the mount
requires, such as creating the ramdisk.
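
A minimal method script might look like this; the ramdisk name, size,
and paths are only placeholders, and the manifest itself is ordinary
SMF XML wiring up the dependencies described above:

    #!/sbin/sh
    . /lib/svc/share/smf_include.sh

    case "$1" in
    start)
            # Create the ramdisk that will back the log device.
            /usr/sbin/ramdiskadm -a zilram 1g || exit $SMF_EXIT_ERR_FATAL
            ;;
    stop)
            /usr/sbin/ramdiskadm -d zilram
            ;;
    esac
    exit $SMF_EXIT_OK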

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Opensolaris is apparently dead

2010-08-16 Thread Gary Mills
On Fri, Aug 13, 2010 at 01:54:13PM -0700, Erast wrote:
 
 On 08/13/2010 01:39 PM, Tim Cook wrote:
 http://www.theregister.co.uk/2010/08/13/opensolaris_is_dead/
 
 I'm a bit surprised at this development... Oracle really just doesn't
 get it.  The part that's most disturbing to me is the fact they won't be
 releasing nightly snapshots.  It appears they've stopped Illumos in its
 tracks before it really even got started (perhaps that explains the
 timing of this press release)
 
 Wrong. Be patient, with the pace of current Illumos development it soon 
 will have all the closed binaries liberated and ready to sync up with 
 promised ON code drops as dictated by GPL and CDDL licenses.

Is this what you mean, from:

http://hub.opensolaris.org/bin/view/Main/opensolaris_license

Any Covered Software that You distribute or otherwise make available
in Executable form must also be made available in Source Code form and
that Source Code form must be distributed only under the terms of this
License. You must include a copy of this License with every copy of
the Source Code form of the Covered Software You distribute or
otherwise make available. You must inform recipients of any such
Covered Software in Executable form as to how they can obtain such
Covered Software in Source Code form in a reasonable manner on or
through a medium customarily used for software exchange.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS development moving behind closed doors

2010-08-13 Thread Gary Mills
If this information is correct,

http://opensolaris.org/jive/thread.jspa?threadID=133043

further development of ZFS will take place behind closed doors.
Opensolaris will become the internal development version of Solaris
with no public distributions.  The community has been abandoned.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs upgrade unmounts filesystems

2010-07-29 Thread Gary Mills
Zpool upgrade on this system went fine, but zfs upgrade failed:

# zfs upgrade -a
cannot unmount '/space/direct': Device busy
cannot unmount '/space/dcc': Device busy
cannot unmount '/space/direct': Device busy
cannot unmount '/space/imap': Device busy
cannot unmount '/space/log': Device busy
cannot unmount '/space/mysql': Device busy
2 filesystems upgraded

Do I have to shut down all the applications before upgrading the
filesystems?  This is on a Solaris 10 5/09 system.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs upgrade unmounts filesystems

2010-07-29 Thread Gary Mills
On Thu, Jul 29, 2010 at 10:26:14PM +0200, Pawel Jakub Dawidek wrote:
 On Thu, Jul 29, 2010 at 12:00:08PM -0600, Cindy Swearingen wrote:
  
  I found a similar zfs upgrade failure with the device busy error, which
  I believe was caused by a file system mounted under another file system.
  
  If this is the cause, I will file a bug or find an existing one.

No, it was caused by processes active on those filesystems.

  The workaround is to unmount the nested file systems and upgrade them
  individually, like this:
  
  # zfs upgrade space/direct
  # zfs upgrade space/dcc

Except that I couldn't unmount them because the filesystems were busy.

 'zfs upgrade' unmounts the file system first, which makes it hard to upgrade,
 for example, the root file system. The only work-around I found is to clone
 the root file system (the clone is created with the most recent version), change
 the root file system to the newly created clone, reboot, upgrade the original root
 file system, change the root file system back, reboot, and destroy the clone.

In this case it wasn't the root filesystem, but I still had to disable
twelve services before doing the upgrade and enable them afterwards.
`fuser -c' is useful to identify the processes.  Mapping them to
services can be difficult.  The server is essentially down during the
upgrade.
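
For example, for one of the filesystems above:

    fuser -c /space/imap
    ps -fp <pid>      (for each PID reported, to see which daemon owns it)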

For a root filesystem, you might have to boot off the failsafe archive
or a DVD and import the filesystem in order to upgrade it.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] root pool expansion

2010-07-28 Thread Gary Gendel
Right now I have a machine with a mirrored boot setup.  The SAS drives are 43Gs 
and the root pool is getting full.

I do a backup of the pool nightly, so I feel confident that I don't need to 
mirror the drive and can break the mirror and expand the pool with the detached 
drive.

I understand how to do this on a normal pool, but are there any restrictions on
doing this on the root pool?  Are there any grub issues?

Thanks,
Gary
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS snapshot zvols/iscsi send backup

2010-07-13 Thread Gary Leong
Thanks for quick response. I appreciate it much.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS snapshot zvols/iscsi send backup

2010-07-12 Thread Gary Leong
I'm looking to use ZFS to export ISCSI volumes to a Windows/Linux client.
Essentially, I'm looking to create two ZFS storage machines that I will export
ISCSI targets from.  Then from the client side, I will enable mirroring.  The
two ZFS machines will be independent of each other.  I had a question about
snapshotting of ISCSI zvols.

If I do a snapshot of an ISCSI volume, it snapshots the blocks.  I know that
sending the blocks will allow for some form of replication.  However, if I
send the snapshot to a file, will I be able to recover the ISCSI volume from
the file(s)?

e.g.

zfs send tank/t...@1 | gzip -c > zfs.tank.test.gz

Can I recover this ISCSI volume from zfs.tank.test.gz by sending it directly to 
another ZFS machine?  Will I then be able to mount the ZFS volume created from 
this file and have my filesystem be the way it was?  If I assemble the blocks 
like they were before, I assume it assembles everything the way it was before,
including the filesystem and such.
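
I.e., on the receiving machine I'm imagining something like this, with
the file name carried over from the example above:

    gzcat zfs.tank.test.gz | zfs receive tank/test-restored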

Or am I incorrect about this?

Gary
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] are these errors dangerous

2010-06-08 Thread Gary Mitchell
I have seen this too

I'm guessing you have SATA disks which are on an iSCSI target.
I'm also guessing you have used something like

iscsitadm create target --type raw -b /dev/dsk/c4t0d00 c4t0d0

i.e., you are not using the zfs shareiscsi property on a zfs volume, but
creating the target directly from the cNtNdN device
(dsk or rdsk, it doesn't seem to matter).

You see these errors (always block 0) when the iSCSI initiator accesses the
disks.

annoying ... but the iSCSI transactions seem to be OK.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS disks hitting 100% busy

2010-06-07 Thread Gary Mills
Our e-mail server started to slow down today.  One of the disk devices
is frequently at 100% usage.  The heavy writes seem to cause reads to
run quite slowly.  In the statistics below, `c0t0d0' is UFS, containing
the / and /var slices.  `c0t1d0' is ZFS, containing /var/log/syslog,
a couple of databases, and the GNU mailman files.  It's this latter
disk that's been hitting 100% usage.

  $ iostat -xn 5 3
                      extended device statistics
     r/s    w/s    kr/s     kw/s wait actv wsvc_t asvc_t  %w  %b device
     8.2   57.8   142.6    538.2  0.0  1.7    0.1   25.2   0  48 c0t0d0
     5.8  273.0   303.4  24115.9  0.0 18.6    0.0   66.7   0  73 c0t1d0
                      extended device statistics
     r/s    w/s    kr/s     kw/s wait actv wsvc_t asvc_t  %w  %b device
     0.0   57.2     0.0    294.6  0.0  1.3    0.0   22.1   0  64 c0t0d0
     0.2  370.2     1.1  33968.5  0.0 31.4    0.0   84.9   1 100 c0t1d0
                      extended device statistics
     r/s    w/s    kr/s     kw/s wait actv wsvc_t asvc_t  %w  %b device
     0.8   61.0     6.4    503.0  0.0  2.5    0.0   40.0   0  70 c0t0d0
     0.0  295.8     0.0  35273.3  0.0 35.0    0.0  118.3   0 100 c0t1d0

This system is running Solaris 10 5/09 on a Sun 4450 server.  Both the
disk devices are actually hardware-mirrored pairs of SAS disks, with
the Adaptec RAID controller.  Can anything be done to either reduce
the amount of I/O or to improve the write bandwidth?  I assume that
adding another disk device to the zpool will double the bandwidth.

/var/log/syslog is quite large, reaching about 600 megabytes before
it's rotated.  This takes place each night, with compression bringing
it down to about 70 megabytes.  The server handles about 500,000
messages a day.
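
To be clear, by adding another disk device I just mean striping a second
(hardware-mirrored) device into the existing pool, roughly like this (pool
and device names are illustrative):

  # add a second mirrored LUN as a new top-level vdev;
  # ZFS should then spread new writes across both devices
  zpool add mailpool c0t2d0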

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is the J4200 SAS array suitable for Sun Cluster?

2010-05-17 Thread Gary Mills
On Sun, May 16, 2010 at 01:14:24PM -0700, Charles Hedrick wrote:
 We use this configuration. It works fine. However I don't know
 enough about the details to answer all of your questions.
 
 The disks are accessible from both systems at the same time. Of
 course with ZFS you had better not actually use them from both
 systems.

That's what I wanted to know.  I'm not familiar with SAS fabrics, so
it's good to know that they operate similarly to multi-initiator SCSI
in a cluster.

 Actually, let me be clear about what we do. We have two J4200's and
 one J4400. One J4200 uses SAS disks, the others SATA. The two with
 SATA disks are used in Sun cluster configurations as NFS
 servers. They fail over just fine, losing no state. The one with SAS
 is not used with Sun Cluster. Rather, it's a Mysql server with two
 systems, one of them as a hot spare. (It also acts as a mysql slave
 server, but it uses different storage for that.) That means that our
 actual failover experience is with the SATA configuration. I will
 say from experience that in the SAS configuration both systems see
 the disks at the same time. I even managed to get ZFS to mount the
 same pool from both systems, which shouldn't be possible. Behavior
 was very strange until we realized what was going on.

Our situation is that we only need a small amount of shared storage
in the cluster.  It's intended for high-availability of core services,
such as DNS and NIS, rather than as a NAS server.

 I get the impression that they have special hardware in the SATA
 version that simulates SAS dual interface drives. That's what lets
 you use SATA drives in a two-node configuration. There's also some
 additional software setup for that configuration.

That would be the SATA interposer that does that.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Does ZFS use large memory pages?

2010-05-07 Thread Gary Mills
On Thu, May 06, 2010 at 07:46:49PM -0700, Rob wrote:
 Hi Gary,
 I would not remove this line in /etc/system.
 We have been combatting this bug for a while now on our ZFS file
 system running JES Commsuite 7.
 
 I would be interested in finding out how you were able to pin point
 the problem.

Our problem was a year ago.  Careful reading of Sun bug reports
helped.  Opening a support case with Sun helped even more.  Large
memory pages were likely not involved.

 We seem to have no worries with the system currently, but when the
 file system gets above 80% we seems to have quite a number of
 issues, much the same as what you've had in the past, ps and prstats
 hanging.
 
 are you able to tell me the IDR number that you applied?

The IDR was only needed last year.  Upgrading to Solaris 10 10/09
and applying the latest patches resolved the problem.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Is the J4200 SAS array suitable for Sun Cluster?

2010-05-03 Thread Gary Mills
I'm setting up a two-node cluster with 1U x86 servers.  It needs a
small amount of shared storage, with two or four disks.  I understand
that the J4200 with SAS disks is approved for this use, although I
haven't seen this information in writing.  Does anyone have experience
with this sort of configuration?  I have a few questions.

I understand that the J4200 with SATA disks will not do SCSI
reservations.  Will it with SAS disks?

The X4140 seems to require two SAS HBAs, one for the internal disks
and one for the external disks.  Is this correct?

Will the disks in the J4200 be accessible from both nodes, so that
the cluster can fail over the storage?  I know this works with a
multi-initiator SCSI bus, but I don't know about SAS behavior.

Is there a smaller, and cheaper, SAS array that can be used in this
configuration?  It would still need to have redundant power and
redundant SAS paths.

I plan to use ZFS everywhere, for the root filesystem and the shared
storage.  The only exception will be UFS for /globaldevices .

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?

2010-04-26 Thread Gary Mills
On Mon, Apr 26, 2010 at 01:32:33PM -0500, Dave Pooser wrote:
 On 4/26/10 10:10 AM, Richard Elling richard.ell...@gmail.com wrote:
 
  SAS shines with multiple connections to one or more hosts.  Hence, SAS
  is quite popular when implementing HA clusters.
 
 So that would be how one builds something like the active/active controller
 failover in standalone RAID boxes. Is there a good resource on doing
 something like that with an OpenSolaris storage server? I could see that as
 a project I might want to attempt.

This is interesting.  I have a two-node SPARC cluster that uses a
multi-initiator SCSI array for shared storage.  As an application
server, it need only two disks in the array.  They are a ZFS mirror.
This all works quite nicely under Sun Cluster.

I'd like to duplicate this configuration with two small x86 servers
and a small SAS array, also with only two disks.  It should be easy to
find a pair of 1U servers, but what's the smallest SAS array that's
available?  Does it need an array controller?  What's needed on the
servers to connect to it?

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Proposition of a new zpool property.

2010-03-20 Thread Gary Gendel
I'm not sure I like this at all.  Some of my pools take hours to scrub.  I have 
a cron job that runs scrubs in sequence: start one pool's scrub, poll until 
it's finished, start the next and wait, and so on, so I don't create too 
much load and bring all I/O to a crawl.

The job is launched once a week, so the scrubs have plenty of time to finish. :)

Scrubs every hour?  Some of my pools would be in continuous scrub.
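
For context, the weekly cron job is essentially a loop like this (pool names
and the status check are simplified):

  #!/bin/sh
  # scrub each pool in turn, waiting for one to finish before starting the next
  for pool in tank archive media
  do
      zpool scrub $pool
      while zpool status $pool | grep "scrub in progress" > /dev/null
      do
          sleep 300
      done
  done
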
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Snapshot recycle freezes system activity

2010-03-11 Thread Gary Mills
On Thu, Mar 04, 2010 at 04:20:10PM -0600, Gary Mills wrote:
 We have an IMAP e-mail server running on a Solaris 10 10/09 system.
 It uses six ZFS filesystems built on a single zpool with 14 daily
 snapshots.  Every day at 11:56, a cron command destroys the oldest
 snapshots and creates new ones, both recursively.  For about four
 minutes thereafter, the load average drops and I/O to the disk devices
 drops to almost zero.  Then, the load average shoots up to about ten
 times normal and then declines to normal over about four minutes, as
 disk activity resumes.  The statistics return to their normal state
 about ten minutes after the cron command runs.

I'm pleased to report that I found the culprit and the culprit was me!
Well, ZFS peculiarities may be involved as well.  Let me explain:

We had a single second-level filesystem and five third-level
filesystems, all with 14 daily snapshots.  The snapshots were
maintained by a cron command that did a `zfs list -rH -t snapshot -o
name' to get the names of all of the snapshots, extracted the part
after the `@', and then sorted them uniquely to get a list of suffixes
that were older than 14 days.  The suffixes were Julian dates so they
sorted correctly.  It then did a `zfs destroy -r' to delete them.  The
recursion was always done from the second-level filesystem.  The
top-level filesystem was empty and had no snapshots.  Here's a portion
of the script:

zfs list -rH -t snapshot -o name $FS | \
cut -d@ -f2 | \
sort -ur | \
sed 1,${NR}d | \
xargs -I '{}' zfs destroy -r $FS@'{}'

zfs snapshot -r $FS@$jd

Just over two weeks ago, I rearranged the filesystems so that the
second-level filesystem was newly-created and initially had no
snapshots.  It did have a snapshot taken every day thereafter, so that
eventually it also had 14 of them.  It was during that interval that
the complaints started.  My statistics clearly showed the performance
stall and subsequent recovery.  Once that filesystem reached 14
snapshots, the complaints stopped and the statistics showed only a
modest increase in CPU activity, but no stall.

During this interval, the script was doing a recursive destroy for a
snapshot that didn't exist at the specified level, but only existed in
the descendent filesystems.  I'm assuming that that unusual situation
was the cause of the stall, although I don't have good evidence.  By
the time the complaints reached my ears, and I was able to refine my
statistics gathering sufficiently, the problem had gone away.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Snapshot recycle freezes system activity

2010-03-09 Thread Gary Mills
On Mon, Mar 08, 2010 at 03:18:34PM -0500, Miles Nordin wrote:
  gm == Gary Mills mi...@cc.umanitoba.ca writes:
 
 gm destroys the oldest snapshots and creates new ones, both
 gm recursively.
 
 I'd be curious if you try taking the same snapshots non-recursively
 instead, does the pause go away?  

I'm still collecting statistics, but that is one of the things I'd
like to try.

 Because recursive snapshots are special: they're supposed to
 atomically synchronize the cut-point across all the filesystems
 involved, AIUI.  I don't see that recursive destroys should be
 anything special though.
 
 gm Is it destroying old snapshots or creating new ones that
 gm causes this dead time?
 
 sortof seems like you should tell us this, not the other way
 around. :)  Seriously though, isn't that easy to test?  And I'm curious
 myself too.

Yes, that's another thing I'd like to try.  I'll just put a `sleep'
in the script between the two actions to see if the dead time moves
later in the day.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Snapshot recycle freezes system activity

2010-03-09 Thread Gary Mills
On Mon, Mar 08, 2010 at 01:23:10PM -0800, Bill Sommerfeld wrote:
 On 03/08/10 12:43, Tomas Ögren wrote:
 So we tried adding 2x 4GB USB sticks (Kingston Data
 Traveller Mini Slim) as metadata L2ARC and that seems to have pushed the
 snapshot times down to about 30 seconds.
 
 Out of curiosity, how much physical memory does this system have?

Mine has 64 GB of memory with the ARC limited to 32 GB.  The Cyrus
IMAP processes, thousands of them, use memory mapping extensively.
I don't know if this design affects the snapshot recycle behavior.
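
For reference, the 32 GB cap is just the usual ARC tunable in /etc/system,
roughly:

  * /etc/system: limit the ZFS ARC to 32 GB (value in bytes)
  set zfs:zfs_arc_max=34359738368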

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Snapshot recycle freezes system activity

2010-03-05 Thread Gary Mills
On Thu, Mar 04, 2010 at 04:20:10PM -0600, Gary Mills wrote:
 We have an IMAP e-mail server running on a Solaris 10 10/09 system.
 It uses six ZFS filesystems built on a single zpool with 14 daily
 snapshots.  Every day at 11:56, a cron command destroys the oldest
 snapshots and creates new ones, both recursively.  For about four
 minutes thereafter, the load average drops and I/O to the disk devices
 drops to almost zero.  Then, the load average shoots up to about ten
 times normal and then declines to normal over about four minutes, as
 disk activity resumes.  The statistics return to their normal state
 about ten minutes after the cron command runs.

I should mention that this seems to be a new problem.  We've been
using the same scheme to cycle snapshots for several years.  The
complaints of an unresponsive interval have only happened recently.
I'm still waiting for our help desk to report on when the complaints
started.  It may be the result of some recent change we made, but
so far I can't tell what that might have been.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Snapshot recycle freezes system activity

2010-03-04 Thread Gary Mills
We have an IMAP e-mail server running on a Solaris 10 10/09 system.
It uses six ZFS filesystems built on a single zpool with 14 daily
snapshots.  Every day at 11:56, a cron command destroys the oldest
snapshots and creates new ones, both recursively.  For about four
minutes thereafter, the load average drops and I/O to the disk devices
drops to almost zero.  Then, the load average shoots up to about ten
times normal and then declines to normal over about four minutes, as
disk activity resumes.  The statistics return to their normal state
about ten minutes after the cron command runs.

Is it destroying old snapshots or creating new ones that causes this
dead time?  What does each of these procedures do that could affect
the system?  What can I do to make this less visible to users?

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Snapshot recycle freezes system activity

2010-03-04 Thread Gary Mills
On Thu, Mar 04, 2010 at 07:51:13PM -0300, Giovanni Tirloni wrote:
 
On Thu, Mar 4, 2010 at 7:28 PM, Ian Collins [1]...@ianshome.com
wrote:

Gary Mills wrote:

  We have an IMAP e-mail server running on a Solaris 10 10/09 system.
  It uses six ZFS filesystems built on a single zpool with 14 daily
  snapshots.  Every day at 11:56, a cron command destroys the oldest
  snapshots and creates new ones, both recursively.  For about four
  minutes thereafter, the load average drops and I/O to the disk
  devices
  drops to almost zero.  Then, the load average shoots up to about
  ten
  times normal and then declines to normal over about four minutes,
  as
  disk activity resumes.  The statistics return to their normal state
  about ten minutes after the cron command runs.
  Is it destroying old snapshots or creating new ones that causes
  this
  dead time?  What does each of these procedures do that could affect
  the system?  What can I do to make this less visible to users?
  
  I have a couple of Solaris 10 boxes that do something similar
  (hourly snaps) and I've never seen any lag in creating and
  destroying snapshots.  One system with 16 filesystems takes 5
  seconds to destroy the 16 oldest snaps and create 5 recursive new
  ones.  I logged load average on these boxes and there is a small
  spike on the hour, but this is down to sending the snaps, not
  creating them.
  
We've seen the behaviour that Gary describes while destroying datasets
recursively (600GB and with 7 snapshots). It seems that close to the
end the server stalls for 10-15 minutes and NFS activity stops. For
small datasets/snapshots that doesn't happen or is harder to notice.
Does ZFS have to do something special when it's done releasing the
data blocks at the end of the destroy operation ?

That does sound similar to the problem here.  The zpool is 3 TB in
size with about 1.4 TB used.  It does sound as if the stall happens
during the `zfs destroy -r' rather than during the `zfs snapshot -r'.
What can zfs be doing when the CPU load average drops and disk I/O is
close to zero?

I also had a peculiar problem here recently when I was upgrading the ZFS
filesystems on our test server from 3 to 4.  When I tried `zfs upgrade
-a', the command hung for a long time and could not be interrupted,
killed, or traced.  Eventually it terminated on its own.  Only the two
upper-level filesystems had been upgraded.  I upgraded the lower-
level ones individually with `zfs upgrade' with no further problems.
I had previously upgraded the zpool with no problems.  I don't know if
this behavior is related to the stall on the production server.  I
haven't attempted the upgrades there yet.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] What Happend to my OpenSolaris X86 Install?

2010-02-11 Thread Gary Gendel
My guess is that the grub bootloader wasn't upgraded on the actual boot disk.  
Search for directions on how to mirror ZFS boot drives and you'll see how to 
copy the correct grub loader onto the boot disk.

If you want to do this more simply, swap the disks.  I did this when I was moving 
from SXCE to OSOL so I could make sure that things worked before making one of 
the drives a mirror.
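
Roughly, the mirror-and-fix-grub sequence is something like this (disk names
are illustrative for an x86 box):

  # attach the second disk to the root pool, then put GRUB on it by hand
  zpool attach rpool c0t0d0s0 c0t1d0s0
  installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t1d0s0
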
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How do separate ZFS filesystems affect performance?

2010-01-14 Thread Gary Mills
On Thu, Jan 14, 2010 at 10:58:48AM +1100, Daniel Carosone wrote:
 On Wed, Jan 13, 2010 at 08:21:13AM -0600, Gary Mills wrote:
  Yes, I understand that, but do filesystems have separate queues of any
  sort within the ZIL?
 
 I'm not sure. If you can experiment and measure a benefit,
 understanding the reasons is helpful but secondary.  If you can't
 experiment so easily, you're stuck asking questions, as now, to see
 whether the effort of experimenting is potentially worthwhile. 

Yes, we're stuck asking questions.  I appreciate your responses.

 Some other things to note (not necessarily arguments for or against):
 
  * you can have multiple slog devices, in case you're creating
so much ZIL traffic that ZIL queueing is a real problem, however
shared or structured between filesystems.

For the time being, I'd like to stay with the ZIL that's internal
to the zpool.

  * separate filesystems can have different properties which might help
tuning and experiments (logbias, copies, compress, *cache), as well
the recordsize.  Maybe you will find that compress on mailboxes
helps, as long as you're not also compressing the db's?

Yes, that's a good point in favour of a separate filesystem.
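
A minimal sketch of that kind of per-dataset tuning (dataset names are
hypothetical):

  # give the IMAP databases their own dataset so they can be tuned separately
  zfs create space/imapdb
  zfs set recordsize=16k space/imapdb    # closer to the databases' update size
  zfs set compression=on space/mail      # compress mailboxes, not the databases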

  * separate filesystems may have different recovery requirements
(snapshot cycles).  Note that taking snapshots is ~free, but
keeping them and deleting them have costs over time.  Perhaps you
can save some of these costs if the db's are throwaway/rebuildable. 

Also a good point.

  If not, would it help to put the database
  filesystems into a separate zpool?
 
 Maybe, if you have the extra devices - but you need to compare with
 the potential benefit of adding those devices (and their IOPS) to
 benefit all users of the existing pool.
 
 For example, if the databases are a distinctly different enough load,
 you could compare putting them on a dedicated pool on ssd, vs using
 those ssd's as additional slog/l2arc.  Unless you can make quite
 categorical separations between the workloads, such that an unbalanced
 configuration matches an unbalanced workload, you may still be better
 with consolidated IO capacity in the one pool.

As well, I'd like to keep all of the ZFS pools on the same external
storage device.  This makes migrating to a different server quite easy.

 Note, also, you can only take recursive atomic snapshots within the
 one pool - this might be important if the db's have to match the
 mailbox state exactly, for recovery.

That's another good point.  It's certainly better to have synchronized
snapshots.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Does ZFS use large memory pages?

2010-01-12 Thread Gary Mills
On Mon, Jan 11, 2010 at 01:43:27PM -0600, Gary Mills wrote:
 
 This line was a workaround for bug 6642475 that had to do with
 searching for large contiguous pages. The result was high system
 time and slow response.  I can't find any public information on this
 bug, although I assume it's been fixed by now.  It may have only
 affected Oracle database.

I eventually found it.  The bug is not visible from Sunsolve even with
a contract, but it is in bugs.opensolaris.org without one.  This is
extremely confusing.

 I'd like to remove this line from /etc/system now, but I don't know
 if it will have any adverse effect on ZFS or the Cyrus IMAP server
 that runs on this machine.  Does anyone know if ZFS uses large memory
 pages?

Bug 6642475 is still outstanding, although related bugs have been fixed.
I'm going to leave `set pg_contig_disable=1' in place.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Repeating scrub does random fixes

2010-01-12 Thread Gary Gendel
Thanks for all the suggestions.  Now for a strange tail...

I tried upgrading to dev 130 and, as expected, things did not go well.  All 
sorts of permission errors flew by during the upgrade stage and it would not 
start X-windows.  I've heard that things installed from the  contrib and extras 
repositories might cause issues but I didn't want to spend the time with my 
server offline while I tried to figure this out.

So, I booted back to 111b and scrubs still showed errors.  Late in the evening, 
the pool faulted, preventing any backups from the other servers to this pool.  
Being greeted this morning with the "recover files from backup" status message 
sent shivers up my spine.  This IS my backup.

I exported the pool and then imported it, which it did successfully.  Now the 
scrubs run cleanly (at least for a few repeated scrubs spanning several hours). 
 So, was it hardware?  What the heck could have fixed it by just exporting and 
importing the pool?
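
For the record, the fix was nothing more exotic than (pool name illustrative):

  zpool export archive
  zpool import archive
  zpool scrub archive     # now completes without errors
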
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] How do separate ZFS filesystems affect performance?

2010-01-12 Thread Gary Mills
I'm working with a Cyrus IMAP server running on a T2000 box under
Solaris 10 10/09 with current patches.  Mailboxes reside on six ZFS
filesystems, each containing about 200 gigabytes of data.  These are
part of a single zpool built on four Iscsi devices from our Netapp
filer.

One of these ZFS filesystems contains a number of global and per-user
databases in addition to one sixth of the mailboxes.  I'm thinking of
moving these databases to a separate ZFS filesystem.  Access to these
databases must be quick to ensure responsiveness of the server.  We
are currently experiencing a slowdown in performance when the number
of simultaneous IMAP sessions rises above 3000.  These databases are
opened and memory-mapped by all processes.  They have the usual
requirement for locking and synchronous writes whenever they are
updated.

Is moving the databases (IMAP metadata) to a separate ZFS filesystem
likely to improve performance?  I've heard that this is important, but
I'm not clear why this is.  Does each filesystem have its own queue in
the ARC or ZIL?  Here are some statistics taken while the server was
busy and access was slow:

# /usr/local/sbin/zilstat 5 5
   N-Bytes  N-Bytes/s  N-Max-Rate   B-Bytes  B-Bytes/s  B-Max-Rate  ops  <=4kB  4-32kB  >=32kB
   1126664     225332      515872  11485184    2297036     3469312  292    163      51      79
    740536     148107      250896   9535488    1907097     4005888  198    106      24      68
    758344     151668      179104  12546048    2509209     2682880  227     93      45      89
    603304     120660      204344   9179136    1835827     2084864  179     89      23      67
    948896     189779      346520  15880192    3176038     4173824  262    108      32     123
# /usr/local/sbin/arcstat 5 5
    Time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz      c
10:50:16  191M   31M     16   14M    8   17M   48   18M   12    30G    32G
10:50:21    1K   148     10    76    5    72   58    78   15    30G    32G
10:50:26    1K   154     12    88    7    65   72    96   18    30G    32G
10:50:31   796    61      7    54    7     6   35    25    8    30G    32G
10:50:36    1K   117      9   105    8    12   53    44   10    30G    32G

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How do separate ZFS filesystems affect performance?

2010-01-12 Thread Gary Mills
On Tue, Jan 12, 2010 at 11:11:36AM -0600, Bob Friesenhahn wrote:
 On Tue, 12 Jan 2010, Gary Mills wrote:
 
 Is moving the databases (IMAP metadata) to a separate ZFS filesystem
 likely to improve performance?  I've heard that this is important, but
 I'm not clear why this is.
 
 There is an obvious potential benefit in that you are then able to 
 tune filesystem parameters to best fit the needs of the application 
 which updates the data.  For example, if the database uses a small 
 block size, then you can set the filesystem blocksize to match.  If 
 the database uses memory mapped files, then using a filesystem 
 blocksize which is closest to the MMU page size may improve 
 performance.

I found a couple of references that suggest just putting the databases
on their own ZFS filesystem has a great benefit.  One is an e-mail
message to a mailing list from Vincent Fox at UC Davis.  They run a
similar system to ours at that site.  He says:

Particularly the database is important to get it's own filesystem so
that it's queue/cache are separated.

The second one is from:

http://blogs.sun.com/roch/entry/the_dynamics_of_zfs

He says:

For file modification that come with some immediate data integrity
constraint (O_DSYNC, fsync etc.) ZFS manages a per-filesystem intent
log or ZIL.

This sounds like the ZIL queue mentioned above.  Is I/O for each of
those handled separately?

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Repeating scrub does random fixes

2010-01-11 Thread Gary Gendel
I've just run a couple of consecutive scrubs; each time it found a couple of 
checksum errors, but on different drives.  No indication of any other errors.  
That a disk scrubs cleanly on a quiescent pool in one run but fails in the next 
is puzzling.  It reminds me of the snv_120 odd number of disks raidz bug I 
reported.

Looks like I've got to bite the bullet and upgrade to the dev tree and hope for 
the best.

Gary
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Does ZFS use large memory pages?

2010-01-11 Thread Gary Mills
Last April we put this in /etc/system on a T2000 server with large ZFS
filesystems:

set pg_contig_disable=1

This was while we were attempting to solve a couple of ZFS problems
that were eventually fixed with an IDR.  Since then, we've removed
the IDR and brought the system up to Solaris 10 10/09 with current
patches.  It's stable now, but seems slower.

This line was a workaround for bug 6642475 that had to do with
searching for large contiguous pages. The result was high system
time and slow response.  I can't find any public information on this
bug, although I assume it's been fixed by now.  It may have only
affected Oracle database.

I'd like to remove this line from /etc/system now, but I don't know
if it will have any adverse effect on ZFS or the Cyrus IMAP server
that runs on this machine.  Does anyone know if ZFS uses large memory
pages?

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Repeating scrub does random fixes

2010-01-10 Thread Gary Gendel
I've been using a 5-disk raidZ for years on an SXCE machine which I converted to 
OSOL.  The only time I ever had zfs problems in SXCE was with snv_120, which 
was fixed.

So, now I'm at OSOL snv_111b and I'm finding that scrub repairs errors on 
random disks.  If I repeat the scrub, it will fix errors on other disks.  
Occasionally it runs cleanly.  That it doesn't happen in a consistent manner 
makes me believe it's not hardware related.

fmdump only reports, three types of errors:

ereport.fs.zfs.checksum
ereport.io.scsi.cmd.disk.tran
ereport.io.scsi.cmd.disk.recovered

The middle one seems to be the issue whose source I'd like to track down.  Any 
docs on how to do this?
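
In case it helps, the obvious starting points I know of are the standard FMA
and iostat views:

  fmdump -e          # one-line summary of the logged error reports
  fmdump -eV | more  # full detail for each ereport, including device information
  iostat -En         # per-device soft/hard/transport error counters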

Thanks,
Gary
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Repeating scrub does random fixes

2010-01-10 Thread Gary Gendel

Mattias Pantzare wrote:

On Sun, Jan 10, 2010 at 16:40, Gary Gendel g...@genashor.com wrote:
  

I've been using a 5-disk raidZ for years on SXCE machine which I converted to 
OSOL.  The only time I ever had zfs problems in SXCE was with snv_120, which 
was fixed.

So, now I'm at OSOL snv_111b and I'm finding that scrub repairs errors on 
random disks.  If I repeat the scrub, it will fix errors on other disks.  
Occasionally it runs cleanly.  That it doesn't happen in a consistent manner 
makes me believe it's not hardware related.




That is a good indication for hardware related errors. Software will
do the same thing every time but hardware errors are often random.

But you are running an older version now, I would recommend an upgrade.
  


I would have thought that too if it didn't start right after the switch 
from SXCE to OSOL.  As for an upgrade, I use the dev repository on my 
laptop and I find that OSOL updates aren't nearly as stable as SXCE 
was.  I tried for a bit, but always had to go back to 111b because 
something crucial broke.  I was hoping to wait until the official 
release in March in order to let things stabilize.  This is my main 
web/mail/file/etc. server and I don't really want to muck too much.


That said, I may take a gamble on upgrading as we're getting closer to 
the 2010.x release.



Gary

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS filesystems not mounted on reboot with Solaris 10 10/09

2009-12-19 Thread Gary Mills
I have a system that was recently upgraded to Solaris 10 10/09.  It
has a UFS root on local disk and a separate zpool on Iscsi disk.
After a reboot, the ZFS filesystems were not mounted, although the
zpool had been imported.  `zfs mount' showed nothing.  `zfs mount -a'
mounted them nicely.  The `canmount' property is `on'.  Why would they
not be mounted at boot?  This used to work with earlier releases of
Solaris 10.

The `zfs mount -a' at boot is run by the /system/filesystem/local:default
service.  It didn't record any errors on the console or in the log:

[ Dec 19 08:09:11 Executing start method (/lib/svc/method/fs-local) ]
[ Dec 19 08:09:12 Method start exited with status 0 ]

Is a dependency missing?

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Permanent errors on two files

2009-12-06 Thread Gary Mills
On Fri, Dec 04, 2009 at 02:52:47PM -0700, Cindy Swearingen wrote:
 
 If space/dcc is a dataset, is it mounted? ZFS might not be able to
 print the filenames if the dataset is not mounted, but I'm not sure
 if this is why only object numbers are displayed.

Yes, it's mounted and is quite an active filesystem.

 I would also check fmdump -eV to see how frequent the hardware
 has had problems.

That shows ZFS checksum errors in July, but nothing since that time.
There were also DIMM errors before that, starting in June.  We
replaced the failed DIMMs, also in July.  This is an X4450 with ECC
memory.  There were no disk errors reported.  I suppose we can blame
the memory.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] freeNAS moves to Linux from FreeBSD

2009-12-06 Thread Gary Gendel
The only reason I thought this news would be of interest is that the 
discussions had some interesting comments.  Basically, there is a significant 
outcry because ZFS was going away.  I saw NexentaOS and EON mentioned several 
times as the path to take.

Seems that there is some opportunity for OpenSolaris advocacy in this arena 
while the topic is hot.

Gary
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Permanent errors on two files

2009-12-06 Thread Gary Mills
On Sat, Dec 05, 2009 at 01:52:12AM +0300, Victor Latushkin wrote:
 On Dec 5, 2009, at 0:52, Cindy Swearingen cindy.swearin...@sun.com  
 wrote:
 
 The zpool status -v command will generally print out filenames, dnode
 object numbers, or identify metadata corruption problems. These look
 like object numbers, because they are large, rather than metadata
 objects, but an expert will have to comment.
 
 Yes, these are object numbers, and the most likely reason they are not turned  
 into filenames is that the corresponding files no longer exist.

That seems to be the case:

# zdb -d space/dcc 0x11e887 0xba25aa
Dataset space/dcc [ZPL], ID 21, cr_txg 19, 20.5G, 3672408 objects

 So I'd run scrub another time, if the files are gone and there are no  
 other corruptions scrub will reset error log and zpool status should  
 become clean.

That worked.  After the scrub, there are no errors reported.

 You might be able to identify these object numbers with zdb, but
 I'm not sure how do that.
 
 You can try to use zdb this way to check if these objects still exist
 
 zdb -d space/dcc 0x11e887 0xba25aa

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] virsh troubling zfs!?

2009-11-04 Thread Gary Pennington
On Tue, Nov 03, 2009 at 11:39:28AM -0800, Ralf Teckelmann wrote:
 Hi and hello,
 
 I have a problem confusing me. I hope someone can help me with it.
 I followed a best practise - I think - using dedicated zfs filesystems for 
 my virtual machines.
 Commands (for completion):
 [i]zfs create rpool/vms[/i]
 [i]zfs create rpool/vms/vm1[/i]
 [i] zfs create -V 10G rpool/vms/vm1/vm1-dsk[/i]
 
 This command creates the file system [i]/rpool/vms/vm1/vm1-dsk[/i] and the 
 according [i]/dev/zvol/dsk/rpool/vms/vm1/vm1-dsk[/i].
 

(Clarification)

Your commands create two filesystems:

rpool/vms
rpool/vms/vm1

You then create a ZFS Volume:

rpool/vms/vm1/vm1-dsk

which results in associated dsk and rdsk devices being created as:

/dev/zvol/dsk/rpool/vms/vm1/vm1-dsk
/dev/zvol/rdsk/rpool/vms/vm1/vm1-dsk

These two nodes are artifacts of the zfs volume implementation and are required
to allow zfs volumes to emulate traditional disk devices. They will appear
and disappear accordingly as zfs volumes are created and destroyed.

 If I delete a VM i set up using this filesystem via[i] virsh undefine vm1[/i] 
 the [i]/rpool/vms/vm1/vm1-dsk[/i] gets also deleted, but the 
 [i]/dev/zvol/dsk/rpool/vms/vm1/vm1-dsk[/i] is left.
 

virsh undefine does not delete filesystems, disks or any other kind of
backing storage. In order to delete the three things you created, you need
to issue:

zfs destroy rpool/vms/vm1/vm1-dsk
zfs destroy rpool/vms/vm1
zfs destroy rpool/vms

or (more simply) you can do it recursively, if there's nothing else to be
affected:

zfs destroy -r rpool/vms

Obviously you need to be careful with recursive destruction that no other
filesystems/volumes are affected.

 Without [i]/rpool/vms/vm1/vm1-dsk[/i] I am not able to do [i]zfs destroy 
 rpool/vms/vm1/vm1-dsk[/i] so the [i]/dev/zvol/dsk/rpool/vms/vm1/vm1-dsk[/i] 
 could not be destroyed and will be left forever!? 
 
 How can I get rid of this problem?

You don't have a problem. When the zfs volume is destroyed (as I describe
above), then the associated devices are also removed.

 -- 
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Hope that helps.

Gary
-- 
Gary Pennington
Solaris Core OS
Sun Microsystems
gary.penning...@sun.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Apple shuts down open source ZFS project

2009-10-24 Thread Gary Gendel
Apple is known to strong-arm in licensing negotiations.  I'd really like to 
hear the straight-talk about what transpired.

That's ok, it just means that I won't be using mac as a server.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Slow reads with ZFS+NFS

2009-10-20 Thread Gary Gogick
Heya all,

I'm working on testing ZFS with NFS, and I could use some guidance - read
speeds are a bit less than I expected.

Over a gig-e line, we're seeing ~30 MB/s reads on average - doesn't seem to
matter if we're doing large numbers of small files or small numbers of large
files, the speed seems to top out there.  We've disabled pre-fetching, which
may be having some effect on read speeds, but proved necessary due to severe
performance issues on database reads with it enabled.  (Reading from the DB
with pre-fetching enabled was taking 4-5 times as long as with it
disabled.)

Write speed seems to be fine.  Testing is showing ~95 MB/s, which seems
pretty decent considering there's been no real network tuning done.

The NFS server we're testing is a Sun x4500, configured with a storage pool
consisting of 20x 2-disk mirrors, using separate SSD for logging.  It's
running the latest version of Nexenta Core.  (We've also got a second x4500
in with a raidZ2 config, running OpenSolaris proper, showing the same issues
with reads.)

We're using NFS v4 via TCP, serving various Linux clients (the majority are
CentOS 5.3).  Connectivity is presently provided by a single gigabit
ethernet link; entirely conventional configuration (no jumbo frames/etc).

Our workload is pretty read heavy; we're serving both website assets and
databases via NFS.  The majority of files being served are small (< 1MB).
The databases are MySQL/InnoDB, with the data in separate zfs filesystems
with a record size of 16k.  The website assets/etc. are in zfs filesystems
with the default record size.  On the database server side of things, we've
disabled InnoDB's double write buffer.

I'm wondering if there's any other tuning that'd be a good idea for ZFS in
this situation, or if there's some NFS tuning that should be done when
dealing specifically with ZFS.  Any advice would be greatly appreciated.
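
For reference, the prefetch and record-size changes mentioned above amount to
something like this (dataset name illustrative):

  # /etc/system on the server (takes effect after a reboot):
  set zfs:zfs_prefetch_disable=1

  # per-dataset record size for the InnoDB data:
  zfs set recordsize=16k tank/mysql/data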

Thanks,

-- 
--
Gary Gogick
senior systems administrator  |  workhabit,inc.

// email: g...@workhabit.com  |  web: http://www.workhabit.com
// office: 866-workhabit  | fax: 919-552-9690

--
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Slow reads with ZFS+NFS

2009-10-20 Thread Gary Gogick
Trevor/all,

We've been timing the copying of actual data (1GB of assorted files,
generally  1MB with numerous larger files thrown in) in an attempt to
simulate real world use.  We've been copying different sets of data around
to try and avoid anything being cached anywhere.

I don't recall the specific numbers, but local reading/writing on the x4500
was definitely well over what can be theoretically pushed through a gig-e
line; so I'm pretty convinced the problem is either with the ZFS+NFS combo
or NFS, rather than with ZFS alone.

I'll do some OpenSolaris - OpenSolaris testing tonight and see what
happens.

Thanks for the replies, appreciate the help!



On Tue, Oct 20, 2009 at 1:43 PM, Trevor Pretty trevor_pre...@eagle.co.nzwrote:

  Gary

  Were you measuring the Linux NFS write performance? It's well known that
 Linux can use NFS in a very unsafe mode and report the write complete when
 it is not all the way to safe storage. This is often reported as Solaris has
 slow NFS write performance. This link does not mention NFS v4 but you might
 want to check. http://nfs.sourceforge.net/

 What's the write performance like between the two OpenSolaris systems?


 Richard Elling wrote:

 cross-posting to nfs-discuss

 On Oct 20, 2009, at 10:35 AM, Gary Gogick wrote:



  Heya all,

 I'm working on testing ZFS with NFS, and I could use some guidance -
 read speeds are a bit less than I expected.

 Over a gig-e line, we're seeing ~30 MB/s reads on average - doesn't
 seem to matter if we're doing large numbers of small files or small
 numbers of large files, the speed seems to top out there.  We've
 disabled pre-fetching, which may be having some affect on read
 speads, but proved necessary due to severe performance issues on
 database reads with it enabled.  (Reading from the DB with pre-
 fetching enabled was taking 4-5 times as long than with it disabled.)


  What is the performance when reading locally (eliminate NFS from the
 equation)?
   -- richard



  Write speed seems to be fine.  Testing is showing ~95 MB/s, which
 seems pretty decent considering there's been no real network tuning
 done.

 The NFS server we're testing is a Sun x4500, configured with a
 storage pool consisting of 20x 2-disk mirrors, using separate SSD
 for logging.  It's running the latest version of Nexenta Core.
 (We've also got a second x4500 in with a raidZ2 config, running
 OpenSolaris proper, showing the same issues with reads.)

 We're using NFS v4 via TCP, serving various Linux clients (the
 majority are  CentOS 5.3).  Connectivity is presently provided by a
 single gigabit ethernet link; entirely conventional configuration
 (no jumbo frames/etc).

 Our workload is pretty read heavy; we're serving both website assets
 and databases via NFS.  The majority of files being served are small
 ( 1MB).  The databases are MySQL/InnoDB, with the data in separate
 zfs filesystems with a record size of 16k.  The website assets/etc.
 are in zfs filesystems with the default record size.  On the
 database server side of things, we've disabled InnoDB's double write
 buffer.

 I'm wondering if there's any other tuning that'd be a good idea for
 ZFS in this situation, or if there's some NFS tuning that should be
 done when dealing specifically with ZFS.  Any advice would be
 greatly appreciated.

 Thanks,

 --
 --
 Gary Gogick
 senior systems administrator  |  workhabit,inc.

 // email: g...@workhabit.com  |  web: http://www.workhabit.com
 // office: 866-workhabit  | fax: 919-552-9690

 --

 www.eagle.co.nz

 This email is confidential and may be legally privileged. If received in
 error please destroy and immediately notify us.




-- 
--
Gary Gogick
senior systems administrator  |  workhabit,inc.

// email: g...@workhabit.com  |  web: http://www.workhabit.com
// office: 866-workhabit  | fax: 919-552-9690

--
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] If you have ZFS in production, willing to share some details (with me)?

2009-09-21 Thread Gary Mills
On Fri, Sep 18, 2009 at 01:51:52PM -0400, Steffen Weiberle wrote:
 I am trying to compile some deployment scenarios of ZFS.
 
 # of systems

One, our e-mail server for the entire campus.

 amount of storage

2 TB that's 58% used.

 application profile(s)

This is our Cyrus IMAP spool.  In addition to users' e-mail folders
(directories) and messages (files), it contains global, per-folder,
and per-user databases.  The latter two types are quite small.

 type of workload (low, high; random, sequential; read-only, read-write, 
 write-only)

It's quite active.  Message files arrive randomly and are deleted
randomly.  As a result, files in a directory are not located in
proximity on the storage.  Individual users often read all of their
folders and messages in one IMAP session.  Databases are quite active.
Each incoming message adds a file to a directory and reads or updates
several databases.  Most IMAP I/O is done with mmap() rather than with
read()/write().  So far, IMAP performance is adequate.  The backup,
done by EMC Networker, is very slow because it must read thousands of
small files in directory order.

 storage type(s)

We are using an Iscsi SAN with storage on a Netapp filer.  It exports
four 500-gb LUNs that are striped into one ZFS pool.  All disk
management is done on the Netapp.  We have had several disk failures
and replacements on the Netapp, with no effect on the e-mail server.

 industry

A University with 35,000 enabled e-mail accounts.

 whether it is private or I can share in a summary
 anything else that might be of interest

You are welcome to share this information.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS commands hang after several zfs receives

2009-09-15 Thread Gary Mills
On Tue, Sep 15, 2009 at 08:48:20PM +1200, Ian Collins wrote:
 Ian Collins wrote:
 I have a case open for this problem on Solaris 10u7.
 
 The case has been identified and I've just received an IDR,which I 
 will test next week.  I've been told the issue is fixed in update 8, 
 but I'm not sure if there is an nv fix target.
 
 I'll post back once I've abused a test system for a while.
 
 The IDR I was sent appears to have fixed the problem.  I have been 
 abusing the box for a couple of weeks without any lockups.  Roll on 
 update 8!

Was that IDR140221-17?  That one fixed a deadlock bug for us back
in May.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Problem with snv_122 Zpool issue

2009-09-12 Thread Gary Gendel
You shouldn't hit the Raid-Z issue because it only happens with an odd number 
of disks.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Problem with RAID-Z in builds snv_120 - snv_123

2009-09-03 Thread Gary Gendel
Alan,

Thanks for the detailed explanation.  The rollback successfully fixed my 5-disk 
RAID-Z errors.  I'll hold off another upgrade attempt until 124 rolls out.  
Fortunately, I didn't do a zfs upgrade right away after installing 121.  For 
those that did, this could be very painful.

Gary
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] snv_110 - snv_121 produces checksum errors on Raid-Z pool

2009-08-28 Thread Gary Gendel
Alan,

Super find.  Thanks, I thought I was just going crazy until I rolled back to 
110 and the errors disappeared.  When you do work out a fix, please ping me to 
let me know when I can try an upgrade again.

Gary
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] snv_110 - snv_121 produces checksum errors on Raid-Z pool

2009-08-27 Thread Gary Gendel
It looks like it's definitely related to the snv_121 upgrade.  I decided to 
roll back to snv_110 and the checksum errors have disappeared.  I'd like to 
issue a bug report, but I don't have any information that might help track this 
down, just lots of checksum errors.

Looks like I'm stuck at snv_110 until someone figures out what is broken.  If 
it helps, here is my properly list for this pool.

g...@phoenix[~]101zfs get all archive
NAME     PROPERTY         VALUE                  SOURCE
archive  type             filesystem             -
archive  creation         Mon Jun 18 20:40 2007  -
archive  used             787G                   -
archive  available        1.01T                  -
archive  referenced       125G                   -
archive  compressratio    1.13x                  -
archive  mounted          yes                    -
archive  quota            none                   default
archive  reservation      none                   default
archive  recordsize       128K                   default
archive  mountpoint       /archive               default
archive  sharenfs         off                    default
archive  checksum         on                     default
archive  compression      on                     local
archive  atime            off                    local
archive  devices          on                     default
archive  exec             on                     default
archive  setuid           on                     default
archive  readonly         off                    default
archive  zoned            off                    default
archive  snapdir          hidden                 default
archive  aclmode          groupmask              default
archive  aclinherit       restricted             default
archive  canmount         on                     default
archive  shareiscsi       off                    default
archive  xattr            on                     default
archive  copies           1                      default
archive  version          3                      -
archive  utf8only         off                    -
archive  normalization    none                   -
archive  casesensitivity  sensitive              -
archive  vscan            off                    default
archive  nbmand           off                    default
archive  sharesmb         off                    local
archive  refquota         none                   default
archive  refreservation   none                   default
archive  primarycache     all                    default
archive  secondarycache   all                    default

And each of the sub-pools look like this:

g...@phoenix[~]101zfs get all archive/gary
archive/gary  type             filesystem             -
archive/gary  creation         Mon Jun 18 20:56 2007  -
archive/gary  used             141G                   -
archive/gary  available        1.01T                  -
archive/gary  referenced       141G                   -
archive/gary  compressratio    1.22x                  -
archive/gary  mounted          yes                    -
archive/gary  quota            none                   default
archive/gary  reservation      none                   default
archive/gary  recordsize       128K                   default
archive/gary  mountpoint       /archive/gary          default
archive/gary  sharenfs         off                    default
archive/gary  checksum         on                     default
archive/gary  compression      on                     inherited from archive
archive/gary  atime            off                    inherited from archive
archive/gary  devices          on                     default
archive/gary  exec             on                     default
archive/gary  setuid           on                     default
archive/gary  readonly         off                    default
archive/gary  zoned            off                    default
archive/gary  snapdir          hidden                 default
archive/gary  aclmode          groupmask              default
archive/gary  aclinherit       passthrough            local
archive/gary  canmount         on                     default
archive/gary  shareiscsi       off                    default
archive/gary  xattr            on                     default
archive/gary  copies           1                      default
archive/gary  version          3                      -
archive/gary  utf8only         off                    -
archive/gary  normalization    none                   -
archive/gary  casesensitivity  sensitive              -
archive/gary  vscan            off                    default
archive/gary

[zfs-discuss] snv_110 - snv_121 produces checksum errors on Raid-Z pool

2009-08-25 Thread Gary Gendel
I have a 5-500GB disk Raid-Z pool that has been producing checksum errors right 
after upgrading SXCE to build 121.  They seem to be randomly occurring on all 5 
disks, so it doesn't look like a disk failure situation.

Repeatedly running a scrub on the pools randomly repairs between 20 and a few 
hundred checksum errors.

Since I hadn't physically touched the machine, it seems a very strong 
coincidence that it started right after I upgraded to 121.

This machine is a SunFire v20z with a Marvell SATA 8-port controller (the same 
one as in the original thumper).  I've seen this kind of problem way back 
around build 40-50 ish, but haven't seen it after that until now.

Anyone else experiencing this problem or knows how to isolate the problem 
definitively?

Thanks,
Gary
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-07 Thread Gary Mills
On Mon, Jul 06, 2009 at 04:54:16PM +0100, Andrew Gabriel wrote:
 Andre van Eyssen wrote:
 On Mon, 6 Jul 2009, Gary Mills wrote:
 
 As for a business case, we just had an extended and catastrophic
 performance degradation that was the result of two ZFS bugs.  If we
 have another one like that, our director is likely to instruct us to
 throw away all our Solaris toys and convert to Microsoft products.
 
 If you change platform every time you get two bugs in a product, you 
 must cycle platforms on a pretty regular basis!
 
 You often find the change is towards Windows. That very rarely has the 
 same rules applied, so things then stick there.

There's a more general principle in operation here.  Organizations do
sometimes change platforms for peculiar reasons, but once they do that
they're not going to do it again for a long time.  That's why they
disregard problems with the new platform.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-06 Thread Gary Mills
On Sat, Jul 04, 2009 at 07:18:45PM +0100, Phil Harman wrote:
 Gary Mills wrote:
 On Sat, Jul 04, 2009 at 08:48:33AM +0100, Phil Harman wrote:
   
 ZFS doesn't mix well with mmap(2). This is because ZFS uses the ARC  
 instead of the Solaris page cache. But mmap() uses the latter. So if  
 anyone maps a file, ZFS has to keep the two caches in sync.
 
 That's the first I've heard of this issue.  Our e-mail server runs
 Cyrus IMAP with mailboxes on ZFS filesystems.  Cyrus uses mmap(2)
 extensively.  I understand that Solaris has an excellent
 implementation of mmap(2).  ZFS has many advantages, snapshots for
 example, for mailbox storage.  Is there anything that we can be do to
 optimize the two caches in this environment?  Will mmap(2) one day
 play nicely with ZFS?
 
[..]
 Software engineering is always about prioritising resource. Nothing 
 prioritises performance tuning attention quite like compelling 
 competitive data. When Bart Smaalders and I wrote libMicro we generated 
 a lot of very compelling data. I also coined the phrase If Linux is 
 faster, it's a Solaris bug. You will find quite a few (mostly fixed) 
 bugs with the synopsis linux is faster than solaris at 
 
 So, if mmap(2) playing nicely with ZFS is important to you, probably the 
 best thing you can do to help that along is to provide data that will 
 help build the business case for spending engineering resource on the issue.

First of all, how significant is the double caching in terms of
performance?  If the effect is small, I won't worry about it anymore.

What sort of data do you need?  Would a list of software products that
utilize mmap(2) extensively and could benefit from ZFS be suitable?

As for a business case, we just had an extended and catastrophic
performance degradation that was the result of two ZFS bugs.  If we
have another one like that, our director is likely to instruct us to
throw away all our Solaris toys and convert to Microsoft products.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-04 Thread Gary Mills
On Sat, Jul 04, 2009 at 08:48:33AM +0100, Phil Harman wrote:
 ZFS doesn't mix well with mmap(2). This is because ZFS uses the ARC  
 instead of the Solaris page cache. But mmap() uses the latter. So if  
 anyone maps a file, ZFS has to keep the two caches in sync.

That's the first I've heard of this issue.  Our e-mail server runs
Cyrus IMAP with mailboxes on ZFS filesystems.  Cyrus uses mmap(2)
extensively.  I understand that Solaris has an excellent
implementation of mmap(2).  ZFS has many advantages, snapshots for
example, for mailbox storage.  Is there anything that we can be do to
optimize the two caches in this environment?  Will mmap(2) one day
play nicely with ZFS?

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Lots of metadata overhead on filesystems with 100M files

2009-06-18 Thread Gary Mills
On Thu, Jun 18, 2009 at 12:12:16PM +0200, Cor Beumer - Storage Solution 
Architect wrote:
 
 What they noticed on the X4500 systems was that when the zpool became
 filled to about 50-60%, the performance of the system dropped enormously.
 They claim this has to do with fragmentation of the ZFS filesystem.
 So we tried putting an S7410 system in there with about the same disk
 config, 44x 1TB SATA but with 4x 18GB WriteZilla (in a stripe), and we
 were able to get many more I/Os from the system than the comparable
 X4500.  However, after they put it in production for a couple of weeks,
 as soon as the ZFS filesystem came into the range of about 50-60% full
 they saw the same problem.

We had a similar problem with a T2000 and 2 TB of ZFS storage.  Once
the usage reached 1 TB, the write performance dropped considerably and
the CPU consumption increased.  Our problem was indirectly a result of
fragmentation, but it was solved by a ZFS patch.  I understand that
this patch, which fixes a whole bunch of ZFS bugs, should be released
soon.  I wonder if this was your problem.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] What causes slow performance under load?

2009-05-13 Thread Gary Mills
On Mon, Apr 27, 2009 at 04:47:27PM -0500, Gary Mills wrote:
 On Sat, Apr 18, 2009 at 04:27:55PM -0500, Gary Mills wrote:
  We have an IMAP server with ZFS for mailbox storage that has recently
  become extremely slow on most weekday mornings and afternoons.  When
  one of these incidents happens, the number of processes increases, the
  load average increases, but ZFS I/O bandwidth decreases.  Users notice
  very slow response to IMAP requests.  On the server, even `ps' becomes
  slow.
 
 The cause turned out to be this ZFS bug:
 
 6596237: Stop looking and start ganging
 
 Apparently, the ZFS code was searching the free list looking for the
 perfect fit for each write.  With a fragmented pool, this search took
 a very long time, delaying the write.  Eventually, the requests arrived
 faster than writes could be sent to the devices, causing the server
 to be unresponsive.
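
To make the failure mode concrete, here is a deliberately simplified
sketch.  It is not the real ZFS metaslab allocator, and the segment
sizes are invented; it only illustrates why a "perfect fit" policy
degrades as the free list fragments: the best-fit search has to walk
every free segment on every allocation, while a first-fit search can
stop at the first segment that is large enough.

/*
 * Simplified illustration only -- not the actual ZFS allocator.
 * best_fit() walks the whole free list on every allocation, so its
 * cost grows with fragmentation; first_fit() stops at the first
 * segment that is big enough.
 */
#include <stddef.h>
#include <stdio.h>

struct seg { size_t size; struct seg *next; };

/* Scan the whole list for the smallest segment that still fits. */
static struct seg *best_fit(struct seg *free_list, size_t want)
{
    struct seg *best = NULL;
    for (struct seg *s = free_list; s != NULL; s = s->next)
        if (s->size >= want && (best == NULL || s->size < best->size))
            best = s;
    return best;        /* O(n) per write, n = number of fragments */
}

/* Stop at the first segment that fits. */
static struct seg *first_fit(struct seg *free_list, size_t want)
{
    for (struct seg *s = free_list; s != NULL; s = s->next)
        if (s->size >= want)
            return s;   /* usually examines far fewer segments */
    return NULL;
}

int main(void)
{
    /* A small, artificially fragmented free list. */
    struct seg segs[] = {
        { 8192, NULL }, { 4096, NULL }, { 131072, NULL }, { 16384, NULL }
    };
    for (int i = 0; i < 3; i++)
        segs[i].next = &segs[i + 1];

    size_t want = 16384;
    printf("best fit:  %zu bytes\n", best_fit(segs, want)->size);
    printf("first fit: %zu bytes\n", first_fit(segs, want)->size);
    return 0;
}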

We also had another problem, due to this ZFS bug:

6591646: Hang while trying to enter a txg while holding a txg open

This was a deadlock, with one thread blocking hundreds of other
threads.  Our symptom was that all zpool I/O would stop and the `ps'
command would hang.  A reboot was the only way out.

If you have a support contract, Sun will supply an IDR that fixes
both problems.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] What causes slow performance under load?

2009-04-27 Thread Gary Mills
On Sat, Apr 18, 2009 at 04:27:55PM -0500, Gary Mills wrote:
 We have an IMAP server with ZFS for mailbox storage that has recently
 become extremely slow on most weekday mornings and afternoons.  When
 one of these incidents happens, the number of processes increases, the
 load average increases, but ZFS I/O bandwidth decreases.  Users notice
 very slow response to IMAP requests.  On the server, even `ps' becomes
 slow.

The cause turned out to be this ZFS bug:

6596237: Stop looking and start ganging

Apparently, the ZFS code was searching the free list looking for the
perfect fit for each write.  With a fragmented pool, this search took
a very long time, delaying the write.  Eventually, the requests arrived
faster than writes could be sent to the devices, causing the server
to be unresponsive.

There isn't a patch for this one yet, but Sun will supply an IDR if
you open a support case.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Peculiarities of COW over COW?

2009-04-26 Thread Gary Mills
On Sun, Apr 26, 2009 at 05:19:18PM -0400, Ellis, Mike wrote:

 As soon as you put those zfs blocks on top of iscsi, the netapp won't
 have a clue as far as how to defrag those iscsi files from the
 filer's perspective.  (It might do some fancy stuff based on
 read/write patterns, but that's unlikely)

Since the LUN is just a large file on the Netapp, I assume that all
it can do is to put the blocks back into sequential order.  That might
have some benefit overall.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Peculiarities of COW over COW?

2009-04-26 Thread Gary Mills
On Sun, Apr 26, 2009 at 05:02:38PM -0500, Tim wrote:
 
On Sun, Apr 26, 2009 at 3:52 PM, Gary Mills [1]mi...@cc.umanitoba.ca
wrote:

  We run our IMAP spool on ZFS that's derived from LUNs on a Netapp
  filer.  There's a great deal of churn in e-mail folders, with
  messages
  appearing and being deleted frequently.

  Should ZFS and the Netapp be using the same blocksize, so that they
  cooperate to some extent?
  
Just make sure ZFS is using a block size that is a multiple of 4k,
which I believe it does by default.

Okay, that's good.
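
To make the alignment point concrete, here is a tiny sketch.  The 4096
is the assumed block size on the filer backing the LUN, and the
candidate values are common ZFS recordsize settings (128K being the
usual default); any record size that is a whole multiple of the filer
block avoids read-modify-write of partial blocks on the filer.

/*
 * Illustration only: check which candidate ZFS record sizes line up
 * with an assumed 4 KB filer block.
 */
#include <stddef.h>
#include <stdio.h>

int main(void)
{
    const unsigned long filer_block = 4096;                  /* assumed */
    const unsigned long candidates[] = { 512, 4096, 8192, 131072 };

    for (size_t i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
        unsigned long rs = candidates[i];
        printf("recordsize %6lu: %s\n", rs,
               rs % filer_block == 0 ? "aligned" : "forces partial-block writes");
    }
    return 0;
}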

    I have to ask though... why not just serve NFS off the filer to the
    Solaris box?  ZFS on a LUN served off a filer seems to make about as
    much sense as sticking a ZFS-based lun behind a v-filer (although the
    latter might actually make sense in a world where it were
    supported *cough*neverhappen*cough* since you could buy the cheap
    newegg disk).

I prefer NFS too, but the IMAP server requires POSIX semantics.
I believe that NFS doesn't support that, at least NFS version 3.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] What is the 32 GB 2.5-Inch SATA Solid State Drive?

2009-04-25 Thread Gary Mills
On Fri, Apr 24, 2009 at 09:08:52PM -0700, Richard Elling wrote:
 Gary Mills wrote:
 Does anyone know about this device?
 
 SESX3Y11Z 32 GB 2.5-Inch SATA Solid State Drive with Marlin Bracket
 for Sun SPARC Enterprise T5120, T5220, T5140 and T5240 Servers, RoHS-6
 Compliant
 
 This is from Sun's catalog for the T5120 server.  Would this work well
 as a separate ZIL device for ZFS?  Is there any way I could use this in
 a T2000 server?  The brackets appear to be different.
 
 The brackets are different.  T2000 uses nemo bracket and T5120 uses
 marlin.  For the part-number details, SunSolve is your friend.
 http://sunsolve.sun.com/handbook_pub/validateUser.do?target=Systems/SE_T5120/components
 http://sunsolve.sun.com/handbook_pub/validateUser.do?target=Systems/SunFireT2000_R/components

I see also that no SSD is listed for the T2000.  Has anyone gotten one
to work as a separate ZIL device for ZFS?

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] What is the 32 GB 2.5-Inch SATA Solid State Drive?

2009-04-24 Thread Gary Mills
Does anyone know about this device?

SESX3Y11Z 32 GB 2.5-Inch SATA Solid State Drive with Marlin Bracket
for Sun SPARC Enterprise T5120, T5220, T5140 and T5240 Servers, RoHS-6
Compliant

This is from Sun's catalog for the T5120 server.  Would this work well
as a separate ZIL device for ZFS?  Is there any way I could use this in
a T2000 server?  The brackets appear to be different.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

