[zfs-discuss] Possible ZFS problem

2011-08-13 Thread andy thomas
We are using ZFS on a Sun E450 server (4 x 400 MHz CPUs, 1 GB of memory, an 
18 GB system disk and 19 x 300 GB disks running OSOL snv_134) for archive 
storage where speed is not important. We have two RAID-Z1 pools of 8 disks 
each, plus one spare disk shared between the two pools, and this has 
apparently worked well since it was set up several months ago.


However, one of our users recently put a 35 GB tar.gz file on this server 
and uncompressed it to a 215 GB tar file. But when he tried to untar it, 
after about 43 GB had been extracted we noticed the disk usage reported by 
df for that ZFS pool wasn't changing much. Using du -sm on the extracted 
archive directory showed that the size would increase over a period of 30 
seconds or so, then suddenly drop back by about 50 MB and start increasing 
again. In other words, it seemed to be going into some sort of loop; all 
we could do was kill tar and try again, at which point exactly the same 
thing happened after about 43 GB had been extracted.


Thinking the tar file could be corrupt, we successfully untarred the file 
on a Linux system (1 TB disk with a plain ext3 filesystem). I suspect my 
problem may be due to the limited memory on this system, but are there any 
other things I should take into consideration? It's not a major problem, as 
the system is intended for storage and users are not supposed to go in and 
untar huge tarfiles on it - it's not a fast system ;-)


Andy


Andy Thomas,
Time Domain Systems

Tel: +44 (0)7866 556626
Fax: +44 (0)20 8372 2582
http://www.time-domain.co.uk


Re: [zfs-discuss] Possible ZFS problem

2011-08-13 Thread andy thomas

On Sat, 13 Aug 2011, Bob Friesenhahn wrote:


On Sat, 13 Aug 2011, andy thomas wrote:
However, one of our users recently put a 35 GB tar.gz file on this server 
and uncompressed it to a 215 GB tar file. But when he tried to untar it, 
after about 43 GB had been extracted we noticed the disk usage reported by 
df for that ZFS pool wasn't changing much. Using du -sm on the extracted 
archive directory showed that the size would increase over a period of 30 
seconds or so, then suddenly drop back by about 50 MB and start increasing 
again. In other words, it seemed to be going into some sort of loop; all 
we could do was kill tar and try again, at which point exactly the same 
thing happened after about 43 GB had been extracted.


What 'tar' program were you using?  Make sure to also try using the 
Solaris-provided tar rather than something like GNU tar.


I was using GNU tar actually as the original archive was created on a 
Linux machine. I will try it again using Solaris tar.


1GB of memory is not very much for Solaris to use.  A minimum of 2GB is 
recommended for zfs.


We are going to upgrade the system to 4 GB as soon as possible.
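In the meantime, a minimal sketch of what I have in mind for capping the ARC 
so ZFS leaves some headroom for the rest of the system (the size below is 
purely an illustrative value, not a recommendation for this box):

# in /etc/system, followed by a reboot; 0x10000000 = 256 MB, arbitrary example
set zfs:zfs_arc_max = 0x10000000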

Thanks for the quick response,

Andy


Re: [zfs-discuss] Possible ZFS problem

2011-08-13 Thread andy thomas

On Sat, 13 Aug 2011, Joerg Schilling wrote:


andy thomas a...@time-domain.co.uk wrote:


What 'tar' program were you using?  Make sure to also try using the
Solaris-provided tar rather than something like GNU tar.


I was using GNU tar actually as the original archive was created on a
Linux machine. I will try it again using Solaris tar.


GNU tar does not follow the standard when creating archives, so Sun tar may be
unable to unpack the archive correctly.


So it is GNU tar that is broken and not Solaris tar? I always thought it 
was the other way round. Thanks for letting me know.



But GNU tar does strange things when unpacking symlinks.

I recommend using star; it understands GNU tar archives.


I've just installed this (version 1.5a78) from Sunfreeware and am having a 
play.
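For the record, this is roughly how I expect to drive it (a sketch only - the 
archive path is made up):

star -t f=/tank/archive.tar     # list the contents first
star -x f=/tank/archive.tar     # then extract into the current directory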


Danke!

Andy


[zfs-discuss] SUNWsmbs SUNWsmbskr for Sparc OSOL snv_134?

2011-12-18 Thread andy thomas
Does anyone know where I can still find the SUNWsmbs and SUNWsmbskr 
packages for the SPARC version of OpenSolaris? I wanted to experiment with 
ZFS/CIFS on my SPARC server but the ZFS share command fails with:


zfs set sharesmb=on tank1/windows
cannot share 'tank1/windows': smb add share failed

modinfo reports that the nsmb driver is loaded but I think smbsrv also 
needs to be loaded.
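For reference, these are the checks I'm working through (a hedged sketch using 
the standard SMF/module commands; the exact FMRIs may differ on snv_134):

modinfo | grep smbsrv        # is the kernel SMB server module loaded?
svcs -a | grep smb           # is network/smb/server present and enabled?
svcadm enable -r smb/server  # should work once SUNWsmbs and SUNWsmbskr are installed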


The available documentation suggests that SUNWsmbs and SUNWsmbskr need to 
be installed. My system has SUNWsmbfskr installed and according to pkginfo 
this provides 'SMB/CIFS File System client support (Kernel)' - is this the 
same package as SUNWsmbskr?


Thanks in advance for any suggestions,

Andy


[zfs-discuss] Failing disk(s) or controller in ZFS pool?

2012-02-14 Thread andy thomas
On one of our servers, we have a RAIDz1 ZFS pool called 'maths2' 
consisting of 7 x 300 GB disks, which in turn contains a single ZFS 
filesystem called 'home'.


Yesterday, using the 'ls' command to list the directories within this pool 
caused the command to hang for a long period, followed by an 'i/o 
error' message. 'zpool status -x maths2' reports the pool is healthy but 
'iostat -en' shows a rather different story:


root@e450:~# iostat -en
  ---- errors ---
  s/w h/w trn tot device
0   0   0   0 fd0
0   0   0   0 c2t3d0
0   0   0   0 c2t0d0
0   0   0   0 c2t1d0
0   0   0   0 c5t3d0
0   0   0   0 c4t0d0
0   0   0   0 c4t1d0
0   0   0   0 c2t2d0
0   0   0   0 c4t2d0
0   0   0   0 c4t3d0
0   0   0   0 c5t0d0
0   0   0   0 c5t1d0
0   0   0   0 c8t0d0
0   0   0   0 c8t1d0
0   0   0   0 c8t2d0
0 503 1658 2161 c9t0d0
0 2515 6260 8775 c9t1d0
0   0   0   0 c8t3d0
0 492 2024 2516 c9t2d0
0 444 1810 2254 c9t3d0
0   0   0   0 c5t2d0
0   1   0   1 rmt/2

Obviously it looks like controller c9 or the cabling associated with it is 
in trouble (the server is an Enterprise 450 with multiple disk 
controllers). On taking the server down and running the 'probe-scsi-all' 
command from the OBP, one disk c9t1d0 was reported as being faulty (no 
media present) but the others seemed fine.


After booting back up, I started scrubbing the maths2 pool and for a long 
time, only disk c9t1d0 reported it was being repaired. After a few hours, 
another disk on this controller reported being repaired:


NAME        STATE READ WRITE CKSUM
maths2  ONLINE   0 0 0
  raidz1-0  ONLINE   0 0 0
c5t2d0  ONLINE   0 0 0
c5t3d0  ONLINE   0 0 0
c8t3d0  ONLINE   0 0 0
c9t0d0  ONLINE   0 0 0  21K repaired
c9t1d0  ONLINE   0 0 0  938K repaired
c9t2d0  ONLINE   0 0 0
c9t3d0  ONLINE   0 0 0

errors: No known data errors

Now, does this point to a controller/cabling/backplane problem or could 
all 4 disks on this controller have been corrupted in some way? The O/S is 
OSOL snv_134 for SPARC and the server has been up and running for nearly a 
year with no problems to date - there are two other RAIDz1 pools on this 
server but these are working fine.
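For completeness, a hedged sketch of the extra checks I plan to run before 
swapping any hardware (standard Solaris iostat/FMA commands; the device names 
are taken from the listing above):

iostat -En c9t0d0 c9t1d0 c9t2d0 c9t3d0   # per-device hard/soft/transport error detail
fmdump -e                                # FMA error telemetry over time
fmadm faulty                             # anything FMA has actually diagnosed as faulted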


Andy

-
Andy Thomas,
Time Domain Systems

Tel: +44 (0)7866 556626
Fax: +44 (0)20 8372 2582
http://www.time-domain.co.uk


Re: [zfs-discuss] Failing disk(s) or controller in ZFS pool?

2012-02-14 Thread andy thomas

On Tue, 14 Feb 2012, Richard Elling wrote:


Hi Andy

On Feb 14, 2012, at 10:37 AM, andy thomas wrote:


On one of our servers, we have a RAIDz1 ZFS pool called 'maths2' consisting of 
7 x 300 GB disks, which in turn contains a single ZFS filesystem called 'home'.

Yesterday, using the 'ls' command to list the directories within this pool 
caused the command to hang for a long period, followed by an 'i/o error' 
message. 'zpool status -x maths2' reports the pool is healthy but 'iostat -en' 
shows a rather different story:

root@e450:~# iostat -en
  ---- errors ---
 s/w h/w trn tot device
   0   0   0   0 fd0
   0   0   0   0 c2t3d0
   0   0   0   0 c2t0d0
   0   0   0   0 c2t1d0
   0   0   0   0 c5t3d0
   0   0   0   0 c4t0d0
   0   0   0   0 c4t1d0
   0   0   0   0 c2t2d0
   0   0   0   0 c4t2d0
   0   0   0   0 c4t3d0
   0   0   0   0 c5t0d0
   0   0   0   0 c5t1d0
   0   0   0   0 c8t0d0
   0   0   0   0 c8t1d0
   0   0   0   0 c8t2d0
   0 503 1658 2161 c9t0d0
   0 2515 6260 8775 c9t1d0
   0   0   0   0 c8t3d0
   0 492 2024 2516 c9t2d0
   0 444 1810 2254 c9t3d0
   0   0   0   0 c5t2d0
   0   1   0   1 rmt/2

Obviously it looks like controller c9 or the cabling associated with it is in 
trouble (the server is an Enterprise 450 with multiple disk controllers). On 
taking the server down and running the 'probe-scsi-all' command from the OBP, 
one disk c9t1d0 was reported as being faulty (no media present) but the others 
seemed fine.


We see similar symptoms when a misbehaving disk (usually SATA) disrupts the
other disks in the same fault zone.


OK, I will replace the disk.


After booting back up, I started scrubbing the maths2 pool and for a long time, 
only disk c9t1d0 reported it was being repaired. After a few hours, another 
disk on this controller reported being repaired:

   NAME        STATE READ WRITE CKSUM
   maths2  ONLINE   0 0 0
 raidz1-0  ONLINE   0 0 0
   c5t2d0  ONLINE   0 0 0
   c5t3d0  ONLINE   0 0 0
   c8t3d0  ONLINE   0 0 0
   c9t0d0  ONLINE   0 0 0  21K repaired
   c9t1d0  ONLINE   0 0 0  938K repaired
   c9t2d0  ONLINE   0 0 0
   c9t3d0  ONLINE   0 0 0

errors: No known data errors

Now, does this point to a controller/cabling/backplane problem or could all 4 disks 
on this controller have been corrupted in some way? The O/S is OSOL snv_134 for 
SPARC and the server has been up and running for nearly a year with no problems 
to date - there are two other RAIDz1 pools on this server but these are working 
fine.


Not likely. More likely the faulty disk is causing issues elsewhere.


It seems odd that 'zpool status' is not reporting a degraded status and 
'zpool status -x' is still saying all pools are healthy. This is a 
little worrying, as I use remote monitoring to keep an eye on all the 
servers I admin (many of which run Solaris, OpenIndiana and FreeBSD) and 
one of the things checked every 15 minutes is the pool status, using 'zpool 
status -x'. But this seems to result in a false sense of security and I 
could be blissfully unaware that half a pool has dropped out!
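As a stop-gap, I'm thinking of extending that 15-minute check along these 
lines (a rough sketch only; the wording and thresholds are arbitrary):

#!/bin/sh
# flag anything other than a healthy report from zpool
zpool status -x | grep -v 'all pools are healthy' && echo "POOL PROBLEM"
# flag any device with a non-zero total error count in iostat -en
iostat -en | awk 'NR > 2 && $4 > 0 { print "DEVICE ERRORS:", $5, "tot=" $4 }'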



NB, for file and RAID systems that do not use checksums, such corruptions
can be catastrophic. Yea ZFS!


Yes indeed!

cheers, Andy

-
Andy Thomas,
Time Domain Systems

Tel: +44 (0)7866 556626
Fax: +44 (0)20 8372 2582
http://www.time-domain.co.uk


Re: [zfs-discuss] Server upgrade

2012-02-15 Thread andy thomas

On Wed, 15 Feb 2012, David Dyer-Bennet wrote:


While I'm not in need of upgrading my server at an emergency level, I'm
starting to think about it -- to be prepared (and an upgrade could be
triggered by a failure at this point; my server dates to 2006).


One of my most vital servers is a Netra 150 dating from 1997 - still going 
strong, crammed with 12 x 300 GB disks and running Solaris 9. I think one 
ought to have more faith in Sun hardware.


Andy


Re: [zfs-discuss] Server upgrade

2012-02-16 Thread andy thomas

On Thu, 16 Feb 2012, Edward Ned Harvey wrote:


From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of andy thomas

One of my most vital servers is a Netra 150 dating from 1997 - still going
strong, crammed with 12 x 300 GB disks and running Solaris 9. I think one
ought to have more faith in Sun hardware.


If it's one of your most vital, I think you should have less faith in Sun
hardware.
If it's one of your "nobody really cares, I can easily replace it" servers,
then... sounds good.  Keep it on as long as it's alive.


Well, it's used as an off-site backup server whose content is also 
mirrored to another Linux server internally. As all the Netra's disks are 
UFS, if I ever had a problem with it I'd just pull them all out, transfer 
them to an E450 and power that on in its place.


Andy

-
Andy Thomas,
Time Domain Systems

Tel: +44 (0)7866 556626
Fax: +44 (0)20 8372 2582
http://www.time-domain.co.uk


[zfs-discuss] Question about ZFS snapshots

2012-09-20 Thread andy thomas
I have a ZFS filesystem and create weekly snapshots over a period of 5 
weeks, called week01, week02, week03, week04 and week05 respectively. My 
question is: how do the snapshots relate to each other - does week03 
contain only the changes made since week02, or does it contain all the 
changes made since the first snapshot, week01, and therefore include those 
in week02?


To roll back to week03, it's necessary to delete snapshots week04 and 
week05 first, but what if week01 and week02 have also been deleted - will 
the rollback still work, or is it necessary to keep the earlier snapshots?
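To make sure I'm asking the right thing, here's a sketch of what I have in 
mind (the dataset name is made up):

zfs list -t snapshot -r tank/home    # shows week01 ... week05
zfs rollback -r tank/home@week03     # -r destroys the later week04 and week05 as part of the rollback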


Andy


[zfs-discuss] ZFS best practice for FreeBSD?

2012-10-11 Thread andy thomas
According to a Sun document called something like 'ZFS best practice' I 
read some time ago, best practice was to use the entire disk for ZFS and 
not to partition or slice it in any way. Does this advice hold good for 
FreeBSD as well?


I looked at a server earlier this week that was running FreeBSD 8.0 and 
had 2 x 1 TB SAS disks in a ZFS 13 mirror with a third identical disk as a 
spare. Large file I/O throughput was OK but the mail jail it hosted had 
periods when it was very slow when accessing lots of small files. All 
three disks (the two in the ZFS mirror plus the spare) had been 
partitioned with gpart so that partition 1 was a 6 GB swap and partition 2 
filled the rest of the disk and was of type 'freebsd-zfs'. It was these 
second partitions that were part of the mirror.


This doesn't sound like a very good idea to me, as surely disk seeks for 
swap and for ZFS file I/O are bound to clash, aren't they?


Another point about the Sun ZFS paper - it mentioned optimum performance 
would be obtained with RAIDz pools if the number of disks was between 3 
and 9. So I've always limited my pools to a maximum of 9 active disks plus 
spares but the other day someone here was talking of seeing hundreds of 
disks in a single pool! So what is the current advice for ZFS in Solaris 
and FreeBSD?


Andy


Re: [zfs-discuss] ZFS best practice for FreeBSD?

2012-10-12 Thread andy thomas

On Thu, 11 Oct 2012, Freddie Cash wrote:


On Thu, Oct 11, 2012 at 2:47 PM, andy thomas a...@time-domain.co.uk wrote:

According to a Sun document called something like 'ZFS best practice' I read
some time ago, best practice was to use the entire disk for ZFS and not to
partition or slice it in any way. Does this advice hold good for FreeBSD as
well?


Solaris disabled the disk cache if the disk was partitioned, thus the
recommendation to always use the entire disk with ZFS.

FreeBSD's GEOM architecture allows the disk cache to be enabled
whether you use the full disk or partition it.

Personally, I find it nicer to use GPT partitions on the disk.  That
way, you can start the partition at 1 MB (gpart add -b 2048 on 512B
disks, or gpart add -b 512 on 4K disks), leave a little wiggle-room
at the end of the disk, and use GPT labels to identify the disk (using
gpt/label-name for the device when adding to the pool).
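As a rough sketch of that layout on one disk (the device name da3 and the 
label disk-d1 are placeholders, and sizes/alignment would need checking 
against the actual drives):

gpart create -s gpt da3
gpart add -t freebsd-zfs -b 2048 -l disk-d1 da3    # start at 1 MB on 512-byte-sector disks
zpool create tank mirror gpt/disk-d1 gpt/disk-d2   # refer to the disks by their GPT labels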


This is apparently what had been done in this case:

gpart add -b 34 -s 600 -t freebsd-swap da0
gpart add -b 634 -s 1947525101 -t freebsd-zfs da1
gpart show

(stuff relating to a compact flash/SATA boot disk deleted)

=>      34  1953525101  da0  GPT  (932G)
  34 6001  freebsd-swap  (2.9G)
 634  19475251012  freebsd-zfs  (929G)

=>      34  1953525101  da2  GPT  (932G)
  34 6001  freebsd-swap  (2.9G)
 634  19475251012  freebsd-zfs  (929G)

=>      34  1953525101  da1  GPT  (932G)
  34 6001  freebsd-swap  (2.9G)
 634  19475251012  freebsd-zfs  (929G)


Is this a good scheme? The server has 12 GB of memory (upped from 4 GB last 
year after it kept crashing with out-of-memory reports on the console 
screen) so I doubt the swap would actually be used very often. Running 
Bonnie++ on this pool comes up with some very good results for sequential 
disk writes, but the latency of over 43 seconds for block reads is 
terrible and is obviously impacting performance as a mail server, as shown 
here:


Version  1.96   --Sequential Output-- --Sequential Input- --Random-
Concurrency   1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
hsl-main.hsl.of 24G    63  67 80584  20 70568  17   314  98 554226  60 410.1  13
Latency 77140us   43145ms   28872ms 171ms 212ms 232ms
Version  1.96   --Sequential Create-- Random Create
hsl-main.hsl.office -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
  files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
 16 19261  93 + +++ 18491  97 21542  92 + +++ 20691  94
Latency 15399us 488us 226us   27733us 103us 138us


The other issue with this server is that it needs to be rebooted every 8-10 
weeks as disk I/O slows to a crawl over time and the server becomes 
unusable. After a reboot, it's fine again. I'm told ZFS 13 on FreeBSD 8.0 
has a lot of problems, so I was planning to rebuild the server with FreeBSD 
9.0 and ZFS 28, but I didn't want to make any basic design mistakes in 
doing this.



Another point about the Sun ZFS paper - it mentioned optimum performance
would be obtained with RAIDz pools if the number of disks was between 3 and
9. So I've always limited my pools to a maximum of 9 active disks plus
spares but the other day someone here was talking of seeing hundreds of
disks in a single pool! So what is the current advice for ZFS in Solaris and
FreeBSD?


You can have multiple disks in a vdev, and you can have multiple vdevs in
a pool.  Thus, you can have hundreds of disks in a pool.  :)  Just
split the disks up into multiple vdevs, with each vdev under 9
disks.  :)  For example, we have 25 disks in the following pool,
but only 6 disks in each vdev (plus log/cache):


[root@alphadrive ~]# zpool list -v
NAME      SIZE  ALLOC   FREE   CAP  DEDUP    HEALTH  ALTROOT
storage  24.5T  20.7T  3.76T   84%  3.88x  DEGRADED  -
 raidz2   8.12T  6.78T  1.34T -
   gpt/disk-a1-  -  - -
   gpt/disk-a2-  -  - -
   gpt/disk-a3-  -  - -
   gpt/disk-a4-  -  - -
   gpt/disk-a5-  -  - -
   gpt/disk-a6-  -  - -
 raidz2   5.44T  4.57T   888G -
   gpt/disk-b1-  -  - -
   gpt/disk-b2-  -  - -
   gpt/disk-b3-  -  - -
   gpt/disk-b4-  -  - -
   gpt/disk-b5-  -  - -
   gpt/disk-b6-  -  - -
 raidz2   5.44T  4.60T   863G -
   gpt/disk-c1
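A rough sketch of how a layout like that gets created and later grown, for 
anyone following along (the disk labels are placeholders):

# create a pool with two 6-disk raidz2 vdevs
zpool create storage \
    raidz2 gpt/disk-a1 gpt/disk-a2 gpt/disk-a3 gpt/disk-a4 gpt/disk-a5 gpt/disk-a6 \
    raidz2 gpt/disk-b1 gpt/disk-b2 gpt/disk-b3 gpt/disk-b4 gpt/disk-b5 gpt/disk-b6
# grow it later by adding another whole vdev
zpool add storage raidz2 gpt/disk-c1 gpt/disk-c2 gpt/disk-c3 gpt/disk-c4 gpt/disk-c5 gpt/disk-c6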

Re: [zfs-discuss] ZFS best practice for FreeBSD?

2012-10-12 Thread andy thomas

On Thu, 11 Oct 2012, Richard Elling wrote:


On Oct 11, 2012, at 2:58 PM, Phillip Wagstrom phillip.wagst...@gmail.com 
wrote:



On Oct 11, 2012, at 4:47 PM, andy thomas wrote:


According to a Sun document called something like 'ZFS best practice' I read 
some time ago, best practice was to use the entire disk for ZFS and not to 
partition or slice it in any way. Does this advice hold good for FreeBSD as 
well?


My understanding of the best practice was that with Solaris prior to 
ZFS, it disabled the volatile disk cache.


This is not quite correct. If you use the whole disk, ZFS will attempt to enable 
the write cache. To understand why, remember that UFS (and ext, by default) can 
die a horrible death (+ fsck) if there is a power outage and cached data has not 
been flushed to disk. So Sun shipped some disks with write cache disabled by 
default. Non-Sun disks are most often shipped with write cache enabled, and the 
most popular file systems (NTFS) properly issue cache flush requests as needed 
(for the same reason ZFS issues cache flush requests).


Out of interest, how do you enable the write cache on a disk? I recently 
replaced a failing Dell-branded disk on a Dell server with an HP-branded 
disk (both disks were the identical Seagate model) and on running the EFI 
diagnostics just to check all was well, it reported the write cache was 
disabled on the new HP disk but enabled on the remaining Dell disks in the 
server. I couldn't see any way of enabling the cache from the EFI diags so 
I left it as it was - probably not ideal.
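For my own notes, a hedged sketch of how I'd check and change this from 
Solaris using format's expert mode (the exact menu entries vary by disk and 
firmware, and on a hardware RAID controller it usually has to be done through 
the controller's own configuration instead):

format -e                # select the disk from the menu, then:
format> cache
cache> write_cache
write_cache> display     # show the current setting
write_cache> enable      # turn it on (SCSI/SAS disks)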



With ZFS, the disk cache is used, but after every transaction a cache-flush 
command is issued to ensure that the data made it to the platters.


Write cache is flushed after uberblock updates and for ZIL writes. This is 
important for uberblock updates, so the uberblock doesn't point to a garbaged 
MOS. It is important for ZIL writes, because they must be guaranteed written 
to media before ack.


Thanks for the explanation, that all makes sense now.

Andy


If you slice the disk, enabling the disk cache for the whole disk is dangerous 
because other file systems (meaning UFS) wouldn't do the cache flush and there 
was a risk of data loss should the cache fail due to, say, a power outage.
Can't speak to how BSD deals with the disk cache.


I looked at a server earlier this week that was running FreeBSD 8.0 and had 2 x 
1 TB SAS disks in a ZFS 13 mirror with a third identical disk as a spare. Large 
file I/O throughput was OK but the mail jail it hosted had periods when it was 
very slow when accessing lots of small files. All three disks (the two in the 
ZFS mirror plus the spare) had been partitioned with gpart so that partition 1 
was a 6 GB swap and partition 2 filled the rest of the disk and was of type 
'freebsd-zfs'. It was these second partitions that were part of the mirror.

This doesn't sound like a very good idea to me, as surely disk seeks for swap 
and for ZFS file I/O are bound to clash, aren't they?


It surely would make a slow, memory starved swapping system even 
slower.  :)


Another point about the Sun ZFS paper - it mentioned optimum performance would 
be obtained with RAIDz pools if the number of disks was between 3 and 9. So 
I've always limited my pools to a maximum of 9 active disks plus spares but the 
other day someone here was talking of seeing hundreds of disks in a single 
pool! So what is the current advice for ZFS in Solaris and FreeBSD?


That number was drives per vdev, not per pool.

-Phil


--

richard.ell...@richardelling.com
+1-760-896-4422


-
Andy Thomas,
Time Domain Systems

Tel: +44 (0)7866 556626
Fax: +44 (0)20 8372 2582
http://www.time-domain.co.uk