Re: [zfs-discuss] Maximum zfs send/receive throughput

2010-06-25 Thread Thomas Maier-Komor
On 25.06.2010 14:32, Mika Borner wrote:
 
 It seems we are hitting a boundary with zfs send/receive over a network
 link (10Gb/s). We can see peak values of up to 150MB/s, but on average
 about 40-50MB/s are replicated. This is far below the bandwidth that
 a 10Gb link can offer.
 
 Is it possible that ZFS is giving replication too low a priority or
 throttling it too much?
 
 

You can probably improve overall performance by using mbuffer [1] to
stream the data over the network. At least some people have reported
increased performance. mbuffer buffers the data stream and decouples
zfs send operations from network latency.

Get it here:
original source: http://www.maier-komor.de/mbuffer.html
binary package:  http://www.opencsw.org/packages/CSWmbuffer/
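
For reference, the basic invocation is along these lines (hostnames, port,
buffer size, and dataset names below are placeholders to adapt to your setup):

receiver$ mbuffer -s 128k -m 200M -I sender:8000 | zfs receive tank/replica
sender$   zfs send tank/fs@snap | mbuffer -s 128k -m 200M -O receiver:8000

The receiving side has to be started first so that the sender can connect to it.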

- Thomas


Re: [zfs-discuss] zfs corruptions in pool

2010-06-06 Thread Thomas Maier-Komor
On 06.06.2010 08:06, devsk wrote:
 I had an unclean shutdown because of a hang and suddenly my pool is degraded 
 (I realized something is wrong when python dumped core a couple of times).
 
 This is before I ran scrub:
 
   pool: mypool
  state: DEGRADED
 status: One or more devices has experienced an error resulting in data
 corruption.  Applications may be affected.
 action: Restore the file in question if possible.  Otherwise restore the
 entire pool from backup.
see: http://www.sun.com/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 0h7m with 0 errors on Mon May 31 09:00:27 2010
 config:
 
  NAME        STATE     READ WRITE CKSUM
  mypool      DEGRADED     0     0     0
    c6t0d0s0  DEGRADED     0     0     0  too many errors
 
 errors: Permanent errors have been detected in the following files:
 
 mypool/ROOT/May25-2010-Image-Update:0x3041e
 mypool/ROOT/May25-2010-Image-Update:0x31524
 mypool/ROOT/May25-2010-Image-Update:0x26d24
 mypool/ROOT/May25-2010-Image-Update:0x37234
 //var/pkg/download/d6/d6be0ef348e3c81f18eca38085721f6d6503af7a
 mypool/ROOT/May25-2010-Image-Update:0x25db3
 //var/pkg/download/cb/cbb0ff02bcdc6649da3763900363de7cff78ec72
 mypool/ROOT/May25-2010-Image-Update:0x26cf6
 
 
 I ran scrub and this is what it has to say afterwards.
 
   pool: mypool
  state: DEGRADED
 status: One or more devices has experienced an unrecoverable error.  An
 attempt was made to correct the error.  Applications are unaffected.
 action: Determine if the device needs to be replaced, and clear the errors
 using 'zpool clear' or replace the device with 'zpool replace'.
see: http://www.sun.com/msg/ZFS-8000-9P
  scan: scrub repaired 0 in 0h11m with 0 errors on Sat Jun  5 22:43:54 2010
 config:
 
  NAME        STATE     READ WRITE CKSUM
  mypool      DEGRADED     0     0     0
    c6t0d0s0  DEGRADED     0     0     0  too many errors
 
 errors: No known data errors
 
 A few questions:
 
 1. Have the errors really gone away? Can I just clear and be content that 
 errors are really gone?
 
 2. Why did the errors occur anyway if ZFS guarantees on-disk consistency? I 
 wasn't writing anything. Those files were definitely not being touched when 
 the hang and unclean shutdown happened.
 
 I mean I don't mind if I create or modify a file and it doesn't land on disk 
 because an unclean shutdown happened, but a bunch of unrelated files getting 
 corrupted is sort of painful to digest.
 
 3. The action says Determine if the device needs to be replaced. How the 
 heck do I do that?


Is it possible that this system runs in VirtualBox? At least I've
seen such a thing happen on a VirtualBox guest, but never on a real machine.

The reason why the errors have gone away might be that metadata is kept
in multiple copies (three, IIRC). So if your disk only had corruption in the
metadata area, these errors can be repaired by scrubbing the pool.

The smartmontools might help you figure out whether the disk is broken. But
if you only had an unexpected shutdown and everything is now clean after
a scrub, I wouldn't expect the disk to be broken. You can get the
smartmontools from opencsw.org.
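
As a rough sketch, checking the disk and then clearing the pool state could
look like this (the device path is taken from the zpool output above; smartctl
may need an additional -d option depending on the controller):

# smartctl -a /dev/rdsk/c6t0d0s0    # SMART health status and error counters
# zpool clear mypool                # reset the error counters once you trust the disk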

If your system is really running on VirtualBox, I'd recommend that you
turn off VirtualBox's disk write caching. Search the OpenSolaris forum
of VirtualBox; there is an article somewhere on how to do this. IIRC the
subject is something like 'zfs pool corruption'. But it is also covered
somewhere in the docs.
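
If memory serves, the relevant knob is VirtualBox's IgnoreFlush setting, which
makes the hypervisor honor the guest's cache flush requests; something along
these lines (VM name and controller/LUN path are placeholders, so please verify
against the VirtualBox documentation):

$ VBoxManage setextradata "MyOpenSolarisVM" \
    "VBoxInternal/Devices/piix3ide/0/LUN#0/Config/IgnoreFlush" 0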

HTH,
Thomas


Re: [zfs-discuss] Fileserver help.

2010-04-13 Thread Thomas Maier-Komor
On 13.04.2010 10:12, Ian Collins wrote:
 On 04/13/10 05:47 PM, Daniel wrote:
 Hi all.

 I'm pretty new to the whole OpenSolaris thing; I've been doing a bit of
 research but can't find anything on what I need.

 I am thinking of making myself a home file server running OpenSolaris
 with ZFS and utilizing RAID-Z.

 I was wondering if there is anything I can get that will allow Windows
 Media Center based hardware (HTPC or XBOX 360) to stream from my new
 fileserver?

 Any help is appreciated, and remember I'm new :)

 OpenSolaris has a native CIFS service, which enables sharing filesystems
 to Windows clients.
 
 I used this blog entry to setup my windows shares:
 
 http://blogs.sun.com/timthomas/entry/solaris_cifs_in_workgroup_mode
 
 With OpenSolaris, you can get the SMB server with the package manager GUI.
 

I guess Daniel is actually looking for a UPnP Media Server [1] like
ushare or coherence that is able to transcode media files and hand them
out to streaming clients.

I have been trying to get this up and running on a Solaris 10 based
SPARC box, but I had no luck. I am not sure whether the problem is my
streaming client (Philips TV), because my FritzBox, which has a
streaming server, is also not always visible on the Philips TV. But the
software running on the Solaris box never showed up as a service
provider on the TV...

Anyway, this was on Solaris 10, and I didn't bother too much to get it
set up and running on OpenSolaris. There might even be a package
available in the repository. Just look for the candidates like ushare,
coherence, and of course libupnp. If those aren't available, you'll have
to build them by hand; I guess this will also require some portability work.
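
A quick way to check the repository would be something like this (the package
names are just the candidates mentioned above, not confirmed to exist):

$ pkg search -r ushare
$ pkg search -r coherence
$ pkg search -r libupnp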

Cheers,
Thomas


[1] http://en.wikipedia.org/wiki/UPnP_AV_MediaServers


Re: [zfs-discuss] Reclaiming Windows Partitions

2010-04-07 Thread Thomas Maier-Komor
On 07.04.2010 18:05, Ron Marshall wrote:
 I finally decided to get rid of my Windows XP partition as I rarely used it 
 except to fire it up to install OS updates and virus signatures.  I had some 
 trouble locating information on how to do this so I thought I'd document it 
 here.
 
 My system is a Toshiba Tecra M9.  It had four partitions on it. 
 
 Partition 1 - NTFS Windows XP OS (Drive C:)
 Partition 2 - NTFS Windows data partition (D:)
 Partition 3 - FAT32
 Partition 4 - Solaris2
 
 Partitions 1 and 2 were laid down by my company's standard OS install.  I had 
 shrunk these using QTparted to enable me to install OpenSolaris.
 
 Partition 3 was setup to have a common file system mountable by OpenSolaris 
 and Windows.  There may be ways to do this with NTFS now, but this was a 
 legacy from older Solaris installs.
 
 Partition 4 is my OpenSolaris ZFS install
 
 Step 1) Backed up all my data from Partition 3, and any files I needed from 
 Partitions 1 and 2.  I also had a current snapshot of my OpenSolaris partition 
 (Partition 4).
 
 Step 2) Delete Partitions 1, 2, and 3.  I did this using the fdisk option in 
 format under OpenSolaris.
 
format - Select Disk 0 (make note of the short drive name alias, mine was 
 c4t0d0)
 
 You will receive a warning something like this;
 [disk formatted]
 /dev/dsk/c4t0d0s0 is part of active ZFS pool rpool. Please see zpool(1M)
 
 Then select fdisk from the FORMAT MENU
 
 You will see something like this;
 
 Total disk size is 14593 cylinders
  Cylinder size is 16065 (512 byte) blocks
 
   
                                                Cylinders
     Partition   Status    Type           Start    End   Length    %
     =========   ======    ===========    =====  =====   ======   ===
         1                 FAT32LBA           x     xx
         2                 FAT32LBA                 xx
         3                 Win95 FAT32     5481   8157     2677    18
         4       Active    Solaris2        8158  14579     6422    44
 
 
 
 SELECT ONE OF THE FOLLOWING:
1. Create a partition
2. Specify the active partition
3. Delete a partition
4. Change between Solaris and Solaris2 Partition IDs
5. Edit/View extended partitions
6. Exit (update disk configuration and exit)
7. Cancel (exit without updating disk configuration)
 Enter Selection: 
 
 Delete the partitions 1, 2 and 3 (Don't forget to back them up before you do 
 this)
 
 Using the fdisk menu create a new Solaris2 partition for use by ZFS.  When 
 you are done you should see something like this;
 
Cylinder size is 16065 (512 byte) blocks
 
                                                Cylinders
     Partition   Status    Type           Start    End   Length    %
     =========   ======    ===========    =====  =====   ======   ===
         1                 Solaris2           1   8157     8157    56
         4       Active    Solaris2        8158  14579     6422    44
 
 Exit and update the disk configuration.
 
 
 Step 3) Create the ZFS pool
 
 First you can test if zpool will be successful in creating the pool by using 
 the -n option;
 
  zpool create -n datapool c4t0d0p1  (I will make some notes about this 
 disk name at the end)
 
 Should report something like;
 
 would create 'datapool' with the following layout:
 
   datapool
   c4t0d0p1
 
 By default the zpool command will make a mount point in your root (/) with 
 the same name as your pool.  If you don't want this, you can change it in 
 the create command (see the man page for details).
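
 For example, something like this would set an alternate mount point at 
 creation time (the path is only an illustration):

  zpool create -m /export/data datapool c4t0d0p1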
 
 
 Now issue the command without the -n option;
 
zpool create  datapool c4t0d0p1
 
 Now check to see if it is there;
 
   zpool list
 
 It should report something like this;
 
 NAME       SIZE   ALLOC   FREE    CAP    DEDUP   HEALTH   ALTROOT
 datapool   62G    30.7G   31.3G   49%    1.06x   ONLINE   -
 rpool      49G    43.4G   5.65G   88%    1.00x   ONLINE   -
 
 Step 4) Remember to take any of the mount parameters out of your /etc/vfstab 
 file.
 
 You should be good to go at this point. 
 ==
 Notes about disk/partition naming;
 
 In my case the disk is called c4t0d0.  So how did I come up with c4t0d0p1?
 
 The whole disk name is c4t0d0p0.  Each partition has the following naming 
 convention;
 
 Partition 1 = c4t0d0p1
 Partition 2 = c4t0d0p2
 Partition 3 = c4t0d0p3
 Partition 4 = c4t0d0p4
 
 The fdisk  command does not 

Re: [zfs-discuss] unionfs help

2010-02-04 Thread Thomas Maier-Komor
On 04.02.2010 12:12, dick hoogendijk wrote:
 
 Frank Cusack wrote:
 Is it possible to emulate a unionfs with zfs and zones somehow?  My zones
 are sparse zones and I want to make part of /usr writable within a zone.
 (/usr/perl5/mumble to be exact)
 
 Why don't you just export that directory with NFS (rw) to your sparse zone
 and mount it on /usr/perl5/mumble ? Or is this too simple a thought?
 
What about lofs? I think lofs is the closest equivalent to unionfs on Solaris.

E.g.

mount -F lofs /original/path /my/alternate/mount/point
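
If the mount should be part of the zone configuration itself, a lofs filesystem
can also be added via zonecfg; a rough sketch with placeholder zone and path names:

# zonecfg -z myzone
zonecfg:myzone> add fs
zonecfg:myzone:fs> set dir=/usr/perl5/mumble
zonecfg:myzone:fs> set special=/export/zone-writable/perl5-mumble
zonecfg:myzone:fs> set type=lofs
zonecfg:myzone:fs> end
zonecfg:myzone> commit
zonecfg:myzone> exit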

- Thomas



Re: [zfs-discuss] cannot attach c5d0s0 to c4d0s0: device is too small

2010-01-28 Thread Thomas Maier-Komor
On 28.01.2010 15:55, dick hoogendijk wrote:
 
 Cindy Swearingen wrote:
 
 On some disks, the default partitioning is not optimal and you have to
 modify it so that the bulk of the disk space is in slice 0.
 
 Yes, I know, but in this case the second disk indeed is smaller ;-(
 So I wonder, should I reinstall the whole thing on this smaller disk and
 then let the bigger second disk attach? That would mean opening up the case
 and all that, because I don't have a DVD drive built in.
 So I thought I'd go the zfs send|recv way. What are your thoughts about this?
 
 Another thought is that a recent improvement was that you can attach a
 disk that is an equivalent size, but not exactly the same geometry.
 Which OpenSolaris release is this?
 
 b131
 And this only works if the difference is really (REALLY) small. :)
 

Have you considered creating an alternate boot environment on the
smaller disk, rebooting into this new boot environment, and then
attaching the larger disk after destroying the old boot environment?

beadm might do this job for you...
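
A very rough sketch of such a sequence (pool and BE names are placeholders;
please double-check each step against the beadm and zpool man pages, and make
sure the new pool is bootable, before relying on it):

# zpool create newrpool c5d0s0           # pool on the smaller disk
# beadm create -p newrpool newBE         # copy the current BE into the new pool
# beadm activate newBE
# reboot
# beadm destroy oldBE                    # once the new BE is known to work
# zpool destroy rpool
# zpool attach newrpool c5d0s0 c4d0s0    # mirror the larger disk back in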


Re: [zfs-discuss] ZFS dedup clarification

2009-11-27 Thread Thomas Maier-Komor
Chavdar Ivanov wrote:
 Hi,
 
 I BFUd successfully snv_128 over snv_125:
 
 ---
 # cat /etc/release 
   Solaris Express Community Edition snv_125 X86
Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
 Use is subject to license terms.
 Assembled 05 October 2009
 # uname -a 
 SunOS cheeky 5.11 snv_128 i86pc i386 i86pc
 ...
 
 being impatient to test zfs dedup. I was able to set dedup=on (I presume with 
 the default sha256 key) on a few filesystems and did the following trivial 
 test (this is an edited script session):
 
 Script started on Wed Oct 28 09:38:38 2009
 # zfs get dedup rpool/export/home
 NAME   PROPERTY  VALUE SOURCE
 rpool/export/home  dedup onlocal
 # for i in 1 2 3 4 5 ; do mkdir /export/home/d${i} && df -k 
 /export/home/d${i} && zfs get used rpool/export/home && cp /testfile 
 /export/home/d${i}; done 
 Filesystem           kbytes      used    avail  capacity  Mounted on
 rpool/export/home  17418240        27  6063425        1%  /export/home
 NAME               PROPERTY  VALUE  SOURCE
 rpool/export/home  used      27K    -
 Filesystem           kbytes      used    avail  capacity  Mounted on
 rpool/export/home  17515512    103523  6057381        2%  /export/home
 NAME               PROPERTY  VALUE  SOURCE
 rpool/export/home  used      102M   -
 Filesystem           kbytes      used    avail  capacity  Mounted on
 rpool/export/home  17682840    271077  6056843        5%  /export/home
 NAME               PROPERTY  VALUE  SOURCE
 rpool/export/home  used      268M   -
 Filesystem           kbytes      used    avail  capacity  Mounted on
 rpool/export/home  17852184    442345  6054919        7%  /export/home
 NAME               PROPERTY  VALUE  SOURCE
 rpool/export/home  used      432M   -
 Filesystem           kbytes      used    avail  capacity  Mounted on
 rpool/export/home  17996580    587996  6053933        9%  /export/home
 NAME               PROPERTY  VALUE  SOURCE
 rpool/export/home  used      574M   -
 # zfs get all rpool/export/home
 NAME               PROPERTY         VALUE                  SOURCE
 rpool/export/home  type             filesystem             -
 rpool/export/home  creation         Mon Sep 21  9:27 2009  -
 rpool/export/home  used             731M                   -
 rpool/export/home  available        5.77G                  -
 rpool/export/home  referenced       731M                   -
 rpool/export/home  compressratio    1.00x                  -
 rpool/export/home  mounted          yes                    -
 rpool/export/home  quota            none                   default
 rpool/export/home  reservation      none                   default
 rpool/export/home  recordsize       128K                   default
 rpool/export/home  mountpoint       /export/home           inherited from rpool/export
 rpool/export/home  sharenfs         off                    default
 rpool/export/home  checksum         on                     default
 rpool/export/home  compression      off                    default
 rpool/export/home  atime            on                     default
 rpool/export/home  devices          on                     default
 rpool/export/home  exec             on                     default
 rpool/export/home  setuid           on                     default
 rpool/export/home  readonly         off                    default
 rpool/export/home  zoned            off                    default
 rpool/export/home  snapdir          hidden                 default
 rpool/export/home  aclmode          groupmask              default
 rpool/export/home  aclinherit       restricted             default
 rpool/export/home  canmount         on                     default
 rpool/export/home  shareiscsi       off                    default
 rpool/export/home  xattr            on                     default
 rpool/export/home  copies           1                      default
 rpool/export/home  version          4                      -
 rpool/export/home  utf8only         off                    -
 rpool/export/home  normalization    none                   -
 rpool/export/home  casesensitivity  sensitive              -
 rpool/export/home  vscan            off                    default
 rpool/export/home  nbmand           off                    default
 rpool/export/home  sharesmb         off                    default
 rpool/export/home  refquota         none                   default
 rpool/export/home  refreservation   none                   default
 rpool/export/home  primarycache     all                    default
 rpool/export/home  secondarycache   all                    default
 rpool/export/home  usedbysnapshots  0                      -
 rpool/export/home  usedbydataset 

Re: [zfs-discuss] ZFS dedup clarification

2009-11-27 Thread Thomas Maier-Komor
Michael Schuster wrote:
 Thomas Maier-Komor wrote:
 
 Script started on Wed Oct 28 09:38:38 2009
 # zfs get dedup rpool/export/home
 NAME   PROPERTY  VALUE SOURCE
 rpool/export/home  dedup onlocal
 # for i in 1 2 3 4 5 ; do mkdir /export/home/d${i}  df -k
 /export/home/d${i}  zfs get used rpool/export/home  cp /testfile
 /export/home/d${i}; done 
 
 

 as far as I understood it, the dedup works during writing, and won't
 deduplicate already written data (this is planned for a later release).
 
 isn't he doing just that (writing, that is)?
 
 Michael

Oh - I overlooked that one line...
Maybe zfs's 'used' property returns the accumulated (logical) usage, and only
zpool shows the real usage?
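
One way to check that hypothesis would be to compare against the pool-level
numbers, which are reported after deduplication (pool name as in the example
above):

# zpool list rpool
# zpool get dedupratio rpool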

- Thomas



[zfs-discuss] raidz-1 vs mirror

2009-11-11 Thread Thomas Maier-Komor
Hi everybody,

I am considering moving my data pool from a two disk (10krpm) mirror
layout to a three disk raidz-1. This is just a single user workstation
environment, where I mostly perform compile jobs. From past experiences
with raid5 I am a little bit reluctant to do so, as software raid5 has a
major impact on write performance.

Is this similar with raidz-1, or does the zfs stack work around the
limitations that raid5 brings into play? How big would the penalty be?

As an alternative I could swap the drives for bigger ones - but these
would probably then be 7.2k rpm disks, because of cost.

Any experiences or thoughts?

TIA,
Thomas


Re: [zfs-discuss] zpool resilver - error history

2009-11-09 Thread Thomas Maier-Komor
Marcel Gschwandl wrote:
 Hi all!
 
 I'm running a Solaris 10 Update 6 (10/08) system and had to resilver a zpool. 
 It's now showing 
 
 snip
 scrub: resilver completed after 9h0m with 21 errors on Wed Nov  4 22:07:49 
 2009
 /snip
 
 but I haven't found an option to see what files were affected. Is there any 
 way to do that?
 
 Thanks in advance
 
 Marcel

Try

zpool status -v poolname


[zfs-discuss] zdb assertion failure/zpool recovery

2009-08-02 Thread Thomas Maier-Komor

Hi,

I have a corrupt pool, which lives in a .vdi file of a VirtualBox VM. IIRC
the corruption (i.e. the pool no longer being importable) was caused when I
killed VirtualBox, because it was hung.

This pool consists of a single vdev and I would really like to get some
files out of that thing. So I tried running zdb, but this fails with an
assertion failure:

Assertion failed: object_count == usedobjs (0xce == 0xcd), file
../zdb.c, line 1215
Abort (core dumped)

The core file consists of 107 threads. The thread with the assertion
failure has the following stack trace:
- -  lwp# 1 / thread# 1  
 d2af1997 _lwp_kill (1, 6, 8047638, d2a9ab6e) + 7
 d2a9ab7a raise(6, 0, 8047688, d2a71fea) + 22
 d2a7200a abort(65737341, 6f697472, 6166206e, 64656c69, 626f203a,
7463656a) + f2
 d2a7225a _assert  (80478b0, 8062eec, 4bf, 1) + 82
 08057116 dump_dir (82cadd8, 8047dac, 805aff8, 0) + 33e
 08058b4f dump_zpool (81740c0, 8047dac) + 93
 0805a1d8 main (0, 8047e0c, 8047e24, d2bfc7b4) + 598
 08053d1d _start   (5, 8047ec4, 8047ec8, 8047ecb, 8047ed0, 8047ed3) + 7d


So to me this looks like the object_count of a directory is
inconsistent. Any idea or hint what I could do now?

I read that there is some utility to roll back the pool for simple
(mirror) setups. This setup is even simpler, as it consists of a
single vdev. So I would like to try it out.

Does anybody know where I can get the tool, or how I could use zdb in
this situation to roll back the pool?

TIA,
Thomas


Re: [zfs-discuss] [osol-discuss] zdb assertion failure/zpool recovery

2009-08-02 Thread Thomas Maier-Komor
Thomas Maier-Komor wrote:
 Hi,
 
 I have a corrupt pool, which lives on a .vdi file of a VirtualBox. IIRC
 the corruption (i.e. pool being not importable) was caused when I killed
 virtual box, because it was hung.
 
 This pool consists of a single vdev and I would really like to get some
 files out of that thing. So I tried running zdb, but this fails with an
 assertion failure:
 
 Assertion failed: object_count == usedobjs (0xce == 0xcd), file
 ../zdb.c, line 1215
 Abort (core dumped)
 
 The core file consists of 107 threads. The thread with the assertion
 failure has the following stack trace:
 -  lwp# 1 / thread# 1  
  d2af1997 _lwp_kill (1, 6, 8047638, d2a9ab6e) + 7
  d2a9ab7a raise(6, 0, 8047688, d2a71fea) + 22
  d2a7200a abort(65737341, 6f697472, 6166206e, 64656c69, 626f203a,
 7463656a) + f2
  d2a7225a _assert  (80478b0, 8062eec, 4bf, 1) + 82
  08057116 dump_dir (82cadd8, 8047dac, 805aff8, 0) + 33e
  08058b4f dump_zpool (81740c0, 8047dac) + 93
  0805a1d8 main (0, 8047e0c, 8047e24, d2bfc7b4) + 598
  08053d1d _start   (5, 8047ec4, 8047ec8, 8047ecb, 8047ed0, 8047ed3) + 7d
 
 
 So for me this looks like the object_count of a directory is
 inconsistent. Any idea or hint what I could do now?
 
 I read that there is some utility to roolback the pool for simple
 (mirror) setups. This setup is even more simple as it consists of a
 single vdev. So I would like to try it out.
 
 Does anybody know, where I can get the tool, or how I could use zdb in
 this situation to rollback the pool?
 
 TIA,
 Thomas

I've searched the web some more and came across
http://www.opensolaris.org/jive/thread.jspa?threadID=85794

The posting by nhand gave me the information I needed to get my pool up
and running again.

Thanks!

- Thomas



[zfs-discuss] assertion failure

2009-07-17 Thread Thomas Maier-Komor
Hi,

I am having trouble with my OpenSolaris installation in a VirtualBox VM. It
refuses to boot and produces the following crash dump:

panic[cpu0]/thread=d5a3edc0: assertion failed: 0 ==
dmu_buf_hold_array(os, object, offset, size, FALSE, FTAG, numbufs,
dbp), file: ../../common/fs/zfs/dmu.c, line: 614

d5a3eb08 genunix:assfail+5a (f9ce09da4, f9ce0a9c)
d5a3eb68 zfs:dmu_write+1a0 (d55af620, 57, 0, ba)
d5a3ec08 zfs:space_map_sync+304 (d5f13ed4, 1, d5f13c)
d5a3ec7b zfs:metaslab_sync+284 (d5f1ecc0, 122f3, 0,)
d5a3ecb8 zfs:vdev_sync+c6 (d579d940, 122f3,0)
d5a3ed28 zfs:spa_sync+3d0 (d579c980, 122f3,0,)
d5a3eda8 zfs:txg_sync_thread+308 (d55045c0, 0)
d5a3edb8 unix:thread_start+8 ()

This is on snv_117 32-bit

Is this a known issue? Any workarounds?

- Thomas


Re: [zfs-discuss] Increase size of ZFS mirror

2009-06-24 Thread Thomas Maier-Komor
dick hoogendijk wrote:
 On Wed, 24 Jun 2009 03:14:52 PDT
 Ben no-re...@opensolaris.org wrote:
 
 If I detach c5d1s0, add a 1TB drive, attach that, wait for it to
 resilver, then detach c5d0s0 and add another 1TB drive and attach
 that to the zpool, will that up the storage of the pool?
 
 That will do the trick perfectly. I just did the same last week ;-)
 

Doesn't the detach command render the detached disk a disk
unassociated with a pool? I think it might be better to import the
pool with only one half of the mirror without detaching the disk, and
then do a zpool replace. In that case, if something goes wrong during the
resilver, you still have the other half of the mirror to bring your pool
back up again. If you detach the disk upfront, this won't be possible.
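
In the scenario from the quoted mail that would look roughly like this (c5d1s0
is the old half being swapped out; the name of the new 1TB drive is a
placeholder):

# zpool replace tank c5d1s0 c6d0s0   # replace the old 500GB half with the new 1TB drive
# zpool status tank                  # wait for the resilver to finish before touching c5d0s0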

Just an idea...

- Thomas


Re: [zfs-discuss] SPARC SATA, please.

2009-06-23 Thread Thomas Maier-Komor
Andre van Eyssen wrote:
 On Mon, 22 Jun 2009, Jacob Ritorto wrote:
 
 Is there a card for OpenSolaris 2009.06 SPARC that will do SATA
 correctly yet?  Need it for a super cheapie, low expectations,
 SunBlade 100 filer, so I think it has to be notched for 5v PCI slot,
 iirc. I'm OK with slow -- main goals here are power saving (sleep all
 4 disks) and 1TB+ space.  Oh, and I hate to be an old head, but I
 don't want a peecee.  They still scare me :)  Thinking root pool on
 16GB ssd, perhaps, so the thing can spin down the main pool and idle
 *really* cheaply..
 
 The LSI SAS controllers with SATA ports work nicely with SPARC. I have
 one in my V880. On a Blade-100, however, you might have some issues due
 to the craptitude of the PCI slots.
 
 To be honest, the Grover was a fun machine at the time, but I think that
 time may have passed.
 
 Oh, and if you do grab the LSI card, don't let James catch you using the
 itmpt driver or lsiutils ;-)
 

I'm also using an LSI SAS card for attaching SATA disks to a Blade 2500.
In my experience there are some severe problems:
1) Once the disks spin down due to idleness it can become impossible to
reactivate them without doing a full reboot (i.e. hot plugging won't help).
2) Disks that were attached once leave a stale /dev/dsk entry behind
that takes a full 7 seconds to stat(), with the kernel running at 100%.

Apart from that it works fine.

- Thomas


Re: [zfs-discuss] SPARC SATA, please.

2009-06-23 Thread Thomas Maier-Komor
Volker A. Brandt wrote:
 2) disks that were attached once leave a stale /dev/dsk entry behind
 that takes full 7 seconds to stat() with kernel running at 100%.
 Such entries should go away with an invocation of devfsadm -vC.
 If they don't, it's a bug IMHO.
 yes, they go away. But the problem is when you do this and replug the
 disks they don't show up again... And that's even worse IMO...
 
 So you want such disks to behave more like USB sticks?  If there was
 a good way to mark certain devices or a device tree as volatile
 then this would be an interesting RFE.  I would certainly not want
 *all* of my disks to come and go as they please. :-)
 
 I am not sure how feasible an implementation would be though.
 
 
 Regards -- Volker

yes - that's my usage scenario. Or to be more precise, I have a small
chassis with two disks, which I only want to attach for backup purposes.
I just send/receive from my active pool to the backup pool, and then
detach the backup pool. I just like having the backup disks physically
detached when not in use. That way, nothing can really screw them up
except a fire in the room...
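
For reference, that backup cycle boils down to something like this (pool and
snapshot names are placeholders):

# zpool import backup                              # after attaching the backup chassis
# zfs send -R tank@today | zfs receive -d backup
# zpool export backup                              # then detach / power off the disks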

I thought SAS/SATA would be hot-pluggable - so what's the difference
between USB's hot-plug feature and the one of SAS/SATA, other than that
USB is handled by the volume manager?

So, yes, it would be nice if one could assign a SATA disk to the volume
manager.

- Thomas


[zfs-discuss] replication issue

2009-06-15 Thread Thomas Maier-Komor
Hi,

I just tried replicating a zfs dataset, which failed because the dataset
has a mountpoint set and zfs receive tried to mount the target dataset
on the same directory.

I.e. I did the following:
$ zfs send -R mypool/h...@20090615 | zfs receive -d backup
cannot mount '/var/hg': directory is not empty

Is this a known issue or is this a user error because of -d on the
receiving side?

This happened on:
% uname -a
SunOS azalin 5.10 Generic_139555-08 sun4u sparc SUNW,Sun-Blade-2500

- Thomas


Re: [zfs-discuss] Monitoring ZFS host memory use

2009-05-06 Thread Thomas Maier-Komor
Troy Nancarrow (MEL) wrote:
 Hi,
  
 Please forgive me if my searching-fu has failed me in this case, but
 I've been unable to find any information on how people are going about
 monitoring and alerting regarding memory usage on Solaris hosts using ZFS.
  
 The problem is not that the ZFS ARC is using up the memory, but that the
 script Nagios is using to check memory usage simply sees, say 96% RAM
 used, and alerts.
 The options I can see, and the risks I see with them, are:
 1) Raise the alert thresholds so that they are both (warn and crit)
 above the maximum that the ARC should let itself be.  The problem is I
 can see those being in the order of 98/99% which doesn't leave a lot of
 room for response if memory usage is headed towards 100%.
 2) Alter the warning script to ignore the ARC cache and do alerting
 based on what's left. Perhaps with a third threshold somewhere above
 where the ARC should let things get, in case for some reason the ARC
 isn't returning memory to apps. The risk I see here is that ignoring the
 ARC may present other odd scenarios where I'm essentially ignoring
 what's causing the memory problems.
  
 So how are others monitoring memory usage on ZFS servers?
  
 I've read (but can't find a written reference) that the ARC limits
 itself such that 1GB of memory is always free.  Is that a hard coded
 number? Is there a bit of leeway around it or can I rely on that exact
 number of bytes being free unless there is impending 100% memory
 utilisation?
  
 Regards,
  
 
 *TROY NANCARROW*
 

The ZFS Evil Tuning Guide contains a description of how to limit the ARC
size. Look here:
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Solaris_10_8.2F07_and_Solaris_Nevada_.28snv_51.29_Releases
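
As a sketch, capping the ARC is done via a tunable in /etc/system, and the
current ARC size can be watched with kstat (the 2 GB value below is only an
example; follow the guide for the recommendation that matches your release):

* /etc/system: limit the ARC to 2 GB (takes effect after a reboot)
set zfs:zfs_arc_max = 0x80000000

# current and target ARC size, in bytes
kstat -p zfs:0:arcstats:size zfs:0:arcstats:c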

Concerning monitoring of the ARC size, I use (of course) my own tool called
sysstat. It shows all key system metrics on one terminal page, similar to
top. You can get it here: http://www.maier-komor.de/sysstat.html

HTH,
Thomas


[zfs-discuss] ext4 bug zfs handling of the very same situation

2009-03-11 Thread Thomas Maier-Komor
Hi,

there was recently a bug reported against EXT4 that gets triggered by
KDE: https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781

Now I'd like to verify that my understanding of ZFS behavior and
implementation is correct, and that ZFS is unaffected by this kind of
issue. Maybe somebody would like to comment on this.

The underlying problem with ext4 is that some kde executables do
something like this:
1a) open and read data from file x, close file x
1b) open and truncate file x
1c) write data to file x
1d) close file x

or

2a) open and read data from file x, close file x
2b) open and truncate file x.new
2c) write data to file x.new
2d) close file x.new
2e) rename file x.new to file x

Concerning case 1) I think ZFS may lose data if power is lost right
after 1b) and the open(xxx,O_WRONLY|O_TRUNC|O_CREAT) is committed in a
transaction group separate from the one containing 1c/1d.

Concerning case 2) I cannot see ZFS losing any data, because of
copy-on-write and transaction grouping.

Theodore Ts'o (ext4 developer) commented that both cases are flawed and
cannot be supported correctly, because of a missing fsync() before
close. Is this correct? His comment is over here:
https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/54

Any thoughts or comments?

TIA,
Thomas


Re: [zfs-discuss] How to use mbuffer with zfs send/recv

2008-12-07 Thread Thomas Maier-Komor
Julius Roberts wrote:
 How do i compile mbuffer for our system,
 
 Thanks to Mike Futerko for help with the compile, i now have it installed OK.
 
  and what syntax to i use to invoke it within the zfs send recv?
 
 Still looking for answers to this one?  Any example syntax, gotchas
 etc would be much appreciated.
 

First start the receive side, then the sender side:

receiver$ mbuffer -s 128k -m 200M -I sender:8000 | zfs receive filesystem

sender$   zfs send pool/filesystem | mbuffer -s 128k -m 200M -O receiver:8000

Of course, you should adjust the hostnames accordingly, and set the
mbuffer buffer size to a value that fits your needs (option -m).

BTW: I've just released a new version of mbuffer which defaults to a TCP
buffer size of 1M, which can be adjusted with the option --tcpbuffer.
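
If the new default doesn't fit, the receiving side of the example above could
presumably be adjusted along these lines (the 4M value is only an illustration;
check mbuffer(1) for the exact option syntax):

receiver$ mbuffer -s 128k -m 200M --tcpbuffer 4M -I sender:8000 | zfs receive filesystem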

Cheers,
Thomas


Re: [zfs-discuss] 'zfs recv' is very slow

2008-11-15 Thread Thomas Maier-Komor

 
 Seems like there's a strong case to have such a program bundled in Solaris.
 

I think the idea of having a separate, configurable buffer program with a rich 
feature set fits the UNIX philosophy of having small programs that can be used 
as building blocks to solve larger problems.

mbuffer is already bundled with several Linux distros. And that is also the 
reason its feature set expanded over time. In the beginning there wasn't even 
support for network transfers.

Today mbuffer supports direct transfer to multiple receivers, data transfer 
rate limiting, a high/low water mark algorithm, on-the-fly MD5 calculation, 
multi-volume tape access, use of sendfile, and a configurable buffer 
size/layout.

So ZFS send/receive is just another use case for this tool.

- Thomas


Re: [zfs-discuss] 'zfs recv' is very slow

2008-11-14 Thread Thomas Maier-Komor
Joerg Schilling wrote:
 Andrew Gabriel [EMAIL PROTECTED] wrote:
 
 That is exactly the issue. When the zfs recv data has been written, zfs 
 recv starts reading the network again, but there's only a tiny amount of 
 data buffered in the TCP/IP stack, so it has to wait for the network to 
 heave more data across. In effect, it's a single buffered copy. The 
 addition of a buffer program turns it into a double-buffered (or cyclic 
 buffered) copy, with the disks running flat out continuously, and the 
 network streaming data across continuously at the disk platter speed.
 
 rmt and star increase the Socket read/write buffer size via
 
 setsockopt(STDOUT_FILENO, SOL_SOCKET, SO_SNDBUF, 
 setsockopt(STDIN_FILENO, SOL_SOCKET, SO_RCVBUF,
 
 when doing remote tape access.
 
 This has a notable effect on throughput.
 
 Jörg
 

Yesterday I released a new version of mbuffer, which also enlarges
the default TCP buffer size. So everybody using mbuffer for network data
transfer might want to update.

For everybody unfamiliar with mbuffer, it might be worth noting that it
has a bunch of additional features, e.g. sending to multiple clients
at once and high/low watermark flushing to prevent tape drives from
stop/rewind/restart cycles.

- Thomas


Re: [zfs-discuss] mbuffer WAS'zfs recv' is very slow

2008-11-14 Thread Thomas Maier-Komor
Jerry K wrote:
 Hello Thomas,
 
 What is mbuffer?  Where might I go to read more about it?
 
 Thanks,
 
 Jerry
 
 
 

 yesterday, I've release a new version of mbuffer, which also enlarges
 the default TCP buffer size. So everybody using mbuffer for network data
 transfer might want to update.

 For everybody unfamiliar with mbuffer, it might be worth to note that it
 has a bunch of additional features like e.g. sending to multiple clients
 at once, high/low watermark flushing to prevent tape drives from
 stop/rewind/restart cycles.

 - Thomas

The man page is included in the source, which you can get here:
http://www.maier-komor.de/mbuffer.html

New releases are announced on freshmeat.org.

Maybe I should add an HTML version of the man page to the homepage of mbuffer...

- Thomas


Re: [zfs-discuss] 'zfs recv' is very slow

2008-11-14 Thread Thomas Maier-Komor

----- original message -----

Subject: Re: [zfs-discuss] 'zfs recv' is very slow
Sent: Fri, 14 Nov 2008
From: Bob Friesenhahn [EMAIL PROTECTED]

 On Fri, 14 Nov 2008, Joerg Schilling wrote:
 
  On my first Sun at home (a Sun 2/50 with 1 MB of RAM) in 1986, I could
  set the socket buffer size to 63 kB. 63kB : 1 MB is the same ratio
  as 256 MB : 4 GB.
 
  BTW: a lot of numbers in Solaris did not grow since a long time and
  thus create problems now. Just think about the maxphys values
  63 kB on x86 does not even allow to write a single BluRay disk sector
  with a single transfer.
 
 Bloating kernel memory is not the right answer.  Solaris comes with a 
 quite effective POSIX threads library (standard since 1996) which 
 makes it easy to quickly shuttle the data into a buffer in your own 
 application.  One thread deals with the network while the other thread 
 deals with the device.  I imagine that this is what the supreme 
 mbuffer program is doing.
 
 Bob

Basically, mbuffer just does this - but it additionally has a whole bunch of 
extra functionality. At least there are people who use it to lengthen the life 
of their tape drives with the high/low watermark feature...

Thomas
----- end of original message -----



Re: [zfs-discuss] Improving zfs send performance

2008-11-12 Thread Thomas Maier-Komor
Roch wrote:
 Thomas, for long latency fat links, it should be quite
 beneficial to set the socket buffer on the receive side
 (instead of having users tune tcp_recv_hiwat).
 
 throughput of a tcp connnection is gated by 
 receive socket buffer / round trip time.
 
 Could that be Ross' problem ?
 
 -r
 
 

Hmm, I'm not a TCP expert, but that sounds absolutely possible, if
Solaris 10 isn't tuning the TCP buffer automatically. The default
receive buffer seems to be 48k (at least on a V240 running 118833-33).
So if the block size is something like 128k, it would absolutely make
sense to tune the receive buffer to compensate for the round trip time...
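
On Solaris 10 the relevant tunable can be inspected and raised with ndd; a
sketch (the 1 MB value is only an example, and the setting does not persist
across reboots):

# ndd -get /dev/tcp tcp_recv_hiwat
# ndd -set /dev/tcp tcp_recv_hiwat 1048576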

Ross: Would you like a patch to test if this is the case? Which version
of mbuffer are you currently using?

- Thomas


Re: [zfs-discuss] ZFS wrecked our system...

2008-11-04 Thread Thomas Maier-Komor
Christiaan Willemsen wrote:
 
 do the disks show up as expected in format?

 Is your root pool just a single disk or is it a mirror of mutliple
 disks? Did you attach/detach any disks to the root pool before rebooting?
   
 No, we did nothing at all to the pools. The root pool is a hardware
 mirror, not a zfs mirror.
 
 Actually, it looks like Opensolaris can't find any of the disk.

There was recently a thread where someone had an issue importing a
known-to-be-healthy pool after a BIOS update. It turned out that the new
BIOS used a different host protected area on the disks and therefore
reported a different disk size to the OS. I'd check the controller and BIOS
settings that are concerned with disks. Any change in this area might
lead to this effect.

Additionally, I think it is not a good idea to use a RAID controller to
mirror disks for ZFS. That way, a silently corrupted sector cannot be
corrected by ZFS. In contrast, if you give ZFS both disks individually
and create a zpool mirror, ZFS is able to detect corrupted sectors
and correct them from the healthy side of the mirror. A hardware mirror
will never know which side of the mirror is good and which is bad...
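
Creating such a mirror directly from the two individual disks is a one-liner
(device names are placeholders for whatever your controller exposes):

# zpool create datapool mirror c0t0d0 c0t1d0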




Re: [zfs-discuss] ZFS wrecked our system...

2008-11-04 Thread Thomas Maier-Komor
Christiaan Willemsen wrote:
 Since the last reboot, our system won't boot anymore. It hangs at the "Use is 
 subject to license terms." line for a few minutes, and then gives an error 
 that it can't find the device it needs for making the root pool, and 
 eventually reboots.
 
 We did not change anything to the system or to the Adaptec controller
 
 So I tried the OpenSolaris boot CD. It also takes a few minutes to boot (this 
 was never before the case), halting at the exact same line as the normal boot.
 
 It also complains about drives being offline, but this actually cannot be the 
 case, all drives are working fine..
 
 When I get to a console, and do a zpool import, it can't find any pool. There 
 should be two pools, one for booting, and another one for the data. 
 
 This is all on SNV_98...

Do the disks show up as expected in format?

Is your root pool just a single disk, or is it a mirror of multiple
disks? Did you attach/detach any disks to the root pool before rebooting?


[zfs-discuss] change in zpool_get_prop_int

2008-11-04 Thread Thomas Maier-Komor
Hi,

I'm observing a change in the values returned by zpool_get_prop_int. In Solaris 
10 update 5 this function returned the value for ZPOOL_PROP_CAPACITY in bytes, 
but in update 6 (i.e. nv88?) it seems to be returning the value in kB.

Both Solaris versions were shipped with libzfs.so.2. So how can one distinguish 
between those two variants?

Any comments on this change?

- Thomas
P.S.: I know this is a private interface, but it is quite handy for my system 
observation tool sysstat...


Re: [zfs-discuss] Building a 2nd pool, can I do it in stages?

2008-10-22 Thread Thomas Maier-Komor
Bob Friesenhahn wrote:
 On Tue, 21 Oct 2008, Håvard Krüger wrote:
 
 Is it possible to build a RaidZ with 3x 1TB disks and 5x 0.5TB disks,
 and then swap out the 0.5 TB disks as time goes by? Is there a
 documentation/wiki on doing this?
 
 Yes, you can build a raidz vdev with all of these drives but only 0.5TB
 will be used from your 1TB drives.  Once you replace *all* of the 0.5TB
 drives with 1TB drives, then the full space of the 1TB drives will be used.
 
 Depending on how likely it is that you will replace all of these old
 drives, you might consider using the new drives to add a second vdev to
 the pool so that the disk space on all the existing drives may be fully
 used and you obtain better mutiuser performance.
 
 Bob

But in this case one should be aware that if one adds another vdev, it
is currently impossible to get rid of it afterwards. I.e. the pool will
always have two RaidZ vdevs, and the new vdev, which in this scenario
would consist of three 1TB disks, couldn't be grown by adding another
disk. So one would be forced to add yet another raidz vdev.

IMO, I'd go for replacing the 0.5TB disks one by one and stick to a
single vdev.
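
That one-by-one replacement would look roughly like this for each 0.5TB disk
(device names are placeholders; wait for each resilver to complete, and the
extra capacity only becomes usable once the last small disk has been replaced):

# zpool replace tank c1t2d0 c1t7d0    # old 0.5TB disk -> new 1TB disk
# zpool status tank                   # wait until the resilver has finished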

- Thomas


Re: [zfs-discuss] Improving zfs send performance

2008-10-15 Thread Thomas Maier-Komor
Thomas Maier-Komor wrote:
 BTW: I release a new version of mbuffer today.

WARNING!!!

Sorry people!!!

The latest version of mbuffer has a regression that can CORRUPT output
if stdout is used. Please fall back to the previous version. A fix is on the
way...

- Thomas


Re: [zfs-discuss] Improving zfs send performance

2008-10-15 Thread Thomas Maier-Komor
Ross Smith schrieb:
 I'm using 2008-05-07 (latest stable), am I right in assuming that one is ok?
 
 
 Date: Wed, 15 Oct 2008 13:52:42 +0200
 From: [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]; zfs-discuss@opensolaris.org
 Subject: Re: [zfs-discuss] Improving zfs send performance

 Thomas Maier-Komor wrote:
 BTW: I release a new version of mbuffer today.
 WARNING!!!

 Sorry people!!!

 The latest version of mbuffer has a regression that can CORRUPT output
 if stdout is used. Please fall back to the last version. A fix is on the
 way...

 - Thomas
 
 _
 Discover Bird's Eye View now with Multimap from Live Search
 http://clk.atdmt.com/UKM/go/111354026/direct/01/

Yes this one is OK. The regression appeared in 20081014.

- Thomas



Re: [zfs-discuss] Improving zfs send performance

2008-10-15 Thread Thomas Maier-Komor
Ross wrote:
 Hi,
 
 I'm just doing my first proper send/receive over the network and I'm getting 
 just 9.4MB/s over a gigabit link.  Would you be able to provide an example of 
 how to use mbuffer / socat with ZFS for a Solaris beginner?
 
 thanks,
 
 Ross

receiver$ mbuffer -I sender:1 -s 128k -m 512M | zfs receive

sender$   zfs send mypool/[EMAIL PROTECTED] | mbuffer -s 128k -m 512M -O receiver:1

BTW: I released a new version of mbuffer today.

HTH,
Thomas


Re: [zfs-discuss] Improving zfs send performance

2008-10-14 Thread Thomas Maier-Komor
Carsten Aulbert wrote:
 Hi Thomas,
 
 Thomas Maier-Komor wrote:
 
 Carsten,

 the summary looks like you are using mbuffer. Can you elaborate on what
 options you are passing to mbuffer? Maybe changing the blocksize to be
 consistent with the recordsize of the zpool could improve performance.
 Is the buffer running full or is it empty most of the time? Are you sure
 that the network connection is 10Gb/s all the way through from machine
 to machine?
 
 Well spotted :)
 
 right now plain mbuffer with plenty of buffer (-m 2048M) on both ends,
 and I have not seen any buffer exceeding the 10% watermark level. The
 network connections are via Neterion XFrame II Sun Fire NICs, then via CX4
 cables to our core switch where both boxes are directly connected
 (WovenSystmes EFX1000). netperf tells me that the TCP performance is
 close to 7.5 GBit/s duplex, and if I use
 
 cat /dev/zero | mbuffer | socat --- socat | mbuffer > /dev/null
 
 I easily see speeds of about 350-400 MB/s, so I think the network is fine.
 
 Cheers
 
 Carsten

I don't know socat or what benefit it gives you, but have you tried
using mbuffer to send and receive directly (options -I and -O)?
Additionally, try to set the block size of mbuffer to the recordsize of
zfs (usually 128k):
receiver$ mbuffer -I sender:1 -s 128k -m 2048M | zfs receive
sender$ zfs send blabla | mbuffer -s 128k -m 2048M -O receiver:1

As transmitting from /dev/zero to /dev/null runs at a rate of 350MB/s, I
guess you are really hitting the maximum speed of your zpool. From my
understanding, I'd guess sending is always slower than receiving,
because reads are random and writes are sequential. So it should be
quite normal that mbuffer's buffer doesn't really see a lot of usage.

Cheers,
Thomas


Re: [zfs-discuss] Improving zfs send performance

2008-10-14 Thread Thomas Maier-Komor
Carsten Aulbert wrote:
 Hi again,
 
 Thomas Maier-Komor wrote:
 Carsten Aulbert wrote:
 Hi Thomas,
 I don't know socat or what benefit it gives you, but have you tried
 using mbuffer to send and receive directly (options -I and -O)?
 
 I thought we tried that in the past and with socat it seemed faster, but
 I just made a brief test and I got (/dev/zero - remote /dev/null) 330
 MB/s with mbuffer+socat and 430MB/s with mbuffer alone.
 
 Additionally, try to set the block size of mbuffer to the recordsize of
 zfs (usually 128k):
 receiver$ mbuffer -I sender:1 -s 128k -m 2048M | zfs receive
 sender$ zfs send blabla | mbuffer -s 128k -m 2048M -O receiver:1
 
 We are using 32k since many of our users use tiny files (and then I need
 to reduce the buffer size because of this 'funny' error):
 
 mbuffer: fatal: Cannot address so much memory
 (32768*65536=21474836481544040742911).
 
 Does this qualify for a bug report?
 
 Thanks for the hint of looking into this again!
 
 Cheers
 
 Carsten

Yes this qualifies for a bug report. As a workaround for now, you can
compile in 64 bit mode.
I.e.:
$ ./configure CFLAGS="-g -O -m64"
$ make && make install

This works for Sun Studio 12 and gcc. For older versions of Sun Studio,
you need to pass -xarch=v9 instead of -m64.

I am planning to release an updated version of mbuffer this week. I'll
include a patch for this issue.

Cheers,
Thomas


Re: [zfs-discuss] Improving zfs send performance

2008-10-13 Thread Thomas Maier-Komor
Carsten Aulbert wrote:
 Hi all,
 
 although I'm running all this in a Sol10u5 X4500, I hope I may ask this
 question here. If not, please let me know where to head to.
 
 We are running several X4500 with only 3 raidz2 zpools since we want
 quite a bit of storage space[*], but the performance we get when using
 zfs send is sometimes really lousy. Of course this depends what's in the
 file system, but when doing a few backups today I have seen the following:
 
 receiving full stream of atlashome/[EMAIL PROTECTED] into
 atlashome/BACKUP/[EMAIL PROTECTED]
 in @ 11.1 MB/s, out @ 11.1 MB/s, 14.9 GB total, buffer   0% full
 summary: 14.9 GByte in 45 min 42.8 sec - average of 5708 kB/s
 
 So, a mere 15 GB were transferred in 45 minutes, and another user's home,
 which is quite large (7TB), took more than 42 hours to be transferred.
 Since all this is going over a 10 Gb/s network and the CPUs are all idle, I
 would really like to know why
 
 * zfs send is so slow and
 * how can I improve the speed?
 
 Thanks a lot for any hint
 
 Cheers
 
 Carsten
 
 [*] we have done quite a few tests with more zpools but were not able to
 improve the speed substantially. For this particular bad file system I
 still need to histogram the file sizes.
 


Carsten,

the summary looks like you are using mbuffer. Can you elaborate on what
options you are passing to mbuffer? Maybe changing the blocksize to be
consistent with the recordsize of the zpool could improve performance.
Is the buffer running full or is it empty most of the time? Are you sure
that the network connection is 10Gb/s all the way through from machine
to machine?

- Thomas


Re: [zfs-discuss] Pros/Cons of multiple zpools?

2008-10-09 Thread Thomas Maier-Komor
Joseph Mocker wrote:
 Hello,
 
 I haven't seen this discussed before. Any pointers would be appreciated.
 
 I'm curious, if I have a set of disks in a system, is there any benefit 
 or disadvantage to breaking the disks into multiple pools instead of a 
 single pool?
 
 Does multiple pools cause any additional overhead for ZFS, for example? 
 Can it cause cache contention/starvation issues?
 
 Thanks...
 
   --joe

Currently, I have two pools in my system: one for live data and the other
for backup. When doing large backups (i.e. tar'ing one directory
hierarchy from live to backup), I've seen severe memory pressure on the
system - as if both pools were competing for memory...

Maybe with zfs boot/root becoming available, I'll add a third pool for
the OS. From what I've seen, zfs makes a lot of sense for boot/root if
you are using live upgrade. I like the idea of having OS and data
separated, but on a system with only two disks, I'd definitely go for a
single mirrored zpool where both OS and data reside. I guess sharing one
physical disk among multiple zpools could have a severe negative impact
during concurrent accesses. But I really have no in-depth knowledge to
say for sure. Maybe somebody else can comment on this...

- Thomas


Re: [zfs-discuss] ZFS, SATA, LSI and stability

2008-08-12 Thread Thomas Maier-Komor
Frank Fischer wrote:
 After having massive problems with a supermicro X7DBE box using AOC-SAT2-MV8 
 Marvell controllers and opensolaris snv79 (same as described here: 
 http://sunsolve.sun.com/search/document.do?assetkey=1-66-233341-1) we just 
 started over using new hardware and opensolaris 2008.05 upgraded to snv94. We 
 used again a supermicro X7DBE, but now with two LSI SAS3081E SAS controllers. 
 And guess what? Now we get these error messages in /var/adm/messages:
 
 Aug 11 18:20:52 thumper2 scsi: [ID 107833 kern.warning] WARNING: /[EMAIL 
 PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED]/[EMAIL 
 PROTECTED],0 (sd11):
 Aug 11 18:20:52 thumper2Error for Command: read(10)
 Error Level: Retryable
 Aug 11 18:20:52 thumper2 scsi: [ID 107833 kern.notice]  Requested Block: 
 1423173120Error Block: 1423173120
 Aug 11 18:20:52 thumper2 scsi: [ID 107833 kern.notice]  Vendor: ATA   
  Serial Number:  WD-WCAP
 Aug 11 18:20:52 thumper2 scsi: [ID 107833 kern.notice]  Sense Key: 
 Unit_Attention
 Aug 11 18:20:52 thumper2 scsi: [ID 107833 kern.notice]  ASC: 0x29 (power on, 
 reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
 
 Along whit these messages there are a lot of this messages:
 
 Aug 11 18:20:51 thumper2 scsi: [ID 365881 kern.info] /[EMAIL 
 PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED] (mpt1):
 Aug 11 18:20:51 thumper2Log info 0x31123000 received for target 5.
 Aug 11 18:20:51 thumper2scsi_status=0x0, ioc_status=0x804b, 
 scsi_state=0xc
 
 
 I would believe having a faulty disk, but not two:
 
 Aug 11 17:47:47 thumper2 scsi: [ID 365881 kern.info] /[EMAIL 
 PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED] (mpt1):
 Aug 11 17:47:47 thumper2Log info 0x31123000 received for target 4.
 Aug 11 17:47:47 thumper2scsi_status=0x0, ioc_status=0x804b, 
 scsi_state=0xc
 Aug 11 17:47:48 thumper2 scsi: [ID 107833 kern.warning] WARNING: /[EMAIL 
 PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED]/[EMAIL 
 PROTECTED],0 (sd10):
 Aug 11 17:47:48 thumper2Error for Command: read(10)
 Error Level: Retryable
 Aug 11 17:47:48 thumper2 scsi: [ID 107833 kern.notice]  Requested Block: 
 252165120 Error Block: 252165120
 Aug 11 17:47:48 thumper2 scsi: [ID 107833 kern.notice]  Vendor: ATA   
  Serial Number:
 Aug 11 17:47:48 thumper2 scsi: [ID 107833 kern.notice]  Sense Key: 
 Unit_Attention
 Aug 11 17:47:48 thumper2 scsi: [ID 107833 kern.notice]  ASC: 0x29 (power on, 
 reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
 Aug 11 17:48:34 thumper2 scsi: [ID 243001 kern.warning] WARNING: /[EMAIL 
 PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED] (mpt0):
 
 
 Does somebody know what is going on here?
 I have checked the disks with iostat -En :
 
 -bash-3.2# iostat -En
 ...
 c4t0d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
 Vendor: FUJITSU  Product: MBA3073RCRevision: 0103 Serial No:  
 Size: 73.54GB 73543163904 bytes
 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
 Illegal Request: 0 Predictive Failure Analysis: 0 
 c4t5d0   Soft Errors: 4 Hard Errors: 24 Transport Errors: 179 
 Vendor: ATA  Product: ST3750330NS  Revision: SN04 Serial No:  
 Size: 750.16GB 750156374016 bytes
 Media Error: 0 Device Not Ready: 0 No Device: 22 Recoverable: 4 
 Illegal Request: 0 Predictive Failure Analysis: 0 
 c4t6d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
 Vendor: ATA  Product: WDC WD7500AYYS-0 Revision: 4G30 Serial No:  
 Size: 750.16GB 750156374016 bytes
 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
 Illegal Request: 0 Predictive Failure Analysis: 0 
 c6t4d0   Soft Errors: 6 Hard Errors: 17 Transport Errors: 466 
 Vendor: ATA  Product: ST3750640NS  Revision: GSerial No:  
 Size: 750.16GB 750156374016 bytes
 Media Error: 0 Device Not Ready: 0 No Device: 17 Recoverable: 6 
 Illegal Request: 0 Predictive Failure Analysis: 0 
 c6t5d0   Soft Errors: 2 Hard Errors: 23 Transport Errors: 539 
 Vendor: ATA  Product: WDC WD7500AYYS-0 Revision: 4G30 Serial No:  
 Size: 750.16GB 750156374016 bytes
 Media Error: 0 Device Not Ready: 0 No Device: 23 Recoverable: 2 
 Illegal Request: 0 Predictive Failure Analysis: 0 
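 
 One additional data point that could be gathered here (just a suggestion, not
 output from this box) is the FMA error telemetry for these targets, e.g.:
 
 # fmdump -eV | more
 # fmadm faulty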
 
 I have checked the drives with smartctl:
 
 ID# ATTRIBUTE_NAME  FLAG VALUE WORST THRESH TYPE  UPDATED  
 WHEN_FAILED RAW_VALUE
   1 Raw_Read_Error_Rate 0x000f   115   075   006Pre-fail  Always  
  -   94384069
   3 Spin_Up_Time0x0003   093   093   000Pre-fail  Always  
  -   0
   4 Start_Stop_Count0x0032   100   100   020Old_age   Always  
  -   15
   5 Reallocated_Sector_Ct   0x0033   100   100   036Pre-fail  Always  
  -   0
   7 Seek_Error_Rate 0x000f   084   060   030

Re: [zfs-discuss] SATA controller suggestion

2008-06-09 Thread Thomas Maier-Komor
Tom Buskey schrieb:
 On Fri, Jun 6, 2008 at 16:23, Tom Buskey
 [EMAIL PROTECTED] wrote:
 I have an AMD 939 MB w/ Nvidea on the motherboard
 and 4 500GB SATA II drives in a RAIDZ.
 ...
 I get 550 MB/s
 I doubt this number a lot.  That's almost 200
 (550/(N-1) = 183) MB/s per
 disk, and drives I've seen are usually more in the
 neighborhood of 80
 MB/s.  How did you come up with this number?  What
 benchmark did you
 run?  While it's executing, what does zpool iostat
 mypool 10 show?
 
 
 time gdd if=/dev/zero bs=1048576 count=10240 of=/data/video/x
 
 real  0m13.503s
 user  0m0.016s
 sys   0m8.981s
  
  

Are you sure gdd doesn't create a sparse file?
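
A quick way to check (just a sketch -- the path is the one from your test, and
du block counts are only approximate):

$ ls -l /data/video/x        # logical file size
$ du -k /data/video/x        # kilobytes actually allocated on disk

If du reports far less than the ~10 GB that were written, the file is sparse
(or the dataset is compressing the zeros away), and the 550 MB/s figure would
not reflect real disk throughput.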

- Thomas
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Get your SXCE on ZFS here!

2008-06-07 Thread Thomas Maier-Komor
[EMAIL PROTECTED] wrote:
 Uwe,
 
 Please see pages 55-80 of the ZFS Admin Guide, here:
 
 http://opensolaris.org/os/community/zfs/docs/
 
 Basically, the process is to upgrade from nv81 to nv90 by using the
 standard upgrade feature. Then, use lucreate to migrate your UFS root
 file system to a ZFS file system, like this:
 
 1. Verify you have a current backup.
 2. Read the known issues and requirements.
 3. Upgrade from nv81 to nv90 using the standard upgrade feature.
 4. Migrate your UFS root file system to a ZFS root file system,
 like this:
 # zpool create rpool mirror c0t1d0s0 c0t2d0s0
 # lucreate -c c0t0d0s0 -n zfsBE -p rpool
 5. Activate the ZFS BE, like this:
 # luactivate zfsBE
 
 Please see the doc for more examples of this process.
 
 Cindy
 

Hi Cindy,

unfortunately, this approach fails for me, because lucreate errors out
(see below).

Does anybody know, if this is a known issue?

- Thomas


# lucreate -n nv90ext -p ext1
Analyzing system configuration.
Comparing source boot environment c0t1d0s0 file systems with the file
system(s) you specified for the new boot environment. Determining which
file systems should be in the new boot environment.
Updating boot environment description database on all BEs.
Updating system configuration files.
The device /dev/dsk/c1t9d0 is not a root device for any boot
environment; cannot get BE ID.
Creating configuration for boot environment nv90ext.
Source boot environment is c0t1d0s0.
Creating boot environment nv90ext.
Creating file systems on boot environment nv90ext.
Creating zfs file system for / in zone global on ext1/ROOT/nv90ext.
Populating file systems on boot environment nv90ext.
Checking selection integrity.
Integrity check OK.
Populating contents of mount point /.
Copying.
WARNING: The file /tmp/lucopy.errors.5981 contains a list of 45
potential problems (issues) that were encountered while populating boot
environment nv90ext.
INFORMATION: You must review the issues listed in
/tmp/lucopy.errors.5981 and determine if any must be resolved. In
general, you can ignore warnings about files that were skipped because
they did not exist or could not be opened. You cannot ignore errors such
as directories or files that could not be created, or file systems running
out of disk space. You must manually resolve any such problems before you
activate boot environment nv90ext.
Creating shared file system mount points.
Creating compare databases for boot environment nv90ext.
Creating compare database for file system /.
Updating compare databases on boot environment nv90ext.
Making boot environment nv90ext bootable.
ERROR: Unable to determine the configuration of the target boot
environment nv90ext.
ERROR: Update of loader failed.
ERROR: Cannot make ABE nv90ext bootable.
Making the ABE nv90ext bootable FAILED.
ERROR: Unable to make boot environment nv90ext bootable.
ERROR: Unable to populate file systems on boot environment nv90ext.
ERROR: Cannot make file systems for boot environment nv90ext.

$ cat /tmp/lucopy.errors.5981
Restoring existing /.alt.tmp.b-aEb.mnt/system/contract/process/template
Restoring existing /.alt.tmp.b-aEb.mnt/system/contract/process/latest
Restoring existing /.alt.tmp.b-aEb.mnt/system/contract/process/1/ctl
Restoring existing /.alt.tmp.b-aEb.mnt/system/contract/process/4/ctl
Restoring existing /.alt.tmp.b-aEb.mnt/system/contract/process/5/ctl
Restoring existing /.alt.tmp.b-aEb.mnt/system/contract/process/14/ctl
Restoring existing /.alt.tmp.b-aEb.mnt/system/contract/process/16/ctl
Restoring existing /.alt.tmp.b-aEb.mnt/system/contract/process/18/ctl
Restoring existing /.alt.tmp.b-aEb.mnt/system/contract/process/19/ctl
Restoring existing /.alt.tmp.b-aEb.mnt/system/contract/process/23/ctl
Restoring existing /.alt.tmp.b-aEb.mnt/system/contract/process/25/ctl
Restoring existing /.alt.tmp.b-aEb.mnt/system/contract/process/28/ctl
Restoring existing /.alt.tmp.b-aEb.mnt/system/contract/process/37/ctl
Restoring existing /.alt.tmp.b-aEb.mnt/system/contract/process/43/ctl
Restoring existing /.alt.tmp.b-aEb.mnt/system/contract/process/44/ctl
Restoring existing /.alt.tmp.b-aEb.mnt/system/contract/process/45/ctl
Restoring existing /.alt.tmp.b-aEb.mnt/system/contract/process/46/ctl
Restoring existing /.alt.tmp.b-aEb.mnt/system/contract/process/47/ctl
Restoring existing /.alt.tmp.b-aEb.mnt/system/contract/process/48/ctl
Restoring existing /.alt.tmp.b-aEb.mnt/system/contract/process/51/ctl
Restoring existing /.alt.tmp.b-aEb.mnt/system/contract/process/52/ctl
Restoring existing /.alt.tmp.b-aEb.mnt/system/contract/process/53/ctl
Restoring existing /.alt.tmp.b-aEb.mnt/system/contract/process/55/ctl
Restoring existing /.alt.tmp.b-aEb.mnt/system/contract/process/56/ctl
Restoring existing /.alt.tmp.b-aEb.mnt/system/contract/process/57/ctl
Restoring existing /.alt.tmp.b-aEb.mnt/system/contract/process/58/ctl
Restoring existing /.alt.tmp.b-aEb.mnt/system/contract/process/59/ctl
Restoring existing /.alt.tmp.b-aEb.mnt/system/contract/process/60/ctl
Restoring existing 

Re: [zfs-discuss] zfs equivalent of ufsdump and ufsrestore

2008-05-29 Thread Thomas Maier-Komor
Darren J Moffat schrieb:
 Joerg Schilling wrote:
 Poulos, Joe [EMAIL PROTECTED] wrote:

 Is there a  ZFS equivalent of ufsdump and ufsrestore? 

  

  Will creating a tar file work with ZFS? We are trying to backup a
 ZFS file system to a separate disk, and would like to take advantage of
 something like ufsdump rather than using expensive backup software.
 The closest equivalent to ufsdump and ufsrestore is star.
 
 I very strongly disagree.  The closest ZFS equivalent to ufsdump is 'zfs 
 send'.  'zfs send', like ufsdump, has intimate awareness of the 
 actual on-disk layout and is an integrated part of the filesystem 
 implementation.
 
 star is a userland archiver.
 

The man page for zfs states the following for send:

  The format of the stream is evolving. No backwards  compati-
  bility  is  guaranteed.  You may not be able to receive your
  streams on future versions of ZFS.

I think this should be taken into account when considering 'zfs send' 
for backup purposes...
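
To make the distinction concrete (pool and dataset names below are made up):
the warning mainly bites when the raw stream is kept around as a file, e.g.

# zfs snapshot tank/home@backup1
# zfs send tank/home@backup1 > /backup/home.backup1.zstream

whereas piping it straight into a receiving pool

# zfs send tank/home@backup1 | zfs receive backuppool/home

leaves you with a normal filesystem on the other side and no stored stream
that a future ZFS version might refuse to read.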

- Thomas
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] opensolaris 2008.05 boot recovery

2008-05-20 Thread Thomas Maier-Komor
Hi,

I've run into an issue with a test machine that I'm happy to encounter
with this machine, because it is no real trouble. But I'd like to know
the solution for this issue in case I run into it again...

I've installed OpenSolaris 2008.05 on a USB disk on a laptop. After
installing I modified /etc/passwd by hand: I added another entry
for the same uid with a different login name and home directory, so that
I can log in with either a local or an NFS-imported home directory. Unfortunately
I forgot to update /etc/shadow accordingly, so I ended up unable
to log in at all, because root is a profile and the newly added account
came before the original account with the same uid. So no login was possible
anymore.

Normally, such a situation is not a problem. So I did what I usually
would do and booted from the CD. OK, now I zpool imported rpool,
modified /etc/passwd, and /etc/shadow, exported the pool, and rebooted.

BANG! Now the machine doesn't boot anymore, because during this process
it somehow lost the ramdisk image.

Now my question is: how do I reinstall the boot setup on the laptop
from the install CD?

I've tried using bootadm with -R and the like but couldn't make any
progress...
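
For reference, the sequence I would expect to work from the CD is roughly the
following (only a sketch -- the dataset and device names are guesses, not
taken from my actual setup):

# zpool import -f -R /a rpool
# mount -F zfs rpool/ROOT/opensolaris /a    (if the root dataset uses a legacy mountpoint)
# bootadm update-archive -R /a
# installgrub /a/boot/grub/stage1 /a/boot/grub/stage2 /dev/rdsk/c0t0d0s0
# zpool export rpool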

Any ideas? Any hint would be highly appreciated...

TIA,
Thomas
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance of one single 'cp'

2008-04-18 Thread Thomas Maier-Komor
after some fruitful discussions with Jörg, it turned out that my mtwrite 
patch prevents tar, star, gtar, and unzip from setting the file times 
correctly. I've investigated this issue and updated the patch accordingly.

Unfortunately, I encountered an issue concerning semaphores, which seem 
to have a race condition. At least I couldn't get it to work reliably 
with semaphores so I switched over to condition variables, which works 
now. I'll investigate the semaphore issue as soon as I have time, but 
I'm pretty convinced that there is a race condition in the semaphore 
implementation, as the semaphore value from time to time grew larger 
than the number of elements in the work list.

This was on Solaris 10 - so I'll try to generate a test for SX. Does 
anybody know of any issues related to semaphores?

the work creator did the following:
- lock the structure containing the list
- attach an element to the list
- post the semaphore
- unlock the structure

the worker thread did the following:
- wait on the semaphore
- lock the structure containing the list
- remove an element from the list
- unlock the structure
- perform the work described by the list element
- lock the structure
- update the structure to reflect the work results
- unlock the structure
- restart from the beginning

Is anything wrong with this approach? Replacing the semaphore calls with 
condition-variable calls and swapping steps 1 and 2 of the worker thread made it 
reliable...
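
For reference, the condition-variable version boils down to something like the
sketch below. This is only an illustration of the pattern described above --
the names are invented and it is not the actual mtwrite code:

#include <pthread.h>
#include <stdlib.h>

struct work {
    struct work *next;
    /* ... description of the write to perform ... */
};

static struct work *head;      /* work list, protected by mtx */
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv  = PTHREAD_COND_INITIALIZER;

/* work creator: called from the intercepted write() */
void enqueue(struct work *w)
{
    pthread_mutex_lock(&mtx);     /* lock the structure containing the list */
    w->next = head;               /* attach the element */
    head = w;
    pthread_cond_signal(&cv);     /* wake one worker */
    pthread_mutex_unlock(&mtx);   /* unlock the structure */
}

/* body of a worker thread */
void *worker(void *arg)
{
    for (;;) {
        pthread_mutex_lock(&mtx);
        while (head == NULL)          /* re-check the list after every wakeup */
            pthread_cond_wait(&cv, &mtx);
        struct work *w = head;        /* remove an element */
        head = w->next;
        pthread_mutex_unlock(&mtx);
        /* ... perform the work described by w, then re-lock the structure
         *     and update it with the results, as in the steps above ... */
        free(w);
    }
    return NULL;
}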

- Thomas
P.S.: I published the updated mtwrite on my website yesterday - get it 
here: http://www.maier-komor.de/mtwrite.html
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance of one single 'cp'

2008-04-08 Thread Thomas Maier-Komor
Bob Friesenhahn schrieb:
 On my drive array (capable of 260MB/second single-process writes and 
 450MB/second single-process reads) 'zfs iostat' reports a read rate of 
 about 59MB/second and a write rate of about 59MB/second when executing 
 'cp -r' on a directory containing thousands of 8MB files.  This seems 
 very similar to the performance you are seeing.
 
 The system indicators (other than disk I/O) are almost flatlined at 
 zero while the copy is going on.
 
 It seems that a multi-threaded 'cp' could be much faster.
 
 With GNU xargs, find, and cpio, I think that it is possible to cobble 
 together a much faster copy since GNU xargs supports --max-procs and 
 --max-args arguments to allow executing commands concurrently with 
 different sets of files.
 
 Bob


That's the reason I wrote a binary patch (preloadable shared object) for
cp, tar, and friends. You might want to take a look at it...
Here: http://www.maier-komor.de/mtwrite.html
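
For comparison, a rough (and untested) sketch of the find/xargs/cpio approach
Bob describes might look like this -- the paths are placeholders and GNU find,
xargs, and cpio are assumed:

$ cd /srcdir && find . -type f -print0 | \
      xargs -0 -n 64 -P 4 sh -c 'printf "%s\0" "$@" | cpio -0 -pdum /destdir' sh

mtwrite takes a different route: it keeps a single cp/tar process and only
parallelizes the write() calls behind its back.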

- Thomas

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] kernel memory and zfs

2008-03-27 Thread Thomas Maier-Komor
Richard Elling wrote:
 
 The size of the ARC (cache) is available from kstat in the zfs
 module (kstat -m zfs).  Neel wrote a nifty tool to track it over
 time called arcstat.  See
 http://www.solarisinternals.com/wiki/index.php/Arcstat
 
 Remember that this is a cache and subject to eviction when
 memory pressure grows.  The Solaris Internals books have
 more details on how the Solaris virtual memory system works
 and is recommended reading.
  -- richard
 
 

The arcsize is also displayed in sysstat, which additionally shows a lot
more information in a 'top' like fashion. Get it here:
http://www.maier-komor.de/sysstat.html
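
For a quick one-off check without any extra tools, the ARC size can also be
read directly from kstat, e.g.:

$ kstat -p zfs:0:arcstats:size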

- Thomas
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] LVM on ZFS

2008-01-24 Thread Thomas Maier-Komor
Kava schrieb:
 My 2 cents ... read somewhere that you should not be running LVM on top of 
 ZFS ... something about additional overhead.
  
  
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Just for clarification: I was talking about different aspects of SVM and
ZFS separately. I never recommended that you should run LVM on top of
ZFS. I said that it might be possible, but I would rather do something
else...

- Thomas
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] LVM on ZFS

2008-01-23 Thread Thomas Maier-Komor
Thiago Sobral schrieb:
 Hi Thomas,
 
 Thomas Maier-Komor escreveu:
 Thiago Sobral schrieb:

 I need to manage volumes like LVM does on Linux or AIX, and I think
 that ZFS can solve this issue.

 I read the SVM specification and it certainly won't be the
 solution that I'll adopt. I don't have Veritas here.


 Why do you think it doesn't fit your needs? What would you do on Linux
 or AIX that you think SVM cannot do?
 On Linux and AIX it's possible to create volume groups and create
 logical volumes inside it, so I can expand or reduce the logical volume.
 How can I do the same with SVM ?
 If I create a slice (c0t0d0s0) with 100GB, can I create two metadevices
 inside (d0 and d1) and grow them ? I think that I should use
 softpartitions. Shouldn't I !?

AFAIK, you can create soft partitions and grow them, but shrinking is
not possible. Use metattach d0 10g to enlarge a logical volume. After
that use growfs to grow the filesystem within the enlarged volume. This
is documented here:
http://docs.sun.com/app/docs/doc/816-4520/tasks-softpart-1?a=view
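
A rough transcript of that grow path (device name, size, and mount point are
just placeholders):

# metattach d0 10g
# growfs -M /export/data /dev/md/rdsk/d0

The first command grows the soft partition by 10 GB, the second grows the UFS
filesystem mounted on /export/data to fill it.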


 $ zfs create black/lv00
 would give you a filesystem named lv00.
 Ok, but this filesystem get the whole size of the pool and I want to
 limit this (i.e) 10GB.. if later I need, I grow this..

No. The newly created filesystem will only consume as much as currently
needed. I.e. it grows and shrinks automagically within the pool.
Additionally, you can reserve space for the filesystem or set an upper
boundary on how much it may consume, using something like 'zfs set
reservation=16G black/lv00' and 'zfs set quota=20G black/lv00'. IMO this
is more flexible than anything you will get with a logical volume manager.
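
A minimal transcript of the quota route (pool and filesystem names are just the
ones from this thread, sizes are arbitrary):

# zfs create black/lv00
# zfs set quota=10G black/lv00
# zfs set quota=20G black/lv00        (later, to 'grow' the limit)
# zfs get quota,reservation black/lv00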



 I think you should investigate the docs a little bit more closely or
 be a little bit more precise when posting your question. What are you
 actually trying to accomplish?
 I reading Sun Docs, but I didn't found satisfactory answers.
 Do you have a great document or URL ?

Where did you look? Yes, docs.sun.com is pretty exhaustive. But there
are always 'Tasks' sections (see the doc example referenced above), where
you can look up standard tasks with a step-by-step
guide on how to get things done. I think this is pretty good.

HTH,
Thomas
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] LVM on ZFS

2008-01-21 Thread Thomas Maier-Komor
Thiago Sobral schrieb:
 Hi folks,
 
 I need to manage volumes like LVM does on Linux or AIX, and I think that 
 ZFS can solve this issue.
 
 I read the SVM specification and it certainly won't be the 
 solution that I'll adopt. I don't have Veritas here.
 

Why do you think it doesn't fit your needs? What would you do on Linux 
or AIX that you think SVM cannot do?

 I created a pool with name black and a volume lv00, then created a 
 filesystem with 'newfs' command:
 #newfs /dev/zvol/rdsk/black/lv00
 
 is this the right way ?
 What's is the best way to manage volumes in Solaris?
 Do you have a URL or document describing this !?
 

You can do it this way, but why would you? Doing a
$ zfs create black/lv00
would give you a filesystem named lv00.

I think you should investigate the docs a little bit more closely or be 
a little bit more precise when posting your question. What are you 
actually trying to accomplish?

 cheers,
 
 TS
 
 

- Thomas
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] how to relocate a disk

2008-01-18 Thread Thomas Maier-Komor
Robert Milkowski schrieb:
 Hello Thomas,
 
 Friday, January 18, 2008, 10:31:17 AM, you wrote:
 
 TMK Hi,
 
 TMK I'd like to move a disk from one controller to another. This disk is 
 TMK part of a mirror in a zfs pool. How can one do this without having to 
 TMK export/import the pool or reboot the system?
 
 TMK I tried taking it offline and online again, but then zpool says the disk
 TMK is unavailable. Trying a zpool replace didn't work because it complains
 TMK that the new disk is part of a zfs pool...
 
 TMK So how can one do this?
 
 Instead of offline'ing it try to detach it and then attach it.
 
 However offline/online should work...
 

does detach/attach work with just a very short resilvering or will this 
resync the disk completely?
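
Just to make sure we mean the same sequence (pool and device names are made
up):

# zpool detach tank c1t2d0
(move the disk to the other controller, where it shows up as c2t2d0)
# zpool attach tank c1t3d0 c2t2d0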
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] how to relocate a disk

2008-01-18 Thread Thomas Maier-Komor
Hi,

I'd like to move a disk from one controller to another. This disk is 
part of a mirror in a zfs pool. How can one do this without having to 
export/import the pool or reboot the system?

I tried taking it offline and online again, but then zpool says the disk 
is unavailable. Trying a zpool replace didn't work because it complains 
that the new disk is part of a zfs pool...

So how can one do this?

TIA,
Thomas
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Intent logs vs Journaling

2008-01-08 Thread Thomas Maier-Komor
 
 the ZIL is always there in host memory, even when no
 synchronous writes
 are being done, since the POSIX fsync() call could be
 made on an open 
 write channel at any time, requiring all to-date
 writes on that channel
 to be committed to persistent store before it returns
 to the application
 ... it's cheaper to write the ZIL at this point than
 to force the entire 5 sec
 buffer out prematurely
 

I have a question that is related to this topic: Why is there only a (tunable) 
5 second threshold and not also an additional threshold for the buffer size 
(e.g. 50MB)?

Sometimes I see my system writing huge amounts of data to a ZFS, but the disks 
stay idle for 5 seconds, although memory consumption is already quite 
high and it really would make sense (from my uneducated point of view as an 
observer) to start writing the data out to the disks. I think this leads to the 
pumping effect that has been mentioned previously in one of the forums here.

Can anybody comment on this?

TIA,
Thomas
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] speedup 2-8x of tar xf on ZFS

2007-02-22 Thread Thomas Maier-Komor
Hi,

now, as I'm back to Germany,I've got access to my machine at home with ZFS, so 
I could test my binary patch for multi-threading with tar on a ZFS filesystems.

Results look like this:
.tar, small files (e.g. gcc source tree), speedup: x8
.tar.gz, small files (gcc source tree), speedup: x4
.tar, medium-size files (e.g. object files of a compiled binutils tree), speedup: 
x5
.tar.gz, medium-size files, speedup: x2-x3

Speedup is a comparison of the wallclock time (timex real) of stock tar with the 
patched multi-threaded tar, where the patched version is 2x-8x faster. Be aware 
that on a UFS filesystem it is about 1:1 in speed - you may even suffer a 5%-10% 
decrease in performance.

This test was on a Blade 2500, with 5GB RAM (i.e. everything in cache) running 
Solaris 10U3, and a ZFS filesystem on two 10k rpm 146G SCSI drives arranged as 
a ZFS mirror.

To me this looks like a pretty good speedup. If you also want to benefit from 
this patch, grab it here (http://www.maier-komor.de/mtwrite.html). The current 
version includes a wrapper for tar called mttar, to ease use, and has some 
enhancements concerning performance and error handling (see the Changelog for 
details).
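
Usage is basically either via the wrapper or by preloading the shared object by
hand (the library path below is only an illustration -- see the website for the
actual install location):

$ mttar xf gcc-sources.tar
$ LD_PRELOAD=/path/to/mtwrite.so tar xf gcc-sources.tar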

Have fun with Solaris!

Cheers,
Thomas
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] patch making tar multi-thread

2007-01-25 Thread Thomas Maier-Komor
Hi everybody,

many people, like myself, have tested the performance of the ZFS filesystem by doing 
a tar xf something.tar. Unfortunately, ZFS doesn't handle this workload 
particularly well, as all writes are executed sequentially. So some people 
requested a multi-threaded tar...

Well, here it comes:
I have written a small patch that intercepts the write system call and friends 
and passes them off to worker threads. 

I'd really like to see some performance metrics with this patch on a ZFS 
filesystem. Unfortunately, I am currently far from home, where I have such a 
system. I'd be pleased if anybody could send me some results. Feedback and 
RFEs are also welcome.

Get it here:
http://www.maier-komor.de/mtwrite.html

Cheers,
Tom
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: patch making tar multi-thread

2007-01-25 Thread Thomas Maier-Komor
 Hello Thomas,
 
 With ZFS as local file system it shouldn't be a
 problem unless tar
 fdsync's each file but then removing fdsyncs would be
 easier.
 
 In case of nfs/zfs multi-threaded tar should help but
 I guess not for
 writes but rather for file/dirs creation and file
 closes. If you only
 put writes to worker threads I'm not sure it will
 really help much if
 any.
 


write and close are sent to worker threads. For open I cannot imagine a
way to parallelize the operations without rewriting tar completely.

I'd be interested in both local ZFS and ZFS over NFS.

Masking out the fdsyncs could be an easy enhancement. But Sun's tar
doesn't do it anyway IIRC and star can be told to suppress the fdsyncs.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS and savecore

2006-11-10 Thread Thomas Maier-Komor
Hi, 

I'm not sure if this is the right forum, but I guess this topic will be bounced 
in the right direction from here.

With ZFS using as much physical memory as it can get, dumps and live dumps via 
'savecore -L' are huge. I just tested it on my workstation and got a 
1.8G vmcore file when dumping only kernel pages. 
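
The dump content type can be checked and restricted with dumpadm, e.g.:

# dumpadm
# dumpadm -c kernel

The first shows the current dump device and content setting, the second
restricts dumps to kernel pages only.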

Might it be possible to add an option that supports dumping without the whole 
ZFS cache? I guess this would make kernel live dumps smaller again, as they 
used to be...

Any comments?

Cheers,
Tom
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: zpool snapshot fails on unmounted filesystem

2006-10-26 Thread Thomas Maier-Komor
Hi Tim,

I just tried again to reproduce it in order to generate a reliable test case. 
Unfortunately, I cannot reproduce the error message. So I really have no idea 
what might have caused it.

Sorry,
Tom
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zpool snapshot fails on unmounted filesystem

2006-10-24 Thread Thomas Maier-Komor
Is this a known problem/bug?

$ zfs snapshot zpool/[EMAIL PROTECTED]
internal error: unexpected error 16 at line 2302 of ../common/libzfs_dataset.c

this occured on:
$ uname -a
SunOS azalin 5.10 Generic_118833-24 sun4u sparc SUNW,Sun-Blade-2500
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] live upgrade incompability

2006-09-21 Thread Thomas Maier-Komor
Hi,

concerning this issue I didn't find anything in the bug database, so I thought 
I'd report it here...

When running live upgrade on a system with a ZFS, LU creates directories for 
all ZFS filesystems in the ABE. This causes svc:/system/filesystem/local to go 
into maintenance state when booting the ABE, because the zpool won't be 
imported due to the existing directory structure in its mount point.

I observed this behavior on a Solaris 10 system with live-upgrade 11.10.

Tom
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] howto reduce 'zfs introduced' noise

2006-07-13 Thread Thomas Maier-Komor
Hi,

after switching my ~/ at home over from UFS to ZFS, I am a little bit 
disturbed by the noise the disks are making. To be more precise, I always have 
Thunderbird and Firefox running on my desktop, and either or both seem to be 
writing to my ~/ at short intervals, and ZFS flushes these transactions to the 
disks at intervals of about 2-5 seconds. In contrast, UFS seems to do a 
little bit more aggressive caching, which reduces disk noise.

I didn't really track down who the offender is or what the precise reason is. 
I only know that the noise disappears as soon as I close Thunderbird and 
Firefox. So maybe there is an easy way to solve this problem at the application 
level. And in any case I want to move my $HOME to quieter disks. 

But I am curious whether I am the only one who has observed this behaviour. Maybe 
there is even an easy way to reduce this noise. Additionally, I'd guess that moving 
the disk heads all the time won't make the disks last any longer...

Cheers,
Tom
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS home/JDS interaction issue

2006-06-29 Thread Thomas Maier-Komor
Hi,

I just upgraded my machine at home to Solaris 10U2. As I already had a ZFS, I 
wanted to migrate my home directories at once to a ZFS from a local UFS 
metadisk. Copying and changing the config of the automounter succeeded without 
any problems. But when I tried to log in to JDS, login succeeded, but JDS did not 
start and the X session always gets terminated after a couple of seconds. 
/var/dt/Xerrors says that /dev/fb could not be accessed, although it works 
without any problem when running from the UFS filesystem.

Switching back to my UFS-based home resolved the issue. I even tried switching 
over to ZFS and rebooted the machine to make 100% sure everything was in a sane 
state (i.e. no gconfd etc.), but the issue persisted, and switching back to UFS 
resolved it again.

Has anybody else had similar problems? Any idea how to resolve this?

TIA,
Tom
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss