Re: [zfs-discuss] Can I recover filesystem from an offline pool?

2010-05-30 Thread Jim Horng
10 GB of extra memory plus 5 days later: the pool was imported.

This file server is a virtual machine. I had allocated 2 GB of memory and 2 CPU
cores, assuming that was enough to manage 6 TB (6x 1 TB disks), while the pool I am
trying to recover is only 700 GB, not the 6 TB pool I am trying to migrate.

So I decided to borrow all available memory for this VM and increase it to 12
GB. What I immediately noticed was that the system no longer hung as before and the
hard drive activity light kept flashing, so I left it running.

A few days later, the pool was imported.
Here are some of my stats:
1. The system is running OpenSolaris snv_134.
2. The pool I am trying to recover/import shows about 150 GB of data in roughly
50 GB of allocated space (about 3x dedup) before the zfs send hung.
3. Memory usage during the import showed around 2 GB free, i.e. 10 GB used.
4. The pool would wake up once in a while for only a second or so, with very low CPU
usage. I remember seeing only 2 minutes of CPU time after 2 days.
5. During the whole import, all other zfs commands were blocked; I was also unable
to open a new shell, and SSH would hang after the password was entered.

I assume this issue is entirely due to the new dedup feature and only happens to pools
containing dedupped datasets. Hopefully it will be fixed and available for
testing soon.

I thought the memory available for the ARC cache was for performance only. I was
surprised to learn that not having enough memory can actually hang the system
when you are deleting a dedupped dataset. I would consider this a major issue,
as the memory requirement seems to depend on the characteristics of the
dedupped dataset, and there doesn't seem to be good documentation explaining how
much memory is needed.
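
One rough way to gauge the in-core DDT footprint (a sketch; the pool name is a
placeholder and the per-entry size is an approximation that varies by build):

# zdb -DD poolname            (prints the DDT histograms and number of unique entries)
# echo | awk '{print 1000000 * 320 / 1024 / 1024}'
305.176                       (e.g. 1 million unique blocks x ~320 bytes/entry = ~305 MB)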

Hope this helps anyone considering testing the dedup feature.


Re: [zfs-discuss] zfs send/recv reliability

2010-05-30 Thread Brandon High
On Fri, May 28, 2010 at 10:05 AM, Gregory J. Benscoter
gjb5...@arl.psu.edu wrote:
 I'm primarily concerned with the possibility of a bit flip. If this
 occurs, will the stream be lost? Or will the file in which the bit flip occurred
 be the only degraded file? Lastly, how does the reliability of this plan
 compare to more traditional backup tools like tar, cpio, etc.?

You could run the stream through something like par2 and then save the
resulting mess of files to tape. It *should* protect you from bit
flips, but at the expense of increased tape size.
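
A sketch of what that might look like (dataset, paths, and the 10% redundancy
level are examples only):

# zfs send tank/data@backup > /staging/data.zfs
# par2 create -r10 /staging/data.zfs      (writes .par2 recovery files next to the stream)
# par2 verify /staging/data.zfs.par2      (before restoring, verify the stream)
# par2 repair /staging/data.zfs.par2      (repair it if the verify fails)

Then write data.zfs plus the .par2 files to tape together.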

-B

-- 
Brandon High : bh...@freaks.com


[zfs-discuss] [RESOLVED] Re: No mount all at boot

2010-05-30 Thread me
I had an empty directory /export/home created in the root filesystem, and it was
preventing the mount. I just deleted it and everything is OK.
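
For anyone hitting the same symptom, the check and fix amount to something like
this (a sketch; /export/home is just the mountpoint in question here):

# zfs mount -a              (reports an error if a dataset cannot be mounted)
# ls -a /export/home        (look for a stray directory or files at the mountpoint)
# rmdir /export/home        (remove the offending directory)
# zfs mount -a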

On Sun, May 30, 2010 at 5:40 PM, me dea...@gmail.com wrote:

 I was trying to expand the space of rpool. I didn't manage to do it, but after removing
 one (unused) disk from the VM configuration, the system doesn't start (no X).
 After a shell login I found out that there is no home:

 zfs mount
 rpool/ROOT/opensolaris /

 Home can be mounted manually without problems. What is wrong?

 --
 Dmitry


-- 
Dmitry


Re: [zfs-discuss] expand zfs for OpenSolaris running inside vm

2010-05-30 Thread me
Thanks! It is exactly what I was looking for.


On Sat, May 29, 2010 at 12:44 AM, Cindy Swearingen 
cindy.swearin...@oracle.com wrote:

 2. Attaching a larger disk to the root pool and then detaching
  the smaller disk

 I like #2 best. See this section in the ZFS troubleshooting wiki:

 http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide

 Replacing/Relabeling the Root Pool Disk


The size of the pool has changed, and I updated the swap size too. Now I have
detached the old disk. I did:

installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0


On reboot it fails to start up. GRUB loads, the OS loading screen shows, and then
the machine restarts :(. I booted a rescue disc console but don't know what to do.
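
For reference, the attach/detach route from the troubleshooting guide looks
roughly like this (the device names below are examples, not necessarily the ones
in this VM):

# zpool attach rpool c1t0d0s0 c1t1d0s0
# zpool status rpool        (wait until the resilver completes)
# zpool detach rpool c1t0d0s0
# installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0

If the new disk then boots partway and resets, re-running installgrub on the
boot slice and checking the VM/BIOS boot order are the usual first things to try.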

-- 
Dmitry


[zfs-discuss] [RESOLVED] Re: expand zfs for OpenSolaris running inside vm

2010-05-30 Thread me
Reinstalling GRUB helped.

What is the purpose of the dump slice?
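
While poking at it, these at least show what is configured (assuming the default
rpool/dump layout):

# dumpadm                       (shows the current dump device and savecore directory)
# zfs get volsize rpool/dump    (size of the dump zvol)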

On Sun, May 30, 2010 at 9:05 PM, me dea...@gmail.com wrote:

 Thanks! It is exactly what I was looking for.


 On Sat, May 29, 2010 at 12:44 AM, Cindy Swearingen 
 cindy.swearin...@oracle.com wrote:

 2. Attaching a larger disk to the root pool and then detaching
  the smaller disk

 I like #2 best. See this section in the ZFS troubleshooting wiki:

 http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide

 Replacing/Relabeling the Root Pool Disk


 The size of the pool has changed, and I updated the swap size too. Now I have
 detached the old disk. I did:

 installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0


 On reboot it fails to start up. GRUB loads, the OS loading screen shows, and then
 the machine restarts :(. I booted a rescue disc console but don't know what to do.

 --
 Dmitry


-- 
Dmitry


Re: [zfs-discuss] zfs/lofi/share panic

2010-05-30 Thread Frank Middleton

On 05/27/10 05:16 PM, Dennis Clarke wrote:


I just tried this with a UFS based filesystem just for a lark.


It never failed on UFS, regardless of the contents of /etc/dfs/dfstab.


Guess I must now try this with a ZFS fs under that iso file.


Just tried it again with b134  *with* share /mnt in /etc/dfs/dfstab.

# mount -O -F hsfs /export/iso_images/moblin-2.1-PR-Final-ivi-201002090924.img 
/mnt
# ls /mnt
isolinux  LiveOS
# unshare /mnt
/mnt: path doesn't exist
# share /mnt
# unshare /mnt
# share /mnt

Panic ensues (the following observed on the serial console); note that
the dataset is not UFS!

# May 30 13:35:44 host5 ufs: NOTICE: mount: not a UFS magic number (0x0)

panic[cpu1]/thread=30001f5f560: BAD TRAP: type=31 rp=2a1014769a0 addr=218 mmu_fsr=0 
occurred in module nfssrv due to a NULL pointer dereference

Tried again after it rebooted

Edited /etc/dfs/dfstab  to remove the share /mnt
# unshare /mnt
# mount -O -F hsfs /backups/icon/moblin-2.1-PR-Final-ivi-201002090924.img /mnt
# ls /mnt
isolinux  LiveOS
# unshare /mnt
/mnt: bad path
# share /mnt
# unshare  /mnt
# share /mnt

No panic. So the problem all along appears to be what happens if you
mount -O to an already shared mountpoint. Deliberately sharing before
mounting (but with nothing in /etc/dfs/dfstab) resulted in a slightly
different panic (more like the ones documented in the CR):

panic[cpu1]/thread=30002345e0: BAD TRAP: type=34 rp=2a100f84460 
addr=ff6f6c2f5267 mmu_fsr=0

unshare: alignment error:

So CR6798273 should be amended to show the following:

To reproduce, share (say) /mnt
mount -O some-image-file /mnt
share /mnt
unshare /mnt
share /mnt
unshare /mnt
Highly reproducible panic ensues.

Workaround - make sure mountpoints are not shared before
mounting iso images stored on a ZFS dataset.
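
In practice that check looks something like this (a sketch; the image path is a
placeholder):

# share | grep /mnt        (make sure the mountpoint is not currently shared)
# unshare /mnt             (if it is, unshare it first)
# mount -O -F hsfs /path/to/image.iso /mnt
# share /mnt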

So the problem, now seen to be relatively trivial, isn't fixed, at least
in b134. To all of you who responded both off and on the list and
motivated this experiment, many thanks. Perhaps someone with
access to a more recent build could try this, and if it still happens,
update and reopen CR6798273, although it doesn't seem very
important now.

Regards -- Frank


[zfs-discuss] zpool/zfs list question

2010-05-30 Thread Roy Sigurd Karlsbakk
Hi all

Using zpool/zfs list -H gives me a good overview of things and is easy to
parse, except that the allocation and data sizes are reported in 'human
readable' form. For scripting, this is less than ideal.

Is there a way to report zpool/zfs stats in a fixed scale, like KiB or even 
bytes?


Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is
an elementary imperative for all educators to avoid excessive use of idioms of
foreign origin. In most cases adequate and relevant synonyms exist in Norwegian.


Re: [zfs-discuss] zpool/zfs list question

2010-05-30 Thread Brandon High
On Sun, May 30, 2010 at 11:46 AM, Roy Sigurd Karlsbakk
r...@karlsbakk.net wrote:
 Is there a way to report zpool/zfs stats in a fixed scale, like KiB or even 
 bytes?

Some (but not all) commands use -p.
 -p
 Use exact (parseable) numeric output.
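
For example, the property interface supports it (using the pool name from the
earlier posts):

# zfs get -Hp used,available,referenced data

which prints tab-separated rows with exact byte counts, convenient for scripts.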

-B

-- 
Brandon High : bh...@freaks.com


[zfs-discuss] Disk space overhead (total volume size) by ZFS

2010-05-30 Thread Sandon Van Ness
I just wanted to make sure this is normal and expected. I fully
expected that as the filesystem filled up I would see more disk space
being used than with other filesystems, due to its features, but what I
didn't expect was for ~500-600 GB to be missing from the total
volume size right at filesystem creation.

Comparing two systems, one JFS and one ZFS, one raidz2 and one raid6,
here are the differences I see:

ZFS:
r...@opensolaris: 11:22 AM :/data# df -k /data
Filesystem            kbytes    used   avail capacity  Mounted on
data                 17024716800 258872352 16765843815     2%    /data

JFS:
r...@sabayonx86-64: 11:22 AM :~# df -k /data2
Filesystem   1K-blocks  Used Available Use% Mounted on
/dev/sdd1            17577451416   2147912 17575303504   1% /data2

zpool list shows the raw capacity right?

r...@opensolaris: 11:25 AM :/data# zpool list data
NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
data   18.1T   278G  17.9T     1%  1.00x  ONLINE  -

OK, I would expect it to be rounded to 18.2, but that seems about right
for 20 trillion bytes (which is what 20x 1 TB is):

r...@sabayonx86-64: 11:23 AM :~# echo | awk '{print
20000000000000/1024/1024/1024/1024}'
18.1899

Now minus two drives for parity:

r...@sabayonx86-64: 11:23 AM :~# echo | awk '{print
18000000000000/1024/1024/1024/1024}'
16.3709

Yet running zfs list also reports the amount of storage as
significantly smaller:

r...@opensolaris: 11:23 AM :~# zfs list data
NAME   USED  AVAIL  REFER  MOUNTPOINT
data   164K  15.9T  56.0K  /data

I would expect this to be 16.4T.

Taking the df -k values JFS gives me a total volume size of:

r...@sabayonx86-64: 11:31 AM :~# echo | awk '{print
17577451416/1024/1024/1024}'
16.3703

and zfs is:

r...@sabayonx86-64: 11:31 AM :~# echo | awk '{print
17024716800/1024/1024/1024}'
15.8555

So basically with JFS I see no decrease in total volume size but a huge
difference on ZFS. Is this normal/expected? Can anything be disabled to
not lose 500-600 GB of space?


Re: [zfs-discuss] Disk space overhead (total volume size) by ZFS

2010-05-30 Thread Brandon High
On Sun, May 30, 2010 at 2:37 PM, Sandon Van Ness san...@van-ness.com wrote:
 ZFS:
 r...@opensolaris: 11:22 AM :/data# df -k /data

'zfs list' is more accurate than df, since it will also show space
used by snapshots. eg:
bh...@basestar:~$ df -h /export/home/bhigh
Filesystem size   used  avail capacity  Mounted on
tank/export/home/bhigh
   5.3T   8.2G   2.8T 1%/export/home/bhigh
bh...@basestar:~$ zfs list tank/export/home/bhigh
NAME USED  AVAIL  REFER  MOUNTPOINT
tank/export/home/bhigh  51.0G  2.85T  8.16G  /export/home/bhigh

 zpool list shows the raw capacity right?

Yes. It shows the raw capacity, including space that will be used for
parity. Its USED column includes space used by all active datasets and
snapshots.

 So basically with JFS I see no decrease in total volume size but a huge
 difference on ZFS. Is this normal/expected? Can anything be disabled to
 not lose 500-600 GB of space?

Are you using any snapshots? They'll consume space.

What is the recordsize, and what kind of data are you storing? Small
blocks or lots of small files (< 128k) will have more overhead for
metadata.
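
Both are easy to check (using the pool name from the original post):

# zfs get recordsize,usedbysnapshots data
# zfs list -t snapshot -r data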

-B

-- 
Brandon High : bh...@freaks.com


Re: [zfs-discuss] Disk space overhead (total volume size) by ZFS

2010-05-30 Thread Sandon Van Ness
On 05/30/2010 02:51 PM, Brandon High wrote:
 On Sun, May 30, 2010 at 2:37 PM, Sandon Van Ness san...@van-ness.com wrote:
   
 ZFS:
 r...@opensolaris: 11:22 AM :/data# df -k /data
 
 'zfs list' is more accurate than df, since it will also show space
 used by snapshots. eg:
 bh...@basestar:~$ df -h /export/home/bhigh
 Filesystem size   used  avail capacity  Mounted on
 tank/export/home/bhigh
5.3T   8.2G   2.8T 1%/export/home/bhigh
 bh...@basestar:~$ zfs list tank/export/home/bhigh
 NAME USED  AVAIL  REFER  MOUNTPOINT
 tank/export/home/bhigh  51.0G  2.85T  8.16G  /export/home/bhigh

   
 zpool list shows the raw capacity right?
 
 Yes. It shows the raw capacity, including space that will be used for
 parity. Its USED column includes space used by all active datasets and
 snapshots.

   
 So basically with JFS I see no decrease in total volume size but a huge
 difference on ZFS. Is this normal/expected? Can anything be disabled to
 not lose 500-600 GB of space?
 
 Are you using any snapshots? They'll consume space.

 What is the recordsize, and what kind of data are you storing? Small
 blocks or lots of small files (< 128k) will have more overhead for
 metadata.

 -B

   

Yeah, I know all about the issues with snapshots and the like, but this
is on a totally new/empty filesystem. It's basically over 500 gigabytes
smaller right from the get-go, even before any data has been written
to it. I would expect some numbers to be off on a used
filesystem, but not by this much on a completely brand-new one.


[zfs-discuss] Small stalls slowing down rsync from holding network saturation every 5 seconds

2010-05-30 Thread Sandon Van Ness
Basically, for a few seconds at a time I can get very nice speeds through
rsync (saturating a 1-gig link), around 112-113 megabytes/sec,
which is about as good as I can expect after overhead. The problem is
that every 5 seconds, when data is actually written to the disks (physically
watching the disk LEDs), I see the activity on the machine sending the data
stall while the ZFS machine shows disk activity (writes).

Basically this problem brings the average write speed down from
around 112-113 megabytes/sec to around 100 megabytes/sec (sometimes
lower), i.e. a loss of 10% (or a bit more). I only really
care because 10% can make a difference when you are copying terabytes of
data to the machine. Anyway, here is what I am seeing on the Linux
machine that is sending to the ZFS machine:


Device: rrqm/s   wrqm/s r/s w/srMB/swMB/s
avgrq-sz avgqu-sz   await  svctm  %util
sdc3 17.00 0.00  496.000.00   112.00 0.00  
462.45 0.420.84   0.16   7.90

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   6.340.003.800.000.00   89.86

Device: rrqm/s   wrqm/s r/s w/srMB/swMB/s
avgrq-sz avgqu-sz   await  svctm  %util
sdc3  1.00 0.00  584.000.00   111.75 0.00  
391.88 0.430.74   0.12   7.10

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   6.840.005.440.000.00   87.72

Device: rrqm/s   wrqm/s r/s w/srMB/swMB/s
avgrq-sz avgqu-sz   await  svctm  %util
sdc3  4.00 0.00  557.000.00   112.00 0.00  
411.81 0.430.77   0.13   7.10

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   4.640.002.510.000.00   92.86

Device: rrqm/s   wrqm/s r/s w/srMB/swMB/s
avgrq-sz avgqu-sz   await  svctm  %util
sdc3  1.98 0.00  104.950.0023.76 0.00  
463.70 0.090.84   0.17   1.78

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   7.110.005.080.000.00   87.82

Device: rrqm/s   wrqm/s r/s w/srMB/swMB/s
avgrq-sz avgqu-sz   await  svctm  %util
sdc3  9.00 0.00  538.000.00   112.69 0.00  
428.97 0.390.72   0.14   7.50

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   7.460.005.560.000.00   86.98

Device: rrqm/s   wrqm/s r/s w/srMB/swMB/s
avgrq-sz avgqu-sz   await  svctm  %util
sdc3  8.00 0.00  524.000.00   112.00 0.00  
437.74 0.380.72   0.13   6.90

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   7.980.005.960.000.00   86.06

Device: rrqm/s   wrqm/s r/s w/srMB/swMB/s
avgrq-sz avgqu-sz   await  svctm  %util
sdc3  1.00 0.00  493.000.00   111.29 0.00  
462.31 0.390.80   0.16   7.90


This is iostat -xm 1 sdc3

Basically you can see it is reading at around full gig speed and then
reads drop off because writes are stalled on the ZFS machine. These
are 1-second averages.

On the ZFS end with 10 second averages I see approximately 100 MB/sec:


data        25.1G  18.1T      0    834      0   100M
data        26.2G  18.1T      0    833      0   100M
data        27.3G  18.1T      0    833      0   100M


Change this to 1 second and I see:
data        32.7G  18.1T      0      0      0      0
data        32.7G  18.1T      0  2.86K      0   360M
data        33.3G  18.1T      0    264      0  21.8M
data        33.3G  18.1T      0      0      0      0
data        33.3G  18.1T      0      0      0      0
data        33.3G  18.1T      0      0      0      0
data        33.3G  18.1T      0  2.94K      0   369M
data        33.8G  18.1T      0    375      0  35.1M
data        33.8G  18.1T      0      0      0      0
data        33.8G  18.1T      0      0      0      0
data        33.8G  18.1T      0      0      0      0
data        33.8G  18.1T      0  2.90K      0   365M
data        34.4G  18.1T      0    599      0  62.6M
data        34.4G  18.1T      0      0      0      0
data        34.4G  18.1T      0      0      0      0
data        34.4G  18.1T      0      0      0      0
data        34.4G  18.1T      0  2.10K      0   265M
data        34.9G  18.1T      0  1.77K      0   211M


I tried changing the txg sync time from 30 to 1, and that did make things
smoother but in general lowered speeds (down to 90 megabytes/sec or
so). Writing files directly to the array I see well in excess of 112
megabytes/sec, so I would think I should be able to get this to go at
full gig speed without the small stalls:

r...@opensolaris: 11:36 AM :/data# dd bs=1M count=100000 if=/dev/zero
of=./100gb.bin
100000+0 records in
100000+0 records out
104857600000 bytes (105 GB) copied, 233.257 s, 450 MB/s


Re: [zfs-discuss] Disk space overhead (total volume size) by ZFS

2010-05-30 Thread Mattias Pantzare
On Sun, May 30, 2010 at 23:37, Sandon Van Ness san...@van-ness.com wrote:
 I just wanted to make sure this is normal and is expected. I fully
 expected that as the file-system filled up I would see more disk space
 being used than with other file-systems due to its features but what I
 didn't expect was to lose out on ~500-600GB to be missing from the total
 volume size right at file-system creation.

 Comparing two systems, one being JFS and one being ZFS, one being raidz2
 one being raid6. Here is the differences I see:

 ZFS:
 r...@opensolaris: 11:22 AM :/data# df -k /data
 Filesystem            kbytes    used   avail capacity  Mounted on
 data                 17024716800 258872352 16765843815     2%    /data

 JFS:
 r...@sabayonx86-64: 11:22 AM :~# df -k /data2
 Filesystem           1K-blocks      Used Available Use% Mounted on
 /dev/sdd1            17577451416   2147912 17575303504   1% /data2

 zpool list shows the raw capacity right?

 r...@opensolaris: 11:25 AM :/data# zpool list data
 NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
 data   18.1T   278G  17.9T     1%  1.00x  ONLINE  -

 Ok, i would expect it to be rounded to 18.2 but that seems about right
 for 20 trillion bytes (what 20x1 TB is):

 r...@sabayonx86-64: 11:23 AM :~# echo | awk '{print
 20000000000000/1024/1024/1024/1024}'
 18.1899

 Now minus two drives for parity:

 r...@sabayonx86-64: 11:23 AM :~# echo | awk '{print
 18000000000000/1024/1024/1024/1024}'
 16.3709

 Yet when running zfs list it also lists the amount of storage
 significantly smaller:

 r...@opensolaris: 11:23 AM :~# zfs list data
 NAME   USED  AVAIL  REFER  MOUNTPOINT
 data   164K  15.9T  56.0K  /data

 I would expect this to be 16.4T.

 Taking the df -k values JFS gives me a total volume size of:

 r...@sabayonx86-64: 11:31 AM :~# echo | awk '{print
 17577451416/1024/1024/1024}'
 16.3703

 and zfs is:

 r...@sabayonx86-64: 11:31 AM :~# echo | awk '{print
 17024716800/1024/1024/1024}'
 15.8555

 So basically with JFS I see no decrease in total volume size but a huge
 difference on ZFS. Is this normal/expected? Can anything be disabled to
 not lose 500-600 GB of space?

This may be the answer:
http://www.cuddletech.com/blog/pivot/entry.php?id=1013


Re: [zfs-discuss] Disk space overhead (total volume size) by ZFS

2010-05-30 Thread Sandon Van Ness
On 05/30/2010 03:10 PM, Mattias Pantzare wrote:

  On Sun, May 30, 2010 at 23:37, Sandon Van Ness san...@van-ness.com wrote:

   
  I just wanted to make sure this is normal and is expected. I fully
  expected that as the file-system filled up I would see more disk space
  being used than with other file-systems due to its features but what I
  didn't expect was to lose out on ~500-600GB to be missing from the total
  volume size right at file-system creation.
 
  Comparing two systems, one being JFS and one being ZFS, one being raidz2
  one being raid6. Here is the differences I see:
 
  ZFS:
  r...@opensolaris: 11:22 AM :/data# df -k /data
  Filesystem            kbytes    used   avail capacity  Mounted on
  data                 17024716800 258872352 16765843815     2%    /data
 
  JFS:
  r...@sabayonx86-64: 11:22 AM :~# df -k /data2
  Filesystem   1K-blocks  Used Available Use% Mounted on
  /dev/sdd1            17577451416   2147912 17575303504   1% /data2
 
  zpool list shows the raw capacity right?
 
  r...@opensolaris: 11:25 AM :/data# zpool list data
  NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
  data   18.1T   278G  17.9T     1%  1.00x  ONLINE  -
 
  Ok, i would expect it to be rounded to 18.2 but that seems about right
  for 20 trillion bytes (what 20x1 TB is):
 
  r...@sabayonx86-64: 11:23 AM :~# echo | awk '{print
  20000000000000/1024/1024/1024/1024}'
  18.1899
 
  Now minus two drives for parity:
 
  r...@sabayonx86-64: 11:23 AM :~# echo | awk '{print
  18000000000000/1024/1024/1024/1024}'
  16.3709
 
  Yet when running zfs list it also lists the amount of storage
  significantly smaller:
 
  r...@opensolaris: 11:23 AM :~# zfs list data
  NAME   USED  AVAIL  REFER  MOUNTPOINT
  data   164K  15.9T  56.0K  /data
 
  I would expect this to be 16.4T.
 
  Taking the df -k values JFS gives me a total volume size of:
 
  r...@sabayonx86-64: 11:31 AM :~# echo | awk '{print
  17577451416/1024/1024/1024}'
  16.3703
 
  and zfs is:
 
  r...@sabayonx86-64: 11:31 AM :~# echo | awk '{print
  17024716800/1024/1024/1024}'
  15.8555
 
  So basically with JFS I see no decrease in total volume size but a huge
  difference on ZFS. Is this normal/expected? Can anything be disabled to
  not lose 500-600 GB of space?
  
 
  This may be the answer:
  http://www.cuddletech.com/blog/pivot/entry.php?id=1013

   
That is definitely interesting; however, I am seeing more than a 1.6%
discrepancy:

Using a newer df from GNU coreutils, I use -B to specify a unit of
1 billion bytes, which is 1 GB on the hard-drive vendors' scale. On
the RAID/JFS box:
r...@sabayonx86-64: 03:14 PM :~# df -B 1000000000 /data2
Filesystem  1GB-blocks  Used Available Use% Mounted on
/dev/sdd118000 3 17998   1% /data2

On the ZFS box:

r...@opensolaris: 03:16 PM :/data# df -B 1000000000 /data
Filesystem  1GB-blocks  Used Available Use% Mounted on
data 17434 1 17434   1% /data

Interestingly enough, I am seeing almost exactly double that: 3.14% by my
calculations. Maybe this was changed in newer versions to have more of a
reserve? I am running b134.
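
For comparison, assuming the 1/64 reservation that blog entry describes, the
arithmetic (in the same awk style as above) works out to:

echo | awk '{print 16.3709/64}'
0.255795

i.e. the documented reserve alone accounts for roughly 0.26 T, about half of the
~0.5 T gap between the expected 16.37 T and the reported 15.86 T, so the rest is
presumably metadata/allocation overhead on top of it.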



Re: [zfs-discuss] Small stalls slowing down rsync from holding network saturation every 5 seconds

2010-05-30 Thread Richard Elling
On May 30, 2010, at 3:04 PM, Sandon Van Ness wrote:

 Basically for a few seconds at a time I can get very nice speeds through
 rsync (saturating a 1 gig link) which is around 112-113 megabytes/sec
 which is about as good as I can expect after overhead. The problem is
 that every 5 seconds when data is actually written to disks (physically
 looking at the disk LEDs I see the activity on the one sending data
 stall as the ZFS machine is showing disk activity (writes).
 
 Basically this problem is bringing down the average write speed from
 around 112-113 megabytes/sec to around 100 megabytes/sec (sometimes
 lower) and thus lowering speeds by 10% (or a bit more). I only really
 care as 10% can make a difference when you are copying terrabytes of
 data to the machine. Anyway here is what I am seeing on the linux
 machine that is sending to the ZFS machine:
 
 
 Device: rrqm/s   wrqm/s r/s w/srMB/swMB/s
 avgrq-sz avgqu-sz   await  svctm  %util
 sdc3 17.00 0.00  496.000.00   112.00 0.00  
 462.45 0.420.84   0.16   7.90
 
 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   6.340.003.800.000.00   89.86
 
 Device: rrqm/s   wrqm/s r/s w/srMB/swMB/s
 avgrq-sz avgqu-sz   await  svctm  %util
 sdc3  1.00 0.00  584.000.00   111.75 0.00  
 391.88 0.430.74   0.12   7.10
 
 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   6.840.005.440.000.00   87.72
 
 Device: rrqm/s   wrqm/s r/s w/srMB/swMB/s
 avgrq-sz avgqu-sz   await  svctm  %util
 sdc3  4.00 0.00  557.000.00   112.00 0.00  
 411.81 0.430.77   0.13   7.10
 
 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   4.640.002.510.000.00   92.86
 
 Device: rrqm/s   wrqm/s r/s w/srMB/swMB/s
 avgrq-sz avgqu-sz   await  svctm  %util
 sdc3  1.98 0.00  104.950.0023.76 0.00  
 463.70 0.090.84   0.17   1.78
 
 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   7.110.005.080.000.00   87.82
 
 Device: rrqm/s   wrqm/s r/s w/srMB/swMB/s
 avgrq-sz avgqu-sz   await  svctm  %util
 sdc3  9.00 0.00  538.000.00   112.69 0.00  
 428.97 0.390.72   0.14   7.50
 
 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   7.460.005.560.000.00   86.98
 
 Device: rrqm/s   wrqm/s r/s w/srMB/swMB/s
 avgrq-sz avgqu-sz   await  svctm  %util
 sdc3  8.00 0.00  524.000.00   112.00 0.00  
 437.74 0.380.72   0.13   6.90
 
 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   7.980.005.960.000.00   86.06
 
 Device: rrqm/s   wrqm/s r/s w/srMB/swMB/s
 avgrq-sz avgqu-sz   await  svctm  %util
 sdc3  1.00 0.00  493.000.00   111.29 0.00  
 462.31 0.390.80   0.16   7.90
 
 
 This is iostat -xm 1 sdc3
 
 Basically you can see its reading at around full gig speed and then
 reads drop down due to writes being stalled on the ZFS machine. These
 are 1 second averages.
 
 On the ZFS end with 10 second averages I see approximately 100 MB/sec:
 
 
 data25.1G  18.1T  0834  0   100M
 data26.2G  18.1T  0833  0   100M
 data27.3G  18.1T  0833  0   100M
 
 
 Change this to 1 second and I see:
 data32.7G  18.1T  0  0  0  0
 data32.7G  18.1T  0  2.86K  0   360M
 data33.3G  18.1T  0264  0  21.8M
 data33.3G  18.1T  0  0  0  0
 data33.3G  18.1T  0  0  0  0
 data33.3G  18.1T  0  0  0  0
 data33.3G  18.1T  0  2.94K  0   369M
 data33.8G  18.1T  0375  0  35.1M
 data33.8G  18.1T  0  0  0  0
 data33.8G  18.1T  0  0  0  0
 data33.8G  18.1T  0  0  0  0
 data33.8G  18.1T  0  2.90K  0   365M
 data34.4G  18.1T  0599  0  62.6M
 data34.4G  18.1T  0  0  0  0
 data34.4G  18.1T  0  0  0  0
 data34.4G  18.1T  0  0  0  0
 data34.4G  18.1T  0  2.10K  0   265M
 data34.9G  18.1T  0  1.77K  0   211M
 
 
 I tried changing the txg sync time from 30 to 1 and that did make things
 more smooth but in general lowered speeds (down to 90 megabytes/sec or
 so). Actually writing files to the array I see well excess of 112
 megabytes/sec so I would think I should be able to get this to go at
 full gig speeds without the small stalls:

I have better luck tuning the zfs_txg_synctime_ms from 5000 to 1000 or less.

 
 r...@opensolaris: 11:36 AM :/data# dd bs=1M 
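
For reference, those txg tunables are normally adjusted either persistently in
/etc/system or live with mdb (a sketch for builds around b134; the values shown
are only examples):

set zfs:zfs_txg_timeout = 5
set zfs:zfs_txg_synctime_ms = 1000

or, for a live change:

# echo zfs_txg_synctime_ms/W0t1000 | mdb -kw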

Re: [zfs-discuss] Zfs mirror boot hang at boot

2010-05-30 Thread Frank Cusack

On 5/29/10 12:54 AM -0700 Matt Connolly wrote:

I'm running snv_134 on 64-bit x86 motherboard, with 2 SATA drives. The
zpool rpool uses whole disk of each drive.


Can't be.  zfs can't boot from a whole disk pool on x86 (maybe sparc too).
You have a single solaris partition with the root pool on it.  I am only
being pedantic because whole disk has a special meaning to zfs, distinct
from a single partition using the entire disk.

...

If I detach a drive from the pool, then the system also correctly boots
off a single connected drive. However, reattaching the 2nd drive causes a
whole resilver to occur.


By detach do you mean running zpool detach, or simply removing the
drive physically without running any command?  I suppose the former
because if you just remove it I'd think you'd have the same non-booting
problem.  If that's right, then that is the expected behavior.
zpool detach causes zfs to forget everything it knows about the device
being detached.
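
If the goal is only to temporarily take one side of the mirror out without zfs
forgetting it, zpool offline/online avoids the full resilver, since only the data
written while the disk was offline has to be copied back. A sketch with example
device names:

# zpool offline rpool c1t1d0s0
  ... maintenance ...
# zpool online rpool c1t1d0s0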

-frank


Re: [zfs-discuss] Small stalls slowing down rsync from holding network saturation every 5 seconds

2010-05-30 Thread Sandon Van Ness
On 05/30/2010 04:22 PM, Richard Elling wrote:
 If you want to decouple the txg commit completely, then you might consider
 using a buffer of some sort.  I use mbuffer for pipes, but that may be tricky
 to use in an rsync environment.
   -- richard
   
I initially thought this was I/O, but now I think a CPU bottleneck might be
causing the problem.

When I write zeros using dd I can get quite good speed (~500
megabytes/sec), but doing so totally maxes out both of my
CPU cores (2-5% idle in top).

The problem is that when the write burst happens, it takes CPU time away
from rsync, which may be what is actually causing the dip during
writes: not the I/O activity itself, but the CPU load generated by the writes.

I actually verified this by making a non-raidz2 pool, just adding all
the disks to a pool with no redundancy specified:

zpool create data c4t5000C500028BD5FCd0p0 c4t5000C50009A4D727d0p0
c4t5000C50009A46AF5d0p0 c4t5000C50009A515B0d0p0 c4t5000C500028A81BEd0p0
c4t5000C500028B44A1d0p0 c4t5000C500028B415Bd0p0 c4t5000C500028B23D2d0p0
c4t5000C5000CC3338Dd0p0 c4t5000C500027F59C8d0p0 c4t5000C50009DBF8D4d0p0
c4t5000C500027F3C1Fd0p0 c4t5000C5000DAF02F3d0p0 c4t5000C5000DA7ED4Ed0p0
c4t5000C5000DAEF990d0p0 c4t5000C5000DAEEF8Ed0p0 c4t5000C5000DAEB881d0p0
c4t5000C5000A121581d0p0 c4t5000C5000DAC848Fd0p0 c4t5000C50002770EE6d0p0

Once I did this I got a nice, stable 115 megabytes/sec over the network,
so 11.5% better.

So the problem appears to be that when it goes to write data it uses
100% of its CPU power (even if only for some number of milliseconds), which stalls
the network while it does this. This happens when it does parity
calculations but not with a non-parity zpool. I don't think I
can throttle the parity calculations at all, so I don't think there will be a
fix for this, unfortunately =(. I can live with losing 10% off my rsync
speed though.
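
For what it's worth, the mbuffer approach Richard mentions usually looks
something like this for a send/receive pipe (host and dataset names are just
examples):

# zfs send tank/fs@snap | mbuffer -s 128k -m 1G | \
  ssh backuphost 'mbuffer -s 128k -m 1G | zfs receive backup/fs'

As he notes, wedging a buffer into an rsync pipeline is less straightforward.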