Re: [zfs-discuss] zil and root on the same SSD disk

2011-01-13 Thread Jorgen Lundman
Whenever I do a root pool, i.e., configure a pool using the c?t?d?s0 notation, it 
will always complain about overlapping slices, since *s2 is the entire disk. 
This warning seems excessive, but -f gets past it.

As for the ZIL, the first time around I created a slice for it, and that worked well. 
The second time I did:

# zfs create -V 2G rpool/slog
# zfs set refreservation=2G rpool/slog
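
(For completeness, the step that hooks the zvol up as the log device of the data 
pool is just a regular zpool add against the zvol's /dev/zvol path -- roughly the 
following, from memory rather than my shell history, possibly needing -f:)

# zpool add zpool log /dev/zvol/dsk/rpool/slog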

NAME        STATE     READ WRITE CKSUM
rpool   ONLINE   0 0 0
  c9d0s0ONLINE   0 0 0

  pool: zpool
 state: ONLINE
config:

NAME        STATE     READ WRITE CKSUM
zpool   ONLINE   0 0 0
  raidz1-0  ONLINE   0 0 0
c8t0d0  ONLINE   0 0 0
c8t1d0  ONLINE   0 0 0
c8t2d0  ONLINE   0 0 0
c8t3d0  ONLINE   0 0 0
c8t4d0  ONLINE   0 0 0
logs
  /dev/zvol/dsk/rpool/slog  ONLINE   0 0 0


I prefer this now, as I can potentially change its size and reboot, compared to 
slices, which are much more static. I don't know how it compares performance-wise, 
but right now the NAS is fast enough (the NIC is the slowest part).
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Mirroring raidz ?

2011-01-12 Thread Jorgen Lundman
I have a server, with two external drive cages attached, on separate 
controllers:

c0::dsk/c0t0d0 disk connectedconfigured   unknown
c0::dsk/c0t1d0 disk connectedconfigured   unknown
c0::dsk/c0t2d0 disk connectedconfigured   unknown
c0::dsk/c0t3d0 disk connectedconfigured   unknown
c0::dsk/c0t4d0 disk connectedconfigured   unknown
c0::dsk/c0t5d0 disk connectedconfigured   unknown
c0::dsk/c0t6d0 disk connectedconfigured   unknown
c0::dsk/c0t7d0 disk connectedconfigured   unknown
c0::dsk/c0t8d0 disk connectedconfigured   unknown
c0::dsk/c0t9d0 disk connectedconfigured   unknown
c0::dsk/c0t10d0disk connectedconfigured   unknown
c0::dsk/c0t11d0disk connectedconfigured   unknown

c1::dsk/c1t1d0 disk connectedconfigured   unknown
c1::dsk/c1t2d0 disk connectedconfigured   unknown
c1::dsk/c1t3d0 disk connectedconfigured   unknown
c1::dsk/c1t4d0 disk connectedconfigured   unknown
c1::dsk/c1t5d0 disk connectedconfigured   unknown
c1::dsk/c1t6d0 disk connectedconfigured   unknown
c1::dsk/c1t7d0 disk connectedconfigured   unknown
c1::dsk/c1t8d0 disk connectedconfigured   unknown
c1::dsk/c1t9d0 disk connectedconfigured   unknown
c1::dsk/c1t10d0disk connectedconfigured   unknown
c1::dsk/c1t11d0disk connectedconfigured   unknown

It would be nice to create a setup similar to

zpool create sub1 raidz c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0
zpool add sub1 raidz c0t6d0 c0t7d0 c0t8d0 c0t9d0 c0t10d0 c0t11d0

zpool create sub2 raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0
zpool add sub2 raidz c1t6d0 c1t7d0 c1t8d0 c1t9d0 c1t10d0 c1t11d0

zpool create pool mirror sub1 sub2

That way I could lose an HDD in either external drive cage, or indeed a whole external 
drive cage (controller/cable/power), without downtime.

But I have a feeling I cannot do this? What would be recommended?
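
(To make the question concrete: the alternative I have seen suggested most often -- 
a sketch only, not something I have tested on this hardware -- is a single pool of 
two-way mirrors, each pairing one disk from c0 with one from c1, e.g.

zpool create pool mirror c0t0d0 c1t1d0 mirror c0t1d0 c1t2d0 \
  mirror c0t2d0 c1t3d0 mirror c0t3d0 c1t4d0 ...

which survives any single disk, or the loss of a whole cage, at the cost of 50% 
capacity instead of raidz.)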
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS panic on blade BL465c G1

2010-10-03 Thread Jorgen Lundman
Hello list,

I got a c7000 with BL465c G1 blades to play with and have been trying to get 
some form of Solaris to work on it. 

However, this is the state:

OpenSolaris 134: Installs with ZFS, but no BNX NIC drivers.
OpenIndiana 147: Panics on zpool create every time, even from the console. Has no 
UFS option, has NICs.
Solaris 10 u9: Panics on zpool create, but has a UFS option, has NICs.

One option would be to get 147 NIC drivers for 134.

But for now, the ZFS panic happens on create. The blade has an HP Smart Array 
E200i card in it, with both HDDs set up as single-HDD logical volumes:

# format
AVAILABLE DISK SELECTIONS:
   0. c1t0d0 <DEFAULT cyl 17841 alt 2 hd 255 sec 63>
      /p...@2,0/pci1166,1...@11/pci1166,1...@0/pci103c,3...@8/s...@0,0
   1. c1t1d0 <DEFAULT cyl 17841 alt 2 hd 255 sec 63>
      /p...@2,0/pci1166,1...@11/pci1166,1...@0/pci103c,3...@8/s...@1,0

1; p; p
Total disk cylinders available: 17841 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0       root    wm       1 - 17840      136.66GB    (17840/0/0) 286599600
  1 unassigned    wu       0                0         (0/0/0)             0
  2     backup    wm       0 - 17840      136.67GB    (17841/0/0) 286615665
  3 unassigned    wm       0                0         (0/0/0)             0
  4 unassigned    wm       0                0         (0/0/0)             0
  5 unassigned    wm       0                0         (0/0/0)             0
  6 unassigned    wm       0                0         (0/0/0)             0
  7 unassigned    wm       0                0         (0/0/0)             0
  8       boot    wu       0 -     0        7.84MB    (1/0/0)         16065


# zpool create -f zboot c1t1d0s0

panic[cpu2]/thread=fe80011a2c60:
BAD TRAP: type=e (#pf Page fault) rp=fe80011a2940 addr=278 occurred in module
unix due to a NULL pointer dereference


sched:
#pf Page fault
Bad kernel fault at addr=0x278
pid=0, pc=0xfb8406fb, sp=0xfe80011a2a38, eflags=0x10246
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
cr2: 278 cr3: 1161f000 cr8: c
rdi:  278 rsi:4 rdx: fe80011a2c60
rcx:   14  r8:0  r9:0
rax:0 rbx:  278 rbp: fe80011a2a60
r10:0 r11:1 r12:   10
r13:0 r14:4 r15: 9bb02ef0
fsb:0 gsb: 8ac7a800  ds:   43
 es:   43  fs:0  gs:  1c3
trp:e err:2 rip: fb8406fb
 cs:   28 rfl:10246 rsp: fe80011a2a38
 ss:   30

fe80011a2850 unix:die+da ()
fe80011a2930 unix:trap+5e6 ()
fe80011a2940 unix:cmntrap+140 ()
fe80011a2a60 unix:mutex_enter+b ()
fe80011a2a70 zfs:zio_buf_alloc+1d ()
fe80011a2aa0 zfs:zio_vdev_io_start+120 ()
fe80011a2ad0 zfs:zio_execute+7b ()
fe80011a2af0 zfs:zio_nowait+1a ()
fe80011a2b60 zfs:vdev_probe+f0 ()
fe80011a2ba0 zfs:vdev_open+2b1 ()
fe80011a2bc0 zfs:vdev_open_child+21 ()
fe80011a2c40 genunix:taskq_thread+295 ()
fe80011a2c50 unix:thread_start+8 ()

syncing file systems...
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] opensolaris lightweight install

2010-01-06 Thread Jorgen Lundman


On my NAS I use Velitium: http://sourceforge.net/projects/velitium/ which goes 
down to about 70MB at the smallest.



(2010/01/07 15:23), Frank Cusack wrote:

been searching and searching ...





--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] quotas on zfs at solaris 10 update 9 (10/09)

2009-12-23 Thread Jorgen Lundman



Len Zaifman wrote:

Because we have users who will create millions of files in a directory it would 
be nice to report the number of files a user has or a group has in a filesystem.

Is there a way (other than find) to get this?


I don't know if there is a good way, but I have noticed that with ZFS, the 
number in ls which used to reflect blocks actually reports the number of 
entries in the directory (minus one).


drwxr-xr-x  13 root bin   13 Oct 28 02:58 spool
^^

# ls -la spool | wc -l
  14

Which means you can probably add things up a little faster.
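
(A rough sketch of what I mean -- reading the size field with ls -ld instead of 
walking the tree with find; the path is just a placeholder:)

# ls -ld /some/filesystem/*/ | awk '{print $5, $NF}'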



--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Replacing log with SSD on Sol10 u8

2009-11-26 Thread Jorgen Lundman


OK, the logfix program compiled for snv_111 does run, and lets me swap the 32GB HDD 
slog for the new (~29GB) SSD slog. It comes up as faulted, but I can replace 
it with itself, and everything is OK. I can attach the second SSD without issues.



Assuming it never tries to write the full 32GB, it should be OK. I don't 
know whether zpool stores the physical size in the label, or re-reads it when importing.
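
(One way to peek at what the label actually records -- a sketch; I believe the 
asize field in the label dump is the relevant bit:)

# zdb -l /dev/rdsk/c10t4d0s0 | grep asize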


# zpool export zpool1
# ./logfix /dev/rdsk/c5t1d0s0 /dev/rdsk/c10t4d0s0 13049515403703921770
# zpool import zpool1
# zpool status

logs
  13049515403703921770  FAULTED  0 0 0  was 
/dev/dsk/c10t4d0s0

# zpool replace -f zpool1 13049515403703921770 c10t4d0
# zpool status

logs
  c10t4d0ONLINE   0 0 0

# zpool attach zpool1 c10t4d0 c9t4d0

 logs
   mirror-1   ONLINE   0 0 0
 c10t4d0  ONLINE   0 0 0
 c9t4d0   ONLINE   0 0 0

And back in Solaris 10 u8:

# zpool import zpool1
# zpool status

logs
  mirrorONLINE   0 0 0
c6t4d0  ONLINE   0 0 0
c5t4d0  ONLINE   0 0 0


So there is at least a solution, even if it is rather unattractive. Twelve servers, 
all of which have to be done at 2am, means I will be testy for a while.


Lund


Jorgen Lundman wrote:



Interesting. Unfortunately, I can not zpool offline, nor zpool
detach, nor zpool remove the existing c6t4d0s0 device.



I thought perhaps we could boot something newer than b125 [*1] and I
would be able to remove the slog device that is too big.

The dev-127.iso does not boot [*2] due to splashimage, so I had to edit
the ISO to remove that for booting.

After booting with -B console=ttya, I find that it can not add the
/dev/dsk entries for the 24 HDDs, since / is on a too-small ramdisk.
Disk-full messages ensue. Yay!

After I have finally imported the pools, without upgrading (since I have
to boot back to Sol 10 u8 for production), I attempt to remove the
slog that is no longer needed:


# zpool remove zpool1 c6t4d0s0
cannot remove c6t4d0s0: pool must be upgrade to support log removal


Sigh.


Lund



[*1]
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6574286

[*2]
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6739497






--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] rquota didnot show userquota (Solaris 10)

2009-11-26 Thread Jorgen Lundman


Hopefully this will confirm to you that it should work:

x4500-10:~# zfs get userqu...@57564 zpool1/sd02_www
NAME             PROPERTY         VALUE  SOURCE
zpool1/sd02_www  userqu...@57564  29.5G  local



prov01# df -h |grep sd02
x4500-10.unix:/export/sd02/www   16T   839G   15T    6%   /export/sd02/www

remote/read/write/setuid/devices/vers=4/hard/intr/quota/xattr/dev=4700038


# quota -v 57564
Disk quotas for (no account) (uid 57564):
Filesystem     usage  quota  limit  timeleft  files  quota  limit  timeleft
/export/sd02/www
  15 30932992 30932992  0  0  0



I would suggest the usual things to check:

online Aug_20   svc:/network/nfs/rquota:default

If you are using NFSv4, check that /var/run/nfs4_domain matches on both ends. We mount 
the volumes with -o rq (for legacy reasons, as it was UFS at one time), but I don't know 
if that is still required.
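
(The checks themselves boil down to something like this; the host prompts are just 
placeholders:)

server# svcs svc:/network/nfs/rquota:default
server# cat /var/run/nfs4_domain
client# cat /var/run/nfs4_domain
client# quota -v 57564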




Willi Burmeister wrote:

Hi,

we have a new fileserver running on X4275 hardware with Solaris 10U8.

On this fileserver we created one test dir with quota and mounted it
on another Solaris 10 system. Here the quota command did not show the
used quota. Does this feature only work with OpenSolaris, or is it
intended to work on Solaris 10 as well?

Here what we did on the server:

# zfs create -o mountpoint=/export/home2 zpool1/home
# zfs set sharenfs=rw=sparcs zpool1/home
# zfs set userqu...@wib=1m zpool1/home

# mkdir /export/home2/wib
# cp some stuff  /export/home2/wib
# chown -Rh wib:sysadmin /export/home2/wib

# zfs userspace zpool1/home
TYPE        NAME    USED  QUOTA
POSIX User  root  3K   none
POSIX User  wib 154K 1M

# quota -v wib
Disk quotas for wib (uid 90):
Filesystem     usage  quota  limit  timeleft  files  quota  limit  timeleft
/export/home2
  154   1024   1024   -  -  -  -   -

and the client:

# mount server:/export/home2/wib /mnt

% cd /mnt
% du -sk .
154 .

% quota -v wib
Disk quotas for wib (uid 90):
Filesystem     usage  quota  limit  timeleft  files  quota  limit  timeleft


A simple snoop on the network shows us:

   client -> server   PORTMAP C GETPORT prog=100011 (RQUOTA) vers=1 proto=UDP
   server -> client   PORTMAP R GETPORT port=32865
   client -> server   RQUOTA C GETQUOTA Uid=90 Path=/export/home2/wib
   server -> client   RQUOTA R GETQUOTA No quota

Why 'no quota'?

Both systems are nearly fully patched.


Any help is appreciated. Thanks in advance.

Willi

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Replacing log with SSD on Sol10 u8

2009-11-25 Thread Jorgen Lundman



Interesting. Unfortunately, I can not zpool offline, nor zpool
detach, nor zpool remove the existing c6t4d0s0 device.



I thought perhaps we could boot something newer than b125 [*1] and I would be 
able to remove the slog device that is too big.


The dev-127.iso does not boot [*2] due to splashimage, so I had to edit the ISO 
to remove that for booting.


After booting with -B console=ttya, I find that it can not add the /dev/dsk 
entries for the 24 HDDs, since / is on a too-small ramdisk. Disk-full messages 
ensue. Yay!


After I have finally imported the pools, without upgrading (since I have to boot 
back to Sol 10 u8 for production), I attempt to remove the slog that is no 
longer needed:



# zpool remove zpool1 c6t4d0s0
cannot remove c6t4d0s0: pool must be upgrade to support log removal


Sigh.


Lund



[*1]
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6574286

[*2]
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6739497




--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Replacing log with SSD on Sol10 u8

2009-11-20 Thread Jorgen Lundman


Hello list,

I pre-created the pools we would use for when the SSDs eventually came in. Not my 
finest moment, perhaps.


Since I knew the SSDs would be 32GB in size, I created 32GB slices on the HDDs in 
slots 36 and 44.


* For future reference, to others thinking of doing the same: do not bother setting 
up the log until you have the SSDs, or make the slices half the size of the planned 
SSDs.


So the SSDs arrived, and I have a spare X4540 to attempt the replacement, before 
we have to do it on all the production x4540s. Hopefully with no downtime.


SunOS x4500-15.unix 5.10 Generic_141445-09 i86pc i386 i86pc

logs
  c5t4d0s0  ONLINE  0 0 0
  c6t4d0s0  ONLINE  0 0 0

# zpool detach zpool1 c5t4d0s0
# hdadm offline disk c5t4

This was very exciting: this is the first time EVER that the blue LED has turned 
on. Much rejoicing! ;)


Took slot 36 out and inserted the first SSD. The lights came on green again, but 
just in case:


# hdadm online disk c5t4

I used format to fdisk it and change to an EFI label.

# zpool attach zpool1 c6t4d0s0 c5t4d0
cannot attach c5t4d0 to c6t4d0s0; the device is too small

Uhoh.

Of course, I created a slice of literally 32GB, and an SSD's 32GB is the old HDD 
"human" size. This has been fixed in OpenSolaris already (attaching slightly smaller 
mirrors), but apparently not in Solaris 10 u8. I appear to be screwed.
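
(In hindsight, comparing the sector counts before attempting the attach would have 
flagged this -- something like

# prtvtoc /dev/rdsk/c6t4d0s0
# prtvtoc /dev/rdsk/c5t4d0s0

and checking that the new device's accessible sectors are at least as many as the 
slice it is meant to mirror.)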



Are there patches to fix this perhaps? Hopefully? ;)


However, what I COULD do is add a new device:

# zpool add zpool1 log c5t4d0
# zpool status
logs
  c6t4d0s0  ONLINE   0 0 0
c5t4d0  ONLINE   0 0 0

Interesting. Unfortunately, I can not zpool offline, nor zpool detach, nor 
zpool remove the existing c6t4d0s0 device.




At this point we are essentially stuck. I would have to re-create the whole pool 
to fix this. With servers live and full of customer data, this will be awkward.



So I switched to a more... direct approach.

I also knew that if the log-device fails, it will go back to using the default 
log device.


# hdadm offline disk c6t4

Even though this says OK, it does not actually work since the device is in 
use.

In the end, I simply pulled out the HDD. Since we had already added a second 
log device, there were no hiccups at all. It barely noticed it was gone.



logs
  c6t4d0s0  UNAVAIL  0 0 0  corrupted data
c5t4d0  ONLINE   0 0 0

At this point we inserted the second SSD, did the format for EFI label, and we 
were a little surprised that this worked;


# zpool attach zpool1 c5t4d0 c6t4d0

So now we have the situation of:

logs
  c6t4d0s0  UNAVAIL  0 0 0  corrupted data
  mirrorONLINE   0 0 0
c5t4d0  ONLINE   0 0 0
c6t4d0  ONLINE   0 0 0

It would be nice to get rid of c6t4d0s0 though. Any thoughts? What would you 
experts do in this situation? We have to run Solaris 10 (long battle there; 
no support for OpenSolaris from anyone in Japan).


Can I delete the sucker using zdb?

Thanks for any reply,





--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS directory and file quota

2009-11-18 Thread Jorgen Lundman

In that case instead of rewriting the part of my code which handles
quota creation/updating/checking, I would need to completely rewrite
the quota logic. :-(


So what do you do just now with UFS ? Is it a separate filesystem for
the mail directory ? If so it really shouldn't be that big of a deal to
rewrite to run 'zfs set userquota@' instead of updating the UFS quota file.



Certainly we changed the provisioning from using edquota to 'zfs set userquota' 
without issue. All the read-only code uses rquota / the quota command to look up 
quota; it works exactly the same with ZFS user quotas and did not need any changes.
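
(Roughly the change in the provisioning code, with placeholder user and dataset 
names. Before, on UFS:

# edquota -p protouser username

and after, on ZFS:

# zfs set userquota@username=2G zpool1/somedataset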





--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS dedup vs compression vs ZFS user/group quotas

2009-11-03 Thread Jorgen Lundman


We recently found that the ZFS user/group quota accounting for disk usage works 
the opposite of what we were expecting. I.e., any space saved by compression is a 
benefit to the customer, not to us.


(We expected the Google style: Give a customer 2GB quota, and if compression 
saves space, that is profit to us)


Is the space saved by dedup charged in the same manner? I would expect so, but I 
figured some of you would just know. I will check when b128 is out.


I don't suppose I can change the model? :)
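
(For the compression case, the way we convinced ourselves of the accounting was 
roughly this; the dataset, uid and path are just examples:

# zfs get compressratio zpool1/somedataset
# zfs get userused@57564 zpool1/somedataset
# du -sk /export/somedataset/57564

userused@ and du both report the post-compression figure, while ls -l shows the 
logical size. I assume dedup will be charged the same way, but have not verified it.)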

Lund

--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS user quota, userused updates?

2009-10-19 Thread Jorgen Lundman


Is there a way to force ZFS to update or refresh the userused value when it does 
not match what is actually on disk? Are there known ways for it to get out of sync 
that we should avoid?


SunOS x4500-11.unix 5.10 Generic_141445-09 i86pc i386 i86pc
(Solaris 10 10/09 u8)


zpool1/sd01_mail   223M  15.6T   222M  /export/sd01/mail


# zfs userspace zpool1/sd01_mail
TYPE        NAME    USED  QUOTA
POSIX User  1029   54.0M   100M

# df -h .
Filesystem size   used  avail capacity  Mounted on
zpool1/sd01_mail    16T   222M   16T    1%   /export/sd01/mail


# ls -lhn
total 19600
-rw---   1 1029 21001.7K Oct 20 12:03 
1256007793.V4700025I1770M252506.vmx06.unix:2,S
-rw---   1 1029 21001.7K Oct 20 12:04 
1256007873.V4700025I1772M63715.vmx06.unix:2,S
-rw---   1 1029 21001.6K Oct 20 12:05 
1256007926.V4700025I1773M949133.vmx06.unix:2,S
-rw---   1 1029 2100 76M Oct 20 12:23 
1256009005.V4700025I1791M762643.vmx06.unix:2,S
-rw---   1 1029 2100 54M Oct 20 12:36 
1256009769.V4700034I179eM739748.vmx05.unix:2,S

-rw--T   1 1029 21002.0M Oct 20 14:39 file

The 54M file appears to be accounted for, but the 76M one is not. I recently added a 
2M file by chown to see if it was a local-disk vs. NFS problem. The previous figure 
had not updated for 2 hours.



# zfs get useru...@1029 zpool1/sd01_mail
NAME  PROPERTY   VALUE  SOURCE
zpool1/sd01_mail  useru...@1029  54.0M  local


Any suggestions would be most welcome,

Lund


--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs on s10u8

2009-10-17 Thread Jorgen Lundman


We were holding our breath for ZFS user quotas, so we went to u8 and upgraded the 
pool immediately. No issues here. But I had to install from CD, as Live Upgrade 
failed (bootadm -e: no such argument).


ZFS send appears faster in u8 too, as it was still slow in u7.

Lund


dick hoogendijk wrote:

Any known issues for the new ZFS on solaris 10 update 8?
Or is it still wiser to wait doing a zpool upgrade? Because older ABE's
can no longer be accessed then.



--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Comments on home OpenSolaris/ZFS server

2009-09-30 Thread Jorgen Lundman


I too went with a 5-in-3 drive cage for the HDDs, in a nice portable Mini-ITX case 
with an Intel Atom. More of a SOHO NAS for home use than a beast. Still, I can 
get about 10TB in it.


http://lundman.net/wiki/index.php/ZFS_RAID

I can also recommend the embeddedSolaris project for making a small bootable 
Solaris. It is very flexible, and you can put the admin GUIs on it, and so on.


https://sourceforge.net/projects/embeddedsolaris/

Lund

--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Solaris License with ZFS USER quotas?

2009-09-28 Thread Jorgen Lundman


Hello list,

We are unfortunately still experiencing some issues regarding our support 
license with Sun, or rather our Sun Vendor.


We need ZFS user quotas (that is, not the ZFS file-system quota), which first 
appeared in snv_114.


We would like to run something like snv_117 (we don't really care which version 
per se; that is just the version we have done the most testing with).


But our vendor will only support Solaris 10. After weeks of wrangling, they have 
reluctantly agreed to let us run OpenSolaris 2009.06 (which does not have ZFS 
user quotas).


When I approach Sun Japan directly, I just get told that they don't speak 
English. When my Japanese colleagues approach Sun Japan directly, it is 
suggested to us that we stay with our current vendor.


* Will there be official Solaris 10 or OpenSolaris releases with ZFS user 
quotas? (Will 2010.02 contain ZFS user quotas?)


* Can we perhaps get support overseas that will let us run a version of Solaris 
with ZFS user quotas? Support generally means having the ability to replace 
hardware when it dies, and/or to send panic dumps, if they happen, for future patches.


Internally, we are now discussing returning our 12x x4540, and calling NetApp. I 
would rather not (more work for me).


I understand Sun is probably experiencing some internal turmoil at the moment, 
but it has been rather frustrating for us.


Lund

--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris License with ZFS USER quotas?

2009-09-28 Thread Jorgen Lundman



Tomas Ögren wrote:


http://sparcv9.blogspot.com/2009/08/solaris-10-update-8-1009-is-comming.html
which is in no way official, says it'll be in 10u8 which should be
coming within a month.

/Tomas


That would be perfect. I wonder why I have so much trouble finding information 
about future releases of Solaris.


Thanks

Lund


--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Finding SATA cards for ZFS; was Lundman home NAS

2009-08-31 Thread Jorgen Lundman
The mv8 is a marvell based chipset, and it appears there are no Solaris 
drivers for it.  There doesn't appear to be any movement from Sun or 
marvell to provide any either.




Do you mean specifically Marvell 6480 drivers? I use both the DAC-SATA-MV8 
and the AOC-SAT2-MV8, which use the Marvell MV88SX and work very well in 
Solaris (package SUNWmv88sx).


Lund

--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4540 dead HDD replacement, remains configured.

2009-08-21 Thread Jorgen Lundman

Nope, that it does not.



Ian Collins wrote:

Jorgen Lundman wrote:


Finally came to the reboot maintenance to reboot the x4540 to make it 
see the newly replaced HDD.


I tried, reboot, then power-cycle, and reboot -- -r,

but I can not make the x4540 accept any HDD in that bay. I'm starting 
to think that perhaps we did not lose the original HDD, but rather the 
slot, and there is a hardware problem.


This is what I see after a reboot, the disk is c1t5d0, sd37, s...@5,0 or 
slot 13.


c1::dsk/c1t4d0 disk connectedconfigured 
unknown
c1::dsk/c1t5d0 disk connectedconfigured 
unknown
c1::dsk/c1t6d0 disk connectedconfigured 
unknown



Does format show it?



--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4540 dead HDD replacement, remains configured.

2009-08-19 Thread Jorgen Lundman
   30, 2311 Aug 20 02:23 s...@4,0:wd,raw
drwxr-xr-x   2 root sys2 Aug  6 14:31 s...@5,0
drwxr-xr-x   2 root sys2 Apr 17 17:52 s...@6,0
brw-r-   1 root sys   30, 2432 Jul  6 09:50 s...@6,0:a
crw-r-   1 root sys   30, 2432 Jul  6 09:48 s...@6,0:a,raw
brw-r-   1 root sys   30, 2433 Jul  6 09:50 s...@6,0:b
crw-r-   1 root sys   30, 2433 Jul  6 09:48 s...@6,0:b,raw
brw-r-   1 root sys   30, 2434 Jul  6 09:50 s...@6,0:c
[snip]
brw-r-   1 root sys   30, 2452 Jul  6 09:50 s...@6,0:u
crw-r-   1 root sys   30, 2452 Jul  6 09:48 s...@6,0:u,raw
brw-r-   1 root sys   30, 2439 Aug 20 02:24 s...@6,0:wd
crw-r-   1 root sys   30, 2439 Aug 20 02:23 s...@6,0:wd,raw
drwxr-xr-x   2 root sys2 Apr 17 17:52 s...@7,0
brw-r-   1 root sys   30, 2496 Jul  2 15:30 s...@7,0:a
crw-r-   1 root sys   30, 2496 Jul  6 09:48 s...@7,0:a,raw
brw-r-   1 root sys   30, 2497 Jul  6 09:50 s...@7,0:b
crw-r-   1 root sys   30, 2497 Jul  6 09:48 s...@7,0:b,raw
brw-r-   1 root sys   30, 2498 Jul  6 09:50 s...@7,0:c
crw-r-   1 root sys   30, 2498 Jul  6 09:43 s...@7,0:c,raw
brw-r-   1 root sys   30, 2499 Jul  6 09:50 s...@7,0:d
crw-r-   1 root sys   30, 2499 Jul  6 09:48 s...@7,0:d,raw
brw-r-   1 root sys   30, 2500 Jul  6 09:50 s...@7,0:e
crw-r-   1 root sys   30, 2500 Jul  6 09:48 s...@7,0:e,raw


So it seems s...@5,0 is empty; it is peculiar that all the other HDDs on c1tX 
work, though.



Eventually I notice that cfgadm goes to:

c1::dsk/c1t4d0 disk connectedconfigured 
unknown

c1::dsk/c1t5d0 disk connectedconfigured   failed
c1::dsk/c1t6d0 disk connectedconfigured 
unknown



We promoted the Spare in use to replace c1t5d0, so now the pool looks like:

  pool: zpool1
 state: ONLINE
 scrub: none requested
config:

NAME        STATE     READ WRITE CKSUM
zpool1  ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c0t3d0  ONLINE   0 0 0
c1t3d0  ONLINE   0 0 0
c2t3d0  ONLINE   0 0 0
c3t3d0  ONLINE   0 0 0
c4t3d0  ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c5t3d0  ONLINE   0 0 0
c0t7d0  ONLINE   0 0 0
c1t7d0  ONLINE   0 0 0
c2t7d0  ONLINE   0 0 0
c3t7d0  ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c2t0d0  ONLINE   0 0 0
c3t0d0  ONLINE   0 0 0
c4t0d0  ONLINE   0 0 0
c5t0d0  ONLINE   0 0 0
c0t6d0  ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c1t6d0  ONLINE   0 0 0
c2t6d0  ONLINE   0 0 0
c3t6d0  ONLINE   0 0 0
c4t6d0  ONLINE   0 0 0
c5t6d0  ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c0t1d0  ONLINE   0 0 0
c1t1d0  ONLINE   0 0 0
c2t1d0  ONLINE   0 0 0
c3t1d0  ONLINE   0 0 0
c4t1d0  ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c5t1d0  ONLINE   0 0 0
c0t5d0  ONLINE   0 0 0
c4t7d0  ONLINE   0 0 0   [was c1t5d0]
c2t5d0  ONLINE   0 0 0
c3t5d0  ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c4t5d0  ONLINE   0 0 0
c5t5d0  ONLINE   0 0 0
c0t2d0  ONLINE   0 0 0
c1t2d0  ONLINE   0 0 0
c2t2d0  ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c3t2d0  ONLINE   0 0 0
c4t2d0  ONLINE   0 0 0
c5t2d0  ONLINE   0 0 0
c0t4d0  ONLINE   0 0 0
c1t4d0  ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c2t4d0  ONLINE   0 0 0
c3t4d0  ONLINE   0 0 0
c4t4d0  ONLINE   0 0 0
c5t4d0  ONLINE   0 0 0
c5t7d0  ONLINE   0 0 0
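
(For reference, the promotion step itself was simply detaching the dead member of 
the spare pairing -- from memory, roughly:)

# zpool detach zpool1 c1t5d0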





--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing

Re: [zfs-discuss] Ssd for zil on a dell 2950

2009-08-19 Thread Jorgen Lundman


Does un-taring something count? It is what I used for our tests.

I tested with the ZIL disabled, a slog on /tmp/zil, a CF card (300x) and a 
cheap SSD. I am waiting for X25-E SSDs to arrive to test those:


http://mail.opensolaris.org/pipermail/zfs-discuss/2009-July/030183.html

If you want a quick answer, disable the ZIL (you need to unmount/mount, 
export/import, or reboot) on your ZFS volume and try it. That is the 
theoretical maximum. You can get close to it using various 
technologies, SSDs and all that.
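
(For reference, disabling the ZIL on these builds is a global tunable rather than 
a per-dataset switch. A sketch, and strictly for benchmarking, since it drops the 
synchronous-write guarantee:

# echo "set zfs:zil_disable = 1" >> /etc/system    (then reboot)
# echo "zil_disable/W0t1" | mdb -kw                (live; then remount the fs)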


I am no expert on this, I knew nothing about it 2 weeks ago.

But for our provisioning engine untarring Movable Type for customers, going 
from 5 minutes to 45 seconds is quite an improvement. Theoretically I can get 
that to 11 seconds (ZIL disabled).


Lund


Monish Shah wrote:

Hello Greg,

I'm curious how much performance benefit you gain from the ZIL 
accelerator. Have you measured that?  If not, do you have a gut feel 
about how much it helped?  Also, for what kind of applications does it 
help?


(I know it helps with synchronous writes.  I'm looking for real world 
answers like: Our XYZ application was running like a dog and we added 
an SSD for ZIL and the response time improved by X%.)


Of course, I would welcome a reply from anyone who has experience with 
this, not just Greg.


Monish

- Original Message - From: Greg Mason gma...@msu.edu
To: HUGE | David Stahl dst...@hugeinc.com
Cc: zfs-discuss zfs-discuss@opensolaris.org
Sent: Thursday, August 20, 2009 4:04 AM
Subject: Re: [zfs-discuss] Ssd for zil on a dell 2950


Hi David,

We are using them in our Sun X4540 filers. We are actually using 2 SSDs
per pool, to improve throughput (since the logbias feature isn't in an
official release of OpenSolaris yet). I kind of wish they made an 8G or
16G part, since the 32G capacity is kind of a waste.

We had to go the NewEgg route though. We tried to buy some Sun-branded
disks from Sun, but that's a different story. To summarize, we had to
buy the NewEgg parts to ensure a project stayed on-schedule.

Generally, we've been pretty pleased with them. Occasionally, we've had
an SSD that wasn't behaving well. Looks like you can replace log devices
now though... :) We use the 2.5 to 3.5 SATA adapter from IcyDock, in a
Sun X4540 drive sled. If you can attach a standard sata disk to a Dell
sled, this approach would most likely work for you as well. Only issue
with using the third-party parts is that the involved support
organizations for the software/hardware will make it very clear that
such a configuration is quite unsupported. That said, we've had pretty
good luck with them.

-Greg



--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] libzfs API: sharenfs, sharesmb, shareiscsi, $custom ?

2009-08-17 Thread Jorgen Lundman


It is llink, which was initially just an HTTP streamer for 
Syabas/NetworkMediaTank, but I did just add UPnP, based on whtsup360 dumps.


http://lundman.net/wiki/index.php/Llink

You are welcome to try the latest development sources and test, but I 
believe the current state is that it shows up on 360, but does not browse.


However, llink just shares content, no transcoding. At least not yet.


Regarding the original question: 
it seems that it is easy to add my own attributes. I will peruse the API 
documentation to see how I would iterate over file systems looking for my 
attribute, since I would rather not hack it with system(zfs).



Lund



Ross wrote:

Hi Jorgen,

Does that software work to stream media to an xbox 360?  If so could I have a 
play with it?  It sounds ideal for my home server.

cheers,

Ross


--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] libzfs API: sharenfs, sharesmb, shareiscsi, $custom ?

2009-08-17 Thread Jorgen Lundman


I cheated and simply use system() for the time being, but I will say 
that was rather nice and easy. Thank you Sun and OpenSolaris people.



llink.conf:
# If you use ZFS, you can auto-export any filesystem with a certain 
attribute set.

# For example: zfs set net.lundman:sharellink=on zpool1/media
ROOT|ZFS=net.lundman:sharellink|PATH=/usr/sbin/zfs


@root.c
debugf("  : looking for ZFS filesystems\n");
snprintf(buffer, sizeof(buffer), "%s list -H -o mountpoint,%s",
 path, zfs);
spawn = lion_system(buffer, 0, LION_FLAG_FULFILL, zfs);
if (spawn) lion_set_handler(spawn, root_zfs_handler);



# zfs set net.lundman:sharellink=on zpool1/media

# ./llink -d -v 32

./llink - Jorgen Lundman v2.2.1 lund...@shinken.interq.or.jp build 1451 
(Tue Aug 18 14:02:44 2009) (libdvdnav).


  : looking for ZFS filesystems
  : [root] recognising 'xtrailers'
  : zfs command running
  : zfs adding '/zpool1/media'
  : [root] recognising '/zpool1/media'
  : zfs command finished.
[main] ready!
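
(The command that llink spawns under the hood amounts to the following. The output 
is illustrative; the second dataset is made up to show the unset case:

# zfs list -H -o mountpoint,net.lundman:sharellink
/zpool1/media   on
/zpool1/other   -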


--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] libzfs API: sharenfs, sharesmb, shareiscsi, $custom ?

2009-08-16 Thread Jorgen Lundman


Hello list,

As the developer of software that exports data/shares much like NFS 
and Samba do (HTTP/UPnP export, written in C), I am curious whether the libzfs 
API is flexible enough for me to create my own file-system attributes, 
similar to sharenfs, and obtain this information in my software.


Perhaps something along the lines of:

 zfs -o shareupnp=on zpool1/media

And I will modify my streamer software to do the necessary calls to 
obtain the file-systems set to export.


Or are there other suggestions to achieve similar results? I could 
mirror sharenfs but I was under the impression that the API is 
flexible. The ultimate goal is to move away from static paths listed in 
the config file.


Lund

--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] x4540 dead HDD replacement, remains configured.

2009-08-06 Thread Jorgen Lundman


x4540 snv_117

We lost an HDD last night, and it seemed to take out most of the bus or 
something, forcing us to reboot. (We have yet to experience losing a 
disk that didn't force a reboot, mind you.)


So today I'm looking at replacing the broken HDD, but no amount of work 
makes it turn on the blue LED. After trying that for an hour, we just 
replaced the HDD anyway. But no amount of work will make the system 
use/recognise it. (We tried more than one known-good spare HDD too.)


For example:

# zpool status

  raidz1  DEGRADED 0 0 0
c5t1d0ONLINE   0 0 0
c0t5d0ONLINE   0 0 0
spare DEGRADED 0 0  285K
  c1t5d0  UNAVAIL  0 0 0  cannot open
  c4t7d0  ONLINE   0 0 0  4.13G resilvered
c2t5d0ONLINE   0 0 0
c3t5d0ONLINE   0 0 0
spares
  c4t7d0  INUSE currently in use



# zpool offline zpool1 c1t5d0

  raidz1  DEGRADED 0 0 0
c5t1d0ONLINE   0 0 0
c0t5d0ONLINE   0 0 0
spare DEGRADED 0 0  285K
  c1t5d0  OFFLINE  0 0 0
  c4t7d0  ONLINE   0 0 0  4.13G resilvered
c2t5d0ONLINE   0 0 0
c3t5d0ONLINE   0 0 0


# cfgadm -al
Ap_Id  Type Receptacle   Occupant 
Condition
c1 scsi-bus connectedconfigured 
unknown

c1::dsk/c1t5d0 disk connectedconfigured   failed

# cfgadm -c unconfigure c1::dsk/c1t5d0
# cfgadm -al
c1::dsk/c1t5d0 disk connectedconfigured   failed
# cfgadm -c unconfigure c1::dsk/c1t5d0
# cfgadm -c unconfigure c1::dsk/c1t5d0
# cfgadm -fc unconfigure c1::dsk/c1t5d0
# cfgadm -fc unconfigure c1::dsk/c1t5d0
# cfgadm -al
c1::dsk/c1t5d0 disk connectedconfigured   failed

# hdadm offline slot 13
 1:5:9:   13:   17:   21:   25:   29:   33:   37:   41:   45:
c0t1  c0t5  c1t1  c1t5  c2t1  c2t5  c3t1  c3t5  c4t1  c4t5  c5t1  c5t5
^b+   ^++   ^b+   ^--   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++

# cfgadm -al
c1::dsk/c1t5d0 disk connectedconfigured   failed

 # fmadm faulty
FRU : HD_ID_47 
(hc://:product-id=Sun-Fire-X4540:chassis-id=0915AMR048:server-id=x4500-10.unix:serial=9QMB024K:part=SEAGATE-ST35002NSSUN500G-09107B024K:revision=SU0D/chassis=0/bay=47/disk=0)

  faulty

 # fmadm repair HD_ID_47
fmadm: recorded repair to HD_ID_47

 # format | grep c1t5d0
 #

 # hdadm offline slot 13
 1:5:9:   13:   17:   21:   25:   29:   33:   37:   41:   45:
c0t1  c0t5  c1t1  c1t5  c2t1  c2t5  c3t1  c3t5  c4t1  c4t5  c5t1  c5t5
^b+   ^++   ^b+   ^--   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++

 # cfgadm -al
c1::dsk/c1t5d0 disk connectedconfigured   failed

 # ipmitool sunoem led get|grep 13
 hdd13.fail.led   | ON
 hdd13.ok2rm.led  | OFF

# zpool online zpool1 c1t5d0
warning: device 'c1t5d0' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present

# cfgadm -c disconnect c1::dsk/c1t5d0
cfgadm: Hardware specific failure: operation not supported for SCSI device


Bah, why were they changed to SCSI? Increasing the size of the hammer...


# cfgadm -x replace_device c1::sd37
Replacing SCSI device: /devices/p...@0,0/pci10de,3...@b/pci1000,1...@0/s...@5,0
This operation will suspend activity on SCSI bus: c1
Continue (yes/no)? y
SCSI bus quiesced successfully.
It is now safe to proceed with hotplug operation.
Enter y if operation is complete or n to abort (yes/no)? y

# cfgadm -al
c1::dsk/c1t5d0 disk connectedconfigured   failed


I am fairly certain that if I reboot, it will all come back OK again. 
But I would like to believe that I should be able to replace a disk 
without rebooting on an X4540.


Any other commands I should try?

Lund

--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Lundman home NAS

2009-08-06 Thread Jorgen Lundman


The case is made by Chyangfun, and the model made for Mini-ITX 
motherboards is called CGN-S40X. They had 6 pcs left the last time I talked to 
them, and need a 3-week lead time for more, if I understand correctly. I need 
to finish my LCD panel work before I open shop to sell these.


As for temperature, I have only checked the server HDDs so far (on my 
wiki) but will test with green HDDs tonight.


I do not know if Solaris can retrieve the Atom chipset temperature readings.

The parts I used should be listed on my wiki.



Anon wrote:

I have the same case which I use as directed attached storage.  I never thought 
about using it with a motherboard inside.

Could you provide a complete parts list?

What sort of temperatures at the chip, chipset, and drives did you find?

Thanks!


--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4540 dead HDD replacement, remains configured.

2009-08-06 Thread Jorgen Lundman


I suspect this is what it is all about:

 # devfsadm -v
devfsadm[16283]: verbose: no devfs node or mismatched dev_t for 
/devices/p...@0,0/pci10de,3...@b/pci1000,1...@0/s...@5,0:a

[snip]

and indeed:

brw-r-   1 root sys   30, 2311 Aug  6 15:34 s...@4,0:wd
crw-r-   1 root sys   30, 2311 Aug  6 15:24 s...@4,0:wd,raw
drwxr-xr-x   2 root sys2 Aug  6 14:31 s...@5,0
drwxr-xr-x   2 root sys2 Apr 17 17:52 s...@6,0
brw-r-   1 root sys   30, 2432 Jul  6 09:50 s...@6,0:a
crw-r-   1 root sys   30, 2432 Jul  6 09:48 s...@6,0:a,raw

Perhaps because it was booted with the dead disk in place, it never 
configured the entire sd5 mpt driver. Why the other hard disks work, I 
don't know.


I suspect the only way to fix this is to reboot again.

Lund


Jorgen Lundman wrote:


x4540 snv_117

We lost a HDD last night, and it seemed to take out most of the bus or 
something and forced us to reboot. (We have yet to experience losing a 
disk that didn't force a reboot mind you).


So today, I'm looking at replacing the broken HDD, but no amount of work 
makes it turn on the blue LED. After trying that for an hour, we just 
replaced the HDD anyway. But no amount of work will make it 
use/recognise it. (We tried more than one working spare HDD too).


For example:

# zpool status

  raidz1  DEGRADED 0 0 0
c5t1d0ONLINE   0 0 0
c0t5d0ONLINE   0 0 0
spare DEGRADED 0 0  285K
  c1t5d0  UNAVAIL  0 0 0  cannot open
  c4t7d0  ONLINE   0 0 0  4.13G resilvered
c2t5d0ONLINE   0 0 0
c3t5d0ONLINE   0 0 0
spares
  c4t7d0  INUSE currently in use



# zpool offline zpool1 c1t5d0

  raidz1  DEGRADED 0 0 0
c5t1d0ONLINE   0 0 0
c0t5d0ONLINE   0 0 0
spare DEGRADED 0 0  285K
  c1t5d0  OFFLINE  0 0 0
  c4t7d0  ONLINE   0 0 0  4.13G resilvered
c2t5d0ONLINE   0 0 0
c3t5d0ONLINE   0 0 0


# cfgadm -al
Ap_Id  Type Receptacle   Occupant Condition
c1 scsi-bus connectedconfigured unknown
c1::dsk/c1t5d0 disk connectedconfigured   
failed


# cfgadm -c unconfigure c1::dsk/c1t5d0
# cfgadm -al
c1::dsk/c1t5d0 disk connectedconfigured   
failed

# cfgadm -c unconfigure c1::dsk/c1t5d0
# cfgadm -c unconfigure c1::dsk/c1t5d0
# cfgadm -fc unconfigure c1::dsk/c1t5d0
# cfgadm -fc unconfigure c1::dsk/c1t5d0
# cfgadm -al
c1::dsk/c1t5d0 disk connectedconfigured   
failed


# hdadm offline slot 13
 1:5:9:   13:   17:   21:   25:   29:   33:   37:   41:   45:
c0t1  c0t5  c1t1  c1t5  c2t1  c2t5  c3t1  c3t5  c4t1  c4t5  c5t1  c5t5
^b+   ^++   ^b+   ^--   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++

# cfgadm -al
c1::dsk/c1t5d0 disk connectedconfigured   
failed


 # fmadm faulty
FRU : HD_ID_47 
(hc://:product-id=Sun-Fire-X4540:chassis-id=0915AMR048:server-id=x4500-10.unix:serial=9QMB024K:part=SEAGATE-ST35002NSSUN500G-09107B024K:revision=SU0D/chassis=0/bay=47/disk=0) 


  faulty

 # fmadm repair HD_ID_47
fmadm: recorded repair to HD_ID_47

 # format | grep c1t5d0
 #

 # hdadm offline slot 13
 1:5:9:   13:   17:   21:   25:   29:   33:   37:   41:   45:
c0t1  c0t5  c1t1  c1t5  c2t1  c2t5  c3t1  c3t5  c4t1  c4t5  c5t1  c5t5
^b+   ^++   ^b+   ^--   ^++   ^++   ^++   ^++   ^++   ^++   ^++   ^++

 # cfgadm -al
c1::dsk/c1t5d0 disk connectedconfigured   
failed


 # ipmitool sunoem led get|grep 13
 hdd13.fail.led   | ON
 hdd13.ok2rm.led  | OFF

# zpool online zpool1 c1t5d0
warning: device 'c1t5d0' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present

# cfgadm -c disconnect c1::dsk/c1t5d0
cfgadm: Hardware specific failure: operation not supported for SCSI device


Bah, why were they changed to SCSI? Increasing the size of the hammer...


# cfgadm -x replace_device c1::sd37
Replacing SCSI device: /devices/p...@0,0/pci10de,3...@b/pci1000,1...@0/s...@5,0
This operation will suspend activity on SCSI bus: c1
Continue (yes/no)? y
SCSI bus quiesced successfully.
It is now safe to proceed with hotplug operation.
Enter y if operation is complete or n to abort (yes/no)? y

# cfgadm -al
c1::dsk/c1t5d0 disk connectedconfigured   
failed



I am fairly certain that if I reboot, it will all come back ok again. 
But I would like to believe that I should be able to replace a disk 
without rebooting on a X4540.


Any other commands I should try?

Lund



--
Jorgen

Re: [zfs-discuss] x4540 dead HDD replacement, remains configured.

2009-08-06 Thread Jorgen Lundman


Well, to be fair, there were some special cases.

I know we had 3 separate occasions with broken HDDs when we were using 
UFS. Two of these appeared to hang, and the third only hung once we replaced 
the disk. This is most likely due to us using UFS on a zvol (for quotas). 
We got an IDR patch, and eventually this was released as the "UFS 3-way 
deadlock writing log with zvol" fix. I forget the number right now, but the 
patch is out.


This is the very first time we have lost a disk in a purely-ZFS system, 
and I was somewhat hoping that this would be the time everything went 
smoothly. But it did not.


However, I have also experienced (once) a disk dying in such a way that 
it took out the chain in a NetApp, so perhaps the disk died like that 
here too (it really is dead).


But still disappointing.

Power-cycling the X4540 takes about 7 minutes (service to service), but 
with Solaris snv_116(?) and up it can do quiesce reboots, which take about 57 
seconds. In this case, we had to power-cycle.




Ross wrote:

Whoah!

We have yet to experience losing a
disk that didn't force a reboot

Do you have any notes on how many times this has happened Jorgen, or what steps 
you've taken each time?

I appreciate you're probably more concerned with getting an answer to your 
question, but if ZFS needs a reboot to cope with failures on even an x4540, 
that's an absolute deal breaker for everything we want to do with ZFS.

Ross


--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Lundman home NAS

2009-08-02 Thread Jorgen Lundman


100Mbit is quite flat at 11MB/s;

http://lundman.net/wiki/index.php/Lraid5_iozone#Solaris_10_64-bit.2C_OsX_10.5.5_NFSv3.2C_100MBit.2C_ZIL_cache_disabled

1Gbit, MTU 1500;

http://lundman.net/wiki/index.php/Lraid5_iozone#Solaris_10_64-bit.2C_OsX_10.5.5_NFSv3.2C_1GBit.2C_ZIL_cache_disabled

Not sure how to enable jumbo frames on rge0. When I use
dladm set-linkprop -p mtu=9000 rge0

I get "operation not supported"; PERM is r--.

Most likely I have to set it in rge.conf and reboot, but I would need 
to rebuild my USB image for that. (unplumb, modunload, modload, plumb 
did not seem to enable it either.)




Jorgen Lundman wrote:
Ok I have redone the initial tests as 4G instead. Graphs are on the same 
place.


http://lundman.net/wiki/index.php/Lraid5_iozone

I also mounted it with nfsv3 and mounted it for more iozone. Alas, I 
started with 100mbit, so it has taken quite a while. It is constantly at 
11MB/s though. ;)




Jorgen Lundman wrote:
I was following Toms Hardware on how they test NAS units. I have 2GB 
memory, so I will re-run the test at 4, if I figure out which option 
that is.


I used Excel for the graphs in this case, gnuplot did not want to 
work. (Nor did Excel mind you)



Bob Friesenhahn wrote:

On Sat, 1 Aug 2009, Louis-Frédéric Feuillette wrote:

I find the results suspect.  1.2GB/s read, and 500MB/s write ! These 
are

impressive numbers indeed.  I then looked at the file sizes that iozone
used...  How much memory do you have?  I seems like the files would be
able to comfortably fit in memory.  I think this test needs to be 
re-run

with Large files (ie 2*Memory size ) for them to give more accurate
data.


The numbers are indeed suspect but the iozone sweep test is quite 
useful in order to see the influence of zfs's caching via the ARC. 
The sweep should definitely be run to at least 2X the memory size.



Unrelated, what did you use to generate those graphs, they look good.


Iozone output may be plotted via gnuplot or Microsoft Excel.  This 
looks like the gnuplot output.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, 
http://www.simplesystems.org/users/bfriesen/

GraphicsMagick Maintainer,http://www.GraphicsMagick.org/




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss






--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Lundman home NAS

2009-08-01 Thread Jorgen Lundman

Some preliminary speed tests, not too bad for a pci32 card.

http://lundman.net/wiki/index.php/Lraid5_iozone


Jorgen Lundman wrote:


Finding a SATA card that would work with Solaris, and be hot-swap, and 
more than 4 ports, sure took a while. Oh and be reasonably priced ;) 
Double the price of the dual core Atom did not seem right.


The SATA card was a close fit to the jumper were the power-switch cable 
attaches, as you can see in one of the photos. This is because the MV8 
card is quite long, and has the big plastic SATA sockets. It does fit, 
but it was the tightest spot.


I also picked the 5-in-3 drive cage that had the shortest depth 
listed, 190mm. For example the Supermicro M35T is 245mm, another 5cm. 
Not sure that would fit.


Lund


Nathan Fiedler wrote:

Yes, please write more about this. The photos are terrific and I
appreciate the many useful observations you've made. For my home NAS I
chose the Chenbro ES34069 and the biggest problem was finding a
SATA/PCI card that would work with OpenSolaris and fit in the case
(technically impossible without a ribbon cable PCI adapter). After
seeing this, I may reconsider my choice.

For the SATA card, you mentioned that it was a close fit with the case
power switch. Would removing the backplane on the card have helped?

Thanks

n


On Fri, Jul 31, 2009 at 5:22 AM, Jorgen Lundmanlund...@gmo.jp wrote:

I have assembled my home RAID finally, and I think it looks rather good.

http://www.lundman.net/gallery/v/lraid5/p1150547.jpg.html

Feedback is welcome.

I have yet to do proper speed tests, I will do so in the coming week 
should

people be interested.

Even though I have tried to use only existing, and cheap, parts the 
end sum
became higher than I expected. Final price is somewhere in the 47,000 
yen

range. (Without hard disks)

If I were to make and sell these, they would be 57,000 or so, so I do 
not
really know if anyone would be interested. Especially since SOHO NAS 
devices

seem to start around 80,000.

Anyway, sure has been fun.

Lund

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss





--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Lundman home NAS

2009-08-01 Thread Jorgen Lundman
OK, I have redone the initial tests at 4GB instead. Graphs are in the same 
place.


http://lundman.net/wiki/index.php/Lraid5_iozone

I also mounted it over NFSv3 and ran more iozone on it. Alas, I 
started with 100Mbit, so it has taken quite a while. It is constantly at 
11MB/s though. ;)




Jorgen Lundman wrote:
I was following Toms Hardware on how they test NAS units. I have 2GB 
memory, so I will re-run the test at 4, if I figure out which option 
that is.


I used Excel for the graphs in this case, gnuplot did not want to work. 
(Nor did Excel mind you)



Bob Friesenhahn wrote:

On Sat, 1 Aug 2009, Louis-Frédéric Feuillette wrote:


I find the results suspect.  1.2GB/s read, and 500MB/s write ! These are
impressive numbers indeed.  I then looked at the file sizes that iozone
used...  How much memory do you have?  I seems like the files would be
able to comfortably fit in memory.  I think this test needs to be re-run
with Large files (ie 2*Memory size ) for them to give more accurate
data.


The numbers are indeed suspect but the iozone sweep test is quite 
useful in order to see the influence of zfs's caching via the ARC. The 
sweep should definitely be run to at least 2X the memory size.



Unrelated, what did you use to generate those graphs, they look good.


Iozone output may be plotted via gnuplot or Microsoft Excel.  This 
looks like the gnuplot output.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, 
http://www.simplesystems.org/users/bfriesen/

GraphicsMagick Maintainer,http://www.GraphicsMagick.org/




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Lundman home NAS

2009-07-31 Thread Jorgen Lundman


I have assembled my home RAID finally, and I think it looks rather good.

http://www.lundman.net/gallery/v/lraid5/p1150547.jpg.html

Feedback is welcome.

I have yet to do proper speed tests, I will do so in the coming week 
should people be interested.


Even though I have tried to use only existing, cheap parts, the end 
sum became higher than I expected. The final price is somewhere in the 
47,000 yen range (without hard disks).


If I were to make and sell these, they would be 57,000 or so, so I do 
not really know if anyone would be interested. Especially since SOHO NAS 
devices seem to start around 80,000.


Anyway, sure has been fun.

Lund

--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Lundman home NAS

2009-07-31 Thread Jorgen Lundman


Finding a SATA card that would work with Solaris, be hot-swap capable, and 
have more than 4 ports sure took a while. Oh, and be reasonably priced ;) 
Double the price of the dual-core Atom did not seem right.


The SATA card was a close fit to the jumper where the power-switch cable 
attaches, as you can see in one of the photos. This is because the MV8 
card is quite long and has the big plastic SATA sockets. It does fit, 
but it was the tightest spot.


I also picked the 5-in-3 drive cage that had the shortest depth 
listed, 190mm. For example the Supermicro M35T is 245mm, another 5cm. 
Not sure that would fit.


Lund


Nathan Fiedler wrote:

Yes, please write more about this. The photos are terrific and I
appreciate the many useful observations you've made. For my home NAS I
chose the Chenbro ES34069 and the biggest problem was finding a
SATA/PCI card that would work with OpenSolaris and fit in the case
(technically impossible without a ribbon cable PCI adapter). After
seeing this, I may reconsider my choice.

For the SATA card, you mentioned that it was a close fit with the case
power switch. Would removing the backplane on the card have helped?

Thanks

n


On Fri, Jul 31, 2009 at 5:22 AM, Jorgen Lundmanlund...@gmo.jp wrote:

I have assembled my home RAID finally, and I think it looks rather good.

http://www.lundman.net/gallery/v/lraid5/p1150547.jpg.html

Feedback is welcome.

I have yet to do proper speed tests, I will do so in the coming week should
people be interested.

Even though I have tried to use only existing, and cheap, parts the end sum
became higher than I expected. Final price is somewhere in the 47,000 yen
range. (Without hard disks)

If I were to make and sell these, they would be 57,000 or so, so I do not
really know if anyone would be interested. Especially since SOHO NAS devices
seem to start around 80,000.

Anyway, sure has been fun.

Lund

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08

2009-07-30 Thread Jorgen Lundman



Bob Friesenhahn wrote:
Something to be aware of is that not all SSDs are the same.  In fact, 
some faster SSDs may use a RAM write cache (they all do) and then 
ignore a cache sync request while not including hardware/firmware 
support to ensure that the data is persisted if there is power loss. 
Perhaps your fast CF device does that.  If so, that would be really 
bad for zfs if your server was to spontaneously reboot or lose power. 
This is why you really want a true enterprise-capable SSD device for 
your slog.


Naturally, we just wanted to try the various technologies to see how 
they compared. Store-bought CF card took 26s, store-bought SSD 48s. We 
have not found a PCI NVRam card yet.


When talking to our Sun vendor, they have no solutions, which is annoying.

X25-E would be good, but some pools have no spares, and since you can't 
remove vdevs, we'd have to move all customers off the x4500 before we 
can use it.


CF cards need a reboot to be seen, but 6 servers are x4500, not 
x4540, so not really a global solution.


PCI NVRam cards need a reboot, but should work in both x4500 and x4540 
without zpool rebuilding. But can't actually find any with Solaris drivers.


Peculiar.

Lund


--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08

2009-07-30 Thread Jorgen Lundman


X25-E would be good, but some pools have no spares, and since you can't 
remove vdevs, we'd have to move all customers off the x4500 before we 
can use it.


Ah, it just occurred to me that perhaps for our specific problem, we will 
buy two X25-Es and replace the root mirror. The OS and ZIL logs can live 
together, and /var can go in the data pool. That way we would not need to 
rebuild the data pool and do all the work that comes with that.
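Roughly what I have in mind, as a sketch only (device names are made up, and the 
slice sizes would need some thought):

# zpool create ssdroot mirror c2t0d0s0 c2t1d0s0   (new, smaller root pool on the two X25-Es)
# lucreate -n ssd-be -p ssdroot   (copy the current BE over, then luactivate and init 6)
# zpool add zpool1 log mirror c2t0d0s1 c2t1d0s1   (small slice on each SSD as a mirrored slog)
# zfs create -o mountpoint=/var zpool1/var   (done from the new BE; /var moves to the data pool)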


Shame I can't zpool replace to a smaller disk (500GB HDD to 32GB SSD) 
though, I will have to lucreate and reboot one time.


Lund

--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08

2009-07-29 Thread Jorgen Lundman


We just picked up the fastest SSD we could in the local biccamera, which 
turned out to be a CSSD-SM32NI, with supposedly 95MB/s write speed.


I put it in place, and replaced the slog over:

  0m49.173s
  0m48.809s

So, it is slower than the CF test. This is disappointing. Everyone else 
seems to use the Intel X25-M, which has a write speed of 170MB/s (2nd 
generation), so perhaps that is why it works better for them. It is 
curious that it is slower than the CF card. Perhaps because it shares 
the controller with so many other SATA devices?
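The swap itself was just a replace on the log vdev, along these lines (device names 
here are invented, ours differ):

# zpool replace zpool1 c4t0d0 c5t0d0   (old slog device, new SSD)
# zpool status zpool1   (wait for the log to show ONLINE again)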


Oh, and we'll probably have to get a 3.5" mounting frame for it, as I doubt it'll 
stay standing after the next earthquake. :)


Lund


Jorgen Lundman wrote:


This thread started over in nfs-discuss, as it appeared to be an nfs 
problem initially. Or at the very least, interaction between nfs and zil.


Just summarising speeds we have found when untarring something. Always 
in a new/empty directory. Only looking at write speed. read is always 
very fast.


The reason we started to look at this was because the 7 year old netapp 
being phased out, could untar the test file in 11 seconds. The 
x4500/x4540 Suns took 5 minutes.


For all our tests, we used MTOS-4.261-ja.tar.gz, just a random tarball I 
had lying around, but it can be downloaded here if you want the same 
test. (http://www.movabletype.org/downloads/stable/MTOS-4.261-ja.tar.gz)


The command executed generally, is:

# mkdir .test34 && time gtar --directory=.test34 -zxf 
/tmp/MTOS-4.261-ja.tar.gz




Solaris 10 1/06 intel client: netapp 6.5.1 FAS960 server: NFSv3
  0m11.114s

Solaris 10 6/06 intel client: x4500 OpenSolaris svn117 server: nfsv4
  5m11.654s

Solaris 10 6/06 intel client: x4500 Solaris 10 10/08 server: nfsv3
  8m55.911s

Solaris 10 6/06 intel client: x4500 Solaris 10 10/08 server: nfsv4
  10m32.629s


Just untarring the tarball on the x4500 itself:

: x4500 OpenSolaris svn117 server
  0m0.478s

: x4500 Solaris 10 10/08 server
  0m1.361s



So ZFS itself is very fast. Replacing NFS with different protocols, 
identical setup, just changing tar with rsync, and nfsd with sshd.


The baseline test, using:
rsync -are ssh /tmp/MTOS-4.261-ja /export/x4500/testXX


Solaris 10 6/06 intel client: x4500 OpenSolaris svn117 : rsync on nfsv4
  3m44.857s

Solaris 10 6/06 intel client: x4500 OpenSolaris svn117 : rsync+ssh
  0m1.387s

So, get rid of nfsd and it goes from 3 minutes to 1 second!

Lets share it with smb, and mount it:


OsX 10.5.6 intel client: x4500 OpenSolaris svn117 : smb+untar
  0m24.480s


Neat, even SMB can beat nfs in default settings.

This would then indicate to me that nfsd is broken somehow, but then we 
try again after only disabling ZIL.



Solaris 10 6/06 : x4500 OpenSolaris svn117 DISABLE ZIL: nfsv4
  0m8.453s
  0m8.284s
  0m8.264s

Nice, so this is theoretically the fastest NFS speeds we can reach? We 
run postfix+dovecot for mail, which probably would be safe to not use 
ZIL. The other type is FTP/WWW/CGI, which has more active 
writes/updates. Probably not as good. Comments?



Enable ZIL, but disable zfscache (Just as a test, I have been told 
disabling zfscache is far more dangerous).



Solaris 10 6/06 : x4500 OpenSolaris svn117 DISABLE zfscacheflush: nfsv4
  0m45.139s

Interesting. Anyway, enable ZIL and zfscacheflush again, and learn a 
whole lot about slog.


First I tried creating a 2G slog on the boot mirror:


Solaris 10 6/06 : x4500 OpenSolaris svn117 slog boot pool: nfsv4

  1m59.970s


Some improvements. For a lark, I created a 2GB file in /tmp/ and changed 
the slog to that. (I know, having the slog in volatile RAM is pretty 
much the same as disabling ZIL. But it should give me theoretical 
maximum speed with ZIL enabled right?).



Solaris 10 6/06 : x4500 OpenSolaris svn117 slog /tmp/junk: nfsv4
  0m8.916s


Nice! Same speed as ZIL disabled. Since this is a X4540, we thought we 
would test with a CF card attached. Alas the 600X (92MB/s) card are not 
out until next month, rats! So, we bought a 300X (40MB/s) card.



Solaris 10 6/06 : x4500 OpenSolaris svn117 slog 300X CFFlash: nfsv4
  0m26.566s


Not too bad really. But you have to reboot to see a CF card, fiddle with 
BIOS for the boot order etc. Just not an easy add on a live system. A 
SATA emulated SSD DISK can be hot-swapped.



Also, I learned an interesting lesson about rebooting with slog at 
/tmp/junk.



I am hoping to pick up a SSD SATA device today and see what speeds we 
get out of that.


That rsync (1s) vs nfs(8s) I can accept as over-head on a much more 
complicated protocol, but why would it take 3 minutes to write the same 
data on the same pool, with rsync(1s) vs nfs(3m)? The ZIL was on, slog 
is default, but both writing the same way. Does nfsd add FD_SYNC to 
every close regardless as to whether the application did or not?

This I have not yet wrapped my head around.

For example, I know rsync

Re: [zfs-discuss] [n/zfs-discuss] Strange speeds with x4500, Solaris 10 10/08

2009-07-28 Thread Jorgen Lundman


This thread started over in nfs-discuss, as it appeared to be an nfs 
problem initially. Or at the very least, interaction between nfs and zil.


Just summarising speeds we have found when untarring something. Always 
in a new/empty directory. Only looking at write speed. read is always 
very fast.


The reason we started to look at this was because the 7 year old netapp 
being phased out, could untar the test file in 11 seconds. The 
x4500/x4540 Suns took 5 minutes.


For all our tests, we used MTOS-4.261-ja.tar.gz, just a random tarball I 
had lying around, but it can be downloaded here if you want the same 
test. (http://www.movabletype.org/downloads/stable/MTOS-4.261-ja.tar.gz)


The command executed generally, is:

# mkdir .test34 && time gtar --directory=.test34 -zxf 
/tmp/MTOS-4.261-ja.tar.gz




Solaris 10 1/06 intel client: netapp 6.5.1 FAS960 server: NFSv3
  0m11.114s

Solaris 10 6/06 intel client: x4500 OpenSolaris svn117 server: nfsv4
  5m11.654s

Solaris 10 6/06 intel client: x4500 Solaris 10 10/08 server: nfsv3
  8m55.911s

Solaris 10 6/06 intel client: x4500 Solaris 10 10/08 server: nfsv4
  10m32.629s


Just untarring the tarball on the x4500 itself:

: x4500 OpenSolaris svn117 server
  0m0.478s

: x4500 Solaris 10 10/08 server
  0m1.361s



So ZFS itself is very fast. Replacing NFS with different protocols, 
identical setup, just changing tar with rsync, and nfsd with sshd.


The baseline test, using:
rsync -are ssh /tmp/MTOS-4.261-ja /export/x4500/testXX


Solaris 10 6/06 intel client: x4500 OpenSolaris svn117 : rsync on nfsv4
  3m44.857s

Solaris 10 6/06 intel client: x4500 OpenSolaris svn117 : rsync+ssh
  0m1.387s

So, get rid of nfsd and it goes from 3 minutes to 1 second!

Lets share it with smb, and mount it:


OsX 10.5.6 intel client: x4500 OpenSolaris svn117 : smb+untar
  0m24.480s


Neat, even SMB can beat nfs in default settings.
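For completeness, the smb share on the svn117 box is nothing fancy, just the 
in-kernel CIFS server, roughly (share and dataset names invented, and I am 
skipping the usual smbadm/password setup):

# svcadm enable -r smb/server
# zfs set sharesmb=name=test zpool1/test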

This would then indicate to me that nfsd is broken somehow, but then we 
try again after only disabling ZIL.



Solaris 10 6/06 : x4500 OpenSolaris svn117 DISABLE ZIL: nfsv4
  0m8.453s
  0m8.284s
  0m8.264s

Nice, so this is theoretically the fastest NFS speeds we can reach? We 
run postfix+dovecot for mail, which probably would be safe to not use 
ZIL. The other type is FTP/WWW/CGI, which has more active 
writes/updates. Probably not as good. Comments?
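For reference, "disabling ZIL" here means the old global tunable, which hits every 
pool on the box, so it was only ever a test:

# echo "set zfs:zil_disable = 1" >> /etc/system   (then reboot)
# echo zil_disable/W0t1 | mdb -kw   (or flip it on a running system)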



Enable ZIL, but disable the ZFS cache flush (just as a test; I have been 
told disabling the cache flush is far more dangerous).
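That one is the zfs_nocacheflush tunable, again machine-wide, and the same warning 
applies even more so:

# echo "set zfs:zfs_nocacheflush = 1" >> /etc/system   (then reboot)
# echo zfs_nocacheflush/W0t1 | mdb -kw   (or live)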



Solaris 10 6/06 : x4500 OpenSolaris svn117 DISABLE zfscacheflush: nfsv4
  0m45.139s

Interesting. Anyway, enable ZIL and zfscacheflush again, and learn a 
whole lot about slog.


First I tried creating a 2G slog on the boot mirror:


Solaris 10 6/06 : x4500 OpenSolaris svn117 slog boot pool: nfsv4

  1m59.970s


Some improvements. For a lark, I created a 2GB file in /tmp/ and changed 
the slog to that. (I know, having the slog in volatile RAM is pretty 
much the same as disabling ZIL. But it should give me theoretical 
maximum speed with ZIL enabled right?).
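The file-backed slog was added like this (never do this outside a throwaway 
benchmark, since /tmp is swap/RAM backed):

# mkfile 2g /tmp/junk
# zpool add zpool1 log /tmp/junk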



Solaris 10 6/06 : x4500 OpenSolaris svn117 slog /tmp/junk: nfsv4
  0m8.916s


Nice! Same speed as ZIL disabled. Since this is a X4540, we thought we 
would test with a CF card attached. Alas the 600X (92MB/s) card are not 
out until next month, rats! So, we bought a 300X (40MB/s) card.



Solaris 10 6/06 : x4500 OpenSolaris svn117 slog 300X CFFlash: nfsv4
  0m26.566s


Not too bad really. But you have to reboot to see a CF card, fiddle with 
BIOS for the boot order etc. Just not an easy add on a live system. A 
SATA emulated SSD DISK can be hot-swapped.



Also, I learned an interesting lesson about rebooting with slog at 
/tmp/junk.



I am hoping to pick up a SSD SATA device today and see what speeds we 
get out of that.


That rsync (1s) vs nfs (8s) difference I can accept as overhead on a much more 
complicated protocol, but why would it take 3 minutes to write the same 
data on the same pool with rsync (1s) vs nfs (3m)? The ZIL was on, slog 
is default, but both are writing the same way. Does nfsd add FD_SYNC to 
every close regardless of whether the application did or not?

This I have not yet wrapped my head around.

For example, I know rsync and tar do not use fdsync (but dovecot does) 
on their close(), but does NFS make it an fdsync anyway?
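One way I could probably check is to watch fsync activity on the server while the 
untar runs, something like the DTrace one-liner below (probe name from memory, so 
treat it as a sketch):

# dtrace -n 'fbt:zfs:zfs_fsync:entry { @[execname] = count(); }'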



Sorry for the giant email.


--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Mirror cloning

2009-07-24 Thread Jorgen Lundman



Darren J Moffat wrote:
Maybe the 2 disk mirror is a special enough case that this could be 
worth allowing without having to deal with all the other cases as well. 
  The only reason I think it is a special enough case is because it is 
the config we use for the root/boot pool.


See 6849185 and 5097228.



Ah of course, you have a valid point, and mirrors can be used in much 
more complicated situations.


Been reading your blog all day, while impatiently waiting for zfs-crypto..

Lund

--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Mirror cloning

2009-07-24 Thread Jorgen Lundman




That is because you had only one other choice: filesystem level copy.
With ZFS I believe you will find that snapshots will allow you to have
better control over this. The send/receive process is very, very similar
to a mirror resilver, so you are only carrying your previous process
forward into a brave new world. You'll find that send/receive is much
more flexible than broken mirrors can be.
 -- richard


Perhaps, but when the crunch is on, it is hard to beat the 3-minute 
cloning. zfs send will not be done in 3 minutes, especially if the 
version used is before the zfs send speed fixes, like the official Sol 10 
10/08.  (I am not sure, but zfs send sounds like it already needs the 
2nd server set up and running with IPs etc.?)
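For what it is worth, zfs send does not strictly need the second server to be up; 
the stream can go into a file and be received later (dataset names invented):

# zfs send zpool1/home@clone | gzip > /backup/home-clone.zfs.gz
# gzcat /backup/home-clone.zfs.gz | zfs recv zpool1/home   (later, on the new box)

But it is still nowhere near as quick as pulling a disk.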


Anyway, we have found a procedure now, so it is all possible. But it 
would have been nicer to be able to detach the disk politely ;)


Lund



--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Mirror cloning

2009-07-23 Thread Jorgen Lundman


Ok, so it seems that with DiskSuite, detaching a mirror does nothing to 
the disk you detached.


However, zpool detach appears to mark the disk as blank, so nothing 
will find any pools (import, import -D etc). zdb -l will show labels, 
but no amount of work that we have found will bring the HDD back online 
in the new server. Grub is blank, and findroot can not see any pool.


zpool will not let you offline the 2nd disk in a mirror. This is 
incorrect behaviour.


You can not cfgadm unconfigure the sata device while zpool has the disk.

We can just yank the disk, but we had issues getting a new-blank disk 
recognised after that. cfgadm would not release the old disk.



However, we found we can do this:

# cfgadm -x sata_port_deactivate sata0/1::dsk/c0t1d0

This will make zpool mark it:

 c0t1d0s0  REMOVED  0 0 0

and eventually:

 c0t1d0s0  FAULTED  0 0 0  too many errors


After that, we pull out the disk, and issue:

# zpool detach zboot c0t1d0s0
# cfgadm -x sata_port_activate sata0/1::dsk/c0t1d0
# cfgadm -c configure sata0/1::dsk/c0t1d0
# format   (fdisk, partition as required to be the same)
# zpool attach zboot c0t0d0s0 c0t1d0s0


There is one final thing to address: when the disk is used in a new 
machine, it will generally panic with "pool was used previously with 
system-id xx", which requires more miniroot work. It would be nice 
to be able to avoid this as well, but you can't export the / pool 
before pulling out the disk, either.
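Presumably the least-bad way around that panic is to force one import from the 
miniroot on the new box, so the labels pick up the new hostid. A sketch, not a 
recipe:

# zpool import -f -R /a zboot   (from failsafe/miniroot on the new machine)
# reboot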




Jorgen Lundman wrote:


Hello list,

Before we started changing to ZFS bootfs, we used DiskSuite mirrored ufs 
boot.


Very often, if we needed to grow a cluster by another machine or two, we 
would simply clone a running live server. Generally the procedure for 
this would be;


1 detach the 2nd HDD, metaclear, and delete metadb on 2nd disk.
2 mount the 2nd HDD under /mnt, and change system/vfstab to be a single 
boot HDD, and no longer mirrored, as well as host name, and IP addresses.

3 bootadm update-archive -R /mnt
4 unmount, cfgadm unconfigure, and pull out the HDD.

and generally, in about ~4 minutes, we have a new live server in the 
cluster.



We tried to do the same thing to day, but with a ZFS bootfs. We did:

1 zpool detach on the 2nd HDD.
2 cfgadm unconfigure the HDD, and pull out the disk.

The source server was fine, could insert new disk, attach it, and it 
resilvered.


However, the new destination server had lots of issues. At first, grub 
would give no menu at all, just the grub? command prompt.


The command: findroot(pool_zboot,0,a) would return Error 15: No such 
file.


After booting a Solaris Live CD, I could zpool import the pool, but of 
course it was in Degraded mode etc.


Now it would show menu, but if you boot it, it would flash the message 
that the pool was last accessed by Solaris $sysid, and panic.


After a lot of reboots, and fiddling, I managed to get miniroot to at 
least boot, then, only after inserting a new HDD and letting the pool 
become completely good would it let me boot into multi-user.


Is there something we should do perhaps, that will let the cloning 
procedure go smoothly? Should I export the 'now separated disk' 
somehow? In fact, can I mount that disk to make changes to it before 
pulling out the disk?


Most documentation on cloning uses zfs send, which would be possible, 
but 4 minutes is hard to beat when your cluster is under heavy load.


Lund



--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Mirror cloning

2009-07-23 Thread Jorgen Lundman



Jorgen Lundman wrote:
However, zpool detach appears to mark the disk as blank, so nothing 
will find any pools (import, import -D etc). zdb -l will show labels, 


For kicks, I tried to demonstrate that this does indeed happen, so I dd'ed 
the first 1024 1k blocks from the disk, ran zpool detach on it, then dd'ed 
the image back out to the HDD.
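In other words, roughly this (c0t1d0 as in the earlier mail; 1024 x 1k covers the 
front labels, which seems to be all that detach scribbles on):

# dd if=/dev/rdsk/c0t1d0s0 of=/tmp/front.img bs=1k count=1024   (save)
# zpool detach zboot c0t1d0s0
# dd if=/tmp/front.img of=/dev/rdsk/c0t1d0s0 bs=1k count=1024   (put it back, then pull the disk)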


Pulled out disk and it boots directly without any interventions. If only 
zpool detach had a flag to tell it not to scribble over the detached disk.


Guess I could diff the before and after disk image and work out what it 
is that it does, and write a tool to undo it, or figure out if I can 
undo it using zdb.


Lund

--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris live CD that supports ZFS root mount for fs fixes

2009-07-16 Thread Jorgen Lundman
We used the OpenSolaris preview 2010.02 DVD on genunix.org, to fix our 
broken zboot after attempting to clone.  It had the zpool and zfs tools 
enough to import, re-mount etc.


Lund


Matt Weatherford wrote:


Hi,

I borked a libc.so library file on my solaris 10 server (zfs root) - was 
wondering if there
is a good live CD that will be able to mount my ZFS root fs so that I 
can make this
quick repair on the system boot drive and get back running again.  Are 
all ZFS
roots created equal? Its an x86 solaris 10 box. If I boot a belenix live 
CD will it be

able to mount this ZFS root?

Thanks,

Matt

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-14 Thread Jorgen Lundman


I have no idea. I downloaded the script from Bob without modifications 
and ran it specifying only the name of our pool. Should I have changed 
something to run the test?


We have two kinds of x4500/x4540, those with Sol 10 10/08, and 2 running 
svn117 for ZFS quotas. Worth trying on both?


Lund




Ross wrote:

Jorgen,

Am I right in thinking the numbers here don't quite work.  48M blocks is just 
9,000 files isn't it, not 93,000?

I'm asking because I had to repeat a test earlier - I edited the script with 
vi, but when I ran it, it was still using the old parameters.  I ignored it as 
a one off, but I'm wondering if your test has done a similar thing.

Ross



x4540 running svn117

# ./zfs-cache-test.ksh zpool1
zfs create zpool1/zfscachetest
creating data file set 93000 files of 8192000 bytes0
under 
/zpool1/zfscachetest ...

done1
zfs unmount zpool1/zfscachetest
zfs mount zpool1/zfscachetest

doing initial (unmount/mount) 'cpio -o . /dev/null'
48000247 blocks

real4m7.13s
user0m9.27s
sys 0m49.09s

doing second 'cpio -o . /dev/null'
48000247 blocks

real4m52.52s
user0m9.13s
sys 0m47.51s








___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discu
ss


--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-14 Thread Jorgen Lundman


Ah yes, my apologies! I haven't quite worked out why the OS X VNC server 
can't handle keyboard mappings. I even have to copy/paste the @ sign. As I 
pasted the output into my mail over VNC, it would have destroyed the 
(not very) unusual characters.



Ross wrote:

Aaah, nevermind, it looks like there's just a rogue 9 that appeared in your output.  
It was just a standard run of 3,000 files.


--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-14 Thread Jorgen Lundman



I also ran this on my future RAID/NAS. Intel Atom 330 (D945GCLF2) dual 
core 1.6ghz, on a single HDD pool. svn_114, 64 bit, 2GB RAM.


bash-3.2# ./zfs-cache-test.ksh zboot
zfs create zboot/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under 
/zboot/zfscachetest ...

Done!
zfs unmount zboot/zfscachetest
zfs mount zboot/zfscachetest

Doing initial (unmount/mount) 'cpio -C 131072 -o > /dev/null'
48000256 blocks

real7m45.96s
user0m6.55s
sys 1m20.85s

Doing second 'cpio -C 131072 -o > /dev/null'
48000256 blocks

real7m50.35s
user0m6.76s
sys 1m32.91s

Feel free to clean up with 'zfs destroy zboot/zfscachetest'.





Bob Friesenhahn wrote:
There has been no forward progress on the ZFS read performance issue for 
a week now.  A 4X reduction in file read performance due to having read 
the file before is terrible, and of course the situation is considerably 
worse if the file was previously mmapped as well.  Many of us have sent 
a lot of money to Sun and were not aware that ZFS is sucking the life 
out of our expensive Sun hardware.


It is trivially easy to reproduce this problem on multiple machines. For 
example, I reproduced it on my Blade 2500 (SPARC) which uses a simple 
mirrored rpool.  On that system there is a 1.8X read slowdown from the 
file being accessed previously.


In order to raise visibility of this issue, I invite others to see if 
they can reproduce it in their ZFS pools.  The script at


http://www.simplesystems.org/users/bfriesen/zfs-discuss/zfs-cache-test.ksh

Implements a simple test.  It requires a fair amount of disk space to 
run, but the main requirement is that the disk space consumed be more 
than available memory so that file data gets purged from the ARC. The 
script needs to run as root since it creates a filesystem and uses 
mount/umount.  The script does not destroy any data.


There are several adjustments which may be made at the front of the 
script.  The pool 'rpool' is used by default, but the name of the pool 
to test may be supplied via an argument similar to:


# ./zfs-cache-test.ksh Sun_2540
zfs create Sun_2540/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under 
/Sun_2540/zfscachetest ...

Done!
zfs unmount Sun_2540/zfscachetest
zfs mount Sun_2540/zfscachetest

Doing initial (unmount/mount) 'cpio -o > /dev/null'
48000247 blocks

real2m54.17s
user0m7.65s
sys 0m36.59s

Doing second 'cpio -o > /dev/null'
48000247 blocks

real11m54.65s
user0m7.70s
sys 0m35.06s

Feel free to clean up with 'zfs destroy Sun_2540/zfscachetest'.

And here is a similar run on my Blade 2500 using the default rpool:

# ./zfs-cache-test.ksh
zfs create rpool/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under 
/rpool/zfscachetest ...

Done!
zfs unmount rpool/zfscachetest
zfs mount rpool/zfscachetest

Doing initial (unmount/mount) 'cpio -o > /dev/null'
48000247 blocks

real13m3.91s
user2m43.04s
sys 9m28.73s

Doing second 'cpio -o > /dev/null'
48000247 blocks

real23m50.27s
user2m41.81s
sys 9m46.76s

Feel free to clean up with 'zfs destroy rpool/zfscachetest'.

I am interested to hear about systems which do not suffer from this bug.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-14 Thread Jorgen Lundman
c2t2d0  ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c3t1d0  ONLINE   0 0 0
c4t1d0  ONLINE   0 0 0
c5t1d0  ONLINE   0 0 0
c6t1d0  ONLINE   0 0 0
c1t3d0  ONLINE   0 0 0
c2t3d0  ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c3t2d0  ONLINE   0 0 0
c4t2d0  ONLINE   0 0 0
c5t2d0  ONLINE   0 0 0
c6t2d0  ONLINE   0 0 0
c1t4d0  ONLINE   0 0 0
c2t4d0  ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c3t3d0  ONLINE   0 0 0
c4t3d0  ONLINE   0 0 0
c5t3d0  ONLINE   0 0 0
c6t3d0  ONLINE   0 0 0
c1t5d0  ONLINE   0 0 0
c2t5d0  ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c3t4d0  ONLINE   0 0 0
c4t4d0  ONLINE   0 0 0
c5t4d0  ONLINE   0 0 0
c6t4d0  ONLINE   0 0 0
c1t6d0  ONLINE   0 0 0
c2t6d0  ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c3t5d0  ONLINE   0 0 0
c4t5d0  ONLINE   0 0 0
c5t5d0  ONLINE   0 0 0
c6t5d0  ONLINE   0 0 0
c1t7d0  ONLINE   0 0 0
c2t7d0  ONLINE   0 0 0
spares
  c3t6d0AVAIL
  c4t6d0AVAIL
  c5t6d0AVAIL
  c6t6d0AVAIL

errors: No known data errors

zfs create zpool1/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under 
/zpool1/zfscachetest ...

Done!
zfs unmount zpool1/zfscachetest
zfs mount zpool1/zfscachetest

Doing initial (unmount/mount) 'cpio -C 131072 -o > /dev/null'
48000256 blocks

real3m5.51s
user0m1.70s
sys 0m29.53s

Doing second 'cpio -C 131072 -o > /dev/null'
48000256 blocks

real4m7.63s
user0m1.67s
sys 0m26.66s

Feel free to clean up with 'zfs destroy zpool1/zfscachetest'.





Intel Atom:

bash-3.2# ./zfs-cache-test.ksh zboot
System Configuration: 


System architecture: i386
System release level: 5.11 snv_114
CPU ISA list: amd64 pentium_pro+mmx pentium_pro pentium+mmx pentium i486 
i386 i86


Pool configuration:
  pool: zboot
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
zboot   ONLINE   0 0 0
  c1d0s0ONLINE   0 0 0

errors: No known data errors

zfs create zboot/zfscachetest
Creating data file set (3000 files of 8192000 bytes) under 
/zboot/zfscachetest ...

Done!
zfs unmount zboot/zfscachetest
zfs mount zboot/zfscachetest

Doing initial (unmount/mount) 'cpio -C 131072 -o > /dev/null'
48000256 blocks

real7m27.87s
user0m6.51s
sys 1m20.28s

Doing second 'cpio -C 131072 -o > /dev/null'
48000256 blocks

real7m25.34s
user0m6.63s
sys 1m32.04s

Feel free to clean up with 'zfs destroy zboot/zfscachetest'.

--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-14 Thread Jorgen Lundman




You have some mighty pools there.  Something I find quite interesting is 
that those who have mighty pools generally obtain about the same data 
rate regardless of their relative degree of excessive might. This 
causes me to believe that the Solaris kernel is throttling the read rate 
so that throwing more and faster hardware at the problem does not help.





Are you saying the X4500s we have are set up incorrectly, or done in a 
way which will make them run poorly?


The servers came with no documentation nor advice. I have yet to find a 
good place that suggests configurations for dedicated x4500 NFS servers. 
We had to find out about NFSD_SERVERS when the first trouble came 
in. (Followed by 5 other tweaks and limits-reached troubles.)
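For anyone hitting the same wall, the nfsd thread count lives in /etc/default/nfs; 
the value below is only an example:

NFSD_SERVERS=1024

# svcadm restart svc:/network/nfs/server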


If Sun really wants to compete with NetApp, you'd think they would ship 
us hardware configured as NFS servers, not x4500s configured for 
desktops :(  They are cheap though! Nothing like being the Wal-Mart of Storage!


That is how the pools were created as well. Admittedly it may be down to 
our Vendor again.


Lund

--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS Mirror cloning

2009-07-14 Thread Jorgen Lundman


Hello list,

Before we started changing to ZFS bootfs, we used DiskSuite mirrored ufs 
boot.


Very often, if we needed to grow a cluster by another machine or two, we 
would simply clone a running live server. Generally the procedure for 
this would be;


1 detach the 2nd HDD, metaclear, and delete metadb on 2nd disk.
2 mount the 2nd HDD under /mnt, and change system/vfstab to be a single 
boot HDD, and no longer mirrored, as well as host name, and IP addresses.

3 bootadm update-archive -R /mnt
4 unmount, cfgadm unconfigure, and pull out the HDD.

and generally, in about ~4 minutes, we have a new live server in the 
cluster.



We tried to do the same thing to day, but with a ZFS bootfs. We did:

1 zpool detach on the 2nd HDD.
2 cfgadm unconfigure the HDD, and pull out the disk.

The source server was fine, could insert new disk, attach it, and it 
resilvered.


However, the new destination server had lots of issues. At first, grub 
would give no menu at all, just the grub? command prompt.


The command: findroot(pool_zboot,0,a) would return Error 15: No such file.

After booting a Solaris Live CD, I could zpool import the pool, but of 
course it was in Degraded mode etc.


Now it would show menu, but if you boot it, it would flash the message 
that the pool was last accessed by Solaris $sysid, and panic.


After a lot of reboots, and fiddling, I managed to get miniroot to at 
least boot, then, only after inserting a new HDD and letting the pool 
become completely good would it let me boot into multi-user.


Is there something we should do perhaps, that will let the cloning 
procedure go smoothly? Should I export the 'now separated disk' 
somehow? In fact, can I mount that disk to make changes to it before 
pulling out the disk?


Most documentation on cloning uses zfs send, which would be possible, 
but 4 minutes is hard to beat when your cluster is under heavy load.


Lund

--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Jorgen Lundman


x4540 running svn117

# ./zfs-cache-test.ksh zpool1
zfs create zpool1/zfscachetest
creating data file set 93000 files of 8192000 bytes0 under 
/zpool1/zfscachetest ...

done1
zfs unmount zpool1/zfscachetest
zfs mount zpool1/zfscachetest

doing initial (unmount/mount) 'cpio -o . /dev/null'
48000247 blocks

real4m7.13s
user0m9.27s
sys 0m49.09s

doing second 'cpio -o . /dev/null'
48000247 blocks

real4m52.52s
user0m9.13s
sys 0m47.51s








___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] how to discover disks?

2009-07-06 Thread Jorgen Lundman


If you want to use the entire disk in a zpool, you should use the 
notation without the trailing s? part, i.e. c2d0. (SATA-related disks 
do not have the t? target part, whereas SCSI and SCSI-emulated 
devices do, like CDROMs, USB etc.)


If you are using just a part of a disk, one partition/slice, you will 
use the s? notation. For example, c2d0s6.


There is one caveat: x86 bootable HDDs need an SMI label; EFI 
labels will not work. So for bootable root pools, it has to be a 
slice.


Run format on the disk, and create your partition the way you want it. 
Probably just s0 spanning the entire disk. (Not counting the virtual 
s8 boot partition, and of course the entire-disk partition s2).


Then write it out as an SMI label, and then you can attach it to your root pool.
It usually reminds you to run installgrub on the disk too.
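Using your device names, the whole dance is roughly the following (a sketch; run 
fdisk -B on c3d1p0 first if the disk has no Solaris fdisk partition yet):

# prtvtoc /dev/rdsk/c3d0s2 | fmthard -s - /dev/rdsk/c3d1s2   (copy the SMI partition table)
# zpool attach rpool c3d0s0 c3d1s0
# installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c3d1s0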

I am not an expert on this, this is just what I have found out so far.

Lund




Hua-Ying Ling wrote:

When I use cfgadm -a it only seems to list usb devices?

#cfgadm -a
Ap_Id  Type Receptacle   Occupant Condition
usb2/1 unknown  emptyunconfigured ok
usb2/2 unknown  emptyunconfigured ok
usb2/3 unknown  emptyunconfigured ok
usb3/1 unknown  emptyunconfigured ok

I'm trying to convert a nonredundant storage pool to a mirrored pool.
I'm following the zfs admin guide on page 71.

I currently have a existing rpool:

#zpool status

  pool: rpool
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
rpool   ONLINE   0 0 0
  c3d0s0ONLINE   0 0 0

I want to mirror this drive, I tried using format to get the disk name

#format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
   0. c3d0 <DEFAULT cyl 24318 alt 2 hd 255 sec 63>
  /p...@0,0/pci-...@14,1/i...@0/c...@0,0
   1. c3d1 <drive type unknown>
  /p...@0,0/pci-...@14,1/i...@0/c...@1,0

So I tried

#zpool attach rpool c3d0s0 c3d1s0 // failed
cannot open '/dev/dsk/c3d1s0': No such device or address

#zpool attach rpool c3d0s0 c3d1 // failed
cannot label 'c3d1': EFI labeled devices are not supported on root pools.

Thoughts?

Thanks,
Hua-Ying

On Mon, Jul 6, 2009 at 2:37 AM, Carsten
Aulbertcarsten.aulb...@aei.mpg.de wrote:

Hi

Hua-Ying Ling wrote:

How do I discover the disk name to use for zfs commands such as:
c3d0s0?  I tried using format command but it only gave me the first 4
letters: c3d1.  Also why do some command accept only 4 letter disk
names and others require 6 letters?


Usually i find

cfgadm -a

helpful enough for that (mayby adding '|grep disk' to it).

Why sometimes 4 and sometimes 6 characters:

c3d1 - this would be disk#1 on controller#3
c3d0s0 - this would be slice #0 (partition) on disk #0 on controller #3

Usually there is a also t0 there, e.g.:

cfgadm -a|grep disk |head
sata0/0::dsk/c0t0d0disk connectedconfigured   ok
sata0/1::dsk/c0t1d0disk connectedconfigured   ok
sata0/2::dsk/c0t2d0disk connectedconfigured   ok
sata0/3::dsk/c0t3d0disk connectedconfigured   ok
sata0/4::dsk/c0t4d0disk connectedconfigured   ok
sata0/5::dsk/c0t5d0disk connectedconfigured   ok
sata0/6::dsk/c0t6d0disk connectedconfigured   ok
sata0/7::dsk/c0t7d0disk connectedconfigured   ok
sata1/0::dsk/c1t0d0disk connectedconfigured   ok
sata1/1::dsk/c1t1d0disk connectedconfigured   ok


HTH

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Interposing on readdir and friends

2009-07-02 Thread Jorgen Lundman


I am no expert, but I recently wrote a wrapper for my Media players, 
that expands .RAR archives, and presents files inside as regular 
contents of the directory.


It may give you a starting point;

wiki    http://lundman.net/wiki/index.php/Librarchy
tarball http://www.lundman.net/ftp/librarcy/librarcy-1.0.3.tar.gz
CVSweb  http://www.lundman.net/cvs/viewvc.cgi/lundman/librarcy/

Lund


Peter Tribble wrote:

We've just stumbled across an interesting problem in one of our
applications that fails when run on a ZFS filesystem.

I don't have the code, so I can't fix it at source, but it's relying
on the fact that if you do readdir() on a directory, the files come
back in the order they were added to the directory. This appears
to be true (within certain limitations) on UFS, but certainly isn't
true on ZFS.

Is there any way to force readdir() to return files in a specific order?
(On UFS, we have a scipt that creates symlinks in the correct order.
Ugly, but seems to have worked for many years.)

If not, I was looking at interposing my own readdir() (that's assuming
the application is using readdir()) that actually returns the entries in
the desired order. However, I'm having a bit of trouble hacking this
together (the current source doesn't compile in isolation on my S10
machine).



--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Open Solaris version recommendation? b114, b117?

2009-07-02 Thread Jorgen Lundman


We have been told we can have support for OpenSolaris finally, so we can 
move the ufs on zvol over to zfs with user-quotas.


Does anyone have any feel for the versions of Solaris that has zfs user 
quotas? We will put it on the x4540 for customers.


I have run b114 for about 5 weeks, and have yet to experience any 
problems. But b117 is what 2010/02 version will be based on, so perhaps 
that is a better choice. Other versions worth considering?


I know it's a bit vague, but perhaps there is a known panic in a certain 
version that I may not be aware.


Lund

--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best controller card for 8 SATA drives ?

2009-06-21 Thread Jorgen Lundman


I only have a 32-bit PCI bus on the Intel Atom 330 board, so I have no 
choice but to be slower, but I can confirm that the Supermicro 
dac-sata-mv8 (SATA-1) card works just fine, and does display in cfgadm. 
(Hot-swapping is possible.)
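Hot-swap on it is the usual cfgadm dance, roughly (the port number is just an 
example):

# cfgadm -c unconfigure sata1/3   (before pulling the disk)
# cfgadm -c configure sata1/3   (after inserting the new one)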


I have been told aoc-sat2-mv8 does as well (SATA-II) but I have not 
personally tried it.


Lund

--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] PicoLCD Was: Best controller card for 8 SATA drives ?

2009-06-21 Thread Jorgen Lundman


I hesitate to post this question here, since the relation to ZFS is 
tenuous at best (zfs to sata controller to LCD panel).


But maybe someone has already been down this path before me. Looking at 
building a RAID, with osol and zfs, I naturally want a front-panel. I 
was looking at something like;


http://www.mini-box.com/picoLCD-256x64-Sideshow-CDROM-Bay

It appears to come with open-source drivers, based on lcd4linux, 
which I can compile with marginal massaging. Has anyone run this 
successfully with osol?


It appears to handle mrtg directly, so I should be able to graph a whole 
load of ZFS data. Has someone already been down this road too?
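The sort of thing I had in mind for feeding it, as a throwaway sketch (pool name 
invented, and the zpool list columns may differ between builds):

#!/bin/sh
# print pool size and used for mrtg/lcd4linux to pick up
zpool list -H zpool1 | awk '{ print $2; print $3 }'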




--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs on 32 bit?

2009-06-17 Thread Jorgen Lundman



casper@sun.com wrote:


It's true for most of the Intel Atom family (Zxxx and Nxxx but not the
230 and 330 as those are 64 bit) Those are new systems.

Casper

___


I've actually just started to build my home raid using the Atom 330 
(D945GCLF2):


Status of virtual processor 0 as of: 06/17/2009 16:25:55
  on-line since 09/17/2008 14:32:04.
  The i386 processor operates at 1600 MHz,
and has an i387 compatible floating point processor.
Status of virtual processor 1 as of: 06/17/2009 16:25:55
  on-line since 09/17/2008 14:32:24.
  The i386 processor operates at 1600 MHz,
and has an i387 compatible floating point processor.
Status of virtual processor 2 as of: 06/17/2009 16:25:55
  on-line since 09/17/2008 14:32:24.
  The i386 processor operates at 1600 MHz,
and has an i387 compatible floating point processor.
Status of virtual processor 3 as of: 06/17/2009 16:25:55
  on-line since 09/17/2008 14:32:26.
  The i386 processor operates at 1600 MHz,
and has an i387 compatible floating point processor.

and booted 64 bit just fine. (I thought uname -a showed that, but 
apparently it does not).
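The quick check, rather than uname, is isainfo; on this box it reports something 
like:

# isainfo -kv
64-bit amd64 kernel modules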


The only annoyance is that the onboard ICH7 is the $27c0, and not the 
$27c1 (with ahci mode for hot-swapping). But I always were planning on 
adding a SATA PCI card since I need more than 2 HDDs.


But to stay on-topic, it sounds like Richard Elling summed it up nicely, 
which is something Richard is really good at.


Lund



--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Can the new consumer NAS devices run OpenSolaris?

2009-05-28 Thread Jorgen Lundman
It's not what you want, but I use fullsized PC located at balcony and
don't worries about heat, vibration and noise. AFAIK, this is the
cheapest, simplest and quickest way to build custom server. I ever
know people who made 19 rack with set of servers on balcony also. =)
And offcause I don't understand what the strong neccesary to hold
this HDD-server near you. Are you going to swap HDD every day?
Or do you like to see it while working? %-) 1000BaseTX
standart allow up to 105 meters distance, remember it ;-)

I have played with the idea of using the balcony. But in summer, it hits 
40C+ for a couple of months, so if they were in a rack or similar 
storage it would get even hotter. I would also have to have AC for it. 
Inside, we already have AC on :)


I wanted more hands-off really. Just a little box plugged in, replace 
the HDDs when the red light comes on, the rest is automatic. But of 
course, at the same time, it is MY data, so I'd rather it was using ZFS 
and so on.


The Thecus and QNAP raids both use Intel chipsets. I am curious whether, 
if I picked up an empty box 2nd hand for next-to-nothing, I couldn't 
re-flash it with osol, or eon, or freenas.




--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Zfs send speed. Was: User quota design discussion..

2009-05-27 Thread Jorgen Lundman


I changed the test to try zfs send of a UFS-on-zvol volume as well:

received 92.9GB stream in 2354 seconds (40.4MB/sec)

Still fast enough to use. I have yet to get around to trying something 
considerably larger in size.
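For context, the UFS-on-zvol volumes in question are built more or less like this 
(names and size invented):

# zfs create -V 100g zpool1/vol001
# newfs /dev/zvol/rdsk/zpool1/vol001
# mount /dev/zvol/dsk/zpool1/vol001 /export/vol001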


Lund


Jorgen Lundman wrote:



So you recommend I also do speed test on larger volumes? The test data I 
had on the b114 server was only 90GB. Previous tests included 500G ufs 
on zvol etc.  It is just it will take 4 days to send it to the b114 
server to start with ;) (From Sol10 servers).


Lund

Dirk Wriedt wrote:

Jorgen,

what is the size of the sending zfs?

I thought replication speed depends on the size of the sending fs, too 
not only size of the snapshot being sent.


Regards
Dirk


--On Freitag, Mai 22, 2009 19:19:34 +0900 Jorgen Lundman 
lund...@gmo.jp wrote:



Sorry, yes. It is straight;

# time zfs send zpool1/leroy_c...@speedtest | nc 172.20.12.232 3001
real19m48.199s

# /var/tmp/nc -l -p 3001 -vvv | time zfs recv -v zpool1/le...@speedtest
received 82.3GB stream in 1195 seconds (70.5MB/sec)


Sending is osol-b114.
Receiver is Solaris 10 10/08

When we tested Solaris 10 10/08 -> Solaris 10 10/08 these were the 
results;


zfs send | nc | zfs recv - 1 MB/s
tar -cvf /zpool/leroy | nc | tar -xvf -  - 2.5 MB/s
ufsdump | nc | ufsrestore- 5.0 MB/s

So, none of those solutions was usable with regular Sol 10. Note most 
our volumes are ufs in

zvol, but even zfs volumes were slow.

Someone else had mentioned the speed was fixed in an earlier release, 
I had not had a chance to
upgrade. But since we wanted to try zfs user-quotas, I finally had 
the chance.


Lund


Brent Jones wrote:
On Thu, May 21, 2009 at 10:17 PM, Jorgen Lundman lund...@gmo.jp 
wrote:

To finally close my quest. I tested zfs send in osol-b114 version:

received 82.3GB stream in 1195 seconds (70.5MB/sec)

Yeeaahh!

That makes it completely usable! Just need to change our support 
contract to

allow us to run b114 and we're set! :)


Thanks,

Lund


Jorgen Lundman wrote:

We finally managed to upgrade the production x4500s to Sol 10 10/08
(unrelated to this) but with the hope that it would also make zfs 
send

usable.

Exactly how does build 105 translate to Solaris 10 10/08?  My 
current
speed test has sent 34Gb in 24 hours, which isn't great. Perhaps 
the next

version of Solaris 10 will have the improvements.

1



Robert Milkowski wrote:

Hello Jorgen,

If you look at the list archives you will see that it made a huge
difference for some people including me. Now I'm easily able to
saturate GbE linke while zfs send|recv'ing.


Since build 105 it should be *MUCH* for faster.



--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



Can you give any details about your data set, what you piped zfs
send/receive through (SSH?), hardware/network, etc?
I'm envious of your speeds!




--
Dirk Wriedt, dirk.wri...@sun.com, Sun Microsystems GmbH
Systemingenieur Strategic Accounts
Nagelsweg 55, 20097 Hamburg, Germany
Tel.: +49-40-251523-132 Fax: +49-40-251523-425 Mobile: +49 172 848 4166
Never been afraid of chances I been takin' - Joan Jett

Sitz der Gesellschaft: Sun Microsystems GmbH, Sonnenallee 1, D-85551 
Kirchheim-Heimstetten

Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates: Martin Haering






--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Zfs send speed. Was: User quota design discussion..

2009-05-26 Thread Jorgen Lundman



So you recommend I also do speed test on larger volumes? The test data I 
had on the b114 server was only 90GB. Previous tests included 500G ufs 
on zvol etc.  It is just it will take 4 days to send it to the b114 
server to start with ;) (From Sol10 servers).


Lund

Dirk Wriedt wrote:

Jorgen,

what is the size of the sending zfs?

I thought replication speed depends on the size of the sending fs, too 
not only size of the snapshot being sent.


Regards
Dirk


--On Freitag, Mai 22, 2009 19:19:34 +0900 Jorgen Lundman 
lund...@gmo.jp wrote:



Sorry, yes. It is straight;

# time zfs send zpool1/leroy_c...@speedtest | nc 172.20.12.232 3001
real19m48.199s

# /var/tmp/nc -l -p 3001 -vvv | time zfs recv -v zpool1/le...@speedtest
received 82.3GB stream in 1195 seconds (70.5MB/sec)


Sending is osol-b114.
Receiver is Solaris 10 10/08

When we tested Solaris 10 10/08 -> Solaris 10 10/08 these were the 
results;


zfs send | nc | zfs recv - 1 MB/s
tar -cvf /zpool/leroy | nc | tar -xvf -  - 2.5 MB/s
ufsdump | nc | ufsrestore- 5.0 MB/s

So, none of those solutions was usable with regular Sol 10. Note most 
our volumes are ufs in

zvol, but even zfs volumes were slow.

Someone else had mentioned the speed was fixed in an earlier release, 
I had not had a chance to
upgrade. But since we wanted to try zfs user-quotas, I finally had the 
chance.


Lund


Brent Jones wrote:

On Thu, May 21, 2009 at 10:17 PM, Jorgen Lundman lund...@gmo.jp wrote:

To finally close my quest. I tested zfs send in osol-b114 version:

received 82.3GB stream in 1195 seconds (70.5MB/sec)

Yeeaahh!

That makes it completely usable! Just need to change our support 
contract to

allow us to run b114 and we're set! :)


Thanks,

Lund


Jorgen Lundman wrote:

We finally managed to upgrade the production x4500s to Sol 10 10/08
(unrelated to this) but with the hope that it would also make zfs 
send

usable.

Exactly how does build 105 translate to Solaris 10 10/08?  My 
current
speed test has sent 34Gb in 24 hours, which isn't great. Perhaps 
the next

version of Solaris 10 will have the improvements.

1



Robert Milkowski wrote:

Hello Jorgen,

If you look at the list archives you will see that it made a huge
difference for some people including me. Now I'm easily able to
saturate GbE linke while zfs send|recv'ing.


Since build 105 it should be *MUCH* for faster.



--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



Can you give any details about your data set, what you piped zfs
send/receive through (SSH?), hardware/network, etc?
I'm envious of your speeds!




--
Dirk Wriedt, dirk.wri...@sun.com, Sun Microsystems GmbH
Systemingenieur Strategic Accounts
Nagelsweg 55, 20097 Hamburg, Germany
Tel.: +49-40-251523-132 Fax: +49-40-251523-425 Mobile: +49 172 848 4166
Never been afraid of chances I been takin' - Joan Jett

Sitz der Gesellschaft: Sun Microsystems GmbH, Sonnenallee 1, D-85551 
Kirchheim-Heimstetten

Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates: Martin Haering




--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Zfs send speed. Was: User quota design discussion..

2009-05-22 Thread Jorgen Lundman

Sorry, yes. It is straight;

# time zfs send zpool1/leroy_c...@speedtest | nc 172.20.12.232 3001
real19m48.199s

# /var/tmp/nc -l -p 3001 -vvv | time zfs recv -v zpool1/le...@speedtest
received 82.3GB stream in 1195 seconds (70.5MB/sec)


Sending is osol-b114.
Receiver is Solaris 10 10/08

When we tested Solaris 10 10/08 -> Solaris 10 10/08 these were the results;

zfs send | nc | zfs recv - 1 MB/s
tar -cvf /zpool/leroy | nc | tar -xvf -  - 2.5 MB/s
ufsdump | nc | ufsrestore- 5.0 MB/s

So, none of those solutions was usable with regular Sol 10. Note most 
our volumes are ufs in zvol, but even zfs volumes were slow.


Someone else had mentioned the speed was fixed in an earlier release, I 
had not had a chance to upgrade. But since we wanted to try zfs 
user-quotas, I finally had the chance.


Lund


Brent Jones wrote:

On Thu, May 21, 2009 at 10:17 PM, Jorgen Lundman lund...@gmo.jp wrote:

To finally close my quest. I tested zfs send in osol-b114 version:

received 82.3GB stream in 1195 seconds (70.5MB/sec)

Yeeaahh!

That makes it completely usable! Just need to change our support contract to
allow us to run b114 and we're set! :)


Thanks,

Lund


Jorgen Lundman wrote:

We finally managed to upgrade the production x4500s to Sol 10 10/08
(unrelated to this) but with the hope that it would also make zfs send
usable.

Exactly how does build 105 translate to Solaris 10 10/08?  My current
speed test has sent 34Gb in 24 hours, which isn't great. Perhaps the next
version of Solaris 10 will have the improvements.

1



Robert Milkowski wrote:

Hello Jorgen,

If you look at the list archives you will see that it made a huge
difference for some people including me. Now I'm easily able to
saturate GbE linke while zfs send|recv'ing.


Since build 105 it should be *MUCH* for faster.



--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



Can you give any details about your data set, what you piped zfs
send/receive through (SSH?), hardware/network, etc?
I'm envious of your speeds!



--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Replacing HDD with larger HDD..

2009-05-22 Thread Jorgen Lundman


What is the current answer regarding replacing HDDs in a raidz, one at a 
time, with a larger HDD? The Best-Practises-Wiki seems to suggest it is 
possible (but perhaps just for mirror, not raidz?)


I am currently running osol-b114.



I did this test with data files to simulate this situation;

# mkfile 1G disk0[12345]

-rw--T   1 root root 1073741824 May 23 09:19 disk01
-rw--T   1 root root 1073741824 May 23 09:19 disk02
-rw--T   1 root root 1073741824 May 23 09:20 disk03
-rw--T   1 root root 1073741824 May 23 09:20 disk04
-rw--T   1 root root 1073741824 May 23 09:20 disk05

# zpool create grow raidz /var/tmp/disk01 /var/tmp/disk02 
/var/tmp/disk03 /var/tmp/disk04 /var/tmp/disk05


# zpool list
NAME   SIZE   USED  AVAILCAP  HEALTH  ALTROOT
grow  4.97G   138K  4.97G 0%  ONLINE  -

# zfs create -o compression=on -o atime=off grow/fs1

# zfs list
NAME  USED  AVAIL  REFER  MOUNTPOINT
grow  153K  3.91G  35.1K  /grow
grow/fs1 33.6K  3.91G  33.6K  /grow/fs1

# zpool status grow
NAME STATE READ WRITE CKSUM
grow ONLINE   0 0 0
  raidz1 ONLINE   0 0 0
/var/tmp/disk01  ONLINE   0 0 0
/var/tmp/disk02  ONLINE   0 0 0
/var/tmp/disk03  ONLINE   0 0 0
/var/tmp/disk04  ONLINE   0 0 0
/var/tmp/disk05  ONLINE   0 0 0

-

That is our starting position, raidz using 5 1GB disks, giving us a 
total 3.91G file-system.


Now to replace each one at a time with a 2GB disk.



-rw--T   1 root root 2147483648 May 23 09:36 bigger_disk01
-rw--T   1 root root 2147483648 May 23 09:37 bigger_disk02
-rw--T   1 root root 2147483648 May 23 09:40 bigger_disk03
-rw--T   1 root root 2147483648 May 23 09:40 bigger_disk04
-rw--T   1 root root 2147483648 May 23 09:41 bigger_disk05


# zpool offline grow /var/tmp/disk01
# zpool replace grow /var/tmp/disk01 /var/tmp/bigger_disk01

# zpool status grow
  pool: grow
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Sat May 23 
09:43:51 2009


config:

NAME                        STATE     READ WRITE CKSUM
grow                        ONLINE       0     0     0
  raidz1                    ONLINE       0     0     0
    /var/tmp/bigger_disk01  ONLINE       0     0     0  1.04M resilvered
    /var/tmp/disk02         ONLINE       0     0     0
    /var/tmp/disk03         ONLINE       0     0     0
    /var/tmp/disk04         ONLINE       0     0     0
    /var/tmp/disk05         ONLINE       0     0     0


Do the same for all 5 disks



# zpool status grow
 scrub: resilver completed after 0h0m with 0 errors on Sat May 23 
09:46:28 2009


config:

NAME                        STATE     READ WRITE CKSUM
grow                        ONLINE       0     0     0
  raidz1                    ONLINE       0     0     0
    /var/tmp/bigger_disk01  ONLINE       0     0     0
    /var/tmp/bigger_disk02  ONLINE       0     0     0
    /var/tmp/bigger_disk03  ONLINE       0     0     0
    /var/tmp/bigger_disk04  ONLINE       0     0     0
    /var/tmp/bigger_disk05  ONLINE       0     0     0  1.04M resilvered




I was somewhat hoping it would just be magical here, but unfortunately:

# zpool list
NAME   SIZE   USED  AVAILCAP  HEALTH  ALTROOT
grow  4.97G  5.35M  4.96G 0%  ONLINE  -

It is still the same size. I would expect it to go to 9G.


-



I did a few commands to see if you can tell it to make it happen. Scrub, 
zfs unmount/mount, zpool upgrade, etc. No difference.






Then something peculiar happened. I tried to export it, and import it to 
see if it helped;


# zpool export grow
# zpool import grow
cannot import 'grow': no such pool available

And alas, grow is completely gone, and no amount of import would see 
it. Oh well.






--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Replacing HDD with larger HDD..

2009-05-22 Thread Jorgen Lundman



Rob Logan wrote:

you meant to type
zpool import -d /var/tmp grow



Bah - of course, I can not just expect zpool to know what random 
directory to search.


You Sir, are a genius.

Works like a charm, and thank you.
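
For anyone finding this thread later, the full sequence that ended up 
growing the pool in this test was simply:

# zpool export grow
# zpool import -d /var/tmp grow
# zpool list grow

(The -d is only needed because the vdevs here are plain files under 
/var/tmp rather than devices in /dev/dsk; with real disks a plain 
export and import should be enough on this build to pick up the new size.)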

Lund

--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Zfs send speed. Was: User quota design discussion..

2009-05-21 Thread Jorgen Lundman


To finally close my quest. I tested zfs send in osol-b114 version:

received 82.3GB stream in 1195 seconds (70.5MB/sec)

Yeeaahh!

That makes it completely usable! Just need to change our support 
contract to allow us to run b114 and we're set! :)



Thanks,

Lund


Jorgen Lundman wrote:


We finally managed to upgrade the production x4500s to Sol 10 10/08 
(unrelated to this) but with the hope that it would also make zfs send 
usable.


Exactly how does build 105 translate to Solaris 10 10/08?  My current 
speed test has sent 34Gb in 24 hours, which isn't great. Perhaps the 
next version of Solaris 10 will have the improvements.




Robert Milkowski wrote:

Hello Jorgen,

If you look at the list archives you will see that it made a huge
difference for some people including me. Now I'm easily able to
saturate GbE link while zfs send|recv'ing.


Since build 105 it should be *MUCH* faster.





--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS userquota groupquota test

2009-05-20 Thread Jorgen Lundman


I have been playing around with osol-nv-b114 version, and the ZFS user 
and group quotas.


First of all, it is fantastic. Thank you all! (Sun, Ahrens and anyone 
else involved).


I'm currently copying over one of the smaller user areas, and setting up 
their quotas, so I have yet to start large scale testing. But the 
initial work is very promising. (Just 90G data, 341694 accounts)


The userquota@, userused@ and zfs userspace commands are easy to pick up.

With a test account with 50M quota, and a while [ 1 ] script copying a 
5M file, it reaches about 120M before the user is stopped (as expected).


The lazy-update-quota is not a problem for us, and less of a problem the 
more quota the user has (50M is a bit low).


I was unable to get ZFS quota to work with rquota. (Ie, NFS mount the 
volume on another server, and issue quota 1234. It returns nothing).


I assume rquota is just not implemented, not a problem for us.

The Perl CPAN module Quota does not implement ZFS quotas. :)




--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS userquota groupquota test

2009-05-20 Thread Jorgen Lundman



Matthew Ahrens wrote:


Thanks for the feedback!


Thank you for the work, it sure is great!




This should work, at least on Solaris clients.  Perhaps you can only 
request information about yourself from the client?




Odd, but I just assumed it wouldn't work and didn't check further. But 
telnet/rquota wasn't running.


But I do find that, from a server mounting the NFS volume:

# quota -v 1234
Disk quotas for (no account) (uid 1234):
Filesystem        usage    quota    limit   timeleft  files  quota  limit  timeleft

/export/leroy
   55409 1048576 1048576  0  0  0


However, on the x4500 server itself:

# quota -v 1234
Disk quotas for (no account) (uid 1234):
Filesystem        usage    quota    limit   timeleft  files  quota  limit  timeleft




Of course I should use zfs get userused on the server, but that is 
probably what confused the situation. Perhaps it is something to do with 
the mount not thinking it is mounted with quota when local.

I could try mountpoint=legacy and explicitly list rq when mounting maybe .

But we don't need it to work, it was just different from legacy 
behaviour. :)


Lund



--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS userquota groupquota test

2009-05-20 Thread Jorgen Lundman



Oh I forgot the more important question.

Importing all the user quota settings: currently it is a long file of zfs 
set commands, which is taking a really long time. For example, 
yesterday's import is still running.


Are there bulk-import solutions? Like zfs set -f file.txt or similar?

If not, I could potentially use zfs ioctls perhaps to write my own bulk 
import program? Large imports are rare, but I was just curious if there 
was a better way to issue large amounts of zfs set commands.
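
(With no bulk interface, the only crude speedup that comes to mind in 
plain shell is to split the command file and run the chunks in parallel, e.g.

# split -l 10000 quota-cmds.txt chunk.
# for f in chunk.*; do sh $f & done; wait

where quota-cmds.txt holds one zfs set command per line; the file names 
are made up. Each zfs invocation still pays the full process and ioctl 
startup cost, so a native bulk path would obviously be nicer.)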





Jorgen Lundman wrote:



Matthew Ahrens wrote:


Thanks for the feedback!


Thank you for the work, it sure is great!




This should work, at least on Solaris clients.  Perhaps you can only 
request information about yourself from the client?




Odd, but I just assumed it wouldn't work and didn't check further. But 
telnet/rquota wasn't running.


But I do find that, from a server mounting the NFS volume:

# quota -v 1234
Disk quotas for (no account) (uid 1234):
Filesystem        usage    quota    limit   timeleft  files  quota  limit  timeleft

/export/leroy
   55409 1048576 1048576  0  0  0


However, on the x4500 server itself:

# quota -v 1234
Disk quotas for (no account) (uid 1234):
Filesystem        usage    quota    limit   timeleft  files  quota  limit  timeleft




Of course I should use zfs get userused on the server, but that is 
probably what confused the situation. Perhaps something to do with that 
mount doesn't think it is mounted with quota when local.


I could try mountpoint=legacy and explicitly list rq when mounting maybe .

But we don't need it to work, it was just different from legacy 
behaviour. :)


Lund





--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Zfs and b114 version

2009-05-19 Thread Jorgen Lundman



I tried LUpdate 3 times with same result, burnt the ISO and installed 
the old fashioned way, and it boots fine.





Jorgen Lundman wrote:



Most annoying. If su.static really had been static I would be able to
figure out what goes wrong.
When I boot into miniroot/failsafe it works just fine, including if I
set crle to use only libraries in /a/lib:/a/usr/lib (and 64 bit).

So startd not launching must be some other file/permission on disk
somewhere, but without single user shell, it isn't easy to see why.

There is nothing in /var/{adm,log}. (But then, syslogd has not started
yet).

svc.startd.log merely states:
May 19 14:15:23/1: restarting after interruption


Perhaps it will work better if I just install from CD instead of using 
LiveUpdate






Jorgen Lundman wrote:


I used LUpdate to create a b114 BE on the spare X4540, and booted it, 
but alas, I get the following message on boot:



SunOS Release 5.11 Version snv_114 64-bit
Copyright 1983-2009 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.

/kernel/drv/amd64/pcic: undefined symbol 'cardbus_can_suspend'
WARNING: mod_load: cannot load module 'pcic'
libscf.c:3257: scf_handle_bind() failed with unexpected error 1017. 
Aborting.

May 19 10:16:34 svc.startd[11]: restarting after interruption


Entering System Maintenance Mode
ld.so.1: su.static: fatal: relocation error: file /sbin/su.static: 
symbol audit_su_init_info: referenced symbol not found

May 19 10:16:51 svc.startd[19]: restarting after interruption
libscf.c:3257: scf_handle_bind() failed with unexpected error 1017. 
Aborting.



I will probably have to wait a little bit longer after all :)

Lund



Jorgen Lundman wrote:



The website has not been updated yet to reflect its availability 
(thus it may not be official yet), but you can get SXCE b114 now 
from 
https://cds.sun.com/is-bin/INTERSHOP.enfinity/WFS/CDS-CDS_SMI-Site/en_US/-/USD/viewproductdetail-start?productref=sol-express_b114-full-x86-sp-...@cds-cds_smi 





I don't mind learning something new, but that's even faster! I will 
try that image and work on my kernel building projects a little later...


Thanks!








--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Zfs and b114 version

2009-05-18 Thread Jorgen Lundman


I used LUpdate to create a b114 BE on the spare X4540, and booted it, 
but alas, I get the following message on boot:



SunOS Release 5.11 Version snv_114 64-bit 

Copyright 1983-2009 Sun Microsystems, Inc.  All rights reserved. 

Use is subject to license terms. 



/kernel/drv/amd64/pcic: undefined symbol 'cardbus_can_suspend'
WARNING: mod_load: cannot load module 'pcic'
libscf.c:3257: scf_handle_bind() failed with unexpected error 1017. 
Aborting.

May 19 10:16:34 svc.startd[11]: restarting after interruption


Entering System Maintenance Mode
ld.so.1: su.static: fatal: relocation error: file /sbin/su.static: 
symbol audit_su_init_info: referenced symbol not found

May 19 10:16:51 svc.startd[19]: restarting after interruption
libscf.c:3257: scf_handle_bind() failed with unexpected error 1017. 
Aborting.



I will probably have to wait a little bit longer after all :)

Lund



Jorgen Lundman wrote:



The website has not been updated yet to reflect its availability (thus 
it may not be official yet), but you can get SXCE b114 now from 
https://cds.sun.com/is-bin/INTERSHOP.enfinity/WFS/CDS-CDS_SMI-Site/en_US/-/USD/viewproductdetail-start?productref=sol-express_b114-full-x86-sp-...@cds-cds_smi 





I don't mind learning something new, but that's even faster! I will try 
that image and work on my kernel building projects a little later...


Thanks!




--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Zfs and b114 version

2009-05-17 Thread Jorgen Lundman


http://dlc.sun.com/osol/on/downloads/b114/

This URL makes me think that if I just sit down and figure out how to 
compile OpenSolaris, I can try b114 now^h^h^h eventually ? I am really 
eager to try out the new quota support.. has someone already tried 
compiling it perhaps? How complicated is compiling osol compared to, 
say, NetBSD/FreeBSD, Linux etc ? (IRIX and its quickstarting??)




--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Can the new consumer NAS devices run OpenSolaris?

2009-04-20 Thread Jorgen Lundman


Re-surfacing an old thread. I was wondering myself if there are any 
home-use commercial NAS devices with zfs. I did find that there is the 
Thecus 7700, but it appears to come with Linux and use ZFS in FUSE, 
which I (perhaps unjustly) don't feel comfortable with :)


Perhaps we will start to see more home NAS devices with zfs options, or 
at least ones able to run EON?





Joe S wrote:

In the last few weeks, I've seen a number of new NAS devices released
from companies like HP, QNAP, VIA, Lacie, Buffalo, Iomega,
Cisco/Linksys, etc. Most of these are powered by Intel Celeron, Intel
Atom, AMD Sempron, Marvell Orion, or Via C7 chips. I've also noticed
that most allow a maximum of 1 or 2 GB of RAM.

Is it likely that any of these will run OpenSolaris?

Has anyone else tried?

http://www.via.com.tw/en/products/embedded/nsd7800/
http://www.hp.com/united-states/digitalentertainment/mediasmart/serverdemo/index-noflash.html
http://www.qnap.com/pro_detail_feature.asp?p_id=108

I prefer one of these instead of the huge PC I have at home.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Zfs send speed. Was: User quota design discussion..

2009-04-09 Thread Jorgen Lundman


We finally managed to upgrade the production x4500s to Sol 10 10/08 
(unrelated to this) but with the hope that it would also make zfs send 
usable.


Exactly how does build 105 translate to Solaris 10 10/08?  My current 
speed test has sent 34Gb in 24 hours, which isn't great. Perhaps the 
next version of Solaris 10 will have the improvements.




Robert Milkowski wrote:

Hello Jorgen,

If you look at the list archives you will see that it made a huge
difference for some people including me. Now I'm easily able to
saturate GbE link while zfs send|recv'ing.


Since build 105 it should be *MUCH* faster.



--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] User quota design discussion..

2009-03-14 Thread Jorgen Lundman
Sorry, I did not mean it as a complaint; it just has been slow for us. But if 
it has been made faster, that would be excellent. ZFS send is very powerful.


Lund


Robert Milkowski wrote:

Hello Jorgen,

Friday, March 13, 2009, 1:14:12 AM, you wrote:

JL That is a good point, I had not even planned to support quotas for ZFS
JL send, but consider a rescan to be the answer.  We don't ZFS send very 
JL often as it is far too slow.


Since build 105 it should be *MUCH* faster.




--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] User quota design discussion..

2009-03-12 Thread Jorgen Lundman



Bob Friesenhahn wrote:
In order for this to work, ZFS data blocks need to somehow be associated 
with a POSIX user ID.  To start with, the ZFS POSIX layer is implemented 
on top of a non-POSIX Layer which does not need to know about POSIX user 
IDs.  ZFS also supports snapshots and clones.


This I did not know, but now that you point it out, this would be the 
right way to design it. So the advantage of requiring less ZFS 
integration is no longer the case.


Lund

--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] User quota design discussion..

2009-03-12 Thread Jorgen Lundman



Eric Schrock wrote:

Note that:

6501037 want user/group quotas on ZFS 


Is already committed to be fixed in build 113 (i.e. in the next month).

- Eric


Wow, that would be fantastic. We have the Sun vendors camped out at the 
data center trying to apply fresh patches. I believe 6798540 fixed the 
largest issue but it would be desirable to be able to use just ZFS.


Is this a project needing donations? I see your address is at Sun.com, 
and we already have 9 x4500s, but maybe you need some pocky, asse, 
collon or pocari sweat...



Lundy


[1]
BugID:6798540
 3-way deadlock happens in ufs filesystem on zvol when writing ufs log

--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] User quota design discussion..

2009-03-12 Thread Jorgen Lundman




As it turns out, I'm working on zfs user quotas presently, and expect to 
integrate in about a month.  My implementation is in-kernel, integrated 
with the rest of ZFS, and does not have the drawbacks you mention below.


I merely suggested my design as it may have been something I _could_ 
have implemented, as it required little ZFS knowledge. (Adding hooks is 
usually easier). But naturally that has already been shown not to be 
the case.


A proper implementation is always going to be much more desirable :)





Good, that's the behavior that user quotas will have -- delayed 
enforcement.


There probably are situations where precision is required, or perhaps 
historical reasons, but for us a delayed enforcement may even be better.


Perhaps it would be better for the delivery of an email message that 
goes over the quota to be allowed to complete writing the entire 
message, than to abort a write() call somewhere in the middle and 
return failures all the way back to generating a bounce message. 
Maybe... can't say I have thought about it.




My implementation does not have this drawback.  Note that you would need 
to use the recovery mechanism in the case of a system crash / power loss 
as well.  Adding potentially hours to the crash recovery time is not 
acceptable.


Great! Will there be any particular limits on how many uids, or size of 
uids, in your implementation? UFS generally does not, but I did note that 
if uids go over 1000 it flips out and changes the quotas file to 
128GB in size.
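
(Back of the envelope, assuming the classic 32-byte struct dqblk 
entries: the UFS quotas file is indexed directly by uid, so its size 
scales with the highest uid rather than the number of users; a uid up 
near 2^32 implies a file of about 2^32 * 32 bytes = 128GB.)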



Not to mention that this information needs to get stored somewhere, and 
dealt with when you zfs send the fs to another system.


That is a good point, I had not even planned to support quotas for ZFS 
send, but consider a rescan to be the answer.  We don't ZFS send very 
often as it is far too slow.


Lund

--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] User quota design discussion..

2009-03-11 Thread Jorgen Lundman


In the style of a discussion over a beverage, and talking about 
user-quotas on ZFS, I recently pondered a design for implementing user 
quotas on ZFS after having far too little sleep.


It is probably nothing new, but I would be curious what you experts 
think of the feasibility of implementing such a system and/or whether or 
not it would even realistically work.


I'm not suggesting that someone should do the work, or even that I will, 
but rather in the interest of chatting about it.


Feel free to ridicule me as required! :)

Thoughts:

Here at work we would like to have user quotas based on uid (and 
presumably gid) to be able to fully replace the NetApps we run. Current 
ZFS quotas are not good enough for our situation. We simply can not mount 
500,000 file-systems on all the NFS clients. Nor do all servers we run 
support mirror-mounts. Nor does auto-mount see newly created directories 
without a full remount.


Current UFS-style-user-quotas are very exact. To the byte even. We do 
not need this precision. If a user has 50MB of quota, and they are able 
to reach 51MB usage, then that is acceptable to us. Especially since 
they have to go under 50MB to be able to write new data, anyway.


Instead of having complicated code in the kernel layer, slowing down the 
file-system with locking and semaphores (and perhaps avoiding learning 
in-depth ZFS code?), I was wondering if a more simplistic setup could be 
designed, that would still be acceptable. I will use the word 
'acceptable' a lot. Sorry.


My thoughts are that the ZFS file-system will simply write a 
'transaction log' on a pipe. By transaction log I mean uid, gid and 
'byte count changed'. And by pipe I don't necessarily mean pipe(2), but 
it could be a fifo, pipe or socket. But currently I'm thinking 
'/dev/quota' style.


User-land will then have a daemon, whether or not it is one daemon per 
file-system or really just one daemon does not matter. This process will 
open '/dev/quota' and empty the transaction log entries constantly. Take 
the uid,gid entries and update the byte-count in its database. How we 
store this database is up to us, but since it is in user-land it should 
have more flexibility, and is not as critical to be fast as it would 
have to be in kernel.


The daemon process can also grow in number of threads as demand increases.
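
As a toy illustration only (the /dev/quota device and its record format 
are of course made up), the consumer side could start life as little 
more than:

#!/bin/sh
# read "uid gid byte-delta" records and warn once per uid over a 50MB
# limit; a real daemon would update kernel blacklist state instead of printing
nawk '{ used[$1] += $3 }
      used[$1] > 52428800 && !flagged[$1] { flagged[$1] = 1
          print "uid", $1, "over quota" }' < /dev/quota

and only later grow the database, the un-blacklisting and the rescan 
logic around it.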

Once a user's quota reaches the limit (note here that /the/ call to 
write() that goes over the limit will succeed, and probably a couple 
more after. This is acceptable) the process will blacklist the uid in 
kernel. Future calls to creat/open(CREAT)/write/(insert list of calls) 
will be denied. Naturally calls to unlink/read etc should still succeed. 
If the uid goes under the limit, the uid black-listing will be removed.


If the user-land process crashes or dies, for whatever reason, the 
buffer of the pipe will grow in the kernel. If the daemon is restarted 
sufficiently quickly, all is well, it merely needs to catch up. If the 
pipe does ever get full and items have to be discarded, a full-scan will 
be required of the file-system. Since even with UFS quotas we need to 
occasionally run 'quotacheck', it would seem this too, is acceptable (if 
undesirable).


If you have no daemon process running at all, you have no quotas at all. 
But the same can be said about quite a few daemons. The administrators 
need to adjust their usage.


I can see a complication with doing a rescan. How could this be done 
efficiently? I don't know if there is a neat way to make this happen 
internally to ZFS, but from a user-land only point of view, perhaps a 
snapshot could be created (synchronised with the /dev/quota pipe 
reading?) and start a scan on the snapshot, while still processing 
kernel log. Once the scan is complete, merge the two sets.


Advantages are that only small hooks are required in ZFS. The byte 
updates, and the blacklist with checks for being blacklisted.


Disadvantages are the loss of precision, and possibly slower 
rescans? Sanity?


But I do not really know the internals of ZFS, so I might be completely 
wrong, and everyone is laughing already.


Discuss?

Lund

--
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Introducing zilstat

2009-02-04 Thread Jorgen Lundman

Interesting, but what does it mean :)


The x4500 for mail (NFS vers=3 on ufs on zpool with quotas):

# ./zilstat.ksh
N-Bytes  N-Bytes/s  N-Max-Bytes/s  B-Bytes  B-Bytes/s  B-Max-Bytes/s
 376720     376720         376720  1286144    1286144        1286144
 419608     419608         419608  1368064    1368064        1368064
 555256     555256         555256  1732608    1732608        1732608
 538808     538808         538808  1679360    1679360        1679360
 626048     626048         626048  1773568    1773568        1773568
 753824     753824         753824  2105344    2105344        2105344
 652632     652632         652632  1716224    1716224        1716224

Fairly constantly between 1-2MB/s. That doesn't sound too bad though. 
It's only got 400 nfsd threads at the moment, but peaks at 1024. 
Incidentally, what is the highest recommended nfsd_threads for a x4500 
anyway?

Lund



Marion Hakanson wrote:
 The zilstat tool is very helpful, thanks!
 
 I tried it on an X4500 NFS server, while extracting a 14MB tar archive,
 both via an NFS client, and locally on the X4500 itself.  Over NFS,
 said extract took ~2 minutes, and showed peaks of 4MB/sec buffer-bytes
 going through the ZIL.
 
 When run locally on the X4500, the extract took about 1 second, with
 zilstat showing all zeroes.  I wonder if this is a case where that
 ZIL bypass kicks in for 32K writes, in the local tar extraction.
 Does zilstat's underlying dtrace include these bypass-writes in the
 totals it displays?
 
 I think if it's possible to get stats on this bypassed data, I'd like
 to see it as another column (or set of columns) in the zilstat output.
 
 Regards,
 
 Marion
 
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 

-- 
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Introducing zilstat

2009-02-04 Thread Jorgen Lundman


Richard Elling wrote:

 # ./zilstat.ksh
 N-Bytes  N-Bytes/s  N-Max-Bytes/s  B-Bytes  B-Bytes/s  B-Max-Bytes/s
  376720     376720         376720  1286144    1286144        1286144
  419608     419608         419608  1368064    1368064        1368064
  555256     555256         555256  1732608    1732608        1732608
  538808     538808         538808  1679360    1679360        1679360
  626048     626048         626048  1773568    1773568        1773568
  753824     753824         753824  2105344    2105344        2105344
  652632     652632         652632  1716224    1716224        1716224

 Fairly constantly between 1-2MB/s. That doesn't sound too bad though.   
 
 I think your workload would benefit from a fast, separate log device.

Interesting. Today is the first I've heard about it... one of the x4500s 
is really, really slow, something like 15 seconds to do an unlink. But I 
assumed it was because the ufs inside the zvol is _really_ bloated. Maybe we 
need to experiment with it on the test x4500.
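
(For the test box it is a one-liner to try, device name made up:

# zpool add zpool1 log c5t4d0

though if I remember right the Solaris 10 releases we are on cannot 
remove a log device again once it has been added, so definitely a 
test-pool experiment first.)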


 
 Highest recommended is what you need to get the job done.
 For the most part, the defaults work well.  But you can experiment
 with them and see if you can get better results.

It came shipped with 16. And I'm sorry but 16 didn't cut it at all :) We 
set it at 1024 as it was the highest number I found via Google.
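
(For reference, on this Solaris 10 release the knob lives in 
/etc/default/nfs, assuming nothing has moved it:

# vi /etc/default/nfs              (set NFSD_SERVERS=1024)
# svcadm restart svc:/network/nfs/server
)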

Lund


-- 
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Replacing HDD in x4500

2009-02-03 Thread Jorgen Lundman

I've been told we got a BugID:

3-way deadlock happens in ufs filesystem on zvol when writing ufs log

but I can not view the BugID yet (presumably due to my account's weak 
credentials).

Perhaps it isn't something we are doing wrong; that would be a nice change.

Lund


Jorgen Lundman wrote:
 I assume you've changed the failmode to continue already?

 http://prefetch.net/blog/index.php/2008/03/01/configuring-zfs-to-gracefully-deal-with-failures/
  
 
 This appears to be new to 10/08, so that is another vote to upgrade. 
 Also interesting that the default is wait, since it almost behaves 
 like it. Not sure why it would block zpool, zfs and df commands as 
 well though?
 
 
 Lund
 
 

-- 
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Replacing HDD in x4500

2009-01-27 Thread Jorgen Lundman

Thanks for your reply,

While the savecore is working its way up the chain to (hopefully) Sun, 
the vendor asked us not to use it, so we moved x4500-02 to use x4500-04 
and x4500-05. But perhaps moving to Sol 10 10/08 on x4500-02 when fixed 
is the way to go.

The savecore had the usual info, that everything is blocked waiting on 
locks:


   601*  threads trying to get a mutex (598 user, 3 kernel)
   longest sleeping 10 minutes 13.52 seconds earlier
   115*  threads trying to get an rwlock (115 user, 0 kernel)

1678   total threads in allthreads list (1231 user, 447 kernel)
10   thread_reapcnt
 0   lwp_reapcnt
1688   nthread

   thread pri pctcpu   idle   PID  wchan 
command
   0xfe8000137c80  60  0.000  -9m44.88s 0 0xfe84d816cdc8 
sched
   0xfe800092cc80  60  0.000  -9m44.52s 0 0xc03c6538 
sched
   0xfe8527458b40  59  0.005  -1m41.38s  1217 0xb02339e0 
/usr/lib/nfs/rquotad
   0xfe8527b534e0  60  0.000   -5m4.79s   402 0xfe84d816cdc8 
/usr/lib/nfs/lockd
   0xfe852578f460  60  0.000  -4m59.79s   402 0xc0633fc8 
/usr/lib/nfs/lockd
   0xfe8532ad47a0  60  0.000  -10m4.40s   623 0xfe84bde48598 
/usr/lib/nfs/nfsd
   0xfe8532ad3d80  60  0.000  -10m9.10s   623 0xfe84d816ced8 
/usr/lib/nfs/nfsd
   0xfe8532ad3360  60  0.000  -10m3.77s   623 0xfe84d816cde0 
/usr/lib/nfs/nfsd
   0xfe85341e9100  60  0.000  -10m6.85s   623 0xfe84bde48428 
/usr/lib/nfs/nfsd
   0xfe85341e8a40  60  0.000  -10m4.76s   623 0xfe84d816ced8 
/usr/lib/nfs/nfsd

SolarisCAT(vmcore.0/10X) tlist sobj locks | grep nfsd | wc -l
  680

scl_writer = 0xfe8000185c80  - locking thread



thread 0xfe8000185c80
 kernel thread: 0xfe8000185c80  PID: 0 
cmd: sched
t_wchan: 0xfbc8200a  sobj: condition var (from genunix:bflush+0x4d)
t_procp: 0xfbc22dc0(proc_sched)
   p_as: 0xfbc24a20(kas)
   zone: global
t_stk: 0xfe8000185c80  sp: 0xfe8000185aa0  t_stkbase: 
0xfe8000181000
t_pri: 99(SYS)  pctcpu: 0.00
t_lwp: 0x0  psrset: 0  last CPU: 0
idle: 44943 ticks (7 minutes 29.43 seconds)
start: Tue Jan 27 23:44:21 2009
age: 674 seconds (11 minutes 14 seconds)
tstate: TS_SLEEP - awaiting an event
tflg:   T_TALLOCSTK - thread structure allocated from stk
tpflg:  none set
tsched: TS_LOAD - thread is in memory
 TS_DONT_SWAP - thread/LWP should not be swapped
pflag:  SSYS - system resident process

pc:  0xfb83616f unix:_resume_from_idle+0xf8 resume_return
startpc: 0xeff889e0 zfs:spa_async_thread+0x0

unix:_resume_from_idle+0xf8 resume_return()
unix:swtch+0x12a()
genunix:cv_wait+0x68()
genunix:bflush+0x4d()
genunix:ldi_close+0xbe()
zfs:vdev_disk_close+0x6a()
zfs:vdev_close+0x13()
zfs:vdev_raidz_close+0x26()
zfs:vdev_close+0x13()
zfs:vdev_reopen+0x1d()
zfs:spa_async_reopen+0x5f()
zfs:spa_async_thread+0xc8()
unix:thread_start+0x8()
-- end of kernel thread's stack --




Blake wrote:
 I'm not an authority, but on my 'vanilla' filer, using the same
 controller chipset as the thumper, I've been in really good shape
 since moving to zfs boot in 10/08 and doing 'zpool upgrade' and 'zfs
 upgrade' to all my mirrors (3 3-way).  I'd been having similar
 troubles to yours in the past.
 
 My system is pretty puny next to yours, but it's been reliable now for
 slightly over a month.
 
 
 On Tue, Jan 27, 2009 at 12:19 AM, Jorgen Lundman lund...@gmo.jp wrote:
 The vendor wanted to come in and replace an HDD in the 2nd X4500, as it
 was constantly busy, and since our x4500 has always died miserably in
 the past when a HDD dies, they wanted to replace it before the HDD
 actually died.

 The usual was done, HDD replaced, resilvering started and ran for about
 50 minutes. Then the system hung, same as always, all ZFS related
 commands would just hang and do nothing. System is otherwise fine and
 completely idle.

 The vendor for some reason decided to fsck root-fs, not sure why as it
 is mounted with logging, and also decided it would be best to do so
 from a CDRom boot.

 Anyway, that was 12 hours ago and the x4500 is still down. I think they
 have it at single-user prompt resilvering again. (I also noticed they'd
 decided to break the mirror of the root disks for some very strange
 reason). It still shows:

    raidz1          DEGRADED     0     0     0
      c0t1d0        ONLINE       0     0     0
      replacing     UNAVAIL      0     0     0  insufficient replicas
        c1t1d0s0/o  OFFLINE      0     0     0
        c1t1d0      UNAVAIL      0     0     0  cannot open

 So I am pretty sure it'll hang again sometime soon. What is interesting
 though is that this is on x4500-02, and all our previous troubles mailed
 to the list was regarding our first x4500. The hardware is all
 different, but identical. Solaris 10 5/08.

 Anyway, I think they want to boot CDrom

Re: [zfs-discuss] Replacing HDD in x4500

2009-01-27 Thread Jorgen Lundman
 
 I assume you've changed the failmode to continue already?
 
 http://prefetch.net/blog/index.php/2008/03/01/configuring-zfs-to-gracefully-deal-with-failures/
  

This appears to be new to 10/08, so that is another vote to upgrade. 
Also interesting that the default is wait, since it almost behaves 
like it. Not sure why it would block zpool, zfs and df commands as 
well though?
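
Once the pools are on a new enough version it should just be (pool name 
as used earlier in the thread):

# zpool set failmode=continue zpool1
# zpool get failmode zpool1

failmode takes wait, continue or panic, with wait being the default we 
are apparently seeing.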


Lund


-- 
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Replacing HDD in x4500

2009-01-26 Thread Jorgen Lundman

The vendor wanted to come in and replace an HDD in the 2nd X4500, as it 
was constantly busy, and since our x4500 has always died miserably in 
the past when a HDD dies, they wanted to replace it before the HDD 
actually died.

The usual was done, HDD replaced, resilvering started and ran for about 
50 minutes. Then the system hung, same as always, all ZFS related 
commands would just hang and do nothing. System is otherwise fine and 
completely idle.

The vendor for some reason decided to fsck root-fs, not sure why as it 
is mounted with logging, and also decided it would be best to do so 
from a CDRom boot.

Anyway, that was 12 hours ago and the x4500 is still down. I think they 
have it at single-user prompt resilvering again. (I also noticed they'd 
decided to break the mirror of the root disks for some very strange 
reason). It still shows:

   raidz1          DEGRADED     0     0     0
     c0t1d0        ONLINE       0     0     0
     replacing     UNAVAIL      0     0     0  insufficient replicas
       c1t1d0s0/o  OFFLINE      0     0     0
       c1t1d0      UNAVAIL      0     0     0  cannot open

So I am pretty sure it'll hang again sometime soon. What is interesting 
though is that this is on x4500-02, and all our previous troubles mailed 
to the list were regarding our first x4500. The hardware is physically 
separate, but identical in configuration. Solaris 10 5/08.

Anyway, I think they want to boot CDrom to fsck root again for some 
reason, but since customers have been without their mail for 12 hours, 
they can go a little longer, I guess.

What I was really wondering is: has there been any progress or patches 
regarding the system always hanging whenever a HDD dies (or is replaced, 
it seems)? It really is rather frustrating.

Lund

-- 
Jorgen Lundman   | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 vs AVS ?

2008-09-16 Thread Jorgen Lundman

Sorry, I popped up to Hokkaido for a holiday. I want to thank you all 
for the replies.

I mentioned AVS as I thought it to be the only product close to 
enabling us to do a (makeshift) fail-over setup.

We have 5-6 ZFS filesystems, and 5-6 zvols with UFS (for quotas). Doing 
zfs send snapshots every minute might perhaps be possible (just not 
very attractive), but if the script dies at any time you need to resend 
the full volumes, which currently takes 5 days. (Even using nc.)

Since we are forced by vendor to run Sol10, it sounds like AVS is not an 
option for us.

If we were interested in finding a method to replicate data to a 2nd 
x4500, what other options are there for us? We do not need instant 
updates, just someplace to fail-over to when the x4500 panics, or a HDD 
dies. (Which equals panic) It currently takes 2 hours to fsck the UFS 
volumes after a panic (and yes, they are logging; it is actually just 
the one UFS volume that always needs fsck).

The vendor has mentioned Veritas Volume Replicator, but I was under the 
impression that Veritas is a whole different stack from zfs/zpool.

Lund




Jim Dunham wrote:
 On Sep 11, 2008, at 5:16 PM, A Darren Dunham wrote:
 On Thu, Sep 11, 2008 at 04:28:03PM -0400, Jim Dunham wrote:
 On Sep 11, 2008, at 11:19 AM, A Darren Dunham wrote:

 On Thu, Sep 11, 2008 at 10:33:00AM -0400, Jim Dunham wrote:
 The issue with any form of RAID 1, is that the instant a disk  
 fails
 out of the RAID set, with the next write I/O to the remaining  
 members
 of the RAID set, the failed disk (and its replica) are instantly  
 out
 of sync.
 Does raidz fall into that category?
 Yes. The key reason is that as soon as ZFS (or other mirroring  
 software)
 detects a disk failure in a RAID 1 set, it will stop writing to the
 failed disk, which also means it will also stop writing to the  
 replica of
 the failed disk. From the point of view of the remote node, the  
 replica
 of the failed disk is no longer being updated.

 Now if replication was stopped, or the primary node powered off or
 panicked, during the import of the ZFS storage pool on the secondary
 node, the replica of the failed disk must not be part of the ZFS  
 storage
 pool as its data is stale. This happens automatically, since the ZFS
 metadata on the remaining disks have already given up on this  
 member of
 the RAID set.
 Then I misunderstood what you were talking about.  Why the restriction
 on RAID 1 for your statement?
 
 No restriction. I meant to say, RAID 1 or greater.
 
 Even for a mirror, the data is stale and
 it's removed from the active set.  I thought you were talking about
 block parity run across columns...

 -- 
 Darren
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 
 Jim Dunham
 Engineering Manager
 Storage Platform Software Group
 Sun Microsystems, Inc.
 work: 781-442-4042
 cell: 603.724.2972
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 

-- 
Jorgen Lundman   | [EMAIL PROTECTED]
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] x4500 vs AVS ?

2008-09-03 Thread Jorgen Lundman

If we get two x4500s, and look at AVS, would it be possible to:

1) Set up AVS to replicate zfs and zvol (ufs) from 01 to 02? Is this 
supported by Sol 10 5/08?


Assuming 1 works, we would set up a home-made IP fail-over so that, should 
01 go down, all clients are redirected to 02.


2) Fail-back, are there methods in AVS to handle fail-back? Since 02 has 
been used, it will have newer/modified files, and will need to replicate 
backwards until synchronised, before fail-back can occur.


We did ask our vendor, but we were just told that AVS does not support 
x4500.


Lund

-- 
Jorgen Lundman   | [EMAIL PROTECTED]
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 dead HDD, hung server, unable to boot.

2008-08-11 Thread Jorgen Lundman

So it does appear that it is zpool that hangs, possibly during 
resilvering (we lost a HDD at midnight, which is what started all this).

After boot:

x4500-02:~# zpool status -x
   pool: zpool1
  state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
 continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scrub: resilver in progress, 11.10% done, 2h11m to go
config:

 NAME              STATE     READ WRITE CKSUM
 zpool1            DEGRADED     0     0     0
   raidz1          ONLINE       0     0     0
[snip]
     c7t3d0        ONLINE       0     0     0
     replacing     UNAVAIL      0     0     0  insufficient replicas
       c8t3d0s0/o  UNAVAIL      0     0     0  cannot open
       c8t3d0      UNAVAIL      0     0     0  cannot open
   raidz1          ONLINE       0     0     0


You can run zpool commands for about 4-5 minutes, then they start to hang. For 
example, I tried to issue:

# zpool offline zpool1 c8t3d0

.. and the system stops responding.

# mdb -k

::ps!grep pool

R732722732662  0 0x4a004000 b92a8030 zpool

  b92a8030::walk thread|::findstack -v
stack pointer for thread fe85285d07e0: fe800283fc40
[ fe800283fc40 _resume_from_idle+0xf8() ]
   fe800283fc70 swtch+0x12a()
   fe800283fc90 cv_wait+0x68()
   fe800283fcc0 spa_config_enter+0x50()
   fe800283fce0 spa_vdev_enter+0x2a()
   fe800283fd10 vdev_offline+0x29()
   fe800283fd40 zfs_ioc_vdev_offline+0x58()
   fe800283fd80 zfsdev_ioctl+0x13e()
   fe800283fd90 cdev_ioctl+0x1d()
   fe800283fdb0 spec_ioctl+0x50()
   fe800283fde0 fop_ioctl+0x25()
   fe800283fec0 ioctl+0xac()
   fe800283ff10 sys_syscall32+0x101()


Similarly, nfs:

  ::ps!grep nfsd
R548  1548548  1 0x4200 b92ad6d0 nfsd
  b92ad6d0::walk thread|::findstack -v
stack pointer for thread 9af8e540: fe8001046cc0
[ fe8001046cc0 _resume_from_idle+0xf8() ]
   fe8001046cf0 swtch+0x12a()
   fe8001046d40 cv_wait_sig_swap_core+0x177()
   fe8001046d50 cv_wait_sig_swap+0xb()
   fe8001046da0 cv_waituntil_sig+0xd7()
   fe8001046e50 poll_common+0x420()
   fe8001046ec0 pollsys+0xbe()
   fe8001046f10 sys_syscall32+0x101()





-- 
Jorgen Lundman   | [EMAIL PROTECTED]
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 dead HDD, hung server, unable to boot.

2008-08-11 Thread Jorgen Lundman
ok, so I tried installing 138053-02, and unmounting/unsharing for the 
entire resilvering process,

meanwhile, onsite support decided to replace the mainboard for some 
reason (not that I was full of confidence here)

... and between us, it has actually been up for 2 hours, and has a clean 
zpool status.

Going to get some sleep, and really hope it has been fixed. Thank you to 
everyone who helped.

Lund


Jorgen Lundman wrote:
 
 Jorgen Lundman wrote:
 Anyway, it has almost rebooted, so I need to go remount everything.

 
 Not that it wants to stay up for longer than ~20 mins, then hangs. In 
 that all IO hangs, including nfsd.
 
 I thought this might have been related:
 
 http://sunsolve.sun.com/search/document.do?assetkey=1-66-233341-1
 
 # /usr/X11/bin/scanpci | /usr/sfw/bin/ggrep -A1 vendor 0x11ab device
 0x6081
 pci bus 0x0001 cardnum 0x01 function 0x00: vendor 0x11ab device 0x6081
   Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller
 
 But it claims resolved for our version:
 
 SunOS x4500-02.unix 5.10 Generic_127128-11 i86pc i386 i86pc
 
 Perhaps I should see if there are any recommended patches for Sol 10 5/08?
 
 
 Lund
 

-- 
Jorgen Lundman   | [EMAIL PROTECTED]
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] x4500 dead HDD, hung server, unable to boot.

2008-08-10 Thread Jorgen Lundman

SunOS x4500-02.unix 5.10 Generic_127128-11 i86pc i386 i86pc


Admittedly we are not having much luck with the x4500s.

This time it was the new x4500, running Solaris 10 5/08. Drive 
/[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL 
PROTECTED]/[EMAIL PROTECTED],0 (sd30): stopped 
responding, and even after a hard reset, it would simply repeat 
retryable, reset, and fatal messages forever.

So unable to login on console. Again we ended up with the problem of 
knowing which HDD that actually is broken. Turns out to be drive #40. 
(Has anyone got a map we can print? Since we couldn't boot it, any Unix 
commands needed to map are a bit useless, nor do we have a hd utility).

That a HDD died in the first month of operation is understandable, but 
does it really have to take the whole server with it? Not to mention 
stop it from booting. Eventually the NOC staff guessed the correct drive 
from the blinking of LEDs (no LED was RED), and we were able to boot.

Log outputs:

Aug 11 08:47:59 x4500-02.unix marvell88sx: [ID 670675 kern.info] NOTICE: 
marvell88sx5: device on port 3 reset: device disconnected or device error
Aug 11 08:47:59 x4500-02.unix sata: [ID 801593 kern.notice] NOTICE: 
/[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Aug 11 08:47:59 x4500-02.unix  port 3: device reset
Aug 11 08:47:59 x4500-02.unix sata: [ID 801593 kern.notice] NOTICE: 
/[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Aug 11 08:47:59 x4500-02.unix  port 3: link lost
Aug 11 08:47:59 x4500-02.unix sata: [ID 801593 kern.notice] NOTICE: 
/[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Aug 11 08:47:59 x4500-02.unix  port 3: link established
Aug 11 08:47:59 x4500-02.unix marvell88sx: [ID 812950 kern.warning] 
WARNING: marvell88sx5: error on port 3:
Aug 11 08:47:59 x4500-02.unix marvell88sx: [ID 517869 kern.info] 
device error
Aug 11 08:47:59 x4500-02.unix marvell88sx: [ID 517869 kern.info] 
device disconnected
Aug 11 08:47:59 x4500-02.unix marvell88sx: [ID 517869 kern.info] 
device connected
Aug 11 08:47:59 x4500-02.unix marvell88sx: [ID 517869 kern.info] 
EDMA self disabled
Aug 11 08:47:59 x4500-02.unix scsi: [ID 107833 kern.warning] WARNING: 
/[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL 
PROTECTED],0 (sd30):
Aug 11 08:47:59 x4500-02.unix   Error for Command: read 
Error Level: Retryable
Aug 11 08:47:59 x4500-02.unix scsi: [ID 107833 kern.notice] 
Requested Block: 439202Error Block: 439202
Aug 11 08:47:59 x4500-02.unix scsi: [ID 107833 kern.notice] Vendor: 
ATASerial Number:
Aug 11 08:47:59 x4500-02.unix scsi: [ID 107833 kern.notice] Sense 
Key: No Additional Sense
Aug 11 08:47:59 x4500-02.unix scsi: [ID 107833 kern.notice] ASC: 0x0 
(no additional sense info), ASCQ: 0x0, FRU: 0x0


scrub: resilver in progress, 10.27% done, 2h14m to go



Perhaps not related, but equally annoying:

# fmdump
TIME UUID SUNW-MSG-ID
Aug 11 08:16:32.3925 64da6f29-4dda-44aa-e9ca-ad7054aaeaa1 ZFS-8000-D3
Aug 11 09:08:18.7834 086e6170-e4c7-c66b-c908-e37840db7e96 ZFS-8000-D3

# fmdump -v -u 086e6170-e4c7-c66b-c908-e37840db7e96
TIME UUID SUNW-MSG-ID
Aug 11 09:08:18.7834 086e6170-e4c7-c66b-c908-e37840db7e96 ZFS-8000-D3
^C^Z^\

Alas, kill -9 does not kill fmdump either, and it appears to lock the 
server (as well). I will remove the command for now, as it definitely 
hangs the server every time. Hard reset done again.

Lund



-- 
Jorgen Lundman   | [EMAIL PROTECTED]
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 dead HDD, hung server, unable to boot.

2008-08-10 Thread Jorgen Lundman

 the 'hd' utility on the tools and drivers cd produces the attached 
 output on thumper.
 

Clearly I need to find and install this utility, but even then, that 
seems to just add yet another way to number the drives.

The message I get from kernel is:

/[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL 
PROTECTED]/[EMAIL PROTECTED],0 (sd30):

And I need to get the answer 40. The hd output additionally gives me 
sdar  ?

Lund

-- 
Jorgen Lundman   | [EMAIL PROTECTED]
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 dead HDD, hung server, unable to boot.

2008-08-10 Thread Jorgen Lundman

 See http://www.sun.com/servers/x64/x4500/arch-wp.pdf page 21.
 Ian

Referring to Page 20? That does show the drive order, just like it does 
on the box, but not how to map them from the kernel message to drive 
slot number.

Lund


-- 
Jorgen Lundman   | [EMAIL PROTECTED]
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 dead HDD, hung server, unable to boot.

2008-08-10 Thread Jorgen Lundman

 Does the SATA controller show any information in its log (if you go into 
 the controller BIOS, if there is one)?
 
 Seeing more reports of full systems hangs from an unresponsive drive 
 makes me very concerned about bring a 4500 into our environment  :(
 

Not that I can see. Rebooting the new x4500 for the 6th time now as it 
keeps hanging on IO. (Box is 100% idle, but any IO commands like 
zpool/zfs/fmdump etc will just hang). I have absolutely no idea why it 
hangs now, we have pulled out the replacement drive to see if it stays 
up (in case it is a drive channel problem).

The most disappointing aspect of all this is the incredibly poor 
support we have had from our vendor (compared to the NetApp support we 
have had in the past). I would have thought being the biggest ISP in 
Japan would mean we'd be interesting to Sun, even if just a little bit. 
I suspect we are one of the first to try the x4500 here as well.

Anyway, it has almost rebooted, so I need to go remount everything.

Lund

-- 
Jorgen Lundman   | [EMAIL PROTECTED]
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 dead HDD, hung server, unable to boot.

2008-08-10 Thread Jorgen Lundman


Jorgen Lundman wrote:
 
 Anyway, it has almost rebooted, so I need to go remount everything.
 

Not that it wants to stay up for longer than ~20 mins, then hangs. In 
that all IO hangs, including nfsd.

I thought this might have been related:

http://sunsolve.sun.com/search/document.do?assetkey=1-66-233341-1

# /usr/X11/bin/scanpci | /usr/sfw/bin/ggrep -A1 vendor 0x11ab device
0x6081
pci bus 0x0001 cardnum 0x01 function 0x00: vendor 0x11ab device 0x6081
  Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller

But it claims resolved for our version:

SunOS x4500-02.unix 5.10 Generic_127128-11 i86pc i386 i86pc

Perhaps I should see if there are any recommended patches for Sol 10 5/08?


Lund

-- 
Jorgen Lundman   | [EMAIL PROTECTED]
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 dead HDD, hung server, unable to boot.

2008-08-10 Thread Jorgen Lundman


James C. McPherson wrote:
 One question to ask is: are you seeing the same messages
 on your system that are shown in that Sunsolve doc? Not
 just the write errors, but the whole sequence.

Unfortunately, I get no messages at all. I/O just stops. But login 
shells are fine, as long as I don't issue commands that query zfs/zpool 
in any way. Nothing on console, dmesg, or the various log files. Just 
booted with -k since it happens so frequently.

Most likely it is not related to that bug. Having to do hard resets 
(well, from the ILOM) doesn't feel good.


 Can you force a crash dump when the system hangs? If you
 can, then you could provide that to the support engineer
 who has accepted the call you've already logged with Sun's
 support organisation.
 
 You _did_ log a call, didn't you?

I will force a crash dump next time (30 mins or so from now). We can only 
log a call with our vendor, and if they feel like it, they will push it on 
to Sun. Although, since we do have SunSolve logins, can we bypass the 
middleman, avoid the whole translation fiasco, and log directly with Sun?
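
For the record, my rough plan for capturing the dump is nothing exotic, 
just the standard tools (a sketch):

# dumpadm                  (confirm the dump device and that savecore is enabled)
# reboot -d                (forces a panic + dump, if a shell still responds)

For a hang where nothing responds at all, the deadman timer can panic the 
box for us (goes in /etc/system, needs a reboot, if I remember it right):

set snooping=1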

Lund

-- 
Jorgen Lundman   | [EMAIL PROTECTED]
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Replacing the boot HDDs in x4500

2008-08-01 Thread Jorgen Lundman


Ross wrote:
 I do think a zfs import after booting from the new drives should 
  work fine, and it doesn't automatically upgrade the pool,
  so you can still go back to snv_70b if needed.

Alas, it would be a downgrade, which is why I think it will fail.


 
 PS.  In your first post you said you had no time to copy the filesystem, so 
 why are you trying to use send/receive?  Both rsync and send/receive will 
 take a long time to complete.
  
  

A zfs send of the /zvol/ufs volume would take 2 days. Currently the server 
panics at least once a day, and there appears to be no way to resume a 
half-transferred zfs send. So, I am rsyncing smaller bits instead.

zfs send -i only works if you already have a full copy, which we can't 
get for the reason above.
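
For what it's worth, this is the send/receive pattern I would use if the 
box stayed up long enough (a sketch; dataset name as in my other thread, 
and the target host name is just an example). The first two lines make the 
full copy, the last two catch up with small increments afterwards:

# zfs snapshot zpool1/ufs1@full
# zfs send zpool1/ufs1@full | ssh x4500-02 zfs receive zpool1/ufs1
# zfs snapshot zpool1/ufs1@incr1
# zfs send -i @full zpool1/ufs1@incr1 | ssh x4500-02 zfs receive zpool1/ufs1

But a panic anywhere in that 2-day pipe means starting over from zero, 
which is exactly the problem.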



-- 
Jorgen Lundman   | [EMAIL PROTECTED]
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Replacing the boot HDDs in x4500

2008-08-01 Thread Jorgen Lundman


Ross wrote:
 Not if you don't upgrade the pool it won't.  ZFS can import and work with an 
 old version of the filesystem fine.  The manual page for zpool upgrade says:
 Older versions can continue to be used
 
 Just import it on Solaris 5/08 without doing the upgrade.  Your ZFS pool will 
 be available and can be served out from the new version.  If you do find any 
 problems (which I wouldn't expect to be honest), you can plug your old 
 snv_70b boot disk in if necessary.

The current/old server has ZFS version 2. The new boot HDDs/OS only support 
ZFS version 1. I do not think version 1 code will read a version 2 pool, 
and I see no tool for converting a version 2 pool back to version 1.
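
(For my own notes, this is how I would check it on each box; a sketch 
using the standard commands, so the exact output may differ:)

# zpool upgrade -v     (what the installed ZFS code supports)
# zpool upgrade        (which pools are below that version, and what they are at)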



-- 
Jorgen Lundman   | [EMAIL PROTECTED]
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Replacing the boot HDDs in x4500

2008-07-31 Thread Jorgen Lundman

We are having some issues copying the existing data on our Sol 11 
snv_70b x4500 to the new Sol 10 5/08 x4500. With all the panics and 
crashes, we have had no chance to completely copy even a single filesystem 
(the ETA for that is about 48 hours).

What are the chances that I can zpool import all the filesystems if I were 
to simply drop the two mirrored Sol 10 5/08 boot HDDs into the x4500 
and reboot? I assume the Sol 10 5/08 zpool version would be newer, so in 
theory it should work.
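
What I have in mind is roughly this (a sketch; pool name as in my other 
posts):

# zpool export zpool1       (on the old install, if it will still boot)
# zpool import              (on the new install: just list what it can see)
# zpool import zpool1       (add -f if the pool was never cleanly exported)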

Comments?



-- 
Jorgen Lundman   | [EMAIL PROTECTED]
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 performance tuning.

2008-07-24 Thread Jorgen Lundman
We have had a disk fail in the existing x4500 and it sure froze the 
whole server. I believe it is an OS problem which (should have) been 
fixed in a version newer than the one we have. If you want me to test it 
on the new x4500, since it runs Sol 10 5/08, I can do that.


Ross wrote:
 Hi Jorgen,
 
 This isn't an answer to your problem I'm afraid, but a request for you to do 
 a test when you get your new x4500.
 
 Could you try pulling a SATA drive to see if the system hangs?  I'm finding 
 Solaris just locks up if I pull a drive connected to the Supermicro 
 AOC-SAT2-MV8 card, and I was under the belief that uses the same chipset as 
 the Thumper.
 
 I'm hoping this is just a driver problem, or a problem specific to the 
 Supermicro card, but since our loan x4500 went back to Sun I'm unable to test 
 this myself, and if the x4500's do lock up I'm a bit concerned about how they 
 handle hardware failures.
 
 thanks,
 
 Ross
  
  
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 

-- 
Jorgen Lundman   | [EMAIL PROTECTED]
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 performance tuning.

2008-07-24 Thread Jorgen Lundman

Since we were drowning, we decided to go ahead and reboot with my 
guesses, even though I have not heard any expert opinions on the 
changes. (Also, 3 mins was way underestimated: it takes 12 minutes to 
reboot our x4500.)

The new values are:  (original)

set bufhwm_pct=10(2%)
set maxusers=4096(2048)
set ndquot=5048000   (50480)
set ncsize=1038376   (129797)
set ufs_ninode=1038376   (129797)


It does appear to run better, but it is hard to tell. On 7 out of 10 
tries, statvfs64 takes less than 2 seconds, but I did get as high as 14s.

However, 2 hours later the x4500 hung. Pingable, but no console, nor NFS 
response. The LOM was fine, and I performed a remote reboot.

Since then it has stayed up 5 hours.

PID USERNAME  SIZE   RSS STATE  PRI NICE  TIME  CPU PROCESS/NLWP 

521 daemon   7404K 6896K sleep   60  -20   0:25:03 3.1% nfsd/754
Total: 1 processes, 754 lwps, load averages: 0.82, 0.79, 0.79

CPU states: 90.6% idle,  0.0% user,  9.4% kernel,  0.0% iowait,  0.0% swap
Memory: 16G real, 829M free, 275M swap in use, 16G swap free


  10191915 total name lookups (cache hits 82%)

 maxsize 1038376
 maxsize reached 993770

(I increased it by nearly 10x and it still reports a high 'maxsize reached'.)
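
To keep an eye on it without another reboot, something like this should 
show the live values (a sketch, assuming mdb -k behaves as usual on this 
box):

# echo "ncsize/D" | mdb -k
# echo "ufs_ninode/D" | mdb -k
# kstat -n dnlcstats         (the raw counters behind the vmstat -s percentage)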


Lund


Jorgen Lundman wrote:
 We are having slow performance with the UFS volumes on the x4500. They
 are slow even on the local server. Which makes me think it is (for once) 
 not NFS related.
 
 
 Current settings:
 
 SunOS x4500-01.unix 5.11 snv_70b i86pc i386 i86pc
 
 # cat /etc/release
  Solaris Express Developer Edition 9/07 snv_70b X86
 Copyright 2007 Sun Microsystems, Inc.  All Rights Reserved.
  Use is subject to license terms.
  Assembled 30 August 2007
 
 NFSD_SERVERS=1024
 LOCKD_SERVERS=128
 
 PID USERNAME  SIZE   RSS STATE  PRI NICE  TIME  CPU PROCESS/NLWP
 
   12249 daemon   7204K 6748K sleep   60  -20  54:16:26  14% nfsd/731
 
 load averages:  2.22,  2.32,  2.42 12:31:35
 63 processes:  62 sleeping, 1 on cpu
 CPU states: 68.7% idle,  0.0% user, 31.3% kernel,  0.0% iowait,  0.0% swap
 Memory: 16G real, 1366M free, 118M swap in use, 16G swap free
 
 
 /etc/system:
 
 set ndquot=5048000
 
 
 We have a setup like:
 
 /export/zfs1
 /export/zfs2
 /export/zfs3
 /export/zfs4
 /export/zfs5
 /export/zdev/vol1/ufs1
 /export/zdev/vol2/ufs2
 /export/zdev/vol3/ufs3
 
 What is interesting is that if I run df, it will display everything at 
 normal speed, but pause before vol1/ufs1 file system. truss confirms 
 that statvfs64() is slow (5 seconds usually). All other ZFS and UFS 
 filesystems behave normally. vol1/ufs1 is the most heavily used UFS 
 filesystem.
 
 Disk:
 /dev/zvol/dsk/zpool1/ufs1
 991G   224G   758G23%/export/ufs1
 
 Inodes:
 /dev/zvol/dsk/zpool1/ufs1
  37698475 2504405360%   /export/ufs1
 
 
 
 
 Possible problems:
 
 # vmstat -s
 866193018 total name lookups (cache hits 57%)
 
 # kstat -n inode_cache
 module: ufs instance: 0
 name:   inode_cache class:ufs
   maxsize 129797
   maxsize reached 269060
   thread idles319098740
   vget idles  62136
 
 
 This leads me to think we should consider setting;
 
 set ncsize=259594(doubled... are there better values?)
 set ufs_ninode=259594
 
 in /etc/system, and reboot. But it is costly to reboot based only on my
 guess. Do you have any other suggestions to explore? Will this help?
 
 
 Sincerely,
 
 Jorgen Lundman
 
 

-- 
Jorgen Lundman   | [EMAIL PROTECTED]
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] x4500 performance tuning.

2008-07-23 Thread Jorgen Lundman

We are having slow performance with the UFS volumes on the x4500. They
are slow even on the local server. Which makes me think it is (for once) 
not NFS related.


Current settings:

SunOS x4500-01.unix 5.11 snv_70b i86pc i386 i86pc

# cat /etc/release
 Solaris Express Developer Edition 9/07 snv_70b X86
Copyright 2007 Sun Microsystems, Inc.  All Rights Reserved.
 Use is subject to license terms.
 Assembled 30 August 2007

NFSD_SERVERS=1024
LOCKD_SERVERS=128

PID USERNAME  SIZE   RSS STATE  PRI NICE  TIME  CPU PROCESS/NLWP

  12249 daemon   7204K 6748K sleep   60  -20  54:16:26  14% nfsd/731

load averages:  2.22,  2.32,  2.42 12:31:35
63 processes:  62 sleeping, 1 on cpu
CPU states: 68.7% idle,  0.0% user, 31.3% kernel,  0.0% iowait,  0.0% swap
Memory: 16G real, 1366M free, 118M swap in use, 16G swap free


/etc/system:

set ndquot=5048000


We have a setup like:

/export/zfs1
/export/zfs2
/export/zfs3
/export/zfs4
/export/zfs5
/export/zdev/vol1/ufs1
/export/zdev/vol2/ufs2
/export/zdev/vol3/ufs3

What is interesting is that if I run df, it will display everything at 
normal speed, but pause before vol1/ufs1 file system. truss confirms 
that statvfs64() is slow (5 seconds usually). All other ZFS and UFS 
filesystems behave normally. vol1/ufs1 is the most heavily used UFS 
filesystem.
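
If someone wants numbers beyond truss, I could run something like this (a 
rough dtrace sketch; I have not verified the exact statvfs probe names, 
hence the wildcard):

# dtrace -n 'syscall::*statvfs*:entry { self->t = timestamp }
    syscall::*statvfs*:return /self->t/
    { @ = quantize(timestamp - self->t); self->t = 0; }'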

Disk:
/dev/zvol/dsk/zpool1/ufs1
991G   224G   758G   23%    /export/ufs1

Inodes:
/dev/zvol/dsk/zpool1/ufs1
 37698475 2504405360%   /export/ufs1




Possible problems:

# vmstat -s
866193018 total name lookups (cache hits 57%)

# kstat -n inode_cache
module: ufs instance: 0
name:   inode_cache class:ufs
maxsize 129797
maxsize reached 269060
thread idles319098740
vget idles  62136


This leads me to think we should consider setting:

set ncsize=259594        (doubled... are there better values?)
set ufs_ninode=259594

in /etc/system, and reboot. But it is costly to reboot based only on my
guess. Do you have any other suggestions to explore? Will this help?


Sincerely,

Jorgen Lundman


-- 
Jorgen Lundman   | [EMAIL PROTECTED]
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 performance tuning.

2008-07-23 Thread Jorgen Lundman
 SunOS x4500-01.unix 5.11 snv_70b i86pc i386 i86pc
 That's a very old release, have you considered upgrading?
 Ian.
 

It was the absolute latest version available when we received the x4500, 
and it is now live and supporting a large number of customers. However, 
the 2nd unit will arrive next week (it will be Sol 10 5/08, as that is the 
only/newest OS version the vendor will support).

So yes, in a way we will move to a newer version, if we can work out a 
good way to migrate from one x4500 to another x4500 :)

But in the meantime, we were hoping we could do some kernel tweaking, 
reboot (3 minutes of downtime) and have it perform a little better. It 
would be nice to have someone who knows more than me give their opinion 
as to whether my guesses have any chance of succeeding.

For example, with Postfix delivering mail, system calls like open() and 
fdsync() are currently taking upwards of 7 seconds to complete.
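
I could put numbers on that with something like this, in case anyone wants 
to compare (a rough sketch, assuming the syscall provider exposes an 
fdsync probe):

# dtrace -n 'syscall::fdsync:entry { self->t = timestamp }
    syscall::fdsync:return /self->t/
    { @[execname] = quantize(timestamp - self->t); self->t = 0; }'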

Lund


-- 
Jorgen Lundman   | [EMAIL PROTECTED]
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 panic report.

2008-07-11 Thread Jorgen Lundman

Today we had another panic; at least it was during work time :) Just a 
shame that the 999GB UFS takes 80+ mins to fsck. (Yes, it is mounted with 
'logging'.)







panic[cpu3]/thread=ff001e70dc80:
free: freeing free block, dev:0xb60024, block:13144, ino:1737885, 
fs:/export/saba1


ff001e70d500 genunix:vcmn_err+28 ()
ff001e70d550 ufs:real_panic_v+f7 ()
ff001e70d5b0 ufs:ufs_fault_v+1d0 ()
ff001e70d6a0 ufs:ufs_fault+a0 ()
ff001e70d770 ufs:free+38f ()
ff001e70d830 ufs:indirtrunc+260 ()
ff001e70dab0 ufs:ufs_itrunc+738 ()
ff001e70db60 ufs:ufs_trans_itrunc+128 ()
ff001e70dbf0 ufs:ufs_delete+3b0 ()
ff001e70dc60 ufs:ufs_thread_delete+da ()
ff001e70dc70 unix:thread_start+8 ()

syncing file systems...

panic[cpu3]/thread=ff001e70dc80:
panic sync timeout

dumping to /dev/dsk/c6t0d0s1, offset 65536, content: kernel


  $c
vpanic()
vcmn_err+0x28(3, f783a128, ff001e70d678)
real_panic_v+0xf7(0, f783a128, ff001e70d678)
ufs_fault_v+0x1d0(ff04facf65c0, f783a128, ff001e70d678)
ufs_fault+0xa0()
free+0x38f(ff001e70d8d0, a6a7358, 2000, 89)
indirtrunc+0x260(ff001e70d8d0, a6a42b8, , 0, 89)
ufs_itrunc+0x738(ff0550b9fde0, 0, 81, fffec0594db0)
ufs_trans_itrunc+0x128(ff0550b9fde0, 0, 81, fffec0594db0)
ufs_delete+0x3b0(fffed20e2a00, ff0550b9fde0, 1)
ufs_thread_delete+0xda(64704840)
thread_start+8()

  ::panicinfo
          cpu 3
       thread ff001e70dc80
      message
free: freeing free block, dev:0xb60024, block:13144, ino:1737885, 
fs:/export/saba1
          rdi f783a128
          rsi ff001e70d678
          rdx f783a128
          rcx ff001e70d678
           r8 f783a128
           r9 0
          rax 3
          rbx 0
          rbp ff001e70d4d0
          r10 fffec3d40580
          r11 ff001e70dc80
          r12 f783a128
          r13 ff001e70d678
          r14 3
          r15 f783a128
       fsbase 0
       gsbase fffec3d40580
           ds 4b
           es 4b
           fs 0
           gs 1c3
       trapno 0
          err 0
          rip fb83c860
           cs 30
       rflags 246
          rsp ff001e70d488
           ss 38
       gdt_hi 0
       gdt_lo 81ef
       idt_hi 0
       idt_lo 7fff
          ldt 0
         task 70
          cr0 8005003b
          cr2 fed0e010
          cr3 2c0
          cr4 6f8





Jorgen Lundman wrote:
 On Saturday the X4500 system paniced, and rebooted. For some reason the 
 /export/saba1 UFS partition was corrupt, and needed fsck. This is why 
 it did not come back online. /export/saba1 is mounted logging,noatime, 
 so fsck should never (-ish) be needed.
 
 SunOS x4500-01.unix 5.11 snv_70b i86pc i386 i86pc
 
 /export/saba1 on /dev/zvol/dsk/zpool1/saba1 
 read/write/setuid/devices/intr/largefiles/logging/quota/xattr/noatime/onerror=panic/dev=2d80024
  
 on Sat Jul  5 08:48:54 2008
 
 
 One possible related bug:
 
 http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=4884138
 
 
 What would be the best solution? Go back to latest Solaris 10 and pass 
 it on to Sun support, or find a patch for this problem?
 
 
 
 Panic dump follows:
 
 
 -rw-r--r--   1 root root 2529300 Jul  5 08:48 unix.2
 -rw-r--r--   1 root root 10133225472 Jul  5 09:10 vmcore.2
 
 
 # mdb unix.2 vmcore.2
 Loading modules: [ unix genunix specfs dtrace cpu.AuthenticAMD.15 uppc 
 pcplusmp scsi_vhci ufs md ip hook neti sctp arp usba uhci s1394 qlc fctl 
 nca lofs zfs random cpc crypto fcip fcp logindmux nsctl sdbc ptm sv ii 
 sppp rdc nfs ]
 
   $c
 vpanic()
 vcmn_err+0x28(3, f783ade0, ff001e737aa8)
 real_panic_v+0xf7(0, f783ade0, ff001e737aa8)
 ufs_fault_v+0x1d0(fffed0bfb980, f783ade0, ff001e737aa8)
 ufs_fault+0xa0()
 dqput+0xce(1db26ef0)
 dqrele+0x48(1db26ef0)
 ufs_trans_dqrele+0x6f(1db26ef0)
 ufs_idle_free+0x16d(ff04f17b1e00)
 ufs_idle_some+0x152(3f60)
 ufs_thread_idle+0x1a1()
 thread_start+8()
 
 
   ::cpuinfo
   ID ADDR FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD 
 PROC
0 fbc2fc10  1b00  60   nono t-0 
 ff001e737c80 sched
1 fffec3a0a000  1f10  -1   nono t-0ff001e971c80
   (idle)
2 fffec3a02ac0  1f00  -1   nono t-1ff001e9dbc80
   (idle)
3 fffec3d60580  1f00  -1

[zfs-discuss] x4500 panic report.

2008-07-06 Thread Jorgen Lundman

On Saturday the X4500 system panicked and rebooted. For some reason the 
/export/saba1 UFS partition was corrupt and needed fsck, which is why 
it did not come back online. /export/saba1 is mounted logging,noatime, 
so fsck should never(-ish) be needed.

SunOS x4500-01.unix 5.11 snv_70b i86pc i386 i86pc

/export/saba1 on /dev/zvol/dsk/zpool1/saba1 
read/write/setuid/devices/intr/largefiles/logging/quota/xattr/noatime/onerror=panic/dev=2d80024
 
on Sat Jul  5 08:48:54 2008


One possible related bug:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=4884138


What would be the best solution? Go back to latest Solaris 10 and pass 
it on to Sun support, or find a patch for this problem?



Panic dump follows:


-rw-r--r--   1 root root 2529300 Jul  5 08:48 unix.2
-rw-r--r--   1 root root 10133225472 Jul  5 09:10 vmcore.2


# mdb unix.2 vmcore.2
Loading modules: [ unix genunix specfs dtrace cpu.AuthenticAMD.15 uppc 
pcplusmp scsi_vhci ufs md ip hook neti sctp arp usba uhci s1394 qlc fctl 
nca lofs zfs random cpc crypto fcip fcp logindmux nsctl sdbc ptm sv ii 
sppp rdc nfs ]

  $c
vpanic()
vcmn_err+0x28(3, f783ade0, ff001e737aa8)
real_panic_v+0xf7(0, f783ade0, ff001e737aa8)
ufs_fault_v+0x1d0(fffed0bfb980, f783ade0, ff001e737aa8)
ufs_fault+0xa0()
dqput+0xce(1db26ef0)
dqrele+0x48(1db26ef0)
ufs_trans_dqrele+0x6f(1db26ef0)
ufs_idle_free+0x16d(ff04f17b1e00)
ufs_idle_some+0x152(3f60)
ufs_thread_idle+0x1a1()
thread_start+8()


  ::cpuinfo
 ID ADDR          FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD        PROC
  0 fbc2fc10       1b    0    0  60   no    no t-0    ff001e737c80  sched
  1 fffec3a0a000   1f    1    0  -1   no    no t-0    ff001e971c80  (idle)
  2 fffec3a02ac0   1f    0    0  -1   no    no t-1    ff001e9dbc80  (idle)
  3 fffec3d60580   1f    0    0  -1   no    no t-1    ff001ea50c80  (idle)

  ::panicinfo
          cpu 0
       thread ff001e737c80
      message dqput: dqp->dq_cnt == 0
          rdi f783ade0
          rsi ff001e737aa8
          rdx f783ade0
          rcx ff001e737aa8
           r8 f783ade0
           r9 0
          rax 3
          rbx 0
          rbp ff001e737900
          r10 fbc26fb0
          r11 ff001e737c80
          r12 f783ade0
          r13 ff001e737aa8
          r14 3
          r15 f783ade0
       fsbase 0
       gsbase fbc26fb0
           ds 4b
           es 4b
           fs 0
           gs 1c3
       trapno 0
          err 0
          rip fb83c860
           cs 30
       rflags 246
          rsp ff001e7378b8
           ss 38
       gdt_hi 0
       gdt_lo e1ef
       idt_hi 0
       idt_lo 77c00fff
          ldt 0
         task 70
          cr0 8005003b
          cr2 fee7d650
          cr3 2c0
          cr4 6f8

  ::msgbuf
quota_ufs: over hard disk limit (pid 600, uid 178199, inum 941499, fs 
/export/zero1)
quota_ufs: over hard disk limit (pid 600, uid 33647, inum 29504134, fs 
/export/zero1)

panic[cpu0]/thread=ff001e737c80:
dqput: dqp->dq_cnt == 0


ff001e737930 genunix:vcmn_err+28 ()
ff001e737980 ufs:real_panic_v+f7 ()
ff001e7379e0 ufs:ufs_fault_v+1d0 ()
ff001e737ad0 ufs:ufs_fault+a0 ()
ff001e737b00 ufs:dqput+ce ()
ff001e737b30 ufs:dqrele+48 ()
ff001e737b70 ufs:ufs_trans_dqrele+6f ()
ff001e737bc0 ufs:ufs_idle_free+16d ()
ff001e737c10 ufs:ufs_idle_some+152 ()
ff001e737c60 ufs:ufs_thread_idle+1a1 ()
ff001e737c70 unix:thread_start+8 ()

syncing file systems...




-- 
Jorgen Lundman   | [EMAIL PROTECTED]
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 panic report.

2008-07-06 Thread Jorgen Lundman
  Since the panic stack only ever goes through ufs, you should
log a call with Sun support.

We do have support, but they only speak Japanese, and I'm still quite 
poor at it. But I have started the process of having it translated and 
passed along to the next person; it is always fun to see what it becomes 
at the other end. Meanwhile, I like to research and see if it is an 
already known problem, rather than just sit around and wait.



  quota_ufs: over hard disk limit (pid 600, uid 33647, inum 29504134, 
fs /export/zero1)

 
 Although given the entry in the msgbuf, perhaps
 you might want to fix up your quota settings on that
 particular filesystem.
 

Customers pay for a certain amount of disk quota and, being users, 
always stay close to the edge. Those messages are as constant as 
precipitation in the rainy season.

Are you suggesting that they indicate a problem, beyond the user simply 
being out of space?
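
When those messages pile up I could check the user directly with something 
like this (a rough sketch; the uid is taken from the msgbuf above, and the 
grep may need the username instead of the uid depending on how repquota 
prints it):

# repquota -v /export/zero1 | grep 178199
# quota -v username          (for a single account)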

Lund


-- 
Jorgen Lundman   | [EMAIL PROTECTED]
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


  1   2   >