Re: [zfs-discuss] zpool throughput: snv 134 vs 138 vs 143

2010-07-20 Thread Brent Jones
On Tue, Jul 20, 2010 at 10:29 AM, Chad Cantwell c...@iomail.org wrote:
 No, this wasn't it.  A non-debug build with the same NIGHTLY_OPTIONS
 as Rich Lowe's 142 build is still very slow...

 On Tue, Jul 20, 2010 at 09:52:10AM -0700, Chad Cantwell wrote:
 Yes, I think this might have been it.  I missed the NIGHTLY_OPTIONS variable in
 opensolaris and I think it was compiling a debug build.  I'm not sure what the
 ramifications are of this or how much slower a debug build should be, but I'm
 recompiling a release build now so hopefully all will be well.

 Thanks,
 Chad

 On Tue, Jul 20, 2010 at 08:39:42AM +0100, Robert Milkowski wrote:
  On 20/07/2010 07:59, Chad Cantwell wrote:
  
   I've just compiled and booted into snv_142, and I experienced the same slow dd
   and scrubbing as I did with my 142 and 143 compilations and with the Nexenta 3
   RC2 CD. So, this would seem to indicate a build environment/process flaw rather
   than a regression.
  
 
  Are you sure it is not a debug vs. non-debug issue?
 
 
  --
  Robert Milkowski
  http://milek.blogspot.com
 

Could it somehow not be compiling 64-bit support?


-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send to remote any ideas for a faster way than ssh?

2010-07-19 Thread Brent Jones
On Mon, Jul 19, 2010 at 11:14 AM, Bruno Sousa bso...@epinfante.com wrote:
 Hi,

 If you can share those scripts that make use of mbuffer, please feel
 free to do so ;)


 Bruno
 On 19-7-2010 20:02, Brent Jones wrote:
 On Mon, Jul 19, 2010 at 9:06 AM, Richard Jahnel rich...@ellipseinc.com 
 wrote:

 I've tried ssh blowfish and scp arcfour. Both are CPU limited long before 
 the 10g link is.

 I've also tried mbuffer, but I get broken pipe errors partway through the 
 transfer.

 I'm open to ideas for faster ways to either zfs send directly or send through
 a compressed file of the zfs send output.

 For the moment I:

 zfs send | pigz
 scp (arcfour) the gz file to the remote host
 gunzip | zfs receive

 This takes a very long time for 3 TB of data, and barely makes use of the 10g 
 connection between the machines due to the CPU limiting on the scp and 
 gunzip processes.

 Thank you for your thoughts

 Richard J.
 --
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


 I found build 130 had issues with TCP. I could reproduce TCP
 timeouts/socket errors up until I got onto 132. I have stayed on 132 so
 far since I haven't found any other show stoppers.
 Mbuffer is probably your best bet; I rolled mbuffer into my
 replication scripts, which I could share if anyone's interested.
 Older versions of my script are on www.brentrjones.com, but I have a
 new one which uses mbuffer.


I can't seem to upload files to my Wordpress site any longer, so I put
it up on Pastebin for now:

http://pastebin.com/2feviTCy

Hope it helps others
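
For anyone who just wants the basic shape of it, a minimal two-host
mbuffer pipeline looks roughly like this (dataset, host, port and buffer
sizes are placeholders, not values taken from the script):

# on the receiving host: listen on a TCP port and feed zfs receive
mbuffer -q -s 128k -m 1G -I 9090 | zfs receive -F tank/backup

# on the sending host: stream the snapshot into mbuffer over the network
zfs send tank/data@snap1 | mbuffer -q -s 128k -m 1G -O recvhost:9090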


-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS list snapshots incurs large delay

2010-07-13 Thread Brent Jones
I have been running a pair of X4540s for almost 2 years now, the
usual spec (quad-core, 64GB RAM, 48x 1TB).
I have a pair of mirrored drives for rpool, and RAIDZ vdevs of 5-6
disks each for the rest of the disks.
I am running snv_132 on both systems.

I noticed an oddity on one particular system: when running a
scrub or a 'zfs list -t snapshot', the results take forever.
Mind you, these are identical systems in hardware and software. The
primary system replicates all data sets to the secondary nightly, so
there isn't much of a discrepancy in space used.

Primary system:
# time zfs list -t snapshot | wc -l
979

real    1m23.995s
user    0m0.360s
sys     0m4.911s

Secondary system:
# time zfs list -t snapshot | wc -l
979

real    0m1.534s
user    0m0.223s
sys     0m0.663s


At the time of running both of those, no other activity was happening
(load average of 0.05 or so). Subsequent runs also take just as long on
the primary; no matter how many times I run it, it takes about 1
minute and 25 seconds each time, with very little drift (+/- 1 second
if that).

Both systems are at about 77% used space on the storage pool, with no
other distinguishing factors that I can discern.
Upon a reboot, performance is respectable for a little while, but
within days it sinks back to those levels. I suspect a memory
leak, but both systems run the same software versions and packages, so
I can't envision that.

Would anyone have any ideas what may cause this?
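
In case it is useful for comparison, a quick way to eyeball kernel memory
and ARC usage on each box (standard tools, nothing system-specific):

# kernel vs. user memory breakdown
echo ::memstat | mdb -k

# ARC size-related counters
kstat -m zfs -n arcstats | egrep 'c_max|c_min|size'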

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [osol-help] ZFS list snapshots incurs large delay

2010-07-13 Thread Brent Jones

 It could be a disk failing and dragging I/O down with it.

 Try to check for high asvc_t with `iostat -XCn 1` and errors in `iostat -En`

 Any timeouts or retries in /var/adm/messages ?

 --
 Giovanni Tirloni
 gtirl...@sysdroid.com


I checked for high service times during a scrub, and all disks are
pretty equal. During a scrub, each disk peaks at about 350 reads/sec,
with an asvc_t of up to 30 during those read spikes (I assume that
means 30ms, which isn't terrible for a highly loaded SATA disk).
No errors are reported by smartctl, iostat, or /var/adm/messages.

I opened a case on Sunsolve, but I fear since I am running a dev build
that I will be out of luck. I cannot run 2009.06 due to CIFS
segfaults, and problems with zfs send/recv hanging pools (well
documented issues).
I'd run Solaris proper, but not having in-kernel CIFS or COMSTAR would
be a major setback for me.



-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] COMSTAR iSCSI and two Windows computers

2010-06-18 Thread Brent Jones
On Thu, Jun 17, 2010 at 10:44 PM, Giovanni giof...@gmail.com wrote:
 Hi guys

 I wanted to ask how I could set up an iSCSI device to be shared by 2 computers
 concurrently; by that I mean sharing files like it was an NFS share, but using
 iSCSI instead.

 I tried and set up iSCSI on both computers and was able to see my files (I had
 formatted it NTFS before). From my laptop I uploaded a 400MB video file to
 the root directory, and from my desktop I browsed the same directory and the
 file was not there??

 Thanks

iSCSI is not a clustered file system; in fact, it isn't a file system
at all. For iSCSI you need to configure data fencing, typically
handled by clustering suites from various operating systems, to control
which host has access to the iSCSI volumes at any one time.
You should stick to CIFS or NFS, or investigate a real clustered file system.

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] iScsi slow

2010-05-26 Thread Brent Jones
On Wed, May 26, 2010 at 5:08 AM, Matt Connolly
matt.connolly...@gmail.com wrote:
 I've set up an iScsi volume on OpenSolaris (snv_134) with these commands:

 sh-4.0# zfs create rpool/iscsi
 sh-4.0# zfs set shareiscsi=on rpool/iscsi
 sh-4.0# zfs create -s -V 10g rpool/iscsi/test

 The underlying zpool is a mirror of two SATA drives. I'm connecting from a 
 Mac client with globalSAN initiator software, connected via Gigabit LAN. It 
 connects fine, and I've initialised a Mac-format volume on that iSCSI volume.

 Performance, however, is terribly slow, about 10 times slower than an SMB 
 share on the same pool. I expected it would be very similar, if not faster 
 than SMB.

 Here's my test results copying 3GB data:

 iSCSI:                  44m01s          1.185MB/s
 SMB share:              4m27s           11.73MB/s

 Reading (the same 3GB) is also worse than SMB, but only by a factor of about 
 3:

 iSCSI:                  4m36s           11.34MB/s
 SMB share:              1m45s           29.81MB/s


 Is there something obvious I've missed here?
 --
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Try jumbo frames, and make sure flow control is enabled on your
iSCSI switches and all network cards.
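
For example, the MTU can be raised per link with dladm (the interface
name below is just an example, and the switch ports and the initiator
must be configured to match):

# raise the MTU on the iSCSI-facing NIC
dladm set-linkprop -p mtu=9000 e1000g0

# verify the property took effect
dladm show-linkprop -p mtu e1000g0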

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] send/recv over ssh

2010-05-20 Thread Brent Jones
On Thu, May 20, 2010 at 3:42 PM, Brandon High bh...@freaks.com wrote:
 On Thu, May 20, 2010 at 1:23 PM, Thomas Burgess wonsl...@gmail.com wrote:
 I know I'm probably doing something REALLY stupid... but for some reason I
 can't get send/recv to work over ssh.  I just built a new media server and

 Unless you need to have the send to be encrypted, ssh is going to slow
 you down a lot.

 I've used mbuffer when doing sends on the same network, it's worked well.

 -B

 --
 Brandon High : bh...@freaks.com
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


The problem with mbuffer is that if you do scripted send/receives, you have to
pre-start an mbuffer session on the receiving end somehow.
SSH is always running on the receiving end, so there are no issues there.
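
One crude way to script around that is to start the receiver over ssh
first and then stream to it; a rough sketch (host, dataset and port are
placeholders):

# start mbuffer + zfs recv on the far side, listening on a TCP port
ssh recvhost "mbuffer -q -s 128k -m 512M -I 9090 | zfs recv -F tank/backup" &

# crude: give the remote listener a moment to come up
sleep 5

# then stream the snapshot into it
zfs send tank/data@snap1 | mbuffer -q -s 128k -m 512M -O recvhost:9090
wait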

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-03-31 Thread Brent Jones
On Wed, Mar 31, 2010 at 1:00 AM, Karsten Weiss
k.we...@science-computing.de wrote:
 Hi Adam,

 Very interesting data. Your test is inherently
 single-threaded so I'm not surprised that the
 benefits aren't more impressive -- the flash modules
 on the F20 card are optimized more for concurrent
 IOPS than single-threaded latency.

 Thanks for your reply. I'll probably test the multiple write case, too.

 But frankly at the moment I care the most about the single-threaded case
 because if we put e.g. user homes on this server I think they would be
 severely disappointed if they would have to wait 2m42s just to extract a rather
 small 50 MB tarball. The default 7m40s without SSD log were unacceptable
 and we were hoping that the F20 would make a big difference and bring the
 performance down to acceptable runtimes. But IMHO 2m42s is still too slow
 and disabling the ZIL seems to be the only option.

 Knowing that 100s of users could do this in parallel with good performance
 is nice but it does not improve the situation for the single user who only
 cares for his own tar run. If there's anything else we can do/try to improve
 the single-threaded case I'm all ears.
 --
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Use something other than Open/Solaris with ZFS as an NFS server?  :)

I don't think you'll find the performance you paid for with ZFS and
Solaris at this time. I've been trying for more than a year, and
watching dozens, if not hundreds, of threads.
Getting halfway decent performance from NFS and ZFS is impossible
unless you disable the ZIL.

You'd be better off getting a NetApp.

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Fishworks 2010Q1 and dedup bug?

2010-03-05 Thread Brent Jones
On Fri, Mar 5, 2010 at 10:49 AM, Tonmaus sequoiamo...@gmx.net wrote:
 Hi,

 I have tried what dedup does on a test dataset that I have filled with 372 GB 
 of partly redundant data. I have used snv_133. All in all, it was successful. 
 The net data volume was only 120 GB. Destruction of the dataset finally took 
 a while, but without any compromise of anything else.

 After this successful test I am planning to use dedup productively soon.

 Regards,

 Tonmaus
 --
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


120GB isn't a large enough test. Do what you will, but there have now
been at least a dozen reports of people locking up their 7000 series
and X4500/X4540 systems by enabling de-dupe on large datasets, myself
included.

Check CR 6924390 for updates (if any)

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Fishworks 2010Q1 and dedup bug?

2010-03-04 Thread Brent Jones
On Thu, Mar 4, 2010 at 8:08 AM, Henrik Johansson henr...@henkis.net wrote:
 Hi all,
 Now that the Fishworks 2010.Q1 release seems to get deduplication, does
 anyone know if bugid: 6924824 (destroying a dedup-enabled dataset bricks
 system) is still valid? It has not been fixed in onnv and it is not
 mentioned in the release notes.
 This is one of the bugs I've been keeping my eyes on before using dedup for
 any serious work, so I was a bit surprised to see that it was in the 2010Q1
 release but not fixed in ON. It might not be an issue, just curious, both
 from a Fishworks perspective and from an OpenSolaris perspective.
 Regards
 Henrik
 http://sparcv9.blogspot.com

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



My rep says "use dedupe at your own risk at this time."

I guess they've been seeing a lot of issues, and regardless of whether it's
'supported' or not, he said not to use it.

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-18 Thread Brent Jones
On Wed, Feb 17, 2010 at 11:03 PM, Matt registrat...@flash.shanje.com wrote:
 No SSD Log device yet.  I also tried disabling the ZIL, with no effect on 
 performance.

 Also - what's the best way to test local performance?  I'm _somewhat_ dumb as 
 far as opensolaris goes, so if you could provide me with an exact command 
 line for testing my current setup (exactly as it appears above) I'd love to 
 report the local I/O readings.
 --
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


No one has said if they're using dsk, rdsk, or file-backed COMSTAR LUNs yet.
I'm using file-backed COMSTAR LUNs, with ZIL currently disabled.
I can get between 100-200MB/sec, depending on random/sequential and block sizes.

Using dsk/rdsk, I was not able to see that level of performance at all.
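
For reference, this is roughly how the two kinds of backing store get
created with COMSTAR (pool/dataset names are placeholders; the GUID comes
from sbdadm list-lu):

# zvol-backed LU
zfs create -V 100G tank/luns/vol0
sbdadm create-lu /dev/zvol/rdsk/tank/luns/vol0

# file-backed LU (a plain, sparse file on a ZFS filesystem)
zfs create tank/luns/files
mkfile -n 100g /tank/luns/files/lun0
sbdadm create-lu /tank/luns/files/lun0

# either way, export it to initiators with a view
stmfadm add-view <GUID-from-sbdadm-list-lu>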

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-17 Thread Brent Jones
On Wed, Feb 17, 2010 at 10:42 PM, Matt registrat...@flash.shanje.com wrote:



 I've got a very similar rig to the OP showing up next week (plus an
 InfiniBand card). I'd love to get this performing up to GB Ethernet speeds;
 otherwise I may have to abandon the iSCSI project if I can't get it to
 perform.


Do you have an SSD log device? If not, try disabling the ZIL
temporarily to see if that helps. Your workload will likely benefit
from a log device.
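
For what it's worth, the usual (global, and somewhat risky) way to turn
the ZIL off in builds of that era was an /etc/system tunable, applied at
the next reboot; a sketch, assuming the tunable still exists in your build:

# /etc/system -- disables the ZIL for every pool on the host; test use only
set zfs:zil_disable = 1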

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-10 Thread Brent Jones
On Wed, Feb 10, 2010 at 3:12 PM, Marc Nicholas geekyth...@gmail.com wrote:
 How does lowering the flush interval help? If he can't ingress data
 fast enough, faster flushing is a Bad Thing(tm).

 -marc

 On 2/10/10, Kjetil Torgrim Homme kjeti...@linpro.no wrote:
 Bob Friesenhahn bfrie...@simple.dallas.tx.us writes:
 On Wed, 10 Feb 2010, Frank Cusack wrote:

 The other three commonly mentioned issues are:

   - Disable the Nagle algorithm on the Windows clients.

 for iSCSI?  shouldn't be necessary.

  - Set the volume block size so that it matches the client filesystem
    block size (default is 128K!).

 default for a zvol is 8 KiB.

  - Check for an abnormally slow disk drive using 'iostat -xe'.

 his problem is lazy ZFS, notice how it gathers up data for 15 seconds
 before flushing the data to disk.  tweaking the flush interval down
 might help.

  An iostat -xndz 1 readout of the %b column during a file copy to
 the LUN shows maybe 10-15 seconds of %b at 0 for all disks, then 1-2
 seconds of 100, and repeats.

 what are the other values?  ie., number of ops and actual amount of data
 read/written.

 --
 Kjetil T. Homme
 Redpill Linpro AS - Changing the game

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


 --
 Sent from my mobile device
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


ZIL performance issues? Is the write cache enabled on the LUNs?
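
If these are COMSTAR LUs, the write-cache state can be checked per LU; a
sketch, assuming the property is the wcd (write-cache disable) flag as I
recall it:

# show LU details, including the writeback cache setting
stmfadm list-lu -v

# wcd=false means the write cache is NOT disabled (the GUID is a placeholder)
stmfadm modify-lu -p wcd=false <lu-guid>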

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-10 Thread Brent Jones
On Wed, Feb 10, 2010 at 4:05 PM, Brent Jones br...@servuhome.net wrote:
 On Wed, Feb 10, 2010 at 3:12 PM, Marc Nicholas geekyth...@gmail.com wrote:
 How does lowering the flush interval help? If he can't ingress data
 fast enough, faster flushing is a Bad Thing(tm).

 -marc

 On 2/10/10, Kjetil Torgrim Homme kjeti...@linpro.no wrote:
 Bob Friesenhahn bfrie...@simple.dallas.tx.us writes:
 On Wed, 10 Feb 2010, Frank Cusack wrote:

 The other three commonly mentioned issues are:

   - Disable the Nagle algorithm on the Windows clients.

 for iSCSI?  shouldn't be necessary.

  - Set the volume block size so that it matches the client filesystem
    block size (default is 128K!).

 default for a zvol is 8 KiB.

  - Check for an abnormally slow disk drive using 'iostat -xe'.

 his problem is lazy ZFS, notice how it gathers up data for 15 seconds
 before flushing the data to disk.  tweaking the flush interval down
 might help.

  An iostat -xndz 1 readout of the %b column during a file copy to
 the LUN shows maybe 10-15 seconds of %b at 0 for all disks, then 1-2
 seconds of 100, and repeats.

 what are the other values?  ie., number of ops and actual amount of data
 read/written.

 --
 Kjetil T. Homme
 Redpill Linpro AS - Changing the game

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


 --
 Sent from my mobile device
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


 ZIL performance issues? Is the write cache enabled on the LUNs?

 --
 Brent Jones
 br...@servuhome.net


Also, are you using rdsk-based iSCSI LUNs, or file-based LUNs?

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Help needed with zfs send/receive

2010-02-02 Thread Brent Jones
On Tue, Feb 2, 2010 at 12:05 PM, Arnaud Brand t...@tib.cc wrote:
 Hi folks,

 I'm having (as the title suggests) a problem with zfs send/receive.
 Command line is like this :
 pfexec zfs send -Rp tank/t...@snapshot | ssh remotehost pfexec zfs recv -v -F
 -d tank

 This works like a charm as long as the snapshot is small enough.

 When it gets too big (meaning somewhere between 17G and 900G), I get ssh
 errors (can't read from remote host).

 I tried various encryption options (the fastest being in my case arcfour)
 with no better results.
 I tried to setup a script to insert dd on the sending and receiving side to
 buffer the flow, still read errors.
 I tried with mbuffer (which gives better performance), it didn't get better.
 Today I tried with netcat (and mbuffer) and I got better throughput, but it
 failed at 269GB transferred.

 The two machines are connected to the switch with 2x1GbE (Intel) joined
 together with LACP.
 The switch logs show no errors on the ports.
 kstat -p | grep e1000g shows one recv error on the sending side.

 I can't find anything in the logs which could give me a clue about what's
 happening.

 I'm running build 131.

 If anyone has the slightest clue of where I could look or what I could do to
 pinpoint/solve the problem, I'd be very grateful if (s)he could share it
 with me.

 Thanks and have a nice evening.

 Arnaud



 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



This issue seems to have started after snv_129 for me. I get connection
reset by peer errors, or transfers (of any kind) simply time out.

Smaller transfers succeed most of the time, while larger ones usually
fail. Rolling back to snv_127 (my last one) does not exhibit this
issue. I have not had time to narrow down any causes, but I did find
one bug report noting that some TCP test scenarios failed during one of
the builds; I am unable to find that CR at this time.

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Help needed with zfs send/receive

2010-02-02 Thread Brent Jones
On Tue, Feb 2, 2010 at 7:41 PM, Brent Jones br...@servuhome.net wrote:
 On Tue, Feb 2, 2010 at 12:05 PM, Arnaud Brand t...@tib.cc wrote:
 Hi folks,

 I'm having (as the title suggests) a problem with zfs send/receive.
 Command line is like this :
 pfexec zfs send -Rp tank/t...@snapshot | ssh remotehost pfexec zfs recv -v -F
 -d tank

 This works like a charm as long as the snapshot is small enough.

 When it gets too big (meaning somewhere between 17G and 900G), I get ssh
 errors (can't read from remote host).

 I tried various encryption options (the fastest being in my case arcfour)
 with no better results.
 I tried to setup a script to insert dd on the sending and receiving side to
 buffer the flow, still read errors.
 I tried with mbuffer (which gives better performance), it didn't get better.
 Today I tried with netcat (and mbuffer) and I got better throughput, but it
 failed at 269GB transferred.

 The two machines are connected to the switch with 2x1GbE (Intel) joined
 together with LACP.
 The switch logs show no errors on the ports.
 kstat -p | grep e1000g shows one recv error on the sending side.

 I can't find anything in the logs which could give me a clue about what's
 happening.

 I'm running build 131.

 If anyone has the slightest clue of where I could look or what I could do to
 pinpoint/solve the problem, I'd be very grateful if (s)he could share it
 with me.

 Thanks and have a nice evening.

 Arnaud



 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



 This issue seems to have started after snv_129 for me. I get connection
 reset by peer errors, or transfers (of any kind) simply time out.

 Smaller transfers succeed most of the time, while larger ones usually
 fail. Rolling back to snv_127 (my last one) does not exhibit this
 issue. I have not had time to narrow down any causes, but I did find
 one bug report that found some TCP test scenarios failed during one of
 the builds, but unable to find that CR at this time.

 --
 Brent Jones
 br...@servuhome.net


Ah, I found the CR that seemed to describe the situation (broken
pipe/connection reset by peer)

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6905510


-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS filesystem lock after running auto-replicate.ksh - how to clear?

2010-01-23 Thread Brent Jones
On Sat, Jan 23, 2010 at 8:44 AM, Fletcher Cocquyt fcocq...@stanford.edu wrote:
 Fletcher Cocquyt fcocquyt at stanford.edu writes:


 I found this script for replicating zfs data:

 http://www.infrageeks.com/groups/infrageeks/wiki/8fb35/zfs_autoreplicate_script.html

  - I am testing it out in the lab with b129.
 It error-ed out the first run with some syntax error about the send component
 (recursive needed?)

 ..snip..

 How do I clear the lock - I have not been able to find documentation on 
 this...

 thanks!


 Hi, as one helpful user pointed out, the lock is not from ZFS, but an attribute
 set by the script to prevent contention (multiple replications etc).
 I used zfs get/set to clear the attribute and I was able to replicate the
 initial dataset - still working on the incrementals!

 thanks!


 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


As the person who put in the original code for the ZFS lock/depend
checks, I can say the script is relatively simple. It seems Infrageeks added
some better documentation, which is very helpful.

You'll want to make sure your remote side doesn't differ, i.e. that it has the
same current snapshots as the sender side. If the replication fails
for some reason, unlock both sides with 'zfs set'.

What problems are you experiencing with incrementals?
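
Since the lock is just a ZFS user property, it can also be inspected and
cleared by hand with the normal tools (the property name below is
illustrative; use whatever name the script actually sets):

# check whether the lock property is set on the dataset
zfs get -H -o value com.example:locked tank/data

# take the lock
zfs set com.example:locked=true tank/data

# clear it (zfs inherit removes a locally set user property)
zfs inherit com.example:locked tank/data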


-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write bursts cause short app stalls

2010-01-06 Thread Brent Jones
On Wed, Jan 6, 2010 at 2:40 PM, Saso Kiselkov skisel...@gmail.com wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Buffering the writes in the OS would work for me as well - I've got RAM
 to spare. Slowing down rm is perhaps one way to go, but definitely not a
 real solution. On rare occasions I could still get lockups, leading to
 screwed up recordings, and if there's one thing people don't like about IPTV,
 it's packet loss. Eliminating even the possibility of packet loss
 completely would be the best way to go, I think.

 Regards,
 - --
 Saso


I hesitate to suggest this, but what about disabling the ZIL? Since
this sounds like transient data to begin with, any risk would be
pretty low, I'd imagine.

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zvol (slow) vs file (fast) performance snv_130

2010-01-02 Thread Brent Jones
On Wed, Dec 30, 2009 at 9:35 PM, Ross Walker rswwal...@gmail.com wrote:
 On Dec 30, 2009, at 11:55 PM, Steffen Plotner swplot...@amherst.edu
 wrote:

 Hello,

 I was doing performance testing, validating zvol performance in
 particular, and found zvol write performance to be slow, ~35-44MB/s at
 1MB blocksize writes. I then tested the underlying zfs file system with the
 same test and got 121MB/s.  Is there any way to fix this? I really would
 like to have compatible performance between the zfs filesystem and the zfs
 zvols.

 Been there.
 ZVOLs were changed a while ago to make each operation synchronous so to
 provide data consistency in the event of a system crash or power outage,
 particularly when used as backing stores for iscsitgt or comstar.
 While I think that the change is necessary I think they should have made the
 cooked 'dsk' device node run with caching enabled to provide an alternative
 for those willing to take the risk, or modify iscsitgt/comstar to issue a
 sync after every write if write-caching is enabled on the backing device and
 the user doesn't want to write cache, or advertise WCE on the mode page to
 the initiators and let them sync.
 I also believe performance can be better. When using zvols with iscsitgt and
 comstar I was unable to break 30MB/s with 4k sequential read workload to a
 zvol with a 128k recordsize (recommended for sequential IO), not very good.
 To the same hardware running Linux and iSCSI Enterprise Target I was able to
 drive over 50MB/s with the same workload. This isn't writes, just reads. I
 was able to do somewhat better going to the physical device with iscsitgt
 and comstar, but not as good as Linux, so I kept on using Linux for iSCSI
 and Solaris for NFS which performed better.
 -Ross

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



I also noticed that using zvols instead of files, for 20MB/sec of read
I/O, I saw as many as 900 IOPS to the disks themselves.
When using file-based LUNs with COMSTAR, doing 20MB/sec of read I/O
issues just a couple hundred IOPS.
It seemed that to get decent performance, I was required to either
throw away my X4540s and switch to 7000s with expensive SSDs, or
switch to file-based COMSTAR LUNs and disable the ZIL  :(

Sad when a $50k piece of equipment requires such sacrifice.

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zvol (slow) vs file (fast) performance snv_130

2009-12-30 Thread Brent Jones
On Wed, Dec 30, 2009 at 8:55 PM, Steffen Plotner swplot...@amherst.edu wrote:
 Hello,

 I was doing performance testing, validating zvol performance in
 particular, and found zvol write performance to be slow, ~35-44MB/s at
 1MB blocksize writes. I then tested the underlying zfs file system with the
 same test and got 121MB/s.  Is there any way to fix this? I really would
 like to have compatible performance between the zfs filesystem and the zfs
 zvols.

 # first test is a file test at the root of the zpool vg_satabeast8_vol0
 dd if=/dev/zero of=/vg_satabeast8_vol0/testing bs=1M count=32768
 32768+0 records in
 32768+0 records out
 34359738368 bytes (34 GB) copied, 285.037 s, 121 MB/s

 # create zvol
 zfs create -V 100G -b 4k vg_satabeast8_vol0/lv_test

 # test zvol with 'dsk' device
 dd if=/dev/zero of=/dev/zvol/dsk/vg_satabeast8_vol0/lv_test bs=1M
 count=32768
 32768+0 records in
 32768+0 records out
 34359738368 bytes (34 GB) copied, 981.219 s, 35.0 MB/s

 # test zvol with 'rdsk' device (results are better than 'dsk', however, not
 as good as a regular file)
 dd if=/dev/zero of=/dev/zvol/rdsk/vg_satabeast8_vol0/lv_test bs=1M
 count=32768
 32768+0 records in
 32768+0 records out
 34359738368 bytes (34 GB) copied, 766.247 s, 44.8 MB/s


uname -a
 SunOS zfs-debug-node 5.11 snv_130 i86pc i386 i86pc Solaris

 I believe this problem is affecting performance tests others are doing with
 Comstar and exported zvol logical units.

 Steffen
 ___
 Steffen Plotner    Amherst College    Tel
 (413) 542-2348
 Systems/Network Administrator/Programmer   PO BOX 5000    Fax
 (413) 542-2626
 Systems  Networking   Amherst, MA 01002-5000
 swplot...@amherst.edu


 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



Why did you create the zvol with a 4k block size?
I'd let ZFS manage that for you; the default volblocksize for a zvol is
8K (128K is the default recordsize for regular ZFS filesystems).
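
For comparison, a sketch of creating the zvol without forcing a block size
and then checking what ZFS chose (pool name reused from the test above;
lv_test2 is just an illustrative name):

# let ZFS pick the volblocksize
zfs create -V 100G vg_satabeast8_vol0/lv_test2
zfs get volblocksize vg_satabeast8_vol0/lv_test2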

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [osol-help] zfs destroy stalls, need to hard reboot

2009-12-29 Thread Brent Jones
On Sun, Dec 27, 2009 at 1:35 PM, Brent Jones br...@servuhome.net wrote:
 On Sun, Dec 27, 2009 at 12:55 AM, Stephan Budach stephan.bud...@jvm.de 
 wrote:
 Brent,

 I had known about that bug a couple of weeks ago, but that bug was filed
 against v111 and we're at v130. I have also searched the ZFS part of
 this forum and really couldn't find much about this issue.

 The other issue I noticed is that, as opposed to the statements I read, that 
 once zfs is underway destroying a big dataset, other operations would 
 continue to work, but that doesn't seem to be the case. When destroying the
 3 TB dataset, the other zvol that had been exported via iSCSI stalled as 
 well and that's really bad.

 Cheers,
 budy
 --
 This message posted from opensolaris.org
 ___
 opensolaris-help mailing list
 opensolaris-h...@opensolaris.org


 I just tested your claim, and you appear to be correct.

 I created a couple dummy ZFS filesystems, loaded them with about 2TB,
 exported them via CIFS, and destroyed one of them.
 The destroy took the usual amount of time (about 2 hours), and
 actually, quite to my surprise, all I/O on the ENTIRE zpool stalled.
 I dont recall seeing this prior to 130; in fact, I know I would have
 noticed this, as we create and destroy large ZFS filesystems very
 frequently.

 So it seems the original issue I reported many months back has
 actually gained some new negative impacts  :(

 I'll try to escalate this with my Sun support contract, but Sun
 support still isn't very familiar/clued in about OpenSolaris, so I
 doubt I will get very far.

 Cross-posting to zfs-discuss also, as others may have seen this and
 know of a solution/workaround.



 --
 Brent Jones
 br...@servuhome.net


I did some more testing, and it seems this is 100% reproducible ONLY
if the file system and/or entire pool had compression or de-dupe
enabled at one point.
It doesn't seem to matter whether de-dupe/compression was enabled for 5
minutes or for the entire life of the pool; as soon as either is turned
on in snv_130, doing any type of mass change (like deleting a big file
system) will hang ALL I/O for a significant amount of time.

If I create a filesystem with neither enabled, fill it with a few TB
of data, and do a 'zfs destroy' on it, it'll go pretty quick, just a
couple minutes, and no noticeable impact to system I/O.

I'm curious about the 7000 series appliances, since those supposedly
ship now with de-dupe as a fully supported option. Is the code
significantly different in the core of ZFS on the 7000 appliances than
a recent build of OpenSolaris?
My sales rep assures me there's very little overhead from enabling
de-dupe on the 7000 series (which he's trying to sell us, obviously)
but I can't see how that could be, when I have the same hardware the
7000's run on (fully loaded X4540).

Any thoughts from anyone?

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Troubleshooting dedup performance

2009-12-28 Thread Brent Jones
 the previously high performance.

A bit of a let down, so I will wait on the sidelines for this feature to mature.


-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [osol-help] zfs destroy stalls, need to hard reboot

2009-12-27 Thread Brent Jones
On Sun, Dec 27, 2009 at 12:55 AM, Stephan Budach stephan.bud...@jvm.de wrote:
 Brent,

 I had known about that bug a couple of weeks ago, but that bug was filed
 against v111 and we're at v130. I have also searched the ZFS part of this
 forum and really couldn't find much about this issue.

 The other issue I noticed is that, as opposed to the statements I read, that 
 once zfs is underway destroying a big dataset, other operations would 
 continue to work, but that doesn't seem to be the case. When destroying the
 3 TB dataset, the other zvol that had been exported via iSCSI stalled as well 
 and that's really bad.

 Cheers,
 budy
 --
 This message posted from opensolaris.org
 ___
 opensolaris-help mailing list
 opensolaris-h...@opensolaris.org


I just tested your claim, and you appear to be correct.

I created a couple dummy ZFS filesystems, loaded them with about 2TB,
exported them via CIFS, and destroyed one of them.
The destroy took the usual amount of time (about 2 hours), and
actually, quite to my surprise, all I/O on the ENTIRE zpool stalled.
I don't recall seeing this prior to 130; in fact, I know I would have
noticed this, as we create and destroy large ZFS filesystems very
frequently.

So it seems the original issue I reported many months back has
actually gained some new negative impacts  :(

I'll try to escalate this with my Sun support contract, but Sun
support still isn't very familiar/clued in about OpenSolaris, so I
doubt I will get very far.

Cross-posting to zfs-discuss also, as others may have seen this and
know of a solution/workaround.



-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write bursts cause short app stalls

2009-12-26 Thread Brent Jones
On Fri, Dec 25, 2009 at 9:56 PM, Tim Cook t...@cook.ms wrote:


 On Fri, Dec 25, 2009 at 11:43 PM, Brent Jones br...@servuhome.net wrote:

 
 
 
 
  Hang on... if you've got 77 concurrent threads going, I don't see how
  that's
  a sequential I/O load.  To the backend storage it's going to look like
  the
  equivalent of random I/O.  I'd also be surprised to see 12 1TB disks
  supporting 600MB/sec throughput and would be interested in hearing where
  you
  got those numbers from.
 
  Is your video capture doing 430MB or 430Mbit?
 
  --
  --Tim
 
 

 Think he said 430Mbit/sec, which if these are security cameras, would
 be a good sized installation (30+ cameras).
 We have a similar system, albeit running on Windows. Writing about
 400Mbit/sec using just 6, 1TB SATA drives is entirely possible, and
 working quite well on our system without any frame loss or much
 latency.

 Once again, Mb or MB?  They're two completely different numbers.  As for
  getting 400Mbit out of 6 SATA drives, that's not really impressive at all.
 If you're saying you got 400MB, that's a different story entirely, and while
 possible with sequential I/O and a proper raid setup, it isn't happening
 with random.


Mb, megabit.
400 megabits per second is not terribly high; a single SATA drive could write that
24/7 without breaking a sweat. Which is why he is reporting his issue.

Sequential or random, any modern system should be able to perform that
task without causing disruption to other processes running on the
system (if Windows can, Solaris/ZFS most definitely should be able
to).

I have a similar workload on my X4540s, streaming backups from multiple
systems at a time. These are very high-end machines: dual quad-core
Opterons, 64GB RAM, and 48x 1TB drives in 5-6 disk RAIDZ vdevs.

The write stalls have been a significant problem since ZFS came out,
and they haven't really been addressed in an acceptable fashion yet, though
work has been done to improve them.

I'm still trying to find the case number I have open with Sunsolve or
whatever; it was for exactly this issue, and I believe the fix was to
add dozens more classes to the scheduler, to allow fairer disk
I/O and overall niceness on the system when ZFS commits a
transaction group.




-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write bursts cause short app stalls

2009-12-25 Thread Brent Jones
On Fri, Dec 25, 2009 at 7:47 PM, Tim Cook t...@cook.ms wrote:


 On Fri, Dec 25, 2009 at 11:57 AM, Saso Kiselkov skisel...@gmail.com wrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 I've started porting a video streaming application to opensolaris on
 ZFS, and am hitting some pretty weird performance issues. The thing I'm
 trying to do is run 77 concurrent video capture processes (roughly
 430Mbit/s in total) all writing into separate files on a 12TB J4200
 storage array. The disks in the array are arranged into a single RAID-0
 ZFS volume (though I've tried different RAID levels, none helped). CPU
 performance is not an issue (barely hitting 35% utilization on a single
 CPU quad-core X2250). I/O bottlenecks can also be ruled out, since the
 storage array's sequential write performance is around 600MB/s.

 The problem is the bursty behavior of ZFS writes. All the capture
 processes do, in essence is poll() on a socket and then read() and
 write() any available data from it to a file. The poll() call is done
 with a timeout of 250ms, expecting that if no data arrives within 0.25
 seconds, the input is dead and recording stops (I tried increasing this
 value, but the problem still arises, although not as frequently). When
 ZFS decides that it wants to commit a transaction group to disk (every
 30 seconds), the system stalls for a short amount of time and depending
 on the number capture of processes currently running, the poll() call
 (which usually blocks for 1-2ms), takes on the order of hundreds of ms,
 sometimes even longer. I figured that I might be able to resolve this by
 lowering the txg timeout to something like 1-2 seconds (I need ZFS to
 write as soon as data arrives, since it will likely never be
 overwritten), but I couldn't find any tunable parameter for it anywhere
 on the net. On FreeBSD, I think this can be done via the
 vfs.zfs.txg_timeout sysctl. A glimpse into the source at

 http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/txg.c
 on line 40 made me worry that somebody maybe hard-coded this value into
 the kernel, in which case I'd be pretty much screwed in opensolaris.

 Any help would be greatly appreciated.

 Regards,
 - --
 Saso




 Hang on... if you've got 77 concurrent threads going, I don't see how that's
 a sequential I/O load.  To the backend storage it's going to look like the
 equivalent of random I/O.  I'd also be surprised to see 12 1TB disks
 supporting 600MB/sec throughput and would be interested in hearing where you
 got those numbers from.

 Is your video capture doing 430MB or 430Mbit?

 --
 --Tim

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



I think he said 430Mbit/sec, which, if these are security cameras, would
be a good-sized installation (30+ cameras).
We have a similar system, albeit running on Windows. Writing about
400Mbit/sec using just six 1TB SATA drives is entirely possible, and it
works quite well on our system without any frame loss or much
latency.

The write lag is noticeable with ZFS, however, because of the behavior of
transaction group writes. If you have a big write that needs to land
on disk, it seems all other I/O, CPU and niceness is thrown out the
window in favor of getting all that data on disk.
I was on a watch list for a ZFS I/O scheduler bug through my paid Solaris
support; I'll try to find that bug number, but I believe some
improvements were made in 129 and 130.
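
On the txg interval the original post asks about: there was a tunable for
it in builds of that era; a sketch, assuming the variable is still named
zfs_txg_timeout (default 30 seconds) in your build, and at your own risk:

# /etc/system -- takes effect after a reboot
set zfs:zfs_txg_timeout = 5

# or poke the running kernel with mdb (reverts at reboot)
echo zfs_txg_timeout/W0t5 | mdb -kw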



-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs zend is very slow

2009-12-16 Thread Brent Jones
On Wed, Dec 16, 2009 at 12:19 PM, Michael Herf mbh...@gmail.com wrote:
 Mine is similar (4-disk RAIDZ1)
  - send/recv with dedup on: 4MB/sec
  - send/recv with dedup off: ~80M/sec
  - send > /dev/null: ~200MB/sec.
 I know dedup can save some disk bandwidth on write, but it shouldn't save
 much read bandwidth (so I think these numbers are right).
 There's a warning in a Jeff Bonwick post that if the DDT (de-dupe tables)
 don't fit in RAM, things will be slower.
 Wonder what that threshold is?
 Second try of the same recv appears to go randomly faster (5-12MB bursting
 to 100MB/sec briefly) - DDT in core should make the second try quite a bit
 faster, but it's not as fast as I'd expect.
 My zdb -D output:
 DDT-sha256-zap-duplicate: 633396 entries, size 361 on disk, 179 in core
 DDT-sha256-zap-unique: 5054608 entries, size 350 on disk, 185 in core
 6M entries doesn't sound like that much for a box with 6GB of RAM.

 CPU load is also low.
 mike

 On Wed, Dec 16, 2009 at 8:19 AM, Brandon High bh...@freaks.com wrote:

 On Wed, Dec 16, 2009 at 8:05 AM, Bob Friesenhahn
 bfrie...@simple.dallas.tx.us wrote:
   In his case 'zfs send' to /dev/null was still quite fast and the network
   was also quite fast (when tested with benchmark software).  The implication
   is that ssh network transfer performance may have dropped with the update.

 zfs send appears to be fast still, but receive is slow.

 I tried a pipe from the send to the receive, as well as using mbuffer
 with a 100mb buffer, both wrote at ~ 12 MB/s.

 -B

 --
 Brandon High : bh...@freaks.com
 Indecision is the key to flexibility.
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



I'm seeing similar results, though my file systems currently have
de-dupe disabled and only compression enabled, both systems being
snv_129.
An old build 111 host is also sending to the 129 main file server slowly,
when it used to be that the 111 host could send about 25MB/sec over SSH to the
main file server, which used to run 127. Since 128, however, the main
file server has been receiving ZFS snapshots at a fraction of the previous
speed.
129 fixed it a bit; I was literally getting just a couple hundred
-BYTES- a second on 128, but on 129 I can get about 9-10MB/sec if I'm
lucky, and usually 4-5MB/sec. No other configuration changes on the
network occurred, except for my X4540s being upgraded to snv_129.

It does appear to be the zfs receive part, because I can send to
/dev/null at close to 800MB/sec (42 drives in 5-6 disk vdevs, RAID-Z)

Something must've changed in either SSH or the ZFS receive bits to
cause this, but sadly, since I upgraded my pool, I cannot roll back
these hosts  :(

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs zend is very slow

2009-12-16 Thread Brent Jones
On Wed, Dec 16, 2009 at 7:43 PM, Edward Ned Harvey
sola...@nedharvey.com wrote:
 I'm seeing similar results, though my file systems currently have
 de-dupe disabled, and only compression enabled, both systems being

 I can't say this is your issue, but you can count on slow writes with
 compression on.  How slow is slow?  Don't know.  Irrelevant in this case?
 Possibly.



I'm willing to accept slower writes with compression enabled, par for
the course. Local writes, even with compression enabled, can still
exceed 500MB/sec, with moderate to high CPU usage.
These problems seem to have manifested after snv_128, and seemingly
only affect ZFS receive speeds. Local pool performance is still very
fast.

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool fragmentation issues?

2009-12-15 Thread Brent Jones
On Tue, Dec 15, 2009 at 5:28 PM, Bill Sprouse bill.spro...@sun.com wrote:
 Hi Everyone,

 I hope this is the right forum for this question.  A customer is using a
 Thumper as an NFS file server to provide the mail store for multiple email
 servers (Dovecot).  They find that when a zpool is freshly created and
 populated with mail boxes, even to the extent of 80-90% capacity,
 performance is ok for the users, backups and scrubs take a few hours (4TB of
 data). There are around 100 file systems.  After running for a while (couple
 of months) the zpool seems to get fragmented, backups take 72 hours and a
 scrub takes about 180 hours.  They are running mirrors with about 5TB usable
 per pool (500GB disks).  Being a mail store, the writes and reads are small
 and random.  Record size has been set to 8k (improved performance
 dramatically).  The backup application is Amanda.  Once backups become too
 tedious, the remedy is to replicate the pool and start over.  Things get
 fast again for a while.

 Is this expected behavior given the application (email - small, random
 writes/reads)?  Are there recommendations for system/ZFS/NFS configurations
 to improve this sort of thing?  Are there best practices for structuring
 backups to avoid a directory walk?

 Thanks,
 bill
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Any reason in particular they chose to use Dovecot with the old mbox format?
Mbox has been proven many times over to be painfully slow when the
files get larger, and in this day and age, I can't imagine anyone
having a mailbox smaller than 50MB. We have about 30,000 e-mail users
on various systems, and it seems the average size these days is
approaching a GB. Though Dovecot has done a lot to improve
the performance of mbox mailboxes, Maildir might be a better fit for
your system.

I wonder if the soon-to-be-released block/parity rewrite tool will
freshen up a pool that's heavily fragmented, without having to redo
the pools.

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS send/recv extreme performance penalty in snv_128

2009-12-13 Thread Brent Jones
On Sat, Dec 12, 2009 at 8:14 PM, Brent Jones br...@servuhome.net wrote:
 On Sat, Dec 12, 2009 at 11:39 AM, Brent Jones br...@servuhome.net wrote:
 On Sat, Dec 12, 2009 at 7:55 AM, Bob Friesenhahn
 bfrie...@simple.dallas.tx.us wrote:
 On Sat, 12 Dec 2009, Brent Jones wrote:

 I've noticed some extreme performance penalties simply by using snv_128

 Does the 'zpool scrub' rate seem similar to before?  Do you notice any read
 performance problems?  What happens if you send to /dev/null rather than via
 ssh?

 Bob
 --
 Bob Friesenhahn
 bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/


 Scrubs on both systems seem to take about the same amount of time (16
 hours, on a 48TB pool, with about 20TB used)

 I'll test to /dev/null tonight

 --
 Brent Jones
 br...@servuhome.net


 I tested send performance to /dev/null, and I sent a 500GB filesystem
 in just a few minutes.

 The two servers are linked over GigE fiber (between two cities)

 Iperf output:

 [ ID] Interval       Transfer     Bandwidth
 [  5]  0.0-60.0 sec  2.06 GBytes    295 Mbits/sec
 [ ID] Interval       Transfer     Bandwidth
 [  4]  0.0-60.0 sec  2.38 GBytes    341 Mbits/sec

 Usually a bit faster, but some other stuff goes over that pipe.


 Though looking at network traffic between these two hosts during the
 send, I see a lot of network traffic (about 100-150Mbit usually)
 during the send. So there's traffic, but a 100MB send has taken over 10
 minutes and still not complete. But given 100Mbit/sec, it should take
 about 10 seconds roughly, not 10 minutes.
 There is a little bit of disk activity, maybe a MB/sec on average, and
 about 30 iops.
 So it seems the hosts are exchanging a lot of data about the snapshot,
 but not actually replicating any data for a very long time.
 SSH CPU usage is minimal, just a few percent (arcfour, but tried
 others, no difference)

 Odd behavior to be sure, and looks very familiar to what snapshot
 replication did back in build 101, before they made significant speed
 improvements to snapshot replication. Wonder if this is a major
 regression, due to changes in newer ZFS versions, maybe to accommodate
 de-dupe?

 Sadly, I can't roll back, since I already upgraded my pool, but I may
 try upgrading to 129, but my IPS doesn't seem to recognize the newer
 version yet.


 --
 Brent Jones
 br...@servuhome.net


I found some time to dig into my troubles updating to 129 (my dev
repository can no longer be called Dev, must use the opensolaris.org
name, bleh)

But at least build 129 seems to fix this. I'm not sure what the issue is,
but by bouncing between 128 and 129 I can reproduce the terrible ZFS
send/recv times 100% of the time.
129 still isn't as fast as 127 with the same datasets and
configuration, but it's good enough for now.



-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS send/recv extreme performance penalty in snv_128

2009-12-12 Thread Brent Jones
I've noticed some extreme performance penalties simply by using snv_128

I take snapshots and send them over SSH to another server over
Gigabit Ethernet.
Prior to snv_128 (127, and nearly all previous builds), I would get
20-30MB/s.

However, simply image-updating to snv_128 has caused a majority of my
snapshots to do this:

receiving incremental stream of pdxfilu01/vault/0...@20091212-01:15:00
into pdxfilu02/vault/0...@20091212-01:15:00
received 13.8KB stream in 491 seconds (28B/sec)

De-dupe is NOT enabled on any pool, but I have upgraded to the newest
ZFS pool version, which prevents me from rolling back to snv_127,
which would send at many tens of megabytes a second.

This is on an X4540, dual quad cores, and 64GB RAM.

Anyone else seeing similar issues?

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS send/recv extreme performance penalty in snv_128

2009-12-12 Thread Brent Jones
On Sat, Dec 12, 2009 at 7:55 AM, Bob Friesenhahn
bfrie...@simple.dallas.tx.us wrote:
 On Sat, 12 Dec 2009, Brent Jones wrote:

 I've noticed some extreme performance penalties simply by using snv_128

 Does the 'zpool scrub' rate seem similar to before?  Do you notice any read
 performance problems?  What happens if you send to /dev/null rather than via
 ssh?

 Bob
 --
 Bob Friesenhahn
 bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/


Scrubs on both systems seem to take about the same amount of time (16
hours, on a 48TB pool, with about 20TB used).

I'll test to /dev/null tonight.

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS send/recv extreme performance penalty in snv_128

2009-12-12 Thread Brent Jones
On Sat, Dec 12, 2009 at 11:39 AM, Brent Jones br...@servuhome.net wrote:
 On Sat, Dec 12, 2009 at 7:55 AM, Bob Friesenhahn
 bfrie...@simple.dallas.tx.us wrote:
 On Sat, 12 Dec 2009, Brent Jones wrote:

 I've noticed some extreme performance penalties simply by using snv_128

 Does the 'zpool scrub' rate seem similar to before?  Do you notice any read
 performance problems?  What happens if you send to /dev/null rather than via
 ssh?

 Bob
 --
 Bob Friesenhahn
 bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/


 Scrubs on both systems seem to take about the same amount of time (16
 hours, on a 48TB pool, with about 20TB used)

 I'll test to /dev/null tonight

 --
 Brent Jones
 br...@servuhome.net


I tested send performance to /dev/null, and I sent a 500GB filesystem
in just a few minutes.

The two servers are linked over GigE fiber (between two cities)

Iperf output:

[ ID] Interval   Transfer Bandwidth
[  5]  0.0-60.0 sec  2.06 GBytes295 Mbits/sec
[ ID] Interval   Transfer Bandwidth
[  4]  0.0-60.0 sec  2.38 GBytes341 Mbits/sec

Usually a bit faster, but some other stuff goes over that pipe.


Looking at network traffic between these two hosts during the send,
I see a lot of traffic (about 100-150Mbit usually). So there's traffic,
but a 100MB send has taken over 10 minutes and is still not complete.
Given 100Mbit/sec, it should take roughly 10 seconds, not 10 minutes.
There is a little bit of disk activity, maybe a MB/sec on average, and
about 30 iops.
So it seems the hosts are exchanging a lot of data about the snapshot,
but not actually replicating any data for a very long time.
SSH CPU usage is minimal, just a few percent (arcfour, but tried
others, no difference)
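
To separate the stages, the same snapshot can be pushed through each leg
of the pipeline in turn (names below are placeholders):

# raw send speed, no network involved
time zfs send tank/fs@snap > /dev/null

# network + ssh only, discard on the far side
time zfs send tank/fs@snap | ssh remotehost "cat > /dev/null"

# the full path, including the receive
time zfs send tank/fs@snap | ssh remotehost "zfs recv -F -d tank"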

Odd behavior to be sure, and it looks very similar to what snapshot
replication did back in build 101, before they made significant speed
improvements to snapshot replication. I wonder if this is a major
regression, due to changes in newer ZFS versions, maybe to accommodate
de-dupe?

Sadly, I can't roll back since I already upgraded my pool, but I may
try upgrading to 129, though my IPS doesn't seem to recognize the newer
version yet.


-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled

2009-12-08 Thread Brent Jones
On Tue, Dec 8, 2009 at 6:36 PM, Jack Kielsmeier jac...@netins.net wrote:
 Ah, good to know! I'm learning all kinds of stuff here :)

 The command (zpool import) is still running and I'm still seeing disk 
 activity.

 Any rough idea as to how long this command should last? Looks like each disk 
 is being read at a rate of 1.5-2 megabytes per second.

 Going worst case, assuming each disk is  1572864 megs (the 1.5TB disks are 
 actually smaller than this due to the 'rounding' drive manufacturers do) and 
 2 megs/sec read rate per disk, that means hopefully at most I should have to 
 wait:

 1572864(megs) / 2(megs/second) / 60 (seconds / minute) / 60 (minutes / hour) 
 / 24 (hour / day):

 9.1 days

 Again, I don't know if the zpool import is looking at the entire contents of 
 the disks, or what exactly it's doing, but I'm hoping that would be the 
 'maximum' I'd have to wait for this command to finish :)
 --
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


I submitted a bug a while ago about this:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6855208

I'll escalate since I have a support contract. But yes, I see this as
a serious bug; I thought my machine had locked up entirely as well, and
it took about 2 days to finish a destroy on a volume about 12TB in size.

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] flar and tar the best way to backup S10 ZFS only?

2009-11-23 Thread Brent Jones
On Mon, Nov 23, 2009 at 8:24 PM, Trevor Pretty
trevor_pre...@eagle.co.nz wrote:

 I'm persuading a customer that when he goes to S10 he should use ZFS for
 everything. We only have one M3000 and a J4200 connected to it. We are not
 talking about a massive site here with a SAN etc. The M3000 is their
 mainframe. His RTO and RPO are both about 12 hours, his business gets
 difficult without the server but does not die horribly.

 He currently uses ufsdump to tape each night which is sent off site. However,
 ufsrestore -i has saved his bacon in the past and he does not want to lose
 this functionality.

 A couple of questions.

 flar seems to work with ZFS quite well and will backup the whole root pool
 flar(1M)

 This seems to be the best way to get the equivalent of ufsrestore -r and a
 great way to recover in a DR event:-
 http://www.sun.com/bigadmin/content/submitted/flash_archive.jsp

 My Questions...

 Q: Is there the equivalent of ufsretore -i with flar? (which seems to be an
 ugly shell script around cpio or pax)

 Q: Therefore should I have a tar of the root pool as well?

 Q: There is no reason I cannot use flar on the other non root pools?

 Q: Or is tar better for the non root pools?

 We will have LOTS of disk space, his whole working dataset will easily fit
 onto an LTO4, so can anybody think of good a reason why you would not flar
 the root pool into another pool and then just tar off this pool each night
 to tape? In fact we will have so much disk space (compared to now) I expect
 we will be able to keep most backups on-line for quite some time.


 Discuss :-)


 --
 Trevor Pretty | Technical Account Manager | T: +64 9 639 0652 | M: +64 21
 666 161
 Eagle Technology Group Ltd.
 Gate D, Alexandra Park, Greenlane West, Epsom
 Private Bag 93211, Parnell, Auckland

 www.eagle.co.nz

 This email is confidential and may be legally privileged. If received in
 error please destroy and immediately notify us.
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



With the cost of tapes, drives, and an off-site storage service (unless
it's stored at the owner's home), you could probably co-locate a server
with fast internet connectivity and a bundle of local storage, and just
ZFS-snapshot your relevant pools to that server.

I also second Richard's recommendation of Amanda; it is a pretty
flexible solution, and it can back up much more than just local ZFS
snapshots if that would be a benefit to you as well.


-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Comstar thin provisioning space reclamation

2009-11-19 Thread Brent Jones
On Wed, Nov 18, 2009 at 4:09 PM, Brent Jones br...@servuhome.net wrote:
 On Tue, Nov 17, 2009 at 10:32 AM, Ed Plese e...@edplese.com wrote:
 You can reclaim this space with the SDelete utility from Microsoft.
 With the -c option it will zero any free space on the volume.  For
 example:

 C:\sdelete -c C:

 I've tested this with xVM and with compression enabled for the zvol,
 but it worked very well.


 Ed Plese



 It seems the compression setting on the zvol is key here. Tried
 without compression turned on, and the thin provisioned file grew to
 its maximum size.
 I'm re-running it on the same volume, this time with compression
 turned on to see how it behaves next  :)



 --
 Brent Jones
 br...@servuhome.net


Turning compression on was the key. Reclaimed about 5TB of space
running sdelete (though it takes a very long time)
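
For anyone wanting to try the same thing, the steps are roughly the
following (a sketch only; the zvol name is a placeholder for whatever
backs your LU):

  # On the Solaris side, before zeroing free space from the initiator:
  zfs set compression=on tank/iscsivol
  zfs get compression,used,referenced tank/iscsivol

  # Then on the Windows initiator, as Ed described:
  C:\sdelete -c C:

With compression enabled, the zeroed blocks compress down to nearly
nothing, which is what lets the zvol's usage shrink back down.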


-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Comstar thin provisioning space reclamation

2009-11-18 Thread Brent Jones
On Tue, Nov 17, 2009 at 10:32 AM, Ed Plese e...@edplese.com wrote:
 You can reclaim this space with the SDelete utility from Microsoft.
 With the -c option it will zero any free space on the volume.  For
 example:

 C:\sdelete -c C:

 I've tested this with xVM and with compression enabled for the zvol,
 but it worked very well.


 Ed Plese



It seems the compression setting on the zvol is key here. Tried
without compression turned on, and the thin provisioned file grew to
its maximum size.
I'm re-running it on the same volume, this time with compression
turned on to see how it behaves next  :)



-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Comstar thin provisioning space reclamation

2009-11-17 Thread Brent Jones
I use several file-backed thin provisioned iSCSI volumes presented over Comstar.
The initiators are Windows 2003/2008 systems with the MS MPIO initiator.

The Windows systems only claim to be using about 4TB of space, but the
ZFS volume says 7.12TB is used.
Granted, I imagine ZFS allocates the blocks as soon as Windows writes
to them, and Windows will eventually no longer need that space.
Is there a way to reclaim unused space on a thin-provisioned iSCSI target?



-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ..and now ZFS send dedupe

2009-11-09 Thread Brent Jones
On Mon, Nov 9, 2009 at 12:45 PM, Nigel Smith
nwsm...@wilusa.freeserve.co.uk wrote:
 More ZFS goodness putback before close of play for snv_128.

  http://mail.opensolaris.org/pipermail/onnv-notify/2009-November/010768.html

  http://hg.genunix.org/onnv-gate.hg/rev/216d8396182e

 Regards
 Nigel Smith
 --
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Are these recent developments due to help/support from Oracle?
Or is it business as usual for ZFS developments?

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs mount error

2009-11-03 Thread Brent Jones
On Mon, Nov 2, 2009 at 1:34 PM, Ramin Moazeni ramin.moaz...@sun.com wrote:
 Hello

 A customer recently had a power outage.  Prior to the outage, they did a
 graceful shutdown of their system.
 On power-up, the system is not coming up due to zfs errors as follows:
 cannot mount 'rpool/export': Number of symbolic links encountered during
 path name traversal exceeds MAXSYMLINKS
 mount '/export/home': failed to create mountpoint.

 The possible cause of this might be that a symlink is created pointing to
 itself since the customer stated
 that they created lots of symlink to get their env ready. However, since
 /export is not getting mounted, they
 can not go back and delete/fix the symlinks.

 Can someone suggest a way to fix this issue?

 Thanks
 Ramin Moazeni
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


I see these very frequently on my systems, regardless of whether the
shutdown was clean; about 1/3 of the time filesystems fail to mount.

What I do is boot into single-user mode, make sure the filesystem in
question is NOT mounted, and just delete the directory that it is
trying to mount into.
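
A rough sketch of that recovery, assuming rpool/export is the affected
dataset and /export/home is the stale mountpoint (adjust the names to
your layout, and double-check nothing is actually mounted there first):

  zfs unmount rpool/export 2>/dev/null   # make sure it is not mounted
  ls -la /export/home                    # see what is in the way
  rm -rf /export/home                    # remove the stale directory
  zfs mount -a                           # ZFS recreates the mountpoint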



-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] million files in single directory

2009-10-03 Thread Brent Jones
On Sat, Oct 3, 2009 at 6:50 PM, Jeff Haferman j...@haferman.com wrote:

 A user has 5 directories, each has tens of thousands of files, the
 largest directory has over a million files.  The files themselves are
 not very large, here is an ls -lh on the directories:
 [these are all ZFS-based]

 [r...@cluster]# ls -lh
 total 341M
 drwxr-xr-x+ 2 someone cluster  13K Sep 14 19:09 0/
 drwxr-xr-x+ 2 someone cluster  50K Sep 14 19:09 1/
 drwxr-xr-x+ 2 someone cluster 197K Sep 14 19:09 2/
 drwxr-xr-x+ 2 someone cluster 785K Sep 14 19:09 3/
 drwxr-xr-x+ 2 someone cluster 3.1M Sep 14 19:09 4/

 When I go into directory 0, it takes about a minute for an ls -1 |
 grep wc to return (it has about 12,000 files).  Directory 1 takes
 between 5-10 minutes for the same command to return (it has about 50,000
 files).

 I did an rsync of this directory structure to another filesystem
 [lustre-based, FWIW] and it took about 24 hours to complete.  We have
 done rsyncs on other directories that are much larger in terms of
 file-sizes, but have thousands of files rather than tens, hundreds, and
 millions of files.

 Is there someway to speed up simple things like determining the
 contents of these directories?  And why does an rsync take so much
 longer on these directories when directories that contain hundreds of
 gigabytes transfer much faster?

 Jeff

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Be happy you don't have Windows + NTFS with hundreds of thousands, or
millions, of files.
Explorer will crash, run your system out of memory and slow it down,
or outright hard-lock Windows for hours on end.
This is on brand-new hardware: 64-bit, 32GB RAM, and 15k SAS disks.

Regardless of filesystem, I'd suggest splitting your directory
structure into a hierarchy. It makes sense even just for cleanliness.
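
As a rough illustration (paths and names are made up), fanning a flat
directory out into subdirectories keyed on the first two characters of
each file name is usually enough to keep any single directory small:

  cd /pool/dataset/4
  for f in *; do
      [ "$f" = bydir ] && continue        # skip the target dir itself
      d=`echo "$f" | cut -c1-2`
      mkdir -p "bydir/$d"
      mv "./$f" "bydir/$d/"
  done

The application then computes the same prefix when it needs a file, so
lookups stay cheap even as the file count grows.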


-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] snv_110 - snv_121 produces checksum errors on Raid-Z pool

2009-09-02 Thread Brent Jones
On Wed, Sep 2, 2009 at 6:27 AM, Frank Middletonf.middle...@apogeect.com wrote:
 On 09/02/09 05:40 AM, Henrik Johansson wrote:

 For those of us which have already upgraded and written data to our
 raidz pools, are there any risks of inconsistency, wrong checksums in
 the pool? Is there a bug id?

 This may not be a new problem insofar as it may also affect mirrors.
 As part of the ancient mirrored drives should not have checksum
 errors thread, I used Richard Elling's amazing zcksummon script
 http://www.richardelling.com/Home/scripts-and-programs-1/zcksummon
 to help diagnose this (thanks, Richard, for all your help).

 The bottom line is that hardware glitches (as found on cheap PCs
 without ECC on buses and memory) can put ZFS into a mode where it
 detects bogus checksum errors. If you set copies=2, it seems to
 always be able to repair them, but they are never actually repaired.
 Every time you scrub, it finds a checksum error on the affected file(s)
 and it pretends to repair it (or may fail if you have copies=1 set).

 Note: I have not tried this on raidz, only mirrors, where it is
 highly reproducible. It would be really interesting to see if
 raidz gets results similar to the mirror case when running zcksummon.
 Note I have NEVER had this problem on SPARC, only on certain
 bargain-basement PCs (used as X-Terminals) which as it turns out
 have mobos notorious for not detecting bus parity errors.

 If this is the same problem, you can certainly mitigate it by
 setting copies=2 and actually copying the files (e.g., by
 promoting a snapshot, which I believe will do this - can someone
 confirm?). My guess is that snv121 has done something to make
 the problem more likely to occur, but the problem itself is
 quite old (predates snv100). Could you share with us some details
 of your hardware, especially how much memory and if it has ECC
 or bus parity?

 Cheers -- Frank

 On 09/02/09 05:40 AM, Henrik Johansson wrote:

 Hi Adam,


 On Sep 2, 2009, at 1:54 AM, Adam Leventhal wrote:

 Hi James,

 After investigating this problem a bit I'd suggest avoiding deploying
 RAID-Z
 until this issue is resolved. I anticipate having it fixed in build 124.


 Regards

 Henrik
 http://sparcv9.blogspot.com http://sparcv9.blogspot.com/


 

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


I see this issue on each of my X4540's: 64GB of ECC memory, 1TB drives.
Rolling back to snv_118 does not reveal any checksum errors, only snv_121.

So the commodity-hardware explanation doesn't hold up here, unless Sun
isn't validating its own equipment (not likely, as these servers have
had no hardware issues prior to this build).


-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Petabytes on a budget - blog

2009-09-02 Thread Brent Jones
On Wed, Sep 2, 2009 at 12:12 PM, Roland Rambauroland.ram...@sun.com wrote:
 Jacob,

 Jacob Ritorto schrieb:

 Torrey McMahon wrote:

 3) Performance isn't going to be that great with their design but...they
 might not need it.


 Would you be able to qualify this assertion?  Thinking through it a bit,
 even if the disks are better than average and can achieve 1000Mb/s each,
 each uplink from the multiplier to the controller will still have 1000Gb/s
 to spare in the slowest SATA mode out there.  With (5) disks per multiplier
 * (2) multipliers * 1000GB/s each, that's 1Gb/s at the PCI-e interface,
 which approximately coincides with a meager 4x PCI-e slot.

 they use a 85$ PC motherboard - that does not have meager 4x PCI-e slots,
 it has one 16x and 3 *1x* PCIe slots, plus 3 PCI slots ( remember, long time
 ago: 32-bit wide 33 MHz, probably shared bus ).

 Also it seems that all external traffic uses the single GbE motherboard
 port.

  -- Roland


 --

 **
 Roland Rambau                 Platform Technology Team
 Principal Field Technologist  Global Systems Engineering
 Phone: +49-89-46008-2520      Mobile:+49-172-84 58 129
 Fax:   +49-89-46008-      mailto:roland.ram...@sun.com
 **
    Sitz der Gesellschaft: Sun Microsystems GmbH,
    Sonnenallee 1, D-85551 Kirchheim-Heimstetten
    Amtsgericht München: HRB 161028;  Geschäftsführer:
    Thomas Schröder, Wolfgang Engels, Wolf Frenkel
    Vorsitzender des Aufsichtsrates:   Martin Häring
 *** UNIX * /bin/sh  FORTRAN **
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Probably for their usage patterns, these boxes make sense. But I
concur that the reliability and performance would be very suspect to
any organization which values their data in any fashion.
Personally, I have some old Dual P3 systems still running fine at
home, on what were cheap motherboards. But would I advocate such a
system to protect business data? Not a chance.

I'm sure at the price they offer storage, this was the only way they
could be profitable, and it's a pretty creative solution.
For my personal data backups, I'm sure their service would meet all my
needs, but that's about as far as I would trust these systems - MP3s
and photo backups of which I already maintain a couple of other copies.


-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4540 dead HDD replacement, remains configured.

2009-08-06 Thread Brent Jones
On Wed, Aug 5, 2009 at 11:48 PM, Jorgen Lundmanlund...@gmo.jp wrote:

 I suspect this is what it is all about:

  # devfsadm -v
 devfsadm[16283]: verbose: no devfs node or mismatched dev_t for
 /devices/p...@0,0/pci10de,3...@b/pci1000,1...@0/s...@5,0:a
 [snip]

 and indeed:

 brw-r-   1 root     sys       30, 2311 Aug  6 15:34 s...@4,0:wd
 crw-r-   1 root     sys       30, 2311 Aug  6 15:24 s...@4,0:wd,raw
 drwxr-xr-x   2 root     sys            2 Aug  6 14:31 s...@5,0
 drwxr-xr-x   2 root     sys            2 Apr 17 17:52 s...@6,0
 brw-r-   1 root     sys       30, 2432 Jul  6 09:50 s...@6,0:a
 crw-r-   1 root     sys       30, 2432 Jul  6 09:48 s...@6,0:a,raw

 Perhaps because it was booted with the dead disk in place, it never
 configured the entire sd5 mpt driver. Why the other hard-disks work I
 don't know.

 I suspect the only way to fix this, is to reboot again.

 Lund



I have a pair of X4540's also, and getting any kind of drive status
or failure alert is a lost cause.
I've opened several cases with Sun with the following issues:

ILOM/BMC can't see any drives (status, FRU, firmware, etc)
FMA cannot see a drive failure (you can pull a drive, and it could be
hours before 'zpool status' will show a failed drive, even during a
'zpool scrub')
Hot-swapping drives rarely works; the system will not see the new drive
until a reboot

Things I've tried that Sun has suggested:

New BIOS
New controller firmware
New ILOM firmware
Upgrading to new releases of Osol (currently on 118, no luck)
Replacing ILOM card
Custom FMA configs

Nothing works, and my cases with Sun have been open for about 6 months
now, with no resolution in sight.

Given that Sun now makes the 7000, I can only assume their support on
the more whitebox version, AKA X4540, is either near an end, or they
don't intend to support any advanced monitoring whatsoever.

Sad, really, as my $900 Dell and HP servers can send SMS, Jabber
messages, SNMP traps, etc., on ANY IPMI event, hardware issue, and what
have you, without any tinkering or excuses.


-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Managing ZFS Replication

2009-07-31 Thread Brent Jones
On Fri, Jul 31, 2009 at 10:25 AM, Joseph L.
Casalejcas...@activenetwerx.com wrote:
I came up with a somewhat custom script, using some pre-existing
scripts I found about the land.

http://www.brentrjones.com/?p=45

 Brent,
 That was super helpful. I had to make some simple changes to the ssh
 syntax as I use a specific user and identity file going from Solaris
 10 to OpenSolaris 0906 but I am getting this message:

 The Source snapshot does exist on the Destination, clear to send a new one!
 Taking snapshot: /sbin/zfs snapshot mypool2/back...@2009-07-31t16:34:54Z
 receiving incremental stream of mypool2/back...@2009-07-31t16:34:54Z into 
 mypool/back...@2009-07-31t16:34:54Z
 received 39.7GB stream in 2244 seconds (18.1MB/sec)
 cannot set property for 'mypool2/back...@2009-07-31t16:34:54Z': snapshot 
 properties cannot be modified
 cannot set property for 'mypool2/back...@2009-58-30t21:58:15Z': snapshot 
 properties cannot be modified
 cannot set property for 'mypool2/back...@2009-07-31t16:34:54Z': snapshot 
 properties cannot be modified

 Is that intended to modify the properties of a snapshot? Does that work
 in some other version of Solaris other than 10u7?

 Thanks so much for that pointer!
 jlc
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


If I recall correctly, modifiable snapshot properties aren't supported
in older versions of ZFS :(
I wrote the script on Opensolaris 2008.11, which did have modifiable
snapshot properties.
Can you upgrade your pool versions possibly?
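
If upgrading is an option, it is just the following (a sketch; the pool
and filesystem names are placeholders, and the upgrade is one-way, so
make sure every host that touches the pool supports the newer versions
first):

  zpool upgrade            # lists pools below the current version
  zpool upgrade mypool2
  zfs upgrade -r mypool2

Setting user properties on snapshots should then work, assuming the
running build itself is new enough.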


-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Managing ZFS Replication

2009-07-30 Thread Brent Jones
On Thu, Jul 30, 2009 at 3:54 PM, Joseph L.
Casalejcas...@activenetwerx.com wrote:
 Anyone come up with a solution to manage the replication of ZFS snapshots?
 The send/recv criteria gets tricky with all but the first unless you purge
 the destination of snapshots, then force a full stream into it.

 I was hoping to script a daily update but I see that I would have to keep 
 track
 of what's been done on both sides when using the -i|I syntax so it would not
 be reliable in a hands off script.

 Would AVS be a possible solution in a mixed S10/Osol/SXCE environment? I 
 presume
 that would make it fairly trivially but right now I am duplicating data from 
 an
 s10 box to an osol snv118 box based on hardware/application needs forcing the
 two platforms.

 Thanks for any ideas!
 jlc
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


I came up with a somewhat custom script, using some pre-existing
scripts I found about the land.

http://www.brentrjones.com/?p=45

I schedule some file systems every 5 minutes, hour, and nightly
depending on requirements. It has worked quite well for me, and proved
to be quite useful in restoring as well (already had to use it).
E-mails status reports, handles conflicts in a simple but effective
way, and replication can be reversed by just starting to run it from
the other system.

I expanded on it by being able to handle A-B and B-A replication
(mirror half of A to B, and half of B to A for paired redundancy).
I'll post that version up in a few weeks when I clean it up a little.
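
At its core the script just wraps an incremental send piped through
ssh, roughly like this (host, pool, and snapshot names are made up):

  PREV=tank/projects@2009-07-29
  CURR=tank/projects@2009-07-30
  zfs snapshot "$CURR"
  zfs send -i "$PREV" "$CURR" | ssh backuphost /sbin/zfs recv -vFd backuppool

The rest is bookkeeping: picking the right snapshot to use as the
increment base, locking so two runs don't overlap, and logging/emailing
the result.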

Credits go to Constantin Gonzalez for inspiration and source for parts
of my script.
http://blogs.sun.com/constantin/


-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs destroy slow?

2009-07-28 Thread Brent Jones
On Mon, Jul 27, 2009 at 3:58 AM, Markus Koveromarkus.kov...@nebula.fi wrote:
 Oh well, whole system seems to be deadlocked.

 nice. Little too keen keeping data safe :-P



 Yours

 Markus Kovero



 From: zfs-discuss-boun...@opensolaris.org
 [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Markus Kovero
 Sent: 27 July 2009 13:39
 To: zfs-discuss@opensolaris.org
 Subject: [zfs-discuss] zfs destroy slow?



 Hi, how come zfs destroy being so slow, eg. destroying 6TB dataset renders
 zfs admin commands useless for time being, in this case for hours?

 (running osol 111b with latest patches.)



 Yours

 Markus Kovero

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



I submitted a bug, but I don't think it's been assigned a case number yet.
I see this exact same behavior on my X4540's. I create a lot of
snapshots, and when I tidy up, zfs destroy can 'stall' any and all
ZFS-related commands for hours, or even days (in the case of nested
snapshots).
The only workaround is to never use zfs destroy, or to simply wait it
out. It will eventually finish, just not in any reasonable timeframe.

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs destroy slow?

2009-07-28 Thread Brent Jones



 I submitted a bug, but I don't think its been assigned a case number yet.
 I see this exact same behavior on my X4540's. I create a lot of
 snapshots, and when I tidy up, zfs destroy can 'stall' any and all ZFS
 related commands for hours, or even days (in the case of nested
 snapshots).
 The only resolution is not to ever use zfs destroy, or just simply
 wait it out. It will eventually finish, just not in any reasonable
 timeframe.

 --
 Brent Jones
 br...@servuhome.net


Correction, looks like my bug is 6855208

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Opensolaris attached to 70 disk HP array

2009-07-23 Thread Brent Jones
Looking at this external array by HP:
http://h18006.www1.hp.com/products/storageworks/600mds/index.html

70 disks in 5U, which could probably be configured in JBOD.
Has anyone attempted to connect this to a box running OpenSolaris to
create a 70-disk pool?

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Another user looses his pool (10TB) in this case and 40 days work

2009-07-19 Thread Brent Jones
On Sat, Jul 18, 2009 at 7:39 PM, Russelno-re...@opensolaris.org wrote:
 Yes you'll find my name all over VB at the moment, but I have found it to be 
 stable
 (don't install the addons disk for solaris!!, use 3.0.2, and for me 
 winXP32bit and
 OpenSolaris 2009.6 has been rock solid, it was (seems) to be opensolaris 
 failed
 with extract_boot_list doesn't belong to 101, but noone on opensol, seems
 interested about it as other have reported it to, prob a rare issue.

 But yer, I hope Vicktor or someone will take a look. My worry is that if we
 can't recover from this, which a number of people (in variuos forms) have 
 come accross zfs may be introuble. We had this happen at work about 18 months 
 ago
 lost all the data (20TB)(didn't know about zdb nor did sun support) so we 
 have start
 to back away, but I though since jan 2009 patches things were meant to be 
 alot better, esp with sun using it in there storage servers now
 --
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


No offense, but you trusted 10TB of important data to OpenSolaris
running inside VirtualBox (not stable) on top of Windows XP (arguably
not stable, especially for production), on probably consumer-grade
hardware with unknown support for any of the above products?

I'd like to say this was an unfortunate circumstance, but there are
many levels of failure here; blaming ZFS seems misplaced, and the
subject of this thread is especially inflammatory.



-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Hanging receive

2009-07-03 Thread Brent Jones
On Fri, Jul 3, 2009 at 8:31 PM, Ian Collinsi...@ianshome.com wrote:
 Ian Collins wrote:

 I was doing an incremental send between pools, the receive side is locked
 up and no zfs/zpool commands work on that pool.

 The stacks look different from those reported in the earlier ZFS snapshot
 send/recv hangs X4540 servers thread.

 Here is the process information from scat (other commands hanging on the
 pool are also in cv_wait):

 Has anyone else seen anything like this?  The box wouldn't even reboot, it
 had to be power cycled.  It locks up on receive regularly now.

 SolarisCAT(live/10X) proc -L 18500
      addr         PID    PPID   RUID/UID     size      RSS     swresv
 time  command
 == == == == ==  
 == =
 0xffc8d1990398  18500  14729          0    5369856  2813952  1064960
   32 zfs receive -v -d backup

  user (LWP_SYS) thread: 0xfe84e0d5bc20  PID: 18500 
 cmd: zfs receive -v -d backup
 t_wchan: 0xa0ed62a2  sobj: condition var (from
 zfs:txg_wait_synced+0x83)
 t_procp: 0xffc8d1990398
  p_as: 0xfee19d29c810  size: 5369856  RSS: 2813952
  hat: 0xfedb762d2818  cpuset:
  zone: global
 t_stk: 0xfe8000143f10  sp: 0xfe8000143b10  t_stkbase:
 0xfe800013f000
 t_pri: 59(TS)  pctcpu: 0.00
 t_lwp: 0xfe84e92d6ec0  lwp_regs: 0xfe8000143f10
  mstate: LMS_SLEEP  ms_prev: LMS_SYSTEM
  ms_state_start: 15 minutes 4.476756638 seconds earlier
  ms_start: 15 minutes 8.447715668 seconds earlier
 psrset: 0  last CPU: 2
 idle: 102425 ticks (17 minutes 4.25 seconds)
 start: Thu Jul  2 22:23:06 2009
 age: 1029 seconds (17 minutes 9 seconds)
 syscall: #54 ioctl(, 0x0) (sysent: genunix:ioctl+0x0)
 tstate: TS_SLEEP - awaiting an event
 tflg:   T_DFLTSTK - stack is default size
 tpflg:  TP_TWAIT - wait to be freed by lwp_wait
       TP_MSACCT - collect micro-state accounting information
 tsched: TS_LOAD - thread is in memory
       TS_DONT_SWAP - thread/LWP should not be swapped
 pflag:  SKILLED - SIGKILL has been posted to the process
       SMSACCT - process is keeping micro-state accounting
       SMSFORK - child inherits micro-state accounting

 pc:      unix:_resume_from_idle+0xf8 resume_return:  addq   $0x8,%rsp

 unix:_resume_from_idle+0xf8 resume_return()
 unix:swtch+0x12a()
 genunix:cv_wait+0x68()
 zfs:txg_wait_synced+0x83()
 zfs:dsl_sync_task_group_wait+0xed()
 zfs:dsl_sync_task_do+0x54()
 zfs:dmu_objset_create+0xc5()
 zfs:zfs_ioc_create+0xee()
 zfs:zfsdev_ioctl+0x14c()
 genunix:cdev_ioctl+0x1d()
 specfs:spec_ioctl+0x50()
 genunix:fop_ioctl+0x25()
 genunix:ioctl+0xac()
 unix:_syscall32_save+0xbf()
 -- switch to user thread's user stack --

 The box is an x4500, Solaris 10u7.



 --
 Ian.

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


I hit this too:
6826836

Fixed in 117

http://opensolaris.org/jive/thread.jspa?threadID=104852&tstart=120


-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-30 Thread Brent Jones
On Tue, Jun 30, 2009 at 12:25 PM, Bob
Friesenhahnbfrie...@simple.dallas.tx.us wrote:
 On Mon, 29 Jun 2009, Lejun Zhu wrote:

 With ZFS write throttle, the number 2.5GB is tunable. From what I've read
 in the code, it is possible to e.g. set zfs:zfs_write_limit_override =
 0x8000000 (bytes) to make it write 128M instead.

 This works, and the difference in behavior is profound.  Now it is a matter
 of finding the best value which optimizes both usability and performance.
  A tuning for 384 MB:

 # echo zfs_write_limit_override/W0t402653184 | mdb -kw
 zfs_write_limit_override:       0x3000      =       0x1800

 CPU is smoothed out quite a lot and write latencies (as reported by a
 zio_rw.d dtrace script) are radically different than before.

 Perfmeter display for 256 MB:
 http://www.simplesystems.org/users/bfriesen/zfs-discuss/perfmeter-256mb.png

 Perfmeter display for 384 MB:
 http://www.simplesystems.org/users/bfriesen/zfs-discuss/perfmeter-384mb.png

 Perfmeter display for 768 MB:
 http://www.simplesystems.org/users/bfriesen/zfs-discuss/perfmeter-768mb.png

 Bob
 --
 Bob Friesenhahn
 bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Maybe there could be a supported ZFS tunable (per file system, even?)
that is optimized for 'background' versus 'foreground' tasks.

Beyond that, I will give this tunable a shot and see how it impacts
my own workload.
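
For reference, a sketch of the two ways to set it (the 384 MB value is
just the one Bob used above; treat it as a starting point rather than a
recommendation):

  In /etc/system (persistent across reboots):
    set zfs:zfs_write_limit_override=0x18000000

  Or on the live kernel, as shown earlier in the thread:
    echo zfs_write_limit_override/W0t402653184 | mdb -kw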

Thanks!

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-29 Thread Brent Jones
On Mon, Jun 29, 2009 at 2:48 PM, Bob
Friesenhahnbfrie...@simple.dallas.tx.us wrote:
 On Wed, 24 Jun 2009, Lejun Zhu wrote:

 There is a bug in the database about reads blocked by writes which may be
 related:

 http://bugs.opensolaris.org/view_bug.do?bug_id=6471212

 The symptom is sometimes reducing queue depth makes read perform better.

 I have been banging away at this issue without resolution.  Based on Roch
 Bourbonnais's blog description of the ZFS write throttle code, it seems that
 I am facing a perfect storm.  Both the storage write bandwidth (800+
 MB/second) and the memory size of my system (20 GB) result in the algorithm
 batching up 2.5 GB of user data to write. Since I am using mirrors, this
 results in 5 GB of data being written at full speed to the array on a very
 precise schedule since my application is processing fixed-sized files with a
 fixed algorithm. The huge writes lead to at least 3 seconds of read
 starvation, resulting in a stalled application and a square-wave of system
 CPU utilization.  I could attempt to modify my application to read ahead by
 3 seconds but that would require gigabytes of memory, lots of complexity,
 and would not be efficient.

 Richard Elling thinks that my array is pokey, but based on write speed and
 memory size, ZFS is always going to be batching up data to fill the write
 channel for 5 seconds so it does not really matter how fast that write
 channel is.  If I had 32GB of RAM and 2X the write speed, the situation
 would be identical.

 Hopefully someone at Sun is indeed working this read starvation issue and it
 will be resolved soon.

 Bob
 --
 Bob Friesenhahn
 bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


I see similar square-wave performance. However, my load is primarily
write-based: when those commits happen, I see all network activity
pause while the buffer is committed to disk.
I write about 750Mbit/sec over the network to the X4540's during
backup windows, primarily via iSCSI. When those writes are flushed to
my RAID-Z volume, all activity pauses until they complete.

One thing to note: on 117 the effect seems reduced and performance is
a bit more even, but it is still there.

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [storage-discuss] ZFS snapshot send/recv hangs X4540 servers

2009-06-28 Thread Brent Jones
On Fri, Jun 26, 2009 at 10:14 AM, Brent Jonesbr...@servuhome.net wrote:
 On Thu, Jun 25, 2009 at 12:00 AM, James Leverj...@jamver.id.au wrote:

 On 25/06/2009, at 4:38 PM, John Ryan wrote:

 Can I ask the same question - does anyone know when the 113 build will
 show up on pkg.opensolaris.org/dev ?

 On 24/06/2009, at 9:49 PM, Dave Miner wrote to indiana-discuss:

 There were problems with 116 that caused us to not release it.  117 is
 under construction, available in the next few days.

 cheers,
 James
 ___
 storage-discuss mailing list
 storage-disc...@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/storage-discuss


 I checked this morning, and 117 is available now

 --
 Brent Jones
 br...@servuhome.net


Confirming this issue is fixed on build 117.
Snapshots are significantly faster as well. My average transfer speed
went from about 15MB/sec to over 40MB/sec. I imagine that 40MB/sec is
now a limitation of the CPU, as I can see SSH maxing out a single core
on the quad cores.
Maybe SSH can be made multi-threaded next?  :)


-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS for iSCSI based SAN

2009-06-26 Thread Brent Jones
On Fri, Jun 26, 2009 at 6:04 PM, Bob
Friesenhahnbfrie...@simple.dallas.tx.us wrote:
 On Fri, 26 Jun 2009, Scott Meilicke wrote:

 I ran the RealLife iometer profile on NFS based storage (vs. SW iSCSI),
 and got nearly identical results to having the disks on iSCSI:

 Both of them are using TCP to access the server.

 So it appears NFS is doing syncs, while iSCSI is not (See my earlier zpool
 iostat data for iSCSI). Isn't this what we expect, because NFS does syncs,
 while iSCSI does not (assumed)?

 If iSCSI does not do syncs (presumably it should when a cache flush is
 requested) then NFS is safer in case the server crashes and reboots.

 Bob
 --
 Bob Friesenhahn
 bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


I'll chime in here as I've had experience with this subject as well
(ZFS NFS/iSCSI).
It depends on your NFS client!

I was using the FreeBSD NFSv3 client, which by default does an fsync()
for every NFS block (8KB afaik).
However, I changed the source and recompiled so it would only fsync()
on file close or, I believe, after 5MB. I went from 3MB/sec to over
100MB/sec after my change.
I detailed my struggle here:

http://www.brentrjones.com/?p=29

As for iSCSI, I am currently benchmarking the COMSTAR iSCSI target. I
previously used the old iscsitgtd framework with ZFS, and with it I
would get about 35-40MB/sec.
My initial testing with the new COMSTAR iSCSI target is not revealing
any substantial performance increase.
I've tried zvol-based LUs and file-based LUs, with no perceived
performance difference at all.
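
For context, a zvol-backed LU under COMSTAR is created more or less
like this (sizes and names are placeholders; the GUID comes from the
create-lu output):

  zfs create -V 500g tank/iscsi/vol1
  sbdadm create-lu /dev/zvol/rdsk/tank/iscsi/vol1
  stmfadm add-view 600144f0...          # GUID printed by create-lu
  itadm create-target

The file-backed variant is the same except that create-lu points at a
plain file on a ZFS filesystem instead of the zvol device node.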

The iSCSI target is an X4540, 64GB RAM, and 48x 1TB disks configured
with 8 vdevs with 5-6 disks each. No SSD, ZIL enabled.

My NFS performance is now over 100MB/sec, I can get over 100MB/sec
with CIFS as well. However, my iSCSI performance is still rather low
for the hardware.

It is a standard GigE network with jumbo frames currently disabled;
when I get some time I may set up a VLAN with jumbo frames enabled and
see if that changes anything at all (not likely).

I am CC'ing the storage-discuss group as well for coverage as this
covers ZFS, and storage.

If anyone has some thoughts, code, or tests, I can run them on my
X4540's and see how it goes.

Thanks


-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS snapshot send/recv hangs X4540 servers

2009-06-11 Thread Brent Jones


 After examining the dump we got from you (thanks again), we're relatively
 sure you are hitting

 6826836 Deadlock possible in dmu_object_reclaim()

 This was introduced in nv_111 and fixed in nv_113.

 Sorry for the trouble.

 -tim



Do you know when new builds will show up on pkg.opensolaris.org/dev ?


-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS snapshot send/recv hangs X4540 servers

2009-06-08 Thread Brent Jones
On Sun, Jun 7, 2009 at 3:50 AM, Ian Collinsi...@ianshome.com wrote:
 Ian Collins wrote:

 Tim Haley wrote:

 Brent Jones wrote:

 On the sending side, I CAN kill the ZFS send process, but the remote
 side leaves its processes going, and I CANNOT kill -9 them. I also
 cannot reboot the receiving system, at init 6, the system will just
 hang trying to unmount the file systems.
 I have to physically cut power to the server, but a couple days later,
 this issue will occur again.


 A crash dump from the receiving server with the stuck receives would be
 highly useful, if you can get it. Reboot -d would be best, but it might just
 hang. You can try savecore -L.

 I tried a reboot -d (I even had kmem-flags=0xf set), but it did hang. I
 didn't try savecore.

 One thing I didn't try was scat on the running system. What should I look
 for (with scat) if this happens again?

 I now have a system with a hanging zfs receive, any hints on debugging it?

 --
 Ian.

I haven't figured out a way to identify the problem, and I'm still
trying to find a way to reproduce it 100% of the time.
The more snapshots I send at a given time, the more likely it seems to
happen, but correlation is not causation  :)

I might open a support case with Sun (I have a support contract), but
OpenSolaris doesn't seem to be well understood by the support folks
yet, so I'm not sure how far it will get.

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS snapshot send/recv hangs X4540 servers

2009-06-08 Thread Brent Jones

 I haven't figured out a way to identify the problem, still trying to
 find a 100% way to reproduce this problem.
 Seemingly the more snapshots I send at a given time, the likelihood of
 this happening goes up, but, correlation is not causation  :)

 I might try to open a support case with Sun (have a support contract),
 but Opensolaris doesn't seem to be well understood by the support
 folks yet, so not sure how far it will get.

 --
 Brent Jones
 br...@servuhome.net


I can reproduce this 100% by sending about 6 or more snapshots at once.

Here is some output that JBK helped me put together, a pastebin of
'mdb' findstack output:

http://pastebin.com/m4751b08c

Not sure what I'm looking at, but maybe someone at Sun can see what's
going on?
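
For anyone who wants to pull the same kind of output on a hung box,
something along these lines (run as root against the live kernel)
should dump the kernel stacks of every thread in the stuck zfs
processes; treat it as a sketch rather than a recipe:

  echo "::pgrep zfs | ::walk thread | ::findstack -v" | mdb -k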



-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS snapshot send/recv hangs X4540 servers

2009-06-08 Thread Brent Jones
On Mon, Jun 8, 2009 at 9:38 PM, Richard Lowerichl...@richlowe.net wrote:
 Brent Jones br...@servuhome.net writes:



 I've had similar issues with similar traces.  I think you're waiting on
 a transaction that's never going to come.

 I thought at the time that I was hitting:
   CR 6367701 hang because tx_state_t is inconsistent

 But given the rash of reports here, it seems perhaps this is something
 different.

 I, like you, hit it when sending snapshots, it seems (in my case) to be
 specific to incremental streams, rather than full streams, I can send
 seemingly any number of full streams, but incremental sends via send -i,
 or send -R of datasets with multiple snapshots, will get into a state
 like that above.

 -- Rich


For now, it's back to snv_106 (the most stable build that I've seen; I
like it a lot).
I'll open a case in the morning and see what they suggest.


-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS snapshot send/recv hangs X4540 servers

2009-06-05 Thread Brent Jones
Hello all,
I had been running snv_106 for about 3 or 4 months on a pair of X4540's.
I would ship snapshots from the primary server to the secondary server
nightly, which was working really well.

However, I have upgraded to 2009.06, and my replication scripts appear
to hang when performing zfs send/recv.
When one zfs send/recv process hangs, you cannot send any other
snapshots from any other filesystem to the remote host.
I have about 20 file systems that I snapshot and replicate nightly.

The script I use to perform the snapshots is here:
http://www.brentrjones.com/wp-content/uploads/2009/03/replicate.ksh

On the remote side, I end up with many hung processes, like this:

  bjones 11676 11661   0 01:30:03 ?   0:00 /sbin/zfs recv -vFd pdxfilu02
  bjones 11673 11660   0 01:30:03 ?   0:00 /sbin/zfs recv -vFd pdxfilu02
  bjones 11664 11653   0 01:30:03 ?   0:00 /sbin/zfs recv -vFd pdxfilu02
  bjones 13727 13722   0 14:21:20 ?   0:00 /sbin/zfs recv -vFd pdxfilu02

And so on, one for each file system.

On the receiving end, 'zfs list' shows one filesystem attempting to
receive a snapshot, but I cannot stop it:

$ zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
pdxfilu02/data/fs01/%20090605-00:30:00  1.74G  27.2T   208G
/pdxfilu02/data/fs01/%20090605-00:30:00



On the sending side, I CAN kill the ZFS send process, but the remote
side leaves its processes going, and I CANNOT kill -9 them. I also
cannot reboot the receiving system, at init 6, the system will just
hang trying to unmount the file systems.
I have to physically cut power to the server, but a couple days later,
this issue will occur again.


If I boot into my snv_106 BE, everything works fine; this issue has
never occurred on that version.

Any thoughts?

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [storage-discuss] ZFS snapshot send/recv hangs X4540 servers

2009-06-05 Thread Brent Jones
On Fri, Jun 5, 2009 at 2:28 PM, Mike La Spina mike.lasp...@laspina.ca wrote:
 Hi,

 I have replications between hosts and they are working fine with zfs 
 send/recv's after upgrading to Indiana snv_111b (2009.06).

 Have you run the commands manually to see any messages/prompts are occurring?

 It sounds like its waiting for some input.

 Regards,

 Mike

 http://blog.laspina.ca/
 --
 This message posted from opensolaris.org
 ___
 storage-discuss mailing list
 storage-disc...@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/storage-discuss


If I power-cycle the server, I can run the replication script manually.
The script will then run automatically again for another night or two
before hanging up.
I've piped all output to a file, and there isn't any prompt for user
input; the zfs receive on the remote side is unkillable (and hangs the
server when trying to restart).

It appears to be the receiving end choking on a snapshot and not
allowing any more to run.
Once one snapshot freezes, running another zfs send/recv (for a
different file system) will just stall, with another unkillable zfs
receive.


-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [storage-discuss] ZFS snapshot send/recv hangs X4540 servers

2009-06-05 Thread Brent Jones
On Fri, Jun 5, 2009 at 2:49 PM, Rick Romero r...@havokmon.com wrote:
 On Fri, 2009-06-05 at 14:45 -0700, Brent Jones wrote:
 On Fri, Jun 5, 2009 at 2:28 PM, Mike La Spina mike.lasp...@laspina.ca 
 wrote:
  Hi,
 
  I have replications between hosts and they are working fine with zfs 
  send/recv's after upgrading to Indiana snv_111b (2009.06).
 
  Have you run the commands manually to see any messages/prompts are 
  occurring?
 
  It sounds like its waiting for some input.
 
  Regards,
 
  Mike
 
  http://blog.laspina.ca/
  --
  This message posted from opensolaris.org
  ___
  storage-discuss mailing list
  storage-disc...@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/storage-discuss
 

 If I power cycle the server, I can run the replication script manually.
 The script will go automatically again for another night or two,
 before hanging up.
 I've piped all output to a file, and there isn't any prompt for user
 input, and the zfs receive on the remote side is un-killable (and
 hangs the server when trying to restart).

 It appears to be the receiving end choking on a snapshot, and not
 allowing any more to run.
 Once one snapshot freezes, running another (for a different file
 system) zfs send/recv will just stall, with another un-killable zfs
 receive.


 Is it the version of ZFS?   I think it was upgraded.  I noticed
 something similar after upgrading ZFS on FreeBSD 7 STABLE.  I was trying
 to zfs send my @Tuesday, and an automatic script ran (which deletes
 @Tuesday and takes a new snap) - and rather than failing as I expected,
 the destroy and snapshot commands hung around until the send was done
 (hosed up my incrementals - doh :)

 Rick




I'm running the latest version of ZFS on all my file systems.
My replication script adds a user property to the file system to
effectively lock it.
My cleanup scripts check for that lock flag and will die if they see it set.

It's the send/receive that is hung up; I can see the pending receive
still sitting there, more than 24 hours later.

Sad

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS snapshot send/recv hangs X4540 servers

2009-06-05 Thread Brent Jones
On Fri, Jun 5, 2009 at 3:25 PM, Ian Collins i...@ianshome.com wrote:
 Brent Jones wrote:

 On the sending side, I CAN kill the ZFS send process, but the remote
 side leaves its processes going, and I CANNOT kill -9 them. I also
 cannot reboot the receiving system, at init 6, the system will just
 hang trying to unmount the file systems.
 I have to physically cut power to the server, but a couple days later,
 this issue will occur again.



 I have seen this on Solaris 10.  Something appears to break with a pool or
 filesystem causing zfs receive to hang in the kernel.  Once this happens,
 any zfs command that changes the state of the pool/filesystem will hang,
 including a zpool detach or an int 6.

 Can you get truss -p or mdb -p to work on the stuck process?

 --
 Ian.



I cannot.

# truss -p 11308
truss: unanticipated system error: 11308
(r...@pdxfilu02)-(06:29 PM Fri Jun 05)-(log)
# mdb -p 11308
mdb: cannot debug 11308: unanticipated system error
mdb: failed to initialize target: No such file or directory


All the hung zfs receive PIDs have '1' as their PPID.
Is it safe to truss PID 1?  :)

When you saw this, how did you escape it? I've found only pulling the
plug will fix it.

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS snapshot send/recv hangs X4540 servers

2009-06-05 Thread Brent Jones
On Fri, Jun 5, 2009 at 4:20 PM, Tim Haley tim.ha...@sun.com wrote:
 Brent Jones wrote:

 Hello all,
 I had been running snv_106 for about 3 or 4 months on a pair of X4540's.
 I would ship snapshots from the primary server to the secondary server
 nightly, which was working really well.

 However, I have upgraded to 2009.06, and my replication scripts appear
 to hang when performing zfs send/recv.
 When one zfs send/recv process hangs, you cannot send any other
 snapshots from any other filesystem to the remote host.
 I have about 20 file systems I snapshots and replicate nightly.

 The script I use to perform the snapshots is here:
 http://www.brentrjones.com/wp-content/uploads/2009/03/replicate.ksh

 On the remote side, I end up with many hung processes, like this:

  bjones 11676 11661   0 01:30:03 ?           0:00 /sbin/zfs recv -vFd
 pdxfilu02
  bjones 11673 11660   0 01:30:03 ?           0:00 /sbin/zfs recv -vFd
 pdxfilu02
  bjones 11664 11653   0 01:30:03 ?           0:00 /sbin/zfs recv -vFd
 pdxfilu02
  bjones 13727 13722   0 14:21:20 ?           0:00 /sbin/zfs recv -vFd
 pdxfilu02

 And so on, one for each file system.

 On the receiving end, 'zfs list' shows one filesystem attempting to
 receive a snapshot, but I cannot stop it:

 $ zfs list
 NAME                                       USED  AVAIL  REFER  MOUNTPOINT
 pdxfilu02/data/fs01/%20090605-00:30:00  1.74G  27.2T   208G
 /pdxfilu02/data/fs01/%20090605-00:30:00



 On the sending side, I CAN kill the ZFS send process, but the remote
 side leaves its processes going, and I CANNOT kill -9 them. I also
 cannot reboot the receiving system, at init 6, the system will just
 hang trying to unmount the file systems.
 I have to physically cut power to the server, but a couple days later,
 this issue will occur again.


 A crash dump from the receiving server with the stuck receives would be
 highly useful, if you can get it.  Reboot -d would be best, but it might
 just hang. You can try savecore -L.

 -tim

 I'f I boot to my snv_106 BE, everything works fine, this issue has
 never occurred on that version.

 Any thoughts?




I'm doing a savecore -L, but I have 64GB of RAM, which makes the dumps
a pain to work with.

Is there any additional information I can provide?

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS snapshot send/recv hangs X4540 servers

2009-06-05 Thread Brent Jones
On Fri, Jun 5, 2009 at 4:20 PM, Tim Haley tim.ha...@sun.com wrote:
 Brent Jones wrote:

 Hello all,
 I had been running snv_106 for about 3 or 4 months on a pair of X4540's.
 I would ship snapshots from the primary server to the secondary server
 nightly, which was working really well.

 However, I have upgraded to 2009.06, and my replication scripts appear
 to hang when performing zfs send/recv.
 When one zfs send/recv process hangs, you cannot send any other
 snapshots from any other filesystem to the remote host.
 I have about 20 file systems I snapshots and replicate nightly.

 The script I use to perform the snapshots is here:
 http://www.brentrjones.com/wp-content/uploads/2009/03/replicate.ksh

 On the remote side, I end up with many hung processes, like this:

  bjones 11676 11661   0 01:30:03 ?           0:00 /sbin/zfs recv -vFd
 pdxfilu02
  bjones 11673 11660   0 01:30:03 ?           0:00 /sbin/zfs recv -vFd
 pdxfilu02
  bjones 11664 11653   0 01:30:03 ?           0:00 /sbin/zfs recv -vFd
 pdxfilu02
  bjones 13727 13722   0 14:21:20 ?           0:00 /sbin/zfs recv -vFd
 pdxfilu02

 And so on, one for each file system.

 On the receiving end, 'zfs list' shows one filesystem attempting to
 receive a snapshot, but I cannot stop it:

 $ zfs list
 NAME                                       USED  AVAIL  REFER  MOUNTPOINT
 pdxfilu02/data/fs01/%20090605-00:30:00  1.74G  27.2T   208G
 /pdxfilu02/data/fs01/%20090605-00:30:00



 On the sending side, I CAN kill the ZFS send process, but the remote
 side leaves its processes going, and I CANNOT kill -9 them. I also
 cannot reboot the receiving system, at init 6, the system will just
 hang trying to unmount the file systems.
 I have to physically cut power to the server, but a couple days later,
 this issue will occur again.


 A crash dump from the receiving server with the stuck receives would be
 highly useful, if you can get it.  Reboot -d would be best, but it might
 just hang. You can try savecore -L.

 -tim

 I'f I boot to my snv_106 BE, everything works fine, this issue has
 never occurred on that version.

 Any thoughts?




Well, I think I found a specific file system that is causing this.
I kicked off a zpool scrub to see if there might be corruption on
either end, but that takes well over 40 hours on these servers.


-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs scheduled replication script?

2009-03-29 Thread Brent Jones
On Sat, Mar 28, 2009 at 5:40 PM, Fajar A. Nugraha fa...@fajar.net wrote:
 On Sun, Mar 29, 2009 at 3:40 AM, Brent Jones br...@servuhome.net wrote:
 I have since modified some scripts out there, and rolled them into my
 own, you can see it here at pastebin.com:

 http://pastebin.com/m3871e478

 Thanks Brent.

 Your script seems to handle failed replication and locking pretty well.
 It doesn't seem to log WHY the replication failed though, so I think
 there should be something that captures stderr on line 91.

 One more question, is there anything on that script that requires ksh?
 A quick glance seems to indicate that it will work with bash as well.

 Regards,

 Fajar


I'll see about capturing stderr; that's something I should've added anyway.
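
Roughly, that just means redirecting stderr into the same log as stdout
around the send/receive, e.g. (the variable names here are placeholders,
not the exact ones in the script):

  { zfs send -i "$PREV" "$CURR" | \
      ssh "$REMOTE" /sbin/zfs recv -vFd "$DESTPOOL"; } >> "$LOGFILE" 2>&1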

It would probably work under bash too, but some of the case checking
came from the original Sun scripts, which were in KSH.
I looked it up, and bash has -z string checking, so everything
'should' work under bash. I'll test on Monday.

I'd love to see others improve on the original Sun scripts, or on
parts of mine... I only say that because the dependency checking and
the locking were something that I didn't see in anyone else's scripts.
I did it in a pretty lame way, I'm sure; hopefully someone can find a
better way  :)

CC:ing opensolaris-discuss, as others are probably asking similar
questions about snapshot replication.

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Notations in zpool status

2009-03-29 Thread Brent Jones
On Sun, Mar 29, 2009 at 8:54 AM, Harry Putnam rea...@newsguy.com wrote:
 Is there some handy way to make notations about zpools.  Something
 that would show up in the output of `zpool status' (or some other
 command)

 I mean descriptive notes maybe outlining the zpools' purpose?

 Browsing around in `man zpool' I don't see that, but may be
 overlooking it.  The man page is near 1000 lines.

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


You can add user properties to file systems. But afaik they would not
show up in zpool status.
For example:
zfs set note:purpose="This file system is important" somefilesystem

zfs get note:purpose somefilesystem

Maybe that helps...
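
(One more trick that goes with it: 'zfs get -s local all somefilesystem'
lists every locally-set property on the dataset, user properties
included, which makes it easy to audit your notes later.)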

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Can this be done?

2009-03-29 Thread Brent Jones
On Sun, Mar 29, 2009 at 1:37 PM, Michael Shadle mike...@gmail.com wrote:
 On Sun, Mar 29, 2009 at 10:35 AM, David Magda dma...@ee.ryerson.ca wrote:

 Create new pool, move data to it (zfs send/recv), destroy old RAID-Z1 pool.

 Would send/recv be more efficient than just a massive rsync or related?

 Also I'd have to reduce the data on my existing raidz1 as it is almost
 full, and the raidz2 it would be sending to would be 1.5tb smaller
 technically.
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



I'd personally say send/recv would be more efficient; rsync is awfully
slow on large data sets. But it depends what build you are using!
Bug ID 6418042 (slow zfs send/recv) was fixed in build 105. It affected
local-to-remote send/recv operations; I'm not sure if it also hits
local-to-local, but I experienced it doing local-to-remote send/recv.

Not sure the best way to handle moving data around, when space is
tight though...
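
If space weren't the issue, the usual pattern would be something along
these lines (pool names made up, and it needs bits recent enough to
have 'zfs send -R'):

  zfs snapshot -r oldpool@migrate
  zfs send -R oldpool@migrate | zfs recv -Fd newpool

which carries the whole hierarchy, including properties and snapshots,
across in one stream.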

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs scheduled replication script?

2009-03-28 Thread Brent Jones
On Sat, Mar 28, 2009 at 11:20 AM, Fajar A. Nugraha fa...@fajar.net wrote:
 I have a backup system using zfs send/receive (I know there are pros
 and cons to that, but it's suitable for what I need).

 What I have now is a script which runs daily, do zfs send, compress
 and write it to a file, then transfer it with ftp to a remote host. It
 does full backup every 1st, and do incremental (with 1st as reference)
 after that. It works, but not quite resource-effective (for example,
 the full backup every month, and the big size of incremental backup on
 30th).

 I'm thinking of changing it to a script which can automate replication
 of a zfs pool or filesystem via zfs send/receive to a remote host (via
 ssh or whatever). It should be smart enough to choose between full and
 incremental, and choose which snapshot to base the incremental stream
 from (in case a scheduled incremental is missed), preferably able to
 use snapshots created by zfs/auto-snapshot smf service.

 To prevent re-inventing the wheel, does such script exists already?
 I prever not to use AVS as I can't use on existing zfs pool.

 Regards,

 Fajar
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


The ZFS automatic snapshot tools have the ability to execute any
command after a snapshot has taken place, such as ZFS send.
However, I ran into some issues, as it wasn't strictly designed for
that and didn't handle errors very well.

I have since modified some scripts out there, and rolled them into my
own, you can see it here at pastebin.com:

http://pastebin.com/m3871e478

Inspiration:
http://blogs.sun.com/constantin/entry/zfs_replicator_script_new_edition
http://blogs.sun.com/timf/en_IE/entry/zfs_automatic_snapshots_in_nv

Those are some good resources, from that, you can make something work
that is tailored to your environment.
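
As a rough skeleton of the full-vs-incremental decision (every name
below is a placeholder, and there is no locking or error handling;
see my script above for the rest):

  # newest snapshot the remote already has for this filesystem, empty if none
  BASE=$(ssh "$RHOST" "zfs list -H -t snapshot -o name -s creation" | grep "^$REMOTEFS@" | tail -1)
  if [ -z "$BASE" ]; then
      # remote side has nothing yet: send a full stream
      zfs send "$LOCALFS@$SNAP" | ssh "$RHOST" "zfs recv -vFd $RPOOL"
  else
      # otherwise send an incremental from the newest snapshot it already has
      # (this assumes that snapshot still exists locally under the same name)
      zfs send -i "${BASE#*@}" "$LOCALFS@$SNAP" | ssh "$RHOST" "zfs recv -vFd $RPOOL"
  fi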

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Mount ZFS hangs on boot

2009-03-18 Thread Brent Jones
Hello,
I have an X4540 running 2008.11 snv_106
I rebooted it tonight since I had a hung iSCSI connection to the Sun
box that wouldn't go away (couldn't delete that particular ZFS
filesystem till the initiator drops connection)

Upon reboot, the system will hang after printing the license header. I
then rebooted again with '-m milestone=none' and then went to single
user mode after that.
It appears system/filesystem/usr:default is the service that is
hanging during boot.
I've looked at 'zpool iostat' and there is a substantial amount of IO
happening after you initiate system/filesystem/usr:default, but it
will sit there for quite some time and not start.
I am unsure what it's doing, zpool status shows 0 errors, all devices
normal (46 drives, in 5-6 disk RAIDZ groups).
I also loaded arcstat.pl (saw it floating around here) and it's showing
~230 ops/sec, with 100% ARC misses.
Whatever it's doing, the load is very random I/O, and heavy, but little
progress appears to be happening.

I only have about 50 filesystems, and just a handful of snapshots for
each filesystem.

Thanks!

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Mount ZFS hangs on boot

2009-03-18 Thread Brent Jones
On Wed, Mar 18, 2009 at 11:28 AM, Miles Nordin car...@ivy.net wrote:
 bj == Brent Jones br...@servuhome.net writes:

    bj I only have about 50 filesystems, and just a handful of
    bj snapshots for each filesystem.

 there were earlier stories of people who had imports taking hours to
 complete with no feedback because ZFS was rolling forward some
 partly-completed operation interrupted by the crash, like destroying a
 snapshot or something.  maybe you shoudl just wait.


Wait I did, and it did finally come up.
A partially completed operation may make sense: when the iSCSI
target was blocked due to a Windows box hanging and the connection not
letting go, a ZFS destroy on that pool never did complete.
So maybe it tried to finish that action.

A mystery for sure, but it's up and working now.

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs streams data corruption

2009-02-24 Thread Brent Jones
On Tue, Feb 24, 2009 at 10:41 AM, Christopher Mera cm...@reliantsec.net wrote:
 Either way -  it would be ideal to quiesce the system before a snapshot 
 anyway, no?

 My next question now is what particular steps would be recommended to quiesce 
 a system for the clone/zfs stream that I'm looking to achieve...


 All your help is appreciated.

 Regards,
 Christopher Mera
 -Original Message-
 From: Mattias Pantzare [mailto:pantz...@gmail.com]
 Sent: Tuesday, February 24, 2009 1:38 PM
 To: Nicolas Williams
 Cc: Christopher Mera; zfs-discuss@opensolaris.org
 Subject: Re: [zfs-discuss] zfs streams  data corruption

 On Tue, Feb 24, 2009 at 19:18, Nicolas Williams
 nicolas.willi...@sun.com wrote:
 On Mon, Feb 23, 2009 at 10:05:31AM -0800, Christopher Mera wrote:
 I recently read up on Scott Dickson's blog with his solution for
 jumpstart/flashless cloning of ZFS root filesystem boxes.  I have to say
 that it initially looks to work out cleanly, but of course there are
 kinks to be worked out that deal with auto mounting filesystems mostly.

 The issue that I'm having is that a few days after these cloned systems
 are brought up and reconfigured they are crashing and svc.configd
 refuses to start.

 When you snapshot a ZFS filesystem you get just that -- a snapshot at
 the filesystem level.  That does not mean you get a snapshot at the
 _application_ level.  Now, svc.configd is a daemon that keeps a SQLite2
 database.  If you snapshot the filesystem in the middle of a SQLite2
 transaction you won't get the behavior that you want.

 In other words: quiesce your system before you snapshot its root
 filesystem for the purpose of replicating that root on other systems.

 That would be a bug in ZFS or SQLite2.

 A snapshoot should be an atomic operation. The effect should be the
 same as power fail in the meddle of an transaction and decent
 databases can cope with that.
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


If you are writing a script to handle ZFS snapshots/backups, you could
issue an SMF command to stop the service before taking the snapshot.
Or at the very minimum, perform an SQL dump of the DB so you at least
have a consistent full copy of the DB as a flat file in case you can't
stop the DB service.
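
Roughly what I mean, with a made-up service name and dataset:

  svcadm disable -st svc:/application/database/mydb:default   # -s waits until it has really stopped
  zfs snapshot tank/db@backup-$(date +%Y%m%d-%H%M)
  svcadm enable svc:/application/database/mydb:default

The snapshot itself is near-instant, so the service is only down for a
moment.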


-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs streams data corruption

2009-02-24 Thread Brent Jones
On Tue, Feb 24, 2009 at 11:32 AM, Christopher Mera cm...@reliantsec.net wrote:
 Thanks for your responses..

 Brent:
 And I'd have to do that for every system that I'd want to clone?  There
 must be a simpler way.. perhaps I'm missing something.


 Regards,
 Chris


Well, unless the database software itself can notice a snapshot
taking place, flush all data to disk, pause transactions until the
snapshot is finished, then properly resume, I don't know what to tell
you.
It's an issue for all databases (Oracle, MSSQL, MySQL...): how to do an
atomic backup without stopping transactions while maintaining
consistency.
Replication is one possible solution, dumping to a file periodically is
another, or you can simply tolerate that your database will not be
consistent after a snapshot and replay logs / consistency-check it after
bringing it up from a snapshot.
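
For the dump-to-a-file route, taking MySQL as an example (the paths and
dataset are made up):

  mysqldump --single-transaction --all-databases > /tank/db/dumps/all.sql   # consistent logical copy (InnoDB)
  zfs snapshot tank/db@$(date +%Y%m%d-%H%M)

so even if the live database files inside the snapshot need recovery,
the flat dump sitting next to them is known-good.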

Once you figure that out in a filesystem agnostic way, you'll be a
wealthy person indeed.


-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 'zfs recv' is very slow

2009-02-13 Thread Brent Jones
On Mon, Feb 2, 2009 at 6:55 AM, Robert Milkowski mi...@task.gda.pl wrote:
 It definitely does. I made some tests today comparing b101 with b105 while 
 doing 'zfs send -R -I A B /dev/null' with several dozen snapshots between A 
 and B. Well, b105 is almost 5x faster in my case - that's pretty good.

 --
 Robert Milkowski
 http://milek.blogspot.com
 --
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Sad to report that I am seeing the slow zfs recv issue cropping up
again while running b105  :(

Not sure what has triggered the change, but I am seeing the same
behavior again: massive amounts of reads on the receiving side, while
only receiving tiny bursts of data amounting to a mere megabyte a
second.

It doesn't seem to happen every single time though which is odd, but I
can provoke it by destroying a snapshot from the pool I am sending,
then taking another snapshot and re-sending it. It seems to cause the
receiving side to go into this read storm before any data is
transferred.

I'm going to open a case in the morning, and see if I can't get an
engineer to look at this.

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Write caches on X4540

2009-02-11 Thread Brent Jones
On Wed, Feb 11, 2009 at 2:13 PM, Greg Mason gma...@msu.edu wrote:
 We're using some X4540s, with OpenSolaris 2008.11.

 According to my testing, to optimize our systems for our specific workload,
 I've determined that we get the best performance with the write cache
 disabled on every disk, and with zfs:zfs_nocacheflush=1 set in /etc/system.

 The only issue is setting the write cache permanently, or at least quickly.

 Right now, as it is, I've scripted up format to run on boot, disabling the
 write cache of all disks. This takes around two minutes. I'd like to avoid
 needing to take this time on every bootup (which is more often than you'd
 think, we've got quite a bit of construction happening, which necessitates
 bringing everything down periodically). This would also be painful in the
 event of unplanned downtime for one of our Thors.

 so, basically, my question is: Is there a way to quickly or permanently
 disable the write cache on every disk in an X4540?

 Thanks,

 -Greg
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


We use several X4540's over here as well, what type of workload do you
have, and how much performance increase did you see by disabling the
write caches?
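
(For reference, the tunable Greg mentions is just the line

  set zfs:zfs_nocacheflush = 1

in /etc/system; it takes effect at the next reboot and applies to every
pool on the box.)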


-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Data loss bug - sidelined??

2009-02-06 Thread Brent Jones


Could this be related to the ZFS TXG/transfer group buffers?

i.e. it'll buffer writes for a bit before committing to disk. Then,
when it's time to commit to disk, it realizes the disk has failed, and
from there enters those failmode conditions (wait, continue, panic, ?).
Could this be the case?

http://blogs.sun.com/roch/date/20080514
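
(Related: that behavior hangs off the pool's failmode property, which is
easy to check or change; the pool name here is made up:

  zpool get failmode tank
  zpool set failmode=continue tank

with wait being the default.)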


-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Unusual CIFS write bursts

2009-01-27 Thread Brent Jones
On Tue, Jan 27, 2009 at 5:47 PM, Richard Elling
richard.ell...@gmail.com wrote:
 comment far below...

 Brent Jones wrote:

 On Mon, Jan 26, 2009 at 10:40 PM, Brent Jones br...@servuhome.net wrote:





 --
 Brent Jones
 br...@servuhome.net



 I found some insight to the behavior I found at this Sun blog by Roch
 Bourbonnais : http://blogs.sun.com/roch/date/20080514

 Excerpt from the section that I seem to have encountered:

 The new code keeps track of the amount of data accepted in a TXG and
 the time it takes to sync. It dynamically adjusts that amount so that
 each TXG sync takes about 5 seconds (txg_time variable). It also
 clamps the limit to no more than 1/8th of physical memory. 

 So, when I fill up that transaction group buffer, that is when I see
 that 4-5 second I/O burst of several hundred megabytes per second.
 He also documents that the buffer flush can, and does issue delays to
 the writing threads, which is why I'm seeing those momentary drops in
 throughput and sluggish system performance while that write buffer is
 flushed to disk.


 Yes, this tends to be more efficient. You can tune it by setting
 zfs_txg_synctime, which is 5 by default.  It is rare that we've seen
 this be a win, which is why we don't mention it in the Evil Tuning
 Guide.

 Wish there was a better way to handle that, but at the speed I'm
 writing (and I'll be getting a 10GigE link soon), I don't see any
 other graceful methods of handling that much data in a buffer


 I think your workload might change dramatically when you get a
 faster pipe.  So unless you really feel compelled to change it, I
 wouldn't suggest changing it.
 -- richard

 Loving these X4540's so far though...





Are there any additional tunables, such as opening a new txg buffer
before the previous one is flushed, or otherwise allowing writes to
continue without the tick delay? My workload will be pretty
consistent; it is going to serve a few roles, which I hope to
accomplish in the same units:
- large scale backups
- cifs share for window app servers
- nfs server for unix app servers

GigE quickly became the bottleneck, and I imagine 10GigE will add
further stress to those write buffers.
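
For the archives, the mechanics of the tunable Richard mentioned look
like this (the value is purely illustrative, in seconds):

  set zfs:zfs_txg_synctime = 10        # in /etc/system, takes effect at reboot

or, on a live kernel:

  echo zfs_txg_synctime/W0t10 | mdb -kw

though per his advice I'm leaving it at the default for now.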

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Unusual CIFS write bursts

2009-01-26 Thread Brent Jones
c4t7d0  ONLINE   0 0 0
c5t7d0  ONLINE   0 0 0
c6t7d0  ONLINE   0 0 0
c7t7d0  ONLINE   0 0 0
c8t7d0  ONLINE   0 0 0
c9t7d0  ONLINE   0 0 0
spares
  c6t2d0AVAIL
  c7t3d0AVAIL
  c8t4d0AVAIL
  c9t5d0AVAIL



-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 'zfs recv' is very slow

2009-01-22 Thread Brent Jones
On Fri, Jan 9, 2009 at 11:41 PM, Brent Jones br...@servuhome.net wrote:
 On Fri, Jan 9, 2009 at 7:53 PM, Ian Collins i...@ianshome.com wrote:
 Ian Collins wrote:
 Send/receive speeds appear to be very data dependent.  I have several 
 different filesystems containing differing data types.  The slowest to 
 replicate is mail and my guess it's the changes to the index files that 
 takes the time.  Similar sized filesystems with similar deltas where files 
 are mainly added or deleted appear to replicate faster.


 Has anyone investigated this?  I have been replicating a server today
 and the differences between incremental processing is huge, for example:

 filesystem A:

 received 1.19Gb stream in 52 seconds (23.4Mb/sec)

 filesystem B:

 received 729Mb stream in 4564 seconds (164Kb/sec)

 I can delve further into the content if anyone is interested.

 --
 Ian.

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


 What hardware, to/from is this?

 How are those filesystems laid out, what is their total size, used
 space, and guessable file count / file size distribution?

 I'm also trying to put together the puzzle to provide more detail to a
 case I opened with Sun regarding this.

 --
 Brent Jones
 br...@servuhome.net


Just to update this (hope no one is tired of hearing about it): I just
image-updated to snv_105 to obtain the fix for CR 6418042, at the
recommendation of a Sun support technician.

My results are much improved, on the order of 5-100 times faster
(either over Mbuffer or SSH). Not only do snapshots begin sending
right away (no longer requiring several minutes of reads before
sending any data), the actual send will sustain about 35-50MB/sec over
SSH, and up to 100MB/s via Mbuffer (on a single Gbit link, I am
network limited now, something I never thought I would say I love to
see!).

Previously, I was lucky if the snapshot would begin sending any data
after about 10 minutes, and once it did begin sending, it would
usually peak at about 1MB/sec via SSH, and up to 20MB/sec over
Mbuffer.
Mbuffer seems to play a much larger role now, as SSH appears to be
single-threaded for compression/encryption, maxing out a single CPU's
worth of power.
Mbuffer's raw network performance saturates my Gigabit link, making
me consider link bonding or something to see how fast it -really- can
go, now that the taps are open!

So, my issues appear pretty much resolved; although snv_105 is in the
/dev branch, things appear stable for the most part.

Please let me know if you have any questions, or want additional info
on my setup and testing.

Regards,

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS + OpenSolaris for home NAS?

2009-01-17 Thread Brent Jones
On Sat, Jan 17, 2009 at 2:46 PM, JZ j...@excelsioritsolutions.com wrote:


 I don't know if this email is even relevant to the list discussion. I will
 leave that conclusion to the smart mail server policy here.


*cough*


-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 'zfs recv' is very slow

2009-01-09 Thread Brent Jones
On Fri, Jan 9, 2009 at 7:53 PM, Ian Collins i...@ianshome.com wrote:
 Ian Collins wrote:
 Send/receive speeds appear to be very data dependent.  I have several 
 different filesystems containing differing data types.  The slowest to 
 replicate is mail and my guess it's the changes to the index files that 
 takes the time.  Similar sized filesystems with similar deltas where files 
 are mainly added or deleted appear to replicate faster.


 Has anyone investigated this?  I have been replicating a server today
 and the differences between incremental processing is huge, for example:

 filesystem A:

 received 1.19Gb stream in 52 seconds (23.4Mb/sec)

 filesystem B:

 received 729Mb stream in 4564 seconds (164Kb/sec)

 I can delve further into the content if anyone is interested.

 --
 Ian.

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


What hardware, to/from is this?

How are those filesystems laid out, what is their total size, used
space, and guessable file count / file size distribution?

I'm also trying to put together the puzzle to provide more detail to a
case I opened with Sun regarding this.

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 'zfs recv' is very slow

2009-01-07 Thread Brent Jones
On Wed, Jan 7, 2009 at 12:36 AM, Andrew Gabriel andrew.gabr...@sun.com wrote:
 Brent Jones wrote:

 Reviving an old discussion, but has the core issue been addressed in
 regards to zfs send/recv performance issues? I'm not able to find any
 new bug reports on bugs.opensolaris.org related to this, but my search
 kung-fu may be weak.

 I raised:
 CR 6729347 Poor zfs receive performance across networks
 (Seems to still be in the Dispatched state nearly half a year later.)

 This relates mainly to full archives, and is most obvious when
 the disk throughput is the same order of magnitude as the network
 throughput. (It becomes less obvious if one is significantly
 different from the other, either way around.)

 There appears to be an additional problem for incrementals, which
 spend long periods sending almost no data at all (I presume this
 is when zfs send is searching for changed blocks to send).
 I don't know off-hand of a bugid for this.

 Using mbuffer can speed it up dramatically, but this seems like a hack
 without addressing a real problem with zfs send/recv.

 I don't think it's a hack, but something along these lines should
 be more properly integrated into the zfs receive command or
 documented.

 Trying to send any meaningful sized snapshots from say an X4540 takes
 up to 24 hours, for as little as 300GB changerate.

 Are those incrementals from a much larger filesystem?
 If so, that's probably mainly the the other problem.

Yah, the incrementals are from a 30TB volume, with about 1TB used.
Watching iostat on each side during the incremental sends, the sender
side is hardly doing anything, maybe 50iops read, and that could be
from other machines accessing it, really light load.
The receiving side, however, peaks at around 1500 read IOPS, with no
writes.
It will do that for 3-5 minutes, then calm down and only read
sporadically, writing about 1MB/sec.
Using Mbuffer can get the writes to spike to 20-30MB/sec, but the
initial massive reads still remain.

I have yet to devise a script that starts Mbuffer zfs recv on the
receiving side with proper parameters, then start an Mbuffer ZFS send
on the sending side, but I may work on one later this week.
I'd like the snapshots to be sent every 15 minutes, just to keep the
amount of change that needs to be sent as low as possible.
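
The rough shape I'm planning, with a made-up port and snapshot names:

  # on the receiving side, started first
  mbuffer -I 9090 -s 128k -m 256M | zfs recv -vFd pdxfilu02

  # then on the sending side
  zfs send -i tank/fs@snap-0400 tank/fs@snap-0415 | mbuffer -s 128k -m 256M -O receiver:9090

wrapped in something that checks exit codes and logs, of course.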

Not sure if it's worth opening a case with Sun since we have a support
contract...


 --
 Andrew




-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS send fails incremental snapshot

2009-01-06 Thread Brent Jones
On Mon, Jan 5, 2009 at 4:29 PM, Brent Jones br...@servuhome.net wrote:
 On Mon, Jan 5, 2009 at 2:50 PM, Richard Elling richard.ell...@sun.com wrote:
 Correlation question below...

 Brent Jones wrote:

 On Sun, Jan 4, 2009 at 11:33 PM, Carsten Aulbert
 carsten.aulb...@aei.mpg.de wrote:


 Hi Brent,

 Brent Jones wrote:


 I am using 2008.11 with the Timeslider automatic snapshots, and using
 it to automatically send snapshots to a remote host every 15 minutes.
 Both sides are X4540's, with the remote filesystem mounted read-only
 as I read earlier that would cause problems.
 The snapshots send fine for several days, I accumulate many snapshots
 at regular intervals, and they are sent without any problems.
 Then I will get the dreaded:
 
 cannot receive incremental stream: most recent snapshot of pdxfilu02
 does not match incremental source
 



 Which command line are you using?

 Maybe you need to do a rollback first (zfs receive -F)?

 Cheers

 Carsten
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



 I am using a command similar to this:

 zfs send -i pdxfilu01/arch...@zfs-auto-snap:frequent-2009-01-04-03:30
 pdxfilu01/arch...@zfs-auto-snap:frequent-2009-01-04-03:45 | ssh -c
 blowfish u...@host.com /sbin/zfs recv -d pdxfilu02

 It normally works, then after some time it will stop. It is still
 doing a full snapshot replication at this time (very slowly it seems,
 I'm bit by the bug of slow zfs send/resv)

 Once I get back on my regular snapshotting, if it comes out of sync
 again, I'll try doing a -F rollback and see if that helps.


 When this gets slow, are the other snapshot-related commands also
 slow?  For example, normally I see zfs list -t snapshot completing
 in a few seconds, but sometimes it takes minutes?
 -- richard



 I'm not seeing zfs related commands any slower. On the remote side, it
 builds up thousands of snapshots, and aside from SSH scrolling as fast
 as it can over the network, no other slowness.
 But the actual send and receive is getting very very slow, almost to
 the point of needing the scrap the project and find some other way to
 ship data around!

 --
 Brent Jones
 br...@servuhome.net


Got a small update on the ZFS send: I am in fact seeing 'zfs list'
take several minutes to complete. I must have timed it right
during the send; neither side is completing the 'zfs list', and
it's been about 5 minutes already. There is a small amount of network
traffic between the two hosts, so maybe it's comparing what needs to
be sent, not sure.
I'll update when/if it completes.

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 'zfs recv' is very slow

2009-01-06 Thread Brent Jones
On Sat, Dec 6, 2008 at 11:40 AM, Ian Collins i...@ianshome.com wrote:
 Richard Elling wrote:
 Ian Collins wrote:
 Ian Collins wrote:
 Andrew Gabriel wrote:
 Ian Collins wrote:
 I've just finished a small application to couple zfs_send and
 zfs_receive through a socket to remove ssh from the equation and the
 speed up is better than 2x.  I have a small (140K) buffer on the
 sending
 side to ensure the minimum number of sent packets

 The times I get for 3.1GB of data (b101 ISO and some smaller
 files) to a
 modest mirror at the receive end are:

 1m36s for cp over NFS,
 2m48s for zfs send though ssh and
 1m14s through a socket.

 So the best speed is equivalent to 42MB/s.
 It would be interesting to try putting a buffer (5 x 42MB = 210MB
 initial stab) at the recv side and see if you get any improvement.

 It took a while...

 I was able to get about 47MB/s with a 256MB circular input buffer. I
 think that's about as fast it can go, the buffer fills so receive
 processing is the bottleneck.  Bonnie++ shows the pool (a mirror) block
 write speed is 58MB/s.

 When I reverse the transfer to the faster box, the rate drops to 35MB/s
 with neither the send nor receive buffer filling.  So send processing
 appears to be the limit in this case.
 Those rates are what I would expect writing to a single disk.
 How is the pool configured?

 The slow system has a single mirror pool of two SATA drives, the
 faster one a stripe of 4 mirrors and an IDE SD boot drive.

 ZFS send though ssh from the slow to the fast box takes 189 seconds, the
 direct socket connection send takes 82 seconds.

 --
 Ian.

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Reviving an old discussion, but has the core issue been addressed in
regards to zfs send/recv performance issues? I'm not able to find any
new bug reports on bugs.opensolaris.org related to this, but my search
kung-fu may be weak.

Using mbuffer can speed it up dramatically, but this seems like a hack
without addressing a real problem with zfs send/recv.
Trying to send any meaningful sized snapshots from say an X4540 takes
up to 24 hours, for as little as 300GB changerate.



-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS send fails incremental snapshot

2009-01-05 Thread Brent Jones
On Sun, Jan 4, 2009 at 11:33 PM, Carsten Aulbert
carsten.aulb...@aei.mpg.de wrote:
 Hi Brent,

 Brent Jones wrote:
 I am using 2008.11 with the Timeslider automatic snapshots, and using
 it to automatically send snapshots to a remote host every 15 minutes.
 Both sides are X4540's, with the remote filesystem mounted read-only
 as I read earlier that would cause problems.
 The snapshots send fine for several days, I accumulate many snapshots
 at regular intervals, and they are sent without any problems.
 Then I will get the dreaded:
 
 cannot receive incremental stream: most recent snapshot of pdxfilu02
 does not match incremental source
 


 Which command line are you using?

 Maybe you need to do a rollback first (zfs receive -F)?

 Cheers

 Carsten
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


I am using a command similar to this:

zfs send -i pdxfilu01/arch...@zfs-auto-snap:frequent-2009-01-04-03:30
pdxfilu01/arch...@zfs-auto-snap:frequent-2009-01-04-03:45 | ssh -c
blowfish u...@host.com /sbin/zfs recv -d pdxfilu02

It normally works, then after some time it will stop. It is still
doing a full snapshot replication at this time (very slowly it seems;
I'm bitten by the bug of slow zfs send/recv).

Once I get back on my regular snapshotting, if it comes out of sync
again, I'll try doing a -F rollback and see if that helps.
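
(That would just be the same pipeline as above with the receive forced,
i.e. '/sbin/zfs recv -F -d pdxfilu02' on the far end, accepting that -F
rolls the destination back and throws away anything written there since
its newest snapshot.)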



-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] X4500, snv_101a, hd and zfs [SEC=UNCLASSIFIED]

2009-01-05 Thread Brent Jones
On Mon, Jan 5, 2009 at 6:55 PM, Elaine Ashton elaine.ash...@sun.com wrote:

 On Jan 5, 2009, at 9:33 PM, LEES, Cooper wrote:

 Elaine,

 Very bizarre problem you're having. I have no problems on either of
 my x4500s. One on 10u6 and one on indiana snv_101b_rc2.

 I agree, which is why I was hoping someone might know what the deal is.

 Just a straight 'hd' takes over a minute and a half. The real killer
 is /opt/SUNWhd/hd/bin/hdadm write_cache display all which displays
 all the write_cache states for each drive. This takes hours. How long
 does that take on your 101b system? I swear, something must be
 terribly amiss with this box, but I'm just not sure where to start
 looking.

 e.
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


I'd suggest opening a case with Sun but you ARE Sun  ;p
The 'hd' tools don't even work on the X4540s, and even the ILOM
web GUI doesn't show the drives as being installed (yet I have 48
1TB drives, all working fine).
So, at least you're able to see your drives... sorta.
I -wish- I could see my drives' cache status, state, FRU, etc... :(

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS send fails incremental snapshot

2009-01-04 Thread Brent Jones
Hello all,
I am using 2008.11 with the Timeslider automatic snapshots, and using
it to automatically send snapshots to a remote host every 15 minutes.
Both sides are X4540's, with the remote filesystem mounted read-only
as I read earlier that would cause problems.
The snapshots send fine for several days, I accumulate many snapshots
at regular intervals, and they are sent without any problems.
Then I will get the dreaded:

cannot receive incremental stream: most recent snapshot of pdxfilu02
does not match incremental source


Manually sending does not work, nor does destroying snapshots on the
remote side and resending the batch again from the earliest point in time.
The only way I have found that works is to destroy the entire ZFS
filesystem on the remote side and begin anew.

Is there a way to force a ZFS receive, or to get more information
about what changed on the remote system to cause it not to accept
any more snapshots?

Thank you in advance

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs mount hangs

2008-12-30 Thread Brent Jones
 but I would really like to be able to recover it to save some time.

 Anything special to look for in zdb output? Any other diagnostics that
 would be useful?

 Thanks in advance!

 Best Regards //Magnus


 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


I had a similar problem, but did not run truss to find the cause as it
was not a live filesystem yet.
Recreating the filesystem with the same name resulted in it not
mounting and just hanging, but if I created it with a different name
it would mount and run perfectly fine.
I settled on the new name and continued on, and have not noticed the
problem again.
But seeing this post, I'll capture as much data as I can if it happens again.

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool cannot replace a replacing device

2008-12-10 Thread Brent Jones
On Tue, Dec 9, 2008 at 8:37 AM, Courtney Malone
[EMAIL PROTECTED] wrote:
 I have another drive on the way, which will be handy in the future, but it 
 doesn't solve the problem that zfs wont let me manipulate that pool in a 
 manner that will return it to a non-degraded state, (even with a replacement 
 drive or hot spare, i have already tried adding a spare) and I don't have 
 somewhere to dump ~6TB of data and do a restore.
 --
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Did you file a bug report? If so, can you link it so we can see the
resolution (if one ever comes)?

-- 
Brent Jones
[EMAIL PROTECTED]
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs iscsi sustained write performance

2008-12-08 Thread Brent Jones
On Mon, Dec 8, 2008 at 3:09 PM, milosz [EMAIL PROTECTED] wrote:
 hi all,

 currently having trouble with sustained write performance with my setup...

 ms server 2003/ms iscsi initiator 2.08 w/intel e1000g nic directly connected 
 to snv_101 w/ intel e1000g nic.

 basically, given enough time, the sustained write behavior is perfectly 
 periodic.  if i copy a large file to the iscsi target, iostat reports 10 
 seconds or so of -no- writes to disk, just small reads... then 2-3 seconds of 
 disk-maxed writes, during which time windows reports the write performance 
 dropping to zero (disk queues maxed).

 so iostat will report something like this for each of my zpool disks (with 
 iostat -xtc 1)

 1s: %b 0
 2s: %b 0
 3s: %b 0
 4s: %b 0
 5s: %b 0
 6s: %b 0
 7s: %b 0
 8s: %b 0
 9s: %b 0
 10s: %b 0
 11s: %b 100
 12s: %b 100
 13s: %b 100
 14s: %b 0
 15s: %b 0

 it looks like solaris hangs out caching the writes and not actually 
 committing them to disk... when the cache gets flushed, the iscsitgt (or 
 whatever) just stops accepting writes.

 this is happening across controllers and zpools.  also, a test copy of a 10gb 
 file from one zpool to another (not iscsi) yielded similar iostat results: 10 
 seconds of big reads from the source zpool, 2-3 seconds of big writes to the 
 target zpool (target zpool is 5x  bigger than source zpool).

 anyone got any ideas?  point me in the right direction?

 thanks,

 milosz
 --
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Are you running with compression? I see this behavior under heavy loads
with GZIP compression enabled.
What does 'zfs get compression' say?

-- 
Brent Jones
[EMAIL PROTECTED]
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 'zfs recv' is very slow

2008-11-14 Thread Brent Jones
On Fri, Nov 14, 2008 at 10:04 AM, Joerg Schilling
[EMAIL PROTECTED] wrote:
 Andrew Gabriel [EMAIL PROTECTED] wrote:

 Andrew Gabriel wrote:
  Interesting idea, but for 7200 RPM disks (and a 1Gb ethernet link), I
  need a 250GB buffer (enough to buffer 4-5 seconds worth of data). That's
  many orders of magnitude bigger than SO_RCVBUF can go.

 No -- that's wrong -- should read 250MB buffer!
 Still some orders of magnitude bigger than SO_RCVBUF can go.

 It's affordable e.g. on a X4540 with 64 GB of RAM.

 ZFS started with constraints that could not be made true in 2001.

 On my first Sun at home (a Sun 2/50 with 1 MB of RAM) in 1986, I could
 set the socket buffer size to 63 kB. 63kB : 1 MB is the same ratio
 as 256 MB : 4 GB.

 BTW: a lot of numbers in Solaris did not grow since a long time and
 thus create problems now. Just think about the maxphys values
 63 kB on x86 does not even allow to write a single BluRay disk sector
 with a single transfer.

 Jörg

 --
  EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
  URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


I'd like to see Sun's position on the speed at which large file
systems perform ZFS send/receive.
I expect my X4540's to nearly fill 48TB (or more considering
compression), and taking 24 hours to transfer 100GB is, well, I could
do better on an ISDN line from 1995.

-- 
Brent Jones
[EMAIL PROTECTED]
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] continuous replication

2008-11-12 Thread Brent Jones
On Wed, Nov 12, 2008 at 5:58 PM, River Tarnell
[EMAIL PROTECTED] wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Daryl Doami:
 As an aside, replication has been implemented as part of the new Storage
 7000 family.  Here's a link to a blog discussing using the 7000
 Simulator running in two separate VMs and replicating w/ each other:

 that's interesting, although 'less than a minute later' makes me suspect they
 might just be using snapshots and send/recv?

 presumably, if fishworks is based on (Open)Solaris, any new ZFS features they
 created will make it back into Solaris proper eventually...

- river.
 -BEGIN PGP SIGNATURE-

 iD8DBQFJG4m5IXd7fCuc5vIRAvY3AJ0dRRblJhwfA7X/s8CUU775hd3HNgCffARy
 x8Vryc/+Fl+a4pjJWN/KsDM=
 =ImHD
 -END PGP SIGNATURE-


Yah, from what I can tell, it's just using an
already-there-but-easier-to-look-at approach.
Not belittling the accomplishment: rolling all the system tools into a
coherent package is great, and the analytics are just awesome.

I am doing a similar project, and weighed several options for
replication. AVS was coveted for its near-real-time replication and
its ability to switch directions and replicate back to the primary if
you had a fail-over.
But some AVS limitations[1] are probably going to make us use zfs
send/receive, and it should keep up (the delta per day is ~100GB).

We will be testing both methods here in the next few weeks, will keep
the list posted to our findings.

[1] sending drive rebuilds over the link sucks

-- 
Brent Jones
[EMAIL PROTECTED]
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] OpenStorage GUI

2008-11-11 Thread Brent Jones
On Tue, Nov 11, 2008 at 9:52 AM, Adam Leventhal [EMAIL PROTECTED] wrote:
 On Nov 11, 2008, at 9:38 AM, Bryan Cantrill wrote:

 Just to throw some ice-cold water on this:

  1.  It's highly unlikely that we will ever support the x4500 --
 only the
  x4540 is a real possibility.


 And to warm things up a bit: there's already an upgrade path from the
 x4500 to the x4540 so that would be required before any upgrade to the
 equivalent of the Sun Storage 7210.

 Adam

 --
 Adam Leventhal, Fishworkshttp://blogs.sun.com/ahl

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


We just ordered several X4540's, excited to get them in place soon.
Having the Openstorage GUI as an option down the road is very
appealing for our VM/hosted side, after we install this bulk storage
environment.

Wish I could get my hands on a beta of this GUI...

-- 
Brent Jones
[EMAIL PROTECTED]
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] VERY URGENT Compliance for ZFS

2008-11-10 Thread Brent Jones
On Mon, Nov 10, 2008 at 12:42 PM, Keith Bierman [EMAIL PROTECTED] wrote:

 On Nov 10, 2008, at 4:47 AM, Vikash Gupta wrote:

 Hi Parmesh,

 Looks like this tender specification meant for Veritas.

 How do you handle this particular clause ?
 Shall provide Centralized, Cross platform, Single console management
 GUI

 Does it really make sense to have a discussion like this on an
 external open list? Contracts are customarily private, and company
 confidential.

 --
 Keith H. Bierman   [EMAIL PROTECTED]  | AIM kbiermank
 5430 Nassau Circle East  |
 Cherry Hills Village, CO 80113   | 303-997-2749
 speaking for myself* Copyright 2008




 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Not sure disclosing confidential information online will help dispel
any concerns about the contract...


-- 
Brent Jones
[EMAIL PROTECTED]
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to diagnose zfs - iscsi - nfs hang

2008-11-07 Thread Brent Jones
On Fri, Nov 7, 2008 at 9:11 AM, Jacob Ritorto [EMAIL PROTECTED] wrote:
 I have a PC server running Solaris 10 5/08 which seems to frequently become 
 unable to share zfs filesystems via the shareiscsi and sharenfs options.  It 
 appears, from the outside, to be hung -- all clients just freeze, and while 
 they're able to ping the host, they're not able to transfer nfs or iSCSI 
 data.  They're in the same subnet and I've found no network problems thus far.

 After hearing so much about the Marvell problems I'm beginning to wonder it 
 they're the culprit, though they're supposed to be fixed in 127128-11, which 
 is the kernel I'm running.

 I have an exact hardware duplicate of this machine running Nevada b91 (iirc) 
 that doesn't exhibit this problem.

 There's nothing in /var/adm/messages and I'm not sure where else to begin.

 Would someone please help me in diagnosing this failure?

 thx
 jake
 --
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


I saw this in Nev b87, where for whatever reason, CIFS and NFS would
completely hang and no longer serve requests (I don't use iscsi,
unable to confirm if that had hung too).
The server was responsive, SSH was fine and could execute commands,
clients could ping it and reach it, but CIFS and NFS were essentially
hung.
Intermittently, the system would recover and resume offering shares,
no triggering events could be correlated.
Since upgrading to newer builds, I haven't seen similar issues.

-- 
Brent Jones
[EMAIL PROTECTED]
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 'zfs recv' is very slow

2008-11-06 Thread Brent Jones
On Thu, Nov 6, 2008 at 4:19 PM, River Tarnell
[EMAIL PROTECTED] wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Ian Collins:
 That's very slow.  What's the nature of your data?

 mainly two sets of mid-sized files; one of 200KB-2MB in size and other under
 50KB.  they are organised into subdirectories, A/B/C/file.  each directory
 has 18,000-25,000 files.  total data size is around 2.5TB.

 hm, something changed while i was writing this mail: now the transfer is
 running at 2MB/sec, and the read i/o has disappeared.  that's still slower 
 than
 i'd expect, but an improvement.

 Time each phase (send to a file, copy the file to B and receive from the 
 file).  When I tried this on a filesystem with a range of file sizes, I had 
 about 30% of the total transfer time in send, 50% in copy and 20% in receive.

 i'd rather not interrupt the current send, as it's quite large.  once it's
 finished, i'll test with smaller changes...

- river.
 -BEGIN PGP SIGNATURE-

 iD8DBQFJE4mXIXd7fCuc5vIRAv0/AJoCRtMBN1/WD7zVVRzV2n4xeqBvyACeLNL/
 rLB1iHlu4xZdUPSiNj/iWl4=
 =+F7d
 -END PGP SIGNATURE-
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


There have been a couple of threads about this now, tracked under these bug IDs/tickets:

6333409
6418042
66104157

Look those up if you want to check on their status.

-- 
Brent Jones
[EMAIL PROTECTED]
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Windows XP nfs client poor performance

2008-10-20 Thread Brent Jones
On Mon, Oct 20, 2008 at 9:29 AM, Bob Bencze [EMAIL PROTECTED] wrote:
 Greetings.
 I have a X4500 with an 8TB RAIDZ datapool, currently 75% full. I have it 
 carved up into several  filesystems. I share out two of the  filesystems 
 /datapool/data4 (approx 1.5TB) and /datapool/data5 (approx 3.5TB). THe data 
 is imagery, and the primary application on the PCs is Socetset.
 The clients are Windows XP Pro, and I use services for unix (SFU) to mount 
 the nfs shares from the thumper. When a client PC accesses files from data4, 
 they come across quickly. When the same client accesses files from data5, the 
 transfer rate comes to a crawl, and sometimes the application times out.
 The only difference I can see is the size of the volume, the data is all of 
 the same type.

 I could find no references for any limitations on the volume size of nfs 
 shares or mounts. It seems inconsistent and difficult to duplicate. I plan to 
 begin a more in-depth troubleshooting of the problem with dtrace.

 Has anyone seen anything like this before?

 Thanks.

 -Bob Bencze
 --
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


SFU NFS is often slow, but it is tunable; here is something you might
find handy to squeeze some speed out of it:
http://technet.microsoft.com/en-us/library/bb463205.aspx

HTH

-- 
Brent Jones
[EMAIL PROTECTED]
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving zfs send performance

2008-10-15 Thread Brent Jones
On Wed, Oct 15, 2008 at 2:17 PM, Scott Williamson
[EMAIL PROTECTED] wrote:
 Hi All,

 Just want to note that I had the same issue with zfs send + vdevs that had
 11 drives in them on a X4500. Reducing the count of drives per zvol cleared
 this up.

 One vdev is IOPS limited to the speed of one drive in that vdev, according
 to this post (see comment from ptribble.)


Scott,

Can you tell us the configuration you're using that is working for you?
Were you using RAIDZ or RAIDZ2? I'm wondering where the sweet spot is
for a good compromise between vdev count and usable space/performance.
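
For concreteness, by sweet spot I mean something like many small raidz2
sets instead of a few wide ones, e.g. (device names invented):

  zpool create tank \
      raidz2 c0t0d0 c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0 \
      raidz2 c0t1d0 c1t1d0 c2t1d0 c3t1d0 c4t1d0 c5t1d0 \
      ...

since each additional vdev buys roughly another disk's worth of random
IOPS, at the cost of two parity disks per set.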

Thanks!

-- 
Brent Jones
[EMAIL PROTECTED]
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...

2008-10-14 Thread Brent Jones
On Tue, Oct 14, 2008 at 12:31 AM, Gray Carper [EMAIL PROTECTED] wrote:
 Hey, all!

 We've recently used six x4500 Thumpers, all publishing ~28TB iSCSI targets 
 over ip-multipathed 10GB ethernet, to build a ~150TB ZFS pool on an x4200 
 head node. In trying to discover optimal ZFS pool construction settings, 
 we've run a number of iozone tests, so I thought I'd share them with you and 
 see if you have any comments, suggestions, etc.

 First, on a single Thumper, we ran baseline tests on the direct-attached 
 storage (which is collected into a single ZFS pool comprised of four raidz2 
 groups)...

 [1GB file size, 1KB record size]
 Command: iozone -i o -i 1 -i 2 -r 1k -s 1g -f /data-das/perftest/1gbtest
 Write: 123919
 Rewrite: 146277
 Read: 383226
 Reread: 383567
 Random Read: 84369
 Random Write: 121617

 [8GB file size, 512KB record size]
 Command:
 Write:  373345
 Rewrite:  665847
 Read:  2261103
 Reread:  2175696
 Random Read:  2239877
 Random Write:  666769

 [64GB file size, 1MB record size]
 Command: iozone -i o -i 1 -i 2 -r 1m -s 64g -f /data-das/perftest/64gbtest
 Write: 517092
 Rewrite: 541768
 Read: 682713
 Reread: 697875
 Random Read: 89362
 Random Write: 488944

 These results look very nice, though you'll notice that the random read 
 numbers tend to be pretty low on the 1GB and 64GB tests (relative to their 
 sequential counterparts), but the 8GB random (and sequential) read is 
 unbelievably good.

 Now we move to the head node's iSCSI aggregate ZFS pool...

 [1GB file size, 1KB record size]
 Command: iozone -i o -i 1 -i 2 -r 1k -s 1g -f 
 /volumes/data-iscsi/perftest/1gbtest
 Write:  127108
 Rewrite:  120704
 Read:  394073
 Reread:  396607
 Random Read:  63820
 Random Write:  5907

 [8GB file size, 512KB record size]
 Command: iozone -i 0 -i 1 -i 2 -r 512 -s 8g -f 
 /volumes/data-iscsi/perftest/8gbtest
 Write:  235348
 Rewrite:  179740
 Read:  577315
 Reread:  662253
 Random Read:  249853
 Random Write:  274589

 [64GB file size, 1MB record size]
 Command: iozone -i o -i 1 -i 2 -r 1m -s 64g -f 
 /volumes/data-iscsi/perftest/64gbtest
 Write:  190535
 Rewrite:  194738
 Read:  297605
 Reread:  314829
 Random Read:  93102
 Random Write:  175688

 Generally speaking, the results look good, but you'll notice that random 
 writes are atrocious on the 1GB tests and random reads are not so great on 
 the 1GB and 64GB tests, but the 8GB test looks great across the board. 
 Voodoo! ; Incidentally, I ran all these tests against the ZFS pool in disk, 
 raidz1, and raidz2 modes - there were no significant changes in the results.

 So, how concerned should we be about the low scores here and there? Any 
 suggestions on how to improve our configuration? And how excited should we be 
 about the 8GB tests? ;

 Thanks so much for any input you have!
 -Gray
 ---
 University of Michigan
 Medical School Information Services
 --
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Your setup sounds very interesting, particularly how you export iSCSI
to another head unit. Can you give me some more details on your
filesystem layout and how you mount it on the head unit?
Sounds like a pretty clever way to export awesomely large volumes!
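
My guess at the moving parts (sizes and names made up, and I may well
be wrong):

  # on each Thumper
  zfs create -V 28T pool/lun0
  zfs set shareiscsi=on pool/lun0

  # on the head node
  iscsiadm modify discovery --sendtargets enable
  iscsiadm add discovery-address <thumper-ip>
  devfsadm -i iscsi
  zpool create bigpool raidz2 <the LUNs that show up...>

but I'd love to hear the specifics.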

Regards,

-- 
Brent Jones
[EMAIL PROTECTED]
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Segmentation fault / core dump with recursive send/recv

2008-10-09 Thread Brent Jones
On Wed, Oct 8, 2008 at 10:49 PM, BJ Quinn [EMAIL PROTECTED] wrote:
 Oh and I had been doing this remotely, so I didn't notice the following error 
 before -

 receiving incremental stream of datapool/[EMAIL PROTECTED] into backup/[EMAIL 
 PROTECTED]
 cannot receive incremental stream: destination backup/shares has been modified
 since most recent snapshot

 This is reported after the first snapshot, BACKUP081007 gets copied, and then 
 it quits.  I don't see why it would have been modified.  I guess it's 
 possible I cd'ed into the backup directory at some point during the 
 send/recv, but I don't think so.  Should I set the readonly property on the 
 backup FS or something?
 --
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Correct, the other side should be set Read Only, that way nothing at
all is modified when the other hosts tries to zfs send.

-- 
Brent Jones
[EMAIL PROTECTED]
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Quantifying ZFS reliability

2008-09-29 Thread Brent Jones
On Mon, Sep 29, 2008 at 9:28 PM, Richard Elling [EMAIL PROTECTED] wrote:
 Ahmed Kamal wrote:
 Hi everyone,

 We're a small Linux shop (20 users). I am currently using a Linux
 server to host our 2TBs of data. I am considering better options for
 our data storage needs. I mostly need instant snapshots and better
 data protection. I have been considering EMC NS20 filers and Zfs based
 solutions. For the Zfs solutions, I am considering NexentaStor product
 installed on a pogoLinux StorageDirector box. The box will be mostly
 sharing 2TB over NFS, nothing fancy.

 Now, my question is I need to assess the zfs reliability today Q4-2008
 in comparison to an EMC solution. Something like EMC is pretty mature
 and used at the most demanding sites. Zfs is fairly new, and from time
 to time I have heard it had some pretty bad bugs. However, the EMC
 solution is like 4X more expensive. I need to somehow quantify the
 relative quality level, in order to judge whether or not I should be
 paying all that much to EMC. The only really important reliability
 measure to me, is not having data loss!
 Is there any real measure like percentage of total corruption of a
 pool that can assess such a quality, so you'd tell me zfs has pool
 failure rate of 1 in a 10^6, while EMC has a rate of 1 in a 10^7. If
 not, would you guys rate such a zfs solution as ??% the reliability of
 an EMC solution ?

 EMC does not, and cannot, provide end-to-end data validation.  So how
 would measure its data reliability?  If you search the ZFS-discuss archives,
 you will find instances where people using high-end storage also had data
 errors detected by ZFS.  So, you should consider them complementary rather
 than adversaries.
  -- richard

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Key word: detected  :)
I recall a few of those (I'll cite them later when I'm not tired) where
people used <insert SAN vendor here> as the iSCSI target, with ZFS on
top of it.
ZFS was quite capable of detecting the errors, but since they did not let
ZFS handle the RAID, and instead relied on another layer, ZFS was not
able to correct them.

-- 
Brent Jones
[EMAIL PROTECTED]
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

