Re: [zfs-discuss] Snapshot recycle freezes system activity
On Mon, Mar 08, 2010 at 03:18:34PM -0500, Miles Nordin wrote:
> gm == Gary Mills mi...@cc.umanitoba.ca writes:
>
>   gm> destroys the oldest snapshots and creates new ones, both
>   gm> recursively.
>
> I'd be curious if you try taking the same snapshots non-recursively instead, does the pause go away?

I'm still collecting statistics, but that is one of the things I'd like to try.

> Because recursive snapshots are special: they're supposed to atomically synchronize the cut-point across all the filesystems involved, AIUI. I don't see that recursive destroys should be anything special though.
>
>   gm> Is it destroying old snapshots or creating new ones that
>   gm> causes this dead time?
>
> sortof seems like you should tell us this, not the other way around. :) Seriously though, isn't that easy to test? And I'm curious myself too.

Yes, that's another thing I'd like to try. I'll just put a `sleep' in the script between the two actions to see if the dead time moves later in the day.

--
-Gary Mills--Unix Group--Computer and Network Services-
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
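[For what it's worth, a minimal sketch of such a recycle script with the sleep separating the two actions; the pool and snapshot names here are hypothetical and the real script will differ:

  #!/bin/ksh
  # destroy the oldest recursive snapshot first
  zfs destroy -r tank@oldest
  # pause so the two operations land in clearly separate windows
  sleep 600
  # then create the new recursive snapshot
  zfs snapshot -r tank@`date +%Y%m%d%H%M`

If the dead time tracks the first command, the destroy is the culprit; if it tracks the second, it's the snapshot creation.]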
Re: [zfs-discuss] Snapshot recycle freezes system activity
On Mon, Mar 08, 2010 at 01:23:10PM -0800, Bill Sommerfeld wrote: On 03/08/10 12:43, Tomas Ögren wrote: So we tried adding 2x 4GB USB sticks (Kingston Data Traveller Mini Slim) as metadata L2ARC and that seems to have pushed the snapshot times down to about 30 seconds. Out of curiosity, how much physical memory does this system have? Mine has 64 GB of memory with the ARC limited to 32 GB. The Cyrus IMAP processes, thousands of them, use memory mapping extensively. I don't know if this design affects the snapshot recycle behavior. -- -Gary Mills--Unix Group--Computer and Network Services- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
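[For reference, the usual way to cap the ARC as described above is an /etc/system entry; a sketch, where 0x800000000 is 32 GB expressed in bytes -- adjust for your system, and a reboot is needed for it to take effect:

  set zfs:zfs_arc_max=0x800000000]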
[zfs-discuss] rpool devaliases
Can I create a devalias to boot the other mirror similar to UFS? Thanks ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] terrible ZFS performance compared to UFS on ramdisk (70% drop)
On Mar 8, 2010, at 11:46 PM, ольга крыжановская olga.kryzhanov...@gmail.com wrote: tmpfs lacks features like quota and NFSv4 ACL support. May not be the best choice if such features are required. True, but if the OP is looking for those features they are more than likely not looking for an in-memory file system. This would be more for something like temp databases in a RDBMS or a cache of some sort. -Ross ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How to verify ecc for ram is active and enabled?
Yay! Something where I can contribute! I am a hardware guy trying to live in a software world, but I think I know how this one works.

> The reason is that the vendor (ACER) of the mainboard says it is not supported, and I can not get into the bios any more, but osol boots fine and sees 8GB. Crucial says it's not supported because Acer says it's not supported... This is an MCP78S based motherboard (apparently equivalent Asus and Gigabyte boards are _supported_ platforms for this memory)...

The chipset may support ECC memory, and reply just fine to the OS and drivers that no errors have occurred, and the memory chips may check ECC and generate the ECC error signal to the chipset, but if the motherboard does not have a copper trace between the pin on the memory socket that connects to the ECC error pin on the memory DIMM and the pin on the chipset that receives the error signal, the chipset will never hear the memory complain about ECC errors, whether they happen or not. The phone line is cut.

If the motherboard maker doesn't assure you it's connected by telling you that explicitly, or worse yet says it's not supported, chances are it's not supported. Support for a memory DIMM does not necessarily mean that the ECC works, only that the regular memory works.

I did not buy a Gigabyte board for the home server I'm laboriously (for a hardware guy in software land) getting running, because although Gigabyte says they support the ECC memory DIMMs, they do not have any BIOS means for enabling/disabling the ECC, and that tells me that they *tolerate* ECC DIMMs rather than *using* the ECC functions. ASUS, for the same chipset in my case, has a BIOS setting for enabling/disabling ECC reporting, so they have at least considered it.

I have the same issue coming up, because even if ASUS lets you turn reporting on and off, that's NOT a guarantee that the copper trace is there and all connected. I read in this forum a method for inducing ECC errors that involves holding a tungsten incandescent bulb near the DIMMs. It's worth a search. I will be doing that test when I get to the point where I have the thing running well enough for the test to be meaningful.

R.G. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Recover rpool
I redirected the console to the serial port and managed to capture the panic information below: SunOS Release 5.11 Version snv_111b 64-bit Copyright 1983-2009 Sun Microsystems, Inc. All rights reserved. Use is subject to license terms. panic[cpu0]/thread=ff0007c39c60: mutex_enter: bad mutex, lp=e8 owner=f000ec62f000ec60 thread=ff0007c39c60 ff0007c38ca0 unix:mutex_panic+73 () ff0007c38d00 unix:mutex_vector_enter+446 () ff0007c38d50 genunix:kmem_slab_alloc+31 () ff0007c38db0 genunix:kmem_cache_alloc+130 () ff0007c38dd0 zfs:zio_buf_alloc+2c () ff0007c38e10 zfs:arc_get_data_buf+173 () ff0007c38e60 zfs:arc_buf_alloc+a2 () ff0007c38f00 zfs:arc_read_nolock+137 () ff0007c38fa0 zfs:arc_read+75 () ff0007c390d0 zfs:scrub_visitbp+161 () ff0007c391e0 zfs:scrub_visitbp+27c () ff0007c392f0 zfs:scrub_visitbp+21d () ff0007c39400 zfs:scrub_visitbp+21d () ff0007c39510 zfs:scrub_visitbp+21d () ff0007c39620 zfs:scrub_visitbp+21d () ff0007c39730 zfs:scrub_visitbp+21d () ff0007c39840 zfs:scrub_visitbp+432 () ff0007c39890 zfs:scrub_visit_rootbp+4f () ff0007c398f0 zfs:scrub_visitds+7e () ff0007c39aa0 zfs:dsl_pool_scrub_sync+126 () ff0007c39b10 zfs:dsl_pool_sync+192 () ff0007c39ba0 zfs:spa_sync+32a () ff0007c39c40 zfs:txg_sync_thread+265 () ff0007c39c50 unix:thread_start+8 () skipping system dump - no dump device configured rebooting... Can anyone tell me what is going wrong? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Can you manually trigger spares?
On Mon, 8 Mar 2010, Tim Cook wrote: Is there a way to manually trigger a hot spare to kick in? Yes - just use 'zpool replace fserv 12589257915302950264 c3t6d0'. That's all the fma service does anyway. If you ever get your drive to come back online, the fma service should recognize that and resilver it, switching the spare back to AVAIL. Regards, markm ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
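[Putting that together, the whole manual-spare sequence would look something like this -- the GUID and device names are taken from the thread, and the detach at the end is only for the case where you give up on the original disk and keep the spare permanently:

  # zpool replace fserv 12589257915302950264 c3t6d0
  (wait for the resilver to complete; check with zpool status)
  # zpool detach fserv 12589257915302950264]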
Re: [zfs-discuss] rpool devaliases
On 09/03/2010 13:18, Tony MacDoodle wrote: Can I create a devalias to boot the other mirror similar to UFS? yes ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
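[For SPARC OBP, a sketch of what that looks like; the device path below is an assumption -- use show-disks at the ok prompt, or ls -l /dev/dsk on the running system, to find the real path of the second half of the mirror:

  ok nvalias rootmirror /pci@1f,700000/scsi@2/disk@1,0:a
  ok boot rootmirror

nvalias stores the alias in NVRAM so it survives a power cycle; a plain devalias does not.]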
Re: [zfs-discuss] How to verify ecc for ram is active and enabled?
I'm curious to know whether the following output : bash-4.0# echo memscrub_scans_done/U | mdb -k memscrub_scans_done: memscrub_scans_done:1985 means that Solaris considers ECC memory is effectively installed (the fact that it is non-zero)? I have installed unbuffered ECC memory (2x4GB crucial kit CT2KIT51272AA667). The reason is that the vendor (ACER) of the mainboard says it is not supported, and I can not get into the bios any more, but osol boots fine and sees 8GB. Crucial says it's not supported because Acer says it's not supported... This is an MCP78S based motherboard (apparently equivalent Asus and Gigabyte boards are _supported_ platforms for this memory)... The following output from smbios: 0 78 SMB_TYPE_BIOS (BIOS information) Vendor: Phoenix Technologies, LTD Version String: R01-B0 Release Date: 03/31/2009 Address Segment: 0xe000 ROM Size: 524288 bytes Image Size: 131072 bytes Characteristics: 0x7fcb9e90 SMB_BIOSFL_ISA (ISA is supported) SMB_BIOSFL_PCI (PCI is supported) SMB_BIOSFL_PLUGNPLAY (Plug and Play is supported) SMB_BIOSFL_APM (APM is supported) SMB_BIOSFL_FLASH (BIOS is Flash Upgradeable) SMB_BIOSFL_SHADOW (BIOS shadowing is allowed) SMB_BIOSFL_CDBOOT (Boot from CD is supported) SMB_BIOSFL_SELBOOT (Selectable Boot supported) SMB_BIOSFL_ROMSOCK (BIOS ROM is socketed) SMB_BIOSFL_EDD (EDD Spec is supported) SMB_BIOSFL_525_360K (int 0x13 5.25 360K floppy) SMB_BIOSFL_525_12M (int 0x13 5.25 1.2M floppy) SMB_BIOSFL_35_720K (int 0x13 3.5 720K floppy) SMB_BIOSFL_35_288M (int 0x13 3.5 2.88M floppy) SMB_BIOSFL_I5_PRINT (int 0x5 print screen svcs) SMB_BIOSFL_I9_KBD (int 0x9 8042 keyboard svcs) SMB_BIOSFL_I14_SER (int 0x14 serial svcs) SMB_BIOSFL_I17_PRINTER (int 0x17 printer svcs) SMB_BIOSFL_I10_CGA (int 0x10 CGA svcs) Characteristics Extension Byte 1: 0x33 SMB_BIOSXB1_ACPI (ACPI is supported) SMB_BIOSXB1_USBL (USB legacy is supported) SMB_BIOSXB1_LS120 (LS-120 boot is supported) SMB_BIOSXB1_ATZIP (ATAPI ZIP drive boot is supported) Characteristics Extension Byte 2: 0x5 SMB_BIOSXB2_BBOOT (BIOS Boot Specification supported) SMB_BIOSXB2_ETCDIST (Enable Targeted Content Distrib.) 
Version Number: 0.0 Embedded Ctlr Firmware Version Number: 0.0 IDSIZE TYPE 1 78 SMB_TYPE_SYSTEM (system information) Manufacturer: Acer Product: Aspire X3200 Version: R01-A3 Serial Number: 9E3PM75C7P839053093003 UUID: ---- Wake-Up Event: 0x6 (power switch) SKU Number: Family: IDSIZE TYPE 2 62 SMB_TYPE_BASEBOARD (base board) Manufacturer: Acer Product: WMCP78M Version: Serial Number: 00 Asset Tag: Location Tag: Chassis: 48 Flags: 0x1 SMB_BBFL_MOTHERBOARD (board is a motherboard) Board Type: 0xa (motherboard) IDSIZE TYPE 3 76 SMB_TYPE_CHASSIS (system enclosure or chassis) Manufacturer: Acer Version: Serial Number: 00 Asset Tag: 00 OEM Data: 0x0 Lock Present: N Chassis Type: 0x3 (desktop) Boot-Up State: 0x2 (unknown) Power Supply State: 0x2 (unknown) Thermal State: 0x2 (unknown) Chassis Height: 0u Power Cords: 0 Element Records: 0 IDSIZE TYPE 4 101 SMB_TYPE_PROCESSOR (processor) Manufacturer: AMD Version: AMD Phenom(tm) 9550 Quad-Core Processor Serial Number: Asset Tag: Location Tag: Socket AM2 Part Number: Family: 1 (other) CPUID: 0x178bfbff00100f23 Type: 3 (central processor) Socket Upgrade: 4 (ZIF socket) Socket Status: Populated Processor Status: 1 (enabled) Supported Voltages: 1.2V External Clock Speed: Unknown Maximum Speed: 2200MHz Current Speed: 2200MHz L1 Cache: 8 L2 Cache: 9 L3 Cache: None IDSIZE TYPE 8 33 SMB_TYPE_CACHE (processor cache) Location Tag: Internal Cache Level: 1 Maximum Installed Size: 131072 bytes Installed Size: 131072 bytes Speed: Unknown Supported SRAM Types: 0x20 SMB_CAT_SYNC (synchronous) Current SRAM Type: 0x20 (synchronous) Error Correction Type: 2 (unknown) Logical Cache Type: 2 (unknown) Associativity: 2 (unknown) Mode: 1 (write-back) Location: 0 (internal) Flags: 0x1 SMB_CAF_ENABLED (enabled at boot time) IDSIZE TYPE 9 33 SMB_TYPE_CACHE (processor cache) Location Tag: External Cache Level: 2 Maximum Installed Size: 524288 bytes Installed Size: 524288 bytes Speed: Unknown Supported SRAM Types: 0x20 SMB_CAT_SYNC (synchronous) Current SRAM Type: 0x20 (synchronous) Error Correction Type: 2 (unknown) Logical Cache Type: 2 (unknown) Associativity: 2 (unknown) Mode: 1 (write-back) Location: 0 (internal) Flags: 0x1 SMB_CAF_ENABLED (enabled at boot time)
Re: [zfs-discuss] Should ZFS write data out when disk are idle
I am talking about having a write queue which points to ready-to-write full stripes. Ready-to-write full stripes would be:
* The last byte of the full stripe has been updated.
* The file has been closed for writing. (Exception to the above rule.)

I believe there is now a scheduler for ZFS to handle read and write conflicts. For example, on a large multi-gigabyte NVRAM array the only big consideration is how big the Fibre Channel pipe is and the limit on outstanding I/Os. But on SATA off the motherboard, how much RAM cache each disk has is a consideration, as well as the speed of the SATA connection and the number of outstanding I/Os.

When it comes time to do a txg, some of the record blocks (most of the full 128k ones) will have been written out already. If we have only written out full record blocks then there has been no performance loss. Eventually a txg is going to happen, and eventually these full writes will need to happen, but if we can choose a less busy time for them, all the better. e.g. on a raidz with 5 disks, if I have 128k x 4 worth of data to write, let's write it; on a mirror, if I have 128k worth to write, let's write it (record size 128k). Or let it be a tunable for the zpool, as some arrays (RAID5) like to have larger chunks of data. Why wait for the txg if the disks are not being pressured for reads, rather than a pause every 30 seconds?

Bob wrote (I may not have explained it well enough): It is not true that there is no cost though. Since ZFS uses COW, this approach requires that new blocks be allocated and written at a much higher rate. There is also an opportunity cost in that if a read comes in while these continuous writes are occurring, the read will be delayed.

At some stage a write needs to happen. **Full** writes have a very small COW cost compared with small writes. As I said above, I am talking about a write of 4 x 128k on a 5 disk raidz before the write would happen early.

Bob also wrote: There are many applications which continually write/overwrite file content, or which update a file at a slow pace. For example, log files are typically updated at a slow rate. Updating a block requires reading it first (if it is not already cached in the ARC), which can be quite expensive. By waiting a bit longer, there is a much better chance that the whole block is overwritten, so zfs can discard the existing block on disk without bothering to re-read it.

Apps which update at a slow pace will not trigger the above early write until they have at least written a record size worth of data; applications which write less than 128k (recordsize) in 30 secs will never trigger the early write on a mirrored disk or even a raidz setup. What this will catch is the big writer of files greater than 128k (recordsize) on mirrored disks, and files larger than (4 x 128k) on raidz 5-disk sets. So commands like dd if=x of=y bs=512k will not cause issues (pauses/delays) when the txg times out.

PS: I already set zfs:zfs_write_limit_override, and I would not recommend anyone set this very low to get the above effect. It's just an idea on how to prevent the delay effect; it may not be practical. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
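[For reference, the tunable mentioned in the PS goes in /etc/system; a sketch, where the value is an arbitrary example in bytes -- and, as the PS says, setting it very low to force early writes is not recommended:

  set zfs:zfs_write_limit_override=0x20000000]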
Re: [zfs-discuss] Recover rpool
When I boot from a snv133 live cd and attempt to import the rpool it panics with this output: Sun Microsystems Inc. SunOS 5.11 snv_133 February 2010 j...@opensolaris:~$ pfexec su Mar 9 03:11:37 opensolaris su: 'su root' succeeded for jack on /dev/console j...@opensolaris:~# zpool import -f -o ro -o failmode=continue -R /mnt rpool panic[cpu1]/thread=ff00086e0c60: BAD TRAP: type=e (#pf Page fault) rp=ff00086dfe60 addr=278 occurred in module unix due to a NULL pointer dereference sched: #pf Page fault Bad kernel fault at addr=0x278 pid=0, pc=0xfb862b6b, sp=0xff00086dff58, eflags=0x10246 cr0: 8005003bpg,wp,ne,et,ts,mp,pe cr4: 6f8xmme,fxsr,pge,mce,pae,pse,de cr2: 278cr3: c80cr8: c rdi: 278 rsi:4 rdx: ff00086e0c60 rcx:0 r8: 40 r9:21d9a rax:0 rbx:0 rbp: ff00086dffb0 r10: 7f6fc8 r11: 6e r12:0 r13: 278 r14:4 r15: ff01cfe27e08 fsb:0 gsb: ff01ccfa5080 ds: 4b es: 4b fs:0 gs: 1c3 trp:e err:2 rip: fb862b6b cs: 30 rfl:10246 rsp: ff00086dff58 ss: 38 ff00086dfd40 unix:die+dd () ff00086dfe50 unix:trap+177e () ff00086dfe60 unix:cmntrap+e6 () ff00086dffb0 unix:mutex_enter+b () ff00086dffd0 zfs:zio_buf_alloc+2c () ff00086e0010 zfs:arc_get_data_buf+173 () ff00086e0060 zfs:arc_buf_alloc+a2 () ff00086e0100 zfs:arc_read_nolock+12f () ff00086e01a0 zfs:arc_read+75 () ff00086e0230 zfs:scrub_prefetch+b9 () ff00086e02f0 zfs:scrub_visitbp+5f1 () ff00086e03b0 zfs:scrub_visitbp+6e3 () ff00086e0470 zfs:scrub_visitbp+6e3 () ff00086e0530 zfs:scrub_visitbp+6e3 () ff00086e05f0 zfs:scrub_visitbp+6e3 () ff00086e06b0 zfs:scrub_visitbp+6e3 () ff00086e0750 zfs:scrub_visitdnode+84 () ff00086e0810 zfs:scrub_visitbp+1a6 () ff00086e0860 zfs:scrub_visit_rootbp+4f () ff00086e08c0 zfs:scrub_visitds+7e () ff00086e0a80 zfs:dsl_pool_scrub_sync+163 () ff00086e0af0 zfs:dsl_pool_sync+25b () ff00086e0ba0 zfs:spa_sync+36f () ff00086e0c40 zfs:txg_sync_thread+24a () ff00086e0c50 unix:thread_start+8 () -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] new video: George Wilson on ZFS Dedup
Brand new video! George Wilson on ZFS Dedup - Oracle Solaris Video http://bit.ly/b5MMpn -- best regards, Deirdré Straughan Solaris Technical Content blog: Un Posto al Sole http://blogs.sun.com/deirdre/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] what to do when errors occur during scrub
[I hope this isn't a repost double whammy. I posted this message under `Message-ID: 87fx4ai5sp@newsguy.com' over 15 hrs ago but it never appeared on my nntp server (gmane) as far as I can see]

I'm a little at a loss here as to what to do about these two errors that turned up during a scrub. The discs involved are a matched pair in mirror mode. zpool status -v z3 (wrapped for mail):

---- ---=--- -
 scrub: scrub completed after 1h48m with 2 errors on Mon Mar 8 10:26:49 2010
config:

        NAME        STATE     READ WRITE CKSUM
        z3          ONLINE       0     0     2
          mirror-0  ONLINE       0     0     4
            c5d0    ONLINE       0     0     4
            c6d0    ONLINE       0     0     4

errors: Permanent errors have been detected in the following files:

[NOTE: Edited to ease reading -ed -hp]

z3/proje...@zfs-auto-snap:monthly-2009-08-30-09:26:/Training/\
[... huge path snipped ...]/2_Database.mov
/t/bk-test-DiskDamage-021710_005252/rsnap/misc/hourly.4/\
[... huge path snipped ...]/es.utf-8.sug
---- ---=--- -

Those are just two on-disk files. Can it be as simple as just deleting them? Or is something more technical required? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] terrible ZFS performance compared to UFS on ramdisk (70% drop)
Ross is correct - advanced OS features are not required here - just the ability to store a file - don’t even need unix style permissions -Original Message- From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Ross Walker Sent: Tuesday, March 09, 2010 6:23 AM To: ольга крыжановская Cc: zfs-discuss@opensolaris.org Subject: Re: [zfs-discuss] terrible ZFS performance compared to UFS on ramdisk (70% drop) On Mar 8, 2010, at 11:46 PM, ольга крыжановская olga.kryzhanov...@gmail.com wrote: tmpfs lacks features like quota and NFSv4 ACL support. May not be the best choice if such features are required. True, but if the OP is looking for those features they are more than likely not looking for an in-memory file system. This would be more for something like temp databases in a RDBMS or a cache of some sort. -Ross ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Recover rpool
Found a site that recommended setting the following system file entries set zfs:zfs_recover=1 set aok=1 and running this command zdb -e -bcsvL rpool but I get the following error: Traversing all blocks to verify checksums ... out of memory -- generating core dump Abort The laptop has 4GB of memory, and I did not see memory utilization pass 400MB. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] and another video: ZFS Dynamic LUN Expansion
And another brand-new video: ZFS Dynamic LUN Expansion - Oracle Solaris Video http://bit.ly/cwwCZl -- best regards, Deirdré Straughan Solaris Technical Content blog: Un Posto al Sole http://blogs.sun.com/deirdre/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Recover rpool
Hi D, Is this a 32-bit system? We were looking at your panic messages and they seem to indicate a problem with memory and not necessarily a problem with the pool or the disk. Your previous zpool status output also indicates that the disk is okay. Maybe someone with similar recent memory problems can advise. Thanks, Cindy On 03/09/10 09:15, D. Pinnock wrote: When I boot from a snv133 live cd and attempt to import the rpool it panics with this output: Sun Microsystems Inc. SunOS 5.11 snv_133 February 2010 j...@opensolaris:~$ pfexec su Mar 9 03:11:37 opensolaris su: 'su root' succeeded for jack on /dev/console j...@opensolaris:~# zpool import -f -o ro -o failmode=continue -R /mnt rpool panic[cpu1]/thread=ff00086e0c60: BAD TRAP: type=e (#pf Page fault) rp=ff00086dfe60 addr=278 occurred in module unix due to a NULL pointer dereference sched: #pf Page fault Bad kernel fault at addr=0x278 pid=0, pc=0xfb862b6b, sp=0xff00086dff58, eflags=0x10246 cr0: 8005003bpg,wp,ne,et,ts,mp,pe cr4: 6f8xmme,fxsr,pge,mce,pae,pse,de cr2: 278cr3: c80cr8: c rdi: 278 rsi:4 rdx: ff00086e0c60 rcx:0 r8: 40 r9:21d9a rax:0 rbx:0 rbp: ff00086dffb0 r10: 7f6fc8 r11: 6e r12:0 r13: 278 r14:4 r15: ff01cfe27e08 fsb:0 gsb: ff01ccfa5080 ds: 4b es: 4b fs:0 gs: 1c3 trp:e err:2 rip: fb862b6b cs: 30 rfl:10246 rsp: ff00086dff58 ss: 38 ff00086dfd40 unix:die+dd () ff00086dfe50 unix:trap+177e () ff00086dfe60 unix:cmntrap+e6 () ff00086dffb0 unix:mutex_enter+b () ff00086dffd0 zfs:zio_buf_alloc+2c () ff00086e0010 zfs:arc_get_data_buf+173 () ff00086e0060 zfs:arc_buf_alloc+a2 () ff00086e0100 zfs:arc_read_nolock+12f () ff00086e01a0 zfs:arc_read+75 () ff00086e0230 zfs:scrub_prefetch+b9 () ff00086e02f0 zfs:scrub_visitbp+5f1 () ff00086e03b0 zfs:scrub_visitbp+6e3 () ff00086e0470 zfs:scrub_visitbp+6e3 () ff00086e0530 zfs:scrub_visitbp+6e3 () ff00086e05f0 zfs:scrub_visitbp+6e3 () ff00086e06b0 zfs:scrub_visitbp+6e3 () ff00086e0750 zfs:scrub_visitdnode+84 () ff00086e0810 zfs:scrub_visitbp+1a6 () ff00086e0860 zfs:scrub_visit_rootbp+4f () ff00086e08c0 zfs:scrub_visitds+7e () ff00086e0a80 zfs:dsl_pool_scrub_sync+163 () ff00086e0af0 zfs:dsl_pool_sync+25b () ff00086e0ba0 zfs:spa_sync+36f () ff00086e0c40 zfs:txg_sync_thread+24a () ff00086e0c50 unix:thread_start+8 () ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Using zfs-auto-snapshot for automatic backups
On Mon, Mar 8, 2010 at 1:47 PM, Tim Foster tim.fos...@sun.com wrote: Looking at the errors, it looks like SMF isn't exporting the values for action_authorization or value_authorization in the SMF manifest it produces, resulting the service not being allowed to set values in svccfg when it runs as 'zfssnap'. After playing around a bit, I found the right way to verify this: $ svcprop -p general zfs/auto-snapshot:rpool-backup general/action_authorization astring solaris.smf.manage.zfs-auto-snapshot general/value_authorization astring solaris.smf.manage.zfs-auto-snapshot general/enabled boolean true general/entity_stability astring Unstable This is exactly the output that I'm getting for zfs/auto-snapshot:daily I'm still seeing errors from svccfg before and after the zfs send. This isn't affecting any of the default instances, since they don't use backup-save-cmd. $ svcprop -p start/user zfs/auto-snapshot:rpool-backup zfssnap $ svcprop -p stop/user zfs/auto-snapshot:rpool-backup zfssnap /etc/user_attr contains: zfssnaptype=role;auths=solaris.smf.manage.zfs-auto-snapshot;profiles=ZFS File System Management The instance runs as zfssnap and the user *should* be able to change values in smf, right? -B -- Brandon High : bh...@freaks.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
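[One quick cross-check is whether the role actually carries the authorization at all; a sketch, using auths(1), which lists the authorizations in effect for a user:

  $ auths zfssnap

If solaris.smf.manage.zfs-auto-snapshot is missing from that list, the user_attr entry is not being picked up (for example, a files vs. ldap precedence issue in nsswitch.conf) and svccfg would fail exactly as described.]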
Re: [zfs-discuss] terrible ZFS performance compared to UFS on ramdisk (70% drop)
On Mar 9, 2010, at 9:40 AM, Matt Cowger wrote: Ross is correct - advanced OS features are not required here - just the ability to store a file - don’t even need unix style permissions KISS. Just use tmpfs, though you might also consider limiting its size. -- richard ZFS storage and performance consulting at http://www.RichardElling.com ZFS training on deduplication, NexentaStor, and NAS performance http://nexenta-atlanta.eventbrite.com (March 16-18, 2010) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
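[A sketch of a size-capped tmpfs mount along those lines; the mount point and the 81920m cap are assumptions chosen to match the ramdisk in this thread:

  # mkdir /ramcache
  # mount -F tmpfs -o size=81920m swap /ramcache

The size option bounds how much of memory plus swap the filesystem can consume.]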
Re: [zfs-discuss] Recover rpool
On 03/ 9/10 10:53 AM, Cindy Swearingen wrote: Hi D, Is this a 32-bit system? We were looking at your panic messages and they seem to indicate a problem with memory and not necessarily a problem with the pool or the disk. Your previous zpool status output also indicates that the disk is okay. To perhaps clarify, you're panicking trying to grab a mutex, which hints that something has stomped on the memory containing that mutex. The reason for the 32-bit question is that sometimes a deep stack can overrun on a 32-bit box. That's probably not what happened here, but we ask anyway. -tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] about zfs exported on nfs
[First, a brief apology. I inadvertently posted this message to the `general' group when it should have been to the `zfs' group. In the last few days I seem to be all thumbs when posting... and have created several bumbling posts to opensolaris lists.]

Summary: A zfs fs set with smb and nfs on, and set chmod g-s (set-gid) with a local user's uid:gid, is being mounted by a remote linux host (and windows hosts, but not discussing that here). The remote user is the same as the local user in both numeric UID and numeric GID. The zfs nfs/cifs share is mounted like this on a linux client: mount -t nfs -o users,exec,dev,suid. Any files/directories created by the linux user end up with nobody:nobody uid:gid, and any attempt to change that from the client host fails, even if done as root.

Details: I'm not sure when this trouble started... it's been a while, long enough to have existed over a couple of builds (b129 b133). But it was not always a problem. I jumped from 129 to 133 so don't know about builds in between.

I have a zfs_fs... /projects on zpool z3. This is a hierarchy that is fairly deep but only the top level is zfs. (Aside: That is something I intend to change soon.) That is, the whole thing, of course, is zfs, but the lower levels have been created by whatever remote host was working there. z3/projects has these two settings:

z3/projects sharenfs on
z3/projects sharesmb name=projects

So both cifs and nfs are turned on, making the zfs host both a cifs and nfs server. Also, when z3/projects was created, it was set chmod g-s (set gid) right away. The remote linux user in this discussion has the same numeric UID and GID as the local zfs user who is owner of /projects.

Later, and more than once by now, I've run this command from the zfs host: /bin/chmod -R A=everyone@:full_set:fd:allow /projects to get read/write to work when working from windows hosts.

The filesystem is primarily accessed as an nfs mounted filesystem on a linux (gentoo linux) host. But it is also used over cifs by a couple of windows hosts. On the linux client host, `/projects' gets mounted like this: mount -t nfs -o users,exec,dev,suid. That has been the case both before having the problem and now.

The trouble I see is that all files get created with nobody:nobody as UID:GID, even though /projects is set as a normal USER:GROUP of a user on the zfs/nfs server. From the remote (we only deal with the linux remote here) any attempt to change uid:gid fails, even if done by root on the remote. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] terrible ZFS performance compared to UFS on ramdisk (70% drop)
That's a very good point - in this particular case, there is no option to change the blocksize for the application.

On 3/9/10 10:42 AM, Roch Bourbonnais roch.bourbonn...@sun.com wrote:
> I think this is highlighting that there is extra CPU requirement to manage small blocks in ZFS. The table would probably turn over if you go to 16K zfs records and 16K reads/writes from the application. Next step for you is to figure how much read/write IOPS you expect to take in the real workloads and whether or not the filesystem portion will represent a significant drain of CPU resource. -r
>
> On 8 Mar 10 at 17:57, Matt Cowger wrote:
>> Hi Everyone, It looks like I've got something weird going with zfs performance on a ramdisk... ZFS is performing not even a 3rd of what UFS is doing.
>> Short version:
>> Create 80+ GB ramdisk (ramdiskadm), system has 96GB, so we aren't swapping
>> Create zpool on it (zpool create ram...)
>> Change zfs options to turn off checksumming (don't want it or need it), atime, compression, 4K block size (this is the application's native blocksize) etc.
>> Run a simple iozone benchmark (seq. write, seq. read, rndm write, rndm read).
>> Same deal for UFS, replacing the ZFS stuff with newfs stuff and mounting the UFS forcedirectio (no point in using buffer cache memory for something that's already in memory)
>> Measure IOPs performance using iozone: iozone -e -i 0 -i 1 -i 2 -n 5120 -O -q 4k -r 4k -s 5g
>> With the ZFS filesystem I get around:
>> ZFS (seq write) 42360 (seq read) 31010 (random read) 20953 (random write) 32525
>> Not SOO bad, but here's UFS:
>> UFS (seq write) 42853 (seq read) 100761 (random read) 100471 (random write) 101141
>> For all tests besides the seq write, UFS utterly destroys ZFS. I'm curious if anyone has any clever ideas on why this huge disparity in performance exists. At the end of the day, my application will run on either filesystem, it just surprises me how much worse ZFS performs in this (admittedly edge case) scenario. --M
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
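[For anyone wanting to reproduce the setup, a sketch of the commands described above; the ramdisk and pool names are assumptions:

  # ramdiskadm -a rd 80g
  # zpool create ram /dev/ramdisk/rd
  # zfs set checksum=off ram
  # zfs set atime=off ram
  # zfs set compression=off ram
  # zfs set recordsize=4k ram]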
[zfs-discuss] backup zpool to tape
Hello all, I need to backup some zpools to tape. I currently have two servers; for the purpose of this conversation we will call them server1 and server2 respectively. Server1 has several zpools which are replicated to a single zpool on server2 through a zfs send/recv script. This part works perfectly. I now need to get this backed up to tape. My original plan was to have a disk set up which would hold a file-based zpool, and then do zfs send/recv to this pool. My problem however is I run into this bug: http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6929751 In my case, when I reboot the server I cannot get the pool to come back up. It shows UNAVAIL; I have tried to export before reboot and reimport it and have not been successful, and I don't like this in case a power issue of some sort happens. My other option was to mount using lofiadm, however I cannot get it to mount on boot, so the same thing happens. Does anyone have any experience with backing up zpools to tape? Please, any ideas would be greatly beneficial. Thanks, Greg ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Weird drive configuration, how to improve the situation
Okay... I found the solution to my problem. And it has nothing to do with my hard drives... It was the Realtek NIC drivers. I read about problems and added a new driver (I got that from the forum thread). And now I have about 30MB/s read and 25MB/s write performance. That's enough (for the beginning). Thanks for all your input and support. Thomas -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] what to do when errors occur during scrub
Hi Harry,

Reviewing other postings where permanent errors were found on redundant ZFS configs, one was resolved by re-running the zpool scrub and one resolved itself because the files with the permanent errors were most likely temporary files. One of the files with permanent errors below is in a snapshot and the other looks like another backup.

I would recommend the top section of this troubleshooting wiki to determine if hardware issues are causing these permanent errors: http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide

If it turns out that some hardware problem, power failure, or other event caused these errors, and if rerunning the scrub doesn't remove these files, then I would remove them manually (if you have copies of the data somewhere else).

Thanks, Cindy

On 03/09/10 10:08, Harry Putnam wrote:
> I'm a little at a loss here as to what to do about these two errors that turned up during a scrub. The discs involved are a matched pair in mirror mode. zpool status -v z3 (wrapped for mail):
>
>  scrub: scrub completed after 1h48m with 2 errors on Mon Mar 8 10:26:49 2010
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         z3          ONLINE       0     0     2
>           mirror-0  ONLINE       0     0     4
>             c5d0    ONLINE       0     0     4
>             c6d0    ONLINE       0     0     4
>
> errors: Permanent errors have been detected in the following files:
>
> [NOTE: Edited to ease reading -ed -hp]
> z3/proje...@zfs-auto-snap:monthly-2009-08-30-09:26:/Training/\
> [... huge path snipped ...]/2_Database.mov
> /t/bk-test-DiskDamage-021710_005252/rsnap/misc/hourly.4/\
> [... huge path snipped ...]/es.utf-8.sug
>
> Those are just two on-disk files. Can it be as simple as just deleting them? Or is something more technical required?
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
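[In concrete terms, the sequence would look something like this; a sketch -- the dataset name is elided in the output above, so a placeholder is used, and the copy inside the snapshot can only be removed by destroying the snapshot:

  # rm /t/bk-test-DiskDamage-021710_005252/rsnap/misc/hourly.4/.../es.utf-8.sug
  # zfs destroy z3/<dataset>@zfs-auto-snap:monthly-2009-08-30-09:26
  # zpool scrub z3
  (wait for the scrub to complete, then:)
  # zpool status -v z3]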
Re: [zfs-discuss] backup zpool to tape
gd == Gregory Durham gregory.dur...@gmail.com writes:

  gd> it to mount on boot

I do not understand why you have a different at-boot-mounting problem with and without lofiadm: either way it's your script doing the importing explicitly, right? so just add lofiadm to your script. I guess you were exporting pools explicitly at shutdown because you didn't trust solaris to unmount the two levels of zfs in the right order?

Anyway, I would guess it doesn't matter, because my ``back up file zpools to tape'' suggestion seems to be bogus bad advice. The other bug referenced in the one you quoted, 6915127, seems a lot more disruptive and says there are weird corruption problems with using file vdev's directly, and then there are deadlock problems with lofiadm from the two layers of zfs that haven't been ironed out yet. I guess file-based zpools do not work, and we're back to having no good plan that I can see to back up zpools to tape that preserves dedup, snapshots/clones, NFSv4 acl's, u.s.w. I assumed they did work because it looked like regression tests people were quoting and many examples depended upon them, but now it seems they don't, which explains some problems I had last month extracting an s10brand image from a .VDI. :( (iirc I got the image out using lofiadm and just assumed I was confused, banging away at things until they work and then forgetting about them. not good on me.)

There is only zfs send, which is made with replication in mind (

* it'll intentionally destroy the entire stream and any incremental descendents if there's a single bit-flip, which is a good feature to make sure the replication is retried if the copy's not faithful but a bad feature for tape. If ZFS rails against other filesystems for their fragile lack of metadata copies and checksums, why should the tape format be so oddly fragile that tape archives become massive gamma gremlin detectors?

* and it has no scrub-like method analogous to 'tar t' or 'cpio -it', because it's assumed you'll always recv it in a situation where you've the opportunity to re-send, while a tape is something you might like to validate after transporting it or every few years. If pools need scrubbing, why don't tapes?

* and no partial-restore feature, because it assumes if you don't have enough space on the destination for the entire dataset you'll use rsync or cpio or some other tree-granularity tool instead of the replication toolkit. a tool which does not fully exist (sparse files, 4GB files, NFSv4 ACL's), but that's a separate problem.

).

how about zpools on zvol's? Does that avoid the deadlock/corruption bugs with file vdevs? It's not a workaround for the cases in the bug because they wanted to use NFS to replace iSCSI, but for backups, zvols might be okay, if they work? It's certainly possible to write them onto a tape (dd was originally meant for such things). ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
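[For completeness, the raw mechanics of send-to-tape are simple even if the format is fragile; a sketch, where the tape device, snapshot name, and block size are assumptions:

  # zfs send -R tank@backup | dd of=/dev/rmt/0n obs=1048576
  # mt -f /dev/rmt/0n rewind
  # dd if=/dev/rmt/0n ibs=1048576 | zfs receive -d tank2

All the caveats above still apply: a single flipped bit invalidates the whole stream, and the only way to validate the tape is to actually receive it somewhere.]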
Re: [zfs-discuss] Recover rpool
My Laptop is a 64bit system Dell Latitude D630 Intel Core2 Duo Processor T7100 4GB RAM -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] what to do when errors occur during scrub
Cindy Swearingen cindy.swearin...@sun.com writes:
> Hi Harry, Reviewing other postings where permanent errors were found on redundant ZFS configs, one was resolved by re-running the zpool scrub and one resolved itself because the files with the permanent errors were most likely temporary files.

What search strings did you use to find those? I always seem to use search strings that miss what I'm after; it's helpful to see how others conduct searches.

> One of the files with permanent errors below is in a snapshot and the other looks like another backup. I would recommend the top section of this troubleshooting wiki to determine if hardware issues are causing these permanent errors: http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide

A lot of that seems horribly complex for what is apparently (and this may turn out to be wishful thinking) a pretty minor problem. But it does say that repeated scrubs will most likely remove all traces of corruption (assuming it's not caused by hardware). However, I see no evidence that the `scrub' command is doing anything at all (more on that below).

I decided to take the line of least resistance and simply deleted the file. As you guessed, they were backups and, luckily for me, redundant. So following a scrub... I see errors that look more technical. But first, the info given by `zpool status' appears to either be referencing an earlier scrub or is seriously wrong in what it reports.

root # zpool status -vx z3
  pool: z3
 state: ONLINE
status: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed after 1h48m with 2 errors on Mon Mar 8 10:26:49 2010
config:

---- ---=--- -

I just ran a scrub moments ago, but `status' is still reporting one from earlier in the day. It says 1 hr and 48 minutes, but that is completely wrong too.

---- ---=--- -

        NAME        STATE     READ WRITE CKSUM
        z3          ONLINE       0     0     2
          mirror-0  ONLINE       0     0     4
            c5d0    ONLINE       0     0     4
            c6d0    ONLINE       0     0     4

errors: Permanent errors have been detected in the following files:

        0x42:0x552d
        z3/t:0xe1d99f
---- ---=--- -

The `status' report, even though it seems to have bogus information about the scrub, does show different output for the errors. Are those hex addresses of devices or what? There is nothing at all on z3/t.

Also - it appears `zpool scrub -s z3' doesn't really do anything. The status report above is taken immediately after a scrub command. The `scrub -s' command just returns the prompt... no output and apparently no scrub either. Does the failure to scrub indicate it cannot be scrubbed? Does a status report that shows the pool online and not degraded really mean anything, or is that just as spurious as the scrub info there?

Sorry if I seem like a lazy dog, but I don't really see a section in the troubleshooting guide (from viewing the outline of sections) that appears to deal directly with scrubbing. Apparently I'm supposed to read and digest the whole thing so as to know what to do... but I quickly get completely lost in the discussion.

They say to use fmdump for a list of defective hardware... but I don't see anything that appears to indicate a problem, unless the two entries from March 5th mean something that is not apparent.

fmdump (I removed the exact times from the lines so this wouldn't wrap)
[...]
Mar 05 ... 9ea9e105-72b1-69bd-e1e6-88322cd8b847 ZFS-8000-GH
Mar 05 ... a37779fb-8018-ec8c-bd72-ec32f4b40ff6 ZFS-8000-GH
Mar 08 ... a37779fb-8018-ec8c-bd72-ec32f4b40ff6 FMD-8000-4M Repaired
Mar 08 ... a37779fb-8018-ec8c-bd72-ec32f4b40ff6 FMD-8000-6U Resolved
Mar 08 ... 9ea9e105-72b1-69bd-e1e6-88322cd8b847 FMD-8000-4M Repaired
Mar 08 ... 9ea9e105-72b1-69bd-e1e6-88322cd8b847 FMD-8000-6U Resolved
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
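[To dig into either of those events, the UUID can be handed back to fmdump; for example, using the first UUID above:

  # fmdump -V -u 9ea9e105-72b1-69bd-e1e6-88322cd8b847

That prints the full event detail, including which vdev the ZFS-8000-GH diagnosis was raised against.]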
Re: [zfs-discuss] terrible ZFS performance compared to UFS on ramdisk (70% drop)
On Mar 9, 2010, at 1:42 PM, Roch Bourbonnais roch.bourbonn...@sun.com wrote: I think this is highlighting that there is extra CPU requirement to manage small blocks in ZFS. The table would probably turn over if you go to 16K zfs records and 16K reads/writes from the application. Next step for you is to figure how much read/write IOPS you expect to take in the real workloads and whether or not the filesystem portion will represent a significant drain of CPU resource. I think it highlights more the problem of ARC vs ramdisk, or specifically ZFS on ramdisk while the ARC is fighting with the ramdisk for memory. It is a wonder it didn't deadlock. If I were to put a ZFS file system on a ramdisk, I would limit the size of the ramdisk and ARC so both, plus the kernel, fit nicely in memory with room to spare for user apps. -Ross ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] terrible ZFS performance compared to UFS on ramdisk (70% drop)
Could you retest it with mmap() used? Olga

2010/3/9 Matt Cowger mcow...@salesforce.com:
> It can, but doesn't in the command line shown below. M
>
> On Mar 8, 2010, at 6:04 PM, ольга крыжановская olga.kryzhanov...@gmail.com wrote:
>> Does iozone use mmap() for IO? Olga
>>
>> On Tue, Mar 9, 2010 at 2:57 AM, Matt Cowger mcow...@salesforce.com wrote:
>>> Hi Everyone, It looks like I've got something weird going with zfs performance on a ramdisk... ZFS is performing not even a 3rd of what UFS is doing. Short version: Create 80+ GB ramdisk (ramdiskadm), system has 96GB, so we aren't swapping. Create zpool on it (zpool create ram). Change zfs options to turn off checksumming (don't want it or need it), atime, compression, 4K block size (this is the application's native blocksize) etc. Run a simple iozone benchmark (seq. write, seq. read, rndm write, rndm read). Same deal for UFS, replacing the ZFS stuff with newfs stuff and mounting the UFS forcedirectio (no point in using buffer cache memory for something that's already in memory). Measure IOPs performance using iozone: iozone -e -i 0 -i 1 -i 2 -n 5120 -O -q 4k -r 4k -s 5g. With the ZFS filesystem I get around: ZFS (seq write) 42360 (seq read) 31010 (random read) 20953 (random write) 32525. Not SOO bad, but here's UFS: UFS (seq write) 42853 (seq read) 100761 (random read) 100471 (random write) 101141. For all tests besides the seq write, UFS utterly destroys ZFS. I'm curious if anyone has any clever ideas on why this huge disparity in performance exists. At the end of the day, my application will run on either filesystem, it just surprises me how much worse ZFS performs in this (admittedly edge case) scenario. --M

--
Olga Kryzhanovska, olga.kryzhanov...@gmail.com, Solaris/BSD//C/C++ programmer
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
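[For reference, iozone can be told to use mmap() for its file I/O with the -B flag, so the earlier command line would become (a sketch):

  iozone -B -e -i 0 -i 1 -i 2 -n 5120 -O -q 4k -r 4k -s 5g]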
Re: [zfs-discuss] backup zpool to tape
Thank you for such a thorough look into my issue. As you said, I guess I am down to trying to backup to a zvol and then backing that up to tape. Has anyone tried this solution? I would be very interested to find out. Anyone else with any other solutions? Thanks! Greg -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
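[A sketch of what the zvol-backed variant would look like; the pool, volume, and snapshot names are hypothetical, and per the discussion above this path is untested here:

  # zfs create -V 500g tank/backvol
  # zpool create backpool /dev/zvol/dsk/tank/backvol
  # zfs send -R source@snap | zfs receive -d backpool
  # zpool export backpool
  # dd if=/dev/zvol/dsk/tank/backvol of=/dev/rmt/0n obs=1048576]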
Re: [zfs-discuss] terrible ZFS performance compared to UFS on ramdisk (70% drop)
This is a good point, and something that I tried. I limited the ARC to 1GB and 4GB (both well within the memory footprint of the system even with the ramdisk). Equally poor results... this doesn't feel like ARC fighting with locked memory pages. --M -Original Message- From: Ross Walker [mailto:rswwal...@gmail.com] Sent: Tuesday, March 09, 2010 3:53 PM To: Roch Bourbonnais Cc: Matt Cowger; zfs-discuss@opensolaris.org Subject: Re: [zfs-discuss] terrible ZFS performance compared to UFS on ramdisk (70% drop) On Mar 9, 2010, at 1:42 PM, Roch Bourbonnais roch.bourbonn...@sun.com wrote: I think this is highlighting that there is extra CPU requirement to manage small blocks in ZFS. The table would probably turn over if you go to 16K zfs records and 16K reads/writes from the application. Next step for you is to figure how much read/write IOPS you expect to take in the real workloads and whether or not the filesystem portion will represent a significant drain of CPU resource. I think it highlights more the problem of ARC vs ramdisk, or specifically ZFS on ramdisk while the ARC is fighting with the ramdisk for memory. It is a wonder it didn't deadlock. If I were to put a ZFS file system on a ramdisk, I would limit the size of the ramdisk and ARC so both, plus the kernel, fit nicely in memory with room to spare for user apps. -Ross ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Should ZFS write data out when disk are idle
Sorry, a full stripe on a RaidZ is the recordsize, i.e. if the record size is 128k on a RaidZ and it's made up of 5 disks, then 128k is spread across 4 disks with the calculated parity on the 5th disk, which means the writes are 32k to each disk. For a RaidZ, when data is written to a disk, are individual 32k writes to the same disk joined together and written out as a single I/O to the disk? e.g. 128k for file a, 128k for file b, 128k for file c. When written out, does zfs do 32k+32k+32k i/o to each disk, or will it do one 96k i/o if the space is available sequentially? Cheers -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] sharenfs option rw,root=host1 don't take effect
Hi All, I created a ZFS filesystem test and shared it with zfs set sharenfs=root=host1 test, and I checked the sharenfs option and it was already updated to root=host1:

bash-3.00# zfs get sharenfs test
NAME PROPERTY VALUE SOURCE
test sharenfs rw,root=host1 local

and the NFS share command shows it is already shared as rw,root=host1 also:

bash-3.00# share
/test sec=sys,rw,root=host1

But at host1, after I mounted this filesystem and tried to do a write operation on it, it still returns permission denied:

bash-3.00# touch ll
touch: cannot create ll: Permission denied

Thanks for any reply. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] what to do when errors occur during scrub
On 3/9/2010 4:57 PM, Harry Putnam wrote: Also - it appears `zpool scrub -s z3' doesn't really do anything. The status report above is taken immediately after a scrub command. The `scub -s' command just returns the prompt... no output and apparently no scrub either. The -s switch is documented to STOP a scrub, though I've never used it. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] sharenfs option rw,root=host1 don't take effect
Hi All, I had create a ZFS filesystem test and shared it with zfs set sharenfs=root=host1 test, and I checked the sharenfs option and it already update to root=host1: Try to use a backslash to escape those special chars like so : zfs set sharenfs=nosub\,nosuid\,rw\=hostname1\:hostname2\,root\=hostname2 zpoolname/zfsname/pathname Dennis ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] [osol-discuss] Moving Storage to opensolaris+zfs. What about backup?
On Mar 8, 2010, at 7:55 AM, Erik Trimble wrote: Assume your machine has died the True Death, and you are starting with new disks (and, at least a similar hardware setup). I'm going to assume that you named the original snapshot 'rpool/ROOT/whate...@today' (1) Boot off the OpenSolaris LiveCD ... (10) Activate the restored BE: # beadm activate New You should now be all set. Note: I have not /explicitly/ tried the above - I should go do that now to see what happens. :-) If anyone is going to implement this, much the same procedure is documented at Simon Breden's blog: http://breden.org.uk/2009/08/29/home-fileserver-mirrored-ssd-zfs-root-boot/ which walks through the commands for executing the backup and the restore. --Ware ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Should ZFS write data out when disk are idle
On Mar 9, 2010, at 6:13 PM, Damon Atkins wrote: Sorry, Full Stripe on a RaidZ is the recordsize ie if the record size is 128k on a RaidZ and its made up of 5 disks, then 128k is spread across 4 disks with the calc parity on the 5 disk, which means the writes are 32k to each disk. Nominally. For a RaidZ, when data is written to a disk, are individual 32k join together to the same disk and written out as a single I/O to the disk? I/Os can be coalesced, but there is no restriction as to what can be coalesced. In other words, subsequent writes can also be coalesced if they are contiguous. e.g. 128k for file a, 128k for file b, 128k for file c. When written out does zfs do 32k+32k+32k i/o to each disk, or will it do one 96k i/o if the space is available sequentially? I'm not sure how one could write one 96KB physical I/O to three different disks? -- richard ZFS storage and performance consulting at http://www.RichardElling.com ZFS training on deduplication, NexentaStor, and NAS performance http://nexenta-atlanta.eventbrite.com (March 16-18, 2010) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] sharenfs option rw,root=host1 don't take effect
And I updated the sharenfs option to rw,ro...@100.198.100.0/24, and it works fine; the NFS client can write without error. Thanks. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How to verify ecc for ram is active and enabled?
Hi, thanks for the reply... I guess I'm so far as well, but my question is targeted at understanding the real-world implication of the kernel software memory scrubber. That is, in looking through the code a bit I notice that if hardware ECC is active the software scrubber is disabled. It is also disabled in absence of ECC memory (or unmatched ECC memory). In my particular case:

bash-4.0# echo memscrub_scans_done/U | mdb -k
memscrub_scans_done: memscrub_scans_done: 1985

It appears not to be disabled. My question, I guess, put differently, is: if it _is_ enabled, does it indeed do something useful in the sense of error detection? That is, if it is enabled but *cannot* determine anything related to ECC, _why_ is it running in the first place? If ECC is crippled, then the software scrubber gives a false impression of doing something useful and is perhaps a bug. On the other hand, if it *can* determine ECC (not crippled), then can we conclude that it is effective [enough] to be able to run as a small and reasonably reliable server? That is, correct correctable errors and be able to log memory errors for eventual action... cheers -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
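[Independent of the scrubber question, one cross-check is whether FMA has ever logged a correctable memory error; a sketch:

  # fmdump -e
  # fmstat

If the ECC reporting path is actually wired up, correctable errors eventually show up as ereports in that log; an empty log on suspect hardware proves nothing by itself, which is why the light-bulb test mentioned elsewhere in this thread is still worth running.]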
Re: [zfs-discuss] (FreeBSD) ZFS RAID: Disk fails while replacing another disk
Christian Hessmann wrote:
> Victor,
>>> Btw, they affect some files referenced by snapshots as 'zpool status -v' suggests:
>>> tank/DVD:0x9cd
>>> tank/d...@2010025100:/Memento.m4v
>>> tank/d...@2010025100:/Payback.m4v
>>> tank/d...@2010025100:/TheManWhoWasntThere.m4v
>> In case of OpenSolaris it is not that difficult to work around this bug without getting rid of files (snapshots referencing them) with errors, but I'm not sure how to do the same on FreeBSD. But you always have the option of destroying the snapshots indicated above (and maybe more).
> I'm still reluctant to reboot the machine, so what I did now was as you suggested destroy these snapshots (after deleting the files from the current filesystem, of course). I'm not so sure the result is good, though:
> ===
> [r...@camelot /tank/DVD]# zpool status -v tank
>   pool: tank
>  state: DEGRADED
> status: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
> action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>  scrub: resilver completed after 10h42m with 136 errors on Tue Mar 2 07:55:05 2010
> config:
>
>         NAME           STATE     READ WRITE CKSUM
>         tank           DEGRADED   137     0     0
>           raidz1       ONLINE       0     0     0
>             ad17p2     ONLINE       0     0     0
>             ad18p2     ONLINE       0     0     0
>             ad20p2     ONLINE       0     0     0
>           raidz1       DEGRADED   326     0     0
>             replacing  DEGRADED     0     0     0
>               ad16p2   OFFLINE      2  241K     6
>               ad4p2    ONLINE       0     0     0  839G resilvered
>             ad14p2     ONLINE       0     0     0  5.33G resilvered
>             ad15p2     ONLINE     418     0     0  5.33G resilvered
>
> errors: Permanent errors have been detected in the following files:
>
>         tank/DVD:0x9cd
>         0x2064:0x25a4
>         0x20ae:0x503
>         0x20ae:0x9cd
> ===
> Any further information available on these hex messages?

This tells that ZFS can no longer map object numbers from the errlog into meaningful names, and this is expected, as you have destroyed them. Now you need to re-run a scrub.

regards, victor
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss