Re: [zfs-discuss] CPU sizing for ZFS/iSCSI/NFS server
On Mon, Dec 12, 2011 at 03:01:08PM -0500, "Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D." wrote:
> 4c@2.4ghz

Yep, that's the plan. Thanks.

> On 12/12/2011 2:44 PM, Albert Chin wrote:
> >On Mon, Dec 12, 2011 at 02:40:52PM -0500, "Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D." wrote:
> >>Please check out the ZFS appliance 7120 spec: 2.4GHz / 24GB memory and
> >>a ZIL (SSD). Maybe try the ZFS simulator SW.
> >
> >Good point. Thanks.
> >
> >>On 12/12/2011 2:28 PM, Albert Chin wrote:
> >>>We're preparing to purchase an X4170M2 as an upgrade for our existing
> >>>X4100M2 server for ZFS, NFS, and iSCSI. We have a choice of CPUs, some
> >>>more expensive than others. Our current system has a dual-core 1.8GHz
> >>>Opteron 2210 CPU with 8GB. It seems like either a 6-core Intel E5649
> >>>2.53GHz CPU or a 4-core Intel E5620 2.4GHz CPU would be more than
> >>>enough. Based on what we're using the system for, it should be more
> >>>I/O bound than CPU bound. We are doing compression in ZFS, but that
> >>>shouldn't be too CPU intensive. It seems we should care more about
> >>>having more cores than high GHz.
> >>>
> >>>Recommendations?

> Hung-Sheng Tsao Ph.D.
> Founder & Principal
> HopBit GridComputing LLC
> cell: 9734950840
> http://laotsao.wordpress.com/
> http://laotsao.blogspot.com/

--
albert chin (ch...@thewrittenword.com)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] CPU sizing for ZFS/iSCSI/NFS server
On Mon, Dec 12, 2011 at 02:40:52PM -0500, "Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D." wrote:
> Please check out the ZFS appliance 7120 spec: 2.4GHz / 24GB memory and
> a ZIL (SSD). Maybe try the ZFS simulator SW.

Good point. Thanks.

> On 12/12/2011 2:28 PM, Albert Chin wrote:
> >We're preparing to purchase an X4170M2 as an upgrade for our existing
> >X4100M2 server for ZFS, NFS, and iSCSI. We have a choice of CPUs, some
> >more expensive than others. Our current system has a dual-core 1.8GHz
> >Opteron 2210 CPU with 8GB. It seems like either a 6-core Intel E5649
> >2.53GHz CPU or a 4-core Intel E5620 2.4GHz CPU would be more than
> >enough. Based on what we're using the system for, it should be more
> >I/O bound than CPU bound. We are doing compression in ZFS, but that
> >shouldn't be too CPU intensive. It seems we should care more about
> >having more cores than high GHz.
> >
> >Recommendations?
[zfs-discuss] CPU sizing for ZFS/iSCSI/NFS server
We're preparing to purchase an X4170M2 as an upgrade for our existing
X4100M2 server for ZFS, NFS, and iSCSI. We have a choice of CPUs, some
more expensive than others. Our current system has a dual-core 1.8GHz
Opteron 2210 CPU with 8GB. It seems like either a 6-core Intel E5649
2.53GHz CPU or a 4-core Intel E5620 2.4GHz CPU would be more than
enough. Based on what we're using the system for, it should be more
I/O bound than CPU bound. We are doing compression in ZFS, but that
shouldn't be too CPU intensive. It seems we should care more about
having more cores than high GHz.

Recommendations?
[zfs-discuss] cannot receive new filesystem stream: invalid backup stream
I have two snv_126 systems. I'm trying to zfs send a recursive snapshot
from one system to another:

  # zfs send -v -R tww/opt/chro...@backup-20091225 |\
      ssh backupserver "zfs receive -F -d -u -v tww"
  ...
  found clone origin tww/opt/chroots/a...@ab-1.0
  receiving incremental stream of tww/opt/chroots/ab-...@backup-20091225
  into tww/opt/chroots/ab-...@backup-20091225
  cannot receive new filesystem stream: invalid backup stream

If I do the following on the origin server:

  # zfs destroy -r tww/opt/chroots/ab-1.0
  # zfs list -t snapshot -r tww/opt/chroots | grep ab-1.0
  tww/opt/chroots/a...@ab-1.0
  tww/opt/chroots/hppa1.1-hp-hpux11...@ab-1.0
  tww/opt/chroots/hppa1.1-hp-hpux11...@ab-1.0
  ...
  # zfs list -t snapshot -r tww/opt/chroots | grep ab-1.0 |\
      while read a; do zfs destroy $a; done

and then run another zfs send like the above, the zfs send/receive
succeeds. However, if I then perform a few operations like the
following:

  zfs snapshot tww/opt/chroots/a...@ab-1.0
  zfs clone tww/opt/chroots/a...@ab-1.0 tww/opt/chroots/ab-1.0
  zfs rename tww/opt/chroots/ab/hppa1.1-hp-hpux11.00 tww/opt/chroots/ab-1.0/hppa1.1-hp-hpux11.00
  zfs rename tww/opt/chroots/hppa1.1-hp-hpux11...@ab tww/opt/chroots/hppa1.1-hp-hpux11...@ab-1.0
  zfs destroy tww/opt/chroots/ab/hppa1.1-hp-hpux11.00
  zfs destroy tww/opt/chroots/hppa1.1-hp-hpux11...@ab
  zfs snapshot tww/opt/chroots/hppa1.1-hp-hpux11...@ab
  zfs clone tww/opt/chroots/hppa1.1-hp-hpux11...@ab tww/opt/chroots/ab/hppa1.1-hp-hpux11.00
  zfs rename tww/opt/chroots/ab/hppa1.1-hp-hpux11.11 tww/opt/chroots/ab-1.0/hppa1.1-hp-hpux11.11
  zfs rename tww/opt/chroots/hppa1.1-hp-hpux11...@ab tww/opt/chroots/hppa1.1-hp-hpux11...@ab-1.0
  zfs destroy tww/opt/chroots/ab/hppa1.1-hp-hpux11.11
  zfs destroy tww/opt/chroots/hppa1.1-hp-hpux11...@ab
  zfs snapshot tww/opt/chroots/hppa1.1-hp-hpux11...@ab
  zfs clone tww/opt/chroots/hppa1.1-hp-hpux11...@ab tww/opt/chroots/ab/hppa1.1-hp-hpux11.11
  ...

and then perform another zfs send/receive, the error above occurs. Why?
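The origin-side pruning loop above (`zfs list ... | grep ... | while read a; do zfs destroy $a; done`) can be sketched as pure string handling, so it can be tested without a live pool. The dataset names below are made up for illustration; only the snapshot names in the first column of `zfs list -t snapshot` output are considered.

```python
# Hypothetical helper mirroring the shell pruning loop in the post.
# Builds (but does not run) the "zfs destroy" commands for every
# snapshot whose name contains the given pattern.

def destroy_commands(zfs_list_output, pattern):
    """Return 'zfs destroy' commands for snapshots matching `pattern`,
    parsed from `zfs list -t snapshot` output (name is column one)."""
    cmds = []
    for line in zfs_list_output.splitlines():
        fields = line.split()
        if fields and pattern in fields[0]:
            cmds.append("zfs destroy %s" % fields[0])
    return cmds

# Illustrative sample output; real names in the post are truncated.
sample = """\
tww/opt/chroots/abc@ab-1.0     0  -  1.2G  -
tww/opt/chroots/hp11@ab-1.0    0  -  3.4G  -
tww/opt/chroots/other@backup   0  -  9.0G  -
"""
print(destroy_commands(sample, "@ab-1.0"))
```

Feeding the resulting commands to the shell (or `subprocess`) reproduces the `while read` loop; dry-running them first is a cheap safety check before destroying snapshots.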
Re: [zfs-discuss] heads up on SXCE build 125 (LU + mirrored root pools)
On Thu, Nov 05, 2009 at 01:01:54PM -0800, Chris Du wrote:
> I think I finally see what you mean.
>
> # luactivate b126
> System has findroot enabled GRUB
> ERROR: Unable to determine the configuration of the current boot environment.

A possible solution was posted in the thread:
http://opensolaris.org/jive/thread.jspa?threadID=115503&tstart=0
Re: [zfs-discuss] STK 2540 and Ignore Cache Sync (ICS)
On Mon, Oct 26, 2009 at 09:58:05PM +0200, Mertol Ozyoney wrote:
> In all 2500 and 6000 series you can assign RAID sets to a controller, and
> that controller becomes the owner of the set.

When I configured all 32 drives on a 6140 array and the expansion
chassis, CAM automatically split the drives among the controllers
evenly.

> The advantage of the 2540 over its bigger brothers (the 6140, which is
> EOL'ed) and competitors: the 2540 uses dedicated data paths for cache
> mirroring, just like the higher-end units (6180, 6580, 6780), improving
> write performance significantly.
>
> Splitting load between controllers can increase performance most of the
> time, but you do not need to split into two equal partitions.
>
> Also, do not forget that the first tray has dedicated data lines to the
> controller, so generally it's wise not to mix those drives with drives
> on other trays.

But, if you have an expansion chassis, and create a zpool with drives on
the first tray and subsequent trays, what's the difference? You cannot
tell zfs which vdev to assign writes to, so it seems pointless to
balance your pool based on the chassis when reads/writes are potentially
spread across all vdevs.

> Best regards,
> Mertol
>
> Mertol Ozyoney
> Storage Practice - Sales Manager
> Sun Microsystems, TR
> Istanbul TR
> Phone +902123352200
> Mobile +905339310752
> Fax +90212335
> Email mertol.ozyo...@sun.com
>
> -Original Message-
> From: zfs-discuss-boun...@opensolaris.org
> [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Bob Friesenhahn
> Sent: Tuesday, October 13, 2009 10:59 PM
> To: Nils Goroll
> Cc: zfs-discuss@opensolaris.org
> Subject: Re: [zfs-discuss] STK 2540 and Ignore Cache Sync (ICS)
>
> On Tue, 13 Oct 2009, Nils Goroll wrote:
> >
> > Regarding my bonus question: I haven't found yet a definite answer if
> > there is a way to read the currently active controller setting. I still
> > assume that the nvsram settings which can be read with
> >
> >   service -d -c read -q nvsram region=0xf2 host=0x00
> >
> > do not necessarily reflect the current configuration and that the only
> > way to make sure the controller is running with that configuration is
> > to reset it.
>
> I believe that in the STK 2540, the controllers operate Active/Active,
> except that each controller is Active for half the drives and Standby
> for the others. Each controller has a copy of the configuration
> information. Whichever one you communicate with is likely required to
> mirror the changes to the other.
>
> In my setup I load-share the fibre channel traffic by assigning six
> drives as active on one controller and six drives as active on the
> other controller, and the drives are individually exported with a LUN
> per drive. I used CAM to do that. MPxIO sees the changes and does map
> half the paths down each FC link for more performance than one FC link
> offers.
>
> Bob
> --
> Bob Friesenhahn
> bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
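The load-sharing scheme Bob describes (six drives owned by each controller of an Active/Active pair, one LUN per drive) amounts to an even round-robin split. A minimal sketch, with illustrative drive and controller names rather than CAM syntax:

```python
# Sketch of an even drive-to-controller split for an Active/Active
# array pair, as described above. Names are hypothetical, not CAM/zfs
# identifiers.

def split_drives(drives, controllers=("A", "B")):
    """Assign drives to controllers round-robin for an even split."""
    assignment = {c: [] for c in controllers}
    for i, drive in enumerate(drives):
        assignment[controllers[i % len(controllers)]].append(drive)
    return assignment

drives = ["t0d%d" % i for i in range(12)]
owners = split_drives(drives)
print(len(owners["A"]), len(owners["B"]))  # an even six/six split
```

With MPxIO on top, each controller's half of the LUNs travels down its own FC link, which is the point of the split.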
Re: [zfs-discuss] "zfs send..." too slow?
On Sun, Oct 25, 2009 at 01:45:05AM -0700, Orvar Korvar wrote:
> I am trying to back up a large zfs file system to two different
> identical hard drives. I have therefore started two commands to back up
> "myfs", and when they have finished, I will back up "nextfs":
>
> zfs send mypool/m...@now | zfs receive backupzpool1/now & zfs send
> mypool/m...@now | zfs receive backupzpool2/now ; zfs send
> mypool/nex...@now | zfs receive backupzpool3/now
>
> in parallel. The logic is that the same file data is cached and
> therefore easy to send to each backup drive.
>
> Should I instead have done one "zfs send..." and waited for it to
> complete, and then started the next?
>
> It seems that "zfs send..." takes quite some time? 300GB has taken 10
> hours so far. And I have in total 3TB to back up. This means it will
> take 100 hours. Is this normal? If I had 30TB to back up, it would take
> 1000 hours, which is more than a month. Can I speed this up?

It's not immediately obvious what the cause is. Maybe the server
running zfs send has slow MB/s performance reading from disk. Maybe the
network. Or maybe the remote system. This might help:
http://tinyurl.com/yl653am
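A quick sanity check of the arithmetic above (assuming binary units; the post's figures are round numbers) shows why 300GB in 10 hours extrapolates to roughly 100 hours for 3TB, and why the sustained rate is suspiciously low:

```python
# Back-of-the-envelope throughput check for the numbers in the post,
# assuming binary units (GiB/TiB).

def transfer_rate_mib_s(gib, hours):
    """Sustained rate in MiB/s for `gib` GiB moved in `hours` hours."""
    return gib * 1024 / (hours * 3600.0)

rate = transfer_rate_mib_s(300, 10)       # roughly 8.5 MiB/s
hours_for_3tib = 3 * 1024 / (300 / 10.0)  # roughly 102 hours
print(round(rate, 1), round(hours_for_3tib, 1))
```

Around 8-9 MiB/s is far below what even a single modern disk or a gigabit link can sustain, which supports looking at the send-side disks, the pipe, or the receive side rather than accepting it as normal.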
Re: [zfs-discuss] Performance problems with Thumper and >7TB ZFS pool using RAIDZ2
On Sat, Oct 24, 2009 at 03:31:25PM -0400, Jim Mauro wrote:
> Posting to zfs-discuss. There's no reason this needs to be kept
> confidential.
>
> 5-disk RAIDZ2 - doesn't that equate to only 3 data disks?
> Seems pointless - they'd be much better off using mirrors,
> which is a better choice for random IO...

Is it really pointless? Maybe they want the insurance RAIDZ2 provides.
Given the choice between insurance and performance, I'll take
insurance, though it depends on your use case. We're using 5-disk
RAIDZ2 vdevs. While I want the performance a mirrored vdev would give,
it scares me that with mirrors you're just one drive away from a failed
pool. Of course, you could have two extra mirrors in each vdev, but I
don't want to sacrifice that much space. However, over the last two
years, we haven't had any demonstrable failures that would give us
cause for concern. But, it's still unsettling. Would love to hear other
opinions on this.

> Looking at this now...
>
> /jim
>
> Jeff Savit wrote:
>> Hi all,
>>
>> I'm looking for suggestions for the following situation: I'm helping
>> another SE with a customer using Thumper with a large ZFS pool mostly
>> used as an NFS server, and disappointments in performance. The storage
>> is an intermediate holding place for data to be fed into a relational
>> database, and the statement is that the NFS side can't keep up with
>> data feeds written to it as flat files.
>>
>> The ZFS pool has 8 5-volume RAIDZ2 groups, for 7.3TB of storage, with
>> 1.74TB available. Plenty of idle CPU as shown by vmstat and mpstat.
>> iostat shows queued I/O and I'm not happy about the total latencies -
>> wsvc_t in excess of 75ms at times. Average of ~60KB per read and only
>> ~2.5KB per write. The Evil Tuning Guide tells me that RAIDZ2 is
>> happiest with long reads and writes, and this is not the use case
>> here.
>>
>> I was surprised to see commands like tar, rm, and chown running
>> locally on the NFS server, so it looks like they're locally doing file
>> maintenance and pruning at the same time it's being accessed remotely.
>> That makes sense to me for the short write lengths and for the high
>> ZFS ACL activity shown by DTrace. I wonder if there is a lot of sync
>> I/O that would benefit from separately defined ZILs (whether SSD or
>> not), so I've asked them to look for fsync activity.
>>
>> Data collected thus far is listed below. I've asked for verification
>> of the Solaris 10 level (I believe it's S10u6) and ZFS recordsize.
>> Any suggestions will be appreciated.
>>
>> regards, Jeff
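The space/insurance tradeoff discussed above can be put in rough numbers: a 5-disk raidz2 vdev spends two disks' worth of capacity on parity but survives any two failures in the vdev, while a 2-way mirror gives 50% usable space and survives one failure per pair. A small sketch of the usable-capacity arithmetic:

```python
# Rough usable-capacity comparison for the vdev layouts discussed above.
# "parity" here is the number of disks' worth of redundancy per vdev
# (2 for raidz2, 1 for a 2-way mirror, 2 for a 3-way mirror).

def usable_fraction(disks_per_vdev, parity):
    """Fraction of raw capacity available for data in one vdev."""
    return (disks_per_vdev - parity) / float(disks_per_vdev)

raidz2_5 = usable_fraction(5, 2)  # 60% usable, survives any 2 failures
mirror2 = usable_fraction(2, 1)   # 50% usable, survives 1 per pair
mirror3 = usable_fraction(3, 2)   # ~33% usable, survives 2 per triple
print(raidz2_5, mirror2)
```

So 5-disk raidz2 actually yields more usable space than 2-way mirrors while tolerating more failures per vdev; what it gives up is random-I/O performance, which is the point of Jim's objection.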
Re: [zfs-discuss] Help! System panic when pool imported
On Mon, Oct 19, 2009 at 09:02:20PM -0500, Albert Chin wrote:
> On Mon, Oct 19, 2009 at 03:31:46PM -0700, Matthew Ahrens wrote:
> > Thanks for reporting this. I have fixed this bug (6822816) in build
> > 127.
>
> Thanks. I just installed OpenSolaris Preview based on 125 and will
> attempt to apply the patch you made to this release and import the pool.

Did the above and the zpool import worked. Thanks!

> > Albert Chin wrote:
> >> Running snv_114 on an X4100M2 connected to a 6140. Made a clone of a
> >> snapshot a few days ago:
> >>   # zfs snapshot a...@b
> >>   # zfs clone a...@b tank/a
> >>   # zfs clone a...@b tank/b
> >>
> >> The system started panicking after I tried:
> >>   # zfs snapshot tank/b...@backup
> >>
> >> So, I destroyed tank/b:
> >>   # zfs destroy tank/b
> >> then tried to destroy tank/a:
> >>   # zfs destroy tank/a
> >>
> >> Now, the system is in an endless panic loop, unable to import the
> >> pool at system startup or with "zpool import". The panic dump is:
> >>
> >> panic[cpu1]/thread=ff0010246c60: assertion failed: 0 ==
> >> zap_remove_int(mos, ds_prev->ds_phys->ds_next_clones_obj, obj, tx)
> >> (0x0 == 0x2), file: ../../common/fs/zfs/dsl_dataset.c, line: 1512
> >>
> >> ff00102468d0 genunix:assfail3+c1 ()
> >> ff0010246a50 zfs:dsl_dataset_destroy_sync+85a ()
> >> ff0010246aa0 zfs:dsl_sync_task_group_sync+eb ()
> >> ff0010246b10 zfs:dsl_pool_sync+196 ()
> >> ff0010246ba0 zfs:spa_sync+32a ()
> >> ff0010246c40 zfs:txg_sync_thread+265 ()
> >> ff0010246c50 unix:thread_start+8 ()
> >>
> >> We really need to import this pool. Is there a way around this? We do
> >> have snv_114 source on the system if we need to make changes to
> >> usr/src/uts/common/fs/zfs/dsl_dataset.c. It seems like the "zfs
> >> destroy" transaction never completed and it is being replayed,
> >> causing the panic. This cycle continues endlessly.
Re: [zfs-discuss] Help! System panic when pool imported
On Mon, Oct 19, 2009 at 03:31:46PM -0700, Matthew Ahrens wrote:
> Thanks for reporting this. I have fixed this bug (6822816) in build
> 127.

Thanks. I just installed OpenSolaris Preview based on 125 and will
attempt to apply the patch you made to this release and import the pool.

> Albert Chin wrote:
>> Running snv_114 on an X4100M2 connected to a 6140. Made a clone of a
>> snapshot a few days ago:
>>   # zfs snapshot a...@b
>>   # zfs clone a...@b tank/a
>>   # zfs clone a...@b tank/b
>>
>> The system started panicking after I tried:
>>   # zfs snapshot tank/b...@backup
>>
>> So, I destroyed tank/b:
>>   # zfs destroy tank/b
>> then tried to destroy tank/a:
>>   # zfs destroy tank/a
>>
>> Now, the system is in an endless panic loop, unable to import the pool
>> at system startup or with "zpool import". The panic dump is:
>>
>> panic[cpu1]/thread=ff0010246c60: assertion failed: 0 ==
>> zap_remove_int(mos, ds_prev->ds_phys->ds_next_clones_obj, obj, tx)
>> (0x0 == 0x2), file: ../../common/fs/zfs/dsl_dataset.c, line: 1512
>>
>> ff00102468d0 genunix:assfail3+c1 ()
>> ff0010246a50 zfs:dsl_dataset_destroy_sync+85a ()
>> ff0010246aa0 zfs:dsl_sync_task_group_sync+eb ()
>> ff0010246b10 zfs:dsl_pool_sync+196 ()
>> ff0010246ba0 zfs:spa_sync+32a ()
>> ff0010246c40 zfs:txg_sync_thread+265 ()
>> ff0010246c50 unix:thread_start+8 ()
>>
>> We really need to import this pool. Is there a way around this? We do
>> have snv_114 source on the system if we need to make changes to
>> usr/src/uts/common/fs/zfs/dsl_dataset.c. It seems like the "zfs
>> destroy" transaction never completed and it is being replayed, causing
>> the panic. This cycle continues endlessly.
Re: [zfs-discuss] iscsi/comstar performance
On Tue, Oct 13, 2009 at 01:00:35PM -0400, Frank Middleton wrote:
> After a recent upgrade to b124, I decided to switch to COMSTAR for
> iscsi targets for VirtualBox hosted on AMD64 Fedora C10. Both target
> and initiator are running zfs under b124. This combination seems
> unbelievably slow compared to the old iscsi subsystem.
>
> A scrub of a local 20GB disk on the target took 16 minutes. A scrub of
> a 20GB iscsi disk took 106 minutes! It seems to take much longer to
> boot from iscsi, so it seems to be reading more slowly too.
>
> There are a lot of variables - switching to COMSTAR, snv124, VBox
> 3.08, etc., but such a dramatic loss of performance probably has a
> single cause. Is anyone willing to speculate?

Maybe this will help:
http://mail.opensolaris.org/pipermail/storage-discuss/2009-September/007118.html
Re: [zfs-discuss] zfs receive should allow to keep received system
On Mon, Sep 28, 2009 at 03:16:17PM -0700, Igor Velkov wrote:
> Not so good as I hoped.
>
> zfs send -R xxx/x...@daily_2009-09-26_23:51:00 | ssh -c blowfish r...@xxx.xx \
>     zfs recv -vuFd xxx/xxx
>
> invalid option 'u'
> usage:
>         receive [-vnF] <filesystem|volume|snapshot>
>         receive [-vnF] -d <filesystem>
>
> For the property list, run: zfs set|get
>
> For the delegated permission list, run: zfs allow|unallow
>
> r...@xxx:~# uname -a
> SunOS xxx 5.10 Generic_13-03 sun4u sparc SUNW,Sun-Fire-V890
>
> What's wrong?

Looks like -u was a recent addition.
Re: [zfs-discuss] Should usedbydataset be the same after zfs send/recv for a volume?
On Mon, Sep 28, 2009 at 07:33:56PM -0500, Albert Chin wrote:
> When transferring a volume between servers, is it expected that the
> usedbydataset property should be the same on both? If not, is it cause
> for concern?
>
> snv114# zfs list tww/opt/vms/images/vios/near.img
> NAME                               USED  AVAIL  REFER  MOUNTPOINT
> tww/opt/vms/images/vios/near.img  70.5G   939G  15.5G  -
> snv114# zfs get usedbydataset tww/opt/vms/images/vios/near.img
> NAME                              PROPERTY       VALUE  SOURCE
> tww/opt/vms/images/vios/near.img  usedbydataset  15.5G  -
>
> snv119# zfs list t/opt/vms/images/vios/near.img
> NAME                             USED  AVAIL  REFER  MOUNTPOINT
> t/opt/vms/images/vios/near.img  14.5G  2.42T  14.5G  -
> snv119# zfs get usedbydataset t/opt/vms/images/vios/near.img
> NAME                            PROPERTY       VALUE  SOURCE
> t/opt/vms/images/vios/near.img  usedbydataset  14.5G  -

Don't know if it matters, but the disks on the send/recv servers are
different: 300GB FC-AL on the send side, 750GB SATA on the recv side.
[zfs-discuss] Should usedbydataset be the same after zfs send/recv for a volume?
When transferring a volume between servers, is it expected that the
usedbydataset property should be the same on both? If not, is it cause
for concern?

  snv114# zfs list tww/opt/vms/images/vios/near.img
  NAME                               USED  AVAIL  REFER  MOUNTPOINT
  tww/opt/vms/images/vios/near.img  70.5G   939G  15.5G  -
  snv114# zfs get usedbydataset tww/opt/vms/images/vios/near.img
  NAME                              PROPERTY       VALUE  SOURCE
  tww/opt/vms/images/vios/near.img  usedbydataset  15.5G  -

  snv119# zfs list t/opt/vms/images/vios/near.img
  NAME                             USED  AVAIL  REFER  MOUNTPOINT
  t/opt/vms/images/vios/near.img  14.5G  2.42T  14.5G  -
  snv119# zfs get usedbydataset t/opt/vms/images/vios/near.img
  NAME                            PROPERTY       VALUE  SOURCE
  t/opt/vms/images/vios/near.img  usedbydataset  14.5G  -
[zfs-discuss] refreservation not transferred by zfs send when sending a volume?
snv114# zfs get used,reservation,volsize,refreservation,usedbydataset,usedbyrefreservation tww/opt/vms/images/vios/mello-0.img
NAME                                 PROPERTY              VALUE  SOURCE
tww/opt/vms/images/vios/mello-0.img  used                  30.6G  -
tww/opt/vms/images/vios/mello-0.img  reservation           none   default
tww/opt/vms/images/vios/mello-0.img  volsize               25G    -
tww/opt/vms/images/vios/mello-0.img  refreservation        25G    local
tww/opt/vms/images/vios/mello-0.img  usedbydataset         5.62G  -
tww/opt/vms/images/vios/mello-0.img  usedbyrefreservation  25G    -

Sent tww/opt/vms/images/vios/mello-0.img from the snv_114 server to the
snv_119 server. On the snv_119 server:

snv119# zfs get used,reservation,volsize,refreservation,usedbydataset,usedbyrefreservation t/opt/vms/images/vios/mello-0.img
NAME                               PROPERTY              VALUE  SOURCE
t/opt/vms/images/vios/mello-0.img  used                  5.32G  -
t/opt/vms/images/vios/mello-0.img  reservation           none   default
t/opt/vms/images/vios/mello-0.img  volsize               25G    -
t/opt/vms/images/vios/mello-0.img  refreservation        none   default
t/opt/vms/images/vios/mello-0.img  usedbydataset         5.32G  -
t/opt/vms/images/vios/mello-0.img  usedbyrefreservation  0      -

Any reason the refreservation and usedbyrefreservation properties are
not sent?
Re: [zfs-discuss] Quickest way to find files with cksum errors without doing scrub
On Mon, Sep 28, 2009 at 10:16:20AM -0700, Richard Elling wrote:
> On Sep 28, 2009, at 3:42 PM, Albert Chin wrote:
>> On Mon, Sep 28, 2009 at 12:09:03PM -0500, Bob Friesenhahn wrote:
>>> On Mon, 28 Sep 2009, Richard Elling wrote:
>>>>
>>>> Scrub could be faster, but you can try
>>>>     tar cf - . > /dev/null
>>>>
>>>> If you think about it, validating checksums requires reading the
>>>> data. So you simply need to read the data.
>>>
>>> This should work but it does not verify the redundant metadata. For
>>> example, the duplicate metadata copy might be corrupt but the problem
>>> is not detected since it did not happen to be used.
>>
>> Too bad we cannot scrub a dataset/object.
>
> Can you provide a use case? I don't see why scrub couldn't start and
> stop at specific txgs, for instance. That won't necessarily get you to
> a specific file, though.

If your pool is borked but mostly readable, yet some file systems have
cksum errors, you cannot "zfs send" that file system (err, a snapshot
of the filesystem). So, you need to manually fix the file system by
traversing it to read all files to determine which must be fixed. Once
this is done, you can snapshot and "zfs send". If you have many file
systems, this is time consuming. Of course, you could just rsync and be
happy with what you were able to recover, but if you have clones
branched from the same parent, with few differences between snapshots,
having to rsync *everything* rather than just the differences is
painful. Hence the reason to try to get "zfs send" to work. But, this
is an extreme example and I doubt pools are often in this state, so the
engineering time isn't worth it. In such cases though, a per-dataset
"zfs scrub" would be useful.
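The traversal described above (read every file so ZFS validates checksums on the data path, and note which ones fail) can be sketched as a small walker. Like `tar cf - . > /dev/null`, this exercises file data but not the redundant metadata copies Bob mentions; a read that hits an uncorrectable cksum error surfaces as an I/O error:

```python
# Sketch of a user-land, per-filesystem "scrub": read every file under
# `root` and collect the paths whose reads fail (e.g. EIO from an
# uncorrectable checksum error). This validates file data only, not
# redundant metadata.
import os

def find_unreadable(root, bufsize=1 << 20):
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    while f.read(bufsize):
                        pass
            except OSError:
                bad.append(path)
    return bad
```

Run against the mountpoint of the damaged filesystem, the returned list is the set of files to restore or delete before a snapshot of that filesystem can be sent cleanly.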
Re: [zfs-discuss] Quickest way to find files with cksum errors without doing scrub
On Mon, Sep 28, 2009 at 12:09:03PM -0500, Bob Friesenhahn wrote:
> On Mon, 28 Sep 2009, Richard Elling wrote:
>>
>> Scrub could be faster, but you can try
>>     tar cf - . > /dev/null
>>
>> If you think about it, validating checksums requires reading the data.
>> So you simply need to read the data.
>
> This should work but it does not verify the redundant metadata. For
> example, the duplicate metadata copy might be corrupt but the problem
> is not detected since it did not happen to be used.

Too bad we cannot scrub a dataset/object.
[zfs-discuss] Quickest way to find files with cksum errors without doing scrub
Without doing a zpool scrub, what's the quickest way to find files in a
filesystem with cksum errors? Iterating over all files with "find"
takes quite a bit of time. Maybe there's some zdb fu that will perform
the check for me?
Re: [zfs-discuss] Help! System panic when pool imported
On Sun, Sep 27, 2009 at 10:06:16AM -0700, Andrew wrote:
> This is what my /var/adm/messages looks like:
>
> Sep 27 12:46:29 solaria genunix: [ID 403854 kern.notice] assertion failed: ss == NULL, file: ../../common/fs/zfs/space_map.c, line: 109
> Sep 27 12:46:29 solaria unix: [ID 10 kern.notice]
> Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a97a0 genunix:assfail+7e ()
> Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a9830 zfs:space_map_add+292 ()
> Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a98e0 zfs:space_map_load+3a7 ()
> Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a9920 zfs:metaslab_activate+64 ()
> Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a99e0 zfs:metaslab_group_alloc+2b7 ()
> Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a9ac0 zfs:metaslab_alloc_dva+295 ()
> Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a9b60 zfs:metaslab_alloc+9b ()
> Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a9b90 zfs:zio_dva_allocate+3e ()
> Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a9bc0 zfs:zio_execute+a0 ()
> Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a9c40 genunix:taskq_thread+193 ()
> Sep 27 12:46:29 solaria genunix: [ID 655072 kern.notice] ff00089a9c50 unix:thread_start+8 ()

I'm not sure that aok=1/zfs:zfs_recover=1 would help you because
zfs_panic_recover isn't in the backtrace (see
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6638754).
Sometimes a Sun zfs engineer shows up on the freenode #zfs channel. I'd
pop in there and ask. There are somewhat similar bug reports at
bugs.opensolaris.org. I'd post a bug report just in case.
Re: [zfs-discuss] Help! System panic when pool imported
On Sun, Sep 27, 2009 at 12:25:28AM -0700, Andrew wrote:
> I'm getting the same thing now.
>
> I tried moving my 5-disk raidZ and 2-disk mirror over to another
> machine, but that machine would keep panicking (not ZFS-related
> panics). When I brought the array back over, I started getting this as
> well. My mirror array is unaffected.
>
> snv111b (2009.06 release)

What does the panic dump look like?
Re: [zfs-discuss] Help! System panic when pool imported
On Fri, Sep 25, 2009 at 05:21:23AM +, Albert Chin wrote:
> [[ snip snip ]]
>
> We really need to import this pool. Is there a way around this? We do
> have snv_114 source on the system if we need to make changes to
> usr/src/uts/common/fs/zfs/dsl_dataset.c. It seems like the "zfs
> destroy" transaction never completed and it is being replayed, causing
> the panic. This cycle continues endlessly.

What are the implications of adding the following to /etc/system:

  set zfs:zfs_recover=1
  set aok=1

and importing the pool with:

  # zpool import -o ro
[zfs-discuss] Help! System panic when pool imported
Running snv_114 on an X4100M2 connected to a 6140. Made a clone of a
snapshot a few days ago:

  # zfs snapshot a...@b
  # zfs clone a...@b tank/a
  # zfs clone a...@b tank/b

The system started panicking after I tried:

  # zfs snapshot tank/b...@backup

So, I destroyed tank/b:

  # zfs destroy tank/b

then tried to destroy tank/a:

  # zfs destroy tank/a

Now, the system is in an endless panic loop, unable to import the pool
at system startup or with "zpool import". The panic dump is:

  panic[cpu1]/thread=ff0010246c60: assertion failed: 0 ==
  zap_remove_int(mos, ds_prev->ds_phys->ds_next_clones_obj, obj, tx)
  (0x0 == 0x2), file: ../../common/fs/zfs/dsl_dataset.c, line: 1512

  ff00102468d0 genunix:assfail3+c1 ()
  ff0010246a50 zfs:dsl_dataset_destroy_sync+85a ()
  ff0010246aa0 zfs:dsl_sync_task_group_sync+eb ()
  ff0010246b10 zfs:dsl_pool_sync+196 ()
  ff0010246ba0 zfs:spa_sync+32a ()
  ff0010246c40 zfs:txg_sync_thread+265 ()
  ff0010246c50 unix:thread_start+8 ()

We really need to import this pool. Is there a way around this? We do
have snv_114 source on the system if we need to make changes to
usr/src/uts/common/fs/zfs/dsl_dataset.c. It seems like the "zfs
destroy" transaction never completed and it is being replayed, causing
the panic. This cycle continues endlessly.
[zfs-discuss] zfs snapshot -r panic on b114
While a resilver was running, we attempted a recursive snapshot, which
resulted in a kernel panic:

panic[cpu1]/thread=ff00104c0c60: assertion failed: 0 ==
zap_remove_int(mos, next_clones_obj, dsphys->ds_next_snap_obj, tx)
(0x0 == 0x2), file: ../../common/fs/zfs/dsl_dataset.c, line: 1869

ff00104c0960 genunix:assfail3+c1 ()
ff00104c0a00 zfs:dsl_dataset_snapshot_sync+4a2 ()
ff00104c0a50 zfs:snapshot_sync+41 ()
ff00104c0aa0 zfs:dsl_sync_task_group_sync+eb ()
ff00104c0b10 zfs:dsl_pool_sync+196 ()
ff00104c0ba0 zfs:spa_sync+32a ()
ff00104c0c40 zfs:txg_sync_thread+265 ()
ff00104c0c50 unix:thread_start+8 ()

System is an X4100M2 running snv_114. Any ideas?

--
albert chin (ch...@thewrittenword.com)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] How to recover from "can't open objset", "cannot iterate filesystems"?
Recently upgraded a system from b98 to b114. Also replaced two 400G
Seagate Barracuda 7200.8 SATA disks with two WD 750G RE3 SATA disks in
a 6-device raidz1 pool. Replacing the first 750G disk went ok. While
replacing the second 750G disk, I noticed CKSUM errors on the first
disk. Once the second disk was replaced, I halted the system, upgraded
to b114, and rebooted. Both b98 and b114 gave the errors:

  WARNING: can't open objset for tww/opt/dists/cd-8.1
  cannot iterate filesystems: I/O error

How do I recover from this?

  # zpool status tww
    pool: tww
   state: ONLINE
  status: One or more devices has experienced an error resulting in data
          corruption.  Applications may be affected.
  action: Restore the file in question if possible.  Otherwise restore the
          entire pool from backup.
     see: http://www.sun.com/msg/ZFS-8000-8A
   scrub: none requested
  config:

          NAME        STATE     READ WRITE CKSUM
          tww         ONLINE       0     0     3
            raidz1    ONLINE       0     0    12
              c4t0d0  ONLINE       0     0     0
              c4t1d0  ONLINE       0     0     0
              c4t4d0  ONLINE       0     0     0
              c4t5d0  ONLINE       0     0     0
              c4t6d0  ONLINE       0     0     0
              c4t7d0  ONLINE       0     0     0

  errors: 855 data errors, use '-v' for a list

--
albert chin (ch...@thewrittenword.com)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zpool replace complete but old drives not detached
$ cat /etc/release
                 Solaris Express Community Edition snv_114 X86
       Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
                    Use is subject to license terms.
                         Assembled 04 May 2009

I recently replaced two drives in a raidz2 vdev. However, after the
resilver completed, the old drives were not automatically detached.
Why? How do I detach the drives that were replaced?

# zpool replace tww c6t600A0B800029996605B04668F17Dd0 \
    c6t600A0B8000299CCC099B4A400A9Cd0
# zpool replace tww c6t600A0B800029996605C24668F39Bd0 \
    c6t600A0B8000299CCC0A744A94F7E2d0
... resilver runs to completion ...
# zpool status tww
  pool: tww
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver completed after 25h11m with 23375 errors on Sun Sep 6
        02:09:07 2009
config:

    NAME                                     STATE     READ WRITE CKSUM
    tww                                      DEGRADED     0     0  207K
      raidz2                                 ONLINE       0     0     0
        c6t600A0B800029996605964668CB39d0    ONLINE       0     0     0
        c6t600A0B8000299CCC06C84744C892d0    ONLINE       0     0     0
        c6t600A0B8000299CCC05B44668CC6Ad0    ONLINE       0     0     0
        c6t600A0B800029996605A44668CC3Fd0    ONLINE       0     0     0
        c6t600A0B8000299CCC05BA4668CD2Ed0    ONLINE       0     0     0
        c6t600A0B800029996605AA4668CDB1d0    ONLINE       0     0     0
        c6t600A0B8000299966073547C5CED9d0    ONLINE       0     0     0
      raidz2                                 DEGRADED     0     0  182K
        replacing                            DEGRADED     0     0     0
          c6t600A0B800029996605B04668F17Dd0  DEGRADED     0     0     0  too many errors
          c6t600A0B8000299CCC099B4A400A9Cd0  ONLINE       0     0     0  255G resilvered
        c6t600A0B8000299CCC099E4A400B94d0    ONLINE       0     0  218K  10.2M resilvered
        c6t600A0B8000299CCC0A6B4A93D3EEd0    ONLINE       0     0   242  246G resilvered
        spare                                DEGRADED     0     0     0
          c6t600A0B8000299CCC05CC4668F30Ed0  DEGRADED     0     0     3  too many errors
          c6t600A0B8000299CCC05D84668F448d0  ONLINE       0     0     0  255G resilvered
        spare                                DEGRADED     0     0     0
          c6t600A0B800029996605BC4668F305d0  DEGRADED     0     0     0  too many errors
          c6t600A0B800029996605C84668F461d0  ONLINE       0     0     0  255G resilvered
        c6t600A0B800029996609EE4A89DA51d0    ONLINE       0     0     0  246G resilvered
        replacing                            DEGRADED     0     0     0
          c6t600A0B800029996605C24668F39Bd0  DEGRADED     0     0     0  too many errors
          c6t600A0B8000299CCC0A744A94F7E2d0  ONLINE       0     0     0  255G resilvered
      raidz2                                 ONLINE       0     0  233K
        c6t600A0B8000299CCC0A154A89E426d0    ONLINE       0     0     0
        c6t600A0B800029996609F74A89E1A5d0    ONLINE       0     0   758  6.50K resilvered
        c6t600A0B8000299CCC0A174A89E520d0    ONLINE       0     0   311  3.50K resilvered
        c6t600A0B800029996609F94A89E24Bd0    ONLINE       0     0 21.8K  32K resilvered
        c6t600A0B8000299CCC0A694A93D322d0    ONLINE       0     0     0  1.85G resilvered
        c6t600A0B8000299CCC0A0C4A89DDE8d0    ONLINE       0     0 27.4K  41.5K resilvered
        c6t600A0B800029996609F04A89DB1Bd0    ONLINE       0     0 7.13K  24K resilvered
    spares
      c6t600A0B8000299CCC05D84668F448d0      INUSE     currently in use
      c6t600A0B800029996605C84668F461d0      INUSE     currently in use
      c6t600A0B80002999660A454A93CEDBd0      AVAIL
      c6t600A0B80002999660ADA4A9CF2EDd0      AVAIL

--
albert chin (ch...@thewrittenword.com)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
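For what it's worth, the workaround that eventually cleared a stuck
"replacing" vdev in the "Resilver complete, but device not replaced"
thread below was to manually detach the old half of the pair once the
resilver finished. A sketch only, using the old device names from the
status output above — verify against your own "zpool status" before
detaching anything:

```
# zpool detach tww c6t600A0B800029996605B04668F17Dd0
# zpool detach tww c6t600A0B800029996605C24668F39Bd0
```

The in-use hot spares can be released the same way: detach whichever
side of each "spare" pair you don't want to keep.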
Re: [zfs-discuss] zpool scrub started resilver, not scrub (DTL non-empty?)
On Mon, Aug 31, 2009 at 02:40:54PM -0500, Albert Chin wrote:
> On Wed, Aug 26, 2009 at 02:33:39AM -0500, Albert Chin wrote:
> > # cat /etc/release
> >              Solaris Express Community Edition snv_105 X86
> >    Copyright 2008 Sun Microsystems, Inc.  All Rights Reserved.
> >                 Use is subject to license terms.
> >                  Assembled 15 December 2008
> >
> > So, why is a resilver in progress when I asked for a scrub?
>
> Still seeing the same problem with snv_114.
> # cat /etc/release
>              Solaris Express Community Edition snv_114 X86
>    Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
>                 Use is subject to license terms.
>                     Assembled 04 May 2009
>
> How do I scrub this pool?

From http://black-chair.livejournal.com/21419.html:

  zpool scrub *should* work just fine. The only time a scrub will turn
  into a resilver is when there are disks whose DTL (dirty time log --
  the thing that keeps track of which transaction groups need to be
  resilvered on which disks) is non-empty. The idea is that if a
  resilver is necessary, then it's higher priority than a scrub. Can
  you tell me a little more about your setup? Just drop me a personal
  e-mail -- Jeff Bonwick, first.l...@sun.com.

So, how do I find out if the DTL is non-empty? Looks like it is never
empty here. And, if it is non-empty, how do I make it empty?

--
albert chin (ch...@thewrittenword.com)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool scrub started resilver, not scrub
On Wed, Aug 26, 2009 at 02:33:39AM -0500, Albert Chin wrote:
> # cat /etc/release
>              Solaris Express Community Edition snv_105 X86
>    Copyright 2008 Sun Microsystems, Inc.  All Rights Reserved.
>                 Use is subject to license terms.
>                  Assembled 15 December 2008
>
> So, why is a resilver in progress when I asked for a scrub?

Still seeing the same problem with snv_114.

  # cat /etc/release
               Solaris Express Community Edition snv_114 X86
     Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
                  Use is subject to license terms.
                      Assembled 04 May 2009

How do I scrub this pool?

--
albert chin (ch...@thewrittenword.com)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] snv_110 -> snv_121 produces checksum errors on Raid-Z pool
On Thu, Aug 27, 2009 at 06:29:52AM -0700, Gary Gendel wrote: > It looks like It's definitely related to the snv_121 upgrade. I > decided to roll back to snv_110 and the checksum errors have > disappeared. I'd like to issue a bug report, but I don't have any > information that might help track this down, just lots of checksum > errors. So, on snv_121, can you read the files with checksum errors? Is it simply the reporting mechanism that is wrong or are the files really damaged? -- albert chin (ch...@thewrittenword.com) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zpool scrub started resilver, not scrub
# cat /etc/release
             Solaris Express Community Edition snv_105 X86
   Copyright 2008 Sun Microsystems, Inc.  All Rights Reserved.
                Use is subject to license terms.
                 Assembled 15 December 2008

# zpool status tww
  pool: tww
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver completed after 6h15m with 27885 errors on Wed Aug 26
        07:18:03 2009
config:

    NAME                                   STATE     READ WRITE CKSUM
    tww                                    ONLINE       0     0 54.5K
      raidz2                               ONLINE       0     0     0
        c6t600A0B800029996605964668CB39d0  ONLINE       0     0     0
        c6t600A0B8000299CCC06C84744C892d0  ONLINE       0     0     0
        c6t600A0B8000299CCC05B44668CC6Ad0  ONLINE       0     0     0
        c6t600A0B800029996605A44668CC3Fd0  ONLINE       0     0     0
        c6t600A0B8000299CCC05BA4668CD2Ed0  ONLINE       0     0     0
        c6t600A0B800029996605AA4668CDB1d0  ONLINE       0     0     0
        c6t600A0B8000299966073547C5CED9d0  ONLINE       0     0     0
      raidz2                               ONLINE       0     0     0
        c6t600A0B800029996605B04668F17Dd0  ONLINE       0     0     0
        c6t600A0B8000299CCC099E4A400B94d0  ONLINE       0     0     0
        c6t600A0B800029996605B64668F26Fd0  ONLINE       0     0     0
        c6t600A0B8000299CCC05CC4668F30Ed0  ONLINE       0     0     0
        c6t600A0B800029996605BC4668F305d0  ONLINE       0     0     0
        c6t600A0B8000299CCC099B4A400A9Cd0  ONLINE       0     0     0
        c6t600A0B800029996605C24668F39Bd0  ONLINE       0     0     0
      raidz2                               ONLINE       0     0  109K
        c6t600A0B8000299CCC0A154A89E426d0  ONLINE       0     0     0
        c6t600A0B800029996609F74A89E1A5d0  ONLINE       0     0    18  2.50K resilvered
        c6t600A0B8000299CCC0A174A89E520d0  ONLINE       0     0    39  4.50K resilvered
        c6t600A0B800029996609F94A89E24Bd0  ONLINE       0     0   486  75K resilvered
        c6t600A0B80002999660A454A93CEDBd0  ONLINE       0     0     0  2.55G resilvered
        c6t600A0B8000299CCC0A0C4A89DDE8d0  ONLINE       0     0    34  2K resilvered
        c6t600A0B800029996609F04A89DB1Bd0  ONLINE       0     0   173  18K resilvered
    spares
      c6t600A0B8000299CCC05D84668F448d0    AVAIL
      c6t600A0B800029996605C84668F461d0    AVAIL

errors: 27885 data errors, use '-v' for a list

# zpool scrub tww
# zpool status tww
  pool: tww
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver in progress for 0h11m, 2.82% done, 6h21m to go
config:
        ...

So, why is a resilver in progress when I asked for a scrub?

--
albert chin (ch...@thewrittenword.com)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Resilver complete, but device not replaced, odd zpool status output
On Tue, Aug 25, 2009 at 06:05:16AM -0500, Albert Chin wrote: > [[ snip snip ]] > > After the resilver completed: > # zpool status tww > pool: tww > state: DEGRADED > status: One or more devices has experienced an error resulting in data > corruption. Applications may be affected. > action: Restore the file in question if possible. Otherwise restore the > entire pool from backup. >see: http://www.sun.com/msg/ZFS-8000-8A > scrub: resilver completed after 6h9m with 27886 errors on Tue Aug 25 > 08:32:41 2009 > config: > > NAME STATE READ > WRITE CKSUM > tww DEGRADED 0 > 0 76.0K > raidz2 ONLINE 0 > 0 0 > c6t600A0B800029996605964668CB39d0ONLINE 0 > 0 0 > c6t600A0B8000299CCC06C84744C892d0ONLINE 0 > 0 0 > c6t600A0B8000299CCC05B44668CC6Ad0ONLINE 0 > 0 0 > c6t600A0B800029996605A44668CC3Fd0ONLINE 0 > 0 0 > c6t600A0B8000299CCC05BA4668CD2Ed0ONLINE 0 > 0 0 > c6t600A0B800029996605AA4668CDB1d0ONLINE 0 > 0 0 > c6t600A0B8000299966073547C5CED9d0ONLINE 0 > 0 0 > raidz2 ONLINE 0 > 0 0 > c6t600A0B800029996605B04668F17Dd0ONLINE 0 > 0 0 > c6t600A0B8000299CCC099E4A400B94d0ONLINE 0 > 0 0 > c6t600A0B800029996605B64668F26Fd0ONLINE 0 > 0 0 > c6t600A0B8000299CCC05CC4668F30Ed0ONLINE 0 > 0 0 > c6t600A0B800029996605BC4668F305d0ONLINE 0 > 0 0 > c6t600A0B8000299CCC099B4A400A9Cd0ONLINE 0 > 0 0 > c6t600A0B800029996605C24668F39Bd0ONLINE 0 > 0 0 > raidz2 DEGRADED 0 > 0 153K > c6t600A0B8000299CCC0A154A89E426d0ONLINE 0 > 0 1 1K resilvered > c6t600A0B800029996609F74A89E1A5d0ONLINE 0 > 0 2.14K 5.67M resilvered > c6t600A0B8000299CCC0A174A89E520d0ONLINE 0 > 0 299 34K resilvered > c6t600A0B800029996609F94A89E24Bd0ONLINE 0 > 0 29.7K 23.5M resilvered > replacingDEGRADED 0 > 0 118K > c6t600A0B8000299CCC0A194A89E634d0 OFFLINE 20 > 1.28M 0 > c6t600A0B800029996609EE4A89DA51d0 ONLINE 0 > 0 0 1.93G resilvered > c6t600A0B8000299CCC0A0C4A89DDE8d0ONLINE 0 > 0 247 54K resilvered > c6t600A0B800029996609F04A89DB1Bd0ONLINE 0 > 0 24.2K 51.3M resilvered > spares > c6t600A0B8000299CCC05D84668F448d0 AVAIL > 
c6t600A0B800029996605C84668F461d0 AVAIL > > errors: 27886 data errors, use '-v' for a list > > # zpool replace c6t600A0B8000299CCC0A194A89E634d0 \ > c6t600A0B800029996609EE4A89DA51d0 > invalid vdev specification > use '-f' to override the following errors: > /dev/dsk/c6t600A0B800029996609EE4A89DA51d0s0 is part of active ZFS > pool tww. Please see zpool(1M). > > So, what is going on? Rebooted the server and see the same problem. So, I ran: # zpool detach tww c6t600A0B8000299CCC0A194A89E634d0 and now the zpool status output looks "normal": # zpool status tww pool: tww state: ONLINE status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup. see: http://www.sun.com/msg/ZFS-8000-8A scrub: resilver in progress for 0h16m, 7.88% done, 3h9m to go config: NAME STATE READ WRITE CKSUM twwONLINE 0 0 5 raidz2 ONLINE 0 0 0 c6t600A0B800029996605964668CB39d0 ONLINE 0 0 0 c6t600A0B8000299CCC06C84744C892d0 ONLINE 0 0 0 c6t600A0B8000299CCC05B44668CC6Ad0 ONLINE 0 0 0 c6t600A0B800029996605A44668CC3Fd0 ONLINE 0 0 0 c6t600A0B8000299CCC05BA4668CD2Ed0 ONLINE 0 0 0 c6t600A0B800029996605AA4668CDB1d0 ONLINE 0 0 0 c6
[zfs-discuss] Resilver complete, but device not replaced, odd zpool status output
: NAME STATE READ WRITE CKSUM tww DEGRADED 0 0 76.0K raidz2 ONLINE 0 0 0 c6t600A0B800029996605964668CB39d0ONLINE 0 0 0 c6t600A0B8000299CCC06C84744C892d0ONLINE 0 0 0 c6t600A0B8000299CCC05B44668CC6Ad0ONLINE 0 0 0 c6t600A0B800029996605A44668CC3Fd0ONLINE 0 0 0 c6t600A0B8000299CCC05BA4668CD2Ed0ONLINE 0 0 0 c6t600A0B800029996605AA4668CDB1d0ONLINE 0 0 0 c6t600A0B8000299966073547C5CED9d0ONLINE 0 0 0 raidz2 ONLINE 0 0 0 c6t600A0B800029996605B04668F17Dd0ONLINE 0 0 0 c6t600A0B8000299CCC099E4A400B94d0ONLINE 0 0 0 c6t600A0B800029996605B64668F26Fd0ONLINE 0 0 0 c6t600A0B8000299CCC05CC4668F30Ed0ONLINE 0 0 0 c6t600A0B800029996605BC4668F305d0ONLINE 0 0 0 c6t600A0B8000299CCC099B4A400A9Cd0ONLINE 0 0 0 c6t600A0B800029996605C24668F39Bd0ONLINE 0 0 0 raidz2 DEGRADED 0 0 153K c6t600A0B8000299CCC0A154A89E426d0ONLINE 0 0 1 1K resilvered c6t600A0B800029996609F74A89E1A5d0ONLINE 0 0 2.14K 5.67M resilvered c6t600A0B8000299CCC0A174A89E520d0ONLINE 0 0 299 34K resilvered c6t600A0B800029996609F94A89E24Bd0ONLINE 0 0 29.7K 23.5M resilvered replacingDEGRADED 0 0 118K c6t600A0B8000299CCC0A194A89E634d0 OFFLINE 20 1.28M 0 c6t600A0B800029996609EE4A89DA51d0 ONLINE 0 0 0 1.93G resilvered c6t600A0B8000299CCC0A0C4A89DDE8d0ONLINE 0 0 247 54K resilvered c6t600A0B800029996609F04A89DB1Bd0ONLINE 0 0 24.2K 51.3M resilvered spares c6t600A0B8000299CCC05D84668F448d0 AVAIL c6t600A0B800029996605C84668F461d0 AVAIL errors: 27886 data errors, use '-v' for a list # zpool replace c6t600A0B8000299CCC0A194A89E634d0 \ c6t600A0B800029996609EE4A89DA51d0 invalid vdev specification use '-f' to override the following errors: /dev/dsk/c6t600A0B800029996609EE4A89DA51d0s0 is part of active ZFS pool tww. Please see zpool(1M). So, what is going on? -- albert chin (ch...@thewrittenword.com) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Why so many data errors with raidz2 config and one failing drive?
On Mon, Aug 24, 2009 at 02:01:39PM -0500, Bob Friesenhahn wrote: > On Mon, 24 Aug 2009, Albert Chin wrote: >> >> Seems some of the new drives are having problems, resulting in CKSUM >> errors. I don't understand why I have so many data errors though. Why >> does the third raidz2 vdev report 34.0K CKSUM errors? > > Is it possible that this third raidz2 is inflicted with a shared > problem such as a cable, controller, backplane, or power supply? Only > one drive is reported as being unscathed. Well, we're just using unused drives on the existing array. No other changes. > Do you periodically scrub your array? No. Guess we will now :) But, I think all of the data loss is a result of the new drives, not ones that were already part of the two previous vdevs. -- albert chin (ch...@thewrittenword.com) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Why so many data errors with raidz2 config and one failing drive?
Added a third raidz2 vdev to my pool:

  pool: tww
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver in progress for 0h57m, 13.36% done, 6h9m to go
config:

    NAME                                     STATE     READ WRITE CKSUM
    tww                                      DEGRADED     0     0 16.9K
      raidz2                                 ONLINE       0     0     0
        c6t600A0B800029996605964668CB39d0    ONLINE       0     0     0
        c6t600A0B8000299CCC06C84744C892d0    ONLINE       0     0     0
        c6t600A0B8000299CCC05B44668CC6Ad0    ONLINE       0     0     0
        c6t600A0B800029996605A44668CC3Fd0    ONLINE       0     0     0
        c6t600A0B8000299CCC05BA4668CD2Ed0    ONLINE       0     0     0
        c6t600A0B800029996605AA4668CDB1d0    ONLINE       0     0     0
        c6t600A0B8000299966073547C5CED9d0    ONLINE       0     0     0
      raidz2                                 ONLINE       0     0     0
        c6t600A0B800029996605B04668F17Dd0    ONLINE       0     0     0
        c6t600A0B8000299CCC099E4A400B94d0    ONLINE       0     0     0
        c6t600A0B800029996605B64668F26Fd0    ONLINE       0     0     0
        c6t600A0B8000299CCC05CC4668F30Ed0    ONLINE       0     0     0
        c6t600A0B800029996605BC4668F305d0    ONLINE       0     0     0
        c6t600A0B8000299CCC099B4A400A9Cd0    ONLINE       0     0     0
        c6t600A0B800029996605C24668F39Bd0    ONLINE       0     0     0
      raidz2                                 DEGRADED     0     0 34.0K
        c6t600A0B8000299CCC0A154A89E426d0    ONLINE       0     0     0
        c6t600A0B800029996609F74A89E1A5d0    ONLINE       0     0     7  4K resilvered
        c6t600A0B8000299CCC0A174A89E520d0    ONLINE       0     0     2  4K resilvered
        c6t600A0B800029996609F94A89E24Bd0    ONLINE       0     0    48  24.5K resilvered
        replacing                            DEGRADED     0     0 78.7K
          c6t600A0B8000299CCC0A194A89E634d0  UNAVAIL     20  277K     0  experienced I/O failures
          c6t600A0B800029996609EE4A89DA51d0  ONLINE       0     0     0  38.1M resilvered
        c6t600A0B8000299CCC0A0C4A89DDE8d0    ONLINE       0     0     6  6K resilvered
        c6t600A0B800029996609F04A89DB1Bd0    ONLINE       0     0    86  92K resilvered
    spares
      c6t600A0B8000299CCC05D84668F448d0      AVAIL
      c6t600A0B800029996605C84668F461d0      AVAIL

errors: 17097 data errors, use '-v' for a list

Seems some of the new drives are having problems, resulting in CKSUM
errors. I don't understand why I have so many data errors though. Why
does the third raidz2 vdev report 34.0K CKSUM errors? The number of
data errors also appears to be increasing as the resilver continues.

--
albert chin (ch...@thewrittenword.com)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS space efficiency when copying files from another source
On Mon, Nov 24, 2008 at 08:43:18AM -0800, Erik Trimble wrote: > I _really_ wish rsync had an option to "copy in place" or something like > that, where the updates are made directly to the file, rather than a > temp copy. Isn't this what --inplace does? -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZIL & NVRAM partitioning?
On Sat, Sep 06, 2008 at 11:16:15AM -0700, Kaya Bekiroğlu wrote: > > The big problem appears to be getting your hands on these cards. > > Although I have the drivers now my first supplier let me down, and > > while the second insists they have shipped the cards it's been three > > weeks now and there's no sign of them. > > Thanks to Google Shopping I was able to order two of these cards from: > http://www.printsavings.com/01390371OP-discount-MICRO+MEMORY-MM5425--512MB-NVRAM-battery.aspx > > They appear to be in good working order, but unfortunately I am unable > to verify the driver. "pkgadd -d umem_Sol_Drv_Cust_i386_v01_11.pkg" > hangs on "## Installing part 1 of 3." on snv_95. I do not have other > Solaris versions to experiment with; this is really just a hobby for > me. Does the card come with any programming specs to help debug the driver? -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How do you grow a ZVOL?
On Thu, Jul 17, 2008 at 04:28:34PM -0400, Charles Menser wrote: > I've looked for anything I can find on the topic, but there does not > appear to be anything documented. > > Can a ZVOL be expanded? I think setting the volsize property expands it. Dunno what happens on the clients though. -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
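For reference, growing a zvol is just a property set (pool and volume
names here are made up for illustration; the pool must have the free
space):

```
# zfs set volsize=20G tank/vol01
```

As for the clients: an iSCSI initiator won't notice the LUN grew until
it rescans, and the filesystem inside the zvol still has to be grown
separately with whatever tool that filesystem provides.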
Re: [zfs-discuss] zfs-discuss Digest, Vol 33, Issue 19
On Sun, Jul 06, 2008 at 08:36:33PM -0400, Gilberto Mautner wrote: > We're trying to accomplish the same goal over here, ie. serving > multiple VMware images from a NFS server. > > Could you tell what kind of NVRAM device did you end up choosing? We > bought a Micromemory PCI card but can't get a Solaris driver for it... A Solaris driver definitely exists for the uMem PCI cards. Don't know how you'd get it though but some performance stats were done with one of the their PCI cards on Solaris: http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on > Gilberto > > > On 7/6/08 9:54 AM, "[EMAIL PROTECTED]" > <[EMAIL PROTECTED]> wrote: > > > -- > > > > Message: 6 > > Date: Sun, 06 Jul 2008 06:37:40 PDT > > From: Ross <[EMAIL PROTECTED]> > > Subject: [zfs-discuss] Measuring ZFS performance - IOPS and throughput > > To: zfs-discuss@opensolaris.org > > Message-ID: <[EMAIL PROTECTED]> > > Content-Type: text/plain; charset=UTF-8 > > > > Can anybody tell me how to measure the raw performance of a new system I'm > > putting together? I'd like to know what it's capable of in terms of IOPS > > and > > raw throughput to the disks. > > > > I've seen Richard's raidoptimiser program, but I've only seen results for > > random read iops performance, and I'm particularly interested in write > > performance. That's because the live server will be fitted with 512MB of > > nvram for the ZIL, and I'd like to see what effect that actually has. > > > > The disk system will be serving NFS to VMware to act as the datastore for a > > number of virtual machines. I plan to benchmark the individual machines to > > see what kind of load they put on the server, but I need the raw figures > > from > > the disk to get an idea of how many machines I can serve before I need to > > start thinking bigger. > > > > I'd also like to know if there's any easy way to see the current performance > > of the system once it's in use? 
I know VMware has performance monitoring > > built into the console, but I'd prefer to take figures directly off the > > storage server if possible. > > > > thanks, > > > > Ross > > > > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] J4200/J4400 Array
On Thu, Jul 03, 2008 at 01:43:36PM +0300, Mertol Ozyoney wrote:
> You are right that the J series do not have nvram onboard. However,
> most JBODs, like HP's MSA series, have some nvram. The idea behind
> not using nvram on the JBODs is:
>
> -) There is no use adding limited ram to a JBOD as disks already have
>    a lot of cache.
> -) It's easy to design a redundant JBOD without nvram. If you have
>    nvram and need redundancy, you need to design more complex HW and
>    more complex firmware.
> -) Batteries are the first thing to fail.
> -) Servers already have too much ram.

Well, if the server attached to the J series is doing ZFS/NFS,
performance will increase with zfs:zfs_nocacheflush=1. But without
battery-backed NVRAM, this really isn't "safe". So, for this usage
case, unless the server has battery-backed NVRAM, I don't see how the
J series is good for ZFS/NFS usage.

--
albert chin ([EMAIL PROTECTED])
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
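For anyone trying this, the tunable goes in /etc/system. It disables
ZFS's cache-flush requests pool-wide, so it is only safe when every
device in the pool sits behind non-volatile (battery-backed) cache:

```
* /etc/system -- only safe with battery-backed cache behind all
* pool devices; otherwise a power loss can corrupt the pool
set zfs:zfs_nocacheflush = 1
```

A reboot is needed for the setting to take effect.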
Re: [zfs-discuss] J4200/J4400 Array
On Wed, Jul 02, 2008 at 04:49:26AM -0700, Ben B. wrote: > According to the Sun Handbook, there is a new array : > SAS interface > 12 disks SAS or SATA > > ZFS could be used nicely with this box. Doesn't seem to have any NVRAM storage on board, so seems like JBOD. > There is an another version called > J4400 with 24 disks. > > Doc is here : > http://docs.sun.com/app/docs/coll/j4200 -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS configuration for VMware
On Fri, Jun 27, 2008 at 08:13:14AM -0700, Ross wrote: > Bleh, just found out the i-RAM is 5v PCI only. Won't work on PCI-X > slots which puts that out of the question for the motherboad I'm > using. Vmetro have a 2GB PCI-E card out, but it's for OEM's only: > http://www.vmetro.com/category4304.html, and I don't have any space in > this server to mount a SSD. Maybe you can call Vmetro and get the names of some resellers whom you could call to get pricing info? -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs send/recv question
On Thu, Mar 06, 2008 at 10:34:07PM -0800, Bill Shannon wrote:
> Darren J Moffat wrote:
> > I know this isn't answering the question but rather than using
> > "today" and "yesterday" why not just use dates?
>
> Because then I have to compute yesterday's date to do the incremental
> dump.

Not if you set a ZFS property with the date of the last backup.

--
albert chin ([EMAIL PROTECTED])
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
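Something like this (dataset and property names are made up; any user
property works as long as its name contains a colon):

```
# Remember which snapshot was sent last in a ZFS user property, so the
# next incremental needs no date arithmetic. "tank/home" and
# "com.example:lastsent" are illustrative names only.
today=$(date +%Y-%m-%d)
last=$(zfs get -H -o value com.example:lastsent tank/home)
zfs snapshot "tank/home@$today"
zfs send -i "$last" "tank/home@$today" | ssh backuphost zfs receive backup/home
zfs set "com.example:lastsent=$today" tank/home
```

The property rides along with the dataset, so the script never has to
know what "yesterday" was.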
Re: [zfs-discuss] Performance with Sun StorageTek 2540
On Fri, Feb 15, 2008 at 09:00:05PM +, Peter Tribble wrote: > On Fri, Feb 15, 2008 at 8:50 PM, Bob Friesenhahn > <[EMAIL PROTECTED]> wrote: > > On Fri, 15 Feb 2008, Peter Tribble wrote: > > > > > > May not be relevant, but still worth checking - I have a 2530 (which > > ought > > > to be that same only SAS instead of FC), and got fairly poor performance > > > at first. Things improved significantly when I got the LUNs properly > > > balanced across the controllers. > > > > What do you mean by "properly balanced across the controllers"? Are > > you using the multipath support in Solaris 10 or are you relying on > > ZFS to balance the I/O load? Do some disks have more affinity for a > > controller than the other? > > Each LUN is accessed through only one of the controllers (I presume the > 2540 works the same way as the 2530 and 61X0 arrays). The paths are > active/passive (if the active fails it will relocate to the other path). > When I set mine up the first time it allocated all the LUNs to controller B > and performance was terrible. I then manually transferred half the LUNs > to controller A and it started to fly. http://groups.google.com/group/comp.unix.solaris/browse_frm/thread/59b43034602a7b7f/0b500afc4d62d434?lnk=st&q=#0b500afc4d62d434 -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] LowEnd Batt. backed raid controllers that will deal with ZFS commit semantics correctly?
On Fri, Jan 25, 2008 at 12:59:18AM -0500, Kyle McDonald wrote:
> ... With the 256MB doing write caching, is there any further benefit
> to moving the ZIL to a flash or other fast NV storage?

Do some tests with and without the ZIL enabled. You should see a big
difference. With the ZIL on battery-backed RAM, you should see
performance close to that of running with the ZIL disabled. I'd put
the ZIL on a battery-backed RAM card in a heartbeat if I could find
one. I think others would as well.

--
albert chin ([EMAIL PROTECTED])
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] sharenfs with over 10000 file systems
On Wed, Jan 23, 2008 at 08:02:22AM -0800, Akhilesh Mritunjai wrote: > I remember reading a discussion where these kind of problems were > discussed. > > Basically it boils down to "everything" not being aware of the > radical changes in "filesystems" concept. > > All these things are being worked on, but it might take sometime > before everything is made aware that yes it's no longer unusual that > there can be 1+ filesystems on one machine. But shouldn't sharemgr(1M) be "aware"? It's relatively new. -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] LowEnd Batt. backed raid controllers that will deal with ZFS commit semantics correctly?
On Tue, Jan 22, 2008 at 09:20:30PM -0500, Kyle McDonald wrote:
> Anyone know the answer to this? I'll be ordering 2 of the 7K's for
> my x346's this week. If neither A nor B will work, I'm not sure
> there's any advantage to using the 7k card considering I want ZFS to
> do the mirroring.

Why even bother with a H/W RAID array when you won't use the H/W RAID?
Better to find a decent SAS/FC JBOD with cache. Would definitely be
cheaper.

--
albert chin ([EMAIL PROTECTED])
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] LowEnd Batt. backed raid controllers that will deal with ZFS commit semantics correctly?
On Tue, Jan 22, 2008 at 12:47:37PM -0500, Kyle McDonald wrote: > > My primary use case, is NFS base storage to a farm of software build > servers, and developer desktops. For the above environment, you'll probably see a noticable improvement with a battery-backed NVRAM-based ZIL. Unfortunately, no inexpensive cards exist for the common consumer (with ECC memory anyways). If you convince http://www.micromemory.com/ to sell you one, let us know :) Set "set zfs:zil_disable = 1" in /etc/system to gauge the type of improvement you can expect. Don't use this in production though. -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Not Offlining Disk on SCSI Sense Error (X4500)
On Thu, Jan 03, 2008 at 02:57:08PM -0700, Jason J. W. Williams wrote:
> There seems to be a persistent issue we have with ZFS where one of
> the SATA disks in a zpool on a Thumper starts throwing sense errors.
> ZFS does not offline the disk and instead hangs all zpools across the
> system. If it is not caught soon enough, application data ends up in
> an inconsistent state. We've had this issue with b54 through b77 (as
> of last night).
>
> We don't seem to be the only folks with this issue reading through
> the archives. Are there any plans to fix this behavior? It really
> makes ZFS less than desirable/reliable.

http://blogs.sun.com/eschrock/entry/zfs_and_fma

FMA For ZFS Phase 2 (PSARC/2007/283) was integrated in b68:

  http://www.opensolaris.org/os/community/arc/caselog/2007/283/
  http://www.opensolaris.org/os/community/on/flag-days/all/

--
albert chin ([EMAIL PROTECTED])
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Trial x4500, zfs with NFS and quotas.
On Wed, Nov 28, 2007 at 05:40:57PM +0900, Jorgen Lundman wrote: > > Ah it's a somewhat mis-leading error message: > > bash-3.00# mount -F lofs /zpool1/test /export/test > bash-3.00# share -F nfs -o rw,anon=0 /export/test > Could not share: /export/test: invalid path > bash-3.00# umount /export/test > bash-3.00# zfs set sharenfs=off zpool1/test > bash-3.00# mount -F lofs /zpool1/test /export/test > bash-3.00# share -F nfs -o rw,anon=0 /export/test > > So if any zfs file-system has sharenfs enabled, you will get "invalid > path". If you disable sharenfs, then you can export the lofs. I reported bug #6578437. We recently upgraded to b77 and this bug appears to be fixed now. > Lund > > > J.P. King wrote: > >> > >> I can not export lofs on NFS. Just gives invalid path, > > > > Tell that to our mirror server. > > > > -bash-3.00$ /sbin/mount -p | grep linux > > /data/linux - /linux lofs - no ro > > /data/linux - /export/ftp/pub/linux lofs - no ro > > -bash-3.00$ grep linux /etc/dfs/sharetab > > /linux - nfs ro Linux directories > > -bash-3.00$ df -k /linux > > Filesystem 1K-blocks Used Available Use% Mounted on > > data 3369027462 3300686151 68341312 98% /data > > > >> and: > >> > >> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6578437 > > > > I'm using straight Solaris, not Solaris Express or equivalents: > > > > -bash-3.00$ uname -a > > SunOS leprechaun.csi.cam.ac.uk 5.10 Generic_127111-01 sun4u sparc > > SUNW,Sun-Fire-V240 Solaris > > > > I can't comment on the bug, although I notice it is categorised under > > nfsv4, but the description doesn't seem to match that.
> > > >> Jorgen Lundman | <[EMAIL PROTECTED]> > > > > Julian > > -- > > Julian King > > Computer Officer, University of Cambridge, Unix Support > > > > -- > Jorgen Lundman | <[EMAIL PROTECTED]> > Unix Administrator | +81 (0)3 -5456-2687 ext 1017 (work) > Shibuya-ku, Tokyo| +81 (0)90-5578-8500 (cell) > Japan| +81 (0)3 -3375-1767 (home) > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > > -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Why did resilvering restart?
On Tue, Nov 20, 2007 at 11:39:30AM -0600, Albert Chin wrote: > On Tue, Nov 20, 2007 at 11:10:20AM -0600, [EMAIL PROTECTED] wrote: > > > > [EMAIL PROTECTED] wrote on 11/20/2007 10:11:50 AM: > > > > > On Tue, Nov 20, 2007 at 10:01:49AM -0600, [EMAIL PROTECTED] wrote: > > > > Resilver and scrub are broken and restart when a snapshot is created > > > > -- the current workaround is to disable snaps while resilvering, > > > > the ZFS team is working on the issue for a long term fix. > > > > > > But, no snapshot was taken. If so, zpool history would have shown > > > this. So, in short, _no_ ZFS operations are going on during the > > > resilvering. Yet, it is restarting. > > > > > > > Does 2007-11-20.02:37:13 actually match the expected timestamp of > > the original zpool replace command before the first zpool status > > output listed below? > > No. We ran some 'zpool status' commands after the last 'zpool > replace'. The 'zpool status' output in the initial email is from this > morning. The only ZFS command we've been running is 'zfs list', 'zpool > list tww', 'zpool status', or 'zpool status -v' after the last 'zpool > replace'. I think the 'zpool status' command was resetting the resilvering. We upgraded to b77 this morning which did not exhibit this problem. Resilvering is now done. > Server is on GMT time. > > > Is it possible that another zpool replace is further up on your > > pool history (ie it was rerun by an admin or automatically from some > > service)? > > Yes, but a zpool replace for the same bad disk: > 2007-11-20.00:57:40 zpool replace tww c0t600A0B8000299966059E4668CBD3d0 > c0t600A0B800029996606584741C7C3d0 > 2007-11-20.02:35:22 zpool detach tww c0t600A0B800029996606584741C7C3d0 > 2007-11-20.02:37:13 zpool replace tww c0t600A0B8000299966059E4668CBD3d0 > c0t600A0B8000299CCC06734741CD4Ed0 > > We accidentally removed c0t600A0B800029996606584741C7C3d0 from the > array, hence the 'zpool detach'. > > The last 'zpool replace' has been running for 15h now. 
> > > -Wade > > > > > > > > > > > > [EMAIL PROTECTED] wrote on 11/20/2007 09:58:19 AM: > > > > > > > > > On b66: > > > > > # zpool replace tww c0t600A0B8000299966059E4668CBD3d0 \ > > > > > c0t600A0B8000299CCC06734741CD4Ed0 > > > > > < some hours later> > > > > > # zpool status tww > > > > > pool: tww > > > > >state: DEGRADED > > > > > status: One or more devices is currently being resilvered. The > > pool > > > > will > > > > > continue to function, possibly in a degraded state. > > > > > action: Wait for the resilver to complete. > > > > >scrub: resilver in progress, 62.90% done, 4h26m to go > > > > > < some hours later> > > > > > # zpool status tww > > > > > pool: tww > > > > >state: DEGRADED > > > > > status: One or more devices is currently being resilvered. The > > pool > > > > will > > > > > continue to function, possibly in a degraded state. > > > > > action: Wait for the resilver to complete. > > > > >scrub: resilver in progress, 3.85% done, 18h49m to go > > > > > > > > > > # zpool history tww | tail -1 > > > > > 2007-11-20.02:37:13 zpool replace tww > > > > c0t600A0B8000299966059E4668CBD3d0 > > > > > c0t600A0B8000299CCC06734741CD4Ed0 > > > > > > > > > > So, why did resilvering restart when no zfs operations occurred? I > > > > > just ran zpool status again and now I get: > > > > > # zpool status tww > > > > > pool: tww > > > > >state: DEGRADED > > > > > status: One or more devices is currently being resilvered. The > > pool > > > > will > > > > > continue to function, possibly in a degraded state. > > > > > action: Wait for the resilver to complete. > > > > >scrub: resilver in progress, 0.00% done, 134h45m to go > > > > > > > > > > What's going on? > > > > > > > > > > -- > > > > > albert chin ([EMAIL PROTECTED]) > > > > > __
Re: [zfs-discuss] Why did resilvering restart?
On Tue, Nov 20, 2007 at 11:10:20AM -0600, [EMAIL PROTECTED] wrote: > > [EMAIL PROTECTED] wrote on 11/20/2007 10:11:50 AM: > > > On Tue, Nov 20, 2007 at 10:01:49AM -0600, [EMAIL PROTECTED] wrote: > > > Resilver and scrub are broken and restart when a snapshot is created > > > -- the current workaround is to disable snaps while resilvering, > > > the ZFS team is working on the issue for a long term fix. > > > > But, no snapshot was taken. If so, zpool history would have shown > > this. So, in short, _no_ ZFS operations are going on during the > > resilvering. Yet, it is restarting. > > > > Does 2007-11-20.02:37:13 actually match the expected timestamp of > the original zpool replace command before the first zpool status > output listed below? No. We ran some 'zpool status' commands after the last 'zpool replace'. The 'zpool status' output in the initial email is from this morning. The only ZFS command we've been running is 'zfs list', 'zpool list tww', 'zpool status', or 'zpool status -v' after the last 'zpool replace'. Server is on GMT time. > Is it possible that another zpool replace is further up on your > pool history (ie it was rerun by an admin or automatically from some > service)? Yes, but a zpool replace for the same bad disk: 2007-11-20.00:57:40 zpool replace tww c0t600A0B8000299966059E4668CBD3d0 c0t600A0B800029996606584741C7C3d0 2007-11-20.02:35:22 zpool detach tww c0t600A0B800029996606584741C7C3d0 2007-11-20.02:37:13 zpool replace tww c0t600A0B8000299966059E4668CBD3d0 c0t600A0B8000299CCC06734741CD4Ed0 We accidentally removed c0t600A0B800029996606584741C7C3d0 from the array, hence the 'zpool detach'. The last 'zpool replace' has been running for 15h now. 
> -Wade > > > > > > > > [EMAIL PROTECTED] wrote on 11/20/2007 09:58:19 AM: > > > > > > > On b66: > > > > # zpool replace tww c0t600A0B8000299966059E4668CBD3d0 \ > > > > c0t600A0B8000299CCC06734741CD4Ed0 > > > > < some hours later> > > > > # zpool status tww > > > > pool: tww > > > >state: DEGRADED > > > > status: One or more devices is currently being resilvered. The > pool > > > will > > > > continue to function, possibly in a degraded state. > > > > action: Wait for the resilver to complete. > > > >scrub: resilver in progress, 62.90% done, 4h26m to go > > > > < some hours later> > > > > # zpool status tww > > > > pool: tww > > > >state: DEGRADED > > > > status: One or more devices is currently being resilvered. The > pool > > > will > > > > continue to function, possibly in a degraded state. > > > > action: Wait for the resilver to complete. > > > >scrub: resilver in progress, 3.85% done, 18h49m to go > > > > > > > > # zpool history tww | tail -1 > > > > 2007-11-20.02:37:13 zpool replace tww > > > c0t600A0B8000299966059E4668CBD3d0 > > > > c0t600A0B8000299CCC06734741CD4Ed0 > > > > > > > > So, why did resilvering restart when no zfs operations occurred? I > > > > just ran zpool status again and now I get: > > > > # zpool status tww > > > > pool: tww > > > >state: DEGRADED > > > > status: One or more devices is currently being resilvered. The > pool > > > will > > > > continue to function, possibly in a degraded state. > > > > action: Wait for the resilver to complete. > > > >scrub: resilver in progress, 0.00% done, 134h45m to go > > > > > > > > What's going on? 
> > > > > > > > -- > > > > albert chin ([EMAIL PROTECTED]) > > > > _______ > > > > zfs-discuss mailing list > > > > zfs-discuss@opensolaris.org > > > > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > > > > > > ___ > > > zfs-discuss mailing list > > > zfs-discuss@opensolaris.org > > > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > > > > > > > > > > -- > > albert chin ([EMAIL PROTECTED]) > > ___ > > zfs-discuss mailing list > > zfs-discuss@opensolaris.org > > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > > -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Why did resilvering restart?
On Tue, Nov 20, 2007 at 10:01:49AM -0600, [EMAIL PROTECTED] wrote: > Resilver and scrub are broken and restart when a snapshot is created > -- the current workaround is to disable snaps while resilvering, > the ZFS team is working on the issue for a long term fix. But, no snapshot was taken. If so, zpool history would have shown this. So, in short, _no_ ZFS operations are going on during the resilvering. Yet, it is restarting. > -Wade > > [EMAIL PROTECTED] wrote on 11/20/2007 09:58:19 AM: > > > On b66: > > # zpool replace tww c0t600A0B8000299966059E4668CBD3d0 \ > > c0t600A0B8000299CCC06734741CD4Ed0 > > < some hours later> > > # zpool status tww > > pool: tww > >state: DEGRADED > > status: One or more devices is currently being resilvered. The pool > will > > continue to function, possibly in a degraded state. > > action: Wait for the resilver to complete. > >scrub: resilver in progress, 62.90% done, 4h26m to go > > < some hours later> > > # zpool status tww > > pool: tww > >state: DEGRADED > > status: One or more devices is currently being resilvered. The pool > will > > continue to function, possibly in a degraded state. > > action: Wait for the resilver to complete. > >scrub: resilver in progress, 3.85% done, 18h49m to go > > > > # zpool history tww | tail -1 > > 2007-11-20.02:37:13 zpool replace tww > c0t600A0B8000299966059E4668CBD3d0 > > c0t600A0B8000299CCC06734741CD4Ed0 > > > > So, why did resilvering restart when no zfs operations occurred? I > > just ran zpool status again and now I get: > > # zpool status tww > > pool: tww > >state: DEGRADED > > status: One or more devices is currently being resilvered. The pool > will > > continue to function, possibly in a degraded state. > > action: Wait for the resilver to complete. > >scrub: resilver in progress, 0.00% done, 134h45m to go > > > > What's going on? 
> > > > -- > > albert chin ([EMAIL PROTECTED]) > > ___ > > zfs-discuss mailing list > > zfs-discuss@opensolaris.org > > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > > -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Why did resilvering restart?
On b66: # zpool replace tww c0t600A0B8000299966059E4668CBD3d0 \ c0t600A0B8000299CCC06734741CD4Ed0 < some hours later> # zpool status tww pool: tww state: DEGRADED status: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state. action: Wait for the resilver to complete. scrub: resilver in progress, 62.90% done, 4h26m to go < some hours later> # zpool status tww pool: tww state: DEGRADED status: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state. action: Wait for the resilver to complete. scrub: resilver in progress, 3.85% done, 18h49m to go # zpool history tww | tail -1 2007-11-20.02:37:13 zpool replace tww c0t600A0B8000299966059E4668CBD3d0 c0t600A0B8000299CCC06734741CD4Ed0 So, why did resilvering restart when no zfs operations occurred? I just ran zpool status again and now I get: # zpool status tww pool: tww state: DEGRADED status: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state. action: Wait for the resilver to complete. scrub: resilver in progress, 0.00% done, 134h45m to go What's going on? -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Oops (accidentally deleted replaced drive)
On Mon, Nov 19, 2007 at 06:23:01PM -0800, Eric Schrock wrote: > You should be able to do a 'zpool detach' of the replacement and then > try again. Thanks. That worked. > - Eric > > On Mon, Nov 19, 2007 at 08:20:04PM -0600, Albert Chin wrote: > > Running ON b66 and had a drive fail. Ran 'zpool replace' and resilvering > > began. But, accidentally deleted the replacement drive on the array > > via CAM. > > > > # zpool status -v > > ... > > raidz2 DEGRADED 0 0 > > 0 > > c0t600A0B800029996605964668CB39d0 ONLINE 0 0 > > 0 > > spare DEGRADED 0 0 > > 0 > > replacing UNAVAIL 0 79.14 > > 0 insufficient replicas > > c0t600A0B8000299966059E4668CBD3d0 UNAVAIL 27 370 > > 0 cannot open > > c0t600A0B800029996606584741C7C3d0 UNAVAIL 0 82.32 > > 0 cannot open > > c0t600A0B8000299CCC05D84668F448d0 ONLINE 0 0 > > 0 > > c0t600A0B8000299CCC05B44668CC6Ad0 ONLINE 0 0 > > 0 > > c0t600A0B800029996605A44668CC3Fd0 ONLINE 0 0 > > 0 > > c0t600A0B8000299CCC05BA4668CD2Ed0 ONLINE 0 0 > > 0 > > > > > > Is there a way to recover from this? > > # zpool replace tww c0t600A0B8000299966059E4668CBD3d0 \ > > c0t600A0B8000299CCC06734741CD4Ed0 > > cannot replace c0t600A0B8000299966059E4668CBD3d0 with > > c0t600A0B8000299CCC06734741CD4Ed0: cannot replace a replacing device > > > > -- > > albert chin ([EMAIL PROTECTED]) > > ___ > > zfs-discuss mailing list > > zfs-discuss@opensolaris.org > > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > > -- > Eric Schrock, FishWorks http://blogs.sun.com/eschrock > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > > -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
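[Editor's note] Spelled out with the device names from the post, Eric's recovery sequence would look roughly like this — a sketch reconstructed from the thread, not a verified transcript:

```
# Detach the unavailable replacement from the stuck 'replacing' vdev,
# then retry the replace onto the new LUN.
zpool detach tww c0t600A0B800029996606584741C7C3d0
zpool replace tww c0t600A0B8000299966059E4668CBD3d0 \
    c0t600A0B8000299CCC06734741CD4Ed0
zpool status -v tww    # resilvering should restart onto the new device
```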
[zfs-discuss] Oops (accidentally deleted replaced drive)
Running ON b66 and had a drive fail. Ran 'zfs replace' and resilvering began. But, accidentally deleted the replacement drive on the array via CAM. # zpool status -v ... raidz2 DEGRADED 0 0 0 c0t600A0B800029996605964668CB39d0 ONLINE 0 0 0 spare DEGRADED 0 0 0 replacingUNAVAIL 0 79.14 0 insufficient replicas c0t600A0B8000299966059E4668CBD3d0 UNAVAIL 27 370 0 cannot open c0t600A0B800029996606584741C7C3d0 UNAVAIL 0 82.32 0 cannot open c0t600A0B8000299CCC05D84668F448d0ONLINE 0 0 0 c0t600A0B8000299CCC05B44668CC6Ad0 ONLINE 0 0 0 c0t600A0B800029996605A44668CC3Fd0 ONLINE 0 0 0 c0t600A0B8000299CCC05BA4668CD2Ed0 ONLINE 0 0 0 Is there a way to recover from this? # zpool replace tww c0t600A0B8000299966059E4668CBD3d0 \ c0t600A0B8000299CCC06734741CD4Ed0 cannot replace c0t600A0B8000299966059E4668CBD3d0 with c0t600A0B8000299CCC06734741CD4Ed0: cannot replace a replacing device -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS & array NVRAM cache?
On Tue, Sep 25, 2007 at 06:01:00PM -0700, Vincent Fox wrote: > I don't understand. How do you > > "setup one LUN that has all of the NVRAM on the array dedicated to it" > > I'm pretty familiar with 3510 and 3310. Forgive me for being a bit > thick here, but can you be more specific for the n00b? If you're using CAM, disable NVRAM on all of your LUNs. Then, create another LUN equivalent to the size of your NVRAM. Assign the ZIL to this LUN. You'll then have an NVRAM-backed ZIL. I posted a question along these lines to storage-discuss: http://mail.opensolaris.org/pipermail/storage-discuss/2007-July/003080.html You'll need to determine the performance impact of removing NVRAM from your data LUNs. Don't blindly do it. -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
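[Editor's note] A rough sketch of those steps. Device names are hypothetical, and the host-side part assumes a build with separate log (slog) support, i.e. b68 or later:

```
# Array side (CAM): disable write cache on every data LUN, then create
# one small LUN -- sized to the array's NVRAM -- with write cache enabled.
# Host side: data on the uncached LUNs, intent log on the cached one.
zpool create tank raidz c2t0d0 c2t1d0 c2t2d0 log c3t0d0
zpool status tank    # the NVRAM-backed LUN (c3t0d0) appears under 'logs'
```

As the post says, measure the impact of losing NVRAM on the data LUNs before committing to this layout.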
Re: [zfs-discuss] Zfs log device (zil) ever coming to Sol10?
On Tue, Sep 18, 2007 at 12:59:02PM -0400, Andy Lubel wrote: > I think we are very close to using zfs in our production environment.. Now > that I have snv_72 installed and my pools set up with NVRAM log devices > things are hauling butt. How did you get NVRAM log devices? -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] separate intent log blog
On Fri, Jul 27, 2007 at 08:32:48AM -0700, Adolf Hohl wrote: > what is necessary to get it working from the solaris side. Is a > driver on board or is there no special one needed? I'd imagine so. > I just got a packed MM-5425CN with 256M. However i am lacking a > pci-x 64bit connector and not sure if it is worth the whole effort > for my personal purposes. Huh? So your MM-5425CN doesn't fit into a PCI slot? > Any comment are very appreciated How did you obtain your card? -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] separate intent log blog
On Wed, Jul 18, 2007 at 01:54:23PM -0600, Neil Perrin wrote: > Albert Chin wrote: > > On Wed, Jul 18, 2007 at 01:29:51PM -0600, Neil Perrin wrote: > >> I wrote up a blog on the separate intent log called "slog blog" > >> which describes the interface; some performance results; and > >> general status: > >> > >> http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on > > > > So, how did you get a "pci Micro Memory pci1332,5425 card" :) I > > presume this is the PCI-X version. > > I wasn't involved in the acquisition but was just sent one internally > for testing. Yes, it's PCI-X. I assume you're asking because they can > not (or no longer) be obtained? Sadly, not from any reseller I know of. -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] separate intent log blog
On Wed, Jul 18, 2007 at 01:00:22PM -0700, Eric Schrock wrote: > You can find these at: > > http://www.umem.com/Umem_NVRAM_Cards.html > > And the one Neil was using in particular: > > http://www.umem.com/MM-5425CN.html They only sell to OEMs. Our Sun VAR looked for one as well but they cannot find anyone selling them. > - Eric > > On Wed, Jul 18, 2007 at 01:54:23PM -0600, Neil Perrin wrote: > > > > > > Albert Chin wrote: > > > On Wed, Jul 18, 2007 at 01:29:51PM -0600, Neil Perrin wrote: > > >> I wrote up a blog on the separate intent log called "slog blog" > > >> which describes the interface; some performance results; and > > >> general status: > > >> > > >> http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on > > > > > > So, how did you get a "pci Micro Memory pci1332,5425 card" :) I > > > presume this is the PCI-X version. > > > > I wasn't involved in the aquisition but was just sent one internally > > for testing. Yes it's PCI-X. I assume your asking because they can > > not (or no longer) be obtained? > > > > Neil. > > ___ > > zfs-discuss mailing list > > zfs-discuss@opensolaris.org > > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > > -- > Eric Schrock, Solaris Kernel Development http://blogs.sun.com/eschrock > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > > -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] separate intent log blog
On Wed, Jul 18, 2007 at 01:29:51PM -0600, Neil Perrin wrote: > I wrote up a blog on the separate intent log called "slog blog" > which describes the interface; some performance results; and > general status: > > http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on So, how did you get a "pci Micro Memory pci1332,5425 card" :) I presume this is the PCI-X version. -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How to take advantage of PSARC 2007/171: ZFS Separate Intent Log
On Tue, Jul 10, 2007 at 07:12:35AM -0500, Al Hopper wrote: > On Mon, 9 Jul 2007, Albert Chin wrote: > > > On Tue, Jul 03, 2007 at 11:02:24AM -0700, Bryan Cantrill wrote: > >> > >> On Tue, Jul 03, 2007 at 10:26:20AM -0500, Albert Chin wrote: > >>> It would also be nice for extra hardware (PCI-X, PCIe card) that > >>> added NVRAM storage to various sun low/mid-range servers that are > >>> currently acting as ZFS/NFS servers. > >> > >> You can do it yourself very easily -- check out the umem cards from > >> Micro Memory, available at http://www.umem.com. Reasonable prices > >> ($1000/GB), they have a Solaris driver, and the performance > >> absolutely rips. > > > > The PCIe card is in beta, they don't sell to individual customers, and > > the person I spoke with didn't even know a vendor (Tier 1/2 OEMs) that > > had a Solaris driver. They do have a number of PCI-X cards though. > > > > So, I guess we'll be testing the "dedicate all NVRAM to LUN" solution > > once b68 is released. > > or ramdiskadm(1M) might be interesting... Well, that's not really an option as a panic of the server would not be good. While the on-disk data would be consistent, data the clients wrote to the server might not have been committed. -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
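[Editor's note] For what it's worth, the ramdisk experiment Al alludes to is only two commands — and it is useful purely for measuring what the ZIL costs, for exactly the crash-safety reason given in the reply (name, size, and pool are illustrative):

```
# Benchmarking only: a ramdisk slog evaporates on panic or power loss,
# taking unflushed synchronous writes with it.
ramdiskadm -a ziltest 256m               # creates /dev/ramdisk/ziltest
zpool add tank log /dev/ramdisk/ziltest  # hypothetical pool 'tank'
```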
Re: [zfs-discuss] How to take advantage of PSARC 2007/171: ZFS Separate Intent Log
On Tue, Jul 03, 2007 at 11:02:24AM -0700, Bryan Cantrill wrote: > > On Tue, Jul 03, 2007 at 10:26:20AM -0500, Albert Chin wrote: > > It would also be nice for extra hardware (PCI-X, PCIe card) that > > added NVRAM storage to various sun low/mid-range servers that are > > currently acting as ZFS/NFS servers. > > You can do it yourself very easily -- check out the umem cards from > Micro Memory, available at http://www.umem.com. Reasonable prices > ($1000/GB), they have a Solaris driver, and the performance > absolutely rips. The PCIe card is in beta, they don't sell to individual customers, and the person I spoke with didn't even know a vendor (Tier 1/2 OEMs) that had a Solaris driver. They do have a number of PCI-X cards though. So, I guess we'll be testing the "dedicate all NVRAM to LUN" solution once b68 is released. -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How to take advantage of PSARC 2007/171: ZFS Separate Intent Log
On Tue, Jul 03, 2007 at 11:02:24AM -0700, Bryan Cantrill wrote: > On Tue, Jul 03, 2007 at 10:26:20AM -0500, Albert Chin wrote: > > PSARC 2007/171 will be available in b68. Any documentation anywhere on > > how to take advantage of it? > > > > Some of the Sun storage arrays contain NVRAM. It would be really nice > > if the array NVRAM would be available for ZIL storage. > > It depends on your array, of course, but in most arrays you can control > the amount of write cache (i.e., NVRAM) dedicated to particular LUNs. > So to use the new separate logging most effectively, you should take > your array, and dedicate all of your NVRAM to a single LUN that you then > use as your separate log device. Your pool should then use a LUN or LUNs > that do not have any NVRAM dedicated to it. Hmm, interesting. We'll try to find out if the 6140's can do this. -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How to take advantage of PSARC 2007/171: ZFS Separate Intent Log
On Tue, Jul 03, 2007 at 10:31:28AM -0700, Richard Elling wrote: > Albert Chin wrote: > > On Tue, Jul 03, 2007 at 09:01:50AM -0700, Richard Elling wrote: > >> Albert Chin wrote: > >>> Some of the Sun storage arrays contain NVRAM. It would be really nice > >>> if the array NVRAM would be available for ZIL storage. It would also > >>> be nice for extra hardware (PCI-X, PCIe card) that added NVRAM storage > >>> to various sun low/mid-range servers that are currently acting as > >>> ZFS/NFS servers. Or maybe someone knows of cheap SSD storage that > >>> could be used for the ZIL? I think several HD's are available with > >>> SCSI/ATA interfaces. > >> First, you need a workload where the ZIL has an impact. > > > > ZFS/NFS + zil_disable is faster than ZFS/NFS without zil_disable. So, > > I presume, ZFS/NFS + an NVRAM-backed ZIL would be noticeably faster > > than ZFS/NFS + ZIL. > > ... for NFS workloads which are sync-sensitive. Well, yes. We've made the decision not to set zil_disable in light of the possibility of the ZFS/NFS server crashing and having the clients out of sync with what's on the ZFS/NFS server. I think this is the common case though for a ZFS/NFS server. -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How to take advantage of PSARC 2007/171: ZFS Separate Intent Log
On Tue, Jul 03, 2007 at 09:01:50AM -0700, Richard Elling wrote: > Albert Chin wrote: > > Some of the Sun storage arrays contain NVRAM. It would be really nice > > if the array NVRAM would be available for ZIL storage. It would also > > be nice for extra hardware (PCI-X, PCIe card) that added NVRAM storage > > to various sun low/mid-range servers that are currently acting as > > ZFS/NFS servers. Or maybe someone knows of cheap SSD storage that > > could be used for the ZIL? I think several HD's are available with > > SCSI/ATA interfaces. > > First, you need a workload where the ZIL has an impact. ZFS/NFS + zil_disable is faster than ZFS/NFS without zil_disable. So, I presume, ZFS/NFS + an NVRAM-backed ZIL would be noticeably faster than ZFS/NFS + ZIL. -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How to take advantage of PSARC 2007/171: ZFS Separate Intent Log
On Tue, Jul 03, 2007 at 05:31:00PM +0200, [EMAIL PROTECTED] wrote: > > >PSARC 2007/171 will be available in b68. Any documentation anywhere on > >how to take advantage of it? > > > >Some of the Sun storage arrays contain NVRAM. It would be really nice > >if the array NVRAM would be available for ZIL storage. It would also > >be nice for extra hardware (PCI-X, PCIe card) that added NVRAM storage > >to various sun low/mid-range servers that are currently acting as > >ZFS/NFS servers. Or maybe someone knows of cheap SSD storage that > >could be used for the ZIL? I think several HD's are available with > >SCSI/ATA interfaces. > > Would flash memory be fast enough (current flash memory has reasonable > sequential write throughput but horrible "I/O" ops)? Good point. The speeds for the following don't seem very impressive: http://www.adtron.com/products/A25fb-SerialATAFlashDisk.html http://www.sandisk.com/OEM/ProductCatalog(1321)-SanDisk_SSD_SATA_5000_25.aspx -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] How to take advantage of PSARC 2007/171: ZFS Separate Intent Log
PSARC 2007/171 will be available in b68. Any documentation anywhere on how to take advantage of it? Some of the Sun storage arrays contain NVRAM. It would be really nice if the array NVRAM would be available for ZIL storage. It would also be nice for extra hardware (PCI-X, PCIe card) that added NVRAM storage to various sun low/mid-range servers that are currently acting as ZFS/NFS servers. Or maybe someone knows of cheap SSD storage that could be used for the ZIL? I think several HD's are available with SCSI/ATA interfaces. -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
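[Editor's note] For the archives: once on b68, the feature is driven through zpool(1M). A hedged sketch of the two usual forms, with placeholder pool and device names:

```
# Create a pool with a separate intent log device...
zpool create tank mirror c1t0d0 c1t1d0 log c2t0d0
# ...or add one to an existing pool (mirrored, since losing an
# unmirrored slog risks losing recent synchronous writes).
zpool add tank log mirror c2t0d0 c2t1d0
```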
Re: [zfs-discuss] ZFS version 5 to version 6 fails to import or upgrade
On Tue, Jun 19, 2007 at 07:16:06PM -0700, John Brewer wrote: > bash-3.00# zpool import > pool: zones > id: 4567711835620380868 > state: ONLINE > status: The pool is formatted using an older on-disk version. > action: The pool can be imported using its name or numeric identifier, though > some features will not be available without an explicit 'zpool > upgrade'. > config: > > zones ONLINE > c0d1s5ONLINE zpool import lists the pools available for import. Maybe you need to actually _import_ the pool first before you can upgrade. -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
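[Editor's note] That is, using the pool name from the post:

```
zpool import zones     # attach the pool to this system first
zpool upgrade zones    # then bump it to the newer on-disk version
zpool upgrade -v       # lists the versions this build supports
```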
Re: [zfs-discuss] Re: No zfs_nocacheflush in Solaris 10?
On Fri, May 25, 2007 at 09:54:04AM -0700, Grant Kelly wrote: > > It would also be worthwhile doing something like the following to > > determine the max throughput the H/W RAID is giving you: > > # time dd of= if=/dev/zero bs=1048576 count=1000 > > For a 2Gbps 6140 with 300GB/10K drives, we get ~46MB/s on a > > single-drive RAID-0 array, ~83MB/s on a 4-disk RAID-0 array w/128k > > stripe, and ~69MB/s on a seven-disk RAID-5 array w/128k stripe. > > Well the Solaris kernel is telling me that it doesn't understand > zfs_nocacheflush, but the array sure is acting like it! > I ran the dd example, but increased the count for a longer running time. I don't think a longer running time is going to give you a more accurate measurement. > 5-disk RAID5 with UFS: ~79 MB/s What about against a raw RAID-5 device? > 5-disk RAID5 with ZFS: ~470 MB/s I don't think you want to use if=/dev/zero on ZFS. There's probably some optimization going on. Better to use /dev/urandom or concat n-many files comprised of random bits. > I'm assuming there's some caching going on with ZFS that's really > helping out? Yes. > Also, no Santricity, just Sun's Common Array Manager. Is it possible > to use both without completely confusing the array? I think both are ok. CAM is free. Dunno about Santricity. -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
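[Editor's note] A portable sketch of the incompressible-data variant suggested above. The output path is an example — point it at the filesystem under test — and note that /dev/urandom itself can be slow to read, which caps the apparent write rate:

```shell
#!/bin/sh
# Write 32 MiB of incompressible data; run under time(1) for a rough
# sequential-write figure that compression cannot inflate.
OUT=${TESTFILE:-/tmp/ddtest.$$}
dd if=/dev/urandom of="$OUT" bs=1048576 count=32 2>/dev/null
BYTES=$(wc -c < "$OUT" | tr -d ' ')
echo "wrote $BYTES bytes to $OUT"
rm -f "$OUT"
```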
Re: [zfs-discuss] No zfs_nocacheflush in Solaris 10?
On Fri, May 25, 2007 at 12:01:45PM -0400, Andy Lubel wrote: > Im using: > > zfs set:zil_disable 1 > > On my se6130 with zfs, accessed by NFS and writing performance almost > doubled. Since you have BBC, why not just set that? I don't think it's enough to have BBC to justify zil_disable=1. Besides, I don't know anyone from Sun recommending zil_disable=1. If your storage array has BBC, it doesn't matter. What matters is what happens when ZIL isn't flushed and your file server crashes (ZFS file system is still consistent but you'll lose some info that hasn't been flushed by ZIL). Even having your file server on a UPS won't help here. http://blogs.sun.com/erickustarz/entry/zil_disable discusses some of the issues affecting zil_disable=1. We know we get better performance with zil_disable=1 but we're not taking any chances. > -Andy > > > > On 5/24/07 4:16 PM, "Albert Chin" > <[EMAIL PROTECTED]> wrote: > > > On Thu, May 24, 2007 at 11:55:58AM -0700, Grant Kelly wrote: > >> I'm running SunOS Release 5.10 Version Generic_118855-36 64-bit > >> and in [b]/etc/system[/b] I put: > >> > >> [b]set zfs:zfs_nocacheflush = 1[/b] > >> > >> And after rebooting, I get the message: > >> > >> [b]sorry, variable 'zfs_nocacheflush' is not defined in the 'zfs' > >> module[/b] > >> > >> So is this variable not available in the Solaris kernel? > > > > I think zfs:zfs_nocacheflush is only available in Nevada. > > > >> I'm getting really poor write performance with ZFS on a RAID5 volume > >> (5 disks) from a storagetek 6140 array. I've searched the web and > >> these forums and it seems that this zfs_nocacheflush option is the > >> solution, but I'm open to others as well. > > > > What type of poor performance? Is it because of ZFS? You can test this > > by creating a RAID-5 volume on the 6140, creating a UFS file system on > > it, and then comparing performance with what you get against ZFS. 
> > > > It would also be worthwhile doing something like the following to > > determine the max throughput the H/W RAID is giving you: > > # time dd of= if=/dev/zero bs=1048576 count=1000 > > For a 2Gbps 6140 with 300GB/10K drives, we get ~46MB/s on a > > single-drive RAID-0 array, ~83MB/s on a 4-disk RAID-0 array w/128k > > stripe, and ~69MB/s on a seven-disk RAID-5 array w/128k stripe. > -- > > > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > > -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] No zfs_nocacheflush in Solaris 10?
On Fri, May 25, 2007 at 12:14:45AM -0400, Torrey McMahon wrote: > Albert Chin wrote: > >On Thu, May 24, 2007 at 11:55:58AM -0700, Grant Kelly wrote: > > > > > >>I'm getting really poor write performance with ZFS on a RAID5 volume > >>(5 disks) from a storagetek 6140 array. I've searched the web and > >>these forums and it seems that this zfs_nocacheflush option is the > >>solution, but I'm open to others as well. > >> > > > >What type of poor performance? Is it because of ZFS? You can test this > >by creating a RAID-5 volume on the 6140, creating a UFS file system on > >it, and then comparing performance with what you get against ZFS. > > If it's ZFS then you might want to check into modifying the 6540 NVRAM > as mentioned in this thread > > http://mail.opensolaris.org/pipermail/zfs-discuss/2006-December/024194.html > > there is a fix that doesn't involve modifying the NVRAM in the works. (I > don't have an estimate.) The above URL helps only if you have Santricity. -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] No zfs_nocacheflush in Solaris 10?
On Thu, May 24, 2007 at 11:55:58AM -0700, Grant Kelly wrote: > I'm running SunOS Release 5.10 Version Generic_118855-36 64-bit > and in [b]/etc/system[/b] I put: > > [b]set zfs:zfs_nocacheflush = 1[/b] > > And after rebooting, I get the message: > > [b]sorry, variable 'zfs_nocacheflush' is not defined in the 'zfs' module[/b] > > So is this variable not available in the Solaris kernel? I think zfs:zfs_nocacheflush is only available in Nevada. > I'm getting really poor write performance with ZFS on a RAID5 volume > (5 disks) from a storagetek 6140 array. I've searched the web and > these forums and it seems that this zfs_nocacheflush option is the > solution, but I'm open to others as well. What type of poor performance? Is it because of ZFS? You can test this by creating a RAID-5 volume on the 6140, creating a UFS file system on it, and then comparing performance with what you get against ZFS. It would also be worthwhile doing something like the following to determine the max throughput the H/W RAID is giving you: # time dd of= if=/dev/zero bs=1048576 count=1000 For a 2Gbps 6140 with 300GB/10K drives, we get ~46MB/s on a single-drive RAID-0 array, ~83MB/s on a 4-disk RAID-0 array w/128k stripe, and ~69MB/s on a seven-disk RAID-5 array w/128k stripe. -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
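As a sanity check on figures like the ones above, throughput is just bytes moved divided by elapsed seconds; a quick awk one-liner (the 12.5 s elapsed time is a made-up example, not a measurement from this thread):

```shell
# MB/s = (block size * count) / elapsed seconds / 2^20
awk -v bs=1048576 -v count=1000 -v secs=12.5 \
    'BEGIN { printf "%.1f MB/s\n", bs * count / secs / 1048576 }'
```

For the dd invocation above (1000 x 1 MiB blocks), plug the `real` time reported by `time` into `secs`.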
[zfs-discuss] Rsync update to ZFS server over SSH faster than over NFS?
On Mon, May 21, 2007 at 13:23:48 -0800, Marion Hakanson wrote: >Albert Chin wrote: >> Why can't the NFS performance match that of SSH? > > My first guess is the NFS vs array cache-flush issue. Have you > configured the 6140 to ignore SYNCHRONIZE_CACHE requests? That'll > make a huge difference for NFS clients of ZFS file servers. Doesn't setting zfs:zfs_nocacheflush=1 achieve the same result: http://blogs.digitar.com/jjww/?itemid=44 The 6140 has a non-volatile cache. Dunno if it's order-preserving though. -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Rsync update to ZFS server over SSH faster than over NFS?
On Tue, May 22, 2007 at 03:23:46PM +0100, Dick Davies wrote: > > Take off every ZIL! > > http://number9.hellooperator.net/articles/2007/02/12/zil-communication Interesting. With "set zfs:zil_disable = 1", I get: 1. [copy 400MB of gcc-3.4.3 via rsync/NFS] # mount file-server:/opt/test /mnt # rsync -vaHR --delete --stats gcc343 /mnt ... (old) sent 409516941 bytes received 80590 bytes 5025736.58 bytes/sec (new) sent 409516941 bytes received 80590 bytes 7380135.69 bytes/sec 2. [copy 400MB of gcc-3.4.3 via tar/NFS to ZFS file system] # mount file-server:/opt/test /mnt # time tar cf - gcc343 | (cd /mnt; tar xpf - ) ... (old) 419721216 bytes in 1:08.65 => 6113928.86 bytes/sec (new) 419721216 bytes in 0:44.67 => 9396042.44 bytes/sec > > > On 22/05/07, Albert Chin > <[EMAIL PROTECTED]> wrote: > >On Mon, May 21, 2007 at 06:11:36PM -0500, Nicolas Williams wrote: > >> On Mon, May 21, 2007 at 06:09:46PM -0500, Albert Chin wrote: > >> > But still, how is tar/SSH any more multi-threaded than tar/NFS? > >> > >> It's not that it is, but that NFS sync semantics and ZFS sync > >> semantics conspire against single-threaded performance. > > > >That's why we have "set zfs:zfs_nocacheflush = 1" in /etc/system. But, > >that only helps ZFS. Is there something similar for NFS? > > > >-- > >albert chin ([EMAIL PROTECTED]) > >___ > >zfs-discuss mailing list > >zfs-discuss@opensolaris.org > >http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > > > > > -- > Rasputin :: Jack of All Trades - Master of Nuns > http://number9.hellooperator.net/ > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > > -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
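Working the quoted numbers through, zil_disable=1 bought roughly a 47% gain in the rsync/NFS case and 54% in the tar/NFS case; an awk check of that arithmetic:

```shell
# Percent improvement, new rate vs. old rate, from the figures above.
awk 'BEGIN {
    printf "rsync/NFS: +%.0f%%\n", (7380135.69 / 5025736.58 - 1) * 100
    printf "tar/NFS:   +%.0f%%\n", (9396042.44 / 6113928.86 - 1) * 100
}'
```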
Re: [zfs-discuss] Re: Rsync update to ZFS server over SSH faster than over
On Mon, May 21, 2007 at 08:26:37PM -0700, Paul Armstrong wrote: > GIven you're not using compression for rsync, the only thing I can > think if would be that the stream compression of SSH is helping > here. SSH compresses by default? I thought you had to specify -oCompression and/or -oCompressionLevel? -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Rsync update to ZFS server over SSH faster than over NFS?
On Mon, May 21, 2007 at 06:11:36PM -0500, Nicolas Williams wrote: > On Mon, May 21, 2007 at 06:09:46PM -0500, Albert Chin wrote: > > But still, how is tar/SSH any more multi-threaded than tar/NFS? > > It's not that it is, but that NFS sync semantics and ZFS sync > semantics conspire against single-threaded performance. That's why we have "set zfs:zfs_nocacheflush = 1" in /etc/system. But that only helps ZFS. Is there something similar for NFS? -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Rsync update to ZFS server over SSH faster than over NFS?
On Mon, May 21, 2007 at 04:55:35PM -0600, Robert Thurlow wrote: > Albert Chin wrote: > > >I think the bigger problem is the NFS performance penalty so we'll go > >lurk somewhere else to find out what the problem is. > > Is this with Solaris 10 or OpenSolaris on the client as well? Client is RHEL 4/x86_64. But, we just ran a concurrent tar/SSH across Solaris 10, HP-UX 11.23/PA, 11.23/IA, AIX 5.2, 5.3, RHEL 4/x86, 4/x86_64 and the average was ~4562187 bytes/sec. But, the gcc343 copy on each of these machines isn't the same size. It's certainly less than 400MBx7 though. While performance on one system is fine, things degrade when you add clients. > I guess this goes back to some of the "why is tar slow over NFS" > discussions we've had, some here and some on nfs-discuss. A more > multi-threaded workload would help; so will planned work to focus > on performance of NFS and ZFS together, which can sometimes be > slower than expected. But still, how is tar/SSH any more multi-threaded than tar/NFS? I've posted to nfs-discuss so maybe someone knows something. -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Rsync update to ZFS server over SSH faster than over NFS?
On Mon, May 21, 2007 at 02:55:18PM -0600, Robert Thurlow wrote: > Albert Chin wrote: > > >Why can't the NFS performance match that of SSH? > > One big reason is that the sending CPU has to do all the comparisons to > compute the list of files to be sent - it has to fetch the attributes > from both local and remote and compare timestamps. With ssh, local > processes at each end do lstat() calls in parallel and chatter about > the timestamps, and the lstat() calls are much cheaper. I would wonder > how long the attr-chatter takes in your two cases before bulk data > starts to be sent - deducting that should reduce the imbalance you're > seeing. If rsync were more multi-threaded and could manage multiple > lstat() calls in parallel NFS would be closer. Well, there is no data on the file server as this is an initial copy, so there is very little for rsync to do. To compare the rsync overhead, I conducted some more tests, using tar: 1. [copy 400MB of gcc-3.4.3 via tar/NFS to ZFS file system] # mount file-server:/opt/test /mnt # time tar cf - gcc343 | (cd /mnt; tar xpf - ) ... 419721216 bytes in 1:08.65 => 6113928.86 bytes/sec 2. [copy 400MB of gcc-3.4.3 via tar/ssh to ZFS file system] # time tar cf - gcc343 | ssh -oForwardX11=no file-server \ 'cd /opt/test; tar xpf -' ... 419721216 bytes in 0:35.82 => 11717510.21 bytes/sec 3. [copy 400MB of gcc-3.4.3 via tar/NFS to Fibre-attached file system] # mount file-server:/opt/fibre-disk /mnt # time tar cf - gcc343 | (cd /mnt; tar xpf - ) ... 419721216 bytes in 0:56.87 => 7380362.51 bytes/sec 4. [copy 400MB of gcc-3.4.3 via tar/ssh to Fibre-attached file system] # time tar cf - gcc343 | ssh -oForwardX11=no file-server \ 'cd /opt/fibre-disk; tar xpf -' ... 419721216 bytes in 0:35.89 => 11694656.34 bytes/sec So, it would seem using #1 and #2, NFS performance can stand some improvement. And, I'd have thought that since #2/#4 were similar, #1/#3 should be as well. Maybe some NFS/ZFS issues would answer the discrepancy. 
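The bytes/sec figures above follow directly from the transfer size and elapsed times; dividing 419721216 bytes by each elapsed time reproduces them to within rounding:

```shell
# 419721216 bytes over each measured elapsed time, in bytes/sec.
for secs in 68.65 35.82 56.87 35.89; do
    awk -v s="$secs" 'BEGIN { printf "%s s -> %.0f bytes/sec\n", s, 419721216 / s }'
done
```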
I think the bigger problem is the NFS performance penalty so we'll go lurk somewhere else to find out what the problem is. -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Rsync update to ZFS server over SSH faster than over NFS?
We're testing an X4100M2, 4GB RAM, with a 2-port 4Gb Fibre Channel QLogic HBA connected to a 2Gb Fibre Channel 6140 array. The X4100M2 is running OpenSolaris b63. We have 8 drives in the Sun 6140 configured as individual RAID-0 arrays and have a ZFS RAID-Z2 array comprising 7 of the drives (for testing, we're treating the 6140 as JBOD for now). The RAID-0 stripe size is 128k. We're testing updates to the X4100M2 using rsync across the network with ssh and using NFS: 1. [copy 400MB of gcc-3.4.3 via rsync/NFS] # mount file-server:/opt/test /mnt # rsync -vaHR --delete --stats gcc343 /mnt ... sent 409516941 bytes received 80590 bytes 5025736.58 bytes/sec 2. [copy 400MB of gcc-3.4.3 via rsync/ssh] # rsync -vaHR -e 'ssh' --delete --stats gcc343 file-server:/opt/test ... sent 409516945 bytes received 80590 bytes 9637589.06 bytes/sec The network is 100Mbit. /etc/system on the file server is: set maxphys = 0x80 set ssd:ssd_max_throttle = 64 set zfs:zfs_nocacheflush = 1 Why can't the NFS performance match that of SSH? -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS+NFS on storedge 6120 (sun t4)
On Sat, Apr 21, 2007 at 09:05:01AM +0200, Selim Daoud wrote: > isn't there another flag in /etc/system to force zfs not to send flush > requests to NVRAM? I think it's zfs_nocacheflush=1, according to Matthew Ahrens in http://blogs.digitar.com/jjww/?itemid=44. > s. > > > On 4/20/07, Marion Hakanson <[EMAIL PROTECTED]> wrote: > >[EMAIL PROTECTED] said: > >> We have been combing the message boards and it looks like there was a > >lot of > >> talk about this interaction of zfs+nfs back in november and before but > >since > >> i have not seen much. It seems the only fix up to that date was to > >disable > >> zil, is that still the case? Did anyone ever get closure on this? > > > >There's a way to tell your 6120 to ignore ZFS cache flushes, until ZFS > >learns to do that itself. See: > > http://mail.opensolaris.org/pipermail/zfs-discuss/2006-December/024194.html > > > >Regards, > > > >Marion > > > > > >___ > >zfs-discuss mailing list > >zfs-discuss@opensolaris.org > >http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > > > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > > -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] 6410 expansion shelf
On Thu, Mar 22, 2007 at 01:21:04PM -0700, Frank Cusack wrote: > Does anyone have a 6140 expansion shelf that they can hook directly to > a host? Just wondering if this configuration works. Previously I > thought the expansion connector was proprietary but now I see it's > just fibre channel. The 6140 controller unit has either 2GB or 4GB cache. Does the 6140 expansion shelf have cache as well or is the cache in the controller unit used for all expansion shelves? -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] File level snapshots in ZFS?
On Thu, Mar 29, 2007 at 11:52:56PM +0530, Atul Vidwansa wrote: > Is it possible to take file level snapshots in ZFS? Suppose I want to > keep a version of the file before writing new data to it, how do I do > that? My goal would be to rollback the file to earlier version (i.e. > discard the new changes) depending upon a policy. I would like to > keep only 1 version of a file at a time and while writing new data, > earlier version will be discarded and current state of file (before > writing) would be saved in the version. Doubt it. Snapshots are essentially "free" and take no time, so you might as well just snapshot the file system. -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
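The keep-one-version policy described above maps onto the standard snapshot commands at file-system granularity; a sketch (pool and dataset names are placeholders):

```shell
# Take a snapshot before writing, then roll back to discard the changes...
zfs snapshot tank/data@before
# ... write new data under /tank/data ...
zfs rollback tank/data@before
# ...or destroy the snapshot once a newer "version" supersedes it:
zfs destroy tank/data@before
```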
[zfs-discuss] Hypothetical question about multiple host access to 6140 array
Hypothetical question. Say you have one 6140 controller tray and one 6140 expansion tray. In the beginning, these are connected to a 5220 or 5320 NAS appliance. Then, say you get a Sun server, a X4100, and connect it to the 6140 controller tray (the 6140 supports multiple data hosts). Is it possible to allocate some disks from the 6140 array to ZFS on the X4100 for the purpose of migrating data from the appliance to ZFS? -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] hot spares - in standby?
On Mon, Jan 29, 2007 at 09:37:57PM -0500, David Magda wrote: > On Jan 29, 2007, at 20:27, Toby Thain wrote: > > >On 29-Jan-07, at 11:02 PM, Jason J. W. Williams wrote: > > > >>I seem to remember the Massive Array of Independent Disk guys ran > >>into > >>a problem I think they called static friction, where idle drives > >>would > >>fail on spin up after being idle for a long time: > > > >You'd think that probably wouldn't happen to a spare drive that was > >spun up from time to time. In fact this problem would be (mitigated > >and/or) caught by the periodic health check I suggested. > > What about a rotating spare? > > When setting up a pool a lot of people would (say) balance things > around buses and controllers to minimize single points of failure, > and a rotating spare could disrupt this organization, but would it be > useful at all? Agami Systems has the concept of "Enterprise Sparing", where the hot spare is distributed amongst data drives in the array. When a failure occurs, the rebuild occurs in parallel across _all_ drives in the array: http://www.issidata.com/specs/agami/enterprise-classreliability.pdf -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: ZFS or UFS - what to do?
On Mon, Jan 29, 2007 at 11:17:05AM -0800, Jeffery Malloch wrote: > From what I can tell from this thread ZFS if VERY fussy about > managing writes,reads and failures. It wants to be bit perfect. So > if you use the hardware that comes with a given solution (in my case > an Engenio 6994) to manage failures you risk a) bad writes that > don't get picked up due to corruption from write cache to disk b) > failures due to data changes that ZFS is unaware of that the > hardware imposes when it tries to fix itself. > > So now I have a $70K+ lump that's useless for what it was designed > for. I should have spent $20K on a JBOD. But since I didn't do > that, it sounds like a traditional model works best (ie. UFS et al) > for the type of hardware I have. No sense paying for something and > not using it. And by using ZFS just as a method for ease of file > system growth and management I risk much more corruption. Well, ZFS with HW RAID makes sense in some cases. However, it seems that if you are unwilling to lose 50% disk space to RAID 10 or two mirrored HW RAID arrays, you either use RAID 0 on the array with ZFS RAIDZ/RAIDZ2 on top of that or a JBOD with ZFS RAIDZ/RAIDZ2 on top of that. -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Thumper Origins Q
On Thu, Jan 25, 2007 at 02:24:47PM -0600, Al Hopper wrote: > On Thu, 25 Jan 2007, Bill Sommerfeld wrote: > > > On Thu, 2007-01-25 at 10:16 -0500, Torrey McMahon wrote: > > > > > > So there's no way to treat a 6140 as JBOD? If you wanted to use a 6140 > > > > with ZFS, and really wanted JBOD, your only choice would be a RAID 0 > > > > config on the 6140? > > > > > > Why would you want to treat a 6140 like a JBOD? (See the previous > > > threads about JBOD vs HW RAID...) > > > > Let's turn this around. Assume I want a FC JBOD. What should I get? > > Many companies make FC expansion "boxes" to go along with their FC based > hardware RAID arrays. Often, the expansion chassis is identical to the > RAID equipped chassis - same power supplies, same physical chassis and > disk drive carriers - the only difference is that the slots used to house > the (dual) RAID H/W controllers have been blanked off. These expansion > chassis are designed to be daisy chained back to the "box" with the H/W > RAID. So you simply use one of the expansion chassis and attach it > directly to a system equipped with an FC HBA and ... you've got an FC > JBOD. Nearly all of them will support two FC connections to allow dual > redundant connections to the FC RAID H/W. So if you equip your ZFS host > with either a dual-port FC HBA or two single-port FC HBAs - you have a > pretty good redundant FC JBOD solution. > > An example of such an expansion box is the DS4000 EXP100 from IBM. It's > also possible to purchase a 3510FC box from Sun with no RAID controllers - > but their nearest equivalent of an "empty" box comes with 6 (overpriced) > disk drives pre-installed. :( > > Perhaps you could use your vast influence at Sun to persuade them to sell > an empty 3510FC box? Or an empty box bundled with a single or dual-port > FC card (Qlogic based please). Well - there's no harm in making the > suggestion ... right? 
Well, when you buy disk for the Sun 5320 NAS Appliance, you get a Controller Unit shelf and, if you expand storage, an Expansion Unit shelf that connects to the Controller Unit. Maybe the Expansion Unit shelf is a JBOD 6140? -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Thumper Origins Q
On Thu, Jan 25, 2007 at 10:16:47AM -0500, Torrey McMahon wrote: > Albert Chin wrote: > >On Wed, Jan 24, 2007 at 10:19:29AM -0800, Frank Cusack wrote: > > > >>On January 24, 2007 10:04:04 AM -0800 Bryan Cantrill <[EMAIL PROTECTED]> > >>wrote: > >> > >>>On Wed, Jan 24, 2007 at 09:46:11AM -0800, Moazam Raja wrote: > >>> > >>>>Well, he did say fairly cheap. the ST 3511 is about $18.5k. That's > >>>>about the same price for the low-end NetApp FAS250 unit. > >>>> > >>>Note that the 3511 is being replaced with the 6140: > >>> > >>Which is MUCH nicer but also much pricier. Also, no non-RAID option. > >> > > > >So there's no way to treat a 6140 as JBOD? If you wanted to use a 6140 > >with ZFS, and really wanted JBOD, your only choice would be a RAID 0 > >config on the 6140? > > Why would you want to treat a 6140 like a JBOD? (See the previous > threads about JBOD vs HW RAID...) Well, a 6140 with RAID 10 is not an option because we don't want to lose 50% disk capacity. So, we're left with RAID 5. Yes, we could layer ZFS on top of this. But what do you do if you want RAID 6? Easiest way to get it is ZFS RAIDZ2 on top of JBOD. The only reason I'd consider RAID is if the HW RAID performance was enough of a win over ZFS SW RAID. -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
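The RAIDZ2-on-JBOD layout argued for above is a one-liner to build (the disk names are placeholders for whatever the array presents):

```shell
# Double-parity pool across seven JBOD disks; survives any two disk failures.
zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0
zpool status tank
```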
Re: [zfs-discuss] Re: Thumper Origins Q
On Wed, Jan 24, 2007 at 10:19:29AM -0800, Frank Cusack wrote: > On January 24, 2007 10:04:04 AM -0800 Bryan Cantrill <[EMAIL PROTECTED]> > wrote: > > > >On Wed, Jan 24, 2007 at 09:46:11AM -0800, Moazam Raja wrote: > >>Well, he did say fairly cheap. the ST 3511 is about $18.5k. That's > >>about the same price for the low-end NetApp FAS250 unit. > > > >Note that the 3511 is being replaced with the 6140: > > Which is MUCH nicer but also much pricier. Also, no non-RAID option. So there's no way to treat a 6140 as JBOD? If you wanted to use a 6140 with ZFS, and really wanted JBOD, your only choice would be a RAID 0 config on the 6140? -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Mounting a ZFS clone
On Tue, Jan 16, 2007 at 01:28:04PM -0800, Eric Kustarz wrote: > Albert Chin wrote: > >On Mon, Jan 15, 2007 at 10:55:23AM -0600, Albert Chin wrote: > > > >>I have no hands-on experience with ZFS but have a question. If the > >>file server running ZFS exports the ZFS file system via NFS to > >>clients, based on previous messages on this list, it is not possible > >>for an NFS client to mount this NFS-exported ZFS file system on > >>multiple directories on the NFS client. > > > > > >At least, I thought I read this somewhere. Is the above possible? I > >don't see why it should not be. > > Yes, you can mount multiple *filesystems* via NFS. And the fact that the file systems on the remote server are ZFS is irrelevant? -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Mounting a ZFS clone
On Mon, Jan 15, 2007 at 10:55:23AM -0600, Albert Chin wrote: > I have no hands-on experience with ZFS but have a question. If the > file server running ZFS exports the ZFS file system via NFS to > clients, based on previous messages on this list, it is not possible > for an NFS client to mount this NFS-exported ZFS file system on > multiple directories on the NFS client. At least, I thought I read this somewhere. Is the above possible? I don't see why it should not be. -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Mounting a ZFS clone
I have no hands-on experience with ZFS but have a question. If the file server running ZFS exports the ZFS file system via NFS to clients, based on previous messages on this list, it is not possible for an NFS client to mount this NFS-exported ZFS file system on multiple directories on the NFS client. So, let's say I create a ZFS clone of some ZFS file system. Is it possible for an NFS client to mount the ZFS file system _and_ the clone without problems? If the clone is underneath the ZFS file system hierarchy, will mounting the ZFS file system I created the clone from allow the NFS client access to the remote ZFS file system and the clone? -- albert chin ([EMAIL PROTECTED]) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
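A sketch of the clone scenario in question, assuming the server shares both datasets explicitly and the client mounts each one (all names are placeholders):

```shell
# On the file server: snapshot, clone, and NFS-share both filesystems.
zfs snapshot tank/fs@snap
zfs clone tank/fs@snap tank/fs-clone
zfs set sharenfs=on tank/fs
zfs set sharenfs=on tank/fs-clone

# On the NFS client: mount the original and the clone side by side.
mount file-server:/tank/fs /mnt/fs
mount file-server:/tank/fs-clone /mnt/fs-clone
```

Note that mounting the clone separately sidesteps the question of whether an NFS client traverses into datasets nested under an exported parent.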