Re: [zfs-discuss] Snapshots, txgs and performance
Hello there, I think you should share it with the list if you can; it seems like interesting work. ZFS has some issues with snapshot handling and spa_sync performance during snapshot deletion. Thanks, Leal [ http://www.eall.com.br/blog ]
Re: [zfs-discuss] SSD (SLC) for cache...
Thanks Adam. So, if I understand well, MLC SSDs as a read cache are more theory than practice right now, right? I mean, Sun is only using SLC SSDs? That would explain the SLC-only support on Sun hardware (the x42xx series). Thanks again. Leal [ http://www.eall.com.br/blog ]
Re: [zfs-discuss] When writing to SLOG at full speed all disk IO is blocked
OK Bob, but I think that is the picket-fencing problem, and then we are talking about committing the sync operations to disk. What I'm seeing is no read activity from the disks while the slog is being written. The disks are at zero (no reads, no writes). Thanks a lot for your reply. Leal [ http://www.eall.com.br/blog ]
Re: [zfs-discuss] When writing to SLOG at full speed all disk IO is blocked
Hello, Well, I'm trying to understand this workload, but all I have to do to reproduce it is flood the SSD with writes, and then the disks show no activity. I'm testing with a link aggregation (two links), and for one or two seconds there is no read activity (output from the server). Right now I suspect something in the network, because I did some ZFS tuning and it seems I'm no longer driving the SSD at 100% utilization, yet the behaviour still happens. I need to confirm this and will share it with you. Thanks for your reply. Leal [ http://www.eall.com.br/blog ]
Re: [zfs-discuss] Another user looses his pool (10TB) in this case and 40 days work
"That's only one element of it Bob. ZFS also needs devices to fail quickly and in a predictable manner. A consumer grade hard disk could lock up your entire pool as it fails. The kit Sun supply is more likely to fail in a manner ZFS can cope with." I agree 100%. Hardware, firmware and drivers should be fully integrated for a mission-critical application. With the wrong firmware and consumer-grade disks, a disk failure stalls the entire pool. I have seen disks fail and the system take two or three seconds to cope with it (not just ZFS, but the controller, etc.). Leal.
[zfs-discuss] Fishworks iSCSI cache enabled...
Hello all, Is anybody using the iSCSI "cache enabled" option on the 7000 series? I'm talking about OpenSolaris (ZFS) as the iSCSI initiator, because I don't know of another filesystem that handles disk caches. So, was that option created for ZFS ;-)? Any suggestions on this? Thanks, Leal [ http://www.eall.com.br/blog ]
[zfs-discuss] When writing to SLOG at full speed all disk IO is blocked
Hello all... I'm seeing this behaviour on an old build (89), and I just want to hear from you whether there is some known bug about it. I'm aware of the picket-fencing problem, and that ZFS does not always choose correctly whether writing to the slog is better or not (considering we may have better throughput from the disks). But I did not find anything about 100% slog activity (~115 MB/s) blocking IO to the disks. Two or three seconds of zero reads and writes on the disks... Thanks a lot for your time! Leal [ http://www.eall.com.br/blog ]
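For anyone who wants to watch this happening, the slog and the data disks can be observed side by side while the test runs. A minimal sketch (the pool name is a placeholder):

  # zpool iostat -v mypool 1     (per-vdev bandwidth, one-second samples)
  # iostat -xnz 1                (per-device view, only devices with activity)

During the stall the log device should be near its bandwidth limit while the mirrored data disks report zero reads and writes.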
Re: [zfs-discuss] ZFS write I/O stalls
"Note that this issue does not apply at all to NFS service, database service, or any other usage which does synchronous writes. Bob" Hello Bob, There is an impact for all workloads. Whether the write is synchronous or not only decides whether it goes to the slog (SSD) or not; the txg interval and sync time are the same. Actually, the ZIL code exists exactly to preserve that same behaviour for synchronous writes. Leal [ http://www.eall.com.br/blog ]
[zfs-discuss] zio_taskq_threads and TXG sync
Hello all, I'm trying to understand the ZFS IO scheduler ( http://www.eall.com.br/blog/?p=1170 ), and why the system sometimes seems to be stalled for a few seconds, so that every application that needs some IO (mostly reads, I think) has serious problems. That can be a big problem with iSCSI or NFS soft mounts. Looking at the code, I got to the zio_taskq_threads structure, and to this bug report: http://bugs.opensolaris.org/bugdatabase/printableBug.do?bug_id=6826241 It seems it was already integrated into newer releases (I don't know since when)... Could somebody explain the real difference between the ISSUE and INTR, READ and WRITE changes, and maybe why the first implementation used the same value for both? ;-) Another change I did not fully understand was the time between txg syncs going from 5s to 30s, which I think can make this problem worse, because we will have more data to commit. Well, too many questions... ;-) PS: Where can I find the patches and attachments from bugs.opensolaris.org? The reports mention attachments, but I cannot find them. Thanks a lot for your time! Leal [ http://www.eall.com.br/blog ]
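One way to check whether the stalls line up with txg syncs is to time spa_sync() with DTrace. A minimal sketch, assuming the fbt probes for spa_sync are available on your build (probe and structure member names may differ between releases):

  #!/usr/sbin/dtrace -s
  /* print each txg sync and aggregate its duration */
  fbt:zfs:spa_sync:entry
  {
          self->ts = timestamp;
          printf("%Y pool %s txg %d\n", walltimestamp,
              stringof(args[0]->spa_name), args[1]);
  }
  fbt:zfs:spa_sync:return
  /self->ts/
  {
          @["spa_sync duration (ns)"] = quantize(timestamp - self->ts);
          self->ts = 0;
  }

If the read stalls happen exactly while a long spa_sync() is running, that points at the txg sync / IO scheduler interaction discussed above.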
[zfs-discuss] ZFS and Amanda
Hello all, Is there some project to integrate Amanda with OpenSolaris, or some how-to for integrating it with ZFS? Some use case (using the open source version)? The Amanda site has a few instructions, but I think here we can create something more specific to OpenSolaris. Thanks.
[zfs-discuss] E2BIG
Hello all... We are getting this error: E2BIG - Arg list too long, when trying to send incremental backups (b89 -> b101). Do you know of any bugs related to that? I looked in the archives and on Google but could not find anything. What I did find was something related to wrong timestamps (32-bit) and a ZFS check in the code (zfs_vnops.c), but that error is EOVERFLOW... Thanks a lot for your time!
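For reference, this is the general form of the incremental send/receive pipeline in question; the host and dataset names here are placeholders, not our real setup:

  # zfs send -i mypool/fs@snap1 mypool/fs@snap2 | \
        ssh backuphost zfs receive -d backuppool

The E2BIG shows up while a pipeline of this form runs between the b89 and b101 hosts.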
Re: [zfs-discuss] NFS Block Monitor
FYI (version 0.3): http://www.eall.com.br/blog/?p=970 Leal [ http://www.eall.com.br/blog ]
[zfs-discuss] NFS Block Monitor
Hello all... I did some tests to understand the behaviour of ZFS with a slog (SSD), and to understand the workload I implemented a simple piece of software to visualize the data blocks (reads/writes). I'm posting the link here in case somebody wants to try it: http://www.eall.com.br/blog/?p=906 Thanks a lot for your time. Leal [ http://www.eall.com.br/blog ]
Re: [zfs-discuss] How to find out the zpool of an uberblock printed with the fbt:zfs:uberblock_update: probes?
Hello Bernd, Now I see your point... ;-) Well, following some very simple math: - One txg every 5 seconds = 17,280 txgs/day; - Each txg writing 1 MB (labels L0-L3) = ~17 GB/day. In the paper the math was: 10 years of life = (2.7 * the size of the USB drive) written per day, right? So on a 4 GB drive that would be ~10 GB/day. Then just the label updates would make our USB drive live for about 5 years... and if each txg updates 5 MB of data, our drive would live for just a year. Help, I'm not good with numbers... ;-) Leal [ http://www.eall.com.br/blog ]
Re: [zfs-discuss] How to find out the zpool of an uberblock printed with the fbt:zfs:uberblock_update: probes?
Marcelo, Hello there... "I did some more tests." You are getting very useful information with your tests. Thanks a lot!! "I found that not each uberblock_update() is also followed by a write to the disk (although the txg is increased every 30 seconds for each of the three zpools of my 2008.11 system). In these cases, ub_rootbp.blk_birth stays at the same value while txg is incremented by 1." Are you sure about that? I mean, what I understood from the on-disk format is that there is a 1:1 correlation between txg, creation time and uberblock. Each time there is a write to the pool, we have another state of the filesystem. Actually, we only need a new valid uberblock when we change the filesystem state (write to it). "But each sync command on the OS level is followed by a vdev_uberblock_sync() directly after the uberblock_update() and then by four writes to the four uberblock copies (one per label) on disk." Hmm, maybe uberblock_update() is not really important in our discussion... ;-) "And a change to one or more files in any pool during the 30-second interval is also followed by a vdev_uberblock_sync() of that pool at the end of the interval." So, what is uberblock_update()? "So on my system (a web server), during times when there is enough activity that each uberblock_update() is followed by vdev_uberblock_sync(), I get: 2 writes per minute" I'm totally lost... 2 writes per minute? "(*60) = 120 writes per hour, (*24) = 2880 writes per day, but only each 128th one goes to the same block: 2880/128 = 22.5 writes to the same block on the drive per day. If we take the lower number of max. writes in the referenced paper, which is 10,000, we get 10,000/22.5 = 444.4 days, or one year and 79 days. For 100,000, we get 4,444.4 days, or more than 12 years." OK, but I think the number is 10,000. 100,000 would mean static wear leveling, and that is a non-trivial implementation for USB pen drives, right? "During times without HTTP access to my server, only about each 5th to 10th uberblock_update() is followed by vdev_uberblock_sync() for rpool, and much less for the two data pools, which means that the corresponding uberblocks on disk will be skipped for writing (if I did not overlook anything), and the device will likely wear out later." I need to know what uberblock_update() is... it seems related to neither txg, nor disk syncs, nor labels, nothing... ;-) Thanks a lot Bernd. Leal [ http://www.eall.com.br/blog ]
Re: [zfs-discuss] How to find out the zpool of an uberblock printed with the fbt:zfs:uberblock_update: probes?
Hi, Hello Bernd, After I published a blog entry about installing OpenSolaris 2008.11 on a USB stick, I read a comment about a possible issue with wearing out blocks on the USB stick after some time, because ZFS overwrites its uberblocks in place. I did not understand well what you are trying to say with "wearing out blocks", but in fact the uberblocks are not overwritten in place. The pattern you noticed with the dtrace script is the update of the uberblock, which is maintained in an array of 128 elements (1K each, just one active at a time). Each physical vdev has four labels (256K structures): L0, L1, L2 and L3, two at the beginning of the device and two at the end. Because the labels are at fixed locations on disk, the label update is the only place where ZFS does not use COW; it uses a two-staged update instead. IIRC, the update order is L0 and L2, and after that L1 and L3. Take a look: http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/vdev_label.c So: - The label is overwritten (in a two-staged update); - The uberblock is not overwritten; a new element of the array is written. So the transition from one uberblock (txg and timestamp) to another is atomic. I'm deploying a USB solution too, so if you can clarify the problem, I would appreciate it. P.S.: I looked at your blog, but did not see any comments about that, and the comments section is closed. ;-) Leal [ http://www.eall.com.br/blog ] I tried to get more information about how updating uberblocks works with the following dtrace script:

  /* io:genunix::start */
  io:genunix:default_physio:start,
  io:genunix:bdev_strategy:start,
  io:genunix:biodone:done
  {
          printf("%d %s %d %d", timestamp, execname,
              args[0]->b_blkno, args[0]->b_bcount);
  }
  fbt:zfs:uberblock_update:entry
  {
          printf("%d (%d) %d, %d, %d, %d, %d, %d, %d, %d", timestamp,
              args[0]->ub_timestamp, args[0]->ub_rootbp.blk_prop,
              args[0]->ub_guid_sum, args[0]->ub_rootbp.blk_birth,
              args[0]->ub_rootbp.blk_fill, args[1]->vdev_id,
              args[1]->vdev_asize, args[1]->vdev_psize, args[2]);
  }

The output shows the following pattern after most of the uberblock_update events:

  0  34404 uberblock_update:entry 244484736418912 (1231084189) 9226475971064889345, 4541013553469450828, 26747, 159, 0, 0, 0, 26747
  0   6668 bdev_strategy:start 244485190035647 sched 502 1024
  0   6668 bdev_strategy:start 244485190094304 sched 1014 1024
  0   6668 bdev_strategy:start 244485190129133 sched 39005174 1024
  0   6668 bdev_strategy:start 244485190163273 sched 39005686 1024
  0   6656 biodone:done 244485190745068 sched 502 1024
  0   6656 biodone:done 244485191239190 sched 1014 1024
  0   6656 biodone:done 244485191737766 sched 39005174 1024
  0   6656 biodone:done 244485192236988 sched 39005686 1024
  ...
  0  34404 uberblock_update:entry 244514710086249 (1231084219) 9226475971064889345, 4541013553469450828, 26747, 159, 0, 0, 0, 26748
  0  34404 uberblock_update:entry 244544710086804 (1231084249) 9226475971064889345, 4541013553469450828, 26747, 159, 0, 0, 0, 26749
  ...
  0  34404 uberblock_update:entry 244574740885524 (1231084279) 9226475971064889345, 4541013553469450828, 26750, 159, 0, 0, 0, 26750
  0   6668 bdev_strategy:start 244575189866189 sched 508 1024
  0   6668 bdev_strategy:start 244575189926518 sched 1020 1024
  0   6668 bdev_strategy:start 244575189961783 sched 39005180 1024
  0   6668 bdev_strategy:start 244575189995547 sched 39005692 1024
  0   6656 biodone:done 244575190584497 sched 508 1024
  0   6656 biodone:done 244575191077651 sched 1020 1024
  0   6656 biodone:done 244575191576723 sched 39005180 1024
  0   6656 biodone:done 244575192077070 sched 39005692 1024

I am not a dtrace or zfs expert, but to me it looks like in many cases an uberblock update is followed by a write of 1024 bytes to four different disk blocks. I also found that the four block numbers are incremented by two (256, 258, 260, ...) 127 times, and then the first block is written again. Which would mean that for a txg of about 50,000, the four uberblock copies have been written roughly 50000/127 = 393 times (correct?). What I would like to find out is how to access fields from arg1 (this is the data of type vdev_t in: int uberblock_update(uberblock_t *ub, vdev_t *rvd, uint64_t txg)). When using the fbt:zfs:uberblock_update:entry probe, its elements are always 0, as you can see in the above output. When using the fbt:zfs:uberblock_update:return probe, I am getting an error message like the following: dtrace: failed to compile script zfs-uberblock-report-04.d: line 14: operator -> must be applied to a pointer Any idea how to access the fields of vdev, or how to print out the pool name associated with an uberblock_update event? Regards, Bernd
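As a side note, the labels and the active uberblock can also be inspected from userland with zdb, which may help correlate what the script reports with a specific pool. A sketch only; the device and pool names are placeholders, and the output format differs between builds:

  # zdb -l /dev/rdsk/c0t0d0s0     (dump the four labels of one vdev)
  # zdb -u rpool                  (print the active uberblock: txg, guid_sum, timestamp)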
Re: [zfs-discuss] Practical Application of ZFS
Hello, - One way is virtualization: if you use a virtualization technology that stores its images over NFS, for example, you could put the virtual machine images on a ZFS filesystem. NFS can be used without virtualization too, but since, as you said, the machines are Windows, I don't think the NFS client for Windows is production-ready. Maybe somebody else on the list can say... - Virtualization inside Solaris branded zones... IIRC, the idea is to have branded zones supporting other OSes (like GNU/Linux, MS Windows, etc). - Another option is iSCSI, and then you would not need virtualization. Leal [ http://www.eall.com.br/blog ] "ZFS is the bomb. It's a great file system. What are its real-world applications besides the Solaris userspace? What I'd really like is to utilize the benefits of ZFS across all the platforms we use. For instance, we use Microsoft Windows servers as our primary platform here. How might I utilize ZFS to protect that data? The only way I can visualize doing so would be to virtualize the Windows server and store its image in a ZFS pool. That would add additional overhead but protect the data at the disk level. It would also allow snapshots of the Windows machine's virtual file. However, none of these benefits would protect Windows from hurting its own data, if you catch my meaning. Obviously ZFS is ideal for large databases served out via application-level or web servers. But what other practical ways are there to integrate the use of ZFS into existing setups to experience its benefits?"
Re: [zfs-discuss] How ZFS decides if write to the slog or directly to the POOL
Marcelo Leal writes: "Hello all, Some days ago I was looking at the code and saw some variable that seems to establish a correlation between the size of the data and whether the data is written to the slog or directly to the pool. But I could not find it anymore, and I think it is way more complex than that. For example, if we have a pool of just two disks, it's fine to write to the slog (SSD). But if we have a 20-disk pool, writing to the slog will not be a good idea, don't you agree? But if someone has that configuration (20 disks and a slog), wouldn't the ZFS code identify that and write directly to the pool? I'm asking this because I did some tests and it seems the SSD became a bottleneck... and I guessed that even if the admin made such a mistake, ZFS would have the logic to avoid writing to the intent log. Thanks a lot for your time!" Hi Marcelo, you are right on, and this is being tracked as: 6706578 a single zil writer should not abuse the slog http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6706578 Thanks, I put a link here: http://www.eall.com.br/blog/?p=842 The Sun Storage 7000 line already has a fix for this. ;-) Leal [ http://www.eall.com.br/blog ] -r http://blogs.sun.com/mws/entry/introducing_the_sun_storage_7000
Re: [zfs-discuss] Cannot remove a file on a GOOD ZFS filesystem
Thanks a lot Sanjeev! If you look at my first message you will see that discrepancy in zdb... Leal. [ http://www.eall.com.br/blog ]
Re: [zfs-discuss] Cannot remove a file on a GOOD ZFS filesystem
Hello all,

# zpool status
  pool: mypool
 state: ONLINE
 scrub: scrub completed after 0h2m with 0 errors on Fri Dec 19 09:32:42 2008
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c0t2d0  ONLINE       0     0     0
            c0t3d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c0t6d0  ONLINE       0     0     0
            c0t7d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c0t8d0  ONLINE       0     0     0
            c0t9d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c0t10d0 ONLINE       0     0     0
            c0t11d0 ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c0t12d0 ONLINE       0     0     0
            c0t13d0 ONLINE       0     0     0
        logs        ONLINE       0     0     0
          c0t1d0    ONLINE       0     0     0

errors: No known data errors

- zfs list -r shows eight filesystems, and nine snapshots per filesystem:
...
mypool/colorado                                   1.83G  4.00T  1.13G  /mypool/colorado
mypool/color...@centenario-2008-12-28-01:00:00    40.3M      -  1.46G  -
mypool/color...@centenario-2008-12-29-01:00:00    30.0M      -  1.54G  -
mypool/color...@campeao-2008-12-29-09:00:00       10.4M      -  1.24G  -
mypool/color...@campeao-2008-12-29-13:00:00       31.5M      -  1.29G  -
mypool/color...@campeao-2008-12-29-17:00:00       5.46M      -  1.10G  -
mypool/color...@campeao-2008-12-29-21:00:00       4.23M      -  1.13G  -
mypool/color...@centenario-2008-12-30-01:00:00        0      -  1.16G  -
mypool/color...@campeao-2008-12-30-01:00:00           0      -  1.16G  -
mypool/color...@campeao-2008-12-30-05:00:00       6.24M      -  1.16G  -
...

- How many entries does it have? Now there is just one file, the problematic one... but before the whole problem, four or five small files (the whole pool is pretty empty).
- Which filesystem (of the zpool) does it belong to? See above...

Thanks a lot!
Re: [zfs-discuss] Cannot remove a file on a GOOD ZFS filesystem
execve(/usr/bin/rm, 0x08047DBC, 0x08047DC8) argc = 2 mmap(0x, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0) = 0xFEFF resolvepath(/usr/lib/ld.so.1, /lib/ld.so.1, 1023) = 12 resolvepath(/usr/bin/rm, /usr/bin/rm, 1023) = 11 sysconfig(_CONFIG_PAGESIZE) = 4096 xstat(2, /usr/bin/rm, 0x08047A68) = 0 open(/var/ld/ld.config, O_RDONLY) Err#2 ENOENT xstat(2, /lib/libc.so.1, 0x080471C8) = 0 resolvepath(/lib/libc.so.1, /lib/libc.so.1, 1023) = 14 open(/lib/libc.so.1, O_RDONLY)= 3 mmap(0x0001, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_ALIGN, 3, 0) = 0xFEFB mmap(0x0001, 1380352, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFEE5 mmap(0xFEE5, 1272553, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_TEXT, 3, 0) = 0xFEE5 mmap(0xFEF97000, 32482, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_INITDATA, 3, 1273856) = 0xFEF97000 mmap(0xFEF9F000, 6400, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANON, -1, 0) = 0xFEF9F000 munmap(0xFEF87000, 65536) = 0 memcntl(0xFEE5, 208132, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0 close(3)= 0 mmap(0x0001, 24576, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFEF9 munmap(0xFEFB, 32768) = 0 getcontext(0x08047820) getrlimit(RLIMIT_STACK, 0x08047818) = 0 getpid()= 3269 [3268] lwp_private(0, 1, 0xFEF92A00) = 0x01C3 setustack(0xFEF92A60) sysi86(SI86FPSTART, 0xFEFA0014, 0x133F, 0x1F80) = 0x0001 brk(0x08063770) = 0 brk(0x08065770) = 0 sysconfig(_CONFIG_PAGESIZE) = 4096 ioctl(0, TCGETA, 0x08047D3C)= 0 brk(0x08065770) = 0 brk(0x08067770) = 0 fstatat64(AT_FDCWD, Arquivos.file, 0x08047C80, 0x1000) Err#2 ENOENT fstat64(2, 0x08046CE0) = 0 write(2, r m : , 4) = 4 write(2, Arquivos . fil.., 13) = 13 write(2, : , 2) = 2 write(2, N o s u c h f i l e.., 25) = 25 write(2, \n, 1) = 1 _exit(2) -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Cannot remove a file on a GOOD ZFS filesystem
execve(/usr/bin/ls, 0x08047DA8, 0x08047DB4) argc = 2 mmap(0x, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0) = 0xFEFF resolvepath(/usr/lib/ld.so.1, /lib/ld.so.1, 1023) = 12 resolvepath(/usr/bin/ls, /usr/bin/ls, 1023) = 11 xstat(2, /usr/bin/ls, 0x08047A58) = 0 open(/var/ld/ld.config, O_RDONLY) Err#2 ENOENT sysconfig(_CONFIG_PAGESIZE) = 4096 xstat(2, /lib/libc.so.1, 0x080471B8) = 0 resolvepath(/lib/libc.so.1, /lib/libc.so.1, 1023) = 14 open(/lib/libc.so.1, O_RDONLY)= 3 mmap(0x0001, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_ALIGN, 3, 0) = 0xFEFB mmap(0x0001, 1380352, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFEE5 mmap(0xFEE5, 1272553, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_TEXT, 3, 0) = 0xFEE5 mmap(0xFEF97000, 32482, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_INITDATA, 3, 1273856) = 0xFEF97000 mmap(0xFEF9F000, 6400, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANON, -1, 0) = 0xFEF9F000 munmap(0xFEF87000, 65536) = 0 memcntl(0xFEE5, 208132, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0 close(3)= 0 mmap(0x0001, 24576, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFEF9 munmap(0xFEFB, 32768) = 0 getcontext(0x08047810) getrlimit(RLIMIT_STACK, 0x08047808) = 0 getpid()= 5410 [5409] lwp_private(0, 1, 0xFEF92A00) = 0x01C3 setustack(0xFEF92A60) sysi86(SI86FPSTART, 0xFEFA0014, 0x133F, 0x1F80) = 0x0001 brk(0x08067320) = 0 brk(0x08069320) = 0 time() = 1230662014 ioctl(1, TCGETA, 0x08047ABC)= 0 sysconfig(_CONFIG_PAGESIZE) = 4096 brk(0x08069320) = 0 brk(0x08073320) = 0 lstat64(., 0x080469A0)= 0 xstat(2, /lib/libsec.so.1, 0x08045F98)= 0 resolvepath(/lib/libsec.so.1, /lib/libsec.so.1, 1023) = 16 open(/lib/libsec.so.1, O_RDONLY) = 3 mmap(0x0001, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_ALIGN, 3, 0) = 0xFEFB mmap(0x0001, 151552, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFEE2 mmap(0xFEE2, 58047, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_TEXT, 3, 0) = 0xFEE2 mmap(0xFEE3F000, 13477, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_INITDATA, 3, 61440) = 0xFEE3F000 mmap(0xFEE43000, 5760, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANON, -1, 0) = 0xFEE43000 munmap(0xFEE2F000, 65536) = 0 memcntl(0xFEE2, 13752, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0 close(3)= 0 munmap(0xFEFB, 32768) = 0 pathconf(., 20) = 2 acl(., ACE_GETACLCNT, 0, 0x) = 6 stat64(., 0x08046890) = 0 acl(., ACE_GETACL, 6, 0x08071C48) = 6 openat(AT_FDCWD, ., O_RDONLY|O_NDELAY|O_LARGEFILE) = 3 fcntl(3, F_SETFD, 0x0001) = 0 fstat64(3, 0x080479A0) = 0 getdents64(3, 0xFEF94000, 8192) = 80 lstat64(./Arquivos.file, 0x08046930) Err#2 ENOENT getdents64(3, 0xFEF94000, 8192) = 0 close(3)= 0 ioctl(1, TCGETA, 0x08046BBC)= 0 fstat64(1, 0x08046B20) = 0 write(1, t o t a l 0\n, 8) = 8 _exit(0) -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Cannot remove a file on a GOOD ZFS filesystem
Hello all... Could that be caused by some cache on the LSI controller? Some flush that the controller or the disk did not honour?
[zfs-discuss] How ZFS decides if write to the slog or directly to the POOL
Hello all, Some days ago I was looking at the code and saw some variable that seems to establish a correlation between the size of the data and whether the data is written to the slog or directly to the pool. But I could not find it anymore, and I think it is way more complex than that. For example, if we have a pool of just two disks, it's fine to write to the slog (SSD). But if we have a 20-disk pool, writing to the slog will not be a good idea, don't you agree? But if someone has that configuration (20 disks and a slog), wouldn't the ZFS code identify that and write directly to the pool? I'm asking this because I did some tests and it seems the SSD became a bottleneck... and I guessed that even if the admin made such a mistake, ZFS would have the logic to avoid writing to the intent log. Thanks a lot for your time!
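For what it's worth, the size-related variable I was probably remembering is zfs_immediate_write_sz: synchronous writes below that threshold are copied into the intent-log record, while larger ones are written to their final pool location with the log record only pointing at them (with a separate log device the decision changes, which seems to be what bug 6706578 is about). A quick way to look at it on a live system; the value shown is only the commonly cited default and may differ on your build:

  # echo zfs_immediate_write_sz/D | mdb -k
  zfs_immediate_write_sz:         32768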
[zfs-discuss] OpenSolaris panic while ZFS receiving (SXDE 89)
Hello all, I'm getting many OpenSolaris kernel panic while send/receiving data. I did try to create another pool and another host to test, and the same error. And the send side can be any server (i did test with four different servers, all build 89). The panic message: --- cut here - Dec 12 17:10:33 testserver unix: [ID 836849 kern.notice] Dec 12 17:10:33 testserver ^Mpanic[cpu2]/thread=ff000fe16c80: Dec 12 17:10:33 testserver genunix: [ID 683410 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=ff000fe16700 addr=ff06b5989c10 Dec 12 17:10:33 testserver unix: [ID 10 kern.notice] Dec 12 17:10:33 testserver unix: [ID 839527 kern.notice] sched: Dec 12 17:10:33 testserver unix: [ID 753105 kern.notice] #pf Page fault Dec 12 17:10:33 testserver unix: [ID 532287 kern.notice] Bad kernel fault at addr=0xff06b5989c10 Dec 12 17:10:33 testserver unix: [ID 243837 kern.notice] pid=0, pc=0xf7dcd19a, sp=0xff000fe167f0, eflags=0x10202 Dec 12 17:10:33 testserver unix: [ID 211416 kern.notice] cr0: 8005003bpg,wp,ne,et,ts,mp,pe cr4: 6f8xmme,fxsr,pge,mce,pae,pse,de Dec 12 17:10:33 testserver unix: [ID 624947 kern.notice] cr2: ff06b5989c10 Dec 12 17:10:33 testserver unix: [ID 625075 kern.notice] cr3: 340 Dec 12 17:10:33 testserver unix: [ID 625715 kern.notice] cr8: c Dec 12 17:10:33 testserver unix: [ID 10 kern.notice] Dec 12 17:10:33 testserver unix: [ID 592667 kern.notice]rdi: ff06b098ac40 rsi: ff06b5989bc0 rdx: 13 Dec 12 17:10:33 testserver unix: [ID 592667 kern.notice]rcx: 3 r8: ff02d878eb00 r9: 1c8 Dec 12 17:10:33 testserver unix: [ID 592667 kern.notice]rax: 4 rbx:79b7a rbp: ff00 0fe16890 Dec 12 17:10:33 testserver unix: [ID 592667 kern.notice]r10: c03e3db44351bd r11:0 r12: ff06 a0c16188 Dec 12 17:10:33 testserver unix: [ID 592667 kern.notice]r13: ff02d4530540 r14: ff06a2d37d38 r15: ff19 0b636d30 Dec 12 17:10:33 testserver unix: [ID 592667 kern.notice]fsb: 0 gsb: ff02d597ab00 ds: 4b Dec 12 17:10:33 testserver unix: [ID 592667 kern.notice] es: 4b fs:0 gs: 1c3 Dec 12 17:10:33 testserver unix: [ID 592667 kern.notice]trp: e err:0 rip: f7dcd19a Dec 12 17:10:33 testserver unix: [ID 592667 kern.notice] cs: 30 rfl:10202 rsp: ff00 0fe167f0 Dec 12 17:10:33 testserver unix: [ID 266532 kern.notice] ss: 38 Dec 12 17:10:33 testserver unix: [ID 10 kern.notice] Dec 12 17:10:33 testserver genunix: [ID 655072 kern.notice] ff000fe165e0 unix:die+ea () Dec 12 17:10:33 testserver genunix: [ID 655072 kern.notice] ff000fe166f0 unix:trap+13b9 () Dec 12 17:10:33 testserver genunix: [ID 655072 kern.notice] ff000fe16700 unix:cmntrap+e9 () Dec 12 17:10:33 testserver genunix: [ID 655072 kern.notice] ff000fe16890 zfs:dbuf_write+ca () Dec 12 17:10:33 testserver genunix: [ID 655072 kern.notice] ff000fe168e0 zfs:dbuf_sync_indirect+ab () Dec 12 17:10:33 testserver genunix: [ID 655072 kern.notice] ff000fe16920 zfs:dbuf_sync_list+5e () Dec 12 17:10:33 testserver genunix: [ID 655072 kern.notice] ff000fe16990 zfs:dnode_sync+23b () Dec 12 17:10:33 testserver genunix: [ID 655072 kern.notice] ff000fe169d0 zfs:dmu_objset_sync_dnodes+55 () Dec 12 17:10:33 testserver genunix: [ID 655072 kern.notice] ff000fe16a50 zfs:dmu_objset_sync+13d () Dec 12 17:10:33 testserver genunix: [ID 655072 kern.notice] ff000fe16aa0 zfs:dsl_dataset_sync+5d () Dec 12 17:10:33 testserver genunix: [ID 655072 kern.notice] ff000fe16b10 zfs:dsl_pool_sync+b5 () Dec 12 17:10:33 testserver genunix: [ID 655072 kern.notice] ff000fe16bb0 zfs:spa_sync+20e () Dec 12 17:10:33 testserver genunix: [ID 655072 kern.notice] ff000fe16c60 zfs:txg_sync_thread+226 () Dec 12 17:10:33 testserver 
genunix: [ID 655072 kern.notice] ff000fe16c70 unix:thread_start+8 () Dec 12 17:10:33 testserver unix: [ID 10 kern.notice] Dec 12 17:10:33 testserver genunix: [ID 672855 kern.notice] syncing file systems... Dec 12 17:10:33 testserver genunix: [ID 733762 kern.notice] 142 Dec 12 17:10:34 testserver genunix: [ID 733762 kern.notice] 20 Dec 12 17:10:35 testserver genunix: [ID 733762 kern.notice] 2 Dec 12 17:10:57 testserver last message repeated 20 times Dec 12 17:10:58 testserver genunix: [ID 622722 kern.notice] done (not all i/o completed) Dec 12 17:10:59 testserver genunix: [ID 111219 kern.notice] dumping to /dev/dsk/c0t0d0s1, offset 4194893824, content: kernel Dec 12 17:11:08 testserver genunix: [ID 409368 kern.notice] ^M100% done: 346040 pages dumped, compression ratio 2.47, Dec 12 17:11:08 testserver genunix: [ID 851671 kern.notice] dump succeeded --- cut here --- I did find some messages that seem to be related, but with Solaris 10. Do you think an update could solve this?
Re: [zfs-discuss] Lost Disk Space
"A percentage of the total space is reserved for pool overhead and is not allocatable, but shows up as available in zpool list." Something to change/show in the future? -- Leal [ http://www.posix.brte.com.br/blog ]
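This is the discrepancy in question: zpool list reports raw pool capacity, while zfs list reports the space actually available to datasets. Illustrative only; the numbers below are made up:

  # zpool list mypool
  NAME     SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
  mypool   928G   132K   928G    0%  ONLINE  -
  # zfs list mypool
  NAME     USED  AVAIL  REFER  MOUNTPOINT
  mypool   110K   913G    18K  /mypool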
Re: [zfs-discuss] Enabling load balance with zfs
I think the better solution is to have two pools and write a script that changes the recording destination from time to time, or moves the files afterwards, like the prototype by Hartz. P.S.: Is this a reality show? ;-)
[zfs-discuss] DNLC and ARC
Hello, In ZFS, is the DNLC concept gone, or is it part of the ARC too? I mean, all the caching in ZFS is the ARC, right? I was wondering whether we can tune the DNLC in ZFS like in UFS... if we have too *many* files and directories, I guess we can get better performance by having all the metadata cached, and that is even more important for NFS operations. The DNLC is LRU, right? And the ARC should be totally dynamic, but as in another thread here, I think reading one *big* file can mess with the whole thing. Can we hold an area in memory for the DNLC cache, or is that not the ARC way? Thanks, Leal.
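For reference, the UFS-style DNLC tuning I have in mind is the ncsize tunable plus the dnlcstats kstat to watch the hit rate. A sketch, assuming the same tunable still applies when ZFS is the consumer; the value is only an example:

  * /etc/system (requires a reboot); example value only
  set ncsize = 262144

  # kstat -n dnlcstats | egrep 'hits|misses'
  # vmstat -s | grep 'name lookups'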
Re: [zfs-discuss] DNLC and ARC
Hello Neil, "Leal, ZFS uses the DNLC. It still provides the fastest lookup of directory,name to vnode." OK, so the whole concept remains true? We can tune the DNLC and expect the same behaviour on ZFS? "The DNLC is kind of LRU. An async process will use a rotor to move through the hash chains and select the LRU entry, but will select first negative cache entries and vnodes only referenced by the DNLC. Underlying this, ZFS uses the ZAP and fat ZAP to store the mappings." Here I did not understand very well. Are you saying that ZFS uses the DNLC just for one level? "ZFS does not use the 2nd-level DNLC which allows caching of directories. This is only used by UFS to avoid a linear search of large directories." What is the ZFS way here? One of the points of my question is exactly that... in an environment with many directories with *many* files, I think ZFS would have the *same* problems. So having a directory cache in the DNLC could be a good solution. Can you explain how ZFS handles performance in directories with hundreds of files? There is a lot of documentation about UFS/DNLC, but for now I think the only doc about ZFS/ARC and the DNLC is the source code. ;-) "Neil." Thanks a lot! I was thinking of tuning the DNLC to hold as much metadata (directories and files) as I can, to minimize lookups/stats and so on (in NFS there are a lot of getattr ops). Then we could have *all* the metadata cached, and use whatever memory remains to cache data. Maybe that kind of tuning would be useful for just a few workloads, but it could be a *huge* enhancement for those workloads. Leal -- posix rules -- [ http://www.posix.brte.com.br/blog ]
Re: [zfs-discuss] Managing low free space and snapshots
Hello, In the situation you have described, if I understood well, you would not get any space back. When you take a snapshot, the snapshot references the blocks that are older than it... Example: you have a 500 GB disk and create a 5 GB file; you have 495 GB of free space. Then you delete the file and you have 500 GB again. But if you take a snapshot *before* deleting the file, that file is still there (in the snapshot) and still using space (5 GB)... ;-) Leal [ http://www.posix.brte.com.br/blog ]
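A quick way to watch this on a test filesystem; dataset and file names are placeholders, and on builds where zfs list hides snapshots you may need -t snapshot as well:

  # zfs snapshot mypool/fs@before
  # rm /mypool/fs/bigfile
  # zfs list -o name,used,refer -r mypool/fs

The space of the removed file leaves the filesystem's REFER but stays accounted against the snapshot, and is only freed when the snapshot is destroyed.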
Re: [zfs-discuss] COW updates [C1]
"Because of one change to just one file, the MOS is a brand new one." Yes, all writes in ZFS are done in transaction groups... so every time there is a commit, something is really written to disk, there is a new txg, and all the blocks written are related to that txg (even the uberblock). I don't know if I understood the other questions about updates to the MOS, the 128K uberblock array, and regular files... but the location of the active uberblock is in the vdev's labels (L0...L3), and the label update is the only one that is not COW in ZFS, because the labels are at fixed locations on the disks. The label updates are done in a staged approach (L0/L2, and after that L1/L3). And an update to an uberblock is done by writing the modified uberblock to another element of the 128-entry uberblock array. Peace. Leal
Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis
I agree with you, Constantin, that the sync is a performance problem; in the same way, I think in an NFS environment it is simply *required*. If the sync can be relaxed in a specific NFS environment, my first opinion is that NFS was not necessary in that environment in the first place. IMHO a protocol like iSCSI would give much better performance in such a situation; at least there would be no need to handle consistency with other clients. That said, options are always good, and having the possibility to disable the ZIL per filesystem is one more *gun* in the world. And as always, it can reach the cops and the bad guys. Keep in mind that JB is trying to send to jail whoever is winning performance benchmarks without syncing to disk. ;-) Keep up the good work on your blog! Leal
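For reference, today the knob is global only, which is exactly why per-filesystem control would be welcome. A hedged sketch of the current workaround; it disables the ZIL for every pool on the host, generally only takes effect for filesystems mounted afterwards, and the tunable may change or disappear in future builds:

  * /etc/system (takes effect after a reboot):
  set zfs:zil_disable = 1

  # on a live system, not persistent across reboots:
  # echo zil_disable/W0t1 | mdb -kw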
Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis
"Bah, I've done it again. I meant use it as a slog device, not as the ZIL..." But the slog is the ZIL, formally a *separate* intent log. What's the matter? I think everyone understood. I think you made the confusion some threads before between the ZIL and the L2ARC; that is a different thing... ;-) Leal. "I'll get this terminology in my head eventually."
Re: [zfs-discuss] Building a 2nd pool, can I do it in stages?
Hello there, It's not a wiki, but it has many considerations about your question: http://www.opensolaris.org/jive/thread.jspa?threadID=78841&tstart=60 Leal.
Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis
"But the slog is the ZIL, formally a *separate* intent log." "No, the slog is not the ZIL!" OK, but when you wrote this: "I've been slogging for a while on support for separate intent logs (slogs) for ZFS. Without slogs, the ZIL is allocated dynamically from the main pool." were you talking about "the body of code" in the statement "the ZIL is allocated"? Then I have misunderstood you... Leal. "Here's the definition of the terms as we've been trying to use them: ZIL: the body of code that supports synchronous requests, which writes out to the intent logs. Intent log: a stable storage log. There is one per filesystem and zvol. slog: an intent log on a separate stable device, preferably high speed. We don't really have a name for an intent log when it's embedded in the main pool. I have in the past used the term 'clog' for chained log. Originally, before slogs existed, it was just the intent log. Neil."
Re: [zfs-discuss] Disabling COMMIT at NFS level,
On 10/22/08 13:56, Marcelo Leal wrote: "But the slog is the ZIL, formally a *separate* intent log." "No, the slog is not the ZIL!" "OK, but when you wrote this: 'I've been slogging for a while on support for separate intent logs (slogs) for ZFS. Without slogs, the ZIL is allocated dynamically from the main pool.' were you talking about 'the body of code' in the statement 'the ZIL is allocated'? Then I have misunderstood you... Leal." I guess I need to fix that! See? I think you are being a little dramatic... OK, there is the ZIL (the code) and the ZFS intent log. It's just inevitable that people will call the ZFS intent log "ZIL" for short. But I respect you! You write the code... ;-) Let's go back to the point. Leal "Anyway, the slog is not the ZIL; it's one of the two currently possible intent log types. Sorry for the confusion. Neil."
Re: [zfs-discuss] Tool to figure out optimum ZFS recordsize for a Mail server Maildir tree?
Hello Roch! "Leave the default recordsize. With a 128K recordsize, files smaller than 128K are stored as a single record tightly fitted to the smallest possible number of disk sectors. Reads and writes are then managed with fewer ops." On writes ZFS is dynamic, but what about reads? If I have many small files (smaller than 128K), won't I waste time reading 128K? And after ZFS has allocated an FSB of 64K for example, if that file gets bigger, will ZFS keep using 64K blocks? "Not tuning the recordsize is very generally more space-efficient and more performant. A large DB (fixed-size, aligned accesses to an uncacheable working set) is the exception here (tuning recordsize helps), plus a few other corner cases. -r" On 15 Sept 08 at 04:49, Peter Eriksson wrote: "I wonder if there exists some tool that can be used to figure out an optimal ZFS recordsize configuration? Specifically for a mail server using Maildir (one ZFS filesystem per user). I.e., lots of small files (one file per email)."
Re: [zfs-discuss] Booting 0811 from USB Stick
Hello all, Did you do a regular install onto the USB stick, or did you use the Distribution Constructor (DC)? Leal.
Re: [zfs-discuss] Lost Disk Space
Hello there... I have seen that already, and talked to some guys without an answer too... ;-) Actually, this week I did not see a discrepancy between the tools, but the pool information was wrong (space used). Exporting/importing, scrubbing, etc. did not fix it. I know that ZFS is async in its status reporting ;-), but only after a reboot was the status OK again. P.S.: b89. Leal.
Re: [zfs-discuss] Tuning for a file server, disabling data cache (almost)
Hello all, I think he has a point here... maybe that would be an interesting feature for that kind of workload. Caching all the metadata would make the rsync task faster (for many files). Trying to cache the data is really a waste of time, because the data will not be read again and will just push out the good cached metadata. That is what I understood when he said the 96K would be discarded soon. He wants to configure an area to copy the data through, and that's it: leave my metadata cache alone. ;-) Leal.
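Recent builds have per-dataset cache policy properties that come close to this. A hedged sketch; these properties may not exist on older builds, and secondarycache only matters when an L2ARC device is attached:

  # zfs set primarycache=metadata mypool/backups     (cache only metadata in the ARC)
  # zfs get primarycache,secondarycache mypool/backups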
Re: [zfs-discuss] Improving zfs send performance
Hello all, I think in Sun Studio 11 it should be -xarch=amd64. Leal.
Re: [zfs-discuss] ZFS-over-iSCSI performance testing (with low random access results)...
So, there is no RAID 10 in a Solaris/ZFS setup? I'm talking about no redundancy...
Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow
Hello all, I think the problem here is ZFS's capacity to recover from a failure. Forgive me, but in trying to create code without failures, maybe the hackers forgot that other people can make mistakes (even if they can't). - "ZFS does not need fsck." OK, that's a great statement, but I think ZFS needs one. Really does. And in my opinion an enhanced zdb would be the solution. Flexibility. Options. - "I have 90% of something I think is your filesystem, do you want it?" I think a piece of software is as good as its ability to recover from failures. And I don't want to know who failed; I'm not going to send anyone to jail, I'm not a lawyer. I agree with Jeff, I really do, but that is another problem... The solution Jeff is working on I think is really great, as long as it is NOT all-or-nothing again... I don't know about you, but A LOT of times I was saved by the lost+found directory! All the beauty of a UNIX system is rm'ing /etc/passwd after having edited it, and getting the whole file back by doing a cat /dev/mem. ;-) I think there are a lot of parts of the ZFS design that remind me of when you see something left on the floor at home, and you ask your son why he did not pick it up, and he says "it was not me". Peace. Leal.
Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow
On Fri, Oct 10, 2008 at 06:15:16AM -0700, Marcelo Leal wrote: "- ZFS does not need fsck. OK, that's a great statement, but I think ZFS needs one. Really does. And in my opinion an enhanced zdb would be the solution. Flexibility. Options." About 99% of the problems reported as "I need ZFS fsck" can be summed up by two ZFS bugs: 1. If a toplevel vdev fails to open, we should be able to pull information from the necessary ditto blocks to open the pool and make what progress we can. Right now, the root vdev code assumes "can't open" = "faulted pool", which results in failure scenarios that are perfectly recoverable most of the time. This needs to be fixed so that pool failure is only determined by the ability to read critical metadata (such as the root of the DSL). 2. If an uberblock ends up with an inconsistent view of the world (due to a failure of DKIOCFLUSHWRITECACHE, for example), we should be able to go back to previous uberblocks to find a good view of our pool. This is the failure mode described by Jeff. These are both bugs in ZFS and will be fixed. That's it! It's 100% for me! ;-) One is the all-or-nothing problem, and the other is about guilt... ;-)) There are some interesting possibilities for limited forensic tools; in particular, I like the idea of an mdb backend for reading and writing ZFS pools [1]. In my opinion it would be great to have the whole functionality in zdb. It's simpler, and the concepts of that tool are clear. mdb is a debugger and requires concepts that, I think, are different from those of a tool to read/fix filesystems. Just an opinion... which does not mean we cannot have both. Like I said: flexibility, options... ;-) But I haven't actually heard a reasonable proposal for what an fsck-like tool I think we must NOT get stuck on the word "fsck"; I used it just as an example (lost+found). And I think the other users used it just as an example too. What matters are the two points you described very *well*. (i.e. one that could repair things automatically) would actually *do*, let alone how it would work in the variety of situations it needs to (compressed RAID-Z?) where the standard ZFS infrastructure fails. - Eric [1] http://mbruning.blogspot.com/2008/08/recovering-removed-file-on-zfs-disk.html -- Eric Schrock, Fishworks http://blogs.sun.com/eschrock Many thanks for your answer! Leal.
Re: [zfs-discuss] ZSF Solaris
ZFS has no limit on snapshots and filesystems either, but try to create a lot of snapshots and filesystems and you will also have to wait a long time for your pool to import... ;-) I think you should not think about the limits, but about performance. Any filesystem with *too many* entries per directory will suffer. So my advice is to configure your application to create a better hierarchy. Leal.
Re: [zfs-discuss] ZFS on Hitachi SAN, pool recovery
Just out of curiosity, why not use SC? Leal.
Re: [zfs-discuss] ZFS send/receive filehandle issue
Hello Adrian, Thanks. I was using send/receive (that's why I put it in the subject ;), and I would like to know whether ZFS could have some solution for that, as I said before. The send/receive copy is not an exact copy of the filesystem (creation time, fsid, etc. are different). So, with the filehandle using those for its composition, ZFS has the same problem as any other filesystem (the stale handle issue). It was just a guess: as I was trying the same filesystem, maybe the references inside it would remain the same, and Solaris/ZFS might not be using the things that cause this issue... P.S.: I would like to find my chat with the ZFS engineer. I must have misunderstood him. Thanks again!
Re: [zfs-discuss] A question about recordsize...
Hello milek, Is that information still true? "ZFS algorithm for selecting block sizes: The initial block size is the smallest supported block size larger than the first write to the file. Grow to the next largest block size for the entire file when the total file length increases beyond the current block size (up to the maximum block size). Shrink the block size when the entire file will fit in a single smaller block. ZFS currently supports nine block sizes, from 512 bytes to 128K. Larger block sizes could be supported in the future, but see Roch's blog on why 128K is enough." Thanks.
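One way to check which block size a file actually ended up with is zdb's object dump. A sketch; the dataset name and object number are placeholders, the output columns vary between builds, and the object number is what ls -i reports as the inode number:

  # zfs get recordsize mypool/fs
  # ls -i /mypool/fs/smallfile
  # zdb -dddd mypool/fs <object#>     (the dblk column shows the block size in use)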
Re: [zfs-discuss] A question about recordsize...
On Fri, 5 Sep 2008, Marcelo Leal wrote: "4 - The last one... ;-) For the FSB allocation, how does ZFS know the file size, to know whether the file is smaller than the FSB? Something related to the txg? When the write goes to disk, does ZFS know (some way) whether that write is a whole file or a piece of it?" For synchronous writes (file opened with the O_DSYNC option), ZFS must write the data based on what it has been provided in the write, so at any point in time the quality of the result (amount of data in the tail block) depends on application requests. However, if the application continues to extend the file via synchronous writes, existing data in the sub-sized tail block will be re-written to a new location (due to ZFS COW) with the extra data added. This means that the filesystem block size is more important for synchronous writes, particularly if there is insufficient RAM to cache the already-written block. If I understand well, the recordsize is really important for big files, because with small files and small updates we have a lot of chances to have the data well organized on disk. I think the problem is big files... where we have tiny updates. At the pool's creation time the recordsize is 128K, but I don't know if that limit is real when, let's say, we are copying a DVD image. I think the record size could be larger. If so, what would happen if for larger files we could have a recordsize of... 1 MB? And what would happen if we then changed it to, say, 1K? For asynchronous writes, ZFS will buffer writes in RAM for up to five seconds before actually writing them. This buffering allows ZFS to make better-informed decisions about how to write the data, so that the data is written to full blocks as contiguously as possible. If the application writes asynchronously but then issues an fsync() call, any cached data will be committed to disk at that time. It can be seen that for asynchronous writes, the quality of the written data layout is somewhat dependent on how much RAM the system has available and how fast the data is written. With more RAM, there can be more useful write caching (up to five seconds) and ZFS can make better decisions when it writes the data, so that the data in a file can be written optimally, even with the pressure of multi-user writes. Agreed. Any other ZFS experts to answer the first questions? ;-) Bob == Bob Friesenhahn [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ Thanks bfriesen! Leal
[zfs-discuss] ZFS send/receive filehandle issue
Hello all, Is there some way to work around the filehandle issue with a ZFS send/receive procedure? Back in the early days of ZFS I had a conversation with some of the devel guys and asked how ZFS would treat the NFS filehandle. IIRC, the answer was: no problem, the NFS filehandle will not depend on the disk/path/creation... like UFS or other filesystems. So I thought (wrongly, maybe ;) that all the information required to preserve filehandle consistency would be on the filesystem itself.

I did some tests sending/receiving a filesystem from one node to another, moving the IP address from one node to the other, and got the stale filehandle issue on a GNU/Linux client.

Is there any chance of making a scenario like this work, or do I need, for non-shared disks, to have many pools, so that when I want to migrate (portions of) the NFS services I hot-swap the disks (export/import the pool on the other host)? Thanks. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
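For reference, the migration I am testing is essentially the sketch below (host and dataset names are placeholders, and -F is only there because the target dataset already exists in my test); the stale handles show up on the clients once the service IP moves to the second node:

    import subprocess

    SRC_SNAP = "tank/export/home@migrate1"   # placeholder source snapshot
    DEST_HOST = "nodeB"                      # node that takes over the service IP
    DEST_FS = "tank/export/home"             # placeholder destination dataset

    subprocess.run(["zfs", "snapshot", SRC_SNAP], check=True)

    # Stream the snapshot to the other node. The receive side recreates the
    # filesystem contents, but as far as NFS filehandles are concerned it ends
    # up being a different dataset, hence the stale handles on the clients.
    send = subprocess.Popen(["zfs", "send", SRC_SNAP], stdout=subprocess.PIPE)
    subprocess.run(["ssh", DEST_HOST, "zfs", "receive", "-F", DEST_FS],
                   stdin=send.stdout, check=True)
    send.stdout.close()
    send.wait()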
Re: [zfs-discuss] Terabyte scrub
You are right! Looking at the numbers, I was not thinking clearly ;-) What matters is the used space, not the storage capacity! My fault... Thanks a lot for the answers. Leal. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] send/receive statistics
Thanks a lot for the answers! Relling said something about checksums; I have asked him for a more detailed explanation, because I did not understand which checksum the receive side has to verify, given that the send stream can be redirected to a file on a disk or tape... In the end, I think that if we can import (receive) the snapshot and that procedure finishes cleanly, we are in good shape. Leal. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] A question about recordsize...
Hello! Assuming the default recordsize (FSB) in ZFS is 128k:
1 - If I have a file of 10k, ZFS will allocate an FSB of 10k. Right? Since ZFS is not static like the other filesystems, I don't have that old internal fragmentation...
2 - If the above is right, I don't need to adjust the recordsize (FSB) if I will handle a lot of tiny files. Right?
3 - If the two above are right, then tuning the recordsize is only important for files greater than the FSB. Let's say, 129k... but then another question: if the file is 129k, will ZFS allocate one filesystem block of 128k and another of... 1k? Or two of 128k?
4 - The last one... ;-) For the FSB allocation, how does ZFS know the file size, to know whether the file is smaller than the FSB? Is it something related to the txg? When the write goes to disk, does ZFS know (somehow) whether that write is a whole file or a piece of it?
Thanks a lot! Leal. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] send/receive statistics
Hello all, Are there any plans (or is there already a way) to get transfer statistics from a send/receive backup? I mean, how much was transferred, the elapsed time and/or bytes/sec? And one more thing: I have seen in many threads the question about the consistency of send/receive through ssh, but no definitive answers. So my last question is: if the transfer completes (send/receive), can I trust that the backup is good? Is the receive returning 0 definitive? Thanks. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
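In the meantime, a crude way to get those numbers is to sit in the middle of the pipe yourself. A sketch of what I mean (snapshot and dataset names are placeholders), counting bytes and elapsed time while relaying zfs send into zfs receive:

    import subprocess, sys, time

    SNAP = "tank/data@backup1"        # placeholder snapshot name
    DEST = "backup/data"              # placeholder destination dataset

    send = subprocess.Popen(["zfs", "send", SNAP], stdout=subprocess.PIPE)
    recv = subprocess.Popen(["zfs", "receive", DEST], stdin=subprocess.PIPE)

    total = 0
    start = time.time()
    while True:
        chunk = send.stdout.read(1 << 20)      # relay 1 MiB at a time
        if not chunk:
            break
        recv.stdin.write(chunk)
        total += len(chunk)
    recv.stdin.close()
    send_rc = send.wait()
    recv_rc = recv.wait()

    elapsed = max(time.time() - start, 1e-6)
    print("%d bytes in %.1f s (%.1f MB/s); send rc=%d, receive rc=%d"
          % (total, elapsed, total / elapsed / 1e6, send_rc, recv_rc))
    sys.exit(send_rc or recv_rc)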
[zfs-discuss] Terabyte scrub
Hello all, I was using mirrors on Solaris 10, where the scrub process for 500GB took about two hours... and in tests with Solaris Express (snv_79a), terabytes in minutes. I searched the release notes for changes in the scrub process and could not find anything about enhancements of this magnitude. So I ask you... :) What?? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] CIFS HA service with solaris 10 and SC 3.2
Hello all, I would like to continue with this topic; after doing some research on it, I have some (many) doubts, and maybe we could use this thread to give some answers to me and other users who may have the same questions...
First, sorry for CCing so many forums, but I think this is a relevant topic for all of them.
Second, it would be nice to clear up the understanding on a few points:
1) What is the difference between the SMB server in Solaris/OpenSolaris and the new CIFS project?
2) I think samba.org has an implementation of the CIFS protocol that makes a Unix-like operating system an SMB/CIFS server. Why not use that? License problems? Is the SMB server that is already in Solaris/OpenSolaris not a samba.org implementation?
3) One of the goals of the CIFS Server project on OpenSolaris is to support OpenSolaris as a storage operating system... can we not do that with the samba.org implementation, or with the SMB server implementation that is already there?
4) ZFS has SMB/CIFS share on/off capabilities; what is the relation of that to all of the above?
5) Ok, there is another question... there is a new project (data migration manager/dmm) that intends to migrate NFS (GNU/Linux) services and CIFS (MS/Windows) services to Solaris/OpenSolaris and ZFS. That project is in the storage community, I think... but how can we create a migration plan if we cannot handle the services yet? Or can we?
Ok, I am very confused, but it is not just my fault; I think all these efforts without a glue are a little complicated, don't you agree? And on top of all that, there is the need for an agent to implement HA services on it... I want to implement an SMB/CIFS server on Solaris/OpenSolaris, and I don't know whether we already have the solution in our community, and whether there is an agent to provide HA or we need to create a project to implement that... See, I need help :-) Ok, that's all! Leal. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS data recovery command
Hello all, In a traditional filesystem we have a few filesystems, but with ZFS we can have thousands. The question is: is there a command or procedure to recreate the filesystems in a restore-from-backup scenario? I mean, imagine that I have a ZFS pool with 1,000 filesystems and, for some reason, I lose that pool. Does ZFS have a command so that we can export the pool's filesystem structure and recreate it quickly (to use with traditional backup software)?

Actually, I have a lot of concerns about this: we are evaluating some backup software around here, and my concern is a scenario where, for whatever reason, I lose the pool. What features does ZFS provide me for recovering from that situation? What backup solutions do you use? Is there any backup software that understands zfs send/receive? A lot of questions :) Thanks for your time, Leal. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
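What I have in mind is something like the sketch below (the pool name and the layout file path are placeholders): dump the dataset tree and the locally-set properties while things are healthy, and replay them with zfs create/zfs set before the backup software restores the file data:

    import subprocess

    POOL = "tank"                          # placeholder pool name
    LAYOUT = "/var/tmp/zfs-layout.txt"     # placeholder dump location

    def zfs(*args):
        return subprocess.run(["zfs"] + list(args), check=True,
                              capture_output=True, text=True).stdout

    def dump_layout():
        """Save the dataset names and every locally-set property."""
        names = zfs("list", "-H", "-r", "-t", "filesystem", "-o", "name", POOL)
        props = zfs("get", "-H", "-r", "-s", "local",
                    "-o", "name,property,value", "all", POOL)
        with open(LAYOUT, "w") as f:
            f.write(names + "--\n" + props)

    def restore_layout():
        """After the pool is recreated: replay the datasets and properties."""
        names, props = open(LAYOUT).read().split("--\n")
        for name in names.splitlines()[1:]:          # skip the pool root itself
            subprocess.run(["zfs", "create", "-p", name], check=True)
        for line in props.splitlines():
            name, prop, value = line.split("\t")
            subprocess.run(["zfs", "set", "%s=%s" % (prop, value), name], check=True)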
Re: [zfs-discuss] ZFS with raidz
Hello... If I have understood correctly, you will have a host with EMC RAID5 disks. Is that right? You pay a lot of money to have EMC disks, and I think it is not a good idea to have another layer of *any* RAID on top of it. If you have EMC RAID5 (e.g. Symmetrix), you don't need software RAID as well... ZFS was designed to provide a RAID solution for cheap disks! I think that is not your case, and anything in excess is not good: it generates complexity and loops... :) I think ZFS can trust the EMC box... Leal. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] cp -p gives errors on Linux w/ NFS-mounted ZFS
Hello all, I'm having the same problem here; any news? I need to use ACLs on the GNU/Linux clients. I'm using NFSv3, and on the GNU/Linux servers that feature was working, so I think we need a solution for Solaris/OpenSolaris. Now, with the dmm project, how can we start a migration process if we cannot provide the same services on the target machine? Thanks a lot for your time! Leal. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS cli for REMOTE Administration
No answer... well, do you not have this problem, or is there another option for delegating such administration? I was wondering whether we can delegate administration of a single filesystem to some user through the ZFS administration web console (port 6789). Can I create a user and give them administration rights over a single filesystem (and its snapshots, of course)? Thanks. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
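If I am not mistaken, newer Nevada builds also have the zfs allow delegation mechanism, which might cover part of this without the web console. A sketch of what I mean (user and dataset names are placeholders):

    import subprocess

    USER = "leal"                   # placeholder user name
    DATASET = "tank/home/leal"      # placeholder filesystem

    # Delegate snapshot creation/destruction (and mount, which destroy needs)
    # on this one dataset only.
    subprocess.run(["zfs", "allow", USER, "snapshot,destroy,mount", DATASET],
                   check=True)

    # Show what is now delegated on the dataset.
    print(subprocess.run(["zfs", "allow", DATASET],
                         capture_output=True, text=True, check=True).stdout)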
Re: [zfs-discuss] ZFS still crashing after patch
Hello, If you believe the problem may be related to the ZIL code, you can try disabling it to debug (isolate) the problem. If it is not a file server (NFS), disabling the ZIL should not impact consistency. Leal. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS cli for REMOTE Administration
Hello all, Some time ago I wrote a simple script to handle on-the-fly filesystem (ZFS) creation for Linux clients (http://www.posix.brte.com.br/blog/?p=102). I was thinking of improving that script to handle more generic remote actions... but I think we could start a project on this: a text-based client to execute remote administration on a given ZFS filesystem. This administration could be done from Linux/OSX/FreeBSD, for example.

I think the main use of such a feature would be snapshot administration... Today we have a little problem managing snapshots on the server: not their creation, but their deletion. When a user exceeds the quota, or needs to free some space, he cannot do it if that filesystem has snapshots.

I just want your opinion, and maybe other solutions if I am not seeing a simpler one. And if you agree with me, we could make a project proposal around this. What do you think? Thanks a lot for your time! Leal. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
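Just to make the idea concrete, the kind of client I have in mind is nothing more than the sketch below (server, dataset and snapshot names are placeholders); the real project would add authentication, delegation checks, and so on:

    import subprocess, sys

    SERVER = "nfs-server"                 # placeholder ZFS server
    DATASET = "tank/home/leal"            # placeholder filesystem

    def remote_zfs(*args):
        return subprocess.run(["ssh", SERVER, "zfs"] + list(args),
                              capture_output=True, text=True, check=True).stdout

    def list_snapshots():
        out = remote_zfs("list", "-H", "-r", "-t", "snapshot", "-o", "name", DATASET)
        return out.splitlines()

    def destroy_snapshot(name):
        if not name.startswith(DATASET + "@"):
            raise ValueError("refusing to destroy snapshots outside " + DATASET)
        remote_zfs("destroy", name)

    if __name__ == "__main__":
        if len(sys.argv) > 1:
            destroy_snapshot(sys.argv[1])     # e.g. tank/home/leal@2008-09-05
        else:
            print("\n".join(list_snapshots()))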