Re: [zfs-discuss] VM's on ZFS - 7210
If I remember correctly, ESX always uses synchronous writes over NFS. If so, adding a dedicated log device (such as a DDRdrive) might help you out here. You should be able to test it by disabling the ZIL for a short while and seeing whether performance improves (http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Disabling_the_ZIL_.28Don.27t.29). I'm not sure how reliable the DDRdrive is in practice, but in theory it should be much better than an SSD, since DRAM doesn't wear.
-- Saso

On 08/27/2010 07:04 AM, Mark wrote:
> We are using a 7210 (44 disks, I believe, in 11 RAIDZ stripes). When I installed it, I selected the best bang for the buck on the speed vs. capacity chart. We run about 30 VMs on it, across 3 ESX 4 servers. Right now it's all running NFS, and it's painfully slow; iSCSI was no better. I'm wondering how I can increase performance, because they want to add more VMs. The good news is that most of them are mostly idle, but even idle VMs generate a lot of random chatter to the disks!
>
> So, a few options:
> 1) Switch the ESX mounts to iSCSI and enable the write cache on the LUNs, since the 7210 is on a UPS.
> 2) Get a mirrored pair of Logzilla SSDs. (Do SSDs fail? Do I really need a mirror?)
> 3) Reconfigure the NAS as RAID 10 instead of RAIDZ.
>
> Obviously all three would be ideal. Though with an SSD, can I keep using NFS and get the same performance, since the sync writes would be satisfied by the SSD? I dread getting the OK to spend $$,$$$ on SSDs and then not getting the performance increase we want. How would you weight these options?
>
> In testing on a 5-disk OpenSolaris box, I noticed that changing from a single RAIDZ pool to RAID 10 netted a larger IOPS increase than adding an Intel SSD as a Logzilla. That's not going to scale the same way on a 44-disk, 11-vdev RAIDZ set, though. Any thoughts? Would simply moving to write-cache-enabled iSCSI LUNs, without an SSD, speed things up a lot by itself?
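The cost of committing every write synchronously, which is what sync NFS semantics impose, can be felt even locally. The following is a rough, hypothetical timing sketch (local temp file with O_DSYNC as a stand-in for sync-write semantics; it is an analogy, not an ESX/NFS measurement):

```python
import os
import tempfile
import time

def time_writes(flags, n=200, size=4096):
    """Time n small writes to a fresh temp file opened with the given
    extra open(2) flags. With O_DSYNC every write waits for stable
    storage, mimicking the per-write penalty of sync NFS traffic."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    f = os.open(path, os.O_WRONLY | flags)
    buf = b"\0" * size
    start = time.time()
    for _ in range(n):
        os.write(f, buf)
    elapsed = time.time() - start
    os.close(f)
    os.unlink(path)
    return elapsed

async_t = time_writes(0)            # buffered writes
sync_t = time_writes(os.O_DSYNC)    # each write committed synchronously
print("buffered: %.4fs  sync: %.4fs" % (async_t, sync_t))
```

On spinning disks the synchronous run is typically orders of magnitude slower, which is exactly the gap a fast dedicated log device is meant to close.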
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS offline ZIL corruption not detected
If I might add my $0.02: it appears that the ZIL is implemented as a kind of circular log buffer. As I understand it, when a corrupt checksum is detected, it is taken to mark the end of the log, but this rather defeats the checksum's original purpose, which is to detect device failure. So we would first need to change this behavior so that the checksum is used only for failure detection. That leaves the question of how to detect the end of the log, which I think could be done with a monotonically incrementing counter on the ZIL entries: once we find an entry whose counter != n+1, we know that the previous block was the last one in the sequence. With checksums freed up to detect device failure, it would then be possible to implement a ZIL scrub, allowing an environment to detect ZIL device degradation before it actually results in a catastrophe.
-- Saso

On 08/26/2010 03:22 PM, Eric Schrock wrote:
> On Aug 26, 2010, at 2:40 AM, StorageConcepts wrote:
>> 1) zil needs to report truncated transactions on zil corruption
> As Neil outlined, this isn't possible while preserving current ZIL performance. There is no way to distinguish the last ZIL block without incurring additional writes for every block. If it's even possible to implement this paranoid-ZIL tunable, are you willing to take a 2-5x performance hit to be able to detect this failure mode?
> - Eric
> --
> Eric Schrock, Fishworks    http://blogs.sun.com/eschrock
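The counter scheme proposed above can be sketched in a few lines. This is a hypothetical illustration of the idea, not ZFS code; the entry layout, the toy checksum, and the function names are all invented for the example:

```python
def checksum(data):
    # Stand-in for a real block checksum such as fletcher4.
    return sum(data) & 0xFFFFFFFF

def replay_log(entries):
    """Replay log entries until the sequence chain breaks.
    A broken sequence number marks the end of the log; a bad
    checksum inside the live log now signals device failure
    instead of being silently treated as end-of-log."""
    replayed = []
    expected_seq = entries[0]["seq"] if entries else 0
    for e in entries:
        if e["seq"] != expected_seq:
            break                       # chain broken: end of log reached
        if checksum(e["data"]) != e["cksum"]:
            raise IOError("ZIL device failure: checksum mismatch "
                          "inside the live log")
        replayed.append(e["data"])
        expected_seq += 1
    return replayed
```

With this split, a scrub pass could walk the whole log verifying checksums without ever confusing corruption with the legitimate end of the chain.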
Re: [zfs-discuss] ZFS offline ZIL corruption not detected
I see, thank you for the clarification. So it is possible to have something equivalent to main-storage self-healing on the ZIL, with a ZIL scrub to activate it? Or is that already implemented as well? (Sorry for asking these obvious questions, but I'm not familiar with the ZFS source code.)
-- Saso

On 08/26/2010 04:31 PM, Darren J Moffat wrote:
> On 26/08/2010 15:08, Saso Kiselkov wrote:
>> If I might add my $0.02: it appears that the ZIL is implemented as a
>> kind of circular log buffer. As I understand it, when a corrupt checksum
> It is NOT circular, since that implies a limited number of entries that get overwritten.
>> is detected, it is taken to be the end of the log, but this kind of
>> defeats the checksum's original purpose, which is to detect device
>> failure. Thus we would first need to change this behavior to only be
>> used for failure detection. This leaves the question of how to detect
>> the end of the log, which I think could be done by using a monotonically
>> incrementing counter on the ZIL entries. Once we find an entry where the
>> counter != n+1, then we know that the block is the last one in the
>> sequence.
> See the comment partway down zil_read_log_block about how we do something pretty much like that for checking the chain of log blocks:
>
>   http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/zil.c#zil_read_log_block
>
> This is the checksum in the BP checksum field. But before we even got there, we checked the ZILOG2 checksum as part of doing the zio (in the zio_checksum_verify() stage):
>
>   http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/zio_checksum.c#zio_checksum_error
>
> A ZILOG2 checksum is an embedded-in-the-block (at the start; the original ZILOG was at the end) version of fletcher4. If that failed, i.e. the block was corrupt, we would have returned an error back through the dsl_read() of the log block.
Re: [zfs-discuss] Disk space on Raidz1 configuration
ZFS and du use binary byte multipliers (1 KiB = 1024 B, etc.), while drive manufacturers use decimal units (1 kB = 1000 B). So your 1.5 TB drives are in fact ~1.36 TiB (binary TB):

  7 x 1.36 TiB = 9.52 TiB; minus 1.36 TiB for parity = 8.16 TiB

-- Saso

On 08/06/2010 01:29 PM, Per Jorgensen wrote:
> I have 7 x 1.5 TB disks in a raidz1 configuration, so the system (as I understand it) uses 1.5 TB (one disk) for parity. But when I use df, the available space in my newly created pool says:
>
>   Filesystem    Size  Used  Avail  Use%  Mounted on
>   bf            8.0T  36K   8.0T   1%    /bf
>
> whereas zpool list says:
>
>   NAME  SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
>   bf    9.50T  292K  9.50T  0%   ONLINE  -
>
> The pool was created with the following command, and compression is set to off:
>
>   zpool create -f bf raidz1 c9t0d0 c9t1d0 c9t2d0 c9t3d0 c9t4d0 c9t5d0 c9t6d0
>
> And when I do some calculation: 7 x 1.5 TB = 10.5 TB, minus 1.5 TB for parity = 9.5 TB. So, my questions:
> 1. Why do I only have 8 TB in my bf pool?
> 2. Why do zpool list and df report different available disk space?
> thanks
> Per Jorgensen
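The conversion above can be checked with a few lines of arithmetic; the numbers line up with the ~9.5T shown by zpool list (raw size) and the ~8T shown by df (usable size after parity):

```python
# Decimal (drive-manufacturer) units vs. binary (ZFS/du) units.
TB = 1000 ** 4    # decimal terabyte, as printed on the drive label
TIB = 1024 ** 4   # binary tebibyte, as counted by ZFS and du

drive_tib = 1.5 * TB / TIB    # one "1.5 TB" drive expressed in TiB
raw_tib = 7 * drive_tib       # raw pool size (what zpool list shows)
usable_tib = 6 * drive_tib    # raidz1 usable space (roughly what df shows)

print(round(drive_tib, 2))    # ~1.36
print(round(raw_tib, 2))      # ~9.55
print(round(usable_tib, 2))   # ~8.19
```

So the "missing" 1.5 TB is simply the decimal-to-binary conversion, and the df/zpool discrepancy is raw vs. parity-adjusted accounting.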
Re: [zfs-discuss] CPU requirements for zfs performance
I didn't mean to imply that I use it for my media storage, just that I occasionally encounter situations where it could be useful.
BR,
-- Saso

On 07/22/2010 11:23 AM, Roy Sigurd Karlsbakk wrote:
> ----- Original Message -----
>> I do encounter situations when I (or somebody from my family)
>> accidentally create multiple copies of photo albums. :-)
> I wouldn't recommend using dedup on this system. Dedup requires lots of RAM or L2ARC, and I don't think it is suitable for your needs. You may want to "svn co http://svn.karlsbakk.net/svn/roy/deduba" and test that script. It looks through a directory and, using MD5 and SHA256, finds identical files. It's somewhat unfinished, but it works.
> Vennlige hilsener / Best regards
> roy
> --
> Roy Sigurd Karlsbakk
> (+47) 97542685
> r...@karlsbakk.net
> http://blogg.karlsbakk.net/
> --
> [In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.]
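The idea behind such a script, grouping files by a content hash to find duplicates, can be sketched in a few lines of Python. This is a generic illustration of the technique, not Roy's deduba script (which also uses MD5; here a single SHA-256 pass is used for simplicity):

```python
import hashlib
import os
from collections import defaultdict

def find_duplicates(root):
    """Walk `root`, hash every file's contents with SHA-256, and
    return the groups of paths that share a hash (i.e. duplicates)."""
    by_hash = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(path, "rb") as f:
                # Read in 1 MiB chunks so large media files don't
                # have to fit in memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            by_hash[h.hexdigest()].append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

For the accidental photo-album copies mentioned above, running this over the album directory would list each set of identical files, leaving the choice of which copies to delete to the user.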
Re: [zfs-discuss] CPU requirements for zfs performance
I do encounter situations when I (or somebody from my family) accidentally create multiple copies of photo albums. :-)
-- Saso

On 07/21/2010 05:20 PM, Edward Ned Harvey wrote:
>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Saso Kiselkov
>> If you plan on using it as a storage server for multimedia data (movies), don't even bother considering compression, as most media files already come heavily compressed. Dedup might still come in handy, though.
> If you're storing movies, I agree compression is a waste. But I think dedup will also be a waste, unless you have multiple copies of the same movie on your disk for some reason.
Re: [zfs-discuss] CR 6880994 and pkg fix
How about running memtest86+ (http://www.memtest.org/) on the machine for a while? It doesn't test the CPU's arithmetic very much, but it stresses the data paths quite a lot. Just a quick suggestion...
-- Saso

Damon Atkins wrote:
> You could try copying the file to /tmp (i.e. swap/RAM) and running a continuous loop of checksums, e.g.:
>
>   while true; do
>       cp libdlpi.so.1 libdlpi.so.1.x
>       A=`sha512sum -b libdlpi.so.1.x | awk '{print $1}'`
>       [ "$A" != "<what it should be>" ] && break
>       rm libdlpi.so.1.x
>       sleep 1
>   done; date
>
> Assuming the file never goes to swap, this would tell you if something on the motherboard is playing up. I have seen a CPU randomly set a byte to 0 which should not have been 0; I think it was an L1 or L2 cache problem.
[zfs-discuss] Booting OpenSolaris on ZFS root on Sun Netra 240
Hi,

I'm kind of stuck trying to get my aging Netra 240 machine to boot OpenSolaris. The live CD and installation worked perfectly, but when I reboot and try to boot from the installed disk, I get:

  Rebooting with command: boot disk0
  Boot device: /p...@1c,60/s...@2/d...@0,0 File and args:
  The file just loaded does not appear to be executable.

I suspect this is because my OBP can't boot from a ZFS root (OpenBoot 4.22.19). Is there a way to work around this?

Regards,
-- Saso
Re: [zfs-discuss] ZFS write bursts cause short app stalls
Just tried it, and it didn't help :-(.
Regards,
-- Saso

Brent Jones wrote:
> On Wed, Jan 6, 2010 at 2:40 PM, Saso Kiselkov <skisel...@gmail.com> wrote:
>> Buffering the writes in the OS would work for me as well; I've got RAM to spare. Slowing down rm is perhaps one way to go, but definitely not a real solution. On rare occasions I could still get lockups, leading to screwed-up recordings, and if there's one thing people don't like about IPTV, it's packet loss. Eliminating even the possibility of packet loss completely would be the best way to go, I think.
>> Regards,
>> -- Saso
> I shouldn't dare suggest this, but what about disabling the ZIL? Since this sounds like transient data to begin with, any risks would be pretty low, I'd imagine.
Re: [zfs-discuss] ZFS write bursts cause short app stalls
I've encountered a new problem on the opposite end of my app: the write() calls to disk sometimes block for a terribly long time (5-10 seconds) when I start deleting stuff on the filesystem where my recorder processes are writing. Looking at iostat, I can see that the disk load is strongly uneven. With a lowered zfs_txg_timeout=1 I get normal writes every second, but when I start deleting stuff (e.g. rm -r *), huge load spikes appear from time to time, even to the level of blocking all processes writing to the filesystem, filling up the network input buffer, and dropping packets. Is there a way to increase the write I/O priority, or to increase the write buffer in ZFS, so that write()s won't block?
Regards,
-- Saso

Saso Kiselkov wrote:
> Ok, I figured out that apparently I was the idiot in this story, again. I forgot to set SO_RCVBUF higher on my network sockets, so that's why I was dropping input packets. The zfs_txg_timeout=1 setting is still necessary (or else drops occur when committing data to disk), but by increasing the network input buffer sizes I seem to have cut input packet loss to zero. Thanks for all the valuable advice!
> Regards,
Re: [zfs-discuss] ZFS write bursts cause short app stalls
I'm aware of the theory and realize that deleting stuff requires writes. I'm also running the latest b130 and write data to disk in large 128k chunks. What I was wondering is whether there is a mechanism to lower the I/O scheduling priority of a given process (e.g. of the rm command) in a manner similar to CPU scheduling priority. Another solution would be to increase the maximum size of the ZFS write buffer, so that writes would not block.

What I'd specifically like to avoid is buffering writes in the recorder process. Besides being complicated to do (the process periodically closes and reopens several output files at specific moments in time, and keeping them in sync is a bit hairy), I need the written data to appear in the filesystem very soon after being received from the network. The reason is that this is streaming media data which a user can immediately start playing back while it is being recorded. It's crucial that the user be able to follow the real-time recording with at most a 1-2 second delay (in fact, at the moment I can get down to 1 second behind live TV). If I buffered writes for up to 10 seconds in user space, other playback processes could fail by running out of data.
Regards,
-- Saso

Bob Friesenhahn wrote:
> On Wed, 6 Jan 2010, Saso Kiselkov wrote:
>> I've encountered a new problem on the opposite end of my app: the write() calls to disk sometimes block for a terribly long time (5-10 seconds) when I start deleting stuff on the filesystem where my recorder processes are writing. Looking at iostat, I can see that the disk load is strongly uneven. With a lowered zfs_txg_timeout=1 I get normal writes every second, but when I start deleting stuff (e.g. rm -r *), huge load spikes appear from time to time, even to the level of blocking all processes writing to the filesystem, filling up the network input buffer, and dropping packets.
>> Is there a way that I can increase the write I/O priority, or increase the write buffer in ZFS, so that write()s won't block?
> Deleting stuff results in many small writes to the pool in order to free up blocks and update metadata. It is one of the most challenging tasks for any filesystem. It seems that the most recent OpenSolaris development builds have added a new scheduling class to limit the impact of such load spikes; I am eagerly looking forward to being able to use it.
> It is difficult for your application to do much if the network device driver fails to work, but your application can do some of its own buffering and use multithreading so that even a long delay can be handled. Use of the asynchronous write APIs may also help. Writes should be blocked up to the size of the zfs block (e.g. 128K), and also aligned to the zfs block if possible.
> Bob
> --
> Bob Friesenhahn
> bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
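Bob's suggestion of application-side buffering plus multithreading can be sketched as follows. This is a generic illustration under assumed names (BufferedWriter, a bounded queue, a sentinel for shutdown), not the recorder's actual code; the point is that only the drain thread ever waits on a slow write():

```python
import queue
import threading

class BufferedWriter:
    """Queue chunks in memory and drain them from a separate thread,
    so a stalled filesystem write never blocks the thread receiving
    data from the network."""

    def __init__(self, f, maxsize=256):
        self.f = f
        self.q = queue.Queue(maxsize)     # bounded: caps memory use
        self.t = threading.Thread(target=self._drain, daemon=True)
        self.t.start()

    def write(self, chunk):
        # Returns quickly unless the queue is full (i.e. the disk has
        # fallen far behind), in which case backpressure applies.
        self.q.put(chunk)

    def _drain(self):
        while True:
            chunk = self.q.get()
            if chunk is None:             # shutdown sentinel
                break
            self.f.write(chunk)           # may stall; only this thread waits

    def close(self):
        self.q.put(None)
        self.t.join()
        self.f.flush()
```

The tension Saso describes remains, of course: the deeper the queue, the longer recorded data can lag behind what playback clients expect to find on disk.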
Re: [zfs-discuss] ZFS write bursts cause short app stalls
Be sure also to update to the latest dev b130 release, as it includes a smoother scheduling class for the ZFS threads. If the upgrade breaks anything, you can always boot back into the old boot environment from before the upgrade.
Regards,
-- Saso

Bill Werner wrote:
> Thanks for this thread! I was just coming here to discuss this very same problem. I'm running 2009.06 on a Q6600 with 8 GB of RAM. I have a Windows system writing multiple OTA HD video streams via CIFS to the 2009.06 system running Samba, and multiple clients reading back other HD video streams. The write client never skips a beat, but the read clients have constant problems getting data when the burst writes occur. I am now going to try the txg_timeout and see if that helps. It would be nice if these tunables were settable on a per-pool basis, though.
Re: [zfs-discuss] ZFS write bursts cause short app stalls
Ok, I figured out that apparently I was the idiot in this story, again. I forgot to set SO_RCVBUF higher on my network sockets, so that's why I was dropping input packets. The zfs_txg_timeout=1 setting is still necessary (or else drops occur when committing data to disk), but by increasing the network input buffer sizes I seem to have cut input packet loss to zero. Thanks for all the valuable advice!
Regards,
-- Saso

Saso Kiselkov wrote:
> I tried removing the flow, and subjectively packet loss occurs a bit less often, but it is still happening. Right now I'm trying to figure out whether it's due to the load on the server or not. I've left only about 15 concurrent recording instances, producing 8% load on the system. If the packet loss still occurs, I guess I'll have to disregard the loss measurements as irrelevant, since at such a load the server should not be dropping packets at all... I guess.
> Regards,
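Raising SO_RCVBUF, as described above, is done per socket at creation time. A minimal sketch (the function name and default size are illustrative; the kernel may clamp the request to its per-socket maximum, udp_max_buf on Solaris):

```python
import socket

def make_udp_receiver(port, rcvbuf=4 << 20):
    """Create a UDP socket with an enlarged receive buffer, so that
    packet bursts survive short stalls in the reading thread instead
    of overflowing the default-sized kernel buffer."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Request a larger receive buffer; the OS silently clamps this to
    # its configured maximum (udp_max_buf on Solaris, rmem_max on Linux).
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    s.bind(("", port))
    return s
```

Checking the effective value afterwards with getsockopt(SO_RCVBUF) is worthwhile, since a silent clamp is exactly the kind of thing that makes this bug hard to spot.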
Re: [zfs-discuss] ZFS write bursts cause short app stalls
I tried removing the flow, and subjectively packet loss occurs a bit less often, but it is still happening. Right now I'm trying to figure out whether it's due to the load on the server or not. I've left only about 15 concurrent recording instances, producing 8% load on the system. If the packet loss still occurs, I guess I'll have to disregard the loss measurements as irrelevant, since at such a load the server should not be dropping packets at all... I guess.
Regards,
-- Saso

Robert Milkowski wrote:
> I included networking-discuss@
> On 28/12/2009 15:50, Saso Kiselkov wrote:
>> Thank you for the advice. After trying flowadm the situation improved somewhat, but I'm still getting occasional packet overflows (10-100 packets about every 10-15 minutes). This is somewhat unnerving, because I don't know how to track it down. Here are the flowadm settings I use:
>>
>>   # flowadm show-flow iptv
>>   FLOW  LINK     IPADDR           PROTO  LPORT  RPORT  DSFLD
>>   iptv  e1000g1  LCL:224.0.0.0/4  --     --     --     --
>>
>>   # flowadm show-flowprop iptv
>>   FLOW  PROPERTY  VALUE  DEFAULT  POSSIBLE
>>   iptv  maxbw     --     --       ?
>>   iptv  priority  high   --       high
>>
>> I also tuned udp_max_buf to 256 MB. All recording processes are boosted to the RT priority class, and zfs_txg_timeout=1 forces the system to commit data to disk in smaller, more manageable chunks. Is there any further tuning you could recommend?
>> Regards,
Re: [zfs-discuss] ZFS write bursts cause short app stalls
I progressed with testing a bit further and found that I was hitting another scheduling bottleneck: the network. While the write burst was running and ZFS was committing data to disk, the server was dropping incoming UDP packets (netstat -s | grep udpInOverflows grew by about 1000-2000 packets during every write burst). To work around that, I had to boost the scheduling priority of the recorder processes to the real-time class, and I also had to lower zfs_txg_timeout=1 (there was still minor packet drop after just doing priocntl on the processes) to even out the CPU load. Any ideas on why ZFS should so thoroughly thrash the network layer and make it drop incoming packets?
Regards,
-- Saso

Robert Milkowski wrote:
> On 26/12/2009 12:22, Saso Kiselkov wrote:
>> Thank you, the post you mentioned helped me move a bit forward. I tried putting:
>>
>>   zfs:zfs_txg_timeout = 1
>>
> btw: you can tune it on a live system without needing to reboot:
>
>   mi...@r600:~# echo zfs_txg_timeout/D | mdb -k
>   zfs_txg_timeout:
>   zfs_txg_timeout:        30
>   mi...@r600:~# echo zfs_txg_timeout/W0t1 | mdb -kw
>   zfs_txg_timeout:        0x1e    =       0x1
>   mi...@r600:~# echo zfs_txg_timeout/D | mdb -k
>   zfs_txg_timeout:
>   zfs_txg_timeout:        1
>   mi...@r600:~# echo zfs_txg_timeout/W0t30 | mdb -kw
>   zfs_txg_timeout:        0x1     =       0x1e
>   mi...@r600:~# echo zfs_txg_timeout/D | mdb -k
>   zfs_txg_timeout:
>   zfs_txg_timeout:        30
Re: [zfs-discuss] ZFS write bursts cause short app stalls
Thank you for the advice. After trying flowadm the situation improved somewhat, but I'm still getting occasional packet overflows (10-100 packets about every 10-15 minutes). This is somewhat unnerving, because I don't know how to track it down. Here are the flowadm settings I use:

  # flowadm show-flow iptv
  FLOW  LINK     IPADDR           PROTO  LPORT  RPORT  DSFLD
  iptv  e1000g1  LCL:224.0.0.0/4  --     --     --     --

  # flowadm show-flowprop iptv
  FLOW  PROPERTY  VALUE  DEFAULT  POSSIBLE
  iptv  maxbw     --     --       ?
  iptv  priority  high   --       high

I also tuned udp_max_buf to 256 MB. All recording processes are boosted to the RT priority class, and zfs_txg_timeout=1 forces the system to commit data to disk in smaller, more manageable chunks. I need all IP multicast input traffic on e1000g1 to get the highest possible priority. Is there any further tuning you could recommend?
Regards,
-- Saso

Markus Kovero wrote:
> Hi, try adding a flow for the traffic you want prioritized. I noticed that OpenSolaris tends to drop network connectivity without priority flows defined; I believe this is a feature introduced by Crossbow itself. flowadm is your friend, that is. I found this particularly annoying when monitoring servers with ICMP ping, as high load causes checks to fail and therefore triggers unnecessary alarms.
> Yours
> Markus Kovero
>
> -----Original Message-----
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Saso Kiselkov
> Sent: 28 December 2009 15:25
> To: zfs-discuss@opensolaris.org
> Subject: Re: [zfs-discuss] ZFS write bursts cause short app stalls
>
> I progressed with testing a bit further and found that I was hitting another scheduling bottleneck: the network. While the write burst was running and ZFS was committing data to disk, the server was dropping incoming UDP packets (netstat -s | grep udpInOverflows grew by about 1000-2000 packets during every write burst). To work around that, I had to boost the scheduling priority of the recorder processes to the real-time class, and I also had to lower zfs_txg_timeout=1 (there was still minor packet drop after just doing priocntl on the processes) to even out the CPU load. Any ideas on why ZFS should so thoroughly thrash the network layer and make it drop incoming packets?
> Regards,
Re: [zfs-discuss] ZFS write bursts cause short app stalls
Thanks for the mdb syntax; I wasn't sure how to set it with mdb at runtime, which is why I used /etc/system. I was quite intrigued to find that the Solaris kernel was in fact designed to be tuned at runtime through a generic debugging mechanism, rather than, as in other traditional kernels, through a defined kernel-settings interface (sysctl comes to mind). Anyway, upgrading to b130 fixed my issue, and I hope that by the time we start selling this product, OpenSolaris 2010.02 will be out, so that I can tell people to just grab the latest stable OpenSolaris release rather than go to a development branch or tune kernel parameters to get the software working as it should.
Regards,
-- Saso

Robert Milkowski wrote:
> On 26/12/2009 12:22, Saso Kiselkov wrote:
>> Thank you, the post you mentioned helped me move a bit forward. I tried putting:
>>
>>   zfs:zfs_txg_timeout = 1
>>
> btw: you can tune it on a live system without needing to reboot:
>
>   mi...@r600:~# echo zfs_txg_timeout/D | mdb -k
>   zfs_txg_timeout:
>   zfs_txg_timeout:        30
>   mi...@r600:~# echo zfs_txg_timeout/W0t1 | mdb -kw
>   zfs_txg_timeout:        0x1e    =       0x1
>   mi...@r600:~# echo zfs_txg_timeout/D | mdb -k
>   zfs_txg_timeout:
>   zfs_txg_timeout:        1
>   mi...@r600:~# echo zfs_txg_timeout/W0t30 | mdb -kw
>   zfs_txg_timeout:        0x1     =       0x1e
>   mi...@r600:~# echo zfs_txg_timeout/D | mdb -k
>   zfs_txg_timeout:
>   zfs_txg_timeout:        30
Re: [zfs-discuss] ZFS write bursts cause short app stalls
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 The application I'm working on is a kind of large-scale network-PVR system for our IPTV services. It records all running TV channels in a X-hour carrousel (typically 24 or 48-hours), retaining only those bits which users have marked as being interesting to them. The current setup I'm doing development on is a small 12TB array, future deployment is planned on several 96TB X4540 machines. I agree that I kind of misused the term `sequential' - it really is 77 concurrent sequential writes. However, as I explained, I/O is not the bottleneck here, as the array is capable of writes around 600MBytes/s, and the write load I'm putting on it is around 55MBytes/s (430Mbit/s). The problem is, as Brent explained, that as soon as the OS decides it wants to write the transaction group to disk, it totally ignores all other time-critical activity in the system and focuses on just that, causing an input poll() stall on all network sockets. What I'd need to do is force it to commit transactions to disk more often so as to even the load out over a longer period of time, to bring the CPU usage spikes down to a more manageable and predictable level. Regards, - -- Saso Tim Cook wrote: On Fri, Dec 25, 2009 at 11:43 PM, Brent Jones br...@servuhome.net wrote: Hang on... if you've got 77 concurrent threads going, I don't see how that's a sequential I/O load. To the backend storage it's going to look like the equivalent of random I/O. I'd also be surprised to see 12 1TB disks supporting 600MB/sec throughput and would be interested in hearing where you got those numbers from. Is your video capture doing 430MB or 430Mbit? -- --Tim Think he said 430Mbit/sec, which if these are security cameras, would be a good sized installation (30+ cameras). We have a similar system, albeit running on Windows. Writing about 400Mbit/sec using just 6, 1TB SATA drives is entirely possible, and working quite well on our system without any frame loss or much latency. 
> Once again, Mb or MB? They're two completely different numbers. As for
> getting 400Mbit out of 6 SATA drives, that's not really impressive at
> all. If you're saying you got 400MB, that's a different story entirely,
> and while possible with sequential I/O and a proper raid setup, it isn't
> happening with random.

> The write lag is noticeable with ZFS, however, because of the behavior
> of the transaction group writes. If you have a big write that needs to
> land on disk, it seems all other I/O, CPU and niceness is thrown out the
> window in favor of getting all that data on disk. I was on a watch list
> for a ZFS I/O scheduler bug with my paid Solaris support; I'll try to
> find that bug number, but I believe some improvements were made in
> builds 129 and 130.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS write bursts cause short app stalls
Would an upgrade to the development repository of 2010.02 do the same? I'd like to avoid having to do a complete reinstall, since I've already got quite a bit of custom software in various places on the system, and recompiling and fine-tuning would take me another 1-2 days.

Regards,
--
Saso

Leonid Kogan wrote:
> Try b130. http://genunix.org/
> Cheers,
> LK
>
> On 12/26/2009 12:59 AM, Saso Kiselkov wrote:
>> Hi,
>> I tried it and I got the following error message:
>>
>> # zfs set logbias=throughput content
>> cannot set property for 'content': invalid property 'logbias'
>>
>> Is it because I'm running some older version which does not have this
>> feature? (2009.06)
>>
>> Regards,
>> --
>> Saso
>>
>> Leonid Kogan wrote:
>>> Hi there,
>>> Try to:
>>> zfs set logbias=throughput yourdataset
>>> Good luck,
>>> LK
Re: [zfs-discuss] ZFS write bursts cause short app stalls
Brent Jones wrote:
> On Fri, Dec 25, 2009 at 9:56 PM, Tim Cook <t...@cook.ms> wrote:
>> [...]
>>
>> Once again, Mb or MB? They're two completely different numbers. As for
>> getting 400Mbit out of 6 SATA drives, that's not really impressive at
>> all. If you're saying you got 400MB, that's a different story entirely,
>> and while possible with sequential I/O and a proper raid setup, it
>> isn't happening with random.
>
> Mb, megabit. 400 megabit is not terribly high; a single SATA drive could
> write that 24/7 without breaking a sweat. Which is why he is reporting
> his issue. Sequential or random, any modern system should be able to
> perform that task without causing disruption to other processes running
> on the system (if Windows can, Solaris/ZFS most definitely should be
> able to). I have a similar workload on my X4540s, streaming backups from
> multiple systems at a time. These are very high-end machines: dual
> quad-core Opterons, 64GB RAM, 48x 1TB drives in 5-6 disk RAIDZ vdevs.
> The write stalls have been a significant problem since ZFS came out, and
> haven't really been addressed in an acceptable fashion yet, though work
> has been done to improve it.
> I'm still trying to find the case number I have open with Sunsolve, it
> was for exactly this issue, and I believe the fix was to add dozens more
> classes to the scheduler, to allow more fair disk I/O and overall
> niceness on the system when ZFS commits a transaction group.

Wow, if there is a production-release solution to the problem, that would be great! Reading the mailing list, I had almost given up hope of being able to work around this issue without upgrading to the latest bleeding-edge development version.

Regards,
--
Saso
Re: [zfs-discuss] ZFS write bursts cause short app stalls
Thank you, the post you mentioned helped me move a bit forward. I tried putting:

set zfs:zfs_txg_timeout = 1

in /etc/system, and now I'm getting a much more even write load (a burst every 5 seconds), which no longer causes any significant poll() stalling. So far I have failed to find the timer in the ZFS source code which causes the 5-second interval instead of what I asked for (1 second).

Another thing left on my mind is why I'm still getting a very slight burst every 60 seconds (causing a poll() delay of around 20-30ms, instead of the usual 0-2ms). It's not that big a problem; I'm just curious as to where it's coming from. I assume some 60-second timer is firing, but I don't know where.

Regards,
--
Saso

Fajar A. Nugraha wrote:
> On Sat, Dec 26, 2009 at 4:10 PM, Saso Kiselkov <skisel...@gmail.com> wrote:
>> [...]
>> Wow, if there were a production-release solution to the problem, that
>> would be great!
>
> Have you checked this thread?
> http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg28704.html
>
>> Reading the mailing list I almost gave up hope that I'd be able to work
>> around this issue without upgrading to the latest bleeding-edge
>> development version.
>
> Isn't opensolaris already bleeding edge?
Re: [zfs-discuss] ZFS write bursts cause short app stalls
Thanks for the advice. I did an in-place upgrade to the latest development b130 release, and it seems that the change in scheduling classes for the kernel writer threads worked (without even having to fiddle with logbias) -- now I'm just getting small delays every 60 seconds (on the order of 20-30ms). I'm not sure those have anything to do with ZFS, though; they happen outside of the write bursts.

Thank you all for the valuable advice!

Regards,
--
Saso

Richard Elling wrote:
> On Dec 26, 2009, at 1:10 AM, Saso Kiselkov wrote:
>> [...]
>> Sequential or random, any modern system should be able to perform that
>> task without causing disruption to other processes running on the
>> system (if Windows can, Solaris/ZFS most definitely should be able to).
>> I have a similar workload on my X4540s, streaming backups from multiple
>> systems at a time. These are very high-end machines: dual quad-core
>> Opterons, 64GB RAM, 48x 1TB drives in 5-6 disk RAIDZ vdevs. The write
>> stalls have been a significant problem since ZFS came out, and haven't
>> really been addressed in an acceptable fashion yet, though work has
>> been done to improve it.
>
> PSARC case 2009/615, "System Duty Cycle Scheduling Class and ZFS IO
> Observability", was integrated into b129. This creates a scheduling
> class for ZFS IO and automatically places the zio threads into that
> class. This is not really an earth-shattering change; Solaris has had a
> very flexible scheduler for almost 20 years now. Another example: on a
> desktop, the application which has mouse focus runs in the interactive
> scheduling class. This is completely transparent to most folks, and
> there is no tweaking required. Also fixed in b129 is BUG/RFE 6881015,
> "ZFS write activity prevents other threads from running in a timely
> manner", which is related to the above.
>
>> Wow, if there were a production-release solution to the problem, that
>> would be great! Reading the mailing list I almost gave up hope that I'd
>> be able to work around this issue without upgrading to the latest
>> bleeding-edge development version.
>
> Changes have to occur someplace first. In the OpenSolaris world, the
> changes occur first in the dev train and are then back-ported to
> Solaris 10 (sometimes, not always).
> You should try the latest build first -- be sure to follow the release
> notes. Then, if the problem persists, you might consider tuning
> zfs_txg_timeout, which can be done on a live system.
> -- richard
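[Editor's note: the live tuning Richard mentions was commonly done with mdb on OpenSolaris/Solaris builds of that era; the commands below are the standard documented approach, not from the original thread, and the 1-second value is illustrative.]

```sh
# Lower zfs_txg_timeout to 1 second on the running kernel.
# The value is in seconds; 0t marks a decimal literal in mdb. Run as root.
echo zfs_txg_timeout/W0t1 | mdb -kw

# Read the current value back, in decimal:
echo zfs_txg_timeout/D | mdb -k

# To persist the setting across reboots, add this line to /etc/system:
#   set zfs:zfs_txg_timeout = 1
```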
[zfs-discuss] ZFS write bursts cause short app stalls
I've started porting a video streaming application to OpenSolaris on ZFS, and am hitting some pretty weird performance issues. The thing I'm trying to do is run 77 concurrent video capture processes (roughly 430 Mbit/s in total), all writing into separate files on a 12TB J4200 storage array. The disks in the array are arranged into a single RAID-0 ZFS volume (though I've tried different RAID levels; none helped). CPU performance is not an issue (barely hitting 35% utilization on a single-CPU quad-core X2250). I/O bottlenecks can also be ruled out, since the storage array's sequential write performance is around 600 MB/s.

The problem is the bursty behavior of ZFS writes. All the capture processes do, in essence, is poll() on a socket and then read() and write() any available data from it to a file. The poll() call is done with a timeout of 250ms, expecting that if no data arrives within 0.25 seconds, the input is dead and recording stops (I tried increasing this value, but the problem still arises, although not as frequently). When ZFS decides that it wants to commit a transaction group to disk (every 30 seconds), the system stalls for a short amount of time, and depending on the number of capture processes currently running, the poll() call (which usually blocks for 1-2ms) takes on the order of hundreds of ms, sometimes even longer.

I figured that I might be able to resolve this by lowering the txg timeout to something like 1-2 seconds (I need ZFS to write as soon as data arrives, since it will likely never be overwritten), but I couldn't find any tunable parameter for it anywhere on the net. On FreeBSD, I think this can be done via the vfs.zfs.txg_timeout sysctl. A glimpse into the source at http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/txg.c on line 40 made me worry that somebody may have hard-coded this value into the kernel, in which case I'd be pretty much screwed on opensolaris.
Any help would be greatly appreciated.

Regards,
--
Saso
Re: [zfs-discuss] ZFS write bursts cause short app stalls
Hi,

I'm not sure what b130 means; I'm fairly new to OpenSolaris. How do I find out? As for the OS version, it is OpenSolaris 2009.06.

Regards,
--
Saso

Richard Elling wrote:
> On Dec 25, 2009, at 9:57 AM, Saso Kiselkov wrote:
>> [...]
>
> There have been some changes recently, including one in b130 that might
> apply to this workload. What version of the OS are you running? If not
> b130, try b130.
> -- richard
Re: [zfs-discuss] ZFS write bursts cause short app stalls
Hi,

I tried it and I got the following error message:

# zfs set logbias=throughput content
cannot set property for 'content': invalid property 'logbias'

Is it because I'm running some older version which does not have this feature? (2009.06)

Regards,
--
Saso

Leonid Kogan wrote:
> Hi there,
> Try to:
> zfs set logbias=throughput yourdataset
> Good luck,
> LK