Re: [OpenAFS] Re: accessing R/O volume becomes slow
On 11/27/2014 01:11 PM, Stephan Wiesand wrote:
> On 27 Nov 2014, at 11:26, Hans-Werner Paulsen h...@mpa-garching.mpg.de wrote:
>> Yesterday, on another machine I created and deleted 4 million files on AFS. The number of afs_inode_cache slabs grew from 1 million to 5 million. Today there are still 5 million entries.
>
> It should shrink when there's memory pressure. If you're still worried, there's the -disable-dynamic-vcaches switch for afsd.

On my desktop PC (Linux 3.16.5 x86_64, OpenAFS 1.6.10) I set the -disable-dynamic-vcaches option; the -stat option has a value of 65536. When I create 100,000 files, I see 100,000 more afs_inode_cache slab objects. But the fileserver is seeing this option: there are only 65253 nFEs and 65253 nCBs (4194304 nblks). Without -disable-dynamic-vcaches the number of CBs is about the number of created files. And if I try to create more files than nCBs on the fileserver, the fileserver (dafileserver) hangs for about 15 minutes (dafileserver at 100-120% CPU!), and I get a connection timeout on the client.

___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info
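The before/after slab counts discussed in this message can be gathered by hand, but a small script makes the comparison repeatable. The following is a minimal sketch in Python, assuming the usual /proc/slabinfo layout in which each cache line begins with the cache name followed by the active object count, the total object count, and the object size. The names parse_slabinfo, slab_growth, and SlabStats are hypothetical helpers introduced for illustration; they are not part of OpenAFS.

```python
from typing import Dict, NamedTuple


class SlabStats(NamedTuple):
    active_objs: int  # objects currently in use
    num_objs: int     # objects allocated in total
    objsize: int      # bytes per object


def parse_slabinfo(text: str) -> Dict[str, SlabStats]:
    """Parse the text of /proc/slabinfo into {cache_name: SlabStats}.

    Assumes the first four whitespace-separated fields of each cache line
    are: name, active_objs, num_objs, objsize.
    """
    stats: Dict[str, SlabStats] = {}
    for line in text.splitlines():
        # Skip blank lines, the version line, and the column-header comment.
        if not line.strip() or line.startswith("slabinfo") or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        stats[fields[0]] = SlabStats(int(fields[1]), int(fields[2]), int(fields[3]))
    return stats


def slab_growth(before: str, after: str, cache: str = "afs_inode_cache") -> int:
    """Return how many active objects `cache` gained between two snapshots."""
    return (parse_slabinfo(after)[cache].active_objs
            - parse_slabinfo(before)[cache].active_objs)
```

To use it, read /proc/slabinfo (this normally requires root) into a string before and after the file-creation test, then pass both strings to slab_growth.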
Re: [OpenAFS] Re: accessing R/O volume becomes slow
On 11/26/2014 09:15 PM, Andrew Deason wrote:
> On Wed, 26 Nov 2014 10:51:00 +0100 Hans-Werner Paulsen h...@mpa-garching.mpg.de wrote:
>> Checking the machine I see more than 5 million of afs_inode_cache slab entries. Is this normal? Any hint how to proceed?
>
> That's not unusual if you are accessing a lot of files (say, about 5 million recently accessed). But having a lot of vcaches in memory can cause certain operations to be slow; there was a fix just added in 1.6.10 to improve speed for a background cleanup process with lots of files (well, and PAGs): 94f1d4.

Yesterday, on another machine I created and deleted 4 million files on AFS. The number of afs_inode_cache slabs grew from 1 million to 5 million. Today there are still 5 million entries.
Re: [OpenAFS] Re: accessing R/O volume becomes slow
On 27 Nov 2014, at 11:26, Hans-Werner Paulsen h...@mpa-garching.mpg.de wrote:
> On 11/26/2014 09:15 PM, Andrew Deason wrote:
>> On Wed, 26 Nov 2014 10:51:00 +0100 Hans-Werner Paulsen h...@mpa-garching.mpg.de wrote:
>>> Checking the machine I see more than 5 million of afs_inode_cache slab entries. Is this normal? Any hint how to proceed?
>>
>> That's not unusual if you are accessing a lot of files (say, about 5 million recently accessed). But having a lot of vcaches in memory can cause certain operations to be slow; there was a fix just added in 1.6.10 to improve speed for a background cleanup process with lots of files (well, and PAGs): 94f1d4.
>
> Yesterday, on another machine I created and deleted 4 million files on AFS. The number of afs_inode_cache slabs grew from 1 million to 5 million. Today there are still 5 million entries.

It should shrink when there's memory pressure. If you're still worried, there's the -disable-dynamic-vcaches switch for afsd.
[OpenAFS] Re: accessing R/O volume becomes slow
On Wed, 26 Nov 2014 10:51:00 +0100 Hans-Werner Paulsen h...@mpa-garching.mpg.de wrote:
> this is on Linux 3.14.8 x86_64, and OpenAFS 1.6.9. The machine is running normally for several months, and then accessing a specific R/O volume (e.g. ls -lR large_volume) becomes slow.

Do you mean it's slow when you hit the net, or even when you expect everything to be cached? (That is, if you run ls -lR twice in a row, does it still remain slow?)

I also echo Ben's suggestion to try other volumes on the same server. Try to isolate if it's stuff on the server that's causing the problem, or the specific partition, or just that one volume. Or maybe it could be a specific dir somewhere in the volume.

> Checking the machine I see more than 5 million of afs_inode_cache slab entries. Is this normal? Any hint how to proceed?

That's not unusual if you are accessing a lot of files (say, about 5 million recently accessed). But having a lot of vcaches in memory can cause certain operations to be slow; there was a fix just added in 1.6.10 to improve speed for a background cleanup process with lots of files (well, and PAGs): 94f1d4.

Other information that could be gathered: fstrace data (but if data is going by too quickly, it can be hard to get useful data out of this), or 'strace' syscall timing information (to see what syscalls are slow), or a network dump, if you are hitting the net in the cases you're talking about; that could help show if it's the client or server that's being slow (when comparing a 'success' run vs a 'slow' run).

Traces like that are hard to look at when you have a ton of data to sort through, but it's still feasible to compare timings from a 'slow' run to a 'fast' run to try to see if the speed difference is coming from a particular place.

-- 
Andrew Deason
adea...@sinenomine.net
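The suggestion to compare syscall timings from a 'slow' run against a 'fast' run can be sketched in a few lines of Python. This assumes the traces were captured with strace's -T option, which appends the time spent in each call as a trailing <seconds> field, and that no timestamp options were used. The function names syscall_totals and compare_runs are hypothetical, introduced only for this illustration.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Matches a completed call such as:
#   openat(AT_FDCWD, "dir", O_RDONLY) = 3 <0.000042>
# An optional leading PID (as printed with -f) is tolerated. Lines marked
# "<unfinished ...>" or "<... resumed>" do not match and are skipped.
_LINE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\w+)\(.*<(\d+\.\d+)>\s*$")


def syscall_totals(trace: str) -> Dict[str, float]:
    """Sum the time spent per syscall name in `strace -T` output."""
    totals: Dict[str, float] = defaultdict(float)
    for line in trace.splitlines():
        match = _LINE.match(line)
        if match:
            totals[match.group(1)] += float(match.group(2))
    return dict(totals)


def compare_runs(fast: str, slow: str) -> List[Tuple[str, float]]:
    """Return (syscall, extra_seconds_in_slow_run), largest difference first."""
    fast_totals = syscall_totals(fast)
    slow_totals = syscall_totals(slow)
    names = set(fast_totals) | set(slow_totals)
    diffs = [(n, slow_totals.get(n, 0.0) - fast_totals.get(n, 0.0)) for n in names]
    return sorted(diffs, key=lambda item: item[1], reverse=True)
```

The syscalls at the top of the list returned by compare_runs are where the slow run spent its additional time, which narrows down where to look in the much larger raw trace.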