[OpenAFS] An Example of OpenAFS in Production

2013-06-14 Thread Ken Elkabany
Hi everyone, For the benefit of the OpenAFS community, I wanted to share a blog post describing how we use AUFS, LXC, and OpenAFS together to efficiently manage and deploy software dependencies across thousands of machines. We do not reference OpenAFS by name, but instead call it our DFS. You

Re: [OpenAFS] Re: Kernel NULL pointer dereference

2012-04-20 Thread Ken Elkabany
On Fri, Apr 20, 2012 at 10:30 AM, Andrew Deason adea...@sinenomine.net wrote: On Thu, 19 Apr 2012 18:55:08 -0700 Ken Elkabany k...@elkabany.com wrote: We have 2 OpenAFS servers running 1.4.14. We have many clients that we just switched over to 1.6.1pre1. Starting earlier today, we started

[OpenAFS] Kernel NULL pointer dereference

2012-04-19 Thread Ken Elkabany
We have 2 OpenAFS servers running 1.4.14. We have many clients that we just switched over to 1.6.1pre1. Starting earlier today, we started getting NULL pointer dereferences, which have been completely hosing the clients. The client machines hang on any call that deals with AFS, whether it's ls /,

[OpenAFS] Fallback Time with Multiple RO Sites

2012-04-18 Thread Ken Elkabany
Hi, We have two RO sites for a single RW volume. One sits on the same server as the RW, the other on a different fileserver. If we intentionally take down the fileserver with the RW volume and one of the RO sites, and then try to access the volume through a mount point in the RO cell, it

Re: [OpenAFS] Re: Fallback Time with Multiple RO Sites

2012-04-18 Thread Ken Elkabany
Yes, you're correct. Upon retrying it, it took 56.7 seconds. Thanks for the explanation. We'll try out the sysctl, and fs setserverprefs. On Wed, Apr 18, 2012 at 4:04 PM, Andrew Deason adea...@sinenomine.net wrote: On Wed, 18 Apr 2012 15:45:58 -0700 Ken Elkabany k...@elkabany.com wrote

Re: [OpenAFS] Re: ProbeUuid for host failed

2012-04-03 Thread Ken Elkabany
On Tue, Apr 3, 2012 at 10:25 AM, Andrew Deason adea...@sinenomine.net wrote: On Mon, 2 Apr 2012 19:04:19 -0700 Ken Elkabany k...@elkabany.com wrote: Over time these errors become more and more frequent. The problem is that the client who hits this issue will experience a 5-10s delay

[OpenAFS] Fileserver machine freezes for 4 seconds every 12 seconds

2012-04-03 Thread Ken Elkabany
We're noticing an odd behavior while SSH-ed into the file servers. Every 12 seconds, the fileserver and volserver hit 100% CPU usage, and our SSH terminals freeze for about 4 seconds. While we often have 200+ clients actively using our two fileservers, this occurs even when we have only about 40.

Re: [OpenAFS] Re: ProbeUuid for host failed

2012-04-03 Thread Ken Elkabany
On Tue, Apr 3, 2012 at 8:42 PM, Andrew Deason adea...@sinenomine.net wrote: On Tue, 3 Apr 2012 19:04:03 -0700 Ken Elkabany k...@elkabany.com wrote: 1.6.0pre1 which was packaged with Ubuntu 11.10. Should we make it a priority to upgrade? Yes, there are many known problems

[OpenAFS] Tips for increasing throughput

2011-08-31 Thread Ken Elkabany
Hi, On a network capable of scp-ing files between machines at 60MB/sec, we are only able to achieve 2-3MB/sec of throughput when using AFS. We've been conducting tests on 300MB files, by copying them from the /afs/* mount to the local filesystem with memcache enabled. In production, we will not

Re: [OpenAFS] Can client's CellServDB file rely on DNS?

2010-05-26 Thread Ken Elkabany
Hi again, I finally got around to trying out DNS resolution for cells listed in the CellServDB. However, the proposed solution does not work with openafs-client 1.4.12+dfsg-3 on Ubuntu 10.04 64-bit (Lucid Lynx). Upon ls-ing the /afs/cellname directory, a Connection timed out is immediately

[OpenAFS] Can client's CellServDB file rely on DNS?

2010-04-07 Thread Ken Elkabany
Hello, We have had to replace our master openafs fileserver several times. Each time we have had to go through each client and update the CellServDB file to reflect the IP address of the new replacement server. Since we always map a domain name to the master openafs fileserver, is it possible to

[OpenAFS] Client synchronization issues/Client caches are not being updated

2010-03-02 Thread Ken Elkabany
I have six client machines all accessing a single openafs server. The server and five clients are running Ubuntu 9.10 with openafs 1.4.10. About 80% of the time when a client modifies files on the afs, the changes are not reflected on the other clients, even after the file has been closed (a zip

Re: [OpenAFS] Client synchronization issues/Client caches are not being updated

2010-03-02 Thread Ken Elkabany
Thanks! You were all spot on. Our internal firewalls had recently been updated to block port 7001. Ken On Tue, Mar 2, 2010 at 2:17 AM, Christof Hanke christof.ha...@rzg.mpg.de wrote: On Tuesday, 2 March 2010 11:01:22, Ken Elkabany wrote: I have six client machines all accessing a single

[OpenAFS] No space left on device error -- despite partition being only 33% full

2009-08-14 Thread Ken Elkabany
Hi all, My afs clients have begun receiving a No space left on device error when attempting to write to or delete from a specific afs volume. The volume in question called jobs has both a RW and RO site on the same partition, vicepa. vos partinfo [server] Free space on partition /vicepa: 3292236

Re: [OpenAFS] No space left on device error -- despite partition being only 33% full

2009-08-14 Thread Ken Elkabany
Thanks! That was the problem. Ken On Fri, Aug 14, 2009 at 4:27 PM, Thomas Kulak...@tproa.net wrote: On Fri, Aug 14, 2009 at 03:54:41PM -0700, Ken Elkabany wrote: Hi all, My afs clients have begun receiving a No space left on device error when attempting to write to or delete from a specific

[OpenAFS] openafs client causing segmentation faults

2009-06-20 Thread Ken Elkabany
Hi, My openafs fileserver has begun having consistent segmentation faults for certain terminal commands, the subset of which I have identified are as follows: apt-get install * vim (but not nano) sudo Some pre-existing automated scripts that do not depend on the above have also begun failing

Re: [OpenAFS] Connection Timed Out errors occasionally when accessing openafs drive

2009-05-19 Thread Ken Elkabany
error. The processes must then be restarted. Any other suggestions? Ken On Sun, May 10, 2009 at 7:41 PM, Derrick Brashear sha...@gmail.com wrote: it probably matters in the server here, but both. Derrick On May 10, 2009, at 10:35 PM, Ken Elkabany k...@elkabany.com wrote: Is this bug fixed

[OpenAFS] Connection Timed Out errors occasionally when accessing openafs drive

2009-05-10 Thread Ken Elkabany
Hello, I have openafs 1.4.9 client and server running on two separate machines across a WAN. The client has scripts that access the /afs/our.cell/ directory. Occasionally, the script will fail to complete, and the logs will say that the Connection Timed Out on a mkdir -p /afs/our.cell/x/y/z

Re: [OpenAFS] Connection Timed Out errors occasionally when accessing openafs drive

2009-05-10 Thread Ken Elkabany
Is this bug fixed in the client or the server? Thanks. Ken On Sun, May 10, 2009 at 7:22 PM, Derrick Brashear sha...@gmail.com wrote: I'd venture this is a bug fixed in 1.4.10, with idle dead time computation in rx. Derrick On May 10, 2009, at 9:53 PM, Ken Elkabany k...@elkabany.com wrote

[OpenAFS] OpenAFS causes ssh_exchange_identification errors

2009-05-05 Thread Ken Elkabany
Hello, When using kadmin, pts, and vos to heavily configure my afs installation, I will often begin receiving errors such as the following: pts: Permission denied ; unable to create user Perplexed, I will then try to ssh into the afs fileserver, which will give me the following error:

[OpenAFS] OpenAFS Fileserver Behind NAT

2009-04-22 Thread Ken Elkabany
Hello, I am running OpenAFS 1.4.7 servers on Debian 5.0. I had initially been having trouble accessing my OpenAFS Fileserver that was behind a NAT. The VLDB was reporting the local IP of the fileserver to machines outside the NAT group, resulting in connection failures. Adding both the internal

Re: [OpenAFS] OpenAFS Fileserver Behind NAT

2009-04-22 Thread Ken Elkabany
listaddrs output. Any other ideas? Ken On Wed, Apr 22, 2009 at 5:23 AM, Derrick Brashear sha...@gmail.com wrote: On Wed, Apr 22, 2009 at 6:32 AM, Ken Elkabany k...@elkabany.com wrote: Hello, I am running OpenAFS 1.4.7 servers on Debian 5.0. I had initially been having trouble accessing my OpenAFS