An update on my 1.8.4pre1 experiences.

My initial success report was a bit premature.

I still occasionally get I/O errors with certain apps (especially
LibreOffice) using 1.8.4pre1. They occur less frequently than with 1.8.3
and earlier, however.

An strace shows libreoffice trying to do an openat() and getting an I/O 
error.

[pid 44604] openat(AT_FDCWD, "/afs/cas.unc.edu/home/stephen/.config/libreoffice/4/user/pNumql", O_RDWR|O_CREAT|O_EXCL, 0600) = -1 EIO (Input/output error)
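
For anyone who wants to check for the same failure, here is a rough sketch of how a trace like the one above could be captured and filtered. The function name eio_opens, the log path, and the use of "soffice" as the LibreOffice launcher are assumptions for illustration, not something taken from my setup.

```shell
# Hypothetical helper: print the openat() calls in an strace log that
# failed with EIO.
#
# Capture the log first, for example (launcher name and log path are
# assumptions):
#   strace -f -e trace=openat -o /tmp/lo.strace soffice
eio_opens() {
    # $1: path to an strace output file
    grep -E 'openat\(.*= -1 EIO' "$1"
}
```

Running "eio_opens /tmp/lo.strace" afterwards should list only the failing calls, which makes it easier to see which paths under /afs are affected.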

Interestingly, when I try to remove the "user" directory, I get behavior 
that smells like a cache problem.

<3:27pm>stephen@lucifer:4>rm -rf user
rm: cannot remove 'user/config': Directory not empty

<3:27pm>stephen@lucifer:4>ls -la user/config
total 4
drwx------ 2 stephen users 2048 Sep 12 15:19 ./
drwx------ 3 stephen users 2048 Sep 12 15:23 ../

<3:28pm>stephen@lucifer:4>fs flush user/config

<3:28pm>stephen@lucifer:4>ls -la user/config/
total 6
drwx------ 2 stephen users 2048 Sep 12 15:19 ./
drwx------ 3 stephen users 2048 Sep 12 15:23 ../
-rw------- 1 stephen users 1703 Sep 12 15:19 javasettings_Linux_X86_64.xml

<3:28pm>stephen@lucifer:4>rm -rf user
rm: cannot remove 'user': Directory not empty

<3:37pm>stephen@lucifer:4>ls -la user
total 4
drwx------ 2 stephen users 2048 Sep 12 15:37 ./
drwx------ 3 stephen users 2048 Sep  3 15:30 ../

<3:37pm>stephen@lucifer:4>fs flush user
<3:37pm>stephen@lucifer:4>ls -la user
total 4
drwx------ 2 stephen users 2048 Sep 12 15:37 ./
drwx------ 3 stephen users 2048 Sep  3 15:30 ../
-rw------- 1 stephen users    0 Sep 12 15:23 K8Wuhr

I can eventually flush enough paths to remove the entire libreoffice/4/user 
directory, but the problem recurs on the next launch. Once it gets into 
this state, it seems quite reproducible.
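
The manual flush-and-retry sequence above could be scripted roughly as follows. This is only a sketch of what I did by hand: the function name flush_and_remove, the retry count, and the FS_CMD override are made up for illustration, and I have not verified that it clears every stuck state.

```shell
# Sketch: repeatedly flush every directory under a tree with "fs flush",
# then try to remove the tree, until it is gone or we run out of tries.
# FS_CMD defaults to the OpenAFS "fs" binary; the override exists only so
# the function can be dry-run on a machine without an AFS client.
flush_and_remove() {
    # $1: directory to remove, $2: maximum attempts (default 5)
    dir=$1
    tries=${2:-5}
    fs_cmd=${FS_CMD:-fs}
    while [ -e "$dir" ] && [ "$tries" -gt 0 ]; do
        # Flush deepest directories first; entries that were hidden by the
        # stale cache may show up after each flush.
        find "$dir" -depth -type d -exec "$fs_cmd" flush {} \;
        rm -rf -- "$dir" 2>/dev/null
        tries=$((tries - 1))
    done
    # Succeed only if the directory is really gone.
    [ ! -e "$dir" ]
}
```

Something like "flush_and_remove user" from inside ~/.config/libreoffice/4 would correspond to the session shown above.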

If it helps, I've seen this behavior on multiple workstations, so I don't 
think it's hardware. Thinking it might be callback related, I tested with 
the client firewall set to default accept, but it made no apparent 
difference.

>uname -a
Linux lucifer 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

>lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.3 LTS
Release:        18.04
Codename:       bionic

>rxdebug `hostname` 7001 -version
Trying 127.0.1.1 (port 7001):
AFS version: OpenAFS 1.8.4~pre1-1~ppa0~ubuntu18.04.1-debian 2019-08-27 root@

>cmdebug -server `hostname` -cache
Chunk files:   31250
Stat caches:   15000
Data caches:   10000
Volume caches: 200
Chunk size:    1048576
Cache size:    1000000 kB
Set time:      no
Cache type:    disk

>uptime
15:51:01 up 2 days,  1:25, 15 users,  load average: 0.47, 0.52, 0.57

Filesystem type is ext4. Storebehind is 0. Cache bypass is disabled.

Rebooting clears the problem, but it generally recurs within 1-5 days.
With 1.8.3, I'd sometimes need to reboot daily.

Probably doesn't matter given the changes, but 1.6.x on Ubuntu 16.04 on the 
same hardware didn't exhibit this symptom.

Replacing the ~/.config/libreoffice/4/user directory with a symlink to a 
location on the local disk appears to be a valid workaround.
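
A minimal sketch of that workaround, assuming LibreOffice is not running at the time. The function name relocate_profile, the backup suffix, and the local target path in the usage note are all hypothetical choices; the only part taken from my setup is the ~/.config/libreoffice/4/user location.

```shell
# Sketch of the symlink workaround: move the existing profile directory
# aside and point the original path at a directory on local disk.
relocate_profile() {
    # $1: profile directory in AFS, $2: replacement directory on local disk
    afs_dir=$1
    local_dir=$2
    mkdir -p "$local_dir" || return 1
    # Keep whatever is currently there rather than deleting it.
    if [ -e "$afs_dir" ] && [ ! -L "$afs_dir" ]; then
        mv "$afs_dir" "$afs_dir.afs-backup.$$" || return 1
    fi
    # -n replaces an existing symlink instead of descending into it.
    ln -sfn "$local_dir" "$afs_dir"
}
```

For example: relocate_profile ~/.config/libreoffice/4/user /var/tmp/$USER/libreoffice-user. The obvious cost is that the profile no longer follows you between workstations.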

Tickets/tokens seem fine. Other file accesses work as expected 
before/during/after the above.

If any other info would help to diagnose this, let me know.

1.8.4pre1 is still an improvement over 1.8.3. Thanks again!


On Fri, 30 Aug 2019, Joyce, Stephen wrote:

> Just wanted to voice a thank you to all the devs who worked on the recent
> release. 1.8.4pre1 seems to have fixed several issues I was having on some
> Ubuntu 18.04 workstations.
>
> While I haven't done any formal stress-testing, I have noticed no problems
> so far.
>
> ~Stephen
> _______________________________________________
> OpenAFS-info mailing list
> OpenAFS-info@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-info
>
