Re: [OpenAFS] compile fails kernel version 4.4.0-1-default
Hi,

On Tue, 8 Mar 2016, Michael Laß wrote:
> Was the error code 32 returned from git or did the kernel log message
> change accordingly? Does your log again show a lost file server
> connection? And have files been corrupted or just the checkout aborted?

I have now tried a couple more times with the patch, and this time I saw the error -512 once.

The output in the terminal running git looks like this:

---
error: unable to unlink old 'src/afs/Makefile.in' (Connection timed out)
... about 20 more messages like the above one
fatal: cannot create directory at 'src/afs/NBSD': Connection timed out
warning: unable to unlink .git/index.lock: Connection timed out
fatal: Not a git repository (or any parent up to mount point /afs)
---

The log messages look like this:

---
Mar 09 08:42:45 kernel: afs: Lost contact with file server 131.169.2.24 in cell desy.de (code -512) (all multi-homed ip addresses down for the server)
Mar 09 08:42:45 kernel: afs: failed to store file (network problems)
Mar 09 08:42:46 kernel: afs: file server in cell desy.de is back up (code 0) (multi-homed address; other same-host interfaces may still be down)
---

As a cross-check: this morning I booted a backup partition of my archlinux installation which still has kernel 4.3. The backup was last updated on 2016-01-07 (pacman -Syu). On this installation I tried to provoke the above error, but everything went well for about 20 tries. This may serve only to confirm that no changes outside my computer (servers, network, ...) are causing my problems.

Cheers,
Michael
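For readers following the codes in the log above, here is a minimal sketch of how they could be decoded, assuming the values printed by the client are negated Linux errno numbers. The thread suggests this for -512 (ERESTARTSYS) but does not state it for -32, so treat the EPIPE reading as an assumption. The function name describe_log_code is hypothetical and not part of OpenAFS; the numeric values are the usual Linux ones (ETIMEDOUT is 110 on most, but not all, architectures).

```c
/* Linux errno values. EPIPE and ETIMEDOUT are visible to user space;
 * ERESTARTSYS is kernel-internal (include/linux/errno.h) and should
 * normally never leak out to applications. */
#define LINUX_EPIPE        32
#define LINUX_ETIMEDOUT    110
#define LINUX_ERESTARTSYS  512

/* Hypothetical helper: map a code from the afs kernel log line to a
 * symbolic name. Accepts the negated form used in the log. */
const char *describe_log_code(int code)
{
    int e = (code < 0) ? -code : code;

    switch (e) {
    case 0:                 return "OK";
    case LINUX_EPIPE:       return "EPIPE";
    case LINUX_ETIMEDOUT:   return "ETIMEDOUT";
    case LINUX_ERESTARTSYS: return "ERESTARTSYS";
    default:                return "unknown";
    }
}
```

With this reading, "code -512" in the log corresponds to ERESTARTSYS, and the "Connection timed out" text printed by git is the standard message for ETIMEDOUT.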
Re: [OpenAFS] compile fails kernel version 4.4.0-1-default
Note that the osi vnode ops for linux also use the splice interface.

On Tue, 2016-03-08 at 20:47 +0100, mdrslmr wrote:
> On Tue, 8 Mar 2016, Michael Laß wrote:
>
> > Was the error code 32 returned from git or did the kernel log message
> > change accordingly? Does your log again show a lost file server
> > connection? And have files been corrupted or just the checkout aborted?
>
> The kernel log message changed to -32. Which has already happened before
> too sometimes, as I reported earlier. So the situation before the patch
> was that sometimes -32 and sometimes -512 occurred in the log messages.
>
> After the patch I did the git checkout procedures twice and both times
> I received the error code -32. When the error occurs I remove the
> directory completely and copy it back from a local disk for the next
> test.
>
> Don't know if -512 would show up if I try more often.
>
> With corrupted I meant the state of the git repository. I don't know if
> individual files are broken. So it may well be that just the checkout
> was aborted.
>
> Cheers,
> Michael

___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] compile fails kernel version 4.4.0-1-default
On Tue, 8 Mar 2016, Michael Laß wrote:
> Was the error code 32 returned from git or did the kernel log message
> change accordingly? Does your log again show a lost file server
> connection? And have files been corrupted or just the checkout aborted?

The kernel log message changed to -32, which had already happened occasionally before, as I reported earlier. So the situation before the patch was that the log messages sometimes showed -32 and sometimes -512.

After the patch I ran the git checkout procedure twice, and both times I received the error code -32. When the error occurs I remove the directory completely and copy it back from a local disk for the next test.

I don't know whether -512 would show up if I tried more often.

By "corrupted" I meant the state of the git repository. I don't know whether individual files are broken, so it may well be that just the checkout was aborted.

Cheers,
Michael
Re: [OpenAFS] compile fails kernel version 4.4.0-1-default
Being able to reactivate it is a good thing, either for testing purposes or for older kernels, since it is more efficient, unless we think there is other known brokenness such as the potential return out of the while loop mentioned earlier.

On Tue, Mar 08, 2016 at 06:13:12PM +0100, Stephan Wiesand wrote:
>
> On Mar 8, 2016, at 17:29 , Michael Laß wrote:
>
> > On Tuesday, 08.03.2016, at 16:47 +0100, mdrslmr wrote:
> > > I created a patch from what you suggested above.
> > >
> > > [...]
> > >
> > > I did all of that on top of AUR-openafs-linux-4.4 which was provided by
> > > Bevan, the openafs archlinux packager.
> > >
> > > The patch I actually used is attached below.
> >
> > That patch is not complete (it's missing the configuration flag).
>
> Indeed. The complete patch as proposed would look like
> http://gerrit.openafs.org/#/c/12217/ . Chas already objected to making it
> possible to reactivate afs_linux_storeproc with a configure switch, and he's
> probably right, but please feel free to comment on that change.
>
> > I
> > will update the corresponding git branch for the openafs package soon
> > to allow testing. But since LINUX_USE_SPLICE wasn't defined your patch
> > should have worked, too.
>
> Right.
>
> > Was the error code 32 returned from git or did the kernel log message
> > change accordingly? Does your log again show a lost file server
> > connection? And have files been corrupted or just the checkout aborted?
>
> Good questions.
>
> Sigh. Looks like there's more to it.
>
> --
> Stephan Wiesand
> DESY -DV-
> Platanenallee 6
> 15738 Zeuthen, Germany

--
David William Botsch
Programmer/Analyst
@CNFComputing
bot...@cnf.cornell.edu
Re: [OpenAFS] compile fails kernel version 4.4.0-1-default
On Mar 8, 2016, at 17:29 , Michael Laß wrote:

> On Tuesday, 08.03.2016, at 16:47 +0100, mdrslmr wrote:
>> I created a patch from what you suggested above.
>>
>> [...]
>>
>> I did all of that on top of AUR-openafs-linux-4.4 which was provided by
>> Bevan, the openafs archlinux packager.
>>
>> The patch I actually used is attached below.
>
> That patch is not complete (it's missing the configuration flag).

Indeed. The complete patch as proposed would look like http://gerrit.openafs.org/#/c/12217/ . Chas already objected to making it possible to reactivate afs_linux_storeproc with a configure switch, and he's probably right, but please feel free to comment on that change.

> I
> will update the corresponding git branch for the openafs package soon
> to allow testing. But since LINUX_USE_SPLICE wasn't defined your patch
> should have worked, too.

Right.

> Was the error code 32 returned from git or did the kernel log message
> change accordingly? Does your log again show a lost file server
> connection? And have files been corrupted or just the checkout aborted?

Good questions.

Sigh. Looks like there's more to it.

--
Stephan Wiesand
DESY -DV-
Platanenallee 6
15738 Zeuthen, Germany
Re: [OpenAFS] compile fails kernel version 4.4.0-1-default
On Tuesday, 08.03.2016, at 16:47 +0100, mdrslmr wrote:
> I created a patch from what you suggested above.
>
> [...]
>
> I did all of that on top of AUR-openafs-linux-4.4 which was provided by
> Bevan, the openafs archlinux packager.
>
> The patch I actually used is attached below.

That patch is not complete (it's missing the configuration flag). I will update the corresponding git branch for the openafs package soon to allow testing. But since LINUX_USE_SPLICE wasn't defined, your patch should have worked, too.

Was the error code 32 returned from git or did the kernel log message change accordingly? Does your log again show a lost file server connection? And have files been corrupted or just the checkout aborted?

Cheers,
Michael
Re: [OpenAFS] compile fails kernel version 4.4.0-1-default
Hi Stephan,

On Tue, 8 Mar 2016, Stephan Wiesand wrote:
> diff --git a/src/afs/afs_fetchstore.c b/src/afs/afs_fetchstore.c
> index f65f40c..2630209 100644
> --- a/src/afs/afs_fetchstore.c
> +++ b/src/afs/afs_fetchstore.c
> @@ -326,7 +326,7 @@ struct storeOps rxfs_storeUfsOps = {
>      .padd = rxfs_storePadd,
>      .close = rxfs_storeClose,
>      .destroy = rxfs_storeDestroy,
> -#ifdef AFS_LINUX26_ENV
> +#if defined(AFS_LINUX26_ENV) && defined(LINUX_USE_SPLICE)
>      .storeproc = afs_linux_storeproc
>  #else
>      .storeproc = afs_GenericStoreProc

I tried a patch without much success, but I'm not sure I did everything right.

> and add a configure test defaulting to off?

This I didn't do explicitly. I don't know what it means.

I created a patch from what you suggested above. At first I mistakenly rebuilt and reinstalled the openafs package. Then I remembered that this has to be done in openafs-modules-dkms, so I applied the patch there and installed it.

After rebooting I tried "git checkout cb0081604ef5369f34279c6eb77eb4d28406f2ac" and "git checkout master" a couple of times, but after about four times an error occurred with code -32.

I did all of that on top of AUR-openafs-linux-4.4 which was provided by Bevan, the openafs archlinux packager.

The patch I actually used is attached below.

Cheers,
Michael

From db8f4db361c76c355046dacdb83db264d3fa4f6f Mon Sep 17 00:00:00 2001
From: Michael Dressel
Date: Tue, 8 Mar 2016 16:00:59 +0100
Subject: [PATCH] Test Stephan Wiesand suggestion

---
 src/afs/afs_fetchstore.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/afs/afs_fetchstore.c b/src/afs/afs_fetchstore.c
index f65f40c..2630209 100644
--- a/src/afs/afs_fetchstore.c
+++ b/src/afs/afs_fetchstore.c
@@ -326,7 +326,7 @@ struct storeOps rxfs_storeUfsOps = {
     .padd = rxfs_storePadd,
     .close = rxfs_storeClose,
     .destroy = rxfs_storeDestroy,
-#ifdef AFS_LINUX26_ENV
+#if defined(AFS_LINUX26_ENV) && defined(LINUX_USE_SPLICE)
     .storeproc = afs_linux_storeproc
 #else
     .storeproc = afs_GenericStoreProc
--
2.7.1
Re: Aw: RE: [OpenAFS] compile fails kernel version 4.4.0-1-default
On Mon, 2016-03-07 at 22:37 -0500, Benjamin Kaduk wrote:
> On Mon, 7 Mar 2016, Chas Williams wrote:
>
> > On Mon, 2016-03-07 at 01:42 -0500, Benjamin Kaduk wrote:
> > >
> > > I am given to understand that the proximal trigger is linux commit
> > > http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=c725bfce7968009756ed2836a8cd7ba4dc163011,
> > > which adds a path wherein -ERESTARTSYS can be returned from within VFS
> > > library code. (Maybe there are other such paths, but we maybe just
> > > didn't notice before?) This particular function,
> > > splice_from_pipe_next(), ends up getting called from the low-level
> > > afs_linux_storeproc() routine.
> >
> > I haven't had time to look at this, but does this also happen with
> > the memcache?
>
> I believe so.

Sigh. You made me read the source code. The memcache, because it doesn't use files, doesn't call the linux splice routine. It just calls afs_GenericStoreProc. So memcache should work just fine, assuming splice is the only issue with the 4.4 kernel. Perhaps someone could test this.
Re: [OpenAFS] compile fails kernel version 4.4.0-1-default
On Tue, 2016-03-08 at 14:37 +0100, Stephan Wiesand wrote:
> So we'd simply do something like this
>
> diff --git a/src/afs/afs_fetchstore.c b/src/afs/afs_fetchstore.c
> index f65f40c..2630209 100644
> --- a/src/afs/afs_fetchstore.c
> +++ b/src/afs/afs_fetchstore.c
> @@ -326,7 +326,7 @@ struct storeOps rxfs_storeUfsOps = {
>      .padd = rxfs_storePadd,
>      .close = rxfs_storeClose,
>      .destroy = rxfs_storeDestroy,
> -#ifdef AFS_LINUX26_ENV
> +#if defined(AFS_LINUX26_ENV) && defined(LINUX_USE_SPLICE)
>      .storeproc = afs_linux_storeproc
>  #else
>      .storeproc = afs_GenericStoreProc
>
> and add a configure test defaulting to off?

Close. It should just be disabled, without a way to enable it, for 1.6 at least. Otherwise we risk someone enabling something we know doesn't work (or is at least unstable).
Re: [OpenAFS] compile fails kernel version 4.4.0-1-default
> On 08 Mar 2016, at 12:02, Chas Williams <3ch...@gmail.com> wrote:
>
> On Tue, 2016-03-08 at 10:16 +0100, Denis Lohner wrote:
>> On 08.03.2016 at 04:37, Benjamin Kaduk wrote:
>>>>> There are many call paths in the cache manager that end up at this
>>>>> function, most of which are not prepared to properly handle an
>>>>> ERESTARTSYS return. Since this status can be returned after some data
>>>>> has already been written, the correct behavior upon receiving it is
>>>>> far from clear ... a path towards a client free of this vector for
>>>>> data corruption may involve avoiding the dependence on
>>>>> splice_from_pipe_next() in preference to adapting all call sites to
>>>>> handle the ERESTARTSYS case.
>>>>
>>>> For the 1.6 release, this seems the best choice of action. The "real"
>>>> fix would likely be difficult to completely test in a timely fashion.
>>>
>>> That only helps if we know what the replacement would be...I am not a
>>> linux VFS expert and do not have any ideas right now.
>>
>> I am not a kernel/driver developer nor a file system developer. So
>> please forgive, if the following makes no sense at all.
>>
>> As far as I understand the issue and the openafs sources, the problem
>> arises as afs_linux_storeproc uses the splice api that can return
>> ERESTARTSYS as of kernel version 4.4.
>> A quick search in the NEWS file and git logs suggests that
>> afs_linux_storeproc was introduced in OpenAFS 1.5.69 (2010-01-19) as a
>> performance improvement:
>> " Linux
>>
>>    * Use splice to speed up storing files."
>>
>> The original behaviour which uses separate reads/writes instead of
>> splice and that is (still) used on non-linux systems remained in
>> afs_GenericStoreProc in src/afs/afs_fetchstore.c .
>>
>> So my question is: Is it possible to replace afs_linux_storeproc with
>> afs_GenericStoreProc on linux kernel versions >=4.4 as a temporary
>> solution to the issue either in the openafs sources or as a distribution
>> specific patch, trading some performance for data integrity?
>
> That would be the first thing I tried. This code was brought into the
> tree on commit 34ffc9cd7d7eed62229704ad0e1d327f076ea7b6. There doesn't seem
> to be any additional side effects, so simply not using afs_linux_storeproc
> should still work.

So we'd simply do something like this

diff --git a/src/afs/afs_fetchstore.c b/src/afs/afs_fetchstore.c
index f65f40c..2630209 100644
--- a/src/afs/afs_fetchstore.c
+++ b/src/afs/afs_fetchstore.c
@@ -326,7 +326,7 @@ struct storeOps rxfs_storeUfsOps = {
     .padd = rxfs_storePadd,
     .close = rxfs_storeClose,
     .destroy = rxfs_storeDestroy,
-#ifdef AFS_LINUX26_ENV
+#if defined(AFS_LINUX26_ENV) && defined(LINUX_USE_SPLICE)
     .storeproc = afs_linux_storeproc
 #else
     .storeproc = afs_GenericStoreProc

and add a configure test defaulting to off?

--
Stephan Wiesand
DESY - DV -
Platanenallee 6
15738 Zeuthen, Germany
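The compile-time switch proposed in this message can be illustrated in isolation. The following is only a sketch under stated assumptions, not OpenAFS code: DEMO_LINUX26_ENV and DEMO_USE_SPLICE are hypothetical stand-ins for AFS_LINUX26_ENV and LINUX_USE_SPLICE, and the demo_* functions and struct are invented for illustration. It shows how an operations table falls back to the generic routine whenever the second macro is not defined, which is the "default off" behaviour being discussed.

```c
#include <stddef.h>

/* Hypothetical stand-ins for AFS_LINUX26_ENV and LINUX_USE_SPLICE. */
#define DEMO_LINUX26_ENV 1
/* DEMO_USE_SPLICE is deliberately left undefined: the "default off" case. */

typedef int (*demo_storeproc_fn)(const void *data, size_t len);

struct demo_store_ops {
    const char *name;            /* which implementation was selected */
    demo_storeproc_fn storeproc; /* routine that stores the data */
};

/* Portable path; plays the role of afs_GenericStoreProc in this sketch.
 * It only pretends to store the data and reports the length handled. */
static int demo_generic_storeproc(const void *data, size_t len)
{
    (void)data;
    return (int)len;
}

#if defined(DEMO_LINUX26_ENV) && defined(DEMO_USE_SPLICE)
/* Fast path; plays the role of afs_linux_storeproc in this sketch. */
static int demo_splice_storeproc(const void *data, size_t len)
{
    (void)data;
    return (int)len;
}
#endif

/* The table picks the fast path only when BOTH macros are defined. */
const struct demo_store_ops demo_ops = {
#if defined(DEMO_LINUX26_ENV) && defined(DEMO_USE_SPLICE)
    .name = "splice",
    .storeproc = demo_splice_storeproc
#else
    .name = "generic",
    .storeproc = demo_generic_storeproc
#endif
};
```

As written, demo_ops.name is "generic". Defining DEMO_USE_SPLICE, which is what a configure switch would do, is the only way to select the other routine.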
Re: [OpenAFS] compile fails kernel version 4.4.0-1-default
On Tue, 2016-03-08 at 10:16 +0100, Denis Lohner wrote:
> On 08.03.2016 at 04:37, Benjamin Kaduk wrote:
> > > > There are many call paths in the cache manager that end up at this
> > > > function, most of which are not prepared to properly handle an
> > > > ERESTARTSYS return. Since this status can be returned after some data
> > > > has already been written, the correct behavior upon receiving it is
> > > > far from clear ... a path towards a client free of this vector for
> > > > data corruption may involve avoiding the dependence on
> > > > splice_from_pipe_next() in preference to adapting all call sites to
> > > > handle the ERESTARTSYS case.
> > >
> > > For the 1.6 release, this seems the best choice of action. The "real"
> > > fix would likely be difficult to completely test in a timely fashion.
> >
> > That only helps if we know what the replacement would be...I am not a
> > linux VFS expert and do not have any ideas right now.
>
> I am not a kernel/driver developer nor a file system developer. So
> please forgive, if the following makes no sense at all.
>
> As far as I understand the issue and the openafs sources, the problem
> arises as afs_linux_storeproc uses the splice api that can return
> ERESTARTSYS as of kernel version 4.4.
> A quick search in the NEWS file and git logs suggests that
> afs_linux_storeproc was introduced in OpenAFS 1.5.69 (2010-01-19) as a
> performance improvement:
> " Linux
>
>    * Use splice to speed up storing files."
>
> The original behaviour which uses separate reads/writes instead of
> splice and that is (still) used on non-linux systems remained in
> afs_GenericStoreProc in src/afs/afs_fetchstore.c .
>
> So my question is: Is it possible to replace afs_linux_storeproc with
> afs_GenericStoreProc on linux kernel versions >=4.4 as a temporary
> solution to the issue either in the openafs sources or as a distribution
> specific patch, trading some performance for data integrity?

That would be the first thing I tried. This code was brought into the tree on commit 34ffc9cd7d7eed62229704ad0e1d327f076ea7b6. There don't seem to be any additional side effects, so simply not using afs_linux_storeproc should still work.

I won't have time to look at this until this weekend though.
Re: [OpenAFS] compile fails kernel version 4.4.0-1-default
On 08.03.2016 at 04:37, Benjamin Kaduk wrote:
>>> There are many call paths in the cache manager that end up at this
>>> function, most of which are not prepared to properly handle an
>>> ERESTARTSYS return. Since this status can be returned after some data
>>> has already been written, the correct behavior upon receiving it is
>>> far from clear ... a path towards a client free of this vector for
>>> data corruption may involve avoiding the dependence on
>>> splice_from_pipe_next() in preference to adapting all call sites to
>>> handle the ERESTARTSYS case.
>>
>> For the 1.6 release, this seems the best choice of action. The "real"
>> fix would likely be difficult to completely test in a timely fashion.
>
> That only helps if we know what the replacement would be...I am not a
> linux VFS expert and do not have any ideas right now.

I am neither a kernel/driver developer nor a file system developer, so please forgive me if the following makes no sense at all.

As far as I understand the issue and the openafs sources, the problem arises because afs_linux_storeproc uses the splice API, which can return ERESTARTSYS as of kernel version 4.4.

A quick search in the NEWS file and git logs suggests that afs_linux_storeproc was introduced in OpenAFS 1.5.69 (2010-01-19) as a performance improvement:

" Linux

   * Use splice to speed up storing files."

The original behaviour, which uses separate reads/writes instead of splice and is (still) used on non-linux systems, remained in afs_GenericStoreProc in src/afs/afs_fetchstore.c .

So my question is: Is it possible to replace afs_linux_storeproc with afs_GenericStoreProc on linux kernel versions >=4.4 as a temporary solution to the issue, either in the openafs sources or as a distribution-specific patch, trading some performance for data integrity?

Denis

--
Karlsruher Institut für Technologie (KIT)
IPD Snelting

Denis Lohner
Research Associate

Am Fasanengarten 5, Gebäude 50.34, Raum 025
76131 Karlsruhe

Phone: +49 721 608-47399
Fax: +49 721 608-48457
E-Mail: denis.loh...@kit.edu
Web: pp.ipd.kit.edu

KIT – The Research University in the Helmholtz Association

KIT has been certified as a family-friendly university since 2010.
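The "separate reads/writes" approach mentioned in this message, and the concern that an error can arrive after some data has already been written, can be sketched in user space. This is an analogy only, under loud assumptions: copy_chunked is a hypothetical helper, it uses POSIX read()/write() rather than anything from the OpenAFS cache manager, and it treats EINTR (the user-space counterpart of an interrupted call) by retrying. The point it illustrates is that the caller always learns how many bytes were actually transferred, even when an error is returned part-way through.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper (not OpenAFS code): copy up to len bytes from in_fd
 * to out_fd using plain read()/write() calls.
 * Returns 0 on success or end of input, or a negated errno on failure.
 * *copied always holds the number of bytes already written to out_fd,
 * so a caller can tell a clean failure from a partial transfer. */
int copy_chunked(int in_fd, int out_fd, size_t len, size_t *copied)
{
    char buf[4096];

    *copied = 0;
    while (*copied < len) {
        size_t want = len - *copied;
        if (want > sizeof(buf))
            want = sizeof(buf);

        ssize_t n = read(in_fd, buf, want);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any data: retry */
            return -errno;
        }
        if (n == 0)
            break;              /* end of input */

        size_t off = 0;
        while (off < (size_t)n) {
            ssize_t w = write(out_fd, buf + off, (size_t)n - off);
            if (w < 0) {
                if (errno == EINTR)
                    continue;   /* interrupted: retry the remainder */
                return -errno;  /* *copied still reports partial progress */
            }
            off += (size_t)w;
            *copied += (size_t)w;
        }
    }
    return 0;
}
```

A caller that receives a negative return value together with a non-zero *copied is in exactly the ambiguous situation described in the quoted text: part of the data is already at the destination, and simply restarting from the beginning is not obviously correct.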