Re: [OpenAFS] compile fails kernel version 4.4.0-1-default

2016-03-08 Thread mdrslmr

Hi,

On Tue, 8 Mar 2016, Michael Laß wrote:


> Was the error code 32 returned from git or did the kernel log message
> change accordingly? Does your log again show a lost file server
> connection? And have files been corrupted or just the checkout aborted?


I have now tried a couple more times with the patch applied, and this time
I once got error -512.

The output in the terminal running git looks like this:

---
error: unable to unlink old 'src/afs/Makefile.in' (Connection timed out)

... about 20 more messages like the above one

fatal: cannot create directory at 'src/afs/NBSD': Connection timed out
warning: unable to unlink .git/index.lock: Connection timed out
fatal: Not a git repository (or any parent up to mount point /afs)
---

The log messages look like this:

---
Mar 09 08:42:45  kernel: afs: Lost contact with file server 131.169.2.24 in cell desy.de (code -512) (all multi-homed ip addresses down for the server)
Mar 09 08:42:45 kernel: afs: failed to store file (network problems)
Mar 09 08:42:46 kernel: afs: file server in cell desy.de is back up (code 0) (multi-homed address; other same-host interfaces may still be down)
---

For some cross checking:
This morning I used a backup partition of my archlinux installation
which still has kernel 4.3. The backup was last updated on 2016-01-07
(pacman -Syu). On this installation I tried to provoke the above
error. But everything went well for about 20 tries.
This may only serve to confirm that no changes outside my computer
(servers, network, ...) are causing my problems.

Cheers,
Michael







Re: [OpenAFS] compile fails kernel version 4.4.0-1-default

2016-03-08 Thread Chas Williams
Note that the osi vnode ops for linux also use the splice interface.

On Tue, 2016-03-08 at 20:47 +0100, mdrslmr wrote:
> 
> On Tue, 8 Mar 2016, Michael Laß wrote:
> 
> > Was the error code 32 returned from git or did the kernel log message
> > change accordingly? Does your log again show a lost file server
> > connection? And have files been corrupted or just the checkout aborted?
> 
> The kernel log message changed to -32, which had also happened
> occasionally before, as I reported earlier. So before the patch, the
> log messages sometimes showed -32 and sometimes -512.
> 
> After the patch I ran the git checkout procedure twice, and both times
> I received error code -32. When the error occurs, I remove the
> directory completely and copy it back from a local disk for the next
> test.
> 
> I don't know whether -512 would show up if I tried more often.
> 
> By "corrupted" I meant the state of the git repository. I don't know
> whether individual files are broken, so it may well be that just the
> checkout was aborted.
> 
> 
> Cheers,
> Michael
> 
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] compile fails kernel version 4.4.0-1-default

2016-03-08 Thread mdrslmr



On Tue, 8 Mar 2016, Michael Laß wrote:


> Was the error code 32 returned from git or did the kernel log message
> change accordingly? Does your log again show a lost file server
> connection? And have files been corrupted or just the checkout aborted?


The kernel log message changed to -32, which had also happened
occasionally before, as I reported earlier. So before the patch, the
log messages sometimes showed -32 and sometimes -512.

After the patch I ran the git checkout procedure twice, and both times
I received error code -32. When the error occurs, I remove the
directory completely and copy it back from a local disk for the next
test.

I don't know whether -512 would show up if I tried more often.

By "corrupted" I meant the state of the git repository. I don't know
whether individual files are broken, so it may well be that just the
checkout was aborted.


Cheers,
Michael



Re: [OpenAFS] compile fails kernel version 4.4.0-1-default

2016-03-08 Thread Dave Botsch
Being able to reactivate it is a good thing, for either testing purposes
or for older kernels, since it is more efficient - unless we think
there's other known brokenness such as the potential return out of the
while loop mentioned earlier.


On Tue, Mar 08, 2016 at 06:13:12PM +0100, Stephan Wiesand wrote:
> 
> On Mar 8, 2016, at 17:29 , Michael Laß wrote:
> 
> > On Tuesday, 08.03.2016, 16:47 +0100, mdrslmr wrote:
> >> I created a patch from what you suggested above.
> >> 
> >> [...]
> >> 
> >> I did all of that on top of AUR-openafs-linux-4.4 which was provided by
> >> Bevan, the openafs archlinux packager.
> >> 
> >> The patch I actually used is attached below.
> > 
> > That patch is not complete (it's missing the configuration flag).
> 
> Indeed. The complete patch as proposed would look like 
> http://gerrit.openafs.org/#/c/12217/ . Chas already objected to making it 
> possible to reactivate afs_linux_storeproc with a configure switch, and he's 
> probably right, but please feel free to comment on that change.
> 
> > I
> > will update the corresponding git branch for the openafs package soon
> > to allow testing. But since LINUX_USE_SPLICE wasn't defined your patch
> > should have worked, too.
> 
> Right.
> 
> > Was the error code 32 returned from git or did the kernel log message
> > change accordingly? Does your log again show a lost file server
> > connection? And have files been corrupted or just the checkout aborted?
> 
> 
> Good questions.
> 
> Sigh. Looks like there's more to it.
> 
> -- 
> Stephan Wiesand
> DESY -DV-
> Platanenallee 6
> 15738 Zeuthen, Germany
> 

-- 

David William Botsch
Programmer/Analyst
@CNFComputing
bot...@cnf.cornell.edu



Re: [OpenAFS] compile fails kernel version 4.4.0-1-default

2016-03-08 Thread Stephan Wiesand

On Mar 8, 2016, at 17:29 , Michael Laß wrote:

> On Tuesday, 08.03.2016, 16:47 +0100, mdrslmr wrote:
>> I created a patch from what you suggested above.
>> 
>> [...]
>> 
>> I did all of that on top of AUR-openafs-linux-4.4 which was provided by
>> Bevan, the openafs archlinux packager.
>> 
>> The patch I actually used is attached below.
> 
> That patch is not complete (it's missing the configuration flag).

Indeed. The complete patch as proposed would look like 
http://gerrit.openafs.org/#/c/12217/ . Chas already objected to making it 
possible to reactivate afs_linux_storeproc with a configure switch, and he's 
probably right, but please feel free to comment on that change.

> I
> will update the corresponding git branch for the openafs package soon
> to allow testing. But since LINUX_USE_SPLICE wasn't defined your patch
> should have worked, too.

Right.

> Was the error code 32 returned from git or did the kernel log message
> change accordingly? Does your log again show a lost file server
> connection? And have files been corrupted or just the checkout aborted?


Good questions.

Sigh. Looks like there's more to it.

-- 
Stephan Wiesand
DESY -DV-
Platanenallee 6
15738 Zeuthen, Germany



Re: [OpenAFS] compile fails kernel version 4.4.0-1-default

2016-03-08 Thread Michael Laß
On Tuesday, 08.03.2016, 16:47 +0100, mdrslmr wrote:
> I created a patch from what you suggested above.
> 
> [...]
> 
> I did all of that on top of AUR-openafs-linux-4.4 which was provided by
> Bevan, the openafs archlinux packager.
> 
> The patch I actually used is attached below.

That patch is not complete (it's missing the configuration flag). I
will update the corresponding git branch for the openafs package soon
to allow testing. But since LINUX_USE_SPLICE wasn't defined your patch
should have worked, too.

Was the error code 32 returned from git or did the kernel log message
change accordingly? Does your log again show a lost file server
connection? And have files been corrupted or just the checkout aborted?

Cheers,
Michael


Re: [OpenAFS] compile fails kernel version 4.4.0-1-default

2016-03-08 Thread mdrslmr

Hi Stephan,

On Tue, 8 Mar 2016, Stephan Wiesand wrote:


> diff --git a/src/afs/afs_fetchstore.c b/src/afs/afs_fetchstore.c
> index f65f40c..2630209 100644
> --- a/src/afs/afs_fetchstore.c
> +++ b/src/afs/afs_fetchstore.c
> @@ -326,7 +326,7 @@ struct storeOps rxfs_storeUfsOps = {
>  .padd =rxfs_storePadd,
>  .close =   rxfs_storeClose,
>  .destroy = rxfs_storeDestroy,
> -#ifdef AFS_LINUX26_ENV
> +#if defined(AFS_LINUX26_ENV) && defined(LINUX_USE_SPLICE)
>  .storeproc = afs_linux_storeproc
>  #else
>  .storeproc = afs_GenericStoreProc



I tried a patch without much success, but I'm not sure I did everything
right.




> and add a configure test defaulting to off?



I didn't do this explicitly; I don't know what it means.

I created a patch from what you suggested above.

First I mistakenly rebuilt and installed the openafs package. Then I
remembered that this has to be done in openafs-modules-dkms, so I
applied the patch there and installed it. After rebooting I tried
"git checkout cb0081604ef5369f34279c6eb77eb4d28406f2ac"
and
"git checkout master"
a couple of times, but after about four tries an error occurred with
code -32.

I did all of that on top of AUR-openafs-linux-4.4 which was provided by
Bevan, the openafs archlinux packager.

The patch I actually used is attached below.

Cheers,
Michael



From db8f4db361c76c355046dacdb83db264d3fa4f6f Mon Sep 17 00:00:00 2001

From: Michael Dressel 
Date: Tue, 8 Mar 2016 16:00:59 +0100
Subject: [PATCH] Test Stephan Wiesand suggestion

---
 src/afs/afs_fetchstore.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/afs/afs_fetchstore.c b/src/afs/afs_fetchstore.c
index f65f40c..2630209 100644
--- a/src/afs/afs_fetchstore.c
+++ b/src/afs/afs_fetchstore.c
@@ -326,7 +326,7 @@ struct storeOps rxfs_storeUfsOps = {
 .padd =rxfs_storePadd,
 .close =   rxfs_storeClose,
 .destroy = rxfs_storeDestroy,
-#ifdef AFS_LINUX26_ENV
+#if defined(AFS_LINUX26_ENV) && defined(LINUX_USE_SPLICE)
 .storeproc = afs_linux_storeproc
 #else
 .storeproc = afs_GenericStoreProc
--
2.7.1





Re: Aw: RE: [OpenAFS] compile fails kernel version 4.4.0-1-default

2016-03-08 Thread Chas Williams
On Mon, 2016-03-07 at 22:37 -0500, Benjamin Kaduk wrote:
> On Mon, 7 Mar 2016, Chas Williams wrote:
> 
> > On Mon, 2016-03-07 at 01:42 -0500, Benjamin Kaduk wrote:
> > >
> > > I am given to understand that the proximal trigger is linux commit
> > > http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=c725bfce7968009756ed2836a8cd7ba4dc163011,
> > > which adds a path wherein -ERESTARTSYS can be returned from within VFS
> > > library code.  (Maybe there are other such paths, but we just didn't
> > > notice before?)  This particular function, splice_from_pipe_next(), ends
> > > up getting called from the low-level afs_linux_storeproc() routine.
> >
> > I haven't had time to look at this, but does this also happen with
> > the memcache?
> 
> I believe so.

Sigh.  You made me read the source code.  The memcache, because it doesn't
use files, doesn't call the linux splice routine.  It just calls
afs_GenericStoreProc.  So, memcache should work just fine, assuming splice is
the only issue with the 4.4 kernel.  Perhaps someone could test this.


Re: [OpenAFS] compile fails kernel version 4.4.0-1-default

2016-03-08 Thread Chas Williams
On Tue, 2016-03-08 at 14:37 +0100, Stephan Wiesand wrote:
> So we'd simply do something like this
> 
> 
> diff --git a/src/afs/afs_fetchstore.c b/src/afs/afs_fetchstore.c
> index f65f40c..2630209 100644
> --- a/src/afs/afs_fetchstore.c
> +++ b/src/afs/afs_fetchstore.c
> @@ -326,7 +326,7 @@ struct storeOps rxfs_storeUfsOps = {
>  .padd =rxfs_storePadd,
>  .close =   rxfs_storeClose,
>  .destroy = rxfs_storeDestroy,
> -#ifdef AFS_LINUX26_ENV
> +#if defined(AFS_LINUX26_ENV) && defined(LINUX_USE_SPLICE)
>  .storeproc = afs_linux_storeproc
>  #else
>  .storeproc = afs_GenericStoreProc
> 
> 
> and add a configure test defaulting to off?

Close.  It should just be disabled, without a way to enable it, for 1.6 at
least.  Otherwise we risk someone enabling something we know doesn't work (or
at least isn't stable).


Re: [OpenAFS] compile fails kernel version 4.4.0-1-default

2016-03-08 Thread Stephan Wiesand

> On 08 Mar 2016, at 12:02, Chas Williams <3ch...@gmail.com> wrote:
> 
> On Tue, 2016-03-08 at 10:16 +0100, Denis Lohner wrote:
>> On 08.03.2016 at 04:37, Benjamin Kaduk wrote:
>>>>> There are many call paths in the cache manager that end up at this
>>>>> function, most of which are not prepared to properly handle an
>>>>> ERESTARTSYS return.  Since this status can be returned after some
>>>>> data has already been written, the correct behavior upon receiving
>>>>> it is far from clear ... a path towards a client free of this vector
>>>>> for data corruption may involve avoiding the dependence on
>>>>> splice_from_pipe_next() in preference to adapting all call sites to
>>>>> handle the ERESTARTSYS case.
>>>>
>>>> For the 1.6 release, this seems the best choice of action.  The "real"
>>>> fix would likely be difficult to completely test in a timely fashion.
>>>
>>> That only helps if we know what the replacement would be...I am not a
>>> linux VFS expert and do not have any ideas right now.
>> 
>> 
>> I am not a kernel/driver developer nor a file system developer. So
>> please forgive, if the following makes no sense at all.
>> 
>> As far as I understand the issue and the openafs sources, the problem
>> arises as afs_linux_storeproc uses the splice api that can return
>> ERESTARTSYS as of kernel version 4.4.
>> A quick search in the NEWS file and git logs suggests that
>> afs_linux_storeproc was introduced in OpenAFS 1.5.69 (2010-01-19) as a
>> performance improvement:
>> " Linux
>> 
>>* Use splice to speed up storing files."
>> 
>> The original behaviour, which uses separate reads/writes instead of
>> splice and is (still) used on non-linux systems, remained in
>> afs_GenericStoreProc in src/afs/afs_fetchstore.c .
>> 
>> So my question is: Is it possible to replace afs_linux_storeproc with
>> afs_GenericStoreProc again on linux kernel versions >= 4.4 as a temporary
>> solution to the issue, either in the openafs sources or as a
>> distribution-specific patch, trading some performance for data integrity?
> 
> That would be the first thing I tried.  This code was brought into the
> tree in commit 34ffc9cd7d7eed62229704ad0e1d327f076ea7b6.  There don't seem
> to be any additional side effects, so simply not using afs_linux_storeproc
> should still work.

So we'd simply do something like this


diff --git a/src/afs/afs_fetchstore.c b/src/afs/afs_fetchstore.c
index f65f40c..2630209 100644
--- a/src/afs/afs_fetchstore.c
+++ b/src/afs/afs_fetchstore.c
@@ -326,7 +326,7 @@ struct storeOps rxfs_storeUfsOps = {
 .padd =rxfs_storePadd,
 .close =   rxfs_storeClose,
 .destroy = rxfs_storeDestroy,
-#ifdef AFS_LINUX26_ENV
+#if defined(AFS_LINUX26_ENV) && defined(LINUX_USE_SPLICE)
 .storeproc = afs_linux_storeproc
 #else
 .storeproc = afs_GenericStoreProc


and add a configure test defaulting to off?


-- 
Stephan Wiesand
DESY - DV -
Platanenallee 6
15738 Zeuthen, Germany



Re: [OpenAFS] compile fails kernel version 4.4.0-1-default

2016-03-08 Thread Chas Williams
On Tue, 2016-03-08 at 10:16 +0100, Denis Lohner wrote:
> On 08.03.2016 at 04:37, Benjamin Kaduk wrote:
> >>> There are many call paths in the cache manager that end up at this
> >>> function, most of which are not prepared to properly handle an
> >>> ERESTARTSYS return.  Since this status can be returned after some
> >>> data has already been written, the correct behavior upon receiving
> >>> it is far from clear ... a path towards a client free of this vector
> >>> for data corruption may involve avoiding the dependence on
> >>> splice_from_pipe_next() in preference to adapting all call sites to
> >>> handle the ERESTARTSYS case.
> >>
> >> For the 1.6 release, this seems the best choice of action.  The "real"
> >> fix would likely be difficult to completely test in a timely fashion.
> >
> > That only helps if we know what the replacement would be...I am not a
> > linux VFS expert and do not have any ideas right now.
> 
> 
> I am not a kernel/driver developer nor a file system developer. So
> please forgive, if the following makes no sense at all.
> 
> As far as I understand the issue and the openafs sources, the problem
> arises as afs_linux_storeproc uses the splice api that can return
> ERESTARTSYS as of kernel version 4.4.
> A quick search in the NEWS file and git logs suggests that
> afs_linux_storeproc was introduced in OpenAFS 1.5.69 (2010-01-19) as a
> performance improvement:
> " Linux
> 
>    * Use splice to speed up storing files."
> 
> The original behaviour, which uses separate reads/writes instead of
> splice and is (still) used on non-linux systems, remained in
> afs_GenericStoreProc in src/afs/afs_fetchstore.c .
> 
> So my question is: Is it possible to replace afs_linux_storeproc with
> afs_GenericStoreProc again on linux kernel versions >= 4.4 as a temporary
> solution to the issue, either in the openafs sources or as a
> distribution-specific patch, trading some performance for data integrity?

That would be the first thing I tried.  This code was brought into the
tree in commit 34ffc9cd7d7eed62229704ad0e1d327f076ea7b6.  There don't seem
to be any additional side effects, so simply not using afs_linux_storeproc
should still work.

I won't have time to look at this until this weekend though.


Re: [OpenAFS] compile fails kernel version 4.4.0-1-default

2016-03-08 Thread Denis Lohner
On 08.03.2016 at 04:37, Benjamin Kaduk wrote:
>>> There are many call paths in the cache manager that end up at this
>>> function, most of which are not prepared to properly handle an
>>> ERESTARTSYS return.  Since this status can be returned after some
>>> data has already been written, the correct behavior upon receiving
>>> it is far from clear ... a path towards a client free of this vector
>>> for data corruption may involve avoiding the dependence on
>>> splice_from_pipe_next() in preference to adapting all call sites to
>>> handle the ERESTARTSYS case.
>>
>> For the 1.6 release, this seems the best choice of action.  The "real"
>> fix would likely be difficult to completely test in a timely fashion.
>
> That only helps if we know what the replacement would be...I am not a
> linux VFS expert and do not have any ideas right now.


I am not a kernel/driver developer nor a file system developer. So
please forgive, if the following makes no sense at all.

As far as I understand the issue and the openafs sources, the problem
arises as afs_linux_storeproc uses the splice api that can return
ERESTARTSYS as of kernel version 4.4.
A quick search in the NEWS file and git logs suggests that
afs_linux_storeproc was introduced in OpenAFS 1.5.69 (2010-01-19) as a
performance improvement:
" Linux

   * Use splice to speed up storing files."

The original behaviour, which uses separate reads/writes instead of
splice and is (still) used on non-linux systems, remained in
afs_GenericStoreProc in src/afs/afs_fetchstore.c .

So my question is: Is it possible to replace afs_linux_storeproc with
afs_GenericStoreProc again on linux kernel versions >= 4.4 as a temporary
solution to the issue, either in the openafs sources or as a
distribution-specific patch, trading some performance for data integrity?


Denis


-- 
Karlsruher Institut für Technologie (KIT)
IPD Snelting

Denis Lohner
Research Associate

Am Fasanengarten 5, Gebäude 50.34, Raum 025
76131 Karlsruhe

Phone: +49 721 608-47399
Fax: +49 721 608-48457
Email: denis.loh...@kit.edu
Web: pp.ipd.kit.edu

KIT – The Research University in the Helmholtz Association

KIT has been certified as a family-friendly university since 2010.


