Re: [Nfs-ganesha-devel] 'missed' recv with 2.6-rc2?

2018-01-29 Thread Daniel Gryniewicz

It's already in ntirpc, and we'll submit a pullup for this week.

Daniel

On 01/29/2018 12:53 PM, Pradeep wrote:

Hi Bill,

Are you planning to pull this into the next ganesha RC?

Thanks,
Pradeep

On Sun, Jan 28, 2018 at 7:13 AM, William Allen Simpson wrote:


On 1/27/18 4:07 PM, Pradeep wrote:

Here is what I see in the log (the '2' is what I added to
figure out which recv failed):
nfs-ganesha-199008[svc_948] rpc :TIRPC :WARN :svc_vc_recv: 0x7f91c0861400 fd 21 recv errno 11 (try again) 2 176

The fix looks good. Thanks Bill.

Thanks for the excellent report.  I wish everybody did such well
researched reports!

Yeah, the 2 isn't really needed, because I used "svc_vc_wait" and
"svc_vc_recv" (__func__) to differentiate the 2 messages.

This is really puzzling, since it should never happen.  It's the
recv() with NO WAIT.  And we are level-triggered, so we shouldn't be
in this code without an event.

If it needed more data, it should be WOULD BLOCK, but it's giving
EAGAIN.  No idea what that means here.

Hope it's not happening often




--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot



___
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel






Re: [Nfs-ganesha-devel] 'missed' recv with 2.6-rc2?

2018-01-29 Thread Pradeep
Hi Bill,

Are you planning to pull this into the next ganesha RC?

Thanks,
Pradeep

On Sun, Jan 28, 2018 at 7:13 AM, William Allen Simpson <
william.allen.simp...@gmail.com> wrote:

> On 1/27/18 4:07 PM, Pradeep wrote:
>
>> Here is what I see in the log (the '2' is what I added to figure out
>> which recv failed):
>> nfs-ganesha-199008[svc_948] rpc :TIRPC :WARN :svc_vc_recv: 0x7f91c0861400 fd 21 recv errno 11 (try again) 2 176
>>
>> The fix looks good. Thanks Bill.
>
> Thanks for the excellent report.  I wish everybody did such well
> researched reports!
>
> Yeah, the 2 isn't really needed, because I used "svc_vc_wait" and
> "svc_vc_recv" (__func__) to differentiate the 2 messages.
>
> This is really puzzling, since it should never happen.  It's the
> recv() with NO WAIT.  And we are level-triggered, so we shouldn't be
> in this code without an event.
>
> If it needed more data, it should be WOULD BLOCK, but it's giving
> EAGAIN.  No idea what that means here.
>
> Hope it's not happening often
>


Re: [Nfs-ganesha-devel] 'missed' recv with 2.6-rc2?

2018-01-28 Thread William Allen Simpson

On 1/27/18 4:07 PM, Pradeep wrote:

> Here is what I see in the log (the '2' is what I added to figure out which
> recv failed):
> nfs-ganesha-199008[svc_948] rpc :TIRPC :WARN :svc_vc_recv: 0x7f91c0861400 fd 21 recv errno 11 (try again) 2 176
>
> The fix looks good. Thanks Bill.


Thanks for the excellent report.  I wish everybody wrote such
well-researched reports!

Yeah, the 2 isn't really needed, because I used "svc_vc_wait" and
"svc_vc_recv" (__func__) to differentiate the two messages.

This is really puzzling, since it should never happen.  It's the
recv() with NO WAIT.  And we are level-triggered, so we shouldn't be
in this code without an event.

If it needed more data, it should be WOULD BLOCK, but it's giving
EAGAIN.  No idea what that means here.

Hope it's not happening often.



Re: [Nfs-ganesha-devel] 'missed' recv with 2.6-rc2?

2018-01-27 Thread Pradeep
On Sat, Jan 27, 2018 at 7:00 AM, William Allen Simpson <
william.allen.simp...@gmail.com> wrote:

> On 1/27/18 9:56 AM, William Allen Simpson wrote:
>
>> I'm not able to reproduce.  Could you tell me which EAGAIN is
>> happening?  The log line will say "svc_vc_wait" or "svc_vc_recv",
>> and have the actual error code on it.  Maybe this is EWOULDBLOCK?
>>
>> Of course, neither EAGAIN or EWOULDBLOCK should be happening on a
>> level triggered event.  But the old code had a log, so it's there.
>>
>
> I've stashed the patch on
>   https://github.com/linuxbox2/ntirpc/tree/was16backport
>
> Could you see whether this fixed it for you?
>
> And report the log line(s)?  Is this happening often?
>

Here is what I see in the log (the '2' is what I added to figure out which
recv failed):
nfs-ganesha-199008[svc_948] rpc :TIRPC :WARN :svc_vc_recv: 0x7f91c0861400 fd 21 recv errno 11 (try again) 2 176

The fix looks good. Thanks Bill.

Pradeep


Re: [Nfs-ganesha-devel] 'missed' recv with 2.6-rc2?

2018-01-27 Thread William Allen Simpson

On 1/27/18 9:56 AM, William Allen Simpson wrote:

I'm not able to reproduce.  Could you tell me which EAGAIN is
happening?  The log line will say "svc_vc_wait" or "svc_vc_recv",
and have the actual error code on it.  Maybe this is EWOULDBLOCK?

Of course, neither EAGAIN or EWOULDBLOCK should be happening on a
level triggered event.  But the old code had a log, so it's there.


I've stashed the patch on
  https://github.com/linuxbox2/ntirpc/tree/was16backport

Could you see whether this fixed it for you?

And report the log line(s)?  Is this happening often?



Re: [Nfs-ganesha-devel] 'missed' recv with 2.6-rc2?

2018-01-27 Thread William Allen Simpson

On 1/26/18 8:53 PM, William Allen Simpson wrote:

In fact, I don't understand how we could get EAGAIN, according to the
documentation.  But it's logged.  Good idea about differentiating the
two identical log lines.  I'd prefer text rather than the number 2.


And in the adjacent code, you'll see that I already had a text
differentiation.



I'll code it up, with acknowledgement.  Thanks again!


I'm not able to reproduce.  Could you tell me which EAGAIN is
happening?  The log line will say "svc_vc_wait" or "svc_vc_recv",
and have the actual error code on it.  Maybe this is EWOULDBLOCK?

Of course, neither EAGAIN nor EWOULDBLOCK should be happening on a
level-triggered event.  But the old code had a log, so it's there.



Re: [Nfs-ganesha-devel] 'missed' recv with 2.6-rc2?

2018-01-26 Thread William Allen Simpson

On 1/26/18 12:18 PM, Pradeep wrote:
> In svc_vc_recv(), we handle the case of incomplete receive by rearming the
> FD and returning ( if xd->sx_fbtbc is not zero). In the case of EAGAIN also
> shouldn't we be doing the same? epoll is ONESHOT; so new receives won't
> give new events until epoll_ctl() is called, right?
>
> I tried adding the rearming code in EAGAIN cases and was able run the test
> without receive hang.


I'm on PTO, but I'll look at this tomorrow.  So glad that somebody is
finally rigorously testing this code that was added half a year ago!

This may be some code left over from my tests with triggered (couldn't
get it to work) instead of one-shot.  Triggered should be faster, with
fewer system calls, a greater concern in today's MELTDOWN environment.

In fact, I don't understand how we could get EAGAIN, according to the
documentation.  But it's logged.  Good idea about differentiating the
two identical log lines.  I'd prefer text rather than the number 2.

I'll code it up, with acknowledgement.  Thanks again!



Re: [Nfs-ganesha-devel] 'missed' recv with 2.6-rc2?

2018-01-26 Thread Pradeep
Hi Dan,

In svc_vc_recv(), we handle an incomplete receive by rearming the FD and
returning (if xd->sx_fbtbc is not zero). Shouldn't we do the same in the
EAGAIN case? epoll is ONESHOT, so new receives won't generate new events
until epoll_ctl() is called, right?

I tried adding the rearming code in the EAGAIN cases and was able to run the
test without the receive hang.

diff --git a/src/svc_vc.c b/src/svc_vc.c
index f5377df..496444a 100644
--- a/src/svc_vc.c
+++ b/src/svc_vc.c
@@ -680,6 +680,12 @@ svc_vc_recv(SVCXPRT *xprt)
         code = errno;

         if (code == EAGAIN || code == EWOULDBLOCK) {
+            if (unlikely(svc_rqst_rearm_events(xprt))) {
+                __warnx(TIRPC_DEBUG_FLAG_ERROR,
+                    "%s: %p fd %d svc_rqst_rearm_events failed (will set dead)",
+                    __func__, xprt, xprt->xp_fd);
+                SVC_DESTROY(xprt);
+            }
             __warnx(TIRPC_DEBUG_FLAG_WARN,
                 "%s: %p fd %d recv errno %d (try again)",
                 "svc_vc_wait", xprt, xprt->xp_fd, code);
@@ -731,8 +737,14 @@ svc_vc_recv(SVCXPRT *xprt)
         code = errno;

         if (code == EAGAIN || code == EWOULDBLOCK) {
+            if (unlikely(svc_rqst_rearm_events(xprt))) {
+                __warnx(TIRPC_DEBUG_FLAG_ERROR,
+                    "%s: %p fd %d svc_rqst_rearm_events failed (will set dead)",
+                    __func__, xprt, xprt->xp_fd);
+                SVC_DESTROY(xprt);
+            }
             __warnx(TIRPC_DEBUG_FLAG_SVC_VC,
-                "%s: %p fd %d recv errno %d (try again)",
+                "%s: %p fd %d recv errno %d (try again) 2",
                 __func__, xprt, xprt->xp_fd, code);
             return SVC_STAT(xprt);



On Fri, Jan 26, 2018 at 6:24 AM, Matt Benjamin  wrote:

> Yes, I wasn't claiming there is anything missing.  Before 2.6, there
> was a rearm method being called.
>
> Matt
>
> On Fri, Jan 26, 2018 at 9:20 AM, Daniel Gryniewicz wrote:
> > I don't think you re-arm a FD in epoll.  You arm it once, and it fires
> > until you disarm it, as far as I know.  You just call epoll_wait() to
> > get new events.
> >
> > The thread model is a bit odd;  When the epoll fires, all the events are
> > found, and a thread is submitted for each one except one.  That one is
> > handled in the local thread (since it's expected that most epoll triggers
> > will have one event on them, thus using the current hot thread).  In
> > addition, a new thread is submitted to go back and wait for events, so
> > there's no delay handling new events.  So EAGAIN is handled by just
> > indicating this thread is done, and returning it to the thread pool.
> > When the socket is ready again, it will trigger a new event on the
> > thread waiting on the epoll.
> >
> > Bill, please correct me if I'm wrong.
> >
> > Daniel
> >
> >
> > On 01/25/2018 09:13 PM, Matt Benjamin wrote:
> >>
> >> Hmm.  We used to handle that ;)
> >>
> >> Matt
> >>
> >> On Thu, Jan 25, 2018 at 9:11 PM, Pradeep wrote:
> >>>
> >>> If recv() returns EAGAIN, then svc_vc_recv() returns without rearming
> >>> the epoll_fd. How does it get back to svc_vc_recv() again?
> >>>
> >>> On Wed, Jan 24, 2018 at 9:26 PM, Pradeep wrote:
> >>>>
> >>>> Hello,
> >>>>
> >>>> I seem to be hitting a corner case where ganesha (2.6-rc2) does not
> >>>> respond to a RENEW request from 4.0 client. Enabled the debug logs and
> >>>> noticed that NFS layer has not seen the RENEW request (I can see it in
> >>>> tcpdump).
> >>>>
> >>>> I collected netstat output periodically and found that there is a time
> >>>> window of ~60 sec where the receive buffer size remains the same. This
> >>>> means the RPC layer somehow missed a 'recv' call. Now if I enable debug
> >>>> on TIRPC, I can't reproduce the issue. Any pointers to potential races
> >>>> where I could enable selective prints would be helpful.
> >>>>
> >>>> svc_rqst_epoll_event() resets SVC_XPRT_FLAG_ADDED. Is it possible for
> >>>> another thread to svc_rqst_rearm_events()? In that case if
> >>>> svc_rqst_epoll_event() could reset the flag set by
> >>>> svc_rqst_rearm_events and complete the current receive before the
> >>>> other thread could call epoll_ctl(), right?
> >>>>
> >>>> Thanks,
> >>>> Pradeep

Re: [Nfs-ganesha-devel] 'missed' recv with 2.6-rc2?

2018-01-26 Thread Matt Benjamin
Yes, I wasn't claiming there is anything missing.  Before 2.6, there
was a rearm method being called.

Matt

On Fri, Jan 26, 2018 at 9:20 AM, Daniel Gryniewicz  wrote:
> I don't think you re-arm a FD in epoll.  You arm it once, and it fires until
> you disarm it, as far as I know.  You just call epoll_wait() to get new
> events.
>
> The thread model is a bit odd;  When the epoll fires, all the events are
> found, and a thread is submitted for each one except one.  That one is
> handled in the local thread (since it's expected that most epoll triggers
> will have one event on them, thus using the current hot thread).  In
> addition, a new thread is submitted to go back and wait for events, so
> there's no delay handling new events.  So EAGAIN is handled by just
> indicating this thread is done, and returning it to the thread pool.  When
> the socket is ready again, it will trigger a new event on the thread waiting
> on the epoll.
>
> Bill, please correct me if I'm wrong.
>
> Daniel
>
>
> On 01/25/2018 09:13 PM, Matt Benjamin wrote:
>>
>> Hmm.  We used to handle that ;)
>>
>> Matt
>>
>> On Thu, Jan 25, 2018 at 9:11 PM, Pradeep  wrote:
>>>
>>> If recv() returns EAGAIN, then svc_vc_recv() returns without rearming the
>>> epoll_fd. How does it get back to svc_vc_recv() again?
>>>
>>> On Wed, Jan 24, 2018 at 9:26 PM, Pradeep wrote:
>>>>
>>>> Hello,
>>>>
>>>> I seem to be hitting a corner case where ganesha (2.6-rc2) does not
>>>> respond to a RENEW request from 4.0 client. Enabled the debug logs and
>>>> noticed that NFS layer has not seen the RENEW request (I can see it in
>>>> tcpdump).
>>>>
>>>> I collected netstat output periodically and found that there is a time
>>>> window of ~60 sec where the receive buffer size remains the same. This
>>>> means the RPC layer somehow missed a 'recv' call. Now if I enable debug
>>>> on TIRPC, I can't reproduce the issue. Any pointers to potential races
>>>> where I could enable selective prints would be helpful.
>>>>
>>>> svc_rqst_epoll_event() resets SVC_XPRT_FLAG_ADDED. Is it possible for
>>>> another thread to svc_rqst_rearm_events()? In that case if
>>>> svc_rqst_epoll_event() could reset the flag set by svc_rqst_rearm_events
>>>> and complete the current receive before the other thread could call
>>>> epoll_ctl(), right?
>>>>
>>>> Thanks,
>>>> Pradeep



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] 'missed' recv with 2.6-rc2?

2018-01-26 Thread Daniel Gryniewicz
I don't think you re-arm a FD in epoll.  You arm it once, and it fires 
until you disarm it, as far as I know.  You just call epoll_wait() to 
get new events.


The thread model is a bit odd;  When the epoll fires, all the events are 
found, and a thread is submitted for each one except one.  That one is 
handled in the local thread (since it's expected that most epoll 
triggers will have one event on them, thus using the current hot 
thread).  In addition, a new thread is submitted to go back and wait for 
events, so there's no delay handling new events.  So EAGAIN is handled 
by just indicating this thread is done, and returning it to the thread 
pool.  When the socket is ready again, it will trigger a new event on 
the thread waiting on the epoll.


Bill, please correct me if I'm wrong.

Daniel

On 01/25/2018 09:13 PM, Matt Benjamin wrote:

Hmm.  We used to handle that ;)

Matt

On Thu, Jan 25, 2018 at 9:11 PM, Pradeep  wrote:

If recv() returns EAGAIN, then svc_vc_recv() returns without rearming the
epoll_fd. How does it get back to svc_vc_recv() again?

On Wed, Jan 24, 2018 at 9:26 PM, Pradeep  wrote:


Hello,

I seem to be hitting a corner case where ganesha (2.6-rc2) does not
respond to a RENEW request from 4.0 client. Enabled the debug logs and
noticed that NFS layer has not seen the RENEW request (I can see it in
tcpdump).

I collected netstat output periodically and found that there is a time
window of ~60 sec where the receive buffer size remains the same. This means
the RPC layer somehow missed a 'recv' call. Now if I enable debug on TIRPC,
I can't reproduce the issue. Any pointers to potential races where I could
enable selective prints would be helpful.

svc_rqst_epoll_event() resets SVC_XPRT_FLAG_ADDED. Is it possible for
another thread to svc_rqst_rearm_events()? In that case if
svc_rqst_epoll_event() could reset the flag set by svc_rqst_rearm_events and
complete the current receive before the other thread could call epoll_ctl(),
right?

Thanks,
Pradeep














Re: [Nfs-ganesha-devel] 'missed' recv with 2.6-rc2?

2018-01-25 Thread Matt Benjamin
Hmm.  We used to handle that ;)

Matt

On Thu, Jan 25, 2018 at 9:11 PM, Pradeep  wrote:
> If recv() returns EAGAIN, then svc_vc_recv() returns without rearming the
> epoll_fd. How does it get back to svc_vc_recv() again?
>
> On Wed, Jan 24, 2018 at 9:26 PM, Pradeep  wrote:
>>
>> Hello,
>>
>> I seem to be hitting a corner case where ganesha (2.6-rc2) does not
>> respond to a RENEW request from 4.0 client. Enabled the debug logs and
>> noticed that NFS layer has not seen the RENEW request (I can see it in
>> tcpdump).
>>
>> I collected netstat output periodically and found that there is a time
>> window of ~60 sec where the receive buffer size remains the same. This means
>> the RPC layer somehow missed a 'recv' call. Now if I enable debug on TIRPC,
>> I can't reproduce the issue. Any pointers to potential races where I could
>> enable selective prints would be helpful.
>>
>> svc_rqst_epoll_event() resets SVC_XPRT_FLAG_ADDED. Is it possible for
>> another thread to svc_rqst_rearm_events()? In that case if
>> svc_rqst_epoll_event() could reset the flag set by svc_rqst_rearm_events and
>> complete the current receive before the other thread could call epoll_ctl(),
>> right?
>>
>> Thanks,
>> Pradeep



-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309



Re: [Nfs-ganesha-devel] 'missed' recv with 2.6-rc2?

2018-01-25 Thread Pradeep
If recv() returns EAGAIN, then svc_vc_recv() returns without rearming the
epoll_fd. How does it get back to svc_vc_recv() again?

On Wed, Jan 24, 2018 at 9:26 PM, Pradeep  wrote:

> Hello,
>
> I seem to be hitting a corner case where ganesha (2.6-rc2) does not
> respond to a RENEW request from 4.0 client. Enabled the debug logs and
> noticed that NFS layer has not seen the RENEW request (I can see it in
> tcpdump).
>
> I collected netstat output periodically and found that there is a time
> window of ~60 sec where the receive buffer size remains the same. This
> means the RPC layer somehow missed a 'recv' call. Now if I enable debug on
> TIRPC, I can't reproduce the issue. Any pointers to potential races where I
> could enable selective prints would be helpful.
>
> svc_rqst_epoll_event() resets SVC_XPRT_FLAG_ADDED. Is it possible for
> another thread to svc_rqst_rearm_events()? In that case if
> svc_rqst_epoll_event() could reset the flag set by svc_rqst_rearm_events
> and complete the current receive before the other thread could call
> epoll_ctl(), right?
>
> Thanks,
> Pradeep
>


[Nfs-ganesha-devel] 'missed' recv with 2.6-rc2?

2018-01-24 Thread Pradeep
Hello,

I seem to be hitting a corner case where ganesha (2.6-rc2) does not respond
to a RENEW request from a 4.0 client. I enabled the debug logs and noticed
that the NFS layer never sees the RENEW request (I can see it in tcpdump).

I collected netstat output periodically and found a time window of ~60 sec
during which the receive buffer size stays the same. This means the RPC
layer somehow missed a 'recv' call. If I enable debug on TIRPC, I can't
reproduce the issue. Any pointers to potential races where I could enable
selective prints would be helpful.

svc_rqst_epoll_event() resets SVC_XPRT_FLAG_ADDED. Is it possible for
another thread to call svc_rqst_rearm_events() at the same time? In that
case, svc_rqst_epoll_event() could reset the flag set by
svc_rqst_rearm_events() and complete the current receive before the other
thread calls epoll_ctl(), right?

Thanks,
Pradeep