Re: [Gluster-devel] GF_PARENT_DOWN on SIGKILL

2016-07-25 Thread Jeff Darcy
> As far as I know, there's no explicit guarantee on the order in which
> fini is called, so we cannot rely on it to do cleanup because ec needs
> all its underlying xlators to be fully functional to finish the cleanup.

What kind of cleanup are we talking about here?  We already need to
handle the case where an entire process or node disappears suddenly.
Can communicating peers handle it a bit more gracefully if they get
a message instead of having to wait for a ping timeout?  Perhaps, but
if that means creating a potential for our SIGTERM handler to be
blocked indefinitely then I'm not sure how useful it's going to be.
It's a bit ops-unfriendly, and will probably just get us back to the
same place when SIGTERM is followed by SIGKILL.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] GF_PARENT_DOWN on SIGKILL

2016-07-25 Thread Vijay Bellur

On 07/25/2016 02:41 AM, Xavier Hernandez wrote:

> Hi Jeff,
>
> On 22/07/16 15:37, Jeff Darcy wrote:
>>> Gah! sorry sorry, I meant to send the mail as SIGTERM. Not SIGKILL. So xavi
>>> and I were wondering why cleanup_and_exit() is not sending GF_PARENT_DOWN
>>> event.
>>
>> OK, then that grinding sound you hear is my brain shifting gears.  ;)  It
>> seems that cleanup_and_exit will call xlator.fini in some few cases, but
>> it doesn't do anything that would send notify events.  I'll bet the answer
>> to "why" is just that nobody thought of it or got around to it.  The next
>> question I'd ask is: can you do what you need to do from ec.fini instead?
>> That would require enabling it in should_call_fini as well, but otherwise
>> seems pretty straightforward.
>
> As far as I know, there's no explicit guarantee on the order in which
> fini is called, so we cannot rely on it to do cleanup because ec needs
> all its underlying xlators to be fully functional to finish the cleanup.
>
> If this can be explicitly enforced and maintained, I think it could be
> moved but with some tricks, since fini is expected to be a synchronous
> operation and the ec cleanup is asynchronous.
>
>> If the answer to that question is no, then things get more complicated.
>> Can we do one loop that sends GF_EVENT_PARENT_DOWN events, then another
>> that calls fini?  Can we just do a basic list traversal (as we do now for
>> fini) or do we need to do something more complicated to deal with cluster
>> translators?  I think a separate loop doing basic list traversal would
>> work, even with brick multiplexing, so it's probably worth just coding it
>> up as an experiment.
>
> The main "difficulty" here is the asynchronous behavior of the cleanup.
> Nothing else can be shut down until the cleanup finishes.
>
> Maybe the GF_EVENT_PARENT_DOWN should account for this
> asynchronous/delayed operation, while the fini should be kept as a
> synchronous cleanup and resource release operation.



+1. GF_EVENT_PARENT_DOWN or similar can let translators know that we are
winding down. Once the translators are done with their respective asynchronous
operations, they would need to acknowledge that they are ready for fini().
Once all translators ack, we could go about invoking fini() as the final
cleanup.
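
A minimal single-threaded sketch of this ack-then-fini idea, under loud assumptions: ack_graph, ack_xlator, ack_begin_shutdown() and ack_ready() are hypothetical names invented for illustration and are not part of GlusterFS. A real implementation would presumably need locking or atomics around the pending counter, since acks would arrive from asynchronous callbacks.

```c
#include <stddef.h>

/* Hypothetical stand-ins, NOT the real GlusterFS xlator_t or graph types. */
struct ack_xlator {
    const char *name;
    int acked;       /* set once this xlator's asynchronous work has drained */
    int finalized;   /* set once the synchronous fini step has run */
};

struct ack_graph {
    struct ack_xlator *xlators;
    size_t count;
    size_t pending;      /* xlators that still owe an ack */
    int winding_down;    /* set once the wind-down has been announced */
};

/* Phase 1: announce the wind-down; every xlator now owes one ack. */
void ack_begin_shutdown(struct ack_graph *g)
{
    size_t i;

    g->winding_down = 1;
    g->pending = g->count;
    for (i = 0; i < g->count; i++) {
        g->xlators[i].acked = 0;
        g->xlators[i].finalized = 0;
    }
}

/* Phase 2: the final synchronous cleanup, run only after the last ack. */
static void ack_fini_all(struct ack_graph *g)
{
    size_t i;

    for (i = 0; i < g->count; i++)
        g->xlators[i].finalized = 1;
}

/*
 * Called on behalf of xlator `idx` when its asynchronous cleanup is done.
 * Duplicate acks and acks before the wind-down are ignored.
 * Returns 1 if this was the last outstanding ack (fini has run), else 0.
 */
int ack_ready(struct ack_graph *g, size_t idx)
{
    if (!g->winding_down || idx >= g->count || g->xlators[idx].acked)
        return 0;

    g->xlators[idx].acked = 1;
    if (--g->pending > 0)
        return 0;

    ack_fini_all(g);
    return 1;
}
```

The point of the counter is that fini never starts while any translator still has delayed work in flight, which keeps fini itself synchronous. Bounding how long the process waits for the last ack is a separate question that this sketch does not address.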


-Vijay



Re: [Gluster-devel] GF_PARENT_DOWN on SIGKILL

2016-07-25 Thread Xavier Hernandez

Hi Jeff,

On 22/07/16 15:37, Jeff Darcy wrote:
>> Gah! sorry sorry, I meant to send the mail as SIGTERM. Not SIGKILL. So xavi
>> and I were wondering why cleanup_and_exit() is not sending GF_PARENT_DOWN
>> event.
>
> OK, then that grinding sound you hear is my brain shifting gears.  ;)  It
> seems that cleanup_and_exit will call xlator.fini in some few cases, but
> it doesn't do anything that would send notify events.  I'll bet the answer
> to "why" is just that nobody thought of it or got around to it.  The next
> question I'd ask is: can you do what you need to do from ec.fini instead?
> That would require enabling it in should_call_fini as well, but otherwise
> seems pretty straightforward.

As far as I know, there's no explicit guarantee on the order in which
fini is called, so we cannot rely on it to do cleanup because ec needs
all its underlying xlators to be fully functional to finish the cleanup.

If this can be explicitly enforced and maintained, I think it could be
moved but with some tricks, since fini is expected to be a synchronous
operation and the ec cleanup is asynchronous.

> If the answer to that question is no, then things get more complicated.
> Can we do one loop that sends GF_EVENT_PARENT_DOWN events, then another
> that calls fini?  Can we just do a basic list traversal (as we do now for
> fini) or do we need to do something more complicated to deal with cluster
> translators?  I think a separate loop doing basic list traversal would
> work, even with brick multiplexing, so it's probably worth just coding it
> up as an experiment.

The main "difficulty" here is the asynchronous behavior of the cleanup.
Nothing else can be shut down until the cleanup finishes.

Maybe the GF_EVENT_PARENT_DOWN should account for this
asynchronous/delayed operation, while the fini should be kept as a
synchronous cleanup and resource release operation.


Xavi




Re: [Gluster-devel] GF_PARENT_DOWN on SIGKILL

2016-07-22 Thread Pranith Kumar Karampuri
On Fri, Jul 22, 2016 at 7:07 PM, Jeff Darcy  wrote:

> > Gah! sorry sorry, I meant to send the mail as SIGTERM. Not SIGKILL. So xavi
> > and I were wondering why cleanup_and_exit() is not sending GF_PARENT_DOWN
> > event.
>
> OK, then that grinding sound you hear is my brain shifting gears.  ;)  It
> seems that cleanup_and_exit will call xlator.fini in some few cases, but
> it doesn't do anything that would send notify events.  I'll bet the answer
> to "why" is just that nobody thought of it or got around to it.  The next
> question I'd ask is: can you do what you need to do from ec.fini instead?
> That would require enabling it in should_call_fini as well, but otherwise
> seems pretty straightforward.
>
> If the answer to that question is no, then things get more complicated.
> Can we do one loop that sends GF_EVENT_PARENT_DOWN events, then another
> that calls fini?  Can we just do a basic list traversal (as we do now for
> fini) or do we need to do something more complicated to deal with cluster
> translators?  I think a separate loop doing basic list traversal would
> work, even with brick multiplexing, so it's probably worth just coding it
> up as an experiment.
>

I don't think we need any list traversal because notify sends it down the
graph. I guess I will start the experiment then.
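
A toy illustration of why no separate list traversal would be needed if notify forwards the event to children. Everything here is hypothetical (toy_node, toy_parent_down(), the fixed child array); it only models the idea of an event travelling down a tree and is not the GlusterFS notify code.

```c
#include <stddef.h>

#define TOY_MAX_CHILDREN 4

/* Hypothetical stand-in for a translator in the graph, NOT the real xlator_t. */
struct toy_node {
    const char *name;
    struct toy_node *children[TOY_MAX_CHILDREN];
    size_t child_count;
    int parent_down_seen;   /* how many times PARENT_DOWN reached this node */
};

/*
 * Deliver PARENT_DOWN to `node`, then forward it to each child, the way a
 * notify handler that passes the event downwards would.
 * Returns the number of deliveries made in this subtree.
 */
size_t toy_parent_down(struct toy_node *node)
{
    size_t reached = 1;
    size_t i;

    node->parent_down_seen++;
    for (i = 0; i < node->child_count; i++)
        reached += toy_parent_down(node->children[i]);
    return reached;
}
```

Sending the event once at the top reaches every node below it. The sketch assumes a tree; a node with two parents would be visited twice, which a real handler would have to tolerate or guard against.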


-- 
Pranith

Re: [Gluster-devel] GF_PARENT_DOWN on SIGKILL

2016-07-22 Thread Jeff Darcy
> Gah! sorry sorry, I meant to send the mail as SIGTERM. Not SIGKILL. So xavi
> and I were wondering why cleanup_and_exit() is not sending GF_PARENT_DOWN
> event.

OK, then that grinding sound you hear is my brain shifting gears.  ;)  It
seems that cleanup_and_exit will call xlator.fini in some few cases, but
it doesn't do anything that would send notify events.  I'll bet the answer
to "why" is just that nobody thought of it or got around to it.  The next
question I'd ask is: can you do what you need to do from ec.fini instead?
That would require enabling it in should_call_fini as well, but otherwise
seems pretty straightforward.

If the answer to that question is no, then things get more complicated.
Can we do one loop that sends GF_EVENT_PARENT_DOWN events, then another
that calls fini?  Can we just do a basic list traversal (as we do now for
fini) or do we need to do something more complicated to deal with cluster
translators?  I think a separate loop doing basic list traversal would
work, even with brick multiplexing, so it's probably worth just coding it
up as an experiment.
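
For concreteness, a rough sketch of the two-loop experiment over a flat list. All names (toy_xlator, toy_shutdown(), TOY_EVENT_PARENT_DOWN, the recording callbacks) are made up for illustration and are not the real GlusterFS types or event codes; the step counter exists only so the ordering can be observed.

```c
#include <stddef.h>

/* Hypothetical stand-ins, NOT the real GlusterFS xlator_t or event codes. */
enum toy_event { TOY_EVENT_PARENT_DOWN = 1 };

struct toy_xlator {
    const char *name;
    struct toy_xlator *next;    /* flat list of all xlators */
    void (*notify)(struct toy_xlator *self, enum toy_event ev);
    void (*fini)(struct toy_xlator *self);
    int notified_at;            /* step at which PARENT_DOWN arrived, 0 = never */
    int fini_at;                /* step at which fini ran, 0 = never */
};

int toy_clock = 0;              /* global step counter used only for tracing */

/* Example callbacks that just record when they were invoked. */
void toy_record_notify(struct toy_xlator *self, enum toy_event ev)
{
    if (ev == TOY_EVENT_PARENT_DOWN)
        self->notified_at = ++toy_clock;
}

void toy_record_fini(struct toy_xlator *self)
{
    self->fini_at = ++toy_clock;
}

/*
 * Two-pass shutdown: every xlator hears PARENT_DOWN before any xlator's
 * fini runs, so none is torn down while another may still depend on it.
 */
void toy_shutdown(struct toy_xlator *first)
{
    struct toy_xlator *xl;

    for (xl = first; xl != NULL; xl = xl->next)   /* pass 1: announce */
        if (xl->notify != NULL)
            xl->notify(xl, TOY_EVENT_PARENT_DOWN);

    for (xl = first; xl != NULL; xl = xl->next)   /* pass 2: release */
        if (xl->fini != NULL)
            xl->fini(xl);
}
```

Note that this only orders the two passes. If a translator's reaction to PARENT_DOWN is asynchronous, pass 2 would still need some way to know that the work started in pass 1 has finished.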


Re: [Gluster-devel] GF_PARENT_DOWN on SIGKILL

2016-07-22 Thread Pranith Kumar Karampuri
http://review.gluster.org/14980 has all the context about why I sent out this
mail. Basically, the tests were failing because umount was racing with the
version-update xattrop. While I fixed the test to handle that race, xavi was
wondering why the GF_PARENT_DOWN event didn't come. I found that in
cleanup_and_exit() we don't send this event; we only call fini(). So I'm
wondering if anyone knows why this is so.

On Fri, Jul 22, 2016 at 6:37 PM, Pranith Kumar Karampuri <
pkara...@redhat.com> wrote:

> It is only calling fini(); apart from that, not much.
>
> On Fri, Jul 22, 2016 at 6:36 PM, Pranith Kumar Karampuri <
> pkara...@redhat.com> wrote:
>
>> Gah! sorry sorry, I meant to send the mail as SIGTERM. Not SIGKILL. So
>> xavi and I were wondering why cleanup_and_exit() is not sending
>> GF_PARENT_DOWN event.
>>
>> On Fri, Jul 22, 2016 at 6:24 PM, Jeff Darcy  wrote:
>>
>>> > Does anyone know why GF_PARENT_DOWN is not triggered on SIGKILL? It will give
>>> > a chance for xlators to do any cleanup they need to do. For example ec can
>>> > complete the delayed xattrops.
>>>
>>> Nothing is triggered on SIGKILL.  SIGKILL is explicitly defined to terminate a
>>> process *immediately*.  Among other things, this means it can not be ignored or
>>> caught, to preclude handlers doing something that might delay termination.
>>>
>>> http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04
>>>
>>> Since at least 4.2BSD and SVr2 (the first version of UNIX that I worked on)
>>> there have even been distinct kernel code paths to ensure special handling of
>>> SIGKILL.  There's nothing we can do about SIGKILL except be prepared to deal
>>> with it the same way we'd deal with the entire machine crashing.
>>>
>>> If you mean why is there nothing we can do on a *server* in response to
>>> SIGKILL on a *client*, that's a slightly more interesting question.  It's
>>> possible that the unique nature of SIGKILL puts connections into a
>>> different state than either system failure (on the more abrupt side) or
>>> clean shutdown (less abrupt).  If so, we probably need to take a look at
>>> the socket/RPC code or perhaps even protocol/server to see why these
>>> connections are not being cleaned up and shut down in a timely fashion.
>>
>> --
>> Pranith
>
> --
> Pranith



-- 
Pranith

Re: [Gluster-devel] GF_PARENT_DOWN on SIGKILL

2016-07-22 Thread Pranith Kumar Karampuri
It is only calling fini(); apart from that, not much.

On Fri, Jul 22, 2016 at 6:36 PM, Pranith Kumar Karampuri <
pkara...@redhat.com> wrote:

> Gah! sorry sorry, I meant to send the mail as SIGTERM. Not SIGKILL. So
> xavi and I were wondering why cleanup_and_exit() is not sending
> GF_PARENT_DOWN event.
>
> On Fri, Jul 22, 2016 at 6:24 PM, Jeff Darcy  wrote:
>
>> > Does anyone know why GF_PARENT_DOWN is not triggered on SIGKILL? It will give
>> > a chance for xlators to do any cleanup they need to do. For example ec can
>> > complete the delayed xattrops.
>>
>> Nothing is triggered on SIGKILL.  SIGKILL is explicitly defined to terminate a
>> process *immediately*.  Among other things, this means it can not be ignored or
>> caught, to preclude handlers doing something that might delay termination.
>>
>> http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04
>>
>> Since at least 4.2BSD and SVr2 (the first version of UNIX that I worked on)
>> there have even been distinct kernel code paths to ensure special handling of
>> SIGKILL.  There's nothing we can do about SIGKILL except be prepared to deal
>> with it the same way we'd deal with the entire machine crashing.
>>
>> If you mean why is there nothing we can do on a *server* in response to
>> SIGKILL on a *client*, that's a slightly more interesting question.  It's
>> possible that the unique nature of SIGKILL puts connections into a
>> different state than either system failure (on the more abrupt side) or
>> clean shutdown (less abrupt).  If so, we probably need to take a look at
>> the socket/RPC code or perhaps even protocol/server to see why these
>> connections are not being cleaned up and shut down in a timely fashion.
>>
>
>
>
> --
> Pranith
>



-- 
Pranith

Re: [Gluster-devel] GF_PARENT_DOWN on SIGKILL

2016-07-22 Thread Pranith Kumar Karampuri
Gah! sorry sorry, I meant to send the mail as SIGTERM. Not SIGKILL. So xavi
and I were wondering why cleanup_and_exit() is not sending GF_PARENT_DOWN
event.

On Fri, Jul 22, 2016 at 6:24 PM, Jeff Darcy  wrote:

> > Does anyone know why GF_PARENT_DOWN is not triggered on SIGKILL? It will give
> > a chance for xlators to do any cleanup they need to do. For example ec can
> > complete the delayed xattrops.
>
> Nothing is triggered on SIGKILL.  SIGKILL is explicitly defined to terminate a
> process *immediately*.  Among other things, this means it can not be ignored or
> caught, to preclude handlers doing something that might delay termination.
>
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04
>
> Since at least 4.2BSD and SVr2 (the first version of UNIX that I worked on)
> there have even been distinct kernel code paths to ensure special handling of
> SIGKILL.  There's nothing we can do about SIGKILL except be prepared to deal
> with it the same way we'd deal with the entire machine crashing.
>
> If you mean why is there nothing we can do on a *server* in response to
> SIGKILL on a *client*, that's a slightly more interesting question.  It's
> possible that the unique nature of SIGKILL puts connections into a
> different state than either system failure (on the more abrupt side) or
> clean shutdown (less abrupt).  If so, we probably need to take a look at
> the socket/RPC code or perhaps even protocol/server to see why these
> connections are not being cleaned up and shut down in a timely fashion.
>



-- 
Pranith

Re: [Gluster-devel] GF_PARENT_DOWN on SIGKILL

2016-07-22 Thread Jeff Darcy
> Does anyone know why GF_PARENT_DOWN is not triggered on SIGKILL? It will give
> a chance for xlators to do any cleanup they need to do. For example ec can
> complete the delayed xattrops.

Nothing is triggered on SIGKILL.  SIGKILL is explicitly defined to terminate a
process *immediately*.  Among other things, this means it can not be ignored or
caught, to preclude handlers doing something that might delay termination.

http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04

Since at least 4.2BSD and SVr2 (the first version of UNIX that I worked on)
there have even been distinct kernel code paths to ensure special handling of
SIGKILL.  There's nothing we can do about SIGKILL except be prepared to deal
with it the same way we'd deal with the entire machine crashing.
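
A small check of the "can not be caught" part, using only POSIX sigaction(). The helper name try_install_handler() is made up for this example; the behavior it probes is standard: installing a handler for SIGTERM succeeds, while the same call for SIGKILL is rejected with EINVAL.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <errno.h>
#include <signal.h>
#include <string.h>

/* Empty handler; it only has to exist so sigaction() has something to install. */
static void on_signal(int signo)
{
    (void)signo;
}

/*
 * Try to install on_signal() as the handler for `signo`.
 * Returns 0 on success, or the errno value reported by sigaction().
 */
int try_install_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);

    if (sigaction(signo, &sa, NULL) != 0)
        return errno;
    return 0;
}
```

Calling try_install_handler(SIGTERM) returns 0, and try_install_handler(SIGKILL) returns EINVAL, so any cleanup hook has to hang off SIGTERM (or similar) and the SIGKILL case has to be treated like a crash.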

If you mean why is there nothing we can do on a *server* in response to
SIGKILL on a *client*, that's a slightly more interesting question.  It's
possible that the unique nature of SIGKILL puts connections into a
different state than either system failure (on the more abrupt side) or
clean shutdown (less abrupt).  If so, we probably need to take a look at
the socket/RPC code or perhaps even protocol/server to see why these
connections are not being cleaned up and shut down in a timely fashion.