[GitHub] [hbase] wchevreuil commented on pull request #2255: HBASE-24877 Add option to avoid aborting RS process upon uncaught exc…

2020-09-04 Thread GitBox


wchevreuil commented on pull request #2255:
URL: https://github.com/apache/hbase/pull/2255#issuecomment-687251804


   The latest UT failure seems unrelated; I have it passing locally.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hbase] wchevreuil commented on pull request #2255: HBASE-24877 Add option to avoid aborting RS process upon uncaught exc…

2020-08-29 Thread GitBox


wchevreuil commented on pull request #2255:
URL: https://github.com/apache/hbase/pull/2255#issuecomment-683274583


   It looks like this is causing some of the UTs to time out. Let me dig into it 
further.







[GitHub] [hbase] wchevreuil commented on pull request #2255: HBASE-24877 Add option to avoid aborting RS process upon uncaught exc…

2020-08-26 Thread GitBox


wchevreuil commented on pull request #2255:
URL: https://github.com/apache/hbase/pull/2255#issuecomment-681034324


   retest build







[GitHub] [hbase] wchevreuil commented on pull request #2255: HBASE-24877 Add option to avoid aborting RS process upon uncaught exc…

2020-08-26 Thread GitBox


wchevreuil commented on pull request #2255:
URL: https://github.com/apache/hbase/pull/2255#issuecomment-680766011


   retest build







[GitHub] [hbase] wchevreuil commented on pull request #2255: HBASE-24877 Add option to avoid aborting RS process upon uncaught exc…

2020-08-25 Thread GitBox


wchevreuil commented on pull request #2255:
URL: https://github.com/apache/hbase/pull/2255#issuecomment-679884736


   Pushed a new commit addressing the latest suggestion and the checkstyle issues.







[GitHub] [hbase] wchevreuil commented on pull request #2255: HBASE-24877 Add option to avoid aborting RS process upon uncaught exc…

2020-08-24 Thread GitBox


wchevreuil commented on pull request #2255:
URL: https://github.com/apache/hbase/pull/2255#issuecomment-679087403


   Thanks for the suggestions, @Apache9. I have pushed a new commit addressing 
those; let me know your thoughts.







[GitHub] [hbase] wchevreuil commented on pull request #2255: HBASE-24877 Add option to avoid aborting RS process upon uncaught exc…

2020-08-14 Thread GitBox


wchevreuil commented on pull request #2255:
URL: https://github.com/apache/hbase/pull/2255#issuecomment-673960675


   > What's next if we ignore the exception? We will retry later? Or we will 
just go on without this replication source? 
   
   As you can see in `ReplicationSource.startup`, it keeps retrying until 
`initialize` completes without throwing an uncaught exception.
   
   > Users will then find out that the cluster is fine but data has not been 
replicated out?
   
   It's common practice to verify replication status after maintenance.
   
   >  I'm not sure if this is correct way, we fix an issue but introduce 
another hard to find issue?
   
   It does not fail silently: errors get logged, and operators get the chance 
to investigate what is going wrong without complete downtime of their source 
clusters.
   
   > Adding a flag can keep the old behavior but we give users an impression 
that the exception can be ignored? Still not sure if this is the correct way to 
fix this... 
   > Mind explaining more on your real usage?
   
   We use some custom replication endpoints that, when some target peer hosts 
became unavailable, ended up throwing uncaught exceptions and aborting the 
source RSes. Sure, the custom code could be improved, and it was an internal 
infra issue, but with a flag like this we would not have to face an outage 
period at the source.
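
The behavior discussed above (retry `initialize` until it succeeds, or abort the process when the flag says so) could be sketched roughly as follows. This is only an illustration, not the actual `ReplicationSource` code: the class `RetryingStartupSketch`, the `Aborter` interface, and the `abortOnError` and `retrySleepMillis` parameters are hypothetical names made up for the example, and the real configuration property introduced by the PR is not shown in this thread.

```java
/** Hypothetical sketch only; this is not the HBase ReplicationSource implementation. */
final class RetryingStartupSketch {

  /** Stand-in for whatever aborts the hosting process (hypothetical). */
  interface Aborter {
    void abort(String reason, Throwable cause);
  }

  /**
   * Calls initialize until it completes without throwing.
   * Returns the number of attempts on success, or -1 if the loop gave up
   * (abort was requested, or the thread was interrupted while waiting).
   */
  static int startup(Runnable initialize, boolean abortOnError,
      Aborter aborter, long retrySleepMillis) {
    int attempts = 0;
    while (true) {
      attempts++;
      try {
        initialize.run();
        return attempts;
      } catch (RuntimeException e) {
        if (abortOnError) {
          // Old behavior: an uncaught exception takes the whole process down.
          aborter.abort("Unexpected exception during initialization", e);
          return -1;
        }
        // Flag disabled: the error is logged, not swallowed, and we retry.
        System.err.println("initialize failed on attempt " + attempts
            + ", retrying: " + e);
        try {
          Thread.sleep(retrySleepMillis);
        } catch (InterruptedException ie) {
          // Preserve the interrupt status and stop retrying.
          Thread.currentThread().interrupt();
          return -1;
        }
      }
    }
  }
}
```

With `abortOnError` set to `false`, a transient failure in a custom endpoint costs only retries and log noise on that one source, which is the trade-off argued for above; operators still need to check replication status, since nothing is replicated while the loop is retrying.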



