Re: Review Request 52989: HDFS goes down after installing cluster

2016-10-18 Thread Andrew Onischuk


> On Oct. 18, 2016, 4:50 p.m., Sid Wagle wrote:
> > 1. Agree with Sumit on increasing the timeout to 5 seconds, since we are 
> > killing the thread.
> > 2. Instead of this cryptic way of killing, is there a graceful way to stop, 
> > using a separate stoppable Thread extension that can be called from the 
> > parent?
> 
> Andrew Onischuk wrote:
> 1. Sure, this will go in as the second patch (already discussed with 
> Sumit).
> 2. We cannot do that, since we need to interrupt external commands such as 
> subprocess calls. A stoppable thread also won't interrupt loops, sleep 
> commands, etc. unless we add a check for the stop flag everywhere, in every 
> function that contains them. And even with those checks, a subprocess or 
> sleep call still wouldn't be interrupted.
> 
> Sid Wagle wrote:
> How about spawning a long-running subprocess that runs until we hit a bad 
> command, at which point we kill it and respawn it to execute the 
> non-blacklisted commands? This might need a separate queue for blacklisted 
> commands, which are retried for whitelisting without affecting the main 
> thread.

Good point. Will do.
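
For readers of the archive, the proposal quoted above could be sketched roughly as follows. This is not the patch under review: `RespawningExecutor`, `WORKER_SOURCE`, and the "sleep for N seconds" stand-in command are all hypothetical, the code is POSIX-only (it uses `select` on a pipe), and the separate retry queue for blacklisted commands is left out.

```python
import select
import subprocess
import sys

# Stand-in worker (hypothetical): each input line is a number of seconds of
# "work"; the worker answers "done" when it has finished.
WORKER_SOURCE = """
import sys, time
for line in sys.stdin:
    time.sleep(float(line))
    sys.stdout.write("done\\n")
    sys.stdout.flush()
"""


class RespawningExecutor:
    """One long-lived child process, killed and respawned when a command hangs."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.blacklist = set()
        self.spawn_count = 0
        self._spawn()

    def _spawn(self):
        self.child = subprocess.Popen(
            [sys.executable, "-u", "-c", WORKER_SOURCE],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE, bufsize=0)
        self.spawn_count += 1

    def _kill(self):
        self.child.kill()
        self.child.wait()
        self.child.stdin.close()
        self.child.stdout.close()

    def execute(self, name, work_seconds):
        if name in self.blacklist:
            return "skipped"
        self.child.stdin.write(("%s\n" % work_seconds).encode())
        # Wait for the answer, but no longer than the timeout.
        ready, _, _ = select.select(
            [self.child.stdout], [], [], self.timeout_seconds)
        if ready and self.child.stdout.readline():
            return "ok"
        # The command hung (or the child died): kill, blacklist, respawn.
        self._kill()
        self.blacklist.add(name)
        self._spawn()
        return "timeout"

    def close(self):
        self._kill()
```

Killing a child process, unlike killing a thread, also interrupts whatever sleep or external command it is blocked in, which is the point of the suggestion.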


- Andrew


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52989/#review153113
---


On Oct. 18, 2016, 3:09 p.m., Andrew Onischuk wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52989/
> ---
> 
> (Updated Oct. 18, 2016, 3:09 p.m.)
> 
> 
> Review request for Ambari and Dmitro Lisnichenko.
> 
> 
> Bugs: AMBARI-18629
> https://issues.apache.org/jira/browse/AMBARI-18629
> 
> 
> Repository: ambari
> 
> 
> Description
> ---
> 
> After cluster install, HDFS goes down after it is started.  
> Here is a repro cluster: 172.27.35.0 (with HTTPS)
> 
> Also, this cluster runs on all the configs mentioned in the Environment
> description.  
> No particular exceptions are found in the HDFS or Ambari server logs.
> 
> 
> Diffs
> ---
> 
>   ambari-agent/src/main/python/ambari_agent/ActionQueue.py c03ee4f 
>   ambari-common/src/main/python/ambari_commons/thread_utils.py PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/52989/diff/
> 
> 
> Testing
> ---
> 
> mvn clean test
> 
> 
> Thanks,
> 
> Andrew Onischuk
> 
>



Re: Review Request 52989: HDFS goes down after installing cluster

2016-10-18 Thread Andrew Onischuk


> On Oct. 18, 2016, 4:50 p.m., Sid Wagle wrote:
> > 1. Agree with Sumit on increasing the timeout to 5 seconds, since we are 
> > killing the thread.
> > 2. Instead of this cryptic way of killing, is there a graceful way to stop, 
> > using a separate stoppable Thread extension that can be called from the 
> > parent?

1. Sure, this will go in as the second patch (already discussed with Sumit).
2. We cannot do that, since we need to interrupt external commands such as 
subprocess calls. A stoppable thread also won't interrupt loops, sleep 
commands, etc. unless we add a check for the stop flag everywhere, in every 
function that contains them. And even with those checks, a subprocess or sleep 
call still wouldn't be interrupted.
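
To illustrate point 2, here is a minimal sketch of a cooperative stoppable thread. The names `StoppableThread` and `seconds_until_stopped` are made up for this example and are not part of the patch; `time.sleep` stands in for any blocking call, such as waiting on a subprocess.

```python
import threading
import time


class StoppableThread(threading.Thread):
    """Hypothetical cooperative thread: it exits only where it checks the flag."""

    def __init__(self, block_seconds):
        super().__init__()
        self.daemon = True
        self.block_seconds = block_seconds
        self.stop_requested = threading.Event()
        self.in_blocking_call = threading.Event()
        self.finished_at = None

    def stop(self):
        # Only sets a flag; nothing is interrupted.
        self.stop_requested.set()

    def run(self):
        while not self.stop_requested.is_set():
            self.in_blocking_call.set()
            time.sleep(self.block_seconds)  # stand-in for sleep or a subprocess wait
        self.finished_at = time.monotonic()


def seconds_until_stopped(block_seconds):
    """Request a stop while the thread is blocked; return how long it took to exit."""
    thread = StoppableThread(block_seconds)
    thread.start()
    thread.in_blocking_call.wait()
    requested_at = time.monotonic()
    thread.stop()
    thread.join()
    return thread.finished_at - requested_at
```

Calling `seconds_until_stopped(0.3)` returns roughly the full 0.3 seconds: the stop request takes effect only once the blocking call has returned on its own.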


- Andrew


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52989/#review153113
---





Re: Review Request 52989: HDFS goes down after installing cluster

2016-10-18 Thread Sid Wagle

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52989/#review153113
---



1. Agree with Sumit on increasing the timeout to 5 seconds, since we are 
killing the thread.
2. Instead of this cryptic way of killing, is there a graceful way to stop, 
using a separate stoppable Thread extension that can be called from the parent?

- Sid Wagle





Re: Review Request 52989: HDFS goes down after installing cluster

2016-10-18 Thread Sumit Mohanty

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52989/#review153097
---




ambari-agent/src/main/python/ambari_agent/ActionQueue.py (line 242)


Should the default value be 2 seconds? I was looking at runs yesterday, and the 
NN status check was taking more than 2 seconds; I assume that will translate 
to reporting the status as INSTALLED.

I wonder if 5 seconds is a better default. We should still log if a command 
takes more than 2 seconds.
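
A rough sketch of the suggestion above, assuming the status check is an external command. The function `run_status_check`, the constants, and the two stand-in commands are hypothetical; reporting INSTALLED on a timeout follows the assumption stated in the comment, not a confirmed agent behavior.

```python
import logging
import subprocess
import sys
import time

logger = logging.getLogger("status_check_sketch")

WARN_AFTER_SECONDS = 2       # log checks slower than this
DEFAULT_TIMEOUT_SECONDS = 5  # proposed default before the check is killed

# Stand-in commands (hypothetical) for a fast and a hanging status check.
FAST_CHECK = [sys.executable, "-c", "pass"]
SLOW_CHECK = [sys.executable, "-c", "import time; time.sleep(30)"]


def run_status_check(argv, timeout=DEFAULT_TIMEOUT_SECONDS,
                     warn_after=WARN_AFTER_SECONDS):
    """Return (status, elapsed_seconds) for a status command."""
    started = time.monotonic()
    try:
        completed = subprocess.run(argv, timeout=timeout)
        status = "STARTED" if completed.returncode == 0 else "INSTALLED"
    except subprocess.TimeoutExpired:
        # Assumption from the comment: a timed-out check reports INSTALLED.
        status = "INSTALLED"
    elapsed = time.monotonic() - started
    if elapsed > warn_after:
        logger.warning("Status check took %.1f s (warn threshold %s s)",
                       elapsed, warn_after)
    return status, elapsed
```

With these defaults, a check that takes 3 seconds is still reported normally but leaves a warning in the log.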


- Sumit Mohanty





Re: Review Request 52989: HDFS goes down after installing cluster

2016-10-18 Thread Vitalyi Brodetskyi

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52989/#review153095
---


Ship it!

- Vitalyi Brodetskyi

