Re: Slider AM fails to run when RM in HA setup fails over

2016-08-01 Thread Manoj Samel
Hello, I have uploaded the requested logs, configurations, and my observations on the logs, etc. to https://issues.apache.org/jira/browse/SLIDER-1158. I would greatly appreciate it if someone could take a look and provide any pointers on the Slider ticket I created and on what could be causing the observed behavior.

Re: Slider AM fails to run when RM in HA setup fails over

2016-07-29 Thread Manoj Samel
Hi, I have uploaded the config files; I hope these shed light on the TICKET authentication issue. As a side note, it seems that commands like "slider list --containers" are now **significantly** slower (compared with when slider-client.xml was not empty and had a few properties). The commands

Re: Slider AM fails to run when RM in HA setup fails over

2016-07-28 Thread Manoj Samel
Hi Gour, I added the following properties to /etc/hadoop/conf/yarn-site.xml, emptied /data/slider/conf/slider-client.xml, and restarted both RMs:
- hadoop.registry.zk.quorum
- hadoop.registry.zk.root
- slider.yarn.queue
Now there are no issues creating or destroying the cluster. This helps as

Re: Slider AM fails to run when RM in HA setup fails over

2016-07-28 Thread Manoj Samel
Thanks. I will test with the updated config and then upload the latest ones ... Thanks, Manoj On Thu, Jul 28, 2016 at 5:21 PM, Gour Saha wrote: > slider.zookeeper.quorum is deprecated and should not be used. > hadoop.registry.zk.quorum is used instead and is typically

Re: Slider AM fails to run when RM in HA setup fails over

2016-07-28 Thread Gour Saha
slider.zookeeper.quorum is deprecated and should not be used. hadoop.registry.zk.quorum is used instead and is typically defined in yarn-site.xml, as is hadoop.registry.zk.root. Specifying slider.yarn.queue at the cluster config level is discouraged. Ideally, it is best to specify the queue
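
Gour's advice (registry properties in yarn-site.xml under HADOOP_CONF_DIR, no slider.zookeeper.quorum) can be checked before launching Slider. The following is a minimal, hypothetical Python sketch, not part of Slider or Hadoop: it assumes the standard Hadoop XML layout (a <configuration> root with <property><name>/<value> children), and the helper names yarn_site_path, parse_hadoop_conf, read_hadoop_conf and check_registry_config are made up for illustration.

```python
import os
import xml.etree.ElementTree as ET

# Property names taken from the thread: the registry quorum/root that Slider
# should find via HADOOP_CONF_DIR, and the deprecated Slider-specific key.
REQUIRED = ("hadoop.registry.zk.quorum", "hadoop.registry.zk.root")
DEPRECATED = ("slider.zookeeper.quorum",)


def yarn_site_path(default_dir="/etc/hadoop/conf"):
    """Return the yarn-site.xml path under HADOOP_CONF_DIR (or default_dir if unset)."""
    return os.path.join(os.environ.get("HADOOP_CONF_DIR", default_dir), "yarn-site.xml")


def parse_hadoop_conf(xml_text):
    """Map each <property>'s <name> to its <value> in Hadoop-style XML (str or bytes)."""
    props = {}
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def read_hadoop_conf(path):
    """Read a config file as bytes so any XML encoding declaration is honoured."""
    with open(path, "rb") as f:
        return parse_hadoop_conf(f.read())


def check_registry_config(props):
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems = [f"missing {name}" for name in REQUIRED if not props.get(name)]
    problems += [f"deprecated {name} is set" for name in DEPRECATED if name in props]
    return problems
```

For example, check_registry_config(read_hadoop_conf(yarn_site_path())) returns an empty list when both registry properties are set and the deprecated key is absent. The sketch only inspects yarn-site.xml; it does not replicate how Hadoop merges its configuration files.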

Re: Slider AM fails to run when RM in HA setup fails over

2016-07-28 Thread Manoj Samel
The following Slider-specific properties are currently set in /data/slider/conf/slider-client.xml. If you think they should be picked up from HADOOP_CONF_DIR (/etc/hadoop/conf), which file in HADOOP_CONF_DIR should they be added to?
- slider.zookeeper.quorum
- hadoop.registry.zk.quorum

Re: Slider AM fails to run when RM in HA setup fails over

2016-07-28 Thread Gour Saha
That is strange, since slider-client.xml is indeed not required to contain anything (except ) if HADOOP_CONF_DIR has everything that Slider needs. This probably indicates that there might be some issue with the cluster configuration based solely on files under HADOOP_CONF_DIR to begin

Re: Slider AM fails to run when RM in HA setup fails over

2016-07-28 Thread Manoj Samel
Thanks, Gour, for the prompt reply. BTW, creating an empty slider-client.xml (with just ) does not work. The AM starts but fails to create any components and shows errors like:
2016-07-28 23:18:46,018 [AmExecutor-006-SendThread(localhost.localdomain:2181)] WARN zookeeper.ClientCnxn - Session 0x0 for

Re: Slider AM fails to run when RM in HA setup fails over

2016-07-28 Thread Gour Saha
No need to copy any files. Pointing HADOOP_CONF_DIR to /etc/hadoop/conf is good. -Gour On 7/28/16, 3:24 PM, "Manoj Samel" wrote: >Follow up question regarding Gour's comment in earlier thread - > >Slider is installed on one of the hadoop nodes. SLIDER_HOME/conf

Re: Slider AM fails to run when RM in HA setup fails over

2016-07-28 Thread Manoj Samel
Follow-up question regarding Gour's comment in the earlier thread: Slider is installed on one of the Hadoop nodes. The SLIDER_HOME/conf directory (say /data/slider/conf) is different from HADOOP_CONF_DIR (/etc/hadoop/conf). Is it required/recommended that files in HADOOP_CONF_DIR be copied to

Re: Slider AM fails to run when RM in HA setup fails over

2016-07-26 Thread Manoj Samel
Filed https://issues.apache.org/jira/browse/SLIDER-1158 with logs and my analysis of logs. On Tue, Jul 26, 2016 at 10:36 AM, Gour Saha wrote: > Please file a JIRA and upload the logs to it. > > On 7/26/16, 10:21 AM, "Manoj Samel" wrote: > > >Hi

Re: Slider AM fails to run when RM in HA setup fails over

2016-07-26 Thread Gour Saha
Please file a JIRA and upload the logs to it. On 7/26/16, 10:21 AM, "Manoj Samel" wrote: >Hi Gour, > >Can you please reach me using your own email-id? I will then send logs to >you, along with my analysis - I don't want to send logs on public list > >Thanks, > >On Mon,

Re: Slider AM fails to run when RM in HA setup fails over

2016-07-26 Thread Manoj Samel
Hi Gour, Can you please reach me using your own email ID? I will then send the logs to you, along with my analysis. I don't want to send logs to the public list. Thanks, On Mon, Jul 25, 2016 at 5:39 PM, Gour Saha wrote: > Ok, so this node is not a gateway. It is part of the

Re: Slider AM fails to run when RM in HA setup fails over

2016-07-25 Thread Gour Saha
Ok, so this node is not a gateway. It is part of the cluster, which means you don't need slider-client.xml at all. Just have HADOOP_CONF_DIR pointing to /etc/hadoop/conf in slider-env.sh and that should be it. So the above simplifies your config setup. It will not solve either of the 2 problems

Re: Slider AM fails to run when RM in HA setup fails over

2016-07-25 Thread Manoj Samel
1. I'm not clear about your question on the "gateway" node. The node running Slider is part of the Hadoop cluster, and other services that use HDFS and YARN, such as Oozie, run on this node. So if your question is whether the node is otherwise working for HDFS and YARN

Re: Slider AM fails to run when RM in HA setup fails over

2016-07-25 Thread Gour Saha
Is the node you are running Slider from a gateway node? Sorry for not being explicit. I meant: copy everything under /etc/hadoop/conf from your cluster into some temp directory (say /tmp/hadoop_conf) on your gateway node, local machine, or whichever node you are running Slider from. Then set

Re: Slider AM fails to run when RM in HA setup fails over

2016-07-25 Thread Manoj Samel
Hi Gour, Thanks for your prompt reply. FYI, the issue happens when I create the Slider app while rm1 is active and rm1 then fails over to rm2. As soon as rm2 becomes active, the Slider AM goes from the RUNNING to the ACCEPTED state with the above error. Following your suggestion, I did the following: 1) Copied core-site,

Re: Slider AM fails to run when RM in HA setup fails over

2016-07-25 Thread Gour Saha
If possible, can you copy the entire contents of the directory /etc/hadoop/conf and then set HADOOP_CONF_DIR in slider-env.sh to point to the copy? Keep slider-client.xml empty. Now, when you repeat the rm1->rm2 failover and then the reverse, do you see the same behavior? -Gour On 7/25/16, 2:28 PM, "Manoj

Re: Slider AM fails to run when RM in HA setup fails over

2016-07-25 Thread Manoj Samel
Another observation (for whatever it is worth): if the Slider app is created and started while rm2 is active, it seems to survive switches between rm2 and rm1 (and back), i.e.:
* rm2 is active
* create and start the Slider application
* fail over to rm1; the Slider AM keeps running
* fail over to rm2