[jira] [Commented] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token

2015-11-14 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15005342#comment-15005342
 ] 

Varun Saxena commented on YARN-4183:


bq. I feel configuring {{yarn.resourcemanager.system-metrics-publisher.enabled}} 
is sufficient.
Agreed. Enabling the system metrics publisher should be considered enough to 
publish events from the RM.

bq. As far as I see it, the name {{yarn.timeline-service.enabled}} is 
misleading; it should rather signify that the client requires the timeline 
service's delegation token.
Maybe we can use the version config to decide whether we have to fetch a token 
or not (in addition to the timeline service enabled config?).
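A minimal sketch of the idea in the last comment, with a plain Map standing in for Hadoop's Configuration: the client fetches a timeline delegation token only if the timeline service is enabled and the configured version calls for a token. The key yarn.timeline-service.version, its 1.0 default, and the rule that only 1.x versions need the token are assumptions for illustration; the comment does not specify them.

```java
import java.util.HashMap;
import java.util.Map;

public class TimelineTokenDecision {
    // Key named in this thread.
    static final String TIMELINE_ENABLED = "yarn.timeline-service.enabled";
    // Assumed key; the comment only says "the version config".
    static final String TIMELINE_VERSION = "yarn.timeline-service.version";

    // Hypothetical rule: fetch a timeline delegation token only when the timeline
    // service is enabled AND the configured version is one that needs the token
    // (assumed here to be any 1.x version, with 1.0 as the default).
    static boolean shouldFetchTimelineToken(Map<String, String> conf) {
        boolean enabled = Boolean.parseBoolean(conf.getOrDefault(TIMELINE_ENABLED, "false"));
        if (!enabled) {
            return false;
        }
        float version = Float.parseFloat(conf.getOrDefault(TIMELINE_VERSION, "1.0"));
        return version >= 1.0f && version < 2.0f;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(shouldFetchTimelineToken(conf)); // false: timeline disabled
        conf.put(TIMELINE_ENABLED, "true");
        System.out.println(shouldFetchTimelineToken(conf)); // true: version defaults to 1.0
        conf.put(TIMELINE_VERSION, "2.0");
        System.out.println(shouldFetchTimelineToken(conf)); // false: version 2.0
    }
}
```

Keeping the decision in one predicate means the RM-side publishing check and the client-side token check can evolve independently, which is the decoupling the discussion is after.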

> Enabling generic application history forces every job to get a timeline 
> service delegation token
> 
>
> Key: YARN-4183
> URL: https://issues.apache.org/jira/browse/YARN-4183
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: YARN-4183.1.patch
>
>
> When only the Generic History Server is enabled and not the timeline server, 
> the system metrics publisher does not publish events to the timeline store, 
> because it checks that both the timeline server and the system metrics 
> publisher are enabled before creating a timeline client.
> Making it work requires turning on the timeline service flag, which forces 
> every YARN application to get a delegation token.
> Instead of checking whether the timeline service is enabled, we should check 
> whether the application history server is enabled.
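The difference between the current and the proposed check can be sketched as two predicates over the configuration, with a plain Map standing in for Hadoop's Configuration. The generic history key name (yarn.timeline-service.generic-application-history.enabled) is our assumption, and the method names are hypothetical, not the ResourceManager's actual code:

```java
import java.util.HashMap;
import java.util.Map;

public class PublisherCheckSketch {
    static final String TIMELINE_ENABLED = "yarn.timeline-service.enabled";
    static final String SMP_ENABLED = "yarn.resourcemanager.system-metrics-publisher.enabled";
    // Assumed key name for the generic application history flag.
    static final String AHS_ENABLED = "yarn.timeline-service.generic-application-history.enabled";

    static boolean flag(Map<String, String> conf, String key) {
        return Boolean.parseBoolean(conf.getOrDefault(key, "false"));
    }

    // Check as the issue describes it: publishing requires the timeline service
    // flag, which also makes every application fetch a delegation token.
    static boolean currentCheck(Map<String, String> conf) {
        return flag(conf, TIMELINE_ENABLED) && flag(conf, SMP_ENABLED);
    }

    // Check proposed in the issue: use the application history flag instead.
    static boolean proposedCheck(Map<String, String> conf) {
        return flag(conf, AHS_ENABLED) && flag(conf, SMP_ENABLED);
    }

    public static void main(String[] args) {
        // Generic history on, timeline service (and therefore token fetching) off.
        Map<String, String> conf = new HashMap<>();
        conf.put(AHS_ENABLED, "true");
        conf.put(SMP_ENABLED, "true");
        System.out.println(currentCheck(conf));  // false: no timeline client is created
        System.out.println(proposedCheck(conf)); // true
    }
}
```

The comment at the top of this thread goes one step further and suggests the system metrics publisher flag alone should be enough; that variant would simply return flag(conf, SMP_ENABLED).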



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2859) ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster

2015-11-14 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15005746#comment-15005746
 ] 

Naganarasimha G R commented on YARN-2859:
-

Hi [~vinodkv] & [~sjlee0],
Though the addendum patch fixes the TestDistributedShell issue in 2.7.2, it has 
side effects on the ATSv2 branch. On further checking, I realized that in trunk 
and 2.7.2, {{yarn.resourcemanager.system-metrics-publisher.enabled}} was not set 
to true in TestDistributedShell.setupInternal, but it is required to be set in 
the ATSv2 branch.
While trying to rectify this, I faced the following issues:
# In MiniYARNCluster, the RM service wrapper is added first and then the AHS 
wrapper, and the actual AHS service is started in a separate thread. So the RM 
will use the wrong timeline client address (port zero), because the AHS service 
is not yet initialized.
# The URI for the timeline REST service is set in TimelineClientImpl's 
*serviceInit*. So even if we fix the service order (as per the previous step), 
the RM's SMP will fail to publish, because the timeline web address is known 
only after the AHS service has started.
Even after this (though I got the right port), I was still facing some issues.
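To make the ordering problem in the two points above concrete, here is a minimal sketch with hypothetical classes (not the real TimelineClientImpl or MiniYARNCluster): a client that builds its REST URI during init keeps port 0 if it is initialised before the AHS records its bound port, while one that reads the address lazily sees the updated value. The config key, the /ws/v1/timeline/ path and the port 43210 are used for illustration only, and as the comment notes, getting the right port alone did not resolve every problem.

```java
import java.util.HashMap;
import java.util.Map;

public class InitOrderSketch {
    static final String WEBAPP_ADDRESS = "yarn.timeline-service.webapp.address";
    static final String REST_PATH = "/ws/v1/timeline/";

    // Hypothetical client that fixes its REST URI once, at init time.
    static class EagerClient {
        String uri;
        void init(Map<String, String> conf) {
            uri = "http://" + conf.get(WEBAPP_ADDRESS) + REST_PATH;
        }
    }

    // Hypothetical client that reads the address each time it needs the URI.
    static class LazyClient {
        private Map<String, String> conf;
        void init(Map<String, String> conf) {
            this.conf = conf;
        }
        String uri() {
            return "http://" + conf.get(WEBAPP_ADDRESS) + REST_PATH;
        }
    }

    // Stand-in for the AHS: once started, it records the port it actually bound.
    static void startAhs(Map<String, String> conf, int boundPort) {
        conf.put(WEBAPP_ADDRESS, "localhost:" + boundPort);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(WEBAPP_ADDRESS, "localhost:0"); // ask for an ephemeral port
        EagerClient eager = new EagerClient();
        LazyClient lazy = new LazyClient();
        eager.init(conf); // RM-side clients are initialised before the AHS starts
        lazy.init(conf);
        startAhs(conf, 43210); // 43210 is an arbitrary example port
        System.out.println(eager.uri);  // http://localhost:0/ws/v1/timeline/
        System.out.println(lazy.uri()); // http://localhost:43210/ws/v1/timeline/
    }
}
```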

So if *MiniYARNCluster is required to be used with the system metrics publisher 
enabled*, we either need to correct this series of issues or use the simpler 
option {{ServerSocketUtil.getPort(9188, 10)}}, which I feel is safer and is 
used in many other places. But it would require different patches for 2.6.2!
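The ServerSocketUtil.getPort(9188, 10) option amounts to: try the preferred port, and if it is taken, retry with other ports a bounded number of times. A minimal sketch of that idea in plain Java, written for illustration rather than copied from Hadoop's ServerSocketUtil (its exact retry strategy may differ; the [1024, 65535] random range here is our own choice):

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.Random;

public class PortProbeSketch {
    // Returns `port` if it can be bound right now; otherwise retries with random
    // ports in [1024, 65535], giving up after `retries` failed attempts in total.
    static int getPort(int port, int retries) throws IOException {
        Random rand = new Random();
        int tryPort = port;
        for (int attempt = 1; ; attempt++) {
            // Binding and immediately closing only proves the port was free at this
            // instant; another process may still grab it before the server binds.
            try (ServerSocket probe = new ServerSocket(tryPort)) {
                return tryPort;
            } catch (IOException e) {
                if (attempt >= retries) {
                    throw e;
                }
            }
            tryPort = 1024 + rand.nextInt(65535 - 1024 + 1);
        }
    }

    public static void main(String[] args) throws IOException {
        // Prefer the port suggested in the comment, falling back to a random one.
        System.out.println("Using port " + getPort(9188, 10));
    }
}
```

Because the probe socket is closed before the real server binds, a small race window remains; that is usually acceptable in tests, which is where such helpers are typically used.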

> ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster
> --
>
> Key: YARN-2859
> URL: https://issues.apache.org/jira/browse/YARN-2859
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Hitesh Shah
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
> Fix For: 2.8.0, 2.7.2, 2.6.3
>
> Attachments: YARN-2859-addendum.txt, YARN-2859.txt
>
>
> In the mini cluster, a random port should be used. 
> Also, the config is not updated with the host that the process actually bound to.
> {code}
> 2014-11-13 13:07:01,905 INFO  [main] server.MiniYARNCluster (MiniYARNCluster.java:serviceStart(722)) - MiniYARN ApplicationHistoryServer address: localhost:10200
> 2014-11-13 13:07:01,905 INFO  [main] server.MiniYARNCluster (MiniYARNCluster.java:serviceStart(724)) - MiniYARN ApplicationHistoryServer web address: 0.0.0.0:8188
> {code}
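For the description above, a common pattern for "use a random port and update the config" is to bind to port 0, let the OS choose a free port, and then write the actually bound address back into the configuration that other services read. A minimal sketch with a plain Map standing in for Hadoop's Configuration; the key yarn.timeline-service.webapp.address is used for illustration, and bindAndPublish is a hypothetical helper, not a MiniYARNCluster method:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.util.HashMap;
import java.util.Map;

public class EphemeralBindSketch {
    static final String WEBAPP_ADDRESS = "yarn.timeline-service.webapp.address";

    // Binds to port 0 so the OS assigns a free port, then writes the address that
    // was actually bound back into the shared config for other services to read.
    static ServerSocket bindAndPublish(Map<String, String> conf) throws IOException {
        ServerSocket socket = new ServerSocket();
        try {
            socket.bind(new InetSocketAddress("localhost", 0));
        } catch (IOException e) {
            socket.close();
            throw e;
        }
        conf.put(WEBAPP_ADDRESS, "localhost:" + socket.getLocalPort());
        return socket;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> conf = new HashMap<>();
        conf.put(WEBAPP_ADDRESS, "0.0.0.0:8188"); // fixed default, as in the log above
        try (ServerSocket socket = bindAndPublish(conf)) {
            System.out.println(conf.get(WEBAPP_ADDRESS)); // localhost:<assigned port>
        }
    }
}
```

The write-back only helps consumers that read the config after the bind, so it has to be combined with a start order in which the server binds before its clients resolve the address.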




[jira] [Commented] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout

2015-11-14 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15005722#comment-15005722
 ] 

Tsuyoshi Ozawa commented on YARN-4348:
--

Found that testZKRootPathAcls fails with a timeout with the patch. I will look 
into it more deeply.

{code:title=TestZKRMStateStore-output.txt|}
2015-11-15 02:15:17,324 INFO  [main] zookeeper.JUnit4ZKTestRunner (JUnit4ZKTestRunner.java:evaluate(50)) - RUNNING TEST METHOD testZKRootPathAcls
... ...
2015-11-15 02:30:12,774 DEBUG [SyncThread:0] server.FinalRequestProcessor (FinalRequestProcessor.java:processRequest(88)) - Processing request:: sessionid:0x15108ecd3b20001 type:ping cxid:0xfffe zxid:0xfffe txntype:unknown reqpath:n/a
2015-11-15 02:30:12,774 DEBUG [SyncThread:0] server.FinalRequestProcessor (FinalRequestProcessor.java:processRequest(160)) - sessionid:0x15108ecd3b20001 type:ping cxid:0xfffe zxid:0xfffe txntype:unknown reqpath:n/a
2015-11-15 02:30:12,775 DEBUG [main-SendThread(127.0.0.1:11221)] zookeeper.ClientCnxn (ClientCnxn.java:readResponse(717)) - Got ping response for sessionid: 0x15108ecd3b20001 after 0ms
2015-11-15 02:30:14,776 DEBUG [SyncThread:0] server.FinalRequestProcessor (FinalRequestProcessor.java:processRequest(88)) - Processing request:: sessionid:0x15108ecd3b20001 type:ping cxid:0xfffe zxid:0xfffe txntype:unknown reqpath:n/a
2015-11-15 02:30:14,776 DEBUG [SyncThread:0] server.FinalRequestProcessor (FinalRequestProcessor.java:processRequest(160)) - sessionid:0x15108ecd3b20001 type:ping cxid:0xfffe zxid:0xfffe txntype:unknown reqpath:n/a
2015-11-15 02:30:14,776 DEBUG [main-SendThread(127.0.0.1:11221)] zookeeper.ClientCnxn (ClientCnxn.java:readResponse(717)) - Got ping response for sessionid: 0x15108ecd3b20001 after 0ms
{code}

> ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of 
> zkSessionTimeout
> 
>
> Key: YARN-4348
> URL: https://issues.apache.org/jira/browse/YARN-4348
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2, 2.6.2
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
> Attachments: YARN-4348.001.patch, YARN-4348.001.patch, log.txt
>
>
> Jian mentioned that the current internal ZK configuration of ZKRMStateStore 
> can cause the following situation:
> 1. syncInternal times out, 
> 2. but the sync succeeds later on.
> We should use zkResyncWaitTime as the timeout value.
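A minimal sketch of the situation described, assuming the sync is asynchronous and syncInternal waits for its callback for a bounded time: if the bound is too short, the wait reports failure even though the sync completes later. The asyncSync helper is a hypothetical stand-in for ZooKeeper's asynchronous sync, not the real ZKRMStateStore code; waitMs plays the role of the timeout (zkSessionTimeout today, zkResyncWaitTime as proposed).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SyncWaitSketch {
    // Hypothetical stand-in for an asynchronous ZooKeeper sync: the callback
    // fires after syncDelayMs on the executor's thread.
    static void asyncSync(ScheduledExecutorService ex, long syncDelayMs, Runnable callback) {
        ex.schedule(callback, syncDelayMs, TimeUnit.MILLISECONDS);
    }

    // Waits at most waitMs for the sync callback. Returns false on timeout, even
    // though the sync itself may still complete afterwards.
    static boolean syncInternal(ScheduledExecutorService ex, long syncDelayMs, long waitMs)
            throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        asyncSync(ex, syncDelayMs, done::countDown);
        return done.await(waitMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService ex = Executors.newSingleThreadScheduledExecutor();
        try {
            // Timeout shorter than the sync: reported as failed, sync finishes later.
            System.out.println(syncInternal(ex, 300, 50));
            // Timeout long enough for the sync to complete.
            System.out.println(syncInternal(ex, 300, 3000));
        } finally {
            ex.shutdownNow();
        }
    }
}
```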




[jira] [Updated] (YARN-4350) TestDistributedShell fails

2015-11-14 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4350:

Attachment: YARN-4350-feature-YARN-2928.008.patch

Hi [~sjlee0],
I have given my analysis of this issue in this 
[comment|https://issues.apache.org/jira/browse/YARN-2859?focusedCommentId=15005746=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15005746]
on YARN-2859.
To check whether my intended approach fixes the issue, I am uploading a 
patch; if possible, please try it.
I had faced this race condition earlier too, but it happened too irregularly. 
I had also mentioned it earlier in another JIRA, but was waiting for YARN-3127 
to be verified. If it is, we need to make this fix after it, as it has other 
impacts, as mentioned in YARN-3127.

> TestDistributedShell fails
> --
>
> Key: YARN-4350
> URL: https://issues.apache.org/jira/browse/YARN-4350
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-4350-feature-YARN-2928.008.patch
>
>
> Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. 
> There seem to be 2 distinct issues.
> (1) testDSShellWithoutDomainV2* tests fail sporadically
> These tests fail more often than not if run by themselves:
> {noformat}
> testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)  Time elapsed: 30.998 sec  <<< FAILURE!
> java.lang.AssertionError: Application created event should be published atleast once expected:<1> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451)
>   at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326)
>   at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207)
> {noformat}
> They started happening after YARN-4129. I suspect this might have to do with 
> a timing issue.
> (2) the whole test times out
> If you run the whole TestDistributedShell test, it times out without fail. 
> This may or may not have to do with the port change introduced by YARN-2859 
> (just a hunch).
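If the sporadic failure is the timing issue suspected above, a one-shot assertion on the number of published events can run before the asynchronous publisher has delivered anything. A generic way to make such a check tolerant of delay is to poll with a deadline. The sketch is a hypothetical helper, not what the attached patch does, and the simulated publisher thread is an assumption standing in for the RM's asynchronous publishing:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

public class WaitForEventSketch {
    // Re-checks `condition` every intervalMs until it holds or timeoutMs elapses.
    static boolean waitFor(BooleanSupplier condition, long intervalMs, long timeoutMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(intervalMs);
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger createdEvents = new AtomicInteger();
        // Simulated asynchronous publisher: the event arrives after about 200 ms.
        Thread publisher = new Thread(() -> {
            try {
                Thread.sleep(200);
                createdEvents.incrementAndGet();
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            }
        });
        publisher.start();
        // Checking createdEvents.get() once at this point would most likely see 0.
        boolean published = waitFor(() -> createdEvents.get() >= 1, 20, 5000);
        System.out.println("created event published: " + published);
        publisher.join();
    }
}
```

Polling only hides latency; if the event is never published (for example because the publisher was pointed at port 0), the check still fails once the deadline passes.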


