[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-10-01 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940406#comment-14940406
 ] 

Li Lu commented on YARN-3942:
-

YARN-4219 opened to track the new leveldb cache storage separately. 

> Timeline store to read events from HDFS
> ---
>
> Key: YARN-3942
> URL: https://issues.apache.org/jira/browse/YARN-3942
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: timelineserver
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-3942-leveldb.001.patch, 
> YARN-3942-leveldb.002.patch, YARN-3942.001.patch, YARN-3942.002.patch
>
>
> This adds a new timeline store plugin that is intended as a stop-gap measure 
> to mitigate some of the issues we've seen with ATS v1 while waiting for ATS 
> v2.  The intent of this plugin is to provide a workable solution for running 
> the Tez UI against the timeline server on large-scale clusters running many 
> thousands of jobs per day.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-10-01 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940307#comment-14940307
 ] 

Jason Lowe commented on YARN-3942:
--

I think the best way forward is to get the original patch cleaned up and 
integrated, as a number of people are already using it as-is.  Then we can 
address extensions like the leveldb cache enhancement and finer-granularity 
caching in subsequent JIRAs, which will help keep the patches more reasonably 
sized.  I'll be putting up a patch to address Jonathan's recent comment soon, 
and hopefully I'll have time to add some unit tests shortly afterwards.



[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-10-01 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940314#comment-14940314
 ] 

Li Lu commented on YARN-3942:
-

Thanks for the info [~jlowe]! Sure, I can open a new JIRA for the leveldb cache 
fix after this one is merged in. 



[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-10-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940500#comment-14940500
 ] 

Hadoop QA commented on YARN-3942:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 34s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to include 5 new or modified test files. |
| {color:green}+1{color} | javac |   8m  8s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |  10m 17s | There were no new javadoc warning messages. |
| {color:red}-1{color} | release audit |   0m 16s | The applied patch generated 1 release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 30s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m  2s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   3m 59s | Tests passed in hadoop-yarn-server-applicationhistoryservice. |
| | |  42m 59s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12764679/YARN-3942.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / fd026f5 |
| Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/9327/artifact/patchprocess/patchReleaseAuditProblems.txt |
| hadoop-yarn-server-applicationhistoryservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/9327/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9327/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9327/console |


This message was automatically generated.



[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-09-29 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935549#comment-14935549
 ] 

Li Lu commented on YARN-3942:
-

Hi folks, I'm trying to figure out our next plan for this JIRA. Are we planning 
to finish this fix and commit it to trunk soon? I'm asking because I'm 
planning to start the next phase of this work, which is to make the cache 
granularity finer to reduce refresh latency. If we're committing this fix soon or 
the current patches are close, I can start on top of the existing patches. 
Otherwise, maybe we'd like to improve the existing patches first? Thanks for the info! 



[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-09-28 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933864#comment-14933864
 ] 

Jason Lowe commented on YARN-3942:
--

Greg wrote:
bq. I was even able to create the problem with the REST/WS endpoint... 
http://xlabhadnnh2.example.com:8188/ws/v1/applicationhistory/apps/application_1443218824767_0002/appattempts/appattempt_1443218824767_0002_01/containers/container_1443218824767_0002_01_01

That looks like a problem with the application history server rather than the 
timeline server and the entity file timeline store.  The key exception appears 
to be this part at the end:
{noformat}
Caused by: org.apache.hadoop.yarn.webapp.WebAppException: Error rendering block: nestLevel=6 expected 5
at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77)
at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
at org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845)
at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56)
at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
at org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.AHSController.app(AHSController.java:38)
{noformat}

Looks like some malformed Hamlet code or something where a block wasn't closed 
when it should have been.  Doing a quick JIRA search popped up YARN-3110 which 
is fixed in 2.8 and appears to be fixing the same exception.
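
The "nestLevel=6 expected 5" message reads like a depth check around block rendering. A minimal sketch of that kind of check follows; `NestCheck`, `Builder`, `Block`, and `renderChecked` are hypothetical names for illustration and are not the Hamlet API.

```java
final class NestCheck {
  /** Hypothetical minimal builder that counts how many elements are currently open. */
  static final class Builder {
    private final StringBuilder out = new StringBuilder();
    private int nestLevel;

    Builder open(String tag) {
      out.append('<').append(tag).append('>');
      nestLevel++;
      return this;
    }

    Builder close(String tag) {
      out.append("</").append(tag).append('>');
      nestLevel--;
      return this;
    }

    int nestLevel() {
      return nestLevel;
    }
  }

  /** A unit of page content that writes into the builder. */
  interface Block {
    void render(Builder b);
  }

  /** Renders a block and fails if the nesting depth differs from the depth on entry. */
  static void renderChecked(Builder b, Block block) {
    int expected = b.nestLevel();
    block.render(b);
    if (b.nestLevel() != expected) {
      throw new IllegalStateException(
          "Error rendering block: nestLevel=" + b.nestLevel() + " expected " + expected);
    }
  }

  public static void main(String[] args) {
    Builder page = new Builder();
    renderChecked(page, b -> b.open("td").close("td")); // balanced, passes
    try {
      renderChecked(page, b -> b.open("td"));           // element never closed
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

A block that opens an element without closing it leaves the depth one higher than on entry, which is the shape of the mismatch in the trace.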

Jonathan wrote:
bq. Can we wrap the mkdirs command in the EntityFileTimelineStore to do an 
exists check on the directories before doing a mkdirs?

Yeah, that's a good catch.  I'll get a patch up later this week with that fix.



[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-09-25 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908503#comment-14908503
 ] 

Li Lu commented on YARN-3942:
-

Ah I see. BTW, seems like the cache loading process in refreshCache is properly 
synchronized. Please do let me know if there are any other problems and thank 
you for your help! 



[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-09-25 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908423#comment-14908423
 ] 

Greg Senia commented on YARN-3942:
--

I noticed this in the timeline server logs with certain Hive jobs, specifically 
at the beginning of the job. Is it just not caching fast enough?

2015-09-25 14:03:21,381 - INFO  [1663897715@qtp-117005517-14:EntityFileTimelineStore@833] - Failed to load cached store for application_1443203299712_0003
2015-09-25 14:03:32,409 - INFO  [1663897715@qtp-117005517-14:EntityFileTimelineStore@833] - Failed to load cached store for application_1443203299712_0003
2015-09-25 14:03:32,449 - INFO  [1663897715@qtp-117005517-14:EntityFileTimelineStore@833] - Failed to load cached store for application_1443203299712_0003
2015-09-25 14:03:32,476 - INFO  [1663897715@qtp-117005517-14:EntityFileTimelineStore@833] - Failed to load cached store for application_1443203299712_0003
2015-09-25 14:03:32,486 - INFO  [1663897715@qtp-117005517-14:EntityFileTimelineStore@833] - Failed to load cached store for application_1443203299712_0003
2015-09-25 14:03:32,941 - INFO  [1663897715@qtp-117005517-14:EntityFileTimelineStore@833] - Failed to load cached store for application_1443203299712_0003
2015-09-25 14:03:32,963 - INFO  [1663897715@qtp-117005517-14:EntityFileTimelineStore@833] - Failed to load cached store for application_1443203299712_0003
2015-09-25 14:03:32,965 - INFO  [1663897715@qtp-117005517-14:EntityFileTimelineStore@833] - Failed to load cached store for application_1443203299712_0003
2015-09-25 14:03:32,967 - INFO  [1663897715@qtp-117005517-14:EntityFileTimelineStore@833] - Failed to load cached store for application_1443203299712_0003
2015-09-25 14:03:33,410 - INFO  [1663897715@qtp-117005517-14:EntityFileTimelineStore@833] - Failed to load cached store for application_1443203299712_0003
2015-09-25 14:03:33,451 - INFO  [1663897715@qtp-117005517-14:EntityFileTimelineStore@833] - Failed to load cached store for application_1443203299712_0003
2015-09-25 14:03:33,459 - INFO  [865367522@qtp-117005517-13:EntityFileTimelineStore@833] - Failed to load cached store for application_1443203299712_0003
2015-09-25 14:03:33,468 - INFO  [865367522@qtp-117005517-13:EntityFileTimelineStore@833] - Failed to load cached store for application_1443203299712_0003
2015-09-25 14:03:33,628 - INFO  [865367522@qtp-117005517-13:EntityFileTimelineStore@833] - Failed to load cached store for application_1443203299712_0003
2015-09-25 14:03:33,630 - INFO  [865367522@qtp-117005517-13:EntityFileTimelineStore@833] - Failed to load cached store for application_1443203299712_0003
2015-09-25 14:03:33,632 - INFO  [865367522@qtp-117005517-13:EntityFileTimelineStore@833] - Failed to load cached store for application_1443203299712_0003
2015-09-25 14:03:33,634 - INFO  [865367522@qtp-117005517-13:EntityFileTimelineStore@833] - Failed to load cached store for application_1443203299712_0003
2015-09-25 14:03:34,413 - INFO  [865367522@qtp-117005517-13:EntityFileTimelineStore@833] - Failed to load cached store for application_1443203299712_0003
2015-09-25 14:03:34,454 - INFO  [865367522@qtp-117005517-13:EntityFileTimelineStore@833] - Failed to load cached store for application_1443203299712_0003
2015-09-25 14:03:34,632 - INFO  [865367522@qtp-117005517-13:EntityFileTimelineStore@833] - Failed to load cached store for application_1443203299712_0003
2015-09-25 14:03:34,634 - INFO  [865367522@qtp-117005517-13:EntityFileTimelineStore@833] - Failed to load cached store for application_1443203299712_0003



[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-09-25 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908435#comment-14908435
 ] 

Li Lu commented on YARN-3942:
-

Hmm... Seems quite likely, but let me double-check it...



[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-09-25 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908482#comment-14908482
 ] 

Greg Senia commented on YARN-3942:
--

I see the issue. Our security folks mandated a umask change, and my teammate 
was testing it in the lab. The umask is now 006.

2015-09-25 14:37:29,056 - WARN  [1438603067@qtp-117005517-38:TimelineDataManager@353] - Skip the timeline entity: { id: container_1443203299712_0007_01_11, type: YARN_CONTAINER }
org.apache.hadoop.security.AccessControlException: Permission denied: user=yarn, access=READ_EXECUTE, inode="/tmp/entity-file-history/active/application_1443203299712_0007":greg:hdfs:drwxrwx---
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:271)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:257)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:185)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6815)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6797)
at org.apache.hadoop.hdfs.server.nam
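
The denial in the trace is what the usual owner/group/other evaluation gives for a directory owned by greg:hdfs with mode drwxrwx---. A minimal sketch of that evaluation follows, assuming the yarn user is not a member of the hdfs group (the thread does not say) and ignoring superuser and ACL handling. `PermissionCheck`, `allowed`, and the group name `hadoop` are hypothetical and used only for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class PermissionCheck {
  /**
   * Returns true if the user is granted every bit in wanted (for example 5 = r-x)
   * under owner/group/other rules: exactly one triplet applies, chosen by identity.
   */
  static boolean allowed(String user, Set<String> userGroups,
                         String owner, String group, int mode, int wanted) {
    int bits;
    if (user.equals(owner)) {
      bits = (mode >> 6) & 7;   // owner triplet
    } else if (userGroups.contains(group)) {
      bits = (mode >> 3) & 7;   // group triplet
    } else {
      bits = mode & 7;          // other triplet
    }
    return (bits & wanted) == wanted;
  }

  public static void main(String[] args) {
    int mode = 0770;            // drwxrwx--- as shown in the log
    int readExecute = 5;        // READ_EXECUTE = r-x

    // Assumption: yarn belongs only to a hypothetical "hadoop" group.
    Set<String> outsider = new HashSet<>(Arrays.asList("hadoop"));
    // Same user after being added to the hdfs group.
    Set<String> member = new HashSet<>(Arrays.asList("hadoop", "hdfs"));

    System.out.println(allowed("yarn", outsider, "greg", "hdfs", mode, readExecute));
    System.out.println(allowed("yarn", member, "greg", "hdfs", mode, readExecute));
  }
}
```

Under these assumptions the first call is denied because the "other" triplet is empty, and the second succeeds through the group triplet.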



[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-09-25 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908790#comment-14908790
 ] 

Greg Senia commented on YARN-3942:
--

I was even able to create the problem with the REST/WS endpoint...  
http://xlabhadnnh2.example.com:8188/ws/v1/applicationhistory/apps/application_1443218824767_0002/appattempts/appattempt_1443218824767_0002_01/containers/container_1443218824767_0002_01_01



[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-09-25 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908845#comment-14908845
 ] 

Jonathan Eagles commented on YARN-3942:
---

Also, I'm getting an error while starting up the timelineserver while the 
namenode is in safe mode. Can we wrap the mkdirs command in the 
EntityFileTimelineStore to do an exists check on the directories before doing a 
mkdirs? That way we can bring the services up while the namenode is in safe mode, 
similar to the other timeline stores.
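
A minimal sketch of the exists-before-mkdirs idea, not the actual patch. `DirOps` and `FakeFs` are hypothetical stand-ins so the example runs without a cluster; in Hadoop the corresponding calls would be `FileSystem.exists` and `FileSystem.mkdirs`. The fake rejects writes while a safe-mode flag is set, which is an assumption about how the startup failure shows up.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

final class SafeModeDirs {
  /** Issues mkdirs only when the directory is missing, so startup stays read-only when it can. */
  static void ensureDir(DirOps fs, String path) throws IOException {
    if (fs.exists(path)) {
      return; // a read-only check, which still works in safe mode
    }
    if (!fs.mkdirs(path)) {
      throw new IOException("Failed to create " + path);
    }
  }

  public static void main(String[] args) throws IOException {
    FakeFs fs = new FakeFs("/tmp/entity-file-history/active");
    fs.safeMode = true;
    ensureDir(fs, "/tmp/entity-file-history/active");
    System.out.println("mkdirs calls: " + fs.mkdirsCalls);
  }
}

/** Hypothetical stand-in for the two file system calls the store needs at startup. */
interface DirOps {
  boolean exists(String path) throws IOException;

  boolean mkdirs(String path) throws IOException;
}

/** In-memory fake that rejects writes while safeMode is set. */
final class FakeFs implements DirOps {
  private final Set<String> dirs = new HashSet<>();
  boolean safeMode;
  int mkdirsCalls;

  FakeFs(String... existing) {
    for (String d : existing) {
      dirs.add(d);
    }
  }

  @Override
  public boolean exists(String path) {
    return dirs.contains(path);
  }

  @Override
  public boolean mkdirs(String path) throws IOException {
    mkdirsCalls++;
    if (safeMode) {
      throw new IOException("Cannot create directory " + path + ": name node is in safe mode");
    }
    dirs.add(path);
    return true;
  }
}
```

With the directories already in place, startup never issues a write, so it does not depend on the namenode having left safe mode. A missing directory would still fail until safe mode ends.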



[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-09-25 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908765#comment-14908765
 ] 

Greg Senia commented on YARN-3942:
--

Got beyond that and now getting this:

2015-09-25 17:43:32,249 - ERROR [1121457019@qtp-365579627-11:AppBlock@169] - Failed to read the AM container of the application attempt appattempt_1443215518288_0002_01.
java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1643)
at org.apache.hadoop.yarn.server.webapp.AppBlock.render(AppBlock.java:156)
at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:67)
at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77)
at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
at org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845)
at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56)
at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
at org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.AHSController.app(AHSController.java:38)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:572)
at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:269)
at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:542)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1224)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:767)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 

[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-09-25 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908855#comment-14908855
 ] 

Li Lu commented on YARN-3942:
-

[~gss2002] the exception stack looks related to security settings? Not sure if 
the call has reached the underlying cache storage. Maybe there's something 
interesting in the ATS log? 



[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-09-25 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908893#comment-14908893
 ] 

Greg Senia commented on YARN-3942:
--

[~gtCarrera9] that is all that is in the ATS log w/ debug enabled on the root 
logger. I'm thinking of adding some debug lines to print out what is happening 
inside of those methods..



[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-09-24 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14906561#comment-14906561
 ] 

Jason Lowe commented on YARN-3942:
--

Thanks for the patch, Li!  It looks good to me after a brief overview of the 
patch.  Are there any stats on the latency hit when using this versus the 
original memory approach?  This will depend upon the amount of data being 
ingested, but I'm wondering if you have some data points for various sizes.



[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-09-24 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14906794#comment-14906794
 ] 

Li Lu commented on YARN-3942:
-

Thanks [~jlowe]! Unfortunately we haven't had a chance to test the latency of 
the new storage. As a reference, leveldb's performance benchmark page 
(http://leveldb.googlecode.com/svn/trunk/doc/benchmark.html) gives some 
numbers. Each random entity get/set triggers two leveldb ops. For sequential 
reads (iterators) we generate at most one random read for the starting 
position, and then just sequential reads. I haven't touched caching policy in 
the first draft, but would definitely like to hear feedback and suggestions 
from the community.
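
A minimal sketch of the iterator access pattern described above, assuming a 
sorted in-memory map as a stand-in for leveldb (leveldb keeps keys in sorted 
order, but the class name, counters, and key layout here are hypothetical and 
not taken from the patch). It shows one seek to the starting key followed by 
purely sequential reads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a sorted key-value store; not the patch's code.
class SortedStoreSketch {
    private final TreeMap<String, String> store = new TreeMap<>();
    int randomReads = 0;      // seeks to an arbitrary key
    int sequentialReads = 0;  // reads of the next key in order

    void put(String key, String value) {
        store.put(key, value);
    }

    // Scan up to 'limit' values starting at 'startKey' (inclusive).
    List<String> scan(String startKey, int limit) {
        List<String> out = new ArrayList<>();
        randomReads++; // one random read to position at the start key
        for (Map.Entry<String, String> e : store.tailMap(startKey, true).entrySet()) {
            if (out.size() >= limit) {
                break;
            }
            sequentialReads++; // every further read follows key order
            out.add(e.getValue());
        }
        return out;
    }
}
```

However long the scan is, the number of random reads stays at one per 
iterator in this model.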



[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-09-24 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907612#comment-14907612
 ] 

Greg Senia commented on YARN-3942:
--

Also [~gtCarrera], I was not able to get your patch to compile on my custom 
Hadoop 2.6.x code base. The line +import 
org.apache.htrace.fasterxml.jackson.databind.ObjectMapper; was not found at 
compile time, so I changed it to +import 
com.fasterxml.jackson.databind.ObjectMapper; and added the following 
dependency to the pom. My testing is underway; so far so good.

<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-databind</artifactId>
  <version>2.2.3</version>
</dependency>




[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-09-24 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907610#comment-14907610
 ] 

Greg Senia commented on YARN-3942:
--

[~jlowe] The patch worked correctly. I meant to state that ATS was crashing 
constantly BEFORE applying the first patch, so your patch was good. Sorry for 
the confusion, and for not responding sooner; things have been very hectic for 
us. I am currently testing the changes proposed yesterday and will have 
feedback in the coming days.
Thanks again for all your hard work.





[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-09-23 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905398#comment-14905398
 ] 

Xuan Gong commented on YARN-3942:
-

The patch, which implements LevelDBCacheTimelineStore as the cache used in 
EntityFileTimelineStore, looks fine.
[~jlowe] [~jeagles] Could you take a look, please?



[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-09-21 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901388#comment-14901388
 ] 

Li Lu commented on YARN-3942:
-

BTW, the patch applies on top of the existing YARN-3942.001.patch.



[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-09-14 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14743621#comment-14743621
 ] 

Jason Lowe commented on YARN-3942:
--

Thanks for trying out the patch, [~gss2002]!  Sorry to hear it's been crashing 
in your setup.  If you could provide a few more details on the types of crashes 
(e.g., stack traces), that would be very helpful.  If they are OOM-type 
crashes, then it could either be the Hive session reuse problem that Hitesh 
mentioned or possibly there are too many jobs being cached simultaneously for 
the heap size being used by the ATS.



[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-09-12 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742159#comment-14742159
 ] 

Hitesh Shah commented on YARN-3942:
---

Thanks [~gss2002]. One point to note: if you use long-running Hive sessions, 
this will cause an OOM in the timeline server, as the data is cached on a per 
"session" basis. I am not sure if there is another simple way to disable Hive 
session re-use in the HiveServer. \cc [~vikram.dixit]



[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-09-11 Thread Greg Senia (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14741848#comment-14741848
 ] 

Greg Senia commented on YARN-3942:
--

I have placed this patch and the Tez patch into our test environments, as we 
have actively watched ATS crash many times over the past few weeks while 
running about 50k Tez apps/jobs a day.

I will provide some feedback in the next few days.



[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-09-04 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730887#comment-14730887
 ] 

Jason Lowe commented on YARN-3942:
--

bq. Since we know the file size to be read, could we return a message saying 
something like "scanning file size FOO. Expect BAR latency"?

I'm not a UI expert, but given the timeline store is just a REST backend to the 
real UI this seems tricky to do in practice.  The UI javascript is doing a 
bunch of separate GETs to the various REST endpoints and expecting the results, 
but we'd have to return something else that says "I'm not done yet" and expect 
the UI to do something sane with that.  If we do this over the normal endpoints 
it will break the timelineserver API for existing clients.  Granted, we're 
already sorta breaking it by not supporting some cross-app queries that were 
supported in the past.



[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-09-03 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729815#comment-14729815
 ] 

Hitesh Shah commented on YARN-3942:
---

Some ideas from an offline discussion with [~bikassaha] and [~vinodkv]:

- option 1) Could we just use leveldb as an LRU cache, instead of a 
memory-based cache, to handle the OOM issue?
- option 2) Could we just take the data from HDFS, write it out to 
leveldb, and use leveldb to serve the data? This would address the OOM 
issue too.

\cc [~jlowe] [~jeagles]
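
For reference, the LRU eviction policy mentioned in option 1 can be sketched 
with the JDK's LinkedHashMap in access order. This is only an illustration of 
the policy: it keeps values in memory, whereas option 1 would keep them in 
leveldb, and the class name and capacity parameter are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU sketch; a real implementation of option 1 would back
// the values with leveldb rather than the Java heap.
class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCacheSketch(int maxEntries) {
        // accessOrder=true makes iteration order least-recently-used first
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once the cap is exceeded
        return size() > maxEntries;
    }
}
```

With a cap of two entries, inserting a third evicts whichever of the first two 
was read least recently.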




[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-09-03 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729843#comment-14729843
 ] 

Jason Lowe commented on YARN-3942:
--

Option 1 will add some latency (not clear how much yet) to initializing the 
cache, and it could take quite a bit of time to build it depending upon how 
many DAGs were run in the same session and the amount of data from each DAG.

If I understand option 2 properly, it proposes to have the scanner read all the 
data, not just the summary data, out of HDFS and store it in the main leveldb.  
The problem we run into with that approach is that for our production scale and 
desired retention periods it would generate a very, very large set of leveldb 
databases that must be stored locally, and query performance starts to degrade 
as the leveldb databases get really large.

Option 1 is more viable for us, assuming we won't have horrendous latency 
issues trying to build a substantial database from a monster session.  Option 2 
is not as attractive, although I could see it being appealing to those that 
don't need to worry about huge leveldb size problems.



[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-09-03 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14729917#comment-14729917
 ] 

Bikas Saha commented on YARN-3942:
--

In Option 1 the latency would be the time to read the entire session file for a 
session that has run many DAGs, right?
Since we know the file size to be read, could we return a message saying 
something like "scanning file size FOO. Expect BAR latency"?
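
A rough sketch of how such a message could be derived from the file size, 
assuming a fixed read throughput. The throughput figure, the class name, and 
the method are hypothetical; nothing in this thread says how the latency would 
actually be estimated.

```java
// Hypothetical estimate: latency = ceil(file size / assumed throughput).
class ScanEstimateSketch {
    static String describe(long fileSizeBytes, long assumedBytesPerSecond) {
        if (fileSizeBytes < 0 || assumedBytesPerSecond <= 0) {
            throw new IllegalArgumentException("size must be >= 0 and throughput > 0");
        }
        // Integer ceiling division so a partial second still counts
        long seconds = (fileSizeBytes + assumedBytesPerSecond - 1) / assumedBytesPerSecond;
        return "scanning file size " + fileSizeBytes
            + " bytes. Expect ~" + seconds + "s latency";
    }
}
```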



[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-08-28 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720668#comment-14720668
 ] 

Jason Lowe commented on YARN-3942:
--

Yeah that's going to be tricky, especially if we need to move most of the code 
into YARN.  Haven't had time to give this much thought, but the only way I can 
think of to keep most of the functionality in YARN is to have the timeline 
client be able to specify when a new session starts (i.e.: entity file writer 
should start writing to a new file and user provides some clue/hint as to what 
to name the file).  We can then have a plugin on the entity file server side 
that allows apps to override the getTimelineStoreForRead functionality.

If that was in place then the Tez side could start a new session (dag file) 
each time the dag changed.  The Tez-specific plugin on the timeline server side 
could then translate dag/vertex/task/attempt IDs into the appropriate dag file 
to cache.  There would still be some questions as to how the timeline store 
cache would be managed on the server side and how to support multiple 
framework-specific plugins simultaneously.
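
The ID translation a Tez-specific plugin would need can be sketched as 
follows. This assumes that Tez entity IDs share a common prefix layout 
(kind_clusterTimestamp_appSeq_dagSeq, with further fields for vertex, task, 
and attempt); that layout, the class name, and the method are assumptions 
made for illustration and are not part of the patch.

```java
// Hypothetical helper: map a dag/vertex/task/attempt ID to its DAG ID,
// assuming IDs of the form kind_clusterTimestamp_appSeq_dagSeq[_...].
class DagIdSketch {
    static String dagIdFor(String entityId) {
        String[] parts = entityId.split("_");
        if (parts.length < 4) {
            throw new IllegalArgumentException("unrecognized entity ID: " + entityId);
        }
        String kind = parts[0];
        boolean known = kind.equals("dag") || kind.equals("vertex")
            || kind.equals("task") || kind.equals("attempt");
        if (!known) {
            throw new IllegalArgumentException("unrecognized entity kind: " + kind);
        }
        // Fields 1..3 identify the DAG regardless of the entity kind
        return "dag_" + parts[1] + "_" + parts[2] + "_" + parts[3];
    }
}
```

The returned DAG ID could then serve as the key for choosing which dag file to 
cache.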



[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-08-28 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720574#comment-14720574
 ] 

Hitesh Shah commented on YARN-3942:
---

[~jlowe], [~rajesh.balamohan] observed that the timeline server was running out 
of memory in a certain scenario. In this scenario, we are using Hive-on-Tez, but 
Hive re-uses the application to run hundreds of DAGs/queries (doAs=false with 
perimeter security using, say, Ranger or Sentry). The EntityFileStore sizes its 
cache based on the number of applications it can cache, but in the above 
scenario even a single app could be very large. Ideally, each DAG would be in a 
separate file and all of its entries treated as a single cache entity. That 
would probably work better, but making this generic enough may be a bit tricky.

Any suggestions here? 





[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-08-18 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701242#comment-14701242
 ] 

Jason Lowe commented on YARN-3942:
--

[~rajesh] the initial exception looks like an issue with the HDFS client layer, 
and most HDFS clients would have similar problems trying to use HDFS.  Normally 
HDFS operations are not retried because there are many retries already in the 
HDFS client and server layers.  So I don't think that exception is an issue to 
fix in the ATS but rather the HDFS configuration and/or code.

Also the patch does not treat that exception being logged as fatal.  It just 
logs the fact that it couldn't complete a scan for that iteration.  It will try 
again in the next scan interval.  The real problem is indicated by this line:
{noformat}
2015-08-18 01:03:35,600 [SIGTERM handler] ERROR 
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer:
 RECEIVED SIGNAL 15: SIGTERM
{noformat}
Something outside of the ATS is killing the process with SIGTERM.
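
The log-and-retry behaviour described above can be sketched like this. It is 
an illustration only, not the patch's code; the class and interface names are 
hypothetical. The task catches and logs any failure so the next scheduled run 
still happens.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical periodic scan task that treats a failed scan as non-fatal.
class ScanTaskSketch implements Runnable {
    interface ScanAction {
        void scan() throws IOException;
    }

    private final ScanAction action;
    final AtomicInteger successes = new AtomicInteger();
    final AtomicInteger failures = new AtomicInteger();

    ScanTaskSketch(ScanAction action) {
        this.action = action;
    }

    @Override
    public void run() {
        try {
            action.scan();
            successes.incrementAndGet();
        } catch (Exception e) {
            // Log and give up on this iteration only; the next one retries
            failures.incrementAndGet();
            System.err.println("Error scanning active files: " + e);
        }
    }
}
```

Such a task could be scheduled with 
ScheduledExecutorService.scheduleAtFixedRate. Catching inside run() matters 
there, because an exception that escapes the task suppresses all subsequent 
executions.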



[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-08-17 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700657#comment-14700657
 ] 

Rajesh Balamohan commented on YARN-3942:


Should this be resilient to cluster restarts? For example, when a cluster 
restart happens, the timeline server automatically gets killed with the 
following exception.

{noformat}
2015-08-18 01:03:31,523 [EntityLogPluginWorker #6] ERROR org.apache.hadoop.yarn.server.timeline.EntityFileTimelineStore: Error scanning active files
...
...
[EntityLogPluginWorker #0] ERROR org.apache.hadoop.yarn.server.timeline.EntityFileTimelineStore: Error scanning active files
java.io.EOFException: End of File Exception between local host is: atsmachine; destination host is: m1:8020; : java.io.EOFException; For more details see:  http://wiki.apache.org/hadoop/EOFException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
at org.apache.hadoop.ipc.Client.call(Client.java:1444)
at org.apache.hadoop.ipc.Client.call(Client.java:1371)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy26.getListing(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:574)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy27.getListing(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1748)
at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:973)
at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:984)
at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:956)
at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:935)
at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:931)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:943)
at org.apache.hadoop.yarn.server.timeline.EntityFileTimelineStore.scanActiveLogs(EntityFileTimelineStore.java:314)
at org.apache.hadoop.yarn.server.timeline.EntityFileTimelineStore.access$1300(EntityFileTimelineStore.java:79)
at org.apache.hadoop.yarn.server.timeline.EntityFileTimelineStore$EntityLogScanner.run(EntityFileTimelineStore.java:771)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1098)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:993)
2015-08-18 01:03:35,600 [SIGTERM handler] ERROR org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: RECEIVED SIGNAL 15: SIGTERM
2015-08-18 01:03:35,608 [Thread-1] INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@atsmachine:8188
2015-08-18 01:03:35,710 [Thread-1] INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ApplicationHistoryServer metrics system...
2015-08-18 01:03:35,712 [Thread-1] INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: 

[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-08-03 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652753#comment-14652753
 ] 

Sangjin Lee commented on YARN-3942:
---

+1 on this idea as a way to mitigate the scalability/reliability concerns of 
v.1 until we have v.2 ready.

We talked about it briefly offline, but v.2 could borrow some parts of this 
idea to be able to spill over pending writes to something like HDFS in case the 
real backend storage is unavailable.



[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-07-30 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647873#comment-14647873
 ] 

Jason Lowe commented on YARN-3942:
--

I think so.  We can probably create a new TimelineClient that stores to HDFS 
files based on how yarn.timeline-service.entity-file-store.summary-entity-types 
is configured.  However I'm not sure if YARN can automatically replace timeline 
clients being requested with this one, as the client needs to know the 
application ID when putting domains and the application attempt ID when posting 
entities.  So one approach is to have YARN provide something like a 
TimelineEntityFileClient, which is a TimelineClient, but Tez and other app 
frameworks would have to explicitly ask for it themselves and provide the 
appropriate application ID/app attempt ID upon construction of the client.

Let me know if that sounds OK, or if there's an idea of how YARN can seamlessly 
provide this alternative client instead of TimelineClientImpl when 
TimelineClient.createTimelineClient is called.



[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-07-30 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648057#comment-14648057
 ] 

Jason Lowe commented on YARN-3942:
--

The logs are created based on app attempt.  It helps avoid the split-brain, 
double-writer issue where the previous attempt is still running when the RM 
expires it (e.g.: due to network cut) and decides to launch another.  The files 
are stored and looked up in a directory that is named after the application ID, 
and the entity files within that directory are stored based on application 
attempt ID.  I don't think it is crucial for the latter to use the app attempt 
ID, and the reader does not rely on the attempt ID from those files, but it was 
a simple way to avoid colliding with previous attempts and to have the reader 
process the files in attempt order.
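
A sketch of the layout described above: one directory per application ID, and 
one entity file per application attempt ID inside it. The "entitylog-" file 
prefix, the root directory, and the class are hypothetical, and the sort 
assumes attempt IDs are zero-padded so that lexicographic order matches 
attempt order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical path layout; the file prefix and root are assumptions.
class EntityLogPathSketch {
    // <root>/<application ID>/entitylog-<application attempt ID>
    static String attemptFile(String rootDir, String appId, String appAttemptId) {
        return rootDir + "/" + appId + "/" + "entitylog-" + appAttemptId;
    }

    // Return file names sorted so earlier attempts come first, relying on
    // the zero-padded attempt number at the end of each name.
    static List<String> inAttemptOrder(List<String> fileNames) {
        List<String> sorted = new ArrayList<>(fileNames);
        Collections.sort(sorted);
        return sorted;
    }
}
```

Because each attempt writes to its own file, a stale previous attempt that is 
still running cannot clobber the output of its successor.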



[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-07-30 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648048#comment-14648048
 ] 

Zhijie Shen commented on YARN-3942:
---

Yeah, I prefer creating a TimelineEntityFileClient to modifying the current 
TimelineClientImpl, because it should minimize the effect on the existing code 
path.  However, I'm afraid that no matter which way we choose, we cannot make 
the change seamless to users.  We cannot avoid the additional step on the 
client side to set the app/app-attempt ID, can we?  In the Hive/Tez client (and 
other potential app clients), you also have to switch the context 
app/app-attempt ID once the client detects that a new YARN app/app-attempt has 
been created.  Therefore, if some application wants to make use of it, it will 
also involve code changes in user land.
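
To illustrate the extra client-side step, here is a rough sketch of what such a 
writer might look like.  It is purely hypothetical: the class name, 
set_context() and put_entity() are invented for illustration and are not the 
API of TimelineEntityFileClient or of any existing YARN class.

```python
class EntityFileClientSketch:
    """Hypothetical writer that must be told which attempt it writes for."""

    def __init__(self):
        self._attempt_id = None
        # (attempt_id, entity) pairs stand in for appends to per-attempt files.
        self.written = []

    def set_context(self, attempt_id):
        """The additional step: callers switch context on every new attempt."""
        self._attempt_id = attempt_id

    def put_entity(self, entity):
        if self._attempt_id is None:
            raise RuntimeError("set_context() must be called before put_entity()")
        self.written.append((self._attempt_id, entity))


if __name__ == "__main__":
    client = EntityFileClientSketch()
    client.set_context("appattempt_1437000000000_0042_000001")
    client.put_entity({"entitytype": "TEZ_DAG_ID", "entity": "dag_1"})
    # A new attempt was detected, so the caller has to switch context itself.
    client.set_context("appattempt_1437000000000_0042_000002")
    client.put_entity({"entitytype": "TEZ_DAG_ID", "entity": "dag_2"})
    print(client.written)
```

The set_context() calls are exactly the code change in user land that the 
comment is worried about.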

BTW, why do you need the app-attempt ID? Is the log file per app or per 
app-attempt?



[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-07-29 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647217#comment-14647217
 ] 

Zhijie Shen commented on YARN-3942:
---

[~jlowe], thanks for sharing more information about the limitation. It sounds 
like a reasonable tradeoff that only affects cross-app queries. One concern is 
that the patch only contains the read path, and the write path only exists in 
Tez. Therefore, it's not a complete solution from the perspective of YARN 
alone. Is it possible to generalize the write path in Tez and promote it to 
YARN?



[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-07-28 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14644394#comment-14644394
 ] 

Jason Lowe commented on YARN-3942:
--

bq.  if you'd like to elaborate the drawback a bit, it will be helpful.

Yes, the drawback is that one cannot do cross-application queries unless the 
entity type is stored in the main database (i.e., listed in 
yarn.timeline-service.entity-file-store.summary-entity-types).  Another case 
that wouldn't work is where the query has multiple application IDs in it -- the 
query processing will choose the HDFS store of one of the applications and fail 
to find entities for the others.  In practice the Tez UI only does 
cross-application queries on the All DAGs front page, and that only needs a 
small number of entity types, as I listed above.  Since that's the main use 
case we're optimizing for with this approach, it allows us to offload most of 
the entity types from the leveldb database and serve them directly from HDFS.
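
The routing rule described above can be sketched roughly as follows.  This is 
only an illustration under stated assumptions: route_query() and its return 
shape are made up, the choice of the first application ID is an assumption 
(the comment only says one of the applications is chosen), and TEZ_TASK_ID 
merely stands in for a type that is not in the summary list.

```python
# Summary types as suggested elsewhere in this thread for the Tez UI.
SUMMARY_TYPES = frozenset({"YARN_APPLICATION", "TEZ_DAG_ID", "TEZ_APPLICATION"})


def route_query(entity_type, app_ids, summary_types=SUMMARY_TYPES):
    """Return (store, complete) for a query; a sketch, not the patch's logic.

    store    -- "leveldb", "hdfs:<app id>", or None if no store can answer
    complete -- False when some of the requested entities would be missed
    """
    if entity_type in summary_types:
        # Summary types are copied to the main database, so cross-app works.
        return ("leveldb", True)
    app_ids = list(app_ids)
    if not app_ids:
        # Cross-application query on a non-summary type: unsupported.
        return (None, False)
    chosen = app_ids[0]  # assumption: "one of the applications"
    # With several app IDs, entities of the other applications are not found.
    return (f"hdfs:{chosen}", len(set(app_ids)) == 1)


if __name__ == "__main__":
    print(route_query("TEZ_DAG_ID", []))
    print(route_query("TEZ_TASK_ID", ["application_1_0001"]))
    print(route_query("TEZ_TASK_ID", ["application_1_0001", "application_1_0002"]))
```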

So this solution is a tradeoff.  It limits the types of queries that the 
timeline server can properly answer, but it significantly improves the 
scalability of the single-node timeline server and decouples the jobs posting 
events from the timeline server.  The latter is particularly interesting for 
us, as jobs that need to post timeline events no longer depend on a 
mission-critical single node.




[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-07-27 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643790#comment-14643790
 ] 

Zhijie Shen commented on YARN-3942:
---

Thanks for this work. Agree it's a good interim step between v1 and v2. I have 
done a first scan of this patch, and am fine with the idea overall. As far as I 
can tell, the unsupported case is getting entities of the same type across 
applications. Other than that, the HDFS data path seems to work fine. [~jlowe], 
if you'd like to elaborate on the drawback a bit, it will be helpful.

Will continue to review the patch, and post more detailed comments.



[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-07-20 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634079#comment-14634079
 ] 

Vinod Kumar Vavilapalli commented on YARN-3942:
---

Tx for posting this [~jlowe]. +1, this is a good interim step for existing 
deployments at scale, as we move from the non-scalable/non-reliable V1 version 
to V2.

/cc [~zjshen].



[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-07-20 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634058#comment-14634058
 ] 

Jason Lowe commented on YARN-3942:
--

TEZ-2628 has the corresponding changes to allow Tez jobs to post timeline 
server entities via HDFS.



[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-07-20 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634071#comment-14634071
 ] 

Jason Lowe commented on YARN-3942:
--

To set up for the Tez UI, 
yarn.timeline-service.entity-file-store.summary-entity-types needs to be set to 
something like YARN_APPLICATION,TEZ_DAG_ID,TEZ_APPLICATION.  That directs the 
store plugin to copy those entity types to the main leveldb database that the 
cluster overview page (i.e., the main All DAGs page) references.  This allows 
the main Tez UI page to remain functional without requiring a query on every 
application file.  All other entity types will be served from the HDFS entity 
files created by the job.
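
As a small sketch of the setting above, the snippet below builds the 
corresponding Hadoop-style <property> element that would go inside the 
<configuration> element of yarn-site.xml.  The property name and value are the 
ones given in this comment for the patch under review and may differ in other 
versions; the helper name summary_types_property() is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Property name and example value as given in the comment above.
PROPERTY_NAME = "yarn.timeline-service.entity-file-store.summary-entity-types"
SUMMARY_TYPES = ["YARN_APPLICATION", "TEZ_DAG_ID", "TEZ_APPLICATION"]


def summary_types_property(types):
    """Build a <property> element holding the comma-separated summary types."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = PROPERTY_NAME
    ET.SubElement(prop, "value").text = ",".join(types)
    return prop


if __name__ == "__main__":
    # Prints the element as a single line of XML.
    print(ET.tostring(summary_types_property(SUMMARY_TYPES), encoding="unicode"))
```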
