[jira] [Created] (YARN-6318) timeline service schema creator fails if executed from a remote machine

2017-03-09 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-6318:
-

 Summary: timeline service schema creator fails if executed from a 
remote machine
 Key: YARN-6318
 URL: https://issues.apache.org/jira/browse/YARN-6318
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 3.0.0-alpha1
Reporter: Sangjin Lee


The timeline service schema creator fails if it is executed from a remote machine 
that does not have the right {{hbase-site.xml}} file for talking to the target 
HBase cluster.
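As an illustration only, a remote invocation could fail fast with a clear message when the local {{hbase-site.xml}} does not identify an HBase cluster. The sketch below parses a Hadoop-style configuration file with the JDK's DOM parser and looks up {{hbase.zookeeper.quorum}}. Treating that property as the deciding setting is an assumption, and {{HBaseSiteCheck}} and its methods are hypothetical, not part of the schema creator.

```java
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical pre-flight check; not part of the schema creator.
final class HBaseSiteCheck {
  // Assumption: the ZooKeeper quorum is the setting that identifies the target cluster.
  static final String QUORUM_KEY = "hbase.zookeeper.quorum";

  // Returns the value of the named property in a Hadoop-style <configuration> file, or null.
  static String lookup(InputStream xml, String name) {
    Document doc;
    try {
      doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
    } catch (Exception e) {
      throw new IllegalArgumentException("unreadable configuration file", e);
    }
    NodeList props = doc.getElementsByTagName("property");
    for (int i = 0; i < props.getLength(); i++) {
      Element p = (Element) props.item(i);
      NodeList names = p.getElementsByTagName("name");
      NodeList values = p.getElementsByTagName("value");
      if (names.getLength() > 0 && values.getLength() > 0
          && name.equals(names.item(0).getTextContent().trim())) {
        return values.item(0).getTextContent().trim();
      }
    }
    return null;
  }

  // Fails fast with a readable message instead of a later connection error.
  static String requireQuorum(InputStream xml) {
    String quorum = lookup(xml, QUORUM_KEY);
    if (quorum == null || quorum.isEmpty()) {
      throw new IllegalStateException(QUORUM_KEY
          + " is not set; copy the target cluster's hbase-site.xml onto this machine");
    }
    return quorum;
  }
}
```

Running such a check before any table is created would turn the failure into an explicit configuration error.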



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6170) TimelineReaderServer should wait to join with HttpServer2

2017-02-09 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-6170:
-

 Summary: TimelineReaderServer should wait to join with HttpServer2
 Key: YARN-6170
 URL: https://issues.apache.org/jira/browse/YARN-6170
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelinereader
Affects Versions: YARN-5355
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Minor


While I was backporting YARN-5355-branch-2 to a 2.6.0-based code branch, I 
noticed that the timeline reader daemon shut down promptly after starting. It 
turns out that, at least in the 2.6.0 code line, only daemon threads are left 
once the main method returns, which causes the JVM to exit.

The right pattern for starting an embedded Jetty web server is to call 
{{Server.start()}} followed by {{Server.join()}}. That way, the server stays up 
reliably no matter what other threads get created.

It works on YARN-5355 only because there *happens* to be one other non-daemon 
thread. We should add the {{join()}} call so that it is always correct.
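A minimal plain-Java illustration of why the {{join()}} matters: the JVM exits once only daemon threads remain. A single daemon thread stands in for the embedded server here (an assumption; Jetty's thread pool and {{HttpServer2}} are not modeled), and {{JoinSketch}} is a hypothetical name.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a process whose only worker threads are daemon threads.
final class JoinSketch {
  // Starts a daemon "server" thread that handles a fixed number of "requests".
  static Thread startServer(int requests, AtomicInteger handled) {
    Thread server = new Thread(() -> {
      for (int i = 0; i < requests; i++) {
        try {
          Thread.sleep(10); // pretend to serve a request
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
        handled.incrementAndGet();
      }
    }, "server");
    // Daemon threads do not keep the JVM alive on their own.
    server.setDaemon(true);
    server.start();
    return server;
  }

  public static void main(String[] args) throws InterruptedException {
    AtomicInteger handled = new AtomicInteger();
    Thread server = startServer(5, handled);
    // Without this join(), main would return right away and, with no other
    // non-daemon thread alive, the JVM would exit and take the server with it.
    server.join();
    System.out.println("handled " + handled.get() + " requests");
  }
}
```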






[jira] [Created] (YARN-6140) start time key in NM leveldb store should be removed when container is removed

2017-02-03 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-6140:
-

 Summary: start time key in NM leveldb store should be removed when 
container is removed
 Key: YARN-6140
 URL: https://issues.apache.org/jira/browse/YARN-6140
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: YARN-5355
Reporter: Sangjin Lee


It appears that the start time key is not removed when the container is 
removed. The key was introduced in YARN-5792.

I found this while backporting the YARN-5355-branch-2 branch to our internal 
branch loosely based on 2.6.0. The {{TestNMLeveldbStateStoreService}} test was 
failing because of this.

I'm not sure why we didn't see this earlier.
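A sketch of removing a container by key prefix, so that every per-container key (including the start time key introduced in YARN-5792) is deleted along with the rest. The store is modeled as an in-memory {{TreeMap}} rather than LevelDB, and the key layout ({{ContainerManager/containers/<id>/<suffix>}}) and the class name are hypothetical.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory stand-in for the NM state store; key names are hypothetical.
final class ContainerStoreSketch {
  private final NavigableMap<String, String> db = new TreeMap<>();

  private static String containerPrefix(String containerId) {
    // The trailing '/' keeps container_1 from matching container_10.
    return "ContainerManager/containers/" + containerId + "/";
  }

  void storeContainer(String containerId, String request, long startTime) {
    db.put(containerPrefix(containerId) + "request", request);
    db.put(containerPrefix(containerId) + "starttime", Long.toString(startTime));
  }

  // Removes every key under the container's prefix, so keys added later
  // (such as a start time key) are cleaned up without listing them one by one.
  void removeContainer(String containerId) {
    String prefix = containerPrefix(containerId);
    // Keys that start with prefix sort between prefix and prefix + '\uffff'.
    db.subMap(prefix, true, prefix + Character.MAX_VALUE, false).clear();
  }

  int size() {
    return db.size();
  }

  boolean contains(String key) {
    return db.containsKey(key);
  }
}
```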






[jira] [Created] (YARN-6095) create a REST API that returns the clusters for a given app id

2017-01-13 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-6095:
-

 Summary: create a REST API that returns the clusters for a given 
app id
 Key: YARN-6095
 URL: https://issues.apache.org/jira/browse/YARN-6095
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee


It would be good to have a timeline service REST endpoint that can return the 
list of clusters for a given app id. This becomes possible after YARN-5378 is 
in.






[jira] [Created] (YARN-5792) adopt the id prefix for YARN, MR, and DS entities

2016-10-27 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-5792:
-

 Summary: adopt the id prefix for YARN, MR, and DS entities
 Key: YARN-5792
 URL: https://issues.apache.org/jira/browse/YARN-5792
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-5355
Reporter: Sangjin Lee


We introduced the entity id prefix to support flexible entity sorting 
(YARN-5715). We should adopt the id prefix for YARN entities, MR entities, and 
DS entities to take advantage of it.






[jira] [Created] (YARN-5715) introduce entity prefix for return and sort order

2016-10-06 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-5715:
-

 Summary: introduce entity prefix for return and sort order
 Key: YARN-5715
 URL: https://issues.apache.org/jira/browse/YARN-5715
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Priority: Critical


While looking into YARN-5585, we have come across the need to provide a sort 
order different from the current entity id order. The current entity id order 
returns entities strictly in lexicographical order, and as such it returns the 
earliest entities first. This may not be the most natural return order. A more 
natural return/sort order would start with the most recent entities.

To solve this, we would like to add what we call the "entity prefix" in the row 
key for the entity table. It is a number (long) that can be easily provided by 
the client on write. In the row key, it would be added before the entity id 
itself.
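To make the ordering concrete: HBase sorts row keys as unsigned byte arrays, so a fixed-width, big-endian prefix placed ahead of the entity id becomes the primary sort key. The sketch below assumes the client picks {{Long.MAX_VALUE - creationTime}} as the prefix (one possible choice, not something this proposal mandates) so that the most recent entities sort first. {{EntityPrefixSketch}} and its helpers are hypothetical, and the rest of the real row key is omitted.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class EntityPrefixSketch {
  // Hypothetical row key fragment: 8-byte big-endian prefix, then the entity id bytes.
  static byte[] rowKey(long idPrefix, String entityId) {
    byte[] id = entityId.getBytes(StandardCharsets.UTF_8);
    return ByteBuffer.allocate(8 + id.length).putLong(idPrefix).put(id).array();
  }

  // Assumed prefix choice: inverting the creation time makes newer entities sort first.
  // For non-negative creation times the prefix stays non-negative, so big-endian
  // bytes compare in the same order as the numbers themselves.
  static long invertedPrefix(long creationTime) {
    return Long.MAX_VALUE - creationTime;
  }

  // Unsigned lexicographic byte comparison, the order HBase uses for row keys.
  static int compareRowKeys(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int c = Integer.compare(a[i] & 0xff, b[i] & 0xff);
      if (c != 0) {
        return c;
      }
    }
    return Integer.compare(a.length, b.length);
  }
}
```

Because the prefix precedes the id, a scan returns entities in prefix order; the id only breaks ties between equal prefixes.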

The entity prefix would be considered mandatory. On all writes (including 
updates) the correct entity prefix should be set by the client so that the 
correct row key is used. The entity prefix needs to be unique only within the 
scope of the application and the entity type.

For queries that return a list of entities, the prefix values will be returned 
along with the entity ids. Queries that specify both the prefix and the id can 
be served quickly using the row key. If the query omits the prefix but 
specifies the id (query by id), the query may be less efficient.

This JIRA should add the entity prefix to the entity API and add its handling 
to the schema and the write path. The read path will be addressed in YARN-5585.






[jira] [Created] (YARN-5379) TestHBaseTimelineStorage. testWriteApplicationToHBase() fails intermittently

2016-07-14 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-5379:
-

 Summary: TestHBaseTimelineStorage. testWriteApplicationToHBase() 
fails intermittently
 Key: YARN-5379
 URL: https://issues.apache.org/jira/browse/YARN-5379
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test, timelineserver
Affects Versions: 3.0.0-alpha1
Reporter: Sangjin Lee
Priority: Minor


The {{TestHBaseTimelineStorage.testWriteApplicationToHBase()}} test seems to 
fail intermittently:
{noformat}
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorage.testWriteApplicationToHBase(TestHBaseTimelineStorage.java:817)
{noformat}

The stdout output:
{noformat}
2016-07-13 00:15:48,883 INFO  [main] zookeeper.RecoverableZooKeeper 
(RecoverableZooKeeper.java:(120)) - Process 
identifier=hconnection-0x2b7962a2 connecting to ZooKeeper 
ensemble=localhost:53474
2016-07-13 00:15:48,883 INFO  [main] zookeeper.ZooKeeper 
(ZooKeeper.java:(438)) - Initiating client connection, 
connectString=localhost:53474 sessionTimeout=9 
watcher=hconnection-0x2b7962a20x0, quorum=localhost:53474, baseZNode=/hbase
2016-07-13 00:15:48,886 INFO  [main-SendThread(localhost:53474)] 
zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket 
connection to server localhost/127.0.0.1:53474. Will not attempt to 
authenticate using SASL (unknown error)
2016-07-13 00:15:48,887 INFO  [main-SendThread(localhost:53474)] 
zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(852)) - Socket connection 
established to localhost/127.0.0.1:53474, initiating session
2016-07-13 00:15:48,887 INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:53474] 
server.NIOServerCnxnFactory (NIOServerCnxnFactory.java:run(197)) - Accepted 
socket connection from /127.0.0.1:38097
2016-07-13 00:15:48,887 INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:53474] 
server.ZooKeeperServer (ZooKeeperServer.java:processConnectRequest(868)) - 
Client attempting to establish new session at /127.0.0.1:38097
2016-07-13 00:15:48,896 INFO  [SyncThread:0] server.ZooKeeperServer 
(ZooKeeperServer.java:finishSessionInit(617)) - Established session 
0x155e19baa520025 with negotiated timeout 4 for client /127.0.0.1:38097
2016-07-13 00:15:48,896 INFO  [main-SendThread(localhost:53474)] 
zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1235)) - Session 
establishment complete on server localhost/127.0.0.1:53474, sessionid = 
0x155e19baa520025, negotiated timeout = 4
2016-07-13 00:15:48,911 INFO  [main] zookeeper.RecoverableZooKeeper 
(RecoverableZooKeeper.java:(120)) - Process 
identifier=hconnection-0x32130e61 connecting to ZooKeeper 
ensemble=localhost:53474
2016-07-13 00:15:48,912 INFO  [main] zookeeper.ZooKeeper 
(ZooKeeper.java:(438)) - Initiating client connection, 
connectString=localhost:53474 sessionTimeout=9 
watcher=hconnection-0x32130e610x0, quorum=localhost:53474, baseZNode=/hbase
2016-07-13 00:15:48,917 INFO  [main-SendThread(localhost:53474)] 
zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(975)) - Opening socket 
connection to server localhost/127.0.0.1:53474. Will not attempt to 
authenticate using SASL (unknown error)
2016-07-13 00:15:48,918 INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:53474] 
server.NIOServerCnxnFactory (NIOServerCnxnFactory.java:run(197)) - Accepted 
socket connection from /127.0.0.1:38098
2016-07-13 00:15:48,921 INFO  [main-SendThread(localhost:53474)] 
zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(852)) - Socket connection 
established to localhost/127.0.0.1:53474, initiating session
2016-07-13 00:15:48,921 INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:53474] 
server.ZooKeeperServer (ZooKeeperServer.java:processConnectRequest(868)) - 
Client attempting to establish new session at /127.0.0.1:38098
2016-07-13 00:15:48,929 INFO  [SyncThread:0] server.ZooKeeperServer 
(ZooKeeperServer.java:finishSessionInit(617)) - Established session 
0x155e19baa520026 with negotiated timeout 4 for client /127.0.0.1:38098
2016-07-13 00:15:48,929 INFO  [main-SendThread(localhost:53474)] 
zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1235)) - Session 
establishment complete on server localhost/127.0.0.1:53474, sessionid = 
0x155e19baa520026, negotiated timeout = 4
2016-07-13 00:15:48,938 INFO  [main] storage.HBaseTimelineWriterImpl 
(HBaseTimelineWriterImpl.java:serviceStop(541)) - closing the entity table
2016-07-13 00:15:48,938 INFO  [main] storage.HBaseTimelineWriterImpl 
(HBaseTimelineWriterImpl.java:serviceStop(546)) - closing the app_flow table
2016-07-13 00:15:48,938 INFO  [main] storage.HBaseTimelineWriterImpl 
(HBaseTimelineWriterImpl.java:serviceStop(551)) - closing the application table
2016-07-13 00:15:48,941 INFO
{noformat}

[jira] [Created] (YARN-5364) timelineservice modules have indirect dependencies on mapreduce artifacts

2016-07-12 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-5364:
-

 Summary: timelineservice modules have indirect dependencies on 
mapreduce artifacts
 Key: YARN-5364
 URL: https://issues.apache.org/jira/browse/YARN-5364
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 3.0.0-alpha1
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Minor


The new timelineservice and timelineservice-hbase-tests modules have indirect 
dependencies on MapReduce artifacts through HBase and Phoenix. Although this 
does not cause builds to fail, it's not good hygiene.






[jira] [Created] (YARN-5359) FileSystemTimelineReader/Writer uses unix-specific default

2016-07-11 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-5359:
-

 Summary: FileSystemTimelineReader/Writer uses unix-specific default
 Key: YARN-5359
 URL: https://issues.apache.org/jira/browse/YARN-5359
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0-alpha1
Reporter: Sangjin Lee
Assignee: Sangjin Lee


{{FileSystemTimelineReaderImpl}} and {{FileSystemTimelineWriterImpl}} use a 
Unix-specific default storage directory, which won't work on Windows.

Also, {{TestFileSystemTimelineReaderImpl}} uses this default directly, which 
makes it brittle when tests run concurrently.
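A sketch of a cross-platform default: resolve the directory under the JVM's {{java.io.tmpdir}} instead of hard-coding {{/tmp}}. The {{timeline-service-data}} name mirrors the current {{/tmp/timeline-service-data}} default; {{TimelineStorageDefaults}} is a hypothetical helper, not the actual reader/writer code.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for a platform-independent default storage root.
final class TimelineStorageDefaults {
  // java.io.tmpdir is typically /tmp on Unix-like systems and the user's temp
  // folder on Windows; Paths.get joins the parts with the platform separator.
  static Path defaultStorageRoot() {
    return Paths.get(System.getProperty("java.io.tmpdir"), "timeline-service-data");
  }
}
```

If the configuration expects a URI string, {{defaultStorageRoot().toUri().toString()}} yields a {{file:}} URI on either platform.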







[jira] [Created] (YARN-5355) YARN Timeline Service v.2: alpha 2

2016-07-11 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-5355:
-

 Summary: YARN Timeline Service v.2: alpha 2
 Key: YARN-5355
 URL: https://issues.apache.org/jira/browse/YARN-5355
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical


This is an umbrella JIRA for the alpha 2 milestone for YARN Timeline Service 
v.2.






[jira] [Created] (YARN-5354) TestDistributedShell.checkTimelineV2() may fail for concurrent tests

2016-07-11 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-5354:
-

 Summary: TestDistributedShell.checkTimelineV2() may fail for 
concurrent tests
 Key: YARN-5354
 URL: https://issues.apache.org/jira/browse/YARN-5354
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 3.0.0-alpha1
Reporter: Sangjin Lee
Assignee: Sangjin Lee


{{TestDistributedShell.checkTimelineV2()}} uses the default (hard-coded) 
storage root directory. This is brittle against concurrent tests. We should use 
a unique storage directory for the unit tests.
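For the tests, a minimal sketch assuming each run can be pointed at its own storage root: {{Files.createTempDirectory}} returns a freshly created, uniquely named directory, so concurrent test runs no longer share one location. How the path is passed to the writer (for example via configuration) is not shown, and {{UniqueTestStorage}} is a hypothetical name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical test helper.
final class UniqueTestStorage {
  // Creates a new, uniquely named directory under the platform temp dir for one test run.
  static Path newStorageRoot(String testName) {
    try {
      return Files.createTempDirectory(testName + "-timeline-");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```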

We should also fix the default storage location for 
{{FileSystemTimelineWriterImpl}} to be cross-platform as part of this. The 
current value ( {{/tmp/timeline-service-data}} ) won't work on Windows.






[jira] [Resolved] (YARN-5236) FlowRunCoprocessor brings down HBase RegionServer

2016-07-10 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee resolved YARN-5236.
---
Resolution: Invalid

Timeline Service v.2 documentation now states that it requires HBase 1.1.3. 
Closing.

> FlowRunCoprocessor brings down HBase RegionServer
> -
>
> Key: YARN-5236
> URL: https://issues.apache.org/jira/browse/YARN-5236
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Haibo Chen
>
> The FlowRunCoprocessor, when loaded in HBase, will bring down the region 
> server with exception
> java.lang.NoSuchMethodError: 
> org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment.getRegion()
> I am running it with HBase 1.2.1 in pseudo-distributed mode to try out ATS v2






[jira] [Created] (YARN-5316) fix hadoop-aws pom not to do the exclusion

2016-07-06 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-5316:
-

 Summary: fix hadoop-aws pom not to do the exclusion
 Key: YARN-5316
 URL: https://issues.apache.org/jira/browse/YARN-5316
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Assignee: Sangjin Lee


We originally introduced an exclusion rule for {{hadoop-yarn-server-tests}} in 
{{hadoop-aws}}, as the {{hadoop-aws}} dependency on {{joda-time}} was colliding 
with that coming from {{hadoop-yarn-server-timelineservice}} (via 
{{phoenix-core}} ).

Now that the phoenix dependency is no longer on 
{{hadoop-yarn-server-timelineservice}} itself (it's moved to 
{{hadoop-yarn-server-timelineservice-hbase-tests}} ), it is safe to remove the 
exclusion rule.






[jira] [Resolved] (YARN-5252) EventDispatcher$EventProcessor.run() throws a findbugs error

2016-06-14 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee resolved YARN-5252.
---
Resolution: Duplicate

Thanks [~asuresh]! Hadn't noticed that one.

> EventDispatcher$EventProcessor.run() throws a findbugs error
> 
>
> Key: YARN-5252
> URL: https://issues.apache.org/jira/browse/YARN-5252
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Sangjin Lee
>Priority: Minor
>
> Findbugs complains that {{EventDispatcher$EventProcessor.run()}} invokes 
> {{System.exit()}}. This comes up every time yarn-common is touched. We should 
> either address it or add a findbugs exclusion if there is a good reason for it.






[jira] [Resolved] (YARN-5253) NodeStatusPBImpl throws a bunch of synchronization findbugs warnings

2016-06-14 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee resolved YARN-5253.
---
Resolution: Duplicate

Fixed by YARN-5075.

> NodeStatusPBImpl throws a bunch of synchronization findbugs warnings
> 
>
> Key: YARN-5253
> URL: https://issues.apache.org/jira/browse/YARN-5253
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Sangjin Lee
>Priority: Minor
>
> There are several IS2_INCONSISTENT_SYNC findbugs warnings on 
> {{NodeStatusPBImpl}}. This should be addressed.






[jira] [Created] (YARN-5252) EventDispatcher$EventProcessor.run() throws a findbugs error

2016-06-14 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-5252:
-

 Summary: EventDispatcher$EventProcessor.run() throws a findbugs 
error
 Key: YARN-5252
 URL: https://issues.apache.org/jira/browse/YARN-5252
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.8.0
Reporter: Sangjin Lee
Priority: Minor


Findbugs complains that {{EventDispatcher$EventProcessor.run()}} invokes 
{{System.exit()}}. This comes up every time yarn-common is touched. We should 
either address it or add a findbugs exclusion if there is a good reason for it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5243) fix several rebase and other miscellaneous issues before merge

2016-06-13 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-5243:
-

 Summary: fix several rebase and other miscellaneous issues before 
merge
 Key: YARN-5243
 URL: https://issues.apache.org/jira/browse/YARN-5243
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Assignee: Sangjin Lee


I have come across a couple of miscellaneous issues while inspecting the diffs 
against trunk.

We also need to review one last time (probably after the final rebase) to 
ensure that timeline service v.2 has no impact when it is disabled.






[jira] [Created] (YARN-5174) add documentation on needing to add hbase-site.xml on YARN cluster

2016-05-27 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-5174:
-

 Summary: add documentation on needing to add hbase-site.xml on 
YARN cluster
 Key: YARN-5174
 URL: https://issues.apache.org/jira/browse/YARN-5174
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Assignee: Sangjin Lee


One thing missing from the documentation is the need to add {{hbase-site.xml}} 
on the client side (the client Hadoop cluster). First, we need to determine the 
minimal client settings required to connect to the right HBase cluster. Then, 
we need to document them so that users know exactly how to configure the 
cluster to use timeline service v.2.






[jira] [Created] (YARN-5169) most of YARN events have timestamp of -1

2016-05-26 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-5169:
-

 Summary: most of YARN events have timestamp of -1
 Key: YARN-5169
 URL: https://issues.apache.org/jira/browse/YARN-5169
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.7.2
Reporter: Sangjin Lee


Most of the YARN events (subclasses of {{AbstractEvent}}) have a timestamp of 
-1. {{AbstractEvent}} has two constructors: one initializes the timestamp to -1 
and the other to a caller-provided value. Most events use the former (hence 
the timestamp of -1).

Some of the more common events, including {{ApplicationEvent}}, 
{{ContainerEvent}}, {{JobEvent}}, etc. do not set the timestamp.

The rationale for this behavior seems to be given in {{AbstractEvent}}:
{code}
  // use this if you DON'T care about the timestamp
  public AbstractEvent(TYPE type) {
    this.type = type;
    // We're not generating a real timestamp here.  It's too expensive.
    timestamp = -1L;
  }
{code}

This absence of the timestamp isn't really visible in many cases and therefore 
may have gone unnoticed, but the timeline service exposes this problem very 
visibly.
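A self-contained sketch of the same two-constructor shape, showing how a subclass could pass {{System.currentTimeMillis()}} instead of falling back to -1. {{SimpleEvent}}, {{ContainerEventType}}, {{ContainerStartedEvent}} and {{UntimedEvent}} are hypothetical names, not YARN classes; whether the extra clock call is acceptable on hot paths is the trade-off the comment in {{AbstractEvent}} refers to.

```java
// Hypothetical stand-in for AbstractEvent's two constructors.
abstract class SimpleEvent<T extends Enum<T>> {
  private final T type;
  private final long timestamp;

  // Mirrors the "don't care" constructor: no real timestamp is recorded.
  protected SimpleEvent(T type) {
    this(type, -1L);
  }

  // The caller supplies the timestamp.
  protected SimpleEvent(T type, long timestamp) {
    this.type = type;
    this.timestamp = timestamp;
  }

  T getType() {
    return type;
  }

  long getTimestamp() {
    return timestamp;
  }
}

enum ContainerEventType { START, KILL }

final class ContainerStartedEvent extends SimpleEvent<ContainerEventType> {
  ContainerStartedEvent() {
    // Record wall-clock time at creation instead of falling back to -1.
    super(ContainerEventType.START, System.currentTimeMillis());
  }
}

final class UntimedEvent extends SimpleEvent<ContainerEventType> {
  UntimedEvent() {
    super(ContainerEventType.KILL); // timestamp stays -1
  }
}
```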






[jira] [Created] (YARN-5111) YARN container system metrics are not aggregated to application

2016-05-18 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-5111:
-

 Summary: YARN container system metrics are not aggregated to 
application
 Key: YARN-5111
 URL: https://issues.apache.org/jira/browse/YARN-5111
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Priority: Critical


It appears that the container system metrics (CPU and memory) are not being 
aggregated onto the application.

I definitely see container system metrics when I query for YARN_CONTAINER. 
However, there are no corresponding metrics on the parent application.
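For reference, the expected roll-up can be sketched as summing each container's latest CPU and memory values into an application-level total. Summing the latest values is an assumption about the intended aggregation, and {{AppMetricRollup}} uses plain maps rather than the timeline service's metric classes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

// Hypothetical aggregation helper.
final class AppMetricRollup {
  // containerSeries: one map per container of metric id -> (timestamp -> value).
  // Returns metric id -> sum over containers of each container's most recent value.
  static Map<String, Long> aggregateLatest(List<Map<String, NavigableMap<Long, Long>>> containerSeries) {
    Map<String, Long> app = new HashMap<>();
    for (Map<String, NavigableMap<Long, Long>> metrics : containerSeries) {
      for (Map.Entry<String, NavigableMap<Long, Long>> m : metrics.entrySet()) {
        if (m.getValue().isEmpty()) {
          continue;
        }
        long latest = m.getValue().lastEntry().getValue();
        app.merge(m.getKey(), latest, Long::sum);
      }
    }
    return app;
  }
}
```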






[jira] [Created] (YARN-5109) timestamps are stored unencoded causing parse errors

2016-05-17 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-5109:
-

 Summary: timestamps are stored unencoded causing parse errors
 Key: YARN-5109
 URL: https://issues.apache.org/jira/browse/YARN-5109
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Priority: Blocker


When we store timestamps (for example as part of the row key or part of the 
column name for an event), the bytes are used as is without any encoding. If 
the bytes happen to contain one of our separator characters (e.g. "!" or "="), 
parsing fails when we read the data back.

I came across this while looking into this error in the timeline reader:
{noformat}
2016-05-17 21:28:38,643 WARN 
org.apache.hadoop.yarn.server.timelineservice.storage.common.TimelineStorageUtils:
 incorrectly formatted column name: it will be discarded
{noformat}

I traced the data that was causing this, and the column name (for the event) 
was the following:
{noformat}
i:e!YARN_RM_CONTAINER_CREATED=\x7F\xFF\xFE\xABDY=\x99=YARN_CONTAINER_ALLOCATED_HOST
{noformat}

Note that the column name is supposed to be of the format (event 
id)=(timestamp)=(event info key). However, observe the timestamp portion:
{noformat}
\x7F\xFF\xFE\xABDY=\x99
{noformat}

The presence of the separator ("=") causes the parse error.
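The failure can be reproduced without HBase. The sketch below builds a column name in the (event id)=(timestamp)=(event info key) layout described above, writing the timestamp as 8 raw big-endian bytes; the value {{0x7FFFFEAB44593D99L}} yields exactly the bytes {{\x7F\xFF\xFE\xABDY=\x99}} from the trace, so splitting on "=" finds four fields instead of three. Reading the timestamp as a fixed-width field is one possible remedy, shown as an illustration rather than the fix adopted here; {{EventColumnSketch}} is hypothetical and assumes event ids never contain "=".

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helpers illustrating the (event id)=(timestamp)=(event info key) layout.
final class EventColumnSketch {
  static final byte SEP = '=';

  // Writes the timestamp as 8 raw big-endian bytes, i.e. without any encoding.
  static byte[] compose(String eventId, long ts, String infoKey) {
    byte[] id = eventId.getBytes(StandardCharsets.UTF_8);
    byte[] key = infoKey.getBytes(StandardCharsets.UTF_8);
    return ByteBuffer.allocate(id.length + 1 + 8 + 1 + key.length)
        .put(id).put(SEP).putLong(ts).put(SEP).put(key).array();
  }

  // Naive reader: every '=' byte is treated as a field boundary.
  static int naiveFieldCount(byte[] column) {
    int fields = 1;
    for (byte b : column) {
      if (b == SEP) {
        fields++;
      }
    }
    return fields;
  }

  // Fixed-width reader: the event id ends at the first '=', followed by exactly
  // 8 timestamp bytes, one separator, and the info key.
  static int idEnd(byte[] column) {
    int i = 0;
    while (column[i] != SEP) {
      i++;
    }
    return i;
  }

  static long parseTimestamp(byte[] column) {
    return ByteBuffer.wrap(column, idEnd(column) + 1, 8).getLong();
  }

  static String parseInfoKey(byte[] column) {
    int start = idEnd(column) + 1 + 8 + 1;
    return new String(column, start, column.length - start, StandardCharsets.UTF_8);
  }
}
```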






[jira] [Created] (YARN-5105) entire time series is returned for YARN container system metrics (CPU and memory)

2016-05-17 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-5105:
-

 Summary: entire time series is returned for YARN container system 
metrics (CPU and memory)
 Key: YARN-5105
 URL: https://issues.apache.org/jira/browse/YARN-5105
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee


I see that the entire time series of the CPU and memory metrics is returned 
for the YARN containers REST query. This can bloat the output considerably.

{noformat}
"metrics": [
  {
    "type": "TIME_SERIES",
    "id": "MEMORY",
    "values": {
      "1463518173363": 407539712,
      "1463518170347": 407539712,
{noformat}
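One way to keep the response small would be to return only the most recent point (or the N most recent points) of each time series unless the caller asks for more. In the sketch below a {{TreeMap}} keyed by timestamp stands in for a metric's values; {{TimeSeriesTrim}} and its {{limit}} parameter are hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical helper for trimming a time series before serialization.
final class TimeSeriesTrim {
  // Returns at most `limit` of the newest points of a series keyed by timestamp,
  // ordered newest first.
  static NavigableMap<Long, Long> newest(NavigableMap<Long, Long> series, int limit) {
    NavigableMap<Long, Long> out = new TreeMap<>(Collections.reverseOrder());
    for (Map.Entry<Long, Long> e : series.descendingMap().entrySet()) {
      if (out.size() >= limit) {
        break;
      }
      out.put(e.getKey(), e.getValue());
    }
    return out;
  }
}
```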






[jira] [Created] (YARN-5102) timeline service build fails with java 8

2016-05-17 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-5102:
-

 Summary: timeline service build fails with java 8
 Key: YARN-5102
 URL: https://issues.apache.org/jira/browse/YARN-5102
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Blocker


The build fails with Java 8:

{noformat}
[WARNING] 
Dependency convergence error for jdk.tools:jdk.tools:1.8 paths to dependency 
are:
+-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
  +-org.apache.hadoop:hadoop-annotations:3.0.0-SNAPSHOT
+-jdk.tools:jdk.tools:1.8
and
+-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
  +-org.apache.hbase:hbase-common:1.0.1
+-org.apache.hbase:hbase-annotations:1.0.1
  +-jdk.tools:jdk.tools:1.7

[WARNING] Rule 0: org.apache.maven.plugins.enforcer.DependencyConvergence 
failed with message:
Failed while enforcing releasability the error(s) are [
Dependency convergence error for jdk.tools:jdk.tools:1.8 paths to dependency 
are:
+-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
  +-org.apache.hadoop:hadoop-annotations:3.0.0-SNAPSHOT
+-jdk.tools:jdk.tools:1.8
and
+-org.apache.hadoop:hadoop-yarn-server-timelineservice:3.0.0-SNAPSHOT
  +-org.apache.hbase:hbase-common:1.0.1
+-org.apache.hbase:hbase-annotations:1.0.1
  +-jdk.tools:jdk.tools:1.7
{noformat}







[jira] [Created] (YARN-5096) timelinereader has a lot of logging that's not useful

2016-05-16 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-5096:
-

 Summary: timelinereader has a lot of logging that's not useful
 Key: YARN-5096
 URL: https://issues.apache.org/jira/browse/YARN-5096
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Priority: Minor


After about a dozen requests, the timelinereader log is filled with the 
following log entries:
{noformat}
2016-05-16 15:59:13,364 INFO 
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper: null 
prefix was specified; returning all columns
2016-05-16 15:59:13,364 INFO 
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper: null 
prefix was specified; returning all columns
2016-05-16 15:59:13,364 INFO 
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper: null 
prefix was specified; returning all columns
2016-05-16 15:59:13,364 INFO 
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper: null 
prefix was specified; returning all columns
2016-05-16 15:59:13,364 INFO 
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper: null 
prefix was specified; returning all columns
2016-05-16 15:59:13,364 INFO 
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper: null 
prefix was specified; returning all columns
2016-05-16 15:59:13,364 INFO 
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper: null 
prefix was specified; returning all columns
2016-05-16 15:59:13,364 INFO 
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper: null 
prefix was specified; returning all columns
2016-05-16 15:59:13,364 INFO 
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper: null 
prefix was specified; returning all columns
{noformat}

There were roughly 3,000 such log entries, which is excessive.

Also, when I requested YARN_CONTAINER with fields=ALL, I see the following logs:
{noformat}
WARN 
org.apache.hadoop.yarn.server.timelineservice.storage.common.TimelineStorageUtils:
 incorrectly formatted column name: it will be discarded
{noformat}
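A common remedy is to demote such per-call messages to debug level. The sketch below uses {{java.util.logging}} (the JDK logger, swapped in so the example is self-contained; the timeline reader itself logs through a different framework), and {{ColumnReadSketch.readColumns}} is a hypothetical stand-in for the {{ColumnHelper}} code path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical stand-in for the ColumnHelper read path, using java.util.logging.
final class ColumnReadSketch {
  static final Logger LOG = Logger.getLogger(ColumnReadSketch.class.getName());

  // Returns the columns that start with prefix; a null prefix means "all columns".
  static List<String> readColumns(String prefix, List<String> columns) {
    if (prefix == null) {
      // Logged at FINE, which is below the default INFO threshold, so routine
      // calls no longer fill the log unless debug output is switched on.
      LOG.fine("null prefix was specified; returning all columns");
      return new ArrayList<>(columns);
    }
    List<String> matched = new ArrayList<>();
    for (String c : columns) {
      if (c.startsWith(prefix)) {
        matched.add(c);
      }
    }
    return matched;
  }
}
```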







[jira] [Created] (YARN-5097) NPE in Separator.joinEncoded()

2016-05-16 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-5097:
-

 Summary: NPE in Separator.joinEncoded()
 Key: YARN-5097
 URL: https://issues.apache.org/jira/browse/YARN-5097
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Priority: Critical


I see the following exception in both the RM log and the NM log. First, the 
RM:

{noformat}
2016-05-16 14:19:29,930 ERROR org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollector: Error aggregating timeline metrics
java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.timelineservice.storage.common.Separator.joinEncoded(Separator.java:249)
	at org.apache.hadoop.yarn.server.timelineservice.storage.application.ApplicationRowKey.getRowKey(ApplicationRowKey.java:110)
	at org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineWriterImpl.write(HBaseTimelineWriterImpl.java:131)
	at org.apache.hadoop.yarn.server.timelineservice.collector.AppLevelTimelineCollector$AppLevelAggregator.run(AppLevelTimelineCollector.java:136)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:722)
{noformat}

In the NM log, I see a similar exception:
{noformat}
2016-05-16 14:54:23,116 ERROR org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollector: Error aggregating timeline metrics
java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.timelineservice.storage.common.Separator.joinEncoded(Separator.java:249)
	at org.apache.hadoop.yarn.server.timelineservice.storage.application.ApplicationRowKey.getRowKey(ApplicationRowKey.java:110)
	at org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineWriterImpl.write(HBaseTimelineWriterImpl.java:131)
	at org.apache.hadoop.yarn.server.timelineservice.collector.AppLevelTimelineCollector$AppLevelAggregator.run(AppLevelTimelineCollector.java:136)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
{noformat}







[jira] [Created] (YARN-5095) flow activities and flow runs are populated with wrong timestamp when RM restarts w/ recovery enabled

2016-05-16 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-5095:
-

 Summary: flow activities and flow runs are populated with wrong 
timestamp when RM restarts w/ recovery enabled
 Key: YARN-5095
 URL: https://issues.apache.org/jira/browse/YARN-5095
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Priority: Critical


I have RM recovery enabled. Upon restart, the RM populates records into the flow 
activity and flow run tables, but with *wrong* timestamps. By timestamp I mean 
the part of the row key:
- flow activity: row created with the day of the RM restart
- flow run: row created with the RM start time as the "run id"

The following illustrates an example flow run:
{noformat}
metrics: [ ],
events: [ ],
id: "sjlee@Sleep job/1463433569917",
type: "YARN_FLOW_RUN",
createdtime: 1463422860987,
info: {
UID: "yarn_cluster!sjlee!Sleep job!1463433569917",
SYSTEM_INFO_FLOW_RUN_ID: 1463433569917,
SYSTEM_INFO_FLOW_NAME: "Sleep job",
SYSTEM_INFO_FLOW_RUN_END_TIME: 1463422865033,
SYSTEM_INFO_USER: "sjlee"
},
isrelatedto: { },
relatesto: { }
{noformat}
The created time and the end time are correct (i.e. the original times), whereas 
the timestamp in the row key (the run id, 1463433569917) is later than the end 
time and coincides with the RM restart.
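To put a number on the discrepancy, the small Java check below uses only the three timestamps shown in the output above. The class name is illustrative, and nothing here touches the timeline service itself.

```java
import java.util.concurrent.TimeUnit;

// Illustrative check using only the values from the flow run output above.
class FlowRunTimestampCheck {
    static final long CREATED_TIME = 1463422860987L; // createdtime
    static final long END_TIME = 1463422865033L;     // SYSTEM_INFO_FLOW_RUN_END_TIME
    static final long RUN_ID = 1463433569917L;       // run id taken from the row key

    // Milliseconds by which the run id lies after the recorded end time.
    static long runIdLagMillis() {
        return RUN_ID - END_TIME;
    }

    public static void main(String[] args) {
        long lag = runIdLagMillis();
        System.out.println("created <= end: " + (CREATED_TIME <= END_TIME));
        System.out.println("run id is " + lag + " ms (~"
            + TimeUnit.MILLISECONDS.toMinutes(lag) + " min) after the end time");
    }
}
```

With these values the run id lies 10,704,884 ms (just under three hours) after the recorded end time, while the created time correctly precedes the end time.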







[jira] [Created] (YARN-5094) some YARN container events have timestamp of -1 in REST output

2016-05-16 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-5094:
-

 Summary: some YARN container events have timestamp of -1 in REST 
output
 Key: YARN-5094
 URL: https://issues.apache.org/jira/browse/YARN-5094
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee


Some events in the YARN container entities have a timestamp of -1. The 
RM-generated container events have proper timestamps; it appears that the 
NM-generated events are the ones with -1: YARN_CONTAINER_CREATED, 
YARN_CONTAINER_FINISHED, YARN_NM_CONTAINER_LOCALIZATION_FINISHED, and 
YARN_NM_CONTAINER_LOCALIZATION_STARTED.

On the YARN container page:
{noformat}
{
id: "YARN_CONTAINER_CREATED",
timestamp: -1,
info: { }
},
{
id: "YARN_CONTAINER_FINISHED",
timestamp: -1,
info: {
YARN_CONTAINER_EXIT_STATUS: 0,
YARN_CONTAINER_STATE: "RUNNING",
YARN_CONTAINER_DIAGNOSTICS_INFO: ""
}
},
{
id: "YARN_NM_CONTAINER_LOCALIZATION_FINISHED",
timestamp: -1,
info: { }
},
{
id: "YARN_NM_CONTAINER_LOCALIZATION_STARTED",
timestamp: -1,
info: { }
}
{noformat}

I suspect the data itself is OK and the values are simply not being populated 
in the REST output.






[jira] [Created] (YARN-5093) created time shows 0 in most REST output

2016-05-16 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-5093:
-

 Summary: created time shows 0 in most REST output
 Key: YARN-5093
 URL: https://issues.apache.org/jira/browse/YARN-5093
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Priority: Critical


When querying the REST API, I find that the created time is returned as "0" 
for most of the output, including:

- flow activity and flow runs in the flow activity page
- apps in the application page
- entities in the entity page

For example, in the flow activity page,
{noformat}
{
metrics: [ ],
events: [ ],
id: "yarn_cluster/146335680/sjlee@ds-date",
type: "YARN_FLOW_ACTIVITY",
createdtime: 0,
flowruns: [
{
metrics: [ ],
events: [ ],
id: "sjlee@ds-date/1463435661428",
type: "YARN_FLOW_RUN",
createdtime: 0,
info: {
SYSTEM_INFO_FLOW_VERSION: "1",
SYSTEM_INFO_FLOW_RUN_ID: 1463435661428,
SYSTEM_INFO_FLOW_NAME: "ds-date",
SYSTEM_INFO_USER: "sjlee"
},
isrelatedto: { },
relatesto: { }
}
],
info: {
SYSTEM_INFO_CLUSTER: "yarn_cluster",
UID: "yarn_cluster!sjlee!ds-date",
SYSTEM_INFO_FLOW_NAME: "ds-date",
SYSTEM_INFO_DATE: 146335680,
SYSTEM_INFO_USER: "sjlee"
},
isrelatedto: { },
relatesto: { }
}
{noformat}

The only page that appears to show the proper created time is the flow run 
page. I think the data exists in storage but is not being populated in the UI.






[jira] [Created] (YARN-5071) address HBase compatibility issues with trunk

2016-05-10 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-5071:
-

 Summary: address HBase compatibility issues with trunk
 Key: YARN-5071
 URL: https://issues.apache.org/jira/browse/YARN-5071
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical


Trunk is now adding, or planning to add, more and more backward-incompatible 
changes. Some examples include:
- remove v.1 metrics classes (HADOOP-12504)
- update jersey version (HADOOP-9613)
- target java 8 by default (HADOOP-11858)

This poses big challenges for timeline service v.2, because it depends on 
HBase, which in turn depends on an older version of Hadoop.

We need to find a way to solve, contain, or manage these risks before it is too late.







[jira] [Created] (YARN-5070) upgrade HBase version for first merge

2016-05-10 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-5070:
-

 Summary: upgrade HBase version for first merge
 Key: YARN-5070
 URL: https://issues.apache.org/jira/browse/YARN-5070
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical


Currently the HBase version for the timeline service storage is set to 1.0.1. 
This is a fairly old version, and there are reasons to move to a newer one, so 
we should upgrade it.






[jira] [Created] (YARN-5045) hbase unit tests fail due to dependency issues

2016-05-05 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-5045:
-

 Summary: hbase unit tests fail due to dependency issues
 Key: YARN-5045
 URL: https://issues.apache.org/jira/browse/YARN-5045
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Blocker


After the 5/4 rebase, the HBase unit tests in the timeline service project are 
failing:

{noformat}
org.apache.hadoop.yarn.server.timelineservice.reader.TestTimelineReaderWebServicesHBaseStorage  Time elapsed: 5.103 sec  <<< ERROR!
java.io.IOException: Shutting down
	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
	at org.apache.hadoop.hbase.http.HttpServer.addDefaultServlets(HttpServer.java:677)
	at org.apache.hadoop.hbase.http.HttpServer.initializeWebServer(HttpServer.java:546)
	at org.apache.hadoop.hbase.http.HttpServer.<init>(HttpServer.java:500)
	at org.apache.hadoop.hbase.http.HttpServer.<init>(HttpServer.java:104)
	at org.apache.hadoop.hbase.http.HttpServer$Builder.build(HttpServer.java:345)
	at org.apache.hadoop.hbase.http.InfoServer.<init>(InfoServer.java:77)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.putUpWebUI(HRegionServer.java:1697)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:550)
	at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:333)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
	at org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:139)
	at org.apache.hadoop.hbase.LocalHBaseCluster.addMaster(LocalHBaseCluster.java:217)
	at org.apache.hadoop.hbase.LocalHBaseCluster.<init>(LocalHBaseCluster.java:153)
	at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:213)
	at org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:93)
	at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:978)
	at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:938)
	at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:812)
	at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:806)
	at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:750)
	at org.apache.hadoop.yarn.server.timelineservice.reader.TestTimelineReaderWebServicesHBaseStorage.setup(TestTimelineReaderWebServicesHBaseStorage.java:87)
{noformat}

The root cause is that the HBase mini cluster depends on Hadoop Common's 
{{MetricsServlet}}, which has been removed from trunk (HADOOP-12504):

{noformat}
Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/metrics/MetricsServlet
	at org.apache.hadoop.hbase.http.HttpServer.addDefaultServlets(HttpServer.java:677)
	at org.apache.hadoop.hbase.http.HttpServer.initializeWebServer(HttpServer.java:546)
	at org.apache.hadoop.hbase.http.HttpServer.<init>(HttpServer.java:500)
	at org.apache.hadoop.hbase.http.HttpServer.<init>(HttpServer.java:104)
	at org.apache.hadoop.hbase.http.HttpServer$Builder.build(HttpServer.java:345)
	at org.apache.hadoop.hbase.http.InfoServer.<init>(InfoServer.java:77)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.putUpWebUI(HRegionServer.java:1697)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:550)
	at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:333)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
	at org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:139)
	... 26 more
{noformat}
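A quick way to confirm this kind of breakage before starting the mini cluster is to probe the test classpath for the missing class. The helper below is a hypothetical sketch using plain reflection (Class.forName with initialization disabled); only the class name org.apache.hadoop.metrics.MetricsServlet comes from the trace above.

```java
// Hypothetical helper: checks whether a class is loadable from the current classpath.
class ClasspathProbe {
    // Returns true if the named class can be found by this class's loader.
    static boolean isPresent(String className) {
        try {
            // initialize=false: only check that the class can be located,
            // without running its static initializers
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String servlet = "org.apache.hadoop.metrics.MetricsServlet";
        System.out.println(servlet + " present: " + isPresent(servlet));
    }
}
```

Run with the same classpath as the failing test, a false result for MetricsServlet would point at the dependency mismatch rather than at the test code.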




[jira] [Resolved] (YARN-5014) Ensure non-metric values are returned as is for flow run table from the coprocessor

2016-04-30 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee resolved YARN-5014.
---
   Resolution: Fixed
Fix Version/s: YARN-2928

This is fixed by YARN-4986.

> Ensure non-metric values are returned as is for flow run table from the 
> coprocessor
> ---
>
> Key: YARN-5014
> URL: https://issues.apache.org/jira/browse/YARN-5014
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
>  Labels: yarn-2928-1st-milestone
> Fix For: YARN-2928
>
>
> Presently the FlowScanner class presumes the existence of a NumericValueConverter 
> in its emitCells function. This causes an exception when we try to retrieve 
> non-numeric values from this table. 
> The exception is:
> {code}
> java.lang.ClassCastException: org.apache.hadoop.yarn.server.timelineservice.storage.common.GenericConverter cannot be cast to org.apache.hadoop.yarn.server.timelineservice.storage.common.NumericValueConverter
> 	at org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowScanner.nextInternal(FlowScanner.java:246)
> 	at org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowScanner.nextRaw(FlowScanner.java:125)
> 	at org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowScanner.nextRaw(FlowScanner.java:119)
> 	at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2117)
> 	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31443)
> 	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2031)
> 	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
> 	at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> 	at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> {code}
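The direction in the summary, returning non-metric values as is, can be sketched as a type check before the cast. The interfaces and classes below are simplified, hypothetical stand-ins for the converter types named in the trace (not their real signatures), and the metric path is reduced to a plain sum.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical stand-ins for the converter types in the trace.
interface ValueConverter { }

interface NumericValueConverter extends ValueConverter {
    // Combines two numeric cell values; only meaningful for metric columns.
    Number add(Number a, Number b);
}

final class GenericConverter implements ValueConverter { }

final class LongConverter implements NumericValueConverter {
    public Number add(Number a, Number b) {
        return a.longValue() + b.longValue();
    }
}

class CellEmitter {
    // Aggregates cells only when the converter is numeric; otherwise the
    // cells are returned unchanged instead of being cast blindly.
    static List<Object> emitCells(List<Object> cells, ValueConverter converter) {
        if (!(converter instanceof NumericValueConverter)) {
            return new ArrayList<>(cells); // non-metric column: pass through as is
        }
        NumericValueConverter numeric = (NumericValueConverter) converter;
        Number sum = 0L;
        for (Object cell : cells) {
            sum = numeric.add(sum, (Number) cell);
        }
        List<Object> out = new ArrayList<>();
        out.add(sum);
        return out;
    }

    public static void main(String[] args) {
        List<Object> metrics = Arrays.asList(3L, 4L);
        List<Object> info = Arrays.asList("a", "b");
        System.out.println(emitCells(metrics, new LongConverter()));  // [7]
        System.out.println(emitCells(info, new GenericConverter()));  // [a, b]
    }
}
```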






[jira] [Created] (YARN-4821) have a separate NM timeline publishing interval

2016-03-15 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-4821:
-

 Summary: have a separate NM timeline publishing interval
 Key: YARN-4821
 URL: https://issues.apache.org/jira/browse/YARN-4821
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee


Currently the interval at which the NM publishes container CPU and memory metrics 
is tied to {{yarn.nodemanager.resource-monitor.interval-ms}}, whose default is 3 
seconds. This is too aggressive.

There should be a separate configuration that controls how often 
{{NMTimelinePublisher}} publishes container metrics.
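One way a separate interval could work, sketched under assumptions: the resource monitor keeps sampling at its own interval, and the publisher skips samples until a longer, independently configured interval (called publishIntervalMs here; both that name and the class are hypothetical) has elapsed for that container.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical throttler: samples still arrive every monitor interval, but each
// container's metrics are published at most once per publishIntervalMs.
class ContainerMetricsThrottler {
    private final long publishIntervalMs;
    private final Map<String, Long> lastPublished = new HashMap<>();

    ContainerMetricsThrottler(long publishIntervalMs) {
        this.publishIntervalMs = publishIntervalMs;
    }

    // Returns true (and records the time) if this sample should be published.
    synchronized boolean shouldPublish(String containerId, long nowMs) {
        Long last = lastPublished.get(containerId);
        if (last == null || nowMs - last >= publishIntervalMs) {
            lastPublished.put(containerId, nowMs);
            return true;
        }
        return false;
    }

    // Drops the state of a finished container.
    synchronized void remove(String containerId) {
        lastPublished.remove(containerId);
    }

    public static void main(String[] args) {
        ContainerMetricsThrottler throttler = new ContainerMetricsThrottler(10_000L);
        // Monitor samples every 3 s; only samples at least 10 s apart are published.
        for (long now = 0; now <= 21_000L; now += 3_000L) {
            System.out.println(now + " ms -> " + throttler.shouldPublish("container_1", now));
        }
    }
}
```

With a 3-second monitor interval and a 10-second publish interval, the loop in main publishes only at 0 ms and 12,000 ms.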





[jira] [Created] (YARN-4761) NMs reconnecting with changed capabilities can lead to wrong cluster resource calculations on fair scheduler

2016-03-03 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-4761:
-

 Summary: NMs reconnecting with changed capabilities can lead to 
wrong cluster resource calculations on fair scheduler
 Key: YARN-4761
 URL: https://issues.apache.org/jira/browse/YARN-4761
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.6.4
Reporter: Sangjin Lee
Assignee: Sangjin Lee


YARN-3802 uncovered an issue with the scheduler where the resource calculation 
can be incorrect due to async event handling. It was subsequently fixed by 
YARN-4344, but the fix was never applied to the fair scheduler.
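The general shape of a race-free update can be sketched under one assumption: the scheduler keeps its own record of each node's capability. On reconnect it subtracts the capability it previously recorded for the node (rather than whatever the event or node object currently carries) and adds the new one under a single lock. Class and method names are hypothetical, and resources are reduced to memory in MB.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical scheduler-side bookkeeping; resources simplified to memory (MB).
class ClusterResourceTracker {
    private final Map<String, Long> nodeCapabilityMb = new HashMap<>();
    private long clusterMemoryMb = 0;

    synchronized void addNode(String nodeId, long capabilityMb) {
        nodeCapabilityMb.put(nodeId, capabilityMb);
        clusterMemoryMb += capabilityMb;
    }

    // Handles a reconnect that may carry a changed capability. The old value
    // comes from this tracker's own record, so it does not matter whether the
    // node object was already updated before this event was processed.
    synchronized void reconnectNode(String nodeId, long newCapabilityMb) {
        Long old = nodeCapabilityMb.put(nodeId, newCapabilityMb);
        if (old != null) {
            clusterMemoryMb -= old;
        }
        clusterMemoryMb += newCapabilityMb;
    }

    synchronized long getClusterMemoryMb() {
        return clusterMemoryMb;
    }

    public static void main(String[] args) {
        ClusterResourceTracker tracker = new ClusterResourceTracker();
        tracker.addNode("node1:8041", 8192);
        tracker.addNode("node2:8041", 8192);
        tracker.reconnectNode("node1:8041", 16384); // node came back with more memory
        System.out.println(tracker.getClusterMemoryMb()); // 24576
    }
}
```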






[jira] [Created] (YARN-4741) RM is flooded with RMNodeFinishedContainersPulledByAMEvents in the async dispatcher event queue

2016-02-26 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-4741:
-

 Summary: RM is flooded with 
RMNodeFinishedContainersPulledByAMEvents in the async dispatcher event queue
 Key: YARN-4741
 URL: https://issues.apache.org/jira/browse/YARN-4741
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Sangjin Lee


We had a pretty major incident with the RM where it was continually flooded 
with RMNodeFinishedContainersPulledByAMEvents in the async dispatcher event 
queue.

In our setup, RM HA and stateful restart were *disabled*, but NM 
work-preserving restart was *enabled*. Due to other issues, we did a cluster-wide 
NM restart.

At some point during the restart (which took multiple hours), we started seeing 
the async dispatcher event queue build up. Normally the logged queue size would 
be on the order of 1,000; in this case, it climbed to tens of millions of events.

When we looked at the RM log, it was full of the following messages:
{noformat}
2016-02-18 01:47:29,530 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Invalid event FINISHED_CONTAINERS_PULLED_BY_AM on Node  worker-node-foo.bar.net:8041
2016-02-18 01:47:29,535 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Can't handle this event at current state
2016-02-18 01:47:29,535 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Invalid event FINISHED_CONTAINERS_PULLED_BY_AM on Node  worker-node-foo.bar.net:8041
2016-02-18 01:47:29,538 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Can't handle this event at current state
2016-02-18 01:47:29,538 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Invalid event FINISHED_CONTAINERS_PULLED_BY_AM on Node  worker-node-foo.bar.net:8041
{noformat}

The node in question had been restarted a few minutes earlier.

When we inspected the RM heap, it was full of 
RMNodeFinishedContainersPulledByAMEvents.

Suspecting the NM work-preserving restart, we disabled it and did another 
cluster-wide rolling restart. Initially that seemed to help reduce the queue 
size, but the queue built back up to several million events and stayed there 
for an extended period. We had to restart the RM to resolve the problem.





[jira] [Created] (YARN-4670) add logging when a node is AM-blacklisted

2016-02-03 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-4670:
-

 Summary: add logging when a node is AM-blacklisted
 Key: YARN-4670
 URL: https://issues.apache.org/jira/browse/YARN-4670
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.8.0
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Trivial


Today there is little logging when a node is blacklisted for AM placement 
(see YARN-2005). We can add more logging so that this activity is easy to see 
in the RM logs.
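As an illustration of the kind of message meant here (not the actual RM change), a per-application blacklist could log each newly added node together with the attempt and the reason. The class, method, and message format are hypothetical, and java.util.logging is used only to keep the sketch self-contained.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical per-application AM blacklist that logs every addition.
class AmBlacklist {
    private static final Logger LOG = Logger.getLogger(AmBlacklist.class.getName());

    private final String appAttemptId;
    private final Set<String> nodes = new HashSet<>();

    AmBlacklist(String appAttemptId) {
        this.appAttemptId = appAttemptId;
    }

    // Adds the node and logs it; returns false if it was already blacklisted.
    boolean add(String node, String reason) {
        boolean added = nodes.add(node);
        if (added) {
            LOG.info("Blacklisting node " + node + " for AM placement of "
                + appAttemptId + " (reason: " + reason + "); "
                + nodes.size() + " node(s) now blacklisted");
        }
        return added;
    }

    boolean contains(String node) {
        return nodes.contains(node);
    }

    public static void main(String[] args) {
        AmBlacklist blacklist = new AmBlacklist("appattempt_example_000001");
        blacklist.add("worker-node-foo.bar.net:8041", "DISKS_FAILED");
    }
}
```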





[jira] [Created] (YARN-4450) TestTimelineAuthenticationFilter and TestYarnConfigurationFields fail

2015-12-12 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-4450:
-

 Summary: TestTimelineAuthenticationFilter and 
TestYarnConfigurationFields fail
 Key: YARN-4450
 URL: https://issues.apache.org/jira/browse/YARN-4450
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Assignee: Sangjin Lee


When I run the unit tests against the current branch, 
TestTimelineAuthenticationFilter and TestYarnConfigurationFields fail:

{noformat}
  TestTimelineAuthenticationFilter.testDelegationTokenOperations:251 » NullPointer
  TestTimelineAuthenticationFilter.testDelegationTokenOperations:251 » NullPointer
  TestYarnConfigurationFields>TestConfigurationFieldsBase.testCompareConfigurationClassAgainstXml:429 class org.apache.hadoop.yarn.conf.YarnConfiguration has 1 variables missing in yarn-default.xml
{noformat}

The latter failure is caused by YARN-4356 (when we deprecated 
RM_SYSTEM_METRICS_PUBLISHER_ENABLED); the former is an older issue, introduced 
when a later use of the field {{resURI}} was added in trunk.





[jira] [Created] (YARN-4356) ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off

2015-11-13 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-4356:
-

 Summary: ensure the timeline service v.2 is disabled cleanly and 
has no impact when it's turned off
 Key: YARN-4356
 URL: https://issues.apache.org/jira/browse/YARN-4356
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical


Before we can merge the first milestone drop into trunk, we want to ensure 
that, once disabled, timeline service v.2 has no impact on either the server 
side or the client side. If the timeline service is not enabled, nothing should 
be done. If v.1 is enabled but not v.2, v.1 should behave exactly as it did 
before the merge.
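A minimal sketch of the gating this implies, assuming two settings: a flag that turns the timeline service on at all and a numeric version. In later Hadoop releases these are yarn.timeline-service.enabled and yarn.timeline-service.version, but the helper class, its defaults, and the Map-based configuration here are illustrative rather than the YarnConfiguration API.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative gating helper; configuration is modeled as a plain map.
class TimelineGate {
    static final String ENABLED = "yarn.timeline-service.enabled";
    static final String VERSION = "yarn.timeline-service.version";

    // Assumed default for this sketch: the service is disabled.
    static boolean timelineEnabled(Map<String, String> conf) {
        return Boolean.parseBoolean(conf.getOrDefault(ENABLED, "false"));
    }

    // Assumed default for this sketch: version 1.0.
    static float version(Map<String, String> conf) {
        return Float.parseFloat(conf.getOrDefault(VERSION, "1.0"));
    }

    // v.2 code paths run only when the service is on and the major version is 2.
    static boolean v2Enabled(Map<String, String> conf) {
        return timelineEnabled(conf) && (int) version(conf) == 2;
    }

    // v.1 code paths run only when the service is on and the major version is 1.
    static boolean v1Enabled(Map<String, String> conf) {
        return timelineEnabled(conf) && (int) version(conf) == 1;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println("off: v1=" + v1Enabled(conf) + " v2=" + v2Enabled(conf));
        conf.put(ENABLED, "true");
        System.out.println("on:  v1=" + v1Enabled(conf) + " v2=" + v2Enabled(conf));
        conf.put(VERSION, "2.0");
        System.out.println("v2:  v1=" + v1Enabled(conf) + " v2=" + v2Enabled(conf));
    }
}
```

Every v.2 code path would check v2Enabled first, so with the service off (or on at version 1) none of the new code runs.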





[jira] [Created] (YARN-4350) TestDistributedShell fails

2015-11-11 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-4350:
-

 Summary: TestDistributedShell fails
 Key: YARN-4350
 URL: https://issues.apache.org/jira/browse/YARN-4350
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee


Currently TestDistributedShell does not pass on the feature-YARN-2928 branch. 
There seem to be 2 distinct issues.

(1) testDSShellWithoutDomainV2* tests fail sporadically
These tests fail more often than not when run by themselves:
{noformat}
testDSShellWithoutDomainV2DefaultFlow(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)  Time elapsed: 30.998 sec  <<< FAILURE!
java.lang.AssertionError: Application created event should be published atleast once expected:<1> but was:<0>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:743)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.junit.Assert.assertEquals(Assert.java:555)
	at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:451)
	at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:326)
	at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2DefaultFlow(TestDistributedShell.java:207)
{noformat}

These failures started after YARN-4129. I suspect they have to do with a timing 
issue.
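If the failures are indeed timing-related, a common remedy is to poll for the expected condition until a deadline instead of asserting once. The helper below is a hypothetical, self-contained sketch of that pattern; the background thread in main merely stands in for the application-created event eventually being published.

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical polling helper for timing-sensitive test assertions.
class WaitFor {
    // Re-checks the condition every intervalMs until it holds or timeoutMs elapses.
    static void waitFor(BooleanSupplier condition, long intervalMs, long timeoutMs)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("condition not met within " + timeoutMs + " ms");
            }
            Thread.sleep(intervalMs);
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the count of published "application created" events.
        AtomicInteger publishedEvents = new AtomicInteger();
        new Thread(() -> {
            try {
                Thread.sleep(200); // simulated publishing delay
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            publishedEvents.incrementAndGet();
        }).start();
        waitFor(() -> publishedEvents.get() >= 1, 50, 5_000);
        System.out.println("published events: " + publishedEvents.get());
    }
}
```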

(2) the whole test times out
If you run the whole TestDistributedShell test, it times out every time. This 
may or may not have to do with the port change introduced by YARN-2859 (just a 
hunch).





[jira] [Created] (YARN-4284) condition for AM blacklisting is too narrow

2015-10-20 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-4284:
-

 Summary: condition for AM blacklisting is too narrow
 Key: YARN-4284
 URL: https://issues.apache.org/jira/browse/YARN-4284
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.8.0
Reporter: Sangjin Lee


Per YARN-2005, there is now a way to blacklist nodes for AM purposes so the 
next app attempt can be assigned to a different node.

However, the condition under which a node gets blacklisted is currently limited 
to {{DISKS_FAILED}}. There is a whole host of other issues that may cause the 
failure and for which we would want to place the AM elsewhere, e.g. full disks, 
JVM crashes, memory issues, etc.

Since AM blacklisting is per-app, there is little practical downside to 
blacklisting the node on *any failure* (although it might blacklist nodes more 
aggressively than necessary). I propose placing the next app attempt on a 
different node on any failure.
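The difference between the current condition and the proposal can be expressed as two predicates over the cause of an AM container failure. The enum values are hypothetical labels for the failure kinds listed above, not YARN's actual exit-status constants.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical failure causes for an AM container; not YARN's exit codes.
enum AmFailureCause { SUCCESS, DISKS_FAILED, DISK_FULL, JVM_CRASH, OUT_OF_MEMORY, OTHER }

class AmBlacklistPolicy {
    // Current behavior: only a disk failure blacklists the node.
    static final Set<AmFailureCause> NARROW = EnumSet.of(AmFailureCause.DISKS_FAILED);

    static boolean shouldBlacklistNarrow(AmFailureCause cause) {
        return NARROW.contains(cause);
    }

    // Proposed behavior: any failure moves the next attempt to another node.
    // Because blacklisting is per app, over-blacklisting affects only this app.
    static boolean shouldBlacklistAnyFailure(AmFailureCause cause) {
        return cause != AmFailureCause.SUCCESS;
    }

    public static void main(String[] args) {
        for (AmFailureCause cause : AmFailureCause.values()) {
            System.out.println(cause + ": narrow=" + shouldBlacklistNarrow(cause)
                + " anyFailure=" + shouldBlacklistAnyFailure(cause));
        }
    }
}
```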





[jira] [Created] (YARN-4261) fix the order of timelinereader in yarn/yarn.cmd

2015-10-13 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-4261:
-

 Summary: fix the order of timelinereader in yarn/yarn.cmd
 Key: YARN-4261
 URL: https://issues.apache.org/jira/browse/YARN-4261
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Priority: Trivial


The order of the timelinereader command is not correct in yarn/yarn.cmd.





[jira] [Resolved] (YARN-4174) Fix javadoc warnings floating up from hbase

2015-09-17 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee resolved YARN-4174.
---
   Resolution: Done
Fix Version/s: YARN-2928

This ended up getting fixed as part of YARN-3901.

> Fix javadoc warnings floating up from hbase 
> 
>
> Key: YARN-4174
> URL: https://issues.apache.org/jira/browse/YARN-4174
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vrushali C
>Assignee: Sangjin Lee
>Priority: Minor
> Fix For: YARN-2928
>
>
> As part of the patch for YARN-3901, [~sjlee0] observed some (~200) javadoc 
> warnings coming from hbase classes. 
> We tried a few things, such as making the FlowRunCoprocessor class non-public 
> and excluding the package from the pom. If the class is made non-public, 
> table creation throws an exception.
> {code}
> 206 warnings
> [WARNING] Javadoc Warnings
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestWALObserver.class):
>  warning: Cannot find annotation method 'value()' in type 'Category': class 
> file for org.junit.experimental.categories.Category not found
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRowProcessorEndpoint.class):
>  warning: Cannot find annotation method 'value()' in type 'Category'
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRegionServerObserver.class):
>  warning: Cannot find annotation method 'value()' in type 'Category'
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithRemove.class):
>  warning: Cannot find annotation method 'value()' in type 'Category'
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithRemove.class):
>  warning: Cannot find annotation method 'timeout()' in type 'Test': class 
> file for org.junit.Test not found
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.class):
>  warning: Cannot find annotation method 'value()' in type 'Category'
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.class):
>  warning: Cannot find annotation method 'timeout()' in type 'Test'
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.class):
>  warning: Cannot find annotation method 'timeout()' in type 'Test'
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorEndpoint.class):
>  warning: Cannot find annotation method 'value()' in type 'Category'
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRegionObserverStacking.class):
>  warning: Cannot find annotation method 'value()' in type 'Category'
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRegionObserverScannerOpenHook.class):
>  warning: Cannot find annotation method 'value()' in type 'Category'
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.class):
>  warning: Cannot find annotation method 'value()' in type 'Category'
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.class):
>  warning: Cannot find annotation method 'timeout()' in type 'Test'
> [WARNING] 
> /Users/username/.m2/repository/org/apache/hbase/hbase-server/1.0.1/hbase-server-1.0.1-tests.jar(org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.class):
>  warning: Cannot find annotation method 'timeout()' in type 'Test'
> [WARNING] 
> 

[jira] [Created] (YARN-4179) [reader implementation] support flow activity queries based on time

2015-09-17 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-4179:
-

 Summary: [reader implementation] support flow activity queries 
based on time
 Key: YARN-4179
 URL: https://issues.apache.org/jira/browse/YARN-4179
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Priority: Minor


This came up as part of YARN-4074 and YARN-4075.

Currently the only query pattern supported on the flow activity table is by 
cluster. It might also be useful to support queries by cluster and a specific 
date or dates.
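Assuming the flow activity row key begins with the cluster followed by an inverted top-of-day timestamp (Long.MAX_VALUE minus the timestamp, so newer days sort first), a query by cluster and date range becomes a bounded scan whose ends are swapped. The sketch models a single cluster's rows as a sorted map; the class and sample values are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Models one cluster's flow activity rows, keyed by inverted day timestamp.
class FlowActivityIndex {
    static final long DAY_MS = 24L * 60 * 60 * 1000;
    private final NavigableMap<Long, List<String>> rows = new TreeMap<>();

    static long invert(long ts) {
        return Long.MAX_VALUE - ts;
    }

    // Start of the (UTC) day containing ts; assumes ts >= 0.
    static long topOfDay(long ts) {
        return ts - (ts % DAY_MS);
    }

    void record(long activityTs, String flow) {
        rows.computeIfAbsent(invert(topOfDay(activityTs)), k -> new ArrayList<>()).add(flow);
    }

    // Flows active between two days (inclusive), newest day first.
    // Inversion swaps the bounds: the later day has the smaller key.
    List<String> query(long fromTs, long toTs) {
        long startKey = invert(topOfDay(toTs));
        long endKey = invert(topOfDay(fromTs));
        List<String> result = new ArrayList<>();
        for (List<String> flows : rows.subMap(startKey, true, endKey, true).values()) {
            result.addAll(flows);
        }
        return result;
    }

    public static void main(String[] args) {
        FlowActivityIndex index = new FlowActivityIndex();
        long day0 = 1463356800000L; // 2016-05-16T00:00Z
        index.record(day0 + 1_000L, "flowA");
        index.record(day0 + DAY_MS + 5_000L, "flowB");
        index.record(day0 + 2 * DAY_MS + 9_000L, "flowC");
        System.out.println(index.query(day0 + DAY_MS, day0 + 2 * DAY_MS)); // [flowC, flowB]
    }
}
```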





[jira] [Created] (YARN-4178) [storage implementation] app id as string can cause incorrect ordering

2015-09-17 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-4178:
-

 Summary: [storage implementation] app id as string can cause 
incorrect ordering
 Key: YARN-4178
 URL: https://issues.apache.org/jira/browse/YARN-4178
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee


Currently the app id is used in various places as part of row keys and in 
column names. However, it is treated as a string for the most part. This causes 
incorrect ordering when the trailing numeric portion of the app id rolls over to 
an additional digit.

For example, "app_1234567890_100" will be considered *earlier* than 
"app_1234567890_99". We should correct this.
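A minimal sketch of numeric ordering, assuming ids of the form {{<prefix>_<clusterTimestamp>_<sequence>}} as in the example above. {{AppIdOrdering}} and its members are hypothetical. Even with the zero-padding YARN applies to the sequence number when printing application ids, string order still breaks once the sequence outgrows the padded width. The sketch also shows a fixed-width big-endian key encoding, which sorts correctly under the unsigned byte-wise comparison HBase uses for row keys (valid for non-negative values).

```java
import java.nio.ByteBuffer;
import java.util.Comparator;

final class AppIdOrdering {
    // Splits "<prefix>_<clusterTimestamp>_<sequence>" into its two numeric parts.
    static long[] parse(String appId) {
        String[] parts = appId.split("_");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Unexpected app id: " + appId);
        }
        return new long[] {Long.parseLong(parts[1]), Long.parseLong(parts[2])};
    }

    // Orders by cluster timestamp, then by sequence number, both compared numerically.
    static final Comparator<String> NUMERIC = (a, b) -> {
        long[] x = parse(a);
        long[] y = parse(b);
        int byTimestamp = Long.compare(x[0], y[0]);
        return byTimestamp != 0 ? byTimestamp : Long.compare(x[1], y[1]);
    };

    // Fixed-width big-endian encoding: for non-negative values, unsigned
    // byte-wise comparison of the result matches numeric order.
    static byte[] toKeyBytes(String appId) {
        long[] p = parse(appId);
        return ByteBuffer.allocate(2 * Long.BYTES).putLong(p[0]).putLong(p[1]).array();
    }
}
```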



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4116) refactor ColumnHelper read* methods

2015-09-04 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-4116:
-

 Summary: refactor ColumnHelper read* methods
 Key: YARN-4116
 URL: https://issues.apache.org/jira/browse/YARN-4116
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Assignee: Sangjin Lee


Currently we have several ColumnHelper.read* methods that differ slightly in 
their initial conditions and behave differently accordingly. We may want to 
refactor them to improve code reuse while keeping the API reasonable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-08-24 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-4074:
-

 Summary: [timeline reader] implement support for querying for 
flows and flow runs
 Key: YARN-4074
 URL: https://issues.apache.org/jira/browse/YARN-4074
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee


Implement support for querying for flows and flow runs.

We should be able to query for the most recent N flows, etc.

This includes changes to the {{TimelineReader}} API if necessary, as well as 
implementation of the API.
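A minimal in-memory sketch of the "most recent N flows" query, assuming each flow carries its last activity time. {{RecentFlows}} and its {{Flow}} record are hypothetical. A storage-backed reader would more likely rely on rows already ordered by (inverted) activity time and stop after N rows instead of sorting everything.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class RecentFlows {
    // Hypothetical summary of one flow and when it was last active.
    record Flow(String user, String flowName, long lastActivityMillis) {}

    // Returns up to n flows, most recently active first.
    static List<Flow> mostRecent(List<Flow> flows, int n) {
        return flows.stream()
                .sorted(Comparator.comparingLong(Flow::lastActivityMillis).reversed())
                .limit(n)
                .collect(Collectors.toList());
    }
}
```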



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4075) [reader REST API] implement support for querying for flows and flow runs

2015-08-24 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-4075:
-

 Summary: [reader REST API] implement support for querying for 
flows and flow runs
 Key: YARN-4075
 URL: https://issues.apache.org/jira/browse/YARN-4075
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee


We need to be able to query for flows and flow runs via REST.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4064) build is broken at TestHBaseTimelineWriterImpl.java

2015-08-19 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-4064:
-

 Summary: build is broken at TestHBaseTimelineWriterImpl.java
 Key: YARN-4064
 URL: https://issues.apache.org/jira/browse/YARN-4064
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Blocker


When YARN-4025 was committed, the rename of {{TestHBaseTimelineWriterImpl.java}} 
to {{TestHBaseTimelineStorage.java}} that was part of the patch somehow did not 
happen. As a result, the build is broken.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3981) support timeline clients not associated with an application

2015-07-27 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3981:
-

 Summary: support timeline clients not associated with an 
application
 Key: YARN-3981
 URL: https://issues.apache.org/jira/browse/YARN-3981
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee


In the current v.2 design, all timeline writes must belong to a 
flow/application context (cluster + user + flow + flow run + application).

However, some use cases require writing data outside the context of an 
application. One example is a higher-level client (e.g. a Tez client or a 
Hive/Oozie/Cascading client) writing flow-level data that spans multiple 
applications. We need to find a way to support them.
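One possible shape, sketched under the assumption that the application simply becomes an optional part of the context: flow-level writes carry cluster, user, flow and flow run but no application. {{WriteContext}} and its factory methods are hypothetical and are not the actual v.2 context class.

```java
import java.util.Objects;
import java.util.Optional;

final class WriteContext {
    private final String clusterId;
    private final String userId;
    private final String flowName;
    private final long flowRunId;
    private final String appId; // null for flow-level writes

    private WriteContext(String clusterId, String userId, String flowName,
                         long flowRunId, String appId) {
        this.clusterId = Objects.requireNonNull(clusterId);
        this.userId = Objects.requireNonNull(userId);
        this.flowName = Objects.requireNonNull(flowName);
        this.flowRunId = flowRunId;
        this.appId = appId;
    }

    // Current case: the write belongs to a single application.
    static WriteContext forApplication(String cluster, String user, String flow,
                                       long flowRunId, String appId) {
        return new WriteContext(cluster, user, flow, flowRunId, Objects.requireNonNull(appId));
    }

    // Proposed case: a higher-level client writing data for the whole flow run.
    static WriteContext forFlowRun(String cluster, String user, String flow, long flowRunId) {
        return new WriteContext(cluster, user, flow, flowRunId, null);
    }

    boolean isFlowLevel() { return appId == null; }
    Optional<String> appId() { return Optional.ofNullable(appId); }
    String clusterId() { return clusterId; }
    String userId() { return userId; }
    String flowName() { return flowName; }
    long flowRunId() { return flowRunId; }
}
```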



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3949) ensure timely flush of timeline writes

2015-07-21 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3949:
-

 Summary: ensure timely flush of timeline writes
 Key: YARN-3949
 URL: https://issues.apache.org/jira/browse/YARN-3949
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Assignee: Sangjin Lee


Currently flushing of timeline writes is not really handled. For example, 
{{HBaseTimelineWriterImpl}} relies on HBase's {{BufferedMutator}} to batch and 
write puts asynchronously. However, {{BufferedMutator}} may not flush them to 
HBase until its internal buffer fills up.

We need flush functionality to ensure that data is written in a reasonably 
timely manner, and to ensure that certain critical writes (e.g. key lifecycle 
events) are done synchronously.
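A minimal sketch of both needs, assuming the writer exposes something flushable: a background thread flushes on a fixed delay, and {{flushNow()}} gives a synchronous flush for critical writes. {{PeriodicFlusher}} is hypothetical; an HBase {{BufferedMutator}} could be adapted to {{java.io.Flushable}} with a method reference such as {{mutator::flush}}.

```java
import java.io.Flushable;
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class PeriodicFlusher implements AutoCloseable {
    private final Flushable target;
    private final Object lock = new Object();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "timeline-writer-flusher");
                t.setDaemon(true); // do not keep the JVM alive just to flush
                return t;
            });

    PeriodicFlusher(Flushable target, long intervalMillis) {
        this.target = target;
        scheduler.scheduleWithFixedDelay(this::flushQuietly,
                intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    // Synchronous flush, e.g. right after a critical lifecycle event is written.
    void flushNow() throws IOException {
        synchronized (lock) {
            target.flush();
        }
    }

    // Periodic task: an exception escaping here would cancel future runs, so catch it.
    private void flushQuietly() {
        try {
            flushNow();
        } catch (IOException e) {
            // A real writer would log this; the sketch retries on the next tick.
        }
    }

    // Stops the periodic flushes and performs one final flush.
    @Override
    public void close() throws IOException {
        scheduler.shutdown();
        flushNow();
    }
}
```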



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3906) split the application table from the entity table

2015-07-09 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3906:
-

 Summary: split the application table from the entity table
 Key: YARN-3906
 URL: https://issues.apache.org/jira/browse/YARN-3906
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Assignee: Sangjin Lee


Per discussions on YARN-3815, we need to move the application entities out of 
the main entity table into their own table (application).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3907) create the flow-version table

2015-07-09 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3907:
-

 Summary: create the flow-version table
 Key: YARN-3907
 URL: https://issues.apache.org/jira/browse/YARN-3907
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Assignee: Sangjin Lee


Per discussions on YARN-3815, create the flow-version table that maps flow 
versions to various data about those versions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3836) add equals and hashCode to TimelineEntity and other classes in the data model

2015-06-19 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3836:
-

 Summary: add equals and hashCode to TimelineEntity and other 
classes in the data model
 Key: YARN-3836
 URL: https://issues.apache.org/jira/browse/YARN-3836
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee


Classes in the data model API (e.g. {{TimelineEntity}}, 
{{TimelineEntity.Identifier}}, etc.) do not override {{equals()}} or 
{{hashCode()}}. This can cause problems when these objects are used in a 
collection such as a {{HashSet}}. We should implement these methods wherever 
appropriate.
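A minimal sketch of the standard pattern, using a hypothetical {{EntityId}} class with a type and an id (loosely modeled on an entity identifier, not the real class). Both methods use the same fields, so equal objects hash to the same bucket and a {{HashSet}} de-duplicates them.

```java
import java.util.Objects;

final class EntityId {
    private final String type;
    private final String id;

    EntityId(String type, String id) {
        this.type = type;
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof EntityId)) {
            return false;
        }
        EntityId other = (EntityId) o;
        return Objects.equals(type, other.type) && Objects.equals(id, other.id);
    }

    // Must use the same fields as equals() so equal objects land in the same bucket.
    @Override
    public int hashCode() {
        return Objects.hash(type, id);
    }
}
```

Since the data model classes are mutable, changing the type or id of an entity after it has been added to a hash-based collection would make it unreachable there; that is worth documenting alongside the new methods.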



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3741) consider nulling member maps/sets of TimelineEntity

2015-05-28 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3741:
-

 Summary: consider nulling member maps/sets of TimelineEntity
 Key: YARN-3741
 URL: https://issues.apache.org/jira/browse/YARN-3741
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Assignee: Sangjin Lee


Currently there are multiple collection members of TimelineEntity that are 
always instantiated, whether or not they are used: info, configs, 
metrics, events, isRelatedToEntities, and relatesToEntities.

Since TimelineEntities are created very often, and in many cases several of 
these members stay empty, creating these empty collections is wasteful and adds 
garbage collector pressure.

It would be good to start out with null members and instantiate these 
collections only when they are actually used. Of course, we need to make that 
contract very clear and refactor all client code to handle that scenario.
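A minimal sketch of lazy instantiation for one member, using a hypothetical {{LazyEntity}} class. The map stays null until the first write; the getter returns a shared empty, read-only map instead of null. This is one possible contract: it keeps readers free of null checks, whereas exposing null directly would require the client-code refactoring mentioned above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class LazyEntity {
    private Map<String, Object> info; // stays null until the first addInfo()

    // Never returns null; callers get an empty, read-only view if nothing was added.
    Map<String, Object> getInfo() {
        return info == null
                ? Collections.<String, Object>emptyMap()
                : Collections.unmodifiableMap(info);
    }

    void addInfo(String key, Object value) {
        if (info == null) {
            info = new HashMap<>(); // allocate only on first use
        }
        info.put(key, value);
    }

    boolean hasInfoAllocated() {
        return info != null;
    }
}
```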



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3721) build is broken on YARN-2928 branch due to possible dependency cycle

2015-05-26 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3721:
-

 Summary: build is broken on YARN-2928 branch due to possible 
dependency cycle
 Key: YARN-3721
 URL: https://issues.apache.org/jira/browse/YARN-3721
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Priority: Blocker


The build is broken on the YARN-2928 branch at the 
hadoop-yarn-server-timelineservice module. It has been broken for a while, but 
we didn't notice because the build still succeeds as long as the Maven local 
cache is not cleared.

To reproduce, remove all Hadoop (3.0.0-SNAPSHOT) artifacts from your Maven 
local cache and run the build.

It was almost certainly introduced by YARN-3529.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3634) TestMRTimelineEventHandling is broken due to timing issues

2015-05-12 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3634:
-

 Summary: TestMRTimelineEventHandling is broken due to timing issues
 Key: YARN-3634
 URL: https://issues.apache.org/jira/browse/YARN-3634
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Assignee: Sangjin Lee


TestMRTimelineEventHandling is broken. Relevant error message:

{noformat}
2015-05-12 06:28:56,415 INFO  [AsyncDispatcher event handler] ipc.Client 
(Client.java:handleConnectionFailure(882)) - Retrying connect to server: 
asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 0 time(s); retry policy 
is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
MILLISECONDS)
2015-05-12 06:28:57,416 INFO  [AsyncDispatcher event handler] ipc.Client 
(Client.java:handleConnectionFailure(882)) - Retrying connect to server: 
asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 1 time(s); retry policy 
is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
MILLISECONDS)
2015-05-12 06:28:58,416 INFO  [AsyncDispatcher event handler] ipc.Client 
(Client.java:handleConnectionFailure(882)) - Retrying connect to server: 
asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 2 time(s); retry policy 
is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
MILLISECONDS)
2015-05-12 06:28:59,417 INFO  [AsyncDispatcher event handler] ipc.Client 
(Client.java:handleConnectionFailure(882)) - Retrying connect to server: 
asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 3 time(s); retry policy 
is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
MILLISECONDS)
2015-05-12 06:29:00,418 INFO  [AsyncDispatcher event handler] ipc.Client 
(Client.java:handleConnectionFailure(882)) - Retrying connect to server: 
asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 4 time(s); retry policy 
is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
MILLISECONDS)
2015-05-12 06:29:01,419 INFO  [AsyncDispatcher event handler] ipc.Client 
(Client.java:handleConnectionFailure(882)) - Retrying connect to server: 
asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 5 time(s); retry policy 
is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
MILLISECONDS)
2015-05-12 06:29:02,420 INFO  [AsyncDispatcher event handler] ipc.Client 
(Client.java:handleConnectionFailure(882)) - Retrying connect to server: 
asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 6 time(s); retry policy 
is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
MILLISECONDS)
2015-05-12 06:29:03,420 INFO  [AsyncDispatcher event handler] ipc.Client 
(Client.java:handleConnectionFailure(882)) - Retrying connect to server: 
asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 7 time(s); retry policy 
is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
MILLISECONDS)
2015-05-12 06:29:04,421 INFO  [AsyncDispatcher event handler] ipc.Client 
(Client.java:handleConnectionFailure(882)) - Retrying connect to server: 
asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 8 time(s); retry policy 
is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
MILLISECONDS)
2015-05-12 06:29:05,422 INFO  [AsyncDispatcher event handler] ipc.Client 
(Client.java:handleConnectionFailure(882)) - Retrying connect to server: 
asf904.gq1.ygridcore.net/67.195.81.148:0. Already tried 9 time(s); retry policy 
is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
MILLISECONDS)
2015-05-12 06:29:05,424 ERROR [AsyncDispatcher event handler] 
collector.NodeTimelineCollectorManager 
(NodeTimelineCollectorManager.java:postPut(121)) - Failed to communicate with 
NM Collector Service for application_1431412130291_0001
2015-05-12 06:29:05,425 WARN  [AsyncDispatcher event handler] 
containermanager.AuxServices 
(AuxServices.java:logWarningWhenAuxServiceThrowExceptions(261)) - The 
auxService name is timeline_collector and it got an error at event: 
CONTAINER_INIT
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
java.net.ConnectException: Call From asf904.gq1.ygridcore.net/67.195.81.148 to 
asf904.gq1.ygridcore.net:0 failed on connection exception: 
java.net.ConnectException: Connection refused; For more details see:  
http://wiki.apache.org/hadoop/ConnectionRefused
at 
org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorManager.putIfAbsent(TimelineCollectorManager.java:97)
at 
org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService.addApplication(PerNodeTimelineCollectorsAuxService.java:99)
at 
org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService.initializeContainer(PerNodeTimelineCollectorsAuxService.java:126)
{noformat}

[jira] [Created] (YARN-3616) determine how to generate YARN container events

2015-05-11 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3616:
-

 Summary: determine how to generate YARN container events
 Key: YARN-3616
 URL: https://issues.apache.org/jira/browse/YARN-3616
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee


The initial design called for the node manager to write YARN container events 
to take advantage of distributed writes. Having the RM act as the sole writer 
of all YARN container events would cause significant scalability problems.

Still, some types of events are not captured by the NM. The current 
implementation has both the RM and the NM writing container events.

We need to sort this out and decide how we can write all needed container 
events in a scalable manner.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3562) unit tests fail with the failure to bring up node manager

2015-04-29 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3562:
-

 Summary: unit tests fail with the failure to bring up node manager
 Key: YARN-3562
 URL: https://issues.apache.org/jira/browse/YARN-3562
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Priority: Minor


Many MR unit tests are failing on our branch whenever the mini YARN 
cluster needs to bring up multiple node managers.

For example, see 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5472/testReport/org.apache.hadoop.mapred/TestClusterMapReduceTestCase/testMapReduceRestarting/

This is because the NMCollectorService uses a fixed port (8048) for RPC, so 
a second node manager on the same host cannot bind to it.
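A minimal sketch of the usual remedy for tests, shown with plain {{java.net}} sockets rather than Hadoop RPC: bind to port 0 so the OS picks a free port, then read the actual port back after binding and advertise that value. {{EphemeralPorts}} is hypothetical; how the NMCollectorService address would be configured this way is not shown here.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

final class EphemeralPorts {
    // Port 0 asks the OS for any free port, so several servers on one host never collide.
    // Callers must read the real port via getLocalPort() after binding.
    static ServerSocket bindEphemeral() throws IOException {
        ServerSocket socket = new ServerSocket();
        socket.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
        return socket;
    }
}
```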



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3390) Reuse TimelineCollectorManager for RM

2015-04-24 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee resolved YARN-3390.
---
   Resolution: Fixed
Fix Version/s: YARN-2928

Committed. Thanks much [~zjshen] and [~Naganarasimha] for working on the patch, 
and [~gtCarrera9] for your review!

 Reuse TimelineCollectorManager for RM
 -

 Key: YARN-3390
 URL: https://issues.apache.org/jira/browse/YARN-3390
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: YARN-2928

 Attachments: YARN-3390.1.patch, YARN-3390.2.patch, YARN-3390.3.patch, 
 YARN-3390.4.patch


 RMTimelineCollector should have the context info of each app whose entity  
 has been put



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3512) add more fine-grained metrics that measure write performance

2015-04-20 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3512:
-

 Summary: add more fine-grained metrics that measure write 
performance
 Key: YARN-3512
 URL: https://issues.apache.org/jira/browse/YARN-3512
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee


We need more fine-grained metrics in the load testing tool to measure the 
write performance of the timeline service. Currently it captures only the 
number of writes and bytes per second as seen from the API. But the actual 
storage implementation may turn each API write into more or fewer writes to 
the storage itself. We need more fine-grained data about the actual writes to 
storage.
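A minimal sketch of the extra bookkeeping, assuming the tool can observe both API-level writes and the writes the storage layer actually issues. {{WriteMetrics}} and its methods are hypothetical. The ratios show how much the storage implementation amplifies (above 1) or batches (below 1) the API traffic.

```java
import java.util.concurrent.atomic.LongAdder;

final class WriteMetrics {
    private final LongAdder apiWrites = new LongAdder();
    private final LongAdder apiBytes = new LongAdder();
    private final LongAdder storageWrites = new LongAdder();
    private final LongAdder storageBytes = new LongAdder();

    // Called once per write seen at the timeline API.
    void recordApiWrite(long bytes) {
        apiWrites.increment();
        apiBytes.add(bytes);
    }

    // Called once per write actually issued to the backing store.
    void recordStorageWrite(long bytes) {
        storageWrites.increment();
        storageBytes.add(bytes);
    }

    // Storage writes per API write; 0 if there were no API writes yet.
    double writeAmplification() {
        long api = apiWrites.sum();
        return api == 0 ? 0.0 : (double) storageWrites.sum() / api;
    }

    // Storage bytes per API byte; 0 if there were no API bytes yet.
    double byteAmplification() {
        long api = apiBytes.sum();
        return api == 0 ? 0.0 : (double) storageBytes.sum() / api;
    }
}
```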



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3437) convert load test driver to timeline service v.2

2015-04-02 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3437:
-

 Summary: convert load test driver to timeline service v.2
 Key: YARN-3437
 URL: https://issues.apache.org/jira/browse/YARN-3437
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee


This subtask covers the work for converting the proposed patch for the load 
test driver (YARN-2556) to work with the timeline service v.2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3438) add a mode to replay MR job history files to the timeline service

2015-04-02 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3438:
-

 Summary: add a mode to replay MR job history files to the timeline 
service
 Key: YARN-3438
 URL: https://issues.apache.org/jira/browse/YARN-3438
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee


This subtask covers the work, on top of YARN-3437, of adding a mode to replay MR 
job history files to the timeline service storage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-03-27 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3411:
-

 Summary: [Storage implementation] explore the native HBase write 
schema for storage
 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Priority: Critical


Work is in progress to implement the storage based on a Phoenix 
schema (YARN-3134).

In parallel, we would like to explore an implementation based on a native HBase 
schema for the write path. Such a schema does not preclude using Phoenix, 
especially for reads and offline queries.

Once we have basic implementations of both options, we can evaluate them in 
terms of performance, scalability, usability, etc. and make a decision.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3401) [Data Model] users should not be able to create a generic TimelineEntity and associate arbitrary type

2015-03-26 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3401:
-

 Summary: [Data Model] users should not be able to create a generic 
TimelineEntity and associate arbitrary type
 Key: YARN-3401
 URL: https://issues.apache.org/jira/browse/YARN-3401
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee


IIUC it is possible for users to create a generic TimelineEntity and set an 
arbitrary entity type. For example, for a YARN app, the right entity API is 
ApplicationEntity. However, nothing currently stops users from instantiating the 
base TimelineEntity class and setting the application type on it. This presents 
a problem, for example, when the storage layer handles these YARN system 
entities.

We need to ensure that the API allows only the right class to be 
instantiated for a given entity type.
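A minimal sketch of one way to enforce this, using hypothetical names throughout ({{GenericEntity}}, {{SYSTEM_TYPES}}, and a simplified {{ApplicationEntity}}; the type strings are illustrative). The generic setter rejects reserved system types, and the typed subclass fixes its type at construction. It does not stop a determined caller from writing their own subclass; a real API might restrict that further.

```java
import java.util.Set;

class GenericEntity {
    // Hypothetical reserved type names for YARN system entities.
    static final Set<String> SYSTEM_TYPES =
            Set.of("YARN_CLUSTER", "YARN_FLOW_RUN", "YARN_APPLICATION");

    private String type;

    GenericEntity() {}

    // Used by the typed subclasses to fix their system type.
    protected GenericEntity(String systemType) {
        this.type = systemType;
    }

    // Generic entities may use any type except the reserved system ones.
    void setType(String type) {
        if (SYSTEM_TYPES.contains(type)) {
            throw new IllegalArgumentException(
                    type + " is a system type; use its dedicated entity class");
        }
        this.type = type;
    }

    String getType() {
        return type;
    }
}

final class ApplicationEntity extends GenericEntity {
    ApplicationEntity() {
        super("YARN_APPLICATION");
    }

    // A system entity's type is fixed at construction.
    @Override
    void setType(String type) {
        throw new UnsupportedOperationException("type of ApplicationEntity is fixed");
    }
}
```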



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3377) TestTimelineServiceClientIntegration fails

2015-03-19 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3377:
-

 Summary: TestTimelineServiceClientIntegration fails
 Key: YARN-3377
 URL: https://issues.apache.org/jira/browse/YARN-3377
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Priority: Minor


TestTimelineServiceClientIntegration fails. We appear to be getting a 500 
response from the timeline collector. This looks mostly like an issue with the 
test itself.

{noformat}
---
Test set: 
org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration
---
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 33.503 sec <<< FAILURE! - in 
org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration
testPutEntities(org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration)
  Time elapsed: 32.606 sec  <<< ERROR!
org.apache.hadoop.yarn.exceptions.YarnException: Failed to get the response 
from the timeline server.
at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putObjects(TimelineClientImpl.java:457)
at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putObjects(TimelineClientImpl.java:391)
at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:368)
at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:342)
at 
org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration.testPutEntities(TestTimelineServiceClientIntegration.java:74)
{noformat}

The relevant piece from the server side:

{noformat}
Mar 19, 2015 10:48:30 AM com.sun.jersey.api.core.PackagesResourceConfig init
INFO: Scanning for root resource and provider classes in the packages:
  org.apache.hadoop.yarn.server.timelineservice.collector
  org.apache.hadoop.yarn.webapp
  org.apache.hadoop.yarn.webapp
Mar 19, 2015 10:48:30 AM com.sun.jersey.api.core.ScanningResourceConfig 
logClasses
INFO: Root resource classes found:
  class org.apache.hadoop.yarn.webapp.MyTestWebService
  class 
org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorWebService
Mar 19, 2015 10:48:30 AM com.sun.jersey.api.core.ScanningResourceConfig 
logClasses
INFO: Provider classes found:
  class org.apache.hadoop.yarn.webapp.YarnJacksonJaxbJsonProvider
  class org.apache.hadoop.yarn.webapp.GenericExceptionHandler
  class org.apache.hadoop.yarn.webapp.MyTestJAXBContextResolver
Mar 19, 2015 10:48:30 AM 
com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011 11:17 AM'
Mar 19, 2015 10:48:31 AM 
com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8 
resolve
SEVERE: null
java.lang.IllegalAccessException: Class 
com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8 can 
not access a member of class 
org.apache.hadoop.yarn.webapp.MyTestWebService$MyInfo with modifiers public
at sun.reflect.Reflection.ensureMemberAccess(Reflection.java:95)
at java.lang.Class.newInstance0(Class.java:366)
at java.lang.Class.newInstance(Class.java:325)
at 
com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8.resolve(WadlGeneratorJAXBGrammarGenerator.java:467)
at 
com.sun.jersey.server.wadl.WadlGenerator$ExternalGrammarDefinition.resolve(WadlGenerator.java:181)
at 
com.sun.jersey.server.wadl.ApplicationDescription.resolve(ApplicationDescription.java:81)
at 
com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator.attachTypes(WadlGeneratorJAXBGrammarGenerator.java:518)
at com.sun.jersey.server.wadl.WadlBuilder.generate(WadlBuilder.java:124)
at 
com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:104)
at 
com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:120)
at 
com.sun.jersey.server.impl.wadl.WadlMethodFactory$WadlOptionsMethodDispatcher.dispatch(WadlMethodFactory.java:98)
at 
com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
at 
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at 
com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at 
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at 
com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
{noformat}

[jira] [Created] (YARN-3378) a load test client that can replay a volume of history files

2015-03-19 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3378:
-

 Summary: a load test client that can replay a volume of history 
files
 Key: YARN-3378
 URL: https://issues.apache.org/jira/browse/YARN-3378
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee


It might be good to create a load test client that can replay a large volume of 
history files into the timeline service. One could run such a load test client 
as a MapReduce job to generate a fair amount of load. It would be useful for 
spot-checking correctness and, more importantly, for observing performance 
characteristics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3353) provide RPC metrics via JMX for timeline collectors and readers

2015-03-16 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3353:
-

 Summary: provide RPC metrics via JMX for timeline collectors and 
readers
 Key: YARN-3353
 URL: https://issues.apache.org/jira/browse/YARN-3353
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee


We should provide RPC metrics via JMX for timeline collectors and readers. One 
challenge is that it may be difficult to provide a stable view of the 
metrics, given the distributed nature of the collectors.
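A minimal sketch using only the JDK's standard-MBean support, assuming one bean per collector. {{CollectorJmx}}, the {{PutEntitiesCalls}} attribute and the {{ObjectName}} pattern are illustrative; Hadoop daemons would more likely publish such counters through Hadoop's own metrics framework, which also exposes them over JMX.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class CollectorJmx {
    // Standard MBean convention: the interface is named after the class plus "MBean".
    public interface CollectorRpcMetricsMBean {
        long getPutEntitiesCalls();
    }

    public static final class CollectorRpcMetrics implements CollectorRpcMetricsMBean {
        private final AtomicLong putEntitiesCalls = new AtomicLong();

        void onPutEntities() {
            putEntitiesCalls.incrementAndGet();
        }

        @Override
        public long getPutEntitiesCalls() {
            return putEntitiesCalls.get();
        }
    }

    // Registers one collector's metrics with the platform MBean server.
    static ObjectName register(CollectorRpcMetrics metrics, String collectorId) throws JMException {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName(
                "Hadoop:service=TimelineCollector,name=RpcMetrics-" + collectorId);
        server.registerMBean(metrics, name);
        return name;
    }
}
```

Because collectors come and go with applications, each bean would also have to be unregistered ({{MBeanServer.unregisterMBean}}) when its collector stops, which is part of why a stable view is hard.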



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3333) rename TimelineAggregator etc. to TimelineCollector

2015-03-10 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-:
-

 Summary: rename TimelineAggregator etc. to TimelineCollector
 Key: YARN-
 URL: https://issues.apache.org/jira/browse/YARN-
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee


Per discussions on YARN-2928, let's rename TimelineAggregator, etc. to 
TimelineCollector, etc.

There are also several minor issues on the current branch that can be fixed 
as part of this:
- some imports need fixing
- a missing license header in TestTimelineServerClientIntegration.java
- whitespace issues
- a missing direct dependency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3167) implement the core functionality of the base aggregator service

2015-02-10 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3167:
-

 Summary: implement the core functionality of the base aggregator 
service
 Key: YARN-3167
 URL: https://issues.apache.org/jira/browse/YARN-3167
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee


The basic skeleton of the timeline aggregator has been set up by YARN-3030. We 
need to implement the core functionality of the base aggregator service. The 
key pieces include:

- handling the requests from clients (sync or async)
- buffering data
- handling the aggregation logic
- invoking the storage API
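A minimal sketch of how the four pieces listed above could fit together, assuming entities are modeled as metric-name-to-value maps and aggregation is a running sum per metric. {{EntityStore}} and {{BaseAggregator}} are hypothetical stand-ins, not the real storage or aggregator API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical storage API: persists a batch of entities (metric name -> value).
interface EntityStore {
    void write(List<Map<String, Long>> batch);
}

final class BaseAggregator {
    private final LinkedBlockingQueue<Map<String, Long>> buffer = new LinkedBlockingQueue<>();
    private final Map<String, Long> totals = new TreeMap<>();
    private final EntityStore store;

    BaseAggregator(EntityStore store) {
        this.store = store;
    }

    // Async request: only buffered; persisted by a later flush().
    void putAsync(Map<String, Long> entity) {
        buffer.add(entity);
    }

    // Sync request: buffered, then everything pending is persisted before returning.
    synchronized void putSync(Map<String, Long> entity) {
        buffer.add(entity);
        flush();
    }

    // Drains the buffer, updates running per-metric sums, and invokes the storage API.
    synchronized void flush() {
        List<Map<String, Long>> batch = new ArrayList<>();
        buffer.drainTo(batch);
        if (batch.isEmpty()) {
            return;
        }
        for (Map<String, Long> entity : batch) {
            entity.forEach((metric, value) -> totals.merge(metric, value, Long::sum));
        }
        store.write(batch);
    }

    synchronized Map<String, Long> totals() {
        return new TreeMap<>(totals);
    }
}
```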



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3037) create HBase cluster backing storage implementation for ATS writes

2015-01-12 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3037:
-

 Summary: create HBase cluster backing storage implementation for 
ATS writes
 Key: YARN-3037
 URL: https://issues.apache.org/jira/browse/YARN-3037
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Sangjin Lee


Per design in YARN-2928, create a backing storage implementation for ATS writes 
based on a full HBase cluster.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3041) create the ATS entity/event API

2015-01-12 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3041:
-

 Summary: create the ATS entity/event API
 Key: YARN-3041
 URL: https://issues.apache.org/jira/browse/YARN-3041
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Sangjin Lee


Per design in YARN-2928, create the ATS entity and events API.

Also, as part of this JIRA, create YARN system entities (e.g. cluster, user, 
flow, flow run, YARN app, ...).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3052) provide a very simple POC html ATS UI

2015-01-12 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3052:
-

 Summary: provide a very simple POC html ATS UI
 Key: YARN-3052
 URL: https://issues.apache.org/jira/browse/YARN-3052
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Sangjin Lee


As part of delivering a minimum viable product, we want to be able to show 
some UI in HTML (however crude). This subtask calls for creating a 
barebones UI to do that.

This should later be replaced with a properly designed and implemented UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3053) review and implement for property security in ATS v.2

2015-01-12 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3053:
-

 Summary: review and implement for property security in ATS v.2
 Key: YARN-3053
 URL: https://issues.apache.org/jira/browse/YARN-3053
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Sangjin Lee


Per design in YARN-2928, we want to review the system for security and ensure 
that it is properly secured.

This includes proper authentication, token management, access control, and any 
other relevant security aspects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3030) set up ATS writer with basic request serving structure and lifecycle

2015-01-12 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3030:
-

 Summary: set up ATS writer with basic request serving structure 
and lifecycle
 Key: YARN-3030
 URL: https://issues.apache.org/jira/browse/YARN-3030
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Sangjin Lee


Per design in YARN-2928, create an ATS writer as a service, and implement the 
basic service structure including the lifecycle management.

Also, as part of this JIRA, we should come up with the ATS client API for 
sending requests to this ATS writer.
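A minimal sketch of the lifecycle part, using only the JDK. The state names loosely mirror Hadoop's {{Service.STATE}}; a real ATS writer would more likely extend Hadoop's {{AbstractService}} and override {{serviceInit}}, {{serviceStart}} and {{serviceStop}}. {{WriterServiceSketch}} and {{handle}} are hypothetical.

```java
import java.util.Objects;

final class WriterServiceSketch {
    // Loosely mirrors the state names of Hadoop's Service.STATE.
    enum State { NOTINITED, INITED, STARTED, STOPPED }

    private State state = State.NOTINITED;
    private long served;

    synchronized void init() {
        transition(State.NOTINITED, State.INITED); // e.g. read configuration
    }

    synchronized void start() {
        transition(State.INITED, State.STARTED); // e.g. open storage, start listeners
    }

    // Stop is allowed from any state and is idempotent.
    synchronized void stop() {
        state = State.STOPPED;
    }

    // Requests are served only while the service is running; returns the request count.
    synchronized long handle(String request) {
        Objects.requireNonNull(request);
        if (state != State.STARTED) {
            throw new IllegalStateException("not started: " + state);
        }
        return ++served;
    }

    synchronized State state() {
        return state;
    }

    private void transition(State expected, State next) {
        if (state != expected) {
            throw new IllegalStateException(state + " -> " + next + " is not allowed");
        }
        state = next;
    }
}
```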



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3036) create standalone HBase backing storage implementation for ATS writes

2015-01-12 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3036:
-

 Summary: create standalone HBase backing storage implementation 
for ATS writes
 Key: YARN-3036
 URL: https://issues.apache.org/jira/browse/YARN-3036
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Sangjin Lee


Per design in YARN-2928, create a (default) standalone HBase backing storage 
implementation for ATS writes.





[jira] [Created] (YARN-3042) create ATS metrics API

2015-01-12 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3042:
-

 Summary: create ATS metrics API
 Key: YARN-3042
 URL: https://issues.apache.org/jira/browse/YARN-3042
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Sangjin Lee


Per design in YARN-2928, create the ATS metrics API and integrate it into the 
entities.

The concept may be based on the existing Hadoop metrics, but we want to make 
sure we have something that satisfies all ATS use cases.

It also needs to capture whether a metric should be aggregated.





[jira] [Created] (YARN-3038) handle ATS writer failure scenarios

2015-01-12 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3038:
-

 Summary: handle ATS writer failure scenarios
 Key: YARN-3038
 URL: https://issues.apache.org/jira/browse/YARN-3038
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Sangjin Lee


Per design in YARN-2928, consider the various ATS writer failure scenarios and 
implement proper handling for them.

For example, ATS writers may fail and exit due to OOM. In that case they should 
be restarted a certain number of times. We also need to tie fatal ATS writer 
failures (after all retries are exhausted) to the application failure, and so on.
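A minimal sketch of the bounded-retry idea, assuming each writer (re)start can be modeled as a {{Callable}} that throws on failure. {{BoundedRetry}} and the {{onExhausted}} callback are hypothetical names, not existing YARN APIs; the callback marks the spot where a fatal writer failure would be tied to the application failure.

```java
import java.util.concurrent.Callable;

// Hypothetical helper (not a YARN API): restart a failing ATS writer up to
// maxAttempts times, then run onExhausted (e.g. fail the application).
final class BoundedRetry {
  private BoundedRetry() { }

  static <T> T run(Callable<T> attempt, int maxAttempts, Runnable onExhausted)
      throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    Exception last = null;
    for (int i = 1; i <= maxAttempts; i++) {
      try {
        // Each call models one start of the writer; success returns at once.
        return attempt.call();
      } catch (Exception e) {
        // Remember the failure and try again until attempts run out.
        last = e;
      }
    }
    // All retries exhausted: escalate, then surface the last failure.
    onExhausted.run();
    throw last;
  }
}
```

Note that {{catch (Exception e)}} does not catch {{OutOfMemoryError}}. In practice an OOM would take down the writer process, so this loop is meant to model whatever component restarts the writer, not code running inside it.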





[jira] [Created] (YARN-3047) set up ATS reader with basic request serving structure and lifecycle

2015-01-12 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3047:
-

 Summary: set up ATS reader with basic request serving structure 
and lifecycle
 Key: YARN-3047
 URL: https://issues.apache.org/jira/browse/YARN-3047
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Sangjin Lee


Per design in YARN-2928, set up the ATS reader as a service and implement its 
basic service structure, including lifecycle management, request serving, and 
so on.





[jira] [Created] (YARN-3032) implement ATS writer functionality to serve ATS readers' requests for live apps

2015-01-12 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3032:
-

 Summary: implement ATS writer functionality to serve ATS readers' 
requests for live apps
 Key: YARN-3032
 URL: https://issues.apache.org/jira/browse/YARN-3032
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Sangjin Lee


Per design in YARN-2928, implement the functionality in the ATS writer to serve 
ATS readers' requests for data on live apps.





[jira] [Created] (YARN-3039) implement ATS writer service discovery

2015-01-12 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3039:
-

 Summary: implement ATS writer service discovery
 Key: YARN-3039
 URL: https://issues.apache.org/jira/browse/YARN-3039
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Sangjin Lee


Per design in YARN-2928, implement ATS writer service discovery. This is 
essential for off-node clients to send writes to the right ATS writer. This 
should also handle the case of AM failures.





[jira] [Created] (YARN-3051) create backing storage read interface for ATS readers

2015-01-12 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3051:
-

 Summary: create backing storage read interface for ATS readers
 Key: YARN-3051
 URL: https://issues.apache.org/jira/browse/YARN-3051
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Sangjin Lee


Per design in YARN-2928, create a backing storage read interface that can be 
implemented by multiple backing storage implementations.





[jira] [Created] (YARN-3033) implement NM starting the ATS writer companion

2015-01-12 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3033:
-

 Summary: implement NM starting the ATS writer companion
 Key: YARN-3033
 URL: https://issues.apache.org/jira/browse/YARN-3033
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Sangjin Lee


Per design in YARN-2928, implement node managers starting the ATS writer 
companion.





[jira] [Created] (YARN-3045) implement NM writing container lifecycle events and container system metrics to ATS

2015-01-12 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3045:
-

 Summary: implement NM writing container lifecycle events and 
container system metrics to ATS
 Key: YARN-3045
 URL: https://issues.apache.org/jira/browse/YARN-3045
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Sangjin Lee


Per design in YARN-2928, implement NM writing container lifecycle events and 
container system metrics to ATS.





[jira] [Created] (YARN-3044) implement RM writing app lifecycle events to ATS

2015-01-12 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3044:
-

 Summary: implement RM writing app lifecycle events to ATS
 Key: YARN-3044
 URL: https://issues.apache.org/jira/browse/YARN-3044
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Sangjin Lee


Per design in YARN-2928, implement RM writing app lifecycle events to ATS.





[jira] [Created] (YARN-3034) implement RM starting its ATS writer

2015-01-12 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3034:
-

 Summary: implement RM starting its ATS writer
 Key: YARN-3034
 URL: https://issues.apache.org/jira/browse/YARN-3034
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Sangjin Lee


Per design in YARN-2928, implement resource managers starting their own ATS 
writers.





[jira] [Created] (YARN-3031) create backing storage write interface for ATS writers

2015-01-12 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3031:
-

 Summary: create backing storage write interface for ATS writers
 Key: YARN-3031
 URL: https://issues.apache.org/jira/browse/YARN-3031
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Sangjin Lee


Per design in YARN-2928, define the interface the ATS writer uses to write to 
various backing stores. The interface should capture the right level of 
abstraction so that every backing storage implementation can implement it 
efficiently.





[jira] [Created] (YARN-3040) implement client-side API for handling flows

2015-01-12 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3040:
-

 Summary: implement client-side API for handling flows
 Key: YARN-3040
 URL: https://issues.apache.org/jira/browse/YARN-3040
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Sangjin Lee


Per design in YARN-2928, implement the client-side API for handling *flows*. 
Frameworks should be able to define and pass all attributes of flows and flow 
runs to YARN, which should in turn pass them to the ATS writers.

YARN tags were discussed as a possible way to carry this information.





[jira] [Created] (YARN-3046) implement MapReduce AM writing some MR metrics to ATS

2015-01-12 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3046:
-

 Summary: implement MapReduce AM writing some MR metrics to ATS
 Key: YARN-3046
 URL: https://issues.apache.org/jira/browse/YARN-3046
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Sangjin Lee


Per design in YARN-2928, select a handful of MR metrics (e.g. HDFS bytes 
written) and have the MR AM write the framework-specific metrics to ATS.





[jira] [Created] (YARN-3035) create a test-only backing storage implementation for ATS writes

2015-01-12 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3035:
-

 Summary: create a test-only backing storage implementation for ATS 
writes
 Key: YARN-3035
 URL: https://issues.apache.org/jira/browse/YARN-3035
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Sangjin Lee


Per design in YARN-2928, create a bare-bones, test-only backing storage 
implementation for ATS writes.

This could be something like a no-op or in-memory storage, strictly for 
development and testing purposes.
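A minimal sketch of the two test-only options, assuming a deliberately simplified write interface. {{TimelineWriterSketch}}, {{NoOpWriter}}, {{InMemoryWriter}}, and the (entityId, record) method shape are hypothetical placeholders; the real interface is what YARN-3031 is meant to define.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified write interface (placeholder for YARN-3031's).
interface TimelineWriterSketch {
  void write(String entityId, String record);
}

// No-op storage: accepts and discards every write.
final class NoOpWriter implements TimelineWriterSketch {
  @Override
  public void write(String entityId, String record) {
    // Intentionally empty.
  }
}

// In-memory storage: keeps writes per entity so tests can inspect them.
final class InMemoryWriter implements TimelineWriterSketch {
  private final Map<String, List<String>> store = new ConcurrentHashMap<>();

  @Override
  public void write(String entityId, String record) {
    store.computeIfAbsent(entityId,
        id -> Collections.synchronizedList(new ArrayList<>())).add(record);
  }

  // Returns a snapshot copy of the records written for one entity.
  List<String> recordsFor(String entityId) {
    List<String> records = store.get(entityId);
    if (records == null) {
      return Collections.emptyList();
    }
    synchronized (records) {
      return new ArrayList<>(records);
    }
  }
}
```

Keeping both behind the same interface lets development and test code swap either one in without touching the writer service itself.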





[jira] [Created] (YARN-3043) create ATS configuration, metadata, etc. as part of entities

2015-01-12 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3043:
-

 Summary: create ATS configuration, metadata, etc. as part of 
entities
 Key: YARN-3043
 URL: https://issues.apache.org/jira/browse/YARN-3043
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Sangjin Lee


Per design in YARN-2928, create APIs for configuration, metadata, etc. and 
integrate them into entities.





[jira] [Created] (YARN-3049) implement existing ATS queries in the new ATS design

2015-01-12 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3049:
-

 Summary: implement existing ATS queries in the new ATS design
 Key: YARN-3049
 URL: https://issues.apache.org/jira/browse/YARN-3049
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Sangjin Lee


Implement existing ATS queries with the new ATS reader design.





[jira] [Created] (YARN-3048) handle how to set up and start/stop ATS reader instances

2015-01-12 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3048:
-

 Summary: handle how to set up and start/stop ATS reader instances
 Key: YARN-3048
 URL: https://issues.apache.org/jira/browse/YARN-3048
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Sangjin Lee


Per design in YARN-2928, come up with a way to set up and start/stop ATS reader 
instances.

This should allow setting up multiple instances and managing user traffic to 
those instances.





[jira] [Created] (YARN-3007) TestNMWebServices#testContainerLogs fails intermittently

2015-01-05 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-3007:
-

 Summary: TestNMWebServices#testContainerLogs fails intermittently
 Key: YARN-3007
 URL: https://issues.apache.org/jira/browse/YARN-3007
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Affects Versions: 2.4.0
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Minor


TestNMWebServices#testContainerLogs fails intermittently with JDK 7:

{noformat}
java.lang.AssertionError: Failed to create log dir
	at org.junit.Assert.fail(Assert.java:93)
	at org.junit.Assert.assertTrue(Assert.java:43)
	at org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices.testContainerLogs(TestNMWebServices.java:336)
{noformat}





[jira] [Resolved] (YARN-3007) TestNMWebServices#testContainerLogs fails intermittently

2015-01-05 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee resolved YARN-3007.
---
Resolution: Invalid

This issue is not reproducible in 2.7.0 or in trunk. Closing.

 TestNMWebServices#testContainerLogs fails intermittently
 

 Key: YARN-3007
 URL: https://issues.apache.org/jira/browse/YARN-3007
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Affects Versions: 2.4.0
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Minor

 TestNMWebServices#testContainerLogs fails intermittently with JDK 7:
 {noformat}
 java.lang.AssertionError: Failed to create log dir
 	at org.junit.Assert.fail(Assert.java:93)
 	at org.junit.Assert.assertTrue(Assert.java:43)
 	at org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices.testContainerLogs(TestNMWebServices.java:336)
 {noformat}





[jira] [Created] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1

2014-12-05 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-2928:
-

 Summary: Application Timeline Server (ATS) next gen: phase 1
 Key: YARN-2928
 URL: https://issues.apache.org/jira/browse/YARN-2928
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee


We have the application timeline server implemented in YARN per YARN-1530 and 
YARN-321. Although it is a great feature, we have recognized several critical 
issues and features that need to be addressed.

This JIRA proposes the design and implementation changes to address them. This 
is phase 1 of this effort.





[jira] [Created] (YARN-2774) shared cache uploader service should authorize notify calls properly

2014-10-29 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-2774:
-

 Summary: shared cache uploader service should authorize notify 
calls properly
 Key: YARN-2774
 URL: https://issues.apache.org/jira/browse/YARN-2774
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Sangjin Lee


The shared cache manager (SCM) uploader service (done in YARN-2186) currently 
does not authorize calls that notify the SCM of newly uploaded resources. This 
RPC call needs proper security/authorization.





[jira] [Created] (YARN-2600) if the container is killed during localization outstanding public cache localization tasks should be cancelled

2014-09-24 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-2600:
-

 Summary: if the container is killed during localization 
outstanding public cache localization tasks should be cancelled
 Key: YARN-2600
 URL: https://issues.apache.org/jira/browse/YARN-2600
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Sangjin Lee


We came across a situation (partly related to HDFS-7005) where a large number 
of public cache localization tasks were queued in the public localizer thread 
pool, but the container was killed during localization (as it went over the 
timeout).

The problem in this situation is that every queued work item is still serviced 
by the resource localization service, which is wasteful and may further delay 
localization for other containers.

It would be good if we could cancel the pending localization tasks when the 
container is killed.
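A minimal sketch of one way to do this, assuming the public localizer is a plain {{ExecutorService}} and that each download can be keyed by a container ID string. {{PendingLocalizationTracker}}, {{submit}}, and {{cancelPending}} are hypothetical names, not the actual {{ResourceLocalizationService}} code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical tracker (not the actual ResourceLocalizationService code):
// remembers the public-cache download tasks submitted for each container so
// that the ones still queued can be cancelled if the container is killed.
final class PendingLocalizationTracker {
  private final ExecutorService publicLocalizerPool;
  private final Map<String, List<Future<?>>> pendingByContainer =
      new ConcurrentHashMap<>();

  PendingLocalizationTracker(ExecutorService publicLocalizerPool) {
    this.publicLocalizerPool = publicLocalizerPool;
  }

  // Queue a download on behalf of a container and keep its Future.
  Future<?> submit(String containerId, Runnable download) {
    Future<?> future = publicLocalizerPool.submit(download);
    pendingByContainer
        .computeIfAbsent(containerId,
            id -> Collections.synchronizedList(new ArrayList<>()))
        .add(future);
    return future;
  }

  // Called when the container is killed. cancel(false) marks each task as
  // cancelled so a queued task is skipped when the pool dequeues it; a
  // download that is already running is not interrupted.
  int cancelPending(String containerId) {
    List<Future<?>> futures = pendingByContainer.remove(containerId);
    if (futures == null) {
      return 0;
    }
    int cancelled = 0;
    synchronized (futures) {
      for (Future<?> future : futures) {
        if (future.cancel(false)) {
          cancelled++;
        }
      }
    }
    return cancelled;
  }
}
```

A fuller version would also drop futures once they complete and would need to decide what happens when a public resource is shared by several containers, since cancelling it for one container should not starve the others.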





[jira] [Created] (YARN-2245) AM throws ClassNotFoundException with job classloader enabled if custom output format/committer is used

2014-07-01 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-2245:
-

 Summary: AM throws ClassNotFoundException with job classloader 
enabled if custom output format/committer is used
 Key: YARN-2245
 URL: https://issues.apache.org/jira/browse/YARN-2245
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Sangjin Lee
Assignee: Sangjin Lee


With the job classloader enabled, the MR AM throws ClassNotFoundException if a 
custom output format class is specified.

{noformat}
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.foo.test.TestOutputFormat not found
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:473)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:374)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1459)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1456)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1389)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.foo.test.TestOutputFormat not found
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895)
	at org.apache.hadoop.mapreduce.task.JobContextImpl.getOutputFormatClass(JobContextImpl.java:222)
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:469)
	... 8 more
Caused by: java.lang.ClassNotFoundException: Class com.foo.test.TestOutputFormat not found
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1893)
	... 10 more
{noformat}





[jira] [Created] (YARN-2238) filtering on UI sticks even if I move away from the page

2014-06-30 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-2238:
-

 Summary: filtering on UI sticks even if I move away from the page
 Key: YARN-2238
 URL: https://issues.apache.org/jira/browse/YARN-2238
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 2.4.0
Reporter: Sangjin Lee
 Attachments: filtered.png

The main data table in many web pages (RM, AM, etc.) shows unexpected 
filtering behavior.

If I filter the table by typing something in the key or value field (and, I 
suspect, any search field), the data table gets filtered, as expected. The 
example I used is the job configuration page for an MR job.

However, when I move away from that page and later visit another page of the 
same type (e.g. another job configuration page), the page is rendered with the 
filter still applied! That is unexpected.

What's even stranger is that it does not show the filter term. As a result, I 
have a page that is mysteriously filtered but doesn't tell me what it's 
filtering on.





[jira] [Resolved] (YARN-1465) define and add shared constants and utilities for the shared cache

2014-03-06 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee resolved YARN-1465.
---

Resolution: Invalid

I'll close out these JIRAs for YARN-1492, as the design has changed since 
these JIRAs were filed.

 define and add shared constants and utilities for the shared cache
 --

 Key: YARN-1465
 URL: https://issues.apache.org/jira/browse/YARN-1465
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Sangjin Lee
Assignee: Sangjin Lee






