[jira] [Commented] (MYRIAD-153) Placeholder tasks yarn_container_* is not cleaned after yarn job is complete.
[ https://issues.apache.org/jira/browse/MYRIAD-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14986651#comment-14986651 ]

DarinJ commented on MYRIAD-153:
---

I'm having a similar issue. Have you looked at the stderr in your sandbox and/or your Hadoop logs? I've noticed this line in each task that gets stuck in running:
{quote}
15/11/03 03:44:25 WARN containermanager.ContainerManagerImpl: Event EventType: KILL_CONTAINER sent to absent container container_1446520127877_0004_01_000509
{quote}
where container_X matches yarn_container_X. I haven't had a chance to investigate further though.

> Placeholder tasks yarn_container_* is not cleaned after yarn job is complete.
> ---
>
> Key: MYRIAD-153
> URL: https://issues.apache.org/jira/browse/MYRIAD-153
> Project: Myriad
> Issue Type: Bug
> Reporter: Sarjeet Singh
> Attachments: Mesos_UI_screeshot_placeholder_tasks_running.png
>
>
> Observed that the placeholder tasks for containers launched on FGS are still in RUNNING state on Mesos. These container tasks are not cleaned up properly after the job has finished completely.
> See the attached screenshot of the Mesos UI with placeholder tasks still running.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MYRIAD-164) Myriad fails to start up again after Shutdown is called
[ https://issues.apache.org/jira/browse/MYRIAD-164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Swapnil Daingade reassigned MYRIAD-164:
---

Assignee: Swapnil Daingade

> Myriad fails to start up again after Shutdown is called
> ---
>
> Key: MYRIAD-164
> URL: https://issues.apache.org/jira/browse/MYRIAD-164
> Project: Myriad
> Issue Type: Bug
> Components: Scheduler
> Reporter: Aashreya Ravi Shankar
> Assignee: Swapnil Daingade
>
> Myriad does not start up after shutdown is called.
> The following is seen in the log:
> I1029 12:07:08.327148 15886 group.cpp:313] Group process (group(1)@10.10.101.118:33172) connected to ZooKeeper
> I1029 12:07:08.327177 15886 group.cpp:787] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
> I1029 12:07:08.327189 15886 group.cpp:385] Trying to create path '/mesos' in ZooKeeper
> I1029 12:07:08.328446 15909 detector.cpp:138] Detected a new leader: (id='0')
> I1029 12:07:08.328730 15889 group.cpp:656] Trying to get '/mesos/info_00' in ZooKeeper
> W1029 12:07:08.329212 15885 detector.cpp:444] Leading master master@10.10.101.118:5050 is using a Protobuf binary format when registering with ZooKeeper (info): this will be deprecated as of Mesos 0.24 (see MESOS-2340)
> I1029 12:07:08.329318 15885 detector.cpp:481] A new leading master (UPID=master@10.10.101.118:5050) is detected
> I1029 12:07:08.329453 15884 sched.cpp:254] New master detected at master@10.10.101.118:5050
> I1029 12:07:08.329644 15884 sched.cpp:264] No credentials provided. Attempting to register without authentication
> I1029 12:07:08.330493 15884 sched.cpp:819] Got error 'Completed framework attempted to re-register'
> I1029 12:07:08.330507 15884 sched.cpp:1625] Asked to abort the driver
> I1029 12:07:08.331507 15884 sched.cpp:861] Aborting framework '20151029-110656-1986333194-5050-10105-0001'
> Because the state store is not cleared on shutdown, Myriad reuses the same framework ID when it is started again, which causes the above error.
[jira] [Created] (MYRIAD-164) Myriad fails to start up again after Shutdown is called
Aashreya Ravi Shankar created MYRIAD-164:

Summary: Myriad fails to start up again after Shutdown is called
Key: MYRIAD-164
URL: https://issues.apache.org/jira/browse/MYRIAD-164
Project: Myriad
Issue Type: Bug
Components: Scheduler
Reporter: Aashreya Ravi Shankar

Myriad does not start up after shutdown is called.
The following is seen in the log:
I1029 12:07:08.327148 15886 group.cpp:313] Group process (group(1)@10.10.101.118:33172) connected to ZooKeeper
I1029 12:07:08.327177 15886 group.cpp:787] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I1029 12:07:08.327189 15886 group.cpp:385] Trying to create path '/mesos' in ZooKeeper
I1029 12:07:08.328446 15909 detector.cpp:138] Detected a new leader: (id='0')
I1029 12:07:08.328730 15889 group.cpp:656] Trying to get '/mesos/info_00' in ZooKeeper
W1029 12:07:08.329212 15885 detector.cpp:444] Leading master master@10.10.101.118:5050 is using a Protobuf binary format when registering with ZooKeeper (info): this will be deprecated as of Mesos 0.24 (see MESOS-2340)
I1029 12:07:08.329318 15885 detector.cpp:481] A new leading master (UPID=master@10.10.101.118:5050) is detected
I1029 12:07:08.329453 15884 sched.cpp:254] New master detected at master@10.10.101.118:5050
I1029 12:07:08.329644 15884 sched.cpp:264] No credentials provided. Attempting to register without authentication
I1029 12:07:08.330493 15884 sched.cpp:819] Got error 'Completed framework attempted to re-register'
I1029 12:07:08.330507 15884 sched.cpp:1625] Asked to abort the driver
I1029 12:07:08.331507 15884 sched.cpp:861] Aborting framework '20151029-110656-1986333194-5050-10105-0001'
Because the state store is not cleared on shutdown, Myriad reuses the same framework ID when it is started again, which causes the above error.
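
The failure mode described in MYRIAD-164 (a stored framework ID being reused after a full shutdown) can be illustrated with a small model. This is only a sketch under stated assumptions, not Myriad's actual code: `FrameworkIdLifecycle` and `StateStore` are hypothetical names, the store is in-memory, and a random UUID stands in for the ID that a real Mesos master would assign on first registration.

```java
import java.util.Optional;
import java.util.UUID;

class FrameworkIdLifecycle {
    /** Hypothetical in-memory stand-in for a persistent state store. */
    static class StateStore {
        private String frameworkId; // null means nothing is persisted

        Optional<String> load() { return Optional.ofNullable(frameworkId); }
        void save(String id) { frameworkId = id; }
        void clear() { frameworkId = null; }
    }

    private final StateStore store;

    FrameworkIdLifecycle(StateStore store) { this.store = store; }

    /** Reuses the stored ID if there is one; otherwise takes a fresh ID. */
    String start() {
        // Assumption: a UUID simulates the ID a master would hand out.
        String id = store.load().orElseGet(() -> "framework-" + UUID.randomUUID());
        store.save(id);
        return id;
    }

    /** Full shutdown: forget the ID so the next start registers as a new framework. */
    void shutdown() { store.clear(); }

    public static void main(String[] args) {
        StateStore store = new StateStore();
        FrameworkIdLifecycle myriad = new FrameworkIdLifecycle(store);

        String first = myriad.start();
        // A plain restart (no shutdown) keeps the ID, which is what failover needs.
        String restarted = new FrameworkIdLifecycle(store).start();
        // After a full shutdown the old ID is gone, so a new one is used.
        myriad.shutdown();
        String afterShutdown = myriad.start();

        System.out.println("restart reused ID: " + first.equals(restarted));
        System.out.println("shutdown reused ID: " + first.equals(afterShutdown));
    }
}
```

The design point is that reuse of the ID is correct for a restart but wrong after a shutdown, because the master has already marked that framework as completed.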
[jira] [Commented] (MYRIAD-162) Myriad Not Correctly Dealing with Resources from Multiple Roles
[ https://issues.apache.org/jira/browse/MYRIAD-162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14985541#comment-14985541 ]

DarinJ commented on MYRIAD-162:
---

https://github.com/apache/incubator-myriad/pull/32

> Myriad Not Correctly Dealing with Resources from Multiple Roles
> ---
>
> Key: MYRIAD-162
> URL: https://issues.apache.org/jira/browse/MYRIAD-162
> Project: Myriad
> Issue Type: Bug
> Components: Scheduler
> Affects Versions: Myriad 0.1.0
> Environment: Anywhere frameworkRole is not *
> Reporter: DarinJ
> Assignee: DarinJ
> Fix For: Myriad 0.1.0
>
>
> When using Offers that have Resources from multiple roles, one needs to use the setRole(String role) method to specify which role each resource belongs to. Myriad currently doesn't do this, which causes TASK_LOST, with an error in the mesos-master log stating "attempted to use cpus( * ): 1.2; mem( * ): 1305.6; ports( * ): [31005-31005,31006-31006,...] greater than offered cpu( * ):1, mem( * ): 1400, ports( * ): [ ... ], cpu(roleA): 3, mem(roleA): 1, ports(roleA): [...]".
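
To make the role problem in MYRIAD-162 concrete, here is a minimal sketch of splitting one resource request across the roles present in an offer, so that no role is asked for more than it offered. It deliberately avoids the Mesos library: `RoleAwareAllocation`, `RoleResource`, and `allocate` are hypothetical names, and in real scheduler code each returned entry would presumably become a Mesos resource whose role is set with the setRole(String role) method mentioned in the issue. It is not the change made in the linked pull request.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RoleAwareAllocation {
    /** Hypothetical stand-in for a role-tagged scalar resource. */
    static class RoleResource {
        final String name;
        final String role;
        final double amount;

        RoleResource(String name, String role, double amount) {
            this.name = name;
            this.role = role;
            this.amount = amount;
        }
    }

    /**
     * Splits `needed` units of one resource across roles, in map order,
     * never taking more from a role than that role offered.
     */
    static List<RoleResource> allocate(String name, double needed,
                                       Map<String, Double> offeredByRole) {
        List<RoleResource> parts = new ArrayList<>();
        double remaining = needed;
        for (Map.Entry<String, Double> offer : offeredByRole.entrySet()) {
            if (remaining <= 0) {
                break;
            }
            double take = Math.min(remaining, offer.getValue());
            if (take > 0) {
                parts.add(new RoleResource(name, offer.getKey(), take));
                remaining -= take;
            }
        }
        if (remaining > 1e-9) {
            throw new IllegalArgumentException("offer too small for " + name);
        }
        return parts;
    }

    public static void main(String[] args) {
        // Figures from the issue: 1.2 cpus needed, 1 offered under "*", 3 under roleA.
        Map<String, Double> cpus = new LinkedHashMap<>();
        cpus.put("*", 1.0);
        cpus.put("roleA", 3.0);
        for (RoleResource r : allocate("cpus", 1.2, cpus)) {
            System.out.println(r.name + "(" + r.role + "): " + r.amount);
        }
    }
}
```

With the figures from the issue, the request is covered by one part tagged `*` and a smaller part tagged `roleA`, instead of a single 1.2-cpu request against `*` that exceeds what `*` offered.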
Out of office - 11/2
Taking a sick day off. Will try to respond to emails towards EOD.
-- Sent from mobile
[jira] [Closed] (MYRIAD-43) Replace com.ebay namespace with org.apache
[ https://issues.apache.org/jira/browse/MYRIAD-43?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Klucar closed MYRIAD-43.

Resolution: Fixed

> Replace com.ebay namespace with org.apache
> ---
>
> Key: MYRIAD-43
> URL: https://issues.apache.org/jira/browse/MYRIAD-43
> Project: Myriad
> Issue Type: Bug
> Reporter: Adam B
> Assignee: Jim Klucar
> Fix For: Myriad 0.1.0
>
>
> For the incubator!
[jira] [Created] (MYRIAD-163) Web UI component update
Jim Klucar created MYRIAD-163:
---

Summary: Web UI component update
Key: MYRIAD-163
URL: https://issues.apache.org/jira/browse/MYRIAD-163
Project: Myriad
Issue Type: Improvement
Reporter: Jim Klucar
Assignee: Jim Klucar

The React framework is evolving, and along with it the ReactRouter and other components we use in the WebUI. Some work is needed to update the site to the latest libraries. This is also an opportunity to update the look and feel to match the Myriad Apache site.
[jira] [Assigned] (MYRIAD-156) NullPointerException from "Error in handling event type NODE_RESOURCE_UPDATE to the scheduler"
[ https://issues.apache.org/jira/browse/MYRIAD-156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Swapnil Daingade reassigned MYRIAD-156:
---

Assignee: Swapnil Daingade

> NullPointerException from "Error in handling event type NODE_RESOURCE_UPDATE to the scheduler"
> ---
>
> Key: MYRIAD-156
> URL: https://issues.apache.org/jira/browse/MYRIAD-156
> Project: Myriad
> Issue Type: Bug
> Reporter: Sarjeet Singh
> Assignee: Swapnil Daingade
>
> The NPE happens when a node in the cluster becomes unhealthy and the scheduler removes it from its internal data structure. When the node then heartbeats, the scheduler looks it up, tries to operate on it, and gets a NullPointerException. Here is the code snippet that causes the NPE:
> SchedulerNode node = getSchedulerNode(nm.getNodeID());
> The node object is null, causing the NullPointerException.
> Here is the RM log for the exception:
> 15/10/06 09:18:09 INFO handlers.ResourceOffersEventHandler: Offer not sufficient for task with, cpu: 4.4, memory: 5504.0, spindles: 4.0, ports: 996
> 15/10/06 09:18:11 FATAL resourcemanager.ResourceManager: Error in handling event type NODE_RESOURCE_UPDATE to the scheduler
> java.lang.NullPointerException
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.updateNodeResource(AbstractYarnScheduler.java:548)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.updateNodeResource(FairScheduler.java:1712)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1293)
> at com.ebay.myriad.scheduler.yarn.MyriadFairScheduler.handle(MyriadFairScheduler.java:64)
> at com.ebay.myriad.scheduler.yarn.MyriadFairScheduler.handle(MyriadFairScheduler.java:17)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:693)
> at java.lang.Thread.run(Thread.java:745)
> 15/10/06 09:18:11 INFO resourcemanager.ResourceManager: Exiting, bbye..
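
The race described in MYRIAD-156 (a node is removed, then a late update for it arrives) can be shown with a small self-contained model. This is a sketch under assumptions, not the YARN or Myriad code: `NodeUpdateGuard`, its node map, and its method names are hypothetical, and the map lookup merely plays the role of `getSchedulerNode(nm.getNodeID())` from the report. It shows one possible defensive pattern, a null check that skips updates for unknown nodes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class NodeUpdateGuard {
    // Hypothetical scheduler-side view: node ID -> memory in MB.
    private final Map<String, Integer> nodes = new ConcurrentHashMap<>();

    void addNode(String nodeId, int memoryMb) {
        nodes.put(nodeId, memoryMb);
    }

    /** Called when a node becomes unhealthy and is dropped by the scheduler. */
    void removeNode(String nodeId) {
        nodes.remove(nodeId);
    }

    /**
     * Applies a resource update, returning false instead of throwing
     * when the node is no longer tracked.
     */
    boolean updateNodeResource(String nodeId, int newMemoryMb) {
        Integer current = nodes.get(nodeId);
        if (current == null) {
            // The node was removed before this update arrived: skip it.
            return false;
        }
        nodes.put(nodeId, newMemoryMb);
        return true;
    }

    Integer memoryOf(String nodeId) {
        return nodes.get(nodeId);
    }

    public static void main(String[] args) {
        NodeUpdateGuard scheduler = new NodeUpdateGuard();
        scheduler.addNode("nm1", 4096);
        System.out.println("update applied: " + scheduler.updateNodeResource("nm1", 2048));

        scheduler.removeNode("nm1");
        // A late update for the removed node is ignored rather than crashing.
        System.out.println("update applied: " + scheduler.updateNodeResource("nm1", 1024));
    }
}
```

Whether a skipped update should also be logged or retried depends on the scheduler; the sketch only shows that the lookup result has to be checked before it is used.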