[jira] [Commented] (MYRIAD-153) Placeholder tasks yarn_container_* is not cleaned after yarn job is complete.

2015-11-02 Thread DarinJ (JIRA)

[ 
https://issues.apache.org/jira/browse/MYRIAD-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14986651#comment-14986651
 ] 

DarinJ commented on MYRIAD-153:
---

I'm having a similar issue. Have you looked at the stderr in your sandbox and/or 
your Hadoop logs? I've noticed this line for each task that gets stuck in 
RUNNING:

{quote}
15/11/03 03:44:25 WARN containermanager.ContainerManagerImpl: Event EventType: KILL_CONTAINER sent to absent container container_1446520127877_0004_01_000509
{quote}
Here container_X matches the placeholder task yarn_container_X. I haven't had a 
chance to investigate further, though.
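A minimal sketch for spotting the stuck tasks, assuming each placeholder task ID is simply the YARN container ID prefixed with `yarn_` (as the container_X / yarn_container_X match above suggests). The class and method names are hypothetical; it scans NodeManager log lines for the KILL_CONTAINER warning and lists the placeholder task IDs that are probably still RUNNING on Mesos.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical diagnostic helper, not part of Myriad.
public final class StuckPlaceholderFinder {
    // Matches the WARN line quoted above and captures the container ID.
    private static final Pattern ABSENT_KILL = Pattern.compile(
        "KILL_CONTAINER sent to absent container (container_\\w+)");

    // Assumption: placeholder task ID = "yarn_" + YARN container ID.
    static String placeholderTaskId(String containerId) {
        return "yarn_" + containerId;
    }

    /** Returns the placeholder task IDs whose containers YARN no longer knows about. */
    static List<String> findStuckPlaceholders(List<String> logLines) {
        List<String> taskIds = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = ABSENT_KILL.matcher(line);
            if (m.find()) {
                taskIds.add(placeholderTaskId(m.group(1)));
            }
        }
        return taskIds;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "15/11/03 03:44:25 WARN containermanager.ContainerManagerImpl: Event EventType: "
                + "KILL_CONTAINER sent to absent container container_1446520127877_0004_01_000509",
            "15/11/03 03:44:26 INFO unrelated line");
        System.out.println(findStuckPlaceholders(lines)); // [yarn_container_1446520127877_0004_01_000509]
    }
}
```

The resulting IDs can then be compared against the tasks shown as RUNNING in the Mesos UI.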

> Placeholder tasks yarn_container_* is not cleaned after yarn job is complete.
> -
>
> Key: MYRIAD-153
> URL: https://issues.apache.org/jira/browse/MYRIAD-153
> Project: Myriad
>  Issue Type: Bug
>Reporter: Sarjeet Singh
> Attachments: Mesos_UI_screeshot_placeholder_tasks_running.png
>
>
> Observed that the placeholder tasks for containers launched with FGS 
> (fine-grained scaling) remain in the RUNNING state on Mesos. These container 
> tasks are not cleaned up properly after the job has finished.
> See the attached screenshot of the Mesos UI with placeholder tasks still running.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MYRIAD-164) Myriad fails to start up again after Shutdown is called

2015-11-02 Thread Swapnil Daingade (JIRA)

 [ 
https://issues.apache.org/jira/browse/MYRIAD-164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swapnil Daingade reassigned MYRIAD-164:
---

Assignee: Swapnil Daingade

> Myriad fails to start up again after Shutdown is called
> ---
>
> Key: MYRIAD-164
> URL: https://issues.apache.org/jira/browse/MYRIAD-164
> Project: Myriad
>  Issue Type: Bug
>  Components: Scheduler
>Reporter: Aashreya Ravi Shankar
>Assignee: Swapnil Daingade
>
> Myriad does not start up after shutdown is called.
> The following is seen in the log:
> I1029 12:07:08.327148 15886 group.cpp:313] Group process (group(1)@10.10.101.118:33172) connected to ZooKeeper
> I1029 12:07:08.327177 15886 group.cpp:787] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
> I1029 12:07:08.327189 15886 group.cpp:385] Trying to create path '/mesos' in ZooKeeper
> I1029 12:07:08.328446 15909 detector.cpp:138] Detected a new leader: (id='0')
> I1029 12:07:08.328730 15889 group.cpp:656] Trying to get '/mesos/info_00' in ZooKeeper
> W1029 12:07:08.329212 15885 detector.cpp:444] Leading master master@10.10.101.118:5050 is using a Protobuf binary format when registering with ZooKeeper (info): this will be deprecated as of Mesos 0.24 (see MESOS-2340)
> I1029 12:07:08.329318 15885 detector.cpp:481] A new leading master (UPID=master@10.10.101.118:5050) is detected
> I1029 12:07:08.329453 15884 sched.cpp:254] New master detected at master@10.10.101.118:5050
> I1029 12:07:08.329644 15884 sched.cpp:264] No credentials provided. Attempting to register without authentication
> I1029 12:07:08.330493 15884 sched.cpp:819] Got error 'Completed framework attempted to re-register'
> I1029 12:07:08.330507 15884 sched.cpp:1625] Asked to abort the driver
> I1029 12:07:08.331507 15884 sched.cpp:861] Aborting framework '20151029-110656-1986333194-5050-10105-0001'
> Because the state store is not cleared on shutdown, Myriad reuses the same 
> framework ID when it is started again, which causes the above error.





[jira] [Created] (MYRIAD-164) Myriad fails to start up again after Shutdown is called

2015-11-02 Thread Aashreya Ravi Shankar (JIRA)
Aashreya Ravi Shankar created MYRIAD-164:


 Summary: Myriad fails to start up again after Shutdown is called
 Key: MYRIAD-164
 URL: https://issues.apache.org/jira/browse/MYRIAD-164
 Project: Myriad
  Issue Type: Bug
  Components: Scheduler
Reporter: Aashreya Ravi Shankar


Myriad does not start up after shutdown is called.

The following is seen in the log:
I1029 12:07:08.327148 15886 group.cpp:313] Group process (group(1)@10.10.101.118:33172) connected to ZooKeeper
I1029 12:07:08.327177 15886 group.cpp:787] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I1029 12:07:08.327189 15886 group.cpp:385] Trying to create path '/mesos' in ZooKeeper
I1029 12:07:08.328446 15909 detector.cpp:138] Detected a new leader: (id='0')
I1029 12:07:08.328730 15889 group.cpp:656] Trying to get '/mesos/info_00' in ZooKeeper
W1029 12:07:08.329212 15885 detector.cpp:444] Leading master master@10.10.101.118:5050 is using a Protobuf binary format when registering with ZooKeeper (info): this will be deprecated as of Mesos 0.24 (see MESOS-2340)
I1029 12:07:08.329318 15885 detector.cpp:481] A new leading master (UPID=master@10.10.101.118:5050) is detected
I1029 12:07:08.329453 15884 sched.cpp:254] New master detected at master@10.10.101.118:5050
I1029 12:07:08.329644 15884 sched.cpp:264] No credentials provided. Attempting to register without authentication
I1029 12:07:08.330493 15884 sched.cpp:819] Got error 'Completed framework attempted to re-register'
I1029 12:07:08.330507 15884 sched.cpp:1625] Asked to abort the driver
I1029 12:07:08.331507 15884 sched.cpp:861] Aborting framework '20151029-110656-1986333194-5050-10105-0001'

Because the state store is not cleared on shutdown, Myriad reuses the same 
framework ID when it is started again, which causes the above error.
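One possible fix, sketched with a hypothetical in-memory map standing in for Myriad's state store: keep the framework ID when the scheduler stops for failover, and delete it on a real shutdown so the next start registers as a new framework instead of re-registering a completed one. The `failover` flag is modeled on the argument of Mesos's `SchedulerDriver.stop(boolean failover)`; the class, key, and method names are assumptions, not Myriad's code.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for Myriad's state store; only the framework ID is modeled.
public final class FrameworkIdStore {
    private static final String FRAMEWORK_ID_KEY = "frameworkId"; // hypothetical key
    private final ConcurrentMap<String, String> state = new ConcurrentHashMap<>();

    /** Empty means: register without an ID and let the master assign a new one. */
    Optional<String> loadFrameworkId() {
        return Optional.ofNullable(state.get(FRAMEWORK_ID_KEY));
    }

    /** Persist the ID received on registration so a failed-over scheduler can reuse it. */
    void saveFrameworkId(String frameworkId) {
        state.put(FRAMEWORK_ID_KEY, frameworkId);
    }

    /**
     * failover == true: the master keeps the framework (up to its failover timeout),
     * so the stored ID stays valid.
     * failover == false: the framework is torn down and marked completed, so
     * re-registering with the same ID would fail; forget the ID.
     */
    void onStop(boolean failover) {
        if (!failover) {
            state.remove(FRAMEWORK_ID_KEY);
        }
    }

    public static void main(String[] args) {
        FrameworkIdStore store = new FrameworkIdStore();
        store.saveFrameworkId("20151029-110656-1986333194-5050-10105-0001");
        store.onStop(false); // explicit shutdown
        System.out.println(store.loadFrameworkId().isPresent()); // false
    }
}
```

Keeping the ID on failover preserves running tasks across scheduler restarts; only an explicit shutdown should discard it.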





[jira] [Commented] (MYRIAD-162) Myriad Not Correctly Dealing with Resources from Multiple Roles

2015-11-02 Thread DarinJ (JIRA)

[ 
https://issues.apache.org/jira/browse/MYRIAD-162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14985541#comment-14985541
 ] 

DarinJ commented on MYRIAD-162:
---

https://github.com/apache/incubator-myriad/pull/32

> Myriad Not Correctly Dealing with Resources from Multiple Roles
> ---
>
> Key: MYRIAD-162
> URL: https://issues.apache.org/jira/browse/MYRIAD-162
> Project: Myriad
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: Myriad 0.1.0
> Environment: Any where frameworkRole is not *
>Reporter: DarinJ
>Assignee: DarinJ
> Fix For: Myriad 0.1.0
>
>
> When using Offers that have Resources from multiple roles, one needs to use 
> the setRole(String role) method to specify which role each resource belongs 
> to. Myriad currently doesn't do this, which causes TASK_LOST, with an error 
> in the mesos-master log stating "attempted to use cpus( * ): 1.2; 
> mem( * ): 1305.6; ports( * ): [31005-31005,31006-31006,...] greater than 
> offered cpu( * ):1, mem( * ): 1400, ports( * ): [ ... ], cpu(roleA): 3, 
> mem(roleA): 1, ports(roleA): [...]".
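A minimal sketch of role-aware allocation in plain Java, not the Mesos protobuf API: each piece of an offer carries its role, and every piece a task consumes keeps the role it came from, which is what calling setRole(role) on each Resource achieves. The class and method names are hypothetical, and using the framework's own role before `*` is a design choice here, not necessarily what the linked pull request does.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of role-tagged offer resources; not the Mesos protobuf API.
public final class RoleAwareAllocator {
    /** One resource piece, printed like the master log: cpus(roleA):3.0 */
    static final class RoleResource {
        final String name;
        final String role;
        final double amount;

        RoleResource(String name, String role, double amount) {
            this.name = name;
            this.role = role;
            this.amount = amount;
        }

        @Override
        public String toString() {
            return name + "(" + role + "):" + amount;
        }
    }

    /**
     * Takes `requested` units of `name` from the offer. Every piece keeps the role
     * it was taken from. Pieces of the framework's own role are used before
     * unreserved "*" pieces. Returns an empty list if the offer cannot cover the
     * request across all roles.
     */
    static List<RoleResource> allocate(List<RoleResource> offer, String name,
                                       double requested, String frameworkRole) {
        List<RoleResource> candidates = new ArrayList<>();
        for (RoleResource r : offer) {
            if (r.name.equals(name)) {
                candidates.add(r);
            }
        }
        // false sorts before true, so frameworkRole pieces come first.
        candidates.sort(Comparator.comparing((RoleResource r) -> !r.role.equals(frameworkRole)));

        List<RoleResource> used = new ArrayList<>();
        double remaining = requested;
        for (RoleResource r : candidates) {
            if (remaining <= 0) {
                break;
            }
            double take = Math.min(remaining, r.amount);
            if (take > 0) {
                used.add(new RoleResource(name, r.role, take));
                remaining -= take;
            }
        }
        return remaining > 0 ? new ArrayList<>() : used;
    }

    public static void main(String[] args) {
        // cpus portion of the offer from the master log above.
        List<RoleResource> offer = List.of(
            new RoleResource("cpus", "*", 1.0),
            new RoleResource("cpus", "roleA", 3.0));
        System.out.println(allocate(offer, "cpus", 1.2, "roleA")); // [cpus(roleA):1.2]
    }
}
```

With that offer, the task asks for cpus(roleA):1.2 instead of an untagged cpus( * ):1.2, which exceeded the single unreserved cpu and produced the TASK_LOST above.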





Out of office - 11/2

2015-11-02 Thread Santosh Marella
Taking a sick day off. Will try to respond to emails towards EOD.

--
Sent from mobile


[jira] [Closed] (MYRIAD-43) Replace com.ebay namespace with org.apache

2015-11-02 Thread Jim Klucar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MYRIAD-43?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Klucar closed MYRIAD-43.

Resolution: Fixed

> Replace com.ebay namespace with org.apache
> --
>
> Key: MYRIAD-43
> URL: https://issues.apache.org/jira/browse/MYRIAD-43
> Project: Myriad
>  Issue Type: Bug
>Reporter: Adam B
>Assignee: Jim Klucar
> Fix For: Myriad 0.1.0
>
>
> For the incubator!





[jira] [Created] (MYRIAD-163) Web UI component update

2015-11-02 Thread Jim Klucar (JIRA)
Jim Klucar created MYRIAD-163:
-

 Summary: Web UI component update
 Key: MYRIAD-163
 URL: https://issues.apache.org/jira/browse/MYRIAD-163
 Project: Myriad
  Issue Type: Improvement
Reporter: Jim Klucar
Assignee: Jim Klucar


The React framework is evolving, and along with it the ReactRouter and other 
components we use in the WebUI. Some work is needed to update the site to the 
latest libraries.

This is also an opportunity to update the look and feel to match the Myriad 
Apache site.





[jira] [Assigned] (MYRIAD-156) NullPointerException from "Error in handling event type NODE_RESOURCE_UPDATE to the scheduler"

2015-11-02 Thread Swapnil Daingade (JIRA)

 [ 
https://issues.apache.org/jira/browse/MYRIAD-156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swapnil Daingade reassigned MYRIAD-156:
---

Assignee: Swapnil Daingade

> NullPointerException from "Error in handling event type NODE_RESOURCE_UPDATE 
> to the scheduler"
> --
>
> Key: MYRIAD-156
> URL: https://issues.apache.org/jira/browse/MYRIAD-156
> Project: Myriad
>  Issue Type: Bug
>Reporter: Sarjeet Singh
>Assignee: Swapnil Daingade
>
> The NPE happens when a node in the cluster becomes unhealthy and the 
> scheduler removes it from its internal data structures. When the node later 
> heartbeats, the scheduler looks the node up and tries to operate on it, and 
> gets a NullPointerException. Here is the code snippet that causes the NPE:
> SchedulerNode node = getSchedulerNode(nm.getNodeID());
> The returned node object is null, which causes the NullPointerException.
> Here is the RM log for the exception:
> 15/10/06 09:18:09 INFO handlers.ResourceOffersEventHandler: Offer not sufficient for task with, cpu: 4.4, memory: 5504.0, spindles: 4.0, ports: 996
> 15/10/06 09:18:11 FATAL resourcemanager.ResourceManager: Error in handling event type NODE_RESOURCE_UPDATE to the scheduler
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.updateNodeResource(AbstractYarnScheduler.java:548)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.updateNodeResource(FairScheduler.java:1712)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1293)
>     at com.ebay.myriad.scheduler.yarn.MyriadFairScheduler.handle(MyriadFairScheduler.java:64)
>     at com.ebay.myriad.scheduler.yarn.MyriadFairScheduler.handle(MyriadFairScheduler.java:17)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:693)
>     at java.lang.Thread.run(Thread.java:745)
> 15/10/06 09:18:11 INFO resourcemanager.ResourceManager: Exiting, bbye..
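A simplified sketch of the failure mode and one defensive option: treat a missing node as already removed and skip the update instead of dereferencing it. `SchedulerNode` here is a stand-in class, not YARN's, and the other names are hypothetical; whether such a guard belongs in the scheduler or in the code that emits NODE_RESOURCE_UPDATE is left open.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the scheduler's node bookkeeping, not YARN's classes.
public final class NodeResourceUpdater {
    /** Stand-in for YARN's SchedulerNode; only total memory is tracked. */
    static final class SchedulerNode {
        long totalMemoryMb;

        SchedulerNode(long totalMemoryMb) {
            this.totalMemoryMb = totalMemoryMb;
        }
    }

    private final Map<String, SchedulerNode> nodes = new ConcurrentHashMap<>();

    void addNode(String nodeId, long memoryMb) {
        nodes.put(nodeId, new SchedulerNode(memoryMb));
    }

    /** Mirrors the scheduler dropping a node that became unhealthy. */
    void removeNode(String nodeId) {
        nodes.remove(nodeId);
    }

    /** Mirrors getSchedulerNode(nm.getNodeID()): returns null for a removed node. */
    SchedulerNode getSchedulerNode(String nodeId) {
        return nodes.get(nodeId);
    }

    /** Returns true if the update was applied, false if the node is no longer tracked. */
    boolean updateNodeResource(String nodeId, long newMemoryMb) {
        SchedulerNode node = getSchedulerNode(nodeId);
        if (node == null) {
            // Without this check, the assignment below throws NullPointerException.
            return false;
        }
        node.totalMemoryMb = newMemoryMb;
        return true;
    }

    public static void main(String[] args) {
        NodeResourceUpdater scheduler = new NodeResourceUpdater();
        scheduler.addNode("node1:8042", 4096);
        scheduler.removeNode("node1:8042"); // node became unhealthy
        System.out.println(scheduler.updateNodeResource("node1:8042", 8192)); // false
    }
}
```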


