[jira] [Created] (MESOS-2530) Alloc-dealloc-mismatch in OsSendfileTest.sendfile

2015-03-23 Thread Joerg Schad (JIRA)
Joerg Schad created MESOS-2530:
--

 Summary:  Alloc-dealloc-mismatch in OsSendfileTest.sendfile
 Key: MESOS-2530
 URL: https://issues.apache.org/jira/browse/MESOS-2530
 Project: Mesos
  Issue Type: Bug
Reporter: Joerg Schad
Assignee: Joerg Schad


GCC's AddressSanitizer stumbled across the following issue (thanks [~tillt]):

{noformat}
[--] 1 test from OsSendfileTest
[ RUN  ] OsSendfileTest.sendfile
=
==65404== ERROR: AddressSanitizer: alloc-dealloc-mismatch (operator new [] vs 
operator delete) on 0x6030fe40
#0 0x2b8d6acc99da (/usr/lib/x86_64-linux-gnu/libasan.so.0.0.0+0x119da)
#1 0x52df06 
(/mnt/hgfs/till/Development/mesos-private/build/3rdparty/libprocess/3rdparty/stout-tests+0x52df06)
#2 0x593e59 
(/mnt/hgfs/till/Development/mesos-private/build/3rdparty/libprocess/3rdparty/stout-tests+0x593e59)
#3 0x58bd83 
(/mnt/hgfs/till/Development/mesos-private/build/3rdparty/libprocess/3rdparty/stout-tests+0x58bd83)
#4 0x567561 
(/mnt/hgfs/till/Development/mesos-private/build/3rdparty/libprocess/3rdparty/stout-tests+0x567561)
#5 0x568049 
(/mnt/hgfs/till/Development/mesos-private/build/3rdparty/libprocess/3rdparty/stout-tests+0x568049)
#6 0x5688a4 
(/mnt/hgfs/till/Development/mesos-private/build/3rdparty/libprocess/3rdparty/stout-tests+0x5688a4)
#7 0x56fb6e 
(/mnt/hgfs/till/Development/mesos-private/build/3rdparty/libprocess/3rdparty/stout-tests+0x56fb6e)
#8 0x595713 
(/mnt/hgfs/till/Development/mesos-private/build/3rdparty/libprocess/3rdparty/stout-tests+0x595713)
#9 0x58d2e3 
(/mnt/hgfs/till/Development/mesos-private/build/3rdparty/libprocess/3rdparty/stout-tests+0x58d2e3)
#10 0x56e0bb 
(/mnt/hgfs/till/Development/mesos-private/build/3rdparty/libprocess/3rdparty/stout-tests+0x56e0bb)
#11 0x4ca74b 
(/mnt/hgfs/till/Development/mesos-private/build/3rdparty/libprocess/3rdparty/stout-tests+0x4ca74b)
#12 0x2b8d6f385ec4 (/lib/x86_64-linux-gnu/libc-2.19.so+0x21ec4)
0x6030fe40 is located 0 bytes inside of 446-byte region 
[0x6030fe40,0x6030fffe)
allocated by thread T0 here:
#0 0x2b8d6acc988a (/usr/lib/x86_64-linux-gnu/libasan.so.0.0.0+0x1188a)
#1 0x52dba6 
(/mnt/hgfs/till/Development/mesos-private/build/3rdparty/libprocess/3rdparty/stout-tests+0x52dba6)
#2 0x593e59 
(/mnt/hgfs/till/Development/mesos-private/build/3rdparty/libprocess/3rdparty/stout-tests+0x593e59)
#3 0x58bd83 
(/mnt/hgfs/till/Development/mesos-private/build/3rdparty/libprocess/3rdparty/stout-tests+0x58bd83)
#4 0x567561 
(/mnt/hgfs/till/Development/mesos-private/build/3rdparty/libprocess/3rdparty/stout-tests+0x567561)
#5 0x568049 
(/mnt/hgfs/till/Development/mesos-private/build/3rdparty/libprocess/3rdparty/stout-tests+0x568049)
#6 0x5688a4 
(/mnt/hgfs/till/Development/mesos-private/build/3rdparty/libprocess/3rdparty/stout-tests+0x5688a4)
#7 0x56fb6e 
(/mnt/hgfs/till/Development/mesos-private/build/3rdparty/libprocess/3rdparty/stout-tests+0x56fb6e)
#8 0x595713 
(/mnt/hgfs/till/Development/mesos-private/build/3rdparty/libprocess/3rdparty/stout-tests+0x595713)
#9 0x58d2e3 
(/mnt/hgfs/till/Development/mesos-private/build/3rdparty/libprocess/3rdparty/stout-tests+0x58d2e3)
#10 0x56e0bb 
(/mnt/hgfs/till/Development/mesos-private/build/3rdparty/libprocess/3rdparty/stout-tests+0x56e0bb)
#11 0x4ca74b 
(/mnt/hgfs/till/Development/mesos-private/build/3rdparty/libprocess/3rdparty/stout-tests+0x4ca74b)
#12 0x2b8d6f385ec4 (/lib/x86_64-linux-gnu/libc-2.19.so+0x21ec4)
==65404== HINT: if you don't care about these warnings you may set 
ASAN_OPTIONS=alloc_dealloc_mismatch=0
==65404== ABORTING
make[7]: *** [check-local] Error 1
make[7]: Leaving directory 
`/mnt/hgfs/till/Development/mesos-private/build/3rdparty/libprocess/3rdparty'
make[6]: *** [check-am] Error 2
make[6]: Leaving directory 
`/mnt/hgfs/till/Development/mesos-private/build/3rdparty/libprocess/3rdparty'
make[5]: *** [check-recursive] Error 1
make[5]: Leaving directory 
`/mnt/hgfs/till/Development/mesos-private/build/3rdparty/libprocess/3rdparty'
make[4]: *** [check] Error 2
make[4]: Leaving directory 
`/mnt/hgfs/till/Development/mesos-private/build/3rdparty/libprocess/3rdparty'
make[3]: *** [check-recursive] Error 1
make[3]: Leaving directory 
`/mnt/hgfs/till/Development/mesos-private/build/3rdparty/libprocess'
make[2]: *** [check-recursive] Error 1
make[2]: Leaving directory 
`/mnt/hgfs/till/Development/mesos-private/build/3rdparty'
make[1]: *** [check] Error 2
make[1]: Leaving directory 
`/mnt/hgfs/till/Development/mesos-private/build/3rdparty'
make: *** [check-recursive] Error 1
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2531) Libmesos terminates JVM

2015-03-23 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/MESOS-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michał Kiędyś updated MESOS-2531:
-
Environment: (was: Mesos #a12242b
Marathon #6decf76

java version 1.8.0
Java(TM) SE Runtime Environment (build 1.8.0-b132)
Java HotSpot(TM) 64-Bit Server VM (build 25.0-b70, mixed mode)

System Software Overview:
System Version: OS X 10.10.2 (14C109)
Kernel Version: Darwin 14.1.0
Secure Virtual Memory: Enabled
Time since boot: 13 days 11:02)

 Libmesos terminates JVM
 ---

 Key: MESOS-2531
 URL: https://issues.apache.org/jira/browse/MESOS-2531
 Project: Mesos
  Issue Type: Bug
  Components: java api
Affects Versions: 0.23.0
Reporter: Michał Kiędyś

 I have built Mesos from scratch using the code available on GitHub, revision 
 #a12242b.
 My Mesos cluster runs on Mac OS X Yosemite and consists of one master and three 
 slaves, all running on the same computer but on different ports. ZooKeeper 
 also runs on the same computer.
 Later on I compiled Marathon, also using the latest version from GitHub, revision 
 #6decf76. Marathon uses the same ZooKeeper instance and successfully connects to 
 the Mesos cluster.
 After deploying a simple application that runs a sleep command for 120 seconds 
 and scaling that application to ten instances, my Marathon died: the JVM 
 aborted after a SIGSEGV in libmesos-0.23.0.dylib.
 {noformat}
 [2015-03-23 15:47:17,872] INFO Computed new deployment plan: 
 DeploymentPlan(2015-03-23T14:47:17.823Z, (Step(List(Scale(App(/bar, 
 Some(sleep 120))), 10) (mesosphere.marathon.upgrade.DeploymentPlan$:263)
 [2015-03-23 15:47:17,876] INFO Deployment acknowledged. Waiting to get 
 processed: DeploymentPlan(2015-03-23T14:47:17.823Z, 
 (Step(List(Scale(App(/bar, Some(sleep 120))), 10) 
 (mesosphere.marathon.state.GroupManager:142)
 [2015-03-23 15:47:17,877] INFO 127.0.0.1 -  -  [23/mar/2015:14:47:17 +] 
 PUT /v2/apps//bar HTTP/1.1 200 92 http://127.0.0.1:8080/; Mozilla/5.0 
 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) 
 Chrome/41.0.2272.89 Safari/537.36 (mesosphere.chaos.http.ChaosRequestLog:15)
 [2015-03-23 15:47:17,918] INFO 127.0.0.1 -  -  [23/mar/2015:14:47:17 +] 
 GET /v2/apps//bar/versions HTTP/1.1 200 68 http://127.0.0.1:8080/; 
 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, 
 like Gecko) Chrome/41.0.2272.89 Safari/537.36 
 (mesosphere.chaos.http.ChaosRequestLog:15)
 [2015-03-23 15:47:20,722] INFO 127.0.0.1 -  -  [23/mar/2015:14:47:20 +] 
 GET /v2/apps HTTP/1.1 200 592 http://127.0.0.1:8080/; Mozilla/5.0 
 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) 
 Chrome/41.0.2272.89 Safari/537.36 (mesosphere.chaos.http.ChaosRequestLog:15)
 [2015-03-23 15:47:20,782] INFO Received status update for task 
 bar.82501637-d16b-11e4-b7fa-aa4dda3d2dbb: TASK_RUNNING () 
 (mesosphere.marathon.MarathonScheduler:149)
 [2015-03-23 15:47:20,790] INFO 127.0.0.1 -  -  [23/mar/2015:14:47:20 +] 
 GET /v2/deployments HTTP/1.1 200 256 http://127.0.0.1:8080/; Mozilla/5.0 
 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) 
 Chrome/41.0.2272.89 Safari/537.36 (mesosphere.chaos.http.ChaosRequestLog:15)
 #
 # A fatal error has been detected by the Java Runtime Environment:
 #
 #  SIGSEGV (0xb) at pc=0x00012ec946f7, pid=98294, tid=27651
 #
 # JRE version: Java(TM) SE Runtime Environment (8.0-b132) (build 1.8.0-b132)
 # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b70 mixed mode bsd-amd64 
 compressed oops)
 # Problematic frame:
 # C  [libmesos-0.23.0.dylib+0x7836f7]  
 process::Future<mesos::internal::state::Variable>::isFailed() const+0x17
 #
 # Failed to write core dump. Core dumps have been disabled. To enable core 
 dumping, try ulimit -c unlimited before starting Java again
 #
 # An error report file with more information is saved as:
 # /Users/mkiedys/Downloads/MESOS/marathon/hs_err_pid98294.log
 #
 # If you would like to submit a bug report, please visit:
 #   http://bugreport.sun.com/bugreport/crash.jsp
 # The crash happened outside the Java Virtual Machine in native code.
 # See problematic frame for where to report the bug.
 #
 Abort trap: 6
 {noformat}
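The JVM's hint about core dumps can be acted on before relaunching Marathon; a minimal shell sketch (the limit must be raised in the same shell that starts Java):

```shell
# Enable core dumps, as the JVM error report suggests, so the native
# crash inside libmesos-0.23.0.dylib can be inspected post-mortem.
ulimit -c unlimited

# Show the effective soft limit for core file size.
ulimit -c
```

With a core file (or the hs_err_pid*.log referenced above), the faulting frame in libmesos can be resolved to a source line with a debugger.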



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-2531) Libmesos terminates JVM

2015-03-23 Thread JIRA
Michał Kiędyś created MESOS-2531:


 Summary: Libmesos terminates JVM
 Key: MESOS-2531
 URL: https://issues.apache.org/jira/browse/MESOS-2531
 Project: Mesos
  Issue Type: Bug
  Components: java api
Affects Versions: 0.23.0
 Environment: Mesos #a12242b
Marathon #6decf76

java version 1.8.0
Java(TM) SE Runtime Environment (build 1.8.0-b132)
Java HotSpot(TM) 64-Bit Server VM (build 25.0-b70, mixed mode)

System Software Overview:
System Version: OS X 10.10.2 (14C109)
Kernel Version: Darwin 14.1.0
Secure Virtual Memory: Enabled
Time since boot: 13 days 11:02
Reporter: Michał Kiędyś


I have built Mesos from scratch using the code available on GitHub, revision 
#a12242b.

My Mesos cluster runs on Mac OS X Yosemite and consists of one master and three 
slaves, all running on the same computer but on different ports. ZooKeeper 
also runs on the same computer.

Later on I compiled Marathon, also using the latest version from GitHub, revision 
#6decf76. Marathon uses the same ZooKeeper instance and successfully connects to 
the Mesos cluster.

After deploying a simple application that runs a sleep command for 120 seconds 
and scaling that application to ten instances, my Marathon died: the JVM aborted 
after a SIGSEGV in libmesos-0.23.0.dylib.

{noformat}
[2015-03-23 15:47:17,872] INFO Computed new deployment plan: 
DeploymentPlan(2015-03-23T14:47:17.823Z, (Step(List(Scale(App(/bar, Some(sleep 
120))), 10) (mesosphere.marathon.upgrade.DeploymentPlan$:263)
[2015-03-23 15:47:17,876] INFO Deployment acknowledged. Waiting to get 
processed: DeploymentPlan(2015-03-23T14:47:17.823Z, (Step(List(Scale(App(/bar, 
Some(sleep 120))), 10) (mesosphere.marathon.state.GroupManager:142)
[2015-03-23 15:47:17,877] INFO 127.0.0.1 -  -  [23/mar/2015:14:47:17 +] 
PUT /v2/apps//bar HTTP/1.1 200 92 http://127.0.0.1:8080/; Mozilla/5.0 
(Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) 
Chrome/41.0.2272.89 Safari/537.36 (mesosphere.chaos.http.ChaosRequestLog:15)
[2015-03-23 15:47:17,918] INFO 127.0.0.1 -  -  [23/mar/2015:14:47:17 +] 
GET /v2/apps//bar/versions HTTP/1.1 200 68 http://127.0.0.1:8080/; 
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, 
like Gecko) Chrome/41.0.2272.89 Safari/537.36 
(mesosphere.chaos.http.ChaosRequestLog:15)
[2015-03-23 15:47:20,722] INFO 127.0.0.1 -  -  [23/mar/2015:14:47:20 +] 
GET /v2/apps HTTP/1.1 200 592 http://127.0.0.1:8080/; Mozilla/5.0 
(Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) 
Chrome/41.0.2272.89 Safari/537.36 (mesosphere.chaos.http.ChaosRequestLog:15)
[2015-03-23 15:47:20,782] INFO Received status update for task 
bar.82501637-d16b-11e4-b7fa-aa4dda3d2dbb: TASK_RUNNING () 
(mesosphere.marathon.MarathonScheduler:149)
[2015-03-23 15:47:20,790] INFO 127.0.0.1 -  -  [23/mar/2015:14:47:20 +] 
GET /v2/deployments HTTP/1.1 200 256 http://127.0.0.1:8080/; Mozilla/5.0 
(Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) 
Chrome/41.0.2272.89 Safari/537.36 (mesosphere.chaos.http.ChaosRequestLog:15)
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00012ec946f7, pid=98294, tid=27651
#
# JRE version: Java(TM) SE Runtime Environment (8.0-b132) (build 1.8.0-b132)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b70 mixed mode bsd-amd64 
compressed oops)
# Problematic frame:
# C  [libmesos-0.23.0.dylib+0x7836f7]  
process::Future<mesos::internal::state::Variable>::isFailed() const+0x17
#
# Failed to write core dump. Core dumps have been disabled. To enable core 
dumping, try ulimit -c unlimited before starting Java again
#
# An error report file with more information is saved as:
# /Users/mkiedys/Downloads/MESOS/marathon/hs_err_pid98294.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.sun.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
Abort trap: 6
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2531) Libmesos terminates JVM

2015-03-23 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/MESOS-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michał Kiędyś updated MESOS-2531:
-
Description: 
I have built Mesos from scratch using the code available on GitHub, revision 
#a12242b.

My Mesos cluster runs on Mac OS X Yosemite and consists of one master and three 
slaves, all running on the same computer but on different ports. ZooKeeper 
also runs on the same computer.

Later on I compiled Marathon, also using the latest version from GitHub, revision 
#6decf76. Marathon uses the same ZooKeeper instance and successfully connects to 
the Mesos cluster.

After deploying a simple application that runs a sleep command for 120 seconds 
and scaling that application to ten instances, my Marathon died: the JVM aborted 
after a SIGSEGV in libmesos-0.23.0.dylib.

{noformat}
[2015-03-23 15:47:17,872] INFO Computed new deployment plan: 
DeploymentPlan(2015-03-23T14:47:17.823Z, (Step(List(Scale(App(/bar, Some(sleep 
120))), 10) (mesosphere.marathon.upgrade.DeploymentPlan$:263)
[2015-03-23 15:47:17,876] INFO Deployment acknowledged. Waiting to get 
processed: DeploymentPlan(2015-03-23T14:47:17.823Z, (Step(List(Scale(App(/bar, 
Some(sleep 120))), 10) (mesosphere.marathon.state.GroupManager:142)
[2015-03-23 15:47:17,877] INFO 127.0.0.1 -  -  [23/mar/2015:14:47:17 +] 
PUT /v2/apps//bar HTTP/1.1 200 92 http://127.0.0.1:8080/; Mozilla/5.0 
(Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) 
Chrome/41.0.2272.89 Safari/537.36 (mesosphere.chaos.http.ChaosRequestLog:15)
[2015-03-23 15:47:17,918] INFO 127.0.0.1 -  -  [23/mar/2015:14:47:17 +] 
GET /v2/apps//bar/versions HTTP/1.1 200 68 http://127.0.0.1:8080/; 
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, 
like Gecko) Chrome/41.0.2272.89 Safari/537.36 
(mesosphere.chaos.http.ChaosRequestLog:15)
[2015-03-23 15:47:20,722] INFO 127.0.0.1 -  -  [23/mar/2015:14:47:20 +] 
GET /v2/apps HTTP/1.1 200 592 http://127.0.0.1:8080/; Mozilla/5.0 
(Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) 
Chrome/41.0.2272.89 Safari/537.36 (mesosphere.chaos.http.ChaosRequestLog:15)
[2015-03-23 15:47:20,782] INFO Received status update for task 
bar.82501637-d16b-11e4-b7fa-aa4dda3d2dbb: TASK_RUNNING () 
(mesosphere.marathon.MarathonScheduler:149)
[2015-03-23 15:47:20,790] INFO 127.0.0.1 -  -  [23/mar/2015:14:47:20 +] 
GET /v2/deployments HTTP/1.1 200 256 http://127.0.0.1:8080/; Mozilla/5.0 
(Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) 
Chrome/41.0.2272.89 Safari/537.36 (mesosphere.chaos.http.ChaosRequestLog:15)
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00012ec946f7, pid=98294, tid=27651
#
# JRE version: Java(TM) SE Runtime Environment (8.0-b132) (build 1.8.0-b132)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b70 mixed mode bsd-amd64 
compressed oops)
# Problematic frame:
# C  [libmesos-0.23.0.dylib+0x7836f7]  
process::Future<mesos::internal::state::Variable>::isFailed() const+0x17
#
# Failed to write core dump. Core dumps have been disabled. To enable core 
dumping, try ulimit -c unlimited before starting Java again
#
# An error report file with more information is saved as:
# /Users/mkiedys/Downloads/MESOS/marathon/hs_err_pid98294.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.sun.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
Abort trap: 6
{noformat}

Mesos #a12242b
Marathon #6decf76

java version 1.8.0
Java(TM) SE Runtime Environment (build 1.8.0-b132)
Java HotSpot(TM) 64-Bit Server VM (build 25.0-b70, mixed mode)

System Software Overview:
System Version: OS X 10.10.2 (14C109)
Kernel Version: Darwin 14.1.0
Secure Virtual Memory: Enabled
Time since boot: 13 days 11:02

  was:
I have built Mesos from scratch using the code available on GitHub, revision 
#a12242b.

My Mesos cluster runs on Mac OS X Yosemite and consists of one master and three 
slaves, all running on the same computer but on different ports. ZooKeeper 
also runs on the same computer.

Later on I compiled Marathon, also using the latest version from GitHub, revision 
#6decf76. Marathon uses the same ZooKeeper instance and successfully connects to 
the Mesos cluster.

After deploying a simple application that runs a sleep command for 120 seconds 
and scaling that application to ten instances, my Marathon died: the JVM aborted 
after a SIGSEGV in libmesos-0.23.0.dylib.

{noformat}
[2015-03-23 15:47:17,872] INFO Computed new deployment plan: 
DeploymentPlan(2015-03-23T14:47:17.823Z, (Step(List(Scale(App(/bar, Some(sleep 
120))), 10) (mesosphere.marathon.upgrade.DeploymentPlan$:263)
[2015-03-23 15:47:17,876] INFO Deployment acknowledged. Waiting to get 
processed: DeploymentPlan(2015-03-23T14:47:17.823Z, (Step(List(Scale(App(/bar, 
Some(sleep 120))), 

[jira] [Issue Comment Deleted] (MESOS-2531) Libmesos terminates JVM

2015-03-23 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/MESOS-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michał Kiędyś updated MESOS-2531:
-
Comment: was deleted

(was: Error report file with more information)

 Libmesos terminates JVM
 ---

 Key: MESOS-2531
 URL: https://issues.apache.org/jira/browse/MESOS-2531
 Project: Mesos
  Issue Type: Bug
  Components: java api
Affects Versions: 0.23.0
Reporter: Michał Kiędyś
 Attachments: hs_err_pid98294.log


 I have built Mesos from scratch using the code available on GitHub, revision 
 #a12242b.
 My Mesos cluster runs on Mac OS X and consists of one master and three slaves, 
 all running on the same computer but on different ports. ZooKeeper also runs 
 on the same computer.
 Later on I compiled Marathon, also using the latest version from GitHub, revision 
 #6decf76. Marathon uses the same ZooKeeper instance and successfully connects to 
 the Mesos cluster.
 After deploying a simple application that runs the {{sleep}} command for 120 
 seconds and scaling that application to ten instances, my Marathon crashed: the 
 JVM aborted after a SIGSEGV in libmesos-0.23.0.dylib.
 h4. Log
 {noformat}
 [2015-03-23 15:47:17,872] INFO Computed new deployment plan: 
 DeploymentPlan(2015-03-23T14:47:17.823Z, (Step(List(Scale(App(/bar, 
 Some(sleep 120))), 10) (mesosphere.marathon.upgrade.DeploymentPlan$:263)
 [2015-03-23 15:47:17,876] INFO Deployment acknowledged. Waiting to get 
 processed: DeploymentPlan(2015-03-23T14:47:17.823Z, 
 (Step(List(Scale(App(/bar, Some(sleep 120))), 10) 
 (mesosphere.marathon.state.GroupManager:142)
 [2015-03-23 15:47:17,877] INFO 127.0.0.1 -  -  [23/mar/2015:14:47:17 +] 
 PUT /v2/apps//bar HTTP/1.1 200 92 http://127.0.0.1:8080/; Mozilla/5.0 
 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) 
 Chrome/41.0.2272.89 Safari/537.36 (mesosphere.chaos.http.ChaosRequestLog:15)
 [2015-03-23 15:47:17,918] INFO 127.0.0.1 -  -  [23/mar/2015:14:47:17 +] 
 GET /v2/apps//bar/versions HTTP/1.1 200 68 http://127.0.0.1:8080/; 
 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, 
 like Gecko) Chrome/41.0.2272.89 Safari/537.36 
 (mesosphere.chaos.http.ChaosRequestLog:15)
 [2015-03-23 15:47:20,722] INFO 127.0.0.1 -  -  [23/mar/2015:14:47:20 +] 
 GET /v2/apps HTTP/1.1 200 592 http://127.0.0.1:8080/; Mozilla/5.0 
 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) 
 Chrome/41.0.2272.89 Safari/537.36 (mesosphere.chaos.http.ChaosRequestLog:15)
 [2015-03-23 15:47:20,782] INFO Received status update for task 
 bar.82501637-d16b-11e4-b7fa-aa4dda3d2dbb: TASK_RUNNING () 
 (mesosphere.marathon.MarathonScheduler:149)
 [2015-03-23 15:47:20,790] INFO 127.0.0.1 -  -  [23/mar/2015:14:47:20 +] 
 GET /v2/deployments HTTP/1.1 200 256 http://127.0.0.1:8080/; Mozilla/5.0 
 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) 
 Chrome/41.0.2272.89 Safari/537.36 (mesosphere.chaos.http.ChaosRequestLog:15)
 #
 # A fatal error has been detected by the Java Runtime Environment:
 #
 #  SIGSEGV (0xb) at pc=0x00012ec946f7, pid=98294, tid=27651
 #
 # JRE version: Java(TM) SE Runtime Environment (8.0-b132) (build 1.8.0-b132)
 # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b70 mixed mode bsd-amd64 
 compressed oops)
 # Problematic frame:
 # C  [libmesos-0.23.0.dylib+0x7836f7]  
 process::Future<mesos::internal::state::Variable>::isFailed() const+0x17
 #
 # Failed to write core dump. Core dumps have been disabled. To enable core 
 dumping, try ulimit -c unlimited before starting Java again
 #
 # An error report file with more information is saved as:
 # /Users/mkiedys/Downloads/MESOS/marathon/hs_err_pid98294.log
 #
 # If you would like to submit a bug report, please visit:
 #   http://bugreport.sun.com/bugreport/crash.jsp
 # The crash happened outside the Java Virtual Machine in native code.
 # See problematic frame for where to report the bug.
 #
 Abort trap: 6
 {noformat}
 h4. Java
 java version 1.8.0
 Java(TM) SE Runtime Environment (build 1.8.0-b132)
 Java HotSpot(TM) 64-Bit Server VM (build 25.0-b70, mixed mode)
 h4. System Software Overview
 - System Version: OS X 10.10.2 (14C109)
 - Kernel Version: Darwin 14.1.0
 - Secure Virtual Memory: Enabled
 - Time since boot: 13 days 11:02



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2531) Libmesos terminates JVM

2015-03-23 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/MESOS-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michał Kiędyś updated MESOS-2531:
-
Attachment: hs_err_pid98294.log

Error report file with more information

 Libmesos terminates JVM
 ---

 Key: MESOS-2531
 URL: https://issues.apache.org/jira/browse/MESOS-2531
 Project: Mesos
  Issue Type: Bug
  Components: java api
Affects Versions: 0.23.0
Reporter: Michał Kiędyś
 Attachments: hs_err_pid98294.log


 I have built Mesos from scratch using the code available on GitHub, revision 
 #a12242b.
 My Mesos cluster runs on Mac OS X and consists of one master and three slaves, 
 all running on the same computer but on different ports. ZooKeeper also runs 
 on the same computer.
 Later on I compiled Marathon, also using the latest version from GitHub, revision 
 #6decf76. Marathon uses the same ZooKeeper instance and successfully connects to 
 the Mesos cluster.
 After deploying a simple application that runs the {{sleep}} command for 120 
 seconds and scaling that application to ten instances, my Marathon crashed: the 
 JVM aborted after a SIGSEGV in libmesos-0.23.0.dylib.
 h4. Log
 {noformat}
 [2015-03-23 15:47:17,872] INFO Computed new deployment plan: 
 DeploymentPlan(2015-03-23T14:47:17.823Z, (Step(List(Scale(App(/bar, 
 Some(sleep 120))), 10) (mesosphere.marathon.upgrade.DeploymentPlan$:263)
 [2015-03-23 15:47:17,876] INFO Deployment acknowledged. Waiting to get 
 processed: DeploymentPlan(2015-03-23T14:47:17.823Z, 
 (Step(List(Scale(App(/bar, Some(sleep 120))), 10) 
 (mesosphere.marathon.state.GroupManager:142)
 [2015-03-23 15:47:17,877] INFO 127.0.0.1 -  -  [23/mar/2015:14:47:17 +] 
 PUT /v2/apps//bar HTTP/1.1 200 92 http://127.0.0.1:8080/; Mozilla/5.0 
 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) 
 Chrome/41.0.2272.89 Safari/537.36 (mesosphere.chaos.http.ChaosRequestLog:15)
 [2015-03-23 15:47:17,918] INFO 127.0.0.1 -  -  [23/mar/2015:14:47:17 +] 
 GET /v2/apps//bar/versions HTTP/1.1 200 68 http://127.0.0.1:8080/; 
 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, 
 like Gecko) Chrome/41.0.2272.89 Safari/537.36 
 (mesosphere.chaos.http.ChaosRequestLog:15)
 [2015-03-23 15:47:20,722] INFO 127.0.0.1 -  -  [23/mar/2015:14:47:20 +] 
 GET /v2/apps HTTP/1.1 200 592 http://127.0.0.1:8080/; Mozilla/5.0 
 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) 
 Chrome/41.0.2272.89 Safari/537.36 (mesosphere.chaos.http.ChaosRequestLog:15)
 [2015-03-23 15:47:20,782] INFO Received status update for task 
 bar.82501637-d16b-11e4-b7fa-aa4dda3d2dbb: TASK_RUNNING () 
 (mesosphere.marathon.MarathonScheduler:149)
 [2015-03-23 15:47:20,790] INFO 127.0.0.1 -  -  [23/mar/2015:14:47:20 +] 
 GET /v2/deployments HTTP/1.1 200 256 http://127.0.0.1:8080/; Mozilla/5.0 
 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) 
 Chrome/41.0.2272.89 Safari/537.36 (mesosphere.chaos.http.ChaosRequestLog:15)
 #
 # A fatal error has been detected by the Java Runtime Environment:
 #
 #  SIGSEGV (0xb) at pc=0x00012ec946f7, pid=98294, tid=27651
 #
 # JRE version: Java(TM) SE Runtime Environment (8.0-b132) (build 1.8.0-b132)
 # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b70 mixed mode bsd-amd64 
 compressed oops)
 # Problematic frame:
 # C  [libmesos-0.23.0.dylib+0x7836f7]  
 process::Future<mesos::internal::state::Variable>::isFailed() const+0x17
 #
 # Failed to write core dump. Core dumps have been disabled. To enable core 
 dumping, try ulimit -c unlimited before starting Java again
 #
 # An error report file with more information is saved as:
 # /Users/mkiedys/Downloads/MESOS/marathon/hs_err_pid98294.log
 #
 # If you would like to submit a bug report, please visit:
 #   http://bugreport.sun.com/bugreport/crash.jsp
 # The crash happened outside the Java Virtual Machine in native code.
 # See problematic frame for where to report the bug.
 #
 Abort trap: 6
 {noformat}
 h4. Java
 java version 1.8.0
 Java(TM) SE Runtime Environment (build 1.8.0-b132)
 Java HotSpot(TM) 64-Bit Server VM (build 25.0-b70, mixed mode)
 h4. System Software Overview
 - System Version: OS X 10.10.2 (14C109)
 - Kernel Version: Darwin 14.1.0
 - Secure Virtual Memory: Enabled
 - Time since boot: 13 days 11:02



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2491) Persist the reservation state on the slave

2015-03-23 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14375991#comment-14375991
 ] 

Michael Park commented on MESOS-2491:
-

[r32398|https://reviews.apache.org/r/32398/]

 Persist the reservation state on the slave
 --

 Key: MESOS-2491
 URL: https://issues.apache.org/jira/browse/MESOS-2491
 Project: Mesos
  Issue Type: Task
  Components: master, slave
Reporter: Michael Park
Assignee: Michael Park
  Labels: mesosphere

 h3. Goal
 The goal for this task is to persist the reservation state stored on the 
 master on the corresponding slave. The {{needCheckpointing}} predicate is 
 used to capture the condition for which a resource needs to be checkpointed. 
 Currently the only condition is {{isPersistentVolume}}. We'll update this to 
 include dynamically reserved resources.
 h3. Expected Outcome
 * The dynamically reserved resources will be persisted on the slave.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2475) Add the Resource::ReservationInfo protobuf message

2015-03-23 Thread Michael Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Park updated MESOS-2475:

Description: 
The {{Resource::ReservationInfo}} protobuf message encapsulates information 
needed to keep track of reservations. It's named {{ReservationInfo}} rather 
than {{Reservation}} to keep consistency with {{Resource::DiskInfo}}.

Here's essentially what it will look like in the end:

{code}
message ReservationInfo {
  // If this is set, it means that the resource is reserved for this particular
  // framework. Otherwise, the resource is reserved for the role.
  optional FrameworkID framework_id;

  // Indicates the principal of the operator or framework that created the
  // reservation. This is used to determine whether this resource can be 
  // unreserved by an operator or a framework by checking the
  // unreserve ACL.
  required string principal;

  // Anyone can set this ID at the time of reservation in order to keep track.
  optional string id;
}

// If this is set, this resource was dynamically reserved by an
// operator or a framework. Otherwise, this resource was
// statically configured by an operator via the --resources flag.
optional ReservationInfo reservation;
{code}

In v1, we'll only need to introduce {{framework_id}}. {{principal}} will be 
introduced along with the unreserved ACLs and {{id}} may be introduced in the 
future.

  was:
The {{Resource::ReservationInfo}} protobuf message encapsulates information 
needed to keep track of reservations. It's named {{ReservationInfo}} rather 
than {{Reservation}} to keep consistency with {{Resource::DiskInfo}}.

Here's essentially what it will look like in the end:

{code}
message ReservationInfo {
  // If this is set, it means that the resource is reserved for this particular
  // framework. Otherwise, the resource is reserved for the role.
  optional FrameworkID framework_id;

  // Indicates the principal of the operator or framework that created the
  // reservation. This is used to determine whether this resource can be 
  // unreserved by an operator or a framework by checking the
  // unreserve ACL.
  required string principal;

  // Anyone can set this ID at the time of reservation in order to keep track.
  optional string id;
}

// If this is set, this resource was dynamically reserved by an operator or 
// a framework. Otherwise, this resource was static configured by an
// operator via the --resources flag.
optional ReservationInfo reservation;
{code}

In v1, we'll only need to introduce {{framework_id}}. {{principal}} will be 
introduced along with the unreserved ACLs and {{id}} may be introduced in the 
future.


 Add the Resource::ReservationInfo protobuf message
 --

 Key: MESOS-2475
 URL: https://issues.apache.org/jira/browse/MESOS-2475
 Project: Mesos
  Issue Type: Technical task
Reporter: Michael Park
Assignee: Michael Park
  Labels: mesosphere

 The {{Resource::ReservationInfo}} protobuf message encapsulates information 
 needed to keep track of reservations. It's named {{ReservationInfo}} rather 
 than {{Reservation}} to keep consistency with {{Resource::DiskInfo}}.
 Here's essentially what it will look like in the end:
 {code}
 message ReservationInfo {
   // If this is set, it means that the resource is reserved for this 
 particular
   // framework. Otherwise, the resource is reserved for the role.
   optional FrameworkID framework_id;
   // Indicates the principal of the operator or framework that created the
   // reservation. This is used to determine whether this resource can be 
   // unreserved by an operator or a framework by checking the
   // unreserve ACL.
   required string principal;
   // Anyone can set this ID at the time of reservation in order to keep track.
   optional string id;
 }
 // If this is set, this resource was dynamically reserved by an
 // operator or a framework. Otherwise, this resource was
 // statically configured by an operator via the --resources flag.
 optional ReservationInfo reservation;
 {code}
 In v1, we'll only need to introduce {{framework_id}}. {{principal}} will be 
 introduced along with the unreserve ACLs, and {{id}} may be introduced in 
 the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2205) Add user documentation for reservations

2015-03-23 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377336#comment-14377336
 ] 

Michael Park commented on MESOS-2205:
-

[~nnielsen]: For collaboration I was thinking that the comment section on the 
gist might suffice, but if not I can move it to a google doc. I wrote it out in 
markdown because I would like to land it as an arch doc in the repo. I actually 
don't know which wiki you're referring to here. Anyway, do you think I should 
move it to a google doc?

 Add user documentation for reservations
 ---

 Key: MESOS-2205
 URL: https://issues.apache.org/jira/browse/MESOS-2205
 Project: Mesos
  Issue Type: Documentation
  Components: documentation, framework
Reporter: Michael Park
Assignee: Michael Park
  Labels: mesosphere

 Add a user guide for reservations which describes their basic usage, how 
 ACLs are used to specify who can unreserve whose resources, and a few 
 advanced use cases.





[jira] [Commented] (MESOS-2353) Improve performance of the master's state.json endpoint for large clusters.

2015-03-23 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376714#comment-14376714
 ] 

Benjamin Mahler commented on MESOS-2353:


[~alex-mesos] Are you planning to add the moves as well?

 Improve performance of the master's state.json endpoint for large clusters.
 ---

 Key: MESOS-2353
 URL: https://issues.apache.org/jira/browse/MESOS-2353
 Project: Mesos
  Issue Type: Improvement
  Components: master
Reporter: Benjamin Mahler
  Labels: newbie, twitter

 The master's state.json endpoint consistently takes a long time to compute 
 the JSON result, for large clusters:
 {noformat}
 $ time curl -s -o /dev/null localhost:5050/master/state.json
 Mon Jan 26 22:38:50 UTC 2015
 real  0m13.174s
 user  0m0.003s
 sys   0m0.022s
 {noformat}
 This can cause the master to get backlogged if there are many state.json 
 requests in flight.
 Looking at {{perf}} data, it seems most of the time is spent doing memory 
 allocation / de-allocation. This ticket will try to capture any low hanging 
 fruit to speed this up. Possibly we can leverage moves if they are not 
 already being used by the compiler.





[jira] [Assigned] (MESOS-2528) Symlink the namespace handle with ContainerID for the port mapping isolator.

2015-03-23 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu reassigned MESOS-2528:
-

Assignee: Jie Yu

 Symlink the namespace handle with ContainerID for the port mapping isolator.
 

 Key: MESOS-2528
 URL: https://issues.apache.org/jira/browse/MESOS-2528
 Project: Mesos
  Issue Type: Improvement
Reporter: Jie Yu
Assignee: Jie Yu

 This serves two purposes:
 1) Allows us to enter the network namespace using container ID (instead of 
 pid): ip netns exec ContainerID [commands] [args].
 2) Allows us to get container ID for orphan containers during recovery. This 
 will be helpful for solving MESOS-2367.
 The challenge here is to solve it in a backward compatible way. I propose to 
 create symlinks under /var/run/netns. For example:
 /var/run/netns/containerid -> /var/run/netns/12345
 (12345 is the pid)
 The old code will only remove the bind mounts and leave the symlinks, which I 
 think is fine since containerid is globally unique (uuid).





[jira] [Commented] (MESOS-2402) MesosContainerizerDestroyTest.LauncherDestroyFailure is flaky

2015-03-23 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376431#comment-14376431
 ] 

Vinod Kone commented on MESOS-2402:
---

As [~idownes] mentioned, the flags have to be set up properly for exec to 
function correctly. But that fix has nothing to do with the flakiness.

The real issue seems to be a race between the 'containerizer->wait()' future 
being set to failed and the metric being updated: the thread running the test 
may check the metric value after the containerizer future is set but before 
the metric is updated (see 'MesosContainerizerProcess::__destroy()'). The fix 
is to settle the clock to ensure the metric is updated.

 MesosContainerizerDestroyTest.LauncherDestroyFailure is flaky
 -

 Key: MESOS-2402
 URL: https://issues.apache.org/jira/browse/MESOS-2402
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Vinod Kone
Assignee: Vinod Kone

 Failed to os::execvpe in childMain. Never seen this one before.
 {code}
 [ RUN  ] MesosContainerizerDestroyTest.LauncherDestroyFailure
 Using temporary directory 
 '/tmp/MesosContainerizerDestroyTest_LauncherDestroyFailure_QpjQEn'
 I0224 18:55:49.326912 21391 containerizer.cpp:461] Starting container 
 'test_container' for executor 'executor' of framework ''
 I0224 18:55:49.332252 21391 launcher.cpp:130] Forked child with pid '23496' 
 for container 'test_container'
 ABORT: (src/subprocess.cpp:165): Failed to os::execvpe in childMain
 *** Aborted at 1424832949 (unix time) try date -d @1424832949 if you are 
 using GNU date ***
 PC: @ 0x2b178c5db0d5 (unknown)
 I0224 18:55:49.340955 21392 process.cpp:2117] Dropped / Lost event for PID: 
 scheduler-509d37ac-296f-4429-b101-af433c1800e9@127.0.1.1:39647
 I0224 18:55:49.342300 21386 containerizer.cpp:911] Destroying container 
 'test_container'
 *** SIGABRT (@0x3e85bc8) received by PID 23496 (TID 0x2b178f9f0700) from 
 PID 23496; stack trace: ***
 @ 0x2b178c397cb0 (unknown)
 @ 0x2b178c5db0d5 (unknown)
 @ 0x2b178c5de83b (unknown)
 @   0x87a945 _Abort()
 @ 0x2b1789f610b9 process::childMain()
 I0224 18:55:49.391793 21386 containerizer.cpp:1120] Executor for container 
 'test_container' has exited
 I0224 18:55:49.400478 21391 process.cpp:2770] Handling HTTP event for process 
 'metrics' with path: '/metrics/snapshot'
 tests/containerizer_tests.cpp:485: Failure
 Value of: metrics.values[containerizer/mesos/container_destroy_errors]
   Actual: 16-byte object 02-00 00-00 17-2B 00-00 E0-86 0E-04 00-00 00-00
 Expected: 1u
 Which is: 1
 [  FAILED  ] MesosContainerizerDestroyTest.LauncherDestroyFailure (89 ms)
 {code}





[jira] [Updated] (MESOS-2528) Symlink the namespace handle with ContainerID for the port mapping isolator.

2015-03-23 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-2528:
--
  Sprint: Twitter Mesos Q1 Sprint 5
Story Points: 3

 Symlink the namespace handle with ContainerID for the port mapping isolator.
 

 Key: MESOS-2528
 URL: https://issues.apache.org/jira/browse/MESOS-2528
 Project: Mesos
  Issue Type: Improvement
Reporter: Jie Yu
Assignee: Jie Yu

 This serves two purposes:
 1) Allows us to enter the network namespace using container ID (instead of 
 pid): ip netns exec ContainerID [commands] [args].
 2) Allows us to get container ID for orphan containers during recovery. This 
 will be helpful for solving MESOS-2367.
 The challenge here is to solve it in a backward compatible way. I propose to 
 create symlinks under /var/run/netns. For example:
 /var/run/netns/containerid -> /var/run/netns/12345
 (12345 is the pid)
 The old code will only remove the bind mounts and leave the symlinks, which I 
 think is fine since containerid is globally unique (uuid).





[jira] [Commented] (MESOS-2529) fetch hdfs executor failed with sh: hadoop: command not found

2015-03-23 Thread Littlestar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375466#comment-14375466
 ] 

Littlestar commented on MESOS-2529:
---

/usr/bin/env: bash: No such file or directory
===
Each Mesos slave node has Java and a Hadoop DataNode installed.
mesos-master-env.sh and mesos-slave-env.sh have the following settings:

export MESOS_JAVA_HOME=/home/test/jdk
export MESOS_HADOOP_HOME=/home/test/hadoop-2.4.0
export 
MESOS_PATH=/home/test/jdk/bin:/home/test/hadoop-2.4.0/sbin:/home/test/hadoop-2.4.0/bin:/sbin:/bin:/usr/sbin:/usr/bin

thanks.

 fetch hdfs executor failed with sh: hadoop: command not found
 ---

 Key: MESOS-2529
 URL: https://issues.apache.org/jira/browse/MESOS-2529
 Project: Mesos
  Issue Type: Bug
  Components: fetcher
Affects Versions: 0.21.1
Reporter: Littlestar

 fetch hdfs executor failed with sh: hadoop: command not found
 I set HADOOP_HOME and PATH in /etc/profile, but it does not work.
 WARNING: Logging before InitGoogleLogging() is written to STDERR
 I0323 11:46:41.758134  9312 fetcher.cpp:76] Fetching URI 
 'hdfs://192.168.1.9:54310/home/test/spark-1.3.0-bin-2.4.0.tar.gz'
 I0323 11:46:41.758301  9312 fetcher.cpp:105] Downloading resource from 
 'hdfs://192.168.1.9:54310/home/test/spark-1.3.0-bin-2.4.0.tar.gz' to 
 '/home/mesos/work_dir/slaves/20150323-100710-1214949568-5050-3453-S1/frameworks/20150323-114534-1214949568-5050-12082-/executors/20150323-100710-1214949568-5050-3453-S1/runs/5bb19ef7-483a-4871-aa2a-cb18796775e9/spark-1.3.0-bin-2.4.0.tar.gz'
 E0323 11:46:41.762511  9312 fetcher.cpp:109] HDFS copyToLocal failed: hadoop 
 fs -copyToLocal 
 'hdfs://192.168.1.9:54310/home/test/spark-1.3.0-bin-2.4.0.tar.gz' 
 '/home/mesos/work_dir/slaves/20150323-100710-1214949568-5050-3453-S1/frameworks/20150323-114534-1214949568-5050-12082-/executors/20150323-100710-1214949568-5050-3453-S1/runs/5bb19ef7-483a-4871-aa2a-cb18796775e9/spark-1.3.0-bin-2.4.0.tar.gz'
 sh: hadoop: command not found
 Failed to fetch: 
 hdfs://192.168.1.9:54310/home/test/spark-1.3.0-bin-2.4.0.tar.gz
 Failed to synchronize with slave (it's probably exited)





[jira] [Created] (MESOS-2532) UserCgroupIsolatorTest failures due to: Failed to prepare isolator: cgroup already exists

2015-03-23 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-2532:
--

 Summary: UserCgroupIsolatorTest failures due to: Failed to 
prepare isolator: cgroup already exists
 Key: MESOS-2532
 URL: https://issues.apache.org/jira/browse/MESOS-2532
 Project: Mesos
  Issue Type: Bug
  Components: test
Affects Versions: 0.22.0
Reporter: Benjamin Mahler


This is on a CentOS machine:

{code:title=sudo make check -j24 MESOS_VERBOSE=1 GLOG_v=1 GTEST_FILTER=UserCgroupIsolatorTest*}
-
We cannot run any cgroups tests that require mounting
hierarchies because you have the following hierarchies mounted:
/sys/fs/cgroup/cpu, /sys/fs/cgroup/cpuacct, /sys/fs/cgroup/freezer, 
/sys/fs/cgroup/memory, /sys/fs/cgroup/perf_event
We'll disable the CgroupsNoHierarchyTest test fixture for now.
-
-
We cannot run any Docker tests because:
Failed to execute 'docker version': exited with status 127
-
Note: Google Test filter = 
UserCgroupIsolatorTest*-DockerContainerizerTest.ROOT_DOCKER_Launch_Executor:DockerContainerizerTest.ROOT_DOCKER_Launch_Executor_Bridged:DockerContainerizerTest.ROOT_DOCKER_Launch:DockerContainerizerTest.ROOT_DOCKER_Kill:DockerContainerizerTest.ROOT_DOCKER_Usage:DockerContainerizerTest.ROOT_DOCKER_Update:DockerContainerizerTest.DISABLED_ROOT_DOCKER_Recover:DockerContainerizerTest.ROOT_DOCKER_Logs:DockerContainerizerTest.ROOT_DOCKER_Default_CMD:DockerContainerizerTest.ROOT_DOCKER_Default_CMD_Override:DockerContainerizerTest.ROOT_DOCKER_Default_CMD_Args:DockerContainerizerTest.ROOT_DOCKER_SlaveRecoveryTaskContainer:DockerContainerizerTest.DISABLED_ROOT_DOCKER_SlaveRecoveryExecutorContainer:DockerContainerizerTest.ROOT_DOCKER_PortMapping:DockerContainerizerTest.ROOT_DOCKER_LaunchSandboxWithColon:DockerContainerizerTest.ROOT_DOCKER_DestroyWhileFetching:DockerContainerizerTest.ROOT_DOCKER_DestroyWhilePulling:DockerTest.ROOT_DOCKER_interface:DockerTest.ROOT_DOCKER_CheckCommandWithShell:DockerTest.ROOT_DOCKER_CheckPortResource:DockerTest.ROOT_DOCKER_CancelPull:CgroupsNoHierarchyTest.ROOT_CGROUPS_NOHIERARCHY_MountUnmountHierarchy:SlaveCount/Registrar_BENCHMARK_Test.performance/0:SlaveCount/Registrar_BENCHMARK_Test.performance/1:SlaveCount/Registrar_BENCHMARK_Test.performance/2:SlaveCount/Registrar_BENCHMARK_Test.performance/3
[==] Running 3 tests from 3 test cases.
[--] Global test environment set-up.
[--] 1 test from UserCgroupIsolatorTest/0, where TypeParam = 
mesos::internal::slave::CgroupsMemIsolatorProcess
userdel: user mesos.test.unprivileged.user does not exist
[ RUN  ] UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup
Using temporary directory 
'/tmp/UserCgroupIsolatorTest_0_ROOT_CGROUPS_UserCgroup_ASJu3B'
../../src/tests/isolator_tests.cpp:1067: Failure
(isolator.get()->prepare( containerId, executorInfo, os::getcwd(), 
UNPRIVILEGED_USERNAME)).failure(): Failed to prepare isolator: cgroup already 
exists
[  FAILED  ] UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup, where TypeParam 
= mesos::internal::slave::CgroupsMemIsolatorProcess (18 ms)
[--] 1 test from UserCgroupIsolatorTest/0 (18 ms total)

[--] 1 test from UserCgroupIsolatorTest/1, where TypeParam = 
mesos::internal::slave::CgroupsCpushareIsolatorProcess
userdel: user mesos.test.unprivileged.user does not exist
[ RUN  ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup
Using temporary directory 
'/tmp/UserCgroupIsolatorTest_1_ROOT_CGROUPS_UserCgroup_VIwHI4'
../../src/tests/isolator_tests.cpp:1067: Failure
(isolator.get()->prepare( containerId, executorInfo, os::getcwd(), 
UNPRIVILEGED_USERNAME)).failure(): Failed to prepare isolator: cgroup already 
exists
[  FAILED  ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup, where TypeParam 
= mesos::internal::slave::CgroupsCpushareIsolatorProcess (11 ms)
[--] 1 test from UserCgroupIsolatorTest/1 (12 ms total)

[--] 1 test from UserCgroupIsolatorTest/2, where TypeParam = 
mesos::internal::slave::CgroupsPerfEventIsolatorProcess
userdel: user mesos.test.unprivileged.user does not exist
[ RUN  ] UserCgroupIsolatorTest/2.ROOT_CGROUPS_UserCgroup
Using temporary directory 
'/tmp/UserCgroupIsolatorTest_2_ROOT_CGROUPS_UserCgroup_Cm2jhz'
I0323 20:47:15.297801  2047 perf_event.cpp:71] Creating PerfEvent isolator
I0323 20:47:15.312007  2047 perf_event.cpp:109] PerfEvent isolator will profile 
for 10secs every 1mins for events: { cpu-cycles }
I0323 20:47:15.312500  2069 perf_event.cpp:221] Preparing perf event cgroup for 
container
../../src/tests/isolator_tests.cpp:1067: Failure
(isolator.get()->prepare( containerId, executorInfo, os::getcwd(), 
UNPRIVILEGED_USERNAME)).failure(): Failed to prepare isolator: cgroup already 
exists
[  

[jira] [Commented] (MESOS-2532) UserCgroupIsolatorTest failures due to: Failed to prepare isolator: cgroup already exists

2015-03-23 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376624#comment-14376624
 ] 

Benjamin Mahler commented on MESOS-2532:


I looked into this with [~jieyu]; it looks like these tests generate their own 
non-unique container IDs, which can get left over after the tests complete:

{code}
TYPED_TEST(UserCgroupIsolatorTest, ROOT_CGROUPS_UserCgroup)
{
  // ...

  ContainerID containerId;
  containerId.set_value("container");
{code}

{noformat}
[bmahler@smfd-atr-11-sr1 build]$ ls -l /sys/fs/cgroup/cpu/mesos/container
total 0
-rw-r--r-- 1 root root 0 Jan  8 18:29 cgroup.clone_children
--w--w--w- 1 root root 0 Jan  8 18:29 cgroup.event_control
-rw-r--r-- 1 root root 0 Jan  8 18:29 cgroup.procs
-rw-r--r-- 1 root root 0 Jan  8 18:29 cpu.cfs_period_us
-rw-r--r-- 1 root root 0 Jan  8 18:29 cpu.cfs_quota_us
-rw-r--r-- 1 root root 0 Jan  8 18:29 cpu.rt_period_us
-rw-r--r-- 1 root root 0 Jan  8 18:29 cpu.rt_runtime_us
-rw-r--r-- 1 root root 0 Jan  8 18:29 cpu.shares
-r--r--r-- 1 root root 0 Jan  8 18:29 cpu.stat
-rw-r--r-- 1 root root 0 Jan  8 18:29 notify_on_release
-rw-r--r-- 1 root root 0 Jan  8 18:29 tasks
{noformat}

 UserCgroupIsolatorTest failures due to: Failed to prepare isolator: cgroup 
 already exists
 ---

 Key: MESOS-2532
 URL: https://issues.apache.org/jira/browse/MESOS-2532
 Project: Mesos
  Issue Type: Bug
  Components: test
Affects Versions: 0.22.0
Reporter: Benjamin Mahler
  Labels: twitter

 This is on a CentOS machine:
 {code:title=sudo make check -j24 MESOS_VERBOSE=1 GLOG_v=1 GTEST_FILTER=UserCgroupIsolatorTest*}
 -
 We cannot run any cgroups tests that require mounting
 hierarchies because you have the following hierarchies mounted:
 /sys/fs/cgroup/cpu, /sys/fs/cgroup/cpuacct, /sys/fs/cgroup/freezer, 
 /sys/fs/cgroup/memory, /sys/fs/cgroup/perf_event
 We'll disable the CgroupsNoHierarchyTest test fixture for now.
 -
 -
 We cannot run any Docker tests because:
 Failed to execute 'docker version': exited with status 127
 -
 Note: Google Test filter = 
 UserCgroupIsolatorTest*-DockerContainerizerTest.ROOT_DOCKER_Launch_Executor:DockerContainerizerTest.ROOT_DOCKER_Launch_Executor_Bridged:DockerContainerizerTest.ROOT_DOCKER_Launch:DockerContainerizerTest.ROOT_DOCKER_Kill:DockerContainerizerTest.ROOT_DOCKER_Usage:DockerContainerizerTest.ROOT_DOCKER_Update:DockerContainerizerTest.DISABLED_ROOT_DOCKER_Recover:DockerContainerizerTest.ROOT_DOCKER_Logs:DockerContainerizerTest.ROOT_DOCKER_Default_CMD:DockerContainerizerTest.ROOT_DOCKER_Default_CMD_Override:DockerContainerizerTest.ROOT_DOCKER_Default_CMD_Args:DockerContainerizerTest.ROOT_DOCKER_SlaveRecoveryTaskContainer:DockerContainerizerTest.DISABLED_ROOT_DOCKER_SlaveRecoveryExecutorContainer:DockerContainerizerTest.ROOT_DOCKER_PortMapping:DockerContainerizerTest.ROOT_DOCKER_LaunchSandboxWithColon:DockerContainerizerTest.ROOT_DOCKER_DestroyWhileFetching:DockerContainerizerTest.ROOT_DOCKER_DestroyWhilePulling:DockerTest.ROOT_DOCKER_interface:DockerTest.ROOT_DOCKER_CheckCommandWithShell:DockerTest.ROOT_DOCKER_CheckPortResource:DockerTest.ROOT_DOCKER_CancelPull:CgroupsNoHierarchyTest.ROOT_CGROUPS_NOHIERARCHY_MountUnmountHierarchy:SlaveCount/Registrar_BENCHMARK_Test.performance/0:SlaveCount/Registrar_BENCHMARK_Test.performance/1:SlaveCount/Registrar_BENCHMARK_Test.performance/2:SlaveCount/Registrar_BENCHMARK_Test.performance/3
 [==] Running 3 tests from 3 test cases.
 [--] Global test environment set-up.
 [--] 1 test from UserCgroupIsolatorTest/0, where TypeParam = 
 mesos::internal::slave::CgroupsMemIsolatorProcess
 userdel: user mesos.test.unprivileged.user does not exist
 [ RUN  ] UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup
 Using temporary directory 
 '/tmp/UserCgroupIsolatorTest_0_ROOT_CGROUPS_UserCgroup_ASJu3B'
 ../../src/tests/isolator_tests.cpp:1067: Failure
 (isolator.get()->prepare( containerId, executorInfo, os::getcwd(), 
 UNPRIVILEGED_USERNAME)).failure(): Failed to prepare isolator: cgroup already 
 exists
 [  FAILED  ] UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup, where 
 TypeParam = mesos::internal::slave::CgroupsMemIsolatorProcess (18 ms)
 [--] 1 test from UserCgroupIsolatorTest/0 (18 ms total)
 [--] 1 test from UserCgroupIsolatorTest/1, where TypeParam = 
 mesos::internal::slave::CgroupsCpushareIsolatorProcess
 userdel: user mesos.test.unprivileged.user does not exist
 [ RUN  ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup
 Using temporary directory 
 

[jira] [Updated] (MESOS-2528) Symlink the namespace handle with ContainerID for the port mapping isolator.

2015-03-23 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-2528:
--
Labels: twitter  (was: )

 Symlink the namespace handle with ContainerID for the port mapping isolator.
 

 Key: MESOS-2528
 URL: https://issues.apache.org/jira/browse/MESOS-2528
 Project: Mesos
  Issue Type: Improvement
Reporter: Jie Yu
Assignee: Jie Yu
  Labels: twitter

 This serves two purposes:
 1) Allows us to enter the network namespace using container ID (instead of 
 pid): ip netns exec ContainerID [commands] [args].
 2) Allows us to get container ID for orphan containers during recovery. This 
 will be helpful for solving MESOS-2367.
 The challenge here is to solve it in a backward compatible way. I propose to 
 create symlinks under /var/run/netns. For example:
 /var/run/netns/containerid -> /var/run/netns/12345
 (12345 is the pid)
 The old code will only remove the bind mounts and leave the symlinks, which I 
 think is fine since containerid is globally unique (uuid).





[jira] [Updated] (MESOS-2514) Change the default leaf qdisc to fq_codel inside containers

2015-03-23 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-2514:
--
  Sprint: Twitter Mesos Q1 Sprint 5
Story Points: 1

 Change the default leaf qdisc to fq_codel inside containers
 ---

 Key: MESOS-2514
 URL: https://issues.apache.org/jira/browse/MESOS-2514
 Project: Mesos
  Issue Type: Bug
Reporter: Cong Wang
Assignee: Cong Wang
 Fix For: 0.23.0


 When we enable bandwidth cap, htb is used on egress side inside containers, 
 however, the default leaf qdisc for a htb class is still pfifo_fast, which is 
 known to have buffer bloat. Change the default leaf qdisc to fq_codel too:
 `tc qd add dev eth0 parent 1:1 fq_codel`
 I can no longer see packet drops after this change.





[jira] [Resolved] (MESOS-2514) Change the default leaf qdisc to fq_codel inside containers

2015-03-23 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu resolved MESOS-2514.
---
   Resolution: Fixed
Fix Version/s: 0.23.0

commit d82ec92073b0438589e7aa72e608c3dc334a8dd6
Author: Cong Wang cw...@twopensource.com
Date:   Mon Mar 23 11:33:09 2015 -0700

Changed default htb leaf qdisc to fq_codel in port mapping isolator.

Review: https://reviews.apache.org/r/32219

 Change the default leaf qdisc to fq_codel inside containers
 ---

 Key: MESOS-2514
 URL: https://issues.apache.org/jira/browse/MESOS-2514
 Project: Mesos
  Issue Type: Bug
Reporter: Cong Wang
Assignee: Cong Wang
 Fix For: 0.23.0


 When we enable bandwidth cap, htb is used on egress side inside containers, 
 however, the default leaf qdisc for a htb class is still pfifo_fast, which is 
 known to have buffer bloat. Change the default leaf qdisc to fq_codel too:
 `tc qd add dev eth0 parent 1:1 fq_codel`
 I can no longer see packet drops after this change.





[jira] [Closed] (MESOS-2531) Libmesos terminates JVM

2015-03-23 Thread Niklas Quarfot Nielsen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niklas Quarfot Nielsen closed MESOS-2531.
-
Resolution: Duplicate

 Libmesos terminates JVM
 ---

 Key: MESOS-2531
 URL: https://issues.apache.org/jira/browse/MESOS-2531
 Project: Mesos
  Issue Type: Bug
  Components: java api
Affects Versions: 0.23.0
Reporter: Michał Kiędyś
 Attachments: hs_err_pid98294.log


 I built Mesos from scratch using the code available on GitHub, revision 
 #a12242b.
 My Mesos cluster runs on MacOS and consists of one master and three slaves, 
 all running on the same computer but on different ports. ZooKeeper also runs 
 on the same computer.
 Later on I compiled Marathon, also using the latest version from GitHub, 
 revision #6decf76. Marathon uses the same ZooKeeper instance and successfully 
 connects to the Mesos cluster.
 After deploying a simple application that runs the {{sleep}} command for 120 
 seconds and scaling that application to ten instances, my Marathon crashed, 
 killed by the JVM after a SIGSEGV in libmesos-0.23.0.dylib.
 h4. Log
 {noformat}
 [2015-03-23 15:47:17,872] INFO Computed new deployment plan: 
 DeploymentPlan(2015-03-23T14:47:17.823Z, (Step(List(Scale(App(/bar, 
 Some(sleep 120))), 10) (mesosphere.marathon.upgrade.DeploymentPlan$:263)
 [2015-03-23 15:47:17,876] INFO Deployment acknowledged. Waiting to get 
 processed: DeploymentPlan(2015-03-23T14:47:17.823Z, 
 (Step(List(Scale(App(/bar, Some(sleep 120))), 10) 
 (mesosphere.marathon.state.GroupManager:142)
 [2015-03-23 15:47:17,877] INFO 127.0.0.1 -  -  [23/mar/2015:14:47:17 +] 
 PUT /v2/apps//bar HTTP/1.1 200 92 http://127.0.0.1:8080/; Mozilla/5.0 
 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) 
 Chrome/41.0.2272.89 Safari/537.36 (mesosphere.chaos.http.ChaosRequestLog:15)
 [2015-03-23 15:47:17,918] INFO 127.0.0.1 -  -  [23/mar/2015:14:47:17 +] 
 GET /v2/apps//bar/versions HTTP/1.1 200 68 http://127.0.0.1:8080/; 
 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, 
 like Gecko) Chrome/41.0.2272.89 Safari/537.36 
 (mesosphere.chaos.http.ChaosRequestLog:15)
 [2015-03-23 15:47:20,722] INFO 127.0.0.1 -  -  [23/mar/2015:14:47:20 +] 
 GET /v2/apps HTTP/1.1 200 592 http://127.0.0.1:8080/; Mozilla/5.0 
 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) 
 Chrome/41.0.2272.89 Safari/537.36 (mesosphere.chaos.http.ChaosRequestLog:15)
 [2015-03-23 15:47:20,782] INFO Received status update for task 
 bar.82501637-d16b-11e4-b7fa-aa4dda3d2dbb: TASK_RUNNING () 
 (mesosphere.marathon.MarathonScheduler:149)
 [2015-03-23 15:47:20,790] INFO 127.0.0.1 -  -  [23/mar/2015:14:47:20 +] 
 GET /v2/deployments HTTP/1.1 200 256 http://127.0.0.1:8080/; Mozilla/5.0 
 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) 
 Chrome/41.0.2272.89 Safari/537.36 (mesosphere.chaos.http.ChaosRequestLog:15)
 #
 # A fatal error has been detected by the Java Runtime Environment:
 #
 #  SIGSEGV (0xb) at pc=0x00012ec946f7, pid=98294, tid=27651
 #
 # JRE version: Java(TM) SE Runtime Environment (8.0-b132) (build 1.8.0-b132)
 # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.0-b70 mixed mode bsd-amd64 
 compressed oops)
 # Problematic frame:
 # C  [libmesos-0.23.0.dylib+0x7836f7]  
 process::Future<mesos::internal::state::Variable>::isFailed() const+0x17
 #
 # Failed to write core dump. Core dumps have been disabled. To enable core 
 dumping, try ulimit -c unlimited before starting Java again
 #
 # An error report file with more information is saved as:
 # /Users/mkiedys/Downloads/MESOS/marathon/hs_err_pid98294.log
 #
 # If you would like to submit a bug report, please visit:
 #   http://bugreport.sun.com/bugreport/crash.jsp
 # The crash happened outside the Java Virtual Machine in native code.
 # See problematic frame for where to report the bug.
 #
 Abort trap: 6
 {noformat}
 h4. Java
 java version 1.8.0
 Java(TM) SE Runtime Environment (build 1.8.0-b132)
 Java HotSpot(TM) 64-Bit Server VM (build 25.0-b70, mixed mode)
 h4. System Software Overview
 - System Version: OS X 10.10.2 (14C109)
 - Kernel Version: Darwin 14.1.0
 - Secure Virtual Memory: Enabled
 - Time since boot: 13 days 11:02





[jira] [Commented] (MESOS-2531) Libmesos terminates JVM

2015-03-23 Thread Niklas Quarfot Nielsen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376403#comment-14376403
 ] 

Niklas Quarfot Nielsen commented on MESOS-2531:
---

From the stack trace, it looks like the state bug we have traced down 
(https://issues.apache.org/jira/browse/MESOS-2161), and a fix is in review: 
https://reviews.apache.org/r/32152/

Will mark as duplicate for now. Sorry for the inconvenience, and thanks for 
reporting the issue!
Niklas

{code}
Stack: [0x00012d21d000,0x00012d29d000],  sp=0x00012d29af30,  free 
space=503k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libmesos-0.23.0.dylib+0x7836f7]  
process::Future<mesos::internal::state::Variable>::isFailed() const+0x17
C  [libmesos-0.23.0.dylib+0x1aa915a]  
Java_org_apache_mesos_state_AbstractState__1_1fetch_1get_1timeout+0xea
J 5808  
org.apache.mesos.state.AbstractState.__fetch_get_timeout(JJLjava/util/concurrent/TimeUnit;)Lorg/apache/mesos/state/Variable;
 (0 bytes) @ 0x00010db89402 [0x00010db89340+0xc2]
J 6339 C1 
mesosphere.marathon.tasks.TaskTracker$$anonfun$fetchFromState$1.apply()Lorg/apache/mesos/state/Variable;
 (51 bytes) @ 0x00010dcff17c [0x00010dcfec00+0x57c]
J 6338 C1 
mesosphere.marathon.tasks.TaskTracker$$anonfun$fetchFromState$1.apply()Ljava/lang/Object;
 (5 bytes) @ 0x00010dcffaf4 [0x00010dcffa00+0xf4]
J 5007 C1 
mesosphere.marathon.state.StateMetrics$class.timed(Lmesosphere/marathon/state/StateMetrics;Lcom/codahale/metrics/Histogram;Lcom/codahale/metrics/Meter;Lcom/codahale/metrics/Meter;Lscala/Function0;)Ljava/lang/Object;
 (48 bytes) @ 0x00010d944744 [0x00010d9445a0+0x1a4]
J 5995 C1 
mesosphere.marathon.tasks.TaskTracker.fetchFromState(Ljava/lang/String;)Lorg/apache/mesos/state/Variable;
 (17 bytes) @ 0x00010dc05b9c [0x00010dc05840+0x35c]
{code}

 Libmesos terminates JVM
 ---

 Key: MESOS-2531
 URL: https://issues.apache.org/jira/browse/MESOS-2531
 Project: Mesos
  Issue Type: Bug
  Components: java api
Affects Versions: 0.23.0
Reporter: Michał Kiędyś
 Attachments: hs_err_pid98294.log


 I built Mesos from scratch using the code available on GitHub, revision 
 #a12242b.
 My Mesos cluster runs on MacOS and consists of one master and three slaves, 
 all running on the same computer but on different ports. ZooKeeper also runs 
 on the same computer.
 Later on I compiled Marathon, also using the latest version from GitHub, 
 revision #6decf76. Marathon uses the same ZooKeeper instance and successfully 
 connects to the Mesos cluster.
 After deploying a simple application that runs the {{sleep}} command for 120 
 seconds and scaling that application to ten instances, my Marathon crashed, 
 killed by the JVM after a SIGSEGV in libmesos-0.23.0.dylib.
 h4. Log
 {noformat}
 [2015-03-23 15:47:17,872] INFO Computed new deployment plan: 
 DeploymentPlan(2015-03-23T14:47:17.823Z, (Step(List(Scale(App(/bar, 
 Some(sleep 120))), 10) (mesosphere.marathon.upgrade.DeploymentPlan$:263)
 [2015-03-23 15:47:17,876] INFO Deployment acknowledged. Waiting to get 
 processed: DeploymentPlan(2015-03-23T14:47:17.823Z, 
 (Step(List(Scale(App(/bar, Some(sleep 120))), 10) 
 (mesosphere.marathon.state.GroupManager:142)
 [2015-03-23 15:47:17,877] INFO 127.0.0.1 -  -  [23/mar/2015:14:47:17 +] 
 PUT /v2/apps//bar HTTP/1.1 200 92 http://127.0.0.1:8080/; Mozilla/5.0 
 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) 
 Chrome/41.0.2272.89 Safari/537.36 (mesosphere.chaos.http.ChaosRequestLog:15)
 [2015-03-23 15:47:17,918] INFO 127.0.0.1 -  -  [23/mar/2015:14:47:17 +] 
 GET /v2/apps//bar/versions HTTP/1.1 200 68 http://127.0.0.1:8080/; 
 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, 
 like Gecko) Chrome/41.0.2272.89 Safari/537.36 
 (mesosphere.chaos.http.ChaosRequestLog:15)
 [2015-03-23 15:47:20,722] INFO 127.0.0.1 -  -  [23/mar/2015:14:47:20 +] 
 GET /v2/apps HTTP/1.1 200 592 http://127.0.0.1:8080/; Mozilla/5.0 
 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) 
 Chrome/41.0.2272.89 Safari/537.36 (mesosphere.chaos.http.ChaosRequestLog:15)
 [2015-03-23 15:47:20,782] INFO Received status update for task 
 bar.82501637-d16b-11e4-b7fa-aa4dda3d2dbb: TASK_RUNNING () 
 (mesosphere.marathon.MarathonScheduler:149)
 [2015-03-23 15:47:20,790] INFO 127.0.0.1 -  -  [23/mar/2015:14:47:20 +] 
 GET /v2/deployments HTTP/1.1 200 256 http://127.0.0.1:8080/; Mozilla/5.0 
 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) 
 Chrome/41.0.2272.89 Safari/537.36 (mesosphere.chaos.http.ChaosRequestLog:15)
 #
 # A fatal error has been detected by the Java Runtime Environment:
 #
 #  SIGSEGV (0xb) at pc=0x00012ec946f7, pid=98294, tid=27651
 #
 # JRE version: Java(TM) SE Runtime Environment 

[jira] [Commented] (MESOS-2425) TODO comment in mesos.proto is already implemented

2015-03-23 Thread Niklas Quarfot Nielsen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14376854#comment-14376854
 ] 

Niklas Quarfot Nielsen commented on MESOS-2425:
---

https://reviews.apache.org/r/31637/

 TODO comment in mesos.proto is already implemented
 --

 Key: MESOS-2425
 URL: https://issues.apache.org/jira/browse/MESOS-2425
 Project: Mesos
  Issue Type: Bug
  Components: general
Affects Versions: 0.20.1
Reporter: Aaron Bell
Assignee: Aaron Bell
Priority: Minor
  Labels: mesosphere
 Attachments: mesos-2425-1.diff


 These lines are redundant in mesos.proto, since CommandInfo is now 
 implemented:
 https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto#L169-L174
 I'm creating a patch with edits on comment lines only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2425) TODO comment in mesos.proto is already implemented

2015-03-23 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14376853#comment-14376853
 ] 

Adam B commented on MESOS-2425:
---

Patch uploaded: https://reviews.apache.org/r/31637/diff/#

 TODO comment in mesos.proto is already implemented
 --

 Key: MESOS-2425
 URL: https://issues.apache.org/jira/browse/MESOS-2425
 Project: Mesos
  Issue Type: Bug
  Components: general
Affects Versions: 0.20.1
Reporter: Aaron Bell
Assignee: Aaron Bell
Priority: Minor
  Labels: mesosphere
 Attachments: mesos-2425-1.diff


 These lines are redundant in mesos.proto, since CommandInfo is now 
 implemented:
 https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto#L169-L174
 I'm creating a patch with edits on comment lines only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-2353) Improve performance of the master's state.json endpoint for large clusters.

2015-03-23 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler reassigned MESOS-2353:
--

Assignee: Benjamin Mahler

[~alex-mesos] I'll take up the move change.

 Improve performance of the master's state.json endpoint for large clusters.
 ---

 Key: MESOS-2353
 URL: https://issues.apache.org/jira/browse/MESOS-2353
 Project: Mesos
  Issue Type: Improvement
  Components: master
Reporter: Benjamin Mahler
Assignee: Benjamin Mahler
  Labels: newbie, twitter

 The master's state.json endpoint consistently takes a long time to compute 
 the JSON result, for large clusters:
 {noformat}
 $ time curl -s -o /dev/null localhost:5050/master/state.json
 Mon Jan 26 22:38:50 UTC 2015
 real  0m13.174s
 user  0m0.003s
 sys   0m0.022s
 {noformat}
 This can cause the master to get backlogged if there are many state.json 
 requests in flight.
 Looking at {{perf}} data, it seems most of the time is spent doing memory 
 allocation / de-allocation. This ticket will try to capture any low hanging 
 fruit to speed this up. Possibly we can leverage moves if they are not 
 already being used by the compiler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2205) Add user documentation for reservations

2015-03-23 Thread Niklas Quarfot Nielsen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14376994#comment-14376994
 ] 

Niklas Quarfot Nielsen commented on MESOS-2205:
---

Hi [~mcypark] - how do you want to collaborate (and land) on this? As an arch 
doc in the repo or somewhere on the wiki? :)

 Add user documentation for reservations
 ---

 Key: MESOS-2205
 URL: https://issues.apache.org/jira/browse/MESOS-2205
 Project: Mesos
  Issue Type: Documentation
  Components: documentation, framework
Reporter: Michael Park
Assignee: Michael Park
  Labels: mesosphere

 Add a user guide for reservations which describes basic usage of them, how 
 ACLs are used to specify who can unreserve whose resources, and few advanced 
 usage cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2165) When cyrus sasl MD5 isn't installed configure passes, tests fail without any output

2015-03-23 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2165:
--
Shepherd: Adam B

 When cyrus sasl MD5 isn't installed configure passes, tests fail without any 
 output
 ---

 Key: MESOS-2165
 URL: https://issues.apache.org/jira/browse/MESOS-2165
 Project: Mesos
  Issue Type: Bug
Reporter: Cody Maloney
Assignee: Till Toenshoff
  Labels: mesosphere

 Sample Dockerfile to make such a host:
 {code}
 FROM centos:centos7
 RUN yum install -y epel-release gcc python-devel
 RUN yum install -y python-pip
 RUN yum install -y rpm-build redhat-rpm-config autoconf make gcc gcc-c++ 
 patch libtool git python-devel ruby-devel java-1.7.0-openjdk-devel zlib-devel 
 libcurl-devel openssl-devel cyrus-sasl-devel rubygems apr-devel 
 apr-util-devel subversion-devel maven libselinux-python
 {code}
 Use: 'docker run -i -t imagename /bin/bash' to run the image, get a shell 
 inside where you can 'git clone' mesos and build/run the tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2510) Add a function which test if a JSON object is contained in another JSON object

2015-03-23 Thread Niklas Quarfot Nielsen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niklas Quarfot Nielsen updated MESOS-2510:
--
Sprint: Mesosphere Q1 Sprint 6 - 4/3

 Add a function which test if a JSON object is contained in another JSON object
 --

 Key: MESOS-2510
 URL: https://issues.apache.org/jira/browse/MESOS-2510
 Project: Mesos
  Issue Type: Wish
  Components: stout
Reporter: Alexander Rojas
Assignee: Alexander Rojas

 It would be nice to check whether one JSON blob is contained in another blob, 
 i.e. given the JSON blob {{a}} and the blob {{b}}, {{a}} contains {{b}} if 
 every key {{x}} in {{b}} is also in {{a}}, and {{b\[x\] == a\[x\]}} if 
 {{b\[x\]}} is not a JSON object itself or, if it is a JSON object, {{a\[x\]}} 
 contains {{b\[x\]}}.
 h3. Rationale
 One of the most useful patterns while testing functions which return JSON is 
 to write the expected result and then compare the expected blob to the 
 returned one:
 {code}
 JSON::Value expected = JSON::parse(
 "{"
 "  \"key\" : true"
 "}").get();
 JSON::Value actual = foo();
 CHECK_EQ(expected, actual);
 {code}
 As can be seen in the example above, it is easy to read what the expected 
 value is, and checking for failures is fairly easy. 
 It is not easy, however, to compare returned blobs which contain at least one 
 random value (for example, time stamps), or a value which is uninteresting 
 for the test. In such cases it is necessary to extract each value separately 
 and compare them:
 {code}
 // Returned json:
 // {
 //   uptime : 45234.123,
 //   key : true
 // }
 JSON::Value actual = bar();
 // I'm only interested in the key entry.
 EXPECT_SOME_EQ(true, actual.find<JSON::String>("key"));
 {code}
 As seen above, if one is only interested in a subset of the key/value pairs 
 returned by {{bar}}, the readability of the code decreases severely. It would 
 be even worse if it weren't for the comments.
 The aim is to achieve the same level of readability as the first example 
 while covering the case of the second:
 {code}
 JSON::Value expected = JSON::parse(
 "{"
 "  \"key\" : true"
 "}").get();
 // Returned json:
 // {
 //   uptime : 45234.123,
 //   key : true
 // }
 JSON::Value actual = bar();
 // I'm only interested in the key entry and ignore the rest.
 EXPECT_TRUE(contains(actual, expected));
 {code}
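The containment semantics described above are easy to prototype. Below is a minimal sketch using plain Python dicts in place of stout's {{JSON::Value}}; only the name {{contains}} and the semantics come from this ticket, the rest is illustrative:

```python
def contains(a, b):
    """Return True if `a` contains `b`: every key `x` in `b` is also in
    `a`, and the values are equal (or, for nested objects, recursively
    contained)."""
    if isinstance(b, dict):
        if not isinstance(a, dict):
            return False
        return all(k in a and contains(a[k], b[k]) for k in b)
    return a == b

# What bar() might return vs. the only entry the test cares about.
actual = {"uptime": 45234.123, "key": True}
expected = {"key": True}

assert contains(actual, expected)
assert not contains(actual, {"key": False})
```

A C++ version over {{JSON::Object}} would follow the same recursion, comparing leaf values with operator==.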



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2353) Improve performance of the master's state.json endpoint for large clusters.

2015-03-23 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-2353:
---
Sprint: Twitter Mesos Q1 Sprint 5

 Improve performance of the master's state.json endpoint for large clusters.
 ---

 Key: MESOS-2353
 URL: https://issues.apache.org/jira/browse/MESOS-2353
 Project: Mesos
  Issue Type: Improvement
  Components: master
Reporter: Benjamin Mahler
Assignee: Benjamin Mahler
  Labels: newbie, twitter

 The master's state.json endpoint consistently takes a long time to compute 
 the JSON result, for large clusters:
 {noformat}
 $ time curl -s -o /dev/null localhost:5050/master/state.json
 Mon Jan 26 22:38:50 UTC 2015
 real  0m13.174s
 user  0m0.003s
 sys   0m0.022s
 {noformat}
 This can cause the master to get backlogged if there are many state.json 
 requests in flight.
 Looking at {{perf}} data, it seems most of the time is spent doing memory 
 allocation / de-allocation. This ticket will try to capture any low hanging 
 fruit to speed this up. Possibly we can leverage moves if they are not 
 already being used by the compiler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2353) Improve performance of the master's state.json endpoint for large clusters.

2015-03-23 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377036#comment-14377036
 ] 

Benjamin Mahler commented on MESOS-2353:


https://reviews.apache.org/r/32419/

 Improve performance of the master's state.json endpoint for large clusters.
 ---

 Key: MESOS-2353
 URL: https://issues.apache.org/jira/browse/MESOS-2353
 Project: Mesos
  Issue Type: Improvement
  Components: master
Reporter: Benjamin Mahler
Assignee: Benjamin Mahler
  Labels: newbie, twitter

 The master's state.json endpoint consistently takes a long time to compute 
 the JSON result, for large clusters:
 {noformat}
 $ time curl -s -o /dev/null localhost:5050/master/state.json
 Mon Jan 26 22:38:50 UTC 2015
 real  0m13.174s
 user  0m0.003s
 sys   0m0.022s
 {noformat}
 This can cause the master to get backlogged if there are many state.json 
 requests in flight.
 Looking at {{perf}} data, it seems most of the time is spent doing memory 
 allocation / de-allocation. This ticket will try to capture any low hanging 
 fruit to speed this up. Possibly we can leverage moves if they are not 
 already being used by the compiler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (MESOS-2425) TODO comment in mesos.proto is already implemented

2015-03-23 Thread Niklas Quarfot Nielsen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niklas Quarfot Nielsen updated MESOS-2425:
--
Comment: was deleted

(was: https://reviews.apache.org/r/31637/)

 TODO comment in mesos.proto is already implemented
 --

 Key: MESOS-2425
 URL: https://issues.apache.org/jira/browse/MESOS-2425
 Project: Mesos
  Issue Type: Bug
  Components: general
Affects Versions: 0.20.1
Reporter: Aaron Bell
Assignee: Aaron Bell
Priority: Minor
  Labels: mesosphere
 Attachments: mesos-2425-1.diff


 These lines are redundant in mesos.proto, since CommandInfo is now 
 implemented:
 https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto#L169-L174
 I'm creating a patch with edits on comment lines only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-2533) Support HTTP checks in Mesos health check program

2015-03-23 Thread Niklas Quarfot Nielsen (JIRA)
Niklas Quarfot Nielsen created MESOS-2533:
-

 Summary: Support HTTP checks in Mesos health check program
 Key: MESOS-2533
 URL: https://issues.apache.org/jira/browse/MESOS-2533
 Project: Mesos
  Issue Type: Bug
Reporter: Niklas Quarfot Nielsen


Currently, only commands are supported, but our health check protobuf enables 
users to encode HTTP checks as well. We should wire this up in the health check 
program or remove the http field from the protobuf.
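As a rough illustration of what the HTTP side could look like (a sketch only: the 2xx-is-healthy convention and every name below are assumptions, not the actual health check program):

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen
from urllib.error import URLError

def http_health_check(url, timeout=5.0):
    """Return True iff the endpoint answers with a 2xx status."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return 200 <= response.getcode() < 300
    except (URLError, OSError):
        return False

# Tiny throwaway server standing in for a task's health endpoint.
class _Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), _Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

healthy = http_health_check("http://127.0.0.1:%d/" % server.server_address[1])
server.shutdown()
```

The real program would also need the retry/grace-period handling that the command path already has.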



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2534) PerfTest.ROOT_SampleInit test fails.

2015-03-23 Thread Ian Downes (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14376980#comment-14376980
 ] 

Ian Downes commented on MESOS-2534:
---

https://reviews.apache.org/r/32420/

 PerfTest.ROOT_SampleInit test fails.
 

 Key: MESOS-2534
 URL: https://issues.apache.org/jira/browse/MESOS-2534
 Project: Mesos
  Issue Type: Bug
  Components: test
Affects Versions: 0.22.0
Reporter: Benjamin Mahler
Assignee: Ian Downes
  Labels: twitter

 From MESOS-2300 as well, it looks like this test is not reliable:
 {code}
 [ RUN  ] PerfTest.ROOT_SampleInit
 ../../src/tests/perf_tests.cpp:147: Failure
 Expected: (0u) < (statistics.get().cycles()), actual: 0 vs 0
 ../../src/tests/perf_tests.cpp:150: Failure
 Expected: (0.0) < (statistics.get().task_clock()),
 {code}
 It looks like this test samples PID 1, which is either {{init}} or 
 {{systemd}}. Per a chat with [~idownes] this should probably sample something 
 that is guaranteed to be consuming cycles.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2534) PerfTest.ROOT_SampleInit test fails.

2015-03-23 Thread Ian Downes (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Downes updated MESOS-2534:
--
Sprint: Twitter Mesos Q1 Sprint 5

 PerfTest.ROOT_SampleInit test fails.
 

 Key: MESOS-2534
 URL: https://issues.apache.org/jira/browse/MESOS-2534
 Project: Mesos
  Issue Type: Bug
  Components: test
Affects Versions: 0.22.0
Reporter: Benjamin Mahler
Assignee: Ian Downes
  Labels: twitter

 From MESOS-2300 as well, it looks like this test is not reliable:
 {code}
 [ RUN  ] PerfTest.ROOT_SampleInit
 ../../src/tests/perf_tests.cpp:147: Failure
 Expected: (0u) < (statistics.get().cycles()), actual: 0 vs 0
 ../../src/tests/perf_tests.cpp:150: Failure
 Expected: (0.0) < (statistics.get().task_clock()),
 {code}
 It looks like this test samples PID 1, which is either {{init}} or 
 {{systemd}}. Per a chat with [~idownes] this should probably sample something 
 that is guaranteed to be consuming cycles.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2534) PerfTest.ROOT_SampleInit test fails.

2015-03-23 Thread Ian Downes (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Downes updated MESOS-2534:
--
Story Points: 2

 PerfTest.ROOT_SampleInit test fails.
 

 Key: MESOS-2534
 URL: https://issues.apache.org/jira/browse/MESOS-2534
 Project: Mesos
  Issue Type: Bug
  Components: test
Affects Versions: 0.22.0
Reporter: Benjamin Mahler
Assignee: Ian Downes
  Labels: twitter

 From MESOS-2300 as well, it looks like this test is not reliable:
 {code}
 [ RUN  ] PerfTest.ROOT_SampleInit
 ../../src/tests/perf_tests.cpp:147: Failure
 Expected: (0u) < (statistics.get().cycles()), actual: 0 vs 0
 ../../src/tests/perf_tests.cpp:150: Failure
 Expected: (0.0) < (statistics.get().task_clock()),
 {code}
 It looks like this test samples PID 1, which is either {{init}} or 
 {{systemd}}. Per a chat with [~idownes] this should probably sample something 
 that is guaranteed to be consuming cycles.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2477) Enable Resources::apply to handle reservation operations.

2015-03-23 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-2477:

Shepherd: Timothy Chen

 Enable Resources::apply to handle reservation operations.
 -

 Key: MESOS-2477
 URL: https://issues.apache.org/jira/browse/MESOS-2477
 Project: Mesos
  Issue Type: Technical task
Reporter: Michael Park
Assignee: Michael Park
  Labels: mesosphere

 {{Resources::apply}} currently only handles {{Create}} and {{Destroy}} 
 operations which exist for persistent volumes. We need to handle the 
 {{Reserve}} and {{Unreserve}} operations for dynamic reservations as well.
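A hypothetical sketch of what applying {{Reserve}}/{{Unreserve}} could mean, modeling resources as a (role, name) -> quantity pool; names and details are illustrative, not the actual {{Resources::apply}} implementation:

```python
def apply_reserve(pool, resource, amount, role):
    """Move `amount` of `resource` from the unreserved pool ("*") to `role`."""
    if pool.get(("*", resource), 0) < amount:
        raise ValueError("insufficient unreserved " + resource)
    pool[("*", resource)] = pool.get(("*", resource), 0) - amount
    pool[(role, resource)] = pool.get((role, resource), 0) + amount
    return pool

def apply_unreserve(pool, resource, amount, role):
    """Inverse of apply_reserve: return `amount` to the unreserved pool."""
    if pool.get((role, resource), 0) < amount:
        raise ValueError("insufficient reserved " + resource)
    pool[(role, resource)] -= amount
    pool[("*", resource)] = pool.get(("*", resource), 0) + amount
    return pool

pool = {("*", "cpus"): 8.0}
apply_reserve(pool, "cpus", 2.0, "ads")
assert pool == {("*", "cpus"): 6.0, ("ads", "cpus"): 2.0}
apply_unreserve(pool, "cpus", 2.0, "ads")
assert pool[("*", "cpus")] == 8.0
```

The key invariant, as with {{Create}}/{{Destroy}}, is that applying an operation never changes the total quantity of each resource.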



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2476) Enable Resources to handle Resource::ReservationInfo

2015-03-23 Thread Timothy Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Chen updated MESOS-2476:

Shepherd: Timothy Chen

 Enable Resources to handle Resource::ReservationInfo
 

 Key: MESOS-2476
 URL: https://issues.apache.org/jira/browse/MESOS-2476
 Project: Mesos
  Issue Type: Technical task
Reporter: Michael Park
Assignee: Michael Park
  Labels: mesosphere

 After [MESOS-2475|https://issues.apache.org/jira/browse/MESOS-2475], our C++ 
 {{Resources}} class needs to know how to handle {{Resource}} protobuf 
 messages that have the {{reservation}} field set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2300) Failing tests on 0.21.1 with Ubuntu 14.10 / Linux 3.16.0-23

2015-03-23 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14376902#comment-14376902
 ] 

Benjamin Mahler commented on MESOS-2300:


I've filed MESOS-2534 to address the perf test failure.

 Failing tests on 0.21.1 with Ubuntu 14.10 / Linux 3.16.0-23
 ---

 Key: MESOS-2300
 URL: https://issues.apache.org/jira/browse/MESOS-2300
 Project: Mesos
  Issue Type: Bug
  Components: test
Affects Versions: 0.21.1
 Environment: (Though the hostname of this box is {{docker1}}, this is 
 not running on a docker container. This box sits on vanilla hardware, and 
 happens to also be used as a docker server. Though not when I ran the 
 offending tests.)
 {code}
 huitseeker@docker1:~$  lsb_release -a
 No LSB modules are available.
 Distributor ID:   Ubuntu
 Description:  Ubuntu 14.10
 Release:  14.10
 Codename: utopic
 {code}
 {code}
 huitseeker@docker1:~$ uname -a
 Linux docker1 3.16.0-23-generic #31-Ubuntu SMP Tue Oct 21 17:56:17 UTC 2014 
 x86_64 x86_64 x86_64 GNU/Linux
 {code}
 Mesos retrieved from {{http://git-wip-us.apache.org/repos/asf/mesos.git}}
 And compiled from git tag {{0.21.1}} (currently resolves to 
 {{2ae1ba91e64f92ec71d327e10e6ba9e8ad5477e8}}). Box is a clean, 
 ansible-generated Ubuntu with cgmanager disabled, and the following packages 
 installed on top of the usual mesos dependencies:
 - cgroup-lite (service is enabled and started)
 - linux-tools-common
 - linux-tools-generic
 - linux-cloud-tools-generic
 - linux-tools-3.16.0-23-generic
 - linux-cloud-tools-3.16.0-23-generic
Reporter: François Garillot
  Labels: cgroups, test

 During make check :
 {code}
 [--] Global test environment tear-down
 [==] 503 tests from 89 test cases ran. (387352 ms total)
 [  PASSED  ] 499 tests.
 [  FAILED  ] 4 tests, listed below:
 [  FAILED  ] CgroupsAnyHierarchyTest.ROOT_CGROUPS_Get
 [  FAILED  ] CgroupsAnyHierarchyTest.ROOT_CGROUPS_NestedCgroups
 [  FAILED  ] NsTest.ROOT_setns
 [  FAILED  ] PerfTest.ROOT_SampleInit
 {code}
 Details:
 {code}
 [ RUN  ] CgroupsAnyHierarchyTest.ROOT_CGROUPS_Get
 ../../src/tests/cgroups_tests.cpp:364: Failure
 Value of: mesos_test2
 Expected: cgroups.get()[0]
 Which is: mesos
 [  FAILED  ] CgroupsAnyHierarchyTest.ROOT_CGROUPS_Get (10 ms)
 [ RUN  ] CgroupsAnyHierarchyTest.ROOT_CGROUPS_NestedCgroups
 ../../src/tests/cgroups_tests.cpp:392: Failure
 Value of: path::join(TEST_CGROUPS_ROOT, 2)
   Actual: mesos_test/2
 Expected: cgroups.get()[0]
 Which is: mesos_test/1
 ../../src/tests/cgroups_tests.cpp:393: Failure
 Value of: path::join(TEST_CGROUPS_ROOT, 1)
   Actual: mesos_test/1
 Expected: cgroups.get()[1]
 Which is: mesos_test/2
 [  FAILED  ] CgroupsAnyHierarchyTest.ROOT_CGROUPS_NestedCgroups (12 ms)
 {code}
 {code}
 [ RUN  ] NsTest.ROOT_setns
 ../../src/tests/ns_tests.cpp:123: Failure
 Value of: status.get().get()
   Actual: 256
 Expected: 0
 [  FAILED  ] NsTest.ROOT_setns (93 ms)
 {code}
 {code}
 [ RUN  ] PerfTest.ROOT_SampleInit
 ../../src/tests/perf_tests.cpp:143: Failure
 Expected: (0u) < (statistics.get().cycles()), actual: 0 vs 0
 ../../src/tests/perf_tests.cpp:146: Failure
 Expected: (0.0) < (statistics.get().task_clock()), actual: 0 vs 0
 [  FAILED  ] PerfTest.ROOT_SampleInit (1078 ms)
 {code}
 Those tests have been run in parallel (-j 8) as well as sequentially (-j 1), 
 no difference.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-2534) PerfTest.ROOT_SampleInit test fails.

2015-03-23 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-2534:
--

 Summary: PerfTest.ROOT_SampleInit test fails.
 Key: MESOS-2534
 URL: https://issues.apache.org/jira/browse/MESOS-2534
 Project: Mesos
  Issue Type: Bug
  Components: test
Affects Versions: 0.22.0
Reporter: Benjamin Mahler
Assignee: Ian Downes


From MESOS-2300 as well, it looks like this test is not reliable:

{code}
[ RUN  ] PerfTest.ROOT_SampleInit
../../src/tests/perf_tests.cpp:147: Failure
Expected: (0u) < (statistics.get().cycles()), actual: 0 vs 0
../../src/tests/perf_tests.cpp:150: Failure
Expected: (0.0) < (statistics.get().task_clock()),
{code}

It looks like this test samples PID 1, which is either {{init}} or {{systemd}}. 
Per a chat with [~idownes] this should probably sample something that is 
guaranteed to be consuming cycles.
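A sketch of the suggested fix, assuming a Linux /proc filesystem: spawn a child that is guaranteed to burn CPU and sample that instead of PID 1. The real test would point {{perf}} at {{child.pid}}; here we only verify that the child actually accumulates CPU time:

```python
import subprocess
import sys
import time

# Spawn a child that spins forever -- a dependable source of cycles for
# perf to sample (unlike PID 1, which may be idle).
child = subprocess.Popen([sys.executable, "-c", "while True: pass"])
try:
    time.sleep(1.0)
    # On Linux, fields 14 and 15 of /proc/<pid>/stat are utime/stime
    # (CPU time in clock ticks).
    with open("/proc/%d/stat" % child.pid) as f:
        fields = f.read().split()
    utime, stime = int(fields[13]), int(fields[14])
finally:
    child.kill()
    child.wait()

assert utime + stime > 0  # the child really did consume cycles
```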



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-2535) Improve Resources filters to refuse certain roles

2015-03-23 Thread Yan Xu (JIRA)
Yan Xu created MESOS-2535:
-

 Summary: Improve Resources filters to refuse certain roles
 Key: MESOS-2535
 URL: https://issues.apache.org/jira/browse/MESOS-2535
 Project: Mesos
  Issue Type: Improvement
Reporter: Yan Xu


We have a certain use case where a framework only uses certain hosts in the 
cluster (e.g. because they have some special hardware or just large 
disks/ram). The way we are currently implementing it is that the slaves on 
these hosts tag all their resources as belonging to a certain role and the 
scheduler only uses resources of that role.

To make sure the framework plays nicely with other frameworks, it also 
immediately declines resources of other roles (including the '*' role) 
indefinitely. The current {{declineOffer()}} API, however, is at the offer 
level, not at the
- framework level, where the framework could reject offers from all slaves 
collectively instead of individually, nor at the
- resource level, where the framework could reject some resources from this 
offer but not all.

The framework-level requirement is less of a problem because rejecting the 
offers individually achieves the same result, just with more message overhead.

The resource-level requirement, though, is hard to achieve with the current 
declineOffer() API. If the special slaves do not dedicate all their resources 
to one framework but rather have some resources with a certain role that I 
want to reject for 5 seconds (due to an idle scheduler) and some other 
resources that I want to reject forever (e.g. due to a role mismatch), there 
is no way to do that.

Some improvement to the resource filters that solves this would be nice. Such 
an improvement should make sure that we are still able to revive the offers 
(or rather, directly remove the filters that are not tied to an offer).
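A hypothetical model of the desired semantics (none of these names exist in Mesos): per-role refusal durations, with {{inf}} meaning "refuse forever", plus an eager revive that drops filters without waiting for them to expire:

```python
import time

class RoleFilter:
    """Toy model of per-role resource filters with individual durations."""

    def __init__(self):
        self._until = {}  # role -> timestamp after which offers are welcome

    def decline(self, role, seconds):
        """Refuse resources of `role` for `seconds` (may be float('inf'))."""
        self._until[role] = time.monotonic() + seconds

    def revive(self, role=None):
        """Drop filters eagerly, without waiting for them to expire."""
        if role is None:
            self._until.clear()
        else:
            self._until.pop(role, None)

    def accepts(self, role):
        return time.monotonic() >= self._until.get(role, float("-inf"))

f = RoleFilter()
f.decline("*", float("inf"))   # never want unreserved resources
f.decline("gpu", 5.0)          # scheduler is idle; re-offer in 5 seconds
assert not f.accepts("*") and not f.accepts("gpu")
f.revive("gpu")
assert f.accepts("gpu") and not f.accepts("*")
```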



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-1831) Master should send PingSlaveMessage instead of PING

2015-03-23 Thread Niklas Quarfot Nielsen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niklas Quarfot Nielsen updated MESOS-1831:
--
Sprint: Mesosphere Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, 
Mesosphere Q1 Sprint 4 - 3/6, Mesosphere Q1 Sprint 5 - 3/20  (was: Mesosphere 
Q1 Sprint 2 - 2/6, Mesosphere Q1 Sprint 3 - 2/20, Mesosphere Q1 Sprint 4 - 3/6, 
Mesosphere Q1 Sprint 5 - 3/20, Mesosphere Q1 Sprint 6 - 4/3)

 Master should send PingSlaveMessage instead of PING
 -

 Key: MESOS-1831
 URL: https://issues.apache.org/jira/browse/MESOS-1831
 Project: Mesos
  Issue Type: Task
Reporter: Vinod Kone
Assignee: Adam B
  Labels: mesosphere

 In 0.21.0 master sends PING message with an embedded PingSlaveMessage for 
 backwards compatibility (https://reviews.apache.org/r/25867/).
 In 0.22.0, master should send PingSlaveMessage directly instead of PING.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2110) Configurable Ping Timeouts

2015-03-23 Thread Niklas Quarfot Nielsen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niklas Quarfot Nielsen updated MESOS-2110:
--
Shepherd: Niklas Quarfot Nielsen

 Configurable Ping Timeouts
 --

 Key: MESOS-2110
 URL: https://issues.apache.org/jira/browse/MESOS-2110
 Project: Mesos
  Issue Type: Improvement
  Components: master, slave
Reporter: Adam B
Assignee: Adam B
  Labels: master, network, slave, timeout

 After a series of ping failures, the master considers the slave lost and 
 calls shutdownSlave, requiring a slave that reconnects to kill its tasks 
 and re-register with a new slaveId. On the other side, after a similar 
 timeout, the slave will consider the master lost and try to detect a new 
 master. These timeouts are currently hardcoded constants (5 * 15s), which 
 may not be well-suited for all scenarios.
 - Some clusters may tolerate a longer slave process restart period, and 
 wouldn't want tasks to be killed upon reconnect.
 - Some clusters may have higher-latency networks (e.g. cross-datacenter, or 
 for volunteer computing efforts), and would like to tolerate longer periods 
 without communication.
 We should provide flags/mechanisms on the master to control its tolerance for 
 non-communicative slaves, and (less importantly?) on the slave to tolerate 
 missing masters.
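The arithmetic behind the hardcoded constants, with hypothetical parameter names to show what configurability would change (today both values are compile-time constants):

```python
def slave_removal_timeout(ping_interval_secs=15.0, max_ping_failures=5):
    """Seconds of silence after which the master considers a slave lost.

    The defaults mirror the current hardcoded constants (5 * 15s)."""
    return ping_interval_secs * max_ping_failures

# Current behavior: a slave is declared lost after 75 seconds.
assert slave_removal_timeout() == 75.0

# A high-latency (e.g. cross-datacenter) deployment might tolerate more:
assert slave_removal_timeout(ping_interval_secs=30.0,
                             max_ping_failures=10) == 300.0
```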



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2351) Enable label and environment decorators (hooks) to remove label and environment entries

2015-03-23 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2351:
--
Shepherd: Adam B

 Enable label and environment decorators (hooks) to remove label and 
 environment entries
 ---

 Key: MESOS-2351
 URL: https://issues.apache.org/jira/browse/MESOS-2351
 Project: Mesos
  Issue Type: Task
Reporter: Niklas Quarfot Nielsen
Assignee: Niklas Quarfot Nielsen

 We need to change the semantics of decorators to be able to not only add 
 labels and environment variables, but also remove them.
 The change is fairly small: the hook manager (and call site) use CopyFrom 
 instead of MergeFrom, and hook implementors pass on the labels and 
 environment from task and executor commands respectively.
 In the future, we can tag labels such that only labels belonging to a hook 
 type (across master and slave) can be inspected and changed. For now, the 
 active hooks are selected by the operator and can therefore be trusted.
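The CopyFrom-vs-MergeFrom distinction can be modeled with plain lists standing in for a repeated protobuf {{labels}} field (a sketch of the semantics only, not real protobuf code):

```python
def merge_from(target, hook_result):
    """MergeFrom semantics: repeated fields are appended, so a hook can
    only ever ADD labels."""
    return target + hook_result

def copy_from(target, hook_result):
    """CopyFrom semantics: the field is replaced wholesale, so a hook
    that passes on only the labels it wants to keep can REMOVE some."""
    return list(hook_result)

original = ["env=prod", "team=infra"]
hook_output = ["env=prod"]            # the hook dropped `team=infra`

# MergeFrom can only grow the list (and even duplicates entries here):
assert merge_from(original, hook_output) == ["env=prod", "team=infra", "env=prod"]

# CopyFrom lets the hook's output fully determine the result:
assert copy_from(original, hook_output) == ["env=prod"]
```

This is why hook implementors must pass through every label they want to preserve: under CopyFrom, anything omitted is removed.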



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2317) Remove deprecated checkpoint=false code

2015-03-23 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377020#comment-14377020
 ] 

Adam B commented on MESOS-2317:
---

https://reviews.apache.org/r/31539/

 Remove deprecated checkpoint=false code
 ---

 Key: MESOS-2317
 URL: https://issues.apache.org/jira/browse/MESOS-2317
 Project: Mesos
  Issue Type: Epic
Affects Versions: 0.22.0
Reporter: Adam B
Assignee: Joerg Schad
  Labels: checkpoint, mesosphere

 Cody's plan from MESOS-444 was:
 1) Make it so the flag can't be changed at the command line
 2) Remove the checkpoint variable entirely from slave/flags.hpp. This is a 
 fairly involved change since a number of unit tests depend on manually 
 setting the flag, as well as the default being non-checkpointing.
 3) Remove logic around checkpointing in the slave
 4) Drop the flag from the SlaveInfo struct, remove logic inside the master 
 (Will require a deprecation cycle).
 Only 1) has been implemented/committed. This ticket is to track the remaining 
 work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2226) HookTest.VerifySlaveLaunchExecutorHook is flaky

2015-03-23 Thread Niklas Quarfot Nielsen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niklas Quarfot Nielsen updated MESOS-2226:
--
Shepherd: Niklas Quarfot Nielsen

 HookTest.VerifySlaveLaunchExecutorHook is flaky
 ---

 Key: MESOS-2226
 URL: https://issues.apache.org/jira/browse/MESOS-2226
 Project: Mesos
  Issue Type: Bug
  Components: test
Affects Versions: 0.22.0
Reporter: Vinod Kone
Assignee: Kapil Arya
  Labels: flaky-test

 Observed this on internal CI
 {code}
 [ RUN  ] HookTest.VerifySlaveLaunchExecutorHook
 Using temporary directory '/tmp/HookTest_VerifySlaveLaunchExecutorHook_GjBgME'
 I0114 18:51:34.659353  4720 leveldb.cpp:176] Opened db in 1.255951ms
 I0114 18:51:34.662112  4720 leveldb.cpp:183] Compacted db in 596090ns
 I0114 18:51:34.662364  4720 leveldb.cpp:198] Created db iterator in 177877ns
 I0114 18:51:34.662719  4720 leveldb.cpp:204] Seeked to beginning of db in 
 19709ns
 I0114 18:51:34.663010  4720 leveldb.cpp:273] Iterated through 0 keys in the 
 db in 18208ns
 I0114 18:51:34.663312  4720 replica.cpp:744] Replica recovered with log 
 positions 0 - 0 with 1 holes and 0 unlearned
 I0114 18:51:34.664266  4735 recover.cpp:449] Starting replica recovery
 I0114 18:51:34.664908  4735 recover.cpp:475] Replica is in EMPTY status
 I0114 18:51:34.667842  4734 replica.cpp:641] Replica in EMPTY status received 
 a broadcasted recover request
 I0114 18:51:34.669117  4735 recover.cpp:195] Received a recover response from 
 a replica in EMPTY status
 I0114 18:51:34.677913  4735 recover.cpp:566] Updating replica status to 
 STARTING
 I0114 18:51:34.683157  4735 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 137939ns
 I0114 18:51:34.683507  4735 replica.cpp:323] Persisted replica status to 
 STARTING
 I0114 18:51:34.684013  4735 recover.cpp:475] Replica is in STARTING status
 I0114 18:51:34.685554  4738 replica.cpp:641] Replica in STARTING status 
 received a broadcasted recover request
 I0114 18:51:34.696512  4736 recover.cpp:195] Received a recover response from 
 a replica in STARTING status
 I0114 18:51:34.700552  4735 recover.cpp:566] Updating replica status to VOTING
 I0114 18:51:34.701128  4735 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 115624ns
 I0114 18:51:34.701478  4735 replica.cpp:323] Persisted replica status to 
 VOTING
 I0114 18:51:34.701817  4735 recover.cpp:580] Successfully joined the Paxos 
 group
 I0114 18:51:34.702569  4735 recover.cpp:464] Recover process terminated
 I0114 18:51:34.716439  4736 master.cpp:262] Master 
 20150114-185134-2272962752-57018-4720 (fedora-19) started on 
 192.168.122.135:57018
 I0114 18:51:34.716913  4736 master.cpp:308] Master only allowing 
 authenticated frameworks to register
 I0114 18:51:34.717136  4736 master.cpp:313] Master only allowing 
 authenticated slaves to register
 I0114 18:51:34.717488  4736 credentials.hpp:36] Loading credentials for 
 authentication from 
 '/tmp/HookTest_VerifySlaveLaunchExecutorHook_GjBgME/credentials'
 I0114 18:51:34.718077  4736 master.cpp:357] Authorization enabled
 I0114 18:51:34.719238  4738 whitelist_watcher.cpp:65] No whitelist given
 I0114 18:51:34.719755  4737 hierarchical_allocator_process.hpp:285] 
 Initialized hierarchical allocator process
 I0114 18:51:34.722584  4736 master.cpp:1219] The newly elected leader is 
 master@192.168.122.135:57018 with id 20150114-185134-2272962752-57018-4720
 I0114 18:51:34.722865  4736 master.cpp:1232] Elected as the leading master!
 I0114 18:51:34.723310  4736 master.cpp:1050] Recovering from registrar
 I0114 18:51:34.723760  4734 registrar.cpp:313] Recovering registrar
 I0114 18:51:34.725229  4740 log.cpp:660] Attempting to start the writer
 I0114 18:51:34.727893  4739 replica.cpp:477] Replica received implicit 
 promise request with proposal 1
 I0114 18:51:34.728425  4739 leveldb.cpp:306] Persisting metadata (8 bytes) to 
 leveldb took 114781ns
 I0114 18:51:34.728662  4739 replica.cpp:345] Persisted promised to 1
 I0114 18:51:34.731271  4741 coordinator.cpp:230] Coordinator attempting to 
 fill missing position
 I0114 18:51:34.733223  4734 replica.cpp:378] Replica received explicit 
 promise request for position 0 with proposal 2
 I0114 18:51:34.734076  4734 leveldb.cpp:343] Persisting action (8 bytes) to 
 leveldb took 87441ns
 I0114 18:51:34.734441  4734 replica.cpp:679] Persisted action at 0
 I0114 18:51:34.740272  4739 replica.cpp:511] Replica received write request 
 for position 0
 I0114 18:51:34.740910  4739 leveldb.cpp:438] Reading position from leveldb 
 took 59846ns
 I0114 18:51:34.741672  4739 leveldb.cpp:343] Persisting action (14 bytes) to 
 leveldb took 189259ns
 I0114 18:51:34.741919  4739 replica.cpp:679] Persisted action at 0
 I0114 18:51:34.743000  4739 replica.cpp:658] Replica 

[jira] [Closed] (MESOS-2525) Missing information in Python interface launchTasks scheduler method

2015-03-23 Thread Niklas Quarfot Nielsen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niklas Quarfot Nielsen closed MESOS-2525.
-
Resolution: Fixed

commit 89186759a593c57fca9667d2a8980ca2e4b929c6
Author: Itamar Ostricher ita...@yowza3d.com
Date:   Mon Mar 23 17:46:56 2015 -0700

Updated launchTasks scheduler Python API docstring.

Review: https://reviews.apache.org/r/32306

 Missing information in Python interface launchTasks scheduler method
 

 Key: MESOS-2525
 URL: https://issues.apache.org/jira/browse/MESOS-2525
 Project: Mesos
  Issue Type: Documentation
  Components: python api
Affects Versions: 0.21.0
Reporter: Itamar Ostricher
  Labels: documentation, newbie, patch
   Original Estimate: 1m
  Remaining Estimate: 1m

 The docstring of the launchTasks scheduler method in the Python API should 
 explicitly state that launching multiple tasks onto multiple offers is 
 supported only as long as all offers are from the same slave.
 See mailing list thread: 
 http://www.mail-archive.com/user@mesos.apache.org/msg02861.html
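 A minimal sketch of the constraint described above: before calling
 launchTasks, offers can be grouped by slave so that each call receives
 offers from a single slave only. The helper below is hypothetical (it is
 not part of the Mesos Python bindings) and uses plain dicts to stand in
 for Offer protobufs.

```python
# Hypothetical helper: group offers by slave_id so that each
# driver.launchTasks(offer_ids, tasks) call only receives offers
# from a single slave, as the docstring should state.
from collections import defaultdict

def group_offers_by_slave(offers):
    """Return {slave_id: [offer, ...]} for a list of offer-like dicts."""
    groups = defaultdict(list)
    for offer in offers:
        groups[offer["slave_id"]].append(offer)
    return dict(groups)

# Example: two offers from slave "S1", one from "S2".
offers = [
    {"id": "O1", "slave_id": "S1"},
    {"id": "O2", "slave_id": "S2"},
    {"id": "O3", "slave_id": "S1"},
]

grouped = group_offers_by_slave(offers)
# Each group can now be passed to a separate launchTasks call.
for slave_id, slave_offers in grouped.items():
    offer_ids = [o["id"] for o in slave_offers]
    print(slave_id, offer_ids)
```

 Passing O1 and O2 to a single launchTasks call would violate the
 same-slave requirement; grouping first makes the constraint explicit at
 the call site.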



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2317) Remove deprecated checkpoint=false code

2015-03-23 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2317:
--
   Sprint: Mesosphere Q1 Sprint 6 - 4/3
Epic Name: Slave Checkpointing Deprecation
 Shepherd: Adam B
   Labels: checkpoint mesosphere  (was: checkpoint)

 Remove deprecated checkpoint=false code
 ---

 Key: MESOS-2317
 URL: https://issues.apache.org/jira/browse/MESOS-2317
 Project: Mesos
  Issue Type: Epic
Affects Versions: 0.22.0
Reporter: Adam B
Assignee: Joerg Schad
  Labels: checkpoint, mesosphere

 Cody's plan from MESOS-444 was:
 1) Make it so the flag can't be changed at the command line
 2) Remove the checkpoint variable entirely from slave/flags.hpp. This is a 
 fairly involved change since a number of unit tests depend on manually 
 setting the flag, as well as the default being non-checkpointing.
 3) Remove logic around checkpointing in the slave
 4) Drop the flag from the SlaveInfo struct, remove logic inside the master 
 (Will require a deprecation cycle).
 Only 1) has been implemented/committed. This ticket is to track the remaining 
 work.
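 The first two deprecation steps above can be sketched in miniature
 (Mesos itself implements this in C++ in slave/flags.hpp; all names below
 are hypothetical, for illustration only): the command-line parser rejects
 the removed flag, and no checkpoint member remains for code to consult.

```python
# Sketch of steps 1 and 2: the parser rejects the removed --checkpoint
# flag, and the resulting flags object has no checkpoint entry at all,
# so checkpointing is unconditional. Hypothetical, not the Mesos code.

REMOVED_FLAGS = {"--checkpoint", "--no-checkpoint"}

def parse_slave_flags(argv):
    """Parse slave flags, raising on flags removed by the deprecation."""
    flags = {}
    for arg in argv:
        name = arg.split("=", 1)[0]
        if name in REMOVED_FLAGS:
            raise ValueError(
                name + " has been removed; slaves always checkpoint now")
        key, _, value = arg.lstrip("-").partition("=")
        flags[key] = value or True
    return flags

# Checkpointing is always on: there is no flag left to consult.
flags = parse_slave_flags(["--work_dir=/tmp/mesos"])
print(flags)
```

 Steps 3 and 4 then follow the same shape one layer up: delete the
 conditional code paths in the slave, and finally drop the field from the
 SlaveInfo message after a deprecation cycle.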





[jira] [Updated] (MESOS-2436) Adapt unit test relying on non-checkpointing slaves

2015-03-23 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2436:
--
Fix Version/s: 0.23.0
   Labels: mesosphere  (was: )

 Adapt unit test relying on non-checkpointing slaves
 ---

 Key: MESOS-2436
 URL: https://issues.apache.org/jira/browse/MESOS-2436
 Project: Mesos
  Issue Type: Technical task
Reporter: Joerg Schad
Assignee: Joerg Schad
  Labels: mesosphere
 Fix For: 0.23.0








[jira] [Resolved] (MESOS-2436) Adapt unit test relying on non-checkpointing slaves

2015-03-23 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B resolved MESOS-2436.
---
Resolution: Fixed

 Adapt unit test relying on non-checkpointing slaves
 ---

 Key: MESOS-2436
 URL: https://issues.apache.org/jira/browse/MESOS-2436
 Project: Mesos
  Issue Type: Technical task
Reporter: Joerg Schad
Assignee: Joerg Schad
  Labels: mesosphere
 Fix For: 0.23.0








[jira] [Commented] (MESOS-2436) Adapt unit test relying on non-checkpointing slaves

2015-03-23 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377109#comment-14377109
 ] 

Adam B commented on MESOS-2436:
---

commit b2f73095fd168a75c2754f26d5368f4cff414752
Author: Joerg Schad jo...@mesosphere.io
Date:   Mon Mar 23 17:03:28 2015 -0700

Remove the checkpoint variable entirely from slave/flags.hpp.

As a number of tests rely on the checkpointing flag to be false, a few
tests had to be adapted. Removed the following test as the tested logic
is specific to (old) non-checkpointing slaves:
SlaveRecoveryTest.NonCheckpointingSlave: This test checks whether a
non-checkpointing slave is not scheduled to a checkpointing framework.
It can be removed as all slaves are now checkpointing slaves.

Review: https://reviews.apache.org/r/31539

 Adapt unit test relying on non-checkpointing slaves
 ---

 Key: MESOS-2436
 URL: https://issues.apache.org/jira/browse/MESOS-2436
 Project: Mesos
  Issue Type: Technical task
Reporter: Joerg Schad
Assignee: Joerg Schad







[jira] [Commented] (MESOS-2436) Adapt unit test relying on non-checkpointing slaves

2015-03-23 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377119#comment-14377119
 ] 

Adam B commented on MESOS-2436:
---

subtask

 Adapt unit test relying on non-checkpointing slaves
 ---

 Key: MESOS-2436
 URL: https://issues.apache.org/jira/browse/MESOS-2436
 Project: Mesos
  Issue Type: Technical task
Reporter: Joerg Schad
Assignee: Joerg Schad
  Labels: mesosphere
 Fix For: 0.23.0








[jira] [Commented] (MESOS-2436) Adapt unit test relying on non-checkpointing slaves

2015-03-23 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377113#comment-14377113
 ] 

Adam B commented on MESOS-2436:
---

I believe this issue can be resolved now, since all that's left is removing the 
flag from the SlaveInfo struct and removing logic inside the master, which can 
be tracked by another ticket under the parent epic MESOS-2317.

 Adapt unit test relying on non-checkpointing slaves
 ---

 Key: MESOS-2436
 URL: https://issues.apache.org/jira/browse/MESOS-2436
 Project: Mesos
  Issue Type: Technical task
Reporter: Joerg Schad
Assignee: Joerg Schad
  Labels: mesosphere
 Fix For: 0.23.0








[jira] [Commented] (MESOS-2317) Remove deprecated checkpoint=false code

2015-03-23 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377129#comment-14377129
 ] 

Adam B commented on MESOS-2317:
---

Looks like steps 1, 2, and 3? are committed. Time for a new issue for step 4, 
and then it's all done!

 Remove deprecated checkpoint=false code
 ---

 Key: MESOS-2317
 URL: https://issues.apache.org/jira/browse/MESOS-2317
 Project: Mesos
  Issue Type: Epic
Affects Versions: 0.22.0
Reporter: Adam B
Assignee: Joerg Schad
  Labels: checkpoint, mesosphere

 Cody's plan from MESOS-444 was:
 1) Make it so the flag can't be changed at the command line
 2) Remove the checkpoint variable entirely from slave/flags.hpp. This is a 
 fairly involved change since a number of unit tests depend on manually 
 setting the flag, as well as the default being non-checkpointing.
 3) Remove logic around checkpointing in the slave
 4) Drop the flag from the SlaveInfo struct, remove logic inside the master 
 (Will require a deprecation cycle).
 Only 1) has been implemented/committed. This ticket is to track the remaining 
 work.


